CN106227800B - Storage method and management system for highly-associated big data - Google Patents


Info

Publication number
CN106227800B
Authority
CN
China
Legal status
Active
Application number
CN201610579013.8A
Other languages
Chinese (zh)
Other versions
CN106227800A (en)
Inventor
李�昊
张敏
付艳艳
惠榛
陈震宇
张宗福
Current Assignee
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date
2016-07-21
Filing date
2016-07-21
Publication date
2020-02-21
2016-07-21 Application filed by Institute of Software of CAS
2016-07-21 Priority to CN201610579013.8A
2016-12-14 Publication of CN106227800A
Application granted
2020-02-21 Publication of CN106227800B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof

Abstract

The invention discloses a storage method and a management system for highly-associated big data. The system comprises a storage module and a unified data management module. The storage module comprises a Hashmap model for storing the contents of data entities, a relational model for storing the attributes of data entities, and a graph data model for storing the associations between data entities. Each data entity is assigned an entity type and a unique ID number, and the attributes and content of the same data entity are associated through that ID number. The unified data management module adds, deletes, updates and queries the associations, attributes and content of data entities in the storage module. The invention enables the storage and management of big data sets while supporting efficient association query analysis.

Description

Storage method and management system for highly-associated big data
Technical Field
The invention belongs to the field of big data storage, and particularly relates to a storage method and a management system for highly-associated big data.
Background
In the big data era, enterprises and organizations increasingly recognize the value of data and have begun to collect, store, analyze and use big data. Associations between data are ubiquitous in these large data sets. In particular, in application scenarios closely tied to individual users, such as social network big data and medical big data, the data objects are highly correlated, and the complex relationships among them often have great analytical value, for example friendships between social network users or associations between drugs and patients. At the same time, these highly correlated data sets are characterized by large scale, high velocity and diversity, so analyzing and using them well requires research on efficient storage and management of such data sets.
To meet the storage requirements of big data, a relational database is generally used to store structured data, and a NoSQL database is used to store semi-structured or unstructured data. With these storage methods, both relational databases and most NoSQL databases (e.g., key-value, document and column-oriented databases) are very inefficient at storing and managing associations between data. The data are stored in mutually unrelated records, values, documents and columns, so querying and analyzing the associations among them requires additional mechanisms such as indexes, foreign keys and table joins.
In contrast, graph databases are dedicated to storing and querying the links between data, and their efficiency for multi-level association queries and reverse queries is far higher than that of relational databases and other NoSQL databases. A multi-level association query follows the connections between data across several levels; for example, querying "friends of a person's friends" traverses the friendship relation two levels deep. A reverse query is one whose direction is opposite to the direction in which the index was built. For example, given an index "patient -> drugs", it is fast to find which drugs a given patient has bought, but much less efficient to find, in reverse, which patients have bought a given drug. Even if a "drug -> patients" index is also built to handle such reverse queries, a query such as "which patients bought both drug A and drug B" still requires multiple lookups, which is inefficient. Graph databases solve the problem of querying the associations between data of interest. However, they do not satisfy the large-scale, diverse storage needs of big data sets.
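The difference between a one-directional index and a graph for reverse and multi-level queries can be illustrated with a small in-memory sketch. It assumes hypothetical toy data (patients p1 to p3, users alice to dave); plain Python dicts and sets stand in for the index and for the graph database, so it shows only the shape of the lookups, not the performance of any real database.

```python
from collections import deque

# Hypothetical toy data: a one-directional index "patient -> drugs".
purchases = {
    "p1": {"drugA", "drugB"},
    "p2": {"drugA"},
    "p3": {"drugB", "drugC"},
}

def buyers_by_scan(drugs):
    # Reverse query over the forward index: every patient entry must be scanned.
    return {p for p, bought in purchases.items() if drugs <= bought}

# Graph form: each purchase is one edge, usable from either endpoint.
neighbors = {}
for patient, bought in purchases.items():
    for drug in bought:
        neighbors.setdefault(patient, set()).add(drug)
        neighbors.setdefault(drug, set()).add(patient)

def buyers_by_graph(drugs):
    # Start at the drug nodes and intersect their neighbor sets.
    sets = [neighbors.get(d, set()) for d in drugs]
    return set.intersection(*sets) if sets else set()

# Multi-level query: everyone reachable within `hops` friendship edges.
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}

def within_hops(graph, start, hops):
    # Breadth-first search that stops expanding at depth `hops`.
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, depth + 1))
    return found
```

For example, `buyers_by_graph({"drugA", "drugB"})` touches only the two drug nodes, while `buyers_by_scan` must visit every patient; `within_hops(friends, "alice", 2)` answers the "friends of friends" query.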
Some current methods store big data sets in a hybrid of NoSQL databases and file systems, placing each kind of data in a suitable database or file system according to its characteristics; for example, a relational database stores structured data and a NoSQL database stores semi-structured or unstructured data. However, these methods do not consider the complex associations among data in highly-associated big data sets and lack a matching data model, storage method and query method. As a result, the data stored in different databases or file systems are often independent of each other, which leads to substantial redundancy and inefficient complex association queries.
In summary, the field of big data storage still lacks an efficient storage and management method for highly-associated big data sets, and current methods cannot satisfy both the storage requirements of big data and the requirement for efficient analysis of the complex associations within it.
Disclosure of Invention
In view of the above technical problems, an object of the present invention is to provide a storage method and a management system for highly-associated big data, which can realize storage and management of such big data sets and support efficient association query analysis.
The basic principle of the technique is as follows. Based on a graph data model, a relational model and a Hashmap model, a mixed data model centered on the associations between data entities is proposed: the graph data model describes the associations between data entities, the relational model describes their structured attributes, and the Hashmap model describes their content. The data model is implemented with a graph database, a relational database, and a key-value database or distributed file system, respectively. A suitable strategy is then used to optimize the order of association queries, attribute queries and content queries on data entities, improving data query and retrieval efficiency.
Specifically, in order to achieve the technical purpose, the invention adopts the following technical scheme:
The storage method and management system for highly-associated big data comprise the following steps:
1) Establish a mixed data model for the highly-associated big data set.
Further, the mixed data model comprises a graph data model, a relational model and a Hashmap model.
Further, the graph data model refers to a data model in which data entities are represented by nodes, and the edges connecting the nodes represent the connections between the data entities.
Further, the relational model refers to a normalized model adopted in a relational database, that is, a data model representing entities and relations between the entities in a two-dimensional table form. In the technical scheme, only the two-dimensional table is adopted to describe the attribute of the entity. That is, the relational model is typically used to store entities and relationships between entities, but the present invention only uses it to describe attributes of entities in the mixed data model.
Further, the Hashmap model refers to a data model that uses keys to store and retrieve the corresponding values. The Hashmap model may be implemented either as a key-value store or as a distributed file system.
Further, the mixed data model is constructed as shown in Fig. 1. Each data entity has an entity type and a unique ID number. The attributes of data entities are described as two-dimensional tables in the relational model, i.e., each entity type corresponds to a table of the form [data entity ID | attribute A | attribute B | ...]. The key in the Hashmap model is the ID number of the data entity, and the value is the original content of the data entity. The associations between data entities are represented by the graph data model: a node represents a data entity and is labeled with that entity's type and ID number, and an edge between nodes represents an association between the corresponding data entities. The attributes and content of the same entity are linked through a mapping on entity ID numbers.
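The construction above can be sketched in a few lines, assuming an in-memory SQLite database for the relational model, a Python dict for the Hashmap model, and a dict of nodes plus a set of (source, relation, target) tuples for the graph data model. The table `users`, its columns and the relation name `follows` are hypothetical; the point is only that the entity ID is the single link between the three models.

```python
import sqlite3

# Relational model: one two-dimensional table per entity type, keyed by entity ID.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (entity_id TEXT PRIMARY KEY, gender TEXT, age INTEGER)")
content = {}   # Hashmap model: entity ID -> original content
nodes = {}     # graph nodes: entity ID -> entity type
edges = set()  # graph edges: (source ID, relation, target ID)

def put_user(entity_id, gender, age, raw):
    # The same entity ID is written into all three models.
    db.execute("INSERT INTO users VALUES (?, ?, ?)", (entity_id, gender, age))
    content[entity_id] = raw
    nodes[entity_id] = "user"

def relate(src, relation, dst):
    edges.add((src, relation, dst))

def describe(entity_id):
    # Join the three models on the entity ID alone.
    row = db.execute(
        "SELECT gender, age FROM users WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    out = sorted((r, d) for s, r, d in edges if s == entity_id)
    return nodes.get(entity_id), row, content.get(entity_id), out

put_user("u1", "F", 30, b"profile of u1")
put_user("u2", "M", 25, b"profile of u2")
relate("u1", "follows", "u2")
```

Calling `describe("u1")` gathers the entity type, attribute row, raw content and outgoing associations of `u1` from the three separate stores.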
2) Build a storage method and management system for highly-associated big data that matches the mixed data model.
Further, the storage method and management system for highly-associated big data, shown in Fig. 2, comprise a storage module, a unified data management module and an auxiliary indexing mechanism.
Further, the storage module consists of several databases or file systems for storing data entities: a graph database for storing the associations between data entities, a key-value database or distributed file system for storing their original content, and a relational database for storing their attributes. Specifically:
the graph database implements a graph data model, i.e., each data entity is stored as a node in the graph database having an identification of an entity ID and an identification of an entity type.
The key-value database or distributed file system implements the Hashmap model. The entity ID may be used as the key, but the key is not limited to it. When something other than the entity ID is used as the key (for example, when several entities are stored together, the storage time or a combination of storage time and entity IDs may serve as the key), an auxiliary index must be built to keep queries efficient, i.e., an index from the entity ID to the key under which the entity's content is stored. This auxiliary index may be implemented as a table in a relational database.
The relational database implements the relational data model and the auxiliary indexing mechanism, i.e., a data table [entity ID | attribute A | attribute B | ...] is constructed for each entity type, with the entity ID as the primary key.
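When several entities are stored together under a non-ID key, the auxiliary index described above can be sketched as a relational table from entity ID to storage key. This sketch assumes a Python dict in place of the key-value store or distributed file system, stores each merged value as a dict of entity ID to content, and uses a hypothetical key format `batch:<storage time>`; a key that also combines the entity IDs would be handled the same way.

```python
import sqlite3

kv = {}  # stands in for the key-value database or distributed file system
db = sqlite3.connect(":memory:")
# Auxiliary index: entity ID -> key under which its content is stored.
db.execute("CREATE TABLE aux_index (entity_id TEXT PRIMARY KEY, storage_key TEXT)")

def store_batch(storage_time, batch):
    # Several entities merged under one key derived from the storage time.
    key = f"batch:{storage_time}"
    kv[key] = dict(batch)
    db.executemany(
        "INSERT INTO aux_index VALUES (?, ?)", [(eid, key) for eid in batch]
    )
    return key

def load_content(entity_id):
    # Resolve the storage key through the index, then pick the entity from the batch.
    row = db.execute(
        "SELECT storage_key FROM aux_index WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    if row is None:
        return None
    return kv[row[0]].get(entity_id)

store_batch("2016-07-21T10:00", {"w1": "first post", "w2": "second post"})
```

Without the `aux_index` table, finding the content of `w2` would require scanning every batch in the store.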
Furthermore, the unified data management module mainly realizes the functions of adding, deleting, updating and querying the association relationship, the attribute and the data content of the data entity in different databases or file systems, and optimizes the priority order of the relationship query, the attribute query and the content query of the data entity by adopting a proper strategy to improve the query and retrieval efficiency of the data.
Further, the auxiliary index mechanism is mainly used for improving the efficiency of common query, that is, an index is constructed for attributes or attribute combinations which are often used as query conditions, so as to realize quick retrieval of the entity ID. In addition, the system also comprises an auxiliary index table constructed when the non-entity ID is used as a key of a key value database or a distributed file system.
Further, the data adding process is as follows:
Step A1: Store the original content of the newly added data entity in the key-value database or distributed file system, with its key set to the entity ID or another unique identifier. If a unique identifier other than the entity ID is used, an auxiliary index table [entity ID | other unique identifier] is created in advance in the relational database, and whenever a data entity is added, a record mapping the new entity's ID to its other unique identifier is inserted into that table.
Step A2: Extract the attribute values of the newly added data entity and insert them into the relational table corresponding to its entity type; the new record has the form [entity ID | attribute A | attribute B | ...].
Step A3: Extract the associations between the newly added data entity and other data entities, insert a new node into the graph database to represent the new entity, and give it two node attributes recording its entity ID and entity type. Finally, create edges according to these associations. If an associated data entity already exists in the graph database, an edge connects the nodes representing the two entities. If the associated data entity does not yet exist in the graph database, a node representing it is created first, and then an edge connects the two nodes.
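Steps A1 to A3 can be sketched as a single function, assuming in-memory SQLite for the relational database, a dict for the key-value store, and a dict of node properties plus a set of edge tuples for the graph database. The function name `add_entity`, the `weibo` table and the relation `published_by` are hypothetical, and only the case where the entity ID is the content key is shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE weibo (entity_id TEXT PRIMARY KEY, publish_time TEXT, place TEXT)")
kv = {}        # key-value store
nodes = {}     # graph nodes: entity ID -> node properties
edges = set()  # graph edges: (source ID, relation, target ID)

def add_entity(entity_type, entity_id, raw, attrs, links):
    # A1: original content goes to the key-value store, keyed by entity ID.
    kv[entity_id] = raw
    # A2: attributes go into the table for this entity type.
    # entity_type and attribute names must be trusted identifiers (no SQL injection guard).
    cols = ", ".join(["entity_id", *attrs])
    marks = ", ".join("?" * (len(attrs) + 1))
    db.execute(
        f"INSERT INTO {entity_type} ({cols}) VALUES ({marks})",
        (entity_id, *attrs.values()),
    )
    # A3: new node with ID and type properties, then one edge per association.
    nodes[entity_id] = {"id": entity_id, "type": entity_type}
    for relation, other_type, other_id in links:
        # Create the associated node first if it is not yet in the graph.
        nodes.setdefault(other_id, {"id": other_id, "type": other_type})
        edges.add((entity_id, relation, other_id))

add_entity(
    "weibo", "w1", "hello world",
    {"publish_time": "2016-07-21 09:00", "place": "Beijing"},
    [("published_by", "user", "u1")],
)
```

Here user `u1` did not exist in the graph, so a node for it is created before the edge is added, as step A3 requires.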
Further, the data deleting process is as follows:
Step B1: Delete the data entity from the key-value database or distributed file system; if the relational database contains an auxiliary index table [entity ID | other unique identifier], also delete the record corresponding to the data entity.
Step B2: Delete the data entity's record from the corresponding table in the relational database.
Step B3: Delete the node representing the data entity from the graph database, together with all edges that have this node as an endpoint.
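Steps B1 to B3 can be sketched against the same kind of toy stores: in-memory SQLite for the attribute table and the auxiliary index table, a dict for the key-value store, and a dict plus a set of tuples for the graph. The names `delete_entity`, `users`, `aux_index` and `other_key` are hypothetical, and the auxiliary key is assumed to be a one-to-one alternative identifier, as in step A1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (entity_id TEXT PRIMARY KEY, age INTEGER)")
db.execute("CREATE TABLE aux_index (entity_id TEXT PRIMARY KEY, other_key TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("u1", 30), ("u2", 25)])
db.execute("INSERT INTO aux_index VALUES ('u2', 'profile-0002')")
kv = {"u1": "bio of u1", "profile-0002": "bio of u2"}
nodes = {"u1": "user", "u2": "user", "w9": "weibo"}
edges = {("u1", "follows", "u2"), ("u2", "follows", "u1"), ("u2", "publishes", "w9")}

def delete_entity(entity_type, entity_id):
    # B1: resolve the content key (the entity ID, or the identifier recorded in the
    # auxiliary index table), delete the content, then drop the index record.
    row = db.execute(
        "SELECT other_key FROM aux_index WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    kv.pop(row[0] if row else entity_id, None)
    db.execute("DELETE FROM aux_index WHERE entity_id = ?", (entity_id,))
    # B2: delete the attribute record (entity_type must be a trusted table name).
    db.execute(f"DELETE FROM {entity_type} WHERE entity_id = ?", (entity_id,))
    # B3: delete the node and every edge that has it as an endpoint.
    nodes.pop(entity_id, None)
    edges.difference_update({e for e in edges if entity_id in (e[0], e[2])})

delete_entity("users", "u2")
```

After the call, `u2` is gone from all three stores and no edge still refers to it; node `w9` remains because only the deleted entity's node is removed.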
Further, the data query process is as follows:
Step C1: According to the association constraints between data entities in the query condition, perform a matching search in the graph database to obtain the result set R1 that satisfies them. If the query contains no constraints on associations between data entities, step C1 is skipped and R1 is empty.
Step C2: According to the attribute constraints on data entities in the query condition, perform a matching search in the relational database to obtain the result set R2. If the query contains no attribute constraints, step C2 is skipped and R2 is empty.
Step C3: the result set R3 is set according to the set of entity IDs in R1 and R2. If both R1 and R2 are empty, then R3 is empty. If either R1 or R2 is empty, then R3 equals the set of entity IDs in one of the result sets that is not empty. If neither R1 nor R2 is empty, then R3 is set to the intersection of the entity ID sets in R1 and R2 (possibly with the intersection empty).
Step C4: If the query result requires the original text content of the data, look up the corresponding entity content in the key-value database or distributed file system using the entity IDs in R3. If R3 is empty, i.e., it contains no data entities, step C4 is skipped and the process proceeds directly to step C5.
Step C5: If R3 is not empty, return R1, R2, R3 and the contents of the data entities in R3. If R3 is empty, return R1, R2 and R3.
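The query process C1 to C5 can be sketched as follows, assuming toy stores (in-memory SQLite, a dict, a set of edge tuples) and a hypothetical query shape: one association constraint "has a `relation` edge to `target`" and one attribute constraint `age >= min_age`. As an assumption of this sketch, a skipped step is represented by `None` rather than an empty set, so that a constraint that matches nothing is not mistaken for an absent constraint in step C3.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (entity_id TEXT PRIMARY KEY, age INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("u1", 30), ("u2", 25), ("u3", 41)])
kv = {"u1": "bio 1", "u2": "bio 2", "u3": "bio 3"}
edges = {("u1", "follows", "u2"), ("u3", "follows", "u2"), ("u2", "follows", "u1")}

def query(relation=None, min_age=None, want_content=False):
    # C1: association constraint, matched in the graph (None means step skipped).
    r1 = None
    if relation is not None:
        rel, target = relation
        r1 = {s for s, r, d in edges if r == rel and d == target}
    # C2: attribute constraint, matched in the relational table.
    r2 = None
    if min_age is not None:
        sql = "SELECT entity_id FROM users WHERE age >= ?"
        r2 = {row[0] for row in db.execute(sql, (min_age,))}
    # C3: combine the entity ID sets.
    if r1 is None and r2 is None:
        r3 = set()
    elif r1 is None:
        r3 = r2
    elif r2 is None:
        r3 = r1
    else:
        r3 = r1 & r2
    # C4: fetch content only when requested and R3 is non-empty.
    contents = {eid: kv[eid] for eid in sorted(r3)} if want_content and r3 else {}
    # C5: return the result sets and any fetched content.
    return r1, r2, r3, contents

r1, r2, r3, contents = query(relation=("follows", "u2"), min_age=35, want_content=True)
```

In this example R1 is {u1, u3} (users following `u2`), R2 is {u3} (age 35 or over), and their intersection R3 is {u3}, whose content is then read from the key-value store.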
Further, the data updating process is as follows:
Step D1: Following the data query process, obtain the ID set R3 of the data entities to be updated according to the query condition.
Step D2: if the content of a certain data entity needs to be updated, the corresponding entity content is found in the key value database or the distributed file system according to the data entity ID in R3, and is replaced by new content.
Step D3: If the attributes of a data entity need to be updated, update the entity's record in the corresponding table of the relational database according to its ID in R3.
Step D4: if some incidence relations between a certain data entity and other data entities in the R3 need to be updated, a node representing the entity is located according to the entity ID, the node is used as a starting point of the query, then matching search is performed in the graph database according to the pattern of the incidence relations to be updated from the starting point, and the incidence relations to be modified are updated when found.
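Steps D1 to D4 can be sketched with the same kind of toy stores. The helper `find_ids` stands in for the full query process with an attribute constraint only, and `update` applies content, attribute and association changes to every ID in R3; both names, the `retarget` parameter and the edge pattern are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (entity_id TEXT PRIMARY KEY, age INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("u1", 30), ("u2", 25)])
kv = {"u1": "old bio", "u2": "bio 2"}
edges = {("u1", "follows", "u2")}

def find_ids(min_age):
    # D1: obtain the ID set R3 through the query process (attribute constraint only).
    sql = "SELECT entity_id FROM users WHERE age >= ?"
    return {r[0] for r in db.execute(sql, (min_age,))}

def update(r3, new_content=None, new_age=None, retarget=None):
    for eid in r3:
        if new_content is not None:
            # D2: replace the entity content in the key-value store.
            kv[eid] = new_content
        if new_age is not None:
            # D3: update the attribute record located by entity ID.
            db.execute("UPDATE users SET age = ? WHERE entity_id = ?", (new_age, eid))
        if retarget is not None:
            # D4: start from the entity's node, match the edge pattern, rewrite it.
            rel, old_dst, new_dst = retarget
            if (eid, rel, old_dst) in edges:
                edges.discard((eid, rel, old_dst))
                edges.add((eid, rel, new_dst))

r3 = find_ids(28)
update(r3, new_content="new bio", new_age=31, retarget=("follows", "u2", "u3"))
```

Only `u1` matches the condition, so `u2` keeps its content and attributes unchanged.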
Further, the query order is optimized as follows. When the relational database query does not involve complex table joins, the relational query is performed first, i.e., steps C1 and C2 are swapped: the entity IDs in the result set R2 produced by C2 are used as the starting nodes of the graph database query in C1, and path-matching queries are then run in the graph database from these starting nodes. When the relational query involves complex table joins, the graph query is performed first, i.e., the order C1 then C2 is kept: R1 is obtained through the association query in the graph database, and the complex relational query is then restricted to the entity IDs in R1, which improves query efficiency.
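The two query orders can be sketched side by side, assuming in-memory SQLite for the relational side and an adjacency dict for the graph. The "complex join" condition is represented only by a boolean flag passed to the hypothetical `plan` function; the relational query here is deliberately simple, so the sketch shows how each order narrows the other query, not the cost difference itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (entity_id TEXT PRIMARY KEY, age INTEGER)")
db.executemany("INSERT INTO users VALUES (?, ?)", [("u1", 30), ("u2", 25), ("u3", 41)])
follows = {"u1": {"u2"}, "u2": {"u1"}, "u3": {"u2"}}  # adjacency: whom each user follows

def relational_first(min_age, target):
    # C2 first: the attribute query yields R2, whose IDs seed the graph matching.
    sql = "SELECT entity_id FROM users WHERE age >= ?"
    r2 = [r[0] for r in db.execute(sql, (min_age,))]
    return {eid for eid in r2 if target in follows.get(eid, set())}

def graph_first(min_age, target):
    # C1 first: the association query yields R1, which bounds the relational query.
    r1 = sorted(eid for eid, outs in follows.items() if target in outs)
    if not r1:
        return set()
    marks = ", ".join("?" * len(r1))
    sql = f"SELECT entity_id FROM users WHERE age >= ? AND entity_id IN ({marks})"
    return {r[0] for r in db.execute(sql, (min_age, *r1))}

def plan(min_age, target, has_complex_join):
    # Graph first only when the relational part would need complex table joins.
    strategy = graph_first if has_complex_join else relational_first
    return strategy(min_age, target)
```

Both orders return the same answer; the choice only changes which query runs on the smaller input.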
The invention has the following beneficial effects:
(I) The associations between data entities in a big data set are modeled with a graph model and stored in a graph database, which supports efficient complex association queries: under the same software and hardware environment, searching for complex association data (paths of nodes and edges of length 3 or more) on the graph model takes on the order of seconds on average, whereas when the associations are expressed with the relational model or the Hashmap model instead, such queries mostly take tens or even hundreds of seconds.
(II) The attributes and original content of data entities in the big data set are appropriately associated and stored together, so queries on both the attributes and the original content of big data can be served.
(III) A unified data management module is used, so the query order can be optimized according to the characteristics of the query conditions, improving query efficiency.
Drawings
FIG. 1 is a hybrid data model proposed by the present invention;
FIG. 2 is a technical architecture of a data management system proposed by the present invention;
FIG. 3 is an example of modeling a data set for a social networking site presented by the present invention.
Detailed Description
The following is an illustrative explanation of embodiments of the key techniques and methods in this summary, but the scope of the invention is not limited by this explanation.
1) Data set
Taking the data of a social networking site as an example, the data mainly comprise user information data and microblog information data. The user information data include the user account, gender, age, hobbies, registration time, and a list of other users the user follows. The microblog information data include the microblog ID, the account of the publishing user, the ID of the forwarded microblog, the microblog content, the publishing time, the publishing place, the device used for publishing, and the accounts @-mentioned in the microblog. There are many relationships among the data in this data set: follow relationships among users, publishing relationships between users and microblogs, forwarding relationships among microblogs, and @ (mention) relationships in microblogs.
2) Data modeling
As shown in Fig. 3, the structured attribute information in the user information is modeled as a relational table UserInfo [user account | gender | age | hobby | registration time], with the user account as the primary key. The structured attribute information in the microblog information is modeled as a relational table Weibo [microblog ID | publishing time | publishing place | publishing device], with the microblog ID as the primary key. The original text of each microblog is modeled with the Hashmap model, with the microblog ID as the key. User accounts and microblog IDs are then used as nodes of the graph model, and edges between nodes describe the follow relationships among users, the publishing relationships between users and microblogs, the forwarding relationships among microblogs, and the @ relationships between microblogs and users. Finally, the UserInfo table is mapped to the user nodes of the graph model through user accounts, and the Weibo table, the microblog nodes of the graph model and the Hashmap model are mapped to one another through microblog IDs.
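The social-network model above can be sketched with hypothetical sample rows, assuming in-memory SQLite for the UserInfo and Weibo tables, a dict keyed by microblog ID for the Hashmap model, and a set of (source, relation, target) tuples for the graph. Column names are English renderings of the table fields, and the relation names are illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE UserInfo (user_account TEXT PRIMARY KEY, gender TEXT, age INTEGER,
                       hobby TEXT, registration_time TEXT);
CREATE TABLE Weibo (weibo_id TEXT PRIMARY KEY, publish_time TEXT,
                    publish_place TEXT, publish_device TEXT);
""")
db.execute("INSERT INTO UserInfo VALUES ('alice', 'F', 28, 'hiking', '2015-03-01')")
db.execute("INSERT INTO UserInfo VALUES ('bob', 'M', 33, 'chess', '2014-11-20')")
db.execute("INSERT INTO Weibo VALUES ('w1', '2016-07-01 09:00', 'Beijing', 'Android')")
db.execute("INSERT INTO Weibo VALUES ('w2', '2016-07-02 10:30', 'Shanghai', 'iPhone')")

# Hashmap model: microblog ID -> original text.
weibo_text = {"w1": "Great hike today @bob", "w2": "Agreed!"}

# Graph model: user accounts and microblog IDs as nodes, relationships as edges.
edges = {
    ("alice", "follows", "bob"),
    ("alice", "publishes", "w1"),
    ("w1", "mentions", "bob"),
    ("bob", "publishes", "w2"),
    ("w2", "forwards", "w1"),
}

def microblog_view(weibo_id):
    # The microblog ID links the Weibo table, the Hashmap model and the graph node.
    attrs = db.execute(
        "SELECT publish_time, publish_place FROM Weibo WHERE weibo_id = ?", (weibo_id,)
    ).fetchone()
    publisher = next((s for s, r, d in edges if r == "publishes" and d == weibo_id), None)
    return attrs, weibo_text.get(weibo_id), publisher
```

In this sketch the publisher comes from the graph, because the Weibo table as modeled in Fig. 3 has no publisher column.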
Furthermore, to improve query efficiency, appropriate redundancy may generally be allowed. For example, the relational table Weibo [microblog ID | publishing time | publishing place | publishing device] may be modified to Weibo [microblog ID | publishing user account | publishing time | publishing place | publishing device]. Although the publishing relationship between users and microblogs is already described in the graph model and this modification increases data redundancy, it makes queries such as "all microblogs published by user A in the last month" more efficient: relational databases handle queries without complex table joins well, no graph database query is needed, and only one database query operation is required in total. Such redundancy in modeling should be decided based on actual business requirements.
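With the redundant publisher column, the example query reduces to one relational statement. This sketch assumes in-memory SQLite, hypothetical sample rows, publish times stored as ISO-format strings (so string comparison follows time order), and a caller-supplied cutoff in place of "the last month"; the composite index is an optional auxiliary index for this frequent condition.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Weibo table with the redundant publishing-user column.
db.execute("""CREATE TABLE Weibo (weibo_id TEXT PRIMARY KEY, user_account TEXT,
              publish_time TEXT, publish_place TEXT, publish_device TEXT)""")
# Auxiliary index on the attribute combination used as the query condition.
db.execute("CREATE INDEX idx_weibo_user_time ON Weibo (user_account, publish_time)")
db.executemany("INSERT INTO Weibo VALUES (?, ?, ?, ?, ?)", [
    ("w1", "alice", "2016-06-25 08:00", "Beijing", "Android"),
    ("w2", "alice", "2016-05-01 12:00", "Beijing", "Android"),
    ("w3", "bob", "2016-06-30 18:00", "Shanghai", "iPhone"),
])

def recent_posts(user, since):
    # A single relational query; no graph lookup for the publishing relationship.
    sql = ("SELECT weibo_id FROM Weibo WHERE user_account = ? AND publish_time >= ? "
           "ORDER BY publish_time")
    return [r[0] for r in db.execute(sql, (user, since))]
```

For instance, `recent_posts("alice", "2016-06-21 00:00")` returns only `w1`, since `w2` falls before the cutoff.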
3) Storage method
First, a traditional relational database is used to build the relational tables UserInfo and Weibo of the data model, storing the attributes of the two kinds of data entities, users and microblogs. Next, a key-value database implements the Hashmap model and stores the original text of the microblogs. Then, a graph database implements the graph model and stores the follow relationships among users, the publishing relationships between users and microblogs, the forwarding relationships among microblogs, and the @ relationships in microblogs. Finally, because the microblog ID is used directly as the key of the key-value database, no auxiliary index is needed. If some common attribute-based queries need to be accelerated, additional auxiliary indexes can be built in the relational database as appropriate. The storage method is carried out by the unified data management module, i.e., the collected user information data and microblog information data are organized according to the data model and stored in the respective databases for management.

Claims (6)

1. A storage method of highly-associated big data comprises the following steps:
1) setting an entity type and a unique ID number for each data entity;
2) storing only the attributes of the data entities in two-dimensional tables of the relational model, and storing the contents of the data entities with a Hashmap model; in the Hashmap model, the contents of a plurality of data entities are merged and stored together, with the storage time of the stored data entities as the key and their contents as the value, and an index from each data entity ID to the key corresponding to the entity content is constructed; or the combination of the storage time and the IDs of the data entities stored together serves as the key, the contents of those data entities serve as the value, and an index from each data entity ID to the key corresponding to the entity content is constructed;
3) establishing the association between the attributes and content of the same data entity through the ID number of the data entity, and storing the associations between data entities with a graph data model, wherein one node in the graph data model represents one data entity, the node is labeled with the entity type and ID number of the corresponding data entity, and the associations between data entities are represented by edges between nodes.
2. The method of claim 1, wherein storing the attributes of data entities in two-dimensional tables of the relational model comprises: storing the attributes of the data entities of each entity type in one two-dimensional table with the format [data entity ID | attribute A | attribute B | ...].
3. A management system for highly-associated big data, characterized by comprising a storage module and a unified data management module, wherein:
the storage module comprises a Hashmap model for storing the contents of data entities, a relational model for storing the attributes of data entities, and a graph data model for storing the associations between data entities; each data entity is assigned an entity type and a unique ID number, and the attributes and content of the same data entity are associated through the ID number of the data entity; only the attributes of the data entities are stored in two-dimensional tables of the relational model; the graph data model is implemented with a graph database, i.e., each data entity is stored as a node in the graph database, the node is labeled with the entity type and ID number of the corresponding data entity, and the associations between data entities are represented by edges between nodes;
the unified data management module is used for adding, deleting, updating and querying the associations, attributes and data content of the data entities in the storage module;
in the Hashmap model, the contents of a plurality of data entities are merged and stored together, with the storage time of the stored data entities as the key and their contents as the value, and an index from each data entity ID to the key corresponding to the entity content is constructed; or the combination of the storage time and the IDs of the data entities stored together serves as the key, the contents of those data entities serve as the value, and an index from each data entity ID to the key corresponding to the entity content is constructed.
4. The system of claim 3, wherein the relational model is implemented with a relational database, and the attributes of the data entities of each entity type are stored in one two-dimensional table with the format [data entity ID | attribute A | attribute B | ...].
5. The system according to claim 3 or 4, wherein when a newly added data entity needs to be stored in the storage module, the unified data management module stores the content of the new data entity in the Hashmap model with the entity ID as its key; then extracts the attribute values of the new data entity and inserts them into the two-dimensional table of the relational model corresponding to its type; then extracts the associations between the new data entity and other data entities, inserts a new node in the graph data model to represent the new data entity, and sets two node attributes recording the entity ID and the entity type respectively; and then creates edges according to the associations between the new data entity and other data entities.
6. The system according to claim 3 or 4, wherein when a data entity query request is received, the unified data management module performs a matching search in the graph data model according to the association constraints between data entities in the query condition to obtain a result set R1 that satisfies them; performs a matching search in the relational model according to the attribute constraints on data entities in the query condition to obtain a result set R2; and then sets the result set R3 from the sets of entity IDs in R1 and R2: if both R1 and R2 are empty, R3 is empty; if exactly one of R1 and R2 is empty, R3 equals the set of entity IDs in the non-empty one; if neither R1 nor R2 is empty, R3 is the intersection of the sets of entity IDs in R1 and R2; if the query request requires the original text content of the data, the corresponding entity content is found in the Hashmap model according to the entity IDs in R3, and R1, R2, R3 and the contents of the data entities in R3 are returned.
CN201610579013.8A 2016-07-21 2016-07-21 Storage method and management system for highly-associated big data Active CN106227800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610579013.8A CN106227800B (en) 2016-07-21 2016-07-21 Storage method and management system for highly-associated big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610579013.8A CN106227800B (en) 2016-07-21 2016-07-21 Storage method and management system for highly-associated big data

Publications (2)

Publication Number Publication Date
CN106227800A CN106227800A (en) 2016-12-14
CN106227800B CN106227800B (en) 2020-02-21

Family

ID=57532054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610579013.8A Active CN106227800B (en) 2016-07-21 2016-07-21 Storage method and management system for highly-associated big data

Country Status (1)

Country Link
CN (1) CN106227800B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106775742A (en) * 2016-12-27 2017-05-31 中国建设银行股份有限公司 The extended method and system of a kind of user customized information
CN107391533B (en) * 2017-04-18 2020-04-07 阿里巴巴集团控股有限公司 Method and device for generating query result of graphic database
CN107491476B (en) * 2017-06-29 2021-01-12 中国科学院计算机网络信息中心 Data model conversion and query analysis method suitable for various big data management systems
CN110019399A (en) * 2017-12-19 2019-07-16 成都蜀信信用服务有限公司 Intelligence relationship circle dynamic flexible analysis system and method
CN108170847B (en) * 2018-01-18 2021-08-31 国网福建省电力有限公司 Big data storage method based on Neo4j graph database
CN108664582B (en) * 2018-05-04 2021-06-18 企查查科技有限公司 Enterprise relation query method and device, computer equipment and storage medium
CN109325076B (en) * 2018-06-07 2020-03-27 北京百度网讯科技有限公司 Data management method and device in Internet of things, computer device and readable medium
CN109241052A (en) * 2018-07-26 2019-01-18 山东大学 A kind of storage method based on associated data, device, medium and equipment
CN109325048B (en) * 2018-08-23 2021-12-07 上海海洋大学 Red tide data query method constructed based on graph model
CN110895548B (en) * 2018-08-24 2022-08-09 百度在线网络技术(北京)有限公司 Method and apparatus for processing information
CN109726203A (en) * 2018-12-20 2019-05-07 四川新网银行股份有限公司 A kind of date storage method of reconstruct image
CN110347699B (en) * 2019-06-26 2022-01-28 北京明略软件系统有限公司 Method and device for determining activity of entity related to identity card
CN110362706B (en) * 2019-07-05 2022-02-08 北京明略软件系统有限公司 Data searching method and device, storage medium and electronic device
CN110928960B (en) * 2019-10-28 2023-08-11 华中科技大学 Data storage system, method, equipment and storage medium
CN111026732B (en) * 2019-12-03 2023-11-17 深圳块织类脑智能科技有限公司 Dynamic inspection tour method and system
CN111090824B (en) * 2019-12-23 2023-09-19 百度国际科技(深圳)有限公司 Content processing method and device
CN111930958B (en) * 2020-07-13 2023-12-01 车智互联(北京)科技有限公司 Graph database construction method, computing device and readable storage medium
CN112465294B (en) * 2020-10-21 2023-08-22 郑州大学第一附属医院 Intelligent hospital medicine supply method and device based on audit result
CN112836063B (en) * 2021-01-27 2023-06-06 四川新网银行股份有限公司 Method for realizing feature tracing
CN113220659B (en) * 2021-04-08 2023-06-09 杭州费尔斯通科技有限公司 Data migration method, system, electronic device and storage medium
CN113903421B (en) * 2021-10-11 2022-04-12 上海柯林布瑞信息技术有限公司 Method and device for rapidly processing medical scientific research form data
CN114238268B (en) * 2021-11-29 2022-09-30 武汉达梦数据技术有限公司 Data storage method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102023979A (en) * 2009-09-09 2011-04-20 中国工商银行股份有限公司 Meta-data management method and system
CN102541992A (en) * 2010-11-03 2012-07-04 微软公司 Homomorphism lemma for efficiently querying databases
CN103617295A (en) * 2013-12-16 2014-03-05 北京锐安科技有限公司 Method and device for processing geographic information vector data
CN103810275A (en) * 2014-02-13 2014-05-21 清华大学 Method and device for data interaction between non-relation type database and relation type database
CN104123369A (en) * 2014-07-24 2014-10-29 中国移动通信集团广东有限公司 CMDB system based on graphic data base and implementation method
CN104572856A (en) * 2014-12-17 2015-04-29 武汉科技大学 Converged storage method of service source data

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20160055233A1 (en) * 2014-08-25 2016-02-25 Ca, Inc. Pre-join tags for entity-relationship modeling of databases

Also Published As

Publication number Publication date
CN106227800A (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN106227800B (en) Storage method and management system for highly-associated big data
CN109299102B (en) HBase secondary index system and method based on Elastcissearch
JP6617117B2 (en) Scalable analysis platform for semi-structured data
CN107122443B (en) A kind of distributed full-text search system and method based on Spark SQL
CN107402995B (en) Distributed newSQL database system and method
US9805079B2 (en) Executing constant time relational queries against structured and semi-structured data
US9665607B2 (en) Methods and apparatus for organizing data in a database
CN104298771A (en) Massive web log data query and analysis method
CN111881223B (en) Data management method, device, system and storage medium
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
CN102890678A (en) Gray-code-based distributed data layout method and query method
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN103795811A (en) Information storage and data statistical management method based on meta data storage
US20180150544A1 (en) Synchronized updates across multiple database partitions
CN108959538A (en) Text retrieval system and method
Yafooz et al. Managing unstructured data in relational databases
US10628421B2 (en) Managing a single database management system
CN103365987A (en) Clustered database system and data processing method based on shared-disk framework
Calçada et al. Evaluation of Couchbase, CouchDB and MongoDB using OSSpal.
KR101955376B1 (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
Liu et al. Finding smallest k-compact tree set for keyword queries on graphs using mapreduce
Sreemathy et al. Data validation in ETL using TALEND
WO2018218504A1 (en) Method and device for data query
Yuksel et al. An analysis of RDF storage models and query optimization techniques
Mughees Data migration from standard SQL TO NoSQL

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant