CN112181950B - Construction method of distributed object database - Google Patents

Construction method of distributed object database

Info

Publication number
CN112181950B
CN112181950B (application CN202011120635.7A)
Authority
CN
China
Prior art keywords
odb
module
data
task
index
Prior art date
Legal status
Active
Application number
CN202011120635.7A
Other languages
Chinese (zh)
Other versions
CN112181950A (en)
Inventor
王成光 (Wang Chengguang)
Current Assignee
Beijing Mirian Technology Co., Ltd.
Original Assignee
Beijing Mirian Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Mirian Technology Co., Ltd.
Priority to CN202011120635.7A
Publication of CN112181950A
Application granted granted Critical
Publication of CN112181950B

Classifications

    • G06F16/211 Schema design and management
    • G06F16/22 Indexing; data structures therefor; storage structures
    • G06F16/23 Updating
    • G06F16/2471 Distributed queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F16/289 Object oriented databases
    (All under G Physics / G06 Computing; calculating or counting / G06F Electric digital data processing / G06F16/00 Information retrieval; database structures therefor; file system structures therefor / G06F16/20 of structured data, e.g. relational data.)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed Object Database (ODB) storage system for handling complex data. Single data items (objects), database shards, and master/standby replicas all support elastic scaling; data and index are stored separately; and efficient underlying storage communication relies on the Apache top-level project Avro. The whole system comprises an ODB task client module, an ODB read-write service module, an ODB index module and an ODB underlying storage module, wherein the ODB task client module is used for submitting object query and object update tasks to the ODB read-write service; the ODB read-write service module is used for receiving query and update requests from the task client and interacting with the ODB index module and the ODB underlying storage module; the ODB index module is used for updating or quickly looking up single or batch object data indexes; and the ODB underlying storage module is used for receiving queries and storing object instances.

Description

Construction method of distributed object database
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a construction method of a distributed object database for handling complex data.
Background
An Object Database (ODB), as the name implies, is an object-oriented database: a database that represents data in the form of objects and classes. In object-oriented terminology, an object is a real-world entity and a class is a collection of objects. An object-oriented database follows the basic principles of object-oriented programming; in brief, object-oriented database = object-oriented programming + database. Its characteristics, shown in Fig. 1, satisfy the inheritance, encapsulation and polymorphism of object-oriented programming as well as the read-write, integrity and concurrency of a basic database. An ODB queries and stores each updated data value as a complete object instance.
At present, few object databases are studied domestically; most databases are various non-relational databases (NoSQL, "Not Only SQL") based on key-value storage. Common NoSQL has four representative types:
1) The document-format key-value storage represented by MongoDB offers excellent query and storage performance, and multiple index modes support queries, so it is very convenient to use; however, a single data item may not exceed 16 MB, which falls badly short for some relatively complex businesses.
2) Column-oriented storage represented by HBase has obvious advantages at the storage layer, but its query support is not friendly enough;
3) Graph databases represented by Neo4j store relationship networks, but current graph databases are not ideal in distribution, single data items are generally simple, and query efficiency drops sharply when the data volume is large.
4) NewSQL represented by TiDB provides the same scalability as NoSQL while retaining mature SQL as the query language based on a relational model and guaranteeing ACID transaction characteristics. In brief, NewSQL adds the powerful extensibility of NoSQL to the traditional relational database. A single data item is typically simple.
In contrast, for current businesses that require fast batch access to single complex data items, the above databases all fall significantly short. For example, to facilitate updating basic user-portrait tags, data such as related behaviors, access counts and timestamps within a specified time period are often aggregated per user and treated as a whole. The specified period is usually a day, a week or a month, so each user may have tens, hundreds or even thousands of scattered behavior records in that period; aggregating them into one behavior-dense complex record makes algorithm training much more convenient. But storage then becomes harder: the single data volume easily exceeds MongoDB's 16 MB per-document limit, and if forcibly split into finer granularity the data scattering is not easily controlled; moreover, MongoDB's document format fuses data with structure, leading to storage redundancy and wasted space. With HBase, query support is itself less friendly, especially for multi-dimensional queries.
There is currently one open-source object-oriented database abroad, Db4o, which has the following disadvantages:
1) It is mainly a single-machine embedded application, which means its scalability is not good enough and it cannot support mass data storage;
2) At the usage level, object storage adds no optimized serialization support, which means object storage and deserialization suffer from low efficiency and data redundancy;
3) Its last update was released on 2019-09-29, and its official maintainers have essentially stopped update maintenance over the last four years.
Disclosure of Invention
To overcome these defects, the invention provides a distributed object database storage system for handling complex data and a construction method thereof, solving the problems noted in the background.
The technical scheme of the invention is a distributed object database storage system for handling complex data, in which single data items, i.e. objects (seamlessly integrated with development), database shards and master/standby replicas all support elastic scaling; data and index are separated; and efficient underlying storage communication relies on the Apache top-level project Avro. The whole system comprises an ODB task client module, an ODB read-write service module, an ODB index module and an ODB underlying storage module, wherein:
the ODB task client module is used for submitting object query and object update tasks to the ODB read-write service;
the ODB read-write service module is used for receiving query and update requests from the task client and interacting with the ODB index module and the ODB underlying storage module;
the ODB index module is used for updating or quickly looking up single or batch object data indexes;
the ODB underlying storage module is used for receiving queries and storing object instances;
further, the ODB task client module includes a single-object update sub-module, a batch-object update sub-module, a single-object query sub-module, a batch-object query sub-module, a read-write-service monitoring sub-module, a batch adjustment sub-module and an update-task directional distribution sub-module. At startup it watches the nodes registered by the ODB read-write service through Zookeeper and checks their service heartbeats periodically; only healthy nodes keep working. Single or batch query tasks are submitted to the corresponding interface of the ODB read-write service module according to the primary-key ID and time-range parameters. For data updates, when the ODB task client module submits a new task to the ODB read-write service module it checks the number of pending tasks on the currently available nodes; if no node has fewer than two pending tasks, it sleeps for 5 seconds and issues the next request, repeating until an idle node is available and the task is submitted.
Further, the ODB read-write service module communicates with the ODB task client module based on Thrift; it receives query and update requests from the ODB task client module and interacts with the ODB index module and the ODB underlying storage module. At startup it watches the ODB underlying storage node services through Zookeeper while registering its own service, so that the ODB task client module can monitor its status.
Further, the ODB read-write service module comprises a first object query sub-module and a first object update sub-module, wherein:
the first object query sub-module, according to the received request parameters (primary-key ID or timestamp range), first searches the ODB index for sub-objects meeting the filter condition, decides whether to use Fork/Join multithreaded concurrent calls according to the query scale, then calls the Avro-based RPC query service to fetch objects from each ODB underlying storage node in batches, merges the sub-objects from each storage node by primary-key ID, and returns them to the upstream caller;
the first object update sub-module, for a received add-object request, first caches the data in a local queue while an asynchronous thread task runs in a continuous loop: if the current node still has an unfinished task and is busy, it sleeps for 5 seconds and skips to the next cycle; if the current service node is idle with no executing compute task, it marks the node busy, immediately takes data from the local queue, completes the corresponding business logic, builds serialized objects designed with Avro, hashes them by primary-key ID into the queue of the designated ODB underlying storage node, updates the object index in MongoDB, calls Avro RPC to submit the result to the ODB underlying storage in batches, and finally sets the node's compute state back to idle.
Further, the ODB index module is configured to shard by time window; when the underlying storage nodes run normally and stably, an object resides on exactly one fixed storage node, and only if that node fails does the object move to another storage node.
Further, the ODB underlying storage module adopts a "year-week" time-period directory structure; each object may be scattered under multiple directories according to its activity period, each directory being an independent table realized as a file directory. At startup each node registers its own service node through Zookeeper, making it easy for upstream components to monitor storage-node availability.
Further, the ODB underlying storage module includes a second object update sub-module, a second object query sub-module, and a RocksDB database-handle snapshot maintenance sub-module.
Further, the second object update sub-module places newly added data into a local queue immediately; an asynchronous thread task loops over the objects to be added in the local queue and updates them to the local disk in batches. For a data-fusion task, before updating it queries, by each object's ID, whether an old value exists in the current time slice; if so, the old value is merged into the new object's attribute values. For a replacement task, the objects are added in batches directly and the new data automatically overwrites the old; large update tasks are split with the JDK's own Fork/Join framework. The second object query sub-module, according to the parameters carried by the upstream service's batch-query API (primary-key ID and time-slice range), fetches data from the designated directory and deserializes it into objects; it also checks whether corresponding data exists in the node cache and, if so, merges it in.
Further, the RocksDB database-handle snapshot maintenance sub-module maintains handles in two modes, read and write: a write handle is opened once per specified time period, while a read handle is a snapshot that must be reopened to obtain the latest data whenever new data has been written.
The technical scheme of the invention also includes a construction method of the distributed object database storage system for handling complex data, in which a user's read and write requests are completed through the ODB task client module; the client forwards the user's request to the ODB read-write service module, which completes the logic processing according to the request parameters: for a query request, it searches the ODB index module for the matching object index, calls the service API of the ODB underlying storage module to fetch the data, merges sub-objects with the same object ID into a unique object in the ODB read-write service module, and finally returns it to the ODB task client module;
for a write request, it hashes to the designated ODB storage node by object ID, asynchronously writes to the ODB underlying storage in small batches, and updates the mapping relation in the index library of the ODB index module;
for a modification request, it first finds the corresponding storage node from the ODB index module, overwrites the object information on the corresponding underlying storage, and updates the object index change timestamp;
for a deletion request, it first finds the corresponding storage node from the ODB index module and deletes the object information from the corresponding underlying storage and index.
The distributed object database storage system for handling complex data has the following advantages:
1) Horizontal shard elastic scaling is supported: a single data item has only one index, whose value is composed of multiple independent object entities hashed by primary-key ID across the ODB underlying storage nodes. The open-source Db4o has nothing comparable.
2) Data-index separation: the storage and index parts of the data are stored separately; the index is currently based on MongoDB, while the underlying storage relies on RocksDB byte streams to accommodate the flexible, non-fixed size of business data.
3) Unlike MongoDB's 16 MB per-document ceiling, a single data item can expand without limit: externally it is regarded as one whole, while internally it is automatically segmented into individual objects by time slice or fixed data size.
4) Efficient storage, with an efficient serialization tool transforming the objects to be stored. This relies mainly on Avro, the standardized serialization framework underlying Hadoop. At the same data size, YdOdb saves about 1/3 of the storage space compared with MongoDB; Db4o, which adopts no optimized serialization technology and stores default plain JavaBean objects, cannot compare.
5) Efficient communication: the underlying communication transport relies on efficient RPC, namely Avro and Thrift, and every node in the read-write path fully exploits the CPU's multi-core advantage with asynchronous, parallel batch processing.
6) Convenient to use: existing NoSQL stores only data, which a specific business must still process into the form it requires, whereas the data obtained from a YdOdb query is already the data object the business needs, directly usable and closer to the business.
7) Higher integrated compute-and-storage efficiency. Taking the computation of user-portrait tag weights as an example, YdOdb with only 4 compute nodes updates one day's active-user portrait weights in 46.68 minutes and 3 days' worth in 109 minutes, with each compute node consuming only 2-3 GB of space. A Spark cluster with 14 compute nodes takes 50 minutes to compute one day's active-user portrait weights and consumes up to hundreds of GB of memory; computing 3 consecutive days would exceed the current resource capacity and require adding nodes.
Drawings
FIG. 1 is a diagram illustrating the features of a prior art object database storage system;
FIG. 2 is a diagram of a distributed object database storage system architecture for complex data in accordance with the present invention;
FIG. 3 is a functional logic architecture of a distributed object database storage system for complex data in accordance with the present invention.
Detailed Description
The invention relates to a distributed object database storage system for handling complex data and a construction method thereof. The database storage system may be abbreviated YdOdb; its overall system architecture and logical function architecture are shown in Figs. 2 and 3 respectively. The whole system divides, from top to bottom, into 4 parts: the ODB task client, the ODB read-write service, the ODB index (MongoDB) and the ODB underlying storage (RocksDB), of which the ODB read-write service and the ODB underlying storage service are the core. Viewing the database storage system as a whole, every user read or write request is completed through the ODB task client; the client forwards the user's request to the ODB read-write service module, which completes the logic processing according to the user's read-write request parameters:
for a query request, it first searches the ODB index (MongoDB) for the matching object index, then calls the ODB underlying storage service API to fetch the data, merges sub-objects with the same object ID into a unique object in the ODB read-write service module, and finally returns it to the client;
for a write request, it hashes to the designated ODB storage node by object ID, asynchronously writes to the ODB underlying storage (RocksDB) in small batches, and updates the mapping relation in the ODB index library;
for a modification request, it first finds the corresponding storage node from the ODB index, overwrites the object information on the corresponding underlying storage (RocksDB), and updates the object index change timestamp;
for a deletion request, it first finds the corresponding storage node from the ODB index and deletes the object information from the corresponding underlying storage (RocksDB) and the index (MongoDB).
This embodiment describes in detail the basic construction method and functional modules of each part of the distributed object database storage system for handling complex data:
(1) The ODB task client is used to submit object query and object update tasks to the ODB read-write service; add, delete and overwrite operations are generally completed by the specific actual business. Its own functions are as shown in Fig. 3: single/batch object add-delete-update-query interfaces, client-side monitoring of the read-write service, and so on. At startup it watches the nodes registered by the ODB read-write service through Zookeeper and checks their service heartbeats periodically; only healthy nodes keep working. Single or batch query tasks are generally submitted through Thrift RPC to the corresponding ODB read-write service interface according to the primary-key ID and time-range parameters. For data updates, to keep the whole service cluster running stably, when the client submits a new task to the ODB read-write service it checks, through Thrift RPC, the number of pending tasks on the currently available nodes; if no node has fewer than two pending tasks, it sleeps for 5 seconds and issues the next request, repeating until an idle node is available and the task is submitted.
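By way of illustration, the following minimal Java sketch shows this submission discipline (all class and method names here are invented for illustration; the patent text does not define a client API):

    // Hypothetical client-side submission loop; OdbRwNode stands in for the Thrift RPC stub.
    public final class OdbTaskClientSketch {
        private final java.util.List<OdbRwNode> healthyNodes; // discovered via Zookeeper watches

        public OdbTaskClientSketch(java.util.List<OdbRwNode> healthyNodes) {
            this.healthyNodes = healthyNodes;
        }

        public void submitUpdate(byte[] taskPayload) throws InterruptedException {
            while (true) {
                for (OdbRwNode node : healthyNodes) {
                    if (node.pendingTaskCount() < 2) { // node is idle enough to accept work
                        node.submit(taskPayload);
                        return;
                    }
                }
                Thread.sleep(5_000); // every node is busy: wait 5 seconds, then retry
            }
        }
    }

    interface OdbRwNode { // stand-in for the Thrift-generated service interface
        int pendingTaskCount();
        void submit(byte[] payload);
    }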
(2) The ODB read-write service communicates with the ODB task client based on Thrift; it receives the task client's query and update requests, implements the corresponding business logic, and interacts with the ODB index and the ODB underlying storage. Its specific functions are as introduced in Fig. 3: at startup it watches the ODB underlying storage node services through Zookeeper while registering its own service, so that the ODB task client can monitor its status.
Object query: according to the received request parameters (primary-key ID or timestamp range), it first searches the ODB index for the sub-objects meeting the filter condition, decides whether to use Fork/Join multithreaded concurrent calls according to the query scale, then calls the Avro-based RPC query service to fetch the objects from each ODB underlying storage node in batches, merges the sub-objects from each storage node by primary-key ID, and returns them to the upstream caller.
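A minimal Fork/Join sketch of this read path follows, assuming an RPC stub OdbStoreStub and a SubObject record (both invented names): a batch of primary-key IDs is split recursively, each chunk is queried concurrently, and the returned sub-objects are merged by ID.

    import java.util.*;
    import java.util.concurrent.*;

    class SubObject { String primaryKeyId; /* time slice, payload, ... */ }

    interface OdbStoreStub { List<SubObject> batchGet(List<String> ids); } // Avro RPC stand-in

    class BatchQueryTask extends RecursiveTask<Map<String, List<SubObject>>> {
        private static final int THRESHOLD = 64;  // below this size, query directly
        private final List<String> ids;
        private final OdbStoreStub store;

        BatchQueryTask(List<String> ids, OdbStoreStub store) { this.ids = ids; this.store = store; }

        @Override
        protected Map<String, List<SubObject>> compute() {
            if (ids.size() <= THRESHOLD) {
                Map<String, List<SubObject>> merged = new HashMap<>();
                for (SubObject so : store.batchGet(ids))   // one batched RPC per chunk
                    merged.computeIfAbsent(so.primaryKeyId, k -> new ArrayList<>()).add(so);
                return merged;
            }
            int mid = ids.size() / 2;                      // split the ID batch in half
            BatchQueryTask left = new BatchQueryTask(ids.subList(0, mid), store);
            BatchQueryTask right = new BatchQueryTask(ids.subList(mid, ids.size()), store);
            left.fork();
            Map<String, List<SubObject>> result = right.compute();
            left.join().forEach((k, v) ->
                result.merge(k, v, (a, b) -> { a.addAll(b); return a; }));
            return result;
        }
    }
    // usage: new ForkJoinPool().invoke(new BatchQueryTask(allIds, stub));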
Object update: for an accepted add-object request, whether batched or single, the data is first cached in the local queue. Meanwhile an asynchronous thread task loops continuously: if the current node still has an unfinished task and is busy, it sleeps for 5 seconds and skips to the next cycle; if the current service node is idle with no executing compute task, it marks the node busy, immediately takes data from the local queue, completes the corresponding business logic, builds serialized objects designed with Avro, hashes them by primary-key ID into the queue of the designated ODB underlying storage node, updates the object index in MongoDB (making multi-dimensional fast queries convenient), calls Avro RPC to submit the result to the ODB underlying storage in batches, and finally sets the node's compute state back to idle.
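The busy/idle cycle described above can be sketched as follows (a simplification under assumed names; in the real service the busy flag would also be toggled by the executing compute task, and dispatch would perform the hashing, index update and Avro RPC):

    import java.util.*;
    import java.util.concurrent.*;
    import java.util.concurrent.atomic.AtomicBoolean;

    public final class AsyncUpdateWorker implements Runnable {
        private final BlockingQueue<byte[]> localQueue = new LinkedBlockingQueue<>();
        private final AtomicBoolean busy = new AtomicBoolean(false); // node compute state

        public void enqueue(byte[] serializedObject) { localQueue.add(serializedObject); }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    if (busy.get()) {           // an earlier compute task is still executing
                        Thread.sleep(5_000);    // sleep 5 seconds, then try the next cycle
                        continue;
                    }
                    busy.set(true);             // mark the node busy while this batch runs
                    try {
                        List<byte[]> batch = new ArrayList<>();
                        localQueue.drainTo(batch);
                        if (!batch.isEmpty()) dispatch(batch);
                    } finally {
                        busy.set(false);        // execution finished: node is idle again
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }

        // hash each object by primary-key ID to its storage-node queue, update the
        // MongoDB index, then submit in batches over Avro RPC (all omitted here)
        private void dispatch(List<byte[]> batch) { }
    }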
(3) The ODB index is used to update or quickly look up single or batch object data indexes and is implemented with MongoDB. Structurally, when the underlying storage nodes run normally and stably, an object has exactly one fixed storage node; only if that node fails does the object move to another storage node.
MongoDB is used to store the object indexes partly because Mongo is convenient for multi-dimensional queries, and partly because its 16 MB per-document maximum is ample for simple index data.
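The patent does not publish its index schema, but an entry presumably maps a primary-key ID and time slice to the storage node holding the sub-object. A hedged sketch with the MongoDB Java driver (all field and collection names invented):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.ReplaceOptions;
    import org.bson.Document;

    public class OdbIndexDao {
        private final MongoCollection<Document> idx;

        public OdbIndexDao(MongoClient client) {
            idx = client.getDatabase("odb").getCollection("object_index");
        }

        // upsert keeps exactly one index entry per (object, time slice) pair
        public void upsert(String objectId, String timeSlice, String storageNode) {
            Document doc = new Document("objectId", objectId)
                    .append("timeSlice", timeSlice)   // e.g. "202034" = week 34 of 2020
                    .append("node", storageNode)      // fixed node unless failover occurs
                    .append("updatedAt", System.currentTimeMillis());
            idx.replaceOne(
                    Filters.and(Filters.eq("objectId", objectId),
                                Filters.eq("timeSlice", timeSlice)),
                    doc, new ReplaceOptions().upsert(true));
        }
    }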
(4) The ODB underlying storage receives queries and stores object instances. To make single data items easy to expand laterally, it adopts a "year-week" time-period directory structure, where a name like "202034" denotes week 34 of 2020; each object may be scattered under multiple directories according to its activity period, each directory is equivalent to an independent table, and below it sits a RocksDB file directory. The basic functions of the ODB underlying storage are shown in Fig. 3; at startup each node registers its own service node through Zookeeper, making it easy for upstream components to monitor storage-node availability.
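For illustration, such a "year-week" directory name can be derived as follows (assuming ISO week numbering, which the text does not actually specify):

    import java.time.LocalDate;
    import java.time.temporal.WeekFields;

    public class TimeSliceDir {
        // maps a date to its "yearweek" directory name, e.g. 2020-08-20 -> "202034"
        public static String weekDir(LocalDate date) {
            WeekFields wf = WeekFields.ISO;
            int year = date.get(wf.weekBasedYear());
            int week = date.get(wf.weekOfWeekBasedYear());
            return String.format("%d%02d", year, week);
        }
    }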
Object update: addition and modification are unified, single/batch data is updated asynchronously, and only a basic add interface is exposed externally; the asynchronous update task fuses or replaces the latest data, so the upstream caller need not care how updating happens. The specific flow is as follows: newly added data is placed into a local queue immediately, and an asynchronous thread task loops over the objects to be added in the local queue, updating them to the local disk in batches. For a data-fusion task, before updating it queries, by each object's ID, whether an old value exists in the current time slice; if so, the old value is merged into the new object's attribute values, since otherwise the old data would be overwritten and lost. For a replacement task, the objects are added in batches directly and the new data automatically overwrites the old. Because data updates are generally batch updates, large update tasks are split with the JDK's own Fork/Join framework, making full use of the server's multi-core CPU.
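The fuse-or-replace decision can be sketched as below (ObjectRecord, lookup and put are illustrative stand-ins for the RocksDB-backed storage calls, not names from the patent):

    import java.util.Optional;

    enum UpdateMode { FUSION, REPLACE }

    class StorageNodeWriter {
        void upsert(String id, String timeSlice, ObjectRecord incoming, UpdateMode mode) {
            if (mode == UpdateMode.FUSION) {
                // query the old value in the current time slice before writing
                lookup(id, timeSlice).ifPresent(incoming::mergeFrom);
            }
            put(id, timeSlice, incoming); // the new value overwrites the old
        }

        Optional<ObjectRecord> lookup(String id, String slice) { return Optional.empty(); }
        void put(String id, String slice, ObjectRecord rec) { /* RocksDB batch write */ }
    }

    class ObjectRecord {
        void mergeFrom(ObjectRecord old) { /* fold old attribute values into this object */ }
    }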
Object query: according to the parameters carried by the upstream service's batch-query API (primary-key ID and time-slice range), data is fetched from the designated directory and deserialized into objects. The node cache must also be checked for corresponding data: before the data objects in the cache queue have been flushed to disk, data obtained from the latest RocksDB snapshot handle does not include the pending writes in the cache, so any cached data must be merged in.
RocksDB database-handle snapshot maintenance: RocksDB has two access modes, read and write. Writes can always be appended, and a write handle is generally opened once per specified time period. A read handle, because each opening is a snapshot of the database at that moment, must be reopened once new data has been written in order to obtain the latest data. The storage node therefore maintains internal state recording whether new data has been added and hence whether the database handle must be reopened.
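With RocksJava this rule can be sketched directly: one handle serves writes, a snapshot serves reads, and the snapshot is retaken whenever a dirty flag shows new data (a minimal sketch under these assumptions, not the patent's code):

    import org.rocksdb.*;

    public class SnapshotReader {
        static { RocksDB.loadLibrary(); }

        private final RocksDB db;
        private Snapshot snap;
        private volatile boolean dirty = true; // set whenever new data is written

        public SnapshotReader(String path) throws RocksDBException {
            db = RocksDB.open(new Options().setCreateIfMissing(true), path);
        }

        public void put(byte[] key, byte[] value) throws RocksDBException {
            db.put(key, value);
            dirty = true;                      // readers must refresh their snapshot
        }

        public synchronized byte[] get(byte[] key) throws RocksDBException {
            if (dirty) {                       // retake the snapshot to see new writes
                if (snap != null) db.releaseSnapshot(snap);
                snap = db.getSnapshot();
                dirty = false;
            }
            try (ReadOptions ro = new ReadOptions().setSnapshot(snap)) {
                return db.get(ro, key);
            }
        }
    }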
RocksDB optimization configuration: the configuration items are very numerous and strongly affect CPU efficiency and performance; the key is to fully exploit the modern CPU's many cores and large memory. Key configuration items, such as the size and maximum number of individual memtables, the minimum number of memtables to merge before flushing to disk, the data block size (the 4 KB default should be enlarged, but not excessively), and the number of concurrent compaction and flush threads, must be tuned to the actual business and server configuration to obtain better performance.
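The knobs named above map onto RocksJava options roughly as follows; the concrete numbers are illustrative starting points, not values given in the patent:

    import org.rocksdb.BlockBasedTableConfig;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;

    public class RocksTuning {
        static { RocksDB.loadLibrary(); }

        public static Options tunedOptions(int cpuCores) {
            BlockBasedTableConfig table = new BlockBasedTableConfig()
                    .setBlockSize(16 * 1024);          // enlarge the 4 KB default, but not excessively
            return new Options()
                    .setCreateIfMissing(true)
                    .setWriteBufferSize(64L << 20)     // size of a single memtable (64 MB)
                    .setMaxWriteBufferNumber(4)        // maximum memtables held in memory
                    .setMinWriteBufferNumberToMerge(2) // memtables merged before flushing to disk
                    .setIncreaseParallelism(cpuCores)  // background flush/compaction thread pool
                    .setMaxBackgroundJobs(cpuCores)    // concurrent compaction + flush jobs
                    .setTableFormatConfig(table);
        }
    }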
The distributed object database storage system for handling complex data and its construction method according to the invention have the following basic characteristics:
(1) Elastic scaling of service nodes: horizontal and vertical dynamic scaling of service nodes by means of Zookeeper, i.e. data shards and data copy replicas;
(2) Data-index separation: data storage and index are stored independently;
(3) Elastic expansion of single data items: a single data item expands vertically by means of the index, unlike MongoDB with its 16 MB per-document limit;
(4) Efficient underlying communication: RPC (Remote Procedure Call) by means of Thrift and Avro;
(5) Efficient storage: complex data objects are serialized with Avro, the standard serialization framework underlying Hadoop, giving high parsing efficiency and saving storage space, about 1/3 less than MongoDB at equivalent data scale (a minimal serialization sketch follows this list);
(6) Efficient read-write: batch query and add-modify-unified asynchronous update interfaces are provided; there is no special update interface, only a basic add interface, and the internal asynchronous update task fuses the latest data or replaces existing data, so the upstream caller need not care how updating happens.
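As noted in characteristic (5), a minimal Avro round trip is sketched below, showing the compact binary form in which complex objects can be stored; the schema is invented for illustration, since the patent does not publish its Avro schemas:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class AvroRoundTrip {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"UserSlice\",\"fields\":[" +
                "{\"name\":\"id\",\"type\":\"string\"}," +
                "{\"name\":\"visits\",\"type\":\"int\"}]}");

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("id", "u-001");
            rec.put("visits", 42);

            // serialize: binary encoding carries no per-record schema, hence the compactness
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(rec, enc);
            enc.flush();
            byte[] bytes = out.toByteArray();

            // deserialize with the same schema
            BinaryDecoder dec = DecoderFactory.get().binaryDecoder(bytes, null);
            GenericRecord back = new GenericDatumReader<GenericRecord>(schema).read(null, dec);
            System.out.println(back.get("id") + " visits=" + back.get("visits"));
        }
    }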
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (7)

1. A distributed object database storage system for handling complex data, characterized by comprising an ODB task client module, an ODB read-write service module, an ODB index module and an ODB underlying storage module, wherein:
the ODB task client module is used for submitting object query and object update tasks to the ODB read-write service;
the ODB read-write service module is used for receiving query and update requests from the task client and interacting with the ODB index module and the ODB underlying storage module;
the ODB index module is used for updating or quickly looking up single or batch object data indexes;
the ODB underlying storage module is used for receiving queries and storing object instances;
wherein the ODB read-write service module comprises a first object query sub-module and a first object update sub-module, wherein:
the first object query sub-module, according to the received request parameters (primary-key ID or timestamp range), first searches the ODB index for sub-objects meeting the filter condition, decides whether to use Fork/Join multithreaded concurrent calls according to the query scale, then calls the Avro-based RPC query service to fetch objects from each ODB underlying storage node in batches, merges the sub-objects from each storage node by primary-key ID, and returns them to the upstream caller;
the first object update sub-module, for a received add-object request, first caches the data in a local queue while an asynchronous thread task runs in a continuous loop: if the current node still has an unfinished task and is busy, it sleeps for 5 seconds and skips to the next cycle; if the current service node is idle with no executing compute task, it marks the node busy, immediately takes data from the local queue, completes the corresponding business logic, builds serialized objects designed with Avro, hashes them by primary-key ID into the queue of the designated ODB underlying storage node, updates the object index in MongoDB, calls Avro RPC to submit the result to the ODB underlying storage in batches, and finally sets the node's compute state back to idle;
the ODB index module is structured such that, when the underlying storage nodes run normally and stably, an object has exactly one fixed storage node, and only if that node fails does the object move to another storage node; and the ODB underlying storage module adopts a 'year-week' time-period directory structure, each object may be scattered under multiple directories according to its activity period, each directory being an independent table realized as a file directory, and at startup each node registers its own service node through Zookeeper, making it easy for upstream components to monitor storage-node availability.
2. The distributed object database storage system for handling complex data according to claim 1, wherein the ODB task client module includes a single-object update sub-module, a batch-object update sub-module, a single-object query sub-module, a batch-object query sub-module, a read-write-service monitoring sub-module, a batch adjustment sub-module and an update-task directional distribution sub-module; at startup it watches the nodes registered by the ODB read-write service through Zookeeper and checks their service heartbeats periodically, only healthy nodes remaining active; single or batch query tasks are submitted to the corresponding interface of the ODB read-write service module according to the primary-key ID and time-range parameters; and for data updates, when the ODB task client module submits a new task to the ODB read-write service module it checks the number of pending tasks on the currently available nodes, and if no node has fewer than two pending tasks it sleeps for 5 seconds and issues the next request, repeating until an idle node is available and the task is submitted.
3. The distributed object database storage system for handling complex data according to claim 1, wherein the ODB read-write service module communicates with the ODB task client module based on Thrift, receives the ODB task client module's query and update requests, and interacts with the ODB index module and the ODB underlying storage module; at startup it watches the ODB underlying storage node services through Zookeeper while registering its own service, so that the ODB task client module can monitor its status.
4. The distributed object database storage system for handling complex data according to claim 1, wherein the ODB underlying storage module comprises a second object update sub-module, a second object query sub-module, and a RocksDB database-handle snapshot maintenance sub-module.
5. The system of claim 4, wherein the second object update sub-module places newly added data into a local queue immediately, and an asynchronous thread task loops over the objects to be added in the local queue, updating them to the local disk in batches; for a data-fusion task, before updating it queries, by each object's ID, whether an old value exists in the current time slice and, if so, merges it into the new object's attribute values; for a replacement task, the objects are added in batches directly and the new data automatically overwrites the old, large update tasks being split with the JDK's own Fork/Join framework; and the second object query sub-module, according to the parameters carried by the upstream service's batch-query API (primary-key ID and time-slice range), fetches data from the designated directory and deserializes it into objects, also checking whether corresponding data exists in the node cache and merging it in if so.
6. The distributed object database storage system for handling complex data according to claim 4, wherein the RocksDB database-handle snapshot maintenance sub-module maintains RocksDB handles in two modes, read and write: a write handle is opened once per specified time period, while a read handle is a snapshot that must be reopened to obtain the latest data whenever new data has been written.
7. A method for constructing the distributed object database storage system for handling complex data according to any one of claims 1 to 6, wherein a user's read and write requests are completed through the ODB task client module, the client forwards the user's request to the ODB read-write service module, and the read-write service module completes the logic processing according to the request parameters: for a query request, it searches the ODB index module for the matching object index, calls the service API of the ODB underlying storage module to fetch the data, merges sub-objects with the same object ID into a unique object in the ODB read-write service module, and finally returns it to the ODB task client module;
for a write request, it hashes to the designated ODB storage node by object ID, asynchronously writes to the ODB underlying storage in small batches, and updates the mapping relation in the index library of the ODB index module;
for a modification request, it first finds the corresponding storage node from the ODB index module, overwrites the object information on the corresponding underlying storage, and updates the object index change timestamp;
for a deletion request, it first finds the corresponding storage node from the ODB index module and deletes the object information from the corresponding underlying storage and index.
CN202011120635.7A 2020-10-19 2020-10-19 Construction method of distributed object database Active CN112181950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011120635.7A CN112181950B (en) 2020-10-19 2020-10-19 Construction method of distributed object database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011120635.7A CN112181950B (en) 2020-10-19 2020-10-19 Construction method of distributed object database

Publications (2)

Publication Number Publication Date
CN112181950A (en) 2021-01-05
CN112181950B (en) 2024-03-26

Family

ID=73921955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011120635.7A Active CN112181950B (en) 2020-10-19 2020-10-19 Construction method of distributed object database

Country Status (1)

Country Link
CN (1) CN112181950B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112817992B (en) * 2021-01-29 2023-06-23 北京百度网讯科技有限公司 Method, apparatus, electronic device and readable storage medium for executing change task
CN113238924B (en) * 2021-04-09 2023-09-15 杭州欧若数网科技有限公司 Chaotic engineering realization method and system in distributed graph database system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101473324A (en) * 2006-04-21 2009-07-01 惠普开发有限公司 Method and system for finding data objects within large data-object libraries
CN102486784A (en) * 2010-12-06 2012-06-06 耶宝智慧(北京)技术发展有限公司 Information requesting method and information providing method
CN102624911A (en) * 2012-03-14 2012-08-01 中山大学 Cluster-based visible media storage system
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
WO2014051897A1 (en) * 2012-09-27 2014-04-03 Ge Intelligent Platforms, Inc. System and method for enhanced process data storage and retrieval
CN103905537A (en) * 2014-03-20 2014-07-02 冶金自动化研究设计院 System for managing industry real-time data storage in distributed environment
CN105279286A (en) * 2015-11-27 2016-01-27 陕西艾特信息化工程咨询有限责任公司 Interactive large data analysis query processing method
WO2016023471A1 (en) * 2014-08-11 2016-02-18 张锐 Methods for processing handwritten inputted characters, splitting and merging data and encoding and decoding processing
CN106294402A (en) * 2015-05-21 2017-01-04 阿里巴巴集团控股有限公司 The data search method of a kind of heterogeneous data source and device thereof
CN108011919A (en) * 2017-10-23 2018-05-08 济南浪潮高新科技投资发展有限公司 A kind of objective platform of Internet of Things wound based on cloud Internet of Things technology
CN111565211A (en) * 2020-01-14 2020-08-21 西安奥卡云数据科技有限公司 CDN configuration distribution network system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043294B2 (en) * 2011-03-21 2015-05-26 International Business Machines Corporation Managing overflow access records in a database

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Object to NoSQL Database Mappers (ONDM): A systematic survey and comparison of frameworks; Vincent Reniers et al.; Information Systems; 2019-11-30; Vol. 85; pp. 1-20 *
Research and Implementation of a Distributed Database Based on LevelDB; Zhao Jiang; China Master's Theses Full-text Database, Information Science and Technology; 2020-01-15 (No. 01); I138-1006 *
Distributed Storage and Spatio-temporal Query of Massive Vector Data; Xie Chong; China Master's Theses Full-text Database, Basic Sciences; 2020-06-15 (No. 06); A008-47 *

Also Published As

Publication number Publication date
CN112181950A (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant