WO2014015488A1 - Method and apparatus for data storage and data query - Google Patents

Method and apparatus for data storage and data query

Info

Publication number
WO2014015488A1
WO2014015488A1 PCT/CN2012/079155 CN2012079155W WO2014015488A1 WO 2014015488 A1 WO2014015488 A1 WO 2014015488A1 CN 2012079155 W CN2012079155 W CN 2012079155W WO 2014015488 A1 WO2014015488 A1 WO 2014015488A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
cloud storage
saved
query
rules
Prior art date
Application number
PCT/CN2012/079155
Other languages
English (en)
Chinese (zh)
Inventor
韩建中
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2012/079155
Priority to CN201280000916.6A (CN102906751B)
Publication of WO2014015488A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471: Distributed queries

Definitions

  • The present invention relates to the field of communication network technologies, and in particular to a method and apparatus for data storage and data query.
  • Background
  • Cloud computing is the product of the development of distributed processing, parallel processing, and grid computing.
  • Cloud storage is an extension and development of cloud computing. It refers to a system in which a large number of storage devices in the network work together through cluster applications, grid technologies, distributed file systems, distributed databases, and the like, to jointly provide data storage and service access functions to the outside.
  • Relational databases store data in rows and columns. Take a call detail record (CDR) table in an Oracle database as an example. Generally, each CDR exists as a row in the database table, and each row contains the number, the number of the other party, the duration of the call, and the like.
  • The data is stored in the form of data blocks (Oracle data blocks).
  • The data block is the smallest storage unit of Oracle and occupies a certain amount of disk space (such as a 16 KB block). That is, every Oracle I/O (input/output) operation works in units of blocks. For example, although a record may have only 100 bytes, at least one data block is read when it is queried; if the record spans two data blocks, two blocks must be read.
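  • The block-granularity arithmetic above can be illustrated with a minimal sketch. The function name `blocks_read` and the byte offsets are hypothetical; only the 16 KB block size and the 100-byte record come from the text, and the sketch is not Oracle's actual I/O logic.

```python
def blocks_read(offset: int, length: int, block_size: int = 16 * 1024) -> int:
    """Count the fixed-size blocks touched by a record stored at byte `offset`."""
    if length <= 0:
        return 0
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return last - first + 1


# A 100-byte record that sits inside one 16 KB block costs one block read.
print(blocks_read(0, 100))
# The same record straddling a block boundary costs two block reads.
print(blocks_read(16 * 1024 - 50, 100))
```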
  • the file system can also be used for data storage and query.
  • the detailed billing and billing data are stored as files in the file system.
  • the file system can classify data by region, time (such as account period), number, etc., and directly store structured records in files by text or other means.
  • the file system adopts a storage method based on time as a directory structure, for example, a directory is created according to time (account period) and a user number segment, and a record file is created in units of numbers.
  • When data needs to be queried, a simple index can be built from the directory hierarchy, file names, and so on.
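  • As a rough sketch of such a directory-based index, the path of a record file can be derived from the account period, the number segment, and the number. The root directory, the seven-digit segment length, and the `.txt` suffix are assumptions chosen for illustration; the text does not specify them.

```python
from pathlib import PurePosixPath


def record_file_path(root: str, period: str, number: str,
                     segment_len: int = 7) -> PurePosixPath:
    """Build <root>/<account period>/<number segment>/<number>.txt."""
    segment = number[:segment_len]  # hypothetical: leading digits form the segment
    return PurePosixPath(root) / period / segment / f"{number}.txt"


print(record_file_path("/cdr", "201112", "13606401754"))
```

  • Looking up a number then amounts to rebuilding the same path, which is why no separate index structure is needed in this scheme.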
  • Embodiments of the present invention provide a data storage and data query method and apparatus, which can improve the speed of storing and retrieving data.
  • a method of data storage including:
  • the cloud storage device obtains data to be saved
  • the cloud storage device distributes the data to be saved to each cloud storage data node, and stores the data to be saved in parallel to the distributed database of the cloud storage.
  • a method of data query including:
  • the cloud storage device obtains an index field input by the user, and generates a query instruction according to the index field; the cloud storage device sends the query instruction to each cloud storage data node, and queries the data in a distributed database of the cloud storage in parallel;
  • the cloud storage device sends the set of query results of the cloud storage nodes to the user.
  • a device for data storage comprising:
  • An obtaining module configured to obtain data to be saved
  • a storage module configured to distribute the data to be saved to each cloud storage data node, and store the data to be saved in a distributed database of the cloud storage.
  • a device for data query comprising:
  • An obtaining module configured to obtain an index field input by a user, and generate a query instruction according to the index field
  • a processing module configured to send the query instruction to a data node of each cloud storage, query data in a distributed database of the cloud storage in parallel, and send the set of query results of the cloud storage nodes to the user.
  • a data storage system includes: a terminal and a cloud storage device;
  • the terminal is configured to extract data from the data source according to the configured data extraction rule to obtain the first data, and to save the first data in a temporary folder, so that the cloud storage device, according to the data uploading rule and the obtained transit zone path, uploads the first data in the temporary folder to a corresponding directory in the temporary file transfer area of the cloud storage device;
  • the cloud storage device is configured to upload the first data in the temporary folder of the terminal to the corresponding directory in the temporary file transfer area of the cloud storage according to the configured data uploading rule and the transit zone path, to distribute the first data in that directory evenly to each cloud storage data node, and to store the to-be-saved data in parallel in the distributed database of the cloud storage.
  • The embodiments of the present invention provide a method and a device for data storage and data query. For storage, the cloud storage device acquires the data to be saved, distributes it evenly to each cloud storage data node, and stores it in parallel in the distributed database of the cloud storage. For query, the cloud storage device obtains an index field input by the user and generates a query instruction according to the index field, sends the query instruction to each cloud storage data node so that the distributed database of the cloud storage is queried in parallel, and sends the set of query results of the cloud storage nodes to the user.
  • FIG. 1 is a flowchart of a method for data storage according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a method for data query according to Embodiment 1 of the present invention
  • FIG. 3 is a block diagram of a device for data storage according to Embodiment 1 of the present invention
  • FIG. 4 is a block diagram of an apparatus for data query according to Embodiment 1 of the present invention.
  • FIG. 5A is a flowchart of a data storage and data query method according to Embodiment 2 of the present invention
  • FIG. 5B is a schematic diagram of a data storage and data query method according to Embodiment 2 of the present invention
  • Figure ⁇ is a block diagram of an apparatus for data query provided by Embodiment 2 of the present invention.
  • FIG. 8 is a schematic diagram of a system for data storage according to Embodiment 2 of the present invention.
  • Detailed description
  • An embodiment of the present invention provides a data storage method. As shown in FIG. 1 , the method includes: Step 101: A cloud storage device acquires data to be saved.
  • the method further includes: initially configuring the cloud storage rule, including defining a rule of the directory and the sub-service type, defining an import rule of the file corresponding to the sub-service type, and defining a life cycle of the data, where the life cycle refers to the storage strategy defined for each type of data according to time; initially configuring the data extraction rules, including the data source for extracting data, the number of extraction processes, and the data range corresponding to each extraction process; and initially configuring the data uploading rules, including the number of uploading processes and the data range corresponding to each uploading process.
  • The first data or the second data is saved in the corresponding directory of the temporary file transfer area of the cloud storage.
  • Step 102: The cloud storage device distributes the data to be saved to each cloud storage data node, and stores the data to be saved in parallel in the distributed database of the cloud storage.
  • the cloud storage device uniformly distributes the to-be-saved data to each cloud storage data node according to a hash algorithm;
  • the cloud storage device simultaneously stores the different to-be-saved data on the cloud storage data nodes in the distributed database of the cloud storage, or simultaneously stores the fragments into which the same to-be-saved data has been split across the cloud storage data nodes in the distributed database of the cloud storage.
  • Because the cloud storage device uniformly distributes the to-be-saved data to each cloud storage data node and stores it in the distributed database of the cloud storage in parallel, the degree of parallelism of the parallel storage increases automatically as cloud storage data nodes are added.
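  • A minimal sketch of hash-based distribution followed by parallel storage is given below. The text only says "a hash algorithm"; the use of MD5, the helper names `node_for`, `distribute`, and `store_parallel`, and the thread pool standing in for the data nodes are all assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, node_count: int) -> int:
    """Map a distribution key to a node index with a stable hash (MD5 assumed)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(records, key_of, node_count):
    """Split records into one bucket per data node."""
    buckets = [[] for _ in range(node_count)]
    for rec in records:
        buckets[node_for(key_of(rec), node_count)].append(rec)
    return buckets


def store_parallel(buckets, store):
    """Call store(node_index, records) for every node at once.

    One worker per node, so the degree of parallelism equals the node count.
    """
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        return list(pool.map(store, range(len(buckets)), buckets))
```

  • Because the worker count follows `len(buckets)`, adding a node raises the parallelism without any other change, which mirrors the behaviour described above.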
  • the method further includes:
  • the data to be saved is evenly distributed to each cloud storage data node according to the import rule of the file corresponding to the sub-service type in the cloud storage rule, and is stored in parallel in the distributed database of the cloud storage.
  • data for different uses stored in the distributed database is saved as different number of copies according to the configured cloud storage rule, wherein the data of different uses includes production data and backup data, and the production data Used for querying.
  • The embodiment of the invention provides a data storage method. By distributing the data to be saved to each cloud storage data node and storing it in parallel in the distributed database of the cloud storage, the data records are stored in the database in a distributed manner, so that data can be saved quickly.
  • An embodiment of the present invention provides a data query method. As shown in FIG. 2, the method includes: Step 201: A cloud storage device acquires an index field input by a user, and generates a query instruction according to the index field.
  • the user can input the mobile phone number and the month of the detailed list to be queried, and can generate a query command according to the mobile phone number and the month to perform subsequent query operations.
  • the index field of the user input is received through the query interface.
  • Step 202 The cloud storage device sends the query instruction to each cloud storage data node, and queries the data in a distributed database of the cloud storage in parallel;
  • the saved data can be divided into production data and backup data.
  • When querying, only the production data is queried.
  • the backup data can be used to recover the production data.
  • When querying data, the user does not need to care where the data is stored or whether it is compressed.
  • the cloud storage device sends the query instruction to the data nodes of each cloud storage at the same time; the cloud storage device simultaneously queries the distributed database carried by the data nodes of each cloud storage for data that meets the query instruction.
  • Step 203 The cloud storage device sends the set of query results of the cloud storage nodes to the user.
  • the cloud storage device sorts the query results of the cloud storage nodes according to a user-defined rule, and sends the sorted query result set to the user; or
  • the cloud storage device sorts the query results of the cloud storage nodes in the order of the nodes, and sends the sorted query result sets to the user; or
  • the cloud storage device sorts the query results of the cloud storage nodes according to keywords in the query results, and sends the sorted query result set to the user.
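  • Steps 202 and 203 can be sketched as a scatter-gather query. In this sketch each data node is modelled as an in-memory list of dictionaries and the query instruction as a dictionary of field values to match; these representations and the name `query_all` are assumptions, not the patent's interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(nodes, instruction, sort_key=None):
    """Send one instruction to every node in parallel and merge the results."""

    def run(node):
        # A record matches when every field in the instruction has the same value.
        return [rec for rec in node
                if all(rec.get(k) == v for k, v in instruction.items())]

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = list(pool.map(run, nodes))  # map() keeps node order

    merged = [rec for part in per_node for rec in part]
    # Without a sort key the set is returned in node order; with one, the
    # caller's rule (for example, a keyword in the result) decides the order.
    return sorted(merged, key=sort_key) if sort_key else merged
```

  • Passing `sort_key=None` corresponds to ordering by node, while a key function covers both the user-defined rule and the keyword-based ordering.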
  • the embodiment of the invention provides a data query method.
  • each cloud storage data node queries data in a distributed database of the cloud storage in parallel, so that the query performance can be greatly improved.
  • the embodiment of the present invention provides a data storage device, which may be a cloud storage device, as shown in FIG. 3, the device includes: an obtaining module 301, a storage module 302;
  • the obtaining module 301 is configured to obtain data to be saved
  • the data obtaining unit in the obtaining module 301 is configured to extract data from an external data source according to the configured data extraction rule to obtain first data, or to obtain second data produced by format conversion of the data in the external data source;
  • a data uploading unit in the obtaining module 301 is configured to acquire a transit area path of the cloud storage from the management node of the cloud storage, and to save the first data or the second data to the corresponding directory of the temporary file transfer area of the cloud storage according to the configured data uploading rule and the transit area path.
  • the device further includes an initial configuration module, configured to initially configure the cloud storage rule, including defining a rule of the directory and the sub-service type, defining an import rule of the file corresponding to the sub-service type, and defining a life cycle of the data, where the life cycle refers to the storage strategy defined for each type of data according to time; to initially configure the data extraction rules, including the data source for extracting data, the number of extraction processes, and the data range corresponding to each extraction process; and to initially configure the data uploading rules, including the number of uploading processes and the data range corresponding to each uploading process.
  • the storage module 302 is configured to evenly distribute the data to be saved to each cloud storage data node, and store the data to be saved in a distributed database of the cloud storage in parallel.
  • the distribution unit in the storage module 302 is configured to uniformly distribute the to-be-saved data to each cloud storage data node according to a hash algorithm
  • a storage unit in the storage module 302 is configured to simultaneously store the different to-be-saved data on the cloud storage data nodes into the distributed database of the cloud storage, or to simultaneously store the fragments into which the same to-be-saved data has been split across the cloud storage data nodes into the distributed database of the cloud storage.
  • the device further includes: a determining module, configured to determine the sub-service type of the to-be-saved data according to the corresponding directory of the temporary file transfer area and the configured rule of the directory and the sub-service type in the cloud storage rule;
  • the storage module 302 is configured to distribute the data to be saved to the data nodes of each cloud storage according to the configured import rule of the file corresponding to the sub-service type in the cloud storage rule, and to store the data to be saved in the distributed database of the cloud storage.
  • the device further includes: a management module, configured to perform different processing on the data to be saved in the distributed database of the cloud storage in different periods, according to the configured data life cycle rules in the cloud storage rule.
  • the storage module is further configured to:
  • save data for different uses stored in the distributed database as different numbers of copies according to the configured cloud storage rule, wherein the data for different uses includes production data and backup data, and the production data is used for querying.
  • An embodiment of the present invention provides a device for storing data, in which the obtaining module obtains the data to be saved, and the storage module uniformly distributes the data to be saved to the data nodes of each cloud storage and stores it in parallel in the distributed database of the cloud storage. The distributed storage of data records in the database makes it possible to save data quickly.
  • the embodiment of the present invention provides a device for querying data, and the device may be a cloud storage device, as shown in FIG. 4, the device includes: an obtaining module 401, a processing module 402;
  • the obtaining module 401 is configured to obtain an index field input by the user, and generate a query instruction according to the index field;
  • the processing module 402 is configured to send the query instruction to each cloud storage data node, query data in a distributed database of the cloud storage in parallel, and send the set of the query results of the cloud storage nodes to the User.
  • the sending unit in the processing module 402 is configured to send the query instruction to the data nodes of each cloud storage at the same time;
  • the processing unit in the processing module 402 is configured to simultaneously query data that meets the query instruction in a distributed database carried on the data nodes of each cloud storage.
  • processing module 402 is configured to:
  • sort the query results of the cloud storage nodes according to keywords in the query results, and send the sorted query result set to the user.
  • the embodiment of the invention provides a device for querying data.
  • Using the query instruction generated by the obtaining module, the processing module queries the data in the database in parallel, so that query performance can be greatly improved.
  • the embodiment of the invention provides a data storage and data query method. As shown in FIG. 5, the method includes:
  • Step 501 The cloud storage device performs initial configuration on the cloud storage rule.
  • the cloud storage device receives an initial configuration of the cloud storage rule by the administrator, including defining a basic rule of the storage, and defining a data life cycle.
  • (1) Define a service name, for example, a detailed-list service, a billing service, an electronic document service, and the like.
  • Each copy of the data can be designated as production data or backup data. Production data is used for querying; backup data is used for recovering production data and is not used for data query.
  • For example, the first and second copies can serve as production data and the third copy as backup data.
  • Three copies of the data can also be set as production data, in which case the cloud storage scheduler distributes requests evenly among the three copies.
  • the life cycle refers to the storage strategy for defining each type of data according to time.
  • Storage policies can include uncompressed storage, compressed storage, and deletion.
  • Compressed storage can use different compression algorithms: low-density compression, for data that requires high query efficiency, with a compression ratio of about 2:1; medium-density compression, which balances query efficiency and storage space, with a compression ratio of about 5:1; and high-density compression, for data where storage space matters more than query efficiency, with a compression ratio of about 8:1 or higher.
  • Different storage time ranges can adopt different storage policies. For example, different storage policies can be set for when the data is first stored in the database, for after the data has been stored for X days, and for after the data has been stored for Y months.
  • the management of the database using the data lifecycle rules of the business type can automatically compress and clear the data, reduce the management difficulty, and improve the database usage rate.
  • production data and backup data can have different data life cycles.
  • For example, production data can be stored uncompressed in the database, compressed at low density after 30 days, and deleted after 90 days.
  • Backup data can be compressed at medium density when stored in the database, compressed at high density after 90 days, and never deleted.
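  • The two example life cycles can be written down as a small rule table. This is only a sketch: the table layout, the policy labels, and the choice to treat "after N days" as an age of N days or more are assumptions.

```python
# Each entry is (minimum age in days, storage policy), in ascending age order.
LIFECYCLE = {
    "production": [(0, "uncompressed"), (30, "low-density"), (90, "delete")],
    "backup": [(0, "medium-density"), (90, "high-density")],  # never deleted
}


def storage_policy(usage: str, age_days: int) -> str:
    """Return the policy that applies to data of the given use and age."""
    policy = None
    for min_age, action in LIFECYCLE[usage]:
        if age_days >= min_age:
            policy = action  # later stages override earlier ones
    return policy


print(storage_policy("production", 45))
print(storage_policy("backup", 400))
```

  • A periodic job could call `storage_policy` for each stored batch and compress or delete it when the returned policy differs from its current state.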
  • Define a sub-service type. For example, the detailed-list service can be divided into GSM (Global System for Mobile Communications) voice bills, SMS bills, and so on.
  • A sub-service type is equivalent to a table in the cloud storage.
  • information such as the number of copies of a sub-business type inherits the settings of the business type.
  • the sub-service type can also set its own number of saved copies, and the data life cycle of each copy.
  • The main settings for each piece of information (field) include: the name of the information; the position of the information in the record; the type of the information, such as integer, decimal, string, or large text (for storing images, files, and the like); whether the information is used for data distribution, in which case the data is uniformly distributed according to this information; whether the information is time-type data, in which case the life cycle of the data can be defined according to this field; and, when the information is time-type data, the time format, for example YYYY-MM-DD HH24:MI:SS.
  • Information name: mobile phone number (such as 13606401754); parsing position: 1; information type: STRING, that is, processed as a string; used for data distribution: yes (for example, the GSM detailed list is evenly distributed according to mobile phone number); time-type data: no; time format: not set.
  • The format of the GSM voice bill is:
  • Information name: call start time; parsing position: 4; information type: STRING, that is, processed as a string; used for data distribution: yes (for example, the GSM detailed list is evenly distributed according to mobile phone number); time-type data: yes; time format: YYYY-MM-DD HH24:MI:SS, such as 2011-12-31 09:30:00.
  • Information name: CDR type; parsing position: 2; information type: STRING, that is, processed as a string; used for data distribution: no; time-type data: no; time format: not set.
  • Information name: other party number (such as 053188163000); parsing position: 3; information type: STRING, that is, processed as a string; used for data distribution: no; time-type data: no; time format: not set.
  • Information name: call duration (such as 51 seconds); parsing position: 5; information type: STRING, that is, processed as a string; used for data distribution: no; time-type data: no; time format: not set.
  • Information name: call cost (such as 0.20 yuan); parsing position: 6; information type: decimal; used for data distribution: no; time-type data: no; time format: not set.
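  • The field definitions above can be sketched as a position-based parser. The source does not show the delimiter or the CDR type values, so the `|` separator and the type value `1` below are hypothetical, as are the field identifiers; the data-distribution flag is left out of the sketch.

```python
from datetime import datetime
from decimal import Decimal

# (field name, 1-based parsing position, converter)
FIELDS = [
    ("mobile_number", 1, str),
    ("cdr_type", 2, str),
    ("other_party_number", 3, str),
    # YYYY-MM-DD HH24:MI:SS corresponds to this strptime pattern.
    ("call_start_time", 4, lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")),
    ("call_duration", 5, str),   # defined as STRING in the field settings
    ("call_cost", 6, Decimal),   # decimal type
]


def parse_record(line: str, sep: str = "|") -> dict:
    """Split one text record and convert each field by its configured position."""
    parts = line.rstrip("\n").split(sep)
    return {name: conv(parts[pos - 1].strip()) for name, pos, conv in FIELDS}


rec = parse_record("13606400001|1|053188163000|2011-12-31 09:30:00|51|0.20")
print(rec["mobile_number"], rec["call_start_time"], rec["call_cost"])
```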
  • Step 502 The cloud storage device performs initial configuration on a data extraction rule, and initially configures a data upload rule.
  • the cloud storage device receives an initial configuration of the data extraction and data uploading rules by the administrator.
  • The data extraction rule includes the following: (1) the external data source used for extracting data, and the connection mode with the data source; (2) the number of data extraction processes and the data range corresponding to each data extraction process, for example, extracting data by region, mobile number segment, or number mantissa; (3) the size of the extracted files, for example 10 MB per file, and a threshold on the number of extracted numbers, for example at most 100 phone numbers; (4) the file storage path after data extraction.
  • the data uploading rules include the following: The number of data uploading processes, and the data range corresponding to each data uploading process.
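  • One way to give each extraction or uploading process its own data range is to split by number mantissa. The sketch below assumes the mantissa is the last digit and assigns it to a process by modulo; both choices and the function names are illustrative, not taken from the text.

```python
def extraction_process_for(number: str, process_count: int) -> int:
    """Pick the process responsible for a number from its last digit."""
    return int(number[-1]) % process_count


def split_by_process(numbers, process_count):
    """Group numbers into the data range handled by each process."""
    ranges = {p: [] for p in range(process_count)}
    for n in numbers:
        ranges[extraction_process_for(n, process_count)].append(n)
    return ranges


print(split_by_process(["13606400001", "13606400006", "13606400002"], 5))
```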
  • Steps 501 and 502 prepare for the implementation of the embodiment of the present invention.
  • The execution order of steps 501 and 502 is not fixed: step 501 may be performed first, or step 502 may be performed first.
  • Step 503 The cloud storage device extracts data in the external data source according to the configured data extraction rule, obtains the first data, or performs format conversion on the data in the external data source to obtain the second data.
  • the external data source can be a data source saved in the terminal.
  • Alternatively, the external data source can convert the CDR format into a format that the cloud storage can recognize, so that the cloud storage device directly receives data in a recognizable format.
  • In this case, data is not extracted from the external data source; instead, the format-converted second data is obtained, and the received data is then uploaded and imported into the distributed database of the cloud storage.
  • the format of the extracted data is in the format of a text file, for example: a bill file, in the format:
  • each field is defined as:
  • the other party number for example, 053188163000;
  • the length of the call (in seconds), for example, 51 seconds;
  • The first call detail record means that the owner of mobile phone number 13606400001 dialed 053188163000 at 2011-12-31 09:30:00; the call lasted 51 seconds and the call charge was 0.2 yuan.
  • Step 504 The cloud storage device acquires a transit path of the cloud storage according to the management node of the cloud storage.
  • The data uploading process connects to the management node of the cloud storage and invokes the upload directory service provided by the cloud storage. The parameters of the upload directory service include a service type, a sub-service type, and data features (for example, 531, the area code of the Jinan area, and 20111201).
  • The cloud storage management node determines the file directory that the data uploading process can use according to the service settings and the busyness of each cloud storage node, and returns the file directory to the data uploading process in URL (Uniform Resource Locator) format.
  • The directory in URL format is the transit zone path to be obtained. For example, it can be ftp://192.168.1.1/CDR/gsm_cdr/531/20111201/
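  • How a management node might assemble such a path from the service type, the sub-service type, and the data features can be sketched as follows. The function `transit_path`, the `ftp` default, and the host address are placeholders for illustration; node selection by busyness is not modelled.

```python
from urllib.parse import quote


def transit_path(host: str, service: str, sub_service: str, *features: str,
                 scheme: str = "ftp") -> str:
    """Join service type, sub-service type, and data features into a URL directory."""
    segments = [service, sub_service, *features]
    # Percent-encode each segment so unusual characters cannot break the path.
    path = "/".join(quote(str(s), safe="") for s in segments)
    return f"{scheme}://{host}/{path}/"


print(transit_path("192.168.1.1", "CDR", "gsm_cdr", "531", "20111201"))
```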
  • Step 505 The cloud storage device saves the first data or the second data to a corresponding directory of the temporary file transfer area of the cloud storage according to the configured data uploading rule and the transit area path, where each file in each directory of the temporary file transfer area is saved in text file format. Data is saved to the directory of the temporary file transfer area that corresponds to its type. For example, GSM-format data is saved in files under "/CDR/gsm_cdr/", and GPRS-format data is saved in files under "/CDR/gprs/".
  • the file transfer area can ensure that the data is completely uploaded to the second directory of the cloud storage, and then imported into the distributed database of the cloud storage to improve the integrity of the transmitted data.
  • Step 506 The cloud storage device determines, according to the corresponding directory of the temporary file transfer area, and the configured rules of the directory and the sub-service type in the cloud storage rule, the sub-service type of the data to be saved;
  • the data import process of the cloud storage saves the transit zone temporary file to the distributed database of the cloud storage.
  • the data import process determines the sub-service type according to the corresponding directory of the transit zone where the data to be imported is located. Specifically, the table name corresponding to the file in each directory is obtained from the predefined "directory and sub-service type" rule in the cloud storage rule initially configured in step 501, and then multiple data import processes scan multiple directories in parallel, thereby Determine the sub-service type under the second directory.
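  • The "directory and sub-service type" lookup can be sketched as a mapping from transit-area directories to table names. The directories follow the examples in step 505; the table names, the file name, and the helper `sub_service_for` are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical "directory and sub-service type" rule: directory -> table name.
DIRECTORY_RULES = {
    "/CDR/gsm_cdr": "gsm_cdr",
    "/CDR/gprs": "gprs",
}


def sub_service_for(file_path: str) -> str:
    """Walk up from a file until a configured directory is found."""
    for parent in PurePosixPath(file_path).parents:
        table = DIRECTORY_RULES.get(str(parent))
        if table is not None:
            return table
    raise KeyError(f"no sub-service type configured for {file_path}")


print(sub_service_for("/CDR/gsm_cdr/531/20111201/part-0001.txt"))
```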
  • Step 507 The cloud storage device distributes the to-be-saved data to the data nodes of each cloud storage according to the configured import rule of the file corresponding to the sub-service type in the cloud storage rule, and stores the to-be-saved data in the distributed database of the cloud storage;
  • the cloud storage device distributes the data to be saved to each cloud storage data node according to a hash algorithm; the cloud storage device simultaneously stores the different to-be-saved data on the cloud storage data nodes in the distributed database of the cloud storage, or simultaneously stores the fragments into which the same to-be-saved data has been split across the cloud storage data nodes in the distributed database of the cloud storage.
  • the files in the temporary file transfer area are imported into the distributed database of the cloud storage. During import, the data to be saved in the temporary file transfer area is automatically distributed to each cloud storage node by the hash algorithm, according to the data distribution rule defined in the import rule.
  • the mobile phone number can be uniformly distributed, and the record of the mobile phone number 1 in the data to be saved in the temporary file transfer area is distributed to the node A, and the record of the mobile phone number 2 is distributed to the node B.
  • The cloud storage imports data in parallel automatically, and the degree of parallelism increases automatically as cloud storage nodes are added.
  • For example, when there are three cloud storage nodes the degree of parallelism is 3, and when there are four cloud storage nodes the degree of parallelism is 4.
  • By default, the traditional import method does not import into the database in parallel. Although parallel import can be specified manually, its degree of parallelism does not grow as hardware capacity is added, so the import described here performs better than the traditional data import method.
  • When the data to be saved is imported into the distributed database of the cloud storage, it can be saved in multiple copies according to the configured cloud storage rules. For example, three copies of the data can be saved for the detailed service: the first and second copies are production data, which are used for queries and are not compressed, and the third copy is backup data, which undergoes medium-density compression.
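The hash-based distribution and parallel import described in step 507 can be illustrated with a minimal sketch. It is not the patented implementation: the use of MD5, the in-memory lists standing in for cloud storage data nodes, and the names `node_for`, `distribute`, and `import_parallel` are all assumptions made for illustration. The degree of parallelism is set to the number of nodes, mirroring the statement that parallelism grows as nodes are added.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key: str, node_count: int) -> int:
    """Map a distribution key (e.g. a phone number) to a node index.

    MD5 is an assumed choice; the source only says "a hash algorithm".
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(records, node_count):
    """Group records into one bucket per node by hashing the phone number."""
    buckets = [[] for _ in range(node_count)]
    for rec in records:
        buckets[node_for(rec["phone"], node_count)].append(rec)
    return buckets


def import_parallel(records, nodes):
    """Load every bucket into its node at the same time.

    `nodes` is a list of plain lists standing in for data nodes (hypothetical).
    Returns the number of records written to each node.
    """
    buckets = distribute(records, len(nodes))

    def load(pair):
        node, bucket = pair
        node.extend(bucket)  # each worker touches only its own node
        return len(bucket)

    # One worker per node: parallelism follows the node count.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(load, zip(nodes, buckets)))
```

Because the node index depends only on the key, all records for one phone number land on the same node. Note that a plain modulo scheme like this one would move most keys when a node is added; the source does not say how rebalancing is handled.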
  • Step 508: The cloud storage device applies different processing to the data to be saved in the distributed database of the cloud storage at different periods, according to the configured data life cycle rules in the cloud storage rule.
  • For example, data can undergo low-density compression after 30 days and be deleted after 90 days.
  • Alternatively, data can undergo medium-density compression when it is stored in the distributed database of the cloud storage and high-density compression after 90 days, and never be deleted.
  • Automatically compressing and clearing the database according to the data life cycle rules can improve the storage utilization of the database, reduce the workload of maintenance personnel, and reduce management difficulty.
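The two life cycle examples in step 508 can be written down as data plus one small lookup function. This is a sketch under assumptions: the `LifecycleRule` structure, the names `RULE_A` and `RULE_B`, and the choice to treat "after N days" as "age of at least N days" are illustrative and do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LifecycleRule:
    # (minimum age in days, compression level) pairs; hypothetical structure
    stages: Tuple[Tuple[int, str], ...]
    # None means the data is never deleted
    delete_after_days: Optional[int]


def state_for(rule: LifecycleRule, age_days: int) -> str:
    """Return the state the data should be in at the given age."""
    if rule.delete_after_days is not None and age_days >= rule.delete_after_days:
        return "deleted"
    level = "none"
    for min_age, compression in sorted(rule.stages):
        if age_days >= min_age:
            level = compression  # the latest stage reached wins
    return level


# First example: low-density compression after 30 days, deletion after 90 days.
RULE_A = LifecycleRule(stages=((30, "low"),), delete_after_days=90)
# Second example: medium density on storage, high density after 90 days, never deleted.
RULE_B = LifecycleRule(stages=((0, "medium"), (90, "high")), delete_after_days=None)
```

A periodic job could call `state_for` for each stored data set and compress or clear it when the returned state differs from the current one; the scheduling itself is outside what the source describes.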
  • Step 509: The cloud storage device acquires an index field input by a user, and generates a query instruction according to the index field.
  • For example, the index fields can be the mobile phone number and the query month.
  • The generated query instruction then includes the mobile phone number and the query month.
  • The index field input by the user is received through the query interface.
  • Step 510: The cloud storage device sends the query instruction to each cloud storage data node, and queries the data in a distributed database of the cloud storage in parallel;
  • The cloud storage device sends the query instruction to the data nodes of each cloud storage at the same time, and simultaneously queries the distributed databases carried on the cloud storage data nodes for the data that satisfies the query instruction.
  • Step 511: The cloud storage device sends the set of query results of the cloud storage nodes to the user.
  • The cloud storage device sorts the query results of the cloud storage nodes according to a user-defined rule, and sends the sorted query result set to the user; or
  • the cloud storage device sorts the query results of the cloud storage nodes in the order of the nodes, and sends the sorted query result set to the user; or
  • the cloud storage device sorts the query results of the cloud storage nodes according to keywords in the query results, and sends the sorted query result set to the user.
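Steps 509 to 511 amount to a fan-out query followed by a merge. The sketch below is an assumption-laden illustration, not the device's actual logic: data nodes are modelled as in-memory lists of dictionaries, the field names `phone` and `month` are hypothetical, and `sort_key` stands in for the user-defined or keyword-based ordering.

```python
from concurrent.futures import ThreadPoolExecutor


def query_node(node_records, phone, month):
    """Run the query instruction against one node's local data."""
    return [r for r in node_records
            if r["phone"] == phone and r["month"] == month]


def parallel_query(nodes, phone, month, sort_key=None):
    """Send the same query to every node at once and merge the results.

    With sort_key=None the merged set keeps node order; otherwise it is
    sorted by the supplied key function.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map preserves the order of `nodes` in its results
        per_node = list(pool.map(lambda n: query_node(n, phone, month), nodes))
    merged = [rec for part in per_node for rec in part]
    if sort_key is not None:
        merged.sort(key=sort_key)
    return merged
```

For instance, `parallel_query(nodes, "13606400001", "201112", sort_key=lambda r: r["seq"])` would return that subscriber's December 2011 records ordered by a hypothetical `seq` field.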
  • A cloud storage device acquires the data to be saved from an external data source through a data extraction process.
  • The data upload process acquires the temporary files and uploads them to the temporary file transfer area, where they wait to be imported into the distributed database of the cloud storage; the data import process obtains the files in the temporary file transfer area and imports them into the distributed database of the cloud storage in parallel.
  • When the data in the distributed database is managed at a later stage, it is compressed and cleared according to the data life cycle rules.
  • When the user needs to query the data in the distributed database of the cloud storage, the user can query it directly through the query interface.
  • The data storage and data query method provided by the embodiment of the invention can improve the speed of storing and querying data and reduce management difficulty by storing data in parallel and querying data in parallel.
  • An embodiment of the present invention provides a data storage device, which may be a cloud storage device.
  • The device includes: an obtaining module 601, a data acquiring unit 6011, a data uploading unit 6012, a storage module 602, a distribution unit 6021, a storage unit 6022, an initial configuration module 603, a determining module 604, and a management module 605;
  • The obtaining module 601 is configured to obtain the data to be saved;
  • The storage module 602 is connected to the obtaining module 601, and is configured to evenly distribute the data to be saved to the data nodes of each cloud storage and to store the data to be saved in parallel into the distributed database of the cloud storage.
  • The device further includes an initial configuration module 603, configured to initially configure the cloud storage rule, which includes defining the rule for directories and sub-service types, defining the import rule for the files corresponding to each sub-service type, and defining the data life cycle, where the life cycle is a time-based storage strategy defined for each type of data;
  • to initially configure the data extraction rules, including the data sources from which data is extracted, the number of extraction processes, and the data range corresponding to each extraction process;
  • and to initially configure the data uploading rules, including the number of upload processes and the data range corresponding to each upload process.
  • The obtaining module 601 includes a data acquiring unit 6011 and a data uploading unit 6012;
  • The data acquiring unit 6011 is configured to extract the data in the external data source according to the configured data extraction rule to obtain the first data, or to perform format conversion on the data in the external data source to obtain the second data.
  • The data uploading unit 6012 is connected to the data acquiring unit 6011, and is configured to acquire the transfer area path of the cloud storage from the management node of the cloud storage and, according to the configured data uploading rule and the transfer area path, to save the first data or the second data in the corresponding directory of the temporary file transfer area of the cloud storage, where each file in each directory of the temporary file transfer area is saved in text file format.
  • The device further includes a determining module 604, which is connected to the storage module 602 and is configured to determine the sub-service type of the data to be saved according to the corresponding directory of the temporary file transfer area and the configured rule for directories and sub-service types in the cloud storage rule; the storage module 602 is further configured to distribute the data to be saved to the data nodes of each cloud storage according to the configured import rule for the files corresponding to the sub-service type in the cloud storage rule, and to store the data to be saved in parallel into the distributed database of the cloud storage.
  • The distribution unit 6021 in the storage module 602 is configured to uniformly distribute the data to be saved to each cloud storage data node according to a hash algorithm;
  • The storage unit 6022 in the storage module 602 is configured to simultaneously store the different data to be saved on the cloud storage data nodes into the distributed database of the cloud storage, or to simultaneously store the fragments into which the same data to be saved has been split across the cloud storage data nodes into the distributed database of the cloud storage.
  • The degree of parallelism of the parallel storage increases automatically as cloud storage data nodes are added.
  • The storage module 602 is further configured to save, according to the configured cloud storage rule, data of different uses stored in the distributed database as different numbers of copies, where the data of different uses includes production data and backup data, and the production data is used for querying.
  • The device further includes a management module 605, configured to apply different processing to the data to be saved in the distributed database of the cloud storage at different periods, according to the configured data life cycle rules in the cloud storage rule.
  • a management module 605 configured to perform different processing on different periods of the data to be saved in the distributed database of the cloud storage according to the configured rules of data lifecycle in the cloud storage rule .
  • For example, three copies of the data can be saved: the first and second copies are production data, which are used for queries and are not compressed, and the third copy is backup data, which undergoes medium-density compression.
  • An embodiment of the present invention provides a data storage device that acquires the data to be saved through an obtaining module.
  • The storage module uniformly distributes the data to be saved to the data nodes of each cloud storage and stores the data to be saved in parallel into the distributed database of the cloud storage, which can increase the speed of data storage.
  • An embodiment of the present invention provides a device for querying data.
  • the device may be a cloud storage device.
  • the device includes: an obtaining module 701, a processing module 702, and a sending unit 7021.
  • The obtaining module 701 is configured to obtain an index field input by the user, and to generate a query instruction according to the index field;
  • The processing module 702 is configured to send the query instruction to the data nodes of each cloud storage, to query the data in a distributed database of the cloud storage in parallel, and to send the set of query results of the cloud storage nodes to the user.
  • The saved data can be divided into production data and backup data.
  • When querying, only the production data is queried.
  • The backup data can be used to recover the production data.
  • When querying data, the user does not need to care where the data is stored or whether it is compressed.
  • The sending unit 7021 in the processing module 702 is configured to send the query instruction to the data nodes of each cloud storage at the same time.
  • The processing unit 7022 in the processing module 702 is configured to simultaneously query, in the distributed databases carried on the cloud storage data nodes, the data that satisfies the query instruction.
  • The processing module 702 is configured to sort the query results of the cloud storage nodes according to keywords in the query results, and to send the sorted query result set to the user.
  • The embodiment of the invention provides a device for querying data.
  • The processing module queries the data in the database in parallel using the query instruction generated by the obtaining module, so that query performance can be greatly improved.
  • The devices shown in FIG. 6 and FIG. 7 may be the same device; that is, the cloud storage device can perform both the data storage and the data query functions.
  • An embodiment of the present invention provides a data storage system, as shown in FIG. 8, including a terminal 801 and a cloud storage device 802;
  • The terminal 801 is configured to extract data from the data source according to the configured data extraction rule to obtain the first data, and to save the first data in a temporary folder, so that the cloud storage device uploads the first data in the temporary folder to the corresponding directory of the temporary file transfer area of the cloud storage device according to the data uploading rule and the obtained transfer area path;
  • The cloud storage device 802 is configured to upload the first data in the temporary folder of the terminal to the corresponding directory of the temporary file transfer area of the cloud storage according to the configured data uploading rule and the transfer area path, to distribute the first data in the corresponding directory of the temporary file transfer area to each cloud storage data node, and to store the data to be saved in parallel into the distributed database of the cloud storage.
  • The data source is the data stored in the terminal 801.
  • Each data extraction process extracts data according to the configured data extraction rules and saves the extracted data to the directory corresponding to that data extraction process. Each directory includes multiple temporary files; for example, the extracted first data is saved as a first file, and the first file is saved to the first directory corresponding to the data extraction process.
  • For example, the corresponding directory is /531/gsm_cdr/201112/01/1360640.
  • A new file is generated when the size of a temporary file or the number of records saved in it reaches the threshold configured in the data extraction rule.
  • For example, GSM_531_20111201_1360640.0020 represents the CDR file of the 1360640 number segment of Jinan City in December 2011, with serial number 0020.
  • When the threshold is reached, the file GSM_531_20111201_1360640.0021 is generated.
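The naming scheme in the example above can be made concrete with a short sketch. The field layout (service, area code, date, number segment, then a serial number after the dot) and the mapping from file name to directory are inferred from the single example GSM_531_20111201_1360640.0020 and the directory /531/gsm_cdr/201112/01/1360640; they are assumptions, and the function names are hypothetical.

```python
def parse_cdr_name(name: str) -> dict:
    """Split a temporary file name into its assumed components."""
    stem, serial = name.rsplit(".", 1)
    service, area, date, segment = stem.split("_")
    return {"service": service, "area": area, "date": date,
            "segment": segment, "serial": serial}


def next_cdr_name(name: str) -> str:
    """Name of the file created when the current one reaches the threshold."""
    stem, serial = name.rsplit(".", 1)
    # Keep the zero padding of the original serial number.
    return f"{stem}.{int(serial) + 1:0{len(serial)}d}"


def directory_for(name: str) -> str:
    """Directory inferred from the file name (assumed layout)."""
    p = parse_cdr_name(name)
    return (f"/{p['area']}/{p['service'].lower()}_cdr/"
            f"{p['date'][:6]}/{p['date'][6:]}/{p['segment']}")
```

Under these assumptions, `next_cdr_name("GSM_531_20111201_1360640.0020")` yields the 0021 file, and `directory_for` reproduces the example directory.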
  • the cloud storage device 802 can be the device for data storage as described in FIG.

Abstract

The invention provides a method and apparatus for storing and querying data, which relate to the technical field of communication networks and improve the speed of data storage and querying. The solution described in the embodiments of the present invention consists of obtaining the data to be saved through a cloud storage device, distributing the data to be saved uniformly to each cloud storage data node, and storing the data to be saved in the distributed database of the cloud storage in parallel. The cloud storage device obtains the index fields input by the user and generates query instructions according to the index fields; sends the query instructions to each cloud storage data node so that the data in the distributed database of the cloud storage is queried in parallel; and sends the set of query results from each cloud storage data node to the user. The solution described in the embodiments of the present invention is suitable for use in data storage and data querying.
PCT/CN2012/079155 2012-07-25 2012-07-25 Procédé et appareil de stockage et d'interrogation de données WO2014015488A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2012/079155 WO2014015488A1 (fr) 2012-07-25 2012-07-25 Procédé et appareil de stockage et d'interrogation de données
CN201280000916.6A CN102906751B (zh) 2012-07-25 2012-07-25 一种数据存储、数据查询的方法及装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/079155 WO2014015488A1 (fr) 2012-07-25 2012-07-25 Procédé et appareil de stockage et d'interrogation de données

Publications (1)

Publication Number Publication Date
WO2014015488A1 true WO2014015488A1 (fr) 2014-01-30

Family

ID=47577492

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/079155 WO2014015488A1 (fr) 2012-07-25 2012-07-25 Procédé et appareil de stockage et d'interrogation de données

Country Status (2)

Country Link
CN (1) CN102906751B (fr)
WO (1) WO2014015488A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222090A (zh) * 2011-06-02 2011-10-19 清华大学 一种云环境下海量数据资源管理框架
CN102360390A (zh) * 2011-10-24 2012-02-22 浙江大学 一种基于医学关键词的知识云数据库检索方法和系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101557551A (zh) * 2009-05-11 2009-10-14 成都市华为赛门铁克科技有限公司 一种移动终端访问云服务的方法、装置和通信系统

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN, YONG: "Design and implementation of communication data distributed query algorithm based on Hadoop platform", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER'S THESES FULL-TEXT DATABASE, June 2010 (2010-06-01), pages 14 AND 31 - 40 *
WANG, PENG ET AL.: "Study of Realized Method on a Cloud Computer Architecture", COMPUTER ENGINEERING & SCIENCE, vol. 31, no. AL, October 2009 (2009-10-01), pages 11 - 13 *

Also Published As

Publication number Publication date
CN102906751B (zh) 2015-12-02
CN102906751A (zh) 2013-01-30

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201280000916.6

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12881838

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12881838

Country of ref document: EP

Kind code of ref document: A1