CN108446363B - Data processing method and device of KV engine - Google Patents


Publication number
CN108446363B
Authority
CN
China
Prior art keywords
data
storage
key
warehoused
engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810205692.1A
Other languages
Chinese (zh)
Other versions
CN108446363A (en)
Inventor
白敏�
高浩浩
李朋举
韩志立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxin Technology Group Co Ltd
Original Assignee
Beijing Qianxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qianxin Technology Co Ltd
Priority to CN201810205692.1A
Publication of CN108446363A
Application granted
Publication of CN108446363B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a data processing method and device for a KV engine. The method comprises: acquiring intelligence source data and preprocessing it to obtain data to be stored; checking the data to be stored and, if it passes the check, storing it in a key-value (KV) engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC); and, upon receiving a data query trigger, traversing each primary key in the KV engine library and returning the data corresponding to at least one key. Because the data is checked before storage and is stored under different keys according to its IOC storage format, data processing performance can be improved while the security of the intelligence data is preserved.

Description

Data processing method and device of KV engine
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data processing method and device of a KV engine.
Background
With the rapid development of computer technology and network applications, the volume of network information data keeps growing, and the storage of massive data is becoming increasingly important. Applications such as social networks, mobile communication, network video and audio, electronic commerce, sensor networks, and scientific experiments often generate tens of millions to billions of small files, which in many scenarios are very small pieces of fragmented data. Traditional storage handles writes of such massive numbers of small files very inefficiently. Massive small files tend to produce a large amount of random writes: in local writes, frequent disk seeks greatly reduce performance, and in the distributed storage that is now more popular, inherent network latency, multi-replica settings, and metadata writes reduce performance as well.
In the course of implementing the embodiments of the present invention, the inventors found that existing data processing methods rely on Redis, InnoDB, or similar approaches, whose processing performance cannot meet the requirements.
Disclosure of Invention
To address the above problems in existing methods, the embodiments of the present invention provide a data processing method and apparatus for a KV engine.
In a first aspect, an embodiment of the present invention provides a data processing method for a KV engine, including:
acquiring intelligence source data, and preprocessing the intelligence source data to obtain data to be stored;
checking the data to be stored and, if the data to be stored passes the check, storing it in a key-value (KV) engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC);
and, if a data query trigger is received, traversing each primary key in the KV engine library and returning the data corresponding to at least one key.
Optionally, acquiring the intelligence source data and preprocessing it to obtain the data to be stored includes:
acquiring the intelligence source data and, if the intelligence source data is determined to be encrypted, decrypting it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a field format that can be stored, performing a uniqueness check on the intelligence source data, and deduplicating and merging any intelligence source data that fails the uniqueness check to obtain the data to be stored.
Optionally, checking the data to be stored and, if it passes the check, storing it in the KV engine library under different keys according to the IOC data storage format specifically includes:
checking the data to be stored; if it passes the check, converting it into JSON (JavaScript Object Notation) strings in a preset format; and storing the JSON strings in the KV engine library in batches under different keys according to the IOC data storage format.
Optionally, checking the data to be stored and, if it passes the check, storing it in the KV engine library under different keys according to the IOC data storage format specifically includes:
checking the data to be stored and, if it passes the check, appending it to the KV engine library under different keys according to the IOC data storage format;
and, if the key of the data to be stored is determined to be the same as the key of target data already in the KV engine library, storing the data to be stored in the KV engine library and overwriting the target data.
Optionally, checking the data to be stored and, if it passes the check, storing it in the KV engine library under different keys according to the IOC data storage format specifically includes:
checking the data to be stored; if it passes the check and its data volume is smaller than a preset value, storing it in the KV engine library using different keys and hash lookup according to the IOC data storage format; and, if it passes the check and its data volume is greater than or equal to the preset value, writing it continuously into the KV engine library using different keys and a B+ tree backed by persistent disk storage according to the IOC data storage format.
Optionally, the method further comprises:
performing snapshot backup on the data in the KV engine library and, if a data read is determined to be abnormal, rolling back to the snapshot backup data to perform query and read-write operations.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus for a KV engine, including:
a data preprocessing module, configured to acquire intelligence source data and preprocess it to obtain data to be stored;
a data checking module, configured to check the data to be stored and, if it passes the check, store it in the key-value (KV) engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC);
and a data query module, configured to traverse each primary key in the KV engine library and return the data corresponding to at least one key if a data query trigger is received.
Optionally, the data preprocessing module is specifically configured to acquire intelligence source data and, if the intelligence source data is determined to be encrypted, decrypt it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a field format that can be stored, to perform a uniqueness check on the intelligence source data, and to deduplicate and merge any intelligence source data that fails the uniqueness check to obtain the data to be stored.
Optionally, the data checking module is specifically configured to check the data to be stored; if it passes the check, convert it into JSON (JavaScript Object Notation) strings in a preset format; and store the JSON strings in the KV engine library in batches under different keys according to the IOC data storage format.
Optionally, the data checking module is specifically configured to check the data to be stored and, if it passes the check, append it to the KV engine library under different keys according to the IOC data storage format;
and, if the key of the data to be stored is determined to be the same as the key of target data already in the KV engine library, to store the data to be stored in the KV engine library and overwrite the target data.
Optionally, the data checking module is specifically configured to check the data to be stored; if it passes the check and its data volume is smaller than a preset value, store it in the KV engine library using different keys and hash lookup according to the IOC data storage format; and, if it passes the check and its data volume is greater than or equal to the preset value, write it continuously into the KV engine library using different keys and a B+ tree backed by persistent disk storage according to the IOC data storage format.
Optionally, the apparatus further comprises:
a data backup module, configured to perform snapshot backup on the data in the KV engine library and, if a data read is determined to be abnormal, roll back to the snapshot backup data to perform query and read-write operations.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, which when called by the processor are capable of performing the above-described methods.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program, which causes the computer to execute the above method.
According to the above technical solution, the data to be stored is checked and, if it passes the check, is stored in the KV engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC). Storing under different keys can improve data processing performance while preserving the security of the intelligence data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating a data processing method of a KV engine according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data processing method of a KV engine according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data processing apparatus of a KV engine according to an embodiment of the present invention;
Fig. 4 is a logic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Fig. 1 shows a schematic flowchart of a data processing method of a KV engine provided in this embodiment, including:
S101: Acquire intelligence source data and preprocess it to obtain data to be stored.
The intelligence source data is the data to be stored in the KV engine library.
Preprocessing prepares the intelligence source data for storage, for example by decompressing, decrypting, or deduplicating it.
The data to be stored is the preprocessed data that will be written to the KV engine library.
S102: Check the data to be stored and, if it passes the check, store it in the key-value (KV) engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC).
Checking the data to be stored means validating it against the data checking model, which returns a check value: for example, 1 indicates success, 0 indicates failure, 2 indicates that the file does not exist, and 3 indicates an update error. The file update succeeds only when the return value is 1; for any other value the write fails and no update is performed.
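The return values described above can be modelled as a small enumeration. This is only an illustrative sketch: the names `CheckResult` and `should_store` are hypothetical and do not appear in the patent; only the numeric codes 0 to 3 and their meanings come from the text.

```python
from enum import IntEnum


class CheckResult(IntEnum):
    """Check values named in the description (identifier names are assumed)."""
    FAILURE = 0
    SUCCESS = 1
    FILE_NOT_FOUND = 2
    UPDATE_ERROR = 3


def should_store(code: int) -> bool:
    # Only a return value of 1 lets the write proceed; every other
    # value means the write fails and no update is performed.
    return code == CheckResult.SUCCESS
```

A caller would run the check first and write to the KV engine library only when `should_store` returns `True`.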
The key is a way of identifying data. Specifically, because indicators of compromise (IOCs) use different data storage formats, data in each format is identified and stored under a different key.
For example, one data storage format is as follows:
{"sha1":"f603b749db0e25bc1d3296103d2069182a5391aa","network":{"udp":[],"http":[],"tcp":[],"dns":[]},"family":"not-a-virus","filetype":"HTML","filename":"VirusShare_d240791c96a4ec988fca4f651c6ec0e4","filesize":20470,"first_seen":"2017-11-2504:53:20","malicious_type":"\u672a\u77e5","md5":"d240791c96a4ec988fca4f651c6ec0e4"}
Another data storage format is a hash, for example:
fff70334695e7b2a7442ef0f3ed13acabd7267f838657f28a3e6e60c3e6f47120d074423
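One way to picture "different keys for different storage formats" is a function that inspects a raw IOC entry and derives a key from it. The sketch below is an assumption for illustration only: the function name `derive_key`, the `md5:` and `hash:` prefixes, and the choice of the `md5` field as the key of a JSON record are not specified by the patent.

```python
import json
import string


def derive_key(raw: str) -> str:
    """Derive a storage key from a raw IOC entry (hypothetical scheme)."""
    text = raw.strip()
    if text.startswith("{"):
        # JSON-format record: key it by its md5 field (assumed choice).
        # A record without an md5 field raises KeyError.
        record = json.loads(text)
        return "md5:" + record["md5"].lower()
    if text and all(c in string.hexdigits for c in text):
        # Hash-format entry: the hex string itself is the key.
        return "hash:" + text.lower()
    raise ValueError("unrecognised IOC storage format")
```

Keeping the format in the key prefix means the two kinds of entry cannot collide even if they live in the same store.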
S103: If a data query trigger is received, traverse each primary key in the KV engine library and return the data corresponding to at least one key.
Specifically, to improve query performance in storage, a key may hold duplicate values, and moving up and down between those values is allowed. In a deduplication query, the MD5 and the hash serve as primary keys; when both are primary keys, each primary key is traversed and queried, and the data corresponding to one or more key values is returned. Because performance depends on the speed of main memory or the storage device, sequential access is much faster than random access; this favors forward-match search on strings and range search on integers.
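The traversal over primary keys can be sketched with plain dictionaries. This is a minimal illustration under assumptions: the nested layout `{primary_key: {value: [records]}}`, the function name `query`, and the primary key names `"md5"` and `"hash"` are chosen here for the example and are not the patent's actual data layout.

```python
def query(store, value, primary_keys=("md5", "hash")):
    """Traverse each primary key and collect every record stored under value."""
    results = []
    for primary_key in primary_keys:
        # A key may hold several (duplicate) values, so each index maps
        # a value to a list of records rather than to a single record.
        results.extend(store.get(primary_key, {}).get(value, []))
    return results
```

The function returns the records found under one or more primary keys, or an empty list when nothing matches.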
When processing large volumes of intelligence data, traditional methods perform poorly. A mechanism that efficiently reads data with a composite key-value structure handles dynamic changes in data structure and data volume flexibly, and improves frequent reads of both simple and complex table structures. In the IOC scenario, where read-write volume is large and query volume is small, query results are cached in a key-value storage structure, which speeds up the application's query access; with scaling, the whole application runs faster and processing performance improves considerably.
In this embodiment, the data to be stored is checked and, if it passes the check, is stored in the KV engine library under different keys according to the IOC data storage format. Storing under different keys can improve data processing performance while preserving the security of the intelligence data.
Further, on the basis of the above method embodiment, S101 specifically includes:
acquiring the intelligence source data and, if the intelligence source data is determined to be encrypted, decrypting it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a field format that can be stored, performing a uniqueness check on the intelligence source data, and deduplicating and merging any intelligence source data that fails the uniqueness check to obtain the data to be stored.
Specifically, the method checks whether the structure of the intelligence source data is complete, whether the content fields are in a field format that can be stored, and what type the intelligence file is. The data must match the preset intelligence and parsing fields in the KV engine library. The KV fields are compared against the original file for verification, and a uniqueness check is performed on the data: values with the same key are appended or overwritten, and values of the same key are deduplicated and merged.
Data generation: encrypted threat intelligence is decrypted as part of KV engine storage; the decrypted data is in JSON format and is either processed in memory or saved locally as a temporary file.
For compressed and encrypted data such as IOCs, the intelligence source data is first decompressed and decrypted, then normalized to JSON, and then inserted into the appropriate KV engine library to facilitate subsequent processing.
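The preprocessing order described above (decrypt, decompress, normalize to JSON, check structure, deduplicate) might look roughly like the following. This is a sketch under stated assumptions: the patent names no cipher or compression format, so `decrypt` is a caller-supplied placeholder that defaults to a no-op, zlib stands in for the compression, and `key_field="md5"` is an assumed uniqueness field.

```python
import json
import zlib


def preprocess(blobs, decrypt=lambda b: b, key_field="md5"):
    """Return deduplicated JSON records ready to be stored."""
    unique = {}
    for blob in blobs:
        plain = decrypt(blob)  # placeholder: real decryption is unspecified
        try:
            plain = zlib.decompress(plain)
        except zlib.error:
            pass  # not zlib-compressed, use the bytes as they are
        try:
            record = json.loads(plain)
        except ValueError:
            continue  # not valid JSON, so not storable
        if not isinstance(record, dict) or key_field not in record:
            continue  # structurally incomplete
        # Uniqueness check: a later record with the same key replaces
        # the earlier one, so each key survives exactly once.
        unique[record[key_field]] = record
    return list(unique.values())
```

Replacing the earlier record is only one possible merge policy; the text speaks of deduplicating and merging without fixing how conflicting fields are combined.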
Further, on the basis of the above method embodiment, S102 specifically includes:
checking the data to be stored; if it passes the check, converting it into JSON (JavaScript Object Notation) strings in a preset format; and storing the JSON strings in the KV engine library in batches under different keys according to the IOC data storage format.
Specifically, a KVP (Key-Value Process, i.e. KV processing flow) engine converts the data content into compliant JSON strings. When processing data at the billion-record scale, the data is stored in batches: after the normalization dit operation, the corresponding append-to-storage operation is performed, which can improve data processing performance.
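Batch storage of JSON strings can be illustrated with a generator that slices the input into fixed-size chunks. The sketch makes several assumptions: a plain `dict` stands in for the KV engine library, and the helper names `batches` and `store_in_batches`, the batch size, and the `md5` key field are all hypothetical.

```python
import json
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def store_in_batches(kv, records, size=1000, key_field="md5"):
    """Serialise records to JSON strings and write them batch by batch.

    Returns the number of batches written.
    """
    written = 0
    for chunk in batches(records, size):
        # One update call per batch instead of one write per record.
        kv.update({r[key_field]: json.dumps(r, sort_keys=True) for r in chunk})
        written += 1
    return written
```

Because `batches` consumes an iterator lazily, the full input never has to be held in memory at once, which matters at the data volumes the text describes.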
Alternatively, S102 specifically includes:
checking the data to be stored and, if it passes the check, appending it to the KV engine library under different keys according to the IOC data storage format.
If the key of the data to be stored is determined to be the same as the key of target data already in the KV engine library, the data to be stored is written to the KV engine library and overwrites the target data.
This embodiment provides a file caching mechanism: when reading and writing data directly in memory would occupy too much memory, large files are written directly to a tmp file and appended to storage from there. Two records are treated as the same KV pair, and an overwrite update is performed, only when their key values are identical; if any value differs, the record is appended as different data.
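The append-or-overwrite rule reduces to a single membership test. In this minimal sketch a `dict` stands in for the KV engine library, and the function name `upsert` and its string return values are assumptions made for illustration.

```python
def upsert(kv, key, value):
    """Overwrite when the key already exists, otherwise append a new pair."""
    # Keys must match exactly to count as the same KV pair.
    action = "overwrite" if key in kv else "append"
    kv[key] = value
    return action
```

Returning the action taken lets a caller count how many records were new and how many replaced existing data.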
Alternatively, S102 specifically includes:
checking the data to be stored; if it passes the check and its data volume is smaller than a preset value, storing it in the KV engine library using different keys and hash lookup according to the IOC data storage format; and, if it passes the check and its data volume is greater than or equal to the preset value, writing it continuously into the KV engine library using different keys and a B+ tree backed by persistent disk storage according to the IOC data storage format.
Specifically, two storage modes are used, an in-memory HashTable structure and a persistent on-disk Store:
1) when the number of records is not large, the data is sent to the KV engine library with a bulk set, using hash lookup;
2) when the data volume exceeds the hundred-million level, direct in-memory reads and writes are not used; instead, a B+ tree caches files on persistent disk storage, writes continuously, and destroys the cache after use, which reduces memory overhead.
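The two storage modes can be sketched as a size-based switch. Note the substitutions, which are assumptions and not the patent's implementation: a Python `dict` plays the in-memory hash table, and the standard-library `sqlite3` module (whose tables are stored in B-tree pages on disk) stands in for the B+ tree store. The function name `store`, the table name `kv`, and the way the threshold is passed in are hypothetical.

```python
import sqlite3


def store(pairs, count, db_path, threshold):
    """Pick a backend from the record count and write the pairs to it.

    Returns a dict (in-memory hash table) below the threshold, or an
    open sqlite3 connection (disk-backed B-tree stand-in) otherwise.
    """
    if count < threshold:
        return dict(pairs)  # hash lookup, everything in memory
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )
        # INSERT OR REPLACE overwrites a row whose key already exists.
        conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", pairs)
    return conn
```

Passing the count separately lets `pairs` be a lazy iterator on the disk path, so a very large input is streamed to disk rather than materialised in memory.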
Further, on the basis of the above method embodiment, the method further comprises:
S104: Perform snapshot backup on the data in the KV engine library and, if a data read is determined to be abnormal, roll back to the snapshot backup data to perform query and read-write operations.
Specifically, the contents of the KV engine library are snapshotted periodically. When a data read is abnormal, the library can be rolled back to the snapshot data for query and output, so that data processing services are not affected.
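Snapshot and rollback can be sketched with an in-memory store that keeps one copy of its last good state. Everything here is an assumption for illustration: the class name `SnapshotStore`, the single-snapshot policy, and the choice to treat a value that fails JSON parsing as the "abnormal read" that triggers the rollback.

```python
import copy
import json


class SnapshotStore:
    """In-memory KV store with a single snapshot (hypothetical design)."""

    def __init__(self):
        self.data = {}
        self._snapshot = {}

    def snapshot(self):
        # Keep an independent copy of the current contents.
        self._snapshot = copy.deepcopy(self.data)

    def rollback(self):
        # Restore the contents captured by the last snapshot.
        self.data = copy.deepcopy(self._snapshot)

    def read(self, key):
        try:
            return json.loads(self.data[key])
        except ValueError:
            # Corrupt value: roll back to the snapshot and retry once.
            self.rollback()
            return json.loads(self.data[key])
```

A periodic job would call `snapshot()`; a rollback discards every write made since the last snapshot, which is the trade-off for keeping queries available.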
As shown in Fig. 2, the data processing method of the KV engine provided in this embodiment builds on common key-value data processing and uses KV engine insertion and KV management to provide a high-performance way of reading and writing KV engine data. KV big data from C&C (command-and-control server), sinkhole, and DGA (domain generation algorithm, i.e. DNS domain names generated by a special algorithm) sources is processed into JSON data and stored in a first KV engine library (KV store 1); alternatively, IOC files are converted into a large tmp file and then stored in a second KV engine library (KV store 2). The method is suited to persistent storage and updating of billions of key-value records under high read-write, low query workloads. When intelligence reads or writes are abnormal, the contents are rolled back and restored, which improves the robustness of data reading.
Fig. 3 is a schematic structural diagram of a data processing apparatus of a KV engine provided in this embodiment, where the apparatus includes: a data preprocessing module 301, a data checking module 302 and a data query module 303, wherein:
the data preprocessing module 301 is configured to acquire intelligence source data and preprocess it to obtain data to be stored;
the data checking module 302 is configured to check the data to be stored and, if it passes the check, store it in the key-value (KV) engine library under different keys according to the data storage format of the intelligence indicator of compromise (IOC);
the data query module 303 is configured to traverse each primary key in the KV engine library and return the data corresponding to at least one key if a data query trigger is received.
Specifically, the data preprocessing module 301 acquires intelligence source data and preprocesses it to obtain the data to be stored; the data checking module 302 checks the data to be stored and, if it passes the check, stores it in the KV engine library under different keys according to the IOC data storage format; and the data query module 303, on receiving a data query trigger, traverses each primary key in the KV engine library and returns the data corresponding to at least one key.
In this embodiment, the data to be stored is checked and, if it passes the check, is stored in the KV engine library under different keys according to the IOC data storage format. Storing under different keys can improve data processing performance while preserving the security of the intelligence data.
Further, on the basis of the above apparatus embodiment, the data preprocessing module is specifically configured to acquire intelligence source data and, if the intelligence source data is determined to be encrypted, decrypt it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a field format that can be stored, to perform a uniqueness check on the intelligence source data, and to deduplicate and merge any intelligence source data that fails the uniqueness check to obtain the data to be stored.
Further, on the basis of the above apparatus embodiment, the data checking module is specifically configured to check the data to be stored; if it passes the check, convert it into JSON (JavaScript Object Notation) strings in a preset format; and store the JSON strings in the KV engine library in batches under different keys according to the IOC data storage format.
Further, on the basis of the above apparatus embodiment, the data checking module is specifically configured to check the data to be stored and, if it passes the check, append it to the KV engine library under different keys according to the IOC data storage format;
and, if the key of the data to be stored is determined to be the same as the key of target data already in the KV engine library, to store the data to be stored in the KV engine library and overwrite the target data.
Further, on the basis of the above apparatus embodiment, the data checking module is specifically configured to check the data to be stored; if it passes the check and its data volume is smaller than a preset value, store it in the KV engine library using different keys and hash lookup according to the IOC data storage format; and, if it passes the check and its data volume is greater than or equal to the preset value, write it continuously into the KV engine library using different keys and a B+ tree backed by persistent disk storage according to the IOC data storage format.
Further, on the basis of the above embodiment of the apparatus, the apparatus further comprises:
a data backup module, configured to perform snapshot backup on the data in the KV engine library and, if a data read is determined to be abnormal, roll back to the snapshot backup data to perform query and read-write operations.
The data processing apparatus of the KV engine described in this embodiment may be used to execute the above method embodiments, and the principle and technical effect are similar, and are not described herein again.
Referring to Fig. 4, the electronic device includes a processor 401, a memory 402, and a bus 403, wherein:
the processor 401 and the memory 402 complete communication with each other through the bus 403;
the processor 401 is configured to call program instructions in the memory 402 to perform the methods provided by the above-described method embodiments.
The present embodiments disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause the computer to perform the methods provided by the method embodiments described above.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
It should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data processing method for a KV engine, comprising:
acquiring intelligence source data, and preprocessing the intelligence source data to obtain data to be warehoused; and
inspecting the data to be warehoused and, if the data passes the inspection, storing the data in a key-value (KV) engine library using different keys according to the data storage mode of intelligence indicators of compromise (IOCs), which specifically comprises:
inspecting the data to be warehoused and, if the data passes the inspection, converting the data into JavaScript Object Notation (JSON) strings in a preset format, and storing the JSON strings in batches in the KV engine library using different keys according to the data storage mode of intelligence IOCs; and, if a data query trigger operation is received, traversing each primary key in the KV engine library and returning the data corresponding to at least one key;
wherein inspecting the data to be warehoused and, if the data passes the inspection, storing the data in the KV engine library using different keys according to the data storage mode of intelligence IOCs further comprises:
inspecting the data to be warehoused; if the data passes the inspection and its volume is smaller than a preset value, storing the data in the KV engine library using different keys and hash lookup according to the data storage mode of intelligence IOCs; and if the data passes the inspection and its volume is greater than or equal to the preset value, persistently storing the data on disk in the KV engine library using different keys and a B+ tree according to the data storage mode of intelligence IOCs.
2. The method according to claim 1, wherein acquiring the intelligence source data and preprocessing the intelligence source data to obtain the data to be warehoused specifically comprises:
acquiring intelligence source data and, if the intelligence source data is judged to be encrypted, decrypting it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a format that can be warehoused, performing a uniqueness check on the intelligence source data, and deduplicating and merging the intelligence source data that fails the uniqueness check, to obtain the data to be warehoused.
3. The method according to claim 1, wherein inspecting the data to be warehoused and, if the data passes the inspection, storing the data in the KV engine library using different keys according to the data storage mode of intelligence indicators of compromise (IOCs) specifically comprises:
inspecting the data to be warehoused and, if the data passes the inspection, appending the data to the KV engine library using different keys according to the data storage mode of intelligence IOCs; and
if the key of the data to be warehoused is judged to be the same as the key of target data in the KV engine library, storing the data to be warehoused in the KV engine library and overwriting the target data.
4. The method of claim 1, further comprising:
taking a snapshot backup of the data in the KV engine library and, if a data read is judged to be abnormal, rolling back to the snapshot backup data and then performing query and read-write operations.
5. A data processing apparatus for a KV engine, comprising:
a data preprocessing module, configured to acquire intelligence source data and preprocess the intelligence source data to obtain data to be warehoused;
a data inspection module, configured to inspect the data to be warehoused and, if the data passes the inspection, convert the data into JavaScript Object Notation (JSON) strings in a preset format and store the JSON strings in batches in a key-value (KV) engine library using different keys according to the data storage mode of intelligence indicators of compromise (IOCs);
wherein the data inspection module is further configured to: inspect the data to be warehoused; if the data passes the inspection and its volume is smaller than a preset value, store the data in the KV engine library using different keys and hash lookup according to the data storage mode of intelligence IOCs; and if the data passes the inspection and its volume is greater than or equal to the preset value, persistently store the data on disk in the KV engine library using different keys and a B+ tree according to the data storage mode of intelligence IOCs; and
a data query module, configured to, if a data query trigger operation is received, traverse each primary key in the KV engine library and return the data corresponding to at least one key.
6. The apparatus according to claim 5, wherein the data preprocessing module is specifically configured to acquire intelligence source data and, if the intelligence source data is judged to be encrypted, decrypt it; and, after determining that the intelligence source data is structurally complete and that its content fields are in a format that can be warehoused, perform a uniqueness check on the intelligence source data, and deduplicate and merge the intelligence source data that fails the uniqueness check, to obtain the data to be warehoused.
7. The apparatus according to claim 5, wherein the data inspection module is specifically configured to inspect the data to be warehoused and, if the data passes the inspection, append the data to the KV engine library using different keys according to the data storage mode of intelligence indicators of compromise (IOCs); and
if the key of the data to be warehoused is judged to be the same as the key of target data in the KV engine library, store the data to be warehoused in the KV engine library and overwrite the target data.
8. The apparatus of claim 5, further comprising:
a data backup module, configured to take a snapshot backup of the data in the KV engine library and, if a data read is judged to be abnormal, roll back to the snapshot backup data and then perform query and read-write operations.
9. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 4.
10. A non-transitory computer-readable storage medium storing a computer program that causes a computer to perform the method according to any one of claims 1 to 4.
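As an informal illustration of the flow recited in claims 1 to 3 (it is not part of the claims), the following Python sketch deduplicates and merges source records, serializes them as JSON strings, stores them in a batch where an existing key is overwritten, and answers a query by traversing the keys. The record fields `type`, `indicator`, `source` and `severity`, the `type:indicator` key scheme, and all function names are assumptions made for the example; a plain dict stands in for the KV engine library.

```python
import json


def preprocess(source_records):
    """Drop structurally incomplete records, then deduplicate and merge by key."""
    merged = {}
    for rec in source_records:
        if "type" not in rec or "indicator" not in rec:
            continue  # structure check failed: not warehousable
        key = f'{rec["type"]}:{rec["indicator"]}'
        # Records sharing a key fail the uniqueness check and are merged.
        merged.setdefault(key, {}).update(rec)
    return merged


def store_json_batch(kv, records):
    """Store each record as a JSON string; an existing key is overwritten."""
    for key, rec in records.items():
        kv[key] = json.dumps(rec, sort_keys=True)


def query(kv, predicate):
    """Traverse every key and return the decoded data for the matching ones."""
    return {k: json.loads(v) for k, v in kv.items() if predicate(k)}


kv_library = {}
source = [
    {"type": "ip", "indicator": "192.0.2.1", "source": "feedA"},
    {"type": "ip", "indicator": "192.0.2.1", "severity": "high"},
    {"type": "domain"},  # incomplete, dropped by preprocess
]
store_json_batch(kv_library, preprocess(source))
hits = query(kv_library, lambda k: k.startswith("ip:"))
```

Here the two `ip` records collapse into one merged entry, and the incomplete `domain` record never reaches the store.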
CN201810205692.1A 2018-03-13 2018-03-13 Data processing method and device of KV engine Active CN108446363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810205692.1A CN108446363B (en) 2018-03-13 2018-03-13 Data processing method and device of KV engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810205692.1A CN108446363B (en) 2018-03-13 2018-03-13 Data processing method and device of KV engine

Publications (2)

Publication Number Publication Date
CN108446363A (en) 2018-08-24
CN108446363B (en) 2021-05-25

Family

ID=63194978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810205692.1A Active CN108446363B (en) 2018-03-13 2018-03-13 Data processing method and device of KV engine

Country Status (1)

Country Link
CN (1) CN108446363B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271614B (en) * 2018-10-30 2022-12-13 中译语通科技股份有限公司 Data duplicate checking method
CN109684334A (en) * 2018-12-26 2019-04-26 百度在线网络技术(北京)有限公司 Date storage method, device, equipment and the storage medium of key-value pair storage system
CN111666278B (en) * 2019-03-06 2024-03-26 阿里巴巴集团控股有限公司 Data storage method, data retrieval method, electronic device and storage medium
CN110147398B (en) * 2019-04-25 2020-05-15 北京字节跳动网络技术有限公司 Data processing method, device, medium and electronic equipment
CN110175176A (en) * 2019-05-31 2019-08-27 杭州复杂美科技有限公司 A kind of KV configuration method for database, querying method, equipment and storage medium
CN110489475B (en) * 2019-08-14 2021-01-26 广东电网有限责任公司 Multi-source heterogeneous data processing method, system and related device
CN113094292B (en) 2020-01-09 2022-12-02 上海宝存信息科技有限公司 Data storage device and non-volatile memory control method
CN111209285A (en) * 2020-04-23 2020-05-29 成都四方伟业软件股份有限公司 Statistical index storage method and device based on time sequence data
CN113504896B (en) * 2021-07-12 2023-08-18 云南腾云信息产业有限公司 Service data processing method and device of application program and mobile terminal
CN115455011B (en) * 2022-11-10 2023-08-01 北京微步在线科技有限公司 Multi-source information data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510209A (en) * 2009-03-30 2009-08-19 北京金山软件有限公司 Method, system and server for implementing real time search
CN103810224A (en) * 2012-11-15 2014-05-21 阿里巴巴集团控股有限公司 Information persistence and query method and device
CN104408091A (en) * 2014-11-11 2015-03-11 清华大学 Data storage method and system for distributed file system
CN104462421A (en) * 2014-12-12 2015-03-25 中国科学院声学研究所 Multi-tenant expanding method based on Key-Value database
CN106708427A (en) * 2016-11-17 2017-05-24 华中科技大学 Storage method suitable for key value pair data
CN107566376A (en) * 2017-09-11 2018-01-09 中国信息安全测评中心 One kind threatens information generation method, apparatus and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9268834B2 (en) * 2012-12-13 2016-02-23 Microsoft Technology Licensing, Llc Distributed SQL query processing using key-value storage system
CN104778182B (en) * 2014-01-14 2018-03-02 博雅网络游戏开发(深圳)有限公司 Data lead-in method and system based on HBase
CN104484471B (en) * 2014-12-31 2017-09-15 天津南大通用数据技术股份有限公司 A kind of implementation method of high-performance data storage engines
EP3128423A1 (en) * 2015-08-06 2017-02-08 Hewlett-Packard Enterprise Development LP Distributed event processing
US10275357B2 (en) * 2016-01-15 2019-04-30 Samsung Electronics Co., Ltd. System and methods for adaptive multi-level cache allocation for KV store

Also Published As

Publication number Publication date
CN108446363A (en) 2018-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Patentee after: Qianxin Technology Group Co.,Ltd.

Address before: 100015 15, 17 floor 1701-26, 3 building, 10 Jiuxianqiao Road, Chaoyang District, Beijing.

Patentee before: Beijing Qi'anxin Technology Co.,Ltd.