CN115421665A - Data storage method, device, equipment and storage medium - Google Patents

Data storage method, device, equipment and storage medium

Info

Publication number
CN115421665A
Authority
CN
China
Prior art keywords
data
parameter
words
target data
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211145714.2A
Other languages
Chinese (zh)
Inventor
杨超
安沛贤
周添楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN202211145714.2A
Publication of CN115421665A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • G06F3/0641De-duplication techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage method, a data storage device, data storage equipment and a data storage medium. The method comprises the following steps: acquiring target data; splitting the target data to obtain parameter words; replacing the parameter words in the target data with placeholders to obtain parameterized data; and if the parameterized data meets a preset condition, storing the target data. The technical scheme of the invention solves the problems that storing a large amount of repeated data occupies too much storage space and makes data analysis and processing very cumbersome; it reduces the storage of data having the same key information and thereby reduces the occupied storage space.

Description

Data storage method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data storage method, a data storage device, data storage equipment and a data storage medium.
Background
Various types of data are generally generated while a program is running, and the longer the program runs, the more data is generated. Such a large amount of data occupies a very large storage space on the one hand and is very cumbersome to analyze on the other. Analysis shows that the data contains a great deal of repeated content. To avoid occupying excessive storage space and to facilitate analysis, the data generated by the running program needs to be deduplicated.
A common deduplication practice is to compare newly generated data with previously generated data for exact equality; if the two are completely equal, the newly generated data is retained and the identical, previously generated data is deleted.
However, when this deduplication method is applied to data generated by a running program, a large amount of duplicate data still remains, so excessive storage space is still occupied.
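The limitation of exact-equality deduplication can be illustrated with a minimal Python sketch. The function name `dedup_exact` and the sample lines (modeled on the run-log example given later in this description) are illustrative assumptions, not part of the disclosed method.

```python
def dedup_exact(records):
    """Conventional deduplication: two records are duplicates only if they
    are exactly equal. The newest copy of each record is kept."""
    kept = {}
    for record in records:
        # Delete the identical, previously generated record (if any),
        # then retain the newly generated one.
        kept.pop(record, None)
        kept[record] = record
    return list(kept.values())


logs = [
    "Can't connect to DM server on '192.168.100.164' port (1620) errno (111)",
    "Can't connect to DM server on '192.168.100.163' port (1620) errno (111)",
    "Can't connect to DM server on '192.168.100.163' port (1640) errno (111)",
    "Can't connect to DM server on '192.168.100.164' port (1641) errno (111)",
]

# The four lines differ only in their parameter values, yet none is removed.
remaining = dedup_exact(logs)
print(len(remaining))  # 4
```

Because every line differs in at least one IP address or port value, equality comparison treats all four as distinct.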
Disclosure of Invention
Embodiments of the present invention provide a data storage method, an apparatus, a device, and a storage medium, which solve the problems that too much storage space is occupied due to the storage of a large amount of repeated data, and the data analysis and processing are very complicated, reduce the storage of data of the same key information, and further reduce the occupied storage space.
According to an aspect of the present invention, there is provided a data storage method, including:
acquiring target data;
splitting the target data to obtain parameter words;
replacing the parameter words in the target data with placeholders to obtain parameterized data;
and if the parameterized data meet preset conditions, storing the target data.
According to another aspect of the present invention, there is provided a data storage device comprising:
the data acquisition module is used for acquiring target data;
the splitting module is used for splitting the target data to obtain parameter words;
the replacing module is used for replacing the parameter words in the target data with placeholders to obtain parameterized data;
and the storage module is used for storing the target data if the parameterized data meets preset conditions.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the data storage method of any of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a data storage method according to any one of the embodiments of the present invention when the computer instructions are executed.
In the embodiment of the invention, target data is acquired; the target data is split to obtain parameter words; the parameter words in the target data are replaced with placeholders to obtain parameterized data; and if the parameterized data meets a preset condition, the target data is stored. This solves the problems that storing a large amount of repeated data occupies too much storage space and makes data analysis and processing very cumbersome, reduces the storage of data having the same key information, and thereby reduces the occupied storage space.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flow chart of a data storage method in an embodiment of the invention;
FIG. 2 is a flow chart of another method of data storage in an embodiment of the invention;
FIG. 3 is a schematic diagram of a data storage device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a data storage method provided in an embodiment of the present invention, where this embodiment is applicable to a data storage situation, and the method may be executed by a data storage device in an embodiment of the present invention, where the device may be implemented in a software and/or hardware manner, and as shown in fig. 1, the method specifically includes the following steps:
S110, acquiring target data.
The target data may be an SQL statement, a piece of data in a log file, or an event, which is not limited in this embodiment of the present invention.
S120, splitting the target data to obtain parameter words.
The parameter word may be a word including special characters and numbers, or a word including special characters and target characters, where the target characters may be preset words or preset characters. For example, the parameter words may include a character string enclosed in single quotation marks, such as '192.168.100.164', or a number enclosed in parentheses, such as (1620).
Specifically, the target data may be split to obtain the parameter words as follows: acquiring lexical format information of the parameter words; and performing word segmentation processing on the target data according to the lexical format information of the parameter words to obtain key words and parameter words. For example, a character string enclosed in quotation marks and a numerical value enclosed in parentheses in the target data are determined as parameter words, and the remaining words are determined as key words.
Optionally, splitting the target data to obtain parameter words includes:
acquiring lexical format information of parameter words;
and performing word segmentation processing on the target data according to the lexical format information of the parameter words to obtain key words and parameter words.
The lexical format information of the parameter words may be: special character + character string + special character, for example, a character string enclosed in single quotation marks, such as '192.168.100.164'. The lexical format information of the parameter words may also be: special character + value + special character, for example, a value enclosed in parentheses, such as (1620).
Wherein the key words are words other than parameter words.
Specifically, the method for performing word segmentation processing on the target data according to the lexical format information of the parameter words to obtain the key words and the parameter words may be: determining words including special characters and numbers in the target data as parameter words; determining words except for the parameter words in the target data as key words.
Optionally, performing word segmentation processing on the target data according to the lexical format information of the parameter words to obtain key words and parameter words, including:
determining words including special characters and numbers in the target data as parameter words;
determining words except for the parameter words in the target data as key words.
The number may be a numeric value or a character string, which is not limited in this embodiment of the present invention.
Specifically, the manner of determining the words including special characters and numbers in the target data as parameter words may be: determining a word in the target data whose structure is special character + number + special character as a parameter word.
In one specific example, the target data is: Can't connect to DM server on '192.168.100.164' port (1620) errno (111). In this case, '192.168.100.164', (1620) and (111) are parameter words, and the other words are all key words.
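The splitting in S120 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `STRING_PARAM`, `NUMERIC_PARAM` and `split_words` are hypothetical, words are assumed to be separated by whitespace as in the example above, and the two regular expressions stand in for the lexical format information, which the description does not specify in this form.

```python
import re

# Assumed lexical formats: a string parameter word is text enclosed in single
# quotation marks; a numeric parameter word is digits enclosed in parentheses.
STRING_PARAM = re.compile(r"'[^']*'")
NUMERIC_PARAM = re.compile(r"\(\d+\)")


def split_words(target_data):
    """Split target data into (key_words, parameter_words).

    Assumption: words are separated by whitespace.
    """
    key_words, parameter_words = [], []
    for word in target_data.split():
        # A word is a parameter word only if the whole word has one of the
        # parameter formats; "Can't" therefore remains a key word.
        if STRING_PARAM.fullmatch(word) or NUMERIC_PARAM.fullmatch(word):
            parameter_words.append(word)
        else:
            key_words.append(word)
    return key_words, parameter_words


key_words, parameter_words = split_words(
    "Can't connect to DM server on '192.168.100.164' port (1620) errno (111)"
)
print(parameter_words)  # ["'192.168.100.164'", '(1620)', '(111)']
print(key_words)
```

A production lexical analyzer would likely tokenize character by character rather than rely on whitespace; the whitespace split is used here only to keep the sketch short.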
S130, replacing the parameter words in the target data with placeholders to obtain parameterized data.
There may be a plurality of placeholders, and different types of parameter words correspond to different placeholders. The placeholder may be determined as follows: if the special characters included in two parameter words are different, the placeholders corresponding to those parameter words are different. For example, the character string parameter word placeholder may be 'String', and the numerical parameter word placeholder may be (Number).
Specifically, replacing the parameter words in the target data with placeholders to obtain the parameterized data may be: acquiring a first placeholder corresponding to the character string parameter word and a second placeholder corresponding to the numerical parameter word, wherein the first placeholder and the second placeholder are different; and replacing the character string parameter words in the target data with the first placeholder, and replacing the numerical parameter words in the target data with the second placeholder to obtain parameterized data.
In a specific example, the character string parameter word in the target data is replaced with 'String', and the numerical parameter word in the target data is replaced with (Number).
Optionally, the parameter words include: a string parameter word and a numeric parameter word;
replacing the parameter words in the target data with placeholders to obtain parameterized data, including:
acquiring a first placeholder corresponding to the character string parameter word and a second placeholder corresponding to the numerical parameter word, wherein the first placeholder and the second placeholder are different;
and replacing the character string parameter words in the target data with first placeholders, and replacing the numerical parameter words in the target data with second placeholders to obtain parameterized data.
Wherein the character string parameter word includes a first special character and a first number; for example, the character string parameter word may be a character string enclosed in single quotation marks. The numerical parameter word includes a second special character and a second number; for example, the numerical parameter word may be a number enclosed in parentheses.
Wherein the first placeholder includes the first special character and a first word, the first word being neither a key word nor a parameter word; for example, the first placeholder may be 'String'. The second placeholder includes the second special character and a second word, the second word being neither a key word nor a parameter word and being different from the first word; for example, the second placeholder may be (Number).
Optionally, the character string parameter words include: a first special character and a first number, the numeric parameter word comprising: a second special character and a second number, the first placeholder comprising: a first special character and a first word, the second placeholder comprising: a second special character and a second word, both the first word and the second word being non-key words, and the first word and the second word being different.
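The replacement in S130 can be sketched as a single regular-expression substitution. The names `PARAM_WORD` and `parameterize` are hypothetical, and the pattern additionally assumes that a quoted string contains no whitespace, an assumption made here so that the apostrophe in a word such as "Can't" is not mistaken for an opening quotation mark.

```python
import re

STRING_PLACEHOLDER = "'String'"   # first placeholder
NUMERIC_PLACEHOLDER = "(Number)"  # second placeholder

# Assumption: a string parameter word is quoted text without whitespace;
# a numeric parameter word is digits enclosed in parentheses.
PARAM_WORD = re.compile(r"'[^'\s]*'|\(\d+\)")


def parameterize(target_data):
    """Replace every parameter word with the placeholder for its type."""
    def placeholder_for(match):
        word = match.group()
        # The special character that opens the word decides the placeholder.
        if word.startswith("'"):
            return STRING_PLACEHOLDER
        return NUMERIC_PLACEHOLDER

    return PARAM_WORD.sub(placeholder_for, target_data)


print(parameterize(
    "Can't connect to DM server on '192.168.100.164' port (1620) errno (111)"
))
# Can't connect to DM server on 'String' port (Number) errno (Number)
```

Two pieces of target data that differ only in their parameter values yield the same parameterized data, which is what makes the later comparison possible.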
S140, if the parameterized data meets a preset condition, storing the target data.
Wherein the preset condition may be: the hash table does not contain the same data as the parameterized data; the preset conditions may also be: the hash table does not have the same MD5 value as the MD5 value of the parameterized data.
Specifically, if the parameterized data meets the preset condition, the target data may be stored in either of the following ways. In the first way, if the hash table does not contain data identical to the parameterized data, the parameterized data and the target data are stored in the hash table. In the second way, the MD5 value of the parameterized data is acquired, and if the hash table does not contain an MD5 value identical to the MD5 value of the parameterized data, the MD5 value of the parameterized data and the target data are stored in the hash table.
Optionally, if the parameterized data meets a preset condition, storing the target data, including:
and if the hash table does not have the same data as the parameterized data, storing the parameterized data and the target data in the hash table.
The hash table stores parameterized data and the original data corresponding to the parameterized data.
Specifically, the parameterized data is compared with the parameterized data stored in the hash table, and if data identical to the parameterized data does not exist in the hash table, the parameterized data and the target data are stored in the hash table.
Optionally, if the parameterized data meets a preset condition, storing the target data includes:
acquiring an MD5 value of the parameterized data;
and if the MD5 value identical to the MD5 value of the parameterized data does not exist in the hash table, storing the MD5 value of the parameterized data and the target data into the hash table.
The hash table stores MD5 values and original data corresponding to the MD5 values.
Specifically, the MD5 value of the parameterized data is compared with the MD5 value stored in the hash table, and if the MD5 value identical to the MD5 value of the parameterized data does not exist in the hash table, the MD5 value of the parameterized data and the target data are stored in the hash table.
If data identical to the parameterized data exists in the hash table, whether to store the parameterized data and the target data in the hash table is decided according to requirements. For example, if the requirement is to retain the earliest generated data, the parameterized data and the target data are not used to replace the existing data in the hash table; if the requirement is to retain the latest generated data, the existing data in the hash table is replaced with the parameterized data and the target data.
If an MD5 value identical to the MD5 value of the parameterized data exists in the hash table, whether to store the MD5 value of the parameterized data and the target data in the hash table is decided according to requirements. For example, if the requirement is to retain the earliest generated data, the MD5 value of the parameterized data and the target data are not used to replace the existing data in the hash table; if the requirement is to retain the latest generated data, the existing data in the hash table is replaced with the MD5 value of the parameterized data and the target data.
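The MD5-keyed variant of S140, including the choice between retaining the earliest or the latest data, might look like the following sketch. A Python `dict` stands in for the hash table, and the names `md5_of`, `store` and `keep_latest` are hypothetical.

```python
import hashlib


def md5_of(parameterized_data):
    """Return the MD5 value of the parameterized data as a hex string."""
    return hashlib.md5(parameterized_data.encode("utf-8")).hexdigest()


def store(hash_table, parameterized_data, target_data, keep_latest=True):
    """Store target_data under the MD5 value of its parameterized data.

    Returns True if the hash table was written, False if the existing
    entry was left untouched.
    """
    key = md5_of(parameterized_data)
    if key not in hash_table:
        # Preset condition met: no identical MD5 value exists yet.
        hash_table[key] = target_data
        return True
    if keep_latest:
        # Requirement: retain the latest generated data.
        hash_table[key] = target_data
        return True
    # Requirement: retain the earliest generated data.
    return False


table = {}
store(table, "port (Number)", "port (1620)")
store(table, "port (Number)", "port (1641)", keep_latest=False)
print(list(table.values()))  # ['port (1620)']
```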
In one specific example, the following data exists:
Can't connect to DM server on'192.168.100.164'port(1620)errno(111);
Can't connect to DM server on'192.168.100.163'port(1620)errno(111);
Can't connect to DM server on'192.168.100.163'port(1640)errno(111);
Can't connect to DM server on'192.168.100.164'port(1641)errno(111);
comm_inet_msg_recv_for_ecs got error,port Failure occurs in data_recv_inet_once,code(104)len(32892);
comm_inet_msg_recv_for_ecs got error,port Failure occurs in data_recv_inet_once,code(104)len(1020);
The above six pieces of data are run log data generated by a database program. The first four pieces of data are identical except for the specific IP address value in character string form and the port number and error number values in numerical form; the last two pieces of data are identical except for the code number and length values in numerical form. Therefore, the portion of a piece of data that remains after the specific parameter values are excluded can be regarded as the key information of the data. The key information of each of the first four logs is a database connection failure, and the key information of each of the last two logs is a database read failure.
In summary, data of the same kind having the same key information can be regarded as repeated data; that is, in this example the first four pieces of data are duplicates of one another, and so are the last two. The expected result of this example is to retain the latest generated data after deduplication, i.e., to retain only the fourth and sixth pieces of data.
With a conventional deduplication method based on equality comparison, the first four pieces of data are not equal to one another, nor are the last two, so deduplication cannot be performed and the expected result cannot be achieved. Therefore, the embodiment of the invention provides a data deduplication method: splitting target data into key words and parameter words, and replacing the parameter words with placeholders to form parameterized data; then calculating the MD5 value of the parameterized data and comparing it with the keys of a hash table to judge whether the MD5 value already exists; if it exists and the user requirement is to retain the latest data, replacing the data stored in the hash table with the new data; if it does not exist, recording the MD5 value and the corresponding original data in the hash table; finally, the values stored in the hash table are the deduplicated data. The method can implement data deduplication effectively and quickly, and can be applied to scenarios such as log deduplication, SQL deduplication and event deduplication.
Step one, taking a piece of target data and splitting it into key words and parameter words. Key words: all non-parameter words are called key words. Parameter words: these are divided into character string parameter words and numerical parameter words. Character string parameter words: character strings enclosed in single quotation marks, such as '192.168.100.164'. Numerical parameter words: numbers enclosed in parentheses, such as (1620). After the lexical formats of key words, character string parameter words and numerical parameter words are defined, a single piece of data is divided into a plurality of words by a lexical analysis program. The target data is as follows:
Can't connect to DM server on'192.168.100.164'port(1620)errno(111);
The data is split to obtain words. Here, '192.168.100.164' is a character string parameter word, (1620) and (111) are numerical parameter words, and the other words are all key words.
And step two, after replacing the parameter words with the placeholders, forming parameterized data with the key words, and calculating the MD5 value of the parameterized data. Wherein the string parameter word placeholder: 'String'. Numeric parameter word placeholder: (Number).
Connecting key words and parameter placeholders by spaces to form parameterized data:
Can't connect to DM server on'String'port(Number)errno(Number)
The MD5 value of the parameterized data is then calculated.
Typically, the MD5 value of the parameterized data string is shorter than the parameterized string itself, so comparing MD5 values is faster than comparing the strings directly.
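The length relationship can be checked with a short sketch. MD5 produces a 128-bit digest, which is 32 characters in hexadecimal form regardless of the input length; the variable names below are illustrative only.

```python
import hashlib

parameterized = "Can't connect to DM server on 'String' port (Number) errno (Number)"
digest = hashlib.md5(parameterized.encode("utf-8")).hexdigest()

# The hexadecimal MD5 value always has 32 characters, whereas the
# parameterized string grows with the length of the original data.
print(len(digest))  # 32
print(len(parameterized) > len(digest))  # True
```

For very short parameterized strings the MD5 value can be the longer of the two, which is why the statement above holds only typically.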
Step three, using a hash table to store the MD5 value calculated from the parameterized data together with the deduplicated data.
The key of the hash table is an MD5 value calculated by the parameterized data, and the value is the original data corresponding to the MD5 value.
The keys of the hash table are traversed and searched to judge whether the calculated MD5 value already exists.
If the MD5 value calculated in step two exists in the hash table, then when the latest generated data is required, the data corresponding to the MD5 value is replaced with the data taken out in step one; when the earliest generated data is required, no replacement is performed. Without limitation, this example assumes that the latest generated data is required.
If the MD5 value does not exist, the MD5 value calculated in step two and the data taken out in step one are added to the hash table.
Step four, repeating steps one to three until all the data has been processed, at which point the data in the hash table is the data retained after deduplication.
The data retained in the above example are:
Can't connect to DM server on'192.168.100.164'port(1641)errno(111)
comm_inet_msg_recv_for_ecs got error,port Failure occurs in data_recv_inet_once,code(104)len(1020)
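Steps one to four can be combined into one short Python sketch that reproduces the retained data above from the six sample log lines. The names `PARAM_WORD`, `parameterize` and `deduplicate` are hypothetical; the regular expression stands in for the lexical analysis program and assumes that quoted strings contain no whitespace, and a `dict` stands in for the hash table.

```python
import hashlib
import re

# Assumed lexical formats for string and numeric parameter words.
PARAM_WORD = re.compile(r"'[^'\s]*'|\(\d+\)")


def parameterize(target_data):
    """Steps one and two: replace parameter words with placeholders."""
    return PARAM_WORD.sub(
        lambda m: "'String'" if m.group().startswith("'") else "(Number)",
        target_data,
    )


def deduplicate(records, keep_latest=True):
    """Steps two to four: key each record by the MD5 of its parameterized form."""
    hash_table = {}
    for target_data in records:
        key = hashlib.md5(parameterize(target_data).encode("utf-8")).hexdigest()
        # Store if the key is new, or overwrite if the latest data is wanted.
        if key not in hash_table or keep_latest:
            hash_table[key] = target_data
    return list(hash_table.values())


logs = [
    "Can't connect to DM server on'192.168.100.164'port(1620)errno(111)",
    "Can't connect to DM server on'192.168.100.163'port(1620)errno(111)",
    "Can't connect to DM server on'192.168.100.163'port(1640)errno(111)",
    "Can't connect to DM server on'192.168.100.164'port(1641)errno(111)",
    "comm_inet_msg_recv_for_ecs got error,port Failure occurs in data_recv_inet_once,code(104)len(32892)",
    "comm_inet_msg_recv_for_ecs got error,port Failure occurs in data_recv_inet_once,code(104)len(1020)",
]

retained = deduplicate(logs)
for line in retained:
    print(line)
```

With `keep_latest=True` the fourth and sixth lines are retained; passing `keep_latest=False` would retain the first and fifth lines instead.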
In another specific example, as shown in FIG. 2, it is first determined whether all the data has been retrieved. If all the data has been retrieved, the data in the hash table is stored. If not, a piece of data is taken as the target data, and lexical analysis is performed on the target data to obtain a word list. Words are taken out of the word list one at a time: if the word taken out is a character string parameter word, it is replaced with the character string placeholder; if it is a numerical parameter word, it is replaced with the numerical placeholder. Once all the words in the word list have been processed, the parameterized data is obtained and its MD5 value is calculated. It is then determined whether the same MD5 value exists in the hash table: if it exists, the data corresponding to the MD5 value in the hash table is replaced with the target data; if not, the MD5 value and the target data are stored in the hash table.
According to the technical scheme of this embodiment, target data is acquired; the target data is split to obtain parameter words; the parameter words in the target data are replaced with placeholders to obtain parameterized data; and if the parameterized data meets a preset condition, the target data is stored. This solves the problems that storing a large amount of repeated data occupies too much storage space and makes data analysis and processing very cumbersome, reduces the storage of data having the same key information, and thereby reduces the occupied storage space.
Example two
Fig. 3 is a schematic structural diagram of a data storage device according to an embodiment of the present invention. The present embodiment may be applicable to the case of data storage, and the apparatus may be implemented in a software and/or hardware manner, and the apparatus may be integrated in any device providing a data storage function, as shown in fig. 3, where the data storage apparatus specifically includes: a data acquisition module 310, a splitting module 320, a replacement module 330, and a storage module 340.
The data acquisition module is used for acquiring target data;
the splitting module is used for splitting the target data to obtain parameter words;
the replacing module is used for replacing the parameter words in the target data with placeholders to obtain parameterized data;
and the storage module is used for storing the target data if the parameterized data meets a preset condition.
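One possible software arrangement of the four modules is sketched below as methods of a single Python class. The class name `DataStorageDevice`, its method names and the `run` driver are hypothetical, the regular expression is an assumed lexical format, and the preset condition is taken to be that no identical parameterized data has been stored yet (so the earliest data is retained).

```python
import re


class DataStorageDevice:
    """Sketch of modules 310 to 340 as methods of one class."""

    _PARAM_WORD = re.compile(r"'[^'\s]*'|\(\d+\)")

    def __init__(self):
        self.stored = {}  # parameterized data -> target data

    def acquire(self, source):
        """Data acquisition module: take the next piece of target data."""
        return next(source, None)

    def split(self, target_data):
        """Splitting module: find the parameter words in the target data."""
        return self._PARAM_WORD.findall(target_data)

    def replace(self, target_data, parameter_words):
        """Replacing module: substitute a placeholder for each parameter word."""
        for word in parameter_words:
            placeholder = "'String'" if word.startswith("'") else "(Number)"
            target_data = target_data.replace(word, placeholder, 1)
        return target_data

    def store(self, parameterized_data, target_data):
        """Storage module: store only if the preset condition is met."""
        if parameterized_data not in self.stored:
            self.stored[parameterized_data] = target_data

    def run(self, records):
        """Drive the four modules over an iterable of records."""
        source = iter(records)
        while (target_data := self.acquire(source)) is not None:
            parameterized = self.replace(target_data, self.split(target_data))
            self.store(parameterized, target_data)
        return list(self.stored.values())


device = DataStorageDevice()
kept = device.run(["port (1620)", "port (1641)", "code(104)"])
print(kept)  # ['port (1620)', 'code(104)']
```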
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
According to the technical scheme of this embodiment, target data is acquired; the target data is split to obtain parameter words; the parameter words in the target data are replaced with placeholders to obtain parameterized data; and if the parameterized data meets a preset condition, the target data is stored. This solves the problems that storing a large amount of repeated data occupies too much storage space and makes data analysis and processing very cumbersome, reduces the storage of data having the same key information, and thereby reduces the occupied storage space.
EXAMPLE III
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a data storage method.
In some embodiments, the data storage method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data storage method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data storage method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of storing data, comprising:
acquiring target data;
splitting the target data to obtain parameter words;
replacing the parameter words in the target data with placeholders to obtain parameterized data;
and if the parameterized data meets a preset condition, storing the target data.
2. The method of claim 1, wherein storing the target data if the parameterized data meets a preset condition comprises:
and if the hash table does not have the same data as the parameterized data, storing the parameterized data and the target data in the hash table.
3. The method of claim 1, wherein storing the target data if the parameterized data meets a preset condition comprises:
acquiring an MD5 value of the parameterized data;
and if the hash table does not have the MD5 value which is the same as the MD5 value of the parameterized data, storing the MD5 value of the parameterized data and the target data into the hash table.
4. The method of claim 1, wherein splitting the target data to obtain parameter words comprises:
acquiring lexical format information of the parameter words;
and performing word segmentation processing on the target data according to the lexical format information of the parameter words to obtain key words and parameter words.
5. The method of claim 4, wherein performing word segmentation processing on the target data according to lexical format information of the parameter words to obtain key words and parameter words comprises:
determining words including special characters and numbers in the target data as parameter words;
determining words except for the parameter words in the target data as key words.
6. The method of claim 1, wherein the parameter word comprises: a string parameter word and a numeric parameter word;
replacing the parameter words in the target data with placeholders to obtain parameterized data, including:
acquiring a first placeholder corresponding to the character string parameter word and a second placeholder corresponding to the numerical parameter word, wherein the first placeholder and the second placeholder are different;
and replacing the character string parameter words in the target data with first placeholders, and replacing the numerical parameter words in the target data with second placeholders to obtain parameterized data.
7. The method of claim 6, wherein the string parameter word comprises: a first special character and a first number, the numeric parameter word comprising: a second special character and a second number, the first placeholder including: a first special character and a first word, the second placeholder comprising: a second special character and a second word, the first word and the second word both being non-key words, and the first word and the second word being different.
8. A data storage device, comprising:
the data acquisition module is used for acquiring target data;
the splitting module is used for splitting the target data to obtain parameter words;
the replacement module is used for replacing the parameter words in the target data with placeholders to obtain parameterized data;
and the storage module is used for storing the target data if the parameterized data meets a preset condition.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data storage method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the data storage method of any one of claims 1-7 when executed.
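As a non-limiting illustration of the variants recited in claims 3 and 6, the following Python sketch replaces string and numeric parameter words with two different placeholders and uses the MD5 value of the parameterized data as the hash-table key. The placeholder spellings `:str` and `:num`, the lexical rule in `TOKEN`, and the function names are hypothetical and are not taken from the specification.

```python
import hashlib
import re

STRING_PLACEHOLDER = ":str"  # hypothetical first placeholder
NUMBER_PLACEHOLDER = ":num"  # hypothetical second placeholder

# Assumed lexical rule: quoted text is a string parameter word,
# a bare number is a numeric parameter word.
TOKEN = re.compile(r"(?P<string>'[^']*')|(?P<number>\b\d+(?:\.\d+)?\b)")


def parameterize_typed(target):
    """Replace string and numeric parameter words with distinct placeholders."""
    def swap(match):
        if match.group("string") is not None:
            return STRING_PLACEHOLDER
        return NUMBER_PLACEHOLDER
    return TOKEN.sub(swap, target)


def md5_key(parameterized):
    """Return the MD5 value of the parameterized data as a hex string."""
    return hashlib.md5(parameterized.encode("utf-8")).hexdigest()


def store_by_md5(target, table):
    """Store the target data under its MD5 key unless that key already exists."""
    key = md5_key(parameterize_typed(target))
    if key in table:
        return False
    table[key] = target
    return True
```

Using distinct placeholders keeps `x = 1` and `x = '1'` apart after parameterization, and keying on the MD5 value means the table holds a fixed-length digest rather than the full parameterized text.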
CN202211145714.2A 2022-09-20 2022-09-20 Data storage method, device, equipment and storage medium Pending CN115421665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211145714.2A CN115421665A (en) 2022-09-20 2022-09-20 Data storage method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211145714.2A CN115421665A (en) 2022-09-20 2022-09-20 Data storage method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115421665A true CN115421665A (en) 2022-12-02

Family

ID=84204867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211145714.2A Pending CN115421665A (en) 2022-09-20 2022-09-20 Data storage method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115421665A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination