CN112527210A

CN112527210A - Storage method and device of full data and computer readable storage medium

Info

Publication number: CN112527210A
Application number: CN202011535141.5A
Authority: CN
Inventors: 王晓红; 唐积益; 黄旭辉; 侯腾蛟
Original assignee: Shenzhen ZNV Technology Co Ltd; Nanjing ZNV Software Co Ltd
Current assignee: Shenzhen ZNV Technology Co Ltd; Nanjing ZNV Software Co Ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-03-19

Abstract

The invention discloses a method and a device for storing full data, and a computer-readable storage medium. The method comprises: acquiring a plurality of pieces of full data, and extracting a keyword from each piece of full data together with the feature value corresponding to the keyword; generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields; and storing the data documents in a first storage space. The invention reduces the storage space occupied by full data.

Description

Storage method and device of full data and computer readable storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for storing full data, and a computer-readable storage medium.

Background

In the construction of smart city/community projects, the connection platform has to handle the ingestion of massive amounts of data from many types of sensing devices. This data is called full data, and includes data from face snapshot cameras, vehicle barrier gates, checkpoint snapshot cameras, access control devices, smoke detectors and the like.

Analysis of full data shows that it contains structured information, such as device numbers, names and reporting times, and may also contain unstructured information, such as pictures and videos. Because full data is enormous in volume and its information content differs from device to device, storing it occupies a large amount of storage space.

Disclosure of Invention

The main object of the invention is to provide a method and a device for storing full data, and a computer-readable storage medium, so as to solve the problem that full data occupies a large amount of storage space.

To achieve the above object, the present invention provides a method for storing full data, comprising the following steps:

acquiring a plurality of pieces of full data, and extracting a keyword from each piece of full data and the feature value corresponding to the keyword;

generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields;

and storing the data document in a first storage space.

In an embodiment, the full data includes text information and picture information, and the step of generating the fields corresponding to the full data according to the keywords and the feature values includes:

acquiring a storage path corresponding to the picture information;

and generating the fields corresponding to the full data according to the keywords, the feature values and the storage path, wherein the keywords and the feature values are determined according to the text information.

In an embodiment, the step of acquiring the storage path corresponding to the picture information includes:

extracting the picture information in the full data, and converting the picture information into a data file;

sending the data file to a server, storing the data file by the server, and generating a storage path corresponding to the data file;

and receiving a storage path corresponding to the picture information fed back by the server.

In an embodiment, after the step of storing the data document in the preset first storage space, the method further includes:

creating an index corresponding to the data document, and establishing a mapping relationship between the index and the data document, wherein the index includes the creation time of the index.

In an embodiment, the step of creating the index corresponding to the data document includes:

determining a shard parameter and a replica parameter of the data document;

and creating the index according to the shard parameter, the replica parameter and the current time, wherein the current time is the creation time of the index.

In one embodiment, the step of establishing the mapping relationship between the index and the data document includes:

acquiring the retrieval frequency of the index within a preset time length;

and if the retrieval frequency is less than a preset frequency, deleting the index and the data document corresponding to the index.

In an embodiment, after the step of storing the data document in the first storage space, the method further includes:

generating an identifier corresponding to the full data;

and associating the identifier with the data document and appending them to the tail of a list in the first storage space, wherein the list comprises a plurality of groups of target data sorted in chronological order (earliest first), and each group of target data comprises a data document and the identifier associated with it.

In another embodiment, after the step of storing the data document in the first storage space, the method further includes:

acquiring the access frequency of the data document within a preset time length;

when the access frequency is less than a preset frequency, storing the data document in a second storage space, wherein the hardware configuration of the second storage space is lower than that of the first storage space;

deleting the data document in the first storage space.

In order to achieve the above object, the present invention further provides a storage device for full data, which includes a memory, a processor, and a storage program for full data stored in the memory and executable on the processor, wherein the storage program for full data realizes the steps of the storage method for full data as described above when executed by the processor.

To achieve the above object, the present invention also provides a computer-readable storage medium storing a storage program for full data, which, when executed by a processor, implements the steps of the method for storing full data described above.

The invention provides a method and a device for storing full data, and a computer-readable storage medium. A plurality of pieces of full data are acquired, and the keywords in each piece and their feature values are extracted; fields corresponding to the full data are generated from the keywords and feature values, a data document is generated from the fields, and the data document is stored in a first storage space. Because full data is enormous in volume, storing it directly would occupy a large amount of storage space. Storing only the key data of each piece, in the form of a data document, reduces the storage space occupied by full data.

Drawings

FIG. 1 is a diagram illustrating a hardware structure of a full data storage device according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a method for storing full data according to a first embodiment of the present invention;

FIG. 3 is a detailed flowchart of step S20 of the method for storing full data according to the second embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a connection platform of the storage method for full data according to the present invention;

FIG. 5 is a flowchart illustrating a full data storage method according to a third embodiment of the present invention;

FIG. 6 is a flowchart illustrating a method for storing full data according to a fourth embodiment of the present invention;

FIG. 7 is a flowchart illustrating a fifth embodiment of a method for storing full data according to the present invention;

FIG. 8 is a schematic diagram of the structure of the 64-bit binary identifier generated by the Snowflake algorithm in the method for storing full data according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The main solution of the embodiments of the invention is as follows: acquiring a plurality of pieces of full data, and extracting a keyword from each piece of full data and the feature value corresponding to the keyword; generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields; and storing the data documents in a first storage space.

Because full data is enormous in volume, storing it directly would occupy a large amount of storage space. After the full data is acquired, the keywords of each piece and their feature values are extracted, fields are generated from them, data documents are generated from the fields, and the data documents are stored in the first storage space. In other words, storing only the key data of the full data reduces the storage space it occupies.

As one implementation, the storage device for full data may be as shown in FIG. 1.

The storage device for full data according to an embodiment of the invention comprises a processor 101 (e.g., a CPU), a memory 102 and a communication bus 103, where the communication bus 103 enables connection and communication between these components.

The memory 102 may be a high-speed RAM or a non-volatile memory (e.g., disk storage). As shown in FIG. 1, the memory 102, as a computer-readable storage medium, may store a storage program for full data, and the processor 101 may be configured to invoke this program and perform the following operations:

acquiring a plurality of pieces of full data, and extracting a keyword from each piece of full data and the feature value corresponding to the keyword;

generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields;

and storing the data document in a first storage space.

In one embodiment, the processor 101 may be configured to invoke the storage program for full data stored in the memory 102 and further perform the following operations:

acquiring a storage path corresponding to the picture information;

determining slicing parameters and copy parameters of the data document;

acquiring the retrieval frequency of the index within a preset time length;

generating an identifier corresponding to the full data;

deleting the data document in the first storage space.

Based on the above hardware architecture of the storage device for full data, embodiments of the method for storing full data are provided.

Referring to FIG. 2, which shows a first embodiment of the method for storing full data according to the present invention, the method includes the following steps:

step S10, acquiring a plurality of pieces of full data, and extracting the keywords in each piece of full data and the feature values corresponding to the keywords;

Specifically, full data is data detected by sensing devices of different types, such as face snapshot cameras, vehicle barrier gates, checkpoint snapshot cameras and access control devices. The full data may be 1400 protocol data, the 1400 protocol being a public security video image information database protocol. The full data may also be face pictures from a Kafka cluster, Kafka being a distributed publish-subscribe messaging system that can process the activity stream data of users on a website. The keywords in each piece of full data and their corresponding feature values are extracted. For example, in full data carrying face information, the keyword "gender code" has the feature value "male"; the keyword "face appearance time" has the feature value "10:00 a.m."; the keyword "face disappearance time" has the feature value "10:10 a.m."; and the keyword "involved in a case" has the feature value "no".
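The extraction in step S10 can be pictured as keeping only a whitelist of keys from a decoded record. The sketch below assumes each piece of full data has already been parsed into a Python dict; the key names (GenderCode, FaceAppearTime and so on) and the helper extract_keywords are hypothetical illustrations, not fields defined by the 1400 protocol.

```python
from typing import Any, Dict, Iterable

# Hypothetical keyword whitelist for face snapshot data; the names are
# assumptions for illustration, not taken from any protocol specification.
FACE_KEYWORDS = ("GenderCode", "FaceAppearTime", "FaceDisappearTime", "IsInvolvedInCase")


def extract_keywords(record: Dict[str, Any], keywords: Iterable[str]) -> Dict[str, Any]:
    """Keep only the keyword/feature-value pairs present in the record."""
    return {k: record[k] for k in keywords if k in record}


raw = {
    "DeviceID": "cam-0012",
    "GenderCode": "male",
    "FaceAppearTime": "10:00",
    "FaceDisappearTime": "10:10",
    "IsInvolvedInCase": "no",
    "RawPayloadSize": 204800,  # not a keyword, so it is dropped
}
pairs = extract_keywords(raw, FACE_KEYWORDS)
print(pairs)
# {'GenderCode': 'male', 'FaceAppearTime': '10:00', 'FaceDisappearTime': '10:10', 'IsInvolvedInCase': 'no'}
```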

Step S20, generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields;

Specifically, fields corresponding to the full data are generated according to the keywords and the feature values, and a data document in a preset document format is generated from the fields. The preset format may be JSON (JavaScript Object Notation), in which case the data document is a JSON document containing the fields. Converting full data into JSON documents stores all types of full data in a uniform way, which avoids designing a different table structure for each data type and having to alter table structures whenever the content of the full data changes.
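A minimal sketch of turning fields into a JSON data document, using Python's standard json module. The function name build_data_document and the sample field names are hypothetical; the point is that records from different device types share one format without any table schema.

```python
import json
from typing import Any, Dict


def build_data_document(fields: Dict[str, Any]) -> str:
    """Serialize keyword/feature-value fields into a JSON data document."""
    # sort_keys gives a stable field order; ensure_ascii=False keeps
    # non-ASCII values (e.g. Chinese device names) readable.
    return json.dumps(fields, ensure_ascii=False, sort_keys=True)


# Two device types, one document format, no per-type table structure.
face_doc = build_data_document({"GenderCode": "male", "FaceAppearTime": "10:00"})
gate_doc = build_data_document({"PlateNo": "ABC123", "PassTime": "08:15", "Direction": "in"})
print(face_doc)  # {"FaceAppearTime": "10:00", "GenderCode": "male"}
print(gate_doc)  # {"Direction": "in", "PassTime": "08:15", "PlateNo": "ABC123"}
```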

Step S30, storing the data document in a first storage space.

Specifically, the data document may be written directly into the first storage space, or it may first be cached. In the latter case, the cached data documents are either written into the first storage space at regular intervals and then deleted from the cache, or written into the first storage space and deleted from the cache once the amount of cached data exceeds a preset amount.
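Both caching variants described above (flush at regular intervals, or flush once the cache exceeds a preset amount) fit in one small buffer. This is a sketch under assumptions: the class name DocumentBuffer, the byte-size threshold and the write callback standing in for the first storage space are hypothetical.

```python
import time
from typing import Callable, List


class DocumentBuffer:
    """Caches data documents and writes them to the first storage space in batches.

    Cached documents are flushed when their total size exceeds max_bytes or
    when flush_interval seconds have passed since the last flush; both
    thresholds are illustrative assumptions.
    """

    def __init__(self, write: Callable[[List[str]], None], max_bytes: int = 1_000_000,
                 flush_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write = write  # persists a batch to the first storage space
        self._max_bytes = max_bytes
        self._interval = flush_interval
        self._clock = clock
        self._docs: List[str] = []
        self._size = 0
        self._last_flush = clock()

    @property
    def pending(self) -> int:
        return len(self._docs)

    def add(self, doc: str) -> None:
        self._docs.append(doc)
        self._size += len(doc.encode("utf-8"))
        # The time check runs on each add for simplicity; a timer thread could also call flush().
        if self._size > self._max_bytes or self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        if self._docs:
            self._write(self._docs)
        # Drop the cached copies once they have been written.
        self._docs = []
        self._size = 0
        self._last_flush = self._clock()


stored: List[List[str]] = []
buf = DocumentBuffer(stored.append, max_bytes=20, flush_interval=3600)
buf.add('{"a": 1}')  # 8 bytes cached
buf.add('{"b": 2}')  # 16 bytes cached
buf.add('{"c": 3}')  # 24 bytes > 20, so all three are written in one batch
```

Injecting the clock keeps the time-based path testable without sleeping.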

After the data document is stored in the first storage space, its access frequency within a preset time length is acquired. When the access frequency is less than a preset frequency, the data document is rarely used, so it is stored in a second storage space, whose hardware configuration is lower than that of the first storage space, and the data document in the first storage space is deleted. When the access frequency is greater than or equal to the preset frequency, the data document is in normal use and remains stored in the first storage space.
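The tiering rule can be sketched with both storage spaces modeled as dicts from document ID to document. The function name tier_by_access and the access-count dictionary are assumptions; documents with no recorded accesses count as zero and are migrated. Copying before deleting means a failure between the two steps leaves a duplicate rather than losing the document.

```python
from typing import Dict


def tier_by_access(first: Dict[str, str], second: Dict[str, str],
                   access_counts: Dict[str, int], preset_frequency: int) -> None:
    """Move rarely accessed data documents from the first to the second storage space.

    access_counts holds each document's number of accesses within the preset
    time length; documents below preset_frequency are migrated.
    """
    for doc_id in list(first):  # copy the keys because first is modified in the loop
        if access_counts.get(doc_id, 0) < preset_frequency:
            second[doc_id] = first[doc_id]  # write to the lower-spec storage first
            del first[doc_id]               # then delete from the first storage space


first_space = {"d1": '{"GenderCode": "male"}', "d2": '{"PlateNo": "ABC123"}'}
second_space: Dict[str, str] = {}
tier_by_access(first_space, second_space, {"d1": 12, "d2": 1}, preset_frequency=5)
```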

In the technical solution of this embodiment, only the key data of each piece of full data, namely its keywords and feature values organized as fields in a data document, is stored in the first storage space instead of the raw full data. This reduces the storage space occupied by the enormous volume of full data.

Referring to FIG. 3, which shows a second embodiment of the method for storing full data according to the present invention, based on the first embodiment, step S20 includes:

step S21, obtaining a storage path corresponding to the picture information;

step S22, generating a field corresponding to the full data according to the keyword, the feature value, and the storage path, where the keyword and the feature value are determined according to the text information.

Specifically, the full data may include text information and picture information. The text information includes a device number, a name, a reporting time and the like, and the picture information may be pictures, videos and the like. Because keywords and feature values are difficult to determine for picture information, the storage path of the picture information may be acquired and used as the field corresponding to it. To acquire the storage path, the picture information is first extracted from the full data and converted into a data file in a preset format, with a one-to-one correspondence between picture information and data files; the data file is then sent to a server, which stores it and generates a corresponding storage path; finally, the storage path fed back by the server is received.
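The three sub-steps (convert to a data file, upload it, receive the path) reduce to a call that returns a path string. The sketch uses an in-memory stand-in for the server; InMemoryFileServer, picture_to_path and the content-hash path layout are hypothetical and do not reflect the path format of any real file server.

```python
import hashlib
from typing import Dict


class InMemoryFileServer:
    """Stand-in for the file server: stores a data file and returns its storage path."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def upload(self, data: bytes, ext: str) -> str:
        # Hypothetical path layout derived from the content hash.
        path = f"pictures/{hashlib.sha1(data).hexdigest()}.{ext}"
        self._files[path] = data
        return path

    def download(self, path: str) -> bytes:
        return self._files[path]


def picture_to_path(picture: bytes, server: InMemoryFileServer, ext: str = "jpg") -> str:
    """Upload the picture's data file and return the storage path fed back by the server."""
    return server.upload(picture, ext)


server = InMemoryFileServer()
path = picture_to_path(b"\xff\xd8fake-jpeg", server)
print(path)  # pictures/<40 hex characters>.jpg
```

Only the short path string ends up in the data document, while an external application can still fetch the full picture through download(path).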

If the full data contains both text information and picture information, the field corresponding to the full data is generated from the keywords and feature values of the text information and the storage path of the picture information. A field may be generated from each piece of full data: for example, from the keyword and feature value of each piece of text information, or from the storage path of each piece of picture information. A field may also be generated from multiple pieces of full data, for example from the keywords, feature values and storage paths of all full data received in one day. An external application can read the storage path of the picture information from the data document and then download the picture information.

As shown in FIG. 4, which illustrates the functional modules implementing the method for storing full data, the acquisition module acquires the full data and determines its data type. FDFS (FastDFS, an open-source distributed file system) is introduced into the connection platform as its file server. If the full data is picture information, the connection platform uploads the data file to FDFS through an API (Application Program Interface), namely the calling interface that FDFS reserves for the connection platform. FDFS stores the picture information and feeds the storage path of the data file back to the processing module, which generates a data document using the storage path as a field. Using the storage path, the connection platform can later download the corresponding data file from FDFS through the API. If the full data is text information, the processing module extracts the keywords of the text information and their feature values, and generates a data document using them as fields. After the full data is converted into data documents, they are written into an index of the open-source distributed search and analysis engine.
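The processing module's two paths (text: keep keyword/feature-value pairs; picture: keep only the storage path), including the mixed case of the second embodiment, can be sketched as one function. to_data_document, fake_upload, the picture_path field name and the list standing in for the day's index are hypothetical; a real deployment would call the file server's upload API and the search engine's indexing API instead.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def to_data_document(text: Optional[Dict[str, Any]], picture: Optional[bytes],
                     upload: Callable[[bytes], str],
                     keywords: Iterable[str]) -> Dict[str, Any]:
    """Build the fields of one data document from text and/or picture information."""
    if text is None and picture is None:
        raise ValueError("full data must carry text or picture information")
    wanted = set(keywords)
    doc: Dict[str, Any] = {}
    if text is not None:
        # Text information: keep only keyword/feature-value pairs.
        doc.update({k: v for k, v in text.items() if k in wanted})
    if picture is not None:
        # Picture information: store the file and keep only its storage path.
        doc["picture_path"] = upload(picture)
    return doc


uploads: List[bytes] = []


def fake_upload(data: bytes) -> str:
    """Hypothetical stand-in for the file server's upload API."""
    uploads.append(data)
    return f"pictures/{len(uploads)}.jpg"


daily_index: List[Dict[str, Any]] = []  # stand-in for the day's search-engine index
daily_index.append(to_data_document({"DeviceID": "cam-7", "GenderCode": "male"},
                                    b"jpeg-bytes", fake_upload, ["GenderCode"]))
daily_index.append(to_data_document({"PlateNo": "ABC123"}, None, fake_upload, ["PlateNo"]))
```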

In the technical solution of this embodiment, the full data includes picture information from which keywords and feature values cannot be extracted. After the picture information is stored, its storage path is used as a field, so that the main content of the full data is retained while its data volume is reduced, saving storage space.

Referring to FIG. 5, which shows a third embodiment of the method for storing full data according to the present invention, based on the first or second embodiment, after step S30 the method further includes:

step S40, creating an index corresponding to the data document, and establishing a mapping relationship between the index and the data document, where the index includes creation time of the index.

Specifically, the index is used to quickly locate data documents, much as the table of contents of a book lets a reader quickly find content by page number. The mapping between indexes and data documents may be one-to-one, so that the corresponding data document can be retrieved through the index. To create the index corresponding to a data document, the shard parameter and the replica parameter of the index are determined first. Because the open-source distributed search and analysis engine is distributed, an index is usually split into several parts, and the parts of the data held on different nodes are the shards. The engine manages and organizes shards automatically and can rebalance the distribution of shard data under preset conditions. The shard parameter determines, among other things, the number of shards of the index. For example, if the engine creates 5 primary shards for an index, one replica shard may be created for each primary shard; the replica parameter determines these replicas of the index shards. The index is then created according to the shard parameter, the replica parameter and the current time, the current time being the creation time of the index.
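The source does not name the engine. Assuming it is Elasticsearch, whose create-index request accepts number_of_shards and number_of_replicas under settings and free-form _meta under mappings, the shard parameter, replica parameter and creation time could be combined into a request body as below. The function names and the created_at key are hypothetical, and the body is only built here, not sent.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def create_index_body(shards: int, replicas: int,
                      now: Optional[datetime] = None) -> Dict[str, Any]:
    """Request body for creating an index from shard/replica parameters and the current time."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and a non-negative replica count")
    created = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        # Storing the creation time in _meta is an assumption for illustration.
        "mappings": {"_meta": {"created_at": created}},
    }


def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard plus `replicas` copies of it."""
    return shards * (1 + replicas)


body = create_index_body(5, 1, datetime(2020, 12, 22, tzinfo=timezone.utc))
```

With 5 primary shards and 1 replica per primary, as in the example above, the index holds 5 × (1 + 1) = 10 shard copies in total.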

A new index may be created in the open-source distributed search and analysis engine every day, named event-[yyyy-mm-dd], where the suffix is that day's date, to store the data documents of that day. A time interval, such as 5 minutes, is set; the day's data documents are queried at that interval, and the new index is generated from them.
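The daily naming scheme and the 5-minute polling interval amount to simple date arithmetic. daily_index_name and poll_times are hypothetical helpers; the event- prefix and the 5-minute default come from the text above.

```python
from datetime import date, datetime, timedelta
from typing import Iterator


def daily_index_name(day: date, prefix: str = "event-") -> str:
    """Name of the index holding one day's data documents, e.g. event-2020-12-22."""
    return f"{prefix}{day:%Y-%m-%d}"


def poll_times(start: datetime, end: datetime,
               interval: timedelta = timedelta(minutes=5)) -> Iterator[datetime]:
    """Moments in [start, end) at which the day's data documents are queried."""
    t = start
    while t < end:
        yield t
        t += interval


name = daily_index_name(date(2020, 12, 22))
first_hour = list(poll_times(datetime(2020, 12, 22, 0, 0), datetime(2020, 12, 22, 1, 0)))
```

At a 5-minute interval the engine is queried 12 times per hour, or 288 times per day.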

In the technical solution of this embodiment, an index is created for the data documents and a mapping between the index and the data documents is established, which improves retrieval efficiency. Without an index, querying a storage space that holds a large number of data documents would require fetching every document one by one, comparing each with the query conditions and returning the matching records, which consumes a great deal of query time and causes heavy disk I/O.
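The contrast drawn above, between scanning every document and looking a condition up in an index, can be shown with a toy in-memory inverted index keyed on (keyword, feature value). This is a simplified illustration with hypothetical function names; a real search engine's index structures are considerably more elaborate.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Doc = Dict[str, str]
Index = DefaultDict[Tuple[str, str], Set[int]]


def full_scan(docs: Dict[int, Doc], key: str, value: str) -> List[int]:
    """No index: read every document and compare it with the query condition."""
    return sorted(i for i, doc in docs.items() if doc.get(key) == value)


def build_index(docs: Dict[int, Doc]) -> Index:
    """Map each (keyword, feature value) pair to the IDs of documents containing it."""
    index: Index = defaultdict(set)
    for i, doc in docs.items():
        for key, value in doc.items():
            index[(key, value)].add(i)
    return index


def indexed_lookup(index: Index, key: str, value: str) -> List[int]:
    """With an index: go straight to the matching document IDs."""
    return sorted(index.get((key, value), set()))  # .get does not insert missing keys


docs = {
    1: {"GenderCode": "male", "IsInvolvedInCase": "no"},
    2: {"GenderCode": "female", "IsInvolvedInCase": "no"},
    3: {"GenderCode": "male", "IsInvolvedInCase": "yes"},
}
index = build_index(docs)
```

full_scan touches all documents on every query, while indexed_lookup costs one dictionary lookup plus the size of the result.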

Referring to FIG. 6, which shows a fourth embodiment of the method for storing full data according to the present invention, based on any one of the first to third embodiments, after step S40 the method further includes:

step S50, acquiring the retrieval frequency of the index within a preset time length;

and step S60, if the retrieval frequency is less than the preset frequency, deleting the index corresponding to the retrieval frequency and the data document corresponding to the index.

Specifically, the retrieval frequency of the index within a preset time length is acquired. When the retrieval frequency is less than a preset frequency, the index is rarely used, so the index and its corresponding data documents are deleted to avoid occupying storage space. When the retrieval frequency is greater than or equal to the preset frequency, the index is in frequent use and is not deleted. For example, a period of 15 days may be set: once data has been written into an index for 15 days, all data documents in the index are written into a separate second storage space, and the index and its data documents in the first storage space are deleted. The hardware configuration of the second storage space is lower than that of the first storage space.
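The two policies in this embodiment, deleting indexes whose retrieval frequency is below the preset frequency and migrating indexes after a fixed period, can be expressed as selection functions. The function names, the retrieval-count dictionary and the choice to treat an index exactly 15 days old as due for migration are assumptions; the actual deletion or copy would go through the storage system's own API.

```python
from datetime import date, timedelta
from typing import Dict, List


def cold_indexes(retrievals: Dict[str, int], preset_frequency: int) -> List[str]:
    """Indexes retrieved fewer than preset_frequency times in the window (to be deleted)."""
    return sorted(name for name, count in retrievals.items() if count < preset_frequency)


def indexes_to_migrate(created: Dict[str, date], today: date,
                       period_days: int = 15) -> List[str]:
    """Indexes at least period_days old, whose documents move to the second storage space."""
    cutoff = today - timedelta(days=period_days)
    return sorted(name for name, day in created.items() if day <= cutoff)


to_delete = cold_indexes({"event-2020-12-01": 0, "event-2020-12-20": 42}, preset_frequency=3)
to_move = indexes_to_migrate(
    {"event-2020-12-01": date(2020, 12, 1),
     "event-2020-12-07": date(2020, 12, 7),
     "event-2020-12-20": date(2020, 12, 20)},
    today=date(2020, 12, 22),
)
```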

In the technical scheme of the embodiment, when the retrieval frequency of the index is less than the preset frequency, the index corresponding to the retrieval frequency and the data document corresponding to the index are deleted, so that the index with low retrieval frequency is prevented from occupying a large amount of storage space, and the storage space is saved.

Referring to FIG. 7, which shows a fifth embodiment of the method for storing full data according to the present invention, based on any one of the first to fourth embodiments, after step S30 the method further includes:

step S70, generating an identifier corresponding to the full data;

step S80, associating the identifier with the data document and appending them to the tail of a list in the first storage space, where the list includes a plurality of groups of target data sorted in chronological order (earliest first), and each group of target data includes a data document and the identifier associated with it.

Specifically, the Snowflake algorithm may be used to generate the identifier of the data document. The identifier may be represented by an event_id of type long, which increases with the acquisition time of the full data and uniquely identifies the data document. As shown in FIG. 8, the Snowflake algorithm generates 64-bit binary data of type long: the first bit is unused; the second part is a 41-bit millisecond timestamp; the third part consists of a 5-bit datacenterId (data center identifier) and a 5-bit workerId (device identifier); the fourth part is a 12-bit sequence number counted within each millisecond, which allows each node to generate 4096 IDs per millisecond. Identifiers generated by the Snowflake algorithm increase with time and are distinguished by datacenterId and workerId, so they never repeat, and they are generated efficiently. The identifier is associated with the data document and appended to the tail of the list in the first storage space, where the list comprises a plurality of groups of target data sorted in chronological order, each group comprising a data document and its associated identifier. When an external application retrieves data documents, it can specify that they be sorted by identifier.
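The bit layout in FIG. 8 can be sketched as a small generator. This is a hedged sketch, not the patent's implementation: the class name, the custom epoch EPOCH_MS (2020-01-01 UTC) and the choice to wait for the next millisecond when the 12-bit sequence overflows are assumptions, the source specifies only the field widths.

```python
import threading
import time
from typing import Callable, Tuple

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
EPOCH_MS = 1577836800000


class SnowflakeGenerator:
    """64-bit IDs: unused bit | 41-bit ms timestamp | 5-bit datacenterId | 5-bit workerId | 12-bit sequence."""

    def __init__(self, datacenter_id: int, worker_id: int,
                 clock: Callable[[], int] = lambda: time.time_ns() // 1_000_000) -> None:
        if not (0 <= datacenter_id < 32 and 0 <= worker_id < 32):
            raise ValueError("datacenter_id and worker_id must fit in 5 bits")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF  # 12-bit counter within one ms
                if self._seq == 0:  # 4096 IDs used this ms: wait for the next ms
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << 22) | (self.datacenter_id << 17)
                    | (self.worker_id << 12) | self._seq)


def decode(snowflake_id: int) -> Tuple[int, int, int, int]:
    """Split an ID into (timestamp_ms, datacenter_id, worker_id, sequence)."""
    return ((snowflake_id >> 22) + EPOCH_MS,
            (snowflake_id >> 17) & 0x1F,
            (snowflake_id >> 12) & 0x1F,
            snowflake_id & 0xFFF)


gen = SnowflakeGenerator(datacenter_id=3, worker_id=7)
event_ids = [gen.next_id() for _ in range(5)]
```

Because the timestamp occupies the high bits, sorting data documents by event_id sorts them by acquisition time, which is what lets the list in the first storage space stay in chronological order.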

In the technical solution of this embodiment, an identifier corresponding to the full data is generated and associated with the data document generated from that full data. When an external application retrieves data documents, it can specify that they be sorted by identifier, which makes queries on full data more efficient.

The present invention also provides a storage device for full data, which includes a memory, a processor, and a storage program for full data stored in the memory and executable on the processor, and when the storage program for full data is executed by the processor, the storage program for full data implements the steps of the storage method for full data according to the above embodiments.

The present invention also provides a computer-readable storage medium storing a storage program for full data, which, when executed by a processor, implements the steps of the method for storing full data described in the above embodiments.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, and includes several instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

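1. A method for storing full data, characterized by comprising the following steps:

acquiring a plurality of pieces of full data, and extracting a keyword from each piece of full data and the feature value corresponding to the keyword;

generating fields corresponding to the full data according to the keywords and the feature values, and generating data documents according to the fields;

and storing the data document in a first storage space.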

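2. The method for storing full data according to claim 1, wherein the full data includes text information and picture information, and the step of generating the fields corresponding to the full data according to the keywords and the feature values includes:

acquiring a storage path corresponding to the picture information;

and generating the fields corresponding to the full data according to the keywords, the feature values and the storage path, wherein the keywords and the feature values are determined according to the text information.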

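3. The method for storing full data according to claim 2, wherein the step of acquiring the storage path corresponding to the picture information comprises:

extracting the picture information in the full data, and converting the picture information into a data file;

sending the data file to a server, the server storing the data file and generating a storage path corresponding to the data file;

and receiving the storage path corresponding to the picture information fed back by the server.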

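4. The method for storing full data according to claim 1, wherein after the step of storing the data document in the preset first storage space, the method further comprises:

creating an index corresponding to the data document, and establishing a mapping relationship between the index and the data document, wherein the index includes the creation time of the index.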

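5. The method for storing full data according to claim 4, wherein the step of creating the index corresponding to the data document comprises:

determining a shard parameter and a replica parameter of the data document;

and creating the index according to the shard parameter, the replica parameter and the current time, wherein the current time is the creation time of the index.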

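6. The method for storing full data according to claim 4, wherein the step of establishing the mapping relationship between the index and the data document comprises:

acquiring the retrieval frequency of the index within a preset time length;

and if the retrieval frequency is less than a preset frequency, deleting the index and the data document corresponding to the index.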

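7. The method for storing full data according to claim 1, wherein after the step of storing the data document in the first storage space, the method further comprises:

generating an identifier corresponding to the full data;

and associating the identifier with the data document and appending them to the tail of a list in the first storage space, wherein the list comprises a plurality of groups of target data sorted in chronological order, and each group of target data comprises a data document and the identifier associated with it.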

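8. The method for storing full data according to claim 1, wherein after the step of storing the data document in the first storage space, the method further comprises:

acquiring the access frequency of the data document within a preset time length;

when the access frequency is less than a preset frequency, storing the data document in a second storage space, wherein the hardware configuration of the second storage space is lower than that of the first storage space;

and deleting the data document in the first storage space.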

9. A storage device for full data, characterized in that the storage device for full data comprises a memory, a processor and a storage program for full data stored in the memory and executable on the processor, and when the storage program for full data is executed by the processor, the storage program for full data realizes the steps of the storage method for full data according to any one of claims 1 to 8.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a storage program of the full volume data, which when executed by a processor implements the steps of the storage method of the full volume data according to any one of claims 1 to 8.