CN107562804B - Data caching service system and method and terminal - Google Patents

Data caching service system and method and terminal

Info

Publication number: CN107562804B
Authority: CN (China)
Prior art keywords: data, value, key, key value, type
Legal status: Active (granted)
Application number: CN201710670167.2A
Other languages: Chinese (zh)
Other versions: CN107562804A
Inventors: 汤奇峰, 邓仲举
Current assignee: Shanghai Data Exchange Corp
Original assignee: Shanghai Data Exchange Corp
Application filed by Shanghai Data Exchange Corp
Priority to CN201710670167.2A
Publication of CN107562804A
Application granted; publication of CN107562804B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data caching service system, a data caching service method, and a terminal are provided. The system comprises: a data loading module adapted to load multiple kinds of data from multiple data sources, each datum having a tag value and a key value; at least one data cache adapted to encode the tag value and/or the key value of each datum when the multiple data are obtained from the data loading module, so that the encoded tag value occupies less storage space than the original tag value and the encoded key value occupies less storage space than the original key value, and to store the encoded tag values and encoded key values of the multiple data; and at least one data query module adapted to perform matching queries on the multiple data in the at least one data cache according to a demander's query request. The technical scheme of the invention avoids expansion of cached data and enables fast querying of the data.

Description

Data caching service system and method and terminal
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data caching service system, a data caching service method, and a terminal.
Background
In the field of data circulation, mass data distribution requires a data supply system to provide high concurrency, high throughput, low latency, and real-time data.
In existing data supply systems, the large data volume of a data source is usually loaded into a database. When a demander needs data, it acquires the data from the database. Access to data or files in a database typically depends on disk Input/Output (IO) operations.
However, under high data concurrency, existing data supply systems have the following problems: 1. Because of the large number of disk IO operations, database reads are slow and cannot meet the millisecond-level low-latency and high-throughput requirements of data distribution. 2. Traditional databases, files, and common cache systems are difficult to scale horizontally as the business grows. 3. After data is loaded into the cache, data expansion often occurs and a large amount of memory is occupied. 4. With limited hardware resources and massive data, it is difficult for the database to stay up to date with the data source in real time while maintaining stable external service capability during updates.
Disclosure of Invention
The invention addresses the technical problem of how to avoid cache data expansion while enabling fast data queries.
To solve the above technical problem, an embodiment of the present invention provides a data caching service system, which includes: a data loading module adapted to load multiple kinds of data from multiple data sources, each datum having a tag value and a key value; at least one data cache adapted to encode the tag value and/or the key value of each datum when the multiple data are obtained from the data loading module, so that the encoded tag value occupies less storage space than the original tag value and the encoded key value occupies less storage space than the original key value, and to store the encoded tag values and encoded key values of the multiple data; and at least one data query module adapted to perform matching queries on the multiple data in the at least one data cache according to a demander's query request.
Optionally, the data cache includes: a tag value encoding unit adapted to encode the tag value of each datum to form an identification code corresponding to the tag value, where the identification code includes characters and/or numbers.
Optionally, the data cache includes: a key value processing unit adapted to derive a two-level key value pair from the key value of each datum to serve as the encoded key value, where the two-level key value pair includes a primary key value and a secondary key value, and the number of primary key value types is smaller than the number of key value types.
Optionally, the key value processing unit includes: a number determining subunit adapted to determine the number of primary key value types according to the number of key value types of the multiple data; a numerical conversion subunit adapted to convert the key values of the multiple data into a first numeric string; a primary key value determining subunit adapted to take the first numeric string modulo the number of primary key value types and convert the modulo value into a second numeric string to serve as the primary key value; and a secondary key value determining subunit adapted to select a set number of characters from the first numeric string as the secondary key value.
Optionally, the data loading module includes: a sorting unit adapted to sort second data by key value, where first data is already sorted by key value, the second data is the data in the data source, and the first data is the data in the data cache; a comparison unit adapted to select the first data and the sorted second data in sequence and compare at least the key values of the selected first and second data to obtain a comparison result; a type determining unit adapted to determine the type of the first data and/or the type of the second data according to the comparison result; and an updating unit adapted to update the at least one data cache according to the type of the first data and/or the type of the second data.
Optionally, the sorting unit sorts the second data by key value in ascending order, and the first data is sorted by key value in ascending order; the type determining unit includes: a first type determining subunit adapted to determine that the first data is data to be deleted when the comparison result indicates that the key value of the first data is smaller than the key value of the second data; and a second type determining subunit adapted to determine that the second data is data to be newly added when the comparison result indicates that the key value of the first data is greater than the key value of the second data.
Optionally, when the key value of the first data is consistent with the key value of the second data, the comparing unit compares the tag value of the first data with the tag value of the second data to obtain the comparison result.
Optionally, the type determining unit includes: and the third type determining subunit is adapted to determine that the second data is changed data when the comparison result shows that the tag value of the first data is inconsistent with the tag value of the second data.
Optionally, the updating unit deletes the data to be deleted, and loads the change data and the data to be newly added to the at least one data cache.
Optionally, the data loading module stores the key values and the tag values of the plurality of data into the at least one data cache in a pipeline transmission manner or an HTTP or CLI interface manner.
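The pipelined loading mentioned above can be illustrated with a short sketch: write commands are buffered and flushed to the cache in batches, cutting per-command round trips. The `Pipeline` class below is a hypothetical stand-in for a real client pipeline (such as those offered by Redis clients); none of these names come from the patent.

```python
# Illustrative sketch only: batch writes into the cache instead of
# sending one command per key. A real deployment would use a cache
# client's pipeline; "store" here is a plain dict standing in for it.
class Pipeline:
    def __init__(self, store, batch_size=1000):
        self.store = store
        self.batch_size = batch_size
        self.buffer = []

    def set(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()                 # auto-flush once the batch is full

    def flush(self):
        for k, v in self.buffer:         # one "round trip" for the whole batch
            self.store[k] = v
        self.buffer.clear()

store = {}
pipe = Pipeline(store, batch_size=2)
pipe.set("k1", "v1")
pipe.set("k2", "v2")   # batch full, flushed automatically
pipe.set("k3", "v3")
pipe.flush()           # flush the remainder
```

With a real cache client the batching amortizes network latency, which is what makes bulk loading of key values and tag values practical.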
Optionally, the data query module pre-establishes a coroutine pool and a connection pool, where the coroutine pool includes multiple threads, and the connection pool includes connections from multiple demander access interfaces to the cache interface.
Optionally, the data caching service system further includes: and the reverse proxy and load balancing module is suitable for receiving a plurality of query requests and uniformly distributing the query requests to the at least one data query module.
An embodiment of the invention also discloses a data caching service method, which includes: loading multiple kinds of data from multiple data sources, each datum having a tag value and a key value; when the multiple data are obtained from the data loading module, encoding the tag value and/or the key value of each datum so that the encoded tag value occupies less storage space than the original tag value and the encoded key value occupies less storage space than the original key value, and storing the encoded tag values and encoded key values of the multiple data; and performing matching queries on the multiple data according to a demander's query request.
Optionally, the encoding the tag value of each datum includes: the tag value of each datum is encoded to form an identification code corresponding to the tag value, the identification code including characters and/or numbers.
Optionally, the encoding the key value of each datum includes: and obtaining two-level key value pairs according to the key value of each datum to be used as the coded key value, wherein the two-level key value pairs comprise primary key values and secondary key values, and the type number of the primary key values is smaller than that of the key values.
Optionally, the deriving two-level key value pairs from the key value of each datum includes: determining the number of primary key value types according to the number of key value types of the multiple data; converting the key values of the multiple data into a first numeric string; taking the first numeric string modulo the number of primary key value types and converting the modulo value into a second numeric string to serve as the primary key value; and selecting a set number of characters from the first numeric string as the secondary key value.
Optionally, the loading the multiple data of the multiple data sources includes: sorting second data according to the size of the key values, wherein first data is sorted according to the size of the key values, the second data is data in the data source, and the first data is data in the data cache; sequentially selecting the first data and the sorted second data, and at least comparing key values of the selected first data and the second data to obtain a comparison result; determining the type of the first data and/or the type of the second data according to the comparison result; and updating the data of the at least one data cache according to the type of the first data and/or the type of the second data.
Optionally, the second data is sorted by key value in ascending order and the first data is sorted by key value in ascending order, and the determining the type of the first data and/or the type of the second data according to the comparison result includes: if the comparison result indicates that the key value of the first data is smaller than the key value of the second data, determining that the first data is data to be deleted; and if the comparison result indicates that the key value of the first data is greater than the key value of the second data, determining that the second data is data to be newly added.
Optionally, the comparing at least the key values of the selected first and second data includes: if the key value of the first data equals the key value of the second data, comparing the tag value of the first data with the tag value of the second data to obtain the comparison result.
Optionally, the determining the type of the first data and/or the type of the second data according to the comparison result includes: if the comparison result indicates that the tag value of the first data differs from the tag value of the second data, determining that the second data is changed data.
Optionally, the performing data update on the at least one data cache according to the type of the first data and/or the type of the second data includes: and deleting the data to be deleted, and loading the change data and the data to be newly added to the at least one data cache.
Optionally, the loading the multiple data of the multiple data sources includes: and loading the key values and the label values of the various data in a pipeline transmission mode or an HTTP or CLI interface mode.
Optionally, a coroutine pool and a connection pool are pre-established, where the coroutine pool includes multiple threads, and the connection pool includes connections from multiple demander access interfaces to the cache interface.
Optionally, before performing matching queries on the multiple data according to the demander's query request, the method further includes: receiving a plurality of evenly distributed query requests.
The embodiment of the invention also discloses a computer readable storage medium, wherein computer instructions are stored on the computer readable storage medium, and the steps of the data caching service method are executed when the computer instructions are executed.
An embodiment of the invention also discloses a terminal comprising a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor performs the steps of the data caching service method when executing the computer instructions.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
The data caching service system includes a data loading module adapted to load multiple kinds of data from multiple data sources, each datum having a tag value and a key value; at least one data cache adapted to encode the tag value and/or the key value of each datum when the multiple data are obtained from the data loading module, so that the encoded tag value occupies less storage space than the original tag value and the encoded key value occupies less storage space than the original key value, and to store the encoded tag values and encoded key values; and at least one data query module adapted to perform matching queries on the multiple data in the at least one data cache according to a demander's query request. In this scheme, the data cache encodes the tag value and/or the key value of the data, and the encoded data occupies less storage space than the original data, which greatly reduces data expansion and saves cache storage space; in addition, saving cache storage space allows the data query service to be provided more quickly, improving the performance of the data caching service system. Furthermore, the at least one data cache and the at least one data query module in the technical scheme of the invention can be dynamically expanded, enabling dynamic expansion of the whole system.
Further, the data loading module includes: a sorting unit adapted to sort second data by key value, where first data is already sorted by key value, the second data is the data in the data source, and the first data is the data in the data cache; a comparison unit adapted to select the first data and the sorted second data in sequence and compare at least their key values to obtain a comparison result; a type determining unit adapted to determine the type of the first data and/or the type of the second data according to the comparison result; and an updating unit adapted to update the at least one data cache according to the determined types. By determining each datum's type and updating accordingly, that is, by adopting an incremental-update loading mode, the time spent on data updates is greatly reduced and loading efficiency is improved; in addition, loading duplicate data is avoided, further saving cache storage space.
Further, the data query module pre-establishes a coroutine pool and a connection pool, where the coroutine pool includes multiple threads and the connection pool includes multiple connections from demander access interfaces to the cache interface. Because the coroutine pool and the connection pool are pre-established, when a query request arrives a thread can be taken directly from the coroutine pool and a connection from the demander access interface to the cache interface can be taken from the connection pool, achieving fast data queries and thereby supporting high-concurrency, low-latency data query access services.
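The pre-built worker pool plus connection pool can be sketched roughly as follows. The patent provides no code, so every name here is an assumption; `Connection` is a trivial stand-in for a real cache client connection (e.g. to Redis), kept local so the example is self-contained.

```python
# Illustrative sketch: a pre-built pool of workers and a queue of
# pre-opened connections, so a query request pays no setup cost.
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

class Connection:
    """Stand-in for a real cache connection."""
    def query(self, key):
        return f"result:{key}"   # placeholder for a real cache lookup

POOL_SIZE = 4
workers = ThreadPoolExecutor(max_workers=POOL_SIZE)  # pre-built worker pool
connections = Queue()
for _ in range(POOL_SIZE):                           # pre-opened connections
    connections.put(Connection())

def handle_query(key):
    conn = connections.get()      # borrow an existing connection
    try:
        return conn.query(key)
    finally:
        connections.put(conn)     # return it to the pool for reuse

results = list(workers.map(handle_query, ["k1", "k2", "k3"]))
```

The design point is that neither a worker nor a connection is created per request; both are borrowed from pools built before any traffic arrives.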
Drawings
Fig. 1 is a schematic structural diagram of a data caching service system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a key value processing unit in the embodiment of the present invention;
FIG. 3 is a schematic diagram showing the detailed structure of the data loading module 101 shown in FIG. 1;
FIG. 4 is a schematic structural diagram of another data caching service system according to an embodiment of the present invention;
FIG. 5 is a flow chart of a data caching service method according to an embodiment of the present invention;
fig. 6 is a flowchart illustrating an implementation of step S501 shown in fig. 5.
Detailed Description
As described in the background, under high data concurrency, existing data supply systems have the following problems: 1. Because of the large number of disk IO operations, database reads are slow and cannot meet the millisecond-level low-latency and high-throughput requirements of data distribution. 2. Traditional databases, files, and common cache systems are difficult to scale horizontally as the business grows. 3. After data is loaded into the cache, data expansion often occurs and a large amount of memory is occupied. 4. With limited hardware resources and massive data, it is difficult for the database to stay up to date with the data source in real time while maintaining stable external service capability during updates.
In summary, providing a data caching service system that can be dynamically expanded, has low latency and high concurrency, and can be updated in real time is a technical problem that needs to be solved urgently in the field of data transaction.
According to the technical scheme, the data cache encodes the tag value and/or the key value of the data, and the encoded data can occupy smaller storage space compared with original data, so that data expansion is greatly reduced, and the storage space of the cache is saved; in addition, by saving the storage space of the cache, the data query service can be provided more quickly, and the performance of the data cache service system is improved. Furthermore, at least one data cache and at least one data query module in the technical scheme of the invention can be dynamically expanded, thereby realizing the dynamic expansion of the whole system.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a schematic structural diagram of a data caching service system according to an embodiment of the present invention.
The data cache service system 10 shown in fig. 1 may include a data loading module 101, at least one data cache 102, and at least one data query module 103.
The data loading module 101 is adapted to load multiple kinds of data from multiple data sources, each datum having a tag value (value) and a key value (key); the data cache 102 is adapted to encode the tag value and/or the key value of each datum when the multiple data are obtained from the data loading module 101, so that the encoded tag value occupies less storage space than the original tag value and the encoded key value occupies less storage space than the original key value, and to store the encoded tag values and encoded key values; the data query module 103 is adapted to perform matching queries on the multiple data in the at least one data cache 102 according to a demander's query request.
In a particular implementation, the data source may be a data supplier. The data loading module 101 may load data provided by a supplier into the data cache 102. Since a cache runs relatively fast, it can be used to store the multiple kinds of data, which may come from multiple data suppliers. Specifically, each datum may include a key value (key) and a tag value (value), where the tag value is one of several selectable values under a key value. In particular, the key value serves as a label for the datum, and the tag value corresponds to the key value. For example, when the key value is age, the tag value may be an age or the numerical range in which the age lies.
The amount of the various data provided by the multiple data sources is typically large, and if the various data is stored in the data cache 102 in its entirety, it will occupy a large amount of storage space and affect the access speed. In this embodiment, when the data cache 102 obtains the plurality of data from the data loading module 101, the tag value and/or the key value of each data is encoded. The tag value and/or key value after encoding occupies less storage space than the tag value and/or key value of the data before encoding. The data cache 102 stores encoded tag values and/or key values. Specifically, the data cache 102 may encode only the key value of the data, may encode only the tag value of the data, or may encode both the key value and the tag value of the data.
Further, a mapping relationship between the tag value of the data before encoding and the tag value of the data after encoding, and a mapping relationship between the key value of the data before encoding and the key value of the data after encoding can be stored.
Those skilled in the art will understand that the data cache 102 may be implemented by any implementable technology such as redis, memcache, and the like, and the embodiment of the present invention is not limited thereto.
In a specific implementation, the data query module 103 may query the data cache 102 according to a demander's query request. Specifically, since the data in the data cache 102 is encoded, when matching a query request against the cached data, the data query module 103 may encode the query request in the same way the tag values and/or key values were encoded and match the encoded request against the cached data; alternatively, it may match the cached data using the stored mapping relationships together with the query request.
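The first matching option can be sketched minimally: the query module applies at lookup time the same encoding that was applied at load time, so only encoded values are ever compared. `encode_key` below is a simple illustrative digest, not the patent's actual two-level scheme.

```python
# Illustrative sketch: load and query paths share one encoding
# function, so the cache never stores or compares raw key values.
import hashlib

def encode_key(key):
    # Stand-in encoding: a 16-character prefix of the MD5 hex digest.
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:16]

cache = {}

def load(key, tag_value):
    cache[encode_key(key)] = tag_value    # store under the encoded key

def query(key):
    return cache.get(encode_key(key))     # encode the request identically

load("age", "30-40")
```

Because both paths call the same `encode_key`, no decoding step is needed on the hot query path.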
In the data cache disclosed by the embodiment of the invention, the tag value and/or the key value of the data are encoded, and the encoded data occupies less storage space than the original data, which greatly reduces data expansion and saves cache storage space; in addition, saving cache storage space allows the data query service to be provided more quickly, improving the performance of the data caching service system. Furthermore, the at least one data cache and the at least one data query module in the embodiment of the invention can be dynamically expanded, enabling dynamic expansion of the whole system.
In practical application, the storage effects of the prior art and the embodiment of the invention are compared through experiments, and the experimental results are shown in table 1.
Data volume            Space occupied (prior art)    Space occupied (embodiment)
500 million records    18.4 GB                       15.4 GB
1 billion records      39.74 GB                      31.13 GB

TABLE 1
As shown in Table 1, for 500 million records the prior art occupies 18.4 GB of cache storage space, whereas the embodiment of the present invention occupies only 15.4 GB; similarly, for 1 billion records the prior art occupies 39.74 GB of cache storage space, while the embodiment occupies only 31.13 GB.
Therefore, compared with the prior art, the embodiment of the invention can reduce about 20% of the storage space of the cache.
Preferably, the data cache 102 may include a tag value encoding unit (not shown) adapted to encode a tag value of each data to form an identification code corresponding to the tag value, the identification code including characters and/or numbers.
Specifically, the tag value and the identification code may be in a one-to-one mapping relationship. The storage space occupied by the identification code is smaller than the storage space occupied by the tag value.
In one embodiment of the present invention, the identification code may be a number. All tag values are encoded, for example tag value1 is encoded as 0, tag value2 is encoded as 1, tag value3 is encoded as 2, …, and tag value N is encoded as N-1. That is, if the data tag value is "value 1", the integer 0 is stored, and if the data tag value is "value 2", the integer 1 is stored.
In another embodiment of the present invention, the identification code may be a character. If the data tag value is "value 1", character a is stored, and if the data tag value is "value 2", character b is stored.
In still another embodiment of the present invention, the identification code may be characters and numbers. If the data tag value is "value 1," the character a1 is stored, and if the data tag value is "value 2," the character a2 is stored.
By encoding the tag values of the data and establishing the mapping relationship between tag values and identification codes, the data cache 102 can use shared storage, that is, each distinct tag value is stored only once, greatly reducing storage-space usage.
Preferably, the data cache 102 may include a key value processing unit (not shown), and the key value processing unit is adapted to obtain two levels of key value pairs according to the key value of each data, where the two levels of key value pairs serve as encoded key values, and the two levels of key value pairs include a first level key value and a second level key value, and the number of types of the first level key value is smaller than the number of types of the key values.
In specific implementation, the mapping relation between the two levels of key value pairs and the key values is established by encoding the key values. The number of types of the primary key values is smaller than the number of types of the key values, and the number of types of the primary key values stored in the data cache 102 is smaller than the number of types of the key values, so that the same key values only need to be stored in the cache once, and the occupation amount of the storage space is reduced. In addition, in the prior art, when the key values are stored in the data cache 102, a data structure is allocated to each type of key value, and when the number of the key value types is large, a large storage space is occupied. According to the embodiment, the number of the allocated data structures can be reduced through the primary key values, so that the storage space is further reduced.
Further, as shown in fig. 2, the key value processing unit may include: a number determining subunit 21, adapted to determine the number of types of the primary key values according to the number of types of key values of the multiple types of data; a numerical value conversion subunit 22 adapted to convert key values of the plurality of types of data into a first numerical value string; a primary key value determining subunit 23, adapted to modulo the type number of the primary key value by the first numeric string, and convert the modulo value into a second numeric string as the primary key value; and a secondary key value determining subunit 24 adapted to select a set number of characters in the first numerical string as the secondary key value.
In a specific implementation, the number of primary key value types determined by the number determining subunit 21 is smaller than the number of key value types. To make the key values of the data differ more and be distributed more evenly, the key values of the multiple data may be converted into the first numeric string, for example by using a Message Digest (MD) algorithm; more specifically, the fifth or sixth version of the message digest algorithm (MD5 or MD6) may be used.
The primary key value may be a value of the first numeric string modulo the number of types of the primary key value. Further, the primary key value may also be a second numeric string of modulo value conversion. The second numeric string may be a numeric string of fixed length, so that when a storage space is allocated for the first-level key value, the storage space of fixed length may be allocated, thereby further saving the storage space. Specifically, the conversion may be implemented by using a hash algorithm such as MD5 or MD 6.
In this embodiment, to preserve the differences among the first numeric strings, a set number of characters from the first numeric string is selected as the secondary key value, which is combined with the primary key value to distinguish and map the key values. For example, when the first numeric string is 32 characters, 16 of those characters can be selected as the secondary key value; when the first numeric string is 16 characters, 8 of them can be selected. It should be understood by those skilled in the art that the set number may be any other practicable number, and the embodiment of the invention is not limited thereto.
It should be noted that, if the differences between the key values of the various data are already large, the numerical value conversion subunit 22 need not convert the key values into a first numeric string: the primary key value determining subunit 23 may take the key value itself modulo the number of types of primary key values to obtain the primary key value, and the secondary key value determining subunit 24 may select a set number of characters from the key value as the secondary key value.
In a specific application scenario of the present invention, the number of types of primary key values is determined according to the number of types of key values of the original data and is recorded as num(level1_key). The key value of the original data is converted with the MD5 algorithm to obtain the first numeric string. The first numeric string is taken modulo num(level1_key), and the modulo value is converted with the MD5 algorithm to obtain the primary key value level1_key. The middle 16 characters of the first numeric string are taken as the secondary key value level2_key. Two-level key value pairs are thus obtained and used as the encoded key value of the original data.
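The application scenario above can be sketched as follows. This is a minimal illustration in Python; the function name, parameters, and the choice of the "middle 16 characters" slice are assumptions made for the example, not taken verbatim from the patent:

```python
import hashlib

def encode_key(raw_key: str, num_level1: int):
    """Derive a two-level key pair (level1_key, level2_key) from a raw key value."""
    # First numeric string: MD5 hex digest of the raw key (32 characters).
    first = hashlib.md5(raw_key.encode("utf-8")).hexdigest()
    # Primary key: the first numeric string modulo num(level1_key),
    # re-hashed with MD5 so the stored level1_key has a fixed length.
    mod_value = int(first, 16) % num_level1
    level1_key = hashlib.md5(str(mod_value).encode("utf-8")).hexdigest()
    # Secondary key: the middle 16 characters of the first numeric string.
    level2_key = first[8:24]
    return level1_key, level2_key
```

Because many raw keys share one level1_key, the cache needs far fewer top-level data structures, while the level2_key restores uniqueness within each bucket.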
Preferably, as shown in fig. 3, the data loading module 101 (refer to fig. 1) may include: a sorting unit 31, adapted to sort second data by key value, where first data is likewise sorted by key value, the second data being data in the data source and the first data being data in the data cache; a comparison unit 32, adapted to sequentially select the first data and the sorted second data and compare at least their key values to obtain a comparison result; a type determining unit 33, adapted to determine the type of the first data and/or the type of the second data according to the comparison result; and an updating unit 34, adapted to update the at least one data cache according to the type of the first data and/or the type of the second data.
According to the embodiment of the invention, the data type is determined and the update is performed according to that type; in other words, an incremental-update loading mode is adopted, which greatly reduces the time spent on data updating and improves loading efficiency. In addition, loading duplicate data is avoided, further saving cache storage space.
Further, the sorting unit 31 may sort the second data in ascending order of key values, the first data likewise being sorted in ascending order of key values. In this case the type determining unit 33 may include: a first type determining subunit (not shown), adapted to determine that the first data is to-be-deleted data when the comparison result indicates that the key value of the first data is smaller than the key value of the second data; and a second type determining subunit (not shown), adapted to determine that the second data is to-be-added data when the comparison result indicates that the key value of the first data is greater than the key value of the second data.
Further, when the key value of the first data is consistent with the key value of the second data, the comparing unit 32 compares the tag value of the first data with the tag value of the second data to obtain the comparison result. The type determining unit 33 may include: a third type determining subunit (not shown), adapted to determine that the second data is changed data when the comparison result indicates that the tag value of the first data is not consistent with the tag value of the second data.
It should be noted that, in a variation of the present invention, the sorting unit 31 may also sort the second data according to the descending order of key values, and sort the first data according to the descending order of key values. In this case, the first type determining subunit is adapted to determine that the first data is to-be-deleted data when the comparison result indicates that the key value of the first data is greater than the key value of the second data; and the second type determining subunit is suitable for determining the second data as the data to be newly added when the comparison result shows that the key value of the first data is smaller than the key value of the second data.
Further, the updating unit 34 deletes the data to be deleted, and loads the change data and the data to be newly added to the at least one data cache.
In an embodiment of the present invention, the data loading module 101 may acquire data from the data source in any of the following ways: pulling data by accessing a data source Application Programming Interface (API); acquiring data via the Secure File Transfer Protocol (SFTP); obtaining data from a streaming data source such as Kafka; or obtaining data from a big data processing platform such as Hadoop.
Data files are usually large, so they may be fragmented. Specifically, a file may be divided evenly into N parts according to the number of Central Processing Unit (CPU) cores C and the load L of the server hosting the data cache. N is computed as follows: N = 1 if the server is single-core; otherwise N = C/4 if L > C/2, and N = C/2 otherwise, where N is a positive integer.
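The fragmentation formula can be written out as follows. This is a minimal sketch; the max() guard for very small core counts is an added assumption, since the patent only states that N must be a positive integer:

```python
def shard_count(cores: int, load: float) -> int:
    """N = 1 on a single core; otherwise N = C/4 if L > C/2, else N = C/2."""
    if cores == 1:
        return 1
    n = cores // 4 if load > cores / 2 else cores // 2
    # Guard (assumption): keep N a positive integer when cores < 4.
    return max(n, 1)
```

Under light load the file is split more aggressively (C/2 parts) to use idle cores; under heavy load the split is halved (C/4 parts) to limit extra contention.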
The data loading module 101 may sort the fragmented files, or may sort the data directly. The sorted data files are then compared with the data files in the cache to obtain the data to be added, the changed data, and the data to be deleted.
Preferably, the data query module 103 may pre-establish a coroutine pool and a connection pool, where the coroutine pool includes multiple threads and the connection pool includes multiple connections from demander access interfaces to the cache interface.
According to the embodiment of the invention, with the pre-established coroutine pool and connection pool, when a query request arrives a thread can be taken directly from the coroutine pool and a connection from the demander access interface to the cache interface can be taken directly from the connection pool. This realizes fast data query and further provides a high-concurrency, low-latency query access service.
Specifically, an HTTP service may be constructed using the Hypertext Transfer Protocol (HTTP), with the connection pool and the coroutine pool created when the service starts. When a query request arrives, the data query module 103 first takes a coroutine from the coroutine pool, then takes a connection from the connection pool, accesses the data cache 102, and returns the access result.
Services can also be constructed by using a Thrift mode, and a connection pool and a coroutine pool are created when the services are started. When a query request occurs, the data query module 103 first fetches a coroutine from the coroutine pool, then fetches a connection from the connection pool, accesses the data cache 102, and returns an access result.
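The pooled query flow described above can be sketched as follows. This is a minimal illustration using Python's asyncio, with plain strings standing in for real cache connections; all class, function, and key names are hypothetical:

```python
import asyncio

class ConnectionPool:
    """Pre-established pool of connections to the cache interface (stand-ins)."""
    def __init__(self, size: int):
        self._conns = asyncio.Queue()
        for i in range(size):
            # A real system would open connections to the data cache here.
            self._conns.put_nowait(f"conn-{i}")

    async def acquire(self):
        return await self._conns.get()

    def release(self, conn):
        self._conns.put_nowait(conn)

async def handle_query(pool: ConnectionPool, key: str) -> str:
    conn = await pool.acquire()      # reuse an existing connection, no setup cost
    try:
        return f"{conn} -> {key}"    # stand-in for querying the data cache
    finally:
        pool.release(conn)           # return the connection for the next request

async def main():
    pool = ConnectionPool(size=4)
    # Many concurrent requests share the four pre-established connections.
    return await asyncio.gather(*(handle_query(pool, f"key{i}") for i in range(10)))
```

Because acquire() simply dequeues an existing connection, the per-request cost of connection setup disappears, which is the source of the low-latency behavior the text describes.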
Fig. 4 is a schematic structural diagram of another data caching service system according to an embodiment of the present invention.
The data caching service system 40 shown in fig. 4 may include: a data loading module 401, a data cache cluster 402, a data query module 403, and a reverse proxy and load balancing module 404.
In this embodiment, the reverse proxy and load balancing module 404 is adapted to receive a plurality of query requests of the demander 10 and distribute them evenly to the at least one data query module 403. The data query module 403 accesses the data cache cluster 402 and feeds the access result back to the demander 10, completing the request. Meanwhile, the data loading module 401 may update the source data of the supplier 20 to the data cache cluster 402 in real time.
When the data loading module 401 loads data from the supplier 20 into the data cache cluster 402, the requirement of efficient and real-time update can be met, and the external service capability and stability of the data cache cluster 402 are not affected in the loading process.
Fig. 5 is a flowchart of a data caching service method according to an embodiment of the present invention.
The data caching service method shown in fig. 5 may include the steps of:
step S501: loading multiple data of multiple data sources, wherein each data has a label value and a key value;
step S502: when the multiple types of data are obtained from the data loading module, encoding the tag value and/or the key value of each datum, so that the storage space occupied by the encoded tag value is smaller than that occupied by the tag value and the storage space occupied by the encoded key value is smaller than that occupied by the key value, and storing the encoded tag values and encoded key values of the multiple types of data;
step S503: and performing matching query on the various data in the at least one data cache according to the query request of the demand side.
By encoding the tag value and/or the key value of the data, the encoded data occupies less storage space than the original data, greatly reducing data expansion and saving cache storage space. In addition, the saved cache storage space allows the data query service to be provided more quickly, improving the performance of the data caching service system.
Preferably, step S502 may include the steps of: the tag value of each datum is encoded to form an identification code corresponding to the tag value, the identification code including characters and/or numbers.
In one embodiment of the present invention, the identification code may be a number. All tag values are encoded; for example, tag value1 is encoded as 0, tag value2 as 1, tag value3 as 2, …, and tag valueN as N-1. That is, if the data tag value is "value1", the integer 0 is stored, and if it is "value2", the integer 1 is stored.
In another embodiment of the present invention, the identification code may be a character: if the data tag value is "value1", the character a is stored, and if it is "value2", the character b is stored.
In still another embodiment of the present invention, the identification code may be characters and numbers: if the data tag value is "value1", the characters a1 are stored, and if it is "value2", the characters a2 are stored.
By encoding the tag values of the data and establishing a mapping between tag values and identification codes, the data cache can store tag values in shared form; that is, the tag value of identical data need be stored only once, greatly reducing storage space occupation.
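The tag-to-identification-code mapping described above can be sketched as follows. This is a minimal illustration of the first (numeric) embodiment; the class and method names are hypothetical:

```python
class TagEncoder:
    """Shared-storage tag encoding: each distinct tag value maps to one compact code."""
    def __init__(self):
        self._codes = {}   # tag value -> integer identification code
        self._values = []  # identification code -> tag value

    def encode(self, tag: str) -> int:
        # First occurrence of a tag value assigns the next code (0, 1, 2, ...).
        if tag not in self._codes:
            self._codes[tag] = len(self._values)
            self._values.append(tag)
        return self._codes[tag]

    def decode(self, code: int) -> str:
        return self._values[code]
```

Each repeated tag value costs only one small integer in the cache rather than a full copy of the string, which is where the storage saving comes from.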
Preferably, step S502 may include the steps of: and obtaining two-level key value pairs according to the key value of each datum to be used as the coded key value, wherein the two-level key value pairs comprise primary key values and secondary key values, and the type number of the primary key values is smaller than that of the key values.
In specific implementation, the mapping relation between the two levels of key value pairs and the key values is established by encoding the key values. The number of the types of the first-level key values is smaller than that of the types of the key values, so that the number of the types of the first-level key values stored in the data cache is smaller than that of the types of the key values, the same key values only need to be stored in the cache once, and the occupation amount of a storage space is reduced. In addition, in the prior art, when key values are stored in a data cache, a data structure is allocated to each type of key value, and when the number of types of key values is large, a large storage space is occupied. According to the embodiment, the number of the allocated data structures can be reduced through the primary key values, so that the storage space is further reduced.
Preferably, the obtaining of a two-level key value pair from the key value of each datum includes: determining the number of types of primary key values according to the number of types of key values of the various data; converting the key values of the various data into a first numeric string; taking the first numeric string modulo the number of types of primary key values and converting the modulo value into a second numeric string to serve as the primary key value; and selecting a set number of characters from the first numeric string as the secondary key value.
Preferably, step S501 may include the steps of: sorting second data by key value, where first data is likewise sorted by key value, the second data being data in the data source and the first data being data in the data cache; sequentially selecting the first data and the sorted second data and comparing at least their key values to obtain a comparison result; determining the type of the first data and/or the type of the second data according to the comparison result; and updating the at least one data cache according to the type of the first data and/or the type of the second data.
In specific implementation, to make the differences between the key values of the data larger and their distribution more dispersed, the key values of the various data may be converted into the first numeric string, for example with a Message Digest (MD) algorithm; more specifically, the fifth or sixth version (MD5 or MD6) may be employed.
Further, the second data are sorted according to the sequence of key values from small to large, the first data are sorted according to the sequence of key values from small to large, and the determining the type of the first data and/or the type of the second data according to the comparison result includes: if the comparison result shows that the key value of the first data is smaller than the key value of the second data, determining that the first data is to-be-deleted data; and if the comparison result shows that the key value of the first data is larger than the key value of the second data, determining that the second data is the data to be newly added.
Further, the comparing of the key values of the at least selected first data and the second data includes: and if the key value of the first data is consistent with the key value of the second data, comparing the label value of the first data with the label value of the second data to obtain the comparison result.
Further, the determining the type of the first data and/or the type of the second data according to the comparison result includes: and if the comparison result shows that the label value of the first data is inconsistent with the label value of the second data, determining that the second data is changed data.
Further, the data updating the at least one data cache according to the type of the first data and/or the type of the second data includes: and deleting the data to be deleted, and loading the change data and the data to be newly added to the at least one data cache.
Preferably, the loading of the multiple types of data of the multiple data sources includes: storing the key values and tag values of the various data into the at least one data cache by pipeline transmission, via the Hypertext Transfer Protocol (HTTP), or via a Command Line Interface (CLI).
Preferably, a coroutine pool and a connection pool are pre-established, the coroutine pool comprises a plurality of threads, and the connection pool comprises a plurality of connections from the access interfaces of the demand side to the cache interfaces.
According to the embodiment of the invention, the pre-established coroutine pool and the connection pool can directly call the thread from the coroutine pool when the query request comes, and call the connection from the access interface of the demand party to the cache interface from the connection pool, so that the rapid query service of data is realized, and the high-concurrency and low-delay data query access service can be further provided.
Preferably, before step S503, the method further includes: receiving a plurality of query requests that have been evenly distributed.
For more details about the working mode of the data caching service method, reference may be made to the relevant descriptions of the embodiments shown in fig. 1 to fig. 4, which are not described herein again.
Referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment of step S501 shown in fig. 5.
In this embodiment, the data in the cached data file and in the sorted data file (i.e., the file provided by the data source) are denoted R1 and R2 respectively, their key values key1 and key2, and their tag values v1 and v2. Both the cached file and the sorted data file are sorted in ascending order of key values, and each may contain multiple pieces of data.
Step S501 may include the steps of:
step S601: judging whether key1 is smaller than key2; if so, proceeding to step S602, otherwise proceeding to step S603;
step S602: determining that the data R1 is to-be-deleted data, fetching the next piece of data from the cached data file, and returning to step S601;
step S603: judging whether key1 is greater than key2; if so, proceeding to step S604, otherwise proceeding to step S605;
step S604: determining that the data R2 is to-be-added data, fetching the next piece of data from the sorted data file, and returning to step S601;
step S605: judging whether the tag value v1 is equal to the tag value v2; if so, proceeding to step S606, otherwise proceeding to step S607;
step S606: determining that the data R1 is unchanged data, fetching the next piece of data from both the cached data file and the sorted data file, and returning to step S601;
step S607: determining that the data R2 is changed data, fetching the next piece of data from both the cached data file and the sorted data file, and returning to step S601.
Further, the following steps may also be performed after step S602, step S604, step S606, and step S607: if the data in the cached data file has been exhausted, all remaining unfetched data in the sorted data file is to-be-added data, and execution ends; if the data in the sorted data file has been exhausted, all remaining unfetched data in the cached data file is to-be-deleted data, and execution ends.
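The comparison loop of steps S601 to S607, together with the exhaustion handling just described, can be sketched as a merge over two key-sorted lists of (key, tag) pairs. This is a minimal illustration; all names are hypothetical:

```python
def diff_sorted(cached, source):
    """Merge-compare key-sorted (key, tag) lists per steps S601-S607.
    Returns (to_delete, to_add, changed, unchanged)."""
    to_delete, to_add, changed, unchanged = [], [], [], []
    i = j = 0
    while i < len(cached) and j < len(source):
        k1, v1 = cached[i]
        k2, v2 = source[j]
        if k1 < k2:               # S602: present only in cache -> delete
            to_delete.append(cached[i]); i += 1
        elif k1 > k2:             # S604: present only in source -> add
            to_add.append(source[j]); j += 1
        elif v1 == v2:            # S606: same key, same tag -> unchanged
            unchanged.append(cached[i]); i += 1; j += 1
        else:                     # S607: same key, new tag -> changed
            changed.append(source[j]); i += 1; j += 1
    to_add.extend(source[j:])     # cache exhausted: remaining source rows are new
    to_delete.extend(cached[i:])  # source exhausted: remaining cached rows are stale
    return to_delete, to_add, changed, unchanged
```

Because both inputs are pre-sorted, the comparison is a single linear pass, which is what makes the incremental-update loading mode fast on large files.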
Further, the following steps may also be performed after step S602, step S604, step S606, and step S607: processing the determined to-be-deleted data, to-be-added data, unchanged data, and changed data in real time; that is, deleting the to-be-deleted data from the cached data file, retaining the unchanged data in the cached data file, and loading the to-be-added data and the changed data into the cached data file.
In a variation of the embodiment of the present invention, the following steps may be further performed after step S602, step S604, step S606, and step S607: and respectively adding the determined data to be deleted, data to be newly added, invariant data and variant data into the file to be deleted, the file to be newly added, the invariant file and the variant file, and after all data in the cache data file and the sorted data file are taken out, respectively processing the file to be deleted, the file to be newly added, the invariant file and the variant file. In other words, the data to be deleted in the file to be deleted is deleted, the invariant data in the invariant file is retained, and the data to be added and the variant data in the file to be added and the variant file are loaded to the cache data file.
According to the embodiment of the invention, the data type is determined, and the updating is carried out according to the data type, namely, the data loading mode of incremental updating is adopted, so that the time spent on data updating is greatly reduced, and the data loading efficiency is improved; in addition, repeated data can be prevented from being loaded, and the storage space of the cache is further saved.
Furthermore, when the file to be deleted, the file to be added, the unchanged file, and the changed file are large, they can be further fragmented into multiple small files before the data is loaded.
In this embodiment, when the to-be-added data and the changed data are loaded into the cached data file, one of two loading modes may be selected: writing the data into the cache with a cache client, or converting the data into the cache protocol and transmitting it through a pipeline.
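As an illustration of the second mode: a Redis-like cache accepts commands encoded in its wire protocol (RESP) for pipelined bulk loading, so records can be pre-encoded into protocol bytes and streamed to the cache in one pass. The sketch below encodes one command; it assumes a RESP-compatible cache and is not taken from the patent itself:

```python
def to_resp(*args) -> bytes:
    """Encode one command in the RESP wire protocol used by Redis-like caches.

    Format: *<argc>\r\n then, per argument, $<bytes>\r\n<arg>\r\n.
    """
    parts = [f"*{len(args)}\r\n".encode()]
    for a in args:
        data = str(a).encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)

def encode_batch(records) -> bytes:
    """Concatenate SET commands for a batch of (key, value) records,
    ready to be piped to the cache in a single write."""
    return b"".join(to_resp("SET", k, v) for k, v in records)
```

Pre-encoding the whole batch avoids a request/response round trip per record, which is why pipeline transmission is markedly faster than issuing commands one at a time.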
In practical application, the data loading effects of the prior art and the embodiment of the invention are compared through tests, and the test results are shown in table 2.
Data volume            Prior-art time        This embodiment
1 billion records      3 hours 20 minutes    40 minutes
2.5 billion records    6 hours 32 minutes    1 hour

TABLE 2
As shown in table 2, for 1 billion pieces of data, loading into the data cache takes 3 hours and 20 minutes with the prior art but only 40 minutes with the embodiment of the present invention; for 2.5 billion pieces of data, the prior art takes 6 hours and 32 minutes, while the embodiment of the present invention takes only 1 hour.
Tests therefore prove that, for the same data volume, the data updating mode of the embodiment of the invention can take as little as 1/5 of the original updating time, greatly increasing the data update speed. Moreover, the loading time of the prior art grows linearly with the data volume, whereas the loading time of the embodiment of the invention grows sublinearly with the data volume, further improving the update speed as data grows.
The embodiment of the invention also discloses a computer readable storage medium, which stores computer instructions, and the computer instructions can execute the steps of the method shown in fig. 5 or fig. 6 when running. The storage medium may include ROM, RAM, magnetic or optical disks, etc.
The embodiment of the invention also discloses a terminal which can comprise a memory and a processor, wherein the memory is stored with computer instructions capable of running on the processor. The processor, when executing the computer instructions, may perform the steps of the method shown in fig. 5 or fig. 6. The terminal can include, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A data caching service system, comprising:
the data loading module is suitable for loading various data of multiple data sources, and each data has a label value and a key value;
at least one data cache, adapted to encode the tag value and/or the key value of each datum when the multiple types of data are obtained from the data loading module, so that the storage space occupied by the encoded tag value is smaller than that occupied by the tag value and the storage space occupied by the encoded key value is smaller than that occupied by the key value, and to store the encoded tag values and encoded key values of the multiple types of data;
at least one data query module, adapted to perform matching queries on the multiple types of data in the at least one data cache according to a query request of a demander;
the data caching includes:
the key value processing unit is suitable for obtaining two-level key value pairs according to the key value of each datum to serve as the coded key value, the two-level key value pairs comprise primary key values and secondary key values, and the type number of the primary key values is smaller than that of the key values;
the key-value processing unit includes: the quantity determining subunit is suitable for determining the type quantity of the primary key values according to the type quantity of the key values of the various data;
a numerical value conversion subunit adapted to convert the key values of the plurality of types of data into a first numerical value string;
a primary key value determining subunit, adapted to take the first numeric string modulo the number of types of primary key values and convert the modulo value into a second numeric string to serve as the primary key value;
and the secondary key value determining subunit is suitable for selecting the characters with the set number in the first numerical string as the secondary key value.
2. The data caching service system of claim 1, wherein the data caching comprises:
and the label value encoding unit is suitable for encoding the label value of each datum to form an identification code corresponding to the label value, and the identification code comprises characters and/or numbers.
3. The data caching service system of claim 1, wherein the data loading module comprises:
the sorting unit is suitable for sorting second data according to the size of the key value, wherein the first data is sorted according to the size of the key value, the second data is data in the data source, and the first data is data in the data cache;
the comparison unit is suitable for sequentially selecting the first data and the sorted second data and comparing key values of at least the selected first data and the second data to obtain a comparison result;
the type determining unit is suitable for determining the type of the first data and/or the type of the second data according to the comparison result;
and the updating unit is suitable for updating the data of the at least one data cache according to the type of the first data and/or the type of the second data.
4. The data cache service system according to claim 3, wherein the sorting unit sorts the second data in order of key values from small to large, and the first data is sorted in order of key values from small to large, and the type determination unit includes:
the first type determining subunit is adapted to determine that the first data is to-be-deleted data when the comparison result indicates that the key value of the first data is smaller than the key value of the second data;
and the second type determining subunit is adapted to determine that the second data is to-be-newly-added data when the comparison result shows that the key value of the first data is greater than the key value of the second data.
5. The data cache service system according to claim 4, wherein the comparing unit compares the tag value of the first data with the tag value of the second data to obtain the comparison result when the key value of the first data is consistent with the key value of the second data;
the type determination unit includes:
a third type determining subunit, adapted to determine that the second data is changed data when the comparison result indicates that the tag value of the first data is inconsistent with the tag value of the second data;
the updating unit deletes the data to be deleted and loads the change data and the data to be newly added to the at least one data cache.
6. The data cache service system of claim 1, wherein the data loading module stores the key values and tag values of the plurality of data into the at least one data cache by using a pipeline, HTTP or CLI interface.
7. The data caching service system of claim 1, wherein the data query module pre-establishes a coroutine pool and a connection pool, the coroutine pool comprising a plurality of threads, the connection pool comprising connections of a plurality of requestor access interfaces to the caching interface.
8. The data caching service system of claim 1, further comprising:
and the reverse proxy and load balancing module is suitable for receiving a plurality of query requests and uniformly distributing the query requests to the at least one data query module.
9. A data caching service method, comprising:
loading multiple data of multiple data sources, wherein each data has a label value and a key value;
when the multiple types of data are obtained, encoding the tag value and/or the key value of each datum, so that the storage space occupied by the encoded tag value is smaller than that occupied by the tag value and the storage space occupied by the encoded key value is smaller than that occupied by the key value, and storing the encoded tag values and encoded key values of the multiple types of data in at least one data cache;
performing matching query on the various data according to a query request of a demand party;
the encoding the key value of each data includes: obtaining two-level key value pairs according to the key value of each datum to be used as coded key values, wherein the two-level key value pairs comprise primary key values and secondary key values, the type number of the primary key values is smaller than that of the key values,
the obtaining of the two-level key value pairs according to the key value of each data comprises: determining the type number of the primary key values according to the type number of the key values of the various data;
converting the key values of the various data into a first numerical string;
taking the first numeric string modulo the number of types of primary key values and converting the modulo value into a second numeric string to serve as the primary key value;
and selecting a set number of characters in the first numerical string as the secondary key value.
10. The data caching service method of claim 9, wherein the encoding the tag value of each data comprises:
the tag value of each datum is encoded to form an identification code corresponding to the tag value, the identification code including characters and/or numbers.
11. The data caching service method of claim 9, wherein the loading the plurality of data from the plurality of data sources comprises:
sorting second data according to the size of the key values, wherein first data is sorted according to the size of the key values, the second data is data in the data source, and the first data is data in the data cache;
sequentially selecting the first data and the sorted second data, and at least comparing key values of the selected first data and the second data to obtain a comparison result;
determining the type of the first data and/or the type of the second data according to the comparison result;
and updating the data of the at least one data cache according to the type of the first data and/or the type of the second data.
12. The data caching service method of claim 11, wherein the second data are sorted in a descending order of key values, the first data are sorted in a descending order of key values, and determining the type of the first data and/or the type of the second data according to the comparison result comprises:
if the comparison result shows that the key value of the first data is smaller than the key value of the second data, determining that the first data is to-be-deleted data;
and if the comparison result shows that the key value of the first data is larger than the key value of the second data, determining that the second data is the data to be newly added.
13. The data caching service method of claim 12, wherein the comparing at least key values of the selected first data and the second data comprises:
if the key value of the first data is consistent with the key value of the second data, comparing the tag value of the first data with the tag value of the second data to obtain a comparison result;
the determining the type of the first data and/or the type of the second data according to the comparison result comprises:
if the comparison result shows that the tag value of the first data is inconsistent with the tag value of the second data, determining that the second data is changed data;
the updating the data of the at least one data cache according to the type of the first data and/or the type of the second data comprises:
deleting the data to be deleted, and loading the changed data and the data to be newly added into the at least one data cache.
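Claims 11 to 13 together describe a merge-style comparison of cache records against source records, both sorted by key value, that classifies each record as to-be-deleted, to-be-added, or changed. A minimal sketch, assuming `(key, tag)` record tuples; note that the claimed comparison rules (smaller cache key → delete, larger cache key → add) correspond to a merge over ascending-sorted lists, which is the order assumed here:

```python
def diff_cache_against_source(first_data, second_data):
    """Merge two key-sorted (key, tag) record lists and classify records.

    first_data:  records currently in the data cache.
    second_data: records currently in the data source.
    Returns (to_delete, to_add, changed) following claims 12 and 13:
      cache key smaller       -> cached record absent from source: delete it
      cache key larger        -> source record absent from cache: add it
      keys equal, tags differ -> source record is changed data
    """
    first_data = sorted(first_data)    # ascending by key (assumption, see lead-in)
    second_data = sorted(second_data)
    to_delete, to_add, changed = [], [], []
    i = j = 0
    while i < len(first_data) and j < len(second_data):
        (k1, t1), (k2, t2) = first_data[i], second_data[j]
        if k1 < k2:
            to_delete.append(first_data[i]); i += 1
        elif k1 > k2:
            to_add.append(second_data[j]); j += 1
        else:
            if t1 != t2:
                changed.append(second_data[j])
            i += 1; j += 1
    to_delete.extend(first_data[i:])   # cache keys past the source: stale
    to_add.extend(second_data[j:])     # source keys past the cache: new
    return to_delete, to_add, changed

cache  = [(1, "a"), (2, "b"), (4, "d")]
source = [(2, "b2"), (3, "c"), (4, "d")]
to_delete, to_add, changed = diff_cache_against_source(cache, source)
```

Each list is traversed once, so a full cache refresh costs O(n + m) comparisons after sorting, rather than a per-key lookup against the source.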
14. The data caching service method of claim 9, wherein the loading the plurality of data from the plurality of data sources comprises:
and loading the key values and the tag values of the plurality of data in a pipeline transmission mode or through an HTTP or CLI interface.
15. The data caching service method of claim 9, wherein a coroutine pool and a connection pool are pre-established, the coroutine pool comprising a plurality of threads, and the connection pool comprising connections from a plurality of demander access interfaces to the cache interface.
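A pre-established worker pool plus connection pool along the lines of claim 15 might look like the following sketch. The claim's "coroutine pool comprising a plurality of threads" is rendered here with a thread pool; the `connect` factory, the pool sizes, and the placeholder lookup are all assumptions:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class ConnectionPool:
    """Pre-established pool of connections to the cache interface.

    Hypothetical sketch: connections are created up front and handed out
    via a blocking queue, so each demander request reuses an open
    connection instead of paying the connection cost per query.
    """
    def __init__(self, connect, size: int):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        return self._pool.get()       # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)

# Worker pool (threads, per the claim's wording) plus a shared connection pool.
pool = ConnectionPool(connect=lambda: object(), size=4)
workers = ThreadPoolExecutor(max_workers=8)

def handle_query(key):
    conn = pool.acquire()
    try:
        return ("result-for", key)    # placeholder for a real cache lookup
    finally:
        pool.release(conn)

results = list(workers.map(handle_query, ["k1", "k2", "k3"]))
```

Pre-building both pools moves connection setup and worker startup out of the query path, which is what lets the cache answer matching queries with low, stable latency.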
16. The data caching service method according to claim 9, wherein before the performing matching query on the plurality of types of data according to the query request of the demander, the method further comprises:
receiving a plurality of evenly distributed query requests.
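The "evenly distributed" query requests of claim 16 could be produced, for example, by a round-robin dispatcher in front of the data query modules; the function name and module names below are hypothetical:

```python
from itertools import cycle

def round_robin_dispatch(requests, query_modules):
    """Distribute query requests evenly across data query modules.

    Hypothetical sketch of claim 16's even distribution: requests are
    assigned round-robin, as a load balancer in front of the query
    modules might do.
    """
    assignment = {}
    modules = cycle(query_modules)           # repeat module list endlessly
    for req in requests:
        assignment.setdefault(next(modules), []).append(req)
    return assignment

dist = round_robin_dispatch(["q1", "q2", "q3", "q4"], ["moduleA", "moduleB"])
```

With two modules and four requests, each module receives exactly two requests, so no single query module becomes a hotspot.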
17. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the data caching service method according to any one of claims 9 to 16.
CN201710670167.2A 2017-08-08 2017-08-08 Data caching service system and method and terminal Active CN107562804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710670167.2A CN107562804B (en) 2017-08-08 2017-08-08 Data caching service system and method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710670167.2A CN107562804B (en) 2017-08-08 2017-08-08 Data caching service system and method and terminal

Publications (2)

Publication Number Publication Date
CN107562804A CN107562804A (en) 2018-01-09
CN107562804B true CN107562804B (en) 2020-09-01

Family

ID=60974981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710670167.2A Active CN107562804B (en) 2017-08-08 2017-08-08 Data caching service system and method and terminal

Country Status (1)

Country Link
CN (1) CN107562804B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609926A (en) * 2019-09-20 2019-12-24 中国银行股份有限公司 Data tag storage management method and device
CN114402313A (en) * 2019-11-08 2022-04-26 深圳市欢太科技有限公司 Label updating method and device, electronic equipment and storage medium
CN110955682A (en) * 2019-11-22 2020-04-03 北京金山云网络技术有限公司 Method and device for deleting cache data, data cache and reading cache data
CN111259060B (en) * 2020-02-18 2023-08-15 北京百度网讯科技有限公司 Data query method and device
CN115334158A (en) * 2022-07-29 2022-11-11 重庆蚂蚁消费金融有限公司 Cache management method and device, storage medium and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101739424A (en) * 2008-11-13 2010-06-16 中国科学院计算机网络信息中心 Method and system for converting and storing keyword and resource record of keyword
US7788240B2 (en) * 2004-12-29 2010-08-31 Sap Ag Hash mapping with secondary table having linear probing
US8977597B2 (en) * 2008-05-21 2015-03-10 Oracle International Corporation Generating and applying redo records

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN103838844B (en) * 2014-03-03 2018-01-19 珠海市君天电子科技有限公司 A kind of key-value pair data storage, transmission method and device
US9626400B2 (en) * 2014-03-31 2017-04-18 Sandisk Technologies Llc Compaction of information in tiered data structure
CN104794162B (en) * 2015-03-25 2018-02-23 中国人民大学 Real-time data memory and querying method
CN105574076B (en) * 2015-11-27 2019-02-12 湖南大学 A kind of key-value pair storage organization and method based on Bloom Filter
CN105740442A (en) * 2016-02-02 2016-07-06 北京量科邦信息技术有限公司 Data storage method in HBase

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US7788240B2 (en) * 2004-12-29 2010-08-31 Sap Ag Hash mapping with secondary table having linear probing
US8977597B2 (en) * 2008-05-21 2015-03-10 Oracle International Corporation Generating and applying redo records
CN101739424A (en) * 2008-11-13 2010-06-16 中国科学院计算机网络信息中心 Method and system for converting and storing keyword and resource record of keyword

Also Published As

Publication number Publication date
CN107562804A (en) 2018-01-09

Similar Documents

Publication Publication Date Title
CN107562804B (en) Data caching service system and method and terminal
US9298775B2 (en) Changing the compression level of query plans
Lemire et al. Consistently faster and smaller compressed bitmaps with roaring
US11151126B2 (en) Hybrid column store providing both paged and memory-resident configurations
US10756759B2 (en) Column domain dictionary compression
US11249969B2 (en) Data storage method and apparatus, and storage medium
WO2020224091A1 (en) Sequence generation method and apparatus, computer device, and storage medium
CN104202423B (en) A kind of system by software architecture expansion buffer memory
CN109447274B (en) Distributed system for performing machine learning and method thereof
US20120130963A1 (en) User defined function database processing
CN107729353B (en) Distributed system for performing machine learning and method thereof
CN110442580B (en) Block chain state data storage method, equipment and storage medium
EP3848815B1 (en) Efficient shared bulk loading into optimized storage
US10866960B2 (en) Dynamic execution of ETL jobs without metadata repository
CN111858586B (en) Data processing method and device
CN111949648B (en) Memory data caching system and data indexing method
CN115905168B (en) Self-adaptive compression method and device based on database, equipment and storage medium
CN107590199B (en) Memory-oriented multithreading database design method
US9449046B1 (en) Constant-vector computation system and method that exploits constant-value sequences during data processing
CN112835932B (en) Batch processing method and device for business table and nonvolatile storage medium
CN114238390A (en) Data warehouse optimization method, device, equipment and storage medium
US11423000B2 (en) Data transfer and management system for in-memory database
US9160820B2 (en) Large volume data transfer
CN113448967A (en) Method and device for accelerating database operation
CN112487111A (en) Data table association method and device based on KV database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant