CN106681995B - Data caching method, data query method and device

Info

Publication number: CN106681995B
Application number: CN201510746534.3A
Authority: CN
Inventors: 苏国庆
Original assignee: Cainiao Smart Logistics Holding Ltd
Current assignee: Cainiao Smart Logistics Holding Ltd
Priority date: 2015-11-05
Filing date: 2015-11-05
Publication date: 2020-08-18
Anticipated expiration: 2035-11-05
Also published as: CN106681995A

Abstract

The application discloses a data caching method, a data query method and corresponding devices. The data caching method comprises: receiving data to be cached, and executing the following steps for each first data element included in the data to be cached: acquiring an identifier of the first data element; determining whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal; if the existing data element is stored in the memory, replacing the existing data element with the first data element; if the existing data element is not stored in the memory, determining whether the storage capacity of the memory reaches a set threshold; if the storage capacity of the memory does not reach the set threshold, storing the first data element and its identifier in the memory; and if the storage capacity of the memory reaches the set threshold, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory. Because the storage capacity of the memory is kept within the set threshold, the scheme avoids the heavy memory consumption that data replacement causes in the related art.

Description

Data caching method, data query method and device

Technical Field

The present application relates to the field of network technologies, and in particular, to a data caching method, a data querying method, and an apparatus.

Background

With the rapid development of network technology, more and more users acquire information through the network, and the volume of data that websites must process keeps growing; the big data era has arrived. The server of a website usually includes at least one big data terminal, a push server and a database. Different big data terminals may process different services of a user, and data may also be exchanged between big data terminals. That is, each big data terminal may process data acquisition requests from users and from other big data terminals. The time a big data terminal takes from receiving a data acquisition request to feeding back the data is the response time.

In order to reduce the response time, the push server acquires the full amount of data in the database in a set period, and then pushes to each big data terminal the data that terminal requires. According to a related data caching method, each big data terminal receives the data pushed by the push server in the set period. If existing data is present in the memory of the big data terminal, the existing data is replaced with the received data; if no existing data is present, the received data is stored directly in the memory. After the big data terminal receives a data acquisition request, it acquires the required data elements from the data cached in the memory and feeds them back to the user or other big data terminal that sent the request.

In this data caching method, to ensure that the big data terminal can continue to process data acquisition requests normally, two data sets are held in the memory while the existing data is being replaced. Each of the two data sets contains all the data elements stored in the database that the big data terminal requires, so the data volume is very large. The replacement process therefore occupies a large amount of memory resources, which easily leads to insufficient memory and may even cause memory recovery and downtime of the big data terminal.

Disclosure of Invention

The embodiments of the application provide a data caching method, a data query method and corresponding devices, which are used to solve the problem in the related art that the data replacement process occupies a large amount of memory resources, which easily leads to insufficient memory and may even cause memory recovery and big data terminal downtime.

According to the embodiment of the application, a data caching method is provided, which is applied to a big data terminal and comprises the following steps:

receiving data to be cached, and executing, for each first data element included in the data to be cached:

acquiring an identifier of the first data element;

determining whether existing data elements corresponding to the identifier of the first data element are stored in a memory of the big data terminal;

if the existing data elements are determined to be stored in the memory, replacing the existing data elements with the first data elements;

if the existing data elements are not stored in the memory, determining whether the memory capacity of the memory reaches a set threshold value; if the storage capacity of the memory does not reach the set threshold value, storing the first data element and the identification thereof in the memory; and if the storage capacity of the memory reaches the set threshold value, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory.

Specifically, acquiring the identifier of the first data element includes:

looking up a selected field from the first data element;

and obtaining the numerical value of the selected field to obtain the identifier of the first data element.

Specifically, determining whether an existing data element corresponding to the identifier of the first data element is stored in the memory of the big data terminal includes:

acquiring a doubly linked list with an index stored in the memory;

determining whether an identification of the first data element is included in an index of the doubly linked list;

if the index of the doubly linked list comprises the identifier of the first data element, determining that the existing data element is stored in the memory;

and if the index of the doubly linked list does not include the identifier of the first data element, determining that the existing data element is not stored in the memory.

Optionally, after determining that the existing data element is stored in the memory, the method further includes:

acquiring a first node corresponding to the identifier of the first data element in the doubly linked list;

acquiring the existing data element from the data field of the first node;

comparing whether the content included in the first data element is identical to the content included in the existing data element;

and if the content included in the first data element is not identical to the content included in the existing data element, replacing the existing data element with the first data element.

Specifically, storing the first data element and the identifier thereof in the memory specifically includes:

storing the identifier of the first data element in the index of the doubly linked list;

and establishing a second node corresponding to the identifier of the first data element in the doubly linked list, and storing the first data element in a data field of the second node.

Specifically, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage location of the first data element in the memory specifically includes:

saving the first data element in a file of an external memory of the big data terminal;

acquiring the storage position of the first data element in an external memory of the big data terminal;

and establishing an item comprising the identifier and the storage position of the first data element in the corresponding relation between the identifier and the storage position stored in the memory.

Optionally, the method further includes:

receiving a data acquisition request which is sent by a user or other big data terminals and carries the identification of the second data element;

inquiring the second data element or the storage position of the second data element in the memory according to the identifier of the second data element;

if the second data element is inquired, sending the second data element to the user or other big data terminals;

if the storage position of the second data element is inquired, acquiring the second data element from the external memory according to the storage position of the second data element, and sending the second data element to the user or other big data terminals; and storing the second data element in the memory.

Optionally, querying the second data element or the storage location of the second data element in the memory according to the identifier of the second data element specifically includes:

respectively searching the identification of the second data element from the index-bearing doubly linked list stored in the memory and the corresponding relation between the identification and the storage position;

if the identifier of the second data element is found in the doubly linked list, acquiring a third node corresponding to the identifier of the second data element from the doubly linked list, and acquiring the second data element from a data field of the third node;

and if the table entry comprising the identifier of the second data element is found in the corresponding relationship, acquiring the storage position of the second data element in the table entry.
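The two-way lookup described above can be illustrated with a minimal Python sketch. It assumes the index of the doubly linked list is a dictionary from identifier to node (`index`) and the correspondence between identifier and storage position is a second dictionary (`locations`); the names `Node`, `lookup`, `index` and `locations` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical list node; only the data field matters for a lookup."""
    data: Any


def lookup(identifier: str,
           index: Dict[str, Node],
           locations: Dict[str, int]) -> Tuple[Optional[Any], Optional[int]]:
    """Return (element, None) on a memory hit, (None, position) when the
    element is held in external memory, and (None, None) if it is unknown."""
    node = index.get(identifier)
    if node is not None:
        return node.data, None          # found in the doubly linked list
    position = locations.get(identifier)
    if position is not None:
        return None, position           # found in the correspondence
    return None, None


# Example data (assumed for illustration only).
index = {"order-1": Node({"status": "shipped"})}
locations = {"order-2": 4096}
hit = lookup("order-1", index, locations)
spilled = lookup("order-2", index, locations)
missing = lookup("order-3", index, locations)
```

When only a storage position is returned, the caller would then read the element from the external memory at that position.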

Optionally, after querying the second data element, the method further includes:

updating the number of times the second data element has been accessed in the third node; and

and adjusting the position of each node in the double linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the double linked list.
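One possible way to adjust node positions by access count is sketched below in Python. The application does not state the ordering direction, so this sketch assumes that more frequently accessed nodes sit closer to the head; the class and method names (`Node`, `CountedList`, `touch`) are hypothetical.

```python
from typing import Any, List, Optional


class Node:
    def __init__(self, key: str, data: Any) -> None:
        self.key = key
        self.data = data
        self.hits = 0                        # number of times accessed
        self.pre: Optional["Node"] = None    # previous node
        self.next: Optional["Node"] = None   # next node


class CountedList:
    """Doubly linked list kept in descending order of access count (assumed)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, node: Node) -> None:
        node.pre, node.next = self.tail, None
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def touch(self, node: Node) -> None:
        """Record one access, then move the node toward the head while its
        predecessor has been accessed fewer times."""
        node.hits += 1
        while node.pre is not None and node.pre.hits < node.hits:
            self._swap_with_previous(node)

    def _swap_with_previous(self, node: Node) -> None:
        prev = node.pre
        before, after = prev.pre, node.next
        # Relink as: before <-> node <-> prev <-> after
        node.pre, node.next = before, prev
        prev.pre, prev.next = node, after
        if before is None:
            self.head = node
        else:
            before.next = node
        if after is None:
            self.tail = prev
        else:
            after.pre = prev

    def keys(self) -> List[str]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out


lst = CountedList()
a, b, c = Node("a", 1), Node("b", 2), Node("c", 3)
for n in (a, b, c):
    lst.append(n)
lst.touch(c)          # c moves ahead of b and a
lst.touch(b)          # b moves ahead of a but stays behind c
order = lst.keys()
```

With this ordering the least-accessed element is always at the tail, which makes it cheap to find when an element has to be moved to the external memory.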

Specifically, storing the second data element in the memory specifically includes:

judging whether the memory space of the memory reaches the set threshold value;

if the memory space of the memory reaches the set threshold, determining a third data element with the least accessed times stored in the memory, storing the third data element in the external memory, and storing a table entry comprising an identifier and a storage position of the third data element in the corresponding relationship; deleting the identifier of the third data element stored in the index of the doubly linked list and the node where the third data element is located in the doubly linked list; establishing a fourth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in a data field of the fourth node, and storing the identifier of the second data element in an index of the doubly linked list; updating the accessed times of the second data elements in the fourth node, adjusting the position of each node in the double linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the double linked list;

if the memory capacity of the memory does not reach the set threshold value, establishing a fifth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in the data field of the fifth node, and storing the identifier of the second data element in the index of the doubly linked list; and updating the accessed times of the second data elements in the fifth node, adjusting the position of each node in the double linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the double linked list.
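The two branches above can be sketched in Python as follows. This is a simplified illustration, not the claimed implementation: the memory is modelled as a dictionary rather than a doubly linked list, the threshold is counted in elements rather than bytes, and the external memory is a dictionary keyed by a made-up storage position. All names (`promote`, `memory`, `external`, `locations`) are hypothetical.

```python
from typing import Any, Dict


def promote(identifier: str, element: Any,
            memory: Dict[str, dict], external: Dict[int, Any],
            locations: Dict[str, int], threshold: int) -> None:
    """Store an element fetched from external memory in the memory, first
    moving the least-accessed in-memory element out if the threshold is reached."""
    if len(memory) >= threshold:
        # Third data element: the one with the fewest accesses.
        victim = min(memory, key=lambda k: memory[k]["hits"])
        position = len(external)                 # hypothetical storage position
        external[position] = memory.pop(victim)["data"]
        locations[victim] = position             # identifier -> storage position
    # New node for the second data element, counted as accessed once.
    memory[identifier] = {"data": element, "hits": 1}
    # The source does not say whether the promoted element's old entry in
    # `locations` is removed; this sketch leaves it untouched.


memory = {"a": {"data": "A", "hits": 5}, "b": {"data": "B", "hits": 2}}
external = {0: "C"}
locations = {"c": 0}
promote("c", external[locations["c"]], memory, external, locations, threshold=2)
```

After the call, `b` (the least-accessed element) has been moved to the external memory and `c` occupies its place in the memory.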

According to the embodiment of the application, a data query method is further provided, which is applied to a big data terminal and comprises the following steps:

inquiring the second data element or the storage position of the second data element in the memory of the big data terminal according to the identifier of the second data element;

if the storage position of the second data element is inquired, acquiring the second data element from an external memory of the big data terminal according to the storage position of the second data element, and sending the second data element to the user or other big data terminals; and storing the second data element in the memory.

Specifically, querying the second data element or the storage location of the second data element in the memory according to the identifier of the second data element specifically includes:

judging whether the memory space of the memory reaches the set threshold value;

According to an embodiment of the present application, there is also provided a data caching apparatus, applied in a big data terminal, including:

a first receiving unit, configured to receive data to be cached, and execute, for each first data element included in the data to be cached:

an obtaining unit, configured to obtain an identifier of the first data element;

a determining unit, configured to determine whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal;

the cache unit is used for replacing the existing data element with the first data element if the existing data element is determined to be stored in the memory; if the existing data elements are not stored in the memory, determining whether the memory capacity of the memory reaches a set threshold value; if the storage capacity of the memory does not reach the set threshold value, storing the first data element and the identification thereof in the memory; and if the storage capacity of the memory reaches the set threshold value, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory.

Specifically, the obtaining unit is configured to obtain an identifier of the first data element, and specifically configured to:

looking up a selected field from the first data element;

Specifically, the determining unit is configured to determine whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal, and specifically includes:

acquiring a doubly linked list with an index stored in the memory;

Optionally, the determining unit is further configured to:

after the existing data elements are stored in the memory, acquiring a first node corresponding to the identifier of the first data element in the doubly linked list;

acquiring the existing data element from the data field of the first node;

and if the content included in the first data element is not identical to the content included in the existing data element, turning to the cache unit.

Specifically, the cache unit is configured to store the first data element and the identifier thereof in the memory, and specifically configured to:

Specifically, the cache unit is configured to store the first data element in an external memory of the big data terminal, and store the identifier and the storage location of the first data element in the memory, and specifically configured to:

Optionally, the apparatus further includes:

the second receiving unit is used for receiving a data acquisition request which is sent by a user or other big data terminals and carries the identification of the second data element;

a query unit, configured to query, in the memory, the second data element or a storage location of the second data element according to the identifier of the second data element;

a sending unit, configured to send the second data element to the user or another big data terminal if the second data element is queried; if the storage position of the second data element is inquired, acquiring the second data element from the external memory according to the storage position of the second data element, and sending the second data element to the user or other big data terminals; and storing the second data element in the memory.

Specifically, the querying unit is configured to query, in the memory, the second data element or a storage location of the second data element according to the identifier of the second data element, and is specifically configured to:

Optionally, the sending unit is further configured to:

after the second data element is queried by the query unit, updating the number of times the second data element has been accessed in the third node; and

Specifically, the sending unit is configured to store the second data element in the memory, and is specifically configured to:

judging whether the memory space of the memory reaches the set threshold value;

According to an embodiment of the present application, there is also provided a data query device, applied in a big data terminal, including:

the receiving unit is used for receiving a data acquisition request which is sent by a user or other big data terminals and carries the identification of the second data element;

the query unit is used for querying the second data element or the storage position of the second data element in the memory of the big data terminal according to the identifier of the second data element;

a sending unit, configured to send the second data element to the user or another big data terminal if the second data element is queried; if the storage position of the second data element is inquired, acquiring the second data element from an external memory of the big data terminal according to the storage position of the second data element, and sending the second data element to the user or other big data terminals; and storing the second data element in the memory.

Optionally, the sending unit is further configured to:

judging whether the memory space of the memory reaches the set threshold value;

The embodiments of the application provide a data caching method, a data query method and corresponding devices, wherein data to be cached is received, and the following steps are executed for each first data element included in the data to be cached: acquiring an identifier of the first data element; determining whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal; if the existing data element is stored in the memory, replacing the existing data element with the first data element; if the existing data element is not stored in the memory, determining whether the storage capacity of the memory reaches a set threshold; if the storage capacity of the memory does not reach the set threshold, storing the first data element and its identifier in the memory; and if the storage capacity of the memory reaches the set threshold, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory. In this scheme, the data to be cached that is pushed by the push server or the database is not stored entirely in the memory; instead, it is stored in a combination of the memory and the external memory, and the storage capacity of the memory is always kept within the set threshold. This effectively avoids the problem in the related art that the data replacement process occupies a large amount of memory resources, which easily leads to insufficient memory and may even cause memory recovery and big data terminal downtime.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flowchart of a data caching method in an embodiment of the present application;

FIG. 2 is a flowchart of S12 in the embodiment of the present application;

FIG. 3 is a flowchart of S13 in the embodiment of the present application;

FIG. 4 is a schematic structural diagram of a doubly linked list with an index in an embodiment of the present application;

FIG. 5 is a flowchart of a data query method in an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a data caching apparatus in an embodiment of the present application;

FIG. 7 is a schematic structural diagram of another data caching apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a data query device in an embodiment of the present application.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects to be solved by the present application clearer and clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In order to solve the problem in the related art that the data replacement process occupies a large amount of memory resources, which easily leads to insufficient memory and may even cause memory recovery and big data terminal downtime, an embodiment of the present application provides a data caching method. The method may be, but is not limited to being, applied to a big data terminal, which may be the only big data terminal in the server of a website or any big data terminal in a big data terminal cluster. The flow of the method is shown in FIG. 1 and specifically includes the following steps:

S11: receiving data to be cached, and executing S12 for each first data element included in the data to be cached.

The data to be cached may be pushed by a push server or a database, and the following describes the situation that the push server and the database push the data to be cached, respectively.

The push server will push the data to be cached in the following two situations:

In the first case, the push server obtains the full amount of data from the database in a first period, selects from the full data the data required by each big data terminal to obtain the data to be cached for each big data terminal, and then pushes that data to the corresponding big data terminal. The first period may be set according to actual needs, for example to 1 day or 2 days. Preferably, the push is performed at a time of day when traffic is relatively low, for example around midnight (0:00, 0:30, and the like).

In the second case, the push server may further obtain the update data from the database at a second period, select the update data of each big data terminal from the update data, obtain the data to be cached of each big data terminal, and then push the data to the corresponding big data terminal, where the second period is smaller than the first period, and may be set according to actual needs, and may be set to 1 hour, 2 hours, 3 hours, and so on. The push server pushes the update data of each big data terminal at regular time, so that the newest and most complete data stored in the big data terminal can be ensured, the big data terminal can be further ensured to process the service from the user in real time, and the service processing delay is reduced.

The database pushes data to be cached in the following situation. When a big data terminal is initialized, it first registers with the push server, and the push server then pushes to the big data terminal the data it requires from the full data acquired from the database. If the push server pushes the data to be cached to each big data terminal in the first set period, the most recent push took place at an earlier fixed time. To ensure that the latest data is acquired, the big data terminal may automatically request from the database the data for the interval between that fixed time and the current time, and the database pushes the data required by the big data terminal to the big data terminal. For example, if the push server pushes data to the big data terminals at 0 o'clock every day and the current time is 13 o'clock, the big data terminal needs to request the data between 0 o'clock and 13 o'clock today from the database, and the database pushes that data to the big data terminal.
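The interval that a newly initialized big data terminal has to request from the database can be computed as in the following Python sketch. It assumes a daily push at a fixed time of day; the function name `catch_up_window` and its parameters are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def catch_up_window(now: datetime,
                    push_time: time = time(0, 0)) -> Tuple[datetime, datetime]:
    """Return (start, end) of the interval not covered by the last daily push."""
    last_push = datetime.combine(now.date(), push_time)
    if last_push > now:                  # today's push has not happened yet
        last_push -= timedelta(days=1)
    return last_push, now


# Push at 0 o'clock, terminal initialized at 13 o'clock on the same day.
start, end = catch_up_window(datetime(2015, 11, 5, 13, 0))
```

Here `start` is 0 o'clock and `end` is 13 o'clock of the same day, matching the example in the text.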

S12: an identification of the first data element is obtained.

S13: determining whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal, and if the existing data element is determined to be stored in the memory, executing S14; if it is determined that the existing data element is not stored in the memory, S15 is executed.

Because the data volume of a single data element may be relatively large, the presence of the first data element in the memory is not checked by comparing whole data elements. Instead, the identifier of the first data element is obtained, and it is then determined whether an existing data element corresponding to that identifier is stored in the memory of the big data terminal.

S14: the existing data element is replaced with the first data element.

If the existing data element is stored in the memory, the existing data element may be directly replaced with the first data element in order to ensure that the latest data element is stored in the memory.

S15: determining whether the storage capacity of the memory reaches a set threshold, and if the storage capacity of the memory does not reach the set threshold, executing S16; if the storage amount of the memory reaches the set threshold, S17 is executed.

If the existing data element is not stored in the memory, it must be decided whether to store the first data element in the memory or in an external memory of the big data terminal. Specifically, it may be determined whether the memory still has free storage space, that is, whether the storage capacity of the memory has reached a set threshold. The set threshold may be set according to actual needs.

S16: and storing the first data element and the identification thereof in a memory.

S17: and storing the first data element in an external memory of the big data terminal, and storing the identification and the storage position of the first data element in a memory.

If the storage capacity of the memory reaches the set threshold, the first data element needs to be saved in the external memory. The identifier and the storage location of the first data element may be stored in the memory in the form of a Key/Value pair, where Key is the identifier of the first data element and Value is the storage location of the first data element in the external memory. In one embodiment, the storage location may specifically be an offset in the external memory. In another embodiment, the storage location may specifically be a discrete storage location in the external memory.
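The offset-based embodiment can be illustrated with the following Python sketch. It is only one possible layout: the application does not specify a file format, so the length-prefixed JSON records, the class name `ExternalStore` and the dictionary `locations` are assumptions made for illustration.

```python
import json
import os
import struct
import tempfile
from typing import Any, Dict


class ExternalStore:
    """Append-only file; each record is a 4-byte length followed by JSON bytes."""

    def __init__(self, path: str) -> None:
        self.path = path
        open(path, "ab").close()              # make sure the file exists

    def save(self, element: Any) -> int:
        """Append the element and return its offset (the storage location)."""
        payload = json.dumps(element).encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)   # current end of file
            f.write(struct.pack(">I", len(payload)) + payload)
        return offset

    def load(self, offset: int) -> Any:
        """Read back the element stored at the given offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            (length,) = struct.unpack(">I", f.read(4))
            return json.loads(f.read(length).decode("utf-8"))


tmp_dir = tempfile.mkdtemp()
store = ExternalStore(os.path.join(tmp_dir, "spill.bin"))
locations: Dict[str, int] = {}                # Key: identifier, Value: offset
locations["order-1"] = store.save({"id": "order-1", "status": "new"})
locations["order-2"] = store.save({"id": "order-2", "status": "paid"})
restored = store.load(locations["order-2"])
```

Only the small `locations` dictionary stays in the memory; the data elements themselves live in the file and are read back on demand.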

In this scheme, the data to be cached that is pushed by the push server or the database is not stored entirely in the memory; instead, it is stored in a combination of the memory and the external memory, and the storage capacity of the memory is always kept within the set threshold. This effectively avoids the problem in the related art that the data replacement process occupies a large amount of memory resources, which easily leads to insufficient memory and may even cause memory recovery and big data terminal downtime.
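The decision flow of S11 to S17 can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the memory is a dictionary, the external memory is a list whose indices serve as storage locations, and the threshold is counted in elements rather than bytes. The names `cache_all`, `get_id`, `memory`, `external` and `locations` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List


def cache_all(elements: Iterable[dict],
              get_id: Callable[[dict], str],
              memory: Dict[str, Any],
              external: List[Any],
              locations: Dict[str, int],
              threshold: int) -> None:
    for element in elements:                   # S11: each first data element
        identifier = get_id(element)           # S12: acquire the identifier
        if identifier in memory:               # S13: existing element in memory?
            memory[identifier] = element       # S14: replace it
        elif len(memory) < threshold:          # S15: threshold not reached
            memory[identifier] = element       # S16: store in memory
        else:
            external.append(element)           # S17: store in external memory
            locations[identifier] = len(external) - 1


memory: Dict[str, Any] = {}
external: List[Any] = []
locations: Dict[str, int] = {}
pushed = [{"id": "a", "v": 1}, {"id": "b", "v": 1},
          {"id": "c", "v": 1}, {"id": "a", "v": 2}]
cache_all(pushed, lambda e: e["id"], memory, external, locations, threshold=2)
```

Elements are handled one at a time, so a second full copy of the data set is never held in the memory during replacement.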

Each step of the above data caching method is described in detail below.

Specifically, the implementation process of obtaining the identifier of a data element in S12 is shown in fig. 2, and specifically includes the following steps:

S121: the selected field is looked up from the first data element.

S122: and acquiring the value of the selected field to obtain the identifier of the first data element.

The first data element usually includes a number of fields whose values distinguish it from other data elements. One or more of these fields may be chosen as the selected field, and the value of the selected field is used as the identifier of the first data element. The identifier is a globally unique identifier of the first data element and may consist of numbers, letters, characters and the like, or any combination of them.
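As a minimal Python sketch of S121 and S122, the identifier can be built from the values of one or more selected fields. The field names, the separator and the function name `make_identifier` are assumptions for illustration; the application does not prescribe how several field values are combined.

```python
from typing import Mapping, Sequence


def make_identifier(element: Mapping[str, object],
                    selected_fields: Sequence[str],
                    separator: str = "|") -> str:
    """Join the values of the selected fields into one identifier.

    Raises KeyError if a selected field is missing from the element."""
    return separator.join(str(element[name]) for name in selected_fields)


element = {"order_no": "A1001", "warehouse": "HZ", "status": "packed"}
single = make_identifier(element, ["order_no"])
combined = make_identifier(element, ["order_no", "warehouse"])
```

The selected fields must be chosen so that the resulting value is unique across all data elements; a field such as `status` in this example would not qualify on its own.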

Specifically, the implementation process of determining in S13 whether an existing data element corresponding to the identifier of the first data element is stored in the memory of the big data terminal is shown in FIG. 3, and specifically includes the following steps:

S131: acquiring the doubly linked list with an index stored in the memory.

The storage structure of the data elements in the memory can take various forms, such as a doubly linked list, a data table, and the like, and the doubly linked list with an index shown in fig. 4 is taken as an example for description, where:

head pointer (Head): points to the first node of the linked list; the first node of the linked list can be found from the head pointer;

previous-node pointer (Pre): points to the previous node of the current node; if there is no previous node, the current node is the first node;

next-node pointer (Next): points to the next node of the current node; if there is no next node, the current node is the last node;

data field (Data): the data entity of the current node, which may be service data and the like;

tail pointer (Tail): points to the last node of the linked list; the last node of the linked list can be found from the tail pointer.

The doubly linked list with an index described above takes the form of Key/Value pairs, where the identifier of a data element is the Key and the node where the data element is located is the Value.
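The structure can be sketched in Python as a doubly linked list paired with a dictionary that maps each identifier to its node. This is an illustrative sketch rather than the structure of FIG. 4 itself; the class and method names (`Node`, `IndexedList`, `add`, `contains`, `remove`) are hypothetical.

```python
from typing import Any, Dict, Optional


class Node:
    def __init__(self, data: Any) -> None:
        self.data = data                       # data field
        self.pre: Optional["Node"] = None      # previous-node pointer
        self.next: Optional["Node"] = None     # next-node pointer


class IndexedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None       # head pointer
        self.tail: Optional[Node] = None       # tail pointer
        self.index: Dict[str, Node] = {}       # Key: identifier, Value: node

    def contains(self, identifier: str) -> bool:
        """Check the index only; the list itself is never traversed."""
        return identifier in self.index

    def add(self, identifier: str, data: Any) -> Node:
        """Create a node at the tail and record it in the index."""
        node = Node(data)
        node.pre = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.index[identifier] = node
        return node

    def remove(self, identifier: str) -> Any:
        """Unlink the node for the identifier and drop its index entry."""
        node = self.index.pop(identifier)
        if node.pre is None:
            self.head = node.next
        else:
            node.pre.next = node.next
        if node.next is None:
            self.tail = node.pre
        else:
            node.next.pre = node.pre
        node.pre = node.next = None
        return node.data


lst = IndexedList()
lst.add("a", 1)
lst.add("b", 2)
lst.add("c", 3)
removed = lst.remove("b")
```

Because the index maps an identifier directly to its node, both the membership check and the removal avoid walking the list.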

S132: determining whether the index of the doubly linked list includes the identifier of the first data element, and if the index of the doubly linked list includes the identifier of the first data element, executing S133; if the index of the doubly linked list does not include the identifier of the first data element, S134 is performed.

S133: and determining that the existing data elements are stored in the memory.

S134: and determining that the existing data elements are not stored in the memory.

Because the identification of the data element in the doubly linked list with the index corresponds to the node where the data element is located, whether the first data element is stored in the memory can be determined according to whether the identification of the first data element is stored in the index of the doubly linked list, so that whether the first data element is stored in the memory can be determined quickly, and the data caching efficiency is improved.

Optionally, after determining that the existing data element is stored in the memory and before performing S14, the method further includes:

acquiring a first node corresponding to the identifier of the first data element in the doubly linked list;

acquiring the existing data element from the data field of the first node;

comparing whether the content included in the first data element is identical to the content included in the existing data element; and if the content is not identical, performing S14.

If the existing data element is stored in the memory, it can be further determined whether the existing data element is the same as the first data element. If they differ, that is, the content of the first data element has been updated relative to the existing data element while the identifier is unchanged, the existing data element only needs to be replaced with the first data element. If they are the same, the first data element may be ignored, and the determination continues with the next data element.

In an alternative embodiment, the step of comparing the content of the first data element with the content of the existing data element may be omitted, and the existing data element may be directly replaced with the first data element. That is, regardless of the content of the existing data element, the received first data element is treated by default as the most recent value, and the replacement is therefore performed.

Specifically, when the storage structure in the memory adopts a doubly linked list with an index, the step S16 replaces the existing data element with the first data element, which specifically includes: deleting the existing data elements stored in the data field of the first node; the first data element is stored in a data field of the first node.

When replacing the existing data element, the data element stored in the data field of the first node corresponding to the identifier of the first data element is actually replaced by the first data element.

Correspondingly, the step of storing the first data element and the identifier thereof in the memory in S17 specifically includes: storing an identifier of a first data element in an index of a doubly linked list; and establishing a second node corresponding to the identifier of the first data element in the doubly linked list, and storing the first data element in a data field of the second node.

The first data element and the identifier thereof are stored in the memory, that is, a Key/Value pair is to be established in a doubly linked list with an index, the Key is the identifier of the first data element, and the Value is a second node including the first data element.

Correspondingly, the step of storing the first data element in the external memory of the big data terminal, and storing the identifier and the storage location of the first data element in the memory in S17 specifically includes: saving the first data element in a file of an external memory of the big data terminal; acquiring the storage position of a first data element in an external memory of a big data terminal; and establishing an item comprising the identifier and the storage position of the first data element in the corresponding relation between the identifier and the storage position stored in the memory.

The external memory may be set according to actual needs, and preferably, a mass storage device with a high data reading/writing speed, such as a Solid State Drive (SSD), may be used, or a general mass storage device may be used. The data elements in the external memory may be stored in a file, and other storage manners may also be adopted, which are not described herein again.
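The three branches of the caching method (replace, store in memory, store in external memory with the location kept in memory) can be sketched as follows. This is a simplified illustration under stated assumptions: a plain dict stands in for the indexed doubly linked list, the set threshold is counted in elements, the external memory is a single append-only file, the storage position is an (offset, length) pair, and the identifier is taken from a field named `id`. None of these choices is prescribed by the text, and the name `TwoTierCache` is hypothetical.

```python
import json
import os
import tempfile


class TwoTierCache:
    def __init__(self, threshold, ext_path):
        self.threshold = threshold   # set threshold, counted in elements (assumption)
        self.memory = {}             # identifier -> data element held in memory
        self.locations = {}          # identifier -> (offset, length) in the external file
        self.ext_path = ext_path

    def put(self, element, id_field="id"):
        identifier = element[id_field]       # value of the selected field
        if identifier in self.memory:
            # Existing element found in memory: replace it.
            self.memory[identifier] = element
        elif len(self.memory) < self.threshold:
            # Threshold not reached: keep element and identifier in memory.
            self.memory[identifier] = element
        else:
            # Threshold reached: write to external memory, keep only the
            # identifier and the storage position in memory.
            payload = json.dumps(element).encode("utf-8")
            with open(self.ext_path, "ab") as f:
                f.seek(0, os.SEEK_END)
                offset = f.tell()
                f.write(payload)
            self.locations[identifier] = (offset, len(payload))

    def read_external(self, identifier):
        # Fetch an element from the external file using its stored position.
        offset, length = self.locations[identifier]
        with open(self.ext_path, "rb") as f:
            f.seek(offset)
            return json.loads(f.read(length).decode("utf-8"))
```

A cache created with `TwoTierCache(threshold=1, ext_path=os.path.join(tempfile.mkdtemp(), "ext.dat"))` keeps the first element in memory and sends a second, different element to the file. The sketch does not address an identifier that already has an external entry when it is cached again below the threshold; the text does not describe that case.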

The data caching method is described above. After the big data terminal caches data, it may also look up a required data element in the cached data. An embodiment of the present application therefore further provides a data query method, which may be, but is not limited to being, applied in the big data terminal. The flow of the method is shown in fig. 5 and includes the following steps:

S51: receiving a data acquisition request which is sent by a user or other big data terminals and carries the identification of the second data element.

When a user or other big data terminal needs to acquire data elements, the data elements may be defined as second data elements, and a data acquisition request carrying an identifier of the second data elements may be sent to the big data terminal.

S52: inquiring the second data element or the storage position of the second data element in the memory of the big data terminal according to the identifier of the second data element, and executing S53 if the second data element is inquired; if the storage location of the second data element is queried, S54 is executed.

Because the data elements or the storage positions of the data elements are stored in the memory of the big data terminal, and the query speed in the memory is very high, the second data elements or the storage positions of the second data elements can be directly searched in the memory.

Specifically, the identifier of the second data element may be searched from a doubly linked list with an index stored in the memory and a corresponding relationship between the identifier and the storage location, respectively; if the identification of the second data element is found in the doubly linked list, acquiring a third node corresponding to the identification of the second data element from the doubly linked list, and acquiring the second data element from a data field of the third node; and if the table entry comprising the identifier of the second data element is found in the corresponding relation, acquiring the storage position of the second data element in the table entry.

S53: sending the second data element to the user or other big data terminal.

S54: acquiring a second data element from an external memory of the big data terminal according to the storage position of the second data element, and sending the second data element to a user or other big data terminals; and storing the second data element in the memory.

In this scheme, the data elements in the big data terminal are stored in a combination of memory and external memory, and the identifiers and storage positions of the data elements held in the external memory are stored in the memory. Accordingly, when the second data element is queried, it can be acquired directly if it is stored in the memory; if it is stored in the external memory, its storage position can be looked up in the memory. Therefore, even if the second data element is stored in the external memory, its storage position can be obtained quickly and the second data element can then be acquired, which reduces the delay of data query, ensures the real-time performance of data query, and improves the user experience.
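The S52 to S54 flow can be sketched as a single function. This is an illustrative simplification: `memory` and `locations` are plain dicts, `read_external` is a hypothetical callable that fetches an element given its storage position, and the `None` result for an unknown identifier is an assumption, since the text does not say what happens in that case. The threshold check performed when storing the element back in memory is left out here.

```python
def query(identifier, memory, locations, read_external):
    """Return the element for `identifier`, or None if it is unknown."""
    if identifier in memory:
        # S52 -> S53: the element itself is found in memory.
        return memory[identifier]
    if identifier in locations:
        # S52 -> S54: only the storage position is found in memory,
        # so the element is read from external memory.
        element = read_external(locations[identifier])
        memory[identifier] = element   # S54: also store it in memory
        return element
    return None   # assumption: unknown identifiers yield None
```

With `external = {7: {"id": "b"}}` and `locations = {"b": 7}`, calling `query("b", memory, locations, external.__getitem__)` reads the element from the stand-in external store and leaves a copy in `memory` for later requests.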

The steps of the above data query method are described in detail below.

Optionally, after the second data element is queried in S52, the method further includes:

updating the number of times the second data element is accessed in the third node; and

adjusting the position of each node in the doubly linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the doubly linked list.

If the storage structure of the memory is a doubly linked list with an index, a counter can be added to each node of the doubly linked list to record the number of times the data element in that node has been accessed. In an alternative embodiment, if the nodes in the doubly linked list are sorted by the accessed count of their data elements, the position of the second data element may need to be adjusted after its accessed count in the third node is updated. For example, suppose the second data element is A and the doubly linked list also holds two data elements B and C. After the accessed count of A is updated, a node whose accessed count is greater than that of A is searched for, moving forward from the node where A is located. Assuming the accessed count of B is greater than that of A and the accessed count of C is less than that of A, the node where A is located is moved to behind the node where B is located and in front of the node where C is located, and the pointers in the nodes where A, B, and C are located are adjusted accordingly.
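The repositioning in the A, B, C example can be sketched as below. The names `Node`, `CountedList`, and `touch` are hypothetical, and the sketch assumes the list is kept in descending order of accessed count from the head, with the forward search skipping nodes whose count is less than or equal to that of the updated node.

```python
class Node:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
        self.count = 0     # accessed times of the data element in this node
        self.pre = None
        self.next = None


class CountedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def _unlink(self, node):
        # Detach the node and repair the pointers of its neighbours.
        if node.pre is not None:
            node.pre.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.pre = node.pre
        else:
            self.tail = node.pre
        node.pre = node.next = None

    def _insert_after(self, anchor, node):
        # Insert the node behind anchor; anchor None means "at the head".
        if anchor is None:
            node.next = self.head
            if self.head is not None:
                self.head.pre = node
            else:
                self.tail = node
            self.head = node
            return
        node.pre = anchor
        node.next = anchor.next
        if anchor.next is not None:
            anchor.next.pre = node
        else:
            self.tail = node
        anchor.next = node

    def append(self, node):
        self._insert_after(self.tail, node)

    def touch(self, node):
        # Update the accessed count, then search forward for the first
        # node whose count is greater and place this node right behind it.
        node.count += 1
        anchor = node.pre
        while anchor is not None and anchor.count <= node.count:
            anchor = anchor.pre
        if anchor is not node.pre:
            self._unlink(node)
            self._insert_after(anchor, node)

    def keys(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out
```

Starting from the order B, C, A with counts 3, 1, 1, a call to `touch` on the node of A raises its count to 2 and moves it between B and C, which matches the example in the text.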

Optionally, after the second data element is stored in the memory in S54, in order to avoid repeated storage, the storage location of the second data element stored in the memory and the second data element stored in the external memory may also be deleted, where the specific process is as follows: deleting the table entry comprising the identifier of the second data element and the storage position of the second data element from the corresponding relation; and deleting the second data element from the external memory.

Specifically, the implementation process of storing the second data element in the memory in S54 specifically includes the following steps:

First step: determining whether the storage amount of the memory reaches the set threshold; if the storage amount of the memory reaches the set threshold, performing the second step; if the storage amount of the memory does not reach the set threshold, performing the third step.

Second step: determining a third data element with the least accessed times stored in the memory, storing the third data element in the external memory, and storing a table entry comprising the identifier and the storage position of the third data element in the corresponding relationship; deleting the identifier of the third data element stored in the index of the doubly linked list and the node where the third data element is located in the doubly linked list; establishing a fourth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in the data field of the fourth node, and storing the identifier of the second data element in the index of the doubly linked list; and updating the accessed times of the second data element in the fourth node, adjusting the position of each node in the doubly linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the doubly linked list.

Since the third data element has been accessed the fewest times, it is less likely to be reused, so it can be saved in the external memory, and the memory space is reserved for the data elements accessed most often. In a specific embodiment, the second data element is A. If the storage amount of the memory reaches the set threshold, then after the third data element is deleted, the nodes are searched forward from the tail of the doubly linked list. Once a data element B whose accessed count is greater than 1 is found, the node including A is established behind the node where B is located; if no such B exists, the node including A is established at the tail of the doubly linked list. At the same time, the pointers in the nodes adjacent to the fourth node where A is located are adjusted, and the accessed count in the fourth node is updated to 1.

Third step: establishing a fifth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in the data field of the fifth node, and storing the identifier of the second data element in the index of the doubly linked list; and updating the accessed times of the second data element in the fifth node, adjusting the position of each node in the doubly linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the doubly linked list.

In a specific embodiment, the second data element is A. If the storage amount of the memory does not reach the set threshold, a fifth node including A is established directly at the tail of the doubly linked list, the pointer in the fifth node is adjusted at the same time, and the accessed count in the fifth node is updated to 1.
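The three steps for storing the second data element in the memory, together with the optional cleanup of its external copy, can be sketched compactly. To stay short, this sketch swaps the sorted doubly linked list for plain dicts and finds the least-accessed element with `min`; the function name `promote`, the list used as stand-in external memory, and the use of a list position as the storage position are all assumptions for illustration.

```python
def promote(identifier, element, memory, counts, locations, external, threshold):
    """Store an element fetched from external memory into memory (S54)."""
    # First step: check whether the memory has reached the set threshold.
    if len(memory) >= threshold:
        # Second step: move the least-accessed element to external memory
        # and record its identifier and storage position.
        victim = min(memory, key=lambda k: counts[k])
        position = len(external)            # hypothetical storage position
        external.append(memory.pop(victim))
        locations[victim] = position
        del counts[victim]
    # Second/third step: create the entry for the promoted element and
    # set its accessed count to 1.
    memory[identifier] = element
    counts[identifier] = 1
    # Optional cleanup: drop the storage position of the promoted element
    # and delete its copy from external memory to avoid repeated storage.
    old_position = locations.pop(identifier, None)
    if old_position is not None:
        external[old_position] = None
```

In a list-based version, the `min` call would be replaced by reading the node at the tail of the sorted doubly linked list, which avoids scanning all elements.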

It should be noted that the data caching method and the data query method are described separately above, but a big data terminal may execute the two methods separately or simultaneously.

Based on the same inventive concept, an embodiment of the present application further provides a data caching apparatus, which corresponds to the data caching method shown in fig. 1. The structure of the apparatus is shown in fig. 6 and includes:

a first receiving unit 61, configured to receive data to be cached, and for each first data element included in the data to be cached, perform:

an obtaining unit 62, configured to obtain an identifier of the first data element;

a determining unit 63, configured to determine whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal;

the cache unit 64 is configured to replace an existing data element with a first data element if it is determined that the existing data element is stored in the memory; if the existing data elements are not stored in the memory, determining whether the memory capacity of the memory reaches a set threshold value; if the memory capacity of the memory does not reach the set threshold value, storing the first data element and the identification thereof in the memory; and if the storage capacity of the memory reaches a set threshold value, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory.

Specifically, the obtaining unit 62 is configured to obtain an identifier of the first data element, and specifically configured to:

looking up a selected field from the first data element;

and acquiring the value of the selected field to obtain the identifier of the first data element.

Specifically, the determining unit 63 is configured to determine whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal, and specifically includes:

acquiring a bidirectional linked list with indexes stored in a memory;

determining whether an index of a doubly linked list includes an identification of a first data element;

if the index of the doubly linked list comprises the identifier of the first data element, determining that the existing data elements are stored in the memory;

and if the index of the doubly linked list does not comprise the identifier of the first data element, determining that the existing data element is not stored in the memory.

Optionally, the determining unit 63 is further configured to:

after it is determined that the existing data element is stored in the memory, acquiring a first node corresponding to the identifier of the first data element in the doubly linked list;

acquiring an existing data element from a data field of a first node;

Specifically, the cache unit 64 is configured to store the first data element and the identifier thereof in the memory, and specifically configured to:

storing an identifier of a first data element in an index of a doubly linked list;

Specifically, the cache unit 64 is configured to store the first data element in an external memory of the big data terminal, and store the identifier and the storage location of the first data element in the memory, and specifically configured to:

acquiring the storage position of a first data element in an external memory of a big data terminal;

According to an embodiment of the present application, another data caching apparatus is provided, the structure of which is shown in fig. 7. The description of the units that are the same as those of the apparatus shown in fig. 6 is omitted. On the basis of the apparatus shown in fig. 6, the apparatus further includes:

a second receiving unit 65, configured to receive a data acquisition request that is sent by a user or another big data terminal and carries an identifier of a second data element;

a querying unit 66, configured to query the memory for the second data element or a storage location of the second data element according to the identifier of the second data element;

a sending unit 67, configured to send the second data element to a user or another big data terminal if the second data element is queried; if the storage position of the second data element is inquired, acquiring the second data element from an external memory according to the storage position of the second data element, and sending the second data element to a user or other big data terminals; and storing the second data element in the memory.

Specifically, the querying unit 66 is configured to query, according to the identifier of the second data element, the memory for the second data element or a storage location of the second data element, and specifically configured to:

respectively searching the identification of the second data element from a bidirectional linked list with indexes stored in the memory and the corresponding relation between the identification and the storage position;

if the identification of the second data element is found in the doubly linked list, acquiring a third node corresponding to the identification of the second data element from the doubly linked list, and acquiring the second data element from a data field of the third node;

and if the table entry comprising the identifier of the second data element is found in the corresponding relation, acquiring the storage position of the second data element in the table entry.

Optionally, the sending unit 67 is further configured to:

updating the number of times the second data element is accessed in the third node after the second data element is queried by the querying unit 66; and

Specifically, the sending unit 67 is configured to store the second data element in the memory, and specifically configured to:

judging whether the memory capacity of the memory reaches a set threshold value or not;

if the memory capacity of the memory reaches a set threshold value, determining a third data element with the least accessed times stored in the memory, storing the third data element in an external memory, and storing a table entry comprising an identifier and a storage position of the third data element in a corresponding relation; deleting the identifier of the third data element stored in the index of the doubly linked list and the node where the third data element in the doubly linked list is located; establishing a fourth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in the data field of the fourth node, and storing the identifier of the second data element in the index of the doubly linked list; updating the accessed times of the second data element in the fourth node, adjusting the position of each node in the double linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the double linked list;

if the memory capacity of the memory does not reach the set threshold value, establishing a fifth node corresponding to the identifier of the second data element in the doubly linked list, storing the second data element in the data field of the fifth node, and storing the identifier of the second data element in the index of the doubly linked list; and updating the accessed times of the second data element in the fifth node, adjusting the position of each node in the doubly linked list according to the accessed times, and correspondingly adjusting the pointer in each node in the doubly linked list.

Based on the same inventive concept, an embodiment of the present application further provides a data query device, which corresponds to the data query method shown in fig. 5. The structure of the data query device is shown in fig. 8 and includes:

a receiving unit 81, configured to receive a data acquisition request carrying an identifier of a second data element, sent by a user or another big data terminal;

the query unit 82 is configured to query the second data element or a storage location of the second data element in the memory of the big data terminal according to the identifier of the second data element;

a sending unit 83, configured to send the second data element to a user or another big data terminal if the second data element is queried; if the storage position of the second data element is inquired, acquiring the second data element from an external memory of the big data terminal according to the storage position of the second data element, and sending the second data element to a user or other big data terminals; and storing the second data element in the memory.

Specifically, the querying unit 82 is configured to query, in the memory, the second data element or a storage location of the second data element according to the identifier of the second data element, and specifically configured to:

Optionally, the sending unit 83 is further configured to:

updating the number of times the second data element is accessed in the third node after the second data element is queried by the querying unit 82; and

Specifically, the sending unit 83 is configured to store the second data element in the memory, and specifically is configured to:

The foregoing description shows and describes preferred embodiments of the present application. As stated above, however, it is to be understood that the application is not limited to the forms disclosed herein, that these forms are not to be construed as excluding other embodiments, and that the application is capable of use in various other combinations, modifications, and environments, and of changes within the scope of the inventive concept expressed herein, commensurate with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art without departing from the spirit and scope of the application are to be protected by the claims appended hereto.

Claims

1. A data caching method is applied to a big data terminal, wherein the big data terminal belongs to a server side of a website, and the method is characterized by comprising the following steps:

acquiring an identifier of the first data element;

if the existing data elements are not stored in the memory, determining whether the memory capacity of the memory reaches a set threshold value; if the storage capacity of the memory does not reach the set threshold value, storing the first data element and the identification thereof in the memory; if the memory space of the memory reaches the set threshold value, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory;

and the data elements are used for the big data terminal to process a data acquisition request.

2. The method of claim 1, wherein obtaining the identifier of the first data element specifically comprises:

looking up a selected field from the first data element;

3. The method according to claim 1, wherein determining whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal specifically includes:

acquiring a bidirectional linked list with indexes stored in the memory;

4. The method of claim 3, wherein after determining that the existing data element is stored in the memory, further comprising:

acquiring the existing data element from the data field of the first node;

5. The method of claim 4, wherein storing the first data element and its identifier in the memory specifically comprises:

6. The method of claim 4, wherein storing the first data element in an external memory of the big data terminal, and storing the identity and storage location of the first data element in the memory, specifically comprises:

7. The method of any of claims 1-6, further comprising:

8. The method of claim 7, wherein querying the memory for the second data element or a storage location of the second data element according to the identifier of the second data element specifically comprises:

9. The method of claim 8, wherein querying the second data element further comprises:

10. The method of claim 8, wherein storing the second data element in the memory comprises:

judging whether the memory space of the memory reaches the set threshold value;

11. A data caching device is applied to a big data terminal, wherein the big data terminal belongs to a server side of a website, and the data caching device is characterized by comprising:

the cache unit is used for replacing the existing data element with the first data element if the existing data element is determined to be stored in the memory; if the existing data elements are not stored in the memory, determining whether the memory capacity of the memory reaches a set threshold value; if the storage capacity of the memory does not reach the set threshold value, storing the first data element and the identification thereof in the memory; if the memory space of the memory reaches the set threshold value, storing the first data element in an external memory of the big data terminal, and storing the identifier and the storage position of the first data element in the memory;

12. The apparatus according to claim 11, wherein the obtaining unit is configured to obtain the identifier of the first data element, and specifically is configured to:

looking up a selected field from the first data element;

13. The apparatus according to claim 11, wherein the determining unit is configured to determine whether an existing data element corresponding to the identifier of the first data element is stored in a memory of the big data terminal, and specifically includes:

acquiring a bidirectional linked list with indexes stored in the memory;

14. The apparatus of claim 13, wherein the determining unit is further configured to:

acquiring the existing data element from the data field of the first node;

15. The apparatus according to claim 14, wherein the cache unit is configured to store the first data element and the identifier thereof in the memory, and is specifically configured to:

16. The apparatus of claim 14, wherein the caching unit is configured to store the first data element in an external memory of the big data terminal, and store an identifier and a storage location of the first data element in the memory, and is specifically configured to:

17. The apparatus of any of claims 11-16, further comprising:

18. The apparatus as claimed in claim 17, wherein the querying unit is configured to query the memory for the second data element or a storage location of the second data element according to the identifier of the second data element, and is specifically configured to:

19. The apparatus of claim 18, wherein the sending unit is further configured to:

20. The apparatus as claimed in claim 18, wherein said sending unit is configured to store said second data element in said memory, and is specifically configured to:

judging whether the memory space of the memory reaches the set threshold value;