CN117827102A - Method for supporting data local cache and high concurrency security access


Info

Publication number
CN117827102A
Authority
CN
China
Prior art keywords
data
management area
area
lock
shared memory
Prior art date
Legal status
Pending
Application number
CN202311710296.1A
Other languages
Chinese (zh)
Inventor
李海兵
叶子聪
康江彬
王文娟
王妍兰
Current Assignee
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202311710296.1A
Publication of CN117827102A


Abstract

The invention belongs to the technical field of content distribution networks, and particularly relates to a method for supporting data local cache and high concurrency secure access. The invention supports highly concurrent, safe reads and writes of hundreds of millions of records, has low memory usage, generates zero memory fragmentation, does not lose cache data when the CDN cache gateway process exits normally or abnormally, and can control the total cache size, automatically evicting historical data when the data volume reaches the upper limit of cache storage.

Description

Method for supporting data local cache and high concurrency security access
Technical Field
The invention belongs to the technical field of content distribution networks, and particularly relates to a method for supporting data local cache and high concurrency security access.
Background
A content delivery network (CDN) is a network content service system built on an IP network that provides content delivery and services according to the efficiency, quality, and ordering requirements of content access and applications. It is an intelligent virtual network constructed on top of the existing network: by means of edge servers deployed in many locations and the central platform's load balancing, content delivery, and scheduling modules, users obtain the content they need from a nearby server, which reduces network congestion and improves access response speed.
Problems of the prior art:
Currently, many CDN cache gateway scenarios involve querying whether an element is in a data set, for example whether a client's IP is in an IP blacklist repository, or whether the URL a client requests is in a URL ban repository. Storing these repositories centrally in one place, such as a database that CDN node machines query over the network, is feasible in theory but not in practice, for the following main reasons:
The number of CDN node machines is in the tens of thousands. Assuming each node machine generates 3000 QPS (queries per second), if all node machines accessed the database, no single database host could serve the resulting load of roughly 30 million QPS. Building database clusters to absorb it would require a large number of clusters, with high management, maintenance, and operating costs.
The performance of the CDN cache gateway is critical: the shorter the time for a user to open a web page or video, the better. If every request accessed a database through the network protocol stack, network fluctuation would lengthen request times, and the whole request path would depend on the database being highly available; if that service went down, every request would be affected.
Since the remote access approach described above is impractical, the alternative is to move data-set access down to local access on each CDN node.
Disclosure of Invention
The invention aims to provide a method for supporting data local cache and high concurrency secure access that supports highly concurrent, safe reads and writes of hundreds of millions of records, has low memory usage, generates zero memory fragmentation, does not lose cache data when the CDN cache gateway process exits normally or abnormally, and can control the total cache size, automatically evicting historical data when the data volume reaches the upper limit of cache storage.
The technical scheme adopted by the invention is as follows:
a method for supporting data local cache and high concurrency security access comprises the steps of creating a shared memory through a mmap function provided by an operating system, wherein the shared memory establishes a mapping relation with a corresponding specific file on a disk file.
When data is written to the shared memory, the operating system synchronizes the shared memory data to the disk file in real time. On this basis, a write-cache-data flow and a query-data flow operate between the shared memory and the disk file, and when the cached data exceeds the capacity of the shared memory, the LRU eviction algorithm automatically deletes old data before writing new data.
The shared memory is divided into a lock management area, a data area, a hash bucket area, an algorithm management area and a transaction management area. The lock management area controls modification of the hash table's critical-section data, ensuring that only one process can modify the critical-section data at a time; locks are taken through the lock management area when data is read and written. The transaction management area ensures that a modification is either executed in full or not at all, avoiding intermediate states, and is responsible for clearing unreleased locks and rolling back intermediate-state transactions when a process exits abnormally.
The write-cache-data flow includes:
step 1, acquiring a write lock in the lock management area, opening a transaction in the transaction management area, and temporarily lifting write protection on the hash bucket memory pages;
step 2, computing the hash value of the cache data with the hash function;
step 3, locating the corresponding hash bucket by the hash value and checking whether it already holds an entry; if so, going to the next step, otherwise going to step 5;
step 4, following the index recorded in the hash bucket into the data area and checking whether the stored data matches the data being written; if it matches, updating it in place and going to step 8;
otherwise, continuing along the chain, and if no match is found, going to the next step;
step 5, looking for a free slot in the data area; if none is free, going to the next step, otherwise going to step 7;
step 6, using the algorithm management area to evict the least recently accessed record, turning its slot into free space in the data area;
step 7, claiming the free slot in the data area, writing the cache data into it, and storing the slot's index in the hash bucket;
and step 8, closing the transaction in the transaction management area, releasing the write lock in the lock management area, and restoring write protection on the hash bucket memory pages.
The query data flow includes:
S1, acquiring a read lock in the lock management area;
S2, computing the hash value of the cache data with the hash function;
S3, locating the corresponding hash bucket by the hash value and checking whether it holds a value; if so, going to the next step; otherwise returning "no match" and releasing the read lock in the lock management area;
S4, following the index recorded in the hash bucket into the data area and checking whether the stored data matches the queried data; if it matches, going to the next step;
otherwise, continuing along the chain, and if no match is found, returning "no match" and releasing the read lock in the lock management area;
S5, updating the data's access time in the algorithm management area;
S6, returning the matching result and releasing the read lock in the lock management area.
The technical effects of the invention are as follows:
the invention can realize the high concurrency safe read-write of hundreds of millions of data, has low memory occupation, can not lose cache data when the CDN cache gateway process normally (abnormally) exits when the zero fragments of the memory are generated, can control the total scale of the cache data when the data volume reaches the upper limit of cache storage, and automatically eliminates historical data.
To support highly concurrent reads and writes, the invention adds lock control over critical-section data and refines lock granularity into read locks and write locks, so read concurrency is higher.
The invention hardens the cache data with a memory page protection mechanism, preventing wild pointers and buffer overflows from corrupting the shared memory cache data.
By supporting automatic transaction rollback, the invention avoids intermediate-state transactions and dirty data, keeping the data strongly consistent.
The invention avoids memory fragmentation: stored data is normalized to a fixed length, eliminating the fragmentation that differing data sizes would otherwise cause.
Drawings
FIG. 1 is a block diagram of the present invention;
FIG. 2 is a flow chart of writing cache data in the present invention;
FIG. 3 is a flow chart of reading cached data in the present invention.
Detailed Description
To make the objects and advantages of the present invention clearer, the invention is specifically described below with reference to examples. It should be understood that the following text describes only one or more specific embodiments of the invention and does not strictly limit the scope of the invention as claimed.
As shown in FIG. 1, a method for supporting data local cache and high concurrency secure access includes creating a shared memory region through the mmap function provided by the operating system, wherein the shared memory is mapped to a corresponding specific file on disk.
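For illustration, here is a minimal C sketch of this creation step. It is a sketch under stated assumptions, not the patent's implementation: the backing file path (CACHE_FILE) and mapping size (CACHE_SIZE) are hypothetical placeholders.

    /* Minimal sketch: file-backed shared memory via mmap().
     * CACHE_FILE and CACHE_SIZE are illustrative placeholders. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define CACHE_FILE "/var/cache/cdn/local_cache.bin"  /* hypothetical path */
    #define CACHE_SIZE (64UL * 1024 * 1024)              /* hypothetical size */

    static void *create_shared_cache(void)
    {
        int fd = open(CACHE_FILE, O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return NULL; }

        /* Size the backing file so the whole mapping is valid. */
        if (ftruncate(fd, (off_t)CACHE_SIZE) != 0) {
            perror("ftruncate");
            close(fd);
            return NULL;
        }

        /* MAP_SHARED makes writes visible to other processes and lets the
         * kernel write dirty pages back to the disk file automatically. */
        void *base = mmap(NULL, CACHE_SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        close(fd);                  /* the mapping remains valid after close */
        return base == MAP_FAILED ? NULL : base;
    }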
When data is written to the shared memory, the operating system synchronizes the shared memory data to the disk file in real time; on this basis, a write-cache-data flow and a query-data flow are implemented between the shared memory and the disk file.
When the cached data exceeds the capacity of the shared memory, the LRU eviction algorithm automatically deletes old data before writing new data. Different functional partitions are created in the shared memory, dividing it into a lock management area, a data area, a hash bucket area, an algorithm management area and a transaction management area. On top of fast query and insertion, this lets the cached data support high concurrency: transaction processing is added to cache writes, with rollback of abnormal intermediate transactions, and security hardening is applied to the cached data to prevent illegal modification from corrupting it. The shared memory is divided into the following regions:
Lock management area: controls modification of the hash table's critical-section data, ensuring that only one process can modify the critical-section data at a time.
Data area: a fixed-length array in which every element has the same size. Each element stores the externally written data together with its metadata, such as the actual length of the data's query key and the array index of the next entry with the same hash value.
Hash bucket area: a row of consecutively arranged buckets, each storing an index into the data area above. Because only array indices are stored, the space cost is small; suitably enlarging the hash bucket area therefore does not consume much memory, yet it reduces the probability of hash collisions.
Algorithm management area: records the access order of each entry; when memory is full, the entry written earliest and not accessed since is evicted.
Transaction management area: ensures that a modification is either fully applied or not applied at all, avoiding intermediate states. When a process exits abnormally, it is responsible for clearing unreleased locks and rolling back intermediate-state transactions.
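The patent does not fix a binary layout for these five regions; the C sketch below shows one plausible arrangement inside the single mmap-created mapping, with all field names, sizes, and ordering chosen here purely for illustration.

    /* Illustrative layout of the five regions; all names and sizes are
     * assumptions, not taken from the patent. */
    #include <pthread.h>
    #include <stdint.h>

    #define SLOT_PAYLOAD 32          /* fixed slot size, e.g. an MD5 hex digest */
    #define NUM_SLOTS    (1u << 20)
    #define NUM_BUCKETS  (1u << 21)  /* oversized bucket array to cut collisions */

    typedef struct {                 /* lock management area */
        pthread_rwlock_t rwlock;     /* must be initialized process-shared */
    } lock_area_t;

    typedef struct {                 /* one fixed-length element of the data area */
        uint32_t key_len;            /* actual length of the query key */
        uint32_t next;               /* index of the next entry with the same hash */
        char     payload[SLOT_PAYLOAD];
    } data_slot_t;

    typedef struct {                 /* algorithm (LRU) management area */
        uint64_t last_access[NUM_SLOTS];
    } lru_area_t;

    typedef struct {                 /* transaction management area */
        uint32_t in_progress;        /* nonzero while a write transaction is open */
        uint32_t owner_pid;          /* lets a restart detect a crashed writer */
    } txn_area_t;

    typedef struct {
        lock_area_t lock;                 /* lock management area */
        txn_area_t  txn;                  /* transaction management area */
        uint32_t    buckets[NUM_BUCKETS]; /* hash bucket area: data-area indices */
        lru_area_t  lru;                  /* algorithm management area */
        data_slot_t data[NUM_SLOTS];      /* data area: fixed-length array */
    } shared_cache_t;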
Referring to FIG. 2, the flow of writing cache data into the shared memory is as follows (an illustrative C sketch appears after step 8):
and performing write protection on the shared memory page during initialization to prevent accidental modification.
1. Acquire a write lock in the lock management area, open a transaction in the transaction management area, and temporarily lift write protection on the hash bucket memory pages.
2. Compute the hash value of the cache data with the hash function.
3. Locate the corresponding hash bucket by the hash value and check whether it already holds an entry; if so, go to the next step, otherwise go to step 5.
4. Follow the index recorded in the hash bucket into the data area and check whether the stored data matches the data being written; if it matches, update it in place and go to step 8. Otherwise continue along the chain, and if no match is found, go to the next step.
5. Look for a free slot in the data area; if none is free, go to the next step, otherwise go to step 7.
6. Use the algorithm management area to evict the least recently accessed record, turning its slot into free space in the data area.
7. Claim the free slot in the data area, write the cache data into it, and store the slot's index in the hash bucket.
8. Close the transaction in the transaction management area, release the write lock in the lock management area, and restore write protection on the hash bucket memory pages.
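As a concrete illustration of steps 1 through 8, the sketch below implements a simplified write path over the shared_cache_t layout sketched above. The FNV-1a hash, the linear free-slot scan, and the eviction scan are stand-ins chosen here (the patent names no hash function); the page-protection toggling of steps 1 and 8 is omitted; buckets are assumed initialized to EMPTY.

    #include <stdint.h>
    #include <string.h>

    #define EMPTY 0xFFFFFFFFu        /* "no entry" marker for buckets/next */

    static uint64_t lru_clock;       /* simplistic logical clock; a real
                                        implementation would keep it in
                                        shared memory as well */

    static uint32_t fnv1a(const char *s, uint32_t len)
    {
        uint32_t h = 2166136261u;    /* FNV-1a: an illustrative hash only */
        for (uint32_t i = 0; i < len; i++) { h ^= (uint8_t)s[i]; h *= 16777619u; }
        return h;
    }

    static void touch_lru(shared_cache_t *c, uint32_t i)
    {
        c->lru.last_access[i] = ++lru_clock;       /* record access order */
    }

    static uint32_t alloc_slot(shared_cache_t *c)
    {
        for (uint32_t i = 0; i < NUM_SLOTS; i++)   /* linear scan for brevity */
            if (c->data[i].key_len == 0) return i; /* key_len == 0 marks free */
        return EMPTY;
    }

    static uint32_t evict_lru_slot(shared_cache_t *c)
    {
        uint32_t victim = 0;                       /* step 6: oldest access time */
        for (uint32_t i = 1; i < NUM_SLOTS; i++)
            if (c->lru.last_access[i] < c->lru.last_access[victim]) victim = i;
        /* NOTE: a real implementation must also unlink the victim from its
         * bucket chain; omitted for brevity. */
        c->data[victim].key_len = 0;
        return victim;
    }

    int cache_put(shared_cache_t *c, const char *key, uint32_t len)
    {
        if (len == 0 || len > SLOT_PAYLOAD) return -1;
        pthread_rwlock_wrlock(&c->lock.rwlock);    /* step 1: write lock */
        c->txn.in_progress = 1;                    /* step 1: open transaction */

        uint32_t b = fnv1a(key, len) % NUM_BUCKETS;            /* step 2 */
        for (uint32_t i = c->buckets[b]; i != EMPTY; i = c->data[i].next) {
            if (c->data[i].key_len == len &&
                memcmp(c->data[i].payload, key, len) == 0) {   /* step 4 */
                touch_lru(c, i);                   /* match: update in place */
                goto done;                         /* go to step 8 */
            }
        }
        uint32_t slot = alloc_slot(c);             /* step 5: find a free slot */
        if (slot == EMPTY) slot = evict_lru_slot(c);   /* step 6: evict oldest */

        c->data[slot].key_len = len;               /* step 7: write the data */
        memcpy(c->data[slot].payload, key, len);
        c->data[slot].next = c->buckets[b];        /* chain on hash collision */
        c->buckets[b] = slot;                      /* step 7: store slot index */
        touch_lru(c, slot);
    done:
        c->txn.in_progress = 0;                    /* step 8: close transaction */
        pthread_rwlock_unlock(&c->lock.rwlock);    /* step 8: release write lock */
        return 0;
    }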
Referring to FIG. 3, the flow of querying the shared memory data is as follows (an illustrative C sketch appears after step 6):
1. Acquire a read lock in the lock management area;
2. Compute the hash value of the cache data with the hash function;
3. Locate the corresponding hash bucket by the hash value and check whether it holds a value; if so, go to the next step; otherwise return "no match" and release the read lock in the lock management area;
4. Follow the index recorded in the hash bucket into the data area and check whether the stored data matches the queried data; if it matches, go to the next step; otherwise continue along the chain, and if no match is found, return "no match" and release the read lock in the lock management area;
5. Update the data's access time in the algorithm management area;
6. Return the matching result and release the read lock in the lock management area.
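A matching sketch of the query path, steps 1 through 6, reusing the illustrative layout and helpers from the write sketch above. Note that step 5 mutates the access time while holding only a read lock, exactly as the flow above describes; a real implementation would want that update to be atomic.

    int cache_get(shared_cache_t *c, const char *key, uint32_t len)
    {
        pthread_rwlock_rdlock(&c->lock.rwlock);      /* step 1: shared read lock */
        uint32_t b = fnv1a(key, len) % NUM_BUCKETS;  /* step 2: hash */
        for (uint32_t i = c->buckets[b]; i != EMPTY; i = c->data[i].next) { /* step 3 */
            if (c->data[i].key_len == len &&
                memcmp(c->data[i].payload, key, len) == 0) {  /* step 4: match */
                touch_lru(c, i);                     /* step 5: refresh access time */
                pthread_rwlock_unlock(&c->lock.rwlock);
                return 1;                            /* step 6: hit */
            }
        }
        pthread_rwlock_unlock(&c->lock.rwlock);      /* miss: release read lock */
        return 0;
    }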
To support efficient query and update of data at the scale of hundreds of millions of entries, the storage structure is either a red-black tree or a hash table. Considering the hash table's O(1) time complexity in the ideal case, the data is stored in a hash table, whose near-ideal query and update complexity gives very high performance.
Furthermore, to support concurrent query and update, lock control is needed when the data is operated on. Taking an exclusive (mutex) lock for both reads and writes would broaden the lock scope unnecessarily: since reading does not change the data, readers need not exclude one another and take only a shared lock, which greatly improves query efficiency.
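Because the lock lives in memory shared by many gateway processes, a POSIX read-write lock placed there must be initialized as process-shared. A minimal sketch, assuming the platform's pthread implementation supports PTHREAD_PROCESS_SHARED:

    #include <pthread.h>

    int init_shared_rwlock(pthread_rwlock_t *rw)
    {
        pthread_rwlockattr_t attr;
        pthread_rwlockattr_init(&attr);
        /* Allow the lock to be used by every process mapping this memory. */
        pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        int rc = pthread_rwlock_init(rw, &attr);
        pthread_rwlockattr_destroy(&attr);
        return rc;  /* readers then call pthread_rwlock_rdlock(), writers wrlock() */
    }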
Further, to ensure that data is not lost when the process exits (normally or abnormally), one option is for the data to reside only in shared memory, but if the host restarts, the shared memory data is lost as well;
the other option, used in the present application, is to synchronize the shared memory data to a disk file by means of the automatic write-back mechanism provided by the operating system.
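The application relies on the kernel's automatic write-back of dirty MAP_SHARED pages; where an explicit flush is additionally wanted, for example before a planned shutdown, msync() can be called. The sketch below is an optional addition, not a step the patent requires:

    #include <sys/mman.h>

    int flush_cache(void *base, size_t len)
    {
        /* MS_SYNC blocks until the mapped pages reach the backing file;
         * MS_ASYNC would merely schedule the write-back. */
        return msync(base, len, MS_SYNC);
    }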
However, allocations and releases of differing sizes generate many memory fragments. To avoid fragmentation, the method adds a constraint: the slots pre-allocated in shared memory for stored data have a fixed size, so every write and delete touches the same amount of memory and no fragmentation arises. Since the URLs in CDN requests vary in length, an MD5 value is computed for each URL; because the MD5 digest has a fixed length (32 hexadecimal characters), the amount of memory written is fixed. The same MD5 signature calculation is applied to the URL at query time.
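A sketch of this normalization, using OpenSSL's MD5() as one possible library choice (the patent names none): MD5 yields 16 raw bytes, rendered here as the fixed 32-character hexadecimal digest; link with -lcrypto.

    #include <openssl/md5.h>  /* MD5() is deprecated since OpenSSL 3.0 but available */
    #include <stdio.h>
    #include <string.h>

    /* Normalize a variable-length URL to a fixed 32-character key. */
    void url_to_key(const char *url, char out[33])
    {
        unsigned char digest[MD5_DIGEST_LENGTH];     /* 16 raw bytes */
        MD5((const unsigned char *)url, strlen(url), digest);
        for (int i = 0; i < MD5_DIGEST_LENGTH; i++)
            sprintf(out + 2 * i, "%02x", digest[i]); /* 32 hex characters */
        out[32] = '\0';
    }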
Furthermore, because host memory is fixed, the memory occupied by the hash table cannot grow without bound, so the total size of stored data must be controlled. When written data would exceed the total size, the scheme applies the LRU data eviction algorithm to evict historical data and then writes the new data.
Because the online CDN environment is complex, the CDN gateway process is sometimes terminated while writing to the hash table, or exits abnormally for other reasons, leaving a transaction incomplete; since the abnormal transaction holds a lock that was never released, other processes would be blocked. The transaction management area therefore records in-flight transactions so that unreleased locks can be cleared and intermediate-state transactions rolled back.
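A sketch of how such a startup recovery pass might look, assuming the transaction area records the writer's PID as in the earlier layout sketch; the rollback itself depends on an undo log and is left as a hypothetical hook (rollback_transaction):

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>

    void recover_if_crashed(shared_cache_t *c)
    {
        /* kill(pid, 0) delivers no signal; it only tests whether the PID exists. */
        if (c->txn.in_progress &&
            kill((pid_t)c->txn.owner_pid, 0) != 0 && errno == ESRCH) {
            /* rollback_transaction(c);  -- hypothetical undo-log replay */
            c->txn.in_progress = 0;       /* clear the orphaned transaction */
            /* Unlocking a lock one does not own is undefined behavior, so the
             * abandoned lock is re-initialized instead (see init_shared_rwlock
             * sketched above). */
            init_shared_rwlock(&c->lock.rwlock);
        }
    }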
To prevent the created hash table shared memory from being abnormally rewritten by other modules, for example through wild pointers or buffer overflows, the scheme applies page write protection to the hash table's shared memory pages: only this scheme's module has permission to rewrite the hash table memory, and any other module that rewrites it immediately triggers an exception.
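A sketch of this page write protection with mprotect(); base and len are assumed to cover the page-aligned hash table region.

    #include <sys/mman.h>

    /* Keep the pages read-only outside a write transaction... */
    int protect_buckets(void *base, size_t len)
    {
        return mprotect(base, len, PROT_READ);
    }

    /* ...and writable only inside one (write flow steps 1 and 8). */
    int unprotect_buckets(void *base, size_t len)
    {
        return mprotect(base, len, PROT_READ | PROT_WRITE);
    }
    /* Any stray write while protected raises SIGSEGV, exposing wild pointers
     * and buffer overflows instead of letting them silently corrupt the cache. */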
A wild pointer is a pointer that has not been initialized and therefore holds a random address assigned by the system; careless use of it easily causes memory corruption and segmentation faults. Buffer overflow is the anomaly in which a program tries to put more data into a buffer than its capacity allows, corrupting data, crashing the program, and so on. Writing dirty shared memory cache data refers to the situation in which several processes share the same block of memory, one process modifies the data, and the others are unaware of it, leaving the data inconsistent.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications are to be regarded as falling within the scope of the invention. Structures, devices and methods of operation not specifically described and illustrated herein are, unless otherwise indicated and limited, implemented according to conventional means in the art.

Claims (10)

1. A method for supporting data local caching and high concurrency secure access, comprising:
creating a shared memory region through the mmap function provided by an operating system, wherein the shared memory is mapped to a corresponding specific file on disk;
when data is written to the shared memory, the operating system synchronizes the shared memory data to the disk file in real time, wherein a write-cache-data flow and a query-data flow are implemented between the shared memory and the disk file;
when the cached data exceeds the capacity of the shared memory, the LRU eviction algorithm automatically deletes old data before writing new data;
the shared memory is divided into a lock management area, a data area, a hash bucket area, an algorithm management area and a transaction management area;
the lock management area is used for controlling modification of the hash table's critical-section data, ensuring that only one process can modify the critical-section data at a time, wherein locks are taken through the lock management area when data is read and written;
the transaction management area is used for ensuring that a modification is either executed in full or not at all, avoiding intermediate states, and is responsible for clearing unreleased locks and rolling back intermediate-state transactions when a process exits abnormally.
2. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the data area is a fixed-length array, every element in the array has the same size, and each element stores the externally written data and its metadata.
3. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the hash bucket area is a row of consecutively arranged buckets, and the value stored in each bucket is an index into the data area above.
4. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the algorithm management area is used for recording the access order of each entry, and when memory is full, the entry written earliest and not accessed since is evicted.
5. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the write-cache-data flow comprises the following steps:
step 1, acquiring a write lock in the lock management area, opening a transaction in the transaction management area, and temporarily lifting write protection on the hash bucket memory pages;
step 2, computing the hash value of the cache data with the hash function;
step 3, locating the corresponding hash bucket by the hash value and checking whether it already holds an entry; if so, going to the next step, otherwise going to step 5;
step 4, following the index recorded in the hash bucket into the data area and checking whether the stored data matches the data being written; if it matches, updating it in place and going to step 8;
otherwise, continuing along the chain, and if no match is found, going to the next step;
step 5, looking for a free slot in the data area; if none is free, going to the next step, otherwise going to step 7;
step 6, using the algorithm management area to evict the least recently accessed record, turning its slot into free space in the data area;
step 7, claiming the free slot in the data area, writing the cache data into it, and storing the slot's index in the hash bucket;
and step 8, closing the transaction in the transaction management area, releasing the write lock in the lock management area, and restoring write protection on the hash bucket memory pages.
6. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the query-data flow comprises:
S1, acquiring a read lock in the lock management area;
S2, computing the hash value of the cache data with the hash function;
S3, locating the corresponding hash bucket by the hash value and checking whether it holds a value; if so, going to the next step; otherwise returning "no match" and releasing the read lock in the lock management area;
S4, following the index recorded in the hash bucket into the data area and checking whether the stored data matches the queried data; if it matches, going to the next step;
otherwise, continuing along the chain, and if no match is found, returning "no match" and releasing the read lock in the lock management area;
S5, updating the data's access time in the algorithm management area;
S6, returning the matching result and releasing the read lock in the lock management area.
7. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the shared memory data is synchronized to the disk file by means of the automatic write-back mechanism provided by the operating system.
8. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the slots pre-allocated in the shared memory for stored data have a fixed size.
9. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: in the write-cache-data flow and the query-data flow, MD5 signature calculation is performed on the URL to obtain its MD5 value.
10. A method of supporting data local caching and high concurrency security access as recited in claim 1, wherein: the shared memory pages of the hash table are write-locked through the lock management area, and page write protection is applied to those shared memory pages.
CN202311710296.1A 2023-12-13 2023-12-13 Method for supporting data local cache and high concurrency security access Pending CN117827102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311710296.1A CN117827102A (en) 2023-12-13 2023-12-13 Method for supporting data local cache and high concurrency security access


Publications (1)

Publication Number Publication Date
CN117827102A 2024-04-05

Family

ID=90505158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311710296.1A Pending CN117827102A (en) 2023-12-13 2023-12-13 Method for supporting data local cache and high concurrency security access

Country Status (1)

Country Link
CN (1) CN117827102A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination