WO2022016946A1 - Shared caching method, baseband processing unit, and chip thereof - Google Patents

Shared caching method, baseband processing unit, and chip thereof Download PDF

Info

Publication number
WO2022016946A1
WO2022016946A1 (PCT/CN2021/090998)
Authority
WO
WIPO (PCT)
Prior art keywords
cache
capture
shared
subsystem
control
Prior art date
Application number
PCT/CN2021/090998
Other languages
French (fr)
Chinese (zh)
Inventor
朱佳
沈家瑞
丁杰
蒋云翔
文承淦
刘勇
黄维
陈宇
Original Assignee
长沙海格北斗信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 长沙海格北斗信息技术有限公司
Publication of WO2022016946A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/781On-chip cache; Off-chip memory
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention belongs to the field of chip design, and in particular relates to a shared cache method, a baseband processing unit and a chip thereof.
  • the baseband processing unit mainly includes two parts: the capture subsystem and the tracking subsystem.
  • the high-precision navigation chip needs to support the feature of simultaneous tracking of multi-frequency points, and multiple tracking subsystems are introduced.
  • to support the multi-channel feature in each tracking subsystem, a large tracking sampling point buffer needs to be introduced.
  • in the design of the capture module, to improve the capture sensitivity, it is necessary to introduce a large capture sampling point buffer.
  • designing the sampling point buffers for capture and tracking separately brings large area and power consumption overhead.
  • the baseband processing unit mainly includes two parts: acquisition and tracking:
  • each tracking subsystem is designed with 4 physical related channels, and supports up to 16 logical channels for simultaneous tracking through multiplexing.
  • the sampling points are written to the trace sampling point buffer.
  • the sampling rate is up to 80MHz.
  • one of multiple sampling point streams is selected according to the configuration and, after capture preprocessing, is written into the capture sampling point buffer.
  • the capture algorithm processing unit repeatedly reads the data buffered by the capture sampling point for coherent cumulative integration and matching selection. The time of coherent accumulation and integration will affect the capture sensitivity, and a longer integration time will result in higher capture sensitivity.
  • a typical capture sample buffer is configured with a capacity of 512KB.
  • the total capacity requirement of all sampling point buffers is 1MB, which leads to a sharp increase in the area and power consumption of chip design.
  • in practical applications, high-sampling-rate tracking, concurrent operation of all tracking subsystems and full multiplexing of all tracking channels rarely all occur at the same time; therefore, in the traditional independent-cache design, the utilization efficiency of the sampling point cache is low.
  • One of the objectives of the present invention is to provide a shared cache method that can effectively reduce cache capacity and improve cache utilization, with high reliability and good practicability.
  • Another object of the present invention is to provide a baseband processing unit including the shared cache method.
  • the third object of the present invention is to provide a chip including the shared cache method and a baseband processing unit.
  • the shared cache method provided by the present invention includes the following steps:
  • in step S3, according to the shared cache area designed in step S2, tracking access control, capture access control and cache clock control are performed.
  • the shared cache area obtained in step S1 is designed as follows: there are 8 tracking subsystems and 1 capture subsystem; each tracking subsystem has 1 write request and 4 read requests, and the 5 requests of each tracking subsystem access the same cache area at the same time; the capture subsystem has 1 write request and 1 read request, and the 2 requests of the capture subsystem access the same batch of cache areas in a time-shared manner; the shared cache area is designed with a total capacity of 640KB and is divided into 40 cache units of 16KB each.
  • the tracking access control described in step S3 is specifically performed by the following steps:
  • the tracking access control is divided into control flow control, write flow control and read flow control;
  • for control flow control: control the cache space address, and divide the system time window into several control segments;
  • the capture access control described in step S3 is specifically performed by the following steps:
  • the cache clock control described in step S3 is specifically performed by the following steps:
  • when a cache unit is allocated to a subsystem, the clock of that cache unit is automatically turned on; when the cache unit is released, the clock of that cache unit is automatically turned off.
  • the present invention also provides a baseband processing unit, which includes the above-mentioned shared cache method.
  • the present invention also provides a chip, which includes the above-mentioned shared cache method and baseband processing unit.
  • the shared cache method, baseband processing unit and chip provided by the present invention effectively improve the utilization rate of the sampling point cache and effectively reduce the cache capacity by sharing cache units and controlling the shared cache units; at the same time, the invention can effectively reduce the cache area of the chip, which is beneficial to the miniaturized design of the chip; the invention also improves the utilization rate and uniformity of the cache design, reduces cache power consumption, and has high reliability and good practicability.
  • FIG. 1 is a schematic diagram of functional modules of a baseband processing unit in an existing high-precision navigation chip.
  • FIG. 2 is a schematic flow chart of the method of the present invention.
  • FIG. 3 is a functional block diagram of the hardware implementation of the method of the present invention.
  • FIG. 4 is a schematic diagram of functional modules of the shared cache unit of the method of the present invention.
  • FIG. 5 is a schematic flowchart of a method for tracking access control according to the method of the present invention.
  • FIG. 6 is a schematic diagram of the configuration of a cache array according to an embodiment of the method of the present invention.
  • FIG. 2 is a schematic flow chart of the method of the present invention: the shared cache method provided by the present invention includes the following steps:
  • the technical solution shown in FIG. 4 can be adopted: there are 8 tracking subsystems and 1 capture subsystem; each tracking subsystem has 1 write request and 4 read requests, and the 5 requests (1 write request and 4 read requests) of each tracking subsystem access the same cache area at the same time; the capture subsystem has 1 write request and 1 read request, and its 2 requests (1 write request and 1 read request) access the same batch of cache areas in a time-shared manner; the shared cache area is designed to be 640KB in total and divided into 40 cache units of 16KB each;
  • in step S3, according to the shared cache area designed in step S2, tracking access control (as shown in FIG. 5), capture access control and cache clock control are performed;
  • tracking access control: up to 8 tracking subsystems work at the same time, and each subsystem needs to be allocated an independent cache space; each subsystem has different sampling point rate requirements, so the cache space sizes may differ, and the cache spaces must not overlap one another; each subsystem has 1 write request and 4 read requests that access the same cache unit at the same time, so time-shared control is required;
  • the tracking access control is divided into control flow control, write flow control and read flow control;
  • for control flow control: control the cache space address, and divide the system time window into several control segments;
  • base addr indicates the allocation base address
  • buf size indicates the allocation cache capacity
  • slice_cnt indicates the time window count
  • sample_vld represents the valid flag of the sampling point
  • sample_cnt represents the count of the valid flag of the sampling point
  • sample data joint represents the data splicing value of the sampling point
  • write buffer represents the write cache unit
  • read_req[n] indicates that the nth channel initiates a read request
  • read_flag[n] indicates that the nth channel is currently reading data
  • slice_cnt indicates the time window count
  • read buffer indicates the read buffer unit
  • send samples indicates sending sampling point data
  • Capture access control is controlled using the following steps:
  • the operation bit width used by the capture algorithm processing to access the capture cache is 256 bits, so the capture operation allocates the shared cache in units of 4 cache units, and the user software needs to allocate space according to this granularity.
  • the buffer clock control is controlled by the following steps:
  • when a cache unit is allocated to a subsystem, the clock of that cache unit is automatically turned on; when the cache unit is released, the clock of that cache unit is automatically turned off, thereby reducing power consumption.
  • the user configures tracking subsystem 1 with 4 cache units, tracking subsystem 2 with 6 cache units, and the capture subsystem with 16 cache units.
  • the configuration of the buffer array is shown in Figure 6.
  • in the method of the present invention, the sampling point caches of the tracking system and the capture system are designed and partitioned in a unified way, the cache space is dynamically allocated to each system by user software, and the clock switch of each cache unit is automatically managed by logic, which reduces the overall cache area, improves cache utilization and reduces chip power consumption, and therefore has high promotion value; its value is mainly reflected in the following aspects: (1) the cache area of the chip is effectively reduced and the total cache capacity is reduced to 62.5% while still satisfying the cache demand of the vast majority of scenarios, which reduces the overall chip area, is conducive to the miniaturized design of the chip and further lays a foundation for product portability; (2) the utilization rate and uniformity of the cache design are improved: different cache sizes and bandwidths are allocated to different subsystems, which effectively improves utilization, and the uniform size of each cache unit also improves design simplicity and reduces the difficulty of back-end design; (3) cache power consumption is reduced: logic automatically monitors whether each cache unit is allocated and in use and automatically turns the clock of each cache unit on or off, realizing fine-grained power management and effectively reducing chip power consumption.

Abstract

Disclosed is a shared caching method, comprising: setting up a shared cache area shared by a capture subsystem and a plurality of tracking subsystems; designing the shared cache area according to the number of access requests; and performing tracking access control, capture access control, and cache clock control. Further disclosed are a baseband processing unit comprising the shared caching method above, and a chip comprising the shared caching method and the baseband processing unit. By sharing cache units and controlling the shared cache units, the present invention effectively improves the utilization rate of the sampling point cache and effectively reduces the cache capacity. At the same time, the present invention can effectively reduce the cache area of the chip, which facilitates the miniaturized design of the chip. In addition, the present invention improves the utilization rate and uniformity of the cache design, reduces the power consumption of the cache, and has high reliability and good practicability.

Description

Shared caching method, baseband processing unit, and chip thereof
Technical Field
The invention belongs to the field of chip design, and in particular relates to a shared caching method, a baseband processing unit and a chip thereof.
Background Art
With the development of the economy and technology and the improvement of people's living standards, navigation has become an indispensable auxiliary function in people's production and daily life, bringing great convenience.
In a high-precision navigation chip, the baseband processing unit mainly includes two parts: the capture subsystem and the tracking subsystem. To support multi-system, multi-frequency-point application scenarios, and especially the high-end requirements of positioning and orientation, a high-precision navigation chip needs to support simultaneous tracking of multiple frequency points, so multiple tracking subsystems are introduced. To support the multi-channel feature in each tracking subsystem, a large tracking sampling point buffer needs to be introduced. In the design of the capture module, a large capture sampling point buffer needs to be introduced to improve the capture sensitivity. In the traditional baseband processing method, designing the capture and tracking sampling point buffers separately brings large area and power consumption overhead. A block diagram of a typical solution is shown in FIG. 1.
The baseband processing unit mainly includes two parts, capture and tracking:
In a typical design of the tracking module, 8 tracking subsystems are introduced, supporting simultaneous tracking of 8 frequency points. Each tracking subsystem contains 4 physical correlation channels and, through multiplexing, supports up to 16 logical channels tracking simultaneously. After preprocessing, the sampling points are written into the tracking sampling point buffer. To obtain good tracking sensitivity, the sampling rate is up to 80 MHz. To support the high sampling rate and the correlation multiplexing of the channels, the tracking sampling point buffer is designed with a capacity of 64 KB; the sampling point buffer capacity of all tracking subsystems is therefore 64 KB x 8 = 512 KB.
In a typical design of the capture module, one of multiple sampling point streams is selected according to the configuration and, after capture preprocessing, written into the capture sampling point buffer. The capture algorithm processing unit repeatedly reads the buffered capture sampling point data for coherent accumulation/integration and matching selection. The coherent integration time affects the capture sensitivity: a longer integration time yields a higher capture sensitivity. A typical capture sampling point buffer is configured with a capacity of 512 KB.
Therefore, in the prior art, the total capacity requirement of all sampling point buffers is 1 MB, which leads to a sharp increase in chip area and power consumption. In practical applications, high-sampling-rate tracking, concurrent operation of all tracking subsystems and full multiplexing of all tracking channels rarely all occur at the same time; therefore, in the traditional design with independent caches, the utilization efficiency of the sampling point cache is low.
Summary of the Invention
One of the objectives of the present invention is to provide a shared caching method that can effectively reduce cache capacity and improve cache utilization, with high reliability and good practicability.
Another objective of the present invention is to provide a baseband processing unit that includes the shared caching method.
A third objective of the present invention is to provide a chip that includes the shared caching method and the baseband processing unit.
The shared caching method provided by the present invention includes the following steps:
S1. Set up a shared cache area shared by a capture subsystem and several tracking subsystems;
S2. Design the shared cache area obtained in step S1 according to the number of access requests. Specifically, there are A tracking subsystems and B capture subsystems; each tracking subsystem has a1 write requests and a2 read requests, and the a1+a2 requests of each tracking subsystem access the same cache area at the same time; each capture subsystem has b1 write requests and b2 read requests, and the b1+b2 requests of each capture subsystem access the same batch of cache areas in a time-shared manner; the shared cache area is designed with a total capacity of C KB and divided into D cache units of E KB each; A, B, a1, a2, b1, b2, C, D and E are all positive integers, and E = C/D;
S3. According to the shared cache area designed in step S2, perform tracking access control, capture access control and cache clock control.
The design of the shared cache area obtained in step S1 according to the number of access requests in step S2 is specifically as follows: there are 8 tracking subsystems and 1 capture subsystem; each tracking subsystem has 1 write request and 4 read requests, and the 5 requests of each tracking subsystem access the same cache area at the same time; the capture subsystem has 1 write request and 1 read request, and the 2 requests of the capture subsystem access the same batch of cache areas in a time-shared manner; the shared cache area is designed with a total capacity of 640 KB and divided into 40 cache units of 16 KB each.
The tracking access control in step S3 is specifically performed by the following steps:
Tracking access control is divided into control flow control, write flow control and read flow control;
For control flow control: control the cache space address, and divide the system time window into several control segments;
For write flow control: control the splicing of the sampling point data, and write the spliced sampling point data into the cache unit in the time slot of the last control segment;
For read flow control: the control is divided into 4 parallel channels that work independently of one another and satisfy the sampling point bandwidth of 4 correlators working at the same time; when the correlator of a channel initiates a read request, the cache unit is read at the scheduled time within the corresponding control time slot, and the data is split and returned to the correlator in order.
The capture access control in step S3 is specifically performed by the following steps:
Configure the starting cache address and space capacity used by the capture subsystem, ensuring that they do not overlap with the cache space of the tracking subsystems; after the capture sampling points are preprocessed, write the data into the capture cache; after the configured number of sampling points has been collected, repeatedly read data from the capture cache for calculation; finally output the capture result and release the capture cache.
The cache clock control in step S3 is specifically performed by the following steps:
Configure the clock of each cache unit individually;
Dynamically switch the clock enable of each cache unit according to the cache unit configuration;
When a cache unit is allocated to a subsystem, the clock of that cache unit is automatically turned on; when the cache unit is released, the clock of that cache unit is automatically turned off.
The present invention also provides a baseband processing unit that includes the above-mentioned shared caching method.
The present invention also provides a chip that includes the above-mentioned shared caching method and baseband processing unit.
By sharing cache units and controlling the shared cache units, the shared caching method, baseband processing unit and chip provided by the present invention effectively improve the utilization rate of the sampling point cache and effectively reduce the cache capacity. At the same time, the invention can effectively reduce the cache area of the chip, which is beneficial to the miniaturized design of the chip. The invention also improves the utilization rate and uniformity of the cache design, reduces the power consumption of the cache, and has high reliability and good practicability.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the functional modules of a baseband processing unit in an existing high-precision navigation chip.
FIG. 2 is a schematic flow chart of the method of the present invention.
FIG. 3 is a functional block diagram of a hardware implementation of the method of the present invention.
FIG. 4 is a schematic diagram of the functional modules of the shared cache unit of the method of the present invention.
FIG. 5 is a schematic flowchart of the tracking access control of the method of the present invention.
FIG. 6 is a schematic diagram of the cache array configuration in an embodiment of the method of the present invention.
Detailed Description
FIG. 2 is a schematic flow chart of the method of the present invention. The shared caching method provided by the present invention includes the following steps:
S1. Set up a shared cache area shared by a capture subsystem and several tracking subsystems (as shown in FIG. 3);
S2. Design the shared cache area obtained in step S1 according to the number of access requests. Specifically, there are A tracking subsystems and B capture subsystems; each tracking subsystem has a1 write requests and a2 read requests, and the a1+a2 requests of each tracking subsystem access the same cache area at the same time; each capture subsystem has b1 write requests and b2 read requests, and the b1+b2 requests of each capture subsystem access the same batch of cache areas in a time-shared manner; the shared cache area is designed with a total capacity of C KB and divided into D cache units of E KB each; A, B, a1, a2, b1, b2, C, D and E are all positive integers, and E = C/D;
In a specific implementation, the technical solution shown in FIG. 4 can be adopted: there are 8 tracking subsystems and 1 capture subsystem; each tracking subsystem has 1 write request and 4 read requests, and the 5 requests (1 write request and 4 read requests) of each tracking subsystem access the same cache area at the same time; the capture subsystem has 1 write request and 1 read request, and its 2 requests (1 write request and 1 read request) access the same batch of cache areas in a time-shared manner; the shared cache area is designed to be 640 KB in total and divided into 40 cache units of 16 KB each;
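To make the unit-based partitioning above concrete, the following C sketch models one possible software-side view of the 40 cache units of 16 KB each: an allocation table with a simple first-fit allocator that keeps subsystem regions non-overlapping. This is a minimal sketch assuming software-managed allocation; all identifiers (unit_owner, alloc_units, the owner codes) are illustrative and are not names taken from the patent.

```c
/* Illustrative sketch only: a software-managed allocation table for the
 * 40 x 16 KB shared cache units.  Identifiers are assumptions, not the
 * patent's API. */
#include <stdbool.h>
#include <stdint.h>

#define CACHE_UNITS      40u              /* 640 KB / 16 KB            */
#define UNIT_SIZE_BYTES  (16u * 1024u)

enum owner { OWNER_NONE = 0, OWNER_TRACK1 = 1, /* ... OWNER_TRACK8 = 8 */
             OWNER_CAPTURE = 9 };

static uint8_t unit_owner[CACHE_UNITS];   /* OWNER_NONE = unallocated  */

/* First-fit allocation of 'count' contiguous units to 'owner'; returns the
 * base unit index (base address = index * UNIT_SIZE_BYTES), or -1 if no
 * non-overlapping region of that size is free. */
static int alloc_units(uint8_t owner, unsigned count)
{
    for (unsigned base = 0; base + count <= CACHE_UNITS; ++base) {
        bool run_is_free = true;
        for (unsigned i = 0; i < count; ++i) {
            if (unit_owner[base + i] != OWNER_NONE) { run_is_free = false; break; }
        }
        if (run_is_free) {
            for (unsigned i = 0; i < count; ++i)
                unit_owner[base + i] = owner;
            return (int)base;
        }
    }
    return -1;
}

/* Release a region so its units can be reallocated (and clock-gated off). */
static void free_units(unsigned base, unsigned count)
{
    for (unsigned i = 0; i < count && base + i < CACHE_UNITS; ++i)
        unit_owner[base + i] = OWNER_NONE;
}
```

Under a model of this kind, the tracking subsystems and the capture subsystem simply request different unit counts from the same 40-unit pool, which is what allows a 640 KB total to replace the 1 MB of independent buffers.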
S3. According to the shared cache area designed in step S2, perform tracking access control (as shown in FIG. 5), capture access control and cache clock control;
Tracking access control: up to 8 tracking subsystems work at the same time, and each subsystem needs to be allocated an independent cache space; each subsystem has different sampling point rate requirements, so the cache space sizes may differ, and the cache spaces must not overlap one another; each subsystem has 1 write request and 4 read requests that access the same cache unit at the same time, so time-shared control is required;
In a specific implementation, the tracking access control is divided into control flow control, write flow control and read flow control;
For control flow control: control the cache space address, and divide the system time window into several control segments;
For write flow control: control the splicing of the sampling point data, and write the spliced sampling point data into the cache unit in the time slot of the last control segment;
For read flow control: the control is divided into 4 parallel channels that work independently of one another and satisfy the sampling point bandwidth of 4 correlators working at the same time; when the correlator of a channel initiates a read request, the cache unit is read at the scheduled time within the corresponding control time slot, and the data is split and returned to the correlator in order;
In the figure:
base addr denotes the allocated base address; buf size denotes the allocated cache capacity; slice_cnt denotes the time window count;
sample_vld denotes the sampling point valid flag; sample_cnt denotes the count of the sampling point valid flag; sample data joint denotes the spliced sampling point data; write buffer denotes writing to the cache unit;
read_req[n] denotes that the n-th channel initiates a read request; read_flag[n] denotes that the n-th channel is currently reading data; slice_cnt denotes the time window count; read buffer denotes reading the cache unit; send samples denotes sending the sampling point data;
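The signal names above suggest a time-sliced arbiter in which the four read channels and the single write of one tracking subsystem share a cache unit within each time window. The following C sketch illustrates that arbitration idea only; the exact slice layout (slots 0 to 3 for the four read channels, the last slot for the write, following the statement that the write happens in the last control segment) and all type and function names are assumptions made for illustration, not details taken from FIG. 5.

```c
/* Sketch of a time-sliced tracking access control, reusing the signal
 * names listed above.  The slot layout is an assumption. */
#include <stdint.h>

#define READ_CHANNELS 4
#define SLICES        5        /* assumed: 4 read slots + 1 write slot   */

typedef struct {
    unsigned slice_cnt;                 /* time window count             */
    unsigned sample_cnt;                /* valid samples spliced so far  */
    uint64_t sample_data_joint;         /* spliced sampling point data   */
    int      read_req[READ_CHANNELS];   /* read request per channel      */
    int      read_flag[READ_CHANNELS];  /* channel currently reading     */
} track_ctrl_t;

/* Called once per control slice for one tracking subsystem. */
static void track_slice_step(track_ctrl_t *c)
{
    unsigned slot = c->slice_cnt % SLICES;

    if (slot < READ_CHANNELS) {
        /* Read slots: serve this channel's correlator if it has a request;
         * the wide cache word would be split and the samples sent in order. */
        c->read_flag[slot] = c->read_req[slot] ? 1 : 0;
        /* if (c->read_flag[slot]) { read_buffer(...); send_samples(...); } */
    } else {
        /* Last slot of the window: write the spliced samples, if any. */
        if (c->sample_cnt > 0) {
            /* write_buffer(base_addr + write_offset, c->sample_data_joint); */
            c->sample_cnt        = 0;
            c->sample_data_joint = 0;
        }
    }
    c->slice_cnt++;
}
```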
Capture access control is performed by the following steps:
Configure the starting cache address and space capacity used by the capture subsystem, ensuring that they do not overlap with the cache space of the tracking subsystems; after the capture sampling points are preprocessed, write the data into the capture cache; after the configured number of sampling points has been collected, repeatedly read data from the capture cache for calculation; finally output the capture result and release the capture cache;
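The capture-side flow above (fill the cache, process it repeatedly, release it) can be summarised in the same style. The sketch below is only an outline of that control flow; the structure fields and the commented-out helper calls (including free_units from the earlier allocation sketch) are illustrative assumptions.

```c
/* Outline of the capture access flow: fill the capture cache with
 * preprocessed samples, re-read it repeatedly for coherent integration,
 * then output the result and release the cache region. */
typedef struct {
    unsigned base_unit;        /* start of the capture region (unit index)  */
    unsigned unit_count;       /* region size in 16 KB units                */
    unsigned samples_needed;   /* integration length configured by software */
    unsigned samples_written;
} capture_ctrl_t;

static void capture_run(capture_ctrl_t *c)
{
    /* 1. Fill phase: write preprocessed sampling points until the
     *    configured number has been collected. */
    while (c->samples_written < c->samples_needed) {
        /* write_capture_buffer(c->base_unit, next_preprocessed_sample()); */
        c->samples_written++;
    }

    /* 2. Processing phase: repeatedly read the buffered block back for
     *    coherent accumulation/integration and matching selection. */
    /* for each code/frequency hypothesis: read_capture_buffer(...); accumulate; */

    /* 3. Output the capture result and release the region so its cache
     *    units can be reused by other subsystems. */
    /* output_capture_result(); free_units(c->base_unit, c->unit_count); */
    c->samples_written = 0;
}
```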
In a specific implementation, to meet the capture time requirements, the operation bit width used by the capture algorithm processing to access the capture cache is 256 bits; therefore the capture operation allocates the shared cache in units of 4 cache units, and the user software needs to allocate space according to this granularity. Cache clock control is performed by the following steps:
Configure the clock of each cache unit individually;
Dynamically switch the clock enable of each cache unit according to the cache unit configuration;
When a cache unit is allocated to a subsystem, the clock of that cache unit is automatically turned on; when the cache unit is released, the clock of that cache unit is automatically turned off, thereby reducing power consumption.
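The clock-control rule above reduces to "clock enable follows allocation". In hardware this would typically be a clock-gating cell per cache unit; the C fragment below only models the control rule, reusing the allocation table assumed in the earlier sketch, and is not the patent's implementation.

```c
/* Per-unit clock gating model: a unit's clock is enabled exactly while
 * it is allocated to some subsystem. */
#include <stdbool.h>
#include <stdint.h>

#define CACHE_UNITS 40u
#define OWNER_NONE  0u

/* Allocation table, as in the earlier allocation sketch. */
static uint8_t unit_owner[CACHE_UNITS];

/* One clock enable per cache unit; in hardware, one clock-gating cell each. */
static bool unit_clk_en[CACHE_UNITS];

/* Re-evaluate the clock enables whenever cache units are allocated or released. */
static void update_clock_enables(void)
{
    for (unsigned i = 0; i < CACHE_UNITS; ++i)
        unit_clk_en[i] = (unit_owner[i] != OWNER_NONE);
}
```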
The advantages of the present invention are illustrated below through a typical application.
The user configures tracking subsystem 1 with 4 cache units, tracking subsystem 2 with 6 cache units, and the capture subsystem with 16 cache units; the configuration of the cache array is shown in FIG. 6.
In this application there are 40 cache units in total, of which 26 are allocated and used, giving a utilization rate of 65%; the unallocated cache units are in the clock-off state.
The method of the present invention designs and partitions the sampling point caches of the tracking system and the capture system in a unified way; user software dynamically allocates cache space to each system, and logic automatically manages the clock switch of each cache unit. This reduces the overall cache area, improves cache utilization and reduces chip power consumption, and therefore has high promotion value. The value is mainly reflected in the following aspects: (1) the cache area of the chip is effectively reduced and the total cache capacity is reduced to 62.5% while still satisfying the cache demand of the vast majority of scenarios, which reduces the overall chip area, is conducive to the miniaturized design of the chip and further lays a foundation for product portability; (2) the utilization rate and uniformity of the cache design are improved: different cache sizes and bandwidths are allocated to different subsystems, which effectively improves utilization, and the uniform size of each cache unit also improves design simplicity and reduces the difficulty of back-end design; (3) cache power consumption is reduced: logic automatically monitors whether each cache unit is allocated and in use and automatically turns the clock of each cache unit on or off, realizing fine-grained power management and effectively reducing chip power consumption.
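As a quick check of the figures used in the application example and in point (1) above, and assuming the 1 MB of the traditional design is counted as 1024 KB (64 KB x 8 + 512 KB):

\[
4 + 6 + 16 = 26 \ \text{allocated units}, \qquad
\frac{26}{40} = 65\,\%, \qquad
\frac{640\ \mathrm{KB}}{1024\ \mathrm{KB}} = 62.5\,\% .
\]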

Claims (7)

  1. A shared caching method, characterized by comprising the following steps:
    S1. setting up a shared cache area shared by a capture subsystem and several tracking subsystems;
    S2. designing the shared cache area obtained in step S1 according to the number of access requests; specifically, there are A tracking subsystems and B capture subsystems; each tracking subsystem has a1 write requests and a2 read requests, and the a1+a2 requests of each tracking subsystem access the same cache area at the same time; each capture subsystem has b1 write requests and b2 read requests, and the b1+b2 requests of each capture subsystem access the same cache area in a time-shared manner; the shared cache area is designed with a total capacity of C KB and divided into D cache units of E KB each; A, B, a1, a2, b1, b2, C, D and E are all positive integers, and E = C/D;
    S3. performing tracking access control, capture access control and cache clock control according to the shared cache area designed in step S2.
  2. The shared caching method according to claim 1, characterized in that the design of the shared cache area obtained in step S1 according to the number of access requests in step S2 is specifically: there are 8 tracking subsystems and 1 capture subsystem; each tracking subsystem has 1 write request and 4 read requests, and the 5 requests of each tracking subsystem access the same cache area at the same time; the capture subsystem has 1 write request and 1 read request, and the 2 requests of the capture subsystem access the same cache area in a time-shared manner; the shared cache area is designed with a total capacity of 640 KB and divided into 40 cache units of 16 KB each.
  3. The shared caching method according to claim 2, characterized in that the tracking access control in step S3 is specifically performed by the following steps:
    dividing the tracking access control into control flow control, write flow control and read flow control;
    for control flow control: controlling the cache space address, and dividing the system time window into several control segments;
    for write flow control: controlling the splicing of the sampling point data, and writing the spliced sampling point data into the cache unit in the time slot of the last control segment;
    for read flow control: dividing the control into 4 parallel channels that work independently of one another and satisfy the sampling point bandwidth of 4 correlators working at the same time; when the correlator of a channel initiates a read request, reading the cache unit at the scheduled time within the corresponding control time slot, splitting the data and returning it to the correlator in order.
  4. The shared caching method according to claim 2, characterized in that the capture access control in step S3 is specifically performed by the following steps:
    configuring the starting cache address and space capacity used by the capture subsystem, and ensuring that they do not overlap with the cache space of the tracking subsystems;
    after the capture sampling points are preprocessed, writing the data into the capture cache; after the configured number of sampling points has been collected, repeatedly reading data from the capture cache for calculation; and finally outputting the capture result and releasing the capture cache.
  5. The shared caching method according to claim 2, characterized in that the cache clock control in step S3 is specifically performed by the following steps:
    configuring the clock of each cache unit individually;
    dynamically switching the clock enable of each cache unit according to the cache unit configuration;
    when a cache unit is allocated to a subsystem, automatically turning on the clock of the cache unit; and when the cache unit is released, automatically turning off the clock of the cache unit.
  6. A baseband processing unit, characterized by including the shared caching method according to any one of claims 1 to 5.
  7. A chip, characterized by including the baseband processing unit according to claim 6.
PCT/CN2021/090998 2020-07-20 2021-04-29 Shared caching method, baseband processing unit, and chip thereof WO2022016946A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010695897.X 2020-07-20
CN202010695897.XA CN111737191B (en) 2020-07-20 2020-07-20 Shared cache method, baseband processing unit and chip thereof

Publications (1)

Publication Number Publication Date
WO2022016946A1 true WO2022016946A1 (en) 2022-01-27

Family

ID=72655030

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090998 WO2022016946A1 (en) 2020-07-20 2021-04-29 Shared caching method, baseband processing unit, and chip thereof

Country Status (2)

Country Link
CN (1) CN111737191B (en)
WO (1) WO2022016946A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737191B (en) * 2020-07-20 2021-01-15 长沙海格北斗信息技术有限公司 Shared cache method, baseband processing unit and chip thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120200455A1 (en) * 2011-02-08 2012-08-09 Cambridge Silicon Radio Ltd. Use of gps to detect repetitive motion
CN105137460A (en) * 2015-08-27 2015-12-09 武汉梦芯科技有限公司 Satellite navigation system baseband signal processing system and method
CN105182377A (en) * 2015-08-21 2015-12-23 上海海积信息科技股份有限公司 Receiver board card and receiver
CN105807293A (en) * 2016-05-27 2016-07-27 重庆卓观科技有限公司 SOC (system on chip)-based single-board multi-antenna attitude-determining receiver
CN108761503A (en) * 2018-03-21 2018-11-06 青岛杰瑞自动化有限公司 A kind of multi-mode satellite signal acquisition methods and SOC chip
CN111737191A (en) * 2020-07-20 2020-10-02 长沙海格北斗信息技术有限公司 Shared cache method, baseband processing unit and chip thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8073565B2 (en) * 2000-06-07 2011-12-06 Apple Inc. System and method for alerting a first mobile data processing system nearby a second mobile data processing system
US7999821B1 (en) * 2006-12-19 2011-08-16 Nvidia Corporation Reconfigurable dual texture pipeline with shared texture cache
US8587477B2 (en) * 2010-01-25 2013-11-19 Qualcomm Incorporated Analog front end for system simultaneously receiving GPS and GLONASS signals
US8259012B2 (en) * 2010-04-14 2012-09-04 The Boeing Company Software GNSS receiver for high-altitude spacecraft applications
CN102023302B (en) * 2010-12-17 2012-09-19 浙江大学 Multichannel cooperative control method and device in satellite navigation receiver
CN102053947B (en) * 2011-01-04 2012-07-04 东南大学 Method for realizing reconfiguration of global positioning system (GPS) baseband algorithm
US9009541B2 (en) * 2012-08-20 2015-04-14 Apple Inc. Efficient trace capture buffer management
CN105527631B (en) * 2014-11-26 2016-09-21 航天恒星科技有限公司 Weak signal processing method based on GNSS receiver
CN105866803A (en) * 2016-03-23 2016-08-17 沈阳航空航天大学 Baseband signal quick capturing algorithm for Beidou second-generation satellite navigation receiver based on FPGA
CN111272169A (en) * 2020-02-04 2020-06-12 中国科学院新疆天文台 Pulsar signal interference elimination device, system and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120200455A1 (en) * 2011-02-08 2012-08-09 Cambridge Silicon Radio Ltd. Use of gps to detect repetitive motion
CN105182377A (en) * 2015-08-21 2015-12-23 上海海积信息科技股份有限公司 Receiver board card and receiver
CN105137460A (en) * 2015-08-27 2015-12-09 武汉梦芯科技有限公司 Satellite navigation system baseband signal processing system and method
CN105807293A (en) * 2016-05-27 2016-07-27 重庆卓观科技有限公司 SOC (system on chip)-based single-board multi-antenna attitude-determining receiver
CN108761503A (en) * 2018-03-21 2018-11-06 青岛杰瑞自动化有限公司 A kind of multi-mode satellite signal acquisition methods and SOC chip
CN111737191A (en) * 2020-07-20 2020-10-02 长沙海格北斗信息技术有限公司 Shared cache method, baseband processing unit and chip thereof

Also Published As

Publication number Publication date
CN111737191A (en) 2020-10-02
CN111737191B (en) 2021-01-15

Similar Documents

Publication Publication Date Title
US11797180B2 (en) Apparatus and method to provide cache move with non-volatile mass memory system
CN102298561B (en) A kind of mthods, systems and devices memory device being carried out to multi-channel data process
US6529416B2 (en) Parallel erase operations in memory systems
US20120079172A1 (en) Memory system
CN101241446A (en) Command scheduling method and apparatus of virtual file system embodied in nonvolatile data storage device
WO2022016946A1 (en) Shared caching method, baseband processing unit, and chip thereof
CN114356223B (en) Memory access method and device, chip and electronic equipment
US10754785B2 (en) Checkpointing for DRAM-less SSD
US10061709B2 (en) Systems and methods for accessing memory
US10162522B1 (en) Architecture of single channel memory controller to support high bandwidth memory of pseudo channel mode or legacy mode
KR20130009928A (en) Effective utilization of flash interface
US11126382B2 (en) SD card-based high-speed data storage method
US9378125B2 (en) Semiconductor chip and method of controlling memory
Chen et al. Delay-based I/O request scheduling in SSDs
WO2022095439A1 (en) Hardware acceleration system for data processing, and chip
CN111581136A (en) DMA controller and implementation method thereof
CN115480708A (en) Method for time division multiplexing local memory access
WO1994008307A1 (en) Multiplexed communication protocol between central and distributed peripherals in multiprocessor computer systems
CN100524357C (en) Data pre-fetching system in video processing
CN111694777B (en) DMA transmission method based on PCIe interface
CN113157602A (en) Method and device for distributing memory and computer readable storage medium
Chen et al. Design and Verification of High Performance Memory Interface Based on AXI Bus
CN115657950B (en) Data read-write processing method and device based on multiple channels and related equipment
Chen et al. Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit
US20040064662A1 (en) Methods and apparatus for bus control in digital signal processors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21845764

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21845764

Country of ref document: EP

Kind code of ref document: A1