WO2021104393A1

WO2021104393A1 - Method for achieving multi-rule flow classification, device, and storage medium

Info

Publication number: WO2021104393A1
Application number: PCT/CN2020/131891
Authority: WO
Inventors: 傅斌; 刘衡祁
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2019-11-27
Filing date: 2020-11-26
Publication date: 2021-06-03
Also published as: CN112866139A

Abstract

Provided in the present application are a method for achieving multi-rule flow classification, a device, and a storage medium. The method comprises: in the situation in which it is detected that a hash lookup instruction is triggered, generating a corresponding original field on the basis of an original field structure and according to a field to be extracted in original input information; generating a corresponding intermediate field according to the original field and table item content in a pre-configured system device table (SDT); according to the intermediate field, searching a preset hash bucket for at least one target field matching both the bit width of the field to be extracted and the hash table width; and, according to the order of priority of the at least one target field, using the target field having the highest priority as an output result for output.

Description

Method, equipment and storage medium for implementing multi-rule flow classification

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office with an application number of 201911183380.6 on November 27, 2019. The entire content of this application is incorporated into this application by reference.

Technical field

This application relates to communications, for example, to a method, device, and storage medium for implementing multi-rule flow classification.

Background technique

Gigabit-capable Passive Optical Network (GPON) is the network technology with the most complete architecture and the most complete standards in the PON family. It has been widely deployed in access networks and is the best solution for replacing Asymmetric Digital Subscriber Line (ADSL) access technology. The chip of the GPON network terminal equipment, the Optical Network Unit (ONU), is one of the core GPON chips. Detecting and processing Ethernet packets is a very important part of the chip's work: it must identify the many kinds of packets in the network and handle different packets in different ways.

As services continue to diversify, the same message is processed differently in different scenarios. The microcode in Network Processor (NP) technology can be combined in different ways to suit different processing methods. Therefore, using NP technology to process message services becomes very important.

In NP technology, flow classification is mainly based on Ternary Content Addressable Memory (TCAM). Using TCAM has one advantage: for a specified scenario, the feature fields that need to be extracted can be fixed.

However, several aspects must be considered when using TCAM. First, in the GPON terminal scenario the number of entries required is very large, while the capacity of a TCAM is generally small. Second, a TCAM has many functions, and the ONU terminal uses only a small part of them, so the rest are wasted; whether in the purchase of Intellectual Property (IP) or in the consumption of chip area, cost is a sensitive factor for a terminal. Third, using TCAM raises chip yield issues, which further increases the cost.

Before the NP architecture, flow classification was performed without TCAM. In that case, looking up one rule requires at least three types of table lookup: the first is the rule extraction index table, the second is the extraction rule table, and the third is the HASH lookup. If the original lookup process is carried over directly into the current NP architecture, the number of table lookups becomes very large because the bit width returned by each lookup is limited. Each table lookup also has a delay, and the accumulated delay lengthens the processing time of the NP's internal threads, which effectively reduces performance. To recover the performance, the only option is to add more threads in logic, but doing so increases the chip area and therefore the cost.

From the above description, it can be seen that using TCAM directly under the NP architecture is not a very ideal choice for GPON terminal applications. At the same time, using the flow classification rules under the non-NP architecture cannot perfectly fit the NP architecture.

Summary of the invention

The embodiments of this application provide a method, device and storage medium for implementing multi-rule flow classification. While ensuring that service packets can still be processed flexibly, they reduce, at the architectural level, the table lookup delay incurred by the NP for flow classification rules, and at the same time reduce the cost of the GPON terminal chip.

A method for implementing multi-rule flow classification is provided, including:

when it is detected that a hash (HASH) search instruction is triggered, generating a corresponding original field based on the original field structure and according to the field to be extracted in the original input information;

generating a corresponding intermediate field according to the original field and the entry content in the pre-configured system device table (SDT);

searching a preset HASH bucket, according to the intermediate field, for at least one target field matching both the bit width of the field to be extracted and the width of the HASH table; and

according to the priority order of the at least one target field, outputting the target field with the highest priority as the output result.

A device is also provided, including: a memory, and one or more processors;

the memory is configured to store one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any embodiment of the present application.

A storage medium is also provided, the storage medium stores a computer program, and when the computer program is executed by a processor, the method described in any embodiment of the present application is implemented.

Description of the drawings

FIG. 1 is a flowchart of a method for implementing multi-rule flow classification according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a flow classification provided by an embodiment of the present application;

FIG. 3 is a flow chart of processing from HASH KEY preprocessing to comparison of HASH results in a flow classification according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a HASH entry structure provided by an embodiment of the present application;

FIG. 5 is a flowchart of an expansion port acquisition item provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a reconfigurable structure of a HASH entry provided by an embodiment of the present application;

FIG. 7 is a structural block diagram of a device for implementing multi-rule flow classification according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a device provided by an embodiment of the present application.

Detailed description

Hereinafter, the embodiments of the present application will be described with reference to the drawings. In the case of no conflict, the embodiments in the application and the features in the embodiments can be combined with each other arbitrarily.

In the embodiments of this application, the existing NP microcode characteristics and the flow classification process are analyzed in depth. On the one hand, the use of TCAM needs to be avoided; on the other hand, the current application needs to be satisfied, so as to solve the table lookup delay problem encountered in GPON terminal applications.

In the existing flow classification process, the following steps are included:

S1. The input information includes 128-byte packet header data, port number, and other preprocessing information input by the previous module.

S2. Check the extraction rule index table based on the port number, and obtain the corresponding address of the extraction rule and information about whether each extraction rule is valid or not.

S3. Look up the extraction rule table based on each effective extraction rule index.

S4. Based on the content obtained from the extraction rule table, extract the corresponding fields from the 128-byte packet header data, and form a key with some pre-processing information, and perform a HASH search and matching operation. HASH matching is an exact match.

S5. If no rule is matched, the message is processed according to the "default action for all unmatched" in the extraction rule index table. If exactly one rule is matched, the result of that rule is output. If more than one rule is matched, the result with the highest priority is output according to the priorities of the matched rules' results.

If a rule lookup is performed according to the existing flow classification process, it includes a rule extraction index table lookup, an extraction rule table lookup, and a HASH lookup. Because the result returned by one NP table lookup is limited in width, a single rule lookup requires at least three lookups across these three types, and most probably more than three.

In current GPON terminal applications, it is common for the rules of a service message to require more than three table lookups under the existing flow classification process, so the total number of table lookups amounts to more than ten. Basic Layer 2 (L2) service processing additionally includes at least Media Access Control (MAC) address learning and lookup, packet modification, and other branch functions, which makes the number of table lookups very large.

Under the NP architecture, line rate should ideally be reached for 64-byte packets, so the number of table lookups must be as small as possible. If the original process is followed under the NP architecture, there are too many table lookups, which makes it difficult to meet the needs of GPON terminal applications. Each table lookup must therefore be analyzed in turn to determine whether and how it can be optimized. As mentioned earlier, there are three types of table lookup: the rule extraction index table, the extraction rule table, and the HASH lookup.

From the role of the microcode in the existing NP architecture, it follows that data extraction can be completed by the microcode. The purpose of looking up the rule extraction index table and the extraction rule table is precisely to extract data, so these two types of table lookup can be optimized under the NP architecture. Next, consider the HASH lookup. As described above, the usual way to implement flow classification under the NP architecture is to use TCAM, so the way TCAM is used in GPON terminal products can be analyzed. GPON terminals make use of two characteristics of TCAM:

First, each entry in the TCAM has its own independent MASK bits. The TCAM computes KEY & MASK on the input KEY to obtain the part of the KEY that the entry actually cares about. This step performs the functions of the original rule extraction index table and extraction rule table. TCAM entries and their MASK bits can be added and deleted dynamically, that is, software can dynamically add rules and the corresponding entries.

Second, the masked data is exactly matched against the entries. Multiple entries may match at this point. According to the priorities of the entries, the TCAM selects the matching entry with the highest priority as the final result and outputs the corresponding information back to the microcode. This step is similar to the exact-match HASH lookup.

Each entry in the TCAM has MASK bits. A TCAM does not support many entries, but it can compare against every entry. Implementing this purely in logic would amount to building a TCAM. Therefore, the MASK function needs to be analyzed against the application scenarios of the current ONU project, with meeting GPON terminal applications as the acceptance criterion. Outputting the comparison result according to the matching priority of multiple entries is also a feature of TCAM.
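The two TCAM characteristics described above (per-entry masking followed by priority selection among the matching entries) can be illustrated with a minimal software sketch. It is not part of the application: the names TcamEntry and tcam_lookup are hypothetical, and the convention that a larger priority number wins is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TcamEntry:
    value: int     # bit pattern stored in the entry
    mask: int      # 1 = compare this bit, 0 = ignore it
    priority: int  # assumption: a larger number means higher priority
    result: str    # information returned to the microcode


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[TcamEntry]:
    """Return the highest-priority entry whose masked value equals the masked key."""
    hits = [e for e in entries if (key & e.mask) == (e.value & e.mask)]
    return max(hits, key=lambda e: e.priority) if hits else None


# Two illustrative entries: an exact 16-bit match and a catch-all.
entries = [
    TcamEntry(value=0x0800, mask=0xFFFF, priority=1, result="ipv4"),
    TcamEntry(value=0x0000, mask=0x0000, priority=0, result="default"),
]
```

A key of 0x0800 matches both entries and the higher-priority one is returned; any other key falls through to the catch-all entry.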

In an implementation manner, FIG. 1 is a flowchart of a method for implementing multi-rule flow classification according to an embodiment of the present application. This embodiment is applied to reduce the delay in table lookup of the flow classification rule performed by the NP. This embodiment may be executed by a device. For example, the device may be an optical network terminal.

As shown in Figure 1, the method in this embodiment includes S110-S140.

S110: In the case of detecting that a hash (HASH) search instruction is triggered, generate a corresponding original field based on the original field structure and according to the field to be extracted in the original input information.

In the embodiment, the HASH search instruction refers to the instruction issued when the microcode detects that the flow classification process has entered the field search stage. When the microcode detects that the process has entered this stage, it starts splicing and assembling fields to generate the corresponding original field.

In one embodiment, generating the corresponding original field based on the original field structure and according to the field to be extracted in the original input information includes: combining, according to the original field structure, the address of the system device table (SDT) to be searched, the thread number of the current microcode, and the field to be extracted in the original input information to generate the corresponding original field. In the embodiment, the original field structure refers to the field structure of the SDT table lookup request; that is, after the microcode has composed the original field, it initiates the SDT table lookup request. Exemplarily, the original field structure may include: the field to be extracted, the address of the SDT table to be searched, and the thread number of the current microcode. On entering the field search stage, the microcode assembles these three items according to the original field structure to generate the corresponding original field. The field to be extracted refers to the field that the microcode extracts as required.

S120: Generate a corresponding intermediate field according to the original field and the content of the entry in the pre-configured SDT table.

In an embodiment, the entry content of the SDT table may include: a multi-HASH access flag bit, a HASH table identity (id) number flag bit, a HASH field size flag bit, and a HASH table width flag bit. After the microcode generates the original field according to the original field structure, it initiates the SDT table lookup request. It first looks up the SDT table based on the address of the SDT table to be searched and reads the entry content. Then, according to the pre-configured intermediate field structure, it combines the entry content of the SDT table with the content of the original field to generate the corresponding intermediate field. The intermediate field structure may include: the thread number of the current microcode, the multi-HASH access flag bit, the HASH table id number flag bit, the HASH field size flag bit, the HASH table width flag bit, and the field to be extracted. That is, after reading the entry content from the SDT table, the microcode assembles it with the thread number of the current microcode and the field to be extracted from the original field, according to the intermediate field structure, to generate the corresponding intermediate field.
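As an illustration of S110 and S120, the following sketch models the original field, an SDT entry, and the intermediate field as plain records. The field names (sdt_num, np_no, multi_hash, multi_table_id, hash_key_size, multi_tbl_width) are the ones used later in this description; the class names, the function build_intermediate, and the use of a dictionary as the SDT are assumptions made only for the example, and no bit widths are implied.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OriginalField:
    sdt_num: int   # address of the SDT entry to look up
    np_no: int     # thread number of the current microcode
    hash_key: int  # field extracted by the microcode


@dataclass(frozen=True)
class SdtEntry:
    multi_hash: int       # multi-HASH flag bit
    multi_table_id: int   # HASH table number
    hash_key_size: int    # HASH KEY size flag
    multi_tbl_width: int  # HASH table width flag


@dataclass(frozen=True)
class IntermediateField:
    np_no: int
    multi_hash: int
    multi_table_id: int
    hash_key_size: int
    multi_tbl_width: int
    hash_key: int


def build_intermediate(orig: OriginalField, sdt: Dict[int, SdtEntry]) -> IntermediateField:
    """Read the SDT entry addressed by sdt_num and merge it with the original field."""
    entry = sdt[orig.sdt_num]
    return IntermediateField(
        np_no=orig.np_no,
        multi_hash=entry.multi_hash,
        multi_table_id=entry.multi_table_id,
        hash_key_size=entry.hash_key_size,
        multi_tbl_width=entry.multi_tbl_width,
        hash_key=orig.hash_key,
    )
```

Note that sdt_num is consumed by the lookup and does not appear in the intermediate field, which matches the intermediate field structure listed above.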

S130: Search the preset HASH bucket, according to the intermediate field, for at least one target field matching the bit width of the field to be extracted and the width of the HASH table.

In the embodiment, the preset HASH bucket refers to a pre-configured space for storing fields. It can store fields of different widths and types corresponding to different HASH table widths, which is not limited here. The field to be extracted has its own attributes, such as its bit width and its HASH table width. After the intermediate field is obtained, the preset HASH bucket can be searched according to the bit width of the field to be extracted and the HASH table width carried in the intermediate field, to obtain at least one target field that matches both. Because the preset HASH bucket is pre-configured, there is no need to perform the rule extraction index lookup and the extraction rule lookup for the field to be extracted, which greatly reduces the table lookup delay of the flow classification rules.

S140: According to the priority order of at least one target field, output the target field with the highest priority as an output result.

In the embodiment, outputting the target field with the highest priority according to the priority order of the at least one target field means that, when multiple rules take effect at the same time and need to be matched, the result with the highest priority is selected for output according to the agreed priority order. In this way, while ensuring flexible processing of service messages, the delay of the NP's table lookups for flow classification rules is reduced at the architectural level, and the cost of the GPON terminal chip is reduced as well.

In an embodiment, generating the corresponding intermediate field according to the original field and the entry content in the pre-configured system device table (SDT) includes: reading the pre-configured SDT table according to the address of the SDT table to be searched in the original field to obtain the corresponding entry content; and, based on the intermediate field structure, combining the entry content, the thread number of the current microcode in the original field, and the field to be extracted to generate the corresponding intermediate field.

In the embodiment, the microcode searches the SDT table based on the address of the SDT table to be searched in the original field, and after finding the corresponding SDT table, reads the content of each entry in the SDT table; then, according to the structure of the intermediate field, The content of each entry in the SDT table is combined with the thread number of the current microcode in the original field and the field to be extracted to generate the corresponding intermediate field.

In an embodiment, before searching the preset HASH bucket, according to the intermediate field, for at least one target field matching the bit width of the field to be extracted and the width of the HASH table, the method further includes: determining the corresponding HASH search strategy according to the multi-HASH flag bit in the intermediate field, where the multi-HASH flag bit is used to indicate the number of HASH rules of the current multi-HASH search request.

In an embodiment, the corresponding HASH search strategy is determined by the multi-HASH flag bit in the intermediate field. The HASH search strategy may be a single-HASH search strategy or a multi-HASH search strategy. The single-HASH search strategy means that the flow classification process goes through a single HASH path; the multi-HASH search strategy means that it goes through multiple HASH paths.

In one embodiment, when the multi-HASH flag bit is the first value, searching the preset HASH bucket, according to the intermediate field, for at least one target field matching the bit width of the field to be extracted and the width of the HASH table includes: adopting the multi-HASH search strategy and looking up the corresponding multi-HASH rule index table according to the HASH table id number in the intermediate field; determining, from the preset HASH bucket and according to the HASH table id number in the multi-HASH rule index table, the field bit width and HASH table width corresponding to at least one target field; and generating the at least one corresponding target field according to the field bit width and the HASH table width.

In an embodiment, the multi-HASH flag bit takes either a first value or a second value. Exemplarily, the first value is 1 and the second value is 0. When the multi-HASH flag bit is 1, the flow classification process performs multiple HASH lookups, that is, the multi-HASH search strategy is adopted to determine the corresponding target fields. For the first HASH lookup, the corresponding multi-HASH rule index table is looked up based on the HASH table id number in the intermediate field; the field bit width and HASH table width corresponding to the target field are then found in the preset mapping table according to the HASH table id number in the multi-HASH rule index table; finally, the field bit width and HASH table width of the field to be extracted are replaced with those corresponding to the target field, to obtain at least one corresponding target field. After the first HASH lookup is completed, the rule index number is decremented by one, and the multi-HASH search strategy is executed again until the rule index number reaches 0.

In an embodiment, outputting the target field with the highest priority as the output result according to the priority order of the at least one target field includes: determining the priority order of the target fields; and outputting the target field with the highest priority as the output result.

In the embodiment, after multiple target fields are found from the HASH bucket, the priority of each target field is sorted, and the target field with the highest priority is output as the output result. In the embodiment, sorting the priority of the target field means that for a situation where there are multiple rules that take effect at the same time and need to be matched, the priority of each rule is sorted.

In an embodiment, the method for implementing multi-rule flow classification further includes: storing the original input information and at least one output result using a combination of a free pointer queue and a random access memory (RAM).

In the embodiment, in the case of storing the output result obtained by executing the multi-HASH search strategy, the storage can be carried out in a combination of a free pointer queue and RAM. In one embodiment, the depth of RAM is related to the total number of threads. In an embodiment, the depth of the RAM can be set to N, and N depends on the total number of threads. In the embodiment, during the storage process, the original input information can be stored, or N output results can be stored, which is not limited.
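The combination of a free pointer queue and a RAM of depth N can be sketched as follows. This is only an illustrative software model: the class name PendingStore, its methods, and the behavior of returning None when no slot is free are assumptions, not details given in the application.

```python
from collections import deque
from typing import Any, List, Optional


class PendingStore:
    """RAM of fixed depth whose unused addresses are kept in a free pointer queue."""

    def __init__(self, depth: int) -> None:
        self.ram: List[Optional[Any]] = [None] * depth
        self.free = deque(range(depth))  # every address is free initially

    def put(self, item: Any) -> Optional[int]:
        """Store an item and return its pointer, or None if no slot is free."""
        if not self.free:
            return None
        ptr = self.free.popleft()
        self.ram[ptr] = item
        return ptr

    def take(self, ptr: int) -> Any:
        """Read the item at ptr and hand the pointer back to the free queue."""
        item = self.ram[ptr]
        self.ram[ptr] = None
        self.free.append(ptr)
        return item
```

With a depth equal to the total number of threads, each thread can have one outstanding multi-HASH request stored while its lookups are in flight.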

In an embodiment, FIG. 2 is a schematic flowchart of a flow classification provided by an embodiment of the present application. As shown in Figure 2, the flow classification process includes the following steps:

S210. Extract the HASH KEY.

S220. Preprocess the HASH KEY.

S230. Look up the HASH bucket.

S240. Compare the multiple HASH results.

S250. Output the result.

In the embodiment, HASH KEY refers to the field in the above embodiment. In the embodiment, the process of extracting the HASH KEY includes: in the case of initializing the flow classification processing flow, the microcode command is used to perform KEY combination according to application needs.

The process of HASH KEY preprocessing includes: customizing an SDT table, flexibly responding to multiple HASH input conditions, and flexibly reprocessing HASH KEY for different input conditions. In the embodiment, different HASH table widths, different HASH KEY widths, and different result bit widths are defined according to application requirements, thereby satisfying the situation of multi-rule arbitration for flow classification.

The process of looking up the HASH bucket includes: taking the HASH KEY produced by the HASH KEY preprocessing stage, flexibly matching it according to the different HASH KEY bit widths and the different HASH table widths, and outputting the corresponding matched results.

The process of comparing multiple HASH results includes: when there are multiple rules that take effect at the same time and need to be matched, the result with the highest priority is selected for output according to the pre-configured priority order.

In an embodiment, FIG. 3 is a processing flowchart from HASH KEY preprocessing to comparison of HASH results in a flow classification provided in an embodiment of the present application. As shown in Figure 3, the processing flow includes the following steps:

S310. The microcode detects that the flow classification process has entered the HASH KEY extraction stage and performs KEY extraction.

S320. The microcode assembles the KEY to generate the original field.

S330. After composing the original field, the microcode initiates a table lookup request and looks up the SDT table.

S340. After the SDT table is read, the entry content of the SDT table and the original field are combined to generate the intermediate field.

S350. When the multi-HASH flag bit in the intermediate field is 1, multiple HASH lookups are performed.

S360. For the first HASH lookup, a table inside the Arithmetic Logic (ALG) module is looked up based on the HASH table id number in the KEY, and the HASH KEY in the original field is replaced with the lookup result.

S370. For the second through the last HASH lookup, the HASH table id number in the HASH KEY plus 1 is used as the index to look up the table, and the HASH KEY in the original field is replaced with the lookup result.

S380. The lookup results are stored using the combination of the free pointer queue and the RAM.

S390. After the HASH KEY assembly is completed, the KEY is sent to the original ALG function module for processing.

In the embodiment, when the microcode detects that the flow classification process enters the HASH KEY extraction stage, the splicing is started. First, the microcode performs HASH KEY assembly according to the original field structure to generate the corresponding original field. Table 1 is a schematic table of an original field structure provided by an embodiment of the present application. As shown in Table 1, the original field structure includes: the field to be extracted, the address of the SDT table to be searched, and the thread number of the current microcode.

Table 1 A schematic table of the original field structure

sdt_num: the address of the SDT table to be looked up
np_no: thread number of the current microcode
HASH KEY: field to be extracted

In the embodiment, HASH KEY may be used to indicate the field extracted by the microcode as needed, that is, the field to be extracted; sdt_num indicates the address of the SDT table to be searched; np_no indicates the thread number of the current microcode.

After the original field is generated by the microcode, a table lookup request is initiated, and the SDT table is first searched based on sdt_num. Table 2 is a schematic table of the structure of an SDT table provided in an embodiment of the present application.

Table 2 A schematic table of the structure of an SDT table

In an embodiment, multi_hash may be used to indicate a multi-HASH flag bit; multi_table_id indicates a HASH table number; hash_key_size indicates a HASH KEY size flag bit; multi_tbl_width indicates a HASH table width flag bit.

Exemplarily, when multi_hash is 0, hash_key_size indicates the length of the key value used to access the HASH module, and the total HASH key value is 384 bits. The encoding depends on hash_key_keytype, as follows.

When hash_key_keytype is 128 bit, the minimum supported granularity is 1 byte and the value range of keysize is 1-14. The corresponding HASH key lengths (in units of 8 bit) are: 000001 = 1*8 bit, 000010 = 2*8 bit, and so on up to 001110 = 14*8 bit; other values are reserved.

When hash_key_keytype is 256 bit, the minimum supported granularity is 2 bytes and the value range of keysize is 1-15. The corresponding HASH key lengths (in units of 16 bit) are: 000001 = 1*16 bit, 000010 = 2*16 bit, and so on up to 001111 = 15*16 bit; other values are reserved.

When hash_key_keytype is 384 bit, the minimum supported granularity is 4 bytes and the value range of keysize is 1-12. The corresponding HASH key lengths (in units of 32 bit) are: 000001 = 1*32 bit, 000010 = 2*32 bit, and so on up to 001100 = 12*32 bit; other values are reserved.

When hash_key_keytype is 512 bit, the minimum supported granularity is 4 bytes and the value range of keysize is 1-12. The corresponding HASH key lengths (in units of 32 bit) are: 000001 = 1*32 bit, 000010 = 2*32 bit, and so on up to 001100 = 12*32 bit; other values are reserved.

In the embodiment, multi_tbl_width is the HASH table width flag bit and stores the bit width. The key type is 2 bits wide and indicates the length type of the HASH entry to be searched: 01 = 128 bit, 10 = 256 bit, 11 = 512 bit, 00 = 384 bit.
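The key size encoding above reduces to a small lookup: the 2-bit key type selects the entry width, the unit, and the largest legal keysize, and the key length is keysize multiplied by the unit. The sketch below encodes exactly the values listed in the preceding text; the names KEY_TYPES and hash_key_length_bits are hypothetical.

```python
# 2-bit key type code -> (entry width in bits, unit in bits, largest legal keysize)
KEY_TYPES = {
    0b01: (128, 8, 14),
    0b10: (256, 16, 15),
    0b00: (384, 32, 12),
    0b11: (512, 32, 12),
}


def hash_key_length_bits(key_type: int, keysize: int) -> int:
    """Return the HASH key length in bits for a key type code and a keysize value."""
    _entry_bits, unit_bits, max_keysize = KEY_TYPES[key_type]
    if not 1 <= keysize <= max_keysize:
        raise ValueError(f"keysize {keysize} is reserved for key type {key_type:02b}")
    return keysize * unit_bits
```

For instance, key type 01 with keysize 001110 gives 14 * 8 = 112 bits, and key type 00 with keysize 001100 gives 12 * 32 = 384 bits, the stated total HASH key length.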

After the SDT table is read, the content of each entry in the SDT table is assembled with the HASH KEY to generate the corresponding intermediate field. Table 3 is a schematic table of an intermediate field structure provided in an embodiment of the present application. As shown in Table 3, the intermediate field structure includes: the thread number of the current microcode, the multi-HASH flag bit, the HASH table number, the HASH KEY size flag bit, the HASH table width flag bit, and the field to be extracted.

Table 3 Schematic table of an intermediate field structure

np_no
multi_hash
multi_table_id
hash_key_size
multi_tbl_width
HASH KEY

In the HASH processing flow, the preprocessing module first splits the traffic. When multi_hash in the intermediate field is 0, the basic HASH path (that is, the single-HASH search strategy) is taken, but 8 bits of information must be added in the high-order position, namely a 1-bit multi hash flag plus a 7-bit multi hash index. When multi_hash in the intermediate field is 1, multiple HASH lookups need to be performed and the multi-HASH path (that is, the multi-HASH search strategy) is taken.
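The 8 bits of high-order information can be pictured as a one-byte header placed above the key. The sketch below is one possible reading: it assumes that the 1-bit multi hash flag is the most significant of the 8 bits, followed by the 7-bit multi hash index, and that the key width is passed in explicitly. The function name add_multi_hash_header and its parameters are hypothetical.

```python
def add_multi_hash_header(key: int, key_bits: int, multi_hash: int, multi_index: int) -> int:
    """Prepend {multi_hash (1 bit), multi_index (7 bits)} above a key of key_bits bits."""
    if multi_hash not in (0, 1):
        raise ValueError("multi_hash must be 0 or 1")
    if not 0 <= multi_index < 128:
        raise ValueError("multi_index must fit in 7 bits")
    if key < 0 or key >> key_bits:
        raise ValueError("key does not fit in key_bits")
    header = (multi_hash << 7) | multi_index  # assumed bit order: flag first, then index
    return (header << key_bits) | key
```

For a 16-bit key 0xABCD on the basic path with index 5, this yields 0x05ABCD.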

When the multi-HASH path is used, for the first HASH lookup a table inside the ALG is first looked up based on the hash_table_id in the intermediate field. Table 4 is a multi-HASH rule index table provided in an embodiment of the present application.

Table 4 A multi-HASH rule index table

Table 5 is a hash_table_id configuration table provided by an embodiment of the present application. As shown in Table 5, it includes: HASH table width, HASH KEY size, and HASH KEY location.

Table 5 A configuration table of hash_table_id

The hash_key_size and hash_key_type in the original key are replaced in turn with the hash_table_id, hash_key_size, and hash_table_width obtained by looking up the SDT table. The HASH KEY is then regenerated according to the hash_key_size, hash_key_location, and hash_key_mask obtained from the table lookup. The meaning of the parameters is as shown in the table above. For example, when hash_key_size is set to 64 bit and hash_key_location is set to 5, hash_key (504 bit) is {320'b0, (HASH_KEY[383:320] & hash_key_mask[63:0]), 120'b0}.
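The worked example above can be reproduced with integer bit operations. The sketch below generalizes from that single example and therefore rests on two assumptions: that hash_key_location counts in units of hash_key_size bits (location 5 with 64-bit size selects HASH_KEY[383:320]), and that the selected field is always placed above 120 low-order zero bits of the 504-bit output. The function name regenerate_hash_key and the constants are hypothetical.

```python
OUT_BITS = 504      # width of the regenerated hash_key in the example
LOW_PAD_BITS = 120  # low-order zero padding in the example


def regenerate_hash_key(hash_key: int, size_bits: int, location: int, mask: int) -> int:
    """Select one size_bits-wide slice of hash_key, mask it, and place it above the low padding."""
    field = (hash_key >> (location * size_bits)) & ((1 << size_bits) - 1)
    field &= mask
    return field << LOW_PAD_BITS  # high-order bits above the field stay zero


# The example from the text: size 64 bit, location 5, all mask bits set.
source_key = (0x1122334455667788 << 320) | 0xFFFF
regenerated = regenerate_hash_key(source_key, 64, 5, (1 << 64) - 1)
```

Here regenerated equals 0x1122334455667788 shifted left by 120 bits, that is, 320 zero bits, the 64 selected bits, then 120 zero bits.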

For the second through the last HASH lookup, hash_mul_next_addr in Table 4 is used as the index into Table 4, and the remaining steps are the same as in S350. Each time a HASH lookup is performed, the multi_num value read from Table 4 is decremented by one. When it reaches 0, the current multi-hash key request is finished, and the next multi-hash key request is started.
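The chained lookups can be sketched as follows. This is a minimal illustration, not the hardware implementation: the `RuleIndexEntry` class is a hypothetical model of one Table 4 row using the parameter names that appear in the description, and the table is represented as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class RuleIndexEntry:
    """Hypothetical model of one row of the multi-HASH rule index table."""
    multi_num: int           # number of lookups; read at the first address
    hash_table_id: int       # table to search for this lookup
    hash_mul_next_addr: int  # index of the next row in the chain


def lookup_sequence(table: dict, first_addr: int) -> list:
    """Return the hash_table_id values visited for one multi-hash request."""
    entry = table[first_addr]
    remaining = entry.multi_num
    visited = []
    while remaining > 0:
        visited.append(entry.hash_table_id)
        remaining -= 1  # one lookup done
        if remaining > 0:
            entry = table[entry.hash_mul_next_addr]
    return visited
```

With two chained rows, the function visits the table id of the first row and then that of the row addressed by hash_mul_next_addr, after which the counter reaches 0 and the request ends.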

In addition, the preprocessing module does not return a result immediately for a key that triggers multi_hash, so the key must be stored. The storage uses a free pointer queue plus RAM, and the depth of the RAM is set to N (N depends on the total number of threads). Besides the original input signal, N output results are also stored (the bit width is determined by the application, and the highest bit is the hit flag). In the embodiment, the next-highest 5 bits of the output result are used for priority arbitration (in the HASH table entry, these are the highest 5 bits immediately following the key; after all the results have been read, the result with the highest priority is taken). A first-in-first-out (First In First Out, FIFO) memory is also required to store the pointers to be processed (that is, multi_index is stored). The N KEYs are handled on a first-in-first-out basis: the next key is replicated only after the current one has been replicated.
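The priority arbitration over the stored results can be sketched in Python. Two points are assumptions made for illustration: a top bit of 1 is treated as a miss, and a numerically larger 5-bit value is treated as the higher priority. The function name and the `width` parameter are hypothetical.

```python
def arbitrate(results, width):
    """Pick the hit result with the highest 5-bit priority, or None."""
    best = None
    best_priority = -1
    for result in results:
        miss = (result >> (width - 1)) & 1         # highest bit: hit flag
        if miss:                                   # assumption: 1 means miss
            continue
        priority = (result >> (width - 6)) & 0x1F  # next-highest 5 bits
        if priority > best_priority:               # assumption: larger wins
            best, best_priority = result, priority
    return best
```

When two hits carry the same priority, this sketch keeps the one that was read first.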

The number of replications is determined by the multi_num parameter in Table 4. For each replication, this parameter is decremented by one; when it reaches 0, the replication is finished and the next one is started. After the keys are assembled, they are sent to the basic HASH function module. Scheduling between the multi-HASH keys and the keys of the original path currently uses the round-robin (RR) method, and the two have the same priority. After the HASH function module completes the search, it returns the parameters corresponding to the output result. This technical solution not only works with the hardened part of the microcode so that field extraction is fixed but, more importantly, greatly reduces the table lookup delay of the ONU's flow classification function when the service types are complex. The design also provides a HASH search scheme that can be flexibly matched with the microcode; for example, the reconfigurable HASH entry structure in the scheme improves RAM utilization, reduces the number of NP threads, and reduces the cost of the chip.
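The equal-priority round-robin scheduling between the two key sources can be sketched as below. The function name and the choice to serve the original path first are assumptions; the text only states that RR is used and that both sources have the same priority.

```python
from collections import deque


def round_robin(original_keys, multi_keys):
    """Interleave two key queues, alternating turns with equal priority."""
    queues = [deque(original_keys), deque(multi_keys)]
    order = []
    turn = 0  # assumption: the original path is served first
    while queues[0] or queues[1]:
        if queues[turn]:
            order.append(queues[turn].popleft())
        turn ^= 1  # hand the turn to the other queue
    return order
```

When one queue runs empty, its turn is skipped and the remaining queue drains in order.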

Fig. 4 is a schematic diagram of a HASH entry structure provided by an embodiment of the present application. As shown in Fig. 4, the HASH entry structure includes: an entry valid flag (i.e., valid), a key type flag (i.e., key_type), a HASH table id (i.e., table_id), a business key value (i.e., key), and an output result (i.e., result).

In the embodiment, valid can be defined as follows: when valid is 0, the current entry is an invalid HASH entry; when valid is 1, the current entry is valid. key_type is defined as follows: 00 means the key type is invalid; 01 means the key type is 128bit; 10 means the key type is 256bit; 11 means the key type is 512bit. table_id takes the values 0-31, identifying 32 HASH tables. key is the user-defined business key value; its length must be n*8bit (n = 1, 2, 3, ..., 48), so the maximum business key length is 384bit, and n lies in [0:48]. result is the user-defined result data; its highest bit, no_result, is reserved for the chip hardware to tell the microcode whether the HASH lookup hit (for example, 0 means hit and 1 means miss). The length of the result actually returned to the microcode is determined by rsp_mode in the SDT attributes. When the result stored in the entry is longer than rsp_mode, the data is truncated when it is returned to the microcode; conversely, when it is shorter, the low-order bits of the result are padded with 0 up to the rsp_mode bit width.
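The key_type encoding and the fitting of a stored result to the rsp_mode width can be sketched as follows. The names are hypothetical, and keeping the high-order bits on truncation is an assumption, chosen because the padding rule fills the low-order bits and the no_result flag sits in the highest bit.

```python
# key_type encoding from the description; None marks the invalid type.
KEY_TYPE_BITS = {0b00: None, 0b01: 128, 0b10: 256, 0b11: 512}


def fit_result(result: int, stored_bits: int, rsp_bits: int) -> int:
    """Truncate or zero-pad a stored result to the rsp_mode bit width."""
    if stored_bits > rsp_bits:
        # Assumption: truncation drops low-order bits and keeps the top ones.
        return result >> (stored_bits - rsp_bits)
    # Shorter result: fill the low-order bits with 0 up to rsp_bits.
    return result << (rsp_bits - stored_bits)
```

A 4-bit stored result 0b1011 would be returned as 0b10 for a 2-bit response and as 0b10110000 for an 8-bit response.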

In an embodiment, taking two HASH lookups as an example, the process of multi-rule flow classification is described. Fig. 5 is a flowchart of an expansion port obtaining entry provided by an embodiment of the present application. As shown in Figure 5, this process includes the following steps:

S410. A HASH request is input.

S420. The HASH request intercepts a valid key based on the width of the HASH table or the size of the HASH KEY, and performs HASH calculation with the ID number of the HASH table.

S440. After the hash address is obtained by the hash calculation, N bits of data are read from the hash bucket.

S450. In the case that there is a matching entry, obtain the position of the result in the entry according to the bit width of the entry.

Fig. 6 is a schematic diagram of a reconfigurable structure of a HASH entry provided by an embodiment of the present application. In an embodiment, the HASH request input may include: np_no, multi_hash_index, multi_hash, multi_table_id, hash_key_size, and multi_tbl_width. The HASH request then intercepts a valid key based on multi_tbl_width or hash_key_size, adds multi_table_id, and performs the hash calculation; the hash calculation step is executed twice. After the hash address is obtained by the hash calculation, N bits of data are read from the hash bucket. For example, with N equal to 512 and a hash matching entry bit width of 128bit, it is checked whether bits [510:509] hold the value that denotes 128bit; if so, bits [511:384] are matched against the requested 128 bits. At the same time, it is checked whether bits [382:381] hold the 128bit value; if so, bits [383:256] are matched against the requested 128 bits. Bits [255:128] and [127:0] are matched against the requested 128 bits in the same way. See Fig. 6 for the reconfigurable structure.
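The scan of a 512-bit bucket can be sketched in Python. This is an illustration under stated assumptions rather than the chip's logic: the valid flag is taken to be the top bit of each entry (consistent with key_type sitting at [510:509]), an invalid or unrecognized slot is skipped in 128-bit steps, and the function only locates entries, leaving the key comparison to the caller. All names are hypothetical.

```python
KEY_TYPE_WIDTH = {0b01: 128, 0b10: 256, 0b11: 512}  # key_type encoding
BUCKET_BITS = 512
SLOT_BITS = 128


def walk_bucket(bucket: int) -> list:
    """Return (high_bit, low_bit, width) for each valid entry in a bucket."""
    entries = []
    hi = BUCKET_BITS - 1
    while hi >= 0:
        valid = (bucket >> hi) & 1               # assumption: top bit is valid
        key_type = (bucket >> (hi - 2)) & 0b11   # bits [hi-1:hi-2]
        width = KEY_TYPE_WIDTH.get(key_type)
        if not valid or width is None or width > hi + 1:
            hi -= SLOT_BITS                      # assumption: skip one slot
            continue
        entries.append((hi, hi - width + 1, width))
        hi -= width
    return entries
```

A bucket holding two 128-bit entries followed by one 256-bit entry would yield the ranges [511:384], [383:256], and [255:0].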

In an embodiment, the configuration of an access control list (Access Control List, ACL) rule for Layer 3 packets (ie, L3) is taken as an example to describe the implementation process of multi-rule flow classification.

L3 packets may involve a 7-tuple, a 6-tuple, a 5-tuple, and so on. In the embodiment, the 7-tuple is taken as an example to describe the implementation process of multi-rule flow classification. When the KEY is extracted, all the parameters of the 7-tuple are first extracted and spliced into a key (or original field), whose bit width is between 64 and 128. The software pre-configures an sdt_num corresponding to this multi-hash rule, and then configures the SDT table entry corresponding to sdt_num. The parameters are described in sequence as follows. multi_hash: set to 1, indicating that multi-hash search is enabled. hash table width: meaningless here. multi_hash_addr (that is, the original hash key size + hash table id): here it denotes the first address of the multi-hash search. In one implementation, addresses are reserved according to current usage; for example, 8 multi-hash lookups are reserved first, so the first address is configured as 0 and the first address of the next multi-hash is set to 8. This approach may waste entries. Alternatively, if it is easy for the software to implement, the value is configured as the number of times the multi-hash rule index table has been used so far: for example, it is configured as 0 when the table has not yet been used and as 1 for the second configuration, which is the position of the next free pointer.

Then, the multi-hash rule index table is configured, with multi_hash_addr as its first address. In the multi-HASH rule index table, multi_num is as described in Table 4 and is valid only at the first address. For example, if only a 7-tuple search is currently performed, this parameter is configured as 1; if a triple search is subsequently added, this parameter needs to be changed to 2.

hash_id: hash_id and hash_table_id are configured together and correspond to the real business table number. For example, for the current 7-tuple business, if a 7-tuple business has already been set on other ports and the current extraction conditions are exactly the same, the currently set hash_id and hash_table_id are made to match the previous hash_id and hash_table_id in order to save business table numbers. hash_table_id: same as above. hash_mul_next_addr: this parameter is meaningful only if the remaining number of replications is greater than 1. Its configuration is similar in idea to that of multi_hash_addr in the SDT table, and either of the following two methods can be used. One implementation is the reservation + increment method: N positions (N is less than or equal to 8) are reserved directly after each multi_hash_addr in use, and hash_mul_next_addr = multi_hash_addr + current number of copies - 1. For example, if a new triple rule is added where previously there was only one 7-tuple rule, the current number of copies is 2, and hash_mul_next_addr = multi_hash_addr + 2 - 1. The other implementation is the free pointer method: the software records the lowest-numbered address of the multi-hash rule index table that is currently unused. For example, if no address has been used at the beginning, the free pointer is 0; once that has been used, the free pointer is 1 for the second use, and so on.
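The two ways of choosing hash_mul_next_addr can be sketched in Python. Both function names are hypothetical, and modelling the used addresses as a set is an illustrative choice for the software bookkeeping, not something the text prescribes.

```python
def next_addr_reserved(multi_hash_addr: int, copy_count: int) -> int:
    """Reservation + increment: multi_hash_addr + current copies - 1."""
    return multi_hash_addr + copy_count - 1


def smallest_free(used: set, table_size: int):
    """Free pointer method: lowest unused address, or None if the table is full."""
    for addr in range(table_size):
        if addr not in used:
            return addr
    return None
```

With a first address of 8 and two copies, the reservation method gives 9; with the free pointer method, an empty table gives 0 and the next allocation gives 1.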

Then, the hash_table_id configuration table is configured. In the embodiment, hash_table_width corresponds to the bit width of the table entry; for example, the current key length is between 64 and 128 and the result is obtained by concatenation, so this parameter is configured as 01, which means 128bit. hash_key_size indicates that the current key length is 64 to 128: when the key length is 128bit, the value of hash_key_size is 0; when the key length is 64bit, the value of hash_key_size is 8. hash_key_location: the current overall bit width is less than 128bit and the key bit width is 128bit, so 0 is selected; see Table 5 for descriptions of the other values. hash_key_mask: since less than 128bit of the overall bit width is currently used and the key bit width is 128bit, mask bits [127:0] are valid, and the mask is set according to the actual bit width of the current 7-tuple. For example, if the 7-tuple uses the lower 90 bits, mask[89:0] is configured as 1 and mask[127:90] is configured as 0.
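The mask value in the 90-bit example can be computed with a short Python sketch; the function name and its `total_bits` parameter are hypothetical.

```python
def low_bits_mask(valid_bits: int, total_bits: int = 128) -> int:
    """Return a mask with the low valid_bits set and the remaining bits clear."""
    if not 0 <= valid_bits <= total_bits:
        raise ValueError("valid_bits must lie within the mask width")
    return (1 << valid_bits) - 1


# Mask for a 7-tuple that uses the lower 90 bits of a 128-bit key.
mask_90 = low_bits_mask(90)
```

Here mask_90 has bits [89:0] set to 1 and bits [127:90] set to 0, matching the configuration described in the text.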

In one embodiment, while the ONU is working, a first 128bit entry has already been configured within the 512bit-wide RAM word, occupying bits [511:384]. When a new entry hashes to the same address, a second 128bit entry is configured, occupying bits [383:256]. When yet another entry hashes to the same address and requires 256bit, it can be configured in bits [255:0].

Fig. 7 is a structural block diagram of a device for implementing multi-rule flow classification provided by an embodiment of the present application. This embodiment is applied to reduce the delay in table lookup of the flow classification rule performed by the NP. As shown in FIG. 7, the device in this embodiment includes: a first generation module 510, a second generation module 520, a search module 530, and an output module 540.

The first generation module 510 is configured to, when it is detected that a HASH search instruction is triggered, generate the corresponding original field based on the original field structure and according to the field to be extracted in the original input information; the second generation module 520 is configured to generate the corresponding intermediate field according to the original field and the entry content in the pre-configured system device table (SDT); the search module 530 is configured to search, according to the intermediate field, a preset HASH bucket for at least one target field that matches both the bit width of the field to be extracted and the HASH table width; the output module 540 is configured to output the target field with the highest priority as the output result according to the priority order of the at least one target field.

The device for implementing multi-rule flow classification provided in this embodiment is configured to implement the method for implementing multi-rule flow classification in the embodiment shown in FIG. 1. Its implementation principle and technical effect are similar and are not repeated here.

In one embodiment, the first generation module 510 is configured to combine, according to the original field structure, the address of the SDT table to be searched, the thread number of the current microcode, and the field to be extracted in the original input information to generate the corresponding original field.

In an embodiment, the second generation module 520 includes: a reading unit configured to read the pre-configured SDT table according to the address of the SDT table to be searched in the original field to obtain the corresponding entry content; and a first generation unit configured to combine, based on the intermediate field structure, the entry content, the thread number of the current microcode in the original field, and the field to be extracted to generate the corresponding intermediate field.

In an embodiment, the device for implementing multi-rule flow classification further includes: a determining module configured to, before the at least one target field matching the bit width of the field to be extracted and the HASH table width is searched for in the preset HASH bucket according to the intermediate field, determine the corresponding HASH search strategy according to the multi-HASH flag bit in the intermediate field. The multi-HASH flag bit is used to indicate the number of hash rules for the current multi-HASH search request. The HASH search strategy includes a multi-HASH search strategy.

In one embodiment, when the multi-HASH flag bit is the first value, the search module 530 includes: a search unit configured to, based on the multi-HASH search strategy, look up the corresponding multi-HASH rule index table according to the HASH table id number in the intermediate field; a first determining unit configured to determine, from the preset HASH bucket, the field bit width and HASH table width corresponding to at least one target field according to the HASH table id number in the multi-HASH rule index table; and a second generation unit configured to generate the at least one corresponding target field according to the field bit width and the HASH table width.

In an embodiment, the output module 540 includes: a second determining unit configured to determine the priority order of each target field; and an output unit configured to output the target field with the highest priority as an output result.

In one embodiment, the device for implementing multi-rule flow classification further includes: a storage module configured to store the original input information and at least one output result using a combination of a free pointer queue and RAM.

In one embodiment, the depth of RAM is related to the total number of threads.

Fig. 8 is a schematic structural diagram of a device provided by an embodiment of the present application. As shown in FIG. 8, the device provided by the present application includes: a processor 610 and a memory 620. The number of processors 610 in the device may be one or more. In FIG. 8, one processor 610 is taken as an example. The number of memories 620 in the device may be one or more. In FIG. 8, one memory 620 is taken as an example. The processor 610 and the memory 620 of the device are connected through a bus or in other ways. In FIG. 8, the connection through a bus is taken as an example. In this embodiment, the device is an optical network terminal.

As a computer-readable storage medium, the memory 620 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the device of any embodiment of the present application (for example, the first generation module, the second generation module, the search module, and the output module in the device for implementing multi-rule flow classification). The memory 620 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the device, and the like. In addition, the memory 620 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices. In some examples, the memory 620 may further include memories remotely located with respect to the processor 610, and these remote memories may be connected to the device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The above-provided device can be configured to implement the multi-rule flow classification implementation method provided by any of the above-mentioned embodiments, and has corresponding functions and effects.

The embodiment of the present application also provides a storage medium containing computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to implement a method for implementing multi-rule flow classification. The method includes: when it is detected that a HASH search instruction is triggered, generating the corresponding original field based on the original field structure and according to the field to be extracted in the original input information; generating the corresponding intermediate field according to the original field and the entry content in the pre-configured system device table (SDT); searching, according to the intermediate field, a preset HASH bucket for at least one target field matching the bit width of the field to be extracted and the HASH table width; and, according to the priority order of the at least one target field, outputting the target field with the highest priority as the output result.

Those skilled in the art should understand that the term user equipment encompasses any suitable type of wireless user equipment, such as a mobile phone, a portable data processing device, a portable web browser, or a vehicle-mounted mobile station.

The various embodiments of the present application can be implemented in hardware or dedicated circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device, although the present application is not limited thereto.

The embodiments of the present application may be implemented by a data processor of a mobile device executing computer program instructions, for example, in a processor entity, or by hardware, or by a combination of software and hardware. Computer program instructions can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.

The block diagram of any logic flow in the drawings of the present application may represent program steps, or may represent interconnected logic circuits, modules, and functions, or may represent a combination of program steps and logic circuits, modules, and functions. The computer program can be stored in a memory. The memory can be of any type suitable for the local technical environment and can be implemented using any suitable data storage technology, such as but not limited to read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), and optical memory devices and systems (Digital Video Disc (DVD) or Compact Disk (CD)). Computer-readable media may include non-transitory storage media. The data processor can be of any type suitable for the local technical environment, such as but not limited to general-purpose computers, special-purpose computers, microprocessors, digital signal processors (Digital Signal Processors, DSP), application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic devices (Field-Programmable Gate Array, FPGA), and processors based on a multi-core processor architecture.

Claims

A method for implementing multi-rule flow classification includes:

In the case of detecting that a HASH search instruction is triggered, generating the corresponding original field based on the original field structure and according to the field to be extracted in the original input information;

Generate corresponding intermediate fields according to the original fields and the contents of the entries in the pre-configured system equipment table SDT;

Searching for at least one target field matching the bit width of the field to be extracted and the width of the HASH table from the preset HASH bucket according to the intermediate field;

According to the priority order of the at least one target field, the target field with the highest priority is output as the output result.
The method according to claim 1, wherein the generating the corresponding original field based on the original field structure and according to the field to be extracted in the original input information comprises:

According to the original field structure, the address of the SDT table to be searched, the thread number of the current microcode, and the field to be extracted in the original input information are combined to generate the corresponding original field.
The method according to claim 1, wherein the generating the corresponding intermediate field according to the original field and the entry content in the pre-configured system equipment table SDT comprises:

Read the pre-configured SDT table according to the address of the SDT table to be searched in the original field to obtain the corresponding table entry content;

Based on the intermediate field structure, the entry content, the thread number of the current microcode in the original field, and the field to be extracted are combined to generate a corresponding intermediate field.
The method according to claim 1, before said searching for at least one target field matching the bit width of the field to be extracted and the width of the HASH table from a preset HASH bucket according to the intermediate field, further comprising:

Determining the corresponding HASH search strategy according to the multi-HASH flag bit in the intermediate field, wherein the multi-HASH flag bit is used to indicate the number of hash rules of the current multi-HASH search request, and the HASH search strategy includes a multi-HASH search strategy.
The method according to claim 4, wherein, in the case that the multi-HASH flag bit is the first value, the searching for at least one target field matching the bit width of the field to be extracted and the width of the HASH table from a preset HASH bucket according to the intermediate field includes:

Based on the multi-HASH search strategy, searching the corresponding multi-HASH rule index table according to the HASH table identification (id) number in the intermediate field;

Determining the field bit width and the HASH table width corresponding to the at least one target field from the preset HASH bucket according to the HASH table id number in the multiple HASH rule index table;

The at least one target field is generated according to the bit width of the field and the width of the HASH table.
The method according to claim 1, wherein the outputting the target field with the highest priority as the output result according to the priority order of the at least one target field comprises:

Determine the priority order of each target field;

The target field with the highest priority is output as the output result.
The method according to any one of claims 1-6, further comprising:

The free pointer queue and the random access memory RAM are combined to store the original input information and at least one output result.
The method according to claim 7, wherein the depth of the RAM is related to the total number of threads.
A device, including: a memory, and at least one processor;

The memory is configured to store at least one program;

When the at least one program is executed by the at least one processor, the at least one processor is enabled to implement the method according to any one of claims 1-8.
A storage medium storing a computer program, which implements the method of any one of claims 1-8 when the computer program is executed by a processor.