WO2015184706A1 - Statistical counting device and implementation method thereof, and system having a statistical counting device - Google Patents

Statistical counting device and implementation method thereof, and system having a statistical counting device

Publication number
WO2015184706A1
Authority
WO
WIPO (PCT)
Prior art keywords
statistical
request
data
packet
increment
Prior art date
Application number
PCT/CN2014/087236
Other languages
English (en)
French (fr)
Inventor
孙远航
张炜
李彧
王志忠
刘衡祁
王晓明
Original Assignee
中兴通讯股份有限公司
深圳市中兴微电子技术有限公司
Application filed by 中兴通讯股份有限公司 and 深圳市中兴微电子技术有限公司
Publication of WO2015184706A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks

Definitions

  • the present invention relates to statistical technologies in the field of data communications, and in particular, to a statistical counting device and an implementation method thereof, and a system having a statistical counting device.
  • the interface rate of the core router used for backbone network interconnection reaches 100 Gbps, and the traffic management (TM) and operation management and maintenance (OAM) functions supported by the network processing chip.
  • TM traffic management
  • OAM operation management and maintenance
  • in existing solution 2, a plug-in (external) memory, such as an SRAM or a Synchronous Dynamic Random Access Memory (SDRAM), is disposed outside the network processing chip, and the calculated statistical count value is written into the external memory.
  • a plug-in memory such as an SRAM or a Synchronous Dynamic Random Access Memory (SDRAM)
  • SDRAM Synchronous Dynamic Random Access Memory
  • when the counting module built into the network processing chip is used together with an external memory such as SRAM, the result of the counting module must be transmitted to the SRAM for storage, and the frequent interaction between the two inevitably occupies bandwidth, which leaves little access bandwidth.
  • embodiments of the present invention are intended to provide a statistical counting device, an implementation method thereof, and a system with a statistical counting device, which can implement a statistical counting function while avoiding an increase in the manufacturing cost of the network processing chip and the problem of small access bandwidth.
  • a statistical counting device is provided on the outside of the network processing chip, and the device includes:
  • a receiving unit configured to receive a statistical request sent by the network processing chip
  • a parsing unit configured to parse the statistical request to obtain a type of the statistical request and an increment of the statistical request
  • a statistical unit configured to convert the type of the statistical request and the increment of the statistical request into an address of the storage unit and a data calculation increment according to the preset configuration; send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment, and write the obtained statistical result into the corresponding storage unit;
  • a storage unit configured to store data, receive the read data request, respond to the read data request, and return the read data to the statistical unit.
  • the statistical counting device further includes:
  • a configuration unit configured to receive an access request sent by the main CPU and, in response to the access request, obtain the statistical result from the storage unit via the statistical unit and provide it to the main CPU for use.
  • the receiving unit further includes:
  • the high speed interface module is configured to use a physical link formed by a high speed serializer/deserializer Serdes interface to receive the statistical request in conjunction with the high speed transmission protocol Interlaken.
  • the high speed interface module further includes:
  • a Serdes conversion submodule configured to perform serial to parallel conversion of high speed data and convert the statistical request from serial data to parallel data transmission
  • the Interlaken protocol sub-module is configured to restore the parallel data transmitted by the Serdes conversion sub-module into Interlaken-format request packets according to the Interlaken format.
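  • The serial-to-parallel conversion attributed to the Serdes conversion submodule can be illustrated in software. This is only a minimal sketch: the function name `deserialize`, the MSB-first bit order, and the default word width are assumptions made for illustration, and real Serdes hardware additionally handles clock recovery, lane alignment, and line coding, none of which is modelled here.

```python
from typing import Iterable, List


def deserialize(bits: Iterable[int], width: int = 8) -> List[int]:
    """Group a serial bit stream (assumed MSB first) into parallel words.

    `width` is a hypothetical parallel bus width; the source does not state one.
    """
    words: List[int] = []
    current = 0
    count = 0
    for bit in bits:
        # Shift the new bit into the word being assembled.
        current = (current << 1) | (bit & 1)
        count += 1
        if count == width:
            words.append(current)
            current, count = 0, 0
    # Trailing bits that do not fill a whole word are dropped in this sketch.
    return words
```

  • For example, the 16-bit stream 10100001 11111111 yields the two parallel words 0xA1 and 0xFF.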
  • the parsing unit further includes:
  • the packet parsing module is configured to parse the statistical request according to an agreed format, and obtain a type of the statistical request and an increment of the statistical request;
  • the statistical request is encapsulated by the network processing chip according to the agreed format
  • the agreed format is based on Interlaken encapsulation, and its basic unit is a statistical packet slice.
  • the receiving unit further includes a high-speed interface module configured to receive the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface, in conjunction with the high-speed transmission protocol Interlaken;
  • the high speed interface module further includes:
  • a Serdes conversion submodule configured to perform serial to parallel conversion of high speed data and convert the statistical request from serial data to parallel data transmission
  • an Interlaken protocol submodule configured to restore the parallel data transmitted by the Serdes conversion submodule into Interlaken-format request packets according to the Interlaken format;
  • the parsing unit further includes:
  • the parsing sub-module is configured to obtain any one Interlaken-format request packet, read the request packet according to the valid flag bit of the statistical packet slice, and, when the valid flag bit matches, parse the obtained statistical packet slice, until all statistical packet slices included in the request packet have been read and parsed.
  • the parsing unit further includes:
  • a cache submodule configured to store all request data packets to be parsed
  • the parsing sub-module is further configured to read the current request packet according to the valid flag of the statistical packet slice to obtain the statistical packet slices and, after all statistical packet slices in the current request packet have been processed, extract the next request packet from the cache submodule;
  • the statistical packet slice is parsed, and the type of the statistical request obtained includes: an ID number that distinguishes different statistical services, a statistical pair number supported by the statistical service, and a statistical item supported by the statistical service.
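  • A sketch of slice-by-slice parsing guided by a valid flag may make the parsing unit easier to follow. The patent text does not disclose the slice layout, so everything concrete here is an assumption: an 8-byte slice whose first byte carries the valid flag in bit 7 and a 7-bit service ID, followed by a 16-bit pair number, an 8-bit statistical item, and a 32-bit increment. The names `StatSlice` and `parse_request_packet` are likewise hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List

SLICE_FORMAT = ">BHBI"                      # hypothetical big-endian slice layout
SLICE_SIZE = struct.calcsize(SLICE_FORMAT)  # 8 bytes under this assumption
VALID_FLAG = 0x80                           # assumed position of the valid flag bit


@dataclass
class StatSlice:
    service_id: int   # ID number that distinguishes statistical services
    pair_no: int      # statistical pair number supported by the service
    item: int         # statistical item supported by the service
    increment: int    # statistical increment


def parse_request_packet(payload: bytes) -> List[StatSlice]:
    """Read every slice in one request packet and keep only the valid ones."""
    slices: List[StatSlice] = []
    for offset in range(0, len(payload) - SLICE_SIZE + 1, SLICE_SIZE):
        head, pair_no, item, increment = struct.unpack_from(SLICE_FORMAT, payload, offset)
        if not head & VALID_FLAG:
            continue  # slot is empty or padding: skip it
        slices.append(StatSlice(head & 0x7F, pair_no, item, increment))
    return slices
```

  • The parser walks the whole payload before returning, which mirrors the requirement that all statistical packet slices of a request packet are read and parsed.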
  • the statistical unit further includes:
  • a statistical pre-processing module configured to convert the type of the statistical request and the increment of the statistical request into an address of the storage unit and a data calculation increment according to a preset configuration
  • a statistical calculation module configured to send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment, and write the obtained statistical result into the corresponding storage unit.
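  • The read, count, and write-back sequence performed by the statistical calculation module is a read-modify-write on the storage unit. The sketch below models it in software under stated assumptions: the class `StorageUnit`, the function `count`, the counter width, and the wrap-around behaviour on overflow are all illustrative choices that the source does not specify.

```python
class StorageUnit:
    """Toy model of the built-in storage unit (for example SRAM)."""

    def __init__(self, depth: int, width_bits: int = 64) -> None:
        self._cells = [0] * depth
        self._mask = (1 << width_bits) - 1  # assumed counter width

    def read(self, addr: int) -> int:
        return self._cells[addr]

    def write(self, addr: int, value: int) -> None:
        # Assumption: counters wrap around when they overflow.
        self._cells[addr] = value & self._mask


def count(storage: StorageUnit, addr: int, increment: int) -> int:
    """Read the stored value, add the increment, write the result back."""
    old_value = storage.read(addr)        # read data request
    new_value = old_value + increment     # statistical counting operation
    storage.write(addr, new_value)        # write the statistical result back
    return storage.read(addr)
```

  • With an 8-bit counter, adding 200 and then 100 to the same address leaves 44 in the cell, because 300 wraps modulo 256 in this sketch.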
  • the receiving unit further includes:
  • a high-speed interface module that uses the high-speed serializer/deserializer Serdes interface to cooperate with the high-speed transmission protocol Interlaken to receive the statistical request;
  • the high speed interface module further includes:
  • a Serdes conversion sub-module configured to perform serial-to-parallel conversion of high-speed data and to convert the statistical request from serial data to parallel data for transmission;
  • an Interlaken protocol submodule configured to restore the parallel data transmitted by the Serdes conversion submodule into Interlaken-format request packets according to the Interlaken format;
  • the parsing unit further includes:
  • the parsing sub-module is configured to obtain any one Interlaken-format request packet, read the request packet according to the valid flag bit of the statistical packet slice, and, when the valid flag bit matches, parse the obtained statistical packet slice; the type of the statistical request obtained by the parsing includes: an ID number that distinguishes different statistical services, a statistical pair number supported by the statistical service, and a statistical item supported by the statistical service; and the increment of the statistical request obtained by the parsing includes a statistical increment;
  • the statistical preprocessing module further includes:
  • a pre-processing sub-module configured to obtain the ID number of the different statistical services, the number of statistical pairs supported by the statistical service, the statistics items supported by the statistical service, the statistical increment, and the preset configuration
  • the statistical calculation module further includes:
  • the instruction selection sub-module is configured to respond to any one of the statistical request sent by the network processing chip and the access request sent by the main CPU according to the preset scheduling rule, and perform corresponding statistical counting processing or statistical result access processing;
  • an instruction cache submodule configured to cache instructions and to fetch the next instruction after the current instruction has finished executing, where the instructions include at least one of: an instruction to send a read data request after querying the corresponding storage unit according to the address of the storage unit, an instruction to write the obtained statistical result back to the corresponding storage unit, and an instruction to provide the statistical result to the main CPU for access;
  • the calculating submodule is configured to query the corresponding storage unit according to the address of the storage unit, issue a read data request instruction, and perform a statistical counting operation according to the returned read data and the statistical increment.
  • the calculation sub-module can be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or a Field-Programmable Gate Array (FPGA).
  • CPU Central Processing Unit
  • DSP Digital Signal Processor
  • FPGA Field-Programmable Gate Array
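  • The instruction selection sub-module arbitrates between statistical requests from the network processing chip and access requests from the main CPU according to a preset scheduling rule. The source does not say which rule is used; the sketch below assumes simple round-robin between the two sources, and the class name `InstructionSelector` and the source labels "stat" and "cpu" are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class InstructionSelector:
    """Round-robin arbiter between two request sources (assumed rule)."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[object]] = {"stat": deque(), "cpu": deque()}
        self._order = ["stat", "cpu"]
        self._next = 0  # index of the source that gets the next turn

    def push(self, source: str, instruction: object) -> None:
        self._queues[source].append(instruction)

    def pop(self) -> Optional[Tuple[str, object]]:
        """Return the next (source, instruction) pair, or None if all queues are empty."""
        for step in range(len(self._order)):
            index = (self._next + step) % len(self._order)
            source = self._order[index]
            if self._queues[source]:
                # The other source gets priority on the following call.
                self._next = (index + 1) % len(self._order)
                return source, self._queues[source].popleft()
        return None
```

  • A strict-priority rule (for example, always serving statistical requests first) would be an equally plausible reading; round-robin is chosen here only because it keeps either source from starving the other.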
  • a method for implementing a statistical counting according to an embodiment of the present invention includes:
  • the statistical counting device receives the statistical request sent by the network processing chip; the statistical counting device is disposed outside the network processing chip;
  • the statistical counting device parses the statistical request, and obtains the type of the statistical request and the increment of the statistical request;
  • the statistical counting device converts the type of the statistical request and the increment of the statistical request into an address of the built-in memory and a data calculation increment according to a preset configuration; the statistical counting device sends a read data request to the corresponding memory according to the address of the memory, performs the statistical counting operation on the read data returned by the memory and the data calculation increment, and writes the obtained statistical result into the corresponding memory.
  • the method further includes:
  • the statistical counting device receives the access request sent by the main CPU
  • the statistical result is obtained from the storage unit via the statistical unit and provided to the primary CPU for use.
  • the statistical counting device receives the statistical request sent by the network processing chip, and includes:
  • the statistical counting device uses a physical link formed by a high speed serializer/deserializer Serdes interface to receive the statistical request in conjunction with the high speed transmission protocol Interlaken.
  • the method further includes:
  • the statistical counting device converts the statistical request from serial data to parallel data transmission
  • the statistical counting device restores the transmitted parallel data into Interlaken-format request packets according to the Interlaken format.
  • the statistical request is an Interlaken-format request packet encapsulated by the network processing chip according to an agreed format
  • the method further includes: the statistical counting device parsing the statistical request according to the agreed format, and obtaining a type of the statistical request and an increment of the statistical request.
  • any request packet of the Interlaken format includes a plurality of statistical packet slices
  • the method further includes:
  • the current request packet is read according to the valid flag of the statistical packet slice to obtain the statistical packet slices, and after all statistical packet slices in the current request packet have been processed, the next request packet is extracted from the cache.
  • the type of the statistical request includes: an ID number that distinguishes different statistical services, a statistical pair number supported by the statistical service, and a statistical item supported by the statistical service;
  • the increment of the statistical request includes a statistical increment.
  • the statistical counting device converting the type of the statistical request and the increment of the statistical request into an address of the built-in memory and a data calculation increment according to a preset configuration includes:
  • calculating a memory target address from the statistical pair number supported by the statistical service and the configuration information, and using it as the address of the built-in memory;
  • the statistical counting device sending a read data request to the corresponding memory according to the address of the memory, performing a statistical counting operation on the read data returned by the memory and the statistical increment, and writing the obtained statistical result into the corresponding memory includes:
  • the statistical counting device queries the corresponding memory according to the address of the built-in memory, issues a read data request instruction, and then performs the statistical counting operation according to the returned read data and the data calculation increment.
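  • The conversion from request type to memory address can be sketched as a table lookup plus an offset calculation. The source only says that the address is derived from the request fields and the preset configuration; the specific formula below (a per-service base address plus pair number times items per pair plus item index) is an assumption, as are the names `ServiceConfig` and `target_address`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical preset configuration for one statistical service."""
    base_addr: int        # first memory address reserved for this service
    items_per_pair: int   # number of statistical items kept per pair number


def target_address(config: Dict[int, ServiceConfig],
                   service_id: int, pair_no: int, item: int) -> int:
    """Map (service ID, pair number, item) to a memory target address."""
    cfg = config[service_id]
    if not 0 <= item < cfg.items_per_pair:
        raise ValueError("statistical item outside the configured range")
    # Assumed layout: counters of one pair are stored contiguously.
    return cfg.base_addr + pair_no * cfg.items_per_pair + item
```

  • With a base address of 0x1000 and four items per pair, pair 17 and item 1 map to 0x1000 + 17 * 4 + 1 = 0x1045.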
  • a system with a statistical counting device, comprising a statistical counting device, the system further comprising any one of a network processing chip and a main CPU;
  • the network processing chip is configured to send a statistical request to the statistical counting device
  • the primary CPU is configured to send an access request to the statistical counting device
  • the statistical counting device is the statistical counting device according to any one of the above aspects.
  • the statistical counting device is disposed outside the network processing chip, and the device includes: a receiving unit configured to receive a statistical request sent by the network processing chip; a parsing unit configured to parse the statistical request to obtain the type of the statistical request and the increment of the statistical request; a statistical unit configured to convert the type of the statistical request and the increment of the statistical request into an address of the storage unit and a data calculation increment according to the preset configuration, send a read data request to the corresponding storage unit according to the address of the storage unit, perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment, and write the obtained statistical result into the corresponding storage unit; and a storage unit configured to store data, receive the read data request, respond to the read data request, and return the read data to the statistical unit, so that the statistical unit can perform the statistical counting operation based on the read data.
  • because the statistical counting device is disposed outside the network processing chip and has a built-in storage unit such as SRAM, the statistical counting can be completed independently by the units inside the statistical counting device, and the result is returned directly to the built-in storage unit for storage. This architecture not only implements the statistical counting function at high speed, but also avoids the increased manufacturing cost of the network processing chip and the small access bandwidth caused by the prior-art architectures described above.
  • FIG. 1 is a schematic structural diagram of a statistical counting device of the present invention
  • FIG. 2 is a schematic diagram showing an implementation flow of a method for implementing statistical counting according to the present invention
  • FIG. 3 is a schematic structural diagram of a system of the present invention.
  • FIG. 4 is a schematic diagram of a system architecture corresponding to the first embodiment of the present invention.
  • FIG. 5 is a flowchart of implementing a method according to a first embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an internal implementation of an SST according to a first embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an internal implementation architecture of a HIF according to a first embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the internal implementation architecture and data flow of a UPK unit according to a first embodiment of the present invention;
  • FIG. 9 is a flowchart of implementing internal calculation of a PRE according to a first embodiment of the present invention.
  • FIG. 10 is a schematic diagram of the internal implementation architecture and working flow of a STAT according to a first embodiment of the present invention;
  • FIG. 11 is a schematic diagram of a system architecture corresponding to a second embodiment of the present invention in application scenario 2;
  • FIG. 12 is a flowchart of implementing a method according to a second embodiment of the present invention.
  • the solution of the embodiment of the present invention implements high-speed serial statistical counting using an underlying high-speed Serdes interface, the upper-layer interface protocol Interlaken, a built-in storage unit such as SRAM, and statistical counting calculation logic. According to the agreed request packet format, the device receives a request from the network processing chip or the host CPU (a statistical request from the network processing chip, an access request from the host CPU), reads the existing counting information from the storage unit such as the SRAM, performs the specified counting operation, and writes the statistical counting result back to the storage unit such as the SRAM for storage.
  • the user can read the statistical counting result through the direct memory access (DMA) interface. The device thus implements a series of functions: receiving and parsing requests, counting, and reading and writing the storage unit such as SRAM.
  • DMA direct memory access
  • the embodiment of the invention can implement high-performance, large-capacity statistical counting. It adopts a plug-in statistical counting device disposed outside the network processing chip, which completes the statistical counting function independently, and the storage unit used for storing the statistical counting results, such as SRAM or SDRAM, is also built into the plug-in statistical counting device. This addresses the capacity and performance problems of the network processing chip in existing designs: the SRAM inside the network processing chip is not occupied, which saves manufacturing cost of the network processing chip, and no external SDRAM is used, so the problem of small access bandwidth does not arise.
  • Serdes is a general term for a serializer (SERializer) and a deserializer (DESerializer);
  • Interlaken is a new-generation packet interconnection protocol. It is a scalable protocol that supports chip-to-chip packet transfer from 10 Gbps to 100 Gbps and above, meeting current design demands for greater bandwidth and higher performance.
  • a statistical counting device is provided in the embodiment of the present invention.
  • the statistical counting device is disposed outside the network processing chip. As shown in FIG. 1 , the statistical counting device includes:
  • a receiving unit configured to receive a statistical request sent by the network processing chip
  • a parsing unit configured to parse the statistical request to obtain a type of the statistical request and an increment of the statistical request
  • a statistical unit configured to convert the type of the statistical request and the increment of the statistical request into an address of the storage unit and a data calculation increment according to the preset configuration; send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment, and write the obtained statistical result into the corresponding storage unit;
  • a storage unit configured to store data, receive the read data request, respond to the read data request, and return the read data to the statistical unit, so that the statistical unit can perform the statistical counting operation based on the read data.
  • the statistical counting device further includes:
  • a configuration unit configured to receive an access request sent by the primary CPU and, in response to the access request, obtain the statistical result from the storage unit via the statistical unit and provide it to the primary CPU.
  • the receiving unit further includes:
  • a high-speed interface module configured to receive the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface, in conjunction with the high-speed transmission protocol Interlaken.
  • the high speed interface module further includes:
  • a Serdes conversion submodule configured to perform serial to parallel conversion of high speed data and convert the statistical request from serial data to parallel data transmission
  • the Interlaken protocol sub-module is configured to restore the parallel data transmitted by the Serdes conversion sub-module into Interlaken-format request packets according to the Interlaken format.
  • the parsing unit further includes:
  • the packet parsing module is configured to parse the statistical request according to an agreed format, and obtain a type of the statistical request and an increment of the statistical request;
  • the statistical request is encapsulated by the network processing chip according to the agreed format
  • the agreed format is based on Interlaken encapsulation, and its basic unit is a statistical packet slice.
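  • On the sending side, the network processing chip packs statistical packet slices into a request packet in the agreed format. The sketch below shows only how slices might be laid out in a packet payload; it does not model Interlaken framing (control words, CRC, and so on). The 8-byte slice layout, the valid flag position, the fixed number of slice slots per packet, and the function name `encapsulate` are all assumptions, since the source does not disclose the agreed format.

```python
import struct
from typing import Iterable, Tuple

SLICE_FORMAT = ">BHBI"   # hypothetical: flag+service ID, pair number, item, increment
VALID_FLAG = 0x80        # assumed valid flag bit in the first byte of a slice


def encapsulate(requests: Iterable[Tuple[int, int, int, int]],
                slices_per_packet: int = 4) -> bytes:
    """Pack (service_id, pair_no, item, increment) tuples into one packet payload."""
    reqs = list(requests)
    if len(reqs) > slices_per_packet:
        raise ValueError("too many slices for one request packet")
    body = b"".join(
        struct.pack(SLICE_FORMAT, VALID_FLAG | (service_id & 0x7F), pair_no, item, increment)
        for service_id, pair_no, item, increment in reqs
    )
    # Unused slots are zero-filled, so their valid flag is clear.
    empty_slot = bytes(struct.calcsize(SLICE_FORMAT))
    return body + empty_slot * (slices_per_packet - len(reqs))
```

  • Zero-filling unused slots is one way a receiver could distinguish real slices from padding by checking the valid flag alone.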
  • the high speed interface module further includes:
  • a Serdes conversion submodule configured to perform serial to parallel conversion of high speed data and convert the statistical request from serial data to parallel data transmission
  • an Interlaken protocol submodule configured to restore the parallel data transmitted by the Serdes conversion submodule into Interlaken-format request packets according to the Interlaken format;
  • the parsing unit further includes:
  • the parsing sub-module is configured to obtain any one Interlaken-format request packet, read the request packet according to the valid flag bit of the statistical packet slice, and, when the valid flag bit matches, parse the obtained statistical packet slice, until all statistical packet slices included in the request packet have been read and parsed.
  • the parsing unit further includes:
  • a cache submodule configured to store all request data packets to be parsed
  • the parsing sub-module is further configured to read the current request packet according to the valid flag of the statistical packet slice to obtain the statistical packet slices and, after all statistical packet slices in the current request packet have been processed, extract the next request packet from the cache submodule;
  • the statistical packet slice is parsed, and the type of the statistical request obtained includes: an ID number that distinguishes different statistical services, a statistical pair number supported by the statistical service, and a statistical item supported by the statistical service.
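  • The interplay between the cache submodule and the parsing sub-module amounts to draining a FIFO of request packets one packet at a time. The sketch below assumes that each cached packet is already split into (valid, slice) pairs; that representation and the function name `drain` are illustrative only.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple


def drain(cache: Deque[List[Tuple[bool, object]]]) -> Iterator[object]:
    """Yield valid slices packet by packet, in arrival order.

    The next packet is taken from the cache only after every slice of the
    current packet has been handled.
    """
    while cache:
        packet = cache.popleft()          # extract the next request packet
        for valid, stat_slice in packet:  # walk all slices of this packet
            if valid:
                yield stat_slice
```

  • Because `drain` is a generator, downstream counting logic can consume slices one by one while later packets wait in the cache.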
  • the statistical unit further includes:
  • a statistical pre-processing module configured to convert the type of the statistical request and the increment of the statistical request into an address of the storage unit and a data calculation increment according to a preset configuration
  • a statistical calculation module configured to send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment, and write the obtained statistical result into the corresponding storage unit.
  • the high speed interface module further includes:
  • a Serdes conversion submodule configured to perform serial to parallel conversion of high speed data and convert the statistical request from serial data to parallel data transmission
  • an Interlaken protocol submodule configured to restore the parallel data transmitted by the Serdes conversion submodule into Interlaken-format request packets according to the Interlaken format;
  • the parsing unit further includes:
  • the parsing sub-module is configured to obtain any one Interlaken-format request packet, read the request packet according to the valid flag bit of the statistical packet slice, and, when the valid flag bit matches, parse the obtained statistical packet slice; the type of the statistical request obtained by the parsing includes: an ID number that distinguishes different statistical services, a statistical pair number supported by the statistical service, and a statistical item supported by the statistical service; the increment of the statistical request obtained by the parsing includes a statistical increment;
  • the statistical preprocessing module further includes:
  • a pre-processing sub-module configured to obtain the ID number of the different statistical services, the number of statistical pairs supported by the statistical service, the statistics items supported by the statistical service, the statistical increment, and the preset configuration
  • the statistical calculation module further includes:
  • the instruction selection sub-module is configured to respond to any one of the statistical request sent by the network processing chip and the access request sent by the main CPU according to the preset scheduling rule, and perform corresponding statistical counting processing or statistical result access processing;
  • the instruction cache submodule is configured to cache instructions and to fetch the next instruction after the current instruction has finished executing, where the instructions include at least one of: an instruction to send a read data request after querying the corresponding storage unit according to the address of the storage unit, an instruction to write the obtained statistical result back to the corresponding storage unit, and an instruction to provide the statistical result to the main CPU;
  • the calculating submodule is configured to query the corresponding storage unit according to the address of the storage unit, issue a read data request instruction, and perform a statistical counting operation according to the returned read data and the statistical increment.
  • A method for implementing statistical counting provided by an embodiment of the present invention is shown in FIG. 2, and the method includes:
  • Step 101 The statistical counting device receives a statistical request sent by the network processing chip; the statistical counting device is disposed outside the network processing chip;
  • Step 102 The statistical counting device parses the statistical request, and obtains a type of the statistical request and an increment of the statistical request.
  • Step 103 The statistical counting device converts the type of the statistical request and the increment of the statistical request into an address of the built-in memory and a data calculation increment according to a preset configuration.
  • Step 104 The statistical counting device sends a read data request to the corresponding memory according to the address of the memory, performs a statistical counting operation on the read data returned by the memory and the data calculation increment, and writes the obtained statistical result into the corresponding memory.
  • the method further includes:
  • the statistical counting device receives the access request sent by the main CPU
  • the statistical result is obtained from the storage unit via the statistical unit and provided to the primary CPU for use.
  • the statistical counting device receives the statistical request sent by the network processing chip, and includes:
  • the statistical counting device uses a physical link formed by a high speed serializer/deserializer Serdes interface to receive the statistical request in conjunction with the high speed transmission protocol Interlaken.
  • the method further includes:
  • the statistical counting device converts the statistical request from serial data to parallel data transmission
  • the statistical counting device restores the transmitted parallel data into Interlaken-format request packets according to the Interlaken format.
  • the statistical request is an Interlaken-format request packet encapsulated by the network processing chip according to an agreed format
  • the method further includes: the statistical counting device parsing the statistical request according to the agreed format, and obtaining a type of the statistical request and an increment of the statistical request.
  • the request packet of the Interlaken format includes a plurality of statistical packet slices.
  • the method further includes:
  • the statistical packet slices are obtained from the currently read request packet according to the valid flag of each statistical packet slice, and after all statistical packet slices in the currently read request packet have been processed, the next request packet is extracted from the cache.
  • the type of the statistical request includes: an ID number that distinguishes different statistical services, the statistical queue number supported by the statistical service, and the statistical item supported by the statistical service;
  • the increment of the statistical request includes a statistical increment.
  • the statistical counting device converting the type of the statistical request and the increment of the statistical request into an address of the built-in memory and a data calculation increment according to a preset configuration includes:
  • the statistical counting device sending a read data request to the corresponding memory according to the address of the memory, performing a statistical counting operation on the read data returned by the memory and the statistical increment, and writing the obtained statistical result into the corresponding memory includes:
  • the statistical counting device queries the corresponding memory according to the address of the built-in memory, and after issuing an instruction to read the data request, performs a statistical counting operation according to the returned read data and the data calculation increment.
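  • The conversion and read-modify-write flow of Steps 103 and 104 can be illustrated with a minimal Python sketch. The mapping table `TYPE_TO_ADDRESS`, the function names, and the use of a dictionary as the memory are assumptions made for illustration only; the description does not specify them.

```python
from typing import Dict, Tuple

# Hypothetical preset configuration: request type -> memory address.
TYPE_TO_ADDRESS: Dict[str, int] = {"tm_enqueue": 0x10, "oam": 0x20}


def convert(request_type: str, increment: int) -> Tuple[int, int]:
    """Step 103: map (type, increment) to (address, data calculation increment)."""
    return TYPE_TO_ADDRESS[request_type], increment


def count(memory: Dict[int, int], request_type: str, increment: int) -> int:
    """Step 104: read the stored value, add the increment, write the result back."""
    address, delta = convert(request_type, increment)
    old_value = memory.get(address, 0)   # read data request
    new_value = old_value + delta        # statistical counting operation
    memory[address] = new_value          # write the result back
    return new_value


memory: Dict[int, int] = {}
count(memory, "tm_enqueue", 64)
count(memory, "tm_enqueue", 128)
```

  • In this sketch the two requests accumulate in the same counter because they share a request type and therefore resolve to the same address.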
  • a system with a statistical counting device is provided in the embodiment of the present invention. As shown in FIG. 3, the system includes: a statistical counting device, and the system further includes any one of a network processing chip and a main CPU;
  • the network processing chip is configured to send a statistical request to the statistical counting device
  • the main CPU is configured to send an access request to the statistical counting device
  • the statistical counting device is the statistical counting device according to any one of the above aspects.
  • In the embodiment of the present invention, the interface to the network processing chip is specifically implemented as a high-speed interface module (HIF), the parsing unit is a packet parsing module (UPK), and the statistical unit comprises a statistical preprocessing module (PRE) and a statistical calculation module (STAT); the device also has a storage unit (MEM) and a configuration management unit (CFM).
  • the following units and modules may be included, but are not limited to the units and modules described herein.
  • the statistical counting device is independent of the network processing chip and has a built-in storage unit, such as a static random access memory (SRAM) or a synchronous dynamic random access memory (SDRAM).
  • The HIF is used to interact with the network processing chip. It uses the underlying high-speed serial interface, such as the Serdes interface, together with the upper-layer high-speed transport protocol Interlaken, to receive statistical requests sent by the network processing chip.
  • The UPK is used to parse the statistical request sent by the external network processing chip according to the agreed format of the statistical request data packet, producing two parsing results: the request type of the statistical request and the increment of the statistical request.
  • The PRE is used to receive the parsing results of the packet parsing module and convert them into a built-in SRAM address and a data calculation increment. The conversion is: according to the mapping relationship between the request type and the SRAM address, query the SRAM address corresponding to the type of the statistical request, so that the SRAM can be addressed with that SRAM address. The request type is thus used for addressing, while the data request increment is used for the subsequent statistical counting.
  • The STAT is used to send a read request to the built-in SRAM according to the SRAM address sent by the PRE, so as to read data from the SRAM for statistical counting; it performs the statistical counting operation on the return data read from the SRAM and the data request increment, and finally writes the statistical calculation result back to the corresponding address in the SRAM.
  • The STAT can also be used to process the read-counter-value command of the Host CPU, so that the counter values can be provided to the user through the Host CPU.
  • The SRAM is used to store the statistical counting information, including the existing count values and the results calculated from newly added count values and the existing count values.
  • To ensure the access bandwidth, the storage unit uses on-chip SRAM as the storage medium. The storage capacity and the number of groups can be designed as needed (the number of groups determines the number of access ports, which supports simultaneous counting of multiple counters).
  • In addition, from the perspective of ASIC implementation, the on-chip SRAM can be composed of multiple small SRAM blocks connected in series as a pipeline, which ensures the feasibility of the ASIC implementation without affecting the access performance.
  • CFM is used to receive the configuration command sent by the Host CPU, access the corresponding register configured by itself according to the configuration command, and write the corresponding configuration item.
  • CFM also includes a DMA module to provide a way for the Host CPU to quickly read the statistical counter value.
  • the statistic counter is located in the statistic counting device and can be located in the statistic calculation module of the statistic counting device as a basic computing tool.
  • the HIF may include:
  • the Serdes conversion sub-module is configured to perform a serial-to-parallel conversion function of high-speed data to convert serial data into parallel data;
  • the Interlaken protocol sub-module is configured to encapsulate the parallel data sent by the Serdes conversion sub-module into the data packet format of a statistical count data packet according to an interconnection protocol optimized for high-bandwidth, reliable packet transmission, such as the Interlaken protocol.
  • the Interlaken protocol sub-module may be a single set of Interlaken components, corresponding to one Interlaken access port, so that the statistical counting device serves as the plug-in counting chip of one main network processing chip;
  • the Interlaken protocol sub-module may also be a plurality of sets of Interlaken components, corresponding to multiple Interlaken access ports, so that multiple main network processing chips share one plug-in counting chip, which saves system-level cost while satisfying the counting requirement.
  • the UPK may include:
  • the buffer sub-module is configured to receive the data packet in the Interlaken format from the high-speed interface module. Because the data packet includes a plurality of statistical packet slices, while all processing modules and sub-modules of the statistical counting device other than the high-speed interface module and the packet parsing module use the statistical packet slice as the minimum processing unit, there is a processing-rate difference; the buffer sub-module is arranged to absorb this rate difference and thus performs rate adaptation.
  • the parsing sub-module is configured to take one data packet from the buffer sub-module, cut it into a plurality of statistical packet slices according to the slice width of the statistical packet slice, and send the valid slices, as indicated by the valid flag of each statistical packet slice, to the subsequent modules and sub-modules for processing, one statistical packet slice at a time; after all the valid slices in the current data packet have been sent, a new data packet is taken from the buffer sub-module.
  • the statistical packet slice is parsed according to the format of the statistical packet slice to obtain the ID number distinguishing different statistical services, the statistical queue number supported by the service (Qnum), the statistical increment, and the statistical item supported by the service, which provides the information that subsequent units need to access the storage unit, such as the SRAM blocks, and that the statistical calculation unit needs to calculate statistical values.
  • the PRE is configured to receive the parsed content of the parsing sub-module, including the ID number distinguishing different statistical services, the statistical queue number supported by the service (Qnum), the statistical increment, and the statistical item supported by the service,
  • and to read the configuration information, including the statistical rules, that the configuration management unit has written into the corresponding registers as configuration items;
  • the STAT includes:
  • the instruction selection sub-module is configured to select whether to execute a statistical counting request sent by the network processing chip or a DMA read access request sent by the host CPU, and the scheduling rule is configured by the user;
  • An instruction cache sub-module for buffering a certain number of SRAM access addresses, write-back data, and DMA flags
  • a calculation sub-module (ALU) for issuing a read access command to the SRAM based on the SRAM access address provided by the PRE. After the read data is returned, a mathematical operation is performed on the return data and the data increment provided by the statistical preprocessing module, and finally the calculated result is written back to the corresponding address of the SRAM.
  • the STAT is further configured to check whether the new SRAM access address matches an address already in the Cache, that is, whether multiple requests access the same address of the SRAM. If so, the instructions are merged according to certain rules; without instruction merging, read and write errors can easily occur.
  • For a DMA read access request, a read command is issued to the SRAM, and the read return data is returned to the configuration management unit.
  • the embodiment of the present invention has the following main contents from the implementation of the specific application of the method:
  • the statistical counting device receives a request sent by another chip, for example, receiving a statistical request of the network processing chip, and the statistical counting device performs statistical counting. Receiving the DMA read access request sent by the Host CPU, the statistical counting device provides the statistical counting result to the Host CPU for use.
  • the method includes:
  • the statistical request is parsed according to the format of the agreed statistical data packet, and is addressed to the storage unit according to the parsing result, and the data is read from the storage unit;
  • the statistical counting operation is performed according to the read data, and the calculation result is written back to the storage unit such as SRAM for subsequent supply to the Host CPU.
  • the series of actions, namely receiving the statistical count data packet constituting the statistical request, parsing it, sending a read to a storage unit such as the SRAM to obtain the read return data, performing the statistical count operation according to the return data and the data request increment, and writing the result back to the storage unit such as the SRAM, is implemented by a pipeline architecture to improve system processing performance.
  • the pipeline architecture is the architecture formed by the various units and modules of the statistical counting device.
  • the receiving, by the Serdes interface, a statistical request sent by another chip, such as the network processing chip specifically includes:
  • the physical path of the statistical request transmission uses the currently popular high-speed serial interface, the Serdes interface, together with the upper-layer transmission protocol, Interlaken, to achieve high-performance, highly versatile link transmission.
  • the upper-layer transmission protocol used for interworking is Interlaken, which is optimized for high bandwidth and reliable packet transmission, and gives the interface protocol high versatility and high compatibility.
  • the statistical request exists in the form of a statistical count data packet; a statistical count data packet in the Interlaken encapsulation format includes a plurality of statistical packet slices.
  • the peer chip, such as the network processing chip, encapsulates the statistical count data packet according to the specified packet encapsulation format. After the statistical counting device receives the statistical count data packet, the packet is decapsulated by a dedicated decapsulation module, such as the UPK, and the information needed for statistical counting is extracted for the corresponding calculation.
  • a specified packet encapsulation format that is, an Interlaken encapsulation format, includes information necessary for statistical counting calculation, including a valid flag bit, an ID number for distinguishing different statistical services, and the like.
  • the plurality of statistical packet slices included in a statistical count data packet of the Interlaken encapsulation format are independent of each other, and whether each statistical packet slice is valid is determined by its corresponding valid flag bit.
  • the one statistical message slice includes at most two target statistical count items, corresponding to two sets of statistical counters.
  • the target statistical count item may correspond to one counter or to multiple counters. For example, a statistical item of the TM may need to count the number of packets and the packet length at the same time, in which case two counters are required. The number of counters corresponding to a specific statistical count item can be configured by the user.
  • the statistical packet slice after decapsulation enters the PRE, and the address information of the counter in the SRAM and the calculation increment are determined according to the corresponding identification information.
  • the address information of the counter in the SRAM and the calculation increment enter the STAT, the STAT completes the counting function of each statistical item, issues an SRAM read command, and performs calculation after receiving the SRAM return value. Finally, write the result of the calculation back to the corresponding address of the SRAM.
  • the STAT can process conflicts that may be generated by the SRAM read and write data, and can process the access request of the statistical count command and the DMA read command for the SRAM according to a certain priority scheduling.
  • the access priority may be a default priority configuration or a user configured priority.
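  • The priority scheduling between statistical count commands and DMA read commands can be sketched as follows. This is only an illustration under assumptions: the function name `select_request`, the two-queue model, and the boolean `dma_has_priority` flag (standing in for the user-configured priority) are hypothetical and do not come from the description.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def select_request(
    stat_requests: Deque[int],
    dma_requests: Deque[int],
    dma_has_priority: bool = False,
) -> Optional[Tuple[str, int]]:
    """Pick the next SRAM access according to a fixed (assumed) priority order."""
    if dma_has_priority:
        order = [("dma", dma_requests), ("stat", stat_requests)]
    else:
        # Assumed default: statistical count commands are served first.
        order = [("stat", stat_requests), ("dma", dma_requests)]
    for source, queue in order:
        if queue:
            return source, queue.popleft()
    return None  # nothing pending
```

  • With `dma_has_priority=False` a pending DMA read is only served once no statistical count command is waiting; setting the flag reverses that order.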
  • the STAT may also be implemented by a pipeline architecture.
  • a Cache is arranged to cache the SRAM access address, access type flag, calculation increment, and other information, to avoid the problem of inaccurate count values caused by calculation information not being updated in time. For example, suppose two statistical count data packets point to the same group of counters.
  • If the second statistical count data packet is brought into calculation before the first has finished, the result is bound to be wrong. With the Cache, the second statistical count data packet is calculated only after the operation of the first statistical count data packet has ended, so the calculation result will not be wrong.
  • the Cache in the STAT is implemented as a queue structure, and the queue depth is determined by the SRAM access delay. If the new SRAM access address pointed to by a statistical count packet can be found in the instruction cache sub-module, the access requests that point to the same SRAM address are merged (statistical count packet merge) and their increment information is combined.
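  • The merging behaviour of the Cache can be sketched in Python as below. This is a simplified model under stated assumptions, not the circuit described: the class name `MergingCache`, the use of an ordered dictionary as the queue, and merging by plain addition of increments are all illustrative choices.

```python
from collections import OrderedDict
from typing import Dict


class MergingCache:
    """Queue of in-flight SRAM accesses; same-address requests are merged."""

    def __init__(self, depth: int) -> None:
        self.depth = depth  # assumed to be chosen from the SRAM access delay
        self.pending: "OrderedDict[int, int]" = OrderedDict()  # address -> increment

    def submit(self, address: int, increment: int) -> bool:
        """Return True if merged with an in-flight entry, False if newly queued."""
        if address in self.pending:
            self.pending[address] += increment  # combine the increment information
            return True
        if len(self.pending) >= self.depth:
            raise RuntimeError("cache full; retire an entry first")
        self.pending[address] = increment
        return False

    def retire(self, sram: Dict[int, int]) -> int:
        """Complete the oldest access: read, add merged increment, write back."""
        address, increment = self.pending.popitem(last=False)
        sram[address] = sram.get(address, 0) + increment
        return address
```

  • Because the second request to an address is folded into the entry that is already in flight, only one read-modify-write reaches the SRAM for that address, which is what prevents the stale-read error described above.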
  • the STAT can design multiple sets of SRAM access ports, thereby implementing parallel processing of multiple sets of statistical count requests.
  • the specifications of the storage unit, such as the SRAM, can be designed as needed.
  • the SRAM memory block can be composed of multiple small SRAM blocks arranged as a pipeline, which ensures the feasibility of the ASIC implementation without affecting the access performance.
  • the embodiment of the invention implements functions such as statistical request data packet receiving, parsing, SRAM data reading, statistical calculation, write-back, and DMA access based on the Serdes interface, Interlaken, SRAM, and calculation logic. Compared with existing counting schemes, in which a network processing chip such as a network processor (NP, Network Processor) or a switch access processor (SA, Switch Access) counts using its on-chip storage resources or an external memory (SRAM or SDRAM), it has advantages in cost, flexibility, versatility, and access bandwidth.
  • Application scenario 1: A system based on a statistical counting device, a network processing chip, and a CPU implements the statistical counting function; in this scenario, one network processing chip interacts with the statistical counting device.
  • FIG. 4 shows the high-speed serial statistical counting device (SST, Serial Statistics) of the first embodiment of the present invention in this application scenario.
  • the NP/SA sends the statistical request to the SST through the Serdes interface.
  • After receiving the statistical request, the SST performs statistical counting according to the steps shown in Figure 5 below, and writes the count value into the built-in SRAM.
  • The SST can also receive the read-counter-value access request sent by the Host CPU and perform the corresponding processing.
  • Step 501 The NP/SA encapsulates the statistical request into a statistical packet slice according to the format of the statistical request data packet specified by the SST.
  • Step 502 The NP/SA combines and fills a plurality of statistical packet slices into a single Interlaken data packet, and sends the data through the Serdes interface.
  • Step 503 serial data is transmitted on a physical link formed by the Serdes interface
  • Step 504 After the serial data is transmitted to the SST through the physical link formed by the Serdes interface, the HIF obtains the statistical request data packet according to the Interlaken protocol.
  • Step 505 The UPK parses the statistical request data packet, and outputs two parsing contents: a type of the statistical request and an increment of the statistical request;
  • Step 506 The PRE receives the parsed content of the UPK, and converts the type of the statistical request and the increment of the statistical request into an address of the SRAM and an increment of the calculation;
  • Step 507 The STAT sends a read request to the SRAM according to the SRAM address sent by the PRE, performs the counting operation on the data returned by the SRAM read and the statistical request increment, and finally writes the calculation result back to the corresponding address in the SRAM;
  • the STAT can also process an access request from the CPU to read a counter value;
  • Step 508 The Host CPU sends a read counter value request to the STAT through the CFM, and receives a corresponding return value.
  • Figure 6 shows the overall implementation architecture of the SST.
  • the SST interacts with the NP/SA. The SST includes:
  • the HIF is configured to receive a statistical request sent by the NP/SA.
  • the UPK is configured to parse the statistical request sent by the SST external NP/SA, and output two parsing contents: the type of the statistical request and the increment of the statistical request;
  • the PRE is configured to receive the parsed content output by the UPK, and convert the statistical request into an address of the SRAM and a calculation increment;
  • the STAT is configured to send a read request to the SRAM according to the SRAM address sent by the PRE, perform the counting operation on the data returned by the SRAM read and the statistical request increment, and finally write the calculation result back to the corresponding address in the SRAM; it also processes the read-counter-value command of the Host CPU;
  • the MEM can use SRAM as the storage medium. The storage capacity and the number of groups can be designed as needed (the number of groups determines the number of access ports, which supports simultaneous counting of multiple counters). The SRAM memory block can be composed of multiple small SRAM blocks arranged as a pipeline, which ensures the feasibility of the ASIC implementation without affecting the access performance.
  • In this embodiment, the MEM uses the SRAM as the storage medium, with two sets of access ports, supporting statistical access for at most two sets of statistical count items at the same time;
  • the CFM is configured to receive the configuration command sent by the Host CPU, access the corresponding register, and write the corresponding configuration item.
  • the DMA module (not shown) provides a path for the Host CPU to quickly read the statistical counter values.
  • PCIe may be used as a CPU access path.
  • Figure 7 shows the HIF internal implementation architecture. Figure 7 also shows HIF-based data flow transmission.
  • the HIF includes:
  • the Serdes conversion sub-module is configured to receive the high-speed serial bit stream data sent by the NP/SA, and complete the serial-to-parallel conversion function.
  • the number of Serdes links is variable, and can be selected according to bandwidth requirements in actual applications.
  • the Interlaken protocol sub-module is configured to encapsulate the parallel data sent by the Serdes conversion sub-module into the statistical message data packet format according to the Interlaken protocol, an interconnection protocol optimized for high bandwidth and reliable packet transmission, and to complete the link detection and protection functions.
  • the standard Interlaken data message is shown in Table 1, including the data valid flag pkt_ena, the packet header pkt_sop, the packet tail pkt_eop, the error flag pkt_err, the packet data pkt_dat, and the like.
  • In this scenario, the length of an Interlaken data packet is 3 beats and the Interlaken data width per beat is 1024 bits; actual use is not limited to this configuration.
  • Figure 8 shows the internal implementation architecture of the UPK.
  • Figure 8 also shows an UPK-based data stream transmission.
  • the UPK includes:
  • the buffer sub-module is configured to receive the request packet in the Interlaken format from the HIF, reject error packets, splice all relevant information of each valid data packet together, and write it into the buffer FIFO. Since a data packet contains a plurality of statistical packet slices, while the subsequent processing modules and sub-modules use the statistical packet slice as the minimum processing unit, there is a processing-rate difference; the buffer sub-module absorbs this rate difference.
  • the parsing sub-module takes a data packet from the buffer sub-module, cuts it into a plurality of statistical packet slices according to the slice bit width, and sends the valid slices, as indicated by the slice valid flag bits, to the subsequent processing modules and sub-modules, one statistical packet slice at a time; after all the valid slices in the data packet have been sent, a new data packet is taken from the buffer sub-module.
  • Each statistical packet slice is parsed according to the format of the statistical packet slice to obtain the ID number distinguishing different statistical services, the statistical queue number supported by the service (Qnum), the statistical increment, and the statistical item supported by the service, providing the information that subsequent units need to access the SRAM blocks, calculate statistical values, and so on.
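  • The slicing step of the parsing sub-module can be sketched as follows, using the 3-beat, 1024-bit-per-beat packet and the 88-bit slice width given in this description. The bit ordering (slices packed from the least significant bits upward) and the validity check `top_bit_valid` are assumptions for illustration; the description does not define where the valid flag sits within a slice.

```python
from typing import Callable, List

PACKET_BITS = 3 * 1024  # 3 beats of 1024 bits each
SLICE_BITS = 88         # statistical packet slice width


def cut_slices(packet: int) -> List[int]:
    """Cut a packet (held as an integer) into whole 88-bit slices, LSB first."""
    mask = (1 << SLICE_BITS) - 1
    return [
        (packet >> (index * SLICE_BITS)) & mask
        for index in range(PACKET_BITS // SLICE_BITS)
    ]


def top_bit_valid(slice_value: int) -> bool:
    """Assumed validity rule: the top bit of the slice acts as the valid flag."""
    return (slice_value >> (SLICE_BITS - 1)) & 1 == 1


def valid_slices(packet: int, is_valid: Callable[[int], bool]) -> List[int]:
    """Forward only the slices whose valid flag is set, in order."""
    return [s for s in cut_slices(packet) if is_valid(s)]
```

  • Under these sizes a 3072-bit packet holds 34 whole slices with 80 bits left over; how the remaining bits are used is not stated here, so the sketch simply ignores them.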
  • Table 1 is a standard packet format, which is not limited to this format, and may be extended on the basis of the description.
  • The statistical packet slice is 88 bits wide and contains two sets of statistical service items, each occupying 44 bits.
  • the two sets of statistical service items are independent of each other.
  • the two sets of statistical business item fields have the same format and contain the following fields:
  • Vld: the valid flag of the service item.
  • Service ID: distinguishes different statistical service items and supports up to 8 different statistical service items, such as TM enqueue statistics, TM dequeue statistics, and OAM statistics.
  • Qnum: the statistical queue number. A maximum of 1M statistical queues is supported. For example, if the number of statistical queues that the TM needs to support is 512K, the lower 19 bits of the field indicate the queue number and the highest bit is 0.
  • Len: the statistical increment. The maximum supported increment is 32K. For example, if the TM statistics need to count the packet length, the packet length can be placed in this field.
  • Type id: the number of the statistical item to be counted within the statistical queue. A maximum of 16 statistical items is supported.
  • For example, for TM enqueue statistics the service ID is 0 and the number of queues that need to be supported is 512K; the lower 19 bits of Qnum give the corresponding queue number, and the highest bit is fixed to 0.
  • Each queue needs to support 11 statistical items, including normal enqueue, TD discard, disable packet discard, and WRED/GRED priority 0 to 7 packet loss, corresponding to Type 0 to 10. Each item supports packet length statistics, and the packet length is indicated by the Len field.
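  • The field description above can be turned into a small parsing sketch. The field widths used here are assumptions derived from the stated maxima (8 services gives 3 bits, 1M queues gives 20 bits, a 32K increment gives 15 bits, 16 statistical items gives 4 bits, plus 1 valid bit); they total 43 bits, so one bit of each 44-bit item is treated as unused. The field order, the position of the unused bit, and all names are likewise hypothetical.

```python
from typing import List, NamedTuple


class StatItem(NamedTuple):
    vld: int
    service_id: int
    qnum: int
    length: int
    type_id: int


# (field name, assumed width in bits), most significant field first.
FIELDS = [("vld", 1), ("service_id", 3), ("qnum", 20), ("length", 15), ("type_id", 4)]
ITEM_BITS = 44  # each statistical service item occupies 44 bits


def parse_item(item: int) -> StatItem:
    """Unpack one 44-bit item; the lowest bit is left unused in this layout."""
    shift = ITEM_BITS
    values = []
    for _, width in FIELDS:
        shift -= width
        values.append((item >> shift) & ((1 << width) - 1))
    return StatItem(*values)


def pack_item(fields: StatItem) -> int:
    """Inverse of parse_item, used here only to build example data."""
    item = 0
    shift = ITEM_BITS
    for value, (_, width) in zip(fields, FIELDS):
        shift -= width
        item |= (value & ((1 << width) - 1)) << shift
    return item


def parse_slice(slice88: int) -> List[StatItem]:
    """Split an 88-bit slice into its two items and keep only the valid ones."""
    upper = parse_item(slice88 >> ITEM_BITS)
    lower = parse_item(slice88 & ((1 << ITEM_BITS) - 1))
    return [item for item in (upper, lower) if item.vld]
```

  • Keeping the two items independent mirrors the statement that the two sets of statistical service items do not depend on each other: each is accepted or dropped on its own Vld bit.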
  • the PRE is configured to receive the parsed content of the UPK unit and convert the statistical request into the address of the SRAM and the calculation increment.
  • the SRAM address corresponding to the target counter can be calculated according to the statistical rule corresponding to the ID number, the Qnum, the Type field, and some configuration information preset by the user. The calculation process is shown in FIG. 9 and includes:
  • Step 901 Query the user configuration register according to the ID number of the received statistical request, and obtain from the register the starting address in the memory of the service corresponding to the ID, the service counting manner (counting only the number of packets, counting both the number of packets and the packet length, or counting only the packet length), and other information necessary for the subsequent calculation of the counter address;
  • Step 902 Query the user configuration register according to the Type number, and obtain from the register the offset address, relative to the ID base address, of the statistical item corresponding to the Type in the memory;
  • Step 903 Calculate the memory address according to the Qnum and the configuration information obtained by the previous query.
  • For example, the ID number is 0, indicating TM enqueue statistics, and the corresponding start address in the memory is base_addr_id0; the statistical counting mode counts only the number of packets; the Type number is 0, indicating normal enqueue, and its offset address is base_addr_type0; the SRAM data bit width is 100 bits and the counter bit width is 50 bits, so one SRAM address can store two counters. Therefore, the SRAM memory address corresponding to the statistical item of queue number Qnum is:
  • the counter corresponds either to the lower 50 bits of the 100-bit word at address Addr or, otherwise, to the upper 50 bits.
  • When both the number of packets and the packet length are counted, the SRAM memory address corresponding to the statistical item of queue number Qnum is:
  • the upper 50 bits correspond to the packet number counter, and the lower 50 bits correspond to the packet length counter.
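  • Steps 901 to 903 can be illustrated with the sketch below. The address formula is not reproduced in this text, so the arithmetic shown (base address plus Type offset plus a queue-dependent index, with even queue numbers in the lower half of a word) is purely an assumed mapping chosen to be consistent with two 50-bit counters per 100-bit word. The tables `ID_CONFIG` and `TYPE_OFFSET` and their values are hypothetical stand-ins for the user configuration registers.

```python
from typing import Dict, Tuple

SRAM_WORD_BITS = 100
COUNTER_BITS = 50
COUNTERS_PER_WORD = SRAM_WORD_BITS // COUNTER_BITS  # two counters per address

# Hypothetical user configuration registers.
ID_CONFIG: Dict[int, Dict[str, object]] = {
    0: {"base": 0x1000, "mode": "packets"},  # count only the number of packets
    1: {"base": 0x8000, "mode": "both"},     # count packets and packet length
}
TYPE_OFFSET: Dict[int, int] = {0: 0x40}


def counter_location(service_id: int, type_id: int, qnum: int) -> Tuple[int, str]:
    """Return (SRAM address, which part of the word) under the assumed mapping."""
    config = ID_CONFIG[service_id]   # Step 901: base address and counting mode
    offset = TYPE_OFFSET[type_id]    # Step 902: offset relative to the ID base
    base = int(config["base"])
    if config["mode"] == "both":
        # Assumption: one queue uses the whole word (two counters).
        return base + offset + qnum, "both"
    # Step 903 (assumption): two queues share one word.
    word, half = divmod(qnum, COUNTERS_PER_WORD)
    return base + offset + word, ("lower" if half == 0 else "upper")
```

  • A real configuration would also have to size the Type offsets so that the regions of different statistical items do not overlap; that sizing is left to the user configuration and is not modelled here.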
  • FIG. 10 is a schematic diagram of the internal implementation architecture of the STAT, in which the STAT interacts with the PRE. FIG. 10 also shows the data stream transmission implemented by the STAT-based pipeline architecture. The STAT includes:
  • the MUX receives the statistical request from the PRE and the DMA read access request from the CFM, and selects a command response according to the priority configured by the user;
  • the Cache: because the SST is internally implemented as a pipeline architecture, the SRAM access delay can cause potential problems. For example, when multiple statistical request packets point to the same set of counters, the calculation information is not updated in time because of the SRAM access delay, resulting in inaccurate count values. To solve this problem, the Cache is used in this embodiment to cache the SRAM access address, the access type flag, the calculation increment, and the like;
  • the ALU receives the statistical request or the DMA read access request scheduled by the MUX and compares the SRAM access address with the addresses cached in the Cache. If there is no identical address, the read access request is sent directly to the MEM and the address information is written into the Cache; otherwise, the new access request is merged with the entry for that address in the Cache, the counter increment it carries is combined as well, and the content of the Cache is updated.
  • After the MEM returns the read data, a mathematical operation is performed on the data calculation increment and the read return data according to the counting rule provided by the PRE to obtain the new count value, a write command request is sent to the MEM, and the count value is written into the MEM.
  • For a DMA read access request, the counter value is returned to the CFM; the data then written to the MEM is determined by the read-clear mode: if read-clear is enabled, the value 0 is written into the MEM; otherwise the original value is written back to the MEM.
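  • The two kinds of access handled by the ALU can be sketched as below. The function name `alu_execute`, the string request kinds, and the dictionary standing in for the MEM are assumptions for illustration; counting is modelled as simple addition, whereas the actual counting rule is supplied by the PRE.

```python
from typing import Dict, Optional


def alu_execute(
    mem: Dict[int, int],
    address: int,
    kind: str,
    increment: int = 0,
    read_clear: bool = False,
) -> Optional[int]:
    """Handle one scheduled request against the memory.

    kind == "stat": add the increment and write the new count back.
    kind == "dma":  return the counter value; write 0 back if read-clear
                    is enabled, otherwise write the original value back.
    """
    value = mem.get(address, 0)  # read return data from the MEM
    if kind == "stat":
        mem[address] = value + increment
        return None
    if kind == "dma":
        mem[address] = 0 if read_clear else value
        return value             # value handed back towards the CFM
    raise ValueError(f"unknown request kind: {kind}")
```

  • Read-clear lets the Host CPU collect per-interval deltas without a separate reset command, at the cost of losing the running total in the device.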
  • Application scenario 2: A system based on a statistical counting device, network processing chips, and a CPU implements the statistical counting function.
  • In this scenario, two network processing chips interact with the statistical counting device. The present invention is not limited to two; scenarios with more than two network processing chips are also possible and are not described further.
  • In this application scenario, the high-speed serial statistical counting device (SST) of the second embodiment of the present invention is used as an external statistical chip of two network processing chips (taking NP or SA as an example).
  • NP/SA sends statistics requests through the Serdes interface.
  • the SST receives the statistical request, performs statistical counting according to the steps shown in Figure 12 below, and writes the count value into the built-in SRAM.
  • the SST also receives the read-counter-value access request sent by the Host CPU and performs the corresponding processing.
  • This embodiment is similar to the first embodiment of the present invention, except that in the first embodiment the SST is used as a plug-in statistical chip for one network processing chip, whereas in this embodiment the SST is used as a plug-in statistical chip for two network processing chips.
  • the internal implementation structure and process of the two embodiments are basically the same.
  • the specific working process shown in Figure 12 includes:
  • Step 1201: The two NP/SAs each encapsulate their statistical requests into statistical packet slices according to the statistical packet format specified by the SST;
  • Step 1202: The two NP/SAs each combine multiple statistical packet slices into one beat of an Interlaken data packet and send it through the Serdes interface;
  • Step 1203: Serial data is transmitted on the physical links formed by the two sets of Serdes interfaces;
  • Step 1204: After the serial data reaches the SST over the physical links formed by the Serdes interfaces, the HIF recovers two sets of statistical request data packets according to the Interlaken protocol;
  • Step 1205: The UPK parses the statistical request data packets and outputs two parsed items: the type of the statistical request and the increment of the statistical request;
  • Step 1206: The PRE receives the parsed content from the UPK and converts the type and the increment of the statistical request into an SRAM address and a calculation increment;
  • Step 1207: The STAT sends a read request to the SRAM at the SRAM address supplied by the PRE, performs the operation on the SRAM read return data and the statistical request increment, and finally writes the result back to the corresponding address in the SRAM;
  • Here, the STAT can also process an access request from the CPU to read a counter value;
  • Step 1208: The Host CPU sends a read-counter-value request to the STAT through the CFM and receives the corresponding return value.
  • In this embodiment, the internal implementation architecture of the SST, of the HIF, and of the UPK, the statistical pre-processing flow implemented by the PRE, and the internal implementation architecture of the STAT are the same as in the first embodiment of the present invention. They are described as follows:
  • In the overall internal architecture of the SST, the SST interacts with the NP/SA and includes:
  • the HIF, configured to receive statistical requests sent by the NP/SA;
  • the UPK, configured to parse the statistical requests sent by the NP/SA attached to the SST and output two parsed items: the type of the statistical request and the increment of the statistical request;
  • the PRE, configured to receive the parsed content output by the UPK and convert the statistical request into an SRAM address and a calculation increment;
  • the STAT, configured to send a read request to the SRAM at the SRAM address supplied by the PRE, perform the operation on the SRAM read return data and the statistical request increment, and finally write the result back to the corresponding address in the SRAM; it also handles the Host CPU's read-counter-value commands;
  • the MEM, which can use SRAM as the storage medium. The storage capacity and the number of groups can be designed as needed (the number of groups determines the number of access ports, which allows multiple counters to count simultaneously). From the perspective of ASIC implementation, an SRAM memory block can be composed of multiple small SRAM blocks connected in pipeline form, which keeps the ASIC implementation feasible without affecting access performance. In this embodiment, the MEM uses SRAM as the storage medium and provides two groups of access ports, supporting statistical access for at most two groups of statistical count items;
  • the CFM, configured to receive configuration commands sent by the Host CPU, access the corresponding registers, and write the corresponding configuration items. It also contains a DMA module (not shown in the figure) that gives the Host CPU a fast path for reading statistical counter values. In this embodiment, PCIe may be used as the CPU access path to raise the DMA access rate.
  • Figure 7 shows the internal implementation architecture of the HIF together with the HIF-based data flow. The HIF includes:
  • the Serdes conversion sub-module, configured to receive the high-speed serial bit stream sent by the NP/SA and perform serial-to-parallel conversion. The number of Serdes links is not fixed and can be chosen according to the bandwidth requirements of the actual application.
  • the Interlaken protocol sub-module, configured to encapsulate the parallel data from the Serdes conversion sub-module into the statistical message data packet format according to the Interlaken protocol, an interconnect protocol optimized for high bandwidth and reliable packet transmission, and to perform link detection and protection.
  • The standard Interlaken data message is shown in Table 2 and includes the data valid flag pkt_ena, the packet header pkt_sop, the packet tail pkt_eop, the error flag pkt_err, the packet data pkt_dat, and so on.
  • In this embodiment, an Interlaken data packet is 3 beats long and each beat is 1024 bits wide; actual use is not limited to this illustrative case.
  • Figure 8 shows the internal implementation architecture of the UPK together with the UPK-based data flow. The UPK includes:
  • the buffer sub-module, configured to receive request packets in Interlaken format from the HIF, discard error packets, and splice all relevant information of each valid data packet into the buffer FIFO. One data packet contains multiple statistical packet slices, while the subsequent processing modules and sub-modules use the statistical packet slice as their minimum processing unit, so the processing rates differ; the buffer sub-module absorbs this rate difference.
  • the parsing sub-module, which takes a data packet from the buffer sub-module, cuts it into multiple statistical packet slices according to the slice bit width, and sends the valid slices to the subsequent processing modules and sub-modules according to the slice valid flag bits, one statistical packet slice at a time. Only after all valid slices of the data packet have been sent is a new data packet taken from the buffer sub-module.
  • Each statistical packet slice is parsed according to the specified slice format to obtain the ID number that distinguishes different statistical services, the number of statistics queues supported by the service (Queue), the statistical increment, the statistics item supported by the service (Type), and so on, which gives subsequent units the information they need to access SRAM blocks and calculate statistical values.
  • In this embodiment, the content of a statistical packet slice is as shown in Table 2. Table 2 is a standard packet format given only as an example; the format is not limited to it and may be extended on this basis.
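  • The slicing done by the parsing sub-module can be sketched as below. This is a hedged illustration, not the patented logic: it assumes the whole 3 × 1024-bit packet payload is filled with back-to-back slices, that the first slice sits in the most significant bits, and that a slice is valid when its top bit is set. The function names and the validity rule are hypothetical; the 88-bit slice width is the one this document gives for a statistical packet slice.

```python
def cut_slices(packet, packet_width, slice_width):
    """Cut a packet (held as an int, first slice in the top bits) into fixed-width slices."""
    mask = (1 << slice_width) - 1
    count = packet_width // slice_width   # trailing bits that do not fill a slice are ignored
    return [(packet >> (packet_width - (i + 1) * slice_width)) & mask
            for i in range(count)]


def forward_valid(slices, is_valid):
    """Yield the valid slices one at a time, in order."""
    for s in slices:
        if is_valid(s):
            yield s


PACKET_WIDTH = 3 * 1024   # 3 beats of 1024 bits, as in this embodiment
SLICE_WIDTH = 88          # slice width stated in this document


def top_bit_set(s):
    # Hypothetical validity rule: the top bit of a slice marks it valid.
    return (s >> (SLICE_WIDTH - 1)) & 1 == 1


first_slice = (1 << (SLICE_WIDTH - 1)) | 0xABC
packet = first_slice << (PACKET_WIDTH - SLICE_WIDTH)   # only the first slice is populated
slices = cut_slices(packet, PACKET_WIDTH, SLICE_WIDTH)
sent = list(forward_valid(slices, top_bit_set))
print(len(slices), len(sent))  # prints: 34 1
```

Under these assumptions one packet yields at most 34 slices, which illustrates the rate difference the buffer sub-module has to absorb: one packet arrives, but up to 34 units leave one at a time.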
  • The statistical packet slice is 88 bits wide and contains two groups of statistical service items, each occupying 44 bits. The two groups are independent of each other and share the same field format, which contains the following fields:
  • Vld: the valid flag of the service item.
  • ID: the service id, which distinguishes different statistical service items. Up to 8 different statistical service items are supported, such as TM enqueue statistics, TM dequeue statistics, and OAM statistics.
  • Qnum: the statistics queue number. Up to 1M statistics queues are supported. For example, if the TM needs to support 512K statistics queues, the lower 19 bits of the field hold the queue number and the highest bit is 0.
  • Len: the statistical increment. The maximum supported increment is 32K. For example, if the TM statistics need to count packet length, the packet length is placed in this field.
  • Type: the number of statistics items to be counted for one statistics queue under the id. Up to 16 statistics items are supported.
  • Each data packet supports packet length statistics, and the packet length is carried in the Len field.
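  • To make the field list concrete, the following sketch packs and unpacks one 44-bit statistical service item. The exact bit layout is in a table that is not reproduced in this text, so the layout here is an assumption: the widths of ID (3 bits), Qnum (20 bits), and Type (4 bits) are inferred from the stated maxima of 8 services, 1M queues, and 16 items; Len is assumed to be 16 bits so that the fields sum to 44; and the order Vld, ID, Qnum, Len, Type from the most significant bit down is hypothetical.

```python
from collections import namedtuple

StatItem = namedtuple("StatItem", "vld service_id qnum length type_id")

# (field name, width in bits), most significant field first.
# Widths and order are assumptions; see the lead-in paragraph.
FIELDS = (("vld", 1), ("service_id", 3), ("qnum", 20), ("length", 16), ("type_id", 4))
ITEM_WIDTH = sum(width for _, width in FIELDS)   # 44 bits


def pack_item(item):
    """Pack field values into a 44-bit integer."""
    bits = 0
    for value, (name, width) in zip(item, FIELDS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        bits = (bits << width) | value
    return bits


def unpack_item(bits):
    """Split a 44-bit integer back into its fields."""
    values = []
    shift = ITEM_WIDTH
    for _, width in FIELDS:
        shift -= width
        values.append((bits >> shift) & ((1 << width) - 1))
    return StatItem(*values)


def unpack_slice(slice_bits):
    """An 88-bit slice holds two independent items (upper item first, by assumption)."""
    return (unpack_item(slice_bits >> ITEM_WIDTH),
            unpack_item(slice_bits & ((1 << ITEM_WIDTH) - 1)))


item = StatItem(vld=1, service_id=0, qnum=12345, length=64, type_id=0)
packed = pack_item(item)
print(unpack_item(packed) == item)  # prints: True
```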
  • the PRE is configured to receive the parsed content from the UPK and convert the statistical request into an SRAM address and a calculation increment.
  • Specifically, the SRAM address of the target counter can be calculated from the ID number, the Qnum, and the Type field according to the statistical rule given by configuration information preset by the user. The calculation process is shown in FIG. 9 and includes:
  • Step 901: Query the user configuration register according to the ID number of the received statistical request, and obtain from the register the start address in memory of the service corresponding to the ID, the counting mode of the service (packet count only, both packet count and packet length, or packet length only), and other information needed to calculate the counter address;
  • Step 902: Query the user configuration register according to the Type number, and obtain from the register the offset address, relative to the ID base address, of the statistics item corresponding to the Type;
  • Step 903: Calculate the memory address according to the Qnum and the configuration information obtained by the preceding queries.
  • For example, suppose the ID number is 0, indicating TM enqueue statistics, with start address base_addr_id0 in memory and a counting mode of packet count only; the Type number is 0, indicating normal enqueue, with offset address base_addr_type0; the SRAM data width is 100 bits and the counter width is 50 bits. One SRAM address can then store two counters, so the SRAM memory address corresponding to the statistics item of queue number Qnum is:
  • Depending on Qnum, the counter occupies either the low 50 bits or the high 50 bits of the 100-bit word at the Addr position.
  • When both the packet count and the packet length are recorded, the SRAM memory address corresponding to the statistics item of queue number Qnum is:
  • In that case the high 50 bits hold the packet count counter and the low 50 bits hold the packet length counter.
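  • The address formulas themselves are not reproduced in this text, so the sketch below is one plausible reading rather than the patent's formula. It assumes the address is the sum of the ID base address, the Type offset, and a queue index; that in packet-count-only mode two queues share one 100-bit word (even Qnum in the low half, odd Qnum in the high half); and that in the combined mode each queue has its own word. The function name `locate` and the mode strings are hypothetical.

```python
def locate(base_addr_id, base_addr_type, qnum, mode):
    """Return (sram_address, {counter name: which 50-bit half of the word})."""
    base = base_addr_id + base_addr_type
    if mode == "packets_only":
        # Two 50-bit counters per 100-bit word: two queues share one address (assumed mapping).
        half = "low" if qnum % 2 == 0 else "high"
        return base + qnum // 2, {"packets": half}
    if mode == "packets_and_length":
        # One word per queue: high half counts packets, low half accumulates length.
        return base + qnum, {"packets": "high", "length": "low"}
    raise ValueError(f"unknown counting mode: {mode}")


addr, halves = locate(0x1000, 0x200, 5, "packets_only")
print(hex(addr), halves)  # prints: 0x1202 {'packets': 'high'}
```

Packing two counters into one word halves the address space needed in packet-count-only mode, at the cost of a read-modify-write that must leave the other half of the word unchanged.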
  • FIG. 10 is a schematic diagram of an internal implementation architecture of the STAT.
  • the STAT interacts with the PRE.
  • FIG. 10 also shows data flow transmission implemented by a STAT-based pipeline architecture.
  • the STAT includes:
  • the MUX receives the statistical request from the PRE and the DMA read access request from the CFM, and selects a command response according to the priority configured by the user;
  • the Cache: the SST is implemented internally as a pipeline, so SRAM access delay can cause problems. For example, when multiple statistical request packets point to the same group of counters, the calculation information may not be updated in time because of the SRAM access delay, which makes the count value inaccurate. To avoid this, the design of this embodiment uses the Cache to hold the SRAM access address, the access type flag, the calculation increment, and similar information;
  • the ALU receives the statistical request or the DMA read access request scheduled by the MUX and compares its SRAM access address with the addresses cached in the Cache. If no cached address matches, the read access request is sent directly to the MEM and the address information is written into the Cache; otherwise, the new access request is merged with the request already cached for that address, the counter increment it carries is merged as well, and the content of the Cache is updated.
  • When the MEM returns the read data, the data calculation increment and the returned data are combined according to the counting rule provided by the PRE to obtain the new count value; a write command request is then sent to the MEM and the count value is written into the MEM.
  • For a DMA read access request, the counter value is returned to the CFM; the data then written into the MEM is determined by the read-clear mode: if read-clear is enabled, the value 0 is written into the MEM, otherwise the original value is written back.
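  • The merging of requests that hit an address with a read still in flight can be sketched as follows. This is a simplified model under stated assumptions, not the patented circuit: the Cache is a dictionary keyed by SRAM address, the SRAM latency is a fixed number of ticks, the counting rule is plain addition, and DMA reads are left out. The class and method names are hypothetical.

```python
class MergingCounter:
    """Toy model: increments to an address with a read in flight are merged, not re-issued."""

    def __init__(self, mem, latency):
        self.mem = mem          # dict standing in for the SRAM (assumption)
        self.latency = latency  # ticks between issuing a read and its data returning (>= 1)
        self.cache = {}         # addr -> [ticks until read data returns, merged increment]

    def request(self, addr, increment):
        entry = self.cache.get(addr)
        if entry is None:
            self.cache[addr] = [self.latency, increment]   # no match: issue a new read
        else:
            entry[1] += increment                          # match: merge the increment

    def tick(self):
        finished = []
        for addr, entry in self.cache.items():
            entry[0] -= 1
            if entry[0] == 0:
                finished.append(addr)
        for addr in finished:
            _, increment = self.cache.pop(addr)
            self.mem[addr] = self.mem.get(addr, 0) + increment   # read data + increment, write back


mem = {7: 100}
pipe = MergingCounter(mem, latency=3)
pipe.request(7, 1)
pipe.tick()
pipe.request(7, 2)   # arrives while the first read is still in flight, so it is merged
pipe.tick()
pipe.tick()
print(mem[7])  # prints: 103
```

Without the merge, the second request would read the stale value 100 and one of the two updates would be lost, which is the inaccuracy the Cache is described as preventing.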
  • If the integrated modules described in the embodiments of the present invention are implemented as software functional modules and sold or used as separate products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention may in essence be embodied as a software product that is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
  • the embodiment of the present invention further provides a computer storage medium, wherein a computer program is stored, and the computer program is used to execute the statistical counting implementation method of the embodiment of the present invention.
  • With the embodiments of the present invention, the statistical counting device is disposed outside the network processing chip and has a built-in storage unit such as SRAM. Together with the series of units inside the device, it completes the counting operation independently and returns the result directly to the built-in storage unit for storage. This device architecture not only implements the statistical counting function at high speed but also avoids the higher manufacturing cost of the network processing chip and the insufficient access bandwidth caused by the prior-art architectures described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

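The present invention discloses a statistical counting device, an implementation method thereof, and a system having the statistical counting device. The statistical counting device is disposed outside a network processing chip and includes: a receiving unit configured to receive a statistical request sent by the network processing chip; a parsing unit configured to parse the statistical request to obtain the type of the statistical request and the increment of the statistical request; and a statistics unit configured to convert, according to a preset configuration, the type and the increment of the statistical request into an address of a storage unit and a data calculation increment, to send a read-data request to the corresponding storage unit according to that address, to perform a statistical counting operation on the data returned by the storage unit and the data calculation increment, and to write the resulting statistic into the corresponding storage unit.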

Description

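Statistical counting device and implementation method thereof, and system having a statistical counting device
Technical Field
The present invention relates to statistical technologies in the field of data communications, and in particular to a statistical counting device, an implementation method thereof, and a system having a statistical counting device.
Background
In the course of implementing the technical solutions of the embodiments of the present application, the inventors found at least the following technical problems in the related art:
With the rapid development of the Internet, the interface rate of core routers used for backbone network interconnection has reached 100 Gbps, and the performance of the functional modules supported by network processing chips, such as traffic management (TM, Traffic Management) and operation, administration and maintenance (OAM, Operation Administration and Maintenance), keeps rising. As a result, the demand for statistical counting rises accordingly.
Existing solution 1: on-chip static random access memory (SRAM, Static Random Access Memory) stores the statistical count information. Meeting the performance requirements of a modern core router then consumes a large amount of on-chip memory. Because on-chip SRAM sits inside the network processing chip, adding more of it raises the cost of the network processor chip substantially, which is unacceptable.
Existing solution 2: an external memory disposed outside the network processing chip, such as SRAM or synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), stores the computed statistical count values. The advantage is that no on-chip storage of the network processing chip is consumed, so the cost is lower. The drawback is that the counting module that implements the statistical counting function still has to sit inside the network processing chip and occupies some of its area, which increases the cost of the network processor chip. Moreover, with a counting module built into the network processing chip and an external memory such as SRAM, the results produced by the counting module must be transferred to the SRAM for storage; this frequent interaction consumes bandwidth and leaves the access bandwidth too small.
In summary, both existing solutions have their own defects, which lead to higher manufacturing cost of the network processing chip and insufficient access bandwidth. The related art offers no effective solution to this problem.
Summary
In view of this, embodiments of the present invention aim to provide a statistical counting device, an implementation method thereof, and a system having a statistical counting device, which implement the statistical counting function while avoiding higher manufacturing cost of the network processing chip and insufficient access bandwidth.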
本发明实施例的技术方案是这样实现的:
本发明实施例的一种统计计数设备,所述统计计数设备设置于网络处理芯片外部,所述设备包括:
接收单元,配置为接收网络处理芯片发送的统计请求;
解析单元,配置为对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
统计单元,配置为根据预设配置将所述统计请求的类型和统计请求的增量转化为存储单元的地址及数据计算增量;根据所述存储单元的地址向对应的存储单元发送读取数据请求;将存储单元返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储单元;
存储单元,配置为存储数据,接收到所述读取数据请求,响应所述读取数据请求并将所读取的数据返回所述统计单元。
上述方案中,所述统计计数设备还包括:
配置单元,配置为接收主CPU发送的访问请求,响应所述访问请求, 经由所述统计单元从所述存储单元获取所述统计结果并提供给主CPU使用。
上述方案中,所述接收单元,还包括:
高速接口模块,配置为采用高速串行器/解串器Serdes接口构成的物理链路,配合高速传输协议Interlaken来接收所述统计请求。
上述方案中,所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包。
上述方案中,所述解析单元,还包括:
报文解析模块,配置为按照约定的格式对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
所述统计请求,为由所述网络处理芯片按照所述约定的格式进行封装得到的;
所述约定的格式为基于所述Interlaken封装的格式,基本单元为统计报文切片。
上述方案中,采用高速串行器/解串器Serdes接口构成的物理链路,配合高速传输协议Interlaken来接收所述统计请求的高速接口模块;
所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包;
任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
相应的,所述解析单元,还包括:
解析子模块,配置为获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,直至读取出所述请求数据包中包含的所有统计报文切片并解析。
上述方案中,所述解析单元,还包括:
缓存子模块,配置为存储待解析的所有请求数据包;
所述解析子模块,还配置为对当前读取的请求数据包根据统计报文切片的有效标志位读取,以获得所述统计报文切片的情况下,等待当前读取的请求数据包中的所有统计报文切片都处理完毕后再从所述缓存子模块提取下一个请求数据包;
以及,对所述统计报文切片进行解析,得到的所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数、及统计业务支持的统计项。
上述方案中,所述统计单元,还包括:
统计预处理模块,配置为根据预设配置将所述统计请求的类型和统计请求的增量转化为存储单元的地址及数据计算增量;
统计计算模块,配置为根据所述存储单元的地址向对应的存储单元发送读取数据请求;将存储单元返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储单元。
上述方案中,其中,所述接收单元,还包括:
采用高速串行器/解串器Serdes接口构成的物理链路,配合高速传输协议Interlaken来接收所述统计请求的高速接口模块;
所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计 请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包;
任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述解析单元,还包括:
解析子模块,配置为获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,解析得到的所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数及统计业务支持的统计项;解析得到的所述统计请求的增量包括统计增量;
相应的,所述统计预处理模块还包括:
预处理子模块,配置为获取所述区分不同统计业务的ID号、所述统计业务支持的统计对列数及所述统计业务支持的统计项、所述统计增量及所述预设配置;
以及,基于所述预设配置得到的统计规则,根据所述区分不同统计业务的ID号得到配置信息,根据所述统计业务支持的统计项得到对应的存储单元基地址,根据所述存储单元基地址、所述统计业务支持的统计对列数及所述配置信息计算得到存储单元目标地址,以根据所述存储单元目标地址为所述存储单元的地址来查询对应的存储单元;基于所述预设配置得到的统计规则,根据所述统计增量得到所述数据计算增量。
上述方案中,所述统计计算模块,还包括:
指令选择子模块,配置为根据预设调度规则对网络处理芯片发送的统计请求及主CPU发送的访问请求中的任意一个请求选择进行响应,并执行对应的统计计数处理或统计结果访问处理;
指令缓存子模块,配置为对指令进行缓存,等待一个指令执行完毕后 再提取下一个指令,所述指令包括:根据所述存储单元的地址查询到对应的存储单元后发送读取数据请求的指令、将得到的统计结果写入对应的存储单元的回写数据的指令、提供统计结果给主CPU访问的指令中的至少一种指令;
计算子模块,配置为根据所述存储单元的地址查询到对应的存储单元,发出读取数据请求的指令后,根据返回的所读取的数据与所述统计增量进行统计计数运算。
所述接收单元、所述解析单元、所述统计单元、所述存储单元、所述配置单元、所述高速接口模块、所述Serdes转换子模块、所述Interlaken协议子模块、所述报文解析模块、所述解析子模块、所述缓存子模块、所述统计预处理模块、所述统计计算模块、所述预处理子模块、所述指令选择子模块、所述指令缓存子模块、所述计算子模块在执行处理时,可以采用中央处理器(CPU,Central Processing Unit)、数字信号处理器(DSP,Digital Singnal Processor)或可编程逻辑阵列(FPGA,Field-Programmable Gate Array)实现。
本发明实施例的一种统计计数实现方法,所述方法包括:
统计计数设备接收网络处理芯片发送的统计请求;所述统计计数设备设置于网络处理芯片外部;
统计计数设备对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
统计计数设备根据预设配置将所述统计请求的类型和统计请求的增量转化为内置存储器的地址及数据计算增量;统计计数设备根据所述存储器的地址向对应的存储器发送读取数据请求,将存储器返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储器。
上述方案中,所述方法还包括:
统计计数设备接收主CPU发送的访问请求;
响应所述访问请求,经由所述统计单元从所述存储单元获取所述统计结果并提供给所述主CPU使用。
上述方案中,所述统计计数设备接收网络处理芯片发送的统计请求,包括:
统计计数设备采用高速串行器/解串器Serdes接口构成的物理链路,配合高速传输协议Interlaken来接收所述统计请求。
上述方案中,所述方法还包括:
统计计数设备将所述统计请求由串行数据转换为以并行数据传输;
统计计数设备将传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包。
上述方案中,所述统计请求为由所述网络处理芯片按照约定的格式进行封装得到的Interlaken格式的请求数据包;
所述方法还包括:所述统计计数设备按照所述约定的格式对所述统计请求进行解析,得到统计请求的类型和统计请求的增量。
上述方案中,任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述方法还包括:
获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,直至读取出所述请求数据包中包含的所有统计报文切片并解析。
上述方案中,任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述方法还包括:
在缓存中存储待解析的所有请求数据包;
对当前读取的请求数据包根据统计报文切片的有效标志位读取,以获得所述统计报文切片,等待当前读取的请求数据包中的所有统计报文切片都处理完毕后再从所述缓存中提取下一个请求数据包。
上述方案中,所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数、及统计业务支持的统计项;
所述统计请求的增量包括统计增量。
上述方案中,所述统计计数设备根据预设配置将所述统计请求的类型和统计请求的增量转化为内置存储器的地址及数据计算增量包括:
获取所述区分不同统计业务的ID号、所述统计业务支持的统计对列数、及统计业务支持的统计项、及所述预设配置;
基于所述预设配置得到的统计规则,根据所述区分不同统计业务的ID号得到配置信息,根据所述统计业务支持的统计项得到对应的存储器基地址,根据所述存储器基地址、所述统计业务支持的统计对列数及所述配置信息计算得到存储器目标地址并作为所述内置存储器的地址;
获取所述统计增量,基于所述预设配置得到的统计规则,得到所述数据计算增量。
上述方案中,所述统计计数设备根据所述存储器的地址向对应的存储器发送读取数据请求,将存储器返回的所读取的数据与所述统计增量进行统计计数运算,并将得到的统计结果写入对应的存储器,包括:
所述统计计数设备根据所述内置存储器的地址查询到对应的存储器,发出读取数据请求的指令后,根据返回的所读取的数据与所述数据计算增量进行统计计数运算。
本发明实施例的一种具有统计计数设备的系统,所述系统包括:统计 计数设备,所述系统还包括网络处理芯片、主CPU中的任意一种设备;
所述网络处理芯片,配置为向所述统计计数设备发送统计请求;
所述主CPU,配置为向所述统计计数设备发送访问请求;
所述统计计数设备为如上述方案任一项所述的统计计数设备。
本发明实施例提供的统计计数设备设置于网络处理芯片外部,所述设备包括:接收单元,配置为接收网络处理芯片发送的统计请求;解析单元,配置为对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;统计单元,配置为根据预设配置将所述统计请求的类型和统计请求的增量转化为存储单元的地址及数据计算增量;根据所述存储单元的地址向对应的存储单元发送读取数据请求;将存储单元返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储单元;存储单元,配置为存储数据,接收到所述读取数据请求,响应所述读取数据请求并将所读取的数据返回所述统计单元,使所述统计单元能基于所述所读取的数据进行统计运算。
采用本发明实施例,统计计数设备设置于网络处理芯片外部,且统计计数设备内置有存储单元如SRAM,再配合统计计数设备内部的一系列单元能独立完成计数运算,将结果直接返回内置的存储单元进行存储,这种设备架构,不仅能高速实现统计计数功能,同时能避免上述现有技术架构所导致的网络处理芯片的制造成本增加,访问带宽偏小的问题。
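Brief Description of the Drawings
Figure 1 is a schematic structural diagram of the statistical counting device of the present invention;
Figure 2 is a flowchart of the statistical counting implementation method of the present invention;
Figure 3 is a schematic structural diagram of the system of the present invention;
Figure 4 is a schematic diagram of the system architecture of the first embodiment of the present invention in application scenario 1;
Figure 5 is a flowchart of the method of the first embodiment of the present invention;
Figure 6 is a schematic diagram of the internal structure of the SST in the first embodiment of the present invention;
Figure 7 is a schematic diagram of the internal architecture of the HIF in the first embodiment of the present invention;
Figure 8 is a schematic diagram of the internal architecture of the UPK unit in the first embodiment of the present invention, together with its data flow;
Figure 9 is a flowchart of the internal calculation of the PRE in the first embodiment of the present invention;
Figure 10 is a schematic diagram of the internal architecture of the STAT in the first embodiment of the present invention, together with its workflow;
Figure 11 is a schematic diagram of the system architecture of the second embodiment of the present invention in application scenario 2;
Figure 12 is a flowchart of the method of the second embodiment of the present invention.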
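Detailed Description
The implementation of the technical solutions is described in further detail below with reference to the accompanying drawings.
The solution of the embodiments of the present invention implements high-speed serial statistical counting. Using the underlying high-speed Serdes interface, the upper-layer interaction protocol Interlaken, a built-in storage unit such as SRAM, and the calculation logic for statistical counting, it receives, in an agreed statistical request packet format, requests issued by a network processing chip or a Host CPU (a statistical request from the network processing chip, an access request from the Host CPU), reads the existing count information from the storage unit such as SRAM, performs the specified counting operation, and writes the statistical counting result back to the storage unit. The user can then read the statistical counting result through a direct memory access (DMA, Direct Memory Access) interface. The device thus covers reception and parsing of statistical request packets, statistical counting, and reads and writes of the storage unit such as SRAM.
The embodiments of the present invention achieve high-performance, large-capacity statistical counting. An external statistical counting device disposed outside the network processing chip completes the statistical counting function independently, and the storage unit that holds the statistical counting results, such as SRAM or SDRAM, is likewise built into this external device. This addresses the capacity and performance problems of network processing chips in existing designs: no SRAM inside the network processing chip is consumed, which saves manufacturing cost of the network processing chip, and no external SDRAM is used, so the problem of insufficient access bandwidth does not arise.
It should be noted that the high-speed Serdes interface is the collective name for the interfaces of a serializer (SERializer) and a deserializer (DESerializer). Interlaken is a new-generation packet interconnect protocol; as a scalable protocol it supports chip-to-chip packet transfer from 10 Gbps to 100 Gbps and above, meeting today's demand for higher bandwidth and higher performance.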
本发明实施例提供的一种统计计数设备,所述统计计数设备设置于网络处理芯片外部,如图1所示,所述统计计数设备包括:
接收单元,配置为接收网络处理芯片发送的统计请求;
解析单元,配置为对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
统计单元,配置为根据预设配置将所述统计请求的类型和统计请求的增量转化为存储单元的地址及数据计算增量;根据所述存储单元的地址向对应的存储单元发送读取数据请求;将存储单元返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储单元;
存储单元,配置为存储数据,接收到所述读取数据请求,响应所述读取数据请求并将所读取的数据返回所述统计单元,使所述统计单元能基于所述所读取的数据进行统计运算。
在本发明实施例一优选实施方式中,所述统计计数设备还包括:
配置单元,配置为接收主CPU发送的访问请求,响应所述访问请求,经由所述统计单元从所述存储单元获取所述统计结果并提供给主CPU使用。
在本发明实施例一优选实施方式中,所述接收单元,还包括:
高速接口模块,配置为采用高速串行器/解串器Serdes接口构成的物理 链路,配合高速传输协议Interlaken来接收所述统计请求。
在本发明实施例一优选实施方式中,所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包。
在本发明实施例一优选实施方式中,所述解析单元,还包括:
报文解析模块,配置为按照约定的格式对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
所述统计请求,为由所述网络处理芯片按照所述约定的格式进行封装得到的;
所述约定的格式为基于所述Interlaken封装的格式,基本单元为统计报文切片。
在本发明实施例一优选实施方式中,所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包;
任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
相应的,所述解析单元,还包括:
解析子模块,配置为获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,直至读取出所述请求数据包中包含的所有统计报文切片并解析。
在本发明实施例一优选实施方式中,所述解析单元,还包括:
缓存子模块,配置为存储待解析的所有请求数据包;
所述解析子模块,还配置为对当前读取的请求数据包根据统计报文切片的有效标志位读取,以获得所述统计报文切片的情况下,等待当前读取的请求数据包中的所有统计报文切片都处理完毕后再从所述缓存子模块提取下一个请求数据包;
以及,对所述统计报文切片进行解析,得到的所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数、及统计业务支持的统计项。
在本发明实施例一优选实施方式中,所述统计单元,还包括:
统计预处理模块,配置为根据预设配置将所述统计请求的类型和统计请求的增量转化为存储单元的地址及数据计算增量;
统计计算模块,配置为根据所述存储单元的地址向对应的存储单元发送读取数据请求;将存储单元返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储单元。
在本发明实施例一优选实施方式中,所述高速接口模块,还包括:
Serdes转换子模块,配置为完成高速数据的串并转换,并将所述统计请求由串行数据转换为以并行数据传输;
Interlaken协议子模块,配置为将所述Serdes转换子模块传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包;
任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述解析单元,还包括:
解析子模块,配置为获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,解析得到的所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数及统计 业务支持的统计项;解析得到的所述统计请求的增量包括统计增量;
相应的,所述统计预处理模块还包括:
预处理子模块,配置为获取所述区分不同统计业务的ID号、所述统计业务支持的统计对列数及所述统计业务支持的统计项、所述统计增量及所述预设配置;
以及,基于所述预设配置得到的统计规则,根据所述区分不同统计业务的ID号得到配置信息,根据所述统计业务支持的统计项得到对应的存储单元基地址,根据所述存储单元基地址、所述统计业务支持的统计对列数及所述配置信息计算得到存储单元目标地址,以根据所述存储单元目标地址为所述存储单元的地址来查询对应的存储单元;基于所述预设配置得到的统计规则,根据所述统计增量得到所述数据计算增量。
在本发明实施例一优选实施方式中,所述统计计算模块,还包括:
指令选择子模块,配置为根据预设调度规则对网络处理芯片发送的统计请求及主CPU发送的访问请求中的任意一个请求选择进行响应,并执行对应的统计计数处理或统计结果访问处理;
指令缓存子模块,配置为对指令进行缓存,等待一个指令执行完毕后再提取下一个指令,所述指令包括:根据所述存储单元的地址查询到对应的存储单元后发送读取数据请求的指令、将得到的统计结果写入对应的存储单元的回写数据的指令、提供统计结果给主CPU访问的指令中的至少一种指令;
计算子模块,配置为根据所述存储单元的地址查询到对应的存储单元,发出读取数据请求的指令后,根据返回的所读取的数据与所述统计增量进行统计计数运算。
本发明实施例提供的一种统计计数实现方法,如图2所示,所述方法包括:
步骤101、统计计数设备接收网络处理芯片发送的统计请求;所述统计计数设备设置于网络处理芯片外部;
步骤102、统计计数设备对所述统计请求进行解析,得到统计请求的类型和统计请求的增量;
步骤103、统计计数设备根据预设配置将所述统计请求的类型和统计请求的增量转化为内置存储器的地址及数据计算增量;
步骤104、统计计数设备根据所述存储器的地址向对应的存储器发送读取数据请求,将存储器返回的所读取的数据与所述数据计算增量进行统计计数运算,并将得到的统计结果写入对应的存储器。
在本发明实施例一优选实施方式中,所述方法还包括:
统计计数设备接收主CPU发送的访问请求;
响应所述访问请求,经由所述统计单元从所述存储单元获取所述统计结果并提供给所述主CPU使用。
在本发明实施例一优选实施方式中,所述统计计数设备接收网络处理芯片发送的统计请求,包括:
统计计数设备采用高速串行器/解串器Serdes接口构成的物理链路,配合高速传输协议Interlaken来接收所述统计请求。
在本发明实施例一优选实施方式中,所述方法还包括:
统计计数设备将所述统计请求由串行数据转换为以并行数据传输;
统计计数设备将传输的所述并行数据基于所述Interlaken封装为Interlaken格式的请求数据包。
在本发明实施例一优选实施方式中,所述统计请求为由所述网络处理芯片按照约定的格式进行封装得到的Interlaken格式的请求数据包;
所述方法还包括:所述统计计数设备按照所述约定的格式对所述统计请求进行解析,得到统计请求的类型和统计请求的增量。
在本发明实施例一优选实施方式中,任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述方法还包括:
获取任一个所述Interlaken格式的请求数据包,根据统计报文切片的有效标志位读取所述请求数据包,与所述有效标志位匹配时,对得到的一个统计报文切片进行解析,直至读取出所述请求数据包中包含的所有统计报文切片并解析。
在本发明实施例一优选实施方式中,任一个所述Interlaken格式的请求数据包,包括多个统计报文切片;
所述方法还包括:
在缓存中存储待解析的所有请求数据包;
对当前读取的请求数据包根据统计报文切片的有效标志位读取,以获得所述统计报文切片,等待当前读取的请求数据包中的所有统计报文切片都处理完毕后再从所述缓存中提取下一个请求数据包。
在本发明实施例一优选实施方式中,所述统计请求的类型包括:区分不同统计业务的ID号、统计业务支持的统计对列数、及统计业务支持的统计项;
所述统计请求的增量包括统计增量。
在本发明实施例一优选实施方式中,所述统计计数设备根据预设配置将所述统计请求的类型和统计请求的增量转化为内置存储器的地址及数据计算增量包括:
获取所述区分不同统计业务的ID号、所述统计业务支持的统计对列数、及统计业务支持的统计项、及所述预设配置;
基于所述预设配置得到的统计规则,根据所述区分不同统计业务的ID号得到配置信息,根据所述统计业务支持的统计项得到对应的存储器基地 址,根据所述存储器基地址、所述统计业务支持的统计对列数及所述配置信息计算得到存储器目标地址并作为所述内置存储器的地址;
获取所述统计增量,基于所述预设配置得到的统计规则,得到所述数据计算增量。
在本发明实施例一优选实施方式中,所述统计计数设备根据所述存储器的地址向对应的存储器发送读取数据请求,将存储器返回的所读取的数据与所述统计增量进行统计计数运算,并将得到的统计结果写入对应的存储器,包括:
所述统计计数设备根据所述内置存储器的地址查询到对应的存储器,发出读取数据请求的指令后,根据返回的所读取的数据与所述数据计算增量进行统计计数运算。
本发明实施例提供的一种具有统计计数设备的系统,如图3所示,所述系统包括:统计计数设备,所述系统还包括网络处理芯片、主CPU中的任意一种设备;
所述网络处理芯片,配置为向所述统计计数设备发送统计请求;
所述主CPU,用于向所述统计计数设备发送访问请求;
所述统计计数设备为如上述方案任一项所述的统计计数设备。
基于上述针对设备及方法实现方案的描述,本发明实施例从设备具体应用的实现上来说,结合接收单元具体为高速接口模块(HIF)、解析单元具体为报文解析模块(UPK)、统计单元具体为由统计预处理模块(PRE)及统计计算模块(STAT)构成、存储单元(MEM)具体为SRAM或SDRAM、配置单元具体为配置管理单元(CFM)为例进行描述,所述统计计数设备可以包括以下单元及模块,但是并不限于这里所描述的单元及模块。
对于本发明实施例这种高速串行的统计计数设备来说,统计计数设备独立于网络处理芯片之外,且内置有存储单元,如SRAM或SDRAM,由 于SRAM或SDRAM的处理方式类似,以下都用SRAM来描述,对于这种设备具体应用的实现上来说,具有以下主要内容:
1)HIF,用于与网络处理芯片交互时,利于底层高速串行接口,如Serdes接口和上层高速传输协议Interlaken,来接收网络处理芯片发送过来的统计请求。
2)UPK,用于对外接的网络处理芯片发送的统计请求,按照约定的统计请求数据包的格式进行解析,解析得到两项解析结果:统计请求的请求类型及统计请求的增量。
3)PRE,用于接收报文解析模块的解析结果,并转化为内置的SRAM地址及数据计算增量,所述转化即为:根据所述请求类型与SRAM地址的映射关系,查询到与所述统计请求的类型对应的SRAM地址,以便能根据所述SRAM地址寻址到SRAM;
这里,所述请求类型是为了寻址使用,所述数据请求增量是为了后续的数据统计计数使用,不做赘述。
4)STAT,用于根据PRE发送过来SRAM地址向内置的SRAM发送读请求,以便从SRAM读取数据,用于统计计数;将从SRAM读取的返回数据与所述数据请求增量进行统计计数运算,最后将统计计算结果写回到该SRAM中的相应地址;
另外,统计计数设备得到统计计数结果后,由于能通过Host CPU提供给用户使用,所以,所述STAT还可以用于处理Host CPU的读计数器值命令。
5)MEM,可以采用片内SRAM作为存储介质,也可以采用其他高速momory。SRAM用于存储统计计数的信息,包括:已有计数值及根据新增计数值与已有计数值进行运算,以实现统计计数所得到的统计计算结果,为了保证访问带宽。
这里需要指出的是,所述存储单元,采用片内SRAM作为存储介质时,可以根据需要设计存储容量、群组数量(决定访问口数目,从而支持多路计数器同时计数),另外从ASIC实现的角度考虑,片内SRAM可以由多个SRAM小块组成,多个SRAM小块串成流水线形式,在不影响访问性能的同时保证ASIC实现的可行性。
6)CFM,用于接收Host CPU发送过来的配置命令,根据配置命令访问自身配置的对应寄存器,书写相应的配置项。
另外,CFM还包含DMA模块,以便提供给Host CPU快速读取统计计数器值的通路。所述统计计数器位于所述统计计数设备中,作为基础的计算工具,可以位于所述统计计数设备的所述统计计算模块中。
在本发明实施例一优选实施方式中,所述HIF,可以包括:
Serdes转换子模块,配置为完成高速数据的串并转换功能,将串行数据转换为并行数据;
Interlaken协议子模块,配置为根据为实现高带宽及可靠包传输而优化的互连协议,如interlaken协议,将所述Serdes模块发送过来的所述并行数据封装成统计计数数据包的数据包格式,另外完成链路检测及保护功能。
在本发明实施例一优选实施方式中,Interlaken协议子模块还可以为一组interlaken组件,对应一个interlaken访问口,配置为将本统计计数设备作为一个主网络处理芯片的外挂计数芯片;Interlaken协议子模块还可以为多组interlaken组件,对应多个interlaken访问口,从而实现多个主网络处理芯片共享一篇外挂计数芯片的目标,在满足计数需求的前提下节省系统级的成本。
在本发明实施例一优选实施方式中,所述UPK,可以包括:
缓存子模块,配置为从高速接口模块接收Interlaken格式的数据包,由于一个数据包中包含多个统计报文切片,而本统计计数设备除所述高速接 口模块和所述报文解析模块之外的几个处理模块及子模块都是以统计报文切片作为最小处理单元的,所以存在处理速率差的问题,安排所述缓存子模块正好解决了这样的速率差,起到调速的作用。
解析子模块,配置为从缓存子模块中取出一个所述数据包,根据统计报文切片的切片位宽切成多个统计报文切片,根据统计报文切片的有效标志位将有效切片发送给后续模块及子模块进行处理,每次发送一个统计报文切片,待这个当前数据包中所有有效切片发送完毕之后再从缓存子模块中取出新的数据包。根据统计报文切片规定格式对统计报文切片进行解析,得到区分不同统计业务的ID号、该业务支持的统计对列数(Qnum)、统计增量及该业务支持的统计项(Type)等,为后续单元访问存储单元,如SRAM块、统计计算单元用于计算统计值等提供必要信息。
在本发明实施例一优选实施方式中,所述PRE,配置为接收解析子模块的解析内容,包括区分不同统计业务的ID号、该业务支持的统计对列数(Qnum)、统计增量及该业务支持的统计项(Type)等。另外,读取配置管理单元配置的包括统计规则的配置命令来访问自身配置的对应寄存器信息,写入对应的配置项;
综合上述这些信息按照预设的统计规则,计算得到计数器地址及计数增量。
在本发明实施例一优选实施方式中,所述STAT,具体包括:
指令选择子模块(MUX),用以选择执行由网络处理芯片发送的统计计数请求,还是执行Host CPU发送的DMA读访问请求,调度规则由用户配置;
指令缓存子模块(Cache),用以缓存一定拍数的SRAM访问地址、回写数据及DMA标志等信息;
计算子模块(ALU),用以根据PRE提供的SRAM访问地址,向SRAM 发出读访问命令,待读取到返回数据后,将得到的返回数据及统计预处理模块提供的数据计算增量进行数学运算,最后将计算结果回写到SRAM相应地址中。
另外,所述STAT,还配置为检查新进SRAM访问地址在Cache中是否有相同地址,也就是说有多个请求都访问到SRAM相应地址,如果有的话就按照一定规则进行指令合并,如果不进行指令合并,容易有读写错误。另外,在DMA读访问模式下,向SRAM发出读命令,将读返回数据返回给配置管理单元。
相应地,本发明实施例从方法具体应用的实现上来说,具有以下主要内容:
对于本发明实施例这种高速串行的统计计数实现方法来说,包括:本统计计数设备接收其他芯片发送过来的请求,比如,接收到网络处理芯片的统计请求,本统计计数设备进行统计计数;接收到Host CPU发送的DMA读访问请求,本统计计数设备将统计计数结果提供给Host CPU使用。
对于本统计计数设备进行统计计数而言,所述方法包括:
与网络处理芯片交互时,基于底层高速串行接口,如Serdes接口和上层高速传输协议Interlaken,来接收网络处理芯片发送过来的统计请求;
按照约定的统计数据包的格式对统计请求进行解析,根据解析结果寻址到存储单元,从存储单元读取数据;
根据读取的数据进行统计计数运算,将计算结果回写入存储单元如SRAM,以便后续提供给Host CPU使用。
这里需要指出的是,构成所述统计请求的统计计数数据包的接收、解析处理、向存储单元如SRAM发出访问以得到所读取的返回数据、根据返回数据和数据请求增量进行统计计数运算、回写入存储单元如SRAM等一系列动作采用流水级(pipeline)架构实现,提高系统处理性能。所述pipeline 架构即本统计计数设备的各个单元及模块所形成的架构。
在本发明实施例一优选实施方式中,基于所述Serdes接口接收其他芯片(如所述网络处理芯片)发出的统计请求,具体包括:
统计请求传输的物理通路,采用当前流行的高速串行接口——Serdes接口,配合了上层传输协议——Interlaken,以实现高性能、高通用行的链路传输。
在本发明实施例一优选实施方式中,所述配合上层传输协议,采用能够实现高带宽及可靠包传输而优化的互连协议---Interlaken,能够实现接口协议的高通用性、高兼容性。
在本发明实施例一优选实施方式中,所述统计请求以统计计数数据包的形式存在,所述基于Interlaken协议封装格式的所述统计计数数据包中包含多个统计报文切片,每个统计报文切片按照规定的报文封装格式封装统计计数数据包,与本统计计数设备对应的对端芯片(如网络处理芯片)按照该规定的报文封装格式封装统计计数数据包,本设备接收到统计计数数据包后,由一个专门的解封装模块,如UPK进行解封装,提取统计计数需要的信息进行相应计算。
在本发明实施例一优选实施方式中,定义了一种规定的报文封装格式,即Interlaken封装格式,包含统计计数计算所必需的信息,包括有效标志位、区分不同统计业务的ID号、该业务支持的统计对列数(Qnum)、统计增量及该业务支持的统计项(Type)等。
在本发明实施例一优选实施方式中,所述一个Interlaken封装格式的统计计数数据包中包含的多个统计报文切片,多个统计报文切片之间相互独立,通过相应的有效标志位决定该统计报文切片是否有效。
在本发明实施例一优选实施方式中,所述一个统计报文切片,其中最多包含两个目标统计计数项,对应两组统计计数器。
在本发明实施例一优选实施方式中,所述目标统计计数项,可以对应一个计数器,也可以对应多个计数器,比如TM的一个统计项,可能需要同时统计包个数及包长度,这就需要两个计数器,具体一个统计计数项对应的计数器个数可以由用户配置。
在本发明实施例一优选实施方式中,经过解封装之后的统计报文切片进入PRE,根据相应的标识信息决定计数器在SRAM中的地址信息以及计算增量。
在本发明实施例一优选实施方式中,所述计数器在SRAM中的地址信息以及计算增量进入STAT,STAT完成各个统计项的计数功能,发出SRAM读取命令,接收到SRAM返回值后进行计算,最后将计算结果写回到SRAM相应地址中。
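In a preferred implementation of the embodiments of the present invention, the STAT can handle conflicts that may arise when reading and writing SRAM data, and can schedule the SRAM access requests of statistical counting commands and DMA read commands according to a given priority.
In a preferred implementation of the embodiments of the present invention, the access priority may use a default priority configuration or a priority configured by the user.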
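The priority scheduling between statistical counting commands and DMA read commands can be sketched as below. This is a minimal illustration that assumes strict priority between two pending queues; the source only says that a default or user-configured priority is used, so the strict-priority policy, the function name `mux_select`, and the `dma_first` flag are hypothetical.

```python
from collections import deque


def mux_select(stat_requests, dma_requests, dma_first=False):
    """Pop the next request, always preferring the higher-priority queue when it is non-empty."""
    ordered = (dma_requests, stat_requests) if dma_first else (stat_requests, dma_requests)
    for queue in ordered:
        if queue:
            return queue.popleft()
    return None   # nothing pending


stat = deque(["stat#0", "stat#1"])
dma = deque(["dma#0"])
order = [mux_select(stat, dma, dma_first=True) for _ in range(4)]
print(order)  # prints: ['dma#0', 'stat#0', 'stat#1', None]
```

Strict priority is the simplest policy to reason about, but it can starve the lower-priority source under sustained load; a weighted or round-robin scheme would be an equally valid reading of "a given priority".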
在本发明实施例一优选实施方式中,所述STAT,内部也可以采用pipeline架构实现,为了解决SRAM访问延迟带来的潜在问题,比如多个统计计数数据包指向同一组计数器,从而由于SRAM访问延迟造成计算信息未及时更新导致计数值不准确的问题,设计中安排了一个Cache,用来缓存SRAM访问地址、访问类型标志、计算增量等信息,以避免计算信息未及时更新导致计数值不准确的问题,比如有两个统计计数数据包指向同一组计数器,若计数器对第一个统计计数数据包还未计算结束,就引入了第二个统计计数数据包进行运算,势必会计算有误,采用Cache,可以使得第一个统计计数数据包运算结束后再引入第二个统计计数数据包,这样计算结 果不会有误。
在本发明实施例一优选实施方式中,所述STAT中的Cache,采用队列结构实现,队列深度由SRAM访问延迟决定,如果新进的统计计数数据包指向的SRAM访问地址在指令缓存子模块中能够找到,那么就将指向同一个SRAM地址的访问请求合并(统计计数数据包合并),并将计算增量信息合并。
在本发明实施例一优选实施方式中,所述STAT可以设计多组SRAM访问口,进而实现多组统计计数请求的并行处理。
在本发明实施例一优选实施方式中,所述存储单元如SRAM的规格,可以根据实际需要进行选择,包括SRAM块数据位宽、深度、块数目等。另外,从ASIC实现的角度考虑,SRAM存储块可以由多个SRAM小块组成,多个SRAM小块串成流水线形式,在不影响访问性能的同时保证ASIC实现的可行性。
采用本发明实施例,至少具有下列优点:
本发明实施例由于基于Serdes接口、Interlaken、SRAM及计算逻辑,实现统计请求数据包接收、解析、SRAM存储数据读取、统计计算、回写、DMA访问等功能,与现有计数以网络处理芯片,包括网络处理芯片为网络处理器(NP,Network Processor,)、交换接入处理器(SA,Switch Access)等芯片的片内存储资源或外挂存储器(SRAM或SDRAM)等做法相比,在芯片成本、使用灵活性、通用性及访问带宽等方面有着优势。
以下用具体应用场景对本发明实施例进行具体阐述。
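Application scenario 1: a system having a statistical counting device, composed of the statistical counting device, a network processing chip, and a CPU, implements the statistical counting function. In this scenario one network processing chip interacts with the statistical counting device.
Figure 4 shows the high-speed serial statistical counting device (SST, Serial Statistics) of the first embodiment of the present invention in this application scenario, applied as an external statistics chip for one network processing chip (taking NP or SA as an example). In Figure 4, the NP/SA sends statistical requests to the SST through the Serdes interface. After receiving a statistical request, the SST performs statistical counting according to the steps shown in Figure 5 below and writes the count value into the built-in SRAM. The SST can also receive an access request from the Host CPU to read a counter value and process it accordingly.
As shown in Figure 5, the system architecture with the statistical counting device shown in Figure 4 can implement the following steps:
Step 501: The NP/SA encapsulates the statistical request into statistical packet slices according to the statistical request packet format specified by the SST;
Step 502: The NP/SA combines multiple statistical packet slices into one beat of an Interlaken data packet and sends it through the Serdes interface;
Step 503: Serial data is transmitted on the physical link formed by the Serdes interface;
Step 504: After the serial data reaches the SST over the physical link formed by the Serdes interface, the HIF recovers the statistical request data packet according to the Interlaken protocol;
Step 505: The UPK parses the statistical request data packet and outputs two parsed items: the type of the statistical request and the increment of the statistical request;
Step 506: The PRE receives the parsed content from the UPK and converts the type and the increment of the statistical request into an SRAM address and a calculation increment;
Step 507: The STAT sends a read request to the SRAM at the SRAM address supplied by the PRE, performs the operation on the SRAM read return data and the statistical request increment, and finally writes the result back to the corresponding address in the SRAM;
Here, the STAT can also process an access request from the CPU to read a counter value;
Step 508: The Host CPU sends a read-counter-value request to the STAT through the CFM and receives the corresponding return value.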
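Figure 6 shows the overall internal architecture of the SST. The SST interacts with the NP/SA and includes:
HIF, configured to receive statistical requests sent by the NP/SA;
UPK, configured to parse the statistical requests sent by the NP/SA attached to the SST and output two parsed items: the type of the statistical request and the increment of the statistical request;
PRE, configured to receive the parsed content output by the UPK and convert the statistical request into an SRAM address and a calculation increment;
STAT, configured to send a read request to the SRAM at the SRAM address supplied by the PRE, perform the operation on the SRAM read return data and the statistical request increment, and finally write the result back to the corresponding address in the SRAM; it also handles the Host CPU's read-counter-value commands;
MEM, which can use SRAM as the storage medium. The storage capacity and the number of groups can be designed as needed (the number of groups determines the number of access ports, which allows multiple counters to count simultaneously). From the perspective of ASIC implementation, an SRAM memory block can be composed of multiple small SRAM blocks connected in pipeline form, which keeps the ASIC implementation feasible without affecting access performance. In this embodiment, the MEM uses SRAM as the storage medium and provides two groups of access ports, supporting statistical access for at most two groups of statistical count items;
CFM, configured to receive configuration commands sent by the Host CPU, access the corresponding registers, and write the corresponding configuration items. It also contains a DMA module (not shown in the figure) that gives the Host CPU a fast path for reading statistical counter values. In this embodiment, PCIe may be used as the CPU access path to raise the DMA access rate.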
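FIG. 7 shows the internal architecture of the HIF and also illustrates the data flow through the HIF. The HIF includes:
Serdes conversion submodule, configured to receive the high-speed serial bit stream sent by the NP/SA and perform serial-to-parallel conversion. The number of Serdes links is not fixed and can be chosen according to the bandwidth required by the actual application.
Interlaken protocol submodule, configured to encapsulate the parallel data sent by the Serdes conversion submodule into the statistical message packet format according to the Interlaken protocol, an interconnect protocol optimized for high-bandwidth and reliable packet transmission, and additionally to perform link detection and protection.
It should be noted that the standard Interlaken data message is shown in Table 1 and includes the data valid flag pkt_ena, the packet header pkt_sop, the packet tail pkt_eop, the error flag pkt_err, the packet data pkt_dat, and so on. In this embodiment, the Interlaken packet length is 3 beats and each beat is 1024 bits wide; actual use is not limited to this illustrative scenario.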
Figure PCTCN2014087236-appb-000001
Table 1
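FIG. 8 shows the internal architecture of the UPK and also illustrates the data flow through the UPK. The UPK includes:
Buffer submodule, configured to receive request packets in Interlaken format from the HIF, discard erroneous packets, and concatenate all relevant information of each valid packet and store it in a buffer FIFO. One packet contains multiple statistical message slices, whereas the subsequent processing modules and submodules use the statistical message slice as their smallest processing unit, so the two sides run at different rates; the buffer submodule absorbs this rate difference.
Parsing submodule, which takes one packet out of the buffer submodule, cuts it into multiple statistical message slices according to the slice width, and sends the valid slices, as indicated by the slice valid flag, to the subsequent processing modules and submodules one slice at a time. A new packet is taken from the buffer submodule only after all valid slices of the current packet have been sent. Each slice is parsed according to the specified statistical message slice format to obtain the ID number that distinguishes different statistical services, the number of statistical queues (Queue) supported by the service, the statistical increment, the statistical items (Type) supported by the service, and so on, which gives subsequent units the information they need to access the SRAM blocks and calculate statistical values. In this embodiment, the content of a statistical message slice is as shown in Table 1 above. Table 1 is a standard message format given only as an example; the format is not limited to it and can be extended on this basis.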
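The behaviour of the buffer submodule (reassemble beats, discard erroneous packets, queue the rest) can be sketched in software as follows. This is only an illustration under stated assumptions: the `Beat` record reuses the signal names pkt_ena, pkt_sop, pkt_eop, pkt_err and pkt_dat from the text, while the function name `buffer_packets` and the policy of dropping a whole packet when any of its beats carries pkt_err are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Beat:
    pkt_ena: bool   # data valid flag
    pkt_sop: bool   # start of packet
    pkt_eop: bool   # end of packet
    pkt_err: bool   # error flag
    pkt_dat: int    # payload carried by this beat


def buffer_packets(beats):
    """Reassemble beats into packets and keep only error-free packets."""
    fifo = deque()
    current, bad = None, False
    for b in beats:
        if not b.pkt_ena:
            continue                    # idle beat, nothing to store
        if b.pkt_sop:
            current, bad = [], False    # a new packet begins
        if current is None:
            continue                    # data outside a packet is ignored
        current.append(b.pkt_dat)
        bad = bad or b.pkt_err
        if b.pkt_eop:
            if not bad:
                fifo.append(current)    # store the complete valid packet
            current = None
    return fifo
```

The parsing submodule would then pop one packet at a time from the returned FIFO and finish all of its slices before popping the next.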
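As shown in Table 1, the statistical message slice is 88 bits wide and contains two groups of statistical service items of 44 bits each, independent of each other. The two groups use the same field format, which contains the following fields:
Vld: service valid flag;
ID: service ID, which distinguishes different statistical service items; at most 8 different statistical service items are supported, for example TM enqueue statistics, TM dequeue statistics and OAM statistics;
Qnum: statistical queue number; at most 1M statistical queues are supported. For example, if TM needs to support 512K statistical queues, the low 19 bits of this field represent the queue number and the most significant bit is 0;
Len: statistical increment; the maximum supported increment is 32K. For example, if TM statistics need to count message length, the message length can be placed in this field;
Type: the statistical item number to be counted within one statistical queue under the ID; at most 16 statistical items are supported.
For example, service ID 0 corresponds to TM enqueue statistics and needs to support 512K queues. The low 19 bits of Qnum hold the queue number and the most significant bit is fixed at 0. Each queue needs to support 11 statistical items, including normal enqueue, TD drop, disabled drop and WRED/GRED priority 0 to 7 drops, which correspond to Type numbers 0 to 10. Each packet supports message length statistics, with the message length carried in the Len field.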
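A decoder for the 88-bit slice can be sketched as below. The field widths are inferred from the stated maxima (8 services, 1M queues, 32K increment, 16 items give 3, 20, 15 and 4 bits, plus 1 bit for Vld, which totals 43 of the 44 bits). The field order and the placement of the spare bit are assumptions made for illustration, because Table 1 is only available as a figure; the names `StatItem`, `parse_item` and `parse_slice` are hypothetical.

```python
from typing import NamedTuple, Optional

ITEM_BITS = 44
# (name, width) from most to least significant bit.
# Widths follow the stated maxima; the order and the spare bit are assumed.
FIELDS = [("rsv", 1), ("vld", 1), ("service_id", 3),
          ("qnum", 20), ("length", 15), ("stat_type", 4)]


class StatItem(NamedTuple):
    service_id: int   # ID field
    qnum: int         # Qnum field
    length: int       # Len field (statistical increment)
    stat_type: int    # Type field


def parse_item(word: int) -> Optional[StatItem]:
    """Decode one 44-bit service item; return None when Vld is clear."""
    values, shift = {}, ITEM_BITS
    for name, width in FIELDS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    if not values["vld"]:
        return None
    return StatItem(values["service_id"], values["qnum"],
                    values["length"], values["stat_type"])


def parse_slice(slice88: int) -> list:
    """Split an 88-bit slice into its two independent items, keep valid ones."""
    mask = (1 << ITEM_BITS) - 1
    halves = [(slice88 >> ITEM_BITS) & mask, slice88 & mask]
    return [item for item in map(parse_item, halves) if item is not None]
```

Because the two 44-bit items are independent, a slice may yield zero, one or two requests for the preprocessing stage.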
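It should be noted that the PRE is configured to receive the parsed content from the UPK unit and convert the statistical request into an SRAM address and a calculation increment. Specifically, the SRAM address of the target counter can be calculated from the ID number, the Qnum and Type fields, and the statistical rules corresponding to configuration information preset by the user. The calculation flow is shown in FIG. 9 and includes:
Step 901: query the user configuration register according to the ID number of the received statistical request, and obtain from the register the start address in memory of the service corresponding to that ID, the counting mode of the service (packet count only, both packet count and packet length, or packet length only) and other information needed to calculate the counter address;
Step 902: query the user configuration register according to the Type number, and obtain from the register the offset address, relative to the ID base address, of the statistical item corresponding to that Type in memory;
Step 903: calculate the memory address from Qnum and the configuration information obtained above.
Specifically, in this embodiment, ID number 0 indicates TM enqueue statistics, its start address in memory is base_addr_id0, and the counting mode is packet count only. Type number 0 indicates normal enqueue, with offset address base_addr_type0. The SRAM data width is 100 bits and one counter is 50 bits wide, so one SRAM address can hold two counters. The SRAM address of the statistical item with queue number Qnum is therefore: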
Addr=base_addr_id0+base_addr_type0+Qnum/2
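If Qnum is odd, the counter occupies the low 50 bits of the 100 bits at Addr; otherwise it occupies the high 50 bits.
If both the packet count and the packet length are recorded for a statistical item, the item corresponds to two counters and needs 100 bits of storage, that is, one full SRAM address. In this case the SRAM address of the statistical item with queue number Qnum is: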
Addr=base_addr_id0+base_addr_type0+Qnum
The high 50 bits hold the packet count counter and the low 50 bits hold the packet length counter.
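The two address formulas can be checked with a short sketch. It assumes that Qnum/2 means integer division (the text does not say so explicitly, but two queues share one address), and the function names are hypothetical; the 50-bit counter width and the odd/low, even/high mapping are taken from the text.

```python
COUNTER_BITS = 50   # one counter; a 100-bit SRAM word holds two of them


def packet_only_location(base_addr_id, base_addr_type, qnum):
    """Packet-count-only mode: return (SRAM address, lowest bit of counter)."""
    addr = base_addr_id + base_addr_type + qnum // 2   # assumed integer division
    # Odd queue numbers use the low 50 bits, even ones the high 50 bits.
    low_bit = 0 if qnum % 2 == 1 else COUNTER_BITS
    return addr, low_bit


def both_counters_address(base_addr_id, base_addr_type, qnum):
    """Count-and-length mode: each queue owns one full 100-bit word."""
    return base_addr_id + base_addr_type + qnum


def extract_counter(word, low_bit):
    """Pick one 50-bit counter out of a 100-bit SRAM word."""
    return (word >> low_bit) & ((1 << COUNTER_BITS) - 1)
```

For example, queues 6 and 7 share the same address in packet-count-only mode, with queue 6 in the high half and queue 7 in the low half, whereas in count-and-length mode they occupy consecutive addresses.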
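FIG. 10 is a schematic diagram of the internal architecture of the STAT. The STAT interacts with the PRE, and FIG. 10 also illustrates the data flow implemented by the pipeline architecture of the STAT. The STAT includes:
MUX, which receives statistical requests from the PRE and DMA read access requests from the CFM, and selects which command to respond to according to the user-configured priority;
Cache. Because the SST is implemented internally with a pipeline architecture, SRAM access latency can cause problems: when multiple statistical request packets point to the same group of counters, the calculation information may not be updated in time and the count value becomes inaccurate. To address this, the design of this embodiment includes the Cache, an instruction caching module that buffers the SRAM access address, the access type flag, the calculation increment and similar information;
ALU, which receives the statistical request or DMA read access request scheduled by the MUX and compares its SRAM access address with the addresses buffered in the Cache. If no match is found, it sends a read access request directly to the MEM and writes the address information into the Cache. Otherwise, it merges the new access request with the matching entry in the Cache, merges the counter increment it carries as well, and updates the Cache contents. After the MEM returns the read data, the ALU combines the calculation increment with the returned data according to the counting rule provided by the PRE to obtain the new count value, then sends a write command request to the MEM to write the count value into it. If the command is a DMA read access request, the counter value is returned to the CFM; the data written to the MEM is then determined by the read-clear mode: in read-clear mode the value 0 is written to the MEM, otherwise the original value is written.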
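The request selection of the MUX and the read-modify-write and read-clear behaviour of the ALU can be sketched as follows. This is a simplified software model, not the pipeline itself: a dictionary stands in for the MEM, the Cache merging step is left out, counter overflow is not modelled, and the names `mux`, `StatAlu`, `count` and `dma_read` are hypothetical.

```python
def mux(stat_req, dma_req, dma_has_priority=False):
    """Pick one pending request according to the configured priority."""
    order = [dma_req, stat_req] if dma_has_priority else [stat_req, dma_req]
    for req in order:
        if req is not None:
            return req
    return None


class StatAlu:
    """Read-modify-write over a dict standing in for the on-chip SRAM."""

    def __init__(self):
        self.mem = {}   # address -> counter value

    def count(self, addr, increment):
        old = self.mem.get(addr, 0)         # read request to MEM
        self.mem[addr] = old + increment    # write the new count back
        return self.mem[addr]

    def dma_read(self, addr, read_clear):
        value = self.mem.get(addr, 0)
        # Read-clear mode writes 0 back, otherwise the original value.
        self.mem[addr] = 0 if read_clear else value
        return value                        # value returned to the CFM
```

A host that polls with read-clear enabled therefore sees per-interval deltas, while a host that polls without it sees running totals.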
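Application scenario 2: a system with a statistical counting device, composed of the statistical counting device, network processing chips and a CPU, implements the statistical counting function, where two network processing chips interact with the statistical counting device. The present invention is not limited to two chips; more than two are also possible and are not described further.
FIG. 11 shows the high-speed serial statistical counting device (SST, Serial Statistics) of the second embodiment of the present invention in this application scenario, used as an external statistics chip for two network processing chips (NP or SA taken as an example). In FIG. 11, the NP/SA chips send statistical requests to the SST through the Serdes interface. After receiving a statistical request, the SST performs statistical counting according to the steps shown in FIG. 12 below and writes the count value into the built-in SRAM. The SST can also receive access requests from the Host CPU to read counter values and process them accordingly.
This embodiment is similar to the first embodiment of the present invention described above. The difference is that in the first embodiment the SST serves as the external statistics chip of one network processing chip, whereas in this embodiment it serves as the external statistics chip of two network processing chips. The internal structure and flow of the two embodiments are essentially the same. The specific working process shown in FIG. 12 includes: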
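Step 1201: each of the two NP/SA chips internally encapsulates its statistical requests into statistical message slices according to the statistical message format specified by the SST;
Step 1202: each of the two NP/SA chips merges multiple statistical message slices into one beat of an Interlaken packet and sends it out through the Serdes interface;
Step 1203: the serial data is transmitted over the physical links formed by the two groups of Serdes interfaces;
Step 1204: after the serial data enters the SST over the physical links formed by the Serdes interfaces, the HIF obtains two groups of statistical request packets according to the Interlaken protocol;
Step 1205: the UPK parses the statistical request packets and outputs two parsed items: the type of the statistical request and the increment of the statistical request;
Step 1206: the PRE receives the parsed content from the UPK and converts the type and the increment of the statistical request into an SRAM address and a calculation increment;
Step 1207: the STAT sends a read request to the SRAM according to the SRAM address sent by the PRE, performs the operation on the data returned by the SRAM read and the statistical request increment, and finally writes the result back to the corresponding SRAM address;
Here, the STAT can also process access requests from the CPU to read counter values;
Step 1208: the Host CPU sends a read-counter-value request to the STAT through the CFM and receives the corresponding return value.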
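The descriptions of the overall internal architecture of the SST, the internal architecture of the HIF, the internal architecture of the UPK, the statistical preprocessing flow based on the PRE, and the internal architecture of the STAT in this embodiment are the same as in the first embodiment of the present invention and are given below:
FIG. 6 shows the overall internal architecture of the SST. The SST interacts with the NP/SA and includes:
HIF, configured to receive the statistical requests sent by the NP/SA;
UPK, configured to parse the statistical requests sent by the NP/SA connected to the SST and output two parsed items: the type of the statistical request and the increment of the statistical request;
PRE, configured to receive the parsed content output by the UPK and convert the statistical request into an SRAM address and a calculation increment;
STAT, configured to send a read request to the SRAM according to the SRAM address sent by the PRE, perform the operation on the data returned by the SRAM read and the statistical request increment, and finally write the result back to the corresponding SRAM address; it also processes the read-counter-value commands of the Host CPU;
MEM, which may use SRAM as the storage medium. The storage capacity and the number of groups (which determines the number of access ports and therefore how many counters can count simultaneously) can be designed as needed. In addition, from the perspective of ASIC implementation, an SRAM storage block may consist of multiple small SRAM blocks chained into a pipeline, which keeps the ASIC implementation feasible without affecting access performance. In this embodiment, the MEM uses SRAM as the storage medium and provides two groups of access ports, supporting statistical access for at most two groups of statistical counting items;
CFM, configured to receive the configuration commands sent by the Host CPU, access the corresponding registers and write the corresponding configuration items. It also contains a DMA module (not shown in the figure) that provides a path for the Host CPU to read statistical counter values quickly. In this embodiment, PCIe may be used as the CPU access path to increase the DMA access rate.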
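FIG. 7 shows the internal architecture of the HIF and also illustrates the data flow through the HIF. The HIF includes:
Serdes conversion submodule, configured to receive the high-speed serial bit stream sent by the NP/SA and perform serial-to-parallel conversion. The number of Serdes links is not fixed and can be chosen according to the bandwidth required by the actual application.
Interlaken protocol submodule, configured to encapsulate the parallel data sent by the Serdes conversion submodule into the statistical message packet format according to the Interlaken protocol, an interconnect protocol optimized for high-bandwidth and reliable packet transmission, and additionally to perform link detection and protection.
It should be noted that the standard Interlaken data message is shown in Table 2 and includes the data valid flag pkt_ena, the packet header pkt_sop, the packet tail pkt_eop, the error flag pkt_err, the packet data pkt_dat, and so on. In this embodiment, the Interlaken packet length is 3 beats and each beat is 1024 bits wide; actual use is not limited to this illustrative scenario.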
Figure PCTCN2014087236-appb-000002
Table 2
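FIG. 8 shows the internal architecture of the UPK and also illustrates the data flow through the UPK. The UPK includes:
Buffer submodule, configured to receive request packets in Interlaken format from the HIF, discard erroneous packets, and concatenate all relevant information of each valid packet and store it in a buffer FIFO. One packet contains multiple statistical message slices, whereas the subsequent processing modules and submodules use the statistical message slice as their smallest processing unit, so the two sides run at different rates; the buffer submodule absorbs this rate difference.
Parsing submodule, which takes one packet out of the buffer submodule, cuts it into multiple statistical message slices according to the slice width, and sends the valid slices, as indicated by the slice valid flag, to the subsequent processing modules and submodules one slice at a time. A new packet is taken from the buffer submodule only after all valid slices of the current packet have been sent. Each slice is parsed according to the specified statistical message slice format to obtain the ID number that distinguishes different statistical services, the number of statistical queues (Queue) supported by the service, the statistical increment, the statistical items (Type) supported by the service, and so on, which gives subsequent units the information they need to access the SRAM blocks and calculate statistical values. In this embodiment, the content of a statistical message slice is as shown in Table 2 above. Table 2 is a standard message format given only as an example; the format is not limited to it and can be extended on this basis.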
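As shown in Table 2, the statistical message slice is 88 bits wide and contains two groups of statistical service items of 44 bits each, independent of each other. The two groups use the same field format, which contains the following fields:
Vld: service valid flag;
ID: service ID, which distinguishes different statistical service items; at most 8 different statistical service items are supported, for example TM enqueue statistics, TM dequeue statistics and OAM statistics;
Qnum: statistical queue number; at most 1M statistical queues are supported. For example, if TM needs to support 512K statistical queues, the low 19 bits of this field represent the queue number and the most significant bit is 0;
Len: statistical increment; the maximum supported increment is 32K. For example, if TM statistics need to count message length, the message length can be placed in this field;
Type: the statistical item number to be counted within one statistical queue under the ID; at most 16 statistical items are supported.
For example, service ID 0 corresponds to TM enqueue statistics and needs to support 512K queues. The low 19 bits of Qnum hold the queue number and the most significant bit is fixed at 0. Each queue needs to support 11 statistical items, including normal enqueue, TD drop, disabled drop and WRED/GRED priority 0 to 7 drops, which correspond to Type numbers 0 to 10. Each packet supports message length statistics, with the message length carried in the Len field.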
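It should be noted that the PRE is configured to receive the parsed content from the UPK unit and convert the statistical request into an SRAM address and a calculation increment. Specifically, the SRAM address of the target counter can be calculated from the ID number, the Qnum and Type fields, and the statistical rules corresponding to configuration information preset by the user. The calculation flow is shown in FIG. 9 and includes:
Step 901: query the user configuration register according to the ID number of the received statistical request, and obtain from the register the start address in memory of the service corresponding to that ID, the counting mode of the service (packet count only, both packet count and packet length, or packet length only) and other information needed to calculate the counter address;
Step 902: query the user configuration register according to the Type number, and obtain from the register the offset address, relative to the ID base address, of the statistical item corresponding to that Type in memory;
Step 903: calculate the memory address from Qnum and the configuration information obtained above.
Specifically, in this embodiment, ID number 0 indicates TM enqueue statistics, its start address in memory is base_addr_id0, and the counting mode is packet count only. Type number 0 indicates normal enqueue, with offset address base_addr_type0. The SRAM data width is 100 bits and one counter is 50 bits wide, so one SRAM address can hold two counters. The SRAM address of the statistical item with queue number Qnum is therefore: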
Addr=base_addr_id0+base_addr_type0+Qnum/2
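If Qnum is odd, the counter occupies the low 50 bits of the 100 bits at Addr; otherwise it occupies the high 50 bits.
If both the packet count and the packet length are recorded for a statistical item, the item corresponds to two counters and needs 100 bits of storage, that is, one full SRAM address. In this case the SRAM address of the statistical item with queue number Qnum is: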
Addr=base_addr_id0+base_addr_type0+Qnum
The high 50 bits hold the packet count counter and the low 50 bits hold the packet length counter.
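FIG. 10 is a schematic diagram of the internal architecture of the STAT. The STAT interacts with the PRE, and FIG. 10 also illustrates the data flow implemented by the pipeline architecture of the STAT. The STAT includes:
MUX, which receives statistical requests from the PRE and DMA read access requests from the CFM, and selects which command to respond to according to the user-configured priority;
Cache. Because the SST is implemented internally with a pipeline architecture, SRAM access latency can cause problems: when multiple statistical request packets point to the same group of counters, the calculation information may not be updated in time and the count value becomes inaccurate. To address this, the design of this embodiment includes the Cache, an instruction caching module that buffers the SRAM access address, the access type flag, the calculation increment and similar information;
ALU, which receives the statistical request or DMA read access request scheduled by the MUX and compares its SRAM access address with the addresses buffered in the Cache. If no match is found, it sends a read access request directly to the MEM and writes the address information into the Cache. Otherwise, it merges the new access request with the matching entry in the Cache, merges the counter increment it carries as well, and updates the Cache contents. After the MEM returns the read data, the ALU combines the calculation increment with the returned data according to the counting rule provided by the PRE to obtain the new count value, then sends a write command request to the MEM to write the count value into it. If the command is a DMA read access request, the counter value is returned to the CFM; the data written to the MEM is then determined by the read-clear mode: in read-clear mode the value 0 is written to the MEM, otherwise the original value is written.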
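If the integrated modules described in the embodiments of the present invention are implemented as software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc and other media that can store program code. Accordingly, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present invention further provides a computer storage medium storing a computer program, the computer program being used to execute the statistical counting implementation method of the embodiments of the present invention.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Industrial applicability
With the embodiments of the present invention, the statistical counting device is arranged outside the network processing chip and has a built-in storage unit such as SRAM. Together with a series of units inside the device, it can complete counting operations independently and return the results directly to the built-in storage unit for storage. This device architecture not only implements statistical counting at high speed but also avoids the increased manufacturing cost of the network processing chip and the limited access bandwidth caused by the prior-art architectures described above.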

Claims (21)

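  1. A statistical counting device, the statistical counting device being arranged outside a network processing chip, the device comprising:
    a receiving unit, configured to receive a statistical request sent by the network processing chip;
    a parsing unit, configured to parse the statistical request to obtain the type of the statistical request and the increment of the statistical request;
    a statistics unit, configured to convert, according to a preset configuration, the type of the statistical request and the increment of the statistical request into an address of a storage unit and a data calculation increment; send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment; and write the resulting statistical result into the corresponding storage unit;
    a storage unit, configured to store data, receive the read data request, respond to the read data request and return the read data to the statistics unit.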
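  2. The device according to claim 1, wherein the statistical counting device further comprises:
    a configuration unit, configured to receive an access request sent by a host CPU, respond to the access request, obtain the statistical result from the storage unit via the statistics unit, and provide it to the host CPU for use.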
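  3. The device according to claim 1, wherein the receiving unit further comprises:
    a high-speed interface module, configured to receive the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface in conjunction with the high-speed transmission protocol Interlaken.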
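  4. The device according to claim 3, wherein the high-speed interface module further comprises:
    a Serdes conversion submodule, configured to perform serial-to-parallel conversion of high-speed data and convert the statistical request from serial data into parallel data for transmission;
    an Interlaken protocol submodule, configured to encapsulate the parallel data transmitted by the Serdes conversion submodule, based on the Interlaken, into a request packet in Interlaken format.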
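  5. The device according to claim 1, wherein the parsing unit further comprises:
    a message parsing module, configured to parse the statistical request according to an agreed format to obtain the type of the statistical request and the increment of the statistical request;
    the statistical request is obtained by the network processing chip through encapsulation according to the agreed format;
    the agreed format is a format encapsulated based on the Interlaken, and its basic unit is a statistical message slice.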
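  6. The device according to claim 5, wherein the receiving unit further comprises:
    a high-speed interface module that receives the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface in conjunction with the high-speed transmission protocol Interlaken;
    the high-speed interface module further comprises:
    a Serdes conversion submodule, configured to perform serial-to-parallel conversion of high-speed data and convert the statistical request from serial data into parallel data for transmission;
    an Interlaken protocol submodule, configured to encapsulate the parallel data transmitted by the Serdes conversion submodule, based on the Interlaken, into a request packet in Interlaken format;
    any one of the request packets in Interlaken format comprises multiple statistical message slices;
    correspondingly, the parsing unit further comprises:
    a parsing submodule, configured to obtain any one of the request packets in Interlaken format, read the request packet according to the valid flag bit of the statistical message slices, and, when the valid flag bit matches, parse the statistical message slice thus obtained, until all statistical message slices contained in the request packet have been read out and parsed.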
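  7. The device according to claim 6, wherein the parsing unit further comprises:
    a buffer submodule, configured to store all request packets to be parsed;
    the parsing submodule is further configured to, when reading the currently read request packet according to the valid flag bit of the statistical message slices to obtain the statistical message slices, wait until all statistical message slices in the currently read request packet have been processed before fetching the next request packet from the buffer submodule;
    and, the type of the statistical request obtained by parsing the statistical message slice comprises: an ID number distinguishing different statistical services, the number of statistical queues supported by the statistical service, and the statistical items supported by the statistical service.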
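  8. The device according to claim 1, wherein the statistics unit further comprises:
    a statistical preprocessing module, configured to convert, according to the preset configuration, the type of the statistical request and the increment of the statistical request into the address of the storage unit and the data calculation increment;
    a statistical calculation module, configured to send a read data request to the corresponding storage unit according to the address of the storage unit; perform a statistical counting operation on the read data returned by the storage unit and the data calculation increment; and write the resulting statistical result into the corresponding storage unit.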
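  9. The device according to claim 8, wherein the receiving unit further comprises:
    a high-speed interface module that receives the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface in conjunction with the high-speed transmission protocol Interlaken;
    the high-speed interface module further comprises:
    a Serdes conversion submodule, configured to perform serial-to-parallel conversion of high-speed data and convert the statistical request from serial data into parallel data for transmission;
    an Interlaken protocol submodule, configured to encapsulate the parallel data transmitted by the Serdes conversion submodule, based on the Interlaken, into a request packet in Interlaken format;
    any one of the request packets in Interlaken format comprises multiple statistical message slices;
    the parsing unit further comprises:
    a parsing submodule, configured to obtain any one of the request packets in Interlaken format, read the request packet according to the valid flag bit of the statistical message slices, and, when the valid flag bit matches, parse the statistical message slice thus obtained, wherein the type of the statistical request obtained by parsing comprises: an ID number distinguishing different statistical services, the number of statistical queues supported by the statistical service, and the statistical items supported by the statistical service; and the increment of the statistical request obtained by parsing comprises a statistical increment;
    correspondingly, the statistical preprocessing module further comprises:
    a preprocessing submodule, configured to obtain the ID number distinguishing different statistical services, the number of statistical queues supported by the statistical service, the statistical items supported by the statistical service, the statistical increment and the preset configuration;
    and, based on statistical rules derived from the preset configuration, obtain configuration information according to the ID number distinguishing different statistical services, obtain the corresponding storage unit base address according to the statistical items supported by the statistical service, calculate a storage unit target address from the storage unit base address, the number of statistical queues supported by the statistical service and the configuration information, and use the storage unit target address as the address of the storage unit to look up the corresponding storage unit; and, based on the statistical rules derived from the preset configuration, obtain the data calculation increment according to the statistical increment.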
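  10. The device according to claim 9, wherein the statistical calculation module further comprises:
    an instruction selection submodule, configured to select, according to a preset scheduling rule, either the statistical request sent by the network processing chip or the access request sent by the host CPU to respond to, and to perform the corresponding statistical counting processing or statistical result access processing;
    an instruction cache submodule, configured to buffer instructions and fetch the next instruction only after the current instruction has finished executing, the instructions comprising at least one of: an instruction that sends a read data request after the corresponding storage unit has been located according to the address of the storage unit, a write-back instruction that writes the resulting statistical result into the corresponding storage unit, and an instruction that provides the statistical result for access by the host CPU;
    a calculation submodule, configured to locate the corresponding storage unit according to the address of the storage unit and, after the read data request instruction has been issued, perform a statistical counting operation on the returned read data and the statistical increment.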
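  11. A statistical counting implementation method, the method comprising:
    receiving, by a statistical counting device, a statistical request sent by a network processing chip, the statistical counting device being arranged outside the network processing chip;
    parsing, by the statistical counting device, the statistical request to obtain the type of the statistical request and the increment of the statistical request;
    converting, by the statistical counting device according to a preset configuration, the type of the statistical request and the increment of the statistical request into an address of a built-in memory and a data calculation increment; sending, by the statistical counting device, a read data request to the corresponding memory according to the address of the memory, performing a statistical counting operation on the read data returned by the memory and the data calculation increment, and writing the resulting statistical result into the corresponding memory.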
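  12. The method according to claim 11, wherein the method further comprises:
    receiving, by the statistical counting device, an access request sent by a host CPU;
    responding to the access request, obtaining the statistical result from the storage unit via the statistics unit, and providing it to the host CPU for use.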
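  13. The method according to claim 11, wherein receiving, by the statistical counting device, the statistical request sent by the network processing chip comprises:
    receiving, by the statistical counting device, the statistical request over a physical link formed by a high-speed serializer/deserializer (Serdes) interface in conjunction with the high-speed transmission protocol Interlaken.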
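  14. The method according to claim 13, wherein the method further comprises:
    converting, by the statistical counting device, the statistical request from serial data into parallel data for transmission;
    encapsulating, by the statistical counting device, the transmitted parallel data, based on the Interlaken, into a request packet in Interlaken format.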
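  15. The method according to claim 14, wherein the statistical request is a request packet in Interlaken format obtained by the network processing chip through encapsulation according to an agreed format;
    the method further comprises: parsing, by the statistical counting device, the statistical request according to the agreed format to obtain the type of the statistical request and the increment of the statistical request.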
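  16. The method according to claim 15, wherein any one of the request packets in Interlaken format comprises multiple statistical message slices;
    the method further comprises:
    obtaining any one of the request packets in Interlaken format, reading the request packet according to the valid flag bit of the statistical message slices, and, when the valid flag bit matches, parsing the statistical message slice thus obtained, until all statistical message slices contained in the request packet have been read out and parsed.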
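  17. The method according to claim 15, wherein any one of the request packets in Interlaken format comprises multiple statistical message slices;
    the method further comprises:
    storing all request packets to be parsed in a buffer;
    reading the currently read request packet according to the valid flag bit of the statistical message slices to obtain the statistical message slices, and waiting until all statistical message slices in the currently read request packet have been processed before fetching the next request packet from the buffer.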
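  18. The method according to claim 16 or 17, wherein the type of the statistical request comprises: an ID number distinguishing different statistical services, the number of statistical queues supported by the statistical service, and the statistical items supported by the statistical service;
    the increment of the statistical request comprises a statistical increment.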
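  19. The method according to claim 18, wherein converting, by the statistical counting device according to the preset configuration, the type of the statistical request and the increment of the statistical request into the address of the built-in memory and the data calculation increment comprises:
    obtaining the ID number distinguishing different statistical services, the number of statistical queues supported by the statistical service, the statistical items supported by the statistical service, and the preset configuration;
    based on statistical rules derived from the preset configuration, obtaining configuration information according to the ID number distinguishing different statistical services, obtaining the corresponding memory base address according to the statistical items supported by the statistical service, and calculating a memory target address from the memory base address, the number of statistical queues supported by the statistical service and the configuration information, the memory target address serving as the address of the built-in memory;
    obtaining the statistical increment and, based on the statistical rules derived from the preset configuration, obtaining the data calculation increment.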
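  20. The method according to claim 19, wherein sending, by the statistical counting device, a read data request to the corresponding memory according to the address of the memory, performing a statistical counting operation on the read data returned by the memory and the statistical increment, and writing the resulting statistical result into the corresponding memory comprises:
    locating, by the statistical counting device, the corresponding memory according to the address of the built-in memory and, after the read data request instruction has been issued, performing a statistical counting operation on the returned read data and the data calculation increment.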
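  21. A system having a statistical counting device, the system comprising a statistical counting device, and further comprising any one of a network processing chip and a host CPU;
    the network processing chip is configured to send a statistical request to the statistical counting device;
    the host CPU is configured to send an access request to the statistical counting device;
    the statistical counting device is the statistical counting device according to any one of claims 1 to 10.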
PCT/CN2014/087236 2014-06-05 2014-09-23 Statistical counting device and implementation method thereof, and system having a statistical counting device WO2015184706A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410248238.6 2014-06-05
CN201410248238.6A CN105207794B (zh) 2014-06-05 2014-06-05 Statistical counting device and implementation method thereof, and system having a statistical counting device

Publications (1)

Publication Number Publication Date
WO2015184706A1 true WO2015184706A1 (zh) 2015-12-10

Family

ID=54765996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087236 WO2015184706A1 (zh) 2014-06-05 2014-09-23 Statistical counting device and implementation method thereof, and system having a statistical counting device

Country Status (2)

Country Link
CN (1) CN105207794B (zh)
WO (1) WO2015184706A1 (zh)

Also Published As

Publication number Publication date
CN105207794A (zh) 2015-12-30
CN105207794B (zh) 2019-11-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14894110

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14894110

Country of ref document: EP

Kind code of ref document: A1