CN116414344A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN116414344A
CN116414344A (application CN202310460612.8A)
Authority
CN
China
Prior art keywords
data
buffer space
processed
annular buffer
queues
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310460612.8A
Other languages
Chinese (zh)
Inventor
袁暾
郭旭晨
王梓谦
马亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tianlang Defense Technology Co ltd
Original Assignee
Nanjing Tianlang Defense Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tianlang Defense Technology Co ltd filed Critical Nanjing Tianlang Defense Technology Co ltd
Priority to CN202310460612.8A priority Critical patent/CN116414344A/en
Publication of CN116414344A publication Critical patent/CN116414344A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F5/065Partitioned buffers, e.g. allowing multiple independent queues, bidirectional FIFO's
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G06F5/08Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor having a sequence of storage locations, the intermediate ones not being accessible for either enqueue or dequeue operations, e.g. using a shift register
    • G06F5/085Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor having a sequence of storage locations, the intermediate ones not being accessible for either enqueue or dequeue operations, e.g. using a shift register in which the data is recirculated
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a data processing method and apparatus applied to a NUMA architecture. The data processing method comprises the following steps: a receiving node distributes data to be processed to a plurality of intermediate nodes by polling; each intermediate node takes the data to be processed out of its first ring buffer queue and processes it with a data processing link; each intermediate node sends the processed data to its second ring buffer queue; a sending node takes the data out of all the second ring buffer queues, sorts it, and sends it to the next-stage system. Because a complete data processing link is deployed on every node, each node can independently complete the data processing flow; in addition, the ring buffer queues enable polled reception and transmission of data, so data is processed efficiently and with high real-time performance, and processor resources are utilized to the greatest extent.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus.
Background
Referring to fig. 1, fig. 1 shows a conventional data processing architecture. In the conventional signal processing architecture, the functional modules of a signal processing link are grouped into M-1 combinations according to the throughput of each module, and the combinations are deployed on M-1 nodes in processing order. During signal processing, each node receives the result of the previous node, processes it locally, and passes its result to the next node.
This processing mode resembles a pipeline: each node repeatedly performs the same task and hands the result to the next node. Although it can make full use of processor resources, the processing speed is limited, because a node can start only after the previous node has finished its data processing; when one node takes too long, all downstream nodes wait idle for a long time, so the overall processing efficiency is low.
Disclosure of Invention
To solve these problems, the invention provides a data processing method and apparatus with high processing efficiency and strong scalability.
To achieve the above object, one aspect of the present invention provides a data processing method applied to a NUMA architecture, comprising:
a receiving node distributes data to be processed to a plurality of intermediate nodes by polling; each intermediate node is provided with a first ring buffer queue, a second ring buffer queue and a complete data processing link;
each intermediate node takes the data to be processed out of its first ring buffer queue and processes it with the data processing link;
each intermediate node sends the processed data to its second ring buffer queue;
and a sending node takes the data out of all the second ring buffer queues, sorts it, and sends it to the next-stage system.
As a preferred solution, the first ring buffer queue and the second ring buffer queue each comprise a plurality of reusable memory slots.
As a preferred solution, the receiving node distributing the data to be processed to a plurality of intermediate nodes by polling further includes:
the receiving node fills the data to be processed into the memory slots of each first ring buffer queue in turn by polling;
after a frame of data is placed into the current memory slot, the white pointer (the write pointer) of the first ring buffer queue is advanced to the position of the next memory slot.
As a preferred solution, the intermediate node taking the data to be processed out of its first ring buffer queue further includes:
the intermediate node takes the data to be processed out of the memory slots of its first ring buffer queue in sequence;
after the data stored in the current memory slot has been taken out, the red pointer (the read pointer) of the first ring buffer queue is advanced to the position of the next memory slot.
As a preferred solution, before the intermediate node takes the data to be processed out of its first ring buffer queue, the method further includes: the intermediate node detects in real time whether available data exists in the memory slots of its first ring buffer queue.
As a preferred solution, the intermediate node detecting in real time whether available data exists in the memory slots of its first ring buffer queue further includes:
the intermediate node detects the positions of the red pointer and the white pointer of its first ring buffer queue in real time;
when the red pointer and the white pointer coincide, no data is available in the first ring buffer queue;
when the number of memory slots separating the red pointer from the white pointer equals the total number of slots minus one (e.g., 7 slots apart for N = 8), the first ring buffer queue is full of available data.
As a preferred solution, the intermediate node sending the processed data to its second ring buffer queue further includes:
the intermediate node puts the processed data into the memory slots of its second ring buffer queue in sequence;
after the processed data is placed into the current memory slot, the white pointer of the second ring buffer queue is advanced to the position of the next memory slot.
As a preferred solution, the sending node taking the data out of all the second ring buffer queues, sorting it, and sending it to the next-stage system includes: the sending node selects the data with the smallest sequence number each time and sends it to the next-stage system.
In another aspect, the present invention also provides a data processing apparatus, comprising:
a receiving unit for distributing data to be processed to a plurality of intermediate nodes by polling, each intermediate node being provided with a first ring buffer queue, a second ring buffer queue and a complete data processing link;
a processing unit for taking the data to be processed out of each first ring buffer queue and processing it with the data processing link;
a first sending unit for sending the processed data to each second ring buffer queue;
and a second sending unit for taking the data out of all the second ring buffer queues, sorting it, and sending it to the next-stage system.
Compared with the prior art, the invention has the following effects. Because a complete data processing link is deployed on every node, each node can independently complete the data processing flow. The ring buffer queues enable polled reception and transmission of data, so data processing is efficient and highly real-time: unlike the traditional data processing architecture, a node does not wait for the previous node to finish, and processor resources are utilized to the greatest extent. In addition, the method is highly scalable: the number of intermediate nodes can be configured according to the data volume, improving processing efficiency.
Drawings
FIG. 1 is a prior art architecture diagram for data processing;
FIG. 2 is a schematic diagram of an FT200 architecture according to an embodiment of the present invention;
FIG. 3 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 4 is a diagram of a ring buffer queue according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the polling distribution of data to be processed in an embodiment of the invention;
FIG. 6 is a schematic diagram of ordering before sending data according to an embodiment of the invention;
fig. 7 is a block diagram of a data processing apparatus according to an embodiment of the present invention.
Description of the embodiments
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
This embodiment provides a data processing method based on a NUMA architecture; the FT200 architecture is used as the example. As shown in fig. 2, the FT200 is a typical NUMA architecture. The platform integrates 64 processor cores divided into 8 Panels; each Panel has 2 Clusters, and each Cluster contains 4 processor cores sharing a second-level cache, which is logically equivalent to an SMP system. Each Panel also contains two local Directory Control Units (DCUs), a network-on-chip router node (Cell), and a tightly coupled Memory Controller (MCU). The Panels are interconnected through on-chip network interfaces, and coherence-maintenance messages, data messages, debug messages, interrupt messages, and the like are uniformly routed over the same set of network interfaces.
In this platform, the whole storage space is divided into 8 large spaces according to the affinity of the different Panels and Clusters to storage; each large space corresponds to the nearest Panel, and each large space is further divided into 2 subspaces, one for each Cluster. Task deployment and scheduling can fully exploit these characteristics: the structure supports mapping several threads with high mutual affinity onto the same Panel, which reduces global communication between threads; combined with the on-chip data movement and migration mechanism, global communication latency and energy efficiency can be optimized further.
To meet the access-bandwidth and latency requirements of a many-core processor, the chip implements a hierarchical on-chip storage architecture and a hierarchical network structure, supporting high-speed on-chip caches and large-capacity storage. Tasks with frequent communication and large data-synchronization volume use a low-latency, high-bandwidth interconnect together with local private caches; tasks with infrequent communication use a more scalable, longer-latency interconnect together with distributed shared caches; applications that require cross-Panel access are placed in the nearest possible Panel. Directory control and storage are distributed, with a directory controller and storage in each Panel, which maximizes parallel maintenance and access of the coherence protocol. Different memory capacities are supported through a flexible address-mapping mode: in affinity mode, the Directory Control Unit (DCU) in a Panel accesses only the local Memory Controller (MCU), memory channels between Panels do not affect each other, and latency is minimal while bandwidth is maximal; in partial mode, a DCU can access any MCU according to the configuration, allowing the system to be configured with different numbers of DDR channels.
In this embodiment, following the data-affinity characteristics of this multi-core processor architecture, one Cluster of the platform is treated as an SMP system: the Cluster shares a cache, and the Panel where it resides mounts a large DDR space for its use, i.e., the Cluster has its own cache and memory. One Cluster is therefore equivalent to a small homogeneous multi-core CPU.
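Although the patent itself contains no code, this Cluster-as-node deployment relies on binding a worker's threads to the cores of one Cluster. A minimal Linux sketch of such pinning is shown below; it is illustrative only, and the core numbering (Cluster k owning cores 4k to 4k+3) is an assumption, not something the FT200 specifies.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Assumption: Cluster k owns cores 4k .. 4k+3 (hypothetical numbering). */
static int pin_to_cluster(pthread_t thread, int cluster)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core = 4 * cluster; core < 4 * (cluster + 1); core++)
        CPU_SET(core, &set);
    /* Returns 0 on success, an error number otherwise. */
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}

int main(void)
{
    int err = pin_to_cluster(pthread_self(), 0);  /* bind to Cluster-0 */
    if (err != 0)
        fprintf(stderr, "pinning failed: error %d\n", err);
    else
        printf("thread bound to the 4 cores of Cluster-0\n");
    return 0;
}
```

Compiled with -pthread, this binds the calling thread to one Cluster so its cache and local DDR stay close to the work, which is the affinity property the embodiment exploits.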
As shown in fig. 3, based on the above FT200 platform, the data processing method provided in this embodiment includes the following steps.
s10: the receiving node distributes the data poll to be processed to a plurality of intermediate nodes;
It should be noted that a node in this embodiment refers to a Cluster of the FT200 platform. For example, Cluster-0 of the platform is deployed as the receiving node, i.e., Cluster-0 receives the data and distributes it to the intermediate nodes by polling; a first ring buffer queue, a second ring buffer queue, and a complete data processing link are deployed on each intermediate node. The first ring buffer queue stores input data, and the second ring buffer queue stores output data.
As shown in fig. 4, a ring buffer queue contains N memory slots in total, which are reused. The slot size depends on the size of one frame of input data: each slot must provide more storage than one input frame occupies, and the slots are sized according to the input data size before data processing begins.
The polling distribution works as shown in fig. 5: the receiving node fills the data to be processed into the first ring buffer queue of each intermediate node in turn, and points the white pointer (the write pointer) of that queue to the next position, where "position" means the position of a memory slot. That is, after a frame is placed in the current memory slot, the white pointer points to the start of the next slot, ready for storing the next frame.
In addition, while data is being put into the queue, the intermediate node can detect in real time whether available data exists in its first ring buffer queue; as soon as available data arrives, the intermediate node immediately takes it out for processing and points the red pointer (the read pointer) of the queue to the next slot.
When the red pointer and the white pointer coincide, the first ring buffer queue is empty: it is in a waiting state until the receiving node fills in data. When the red and white pointers are N-1 memory slots apart, the first ring buffer queue is full of available data: the receiving node cannot continue filling the queue and must wait for the intermediate node to read data and free a slot.
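The ring buffer mechanics described above (N reusable slots, a white/write pointer, a red/read pointer, empty when the pointers coincide, full when they are N-1 slots apart) can be sketched as a single-producer/single-consumer queue. The sketch below is not code from the patent: names such as ring_t, SLOTS and FRAME_MAX are assumptions, and production code on a multi-core platform would additionally need atomics or memory barriers on the two indices.

```c
#include <stdbool.h>
#include <string.h>

#define SLOTS      8      /* N reusable memory slots             */
#define FRAME_MAX  4096   /* assumed upper bound per input frame */

typedef struct {
    unsigned char slot[SLOTS][FRAME_MAX];
    size_t        len[SLOTS];
    volatile unsigned white;  /* write pointer: next slot to fill  */
    volatile unsigned red;    /* read pointer: next slot to drain  */
} ring_t;

static bool ring_empty(const ring_t *q) { return q->white == q->red; }

/* Full when white is N-1 slots ahead of red: one slot stays free. */
static bool ring_full(const ring_t *q)
{
    return (q->white + 1) % SLOTS == q->red;
}

static bool ring_put(ring_t *q, const void *frame, size_t n)
{
    if (ring_full(q) || n > FRAME_MAX)
        return false;                       /* producer must wait        */
    memcpy(q->slot[q->white], frame, n);
    q->len[q->white] = n;
    q->white = (q->white + 1) % SLOTS;      /* advance the white pointer */
    return true;
}

static size_t ring_get(ring_t *q, void *out)
{
    if (ring_empty(q))
        return 0;                           /* consumer must wait        */
    size_t n = q->len[q->red];
    memcpy(out, q->slot[q->red], n);
    q->red = (q->red + 1) % SLOTS;          /* advance the red pointer   */
    return n;
}

int main(void)
{
    static ring_t q;                        /* zero-initialized: empty  */
    unsigned char frame[16] = "frame-0";
    unsigned char out[FRAME_MAX];
    ring_put(&q, frame, sizeof frame);
    return ring_get(&q, out) == sizeof frame ? 0 : 1;
}
```

The polling distribution of S10 then reduces to cycling over the intermediate nodes' queues, e.g. ring_put(&queue[frame_no % NUM_NODES], frame, n).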
S20: each intermediate node takes the data to be processed out of its first ring buffer queue and processes it with the data processing link;
Specifically, once data has been placed in the memory of a first ring buffer queue, the downstream nodes can process it. In this embodiment several parallel intermediate nodes process data, and a complete data processing link is deployed on each of them; the functional modules of such a link include, for example, FFT, MTD, CFAR, Capon and EKF. Because the whole data processing link is deployed on one node, one frame of data can be processed entirely on that node. The FT200 architecture contains 16 Clusters; with 1 Cluster receiving data and 1 Cluster sending results, the intermediate nodes can be expanded to at most 14, so up to 14 frames of data can be processed simultaneously and processing efficiency is correspondingly higher. For example, if the signal processing link deployed on an intermediate node needs 20 ms to process 1 frame, the real-time requirement of the signal processing system is 5 ms, and 20% system-resource redundancy is required, then expanding to only 5 intermediate nodes already meets the system requirement.
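The node count in this example follows from simple throughput arithmetic (a worked reading of the figures above, not an extra formula from the patent): with a 20 ms processing time per frame and a 5 ms real-time deadline, at least 20 / 5 = 4 frames must be in flight concurrently; adding the required 20% resource redundancy gives 4 × 1.2 = 4.8, which rounds up to 5 intermediate nodes.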
S30: each intermediate node sends the processed data to its second ring buffer queue;
It should be noted that after an intermediate node finishes processing data, it puts the results in sequence into its own second ring buffer queue; the rules are the same as for putting input data into the first ring buffer queue and are not repeated here.
S40: the sending node takes the data out of all the second ring buffer queues, sorts it, and sends it to the next-stage system.
Specifically, as shown in fig. 6, the sending node takes the processing results of the signal processing links out of the second ring buffer queue of each intermediate processing node, sorts them by data sequence number, and each time selects the data with the smallest sequence number to send to the next-stage system.
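As an illustration of this reordering step (not code from the patent; result_t, NUM_NODES and the valid flag are hypothetical names), the sending node can scan the head of each intermediate node's output queue and pick the smallest sequence number:

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_NODES 6

typedef struct {
    uint32_t seq;    /* frame sequence number assigned at reception */
    int      valid;  /* is a head-of-queue result available?        */
    /* ... payload ... */
} result_t;

/* Pick the node whose head-of-queue result has the smallest sequence
 * number; returns -1 when no node currently has a result available. */
static int pick_min_seq(const result_t head[NUM_NODES])
{
    int best = -1;
    for (int i = 0; i < NUM_NODES; i++) {
        if (head[i].valid && (best < 0 || head[i].seq < head[best].seq))
            best = i;
    }
    return best;
}

int main(void)
{
    result_t head[NUM_NODES] = {
        { .seq = 7, .valid = 1 }, { .seq = 5, .valid = 1 },
        { .seq = 6, .valid = 1 }, { .valid = 0 },
        { .seq = 9, .valid = 1 }, { .seq = 8, .valid = 1 },
    };
    int i = pick_min_seq(head);
    if (i >= 0)
        printf("send frame %u from node %d first\n", head[i].seq, i);
    return 0;
}
```

Repeating this selection as queue heads refill restores the original frame order before the data leaves for the next-stage system.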
In addition, in order to verify the technical effects of the present invention, the present embodiment provides the following test data:
test platform: the domestic multi-core processing platform FT2000 platform.
Test data amount: 528K complex floating-point data.
Polling signal processing architecture of this experiment:
1 data receiving node, 1 data sending node and 6 intermediate processing nodes, occupying 32 processor cores in total.
One intermediate processing node takes about 23 ms to process 1 frame of data, and the 6 intermediate processing nodes process 6 frames simultaneously; in this scenario the effective per-frame interval is about 23 ms / 6, so the signal processing real-time performance reaches 3.9 ms.
Traditional signal processing architecture:
1 data receiving node, 1 data sending node, and a signal processing link split into 6 functional-module combinations according to the data throughput of each module and deployed on 6 intermediate processing nodes as illustrated in fig. 1, occupying 32 processor cores in total.
After the receiving node receives the data, the data is passed in sequence through the nodes hosting the functional modules; the intermediate nodes take about 21 ms in total to process 1 frame of data.
Table 1 Comparison of processing effects of different architectures

Data volume                  | Processing method of this embodiment | Conventional processing method
528K complex floating-point  | 3.9 ms                               | 21 ms
The experimental results show that, while occupying the same computing and storage resources, the data processing method provided by this embodiment improves signal processing efficiency and system resource utilization.
Referring to fig. 7, this embodiment further provides a data processing apparatus, comprising:
a receiving unit 100 for distributing data to be processed to a plurality of intermediate nodes by polling, each intermediate node being provided with a first ring buffer queue, a second ring buffer queue and a complete data processing link; since the specific receiving manner and principle are described in detail in step S10 of the data processing method of the above embodiment, they are not repeated here;
a processing unit 200 for taking the data to be processed out of each first ring buffer queue and processing it with the data processing link; since the specific processing manner and principle are described in detail in step S20, they are not repeated here;
a first sending unit 300 for sending the processed data to each second ring buffer queue; since the specific sending manner and principle are described in detail in step S30, they are not repeated here;
and a second sending unit 400 for taking the data out of all the second ring buffer queues, sorting it, and sending it to the next-stage system; since the specific sending manner and principle are described in detail in step S40, they are not repeated here.
Compared with the polling architecture, the traditional signal processing architecture splits a complete signal processing link into M-1 sets of functional modules deployed on M-1 processing nodes; the polling signal processing architecture instead deploys a complete signal processing link on each intermediate node, generating several identical processing branches, so that multiple frames of data can be processed simultaneously.
Moreover, a signal processing program running on one Cluster avoids the cross-Cluster computing-resource scheduling, data I/O, and similar operations that arise when it runs across multiple Clusters, so the program should run more efficiently on a single Cluster.
In addition, an embodiment of the present invention further provides a computer-readable storage medium that can store a program; when executed, the program performs some or all of the steps of any of the data processing methods described in the above method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied as a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware; the program may be stored in a computer-readable memory, which may include a flash disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.
Exemplary flows for processing data according to embodiments of the present invention are described above with reference to the accompanying drawings. It should be noted that the numerous details in the above description are merely illustrative of the invention, not limiting. In other embodiments of the invention, the method may have more, fewer, or different steps, and the order of, inclusion of, and functional relationships between steps may differ from those described and illustrated.

Claims (10)

1. A data processing method applied to a NUMA architecture, comprising:
a receiving node distributes data to be processed to a plurality of intermediate nodes by polling; each intermediate node is provided with a first ring buffer queue, a second ring buffer queue and a complete data processing link;
each intermediate node takes the data to be processed out of its first ring buffer queue and processes it with the data processing link;
each intermediate node sends the processed data to its second ring buffer queue;
and a sending node takes the data out of all the second ring buffer queues, sorts it, and sends it to the next-stage system.
2. The data processing method according to claim 1, wherein the first ring buffer queue and the second ring buffer queue each comprise a plurality of reusable memory slots.
3. The data processing method according to claim 1, wherein the receiving node distributing the data to be processed to a plurality of intermediate nodes by polling further comprises:
the receiving node fills the data to be processed into the memory slots of each first ring buffer queue in turn by polling;
after a frame of data is placed into the current memory slot, the white pointer of the first ring buffer queue is advanced to the position of the next memory slot.
4. The data processing method according to claim 3, wherein the intermediate node taking the data to be processed out of its first ring buffer queue further comprises:
the intermediate node takes the data to be processed out of the memory slots of its first ring buffer queue in sequence;
after the data stored in the current memory slot has been taken out, the red pointer of the first ring buffer queue is advanced to the position of the next memory slot.
5. The method according to claim 4, wherein before the intermediate node takes the data to be processed out of its first ring buffer queue, the method further comprises:
the intermediate node detects in real time whether available data exists in the memory slots of its first ring buffer queue.
6. The method according to claim 5, wherein the intermediate node detecting in real time whether available data exists in the memory slots of its first ring buffer queue further comprises:
the intermediate node detects the positions of the red pointer and the white pointer of its first ring buffer queue in real time;
when the red pointer and the white pointer coincide, no data is available in the first ring buffer queue;
when the number of memory slots separating the red pointer from the white pointer equals the total number of slots minus one, the first ring buffer queue is full of available data.
7. The data processing method according to claim 2, wherein the intermediate node sending the processed data to its second ring buffer queue further comprises:
the intermediate node puts the processed data into the memory slots of its second ring buffer queue in sequence;
after the processed data is placed into the current memory slot, the white pointer of the second ring buffer queue is advanced to the position of the next memory slot.
8. The data processing method according to claim 7, wherein the sending node taking the data out of all the second ring buffer queues, sorting it, and sending it to the next-stage system comprises:
the sending node selects the data with the smallest sequence number each time and sends it to the next-stage system.
9. A data processing apparatus, comprising:
a receiving unit for distributing data to be processed to a plurality of intermediate nodes by polling, each intermediate node being provided with a first ring buffer queue, a second ring buffer queue and a complete data processing link;
a processing unit for taking the data to be processed out of each first ring buffer queue and processing it with the data processing link;
a first sending unit for sending the processed data to each second ring buffer queue;
and a second sending unit for taking the data out of all the second ring buffer queues, sorting it, and sending it to the next-stage system.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 8.
CN202310460612.8A 2023-04-26 2023-04-26 Data processing method and device Pending CN116414344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310460612.8A CN116414344A (en) 2023-04-26 2023-04-26 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310460612.8A CN116414344A (en) 2023-04-26 2023-04-26 Data processing method and device

Publications (1)

Publication Number Publication Date
CN116414344A true CN116414344A (en) 2023-07-11

Family

ID=87057805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310460612.8A Pending CN116414344A (en) 2023-04-26 2023-04-26 Data processing method and device

Country Status (1)

Country Link
CN (1) CN116414344A (en)

Similar Documents

Publication Publication Date Title
CN107689948B (en) Efficient data access management device applied to neural network hardware acceleration system
WO2020078470A1 (en) Network-on-chip data processing method and device
CN101989942B (en) Arbitration control method, communication method, arbitrator and communication system
CN113807509B (en) Neural network acceleration device, method and communication equipment
CN114564434B (en) General multi-core brain processor, acceleration card and computer equipment
US10496329B2 (en) Methods and apparatus for a unified baseband architecture
CN110347626B (en) Server system
EP3777059B1 (en) Queue in a network switch
CN113010845B (en) Computing device, method and related product for performing matrix multiplication
US12038866B2 (en) Broadcast adapters in a network-on-chip
CN116185641B (en) Fusion architecture system, nonvolatile storage system and storage resource acquisition method
US20220129179A1 (en) Data processing apparatus, data processing system including the same, and operating method thereof
US20230403232A1 (en) Data Transmission System and Method, and Related Device
CN116414344A (en) Data processing method and device
US9930117B2 (en) Matrix vector multiply techniques
CN113159302B (en) Routing structure for reconfigurable neural network processor
US20230259486A1 (en) Neural processing unit synchronization systems and methods
CN115964982A (en) Topological structure of accelerator
US9146848B2 (en) Link training for a serdes link
CN114490465B (en) Data transmission method and device for direct memory access
CN111045965B (en) Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method
CN112565065B (en) Gateway system and processing method based on LORA
US20240201987A1 (en) Neural network hardware acceleration via sequentially connected computation modules
US11934337B2 (en) Chip and multi-chip system as well as electronic device and data transmission method
CN106383791A (en) Memory block combination method and apparatus based on non-uniform memory access architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination