CN109324982B - Data processing method and data processing device

Data processing method and data processing device

Info

Publication number
CN109324982B
CN109324982B (application CN201710640687.9A)
Authority
CN
China
Prior art keywords
data block
data
data processing
inactive
standby
Prior art date
Legal status
Active
Application number
CN201710640687.9A
Other languages
Chinese (zh)
Other versions
CN109324982A (en)
Inventor
Zhang Zhengzheng (张争争)
Jiao Yuanpei (矫渊培)
Current Assignee
Shanghai Huawei Technologies Co Ltd
Original Assignee
Shanghai Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Huawei Technologies Co Ltd filed Critical Shanghai Huawei Technologies Co Ltd
Priority to CN201710640687.9A priority Critical patent/CN109324982B/en
Publication of CN109324982A publication Critical patent/CN109324982A/en
Application granted granted Critical
Publication of CN109324982B publication Critical patent/CN109324982B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Embodiments of this application disclose a data processing method and a data processing device, which can improve data access efficiency and reduce power consumption. The method comprises the following steps: the data processing device receives configuration information from a processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block and comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block; the data processing device divides the to-be-used data block at fine granularity according to its size to determine a number of transmissions; the data processing device fetches the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block; the data processing device stores the to-be-used data block at the destination address of the to-be-used data block.

Description

Data processing method and data processing device
Technical Field
The present application relates to the field of communications, and in particular, to a data processing method and a data processing device.
Background
Most current processors use a cache as local storage to mitigate the severe performance degradation caused by remote accesses to memory. The cache structure has been continuously optimized by exploiting the spatial and temporal locality of data access and by supporting software and hardware prefetching of cache lines, coherent access, and so on, which greatly improves software flexibility and access performance.
An existing method of processing data with a cache works as follows: during a cache access, the data to be accessed is read into local memory and a corresponding flag is set to indicate that the data now resides in local memory; when the data is accessed again, it is already in local memory, so the external storage unit does not need to be accessed again, i.e., high-speed access is achieved.
However, every cache access requires comparing the target address against multiple address tags to determine whether the data to be accessed is valid, so data access efficiency is low. On the one hand this wastes power; on the other hand, when the data to be accessed turns out to be invalid, the hit probability is low and the amount of redundant access is large, which increases power consumption further.
Disclosure of Invention
Embodiments of this application provide a data processing method and a data processing device, which can improve data access efficiency and reduce power consumption.
In view of this, a first aspect of this application provides a data processing method, which may include: when software needs to access a contiguous data block, or a small amount of individual data, in external space, the processor quickly configures the relevant information of the data processing device, such as the source address, destination address, and size of the to-be-used data block and its fetch command format, and after generating this configuration information, sends it to the data processing device; the data processing device divides the to-be-used data block at fine granularity according to its size to determine the number of transmissions; the data processing device then fetches the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block; finally, the data processing device stores the to-be-used data block at its destination address. Because the to-be-used data block is accessed directly, comparison against multiple address tags is avoided, which improves access efficiency and reduces power consumption. In addition, the probability of deterministic access to the to-be-used data block increases, further reducing power consumption. Data access efficiency can therefore be improved while power consumption is reduced.
In some possible implementations, the data processing device determining the number of transmissions from the size of the to-be-used data block may work as follows: the data processing device divides the to-be-used data block at fine granularity according to its size to determine the number of bursts, and then determines the number of transmissions according to the number of bursts, where a burst is a data packet that can contain 512 bytes of data.
In other possible implementations, the data processing device storing the to-be-used data block at its destination address may work as follows: the data processing device configures a number of channels in advance, sends the to-be-used data block to its destination address through the configured channels, and stores the block via that destination address.
In other possible implementations, the destination address of the to-be-used data block is in an external memory and its source address may be in an internal memory; the data processing device storing the block via its destination address may then mean that the data processing device stores the to-be-used data block in the external memory.
In other possible implementations, the destination address of the to-be-used data block is in an internal memory and its source address may be in an external memory; the data processing device storing the block via its destination address may then mean that the data processing device stores the to-be-used data block in the internal memory.
In other possible implementations, after the data processing device receives the configuration information from the processor, the data processing device may cache the configuration information in a circular queue.
In other possible implementations, if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, the data processing device may store the discrete data through a preset cache. By keeping a preset cache, the access flexibility of discrete data is preserved even when the cache specification is reduced, which broadens the range of application scenarios.
A second aspect of this application provides a data processing method, which may include: the processor generates configuration information comprising the source address, destination address, and size of a to-be-used data block and its fetch command format; the processor sends the configuration information to the data processing device so that the device fetches the to-be-used data block according to it.
A third aspect of this application provides a data processing device that can implement the functions of the method provided by the first aspect or any optional implementation of the first aspect; these functions may be implemented by software comprising modules corresponding to the functions, each module being configured to perform its corresponding function.
A fourth aspect of this application provides a processor that can implement the functions of the method provided by the second aspect or any optional implementation of the second aspect; these functions may be implemented by software comprising modules corresponding to the functions, each module being configured to perform its corresponding function.
A fifth aspect of this application provides a computer storage medium storing computer software instructions for use by the data processing device described above, comprising a program designed to perform the functions performed by the data processing device in the above aspects.
A sixth aspect of this application provides a computer storage medium storing computer software instructions for use by the processor described above, comprising a program designed to perform the functions performed by the processor in the above aspects.
From the above technical solutions, it can be seen that the embodiments of this application have the following advantages: the to-be-used data block is accessed directly, so comparison against multiple address tags is avoided, access efficiency is improved, and power consumption is reduced. In addition, the probability of deterministic access to the to-be-used data block is increased and the amount of redundant access is reduced, further lowering power consumption. Data access efficiency can therefore be improved while power consumption is reduced.
Drawings
To illustrate the technical solutions of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a system architecture diagram of a data processing method provided herein;
FIG. 2 is a schematic diagram of an internal structure of an FDFU provided in the present application;
FIG. 3 is a schematic state management diagram of a command buffer management unit according to the present application;
FIG. 4 is a flow chart of a data processing method provided in the present application;
FIG. 5 is a schematic diagram of the internal structure of a bidirectional FDFU provided in the present application;
FIG. 6 is a schematic architecture diagram of the FDFU-with-Cache mechanism provided in the present application;
FIG. 7 is a block diagram of a data processing apparatus provided herein;
FIG. 8 is a block diagram of another data processing apparatus according to the present application.
Detailed Description
The embodiment of the application provides a data processing method and a data processing device, which can improve the data access efficiency and reduce the power consumption.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to FIG. 1, FIG. 1 is a system architecture diagram of the data processing method provided in this application. FIG. 1 includes the following parts: a memory, a Fast Data pre-Fetch Unit (FDFU), and a kernel. By rapidly configuring the FDFU module, fine-grained data movement is achieved. It should be noted that the data processing device in this application may be the FDFU in FIG. 1, and the processor in this application may include the kernel in FIG. 1.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the internal structure of an FDFU provided in this application. An FDFU module may include: a command receiving unit, a command cache management unit, a read prefetch management unit, a read data interaction unit, and a write data interaction unit.
The command receiving unit is mainly responsible for receiving commands from the kernel, allocating an appropriate logical identifier (ID), and returning the ID to the kernel; it also parses each newly received command and fills it into the command cache management unit.
The command cache management unit is mainly responsible for managing the command buffer. It can buffer 16 commands from the kernel and maintain 16 command execution states. The state management mechanism is shown in FIG. 3, in which the command buffer is a circular queue: a first signal (e.g., instr_buffer_head) indicates the head position of the circular queue, and the command it points to is the next command to be executed; a second signal (e.g., instr_buffer_over) indicates the oldest command that is still executing and expected to retire in order, while command IDs themselves may complete out of order; a third signal (e.g., instr_buffer_tail) is the tail address of the circular queue, where the next received command will be written. FIG. 3 shows an example queue state: the first signal points to ID 4, indicating that the command with ID 4 will be handed to command parsing next; the second signal points to ID 0, indicating that the command with ID 0 is still executing (status bit 1), while the commands with IDs 1 and 2 have already completed and their status bits have been cleared; once the command with ID 0 completes, the second signal should advance to ID 3. The third signal points to ID 15, so only one more command can be received into the circular queue. The command cache management unit also serves the kernel's query command: when the status bit of the queried ID is 0, the data movement is complete and the logical channel is released; otherwise the movement is not yet complete and the kernel must stall.
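To make the queue mechanics concrete, the following is a minimal C sketch of the circular command queue described above, assuming 16 entries, out-of-order completion, and in-order retirement; every type and function name here is illustrative, not taken from the patent.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CMD_QUEUE_DEPTH 16

    typedef struct {
        uint32_t src_addr;   /* source address of the to-be-used data block */
        uint32_t dst_addr;   /* destination address                         */
        uint32_t size;       /* block size in bytes                         */
        bool     executing;  /* status bit: true = executing, false = done  */
    } fdfu_cmd_t;

    typedef struct {
        fdfu_cmd_t slots[CMD_QUEUE_DEPTH];
        uint8_t head;     /* instr_buffer_head: next command to hand to parsing */
        uint8_t over;     /* instr_buffer_over: oldest command not yet retired  */
        uint8_t tail;     /* instr_buffer_tail: where the next command lands    */
        uint8_t occupied; /* slots in use between over and tail                 */
        uint8_t pending;  /* enqueued but not yet handed to parsing             */
    } fdfu_cmd_queue_t;

    /* Enqueue a command and return its logical ID, or -1 when all 16 slots
     * are occupied (the kernel would then stall until a slot frees up). */
    static int cmd_enqueue(fdfu_cmd_queue_t *q, fdfu_cmd_t cmd) {
        if (q->occupied == CMD_QUEUE_DEPTH)
            return -1;
        cmd.executing = true;
        int id = q->tail;
        q->slots[id] = cmd;
        q->tail = (uint8_t)((q->tail + 1) % CMD_QUEUE_DEPTH);
        q->occupied++;
        q->pending++;
        return id;
    }

    /* Hand the command at the head position to command parsing. */
    static fdfu_cmd_t *cmd_next_to_parse(fdfu_cmd_queue_t *q) {
        if (q->pending == 0)
            return NULL;
        fdfu_cmd_t *c = &q->slots[q->head];
        q->head = (uint8_t)((q->head + 1) % CMD_QUEUE_DEPTH);
        q->pending--;
        return c;
    }

    /* Mark command `id` complete; completions may arrive out of order, so
     * the in-order retire pointer only advances past commands whose status
     * bits are already cleared, releasing their slots (and channels). */
    static void cmd_complete(fdfu_cmd_queue_t *q, int id) {
        q->slots[id].executing = false;
        while (q->occupied > 0 && !q->slots[q->over].executing) {
            q->over = (uint8_t)((q->over + 1) % CMD_QUEUE_DEPTH);
            q->occupied--;
        }
    }

    /* Kernel-side query: the move behind `id` is done when its status bit
     * is 0; otherwise the kernel must stall. */
    static bool cmd_is_done(const fdfu_cmd_queue_t *q, int id) {
        return !q->slots[id].executing;
    }

This reproduces the FIG. 3 example: with the status bits of IDs 1 and 2 already cleared, completing ID 0 advances the retire pointer straight to ID 3.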
The read prefetch management unit is mainly responsible for reading commands from the command cache management unit. It can support parsing of 4 commands in a row (before the 8 outstanding bus bursts are used up); it divides a large data packet (data block) into multiple bursts; it records and maintains the execution state of the 4 commands and the state of the 8 outstanding bus bursts; it receives the data returned by read operations; and it merges and aligns the data to match the read/write bit width of the local memory. The read prefetch management unit is mainly implemented as a state machine with 3 states, whose meanings are as follows. First state: the idle state, which waits for a command channel to become ready; once a command is present, the source address, destination address, and data block size of the current command are recorded and the machine jumps to the second state. Second state: the first burst request is issued; if it is also the last burst request, i.e., the data block size is less than or equal to the preset maximum burst size (which may be set to 512 bytes), the state machine jumps back to the first state; if the data block size is larger than the preset maximum burst size, the state machine jumps to the third state. Third state: the received read data is processed in a pipelined fashion according to the flags generated by the state machine; after all outstanding bursts belonging to a given command ID have been successfully written back to the local memory, completion is reported to the command cache management unit and the corresponding state is updated.
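The three states are easy to misread in translation, so here is a hedged, single-command C sketch of the state machine as described (the real unit tracks 4 commands and 8 outstanding bursts); issue_burst() and all_outstanding_done() are invented stand-ins for the bus interface and the outstanding-burst bookkeeping.

    #include <stdint.h>

    #define MAX_BURST_BYTES 512u  /* preset maximum burst size */

    /* Hypothetical hooks; neither name comes from the patent. */
    extern void issue_burst(uint32_t src, uint32_t dst, uint32_t bytes);
    extern int  all_outstanding_done(void);

    typedef enum { ST_IDLE, ST_FIRST_BURST, ST_PIPELINE } prefetch_state_t;

    typedef struct {
        prefetch_state_t state;
        uint32_t src, dst, remaining; /* recorded when a command arrives */
    } prefetch_fsm_t;

    /* One tick of the three-state machine. */
    void prefetch_step(prefetch_fsm_t *f, int cmd_ready,
                       uint32_t src, uint32_t dst, uint32_t size) {
        switch (f->state) {
        case ST_IDLE:            /* first state: wait for a ready command */
            if (cmd_ready) {
                f->src = src; f->dst = dst; f->remaining = size;
                f->state = ST_FIRST_BURST;
            }
            break;
        case ST_FIRST_BURST: {   /* second state: issue the first burst */
            uint32_t n = f->remaining < MAX_BURST_BYTES ? f->remaining
                                                        : MAX_BURST_BYTES;
            issue_burst(f->src, f->dst, n);
            f->src += n; f->dst += n; f->remaining -= n;
            /* the first burst is also the last iff size <= max burst size */
            f->state = (f->remaining == 0) ? ST_IDLE : ST_PIPELINE;
            break;
        }
        case ST_PIPELINE:        /* third state: stream the remaining bursts
                                    while returned read data is pipelined
                                    into local memory */
            if (f->remaining > 0) {
                uint32_t n = f->remaining < MAX_BURST_BYTES ? f->remaining
                                                            : MAX_BURST_BYTES;
                issue_burst(f->src, f->dst, n);
                f->src += n; f->dst += n; f->remaining -= n;
            } else if (all_outstanding_done()) {
                /* all bursts of this command ID written back to local
                   memory: report completion and return to idle */
                f->state = ST_IDLE;
            }
            break;
        }
    }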
The read data interaction unit is mainly responsible for arbitrating between prefetch requests and direct read requests, with direct read requests taking priority over prefetch requests.
The write data interaction unit can cache 4 entries of 256-bit data, supports merging across multiple store operations, and supports overwriting data at the same address provided no flush operation has intervened.
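As one plausible reading of that write path, the sketch below models a 4-entry, 256-bit write buffer with same-address overwrite and an explicit flush; bus_write() is a hypothetical bus hook, and all names are invented here.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define WBUF_ENTRIES 4
    #define LINE_BYTES   32   /* 256 bits per cached write */

    typedef struct {
        bool     valid;
        uint32_t addr;             /* line-aligned address */
        uint8_t  data[LINE_BYTES];
    } wbuf_entry_t;

    static wbuf_entry_t wbuf[WBUF_ENTRIES];

    extern void bus_write(uint32_t addr, const uint8_t *d, uint32_t n);

    /* Buffer one write; a write to an address already held overwrites the
     * old data in place (same-address coverage before any flush) instead
     * of consuming a new entry. Returns false when the buffer is full. */
    bool wbuf_write(uint32_t addr, const uint8_t *data) {
        int free_slot = -1;
        for (int i = 0; i < WBUF_ENTRIES; i++) {
            if (wbuf[i].valid && wbuf[i].addr == addr) {
                memcpy(wbuf[i].data, data, LINE_BYTES); /* merge/overwrite */
                return true;
            }
            if (!wbuf[i].valid && free_slot < 0)
                free_slot = i;
        }
        if (free_slot < 0)
            return false;  /* full: caller must flush first */
        wbuf[free_slot].valid = true;
        wbuf[free_slot].addr  = addr;
        memcpy(wbuf[free_slot].data, data, LINE_BYTES);
        return true;
    }

    /* Empty the buffer to the bus (the flush/empty function). */
    void wbuf_flush(void) {
        for (int i = 0; i < WBUF_ENTRIES; i++)
            if (wbuf[i].valid) {
                bus_write(wbuf[i].addr, wbuf[i].data, LINE_BYTES);
                wbuf[i].valid = false;
            }
    }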
The following describes a data processing method in the present application by means of a specific embodiment, referring to fig. 4, one embodiment of the data processing method in the present application includes:
101. The data processing device receives configuration information from the processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block and comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block.
In this embodiment, when software needs to access a contiguous data block, or a small amount of individual data, in external space, the processor quickly configures the relevant information of the data processing device, for example, the source address, destination address, and size of the to-be-used data block and its fetch command format; after assembling this configuration information, the processor sends it to the data processing device.
The relevant configuration of the data processing device in this embodiment can satisfy the following requirements (a configuration sketch follows this list):
Data movement from external memory to internal memory is supported, with a configurable transfer size of up to 64 KB. Data movement between external memories is not supported, and data movement from internal memory to external memory is not supported.
A single instruction completes the configuration of the data processing device's relevant information, including source address, destination address, and data size, and automatically returns a channel ID for querying.
A single instruction completes a query of whether the data movement has succeeded; if it has not, the processor kernel is blocked and waits for the previous data movement to finish.
The data processing device is configured with a number of channels, for example 16; if all 16 channels are occupied at configuration time, the kernel is blocked and waits for an earlier data movement to complete.
The internal memory and the external memory are addressed in a unified way, and their addresses do not overlap.
Four parallel logical channels are supported, so execution of 4 commands can be maintained simultaneously.
Data block division is supported: a large data block is split into multiple bursts.
8 independent read operations are supported, corresponding to 8 outstanding bus operations, i.e., 8 burst requests.
Merging of write data over 16 consecutive cycles is supported, along with a write cache of 4 entries of 256-bit data and a cache-flush function.
External configuration of the upper 16 address bits of the internal memory is supported.
In addition, the processor supports direct read and write operations on the external memory: up to 2 direct external-memory read operations and up to 32 direct external-memory write operations.
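The configure-then-query flow implied by the requirements above can be sketched as follows; fdfu_config() and fdfu_query() are invented wrappers for the two single instructions (configure and query), since the patent names neither.

    #include <stdint.h>

    /* Hypothetical single-instruction wrappers: one configures
     * {source, destination, size} and returns a channel ID, the other
     * reports whether that channel's move has completed. */
    extern int fdfu_config(uint32_t src, uint32_t dst, uint32_t size);
    extern int fdfu_query(int channel_id); /* nonzero once the move is done */

    void consume(const uint8_t *buf, uint32_t n);

    /* Move one block from unified-address external space into local
     * memory, then use it; in hardware the kernel stalls automatically
     * when all 16 channels are occupied. */
    void fetch_and_consume(uint32_t ext_src, uint32_t local_dst,
                           uint32_t size) {
        int ch = fdfu_config(ext_src, local_dst, size); /* channel ID back */
        while (!fdfu_query(ch))
            ;  /* query blocks until the previous data movement finishes */
        consume((const uint8_t *)(uintptr_t)local_dst, size);
    }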
It should be noted that the fetch command format may be used to indicate the direction in which the to-be-used data block is moved, for example, from external space to internal space, or from internal space to external space.
The processor may determine the size of the to-be-used data block as follows: the processor determines a time granularity based on the distance between the external memory in external space and the processor kernel, and then determines the size of the moved block based on the data throughput per unit time at that time granularity.
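Read literally, that sizing rule is a product of latency and link throughput; the sketch below states it with illustrative units and applies the 64 KB cap from the requirements above.

    #include <stdint.h>

    /* size = (time granularity) x (data throughput per unit time),
     * capped at the 64 KB maximum transfer size; units are illustrative. */
    static uint32_t block_size_bytes(uint32_t latency_cycles,  /* granularity */
                                     uint32_t bytes_per_cycle) /* throughput  */
    {
        uint32_t size = latency_cycles * bytes_per_cycle;
        return size > 64u * 1024u ? 64u * 1024u : size;
    }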
The processor may determine the destination address of the to-be-used data block as follows. For example, the processor sets the destination address by creating a local temporary variable, specifically: a temporary variable is defined according to the size of each to-be-used data block (contiguous data blocks accessed in a loop pattern use a pre-defined ping-pong buffer, while a non-contiguous variable can directly define a local temporary variable); the source address in external space (a global variable) and the destination address of the local temporary variable are generated, and the size of each move is set (for contiguous data blocks this is the ping-pong size of the fine-grained move; for a non-contiguous variable it is the size of the whole variable).
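For the loop-pattern case, a ping-pong arrangement might look like the sketch below, reusing the hypothetical fdfu_config()/fdfu_query() wrappers from the earlier sketch: while the FDFU fills one half of the buffer, the kernel processes the other.

    #include <stdint.h>

    extern int fdfu_config(uint32_t src, uint32_t dst, uint32_t size);
    extern int fdfu_query(int channel_id);
    void process(uint8_t *chunk, uint32_t n);

    #define CHUNK 1024u  /* illustrative fine-grained move (ping-pong) size */

    /* Stream a large external block through a two-half local temporary. */
    void pingpong_stream(uint32_t ext_src, uint32_t total) {
        static uint8_t buf[2][CHUNK];  /* the defined ping-pong buffer */
        if (total == 0)
            return;
        uint32_t off = 0, n = (total < CHUNK) ? total : CHUNK;
        int cur = 0;
        int ch = fdfu_config(ext_src, (uint32_t)(uintptr_t)buf[cur], n);
        while (off + n < total) {
            uint32_t next_off = off + n;
            uint32_t next_n = (total - next_off < CHUNK) ? total - next_off
                                                         : CHUNK;
            int next_ch = fdfu_config(ext_src + next_off,
                                      (uint32_t)(uintptr_t)buf[cur ^ 1],
                                      next_n);
            while (!fdfu_query(ch))
                ;                    /* wait for the current half... */
            process(buf[cur], n);    /* ...and work on it while the other
                                        half is in flight */
            off = next_off; n = next_n; ch = next_ch; cur ^= 1;
        }
        while (!fdfu_query(ch))
            ;
        process(buf[cur], n);        /* drain the final chunk */
    }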
In practical applications, the processor is connected to the data processing device through a simple interactive internal bus, and the data processing device is connected to the external memory and to the internal memory through buses.
Optionally, in some possible embodiments, after the data processing device receives the configuration information from the processor, the method may further include:
the data processing device caches the configuration information in a circular queue.
102. The data processing device divides the to-be-used data block at fine granularity according to its size to determine the number of transmissions.
In this embodiment, the data processing device may determine the number of transmissions as follows: the data processing device divides the to-be-used data block at fine granularity according to its size to determine the number of bursts, and then determines the number of transmissions according to the number of bursts.
Here, a burst is a data packet containing 512 bytes of data.
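In other words, the number of transmissions follows from a ceiling division of the block size by the 512-byte burst size; a short sketch:

    #include <stdint.h>

    #define BURST_BYTES 512u  /* one burst carries 512 bytes of data */

    /* Fine-grained division: split the to-be-used data block into bursts
     * (the last one may be partial) and take that as the number of
     * transmissions -- an assumption consistent with step 102. */
    uint32_t num_transmissions(uint32_t block_bytes) {
        return (block_bytes + BURST_BYTES - 1) / BURST_BYTES;
    }

For the maximum 64 KB block this gives 65536 / 512 = 128 bursts.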
103. The data processing device fetches the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block.
In this embodiment, after determining the number of transmissions, the data processing device fetches the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block.
104. The data processing device stores the to-be-used data block at the destination address of the to-be-used data block.
In this embodiment, after fetching the to-be-used data block according to the number of transmissions, the source address, and the fetch command format, the data processing device stores the to-be-used data block at its destination address.
Optionally, in some possible embodiments, the data processing device storing the to-be-used data block at its destination address may work as follows:
the data processing device sends the to-be-used data block to the destination address of the to-be-used data block through a pre-configured channel;
the data processing device stores the to-be-used data block via its destination address.
In this embodiment, the data processing device configures the channels in advance, and the maximum number of channels may be limited to 16.
Further, if the destination address of the to-be-used data block is in the external memory, the data processing device storing the block via its destination address may mean that:
the data processing device stores the to-be-used data block in the external memory.
Alternatively, if the destination address of the to-be-used data block is in the internal memory, the data processing device stores the block via its destination address as follows:
the data processing device stores the to-be-used data block in the internal memory.
In this embodiment, the data processing device supports data movement from the external memory to the internal memory and also from the internal memory to the external memory, i.e., it supports moving the to-be-used data block outward.
This embodiment provides a schematic diagram of the internal structure of a bidirectional FDFU that supports outward movement of the to-be-used data block; see FIG. 5, which shows the internal structure of a bidirectional FDFU provided in this application. The FDFU in FIG. 5 corresponds to the data processing device of this application. Under the bidirectional FDFU structure of FIG. 5, data blocks support bidirectional two-dimensional movement: bidirectional means a block can move from the external memory to the internal memory or from the internal memory to the external memory; two-dimensional means the data within a block is moved at a fixed interval, or with per-move offsets that form an arithmetic progression, where both the fixed interval and the offset can be configured by the processor.
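That two-dimensional pattern can be expressed as a strided copy in which the inter-segment gap is either fixed or grows arithmetically; the sketch below uses plain memcpy as a stand-in for the FDFU's bus transfers, and all parameter names are illustrative.

    #include <stdint.h>
    #include <string.h>

    /* Move `count` segments of `seg_bytes` each: the gap after a segment
     * starts at `stride0` and grows by `stride_step` per segment, so
     * stride_step = 0 gives the fixed interval and a nonzero step gives
     * offsets that form an arithmetic progression. */
    void move_2d(uint8_t *dst, const uint8_t *src, uint32_t seg_bytes,
                 uint32_t count, uint32_t stride0, int32_t stride_step) {
        uint32_t src_off = 0;
        int32_t stride = (int32_t)stride0;
        for (uint32_t i = 0; i < count; i++) {
            memcpy(dst + (size_t)i * seg_bytes, src + src_off, seg_bytes);
            src_off += seg_bytes + (uint32_t)stride; /* skip the interval */
            stride += stride_step;
        }
    }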
Optionally, in some possible embodiments, if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, the method may further include:
the data processing device stores the to-be-used discrete data through a preset cache.
In this embodiment, by keeping a preset cache, the access flexibility of discrete data is preserved even when the cache specification is reduced, which broadens the range of application scenarios.
A schematic architecture diagram of the FDFU-with-Cache mechanism supporting discrete data access is provided in FIG. 6; the FDFU shown in FIG. 6 corresponds to the data processing device in this application.
In this embodiment, the to-be-used data block is accessed directly, which avoids comparison against multiple address tags, improves access efficiency, and reduces power consumption. In addition, the probability of deterministic access to the to-be-used data block is increased and the amount of redundant access is reduced, further lowering power consumption. Data access efficiency can therefore be improved while power consumption is reduced.
Secondly, outward movement of the to-be-used data block is supported, which improves the efficiency of outward write operations and reduces bus complexity.
Finally, the access flexibility of discrete data is preserved, which broadens the range of application scenarios.
Having described the data processing method of this application by example, the data processing device of this application is now described by example. Referring to FIG. 7, one example of the data processing device of this application includes:
a receiving module 201, configured to receive configuration information from the processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block and comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block;
a determining module 202, configured to divide the to-be-used data block at fine granularity according to its size to determine the number of transmissions;
a fetch module 203, configured to fetch the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block;
a storage module 204, configured to store the to-be-used data block at the destination address of the to-be-used data block.
In this embodiment, the to-be-used data block is accessed directly, which avoids comparison against multiple address tags, improves access efficiency, and reduces power consumption. In addition, the probability of deterministic access to the to-be-used data block is increased and the amount of redundant access is reduced, further lowering power consumption. Data access efficiency can therefore be improved while power consumption is reduced.
Further, in some possible embodiments, the determining module 202 is specifically configured to divide the to-be-used data block at fine granularity according to its size to determine the number of bursts, and to determine the number of transmissions according to the number of bursts.
Further, in some possible embodiments, the storage module 204 is specifically configured to send the to-be-used data block to the destination address of the to-be-used data block through a pre-configured channel, and to store the to-be-used data block via its destination address.
Further, in some possible embodiments, the destination address of the to-be-used data block is in the external memory, and the storage module 204 is specifically configured to store the to-be-used data block in the external memory.
Further, in some possible embodiments, the destination address of the to-be-used data block is in the internal memory, and the storage module 204 is specifically configured to store the to-be-used data block in the internal memory.
Further, in some possible embodiments, the storage module 204 is further configured to cache the configuration information in a circular queue.
Further, in some possible embodiments, if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, the storage module 204 is further configured to store the to-be-used discrete data through a preset cache.
Thus, by keeping a preset cache, the access flexibility of discrete data is preserved even when the cache specification is reduced, which broadens the range of application scenarios.
The data processing device of this application has been described above from the point of view of modularized functional entities; it is described below from the point of view of hardware processing. Referring to FIG. 8, the data processing device of this application includes: a receiver 301, a processor 302, and a memory 303.
The data processing device of this application may have more or fewer components than shown in FIG. 8, may combine two or more components, or may have a different configuration or arrangement of components; each component may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application-specific integrated circuits.
The receiver 301 is configured to perform the following operations:
receiving configuration information from the processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block and comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block;
The processor 302 is configured to perform the following operations:
dividing the to-be-used data block at fine granularity according to its size to determine the number of transmissions;
fetching the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block;
The memory 303 is configured to perform the following operations:
storing the to-be-used data block at the destination address of the to-be-used data block.
In this embodiment, the to-be-used data block is accessed directly, which avoids comparison against multiple address tags, improves access efficiency, and reduces power consumption. In addition, the probability of deterministic access to the to-be-used data block is increased and the amount of redundant access is reduced, further lowering power consumption. Data access efficiency can therefore be improved while power consumption is reduced.
The processor 302 is further configured to perform the following operations:
dividing the to-be-used data block at fine granularity according to its size to determine the number of bursts; and determining the number of transmissions according to the number of bursts.
The memory 303 is further configured to perform the following operations:
sending the to-be-used data block to the destination address of the to-be-used data block through a pre-configured channel; and storing the to-be-used data block via its destination address.
The memory 303 is further configured to perform the following operations:
when the destination address of the to-be-used data block is in the external memory, storing the to-be-used data block in the external memory.
The memory 303 is further configured to perform the following operations:
when the destination address of the to-be-used data block is in the internal memory, storing the to-be-used data block in the internal memory.
The memory 303 is further configured to perform the following operations:
caching the configuration information in a circular queue.
The memory 303 is further configured to perform the following operations:
if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, storing the to-be-used discrete data through a preset cache.
The above embodiments may be implemented in whole or in part by software, hardware, or a combination thereof. When implemented by software, or by software combined with hardware, they may be implemented in whole or in part in the form of a computer program product.
A computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a storage medium or transmitted from one storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, twisted pair, or optical fiber) or wireless (e.g., infrared, radio, or microwave) means. The storage medium may be any medium accessible by a computer, or a data storage device, such as a server or data center, integrating one or more media. The medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., an optical disc), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system.
Units described as separate may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network devices. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of this application.
The related parts of the embodiments of this application may refer to one another: the related parts of the method embodiments may refer to one another, and because the apparatus provided by each apparatus embodiment is configured to perform the method of the corresponding method embodiment, each apparatus embodiment may be understood with reference to the relevant parts of the corresponding method embodiment.
The device structure diagrams presented in the apparatus embodiments of this application show only a simplified design of the corresponding device. In practical applications, the device may include any number of communication modules, processors, memories, and the like to implement the functions or operations performed by the device in the apparatus embodiments, and all devices capable of implementing this application fall within its scope of protection.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same. Modifications to the embodiments described in the foregoing may occur to those skilled in the art, and such modifications do not depart from the scope of the appended claims.

Claims (15)

1. A data processing method, comprising:
the data processing device receives configuration information from a processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block, and the configuration information comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block, wherein the size of the to-be-used data block is determined according to the data throughput per unit time at a time granularity, and the time granularity is determined according to the distance between an external memory in external space and a kernel of the processor;
the data processing device divides the to-be-used data block at fine granularity according to its size to determine a number of transmissions;
the data processing device fetches the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block, wherein the fetch command format can be used to indicate the movement direction of the to-be-used data block;
the data processing device stores the to-be-used data block at the destination address of the to-be-used data block.
2. The method of claim 1, wherein the data processing device dividing the to-be-used data block at fine granularity according to its size to determine the number of transmissions comprises:
the data processing device divides the to-be-used data block at fine granularity according to its size to determine a number of bursts;
and determines the number of transmissions according to the number of bursts.
3. The method of claim 1, wherein the data processing device storing the to-be-used data block at the destination address of the to-be-used data block comprises:
the data processing device sends the to-be-used data block to the destination address of the to-be-used data block through a pre-configured channel;
the data processing device stores the to-be-used data block via its destination address.
4. The method according to claim 3, wherein the destination address of the to-be-used data block is in an external memory, and the data processing device storing the to-be-used data block via its destination address comprises:
the data processing device stores the to-be-used data block in the external memory.
5. The method according to claim 3, wherein the destination address of the to-be-used data block is in an internal memory, and the data processing device storing the to-be-used data block via its destination address comprises:
the data processing device stores the to-be-used data block in the internal memory.
6. The method according to any one of claims 1 to 5, wherein after the data processing device receives the configuration information from the processor, the method comprises:
the data processing device caches the configuration information in a circular queue.
7. The method according to any one of claims 1 to 5, wherein if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, the method further comprises:
the data processing device stores the to-be-used discrete data through a preset cache.
8. A data processing device, comprising:
a receiving module, configured to receive configuration information from a processor, wherein the configuration information is used to instruct the data processing device to fetch a to-be-used data block, and the configuration information comprises a source address of the to-be-used data block, a destination address of the to-be-used data block, a size of the to-be-used data block, and a fetch command format of the to-be-used data block, wherein the size of the to-be-used data block is determined according to the data throughput per unit time at a time granularity, and the time granularity is determined according to the distance between an external memory in external space and a kernel of the processor;
a determining module, configured to divide the to-be-used data block at fine granularity according to its size to determine a number of transmissions;
a fetch module, configured to fetch the to-be-used data block according to the number of transmissions, the source address of the to-be-used data block, and the fetch command format of the to-be-used data block, wherein the fetch command format can be used to indicate the movement direction of the to-be-used data block;
a storage module, configured to store the to-be-used data block at the destination address of the to-be-used data block.
9. The device of claim 8, wherein the determining module is specifically configured to divide the to-be-used data block at fine granularity according to its size to determine a number of bursts, and to determine the number of transmissions according to the number of bursts.
10. The device according to claim 8, wherein the storage module is configured to send the to-be-used data block to the destination address of the to-be-used data block through a pre-configured channel, and to store the to-be-used data block via its destination address.
11. The device according to claim 10, wherein the destination address of the to-be-used data block is in an external memory, and the storage module is configured to store the to-be-used data block in the external memory.
12. The device according to claim 10, wherein the destination address of the to-be-used data block is in an internal memory, and the storage module is configured to store the to-be-used data block in the internal memory.
13. The device according to any one of claims 8 to 12, wherein the storage module is further configured to cache the configuration information in a circular queue.
14. The device according to any one of claims 8 to 12, wherein, if the configuration information is further used to instruct the data processing device to fetch to-be-used discrete data, the storage module is further configured to store the to-be-used discrete data through a preset cache.
15. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1 to 7.
CN201710640687.9A 2017-07-31 2017-07-31 Data processing method and data processing device Active CN109324982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710640687.9A CN109324982B (en) 2017-07-31 2017-07-31 Data processing method and data processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710640687.9A CN109324982B (en) 2017-07-31 2017-07-31 Data processing method and data processing device

Publications (2)

Publication Number Publication Date
CN109324982A CN109324982A (en) 2019-02-12
CN109324982B 2023-06-27

Family

ID=65245648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710640687.9A Active CN109324982B (en) 2017-07-31 2017-07-31 Data processing method and data processing device

Country Status (1)

Country Link
CN (1) CN109324982B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407357B (en) * 2020-03-17 2023-08-22 华为技术有限公司 Method and device for inter-process data movement

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9116856B2 (en) * 2012-11-08 2015-08-25 Qualcomm Incorporated Intelligent dual data rate (DDR) memory controller
CN107577614B (en) * 2013-06-29 2020-10-16 华为技术有限公司 Data writing method and memory system
CN103500107B (en) * 2013-09-29 2017-05-17 中国船舶重工集团公司第七0九研究所 Hardware optimization method for CPU
CN105446888B (en) * 2014-05-30 2018-10-12 华为技术有限公司 The method of mobile data, controller and storage system between storage device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Du Xueliang; Jin Xi. A parallel scheduling algorithm for NAND flash. Journal of Chinese Computer Systems, 2010, (06): 3128-3130. *

Also Published As

Publication number Publication date
CN109324982A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
US9116800B2 (en) Block-based storage device with a memory-mapped interface
CN100481028C (en) Method and device for implementing data storage using cache
EP2478441B1 (en) Read and write aware cache
KR101051506B1 (en) Method and memory controller for scalable multichannel memory access
US6782454B1 (en) System and method for pre-fetching for pointer linked data structures
CN110119304B (en) Interrupt processing method and device and server
US10496550B2 (en) Multi-port shared cache apparatus
CN108496161A (en) Data buffer storage device and control method, data processing chip, data processing system
KR102594657B1 (en) Method and apparatus for implementing out-of-order resource allocation
US20060218332A1 (en) Interface circuit, system, and method for interfacing between buses of different widths
US9342258B2 (en) Integrated circuit device and method for providing data access control
CN113900974B (en) Storage device, data storage method and related equipment
KR102617154B1 (en) Snoop filter with stored replacement information, method for same, and system including victim exclusive cache and snoop filter shared replacement policies
CN102065073B (en) Directly providing data messages to protocol layer
CN117312201B (en) Data transmission method and device, accelerator equipment, host and storage medium
US7694041B2 (en) Method for managing buffers pool and a system using the method
CN109324982B (en) Data processing method and data processing device
KR101103619B1 (en) Multi-port memory system and access control method thereof
CN111562883B (en) Cache management system, method and device for solid state disk
US20150212942A1 (en) Electronic device, and method for accessing data in electronic device
CN109032965B (en) Data reading method, host and storage device
CN112148653A (en) Data transmission device, data processing system, data processing method, and medium
KR100950356B1 (en) Data transfer unit with support for multiple coherency granules
JP2002024007A (en) Processor system
CN113168293B (en) Method and apparatus for accessing cache in clustered storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant