CN116360858B - Data processing method, graphic processor, electronic device and storage medium - Google Patents

Info

Publication number
CN116360858B
Authority
CN
China
Prior art keywords
data
operated
address
storage area
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310612804.6A
Other languages
Chinese (zh)
Other versions
CN116360858A (en
Inventor
请求不公布姓名
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202310612804.6A priority Critical patent/CN116360858B/en
Publication of CN116360858A publication Critical patent/CN116360858A/en
Application granted granted Critical
Publication of CN116360858B publication Critical patent/CN116360858B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure relates to the field of information processing technologies, and in particular, to a data processing method, a graphics processor, an electronic device, and a storage medium, where the processing method includes: receiving an instruction to be operated; according to the instruction to be operated, determining an operation mode and an interaction address corresponding to the data to be operated, and acquiring the data to be operated stored in the register group based on the interaction address; wherein the interactive address includes: a storage area row address alignment offset, a storage area intermediate address; the row address alignment offset of the storage area is used for representing the address offset condition of the data to be operated in the storage area under a preset data processing mode; the storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area; and generating an operation result according to the data to be operated and the operation mode. The data processing method provided by the embodiment of the disclosure is beneficial to improving the access flexibility of the register set.

Description

Data processing method, graphic processor, electronic device and storage medium
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a data processing method, a graphics processor, an electronic device, and a storage medium.
Background
In the related art, the graphics processor and the host exchange data frequently: the host sends the data to be operated to the graphics processor, and the graphics processor, after storing the data in memory, performs parallel operations on it according to the operator of the data to be operated, so as to process the data at high speed. However, across different program application scenarios, the graphics processor cannot process data to be operated in different data processing modes at the same time, which severely limits the operations it can perform. How to process such data better is a technical problem that developers need to solve.
Disclosure of Invention
The disclosure provides a technical scheme for processing data.
According to an aspect of the present disclosure, there is provided a data processing method applied to a graphics processor, where the graphics processor includes a computing core and a register set; the processing method comprises the following steps: receiving an instruction to be operated; according to the instruction to be operated, determining an operation mode and an interaction address corresponding to the data to be operated, and acquiring the data to be operated stored in the register group based on the interaction address; wherein the interactive address includes: a storage area row address alignment offset, a storage area intermediate address; the row address alignment offset of the storage area is used for representing the address offset condition of the data to be operated in the storage area under a preset data processing mode; the storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area; and generating an operation result according to the data to be operated and the operation mode.
In a possible implementation manner, the register set includes at least one storage area, and the processing method further includes: receiving a plurality of data to be operated; dividing the data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the data to be operated; wherein, the data quantity to be operated in the data group to be operated corresponding to different data processing modes is different; distributing the plurality of data groups to be operated to the at least one storage area to obtain an interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated; wherein each of the plurality of data sets to be operated on is allocated into at least one storage area.
In a possible implementation manner, the obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes: taking a preset number of address bits, counted from the lowest bit of the base address of the logical address corresponding to each data to be operated, as the storage area row address alignment offset in the interaction address corresponding to that data; and shifting the remaining address bits left by the preset number of bits and adding the logical offset address, the resulting sum being used as the storage area intermediate address in the interaction address corresponding to that data; wherein the logical offset address is the address offset of the logical address corresponding to each data to be operated.
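The address split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the choice of 2 preset bits, and the reading of "remaining address bits" as the base address with its low bits removed are all assumptions made for the example.

```python
# Hypothetical sketch; names and the preset bit count are assumptions.
PRESET_BITS = 2  # assumed: 2 bits, e.g. when the largest mode spans 4 units of the preset mode


def to_interaction_address(base: int, logical_offset: int,
                           preset_bits: int = PRESET_BITS) -> tuple[int, int]:
    """Split a logical address (base + offset) into (alignment offset, intermediate address)."""
    # The lowest `preset_bits` bits of the base address become the
    # storage area row address alignment offset.
    align_offset = base & ((1 << preset_bits) - 1)
    # The remaining (upper) bits are shifted left by `preset_bits`,
    # then the logical offset address is added to form the intermediate address.
    remaining = base >> preset_bits
    intermediate = (remaining << preset_bits) + logical_offset
    return align_offset, intermediate


if __name__ == "__main__":
    print(to_interaction_address(0b1011, 1))
```

With these assumptions, a base address of 0b1011 and a logical offset of 1 yield an alignment offset of 3 and an intermediate address of 9.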
In a possible implementation manner, the dividing the plurality of data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the plurality of data to be operated includes: dividing the data to be operated into a plurality of data groups to be operated corresponding to task identifiers according to the data processing modes corresponding to the data to be operated; wherein the task identifier is used for mapping each to-be-operated data group in the plurality of to-be-operated data groups to a base address position in the at least one storage area.
In a possible implementation manner, the obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes: generating a hash value corresponding to each data group to be operated according to the task identifier corresponding to each data group to be operated in the plurality of data groups to be operated, wherein data groups to be operated whose task identifiers are adjacent have different hash values; generating an interaction address corresponding to each data to be operated in each data group to be operated according to the hash value corresponding to that data group; and storing the data to be operated in data groups to be operated that correspond to different hash values into different physical storage areas.
In a possible implementation manner, the obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes: generating a hash value corresponding to each data group to be operated according to the task identifier corresponding to each data group to be operated in the plurality of data groups to be operated and the sampling identifier corresponding to that data group, wherein the sampling identifier is used for representing the different sampling points processed by each data group to be operated in the plurality of data groups to be operated; generating an interaction address corresponding to each data to be operated in each data group to be operated according to the hash value corresponding to that data group; and storing the data to be operated in data groups to be operated that correspond to different hash values into different physical storage areas.
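As an illustration of a hash in which adjacent task identifiers receive different values, the sketch below uses a simple modulo over the number of storage areas. The text does not specify the hash function; the modulo, the area count of 4, and the function names are assumptions chosen only to show the property.

```python
# Hypothetical sketch; the patent text does not specify the hash function.
NUM_AREAS = 4  # assumed number of physical storage areas


def task_hash(task_id: int, num_areas: int = NUM_AREAS) -> int:
    """Map a task identifier to a hash value that selects a physical storage area."""
    # With num_areas >= 2, consecutive task identifiers always get different values.
    return task_id % num_areas


def assign_areas(task_ids):
    """Return {task_id: hash value} for a sequence of task identifiers."""
    return {t: task_hash(t) for t in task_ids}


if __name__ == "__main__":
    print(assign_areas(range(6)))
```

Because neighbouring tasks land in different storage areas under this assumed hash, groups that are likely to be accessed close together in time do not compete for the same area.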
In one possible implementation manner, the register set includes a storage area, and the acquiring the data to be operated on stored in the register set based on the interaction address includes: determining a row address and a column address of each data to be operated in a corresponding storage area according to the interaction address corresponding to each data to be operated; wherein the row address and the column address are used for representing the positions of registers in the storage area; and accessing a corresponding register according to the row address and the column address of each piece of data to be operated in the corresponding storage area to obtain each piece of data to be operated.
In a possible implementation manner, the register set includes a plurality of storage areas, and the acquiring the data to be operated on stored in the register set based on the interaction address includes: for each piece of data to be operated in the plurality of pieces of data to be operated, determining an interaction address and a storage area identifier corresponding to each piece of data to be operated according to the plurality of pieces of data to be operated; the storage area identifier is used for determining a storage area corresponding to each data group to be operated in the plurality of data groups to be operated when the plurality of data groups to be operated are distributed into a plurality of storage areas; determining a storage area corresponding to each piece of data to be operated according to the storage area identifier corresponding to each piece of data to be operated; determining a row address and a column address of each piece of data to be operated in a corresponding storage area according to the interaction address corresponding to each piece of data to be operated; wherein the row address and the column address are used for representing the positions of registers in the storage area; and accessing a register in the corresponding storage area according to the row address and the column address of each piece of data to be operated in the corresponding storage area to obtain each piece of data to be operated.
In a possible implementation manner, the obtaining the interaction address corresponding to each data to be operated in the plurality of data to be operated sets includes: generating an interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated; generating a segment number corresponding to each piece of data to be operated according to the thread number corresponding to the data to be operated and the total number of threads corresponding to a preset data processing mode; the segment number represents segment offset generated by storing each piece of data to be operated in the preset data processing mode; the determining the row address and the column address of each data to be operated in the corresponding storage area according to the interaction address corresponding to each data to be operated includes: generating a row address of each data to be operated in a corresponding storage area according to the interactive address and the segment number corresponding to each data to be operated; generating a column address of each data to be operated in a corresponding storage area according to the interaction address, the segment number and the hash value corresponding to each data to be operated, or generating a column address of each data to be operated in a corresponding storage area according to the interaction address and the segment number corresponding to each data to be operated.
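One plausible way to derive the segment number from the thread number and the thread count of the preset mode is integer division, sketched below. The text only states which inputs are used, so the division, the WAVE32 preset, and the helper names are assumptions; the row and column address formulas are not given in this passage and are not sketched.

```python
# Hypothetical sketch; integer division is an assumed reading of the text.
PRESET_THREADS = 32  # assumed preset data processing mode: WAVE32


def segment_number(thread_id: int, preset_threads: int = PRESET_THREADS) -> int:
    """Segment offset of a thread when a larger wave is stored in preset-mode units."""
    return thread_id // preset_threads


def lane_in_segment(thread_id: int, preset_threads: int = PRESET_THREADS) -> int:
    """Position of the thread inside its segment."""
    return thread_id % preset_threads


if __name__ == "__main__":
    # Thread 100 of a WAVE128 falls in segment 3, lane 4 under these assumptions.
    print(segment_number(100), lane_in_segment(100))
```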
In a possible implementation manner, the allocating the plurality of data groups to be operated on to the at least one storage area includes: and for each data group to be operated in the plurality of data groups to be operated, uniformly distributing and sending each data group to be operated to each storage area.
In a possible implementation manner, the acquiring the data to be operated on stored in the register set based on the interaction address includes: acquiring data to be operated stored in the register group based on the interaction address through a plurality of pipelines of the graphic processor; the generating an operation result according to the data to be operated and the operation mode comprises the following steps: and generating an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode through a plurality of pipelines of the graphic processor.
In a possible implementation manner, the generating, by the multiple pipelines of the graphics processor, an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode includes: in the case that different pipelines among the plurality of pipelines of the graphics processor access a target port of the register set at the same time, performing arbitration according to the priority of the different pipelines and/or the priority of read and write operations, and determining a target pipeline corresponding to the target port; and generating, through the target pipeline first and then the other pipelines among the different pipelines in sequence, an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode.
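The arbitration step can be sketched as a sort over the conflicting requests. The priority scheme here (pipeline priority first, then a read/write preference) is only one possible choice; the text leaves the priorities to the developer, and the `Request` type and function name are hypothetical.

```python
from typing import NamedTuple


class Request(NamedTuple):
    pipeline: int   # pipeline number issuing the access
    port: int       # register set port being accessed
    is_write: bool  # True for a write, False for a read


def arbitrate(requests, pipeline_priority, write_first=True):
    """Order conflicting requests to one port; the first entry is the target pipeline's."""
    def key(r):
        # Lower values are served first: pipeline priority, then read/write preference.
        rw_rank = 0 if r.is_write == write_first else 1
        return (pipeline_priority[r.pipeline], rw_rank)
    return sorted(requests, key=key)


if __name__ == "__main__":
    reqs = [Request(2, 0, False), Request(0, 0, True), Request(1, 0, False)]
    prio = {0: 1, 1: 0, 2: 2}  # assumed: smaller number means higher priority
    print([r.pipeline for r in arbitrate(reqs, prio)])
```

The first request in the returned order belongs to the target pipeline; the remaining pipelines access the port afterwards in the listed order.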
According to an aspect of the present disclosure, there is provided a graphic processor including: a computing core, a register set connected to the computing core; the computing core is used for receiving an instruction to be operated; according to the instruction to be operated, determining an operation mode and an interaction address corresponding to the data to be operated, and acquiring the data to be operated stored in the register group based on the interaction address; wherein the interactive address includes: a storage area row address alignment offset, a storage area intermediate address; the row address alignment offset of the storage area is used for representing the address offset condition of the data to be operated in the storage area under a preset data processing mode; the storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area; and generating an operation result according to the data to be operated and the operation mode.
According to an aspect of the present disclosure, there is provided an electronic device including: a host and the graphics processor described above.
According to an aspect of the present disclosure, there is provided a computer-readable, writable data storage medium having stored thereon computer program instructions which when executed by a processor implement the above-described data processing method.
In the embodiments of the present disclosure, an instruction to be operated can be received; an operation mode and an interaction address corresponding to the data to be operated are then determined according to the instruction to be operated, the data to be operated stored in the register set is obtained based on the interaction address, and finally an operation result is generated according to the data to be operated and the operation mode. Because the interaction address records the storage area row address alignment offset, the logical addresses that the data to be operated has in the register set under different data processing modes can be converted into interaction addresses under the preset data processing mode. The register set can therefore both store data to be operated in different data processing modes and support operation processing of that data, which improves the access flexibility of the register set.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 shows a flowchart of a method for processing data according to an embodiment of the present disclosure.
Fig. 2 shows a flowchart of a method for processing data according to an embodiment of the present disclosure.
Fig. 3 shows a reference schematic diagram of a method for processing data according to an embodiment of the disclosure.
Fig. 4 shows a reference schematic diagram of a method for processing data according to an embodiment of the disclosure.
Fig. 5 shows a block diagram of a graphics processor provided in accordance with an embodiment of the present disclosure.
Fig. 6 shows a block diagram of an electronic device provided in accordance with an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
In the related art, a register set can generally process data to be operated in only one data processing mode. Here, a WAVE is a thread bundle under the SIMT (single instruction, multiple threads) programming model, and WAVE32, WAVE64 and WAVE128 denote modes in which 32, 64 and 128 threads, respectively, jointly execute the same instruction sequence, so that one data processing flow is completed by 32, 64 or 128 threads. In the related art, WAVE32, WAVE64 and WAVE128 are not compatible with one another. Taking WAVE32 and WAVE128 as examples, if the register set is a register matrix of multiple rows and multiple columns with, for example, 32 columns, the data to be operated stored in the WAVE32 data processing mode occupies one whole row, whereas the data to be operated stored in the WAVE128 data processing mode occupies several whole rows (four rows at 32 columns). Consequently, an address increment that reaches the next piece of data in one mode (for example, adding 1 to the logical address in the WAVE32 mode) addresses the wrong registers in the other mode, and the graphics processor of the related art cannot handle both modes with the same register set. In other words, the related art offers no choice of data processing mode; only one fixed data processing mode is available.
In view of this, an embodiment of the present disclosure provides a data processing method, which receives an instruction to be operated, determines an operation mode and an interaction address corresponding to the data to be operated according to the instruction to be operated, obtains the data to be operated stored in the register set based on the interaction address, and finally generates an operation result according to the data to be operated and the operation mode. Because the interaction address records the storage area row address alignment offset, the logical addresses that the data to be operated has in the register set under different data processing modes can be converted into interaction addresses under the preset data processing mode. The register set can therefore both store data to be operated in different data processing modes and support operation processing of that data, which improves the access flexibility of the register set.
Referring to fig. 1, fig. 1 shows a flowchart of a method for processing data according to an embodiment of the disclosure. The method is applied to a graphics processor; in one example, the graphics processor is connected to a host in a heterogeneous arrangement. The graphics processor includes a computing core and a register set. As shown in fig. 1, the processing method comprises the following steps: step S100, receiving an instruction to be operated. For example, the instruction to be operated may be expressed as an operator, such as addition or multiplication, and the operators of a plurality of data to be operated may be the same. In one example, if the graphics processor is connected to a host, the instruction to be operated may be sent by the host.
Step S200, determining an operation mode and an interaction address corresponding to the data to be operated according to the instruction to be operated, and acquiring the data to be operated stored in the register set based on the interaction address. The interaction address includes: a storage area row address alignment offset and a storage area intermediate address. The storage area row address alignment offset is used for representing the address offset of the data to be operated in the storage area under a preset data processing mode. The storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area. The constitution of the interaction address will be described in detail later. The intermediate address sits between the logical address and the physical address in the address conversion process, so that conversion between the different addresses can be realized; a correspondence exists between the intermediate address and the physical address, and between the intermediate address and the logical address. In one example, the instruction to be operated may include an operation code and an operation field, where the operation code represents the operation mode and the operation field is parsed to obtain the interaction address; the specific conversion relationship depends on the actual situation.
Step S300, generating an operation result according to the data to be operated and the operation mode. In one example, the operation result may be stored in a storage medium of the graphics processor; if the graphics processor is connected to the host, the host may access the storage medium after receiving the calculation completion signal of the graphics processor, so as to obtain the operation result. In another example, the operation result may be converted, using data processing known in the related art, into a display signal for a display screen connected to the graphics processor, so as to display a picture.
In a possible implementation manner, the obtaining the data to be operated stored in the register set based on the interaction address in step S200 may include: acquiring the data to be operated stored in the register set based on the interaction address through a plurality of pipelines of the graphics processor. Step S300 may include: generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode. For example, the graphics processor may access the interaction address through at least one pipeline to obtain the data to be operated, and operate on that data based on the instruction to be operated. The embodiments of the present disclosure do not limit the specific operation flow or which operation the instruction to be operated indicates; a developer may set these according to actual situations. For example, the address may be calculated by an address resolution unit in the computing core according to the instruction, or may be placed in the register set for calculation; the embodiments of the present disclosure are not limited herein. Taking one addition operation as an example, with each thread handling one element, thread index_n computes A[index_n] + B[index_n]. Adding array A[5] = {1, 3, 5, 7, 9} and array B[5] = {2, 4, 6, 8, 10} gives 3 (1+2), 7 (3+4), 11 (5+6), 15 (7+8) and 19 (9+10), so the operation result is array C[5] = {3, 7, 11, 15, 19}.
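The per-thread addition in the example above can be mimicked in a few lines. The loop below is a sequential stand-in for threads that would run in parallel on the graphics processor; the function name `run_thread` is illustrative only.

```python
def run_thread(index_n, a, b, c):
    """Work done by thread index_n: add one pair of elements."""
    c[index_n] = a[index_n] + b[index_n]


A = [1, 3, 5, 7, 9]
B = [2, 4, 6, 8, 10]
C = [0] * len(A)

# Sequential stand-in for the threads of one wave; each index is independent,
# so the iterations could execute in any order or in parallel.
for index_n in range(len(A)):
    run_thread(index_n, A, B, C)

print(C)
```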
In a possible implementation manner, the generating, by the multiple pipelines of the graphics processor, an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode includes: in the case that different pipelines among the plurality of pipelines of the graphics processor access the target port of the register set at the same time, determining the target pipeline corresponding to the target port according to the priority of the different pipelines and/or the priority of read and write operations. The priority of the pipelines and the priority of read and write operations are not limited herein, and the developer may set them according to actual requirements. For example, the priority of a pipeline may be related to the timing, the numbering order, and the like, of that pipeline. An operation result corresponding to each piece of data to be operated is then generated according to the data to be operated and the operation mode, through the target pipeline first and then the other pipelines among the different pipelines in sequence.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for processing data according to an embodiment of the present disclosure. In a possible implementation manner, the register set includes at least one storage area; illustratively, the register set may be a general-purpose data register set in which each storage area includes a number of registers. The processing method further comprises the following steps: step S10, receiving a plurality of data to be operated. In one example, the plurality of data to be operated corresponds to an operation order.
For example, the data to be operated may be represented as a vector, an array, a matrix, and so on, and its elements may have an operation order attribute, which may be used to divide the data to be operated, to determine the storage order of each element, and the like. Taking arrays as an example, the instruction to be operated may be an addition between arrays, and the plurality of data to be operated may include array A[5] = {1, 3, 5, 7, 9} and array B[5] = {2, 4, 6, 8, 10}. A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3] and A[4] + B[4] are then calculated in sequence, resulting in a new array C[5] = {3, 7, 11, 15, 19}. In one example, if the graphics processor is connected to a host, the plurality of data to be operated may be sent by the host, with the specific data content depending on the upper-layer task running in the host.
Step S20, dividing the plurality of data to be operated into a plurality of data sets to be operated according to the data processing modes corresponding to the plurality of data to be operated. The data to be operated in the data group to be operated corresponding to different data processing modes are different in data quantity. For example, the data processing modes may include, for example, WAVE32, WAVE64, WAVE128, etc. in the related art, where, for example, WAVE32 is taken as an example, if there are 64 pieces of data to be operated on, the pieces of data to be operated on may correspond to 2 pieces of segment processing of WAVE32, each WAVE32 includes 32 threads (threads 0 to 31), and the threads with corresponding numbers may be allocated according to the operation sequence of the pieces of data to be operated on (in other examples, the threads may also be allocated by other rules, and the embodiments of the disclosure are not limited herein). Continuing to take the example of array A [5] = {1,3,5,7,9}, thread number 0 corresponds to number 1, thread number 1 corresponds to number 3, until the elements in array A are allocated by WAVE32 to completion. If array A is less than 32 elements in the allocation process, it may still be allocated a WAVE32, and some threads in the WAVE32 may be idle. For example, array a has 50 elements, and 18 elements remain after allocation of one WAVE32 (one thread corresponds to one element in this example), then one WAVE32 is allocated, and 14 threads in the WAVE32 are in an idle state. Taking the data processing modes of the WAVE32 and the WAVE128 as examples, the different amounts of data to be operated in the data groups to be operated corresponding to the different data processing modes can be expressed as follows: there are 32 threads in WAVE32, i.e., there are 32 data to be operated on, and 128 threads in WAVE128, i.e., there are 128 data to be operated on.
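The grouping arithmetic in this step can be checked with a short helper. It assumes one thread per element, as in the example above; the function name is illustrative.

```python
def split_into_waves(num_elements: int, wave_size: int) -> tuple[int, int]:
    """Return (number of waves needed, idle threads in the last wave)."""
    # Ceiling division: a partially filled wave still occupies a whole wave.
    num_waves = -(-num_elements // wave_size)
    idle_threads = num_waves * wave_size - num_elements
    return num_waves, idle_threads


if __name__ == "__main__":
    print(split_into_waves(50, 32))  # the 50-element example with WAVE32
    print(split_into_waves(64, 32))  # the 64-element example with WAVE32
```

For 50 elements and WAVE32 this gives two waves with 14 idle threads, matching the figures in the text.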
In one possible implementation, step S20 may include: dividing the plurality of data to be operated on into a plurality of data groups to be operated on, each corresponding to a task identifier, according to the data processing mode corresponding to the plurality of data to be operated on. The task identifier is used for mapping each of the plurality of data groups to be operated on to a base address position in the at least one storage area. For example, if array A has 100 elements and the corresponding data processing mode is WAVE32, four WAVE32 tasks with task identifiers 0, 1, 2, and 3 can be allocated.
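As a minimal illustrative sketch (not part of the claimed method), the grouping of step S20 can be expressed in Python as follows. The function name split_into_groups, the table WAVE_SIZES, and the tuple layout (task identifier, active threads, idle threads) are assumptions made only for illustration; sequential task identifiers starting at 0 follow the 100-element example above.

```python
# Threads per group for each data processing mode named in the text.
WAVE_SIZES = {"WAVE32": 32, "WAVE64": 64, "WAVE128": 128}

def split_into_groups(num_elements, mode):
    """Return one (task_id, active_threads, idle_threads) tuple per data group."""
    size = WAVE_SIZES[mode]
    groups = []
    task_id = 0
    for start in range(0, num_elements, size):
        # The last group may be only partly filled; its other threads stay idle.
        active = min(size, num_elements - start)
        groups.append((task_id, active, size - active))
        task_id += 1
    return groups

if __name__ == "__main__":
    print(split_into_groups(50, "WAVE32"))   # the 50-element example
    print(split_into_groups(100, "WAVE32"))  # the 100-element example
```

With 50 elements and WAVE32, the second group holds 18 elements and leaves 14 threads idle, matching the example given for step S20.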
With continued reference to fig. 2, step S30: allocating the plurality of data groups to be operated on to the at least one storage area, so as to obtain an interaction address corresponding to each piece of data to be operated on in each of the plurality of data groups. Each of the plurality of data groups to be operated on is allocated to at least one storage area. The interaction address includes a storage area row address alignment offset and a storage area intermediate address. The storage area row address alignment offset represents the address offset of the plurality of data to be operated on in the storage area under a preset data processing mode. The storage area intermediate address represents the intermediate address of each piece of data to be operated on in the corresponding storage area. For example, the storage area row address alignment offset may occupy the high-order bits of the interaction address, and the storage area intermediate address may occupy the low-order bits. In one example, the preset data processing mode may be the data processing mode with the smaller number of threads. For example, if a developer wants to make the register set compatible with WAVE32, WAVE64, and WAVE128, then since a WAVE64 can be split into two WAVE32s and a WAVE128 can be split into four WAVE32s, WAVE32 can be used as the preset data processing mode, so that data processing modes with larger thread bundles are aligned to the data processing mode with the smaller number of threads.
If the total number of bits of the intermediate address corresponding to WAVE32 is 10, the total number of bits of the intermediate address corresponding to WAVE128 is 10, and the preset number of bits (described in detail later) corresponding to WAVE128 is 2, the last two bits of the WAVE128 intermediate address can be selected as the storage area row address alignment offset. Of course, the storage area row address alignment offset may also be a value mapped from those last two bits, and the embodiments of the disclosure are not limited herein.
In a possible implementation manner, obtaining the interaction address corresponding to each piece of data to be operated on in the plurality of data groups in step S30 may include: taking the address bits of a preset number of bits, counted from the lowest bit of the base address of the logical address corresponding to each piece of data to be operated on, as the storage area row address alignment offset in the corresponding interaction address; and, after shifting the remaining address bits left by the preset number of bits, adding the logical offset address and taking the resulting sum as the storage area intermediate address in the corresponding interaction address. The logical offset address is the address offset of the logical address corresponding to each piece of data to be operated on. Illustratively, the logical offset address represents the logical offset between different data to be operated on; for example, the logical offset of the 9th data to be operated on relative to the 0th data to be operated on is 9. In one example, the logical address and the logical offset address may be obtained by a compiler in the graphics processor. In one example, the instruction to be operated on may include an operation type and a logical address. In another example, the logical address may include a base address and a logical offset. In one example, the preset number of bits differs between data processing modes. For example, if WAVE32 is used as the preset data processing mode, the preset number of bits may be 0 for WAVE32, 1 for WAVE64, and 2 for WAVE128. Referring to fig. 3, fig. 3 is a reference diagram of a data processing method according to an embodiment of the present disclosure, and in conjunction with fig.
3, taking the case where n in the storage area of a general register set is 0 to 3 as an example (R_4n+3, with R0 to R4 shown in the reference diagram, where R is the abbreviation of Register), suppose the data array D[5] to be processed by a first thread is {11, 13, 15, 17, 19}. Then 19 is the 4th data to be operated on and 11 is the 0th data to be operated on (DW0=11 in the first data processing mode in the reference diagram, where DW is the abbreviation of Double Word). In one example, each row may store 8 double words (DW0 to DW7 in the reference diagram), and each double word may be accessed by the corresponding thread. In the WAVE32 data processing mode (the first data processing mode in the reference diagram), if hash bit processing is not considered, the data to be operated on 11 is stored at the R0 position in the starting row of the current task (DW0=11 in the first data processing mode), and the data to be operated on 19 is stored at the R4 position (DW0=19 in the first data processing mode). In the WAVE128 data processing mode (the second data processing mode in the reference diagram), the corresponding logical offset is 4 rows (segments 0 to 3 in the reference diagram); if hash bit processing is not considered, the data to be operated on 11 is stored at the R0 position in the starting row of the current task (DW0=11 in the second data processing mode), and the data to be operated on 19 is stored at the R4 position (DW0=19 in the second data processing mode), where the logical offset is increased by 4 rows relative to the starting row of the current task.
For example, the embodiment of the disclosure provides, for reference, a method for generating the storage area row address alignment offset, in which the preset data processing mode is WAVE32, the data processing mode corresponding to the plurality of data to be operated on is WAVE128, the total number of bits of the intermediate address corresponding to WAVE32 is 10, and the total number of bits of the intermediate address corresponding to WAVE128 is 10 (the high-order bits may be padded with 0). The pseudo code is as follows: base_align_mod4 = (wave_mode == wave32) ? 0 : (base & 0x3). In this example, base & 0x3 obtains the two low-order bits by which WAVE128 is not aligned relative to WAVE32 and uses them as the storage area row address alignment offset; the specific value may be chosen according to the developer's actual situation. The storage area row address alignment offset is then moved to the high-order bits of the interaction address, i.e., base_align_mod4 <<= const_addr_bit, where const_addr_bit is the total number of bits of the WAVE32 intermediate address, which is 10 in this example; that is, the storage area row address alignment offset becomes the 11th and 12th bits of the interaction address. In other words, the value of const_addr_bit in this example equals the total number of bits of the intermediate address of the preset data processing mode. The sum of the value corresponding to the remaining address bits (bits 2 to 9 of WAVE128 in this example) and the logical offset address is then used as the storage area intermediate address in the interaction address corresponding to each piece of data to be operated on. In another example, the interaction address may further include a hash bit (described in detail below), in which case the value of const_addr_bit equals the total number of bits of the intermediate address of the preset data processing mode plus the number of hash bits.
For example, if the data processing mode corresponding to the data to be operated on is the same as the preset data processing mode, the storage area row address alignment offset is 0. Illustratively, the pseudo code for generating the storage area intermediate address ac_addr1 is as follows: ac_addr1 = ((base >> base_shift_bit) << 2) + reg_off, where reg_off is the logical offset address obtained after translating the logical address of the instruction, and base_shift_bit is the address alignment shift offset. Illustratively, taking the case where the intermediate address of WAVE32 and the intermediate address of WAVE128 differ by two bits (the shorter one being padded at the high-order end so that the two intermediate addresses are aligned), the preset data processing mode is WAVE32, and compatibility between WAVE32 and WAVE128 is to be realized, the pseudo code for generating the address alignment shift offset base_shift_bit is: base_shift_bit = (wave_mode == wave32) ? 0 : 2. Illustratively, the pseudo code for generating the interaction address ac_addr2 is as follows: ac_addr2 = base_align_mod4 | ac_addr1.
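The address generation without a hash bit can be sketched in Python as follows. This is only an illustrative reading of the pseudo code, not a definitive implementation: the function name interaction_address and the string mode labels are hypothetical, the conditional values 0 (WAVE32) and 2 (WAVE128) for base_shift_bit are an assumption derived from the two-bit difference described above, and the sketch assumes that ac_addr1 fits within const_addr_bit bits so that the bitwise OR does not overlap the alignment field.

```python
CONST_ADDR_BIT = 10  # total bits of the WAVE32 intermediate address in the example

def interaction_address(base, reg_off, wave_mode):
    """Build ac_addr2 from a base address and a logical offset (no hash bit)."""
    # Alignment offset: 0 when the mode equals the preset mode (WAVE32),
    # otherwise the two low bits of the base address.
    base_align_mod4 = 0 if wave_mode == "WAVE32" else (base & 0x3)
    # Move the alignment offset above the intermediate address field.
    base_align_mod4 <<= CONST_ADDR_BIT
    # Assumed shift values: 0 for WAVE32, 2 for WAVE128.
    base_shift_bit = 0 if wave_mode == "WAVE32" else 2
    # Intermediate address: remaining base bits shifted back, plus logical offset.
    ac_addr1 = ((base >> base_shift_bit) << 2) + reg_off
    return base_align_mod4 | ac_addr1

if __name__ == "__main__":
    print(bin(interaction_address(6, 1, "WAVE128")))
    print(bin(interaction_address(6, 1, "WAVE32")))
```

For base = 6 and reg_off = 1 in WAVE128, the two low bits of the base (binary 10) land in bit positions 10 and 11 (counting from 0), i.e., the 11th and 12th bits, and the intermediate address is 5.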
In a possible implementation manner, the interaction address includes a hash value, and obtaining the interaction address corresponding to each piece of data to be operated on in each of the plurality of data groups in step S30 may include: generating a hash value corresponding to each data group to be operated on according to the task identifier corresponding to that data group. The hash values of data groups with adjacent task identifiers are different. For example, the hash value may simply be a parity flag represented by 1 bit (0 for even, 1 for odd): taking the task identifier modulo 2 gives its parity, and the resulting 0 or 1 is used as the hash value. In one example, the lowest bit of the task identifier may also be used directly as the hash value. The manner of generating the task identifier is not described in detail in the embodiments of the present disclosure; it may be generated when the plurality of data to be operated on is divided into different data groups, and may represent the calculation order of the plurality of data groups. Of course, in one example, the hash value may also be the remainder of the task identifier modulo any value, in which case the number of bits of the hash value changes accordingly, and the value of const_addr_bit described above changes adaptively. The interaction address corresponding to each piece of data to be operated on in each data group is then generated according to the hash value corresponding to that data group. Data to be operated on in data groups with different hash values are stored in different physical storage areas. In the following pseudo code, task is the task identifier.
Illustratively, the hash value is generated as follows: task_hash = (task & 0x1) << (const_addr_bit - 1), where, because a hash bit has been added, const_addr_bit is 1 bit larger than in the case without a hash bit. The interaction address thus obtained consists of, from high-order to low-order bits: the storage area row address alignment offset, the hash value, and the storage area intermediate address. Illustratively, where the interaction address ac_addr2 is generated using the hash value, the pseudo code may be as follows: ac_addr2 = base_align_mod4 | task_hash | ac_addr1.
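The resulting field layout can be sketched as follows, assuming a 10-bit intermediate address and a 1-bit hash, so that const_addr_bit is 11. The names INTERMEDIATE_BITS, HASH_BITS, task_hash_field, and interaction_address_with_hash are hypothetical and used only for illustration.

```python
INTERMEDIATE_BITS = 10                           # intermediate address width (example value)
HASH_BITS = 1                                    # one parity bit derived from the task identifier
CONST_ADDR_BIT = INTERMEDIATE_BITS + HASH_BITS   # 11 in this sketch

def task_hash_field(task):
    """Place the parity of the task identifier just above the intermediate address."""
    return (task & 0x1) << (CONST_ADDR_BIT - 1)

def interaction_address_with_hash(align_offset, task, ac_addr1):
    """Pack [alignment offset | hash bit | intermediate address], high to low."""
    base_align_mod4 = (align_offset & 0x3) << CONST_ADDR_BIT
    return base_align_mod4 | task_hash_field(task) | ac_addr1

if __name__ == "__main__":
    # Odd and even task identifiers differ only in the hash bit (bit 10).
    print(bin(interaction_address_with_hash(2, 3, 5)))
    print(bin(interaction_address_with_hash(2, 2, 5)))
```

Adjacent task identifiers always receive different hash bits under this parity scheme, which is the property the text relies on to place neighbouring tasks in different physical storage areas.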
Referring to fig. 4, fig. 4 is a reference schematic diagram of a data processing method according to an embodiment of the present disclosure. In conjunction with fig. 4, in the embodiment of the present disclosure, data are stored differently under different hash values for a given data processing mode. Here the preset data processing mode is taken as the second data processing mode and the register set is taken as a general data register set; the first data processing mode occupies four rows (segments 0 to 3 in the figure), and R0 to R_2N-1 in the figure are 2N registers (where R is the abbreviation of Register). It should be understood that the number of storage areas may be set arbitrarily and is not limited by the figure. In the figure, the storage layouts corresponding to even tasks and odd tasks (which can be distinguished by the hash value described above) are staggered in the first data processing mode, and likewise in the second data processing mode. Because the storage Bank positions of odd and even tasks differ, odd and even tasks can read information from a plurality of register columns (also called a plurality of Banks, sharing the same communication channel) at the same time in the reading stage, which helps improve the data processing speed. The specific interleaving manner is not limited herein, and different hash values may correspond to different storage orders of the data to be operated on.
In a possible implementation manner, the interaction address includes a hash value, and obtaining the interaction address corresponding to each piece of data to be operated on in each of the plurality of data groups in step S30 may include: generating a hash value corresponding to each data group to be operated on according to the task identifier corresponding to that data group and the sampling identifier corresponding to that data group. The sampling identifier represents the different sampling points processed by each of the plurality of data groups to be operated on. The interaction address corresponding to each piece of data to be operated on in each data group is then generated according to the hash value corresponding to that data group. Data to be operated on in data groups with different hash values are stored in different physical storage areas. Illustratively, the hash value is generated as follows: task_hash = ((task & 0x1) ^ (sample & 0x1)) << (const_addr_bit - 1), where sample is the sampling identifier. The sampling identifier may be generated by the computing core; reference may be made to the related art, which is not described in detail in this disclosure. The data processing method provided by the embodiment of the disclosure considers not only the influence of different task identifiers on the storage Bank position but also the influence of the sampling identifier on the storage Bank position, which helps balance the subsequent pipelines when the data to be operated on is accessed and reduces access conflicts.
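A short sketch of a hash that combines the task identifier and the sampling identifier is given below. The combining operator is assumed here to be a bitwise XOR of the two parity bits, because the operator is not clearly legible in the pseudo code; the function name task_sample_hash and the default const_addr_bit of 11 are likewise assumptions for illustration.

```python
def task_sample_hash(task, sample, const_addr_bit=11):
    """Hash bit from task and sample parity (XOR combination is an assumption)."""
    parity = (task & 0x1) ^ (sample & 0x1)
    # Position the single hash bit directly above the intermediate address.
    return parity << (const_addr_bit - 1)

if __name__ == "__main__":
    for task in (0, 1):
        for sample in (0, 1):
            print(task, sample, task_sample_hash(task, sample))
```

Under this assumption, two data groups with the same task parity but different sample parity still receive different hash bits, so they can be steered to different Bank positions.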
In one possible implementation manner, the register set includes one storage area, and acquiring the data to be operated on stored in the register set based on the interaction address includes: determining the row address and the column address of each piece of data to be operated on in the corresponding storage area according to the interaction address corresponding to that data. The row address and the column address represent the position of a register in the storage area. The corresponding register is then accessed according to the row address and the column address of each piece of data to be operated on in the corresponding storage area, so as to obtain each piece of data to be operated on. For example, when the register set includes only one storage area, each interaction address corresponds to a unique row address and a unique column address, and the corresponding register in the register set can be accessed directly through the row address and the column address to obtain the data to be operated on. The specific correspondence between interaction addresses and row and column addresses can be determined according to actual requirements.
In one possible implementation, the register set includes a plurality of storage areas. In one example, acquiring the data to be operated on stored in the register set based on the interaction address in step S200 may include: for each of the plurality of data to be operated on, determining the interaction address and the storage area identifier corresponding to that data according to the plurality of data to be operated on. The storage area identifier is used for determining the storage area corresponding to each data group when the plurality of data groups to be operated on are allocated to the plurality of storage areas. For example, each of the plurality of data groups to be operated on may correspond to a storage area identifier; the specific allocation rule is not limited herein, and, for example, the data groups may be allocated according to the storage space weight of each storage area. In one example, allocating the plurality of data groups to be operated on to the at least one storage area may include: for each of the plurality of data groups to be operated on, distributing that data group evenly across the storage areas. Taking WAVE32 as the data processing mode and four storage areas as an example, the 32 threads in a WAVE32 (each thread stores one piece of data to be operated on) can be divided into 4 parts, every 8 consecutive threads forming a small thread block with a thread block number; that is, the thread block of threads 0 to 7 is stored in storage area 0, threads 8 to 15 in storage area 1, threads 16 to 23 in storage area 2, and threads 24 to 31 in storage area 3.
Illustratively, taking a total of four storage areas as an example, the storage area identifier may be the lower two bits of the thread block number, which indicate the storage area into which the data is stored. The storage area corresponding to each piece of data to be operated on is then determined according to the storage area identifier corresponding to that data. Illustratively, storage area identifiers correspond one-to-one to storage areas. The row address and the column address of each piece of data to be operated on in the corresponding storage area are determined according to the interaction address corresponding to that data. The row address and the column address represent the position of a register in the storage area. Finally, the register in the corresponding storage area is accessed according to the row address and the column address of each piece of data to be operated on, so as to obtain each piece of data to be operated on. The row address and the column address indicate the row and the column of the register to be accessed in the register set, so that the register can be accessed accurately.
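The mapping from thread number to storage area in the four-area example can be sketched as follows. The constant THREADS_PER_BLOCK and the function name storage_area_id are hypothetical; the block size of 8 threads and the use of the lower two bits of the thread block number follow the example above.

```python
THREADS_PER_BLOCK = 8  # every 8 consecutive threads form one small thread block

def storage_area_id(thread):
    """Storage area identifier: lower two bits of the thread block number."""
    block_number = thread // THREADS_PER_BLOCK
    return block_number & 0x3

if __name__ == "__main__":
    # Threads 0-7 -> area 0, 8-15 -> area 1, 16-23 -> area 2, 24-31 -> area 3.
    print([storage_area_id(t) for t in range(32)])
```

Because only the lower two bits are kept, thread blocks 4 to 7 of a wider mode wrap around to storage areas 0 to 3 again in this sketch.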
In a possible implementation manner, obtaining the interaction address corresponding to each piece of data to be operated on in the plurality of data groups in step S30 may include: generating the interaction address corresponding to each piece of data to be operated on in each of the plurality of data groups; and generating the segment number corresponding to each piece of data to be operated on according to the thread number corresponding to that data and the total number of threads of the preset data processing mode. The segment number represents the segment offset produced when each piece of data to be operated on is stored in the preset data processing mode. For example, each piece of data to be operated on may correspond to one thread; if the data processing mode corresponding to the data to be operated on is WAVE32, the thread number may be 0 to 31, and if it is WAVE128, the thread number may be 0 to 127. The total number of threads of the preset data processing mode is 32 if WAVE32 is taken as an example, and 128 if WAVE128 is taken as an example. The segment number may be obtained by integer division of the thread number by this total number of threads: for example, with thread number 30 and a total of 32 threads, the segment number is 0 (30/32 = 0); with thread number 66 (for example, when the data processing mode corresponding to the data to be operated on is WAVE128) and a total of 32 threads, the segment number is 2 (66/32 = 2). Referring to fig.
4, the first data processing mode produces a segment offset when it is adapted to the preset data processing mode. Taking the first data processing mode as WAVE128 and the preset data processing mode as WAVE32 as an example, WAVE128 has 128 threads and WAVE32 has 32 threads, so processing WAVE128 data in the WAVE32 format is equivalent to WAVE128 comprising four WAVE32 segments. If each row of registers can store the data to be operated on corresponding to one WAVE32, 4 rows (or 4 segments) are required, and WAVE128 comprises four segments of data to be operated on with segment numbers 0 to 3 (segments 0 to 3 in the figure). The embodiment of the disclosure handles this situation by setting the segment number, so as to process data to be operated on in different data processing modes compatibly.
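The segment number computation can be sketched as a single integer division, assuming WAVE32 (32 threads) as the preset data processing mode. The function name segment_number and the parameter name preset_threads are hypothetical.

```python
def segment_number(thread, preset_threads=32):
    """Segment number: thread number divided by the preset mode's thread count."""
    return thread // preset_threads

if __name__ == "__main__":
    print(segment_number(30))   # a WAVE32 thread
    print(segment_number(66))   # a WAVE128 thread
    # A WAVE128 spans four WAVE32-sized segments.
    print(sorted(set(segment_number(t) for t in range(128))))
```

Thread 30 falls in segment 0 and thread 66 in segment 2, in agreement with the worked values 30/32 = 0 and 66/32 = 2 above.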
Illustratively, the interaction address is first processed to obtain physical_addr1, with the following pseudo code: physical_addr1 = ac_addr2 + burst_offset, where burst_offset is the accumulated value corresponding to a repeat instruction or burst instruction in the related art. For example, the pipeline may obtain physical_addr1 and then process it to obtain the parameters required for accessing the data to be operated on. Illustratively, the pseudo code is as follows: base_align_mod4 = (physical_addr1 >> const_addr_bit) & 0x3; task_hash = ((physical_addr1 >> (const_addr_bit - 1)) & 0x1) << 1; segment_hash = (4 - segment); physical_addr2 = physical_addr1 & ((0x1 << (const_addr_bit - 1)) - 0x1). The segment_num is the total number of segments of the Wave, and segment_hash (the segment number hash) plays a role similar to the hash value above: it improves the balance of the pipeline during access, i.e., reduces the probability of access conflicts. In one example, determining the row address and the column address of each piece of data to be operated on in the corresponding storage area according to the interaction address corresponding to that data includes: generating the row address of each piece of data to be operated on in the corresponding storage area according to the interaction address and the segment number corresponding to that data. Illustratively, the pseudo code for generating the row address line_addr is as follows: line_addr = ((physical_addr2 >> 2) << base_shift_bit) + segment + base_align_mod4.
The column address of each piece of data to be operated on in the corresponding storage area is then generated according to the interaction address, the segment number, and the hash value corresponding to that data, or according to the interaction address and the segment number corresponding to that data. Illustratively, the pseudo code for generating the column address bank_addr is as follows: bank_addr = (task_hash + segment_hash + (physical_addr2 & 0x3)) % bank_num, where bank_num is the total number of Banks in one storage area. In one example, to better support the operation of the pipeline, the operation of reading the data to be operated on may be placed inside the pipeline, so that when the pipeline is ready to execute an instruction to be operated on, the data is read first and then operated on. The segment number better indicates accesses for data processing modes whose thread count exceeds that of the preset data processing mode, so that data to be operated on in different data processing modes can be read compatibly. In another example, when different pipelines read or write the ports of the same register set at the same time, arbitration may be performed according to the priority of each pipeline; in one example, read operations and write operations may also be arbitrated separately. The priority and the specific arbitration rules are not limited herein and may be set by the developer according to the actual situation. When the pipeline finally accesses the register set, the embodiment of the disclosure can remove the unaligned part from physical_addr1 and then separate out the storage area row address alignment offset, the hash value, and the storage area intermediate address.
physical_addr2 may be the storage area intermediate address; it is shifted right and then shifted left by the number of bits required for alignment, and the segment number, the hash value, and the storage area row address alignment offset are then added to obtain the row address that is finally accessed. The hash value, the segment number hash, and the base_shift_bit bits of physical_addr2 may be summed, and the remainder taken with respect to the total number of segments, to obtain the column address. In the embodiment of the disclosure, the data to be operated on and the instruction to be operated on fetched from each pipeline can be executed dynamically and cyclically multiple times in a vector operation mode, so that resource utilization is higher. In a hardware implementation, each computing core may internally include a WAVE scheduling unit and an instruction scheduling and execution unit to manage different WAVEs compatibly; scheduling can be performed according to the execution granularity and execution state of each WAVE, which can improve scheduling efficiency and the utilization of each pipeline.
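The decoding of physical_addr1 into a row address and a column (Bank) address can be sketched as follows. This follows the pseudo code for line_addr and bank_addr literally and is not a definitive implementation: the function name decode_address, the default const_addr_bit of 11 (a 10-bit intermediate address plus one hash bit), and the default bank_num of 4 are assumptions, and the mask used to extract physical_addr2 is read as a bitwise AND.

```python
def decode_address(physical_addr1, segment, base_shift_bit,
                   const_addr_bit=11, bank_num=4):
    """Return (line_addr, bank_addr) for one piece of data to be operated on."""
    # Two bits above const_addr_bit: storage area row address alignment offset.
    base_align_mod4 = (physical_addr1 >> const_addr_bit) & 0x3
    # Hash bit sits at position const_addr_bit - 1; scaled by 2 as in the pseudo code.
    task_hash = ((physical_addr1 >> (const_addr_bit - 1)) & 0x1) << 1
    segment_hash = 4 - segment
    # Low const_addr_bit - 1 bits: storage area intermediate address.
    physical_addr2 = physical_addr1 & ((0x1 << (const_addr_bit - 1)) - 0x1)
    line_addr = ((physical_addr2 >> 2) << base_shift_bit) + segment + base_align_mod4
    bank_addr = (task_hash + segment_hash + (physical_addr2 & 0x3)) % bank_num
    return line_addr, bank_addr

if __name__ == "__main__":
    # Alignment offset 2, hash bit 1, intermediate address 5, WAVE128 segment 2.
    addr = (2 << 11) | (1 << 10) | 5
    print(decode_address(addr, segment=2, base_shift_bit=2))
```

In this sketch the two low bits of the intermediate address select the Bank and the remaining bits select the row, so the task hash and the segment hash only rotate the Bank choice.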
According to the data processing method provided by the embodiment of the disclosure, each group of computing cores can be directly compatible with the WAVE32 data processing mode, and can also be compatible with WAVE64 by cycling twice and with WAVE128 by cycling four times. The graphics rendering pipeline in the graphics processor can use the WAVE32 mode to execute the producer-consumer model in the related art, can also be compatible with the simultaneous processing of WAVEs in multiple modes, and can dynamically configure and assemble WAVEs to process data according to the specific application scenario.
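The cycling behaviour can be illustrated with a small sketch in which a computing core that is 32 threads wide walks through a wider WAVE in successive passes. The function name passes_for_wave and the parameter core_width are hypothetical; the sketch only counts passes and does not model scheduling.

```python
def passes_for_wave(wave_threads, core_width=32):
    """Split a WAVE into the thread ranges handled in each pass of the core."""
    return [range(start, start + core_width)
            for start in range(0, wave_threads, core_width)]

if __name__ == "__main__":
    for name, threads in (("WAVE32", 32), ("WAVE64", 64), ("WAVE128", 128)):
        print(name, len(passes_for_wave(threads)), "pass(es)")
```

Each pass in this sketch corresponds to one segment number, so a WAVE128 is covered by segments 0 to 3 in four passes.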
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principle and logic; owing to space limitations, these are not described in detail in the present disclosure. It will be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the disclosure further provides an electronic device, a computer-readable storage medium, and a program, all of which may be used to implement any data processing method provided by the disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
Referring to fig. 5, fig. 5 shows a block diagram of a graphics processor provided according to an embodiment of the disclosure, and in conjunction with fig. 5, the graphics processor 100 includes: a computing core 110, a register set 120 coupled to the computing core. In one example, the graphics processor 100 may be connected to a host, and the computing core may be configured to receive instructions to be operated on; determining an operation mode and an interaction address corresponding to the data to be operated according to the instruction to be operated, and acquiring the data to be operated stored in the register group based on the interaction address; wherein the interactive address includes: a storage area row address alignment offset, a storage area intermediate address; the row address alignment offset of the storage area is used for representing the address offset condition of the data to be operated in the storage area under a preset data processing mode; the storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area; and generating an operation result according to the data to be operated and the operation mode.
In a possible implementation, the register set includes at least one storage area, and the computing core is further configured to: receiving a plurality of data to be operated; dividing the data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the data to be operated; wherein, the data quantity to be operated in the data group to be operated corresponding to different data processing modes is different; distributing the plurality of data groups to be operated to the at least one storage area to obtain an interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated; wherein each of the plurality of data sets to be operated on is allocated into at least one storage area.
In a possible implementation manner, obtaining the interaction address corresponding to each piece of data to be operated on in the plurality of data groups to be operated on includes: taking the address bits of a preset number of bits, counted from the lowest bit of the base address of the logical address corresponding to each piece of data to be operated on, as the storage area row address alignment offset in the corresponding interaction address; and, after shifting the remaining address bits left by the preset number of bits, adding the logical offset address and taking the resulting sum as the storage area intermediate address in the corresponding interaction address; wherein the logical offset address is the address offset of the logical address corresponding to each piece of data to be operated on.
In a possible implementation manner, the dividing the plurality of data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the plurality of data to be operated includes: dividing the data to be operated into a plurality of data groups to be operated corresponding to task identifiers according to the data processing modes corresponding to the data to be operated; wherein the task identifier is used for mapping each to-be-operated data group in the plurality of to-be-operated data groups to a base address position in the at least one storage area.
In a possible implementation manner, the interaction address includes: the step of obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes: generating a hash value corresponding to each data group to be operated according to the task identifier corresponding to each of the plurality of data groups to be operated, wherein data groups to be operated that are adjacent according to their task identifiers have different hash values; and generating the interaction address corresponding to each data to be operated in each data group to be operated according to the hash value corresponding to that data group; wherein data to be operated in data groups to be operated that correspond to different hash values are stored in different physical storage areas.
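A minimal Python sketch of a hash with the property described above is given below. The modulo function is a stand-in chosen for illustration and is not the hash of the disclosure; `group_hash` and `num_areas` are hypothetical names. With at least two storage areas, groups with adjacent task identifiers always receive different hash values and therefore land in different physical storage areas.

```python
def group_hash(task_id, num_areas):
    """Hypothetical hash: adjacent task identifiers map to different values."""
    if num_areas < 2:
        raise ValueError("at least two storage areas are needed")
    return task_id % num_areas

def storage_area_for_group(task_id, num_areas):
    """Pick the physical storage area of a group from its hash value."""
    return group_hash(task_id, num_areas)
```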
In a possible implementation manner, the interaction address includes: the step of obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes: generating a hash value corresponding to each data group to be operated according to the task identifier corresponding to each of the plurality of data groups to be operated and the sampling identifier corresponding to that data group, wherein the sampling identifier indicates the different sampling points for which each of the plurality of data groups to be operated is processed; and generating the interaction address corresponding to each data to be operated in each data group to be operated according to the hash value corresponding to that data group; wherein data to be operated in data groups to be operated that correspond to different hash values are stored in different physical storage areas.
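The variant that combines a task identifier with a sampling identifier can be sketched as follows. The XOR-and-mask combination is an illustrative assumption, not the hash of the disclosure, and it assumes a power-of-two number of storage areas; all names are hypothetical.

```python
def group_hash(task_id, sample_id, num_areas):
    """Hypothetical hash over a task identifier and a sampling identifier."""
    if num_areas < 2 or num_areas & (num_areas - 1):
        raise ValueError("num_areas must be a power of two and at least 2")
    # XOR mixes both identifiers; the mask keeps the result in range.
    return (task_id ^ sample_id) & (num_areas - 1)
```

Under these assumptions, the sampling points of one task (for sampling identifiers below `num_areas`) map to distinct storage areas, and adjacent tasks at the same sampling point also differ.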
In a possible implementation, the register set includes one storage area, and acquiring the data to be operated stored in the register set based on the interaction address includes: determining a row address and a column address of each data to be operated in the corresponding storage area according to the interaction address corresponding to that data, wherein the row address and the column address represent the position of a register in the storage area; and accessing the corresponding register according to the row address and the column address of each data to be operated in the corresponding storage area to obtain each data to be operated.
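The row and column lookup described above can be sketched with a small Python model of a storage area. The row-major split of the address into row and column, and the class and method names, are assumptions made for illustration; this passage of the disclosure does not fix the exact mapping.

```python
class StorageArea:
    """Hypothetical storage area: a grid of registers addressed by row and column."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.registers = [[0] * cols for _ in range(rows)]

    def locate(self, interaction_address):
        # Assumption: row-major layout, so divmod yields (row, column).
        return divmod(interaction_address, self.cols)

    def write(self, interaction_address, value):
        row, col = self.locate(interaction_address)
        self.registers[row][col] = value

    def read(self, interaction_address):
        row, col = self.locate(interaction_address)
        return self.registers[row][col]
```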
In a possible implementation, the register set includes a plurality of storage areas, and acquiring the data to be operated stored in the register set based on the interaction address includes: for each of the plurality of data to be operated, determining an interaction address and a storage area identifier corresponding to that data according to the plurality of data to be operated, wherein the storage area identifier is used to determine the storage area corresponding to each of the plurality of data groups to be operated when the plurality of data groups to be operated are allocated into the plurality of storage areas; determining the storage area corresponding to each data to be operated according to the storage area identifier corresponding to that data; determining a row address and a column address of each data to be operated in the corresponding storage area according to the interaction address corresponding to that data, wherein the row address and the column address represent the position of a register in the storage area; and accessing a register in the corresponding storage area according to the row address and the column address of each data to be operated in the corresponding storage area to obtain each data to be operated.
In a possible implementation, obtaining the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated includes: generating the interaction address corresponding to each data to be operated in each of the plurality of data groups to be operated; and generating a segment number corresponding to each data to be operated according to the thread number corresponding to that data and the total number of threads in a segment corresponding to the preset data processing mode, wherein the segment number represents the segment offset produced by storing each data to be operated in the preset data processing mode. Determining the row address and the column address of each data to be operated in the corresponding storage area according to the interaction address corresponding to that data includes: generating the row address of each data to be operated in the corresponding storage area according to the interaction address and the segment number corresponding to that data; and generating the column address of each data to be operated in the corresponding storage area either according to the interaction address, the segment number, and the hash value corresponding to that data, or according to the interaction address and the segment number corresponding to that data.
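The following Python sketch illustrates the segment number and one possible way to derive row and column addresses from it. The integer division for the segment number is a natural reading of the text, while the row and column formulas, `rows_per_segment`, and `num_columns` are purely hypothetical placeholders for whatever mapping the hardware actually uses.

```python
def segment_number(thread_number, threads_per_segment):
    """Segment that a thread falls into, by integer division (assumed)."""
    return thread_number // threads_per_segment

def row_and_column(interaction_address, segment, num_columns,
                   rows_per_segment, hash_value=None):
    """Hypothetical mapping from (address, segment, optional hash) to (row, column)."""
    # Row: the segment selects a block of rows; the address selects a row in it.
    row = segment * rows_per_segment + interaction_address // num_columns
    # Column: uses the address and segment, plus the hash value when one is given.
    column = interaction_address + segment
    if hash_value is not None:
        column += hash_value
    return row, column % num_columns
```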
In a possible implementation, allocating the plurality of data groups to be operated to the at least one storage area includes: for each of the plurality of data groups to be operated, distributing that data group evenly across the storage areas.
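Even distribution of one data group across the storage areas can be sketched as below. Round-robin striping is an assumed policy chosen for illustration, and `distribute_evenly` is a hypothetical name.

```python
def distribute_evenly(group, num_areas):
    """Stripe the items of one data group round-robin across the storage areas."""
    areas = [[] for _ in range(num_areas)]
    for index, item in enumerate(group):
        areas[index % num_areas].append(item)
    return areas
```

Striping keeps the per-area item counts within one of each other, so no single storage area receives the whole group.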
In a possible implementation, acquiring the data to be operated stored in the register set based on the interaction address includes: acquiring, through a plurality of pipelines of the graphics processor, the data to be operated stored in the register set based on the interaction address; and generating an operation result according to the data to be operated and the operation mode includes: generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each data to be operated according to the data to be operated and the operation mode.
In a possible implementation, generating, through the plurality of pipelines of the graphics processor, an operation result corresponding to each data to be operated according to the data to be operated and the operation mode includes: when different pipelines among the plurality of pipelines of the graphics processor access a target port of the register set at the same time, arbitrating according to the priorities of the different pipelines and/or the priorities of read and write operations to determine the target pipeline corresponding to the target port; and generating, first through the target pipeline and then through the other pipelines among the different pipelines in sequence, an operation result corresponding to each data to be operated according to the data to be operated and the operation mode.
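The port arbitration described above can be sketched in Python as a priority sort. The `Request` fields, the convention that a larger number means higher priority, and the `writes_first` switch are assumptions for illustration; the disclosure only states that pipeline priority and/or read/write priority decide the winner.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    pipeline: str
    pipeline_priority: int  # assumption: larger value means higher priority
    is_write: bool

def arbitrate(requests, writes_first=True):
    """Order conflicting requests; the first entry is the target pipeline."""
    def key(request):
        # Rank the operation type according to the chosen read/write priority.
        op_rank = request.is_write if writes_first else not request.is_write
        return (request.pipeline_priority, op_rank)
    # Highest key first; the remaining pipelines are served in the order returned.
    return sorted(requests, key=key, reverse=True)
```

The first element of the returned list plays the role of the target pipeline, and the rest are served after it in sequence.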
The method is specifically tied to the internal structure of the computer system and addresses the technical problem of improving hardware operating efficiency or execution effect (including reducing the amount of data stored, reducing the amount of data transmitted, and increasing hardware processing speed), thereby achieving a technical effect, consistent with the laws of nature, of improving the internal performance of the computer system.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The embodiment of the disclosure also provides a computer readable and writable data storage medium, on which computer program instructions and data to be processed are stored, the computer program instructions realizing the above method when being executed by a processor. The computer readable and writable data storage medium may be a volatile or non-volatile computer readable and writable data storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-volatile computer readable and writable data storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The electronic device may be provided as a terminal device, a server or other form of device.
Referring to fig. 6, fig. 6 illustrates a block diagram of an electronic device 1900 provided in accordance with an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server or terminal device. Referring to FIG. 6, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable and writable data storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable and writable data storage medium on which computer readable program instructions are loaded for causing a processor to implement aspects of the present disclosure.
The computer readable and writable data storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable and writable data storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable and writable data storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable and writable data storage medium as used herein is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse passing through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable writable data storage medium to the respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable, writable data storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized using state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable and writeable data storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the various embodiments tends to emphasize the differences between them; for their same or similar parts, the embodiments may be referred to one another, and these parts are not repeated herein for the sake of brevity.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs the individual of the personal information processing rules and obtains the individual's voluntary consent before processing the personal information. If the technical solution of the present application involves sensitive personal information, a product applying the technical solution obtains individual consent before processing the sensitive personal information and also meets the requirement of "explicit consent". For example, at a personal information collection device such as a camera, a clear and conspicuous sign is set up to inform people that they are entering the personal information collection range and that personal information will be collected; if an individual voluntarily enters the collection range, the individual is deemed to consent to the collection of his or her personal information. Alternatively, on a device that processes personal information, where the personal information processing rules are communicated by means of conspicuous signs or information, personal authorization is obtained through a pop-up message or by asking the individual to upload his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. The data processing method is characterized by being applied to a graphic processor, wherein the graphic processor comprises a computing core and a register group, and the computing core is used for converting logical addresses of data to be operated in different data processing modes in the register group into interactive addresses in a preset data processing mode; the processing method comprises the following steps:
receiving an instruction to be operated;
according to the instruction to be operated, determining an operation mode and an interaction address corresponding to the data to be operated, and acquiring the data to be operated stored in the register group based on the interaction address; wherein the interactive address includes: a storage area row address alignment offset, a storage area intermediate address; the row address alignment offset of the storage area is used for representing the address offset condition of the data to be operated in the storage area under the preset data processing mode; the storage area intermediate address is used for representing the intermediate address of the data to be operated in the corresponding storage area;
generating an operation result according to the data to be operated and the operation mode;
the row address alignment offset of the storage area is an address bit with a preset bit number from the lowest bit in a base address of a logic address corresponding to the data to be operated, the middle address of the storage area is a sum value obtained by shifting left of the rest address bit by the preset bit number and adding the logic offset address, and the logic offset address is the address offset of the logic address corresponding to the data to be operated.
2. The processing method of claim 1, wherein the register set includes at least one memory region, the processing method further comprising:
receiving a plurality of data to be operated;
dividing the data to be operated into a plurality of data groups to be operated according to the data processing modes corresponding to the data to be operated; wherein, the data quantity to be operated in the data group to be operated corresponding to different data processing modes is different;
distributing the plurality of data groups to be operated to the at least one storage area to obtain an interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated; wherein each of the plurality of data sets to be operated on is allocated into at least one storage area.
3. The processing method as set forth in claim 1, wherein the dividing the plurality of data to be operated into a plurality of data groups to be operated according to the data processing mode corresponding to the plurality of data to be operated includes:
dividing the data to be operated into a plurality of data groups to be operated corresponding to task identifiers according to the data processing modes corresponding to the data to be operated; wherein the task identifier is used for mapping each to-be-operated data group in the plurality of to-be-operated data groups to a base address position in the at least one storage area.
4. A processing method as claimed in claim 3, wherein the interaction address comprises: the step of obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes:
generating a hash value corresponding to each data group to be operated according to a task identifier corresponding to each data group to be operated in the plurality of data groups to be operated; the hash values between adjacent data groups to be operated are different according to the task identification;
generating an interaction address corresponding to each data to be operated in each data group to be operated according to the hash value corresponding to each data group to be operated; and storing the data to be operated in the data group to be operated corresponding to different hash values into different physical storage areas.
5. A processing method as claimed in claim 3, wherein the interaction address comprises: the step of obtaining the interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated includes:
generating a hash value corresponding to each data group to be operated according to a task identifier corresponding to each data group to be operated in the plurality of data groups to be operated and a sampling identifier corresponding to the data group to be operated; wherein the sampling mark is used for representing different sampling points of each data group to be operated aiming at processing in the plurality of data groups to be operated;
generating an interaction address corresponding to each data to be operated in each data set to be operated according to the hash value corresponding to each data set to be operated; and storing the data to be operated in the data group to be operated corresponding to different hash values into different physical storage areas.
6. The processing method as claimed in claim 1, wherein said register set includes a memory area, said acquiring data to be operated on stored in said register set based on said interactive address includes:
determining a row address and a column address of each data to be operated in a corresponding storage area according to the interaction address corresponding to each data to be operated; wherein the row address and the column address are used for representing the positions of registers in the storage area;
and accessing a corresponding register according to the row address and the column address of each piece of data to be operated in the corresponding storage area to obtain each piece of data to be operated.
7. The processing method according to claim 1, wherein the register set includes a plurality of memory areas, and the acquiring the data to be operated on stored in the register set based on the interactive address includes:
for each piece of data to be operated in the plurality of pieces of data to be operated, determining an interaction address and a storage area identifier corresponding to each piece of data to be operated according to the plurality of pieces of data to be operated; the storage area identifier is used for determining a storage area corresponding to each data group to be operated in the plurality of data groups to be operated when the plurality of data groups to be operated are distributed into a plurality of storage areas;
determining a storage area corresponding to each piece of data to be operated according to the storage area identifier corresponding to each piece of data to be operated;
determining a row address and a column address of each piece of data to be operated in a corresponding storage area according to the interaction address corresponding to each piece of data to be operated; wherein the row address and the column address are used for representing the positions of registers in the storage area;
and accessing a register in the corresponding storage area according to the row address and the column address of each piece of data to be operated in the corresponding storage area to obtain each piece of data to be operated.
8. The processing method as set forth in claim 2, wherein the obtaining the interaction address corresponding to each data to be operated in the plurality of data to be operated sets includes:
generating an interaction address corresponding to each data to be operated in each data group to be operated in the plurality of data groups to be operated;
generating a segment number corresponding to each piece of data to be operated according to the thread number corresponding to the data to be operated and the total number of threads corresponding to a preset data processing mode; the segment number represents segment offset generated by storing each piece of data to be operated in the preset data processing mode;
the determining the row address and the column address of each data to be operated in the corresponding storage area according to the interaction address corresponding to each data to be operated includes:
generating a row address of each data to be operated in a corresponding storage area according to the interactive address and the segment number corresponding to each data to be operated;
generating a column address of each data to be operated in a corresponding storage area according to the interaction address, the segment number and the hash value corresponding to each data to be operated, or generating a column address of each data to be operated in a corresponding storage area according to the interaction address and the segment number corresponding to each data to be operated.
9. The processing method as claimed in claim 2, wherein said allocating the plurality of data groups to be operated on into the at least one memory area includes: and for each data group to be operated in the plurality of data groups to be operated, uniformly distributing and sending each data group to be operated to each storage area.
10. The processing method of claim 1, wherein the acquiring the data to be operated on stored in the register set based on the interaction address comprises: acquiring data to be operated stored in the register group based on the interaction address through a plurality of pipelines of the graphic processor;
the generating an operation result according to the data to be operated and the operation mode comprises the following steps:
and generating an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode through a plurality of pipelines of the graphic processor.
11. The processing method as set forth in claim 10, wherein the generating, by the pipelines of the graphics processor, an operation result corresponding to each data to be operated according to the data to be operated and the operation mode includes:
when different pipelines among the plurality of pipelines of the graphics processor access a target port of the register set at the same time, arbitrating according to the priorities of the different pipelines and/or the priorities of read and write operations, and determining a target pipeline corresponding to the target port;
and sequentially passing through the target pipeline and other pipelines in different pipelines, and generating an operation result corresponding to each piece of data to be operated according to the data to be operated and the operation mode.
12. A graphics processor, comprising: a computing core and a register group connected to the computing core, wherein the computing core is configured to convert logical addresses, in the register group, of data to be operated on under different data processing modes into interaction addresses under a preset data processing mode;
the computing core is further configured to: receive an instruction to be operated on; determine, according to the instruction to be operated on, an operation mode and an interaction address corresponding to the data to be operated on; and acquire the data to be operated on stored in the register group based on the interaction address; wherein the interaction address comprises a storage area row address alignment offset and a storage area intermediate address; the storage area row address alignment offset represents the address offset, within the storage area, of the data to be operated on under the preset data processing mode; the storage area intermediate address represents the intermediate address of the data to be operated on within the corresponding storage area; and the computing core is further configured to generate an operation result according to the data to be operated on and the operation mode;
wherein the storage area row address alignment offset consists of a preset number of address bits, counted from the least significant bit, of the base address of the logical address corresponding to the data to be operated on; the storage area intermediate address is the sum obtained by shifting the remaining address bits left by the preset number of bits and adding the logical offset address; and the logical offset address is the address offset of the logical address corresponding to the data to be operated on.
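The address conversion recited in claim 12 can be illustrated with a short sketch that follows the translated wording literally. The function name `to_interaction_address` and its parameter names are hypothetical, and the reading that "the remaining address bits" means the base address with its low bits dropped is an assumption.

```python
def to_interaction_address(base, logical_offset, preset_bits):
    """Split a logical address (base + offset) into the two claimed fields.

    Returns (row_align_offset, intermediate_address).
    """
    if preset_bits < 0:
        raise ValueError("preset_bits must be non-negative")
    # Row address alignment offset: the lowest `preset_bits` bits of the base.
    row_align_offset = base & ((1 << preset_bits) - 1)
    # Remaining address bits: the base with those low bits dropped (assumed reading).
    remaining = base >> preset_bits
    # Intermediate address: remaining bits shifted left, plus the logical offset.
    intermediate_address = (remaining << preset_bits) + logical_offset
    return row_align_offset, intermediate_address


# base = 0b101101 (45), preset_bits = 2, logical_offset = 3
row, mid = to_interaction_address(0b101101, 3, 2)
```

For this input the two low bits `0b01` give a row alignment offset of 1, and the remaining bits `0b1011` shifted left by two give 44, so the intermediate address is 44 + 3 = 47.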
13. An electronic device, comprising: a host and the graphics processor of claim 12.
14. A computer readable and writable data storage medium storing computer program instructions and data to be operated on, wherein the computer program instructions, when executed by a processor, implement the processing method of any one of claims 1 to 11.
CN202310612804.6A 2023-05-26 2023-05-26 Data processing method, graphic processor, electronic device and storage medium Active CN116360858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310612804.6A CN116360858B (en) 2023-05-26 2023-05-26 Data processing method, graphic processor, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310612804.6A CN116360858B (en) 2023-05-26 2023-05-26 Data processing method, graphic processor, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN116360858A (en) 2023-06-30
CN116360858B (en) 2023-08-29

Family

ID=86922445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310612804.6A Active CN116360858B (en) 2023-05-26 2023-05-26 Data processing method, graphic processor, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116360858B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688158A (en) * 2017-07-20 2020-01-14 上海寒武纪信息科技有限公司 Computing device and processing system of neural network
CN111381872A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111897579A (en) * 2020-08-18 2020-11-06 腾讯科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT201700082213A1 (en) * 2017-07-19 2019-01-19 Univ Degli Studi Di Siena PROCEDURE FOR AUTOMATIC GENERATION OF PARALLEL CALCULATION CODE

Also Published As

Publication number Publication date
CN116360858A (en) 2023-06-30

Similar Documents

Publication Publication Date Title
US20190102671A1 (en) Inner product convolutional neural network accelerator
US8941674B2 (en) System and method for efficient resource management of a signal flow programmed digital signal processor code
US11216281B2 (en) Facilitating data processing using SIMD reduction operations across SIMD lanes
CN116431099B (en) Data processing method, multi-input-output queue circuit and storage medium
US10229044B2 (en) Conditional stack frame allocation
US20200319861A1 (en) Compiling a Program from a Graph
US20240086359A1 (en) Dynamic allocation of arithmetic logic units for vectorized operations
US11755320B2 (en) Compute array of a processor with mixed-precision numerical linear algebra support
WO2012129446A1 (en) Register allocation for graphics processing
US20070101320A1 (en) Method for scheduling instructions and method for allocating registers using the same
CN116360858B (en) Data processing method, graphic processor, electronic device and storage medium
US9760282B2 (en) Assigning home memory addresses to function call parameters
KR102178290B1 (en) Decimal multiply and shift command
US11494326B1 (en) Programmable computations in direct memory access engine
CN112463218B (en) Instruction emission control method and circuit, data processing method and circuit
CN112232003B (en) Method for simulating design, electronic device and storage medium
US8510539B2 (en) Spilling method involving register files based on communication costs and use ratio
US20240103813A1 (en) Compute engine with transpose circuitry
CN115934102B (en) Dynamic allocation method and device for general registers, computer equipment and storage medium
US20240111528A1 (en) Programmable compute engine having transpose operations
US20220019531A1 (en) Allocating Variables to Computer Memory
US20220350570A1 (en) Pipelined hardware to accelerate modular arithmetic operations
US20200356371A1 (en) Reusing an operand in an instruction set architecture (isa)
CN117540669A (en) Method and device for processing structured data of digital circuit
CN112905181A (en) Model compiling and running method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant