CN112148371A - Data operation method, device, medium and equipment based on single instruction multiple data streams - Google Patents


Info

Publication number
CN112148371A
Authority
CN
China
Prior art keywords: data, operated, SIMD, interface function, fixed point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910566415.8A
Other languages
Chinese (zh)
Other versions
CN112148371B (en)
Inventor
陈亮 (Chen Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910566415.8A
Publication of CN112148371A
Application granted
Publication of CN112148371B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 — Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 — Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3885 — Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F 9/3887 — Concurrent instruction execution using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]


Abstract

A method, apparatus, medium, and device for data operation based on single instruction multiple data (SIMD) streams are disclosed. The method comprises the following steps: generating an operation array according to at least one group of data to be operated on that participates in the same type of operation, wherein a group of data to be operated on comprises the data to be operated on that participate in the same operation within that type, and the operation array comprises: at least one first fixed point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed point number representing the exponent of the data to be operated on; calling the SIMD interface function corresponding to each operation on mantissas and exponents involved in the type of operation, and determining input parameters of the SIMD interface function according to elements in the operation array; and generating operation results corresponding to the at least one group of data to be operated on according to the results the SIMD interface function produces from those input parameters. The present disclosure helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point arithmetic.

Description

Data operation method, device, medium and equipment based on single instruction multiple data streams
Technical Field
The present disclosure relates to data processing technology, and in particular to a method, an apparatus, a storage medium, and an electronic device for data operation based on single instruction multiple data (SIMD) streams.
Background
SIMD (Single Instruction Multiple Data) technology has been introduced into processor architectures such as the ARM (Advanced RISC Machine) v series. With SIMD, parallel arithmetic on fixed-point numbers can be performed: a single instruction applies the same operation to multiple data lanes at once. How to support more data types on top of SIMD technology is a technical problem of considerable interest.
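As a rough behavioral illustration of what a SIMD fixed-point unit does (a model only; real SIMD is a single hardware instruction over packed registers, not a Python loop), the following sketch applies one 16-bit addition across every lane of two vectors:

```python
def simd_add_i16(a, b):
    """Model of a SIMD lane-wise 16-bit addition: one 'instruction' applied
    to every lane of two equal-length vectors, with 16-bit two's-complement
    wraparound, as SIMD fixed-point units behave (illustrative model only)."""
    assert len(a) == len(b), "SIMD operates on equal-length lane vectors"
    out = []
    for x, y in zip(a, b):
        s = (x + y) & 0xFFFF                           # wrap to 16 bits
        out.append(s - 0x10000 if s >= 0x8000 else s)  # reinterpret as signed
    return out

simd_add_i16([1, 2, 32767], [10, 20, 1])  # -> [11, 22, -32768]
```

The overflow in the last lane (32767 + 1 wrapping to -32768) shows why such units natively handle only fixed-point values of a set width.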
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a data operation method and device based on single instruction multiple data streams, a storage medium and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a method for data operation based on single instruction multiple data streams, the method including: generating an operation array according to at least one group of data to be operated on that participates in the same type of operation, wherein a group of data to be operated on comprises the data to be operated on that participate in the same operation within that type, and the operation array comprises: at least one first fixed point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed point number representing the exponent of the data to be operated on; calling the single instruction multiple data stream (SIMD) interface function corresponding to each operation on mantissas and exponents involved in the type of operation, and determining input parameters of the SIMD interface function according to elements in the operation array; and generating operation results corresponding to the at least one group of data to be operated on according to the results the SIMD interface function produces from the input parameters.
According to another aspect of the embodiments of the present disclosure, there is provided a single instruction multiple data stream-based data operation apparatus, including: an operation array generating module, configured to generate an operation array according to at least one group of data to be operated on that participates in the same type of operation, wherein a group of data to be operated on comprises all data to be operated on that participate in the same operation, and the operation array comprises: at least one first fixed point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed point number representing the exponent of the data to be operated on; an interface function calling module, configured to call the corresponding single instruction multiple data stream (SIMD) based interface function for each operation on mantissas and exponents involved in the operation, and to determine input parameters of the SIMD-based interface function according to elements in the operation array generated by the operation array generating module; and an operation result generating module, configured to generate operation results corresponding to the at least one group of data to be operated on according to the results of the SIMD-based interface functions called by the interface function calling module.
According to still another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, where the storage medium stores a computer program for executing the above-mentioned data operation method based on single instruction multiple data streams.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instruction from the memory and executing the instruction to realize the data operation method based on the single instruction multiple data streams.
In the data operation method and apparatus based on single instruction multiple data streams provided by the above embodiments of the present disclosure, the sign bit, mantissa, and exponent of each data item to be operated on are represented by a first fixed point number and a second fixed point number; the SIMD-based interface function is called for each operation on mantissas and exponents that the overall operation involves, and elements of the operation array supply its input parameters. In this way, a single call to the SIMD-based interface function carries out multiple component operations in parallel, and the results of those calls are combined into the operation results of the data to be operated on, thereby realizing operations on non-fixed-point numbers. The technical solution provided by the present disclosure therefore helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point arithmetic.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an embodiment of a SIMD data operation method according to the present disclosure;
FIG. 2 is a diagram illustrating an example of multiple groups of data to be operated on participating in the same type of operation according to the present disclosure;
FIG. 3 is a diagram illustrating a format of a single precision floating point number;
FIG. 4 is a schematic diagram of an example of a first operation array generated by the present disclosure;
FIG. 5 is a schematic diagram of an example of a second operation array generated by the present disclosure;
FIG. 6 is a schematic diagram of another example of a first operation array generated by the present disclosure;
FIG. 7 is a schematic diagram of another example of a second operation array generated by the present disclosure;
FIG. 8 is a schematic diagram of yet another example of a first operation array generated by the present disclosure;
FIG. 9 is a schematic diagram of yet another example of a second operation array generated by the present disclosure;
FIG. 10 is a block diagram illustrating an embodiment of a SIMD data operation apparatus according to the present disclosure;
FIG. 11 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two and "at least one" may refer to one, two or more than two.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. The character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with an electronic device such as a terminal device, computer system, or server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In implementing the present disclosure, the inventors found that SIMD technology can realize parallel arithmetic on fixed-point numbers (for example, 16-bit), but data also comes in types such as half-precision floating point numbers, single-precision floating point numbers, and complex numbers, on which SIMD cannot operate directly. If operations on such types are emulated in software, the speed is generally too low to meet real-time requirements. Processor architectures such as the ARMv series can operate on floating-point numbers by introducing a VFP (Vector Floating-Point coprocessor), but using the VFP often requires certain conditions to be met. In one example, using a VFP requires the following three conditions to be satisfied:
Condition 1: the VFP is often an optional component that must be explicitly enabled.
Condition 2: the embedded operating system must allow the relevant access rights.
Condition 3: a compiler version that supports the VFP must be used.
All three conditions must be met simultaneously before the VFP can be used, which can make the VFP difficult to apply in practice. In addition, for an existing device whose processor architecture does not include a VFP, floating-point numbers cannot be operated on via the VFP at all.
Exemplary overview
Assuming that a data processor (e.g., an ARM11 series data processor, etc.) supports SIMD technology, the data processor can implement fixed-point parallel arithmetic processing using SIMD technology.
Although SIMD technology does not directly support operations on half-precision floating point numbers, single-precision floating point numbers, complex numbers, and the like, the present disclosure can represent any such non-fixed-point data to be operated on with a first fixed point number and a second fixed point number, where the first fixed point number represents the sign bit and mantissa of the data and the second fixed point number represents its exponent. The corresponding SIMD interface function can then be called for the first fixed point number, the second fixed point number, and the operation type of the data to be operated on, realizing parallel fixed-point arithmetic; the results of that parallel processing are combined, exponent with mantissa, to obtain the operation result of the data to be operated on. On such a data processor, SIMD technology can thus be used to process many data types in parallel.
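To make the mantissa/exponent split concrete, here is a minimal sketch using Python's `math.frexp` and `math.ldexp`, which perform exactly the sign-and-mantissa versus exponent decomposition that the first and second fixed point numbers store (the function names are illustrative, not from the disclosure):

```python
import math

def decompose(x):
    """Split x into (signed mantissa, exponent) so that x == m * 2**e.

    math.frexp returns m in [0.5, 1) (or 0) carrying the sign, and an
    integer exponent e -- the same sign/mantissa vs. exponent split the
    disclosure stores in its first and second fixed point numbers.
    """
    return math.frexp(x)

def recompose(m, e):
    """Inverse of decompose: combine mantissa and exponent back into a float."""
    return math.ldexp(m, e)

m, e = decompose(6.0)          # 6.0 == 0.75 * 2**3
assert recompose(m, e) == 6.0  # round trip is exact
```

The round trip is exact because no information is discarded; the disclosure's scheme additionally quantizes each part to a 16-bit fixed point number, which this sketch omits.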
Exemplary method
FIG. 1 is a flowchart illustrating an embodiment of a method for SIMD data operations according to the present disclosure. As shown in fig. 1, the method of this embodiment includes the steps of: s100, S101 and S102.
And S100, generating an operation array according to at least one group of data to be operated participating in the same type of operation.
Operations in this disclosure generally refer to algebraic operations, including but not limited to: addition, subtraction, multiplication, division, squaring, square root, reciprocal, logarithm, exponentiation, and the like.
A group of data to be operated on generally refers to all the data items that take part in one instance of the same operation. A group may contain one data item or two; the number of items in a group is generally determined by the operation performed on it, i.e., by the number of participants the operation takes. For example, for an addition, a group contains two items, both addends: a first addend and a second addend. For a subtraction, a group contains two items, one the minuend and the other the subtrahend. For a reciprocal operation, a group contains a single item.
Any set of data to be operated on in the present disclosure may include, but is not limited to: fixed point numbers (e.g., 16 bits), half-precision floating point numbers (e.g., 16 bits), single-precision floating point numbers (e.g., 32 bits), and complex numbers (e.g., 64 bits).
The operation array comprises a plurality of elements, each of which is a fixed point number. For example, one element may be a first fixed point number representing the sign bit and mantissa of a data item to be operated on, and another element may be the second fixed point number representing its exponent. There may be one or more operation arrays.
S101, calling a SIMD interface function corresponding to the operation of mantissas and exponents related to the same type of operation, and determining input parameters of the SIMD interface function according to elements in the operation array.
The operations on mantissas and exponents involved in a type of operation can be expressed as: whether that type requires a mantissa-with-mantissa operation, and whether it requires an exponent-with-exponent operation. In other words, the operation type determines which component operations on mantissas and exponents it involves. For example, types such as addition and subtraction require an operation on the mantissas but may not require one on the exponents, whereas types such as multiplication, division, and reciprocal require operations on both the mantissas and the exponents.
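Multiplication illustrates why both component operations are needed: since (m1·2^e1)·(m2·2^e2) = (m1·m2)·2^(e1+e2), a mantissa-times-mantissa operation plus an exponent-plus-exponent operation together yield the product. A hedged sketch (the disclosure's actual SIMD interface functions are not named in this excerpt):

```python
import math

def multiply_via_parts(x, y):
    """Multiply two floats by operating on mantissas and exponents separately:
    (m1 * 2**e1) * (m2 * 2**e2) == (m1 * m2) * 2**(e1 + e2).
    In the disclosure, each of the two component operations would be one
    SIMD interface-function call applied over many lanes at once."""
    m1, e1 = math.frexp(x)
    m2, e2 = math.frexp(y)
    return math.ldexp(m1 * m2, e1 + e2)  # combine mantissa product with exponent sum

assert multiply_via_parts(3.0, 4.0) == 12.0
```

Addition, by contrast, only needs a mantissa operation once the operands share an exponent, which matches the source's observation that addition and subtraction may not require an exponent-with-exponent operation.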
The present disclosure may preset, for each type of operation, the operations on mantissas and exponents it involves and the SIMD interface functions corresponding to those operations. The input parameters of a SIMD interface function typically include: at least one fixed point number needed to perform the corresponding operation.
And S102, generating at least one group of operation results corresponding to the data to be operated according to the operation results of the SIMD interface function based on the input parameters.
After the SIMD interface function is called and the input parameters of the SIMD interface function are assigned, the SIMD interface function executes corresponding operation based on the input parameters and returns a corresponding operation result. The operation result is an operation result corresponding to the input parameter. The operation result may include: at least one of an operation result for mantissas and an operation result for exponents.
The present disclosure may form the operation result corresponding to each group of data to be operated by processing the operation result (e.g., combining and restoring data types). The present disclosure may also form an operation result corresponding to each set of data to be operated by processing the operation result and the elements in the operation array (e.g., combining and restoring data type processing).
The sign bit, mantissa, and exponent of the data to be operated on are represented by the first fixed point number and the second fixed point number; the SIMD-based interface function is called for each operation on mantissas and exponents that the overall operation involves, and elements of the operation array supply its input parameters, so that the fixed-point results returned by the SIMD-based interface function can be combined into the operation results of the non-fixed-point data to be operated on. The technical solution provided by the present disclosure therefore helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point arithmetic.
In an alternative example, the first fixed point number of the present disclosure is 16 bits, of which the most significant (leftmost) bit is the sign bit and the remaining 15 bits hold the mantissa. The second fixed point number is also 16 bits, all of which hold the exponent.
In an alternative example, an example of the disclosed multiple sets of data to be operated on that participate in the same type of operation is shown in fig. 2.
In fig. 2, assume there are m+1 data items to be operated on (m being an odd number), all of which must undergo the same two-participant operation (an addition is used as the example below). The m+1 items are usually non-fixed-point numbers of the same type, for example all single-precision floating point numbers or all complex numbers, although they may also be of different non-fixed-point types. The m+1 items are: data to be operated n1, n2, n3, n4, ..., nm, and nm+1. Items n1 and n2 form the first group of data to be added; items n3 and n4 form the second group; and so on, until items nm and nm+1 form the (m+1)/2-th group. Items n1, n3, ..., nm are all first addends, and items n2, n4, ..., nm+1 are all second addends.
In an alternative example, the present disclosure may generate N operation arrays from all sets of data to be operated on that participate in the same type of operation. Where N is the number of participants in the operation. The number of participants in an operation in this disclosure is determined by the inherent nature of the operation and not by the number of groups of data to be operated upon that participate in the operation. For example, the number of participating parties is 2 for operations such as addition, subtraction, multiplication, and division. For another example, for an operation such as a reciprocal operation, the number of participants is 1.
Optionally, if an operation has N participants, then when the N operation arrays are generated, the first and second fixed point numbers corresponding to the first participant of every group of data to be operated on generally become the elements of one operation array, those corresponding to the second participant of every group become the elements of another operation array, and so on, until the N-th participants of every group form the N-th operation array.
For example, for M sets of data to be operated to participate in an addition operation, since the participation party of the addition operation is 2, the present disclosure may generate two operation arrays for the M sets of data to be operated to, where one operation array corresponds to all first addends in the M sets of data to be operated to, and where the other operation array corresponds to all second addends in the M sets of data to be operated to.
For another example, for M sets of data to be operated on for subtraction, since subtraction has 2 participants, the present disclosure may generate two operation arrays for the M sets of data to be operated on, where one operation array corresponds to all the minuends in the M sets and the other corresponds to all the subtrahends.
As another example, for M sets of data to be operated on for multiplication, since the number of participants of the multiplication is 2, the present disclosure may generate two operation arrays for the M sets of data to be operated on, where one operation array corresponds to all first multipliers in the M sets of data to be operated on, and another operation array corresponds to all second multipliers in the M sets of data to be operated on.
As another example, for M groups of data to be operated participating in division operation, since the participation party of the division operation is 2, the present disclosure may generate two operation arrays for the M groups of data to be operated, where one operation array corresponds to all dividends in the M groups of data to be operated, and another operation array corresponds to all divisors in the M groups of data to be operated.
As another example, for M groups of data to be operated, which participate in the reciprocal operation, since the participation party of the reciprocal operation is 1, the present disclosure may generate an operation array for the M groups of data to be operated, where the operation array corresponds to all data to be operated in the M groups of data to be operated.
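The per-participant grouping described in the examples above can be sketched generically as follows (`build_operand_arrays` is a hypothetical helper, not an API from the disclosure): for an N-ary operation over M groups, the k-th output array collects the k-th participant of every group.

```python
def build_operand_arrays(groups, arity):
    """Split M groups of an N-ary operation into N per-participant arrays.

    groups: list of M tuples, each of length `arity`.
    Returns `arity` lists; list k collects the k-th participant of each group,
    mirroring how the disclosure builds one operation array per participant.
    """
    for g in groups:
        assert len(g) == arity, "every group must supply all participants"
    return [[g[k] for g in groups] for k in range(arity)]

# Addition (arity 2): first array holds first addends, second holds second addends.
first, second = build_operand_arrays([(1.0, 2.0), (3.0, 4.0)], arity=2)
```

With `arity=1` (e.g., the reciprocal example above), a single array is produced, matching the single operation array the disclosure generates in that case.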
In one optional example, the present disclosure may generate an operand array in different ways for different types of data to be operated on. The following examples illustrate:
as a first example, for a first to-be-operated data including a mantissa and an exponent, the present disclosure may convert the sign bit and the mantissa in the first to-be-operated data into a first fixed-point number, and convert the exponent in the first to-be-operated data into a second fixed-point number. For example, in a case where the first to-be-operated data is a half-precision floating point number, the present disclosure may convert a sign bit and a mantissa of the half-precision floating point number into a first fixed point number, and convert an exponent of the half-precision floating point number into a second fixed point number. For another example, in the case that the first data to be operated on is a single-precision floating point number, the present disclosure may set the sign bit (e.g., s in fig. 3) and the mantissa (e.g., b in fig. 3) of the single-precision floating point number1b2b3b4......b23) Converting to a first fixed point number (16 bits), and converting the exponent of the single-precision floating point number (e in FIG. 3)1e2e3e4e5e6e7e8) To a second fixed point number (16 bits). In which operations of the same type are to be involvedAfter all the first to-be-operated data in all the groups of to-be-operated data are respectively converted into a first fixed point number and a second fixed point number, the method can generate operation arrays with corresponding numbers of all the first fixed point numbers and the second fixed point numbers according to the number of participants participating in the same type of operation. For example, an arithmetic array is generated from all the first fixed point numbers and all the second fixed point numbers. 
For another example, one operation array is generated from all the first fixed point numbers and all the second fixed point numbers of the first participant, and another operation array is generated from all the first fixed point numbers and all the second fixed point numbers of the second participant.
With reference to fig. 2 and the first example above, assuming that the m+1 data to be operated on in the present disclosure are all single-precision floating point numbers and that each of the (m+1)/2 groups of data to be operated on requires addition, an example of the two operation arrays generated by the present disclosure is shown in fig. 4 and fig. 5.
Fig. 4 shows one of the two operation arrays, which may be referred to as the first operation array. The first operation array includes the first fixed point number and the second fixed point number corresponding to the first addend in each group of data to be operated on. For example, element 1 (i.e., the first fixed point number n1a) and element 2 (i.e., the second fixed point number n1b) in fig. 4 are the first and second fixed point numbers corresponding to the data to be operated on n1 in fig. 2; element 3 (i.e., the first fixed point number n3a) and element 4 (i.e., the second fixed point number n3b) are those corresponding to n3 in fig. 2; and so on, until element m (i.e., the first fixed point number nma) and element m+1 (i.e., the second fixed point number nmb) are those corresponding to nm in fig. 2.
Fig. 5 shows the other of the two operation arrays, which may be referred to as the second operation array. The second operation array includes the first fixed point number and the second fixed point number corresponding to the second addend in each group of data to be operated on. For example, element 1 (i.e., the first fixed point number n2a) and element 2 (i.e., the second fixed point number n2b) in fig. 5 are the first and second fixed point numbers corresponding to the data to be operated on n2 in fig. 2; element 3 (i.e., the first fixed point number n4a) and element 4 (i.e., the second fixed point number n4b) are those corresponding to n4 in fig. 2; and so on, until element m (i.e., the first fixed point number nm+1a) and element m+1 (i.e., the second fixed point number nm+1b) are those corresponding to nm+1 in fig. 2.
It should be particularly noted that the arrangement order of the elements in fig. 4 and fig. 5 can be set flexibly according to actual needs. For example, all the first fixed point numbers in fig. 4 and fig. 5 may be arranged in group order and placed in front of all the second fixed point numbers, which are likewise arranged in group order. In addition, when a group of data to be operated on includes a plurality of data to be operated on (e.g., two), and the operation type does not require the exponent part to participate, the second fixed point numbers corresponding to the data in that group may be the same. Similar cases below are not described one by one.
By converting first data to be operated on, such as half-precision or single-precision floating point numbers, into a first fixed point number and a second fixed point number, the present disclosure enables such data to be processed in parallel using SIMD technology, thereby improving operational compatibility.
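As a rough illustration of this first example, the sketch below decomposes an IEEE-754 single-precision floating point number into a hypothetical 16-bit first fixed point number (sign plus mantissa, here in a Q1.14 layout) and a 16-bit second fixed point number (the unbiased exponent), and restores the float from the pair. The 16-bit layout, the function names, and the truncation of the low 9 mantissa bits are assumptions made for illustration; the disclosure does not specify the exact encoding, and zero and denormal inputs are ignored for brevity.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical split of a single-precision float into a "first fixed point
 * number" (sign + mantissa, 16 bits, Q1.14) and a "second fixed point
 * number" (unbiased exponent, 16 bits).  The low 9 mantissa bits are
 * dropped, so the conversion sketched here is lossy. */
static void f32_split(float f, int16_t *first_fp, int16_t *second_fp)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);                      /* raw IEEE-754 bits */
    int32_t exp = (int32_t)((b >> 23) & 0xFF) - 127;
    int32_t sig = (1 << 14) | (int32_t)((b & 0x7FFFFFu) >> 9); /* 1.m in Q1.14 */
    *first_fp  = (int16_t)((b >> 31) ? -sig : sig);
    *second_fp = (int16_t)exp;
}

/* Combine-and-restore step: rebuild a float from the two fixed point numbers. */
static float f32_join(int16_t first_fp, int16_t second_fp)
{
    return ldexpf((float)first_fp / (float)(1 << 14), second_fp);
}
```

For 6.5f (1.625 x 2^2) the split yields the pair (26624, 2), since 1.625 x 2^14 = 26624; values whose mantissas fit in 14 bits survive the round trip exactly.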
As a second example, for second data to be operated on that includes a plurality of mantissas and a plurality of exponents, the present disclosure may convert each sign bit and mantissa in the second data to be operated on into a respective first fixed point number, and convert the plurality of exponents into one shared second fixed point number. For example, where the second data to be operated on is a complex number, its real part and imaginary part may each be a single-precision floating point number, each including a sign bit, a mantissa, and an exponent. Where the exponent of the real part differs from that of the imaginary part, the two parts may first be adjusted to have the same exponent. The sign bit and mantissa of the real part may then be converted into a first fixed point number, the sign bit and mantissa of the imaginary part into another first fixed point number, and the common exponent of the real and imaginary parts into one shared second fixed point number.
After all the second data to be operated on in all the groups participating in the same type of operation are respectively converted into two first fixed point numbers and one shared second fixed point number, the present disclosure may generate a corresponding number of operation arrays according to the number of participants in the operation. For example, one operation array is generated from all the first fixed point numbers and all the shared second fixed point numbers. For another example, one operation array is generated from the first fixed point numbers and shared second fixed point numbers of the first participant, and another operation array is generated from those of the second participant.
With reference to fig. 2 and the second example above, assuming that the m+1 data to be operated on in the present disclosure are all complex numbers and that each of the (m+1)/2 groups of data to be operated on requires addition, an example of the two operation arrays generated by the present disclosure is shown in fig. 6 and fig. 7.
Fig. 6 shows one of the two operation arrays, which may be referred to as the first operation array. The first operation array includes: the first fixed point numbers corresponding to the real parts of the first addends in each group of data to be operated on, the first fixed point numbers corresponding to the imaginary parts of those first addends, and the shared second fixed point numbers corresponding to the real and imaginary parts of those first addends. For example, element 1 (i.e., the first fixed point number n1a) in fig. 6 is the first fixed point number corresponding to the real part of the data to be operated on n1 in fig. 2, element 2 (i.e., the first fixed point number n1b) is the first fixed point number corresponding to the imaginary part of n1, and element 3 (i.e., the second fixed point number n1c) is the shared second fixed point number corresponding to the real and imaginary parts of n1; element 4 (i.e., the first fixed point number n3a) is the first fixed point number corresponding to the real part of n3, element 5 (i.e., the first fixed point number n3b) is the first fixed point number corresponding to the imaginary part of n3, and element 6 (i.e., the second fixed point number n3c) is the shared second fixed point number corresponding to the real and imaginary parts of n3; and so on, until element (3m-1)/2 (i.e., the first fixed point number nma) is the first fixed point number corresponding to the real part of nm, element (3m+1)/2 (i.e., the first fixed point number nmb) is the first fixed point number corresponding to the imaginary part of nm, and element (3m+3)/2 (i.e., the second fixed point number nmc) is the shared second fixed point number corresponding to the real and imaginary parts of nm.
Fig. 7 shows the other of the two operation arrays, which may be referred to as the second operation array. The second operation array may include: the first fixed point numbers corresponding to the real parts of the second addends in each group of data to be operated on, the first fixed point numbers corresponding to the imaginary parts of those second addends, and the shared second fixed point numbers corresponding to the real and imaginary parts of those second addends. For example, element 1 (i.e., the first fixed point number n2a) in fig. 7 is the first fixed point number corresponding to the real part of the data to be operated on n2 in fig. 2, element 2 (i.e., the first fixed point number n2b) is the first fixed point number corresponding to the imaginary part of n2, and element 3 (i.e., the second fixed point number n2c) is the shared second fixed point number corresponding to the real and imaginary parts of n2; element 4 (i.e., the first fixed point number n4a) is the first fixed point number corresponding to the real part of n4, element 5 (i.e., the first fixed point number n4b) is the first fixed point number corresponding to the imaginary part of n4, and element 6 (i.e., the second fixed point number n4c) is the shared second fixed point number corresponding to the real and imaginary parts of n4; and so on, until element (3m-1)/2 (i.e., the first fixed point number nm+1a) is the first fixed point number corresponding to the real part of nm+1, element (3m+1)/2 (i.e., the first fixed point number nm+1b) is the first fixed point number corresponding to the imaginary part of nm+1, and element (3m+3)/2 (i.e., the second fixed point number nm+1c) is the shared second fixed point number corresponding to the real and imaginary parts of nm+1.
It should be noted that the present disclosure may generate one shared second fixed point number for the exponents of the real parts of all the complex numbers, one shared second fixed point number for the exponents of the imaginary parts of all the complex numbers, or one shared second fixed point number for the exponents of both the real parts and the imaginary parts of all the complex numbers; these variants are not described in detail here. In addition, the arrangement order of the elements in fig. 6 and fig. 7 can be set flexibly according to actual needs. For example, all the first fixed point numbers in fig. 6 and fig. 7 may be arranged in group order, with the first fixed point number corresponding to the real part of a given data to be operated on placed before that corresponding to its imaginary part, and all the first fixed point numbers placed in front of all the second fixed point numbers, which are likewise arranged in group order.
By converting second data to be operated on into a plurality of first fixed point numbers and a shared second fixed point number, the present disclosure can use SIMD technology to process a plurality of second data to be operated on in parallel, improving operational compatibility and helping to reduce memory consumption during operation.
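The exponent-sharing step of this second example can be sketched as follows: the real and imaginary parts of one complex number are decomposed as in the earlier illustration, the larger of the two exponents becomes the shared exponent, and the significand of the smaller part is shifted right to match. The Q1.14 layout and the function names are illustrative assumptions; zero and denormal inputs are ignored for brevity, and an arithmetic right shift is assumed for negative significands.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Split one float into a signed Q1.14 significand and an unbiased exponent
 * (a hypothetical 16-bit layout; the low 9 mantissa bits are dropped). */
static void split1(float f, int32_t *sig, int32_t *exp)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    *exp = (int32_t)((b >> 23) & 0xFF) - 127;
    int32_t s = (1 << 14) | (int32_t)((b & 0x7FFFFFu) >> 9);
    *sig = (b >> 31) ? -s : s;
}

/* Sketch of the second example: the real and imaginary parts of one complex
 * number are adjusted to a shared exponent, giving two "first fixed point
 * numbers" and one shared "second fixed point number". */
static void complex_split(float re, float im,
                          int16_t *re_fp, int16_t *im_fp, int16_t *shared)
{
    int32_t rs, re_e, is, im_e;
    split1(re, &rs, &re_e);
    split1(im, &is, &im_e);
    int32_t e = re_e > im_e ? re_e : im_e;   /* common (larger) exponent   */
    *re_fp  = (int16_t)(rs >> (e - re_e));   /* shift the smaller part     */
    *im_fp  = (int16_t)(is >> (e - im_e));   /* down to match the exponent */
    *shared = (int16_t)e;
}
```

For the complex number 6.0 + 1.5i, the shared exponent is 2, the real part stays at 1.5 x 2^14 = 24576, and the imaginary significand is shifted down by two bits to 6144, so both parts restore exactly.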
As a third example, for a plurality of first data to be operated on, the present disclosure may convert each sign bit and mantissa into a respective first fixed point number and convert all the exponents into one shared second fixed point number. For example, assuming the plurality of first data to be operated on are all half-precision floating point numbers, when their exponents are not all the same, the present disclosure may adjust the mantissa of at least one of them so that all of them have the same exponent; the sign bit and mantissa of each half-precision floating point number are then converted into one first fixed point number, yielding a plurality of first fixed point numbers, and the common exponent of all the half-precision floating point numbers is converted into one shared second fixed point number. The same applies where the plurality of first data to be operated on are all single-precision floating point numbers: when their exponents are not all the same, the mantissa of at least one of them may be adjusted so that all have the same exponent, after which the sign bit and mantissa of each single-precision floating point number are converted into one first fixed point number and the common exponent is converted into one shared second fixed point number.
After all the first data to be operated on in all the groups participating in the same type of operation are respectively converted into a first fixed point number and a corresponding shared second fixed point number, the present disclosure may generate a corresponding number of operation arrays according to the number of participants in the operation. For example, one operation array is generated from all the first fixed point numbers and one shared second fixed point number. For another example, one operation array is generated from all the first fixed point numbers and the shared second fixed point number of the first participant, and another from those of the second participant.
With reference to fig. 2 and the third example above, assuming that the m+1 data to be operated on in the present disclosure are all single-precision floating point numbers and that each of the (m+1)/2 groups of data to be operated on requires addition, an example of the two operation arrays generated by the present disclosure is shown in fig. 8 and fig. 9.
Fig. 8 shows one of the two operation arrays, which may be referred to as the first operation array. The first operation array may include the first fixed point number corresponding to the first addend in each group of data to be operated on and one shared second fixed point number corresponding to all the first addends. For example, element 1 (i.e., the first fixed point number n1a) in fig. 8 is the first fixed point number corresponding to the data to be operated on n1 in fig. 2, and element 2 (i.e., the first fixed point number n3a) is that corresponding to n3; and so on, until element (m+1)/2 (i.e., the first fixed point number nma) is that corresponding to nm. The last element, (m+3)/2 (i.e., the second fixed point number nb1), in the first operation array in fig. 8 is the shared second fixed point number corresponding to all the first addends in fig. 2.
Fig. 9 shows the other of the two operation arrays, which may be referred to as the second operation array. The second operation array may include the first fixed point number corresponding to the second addend in each group of data to be operated on and one shared second fixed point number corresponding to all the second addends. For example, element 1 (i.e., the first fixed point number n2a) in fig. 9 is the first fixed point number corresponding to the data to be operated on n2 in fig. 2, and element 2 (i.e., the first fixed point number n4a) is that corresponding to n4; and so on, until element (m+1)/2 (i.e., the first fixed point number nm+1a) is that corresponding to nm+1. The last element, (m+3)/2 (i.e., the second fixed point number nb1), in the second operation array in fig. 9 is the shared second fixed point number corresponding to all the second addends in fig. 2.
It should be particularly noted that the arrangement order of the elements in fig. 8 and fig. 9 can be set flexibly according to actual needs. For example, the second fixed point numbers in fig. 8 and fig. 9 may be arranged in front of all the first fixed point numbers. For another example, one of the first and second operation arrays may omit the second fixed point number: where all the data to be operated on have the same shared second fixed point number, the present disclosure may include it in every operation array, or include it in one operation array and omit it from the others.
By converting a plurality of first data to be operated on, such as half-precision or single-precision floating point numbers, into a plurality of first fixed point numbers and one shared second fixed point number, the present disclosure enables such data to be processed in parallel using SIMD technology, improving operational compatibility and helping to significantly reduce memory consumption during operation.
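The third example's group-wide shared exponent can be sketched in the same spirit: every float in a group is decomposed, the largest exponent becomes the one shared second fixed point number, and each significand is shifted down to match, yielding data shaped like the first fixed point numbers of figs. 8 and 9 plus one shared exponent. The Q1.14 layout, the fixed capacity of 16 elements, and the function name are illustrative assumptions; zero and denormal inputs are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the third example: several single-precision floats are adjusted
 * to one shared exponent, producing first fixed point numbers followed by a
 * single shared second fixed point number (cf. the layout of figs. 8/9). */
static void split_shared(const float *in, size_t n,
                         int16_t *first_fps, int16_t *shared)
{
    int32_t sig[16], exp[16];       /* small fixed bound for the sketch */
    int32_t emax = INT32_MIN;
    for (size_t i = 0; i < n; ++i) {
        uint32_t b;
        memcpy(&b, &in[i], sizeof b);
        exp[i] = (int32_t)((b >> 23) & 0xFF) - 127;
        int32_t s = (1 << 14) | (int32_t)((b & 0x7FFFFFu) >> 9);
        sig[i] = (b >> 31) ? -s : s;
        if (exp[i] > emax) emax = exp[i];
    }
    for (size_t i = 0; i < n; ++i)  /* shift significands so every element
                                       carries the same (largest) exponent */
        first_fps[i] = (int16_t)(sig[i] >> (emax - exp[i]));
    *shared = (int16_t)emax;
}
```

For the group {8.0, 2.0, 0.5} the shared exponent is 3, and the significands become 16384, 4096, and 1024 respectively.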
In an alternative example, the operations involving mantissas and exponents in the present disclosure may include: performing an operation on the mantissa while keeping the exponent unchanged. In this case, the present disclosure may determine a first SIMD interface function corresponding to the operation on the mantissa, and determine the input parameters of the first SIMD interface function from the elements of the operation array that correspond to mantissas.
In an alternative example, the operations involving mantissas and exponents in the present disclosure may include: performing operations on both the mantissas and the exponents. In this case, the present disclosure may determine a first SIMD interface function corresponding to the operation on the mantissas and a second SIMD interface function corresponding to the operation on the exponents, and may determine the input parameters of the first SIMD interface function from the elements of the operation array that correspond to mantissas, and the input parameters of the second SIMD interface function from the elements that correspond to exponents.
Optionally, the first SIMD interface function and the second SIMD interface function in this disclosure are both interface functions provided by SIMD technology. The first and second SIMD interface functions may or may not be the same. Whether the first and second SIMD interface functions are the same depends on whether the operation on the mantissa and the operation on the exponent are the same. The SIMD interface functions in this disclosure may also be referred to as SIMD instructions or SIMD functions, etc.
By calling the corresponding SIMD interface function according to the operations on the mantissa and exponent involved, the present disclosure can operate on the mantissa alone or on the mantissa and exponent separately, so that half-precision floating point, single-precision floating point, and complex data to be operated on can all be processed in parallel using SIMD technology, further improving operational compatibility. In addition, where the operation involves only the mantissa and the exponent is kept unchanged, a single call to the SIMD interface function can operate on two groups of mantissas, which helps improve the efficiency of operations on non-fixed-point numbers.
In an optional example, the present disclosure may preset an operation library, which generally includes a plurality of preset operation interface functions, each of which may correspond to an operation on the operation array. That is, a preset operation interface function may indicate the processing applied to each element of the operation array. The operations of calling the SIMD interface function and setting its input parameters in the present disclosure can be realized via a preset operation interface function. For example, a preset operation interface function is first called from the preset operation library, and the obtained operation arrays (such as the first and second operation arrays in figs. 4 to 9) are used as its input parameters; after the called preset operation interface function obtains the input parameters, the operations it executes include, but are not limited to: calling the SIMD interface function corresponding to the operations on the mantissa and exponent involved in the same type of operation, and determining the input parameters of the called SIMD interface function from the elements in the operation array.
Optionally, where the i-th element of the first operation array is operated with the i-th element of the second operation array, the preset operation interface function of the present disclosure can conveniently and quickly obtain the elements to be operated on from the two operation arrays given as its input parameters, and set the input parameters for the SIMD interface function.
Optionally, when a preset operation interface function is called from the preset operation library, the type of the data to be operated on and the operation type may be considered; that is, both may be considered when the preset operation interface functions in the operation library are defined. For example, one preset operation interface function may be set for the addition of single-precision floating point numbers and another for the addition of complex numbers. If the same preset operation interface function is set for both, its input parameters should indicate the data type (such as single-precision floating point or complex) of the data to be operated on, so that the preset operation interface function can execute the subsequent operations (such as the call operation and the combine-and-restore operation).
In one optional example, the input parameters of the SIMD interface function include two 32-bit values. Where the operation type involves two participants, the present disclosure may pack two first fixed point numbers to be operated on (or two second fixed point numbers to be operated on) into one 32-bit value, and another two into a second 32-bit value, and use these as the input parameters of the SIMD interface function, so that one call to the SIMD interface function performs two operations. Where the operation type involves one participant, the present disclosure may pack one first fixed point number and one second fixed point number to be operated on into one 32-bit value, and another first and second fixed point number into a second 32-bit value, as the input parameters of the SIMD interface function, so that one call again performs two operations.
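The 32-bit packing can be illustrated with a software stand-in ("SIMD within a register"): two 16-bit fixed point numbers occupy one 32-bit value, and one addition call operates on both lanes at once while preventing carries from crossing the lane boundary. The function names are illustrative assumptions; a real implementation would call the platform's SIMD interface function instead.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit fixed point numbers into one 32-bit value: two lanes per
 * 32 bits, so one call can perform two operations at once. */
static uint32_t pack2(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Stand-in for a two-lane SIMD add (SWAR): both 16-bit lanes are added in
 * one 32-bit operation while carries are kept from crossing lane borders. */
static uint32_t add2(uint32_t a, uint32_t b)
{
    uint32_t sum = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu); /* low 15 bits/lane */
    return sum ^ ((a ^ b) & 0x80008000u);                 /* restore top bits */
}
```

One call to add2 computes both lane sums modulo 2^16, so an overflow in one lane cannot corrupt the other.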
According to the method and the device, the operation library is set, the corresponding operation is realized by calling the preset operation interface function in the operation library, the operation for multiple groups of data to be operated can be modularized, and the maintainability of the technical scheme of the method and the device is improved.
In an optional example, when setting the input parameters of the preset operation interface function, in addition to the operation arrays, the present disclosure may also pass the operation count corresponding to the operation arrays as an input parameter. The operation count represents the number of times the preset operation interface function calls the SIMD interface function, and may serve as the first input parameter of the preset operation interface function.
By also taking the operation count corresponding to the operation array as an input parameter of the preset operation interface function, the present disclosure allows the preset operation interface function to process the operation array more explicitly, which improves the maintainability of the preset operation interface function.
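A minimal sketch of such a preset operation interface function, under the assumptions of the earlier packing illustration: it receives the operation count as its first input parameter, plus the two operation arrays, and calls a (here software-emulated) two-lane SIMD add that many times. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a SIMD interface function: one call adds two 16-bit lanes
 * packed in 32 bits (a platform SIMD instruction would be used instead). */
static uint32_t simd_add_u16x2(uint32_t a, uint32_t b)
{
    uint32_t s = (a & 0x7FFF7FFFu) + (b & 0x7FFF7FFFu);
    return s ^ ((a ^ b) & 0x80008000u);
}

/* Hypothetical "preset operation interface function": the operation count
 * (number of SIMD interface function calls) is its first input parameter,
 * and element i of the first array is operated with element i of the second. */
static void preset_add(size_t count,
                       const uint32_t *arr1,      /* first operation array  */
                       const uint32_t *arr2,      /* second operation array */
                       uint32_t *out)
{
    for (size_t i = 0; i < count; ++i)
        out[i] = simd_add_u16x2(arr1[i], arr2[i]);
}
```

With count = 2, the function performs four 16-bit additions in two SIMD calls, matching the two-operations-per-call packing described above.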
In an optional example, the operation of generating the operation results corresponding to at least one group of data to be operated on from the results that the SIMD interface function produces based on its input parameters may also be implemented by the preset operation interface function. After the preset operation interface function of the present disclosure calls the SIMD interface function and sets its input parameters, the SIMD interface function executes the corresponding operation and returns an operation result. It should be noted that when obtaining the result returned by the SIMD interface function, 1 bit should be reserved for the sign bit in the operation result.
According to the method and the device, the operation library is set, the corresponding operation is realized by calling the preset operation interface function in the operation library, and the operation modularization aiming at multiple groups of data to be operated can be realized, so that the maintainability and the applicability of the technical scheme are improved.
In an alternative example, where the operation involved for the mantissa and exponent is an operation on the mantissa with the exponent kept unchanged, the present disclosure may process (e.g., combine and restore) the operation result of the SIMD interface function corresponding to the mantissa operation together with the generated second fixed point number, so as to form the operation result corresponding to each group of data to be operated on.
With reference to figs. 4 and 5, where all the data to be operated on undergo addition, the present disclosure may obtain (m+1)/2 operation results, each produced by operating on the mantissas of the participants; each result can be combined and restored with the corresponding second fixed point number to form a single-precision floating point number, thereby realizing single-precision floating point addition. Moreover, one call to the SIMD interface function realizes two single-precision floating point additions.
With reference to figs. 6 and 7, where all the data to be operated on undergo addition, the present disclosure may obtain m+1 operation results, including the results of operating on the mantissas of the real parts and on the mantissas of the imaginary parts of the participants; these results can be combined and restored with the corresponding second fixed point numbers to form complex numbers, thereby realizing complex addition. Moreover, one call to the SIMD interface function realizes one complex addition.
With reference to figs. 8 and 9, where all the data to be operated on undergo addition, the present disclosure may obtain (m+1)/2 operation results, each produced by operating on the mantissas of the participants; each result can be combined and restored with the corresponding shared second fixed point number to form a single-precision floating point number, thereby realizing single-precision floating point addition. Moreover, one call to the SIMD interface function realizes two single-precision floating point additions.
In an alternative example, where operations are performed on both the mantissa and the exponent, the present disclosure may generate the operation result corresponding to each group of data to be operated on from the result of the SIMD interface function corresponding to the mantissa operation and the result of the SIMD interface function corresponding to the exponent operation. For example, multiplication requires operations on both the mantissa and the exponent, where the mantissa result may be a 16-bit fixed point number and the exponent result may be a 16-bit fixed point number. The present disclosure combines and restores these two 16-bit fixed point numbers into a single-precision floating point number, thereby realizing single-precision floating point multiplication. Likewise, the present disclosure can realize complex multiplication by combining and restoring four 16-bit fixed point numbers into a complex number. For other operations, the process of forming the operation results of the data to be operated on can refer to the description above and is not detailed here.
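Under the same illustrative Q1.14 layout as the earlier sketches, the combine-and-restore step for multiplication might look as follows: the mantissa operation produces one 16-bit fixed point number, the exponent operation another, and the two are recombined into a single-precision float, with one renormalization step when the significand product reaches 2.0. The layout and names are assumptions; a real implementation would obtain the two partial results from SIMD interface function calls, and zero/denormal inputs are ignored for brevity.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Split one float into a signed Q1.14 significand and an unbiased exponent. */
static void split(float f, int32_t *sig, int32_t *exp)
{
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    *exp = (int32_t)((b >> 23) & 0xFF) - 127;
    int32_t s = (1 << 14) | (int32_t)((b & 0x7FFFFFu) >> 9);
    *sig = (b >> 31) ? -s : s;
}

/* Multiply two floats via fixed point parts, then combine and restore. */
static float mul_via_fixed(float x, float y)
{
    int32_t xs, xe, ys, ye;
    split(x, &xs, &xe);
    split(y, &ys, &ye);
    /* mantissa operation: Q1.14 * Q1.14 -> Q2.28, scaled back to Q1.14
     * (arithmetic right shift assumed for negative products) */
    int32_t sig = (int32_t)(((int64_t)xs * ys) >> 14);
    int32_t exp = xe + ye;                       /* exponent operation */
    /* renormalize if the significand product reached 2.0 or more */
    if (sig >= (2 << 14) || sig <= -(2 << 14)) { sig >>= 1; exp += 1; }
    /* combine and restore: fixed significand and exponent back to a float */
    return ldexpf((float)sig / (float)(1 << 14), exp);
}
```

Products of values whose mantissas fit in 14 bits are exact, e.g. 1.5 x 2.0 = 3.0 and 3.0 x 6.0 = 18.0 (the latter exercising the renormalization branch).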
According to the present disclosure, operations on non-fixed-point numbers can be realized by performing processing such as combination and restoration on the operation results according to the mantissa and exponent operations involved, which is beneficial to broadening the application range of the SIMD-based interface functions and to improving the efficiency of operations on non-fixed-point numbers.
Exemplary devices
Fig. 10 is a schematic structural diagram illustrating an embodiment of a SIMD-based data operation apparatus according to the present disclosure. The apparatus of this embodiment may be used to implement the above-described method embodiments of the present disclosure.
In fig. 10, the apparatus of this embodiment includes: a generate operand array module 1000, a call interface function module 1001, and a generate operation result module 1002.
The generate operation array module 1000 is mainly used for generating an operation array according to at least one group of data to be operated on participating in the same type of operation. A group of data to be operated on comprises all the data to be operated on that participate in the same operation. The operation array comprises: at least one first fixed point number used for representing the sign bits and mantissas of the data to be operated on, and at least one second fixed point number used for representing the exponents of the data to be operated on.
Optionally, the generate operation array module 1000 may include at least one of: a first sub-module, a second sub-module, and a third sub-module.
The first sub-module is used for converting the sign bit and mantissa of first data to be operated on into a first fixed point number, and converting the exponent of the first data to be operated on into a second fixed point number. The first data to be operated on comprises one mantissa and one exponent, and may be a half-precision floating point number or a single-precision floating point number. For example, the first sub-module converts the sign bit and mantissa of a half-precision floating point number in the data to be operated on into a first fixed point number, and converts the exponent of the half-precision floating point number into a second fixed point number. For another example, the first sub-module converts the sign bit and mantissa of a single-precision floating point number in the data to be operated on into a first fixed point number, and converts the exponent of the single-precision floating point number into a second fixed point number.
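The conversion performed by the first sub-module can be sketched as follows; this is an illustrative Python model under an assumed 16-bit mantissa width, and the function names are hypothetical:

```python
import math

def float_to_fixed_pair(x):
    """Convert one floating point number into a first fixed point number
    (sign and mantissa) and a second fixed point number (exponent).
    The 16-bit mantissa width is an illustrative assumption."""
    m, e = math.frexp(x)              # m in [0.5, 1), x == m * 2**e
    return int(round(m * (1 << 15))), e

def fixed_pair_to_float(first, second):
    """Combine and restore the floating point number from the pair."""
    return math.ldexp(first / float(1 << 15), second)
```

Round-tripping −6.75, for instance, gives back −6.75 exactly, since its mantissa fits in the assumed width.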
The second sub-module is used for converting the plurality of sign bits and mantissas in second data to be operated on into first fixed point numbers respectively, and converting the plurality of exponents in the second data to be operated on into a shared second fixed point number. The second data to be operated on comprises a plurality of mantissas and a plurality of exponents, and may be a complex number or the like. For example, the second sub-module may convert a complex number in the data to be operated on into a first fixed point number corresponding to the mantissa of the real part of the complex number, a first fixed point number corresponding to the mantissa of the imaginary part of the complex number, and a shared second fixed point number corresponding to the exponent of the real part and the exponent of the imaginary part of the complex number.
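A minimal sketch of this complex-number conversion, assuming a 15-bit mantissa scaling and hypothetical function names (not the patent's actual encoding), could look like:

```python
import math

def complex_to_fixed(z):
    """Convert a complex number into two first fixed point numbers (real
    and imaginary mantissas) plus one shared second fixed point number
    (exponent). The 15-bit scaling is an illustrative assumption."""
    shared_e = max(math.frexp(z.real)[1], math.frexp(z.imag)[1])
    m_real = int(round(math.ldexp(z.real, 15 - shared_e)))
    m_imag = int(round(math.ldexp(z.imag, 15 - shared_e)))
    return m_real, m_imag, shared_e

def fixed_to_complex(m_real, m_imag, shared_e):
    """Combine and restore the complex number from the three values."""
    return complex(math.ldexp(m_real, shared_e - 15),
                   math.ldexp(m_imag, shared_e - 15))
```

Choosing the larger of the two exponents as the shared exponent keeps both mantissas within the fixed-point range; 3.0 + 4.0j, for example, round-trips exactly.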
The third sub-module may be configured to convert each sign bit and mantissa in a plurality of first data to be operated on into a first fixed point number, and convert each exponent in the plurality of first data to be operated on into a shared second fixed point number. The first data to be operated on comprises one mantissa and one exponent, and may be a half-precision floating point number or a single-precision floating point number. For example, the third sub-module may convert a group of half-precision floating point numbers in the data to be operated on into a group of first fixed point numbers corresponding to the mantissas of the group of half-precision floating point numbers and a shared second fixed point number corresponding to the exponents of the group of half-precision floating point numbers. For another example, the third sub-module may convert a group of single-precision floating point numbers in the data to be operated on into a group of first fixed point numbers corresponding to the mantissas of the group of single-precision floating point numbers and a shared second fixed point number corresponding to the exponents of the group of single-precision floating point numbers.
The call interface function module 1001 is configured to call a corresponding SIMD-based interface function according to operations on mantissas and exponents involved in the operations, and determine input parameters of the SIMD-based interface function according to elements in an operation array generated by the generate operation array module 1000.
Optionally, the call interface function module 1001 may call a first SIMD interface function corresponding to the operation on the mantissa, and determine an input parameter of the first SIMD interface function according to an element of the corresponding mantissa in the operation array.
Optionally, the call interface function module 1001 may call a first SIMD interface function corresponding to the operation on the mantissas, call a second SIMD interface function corresponding to the operation on the exponents, and determine input parameters of the first SIMD interface function and the second SIMD interface function according to elements of the corresponding mantissas and elements of the corresponding exponents in the operation array, respectively.
Optionally, the calling interface function module 1001 may include: an operation library and a calling sub-module. The operation library comprises a plurality of preset operation interface functions, each preset operation interface function corresponding to one type of operation. The calling sub-module may call a preset operation interface function from the operation library and use the operation array as an input parameter of the preset operation interface function, so that the preset operation interface function performs the steps of calling the SIMD interface functions corresponding to the mantissa and exponent operations involved in the same type of operation, and determining the input parameters of the called SIMD interface functions according to the elements in the operation array.
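The operation library and calling sub-module can be sketched as a lookup table of per-operation wrapper functions. All names below (`OPERATION_LIBRARY`, `call_preset`, the `_simd_*` stand-ins) are hypothetical illustrations rather than the patent's actual API:

```python
def _simd_add(mantissas):
    # Stand-in for a SIMD interface function that sums packed mantissas.
    return sum(mantissas)

def _simd_mul(mantissas):
    # Stand-in for a SIMD interface function that multiplies packed mantissas.
    product = 1
    for m in mantissas:
        product *= m
    return product

# Operation library: one preset operation interface function per operation type.
OPERATION_LIBRARY = {"add": _simd_add, "mul": _simd_mul}

def call_preset(op_name, operation_array):
    """Calling sub-module: fetch the preset operation interface function
    from the library and pass the operation array as its input parameter."""
    return OPERATION_LIBRARY[op_name](operation_array)
```

The table-driven design keeps the caller ignorant of which SIMD interface function ultimately runs; it only names the operation type and supplies the operation array.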
Optionally, the calling sub-module may use the operation array and the number of operations corresponding to the operation array as input parameters of the preset operation interface function, where the number of operations represents the number of times the preset operation interface function calls the SIMD interface function.
The operation result generation module 1002 is configured to generate operation results corresponding to at least one set of data to be operated according to the operation result of the SIMD-based interface function called by the calling interface function module.
Optionally, the preset operation interface function in the present disclosure may perform the operation of generating, according to the operation results of the SIMD interface function based on the input parameters, the operation result corresponding to each set of data to be operated on. That is, the functionality of the operation result generation module 1002 may be integrated inside the preset operation interface function.
Optionally, the preset operation interface function may generate operation results corresponding to each set of data to be operated according to the operation result of the SIMD interface function corresponding to the mantissa operation and the corresponding second fixed point number.
Optionally, the preset operation interface function may generate operation results corresponding to each set of data to be operated according to an operation result of the SIMD interface function corresponding to the mantissa operation and an operation result of the SIMD interface function corresponding to the exponent operation.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 11. FIG. 11 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 11, the electronic device 111 includes one or more processors 1111 and memory 1112.
The processor 1111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 111 to perform desired functions.
Memory 1112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory, for example, may include: random Access Memory (RAM) and/or cache memory (cache), etc. The nonvolatile memory, for example, may include: read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1111 to implement the single instruction multiple data stream based data operation method of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 111 may further include an input device 1113 and an output device 1114, among other components, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 1113 may include, for example, a keyboard, a mouse, and the like. The output device 1114 may output various information to the outside, and may include, for example, a display, speakers, a printer, and a communication network and the remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 111 relevant to the present disclosure are shown in fig. 11, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 111 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in a single instruction multiple data stream based data operation method according to various embodiments of the present disclosure described in the above-mentioned "exemplary methods" section of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the single instruction multiple data stream based data operation method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "comprising," "including," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A data operation method based on single instruction multiple data (SIMD) streams, comprising:
generating an operation array according to at least one group of data to be operated participating in the same type of operation, wherein the group of data to be operated comprises: data to be operated on participating in the same operation in the same type of operation, the operation array including: at least one first fixed point number used for expressing sign bit and mantissa of the data to be operated on, and at least one second fixed point number used for expressing exponent of the data to be operated on;
calling, for the operations on mantissas and exponents involved in the same type of operation, a corresponding single instruction multiple data stream (SIMD) interface function, and determining input parameters of the SIMD interface function according to elements in the operation array;
and generating operation results corresponding to the at least one group of data to be operated according to the operation results of the SIMD interface function based on the input parameters.
2. The method according to claim 1, wherein the generating an operation array according to at least one group of data to be operated participating in the same type of operation comprises at least one of:
converting a sign bit and a mantissa in first data to be operated into a first fixed point number, and converting an exponent in the first data to be operated into a second fixed point number; wherein, the first data to be operated on comprises a mantissa and an exponent;
converting a plurality of sign bits and mantissas in second data to be operated on into first fixed point numbers respectively, and converting a plurality of exponents in the second data to be operated on into a shared second fixed point number; the second data to be operated on comprising: a plurality of mantissas and a plurality of exponents;
and converting each sign bit and mantissa in the plurality of first data to be operated into a first fixed point number respectively, and converting each exponent in the plurality of first data to be operated into a shared second fixed point number.
3. The method of claim 1 or 2, wherein the calling, for the operations on mantissas and exponents involved in the same type of operation, of corresponding single instruction multiple data stream (SIMD) interface functions, and the determining of input parameters of the SIMD interface functions according to elements in the operation array, comprise at least one of:
calling a first SIMD interface function corresponding to the operation aiming at the mantissa, and determining the input parameters of the first SIMD interface function according to the elements of the corresponding mantissa in the operation array;
calling a first SIMD interface function corresponding to the operation aiming at the mantissa, calling a second SIMD interface function corresponding to the operation aiming at the exponent, and respectively determining the input parameters of the first SIMD interface function and the second SIMD interface function according to the element corresponding to the mantissa and the element corresponding to the exponent in the operation array.
4. The method of any one of claims 1 to 3, wherein the calling, for the operations on mantissas and exponents involved in the same type of operation, of corresponding single instruction multiple data stream (SIMD) interface functions, and the determining of input parameters of the SIMD interface functions according to elements in the operation array, comprise:
calling a preset operation interface function from a preset operation library, and taking the operation array as an input parameter of the preset operation interface function;
calling, through the preset operation interface function, the SIMD interface functions corresponding to the operations on mantissas and exponents involved in the same type of operation, and determining input parameters of the called SIMD interface functions according to elements in the operation array;
wherein the operation library comprises a plurality of preset operation interface functions, and each preset operation interface function corresponds to one type of operation.
5. The method of claim 4, wherein the taking the operand array as an input parameter of the preset operation interface function comprises:
taking the operation array and the operation times corresponding to the operation array as input parameters of the preset operation interface function;
and the operation times are used for expressing the times of calling the SIMD interface function by the preset operation interface function.
6. The method according to claim 4 or 5, wherein the generating, according to the operation results of the SIMD interface function based on the input parameters, operation results corresponding to the at least one group of data to be operated on comprises:
generating, through the preset operation interface function and according to the operation results of the SIMD interface function based on the input parameters, operation results respectively corresponding to the at least one group of data to be operated on; and
obtaining the operation results corresponding to the at least one group of data to be operated on according to the output of the preset operation interface function.
7. The method according to any one of claims 1 to 6, wherein the generating operation results respectively corresponding to the at least one group of data to be operated according to the operation results of the SIMD interface function based on the input parameters comprises:
generating operation results corresponding to the at least one group of data to be operated respectively according to the operation result of the SIMD interface function corresponding to the mantissa operation and the second fixed point number; or
generating operation results respectively corresponding to the at least one group of data to be operated on according to the operation result of the SIMD interface function corresponding to the mantissa operation and the operation result of the SIMD interface function corresponding to the exponent operation.
8. A single instruction multiple data stream-based data operation apparatus, comprising:
the generating operation array module is used for generating an operation array according to at least one group of data to be operated participating in the same type of operation, wherein the group of data to be operated comprises: all data to be operated on participating in the same operation, the operation array comprises: at least one first fixed point number used for expressing sign bit and mantissa of the data to be operated on, and at least one second fixed point number used for expressing exponent of the data to be operated on;
the calling interface function module is used for calling a corresponding single instruction multiple data stream (SIMD)-based interface function according to the operations on mantissas and exponents involved in the operation, and determining input parameters of the SIMD-based interface function according to elements in the operation array generated by the generating operation array module;
and the operation result generation module is used for generating operation results corresponding to the at least one group of data to be operated according to the operation results of the SIMD-based interface functions called by the calling interface function module.
9. The apparatus of claim 8, wherein the generate operand set module comprises at least one of:
the first submodule is used for converting a sign bit and a mantissa in first data to be operated into a first fixed point number and converting an exponent in the first data to be operated into a second fixed point number; wherein, the first data to be operated on comprises a mantissa and an exponent;
the second sub-module is used for converting a plurality of sign bits and mantissas in second data to be operated on into first fixed point numbers respectively, and converting a plurality of exponents in the second data to be operated on into a shared second fixed point number; the second data to be operated on comprising: a plurality of mantissas and a plurality of exponents;
and the third submodule is used for converting each sign bit and mantissa in the plurality of first data to be operated into a first fixed point number respectively and converting each exponent in the plurality of first data to be operated into a shared second fixed point number.
10. The apparatus of claim 8 or 9, wherein the call interface function module comprises:
the operation library comprises a plurality of preset operation interface functions, and each preset operation interface function corresponds to one type of operation;
the calling submodule is used for calling a preset operation interface function from the operation library and taking the operation array as an input parameter of the preset operation interface function;
wherein the calling sub-module calls, through the preset operation interface function, the SIMD interface functions corresponding to the operations on mantissas and exponents involved in the same type of operation, and determines input parameters of the called SIMD interface functions according to elements in the operation array;
and the operation library comprises a plurality of preset operation interface functions, each preset operation interface function corresponding to one type of operation.
11. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-7.
12. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-7.
CN201910566415.8A 2019-06-27 2019-06-27 Data operation method, device, medium and equipment based on single-instruction multi-data stream Active CN112148371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910566415.8A CN112148371B (en) 2019-06-27 2019-06-27 Data operation method, device, medium and equipment based on single-instruction multi-data stream


Publications (2)

Publication Number Publication Date
CN112148371A true CN112148371A (en) 2020-12-29
CN112148371B CN112148371B (en) 2023-10-24



Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6341300B1 (en) * 1999-01-29 2002-01-22 Sun Microsystems, Inc. Parallel fixed point square root and reciprocal square root computation unit in a processor
JP2004078886A (en) * 2002-06-20 2004-03-11 Matsushita Electric Ind Co Ltd Floating point storing method and floating point operating device
US20060028482A1 (en) * 2004-08-04 2006-02-09 Nvidia Corporation Filtering unit for floating-point texture data
CN1993728A (en) * 2004-08-04 2007-07-04 辉达公司 Filtering unit for floating-point texture data
EP2057549A1 (en) * 2006-08-11 2009-05-13 Aspex Semiconductor Limited Improvements relating to direct data input/output interfaces
US20090164544A1 (en) * 2007-12-19 2009-06-25 Jeffrey Dobbek Dynamic range enhancement for arithmetic calculations in real-time control systems using fixed point hardware
CN101620589A (en) * 2008-06-30 2010-01-06 英特尔公司 Efficient parallel floating point exception handling in a processor
US20140208069A1 (en) * 2013-01-22 2014-07-24 Samplify Systems, Inc. Simd instructions for data compression and decompression
CN104166535A (en) * 2013-07-19 2014-11-26 郑州宇通客车股份有限公司 Fixed point processor and anti-overflow method thereof
CN104111816A (en) * 2014-06-25 2014-10-22 中国人民解放军国防科学技术大学 Multifunctional SIMD structure floating point fusion multiplying and adding arithmetic device in GPDSP
CN107077323A (en) * 2014-11-03 2017-08-18 Arm 有限公司 Use the apparatus and method of the data processing of programmable efficacy data
US20180225093A1 (en) * 2017-02-03 2018-08-09 Intel Corporation Implementing logarithmic and antilogarithmic operations based on piecewise linear approximation
CN106951211A (en) * 2017-03-27 2017-07-14 南京大学 Reconfigurable fixed-point and floating-point general-purpose multiplier
US20180322607A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Dynamic precision management for integer deep learning primitives
CN108459840A (en) * 2018-02-14 2018-08-28 中国科学院电子学研究所 SIMD-architecture floating-point fused dot-product operation unit
US20190042922A1 (en) * 2018-06-29 2019-02-07 Kamlesh Pillai Deep neural network architecture using piecewise linear approximation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yang Yi; Guo Li; Shi Hongsheng; Ji Jian: "Design of a Programmable Vertex Processor for Mobile Devices", Journal of University of Science and Technology of China *
Jiao Yong: "A Hardware Implementation of Single-Precision Floating-Point Logarithm Operation", Computer Knowledge and Technology *

Also Published As

Publication number Publication date
CN112148371B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
KR102447636B1 (en) Apparatus and method for performing arithmetic operations for accumulating floating point numbers
US9519460B1 (en) Universal single instruction multiple data multiplier and wide accumulator unit
EP1857925B1 (en) Method and apparatus for decimal number multiplication using hardware for binary number operations
CN112230881A (en) Floating-point number processor
US20200272419A1 (en) Compressing like-magnitude partial products in multiply accumulation
JPH10214176A (en) Device for quickly calculating transcendental function
CN107025091B (en) Binary fused multiply-add floating point calculation
US10649730B2 (en) Normalization of a product on a datapath
US11275561B2 (en) Mixed precision floating-point multiply-add operation
CN111290732B (en) Floating-point number multiplication circuit based on posit data format
CN110826706B (en) Data processing method and device for neural network
US9430190B2 (en) Fused multiply add pipeline
CN117472325B (en) Multiplication processor, operation processing method, chip and electronic equipment
CN117420982A (en) Chip comprising a fused multiply-accumulator, device and control method for data operations
CN112148371B (en) Data operation method, device, medium and equipment based on single-instruction multi-data stream
CN113126954A (en) Method and device for multiplication calculation of floating point number and arithmetic logic unit
US7747669B2 (en) Rounding of binary integers
CN111190910A (en) Quota resource processing method and device, electronic equipment and readable storage medium
JP2019101896A (en) Arithmetic processing unit and control method of arithmetic processing unit
CN116700666A (en) Floating point number processing method and device
US5661674A (en) Divide to integer
CN116700664A (en) Method and device for determining square root of floating point number
US20220291899A1 (en) Processing unit, method and computer program for multiplication
Gonzalez-Navarro et al. A binary integer decimal-based multiplier for decimal floating-point arithmetic
US20230289138A1 (en) Hardware device to execute instruction to convert input value from one data format to another data format

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant