CN112148371B - Data operation method, device, medium and equipment based on single-instruction multi-data stream - Google Patents


Info

Publication number
CN112148371B
CN112148371B (application CN201910566415.8A)
Authority
CN
China
Prior art keywords
data
operated
interface function
simd
point number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910566415.8A
Other languages
Chinese (zh)
Other versions
CN112148371A (en)
Inventor
陈亮 (Chen Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd filed Critical Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910566415.8A priority Critical patent/CN112148371B/en
Publication of CN112148371A publication Critical patent/CN112148371A/en
Application granted granted Critical
Publication of CN112148371B publication Critical patent/CN112148371B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3887Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]

Abstract

A data operation method, device, medium, and equipment based on single instruction multiple data (SIMD) streams are disclosed. The method comprises the following steps: generating an operation array from at least one group of data to be operated on that participates in the same type of operation, where one group of data to be operated on comprises the data participating in one and the same operation of that type, and the operation array comprises: at least one first fixed-point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed-point number representing the exponent of the data to be operated on; calling the SIMD interface functions corresponding to the mantissa and exponent operations involved in the same type of operation, and determining the input parameters of the SIMD interface functions from the elements of the operation array; and generating operation results corresponding to each group of data to be operated on from the results the SIMD interface functions return for those input parameters. The present disclosure helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point operations.

Description

Data operation method, device, medium and equipment based on single-instruction multi-data stream
Technical Field
The present disclosure relates to data processing technology, and more particularly, to a data operation method based on a single instruction multiple data stream, a data operation device based on a single instruction multiple data stream, a storage medium, and an electronic apparatus.
Background
SIMD (Single Instruction, Multiple Data) technology has been introduced into processor architectures such as the ARM (Advanced RISC Machine) v-series. Using SIMD technology, fixed-point numbers can be processed in parallel. How to implement operations on more data types on top of SIMD technology is a technical problem worth considering.
Disclosure of Invention
The present disclosure has been made in order to solve the above technical problems. The embodiment of the disclosure provides a data operation method, a data operation device, a storage medium and electronic equipment based on single-instruction multi-data streams.
According to an aspect of an embodiment of the present disclosure, there is provided a data operation method based on single instruction multiple data streams, the method including: generating an operation array from at least one group of data to be operated on that participates in the same type of operation, where a group of data to be operated on comprises the data participating in one and the same operation of that type, and the operation array comprises: at least one first fixed-point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed-point number representing the exponent of the data to be operated on; invoking the single instruction multiple data (SIMD) interface function corresponding to the mantissa and exponent operations involved in the same type of operation, and determining the input parameters of the SIMD interface function from the elements of the operation array; and generating operation results corresponding to the at least one group of data to be operated on, respectively, from the results the SIMD interface function returns for those input parameters.
According to another aspect of an embodiment of the present disclosure, there is provided a data operation apparatus based on single instruction multiple data streams, the apparatus including: an operation array generation module, configured to generate an operation array from at least one group of data to be operated on that participates in the same type of operation, where a group of data to be operated on comprises all data participating in one and the same operation, and the operation array comprises: at least one first fixed-point number representing the sign bit and mantissa of the data to be operated on, and at least one second fixed-point number representing the exponent of the data to be operated on; an interface function calling module, configured to call the corresponding SIMD-based interface function according to the mantissa and exponent operations involved in the operation, and to determine the input parameters of the SIMD-based interface function from the elements of the operation array generated by the operation array generation module; and an operation result generation module, configured to generate operation results corresponding to the at least one group of data to be operated on, respectively, from the results of the SIMD-based interface function called by the interface function calling module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described data operation method based on single instruction multiple data streams.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute them, so as to implement the above data operation method based on single instruction multiple data streams.
According to the data operation method and apparatus based on single instruction multiple data streams provided above, the sign bit, mantissa, and exponent of the data to be operated on are represented by a first fixed-point number and a second fixed-point number; the SIMD-based interface function is called according to the mantissa and exponent operations involved in the operation, and elements of the operation array supply its input parameters. A single call to the SIMD-based interface function can thus perform multiple operations in parallel, and the operation results of the data to be operated on can be assembled from the results of the SIMD-based interface function, thereby implementing operations on non-fixed-point numbers. The technical solution provided by the present disclosure therefore helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point operations.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of an embodiment of a data operation method based on single instruction multiple data streams of the present disclosure;
FIG. 2 is a schematic diagram of an example of a plurality of sets of data to be operated on in the same type of operation according to the present disclosure;
FIG. 3 is a schematic diagram of a single precision floating point number format;
FIG. 4 is a schematic diagram of one example of a first operand set generated by the present disclosure;
FIG. 5 is a schematic diagram of one example of a second operand set generated by the present disclosure;
FIG. 6 is a schematic diagram of another example of a first operand set generated by the present disclosure;
FIG. 7 is a schematic diagram of another example of a second operand set generated by the present disclosure;
FIG. 8 is a schematic diagram of yet another example of a first operand set generated by the present disclosure;
FIG. 9 is a schematic diagram of yet another example of a second operand set generated by the present disclosure;
FIG. 10 is a schematic diagram illustrating an embodiment of a data operation device based on single instruction multiple data streams according to the present disclosure;
fig. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the embodiments of the present disclosure may generally be understood as one or more, unless the context explicitly limits it or indicates otherwise.
In addition, the term "and/or" in this disclosure merely describes an association between objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" in this disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that, for convenience of description, the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure are applicable to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the disclosure
In implementing the present disclosure, the inventors found that parallel processing of fixed-point numbers (e.g., 16 bits) can be implemented using SIMD technology. However, besides fixed-point numbers there are data types such as half-precision floating-point numbers, single-precision floating-point numbers, and complex numbers, and SIMD cannot operate on these types directly. If operations on half-precision floating-point, single-precision floating-point, complex, and similar data types are implemented by software emulation, the operation speed is generally slow and real-time requirements are difficult to meet. Although the ARMv family of processor architectures can implement floating-point operations by introducing a VFP (Vector Floating-Point) coprocessor, using a VFP often requires certain conditions to be met. In one example, using the VFP requires that the following three conditions all hold:
Condition 1: the VFP is often an optional component that must be selected (enabled) at build time.
Condition 2: the embedded operating system must grant the related access rights.
Condition 3: a compiler of the corresponding version must be used.
The three conditions above must all be met before the VFP can be used, which may to some extent hinder practical applications of the VFP. In addition, for an existing device whose processor architecture has no VFP, floating-point operations cannot be implemented by way of the VFP at all.
Exemplary overview
Assuming that a data processor (such as an ARM11-series data processor) supports SIMD technology, the data processor can process fixed-point numbers in parallel using SIMD technology.
Although SIMD technology does not support operations on half-precision floating-point, single-precision floating-point, complex, and other such data types, the present disclosure can represent any such non-fixed-point data to be operated on with a first fixed-point number and a second fixed-point number, where the first fixed-point number represents the sign bit and mantissa of the data and the second fixed-point number represents its exponent. According to the first fixed-point number, the second fixed-point number, and the operation type, the corresponding SIMD interface function can be called to process the fixed-point numbers in parallel, and the operation result of the data to be operated on can then be obtained by recombining exponent and mantissa from the result of that parallel processing.
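The patent gives no source code for this decomposition, so the following Python sketch shows one plausible bit layout (an assumption, not the patent's exact one) for splitting a single-precision float into a 16-bit sign-plus-mantissa word and a 16-bit exponent word:

```python
import struct

def decompose(x: float):
    """Split a single-precision float into two 16-bit fixed-point numbers:
    the first carries the sign bit plus the top 15 mantissa bits, the
    second carries the 8-bit biased exponent (zero-extended to 16 bits).
    The layout is an illustrative assumption, not the patent's exact one."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF                # 23 mantissa bits
    first = (sign << 15) | (mantissa >> 8)    # keep the 15 most significant mantissa bits
    second = exponent                         # fits easily in 16 bits
    return first, second

f, s = decompose(1.0)
# 1.0 is 0x3F800000: sign 0, biased exponent 0x7F, mantissa 0
print(hex(f), hex(s))   # → 0x0 0x7f
```

Note the sketch is lossy: the 8 least significant mantissa bits are dropped to fit 15 bits, a trade-off a real implementation would have to weigh.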
Exemplary method
Fig. 1 is a flowchart of an embodiment of a data operation method based on single instruction multiple data streams in the present disclosure. As shown in fig. 1, the method of this embodiment includes the steps of: s100, S101, and S102.
S100, generating an operation array according to at least one group of data to be operated which participates in the same type of operation.
An operation in this disclosure generally refers to an algebraic operation, including but not limited to: addition, subtraction, multiplication, division, squaring, square root, reciprocal, logarithm, exponentiation, and the like.
A group of data to be operated on generally refers to all the data that takes part in one and the same operation. A group may contain one datum or two data; the number of data in a group is generally determined by the operation to be performed, i.e., by the number of participants (operands) of that operation. For example, for an addition, a group contains two data, both of which are addends, i.e., a first addend and a second addend. For a subtraction, a group contains two data, one of which is the minuend and the other the subtrahend. For a reciprocal operation, a group contains one datum.
Any data to be operated on in any group of the present disclosure may include, but is not limited to: fixed-point numbers (e.g., 16 bits), half-precision floating-point numbers (e.g., 16 bits), single-precision floating-point numbers (e.g., 32 bits), complex numbers (e.g., 64 bits), and the like.
The operation array comprises a plurality of elements, each of which is a fixed-point number. For example, one element in the operation array may be a first fixed-point number representing the sign bit and mantissa of a datum to be operated on, and the adjacent element may be the second fixed-point number representing that datum's exponent. There may be one operation array or several.
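As a sketch of this interleaved layout (the helper name `build_operand_array` and the pluggable `decompose` callback are illustrative assumptions, not the patent's API), each datum contributes its first fixed-point number followed by its second:

```python
def build_operand_array(values, decompose):
    """Interleave (first, second) fixed-point pairs into one flat
    operation array: [n1a, n1b, n3a, n3b, ...].  `decompose` is any
    function mapping a value to its (first, second) fixed-point pair."""
    arr = []
    for v in values:
        first, second = decompose(v)
        arr.extend([first, second])
    return arr

# Toy decomposition just to show the interleaving shape:
print(build_operand_array([1.0, 2.0], lambda v: (int(v), 0)))   # → [1, 0, 2, 0]
```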
S101, calling the SIMD interface function corresponding to the mantissa and exponent operations involved in the same type of operation, and determining the input parameters of the SIMD interface function from the elements of the operation array.
The mantissa and exponent operations involved in a given type of operation in this disclosure can be expressed as: whether to operate on the mantissas, and whether to operate on the exponents. That is, the type of the operation determines which mantissa and exponent operations it involves. For example, operations such as addition and subtraction must be performed on the mantissas but need not be performed on the exponents. For another example, operations such as multiplication, division, and reciprocal must be performed on both the mantissas and the exponents.
The present disclosure may preset, for each type of operation, the mantissa and exponent operations it involves and the corresponding SIMD interface functions. The input parameters of a SIMD interface function in this disclosure generally include at least one fixed-point number required by the corresponding operation.
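Since the patent does not name concrete SIMD interface functions, the following Python sketch uses a toy elementwise stand-in (`simd_op`, a hypothetical name) to illustrate the mapping just described: multiplication involves both mantissas and exponents, with mantissas multiplied and exponents added:

```python
def simd_op(a, b, op):
    """Toy stand-in for a SIMD interface function: applies one fixed-point
    operation elementwise across whole arrays in a single call, wrapping
    each result to 16 bits the way a 16-bit SIMD lane would."""
    return [op(x, y) & 0xFFFF for x, y in zip(a, b)]

# Multiplication operates on both parts: mantissas multiply, exponents add.
mants = simd_op([3, 5], [7, 9], lambda x, y: x * y)
exps = simd_op([2, 1], [4, 6], lambda x, y: x + y)
print(mants, exps)   # → [21, 45] [6, 7]
```

A real implementation would dispatch to hardware intrinsics rather than a Python loop; the point here is only that one call covers all lanes at once.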
S102, generating operation results respectively corresponding to the at least one group of data to be operated on, from the results the SIMD interface function returns for the input parameters.
After the SIMD interface function is called with its input parameters assigned, it performs the corresponding operation on those parameters and returns the corresponding operation result, i.e., the result for the given input parameters. The result may include at least one of: a result for the mantissas and a result for the exponents.
The present disclosure may process these returned results (for example, by combining them and restoring the original data type) to form the operation result corresponding to each group of data to be operated on. The present disclosure may also form the result for each group by processing the returned results together with elements of the operation array (again, for example, by combining them and restoring the data type).
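The combine-and-restore step can be sketched as the inverse of a plausible decomposition (the 16-bit layout here is an assumption, and a real implementation would also renormalize the mantissa after arithmetic):

```python
import struct

def recompose(first: int, second: int) -> float:
    """Rebuild a single-precision float from a 16-bit sign+mantissa word
    and a 16-bit exponent word.  Inverse of an illustrative decomposition
    that kept the top 15 mantissa bits, so it is lossy in general."""
    sign = (first >> 15) & 1
    mantissa = (first & 0x7FFF) << 8          # restore 23-bit mantissa field
    bits = (sign << 31) | ((second & 0xFF) << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(recompose(0x0000, 0x7F))   # → 1.0
print(recompose(0x8000, 0x80))   # → -2.0
```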
Because the sign bit, mantissa, and exponent of the data to be operated on are represented by the first and second fixed-point numbers, the SIMD-based interface function is called according to the mantissa and exponent operations involved, and elements of the operation array supply its input parameters, a single call to the SIMD-based interface function can perform multiple operations in parallel, and the operation results of the data to be operated on can be assembled from its results, thereby implementing operations on non-fixed-point numbers. The technical solution provided by the present disclosure therefore helps broaden the applicability of SIMD-based interface functions and improve the efficiency of non-fixed-point operations.
In an alternative example, the first fixed-point number of the present disclosure is 16 bits, with the most significant bit (i.e., the leftmost bit) being the sign bit and the remaining 15 bits occupied by the mantissa. The second fixed-point number is also 16 bits, all of which are occupied by the exponent.
In an alternative example, one example of multiple sets of data to be operated on that participate in the same type of operation of the present disclosure is shown in fig. 2.
In fig. 2, assume there are m+1 pieces of data to be operated on (m odd), all of which must undergo the same type of two-operand operation (fig. 2 is described below taking addition as an example). The m+1 data are usually non-fixed-point numbers of the same type, e.g., all single-precision floating-point numbers or all complex numbers; of course, they could also be non-fixed-point numbers of different types. The m+1 non-fixed-point numbers are: data to be operated on n1, n2, n3, n4, ..., nm, and nm+1. Data n1 and n2 form one group to be added, i.e., the first group; data n3 and n4 form a second group to be added; and so on, until data nm and nm+1 form the (m+1)/2-th group. Data n1, n3, ..., nm are all first addends; data n2, n4, ..., nm+1 are all second addends.
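The pairing of fig. 2, n1 with n2, n3 with n4, and so on, can be sketched as follows (`pair_for_binary_op` is an illustrative helper name):

```python
def pair_for_binary_op(data):
    """Group a flat list into (first addend, second addend) pairs as in
    fig. 2: n1 with n2, n3 with n4, ...  Requires an even count, which
    holds for m+1 items when m is odd."""
    assert len(data) % 2 == 0, "two-operand grouping needs an even count"
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]

print(pair_for_binary_op(["n1", "n2", "n3", "n4"]))
# → [('n1', 'n2'), ('n3', 'n4')]
```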
In one alternative example, the present disclosure may generate N operand arrays from all the groups of data that participate in the same type of operation, where N is the number of participants (operands) of the operation. This number is determined by the nature of the operation itself, not by the number of groups participating in it. For example, for operations such as addition, subtraction, multiplication, and division the number of participants is 2; for an operation such as the reciprocal, it is 1.
Optionally, if an operation has N participants, then when the N operation arrays are generated, the first and second fixed-point numbers corresponding to the first participant of every group are generally taken as the elements of one operation array, those corresponding to the second participant of every group as the elements of another operation array, and so on, with the N-th participants of every group forming the N-th operation array.
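This split of M groups into N per-participant arrays can be sketched as (`split_into_operand_arrays` is an illustrative name):

```python
def split_into_operand_arrays(groups, n_participants):
    """From M groups of N participants each, build N operand arrays:
    array k collects the k-th participant of every group, matching the
    layout the patent illustrates in figs. 4 and 5 for N = 2."""
    return [[g[k] for g in groups] for k in range(n_participants)]

groups = [("n1", "n2"), ("n3", "n4"), ("n5", "n6")]
print(split_into_operand_arrays(groups, 2))
# → [['n1', 'n3', 'n5'], ['n2', 'n4', 'n6']]
```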
For one example, for M groups of data participating in addition, since addition has two participants, the present disclosure may generate two operation arrays for the M groups: one corresponding to all first addends in the M groups, the other to all second addends.
For another example, for M groups participating in subtraction, which likewise has two participants, two operation arrays may be generated: one corresponding to all minuends in the M groups, the other to all subtrahends.
For yet another example, for M groups participating in multiplication, two operation arrays may be generated: one corresponding to all first multipliers in the M groups, the other to all second multipliers.
For yet another example, for M groups participating in division, two operation arrays may be generated: one corresponding to all dividends in the M groups, the other to all divisors.
For yet another example, for M groups participating in a reciprocal operation, since the reciprocal has a single participant, one operation array may be generated, corresponding to all the data in the M groups.
In one alternative example, the present disclosure may generate an operand array in different ways for different types of data to be operated on. The following is illustrative:
For a first example, for first data to be operated on that includes a mantissa and an exponent, the present disclosure may convert the sign bit and mantissa of the data into a first fixed-point number and the exponent into a second fixed-point number. For example, where the first data to be operated on is a half-precision floating-point number, its sign bit and mantissa may be converted into a first fixed-point number and its exponent into a second fixed-point number. For another example, where the first data to be operated on is a single-precision floating-point number, its sign bit (e.g., s in fig. 3) and mantissa (e.g., b1b2b3b4...b23 in fig. 3) may be converted into a first fixed-point number (16 bits), and its exponent (e.g., e1e2e3e4e5e6e7e8 in fig. 3) into a second fixed-point number (16 bits). After the first data to be operated on in every group participating in the same type of operation has been converted into a first and a second fixed-point number, the present disclosure can generate the corresponding number of operation arrays from all the first and second fixed-point numbers, according to the number of participants of that operation. For example, all first and second fixed-point numbers may be placed into a single operation array. For another example, the first and second fixed-point numbers of all first participants may form one operation array, and the first and second fixed-point numbers of all second participants another.
In connection with FIG. 2 and the first example described above, assuming that the m+1 pieces of data to be operated on in the present disclosure are all single-precision floating-point numbers, and all (m+1)/2 groups of data to be operated on need to be added, one example of the two operation arrays generated in the present disclosure is shown in FIGS. 4 and 5.
FIG. 4 shows one of the two operation arrays, which may be referred to as a first operation array. The first operation array includes: the first fixed-point number and the second fixed-point number corresponding to the first addend in each group of data to be operated on. For example, element 1 (i.e., first fixed-point number n1a) and element 2 (i.e., second fixed-point number n1b) in FIG. 4 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on n1 in FIG. 2; element 3 (i.e., first fixed-point number n3a) and element 4 (i.e., second fixed-point number n3b) in FIG. 4 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on n3 in FIG. 2; similarly, element m (i.e., first fixed-point number nma) and element m+1 (i.e., second fixed-point number nmb) in FIG. 4 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on nm in FIG. 2.
FIG. 5 shows the other of the two operation arrays, which may be referred to as a second operation array. The second operation array includes: the first fixed-point number and the second fixed-point number corresponding to the second addend in each group of data to be operated on. For example, element 1 (i.e., first fixed-point number n2a) and element 2 (i.e., second fixed-point number n2b) in FIG. 5 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on n2 in FIG. 2; element 3 (i.e., first fixed-point number n4a) and element 4 (i.e., second fixed-point number n4b) in FIG. 5 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on n4 in FIG. 2; similarly, element m (i.e., first fixed-point number nm+1a) and element m+1 (i.e., second fixed-point number nm+1b) in FIG. 5 are the first fixed-point number and second fixed-point number corresponding to the data to be operated on nm+1 in FIG. 2.
It should be noted that the arrangement order of the elements in FIGS. 4 and 5 may be flexibly set according to actual needs. For example, all the first fixed-point numbers in FIGS. 4 and 5 may be arranged in group order and placed before all the second fixed-point numbers, which are likewise arranged in group order. In addition, in the case where a group of data to be operated on includes a plurality of pieces of data to be operated on (for example, two pieces), and the operation type does not require the exponent portion to participate in the operation, the second fixed-point numbers corresponding to the plurality of pieces of data to be operated on in that group may be the same. Similar cases below will not be described again.
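Assuming the pairwise ordering shown in FIGS. 4 and 5 (each addend contributing its first and second fixed-point numbers in group order), the two operation arrays could be assembled as in the following sketch. The helper name and tuple layout are illustrative assumptions, not the disclosure's implementation.

```python
def build_operation_arrays(groups):
    """Build the first and second operation arrays of FIGS. 4 and 5.

    Each group is a pair (first_addend, second_addend), where each addend is
    already converted into its (first fixed-point, second fixed-point) pair."""
    first_array, second_array = [], []
    for first_addend, second_addend in groups:
        first_array.extend(first_addend)    # e.g. (n1a, n1b)
        second_array.extend(second_addend)  # e.g. (n2a, n2b)
    return first_array, second_array
```

With two groups ((n1a, n1b), (n2a, n2b)) and ((n3a, n3b), (n4a, n4b)), this yields the first array [n1a, n1b, n3a, n3b] and the second array [n2a, n2b, n4a, n4b], matching the element order described above.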
By converting first data to be operated on, such as a half-precision or single-precision floating-point number, into a first fixed-point number and a second fixed-point number, the present disclosure enables half-precision and single-precision floating-point numbers to be operated on in parallel using SIMD technology, thereby improving the compatibility of the operation.
In the second example, for second data to be operated on that includes a plurality of mantissas and a plurality of exponents, the present disclosure may convert the plurality of sign bits and the plurality of mantissas in the second data to be operated on into respective first fixed-point numbers, and convert the plurality of exponents in the second data to be operated on into one shared second fixed-point number. For example, the second data to be operated on may be a complex number whose real part and imaginary part are each a single-precision floating-point number, where each single-precision floating-point number includes a sign bit, a mantissa, and an exponent. In the case where the exponents of the real-part and imaginary-part floating-point numbers differ, the present disclosure may give the two floating-point numbers the same exponent by adjusting the mantissa of one of them. The present disclosure may then convert the sign bit and mantissa of the real-part floating-point number into one first fixed-point number, convert the sign bit and mantissa of the imaginary-part floating-point number into another first fixed-point number, and convert the exponents of the two floating-point numbers into one shared second fixed-point number.
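The mantissa adjustment that gives the real and imaginary parts a common exponent might be sketched in Python as follows. The 15-bit mantissa width, the use of `math.frexp`, and the function name are illustrative assumptions; the disclosure does not specify this implementation.

```python
import math

def complex_to_shared_exponent(z, mant_bits=15):
    """Convert a complex number into two signed fixed-point mantissas (real,
    imaginary) that share one exponent, by right-shifting the mantissa of the
    part whose own exponent is smaller."""
    def split(v):
        m, e = math.frexp(v)            # v == m * 2**e with 0.5 <= |m| < 1
        return int(m * (1 << mant_bits)), e
    m_re, e_re = split(z.real)
    m_im, e_im = split(z.imag)
    shared = max(e_re, e_im)
    m_re >>= shared - e_re              # drop low bits so exponents match
    m_im >>= shared - e_im
    return m_re, m_im, shared
```

For example, 3 + 1j becomes the mantissas (24576, 8192) with shared exponent 2, and each part reconstructs exactly: 24576/2^15 * 2^2 = 3.0 and 8192/2^15 * 2^2 = 1.0.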
After all second data to be operated on in all groups of data to be operated on participating in the same type of operation are each converted into two first fixed-point numbers and one shared second fixed-point number, the present disclosure may generate a corresponding number of operation arrays from all the first fixed-point numbers and second fixed-point numbers according to the number of participants in the same type of operation. For example, all first fixed-point numbers and all shared second fixed-point numbers are generated into one operation array. For another example, all the first fixed-point numbers and all the shared second fixed-point numbers of the first participants are generated into one operation array, and all the first fixed-point numbers and all the shared second fixed-point numbers of the second participants are generated into another operation array.
In connection with FIG. 2 and the second example described above, assuming that the m+1 pieces of data to be operated on in the present disclosure are all complex numbers, and all (m+1)/2 groups of data to be operated on need to be added, one example of the two operation arrays generated in the present disclosure is shown in FIGS. 6 and 7.
FIG. 6 shows one of the two operation arrays, which may be referred to as a first operation array. The first operation array includes: the first fixed-point number corresponding to the real part of the first addend in each group of data to be operated on, the first fixed-point number corresponding to the imaginary part of that first addend, and the shared second fixed-point number corresponding to the real part and imaginary part of that first addend. For example, element 1 (i.e., first fixed-point number n1a) in FIG. 6 is the first fixed-point number corresponding to the real part of the data to be operated on n1 in FIG. 2, element 2 (i.e., first fixed-point number n1b) in FIG. 6 is the first fixed-point number corresponding to the imaginary part of the data to be operated on n1 in FIG. 2, and element 3 (i.e., second fixed-point number n1c) in FIG. 6 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on n1 in FIG. 2; element 4 (i.e., first fixed-point number n3a) in FIG. 6 is the first fixed-point number corresponding to the real part of the data to be operated on n3 in FIG. 2, element 5 (i.e., first fixed-point number n3b) in FIG. 6 is the first fixed-point number corresponding to the imaginary part of the data to be operated on n3 in FIG. 2, and element 6 (i.e., second fixed-point number n3c) in FIG. 6 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on n3 in FIG. 2; and so on, until element (3m-1)/2 (i.e., first fixed-point number nma) in FIG. 6 is the first fixed-point number corresponding to the real part of the data to be operated on nm in FIG. 2, element (3m+1)/2 (i.e., first fixed-point number nmb) in FIG. 6 is the first fixed-point number corresponding to the imaginary part of the data to be operated on nm in FIG. 2, and element (3m+3)/2 (i.e., second fixed-point number nmc) in FIG. 6 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on nm in FIG. 2.
FIG. 7 shows the other of the two operation arrays, which may be referred to as a second operation array. The second operation array may include: the first fixed-point number corresponding to the real part of the second addend in each group of data to be operated on, the first fixed-point number corresponding to the imaginary part of that second addend, and the shared second fixed-point number corresponding to the real part and imaginary part of that second addend. For example, element 1 (i.e., first fixed-point number n2a) in FIG. 7 is the first fixed-point number corresponding to the real part of the data to be operated on n2 in FIG. 2, element 2 (i.e., first fixed-point number n2b) in FIG. 7 is the first fixed-point number corresponding to the imaginary part of the data to be operated on n2 in FIG. 2, and element 3 (i.e., second fixed-point number n2c) in FIG. 7 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on n2 in FIG. 2; element 4 (i.e., first fixed-point number n4a) in FIG. 7 is the first fixed-point number corresponding to the real part of the data to be operated on n4 in FIG. 2, element 5 (i.e., first fixed-point number n4b) in FIG. 7 is the first fixed-point number corresponding to the imaginary part of the data to be operated on n4 in FIG. 2, and element 6 (i.e., second fixed-point number n4c) in FIG. 7 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on n4 in FIG. 2; and so on, until element (3m-1)/2 (i.e., first fixed-point number nm+1a) in FIG. 7 is the first fixed-point number corresponding to the real part of the data to be operated on nm+1 in FIG. 2, element (3m+1)/2 (i.e., first fixed-point number nm+1b) in FIG. 7 is the first fixed-point number corresponding to the imaginary part of the data to be operated on nm+1 in FIG. 2, and element (3m+3)/2 (i.e., second fixed-point number nm+1c) in FIG. 7 is the shared second fixed-point number corresponding to the real part and imaginary part of the data to be operated on nm+1 in FIG. 2.
It should be specifically noted that the present disclosure may alternatively generate one shared second fixed-point number for the exponents of all real parts of the complex numbers and another shared second fixed-point number for the exponents of all imaginary parts, or generate one shared second fixed-point number for the exponents of all real parts and all imaginary parts together. These cases will not be described in detail here. In addition, the arrangement order of the elements in FIGS. 6 and 7 may be flexibly set according to actual needs. For example, all the first fixed-point numbers in FIGS. 6 and 7 may be arranged in group order, with the first fixed-point number corresponding to the real part of each piece of data to be operated on placed before the first fixed-point number corresponding to its imaginary part, and all the first fixed-point numbers placed before all the second fixed-point numbers, which are likewise arranged in group order.
By converting second data to be operated on into a plurality of first fixed-point numbers and one shared second fixed-point number, the present disclosure can use SIMD technology to process the second data to be operated on in parallel, thereby improving the compatibility of the operation and helping to reduce memory consumption during the operation.
In the third example, for a plurality of first data to be operated on, the present disclosure may convert all the sign bits and mantissas in the plurality of first data to be operated on into respective first fixed-point numbers, and convert all the exponents in the plurality of first data to be operated on into one shared second fixed-point number. For example, assuming that the plurality of first data to be operated on are all half-precision floating-point numbers, in the case where the exponents of the half-precision floating-point numbers are not all identical, the present disclosure may give all the half-precision floating-point numbers the same exponent by adjusting the mantissa of at least one of them. The present disclosure may then convert the sign bit and mantissa of each half-precision floating-point number into one first fixed-point number, thereby obtaining a plurality of first fixed-point numbers, and convert the exponents of all the half-precision floating-point numbers into one shared second fixed-point number.
For another example, assuming that the plurality of first data to be operated on are all single-precision floating-point numbers, in the case where the exponents of the single-precision floating-point numbers are not all identical, the present disclosure may give all the single-precision floating-point numbers the same exponent by adjusting the mantissa of at least one of them. The present disclosure may then convert the sign bit and mantissa of each single-precision floating-point number into one first fixed-point number, thereby obtaining a plurality of first fixed-point numbers, and convert the exponents of all the single-precision floating-point numbers into one shared second fixed-point number. After all first data to be operated on in all groups of data to be operated on participating in the same type of operation are converted into first fixed-point numbers and their corresponding shared second fixed-point number, the present disclosure may generate a corresponding number of operation arrays from all the first fixed-point numbers and shared second fixed-point numbers according to the number of participants in the same type of operation. For example, all first fixed-point numbers and one shared second fixed-point number are generated into one operation array. For another example, all the first fixed-point numbers and the shared second fixed-point number of the first participants are generated into one operation array, and all the first fixed-point numbers and the shared second fixed-point number of the second participants are generated into another operation array.
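Extending the same idea to a whole list of floating-point numbers, the shared-exponent conversion of this third example might look like the following sketch. The 15-bit mantissa width and the choice of the largest individual exponent as the shared exponent are illustrative assumptions.

```python
import math

def to_shared_exponent(values, mant_bits=15):
    """Represent a list of floats as fixed-point mantissas plus ONE shared
    exponent (the largest individual exponent); mantissas of values with a
    smaller exponent are right-shifted, losing low-order bits."""
    pairs = [math.frexp(v) for v in values]   # each v == m * 2**e
    shared = max(e for _, e in pairs)
    mantissas = [int(m * (1 << mant_bits)) >> (shared - e) for m, e in pairs]
    return mantissas, shared
```

For example, [1.0, 0.5] becomes the mantissas [16384, 8192] with shared exponent 1, and both values reconstruct exactly from mantissa/2^15 * 2^shared.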
In connection with FIG. 2 and the third example described above, assuming that the m+1 pieces of data to be operated on in the present disclosure are all single-precision floating-point numbers, and all (m+1)/2 groups of data to be operated on need to be added, one example of the two operation arrays generated in the present disclosure is shown in FIGS. 8 and 9.
FIG. 8 shows one of the two operation arrays, which may be referred to as a first operation array. The first operation array may include: the first fixed-point numbers respectively corresponding to the first addends in each group of data to be operated on, and one shared second fixed-point number corresponding to all the first addends. For example, element 1 (i.e., first fixed-point number n1a) in FIG. 8 is the first fixed-point number corresponding to the data to be operated on n1 in FIG. 2, and element 2 (i.e., first fixed-point number n3a) is the first fixed-point number corresponding to the data to be operated on n3 in FIG. 2; and so on, until element (m+1)/2 (i.e., first fixed-point number nma) in FIG. 8 is the first fixed-point number corresponding to the data to be operated on nm in FIG. 2. The last element, (m+3)/2 (i.e., second fixed-point number nb1), in the first operation array in FIG. 8 is the shared second fixed-point number corresponding to all the first addends in FIG. 2.
FIG. 9 shows the other of the two operation arrays, which may be referred to as a second operation array. The second operation array may include: the first fixed-point numbers respectively corresponding to the second addends in each group of data to be operated on, and one shared second fixed-point number corresponding to all the second addends. For example, element 1 (i.e., first fixed-point number n2a) in FIG. 9 is the first fixed-point number corresponding to the data to be operated on n2 in FIG. 2, and element 2 (i.e., first fixed-point number n4a) in FIG. 9 is the first fixed-point number corresponding to the data to be operated on n4 in FIG. 2; and so on, until element (m+1)/2 (i.e., first fixed-point number nm+1a) in FIG. 9 is the first fixed-point number corresponding to the data to be operated on nm+1 in FIG. 2. The last element, (m+3)/2 (i.e., second fixed-point number nb1), in the second operation array in FIG. 9 is the shared second fixed-point number corresponding to all the second addends in FIG. 2.
It should be noted that the arrangement order of the elements in FIGS. 8 and 9 may be flexibly set according to actual needs. For example, the second fixed-point number in FIGS. 8 and 9 may be arranged before all the first fixed-point numbers. For another example, one of the first operation array and the second operation array in the present disclosure may omit the second fixed-point number; that is, in the case where all the data to be operated on have the same shared second fixed-point number, the present disclosure may include the shared second fixed-point number in all the operation arrays, or include it in one operation array and omit it from the others.
By converting a plurality of first data to be operated on, such as half-precision or single-precision floating-point numbers, into a plurality of first fixed-point numbers and one shared second fixed-point number, the present disclosure enables such data to be operated on in parallel using SIMD technology, which improves the compatibility of the operation and helps to significantly reduce memory consumption during the operation.
In an alternative example, the operations for the mantissa and the exponent in the present disclosure may include: performing an operation on the mantissa while keeping the exponent unchanged. In this case, the present disclosure may determine a first SIMD interface function corresponding to the operation on the mantissa, and determine the input parameters of the first SIMD interface function from the elements corresponding to mantissas in the operation array.
In another alternative example, the operations for the mantissa and the exponent may include: performing operations on both the mantissa and the exponent. In this case, the present disclosure may determine a first SIMD interface function corresponding to the operation on the mantissa and a second SIMD interface function corresponding to the operation on the exponent, determine the input parameters of the first SIMD interface function from the elements corresponding to mantissas in the operation array, and determine the input parameters of the second SIMD interface function from the elements corresponding to exponents in the operation array.
Optionally, the first SIMD interface function and the second SIMD interface function in the present disclosure are both interface functions provided by SIMD technology. They may be the same or different, depending on whether the operation on the mantissa and the operation on the exponent are the same. A SIMD interface function in the present disclosure may also be referred to as a SIMD instruction, a SIMD function, or the like.
By invoking the corresponding SIMD interface function according to the operations for the mantissa and the exponent, the present disclosure can operate on the mantissa alone or on the mantissa and the exponent separately, so that half-precision floating-point numbers, single-precision floating-point numbers, complex numbers, and other data to be operated on can be processed in parallel using SIMD technology, thereby improving the compatibility of the operation. In addition, in the case where the operations for the mantissa and the exponent consist of operating on the mantissa while keeping the exponent unchanged, a single call to the SIMD interface function can carry out the operation on two groups of mantissas, which helps to improve the efficiency of operations on non-fixed-point numbers.
In an alternative example, the present disclosure may preset an operation library, which generally includes a plurality of preset operation interface functions, each of which may correspond to one operation on the operation arrays. That is, a preset operation interface function may represent the processing manner adopted for each element of the operation arrays. The operations of invoking the SIMD interface function and setting its input parameters in the present disclosure may be implemented by a preset operation interface function. For example, a preset operation interface function is first called from the preset operation library, and the obtained operation arrays (such as the first operation array and the second operation array in FIGS. 4 to 9) are used as its input parameters. After the called preset operation interface function obtains its input parameters, the operations it performs according to them include, but are not limited to: invoking the SIMD interface functions corresponding to the operations for the mantissa and the exponent involved in the same type of operation, and determining the input parameters of the invoked SIMD interface functions from the elements in the operation arrays.
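A minimal sketch of such a preset operation interface function is shown below, with a pure-Python stand-in for the SIMD call that adds two 16-bit lanes per 32-bit word. The library keys, function names, and lane layout are all assumptions for illustration; the disclosure does not specify them.

```python
def simd_add16(a_words, b_words):
    """Stand-in for a SIMD addition: each 32-bit word carries two independent
    16-bit fixed-point lanes that are added lane by lane."""
    out = []
    for a, b in zip(a_words, b_words):
        hi = ((a >> 16) + (b >> 16)) & 0xFFFF
        lo = (a + b) & 0xFFFF
        out.append((hi << 16) | lo)
    return out

# Operation library: (data type, operation type) -> SIMD routine.
OP_LIBRARY = {
    ("float32", "add"): simd_add16,
}

def preset_op(dtype, op, first_array, second_array, n_calls):
    """Preset operation interface function: selects the SIMD routine by data
    type and operation type, forwards the two operation arrays, and uses
    n_calls as the number of SIMD invocations (the first input parameter)."""
    routine = OP_LIBRARY[(dtype, op)]
    return routine(first_array[:n_calls], second_array[:n_calls])
```

Looking up the routine by (data type, operation type) mirrors the text above: one library entry per combination, or one shared entry whose input parameters carry the data type.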
Optionally, in the case where the ith element in the first operation array and the ith element in the second operation array are operated on together, the preset operation interface function in the present disclosure can conveniently and quickly obtain the elements to be operated on from the two operation arrays given as its input parameters, and set the input parameters of the SIMD interface function accordingly.
Alternatively, when a preset operation interface function is called from the preset operation library, the type of the data to be operated on and the operation type may be considered. That is, when setting up the preset operation interface functions in the operation library, the type of the data to be operated on and the operation type may be considered. For example, one preset operation interface function may be set for the addition of single-precision floating-point numbers, and another for the addition of complex numbers. If the same preset operation interface function is set for both the addition of single-precision floating-point numbers and the addition of complex numbers, its input parameters should generally indicate the data type of the data to be operated on (such as the single-precision floating-point type or the complex type), so that the preset operation interface function can perform the subsequent operations (such as the invoking operation and the combination-and-restoration operation).
In one alternative example, the input parameters of the SIMD interface function include two 32-bit values. In the case where the operation type involves two participants, the present disclosure may pack two first fixed-point numbers to be operated on (or two second fixed-point numbers to be operated on) into one 32-bit value, and another two first fixed-point numbers (or another two second fixed-point numbers) into the other 32-bit value, as the input parameters of the SIMD interface function, so that the SIMD interface function can perform two operations at once. In the case where the operation type involves one participant, the present disclosure may pack a first fixed-point number and a second fixed-point number to be operated on into one 32-bit value, and another first fixed-point number and second fixed-point number into the other 32-bit value, as the input parameters of the SIMD interface function, so that the SIMD interface function can likewise perform two operations at once.
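The packing of two 16-bit fixed-point numbers into one 32-bit input parameter can be sketched as follows. A single 32-bit addition then performs two 16-bit additions at once, provided neither lane overflows into its neighbour; this sketch does not enforce that assumption.

```python
def pack2x16(hi, lo):
    """Pack two 16-bit fixed-point numbers into one 32-bit input parameter."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2x16(word):
    """Recover the two 16-bit lanes from a 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

# One 32-bit addition carries out two 16-bit additions at once
# (no lane overflows here, so no carry crosses the lane boundary).
w = pack2x16(100, 200) + pack2x16(7, 8)
```

Here `unpack2x16(w)` yields (107, 208): two operations for one machine addition, which is the effect the text above describes.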
By setting up the operation library and implementing the corresponding operations by calling its preset operation interface functions, the present disclosure modularizes the operations on multiple groups of data to be operated on, which improves the maintainability of the technical solution of the present disclosure.
In an alternative example, when setting the input parameters of the preset operation interface function, the present disclosure may use, in addition to the operation arrays, the number of operations corresponding to the operation arrays as an input parameter of the preset operation interface function. The number of operations represents the number of times the preset operation interface function invokes the SIMD interface function, and may be used as the first input parameter of the preset operation interface function.
Using the number of operations corresponding to the operation arrays as an additional input parameter makes the preset operation interface function's processing of the operation arrays more explicit, which helps to improve the maintainability of the preset operation interface function.
In an optional example, the operation of generating the operation results corresponding to the at least one group of data to be operated on from the results computed by the SIMD interface function based on its input parameters may also be implemented by the preset operation interface function. That is, after the preset operation interface function in the present disclosure invokes the SIMD interface function and sets its input parameters, the SIMD interface function performs the corresponding operation according to those input parameters and returns an operation result. After obtaining the result returned by the SIMD interface function, the preset operation interface function may generate the operation results corresponding to each group of data to be operated on according to a combination-and-restoration manner preset for the result, and output them, so that the present disclosure can obtain the operation results corresponding to each group of data to be operated on from the output of the preset operation interface function. It should be noted that, when obtaining the operation result returned by the SIMD interface function, 1 bit should be reserved for the sign bit in the operation result.
By setting up the operation library and implementing the corresponding operations by calling its preset operation interface functions, the present disclosure modularizes the operations on multiple groups of data to be operated on, which improves the maintainability and applicability of the technical solution of the present disclosure.
In an alternative example, in the case where the operations for the mantissa and the exponent consist of operating on the mantissa while keeping the exponent unchanged, the present disclosure may process (e.g., combine and restore) the operation results of the SIMD interface function corresponding to the mantissa operation together with the previously generated second fixed-point numbers, so as to form the operation results corresponding to each group of data to be operated on.
With reference to FIGS. 4 and 5, in the case where all data to be operated on are added, the present disclosure may obtain (m+1)/2 operation results, each being the result of operating on the mantissas of the participants. Each such result may be combined and restored with the corresponding second fixed-point number to form a single-precision floating-point number, thereby implementing the addition of single-precision floating-point numbers. Moreover, the present disclosure can implement two single-precision floating-point additions with a single call to the SIMD interface function.
In conjunction with FIGS. 6 and 7, in the case where all data to be operated on are added, the present disclosure may obtain (m+1) operation results, including the operation results of the real-part mantissas and the operation results of the imaginary-part mantissas. The present disclosure may combine and restore the operation result of a real-part mantissa, the operation result of the corresponding imaginary-part mantissa, and the corresponding second fixed-point number to form a complex number, thereby implementing the addition of complex numbers. Moreover, the present disclosure can implement one complex addition with a single call to the SIMD interface function.
With reference to FIGS. 8 and 9, in the case where all data to be operated on are added, the present disclosure may obtain (m+1)/2 operation results, each being the result of operating on the mantissas of the participants. Each such result may be combined and restored with the corresponding shared second fixed-point number to form a single-precision floating-point number, thereby implementing the addition of single-precision floating-point numbers. Moreover, the present disclosure can implement two single-precision floating-point additions with a single call to the SIMD interface function.
In an alternative example, in the case where the operations for the mantissa and the exponent consist of operating on both the mantissa and the exponent, the present disclosure may generate the operation results corresponding to each group of data to be operated on from the operation results of the SIMD interface function corresponding to the mantissa operation and the operation results of the SIMD interface function corresponding to the exponent operation. For example, multiplication requires operating on both the mantissa and the exponent, where the mantissa operation result may be a 16-bit fixed-point number and the exponent operation result may be a 16-bit fixed-point number. The present disclosure may combine and restore these two 16-bit fixed-point numbers to form a single-precision floating-point number, thereby implementing single-precision floating-point multiplication. The present disclosure may likewise implement complex multiplication by combining and restoring four 16-bit fixed-point numbers to form a complex number. The process of forming the operation results of the data to be operated on for other operations may refer to the above description and is not detailed here.
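For multiplication, the combination-and-restoration step (mantissas multiply, exponents add, the two pieces recombine into a floating-point value) might be sketched as follows. The 15-bit mantissa convention and the function name are illustrative assumptions, not the disclosure's specified implementation.

```python
import math

def fixed_multiply(m1, e1, m2, e2, mant_bits=15):
    """Multiply two values given as (fixed-point mantissa, exponent) pairs:
    the mantissa operation multiplies and renormalizes to mant_bits, the
    exponent operation adds, and the two pieces recombine into a float."""
    m = (m1 * m2) >> mant_bits   # mantissa operation result
    e = e1 + e2                  # exponent operation result
    return math.ldexp(m / (1 << mant_bits), e)
```

With 3.0 encoded as the pair (24576, 2) and 0.5 as (16384, 0), `fixed_multiply(24576, 2, 16384, 0)` returns 1.5, i.e., the recombined product.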
By combining and restoring the operation results according to the operations for mantissas and exponents, the present disclosure can implement operations on non-fixed-point numbers, which helps broaden the application range of the SIMD-based interface functions and improve the efficiency of operations on non-fixed-point numbers.
Exemplary apparatus
Fig. 10 is a schematic diagram of a data operation device based on a single instruction multiple data stream according to an embodiment of the present disclosure. The apparatus of this embodiment may be used to implement the method embodiments of the present disclosure described above.
In fig. 10, the apparatus of this embodiment includes: a generate operation array module 1000, a call interface function module 1001, and a generate operation result module 1002.
The generating operation array module 1000 is mainly used for generating an operation array according to at least one group of data to be operated which participates in the same type of operation. Wherein, a group of data to be operated on includes: all data to be operated on participating in the same operation, wherein the operation array comprises: at least one first fixed-point number for representing sign bits and mantissas of data to be operated on, and at least one second fixed-point number for representing exponents of data to be operated on.
Optionally, generating the operand set module 1000 may include: at least one of the first sub-module, the second sub-module, and the third sub-module.
The first submodule is used for converting sign bits and mantissas in first data to be operated into a first fixed point number and converting exponents in the first data to be operated into a second fixed point number. The first data to be operated on includes a mantissa and an exponent. The first data to be operated can be a half-precision floating point number or a single-precision floating point number. For example, the first sub-module converts the mantissa of the half-precision floating point number in the data to be operated into a first fixed point number, and converts the exponent of the half-precision floating point number into a second fixed point number. For another example, the first sub-module converts the mantissa of the single-precision floating point number in the data to be operated into a first fixed point number, and converts the exponent of the single-precision floating point number into a second fixed point number.
The second submodule is used for respectively converting a plurality of sign bits and mantissas in second data to be operated into first fixed point numbers and converting a plurality of exponents in the second data to be operated into a shared second fixed point number. The second data to be operated includes: a plurality of mantissas and a plurality of exponents. The second data to be operated on may be complex, etc. For example, the second sub-module may convert the complex number in the data to be operated on into a first fixed point number corresponding to the mantissa of the real part of the complex number, a first fixed point number corresponding to the mantissa of the imaginary part of the complex number, and a shared second fixed point number corresponding to the exponent of the real part and the exponent of the imaginary part of the complex number.
The third sub-module may be configured to convert each sign bit and mantissa in the first plurality of data to be operated into a first fixed point number, and convert each exponent in the first plurality of data to be operated into a shared second fixed point number. The first data to be operated on includes a mantissa and an exponent. The first data to be operated can be a half-precision floating point number or a single-precision floating point number. For example, the third sub-module may convert a set of half-precision floating-point numbers in the data to be operated on into a set of first fixed-point numbers corresponding to mantissas of the set of half-precision floating-point numbers and a shared second fixed-point number corresponding to exponents of the set of half-precision floating-point numbers. For another example, the third sub-module may convert a set of single precision floating point numbers in the operands to be operated upon into a set of first fixed point numbers corresponding to mantissas of the set of single precision floating point numbers and a shared second fixed point number corresponding to exponents of the set of single precision floating point numbers.
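The three sub-modules above can be sketched as follows. This is a hedged illustration: the function names and the choice of the larger exponent as the shared second fixed point number are assumptions, not mandated by the disclosure.

```python
import math

def convert_scalar(v, frac_bits=10):
    """First sub-module: one float -> one first fixed point number
    (sign and mantissa) and one second fixed point number (exponent)."""
    m, e = math.frexp(v)
    return round(m * 2 ** frac_bits), e

def convert_complex(z, frac_bits=10):
    """Second sub-module: one complex -> two first fixed point numbers
    (real and imaginary mantissas) and one shared second fixed point
    number covering both exponents."""
    shared = max(math.frexp(z.real)[1], math.frexp(z.imag)[1])
    return (round(z.real * 2 ** (frac_bits - shared)),
            round(z.imag * 2 ** (frac_bits - shared)),
            shared)

def convert_group(values, frac_bits=10):
    """Third sub-module: a group of floats -> a group of first fixed
    point numbers and one shared second fixed point number."""
    shared = max(math.frexp(v)[1] for v in values)
    return [round(v * 2 ** (frac_bits - shared)) for v in values], shared
```

For example, `convert_group([1.0, 4.0])` aligns both mantissas to the larger exponent, so the group can be handed to a SIMD interface function as plain fixed point lanes.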
The calling interface function module 1001 is configured to call the corresponding SIMD-based interface function according to the operations for mantissas and exponents involved in the operation, and to determine the input parameters of the SIMD-based interface function according to the elements in the operation array generated by the generating operation array module 1000.
Alternatively, the call interface function module 1001 may call a first SIMD interface function corresponding to an operation on a mantissa, and determine an input parameter of the first SIMD interface function according to an element of the corresponding mantissa in the operation array.
Alternatively, the call interface function module 1001 may call a first SIMD interface function corresponding to an operation for a mantissa and call a second SIMD interface function corresponding to an operation for an exponent, and determine input parameters of the first SIMD interface function and the second SIMD interface function according to an element of the corresponding mantissa and an element of the corresponding exponent in the operation array, respectively.
Optionally, the calling interface function module 1001 may include: an operation library and a calling sub-module. The operation library includes a plurality of preset operation interface functions, each of which corresponds to one type of operation. The calling sub-module may call a preset operation interface function from the operation library and take the operation array as an input parameter of the preset operation interface function, so that the preset operation interface function performs the steps of calling the SIMD interface function corresponding to the operations for mantissa and exponent involved in the same type of operation, and determining the input parameters of the called SIMD interface function according to the elements in the operation array.
Optionally, the calling sub-module may use the operation array and the operation times corresponding to the operation array as input parameters of a preset operation interface function; the operation times are used for representing times of calling the SIMD interface function by the preset operation interface function.
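One way the calling sub-module could pass the operation array together with an operation count can be sketched as below; the library layout, the names `OPERATION_LIBRARY`, `preset_op`, and `simd_add`, and the slicing scheme are hypothetical stand-ins for the disclosure's preset operation interface functions.

```python
def simd_add(mantissas):
    """Stand-in for one SIMD interface function call: lane-wise add of
    the first half of the mantissa lanes to the second half."""
    half = len(mantissas) // 2
    return [a + b for a, b in zip(mantissas[:half], mantissas[half:])]

OPERATION_LIBRARY = {"add": simd_add}   # hypothetical preset library

def preset_op(op_name, operation_array, op_count):
    """Hypothetical preset operation interface function: slices the
    operation array and calls the SIMD interface function op_count
    times, then returns the results with the shared exponent."""
    fn = OPERATION_LIBRARY[op_name]
    mantissas, shared_exp = operation_array
    stride = len(mantissas) // op_count
    results = []
    for i in range(op_count):
        results.extend(fn(mantissas[i * stride:(i + 1) * stride]))
    return results, shared_exp
```

For example, `preset_op("add", ([1536, 2304, 768, 128], 2), 1)` performs a single call that covers two additions, matching the operation-count parameter described above.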
The operation result generation module 1002 is configured to generate at least one set of operation results corresponding to the data to be operated according to the operation result of the SIMD-based interface function called by the calling interface function module.
Optionally, the preset operation interface function in the present disclosure may, based on the input parameters, perform the operation of generating the operation results corresponding to each group of data to be operated according to the operation result of the SIMD interface function. That is, the generate operation result module 1002 may be integrated inside the preset operation interface function.
Optionally, the preset operation interface function may generate operation results corresponding to each group of data to be operated according to the operation result of the SIMD interface function corresponding to the mantissa operation and the corresponding second fixed point number.
Optionally, the preset operation interface function may generate the operation result corresponding to each group of data to be operated according to the operation result of the SIMD interface function corresponding to the mantissa operation and the operation result of the SIMD interface function corresponding to the exponent operation.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 11. Fig. 11 shows a block diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 11, the electronic device 111 includes one or more processors 1111 and memory 1112.
The processor 1111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in the electronic device 111 to perform the desired functions.
Memory 1112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example: random Access Memory (RAM) and/or cache, etc. The nonvolatile memory may include, for example: read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium that can be executed by the processor 1111 to implement the single instruction multiple data stream based data operation method and/or other desired functions of the various embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 111 may further include an input device 1113, an output device 1114, and the like, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 1113 may include, for example, a keyboard, a mouse, and the like. The output device 1114 may output various information to the outside, and may include, for example, a display, speakers, a printer, as well as a communication network and the remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 111 relevant to the present disclosure are shown in fig. 11; components such as buses and input/output interfaces are omitted. In addition, the electronic device 111 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a single instruction multiple data stream based data operation method according to various embodiments of the present disclosure described in the above "exemplary methods" section of the present description.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a single instruction multiple data stream based data operation method according to various embodiments of the present disclosure described in the above "exemplary method" section of the present disclosure.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), optical fiber, portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments are cross-referenced. For system embodiments, the description is relatively brief since they essentially correspond to the method embodiments; for relevant details, refer to the description of the method embodiments.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended and mean "including but not limited to", and may be used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or", unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatuses, devices, and methods of the present disclosure, components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (12)

1. A data operation method based on single instruction multiple data streams, comprising:
generating an operation array according to at least one group of data to be operated which participates in the same type of operation, wherein the group of data to be operated comprises: data to be operated for participating in a same operation of the same type of operation, the operation array comprising: at least one first fixed point number and at least one second fixed point number, wherein the first fixed point number is used for representing the sign bit and mantissa of the data to be operated, and the second fixed point number is used for representing the exponent of the data to be operated;
invoking a single instruction multiple data stream (SIMD) interface function corresponding to the operations for mantissa and exponent related to the same type of operation, and determining input parameters of the SIMD interface function according to elements in the operation array;
and generating operation results corresponding to the at least one group of data to be operated respectively according to the operation results of the SIMD interface function based on the input parameters.
2. The method of claim 1, wherein the generating an operand array from at least one set of data to be operated on that participates in the same type of operation comprises at least one of:
converting sign bits and mantissas in first data to be operated into a first fixed point number, and converting exponents in the first data to be operated into a second fixed point number; wherein the first data to be operated on includes a mantissa and an exponent;
converting a plurality of sign bits and mantissas in second data to be operated into first fixed point numbers respectively, and converting a plurality of exponents in the second data to be operated into a shared second fixed point number; the second data to be operated includes: a plurality of mantissas and a plurality of exponents;
and converting each sign bit and mantissa in a plurality of first data to be operated into a first fixed point number respectively, and converting each exponent in the plurality of first data to be operated into a shared second fixed point number.
3. The method of claim 1 or 2, wherein said invoking a single instruction multiple data stream, SIMD, interface function corresponding to an operation for mantissa and exponent related to the same type of operation and determining input parameters of the SIMD interface function from elements in the operation array comprises at least one of:
invoking a first SIMD interface function corresponding to an operation for a mantissa, and determining input parameters of the first SIMD interface function according to elements of the corresponding mantissa in the operation array;
and calling a first SIMD interface function corresponding to the operation aiming at the mantissa, and calling a second SIMD interface function corresponding to the operation aiming at the exponent, and respectively determining input parameters of the first SIMD interface function and the second SIMD interface function according to the element of the corresponding mantissa and the element of the corresponding exponent in the operation array.
4. The method of claim 1 or 2, wherein said invoking a single instruction multiple data stream, SIMD, interface function corresponding to an operation for mantissa and exponent related to the same type of operation and determining input parameters of the SIMD interface function from elements in the operation array comprises:
calling a preset operation interface function from a preset operation library, and taking the operation array as an input parameter of the preset operation interface function;
calling, through the preset operation interface function, the SIMD interface function corresponding to the operations for mantissa and exponent related to the same type of operation, and determining the input parameters of the called SIMD interface function according to the elements in the operation array;
wherein the operation library comprises a plurality of preset operation interface functions, and each preset operation interface function corresponds to one type of operation.
5. The method of claim 4, wherein the taking the operation array as the input parameter of the preset operation interface function comprises:
taking the operation array and the operation times corresponding to the operation array as input parameters of the preset operation interface function;
the operation times are used for representing times of calling the SIMD interface function by the preset operation interface function.
6. The method of claim 4, wherein the generating, according to the SIMD interface function, the operation result respectively corresponding to the at least one set of data to be operated on based on the operation result of the input parameter includes:
generating, through the preset operation interface function, operation results respectively corresponding to the at least one group of data to be operated according to the operation result of the SIMD interface function based on the input parameters;
and obtaining operation results respectively corresponding to the at least one group of data to be operated according to the output of the preset operation interface function.
7. The method of any one of claims 1, 2, 5, and 6, wherein the generating, according to the SIMD interface function, an operation result corresponding to the at least one set of data to be operated on, respectively, based on the operation result of the input parameter, includes:
generating operation results respectively corresponding to the at least one group of data to be operated according to the operation results of the SIMD interface function corresponding to the mantissa operation and the second fixed point number; or
generating the operation results respectively corresponding to the at least one group of data to be operated according to the operation results of the SIMD interface function corresponding to the mantissa operation and the operation results of the SIMD interface function corresponding to the exponent operation.
8. A single instruction multiple data stream based data operation device, comprising:
the generating operation array module is used for generating an operation array according to at least one group of data to be operated which participates in the same type of operation, wherein the group of data to be operated comprises: all data to be operated participating in the same operation, and the operation array comprises: at least one first fixed point number and at least one second fixed point number, wherein the first fixed point number is used for representing the sign bit and mantissa of the data to be operated, and the second fixed point number is used for representing the exponent of the data to be operated;
the interface function calling module is used for calling a corresponding single instruction multiple data stream (SIMD) based interface function according to the operations for mantissas and exponents related to the operation, and determining input parameters of the SIMD-based interface function according to elements in the operation array generated by the generating operation array module;
and the operation result generation module is used for generating operation results respectively corresponding to the at least one group of data to be operated according to the operation results of the SIMD-based interface function called by the interface function calling module.
9. The apparatus of claim 8, wherein the generate operand module comprises at least one of:
the first sub-module is used for converting sign bits and mantissas in first data to be operated into a first fixed point number and converting exponents in the first data to be operated into a second fixed point number; wherein the first data to be operated on includes a mantissa and an exponent;
the second sub-module is used for respectively converting a plurality of sign bits and mantissas in second data to be operated into first fixed point numbers and converting a plurality of exponents in the second data to be operated into a shared second fixed point number; the second data to be operated includes: a plurality of mantissas and a plurality of exponents;
and the third sub-module is used for respectively converting each sign bit and mantissa in the plurality of first data to be operated into a first fixed point number and converting each exponent in the plurality of first data to be operated into a shared second fixed point number.
10. The apparatus of claim 8 or 9, wherein the call interface function module comprises:
the operation library comprises a plurality of preset operation interface functions, and each preset operation interface function corresponds to one type of operation;
the calling sub-module is used for calling a preset operation interface function from the operation library and taking the operation array as an input parameter of the preset operation interface function;
wherein the preset operation interface function calls the SIMD interface function corresponding to the operations for mantissa and exponent related to the same type of operation, and determines the input parameters of the called SIMD interface function according to the elements in the operation array.
11. A computer readable storage medium storing a computer program for performing the method of any one of the preceding claims 1-7.
12. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of the preceding claims 1-7.
CN201910566415.8A 2019-06-27 2019-06-27 Data operation method, device, medium and equipment based on single-instruction multi-data stream Active CN112148371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910566415.8A CN112148371B (en) 2019-06-27 2019-06-27 Data operation method, device, medium and equipment based on single-instruction multi-data stream

Publications (2)

Publication Number Publication Date
CN112148371A CN112148371A (en) 2020-12-29
CN112148371B true CN112148371B (en) 2023-10-24

Family

ID=73868492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910566415.8A Active CN112148371B (en) 2019-06-27 2019-06-27 Data operation method, device, medium and equipment based on single-instruction multi-data stream

Country Status (1)

Country Link
CN (1) CN112148371B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6341300B1 (en) * 1999-01-29 2002-01-22 Sun Microsystems, Inc. Parallel fixed point square root and reciprocal square root computation unit in a processor
JP2004078886A (en) * 2002-06-20 2004-03-11 Matsushita Electric Ind Co Ltd Floating point storing method and floating point operating device
CN1993728A (en) * 2004-08-04 2007-07-04 辉达公司 Filtering unit for floating-point texture data
EP2057549A1 (en) * 2006-08-11 2009-05-13 Aspex Semiconductor Limited Improvements relating to direct data input/output interfaces
CN101620589A (en) * 2008-06-30 2010-01-06 英特尔公司 Efficient parallel floating point exception handling in a processor
CN104111816A (en) * 2014-06-25 2014-10-22 中国人民解放军国防科学技术大学 Multifunctional SIMD structure floating point fusion multiplying and adding arithmetic device in GPDSP
CN104166535A (en) * 2013-07-19 2014-11-26 郑州宇通客车股份有限公司 Fixed point processor and anti-overflow method thereof
CN106951211A (en) * 2017-03-27 2017-07-14 南京大学 A kind of restructural fixed and floating general purpose multipliers
CN107077323A (en) * 2014-11-03 2017-08-18 Arm 有限公司 Use the apparatus and method of the data processing of programmable efficacy data
CN108459840A (en) * 2018-02-14 2018-08-28 中国科学院电子学研究所 A kind of SIMD architecture floating-point fusion point multiplication operation unit

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164544A1 (en) * 2007-12-19 2009-06-25 Jeffrey Dobbek Dynamic range enhancement for arithmetic calculations in real-time control systems using fixed point hardware
US9298457B2 (en) * 2013-01-22 2016-03-29 Altera Corporation SIMD instructions for data compression and decompression
US10445064B2 (en) * 2017-02-03 2019-10-15 Intel Corporation Implementing logarithmic and antilogarithmic operations based on piecewise linear approximation
US10643297B2 (en) * 2017-05-05 2020-05-05 Intel Corporation Dynamic precision management for integer deep learning primitives
US11775805B2 (en) * 2018-06-29 2023-10-03 Intel Coroporation Deep neural network architecture using piecewise linear approximation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design of a programmable vertex processor for mobile devices; Yang Yi; Guo Li; Shi Hongsheng; Ji Jian; Journal of University of Science and Technology of China (No. 02) *
A hardware implementation of single-precision floating-point logarithm operations; Jiao Yong; Computer Knowledge and Technology (No. 01) *

Similar Documents

Publication Publication Date Title
KR102447636B1 (en) Apparatus and method for performing arithmetic operations for accumulating floating point numbers
CN110688158B (en) Computing device and processing system of neural network
US9519460B1 (en) Universal single instruction multiple data multiplier and wide accumulator unit
CN112230881A (en) Floating-point number processor
US20170220344A1 (en) Stochastic rounding floating-point add instruction using entropy from a register
US20210182026A1 (en) Compressing like-magnitude partial products in multiply accumulation
JP2006107463A (en) Apparatus for performing multiply-add operations on packed data
CN111936965A (en) Random rounding logic
US10445066B2 (en) Stochastic rounding floating-point multiply instruction using entropy from a register
CN112148371B (en) Data operation method, device, medium and equipment based on single-instruction multi-data stream
CN111445016A (en) System and method for accelerating nonlinear mathematical computation
US20220188109A1 (en) System and method for handling floating point hardware exception
US7747669B2 (en) Rounding of binary integers
US20200081784A1 (en) Load exploitation and improved pipelineability of hardware instructions
Blanchet et al. Computer architecture
US5661674A (en) Divide to integer
CN110826706A (en) Data processing method and device for neural network
US8041927B2 (en) Processor apparatus and method of processing multiple data by single instructions
EP3118737B1 (en) Arithmetic processing device and method of controlling arithmetic processing device
US8185723B2 (en) Method and apparatus to extract integer and fractional components from floating-point data
WO2023141933A1 (en) Techniques, devices, and instruction set architecture for efficient modular division and inversion
Shaikh et al. IEEE 754-based single- and double-precision floating-point multiplier analysis
Sundaresan et al. High speed BCD adder
CN113010143A (en) System and method for handling floating point hardware exceptions
CN117827282A (en) Instruction generation method, data processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant