CN112346782A - Method, device and equipment for processing data in function and storage medium

Info

Publication number: CN112346782A
Application number: CN201910726018.2A
Authority: CN
Inventors: 张亚霏
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-08-07
Filing date: 2019-08-07
Publication date: 2021-02-09

Abstract

The application discloses a method, an apparatus, a device, and a storage medium for processing data in a function. The method comprises: determining a vector input to an objective function; determining the number of elements processed by a single instruction multiple data (SIMD) stream at a single time, based on the number of bits of a single element in the vector and the number of bits of data processed by the SIMD stream at a single time; determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the SIMD stream at a single time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the SIMD stream at a single time is a positive integer; processing the first element set based on the SIMD stream; and processing the second element set based on a single instruction single data (SISD) stream. This technical scheme improves the efficiency of processing the elements and therefore the operation speed of the function.

Description

Method, device and equipment for processing data in function and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing data in a function.

Background

The RELU function is a core function in deep learning, and its speed is crucial. Currently, the RELU functions known in the art are all implemented using Single Instruction Single Data (SISD) instructions. The SISD instruction processes one element of an input vector at a time until all elements in the vector have been processed.

However, the RELU function based on SISD instructions computes slowly. When the RELU function is executed in a critical area, such as the "at a glance" online service, slower speed means greater latency, more timeout failures, and a poorer user experience.

Therefore, it is necessary to provide a data processing method, apparatus, device, and storage medium that can improve the operation speed of a function.

Disclosure of Invention

The application provides a method, a device, equipment and a storage medium for processing data in a function, which can improve the processing efficiency of data in elements, thereby improving the operation speed of the function and improving the user experience.

In one aspect, the present application provides a method for processing data in a function, where the method includes:

determining a vector input to an objective function;

determining the number of elements processed by a single instruction multiple data (SIMD) stream at a single time, based on the number of bits of a single element in the vector and the number of bits of data processed by the SIMD stream at a single time;

determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements processed by the SIMD stream at a single time, wherein the ratio of the number of elements in the first element set to the number of elements processed by the SIMD stream at a single time is a positive integer, and the second element set is the set of elements in the vector other than the first element set;

processing the first element set based on the SIMD stream; and

processing the second element set based on a single instruction single data (SISD) stream.

Another aspect provides an apparatus for data processing in a function, the apparatus comprising:

the vector determining module is used for determining a vector in the input target function;

a single processing element number determining module, configured to determine the number of elements of a single instruction multiple data stream that are processed at a single time based on the number of bits of a single element in the vector and the number of bits of single processing data of the single instruction multiple data stream;

the element set determining module is used for determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements of single instruction multiple data stream single processing; the ratio of the sum of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a single time is a positive integer, and the second element set is a set of elements in the vector except the first element set;

a first element set processing module to process the first element set based on the single instruction multiple data stream;

and the second element set processing module is used for processing the second element set based on the single-instruction single data stream.

Another aspect provides a data processing apparatus in function, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the data processing method in function as described above.

Another aspect provides a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement a method of data processing in a function as described above.

The method, the device, the equipment and the storage medium for processing the data in the function have the following technical effects:

the method comprises the steps of determining the number of elements which can be processed by a single instruction multiple data stream at one time according to the number of bits of a single element in an input vector of a function and the number of bits of single instruction multiple data stream single processing data; determining the number of elements allocated to the single instruction multiple data stream by combining the total number of the elements in the vector, and allocating the rest elements to the single instruction single data stream for processing; based on single instruction multiple data streams, multiple elements can be processed at one time, and by the adoption of the scheme, the processing efficiency of data in the elements is improved, so that the operation speed of the function is improved, and the user experience is improved.

Drawings

In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a diagram of a data processing system in function according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for processing data in a function according to an embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a method for determining the number of elements in a single instruction multiple data stream for a single processing according to an embodiment of the present application;

FIG. 4 is a schematic flow chart diagram illustrating a method for determining a first set of elements and a second set of elements in the vector according to an embodiment of the present application;

FIG. 5 is a schematic flow chart diagram illustrating another method for determining a first set of elements and a second set of elements in the vector according to an embodiment of the present application;

FIG. 6 is a flowchart illustrating a method for determining a first number of elements in the first element set according to an embodiment of the present disclosure;

FIG. 7 is a flowchart illustrating a method for processing the first set of elements based on the SIMD stream according to an embodiment of the present application;

FIG. 8 is a flow chart illustrating a method for determining a plurality of element groups according to an embodiment of the present disclosure;

FIG. 9 is a flowchart illustrating a method for processing data in a vector according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a data processing apparatus in function according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an element set determining module provided in an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, fig. 1 is a schematic diagram of a data processing system in function according to an embodiment of the present application, and as shown in fig. 1, the data processing system in function may at least include a server 01 and a client 02.

Specifically, in this embodiment of the present disclosure, the server 01 may include a server that operates independently, or a distributed server, or a server cluster composed of a plurality of servers. The server 01 may comprise a network communication unit, a processor, a memory, etc. Specifically, the server 01 may be configured to receive an information query request sent by the client 02, and perform data processing by using a function to obtain query information.

Specifically, in the embodiment of the present disclosure, the client 02 may include a physical device such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, and a smart wearable device, and may also include software running in the physical device, such as a web page provided by some service providers to a user, and an application provided by the service providers to the user. Specifically, the client 02 may be configured to send an information query request to the server 01.

The following describes a method for processing data in a function according to the present application. Fig. 2 is a schematic flow chart of a method for processing data in a function according to an embodiment of the present application. The present specification provides the method operation steps as described in the embodiment or the flow chart, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In practice, a system or server product may execute the steps sequentially or in parallel (e.g., in a parallel processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 2, the method may include:

S201: determining a vector input to the objective function.

In the embodiments of the present disclosure, the objective function may be a rectified linear unit (ReLU), also called a linear rectification function. ReLU is an activation function commonly used in artificial neural networks, and the term generally refers to the nonlinear functions represented by the ramp function and its variants.

In embodiments of the present specification, the input of the objective function may comprise one vector or a plurality of vectors.

S203: determining the number of elements of the single instruction multiple data stream for single processing based on the number of bits of the single element in the vector and the number of bits of the single processing data of the single instruction multiple data stream.

In the embodiment of the present specification, single instruction multiple data (SIMD) refers to a class of instruction sets capable of processing multiple data elements simultaneously in a single instruction cycle, using data-level parallelism to improve operating efficiency. Such instructions are suited to performing the same processing on multiple data items, so a typical application scenario is the multimedia field, especially encoding and decoding.

With single instruction multiple data, only one instruction cycle is needed to process a plurality of data items as a batch. Although the instruction cycle of such an instruction may be longer than that of a general instruction, data processing efficiency improves overall.

In this embodiment, as shown in fig. 3, the determining the number of elements processed in a single instruction multiple data stream based on the number of bits of a single element in the vector and the number of bits of data processed in a single instruction multiple data stream may include:

S2031: determining the number of bits of a single element in the vector;

S2033: calculating the ratio of the number of bits of data processed by the single instruction multiple data stream at a single time to the number of bits of a single element in the vector, to obtain the number of elements processed by the single instruction multiple data stream at a single time.

In this specification, all elements in the vector have the same type and therefore the same number of bits. The number of bits of a single element is the number of binary digits of the element; for example, a single-precision floating point element has 32 bits and a double-precision floating point element has 64 bits. The SIMD instruction may include an asm_max instruction, which may process 128, 256, 512, … bits of data at a time.

In some embodiments, the element type T of the vector is a single-precision (float, 32 bits) or double-precision (double, 64 bits) floating point number, and b1 denotes the number of binary digits of one element of type T, so b1 = 32 or b1 = 64. As shown in fig. 9, the asm_max instruction can process b2 bits of data at a time, where b2 may be 128, 256, 512, …. It follows that b = b2/b1 is the number of elements of type T that one asm_max instruction can process at a time, and b must be an integer power of 2. b is the number of elements processed by the single instruction multiple data stream at a single time.
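As an illustration of the ratio b = b2/b1, the following Python sketch computes the number of elements per SIMD operation for the element widths and data widths named above. The function name `elements_per_simd_op` is hypothetical and introduced only for this example; it restates the arithmetic and is not taken from the application.

```python
def elements_per_simd_op(element_bits: int, simd_bits: int) -> int:
    """Return b = b2 / b1: how many elements one SIMD operation covers.

    element_bits is b1 (bits per element) and simd_bits is b2 (bits of
    data processed per SIMD operation). Both names are assumptions.
    """
    if element_bits <= 0 or simd_bits % element_bits != 0:
        raise ValueError("SIMD width must be a positive multiple of the element width")
    return simd_bits // element_bits


# Element widths from the text: float is 32 bits, double is 64 bits.
for b1 in (32, 64):
    for b2 in (128, 256, 512):
        print(f"b1={b1} b2={b2} -> b={elements_per_simd_op(b1, b2)}")
```

For every combination listed in the text the result is a power of two, which matches the statement that b must be an integer power of 2.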

S205: determining a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements in single-instruction-multiple-data-stream single processing; the ratio of the sum of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a single time is a positive integer, and the second element set is a set of elements in the vector except the first element set.

In this embodiment, as shown in fig. 4, the determining the first element set and the second element set in the vector based on the number of elements in the vector and the number of elements in a single processing of the single instruction multiple data stream may include:

S2051: dividing the number of elements in the vector by the number of elements processed by the single instruction multiple data stream at a single time, to obtain a quotient and a remainder of the elements;

S2053: determining a first number of elements in the first element set based on the quotient of the elements and the number of elements processed by the single instruction multiple data stream at a single time;

S2055: taking the remainder of the elements as a second number of elements in the second element set;

In this embodiment, the remainder of the elements is less than the number of elements processed by the single instruction multiple data stream at a single time.

In this embodiment, the second number may be determined according to the first number, and the second number is obtained by subtracting the first number from the total number of elements in the vector.

S2057: based on the first number and the second number, a first set of elements and a second set of elements in the vector are determined.
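The quotient-and-remainder split of steps S2051 to S2055 can be sketched as follows. This is a minimal Python illustration under the assumption that n is the number of elements in the vector and b is the number of elements per SIMD operation; the function name `split_counts` is hypothetical.

```python
from typing import Tuple


def split_counts(n: int, b: int) -> Tuple[int, int]:
    """Return (first_number, second_number) for a vector of n elements."""
    if b <= 0:
        raise ValueError("b must be positive")
    quotient, remainder = divmod(n, b)  # S2051: quotient and remainder
    first_number = quotient * b         # S2053: elements handled by SIMD
    second_number = remainder           # S2055: elements handled by SISD
    return first_number, second_number


# A vector of 10 elements with 4 elements per SIMD operation.
print(split_counts(10, 4))
```

The second number could equally be computed as n minus the first number, which is the alternative the text mentions.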

In some embodiments, as shown in fig. 5, the determining the first set of elements and the second set of elements in the vector based on the first number and the second number may include:

S20571: forming the first element set from the first number of elements at the front of the vector;

S20573: forming the second element set from the second number of elements at the back of the vector.

In some embodiments, a second set of elements may also be determined from the first set of elements, and elements in the vector other than the first set of elements may be grouped into the second set of elements.

In some embodiments, the first number of elements may instead be taken from any positions in the vector to form the first element set, with the elements at the remaining positions forming the second element set. For example, the first number of adjacent elements may be taken, or elements may be taken at alternating positions; provided that the number of elements in the first element set is maintained, the positions of those elements can be chosen according to actual conditions.

In this embodiment, a first number of elements in the vector that are ordered later may be grouped into the first set of elements, and a second number of elements in the vector that are ordered earlier may be grouped into the second set of elements.
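Two of the placements described above, with the first element set at the front of the vector or at the back, might look like this in Python. The helper names `partition_front` and `partition_back` are hypothetical, and plain lists stand in for the vector.

```python
def partition_front(vector, b):
    """First element set is the leading block; second set is the tail."""
    first_number = (len(vector) // b) * b
    return vector[:first_number], vector[first_number:]


def partition_back(vector, b):
    """Second element set is the leading remainder; first set is the rest."""
    second_number = len(vector) % b
    return vector[second_number:], vector[:second_number]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(partition_front(data, 4))
print(partition_back(data, 4))
```

In both cases the first element set has a length that is a multiple of b, and the two sets together contain every element of the vector exactly once.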

In some embodiments, as shown in fig. 6, the determining the first number of elements in the first element set based on the quotient of the elements and the number of elements in the single instruction multiple data stream for the single processing may include:

s20531: calculating the product of the quotient of the elements and the number of the elements processed by the single instruction multiple data stream at a single time;

s20533: taking the product of the quotient of the elements and the number of elements processed by the single instruction multiple data stream at a single time as the first number of elements in the first element set.

S207: processing the first set of elements based on the single instruction multiple data stream.

In this embodiment, as shown in fig. 7, the objective function is configured to determine the larger value of each pair of corresponding elements at the same position in a first vector and a second vector, the first vector and the second vector having the same length, and the processing the first element set based on the single instruction multiple data stream may include:

S2071: determining a plurality of element groups based on the first element set in the first vector and the first element set in the second vector;

S2073: forming a third element set from the larger-valued element of each element group, based on the single instruction multiple data stream.

In a specific embodiment, as shown in fig. 9, 03 is the data corresponding to the elements in a first vector and 04 is the data corresponding to the elements in a second vector. The first element sets of the two vectors are processed by the asm_max instruction, which processes b2 bits of data at a time; the number of bits of data in the first element set is an integer multiple of b2. The elements in the vectors other than the first element set (the second element set) are all processed by single instruction single data (SISD) instructions. SISD is a class of CPU instructions in which one instruction processes one datum. The number of times the SISD instruction is executed is determined by the number of elements left over after the first element set is formed, that is, the number of elements in the second element set.

In the embodiments of the present specification, when the remainder of the elements is 0, all elements in the vector can be processed by SIMD without SISD processing any data.

In the embodiment of the specification, the data is processed by combining SIMD and SISD according to the number of bits of the data in the vector, and based on the characteristic that the SIMD can process a plurality of data once, the SIMD is adopted in the function to process the data, so that the data processing efficiency is improved, and the operation speed of the function is accelerated.

In some embodiments, as shown in fig. 8, the determining a plurality of element groups based on the first set of elements in the first vector and the first set of elements in the second vector may comprise:

S20711: acquiring two elements which are located at the same position in the first element set in the first vector and the first element set in the second vector;

S20713: taking the two elements which are located at the same position as an element group.

In embodiments of the present specification, the SIMD may process data in one or more element groups at a time.
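The grouping of co-located elements and the selection of the larger value can be sketched in Python as follows. Here `simd_max_block` is only a stand-in that models one SIMD operation with a list comprehension; it is not a real intrinsic, and both function names are assumptions made for the illustration.

```python
def simd_max_block(a_block, b_block):
    """Model one SIMD max: each (p, q) pair is one element group."""
    return [p if p > q else q for p, q in zip(a_block, b_block)]


def max_of_first_sets(first_vector, second_vector, b):
    """Build the third element set from the first element sets of two vectors."""
    if len(first_vector) != len(second_vector):
        raise ValueError("vectors must have the same length")
    first_number = (len(first_vector) // b) * b
    third_set = []
    for i in range(0, first_number, b):
        # One modeled SIMD operation covers b element groups at a time.
        third_set.extend(simd_max_block(first_vector[i:i + b],
                                        second_vector[i:i + b]))
    return third_set


print(max_of_first_sets([1, 5, -2, 0, 9], [3, 2, -1, 0, 7], 4))
```

The fifth pair (9, 7) is not part of the result because it falls in the second element set, which is handled one element at a time.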

S209: processing the second set of elements based on a single instruction, single data stream.

In the embodiments of the present specification, the single instruction single data stream can process only one element at a time.

In some embodiments, the RELU function may be: y = max(x, 0);

Input: a vector x of element type T and length n;

Output: a vector y of element type T and length n.

The data processing method in the function can be realized by adopting the code of the following version one.

Version one:

1. Initialize a vector y of length n

2. Initialize a zero vector zero of length b

3. i = 0 (where i is an integer variable)

4. n1 = n % b (where n1 is the remainder of n divided by b, n is the length of the vector x and has the same meaning in the steps below, and n1 < b)

5. for (i = 0; i < n - n1; i = i + b) (where i = 0 is the loop start condition; i < n - n1 is the loop continuation condition, so the loop covers only the elements that fill whole groups of b and exits when the condition is no longer met; i = i + b means that b is added to i each time the loop executes)

6. y[i : i + b] = asm_max(x[i : i + b], zero) (where asm_max here is a SIMD instruction, and x[i : i + b] refers to the i-th through (i + b - 1)-th elements of x)

7. endfor

8. for (; i < n; i = i + 1) (where i < n is the loop continuation condition, and the loop exits when it is no longer met; i = i + 1 means that 1 is added to i each time the loop executes)

9. y[i] = asm_max(x[i], 0) (where asm_max here is a SISD instruction)

10. endfor

11. return y
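The steps of version one can be mirrored in a short Python sketch. It is a model only: Python lists stand in for the vectors, `asm_max_simd` and `asm_max_sisd` are hypothetical stand-ins for the SIMD and SISD forms of asm_max, and the vectorized loop is assumed to cover the first n - n % b elements, consistent with the first element set described earlier.

```python
def asm_max_simd(block, zero):
    """Stand-in for the SIMD asm_max: element-wise max over b elements."""
    return [v if v > z else z for v, z in zip(block, zero)]


def asm_max_sisd(value, other):
    """Stand-in for the SISD asm_max: max of two scalars."""
    return value if value > other else other


def relu_version_one(x, b):
    n = len(x)
    y = [0.0] * n            # step 1
    zero = [0.0] * b         # step 2
    i = 0                    # step 3
    n1 = n % b               # step 4: remainder, n1 < b
    while i < n - n1:        # step 5: whole groups of b elements
        y[i:i + b] = asm_max_simd(x[i:i + b], zero)  # step 6
        i += b
    while i < n:             # step 8: leftover elements
        y[i] = asm_max_sisd(x[i], 0.0)               # step 9
        i += 1
    return y                 # step 11


print(relu_version_one([-1.0, 2.0, -3.0, 4.0, -5.0, 6.0], 4))
```

With six elements and b = 4, the first four elements go through one modeled SIMD operation and the last two go through the scalar loop.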

In other embodiments, the data processing method in the function can also be circularly expanded to expand k paths, wherein k is an integral power of 2; the k value can be determined according to the number of registers in a Central Processing Unit (CPU), wherein the k value is large when the number of registers is large, and the k value is small when the number of registers is small; utilizing all of the above registers; the data processing method in the function can be realized by adopting the code of the following version two.

Version two:

1. Initialize a vector y of length n;

2. Initialize a zero vector zero of length b;

3. i = 0; (where i is an integer variable)

4. n1 = n % (b * k) (where n1 is the remainder of n divided by b * k, n is the length of the vector x and has the same meaning in the steps below, and n1 < b * k)

5. for (i = 0; i < n - n1; i = i + b * k) (where i = 0 is the loop start condition; i < n - n1 is the loop continuation condition, so the loop covers only the elements that fill whole groups of b * k and exits when the condition is no longer met; i = i + b * k means that b * k is added to i each time the loop executes)

6. y[i : i + b] = asm_max(x[i : i + b], zero) (where asm_max here is a SIMD instruction, and x[i : i + b] refers to the i-th through (i + b - 1)-th elements of x)

7. y[i + b : i + b * 2] = asm_max(x[i + b : i + b * 2], zero) (where asm_max here is a SIMD instruction, and x[i + b : i + b * 2] refers to the (i + b)-th through (i + b * 2 - 1)-th elements of x)

8. y[i + b * 2 : i + b * 3] = asm_max(x[i + b * 2 : i + b * 3], zero) (where asm_max here is a SIMD instruction, and x[i + b * 2 : i + b * 3] refers to the (i + b * 2)-th through (i + b * 3 - 1)-th elements of x)

9. …

10. y[i + b * (k - 1) : i + b * k] = asm_max(x[i + b * (k - 1) : i + b * k], zero) (where asm_max here is a SIMD instruction, and x[i + b * (k - 1) : i + b * k] refers to the (i + b * (k - 1))-th through (i + b * k - 1)-th elements of x)

11. endfor

12. for (; i < n; i = i + 1) (where i < n is the loop continuation condition, and the loop exits when it is no longer met; i = i + 1 means that 1 is added to i each time the loop executes)

13. y[i] = asm_max(x[i], 0) (where asm_max here is a SISD instruction)

14. endfor

15. return y
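A Python sketch of version two with the unrolling factor fixed at k = 4 is given below. The choice k = 4 and the names `simd_max` and `relu_unrolled_by_4` are assumptions for illustration; a real implementation would choose k from the number of available registers, as the text explains. The vectorized loop is assumed to cover the first n - n % (b * k) elements.

```python
def simd_max(block, zero):
    """Stand-in for the SIMD asm_max: element-wise max over b elements."""
    return [v if v > z else z for v, z in zip(block, zero)]


def relu_unrolled_by_4(x, b):
    k = 4                                   # assumed unrolling factor
    n = len(x)
    y = [0.0] * n
    zero = [0.0] * b
    n1 = n % (b * k)                        # remainder, n1 < b * k
    i = 0
    while i < n - n1:                       # one pass handles b * k elements
        y[i:i + b] = simd_max(x[i:i + b], zero)
        y[i + b:i + 2 * b] = simd_max(x[i + b:i + 2 * b], zero)
        y[i + 2 * b:i + 3 * b] = simd_max(x[i + 2 * b:i + 3 * b], zero)
        y[i + 3 * b:i + 4 * b] = simd_max(x[i + 3 * b:i + 4 * b], zero)
        i += b * k
    while i < n:                            # leftover elements, one at a time
        y[i] = x[i] if x[i] > 0.0 else 0.0
        i += 1
    return y


values = [float(j - 10) for j in range(21)]
print(relu_unrolled_by_4(values, 2))
```

Note that with unrolling, up to b * k - 1 elements can fall into the scalar loop, compared with at most b - 1 in version one.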

Version one issues a single SIMD instruction per loop iteration, whereas version two issues a plurality of SIMD instructions per iteration. If step 5 of version one needs to be executed m times, step 5 of version two needs to be executed only m/k times, so the time version two spends on loop control is 1/k of that of version one. The total execution time of the loop bodies is the same, that is, the total execution time of step 6 of version one equals that of steps 6-10 of version two. Thus, version two is faster.
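The loop-control comparison can be checked with a small calculation. The values n = 1024, b = 4, and k = 4 below are assumed purely for illustration, and `simd_loop_iterations` is a hypothetical helper that counts how many times the vectorized loop body runs.

```python
def simd_loop_iterations(n, b, k=1):
    """Count passes through the vectorized loop for unrolling factor k."""
    block = b * k
    return (n - n % block) // block


n, b, k = 1024, 4, 4
m = simd_loop_iterations(n, b)               # version one
m_unrolled = simd_loop_iterations(n, b, k)   # version two
print(m, m_unrolled)                         # prints: 256 64
print(m_unrolled * k == m)                   # same number of SIMD operations
```

The number of modeled SIMD operations is unchanged, while the number of loop-condition checks and index updates drops by a factor of k.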

In a specific embodiment, the SIMD-based double precision RELU function is 3-8 times faster than the SISD-based RELU function, depending on the dimensionality of the input vector and the floating point type. For example, in a recommendation service, the technical scheme of the application reduces back-end latency by about 5 milliseconds compared with the prior art.

As can be seen from the technical solutions provided by the embodiments of the present specification, in the embodiments of the present specification, the number of elements that can be processed by a single instruction multiple data stream at a time is determined according to the number of bits of a single element in an input vector of a function and the number of bits of data to be processed by the single instruction multiple data stream at a time; determining the number of elements allocated to the single instruction multiple data stream by combining the total number of the elements in the vector, and allocating the rest elements to the single instruction single data stream for processing; based on single instruction multiple data streams, multiple elements can be processed at one time, and by the adoption of the scheme, the processing efficiency of data in the elements is improved, so that the operation speed of the function is improved, and the user experience is improved.

An embodiment of the present application further provides a data processing apparatus in function, as shown in fig. 10, the apparatus may include:

a vector determining module 1010, configured to determine a vector in an input objective function;

a single-processing element number determining module 1020, configured to determine the number of elements of a single instruction multiple data stream that are processed at a single time based on the number of bits of a single element in the vector and the number of bits of single-processing data of the single instruction multiple data stream;

an element set determining module 1030, configured to determine a first element set and a second element set in the vector based on the number of elements in the vector and the number of elements in single-instruction multiple-data stream processing; the ratio of the sum of the number of elements in the first element set to the number of elements processed by the single instruction multiple data stream at a single time is a positive integer, and the second element set is a set of elements in the vector except the first element set;

a first element set processing module 1040, configured to process the first element set based on the single instruction multiple data stream;

a second element set processing module 1050 configured to process the second element set based on a single instruction single data stream.

In some embodiments, the single-processing element count determination module may include:

an element bit number determination unit for determining the bit number of a single element in the vector;

and the element number determining unit is used for calculating the ratio of the single processing data bit number of the single instruction multiple data stream to the bit number of the single element in the vector to obtain the single processing element number of the single instruction multiple data stream.

In some embodiments, as shown in fig. 11, the element set determination module may include:

a quotient and remainder determining unit 1110, configured to divide the number of elements in the vector by the number of elements in the single-instruction-multiple-data stream for single processing to obtain a quotient and a remainder of the elements;

a first number determination unit 1120, configured to determine a first number of elements in the first element set based on a quotient of the elements and a number of elements of a single processing of the single instruction multiple data stream;

a second number determination unit 1130 configured to use a remainder of the elements as a second number of elements in the second element set;

an element set determining unit 1140, configured to determine a first element set and a second element set in the vector based on the first number and the second number.

In some embodiments, the element set determining unit may include:

a first element set determining subunit, configured to form the first element set from the first number of elements at the front of the vector;

a second element set determining subunit, configured to form the second element set from the second number of elements at the back of the vector.

In some embodiments, the first number determination unit may include:

a product calculating subunit, configured to calculate a product of the quotient of the element and the number of elements processed by the single instruction multiple data stream at a single time;

a first number determination subunit, configured to use a product of a quotient of the element and a number of elements of the single instruction multiple data stream single processing as a first number of elements in the first element set.

In some embodiments, the objective function is used to determine the larger value of each pair of elements located at the same position in a first vector and a second vector, the first vector and the second vector having the same length. In this case, the first element set processing module may include:

an element group determination unit configured to determine a plurality of element groups based on a first set of elements in the first vector and a first set of elements in the second vector;

a third element set determining unit, configured to form, based on the single instruction multiple data stream, a third element set from the element with the larger value in each element group.

In some embodiments, the element group determination unit may include:

an element obtaining subunit, configured to obtain the two elements located at the same position in the first element set of the first vector and the first element set of the second vector;

an element group determining subunit, configured to take the two elements located at the same position as one element group.
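The grouping and larger-value selection described above can be modelled as follows. This is a scalar model written for illustration, not a real SIMD implementation: plain Python has no vector instructions, so each block of `lanes` element groups stands in for what one SIMD maximum instruction would handle. The function name `simd_max_first_set` and the sample values are assumptions.

```python
from typing import List, Sequence


def simd_max_first_set(first_a: Sequence[int], first_b: Sequence[int], lanes: int) -> List[int]:
    """Model the element-wise maximum over two first element sets (hypothetical helper)."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    if len(first_a) != len(first_b):
        raise ValueError("both first element sets must have the same length")
    if len(first_a) % lanes != 0:
        raise ValueError("first element set length must be a multiple of lanes")
    # Element groups: the two elements located at the same position.
    groups = list(zip(first_a, first_b))
    third_set: List[int] = []
    for start in range(0, len(groups), lanes):
        block = groups[start:start + lanes]
        # Each block models a single SIMD operation over `lanes` element groups.
        third_set.extend(max(x, y) for x, y in block)
    return third_set


# Assumed example: eight elements per vector, four lanes (two modelled SIMD steps).
third_set = simd_max_first_set([1, 5, 3, 7, -2, 0, 9, 4], [2, 4, 6, 0, -1, 0, 8, 10], 4)
```

The remaining elements (the second element set) are not handled here; they would be compared one pair at a time by the single instruction single data stream.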

The apparatus embodiments and the method embodiments described above are based on the same inventive concept.

An embodiment of the present application provides a device for processing data in a function, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the method for processing data in a function provided by the above method embodiments.

Embodiments of the present application further provide a computer-readable storage medium, which may be disposed in a server to store at least one instruction, at least one program, a code set, or a set of instructions related to implementing a data processing method in a function in the method embodiments, where the at least one instruction, the at least one program, the code set, or the set of instructions are loaded and executed by the processor to implement the data processing method in the function provided by the above method embodiments.

Optionally, in the embodiments of the present specification, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

The memory described in the embodiments of the present disclosure may be used to store software programs and modules, and the processor may execute various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs required by functions, and the like; the data storage area may store data created according to use of the apparatus, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.

The method embodiments provided in the embodiments of the present application may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, fig. 12 is a block diagram of the hardware structure of a server for the method for processing data in a function provided in the embodiments of the present application. As shown in fig. 12, the server 1200 may vary considerably depending on its configuration or performance, and may include one or more Central Processing Units (CPUs) 1210 (the processor 1210 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1230 for storing data, and one or more storage media 1220 (e.g., one or more mass storage devices) for storing applications 1223 or data 1222. The memory 1230 and the storage medium 1220 may be transient storage or persistent storage. The program stored in the storage medium 1220 may include one or more modules, each of which may include a series of instruction operations for the server. Further, the central processing unit 1210 may be configured to communicate with the storage medium 1220 and to execute, on the server 1200, the series of instruction operations in the storage medium 1220. The server 1200 may also include one or more power supplies 1260, one or more wired or wireless network interfaces 1250, one or more input/output interfaces 1240, and/or one or more operating systems 1221, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.

The input/output interface 1240 may be used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by a communication provider of the server 1200. In one example, the input/output interface 1240 includes a Network Interface Controller (NIC) that may be coupled to other network devices via a base station to communicate with the internet. In another example, the input/output interface 1240 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.

It will be understood by those skilled in the art that the structure shown in fig. 12 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 1200 may also include more or fewer components than shown in FIG. 12, or have a different configuration than shown in FIG. 12.

As can be seen from the embodiments of the method, apparatus, device, and storage medium for processing data in a function provided by the present application, the present application determines the number of elements that a single instruction multiple data stream can process at a single time from the number of bits of a single element in the input vector of the function and the number of data bits that the single instruction multiple data stream processes at a single time. Combining this with the total number of elements in the vector, it determines the number of elements allocated to the single instruction multiple data stream and allocates the remaining elements to the single instruction single data stream. Because the single instruction multiple data stream processes multiple elements at a time, this scheme improves the efficiency of processing the elements of the vector, which in turn improves the operation speed of the function and the user experience.
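The effect of this allocation can be illustrated by counting operations in an idealized model. The sketch below is not a measurement: it assumes one operation per SIMD step and one per SISD element, ignores loads, stores, and memory bandwidth, and uses a hypothetical function name `operation_counts` with assumed default widths of 128 and 32 bits.

```python
from typing import Dict


def operation_counts(n_elements: int, simd_bits: int = 128, element_bits: int = 32) -> Dict[str, int]:
    """Count modelled operations for SISD-only versus SIMD plus SISD tail (hypothetical helper)."""
    if n_elements < 0:
        raise ValueError("n_elements must be non-negative")
    # Number of elements processed by one SIMD operation.
    lanes = simd_bits // element_bits
    if lanes <= 0:
        raise ValueError("SIMD width must be at least the element width")
    # The quotient gives the SIMD steps; the remainder gives the SISD tail.
    quotient, remainder = divmod(n_elements, lanes)
    return {
        "sisd_only": n_elements,          # one operation per element
        "simd_steps": quotient,           # operations over the first element set
        "sisd_tail": remainder,           # operations over the second element set
        "combined": quotient + remainder,
    }


# Assumed example: a vector of 1003 elements with 32-bit elements in a 128-bit register.
counts = operation_counts(1003)
```

For 1003 elements under these assumptions, the model gives 250 SIMD steps plus a tail of 3 elements, that is 253 operations instead of 1003. Real speedups depend on the hardware and on memory behaviour and are generally smaller than this ratio.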

It should be noted that the order of the embodiments of the present application is for description only and does not indicate that any embodiment is superior to another. Specific embodiments of the present specification have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, device, and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method for processing data in a function, the method comprising:

determining a vector in an input objective function;

2. The method of claim 1, wherein determining the number of single-instruction-multiple-data-stream single-processing elements based on the number of single-element bits in the vector and the number of single-instruction-multiple-data-stream single-processing data bits comprises:

determining a number of bits of a single element in the vector;

and calculating the ratio of the number of data bits processed by the single instruction multiple data stream at a single time to the number of bits of a single element in the vector, to obtain the number of elements processed by the single instruction multiple data stream at a single time.

3. The method of claim 1, wherein determining the first set of elements and the second set of elements in the vector based on the number of elements in the vector and the number of elements in a single processing of the single instruction multiple data stream comprises:

dividing the number of elements in the vector by the number of elements of single-instruction-multiple-data-stream single-processing to obtain quotient and remainder of the elements;

determining a first number of elements in the first element set based on a quotient of the elements and a number of elements in a single processing of the SIMD stream;

taking the remainder of the elements as a second number of elements in the second set of elements;

based on the first number and the second number, a first set of elements and a second set of elements in the vector are determined.

4. The method of claim 3, wherein determining the first set of elements and the second set of elements in the vector based on the first number and the second number comprises:

forming the first element set from the first number of elements ranked first in the vector;

and forming the second element set from the second number of elements ranked last in the vector.

5. The method of claim 3, wherein determining the first number of elements in the first set of elements based on the quotient of the elements and the number of elements in a single pass of the single instruction multiple data stream comprises:

calculating the product of the quotient of the elements and the number of the elements processed by the single instruction multiple data stream at a single time;

taking the product of the quotient of the elements and the number of elements processed by the single instruction multiple data stream at a single time as the first number of elements in the first element set.

6. The method of claim 1, wherein the objective function is used to determine the larger value of each pair of elements located at the same position in a first vector and a second vector, the first vector and the second vector having the same length, and wherein processing the first set of elements based on the single instruction multiple data stream comprises:

determining a plurality of element groups based on a first set of elements in the first vector and a first set of elements in the second vector;

and forming the elements with larger values in each element group into a third element set based on the single instruction multiple data stream.

7. The method of claim 6, wherein determining a plurality of element groups based on the first set of elements in the first vector and the first set of elements in the second vector comprises:

acquiring two elements which are positioned at the same position in a first element set in the first vector and a first element set in the second vector;

and taking the two elements which are positioned at the same position as an element group.

8. An apparatus for data processing in a function, the apparatus comprising:

9. A data-in-function processing apparatus, comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, the at least one instruction, the at least one program, set of codes, or set of instructions being loaded and executed by the processor to implement a data-in-function processing method according to any one of claims 1 to 7.

10. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of data processing in a function according to any one of claims 1 to 7.