CN114201729A

CN114201729A - Method, device and equipment for selecting matrix operation mode and storage medium

Info

Publication number: CN114201729A
Application number: CN202111454114.XA
Authority: CN
Inventors: 杨良志; 白琳; 汪志新; 白小刚; 瞿勇金; 王向军
Original assignee: Richinfo Technology Co ltd
Current assignee: Richinfo Technology Co ltd
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2022-03-18

Abstract

The invention discloses a method, a device, equipment and a storage medium for selecting a matrix operation mode, and belongs to the technical field of deep learning. The method comprises the following steps: identifying a candidate hardware environment in which the neural network is located, wherein the candidate hardware environment comprises at least one of a CPU, AVX, SSE, and an Ascend device; determining a target hardware environment from the at least one candidate hardware environment, and determining a target matrix operation mode corresponding to the target hardware environment; and performing the neural network computation using the target matrix operation mode. With this technical scheme, the matrix operation capability of the running environment in which the neural network is located can be selected automatically, and the hardware capability can be used to the fullest to accelerate matrix operations.

Description

Method, device and equipment for selecting matrix operation mode and storage medium

Technical Field

The embodiment of the invention relates to the technical field of deep learning, in particular to a method, a device, equipment and a storage medium for selecting a matrix operation mode.

Background

The AI deep learning mainly adopts a neural network algorithm, and the most important operation in the forward and backward calculation of the neural network is the calculation of a large floating-point number matrix. How to increase the operation speed of the neural network program is particularly important.

Disclosure of Invention

The invention provides a method, a device, equipment and a storage medium for selecting a matrix operation mode.

In a first aspect, an embodiment of the present invention provides a method for selecting a matrix operation mode, where the method includes:

identifying a candidate hardware environment in which the neural network is located; wherein the candidate hardware environment comprises at least one of a CPU, AVX, SSE, and an Ascend device;

determining a target hardware environment from at least one candidate hardware environment, and determining a target matrix operation mode corresponding to the target hardware;

and calculating the neural network by adopting the target matrix operation mode.

In a second aspect, an embodiment of the present invention further provides a device for selecting a matrix operation mode, where the device includes:

the candidate environment identification module is used for identifying a candidate hardware environment where the neural network is located; wherein the candidate hardware environment comprises at least one of a CPU, AVX, SSE, and an Ascend device;

the target determining module is used for determining a target hardware environment from at least one candidate hardware environment and determining a target matrix operation mode corresponding to the target hardware;

and the operation module is used for operating the neural network by adopting the target matrix operation mode.

In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:

one or more processors;

a memory for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement the method for selecting the matrix operation mode provided by any embodiment of the invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for selecting the matrix operation manner according to any embodiment of the present invention.

According to the technical scheme of the embodiment of the invention, the candidate hardware environment where the neural network is located is identified, wherein the candidate hardware environment includes at least one of a CPU, AVX, SSE and an Ascend device; a target hardware environment is then determined from the at least one candidate hardware environment, a target matrix operation mode corresponding to the target hardware environment is determined, and the neural network computation is performed using the target matrix operation mode. With this technical scheme, the matrix operation capability of the running environment in which the neural network is located can be selected automatically, and the hardware capability can be used to the fullest to accelerate matrix operations.

Drawings

Fig. 1 is a flowchart of a method for selecting a matrix operation manner according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a method for selecting a matrix operation manner according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a selection device for a matrix operation manner according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a method for selecting a matrix operation manner according to an embodiment of the present invention, where the embodiment is applicable to how to intelligently select a matrix operation manner, and the method may be executed by a selection device of a matrix operation manner, where the selection device may be implemented by software and/or hardware, and may be integrated in an electronic device, such as a server, that carries a function of selecting a matrix operation manner.

As shown in fig. 1, the method may specifically include:

S110, identifying a candidate hardware environment where the neural network is located.

In this embodiment, the candidate hardware environment refers to an operating environment provided by the hardware device on which the neural network is located, and may include at least one of a CPU, AVX, SSE, and an Ascend device. The AVX instruction set, namely Advanced Vector Extensions, draws on some design ideas from AMD's SSE5 and extends and strengthens them to form a new-generation, complete SIMD instruction set specification. The SSE instruction set, namely Streaming SIMD Extensions (SIMD standing for single instruction, multiple data), was first introduced by Intel in the Pentium III processor in 1999 and extended vector processing capability from 64 bits to 128 bits. The Ascend device is an Ascend chip, which provides a new basic AI computing capability.

In this embodiment, the candidate hardware environments of the hardware on which the neural network is located may be identified according to preset detection conditions. For example, an M4 directory may be placed under the directory of the neural network's executable files; it contains M4 macro files that detect AVX and its variants, SSE and its variants, the Ascend device and its supporting libraries, as well as the operating system type and other component types. An automatic configuration script can then call each M4 macro file in the M4 directory and, based on these macro files, detect whether the hardware on which the neural network code executes contains the corresponding AVX, SSE and Ascend components, and what their specifications and models are.
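The outcome of this detection step can be pictured with a minimal Python sketch. It is not the M4-based mechanism described above, only a stand-in: it assumes the CPU feature flags are already available as a collection of strings (for example, as listed in the `flags` line of `/proc/cpuinfo` on Linux) and that the presence of an Ascend device is supplied as a boolean. The function name `identify_candidates`, the flag spellings, and the choice to always list the plain CPU are assumptions made for illustration.

```python
def identify_candidates(cpu_flags, ascend_present=False):
    """Return the list of candidate hardware environments (hypothetical helper)."""
    flags = {f.lower() for f in cpu_flags}
    # Assumption: plain CPU execution is always available as a fallback.
    candidates = ["CPU"]
    # Any SSE variant (sse, sse2, sse4_1, ...) counts as SSE support.
    if any(f.startswith("sse") for f in flags):
        candidates.append("SSE")
    # Any AVX variant (avx, avx2, avx512f, ...) counts as AVX support.
    if any(f.startswith("avx") for f in flags):
        candidates.append("AVX")
    # Ascend detection itself is outside this sketch; the caller reports it.
    if ascend_present:
        candidates.append("Ascend")
    return candidates
```

The returned list corresponds to the recognition result that is subsequently shown to the user.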

Furthermore, the identification result is displayed to the user so that the user can check it or apply custom settings.

S120, determining a target hardware environment from at least one candidate hardware environment, and determining a target matrix operation mode corresponding to the target hardware.

In this embodiment, the target hardware environment refers to a selected hardware environment in which the neural network operates; the target matrix operation mode is a matrix operation mode adopted by the running of the neural network.

In this embodiment, the target hardware environment may be determined from at least one candidate hardware environment, and a matrix operation manner corresponding to the target hardware environment may be used as the target matrix operation manner.

Optionally, if there are at least two candidate hardware environments, the target hardware environment may be determined from them according to the priorities of the candidate hardware environments. The Ascend device has the highest priority, followed by AVX or SSE, and the CPU has the lowest priority. Specifically, the candidate hardware environment with the highest priority is taken as the target hardware environment.
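The priority rule can be sketched in a few lines of Python. The names `PRIORITY` and `select_target` are hypothetical, and ranking AVX ahead of SSE is an assumption: the text places the two in the same tier.

```python
# Priority order from the text: Ascend first, then AVX or SSE, then CPU.
# Assumption: AVX is ranked above SSE; the text treats them as one tier.
PRIORITY = ["Ascend", "AVX", "SSE", "CPU"]

def select_target(candidates):
    """Pick the highest-priority environment among the detected candidates."""
    for env in PRIORITY:
        if env in candidates:
            return env
    raise ValueError("no supported hardware environment among candidates")
```

For example, `select_target(["CPU", "SSE", "AVX"])` returns `"AVX"`, and a machine that reports only `["CPU"]` falls back to the plain CPU path.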

S130, calculating the neural network by adopting a target matrix operation mode.

In this embodiment, when the neural network code is run, it is compiled using the target matrix operation mode.

On the basis of the technical scheme, as an optional mode of the embodiment of the invention, after the candidate hardware environment where the neural network is located is identified, whether the neural network is located in the set hardware environment can be identified according to the user configuration information; if yes, taking the matrix operation mode corresponding to the set hardware environment as a target matrix operation mode. And if the neural network is identified not to be in the set hardware environment, displaying error information to the user.

The user configuration information may be set hardware identifier and other information. The set hardware environment refers to an operating environment of the neural network specified by a user according to requirements.

Specifically, according to the hardware identifier in the user configuration information, it is identified whether the neural network is in the set hardware environment, that is, whether the set hardware is present in the device running the neural network; if so, the matrix operation mode corresponding to the set hardware environment is taken as the target matrix operation mode. If the neural network is identified as not being in the set hardware environment, an error message, such as "no corresponding hardware environment detected", is presented to the user.

It should be noted that, when the user configures a required hardware environment, the set hardware environment is preferentially adopted as the target hardware environment, regardless of whether the device in which the neural network is located includes other candidate hardware environments with higher priority than the set hardware environment.
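A minimal Python sketch of this override behaviour follows. The function name `resolve_target` and the use of an exception to carry the error message are assumptions made for illustration; the text only states that an error message is shown to the user.

```python
def resolve_target(candidates, user_env=None):
    """Return the user-set environment if present, else the priority choice."""
    # Same priority order as in the text; AVX-before-SSE is an assumption.
    priority = ["Ascend", "AVX", "SSE", "CPU"]
    if user_env is not None:
        if user_env not in candidates:
            # The set hardware is not present on this device: report an error.
            raise RuntimeError("no corresponding hardware environment detected")
        # The user's choice wins even if a higher-priority candidate exists.
        return user_env
    for env in priority:
        if env in candidates:
            return env
    raise RuntimeError("no corresponding hardware environment detected")
```

With candidates `["CPU", "AVX", "Ascend"]` and a user setting of `"AVX"`, the sketch returns `"AVX"` although the Ascend device has higher priority, which is the behaviour needed for comparative tests.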

It can be understood that the scheme provided by the invention can support a user to select a hardware environment in the device to run the neural network code according to actual requirements, so that a comparative test can be performed in an experimental stage.

Example two

Fig. 2 is a flowchart of a method for selecting a matrix operation manner according to a second embodiment of the present invention. Building on the above embodiment, this embodiment provides an optional implementation that further refines "operation on a neural network by using a target matrix operation manner".

As shown in fig. 2, the method may specifically include:

S210, identifying a candidate hardware environment where the neural network is located; wherein the candidate hardware environment includes at least one of a CPU, AVX, SSE, and an Ascend device.

S220, determining a target hardware environment from at least one candidate hardware environment, and determining a target matrix operation mode corresponding to the target hardware.

S230, calculating the neural network by adopting a target matrix operation mode.

Optionally, if the target hardware environment is a CPU, the operation on the neural network may be performed in a target matrix operation manner, where for each layer of the neural network, an output matrix of a layer above the layer is used as an input matrix of the layer; for each row vector of the input matrix, traversing the column vectors of the weight matrix of the layer; for each column vector, respectively calculating the product of the element of the row vector and the corresponding element of the column vector, and adding the products to obtain the dot product of the row vector and the column vector; and adding the bias value to each dot product to obtain the elements of the output matrix of the layer.

Specifically, for any layer of the neural network, the output matrix of the layer above it is used as the input matrix of the layer. For each row vector of the input matrix, the column vectors of the layer's weight matrix are traversed; for each column vector, the elements of the row vector are multiplied by the corresponding elements of the column vector and the products are summed to give the dot product of the row vector and the column vector. The corresponding bias value is then added to each dot product to give the elements of one row of the layer's output matrix. That is, the dot products of one row of the input matrix with all columns of the weight matrix, each with its bias value added, make up one row of the output matrix.

It should be noted that, for the first layer of the neural network, the input matrix of the layer is the input matrix of the neural network.
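The CPU computation described above can be sketched in plain Python as follows. The function name `dense_layer_cpu` and the representation of matrices as lists of lists are assumptions made for illustration; the loop structure follows the text (rows of the input, columns of the weight matrix, element-wise products summed, bias added).

```python
def dense_layer_cpu(inputs, weights, bias):
    """inputs: rows x m, weights: m x n, bias: length n -> output: rows x n."""
    n_cols = len(weights[0])
    output = []
    for row in inputs:                     # each row vector of the input matrix
        out_row = []
        for j in range(n_cols):            # traverse the weight matrix columns
            dot = 0.0
            for k, x in enumerate(row):    # multiply element pairs and sum
                dot += x * weights[k][j]
            out_row.append(dot + bias[j])  # add the bias value for this column
        output.append(out_row)
    return output
```

Feeding the returned matrix into the next call, with that layer's weights and bias, chains the layers in the way the text describes.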

Optionally, if the target hardware environment is AVX or SSE, the operation of the neural network by using the target matrix operation mode may be that, for each layer of the neural network, an output matrix of a layer above the layer is used as an input matrix of the layer; for each row vector of the input matrix, traversing the column vectors of the weight matrix of the layer; for each column vector, transferring the row vector and the column vector to AVX or SSE to obtain a dot product of the row vector and the column vector; and adding the bias value to each dot product to obtain an output matrix of the layer.

Specifically, for any layer of the neural network, the output matrix of the layer above it is used as the input matrix of the layer. For each row vector of the input matrix, the column vectors of the layer's weight matrix are traversed, and the row vector and each column vector are passed to AVX or SSE to obtain their dot product. The corresponding bias value is then added to each dot product to give the elements of one row of the layer's output matrix. That is, the dot products of one row of the input matrix with all columns of the weight matrix, each with its bias value added, make up one row of the output matrix.
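The difference from the CPU path lies in how each dot product is computed. The Python sketch below only emulates the idea and does not issue any AVX or SSE instructions: it keeps a fixed number of partial sums, one per vector lane, and reduces them at the end. A lane count of 8 matches eight 32-bit floats in a 256-bit AVX register, and 4 matches a 128-bit SSE register. The names `simd_dot` and `dense_layer_simd` are hypothetical.

```python
def simd_dot(a, b, lanes=8):
    """Emulated vector dot product: per-lane partial sums, then a reduction."""
    acc = [0.0] * lanes
    full = len(a) - len(a) % lanes           # part that fills whole vectors
    for i in range(0, full, lanes):          # one "vector" multiply-add per step
        for lane in range(lanes):
            acc[lane] += a[i + lane] * b[i + lane]
    total = sum(acc)                         # horizontal reduction of the lanes
    for i in range(full, len(a)):            # scalar tail for leftover elements
        total += a[i] * b[i]
    return total

def dense_layer_simd(inputs, weights, bias, lanes=8):
    """Same layer as the CPU path, with each dot product delegated to simd_dot."""
    columns = list(zip(*weights))            # column vectors of the weight matrix
    return [[simd_dot(row, col, lanes) + bias[j]
             for j, col in enumerate(columns)]
            for row in inputs]
```

In a real implementation the inner lane loop would be a single vector instruction, which is where the speed-up over the scalar CPU path would come from.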

Optionally, if the target hardware environment is an Ascend device, the operation of the neural network by using the target matrix operation mode may be: for each layer of the neural network, taking the output matrix of the layer above it as the input matrix of the layer; constructing a first matrix according to the input matrix, and constructing a second matrix according to the weight matrix of the layer; copying the first matrix, the second matrix and the bias vector from the host memory to the memory of the Ascend device; and calling a matrix multiplication operator on the first matrix, the second matrix and the bias vector to obtain the output matrix of the layer.

Specifically, for any layer of the neural network, the output matrix of the layer above it is used as the input matrix of the layer; a first matrix is constructed from the input matrix according to BATCH, HEIGHT, WIDTH and DEPTH; the weight matrix is converted into a second matrix of m inputs by n outputs; the bias values are assembled into a bias vector; the first matrix, the second matrix and the bias vector are copied from the host memory to the memory of the Ascend device; and the matmul operator is then called to compute the matrix multiplication on the Ascend device, yielding the output of the layer. That is, the dot product result of one row of the input matrix and all columns in the weight matrix constitutes one row of the output matrix.
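The sequence of steps on the device path (build the matrices, copy them to device memory, call one matrix multiplication operator) can be sketched in Python as below. Nothing here is an Ascend API: `FakeDevice` is a hypothetical stand-in that merely keeps copies of the data, and reading "BATCH, HEIGHT, WIDTH and DEPTH" as flattening each sample into one row of length HEIGHT × WIDTH × DEPTH is an assumption about the intended layout.

```python
import copy

class FakeDevice:
    """Hypothetical stand-in for an accelerator; not a real Ascend API."""
    def __init__(self):
        self.memory = {}

    def upload(self, name, host_data):
        # Models the host-to-device copy by storing an independent copy.
        self.memory[name] = copy.deepcopy(host_data)

    def matmul_bias(self, a, b, bias):
        # Models a single matmul operator call on data already on the device.
        x, w, c = self.memory[a], self.memory[b], self.memory[bias]
        return [[sum(xi * w[k][j] for k, xi in enumerate(row)) + c[j]
                 for j in range(len(w[0]))] for row in x]

def flatten_input(batch):
    """[batch][height][width][depth] -> batch x (height*width*depth)."""
    return [[v for plane in sample for line in plane for v in line]
            for sample in batch]

def dense_layer_device(device, batch, weights, bias):
    device.upload("first", flatten_input(batch))   # first matrix
    device.upload("second", weights)               # second matrix, m x n
    device.upload("bias", bias)                    # bias vector
    return device.matmul_bias("first", "second", "bias")
```

The design point the sketch tries to show is that the whole layer is handed over in one call after the copies, instead of iterating over rows and columns on the host.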

Example three

Fig. 3 is a schematic structural diagram of a selection apparatus for a matrix operation manner according to a third embodiment of the present invention, which is applicable to how to intelligently select a matrix operation manner.

As shown in fig. 3, the apparatus may specifically include:

a candidate environment identification module 310, configured to identify a candidate hardware environment in which the neural network is located; wherein the candidate hardware environment includes at least one of a CPU, AVX, SSE and an Ascend device;

a target determining module 320, configured to determine a target hardware environment from at least one candidate hardware environment, and determine a target matrix operation manner corresponding to the target hardware;

and the operation module 330 is configured to perform operation on the neural network by using a target matrix operation mode.

Further, when there are at least two candidate hardware environments, the target determining module 320 is specifically configured to:

and determining the target hardware environment from at least two candidate hardware environments according to the priorities of the candidate hardware environments.

Further, if the target hardware environment is a CPU, the operation module 330 is specifically configured to:

for each layer of the neural network, taking an output matrix of a layer above the layer as an input matrix of the layer;

for each row vector of the input matrix, traversing the column vectors of the weight matrix of the layer;

for each column vector, respectively calculating the product of the element of the row vector and the corresponding element of the column vector, and adding the products to obtain the dot product of the row vector and the column vector;

and adding the bias value to each dot product to obtain the elements of the output matrix of the layer.

Further, if the target hardware environment is AVX or SSE, the operation module 330 is specifically configured to:

for each column vector, transferring the row vector and the column vector to AVX or SSE to obtain a dot product of the row vector and the column vector;

and adding the bias value to each dot product to obtain an output matrix of the layer.

Further, if the target hardware environment is an Ascend device, the operation module 330 is specifically configured to:

constructing a first matrix according to the input matrix, and constructing a second matrix according to the weight matrix of the layer;

copying the first matrix, the second matrix and the bias vector from the host memory to the memory of the Ascend device;

and calling a matrix multiplication operator on the first matrix, the second matrix and the bias vector to obtain an output matrix of the layer.

Further, the apparatus further comprises a custom module configured to:

identifying whether the neural network is in a set hardware environment or not according to the user configuration information;

if yes, taking the matrix operation mode corresponding to the set hardware environment as a target matrix operation mode.

Further, the custom module is further configured to:

if the neural network is identified as not being in the set hardware environment, displaying error information to the user.

The selection device of the matrix operation mode can execute the selection method of the matrix operation mode provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example four

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention, and fig. 4 shows a block diagram of an exemplary device suitable for implementing the embodiment of the present invention. The device shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.

As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments described herein.

Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement a selection method of a matrix operation manner provided by an embodiment of the present invention.

Example five

The fifth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program (also referred to as computer-executable instructions) is stored, where the computer program, when executed by a processor, performs the method for selecting the matrix operation manner provided by the embodiments of the present invention, where the method includes:

identifying a candidate hardware environment in which the neural network is located; wherein the candidate hardware environment includes at least one of a CPU, AVX, SSE and an Ascend device;

and calculating the neural network by adopting a target matrix operation mode.

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for selecting a matrix operation mode is characterized by comprising the following steps:

2. The method of claim 1, wherein determining the target hardware environment from at least one candidate hardware environment if the candidate hardware environments include at least two, comprises:

3. The method of claim 1, wherein if the target hardware environment is a CPU, the performing the operation on the neural network by using the target matrix operation manner comprises:

for each row vector of the input matrix, traversing a column vector of a weight matrix of the layer;

and adding the offset value to each dot product to obtain the elements of the output matrix of the layer.

4. The method of claim 1, wherein if the target hardware environment is AVX or SSE, the operating the neural network in the target matrix operation manner comprises:

and adding the dot products to the offset value respectively to obtain the output matrix of the layer.

5. The method of claim 1, wherein if the target hardware environment is an Ascend device, the operating the neural network using the target matrix operation comprises:

copying the first matrix, the second matrix, and the bias vector from a host memory into a memory of the Ascend device;

6. The method of claim 1, wherein after identifying the candidate hardware environment in which the neural network is located, further comprising:

and if so, taking the matrix operation mode corresponding to the set hardware environment as a target matrix operation mode.

7. The method of claim 6, further comprising:

and if the neural network is identified not to be in the set hardware environment, displaying error information to a user.

8. An apparatus for selecting a matrix operation method, comprising:

9. An electronic device, comprising:

one or more processors;

a memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement a method of selecting a manner of matrix operation as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for selecting a matrix operation mode according to any one of claims 1 to 7.