CN109740729B - Operation method, device and related product


Info

Publication number: CN109740729B
Application number: CN201811534505.0A
Authority: CN (China)
Prior art keywords: operator, input data, artificial intelligence, normalization, data
Legal status: Active (granted)
Other versions: CN109740729A (Chinese)
Inventor: not disclosed (不公告发明人)
Assignee: Anhui Cambricon Information Technology Co Ltd
Application filed by Anhui Cambricon Information Technology Co Ltd, with priority to CN201811534505.0A.


Abstract

The present disclosure relates to an operation method, apparatus and related product. The product comprises a control module, the control module comprising: an instruction cache submodule, an instruction processing submodule and a storage queue submodule; the instruction cache submodule is used for storing the calculation instructions associated with the artificial neural network operation; the instruction processing submodule is used for parsing the calculation instruction to obtain a plurality of operation instructions; the storage queue submodule is configured to store an instruction queue, where the instruction queue includes: a plurality of operation instructions or calculation instructions to be executed in the order of the queue. Through the method, the operation efficiency of the related product when performing neural network model operations can be improved.

Description

Operation method, device and related product
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to an operation method, an operation device, and a related product.
Background
Neural network algorithms are currently among the most popular machine learning algorithms, achieving excellent results in fields such as image recognition, speech recognition, and natural language processing. As these algorithms have developed, their complexity has grown, and model scale has gradually increased in order to improve recognition accuracy. Processing these large-scale models with GPUs and CPUs takes substantial computation time and consumes considerable power. In this context, new artificial intelligence processors have been proposed to increase the operation speed of neural network models, save operation time, and reduce power consumption. However, current algorithmic support for these new artificial intelligence processors remains far from adequate.
Disclosure of Invention
According to an aspect of the present disclosure, there is provided an arithmetic method, the method including:
acquiring a basic operator in an artificial intelligence operator library, wherein the basic operator is used for executing corresponding operation on input data;
and forming a splicing operator by using the basic operator, wherein the splicing operator is used for executing splicing operation on the input data in the artificial intelligence processor, so that the input data is subjected to normalization processing.
In one possible embodiment, the base operator includes a first deformation operator, a second deformation operator and a normalization index operator, the first deformation operator and the second deformation operator are used for performing type conversion processing on the input data, and the normalization index operator is used for performing normalization operation, wherein,
the forming a splicing operator by using the basic operator comprises:
taking the first deformation operator as a preceding stage operator of the normalization index operator;
and taking the second deformation operator as a post-stage operator of the normalization index operator.
In one possible implementation, the splicing operation includes:
when the dimensionality of the first input data of the first type is larger than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing normalization operation on the first input data of the second type on a second dimension by using the normalization index operator to output data of the second type;
and converting the output data of the second type into the output data of the first type by using the second deformation operator.
In one possible embodiment, the base operator includes a scaling operator for scaling the input data and a normalization operator for normalizing the input data, wherein,
the forming a splicing operator by using the basic operator comprises:
and taking the normalization operator as a preceding stage operator of the scaling operator.
In one possible implementation, the splicing operation includes:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator to obtain the scaled normalization result.
In one possible embodiment, the base operators include a squaring operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for taking the reciprocal of the square root of the input data, and a multiplication operator for multiplying the input data, wherein,
the forming a splicing operator by using the basic operator comprises:
and sequentially splicing the square operator, the convolution operator, the square root reciprocal operator and the multiplication operator to form the splicing operator.
In one possible implementation, the splicing operation includes:
carrying out square operation on input data by utilizing the square operator to obtain a square operation result;
carrying out summation operation on a plurality of square operation results by using the convolution operator to obtain a summation operation result;
using the square root reciprocal operator to sequentially perform a square root operation and a reciprocal operation on the summation result, obtaining a reciprocal operation result;
and performing multiplication operation on the input data and the reciprocal operation result by using the multiplication operator to obtain a normalization result.
In one possible implementation, the splicing operator is applied to an application layer in a software call hierarchy, the artificial intelligence operator library is located at an operator library layer in the software call hierarchy, and the artificial intelligence processor is located at a chip layer in the software call hierarchy.
According to another aspect of the present disclosure, there is provided an arithmetic device, the device including:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring basic operators in an artificial intelligence operator library, and the basic operators are used for executing corresponding operation operations on input data;
and the operation module is connected with the acquisition module and used for forming a splicing operator by using the basic operator, and the splicing operator is used for executing splicing operation on the input data in the artificial intelligence processor so as to normalize the input data.
In one possible embodiment, the base operator includes a first deformation operator, a second deformation operator and a normalization index operator, the first deformation operator and the second deformation operator are used for performing type conversion processing on the input data, and the normalization index operator is used for performing normalization operation, wherein,
the operation module comprises a first operation submodule configured to:
taking the first deformation operator as a preceding stage operator of the normalization index operator;
and taking the second deformation operator as a post-stage operator of the normalization index operator.
In one possible implementation, the splicing operation includes:
when the dimensionality of the first input data of the first type is larger than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing normalization operation on the first input data of the second type on a second dimension by using the normalization index operator to output data of the second type;
and converting the output data of the second type into the output data of the first type by using the second deformation operator.
In one possible embodiment, the base operator includes a scaling operator for scaling the input data and a normalization operator for normalizing the input data, wherein,
the operation module comprises a second operation submodule configured to:
and taking the normalization operator as a preceding stage operator of the scaling operator.
In one possible implementation, the splicing operation includes:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator to obtain the scaled normalization result.
In one possible embodiment, the base operators include a squaring operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for taking the reciprocal of the square root of the input data, and a multiplication operator for multiplying the input data, wherein,
the operation module comprises a third operation sub-module configured to:
and sequentially splicing the square operator, the convolution operator, the square root reciprocal operator and the multiplication operator to form the splicing operator.
In one possible implementation, the splicing operation includes:
carrying out square operation on input data by utilizing the square operator to obtain a square operation result;
carrying out summation operation on a plurality of square operation results by using the convolution operator to obtain a summation operation result;
using the square root reciprocal operator to sequentially perform a square root operation and a reciprocal operation on the summation result, obtaining a reciprocal operation result;
and performing multiplication operation on the input data and the reciprocal operation result by using the multiplication operator to obtain a normalization result.
According to another aspect of the present disclosure, an artificial intelligence processing apparatus is provided, the apparatus comprising:
the main processor is used for executing the method to acquire a splicing operator, and the splicing operator is used for executing corresponding operation on the input data;
the artificial intelligence processor is electrically connected with the main processor;
the main processor is further configured to send input data and the splice operator to an artificial intelligence processor, the artificial intelligence processor configured to:
receiving the input data and the splicing operator sent by the main processor;
carrying out artificial intelligence operation on the input data by using the splicing operator to obtain an operation result;
and sending the operation result to the main processor.
In one possible embodiment, the main processor further comprises a main processor memory space for storing the splicing operator, wherein,
the main processor is also used for providing input data and a splicing operator stored in the storage space of the main processor.
In one possible implementation, the artificial intelligence processor transmits the operation result to the main processor through an I/O interface;
when the device comprises a plurality of artificial intelligence processors, the plurality of artificial intelligence processors can be connected through a specific structure and transmit data;
the plurality of artificial intelligence processors are interconnected through a Peripheral Component Interconnect Express (PCIE) bus and transmit data, so as to support larger-scale artificial intelligence operations; the artificial intelligence processors may share the same control system or have their own control systems; they may share memory or have their own memories; and their interconnection mode may be any interconnection topology.
In a possible implementation, the apparatus further includes: and the storage device is respectively connected with the artificial intelligence processor and the main processor and is used for storing the data of the artificial intelligence processor and the main processor.
According to another aspect of the present disclosure, an artificial intelligence chip is provided, which includes the artificial intelligence processing device.
According to another aspect of the present disclosure, an electronic device is provided, which includes the artificial intelligence chip.
According to another aspect of the present disclosure, a board card is provided, where the board card includes: the memory device, the interface device, the control device and the artificial intelligence chip;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
In one possible implementation, the memory device includes a plurality of groups of memory cells, where each group of memory cells is connected with the chip through a bus, and each memory cell is a DDR SDRAM;
the chip includes a DDR controller for controlling data transmission and data storage of each memory unit;
and the interface device is a standard PCIE interface.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the above-described method.
By the method, the basic operator in the artificial intelligence operator library can be obtained, the splicing operator is formed by the basic operator, the formed splicing operator can be used for carrying out normalization processing on input data, a new artificial intelligence processor is supported, and therefore the operation efficiency of the new artificial intelligence processor in operation of the neural network model is improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of an operational method according to an embodiment of the present disclosure.
FIG. 2 shows a software call hierarchy diagram according to an embodiment of the present disclosure.
Fig. 3a-3c show schematic diagrams of a splice operator according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an arithmetic device according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an arithmetic device according to an embodiment of the present disclosure.
FIG. 6 shows a block diagram of an artificial intelligence processing apparatus according to an embodiment of the present disclosure.
FIG. 7 shows a block diagram of an artificial intelligence processing apparatus according to an embodiment of the present disclosure.
FIG. 8 illustrates a block diagram of an artificial intelligence processor according to an embodiment of the disclosure.
FIG. 9 shows a block diagram of the main processing circuit 331 according to an embodiment of the present disclosure.
FIG. 10 shows a schematic diagram of an artificial intelligence processor, according to an embodiment of the present disclosure.
FIG. 11 shows a schematic diagram of an artificial intelligence processor, according to an embodiment of the present disclosure.
Fig. 12 illustrates a board card according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Referring to fig. 1, fig. 1 is a flowchart illustrating a computing method according to an embodiment of the disclosure.
The method can be applied to a server or a terminal, and as shown in fig. 1, the method includes:
step S110, acquiring a basic operator in an artificial intelligence operator library, wherein the basic operator is used for executing corresponding operation on input data;
and step S120, forming a splicing operator by using the basic operator, wherein the splicing operator is used for executing splicing operation on input data in the artificial intelligence processor, so that normalization processing is performed on the input data.
By the method, the basic operator in the artificial intelligence operator library can be obtained, the splicing operator is formed by the basic operator, the formed splicing operator can be used for carrying out normalization processing on input data, a new artificial intelligence processor is supported, and therefore the operation efficiency of the new artificial intelligence processor in operation of the neural network model is improved.
The splicing operator formed by the method can be used as part of an artificial intelligence operation. When the splicing operator is applied to an artificial intelligence processor for artificial intelligence operations, applications including but not limited to speech recognition and image recognition can be realized; by combining the deformation operators and the basic operators to form the splicing operator, the artificial intelligence processor can better realize the artificial intelligence operation.
In one possible embodiment, the operators may be algorithms commonly used in artificial intelligence, also referred to as layers, operations, or nodes. An artificial intelligence operator library may be preset; the artificial intelligence operator library may include a plurality of basic operators (such as convolution operators, fully connected operators, pooling operators, activation operators, etc.), and each basic operator may be called by a processor, including but not limited to a central processing unit (CPU) or a graphics processing unit (GPU), to implement a corresponding basic function.
In a possible embodiment, the dimension of the first input data may be 4, and when the first input data is picture data, each dimension of the first input data may represent a number of pictures, a number of picture channels (channels), a picture height, and a picture width. In other embodiments, when the first input data is picture data, but the dimension of the first input data is less than 4 (e.g., 3), each dimension of the first input data may represent any 3 of the number of pictures, the number of picture channels, the height of the picture, and the width of the picture.
In a possible implementation, the basic operators include a first deformation operator (Reshape), a second deformation operator (Reshape), and a normalization index operator (softmax), where the first deformation operator and the second deformation operator are used for performing type conversion processing on the input data, and the normalization index operator is used for performing normalization operation, for example, when the input data is multi-dimensional data (e.g., four-dimensional), the normalization index operator may map all data of a specified dimension to 0-1, and the sum of the mapped data of the specified dimension is 1.
In one example, when the input data is a one-dimensional vector [-3, 2, -1, 0], the normalization index operator may perform a normalization operation on the input data, normalizing it to [0.0057, 0.8390, 0.0418, 0.1135]; it can be seen that the normalized input data sums to 1.
In another example, when the input data is 2 × 3 two-dimensional data [[1, 1, 1], [1, 1, 1]], a normalization operation may be performed on the second dimension of the input data, obtaining normalized input data of [[0.333, 0.333, 0.333], [0.333, 0.333, 0.333]]; a normalization operation may also be performed on the first dimension of the input data, which may be the starting dimension of the input data, obtaining normalized input data of [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]].
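As an illustration only (a minimal sketch using NumPy, which is not part of the disclosed artificial intelligence operator library), the normalization index operation in the two examples above can be reproduced as follows:

    import numpy as np

    def softmax(x, axis=-1):
        # Subtract the per-slice maximum for numerical stability,
        # then normalize the exponentials so each slice sums to 1.
        e = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return e / np.sum(e, axis=axis, keepdims=True)

    print(softmax(np.array([-3.0, 2.0, -1.0, 0.0])))
    # -> approximately [0.0057 0.8390 0.0418 0.1135], summing to 1

    x = np.ones((2, 3))
    print(softmax(x, axis=1))  # second dimension: rows of [0.333 0.333 0.333]
    print(softmax(x, axis=0))  # first dimension: every entry becomes 0.5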
In one possible implementation, step S120, splicing the deformation operators and the base operator to form a splicing operator, may include:
taking the first deformation operator as a preceding stage operator of the normalization index operator;
taking the second deformation operator as a post-stage operator of the normalization index operator;
wherein the first deformation operator is used for converting first input data of a first type into second input data of a second type, and the second deformation operator is used for converting output data of the second type output by the normalization exponential operator into output data of the first type.
The splicing operator formed in this way can, when the input data meets a certain condition, convert the input data by using the first deformation operator, then perform the normalization operation on the converted input data in a specified dimension by using the normalization index operator, and finally perform a shape-recovery conversion on the result of the normalization operation by using the second deformation operator, so that the result of the normalization operation is converted into data with the same shape as the input data.
In one possible embodiment, when the splicing operator consists of the first deformation operator, the normalized index operator, and the second deformation operator, the splicing operation includes:
when the dimensionality of the first input data of the first type is larger than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing normalization operation on the first input data of the second type on a second dimension by using the normalization index operator to output data of the second type;
and converting the output data of the second type into the output data of the first type by using the second deformation operator.
When the artificial intelligence processor performs an operation on the neural network, first input data, a first parameter, a second parameter and the like are input when a normalized exponential operator in the artificial intelligence operator library is called.
In one possible implementation, the first parameter (preserve_shape) may indicate whether the shape of the input data is preserved, and the second parameter (multi_output) may indicate whether the shapes of the input data and the label data are consistent.
In this embodiment, the first parameter may be set to true (true) when the input data shape is to be preserved, and set to false (false) when the input data shape is not to be preserved.
In the present embodiment, the second parameter may be set to false (false) when the shape of the input data is the same as the shape of the label data in the training of the neural network model, and the second parameter may be set to true (true) in other cases.
Of course, the above description is exemplary, and one of ordinary skill in the art may set the values of the first parameter and the second parameter as desired.
In a possible implementation manner, when the dimensionality of the first input data of the first type is greater than 2 and a first parameter and a second parameter carried in the first input data satisfy a preset condition, the first input data is converted into a second type by using the first transformation operator, where the dimensionality of the first input data of the second type is 2, where:
the preset condition may be that the first parameter and the second parameter are both false.
After normalization operation is performed on the first input data of the second type converted by the first deformation operator by using the normalization index operator, a normalized result (output data of the second type) can be obtained, and the data type of the normalized result is the same as that of the first input data of the second type, and the normalized result and the first input data of the second type are two-dimensional data.
In order to make the output data and the input data have the same shape, the normalized result may be converted by using a second deformation operator to obtain the first type of output data, and after the conversion processing by the second deformation operator, the first type of output data has the same shape as the first type of input data, for example, the first type of output data is four-dimensional data, and the arrangement order of each dimension is the same as that of the first type of input data.
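As a minimal sketch of this splicing operation (using NumPy; the parameter names follow the first and second parameters described above, and the collapse of all dimensions after the first into the second dimension is an assumption, since the disclosure does not specify which dimensions the first deformation operator collapses):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return e / np.sum(e, axis=axis, keepdims=True)

    def spliced_softmax(data, preserve_shape=False, multi_output=False):
        # Splicing operation: first deformation -> normalization index -> second deformation.
        if data.ndim > 2 and not preserve_shape and not multi_output:
            original_shape = data.shape
            two_d = data.reshape(original_shape[0], -1)   # first deformation operator: dimension becomes 2
            out = softmax(two_d, axis=1)                  # normalization on the second dimension
            return out.reshape(original_shape)            # second deformation operator: restore the shape
        return softmax(data, axis=-1)

    x = np.random.rand(2, 3, 4, 5)  # e.g. pictures x channels x height x width
    y = spliced_softmax(x)
    assert y.shape == x.shape       # output data has the same shape as the input data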
In an application example, when speech recognition or image processing is to be performed using artificial intelligence operations, the normalization index operation may be performed by using the normalization index splicing operator (first deformation operator + normalization index operator + second deformation operator) of an embodiment of the present disclosure: when the input data satisfies a certain condition, the splicing operator converts input data of more than 2 dimensions into 2-dimensional data, normalizes the converted 2-dimensional input data in the second dimension to obtain output data, and then converts the data shape of the output data back to be the same as that of the input data. By adopting the normalization index splicing operator of the present disclosure, artificial intelligence operations can be executed more conveniently to realize applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
In a possible implementation, the basic operator includes a scaling operator and a normalization operator, the scaling operator is used for scaling the input data, and the normalization operator is used for normalizing the input data.
In one possible embodiment, the scaling operator may be implemented by the following formula:
Y = nX + m, where X denotes the input data, n denotes a scaling multiple, m denotes a displacement number, and Y denotes the result of the scaling operation; n and m may be scalars or tensors, and a scaling operator obtained according to this formula operates similarly to a vector multiplication operation.
In one possible implementation, the splicing operation includes:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator so as to obtain the scaled normalization result.
In one possible embodiment, the normalization operator may be implemented by the following formula:

out[:, i, ...] = (data[:, i, ...] - data_mean[i]) / sqrt(data_var[i] + ε)   (formula 1)

wherein data[:, i, ...] is the input data, data_mean[i] is the mean (expected value), data_var[i] is the variance, and ε is a constant.

In one example, if the scaled normalization result out[:, i, ...] of the input data data[:, i, ...] is to be obtained:

out[:, i, ...] = gamma[i] * (data[:, i, ...] - data_mean[i]) / sqrt(data_var[i] + ε) + beta[i]   (formula 2)

wherein gamma[i] is a scaling factor and beta[i] represents a displacement number.

The normalization operator may be used to normalize the input data data[:, i, ...] to obtain the normalization result shown in formula 1, and then the scaling operator may be used to scale the normalization result of the normalization operator to obtain the scaled normalization result shown in formula 2.
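A minimal NumPy sketch of this splice (normalization operator as the preceding-stage operator, scaling operator as the post-stage operator), assuming an N x C x H x W data layout with per-channel statistics; the layout and the ε value are assumptions, not taken from the disclosure:

    import numpy as np

    def normalize_op(data, mean, var, eps=1e-5):
        # Normalization operator: formula 1, applied per channel i.
        return (data - mean.reshape(1, -1, 1, 1)) / np.sqrt(var.reshape(1, -1, 1, 1) + eps)

    def scale_op(x, gamma, beta):
        # Scaling operator: Y = nX + m, with per-channel n (gamma) and m (beta).
        return gamma.reshape(1, -1, 1, 1) * x + beta.reshape(1, -1, 1, 1)

    data = np.random.rand(2, 3, 4, 4)  # N, C, H, W
    mean = data.mean(axis=(0, 2, 3))
    var = data.var(axis=(0, 2, 3))
    gamma, beta = np.ones(3), np.zeros(3)

    out = scale_op(normalize_op(data, mean, var), gamma, beta)  # formula 2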
In an application example, when speech recognition or image processing is to be performed using artificial intelligence operations, the normalization operation may be performed by using the normalization splicing operator (normalization operator + scaling operator) of an embodiment of the present disclosure: to obtain the scaled normalization result shown in formula 2, the splicing operator first normalizes the input data to obtain the normalization result shown in formula 1, and then performs the scaling operation on the normalized input data to obtain the final operation result. By adopting the normalization splicing operator of the present disclosure, artificial intelligence operations can be executed more conveniently to realize applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
By the mode, the splicing operator can be formed by utilizing the scaling operator and the normalization operator, and the splicing operator can perform normalization processing and scaling processing on the input data, so that the normalization result of the input data is obtained.
In one possible embodiment, the base operators include a squaring operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for taking the reciprocal of the square root of the input data, and a multiplication operator for multiplying the input data, wherein,
step S120, forming a splicing operator (L2_normalization) using the base operators, includes:
sequentially splicing the squaring operator, the convolution operator, the square root reciprocal operator and the multiplication operator to form the splicing operator.
In one possible implementation, the splicing operation includes:
carrying out square operation on input data by utilizing the square operator to obtain a square operation result;
carrying out summation operation on a plurality of square operation results by using the convolution operator to obtain a summation operation result;
for example, taking an RGB picture as an input data, the channel of which is 3, the convolution operator sums the results of multiple square operations, that is, calculates the sum of squares of corresponding pixels in three channels, and then puts the results in R, G, B channels respectively.
In one possible embodiment, the number of convolution kernels is the same as the number of channels, the size of each convolution kernel is 1 × 1, and the weights are all 1. In this way, the data of all input channels are summed and the summation result is placed into each channel.
using the square root reciprocal operator to sequentially perform a square root operation and a reciprocal operation on the summation result, obtaining a reciprocal operation result;
and performing multiplication operation on the input data and the reciprocal operation result by using the multiplication operator to obtain a normalization result.
In one possible embodiment, the splicing operation of the above splicing operator can be represented by the following formula:

out[:, i, ...] = data[:, i, ...] / sqrt(sum_i(data[:, i, ...]^2))

where i is an integer from 1 to N, out[:, i, ...] is the normalization result output by the splicing operator, data[:, i, ...] is the input data, data[:, i, ...]^2 is the square operation performed on the input data, sum_i(...) is the summation operation performed on the square results over the N channels, and sqrt(...) together with the division realizes the square root and reciprocal operations on the summation result.
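A minimal NumPy sketch of this splice, where the 1 × 1 all-ones convolution described above is modeled as a channel-wise sum broadcast back to every channel (the ε guard against division by zero is an assumption):

    import numpy as np

    def l2_normalization_splice(data, eps=1e-12):
        # Square operator.
        squared = data ** 2
        # Convolution operator: C kernels of size 1 x 1 with all weights 1,
        # i.e. each output channel holds the channel-wise sum of squares.
        summed = np.repeat(squared.sum(axis=1, keepdims=True), data.shape[1], axis=1)
        # Square root reciprocal operator.
        rsqrt = 1.0 / np.sqrt(summed + eps)
        # Multiplication operator: input data times the reciprocal result.
        return data * rsqrt

    x = np.random.rand(1, 3, 8, 8)  # e.g. an RGB picture with channel number 3
    y = l2_normalization_splice(x)
    # Each pixel vector across the channels now has (approximately) unit L2 norm:
    assert np.allclose((y ** 2).sum(axis=1), 1.0)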
In one application example, the splicing operator may be applied to a Single Shot MultiBox Detector (SSD), a neural network for target detection.
In an application example, when speech recognition or image processing is to be performed using artificial intelligence operations, the normalization operation may be performed by using the normalization splicing operator (square operator + convolution operator + square root reciprocal operator + multiplication operator) of an embodiment of the present disclosure: when the input data needs to be normalized, the normalization splicing operator may be used to perform the normalization processing on the input data to obtain the normalization result shown in the above formula. By adopting the normalization splicing operator of the present disclosure, artificial intelligence operations can be executed more conveniently to realize applications including but not limited to image processing and speech recognition, thereby improving the efficiency of artificial intelligence operations.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a software call hierarchy according to an embodiment of the present disclosure.
As shown in fig. 2, the software calling hierarchy sequentially includes, from top to bottom, an application layer, a framework layer, an operator library layer, a driver layer, and a chip layer, where the splicing operator obtained by the foregoing operation method may be applied to the application layer, the artificial intelligence operator library may be in the operator library layer, the artificial intelligence processor may be located in the chip layer, and the driver layer may include a driver for driving the chip layer to operate.
According to the above introduction, after the splicing operator is formed from the deformation operators and the basic operators in the operator library layer, the splicing operator can be directly called by the application layer, so that the corresponding function is realized in the artificial intelligence operation; this avoids the application layer having to call the deformation operators and the basic operators from the operator library layer every time an artificial intelligence operation is performed, thereby speeding up the execution of the artificial intelligence operation.
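A hypothetical sketch of what such a reusable splice could look like at the application layer (the splice helper and the operator names below are illustrative only and do not come from the disclosed operator library):

    from functools import reduce

    def splice(*operators):
        # Chain basic operators left to right into a single splicing operator,
        # so the application layer calls one operator instead of several.
        return lambda x: reduce(lambda acc, op: op(acc), operators, x)

    # e.g., with operators fetched once from the operator library layer:
    # normalized_softmax = splice(first_reshape_op, softmax_op, second_reshape_op)
    # result = normalized_softmax(input_data)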
Referring to fig. 3a-3c, fig. 3a-3c are schematic diagrams illustrating a stitching operator according to an embodiment of the present disclosure.
As shown in fig. 3a, the splicing operator includes:
a first deformation operator 11, configured to convert the first input data into a second type, where a dimension of the first input data of the second type is 2;
the normalization index operator 12 is configured to perform normalization operation on the first input data of the second type in a second dimension to output data of the second type;
a second deformation operator 13 for converting the output data of the second type into output data of the first type.
wherein splicing the deformation operators with the base operator to form the splicing operator comprises:
taking the first deformation operator 11 as a preceding stage operator of the normalization index operator 12;
and taking the second deformation operator 13 as a subsequent operator of the normalized index operator 12.
In one possible implementation, the splicing operation includes:
when the first type represents that the dimensionality of the first input data is greater than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator 11, wherein the second type represents that the dimensionality of the converted first input data is 2;
normalizing the first input data of the second type in a second dimension by using the normalization index operator 12 to output data of the second type;
the second type of output data is converted into the first type of output data by means of the second deformation operator 13.
Through this splicing operator, when the first parameter and the second parameter carried in the input data meet the preset condition, the first deformation operator converts the first input data into the second type; the normalization index operator then performs the normalization operation on the first input data of the second type in the second dimension to output data of the second type; and the second deformation operator converts the output data of the second type into output data of the first type.
As shown in fig. 3b, in a possible implementation, the basic operator includes a normalization operator 21 and a scaling operator 22, where the normalization operator 21 is configured to perform normalization processing on input data, and the splicing operation includes:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator so as to obtain the scaled normalization result.
As shown in fig. 3c, in one possible implementation, the basic operators include a square operator 31, a convolution operator 32, a square root reciprocal operator 33 and a multiplication operator 34, the square operator 31 is used for performing a square operation on the input data, the convolution operator 32 is used for performing a summation operation on the input data, the square root reciprocal operator 33 is used for performing a square root and reciprocal operation on the input data, the multiplication operator 34 is used for performing a multiplication operation on the input data, wherein,
the forming a splicing operator by using the basic operator comprises:
and sequentially splicing the square operator 31, the convolution operator 32, the square root reciprocal operator 33 and the multiplication operator 34 to form the splicing operator.
In one possible implementation, the splicing operation includes:
carrying out square operation on input data by utilizing the square operator to obtain a square operation result;
carrying out summation operation on a plurality of square operation results by using the convolution operator to obtain a summation operation result;
using the square root reciprocal operator to sequentially perform a square root operation and a reciprocal operation on the summation result, obtaining a reciprocal operation result;
and performing multiplication operation on the input data and the reciprocal operation result by using the multiplication operator to obtain a normalization result.
Referring to fig. 4, fig. 4 is a block diagram of an arithmetic device according to an embodiment of the disclosure.
As shown in fig. 4, the apparatus includes:
an obtaining module 80, configured to obtain a basic operator in an artificial intelligence operator library, where the basic operator is used to perform a corresponding operation on input data;
and the operation module 90 is connected to the acquisition module 80 and configured to form a splicing operator by using the basic operator, where the splicing operator is configured to perform a splicing operation on the input data in the artificial intelligence processor, so as to perform normalization processing on the input data.
Through the device, the present disclosure can acquire the basic operators in the artificial intelligence operator library and use the basic operators to form a splicing operator; the splicing operator formed can be used to perform normalization processing on input data and supports new artificial intelligence processors, thereby improving the operation efficiency of a new artificial intelligence processor when performing neural network model operations.
Referring to fig. 5, fig. 5 is a block diagram of an arithmetic device according to an embodiment of the disclosure.
In one possible implementation, as shown in fig. 5, the basic operator includes a first deformation operator, a second deformation operator and a normalization index operator, the first deformation operator and the second deformation operator are used for performing type conversion processing on the input data, and the normalization index operator is used for performing normalization operation, wherein,
the arithmetic module 90 comprises a first arithmetic sub-module 910, the first arithmetic sub-module 910 being configured to:
taking the first deformation operator as a preceding stage operator of the normalization index operator;
and taking the second deformation operator as a post-stage operator of the normalization index operator.
In one possible implementation, the splicing operation includes:
when the dimensionality of the first input data of the first type is larger than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing normalization operation on the first input data of the second type on a second dimension by using the normalization index operator to output data of the second type;
and converting the output data of the second type into the output data of the first type by using the second deformation operator.
In one possible embodiment, the base operator includes a scaling operator for scaling the input data and a normalization operator for normalizing the input data, wherein,
the operation module 90 includes a second operation submodule 920, and the second operation submodule 920 is configured to:
and taking the normalization operator as a preceding stage operator of the scaling operator.
In one possible implementation, the splicing operation includes:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator to obtain the scaled normalization result.
In one possible embodiment, the base operators include a squaring operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for taking the reciprocal of the square root of the input data, and a multiplication operator for multiplying the input data, wherein,
the arithmetic module 90 comprises a third arithmetic sub-module 930, the third arithmetic sub-module 930 being configured to:
and sequentially splicing the square operator, the convolution operator, the square root reciprocal operator and the multiplication operator to form the splicing operator.
In one possible implementation, the splicing operation includes:
carrying out square operation on input data by utilizing the square operator to obtain a square operation result;
carrying out summation operation on a plurality of square operation results by using the convolution operator to obtain a summation operation result;
using the square root reciprocal operator to sequentially perform a square root operation and a reciprocal operation on the summation result, obtaining a reciprocal operation result;
and performing multiplication operation on the input data and the reciprocal operation result by using the multiplication operator to obtain a normalization result.
Referring to fig. 6, fig. 6 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
In one possible embodiment, as shown in figure 6,
a main processor 50, configured to execute the method to obtain a splicing operator, where the splicing operator is configured to perform a corresponding operation on the input data;
an artificial intelligence processor 60 electrically connected to the main processor 50;
the main processor 50 is further configured to send the input data and the splice operator to an artificial intelligence processor 60, the artificial intelligence processor 60 being configured to:
receiving the input data and the splicing operator sent by the main processor 50;
carrying out artificial intelligence operation on the input data by using the splicing operator to obtain an operation result;
the operation result is sent to the main processor 50.
In a possible embodiment, the main processor 50 may include a main processor storage space for storing a splicing operator obtained by the main processor 50 executing the operation method, wherein,
the main processor 50 is also arranged to provide input data and a splicing operator stored in the main processor memory space.
It should be understood that the main processor 50 may execute the operation method after obtaining the data to obtain the splicing operator and directly transmit the obtained splicing operator to the artificial intelligence processor 60 for processing; alternatively, the main processor 50 may send a pre-stored splicing operator to the artificial intelligence processor 60, which then performs the artificial intelligence operation according to the received splicing operator and the input data. The former can be regarded as an online real-time processing method, the latter as an offline processing method.
In one possible embodiment, the apparatus shown in fig. 4 and 5 may be implemented in the main processor 50.
In one possible embodiment, the main processor 50 may be a central processing unit (CPU), or may be another type of processor, such as a graphics processing unit (GPU). It should be understood that the splicing operator is obtained by the foregoing operation method; for a specific introduction of the splicing operator, reference is made to the foregoing description, and details are not repeated here.
In one possible embodiment, the artificial intelligence processing apparatus may be formed by a plurality of identical processors, for example a plurality of processors (XPUs) forming an architecture similar to that of the main processor 50 plus the artificial intelligence processor 60; or it may be formed by a single processor, in which case that processor may both execute the above operation method to obtain the splicing operator and perform the artificial intelligence operation on the input data through the splicing operator to obtain the output result. In this embodiment, the processor may be of an existing type or a new type, which is not limited in this disclosure.
In one possible embodiment, the main processor 50 may be used as an interface for interfacing the artificial intelligence processing apparatus with external data and control, including data transportation, and completing basic control of starting, stopping, etc. of the artificial intelligence processing apparatus; other processing devices can cooperate with the artificial intelligence processing device to complete the operation task.
In one possible embodiment, the artificial intelligence processing apparatus may include more than one artificial intelligence processor, and the artificial intelligence processors may be linked and transmit data through a specific structure, for example, through a PCIE bus to interconnect and transmit data, so as to support operations of larger-scale machine learning. At this time, the same control system may be shared, or there may be separate control systems; the memory may be shared or there may be separate memories for each accelerator. In addition, the interconnection mode can be any interconnection topology.
The artificial intelligence processing device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Referring to fig. 7, fig. 7 shows a block diagram of an artificial intelligence processing device according to an embodiment of the present disclosure.
In one possible embodiment, as shown in FIG. 7, the main processor 50 and the artificial intelligence processor 60 may be connected via a common interconnect interface (e.g., an I/O interface) for transferring data and control instructions between the main processor 50 and the artificial intelligence processor 60. The artificial intelligence processor 60 obtains the required input data (including the splice operator) from the main processor 50, writes it into a storage device on the artificial intelligence processing device chip; control instructions can be obtained from the main processor 50 and written into a control cache on the artificial intelligence processing device chip; the data in the memory module of the artificial intelligence processor 60 may also be read and transmitted to other processing devices.
In a possible embodiment, the artificial intelligence processing means may further comprise storage means, the storage means being connected to the artificial intelligence processing means and the other processing means, respectively. The storage device is used for storing data stored in the artificial intelligence processing device and the other processing devices, and is particularly suitable for data which is required to be calculated and cannot be stored in the artificial intelligence processing device or other processing devices.
The combined processing device can serve as an SOC (system on chip) for equipment such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption. In this case, the generic interconnect interface of the combined processing device is connected to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface. Through the above artificial intelligence processing apparatus, the present disclosure can transmit the input data and the splicing operator to the artificial intelligence processor through the main processor; the artificial intelligence processor uses the splicing operator to perform the artificial intelligence operation on the input data to obtain an operation result, and sends the operation result to the main processor.
It should be appreciated that the artificial intelligence processor 60 may be a single processor used for artificial intelligence operations, or a combination of a plurality of different processors. The artificial intelligence processor is applied to artificial intelligence operations, which include machine learning operations, brain-like operations, and the like; machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor 60 may specifically include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field-Programmable Gate Array) chip.
In one possible implementation, the artificial intelligence processor 60 is shown in FIG. 8. Turning to FIG. 8, FIG. 8 illustrates a block diagram of an artificial intelligence processor according to an embodiment of the disclosure.
As shown in fig. 8, the artificial intelligence processor 30 includes a control module 32, an operation module 33 and a storage module 31, wherein the operation module 33 includes a master processing circuit 331 and a plurality of slave processing circuits 332 (the number of slave processing circuits is exemplary in the figure).
The control module 32 is used for acquiring input data and calculating instructions;
the control module 32 is further configured to analyze the calculation instruction to obtain a plurality of operation instructions, and send the plurality of operation instructions and the input data to the main processing circuit 331;
the master processing circuit 331 is configured to perform preamble processing on the input data and transmit data and operation instructions with the plurality of slave processing circuits;
the plurality of slave processing circuits 332 are configured to perform intermediate operations in parallel according to the data and the operation instructions transmitted from the master processing circuit 331 to obtain a plurality of intermediate results, and transmit the plurality of intermediate results to the master processing circuit 331;
the main processing circuit 331 is configured to perform subsequent processing on the plurality of intermediate results to obtain a calculation result of the calculation instruction.
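As a software analogy only (the circuits above are hardware; this NumPy sketch merely illustrates the master/slave division of labor on a matrix-vector product):

    import numpy as np

    def master_slave_matvec(weights, x, n_slaves=4):
        # Master: preamble processing, here splitting the work across slaves.
        rows = np.array_split(weights, n_slaves, axis=0)
        # Slaves: compute intermediate results in parallel (sequential here).
        intermediates = [r @ x for r in rows]
        # Master: subsequent processing, combining the intermediate results.
        return np.concatenate(intermediates)

    w = np.random.rand(8, 5)
    v = np.random.rand(5)
    assert np.allclose(master_slave_matvec(w, v), w @ v)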
After receiving the input data and the calculation instruction, the artificial intelligence processor 30 according to the present disclosure performs a corresponding operation on the input data, thereby obtaining the calculation result.
The artificial intelligence processor described in this disclosure can support machine learning as well as some non-machine learning artificial intelligence algorithms.
The above calculation instruction includes, but is not limited to, a forward operation instruction or a backward training instruction; the present application is not limited to the specific representation of the above calculation instruction.
In one possible embodiment, after the artificial intelligence processor 30 obtains the calculation result, the calculation result may be sent to another processor such as a central processing unit (CPU) or a graphics processing unit (GPU).
The operation instruction is executable code obtained by the artificial intelligence processor 30 according to the splicing operator; the executable code includes, but is not limited to, a forward operation instruction, a backward training instruction, or other neural network operation instructions, and the present disclosure is not limited to the specific expression of the above computation instruction.
In one possible implementation, the input data and the calculation instruction may be obtained by the artificial intelligence processor 30 through a data transmission module 360, and the data transmission module 360 may specifically be one or more data I/O interfaces or I/O pins.
The master processing circuit 331 is configured to perform pre-processing on the data to be operated on to obtain processed operation data, and to exchange at least one of the operation data, intermediate results, and operation instructions with the plurality of slave processing circuits 332.
Referring to FIG. 9, FIG. 9 is a block diagram of the master processing circuit 331 according to an embodiment of the disclosure.
As shown in FIG. 9, the master processing circuit 331 may include one of, or any combination of, a conversion processing circuit 113, an activation processing circuit 111, and an addition processing circuit 112.
The conversion processing circuit 113 is configured to perform the pre-processing on the data; specifically, it may convert data or intermediate results received by the master processing circuit 331 between a first data structure and a second data structure (e.g., converting continuous data to discrete data), or between a first data type and a second data type (e.g., converting fixed-point data to floating-point data).
The activation processing circuit 111 is configured to perform the post-processing, specifically an activation operation on data in the master processing circuit 331;
the addition processing circuit 112 is configured to perform the post-processing, specifically an addition operation or an accumulation operation.
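By way of a non-limiting illustration of the kind of type conversion the conversion processing circuit 113 may perform, the following Python sketch converts between floating-point data and a signed 8-bit fixed-point format; the particular format (including the number of fractional bits) is an assumption made for illustration and is not mandated by this disclosure.

```python
import numpy as np

# Minimal sketch of a fixed-point <-> floating-point conversion. The signed
# 8-bit format with `frac_bits` fractional bits is an illustrative assumption.

def float_to_fixed(x: np.ndarray, frac_bits: int = 4) -> np.ndarray:
    """Quantize floating-point data to signed 8-bit fixed point."""
    scaled = np.round(x * (1 << frac_bits))
    return np.clip(scaled, -128, 127).astype(np.int8)

def fixed_to_float(q: np.ndarray, frac_bits: int = 4) -> np.ndarray:
    """Recover a floating-point approximation from the fixed-point data."""
    return q.astype(np.float32) / (1 << frac_bits)

x = np.array([0.5, -1.25, 3.9], dtype=np.float32)
q = float_to_fixed(x)
print(q, fixed_to_float(q))   # [  8 -20  62] [ 0.5  -1.25  3.875]
```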
Each slave processing circuit 332 is configured to perform an intermediate operation according to the operation data and the operation instruction transmitted by the master processing circuit 331 to obtain an intermediate result, and to transmit the intermediate result to the master processing circuit 331;
the master processing circuit 331 is configured to perform post-processing on the plurality of intermediate results to obtain the final calculation result of the operation instruction.
The control module 32 is further configured to generate a debugging result according to the state information, and to output the debugging result to the state information acquiring apparatus 40.
The storage module 31 is configured to store state information generated while an operation instruction is executed, where the state information includes at least one of state information from the pre-processing of the master processing circuit 331, state information from the intermediate operations of the plurality of slave processing circuits 332, and state information from the post-processing of the master processing circuit 331. The storage module may include an on-chip storage submodule 310, and the on-chip storage submodule 310 may include a scratch pad memory.
The storage module 31 may further include one of, or any combination of, a register and a cache; specifically, the cache is used for storing the calculation instruction, the register is used for storing the neural network model, the data and scalars, and the cache is a scratch pad cache.
In one possible implementation, the control module 32 may include an instruction cache submodule 320, an instruction processing submodule 321 and a storage queue submodule 323;
the instruction cache submodule 320 is used for storing the calculation instructions related to the neural network model;
the instruction processing submodule 321 is configured to parse the calculation instruction to obtain a plurality of operation instructions;
the storage queue submodule 323 is used for storing an instruction queue, where the instruction queue includes a plurality of operation instructions or calculation instructions to be executed in the front-to-back order of the queue.
For example, in one possible implementation, the master processing circuit 331 may also include a control module 32, and this control module 32 may include a master instruction processing submodule, specifically for decoding instructions into microinstructions. Likewise, in one possible embodiment, the slave processing circuit 332 may also include another control module 32 that includes a slave instruction processing submodule, specifically for receiving and processing microinstructions. A microinstruction may be the next-level instruction below an instruction: it may be obtained by splitting or decoding the instruction, and may be further decoded into control signals for the individual components, modules, or processing circuits.
In one alternative, the structure of the calculation instruction may be as shown in Table 1 below.

Table 1

Opcode | Register or immediate | Register/immediate | ...

The ellipsis in the above table indicates that multiple registers or immediate values may be included.
In another alternative, the calculation instruction may include one or more operation domains and an opcode. The calculation instruction may include a neural network operation instruction. Taking the neural network operation instruction as an example, as shown in Table 1, register number 0, register number 1, register number 2, register number 3 and register number 4 may be operation domains, and each of them may be the number of one or more registers. An example is shown in Table 2 below.
Table 2

(Table 2 appears as an image in the original publication; it gives an example of a neural network operation instruction whose operation domains are register numbers 0 to 4.)
The register may be an off-chip memory; in practical applications, it may also be an on-chip memory for storing data. The data may specifically be t-dimensional data, where t is an integer greater than or equal to 1: when t = 1 the data is 1-dimensional, i.e., a vector; when t = 2 the data is 2-dimensional, i.e., a matrix; and when t is 3 or more the data is a multidimensional tensor.
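To make the opcode/operation-domain layout of Table 1 concrete, the following minimal Python sketch models and decodes an instruction of that shape; the field names and the COMPUTE opcode are illustrative assumptions rather than an encoding defined by this disclosure.

```python
from dataclasses import dataclass

# Minimal sketch of a calculation instruction in the opcode + operation-domain
# layout of Table 1. The concrete field names are illustrative assumptions.

@dataclass
class CalcInstruction:
    opcode: str            # e.g. "COMPUTE" for a neural network operation
    operands: list[int]    # register numbers or immediates (operation domains)

def decode(instr: CalcInstruction) -> str:
    """Render the instruction in a human-readable form."""
    regs = ", ".join(f"r{n}" for n in instr.operands)
    return f"{instr.opcode} {regs}"

# A neural network operation instruction with five operation domains
# (register numbers 0-4, as in the Table 2 example).
instr = CalcInstruction("COMPUTE", [0, 1, 2, 3, 4])
print(decode(instr))  # COMPUTE r0, r1, r2, r3, r4
```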
Optionally, the control module 32 may further include:
a dependency relationship processing submodule 322, configured to, when there are multiple operation instructions, determine whether a first operation instruction is associated with a zeroth operation instruction that precedes it; if so, the first operation instruction is cached in the instruction cache submodule, and after the zeroth operation instruction has finished executing, the first operation instruction is extracted from the instruction cache submodule and transmitted to the operation module.
Determining whether the first operation instruction is associated with the zeroth operation instruction that precedes it includes:
extracting, according to the first operation instruction, a first storage address interval of the data (e.g., a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, a zeroth storage address interval of the matrix required by the zeroth operation instruction. If the first storage address interval and the zeroth storage address interval have an overlapping area, the two instructions are associated; if they have no overlapping area, the two instructions are not associated.
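The address-interval check can be sketched as follows; representing each interval as a half-open [start, end) range is an assumption made for illustration.

```python
# Minimal sketch of the dependency check described above: a later instruction
# is held back if its operand address interval overlaps that of an earlier,
# still-executing instruction. Half-open [start, end) intervals are an
# illustrative assumption.

def intervals_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the two storage address intervals share any address."""
    return a[0] < b[1] and b[0] < a[1]

def is_associated(first_interval, zeroth_interval) -> bool:
    """The first instruction depends on the zeroth iff their intervals overlap."""
    return intervals_overlap(first_interval, zeroth_interval)

zeroth = (0x1000, 0x1400)   # addresses accessed by the zeroth instruction
first  = (0x1200, 0x1600)   # addresses needed by the first instruction

if is_associated(first, zeroth):
    print("overlap: cache the first instruction until the zeroth completes")
else:
    print("no overlap: the first instruction may issue immediately")
```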
Referring to FIG. 10, FIG. 10 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the disclosure.
In one possible implementation, the operation module 33, as shown in fig. 10, may include a branch processing circuit 333; the specific connection structure is shown in fig. 10, wherein,
the main processing circuit 331 is connected to the branch processing circuit 333, and the branch processing circuit 333 is connected to the plurality of slave processing circuits 332;
the branch processing circuit 333 is used for forwarding data or instructions between the master processing circuit 331 and the slave processing circuits 332.
In one possible implementation, taking a fully-connected operation in a neural network operation as an example, the process may be y = f(wx + b), where x is an input neuron matrix, w is a weight matrix, b is a bias scalar, and f is an activation function, which may specifically be a sigmoid, tanh, relu or softmax function. Assuming a binary tree structure with 8 slave processing circuits, the implementation method may be:
the control module acquires the input neuron matrix x, the weight matrix w and a fully-connected operation instruction from the storage module 31, and transmits them to the master processing circuit;
the master processing circuit splits the input neuron matrix x into 8 sub-matrices, distributes the 8 sub-matrices to the 8 slave processing circuits through a tree module, and broadcasts the weight matrix w to the 8 slave processing circuits;
the slave processing circuits perform the multiply-accumulate operations of the 8 sub-matrices with the weight matrix w in parallel to obtain 8 intermediate results, and send the 8 intermediate results to the master processing circuit;
the master processing circuit sorts the 8 intermediate results to obtain the result of the wx operation, applies the bias b to that result, performs the activation operation to obtain the final result y, and sends y to the control module, which outputs y or stores y in the storage module 31.
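A minimal numpy sketch of this dataflow, simulating the 8 slave processing circuits in software, is given below; the row-wise split of x and the choice of relu for the activation f are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the fully-connected dataflow described above, simulating
# the 8 slave processing circuits in software. Splitting x by rows is an
# illustrative assumption; the activation f is taken to be relu here.

def fully_connected(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    # Master: split the input neuron matrix x into 8 sub-matrices by rows.
    sub_matrices = np.array_split(x, 8, axis=0)

    # Slaves: each multiplies its sub-matrix with the broadcast weight w
    # (multiply-accumulate), producing one intermediate result each.
    intermediates = [sub @ w for sub in sub_matrices]

    # Master: reassemble (sort) the intermediate results into wx, then
    # apply the bias b and the activation f to obtain y.
    wx = np.vstack(intermediates)
    return np.maximum(wx + b, 0.0)    # relu activation

x = np.random.randn(16, 4)   # input neurons
w = np.random.randn(4, 3)    # weights
y = fully_connected(x, w, b=0.1)
assert np.allclose(y, np.maximum(x @ w + 0.1, 0.0))
```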
The method for executing a neural network forward operation instruction by the neural network operation device shown in FIG. 10 may specifically be as follows.
The control module 32 extracts, from the storage module 31, the operation domain corresponding to the operation instruction (e.g., a neural network forward operation instruction or another neural network operation instruction) and at least one opcode; the control module 32 transmits the operation domain to the data access module and sends the at least one opcode to the operation module.
The control module 32 extracts the weight w and the bias b corresponding to the operation domain from the storage module 31 (when b is 0, the bias b does not need to be extracted), transmits the weight w and the bias b to the master processing circuit of the operation module, extracts the input data Xi from the storage module 31, and transmits the input data Xi to the master processing circuit.
The master processing circuit splits the input data Xi into n data blocks.
The instruction processing submodule 321 of the control module 32 determines a multiplication instruction, a bias instruction and an accumulation instruction according to the at least one opcode, and sends them to the master processing circuit. The master processing circuit broadcasts the multiplication instruction and the weight w to the plurality of slave processing circuits and distributes the n data blocks among them (for example, with n slave processing circuits, each slave processing circuit receives one data block). The plurality of slave processing circuits perform, according to the multiplication instruction, multiplication of the weight w with the received data blocks to obtain intermediate results and send these intermediate results to the master processing circuit; the master processing circuit accumulates the intermediate results according to the accumulation instruction to obtain an accumulation result, applies the bias b to the accumulation result according to the bias instruction to obtain the final result, and sends the final result to the control module.
In addition, the order of the addition and the multiplication may be reversed.
According to this technical scheme, the multiplication and bias operations of the neural network are achieved with a single instruction (the neural network operation instruction), so the intermediate results of the neural network calculation need not be stored or re-fetched. This reduces the storage and retrieval of intermediate data, with the advantages of fewer operation steps and improved neural network computation efficiency.
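The decomposition of the fused instruction into multiply, accumulate and bias steps can be sketched as follows; splitting along the reduction axis (so that the master genuinely accumulates partial products) and the block count are illustrative assumptions.

```python
import numpy as np

# Sketch of one fused forward instruction: the opcode is decomposed into
# multiply, accumulate and bias steps whose intermediate results stay in
# local variables (standing in for on-chip buffers) rather than being
# written back to the storage module. The split axis is an assumption.

def execute_forward_instruction(xi, w, b, n=5):
    x_blocks = np.array_split(xi, n, axis=1)    # split input columns
    w_blocks = np.array_split(w, n, axis=0)     # matching weight rows
    # Slaves: each computes a partial product (multiplication instruction).
    partials = [xb @ wb for xb, wb in zip(x_blocks, w_blocks)]
    # Master: accumulation instruction sums the partials, bias instruction
    # adds b; no intermediate result round-trips through storage.
    return sum(partials) + b

xi = np.random.randn(8, 5)
w = np.random.randn(5, 2)
assert np.allclose(execute_forward_instruction(xi, w, 0.5), xi @ w + 0.5)
```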
Referring to FIG. 11, FIG. 11 shows a schematic diagram of an artificial intelligence processor according to an embodiment of the disclosure.
In one possible implementation, the operation module 33, as shown in fig. 11, may include a master processing circuit 331 and a plurality of slave processing circuits 332.
In one possible embodiment, as shown in FIG. 11, the plurality of slave processing circuits are distributed in an array: each slave processing circuit is connected to its adjacent slave processing circuits, and the master processing circuit is connected to k of the plurality of slave processing circuits. As shown in FIG. 11, the k slave processing circuits comprise only the n slave processing circuits in the 1st row, the n slave processing circuits in the m-th row, and the m slave processing circuits in the 1st column; that is, the k slave processing circuits are the slave processing circuits that are directly connected to the master processing circuit.
The k slave processing circuits are used for forwarding data and instructions between the master processing circuit and the remaining slave processing circuits.
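Under this topology, the following sketch enumerates which positions of an m x n array are directly connected to the master processing circuit; modeling the array as a 1-based coordinate grid is an assumption made for illustration.

```python
# Sketch of the array topology above: in an m x n grid of slave processing
# circuits, the circuits directly connected to the master are those in the
# first row, the last (m-th) row, and the first column.

def directly_connected(m: int, n: int) -> set[tuple[int, int]]:
    """Return 1-based (row, col) positions of the k directly connected slaves."""
    return {(r, c) for r in range(1, m + 1) for c in range(1, n + 1)
            if r == 1 or r == m or c == 1}

k_slaves = directly_connected(m=4, n=5)
# Rows 1 and 4 contribute 5 circuits each, column 1 contributes 4, and the
# two shared corners are counted once: k = 5 + 5 + 4 - 2 = 12.
print(len(k_slaves))  # 12
```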
In some embodiments, a chip is also claimed, which includes the artificial intelligence processing apparatus.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure.
Referring to FIG. 12, FIG. 12 shows a board card according to an embodiment of the present disclosure. Besides the chip 389, the board card may include other supporting components, including but not limited to a memory device 390, an interface device 391 and a control device 392.
the memory device 390 is connected to the chip in the chip package structure through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are adopted in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling the data transmission and data storage of each storage unit.
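For context, the 25600 MB/s figure follows from the DDR4-3200 parameters: 3200 MT/s x 64 data bits per transfer / 8 bits per byte = 25600 MB/s per 64-bit channel.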
The interface device is electrically connected to the chip in the chip package structure and is used for realizing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, and the data to be processed is transmitted from the server to the chip through the standard PCIE interface to implement the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the concrete form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g., the server) by the interface device.
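The 16000 MB/s figure is consistent with the PCIE 3.0 x16 link parameters: 16 lanes x 8 GT/s per lane x 128/130 encoding / 8 bits per byte ≈ 15754 MB/s, commonly rounded to 16000 MB/s of theoretical bandwidth.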
The control device is electrically connected to the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single chip microcomputer (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads; therefore, the chip can be in different working states such as multi-load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a webcam, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of action combinations, but those skilled in the art will recognize that the present application is not limited by the order of the actions described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in this specification are exemplary embodiments, and that the actions and modules involved are not necessarily required by this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some interfaces, and may be in an electrical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or some of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable memory, which may include a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Having described embodiments of the present disclosure, the foregoing description is exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (22)

1. An operation method, the method being applied to a main processor of an artificial intelligence processing apparatus, the artificial intelligence processing apparatus further comprising an artificial intelligence processor connected to the main processor, the method comprising:
acquiring a basic operator from an artificial intelligence operator library at the operator library layer of a software calling hierarchy, wherein the basic operator is used for executing a corresponding artificial intelligence operation on input data;
forming a splicing operator using the basic operator, the splicing operator comprising executable code;
and directly calling the splicing operator through the application program layer of the software calling hierarchy, so that the artificial intelligence processor at the chip layer of the software calling hierarchy executes the corresponding splicing operation on input data to execute the artificial intelligence operation.
2. The method of claim 1, wherein the basic operator comprises a first deformation operator, a second deformation operator, and a normalization index operator; the first deformation operator and the second deformation operator are used for performing type conversion processing on input data, and the normalization index operator is used for performing a normalization operation, wherein
the forming a splicing operator by using the basic operator comprises:
taking the first deformation operator as a preceding-stage operator of the normalization index operator;
and taking the second deformation operator as a post-stage operator of the normalization index operator.
3. The method of claim 2, wherein the splicing operation comprises:
when the dimensionality of first input data of a first type is greater than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type along the second dimension by using the normalization index operator to obtain output data of the second type;
and converting the output data of the second type into output data of the first type by using the second deformation operator.
4. The method of claim 1, wherein the basic operator comprises a scaling operator for scaling the input data and a normalization operator for normalizing the input data, wherein
the forming a splicing operator by using the basic operator comprises:
taking the normalization operator as a preceding-stage operator of the scaling operator.
5. The method of claim 4, wherein the splicing operation comprises:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator to obtain a scaled normalization result.
6. The method of claim 1, wherein the basic operators comprise a square operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for performing square-root and reciprocal operations on the input data, and a multiplication operator for multiplying the input data, wherein
the forming a splicing operator by using the basic operators comprises:
splicing the square operator, the convolution operator, the square root reciprocal operator and the multiplication operator in sequence to form the splicing operator.
7. The method of claim 6, wherein the splicing operation comprises:
performing a square operation on the input data by using the square operator to obtain square operation results;
performing a summation operation on a plurality of the square operation results by using the convolution operator to obtain a summation result;
performing, in sequence, a square-root operation and a reciprocal operation on the summation result by using the square root reciprocal operator to obtain a reciprocal result;
and performing a multiplication operation on the input data and the reciprocal result by using the multiplication operator to obtain a normalization result.
8. An operation device applied to a main processor of an artificial intelligence processing apparatus, the artificial intelligence processing apparatus further comprising an artificial intelligence processor connected to the main processor, the device comprising:
an acquisition module for acquiring a basic operator from an artificial intelligence operator library at the operator library layer of a software calling hierarchy, wherein the basic operator is used for executing a corresponding artificial intelligence operation on input data;
and an operation module, connected to the acquisition module, for forming a splicing operator by using the basic operator, the splicing operator comprising executable code, wherein the splicing operator is directly called through the application program layer of the software calling hierarchy so that the artificial intelligence processor at the chip layer of the software calling hierarchy executes the corresponding splicing operation on input data to execute the artificial intelligence operation.
9. The apparatus of claim 8, wherein the basic operator comprises a first deformation operator, a second deformation operator, and a normalization index operator; the first deformation operator and the second deformation operator are used for performing type conversion processing on input data, and the normalization index operator is used for performing a normalization operation, wherein
the operation module comprises a first operation submodule configured to:
take the first deformation operator as a preceding-stage operator of the normalization index operator;
and take the second deformation operator as a post-stage operator of the normalization index operator.
10. The apparatus of claim 9, wherein the splicing operation comprises:
when the dimensionality of first input data of a first type is greater than 2 and a first parameter and a second parameter carried in the first input data meet a preset condition, converting the first input data into a second type by using the first deformation operator, wherein the dimensionality of the first input data of the second type is 2;
performing a normalization operation on the first input data of the second type along the second dimension by using the normalization index operator to obtain output data of the second type;
and converting the output data of the second type into output data of the first type by using the second deformation operator.
11. The apparatus of claim 8, wherein the basic operator comprises a scaling operator for scaling the input data and a normalization operator for normalizing the input data, wherein
the operation module comprises a second operation submodule configured to:
take the normalization operator as a preceding-stage operator of the scaling operator.
12. The apparatus of claim 11, wherein the splicing operation comprises:
normalizing the input data by using the normalization operator to obtain a normalization result;
and scaling the normalization result by using the scaling operator to obtain a scaled normalization result.
13. The apparatus of claim 8, wherein the basic operators comprise a square operator for squaring the input data, a convolution operator for summing the input data, a square root reciprocal operator for performing square-root and reciprocal operations on the input data, and a multiplication operator for multiplying the input data, wherein
the operation module comprises a third operation submodule configured to:
splice the square operator, the convolution operator, the square root reciprocal operator and the multiplication operator in sequence to form the splicing operator.
14. The apparatus of claim 13, wherein the splicing operation comprises:
performing a square operation on the input data by using the square operator to obtain square operation results;
performing a summation operation on a plurality of the square operation results by using the convolution operator to obtain a summation result;
performing, in sequence, a square-root operation and a reciprocal operation on the summation result by using the square root reciprocal operator to obtain a reciprocal result;
and performing a multiplication operation on the input data and the reciprocal result by using the multiplication operator to obtain a normalization result.
15. An artificial intelligence processing apparatus, the apparatus comprising:
a main processor for executing the method according to any one of claims 1 to 7 to obtain a splicing operator for performing a corresponding arithmetic operation on input data; and
an artificial intelligence processor electrically connected to the main processor;
wherein the main processor is further configured to send input data and the splicing operator to the artificial intelligence processor, and the artificial intelligence processor is configured to:
receive the input data and the splicing operator sent by the main processor;
perform an artificial intelligence operation on the input data by using the splicing operator to obtain an operation result;
and send the operation result to the main processor.
16. The apparatus of claim 15, wherein the main processor further comprises a main processor storage space for storing the splicing operator, wherein
the main processor is further used for providing the input data and the splicing operator stored in the main processor storage space.
17. The apparatus of claim 15, wherein the artificial intelligence processor passes the operation result to the main processor via an I/O interface;
when the apparatus comprises a plurality of artificial intelligence processors, the plurality of artificial intelligence processors may be connected through a specific structure and transmit data between each other;
the plurality of artificial intelligence processors are interconnected through a peripheral component interconnect express (PCIE) bus and transmit data so as to support larger-scale artificial intelligence operations; the plurality of artificial intelligence processors share the same control system or have their own control systems; they share a memory or have their own memories; and the interconnection manner of the plurality of artificial intelligence processors may be any interconnection topology.
18. The apparatus of claim 15, further comprising: a storage device connected to the artificial intelligence processor and the main processor respectively, and used for storing data of the artificial intelligence processor and of the main processor.
19. An artificial intelligence chip, wherein the artificial intelligence chip comprises an artificial intelligence processing apparatus according to any one of claims 15-18.
20. An electronic device, characterized in that it comprises a chip according to claim 19.
21. A board card, wherein the board card comprises: a memory device, an interface device, a control device, and the artificial intelligence chip according to claim 19;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
22. The board card of claim 21, wherein
the memory device comprises a plurality of groups of storage units, each group of storage units is connected to the chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller for controlling the data transmission and the data storage of each storage unit;
and the interface device is a standard PCIE interface.
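As a concrete, non-limiting illustration of the splicing operation recited in claims 6 and 7, the following numpy sketch chains the four basic operators into a normalization; summing over the whole feature axis in place of a hardware convolution window is an assumption made for illustration.

```python
import numpy as np

# Sketch of the splicing operation of claims 6-7: square -> convolution
# (summation) -> square root reciprocal -> multiplication, composed into a
# single normalization. Summing over the whole feature axis (rather than a
# hardware convolution window) is an illustrative assumption.

def square_op(x):       return x * x
def conv_sum_op(s):     return np.sum(s, axis=-1, keepdims=True)
def rsqrt_op(t):        return 1.0 / np.sqrt(t)
def mul_op(x, r):       return x * r

def spliced_normalize(x: np.ndarray) -> np.ndarray:
    """The splicing operator: the four basic operators applied in sequence."""
    squares  = square_op(x)            # square operation results
    total    = conv_sum_op(squares)    # summation result
    inv_norm = rsqrt_op(total)         # reciprocal result
    return mul_op(x, inv_norm)         # normalization result

x = np.array([[3.0, 4.0]])
print(spliced_normalize(x))   # [[0.6 0.8]]  (unit L2 norm per row)
```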
CN201811534505.0A 2018-12-14 2018-12-14 Operation method, device and related product Active CN109740729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811534505.0A CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811534505.0A CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Publications (2)

Publication Number Publication Date
CN109740729A CN109740729A (en) 2019-05-10
CN109740729B true CN109740729B (en) 2020-12-22

Family

ID=66359539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811534505.0A Active CN109740729B (en) 2018-12-14 2018-12-14 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN109740729B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114625035A (en) * 2020-12-14 2022-06-14 北京晶视智能科技有限公司 Hybrid precision artificial intelligence processor and method of operation thereof
CN115481713A (en) * 2021-06-15 2022-12-16 瑞昱半导体股份有限公司 Method for improving convolution neural network to calculate

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909041A (en) * 2017-11-21 2018-04-13 清华大学 A kind of video frequency identifying method based on space-time pyramid network
CN108647828A (en) * 2018-05-15 2018-10-12 中山大学 A kind of Prediction of Stock Index method of combination news corpus and stock market's transaction data
CN108764005A (en) * 2018-01-31 2018-11-06 华侨大学 A kind of high-spectrum remote sensing atural object space Spectral Characteristic extracting method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8948587B2 (en) * 2012-06-27 2015-02-03 Centurylink Intellectual Property Llc Use of dying gasp to locate faults in communications networks
CN107621932B (en) * 2017-09-25 2020-05-12 威创集团股份有限公司 Local amplification method and device for display image
CN107967135B (en) * 2017-10-31 2020-11-13 平安科技(深圳)有限公司 Calculation engine implementation method, electronic device and storage medium
CN108446330B (en) * 2018-02-13 2022-05-13 北京明略昭辉科技有限公司 Promotion object processing method and device and computer-readable storage medium
CN108872091A (en) * 2018-03-20 2018-11-23 浙江理工大学 A kind of detection method of the vegetable pesticide residue concentration based on high light spectrum image-forming
CN108664894A (en) * 2018-04-10 2018-10-16 天津大学 The human action radar image sorting technique of neural network is fought based on depth convolution

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909041A (en) * 2017-11-21 2018-04-13 清华大学 A kind of video frequency identifying method based on space-time pyramid network
CN108764005A (en) * 2018-01-31 2018-11-06 华侨大学 A kind of high-spectrum remote sensing atural object space Spectral Characteristic extracting method and system
CN108647828A (en) * 2018-05-15 2018-10-12 中山大学 A kind of Prediction of Stock Index method of combination news corpus and stock market's transaction data

Also Published As

Publication number Publication date
CN109740729A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
CN109522052B (en) Computing device and board card
CN109657782B (en) Operation method, device and related product
CN109740739B (en) Neural network computing device, neural network computing method and related products
CN109685201B (en) Operation method, device and related product
CN110163362B (en) Computing device and method
CN109740754B (en) Neural network computing device, neural network computing method and related products
TWI795519B (en) Computing apparatus, machine learning computing apparatus, combined processing device, neural network chip, electronic device, board, and method for performing machine learning calculation
CN109726822B (en) Operation method, device and related product
CN111047022A (en) Computing device and related product
US11775808B2 (en) Neural network computation device and method
CN111353591A (en) Computing device and related product
US20200242468A1 (en) Neural network computation device, neural network computation method and related products
CN109740729B (en) Operation method, device and related product
CN109711540B (en) Computing device and board card
CN110059797B (en) Computing device and related product
CN109711538B (en) Operation method, device and related product
CN110059809B (en) Computing device and related product
CN109740730B (en) Operation method, device and related product
CN111368967A (en) Neural network computing device and method
CN111368987B (en) Neural network computing device and method
CN111368986B (en) Neural network computing device and method
CN111368990B (en) Neural network computing device and method
CN111382848A (en) Computing device and related product
CN111367567A (en) Neural network computing device and method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

CB02 Change of applicant information
TA01 Transfer of patent application right

Effective date of registration: 20201126

Address after: Room 611-194, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, hi tech Zone, Hefei City, Anhui Province

Applicant after: Anhui Cambrian Information Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Zhongke Cambrian Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant