CN110851787B - Merging instruction processing method and device, electronic equipment and storage medium

Info

Publication number: CN110851787B
Application number: CN202010034696.5A
Authority: CN
Inventors: Inventor not disclosed
Original assignee: Cambricon Technologies Corp Ltd
Current assignee: Cambricon Technologies Corp Ltd
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2020-05-08
Anticipated expiration: 2040-01-14
Also published as: CN110851787A

Abstract

The disclosure relates to a merge instruction processing method and apparatus, an electronic device, and a storage medium. In the apparatus, an instruction analysis circuit parses an obtained merge instruction to obtain its operation code and operation domain; a data acquisition circuit acquires first data and second data based on the operation code and the operation domain and caches the first data and/or the second data in a storage circuit; a parameter determining circuit determines a merging parameter required for the merging processing; and a processing circuit reads the cached first data and/or second data from the storage circuit, merges the first data and the second data by using an arithmetic unit according to the merging parameter to obtain merged data, and stores the merged data. The merge instruction comprises at least one of a different-dimension merge instruction and a same-dimension merge instruction.

Description

Merging instruction processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a merge instruction processing method and apparatus, an electronic device, and a storage medium.

Background

Merging data generally refers to combining two pieces of data to obtain new data, for example merging two vectors, or merging a vector and a matrix. In the related art, data merging can be performed by a processor such as a CPU, but the merging process is complex, only a single type of data can be merged, and a large cache is occupied. Moreover, when large-scale merging operations are performed, data must be moved on and off chip continuously, so off-chip bandwidth becomes a main performance bottleneck.

Disclosure of Invention

In view of the above, it is necessary to provide a merge instruction processing method, an apparatus, an electronic device, and a storage medium that can solve the above technical problems.

According to an aspect of the present disclosure, there is provided a merge instruction processing apparatus including:

the instruction analysis circuit is used for analyzing the obtained merging instruction to obtain an operation code and an operation domain of the merging instruction;

the data acquisition circuit is used for acquiring first data and second data required by merging processing based on the operation code and the operation domain and sending the first data and/or the second data to the storage circuit;

a storage circuit for caching the first data and/or the second data;

a parameter determination circuit for determining a merging parameter required for merging processing;

a processing circuit, configured to read the cached first data and/or the cached second data from the storage circuit, merge the first data and the second data by using an arithmetic unit according to the merge parameter to obtain merged data, and store the merged data,

wherein the merge instruction comprises at least one of a different-dimensional merge instruction and a same-dimensional merge instruction,

when the merging instruction is a different-dimension merging instruction, the dimension number of the first data is smaller than that of the second data,

when the merging instruction is a same-dimension merging instruction, the dimension number of the first data is equal to the dimension number of the second data,

the operation code is used for indicating that the processing of the merging instruction on the data is merging processing, and the operation domain comprises the first data address and the second data address.

According to another aspect of the present disclosure, there is provided a merge instruction processing method, the method including:

analyzing the obtained merging instruction to obtain an operation code and an operation domain of the merging instruction;

acquiring first data and second data required for merging processing based on the operation code and the operation domain;

caching the first data and/or the second data;

determining a merging parameter required for merging processing;

reading the cached first data and/or the cached second data, merging the first data and the second data by using an arithmetic unit according to the merging parameter to obtain merged data, and storing the merged data,

wherein the operation code is used for indicating that the processing performed on the data by the merging instruction is merging processing, and the operation domain comprises a first data address and a second data address.

According to another aspect of the present disclosure, an artificial intelligence chip is provided, which includes the merge instruction processing apparatus.

According to another aspect of the present disclosure, an electronic device is provided, which includes the artificial intelligence chip.

According to another aspect of the present disclosure, a board card is provided, which includes: a storage device, an interface device, a control device, and the above-mentioned artificial intelligence chip;

wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;

the storage device is used for storing data;

the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;

and the control device is used for monitoring the state of the artificial intelligence chip.

According to another aspect of the present disclosure, there is provided an electronic device including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to call the instructions stored in the memory to perform the merge instruction processing method described above.

According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described merge instruction processing method.

The merging instruction processing method, the merging instruction processing device, the electronic device and the storage medium provided by the embodiment of the disclosure can realize merging of data with different or same dimensionality quantities through one merging instruction, are high in merging speed and small in occupied cache, can more flexibly and effectively support merging of data with different sizes when large-scale merging operation is carried out, and are simple in format and convenient to use.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a block diagram of a merge instruction processing apparatus according to an embodiment of the present disclosure.

Fig. 2 is a schematic diagram illustrating a merge instruction processing apparatus performing data merge based on a different-dimension merge instruction according to an embodiment of the present disclosure.

Figs. 3a-3c are schematic diagrams illustrating a merge instruction processing apparatus according to an embodiment of the present disclosure performing vector and matrix merging.

Figs. 4 and 5 are schematic diagrams illustrating a merge instruction processing apparatus according to an embodiment of the present disclosure performing data merging based on a same-dimension merge instruction.

Fig. 6 shows a flow diagram of a merge instruction processing method according to an embodiment of the disclosure.

Fig. 7 shows a block diagram of a board card according to an embodiment of the present disclosure.

Fig. 8 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure.

Fig. 9 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

It should be understood that the terms "zero," "first," "second," "third," and "fourth," etc. in the claims, the description, and the drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

The merge instruction processing method and apparatus according to the embodiments of the present disclosure may be applied to a processor, which may be a general-purpose processor, such as a Central Processing Unit (CPU), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like. The machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field Programmable Gate Array) chip. The present disclosure is not limited to a particular type of processor.

In the related art, when data are merged by using processors such as a CPU and the amount of data is large, the processor has to complete the merging serially, and the efficiency is low. In addition, the data to be merged must be fetched continuously from outside the chip during merging, which occupies a large bandwidth and further limits the speed and efficiency of data merging. Furthermore, the instruction format for merging data in the related art is complex and inconvenient to use, the size (length, width, etc.) of the data is limited, and merging between data of different sizes is difficult to realize. In order to solve the above problems, the present disclosure provides a merge instruction processing method, an apparatus, an electronic device, and a storage medium, which can merge data with different or the same number of dimensions through one merge instruction, and which offer high merging speed, low cache occupation, more flexible and effective support for merging data of different sizes during large-scale merge operations, a simplified merge instruction format, and convenient use.

In many application scenarios, the different-dimension merging instruction provided by the present disclosure may be used for data merging. For example, for the processing of the data itself, if part or all of the data needs to be adjusted or modified while or after the data is stored, the first data may be determined according to the required adjustment, and the first data is then merged with the second data to be adjusted to obtain the adjusted data (i.e., the merged data). For example, assuming that the data in the second row of a certain matrix needs to be changed to designated data, the designated data may be determined as the first data, and the first data is then merged into the second row of the matrix to obtain a modified matrix (i.e., the merged data), in which each element of the second row is consistent with the corresponding designated data. Alternatively, if the size of the data needs to be increased while or after the data is stored, the first data may be determined according to the designated data to be added and the second data, and the first data and the second data are then merged to obtain merged data; the merged data is the enlarged data, and the newly added portion holds the designated data. In operations such as neural network operations, data with different numbers of dimensions may also need to be merged because of practical requirements such as task allocation, processing requirements, and algorithm operation requirements. For example, an algorithm may require two pieces of data with different numbers of dimensions to be merged to obtain merged data, on which subsequent operation processing is then performed.
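As a rough illustration of the two matrix cases described above (overwriting one row with designated data, and enlarging a matrix with designated data), the following Python sketch merges a vector into a matrix. The function names `merge_overwrite_row` and `merge_append_row` and the list-of-lists representation are assumptions made purely for illustration; the disclosure does not prescribe a software interface.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def merge_overwrite_row(first: Vector, second: Matrix, row: int) -> Matrix:
    """Return a copy of `second` whose row `row` is replaced by `first`."""
    if len(first) != len(second[row]):
        raise ValueError("vector length must match the matrix row length")
    merged = [list(r) for r in second]  # copy so the input is not mutated
    merged[row] = list(first)
    return merged


def merge_append_row(first: Vector, second: Matrix) -> Matrix:
    """Return a copy of `second` enlarged by one row that holds `first`."""
    if second and len(first) != len(second[0]):
        raise ValueError("vector length must match the matrix row length")
    return [list(r) for r in second] + [list(first)]


matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
designated = [0, 0, 0]

modified = merge_overwrite_row(designated, matrix, row=1)  # index 1 is the second row
enlarged = merge_append_row(designated, matrix)
```

In both cases the first data (a vector) has fewer dimensions than the second data (a matrix), which is the situation the different-dimension merging instruction addresses.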

In many application scenarios, the same-dimension merging instruction provided by the present disclosure may be used for data merging. For example, in image processing, data in a partial region of an image may need to be merged or blurred according to processing requirements. In this case, the data in the region of the image that needs to be merged or blurred may be used as the second data, data capable of achieving the merging or blurring of the region may be used as the first data, and the first data and the second data (which have the same number of dimensions) are then merged to obtain merged data, that is, the merged or blurred image. In operations such as neural network operations, intermediate data with the same number of dimensions may be merged to obtain the required data because of practical requirements such as task allocation, processing requirements, and algorithm operation requirements. For example, the initial data is divided into two pieces of sub-data according to the task allocation requirement, corresponding operations are performed on each piece of sub-data to obtain sub-operation results, and the two sub-operation results are merged as the first data and the second data, respectively; the merged data obtained is the final operation result for the initial data. In a residual structure operation in a neural network, because of the operation requirement of the algorithm, a corresponding operation is first performed on an initial matrix (taking a matrix as an example) to obtain an operated matrix, and the initial matrix and the operated matrix are then merged (for example, by adding aligned elements) to obtain the final result of the residual operation based on the initial matrix. For the processing of the data itself, when part or all of the data needs to be adjusted or modified, the first data may be determined according to the required adjustment, and the first data is then merged with the second data to be adjusted to obtain the adjusted data.
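The residual-structure case can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the merge is taken to be addition of aligned elements, the "corresponding operation" is replaced by a placeholder (doubling every element), and the names `merge_elementwise_add` and `residual` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def merge_elementwise_add(first: Matrix, second: Matrix) -> Matrix:
    """Merge two same-shaped matrices by adding aligned elements."""
    if len(first) != len(second) or any(
        len(a) != len(b) for a, b in zip(first, second)
    ):
        raise ValueError("this same-dimension merge requires equal shapes")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(first, second)]


def residual(initial: Matrix) -> Matrix:
    """Operate on the input, then merge the result with the input itself."""
    operated = [[2 * x for x in row] for row in initial]  # placeholder operation
    return merge_elementwise_add(initial, operated)


result = residual([[1, 2], [3, 4]])
```

Here both operands are matrices, so the number of dimensions of the first data equals that of the second data.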

In some parallel processing scenarios, the initial data may be divided into a plurality of sub-data, each sub-data is merged with a different number of data corresponding to another dimension by using a different-dimension merging instruction,

It can be understood that the application scenarios of the merging instruction processing method, apparatus, electronic device and storage medium provided in the present disclosure are quite wide, and the merging instruction may be executed as a single instruction, or may be used together with various other instructions as a component of an operation method that includes data merging processing.

Fig. 1 shows a block diagram of a merge instruction processing apparatus according to an embodiment of the present disclosure. The apparatus may be applied to a processor and, as shown in Fig. 1, includes an instruction parsing circuit 11, a data acquisition circuit 12, a storage circuit 13, a parameter determination circuit 14, and a processing circuit 15.

The instruction parsing circuit 11 is configured to analyze the obtained merge instruction to obtain an operation code and an operation domain of the merge instruction.

The data acquisition circuit 12 is configured to obtain, based on the operation code and the operation domain, first data and second data required for performing merging processing, and send the first data and/or the second data to the storage circuit.

The storage circuit 13 is configured to cache the received first data and/or second data.

The parameter determination circuit 14 is configured to determine a merging parameter required for the merging processing.

The processing circuit 15 is configured to read the cached first data and/or the cached second data from the storage circuit, merge the first data and the second data by using an arithmetic unit according to the merging parameter to obtain merged data, and store the merged data.

In this embodiment, the number of dimensions of the merged data may be the same as or different from the number of dimensions of the second data. When the number of dimensions of the merged data is different from the number of dimensions of the second data, the number of dimensions of the merged data may be equal to the sum of the number of dimensions of the first data and the number of dimensions of the second data, for example, if the first data is a vector and the second data is a matrix, the merged data may be a three-dimensional tensor; the number of dimensions of the merged data may be greater than the sum of the number of dimensions of the first data and the number of dimensions of the second data, for example, if the first data is a vector and the second data is a matrix, the merged data may be a four-dimensional tensor. When the dimensionality of the merged data is the same as that of the second data, and the first data is merged into the second data, the numerical values of part or all of the second data are changed to obtain the final merged data. The dimensionality number of the merged data may be preset or defined in the merge instruction, and the disclosure does not limit this.

In this embodiment, the merge instruction obtained by the instruction parsing circuit may be a compiled instruction that can be directly executed by hardware, or may be an uncompiled software instruction that cannot be directly executed by hardware. If the instruction analysis circuit acquires the uncompiled merging instruction, the uncompiled merging instruction needs to be compiled, and then the compiled merging instruction is analyzed to acquire the operation code and the operation domain of the merging instruction. The instruction parsing circuit may obtain the instructions and data through a data input output unit, which may be one or more data I/O interfaces or I/O pins.

In this embodiment, the operation code may be the part of an instruction, or a field specified in a computer program (usually indicated by a code), that specifies the operation to be performed; it is an instruction sequence number used to inform the device executing the instruction which instruction specifically needs to be executed. The operation domain may indicate the source of the data and parameters required to execute the corresponding instruction. It should be understood that the instruction format of the merge instruction, and the operation code and operation domain it contains, may be set as desired by those skilled in the art, and the disclosure is not limited thereto.
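To make the split between operation code and operation domain concrete, the sketch below parses a purely hypothetical textual instruction. The mnemonics `MERGE_DIFF_DIM` and `MERGE_SAME_DIM`, the whitespace-separated layout, and the `ParsedInstruction` type are all assumptions for illustration; the disclosure leaves the instruction format to the implementer.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical opcode mnemonics; the disclosure does not define an encoding.
KNOWN_OPCODES = {"MERGE_DIFF_DIM", "MERGE_SAME_DIM"}


@dataclass(frozen=True)
class ParsedInstruction:
    opcode: str                # which operation the instruction requests
    operands: Tuple[int, ...]  # operation domain: data addresses, parameters


def parse(instruction: str) -> ParsedInstruction:
    """Split a textual instruction into its opcode and operation domain."""
    fields = instruction.split()
    if not fields or fields[0] not in KNOWN_OPCODES:
        raise ValueError(f"unknown or missing opcode in {instruction!r}")
    if len(fields) < 3:
        raise ValueError("expected at least a first and a second data address")
    # int(x, 0) accepts decimal as well as 0x-prefixed hexadecimal literals.
    return ParsedInstruction(fields[0], tuple(int(x, 0) for x in fields[1:]))


parsed = parse("MERGE_DIFF_DIM 0x1000 0x2000")
first_data_address, second_data_address = parsed.operands[:2]
```

Any further fields after the two addresses would, in this sketch, be treated as additional entries of the operation domain, such as merge parameters.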

In this embodiment, the parameter determining circuit may cache the determined merging parameter in the storage circuit, or may send the merging parameter directly to the processing circuit. The instruction parsing circuit may likewise cache the operation instructions it determines (described below) and the operation code and operation domain obtained by parsing in the storage circuit, or may send the operation instructions directly to the processing circuit and send the operation code and the operation domain to the data obtaining circuit and the parameter determining circuit.

In this embodiment, the merge parameter may be determined according to the operation code and/or the operation domain of the merge instruction, or may be determined according to a default setting, which is not limited in this disclosure. The first data address and the second data address may indicate where the corresponding data is stored; for example, each may be a physical address, a starting address, a pointer, or the like of the stored data. The size of the merged data may be the same as or different from that of the second data. The sizes are different when the two pieces of data have at least one dimension with a different data length; for example, if the second data is a matrix and the merged data is also a matrix, and the row length and/or column length of the merged data differs from the corresponding row length and/or column length of the second data, then the size of the merged data is different from the size of the second data.

In this embodiment, the first data and/or the second data are cached in advance by using the storage circuit, so that the processing circuit can directly read the data from the cache without acquiring the data from the off-chip memory, thereby saving the time for acquiring the data and improving the efficiency for acquiring the data.
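The benefit of caching can be illustrated with a small software model. In this sketch, assumed for illustration only, `OffChipMemory` stands in for slow off-chip memory and counts its reads, while `StorageCircuit` stands in for the on-chip cache; neither class reflects the actual hardware design.

```python
from typing import Dict, List


class OffChipMemory:
    """Stand-in for slow off-chip memory that counts how often it is read."""

    def __init__(self, contents: Dict[int, List[float]]) -> None:
        self._contents = contents
        self.reads = 0

    def load(self, address: int) -> List[float]:
        self.reads += 1
        return list(self._contents[address])


class StorageCircuit:
    """Stand-in for the on-chip cache holding the first and second data."""

    def __init__(self, backing: OffChipMemory) -> None:
        self._backing = backing
        self._cache: Dict[int, List[float]] = {}

    def prefetch(self, address: int) -> None:
        if address not in self._cache:
            self._cache[address] = self._backing.load(address)

    def read(self, address: int) -> List[float]:
        self.prefetch(address)  # goes off-chip only on a cache miss
        return self._cache[address]


memory = OffChipMemory({0x10: [1.0, 2.0], 0x20: [3.0, 4.0]})
storage = StorageCircuit(memory)
storage.prefetch(0x10)  # data acquisition: cache the first data
storage.prefetch(0x20)  # data acquisition: cache the second data
for _ in range(5):      # repeated reads by the processing circuit
    storage.read(0x10)
    storage.read(0x20)
```

After the two prefetches, every later read is served from the cache, so the off-chip read counter stays at two regardless of how often the processing circuit reads the data.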

According to the merging instruction processing device provided by the embodiment of the disclosure, merging between data can be realized through one merging instruction, the merging speed is high, the occupied cache is less, merging of data with different sizes can be supported more flexibly and effectively during large-scale merging operation, the format of the merging instruction is simplified, and the merging instruction is convenient to use.

In this embodiment, the apparatus may implement the data merging process by using sub-circuits in the processing circuit. Taking a processing circuit that includes a main processing sub-circuit and a plurality of slave processing sub-circuits as an example, several ways of implementing data merging with the main processing sub-circuit and the slave processing sub-circuits are described below as manner one, manner two, and manner three. The implementation manner of the merging process can be set by those skilled in the art according to actual needs, and the present disclosure does not limit this.

In one possible implementation, the processing circuit may include a master processing sub-circuit and a plurality of slave processing sub-circuits, in a first manner:

the instruction analysis circuit is further configured to analyze the merge instruction to obtain a plurality of operation instructions and send the operation instructions to the main processing sub-circuit, the data acquisition circuit is further configured to send the first data and the second data to the main processing sub-circuit, and the parameter determination circuit is further configured to send the merge parameter to the main processing sub-circuit;

the main processing sub-circuit is used for carrying out preceding processing on the first data and the second data and carrying out transmission of data and/or operation instructions with the plurality of slave processing sub-circuits;

the slave processing sub-circuit is used for carrying out merging processing according to the operation instruction, the corresponding data and the merging parameter to obtain an intermediate merging result and sending the intermediate merging result to the main processing sub-circuit;

the main processing sub-circuit is further configured to perform subsequent processing on the received multiple intermediate merged results to obtain merged data, and store the merged data.

In this implementation, the slave processing sub-circuit actively acquires or passively receives, from the master processing sub-circuit, all of the operation instructions, the corresponding data (the first data and the second data), and the merge parameters required for the merge processing. If each slave processing sub-circuit performs a merging process of a first data and a second data, the slave processing sub-circuit may directly store the resulting merged data. If each slave processing sub-circuit performs a partial operation of "merging processing of one first data and one second data", the slave processing sub-circuit needs to use data obtained by merging processing only as an intermediate merging result, and the master processing sub-circuit obtains merged data of "merging processing of one first data and one second data" according to a plurality of intermediate merging results and then stores the merged data.
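The division of labour in manner one can be sketched in software as follows. This is a minimal model under stated assumptions: worker threads stand in for the slave processing sub-circuits, the merge is assumed to be addition of aligned elements of two same-shaped matrices, and the work is split by rows. The names `slave_merge` and `master_merge` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Matrix = List[List[float]]


def slave_merge(task: Tuple[Matrix, Matrix]) -> Matrix:
    """Slave: merge one block of rows, producing an intermediate result."""
    first_block, second_block = task
    return [[a + b for a, b in zip(ra, rb)]
            for ra, rb in zip(first_block, second_block)]


def master_merge(first: Matrix, second: Matrix, num_slaves: int) -> Matrix:
    """Master: split the work, dispatch it, then assemble the merged data."""
    # Preceding processing: cut both operands into matching row blocks.
    step = max(1, -(-len(second) // num_slaves))  # ceiling division
    tasks = [(first[i:i + step], second[i:i + step])
             for i in range(0, len(second), step)]
    # Slaves work in parallel; map() returns results in task order.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediates = list(pool.map(slave_merge, tasks))
    # Subsequent processing: concatenate the intermediate merging results.
    return [row for block in intermediates for row in block]


merged = master_merge([[1, 1], [2, 2], [3, 3]],
                      [[10, 20], [30, 40], [50, 60]], num_slaves=2)
```

With two slaves and three rows, the first slave handles two rows and the second handles one; the master only partitions the inputs and reassembles the intermediate results.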

In one possible implementation, the processing circuit may include a master processing sub-circuit and a plurality of slave processing sub-circuits, in a second manner:

the main processing sub-circuit is used for merging the first data and the second data according to the merging parameters to obtain merged data, and storing the merged data.

In one possible implementation, the processing circuit may include a master processing sub-circuit and a plurality of slave processing sub-circuits, in a third manner:

the instruction analysis circuit is further configured to analyze the merge instruction to obtain a plurality of operation instructions and send the plurality of operation instructions to the main processing sub-circuit, and the parameter determination circuit is further configured to send the merge parameter to the main processing sub-circuit;

the main processing sub-circuit is used for distributing corresponding operation instructions for the plurality of slave processing sub-circuits and sending at most one of the distributed operation instructions, the corresponding merging parameters, the first data and the second data to the slave processing sub-circuits;

the slave processing sub-circuit is configured to receive at most one of an operation instruction, a corresponding merge parameter, and the first data and the second data allocated by the master processing sub-circuit, perform merge processing on the first data and the second data according to the allocated operation instruction and the corresponding merge parameter to obtain an intermediate merge result, and send the intermediate merge result to the master processing sub-circuit;

the main processing sub-circuit is further configured to perform subsequent processing on the received plurality of intermediate merging results to obtain merged data, and store the merged data.

In this implementation, when the slave processing sub-circuit receives only one of the allocated operation instruction, the corresponding merge parameter, the first data, and the second data actively sent by the master processing sub-circuit, the slave processing sub-circuit may directly obtain the remaining data required for the merge processing from the circuit of the device in which that data is located. In example 1, the slave processing sub-circuit receives only the allocated operation instruction sent by the master processing sub-circuit; the slave processing sub-circuit may then obtain the corresponding merge parameter from the parameter determining circuit and obtain the first data and the second data from the data obtaining circuit or from the storage circuit. Alternatively, since the main processing sub-circuit already holds the corresponding merge parameter, the first data, and the second data, the slave processing sub-circuit may obtain the required data directly from the main processing sub-circuit. In example 2, the slave processing sub-circuit does not receive any data actively sent by the master processing sub-circuit; it may then obtain the allocated operation instruction from the master processing sub-circuit, obtain the corresponding merge parameter from the parameter determining circuit, and obtain the first data and the second data from the data obtaining circuit or from the storage circuit. In general, before merging, each slave processing sub-circuit needs to determine whether it has obtained the allocated operation instruction, the corresponding merge parameter, the first data, and the second data; if anything is missing, it obtains the required data from the circuits of the device, and merging is performed only once all of the data is ready. Those skilled in the art may set, as required, the manner in which the slave processing sub-circuit obtains the missing data, and the present disclosure is not limited thereto.
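The completeness check performed by a slave processing sub-circuit before merging can be modelled as below. Everything in this sketch is an assumption for illustration: the `SlaveSubCircuit` class, the `sources` mapping of fetch callbacks standing in for the other circuits, and the choice of merge (overwriting a slice of the second data at an offset given by the merge parameter).

```python
from typing import Any, Callable, Dict, List

REQUIRED = ("operation_instruction", "merge_parameter",
            "first_data", "second_data")


class SlaveSubCircuit:
    def __init__(self, sources: Dict[str, Callable[[], Any]]) -> None:
        # Maps each required item to a callback that fetches it from the
        # circuit holding it (master, parameter, data or storage circuit).
        self._sources = sources
        self._received: Dict[str, Any] = {}

    def receive(self, name: str, value: Any) -> None:
        """Accept an item actively sent by the master sub-circuit."""
        self._received[name] = value

    def ensure_ready(self) -> List[str]:
        """Fetch whatever is still missing; return the names fetched."""
        fetched = []
        for name in REQUIRED:
            if name not in self._received:
                self._received[name] = self._sources[name]()
                fetched.append(name)
        return fetched

    def merge(self) -> List[float]:
        self.ensure_ready()  # merging starts only once all items are present
        offset = self._received["merge_parameter"]
        first = self._received["first_data"]
        second = list(self._received["second_data"])
        second[offset:offset + len(first)] = first  # overwrite a slice
        return second


sources = {
    "operation_instruction": lambda: "MERGE",
    "merge_parameter": lambda: 1,
    "first_data": lambda: [9, 9],
    "second_data": lambda: [0, 0, 0, 0],
}
slave = SlaveSubCircuit(sources)
slave.receive("operation_instruction", "MERGE")  # as in example 1
fetched = slave.ensure_ready()
result = slave.merge()
```

This mirrors example 1: only the operation instruction is pushed by the master, and the slave pulls the merge parameter and both operands itself before merging.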

In this implementation, if each slave processing sub-circuit performs a merging process of one first data and one second data, the slave processing sub-circuit may directly store the resulting merged data. If each slave processing sub-circuit performs a partial operation of "merging processing of one first data and one second data", the slave processing sub-circuit needs to use data obtained by merging processing only as an intermediate merging result, and the master processing sub-circuit obtains merged data of "merging processing of one first data and one second data" according to a plurality of intermediate merging results and then stores the merged data.

In one possible implementation, the storage circuit may include a plurality of storage sub-circuits, each for storing processing data of a corresponding slave processing sub-circuit, the processing data of the slave processing sub-circuit including one or more of an operation instruction, data to be operated on, a merge parameter, and an intermediate merge result.

In this implementation, each slave processing sub-circuit may be provided with its own storage sub-circuit, which stores only the data related to that slave processing sub-circuit. A shared storage sub-circuit may also be provided for the plurality of slave processing sub-circuits to store the data common to them.

In this implementation, a corresponding storage sub-circuit may be further provided for the main processing sub-circuit to store data required by the main processing sub-circuit. Meanwhile, a common shared storage sub-circuit can be arranged for the main processing sub-circuit and the plurality of slave processing sub-circuits in the processing circuit, so that unified storage of common used data in the processing circuit can be realized.

In one possible implementation, the apparatus further includes:

an instruction storage circuit, configured to store the merge instruction;

a queue storage circuit, configured to store an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed that are arranged sequentially in execution order, and the plurality of instructions to be executed comprise the merge instruction.

In this implementation, the instructions to be executed may also include computing instructions related to or unrelated to the merge instruction, which is not limited by this disclosure. The execution sequence of the multiple instructions to be executed can be arranged according to the receiving time, the priority level and the like of the instructions to be executed to obtain an instruction queue, so that the multiple instructions to be executed can be sequentially executed according to the instruction queue.

In one possible implementation, the apparatus further includes:

a dependency relationship processing circuit, configured to, upon determining that a first to-be-executed instruction among the plurality of instructions to be executed has an association relationship with a zeroth to-be-executed instruction preceding it, cache the first to-be-executed instruction in the instruction storage circuit, and, after the zeroth to-be-executed instruction has been executed, extract the first to-be-executed instruction from the instruction storage circuit and send it to the processing circuit,

wherein the association relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction preceding it comprises: a first storage address interval storing the data required by the first to-be-executed instruction overlaps a zeroth storage address interval storing the data required by the zeroth to-be-executed instruction. Conversely, there is no association relationship between the two instructions when the first storage address interval and the zeroth storage address interval do not overlap.

In this way, according to the dependency relationship between the first to-be-executed instruction and the zeroth to-be-executed instruction preceding it, the later first to-be-executed instruction is executed only after the earlier zeroth to-be-executed instruction has finished executing, which ensures the accuracy of the operation result.
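The overlap test described above can be illustrated with a minimal Python sketch. The `Instr` class, the half-open address intervals, and the function names are assumptions made for illustration only; the disclosure does not prescribe how the intervals are represented.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    start: int  # first storage address occupied by the instruction's data (assumed)
    end: int    # one past the last address, i.e. a half-open interval (assumed)

def overlaps(a, b):
    """True when the two half-open address intervals share at least one address."""
    return a.start < b.end and b.start < a.end

def blocking_instructions(queue, index):
    """Names of earlier instructions that queue[index] has an association with."""
    current = queue[index]
    return [earlier.name for earlier in queue[:index] if overlaps(earlier, current)]

# Hypothetical instruction queue in execution order.
queue = [Instr("i0", 0, 16), Instr("i1", 8, 24), Instr("i2", 32, 40)]
```

In this sketch, `i1` would be held in the instruction storage circuit until `i0` completes, while `i2` has no association with any earlier instruction and could be sent on directly.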

In one possible implementation, the operation code and/or the operation field is used to indicate the merging parameter,

the parameter determining circuit is further configured to determine, when the operation code and/or the operation domain is used to indicate the merge parameter, a part or all of the parameters of the merge parameter according to the operation code and/or the operation domain.

In this implementation, merge parameters corresponding to different instructions may be preset, and when not all merge parameters can be determined from the operation code and the operation domain, the remaining merge parameters may be determined in combination with the preset merge parameters. For example, a part of the required merge parameters may be determined according to the operation code and the operation domain of the merge instruction, and then another part may be determined according to the determined part and the preset merge parameters, so as to ensure normal execution of the merging process. In this way, when the same or similar merge instructions are executed in batches, the part of the merge parameters that is the same can be preset and each merge instruction needs to carry only the part that differs, which reduces the data volume of the merge instructions and improves the processing speed of the same or similar merge instructions. Moreover, even if part of the content of a merge instruction is missing, normal execution of the merge instruction can be ensured according to the preset merge parameters.
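As a minimal sketch of this fallback, the following Python fragment combines parameters decoded from an instruction with preset ones. The parameter names and the contents of `PRESET` are hypothetical and chosen only for illustration.

```python
# Hypothetical preset merge parameters; the keys and values are illustrative only.
PRESET = {"merge_type": "specified_dimension", "value_rule": "first", "iter": 1}

def resolve_merge_params(decoded, preset=PRESET):
    """Start from the preset parameters and override them with every
    parameter that the opcode / operation domain actually supplied."""
    params = dict(preset)  # copy so the preset itself is never modified
    params.update({k: v for k, v in decoded.items() if v is not None})
    return params
```

Here a parameter decoded as `None` stands for "not indicated by the instruction", so the preset value is kept for it.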

In a possible implementation, the operation code and/or the operation field are further used to indicate a merged data address where merged data is stored.

In one possible implementation, the arithmetic unit may include at least one of a selector, a random number generator, a pseudo-random number generator, an adder, a subtractor, a multiplier, and a comparator. The processing circuit may invoke different operators for different merge value determination rules to achieve data merging.

For clarity of description, the following describes the process of performing data merging by using the merge instruction processing apparatus provided by the present disclosure, taking a same-dimension merge instruction and a different-dimension merge instruction as examples.

For a different dimension merge instruction:

In a possible implementation manner, when the merge instruction is a different-dimension merge instruction, the first data and the second data may be two types of data with different data dimensions, such as scalars, vectors, matrices, and tensors. For example, the first data is a vector and the second data is a matrix. Alternatively, the first data is a vector and the second data is a tensor having an order of at least 3. Alternatively, the first data is a matrix and the second data is a tensor having an order of at least 3.

In a possible implementation manner, when the merge instruction is a different-dimension merge instruction, the merge parameter may include at least one of a merging processing type, a type parameter corresponding to the merging processing type, a merge value determination rule, and a data parameter. The merging processing type may indicate the specific merging manner of the merging process to be performed, and the type parameter of the merging processing type may be a parameter required for the device to perform the corresponding merging process. The merge value determination rule may indicate how the value of each position in the merged data that undergoes merging is determined during the merging process.

In one possible implementation, when the merge instruction is a different-dimension merge instruction, the merging processing type may include at least one of: merging in a specified dimension direction, and cyclic merging according to the dimension direction. Merging in a specified dimension direction may be merging performed on all or part of the data in one dimension. Cyclic merging according to the dimension direction may be using the first data as cyclic data along the specified dimension and merging it multiple times with the corresponding parts of the second data to obtain the merged data.

In one possible implementation, when the merge instruction is a different-dimension merge instruction, the type parameter may include at least one of: a starting merging position, a specified dimension, a specified area in the specified dimension, and a number of cycles. When the merging processing type is merging in a specified dimension direction, the starting merging position, the specified dimension, and the specified area in the specified dimension are the type parameters necessary for that merging. When the merging processing type is cyclic merging according to the dimension direction, the starting merging position, the specified dimension, and the number of cycles are the type parameters necessary for that merging, and the specified area in the specified dimension is an optional type parameter.

In this implementation, the starting merging position may be the position in the second data at which the merging process starts. The specified dimension may be some or all of the dimensions of the second data. The specified area in the specified dimension may be a row, a column, or some or all of the area of the specified dimension. For example, when the first data is a vector and the second data is a matrix, the specified dimension may be one, several, or all rows of the matrix, or one, several, or all columns of the matrix. The number of cycles may be the number of times the first data is cyclically used and merged into the second data. The number of cycles may be a specified value, or may be left unlimited, in which case the merging process does not stop until all the data to be merged in the second data has been merged with the first data.

For example, when the first data is a vector and the second data is a matrix, the starting merging position may be any position in a specified second row (a row is a specified dimension, and a second row is a specified region of the specified dimension), such as a first position of the second row is the starting merging position, a third position of the second row is the starting merging position, and the like.

In one possible implementation, the merge value determination rule may include any one of: determining the value of the first data corresponding to the current position as the merge value; determining the value of the second data corresponding to the current position as the merge value; and determining a value obtained by performing an arithmetic operation and/or a logical operation on the value of the first data corresponding to a designated position and the value of the second data as the merge value. The designated position may include any one or more of all the positions of the second data that need to be merged.

In this implementation, the merge value determination rule may indicate the specific manner in which the value of the corresponding position of the merged data is determined from the first data and the second data during the merging process. The arithmetic operation may be an operation based on multiplication, subtraction, addition, or the like, and the logical operation may be an operation for making a logical judgment, such as OR, AND, or NOT. To implement the logical judgment, a generated random number may be combined with an operation result obtained from the first data and/or the second data and used as input to a function, an algorithm, or the like. For example, assume that the value of the first data is a and the value of the second data is b. The value of the position in the merged data may be a; may be b; or may be a value determined by performing an arithmetic operation and/or a logical operation on a and b, for example a + b, max(a, b) (i.e., the larger of a and b is selected as the merge value), min(a, b) (i.e., the smaller of a and b is selected as the merge value), and the like. It may also be a value determined by performing an arithmetic operation and/or a logical operation on the value of the first data and the value of the second data at the previous position. It should be noted that the above manners of determining the merge value are only examples provided by the present disclosure, and a person skilled in the art may set the merge value determination rule according to actual needs, which is not limited by the present disclosure.
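The example rules a, b, a + b, max(a, b) and min(a, b) can be sketched as a small dispatch table in Python. The rule names used as dictionary keys are hypothetical labels, not identifiers from the disclosure.

```python
import operator

# Hypothetical rule names mapped to the operation that yields the merge value.
RULES = {
    "first": lambda a, b: a,   # take the value of the first data
    "second": lambda a, b: b,  # take the value of the second data
    "add": operator.add,       # a + b
    "max": max,                # larger of a and b
    "min": min,                # smaller of a and b
}

def merge_value(rule, a, b):
    """Return the merge value for one position under the named rule."""
    return RULES[rule](a, b)
```

In hardware terms, the first two rules would correspond to a selector, `add` to an adder, and `max` / `min` to a comparator followed by a selection.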

In one possible implementation, when the merge instruction is a different-dimensional merge instruction, the data parameter may include at least one of: the number of dimensions of the first data and the length of the corresponding dimension, the number of dimensions of the second data and the initial length of the corresponding dimension, and the number of dimensions of the merged data and the merged length of the corresponding dimension.

For example, if the first data is a vector of length m, the second data is an x × y matrix, and the merged data is an x' × y' matrix, then the number of dimensions of the first data is 1 and its length is m. The number of dimensions of the second data is 2, the two dimensions being rows and columns, with an initial row length of x and an initial column length of y. The number of dimensions of the merged data is also 2, the two dimensions being rows and columns, with a merged row length of x' and a merged column length of y'.

In a possible implementation manner, the processing circuit further includes a first bit-complementing processing sub-circuit, configured to perform bit-complementing (padding) processing with a determined complement value when it is determined that a bit-complementing condition is satisfied. When the merge instruction is a different-dimension merge instruction, the bit-complementing condition may include: the merged length of at least one dimension of the merged data is greater than the corresponding initial length; the data length determined according to the starting merging position and the first data is less than the merged length of the corresponding dimension of the merged data. The complement value is a preset value, such as 0 or 1.

In this implementation manner, when a vacancy or a position where no merging processing is performed exists in the merged data obtained after merging processing is performed on the first data and the second data, the vacancy and/or the position where no merging processing is performed may be complemented to ensure that the merged data is a complete data. The preset values used for bit complementing at different positions can be the same or different, and the disclosure does not limit this.

In one possible implementation, the instruction format of the different-dimension merge instruction may be as shown in table 1 below:

In one possible implementation, the data parameter may include:

D1i a1 a2…ai D2j b1 b2…bj D3q c1 c2…cq

Here, D1i indicates that the number of dimensions of the first data is i, and a1 a2 … ai indicate that the lengths of the corresponding dimensions of the first data are a1, a2, …, ai, respectively. D2j indicates that the number of dimensions of the second data is j, and b1 b2 … bj indicate that the lengths of the corresponding dimensions of the second data are b1, b2, …, bj, respectively. D3q indicates that the number of dimensions of the merged data is q, and c1 c2 … cq indicate that the lengths of the corresponding dimensions of the merged data are c1, c2, …, cq, respectively.

In one possible implementation, the type parameter x.par may include:

Start_addr dim size iter

Here, Start_addr represents the starting merging position, dim represents the specified dimension, size represents the specified area in the specified dimension, and iter represents the number of cycles.

The different merging processing types, the type parameters corresponding to the merging processing types, the merging value determination rule, and the codes, numbers, and other identifiers corresponding to the data parameters may be preset, so that the merging processing indicated by the different-dimensional merging instruction is represented by the identifier in the different-dimensional merging instruction. The above examples 1 and 2 are only partial examples of the different-dimension merge instruction, and a person skilled in the art may set the format of the different-dimension merge instruction according to actual needs, which is not limited by the present disclosure.

In one possible implementation, the merging process performed by the master processing sub-circuit or the slave processing sub-circuit may include the steps of:

when the merging processing type is determined to be merging in the designated dimension direction, determining the designated dimension and the initial merging position which are required to be merged in the second data;

determining first data to be combined according to the first data;

and determining, according to the determined merge value determination rule, the obtained value of the first data to be merged, and the value of each position in the specified dimension that needs to be merged, the value of each position between the starting merging position and the merging end position of the specified dimension, to obtain the merged data.

In this implementation, determining the first data to be merged according to the first data may include: and determining the first data to be merged according to the specified dimension direction, the relation between the dimension quantity of the first data and the dimension quantity of the second data and other related parameters. Fig. 2 is a schematic diagram illustrating a merge instruction processing apparatus performing data merge based on a different-dimension merge instruction according to an embodiment of the present disclosure. For example, assume that the first data is a matrix and the second data is a three-dimensional tensor. When it is determined that the merging processing type is merging in the designated dimension direction, for example, as shown in fig. 2, when it is determined that merging is performed in the designated z-dimension direction, the first data to be merged may be obtained by determining the rows h and the columns l of the matrix as values in the x and y directions (or the y and x directions), respectively, and then merging is performed in the z-dimension direction. After determining to merge in the designated x-dimension direction, the first data to be merged may be obtained by determining the row h and the column l of the matrix as values in the y and z directions (or in the z and y directions), respectively, and merging is performed according to the x-dimension direction. After determining to merge in the designated y-dimension direction, the first data to be merged can be obtained by determining the rows h and the columns l of the matrix as the values in the x-and z-directions (or the z-and x-directions), respectively, and merging is performed according to the y-dimension direction. 
It should be noted that, the manner of determining the first data to be combined according to the first data is various, the present disclosure only provides a few simple examples, and a person skilled in the art may set the manner of determining the first data to be combined according to actual needs, and the present disclosure does not limit this.

when the merging processing type is determined to be circular merging according to the dimension direction, determining an initial merging position and the number of times of circulation;

determining a dimension to be circulated according to the dimension of the first data and the dimension of the second data;

and determining a value of each position between the initial merging position and the merging ending position of the second data according to the determined merging value determination rule, the value of the first data acquired according to the circulation times, and the value of the position needing merging processing in the second data acquired according to the dimension to be circulated, so as to obtain merged data.

In a possible implementation manner, determining the dimension to be cycled according to the dimension of the first data and the dimension of the second data may include: determining a dimension-number difference according to the number of dimensions of the first data and the number of dimensions of the second data, taking the dimension-number difference as the target number of dimensions to be cycled, and then selecting the target number of dimensions from the plurality of dimensions of the second data as the dimensions to be cycled. Alternatively, the dimension to be cycled may be specified in the operation domain or the operation code of the different-dimension merge instruction. Alternatively, the dimension to be cycled for merging different classes of data may be preset. A person skilled in the art may set the manner of determining the dimension to be cycled according to actual needs, which is not limited by the present disclosure.

For example, as shown in fig. 2, assuming that the merging processing type is cyclic merging according to the dimension direction, and the dimension-number difference between the matrix and the three-dimensional tensor is 1, any one of x, y, and z of the tensor may be selected as the dimension to be cycled, and merging is then performed. For the specific merging process, reference may be made to the process of cyclic merging of a vector and a matrix according to a specified dimension described below, which is not repeated here.
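A minimal Python sketch of the first option (deriving the target number from the dimension-number difference) follows. Choosing the trailing dimensions by default is purely an assumption for illustration; the text leaves the choice among the dimensions of the second data open, and the function name and `chosen` parameter are hypothetical.

```python
def dims_to_cycle(first_ndim, second_ndim, chosen=None):
    """Return the indices of the second data's dimensions to cycle over.

    The target count is the dimension-number difference. If `chosen` is
    given (e.g. taken from the instruction or a preset), it is validated;
    otherwise the trailing dimensions are used as an assumed default.
    """
    target = second_ndim - first_ndim
    if target <= 0:
        raise ValueError("second data must have more dimensions than first data")
    if chosen is None:
        chosen = list(range(second_ndim - target, second_ndim))
    if len(chosen) != target or any(not 0 <= d < second_ndim for d in chosen):
        raise ValueError("chosen dimensions do not match the target number")
    return list(chosen)
```

For a matrix (2 dimensions) and a three-dimensional tensor, the difference is 1, so exactly one of the tensor's three dimensions is returned.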

In a possible implementation manner, in the process of performing the merging processing, the main processing sub-circuit or the slave processing sub-circuit may obtain the value of the first data and/or the value of the second data corresponding to the current position according to a preset data value acquisition rule or a data value acquisition rule indicated in the different-dimension merge instruction. The data value acquisition rules for the value of the first data and the value of the second data may be the same or different. Some or all of the first data may be acquired to perform the merging process between the first data and the second data. The preset data value acquisition rule may include any one of: sequential acquisition, reverse-order acquisition, interval sequential acquisition, interval reverse-order acquisition, acquisition of the value at a preset position in the first data, and the like. The interval may be one position, two positions, and the like, and the preset position may be the start position, the end position, and the like of the first data. A person skilled in the art may set the data value acquisition rule according to actual needs, which is not limited by the present disclosure.

For example, assume that in a certain different-dimension merge instruction the first data is a vector, denoted "abc", and the second data is a matrix whose second row is "g j d". It should be noted that the following examples are only a few possible examples of merging a vector and a matrix; they are intended to illustrate the data value acquisition rule and not to limit the present invention.
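The acquisition rules listed above can be sketched with Python slicing. The rule names are hypothetical labels, and treating "an interval of one position" as skipping one element between acquired values is an assumption.

```python
def acquire(values, rule="sequential", interval=1, position=0):
    """Return the values of `values` in the order given by the acquisition rule."""
    step = interval + 1  # assumed: an interval of 1 skips one position
    if rule == "sequential":
        return list(values)
    if rule == "reverse":
        return list(values[::-1])
    if rule == "interval":
        return list(values[::step])
    if rule == "interval_reverse":
        return list(values[::-step])
    if rule == "preset":
        return [values[position]]  # e.g. 0 for the start, -1 for the end
    raise ValueError(f"unknown acquisition rule: {rule}")
```

The first data and the second data can each be passed through `acquire` with their own rule, which reflects the statement that the two rules may be the same or different.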

If the different-dimension merge instruction indicates to merge "abc" with the second row of the matrix and to acquire the first data and the second data sequentially in order, and the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value", the processing circuit may invoke the selector to implement the data merging process, including: for the first position of the second row, the value "a" of the first position of the first data and the value "g" of the first position of the second row of the second data are obtained first, and the value of the first position of the second row of the merged data is then changed from the original "g" to "a", namely a, g → a. For the second position of the second row, the value "b" of the second position of the first data and the value "j" of the second position of the second row of the second data are obtained first, and the value of the second position of the second row of the merged data is then changed from the original "j" to "b", namely b, j → b. For the third position of the second row, the value "c" of the third position of the first data and the value "d" of the third position of the second row of the second data are obtained first, and the value of the third position of the second row of the merged data is then changed from the original "d" to "c", namely c, d → c. The merged data obtained after the merging process is the matrix with its second row changed to "a b c".

For convenience of description, the merging process is described below in the form "data1, data2 → data3", where data1 denotes the acquired value of the first data, data2 denotes the acquired value of the second data, and data3 denotes the value of the merged data.

If the different-dimension merge instruction indicates to merge "abc" with the second row of the matrix, to acquire the first data in reverse order and the second data sequentially in order, and the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value", the processing circuit may invoke the selector to implement the data merging process, including: for the first position of the second row, the process is c, g → c. For the second position of the second row, the process is b, j → b. For the third position of the second row, the process is a, d → a. The merged data obtained after the merging process is the matrix with its second row changed to "c b a".

If the different-dimension merge instruction indicates to merge "abc" with the second row of the matrix and to acquire the first data and the second data sequentially in order, and the merge value determination rule is "determine a value obtained by adding the value of the first data and the value of the second data corresponding to the current position as the merge value", the processing circuit may invoke the adder to implement the data merging process, including: for the first position of the second row, the process is a, g → a + g. For the second position of the second row, the process is b, j → b + j. For the third position of the second row, the process is c, d → c + d. The merged data obtained after the merging process is the matrix with its second row changed to "a + g, b + j, c + d".

If the different-dimension merge instruction indicates to merge "abc" with the second row of the matrix and to acquire the first data and the second data sequentially in order, and the merge value determination rule is "determine the greater of the value of the first data and the value of the second data corresponding to the current position as the merge value", the processing circuit may invoke the comparator to implement the data merging process, including: for the first position of the second row, the process is a, g → a since a > g. For the second position of the second row, the process is b, j → j since j > b. For the third position of the second row, the process is c, d → d since d > c. The merged data obtained after the merging process is the matrix with its second row changed to "a j d".

If the different-dimension merge instruction indicates to merge "abc" with the second row of the matrix and to acquire the first data and the second data sequentially in order, and the merge value determination rule is "determine the value of the first data or the second data selected at the current position as the merge value" (i.e., determine a value obtained by performing an arithmetic operation and/or a logical operation on the value of the first data corresponding to the designated position and the value of the second data as the merge value), the merge value may, for example, be determined according to an input mask. Assuming that the mask for the second row is "1 0 1", where 1 indicates that the value of the first data is selected and 0 indicates that the value of the second data is selected, the processing circuit may invoke the selector to implement the data merging process according to the mask, including: for the first position of the second row, the process is a, g → a. For the second position of the second row, the process is b, j → j. For the third position of the second row, the process is c, d → c. That is, the merged data finally obtained is the matrix with its second row changed to "a j c".
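The five row-merging examples above can be reproduced with a short Python sketch. The numeric values standing in for a, b, c and g, j, d (chosen so that a > g, j > b and d > c hold), the contents of the first and third matrix rows, and the function names are all assumptions for illustration.

```python
def merge_row(matrix, row, vector, combine, reverse_first=False):
    """Return a copy of `matrix` whose `row` is merged position by position
    with `vector`; `combine(data1, data2)` yields data3."""
    first = vector[::-1] if reverse_first else vector
    merged = [list(r) for r in matrix]
    for pos, (v1, v2) in enumerate(zip(first, matrix[row])):
        merged[row][pos] = combine(v1, v2)
    return merged

def merge_row_masked(matrix, row, vector, mask):
    """Mask-driven selection: 1 takes the first data, 0 keeps the second data."""
    merged = [list(r) for r in matrix]
    for pos, (v1, v2, m) in enumerate(zip(vector, matrix[row], mask)):
        merged[row][pos] = v1 if m == 1 else v2
    return merged

# Assumed stand-in values: a, b, c = 9, 2, 3 and g, j, d = 5, 7, 4.
vector = [9, 2, 3]
matrix = [[1, 1, 1], [5, 7, 4], [0, 0, 0]]  # rows 1 and 3 are placeholders

selected = merge_row(matrix, 1, vector, lambda v1, v2: v1)          # a b c
reversed_ = merge_row(matrix, 1, vector, lambda v1, v2: v1, True)   # c b a
added = merge_row(matrix, 1, vector, lambda v1, v2: v1 + v2)        # a+g b+j c+d
larger = merge_row(matrix, 1, vector, max)                          # a j d
masked = merge_row_masked(matrix, 1, vector, [1, 0, 1])             # a j c
```

Each call leaves the input matrix untouched and changes only the second row of the returned copy, mirroring the statement that only the merged row differs from the original matrix.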

In addition, other manners may be selected to express or implement the merge value determination rule "determine a value obtained by performing an arithmetic operation and/or a logical operation on the value of the first data corresponding to the designated position and the value of the second data as the merge value". For example, a threshold for selecting the first data, such as 0.5, may be given, and a random number in [0, 1) may be generated by the random number generator or the pseudo-random number generator in the operator; when the random number is less than the threshold, the value of the first data is selected as the merge value of the current position, and otherwise the value of the second data is selected as the merge value of the current position. The pseudo-random number generator may generate random numbers using algorithms such as the middle-square method, the congruential method, the shift method, and the Mersenne Twister algorithm.
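A minimal Python sketch of this threshold-based random selection follows. Python's standard `random` module is used here because its generator is a Mersenne Twister producing floats in [0, 1); the function name and the `seed` parameter are assumptions for illustration.

```python
import random

def random_select_merge(first, second, threshold=0.5, seed=None):
    """For each position, draw r in [0, 1); take the first data's value
    when r < threshold, otherwise take the second data's value."""
    rng = random.Random(seed)  # Mersenne Twister based pseudo-random generator
    return [a if rng.random() < threshold else b for a, b in zip(first, second)]
```

With a threshold of 1.0 every position takes the first data, and with 0.0 every position takes the second data; fixing `seed` makes a run reproducible.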

In order to facilitate understanding of the process of performing data merging processing according to the different-dimensional merging instruction described in the present disclosure, an example of the process of performing merging processing according to the different-dimensional merging instruction is described below, taking merging processing between vectors and matrices as an example. It is understood that when merging data of other different types is performed, the merging process is similar to vector matrix merging, and details of the present disclosure are omitted.

When the merge instruction is a different-dimensional merge instruction, the first data is a vector, the second data is a matrix, and the merged data is also a matrix, the merge processing type may include at least one of the following: designated row merging, designated column merging, merging by rows, merging by columns, circularly merging by rows and circularly merging by columns, wherein the type parameters comprise at least one of the following parameters: initial merging position, designated row, designated column, and cycle number. The consolidated value determined by the consolidated value determination rule may include any of: the value of the vector corresponding to the current position, the value of the matrix corresponding to the current position, and the value determined by performing arithmetic operation and/or logical operation on the value of the vector corresponding to the designated position and the value of the matrix, wherein the designated position comprises any one or more positions in all positions needing to be subjected to merging processing. The data parameters may include at least one of: the length of the vector, the initial row length and the initial column length of the matrix, and the merged row length and the merged column length of the merged matrix.

For example, assuming that the first data is determined to be a vector with a length of m and the second data is determined to be a matrix with x y according to the different-dimensional merge instruction, the merge value determination rule determines the value of the corresponding position in the first data to be the merge value.

As shown in fig. 3a, when it is determined that the merging processing type is "designated row merging" according to the multidimensional merging instruction, an nth row and a starting merging position (a starting position of the nth row) of a designated row in the matrix that need to be merged are determined, the vector is used as a row, and a value of each position between the starting merging position of the designated row and a merging ending position of the designated row is determined according to a determined merging value determination rule, the obtained value of the vector, and a value of a position that needs to be merged in the designated row, so as to obtain a merged matrix (i.e., merged data). The matrix after merging is different from the matrix (before merging) in that the data of the nth row is changed (the specific change is described with reference to the relevant text below).

The merging end position of the designated row is determined according to the starting merging position and the length of the vector, or according to the merged row length. The merging end position may be the end position of the designated row of the merged matrix or a middle position of that row. For convenience of description, let the size of the merged matrix be x' × y', the first data be a vector of length m, the second data be an x × y matrix, and the length from the starting position of the designated row to the starting merging position be w. The merging end position is the end position of the designated row of the merged matrix when w + m ≥ x'. The merging end position is a middle position of the designated row of the merged matrix when w + m < x'.
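For illustration only, designated row merging and the choice of merging end position described above might be sketched as follows. The function name `merge_designated_row` and its arguments are hypothetical; the sketch assumes that the merge value is the vector value at the current position and that the merged matrix keeps the size of the input matrix.

```python
def merge_designated_row(matrix, vector, n, w=0):
    """Merge `vector` into row n of `matrix`, starting at offset w.

    Assumptions (not from the source): rows are Python lists, the merge value
    is the vector value at the current position, and the merged matrix has the
    same size as the input matrix.
    """
    merged = [row[:] for row in matrix]   # copy so the input stays unchanged
    row_len = len(merged[n])              # x' in the text
    m = len(vector)
    # If w + m >= x', merging ends at the end of the row;
    # otherwise it ends at a middle position w + m.
    end = min(w + m, row_len)
    for pos in range(w, end):
        merged[n][pos] = vector[pos - w]
    return merged, end


zeros = [[0] * 5 for _ in range(3)]
short_result, short_end = merge_designated_row(zeros, [1, 2, 3], n=1, w=1)
long_result, long_end = merge_designated_row(zeros, [1, 2, 3, 4, 5, 6], n=1, w=2)
```

In the first call the end position falls in the middle of the row (w + m < x'); in the second call the vector is longer than the remaining row, so merging stops at the end of the row and the surplus vector data is not used.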

As shown in fig. 3a, when the merging processing type is determined to be "designated column merging" according to the different-dimension merge instruction, the designated column of the matrix to be merged (the d-th column) and the starting merging position (here, the starting position of the d-th column) are determined. The vector is treated as a column, and the value of each position of the designated column between the starting merging position and the merging end position is determined according to the merge value determination rule, the values of the vector, and the values at the positions of the designated column that need to be merged, thereby obtaining the merged matrix (i.e., the merged data). The merged matrix differs from the matrix before merging in that the data of the d-th column has changed.

The merging end position of the designated column is determined according to the starting merging position and the length of the vector, or according to the merged column length. The merging end position may be the end position of the designated column of the merged matrix or a middle position of that column. For convenience of description, let the size of the merged matrix be x' × y', the first data be a vector of length m, the second data be an x × y matrix, and the length from the starting position of the designated column to the starting merging position be w. The merging end position is the end position of the designated column of the merged matrix when w + m ≥ y'. The merging end position is a middle position of the designated column of the merged matrix when w + m < y'.

As shown in fig. 3a, when the merging processing type is determined to be "merging by rows" according to the different-dimension merge instruction, the starting merging position of each row is determined (the starting merging positions of the rows may be the same or different). The vector is treated as a row and each row of the matrix is treated as a designated row. The value of each position of each designated row between its starting merging position and its merging end position is determined according to the merge value determination rule (the rule may be the same or different for each row), the values of the vector, and the values at the positions of that designated row that need to be merged, thereby obtaining the merged matrix (i.e., the merged data). The merged matrix differs from the matrix before merging in that the data of every row has changed.

As shown in fig. 3a, when the merging processing type is determined to be "merging by columns" according to the different-dimension merge instruction, the starting merging position of each column is determined (the starting merging positions of the columns may be the same or different). The vector is treated as a column and each column of the matrix is treated as a designated column. The value of each position of each designated column between its starting merging position and its merging end position is determined according to the merge value determination rule (the rule may be the same or different for each column), the values of the vector, and the values at the positions of that designated column that need to be merged, thereby obtaining the merged matrix (i.e., the merged data). The merged matrix differs from the matrix before merging in that the data of every column has changed.

As shown in fig. 3a, in "designated column merging" and "merging by columns", when the length of the vector is greater than the merged column length of the merged matrix, only as much of the vector as fits the merged column length may be used during merging, and the remaining vector data may be left unused. Alternatively, when the column length of the matrix is greater than that of the merged matrix, the portion of the original matrix matching the size of the merged matrix may be taken after merging as the merged matrix. That is, when the size of the vector and/or the matrix is larger than the size of the merged matrix, part of the data of the vector and/or the matrix may be discarded to obtain a merged matrix of the desired size.

As shown in fig. 3b, taking as an example the case in which, according to the different-dimension merge instruction, a designated row is merged and the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value", the following describes the processing that the processing circuit may need to perform for different relationships among the merged row length of the merged matrix, the initial row length of the matrix, and the length of the vector. The present disclosure is not limited to the following six possibilities. Here, the starting merging position is the starting position of the designated nth row. When merging is performed on a plurality of rows, all rows, or one or more columns, the processing performed by the processing circuit is similar to that performed for a designated row; reference may be made to fig. 3b and the description below, and details are not repeated.

Possibility one: the size of the merged matrix is the same as the size of the initial matrix, and the length of the vector is smaller than the initial length of the matrix. In this case, only the values in the "data change area before and after merging" shown in fig. 3b change in the merged matrix according to the values of the vector; that is, positions 1 to m of the nth row change from "j1, j2, j3 … jm" to "i1, i2, i3 … im". The positions of the nth row outside the "data change area before and after merging" may keep the values of the initial matrix, that is, the data at positions m+1 to x remains "jm+1 … jx"; alternatively, bit-padding processing may be performed, filling positions m+1 to x with preset values (not shown).

Possibility two: the size of the merged matrix is the same as the size of the initial matrix, and the length of the vector is greater than or equal to the initial length of the matrix. In this case, as shown in fig. 3b, the data of the designated row of the merged matrix changes according to the values of the vector; that is, positions 1 to m of the nth row change from "j1, j2, j3 … jm" to "i1, i2, i3 … im". No bit-padding processing is required.

Possibility three: the size of the merged matrix is smaller than the size of the initial matrix (the merged length of at least one dimension is smaller than the initial length of the corresponding dimension), and the length of the vector is smaller than or equal to the merged length. In this case, as shown in fig. 3b, the data of the designated row of the merged matrix changes according to the values of the vector; that is, positions 1 to m of the nth row change from "j1, j2, j3 … jm" to "i1, i2, i3 … im". The positions of the nth row outside the "data change area before and after merging" may keep the values of the initial matrix, that is, the data at positions m+1 to x-6 of the nth row remains "jm+1 … jx-6". No bit-padding processing is required. The data of the rows other than the nth row in the merged matrix is the same as the data of the corresponding rows of the initial matrix.

Possibility four: the size of the merged matrix is greater than or equal to the size of the initial matrix (the merged length of at least one dimension is greater than the initial length of the corresponding dimension), and the length of the vector is less than or equal to the initial length. In this case, only the values in the "data change area before and after merging" shown in fig. 3b change in the merged matrix according to the values of the vector; that is, positions 1 to m of the nth row change from "j1, j2, j3 … jm" to "i1, i2, i3 … im". The positions of the nth row that exist in the initial matrix but lie outside the "data change area before and after merging" may keep the values of the initial matrix, that is, the data at positions m+1 to x of the nth row remains "jm+1 … jx". If the nth row of the merged matrix has positions beyond the length of the initial matrix, bit-padding processing may be performed to fill them with preset values; that is, the data at positions x+1 to x+6 of the nth row is "jx+1, jx+2 … jx+6", all of which are preset values. If both the merged row length and the merged column length are greater than the corresponding initial lengths, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions x+1 to x+6 of those rows is "jx+1, jx+2 … jx+6", all of which are preset values; in addition, each newly added row may be bit-padded in its entirety. If only the merged row length is greater than the initial row length, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions x+1 to x+6 of those rows is "jx+1, jx+2 … jx+6", all of which are preset values. If only the merged column length is greater than the initial column length, each newly added row may be bit-padded in its entirety.

Possibility five: the size of the merged matrix is greater than or equal to the size of the initial matrix (the merged length of at least one dimension is greater than the initial length of the corresponding dimension), and the length of the vector is greater than the initial length and smaller than the merged length. In this case, the values in the "data change area before and after merging" of the merged matrix shown in fig. 3b change according to the values of the vector; that is, positions 1 to x of the nth row change from "j1, j2, j3 … jm … vacancy" to "i1, i2, i3 … im … ix". If the nth row of the merged matrix has positions beyond the length of the vector, outside the "data change area before and after merging", bit-padding processing may be performed to fill them with preset values; that is, the data at positions x+1 to x+6 of the nth row is "jx+1, jx+2 … jx+6", all of which are preset values. If both the merged row length and the merged column length are greater than the corresponding initial lengths, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions m+1 to x+6 of those rows is "jm+1 … jx, jx+1, jx+2 … jx+6", all of which are preset values; in addition, each newly added row may be bit-padded in its entirety. If only the merged row length is greater than the initial row length, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions m+1 to x+6 of those rows is "jm+1 … jx, jx+1, jx+2 … jx+6", all of which are preset values. If only the merged column length is greater than the initial column length, each newly added row may be bit-padded in its entirety.

Possibility six: the size of the merged matrix is greater than or equal to the size of the initial matrix (the merged length of at least one dimension is greater than the initial length of the corresponding dimension), and the length of the vector is greater than both the initial length and the merged length. In this case, only the values in the "data change area before and after merging" shown in fig. 3b change in the merged matrix according to the values of the vector; that is, positions 1 to x-1 of the nth row change from "j1, j2, j3 … jm … vacancy" to "i1, i2, i3 … im … ix-1", and no bit-padding processing is required for that row. If both the merged row length and the merged column length are greater than the corresponding initial lengths, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions m+1 to x-1 of those rows is "jm+1 … jx-1", all of which are preset values; in addition, each newly added row may be bit-padded in its entirety. If only the merged row length is greater than the initial row length, then for the rows of the merged matrix other than the nth row, bit-padding processing may be performed at the positions beyond the initial length, so that the data at positions m+1 to x-1 of those rows is "jm+1 … jx-1", all of which are preset values. If only the merged column length is greater than the initial column length, each newly added row may be bit-padded in its entirety.
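For illustration only, the truncation and bit-padding behavior common to the six possibilities above might be sketched as follows. The function `merge_row_resized` and its parameters are hypothetical. The sketch assumes that merging starts at the first position of the designated row, that the merge value is the vector value, that positions not covered by the vector keep the initial value where one exists, and that a single preset value `pad` is used everywhere else.

```python
def merge_row_resized(matrix, vector, n, merged_rows, merged_cols, pad=0):
    """Merge `vector` into row n and produce a merged matrix of a given size.

    Assumptions (not from the source): merging starts at position 0 of row n,
    n < merged_rows, and one preset value `pad` fills every padded position.
    """
    merged = []
    for r in range(merged_rows):
        if r < len(matrix):
            row = matrix[r][:merged_cols]                  # truncate a row that is too long
            row = row + [pad] * (merged_cols - len(row))   # pad a row that is too short
        else:
            row = [pad] * merged_cols                      # newly added row: pad the whole row
        merged.append(row)
    usable = vector[:merged_cols]       # vector data beyond the merged row length is unused
    merged[n][:len(usable)] = usable    # the "data change area before and after merging"
    return merged


initial = [[1, 2, 3], [4, 5, 6]]
same_size = merge_row_resized(initial, [9, 9], n=1, merged_rows=2, merged_cols=3)
enlarged = merge_row_resized(initial, [9, 9, 9, 9], n=0, merged_rows=3, merged_cols=5)
shrunk = merge_row_resized(initial, [7, 7, 7], n=0, merged_rows=2, merged_cols=2)
```

The three calls correspond roughly to possibility one (same size, short vector), possibility five (enlarged matrix, vector longer than the initial row but shorter than the merged row), and a case in which both the matrix and the vector are truncated to the merged size.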

As shown in fig. 3c, when the merging processing type is determined to be "cyclic merging by rows" according to the different-dimension merge instruction, the starting merging position (which may be any position in any row) and the cycle number are determined. The cycle number may be a designated value (not shown in the figure) or, as shown in fig. 3c, may be left unlimited so that merging ends only after all data has been merged. The value of each position of the matrix between the starting merging position and the merging end position is then determined according to the merge value determination rule (the rule may be the same or different for each row, and may be the same or different for each pass in which the vector is merged into the matrix), the values of the vector reused cyclically according to the cycle number, and the values to be merged that are read from the matrix row by row beginning at the starting merging position, thereby obtaining the merged matrix (i.e., the merged data). If merging is performed without limiting the cycle number, the merged matrix differs from the matrix before merging in that the data at every position has changed. If merging is performed with a designated cycle number, the merged matrix differs from the matrix before merging in that all data from the starting merging position to the merging end position has changed; the remaining positions may keep the data of the initial matrix, may be bit-padded, or may partly keep the data of the initial matrix and partly be bit-padded. The merge value determination rule may stay the same throughout the merging process, or may be set according to the row being merged or the number of times the first data has been used; for example, the rule may differ from row to row, or may differ for each pass in which the vector is merged into the matrix.

As shown in fig. 3c, when the merging processing type is determined to be "cyclic merging by columns" according to the different-dimension merge instruction, the starting merging position (which may be any position in any column) and the cycle number are determined. The cycle number may be a designated value (not shown in the figure) or, as shown in fig. 3c, may be left unlimited so that merging ends only after all data has been merged. The value of each position of the matrix between the starting merging position and the merging end position is then determined according to the merge value determination rule, the values of the vector reused cyclically according to the cycle number, and the values to be merged that are read from the matrix column by column beginning at the starting merging position, thereby obtaining the merged matrix (i.e., the merged data). If merging is performed without limiting the cycle number, the merged matrix differs from the matrix before merging in that the data at every position has changed. If merging is performed with a designated cycle number, the merged matrix differs from the matrix before merging in that all data from the starting merging position to the merging end position has changed; the remaining positions may keep the data of the initial matrix, may be bit-padded, or may partly keep the data of the initial matrix and partly be bit-padded. The merge value determination rule may stay the same throughout the merging process, or may be set according to the column being merged or the number of times the first data has been used; for example, the rule may differ from column to column, or may differ for each pass in which the vector is merged into the matrix.

The merging end position of the matrix is determined according to the merged column length and the merged row length, or according to the cycle number, the starting merging position, and the length of the vector. If the cycle number is not limited, the merging end position of the matrix may be the end position determined by the merged column length and the merged row length. If the cycle number is designated and (w + cycle number × m) < x' × y', the merging end position of the matrix may be w + cycle number × m. If the cycle number is designated but (w + cycle number × m) ≥ x' × y', the merging end position of the matrix may be the end position determined by the merged column length and the merged row length. Here, m is the length of the vector, w is the length from the starting position of the matrix to the starting merging position, and x' × y' is the size of the merged matrix.
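For illustration only, cyclic merging by rows and the end-position rule above might be sketched as follows. The function `cyclic_merge_by_rows` is hypothetical. The sketch assumes that w is counted as a row-major offset from the first element of the matrix, that the merged matrix has the same size as the input matrix, and that the merge value is the vector value.

```python
def cyclic_merge_by_rows(matrix, vector, w=0, cycles=None):
    """Reuse `vector` cyclically while walking the matrix row by row.

    Assumptions (not from the source): w is a row-major offset, the merged
    matrix keeps the input size, and cycles=None means "no limit".
    """
    rows, cols = len(matrix), len(matrix[0])
    total = rows * cols                    # x' * y' in the text
    m = len(vector)
    if cycles is None or w + cycles * m >= total:
        end = total                        # merge until the end of the matrix
    else:
        end = w + cycles * m               # stop after the designated number of cycles
    flat = [value for row in matrix for value in row]   # row-major traversal
    for pos in range(w, end):
        flat[pos] = vector[(pos - w) % m]
    merged = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    return merged, end


grid = [[0, 0, 0], [0, 0, 0]]
limited, limited_end = cyclic_merge_by_rows(grid, [1, 2], w=1, cycles=2)
unlimited, unlimited_end = cyclic_merge_by_rows(grid, [1, 2])
```

With a designated cycle number the positions after the end position keep their initial values here; bit-padding them instead would be an equally valid choice under the text above.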

The device receives a different dimension merging instruction 1 as follows:

merge_diff.1 no1 x. h2 data. 1 12 2 20 30 2 20 30 add0 add1 add2

Here, "merge_diff" in "merge_diff.1" indicates that the instruction is a different-dimension merge instruction, and "1" indicates that the merge processing type is merging along a designated dimension direction. "no1" indicates that the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value". "x. h2" indicates the type parameter "the designated dimension is the row, and the designated region is the second row". "data. 1 12 2 20 30 2 20 30" indicates the data parameters: the dimension number of the first data is 1 and the length of that dimension is 12, that is, the first data is a vector of length 12; the dimension number of the second data is 2 and the lengths of its two dimensions are 20 and 30, that is, the second data is a 20 × 30 matrix; the dimension number of the merged data is 2 and the lengths of its two dimensions are 20 and 30, that is, the merged data is a 20 × 30 matrix. The address of the first data is add0, the address of the second data is add1, and the address of the merged data is add2.

The device processes the different-dimension merge instruction 1 as follows. The instruction analysis circuit parses the different-dimension merge instruction 1 to obtain its operation code (merge_diff.1) and operation domain (no1 x. h2 data. 1 12 2 20 30 2 20 30 add0 add1 add2). The data acquisition circuit then acquires the first data from add0 and the second data from add1 according to the operation domain, and caches the first data and the second data in the storage circuit. The parameter determination circuit determines the corresponding merge parameters based on the operation domain and the operation code (as described above). After reading the first data and the second data from the storage circuit, the processing circuit merges the first data and the second data according to the merge parameters to obtain the merged data, and stores the merged data at add2.
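For illustration only, the data parameters of the example instruction above can be decoded as a sequence of groups, each consisting of a dimension number followed by that many dimension lengths. The helper `parse_data_params` is hypothetical and assumes that the numeric fields have already been separated into integers; how the device actually tokenizes the operation domain is not specified here.

```python
def parse_data_params(fields):
    """Decode [ndim, len_1, ..., len_ndim] groups into a list of shapes.

    Assumption (not from the source): `fields` is a flat list of integers
    holding, in order, the groups for the first data, the second data and
    the merged data.
    """
    shapes = []
    i = 0
    while i < len(fields):
        ndim = fields[i]                              # dimension number of this operand
        shapes.append(tuple(fields[i + 1:i + 1 + ndim]))
        i += 1 + ndim                                 # jump to the next group
    return shapes


first_shape, second_shape, merged_shape = parse_data_params([1, 12, 2, 20, 30, 2, 20, 30])
```

For the example instruction this yields a length-12 vector, a 20 × 30 matrix, and a 20 × 30 merged matrix, matching the description above.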

For the same-dimension merge instruction:

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the first data and the second data are both vectors; or the first data and the second data are both matrices; or the first data and the second data are both tensors. Alternatively, the first data and the second data may be other data with the same number of dimensions, which is not limited by the present disclosure.

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the merge parameters may include at least one of a merge processing type, a type parameter corresponding to the merge processing type, a merge value determination rule, and a data parameter. The merge processing type indicates the specific manner in which merging is performed, and the type parameter of the merge processing type is a parameter the device needs in order to perform the corresponding merge processing. The merge value determination rule indicates how the value at each merged position of the merged data is determined during merging.

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the merge processing type may include at least one of: designated region merging and directional cyclic merging. Designated region merging merges the first data with one or more designated regions to be merged of the second data. A designated region to be merged may have the same size as the first data, or may have a different size, for example, a size smaller than that of the first data. Directional cyclic merging uses the first data cyclically, merging it multiple times with the corresponding regions of the second data along a designated direction to obtain the merged data.

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the type parameter may include at least one of: starting merging position, merging dimension order, region to be merged, cycle direction, cycle number, and cycle interval. When the merge processing type is designated region merging, the corresponding type parameters may include the region to be merged and the merging dimension order, or may include the starting merging position and the merging dimension order. When the merge processing type is directional cyclic merging, the corresponding type parameters may include the region to be merged and the merging dimension order, or may include the starting merging position, cycle direction, cycle number, cycle interval, and merging dimension order.

In this implementation, when the region to be merged is indicated directly in the same-dimension merge instruction, the first data and the data in the region to be merged may be merged directly according to the merging dimension order. When the region to be merged is not indicated in the same-dimension merge instruction, it needs to be determined from the type parameters corresponding to the merge processing type. For example, when the merge processing type is designated region merging, the region to be merged may be determined from the first data, the starting merging position, and the merging dimension order. When the merge processing type is directional cyclic merging, the regions to be merged may be determined from the first data, the starting merging position, the cycle direction, the cycle number, the cycle interval, and the merging dimension order.

In this implementation, the starting merging position may be a position in the second data where the merging process is started. When the number of times of merging the first data and the second data is greater than 1, the starting merging position includes a position where merging of the data is started, and may also include a starting merging position of each region of the second data merged with the first data, which is not limited by the present disclosure. The merged dimension order may be a dimension order in which the first data is merged into the second data, and the larger the number of dimensions of the first data is, the more choices of the merged dimension order are. For example, fig. 4 and 5 are schematic diagrams illustrating a merge instruction processing apparatus according to an embodiment of the disclosure performing data merging based on a co-dimensional merge instruction. Assuming that the first data and the second data are both matrices, the merging dimension order may include merging in the order of rows and columns (example 2 shown in fig. 4), and merging in the transposed order (example 1 shown in fig. 4). Merging in the order of rows and columns refers to merging the data of the rows and columns of the first data with the data of the rows and columns of the second data, respectively. The sequential merging after the transposition refers to merging the row of the first data as a column and the column as a row with the data of the row and the column in the second data, that is, transposing the first data, and merging the transposed first data according to the mode corresponding to the row and the column. The merging dimension order may also be that, after rearranging the data in the first data according to a specified rule, merging the data into the second data according to preset row and column lengths.
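For illustration only, designated region merging of two matrices with a choice of merging dimension order might be sketched as follows. The function `merge_region` and its `transpose` flag are hypothetical; the sketch assumes that the merge value is the value of the first data and that any part of the first data falling outside the second data is discarded.

```python
def merge_region(second, first, start_row, start_col, transpose=False):
    """Merge matrix `first` into matrix `second` at a starting merging position.

    Assumptions (not from the source): transpose=False merges in row/column
    order, transpose=True merges the transposed first data, and the merged
    data keeps the size of `second`.
    """
    block = [list(col) for col in zip(*first)] if transpose else first
    merged = [row[:] for row in second]          # copy so the input stays unchanged
    for r, block_row in enumerate(block):
        for c, value in enumerate(block_row):
            rr, cc = start_row + r, start_col + c
            if rr < len(merged) and cc < len(merged[rr]):   # ignore overflow
                merged[rr][cc] = value
    return merged


base = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
in_order = merge_region(base, [[1, 2]], 0, 0)
transposed = merge_region(base, [[1, 2]], 0, 0, transpose=True)
```

The same 1 × 2 first data occupies part of a row in the first call and part of a column in the second, which is the difference between the two merging dimension orders described above.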

In this implementation, the circulation direction may be a certain dimension direction of the second data (as in example 4 shown in fig. 4), or may be a direction determined by combining a plurality of dimension directions (as in example 5 shown in fig. 4). When the circulation direction is composed of a plurality of dimension directions, merging priority levels of different dimensions may be set, and merging may be performed preferentially according to a dimension with a high priority level, or a priority level may not be set, which is not limited by the present disclosure. For example, when the second data and the first data are matrices, the circulation direction may be a row or column direction, and the first data may be merged with at least one region to be merged in a plurality of rows or columns corresponding to the starting merging position. Or, when the second data and the first data are matrices, the cyclic direction may be rows and columns, and the rows have priority, if the determined multiple regions to be merged include multiple "rows", the first data may be merged with at least one region to be merged in the first "row", and then the merging process of the next "row" is performed until the merging is completed.

In this implementation manner, the cycle number may be the number of times that the first data is cyclically used and merged into the second data, and the cycle number may be a specified value, or may not limit the value to be cyclically merged, and the merging process is not stopped until all the data that needs to be merged in the second data is merged with the first data.

In this implementation, the cycle interval may be the distance between two adjacent merged regions in the merged data obtained by cyclically merging the first data into the second data, and includes, for the two adjacent merged regions, the interval in each dimension of the merged data. For example, if the merged data is a matrix, the cycle interval includes a cycle interval in the row direction and a cycle interval in the column direction (example 5 shown in fig. 4). The shorter the cycle interval, the more positions in the merged data have their values changed. Those skilled in the art can set the cycle interval according to actual needs, which is not limited by the present disclosure.
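For illustration only, the way a starting merging position, cycle interval, and cycle number together determine the regions to be merged might be sketched as follows. The function `region_starts` is hypothetical. The sketch assumes that the cycle interval is the gap between the end of one merged region and the start of the next, that the row direction has priority, and that only regions fitting entirely inside the merged data are used.

```python
def region_starts(start, first_shape, merged_shape, interval, cycles=None):
    """List the top-left positions of the regions to be merged.

    Assumptions (not from the source): all arguments except `cycles` are
    (row, column) pairs, the interval is a gap between adjacent regions,
    and cycles=None means "no limit".
    """
    start_row, start_col = start
    first_rows, first_cols = first_shape
    merged_rows, merged_cols = merged_shape
    gap_rows, gap_cols = interval
    starts = []
    r = start_row
    while r + first_rows <= merged_rows:
        c = start_col
        while c + first_cols <= merged_cols:       # walk along the row direction first
            if cycles is not None and len(starts) == cycles:
                return starts
            starts.append((r, c))
            c += first_cols + gap_cols
        r += first_rows + gap_rows
    return starts


all_regions = region_starts((0, 0), (1, 2), (2, 6), (0, 1))
first_three = region_starts((0, 0), (1, 2), (2, 6), (0, 1), cycles=3)
```

A larger interval yields fewer regions, and therefore fewer changed positions, which mirrors the relationship between cycle interval and changed positions stated above.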

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the merge value determination rule may include any one of: determining the value of the first data corresponding to the current position as the merge value; determining the value of the second data corresponding to the current position as the merge value; and determining, as the merge value, the value obtained by performing an arithmetic operation and/or a logical operation on the value of the first data and the value of the second data corresponding to a designated position. The designated position may include any one or more of the positions of the second data that need to be merged.
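For illustration only, the three kinds of merge value determination rule might be represented as interchangeable functions of the first-data value and the second-data value. The rule names in `RULES` are hypothetical labels chosen for this sketch, and addition and bitwise AND merely stand in for "an arithmetic operation" and "a logical operation".

```python
import operator

# Hypothetical rule table: each rule maps (first value, second value) to the merge value.
RULES = {
    "first": lambda a, b: a,      # value of the first data at the current position
    "second": lambda a, b: b,     # value of the second data at the current position
    "add": operator.add,          # an example arithmetic operation
    "and": operator.and_,         # an example logical (bitwise) operation
}


def merge_values(first, second, rule):
    """Apply one merge value determination rule position by position."""
    fn = RULES[rule]
    return [fn(a, b) for a, b in zip(first, second)]


by_first = merge_values([1, 2, 3], [4, 5, 6], "first")
by_sum = merge_values([1, 2, 3], [4, 5, 6], "add")
```

Keeping the rule as a parameter makes it straightforward to apply different rules to different rows, columns, or merge passes, as the description permits.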

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the data parameter may include at least one of: the number of dimensions of the first data and the second data, the length of each dimension of the first data, the initial length of each dimension of the second data, the number of dimensions of the merged data, and the merged length of each dimension of the merged data.

In this implementation manner, the lengths of the first data, the second data and the merged data in each dimension are not limited in this disclosure, and may be set by those skilled in the art as needed.

In one possible implementation, the processing circuit may further include a second bit-complementing processing sub-circuit.

The second bit-complementing processing sub-circuit is configured to perform bit-complementing processing using the determined bit-complementing value when it is determined that the bit-complementing condition is met. When the merge instruction is a same-dimension merge instruction, the bit-complementing condition may include: the merged length of at least one dimension of the merged data is greater than the corresponding initial length, and the data length determined according to the initial merging position, the cycle interval and the region to be merged is less than the merged length of the corresponding dimension of the merged data. The bit-complementing value is a preset value, for example 0 or 1.

In this implementation, when a vacancy or a position where no merging process is performed exists in the merged data obtained after the merging process is performed on the first data and the second data, the vacancy and/or the position where no merging process is performed may be complemented to ensure that the merged data is a complete data (as in example 9 shown in fig. 5). The preset values used for bit complementing at different positions can be the same or different, and the disclosure does not limit this.
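The bit-complementing step described above can be sketched in Python as follows. This is a minimal illustration only, assuming that the merged data is a matrix held as a list of rows and that the vacancies lie at the trailing end of each dimension; the function name `pad_to_shape` and the parameter `fill` are hypothetical and do not appear in the disclosure.

```python
def pad_to_shape(data, rows, cols, fill=0):
    """Fill vacancies so that `data` becomes a complete rows x cols matrix.

    Assumes `data` is no larger than rows x cols (hypothetical precondition).
    """
    padded = []
    for r in range(rows):
        src = data[r] if r < len(data) else []
        # Keep existing values; use the preset value where nothing was merged.
        padded.append([src[c] if c < len(src) else fill for c in range(cols)])
    return padded


# A 2 x 2 result is completed to the required 3 x 3 merged size with 0.
merged = pad_to_shape([[1, 2], [3, 4]], rows=3, cols=3, fill=0)
```

A different preset value per position could be supported by passing a function instead of a constant; the constant is used here only to keep the sketch short.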

In one possible implementation, the processing circuit may include a deletion processing sub-circuit. The deletion processing sub-circuit is configured to execute deletion processing when it is determined that a deletion condition is satisfied. The deletion condition may include any one of the following: the data length determined according to the initial merging position, the cycle interval and the region to be merged is greater than the merged length of the corresponding dimension of the merged data.

In this implementation, the deletion processing may include deleting part of the first data and/or part of the second data. For example, part of the first data may be deleted so that the remaining first data can be merged with the region to be merged of the second data (examples 6 and 7 shown in fig. 5). Alternatively, when the size of the second data is larger than the size of the merged data, part of the second data may be deleted (example 8 shown in fig. 5). It should be noted that the deletion processing is performed to ensure that the merging processing proceeds normally and that the size of the merged data meets the requirement; a person skilled in the art may configure the deletion processing according to actual requirements, which is not limited in the present disclosure.
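The deletion processing can be illustrated with a short Python sketch. It assumes, purely for illustration, that the data is a matrix held as a list of rows and that the part deleted is the trailing part of each dimension; the disclosure does not fix which part is deleted, and the name `truncate_to_shape` is hypothetical.

```python
def truncate_to_shape(data, rows, cols):
    """Delete the trailing rows and columns that exceed rows x cols."""
    return [row[:cols] for row in data[:rows]]


# Second data larger than the required merged size: keep the leading 2 x 2 part.
second = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kept = truncate_to_shape(second, rows=2, cols=2)
```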

In one possible implementation, the instruction format of the same-dimension merge instruction may be as shown in table 2 below:

In one possible implementation, par of data.par may include:

D1i a1 a2…ai b1 b2…bi D3j c1 c2…cj

D1i indicates that the number of dimensions of the first data and the second data is i, and a1 a2 … ai indicates that the lengths of the respective dimensions of the first data are a1, a2, …, ai. b1 b2 … bi indicates that the lengths of the respective dimensions of the second data are b1, b2, …, bi. D3j indicates that the number of dimensions of the merged data is j, and c1 c2 … cj indicates that the lengths of the respective dimensions of the merged data are c1, c2, …, cj.
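The field layout just described can be decoded as in the following Python sketch. It assumes the data parameter has already been split into a list of integers in the order D1i, a1 … ai, b1 … bi, D3j, c1 … cj; the function name `parse_data_par` and the dictionary keys are hypothetical.

```python
def parse_data_par(tokens):
    """Decode [i, a1..ai, b1..bi, j, c1..cj] into three shapes."""
    i = tokens[0]
    j = tokens[1 + 2 * i]
    if len(tokens) != 2 + 2 * i + j:
        raise ValueError("token count does not match the declared dimensions")
    return {
        "first_shape": tuple(tokens[1:1 + i]),
        "second_shape": tuple(tokens[1 + i:1 + 2 * i]),
        "merged_shape": tuple(tokens[2 + 2 * i:]),
    }


# Two-dimensional example: 2 x 3 first data, 20 x 30 second and merged data.
shapes = parse_data_par([2, 2, 3, 20, 30, 2, 20, 30])
```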

In one possible implementation, par of x.par may include:

loca order area dire fre dist

wherein loca represents the initial merging position, order represents the merging dimension order, area represents the region to be merged, dire represents the cycle direction, fre represents the cycle number, and dist represents the cycle interval.

The merging instruction may be a merging instruction, and the merging instruction may include a merging value determination rule, a data parameter, and a flag indicating whether the merging instruction is a merged instruction. The above examples 3 and 4 are only some examples of the same-dimension merge instruction, and a person skilled in the art may set the format of the same-dimension merge instruction according to actual needs, which is not limited by the present disclosure.

In one possible implementation, the merging process performed by the master processing sub-circuit or the slave processing sub-circuit may include the following step: when the merging processing type is determined to be designated-region merging, determining the region to be merged in the second data according to the first data, the initial merging position and the merging dimension order.

In this implementation, when the initial merging position is one, an area to be merged may be determined according to the initial merging position, the merging dimension order, and the size of the first data. When the number of the initial merging positions is multiple, a plurality of areas to be merged can be determined according to the multiple initial merging positions, the merging dimension sequence and the size of the first data.
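How a region to be merged follows from an initial merging position and the size of the first data can be sketched in Python. The sketch assumes matrices, zero-based (row, column) positions, and a merging dimension order reduced to a single flag that swaps the two dimensions of the first data; `region_to_merge` and `swap_dims` are hypothetical names.

```python
def region_to_merge(start, first_shape, swap_dims=False):
    """List the (row, col) positions of the second data covered by the first data."""
    rows, cols = (first_shape[1], first_shape[0]) if swap_dims else first_shape
    r0, c0 = start
    return [(r, c) for r in range(r0, r0 + rows) for c in range(c0, c0 + cols)]


# One initial merging position gives one region.
single = region_to_merge((0, 1), (2, 2))
# Several initial merging positions give several regions.
several = [region_to_merge(s, (2, 3)) for s in [(0, 0), (5, 5)]]
```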

In one possible implementation, the merging process performed by the master processing sub-circuit or the slave processing sub-circuit may include the following step: when the merging processing type is determined to be cyclic merging along a dimension direction, determining a plurality of regions to be merged in the second data according to the first data, the initial merging position, the cycle direction, the cycle number, the cycle interval and the merging dimension order.

In this implementation, when there is one initial merging position, a number of regions to be merged equal to the cycle number may be determined according to the initial merging position, the cycle direction, the cycle interval, and the size of the first data. When there are multiple initial merging positions, a number of regions to be merged equal to the cycle number may be determined according to the multiple initial merging positions, the cycle interval and the size of the first data.
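The determination of the cyclically repeated regions can be sketched as follows in Python. The sketch makes two assumptions that the disclosure leaves open: the cycle interval is taken to be the gap, in elements, left between two adjacent regions, and the cycle direction is encoded as the string "row" (advance along a row) or "col" (advance along a column). The name `cyclic_origins` is hypothetical.

```python
def cyclic_origins(start, first_shape, direction, count, interval):
    """Return the top-left (row, col) corner of each region to be merged."""
    r0, c0 = start
    rows, cols = first_shape
    origins = []
    for k in range(count):
        if direction == "row":
            # Step over the region width plus the gap between regions.
            origins.append((r0, c0 + k * (cols + interval)))
        elif direction == "col":
            origins.append((r0 + k * (rows + interval), c0))
        else:
            raise ValueError("direction must be 'row' or 'col'")
    return origins


# Three 2 x 2 regions along the first rows, one element apart.
along_row = cyclic_origins((0, 0), (2, 2), "row", count=3, interval=1)
```

A combined row-and-column cycle with row priority could be obtained by calling the function once per "row" of regions; that extension is omitted here.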

In one possible implementation, the merging process performed by the master processing sub-circuit or the slave processing sub-circuit may further include the steps of: determining first data to be merged according to the first data and the merging dimension sequence; and determining values of all positions in the region to be merged according to the determined merging value determination rule, the obtained value of the first data and the value in the region to be merged to obtain merged data.

In this implementation, when the merging dimension order specifies merging with the region to be merged of the second data in the original dimension order of the first data, the first data may be directly determined as the first data to be merged (as in example 2 shown in fig. 4). When the merging dimension order specifies merging in a dimension order different from the original dimension order of the first data, the first data may first be processed to obtain the first data to be merged (as in example 1 shown in fig. 4), or the values of the first data may be acquired in the new order.

In a possible implementation, during the merging process, the master processing sub-circuit or the slave processing sub-circuit may obtain the value of the first data and/or the value of the second data corresponding to the current position according to a preset data value acquisition rule or a data value acquisition rule indicated in the same-dimension merge instruction. The data value acquisition rules for the value of the first data and the value of the second data may be the same or different. Some or all of the first data may be acquired to perform the merging process between the first data and the second data. The preset data value acquisition rule may include any one of: sequential acquisition, reverse-order acquisition, acquisition at intervals in sequence, acquisition at intervals in reverse order, acquisition of the value at a preset position in the first data, and the like. The interval may be one position, two positions, and the like, and the preset position may be a start position, an end position, and the like of the first data, or may be the same. The data value acquisition rule can be set by those skilled in the art according to actual needs, and the present disclosure does not limit this.
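The data value acquisition rules listed above can be illustrated with the following Python sketch. It assumes that the values are held in a flat list and that an interval of n means that n positions are skipped between two acquired values; the rule names and the function `acquire` are hypothetical.

```python
def acquire(values, rule="sequential", interval=1):
    """Return the values in the order given by the acquisition rule."""
    values = list(values)
    step = interval + 1  # skip `interval` positions between acquired values
    if rule == "sequential":
        return values
    if rule == "reverse":
        return values[::-1]
    if rule == "interval":
        return values[::step]
    if rule == "interval_reverse":
        return values[::-1][::step]
    raise ValueError("unknown acquisition rule: " + rule)


reversed_values = acquire(["a", "b", "c", "d"], "reverse")
every_other = acquire(["a", "b", "c", "d", "e", "f"], "interval", interval=1)
```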

For example, assume that the first data in a same-dimension merge instruction is a matrix, which is represented as

The second data is a matrix represented by

The same-dimension merge instruction indicates that the region to be merged in the second data (i.e., the region indicated by the gray positions in the matrix) includes the first and second positions of the first row and the first and second positions of the second row of the second data. For convenience of description, the four positions are referred to as position 1 (first row, first position), position 2 (first row, second position), position 3 (second row, first position), and position 4 (second row, second position), respectively. It should be noted that the following examples are only a few possible examples of merging between the matrices, and are intended to illustrate the data value acquisition rule and the process of merging the data, and the disclosure is not limited thereto.

If it is determined according to the same-dimension merge instruction that the merging order is to acquire the first data sequentially and the second data sequentially, and the merge value determination rule is "determine the value of the first data or the second data selected for the current position as the merge value" (that is, a case of determining, as the merge value, a value obtained by performing an arithmetic operation and/or a logical operation on the value of the first data corresponding to the designated position and the value of the second data), the merge value may be determined, for example, according to an input mask. Assume that the mask is

where 1 indicates that the value of the first data is selected and 0 indicates that the value of the second data is selected. The processing circuit may invoke the selector to determine the values of the region to be merged according to the mask as

That is, the finally obtained merged data is

。
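The mask-based selection can be sketched in Python as follows. The values of the region are written as the symbols used in this example (a, b, c, d for the first data and q, w, g, j for positions 1 to 4 of the second data); the mask used below is a hypothetical value chosen only for illustration, and `select_by_mask` is a hypothetical name standing in for the selector.

```python
def select_by_mask(first, second, mask):
    """Pick the first-data value where the mask is 1, else the second-data value."""
    return [f if m == 1 else s for f, s, m in zip(first, second, mask)]


first_region = ["a", "b", "c", "d"]   # first data, positions 1 to 4
second_region = ["q", "w", "g", "j"]  # second data, positions 1 to 4
mask = [1, 0, 0, 1]                   # hypothetical mask
merged_region = select_by_mask(first_region, second_region, mask)
```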

If it is determined according to the same-dimension merge instruction that the merging order is to acquire the first data sequentially and the second data sequentially, and the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value", the processing circuit may call the selector to implement the data merging processing, and the merging proceeds as follows. For position 1, the value "a" at the first position in the first row of the first data and the value "q" at position 1 in the second data are obtained first, and then the value at position 1 of the merged data is changed from the original "q" to "a", i.e., a, q → a. For position 2, the value "b" at the second position in the first row of the first data and the value "w" at position 2 in the second data are obtained first, and then the value at position 2 of the merged data is changed from the original "w" to "b", i.e., b, w → b. For position 3, the value "c" at the first position in the second row of the first data and the value "g" at position 3 in the second data are obtained first, and then the value at position 3 of the merged data is changed from the original "g" to "c", i.e., c, g → c. For position 4, the value "d" at the second position in the second row of the first data and the value "j" at position 4 in the second data are obtained first, and then the value at position 4 of the merged data is changed from the original "j" to "d", i.e., d, j → d. The merged data obtained after the merging process is

。

If it is determined according to the same-dimension merge instruction that the merging order is to acquire the first data in reverse order and the second data sequentially, and the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value", the processing circuit may call the selector to implement the data merging processing, and the merging proceeds as follows. For position 1, the value "d" at the second position in the second row of the first data and the value "q" at position 1 in the second data are obtained first, and then the value at position 1 of the merged data is changed from the original "q" to "d", i.e., d, q → d. For position 2, the value "c" at the first position in the second row of the first data and the value "w" at position 2 in the second data are obtained first, and then the value at position 2 of the merged data is changed from the original "w" to "c", i.e., c, w → c. For position 3, the value "b" at the second position in the first row of the first data and the value "g" at position 3 in the second data are obtained first, and then the value at position 3 of the merged data is changed from the original "g" to "b", i.e., b, g → b. For position 4, the value "a" at the first position in the first row of the first data and the value "j" at position 4 in the second data are obtained first, and then the value at position 4 of the merged data is changed from the original "j" to "a", i.e., a, j → a. The merged data obtained after the merging process is

。

If it is determined according to the same-dimension merge instruction that the merging order is to acquire the first data sequentially and the second data sequentially, and the merge value determination rule is "determine the value obtained by multiplying the value of the first data corresponding to the current position by the value of the second data as the merge value", the processing circuit may call a multiplier to implement the data merging processing, and the merging proceeds as follows. For position 1, the value "a" at the first position in the first row of the first data and the value "q" at position 1 in the second data are obtained first, and then the value at position 1 of the merged data is changed from the original "q" to "a × q", i.e., a, q → a × q. For position 2, the value "b" at the second position in the first row of the first data and the value "w" at position 2 in the second data are obtained first, and then the value at position 2 of the merged data is changed from the original "w" to "b × w", i.e., b, w → b × w. For position 3, the value "c" at the first position in the second row of the first data and the value "g" at position 3 in the second data are obtained first, and then the value at position 3 of the merged data is changed from the original "g" to "c × g", i.e., c, g → c × g. For position 4, the value "d" at the second position in the second row of the first data and the value "j" at position 4 in the second data are obtained first, and then the value at position 4 of the merged data is changed from the original "j" to "d × j", i.e., d, j → d × j. The merged data obtained after the merging process is

。

If it is determined according to the same-dimension merge instruction that the merging order is to acquire the first data sequentially and the second data sequentially, and the merge value determination rule is "determine the greater of the value of the first data and the value of the second data corresponding to the current position as the merge value", the processing circuit may call the comparator to implement the data merging processing, and the merging proceeds as follows. For position 1, since a is greater than q, the value at position 1 of the merged data changes from "q" to "a", i.e., a, q → a. For position 2, since w is greater than b, the value at position 2 of the merged data is unchanged, i.e., b, w → w. For position 3, since c is greater than g, the value at position 3 of the merged data changes from "g" to "c", i.e., c, g → c. For position 4, since j is greater than d, the value at position 4 of the merged data is unchanged, i.e., d, j → j. The merged data obtained after the merging process is

。
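The worked examples above can be reproduced with a short Python sketch. The numeric values are hypothetical and are chosen only so that a > q, w > b, c > g and j > d hold as in the comparator example; the rule names, the table `RULES` and the function `merge_region` are likewise hypothetical stand-ins for the selector, multiplier and comparator.

```python
import operator

# Merge value determination rules, keyed by a hypothetical rule name.
RULES = {
    "first": lambda f, s: f,    # value of the first data
    "second": lambda f, s: s,   # value of the second data
    "multiply": operator.mul,   # arithmetic operation on both values
    "max": max,                 # greater of the two values
}


def merge_region(first, second, rule, reverse_first=False):
    """Merge position by position; optionally acquire the first data in reverse."""
    src = list(reversed(first)) if reverse_first else list(first)
    op = RULES[rule]
    return [op(f, s) for f, s in zip(src, second)]


first = [5, 2, 7, 4]   # a, b, c, d
second = [1, 6, 3, 8]  # q, w, g, j at positions 1 to 4

by_first = merge_region(first, second, "first")
by_reverse = merge_region(first, second, "first", reverse_first=True)
by_product = merge_region(first, second, "multiply")
by_max = merge_region(first, second, "max")
```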

Assume that the device receives a same-dimension merge instruction 2:

merge.1 no1 x.121 data.2 2 3 20 30 2 20 30 add0 add1 add2

In merge.1, merge indicates that the instruction is a same-dimension merge instruction for merging two matrices, and 1 indicates that the merge processing type is designated-region merging. no1 indicates that the merge value determination rule is "determine the value of the first data corresponding to the current position as the merge value". x.121 indicates that the type parameters are "the initial merging position is the second position in the first row, and the merging dimension order is merging in the original dimension order of the first data". data.2 2 3 20 30 2 20 30 indicates that the data parameters are "the number of dimensions of the first data and the second data is 2; the lengths of the two dimensions of the first data are 2 and 3 respectively, that is, the first data is a 2 × 3 matrix; the lengths of the two dimensions of the second data are 20 and 30 respectively, that is, the second data is a 20 × 30 matrix; the number of dimensions of the merged data is 2, and the lengths of its two dimensions are 20 and 30 respectively, that is, the merged data is a 20 × 30 matrix". The address of the first data is add0, the address of the second data is add1, and the address of the merged data is add2.

The device processes the same-dimension merge instruction 2 as follows. The instruction analysis circuit analyzes the same-dimension merge instruction 2 to obtain the operation code (merge.1) and the operation domain (no1 x.121 data.2 2 3 20 30 2 20 30 add0 add1 add2) of the instruction. The data acquisition circuit then acquires the first data from add0 and the second data from add1 according to the operation domain, and caches the first data and the second data to the storage circuit. The parameter determination circuit determines the corresponding merge parameters based on the operation domain and the operation code (as described above). After acquiring the first data and the second data from the storage circuit, the processing circuit merges the first data and the second data according to the merge parameters to obtain the merged data, and stores the merged data to add2.

It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/circuits in the above embodiments is only one logic function division, and there may be another division manner in actual implementation. For example, various units, circuits, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.

In addition, unless otherwise specified, each functional unit/circuit in each embodiment of the present disclosure may be integrated into one unit/circuit, each unit/circuit may exist alone physically, or two or more units/circuits may be integrated together. The integrated unit/circuit may be implemented in the form of hardware, or may be implemented in the form of a software program circuit.

If the integrated unit/circuit is implemented in hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the memory unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as Resistive Random Access Memory (RRAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Enhanced Dynamic Random Access Memory (EDRAM), High-Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and so on.

The integrated unit/circuit, if implemented in the form of a software program circuit and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program codes.

FIG. 6 shows a flow diagram of a merge instruction processing method according to an embodiment of the disclosure. As shown in fig. 6, the method is applied to the merge instruction processing apparatus described above, and includes steps S41 to S45.

In step S41, the obtained merge instruction is analyzed to obtain the operation code and the operation domain of the merge instruction.

In step S42, first data and second data necessary for the merge process are acquired based on the operation code and the operation field.

In step S43, the first data and/or the second data are cached.

In step S44, a merge parameter necessary for performing the merge process is determined.

In step S45, the cached first data and/or the cached second data are/is read, the first data and the second data are merged by using the arithmetic unit according to the merge parameter to obtain merged data, and the merged data are stored.

The merge instruction comprises at least one of a different-dimension merge instruction and a same-dimension merge instruction. When the merge instruction is a different-dimension merge instruction, the number of dimensions of the first data is smaller than that of the second data. When the merge instruction is a same-dimension merge instruction, the number of dimensions of the first data is equal to that of the second data. The operation code is used for indicating that the processing performed on the data by the merge instruction is merging processing, and the operation domain comprises the first data address and the second data address.

According to the merging instruction processing method provided by the embodiment of the disclosure, merging of data can be realized through one merging instruction, the merging speed is high, the occupied cache is less, merging of data with different sizes can be supported more flexibly and effectively during large-scale merging operation, the format of the merging instruction is simplified, and the merging instruction is convenient to use.
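The flow of steps S41 to S45 can be sketched in Python under strong simplifications. The sketch uses a hypothetical textual instruction of the form "merge <start> <first address> <second address> <output address>", which is not the instruction format of the disclosure; it handles vectors only, models the memory as a dictionary, and fixes the merge value determination rule to "take the value of the first data". All names are hypothetical.

```python
def process_merge(instruction, memory):
    """Walk through S41 to S45 for a simplified, hypothetical vector merge."""
    # S41: analyze the instruction into operation code and operation domain.
    opcode, *domain = instruction.split()
    if opcode != "merge":
        raise ValueError("not a merge instruction")
    start, first_addr, second_addr, out_addr = domain
    # S42: acquire the first data and the second data from their addresses.
    first, second = memory[first_addr], memory[second_addr]
    # S43: cache the data (copies stand in for the storage circuit).
    cache = {"first": list(first), "second": list(second)}
    # S44: determine the merge parameter (here only the initial merging position).
    params = {"start": int(start)}
    # S45: merge the cached data and store the merged data.
    merged = cache["second"]
    s = params["start"]
    merged[s:s + len(cache["first"])] = cache["first"]
    memory[out_addr] = merged
    return merged


memory = {"add0": [9, 9], "add1": [1, 2, 3, 4, 5]}
result = process_merge("merge 1 add0 add1 add2", memory)
```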

In one possible implementation, the processing circuit of the apparatus includes a master processing sub-circuit and a plurality of slave processing sub-circuits, wherein step S41 may include: analyzing the merging instruction to obtain a plurality of operation instructions, and sending the operation instructions to the master processing sub-circuit. In step S42, the first data and the second data are sent to the master processing sub-circuit. In step S44, the merge parameter is sent to the master processing sub-circuit. Step S45 may include:

performing, by the slave processing sub-circuits, merging processing according to the operation instruction, the corresponding data and the merge parameter to obtain an intermediate merging result, and sending the intermediate merging result to the master processing sub-circuit;

and performing, by the master processing sub-circuit, subsequent processing on the received intermediate merging results to obtain merged data, and storing the merged data.

In one possible implementation, the processing circuit includes a master processing sub-circuit and a plurality of slave processing sub-circuits, wherein step S42 may include: merging the first data and the second data by using the master processing sub-circuit according to the merge parameter to obtain merged data, and storing the merged data.

In one possible implementation, the processing circuit includes a master processing sub-circuit and a plurality of slave processing sub-circuits, wherein step S41 may include: analyzing the merging instruction to obtain a plurality of operation instructions, and sending the operation instructions to the master processing sub-circuit. In step S44, the merge parameter is sent to the master processing sub-circuit. Step S45 may include:

allocating, by the master processing sub-circuit, corresponding operation instructions to the plurality of slave processing sub-circuits, and sending at most one of the allocated operation instruction, the corresponding merge parameter, the first data and the second data to the slave processing sub-circuits;

receiving, by the slave processing sub-circuit, at most one of the allocated operation instruction, the corresponding merge parameter, the first data and the second data from the master processing sub-circuit, merging the first data and the second data according to the allocated operation instruction and the corresponding merge parameter to obtain an intermediate merging result, and sending the intermediate merging result to the master processing sub-circuit;

and performing, by the master processing sub-circuit, subsequent processing on the received intermediate merging results to obtain merged data, and storing the merged data.
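The division of work between the master and slave processing sub-circuits can be sketched sequentially in Python. This is a software analogy only: it assumes vectors, an element-wise multiplication as the merge value determination rule, and an even split of the first data among the slaves; how the hardware actually partitions the work is not specified here, and the names `slave_merge` and `master_merge` are hypothetical.

```python
def slave_merge(first_chunk, second_chunk):
    """Slave: merge one chunk and return an intermediate merging result."""
    return [f * s for f, s in zip(first_chunk, second_chunk)]


def master_merge(first, second, start, n_slaves=2):
    """Master: allocate chunks to slaves, then assemble the merged data."""
    length = len(first)
    chunk = -(-length // n_slaves)  # ceiling division
    merged = list(second)
    for k in range(n_slaves):
        lo, hi = k * chunk, min((k + 1) * chunk, length)
        if lo >= hi:
            break  # nothing left to allocate
        intermediate = slave_merge(first[lo:hi], second[start + lo:start + hi])
        # Subsequent processing: place the intermediate result in the merged data.
        merged[start + lo:start + hi] = intermediate
    return merged


merged = master_merge([1, 2, 3], [10, 10, 10, 10, 10], start=1, n_slaves=2)
```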

In a possible implementation manner, when the merge instruction is a different-dimensional merge instruction, the first data is a vector and the second data is a matrix; or the first data is a vector and the second data is a tensor with the order of at least 3; or, the first data is a matrix and the second data is a tensor with an order of at least 3.

In a possible implementation manner, when the merge instruction is a different-dimensional merge instruction, the merge parameter may include at least one of a merge processing type, a type parameter corresponding to the merge processing type, a merge value determination rule, and a data parameter,

when the merge instruction is a different-dimensional merge instruction, the merge processing type may include at least one of: designated-dimension-direction merging and cyclic merging along a dimension direction, and the type parameter may include at least one of: initial merging position, designated dimension, designated area in the designated dimension, and cycle number,

when the merge instruction is a different-dimensional merge instruction, the merge value determination rule may include any one of: determining a value of first data corresponding to a current position as a combined value, determining a value of second data corresponding to the current position as a combined value, and determining a value obtained by performing arithmetic operation and/or logic operation on the value of the first data corresponding to a designated position and the value of the second data as a combined value, wherein the designated position may include any one or more positions of all positions of the second data which need to be combined,

when the merge instruction is a different-dimensional merge instruction, the data parameters may include at least one of: the number of dimensions of the first data and the length of the corresponding dimension, the number of dimensions of the second data and the initial length of the corresponding dimension, and the number of dimensions of the merged data and the merged length of the corresponding dimension.

determining first data to be combined according to the first data;

In one possible implementation, when the merge instruction is a same-dimension merge instruction, the first data and the second data are vectors; or, the first data and the second data are matrices; or, the first data and the second data are tensors,

wherein the merge parameter includes at least one of a merge processing type, a type parameter corresponding to the merge processing type, a merge value determination rule, and a data parameter,

the merge processing type includes at least one of: designated-region merging and cyclic merging along a dimension direction, and the type parameter includes at least one of: initial merging position, merging dimension order, region to be merged, cycle direction, cycle number and cycle interval,

the merged value determination rule includes any one of: determining a value of first data corresponding to a current position as a combined value, determining a value of second data corresponding to the current position as a combined value, and determining a value obtained by performing arithmetic operation and/or logic operation on the value of the first data corresponding to a designated position and the value of the second data as a combined value, wherein the designated position comprises any one or more positions in all positions needing to be combined in the second data,

the data parameters include at least one of: the dimension number of the first data and the second data, the length of the dimension corresponding to the first data, the initial length of the dimension corresponding to the second data, the dimension number of the merged data and the merged length of the corresponding dimension.

In one possible implementation, the merging process performed by the master processing sub-circuit or the slave processing sub-circuit may include the steps of: when the merging processing type is determined to be designated area merging, determining areas to be merged in the second data, which need to be merged, according to the first data, the initial merging position and the merging dimension sequence;

In one possible implementation, step S45 may further include: when it is determined that a bit-padding condition is met, performing bit-padding processing by using the determined bit-padding value, where the bit-padding condition may include any one of: the merged length of at least one dimension in the dimensions of the merged data is larger than the corresponding initial length, and the data length determined according to the initial merging position and the first data is smaller than the merged length of the corresponding dimension in the merged data; the merged length of at least one of the merged dimensions of the merged data is greater than the corresponding initial length, and the data length determined according to the initial merging position and the first data is less than the merged length of the corresponding dimension of the merged data. And the complement value is a preset value.

In one possible implementation, step S45 may further include: upon determining that the deletion condition is satisfied, a deletion process is performed,

wherein the deletion condition includes any one of: the data length determined according to the initial merging position, the cycle interval and the area to be merged is greater than the merged length of the corresponding dimension in the merged data.
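
A minimal one-dimensional sketch of the deletion processing is given below. It assumes that the first data is written repeatedly into the second data at a fixed interval, and that deletion means discarding every element beyond the merged length; the function name, the parameter names and the fill value of 0 are hypothetical and are not taken from the disclosure.

```python
def cyclic_merge_1d(first, second, start, interval, times, merged_len,
                    fill_value=0):
    """Write `first` into a copy of `second` `times` times.

    The k-th copy starts at start + k * interval. If the resulting
    data is longer than `merged_len` (the deletion condition), the
    excess elements at the end are deleted.
    """
    result = list(second)
    for k in range(times):
        pos = start + k * interval
        end = pos + len(first)
        if end > len(result):
            # Grow the buffer so the current region fits.
            result.extend([fill_value] * (end - len(result)))
        result[pos:end] = first
    if len(result) > merged_len:
        # Deletion processing: drop everything past the merged length.
        del result[merged_len:]
    return result
```

Here three copies of a two-element first data placed at positions 1, 4 and 7 would produce nine elements; with a merged length of 6 the last three are removed.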

In one possible implementation, the storage circuit of the apparatus may include a plurality of storage sub-circuits, each for storing processing data of a corresponding slave processing sub-circuit, the processing data of the slave processing sub-circuit including one or more of an operation instruction, data to be operated on, a merge parameter, and an intermediate merge result.

In one possible implementation, the operation code and/or the operation domain may be used to indicate the merging parameter, and step S44 may include: when the operation code and/or the operation domain indicates the merging parameter, determining some or all of the merging parameters according to the operation code and/or the operation domain.
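
To make the idea concrete, the sketch below splits a textual instruction into an operation code, data addresses and parameters carried in the operation domain. The textual encoding, the mnemonic `MERGE` and the `key=value` parameter syntax are purely hypothetical; the disclosure does not specify this format.

```python
def parse_merge_instruction(text):
    """Split an instruction into (opcode, addresses, parameters).

    Hypothetical format: the first token is the operation code; the
    remaining tokens form the operation domain, where bare tokens are
    data addresses and key=value tokens are merging parameters.
    """
    opcode, *fields = text.split()
    addresses = []
    parameters = {}
    for field in fields:
        if "=" in field:
            key, value = field.split("=", 1)
            parameters[key] = value
        else:
            addresses.append(field)
    return opcode, addresses, parameters
```

Any merging parameter that is absent from the operation domain would then have to come from another source, such as a default value.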

In one possible implementation, the method may further include:

storing the merge instruction;

and storing an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the merging instruction.

In one possible implementation, the method may further include: when it is determined that a first to-be-executed instruction among the plurality of to-be-executed instructions has an association relationship with a zeroth to-be-executed instruction preceding it, caching the first to-be-executed instruction, and, after execution of the zeroth to-be-executed instruction is completed, controlling execution of the first to-be-executed instruction,

wherein the association relationship between the first to-be-executed instruction and a zeroth to-be-executed instruction before the first to-be-executed instruction comprises:

a first storage address interval for storing the data required by the first to-be-executed instruction and a zeroth storage address interval for storing the data required by the zeroth to-be-executed instruction have an overlapping area.
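
The association check described above amounts to an interval-overlap test over the instruction queue. The following minimal sketch assumes half-open address intervals `[start, end)`; the `Instr` type, its field names and the helper functions are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A queued instruction and the address interval of its data."""
    name: str
    start: int  # first address of the required data (inclusive)
    end: int    # end address of the required data (exclusive)


def overlaps(a, b):
    """True if the two storage address intervals share any address."""
    return a.start < b.end and b.start < a.end


def blocking_predecessors(queue):
    """Map each instruction to the earlier instructions it must wait for.

    `queue` is ordered by execution sequence. An instruction is cached
    until every earlier instruction with an overlapping interval is done.
    """
    waits = {}
    for i, instr in enumerate(queue):
        waits[instr.name] = [p.name for p in queue[:i] if overlaps(instr, p)]
    return waits
```

An instruction whose list is empty can be issued immediately; the others are held back until the listed predecessors have finished.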

It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and circuits described are not necessarily required for the disclosure.

It should be further noted that, although the steps in the flowchart of fig. 6 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise, the steps are not restricted to the exact order shown and described, and may be performed in other orders. Moreover, at least a portion of the steps in fig. 6 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

In a possible implementation manner, an artificial intelligence chip is also disclosed, which comprises the merge instruction processing device.

In a possible implementation manner, a board card is further disclosed, which comprises a storage device, an interface device, a control device and the artificial intelligence chip; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.

Fig. 7 shows a block diagram of a board card according to an embodiment of the present disclosure. Referring to fig. 7, in addition to the chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391 and a control device 392.

The storage device 390 is connected to the artificial intelligence chip through a bus and is used for storing data. The storage device may include a plurality of groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

In one embodiment, each group of storage units includes a plurality of DDR SDRAM devices arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling data transmission and data storage of each group of storage units.
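
Because DDR transfers data twice per clock cycle, the theoretical peak bandwidth of one group of storage units can be estimated from the clock frequency and the data bus width. The short sketch below shows the arithmetic; the function name and the example figures (a 1600 MHz clock and a 64-bit data bus) are illustrative assumptions, not values stated in this section.

```python
def ddr_peak_bandwidth_mb_s(clock_mhz, bus_width_bits):
    """Theoretical peak bandwidth in MB/s for a DDR interface.

    Two transfers occur per clock cycle, and each transfer moves
    bus_width_bits / 8 bytes.
    """
    transfers_per_cycle = 2
    return clock_mhz * transfers_per_cycle * bus_width_bits // 8


# Assumed example: 1600 MHz clock, 64-bit data bus.
example = ddr_peak_bandwidth_mb_s(1600, 64)
```

Real sustained bandwidth is lower than this figure because of refresh cycles, command overhead and access patterns.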

The interface device is electrically connected with the artificial intelligence chip. The interface device is used for realizing data transmission between the artificial intelligence chip and external equipment (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transmitted from the server to the chip through the standard PCIE interface, thereby implementing data transfer. In another embodiment, the interface device may be another interface; the disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the artificial intelligence chip is transmitted back to the external device (e.g. a server) by the interface device.

The control device is electrically connected with the artificial intelligence chip. The control device is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device can be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). Since the artificial intelligence chip may include a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, it can drive a plurality of loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores and/or processing circuits in the artificial intelligence chip.

In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or an automobile; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.

Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.

An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

Fig. 8 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another similar terminal.

Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more circuits that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include multimedia circuitry to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface circuitry, which may be a keyboard, click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as a display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes Near Field Communication (NFC) circuitry to facilitate short-range communications. For example, the NFC circuit may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.

Fig. 9 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 9, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more circuits each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.

The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The embodiments of the present disclosure have been described in detail, and the principles and embodiments of the present disclosure are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present disclosure. Meanwhile, a person skilled in the art may, based on the idea of the present disclosure, change or modify the specific embodiments and the application scope of the present disclosure. In view of the above, this description should not be construed as limiting the present disclosure.

Claims

1. A merge instruction processing apparatus, the apparatus comprising:

the storage circuit is used for caching the received first data and/or the second data;

when the merging instruction is a different-dimension merging instruction, the dimension number of the first data is smaller than that of the second data, the first data is a vector, and the second data is a matrix; or the first data is a vector and the second data is a tensor with the order of at least 3; or the first data is a matrix and the second data is a tensor with an order of at least 3,

when the merging instruction is a same-dimension merging instruction, the dimension number of the first data is equal to the dimension number of the second data, and the first data and the second data are vectors; or the first data and the second data are matrices; or the first data and the second data are tensors,

the operation code is used for indicating that the processing performed on data by the merging instruction is merging processing, and the operation domain comprises a first data address and a second data address,

the processing circuit comprises a main processing sub-circuit and a plurality of slave processing sub-circuits, and the mode of the device for realizing data merging by using the main processing sub-circuit and the plurality of slave processing sub-circuits comprises any one of the following modes:

the instruction analysis circuit is further configured to analyze the merge instruction to obtain a plurality of operation instructions and send the operation instructions to the main processing sub-circuit, the data acquisition circuit is further configured to send the first data and the second data to the main processing sub-circuit, and the parameter determination circuit is further configured to send the merge parameter to the main processing sub-circuit; the main processing sub-circuit is used for carrying out preceding processing on the first data and the second data and carrying out transmission of data and/or operation instructions with the plurality of slave processing sub-circuits; the slave processing sub-circuit is used for carrying out merging processing according to the operation instruction, the corresponding data and the merging parameter to obtain an intermediate merging result and sending the intermediate merging result to the main processing sub-circuit; the main processing sub-circuit is further configured to perform subsequent processing on the received multiple intermediate results to obtain merged data, and store the merged data;

or

The main processing sub-circuit is configured to perform merging processing on the first data and the second data according to the merging parameter to obtain merged data, and store the merged data;

or

The instruction analysis circuit is further configured to analyze the merge instruction to obtain a plurality of operation instructions and send the plurality of operation instructions to the main processing sub-circuit, and the parameter determination circuit is further configured to send the merge parameter to the main processing sub-circuit; the main processing sub-circuit is used for distributing corresponding operation instructions for the plurality of slave processing sub-circuits and sending at most one of the distributed operation instructions, the corresponding merging parameters, the first data and the second data to the slave processing sub-circuits; the slave processing sub-circuit is configured to receive at most one of the distributed operation instruction, the corresponding merging parameter, the first data, and the second data from the master processing sub-circuit, merge the first data and the second data according to the distributed operation instruction and the corresponding merging parameter to obtain an intermediate merging result, and send the intermediate merging result to the master processing sub-circuit; the main processing sub-circuit is further configured to perform subsequent processing on the received multiple intermediate merged results to obtain merged data, and store the merged data.

2. The apparatus of claim 1, wherein when the merge instruction is a different-dimensional merge instruction,

the merge parameter includes at least one of a merge process type, a type parameter corresponding to the merge process type, a merge value determination rule, and a data parameter,

the merge processing type includes at least one of: merging along a designated dimension direction and cyclic merging along a dimension direction, wherein the type parameters comprise at least one of the following items: initial merging position, designated dimension, designated area in the designated dimension, cycle number,

the data parameters include at least one of: the number of dimensions of the first data and the length of the corresponding dimension, the number of dimensions of the second data and the initial length of the corresponding dimension, and the number of dimensions of the merged data and the merged length of the corresponding dimension.

3. The apparatus of claim 2, wherein the combining process performed by the master processing sub-circuit or the slave processing sub-circuit comprises:

determining first data to be combined according to the first data;

4. The apparatus of claim 2, wherein the combining process performed by the master processing sub-circuit or the slave processing sub-circuit comprises:

5. The apparatus of claim 1, wherein when the merge instruction is a same-dimension merge instruction,

6. The apparatus of claim 5, wherein the combining process performed by the master processing sub-circuit or the slave processing sub-circuit comprises any one of:

when the merging processing type is determined to be designated area merging, determining areas to be merged in the second data, which need to be merged, according to the first data, the initial merging position and the merging dimension sequence; or

And when the merging processing type is determined to be the direction circular merging, determining a plurality of areas to be merged in the second data, which need to be merged, according to the first data, the initial merging position, the circular direction, the circular times, the circular interval and the merging dimension sequence.

7. The apparatus of claim 6, wherein the combining process performed by the master processing sub-circuit or the slave processing sub-circuit further comprises:

determining first data to be merged according to the first data and the merging dimension sequence;

and determining values of all positions in the region to be merged according to the determined merging value determination rule, the obtained value of the first data and the value in the region to be merged to obtain merged data.

8. The apparatus of claim 2, wherein the processing circuit further comprises:

a first bit complement processing sub-circuit for performing bit complement processing using the determined bit complement value when it is determined that the bit complement condition is satisfied,

wherein the bit-filling condition comprises: the merged length of at least one of the merged dimensions of the merged data is greater than the corresponding initial length, and the data length determined according to the initial merging position and the first data is less than the merged length of the corresponding dimension of the merged data.

9. The apparatus of claim 5, wherein the processing circuit further comprises:

a second bit complement processing sub-circuit for performing bit complement processing using the determined bit complement value when it is determined that the bit complement condition is satisfied,

wherein the bit-filling condition comprises: the merged length of at least one dimension in the merged data is larger than the corresponding initial length, and the data length determined according to the initial merging position, the cycle interval and the region to be merged is smaller than the merged length of the corresponding dimension in the merged data.

10. The apparatus of claim 7, wherein the processing circuit further comprises:

a deletion processing sub-circuit for executing deletion processing when it is determined that the deletion condition is satisfied,

11. The apparatus of claim 1,

the storage circuit comprises a plurality of storage sub-circuits, each storage sub-circuit is used for storing processing data of a corresponding slave processing sub-circuit, and the processing data of the slave processing sub-circuit comprises one or more of an operation instruction, data to be operated, a merging parameter and an intermediate merging result.

12. The apparatus of claim 1, wherein the operation code and/or the operation field is used to indicate the merge parameter,

13. The apparatus of claim 1, wherein the operator comprises at least one of a selector, a random number generator, a pseudo-random number generator, an adder, a subtractor, a multiplier, and a comparator.

14. The apparatus of claim 1, further comprising:

instruction storage circuitry to store the merge instruction;

a queue storage circuit for storing an instruction queue, the instruction queue comprising a plurality of instructions to be executed arranged in sequence according to an execution order, the plurality of instructions to be executed comprising the merge instruction,

wherein the apparatus further comprises:

15. A merge instruction processing method applied to a merge instruction processing apparatus, the method comprising:

analyzing the obtained merging instruction by using an instruction analyzing circuit in the device to obtain an operation code and an operation domain of the merging instruction;

acquiring first data and second data required for merging processing based on the operation code and the operation domain by using a data acquisition circuit in the device;

caching the first data and/or the second data with storage circuitry in the apparatus;

determining a merging parameter required for merging processing by using a parameter determining circuit in the device;

reading the cached first data and/or the cached second data by using a processing circuit in the device, merging the first data and the second data by using an arithmetic unit according to the merging parameter to obtain merged data, and storing the merged data,

wherein the processing circuit of the device comprises a main processing sub-circuit and a plurality of slave processing sub-circuits, and the reading of the cached first data and/or the cached second data by using the processing circuit in the device, the merging of the first data and the second data by using an arithmetic unit according to the merging parameter to obtain merged data, and the storing of the merged data comprise any one of the following operations:

analyzing the merging instruction by using the instruction analyzing circuit to obtain a plurality of operation instructions, sending the operation instructions to the main processing sub-circuit, sending the first data and the second data to the main processing sub-circuit by using the data acquisition circuit, and sending the merging parameter to the main processing sub-circuit by using the parameter determining circuit; the main processing sub-circuit is used for carrying out preceding processing on the first data and the second data and carrying out transmission of data and/or operation instructions with the plurality of slave processing sub-circuits; the slave processing sub-circuit is used for carrying out merging processing according to the operation instruction, the corresponding data and the merging parameter to obtain an intermediate merging result, and the intermediate merging result is sent to the main processing sub-circuit; utilizing the main processing sub-circuit to execute subsequent processing on the received intermediate results to obtain merged data, and storing the merged data;

or

Merging the first data and the second data by using the main processing sub-circuit according to the merging parameters to obtain merged data, and storing the merged data;

or

Analyzing the merging instruction by using the instruction analyzing circuit to obtain a plurality of operation instructions, sending the operation instructions to the main processing sub-circuit, and sending the merging parameter to the main processing sub-circuit by using the parameter determining circuit; distributing corresponding operation instructions for the plurality of slave processing sub-circuits by using the main processing sub-circuit, and sending at most one of the distributed operation instructions, the corresponding merging parameters, the first data and the second data to the slave processing sub-circuit; receiving at most one of the distributed operation instruction, the corresponding merging parameter, the first data and the second data from the main processing sub-circuit by using the slave processing sub-circuit, merging the first data and the second data according to the distributed operation instruction and the corresponding merging parameter to obtain an intermediate merging result, and sending the intermediate merging result to the main processing sub-circuit; and utilizing the main processing sub-circuit to execute subsequent processing on the received intermediate merging results to obtain merged data, and storing the merged data.

16. An artificial intelligence chip, characterized in that the chip comprises a merging instruction processing device according to any one of claims 1 to 14.

17. An electronic device, characterized in that the electronic device comprises an artificial intelligence chip according to claim 16.

18. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device and an artificial intelligence chip according to claim 16;

the storage device is used for storing data;

the control device is used for monitoring the state of the artificial intelligence chip,

wherein the storage device comprises a plurality of groups of storage units, each group of storage units is connected with the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;

the chip comprises a DDR controller for controlling data transmission and data storage of each group of the storage units;

the interface device is a standard PCIE interface.

19. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the memory-stored instructions to perform the method of claim 15.

20. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of claim 15.