CN114677549A - Method for reducing multidimensional vector, electronic equipment and storage medium - Google Patents

Method for reducing multidimensional vector, electronic equipment and storage medium

Info

Publication number
CN114677549A
Authority
CN
China
Prior art keywords
axis
basic block
reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011551576.9A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN202011551576.9A priority Critical patent/CN114677549A/en
Priority to PCT/CN2021/133658 priority patent/WO2022135049A1/en
Publication of CN114677549A publication Critical patent/CN114677549A/en
Pending legal-status Critical Current

Classifications

    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/00: Pattern recognition
    • G06N3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Abstract

The invention relates to a method, an electronic device and a readable storage medium for reducing a multi-dimensional image vector. The processing device of the invention is included in an integrated circuit device that comprises a general interconnect interface and a computing device. The computing device interacts with the processing device to jointly complete a computing operation specified by the user. The integrated circuit device may further comprise a storage device, connected to the computing device and the processing device respectively, for storing data of the computing device and the processing device.

Description

Method for reducing multidimensional vector, electronic equipment and storage medium
Technical Field
The present invention relates generally to the field of neural networks. More particularly, the present invention relates to a method of reducing a multi-dimensional image vector, an electronic device and a readable storage medium.
Background
In processing image vectors, a reduction operation is often used to compress the dimensions of the image vectors. In a multi-axis reduction scenario, a general multi-axis reduction is usually converted into single-axis reductions, and the multi-axis reduction is completed by looping over single-axis reductions. This general multi-axis reduction scheme has the following problems: a temporary space (workspace) needs to be created to store the temporary result of each single-axis reduction, which wastes memory space; and the temporary result must be read back from the workspace for the next reduction, which is repeated IO and degrades the performance of the operator.
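For reference, the following sketch (written in Python with NumPy, and not the claimed method) illustrates the general scheme described above, in which a multi-axis reduction is completed by looping over single-axis reductions and each iteration materializes a temporary result:

```python
# A minimal sketch of the looped single-axis scheme described above; the helper
# name and the use of NumPy are illustrative assumptions, not the claimed method.
import numpy as np

def looped_multi_axis_sum(x, axes):
    out = x
    for ax in axes:
        # each single-axis reduction produces a temporary result (the "workspace")
        out = out.sum(axis=ax, keepdims=True)
    return out

x = np.arange(3 * 6 * 2 * 3 * 4, dtype=np.float64).reshape(3, 6, 2, 3, 4)
print(looped_multi_axis_sum(x, [1, 3]).shape)  # (3, 1, 2, 1, 4)
```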
For a discontinuous multi-axis reduction, the scheme of the TensorFlow framework is as follows: first a transpose operation is performed on the input vector (tensor) to move all dimensions to be reduced to the low (trailing) dimensions of the tensor, and then those low dimensions are reduced together. This solution also has two drawbacks: a transpose operator must be configured, which incurs both computation time and IO time; and the transposed result still needs a workspace as temporary space to store the intermediate result, which additionally occupies memory space.
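The transpose-based scheme described above can likewise be sketched as follows; this is an illustration only, not the TensorFlow implementation, and it assumes all reduction axes are first moved to the trailing dimensions and then reduced together:

```python
# A rough sketch of the transpose-then-reduce strategy; illustrative only.
import numpy as np

def transpose_then_sum(x, axes):
    keep = [d for d in range(x.ndim) if d not in axes]
    y = np.transpose(x, keep + list(axes))             # extra transpose pass (compute + IO)
    y = y.reshape([x.shape[d] for d in keep] + [-1])   # intermediate result occupies a workspace
    return y.sum(axis=-1)

x = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
print(transpose_then_sum(x, [0, 2]).shape)  # (3,)
```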
It follows that neither of the current solutions is ideal. In order to solve the above problem, the present invention proposes a scheme for multi-axis reduction of multi-dimensional vectors.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, an aspect of the present invention provides a method of reducing a multi-dimensional image vector, a readable storage medium, and an electronic device.
In one aspect, the present invention discloses a method for reducing a multi-dimensional image vector, the method comprising: setting the dimensions of the image vector as a reduction group; determining a first axis to be reduced in the reduction group according to a specific order; dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced, wherein the first basic block comprises all dimensions before the axis to be reduced among the dimensions of the image vector, and the second basic block comprises all dimensions after the axis to be reduced among the dimensions of the image vector; judging whether an axis to be reduced exists in the second basic block; if so, performing the following steps: updating the reduction group with all dimensions within the second basic block; and executing the determining, dividing and judging steps until no axis to be reduced exists in the second basic block; and performing an accumulation operation on the at least one first basic block and the at least one second basic block to obtain a reduction result of the image vector.
In another aspect, the present invention discloses an electronic device, comprising: a processor; a memory for storing executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
In another aspect, the present invention discloses a computer readable storage medium having stored thereon computer program instructions for reducing a multidimensional image vector, wherein the computer program instructions, when executed by a server, implement the method described above.
The method determines the axes to be reduced according to the dimensions of the image vector and calculates the size of the basic block corresponding to each axis to be reduced. On the basis of the basic blocks, the multiple axes to be reduced are accumulated, and the reduction of all of them is completed at once. No intermediate variables are generated, so temporary space is saved and repeated intermediate IO operations are avoided. The invention converts multiple reduction operations on a multidimensional vector into a single accumulation operation over the multidimensional vector data, thereby improving operation efficiency.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the accompanying drawings, several embodiments of the present invention are illustrated by way of example and not by way of limitation, and like reference numerals designate like or corresponding parts throughout the several views, in which:
fig. 1 is a schematic structural diagram showing a board card according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating an integrated circuit device of an embodiment of the invention;
fig. 3 is a schematic diagram showing a multi-axis reduction of an embodiment of the present invention;
FIG. 4 is a method flow diagram illustrating an embodiment of the invention;
FIG. 5 is a schematic diagram illustrating an embodiment of the present invention;
FIG. 6 is a method flow diagram illustrating an embodiment of the present invention;
FIG. 7 is a method flow diagram illustrating an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating an embodiment of the present invention; and
fig. 9 is a device diagram showing an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the terms "first", "second", "third" and "fourth", etc. in the claims, the description and the drawings of the present invention are used for distinguishing different objects and are not used for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification and claims of this application, the singular form of "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this specification refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Fig. 1 shows a schematic structural diagram of a board card 10 according to an embodiment of the disclosure. As shown in fig. 1, the board card 10 includes a chip 101, which is a system-on-chip (SoC) integrated with one or more combined processing devices. The combined processing device is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing and data mining. Deep learning technology in particular is widely applied in the field of cloud intelligence; a notable characteristic of cloud intelligence applications is the large input data size, which places high demands on the storage capacity and computing capacity of the platform.
The chip 101 is connected to an external device 103 through an external interface device 102. The external device 103 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, a wifi interface, or the like. The data to be processed may be transferred by the external device 103 to the chip 101 through the external interface device 102. The calculation result of the chip 101 may be transmitted back to the external device 103 via the external interface device 102. The external interface device 102 may have different interface forms, such as a PCIe interface, according to different application scenarios.
The board card 10 also includes a memory device 104 for storing data, which includes one or more memory units 105. The memory device 104 is connected to, and transfers data with, the control device 106 and the chip 101 through a bus. The control device 106 in the board card 10 is configured to regulate the state of the chip 101. For this purpose, in one application scenario, the control device 106 may include a micro controller unit (MCU).
Fig. 2 is a structural diagram showing a combined processing device in the chip 101 of this embodiment. As shown in fig. 2, the combined processing device 20 includes a computing device 201, an interface device 202, a processing device 203, and a DRAM 204.
The computing device 201 is configured to perform user-specified operations, mainly implemented as a single-core smart processor or a multi-core smart processor, to perform deep learning or machine learning computations, which may interact with the processing device 203 through the interface device 202 to collectively perform the user-specified operations.
The interface device 202 is used for transmitting data and control instructions between the computing device 201 and the processing device 203. For example, the computing device 201 may obtain input data from the processing device 203 via the interface device 202, and write to a storage device on the computing device 201. Further, the computing device 201 may obtain the control instruction from the processing device 203 via the interface device 202, and write the control instruction into a control cache on the computing device 201. Alternatively or optionally, the interface device 202 may also read data from a storage device of the computing device 201 and transmit the data to the processing device 203.
The processing device 203, as a general-purpose processing device, performs basic control including, but not limited to, data transfer and starting and/or stopping of the computing device 201. Depending on the implementation, the processing device 203 may be one or more types of central processing unit (CPU), graphics processing unit (GPU) or other general-purpose and/or special-purpose processor, including but not limited to a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, etc., and their number may be determined according to actual needs. As previously mentioned, the computing device 201 of the present disclosure may be viewed on its own as having a single-core structure or a homogeneous multi-core structure. However, when considered together, the computing device 201 and the processing device 203 form a heterogeneous multi-core structure.
The DRAM 204 is used to store the data to be processed. It is typically a DDR memory with a capacity of 16 GB or more, and stores data of the computing device 201 and/or the processing device 203.
In neural network operations, many operators have a reduction function, for example normalization operators such as layernorm, batchnorm, groupnorm, weightnorm and normalize, whose internal implementations require reduction computations. Image and video processing belongs to the field of computer vision; these tasks are major application scenarios of neural network operations and use a large number of normalization operators. Most speech recognition and natural language processing tasks are based on Transformer/BERT networks, which directly use a large number of reduction operators. A Transformer/BERT network solves natural language translation problems entirely with an attention mechanism. The attention mechanism is a problem-solving approach proposed by imitating human attention: put simply, it quickly screens high-value information out of a large amount of information, and addresses the difficulty of obtaining a reasonable final vector representation when the input sequence of a sequential model (such as an LSTM/RNN model) is long. The neural network processes pictures, speech, video and the like by executing these operators.
Several basic concepts of the vector (tensor) reduction operation are explained below.
Vector reduction: computing the dimensions corresponding to the axes to be reduced in the vector down to one. The nature of reduction is dimension compression, but there are many ways to compress: summing, averaging, multiplying, taking the maximum, and so on. The most common is summation.
Vector shape (shape): the dimensions of the vector are described. For example, a two-dimensional vector has a shape of (2,3), and the first and second dimensions representing the two-dimensional vector are 2 and 3, respectively. Colloquially, the two-dimensional vector is a two-row three-column vector.
Reduction axis (axis): identifies the dimension of the vector that needs to be reduced. Dimensions and axes are different representations of the vector's dimensions. For example, axis = 0 means the first dimension of the vector is reduced.
The present embodiment explains the above concepts with the most common four-dimensional tensor. The shape of a four-dimensional tensor can be described as nchw, a data format for feature maps with dimensions n, c, h and w, where n denotes batch, h denotes height, w denotes width and c denotes channel. Taking image data as an example, n indicates how many images are in the batch, h indicates how many pixels the image has in the vertical direction, w the number of pixels in the horizontal direction, and c the number of channels (for example, c is 1 for a black-and-white image and 3 for an RGB color image). The shape corresponds to the dimensions: if the shape is arranged in the order nchw, dimension n can be called dimension 0 (axis 0), and reducing along dimension n can be described as reducing along axis 0, which means compressing multiple batches into one batch. Similarly, reducing along dimension c can be described as reducing along axis 1, which represents compressing multiple channels into one channel; reducing along dimension h, i.e. along axis 2, represents compressing the height to one. Which dimension is reduced is mainly determined by the specific application scenario.
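The nchw conventions above can be checked quickly with a small example; NumPy is used here purely for illustration, and the sizes are hypothetical:

```python
# Shapes of single-axis reductions on a hypothetical nchw tensor: 4 RGB images of 32x32 pixels.
import numpy as np

x = np.zeros((4, 3, 32, 32))                   # n=4, c=3, h=32, w=32
print(x.sum(axis=0, keepdims=True).shape)      # (1, 3, 32, 32): batches compressed to one
print(x.sum(axis=1, keepdims=True).shape)      # (4, 1, 32, 32): channels compressed to one
print(x.sum(axis=2, keepdims=True).shape)      # (4, 3, 1, 32):  height compressed to one
```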
The reduction is described in detail below by summing a two-dimensional picture tensor; the process is performed on the board card 10 or the combined processing device 20 and its processor.
Assume a two-dimensional picture vector with shape (2,3): the first and second dimensions of the two-dimensional vector are 2 and 3 respectively, so the vector has two rows and three columns. Assume its specific data is
[[1, 2, 3],
 [4, 5, 6]]
With axis = 0, the processor performs a reduction operation on the first dimension of the vector and compresses it to one, i.e. shape (2,3) is reduced to (1,3). Concretely, reducing along dimension 0 adds the tensor along the 0-dimension (column) direction, so two rows become one row: 1+4=5, 2+5=7, 3+6=9, giving [5, 7, 9], and the shape changes from (2,3) to (1,3). After the reduction, the two-row, three-column vector becomes a one-row, three-column vector. Similarly, with axis = 1 the processor reduces the second dimension of this two-dimensional tensor to a single column: shape (2,3) is reduced to (2,1), accumulation is performed along the row direction, and the columns are compressed into one column, i.e. 1+2+3=6 and 4+5+6=15, so the reduction result is
[[6],
 [15]]
For another example, for a 5-dimensional tensor with shape (3,6,2,3,4), reducing the third axis, i.e. axis = 2, gives a reduced shape of (3,6,1,3,4).
As can be seen from the above examples, reduction compresses dimensions, which reduces storage space.
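The worked example above can be verified against a reference reduction; NumPy is shown here for illustration only:

```python
# Check of the two-row, three-column example: axis = 0 and axis = 1 sum reductions.
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.sum(t, axis=0, keepdims=True))  # [[5 7 9]]       shape (2,3) -> (1,3)
print(np.sum(t, axis=1, keepdims=True))  # [[ 6] [15]]     shape (2,3) -> (2,1)
```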
Natural language processing analyzes words, sentences, semantics or information, and in a natural language processing (NLP) task reduction is used frequently, often as a multi-axis reduction. For example, the weightnorm operator is a common operator in NLP tasks; its function is to normalize the weight w of a convolutional layer and extract the direction vector v and magnitude vector g of w. If the input w has dimensions nchw, the output v also has dimensions nchw. When normalizing the weights, the weights along every dimension except c are uniformly compressed into one; since dimension c represents the number of channels it is not compressed, so the resulting g has dimensions 1c11. Computing g therefore involves a multi-axis reduction in which the n, h and w axes are reduced at the same time.
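As an illustration of the g computation just described, the sketch below reduces the n, h and w axes while keeping the c axis; taking g as an L2 norm and the weight sizes used here are assumptions made only for illustration, not a definitive weightnorm implementation:

```python
# Hedged sketch: magnitude g of weight w with dimensions nchw, reducing n, h and w.
import numpy as np

w = np.random.randn(8, 16, 3, 3)                             # assumed nchw weight sizes
g = np.sqrt(np.sum(w ** 2, axis=(0, 2, 3), keepdims=True))   # multi-axis reduction -> (1, 16, 1, 1)
print(g.shape)
```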
Multi-axis reduction refers to reducing multiple dimensions of a multi-dimensional vector. In a single-axis reduction, axis is a single number, while in a multi-axis reduction axis is an array with two, three or even more entries. Further, the multiple axes to be reduced may be continuous or discontinuous. For example, the weightnorm operator above is a discontinuous reduction. As another example, for a 5-dimensional tensor with shape (3,6,2,3,4), performing a multi-axis reduction with axis = [1,2] gives a reduced shape of (3,1,1,3,4), which is a continuous multi-axis reduction; with axis = [1,3] the reduced shape is (3,1,2,1,4), which is a discontinuous multi-axis reduction.
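The continuous and discontinuous cases above can be illustrated as follows (shapes only, computed with NumPy for reference):

```python
# Continuous vs. discontinuous multi-axis reduction on the 5-dimensional example above.
import numpy as np

x = np.ones((3, 6, 2, 3, 4))
print(x.sum(axis=(1, 2), keepdims=True).shape)  # (3, 1, 1, 3, 4): continuous, axis = [1,2]
print(x.sum(axis=(1, 3), keepdims=True).shape)  # (3, 1, 2, 1, 4): discontinuous, axis = [1,3]
```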
Fig. 3 shows a schematic diagram of a multi-axis reduction. The input data is a five-dimensional vector with dimensions (3,6,2,3,4), reduced with axis = [1,3]. As shown in fig. 3, the processor loads the five-dimensional vector into a first storage space and then performs a reduction operation on the second dimension of the five-dimensional vector stored there, i.e. the dimension corresponding to axis = 1, to obtain a temporary result (3,1,2,3,4). At the same time the processor opens up a second storage space as a temporary space for storing the temporary result (3,1,2,3,4). Next, the processor reads the intermediate result from the second storage space, reduces the dimension corresponding to axis = 3 to obtain the final reduction result (3,1,2,1,4), and stores it in the first storage space, overwriting the original five-dimensional vector. The first and second storage spaces may be the memory device 104 in the board card 10 or the DRAM 204 in the combined processing device 20. The processor is the processing device 203 in the combined processing device.
In summary, this multi-axis reduction is accomplished by looping over single-axis reductions. The temporary result obtained after each reduction needs a newly opened temporary space to be saved, which wastes memory space. Moreover, the temporary result is stored in the temporary space and then read back from it for the next reduction, which is repeated IO and affects the performance of the operator.
An embodiment of the present invention provides an efficient multi-axis reduction method for the multi-axis reduction scenario. The method is applied to the board card 10 or the combined processing device 20 and its processor. Fig. 4 shows a flow chart of this method.
Step 401, setting the dimensions of the image vector as a reduction group. The dimensions of the image vector describe the shape of the image, and the dimensions of the vector are set as a reduction group according to the shape of the image vector. Fig. 5 illustrates an embodiment of the present invention, taking a 5-dimensional tensor (2,3,2,3,6) as an example. Stage 501 shows the vector stored in the first storage space with shape (2,3,2,3,6); its dimensions are set as the reduction group, i.e. the reduction group is (2,3,2,3,6).
Step 402, determining the first axis to be reduced in the reduction group according to a specific order. The vector to be reduced comprises one or more axes to be reduced, the reduction group comprises all dimensions of the vector, and the first axis to be reduced in the reduction group is determined according to a specific order. The specific order is a forward or reverse order: forward means going through the reduction group from left to right, and reverse means going through it from right to left. The specific order may also start from any dimension and proceed in a given direction; the present invention places no limitation on this.
Let the axes to be reduced of the 5-dimensional tensor in stage 501 be axis = [1,3], meaning the axes to be reduced are the second and fourth dimensions of the vector. If the first axis to be reduced is determined in the reduction group in forward order, i.e. from left to right, the first axis to be reduced is dimension 51. If it is determined in reverse order, i.e. from right to left, the first axis to be reduced is dimension 52. This embodiment is described using the reverse order as an example, so the first axis to be reduced is dimension 52.
Step 403, dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced, wherein the first basic block includes all dimensions before the axis to be reduced in the dimension of the image vector, and the second basic block includes all dimensions after the axis to be reduced in the dimension of the image vector.
The reduction group is divided into two parts centered on the first axis to be reduced. The divided first basic block or second basic block contains 0, 1 or more dimensions. When the first axis of the reduction group is itself the axis to be reduced, there is no dimension before the axis to be reduced, so when the reduction group is divided around that axis the first basic block is an empty set. Similarly, when the first axis to be reduced is the last axis of the reduction group, there is no dimension after the axis to be reduced, so the second basic block is an empty set.
In this embodiment, "before" and "after" the axis to be reduced depend on the chosen order. For the forward order, the dimensions to the left of the axis to be reduced are before it and the dimensions to the right are after it; for the reverse order, the dimensions to the right of the axis to be reduced are before it and the dimensions to the left are after it.
Based on the reverse order, after determining that the first axis to be reduced is dimension 52, stage 502 divides the reduction group into basic blocks according to this axis. As shown in stage 502, the reduction group is divided into two parts, a first basic block 513 and a second basic block 523, where the first basic block 513 corresponds to the dimension (6) and the second basic block 523 corresponds to the dimensions (2,3,2).
Step 404, judging whether an axis to be reduced exists in the second basic block.
If an axis to be reduced exists in the second basic block, step 405 is executed to update the reduction group with all dimensions in the second basic block, and steps 402 to 404 are executed again based on the updated reduction group, with a first basic block and a second basic block divided correspondingly, until no axis to be reduced exists in the second basic block. The number of times the reduction group is updated, and hence the number of first basic blocks and second basic blocks generated, corresponds to the number of dimensions to be reduced in the image vector.
In fig. 5, stage 502 shows that the second basic block 523 is (2,3,2). Since axis = [1,3], an axis to be reduced exists in the second basic block 523 (2,3,2), so the reduction group is updated with all dimensions in the second basic block 523; the updated reduction group is (2,3,2). The determining, dividing and judging steps are repeated: the first axis to be reduced of this reduction group is dimension 51, and in stage 503 the vector is divided according to it into a first basic block 533, i.e. (2,3,6), and a second basic block 543, i.e. (2). It is determined that no axis to be reduced exists in the second basic block 543, and step 406 is performed.
Step 406, performing an accumulation operation on the at least one first basic block and the at least one second basic block to obtain a reduction result of the image vector.
As shown in step 405, each axis to be reduced in the vector corresponds to one pair of first and second basic blocks. When the vector has only one axis to be reduced, one first basic block and one second basic block are generated; when the vector has multiple axes to be reduced, multiple first and second basic blocks are generated.
The example of fig. 5 generates two first basic blocks and two second basic blocks: the first basic block 513 and second basic block 523, and the first basic block 533 and second basic block 543. The accumulation operation is performed on the generated first basic blocks 513 and 533 and second basic blocks 523 and 543 to obtain the reduction result of the image vector. The accumulation mode corresponds one-to-one to the reduction mode: if the image vector is reduced by summation, the accumulation is an accumulated addition; if it is reduced by multiplication, the accumulation is an accumulated multiplication. The reduction mode is one of summation, averaging, multiplication, maximum and minimum, and the corresponding accumulation operation is accumulated summation, averaging, multiplication, maximum or minimum.
In this embodiment, the basic block values and offsets corresponding to all axes to be reduced are found first, all values are fetched from the first storage space according to the offsets and computed in a single pass, and the final result is stored back into the first storage space. Based on the basic blocks and offsets, (2,3,2,3,6) is reduced directly to (2,1,2,1,6) in one computation, and the result is stored in the first storage space without opening up an additional intermediate storage space.
As shown in stage 504 in fig. 5, the specific process is:
Obtain the basic block value of each first basic block, where the basic block value is the product of all the non-reduction dimensions in the first basic block, and perform the accumulation operation based on the basic block value. Returning to the example of fig. 5, two first basic blocks 513 and 533 are obtained according to the axes to be reduced of the image vector; the basic block value corresponding to the first basic block 513 is 6, and the basic block value corresponding to the first basic block 533 is 2 × 6 = 12.
Further, this embodiment determines whether the first basic block is empty; if it is, the basic block value is set to 1. When the first axis of the reduction group is itself the dimension to be reduced, dividing the first and second basic blocks around it leaves the first basic block empty, so there are no non-reduction dimensions whose product could give the basic block value; in this case the basic block value is set to 1. There is no empty first basic block in the example of fig. 5.
In the process of fetching data from the image vector, it is necessary to know not only the size of the fetched data, i.e. the basic block value, but also the location from which the data is fetched. Further, the step of performing the accumulation operation also includes: calculating an offset, where the offset is the product of all dimensions in the first basic block; determining the data address of the accumulation operation based on the offset; and fetching data of the size of the basic block value from that data address and performing the accumulation operation.
The offset is the distance between the actual address at which the data is stored in the memory unit and the first address of that memory unit. The address of the data can be obtained from the offset: the first address plus the offset is the actual storage address of the data. The size of the offset is determined by the first basic block: the product of all dimensions in the first basic block is the offset. The offsets correspond one-to-one to the axes to be reduced and the basic block values. During the accumulation, for each of the multiple axes to be reduced, data of the size of the basic block value is fetched from the position given by the offset and used in the operation. In the example of fig. 5, the offset corresponding to the first basic block 513 is 6, and the offset corresponding to the first basic block 533 is 2 × 3 × 6 = 36.
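A small helper, assumed here only for illustration, reproduces the basic block values and offsets quoted above for shape (2,3,2,3,6) with axes to be reduced [1,3]; following the reverse-order convention of fig. 5, the first basic block of an axis consists of the dimensions after it:

```python
# Basic block value = product of the non-reduction dims in the first basic block;
# offset = product of all dims in the first basic block (1 if the block is empty).
import numpy as np

def block_value_and_offset(dims, reduce_axes, axis):
    first_block = dims[axis + 1:]                          # dims after the axis to be reduced
    non_reduced = [d for i, d in enumerate(first_block, start=axis + 1)
                   if i not in reduce_axes]
    block_value = int(np.prod(non_reduced)) if non_reduced else 1
    offset = int(np.prod(first_block)) if first_block else 1
    return block_value, offset

dims, axes = (2, 3, 2, 3, 6), [1, 3]
print(block_value_and_offset(dims, axes, 3))  # (6, 6)    first axis to be reduced (52)
print(block_value_and_offset(dims, axes, 1))  # (12, 36)  second axis to be reduced (51)
```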
In an optional embodiment, the method further comprises: multiplying the non-reduction dimensions of the first basic block within each second basic block to obtain first data; and dividing the accumulation operation into a number of parts of a particular size based on the value of the first data, where those parts perform the accumulation in parallel.
In this embodiment of the invention, the reduction group is divided into three parts according to the axis to be reduced: the first basic block, the axis to be reduced, and the second basic block. These three parts can be abstracted as a three-dimensional vector with a first, second and third dimension respectively. The axis to be reduced is the second dimension, meaning that dimension is compressed to one; the value of the axis to be reduced indicates how many times data must be fetched and accumulated to compress the dimension. For example, if the axis to be reduced has size 3, there are three data elements along that dimension, so three fetch-and-accumulate steps are needed to collapse the three numbers into one. The first basic block is the first dimension, and the basic block value is obtained from it. The value of the first dimension indicates how many data elements at the same position along the second dimension participate in the computation; because the operations (accumulation, memory access, etc.) on these elements are identical, they can be handled as a whole, which is what the basic block value represents. The second basic block corresponds to the third dimension, which can be understood as the number of groups of data to be reduced; the operations on each group are identical and can therefore be performed in parallel.
According to this method of dividing first and second basic blocks, each second basic block is taken as a starting point, the first basic block corresponding to it is found, and the product of all non-reduction dimensions in that first basic block is equivalent to the third dimension of the three-dimensional vector.
Therefore, in fig. 5 the basic block value, offset and parallel count corresponding to the first axis to be reduced, dimension 52, are 6, 6 and 2 respectively; the basic block value, offset and parallel count corresponding to the second axis to be reduced, dimension 51, are 12, 36 and 2 respectively.
The specific accumulation process is shown in stage 504 of FIG. 5. As can be seen from the above steps, the parts labelled (1) and (2) are the reduction processes corresponding to 52 in fig. 5. The value of each small square is one basic block value, i.e. 6. Two adjacent small blocks are separated by the storage distance of one small block, i.e. 6, so the offset of one small block is 6. When only 52 is accumulated (by addition in this embodiment), the addresses of the fetched data are: in process a, the first block is taken at its own position, at address 0 × offset 6 = 0; in process b, the second block is taken and accumulated with the first, at address 1 × offset 6 = 6; in process c, the third block is taken, at address 2 × offset 6 = 12. Data of the size of the basic block value, 6, is fetched from each of these three addresses and accumulated. The part labelled (2) runs in parallel with (1), and its accumulation process is exactly the same.
Similarly, if only 51 in fig. 5 is reduced, the first three dimensions (2,3,6) of the vector (2,3,2,3,6) are regarded as a whole D, and the second dimension of the vector (2,3,D) is reduced. The analysis of the above steps gives a basic block value of 12 and an offset of 36. Besides that calculation, the basic block value and offset can be understood as follows: in stage 504, labels (1) and (2) are the reduction processes corresponding to 52 in fig. 5, run in parallel, and the basic block size of label (1) is 6, so labels (1) and (2) together form two groups of 6; the basic block value corresponding to 51 in fig. 5 is therefore 12. For the offset, the first fetch is at the first address, i.e. the first number in label (1), while the second fetch is the first number in label (3); the numbers contained in labels (1) and (2) lie in between and have already been consumed during the first reduction, so the offset here is 6 × 3 × 2 = 36. The fetch addresses are therefore: the first fetch at the first address, 0 × 36 = 0; the second, process d, at 1 × 36 = 36; the third, process e, at 2 × 36 = 72. Data of the size of the basic block value (i.e. 12) is fetched from each of these three addresses and accumulated.
According to the above analysis, the non-reduction dimensions of the first basic block within each second basic block are multiplied to obtain the first data, and the accumulation operation is divided into parts of a particular size based on the value of the first data, with those parts performing the accumulation in parallel. The parallel count corresponding to the second axis to be reduced (dimension 51) is therefore 2, and the two parts perform identical operations, so the other part can run exactly the same operations in parallel with this one (the other part is not shown in the figure).
In the computation, after the processor determines the basic block value, offset and parallel count corresponding to each axis to be reduced, it fetches the corresponding basic blocks from their respective positions at the same time, performs the accumulation in a single pass, and finally stores the accumulation result back into the first storage space. No additional intermediate storage space needs to be opened up, so space is not wasted, intermediate I/O operations are avoided, and operation efficiency is improved.
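To make the single-pass idea concrete, the following sketch performs a multi-axis sum over a flat, row-major buffer using basic blocks and offsets and writes the result without any intermediate workspace. It is a minimal illustration under the assumptions stated in the comments, not the claimed implementation:

```python
# Minimal single-pass multi-axis sum over a contiguous (row-major) buffer.
# The helper name and the use of NumPy/itertools are illustrative assumptions.
import numpy as np
from itertools import product

def multi_axis_sum(x, reduce_axes):
    dims = list(x.shape)
    flat = np.ascontiguousarray(x).reshape(-1)               # data stays in its original storage
    ndim = x.ndim
    in_stride = [int(np.prod(dims[d + 1:])) for d in range(ndim)]   # per-axis offsets

    inner = max(reduce_axes)                                  # innermost axis to be reduced
    block = int(np.prod(dims[inner + 1:])) if inner + 1 < ndim else 1   # basic block value
    keep_outer = [d for d in range(inner) if d not in reduce_axes]

    out_shape = [1 if d in reduce_axes else dims[d] for d in range(ndim)]
    out_stride = [int(np.prod(out_shape[d + 1:])) for d in range(ndim)]
    out = np.zeros(int(np.prod(out_shape)))

    # the iterations of this outer loop are independent and could run in parallel
    for keep_idx in product(*[range(dims[d]) for d in keep_outer]):
        in_base = sum(i * in_stride[d] for i, d in zip(keep_idx, keep_outer))
        out_base = sum(i * out_stride[d] for i, d in zip(keep_idx, keep_outer))
        acc = np.zeros(block)
        for red_idx in product(*[range(dims[d]) for d in reduce_axes]):
            off = in_base + sum(i * in_stride[d] for i, d in zip(red_idx, reduce_axes))
            acc += flat[off:off + block]                      # fetch one basic block and accumulate
        out[out_base:out_base + block] = acc                  # write back once, no workspace
    return out.reshape(out_shape)

x = np.arange(2 * 3 * 2 * 3 * 6, dtype=np.float64).reshape(2, 3, 2, 3, 6)
assert np.allclose(multi_axis_sum(x, [1, 3]), x.sum(axis=(1, 3), keepdims=True))
```

The offsets computed inside the sketch (6 for the inner axis, 36 for the outer one) match the values derived above for the (2,3,2,3,6) example.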
Fig. 6 shows a flow chart of a method of reducing multi-dimensional image vectors according to another embodiment of the invention.
Step 601, setting the dimensions of the image vector as a reduction group, as in step 401 of fig. 4. The dimensions of the image vector describe the shape of the image, and the dimensions of the vector are set as a reduction group. For example, a 5-dimensional tensor with shape (3,6,2,3,4) corresponds to a reduction group of (3,6,2,3,4).
Step 602, determining the first axis to be reduced in the reduction group according to a specific sequence, as in step 402 of fig. 4. The vector to be reduced comprises one or more axes to be reduced, the reduction group comprises all dimensions of the vector, and the first axis to be reduced in the reduction group is determined according to a specific sequence. Wherein the specific order is a forward or reverse order. The forward direction represents the order from left to right according to the reduction group, and the reverse direction is the order from right to left according to the reduction axis. In addition, the specific order may also be an order from any dimension along a certain direction, and the present invention is not limited in any way.
Step 603, dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced, which is the same as step 403 in fig. 4. Wherein the first basic block comprises all dimensions before an axis to be reduced among the dimensions of the image vector, and the second basic block comprises all dimensions after the axis to be reduced among the dimensions of the image vector.
The reduction group is divided into two parts centered on the first axis to be reduced. The divided first basic block or second basic block contains 0, 1 or more dimensions. When the first axis of the reduction group is the axis to be reduced, there is no dimension before the axis to be reduced, so when the reduction group is divided around that axis the first basic block is an empty set. Similarly, when the first axis to be reduced is the last axis of the reduction group, there is no dimension after it, so the second basic block is an empty set.
Step 604, judging whether the first axis of the second basic block is an axis to be reduced. As seen in step 603, the divided second basic block may contain 0, 1 or more dimensions. When the second basic block is not empty, its dimensions may contain dimensions to be reduced or dimensions not to be reduced, and the order of reduction axes and non-reduction axes is not fixed. If the first axis of the second basic block is an axis to be reduced, go to step 605.
Step 605, fusing the first axis to be reduced in the reduction group with this first axis. Fusing means multiplying the sizes of the two axes; the resulting product is the size of the fused axis.
Step 606, updating the fused axis as the first axis to be reduced in the reduction group, and executing the dividing step again according to the updated first axis to be reduced.
For example, the 5-dimensional tensor in step 601 has dimensions (3,6,2,3,4), and the axes to be reduced are axis = [1,2,3]. First, the first axis to be reduced is determined in forward order: axis = 1, corresponding to the second dimension 6 of the vector. The dimensions of the vector are divided into a first basic block (3) and a second basic block (2,3,4). Next, it is judged whether the first axis of the second basic block is an axis to be reduced. Since axis = [1,2,3], the dimensions to be reduced are the second, third and fourth dimensions, and the first axis of the second basic block corresponds to the third dimension of the five-dimensional tensor, which is indeed an axis to be reduced. Therefore the first axis to be reduced in the reduction group is fused with this axis: the dimension 6 corresponding to the first axis to be reduced is fused with the dimension 2 corresponding to the first axis of the second basic block into one dimension 6 × 2 = 12. The fused axis becomes the first axis to be reduced; its size is no longer 6 but the fused result 12, the dimensions of the vector are updated to (3,12,3,4), and the axes to be reduced become axis = [1,2]. The dividing step is executed again on the updated reduction group (3,12,3,4): the first axis to be reduced is axis = 1 (dimension 12), the vector is divided into a first basic block (3) and a second basic block (3,4), and the first axis of the second basic block (the dimension 3, corresponding to axis = 2) is again an axis to be reduced, so another fusion gives 12 × 3 = 36, updating the dimensions to (3,36,4) with axis = [1]. Dividing once more gives a first basic block (3) and a second basic block (4), and the first axis of the second basic block is judged to be a non-reduction axis.
Returning to step 604, if the first axis of the second basic block is a non-reduction axis, step 607 is executed, i.e. step 404 and the following steps of fig. 4 are executed, and the reduction result of the image vector is finally obtained.
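For reference, the fusion loop of this embodiment can be sketched as follows, under the assumption that the tensor is stored contiguously so that two adjacent axes can be merged simply by multiplying their sizes; the function name is illustrative, not taken from the patent:

```python
# Repeatedly fuse the first axis to be reduced with the first axis of its second
# basic block while that axis is also an axis to be reduced.
def fuse_adjacent_reduce_axes(dims, reduce_axes):
    dims, axes = list(dims), sorted(reduce_axes)
    i = 0
    while i < len(axes):
        ax = axes[i]
        while ax + 1 in axes:                         # first axis of the second basic block
            dims[ax] *= dims.pop(ax + 1)              # fuse: multiply the two adjacent sizes
            axes.remove(ax + 1)
            axes = [a - 1 if a > ax + 1 else a for a in axes]
        i += 1
    return dims, axes

print(fuse_adjacent_reduce_axes([3, 6, 2, 3, 4], [1, 2, 3]))  # ([3, 36, 4], [1])
```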
Fig. 7 shows a flow diagram of a method of reducing multi-dimensional image vectors according to another embodiment of the invention.
Step 700, image vector normalization. The acquired image vector may be an original picture vector or an optimized image vector. The dimensions of the image vector include dimensions to be reduced and dimensions not to be reduced. The image vector is unified into a standard form before the reduction is performed, so that the subsequent reduction computation is more convenient. The standard form means that in the dimensions of the picture vector, the dimensions to be reduced and the dimensions not to be reduced alternate. The normalization of the image vector specifically comprises the following steps:
Step 710, judging whether consecutive axes to be reduced or consecutive non-reduction axes exist in the dimensions of the image vector. The standard form of an image vector alternates the axes to be reduced with the non-reduction axes. When consecutive axes to be reduced or consecutive non-reduction axes are encountered, they need to be processed into the standard form. In the judging process, the axes to be reduced and the non-reduction axes of the picture vector are found and given different marks. The marks may be numbers, letters or the like, as long as the axes to be reduced can be distinguished from the non-reduction axes; the present invention places no limitation on this.
Further, it is judged whether the marked picture vector has consecutive identical marks; if so, the picture vector has consecutive axes to be reduced or consecutive non-reduction axes. For example, the dimensions of a set of image vectors are (1,2,3,4,5,6,7,8,9,10,11,12), where axis = [2,4,5,6,8,10]. The dimensions of the image vector are given different marks according to axis: an axis to be reduced is denoted by A, a non-reduction axis by D, and the data of each axis is distinguished by a numeric subscript. After marking, fig. 8 shows a schematic diagram of the multi-axis reduction: the original dimensions 801 are (D0, D1, A0, D2, A1, A2, A3, D3, A4, D4, A5, D5), where A denotes an axis to be reduced and D a non-reduction axis. It is then easy to judge whether consecutive A's or consecutive D's exist, i.e. whether there are consecutive axes to be reduced or consecutive non-reduction axes.
Optionally, when judging whether consecutive axes to be reduced or consecutive non-reduction axes exist, it may also be judged one by one whether each dimension is of the same type as the previous one, where the same type means both are dimensions to be reduced or both are dimensions not to be reduced.
Step 720, if consecutive axes to be reduced or consecutive non-reduction axes exist in the dimensions of the image vector, fusing the consecutive axes to be reduced or the consecutive non-reduction axes.
Fusing means converting several axes to be reduced, or several non-reduction axes, into one axis to be reduced or one non-reduction axis in the form of a product. In fig. 8, according to the judgment result (D0, D1, A0, D2, A1, A2, A3, D3, A4, D4, A5, D5), the dimensions corresponding to consecutive axes to be reduced or consecutive non-reduction axes are multiplied to obtain a new axis to be reduced or a new non-reduction axis, giving the updated dimensions 802.
Step 730, updating the dimensions according to the fusion result. The fused dimensionality is smaller than that of the original image vector, and the pre-fusion dimensions of the image vector are replaced with the new fused dimensions, so that the fused dimensions alternate between axes to be reduced and non-reduction axes. In fig. 8, the updated dimensions 803 are (D0, A0, D1, A1, D2, A2, D3, A3, D4), and the updated dimensions are set as the reduction group.
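The normalization of step 700 can be sketched as below, again assuming a contiguous layout so that fusing a run of same-type axes is simply the product of their sizes; the function name and the concrete sizes are illustrative assumptions:

```python
# Fuse runs of consecutive reduction axes and consecutive non-reduction axes so
# that the resulting dimensions alternate (the standard form described above).
def normalize(dims, reduce_axes):
    fused_dims, fused_axes, i = [], [], 0
    while i < len(dims):
        is_red = i in reduce_axes
        size, j = dims[i], i + 1
        while j < len(dims) and (j in reduce_axes) == is_red:
            size *= dims[j]                          # fuse a run of same-type axes
            j += 1
        if is_red:
            fused_axes.append(len(fused_dims))
        fused_dims.append(size)
        i = j
    return fused_dims, fused_axes

dims = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(normalize(dims, [2, 4, 5, 6, 8, 10]))
# ([2, 3, 4, 210, 8, 9, 10, 11, 12], [1, 3, 5, 7])  ->  D0 A0 D1 A1 D2 A2 D3 A3 D4
```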
After the processing of step 700, the image vector to be processed is converted into an image vector in a standard form.
Step 701, setting the dimensions of the image vector as a reduction group, as in step 401 of fig. 4. The updated dimensions 803 of the image vector in fig. 8 are (D0, A0, D1, A1, D2, A2, D3, A3, D4), and these dimensions are set as the reduction group.
Step 702, determining the first axis to be reduced in the reduction group according to a specific order. This step is the same as step 402 of fig. 4, where the specific order is a forward or reverse order. The updated dimensions 804 in fig. 8 are processed in forward order, and the first axis to be reduced is A0, i.e. the shaded area in the figure.
Step 703, dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced. This step is the same as step 403 of fig. 4: the first basic block includes all dimensions of the image vector before the axis to be reduced, and the second basic block includes all dimensions of the image vector after the axis to be reduced. As shown in fig. 8, the reduction group is divided into a first basic block D0 and a second basic block (D1, A1, D2, A2, D3, A3, D4).
Step 704, judging whether an axis to be reduced exists in the second basic block. This step is the same as step 404 of fig. 4; if an axis to be reduced exists in the second basic block, step 705 is performed.
Step 705, which is the same as step 405 of fig. 4, updates the reduction group with all dimensions in the second basic block and returns to step 702 to execute steps 702 to 704 until no axis to be reduced exists in the second basic block; otherwise, step 706 is performed.
Step 706, which is the same as step 406 of fig. 4, performs an accumulation operation on the at least one first basic block and the at least one second basic block to obtain a reduction result of the image vector.
Returning to the example of fig. 8, the second basic block (D1, A1, D2, A2, D3, A3, D4) still contains an axis to be reduced, A1. Therefore step 705 is executed to update the reduction group with all dimensions of the second basic block; the updated reduction group is (D1, A1, D2, A2, D3, A3, D4). Steps 702 to 704 are executed again: the first axis to be reduced in the reduction group is determined to be A1, and the reduction group is divided based on this axis into a first basic block (D0, A0, D1) and a second basic block (D2, A2, D3, A3, D4). The second basic block (D2, A2, D3, A3, D4) still contains an axis to be reduced, so the reduction group is updated with all its dimensions to (D2, A2, D3, A3, D4). Steps 702 to 704 are executed again: the first axis to be reduced is A2, and the reduction group is divided into a first basic block (D0, A0, D1, A1, D2) and a second basic block (D3, A3, D4). An axis to be reduced still exists in the second basic block (D3, A3, D4), so the reduction group is updated to (D3, A3, D4). Steps 702 to 704 are executed once more: the first axis to be reduced is A3, and the reduction group is divided into a first basic block (D0, A0, D1, A1, D2, A2, D3) and a second basic block (D4). The second basic block no longer contains an axis to be reduced, so step 706 is executed. The basic block information 805 in fig. 8 shows the basic block value, offset and parallel count corresponding to each axis to be reduced.
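The partition loop walked through above can be sketched as follows (forward order; the function is an illustration, not code from the patent):

```python
# Repeatedly split around the first axis to be reduced until the second basic
# block contains no reduction axis; prints the walk-through of fig. 8.
def partition(ndim, reduce_axes):
    groups, group = [], list(range(ndim))            # the current reduction group
    while True:
        red = [a for a in group if a in reduce_axes]
        if not red:                                  # no axis to be reduced remains
            break
        ax = red[0]                                  # first axis to be reduced (forward order)
        first = list(range(ax))                      # all dims of the vector before the axis
        second = [d for d in group if d > ax]        # all dims of the group after the axis
        groups.append((first, ax, second))
        group = second                               # update the reduction group
    return groups

labels = ["D0", "A0", "D1", "A1", "D2", "A2", "D3", "A3", "D4"]
for first, ax, second in partition(len(labels), [1, 3, 5, 7]):
    print([labels[d] for d in first], labels[ax], [labels[d] for d in second])
```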
Step 706, performing an accumulation operation on the at least one first basic block and the at least one second basic block to obtain the reduction result of the image vector. The basic block value and the offset corresponding to each axis to be reduced are determined, the storage address of the data is found according to the offset, and data of the size of the basic block value, starting from that storage address, is accumulated to obtain the final reduction result 806. The specific accumulation steps are the same as step 406 and are not repeated here.
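As a concrete illustration of the accumulation in step 706, the sketch below sums a row-major flat buffer over one axis by accumulating contiguous blocks whose position is derived from products of the dimensions before and after that axis. The outer/block arithmetic is an assumption chosen for illustration and does not necessarily match the exact basic block value and offset scheme of basic block information 805.

```python
import numpy as np

def reduce_axis_sum(flat, shape, axis):
    """Accumulate a row-major flat buffer over `axis`; return the reduced flat buffer."""
    outer = int(np.prod(shape[:axis], dtype=np.int64))      # number of outer repetitions
    block = int(np.prod(shape[axis + 1:], dtype=np.int64))  # contiguous block accumulated at once
    n = shape[axis]
    out = np.zeros(outer * block, dtype=flat.dtype)
    for o in range(outer):
        base = o * n * block
        for k in range(n):
            start = base + k * block
            out[o * block:(o + 1) * block] += flat[start:start + block]
    return out

x = np.arange(2 * 3 * 4, dtype=np.float32)
assert np.allclose(reduce_axis_sum(x, (2, 3, 4), 1),
                   x.reshape(2, 3, 4).sum(axis=1).ravel())
```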
FIG. 9 shows a multidimensional vector reduction apparatus 900, which is adapted to perform the above method. The apparatus 900 includes a setting unit 901, a determining unit 902, a dividing unit 903, a judging unit 904, an updating unit 905, and a calculating unit 906.
The setting unit 901 is configured to set the reduction dimensions of the image vector as a reduction group. The dimensions of the image vector describe the shape of the image, and the dimensions to be reduced are set as the reduction group according to the shape of the image vector.
The determining unit 902 is configured to determine a first axis to be reduced in the reduction group according to a specific order. The vector to be reduced includes one or more axes to be reduced, and the reduction group includes all dimensions of the vector; the determining unit 902 first determines the first axis to be reduced in the reduction group according to the specific order. The specific order is a forward or reverse order: the forward order traverses the reduction group from left to right, and the reverse order traverses it from right to left. The specific order may also start from any dimension and proceed along a given direction, and the present invention is not limited in this respect.
The dividing unit 903 is configured to divide the reduction group into a first basic block and a second basic block based on the axis to be reduced, where the first basic block includes all dimensions of the image vector before the axis to be reduced, and the second basic block includes all dimensions of the image vector after the axis to be reduced. "Before" and "after" are relative to the specific order: in the forward order, dimensions to the left of the axis to be reduced are before it and dimensions to the right are after it; in the reverse order, dimensions to the right of the axis to be reduced are before it and dimensions to the left are after it.
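A small illustration of this convention, using the axis names of the running example (the lists below are purely illustrative):

```python
dims = ["D0", "A0", "D1", "A1", "D2"]
pos = dims.index("A1")  # the axis to be reduced
# forward order: dimensions to the left are "before", to the right are "after"
forward_first, forward_second = dims[:pos], dims[pos + 1:]   # ["D0","A0","D1"], ["D2"]
# reverse order: dimensions to the right are "before", to the left are "after"
reverse_first, reverse_second = dims[pos + 1:], dims[:pos]   # ["D2"], ["D0","A0","D1"]
```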
The judging unit 904 is configured to judge whether an axis to be reduced exists in the second basic block. If an axis to be reduced exists in the second basic block, the updating unit 905 updates the reduction group with all dimensions in the second basic block. Based on the updated reduction group, the setting unit 901, the determining unit 902, the dividing unit 903, the judging unit 904 and the updating unit 905 re-execute the above operations until no axis to be reduced remains in the second basic block.
The calculating unit 906 is configured to perform an accumulation operation on the at least one first basic block and the at least one second basic block to obtain the reduction result of the image vector. Each axis to be reduced in the vector corresponds to a pair of first and second basic blocks: when the vector has only one axis to be reduced, one first basic block and one second basic block are generated; when the vector has a plurality of axes to be reduced, a plurality of first and second basic blocks are generated.
The calculating unit 906 is further configured to obtain the basic block value of each first basic block and to perform the accumulation operation based on the basic block value, where the basic block value is the product of all non-reduction dimensions in the first basic block.
The judging unit 904 is further configured to judge whether the first basic block is empty; if the first basic block is empty, the calculating unit 906 sets the basic block value corresponding to the first basic block to 1.
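A minimal sketch of this rule, assuming the first basic block is given as a list of axis names together with a size table (both names are illustrative, not from the patent):

```python
def basic_block_value(first_block, sizes, reduce_axes):
    # product of the non-reduction dimension sizes; an empty first basic block
    # (or one containing only axes to be reduced) defaults to 1
    value = 1
    for d in first_block:
        if d not in reduce_axes:
            value *= sizes[d]
    return value

# e.g. for the first basic block (D0, A0, D1) with D0=2, A0=3, D1=4:
# basic_block_value(["D0", "A0", "D1"], {"D0": 2, "A0": 3, "D1": 4}, {"A0"}) == 8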
The judging unit 904 is further configured to judge whether the first axis of the second basic block is an axis to be reduced. The apparatus 900 further includes a fusion unit 907: if so, the fusion unit 907 fuses the first axis to be reduced in the reduction group with that first axis, and the updating unit 905 updates the fused axis to be the first axis to be reduced in the reduction group. The dividing unit 903 then performs the dividing step according to the updated first axis to be reduced.
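One way to realize this fusion, sketched below under the assumption that axes are tracked as named sizes, is to merge each run of adjacent axes that are all marked for reduction into a single axis whose size is the product of the run (the helper name fuse_adjacent_reduce_axes is illustrative, not taken from the patent):

```python
def fuse_adjacent_reduce_axes(dims, sizes, reduce_axes):
    fused_dims, fused_sizes = [], {}
    for d in dims:
        if fused_dims and fused_dims[-1] in reduce_axes and d in reduce_axes:
            # fold this axis into the preceding axis to be reduced
            fused_sizes[fused_dims[-1]] *= sizes[d]
        else:
            fused_dims.append(d)
            fused_sizes[d] = sizes[d]
    return fused_dims, fused_sizes

# e.g. (D0, A0, A1, D1) with A0=3 and A1=5 becomes (D0, A0, D1) with A0=15
```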
The calculating unit 906 is further configured to calculate an offset, where the offset is the product of all dimensions in the first basic block; the calculating unit determines the data address for the accumulation operation based on the offset, takes out data of the size of the basic block value from that data address, and performs the accumulation operation.
The calculating unit 906 is further configured to multiply the non-reduction dimensions in the first basic block in each second basic block to obtain first data, and to divide the accumulation operation into a plurality of portions of a certain size based on the value of the first data, where the plurality of portions perform the accumulation in parallel.
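A hedged sketch of such a parallel split follows: the accumulation is cut into a number of independent portions that are summed concurrently and then combined. The thread pool and even chunking are assumptions made for illustration, not the addressing of the patented scheme.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def parallel_accumulate(flat, parts):
    chunks = np.array_split(flat, parts)           # portions of roughly equal size
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(np.sum, chunks))  # each portion accumulates in parallel
    return float(np.sum(partials))                 # combine the partial results

x = np.arange(1000, dtype=np.float64)
assert parallel_accumulate(x, parts=4) == x.sum()
```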
Another embodiment of the present invention is a computer readable storage medium having stored thereon computer program code for reducing a multidimensional vector; when the computer program code is executed by a server, the method described above is carried out. The server includes a processor and a memory, the memory stores the computer program code, and the processor executes the computer program code in the memory. In some implementation scenarios, the integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated units may be stored in a computer readable memory. In this regard, when the aspects of the present invention are embodied in a software product (e.g., a computer readable storage medium), the software product may be stored in a memory and may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device) to perform some or all of the steps of the methods described in the embodiments of the present invention. The memory may include, but is not limited to, a USB flash drive, a flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
According to different application scenarios, the electronic device or apparatus of the present invention may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a visual terminal, an autonomous-driving terminal, a vehicle, a household appliance, and/or a medical device. The vehicles include airplanes, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance apparatuses, B-mode ultrasound scanners, and/or electrocardiographs. The electronic device or apparatus of the present invention can also be applied to the fields of the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical care, and the like. Furthermore, the electronic device or apparatus can be used in cloud, edge, and terminal application scenarios related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, an electronic device or apparatus with high computational power according to the present invention may be applied to a cloud device (e.g., a cloud server), while an electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device, according to the hardware information of the terminal device and/or the edge device, to simulate the hardware resources of the terminal device and/or the edge device, thereby achieving unified management, scheduling, and cooperative work of device-cloud integration or cloud-edge-device integration.
It should be noted that, for the sake of simplicity, the present invention presents some methods and embodiments thereof as a series of acts and combinations thereof, but those skilled in the art will appreciate that the solution of the present invention is not limited by the order of the acts described. Accordingly, based on the disclosure or teachings of the present invention, persons skilled in the art will appreciate that certain steps may be performed in other orders or simultaneously. Further, those skilled in the art will appreciate that the embodiments described herein may be regarded as optional embodiments, in that the acts and modules involved are not necessarily required by the present invention. In addition, the descriptions of different embodiments of the present invention emphasize different aspects. In view of this, those skilled in the art will understand that, for portions not described in detail in one embodiment of the present invention, reference may be made to the related descriptions of other embodiments.
In particular implementations, based on the disclosure and teachings of the present invention, a person of ordinary skill in the art will appreciate that the several embodiments disclosed herein can also be practiced in ways not described here. For example, the units in the foregoing embodiments of the electronic device or apparatus are split according to their logical functions, and other ways of splitting may be used in an actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features or functions of a unit or component may be selectively disabled. In terms of the connections between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
In the present invention, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, part or all of the units can be selected to achieve the purpose of the scheme of the embodiment of the invention. In addition, in some scenarios, multiple units in an embodiment of the present invention may be integrated into one unit or each unit may exist physically separately.
In other implementation scenarios, the integrated unit may also be implemented in hardware, that is, as a specific hardware circuit, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, devices such as transistors or memristors. In this regard, the various devices described herein (e.g., computing devices or other processing devices) may be implemented by suitable hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including a magnetic storage medium or a magneto-optical storage medium, etc.), and may be, for example, a resistive random access memory (RRAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), an enhanced dynamic random access memory (EDRAM), a high bandwidth memory (HBM), a hybrid memory cube (HMC), a ROM, or a RAM.
The foregoing may be better understood in light of the following clauses:
Clause A1. A method of reducing a multi-dimensional image vector, wherein the method comprises: setting the reduction dimensions of the image vector as a reduction group; determining a first axis to be reduced in the reduction group according to a specific order; dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced, wherein the first basic block comprises all dimensions before the axis to be reduced in the reduction dimensions of the image vector, and the second basic block comprises all dimensions after the axis to be reduced in the reduction dimensions of the image vector; judging whether an axis to be reduced exists in the second basic block; if so, performing the following steps: updating the reduction group with all dimensions within the second basic block, and executing the determining, dividing and judging steps until no axis to be reduced exists in the second basic block; and performing an accumulation operation on the at least one first basic block and the at least one second basic block to obtain a reduction result of the image vector.
Clause A2. The method of Clause A1, wherein the step of performing the accumulation operation comprises: obtaining a basic block value of each first basic block, wherein the basic block value is the product of all non-reduction dimensions in the first basic block; and performing the accumulation operation based on the basic block value.

Clause A3. The method of Clause A2, wherein the step of performing the accumulation operation further comprises: judging whether the first basic block is empty; and if empty, setting the basic block value to 1.

Clause A4. The method of Clause A1, wherein the dividing step comprises: judging whether the first axis of the second basic block is an axis to be reduced; if so, fusing the first axis to be reduced in the reduction group with the first axis; updating the fused axis to be the first axis to be reduced in the reduction group; and executing the dividing step according to the updated first axis to be reduced.

Clause A5. The method of Clause A1, further comprising: judging whether consecutive axes to be reduced or consecutive non-reduction axes exist in the reduction dimensions of the image vector; if so, fusing the consecutive axes to be reduced or the consecutive non-reduction axes; and updating the reduction dimensions according to the fusion result.

Clause A6. The method of Clause A2, wherein the step of performing the accumulation operation further comprises: calculating an offset, wherein the offset is the product of all dimensions in the first basic block; determining the data address of the accumulation operation based on the offset; and taking out data of the size of the basic block value from the data address and performing the accumulation operation.

Clause A7. The method of Clause A4 or A5, wherein the step of performing the accumulation operation further comprises: multiplying the non-reduction dimensions in the first basic block in each second basic block to obtain first data; and dividing the accumulation operation into a plurality of portions of a certain size based on the value of the first data, wherein the plurality of portions perform the accumulation in parallel.
Clause A8. The method of Clause A1, wherein the specific order is a forward or reverse order.

Clause A9. The method of any one of Clauses A1-A8, wherein the accumulation operation comprises one of cumulative summing, averaging, multiplying, maximizing, and minimizing.

Clause A10. An electronic device, comprising: a processor; and a memory for storing executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of Clauses A1-A10.

Clause A11. A computer readable storage medium having stored thereon computer program code for reducing a multidimensional image vector, wherein the computer program code, when executed by a processing device, performs the method of any one of Clauses A1 to A10.
Clause A12. An apparatus for reducing a multi-dimensional vector, the apparatus comprising a setting unit, a determining unit, a dividing unit, a judging unit, an updating unit, and a calculating unit; the setting unit is configured to set the reduction dimensions of the image vector as a reduction group; the determining unit is configured to determine a first axis to be reduced in the reduction group according to a specific order; the dividing unit is configured to divide the reduction group into a first basic block and a second basic block based on the axis to be reduced, wherein the first basic block comprises all dimensions of the image vector before the axis to be reduced, and the second basic block comprises all dimensions of the image vector after the axis to be reduced; the judging unit is configured to judge whether an axis to be reduced exists in the second basic block, and if an axis to be reduced exists in the second basic block, the updating unit updates the reduction group with all dimensions in the second basic block; according to the updated reduction group, the setting unit, the determining unit, the dividing unit, the judging unit and the updating unit execute the above operations again until no axis to be reduced remains in the second basic block; and the calculating unit is configured to perform an accumulation operation on the at least one first basic block and the at least one second basic block to obtain the reduction result of the image vector.
Clause A13. The apparatus of Clause A12, wherein the calculating unit is further configured to obtain a basic block value of each first basic block and perform the accumulation operation based on the basic block value, wherein the basic block value is the product of all non-reduction dimensions in the first basic block.

Clause A14. The apparatus of Clause A13, wherein the judging unit is further configured to judge whether the first basic block is empty, and if the first basic block is empty, the calculating unit is further configured to set the basic block value corresponding to the first basic block to 1.

Clause A15. The apparatus of Clause A12, wherein the judging unit is further configured to judge whether the first axis of the second basic block is an axis to be reduced; the apparatus further comprises a fusion unit, and if so, the fusion unit is configured to fuse the first axis to be reduced in the reduction group with the first axis, the updating unit is configured to update the fused axis to be the first axis to be reduced in the reduction group, and the dividing unit performs the dividing step according to the updated first axis to be reduced.

Clause A16. The apparatus of Clause A12, wherein the calculating unit is further configured to calculate an offset, wherein the offset is the product of all dimensions in the first basic block; the calculating unit determines the data address of the accumulation operation based on the offset, takes out data of the size of the basic block value from the data address, and performs the accumulation operation.

Clause A17. The apparatus of Clause A12, wherein the calculating unit is further configured to multiply the non-reduction dimensions in the first basic block in each second basic block to obtain first data, and to divide the accumulation operation into a plurality of portions of a certain size based on the value of the first data, wherein the plurality of portions perform the accumulation in parallel.

Clause A18. The apparatus of Clause A12, wherein the specific order is a forward or reverse order.

Clause A19. The apparatus of any one of Clauses A12-A18, wherein the accumulation operation comprises one of cumulative summing, averaging, multiplying, maximizing, and minimizing.
The above embodiments of the present invention have been described in detail, and specific examples have been used herein to explain the principles and implementations of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A method of reducing a multi-dimensional image vector, the method comprising:
setting the reduction dimension of the image vector as a reduction group;
determining a first axis to be reduced in the reduction group according to a specific order;
dividing the reduction group into a first basic block and a second basic block based on the axis to be reduced, wherein the first basic block comprises all dimensions before the axis to be reduced in the reduction dimension of the image vector, and the second basic block comprises all dimensions after the axis to be reduced in the reduction dimension of the image vector;
judging whether an axis to be reduced exists in the second basic block;
if present, performing the following steps:
updating the reduction group with all dimensions within the second base block; and
executing the determining, dividing and judging steps until no axis to be reduced exists in the second basic block; and
performing an accumulation operation on the at least one first basic block and the at least one second basic block to obtain a reduction result of the image vector.
2. The method of claim 1, wherein the step of accumulating comprises:
obtaining a basic block value of each first basic block, wherein the basic block value is a product of all the non-reduction dimensions in the first basic block;
and performing accumulation operation based on the basic block value.
3. The method of claim 2, wherein the step of accumulating further comprises:
judging whether the first basic block is empty or not; and
if empty, setting the basic block value to 1.
4. The method of claim 1, wherein the dividing step comprises:
judging whether the first axis of the second basic block is an axis to be reduced;
if so, fusing the first axis to be reduced in the reduction group with the first axis;
updating the fused axis to be the first axis to be reduced in the reduction group; and
executing the dividing step according to the updated first axis to be reduced.
5. The method of claim 1, further comprising:
judging whether consecutive axes to be reduced or consecutive non-reduction axes exist in the reduction dimensions of the image vector;
if so, fusing the consecutive axes to be reduced or the consecutive non-reduction axes; and
updating the reduction dimensions according to the fusion result.
6. The method of claim 2, wherein the step of accumulating further comprises:
calculating an offset, wherein the offset is a product of all dimensions in the first basic block;
determining a data address of a cumulative operation based on the offset;
and taking out the data with the size of the basic block value from the data address, and performing accumulation operation.
7. The method of claim 4 or 5, wherein the step of accumulating further comprises:
multiplying the non-reduction dimensions in the first basic block in each second basic block to obtain first data; and
the method further includes dividing the accumulate operation into a plurality of portions of a particular size based on the value of the first data, wherein the plurality of portions perform accumulate operations in parallel.
8. The method of claim 1, wherein the specific order is a forward or reverse order.
9. The method of any one of claims 1-8, wherein the accumulation operation comprises one of accumulating sums, averaging, multiplying, maximizing, and minimizing.
10. An electronic device, comprising:
a processor;
a memory for storing executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 10.
11. A computer readable storage medium having stored thereon computer program instructions for reducing a multidimensional image vector, wherein the computer program instructions, when executed by a server, implement the method of any one of claims 1 to 10.
CN202011551576.9A 2020-12-24 2020-12-24 Method for reducing multidimensional vector, electronic equipment and storage medium Pending CN114677549A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011551576.9A CN114677549A (en) 2020-12-24 2020-12-24 Method for reducing multidimensional vector, electronic equipment and storage medium
PCT/CN2021/133658 WO2022135049A1 (en) 2020-12-24 2021-11-26 Method, electronic device, and storage medium for reducing multi-dimensional vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011551576.9A CN114677549A (en) 2020-12-24 2020-12-24 Method for reducing multidimensional vector, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114677549A true CN114677549A (en) 2022-06-28

Family

ID=82071185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011551576.9A Pending CN114677549A (en) 2020-12-24 2020-12-24 Method for reducing multidimensional vector, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114677549A (en)
WO (1) WO2022135049A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6017335B2 (en) * 2013-02-06 2016-10-26 株式会社東芝 Pattern recognition apparatus, method thereof, and program thereof
US11216281B2 (en) * 2019-05-14 2022-01-04 International Business Machines Corporation Facilitating data processing using SIMD reduction operations across SIMD lanes
CN110209503B (en) * 2019-08-01 2019-10-25 上海燧原智能科技有限公司 Specification calculation method, device, equipment and the medium of multidimensional tensor

Also Published As

Publication number Publication date
WO2022135049A1 (en) 2022-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination