CN111984189A - Neural network computing device, data reading method, data storage method and related equipment

Info

Publication number: CN111984189A
Application number: CN202010713923.7A
Authority: CN
Inventors: 蒋文
Original assignee: Shenzhen Intellifusion Technologies Co Ltd
Current assignee: Shenzhen Intellifusion Technologies Co Ltd
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2020-11-24
Anticipated expiration: 2040-07-22
Also published as: CN111984189B; WO2022016926A1

Abstract

The embodiments of the application provide a neural network computing device, a data reading method, a data storage method and related equipment. The neural network computing device comprises a system control module, a computing module, a bus control interface module and an internal storage module. The system control module is communicatively connected to the computing module, and the bus control interface module is communicatively connected to the computing module and to the internal storage module. The neural network computing device is communicatively connected to an external storage module through the bus control interface module. First source image pixel point data is stored in the internal storage module and second source image pixel point data is stored in the external storage module; together they form the pixel point data of a source image. Because the pixel point data of the source image is stored in blocks across the internal storage module and the external storage module, the pixel point data can be read efficiently and the calculation efficiency is improved.

Description

Neural network computing device, data reading method, data storage method and related equipment

Technical Field

The present application relates to the field of neural network technologies, and in particular, to a neural network computing device, a data reading method, a data storing method, and a related apparatus.

Background

A common convolutional neural network (CNN) can explicitly learn translational invariance and implicitly learn rotational invariance. Attention models suggest, however, that rather than letting the network learn a capability implicitly, it is preferable to design an explicit processing module for the network that specifically handles such transformations. DeepMind therefore designed the Spatial Transform Network (STN), which implements these transformations in three parts: parameter prediction, coordinate mapping, and pixel sampling. The pixel point coordinates of the target image obtained through STN transformation are regular; for example, the pixel points of a target image of width w and height h run from (0, 0) to (w-1, h-1). The pixel point coordinates of the source image calculated from them through coordinate mapping, by contrast, are random. In the prior art, the pixel point data of the source image is stored externally, so during pixel sampling a spatial transform network computing device must frequently access an external double data rate synchronous dynamic random access memory (DDR) or random access memory (RAM) to read the pixel point data of the source image, which makes the computing efficiency low.

Disclosure of Invention

The embodiment of the application discloses a neural network computing device, a data reading and data storing method and related equipment.

The first aspect of the embodiment of the application discloses a neural network computing device, which comprises a system control module, a computing module, a bus control interface module and an internal storage module, wherein the system control module is in communication connection with the computing module;

the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, and the first source image pixel point data and the second source image pixel point data form pixel point data of a source image;

the system control module is used for sending a calculation starting signal to the calculation module, and the calculation starting signal comprises target image pixel point coordinates and transformation parameters;

the computing module is used for computing storage address information of third source image pixel point data according to the target image pixel point coordinates and the transformation parameters and sending the storage address information to the bus control interface module;

the bus control interface module is used for reading the third source image pixel point data from the internal storage module and/or the external storage module according to the storage address information and sending the third source image pixel point data to the computing module;

and the calculating module is also used for calculating to obtain target image pixel point data according to the third source image pixel point data.

The second aspect of the embodiment of the present application discloses a data reading method, which is applied to a neural network computing device, wherein the neural network computing device comprises an internal storage module, the neural network computing device is in communication connection with an external storage module, the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, and the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the method comprises the following steps:

determining pixel point coordinates of a source image;

if the vertical coordinate of the pixel point coordinate of the source image is not larger than the preset threshold value, reading third source image pixel point data from the internal storage module according to the pixel point coordinate of the source image;

and if the vertical coordinate of the pixel point coordinate of the source image is larger than the preset threshold value, reading the pixel point data of the third source image from the external storage module according to the pixel point coordinate of the source image.

A third aspect of the embodiments of the present application discloses a data storage method, which is applied to a neural network computing device, where the neural network computing device includes an internal storage module, and the neural network computing device is communicatively connected to an external storage module, and the method includes:

acquiring pixel point data of a source image;

storing pixel point data of pixel points of a first source image of which the vertical coordinate is not more than a preset threshold value in the source image into the internal storage module, and storing pixel point data of pixel points of a second source image of which the vertical coordinate is more than the preset threshold value in the source image into the external storage module.

The fourth aspect of the embodiment of the present application discloses a data reading device, which is applied to a neural network computing device, wherein the neural network computing device includes an internal storage module, the neural network computing device is in communication connection with an external storage module, the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the data reading device includes:

the determining unit is used for determining the coordinates of pixel points of the source image;

the reading unit is used for reading third source image pixel point data from the internal storage module according to the source image pixel point coordinate if the vertical coordinate of the source image pixel point coordinate is not larger than a preset threshold value;

the reading unit is further configured to read third source image pixel point data from the external storage module according to the source image pixel point coordinate if the vertical coordinate of the source image pixel point coordinate is greater than the preset threshold.

In a fifth aspect of the embodiments of the present application, a data storage device is disclosed, which is applied to a neural network computing device, where the neural network computing device includes an internal storage module, the neural network computing device is communicatively connected to an external storage module, and the data storage device includes:

the acquisition unit is used for acquiring pixel point data of a source image;

the storage unit is used for storing pixel point data of pixel points of a first source image of which the vertical coordinate is not more than a preset threshold value in the source image into the internal storage module, and storing pixel point data of pixel points of a second source image of which the vertical coordinate is more than the preset threshold value in the source image into the external storage module.

A sixth aspect of embodiments of the present application discloses a neural network computing device, comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method of any of the second aspects of embodiments of the present application.

A seventh aspect of embodiments of the present application discloses a neural network computing device, comprising a processor, a memory, a communication interface, and one or more programs, stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method according to any one of the third aspects of embodiments of the present application.

An eighth aspect of the embodiments of the present application discloses a chip, which includes: a processor, configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method according to any one of the second aspect or the third aspect of the embodiments of the present application.

A ninth aspect of the present application embodiment discloses a computer-readable storage medium, which is characterized by storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of the second aspect or the third aspect of the present application embodiment.

A tenth aspect of embodiments of the present application discloses a computer program that causes a computer to perform the method according to any one of the second or third aspects of embodiments of the present application.

By implementing the embodiment of the application, the pixel point data of a source image is stored in the internal storage module and the external storage module in blocks, the internal storage module stores the pixel point data of a first source image, the external storage module stores the pixel point data of a second source image, and the pixel point data of the first source image and the pixel point data of the second source image form all the pixel point data of the source image; in the neural network computing device, a system control module sends a computing starting signal comprising the coordinates of pixel points of a target image and transformation parameters to a computing module; the calculation module calculates to obtain the storage address information of the pixel point data of the third source image according to the pixel point coordinates of the target image and sends the storage address information to the bus control interface module; the bus control interface module reads third source image pixel point data from the internal storage module and/or the external storage module according to the storage address information and sends the third source image pixel point data to the calculation module; the calculation module calculates to obtain target image pixel point data according to the third source image pixel point data and the transformation parameters; the neural network computing device can efficiently read pixel point data from the memory in the computing process because the pixel point data of the source image is stored in the internal storage module and the external storage module in a blocking mode, and therefore computing efficiency is improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings in the following description illustrate only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic architecture diagram of a spatial transform network according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a process of spatial transformation provided in an embodiment of the present application;

FIG. 3 is a schematic process diagram of another spatial transformation provided by an embodiment of the present application;

FIG. 4 is a diagram illustrating bilinear interpolation provided in an embodiment of the present application;

FIG. 5 is a schematic flow chart illustrating a data storage method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an image block storage according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of a pixel data storage according to an embodiment of the present application;

FIG. 8 is a schematic diagram of another pixel point data storage provided in an embodiment of the present application;

FIG. 9 is a schematic diagram of another pixel data storage according to an embodiment of the present application;

FIG. 10 is a schematic flowchart of a data reading method according to an embodiment of the present application;

FIG. 11 is a schematic diagram illustrating a process of calculating an address offset according to an embodiment of the present application;

FIG. 12 is a diagram illustrating another address offset calculation process provided in an embodiment of the present application;

FIG. 13 is a schematic diagram illustrating a process for calculating an address offset according to an embodiment of the present application;

FIG. 14 is a schematic diagram illustrating a process of calculating a further address offset according to an embodiment of the present application;

FIG. 15 is a schematic structural diagram of a data storage address according to an embodiment of the present application;

FIG. 16 is a schematic structural diagram of a neural network computing device according to an embodiment of the present application;

FIG. 17 is a schematic structural diagram of an address calculation unit according to an embodiment of the present application;

FIG. 18 is a schematic diagram illustrating a pixel coordinate determination according to an embodiment of the present application;

FIG. 19 is a schematic structural diagram of a data storage device according to an embodiment of the present application;

FIG. 20 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present application;

FIG. 21 is a schematic structural diagram of a neural network computing device provided in an embodiment of the present application;

FIG. 22 is a schematic structural diagram of another neural network computing device provided by an embodiment of the present application;

FIG. 23 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

To facilitate an understanding of the present application, relevant technical knowledge related to embodiments of the present application will be first introduced herein.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the architecture of a spatial transform network according to an embodiment of the present application. The architecture is divided into three parts: parameter prediction, coordinate mapping, and pixel sampling. As illustrated in FIG. 1, source data (U) is input to the parameter prediction part of the spatial transform network, which predicts the spatial transformation parameters. The spatial transformation parameters are then passed to the coordinate mapping part, which performs coordinate mapping according to the given target data coordinates and the spatial transformation parameters to obtain source data coordinates. The source data coordinates are passed to the pixel sampling part, which fetches data from the source data (U) according to the source data coordinates to obtain the target data (V). The parts of the spatial transform network are described separately below.

(1) Parameter prediction

Suppose that after the source image l-1 is processed by the neural network, a target image l is obtained.

Referring to FIG. 2, FIG. 2 is a schematic diagram of a spatial transformation process according to an embodiment of the present application. The pixel point at coordinate $a_{13}^{l-1}$ in source image l-1 is spatially transformed to obtain the pixel point at coordinate $a_{23}^{l}$ in target image l, and the pixel point at coordinate $a_{23}^{l-1}$ in source image l-1 is spatially transformed to obtain the pixel point at coordinate $a_{33}^{l}$ in target image l.

Referring to FIG. 3, FIG. 3 is a schematic diagram of another spatial transformation process provided in the present application. The pixel point at coordinate $a_{11}^{l-1}$ in source image l-1 is spatially transformed to obtain the pixel point at coordinate $a_{13}^{l}$ in target image l, the pixel point at coordinate $a_{12}^{l-1}$ in source image l-1 is spatially transformed to obtain the pixel point at coordinate $a_{23}^{l}$ in target image l, and the pixel point at coordinate $a_{13}^{l-1}$ in source image l-1 is spatially transformed to obtain the pixel point at coordinate $a_{33}^{l}$ in target image l.

A series of spatial transformation parameters w is produced in the spatial transformation process and can be predicted through parameter prediction. Let (x', y') be coordinates in source image l-1 and (x, y) be coordinates in target image l; the calculation formulas of the transformations and the corresponding transformation parameters are shown below.

a. Image amplification:

wherein, the parameter 2 represents that the source image l-1 is magnified by 2 times to obtain the target image l.

b. Image reduction:

wherein, the parameter 0.5 represents that the source image l-1 is reduced to 0.5 times its original size to obtain the target image l.

c. Image rotation:

d. Image cutting:

It can be seen that all of these transformations are controlled by only 6 parameters. The source image U can therefore be used as input, and the spatial transformation parameters w are regressed through several consecutive layers (convolutional layers, fully connected (FC) layers, and the like) for use in the subsequent coordinate mapping calculation.

(2) Coordinate mapping

For given target image pixel point coordinates (x, y), the source image pixel point coordinates (x', y') can be calculated by formula (5) from the 6 parameters a, b, c, d, e and f predicted by the parameter prediction part of the spatial transform network.
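As a sketch only, assuming formula (5) takes the conventional affine form (the placement of a, b, c, d, e and f within the matrix and offset vector is an assumption, not confirmed by the text above), the mapping from target coordinates (x, y) to source coordinates (x', y') could be written as:

```latex
\begin{bmatrix} x' \\ y' \end{bmatrix}
=
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
+
\begin{bmatrix} e \\ f \end{bmatrix}
```

Under this assumed arrangement, x' = ax + by + e and y' = cx + dy + f, so each target pixel point requires four multiplications and four additions.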

(3) Pixel sampling

In practice, the pixel point coordinates (x', y') obtained from coordinate mapping are generally fractional. Rounding them down and up yields 4 integer source image pixel point coordinates (Sx, Sy) in the source image, and the final result is then obtained by bilinear interpolation.

Referring to FIG. 4, FIG. 4 is a schematic diagram of bilinear interpolation provided in the embodiment of the present application. Assuming that the calculated pixel point coordinates (x', y') are (1.6, 2.4), the rounded source image pixel point coordinates (Sx, Sy) are (1, 2), (2, 2), (1, 3) and (2, 3), and the result is then obtained through bilinear interpolation using the fractional part (0.6, 0.4); the bilinear interpolation is shown in formula (6).

Generally, in a hardware design, the parameter prediction of the first step can be implemented by a neural network processing core, a digital signal processor (DSP), or the like, while the subsequent coordinate mapping and pixel sampling can be implemented by dedicated hardware.

The present application designs hardware that implements functions such as coordinate mapping and pixel sampling in a spatial transform network. The source image is stored in blocks and storage address calculation logic is designed, so that large images can be accessed dynamically and data can be read from memory efficiently. The technical solution provided by the application is described in detail below with reference to specific implementations.
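The rounding and interpolation steps can be illustrated with a minimal Python sketch. It uses the standard bilinear weighting, which may differ in notation from formula (6); the function names, the `image[y][x]` list-of-rows layout, and the absence of boundary handling are all assumptions made for illustration.

```python
import math


def neighbour_coords(xs, ys):
    """Return the 4 integer source coordinates (Sx, Sy) around a fractional (xs, ys)."""
    x0, y0 = math.floor(xs), math.floor(ys)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]


def bilinear_sample(image, xs, ys):
    """Bilinearly interpolate image (indexed image[y][x]) at fractional (xs, ys).

    Assumes all four neighbours lie inside the image (no border handling).
    """
    x0, y0 = math.floor(xs), math.floor(ys)
    fx, fy = xs - x0, ys - y0  # fractional parts, e.g. (0.6, 0.4) for (1.6, 2.4)
    # Interpolate along x on the upper and lower rows, then along y.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x0 + 1]
    bottom = (1 - fx) * image[y0 + 1][x0] + fx * image[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


# Hypothetical 4x4 test image whose value at (x, y) is 10*y + x.
demo_image = [[10 * y + x for x in range(4)] for y in range(4)]
print(neighbour_coords(1.6, 2.4))
print(bilinear_sample(demo_image, 1.6, 2.4))
```

Because the four neighbours are always needed together, how quickly they can be fetched from memory directly bounds the sampling throughput.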

Referring to fig. 5, fig. 5 is a schematic flowchart of a data storage method according to an embodiment of the present disclosure, where the data storage method is applied to a neural network computing device, the neural network computing device includes an internal storage module, and the neural network computing device is communicatively connected to an external storage module, and the method includes:

Step 501: acquiring pixel point data of the source image.

Step 502: storing pixel point data of pixel points of a first source image of which the vertical coordinate is not more than a preset threshold value in the source image into the internal storage module, and storing pixel point data of pixel points of a second source image of which the vertical coordinate is more than the preset threshold value in the source image into the external storage module.

The data storage method can be used for storing image data in STN algorithm implementation and other algorithm implementations needing to store large-size image data or other data.

For example, referring also to FIG. 6, which is a schematic diagram of image block storage provided in the embodiment of the present application: each time a neural network computing task is started, the input source image needs to be stored. A parameter read_height may be set for the storage process. Pixel point data from the first row to row read_height of the image (that is, pixel coordinates with ordinates from 0 to read_height-1) is stored in the internal storage module, and pixel point data from row read_height+1 to the last row of the image (that is, ordinates from read_height to height-1) is stored in the external storage module (for example, DDR or other storage space).

As can be seen, in this example, the image is stored in blocks, and a part of the image blocks are stored in the internal storage module and another part of the image blocks are stored in the external storage module in the neural network computing device, so that the computing process is not limited by the size of the image, and even if the image is large in size, efficient storage can be achieved, and the computing efficiency is improved.
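The block split can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the device's implementation: the image is modelled as a list of rows, and the preset threshold is taken to be read_height - 1, so that rows with ordinate 0 to read_height-1 go to the internal store. The function name is hypothetical.

```python
def split_source_image(image, read_height):
    """Split an image (list of rows) into internal and external blocks.

    Rows with ordinate 0..read_height-1 go to the internal block;
    rows with ordinate read_height..height-1 go to the external block.
    """
    if not 0 <= read_height <= len(image):
        raise ValueError("read_height must lie between 0 and the image height")
    internal_rows = image[:read_height]
    external_rows = image[read_height:]
    return internal_rows, external_rows


# Hypothetical 4-row, 3-column image; each pixel is labelled by its (x, y).
demo = [[(x, y) for x in range(3)] for y in range(4)]
internal, external = split_source_image(demo, read_height=2)
print(len(internal), len(external))
```

Choosing read_height to match the capacity of the internal storage module keeps as many rows as possible on the fast path while leaving the image size itself unconstrained.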

In some possible examples, the internal storage module includes a first storage unit, a second storage unit, a third storage unit and a fourth storage unit, and the storing, in the internal storage module, pixel point data of a first source image pixel point whose ordinate in the source image is not greater than a preset threshold includes: storing pixel point data of pixel points of a first source image with even ordinate and even abscissa in a first storage unit; storing pixel point data of pixel points of a first source image with an even ordinate and an odd abscissa in a second storage unit; storing pixel point data of pixel points of a first source image with odd ordinate and even abscissa in a third storage unit; and storing the pixel point data of the pixel points of the first source image with odd ordinate and odd abscissa in a fourth storage unit.

For example, referring also to FIG. 7, which is a schematic diagram of a pixel data storage according to an embodiment of the present application: the pixel point data of the first source image pixel point $a_{22}^{l-1}$, whose ordinate and abscissa are both even, is stored in the first storage unit; the pixel point data of the first source image pixel points $a_{21}^{l-1}$ and $a_{23}^{l-1}$, whose ordinate is even and abscissa is odd, is stored in the second storage unit; the pixel point data of the first source image pixel points $a_{12}^{l-1}$ and $a_{32}^{l-1}$, whose ordinate is odd and abscissa is even, is stored in the third storage unit; and the pixel point data of the first source image pixel points $a_{11}^{l-1}$, $a_{13}^{l-1}$, $a_{31}^{l-1}$ and $a_{33}^{l-1}$, whose ordinate and abscissa are both odd, is stored in the fourth storage unit.

Therefore, in this example, the pixel data of the image is respectively stored in 4 different storage units according to the parity properties of the horizontal and vertical coordinates of the pixel, and the parity properties of the horizontal and vertical coordinates of the pixel in each storage unit are the same, so that when the pixel data is read, the pixel data of 4 pixels can be synchronously read, and the reading efficiency of the pixel data is improved.
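A small Python sketch can make the parity rule concrete. The function names and the dictionary-based image model are assumptions for illustration; storage units are numbered 0 to 3 here for the first to fourth storage units.

```python
def bank_index(x, y):
    """Map a pixel's (abscissa x, ordinate y) to a storage unit 0..3.

    0: y even, x even   1: y even, x odd
    2: y odd,  x even   3: y odd,  x odd
    """
    return (y % 2) * 2 + (x % 2)


def distribute(pixels):
    """Distribute {(x, y): data} into 4 lists, row by row, left to right."""
    banks = [[], [], [], []]
    for (x, y) in sorted(pixels, key=lambda p: (p[1], p[0])):
        banks[bank_index(x, y)].append(pixels[(x, y)])
    return banks


# The 3x3 example of FIG. 7, with a_{yx} labelled by ordinate then abscissa.
fig7 = {(x, y): f"a{y}{x}" for y in range(1, 4) for x in range(1, 4)}
for unit, contents in enumerate(distribute(fig7), start=1):
    print(unit, contents)
```

The 4 neighbours used by bilinear interpolation, (x0, y0), (x0+1, y0), (x0, y0+1) and (x0+1, y0+1), always differ in the parity of x or y, so they fall into 4 different storage units and can be fetched in the same cycle.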

In some possible examples, the storing, in the first storage unit, pixel point data of a pixel point of the first source image whose ordinate and abscissa are both even numbers includes: arranging the first source image pixel points with even ordinate and abscissa from small to large according to the ordinate to obtain a first arrangement sequence, wherein the first source image pixel points with the same ordinate in the first arrangement sequence are arranged from small to large according to the abscissa; and storing the pixel point data of the pixel points of the first source image with even ordinate and even abscissa in the first storage unit according to the first arrangement sequence.

For example, referring also to FIG. 8, which is a schematic diagram of another pixel point data storage provided in the present embodiment: as shown in FIG. 8, the pixel point data is stored in the first storage unit in the first arrangement order $a_{22}^{l-1}$, $a_{24}^{l-1}$, $a_{26}^{l-1}$, $a_{42}^{l-1}$, $a_{44}^{l-1}$, $a_{46}^{l-1}$, where pixel points with the same ordinate (i.e. the same row) are ordered from small to large by abscissa, e.g. $a_{22}^{l-1}$, $a_{24}^{l-1}$, $a_{26}^{l-1}$ or $a_{42}^{l-1}$, $a_{44}^{l-1}$, $a_{46}^{l-1}$.

The pixel point data is stored in the second storage unit, the third storage unit and the fourth storage unit in the same manner, which is not specifically illustrated here.

As can be seen, in this example, the data in the same row is stored in ascending order of abscissa, which ensures that data from the same row is adjacent in storage and keeps the same relative order as the pixel points in the image.
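The first arrangement order amounts to sorting by ordinate first and abscissa second. The following Python sketch illustrates this for the first storage unit; the function name and the (x, y) tuple representation are assumptions for illustration.

```python
def first_arrangement_order(coords):
    """Return the even/even pixel coordinates sorted by ordinate, then abscissa.

    coords is an iterable of (x, y) tuples in any order.
    """
    even_even = [(x, y) for (x, y) in coords if x % 2 == 0 and y % 2 == 0]
    return sorted(even_even, key=lambda p: (p[1], p[0]))


# A hypothetical 4-row, 6-column image with 1-based coordinates, visited in reverse.
all_coords = [(x, y) for y in range(4, 0, -1) for x in range(6, 0, -1)]
order = first_arrangement_order(all_coords)
print([f"a{y}{x}" for (x, y) in order])
```

The same sort key applies to the other three storage units; only the parity filter changes.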

In some possible examples, the storing, in an external storage module, pixel point data of a second source image pixel point whose ordinate in the source image is greater than the preset threshold includes: arranging second source image pixels with vertical coordinates larger than the preset threshold value in the source images from small to large according to the vertical coordinates to obtain a second arrangement sequence, wherein the second source image pixels with the same vertical coordinates in the second arrangement sequence are arranged from small to large according to the horizontal coordinates; and storing the pixel point data of the pixel points of the second source image in the external storage module according to the second arrangement sequence.

For example, referring also to FIG. 9, which is a schematic diagram of another pixel data storage provided in the embodiment of the present application: as shown in FIG. 9, assume that the pixel point data of the first and second rows of the image is stored in the internal storage module and the pixel point data of the third and fourth rows is stored in the external storage module. When the data is stored in the external storage module, it is stored in the order $a_{31}^{l-1}$, $a_{32}^{l-1}$, $a_{33}^{l-1}$, $a_{41}^{l-1}$, $a_{42}^{l-1}$, $a_{43}^{l-1}$, where pixel points with the same ordinate (i.e. the same row) are sorted from small to large by abscissa, e.g. $a_{31}^{l-1}$, $a_{32}^{l-1}$, $a_{33}^{l-1}$ or $a_{41}^{l-1}$, $a_{42}^{l-1}$, $a_{43}^{l-1}$.

Therefore, in this example, the data in the same row is stored in ascending order of abscissa, so that data from the same row is adjacent in storage and keeps the same relative order as the pixel points in the image, which facilitates reading the pixel point data of adjacent pixel points.
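One consequence of the second arrangement order is that a pixel point's position in the external storage module can be derived from its coordinates alone. The Python sketch below assumes a plain row-major layout with one slot per pixel point and 0-based coordinates; the function name, the slot granularity, and the use of read_height as the first external row are assumptions, and the address calculation actually used by the device may differ.

```python
def external_offset(x, y, width, read_height):
    """Row-major slot index of pixel (x, y) within the external block.

    Rows read_height..height-1 are assumed to be stored back to back,
    each row holding `width` pixel points in ascending abscissa order.
    """
    if y < read_height:
        raise ValueError("pixel lies in the internal block")
    if not 0 <= x < width:
        raise ValueError("abscissa out of range")
    return (y - read_height) * width + x


# FIG. 9 with 0-based coordinates: rows 2 and 3 of a 3-column image are external.
for y in (2, 3):
    for x in range(3):
        print((x, y), external_offset(x, y, width=3, read_height=2))
```

Horizontally adjacent pixel points map to consecutive slots, so the two neighbours on one row can be fetched with a single contiguous access.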

Referring to fig. 10, fig. 10 is a schematic flow chart of a data reading method provided in an embodiment of the present application, where the data reading method is applied to a neural network computing device, the neural network computing device includes an internal storage module, the neural network computing device is in communication connection with an external storage module, the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, and the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the method includes:

Step 1001: determine the source image pixel point coordinates.

Step 1002: if the ordinate of the source image pixel point coordinates is not greater than a preset threshold, read third source image pixel point data from the internal storage module according to the source image pixel point coordinates.

Step 1003: if the ordinate of the source image pixel point coordinates is greater than the preset threshold, read the third source image pixel point data from the external storage module according to the source image pixel point coordinates.

Specifically, pixel point data corresponding to source image pixel point coordinates whose ordinate is not greater than the preset threshold may be stored in the internal storage module as the first source image pixel point data, and pixel point data corresponding to source image pixel point coordinates whose ordinate is greater than the preset threshold may be stored in the external storage module as the second source image pixel point data. When the neural network computing device performs a calculation, it can therefore determine from the source image pixel point coordinates whether to read the pixel point data from the internal storage module or from the external storage module.

The data reading method can be used for reading image data in STN algorithm implementation and other algorithm implementations needing to store large-size image data or other data.

In this example, the image is stored in blocks: one part of the image blocks is stored in the internal storage module of the neural network computing device, and the other part is stored in the external storage module. After determining the source image pixel point coordinates, the neural network computing device determines whether the ordinate of those coordinates is greater than the preset threshold. When the ordinate is not greater than the preset threshold, the pixel point data is read from the internal storage module; when the ordinate is greater than the preset threshold, the pixel point data is read from the external storage module. The calculation process is therefore not limited by the size of the image: even if the image is large, the pixel point data can still be read.
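The threshold test in steps 1002 and 1003 can be illustrated with a minimal Python sketch. The function name `read_source_pixel`, the list-of-rows layout, and the 0-based row numbering are assumptions made for illustration only; the application does not specify them.

```python
def read_source_pixel(sx, sy, threshold, internal_rows, external_rows):
    """Read pixel (sx, sy), choosing the storage module by ordinate only.

    Assumed layout (hypothetical): internal_rows holds image rows
    0..threshold, external_rows holds rows threshold+1 onwards.
    """
    if sy <= threshold:
        # Ordinate not greater than the threshold: internal storage.
        return internal_rows[sy][sx]
    # Ordinate greater than the threshold: external storage.
    return external_rows[sy - (threshold + 1)][sx]


# 4x3 image: rows 0-1 kept internally, rows 2-3 kept externally.
internal = [[11, 12, 13], [21, 22, 23]]
external = [[31, 32, 33], [41, 42, 43]]
print(read_source_pixel(2, 1, 1, internal, external))  # 23
print(read_source_pixel(0, 3, 1, internal, external))  # 41
```

Only the ordinate is compared against the threshold, so the routing decision costs a single comparison per pixel point.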

In some possible examples, the determining source image pixel point coordinates includes: acquiring coordinates of pixel points of a target image; and calculating to obtain the pixel point coordinates of the source image according to the pixel point coordinates of the target image.

For example, when the neural network computing device is used to implement the STN algorithm, coordinate mapping can be performed on given target image pixel point coordinates to calculate the source image pixel point coordinates. For example, given target image pixel point coordinates (x, y), the pixel point coordinates (x', y') obtained through coordinate mapping are generally fractional, and 4 source image pixel point coordinates (Sx, Sy) can be found in the source image by rounding them to integers.

Therefore, in this example, given the pixel point coordinates of the target image, the pixel point coordinates of the source image can be obtained through calculation, and then the pixel point data of the source image can be read according to the pixel point coordinates of the source image.
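As a rough illustration of this mapping, the sketch below assumes a 6-parameter affine transform and takes the 4 neighbours by rounding down and adding 1. Both the parameter layout `w = (w0, ..., w5)` and the floor-based neighbour selection are assumptions; the application only states that the mapped coordinates are rounded to find 4 source image pixel points.

```python
import math


def map_to_source(x, y, w):
    """Map target coordinates (x, y) to fractional source coordinates.

    Assumed (hypothetical) parameter layout:
        x' = w0*x + w1*y + w2
        y' = w3*x + w4*y + w5
    """
    xs = w[0] * x + w[1] * y + w[2]
    ys = w[3] * x + w[4] * y + w[5]
    return xs, ys


def four_neighbours(xs, ys):
    """Return the 4 integer coordinates (Sx, Sy) surrounding (xs, ys)."""
    x0, y0 = math.floor(xs), math.floor(ys)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]


xs, ys = map_to_source(3, 4, (1, 0, 0.5, 0, 1, 0.25))
print(xs, ys)                     # 3.5 4.25
print(four_neighbours(1.6, 2.4))  # [(1, 2), (2, 2), (1, 3), (2, 3)]
```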

In some possible examples, the reading third source image pixel point data from the internal storage module according to the source image pixel point coordinates includes: calculating to obtain first storage address information according to the pixel point coordinates of the source image; and reading third source image pixel point data from the internal storage module according to the first storage address information.

Specifically, storage address logic is designed so that, when pixel point data is stored in the internal storage module, each piece of pixel point data has first storage address information that is related to its pixel point coordinates. The first storage address information can therefore be calculated from the source image pixel point coordinates, and the source image pixel point data can be read according to the first storage address information.

Therefore, in this example, by designing the storage address logic, the first storage address information of the stored pixel point data is related to the pixel point coordinates and can be calculated from the source image pixel point coordinates, so that the pixel point data can be read and the efficiency of data reading is improved.

In some possible examples, the calculating the first storage address information according to the pixel point coordinates of the source image includes: if the ordinate and the abscissa of the pixel point coordinate of the source image are both even numbers, determining a first address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is even and the abscissa is odd, determining a second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is odd and the abscissa is even, determining a third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; and if the ordinate and the abscissa of the pixel coordinates of the source image are both odd numbers, determining a fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image.

The address offset is an offset from the start address, that is, an offset from the 0 address.

Specifically, the pixel point data of the source image can be stored in blocks according to the 4 cases given by the parity of the ordinate and abscissa of the pixel point: pixel point data whose ordinate and abscissa are both even is stored in one block, pixel point data whose ordinate is even and abscissa is odd is stored in one block, pixel point data whose ordinate is odd and abscissa is even is stored in one block, and pixel point data whose ordinate and abscissa are both odd is stored in one block. The storage addresses in these blocks correspond to the first address offset, the second address offset, the third address offset and the fourth address offset, respectively.

Therefore, in this example, the pixel point data of the image is stored in 4 different storage areas according to the parity of the abscissa and ordinate of the pixel point, and all pixel points in a given storage area have the same coordinate parity, so that the pixel point data of 4 pixel points can be read synchronously, which improves the reading efficiency of the pixel point data.

In some possible examples, referring to fig. 11 together, the determining a first address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image includes: dividing the width of the source image by 2 and then rounding to obtain a first width value; if the first width value is an odd number, adding 1 to the first width value to obtain a second width value; if the first width value is an even number, taking the first width value as the second width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a first ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a first abscissa value; and multiplying the second width value by the first ordinate value, and adding the first abscissa value to obtain the first address offset.

In fig. 11, the symbol ">> 1" denotes a binary right shift by one bit, which is equivalent to dividing by 2 and rounding down.

For example, assume that the source image is 16 pixels wide and that pixel point $a_{22}^{l-1}$ on the source image has coordinates (Sx, Sy) = (2, 2). The first address offset is calculated as follows: the width of the source image is 16, which divided by 2 and rounded gives a first width value of 8; the first width value 8 is even, so it is taken as the second width value, i.e., the second width value is 8; the ordinate of the source image pixel point coordinates is 2, which divided by 2 and rounded gives a first ordinate value of 1; the abscissa of the source image pixel point coordinates is 2, which divided by 2 and rounded gives a first abscissa value of 1; multiplying the second width value by the first ordinate value and adding the first abscissa value gives the first address offset, i.e., 8 × 1 + 1 = 9. Therefore, in the storage area for even ordinate and even abscissa, the offset of source image pixel point coordinates (2, 2) relative to the start address 0 of that storage area is 9.

Therefore, in the example, the storage address information of the pixel point data of the source image is determined according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image, which is beneficial to efficiently reading the pixel point data.
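The steps for the first address offset can be written as a short Python sketch. The function name `first_address_offset` is hypothetical, and rounding is assumed to mean rounding down, consistent with the ">> 1" notation of fig. 11.

```python
def first_address_offset(sx, sy, width):
    """Offset inside the even-ordinate / even-abscissa storage area."""
    first_width = width >> 1  # width / 2, rounded down
    # Round an odd half-width up to the next even number.
    second_width = first_width + 1 if first_width % 2 == 1 else first_width
    first_ordinate = sy >> 1  # ordinate / 2, rounded down
    first_abscissa = sx >> 1  # abscissa / 2, rounded down
    return second_width * first_ordinate + first_abscissa


print(first_address_offset(2, 2, 16))  # 9
```

The printed value reproduces the worked example for coordinates (2, 2) and a width of 16.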

In some possible examples, referring collectively to fig. 12, the determining a second address offset based on the ordinate and the abscissa of the source image pixel point coordinate and the width of the source image comprises: dividing the width of the source image by 2 to obtain a third width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a second ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a second abscissa value; and multiplying the third width value by the second ordinate value, and adding the second abscissa value to obtain the second address offset.

For example, assume that the source image is 16 pixels wide and that pixel point $a_{23}^{l-1}$ on the source image has coordinates (Sx, Sy) = (3, 2). The second address offset is calculated as follows: the width of the source image divided by 2 gives a third width value of 8; the ordinate of the source image pixel point coordinates is 2, which divided by 2 and rounded gives a second ordinate value of 1; the abscissa of the source image pixel point coordinates is 3, which divided by 2 and rounded gives a second abscissa value of 1; multiplying the third width value by the second ordinate value and adding the second abscissa value gives the second address offset, i.e., 8 × 1 + 1 = 9. Therefore, in the storage area for even ordinate and odd abscissa, the offset of source image pixel point coordinates (3, 2) relative to the start address 0 of that storage area is 9.
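A minimal Python sketch of the second address offset follows. The function name `second_address_offset` is hypothetical, and the division of the width by 2 is assumed to be an integer division, which the text does not state explicitly.

```python
def second_address_offset(sx, sy, width):
    """Offset inside the even-ordinate / odd-abscissa storage area."""
    third_width = width // 2   # assumed integer division
    second_ordinate = sy >> 1  # ordinate / 2, rounded down
    second_abscissa = sx >> 1  # abscissa / 2, rounded down
    return third_width * second_ordinate + second_abscissa


print(second_address_offset(3, 2, 16))  # 9
```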

In some possible examples, referring to fig. 13 together, the determining a third address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image includes: dividing the width of the source image by 2 and then rounding to obtain a fourth width value; if the fourth width value is an odd number, adding 1 to the fourth width value to obtain a fifth width value; if the fourth width value is an even number, taking the fourth width value as the fifth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a third vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the third vertical coordinate value; dividing the third ordinate value by 2 and rounding to obtain a fourth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a third abscissa value; multiplying the fifth width value by the fourth ordinate value, and adding the third abscissa value to obtain a first numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the first numerical value to obtain the third address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the first numerical value as the third address offset.

For example, assume that the source image is 16 pixels wide and that pixel point $a_{12}^{l-1}$ on the source image has coordinates (Sx, Sy) = (2, 1). The third address offset is calculated as follows: the width of the source image divided by 2 gives a fourth width value of 8; the fourth width value is even, so it is taken as the fifth width value, i.e., the fifth width value is 8; the ordinate of the source image pixel point coordinates is 1, which is odd, so 1 is added to it to obtain a third ordinate value of 2; the third ordinate value divided by 2 and rounded gives a fourth ordinate value of 1; the abscissa of the source image pixel point coordinates is 2, which divided by 2 and rounded gives a third abscissa value of 1; multiplying the fifth width value by the fourth ordinate value and adding the third abscissa value gives the first numerical value, i.e., 8 × 1 + 1 = 9; the abscissa of the source image pixel point coordinates is even, so the first numerical value is taken as the third address offset, i.e., the third address offset is 9. Therefore, in the storage area for odd ordinate and even abscissa, the offset of source image pixel point coordinates (2, 1) relative to the start address 0 of that storage area is 9.
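The third address offset can be sketched in Python by following the listed steps one by one. The function name `third_address_offset` is hypothetical, and rounding is assumed to mean rounding down.

```python
def third_address_offset(sx, sy, width):
    """Offset inside the odd-ordinate / even-abscissa storage area."""
    fourth_width = width >> 1
    # Round an odd half-width up to the next even number.
    fifth_width = fourth_width + 1 if fourth_width % 2 == 1 else fourth_width
    # Round an odd ordinate up to the next even number, then halve it.
    third_ordinate = sy + 1 if sy % 2 == 1 else sy
    fourth_ordinate = third_ordinate >> 1
    third_abscissa = sx >> 1
    first_value = fifth_width * fourth_ordinate + third_abscissa
    # Per the described steps, an odd abscissa adds 1 to the result.
    return first_value + 1 if sx % 2 == 1 else first_value


print(third_address_offset(2, 1, 16))  # 9
```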

In some possible examples, referring also to fig. 14, the determining a fourth address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image includes: adding 1 to the width of the source image to obtain a sixth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a fifth vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the fifth vertical coordinate value; dividing the fifth ordinate value by 2 and rounding to obtain a sixth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a fourth abscissa value; multiplying the sixth width value by the sixth ordinate value, and adding the fourth abscissa value to obtain a second numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the second numerical value to obtain the fourth address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the second numerical value as the fourth address offset.

For example, assume that the source image is 16 pixels wide and that pixel point $a_{13}^{l-1}$ on the source image has coordinates (Sx, Sy) = (3, 1). The fourth address offset is calculated as follows: the width of the source image is 16, and adding 1 gives a sixth width value of 17; the ordinate of the source image pixel point coordinates is 1, which is odd, so 1 is added to it to obtain a fifth ordinate value of 2; the fifth ordinate value divided by 2 and rounded gives a sixth ordinate value of 1; the abscissa of the source image pixel point coordinates is 3, which divided by 2 and rounded gives a fourth abscissa value of 1; multiplying the sixth width value by the sixth ordinate value and adding the fourth abscissa value gives the second numerical value, i.e., 17 × 1 + 1 = 18; the abscissa of the source image pixel point coordinates is 3, which is odd, so 1 is added to the second numerical value to obtain a fourth address offset of 19. Therefore, in the storage area for odd ordinate and odd abscissa, the offset of source image pixel point coordinates (3, 1) relative to the start address 0 of that storage area is 19.
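The fourth address offset can be sketched in the same way. The function name `fourth_address_offset` is hypothetical; the sketch follows the steps exactly as written, including the sixth width value being the full width plus 1, and assumes rounding means rounding down.

```python
def fourth_address_offset(sx, sy, width):
    """Offset inside the odd-ordinate / odd-abscissa storage area."""
    sixth_width = width + 1
    # Round an odd ordinate up to the next even number, then halve it.
    fifth_ordinate = sy + 1 if sy % 2 == 1 else sy
    sixth_ordinate = fifth_ordinate >> 1
    fourth_abscissa = sx >> 1
    second_value = sixth_width * sixth_ordinate + fourth_abscissa
    # An odd abscissa adds 1 to the result.
    return second_value + 1 if sx % 2 == 1 else second_value


print(fourth_address_offset(3, 1, 16))  # 19
```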

In some possible examples, the internal storage module includes a first storage unit, a second storage unit, a third storage unit, and a fourth storage unit, and the reading of the third source image pixel point data from the internal storage module according to the first storage address information includes: reading the third source image pixel point data from the first storage unit according to the first address offset; reading the third source image pixel point data from the second storage unit according to the second address offset; reading the third source image pixel point data from the third storage unit according to the third address offset; and reading the third source image pixel point data from the fourth storage unit according to the fourth address offset.

Specifically, source image pixel point data with even ordinate and even abscissa are stored in the first storage unit, source image pixel point data with even ordinate and odd abscissa are stored in the second storage unit, source image pixel point data with odd ordinate and even abscissa are stored in the third storage unit, and source image pixel point data with odd ordinate and odd abscissa are stored in the fourth storage unit.

Therefore, in this example, the pixel data of the image is stored in 4 different storage units respectively according to the horizontal and vertical coordinate odd-even properties of the pixel, and the horizontal and vertical coordinate odd-even properties of the pixel in each storage unit are the same, so that when the pixel data is read, the pixel data of 4 pixels can be synchronously read, and the reading efficiency of the pixel data is improved.
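The reason a 2 × 2 neighbourhood can be read synchronously is that its 4 pixel points always have 4 different coordinate parities, so each falls in a different storage unit. A minimal Python sketch, in which the function names and the 0 to 3 numbering of the storage units are assumptions made for illustration:

```python
def storage_unit_index(sx, sy):
    """Select the storage unit from the parity of the coordinates.

    0: even ordinate, even abscissa (first storage unit)
    1: even ordinate, odd abscissa  (second storage unit)
    2: odd ordinate, even abscissa  (third storage unit)
    3: odd ordinate, odd abscissa   (fourth storage unit)
    """
    return ((sy & 1) << 1) | (sx & 1)


def units_for_neighbourhood(x0, y0):
    """Storage units touched by the 2x2 block whose top-left is (x0, y0)."""
    return [storage_unit_index(x0 + dx, y0 + dy)
            for dy in (0, 1) for dx in (0, 1)]


print(units_for_neighbourhood(1, 2))  # [1, 0, 3, 2]
```

Whatever the top-left coordinate is, the 4 indices are a permutation of 0 to 3, so no two of the 4 reads contend for the same storage unit.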

In some possible examples, the reading the third source image pixel point data from the external storage module according to the source image pixel point coordinates includes: calculating according to the pixel point coordinates of the source image to obtain second storage address information; and reading the third source image pixel point data from the external storage module according to the second storage address information.

Therefore, in this example, by designing the storage address logic, the second storage address information of the pixel point data stored in the external storage module is related to the pixel point coordinates, and the second storage address information can be calculated through the pixel point coordinates of the source image, so that the pixel point data is read, and the efficiency of reading the data is improved.

In some possible examples, the source image pixel point coordinates include coordinates of a first source image pixel point, coordinates of a second source image pixel point, coordinates of a third source image pixel point, and coordinates of a fourth source image pixel point, the vertical coordinates of the first source image pixel point and the second source image pixel point are the same, the vertical coordinates of the third source image pixel point and the fourth source image pixel point are the same, the horizontal coordinates of the first source image pixel point and the third source image pixel point are the same, the horizontal coordinates of the second source image pixel point and the fourth source image pixel point are the same, and the second storage address information is obtained by calculation according to the source image pixel point coordinates, including: determining a first storage address of a pixel point of the first source image according to an image storage starting address, the width of the source image, the vertical coordinate of the pixel point of the first source image and the horizontal coordinate of the pixel point of the first source image, and taking one storage address behind the first storage address as a second storage address of the pixel point of the second source image; determining a third storage address of the third source image pixel point according to the image storage starting address, the width of the source image, the vertical coordinate of the third source image pixel point and the horizontal coordinate of the third source image pixel point, and taking a storage address behind the third storage address as a fourth storage address of the fourth source image pixel point.

For example, assume that the neural network computing device is used to implement the STN algorithm and that coordinate mapping yields pixel point coordinates (x', y') = (1.6, 2.4). After rounding, the source image pixel point coordinates (Sx, Sy) are: first source image pixel point (1, 2), second source image pixel point (1, 3), third source image pixel point (2, 2) and fourth source image pixel point (2, 3). Because pixel point data is stored contiguously in the external storage module, (1, 2) and (1, 3) are stored adjacently, and (2, 2) and (2, 3) are stored adjacently. The first storage address, that of (1, 2), is calculated, and the storage address immediately after it is the second storage address, that of (1, 3); the third storage address, that of (2, 2), is calculated, and the storage address immediately after it is the fourth storage address, that of (2, 3).

In this example, the pixel point data of the source image is continuously stored in the external storage module, so that when the storage address of the adjacent pixel point data in the same row is calculated, only the storage address of the first pixel point is calculated, and the storage address behind the storage address of the first pixel point is used as the storage address of the adjacent next pixel point data, so that the address calculation can be reduced when the data is read, and the data reading efficiency is improved.

In some possible examples, the first storage address is determined according to the following formula: first storage address = image storage start address + ordinate of the first source image pixel point × width of the source image + abscissa of the first source image pixel point.

In some possible examples, the third storage address is determined according to the following formula: third storage address = image storage start address + (ordinate of the third source image pixel point + 1) × width of the source image + abscissa of the third source image pixel point.

For the external storage addresses, 4 pieces of pixel point data need to be read, and the storage addresses of the two pieces of pixel point data in the same row are contiguous. The neural network computing device can therefore send two read requests, each reading two bursts, where a burst refers to one read on the bus.

For example, referring to fig. 15 together, fig. 15 is a schematic structural diagram of a data storage address provided in the embodiment of the present application, assuming that a bus width is 128 bits, one burst is 128-bit data, two bursts are read for each data read, and assuming that each pixel has 1 byte, the storage address can be calculated by the following formula:

(1) For the pixel point data of the first row of pixel points (the first source image pixel point and the second source image pixel point):

first storage address = image storage start address + Sy × width of the source image + Sx;

(2) For the pixel point data of the second row of pixel points (the third source image pixel point and the fourth source image pixel point):

third storage address = image storage start address + (Sy + 1) × width of the source image + Sx;

As shown in fig. 15, if the first pixel point (1, 2) of a row is stored at byte 0 of external storage address segment one, then the second pixel point (1, 3) is stored at byte 1 of external storage address segment one; similarly, if the first pixel point (1, 2) is stored at byte 15 of external storage address segment one, then the second pixel point (1, 3) is stored at byte 0 of external storage address segment two.
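The two formulas and the "next address" rule can be combined in a short Python sketch. The function names, the example start address, and the `segment_and_byte` helper are assumptions made for illustration; 1 byte per pixel point and a 16-byte (128-bit) bus width are taken from the example above.

```python
BUS_BYTES = 16  # 128-bit bus, as assumed in the example


def external_addresses(base, width, sx, sy):
    """Byte addresses of the 4 pixel points, at 1 byte per pixel point.

    (sx, sy) is the first pixel point of the first row; the second pixel
    point in each row is stored at the immediately following address.
    """
    first = base + sy * width + sx
    third = base + (sy + 1) * width + sx
    return first, first + 1, third, third + 1


def segment_and_byte(address):
    """Split an address into (bus-width segment index, byte in segment)."""
    return divmod(address, BUS_BYTES)


print(external_addresses(0x1000, 16, 1, 2))  # (4129, 4130, 4145, 4146)
print(segment_and_byte(15), segment_and_byte(16))  # (0, 15) (1, 0)
```

The second line of output shows the boundary case: a first pixel point at byte 15 of one segment has its neighbour at byte 0 of the next segment.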

In some possible examples, the reading the third source image pixel point data from the external storage module according to the second storage address information includes: reading primary pixel point data from the external storage module according to the first storage address and the second storage address to obtain pixel point data of the first source image pixel point and pixel point data of the second source image pixel point; reading primary pixel point data from the external storage module according to the third storage address and the fourth storage address to obtain pixel point data of the third source image pixel point and pixel point data of the fourth source image pixel point.

The data width in the bus protocol determines the width of data read back at each address, for example, the data width is 128 bits (16 bytes), then 16 bytes are read back at each address, and the data read back at each address is byte-aligned according to the data width, that is, the data is read back at any one of the addresses 0/1/2/3/.

For example, if the calculated first storage address is byte 0 of external storage address segment one in fig. 15, that is, the first pixel point of the first row (the first source image pixel point) is stored at byte 0 of external storage address segment one, then the second pixel point of the first row (the second source image pixel point) is stored at byte 1 of external storage address segment one. Because the first pixel point of the first row may also be stored at byte 15 of external storage address segment one, the second pixel point of the first row may be stored at byte 0 of external storage address segment two.

(1) For the pixel point data of the first row of pixel points (the first source image pixel point and the second source image pixel point):

Assume that the first storage address is byte 0 of external storage address segment one. The data read back by the first of the two bursts is data0, data1, ..., data15, and the data read back by the second burst is data16, data17, ..., data31; the first source image pixel point is then data0 and the second source image pixel point is data1.

Assume that the first storage address is byte 7 of external storage address segment one. The data read back by the first burst is again data0, data1, ..., data15, and the data read back by the second burst is again data16, data17, ..., data31; the first source image pixel point is then data7 and the second source image pixel point is data8.

Assume that the first storage address is byte 15 of external storage address segment one. The data read back by the first burst is again data0, data1, ..., data15, and the data read back by the second burst is again data16, data17, ..., data31; the first source image pixel point is then data15, and the second source image pixel point is data16, which lies in the next address segment, i.e., at byte 0 of external storage address segment two.

(2) For the pixel point data of the second row of pixel points (the third source image pixel point and the fourth source image pixel point), the data is read in the same way as for the first row of pixel points.

It can be seen that, in this example, because the pixel point data of the source image is stored contiguously in the external storage module, adjacent pixel point data in the same row can be read using only the storage address of the first pixel point in that row. One read returns multiple pieces of data, among which the data stored at the address immediately following that of the first pixel point is the data of the adjacent next pixel point. The data of the other pixel points in the same row can therefore be read using only the storage address of the first pixel point, which improves data reading efficiency.
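The two-burst read can be modelled with a small Python sketch. External memory is modelled as a `bytes` object, and the function name `read_adjacent_pair` is hypothetical; the 16-byte burst size and the alignment of reads to the bus width follow the example above.

```python
BUS_BYTES = 16  # one burst returns 128 bits = 16 bytes


def read_adjacent_pair(memory, address):
    """Read a pixel point and its right-hand neighbour with one request.

    The request is aligned down to the bus width and returns two bursts
    (32 bytes), so the neighbour is available even when the first pixel
    point sits at byte 15 of its segment.
    """
    aligned = address - (address % BUS_BYTES)
    window = memory[aligned:aligned + 2 * BUS_BYTES]  # two bursts
    offset = address - aligned
    return window[offset], window[offset + 1]


memory = bytes(range(64))              # byte i holds value i ("data i")
print(read_adjacent_pair(memory, 0))   # (0, 1)
print(read_adjacent_pair(memory, 7))   # (7, 8)
print(read_adjacent_pair(memory, 15))  # (15, 16)
```

The three calls correspond to the three cases above, with the first storage address at byte 0, byte 7 and byte 15 of the first segment.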

Referring to fig. 16, fig. 16 is a schematic structural diagram of a neural network computing device according to an embodiment of the present disclosure, where the neural network computing device includes a system control module, a computing module, a bus control interface module and an internal storage module, the system control module is communicatively connected to the computing module, the bus control interface module is communicatively connected to the computing module and the internal storage module respectively, and the neural network computing device is communicatively connected to an external storage module through the bus control interface module;

Specifically, the calculation start signal includes the target image pixel point coordinates (x, y) and a spatial transformation parameter w, where the spatial transformation parameter w includes 6 parameters, and the system control module sends the target image pixel point coordinates (x, y) and the spatial transformation parameter w to the coordinate calculation unit through the calculation start signal; the coordinate calculation unit calculates the pixel point coordinates (x', y') according to the target image pixel point coordinates (x, y) and the spatial transformation parameter w, rounds (x', y') to obtain 4 source image pixel point coordinates (Sx, Sy), and sends them to the address calculation unit; the address calculation unit calculates the address information at which the 4 source image pixel points are stored according to the 4 source image pixel point coordinates (Sx, Sy), and sends the address information to the bus control interface module; the bus control interface module reads the pixel data of the 4 source image pixel points from the internal storage module or the external storage module according to the address information, and then sends the pixel data of the 4 source image pixel points to the pixel calculation unit; the pixel calculation unit calculates the pixel data of the target image according to the pixel data of the 4 source image pixel points, and then sends the pixel data of the target image to the bus control interface module.

It should be noted that, because the internal storage of the neural network computing device is limited, the large-size source image is stored in the internal storage module and the external storage module in blocks in the embodiment of the present application, so that the neural network computing device provided in the embodiment of the present application can implement spatial transformation network computing of the large-size image.

In some possible examples, the first source image pixel point data is pixel point data of a first source image pixel point of the source image whose vertical coordinate is not greater than a preset threshold, and the second source image pixel point data is pixel point data of a second source image pixel point of the source image whose vertical coordinate is greater than the preset threshold.

Because the internal storage space is limited, the pixel point data of a large-size source image cannot be completely stored in the internal storage module, the large-size source image needs to be stored in blocks, a preset threshold value can be set for the vertical coordinate of the pixel point coordinate, the pixel point data of which the vertical coordinate is not more than the preset threshold value is stored in the internal storage module, and the pixel point data of which the vertical coordinate is more than the preset threshold value is stored in the external storage module.

In some possible examples, the computing module includes: the coordinate calculation unit is used for calculating pixel point coordinates of the source image according to the pixel point coordinates of the target image and the transformation parameters to obtain pixel point coordinates of the source image; the address calculation unit is used for calculating to obtain the storage address information according to the pixel point coordinates of the source image and sending the storage address information to the bus control interface module; and the pixel calculation unit is used for calculating and obtaining the target image pixel point data according to the third source image pixel point data.

Specifically, the coordinate calculation unit may be configured to perform coordinate mapping on the target image pixel point coordinates (x, y) with the spatial transformation parameter w to obtain pixel point coordinates (x', y'), and to obtain 4 source image pixel point coordinates (Sx, Sy) by rounding (x', y'); the address calculation unit calculates the storage address information of the 4 source image pixel points according to their coordinates (Sx, Sy); and the pixel calculation unit calculates, from the 4 source image pixel point data obtained by pixel sampling, the target image pixel point data corresponding to the target image pixel point coordinates (x, y).

In some possible examples, the address calculation unit includes: the coordinate judgment subunit is used for judging whether the ordinate of the pixel point coordinate of the source image is greater than the preset threshold value or not, and sending an external read data waiting signal to the coordinate calculation unit under the condition that the ordinate of the pixel point coordinate of the source image is greater than the preset threshold value, wherein the external read data waiting signal is used for indicating the coordinate calculation unit to suspend calculation; the internal storage address calculation subunit is configured to, when the ordinate of the source image pixel point coordinate is not greater than the preset threshold, calculate, according to the source image pixel point coordinate, to obtain first storage address information, where the first storage address information is storage address information of the third source image pixel point data in the internal storage module; and the external storage address calculation subunit is used for calculating second storage address information according to the pixel point coordinates of the source image under the condition that the vertical coordinates of the pixel point coordinates of the source image are greater than the preset threshold value, wherein the second storage address information is storage address information of the pixel point data of the third source image in the external storage module.

To simplify calculation, the pixel data of any one row of the source image must be stored entirely in one place: it is not allowed for part of a row to be stored in the internal cache while the rest of that row is stored in the external storage space.

Referring to fig. 17, fig. 17 is a schematic structural diagram of an address calculation unit according to an embodiment of the present application. The coordinate calculation unit calculates (Sx, Sy) and sends it to the coordinate judgment subunit, which judges whether Sy is greater than the preset threshold. The judgment process is as follows: assuming that the ordinate of the source image lies in the interval [0, height - 1], the preset threshold is set to read_height - 1 and Sy is compared with read_height - 1. If Sy <= read_height - 1, the pixel at that coordinate is in the internal cache and is read directly from the internal cache: the coordinate judgment subunit sends (Sx, Sy) to the internal storage address calculation subunit, which calculates the first storage address information in the internal storage module according to (Sx, Sy) and marks (Sx, Sy) as internally stored data. If Sy > read_height - 1, the pixel at that coordinate is stored externally and must be read through the bus: the coordinate judgment subunit sends an external read data waiting signal to the coordinate calculation unit and sends (Sx, Sy) to the external storage address calculation subunit, which calculates the second storage address information in the external storage module according to (Sx, Sy) and marks (Sx, Sy) as externally stored data. After the source image pixel point data has been obtained from the internal storage module or the external storage module, the data selection logic module selects whether the pixel point data supplied to the pixel calculation unit comes from the internal storage module or the external storage module.
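The judgment made by the coordinate judgment subunit can be sketched as follows. The function name and the returned tuple are hypothetical and only serve the illustration; the comparison against read_height - 1 follows the description above.

```python
def judge_coordinate(sx, sy, read_height):
    """Return (target, wait) for a source pixel (Sx, Sy).

    target is "internal" or "external"; wait is True when an external
    read data waiting signal would be sent to the coordinate calculation unit.
    """
    threshold = read_height - 1  # preset threshold from the description
    if sy <= threshold:
        # Row is held in the internal cache: no stall is needed.
        return "internal", False
    # Row is held externally: the read goes over the shared bus, so stall.
    return "external", True
```

Only the ordinate Sy takes part in the decision, which is consistent with whole rows being kept in a single storage location.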

It should be noted that the calculation process performed by the neural network computing device is designed as a pipeline. The time needed to read data from the internal cache is predictable and can therefore be controlled by the pipeline. When data is read from the external storage, however, the bus is shared by a plurality of modules, so contention and congestion may occur; the data takes longer to return than data from the internal cache, and the return time is unpredictable. To ensure the continuity of the data, that is, to keep the pipeline operating normally, an external read data waiting signal is needed: the upstream coordinate calculation unit is suspended and waits, and only after the data currently being read from the external storage module has been returned are new target image pixel point coordinates sent to the coordinate calculation unit and its calculation restarted.

Referring to fig. 18, fig. 18 is a schematic diagram of the judgment of a pixel point coordinate according to an embodiment of the present application. When the coordinate judgment subunit determines that Sy > read_height - 1, the data must be read from the external storage module and its return time is unpredictable; therefore, to keep the pipeline operating normally, the upstream coordinate calculation unit must be suspended, and an external read data waiting signal is sent to the coordinate calculation unit.

In some possible examples, in terms of obtaining the first storage address information by calculating the pixel point coordinates of the source image, the internal storage address calculating subunit is specifically configured to: if the ordinate and the abscissa of the pixel point coordinate of the source image are both even numbers, determining a first address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is even and the abscissa is odd, determining a second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is odd and the abscissa is even, determining a third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; and if the ordinate and the abscissa of the pixel coordinates of the source image are both odd numbers, determining a fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image.

In some possible examples, in determining the first address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to: dividing the width of the source image by 2 and then rounding to obtain a first width value; if the first width value is an odd number, adding 1 to the first width value to obtain a second width value; if the first width value is an even number, taking the first width value as the second width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a first ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a first abscissa value; and multiplying the second width value by the first ordinate value, and adding the first abscissa value to obtain the first address offset.

In some possible examples, in determining the second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to: dividing the width of the source image by 2 to obtain a third width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a second ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a second abscissa value; and multiplying the third width value by the second ordinate value, and adding the second abscissa value to obtain the second address offset.

In some possible examples, in determining the third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to: dividing the width of the source image by 2 and then rounding to obtain a fourth width value; if the fourth width value is an odd number, adding 1 to the fourth width value to obtain a fifth width value; if the fourth width value is an even number, taking the fourth width value as the fifth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a third vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the third vertical coordinate value; dividing the third ordinate value by 2 and rounding to obtain a fourth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a third abscissa value; multiplying the fifth width value by the fourth ordinate value, and adding the third abscissa value to obtain a first numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the first numerical value to obtain the third address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the first numerical value as the third address offset.

In some possible examples, in determining the fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to: adding 1 to the width of the source image to obtain a sixth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a fifth vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the fifth vertical coordinate value; dividing the fifth ordinate value by 2 and rounding to obtain a sixth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a fourth abscissa value; multiplying the sixth width value by the sixth ordinate value, and adding the fourth abscissa value to obtain a second numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the second numerical value to obtain the fourth address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the second numerical value as the fourth address offset.
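The four address offset calculations described above can be transcribed into Python as a sketch. It follows the stated steps literally and does not check that the resulting layout is consistent across the four storage units. Two points are assumptions: "rounding" after division by 2 is taken as rounding down, and the plain division by 2 in the second offset is taken as integer division. All function names are hypothetical.

```python
def round_up_to_even(value):
    """Add 1 to an odd value; leave an even value unchanged."""
    return value + 1 if value % 2 else value


def first_offset(sx, sy, width):
    second_width = round_up_to_even(width // 2)
    return second_width * (sy // 2) + sx // 2


def second_offset(sx, sy, width):
    third_width = width // 2  # assumption: integer division
    return third_width * (sy // 2) + sx // 2


def third_offset(sx, sy, width):
    fifth_width = round_up_to_even(width // 2)
    fourth_ordinate = round_up_to_even(sy) // 2
    value = fifth_width * fourth_ordinate + sx // 2
    return value + 1 if sx % 2 else value


def fourth_offset(sx, sy, width):
    sixth_width = width + 1
    sixth_ordinate = round_up_to_even(sy) // 2
    value = sixth_width * sixth_ordinate + sx // 2
    return value + 1 if sx % 2 else value


def internal_offset(sx, sy, width):
    """Select the storage unit (1 to 4) by parity; return (unit, offset)."""
    if sy % 2 == 0 and sx % 2 == 0:
        return 1, first_offset(sx, sy, width)
    if sy % 2 == 0:
        return 2, second_offset(sx, sy, width)
    if sx % 2 == 0:
        return 3, third_offset(sx, sy, width)
    return 4, fourth_offset(sx, sy, width)
```

For example, with a source image width of 8, `internal_offset(2, 4, 8)` selects the first storage unit because both coordinates are even.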

In some possible examples, the internal storage module includes: the first storage unit is used for storing source image pixel point data of even-numbered row and even-numbered column pixel points in the first source image pixel point data; the second storage unit is used for storing source image pixel point data of even-numbered row and odd-numbered column pixel points in the first source image pixel point data; the third storage unit is used for storing source image pixel point data of odd-numbered row and even-numbered column pixel points in the first source image pixel point data; and the fourth storage unit is used for storing source image pixel point data of odd-numbered row and odd-numbered column pixel points in the first source image pixel point data.

As the above description of the algorithm shows, each time the STN performs coordinate mapping, 4 pixel point data must be fetched from the source image for bilinear interpolation. Because these four points are adjacent to one another, they can be stored in four different storage spaces according to the parity of their coordinates, so that all 4 pixel point data can be read out in one clock cycle.
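The reason the four adjacent points never compete for the same storage unit can be checked with a short sketch. The numbering of the storage units as 0 to 3 and the function names are assumptions made for the illustration.

```python
def bank_of(sx, sy):
    """Storage unit index by parity.

    0: even row, even column    1: even row, odd column
    2: odd row, even column     3: odd row, odd column
    """
    return (sy % 2) * 2 + (sx % 2)


def banks_for_neighbourhood(x0, y0):
    """Storage units touched by the 2x2 block whose top-left pixel is (x0, y0)."""
    return sorted(bank_of(x0 + dx, y0 + dy) for dy in (0, 1) for dx in (0, 1))
```

Whatever the parity of (x0, y0), the 2x2 block contains exactly one pixel from each storage unit, so the four reads can be issued in parallel.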

In some possible examples, the bus control interface module is specifically configured to: reading the third source image pixel point data from the first storage unit according to the first address offset; and/or reading the third source image pixel point data from the second storage unit according to the second address offset; and/or reading the third source image pixel point data from the third storage unit according to the third address offset; and/or reading the third source image pixel point data from the fourth storage unit according to the fourth address offset.

In some possible examples, the source image pixel point coordinates include coordinates of a first source image pixel point, coordinates of a second source image pixel point, coordinates of a third source image pixel point, and coordinates of a fourth source image pixel point, the vertical coordinates of the first source image pixel point and the second source image pixel point are the same, the vertical coordinates of the third source image pixel point and the fourth source image pixel point are the same, the horizontal coordinates of the first source image pixel point and the third source image pixel point are the same, the horizontal coordinates of the second source image pixel point and the fourth source image pixel point are the same, and in terms of obtaining second storage address information by calculation according to the source image pixel point coordinates, the external storage address calculation subunit is configured to: determining a first storage address of a pixel point of the first source image according to an image storage starting address, the width of the source image, the vertical coordinate of the pixel point of the first source image and the horizontal coordinate of the pixel point of the first source image, and taking one storage address behind the first storage address as a second storage address of the pixel point of the second source image; determining a third storage address of the third source image pixel point according to the image storage starting address, the width of the source image, the vertical coordinate of the third source image pixel point and the horizontal coordinate of the third source image pixel point, and taking a storage address behind the third storage address as a fourth storage address of the fourth source image pixel point.
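A minimal sketch of the external address calculation follows. The text names the inputs (image storage start address, source image width, ordinate and abscissa) but not the formula, so the row-major formula below, one address per pixel, and a start address that corresponds to row 0 of the source image are all assumptions; the function names are hypothetical.

```python
def pixel_address(base, width, sx, sy):
    """Assumed row-major address of pixel (Sx, Sy), one address per pixel."""
    return base + sy * width + sx


def external_addresses(base, width, first_pixel, third_pixel):
    """Return the first, second, third and fourth storage addresses.

    first_pixel and third_pixel are (Sx, Sy) tuples. The second and fourth
    addresses are simply the addresses that follow the first and the third.
    """
    first = pixel_address(base, width, *first_pixel)
    third = pixel_address(base, width, *third_pixel)
    return first, first + 1, third, third + 1
```

Only two address calculations are needed for four pixels, because the second and fourth pixels sit immediately after the first and third pixels in the same rows.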

Because the internal storage space is limited, the source image pixel point coordinates obtained by mapping the target image pixel point coordinates may not be found in the internal storage space; in that case the pixel point data needs to be looked up in the external storage.

In some possible examples, the bus control interface module is specifically configured to: read pixel point data once from the external storage module according to the first storage address and the second storage address to obtain the pixel point data of the first source image pixel point and the pixel point data of the second source image pixel point; and read pixel point data once from the external storage module according to the third storage address and the fourth storage address to obtain the pixel point data of the third source image pixel point and the pixel point data of the fourth source image pixel point.
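The two external reads can be sketched as follows, with the external storage module modelled as a plain Python list indexed by address. This model and the function names are assumptions made for the illustration; a real bus transaction is not represented.

```python
def read_two(memory, address):
    """One access returning the values at address and address + 1."""
    return memory[address], memory[address + 1]


def read_four_pixels(memory, first_address, third_address):
    """Fetch the 4 neighbouring pixels with two accesses instead of four."""
    top = read_two(memory, first_address)      # first and second pixels
    bottom = read_two(memory, third_address)   # third and fourth pixels
    return top + bottom
```

Because each pair of pixels occupies consecutive addresses, the number of accesses to the shared bus is halved compared with reading the four pixels individually.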

In some possible examples, the apparatus further comprises: and the output cache module is used for caching the pixel point data of the target image.

It can be understood that, because the calculation process is a pipeline design and the bus is shared, the bus control interface module may not transmit the target image pixel point data calculated by the pixel calculation unit to the outside in time, and therefore an output buffer module is required to be arranged for buffering the target image pixel point data, thereby ensuring that the calculation process is performed in order.
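As a rough sketch, the output cache module can be modelled as a first-in first-out queue. The fixed depth, the refusal of new data when the queue is full, and the class and method names are assumptions of this sketch rather than details given in the description.

```python
from collections import deque


class OutputBuffer:
    """FIFO holding target image pixel data until the shared bus is free."""

    def __init__(self, depth):
        self.depth = depth      # assumed fixed capacity
        self.queue = deque()

    def push(self, pixel):
        """Store a computed pixel; return False if the buffer is full."""
        if len(self.queue) >= self.depth:
            return False
        self.queue.append(pixel)
        return True

    def pop_if_bus_free(self, bus_free):
        """Hand the oldest pixel to the bus when it is free, else None."""
        if bus_free and self.queue:
            return self.queue.popleft()
        return None
```

The queue preserves the order in which the pixel calculation unit produced the data, so the target image pixel point data leaves the device in calculation order even when the bus is temporarily busy.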

In some possible examples, the system control module is further communicatively connected to the bus control interface module, and the system control module is further configured to send a read data start signal to the bus control interface module.

It can be understood that after the system control module sends the calculation starting signal to the calculation module, the calculation module can calculate the storage address information of the pixel point data and send the storage address information to the bus control interface module; the system control module needs to send a data reading start signal to the bus control interface module, and then the bus control interface module reads the pixel point data of the source image from the internal storage module or the external storage module according to the storage address information.

It should be noted that the implementation of each module or unit shown in fig. 16 may also correspond to the corresponding description in the method embodiments shown in fig. 5 and fig. 10.

In the neural network computing device shown in fig. 16, the pixel point data of a source image is stored in blocks in an internal storage module and an external storage module: the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, and the first and second source image pixel point data together form all the pixel point data of the source image. In the neural network computing device, the system control module sends a calculation starting signal comprising the target image pixel point coordinates and the transformation parameters to the calculation module; the calculation module calculates the storage address information of third source image pixel point data according to the target image pixel point coordinates and sends the storage address information to the bus control interface module; the bus control interface module reads the third source image pixel point data from the internal storage module and/or the external storage module according to the storage address information and sends it to the calculation module; and the calculation module calculates the target image pixel point data according to the third source image pixel point data and the transformation parameters. Because the pixel point data of the source image is stored in blocks in the internal storage module and the external storage module, the neural network computing device can efficiently read pixel point data from storage during calculation, which improves calculation efficiency.

Referring to fig. 19, fig. 19 is a schematic structural diagram of a data storage device according to an embodiment of the present application, where the data storage device 1900 is applied to a neural network computing device, the neural network computing device includes an internal storage module, the neural network computing device is communicatively connected to an external storage module, and the data storage device includes:

an obtaining unit 1901, configured to obtain pixel point data of a source image;

the storage unit 1902 is configured to store, in the internal storage module, pixel point data of a first source image pixel point whose vertical coordinate is not greater than a preset threshold in the source image, and store, in the external storage module, pixel point data of a second source image pixel point whose vertical coordinate is greater than the preset threshold in the source image.

In some possible examples, the internal storage module includes a first storage unit, a second storage unit, a third storage unit, and a fourth storage unit, and in terms of storing, in the internal storage module, pixel point data of a first source image pixel point whose ordinate in the source image is not greater than a preset threshold, the storage unit 1902 is specifically configured to: storing pixel point data of pixel points of a first source image with even ordinate and even abscissa in a first storage unit; storing pixel point data of pixel points of a first source image with an even ordinate and an odd abscissa in a second storage unit; storing pixel point data of pixel points of a first source image with odd ordinate and even abscissa in a third storage unit; and storing the pixel point data of the pixel points of the first source image with odd ordinate and odd abscissa in a fourth storage unit.

In some possible examples, in terms of storing, in a first storage unit, pixel point data of a first source image pixel point whose ordinate and abscissa are even numbers, the storage unit 1902 is specifically configured to: arranging the first source image pixel points with even ordinate and abscissa from small to large according to the ordinate to obtain a first arrangement sequence, wherein the first source image pixel points with the same ordinate in the first arrangement sequence are arranged from small to large according to the abscissa; and storing the pixel point data of the pixel points of the first source image with even ordinate and even abscissa in the first storage unit according to the first arrangement sequence.
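The distribution of the internally stored rows over the four storage units can be sketched as follows. The source image is modelled as a list of rows, and the same ordering (ascending ordinate, then ascending abscissa) is assumed for all four storage units, although the text above states it only for the first one. The function name is hypothetical.

```python
def fill_storage_units(image, threshold):
    """Distribute rows 0..threshold over 4 storage units by coordinate parity.

    image[y][x] is the pixel at ordinate y and abscissa x.
    Returns [unit1, unit2, unit3, unit4] as lists in storage order.
    """
    units = [[], [], [], []]
    for y, row in enumerate(image[:threshold + 1]):
        for x, value in enumerate(row):
            # Even/even -> unit 1, even/odd -> unit 2,
            # odd/even -> unit 3, odd/odd -> unit 4.
            units[(y % 2) * 2 + (x % 2)].append(value)
    return units
```

Iterating rows and then columns in ascending order yields the arrangement sequence described above without an explicit sort.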

In some possible examples, in terms of storing, in an external storage module, pixel point data of a pixel point of a second source image of which a vertical coordinate is greater than the preset threshold in the source image, the storage unit 1902 is specifically configured to: arranging second source image pixels with vertical coordinates larger than the preset threshold value in the source images from small to large according to the vertical coordinates to obtain a second arrangement sequence, wherein the second source image pixels with the same vertical coordinates in the second arrangement sequence are arranged from small to large according to the horizontal coordinates; and storing the pixel point data of the pixel points of the second source image in the external storage module according to the second arrangement sequence.
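The second arrangement sequence amounts to a row-major layout of the rows above the threshold. A minimal sketch, again modelling the source image as a list of rows and using a hypothetical function name:

```python
def external_layout(image, threshold):
    """Flatten the rows whose ordinate exceeds threshold, row by row.

    Within each row the pixels keep ascending abscissa order.
    """
    return [value for row in image[threshold + 1:] for value in row]
```

With this layout, horizontally adjacent pixels of one row occupy consecutive positions in the external storage module.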

It should be noted that the implementation of each unit shown in fig. 19 may also correspond to the corresponding description in the method embodiment shown in fig. 5.

In the data storage device shown in fig. 19, the image is stored in blocks, one part of the image blocks are stored in the internal storage module of the neural network computing device, and the other part of the image blocks are stored in the external storage module, so that the computing process is not limited by the size of the image, the image can be efficiently stored even if the image is large in size, and the computing efficiency is improved.

Referring to fig. 20, fig. 20 is a schematic structural diagram of a data reading device according to an embodiment of the present application, where the data reading device 2000 is applied to a neural network computing device, the neural network computing device includes an internal storage module, the neural network computing device is in communication connection with an external storage module, the internal storage module stores first source image pixel point data, the external storage module stores second source image pixel point data, and the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the data reading device 2000 includes:

a determining unit 2001, configured to determine coordinates of pixel points of the source image;

a reading unit 2002, configured to, if the ordinate of the source image pixel point coordinate is not greater than a preset threshold, read a third source image pixel point data from the internal storage module according to the source image pixel point coordinate;

the reading unit 2002 is further configured to, if the ordinate of the source image pixel point coordinate is greater than the preset threshold, read a third source image pixel point data from the external storage module according to the source image pixel point coordinate.

In some possible examples, the determining unit 2001 is specifically configured to: acquiring coordinates of pixel points of a target image; and calculating to obtain the pixel point coordinates of the source image according to the pixel point coordinates of the target image.

In some possible examples, in terms of reading third source image pixel point data from the internal storage module according to the source image pixel point coordinates, the reading unit 2002 is specifically configured to: calculating to obtain first storage address information according to the pixel point coordinates of the source image; and reading third source image pixel point data from the internal storage module according to the first storage address information.

In some possible examples, in terms of obtaining the first storage address information by calculating the pixel point coordinates of the source image, the reading unit 2002 is specifically configured to: if the ordinate and the abscissa of the pixel point coordinate of the source image are both even numbers, determining a first address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is even and the abscissa is odd, determining a second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; if the ordinate of the pixel coordinates of the source image is odd and the abscissa is even, determining a third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image; and if the ordinate and the abscissa of the pixel coordinates of the source image are both odd numbers, determining a fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image.

In some possible examples, in determining the first address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the reading unit 2002 is specifically configured to: dividing the width of the source image by 2 and then rounding to obtain a first width value; if the first width value is an odd number, adding 1 to the first width value to obtain a second width value; if the first width value is an even number, taking the first width value as the second width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a first ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a first abscissa value; and multiplying the second width value by the first ordinate value, and adding the first abscissa value to obtain the first address offset.

In some possible examples, in determining the second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the reading unit 2002 is specifically configured to: dividing the width of the source image by 2 to obtain a third width value; dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a second ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a second abscissa value; and multiplying the third width value by the second ordinate value, and adding the second abscissa value to obtain the second address offset.

In some possible examples, in determining the third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the reading unit 2002 is specifically configured to: dividing the width of the source image by 2 and then rounding to obtain a fourth width value; if the fourth width value is an odd number, adding 1 to the fourth width value to obtain a fifth width value; if the fourth width value is an even number, taking the fourth width value as the fifth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a third vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the third vertical coordinate value; dividing the third ordinate value by 2 and rounding to obtain a fourth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a third abscissa value; multiplying the fifth width value by the fourth ordinate value, and adding the third abscissa value to obtain a first numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the first numerical value to obtain the third address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the first numerical value as the third address offset.

In some possible examples, in determining the fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image, the reading unit 2002 is specifically configured to: adding 1 to the width of the source image to obtain a sixth width value; if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a fifth vertical coordinate value; if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the fifth vertical coordinate value; dividing the fifth ordinate value by 2 and rounding to obtain a sixth ordinate value; dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a fourth abscissa value; multiplying the sixth width value by the sixth ordinate value, and adding the fourth abscissa value to obtain a second numerical value; if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the second numerical value to obtain the fourth address offset; and if the abscissa of the pixel point coordinate of the source image is an even number, taking the second numerical value as the fourth address offset.

In some possible examples, the internal storage module includes a first storage unit, a second storage unit, a third storage unit, and a fourth storage unit, and in terms of reading third source image pixel point data from the internal storage module according to the first storage address information, the reading unit 2002 is specifically configured to: reading the third source image pixel point data from the first storage unit according to the first address offset; reading the third source image pixel point data from the second storage unit according to the second address offset; reading the third source image pixel point data from the third storage unit according to the third address offset; and reading the third source image pixel point data from the fourth storage unit according to the fourth address offset.
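The choice of storage unit follows from the parity of the pixel coordinates. The sketch below is illustrative only; the function name and the 1-to-4 numbering of the return value are assumptions made for the example.

```python
def select_storage_unit(x: int, y: int) -> int:
    """Return 1..4 for the first..fourth storage unit.

    even row, even column -> 1    even row, odd column -> 2
    odd row,  even column -> 3    odd row,  odd column -> 4
    """
    return (y % 2) * 2 + (x % 2) + 1


def units_for_neighbourhood(x: int, y: int) -> set:
    """Storage units touched by the 2x2 neighbourhood whose top-left pixel is (x, y)."""
    return {
        select_storage_unit(x + dx, y + dy)
        for dy in (0, 1)
        for dx in (0, 1)
    }
```

Under this mapping the four pixels of any 2×2 neighbourhood fall in four different storage units, so no two of them compete for the same unit.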

In some possible examples, in terms of reading the third source image pixel point data from the external storage module according to the source image pixel point coordinates, the reading unit 2002 is specifically configured to: calculating according to the pixel point coordinates of the source image to obtain second storage address information; and reading the third source image pixel point data from the external storage module according to the second storage address information.

In some possible examples, the source image pixel point coordinates include coordinates of a first source image pixel point, coordinates of a second source image pixel point, coordinates of a third source image pixel point, and coordinates of a fourth source image pixel point, the vertical coordinates of the first source image pixel point and the second source image pixel point are the same, the vertical coordinates of the third source image pixel point and the fourth source image pixel point are the same, the horizontal coordinates of the first source image pixel point and the third source image pixel point are the same, the horizontal coordinates of the second source image pixel point and the fourth source image pixel point are the same, and in terms of obtaining second storage address information through calculation according to the source image pixel point coordinates, the reading unit 2002 is specifically configured to: determining a first storage address of a pixel point of the first source image according to an image storage starting address, the width of the source image, the vertical coordinate of the pixel point of the first source image and the horizontal coordinate of the pixel point of the first source image, and taking one storage address behind the first storage address as a second storage address of the pixel point of the second source image; determining a third storage address of the third source image pixel point according to the image storage starting address, the width of the source image, the vertical coordinate of the third source image pixel point and the horizontal coordinate of the third source image pixel point, and taking a storage address behind the third storage address as a fourth storage address of the fourth source image pixel point.

In some possible examples, in terms of reading the third source image pixel point data from the external storage module according to the second storage address information, the reading unit 2002 is specifically configured to: reading primary pixel point data from the external storage module according to the first storage address and the second storage address to obtain pixel point data of the first source image pixel point and pixel point data of the second source image pixel point; reading primary pixel point data from the external storage module according to the third storage address and the fourth storage address to obtain pixel point data of the third source image pixel point and pixel point data of the fourth source image pixel point.
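The external address calculation and the two paired reads can be sketched as follows. This is a simplified illustration with several assumptions: the external storage module is modelled as a flat Python list, the image is laid out row by row with one address per pixel, and the third and fourth pixels lie in the row directly below the first and second. All names are hypothetical.

```python
def external_addresses(base: int, width: int, x: int, y: int):
    """Addresses of the four pixels, where (x, y) is the first pixel.

    Assumption: row-major layout, address = base + row * width + column.
    """
    first = base + y * width + x
    second = first + 1            # the address right after the first
    third = base + (y + 1) * width + x
    fourth = third + 1            # the address right after the third
    return first, second, third, fourth


def read_pair(memory, address):
    """One read returns the pixel at `address` and the one after it."""
    return memory[address], memory[address + 1]


def read_four_pixels(memory, base, width, x, y):
    first, _, third, _ = external_addresses(base, width, x, y)
    p1, p2 = read_pair(memory, first)   # first read: first and second pixels
    p3, p4 = read_pair(memory, third)   # second read: third and fourth pixels
    return p1, p2, p3, p4
```

With this layout, two reads are enough to fetch all four pixels, because each pair occupies adjacent addresses.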

It should be noted that the implementation of each unit shown in fig. 20 may also correspond to the corresponding description in the method embodiment shown in fig. 10.

In the data reading apparatus shown in fig. 20, the image is stored in blocks: one part of the image blocks is stored in an internal storage module of the neural network computing device, and the other part is stored in an external storage module. During computation, after determining the pixel point coordinates of the source image, the neural network computing device determines whether the ordinate of those coordinates is greater than a preset threshold. The pixel point data is read from the internal storage module when the ordinate is not greater than the preset threshold, and from the external storage module when the ordinate is greater than the preset threshold. The computing process is therefore not limited by the image size; even for a large image, the pixel point data can be read efficiently, and the calculation efficiency is improved.

Referring to fig. 21, fig. 21 is a schematic structural diagram of a neural network computing device 2110 according to an embodiment of the present disclosure, and as shown in fig. 21, the neural network computing device 2110 includes a communication interface 2111, a processor 2112, a memory 2113, and at least one communication bus 2114 for connecting the communication interface 2111, the processor 2112, and the memory 2113.

The memory 2113 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 2113 is used to store related instructions and data.

Communication interface 2111 is used for receiving and transmitting data.

The processor 2112 may be one or more Central Processing Units (CPUs), and in the case where the processor 2112 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.

The processor 2112 in the neural network computing device 2110 is configured to read one or more program codes stored in the memory 2113, and perform the following operations: acquiring pixel point data of a source image; storing pixel point data of pixel points of a first source image of which the vertical coordinate is not more than a preset threshold value in the source image into the internal storage module, and storing pixel point data of pixel points of a second source image of which the vertical coordinate is more than the preset threshold value in the source image into the external storage module.
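The storage operation can be sketched as a split of the image rows at the preset threshold. This is an illustration only: the image is modelled as a list of rows, both storage modules are modelled as plain lists, and the names are hypothetical.

```python
def split_by_threshold(image_rows, threshold: int):
    """Split image rows between internal and external storage.

    Rows whose ordinate is not greater than `threshold` go to the internal
    storage module; rows whose ordinate is greater go to the external one.
    """
    internal = [row for y, row in enumerate(image_rows) if y <= threshold]
    external = [row for y, row in enumerate(image_rows) if y > threshold]
    return internal, external
```

For a four-row image and a threshold of 1, rows 0 and 1 are kept internally and rows 2 and 3 are placed in external storage.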

It should be noted that, the implementation of each operation may also correspond to the corresponding description in the method embodiment shown in fig. 5.

In the neural network computing device 2110 illustrated in fig. 21, an image is stored in blocks, a part of the image blocks are stored in an internal storage module in the neural network computing apparatus, and another part of the image blocks are stored in an external storage module, so that the computing process is not limited by the size of the image, and even if the image is large-sized, efficient storage can be realized, and the computing efficiency is improved.

Referring to fig. 22, fig. 22 is a schematic structural diagram of a neural network computing device 2210 according to an embodiment of the present application, and as shown in fig. 22, the neural network computing device 2210 includes a communication interface 2211, a processor 2212, a memory 2213, and at least one communication bus 2214 for connecting the communication interface 2211, the processor 2212, and the memory 2213.

The memory 2213 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), and the memory 2213 is used to store related instructions and data.

Communication interface 2211 is used for receiving and transmitting data.

The processor 2212 may be one or more Central Processing Units (CPUs), and in the case that the processor 2212 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.

The processor 2212 in the neural network computing device 2210 is configured to read one or more program codes stored in the memory 2213, and perform the following operations: determining pixel point coordinates of a source image; if the vertical coordinate of the pixel point coordinate of the source image is not larger than the preset threshold value, reading third source image pixel point data from the internal storage module according to the pixel point coordinate of the source image; and if the vertical coordinate of the pixel point coordinate of the source image is larger than the preset threshold value, reading the pixel point data of the third source image from the external storage module according to the pixel point coordinate of the source image.
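The read operation can be sketched as a dispatch on the ordinate. This is a simplified illustration: both storage modules are modelled as lists of rows, the banked internal addressing described elsewhere is omitted, and all names are hypothetical.

```python
def read_pixel(internal_rows, external_rows, x: int, y: int, threshold: int):
    """Read one source image pixel from the module that holds its row.

    Assumption: the internal module holds rows 0..threshold and the
    external module holds the remaining rows in order.
    """
    if y <= threshold:
        # Ordinate not greater than the threshold: internal storage module.
        return internal_rows[y][x]
    # Ordinate greater than the threshold: external storage module.
    return external_rows[y - threshold - 1][x]
```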

It should be noted that, the implementation of each operation may also correspond to the corresponding description in the method embodiment shown in fig. 10.

In the neural network computing device 2210 described in fig. 22, the image is stored in blocks: one part of the image blocks is stored in an internal storage module of the neural network computing device, and the other part is stored in an external storage module. During computation, after determining the pixel point coordinates of the source image, the neural network computing device determines whether the ordinate of those coordinates is greater than a preset threshold. If the ordinate is not greater than the preset threshold, the pixel point data is read from the internal storage module; if the ordinate is greater than the preset threshold, the pixel point data is read from the external storage module. The computing process is therefore not limited by the image size; even for a large image, the pixel point data can be read efficiently, and the calculation efficiency is improved.

Referring to fig. 23, fig. 23 is a schematic structural diagram of a chip hardware according to an embodiment of the present disclosure, where the chip includes a neural network processor 2300. The chip may be disposed in the execution apparatus 1810 shown in fig. 18 or the execution apparatus 1910 shown in fig. 19, and both the data reading method and the data storing method in the above method embodiments may be implemented in the chip shown in fig. 23.

The neural network processor 2300 is mounted as a coprocessor on a main CPU (host CPU), and tasks are allocated by the main CPU. A core portion of the neural network processor 2300 is an arithmetic circuit 2303, and the controller 2304 controls the arithmetic circuit 2303 to extract data in a memory (weight memory or input memory) and perform arithmetic.

In some implementations, the arithmetic circuit 2303 includes a plurality of processing units or computing units (PEs) therein.

In some implementations, the operational circuit 2303 is a two-dimensional systolic array. The arithmetic circuit 2303 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.

In some implementations, the arithmetic circuit 2303 is a general-purpose matrix processor.

For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 2303 fetches the data corresponding to the matrix B from the weight memory 2302, and buffers the data in each PE in the arithmetic circuit 2303. The arithmetic circuit 2303 takes the matrix A data from the input memory 2301 and performs a matrix operation with the matrix B, and a partial result or a final result of the obtained matrix is stored in an accumulator (MAC) 2308.
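The accumulation of partial results can be illustrated with a plain Python matrix multiplication. This is only a software sketch of the arithmetic, not a model of the circuit: the loop order and names are assumptions chosen so that each pass over the inner index adds one partial product to the accumulator.

```python
def matmul_accumulate(a, b):
    """Compute C = A x B by accumulating partial products.

    `acc` plays the role of the accumulator: after each pass over the
    inner index k it holds a partial result, and after the last pass it
    holds the final result.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += a[i][k] * b[k][j]
    return acc
```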

The vector calculation unit 2307 may further process the output of the arithmetic circuit 2303, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 2307 may be used for network calculation of non-convolution/non-FC layers in a neural network, such as pooling, batch normalization, local response normalization, and the like.

In some implementations, the vector calculation unit 2307 can store the processed output vector to the unified memory 2306. For example, the vector calculation unit 2307 may apply a non-linear function to the output of the arithmetic circuit 2303, such as a vector of accumulated values, to generate the activation value.

In some implementations, vector calculation unit 2307 generates normalized values, merged values, or both.

In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 2303, for example, for use in subsequent layers in a neural network.
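Two of the post-processing steps mentioned above can be sketched in Python. The text does not name a specific non-linear function, so ReLU is used here purely as an assumed example, and the pooling shown is a one-dimensional max pooling with a hypothetical window parameter.

```python
def relu(values):
    """Apply an assumed non-linear function (ReLU) to accumulated values."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values, window: int):
    """Non-overlapping 1-D max pooling over `values`."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]
```

The activation values produced this way could then serve as inputs to a subsequent layer.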

The unified memory 2306 is used for storing input data and output data.

A direct memory access controller (DMAC) 2305 is configured to transfer the input data in the external memory to the input memory 2301 and/or the unified memory 2306, to store the weight data in the external memory into the weight memory 2302, and to store the data in the unified memory 2306 into the external memory.

A Bus Interface Unit (BIU) 2310 is configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 2309 through a bus.

The instruction fetch memory (instruction fetch buffer) 2309 is connected to the controller 2304 and is configured to store instructions used by the controller 2304.

The controller 2304 is configured to call the instructions cached in the instruction fetch memory 2309, so as to control the operation process of the operation accelerator.

Generally, the unified memory 2306, the input memory 2301, the weight memory 2302, and the instruction fetch memory 2309 are On-Chip (On-Chip) memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.

Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the method flows shown in the above method embodiments are implemented.

The embodiments of the present application further provide a computer program product, where when the computer program product runs on a computer, the method flows shown in the above method embodiments are implemented.

It should be understood that the Processor mentioned in the embodiments of the present Application may be a Central Processing Unit (CPU), and may also be other general purpose processors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Field Programmable Gate Arrays (FPGA) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

It will also be appreciated that the memory referred to in the embodiments of the application may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).

It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.

It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The modules in the device can be merged, divided and deleted according to actual needs.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A neural network computing device is characterized by comprising a system control module, a computing module, a bus control interface module and an internal storage module, wherein the system control module is in communication connection with the computing module;

2. The apparatus of claim 1, wherein the first source image pixel point data is pixel point data of a first source image pixel point in the source image whose ordinate is not greater than a preset threshold, and the second source image pixel point data is pixel point data of a second source image pixel point in the source image whose ordinate is greater than the preset threshold.

3. The apparatus of claim 1 or 2, wherein the computing module comprises:

the coordinate calculation unit is used for calculating pixel point coordinates of the source image according to the pixel point coordinates of the target image and the transformation parameters to obtain pixel point coordinates of the source image;

the address calculation unit is used for calculating to obtain the storage address information according to the pixel point coordinates of the source image and sending the storage address information to the bus control interface module;

and the pixel calculation unit is used for calculating and obtaining the target image pixel point data according to the third source image pixel point data.

4. The apparatus of claim 3, wherein the address calculation unit comprises:

the coordinate judgment subunit is used for judging whether the ordinate of the pixel point coordinate of the source image is greater than the preset threshold value or not, and sending an external read data waiting signal to the coordinate calculation unit under the condition that the ordinate of the pixel point coordinate of the source image is greater than the preset threshold value, wherein the external read data waiting signal is used for indicating the coordinate calculation unit to suspend calculation;

the internal storage address calculation subunit is configured to, when the ordinate of the source image pixel point coordinate is not greater than the preset threshold, calculate, according to the source image pixel point coordinate, to obtain first storage address information, where the first storage address information is storage address information of the third source image pixel point data in the internal storage module;

and the external storage address calculation subunit is used for calculating second storage address information according to the pixel point coordinates of the source image under the condition that the vertical coordinates of the pixel point coordinates of the source image are greater than the preset threshold value, wherein the second storage address information is storage address information of the pixel point data of the third source image in the external storage module.

5. The apparatus according to claim 4, wherein, in obtaining the first storage address information by calculating the pixel coordinates of the source image, the internal storage address calculating subunit is specifically configured to:

if the ordinate and the abscissa of the pixel point coordinate of the source image are both even numbers, determining a first address offset according to the ordinate and the abscissa of the pixel point coordinate of the source image and the width of the source image;

if the ordinate of the pixel coordinates of the source image is even and the abscissa is odd, determining a second address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image;

if the ordinate of the pixel coordinates of the source image is odd and the abscissa is even, determining a third address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image;

and if the ordinate and the abscissa of the pixel coordinates of the source image are both odd numbers, determining a fourth address offset according to the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image.

6. The apparatus according to claim 5, wherein, in determining the first address offset according to the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to:

dividing the width of the source image by 2 and then rounding to obtain a first width value;

if the first width value is an odd number, adding 1 to the first width value to obtain a second width value;

if the first width value is an even number, taking the first width value as the second width value;

dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a first ordinate value;

dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a first abscissa value;

and multiplying the second width value by the first ordinate value, and adding the first abscissa value to obtain the first address offset.

7. The apparatus according to claim 5, wherein, in determining the second address offset according to the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to:

dividing the width of the source image by 2 to obtain a third width value;

dividing the ordinate of the pixel point coordinate of the source image by 2 and then rounding to obtain a second ordinate value;

dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a second abscissa value;

and multiplying the third width value by the second ordinate value, and adding the second abscissa value to obtain the second address offset.

8. The apparatus according to claim 5, wherein, in determining a third address offset according to the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to:

dividing the width of the source image by 2 and then rounding to obtain a fourth width value;

if the fourth width value is an odd number, adding 1 to the fourth width value to obtain a fifth width value;

if the fourth width value is an even number, taking the fourth width value as the fifth width value;

if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a third vertical coordinate value;

if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the third vertical coordinate value;

dividing the third ordinate value by 2 and rounding to obtain a fourth ordinate value;

dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a third abscissa value;

multiplying the fifth width value by the fourth ordinate value, and adding the third abscissa value to obtain a first numerical value;

if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the first numerical value to obtain the third address offset;

and if the abscissa of the pixel point coordinate of the source image is an even number, taking the first numerical value as the third address offset.

9. The apparatus according to claim 5, wherein in determining a fourth address offset according to the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image, the internal storage address calculation subunit is specifically configured to:

adding 1 to the width of the source image to obtain a sixth width value;

if the vertical coordinate of the pixel point coordinate of the source image is an odd number, adding 1 to the vertical coordinate of the pixel point coordinate of the source image to obtain a fifth vertical coordinate value;

if the vertical coordinate of the pixel point coordinate of the source image is an even number, taking the vertical coordinate of the pixel point coordinate of the source image as the fifth vertical coordinate value;

dividing the fifth ordinate value by 2 and rounding to obtain a sixth ordinate value;

dividing the abscissa of the pixel point coordinate of the source image by 2 and then rounding to obtain a fourth abscissa value;

multiplying the sixth width value by the sixth ordinate value, and adding the fourth abscissa value to obtain a second numerical value;

if the abscissa of the pixel point coordinate of the source image is an odd number, adding 1 to the second numerical value to obtain the fourth address offset;

and if the abscissa of the pixel point coordinate of the source image is an even number, taking the second numerical value as the fourth address offset.

10. The apparatus of claim 5, wherein the internal storage module comprises:

the first storage unit is used for storing source image pixel point data of even-numbered row and even-numbered column pixel points in the first source image pixel point data;

the second storage unit is used for storing source image pixel point data of even-numbered row and odd-numbered column pixel points in the first source image pixel point data;

the third storage unit is used for storing source image pixel point data of odd-numbered row and even-numbered column pixel points in the first source image pixel point data;

and the fourth storage unit is used for storing source image pixel point data of odd-numbered row and odd-numbered column pixel points in the first source image pixel point data.

11. The apparatus of claim 10, wherein the bus control interface module is specifically configured to:

reading the third source image pixel point data from the first storage unit according to the first address offset;

and/or reading the third source image pixel point data from the second storage unit according to the second address offset;

and/or reading the third source image pixel point data from the third storage unit according to the third address offset;

and/or reading the third source image pixel point data from the fourth storage unit according to the fourth address offset.

12. The apparatus according to claim 4, wherein the source image pixel coordinates include coordinates of a first source image pixel point, coordinates of a second source image pixel point, coordinates of a third source image pixel point, and coordinates of a fourth source image pixel point, the vertical coordinates of the first source image pixel point and the second source image pixel point are the same, the vertical coordinates of the third source image pixel point and the fourth source image pixel point are the same, the horizontal coordinates of the first source image pixel point and the third source image pixel point are the same, the horizontal coordinates of the second source image pixel point and the fourth source image pixel point are the same, and in terms of obtaining second storage address information by calculation according to the source image pixel point coordinates, the external storage address calculation subunit is configured to:

determining a first storage address of the first source image pixel point according to an image storage starting address, the width of the source image, the vertical coordinate of the first source image pixel point and the horizontal coordinate of the first source image pixel point, and taking the storage address immediately following the first storage address as a second storage address of the second source image pixel point;

determining a third storage address of the third source image pixel point according to the image storage starting address, the width of the source image, the vertical coordinate of the third source image pixel point and the horizontal coordinate of the third source image pixel point, and taking the storage address immediately following the third storage address as a fourth storage address of the fourth source image pixel point.

13. The apparatus of claim 12, wherein the first memory address is determined according to the following equation:

the first storage address = the image storage starting address + the vertical coordinate of the first source image pixel point × the width of the source image + the horizontal coordinate of the first source image pixel point.

14. The apparatus of claim 12, wherein the third memory address is determined according to the following equation:

and the third storage address is equal to the image storage starting address + (the vertical coordinate of the third source image pixel point +1) × the width of the source image + the horizontal coordinate of the third source image pixel point.
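The address calculation of claims 12 to 14 can be sketched as follows. This is a minimal illustration, not the claimed circuit: the function and parameter names are hypothetical, the first address is read as the row-major form (starting address + ordinate × width + abscissa) that the inputs listed in claim 12 suggest, and the `+ 1` in the third address is copied from claim 14 as written without further interpretation.

```python
def external_addresses(base: int, width: int,
                       y1: int, x1: int,
                       y3: int, x3: int) -> tuple[int, int, int, int]:
    """Return (first, second, third, fourth) storage addresses.

    base  : image storage starting address
    width : width of the source image, in pixels
    y1,x1 : ordinate / abscissa of the first source image pixel point
    y3,x3 : ordinate / abscissa of the third source image pixel point
    """
    first = base + y1 * width + x1        # assumed row-major reading of claim 13
    second = first + 1                    # address immediately after the first
    third = base + (y3 + 1) * width + x3  # literal reading of claim 14
    fourth = third + 1                    # address immediately after the third
    return first, second, third, fourth


if __name__ == "__main__":
    print(external_addresses(base=1000, width=8, y1=2, x1=3, y3=3, x3=3))
```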

15. The apparatus of claim 12, wherein the bus control interface module is specifically configured to:

performing a single read of pixel point data from the external storage module according to the first storage address and the second storage address to obtain pixel point data of the first source image pixel point and pixel point data of the second source image pixel point;

and performing a single read of pixel point data from the external storage module according to the third storage address and the fourth storage address to obtain pixel point data of the third source image pixel point and pixel point data of the fourth source image pixel point.
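Because the second address directly follows the first (and the fourth directly follows the third), each pair of pixels can be fetched in one access. The sketch below models the external storage module as a plain Python sequence, which is an assumption made only for illustration; `read_pair` is a hypothetical helper, not an interface defined by the application.

```python
from typing import Sequence


def read_pair(memory: Sequence[int], addr: int, next_addr: int) -> tuple[int, int]:
    """Fetch two horizontally adjacent pixels with a single two-element read."""
    if next_addr != addr + 1:
        raise ValueError("the two addresses must be consecutive")
    burst = memory[addr:addr + 2]  # one access covering both addresses
    if len(burst) != 2:
        raise IndexError("read runs past the end of the external storage")
    return burst[0], burst[1]


if __name__ == "__main__":
    external = list(range(100, 120))  # stand-in for the external storage module
    print(read_pair(external, 5, 6))
```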

16. The apparatus of any one of claims 1-15, further comprising:

and the output cache module is used for caching the pixel point data of the target image.

17. The apparatus of any one of claims 1-15, wherein the system control module is further communicatively coupled to the bus control interface module, and the system control module is further configured to send a read data start signal to the bus control interface module.

18. A data reading method, applied to a neural network computing device, wherein the neural network computing device comprises an internal storage module and is in communication connection with an external storage module, first source image pixel point data is stored in the internal storage module, second source image pixel point data is stored in the external storage module, the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the method comprises the following steps:

determining pixel point coordinates of a source image;

19. The method of claim 18, wherein said determining source image pixel point coordinates comprises:

acquiring target image pixel point coordinates;

and calculating the source image pixel point coordinates according to the target image pixel point coordinates.
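The claim does not state how the source coordinates are derived from the target coordinates. Purely as an assumption, the sketch below uses a simple scale-ratio mapping of the kind common in image resizing, and returns the four source pixel points in the 2x2 arrangement described elsewhere in the claims (first and second share an ordinate, first and third share an abscissa). All names are hypothetical, and the source image is assumed to be at least 2 pixels wide and high.

```python
def source_coords(tx: int, ty: int,
                  src_w: int, src_h: int,
                  dst_w: int, dst_h: int) -> list[tuple[int, int]]:
    """Map a target pixel (tx, ty) to four source pixels as (x, y) pairs.

    Assumed mapping: integer scale ratio, clamped so that the 2x2 block
    stays inside the source image.
    """
    x0 = min(tx * src_w // dst_w, src_w - 2)
    y0 = min(ty * src_h // dst_h, src_h - 2)
    # Order: first, second (same ordinate), third (same abscissa as first), fourth.
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]


if __name__ == "__main__":
    print(source_coords(5, 3, src_w=8, src_h=6, dst_w=16, dst_h=12))
```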

20. The method of claim 18 or 19, wherein said reading third source image pixel point data from said internal storage module based on said source image pixel point coordinates comprises:

calculating first storage address information according to the source image pixel point coordinates;

and reading third source image pixel point data from the internal storage module according to the first storage address information.

21. The method of claim 20, wherein said calculating first storage address information from said source image pixel point coordinates comprises:

22. The method of claim 21, wherein determining a first address offset based on the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image comprises:

23. The method of claim 21, wherein determining a second address offset based on the ordinate and abscissa of the source image pixel point coordinates and the width of the source image comprises:

dividing the width of the source image by 2 to obtain a third width value;

24. The method of claim 21, wherein determining a third address offset based on the ordinate and the abscissa of the pixel coordinates of the source image and the width of the source image comprises:

25. The method of claim 21, wherein determining a fourth address offset based on the ordinate and abscissa of the pixel coordinates of the source image and the width of the source image comprises:

adding 1 to the width of the source image to obtain a sixth width value;

26. The method of claim 21, wherein the internal storage module comprises a first storage unit, a second storage unit, a third storage unit, and a fourth storage unit, and wherein reading third source image pixel point data from the internal storage module according to the first storage address information comprises:

reading the third source image pixel point data from the second storage unit according to the second address offset;

reading the third source image pixel point data from the third storage unit according to the third address offset;

and reading the third source image pixel point data from the fourth storage unit according to the fourth address offset.

27. The method of claim 18 or 19, wherein said reading said third source image pixel point data from said external storage module based on said source image pixel point coordinates comprises:

calculating second storage address information according to the source image pixel point coordinates;

and reading the third source image pixel point data from the external storage module according to the second storage address information.

28. The method according to claim 27, wherein the source image pixel coordinates include coordinates of a first source image pixel, coordinates of a second source image pixel, coordinates of a third source image pixel and coordinates of a fourth source image pixel, the vertical coordinates of the first source image pixel and the second source image pixel are the same, the vertical coordinates of the third source image pixel and the fourth source image pixel are the same, the horizontal coordinates of the first source image pixel and the third source image pixel are the same, the horizontal coordinates of the second source image pixel and the fourth source image pixel are the same, and the calculating according to the source image pixel coordinates obtains second storage address information, including:

29. The method of claim 28, wherein the first memory address is determined according to the following equation:

30. The method of claim 28, wherein the third memory address is determined according to the following equation:

31. The method of claim 28, wherein said reading said third source image pixel point data from said external storage module based on said second storage address information comprises:

32. A data storage method, applied to a neural network computing device, wherein the neural network computing device comprises an internal storage module and is in communication connection with an external storage module, and the method comprises the following steps:

acquiring pixel point data of a source image;

33. The method as claimed in claim 32, wherein the internal storage module comprises a first storage unit, a second storage unit, a third storage unit and a fourth storage unit, and the storing the pixel point data of the first source image pixel point with the ordinate not greater than the preset threshold in the source image in the internal storage module comprises:

storing pixel point data of pixel points of a first source image with even ordinate and even abscissa in a first storage unit;

storing pixel point data of pixel points of a first source image with an even ordinate and an odd abscissa in a second storage unit;

storing pixel point data of pixel points of a first source image with odd ordinate and even abscissa in a third storage unit;

and storing the pixel point data of the pixel points of the first source image with odd ordinate and odd abscissa in a fourth storage unit.
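The four storing steps of claim 33 amount to a partition of the first source image pixel points by the parity of their coordinates. A minimal sketch, assuming the pixel data is available as a list of rows and modelling each storage unit as a Python list (both are illustrative assumptions, and `partition_into_units` is a hypothetical name):

```python
def partition_into_units(rows: list[list[int]]) -> dict[int, list[int]]:
    """Distribute pixel data over four storage units by coordinate parity.

    rows[y][x] is the data of the pixel with ordinate y and abscissa x.
    Unit 1: even y, even x   Unit 2: even y, odd x
    Unit 3: odd y, even x    Unit 4: odd y, odd x
    """
    units: dict[int, list[int]] = {1: [], 2: [], 3: [], 4: []}
    for y, row in enumerate(rows):
        for x, value in enumerate(row):
            units[1 + 2 * (y % 2) + (x % 2)].append(value)
    return units


if __name__ == "__main__":
    pixels = [[0, 1, 2, 3],
              [10, 11, 12, 13],
              [20, 21, 22, 23]]
    for unit, data in partition_into_units(pixels).items():
        print(unit, data)
```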

34. The method of claim 33, wherein storing pixel data of pixel points of the first source image with even ordinate and even abscissa in the first storage unit comprises:

arranging the first source image pixel points with even ordinate and even abscissa in ascending order of ordinate to obtain a first arrangement sequence, wherein the first source image pixel points with the same ordinate in the first arrangement sequence are arranged in ascending order of abscissa;

and storing the pixel point data of the pixel points of the first source image with even ordinate and even abscissa in the first storage unit according to the first arrangement sequence.
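The first arrangement sequence fixes where each even-ordinate, even-abscissa pixel lands inside the first storage unit. The sketch below derives a closed-form position from that ordering alone, assuming the data is stored contiguously starting at offset 0; it is not a reproduction of the address offset formulas recited in the reading method claims, and the function names are hypothetical.

```python
def first_arrangement(width: int, height: int) -> list[tuple[int, int]]:
    """List (y, x) of even/even pixels, ordered by ordinate, then abscissa."""
    return [(y, x) for y in range(0, height, 2) for x in range(0, width, 2)]


def first_unit_offset(y: int, x: int, width: int) -> int:
    """Position of even/even pixel (y, x) within the first arrangement sequence."""
    if y % 2 or x % 2:
        raise ValueError("pixel does not belong to the first storage unit")
    even_cols_per_row = (width + 1) // 2  # number of even abscissas in one row
    return (y // 2) * even_cols_per_row + x // 2


if __name__ == "__main__":
    order = first_arrangement(width=5, height=4)
    # The closed form agrees with the position in the explicit ordering.
    print(all(first_unit_offset(y, x, 5) == i for i, (y, x) in enumerate(order)))
```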

35. The method of claim 32, wherein the storing pixel data of a pixel point of a second source image having a vertical coordinate greater than the preset threshold in the source image in an external storage module comprises:

arranging the second source image pixel points with ordinate greater than the preset threshold in the source image in ascending order of ordinate to obtain a second arrangement sequence, wherein the second source image pixel points with the same ordinate in the second arrangement sequence are arranged in ascending order of abscissa;

and storing the pixel point data of the pixel points of the second source image in the external storage module according to the second arrangement sequence.
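The split between internal and external storage, together with the second arrangement sequence, can be sketched as follows. The image is assumed to be a list of rows and the external storage module is modelled as a flat list; `split_by_threshold` is a hypothetical helper used only for illustration.

```python
def split_by_threshold(image: list[list[int]],
                       threshold: int) -> tuple[list[list[int]], list[int]]:
    """Split pixel data by ordinate against a preset threshold.

    Rows with ordinate <= threshold are returned as rows (destined for the
    internal storage module). Rows with ordinate > threshold are flattened in
    ascending ordinate, then ascending abscissa (the second arrangement
    sequence), destined for the external storage module.
    """
    internal_rows = [row for y, row in enumerate(image) if y <= threshold]
    external_flat = [value
                     for y, row in enumerate(image) if y > threshold
                     for value in row]
    return internal_rows, external_flat


if __name__ == "__main__":
    image = [[0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]]
    internal, external = split_by_threshold(image, threshold=1)
    print(internal)
    print(external)
```

Under these assumptions, a pixel with ordinate y greater than the threshold and abscissa x sits at position (y - threshold - 1) × width + x in the flattened external data.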

36. A data reading device, applied to a neural network computing device, wherein the neural network computing device comprises an internal storage module and is in communication connection with an external storage module, first source image pixel point data is stored in the internal storage module, second source image pixel point data is stored in the external storage module, the first source image pixel point data and the second source image pixel point data form pixel point data of a source image, and the data reading device comprises:

37. A data storage device for use in a neural network computing device, the neural network computing device including an internal storage module, the neural network computing device communicatively coupled to an external storage module, the data storage device comprising:

the acquisition unit is used for acquiring pixel point data of a source image;

38. A neural network computing device, comprising a processor, memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 18-31.

39. A neural network computing device, comprising a processor, memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 32-35.

40. A chip, comprising: a processor for calling and running a computer program from a memory so that a device on which the chip is installed performs the method of any of claims 18-31 or 32-35.

41. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 18-31 or 32-35.

42. A computer program for causing a computer to perform the method of any one of claims 18-31 or 32-35.