US7978905B2 - Calculation processing apparatus and control method thereof - Google Patents


Publication number
US7978905B2
US7978905B2 (application US12/602,628; US60262808A)
Authority
US
United States
Prior art keywords
calculation
processing
processing node
unit
memory
Prior art date
Legal status
Expired - Fee Related
Application number
US12/602,628
Other languages
English (en)
Other versions
US20100215253A1 (en)
Inventor
Takahisa Yamamoto
Masami Kato
Yoshinori Ito
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Assigned to CANON KABUSHIKI KAISHA. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ITO, YOSHINORI; KATO, MASAMI; YAMAMOTO, TAKAHISA
Publication of US20100215253A1
Application granted
Publication of US7978905B2

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Definitions

  • the present invention relates to a hierarchical calculation processing method and apparatus, which are applied to a pattern identification apparatus, pattern identification system, hierarchical filter calculation processing apparatus, and the like.
  • the neural network is often implemented as software which runs on a microprocessor, and is provided as application software for a personal computer, workstation, and the like.
  • FIG. 14 is a schematic diagram showing an example of the arrangement of an image processing apparatus using a general layer-interconnected neural network.
  • reference numeral 21 denotes detection target data, for example, raster-scanned image data.
  • Reference numeral 22 denotes a calculation unit which detects a predetermined object from an image, and comprises a neural network of three layers in the example of FIG. 14 .
  • Reference numeral 23 denotes an output data plane corresponding to the calculation result.
  • the calculation unit 22 executes processing while scanning and referring to a predetermined image area 24 , thereby detecting a detection target which exists in the image.
  • The output data plane 23 is an image plane having the same size as the detection target image data 21, and stores the detection outputs obtained when the calculation unit 22 processes all the areas of the image data 21 while scanning them. Since the calculation unit 22 outputs a large value at a position where a target is detected, the position of the target in the image plane can be recognized by scanning the output data plane 23.
  • reference numerals 25 , 26 , and 27 denote layers of the neural network, and a predetermined number of neurons 28 exist in each layer.
  • the first layer 25 has the same number of nodes, that is, neurons 28 as the number of pixels of a reference image. Respective neurons are feedforward-interconnected via predetermined weighting coefficients.
  • FIG. 15 shows the arrangement of one neuron 28 .
  • Reference numerals in_1 to in_n denote input values to this processing node, which are detection target image data in the first layer, and neuron output values of the previous layer in the second and subsequent layers.
  • Multipliers 31a, 31b, ..., 31n output products obtained by multiplying the output values of the respective previous layer neurons by coefficients w_1 to w_n obtained by learning.
  • An accumulation adder 32 accumulates the products from the multipliers 31a, 31b, ..., 31n.
  • A nonlinear transformation processing unit 33 nonlinearly transforms the accumulated sum of the accumulation adder 32 using a logistic function, hyperbolic tangent function (tanh function), or the like, and outputs the result as a detection result “out”.
  • The weighting coefficients w_1 to w_n required for the respective neurons are determined in advance in accordance with a detection target, using a generally known learning algorithm such as back propagation.
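The multiply-accumulate-and-transform behavior of the neuron in FIG. 15 can be sketched as follows. This is a minimal illustration, assuming a logistic nonlinearity (one of the options the text names); the function name is ours, not the patent's.

```python
import math

def neuron(inputs, weights):
    """One neuron as in FIG. 15: multiply each input in_1..in_n by its
    learned weighting coefficient w_1..w_n, accumulate the products
    (multipliers 31a..31n plus accumulation adder 32), then apply a
    nonlinear transformation (logistic function here)."""
    acc = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-acc))  # detection result "out"
```

With a net input of zero the logistic function outputs 0.5, which makes the zero point of the accumulated sum easy to check.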
  • Japanese Patent No. 2679730 discloses an architecture of a hierarchical structure neural network which implements a multilayered structure by time division multiplexing of single-layer analog neural network hardware.
  • Japanese Patent Laid-Open No. 03-055658 discloses an implementation method using digital hardware.
  • Convolutional Neural Networks (CNN)
  • FIG. 16 shows the logical network composition as an example of simple CNN.
  • FIG. 16 shows an example of three-layer CNN in which the number of features of a first layer 406 is 3, that of a second layer 410 is 2, and that of a third layer 411 is 1.
  • Reference numeral 401 denotes image data, which corresponds to raster-scanned image data.
  • Reference numerals 403 a to 403 c denote feature planes of the first layer 406 .
  • the feature plane is an image data plane indicating the calculation result while scanning data of the previous layer using a predetermined feature extraction filter (the accumulated sum of convolution calculations and nonlinear processing). Since the feature plane is the detection result for the raster-scanned image data, the detection result is also expressed by a plane.
  • the feature planes 403 a to 403 c are generated from the image data 401 by corresponding feature extraction filters.
  • the feature planes 403 a to 403 c are generated by two-dimensional convolution filter calculations corresponding to convolution filter kernels 404 a to 404 c , and the nonlinear transformation of the calculation results.
  • reference numeral 402 denotes a reference image area required for the convolution calculations.
  • A convolution filter calculation having a kernel size (the length in the horizontal direction and the height in the vertical direction) of 11×11 processes data by a product-sum calculation given by:
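The product-sum referred to above (the equation itself is not reproduced in this excerpt) can be sketched as follows. The anchoring of the 11×11 reference area with the output position at its top-left corner is an assumption for illustration; the patent's exact coordinate convention may differ.

```python
def conv_at(inp, weight, x, y, kw=11, kh=11):
    """Product-sum of one kw x kh convolution filter at output
    position (x, y): every pixel of the reference image area is
    multiplied by the corresponding filter coefficient and the
    products are accumulated."""
    return sum(inp[y + j][x + i] * weight[j][i]
               for j in range(kh) for i in range(kw))
```

For an 11×11 kernel this accumulates 121 products per output pixel, which is why large-kernel CNN calculations are product-sum dominated.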
  • Reference numerals 404 a to 404 c denote convolution filter kernels having different coefficients. Also, the convolution filter kernels have different sizes depending on the feature planes. The convolution filter kernels will be referred to as convolution kernels hereinafter.
  • the CNN calculations generate the feature plane by repeating the product-sum calculation while scanning a plurality of filter kernels for respective pixels, and by nonlinearly transforming the final product-sum result.
  • the number of filter kernels is 1 ( 404 a ).
  • the calculation results of three convolution filters corresponding to convolution kernels 409 a to 409 c or 409 d to 409 f are accumulated.
  • the convolution kernels 409 a to 409 f have different filter coefficients.
  • The convolution kernels 409a to 409c and the convolution kernels 409d to 409f have different kernel sizes, as shown in FIG. 16.
  • the feature plane 407 a can be generated by accumulating the outputs from the convolution kernels 409 a to 409 c , and finally executing the nonlinear transformation processing of the result.
  • The basic arrangement of the accumulation of convolution kernels (convolution filters) and the nonlinear transformation processing is the same as that of the neuron shown in FIG. 15. That is, the coefficients of the convolution kernel correspond to the weighting coefficients w_1 to w_n.
  • Upon interconnecting to the feature planes of a plurality of previous layers, like the feature planes 407a, 407b, and 408, the accumulation adder 32 accumulates a plurality of convolution kernel calculation results. That is, the total number of interconnections corresponds to the convolution kernel size × the number of features of the previous layer.
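The accumulation over a plurality of previous-layer feature planes followed by nonlinear transformation can be sketched as below. The tanh function is used as one of the nonlinearities the text mentions; all names are illustrative.

```python
import math

def feature_pixel(prev_planes, kernels, x, y):
    """One output pixel of a feature plane such as 407a: convolve each
    previous-layer feature plane with its own convolution kernel
    (e.g. 409a to 409c), accumulate all the product-sums, then apply
    the nonlinear transformation to the total."""
    acc = 0.0
    for plane, kernel in zip(prev_planes, kernels):
        kh, kw = len(kernel), len(kernel[0])
        acc += sum(plane[y + j][x + i] * kernel[j][i]
                   for j in range(kh) for i in range(kw))
    return math.tanh(acc)
```

Note that the nonlinear transformation is applied once, to the accumulated total, not to each kernel's result separately, matching the description above.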
  • FIG. 17 is a view for explaining graphic detection processing in the CNN calculations.
  • Reference numerals 51 a to 51 c denote convolution kernels which illustrate feature extraction targets of the first layer, and are learned to respectively extract a horizontal edge and oblique edges.
  • Reference numerals 52 a and 52 b denote graphics determined based on the extraction results of a plurality of first layer features (primary features) and their spatial allocation relationships.
  • Reference numeral 53 denotes a graphic to be finally extracted (tertiary feature in this example). The graphic 53 is determined based on the extraction results of a plurality of second layer features (secondary features) and their spatial allocation relationship.
  • the respective filter coefficients of the convolution kernels are determined for respective features by learning using a prevalent method such as perceptron learning, back propagation learning, or the like.
  • a filter kernel having a size as large as 10 ⁇ 10 or more is normally used.
  • convolution kernel sizes are different for respective features.
  • Aside from input and output image buffers, a buffer memory that holds the neuron outputs suffices. That is, if a memory having a predetermined number of bits for as many entries as the number of neurons is provided, the desired calculation processing can be executed.
  • the present invention has been made to solve such problems, and one typical embodiment provides a method and circuit, which implement, using a small memory size, hierarchical calculation processing based on the spatial allocation relationship such as the CNN calculations and the like.
  • a calculation processing apparatus which executes calculation processing based on a network composed by hierarchically connecting a plurality of processing nodes, the apparatus comprising:
  • memory control means for assigning a partial area of a memory to each of the plurality of processing nodes, storing a calculation result of each processing node in a storable area of the partial area assigned to that processing node, and setting, as storable areas, areas that store the calculation results whose reference by all processing nodes connected to a subsequent stage of that processing node is complete;
  • designation means for designating a processing node, which is to execute calculation processing, of the plurality of processing nodes;
  • determination means for determining, based on storage states of calculation results in partial areas of the memory assigned to the processing node designated by the designation means and to processing nodes connected to a previous stage of the designated processing node, whether or not to execute a calculation of the designated processing node;
  • execution means for, when the determination means determines that the calculation is to be executed, controlling to execute calculation processing corresponding to the designated processing node.
  • a method of controlling a calculation processing apparatus which executes calculation processing based on a network composed by hierarchically connecting a plurality of processing nodes, the method comprising:
  • FIG. 1 is a block diagram for explaining an example of the arrangement of an image processing apparatus which uses a hierarchical calculation processing apparatus according to an embodiment
  • FIG. 2 is a block diagram showing an example of the arrangement of the hierarchical calculation processing apparatus according to the first embodiment
  • FIG. 3 is a view for explaining the logical connection configuration of processing nodes
  • FIGS. 4A and 4B are views for explaining a unit calculation of each processing node according to the embodiment.
  • FIG. 5 is a view showing an example associated with memory assignments according to the embodiment.
  • FIG. 6 is a view showing a partial network extracted from a hierarchical network shown in FIG. 3 to have the fourth processing node as the center;
  • FIG. 7 is a view showing partial memory assignments extracted from those shown in FIG. 5 ;
  • FIGS. 8A, 8B and 8C show an example of the data configuration of a network composition information table
  • FIG. 9 is a view for explaining read processes of calculation target pixel data
  • FIG. 10 is a block diagram for explaining an example of the arrangement of a calculation unit
  • FIGS. 11A and 11B are a flowchart for explaining the operation of a unit calculation execution determination unit
  • FIGS. 12A and 12B are a flowchart for explaining the operation of the unit calculation execution determination unit
  • FIG. 13 is a block diagram for explaining an example of the arrangement of a hierarchical calculation processing apparatus according to the second embodiment
  • FIG. 14 is a view for explaining an example of the composition of a layer-interconnected neural network
  • FIG. 15 is a view showing an example of the arrangement of a neuron
  • FIG. 16 is a view for explaining an example of the network composition of Convolutional Neural Networks (CNN).
  • FIG. 17 is a view for explaining an example of feature extraction of the CNN.
  • FIG. 1 is a block diagram showing an example of the arrangement of a pattern detection apparatus, which comprises a hierarchical calculation processing circuit according to the first embodiment.
  • the pattern detection apparatus has a function of detecting a specific object (image pattern) in image data.
  • reference numeral 61 denotes an image input unit, which comprises an optical system and a photoelectric conversion device such as a CCD (Charge-Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, or the like.
  • the image input unit 61 includes a driver circuit for controlling the CCD or CMOS sensor, an AD converter, a signal processing circuit for controlling various kinds of image correction, a frame buffer, and the like.
  • Reference numeral 62 denotes a pre-processing unit, which executes various kinds of pre-processing required to efficiently execute detection processing of graphics and the like from an image. More specifically, the pre-processing unit 62 processes image data conversion such as color conversion processing, contrast correction processing, and the like by hardware.
  • a CNN processing unit 63 is a feature detection processing unit including a hierarchical calculation processing apparatus. Details of the CNN processing unit 63 will be described later with reference to FIG. 2 .
  • Reference numeral 66 denotes a DMAC (Direct Memory Access Controller), which controls data transfer between the respective processing units on an image bus 64 and that between devices on the image bus 64 and a RAM 70 on a CPU bus 67 .
  • Reference numeral 65 denotes a bridge, which provides a bridge function between the image bus 64 and CPU bus 67 .
  • Reference numeral 68 denotes a CPU, which controls the operation of this apparatus as a whole.
  • Reference numeral 69 denotes a ROM (Read Only Memory), which stores instructions that specify the operations of the CPU 68 and parameter data required for various calculations. For example, the ROM 69 stores weighting coefficients, network interconnection information, sequence information, and the like required for the operation of the CNN processing unit 63 .
  • Reference numeral 70 denotes a RAM (Random Access Memory) which functions as a main memory necessary for the operation of the CPU 68 and comprises a memory having a relatively large capacity such as a DRAM (Dynamic RAM) or the like.
  • the CPU 68 can access various processing units on the image bus 64 via the bridge 65 . By isolating the image bus 64 and CPU bus 67 , the operations of the hardware components 61 to 63 and that of the CPU 68 can be executed simultaneously, that is, parallelly.
  • FIG. 2 is a block diagram showing an example of the arrangement of the hierarchical calculation processing apparatus in the CNN processing unit 63 of the first embodiment.
  • the hierarchical calculation processing apparatus shown in FIG. 2 is used to execute hierarchical calculations shown in, for example, FIG. 3 .
  • a processing node indicates a block which executes processing for obtaining a convolution calculation result from a convolution calculation target image and convolution kernels.
  • the zeroth processing node is provided in FIG. 3 for the sake of convenience. However, the zeroth processing node does not particularly execute any processing, and an input image is input to the first to third processing nodes.
  • the fourth processing node in FIG. 3 executes convolution calculations by applying convolution kernels having different coefficients to the outputs from the first to third processing nodes. Then, the fourth processing node adds the respective convolution calculation results, and executes nonlinear transformation to obtain a calculation result. Furthermore, the calculation result of the fourth processing node is input to the sixth and seventh processing nodes.
  • The hierarchical calculation processing apparatus shown in FIG. 2 is used in a time-sharing manner among the processing nodes, thus executing the calculations specified in the respective processing nodes.
  • The CNN calculations are executed such that the calculation specified in the first processing node is made using the hierarchical calculation processing apparatus, and the calculation specified in the second processing node is then made. That is, a plurality of processing nodes which compose the CNN exist to form a logical network, but physically only one hierarchical calculation processing apparatus exists to execute the calculations specified in the processing nodes.
  • a plurality of hierarchical calculation processing apparatuses may be configured to be used.
  • reference numeral 114 denotes a CPU bus access control unit, which is a bus interface required for the CPU 68 to access various registers and a memory 104 in the CNN processing unit 63 .
  • various setting data such as an address calculation parameter storage table 107 in a network composition management unit 102 , weighting coefficient set 1205 (to be described later with reference to FIG. 10 ) in a calculation unit 106 , and the like are written via that interface.
  • a sequence control unit 100 outputs sequence instruction information to a unit calculation execution unit 101 in accordance with calculation order information set in advance.
  • the hierarchical calculation processing apparatus executes calculations specified in the respective processing nodes in a time-sharing fashion. Therefore, the sequence control unit 100 controls the order of calculations specified in the respective processing nodes by the unit calculation execution unit 101 .
  • the sequence control unit 100 instructs to cyclically execute all the processing nodes which compose the hierarchical calculation network. For example, upon execution of the CNN shown in FIG. 3 by the hierarchical calculation processing apparatus of this embodiment, the sequence control unit 100 instructs the unit calculation execution unit 101 to cyclically execute the respective processing nodes like:
  • the unit calculation execution unit 101 executes the calculation specified in the instructed processing node in accordance with the instruction from the sequence control unit 100 .
  • A unit of calculation execution (to be referred to as a unit calculation hereinafter) is set in advance.
  • the calculation specified in each processing node includes convolution calculations, their addition, and nonlinear transformation of the result, and a series of calculations are executed for the entire input image (entire input data). Note that the addition of the convolution calculation results is executed when convolution calculations are executed for outputs from a plurality of processing nodes like the fourth to eighth processing nodes. Therefore, after the calculations specified in the respective processing nodes are made, the calculation results define a two-dimensional image.
  • the unit calculation here means a calculation for outputting calculation results for one row in the horizontal direction (or for one column in the vertical direction) in the series of calculations, and by repeating this unit calculation, the calculations specified in the respective processing nodes are carried out.
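A unit calculation, producing one horizontal row of results from a unit calculation target image area whose height equals the kernel's vertical size, can be sketched as follows. This is illustrative Python; the surrounding pixels, which the text says may for example be filled with a default value, are simply omitted here.

```python
def unit_calculation(target_area, kernel):
    """One unit calculation: produce one horizontal row of convolution
    results from a unit calculation target image area (area 605 in
    FIG. 4A) whose height equals the kernel's vertical size.
    Output positions lacking a full reference area are skipped."""
    kh, kw = len(kernel), len(kernel[0])
    width = len(target_area[0])
    row = []
    for x in range(width - kw + 1):
        row.append(sum(target_area[j][x + i] * kernel[j][i]
                       for j in range(kh) for i in range(kw)))
    return row
```

Repeating this while shifting the target area down one line at a time, as FIGS. 4A and 4B illustrate, covers the entire calculation target image.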
  • FIGS. 4A and 4B are views for explaining unit calculations executed by the processing nodes.
  • FIGS. 4A and 4B show a case in which a convolution calculation is made for a calculation output image (or an input image to the network) as a calculation target image (in case of the first to third processing nodes shown in FIG. 3 ), and nonlinear conversion is omitted.
  • reference numeral 601 denotes a calculation target image, in which one minimum box indicates a pixel of an input image indicated in a raster scan order or a calculation result pixel in the processing node of the previous layer (input (x, y), x: a horizontal position, y: a vertical position).
  • Reference numeral 602 denotes a calculation result image, in which one minimum box indicates a calculation result pixel in the raster-scan order (output (x, y), x: a horizontal position, y: a vertical position).
  • A reference image area 603 (the area in the bold frame) is an area of the reference image upon processing the convolution calculations at position output(6, 7). Note that the reference image area 603 in FIG. 4A indicates a case in which the convolution kernel size is defined by “11” in the horizontal direction and “13” in the vertical direction.
  • An area 604 in the bold frame in the calculation result image 602 indicates a result area obtained when the unit calculation (calculations for one row in the horizontal direction) is made for the calculation target image 601 .
  • Pixels in cross-hatched areas in the result area 604 are those in surrounding areas (areas that do not undergo any calculation) which are generated depending on the convolution kernel size. How to handle these surrounding areas (to delete, to embed a default value, or the like) in the hierarchical processing is not an essential matter in the present invention. In this case, for example, assume that a default value is embedded.
  • an area 605 having a horizontal size which is at least equal to the calculation target image and a vertical size which is equal to that of the convolution kernel is required as a required area of the calculation target image. That is, data of this area 605 serve as processing target data of the unit calculation by the processing node. For the sake of simplicity, this area 605 will be referred to as a unit calculation target image area hereinafter.
  • The convolution calculations can be made for the entire area of the calculation target image 601 by executing the unit calculation indicated by the result area 604 while shifting the unit calculation target area 605. Note that FIG. 4B shows a case in which the unit calculation is made for an image area 610 as a unit calculation target when the unit calculation target image area is shifted by one pixel (one horizontal line) from the state in FIG. 4A.
  • A result area 611 is also shifted one pixel down from the result area 604.
  • Whether or not a certain unit calculation can be executed depends on whether or not the pixel data of the image area serving as the unit calculation target of that unit calculation have been calculated by the processing node of the previous layer and the results have been output.
  • Upon completion of the unit calculations designated by the sequence control unit 100, the unit calculation execution unit 101 notifies the sequence control unit 100 of completion of the unit calculations (unit calculation completion notification).
  • The sequence control unit 100 instructs the unit calculation execution unit 101 to execute calculations specified in the first processing node as sequence instruction information. After that, the sequence control unit 100 cyclically updates the instruction like “calculation specified in the second processing node → . . . → that specified in the eighth processing node → that specified in the first processing node” every time it receives a unit calculation completion notification from the unit calculation execution unit 101.
  • A unit calculation execution determination unit 105 determines whether or not the instructed unit calculation can be executed. Note that the operation and determination of this unit calculation execution determination unit 105 will be described later; the unit 105 uses information indicating whether or not the pixel data of the image area as the target of that unit calculation are available as one criterion.
  • the unit calculation execution unit 101 executes the calculation specified in the processing node instructed by the instruction information for the unit calculation (for example, for one row in the horizontal direction).
  • Upon completion of the unit calculation, the unit 101 notifies the sequence control unit 100 of completion of the unit calculation.
  • When the determination unit 105 determines that the unit calculation cannot be executed, the unit calculation execution unit 101 skips the corresponding unit calculation, and notifies the sequence control unit 100 of completion of the unit calculation.
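The cyclic instruct-execute-or-skip behavior described above can be sketched as a round-robin loop. This is a hypothetical simplification: the real determination unit 105 applies richer criteria than the single predicate used here, and all names are ours.

```python
def run_sequence(nodes, can_execute, execute, total_units):
    """Round-robin sequencing: cycle through the processing nodes;
    each instructed unit calculation is either executed or skipped
    when its target data are not yet available. A skip still counts
    as a completion notification, so the cycle always advances.
    Returns the order in which unit calculations actually ran."""
    executed = []
    idx = 0
    while len(executed) < total_units:
        node = nodes[idx % len(nodes)]
        if can_execute(node):   # stand-in for determination unit 105
            execute(node)
            executed.append(node)
        idx += 1                # executed or skipped: advance cycle
    return executed
```

A node whose input data lag behind is simply revisited on a later pass of the cycle, which is what lets one physical calculation unit serve the whole logical network.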
  • FIG. 5 illustrates a state in which the memory 104 is divided into the partial areas upon execution of the hierarchical calculations shown in FIG. 3 .
  • the unit calculation execution unit 101 reads out calculation target data from a first processing node assigned ring buffer, second processing node assigned ring buffer, and third processing node assigned ring buffer of the memory 104 .
  • the unit calculation execution unit 101 makes calculations using the readout data, and stores the calculation result in a fourth processing node assigned ring buffer.
  • the partial areas assigned to respective processing nodes are used as ring buffers.
  • the (logical) width of each ring buffer at that time is the same as that of the input image.
  • the ring buffer is cyclically overwritten and used for respective lines each having a height “1”. Therefore, one line of the ring buffer is updated every time the unit calculation is made.
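This cyclic line-by-line overwriting can be sketched as follows. The class name is ours, and 0-based line indices are used here, whereas FIG. 7 numbers the line-storing areas from 1.

```python
class LineRingBuffer:
    """Partial memory area assigned to one processing node, used as a
    ring buffer of lines: each line has height 1 and the logical
    width of the input image, and each unit calculation overwrites
    the oldest line cyclically."""

    def __init__(self, num_lines, width):
        self.lines = [[0] * width for _ in range(num_lines)]
        self.write_counter = 0  # next line-storing area to overwrite

    def store(self, line):
        """Store one unit calculation result, advancing cyclically."""
        self.lines[self.write_counter] = list(line)
        self.write_counter = (self.write_counter + 1) % len(self.lines)
```

Because only as many lines are kept as the subsequent stage's kernels need, each node's buffer stays far smaller than a full feature plane, which is the memory saving the embodiment targets.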
  • the network composition management unit 102 manages information that specifies the network composition of the hierarchical calculations to be calculated by the hierarchical calculation processing apparatus of this embodiment.
  • the network composition means the connection relationship among processing nodes, the convolution kernel size used in the calculation processing used in each processing node, and the like.
  • the address calculation parameter storage table 107 records the network composition information managed by the network composition management unit 102 , and address management information required for read and write accesses to the memory 104 that occur upon execution of calculations.
  • the address calculation parameter storage table 107 stores various kinds of information for respective processing nodes.
  • FIG. 6 shows a partial network extracted from the hierarchical network shown in FIG. 3 to have the fourth processing node as the center for the sake of simplicity.
  • FIG. 7 shows the relationship between the address calculation parameter storage table 107 and line-storing areas of the ring buffer, for the fourth processing node.
  • the fourth processing node assigned ring buffer (in the bold frame) can store image data for nine lines. That is, the fourth processing node assigned ring buffer can store the calculation results of nine unit calculations.
  • FIG. 7 shows a “read counter value required upon sixth processing node calculation in fourth processing node assigned ring buffer” (to be referred to as “sixth processing node calculation read counter value” hereinafter).
  • The sixth processing node calculation read counter value specifies the data read-out positions when image data stored in the ring buffer assigned to the fourth processing node are used as calculation target pixel data upon making the calculations corresponding to the sixth processing node. For example, when the calculations corresponding to the sixth processing node require calculation target pixel data for five lines, since the current counter value is “3”, data for the five lines of line-storing areas 3, 4, 5, 6, and 7 are read out from the ring buffer. Note that the sequence for counting up the counter value will be described later in the description of a ring buffer management unit 103.
  • the fourth processing node assigned ring buffer includes a “read counter value required upon seventh processing node calculation in fourth processing node assigned ring buffer” (to be referred to as “seventh processing node calculation read counter value” hereinafter).
  • This seventh processing node calculation read counter value specifies the data read-out positions when image data stored in the ring buffer assigned to the fourth processing node are used as calculation target pixel data upon execution of the calculations corresponding to the seventh processing node. For example, when the calculations corresponding to the seventh processing node require calculation target pixel data for nine lines, since the current counter value is “8”, data for the nine lines of line-storing areas 8, 9, 1, 2, 3, 4, 5, 6, and 7 are read out from the ring buffer.
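The wrap-around selection of line-storing areas from a read counter value can be sketched as below, with areas numbered from 1 as in FIG. 7. The function name and signature are illustrative.

```python
def lines_to_read(read_counter, num_needed, num_storable):
    """Line-storing areas referenced for one calculation: start at
    the read counter value and take num_needed areas, wrapping
    cyclically within the num_storable areas of the ring buffer
    (areas numbered 1..num_storable as in FIG. 7)."""
    return [(read_counter - 1 + k) % num_storable + 1
            for k in range(num_needed)]
```

With the counter values from the text, a counter of 3 needing five lines yields areas 3 through 7, and a counter of 8 needing nine lines yields areas 8, 9, then 1 through 7.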
  • a “write counter value in fourth processing node assigned processing buffer” specifies the data write positions upon storing calculation result pixel data of the calculations corresponding to the fourth processing node. For example, upon execution of the unit calculation when the current counter value is “7”, the unit calculation result is stored in line-storing area 7 . Note that the sequence for counting up the counter value will be described later in a description of the ring buffer management unit 103 .
  • The address calculation parameter storage table 107 held by the network composition management unit 102 holds the following pieces of information for each processing node, as shown in FIGS. 8A, 8B and 8C.
  • The number of storable lines is the number of lines of the image area required upon execution of the unit calculation in a processing node connected to the output side (subsequent stage) of that processing node (to be referred to as an adjacent upper layer processing node hereinafter). Therefore, the number of storable lines can be equal to or larger than the vertical size of the convolution kernel used upon calculation of the adjacent upper layer processing node, and is determined in advance based on the network composition of the hierarchical calculations. However, if there are a plurality of adjacent upper layer processing nodes, and the convolution kernels required upon calculation of these nodes have different sizes, the number of storable lines is equal to or larger than the vertical size of the convolution kernel having the largest vertical size. In case of the fourth processing node shown in FIG. 6, the number of storable lines can be the number of lines of whichever of the convolution kernels W_4_6 and W_4_7 has the larger vertical size.
  • FIG. 7 shows a case in which the number of storable lines is “9”. Therefore, the larger of W_4_6 and W_4_7 in FIG. 6 has a vertical size equal to or smaller than “9”.
  • in the following description, the number of storable lines is assumed to be equal to the vertical size of the convolution kernel (the maximum size among a plurality of convolution kernels) used in the calculation of the adjacent upper layer processing node, for the sake of simplicity.
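• Under that simplifying assumption, the number of storable lines can be sketched as (the function name and the example kernel sizes are hypothetical):

```python
def num_storable_lines(upper_kernel_heights):
    """Number of storable lines of a node's ring buffer under the
    simplifying assumption in the text: the largest vertical convolution
    kernel size among its adjacent upper layer processing nodes."""
    return max(upper_kernel_heights)

# e.g. kernels W_4_6 and W_4_7 with hypothetical vertical sizes 7 and 9:
print(num_storable_lines([7, 9]))  # → 9
```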
  • upon reception of the sequence instruction information from the sequence control unit 100, the network composition management unit 102 checks the address calculation parameter storage table 107, so as to examine the following two items:
  • unit calculation target image area examination: whether or not data of the unit calculation target image area required for the unit calculation corresponding to the designated processing node specified by the sequence information are available;
  • unit calculation result write area examination: whether or not the ring buffer assigned to the designated processing node in the memory 104 includes an area in which the unit calculation result can be written
  • the network composition management unit 102 makes the following operations with respect to the address calculation parameter storage table 107 shown in FIGS. 8A, 8B and 8C.
  • the network composition management unit 102 specifies the adjacent lower layer processing node of the designated processing node (a processing node which is designated by the sequence control unit 100 to execute calculations) (there may be a plurality of adjacent lower layer processing nodes).
  • the unit 102 selects the read counter value associated with the designated processing node when the adjacent lower layer processing node is a target processing node (if there are a plurality of adjacent lower layer processing nodes, a plurality of read counter values are selected).
  • the unit 102 selects a write counter value when the adjacent lower layer processing node is selected as a target processing node.
  • the unit 102 selects the number of storable lines when the adjacent lower layer processing node is selected as a target processing node.
  • the unit 102 selects the number of calculation execution threshold lines from the designated processing node.
  • the unit 102 outputs the values selected in items 2, 3, 4, and 5 to the unit calculation execution determination unit 105 .
  • the network composition management unit 102 executes the following operations (see FIGS. 8A, 8B and 8C). Note that the case in which the fourth processing node is designated as the designated processing node will be referred to as “practical example 1” hereinafter. The following operations of the network composition management unit 102 are those made when the unit calculation target image area is examined in practical example 1.
  • the network composition management unit 102 selects the first, second, and third processing nodes as adjacent lower layer processing nodes of the fourth processing node.
  • the unit 102 selects:
  • the unit 102 selects a write counter value when each adjacent lower layer processing node is a target processing node, that is:
  • the unit 102 selects the number of storable lines when each adjacent lower layer processing node is a target processing node, that is:
  • the unit 102 selects “WH4” as the number of calculation execution threshold lines of the fourth processing node.
  • the unit 102 outputs the values selected in items 2, 3, 4, and 5 to the unit calculation execution determination unit 105 .
  • the unit calculation execution determination unit 105 executes unit calculation execution determination processing to be described later with reference to FIGS. 11A and 11B using these values, and determines whether or not data required for execution of the calculations in the designated processing node are available.
  • the network composition management unit 102 makes the following operations with respect to the address calculation parameter storage table 107 .
  • the network composition management unit 102 specifies an adjacent upper layer processing node of the designated processing node (there may be a plurality of adjacent upper layer processing nodes). Specifying the adjacent upper layer processing node of the designated processing node is equivalent to specifying a processing node which has the designated processing node as an adjacent lower layer processing node.
  • the unit 102 selects a read counter value when the adjacent upper layer processing node specified in item 1 is a target processing node and the designated processing node is an adjacent lower layer processing node (if there are a plurality of adjacent upper layer processing nodes, a plurality of read counter values are also available).
  • the unit 102 selects a write counter value of the designated processing node.
  • the unit 102 selects the number of storable lines from the designated processing node.
  • the unit 102 outputs the values selected in items 2, 3, and 4 to the unit calculation execution determination unit 105 .
  • the network composition management unit 102 makes the following operations (see FIGS. 8A, 8B and 8C).
  • the unit 102 specifies the sixth and seventh processing nodes as the adjacent upper layer processing nodes of the fourth processing node (specifying adjacent upper layer processing nodes of the fourth processing node is equivalent to finding processing nodes which have the fourth processing node as adjacent lower layer processing nodes).
  • the unit 102 selects a read counter value when each adjacent upper layer processing node (sixth and seventh processing nodes) is a target processing node and the designated processing node (fourth processing node) is an adjacent lower layer processing node. That is, the unit 102 selects:
  • the unit 102 selects a “write counter value (MWA4) in fourth processing node assigned ring buffer” as the write counter value of the fourth processing node.
  • the unit 102 selects “BH4” as the number of storable lines of the fourth processing node.
  • the unit 102 outputs the values selected in items 2, 3, and 4 to the unit calculation execution determination unit 105 .
  • the unit calculation execution determination unit 105 executes unit calculation result write area examination processing to be described later with reference to FIGS. 12A and 12B using these values, and determines whether or not the ring buffer assigned to the designated processing node includes an area required to hold the calculation execution result in the designated processing node.
  • the network composition management unit 102 outputs address calculation parameters to the ring buffer management unit 103 to give the instruction to calculate addresses.
  • the address calculation parameters to be output to the ring buffer management unit 103 include those to be used when calculation target pixel data are read out from the memory 104 and are supplied to a calculation unit 106 , and those to be used when calculation result pixel data are written out from the calculation unit 106 to the memory 104 .
  • Upon reading out calculation target pixel data from the memory 104, the network composition management unit 102 outputs the read counter values, the number of storable lines, and the number of calculation execution threshold lines, which were selected to make the unit calculation target image area examination, to the ring buffer management unit 103. Furthermore, the network composition management unit 102 outputs an offset address selected when the adjacent lower layer processing node is defined as a target processing node to the ring buffer management unit 103.
  • Upon writing calculation result pixel data in the memory 104, the network composition management unit 102 outputs the write counter value and the number of storable lines, which were selected to make the unit calculation result write area examination, to the ring buffer management unit 103. Also, the network composition management unit 102 outputs an offset address selected from the designated processing node to the ring buffer management unit 103.
  • the ring buffer management unit 103 calculates an address for each line based on the address calculation parameters (address calculation instruction) sent from the network composition management unit 102 .
  • the ring buffer management unit 103 outputs the calculated address for each line (ring counter value) and the offset address value to a memory access control unit 110 .
  • An offset address setting unit 111 temporarily stores the offset address sent from the network composition management unit 102 , and outputs the stored value to the memory access control unit 110 .
  • a ring size setting unit 112 temporarily stores the number of storable lines sent from the network composition management unit 102 , and outputs the stored value to a ring counter 113 .
  • the ring counter 113 loads the read counter value or write counter value sent from the network composition management unit 102 , and executes a count-up operation using that value as an initial value.
  • the number of times that the read counter value is counted up by the ring counter 113 is “vertical size of kernel − 1” times for the convolution calculation currently being computed.
  • the number of times that the write counter value is counted up by the ring counter 113 is once.
  • the counted up counter value is sent to the network composition management unit 102 .
  • When the counter value has reached the value set in the ring size setting unit 112, it is reset to zero. That is, the maximum value of the counter value is “value set in the ring size setting unit 112 − 1”. As described above, the value set in the ring size setting unit 112 upon counting up the read counter value is different from that upon counting up the write counter value. Note that the read counter value is counted up when calculation target pixel data are read out from the memory 104. The write counter value is counted up when calculation result pixel data are written in the memory 104.
  • Upon reading out calculation target pixel data from the memory 104, the ring counter 113 outputs, as the ring counter value, the initial value and counted-up values (to be collectively referred to as values “as many as the vertical size of the kernel” hereinafter) to the memory access control unit 110. Upon writing calculation result pixel data in the memory 104, the ring counter 113 outputs the initial value to the memory access control unit 110 as the ring counter value.
  • the ring counter 113 sends a value obtained by counting up the read counter value or write counter value sent from the network composition management unit 102 by one to the network composition management unit 102 as an updated value of the address calculation parameter.
  • the read counter value or write counter value sent from the network composition management unit 102 is an initial value of the ring counter 113 .
  • the ring counter 113 updates the count value to zero when the value counted up by one reaches the number of storable lines.
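• The count-up and wrap behavior of the ring counter 113 described above might be sketched like this (class and method names are assumptions for illustration):

```python
class RingCounter:
    """Minimal sketch of the ring counter 113: it loads a read or write
    counter value as its initial value, counts up, and wraps to zero when
    the value set in the ring size setting unit 112 (the number of
    storable lines) is reached."""

    def __init__(self, ring_size):
        self.ring_size = ring_size  # value from the ring size setting unit
        self.value = 0

    def load(self, initial):
        self.value = initial

    def count_up(self):
        self.value += 1
        if self.value == self.ring_size:  # maximum value is ring_size - 1
            self.value = 0
        return self.value

rc = RingCounter(ring_size=5)
rc.load(3)
print(rc.count_up())  # → 4
print(rc.count_up())  # → 0 (wrapped)
```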
  • When calculation target pixel data are read out from the memory 104, the network composition management unit 102 sends, as address calculation parameters, to the ring buffer management unit 103:
  • the ring buffer management unit 103 sets MRA1_4 in the ring counter 113, BH1 in the ring size setting unit 112, and OA1 in the offset address setting unit 111.
  • the ring counter 113 outputs the ring counter value to the memory access control unit 110 while counting it up (WH4 − 1) times. As a result, the ring counter 113 outputs WH4 values, including the initial value, to the memory access control unit 110.
  • the ring buffer management unit 103 sets MRA2_4 in the ring counter 113, BH2 in the ring size setting unit 112, and OA2 in the offset address setting unit 111, and repeats the same processing as above. Furthermore, the ring buffer management unit 103 sets MRA3_4 in the ring counter 113, BH3 in the ring size setting unit 112, and OA3 in the offset address setting unit 111, and repeats the same processing as above.
  • When calculation result pixel data are written in the memory 104, the network composition management unit 102 sends, to the ring buffer management unit 103:
  • the memory access control unit 110 generates physical addresses based on the ring counter values and offset address value sent from the ring buffer management unit 103 . Furthermore, the memory access control unit 110 calculates addresses required to read out calculation target pixel data required for the convolution calculations in the calculation unit 106 , and addresses required to store calculation result pixel data.
  • FIG. 9 is a view for explaining the operation when the memory access control unit 110 reads out calculation target pixel data.
  • an area 701 in the bold frame indicates a ring buffer
  • an area 702 (hatched area) indicates the size of a convolution kernel (5 pixels × 5 pixels in FIG. 9).
  • the memory access control unit 110 calculates the start addresses of respective line-storing areas of the ring buffer based on the ring counter value and offset address value. Note that the horizontal width of a calculation target image is set in advance. Furthermore, the memory access control unit 110 calculates addresses required to read out pixels required for the convolution calculations from each line-storing area using the start address of that line-storing area. The unit 110 calculates addresses of hatched pixels in FIG. 9 , that is, those in the area 702 .
  • the memory access control unit 110 calculates the start address of line-storing area 3 for the ring counter value “2”. Furthermore, the unit 110 calculates addresses required to read out pixels of the horizontal size (5) of the convolution kernel from line-storing area 3 . After that, the unit 110 repeats the same processing for the ring counter value “3” and subsequent values.
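• Assuming line-storing areas are laid out contiguously from the offset address, each one calculation-target-image width wide, the address calculation above could be sketched as follows (all names and the pixel-granularity addressing are assumptions):

```python
def pixel_read_addresses(offset_addr, ring_counter_values, image_width,
                         kernel_width, left_col):
    """For each ring counter value (one per line-storing area), compute
    the start address of that area and the addresses of the kernel_width
    pixels read from it, mirroring the two-step calculation of the
    memory access control unit 110."""
    addresses = []
    for rc in ring_counter_values:
        line_start = offset_addr + rc * image_width  # start of this area
        addresses.append([line_start + left_col + k
                          for k in range(kernel_width)])
    return addresses

# ring counter value "2", image width 8, kernel width 5, window at column 1:
print(pixel_read_addresses(1000, [2], 8, 5, 1))
# → [[1017, 1018, 1019, 1020, 1021]]
```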
  • the memory access control unit 110 generates a read/write control signal and the like, and outputs the calculated addresses and generated control signal to the memory 104 . Furthermore, the unit 110 transfers data output from the memory 104 to the calculation unit 106 upon reading, and transfers the calculation result output from the calculation unit 106 to the memory 104 upon writing.
  • the calculation unit 106 executes the convolution calculations and nonlinear processing for a predetermined data group.
  • FIG. 10 is a block diagram showing an example of the calculation unit 106 .
  • a multiplier 1201 multiplies a coefficient output from a weighting coefficient set 1205 selected by a coefficient selector 1204 in accordance with the network composition information, and calculation target pixel data input in synchronism with that coefficient, and outputs the product.
  • An accumulation adder 1202 accumulates the output from the multiplier 1201 for a predetermined period of time.
  • a nonlinear transformation processor 1203 nonlinearly transforms the accumulated sum using a logistic function or tanh function. Note that the nonlinear transformation is implemented by a function table which enumerates predetermined function values.
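• One output of the calculation unit 106 could be sketched as follows; the logistic function is computed directly here, whereas the patent implements it as a function table of predetermined values (the function name is an assumption):

```python
import math

def unit_convolution(pixels, weights):
    """Sketch of one output of the calculation unit 106: the multiplier
    1201 and accumulation adder 1202 form the convolution sum, and the
    nonlinear transformation processor 1203 applies a logistic
    transformation to the accumulated result."""
    acc = 0.0
    for p, w in zip(pixels, weights):
        acc += p * w          # multiplier output accumulated over time
    return 1.0 / (1.0 + math.exp(-acc))  # logistic nonlinear transformation
```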
  • the unit calculation execution determination unit 105 determines, based on information sent from the network composition management unit 102 , whether or not the unit calculation corresponding to the processing node instructed by the sequence control unit 100 can be made.
  • a threshold storage unit 108 stores a threshold used when the unit calculation execution determination unit 105 determines the advisability of the unit calculation.
  • Upon making the unit calculation target image area examination, the threshold storage unit 108 stores the number of calculation execution threshold lines sent from the network composition management unit 102.
  • Upon making the unit calculation result write area examination, the threshold storage unit 108 stores the number of storable lines sent from the network composition management unit 102.
  • a storage amount calculation unit 109 calculates the storage amount of pixel data stored in a predetermined area of the memory 104 .
  • the storage amount indicates an amount of pixel data that can be used as a calculation target of those stored in the predetermined area.
  • a unit of the storage amount is the number of lines for the sake of simplicity.
  • the unit calculation execution determination processing by the unit calculation execution determination unit 105 will be described below with reference to FIGS. 11A, 11B, 12A and 12B.
  • the storage amount becomes zero.
  • the storage amount becomes one line.
  • the storage amount becomes five lines.
  • When the adjacent upper layer processing node executes the unit calculation using data in line-storing areas 1 to 5, since data in line-storing area 1 are never used again, the storage amount becomes four lines. The storage amount is calculated for each adjacent upper layer processing node if a plurality of such nodes exist.
  • a certain processing node has as many storage amounts as the number of its adjacent upper layer processing nodes, and these storage amounts increase or decrease as follows.
  • the storage amount calculation unit 109 calculates storage amounts upon making the unit calculation target image area examination (steps S101 to S111) and upon making the unit calculation result write area examination (steps S201 to S211). In either case, the storage amount is calculated based on the read counter value, write counter value, and the number of storable lines sent from the network composition management unit 102. However, as described above, the read counter value used in the unit calculation target image area examination is that associated with the designated processing node for the adjacent lower layer processing node. Also, the write counter value used in the unit calculation target image area examination is that when the designated processing node is defined as a target processing node.
  • the read counter value used in the unit calculation result write area examination is that when the adjacent upper layer processing node is defined as a target processing node, and the designated processing node is defined as the adjacent lower layer processing node. Also, the write counter value used in the unit calculation result write area examination is that of the designated processing node.
  • the storage amount calculation processing by the storage amount calculation unit 109 (steps S102 to S109, steps S202 to S209) will be described in detail below.
  • the storage amount calculation unit 109 compares the read counter value and write counter value (step S103, step S203). If the write counter value is larger, a value obtained by subtracting the read counter value from the write counter value is defined as a storage amount (steps S104 and S105, steps S204 and S205).
  • If the read counter value is larger, a value obtained by adding the number of storable lines to the write counter value, and then subtracting the read counter value from that sum, is defined as a storage amount (steps S104 and S106, steps S204 and S206).
  • When the write counter value is equal to the read counter value, either the storage amount is zero or the ring buffer is full of data, but these cases cannot be distinguished from the write counter value and read counter value alone. Hence, which of the corresponding write counter and read counter counted last is managed. With this information, when the write counter value is equal to the read counter value and the write counter counted last, it is determined that the write counter value has caught up with the read counter value. On the other hand, when the read counter counted last, it is determined that the read counter value has caught up with the write counter value. Then, the storage amount is calculated by distinguishing whether the ring buffer is full (the storage amount equals the number of storable lines) or empty (the storage amount is zero).
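• The storage amount calculation of steps S103 to S106 (S203 to S206), including the equal-counter disambiguation just described, can be sketched as follows (names are illustrative):

```python
def storage_amount(read_ctr, write_ctr, num_storable_lines, write_was_last):
    """Storage amount (in lines) of a ring buffer. `write_was_last`
    records which of the two counters counted last, disambiguating the
    empty and full cases when the counter values are equal."""
    if write_ctr > read_ctr:
        return write_ctr - read_ctr
    if write_ctr < read_ctr:
        return write_ctr + num_storable_lines - read_ctr
    # counters equal: full if the write counter counted last, else empty
    return num_storable_lines if write_was_last else 0

print(storage_amount(read_ctr=2, write_ctr=7, num_storable_lines=9,
                     write_was_last=True))   # → 5
print(storage_amount(read_ctr=7, write_ctr=2, num_storable_lines=9,
                     write_was_last=True))   # → 4
```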
  • a predetermined amount is added to the storage amount when the calculation result of the calculation processing of the corresponding processing node is written in a partial area of the memory.
  • a predetermined amount is subtracted from the storage amount when the calculation processing of a processing node connected to the subsequent stage of the corresponding processing node is completed.
  • Upon making the unit calculation target image area examination, if there are a plurality of adjacent lower layer processing nodes, storage amounts are calculated in association with the ring buffers assigned to these nodes (step S111).
  • Upon making the unit calculation result write area examination, if there are a plurality of adjacent upper layer processing nodes, storage amounts are calculated for these nodes (step S211).
  • the unit calculation execution determination unit 105 compares all the storage amounts calculated in the unit calculation target image area examination with the number of calculation execution threshold lines stored in the threshold storage unit 108 (step S110). Furthermore, the unit calculation execution determination unit 105 compares all the storage amounts calculated in the unit calculation result write area examination with the number of storable lines stored in the threshold storage unit 108 (step S210).
  • If all the storage amounts calculated in the unit calculation target image area examination are larger than or equal to the number of calculation execution threshold lines (step S110), the process advances to step S111. If it is determined in step S111 that storage amounts corresponding to all adjacent lower layer processing nodes have been calculated, the process advances to step S201. If all the storage amounts calculated in the unit calculation result write area examination are smaller than the number of storable lines, the process advances to step S213. In this case, since an area that can store the calculation result (storable area) exists in the partial area of the memory, the unit calculation execution determination unit 105 instructs the network composition management unit 102 to start the unit calculation in step S213.
  • Otherwise, the process advances to step S112 or step S212, and the unit calculation execution determination unit 105 gives the instruction to skip the unit calculation.
  • an area that stores the calculation result that has been referred to by all processing nodes connected to the subsequent stage of a given processing node is sequentially determined as a storable area, and can store a new calculation result.
  • As shown in FIGS. 11A, 11B, 12A and 12B, whether or not the calculation result is stored in the partial area is determined in accordance with the storage state of the calculation result, that is, the storage amount in the partial area of the assigned memory.
  • When the unit calculation execution determination unit 105 makes the unit calculation target image area examination, it receives, from the network composition management unit 102:
  • a storage amount is calculated from a set [MRA1_4, MWA1, BH1], and storage amounts are respectively calculated from sets [MRA2_4, MWA2, BH2] and [MRA3_4, MWA3, BH3]. Furthermore, all the calculated storage amounts are compared with WH4, thus examining the presence/absence of required calculation target data of the unit calculation target image area.
  • When the unit calculation execution determination unit 105 makes the unit calculation result write area examination, it receives, from the network composition management unit 102:
  • a storage amount is calculated from a set [MRA4_6, MWA4, BH4], and a storage amount is similarly calculated from a set [MRA4_7, MWA4, BH4]. Furthermore, all the calculated storage amounts are compared with BH4, thus examining the presence/absence of an area that can store the calculation result of the unit calculation.
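• Combining the two examinations, the overall determination for practical example 1 might be sketched as follows (the function and parameter names are assumptions; in practical example 1, the threshold is WH4 and the storable-line count is BH4):

```python
def can_execute_unit_calculation(input_storage_amounts, threshold_lines,
                                 output_storage_amounts, storable_lines):
    """The unit calculation may start only if every input ring buffer
    already holds at least the number of calculation execution threshold
    lines AND every output-side storage amount is smaller than the number
    of storable lines (i.e. a storable area exists)."""
    inputs_ready = all(a >= threshold_lines for a in input_storage_amounts)
    space_exists = all(a < storable_lines for a in output_storage_amounts)
    return inputs_ready and space_exists
```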
  • each processing node of a middle layer need only assure a memory that stores calculation results required to make the unit calculations by its upper layer processing nodes. Therefore, according to the first embodiment, when the results of calculations made by processing nodes of a certain layer are used as inputs of calculations of upper layer processing nodes, the required memory size can be reduced. That is, a memory size required to hold temporal calculation results (intermediate results) of input layer processing nodes or middle layer processing nodes can be reduced.
  • Since the unit calculation execution unit 101 cyclically designates processing nodes that make unit calculations, the unit calculation of a certain processing node is executed as soon as the calculation results of the lower layer processing nodes required for that unit calculation are available. Furthermore, an upper layer processing node immediately executes its unit calculation, and the calculation result which was used in that unit calculation and is no longer required is discarded (the area which stores that calculation result is defined as an overwritable area, that is, an area which can store a new calculation result).
  • the first embodiment realizes the effective use of the memory by such memory control.
  • an intermediate calculation buffer of a hierarchical calculation apparatus of the convolutional neural network and the like can be configured by minimum ring buffers for respective logical processing nodes in accordance with the network composition.
  • the calculation results of middle layer processing nodes are stored in predetermined assigned ring buffers of the memory 104 , and are always cyclically overwritten during hierarchical calculations. Therefore, after completion of the hierarchical calculations, the calculation results of the middle layer processing nodes cannot be used for other processing.
  • FIG. 13 is a block diagram showing an example of the arrangement of a hierarchical calculation processing apparatus according to the second embodiment.
  • components denoted by the same reference numerals in FIG. 2 make the same operations as those in the first embodiment, and a repetitive description thereof will be avoided.
  • components that make operations different from the first embodiment will be mainly described.
  • a CPU bus access control unit 1714 has, in addition to the operation of the CPU bus access control unit 114 of the first embodiment:
  • the sequence control unit 1700 has, in addition to the functions of the sequence control unit 100 of the first embodiment, a function of suspending the output of the next sequence instruction information upon reception of a circulation suspend instruction from a unit calculation execution unit 1701 . Furthermore, upon reception of a circulation restart instruction from the CPU bus access control unit 1714 in the suspended state, the sequence control unit 1700 restarts the output of the sequence instruction information.
  • the unit calculation execution unit 1701 has the following function in addition to those of the unit calculation execution unit 101 of the first embodiment. That is, the unit calculation execution unit 1701 has a function of issuing a circulation suspend instruction to the sequence control unit 1700 upon reception of a suspend request of the output of sequence instruction information of the sequence control unit 1700 from a network composition management unit 1702 .
  • the network composition management unit 1702 has the following function in addition to those of the network composition management unit 102 of the first embodiment. That is, the network composition management unit 1702 has a function of issuing a suspend request of the output of sequence instruction information from the sequence control unit 1700 in response to an instruction from an overwrite inhibited processing node determination unit 1715 . Upon notification of address calculation parameter update information from the ring buffer management unit 103 , the network composition management unit 1702 notifies the overwrite inhibited processing node determination unit 1715 of that information.
  • In the overwrite inhibited processing node determination unit 1715, processing nodes the calculation results of which are inhibited from being overwritten are registered in advance. Furthermore, upon receiving, from the ring buffer management unit 103, notification of an updated value of a write counter value corresponding to a registered processing node (to be referred to as an overwrite inhibited processing node hereinafter), the overwrite inhibited processing node determination unit 1715 checks whether the updated value is zero. When the updated value of the write counter value is zero, the result of the next unit calculation of this processing node will be stored in the start line-storing area of the ring buffer. Therefore, the calculation result stored so far (in the start line-storing area of the ring buffer) is likely to be overwritten. Hence, upon notification of 0 as the updated value of the write counter value, the overwrite inhibited processing node determination unit 1715 issues to the sequence control unit 1700 a request for suspending the output of sequence instruction information (a circulation suspend instruction).
  • the calculations are suspended. During the suspended state, the calculation result stored so far can be read out from the ring buffer assigned to the overwrite inhibited processing node of the memory 104 , and can be transferred to another location (another memory area).
  • the CPU 68 can read out the calculation result stored so far from the ring buffer assigned to the overwrite inhibited processing node of the memory 104 , and transfer it to the RAM 70 . After completion of the required processing, the CPU 68 issues a circulation restart instruction to the sequence control unit 1700 to restart the hierarchical calculations.
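• The determination made by the overwrite inhibited processing node determination unit 1715 can be sketched as follows (function and parameter names are illustrative assumptions):

```python
def should_suspend(node_id, updated_write_counter, overwrite_inhibited_nodes):
    """When the updated write counter value of a registered (overwrite
    inhibited) processing node wraps to zero, the next unit calculation
    would overwrite the start line-storing area, so a circulation
    suspend instruction is warranted."""
    return node_id in overwrite_inhibited_nodes and updated_write_counter == 0

# fourth processing node registered as overwrite inhibited:
print(should_suspend(4, 0, {4}))  # → True
print(should_suspend(4, 3, {4}))  # → False
```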
  • the calculation results of middle layer processing nodes can be used for another processing.
  • the method of cyclically using predetermined continuous areas of the memory 104 for respective lines using the ring counter has been described.
  • the present invention is not limited to such specific memory use method.
  • a method of executing processing while assigning discontinuous areas for predetermined processing units with reference to a memory address table corresponding to the ring counter or the like may be used.
  • the ring buffer specified in the present invention is not limited to a ring buffer of the narrow sense or a cyclic buffer.
  • the present invention can be similarly applied to a configuration in which feature planes (calculation results) are sub-sampled with respect to an input plane.
  • sequence control for respective lines as the most efficient processing unit has been explained.
  • the present invention is not limited to such specific control.
  • the present invention can be applied to sequence control for respective units of not more than one line or for respective blocks, and the arrangement in such a case will be apparent to those skilled in the art.
  • calculations for one row in the horizontal direction are defined as a unit calculation.
  • the present invention is not limited to this.
  • calculations for one column in the vertical direction may be defined as a unit calculation.
  • the unit calculation is not limited to calculations for one row (or one column).
  • calculations for two rows in the horizontal direction may be defined as a unit calculation.
  • the present invention is not limited to this.
  • the present invention can be applied to various kinds of hierarchical calculation processing that require a predetermined reference area in calculation results of the previous stage.
  • the present invention is not limited to this.
  • the present invention can also be applied to hierarchical processing of various other two-dimensional calculations other than the convolution calculations.
  • the embodiments have been explained in detail.
  • the present invention can adopt embodiments in the forms of, for example, a system, apparatus, method, program, storage medium, and the like. More specifically, the present invention may be applied to either a system configured by a plurality of devices, or an apparatus consisting of a single device.
  • the present invention includes a case wherein the functions of the aforementioned embodiments are achieved when a software program is directly or remotely supplied to a system or apparatus, and a computer of that system or apparatus reads out and executes the supplied program code.
  • the program to be supplied in this case is a computer program corresponding to each illustrated flowchart in the embodiments.
  • the program code itself installed in a computer to implement the functional processing of the present invention using the computer implements the present invention.
  • the present invention includes the computer program itself for implementing the functional processing of the present invention.
  • the form of the program is not particularly limited, and an object code, a program to be executed by an interpreter, script data to be supplied to an OS, and the like may be used as long as they have the functions of the program.
  • as a computer-readable storage medium for supplying the program, the following media can be used.
  • a Floppy® disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, nonvolatile memory card, ROM, DVD (DVD-ROM, DVD-R), and the like can be used.
  • the user establishes a connection to a homepage on the Internet using a browser on a client computer, and downloads the computer program of the present invention from the homepage onto a recording medium such as a hard disk or the like.
  • the program to be downloaded may be a compressed file including an automatic installation function.
  • the program code that forms the program of the present invention may be segmented into a plurality of files, which may be downloaded from different homepages. That is, the present invention includes a WWW server which allows a plurality of users to download a program file required to implement the functional processing of the present invention by computer.
  • a storage medium such as a CD-ROM or the like, which stores the encrypted program of the present invention, may be delivered to the user.
  • the user who has cleared a predetermined condition may be allowed to download key information used to decrypt the encrypted program from a homepage via the Internet.
  • the user executes the encrypted program using the downloaded key information to install the program on a computer.
  • the functions of the aforementioned embodiments can be implemented when the computer executes the readout program. Furthermore, the functions of the aforementioned embodiments can be implemented in cooperation with an OS or the like running on the computer based on an instruction of that program. In this case, the OS or the like executes some or all of actual processes, which implement the functions of the aforementioned embodiments.
  • some or all of the functions of the aforementioned embodiments may be implemented when the program read out from the recording medium is written in a memory equipped on a function expansion board or a function expansion unit, which is inserted into or connected to the computer.
  • a CPU equipped on the function expansion board or function expansion unit executes some or all of actual processes based on an instruction of that program.
  • hierarchical calculation processing such as CNN calculations and the like based on a spatial allocation relationship can be implemented by a small memory size.
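The line-based ring buffer scheme summarized above can be illustrated with a minimal sketch. This is a hypothetical illustration, not code from the patent: class and method names are invented, and the buffer stands in for the ring-buffer areas of the memory 104. Each processing node retains only the number of lines its successor layers need as a reference area; a ring counter selects the next line to overwrite. For an overwrite-inhibited node, the retained depth simply equals the full plane height, so no line is ever overwritten and the complete result remains readable (e.g. by the CPU 68) while circulation is suspended.

```python
class LineRingBuffer:
    """Holds the most recent `depth` lines of one node's feature plane.

    For an overwrite-inhibited node, `depth` equals the full plane
    height, so every line is kept and the whole calculation result can
    be read out during a suspended state.
    """

    def __init__(self, width, depth):
        self.width, self.depth = width, depth
        self.lines = [None] * depth
        self.ring = 0    # ring counter: index of the next line to write
        self.count = 0   # total number of lines written so far

    def push_line(self, line):
        """Store one unit calculation result (one horizontal row)."""
        assert len(line) == self.width
        self.lines[self.ring] = list(line)
        self.ring = (self.ring + 1) % self.depth  # cyclic reuse of areas
        self.count += 1

    def window(self):
        """Retained lines, oldest first: the reference area available to
        the next layer's unit calculation."""
        n = min(self.count, self.depth)
        start = (self.ring - n) % self.depth
        return [self.lines[(start + i) % self.depth] for i in range(n)]
```

With a plane width of 4 and a retained depth of 3, pushing five lines leaves only the last three readable; a buffer whose depth matches the plane height (the overwrite-inhibited case) keeps all five. The memory saving is exactly the point of the scheme: intermediate nodes hold a few lines instead of whole feature planes.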

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
US12/602,628 2007-06-13 2008-06-11 Calculation processing apparatus and control method thereof Expired - Fee Related US7978905B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2007-156734 2007-06-13
JP2007156734A JP5171118B2 (ja) 2007-06-13 2007-06-13 演算処理装置及びその制御方法
PCT/JP2008/061083 WO2008153196A1 (en) 2007-06-13 2008-06-11 Calculation processing apparatus and control method thereof

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/061083 A-371-Of-International WO2008153196A1 (en) 2007-06-13 2008-06-11 Calculation processing apparatus and control method thereof

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/155,640 Continuation US8385631B2 (en) 2007-06-13 2011-06-08 Calculation processing apparatus and control method thereof

Publications (2)

Publication Number Publication Date
US20100215253A1 US20100215253A1 (en) 2010-08-26
US7978905B2 true US7978905B2 (en) 2011-07-12

Family

ID=40129793

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/602,628 Expired - Fee Related US7978905B2 (en) 2007-06-13 2008-06-11 Calculation processing apparatus and control method thereof
US13/155,640 Active US8385631B2 (en) 2007-06-13 2011-06-08 Calculation processing apparatus and control method thereof

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/155,640 Active US8385631B2 (en) 2007-06-13 2011-06-08 Calculation processing apparatus and control method thereof

Country Status (4)

Country Link
US (2) US7978905B2 (en)
JP (1) JP5171118B2 (ja)
CN (1) CN101681450B (zh)
WO (1) WO2008153196A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9053431B1 (en) 2010-10-26 2015-06-09 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US20160224266A1 (en) * 2015-01-29 2016-08-04 Canon Kabushiki Kaisha Information processing apparatus
US9626285B2 (en) 2012-08-22 2017-04-18 Canon Kabushiki Kaisha Storage resource allocation to dataflows based on data requirements and attributes
US9875440B1 (en) 2010-10-26 2018-01-23 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US10013628B2 (en) 2014-03-31 2018-07-03 Canon Kabushiki Kaisha Information processing apparatus and information processing method

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5368687B2 (ja) 2007-09-26 2013-12-18 キヤノン株式会社 演算処理装置および方法
WO2011146147A1 (en) * 2010-05-19 2011-11-24 The Regents Of The University Of California Neural processing unit
US10387773B2 (en) * 2014-10-27 2019-08-20 Ebay Inc. Hierarchical deep convolutional neural network for image classification
US10255547B2 (en) * 2014-12-04 2019-04-09 Nvidia Corporation Indirectly accessing sample data to perform multi-convolution operations in a parallel processing system
JP6706788B2 (ja) * 2015-03-06 2020-06-10 パナソニックIpマネジメント株式会社 画像認識方法、画像認識装置およびプログラム
US9747546B2 (en) * 2015-05-21 2017-08-29 Google Inc. Neural network processor
EP3323075B1 (en) * 2015-07-15 2023-01-18 Cylance Inc. Malware detection
WO2017171852A1 (en) * 2016-04-01 2017-10-05 Intel Corporation Four stable state neuron
CN107315571B (zh) * 2016-04-27 2020-07-31 中科寒武纪科技股份有限公司 一种用于执行全连接层神经网络正向运算的装置和方法
CN111860813B (zh) 2016-04-29 2024-01-16 中科寒武纪科技股份有限公司 一种用于执行卷积神经网络正向运算的装置和方法
GB201607713D0 (en) 2016-05-03 2016-06-15 Imagination Tech Ltd Convolutional neural network
JP6708044B2 (ja) 2016-07-28 2020-06-10 富士通株式会社 画像認識装置、画像認識プログラム、画像認識方法および認識装置
JP6786948B2 (ja) * 2016-08-12 2020-11-18 富士通株式会社 演算処理装置及び演算処理装置の制御方法
CN106529679B (zh) * 2016-10-14 2020-01-14 腾讯科技(上海)有限公司 一种机器学习方法及系统
JP6852365B2 (ja) * 2016-11-25 2021-03-31 富士通株式会社 情報処理装置、情報処理システム、情報処理プログラムおよび情報処理方法
JP6936592B2 (ja) 2017-03-03 2021-09-15 キヤノン株式会社 演算処理装置およびその制御方法
CN107066334A (zh) * 2017-03-17 2017-08-18 联想(北京)有限公司 信息处理方法及处理系统
EP3388981B1 (en) 2017-04-13 2021-09-01 Nxp B.V. Convolutional processing system
JP6929734B2 (ja) * 2017-08-08 2021-09-01 キヤノン株式会社 判別演算装置、判別演算方法及びプログラム
CN108475347A (zh) * 2017-11-30 2018-08-31 深圳市大疆创新科技有限公司 神经网络处理的方法、装置、加速器、系统和可移动设备
US11443185B2 (en) 2018-10-11 2022-09-13 Powerchip Semiconductor Manufacturing Corporation Memory chip capable of performing artificial intelligence operation and method thereof
CN109615065A (zh) * 2018-12-17 2019-04-12 郑州云海信息技术有限公司 一种基于fpga的数据处理方法、设备以及存储介质
CN109739703B (zh) * 2018-12-28 2020-01-17 中科寒武纪科技股份有限公司 调错方法及相关产品
JP7297468B2 (ja) 2019-02-28 2023-06-26 キヤノン株式会社 データ処理装置及びその方法
US10891537B2 (en) * 2019-03-20 2021-01-12 Huawei Technologies Co., Ltd. Convolutional neural network-based image processing method and image processing apparatus
JP7278150B2 (ja) * 2019-05-23 2023-05-19 キヤノン株式会社 画像処理装置、撮像装置、画像処理方法
US10976965B1 (en) * 2020-10-14 2021-04-13 First Capitol Consulting, Inc. Optimization of in-memory processing of data represented by an acyclic graph so that the removal and re-materialization of data in selected nodes is minimized
CN112270412B (zh) * 2020-10-15 2023-10-27 北京百度网讯科技有限公司 网络算子处理方法、装置、电子设备及存储介质
JP2024151449A (ja) * 2023-04-12 2024-10-25 キヤノン株式会社 演算処理装置及びその方法

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0264787A (ja) 1988-08-31 1990-03-05 Fujitsu Ltd 階層構造ニューラルネット
JPH0355658A (ja) 1989-07-25 1991-03-11 Fujitsu Ltd 半導体情報処理装置
US5220559A (en) 1988-08-31 1993-06-15 Fujitsu Limited Neuron architecture
JPH1021406A (ja) 1996-03-29 1998-01-23 Nec Corp 物体認識方法及び装置
JPH10162120A (ja) 1996-12-02 1998-06-19 Mitsubishi Electric Corp 動画像処理方法ならびに動画像処理装置
JPH11184841A (ja) 1997-12-22 1999-07-09 Canon Inc 画像処理方法及び装置
JP3055658B2 (ja) 1996-03-04 2000-06-26 富士電子工業株式会社 内面焼入方法及び装置
JP2002358500A (ja) 2001-05-31 2002-12-13 Canon Inc パターン認識装置
US6546471B1 (en) * 1997-02-27 2003-04-08 Hitachi, Ltd. Shared memory multiprocessor performing cache coherency
JP2003281518A (ja) 2002-03-20 2003-10-03 Fuji Xerox Co Ltd 画像処理装置および画像処理方法
JP2004101910A (ja) 2002-09-10 2004-04-02 Canon Inc 解像度変換装置及び方法及び情報処理装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310186C (zh) * 2002-09-24 2007-04-11 中兴通讯股份有限公司 一种神经网络均衡器的优化训练方法
CN1331092C (zh) * 2004-05-17 2007-08-08 中国科学院半导体研究所 模式识别专用神经网络计算机系统

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0264787A (ja) 1988-08-31 1990-03-05 Fujitsu Ltd 階層構造ニューラルネット
US5220559A (en) 1988-08-31 1993-06-15 Fujitsu Limited Neuron architecture
JPH0355658A (ja) 1989-07-25 1991-03-11 Fujitsu Ltd 半導体情報処理装置
JP3055658B2 (ja) 1996-03-04 2000-06-26 富士電子工業株式会社 内面焼入方法及び装置
US6038337A (en) 1996-03-29 2000-03-14 Nec Research Institute, Inc. Method and apparatus for object recognition
JPH1021406A (ja) 1996-03-29 1998-01-23 Nec Corp 物体認識方法及び装置
JPH10162120A (ja) 1996-12-02 1998-06-19 Mitsubishi Electric Corp 動画像処理方法ならびに動画像処理装置
US6546471B1 (en) * 1997-02-27 2003-04-08 Hitachi, Ltd. Shared memory multiprocessor performing cache coherency
JPH11184841A (ja) 1997-12-22 1999-07-09 Canon Inc 画像処理方法及び装置
JP2002358500A (ja) 2001-05-31 2002-12-13 Canon Inc パターン認識装置
US7039233B2 (en) 2001-05-31 2006-05-02 Canon Kabushiki Kaisha Pattern recognition apparatus for detecting predetermined pattern contained in input signal
JP2003281518A (ja) 2002-03-20 2003-10-03 Fuji Xerox Co Ltd 画像処理装置および画像処理方法
JP2004101910A (ja) 2002-09-10 2004-04-02 Canon Inc 解像度変換装置及び方法及び情報処理装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
K. Korekado et al., "An Image Filtering Processor for Face/Object Recognition Using Merged/Mixed Analog-Digital Architecture", 2005 Symposium on VLSI Circuits Digest of Technical Papers, pp. 220-223 (2005).

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9053431B1 (en) 2010-10-26 2015-06-09 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US9875440B1 (en) 2010-10-26 2018-01-23 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US10510000B1 (en) 2010-10-26 2019-12-17 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US11514305B1 (en) 2010-10-26 2022-11-29 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US11868883B1 (en) 2010-10-26 2024-01-09 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US12124954B1 (en) 2010-10-26 2024-10-22 Michael Lamport Commons Intelligent control with hierarchical stacked neural networks
US9626285B2 (en) 2012-08-22 2017-04-18 Canon Kabushiki Kaisha Storage resource allocation to dataflows based on data requirements and attributes
US10013628B2 (en) 2014-03-31 2018-07-03 Canon Kabushiki Kaisha Information processing apparatus and information processing method
US20160224266A1 (en) * 2015-01-29 2016-08-04 Canon Kabushiki Kaisha Information processing apparatus
US9798484B2 (en) * 2015-01-29 2017-10-24 Canon Kabushiki Kaisha Information processing apparatus

Also Published As

Publication number Publication date
US20100215253A1 (en) 2010-08-26
CN101681450A (zh) 2010-03-24
WO2008153196A1 (en) 2008-12-18
US8385631B2 (en) 2013-02-26
JP2008310524A (ja) 2008-12-25
CN101681450B (zh) 2013-08-14
JP5171118B2 (ja) 2013-03-27
US20110239224A1 (en) 2011-09-29

Similar Documents

Publication Publication Date Title
US7978905B2 (en) Calculation processing apparatus and control method thereof
US8391306B2 (en) Calculation processing apparatus and method
US7937346B2 (en) Calculation processing apparatus and method
US20180253641A1 (en) Arithmetic processing apparatus and control method therefor
JP4700892B2 (ja) 画像のマッチング
JP7492555B2 (ja) 複数の入力データセットのための処理
CN112799599A (zh) 一种数据存储方法、计算核、芯片和电子设备
CN110795226B (zh) 利用计算机系统处理任务的方法、电子设备和存储介质
JP7070157B2 (ja) 画像処理プログラム、画像処理装置及び画像処理方法
CN101562691B (zh) 图像处理装置及方法
CN119396413B (zh) 模型部署方案生成、模型处理方法、装置及电子设备
WO2025066551A1 (zh) 图像搜索方法、装置、设备及存储介质
US20220392207A1 (en) Information processing apparatus, information processing method, and non-transitory computer-readable storage medium
KR20050006489A (ko) 화상형성시스템에 있어서 영상처리방법 및 장치
JP7437135B2 (ja) プロセッシングシステム
JP2018055570A (ja) 演算処理装置、演算処理方法及びプログラム
JP7631289B2 (ja) データ処理装置及びその方法
EP4488820B1 (en) Tensors processing methods, devices and systems
CN119863493A (zh) 角点检测方法、角点检测模型训练方法及装置
CN118608746A (zh) 检测框筛选方法、装置、计算机设备及存储介质
JPH0454267B2 (en)

Legal Events

Date Code Title Description
AS Assignment

Owner name: CANON KABUSHIKI KAISHA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YAMAMOTO, TAKAHISA;KATO, MASAMI;ITO, YOSHINORI;REEL/FRAME:023799/0775

Effective date: 20091124

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20230712