WO2018023847A1 - Processor and method for scaling image - Google Patents

Processor and method for scaling image

Publication number
WO2018023847A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
image
processor
pixel data
target
Prior art date
Application number
PCT/CN2016/097293
Other languages
English (en)
French (fr)
Inventor
涂依晨
欧阳剑
漆维
王勇
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2018023847A1
Priority to US16/265,566 (US10922785B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management

Definitions

  • The present application relates to the field of computer technologies, specifically to the field of image processing technologies, and more particularly to a processor and method for scaling an image.
  • The purpose of the present application is to propose an improved processor and method for scaling an image, so as to solve the technical problems mentioned in the background section above.
  • The present application provides a processor for scaling an image, the processor comprising an off-chip memory, a communication device, a control device, and an array processor, wherein: the off-chip memory is configured to store an original image to be scaled, the original image being an N-channel image, where N is an integer greater than 1; the communication device is configured to receive an image scaling instruction, the image scaling instruction including a width scaling factor and a height scaling factor; the control device is configured to execute the image scaling instruction and to send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and the array processor is configured to extract, under control of the calculation control signal, the pixel data of the pixels in the original image that correspond to the target pixel, and to use N processing units in the array processor to calculate the channel values of the N channels of the target pixel in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  • The processor further includes an on-chip cache; and the control device includes: a read control unit configured to sequentially read pixel data of the original image from the off-chip memory into the on-chip cache; and a calculation control unit configured to send, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels corresponding to the target pixel.
  • The on-chip cache includes a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is greater than that of the first on-chip cache; the read control unit is further configured to: sequentially read the pixel data of the original image into the first on-chip cache by rows; and sequentially read the pixel data of each row in the first on-chip cache into the second on-chip cache by columns; and the calculation control unit is further configured to: send, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels corresponding to the target pixel.
  • The calculation control unit is further configured to: send the calculation control signal to the array processor when the read control unit has finished reading, into the second on-chip cache, the pixel data of the pixels in the original image that correspond to the target pixel.
  • The processor further includes: a parameter transfer device configured to obtain the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x, y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h.
  • The parameter transfer device is further configured to: determine, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a data table set pre-stored by the processor; and query the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the data table set includes data tables corresponding to the various values of the width scaling factor and the height scaling factor, and each data table stores, for its width scaling factor and height scaling factor, the various values of x and y in association with the corresponding values of x0, w0, w1, y0, h0, h1.
  • The N processing units of the array processor share a multiplier group, and the array processor is further configured to: calculate w0 × h0, w1 × h0, w0 × h1, w1 × h1 by using the multiplier group shared by the N processing units.
  • The processor further includes a cache management device for at least one of: releasing the pixel data of each row in the first on-chip cache whose abscissa is smaller than x0; and releasing the pixel data of each column in the second on-chip cache whose abscissa is equal to x0 and whose ordinate is smaller than y0.
  • The read control unit is further configured to: determine, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of pixel data of the original image to be read into the first on-chip cache; and read the determined rows row by row into the first on-chip cache; and/or determine, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache to be read into the second on-chip cache; and read the determined columns column by column into the second on-chip cache.
  • The present application provides a method for scaling an image, the method comprising: receiving an image scaling instruction, the image scaling instruction including a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1; executing the image scaling instruction to send, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and using the calculation control signal to control the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and to control the N processing units in the array processor to calculate the channel values of the N channels of the target pixel in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  • The method further comprises: sequentially reading pixel data of the original image into an on-chip cache; and the sending, to the array processor, of the calculation control signal for calculating the pixel data of each target pixel in the scaled target image includes: sending, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels corresponding to the target pixel.
  • The on-chip cache includes a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is greater than that of the first on-chip cache; the sequential reading of the pixel data of the original image into the on-chip cache includes: sequentially reading the pixel data of the original image into the first on-chip cache by rows; and sequentially reading the pixel data of each row in the first on-chip cache into the second on-chip cache by columns; and the sending, to the array processor, of the calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels corresponding to the target pixel includes: sending, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels in the original image that correspond to the target pixel.
  • The method further includes: obtaining the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x, y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h.
  • The controlling of the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and of the N processing units in the array processor to calculate the channel values of the N channels of the target pixel in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data, includes: controlling the array processor to extract the pixel data of the pixels at coordinates (x0, y0), (x0+1, y0), (x0, y0+1), (x0+1, y0+1) in the original image.
  • The obtaining of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x, y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h includes: determining, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a data table set pre-stored by the processor; and querying the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the data table set includes data tables corresponding to the various values of the width scaling factor and the height scaling factor, and each data table stores the various values of x and y in association with the corresponding values of x0, w0, w1, y0, h0, h1.
  • The parallel calculation of the channel values Y(x, y) of the N channels of the target pixel includes: calculating w0 × h0, w1 × h0, w0 × h1, w1 × h1 by using a multiplier group shared by the N processing units.
  • The method further includes at least one of: releasing the pixel data of each row in the first on-chip cache whose abscissa is smaller than x0; and releasing the pixel data of each column in the second on-chip cache whose abscissa is equal to x0 and whose ordinate is smaller than y0.
  • The sequential reading of the pixel data of the original image into the first on-chip cache by rows includes: determining, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of pixel data of the original image to be read into the first on-chip cache; and reading the determined rows row by row into the first on-chip cache; and/or the sequential reading of the pixel data of each row in the first on-chip cache into the second on-chip cache by columns includes: determining, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache to be read into the second on-chip cache; and reading the determined columns column by column into the second on-chip cache.
  • The processor and method for scaling an image provided by the present application use the processing units in the array processor to calculate the channel values of the respective channels of each pixel in the target image in parallel, thereby improving the parallelism of image scaling and greatly increasing the execution speed of the image scaling operation.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied.
  • FIG. 2 is a block diagram showing an embodiment of a processor for scaling an image according to the present application.
  • FIG. 3 is a schematic structural diagram of still another embodiment of a processor for scaling an image according to the present application.
  • FIG. 4 is a flow diagram of one embodiment of a method for scaling an image in accordance with the present application.
  • FIG. 5 is a flow chart of still another embodiment of a method for scaling an image in accordance with the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the processor and method for scaling an image of the present application may be applied.
  • The system architecture 100 can include a general purpose processor 101 and a dedicated processor 102.
  • The general purpose processor 101 is configured to send the image scaling instruction and the original image to be scaled to the dedicated processor 102. The dedicated processor 102 may perform a scaling operation on the original image according to the image scaling instruction, and the target image obtained by the scaling operation may be sent back to the general purpose processor 101.
  • The general purpose processor 101 may also be referred to as a host, and the dedicated processor 102 may be implemented with an FPGA (Field-Programmable Gate Array).
  • The processor provided by the embodiments of the present application generally refers to the dedicated processor 102 in FIG. 1. Accordingly, the method for executing instructions on the processor is also generally performed by the dedicated processor 102.
  • The number of general purpose processors 101 and dedicated processors 102 in FIG. 1 is merely illustrative. There may be any number of general purpose processors and dedicated processors depending on the needs of the implementation.
  • Referring to FIG. 2, a block diagram 200 of a processor in accordance with the present application is shown.
  • The processor 200 includes an off-chip memory 201, a communication device 202, a control device 203, and an array processor 204.
  • The off-chip memory 201 can be used to store an original image to be scaled.
  • The original image may be an N-channel image (N is an integer greater than 1).
  • Since the scaling operation only changes the image size (i.e., height and width) and does not affect the number of channels, the scaled target image is still an N-channel image.
  • For example, when N is 32, the original image before scaling is a 32-channel image and the scaled target image is still a 32-channel image.
  • The communication device 202 can be coupled to an external host (e.g., the general purpose processor in FIG. 1) to receive image scaling instructions from the host.
  • The image scaling instruction may include a width scaling factor and a height scaling factor.
  • The host may first calculate the width scaling factor and the height scaling factor according to the size of the original image and the size of the target image, then generate an image scaling instruction, and then send the instruction through the communication device 202 to the processor 200 of the present application.
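  • The host-side step above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout of the instruction, and the convention that each factor is the target size divided by the original size are hypothetical and are not taken from the application.

```python
def scaling_factors(orig_w, orig_h, target_w, target_h):
    """Return (scale_w, scale_h).

    Assumed convention: factor = target size / original size, so a factor
    above 1 enlarges the image and a factor below 1 shrinks it.
    """
    if min(orig_w, orig_h, target_w, target_h) <= 0:
        raise ValueError("image dimensions must be positive")
    return target_w / orig_w, target_h / orig_h


def make_scaling_instruction(orig_size, target_size):
    """Build a hypothetical instruction record carrying the two factors."""
    (orig_w, orig_h), (target_w, target_h) = orig_size, target_size
    scale_w, scale_h = scaling_factors(orig_w, orig_h, target_w, target_h)
    return {"op": "scale", "scale_w": scale_w, "scale_h": scale_h}
```

  • For instance, `make_scaling_instruction((100, 50), (200, 25))` would carry a width factor of 2.0 and a height factor of 0.5 under this convention.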
  • The communication device 202 can be connected to the host via PCIe (Peripheral Component Interconnect Express) or another bus.
  • The communication device 202 can be electrically coupled to the control device 203 so that scaling instructions can be transmitted to the control device 203 for execution.
  • The original image stored in the off-chip memory 201 may be input in advance from the host through the communication device 202, or may be a scaled image that was obtained by the previous image scaling operation and temporarily stored in the off-chip memory 201 for the next scaling operation.
  • The control device 203 is configured to execute an image scaling instruction received from the outside, generating a series of control signals and sending them to the devices of the processor in a certain order so that the devices operate in the desired manner.
  • These control signals include calculation control signals for controlling the array processor 204 to calculate the pixel data of each target pixel in the scaled target image.
  • The control device 203 can send a calculation control signal to the array processor 204 at a particular time to control the array processor 204 to calculate the pixel data of the target pixel at the desired time. It should be noted that the control device 203 can also generate and issue read/write signals for controlling memory read/write operations.
  • The array processor 204 can calculate the pixel data of the target pixel under the control of the above calculation control signal. First, the array processor 204 can extract the pixel data of the pixels in the original image that correspond to the target pixel. Which pixels in the original image correspond to the target pixel depends on the specific algorithm used by the scaling operation; that is, one or more pixels corresponding to the current target pixel may be located in the original image according to that algorithm. Thereafter, the array processor 204 can use its N processing units to calculate the channel values of the N channels of the target pixel in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  • When the scaling operation calculates the channel value of each channel of the target pixel, the channel values of the corresponding channel in the extracted pixels can be processed with the same formula. The calculations for the individual channels therefore do not depend on one another, so the same set of operations can be performed by N identical processing units (PE, Processing Element) provided in the array processor.
  • The processor 200 also includes an on-chip cache (not shown), and the control device 203 may include a read control unit (not shown) and a calculation control unit (not shown).
  • The calculation control unit is configured to send, to the array processor 204, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels in the original image that correspond to the target pixel.
  • Referring to FIG. 3, a block diagram 300 of a processor in accordance with the present application is shown.
  • The processor 300 includes an off-chip memory 301, a communication device 302, a control device 303, a first on-chip cache 304, a second on-chip cache 305, and an array processor 306.
  • The processor 300 in this embodiment includes a first on-chip cache 304 and a second on-chip cache 305, and the read/write speed of the second on-chip cache 305 is greater than that of the first on-chip cache 304.
  • The storage space of the second on-chip cache 305 is larger than the storage space of the first on-chip cache 304.
  • The read control unit 3031 in the control device 303 is configured to sequentially read the pixel data of the original image into the first on-chip cache 304 by rows, and to sequentially read the pixel data of each row in the first on-chip cache 304 into the second on-chip cache 305 by columns; and the calculation control unit 3032 is further configured to send, to the array processor 306, a calculation control signal for extracting, from the second on-chip cache 305, the pixel data of the pixels in the original image that correspond to the target pixel.
  • In this embodiment, the on-chip cache is designed as a two-level cache: the first on-chip cache reads pixel data of the original image from the off-chip memory by rows, the second on-chip cache reads each row of pixel data in the first on-chip cache by columns, and the array processor extracts the pixel data in the second on-chip cache for calculation.
  • Because the two-level cache is read by rows and then by columns, the granularity of each read is large and data is not read repeatedly, which reduces the number of reads, thereby improving the execution speed of the image scaling operation and reducing the operating time.
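  • The row-then-column staging described above can be pictured in software as follows. This is only an illustrative sketch, not the hardware design: the function name `iter_2x2_blocks`, the nested-list layout `image[row][column]`, and the choice of staging two rows and two columns at a time are assumptions made for the example.

```python
def iter_2x2_blocks(image):
    """Yield ((row, col), block) for every 2x2 neighbourhood of `image`.

    `image` is a list of rows (hypothetical layout). The outer loop stands in
    for the first on-chip cache, which holds two whole rows; the inner loop
    stands in for the second on-chip cache, which holds two columns of them.
    """
    height, width = len(image), len(image[0])
    for row in range(height - 1):
        first_cache = image[row:row + 2]  # staged by rows
        for col in range(width - 1):
            # staged by columns out of the rows already held
            second_cache = [r[col:col + 2] for r in first_cache]
            yield (row, col), second_cache
```

  • Each pair of rows is fetched once and then reused for every column position, which is the reuse the two-level arrangement is meant to provide.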
  • The calculation control unit 3032 is further configured to send the above calculation control signal to the array processor 306 when the read control unit 3031 has finished reading, into the second on-chip cache 305, the pixel data of the pixels in the original image that correspond to the target pixel. In this implementation, the calculation command is issued as soon as the pixel data required by the current target pixel is ready in the second on-chip cache, so that the data fetch operation and the array processor's calculation on the data can be executed in parallel as a pipeline, further improving the overall execution speed.
  • The processor 300 further includes: a parameter transfer device (not shown) for obtaining the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x, y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h.
  • The four adjacent pixels at coordinates (x0, y0), (x0+1, y0), (x0, y0+1), (x0+1, y0+1) in the original image are determined as the pixels corresponding to the target pixel.
  • The read control unit 3031 may control the first on-chip cache 304 to sequentially read, from the off-chip memory 301, the pixel data of every two rows of pixels.
  • The second on-chip cache 305 can, for every two rows of pixels in the first on-chip cache 304, sequentially read the pixel data of every two columns, so that the pixel data of 2 × 2 adjacent pixels is obtained each time and used for subsequent calculations.
  • Here, ⌊·⌋ denotes the floor function, which returns the largest integer not greater than its argument.
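  • The defining formulas for the parameters are not reproduced in this text, so the following sketch falls back on the conventional bilinear interpolation definitions. Everything in it is an assumption made for illustration: the function name, the mapping x0 = ⌊x / scale_w⌋ and y0 = ⌊y / scale_h⌋, and the reading of w0 and h0 as the weights of the pixel at (x0, y0) with w1 and h1 as the weights of its right and lower neighbours.

```python
import math


def bilinear_params(x, y, scale_w, scale_h):
    """Return (x0, w0, w1, y0, h0, h1) for target pixel (x, y).

    Assumed definitions (standard bilinear interpolation, not confirmed by
    the source text):
      x0 = floor(x / scale_w), w1 = x / scale_w - x0, w0 = 1 - w1
      y0 = floor(y / scale_h), h1 = y / scale_h - y0, h0 = 1 - h1
    """
    src_x = x / scale_w
    src_y = y / scale_h
    x0 = math.floor(src_x)
    y0 = math.floor(src_y)
    w1 = src_x - x0
    h1 = src_y - y0
    return x0, 1.0 - w1, w1, y0, 1.0 - h1, h1
```

  • With both factors equal to 2.0, target pixel (3, 5) maps to source position (1.5, 2.5), so the four neighbours around (1, 2) each contribute with weight 0.25.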
  • The parameter transfer device is further configured to: determine, according to scale_w and scale_h, the corresponding data table in the data table set pre-stored by the processor; and query the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the data table set includes data tables corresponding to the various values of the width scaling factor and the height scaling factor, and each data table stores, for its width scaling factor and height scaling factor, the values of x and y in association with the corresponding values of x0, w0, w1, y0, h0, h1.
  • In this way, the parameters can be pre-computed and configured in the data tables, and the parameter transfer device in the processor only needs to query the data table when needed, which avoids a large number of repeated parameter calculations and further improves the overall execution speed of image scaling.
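  • A table-driven lookup of this kind might look like the sketch below. The class and method names are hypothetical, the tables are keyed by the pair (scale_w, scale_h), and the per-axis entries use the conventional bilinear definitions as an assumption, since the source does not reproduce its formulas here.

```python
import math


def build_axis_table(target_len, scale):
    """Precompute (i0, weight_of_i0, weight_of_i0_plus_1) for one axis.

    Assumed formulas: i0 = floor(i / scale), fraction = i / scale - i0.
    """
    table = []
    for i in range(target_len):
        src = i / scale
        i0 = math.floor(src)
        frac = src - i0
        table.append((i0, 1.0 - frac, frac))
    return table


class ParamTables:
    """Hypothetical set of pre-stored data tables, one per factor pair."""

    def __init__(self):
        self._tables = {}

    def add(self, scale_w, scale_h, target_w, target_h):
        # Filled in once, ahead of time, for each supported factor pair.
        self._tables[(scale_w, scale_h)] = (
            build_axis_table(target_w, scale_w),
            build_axis_table(target_h, scale_h),
        )

    def lookup(self, scale_w, scale_h, x, y):
        # At run time the parameters are fetched, not recomputed.
        x_table, y_table = self._tables[(scale_w, scale_h)]
        x0, w0, w1 = x_table[x]
        y0, h0, h1 = y_table[y]
        return x0, w0, w1, y0, h0, h1
```

  • Storing one table per axis is a design choice of this sketch: under the assumed formulas the x parameters depend only on x and scale_w, and the y parameters only on y and scale_h.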
  • The N processing units of the array processor 306 share a multiplier group, and the array processor 306 is further configured to: calculate w0 × h0, w1 × h0, w0 × h1, w1 × h1 by using the multiplier group shared by the N processing units.
  • FIG. 4 shows the structure of a processing unit 400 in the array processor in this implementation.
  • The processing unit 400 includes a multiplier group 401 that is shared with the other processing units (not shown).
  • For the channel to be processed by the processing unit 400, the channel values X(x0, y0), X(x0+1, y0), X(x0, y0+1), X(x0+1, y0+1) of that channel in the four adjacent pixels may be input to the processing unit 400 through the input ports input0, input1, input2, and input3.
  • The parameters h0, h1, w0, w1 are input, in the manner shown in FIG. 4, to the multiplier group 401 shared by the processing unit 400 and the other processing units, so that the multiplier group 401 calculates w0 × h0, w1 × h0, w0 × h1, w1 × h1.
  • The results calculated by the respective multipliers in the multiplier group 401 are input in parallel to the four multipliers in the processing unit 400 and to the four multipliers in each of the other processing units.
  • The four multipliers in the processing unit 400 and the subsequent adders then calculate the remaining parts of the above formula to obtain the channel value of the corresponding channel in the target pixel.
  • The other processing units likewise feed the results received from the multiplier group into their own multipliers and adders, and perform the remaining calculations of the above formula on the channel values of their respective channels.
  • The other processing units thus calculate, in the same manner as the processing unit 400, the corresponding channel values of the target pixel from the channel values of their channels in the four pixels, which is not described again here.
  • In this implementation, the calculation of w0 × h0, w1 × h0, w0 × h1, w1 × h1 is exactly the same for every processing unit, so these operations can be performed once by the shared multiplier group. This prevents the processing units from each repeating the same operations and thereby reduces the number of components arranged in the array processor.
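  • The saving from sharing the weight products can be sketched as follows. The pairing of each product with an input port (input0 with w0 × h0, input1 with w1 × h0, input2 with w0 × h1, input3 with w1 × h1) is an assumption based on the conventional bilinear formula, and all function names are hypothetical.

```python
def shared_weight_products(w0, w1, h0, h1):
    """Shared multiplier group: the four products are computed once."""
    return (w0 * h0, w1 * h0, w0 * h1, w1 * h1)


def processing_unit(inputs, products):
    """One PE: four multiplies and an adder tree for a single channel.

    `inputs` holds the channel values at (x0, y0), (x0+1, y0), (x0, y0+1),
    (x0+1, y0+1), in that order (assumed pairing with `products`).
    """
    return sum(value * weight for value, weight in zip(inputs, products))


def compute_target_pixel(px00, px10, px01, px11, w0, w1, h0, h1):
    """Return the N channel values of one target pixel."""
    products = shared_weight_products(w0, w1, h0, h1)
    # The loop stands in for N PEs that would run in parallel in hardware.
    return [processing_unit(ch, products) for ch in zip(px00, px10, px01, px11)]


def multiplier_counts(n_channels):
    """Multiplies per target pixel: (without sharing, with sharing)."""
    return 8 * n_channels, 4 + 4 * n_channels
```

  • Under this accounting, a 32-channel pixel needs 256 multiplications if every unit recomputes the weight products, and 132 when the four products are shared.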
  • The processor 300 further includes a cache management device (not shown) for at least one of: releasing the pixel data of each row in the first on-chip cache 304 whose abscissa is smaller than x0; and releasing the pixel data of each column in the second on-chip cache 305 whose abscissa is equal to x0 and whose ordinate is smaller than y0.
  • The pixel data of those rows in the first on-chip cache 304 can be released so that the freed space can be used for reading subsequent pixel data.
  • Likewise, the pixel data of the columns in the second on-chip cache 305 whose abscissa is smaller than x0 and whose ordinate is smaller than y0 can be released. This data is not used in subsequent processing, so releasing it promptly frees space for reading subsequent data.
  • The read control unit 3031 is further configured to: determine, according to the abscissas of the pixels in the original image that correspond to the target pixels, the rows of pixel data of the original image to be read into the first on-chip cache 304; and read the determined rows row by row into the first on-chip cache 304.
  • The read control unit 3031 may be further configured to: determine, according to the ordinates of the pixels in the original image that correspond to the target pixels, the columns of each row of pixel data in the first on-chip cache 304 to be read into the second on-chip cache 305; and read the determined columns column by column into the second on-chip cache 305.
  • In this way, the read control unit performs row-by-row and column-by-column reading only on the data of the pixels in the original image that correspond to target pixels.
  • When the image zoom factor is large, some pixels in the original image are not used.
  • Skipping them saves the time that would otherwise be spent reading pixels that do not participate in the calculation, which further improves the overall execution speed.
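  • Which rows or columns actually need to be read can be worked out ahead of time, as in the sketch below. It assumes the conventional mapping i0 = ⌊i / scale⌋ with scale taken as target size divided by original size, and it clamps the second neighbour at the image border; the function name and both conventions are assumptions for illustration.

```python
import math


def needed_indices(target_len, scale, orig_len):
    """Return the sorted original indices touched along one axis.

    For every target index i, the two neighbours i0 and i0 + 1 are needed,
    where i0 = floor(i / scale) (assumed mapping). Indices past the border
    are clamped to the last valid index (assumed edge handling).
    """
    needed = set()
    for i in range(target_len):
        i0 = min(math.floor(i / scale), orig_len - 1)
        needed.add(i0)
        needed.add(min(i0 + 1, orig_len - 1))
    return sorted(needed)
```

  • Shrinking an axis of 8 pixels to 2 (a factor of 0.25 under this convention) touches only indices 0, 1, 4 and 5, so half of the original rows or columns never have to be fetched.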
  • When processing image scaling, the processor provided by the above embodiment of the present application uses the processing units in the array processor to calculate the channel values of the respective channels of each pixel in the target image in parallel, which improves the parallelism of image scaling and greatly increases the execution speed of the image scaling operation.
  • The flow 400 of the method for scaling an image includes the following steps:
  • Step 401: receive an image scaling instruction.
  • A processor (e.g., the dedicated processor shown in FIG. 1) on which the method for scaling an image runs may receive an image scaling instruction from the outside (e.g., the general purpose processor of FIG. 1) via a bus.
  • The image scaling instruction can include a width scaling factor and a height scaling factor.
  • The original image to be scaled is an N-channel image, where N is an integer greater than 1.
  • The zoomed image can be stored internally or externally.
  • Step 402: execute the image scaling instruction, and send to the array processor a calculation control signal for calculating the pixel data of each target pixel in the scaled target image.
  • A processor may execute an image scaling instruction to send to the array processor a calculation control signal for calculating the pixel data of each target pixel in the scaled target image.
  • The size of the target image can be determined from the size of the original image and the width scaling factor and height scaling factor in the image scaling instruction, so that the target pixels in the target image whose pixel data needs to be calculated can be determined.
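  • As a small illustration of this step, the target size and the list of target pixels could be derived as below. The function names, the convention that a factor multiplies the original dimension, and the rounding to the nearest integer are assumptions; the source does not state how fractional sizes are handled.

```python
def target_size(orig_w, orig_h, scale_w, scale_h):
    """Return (target_w, target_h), assuming size = round(original * factor)."""
    target_w = max(1, round(orig_w * scale_w))
    target_h = max(1, round(orig_h * scale_h))
    return target_w, target_h


def target_pixels(orig_w, orig_h, scale_w, scale_h):
    """Enumerate the (x, y) coordinates whose pixel data must be calculated."""
    target_w, target_h = target_size(orig_w, orig_h, scale_w, scale_h)
    return [(x, y) for y in range(target_h) for x in range(target_w)]
```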
  • The processor may send to the array processor a calculation control signal for calculating the pixel data of each target pixel in the scaled target image, so that the array processor performs the subsequent calculations.
  • The processor may use the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel.
  • The pixels corresponding to the target pixel can be determined according to the scaling algorithm used by the image scaling operation.
  • The pixels corresponding to the target pixel are the four adjacent pixels at coordinates (x0, y0), (x0+1, y0), (x0, y0+1), (x0+1, y0+1) in the original image.
  • Step 403: using the calculation control signal, control the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and control the N processing units in the array processor to calculate the channel values of the N channels of the target pixel in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  • The processor can use the control signal to control the array processor to calculate the pixel data of each target pixel in the scaled target image.
  • The array processor can perform the following operations. First, the pixel data of the pixels in the original image that correspond to the target pixel is extracted into the array processor. Since the original image is an N-channel image, the pixel data of each pixel includes the channel values of the N channels. Afterwards, the array processor can use the N processing units to calculate the channel values of the N channels of the target pixel in parallel from the channel values of the corresponding channels in the extracted pixel data, thereby implementing multi-channel parallel processing.
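  • Putting the steps together, a software reference for the whole flow might look like the sketch below. It is a plain sequential model rather than the parallel hardware: the nested-list layout `image[y][x][channel]`, the conventional bilinear formulas, the rounding of the target size, and the clamping of neighbours at the image border are all assumptions made for illustration.

```python
import math


def axis_params(i, scale, length):
    """Return (i0, i1, weight_i0, weight_i1) for one axis (assumed formulas)."""
    src = i / scale
    base = math.floor(src)
    frac = src - base
    i0 = min(base, length - 1)
    i1 = min(i0 + 1, length - 1)  # assumed edge handling: clamp to border
    return i0, i1, 1.0 - frac, frac


def scale_image(image, scale_w, scale_h):
    """Scale an N-channel image stored as image[y][x] = [c0, ..., cN-1]."""
    orig_h, orig_w = len(image), len(image[0])
    target_w = max(1, round(orig_w * scale_w))
    target_h = max(1, round(orig_h * scale_h))
    target = []
    for y in range(target_h):
        y0, y1, h0, h1 = axis_params(y, scale_h, orig_h)
        row = []
        for x in range(target_w):
            x0, x1, w0, w1 = axis_params(x, scale_w, orig_w)
            # Weight products computed once per target pixel, reused by
            # every channel (the role of the shared multiplier group).
            products = (w0 * h0, w1 * h0, w0 * h1, w1 * h1)
            corners = (image[y0][x0], image[y0][x1], image[y1][x0], image[y1][x1])
            # One iteration per channel; in hardware, one PE per channel.
            row.append([
                sum(v * p for v, p in zip(channel, products))
                for channel in zip(*corners)
            ])
        target.append(row)
    return target
```

  • The number of channels is unchanged by construction, matching the statement that the scaled target image is still an N-channel image.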
  • the method further includes: reading the pixel data of the original image into the on-chip cache in sequence; and sending, to the array processor, the calculation control signal for calculating the pixel data of each target pixel in the scaled target image includes: sending, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  • the on-chip cache includes a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is higher than that of the first on-chip cache; reading the pixel data of the original image into the on-chip cache in sequence includes: reading the pixel data of the original image into the first on-chip cache row by row, and reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and sending, to the array processor, the calculation control signal for extracting pixel data from the on-chip cache includes: sending, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  • sending, to the array processor, the calculation control signal for extracting pixel data from the second on-chip cache includes: sending the calculation control signal to the array processor when the pixel data of the pixels in the original image that correspond to the target pixel has been read into the second on-chip cache.
  • the method further includes: obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x, y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h.
  • controlling the array processor to extract the pixel data of the pixels corresponding to the target pixel in the original image, and controlling the N processing units in the array processor to calculate in parallel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  • obtaining the values of the parameters x0, w0, w1, y0, h0, h1 further includes the steps of: determining, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a data table set pre-stored in the processor; and querying the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the data table set includes data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores the values of x and y in association with the corresponding values of x0, w0, w1, y0, h0, h1.
  • the plurality of processing units in the array processor share a multiplier group.
  • calculating the channel values from the channel values of the current channel in the four adjacent pixels includes: calculating w0×h0, w1×h0, w0×h1, w1×h1 by using the multiplier group shared by the processing units.
  • the method further includes at least one of: releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0; and releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0.
  • reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column includes: determining, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image to be read into the first on-chip cache; and reading the determined rows into the first on-chip cache row by row.
  • reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column includes: determining, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache to be read into the second on-chip cache; and reading the determined columns into the second on-chip cache column by column.
  • the present application further provides a non-volatile computer storage medium, which may be included in the apparatus described in the foregoing embodiments.
  • the non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: receive an image scaling instruction, the image scaling instruction including a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1; execute the image scaling instruction and send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and, using the calculation control signal, control the array processor to extract the pixel data of the pixels corresponding to the target pixel in the original image, and control the N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

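A processor and method for scaling an image. The processor (200) comprises an off-chip memory (201), a communication device (202), a control device (203), and an array processor (204), wherein: the off-chip memory (201) is configured to store an original image to be scaled; the communication device (202) is configured to receive an image scaling instruction; the control device (203) is configured to execute the image scaling instruction and send a calculation control signal to the array processor (204); and the array processor (204) is configured to, under the control of the calculation control signal, use N processing units in the array processor (204) to calculate in parallel the channel values of the N channels of a target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data. The processor (200) increases the processing speed of image scaling operations.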

Description

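Processor and Method for Scaling an Image
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201610621655.X, filed on August 1, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, specifically to the field of image processing technology, and more particularly to a processor and method for scaling an image.
Background
When an existing technical solution uses a general-purpose processor to scale an image, it first needs to obtain information such as the original image, the target image size, the scaling factor in width, and the scaling factor in height; the scaling operation can then be performed based on this information.
However, existing image scaling processes still have technical shortcomings. First, the scaling process does not fully exploit parallelism, which results in low efficiency. Although some general-purpose processors provide single-instruction, multiple-data (SIMD) operations that allow the same operation to be executed in parallel, the degree of parallelism is generally low. In addition, parameters must be computed repeatedly while an image is being scaled, which also consumes considerable time. The existing technology therefore needs to be improved to overcome these shortcomings.
Summary
The purpose of the present application is to propose an improved processor and method for scaling an image, in order to solve the technical problems mentioned in the Background section above.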
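In a first aspect, the present application provides a processor for scaling an image. The processor comprises an off-chip memory, a communication device, a control device, and an array processor, wherein: the off-chip memory is configured to store an original image to be scaled, the original image being an N-channel image, where N is an integer greater than 1; the communication device is configured to receive an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor; the control device is configured to execute the image scaling instruction and send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and the array processor is configured to, under the control of the calculation control signal, extract from the original image the pixel data of the pixels corresponding to a target pixel, and use N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
In some embodiments, the processor further comprises an on-chip cache, and the control device comprises: a read control unit configured to read the pixel data of the original image in the off-chip memory into the on-chip cache in sequence; and a calculation control unit configured to send, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
In some embodiments, the on-chip cache comprises a first on-chip cache and a second on-chip cache, the read/write speed of the second on-chip cache being higher than that of the first on-chip cache; the read control unit is further configured to read the pixel data of the original image into the first on-chip cache row by row, and to read the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and the calculation control unit is further configured to send, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
In some embodiments, the calculation control unit is further configured to send the calculation control signal to the array processor when the read control unit has finished reading, into the second on-chip cache, the pixel data of the pixels in the original image that correspond to the target pixel.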
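In some embodiments, the processor further comprises a parameter passing device configured to obtain the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and to pass them to the array processor, where x0=⌊x/scale_w⌋, y0=⌊y/scale_h⌋, h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1; and the array processor is further configured to: determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and extract their pixel data; and use the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels.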
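In some embodiments, the parameter passing device is further configured to: determine, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a set of data tables pre-stored in the processor; and query the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the set of data tables comprises data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1.
In some embodiments, the N processing units of the array processor share a multiplier group, and the array processor is further configured to use the multiplier group shared by the processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1.
In some embodiments, the processor further comprises a cache management device configured to perform at least one of the following: releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0; and releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0.
In some embodiments, the read control unit is further configured to: determine, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache, and read the determined rows into the first on-chip cache row by row; and/or determine, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache that are to be read into the second on-chip cache, and read the determined columns into the second on-chip cache column by column.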
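In a second aspect, the present application provides a method for scaling an image, the method comprising: receiving an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1; executing the image scaling instruction and sending, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and using the calculation control signal to control the array processor to extract, from the original image, the pixel data of the pixels corresponding to a target pixel, and to control N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
In some embodiments, the method further comprises reading the pixel data of the original image into an on-chip cache in sequence; and sending, to the array processor, the calculation control signal for calculating the pixel data of each target pixel in the scaled target image comprises: sending, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
In some embodiments, the on-chip cache comprises a first on-chip cache and a second on-chip cache, the read/write speed of the second on-chip cache being higher than that of the first on-chip cache; reading the pixel data of the original image into the on-chip cache in sequence comprises: reading the pixel data of the original image into the first on-chip cache row by row, and reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and sending, to the array processor, the calculation control signal for extracting pixel data from the on-chip cache comprises: sending, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
In some embodiments, sending, to the array processor, the calculation control signal for extracting pixel data from the second on-chip cache comprises: sending the calculation control signal to the array processor when the pixel data of the pixels in the original image that correspond to the target pixel has been read into the second on-chip cache.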
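In some embodiments, the method further comprises: obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and passing them to the array processor, where x0=⌊x/scale_w⌋, y0=⌊y/scale_h⌋, h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1; and controlling the array processor to extract the pixel data and controlling the N processing units to calculate in parallel the channel values of the N channels of the target pixel comprises: controlling the array processor to determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and to extract their pixel data; and controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels.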
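In some embodiments, obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h comprises: determining, according to scale_w and scale_h, the corresponding data table in a set of data tables pre-stored in the processor; and querying the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the set of data tables comprises data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1.
In some embodiments, the N processing units of the array processor share a multiplier group; and controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1 comprises: using the multiplier group shared by the processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1.
In some embodiments, the method further comprises at least one of the following: releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0; and releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0.
In some embodiments, reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column comprises: determining, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache, and reading the determined rows into the first on-chip cache row by row; and/or reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column comprises: determining, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache that are to be read into the second on-chip cache, and reading the determined columns into the second on-chip cache column by column.
The processor and method for scaling an image provided by the present application use the processing units in the array processor to calculate the channel values of the channels of each pixel in the target image in parallel, which increases the parallelism of image scaling and greatly reduces the execution time of the image scaling operation.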
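Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a schematic structural diagram of one embodiment of the processor for scaling an image according to the present application;
Fig. 3 is a schematic structural diagram of another embodiment of the processor for scaling an image according to the present application;
Fig. 4 is a flowchart of one embodiment of the method for scaling an image according to the present application;
Fig. 5 is a flowchart of another embodiment of the method for scaling an image according to the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.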
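Fig. 1 shows an exemplary system architecture 100 to which embodiments of the processor and method for scaling an image of the present application may be applied. As shown in Fig. 1, the system architecture 100 may include a general-purpose processor 101 and a dedicated processor 102.
The general-purpose processor 101 sends the image scaling instruction and the original image to be scaled to the dedicated processor 102. The dedicated processor 102 can scale the original image according to the image scaling instruction, and the resulting target image can then be sent back to the general-purpose processor 101. The general-purpose processor 101 may also be referred to as the host, and the dedicated processor 102 may be implemented on an FPGA (Field-Programmable Gate Array).
It should be noted that the processor provided by the embodiments of the present application generally refers to the dedicated processor 102 in Fig. 1; accordingly, the method for executing instructions on the processor is generally also executed by the dedicated processor 102.
It should be understood that the numbers of general-purpose processors 101 and dedicated processors 102 in Fig. 1 are merely illustrative. Any number of general-purpose processors and dedicated processors may be provided, depending on implementation needs.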
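With continued reference to Fig. 2, a schematic structural diagram 200 of a processor according to the present application is shown.
As shown in Fig. 2, the processor 200 includes an off-chip memory 201, a communication device 202, a control device 203, and an array processor 204.
The off-chip memory 201 may be used to store the original image to be scaled. The original image may be an N-channel image (N being an integer greater than 1). Because a scaling operation generally changes only the image size (that is, height and width) and does not affect the number of channels, the scaled target image is still an N-channel image. For example, when N is 32, the original image before scaling is a 32-channel image, and the target image after scaling is still a 32-channel image.
The communication device 202 may be connected to an external host (for example, the general-purpose processor in Fig. 1) to receive the image scaling instruction from the host. The image scaling instruction generally includes a width scaling factor and a height scaling factor. In some cases, the user may instead provide the size of the scaled target image; the host can then first calculate the width and height scaling factors from the sizes of the original image and the target image, generate the image scaling instruction, and send it through the communication device to the processor 200. In practice, the communication device 202 may be connected to the host through PCIe (Peripheral Component Interconnect Express, a bus and interface standard) or another bus. The communication device 202 may be electrically connected to the control device 203 so that the scaling instruction can be transferred to the control device 203 for execution.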
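As a minimal sketch of the host-side step described above, the two scaling factors can be derived from the image sizes. The function name `scale_factors` and the convention that a factor equals target size divided by original size are assumptions for illustration; the text does not spell out the definition, although this convention agrees with the later mapping x0=⌊x/scale_w⌋.

```python
def scale_factors(orig_w, orig_h, target_w, target_h):
    """Return (scale_w, scale_h), assuming factor = target size / original size."""
    return target_w / orig_w, target_h / orig_h


# Hypothetical sizes: a 640x480 image resized to 1280x240.
scale_w, scale_h = scale_factors(640, 480, 1280, 240)
print(scale_w, scale_h)
```

The host would then place these two values in the image scaling instruction it sends to the processor.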
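It should be noted that the original image stored in the off-chip memory 201 may be an external input image obtained in advance from the host through the communication device 202, or it may be an image that was temporarily stored in the off-chip memory 201 after a previous scaling operation and is awaiting the next scaling operation.
The control device 203 executes the image scaling instruction received from outside, generates a series of control signals, and sends them to the devices of the processor in a defined order so that these devices operate as required. The control signals include a calculation control signal that controls the array processor 204 to calculate the pixel data of each target pixel in the scaled target image. The control device 203 can send the calculation control signal to the array processor 204 at a specific time so that the array processor 204 calculates the pixel data of the target pixel when required. It should be noted that the control device 203 can also generate and issue read/write signals that control memory read/write operations.
The array processor 204 can calculate the pixel data of the target pixels under the control of the calculation control signal. First, the array processor 204 extracts the pixel data of the pixels in the original image that correspond to the target pixel. Which pixels in the original image correspond to a target pixel depends on the specific algorithm used for scaling; that is, one or more pixels corresponding to the current target pixel can be located in the original image according to that algorithm. The array processor 204 then uses its N processing units to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data. During scaling, the channel value of each channel of the target pixel can be calculated by applying the same formula to the channel values of the corresponding channel in the extracted pixels; that is, the calculations for the individual channels do not depend on one another. The same set of operations can therefore be carried out by the N replicated processing elements (PEs) of the array processor.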
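In some optional implementations of this embodiment, the processor 200 further includes an on-chip cache (not shown), and the control device 203 may include a read control unit (not shown) and a calculation control unit (not shown). The read control unit reads the pixel data of the original image in the off-chip memory 201 into the on-chip cache in sequence. The calculation control unit sends, to the array processor 204, a calculation control signal for extracting from the on-chip cache the pixel data of the pixels of the original image that correspond to the target pixel. The on-chip cache increases the speed at which the array processor reads pixel data, which helps improve overall processing efficiency.
With continued reference to Fig. 3, a schematic structural diagram 300 of a processor according to the present application is shown.
As shown in Fig. 3, the processor 300 includes an off-chip memory 301, a communication device 302, a control device 303, a first on-chip cache 304, a second on-chip cache 305, and an array processor 306.
Unlike the embodiment in Fig. 2, the processor 300 in this embodiment includes a first on-chip cache 304 and a second on-chip cache 305, and the read/write speed of the second on-chip cache 305 is higher than that of the first on-chip cache 304. Typically, the storage space of the second on-chip cache 305 is larger than that of the first on-chip cache 304. The read control unit 3031 in the control device 303 reads the pixel data of the original image into the first on-chip cache 304 row by row and reads the pixel data of each row in the first on-chip cache 304 into the second on-chip cache 305 column by column; the calculation control unit 3032 is further configured to send, to the array processor 306, a calculation control signal for extracting from the second on-chip cache 305 the pixel data of the pixels of the original image that correspond to the target pixel. In this implementation, the on-chip cache is designed as a two-level cache: the first on-chip cache reads the pixel data of the original image from the off-chip memory by row, the second on-chip cache reads the rows of pixel data in the first on-chip cache by column, and the array processor extracts the pixel data in the second on-chip cache for calculation. Because the two-level cache reads by row and by column, data is read at a coarse granularity and is not read repeatedly, which reduces the number of reads, thereby increasing the execution speed of the image scaling operation and reducing operation time.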
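The row-then-column staging can be sketched in software as follows. This is only an illustration under stated assumptions: the helper names `fill_first_cache` and `fill_second_cache` are hypothetical, the image is assumed to be indexed as `off_chip[x][y]` so that x selects a row, and two rows and two columns are fetched at a time because the bilinear case discussed in this document needs a 2×2 block.

```python
def fill_first_cache(off_chip, x0):
    """First level: copy two whole rows (x0 and x0 + 1) from off-chip memory."""
    return {x: off_chip[x] for x in (x0, x0 + 1)}


def fill_second_cache(first_cache, x0, y0):
    """Second level: from the cached rows, copy only columns y0 and y0 + 1."""
    return {(x, y): first_cache[x][y]
            for x in (x0, x0 + 1)
            for y in (y0, y0 + 1)}


# Hypothetical 4x4 single-channel image; each pixel is a list of channel values.
off_chip = [[[10 * x + y] for y in range(4)] for x in range(4)]
first = fill_first_cache(off_chip, 1)
block = fill_second_cache(first, 1, 2)
print(block)
```

Each off-chip row is fetched once into the first level, and the second level only ever holds the small block the array processor is about to consume.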
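In some optional implementations of this embodiment, the calculation control unit 3032 is further configured to send the calculation control signal to the array processor 306 when the read control unit 3031 has finished reading, into the second on-chip cache 305, the pixel data of the pixels in the original image that correspond to the target pixel. In this implementation, the calculation command is issued as soon as the pixel data needed for the current pixel is ready in the second-level cache, so memory access and the array processor's calculation can run in parallel as a pipeline, further increasing overall execution speed.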
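A software analogy of this overlap, offered only as a sketch: a reader thread stands in for the read control unit and a worker thread for the array processor, with a bounded queue playing the role of the "data ready" signal. The names `read_unit` and `compute_unit`, and the use of `sum` as a placeholder calculation, are assumptions for illustration and not part of the described hardware.

```python
import queue
import threading


def read_unit(blocks, ready):
    """Stand-in for the read control unit: publish each block once it is loaded."""
    for block in blocks:
        ready.put(block)          # the block is now available for calculation
    ready.put(None)               # sentinel: no more blocks


def compute_unit(ready, results):
    """Stand-in for the array processor: start work as soon as a block is ready."""
    while True:
        block = ready.get()
        if block is None:
            break
        results.append(sum(block))  # placeholder for the per-pixel calculation


blocks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
ready = queue.Queue(maxsize=1)    # bounded buffer so the reader cannot run far ahead
results = []
reader = threading.Thread(target=read_unit, args=(blocks, ready))
worker = threading.Thread(target=compute_unit, args=(ready, results))
reader.start()
worker.start()
reader.join()
worker.join()
print(results)
```

While the worker processes one block, the reader is already loading the next, which is the pipelining effect the paragraph describes.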
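In some optional implementations of this embodiment, the processor 300 further includes a parameter passing device (not shown) configured to obtain the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and to pass them to the array processor 306, where x0=⌊x/scale_w⌋, y0=⌊y/scale_h⌋, h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1; and the array processor is further configured to: determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and extract their pixel data; and use the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels. In this implementation, because the pixels in the original image that correspond to a target pixel form a 2×2 block of four adjacent pixels, the read control unit 3031 can control the first on-chip cache 304 to read the pixel data of the off-chip memory 301 two rows at a time. Correspondingly, the second on-chip cache 305 can read, two columns at a time, the pixel data of each pair of rows in the first on-chip cache 304, so that each read yields the pixel data of a 2×2 block of four adjacent pixels for the subsequent calculation. Here ⌊ ⌋ is the floor function, which gives the largest integer not greater than its operand. In this implementation, the processor 300 scales the image using the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1: the required parameters are obtained first, the array processor then determines the four corresponding adjacent pixels in the original image from the parameters, and finally each processing unit of the array processor uses the formula to calculate the channel value of its channel.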
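The parameter definitions and the formula can be transcribed directly into a short sketch. The function names `params` and `target_pixel` and the accessor `X(x, y)` returning a list of N channel values are hypothetical. The weights are paired with pixels exactly as the formula is written in the text; conventional bilinear interpolation pairs X(x0,y0) with w1×h1 instead, so this should be read as a transcription of the text rather than a reference implementation.

```python
import math


def params(x, y, scale_w, scale_h):
    """Compute x0, w0, w1, y0, h0, h1 as defined in the text."""
    x0 = math.floor(x / scale_w)
    y0 = math.floor(y / scale_h)
    w0 = x / scale_w - x0
    w1 = x0 - x / scale_w + 1
    h0 = y / scale_h - y0
    h1 = y0 - y / scale_h + 1
    return x0, w0, w1, y0, h0, h1


def target_pixel(X, x, y, scale_w, scale_h):
    """Apply the formula to every channel of the four adjacent source pixels."""
    x0, w0, w1, y0, h0, h1 = params(x, y, scale_w, scale_h)
    corners = (X(x0, y0), X(x0 + 1, y0), X(x0, y0 + 1), X(x0 + 1, y0 + 1))
    # One list element per channel; in the hardware each channel has its own PE.
    return [a * w0 * h0 + b * w1 * h0 + c * w0 * h1 + d * w1 * h1
            for a, b, c, d in zip(*corners)]


def constant_image(x, y):
    """Hypothetical two-channel source image with the same value everywhere."""
    return [4.0, 8.0]


print(params(3, 5, 2.0, 2.0))
print(target_pixel(constant_image, 3, 5, 2.0, 2.0))
```

Because w0+w1=1 and h0+h1=1, the four weights always sum to 1, so a constant source image maps to the same constant in the target.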
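In some optional implementations of this embodiment, the parameter passing device is further configured to: determine, according to scale_w and scale_h, the corresponding data table in a set of data tables pre-stored in the processor; and query the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the set of data tables comprises data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1. Because calculating the parameters needed for scaling is repetitive work, in this implementation the parameters can be calculated in advance and configured in data tables; the parameter passing device in the processor then only needs to query a data table when the values are required, which eliminates a large number of repeated parameter calculations and further increases the overall execution speed of image scaling.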
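One possible shape for such a table set, sketched with plain dictionaries. The names `build_table`, `table_set`, and `lookup`, the choice of dictionary keys, and the fixed target size are assumptions made for illustration; the text does not describe how the tables are laid out.

```python
import math


def build_table(scale_w, scale_h, target_w, target_h):
    """Precompute (x0, w0, w1, y0, h0, h1) for every target coordinate (x, y)."""
    table = {}
    for x in range(target_w):
        x0 = math.floor(x / scale_w)
        w0 = x / scale_w - x0
        w1 = x0 - x / scale_w + 1
        for y in range(target_h):
            y0 = math.floor(y / scale_h)
            h0 = y / scale_h - y0
            h1 = y0 - y / scale_h + 1
            table[(x, y)] = (x0, w0, w1, y0, h0, h1)
    return table


# Hypothetical table set: one table per supported pair of scaling factors.
table_set = {
    (2.0, 2.0): build_table(2.0, 2.0, 4, 4),
    (0.5, 0.5): build_table(0.5, 0.5, 4, 4),
}


def lookup(table_set, scale_w, scale_h, x, y):
    """Select the table by scaling factors, then query it with (x, y)."""
    return table_set[(scale_w, scale_h)][(x, y)]


print(lookup(table_set, 2.0, 2.0, 3, 1))
```

At run time each target pixel costs two dictionary lookups instead of two divisions, two floor operations, and four subtractions.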
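In some optional implementations of this embodiment, the N processing units of the array processor 306 share a multiplier group, and the array processor 306 is further configured to use the multiplier group shared by the N processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1.
Fig. 4 shows the structure of a processing unit 400 of the array processor in this implementation. The processing unit 400 includes a multiplier group 401 shared with the other processing units (not shown). When the array processor calculates the channel values of the N channels in parallel using the above formula, the channel values X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) of the channel handled by the processing unit 400 in the four adjacent pixels are fed, through the input ports input0, input1, input2, and input3, to the four multipliers dedicated to the processing unit 400. The parameters h0, h1, w0, and w1 are fed, in the manner shown in Fig. 4, to the multiplier group 401 shared by the processing unit 400 and the other processing units, which performs the operations w0×h0, w1×h0, w0×h1, and w1×h1. The results calculated by the multipliers in the multiplier group 401 are then fed in parallel to the four dedicated multipliers of the processing unit 400 and to the four dedicated multipliers of each of the other processing units. The four dedicated multipliers of the processing unit 400 and the subsequent adders carry out the remaining part of the formula to obtain the channel value of the corresponding channel of the target pixel. The other processing units likewise feed the shared results into their own subsequent stages and, together with the channel values of their own channels, perform the remaining calculation of the formula. They process the channel values of their respective channels in the four pixels in the same way as the processing unit 400 to obtain the corresponding channel values of the target pixel, which is not described again here. In this implementation, the calculation of w0×h0, w1×h0, w0×h1, w1×h1 is identical for every processing unit, so it can be performed once by the shared multiplier group, which avoids repeating it in each processing unit and reduces the number of components placed in the array processor.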
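The division of work between the shared multiplier group and the per-channel units can be mirrored in a few lines. The function names `shared_products`, `processing_unit`, and `array_processor` are hypothetical, and a Python list comprehension stands in for N units that would run concurrently in hardware.

```python
def shared_products(w0, w1, h0, h1):
    """Shared multiplier group: the four weight products, computed once per pixel."""
    return w0 * h0, w1 * h0, w0 * h1, w1 * h1


def processing_unit(inputs, products):
    """One unit: four dedicated multiplications followed by the additions."""
    in0, in1, in2, in3 = inputs
    p00, p10, p01, p11 = products
    return in0 * p00 + in1 * p10 + in2 * p01 + in3 * p11


def array_processor(pixels, w0, w1, h0, h1):
    """pixels holds X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) as channel lists."""
    products = shared_products(w0, w1, h0, h1)   # not repeated per channel
    n_channels = len(pixels[0])
    return [processing_unit(tuple(p[c] for p in pixels), products)
            for c in range(n_channels)]


# Hypothetical two-channel example with weights chosen to be exact in binary.
pixels = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
print(array_processor(pixels, 0.25, 0.75, 0.5, 0.5))
```

With N channels, sharing reduces the weight multiplications from 4N to 4, while each unit still performs its own four multiplications by channel values.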
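In some optional implementations of this embodiment, the processor 300 further includes a cache management device (not shown) configured to perform at least one of the following: releasing, in the first on-chip cache 304, the pixel data of each row whose abscissa is less than x0; and releasing, in the second on-chip cache 305, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0. In this implementation, pixels are processed in order during scaling, and the pixels in the original image that correspond to the target pixel currently being calculated are (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1). This means that, among the rows of data in the first on-chip cache 304, rows whose abscissa is less than x0 will not be used again, so the pixel data of these rows can be released and the freed space used for reading subsequent pixel data. Similarly, the pixel data in the second on-chip cache 305 for columns whose abscissa is equal to x0 and whose ordinate is less than y0 can be released; this data will not be used again either, and releasing it promptly frees space for reading subsequent data.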
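The two release rules can be expressed as simple filters over cache contents. In this sketch the caches are modelled as dictionaries, with the first cache keyed by row abscissa and the second by (abscissa, ordinate); these data structures and the names `release_first` and `release_second` are assumptions for illustration only.

```python
def release_first(first_cache, x0):
    """Drop every cached row whose abscissa is less than x0."""
    for x in [x for x in first_cache if x < x0]:
        del first_cache[x]


def release_second(second_cache, x0, y0):
    """Drop entries whose abscissa equals x0 and whose ordinate is less than y0."""
    for key in [k for k in second_cache if k[0] == x0 and k[1] < y0]:
        del second_cache[key]


# Hypothetical cache contents; the string values stand in for pixel data.
first_cache = {0: "row0", 1: "row1", 2: "row2", 3: "row3"}
second_cache = {(2, 0): "a", (2, 1): "b", (2, 2): "c", (3, 0): "d"}
release_first(first_cache, 2)
release_second(second_cache, 2, 2)
print(sorted(first_cache), sorted(second_cache))
```

The keys to delete are collected into a list first so that the dictionaries are not modified while being iterated.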
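In some optional implementations of this embodiment, the read control unit 3031 is further configured to: determine, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache 304; and read the determined rows into the first on-chip cache 304 row by row. The read control unit 3031 may also be further configured to: determine, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache 304 that are to be read into the second on-chip cache 305; and read the determined columns into the second on-chip cache 305 column by column. In this implementation, the read control unit reads, row by row and column by column, only the data of the pixels in the original image that correspond to target pixels. When the image is shrunk by a large factor, some pixels of the original image are never used; this implementation avoids spending time reading pixels that do not take part in the calculation, which further increases overall execution speed.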
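A minimal sketch of how the set of rows (or columns) to fetch could be determined, assuming the 2×2 neighbourhood used by the formula in this document. The function name `needed_indices` is hypothetical; the same function serves for rows (with the target width and scale_w) and for columns (with the target height and scale_h).

```python
import math


def needed_indices(target_len, scale):
    """Source indices touched by any target coordinate along one axis."""
    needed = set()
    for t in range(target_len):
        i0 = math.floor(t / scale)
        needed.update((i0, i0 + 1))   # each target pixel uses i0 and i0 + 1
    return sorted(needed)


# Shrinking to a quarter: most source rows are never read.
print(needed_indices(3, 0.25))
# Enlarging by two: neighbouring target pixels reuse the same source rows.
print(needed_indices(4, 2.0))
```

In the first case rows 2, 3, 6, and 7 of the source are skipped entirely, which is the saving the paragraph refers to.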
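When the processor provided by the above embodiments of the present application scales an image, it uses the processing units in the array processor to calculate the channel values of the channels of each pixel of the target image in parallel, which increases the parallelism of image scaling and greatly reduces the execution time of the image scaling operation.
With further reference to Fig. 4, a flow 400 of one embodiment of the method for scaling an image is shown. The flow 400 of the method for scaling an image includes the following steps:
Step 401: receive an image scaling instruction.
In this embodiment, the processor on which the method for scaling an image runs (for example, the dedicated processor shown in Fig. 1) can receive an image scaling instruction from outside (for example, from the general-purpose processor in Fig. 1) over a bus. The image scaling instruction may include a width scaling factor and a height scaling factor. The original image to be scaled is an N-channel image, where N is an integer greater than 1. The original image may be stored inside the processor or obtained from outside.
Step 402: execute the image scaling instruction and send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image.
In this embodiment, the processor (for example, the dedicated processor in Fig. 1) can execute the image scaling instruction and send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image. Generally, the size of the target image can be determined from the size of the original image and the width and height scaling factors in the image scaling instruction, which in turn determines the target pixels in the target image whose pixel data must be calculated. For these target pixels, the processor can send the array processor the calculation control signal so that the array processor performs the subsequent calculation.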
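The size determination can be sketched as below. Truncating to an integer is an assumption made for the sketch, since the text does not say how fractional sizes are rounded; the names `target_size` and `target_pixels` are likewise hypothetical.

```python
def target_size(orig_w, orig_h, scale_w, scale_h):
    """Target image size, assuming size = original size x factor, truncated."""
    return int(orig_w * scale_w), int(orig_h * scale_h)


def target_pixels(width, height):
    """Enumerate the coordinates (x, y) of every target pixel to be calculated."""
    return [(x, y) for x in range(width) for y in range(height)]


print(target_size(640, 480, 0.5, 2.0))
print(target_pixels(3, 2))
```

The resulting coordinate list is what the control logic would iterate over when issuing calculation control signals.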
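For each target pixel in the target image, the processor can use the array processor to extract the pixel data of the pixels of the original image that correspond to the target pixel. The pixels corresponding to a target pixel can be determined according to the scaling algorithm used by the image scaling operation. For example, when the scaling algorithm uses the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, the pixels corresponding to the target pixel are the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image.
Step 403: using the calculation control signal, control the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and control the N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
In this embodiment, based on the calculation control signal sent to the array processor in step 402, the processor can control the array processor to calculate the pixel data of each target pixel in the scaled target image. Under the control of this signal, the array processor can perform the following operations. First, it extracts into the array processor the pixel data of the pixels in the original image that correspond to the target pixel. Because the original image is an N-channel image, the pixel data of a pixel includes the channel value of each of the N channels. The array processor then uses its N processing units to calculate in parallel the channel values of the N channels of the target pixel according to the channel values of the corresponding channels in the extracted pixel data, thereby achieving multi-channel parallel processing.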
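Putting steps 401 to 403 together, an end-to-end software sketch might look like the following. Several points are assumptions rather than statements from the text: the image is indexed as `img[x][y]`, the target size is the truncated product of source size and factor, and neighbours beyond the border are clamped to the last row or column. The per-channel list comprehension stands in for the N processing units.

```python
import math


def scale_image(img, scale_w, scale_h):
    """Scale img, indexed img[x][y] -> list of N channel values."""
    src_w, src_h = len(img), len(img[0])
    dst_w, dst_h = int(src_w * scale_w), int(src_h * scale_h)

    def X(x, y):
        # Assumption: clamp at the border; the text does not cover this case.
        return img[min(x, src_w - 1)][min(y, src_h - 1)]

    out = []
    for x in range(dst_w):
        x0 = math.floor(x / scale_w)
        w0 = x / scale_w - x0
        w1 = x0 - x / scale_w + 1
        line = []
        for y in range(dst_h):
            y0 = math.floor(y / scale_h)
            h0 = y / scale_h - y0
            h1 = y0 - y / scale_h + 1
            corners = (X(x0, y0), X(x0 + 1, y0), X(x0, y0 + 1), X(x0 + 1, y0 + 1))
            # Formula as written in the text, applied independently per channel.
            line.append([a * w0 * h0 + b * w1 * h0 + c * w0 * h1 + d * w1 * h1
                         for a, b, c, d in zip(*corners)])
        out.append(line)
    return out


# Hypothetical 2x2 three-channel image with identical pixels, enlarged to 4x4.
img = [[[1.0, 2.0, 3.0] for _ in range(2)] for _ in range(2)]
out = scale_image(img, 2.0, 2.0)
print(len(out), len(out[0]), out[3][3])
```

The number of channels is unchanged by scaling, which matches the earlier observation that only width and height are affected.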
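In some optional implementations of this embodiment, the method further includes reading the pixel data of the original image into an on-chip cache in sequence; and sending, to the array processor, the calculation control signal for calculating the pixel data of each target pixel in the scaled target image includes: sending, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 2; it is not repeated here.
In some optional implementations of this embodiment, the on-chip cache includes a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is higher than that of the first on-chip cache; reading the pixel data of the original image into the on-chip cache in sequence includes: reading the pixel data of the original image into the first on-chip cache row by row, and reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and sending, to the array processor, the calculation control signal for extracting pixel data from the on-chip cache includes: sending, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel. For the specific processing of this implementation, refer to the embodiment of Fig. 3; it is not repeated here.
In some optional implementations of this embodiment, sending, to the array processor, the calculation control signal for extracting pixel data from the second on-chip cache includes: sending the calculation control signal to the array processor when the pixel data of the pixels in the original image that correspond to the target pixel has been read into the second on-chip cache. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.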
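In some optional implementations of this embodiment, the method further includes: obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and passing them to the array processor, where x0=⌊x/scale_w⌋, y0=⌊y/scale_h⌋, h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1. In addition, controlling the array processor in step 403 to extract the pixel data of the pixels in the original image that correspond to the target pixel, and controlling the N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel, specifically includes the following steps: controlling the array processor to determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and to extract their pixel data; and controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.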
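In some optional implementations of this embodiment, obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h further includes the following steps: determining, according to scale_w and scale_h, the corresponding data table in a set of data tables pre-stored in the processor; and querying the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1; wherein the set of data tables includes data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.
In some optional implementations of this embodiment, the plurality of processing units in the array processor share a multiplier group. Controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels, includes: using the multiplier group shared by the processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.
In some optional implementations of this embodiment, the method further includes at least one of the following: releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0; and releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.
In some optional implementations of this embodiment, reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column includes: determining, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache; and reading the determined rows into the first on-chip cache row by row. And/or, reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column includes: determining, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache that are to be read into the second on-chip cache; and reading the determined columns into the second on-chip cache column by column. For the specific processing of this implementation, refer to the corresponding implementation in the embodiment of Fig. 3; it is not repeated here.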
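It should be noted that the block diagrams and flowcharts in the drawings illustrate the architecture, functions, and operations of possible implementations of the systems, processors, and methods according to the various embodiments of the present application. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
In another aspect, the present application further provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: receive an image scaling instruction, the image scaling instruction including a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1; execute the image scaling instruction and send, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image; and, using the calculation control signal, control the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and control the N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
The above description covers only the preferred embodiments of the present application and explains the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.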

Claims (20)

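  1. A processor for scaling an image, characterized in that the processor comprises an off-chip memory, a communication device, a control device, and an array processor, wherein:
    the off-chip memory is configured to store an original image to be scaled, the original image being an N-channel image, where N is an integer greater than 1;
    the communication device is configured to receive an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor;
    the control device is configured to execute the image scaling instruction and send, to the array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image;
    the array processor is configured to, under the control of the calculation control signal, extract the pixel data of the pixels in the original image that correspond to a target pixel, and use N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  2. The processor according to claim 1, characterized in that the processor further comprises an on-chip cache; and
    the control device comprises:
    a read control unit configured to read the pixel data of the original image in the off-chip memory into the on-chip cache in sequence; and
    a calculation control unit configured to send, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  3. The processor according to claim 2, characterized in that the on-chip cache comprises a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is higher than that of the first on-chip cache; and
    the read control unit is further configured to:
    read the pixel data of the original image into the first on-chip cache row by row;
    read the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and
    the calculation control unit is further configured to:
    send, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  4. The processor according to claim 3, characterized in that the calculation control unit is further configured to:
    send the calculation control signal to the array processor when the read control unit has finished reading, into the second on-chip cache, the pixel data of the pixels in the original image that correspond to the target pixel.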
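  5. The processor according to claim 4, characterized in that the processor further comprises:
    a parameter passing device configured to obtain the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and to pass them to the array processor, where
    x0=⌊x/scale_w⌋,
    y0=⌊y/scale_h⌋,
    h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1; and
    the array processor is further configured to:
    determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and extract their pixel data;
    use the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels.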
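  6. The processor according to claim 5, characterized in that the parameter passing device is further configured to:
    determine, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a set of data tables pre-stored in the processor;
    query the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1;
    wherein the set of data tables comprises data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1.
  7. The processor according to claim 5, characterized in that the N processing units of the array processor share a multiplier group, and
    the array processor is further configured to:
    use the multiplier group shared by the processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1.
  8. The processor according to claim 5, characterized in that the processor further comprises a cache management device configured to perform at least one of the following:
    releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0;
    releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0.
  9. The processor according to any one of claims 5 to 8, characterized in that the read control unit is further configured to:
    determine, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache;
    read the determined rows into the first on-chip cache row by row;
    and/or
    determine, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache that are to be read into the second on-chip cache;
    read the determined columns into the second on-chip cache column by column.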
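  10. A method for scaling an image, characterized in that the method comprises:
    receiving an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1;
    executing the image scaling instruction and sending, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image;
    using the calculation control signal to control the array processor to extract the pixel data of the pixels in the original image that correspond to a target pixel, and to control N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  11. The method according to claim 10, characterized in that the method further comprises:
    reading the pixel data of the original image into an on-chip cache in sequence; and
    sending, to the array processor, the calculation control signal for calculating the pixel data of each target pixel in the scaled target image comprises:
    sending, to the array processor, a calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  12. The method according to claim 11, characterized in that the on-chip cache comprises a first on-chip cache and a second on-chip cache, and the read/write speed of the second on-chip cache is higher than that of the first on-chip cache; and
    reading the pixel data of the original image into the on-chip cache in sequence comprises:
    reading the pixel data of the original image into the first on-chip cache row by row;
    reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column; and
    sending, to the array processor, the calculation control signal for extracting, from the on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel comprises:
    sending, to the array processor, a calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel.
  13. The method according to claim 12, characterized in that sending, to the array processor, the calculation control signal for extracting, from the second on-chip cache, the pixel data of the pixels of the original image that correspond to the target pixel comprises:
    sending the calculation control signal to the array processor when the pixel data of the pixels in the original image that correspond to the target pixel has been read into the second on-chip cache.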
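  14. The method according to claim 13, characterized in that the method further comprises:
    obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h, and passing them to the array processor, where
    x0=⌊x/scale_w⌋,
    y0=⌊y/scale_h⌋,
    h0=y/scale_h-y0, h1=y0-y/scale_h+1, w0=x/scale_w-x0, w1=x0-x/scale_w+1; and
    controlling the array processor to extract the pixel data of the pixels in the original image that correspond to the target pixel, and controlling the N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data, comprises:
    controlling the array processor to determine the four adjacent pixels at coordinates (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1) in the original image as the pixels corresponding to the target pixel and to extract their pixel data;
    controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1, where X(x0,y0), X(x0+1,y0), X(x0,y0+1), X(x0+1,y0+1) are the channel values of the current channel in the four adjacent pixels.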
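  15. The method according to claim 14, characterized in that obtaining the values of the parameters x0, w0, w1, y0, h0, h1 from the coordinates (x,y) of the target pixel in the target image, the width scaling factor scale_w, and the height scaling factor scale_h comprises:
    determining, according to the width scaling factor scale_w and the height scaling factor scale_h, the corresponding data table in a set of data tables pre-stored in the processor;
    querying the determined data table with the current values of x and y to obtain the corresponding values of the parameters x0, w0, w1, y0, h0, h1;
    wherein the set of data tables comprises data tables corresponding to various values of the width scaling factor and the height scaling factor, and each data table stores, in association, the various values of x and y under the given width and height scaling factors together with the corresponding values of x0, w0, w1, y0, h0, h1.
  16. The method according to claim 14, characterized in that the N processing units of the array processor share a multiplier group; and
    controlling the N processing units in the array processor to calculate in parallel the channel values Y(x,y) of the N channels of the target pixel by the formula Y(x,y)=X(x0,y0)×w0×h0+X(x0+1,y0)×w1×h0+X(x0,y0+1)×w0×h1+X(x0+1,y0+1)×w1×h1 comprises:
    using the multiplier group shared by the processing units to calculate w0×h0, w1×h0, w0×h1, w1×h1.
  17. The method according to claim 14, characterized in that the method further comprises at least one of the following:
    releasing, in the first on-chip cache, the pixel data of each row whose abscissa is less than x0;
    releasing, in the second on-chip cache, the pixel data of each column whose abscissa is equal to x0 and whose ordinate is less than y0.
  18. The method according to any one of claims 14 to 17, characterized in that reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column comprises:
    determining, according to the abscissas of the pixels in the original image that correspond to the target pixels in the target image, the rows of the pixel data of the original image that are to be read into the first on-chip cache;
    reading the determined rows into the first on-chip cache row by row;
    and/or
    reading the pixel data of each row in the first on-chip cache into the second on-chip cache column by column comprises:
    determining, according to the ordinates of the pixels in the original image that correspond to the target pixels in the target image, the columns of each row of pixel data in the first on-chip cache that are to be read into the second on-chip cache;
    reading the determined columns into the second on-chip cache column by column.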
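  19. A device, comprising:
    a processor; and
    a memory,
    the memory storing computer-readable instructions executable by the processor, wherein, when the computer-readable instructions are executed, the processor performs a method for scaling an image, the method comprising:
    receiving an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1;
    executing the image scaling instruction and sending, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image;
    using the calculation control signal to control the array processor to extract the pixel data of the pixels in the original image that correspond to a target pixel, and to control N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.
  20. A non-volatile computer storage medium storing computer-readable instructions executable by a processor, wherein, when the computer-readable instructions are executed by the processor, the processor performs a method for scaling an image, the method comprising:
    receiving an image scaling instruction, the image scaling instruction comprising a width scaling factor and a height scaling factor, the original image to be scaled being an N-channel image, where N is an integer greater than 1;
    executing the image scaling instruction and sending, to an array processor, a calculation control signal for calculating the pixel data of each target pixel in the scaled target image;
    using the calculation control signal to control the array processor to extract the pixel data of the pixels in the original image that correspond to a target pixel, and to control N processing units in the array processor to calculate in parallel the channel values of the N channels of the target pixel according to the width scaling factor, the height scaling factor, and the channel values of the N channels in the extracted pixel data.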
PCT/CN2016/097293 2016-08-01 2016-08-30 Processor and method for scaling an image WO2018023847A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/265,566 US10922785B2 (en) 2016-08-01 2019-02-01 Processor and method for scaling image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610621655.XA CN107680028B (zh) 2016-08-01 2016-08-01 Processor and method for scaling an image
CN201610621655.X 2016-08-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/265,566 Continuation US10922785B2 (en) 2016-08-01 2019-02-01 Processor and method for scaling image

Publications (1)

Publication Number Publication Date
WO2018023847A1 true WO2018023847A1 (zh) 2018-02-08

Family

ID=61072437

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/097293 WO2018023847A1 (zh) 2016-08-01 2016-08-30 Processor and method for scaling an image

Country Status (3)

Country Link
US (1) US10922785B2 (zh)
CN (1) CN107680028B (zh)
WO (1) WO2018023847A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276836A (zh) * 2018-03-13 2019-09-24 幻视互动(北京)科技有限公司 Method for accelerating feature point detection, and MR mixed-reality smart glasses
CN110555802B (zh) * 2019-08-02 2021-04-20 华中科技大学 Multi-pixel stitching method and system for supplying data to an image parallel computing circuit
CN111369444B (zh) * 2020-03-31 2024-02-27 浙江大华技术股份有限公司 Image scaling processing method and apparatus
WO2023154109A1 (en) * 2022-02-10 2023-08-17 Innopeak Technology, Inc. Methods and systems for upscaling video graphics
CN115471404B (zh) * 2022-10-28 2023-03-24 武汉中观自动化科技有限公司 Image scaling method, processing device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1670766A (zh) * 2004-03-17 2005-09-21 德鑫科技股份有限公司 Image scaling method
US7151861B2 (en) * 2001-09-18 2006-12-19 Vixs Systems Inc. Raster image transformation circuit using micro-code and method
CN101183521A (zh) * 2007-11-16 2008-05-21 炬力集成电路设计有限公司 Image scaling apparatus and method, and image display device
CN101950523A (zh) * 2010-09-21 2011-01-19 上海大学 Adjustable rectangular window image scaling method and apparatus
CN102890816A (zh) * 2011-07-20 2013-01-23 深圳市快播科技有限公司 Video image scaling processing method and video image scaling processing apparatus
CN104361555A (zh) * 2014-11-24 2015-02-18 中国航空工业集团公司洛阳电光设备研究所 FPGA-based infrared image scaling method

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5113494A (en) * 1987-02-27 1992-05-12 Eastman Kodak Company High speed raster image processor particularly suited for use in an image management system
EP0449478A3 (en) * 1990-03-29 1992-11-25 Microtime Inc. 3d video special effects system
US5592574A (en) * 1992-04-06 1997-01-07 Ricoh Company Ltd. Method and apparatus for expansion of white space in document images on a digital scanning device
CN1377495A (zh) * 1999-10-04 2002-10-30 松下电器产业株式会社 Display panel driving method, display panel luminance correction apparatus, and driving apparatus therefor
US20040130546A1 (en) * 2003-01-06 2004-07-08 Porikli Fatih M. Region growing with adaptive thresholds and distance function parameters
US7788579B2 (en) * 2006-03-06 2010-08-31 Ricoh Co., Ltd. Automated document layout design
US8089555B2 (en) * 2007-05-25 2012-01-03 Zoran Corporation Optical chromatic aberration correction and calibration in digital cameras
US8103129B2 (en) * 2008-01-21 2012-01-24 Broadcom Corporation System(s), and method(s) for non-linear scaling of source pictures to a destination screen
US8422783B2 (en) * 2008-06-25 2013-04-16 Sharp Laboratories Of America, Inc. Methods and systems for region-based up-scaling
US8520979B2 (en) * 2008-08-19 2013-08-27 Digimarc Corporation Methods and systems for content processing
US8581916B2 (en) * 2009-06-26 2013-11-12 Intel Corporation Graphics analysis techniques
CN101894362B (zh) * 2010-07-05 2012-02-01 昆山龙腾光电有限公司 Image enlarging apparatus and method
US8625902B2 (en) * 2010-07-30 2014-01-07 Qualcomm Incorporated Object recognition using incremental feature extraction
US8639053B2 (en) * 2011-01-18 2014-01-28 Dimension, Inc. Methods and systems for up-scaling a standard definition (SD) video to high definition (HD) quality
US8711167B2 (en) * 2011-05-10 2014-04-29 Nvidia Corporation Method and apparatus for generating images using a color field sequential display
US20130007602A1 (en) * 2011-06-29 2013-01-03 Apple Inc. Fixed layout electronic publications
AU2011213795A1 (en) * 2011-08-19 2013-03-07 Canon Kabushiki Kaisha Efficient cache reuse through application determined scheduling
US10289924B2 (en) * 2011-10-17 2019-05-14 Sharp Laboratories Of America, Inc. System and method for scanned document correction
JP5547226B2 (ja) * 2012-03-16 2014-07-09 株式会社東芝 Image processing apparatus and image processing method
US9105078B2 (en) * 2012-05-31 2015-08-11 Apple Inc. Systems and methods for local tone mapping
US9332239B2 (en) * 2012-05-31 2016-05-03 Apple Inc. Systems and methods for RGB image processing
CN103369338B (zh) * 2013-06-25 2016-01-20 四川虹视显示技术有限公司 Image processing system and method for an FPGA-based near-eye binocular imaging system
US9449239B2 (en) * 2014-05-30 2016-09-20 Apple Inc. Credit card auto-fill
US9342894B1 (en) * 2015-05-01 2016-05-17 Amazon Technologies, Inc. Converting real-type numbers to integer-type numbers for scaling images
US10489703B2 (en) * 2015-05-20 2019-11-26 Nec Corporation Memory efficiency for convolutional neural networks operating on graphics processing units

Also Published As

Publication number Publication date
CN107680028B (zh) 2020-04-21
CN107680028A (zh) 2018-02-09
US10922785B2 (en) 2021-02-16
US20190164254A1 (en) 2019-05-30

Similar Documents

Publication Publication Date Title
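WO2018023847A1 (zh) Processor and method for scaling an image
JP6977239B2 (ja) Matrix multiplier
JP7256914B2 (ja) Vector reduction processor
US20200234124A1 (en) Winograd transform convolution operations for neural networks
WO2019184657A1 (zh) Image recognition method and apparatus, electronic device, and storage medium
CN110188869B (zh) Method and system for integrated-circuit accelerated computation based on a convolutional neural network algorithm
CN107909537B (zh) Image processing method based on a convolutional neural network, and mobile terminal
JP2013511106A (ja) Method and apparatus for image processing at pixel rate
WO2018176882A1 (zh) Method and apparatus for multiplying a matrix by a vector
US10310998B2 (en) Direct memory access with filtering
CN112784973A (zh) Convolution operation circuit, apparatus, and method
WO2021072732A1 (zh) Matrix operation circuit, apparatus, and method
CN111984189A (zh) Neural network computing apparatus, data reading and data storage methods, and related device
WO2019041264A1 (zh) Image processing apparatus and method, and related circuit
US11995569B2 (en) Architecture to support tanh and sigmoid operations for inference acceleration in machine learning
Lysakov et al. FPGA-based hardware accelerator for high-performance data-stream processing
US10127040B2 (en) Processor and method for executing memory access and computing instructions for host matrix operations
KR20070082835A (ko) Direct memory access control apparatus and method
US20200159495A1 (en) Processing apparatus and method of processing add operation therein
WO2016197393A1 (zh) Parallel multi-phase image interpolation apparatus and method
CN114022366B (zh) Image resizing apparatus, resizing method, and device based on a dataflow architecture
JP6414388B2 (ja) Accelerator circuit and image processing apparatus
TW201322774A (zh) Multi-stream processing techniques for video analysis and encoding
CN107871162B (zh) Image processing method based on a convolutional neural network, and mobile terminal
KR20200056898A (ko) Processing apparatus and method of processing add operation therein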

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16911442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16911442

Country of ref document: EP

Kind code of ref document: A1