CN108416730A - A kind of image processing method and device - Google Patents

Info

Publication number
CN108416730A
Authority
CN
China
Prior art keywords
pixel point
point data
vector register
group
storing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710071029.2A
Other languages
Chinese (zh)
Other versions
CN108416730B (en)
Inventor
安爱女
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZTE Microelectronics Technology Co Ltd
Original Assignee
Shenzhen ZTE Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZTE Microelectronics Technology Co Ltd filed Critical Shenzhen ZTE Microelectronics Technology Co Ltd
Priority to CN201710071029.2A priority Critical patent/CN108416730B/en
Priority to PCT/CN2017/095172 priority patent/WO2018145424A1/en
Publication of CN108416730A publication Critical patent/CN108416730A/en
Application granted granted Critical
Publication of CN108416730B publication Critical patent/CN108416730B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image processing method, comprising: acquiring pixel point data of an image; and performing parallel operation on the pixel point data to obtain an operation result. The invention also discloses an image processing apparatus. The embodiments of the invention effectively improve image processing efficiency.

Description

Image processing method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.
Background
With the development of science and technology, the volume of image data is growing rapidly. Although the computational speed of processors keeps increasing, processing data-intensive images still takes a significant amount of time. Moreover, because compilers differ in performance, the same C language algorithm compiled by different compilers yields different instruction sequences; if the compiled instructions are strongly dependent on one another, image processing efficiency is reduced.
Disclosure of Invention
In view of this, embodiments of the present invention are directed to providing an image processing method and an image processing apparatus, which improve image processing efficiency by performing parallel processing on pixel data of multiple images.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image processing method, which comprises the following steps:
acquiring pixel point data of an image;
and carrying out parallel operation on the pixel point data to obtain an operation result.
In the foregoing solution, after the obtaining of the pixel point data of the image, the method further includes:
sequentially storing the pixel data to a memory, and allocating a storage address to each pixel data;
storing the storage address of the pixel point data to a scalar register;
and reading pixel point data from the memory, and transferring the pixel point data to a vector register.
In the foregoing solution, the storing the pixel point data to a vector register includes:
dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; and the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point.
In the above solution, storing each group of pixel point data in a vector register includes:
storing a first set of pixel point data to a first vector register and a second set of data to a second vector register;
after the first group of pixel point data operation is finished, storing a third group of pixel point data to the first vector register;
after the second group of pixel point data operation is finished, storing a fourth group of pixel point data into the second vector register;
and so on until the last group of pixel point data is stored to the first vector register or the second vector register.
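The alternating use of vector registers described above can be sketched as a round-robin assignment. This is only an illustration: the function name `assign_registers` and the zero-based register indices are assumptions, not part of the patent text.

```python
def assign_registers(num_groups, num_registers=2):
    """Return, for each group index, the index of the vector register that holds it.

    Group g reuses register g % num_registers. In the scheme described above,
    that register becomes free again once the group previously stored in it
    has been processed.
    """
    return [g % num_registers for g in range(num_groups)]


# Six groups alternating between two registers (register 0 and register 1).
schedule = assign_registers(num_groups=6, num_registers=2)
print(schedule)  # [0, 1, 0, 1, 0, 1]
```

With `num_registers=3` the same function reproduces the three-register variant, in which the fourth group goes back to the first register.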
In the foregoing solution, the performing parallel operation on the pixel point data includes:
respectively carrying out first operation on each group of pixel point data according to an image processing algorithm to obtain a processing result of each group of pixel point data;
and then, performing second operation on the processing results of all pixel point data to obtain an operation result.
An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes: an acquisition module and an operation module; wherein,
the acquisition module is used for acquiring pixel point data of the image;
and the operation module is used for performing parallel operation on the pixel point data to obtain an operation result.
In the above scheme, the apparatus further comprises: a storage module, a plurality of scalar register modules, and a plurality of vector register modules; wherein,
the storage module is used for storing the pixel data and distributing a storage address for each pixel data;
the scalar register module is used for storing the storage address of the pixel point data;
and the vector register module is used for reading pixel point data from the storage module and storing the pixel point data.
In the above scheme, the apparatus further comprises: the dividing module is used for dividing the pixel point data into a plurality of groups;
the vector registering module is specifically used for storing each group of pixel point data; wherein,
the number of each group of pixel point data is the ratio of the bit width of the vector register module to the bit number of the pixel point.
In the above scheme, the first vector register module is configured to store a first group of pixel point data;
the second vector register module is used for storing a second group of pixel point data;
the first vector register module is further used for storing a third group of pixel point data after the first group of pixel point data operation is finished;
the second vector register module is further configured to store a fourth group of pixel point data after the second group of pixel point data operation is finished;
and the like until the last group of pixel point data is stored in the first vector register module or the second vector register module.
In the above scheme, the operation module is specifically configured to perform a first operation on each group of pixel point data according to an image processing algorithm, so as to obtain a processing result of each group of pixel point data;
and then, performing second operation on the processing results of all pixel point data to obtain an operation result.
An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes: a processor, configured to acquire pixel point data of an image;
and carrying out parallel operation on the pixel point data to obtain an operation result.
According to the image processing method and device provided by the embodiment of the invention, the image processing device acquires pixel data of an image; and carrying out parallel operation on the pixel point data to obtain an operation result. Therefore, the pixel data of the image is divided into groups, and each group of pixel data is subjected to parallel operation, so that the image processing efficiency is effectively improved.
Drawings
FIG. 1 is a schematic diagram of a basic processing flow of an image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a detailed processing flow of an image processing method according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of pixel point data storage for the Gaussian Filter 5 × 5 algorithm according to an embodiment of the present invention;
FIG. 4 is a coefficient diagram of the Gaussian Filter algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the operation structure of the Gaussian Filter 5 × 5 algorithm according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of pixel point data storage for the Max 3 × 3 algorithm according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a detailed processing flow of an image processing method according to a third embodiment of the present invention;
FIG. 8 is a schematic diagram of the operation structure of the Max 3 × 3 algorithm according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an image processing apparatus according to a fourth embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an image processing apparatus according to a fifth embodiment of the present invention.
Detailed Description
Method embodiment one
As shown in fig. 1, a basic processing flow of an image processing method according to an embodiment of the present invention includes the following steps:
step 101, acquiring pixel data of an image;
specifically, the resolution of the image is determined, and pixel point data of the image is determined based on the resolution of the image;
taking the resolution of the image as 1080P for example, the image then includes 1920 x 1080 pixel point data.
step 102, performing parallel operation on the pixel point data to obtain an operation result;
specifically, the acquired pixel data are sequentially stored in a memory, and a storage address is allocated to each pixel data; dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; respectively carrying out first operation on each group of pixel point data according to an image processing algorithm to obtain a processing result of each group of pixel point data; then carrying out second operation on the processing results of all pixel point data to obtain an operation result;
here, the image processing algorithm includes: 3 × 3, 4 × 4 or 5 × 5 algorithms such as Gaussian Filter, Median Filter, Sobel operator, Max, Min and the like; thus, the first operation and the second operation are related to an algorithm that processes the image;
the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point; taking the bit width of the vector register as 128 bits as an example, the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; namely, the number of each group of pixel point data is 16;
taking two vector registers as an example, a specific implementation manner when each group of pixel point data is stored in one vector register is as follows:
storing a first set of pixel point data to a first vector register and a second set of data to a second vector register; after the first group of pixel point data operation is finished, storing a third group of pixel point data to the first vector register; after the second group of pixel point data operation is finished, storing a fourth group of pixel point data into the second vector register; and so on until the last group of pixel point data is stored to the first vector register or the second vector register;
taking three vector registers as an example, a specific implementation manner when each group of pixel point data is stored in one vector register is as follows:
storing a first group of pixel point data into a first vector register, a second group of pixel point data into a second vector register, and a third group of pixel point data into a third vector register; after the first group of pixel point data operation is finished, storing a fourth group of pixel point data into the first vector register; after the second group of pixel point data operation is finished, storing a fifth group of pixel point data into the second vector register, and after the third group of pixel point data operation is finished, storing a sixth group of pixel point data into the third vector register; and so on until the last set of pixel point data is stored to the first vector register, or the second vector register, or a third vector register.
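The group size rule and the 1080P example above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the groups do not overlap and that the last group is padded if the pixel count is not a multiple of the group size; the function names are illustrative only.

```python
def group_size(register_bits, pixel_bits):
    """Pixels per group = vector register bit width / bits per pixel."""
    if register_bits % pixel_bits != 0:
        raise ValueError("register width must be a multiple of the pixel width")
    return register_bits // pixel_bits


def group_count(width, height, register_bits=128, pixel_bits=8):
    """Number of non-overlapping groups needed to cover the image (rounded up)."""
    per_group = group_size(register_bits, pixel_bits)
    total_pixels = width * height
    return -(-total_pixels // per_group)  # ceiling division


print(group_size(128, 8))       # 16
print(group_count(1920, 1080))  # 129600
```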
Method embodiment two
Taking the example of image processing by using Gaussian Filter 5 × 5 algorithm based on assembly language, a detailed processing flow of an image processing method according to a second embodiment of the present invention, as shown in fig. 2, includes the following steps:
step 201, acquiring pixel point data of an image;
specifically, a resolution of the image is determined, and pixel point data of the image is determined based on the resolution of the image.
Step 202, sequentially storing the pixel data to a memory, and allocating a storage address to each pixel data;
specifically, the storage layout of the pixel point data is shown in fig. 3, where p000, p001, p002, ..., p421, ... denote pixel point data.
Step 203, storing the pixel point data to a vector register;
specifically, the pixel point data is divided into a plurality of groups, and each group of pixel point data is stored in a vector register; the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point;
in the embodiment of the invention, the bit width of the vector register is 128 bits as an example, and the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; therefore, the pixel point data p000, p001, ..., p015 form the first group, the pixel point data p001, p002, ..., p016 form the second group, the pixel point data p002, p003, ..., p017 form the third group, and so on until all pixel point data have been divided into groups;
here, the first group of pixel point data is stored to vector register v0, the second group to vector register v1, the third group to vector register v2, the fourth group to vector register v3, and the fifth group to vector register v4; the scalar register r2 is used for storing the start address of the pixel point data p000; namely: ld v0,lm(r2++)//p000~p015; ld v1,lm(r2++)//p001~p016; ld v2,lm(r2++)//p002~p017; ld v3,lm(r2++)//p003~p018; ld v4,lm(r2++)//p004~p019.
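The five loads above can be emulated in a few lines. This sketch assumes that each `ld vK,lm(r2++)` reads 16 consecutive pixels and then advances the address by one pixel, which is inferred from the p000~p015, p001~p016, ... comments in the listing; the helper name and the sample values are hypothetical.

```python
LANES = 16  # 128-bit vector register / 8-bit pixels


def load_shifted_groups(row, start, taps=5, lanes=LANES):
    """Emulate `ld vK,lm(r2++)` issued `taps` times.

    Each load reads `lanes` consecutive pixels at the current address and
    then advances the address by one pixel (assumed post-increment step),
    so consecutive registers hold windows shifted by one column.
    """
    address = start
    registers = []
    for _ in range(taps):
        registers.append(row[address:address + lanes])
        address += 1  # post-increment, as in r2++
    return registers


row0 = list(range(100, 120))  # stand-ins for p000 .. p019
v0, v1, v2, v3, v4 = load_shifted_groups(row0, start=0)
print(v0[0], v0[-1])  # 100 115  (p000 .. p015)
print(v4[0], v4[-1])  # 104 119  (p004 .. p019)
```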
Step 204, storing the coefficient data into a scalar register;
specifically, according to the Gaussian Filter algorithm, a corresponding coefficient diagram is shown in fig. 4;
here, each coefficient data may be stored to a separate one of the scalar registers; when the number of scalar registers is limited, a plurality of scalar registers can be reserved for storing coefficient data;
in the embodiment of the present invention, for example, 2 scalar registers are reserved for storing coefficient data: the coefficient data c000 is stored in the scalar register r1, the coefficient data c001 is stored in the scalar register r7, and the scalar register r0 is used for storing the start address of the coefficient data; namely: ld r1,lmb(r0++)//c000; ld r7,lmb(r0++)//c001.
Step 205, multiplying the coefficient data c000 by the elements in the vector register v0, storing the obtained result into the vector registers v14 and v15, and storing the coefficient data c002 into the scalar register r1;
namely: vmul v14,v15,v0,r1; ld r1,lmb(r0++)//c002;
specifically, the number of vector registers may be allocated in accordance with the number of data obtained by multiplying the coefficient data c000 by the elements in the vector register v0.
Step 206, multiplying the coefficient data c001 by the elements in the vector register v1, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c003 into the scalar register r7;
namely: vmac v14,v15,v14,v15,v1,r7; ld r7,lmb(r0++)//c003.
Step 207, multiplying the coefficient data c002 by the elements in the vector register v2, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c004 into the scalar register r1;
namely: vmac v14,v15,v14,v15,v2,r1; ld r1,lmb(r0++)//c004.
Step 208, multiplying the coefficient data c003 by the elements in the vector register v3, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c100 into the scalar register r7;
namely: vmac v14,v15,v14,v15,v3,r7; ld r7,lmb(r0++)//c100.
Step 209, multiplying the coefficient data c004 by the elements in the vector register v4, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c101 into the scalar register r1;
namely: vmac v14,v15,v14,v15,v4,r1; ld r1,lmb(r0++)//c101.
Step 210, repeating steps 203 to 209 for the remaining rows, multiplying each group of pixel point data by the coefficient data c100 to c404 in sequence, and accumulating the obtained results into the vector registers v14 and v15;
namely:
ld v0,lm(r3++)//p100~p115;
ld v1,lm(r3++)//p101~p116;
ld v2,lm(r3++)//p102~p117;
vmac v14,v15,v14,v15,v0,r7;
ld r7,lmb(r0++)//c102;
vmac v14,v15,v14,v15,v1,r1;
ld r1,lmb(r0++)//c103;
ld v3,lm(r3++)//p103~p118;
ld v4,lm(r3++)//p104~p119;
vmac v14,v15,v14,v15,v2,r7;
ld r7,lmb(r0++)//c104;
vmac v14,v15,v14,v15,v3,r1;
ld r1,lmb(r0++)//c200;
ld v0,lm(r4++)//p200~p215;
ld v1,lm(r4++)//p201~p216;
ld v2,lm(r4++)//p202~p217;
vmac v14,v15,v14,v15,v4,r7;
ld r7,lmb(r0++)//c201;
vmac v14,v15,v14,v15,v0,r1;
ld r1,lmb(r0++)//c202;
vmac v14,v15,v14,v15,v1,r7;
ld r7,lmb(r0++)//c203;
ld v3,lm(r4++)//p203~p218;
ld v4,lm(r4++)//p204~p219;
vmac v14,v15,v14,v15,v2,r1;
ld r1,lmb(r0++)//c204;
vmac v14,v15,v14,v15,v3,r7;
ld r7,lmb(r0++)//c300;
ld v0,lm(r5++)//p300~p315;
ld v1,lm(r5++)//p301~p316;
ld v2,lm(r5++)//p302~p317;
vmac v14,v15,v14,v15,v4,r1;
ld r1,lmb(r0++)//c301;
vmac v14,v15,v14,v15,v0,r7;
ld r7,lmb(r0++)//c302;
vmac v14,v15,v14,v15,v1,r1;
ld r1,lmb(r0++)//c303;
ld v3,lm(r5++)//p303~p318;
ld v4,lm(r5++)//p304~p319;
vmac v14,v15,v14,v15,v2,r7;
ld r7,lmb(r0++)//c304;
vmac v14,v15,v14,v15,v3,r1;
ld r1,lmb(r0++)//c400;
ld v0,lm(r6++)//p400~p415;
ld v1,lm(r6++)//p401~p416;
ld v2,lm(r6++)//p402~p417;
vmac v14,v15,v14,v15,v4,r7;
ld r7,lmb(r0++)//c401;
vmac v14,v15,v14,v15,v0,r1;
ld r1,lmb(r0++)//c402;
vmac v14,v15,v14,v15,v1,r7;
ld r7,lmb(r0++)//c403;
ld v3,lm(r6++)//p403~p418;
ld v4,lm(r6++)//p404~p419;
vmac v14,v15,v14,v15,v2,r1;
ld r1,lmb(r0++)//c404;
vmac v14,v15,v14,v15,v3,r7;
vmac v14,v15,v14,v15,v4,r1;
The 16 values stored in the vector registers v14 and v15 are the result of the Gaussian Filter 5 × 5 algorithm.
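The effect of the listing above can be reproduced with a small pure-Python model: one vector multiply followed by 24 vector multiply-accumulates, each combining a 16-pixel group with a single scalar coefficient. This is a sketch under stated assumptions, not the DSP implementation: the binomial weights and the test image are hypothetical stand-ins (the coefficients of fig. 4 are not reproduced here), and Python integers stand in for the wide accumulator spread over v14 and v15.

```python
LANES = 16
K = 5


def vector_gaussian_5x5(image, coeffs, row=0, col=0):
    """Compute 16 horizontally adjacent 5x5 filter outputs in 25 vector steps.

    `image` is a list of pixel rows and `coeffs` a 5x5 list of weights.
    Each step multiplies one 16-pixel group by one scalar coefficient and
    accumulates into `acc`, mirroring one vmul followed by 24 vmac.
    """
    acc = [0] * LANES
    vector_ops = 0
    for i in range(K):
        for j in range(K):
            group = image[row + i][col + j:col + j + LANES]
            c = coeffs[i][j]
            acc = [a + c * p for a, p in zip(acc, group)]
            vector_ops += 1
    return acc, vector_ops


def scalar_gaussian_5x5(image, coeffs, row, col):
    """Reference: one output pixel, computed term by term as in a scalar loop."""
    return sum(coeffs[i][j] * image[row + i][col + j]
               for i in range(K) for j in range(K))


# Hypothetical test data: a 5 x 20 patch and unnormalized binomial weights.
image = [[(7 * r + 3 * c) % 256 for c in range(20)] for r in range(5)]
w = [1, 4, 6, 4, 1]
coeffs = [[a * b for b in w] for a in w]

result, ops = vector_gaussian_5x5(image, coeffs)
assert result == [scalar_gaussian_5x5(image, coeffs, 0, k) for k in range(LANES)]
print(ops)  # 25
```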
In the prior art, when image processing is performed based on the Gaussian Filter 5 × 5 algorithm, the 5 × 5 pixel point data in the solid frame shown in fig. 3 are multiplied element by element with the 5 × 5 coefficient data shown in fig. 4 and then accumulated to obtain the first result Pr000; specifically, the following formula is used:
Pr000=c000*p000+c001*p001+c002*p002+c003*p003+c004*p004+c100*p100+c101*p101+c102*p102+c103*p103+c104*p104+c200*p200+c201*p201+c202*p202+c203*p203+c204*p204+…+c404*p404;
the 5 × 5 pixel point data obtained by shifting the solid line frame shown in fig. 3 to the right by one column are multiplied element by element with the 5 × 5 coefficient data shown in fig. 4 and accumulated to obtain the second result Pr001, and so on to obtain Pr002, ..., Pr015;
specifically, the following formulas are used:
Pr001=c000*p001+c001*p002+c002*p003+c003*p004+c004*p005+c100*p101+c101*p102+c102*p103+c103*p104+c104*p105+c200*p201+…+c404*p405;
Pr002=c000*p002+c001*p003+c002*p004+c003*p005+c004*p006+c100*p102+c101*p103+c102*p104+c103*p105+c104*p106+c200*p202+…+c404*p406;
Pr015=c000*p015+c001*p016+c002*p017+c003*p018+c004*p019+c100*p115+c101*p116+c102*p117+c103*p118+c104*p119+c200*p215+…+c404*p419.
therefore, an operation structure diagram of the Gaussian Filter 5 × 5 algorithm in the embodiment of the present invention is shown in fig. 5.
In summary, in the prior art, computing the result for one pixel point requires 1 scalar multiplication and 24 scalar multiply-accumulate operations; obtaining the 16 results above therefore requires 16 multiplications and 384 multiply-accumulate operations, 400 scalar operations in total. The embodiment of the invention obtains the same 16 results with 25 vector operations (1 vmul and 24 vmac), so the prior art requires 16 times as many operations, and the efficiency of processing the image with the method of the embodiment of the invention is improved by 6.3 times.
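The operation counts can be tallied directly. The sketch below counts operations only and says nothing about measured speed; it assumes 16 lanes per vector register and one operation per kernel tap, as in the listing above, and the function name is illustrative.

```python
def operation_counts(kernel_size, lanes=16):
    """Return (scalar_ops, vector_ops, ratio) for one group of `lanes` outputs.

    Scalar code needs kernel_size**2 operations per output pixel; the vector
    scheme needs kernel_size**2 operations per group of `lanes` pixels.
    """
    taps = kernel_size * kernel_size
    scalar_ops = lanes * taps
    vector_ops = taps
    return scalar_ops, vector_ops, scalar_ops // vector_ops


# Gaussian Filter 5 x 5: 16 + 384 scalar operations against 1 vmul + 24 vmac.
print(operation_counts(5))  # (400, 25, 16)
```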
Therefore, the image processing efficiency can be greatly improved by applying the embodiment of the invention. In addition, the embodiment of the invention is realized by adopting the assembly language, and the influence of a compiler on the image processing efficiency is avoided for different processing systems.
Method embodiment three
Taking image processing with the Max 3 × 3 algorithm based on assembly language as an example, a schematic diagram of pixel point data storage is shown in fig. 6; as shown in fig. 7, a detailed processing flow of an image processing method according to a third embodiment of the present invention includes the following steps:
step 401, acquiring pixel point data of an image;
specifically, a resolution of the image is determined, and pixel point data of the image is determined based on the resolution of the image.
Step 402, storing pixel point data p000 to p015, p001 to p016 and p002 to p017 into vector registers v0, v1 and v2 respectively;
namely: ld v0,lm(r2++)//p000~p015; ld v1,lm(r2++)//p001~p016; ld v2,lm(r2++)//p002~p017;
here, the scalar register r2 is used to store the start address of the pixel point data p000.
Step 403, storing pixel point data p100 to p115, p101 to p116, and p102 to p117 into vector registers v3, v4, and v5, respectively;
namely: ld v3,lm(r3++)//p100~p115; ld v4,lm(r3++)//p101~p116;
ld v5,lm(r3++)//p102~p117;
the scalar register r3 is used to store the start address of the pixel point data p100.
Step 404, performing longitudinal comparison operation on elements in the vector registers v0, v1 and v2, and storing a vector result formed by the maximum values into the vector register v7;
specifically, p000, p001 and p002 are compared to obtain the maximum value of the three; then p001, p002 and p003 are compared to obtain the maximum value of the three, and so on;
namely: vmax v0,v0v1#vc0; vmax v7,v0v2#vc0.
Step 405, storing pixel point data p200 to p215, p201 to p216, p202 to p217 into vector registers v0, v1, v2 respectively;
namely: ld v0,lm(r4++)//p200~p215;
ld v1,lm(r4++)//p201~p216;
ld v2,lm(r4++)//p202~p217;
the scalar register r4 is used to store the start address of the pixel point data p200.
Step 406, performing longitudinal comparison operation on elements in the vector registers v3, v4 and v5, and storing a vector result formed by the obtained maximum values into the vector register v6;
specifically, p100, p101 and p102 are compared to obtain the maximum value of the three; then p101, p102 and p103 are compared to obtain the maximum value of the three, and so on.
Step 407, performing longitudinal comparison operation on elements in the vector registers v7 and v6, and storing a vector result formed by the obtained maximum values into the vector register v4;
namely: vmax v4,v7v6#vc0.
Step 408, performing longitudinal comparison operation on the elements in the vector registers v0, v1 and v2, and storing the vector result formed by the obtained maximum values into the vector register v5.
Step 409, performing longitudinal comparison operation on elements in the vector registers v4 and v5, and storing a vector result formed by the obtained maximum values into the vector register v7;
namely: vmax v7,v4v5#vc0.
Through the operation of the embodiment of the invention, 16 results stored in the vector register v7 are the results of the Max 3 × 3 algorithm; that is, the calculation result of the data of 16 pixel points can be obtained by 8 times of vector operation of taking the maximum value.
Therefore, the operation structure of the Max 3 × 3 algorithm is shown in fig. 8.
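Steps 402 to 409 can be modeled with lane-wise maxima over 16-element lists. This is an illustrative sketch only: `vmax` here is a plain Python helper standing in for the vector instruction, the test image is hypothetical, and register reuse (v0 to v7) is not modeled. The model takes a horizontal maximum per row (two operations each) and then combines the three rows (two operations), which gives the 8 vector operations stated above.

```python
LANES = 16


def vmax(a, b):
    """Lane-wise maximum of two equal-length vectors (one vector operation)."""
    return [max(x, y) for x, y in zip(a, b)]


def vector_max_3x3(image, row=0, col=0):
    """Return 16 adjacent 3x3 maxima and the number of vector max operations."""
    ops = 0
    row_maxima = []
    for i in range(3):
        # Three groups taken from the same row, each shifted by one column.
        g0, g1, g2 = (image[row + i][col + j:col + j + LANES] for j in range(3))
        m = vmax(vmax(g0, g1), g2)  # horizontal maximum: 2 operations per row
        ops += 2
        row_maxima.append(m)
    result = vmax(vmax(row_maxima[0], row_maxima[1]), row_maxima[2])
    ops += 2  # combining the three rows: 2 operations
    return result, ops


# Hypothetical 3 x 18 test patch.
image = [[(37 * r + 11 * c * c) % 251 for c in range(18)] for r in range(3)]
result, ops = vector_max_3x3(image)
expected = [max(image[i][k + j] for i in range(3) for j in range(3))
            for k in range(LANES)]
assert result == expected
print(ops)  # 8
```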
In the prior art, when image processing is performed based on Max 3 × 3 algorithm, to obtain the 1 st group of calculation results, it is necessary to take the maximum value of 3 × 3 pixel points in the solid frame shown in fig. 6 by the following formula:
Pr000=max(max(max(p000,p001),max(p002,p100)),max(max(max(p101,p102),max(p200,p201)),p202))
wherein max is the operation for realizing the maximum value;
when calculating the remaining results Pr001 to Pr015, the maximum value of the 3 × 3 pixel point data obtained by shifting the solid line frame shown in fig. 6 to the right one column at a time is obtained by the following formulas:
Pr001=max(max(max(p001,p002),max(p003,p101)),max(max(max(p102,p103),max(p201,p202)),p203));
Pr002=max(max(max(p002,p003),max(p004,p102)),max(max(max(p103,p104),max(p202,p203)),p204));
Pr015=max(max(max(p015,p016),max(p017,p115)),max(max(max(p116,p117),max(p215,p216)),p217)).
In summary, in the prior art, for each of the 16 results the maximum of the first two pixel point data in the window is taken (p000 and p001 for the first result, p001 and p002 for the second, ..., p015 and p016 for the sixteenth), then the maximum of the next two (p002 and p100, p003 and p101, ..., p017 and p115), and so on until the maximum value of the whole window is obtained. Therefore, in the prior art, 8 scalar maximum operations are needed to calculate the result for the first pixel point, and 128 scalar maximum operations are needed to obtain the results for 16 pixel points. For the Max 3 × 3 algorithm, the number of operations required by the prior art is 16 times the number required by the embodiment of the invention, and the efficiency of processing the image with the method of the embodiment of the invention is improved by 4.8 times.
Therefore, the image processing efficiency can be greatly improved by applying the embodiment of the invention. In addition, the embodiment of the invention is realized by adopting the assembly language, and the influence of a compiler on the image processing efficiency is avoided for different processing systems.
It should be noted that, in the embodiment of the present invention, a maximum value operation may be performed by using three sets of pixel point data, or a maximum value operation may be performed by using two sets of pixel point data.
Embodiment four
In order to implement the image processing method, a fourth embodiment of the present invention further provides an image processing apparatus, where a composition structure of the apparatus is as shown in fig. 9, and the apparatus includes: an acquisition module 11 and an operation module 12; wherein,
the obtaining module 11 is configured to obtain pixel data of an image;
the operation module 12 is configured to perform parallel operation on the pixel point data to obtain an operation result.
In a specific embodiment, the apparatus further comprises: a storage module 13, a plurality of vector register modules 14, and a plurality of scalar register modules 15; wherein,
the storage module 13 is configured to store the pixel data, and allocate a storage address to each pixel data;
the vector register module 14 is configured to read pixel point data from the storage module, and store the pixel point data.
In a specific embodiment, the apparatus further comprises: a dividing module 16, configured to divide the pixel point data into a plurality of groups;
the vector registering module 14 is specifically configured to store each group of pixel point data; wherein,
the number of each group of pixel point data is the ratio of the bit width of the vector register module to the bit number of the pixel points; taking the bit width of the vector register as 128 bits as an example, the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; namely, the number of each group of pixel point data is 16;
In a specific embodiment, taking two vector register modules as an example, the first vector register module is used for storing a first group of pixel point data; the second vector register module is used for storing a second group of pixel point data; the first vector register module is further used for storing a third group of pixel point data after the first group of pixel point data operation is finished; the second vector register module is further configured to store a fourth group of pixel point data after the second group of pixel point data operation is finished; and so on until the last group of pixel point data is stored in the first vector register module or the second vector register module.
In a specific embodiment, taking three vector register modules as an example, a specific implementation manner when each group of pixel point data is stored in one vector register module is as follows: storing a first group of pixel point data to a first vector register module, storing a second group of pixel point data to a second vector register module, and storing a third group of pixel point data to a third vector register module; after the first group of pixel point data operation is finished, storing a fourth group of pixel point data to the first vector register module; after the second group of pixel point data operation is finished, storing a fifth group of pixel point data to the second vector register module; after the third group of pixel point data operation is finished, storing a sixth group of pixel point data to the third vector register module; and so on until the last group of pixel point data is stored to the first vector register module, the second vector register module, or the third vector register module.
In a specific embodiment, the operation module 12 is specifically configured to perform a first operation on each group of pixel point data according to an image processing algorithm, so as to obtain a processing result of each group of pixel point data;
and then to perform a second operation on the processing results of all groups of pixel point data to obtain an operation result.
In an embodiment of the present invention, the image processing algorithm includes 3 × 3, 4 × 4 or 5 × 5 algorithms such as the Gaussian filter, the median filter, the Sobel operator, Max and Min; accordingly, the specific first operation and second operation depend on the image processing algorithm used.
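As a minimal sketch of the two-stage scheme, take Max as the algorithm. The text does not define what the first and second operations are for any particular algorithm, so the split used here (a per-group maximum followed by a maximum over the per-group results) is an assumption, and the names `first_operation`, `second_operation` and `image_max` are hypothetical.

```python
def first_operation(group):
    """Per-group step: reduce one group of pixel values to its maximum."""
    return max(group)


def second_operation(partial_results):
    """Combining step: reduce the per-group results to one overall result."""
    return max(partial_results)


def image_max(pixels, group_size=16):
    """Two-stage maximum over a flat sequence of pixel values."""
    if not pixels:
        raise ValueError("no pixel data")
    groups = [pixels[i:i + group_size] for i in range(0, len(pixels), group_size)]
    partial_results = [first_operation(g) for g in groups]
    return second_operation(partial_results)


overall = image_max([3, 200, 7, 255, 0, 9], group_size=4)
```

On vector hardware the first operation would run across the lanes of one register; the list comprehension here only models the data flow.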
EXAMPLE five
In order to implement the image processing method, a fifth embodiment of the present invention further provides an image processing apparatus, where a composition structure of the apparatus is as shown in fig. 10, and the apparatus includes: a processor 21, configured to obtain pixel point data of an image, and to perform a parallel operation on the pixel point data to obtain an operation result.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that, in practical applications, the functions executed by the obtaining module 11, the operation module 12, and the dividing module 16 may be implemented by a Central Processing Unit (CPU), a Micro Processor Unit (MPU), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA) located on a terminal or a server; the functions performed by the vector register module 14 may be implemented by a vector register located on a terminal or a server; the functions performed by the scalar register module 15 may be implemented by a scalar register located on a terminal or a server.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (11)

1. An image processing method, characterized in that the method comprises:
acquiring pixel point data of an image;
and carrying out parallel operation on the pixel point data to obtain an operation result.
2. The method of claim 1, wherein after obtaining pixel point data for an image, the method further comprises:
sequentially storing the pixel point data to a memory, and allocating a storage address to each pixel point data;
storing the storage address of the pixel point data to a scalar register;
and reading pixel point data from the memory, and transferring the pixel point data to a vector register.
3. The method of claim 2, wherein storing the pixel point data to a vector register comprises:
dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; wherein,
the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point.
4. The method of claim 3, wherein each set of pixel point data is stored in a vector register, comprising:
storing a first set of pixel point data to a first vector register and a second set of data to a second vector register;
after the first group of pixel point data operation is finished, storing a third group of pixel point data to the first vector register;
after the second group of pixel point data operation is finished, storing a fourth group of pixel point data into the second vector register;
and so on until the last group of pixel point data is stored to the first vector register or the second vector register.
5. The method of claim 3 or 4, wherein said performing a parallel operation on said pixel point data comprises:
respectively carrying out first operation on each group of pixel point data according to an image processing algorithm to obtain a processing result of each group of pixel point data;
and then, performing second operation on the processing results of all pixel point data to obtain an operation result.
6. An image processing apparatus, characterized in that the apparatus comprises: the device comprises an acquisition module and an operation module; wherein,
the acquisition module is used for acquiring pixel point data of the image;
and the operation module is used for performing parallel operation on the pixel point data to obtain an operation result.
7. The apparatus of claim 6, further comprising: a storage module, a plurality of scalar register modules and a plurality of vector register modules; wherein,
the storage module is used for storing the pixel data and distributing a storage address for each pixel data;
the scalar register module is used for storing the storage address of the pixel point data;
and the vector register module is used for reading pixel point data from the storage module and storing the pixel point data.
8. The apparatus of claim 7, further comprising: the dividing module is used for dividing the pixel point data into a plurality of groups;
the vector register module is specifically used for storing each group of pixel point data; wherein,
the number of each group of pixel point data is the ratio of the bit width of the vector register module to the bit number of the pixel point.
9. The apparatus of claim 8, wherein the first vector register module is configured to store a first set of pixel point data;
the second vector register module is used for storing a second group of pixel point data;
the first vector register module is further used for storing a third group of pixel point data after the first group of pixel point data operation is finished;
the second vector register module is further used for storing a fourth group of pixel point data after the operation on the second group of pixel point data is finished;
and so on, until the last group of pixel point data is stored in the first vector register module or the second vector register module.
10. The apparatus according to claim 8 or 9, wherein the operation module is specifically configured to perform a first operation on each set of pixel point data according to an image processing algorithm, to obtain a processing result of each set of pixel point data;
and then, performing second operation on the processing results of all pixel point data to obtain an operation result.
11. An image processing apparatus, characterized in that the apparatus comprises: the processor is used for acquiring pixel point data of the image;
and carrying out parallel operation on the pixel point data to obtain an operation result.
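The storage flow recited in claim 2 (sequential storage in memory, an address kept in a scalar register, and transfer into a vector register) can be sketched as follows. This is an illustrative model only: the names `store_pixels` and `vector_load` are hypothetical, memory is modeled as a `bytearray`, and it is assumed that the scalar register holds the address of the first pixel value, with pixel i at that address plus i.

```python
LANES = 16  # assumed: 128-bit vector register holding 8-bit pixel values


def store_pixels(memory: bytearray, base: int, pixels: bytes) -> int:
    """Write pixel values sequentially from `base`; return the base address."""
    if base < 0 or base + len(pixels) > len(memory):
        raise ValueError("pixel data does not fit in memory")
    memory[base:base + len(pixels)] = pixels
    return base


def vector_load(memory: bytearray, address: int, lanes: int = LANES) -> list:
    """Copy `lanes` consecutive pixel values from memory into a register model."""
    return list(memory[address:address + lanes])


memory = bytearray(64)
scalar_register = store_pixels(memory, 8, bytes(range(32)))  # holds the address
v0 = vector_load(memory, scalar_register)            # first group
v1 = vector_load(memory, scalar_register + LANES)    # second group
```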
CN201710071029.2A 2017-02-09 2017-02-09 Image processing method and device Active CN108416730B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710071029.2A CN108416730B (en) 2017-02-09 2017-02-09 Image processing method and device
PCT/CN2017/095172 WO2018145424A1 (en) 2017-02-09 2017-07-31 Image processing method, apparatus and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710071029.2A CN108416730B (en) 2017-02-09 2017-02-09 Image processing method and device

Publications (2)

Publication Number Publication Date
CN108416730A (en) 2018-08-17
CN108416730B (en) 2020-11-10

Family

ID=63107129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710071029.2A Active CN108416730B (en) 2017-02-09 2017-02-09 Image processing method and device

Country Status (2)

Country Link
CN (1) CN108416730B (en)
WO (1) WO2018145424A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102171512B1 (en) 2020-03-16 2020-10-29 (주)대성기연 exhaust filter system for radioactive material in the nuclear power plant
WO2021056143A1 (en) * 2019-09-23 2021-04-01 深圳市大疆创新科技有限公司 Image processing method and apparatus, and mobile device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040015682A1 (en) * 2002-04-15 2004-01-22 Alphamosaic Limited Application registers
CN1937701A (en) * 2006-08-02 2007-03-28 北京北大方正电子有限公司 Image processing device and its processing method
CN103780914A (en) * 2012-02-27 2014-05-07 开曼群岛威睿电通股份有限公司 Loop filter accelerating circuit and loop filter method
CN103970506A (en) * 2008-03-28 2014-08-06 英特尔公司 Vector instruction to enable efficient synchronization and parallel reduction operations

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101163240A (en) * 2006-10-13 2008-04-16 国际商业机器公司 Filter arrangement and method thereof
EP3031033B1 (en) * 2013-08-06 2018-09-05 Flir Systems, Inc. Vector processing architectures for infrared camera electronics
CN105654491A (en) * 2015-12-31 2016-06-08 南京华捷艾米软件科技有限公司 Method for extracting deep continuous object images in parallel from background image

Also Published As

Publication number Publication date
WO2018145424A1 (en) 2018-08-16
CN108416730B (en) 2020-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant