CN108416730A

CN108416730A - Image processing method and device

Info

Publication number: CN108416730A
Application number: CN201710071029.2A
Authority: CN
Inventors: 安爱女
Original assignee: Shenzhen ZTE Microelectronics Technology Co Ltd
Current assignee: Shenzhen ZTE Microelectronics Technology Co Ltd
Priority date: 2017-02-09
Filing date: 2017-02-09
Publication date: 2018-08-17
Anticipated expiration: 2037-02-09
Also published as: WO2018145424A1; CN108416730B

Abstract

The invention discloses an image processing method, comprising: acquiring pixel point data of an image; and performing a parallel operation on the pixel point data to obtain an operation result. The invention also discloses an image processing apparatus. The embodiments of the present invention effectively improve image processing efficiency.

Description

Image processing method and device

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.

Background

With the development of science and technology, the volume of image data is growing rapidly; although the computational speed of processors keeps increasing, processing data-intensive images still takes a significant amount of time. Meanwhile, because compilers differ in performance, the same C language algorithm compiled by different compilers yields different instruction sequences; if the compiled instructions are strongly dependent on one another, the image processing efficiency is reduced.

Disclosure of Invention

In view of this, embodiments of the present invention are directed to providing an image processing method and an image processing apparatus, which improve image processing efficiency by performing parallel processing on pixel data of multiple images.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides an image processing method, which comprises the following steps:

acquiring pixel point data of an image;

and carrying out parallel operation on the pixel point data to obtain an operation result.

In the foregoing solution, after the obtaining of the pixel point data of the image, the method further includes:

sequentially storing the pixel data to a memory, and allocating a storage address to each pixel data;

storing the storage address of the pixel point data to a scalar register;

and reading pixel point data from the memory, and transferring the pixel point data to a vector register.

In the foregoing solution, the storing the pixel point data to a vector register includes:

dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; and the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point.

In the above solution, storing each group of pixel point data in a vector register includes:

storing a first set of pixel point data to a first vector register and a second set of data to a second vector register;

after the first group of pixel point data operation is finished, storing a third group of pixel point data to the first vector register;

after the second group of pixel point data operation is finished, storing a fourth group of pixel point data into the second vector register;

and so on until the last group of pixel point data is stored to the first vector register or the second vector register.

In the foregoing solution, the performing parallel operation on the pixel point data includes:

respectively carrying out first operation on each group of pixel point data according to an image processing algorithm to obtain a processing result of each group of pixel point data;

and then, performing second operation on the processing results of all pixel point data to obtain an operation result.

An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes: an acquisition module and an operation module; wherein,

the acquisition module is used for acquiring pixel point data of the image;

and the operation module is used for performing parallel operation on the pixel point data to obtain an operation result.

In the above scheme, the apparatus further comprises: a storage module, a plurality of scalar register modules and a plurality of vector register modules; wherein,

the storage module is used for storing the pixel data and distributing a storage address for each pixel data;

the scalar register module is used for storing the storage address of the pixel point data;

and the vector register module is used for reading pixel point data from the storage module and storing the pixel point data.

In the above scheme, the apparatus further comprises: the dividing module is used for dividing the pixel point data into a plurality of groups;

the vector registering module is specifically used for storing each group of pixel point data; wherein,

the number of each group of pixel point data is the ratio of the bit width of the vector register module to the bit number of the pixel point.

In the above scheme, the first vector register module is configured to store a first group of pixel point data;

the second vector register module is used for storing a second group of pixel point data;

the first vector register module is further used for storing a third group of pixel point data after the first group of pixel point data operation is finished;

the second vector register module is further configured to store a fourth group of pixel point data after the second group of pixel point data operation is finished;

and the like until the last group of pixel point data is stored in the first vector register module or the second vector register module.

In the above scheme, the operation module is specifically configured to perform a first operation on each group of pixel point data according to an image processing algorithm, so as to obtain a processing result of each group of pixel point data;

An embodiment of the present invention further provides an image processing apparatus, where the apparatus includes: a processor, configured to acquire pixel point data of the image;

According to the image processing method and device provided by the embodiment of the invention, the image processing device acquires pixel data of an image; and carrying out parallel operation on the pixel point data to obtain an operation result. Therefore, the pixel data of the image is divided into groups, and each group of pixel data is subjected to parallel operation, so that the image processing efficiency is effectively improved.

Drawings

FIG. 1 is a schematic diagram of a basic processing flow of an image processing method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a detailed processing flow of an image processing method according to a second embodiment of the present invention;

FIG. 3 is a schematic diagram of pixel point data storage for the Gaussian Filter 5 × 5 algorithm according to an embodiment of the present invention;

FIG. 4 is a coefficient diagram of a Gaussian Filter algorithm according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an operation structure of a Gaussian Filter 5 × 5 algorithm according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of pixel point data storage for the Max 3 × 3 algorithm according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a detailed processing flow of an image processing method according to a third embodiment of the present invention;

FIG. 8 is a schematic diagram of the operation structure of the Max 3 × 3 algorithm according to the embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an image processing apparatus according to a fourth embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an image processing apparatus according to a fifth embodiment of the present invention.

Detailed Description

Method embodiment one

As shown in fig. 1, a basic processing flow of an image processing method according to an embodiment of the present invention includes the following steps:

Step 101, acquiring pixel point data of an image;

specifically, the resolution of the image is determined, and pixel point data of the image is determined based on the resolution of the image;

taking an image resolution of 1080P as an example, the image includes 1920 × 1080 pixel point data.

Step 102, performing parallel operation on the pixel point data to obtain an operation result;

specifically, the acquired pixel data are sequentially stored in a memory, and a storage address is allocated to each pixel data; dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; respectively carrying out first operation on each group of pixel point data according to an image processing algorithm to obtain a processing result of each group of pixel point data; then carrying out second operation on the processing results of all pixel point data to obtain an operation result;

here, the image processing algorithm includes: 3 × 3, 4 × 4 or 5 × 5 algorithms such as Gaussian Filter, Median Filter, Sobel operator, Max, Min and the like; thus, the first operation and the second operation are related to an algorithm that processes the image;

the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point; taking the bit width of the vector register as 128 bits as an example, the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; namely, the number of each group of pixel point data is 16;

taking two vector registers as an example, a specific implementation manner when each group of pixel point data is stored in one vector register is as follows:

storing a first set of pixel point data to a first vector register and a second set of data to a second vector register; after the first group of pixel point data operation is finished, storing a third group of pixel point data to the first vector register; after the second group of pixel point data operation is finished, storing a fourth group of pixel point data into the second vector register; and so on until the last group of pixel point data is stored to the first vector register or the second vector register;

taking three vector registers as an example, a specific implementation manner when each group of pixel point data is stored in one vector register is as follows:

storing a first group of pixel point data into a first vector register, a second group of pixel point data into a second vector register, and a third group of pixel point data into a third vector register; after the first group of pixel point data operation is finished, storing a fourth group of pixel point data into the first vector register; after the second group of pixel point data operation is finished, storing a fifth group of pixel point data into the second vector register, and after the third group of pixel point data operation is finished, storing a sixth group of pixel point data into the third vector register; and so on until the last set of pixel point data is stored to the first vector register, or the second vector register, or a third vector register.
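The grouping rule and the register rotation described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names `group_size` and `assign_registers` are hypothetical, and the rotation is modelled as a strict round-robin, which assumes that the registers become free in the order in which they were filled.

```python
def group_size(register_bits, pixel_bits):
    """Number of pixels per group: vector register width divided by pixel width."""
    if register_bits % pixel_bits != 0:
        raise ValueError("register width must be a multiple of the pixel width")
    return register_bits // pixel_bits


def assign_registers(num_groups, num_registers):
    """Return, for each group (0-based), the index of the vector register it uses.

    Assumption (not stated in this form in the source): group g is loaded into
    register g % num_registers once the earlier group held there is finished.
    """
    return [g % num_registers for g in range(num_groups)]


lanes = group_size(128, 8)            # 128-bit register, 8-bit pixels
two_regs = assign_registers(6, 2)     # groups 1..6 on two registers
three_regs = assign_registers(6, 3)   # groups 1..6 on three registers
```

With two registers, the third group reuses the first register and the fourth group reuses the second; with three registers, the fourth, fifth and sixth groups reuse the first, second and third registers respectively.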

Method embodiment two

Taking the example of image processing by using Gaussian Filter 5 × 5 algorithm based on assembly language, a detailed processing flow of an image processing method according to a second embodiment of the present invention, as shown in fig. 2, includes the following steps:

Step 201, acquiring pixel point data of an image;

specifically, a resolution of the image is determined, and pixel point data of the image is determined based on the resolution of the image.

Step 202, sequentially storing the pixel data to a memory, and allocating a storage address to each pixel data;

specifically, a schematic diagram of the pixel point data storage is shown in fig. 3, where p000, p001, p002 ... p421 ... denote pixel point data.

Step 203, storing the pixel point data to a vector register;

specifically, the pixel point data is divided into a plurality of groups, and each group of pixel point data is stored in a vector register; the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point;

in the embodiment of the invention, the bit width of the vector register is 128 bits as an example, and the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; therefore, the pixel point data p000, p001 ... p015 is divided into a first group of pixel point data, the pixel point data p001, p002 ... p016 is divided into a second group of pixel point data, and the pixel point data p002, p003 ... p017 is divided into a third group of pixel point data, and so on until the pixel point data are all divided into pixel point data groups;

here, the first group of pixel point data is stored to vector register v0, the second group of pixel point data is stored to vector register v1, the third group of pixel point data is stored to vector register v2, the fourth group of pixel point data is stored to vector register v3, and the fifth group of pixel point data is stored to vector register v4; the scalar register r2 is used for storing the start address of the pixel point data p000; namely:

ld v0,lm(r2++)//p000～p015；

ld v1,lm(r2++)//p001～p016；

ld v2,lm(r2++)//p002～p017；

ld v3,lm(r2++)//p003～p018；

ld v4,lm(r2++)//p004～p019。
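The five overlapping loads above can be modelled in Python as follows. This is a minimal sketch, assuming that each `ld` reads 16 consecutive pixels and that the post-increment `r2++` advances the address by one pixel, as the comments p000～p015, p001～p016 and so on suggest; the function name `load_shifted_groups` and the sample data are hypothetical.

```python
LANES = 16  # 128-bit vector register / 8-bit pixels


def load_shifted_groups(row, start, taps, lanes=LANES):
    """Emulate `taps` consecutive vector loads with a post-incremented address.

    Each load reads `lanes` pixels starting at the current address; the
    address then advances by one pixel (assumed meaning of `r2++`).
    """
    groups = []
    addr = start
    for _ in range(taps):
        groups.append(row[addr:addr + lanes])
        addr += 1
    return groups


row0 = list(range(32))                    # stand-in values for p000, p001, ...
v = load_shifted_groups(row0, 0, 5)       # contents of v0 .. v4
```

Here `v[0]` holds the stand-ins for p000 to p015 and `v[4]` holds those for p004 to p019, so lane j of register vk contains the pixel at column j + k.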

Step 204, storing the coefficient data into a scalar register;

specifically, according to the Gaussian Filter algorithm, a corresponding coefficient diagram is shown in fig. 4;

here, each coefficient data may be stored to a separate one of the scalar registers; when the number of scalar registers is limited, a plurality of scalar registers can be reserved for storing coefficient data;

in the embodiment of the present invention, for example, 2 scalar registers are reserved for storing coefficient data: the coefficient data c000 is stored in the scalar register r1, the coefficient data c001 is stored in the scalar register r7, and the scalar register r0 is used for storing the start address of the coefficient data; namely: ld r1,lmb(r0++)//c000；ld r7,lmb(r0++)//c001。

Step 205, multiplying the coefficient data c000 by the elements in the vector register v0, storing the obtained result into the vector registers v14 and v15, and storing the coefficient data c002 into the scalar register r1;

namely: vmul v14,v15,v0,r1；ld r1,lmb(r0++)//c002；

specifically, the number of vector registers may be allocated in accordance with the number of data obtained by multiplying the coefficient data c000 by the elements in the vector register v0.

Step 206, multiplying the coefficient data c001 by the elements in the vector register v1, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c003 into the scalar register r7;

namely: vmac v14,v15,v14,v15,v1,r7；ld r7,lmb(r0++)//c003。

Step 207, multiplying the coefficient data c002 by the elements in the vector register v2, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c004 into the scalar register r1;

namely: vmac v14,v15,v14,v15,v2,r1；ld r1,lmb(r0++)//c004。

Step 208, multiplying the coefficient data c003 by the elements in the vector register v3, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c100 into the scalar register r7;

namely: vmac v14,v15,v14,v15,v3,r7；ld r7,lmb(r0++)//c100。

Step 209, multiplying the coefficient data c004 by the elements in the vector register v4, accumulating the obtained result with the results in the vector registers v14 and v15 and storing it there, and storing the coefficient data c101 into the scalar register r1;

namely: vmac v14,v15,v14,v15,v4,r1；ld r1,lmb(r0++)//c101。

Step 210, repeating steps 203 to 209, multiplying each group of pixel point data by coefficient data c100 to c104 in sequence, and accumulating and storing the obtained result and the result in the vector registers v14 and v15;

namely: ld v0,lm(r3++)//p100～p115；

ld v1,lm(r3++)//p101～p116；

ld v2,lm(r3++)//p102～p117；

vmac v14,v15,v14,v15,v0,r7；

ld r7,lmb(r0++)//c102；

vmac v14,v15,v14,v15,v1,r1；

ld r1,lmb(r0++)//c103；

ld v3,lm(r3++)//p103～p118；

ld v4,lm(r3++)//p104～p119；

vmac v14,v15,v14,v15,v2,r7；

ld r7,lmb(r0++)//c104；

vmac v14,v15,v14,v15,v3,r1；

ld r1,lmb(r0++)//c200；

ld v0,lm(r4++)//p200～p215；

ld v1,lm(r4++)//p201～p216；

ld v2,lm(r4++)//p202～p217；

vmac v14,v15,v14,v15,v4,r7；

ld r7,lmb(r0++)//c201；

vmac v14,v15,v14,v15,v0,r1；

ld r1,lmb(r0++)//c202；

vmac v14,v15,v14,v15,v1,r7；

ld r7,lmb(r0++)//c203；

ld v3,lm(r4++)//p203～p218；

ld v4,lm(r4++)//p204～p219；

vmac v14,v15,v14,v15,v2,r1；

ld r1,lmb(r0++)//c204；

vmac v14,v15,v14,v15,v3,r7；

ld r7,lmb(r0++)//c300；

ld v0,lm(r5++)//p300～p315；

ld v1,lm(r5++)//p301～p316；

ld v2,lm(r5++)//p302～p317；

vmac v14,v15,v14,v15,v4,r1；

ld r1,lmb(r0++)//c301；

vmac v14,v15,v14,v15,v0,r7；

ld r7,lmb(r0++)//c302；

vmac v14,v15,v14,v15,v1,r1；

ld r1,lmb(r0++)//c303；

ld v3,lm(r5++)//p303～p318；

ld v4,lm(r5++)//p304～p319；

vmac v14,v15,v14,v15,v2,r7；

ld r7,lmb(r0++)//c304；

vmac v14,v15,v14,v15,v3,r1；

ld r1,lmb(r0++)//c400；

ld v0,lm(r6++)//p400～p415；

ld v1,lm(r6++)//p401～p416；

ld v2,lm(r6++)//p402～p417；

vmac v14,v15,v14,v15,v4,r7；

ld r7,lmb(r0++)//c401；

vmac v14,v15,v14,v15,v0,r1；

ld r1,lmb(r0++)//c402；

vmac v14,v15,v14,v15,v1,r7；

ld r7,lmb(r0++)//c403；

ld v3,lm(r6++)//p403～p418；

ld v4,lm(r6++)//p404～p419；

vmac v14,v15,v14,v15,v2,r1；

ld r1,lmb(r0++)//c404；

vmac v14,v15,v14,v15,v3,r7；

vmac v14,v15,v14,v15,v4,r1。

The 16 values stored in the vector registers v14 and v15 are the results of the Gaussian Filter 5 × 5 algorithm.
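The vmul/vmac sequence of steps 203 to 210 can be emulated in Python to check that 25 vector operations produce all 16 outputs. This is a sketch under stated assumptions: the accumulator pair v14/v15 is modelled as a single 16-element list, the coefficient table is a hypothetical binomial kernel (the actual values of fig. 4 are not reproduced in the text), and the pixel values are synthetic.

```python
LANES = 16   # outputs computed in parallel
K = 5        # 5 x 5 filter window


def vector_filter_5x5(pixels, coeffs):
    """Emulate one vmul followed by 24 vmac operations on 16 lanes.

    pixels: K rows, each with at least LANES + K - 1 values.
    coeffs: K x K coefficient table.
    Returns (accumulator, number_of_vector_operations).
    """
    acc = [0] * LANES
    vector_ops = 0
    for r in range(K):
        for k in range(K):
            group = pixels[r][k:k + LANES]   # shifted 16-pixel load (ld vk)
            c = coeffs[r][k]                 # scalar coefficient (ld r1 / r7)
            acc = [a + c * p for a, p in zip(acc, group)]  # vmul, then vmac
            vector_ops += 1
    return acc, vector_ops


def scalar_reference(pixels, coeffs, j):
    """Direct evaluation of output j from its 5 x 5 window."""
    return sum(coeffs[r][k] * pixels[r][j + k] for r in range(K) for k in range(K))


# Synthetic 8-bit pixel rows and an assumed binomial kernel (hypothetical values).
pixels = [[(7 * r + 3 * c) % 256 for c in range(LANES + K - 1)] for r in range(K)]
coeffs = [[1, 4, 6, 4, 1],
          [4, 16, 24, 16, 4],
          [6, 24, 36, 24, 6],
          [4, 16, 24, 16, 4],
          [1, 4, 6, 4, 1]]
result, ops = vector_filter_5x5(pixels, coeffs)
```

Each lane j of the accumulator ends up holding the same value as the direct evaluation of output j, which is why the 16 results are available after a single pass over the 25 coefficients.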

In the prior art, when image processing is performed based on a Gaussian Filter 5 × 5 algorithm, 5 × 5 pixel point data in a solid frame shown in fig. 3 and 5 × 5 coefficient data shown in fig. 4 are multiplied respectively and then accumulated to obtain a first group of results Pr 000; specifically, the following formula is used to implement:

Pr000＝c000*p000+c001*p001+c002*p002+c003*p003+c004*p004+c100*p100+c101*p101+c102*p102+c103*p103+c104*p104+c200*p200+c201*p201+c202*p202+c203*p203+c204*p204+…+c404*p404；

the 5 × 5 pixel point data obtained by translating the solid line frame shown in fig. 3 to the right by one column are respectively multiplied by the 5 × 5 coefficient data shown in fig. 4 and accumulated to obtain the second result Pr001, and so on to obtain Pr002 ... Pr015;

specifically, the following formula is used to implement:

Pr001＝c000*p001+c001*p002+c002*p003+c003*p004+c004*p005+c100*p101+c101*p102+c102*p103+c103*p104+c104*p105+c200*p201+…+c404*p405

Pr002＝c000*p002+c001*p003+c002*p004+c003*p005+c004*p006+c100*p102+c101*p103+c102*p104+c103*p105+c104*p106+c200*p202+…+c404*p406

…

Pr015＝c000*p015+c001*p016+c002*p017+c003*p018+c004*p019+c100*p115+c101*p116+c102*p117+c103*p118+c104*p119+c200*p215+…+c404*p419。
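The scalar formulas above can be evaluated term by term while counting operations. The following Python sketch is illustrative only: the function name `scalar_result`, the synthetic pixel values and the placeholder coefficients are assumptions, and the first term is counted as a plain multiplication with every later term counted as one multiply-accumulate.

```python
K = 5          # 5 x 5 window
OUTPUTS = 16   # Pr000 .. Pr015


def scalar_result(pixels, coeffs, j):
    """Evaluate Pr00j term by term.

    Returns (value, multiplications, multiply_accumulates).
    """
    terms = [(coeffs[r][k], pixels[r][j + k]) for r in range(K) for k in range(K)]
    c0, p0 = terms[0]
    value = c0 * p0              # first term: one multiplication
    mults, macs = 1, 0
    for c, p in terms[1:]:
        value += c * p           # each remaining term: one multiply-accumulate
        macs += 1
    return value, mults, macs


# Synthetic data; the coefficients are placeholders, not the values of fig. 4.
pixels = [[(31 * r + 17 * c) % 256 for c in range(OUTPUTS + K - 1)] for r in range(K)]
coeffs = [[r + k + 1 for k in range(K)] for r in range(K)]

results = [scalar_result(pixels, coeffs, j) for j in range(OUTPUTS)]
total_mults = sum(m for _, m, _ in results)
total_macs = sum(a for _, _, a in results)
```

For 16 outputs the counters come to 16 multiplications and 384 multiply-accumulate operations.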

therefore, an operation structure diagram of the Gaussian Filter 5 × 5 algorithm in the embodiment of the present invention is shown in fig. 5.

In summary, computing one pixel point in the prior art requires 1 scalar multiplication and 24 scalar multiply-accumulate operations; therefore, obtaining the 16 results above requires 16 multiplications and 384 multiply-accumulate operations, that is, 400 scalar operations. The embodiment of the invention obtains the same 16 results with 1 vector multiplication (vmul) and 24 vector multiply-accumulate operations (vmac), that is, 25 vector operations. The prior art therefore requires 16 times as many operations as the embodiment of the invention, and the efficiency of processing the image by the method of the embodiment of the invention is improved by 6.3 times.

Therefore, the image processing efficiency can be greatly improved by applying the embodiment of the invention. In addition, the embodiment of the invention is realized by adopting the assembly language, and the influence of a compiler on the image processing efficiency is avoided for different processing systems.

Method embodiment three

Taking image processing with the Max 3 × 3 algorithm based on assembly language as an example, a schematic diagram of pixel data storage is shown in fig. 6; as shown in fig. 7, a detailed processing flow of an image processing method according to a third embodiment of the present invention includes the following steps:

Step 401, acquiring pixel point data of an image;

Step 402, storing pixel point data p000 to p015, p001 to p016 and p002 to p017 into vector registers v0, v1 and v2 respectively;

namely: ld v0,lm(r2++)//p000～p015；

ld v1,lm(r2++)//p001～p016；

ld v2,lm(r2++)//p002～p017；

here, scalar register r2 is used to store the start address of pixel point data p000.

Step 403, storing pixel point data p100 to p115, p101 to p116, and p102 to p117 into vector registers v3, v4, and v5, respectively;

namely: ld v3,lm(r3++)//p100～p115；

ld v4,lm(r3++)//p101～p116；

ld v5,lm(r3++)//p102～p117；

the scalar register r3 is used to store the start address of the pixel point data p100.

Step 404, performing longitudinal comparison operation on elements in the vector registers v0, v1 and v2, and storing a vector result formed by the maximum values into the vector register v7;

specifically, p000, p001 and p002 are compared to obtain the maximum value of the three; then p001, p002 and p003 are compared to obtain the maximum value of the three, and so on;

namely: vmax v0,v0v1#vc0；vmax v7,v0v2#vc0。

Step 405, storing pixel point data p200 to p215, p201 to p216, p202 to p217 into vector registers v0, v1, v2 respectively;

namely: ld v0,lm(r4++)//p200～p215；

ld v1,lm(r4++)//p201～p216；

ld v2,lm(r4++)//p202～p217；

the scalar register r4 is used to store the start address of the pixel point data p200.

Step 406, performing longitudinal comparison operation on elements in the vector registers v3, v4 and v5, and storing a vector result consisting of the obtained maximum values into the vector register v6;

specifically, p100, p101 and p102 are compared to obtain the maximum value of the three; then p101, p102 and p103 are compared to obtain the maximum value of the three, and so on.

Step 407, performing longitudinal comparison operation on elements in the vector registers v7 and v6, and storing a vector result formed by the obtained maximum values into the vector register v4;

namely: vmax v4,v7v6#vc0。

Step 408, performing longitudinal comparison operation on the elements in the vector registers v0, v1 and v2, and storing the vector result formed by the obtained maximum values into the vector register v5.

Step 409, performing longitudinal comparison operation on elements in the vector registers v4 and v5, and storing a vector result formed by the obtained maximum values into the vector register v7;

namely: vmax v7,v4v5#vc0。

Through the operation of the embodiment of the invention, 16 results stored in the vector register v7 are the results of the Max 3 × 3 algorithm; that is, the calculation result of the data of 16 pixel points can be obtained by 8 times of vector operation of taking the maximum value.
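The Max 3 × 3 flow of steps 402 to 409 can be emulated in Python to confirm that 8 lane-wise maximum operations yield all 16 results. This is a minimal sketch under stated assumptions: `vmax` here is a hypothetical Python stand-in for the vector instruction, the three row maxima are combined after all rows are reduced (a slightly different order from steps 407 to 409, with the same result), and the pixel values are synthetic.

```python
LANES = 16  # outputs computed in parallel


def vmax(a, b):
    """Lane-wise maximum of two equal-length vectors (one vector operation)."""
    return [x if x >= y else y for x, y in zip(a, b)]


def max3x3_vector(rows):
    """rows: three pixel rows, each with at least LANES + 2 values.

    Returns (16 window maxima, number of vector max operations).
    """
    ops = 0
    row_max = []
    for row in rows:
        g0, g1, g2 = (row[k:k + LANES] for k in range(3))  # three shifted loads
        row_max.append(vmax(vmax(g0, g1), g2))             # two vmax per row
        ops += 2
    out = vmax(vmax(row_max[0], row_max[1]), row_max[2])   # two vmax across rows
    ops += 2
    return out, ops


rows = [[(37 * r + 53 * c) % 256 for c in range(LANES + 2)] for r in range(3)]
out, ops = max3x3_vector(rows)
```

Two operations per row reduce the three shifted groups, and two more combine the three row maxima, giving 3 × 2 + 2 = 8 vector operations in total.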

Therefore, the operation structure of the Max 3 × 3 algorithm is shown in fig. 8.

In the prior art, when image processing is performed based on Max 3 × 3 algorithm, to obtain the 1 st group of calculation results, it is necessary to take the maximum value of 3 × 3 pixel points in the solid frame shown in fig. 6 by the following formula:

Pr000＝max(max(max(p000,p001),max(p002,p100)),max(max(max(p101,p102),max(p200,p201)),p202))

wherein max is the operation for realizing the maximum value;

in calculating the 2nd to 16th results, the maximum value of the 3 × 3 pixel point data obtained by shifting the solid line frame shown in fig. 6 to the right by one column at a time is obtained by the following formulas:

Pr001＝max(max(max(p001,p002),max(p003,p101)),max(max(max(p102,p103),max(p201,p202)),p203))；

Pr002＝max(max(max(p002,p003),max(p004,p102)),max(max(max(p103,p104),max(p202,p203)),p204))；

…

Pr015＝max(max(max(p015,p016),max(p017,p115)),max(max(max(p116,p117),max(p215,p216)),p217))。

In summary, in the prior art the maximum value of each 3 × 3 window is obtained by pairwise comparison of the pixel point data, one result at a time, as in the formulas above. Therefore, the prior art needs 8 scalar maximum operations to calculate the result of the first pixel point data, and 128 scalar maximum operations to obtain the results of 16 pixel point data; for the Max 3 × 3 algorithm, the prior art requires 16 times as many operations as the embodiment of the invention, and the efficiency of processing the image by the method of the embodiment of the invention is improved by 4.8 times.
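The scalar operation count can be checked by evaluating the nested formula for Pr000 to Pr015 and counting every two-input maximum. The Python sketch below is illustrative: the function name `max3x3_scalar` and the pixel values are hypothetical, and the nesting follows the formula for Pr000 given above.

```python
OUTPUTS = 16


def max3x3_scalar(rows, j):
    """Return (window maximum, number of two-input max operations) for output j."""
    # w[0..8] are the nine pixels of the window, row by row.
    w = [rows[r][j + k] for r in range(3) for k in range(3)]
    ops = 0

    def mx(a, b):
        nonlocal ops
        ops += 1
        return a if a >= b else b

    left = mx(mx(w[0], w[1]), mx(w[2], w[3]))
    right = mx(mx(mx(w[4], w[5]), mx(w[6], w[7])), w[8])
    return mx(left, right), ops


rows = [[(11 * r + 29 * c) % 256 for c in range(OUTPUTS + 2)] for r in range(3)]
per_output = [max3x3_scalar(rows, j) for j in range(OUTPUTS)]
total_scalar_ops = sum(n for _, n in per_output)
ratio = total_scalar_ops // 8   # 8 vector max operations in the embodiment
```

Each output costs 8 scalar comparisons, so 16 outputs cost 128, which is 16 times the 8 vector operations used by the embodiment.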

It should be noted that, in the embodiment of the present invention, a maximum value operation may be performed by using three sets of pixel point data, or a maximum value operation may be performed by using two sets of pixel point data.

Embodiment four

In order to implement the image processing method, a fourth embodiment of the present invention further provides an image processing apparatus, where a composition structure of the apparatus is as shown in fig. 9, and the apparatus includes: an acquisition module 11 and an operation module 12; wherein,

the obtaining module 11 is configured to obtain pixel data of an image;

the operation module 12 is configured to perform parallel operation on the pixel point data to obtain an operation result.

In a specific embodiment, the apparatus further comprises: a storage module 13, a plurality of vector register modules 14, and a plurality of scalar register modules 15; wherein,

the storage module 13 is configured to store the pixel data, and allocate a storage address to each pixel data;

the vector register module 14 is configured to read pixel point data from the storage module, and store the pixel point data.

In a specific embodiment, the apparatus further comprises: a dividing module 16, configured to divide the pixel point data into a plurality of groups;

the vector registering module 14 is specifically configured to store each group of pixel point data; wherein,

the number of each group of pixel point data is the ratio of the bit width of the vector register module to the bit number of the pixel points; taking the bit width of the vector register as 128 bits as an example, the bit width of each pixel point data is 8 bits; then, the number of each group of pixel point data is equal to the bit width 128 of the vector register divided by the bit width 8 of each pixel point data, and the obtained value is 16; namely, the number of each group of pixel point data is 16;

In one embodiment, taking two vector register modules as an example, the first vector register module is used for storing a first group of pixel point data; the second vector register module is used for storing a second group of pixel point data; the first vector register module is further used for storing a third group of pixel point data after the first group of pixel point data operation is finished; the second vector register module is further configured to store a fourth group of pixel point data after the second group of pixel point data operation is finished; and so on until the last group of pixel point data is stored in the first vector register module or the second vector register module.

In a specific embodiment, taking three vector register modules as an example, a specific implementation manner when each group of pixel data is stored in one vector register is as follows: storing a first group of pixel point data to a first vector register module, storing a second group of data to a second vector register module, and storing a third group of data to a third vector register module; after the first group of pixel point data operation is finished, storing a fourth group of pixel point data to the first vector register module; after the second group of pixel point data operation is finished, storing a fifth group of pixel point data to the second vector register module, and after the third group of pixel point data operation is finished, storing a sixth group of pixel point data to the third vector register module; and so on until the last set of pixel point data is stored to the first vector register, or the second vector register, or a third vector register.

In a specific embodiment, the operation module 12 is specifically configured to perform a first operation on each group of pixel point data according to an image processing algorithm, so as to obtain a processing result of each group of pixel point data;

In an embodiment of the present invention, the image processing algorithm includes: 3 × 3, 4 × 4 or 5 × 5 algorithms such as Gaussian Filter, Median Filter, Sobel operator, Max, Min and the like; thus, the first operation and the second operation are related to an algorithm that processes the image.

Embodiment five

In order to implement the image processing method, a fifth embodiment of the present invention further provides an image processing apparatus, where a composition structure of the apparatus is as shown in fig. 10, and the apparatus includes: a processor 21, configured to obtain pixel data of an image, and to perform parallel operation on the pixel point data to obtain an operation result.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be noted that, in practical applications, the functions executed by the obtaining module 11, the operation module 12, and the dividing module 16 may be implemented by a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), or a field-programmable gate array (FPGA) located on a terminal or a server; the functions performed by the vector register module 14 can be implemented by a vector register located on a terminal or a server; the function performed by the scalar register module 15 may be implemented by a scalar register located on a terminal or a server.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims

1. An image processing method, characterized in that the method comprises:

acquiring pixel point data of an image;

2. The method of claim 1, wherein after obtaining pixel point data for an image, the method further comprises:

storing the storage address of the pixel point data to a scalar register;

3. The method of claim 2, wherein storing the pixel point data to a vector register comprises:

dividing the pixel point data into a plurality of groups, and storing the pixel point data of each group into a vector register; wherein,

the number of each group of pixel point data is the ratio of the bit width of the vector register to the bit number of the pixel point.

4. The method of claim 3, wherein each set of pixel point data is stored in a vector register, comprising:

5. The method of claim 3 or 4, wherein said performing a parallel operation on said pixel point data comprises:

6. An image processing apparatus, characterized in that the apparatus comprises: an acquisition module and an operation module; wherein,

the acquisition module is used for acquiring pixel point data of the image;

7. The apparatus of claim 6, further comprising: a storage module, a plurality of scalar register modules and a plurality of vector register modules; wherein,

8. The apparatus of claim 7, further comprising: the dividing module is used for dividing the pixel point data into a plurality of groups;

9. The apparatus of claim 8, wherein the first vector register module is configured to store a first set of pixel point data;

10. The apparatus according to claim 8 or 9, wherein the operation module is specifically configured to perform a first operation on each set of pixel point data according to an image processing algorithm, to obtain a processing result of each set of pixel point data;

11. An image processing apparatus, characterized in that the apparatus comprises: the processor is used for acquiring pixel point data of the image;