CN106878586B

CN106878586B - Reconfigurable parallel image detail enhancement method and device

Info

Publication number: CN106878586B
Application number: CN201710013621.7A
Authority: CN
Inventors: 刘壮; 郭若杉; 谭吉来; 李瑞玲; 韩睿; 李晨
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2017-01-09
Filing date: 2017-01-09
Publication date: 2019-12-06
Anticipated expiration: 2037-01-09
Also published as: CN106878586A

Abstract

The invention relates to a reconfigurable parallel image detail enhancement method comprising the following steps: parameter preloading, data buffering, filtering in the horizontal and vertical directions, coring filtering, overshoot suppression, amplitude suppression, and cache data updating. The invention also relates to a reconfigurable parallel image detail enhancement device comprising a local memory, a memory access control unit, a general buffer, a parallel arithmetic logic unit (ALU), a state machine, and a parallel multiply accumulator (MAC). The invention enhances the image detail signal so that textured areas appear clearer; it also improves data utilization, reduces data exchange between the arithmetic components and the peripheral memory, relieves memory access bandwidth pressure, and allows hardware resources to be reused.

Description

Reconfigurable parallel image detail enhancement method and device

Technical Field

The invention relates to the field of video image processing, and in particular to a reconfigurable parallel method and device for image detail enhancement.

Background

At present, one of the mainstream development directions of video technology is ultra-high-definition (4K resolution) display. Compared with high-definition (1920 × 1080) video, the number of pixels in 4K video increases from about 2M to about 8M, which places higher demands on image quality and on the performance of image enhancement algorithms.

Traditional video image detail enhancement solutions are designed mainly for high-definition and lower standards, and are likely to lack processing capability when faced with 4K image processing requirements. Meanwhile, because 4K ultra-high-definition images present a finer picture, negative effects such as overshoot are more easily perceived by viewers when existing detail enhancement algorithms are applied to 4K images.

In addition, since conventional solutions are usually implemented as application-specific integrated circuit chips with a hard-wired algorithm, the cost pressure is huge when the algorithm needs to be upgraded.

Therefore, a new video image detail enhancement solution is needed that: (1) improves image sharpness; (2) effectively suppresses the negative effects caused by detail enhancement, such as overshoot and noise amplification; (3) meets the processing requirements of real-time ultra-high-definition video streams; and (4) can have its algorithm upgraded at a controllable cost.

Disclosure of Invention

In order to solve the above problems in the prior art, the present invention provides a reconfigurable parallel image detail enhancement method, including the following steps:

Step 1, loading image data to be processed into a buffer; the image data to be processed is a pixel lattice of R × Q, wherein the value of R or Q is equal to the parallelism N; the pixel lattice can be split into a plurality of one-dimensional lattices each containing N pixels;

Step 2, filtering each pixel to be enhanced in the one-dimensional lattice in parallel in the horizontal direction and in the vertical direction, respectively, to obtain detail signals in the two directions;

Step 3, performing coring filtering on the detail signals in the two directions to filter out the tiny detail signals introduced by image noise;

Step 4, controlling the strength of the enhanced detail signal according to the gray-scale symmetry on the two sides of the neighborhood of the pixel to be enhanced and according to the strength of the detail signal of that pixel, thereby performing overshoot suppression, and adding the two overshoot-suppressed detail signals to obtain the detail signals of the N pixels;

Step 5, further performing amplitude suppression on the detail signals obtained in step 4;

Step 6, executing steps 2 to 5 in turn on each one-dimensional lattice in the image data to be processed, thereby completing the detail enhancement of the image data to be processed.

Preferably, the buffer comprises NM buffer units of size N pixels; the buffer is equipped with 4 read ports and 4 write ports.

Preferably, the filters used for the horizontal and vertical filtering are one-dimensional filters of horizontal order NH and vertical order NV, respectively; the gray values of the (NH-1)/2 pixels on each of the left and right sides and of the (NV-1)/2 pixels on each of the upper and lower sides of a pixel are combined with the gray value of the pixel itself to obtain the detail signals of the pixel in the two directions.

Preferably, the buffer is a multi-granularity discrete memory structure.

Preferably, the filtering in the horizontal and vertical directions is specifically to perform spatial convolution on the filtering template and the image data, and the filtering result is represented as:

Wherein, (i, j) represents a pixel point at the ith row and jth column position in the image data, DEH (i, j) represents a horizontal filtering result at (i, j), DEV (i, j) represents a vertical filtering result at (i, j), P (i, j) represents a pixel gray level at the ith row and jth column position in the image, fh (k) represents a kth element of the horizontal template, and fv (t) represents a tth element of the vertical template.
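The formula itself is not reproduced in this text. A reconstruction consistent with the definitions above might read as follows; it assumes odd orders, template indices centred on zero, and the correlation-style sign convention, and the symbols N_H and N_V (for the orders NH and NV) are likewise assumptions:

```latex
\begin{aligned}
DEH(i,j) &= \sum_{k=-(N_H-1)/2}^{(N_H-1)/2} fh(k)\, P(i,\, j+k) \\
DEV(i,j) &= \sum_{t=-(N_V-1)/2}^{(N_V-1)/2} fv(t)\, P(i+t,\, j)
\end{aligned}
```

A strict convolution would use P(i, j-k) and P(i-t, j) instead; for a symmetric template the two conventions give the same result.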

Preferably, the overshoot suppression in step 4 is to process the horizontal detail signal and the vertical detail signal respectively, and then add the two detail signals subjected to the overshoot suppression to obtain a final detail signal, and the specific method is as follows:

Step 41, computing absolute differences between the gray value of the pixel to be processed and the gray values of the (NH-1)/2 pixels to its left, the (NH-1)/2 pixels to its right, the (NV-1)/2 pixels above it, and the (NV-1)/2 pixels below it, so as to obtain four groups of gray absolute differences;

Step 42, calculating the mean of each of the four groups of absolute differences: Mean_L, Mean_R, Mean_T, and Mean_B, i.e. the mean gray differences on the four sides of the pixel;

Step 43, calculating a first overshoot suppression factor alpha and a second overshoot suppression factor beta, where

alpha = ka * Y_abs_mean

where ka is a set coefficient, Y_abs_mean is the absolute difference between the means of the gray absolute differences on opposite sides, i.e. |Mean_L - Mean_R| or |Mean_T - Mean_B|, de is the detail signal strength, and kb is a set positive coefficient.

Step 44, calculating the overshoot suppression factor s = 1 - alpha × beta, performing overshoot suppression, and obtaining the overshoot-suppressed detail signal de_ss = de × s.
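Steps 41 to 44 can be sketched for one pixel and one direction as follows. This is only an illustration under stated assumptions: the function names are hypothetical, beta is passed in by the caller because its formula is not reproduced in this text, and alpha is not clamped because the text does not say whether the curve saturates.

```python
def mean_abs_diff(center, neighbours):
    """Steps 41-42: mean absolute gray difference between a pixel and one side."""
    return sum(abs(center - g) for g in neighbours) / len(neighbours)


def overshoot_suppress(de, center, side_a, side_b, ka, beta):
    """Steps 43-44 for one direction (left/right or top/bottom).

    de      -- detail signal of the pixel in this direction
    side_a  -- gray values on one side (e.g. the left (NH-1)/2 pixels)
    side_b  -- gray values on the opposite side
    ka      -- set coefficient
    beta    -- second suppression factor, supplied by the caller (assumption)
    """
    mean_a = mean_abs_diff(center, side_a)
    mean_b = mean_abs_diff(center, side_b)
    y_abs_mean = abs(mean_a - mean_b)   # |Mean_L - Mean_R| or |Mean_T - Mean_B|
    alpha = ka * y_abs_mean             # first suppression factor
    s = 1 - alpha * beta                # overshoot suppression factor
    return de * s                       # de_ss
```

With a symmetric neighbourhood Y_abs_mean is 0, so s is 1 and the detail signal passes through unchanged; the more asymmetric the two sides, the more the detail signal is attenuated.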

Preferably, the detail signal strength de = de_h + de_v, where de_h is the detail signal strength in the horizontal direction and de_v is the detail signal strength in the vertical direction.

Preferably, the amplitude suppression in step 5 is performed by:

Step 51, multiplying de_ss by a detail enhancement coefficient gain to obtain an enhanced detail signal de_gain;

Step 52, performing amplitude suppression according to the following formula to obtain a final detail signal de_final;

where Th is a set threshold and Max_de is a set maximum amplitude.

Preferably, the output value after the amplitude suppression in step 5 is Yout = Yin + de_final, where Yout and Yin are the output pixel gray value and the input pixel gray value, respectively.

Preferably, before step 1, the method further includes a parameter preloading step, in which the preset fixed parameters used in the horizontal and vertical filtering, the coring filtering, the overshoot suppression, and the amplitude suppression are loaded into the general buffer.

Preferably, the image data to be processed in step 1 is obtained by sequentially splitting the image data according to a pixel lattice of R × Q; in step 1, the loading to the buffer is performed by:

sequentially selecting the image data to be processed in the order in which the image data was split, and processing it through steps 2 to 6 until all the image data to be processed has been processed.

In another aspect, the invention also provides a reconfigurable parallel image detail enhancement device comprising a local memory, a memory access control unit, a general buffer, a parallel arithmetic logic unit (ALU), a state machine, and a parallel multiply accumulator (MAC);

The local memory is used for storing input and output image data and parameters required by a parallel video image contrast enhancement algorithm, and the memory supports parallel access;

The memory access control unit is used for data exchange between the local memory and the general buffer;

The general buffer is used for buffering, at one time, all data and intermediate results required by a complete processing flow, and the buffer can be indexed directly by address;

The parallel arithmetic logic unit is used for executing the non-multiplication arithmetic and logic operations involved in a parallel video image contrast enhancement algorithm; its parallelism is N;

The state machine is used for generating the control signals of all functional components;

The parallel multiply accumulator is used for executing multiplication-related operations, and its parallelism is N;

The state machine is connected through communication lines to the parallel arithmetic logic unit, the memory access control unit, the general buffer, and the parallel multiply accumulator, respectively; the local memory is connected to the memory access control unit through a communication line; the general buffer is connected through communication lines to the memory access control unit, the parallel arithmetic logic unit, and the parallel multiply accumulator, respectively; the parallel arithmetic logic unit is connected to the parallel multiply accumulator through a communication line.

The invention has the following beneficial effects:

1. The image detail signal is enhanced, so that the texture area is clearer;

2. The details are enhanced, and simultaneously, the noise and the overshoot are effectively reduced;

3. The image processing algorithm is easy to carry out later-stage optimization and upgrade;

4. The data use efficiency is improved, the data interaction between the operation component and the peripheral memory is reduced, and the memory access bandwidth pressure is reduced;

5. By using the general buffer and the state machine to control the functional components, the reuse of hardware resources is realized.

Drawings

FIG. 1 is a schematic structural diagram of a reconfigurable parallel image detail enhancement device according to the present invention;

FIG. 2 is a flow chart of a parallel image detail enhancement method provided by the present invention;

FIG. 3 is a buffer diagram of a general buffer according to an embodiment of the invention;

FIG. 4 is an exemplary diagram of horizontal 7-order filtering and vertical 5-order filtering;

FIG. 5 is a diagram illustrating exemplary coring filtering noise reduction according to an embodiment of the present invention;

FIG. 6(a) to (d) are exemplary diagrams of scenes in which an overshoot phenomenon is likely to occur;

FIG. 7 is an exemplary graph of an overshoot suppression factor alpha calculation curve according to an embodiment of the present invention;

FIG. 8 is an exemplary diagram of an overshoot suppression factor beta calculation curve in accordance with an embodiment of the present invention;

FIG. 9 is an exemplary diagram of an overshoot suppression procedure in accordance with an embodiment of the present invention;

FIG. 10 is an exemplary graph of interpolation along an edge in accordance with embodiments of the invention.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

The invention discloses a reconfigurable parallel image detail enhancement device, which comprises a local memory, a memory access control unit, a general buffer, a parallel arithmetic logic unit (ALU), a state machine, and a parallel multiply accumulator (MAC), wherein the local memory is connected with the memory access control unit;

The memory access control unit is used for data exchange between the local memory and the general buffer; in this embodiment, three memory access control units with identical functions are used, which removes the memory access bottleneck;

When the enhancement algorithm needs to be changed, the device only needs to reprogram the state machine to generate a new control signal and update the algorithm parameters in the local memory, so that algorithm iteration can be quickly realized without redesigning and manufacturing a hardware circuit.

The invention also provides a reconfigurable parallel image detail enhancement method, as shown in FIG. 2, comprising the following steps:

Step 1, data buffering: loading image data to be processed to a buffer; the image data to be processed is a pixel lattice of R x Q, wherein the value of R or Q is equal to the parallelism N; the pixel dot matrix can be split into a plurality of one-dimensional dot matrixes containing N pixel points;

Step 2, filtering: filtering the pixels to be enhanced in the one-dimensional lattice in parallel in the horizontal direction and the vertical direction respectively to obtain detail signals in two directions;

Step 3, noise reduction: performing coring filtering on the detail signals in the two directions to filter out the tiny detail signals introduced by image noise;

Step 4, overshoot suppression: controlling the strength of the enhanced detail signal according to the gray-scale symmetry on the two sides of the neighborhood of the pixel to be enhanced and according to the strength of the detail signal of that pixel, thereby performing overshoot suppression, and adding the two overshoot-suppressed detail signals to obtain the detail signals of the N pixels;

Step 5, amplitude suppression: further performing amplitude suppression on the detail signals obtained in step 4;

Step 6, cache data updating: by updating the data in the buffer, executing steps 2 to 5 in turn on each one-dimensional lattice in the image data to be processed, thereby completing the detail enhancement of the image data to be processed.

This embodiment further includes a parameter preloading step before step 1, in which the preset fixed parameters used in the horizontal and vertical filtering, the coring filtering, the overshoot suppression, and the amplitude suppression are loaded into the general buffer.

1. Parameter preloading

This step belongs to the initialization phase of the device of the invention: fixed parameters such as the filter coefficients in the horizontal and vertical directions and the thresholds used in the coring filtering, the overshoot suppression, and the amplitude suppression are preloaded into the general buffer.

FIG. 3 shows the general buffer according to an embodiment of the invention. As shown in FIG. 3, the general buffer (denoted by the capital letter M) contains NM buffer units in total, each N pixels in size, and is equipped with 4 read ports (r0, r1, r2, r3) and 4 write ports (w0, w1, w2, w3), so it can sustain high-speed read and write operations. The general buffer M supports reading and writing its NM buffer units directly by index, which makes it easy to reuse data. The general buffer operates synchronously with the arithmetic components, which avoids the problem of a high-speed arithmetic component waiting for a low-speed storage component.

2. Data buffering

The image data is split in sequence into pixel lattices of R × Q to obtain a plurality of pieces of image data to be processed; these are selected in the order in which the image data was split, loaded into the buffer, and processed through steps 2 to 6 until all the image data to be processed has been processed. Each piece of image data to be processed is a pixel lattice of R × Q, wherein the value of R or Q is equal to the parallelism N; the pixel lattice can be split into a plurality of one-dimensional lattices each containing N pixels.

The invention provides a processing method with parallelism N, which is equivalent to N filters working simultaneously; therefore NH columns of N pixels or NV rows of N pixels are buffered in the general buffer before filtering. Since the parallel processing device and method of the invention can be regarded as a device and method for processing N-dimensional vector data, the remainder of this document describes the invention in terms of operations on N-dimensional vectors.
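As a minimal sketch of the data-buffering step, assuming the image is held as a Python list of rows and that its height and width are exact multiples of R and Q, the splitting into R × Q lattices and then into one-dimensional lattices of N pixels might look like this. The function names are illustrative and do not come from the patent.

```python
def split_into_blocks(image, R, Q):
    """Split a 2-D image (list of rows) into R x Q pixel lattices in raster order.

    Assumes the image height is a multiple of R and the width a multiple of Q.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, R):
        for left in range(0, width, Q):
            blocks.append([row[left:left + Q] for row in image[top:top + R]])
    return blocks


def block_to_row_vectors(block):
    """View an R x Q lattice (with Q == N) as R one-dimensional lattices of N pixels."""
    return [list(row) for row in block]


def block_to_column_vectors(block):
    """View an R x Q lattice (with R == N) as Q one-dimensional lattices of N pixels."""
    return [list(col) for col in zip(*block)]
```

Whether the row view or the column view applies depends on which of R and Q equals the parallelism N.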

The algorithm of the invention relates to parallel processing of a column of pixels, namely, the column-by-column access is required to be carried out on a memory, and the traditional memory does not support an efficient column-by-column access mode, so that the device of the invention adopts a multi-granularity discrete memory structure, and can be specifically designed by referring to 'patent number 201110460585.1 entitled multi-granularity parallel memory system and memory'.

3. Filtering

The image detail enhancement method first obtains detail signals in the horizontal and vertical directions through filtering; specifically, one-dimensional filters of horizontal order NH and vertical order NV are used to extract the detail signals. Generally, the higher the order of the filter, the stronger its ability to extract detail signals, but the more pronounced the negative effects such as overshoot. Taking these two points and signal symmetry into account, several filters of order 5 or 7 horizontally and order 3 or 5 vertically are usually combined to obtain the best effect. FIG. 4 is a schematic diagram of horizontal 7th-order and vertical 5th-order filtering for a single pixel. To obtain the detail signal of a pixel, the gray value of the pixel to be processed is needed together with the gray values of the (NH-1)/2 pixels on each of its left and right sides and the (NV-1)/2 pixels on each of its upper and lower sides.

The invention extracts the detail signals with two groups of one-dimensional filters, horizontal and vertical; the specific operation is a spatial convolution of a filtering template with the image, described as follows:

Let the horizontal filtering template be FH and the vertical filtering template be FV, let FH(k) denote the kth element of the horizontal template and FV(t) the tth element of the vertical template, and let P(i, j) denote the gray value of the pixel at row i and column j of the image. The horizontal filtering result DEH(i, j) and the vertical filtering result DEV(i, j) at pixel (i, j) can then be expressed as formula (1) and formula (2), respectively:

Wherein, FH (0) and FV (0) correspond to the filter template middle position element.
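The filtering of one one-dimensional lattice of N pixels can be sketched as below. This is a hedged illustration rather than the patented implementation: the templates are stored as Python lists whose middle entry plays the role of FH(0) or FV(0), the neighbour at offset k is taken as P(i, j+k), and image borders are handled by replicating the edge pixel, none of which is specified in the text. The function names are hypothetical.

```python
def _clamp(index, upper):
    """Clamp an index into [0, upper]; border replication is an assumption."""
    return max(0, min(index, upper))


def detail_horizontal(P, i, j0, N, FH):
    """Horizontal detail signal DEH for the N pixels P[i][j0] .. P[i][j0+N-1]."""
    half = (len(FH) - 1) // 2          # (NH-1)/2 neighbours on each side
    last_col = len(P[0]) - 1
    return [
        sum(FH[k + half] * P[i][_clamp(j + k, last_col)]
            for k in range(-half, half + 1))
        for j in range(j0, j0 + N)
    ]


def detail_vertical(P, i0, j, N, FV):
    """Vertical detail signal DEV for the N pixels P[i0][j] .. P[i0+N-1][j]."""
    half = (len(FV) - 1) // 2          # (NV-1)/2 neighbours above and below
    last_row = len(P) - 1
    return [
        sum(FV[t + half] * P[_clamp(i + t, last_row)][j]
            for t in range(-half, half + 1))
        for i in range(i0, i0 + N)
    ]
```

For example, with the 3-tap template [-1, 2, -1] a flat region yields a zero detail signal, while a step edge yields a negative value on its dark side and a positive value on its bright side.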

The invention employs parallel processing, so each element of the filter template can be viewed as an N-dimensional vector, and P can be viewed as the gray values of N consecutive pixels in the ith row or the jth column. In addition, the vector multiplication used in the method of the invention differs from the mathematical outer product or inner product of vectors: here, vector multiplication multiplies the elements at corresponding positions of two vectors of the same dimension, and the result is still an N-dimensional vector. For simplicity, take two-dimensional vectors as an example: if vector a is (a1, a2) and vector b is (b1, b2), then the vector multiplication a × b is (a1b1, a2b2), where a1, a2, b1, and b2 are all real numbers and a1b1 and a2b2 denote products of real numbers.

During filtering, the device of the invention first sends the image data to be processed and the filter coefficients from the buffer area of the general buffer to the registers of the MAC in sequence. The MAC has four equivalent registers of width N and performs N-dimensional vector multiplication and accumulation; the result of the multiply-accumulate operation can be written back to the general buffer so that it can be used again, or passed directly to other arithmetic components for subsequent processing.
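The element-wise vector multiplication and the multiply-accumulate operation described above can be illustrated as follows. The sketch models N-dimensional vectors as Python lists; the helper names and the idea of broadcasting each filter tap into an N-dimensional vector are illustrative assumptions and do not describe the MAC hardware.

```python
def vec_mul(a, b):
    """Element-wise product of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x * y for x, y in zip(a, b)]


def vec_mac(acc, a, b):
    """Multiply-accumulate: acc + a * b, element by element."""
    return [s + p for s, p in zip(acc, vec_mul(a, b))]


def filter_by_mac(pixel_vectors, taps):
    """Filter N pixels at once.

    pixel_vectors[k] holds, for each of the N pixels, the neighbour that lines
    up with taps[k]; each tap is broadcast to an N-dimensional vector.
    """
    n = len(pixel_vectors[0])
    acc = [0] * n
    for tap, pixels in zip(taps, pixel_vectors):
        acc = vec_mac(acc, [tap] * n, pixels)
    return acc
```

Here `vec_mul([1, 2], [3, 4])` gives `[3, 8]`, matching the (a1b1, a2b2) example in the text.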

4. Noise reduction

The invention uses coring filtering to suppress the noise contained in the extracted detail signals. The principle of coring filtering is as follows: the detail signal is assumed to be superimposed with a relatively small noise signal, so a small value, called the coring filter threshold, is subtracted from the magnitude of the detail signal, after which the detail signal is regarded as noise-free. Specifically, the sign of the detail signal is first determined to obtain a sign value, which is 1 if the signal is positive and -1 otherwise; then the absolute value of the detail signal is taken, the coring filter threshold is subtracted from it, and every non-positive result is set to 0; finally, the subtraction result is multiplied by the sign value to obtain the noise-reduced result. FIG. 5 is a schematic diagram of the input-output relationship of the coring filtering.
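The coring operation described above translates almost directly into code. The following sketch follows the three stated steps (sign, thresholded magnitude, recombination); the function names are illustrative.

```python
def coring(de, threshold):
    """Coring filter for one detail-signal value."""
    sign = 1 if de > 0 else -1                 # sign value: 1 if positive, else -1
    magnitude = max(abs(de) - threshold, 0)    # non-positive results become 0
    return sign * magnitude


def coring_vector(des, threshold):
    """Apply coring to an N-dimensional vector of detail signals."""
    return [coring(de, threshold) for de in des]
```

Detail signals whose magnitude does not exceed the threshold are removed entirely, and larger ones are shrunk toward zero by the threshold.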

This step involves comparison (with zero), absolute value, subtraction, maximum, and multiplication operations; all of these except multiplication are performed by the parallel arithmetic logic unit (ALU). Like the MAC, the ALU has 4 fully equivalent N-dimensional vector registers and can perform arithmetic and logical operations on N data items simultaneously.

5. Overshoot suppression

In the method, the amplitude of the detail enhancement is controlled according to the magnitude of the detail signal and the gray-scale symmetry of the neighborhood of the corresponding pixel, thereby achieving overshoot suppression. As shown in FIG. 6(a), 6(b), 6(c), and 6(d), the overshoot phenomenon generally occurs in regions where the gray scale changes strongly (i.e., the detail is rich) and is not symmetrical. FIG. 6(a) to (d) show four cases in which the luminance in the horizontal direction is asymmetric and overshoot is liable to occur; the vertical direction is similar.

The overshoot suppression strategy of the invention is to process the horizontal detail signal and the vertical detail signal separately, and then add the two overshoot-suppressed detail signals to obtain the final detail signal. The specific method is as follows:

Step 43, obtaining a first overshoot suppression factor alpha using the curve shown in FIG. 7, as given by formula (3):

alpha = ka * Y_abs_mean (3)

where ka is a set coefficient and Y_abs_mean is the absolute difference between the means of the gray absolute differences on opposite sides, i.e. |Mean_L - Mean_R| or |Mean_T - Mean_B|.

Meanwhile, a second overshoot suppression factor beta, related to the strength of the detail signal, is calculated as given by formula (4);

where de is the detail signal strength and kb is a set positive coefficient; a graphical representation of the beta calculation formula is given in FIG. 8.

Step 44, when the device of the invention performs overshoot suppression, the gray values of the loaded NH columns or NV rows of pixels can be obtained directly from the general buffer and used to calculate the two suppression factors; an overshoot control factor s = 1 - alpha × beta is then calculated from alpha and beta, overshoot suppression is performed, and the overshoot-suppressed detail signal de_ss = de × s is obtained.

Since the invention performs overshoot suppression in two directions, the final detail signal is de_ss = de_ss_h + de_ss_v, where de_ss_h and de_ss_v denote the strength of the overshoot-suppressed detail signal in the horizontal and vertical directions, respectively. The X in de_ss_X in FIG. 9 stands for h or v, i.e. a detail signal in the horizontal or vertical direction.

The specific execution flow on the reconfigurable parallel image detail enhancement device of the invention is shown in FIG. 9 and includes: loading the image data to the ALU; calculating the gray absolute differences and outputting the result to the MAC; accumulating the absolute differences and outputting the result to the ALU; calculating the means of the absolute differences, and then the absolute difference between those means; loading ka to the MAC, calculating alpha, and keeping the result in a MAC register; loading the noise-reduced detail signal de and kb to the MAC and calculating beta; calculating the product of alpha and beta and outputting the result to the ALU; calculating the overshoot control factor s = 1 - alpha × beta and outputting the result to the MAC; and calculating the overshoot-suppressed detail signal de_ss_X = de_X × s and outputting the result to the MAC.

6. Amplitude suppression

The overshoot-suppressed detail signal may still have too high intensity, which may result in over-enhancement of the image and thus affect the viewing quality, and therefore, the amplitude of the enhanced detail signal needs to be controlled. The step is divided into two processes:

Step 51, amplifying the overshoot-suppressed detail signal de_ss by multiplying it by a detail enhancement coefficient gain in the MAC to obtain an enhanced detail signal de_gain;

Step 52, outputting the result to the ALU, where amplitude suppression is performed according to the curve shown in FIG. 10 to obtain the final detail signal de_final, as given by formula (5),

where Th is a set threshold and Max_de is a set maximum amplitude.

The final output result is Yout = Yin + de_final, where Yout and Yin are the output pixel gray value and the input pixel gray value, respectively. Yout is first output to the general buffer and then stored in the local memory by a memory access control unit.
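The data flow of steps 51 and 52 and the final addition can be sketched as follows. Because formula (5) is not reproduced in this text, the amplitude suppression curve is supplied by the caller as a function; the `hard_limit` helper is a purely hypothetical stand-in used for demonstration and should not be read as the patent's curve. All names are illustrative.

```python
def apply_gain_and_output(y_in, de_ss, gain, suppress):
    """Compute Yout = Yin + de_final for a vector of N pixels.

    y_in     -- input gray values Yin
    de_ss    -- overshoot-suppressed detail signals
    gain     -- detail enhancement coefficient
    suppress -- amplitude suppression curve mapping de_gain to de_final
                (caller-supplied, since formula (5) is not given here)
    """
    de_gain = [d * gain for d in de_ss]            # step 51
    de_final = [suppress(d) for d in de_gain]      # step 52
    return [y + d for y, d in zip(y_in, de_final)]


def hard_limit(max_de):
    """Hypothetical stand-in curve: clip the amplitude to +/- max_de."""
    return lambda d: max(-max_de, min(max_de, d))
```

For instance, `apply_gain_and_output([100, 50], [4, -30], 2, hard_limit(20))` leaves the small detail signal at its amplified value and limits the large one to the maximum amplitude.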

7. Cache data update

After the detail enhancement of N pixels is completed, the data in the buffer area of the general buffer needs to be updated: the next N data items are read and replace the first of the NH or NV N-dimensional vectors in the buffer area, which can be regarded physically as sliding the filter window.

By updating the data in the buffer in this way, steps 2 to 5 are executed in turn on each one-dimensional lattice in the image data to be processed, completing the detail enhancement of the image data to be processed.
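The window update can be pictured as a fixed-length queue of N-dimensional vectors. The sketch below uses Python's `collections.deque` with `maxlen` to model this; it is a software analogy under the assumption that the "first" vector is the oldest one, not a description of how the general buffer is addressed in hardware.

```python
from collections import deque


def make_window(initial_vectors, taps):
    """Create a sliding window holding `taps` N-dimensional vectors (NH or NV)."""
    if len(initial_vectors) != taps:
        raise ValueError("window must start with exactly `taps` vectors")
    return deque(initial_vectors, maxlen=taps)


def slide(window, next_vector):
    """Read the next N data items; the oldest vector is dropped automatically."""
    window.append(next_vector)
    return window
```

Each call to `slide` keeps the window length constant, so only one new vector has to be fetched per group of N output pixels.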

The above process explains the complete processing flow of the present invention, and the present invention realizes the reuse of hardware resources by programming the state machine and using the design of the universal buffer, and avoids the defects of long chip period and high version iteration cost of the traditional special circuit scheme design when running a complex algorithm.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related descriptions of the above-described apparatus may refer to the corresponding process in the foregoing method embodiments, and are not described herein again.

Those of skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A reconfigurable parallel image detail enhancement method is characterized by comprising the following steps:

2. The method of claim 1, wherein the buffer comprises NM buffer cells of size N pixels; the buffer is equipped with 4 read ports and 4 write ports.

3. The method of claim 2, wherein the filters used for the filtering in the horizontal and vertical directions are one-dimensional filters of horizontal order NH and vertical order NV, respectively; the gray values of the (NH-1)/2 pixels on each of the left and right sides and of the (NV-1)/2 pixels on each of the upper and lower sides of a pixel are combined with the gray value of the pixel itself to obtain the detail signals of the pixel in the two directions.

4. The method of claim 3, wherein the buffer is a multi-granularity discrete memory structure.

5. The method according to claim 4, wherein the filtering in the horizontal and vertical directions is performed by spatial convolution of the filtering template with the image data, and the filtering result is represented as:

6. The method according to claim 5, wherein the overshoot suppression in step 4 is performed by processing the horizontal detail signal and the vertical detail signal separately, and then adding the two overshoot-suppressed detail signals to obtain the final detail signal, and the method comprises:

alpha = ka * Y_abs_mean

7. The method according to claim 6, wherein the detail signal strength de = de_h + de_v, where de_h is the detail signal strength in the horizontal direction and de_v is the detail signal strength in the vertical direction.

8. The method of claim 7, wherein the step 5 of amplitude suppression comprises:

where Th is a set threshold and Max_de is a set maximum amplitude.

9. The method according to claim 8, wherein the output value after amplitude suppression in step 5 is Yout = Yin + de_final, where Yout and Yin are the output pixel gray value and the input pixel gray value, respectively.

10. The method according to any one of claims 1 to 9, characterized by further comprising a parameter preloading step before step 1, in which the preset fixed parameters used in the horizontal and vertical filtering, the coring filtering, the overshoot suppression, and the amplitude suppression are loaded into the buffer.

11. The method according to any one of claims 1 to 9, wherein the image data to be processed in step 1 is obtained by sequentially splitting the image data according to a pixel lattice of R × Q; in step 1, the loading to the buffer is performed by:

12. A reconfigurable parallel image detail enhancement apparatus, wherein the apparatus is used for loading and executing the reconfigurable parallel image detail enhancement method of any one of claims 1 to 11, and the apparatus comprises a local memory, an access control unit, the buffer, a parallel arithmetic logic unit ALU, a state machine, and a parallel multiply accumulator MAC;

The memory access control unit is used for data exchange between the local memory and the buffer;

The buffer is used for buffering all data and intermediate results required by a complete processing flow at a time, and the buffer can be directly indexed through addresses;

The parallel arithmetic logic unit is used for executing non-multiplication arithmetic and logic operation involved in the parallel video image contrast enhancement method; the parallelism is N;

The state machine is respectively connected with the parallel arithmetic logic unit, the access control unit, the buffer and the parallel multiply accumulator through communication lines; the local memory is connected with the memory access control unit through a communication line; the buffer is respectively connected with the access control unit, the parallel arithmetic logic unit and the parallel multiply accumulator through communication lines; the parallel arithmetic logic unit is connected with the parallel multiply accumulator through a communication line.