CN115049529A

CN115049529A - Image gradient determination method, device, equipment and storage medium

Info

Publication number: CN115049529A
Application number: CN202110251014.0A
Authority: CN
Inventors: 高强
Original assignee: Shanghai United Imaging Healthcare Co Ltd
Current assignee: Shanghai United Imaging Healthcare Co Ltd
Priority date: 2021-03-08
Filing date: 2021-03-08
Publication date: 2022-09-13

Abstract

An embodiment of the invention discloses an image gradient determination method, apparatus, device, and storage medium. The method comprises: acquiring the pixel value of each pixel point in a digital image loaded into a global memory of a graphics processor; determining target pixel values from the pixel values according to the storage capacity of a shared memory in the graphics processor, and loading the target pixel values into the shared memory; and reading the target pixel values from the shared memory and, based on the reading result, determining the image gradient of the digital image at the pixel position of each target pixel value. In this technical solution, loading the target pixel values of the digital image into the shared memory, which has a two-dimensional physical structure, preserves the locality of the target pixel values in two-dimensional space, so that cache misses can be avoided in the subsequent data reading process and the image gradient can be determined faster in a computer implementation.

Description

Image gradient determination method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computer application, in particular to a method, a device, equipment and a storage medium for determining image gradient.

Background

An image gradient is the gradient of a digital image and is commonly used in digital image processing for edge detection, image segmentation, solving optimization problems, and so forth. For a digital image f(x, y), the gradient at the coordinates (x, y) is defined as the two-dimensional column vector:

$$\nabla f(x, y) = \begin{bmatrix} g_x \\ g_y \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[4pt] \dfrac{\partial f}{\partial y} \end{bmatrix}$$

From a computer implementation perspective, the image gradient is usually determined in difference form: it is obtained from the difference between the pixel values of adjacent pixels in the x direction or the y direction, as shown in FIG. 1. However, if the memory mechanism of the computer is not taken into account, it is difficult to obtain good performance, because the CPU often cannot find the required data in the cache and has to read it from main memory. The cache is often far faster than main memory, so cache misses are one of the main causes of poor performance.

Specifically, as shown in FIG. 2, based on the principle of locality of computer programs, when a computer reads data from memory it does not read only the data at a single position; it reads multiple adjacent data items at the same time to fill a cache line. Because image matrices in the C language are usually stored in row-major order, the data required for the image gradient operation in the x direction maps directly into one cache line, so cache misses occur with low probability in the x direction. The image gradient operation in the y direction, however, spans rows, and because the storage capacity of a cache line is limited, the data it requires inevitably falls outside what one cache line can hold. Cache misses therefore occur with high probability in the y direction, which slows the determination of the image gradient in a computer implementation.
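The locality argument above can be made concrete with a little index arithmetic. The following is only an illustrative sketch: it assumes a row-major image that starts at a cache-line-aligned address, 4-byte pixels, and a 64-byte cache line (these values are assumptions, not taken from this disclosure), and the helper names `cache_line_of` and `same_line_fraction` are hypothetical.

```python
PIXEL_BYTES = 4        # assumed pixel size
CACHE_LINE_BYTES = 64  # assumed cache-line size


def cache_line_of(row, col, width):
    """Index of the cache line holding pixel (row, col) of a row-major image."""
    byte_offset = (row * width + col) * PIXEL_BYTES
    return byte_offset // CACHE_LINE_BYTES


def same_line_fraction(height, width, d_row, d_col):
    """Fraction of pixels whose neighbor at (row + d_row, col + d_col)
    lies in the same cache line as the pixel itself."""
    hits = total = 0
    for row in range(height - d_row):
        for col in range(width - d_col):
            total += 1
            here = cache_line_of(row, col, width)
            there = cache_line_of(row + d_row, col + d_col, width)
            if here == there:
                hits += 1
    return hits / total


x_fraction = same_line_fraction(64, 64, 0, 1)  # neighbor in the x direction
y_fraction = same_line_fraction(64, 64, 1, 0)  # neighbor in the y direction
print(x_fraction, y_fraction)
```

Under these assumptions most x-direction neighbors share a cache line with the current pixel, while no y-direction neighbor does, which is the asymmetry the background describes.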

Disclosure of Invention

Embodiments of the invention provide an image gradient determination method, apparatus, device, and storage medium, so that an image gradient can be determined rapidly in a computer implementation.

In a first aspect, an embodiment of the present invention provides an image gradient determining method, which may include:

acquiring the pixel value of each pixel point in a digital image loaded into a global memory of a graphics processor;

determining target pixel values from the pixel values according to the storage capacity of a shared memory in the graphics processor, and loading the target pixel values into the shared memory;

and reading the target pixel values from the shared memory, and determining, based on the reading result, the image gradient of the digital image at the pixel position of each target pixel value.

Optionally, loading each target pixel value into the shared memory may include:

for each of the threads running in the graphics processor, loading, by the thread, the first pixel values associated with the thread among the target pixel values into the shared memory;

correspondingly, reading each target pixel value from the shared memory, and determining an image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result, respectively, may include:

and for each thread, after detecting that the thread has received a preset synchronization instruction, reading, by the thread, the second pixel values associated with the thread from the shared memory, and determining from the reading result the image gradient of the digital image at the pixel position of each second pixel value.

On this basis, optionally, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units; after reading each second pixel value associated with the thread from the shared memory based on the thread, the image gradient determining method may further include:

and obtaining a second pixel value stored in the first unit in each storage unit based on the reading result, and broadcasting the obtained result based on the graphic processor.

On this basis, optionally, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units; determining the image gradient of the digital image at the pixel position where the second pixel value is located from the reading result may include:

determining a third pixel value in a second unit in each storage unit and a fourth pixel value in a third unit adjacent to the second unit and belonging to the same bucket in each storage unit based on each second pixel value obtained from the reading result; and determining the image gradient of the digital image at the pixel position where the third pixel value is located according to the third pixel value and the fourth pixel value.

On this basis, optionally, determining the target pixel value from the pixel values may include:

determining a target pixel value from pixel values not loaded to a shared memory;

correspondingly, after the image gradients of the digital image at the pixel positions where the second pixel values are located are respectively determined from the reading results, the image gradient determining method may further include:

and repeating the step of determining target pixel values from the pixel values according to the storage capacity of the shared memory in the graphics processor, until no pixel value remains that has not been loaded into the shared memory.

Optionally, loading each target pixel value into the shared memory may include:

loading the target pixel values into the shared memory and, for each edge pixel value among the target pixel values that is loaded at an edge of the shared memory, loading the neighborhood pixel value that is adjacent to that edge pixel value among the pixel values into a register of the graphics processor;

correspondingly, determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result may include:

for each read target pixel value, if the target pixel value is an edge pixel value, reading a neighborhood pixel value adjacent to the edge pixel value from a register; and determining the image gradient of the digital image at the pixel position where the edge pixel value is located according to the neighborhood pixel value and the edge pixel value.
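The edge handling described above can be sketched as follows. This is a CPU-side illustration, not the disclosed implementation: a nested list stands in for the shared-memory tile, a local variable stands in for the register, the function name `tile_x_gradient` is hypothetical, and the gradient is assumed to be a forward difference (neighbor minus current pixel). The tile is also assumed not to touch the right border of the image.

```python
def tile_x_gradient(image, r0, c0, n):
    """Forward x-differences for the n-by-n tile whose top-left pixel is (r0, c0)."""
    # Stand-in for shared memory: only the n-by-n tile itself is stored here.
    tile = [row[c0:c0 + n] for row in image[r0:r0 + n]]
    grads = []
    for i in range(n):
        # Stand-in for a register: the neighbor of the tile's right-edge
        # pixel in row i, which lies just outside the tile.
        halo = image[r0 + i][c0 + n]
        row_grads = []
        for j in range(n):
            # Interior pixels use the tile; the edge pixel uses the register value.
            right = tile[i][j + 1] if j + 1 < n else halo
            row_grads.append(right - tile[i][j])
        grads.append(row_grads)
    return grads


# Test image with value c*c + r at (r, c), so the x-difference is 2*c + 1.
image = [[c * c + r for c in range(5)] for r in range(4)]
result = tile_x_gradient(image, 0, 0, 4)
```

Keeping the out-of-tile neighbor in a separate variable means the tile never needs an extra column, which mirrors the idea of holding neighborhood pixel values in registers rather than in the shared memory.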

Optionally, the target pixel value includes a target real part value or a target imaginary part value of the magnetic resonance signal, and the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are respectively loaded in different buckets of the shared memory;

determining image gradients of the digital image at pixel positions where the target pixel values are located respectively based on the reading results may include:

acquiring a target real part value and a target imaginary part value which belong to the same magnetic resonance signal from each read target pixel value; and weighting the target real part value and the target imaginary part value, and determining the image gradient of the digital image at the pixel position corresponding to the weighting result based on the reading result.

In a second aspect, an embodiment of the present invention further provides an image gradient determining apparatus, which may include:

a pixel value acquisition module, configured to acquire the pixel value of each pixel point in a digital image loaded into a global memory of a graphics processor;

a shared memory loading module, configured to determine target pixel values from the pixel values according to the storage capacity of a shared memory in the graphics processor, and to load the target pixel values into the shared memory;

and an image gradient determining module, configured to read the target pixel values from the shared memory and to determine, based on the reading result, the image gradient of the digital image at the pixel position of each target pixel value.

In a third aspect, an embodiment of the present invention further provides an image gradient determining device, which may include:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image gradient determination method provided by any embodiment of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the image gradient determining method provided in any embodiment of the present invention.

According to the technical solution of the embodiments of the present invention, target pixel values are determined, according to the storage capacity of the shared memory in the graphics processor, from the pixel values of the pixel points in the digital image loaded into the global memory of the graphics processor. Because the shared memory has a two-dimensional physical structure, loading the target pixel values into the shared memory preserves their locality in two-dimensional space. The target pixel values are then read from the shared memory, and because they retain this two-dimensional locality, cache misses are avoided when the image gradient of the digital image at the pixel position of each target pixel value is determined from the reading result, so that the image gradient is determined faster in a computer implementation.

Drawings

FIG. 1 is a first schematic diagram of a prior art image gradient determination process;

FIG. 2 is a schematic illustration of data reading during image gradient determination in the prior art;

FIG. 3 is a flow chart of a method for determining image gradients in accordance with a first embodiment of the present invention;

FIG. 4 is a flowchart of an image gradient determination method according to a second embodiment of the present invention;

FIG. 5 is a diagram illustrating data reading in an image gradient determination method according to a second embodiment of the present invention;

FIG. 6 is a flowchart of an image gradient determination method according to a third embodiment of the present invention;

FIG. 7 is a flow chart of a method for determining image gradients in a fourth embodiment of the present invention;

FIG. 8 is a schematic diagram of an image gradient determination method applied to MR in a fourth embodiment of the present invention;

FIG. 9 is a block diagram of an apparatus for determining an image gradient according to a fifth embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an image gradient determination apparatus in a sixth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 3 is a flowchart of an image gradient determining method according to an embodiment of the present invention. The embodiment is applicable to the case of quickly determining the image gradient in the computer implementation process, and is particularly applicable to the case of increasing the determination speed of the image gradient by loading the target pixel value into the shared memory with the two-dimensional physical structure. The method may be performed by an image gradient determining apparatus provided by the embodiment of the present invention, which may be implemented by software and/or hardware, and the apparatus may be integrated on an image gradient determining device, which may be various user terminals or servers.

Referring to fig. 3, the method of the embodiment of the present invention specifically includes the following steps:

S110, obtaining the pixel value of each pixel point in the digital image loaded to the global memory of the graphics processor.

The digital image is an image obtained by digitizing an analog image. It uses pixels as its basic elements and can be stored and processed by a digital computer or a digital circuit. The digital image may be, for example, a medical image or a natural image. A medical image is a digital image acquired by a medical imaging device, such as a Magnetic Resonance (MR) image, a Computed Tomography (CT) image, or a Positron Emission Tomography (PET) image. A natural image is a digital image acquired by a general imaging device, which may be a mobile phone, a digital camera, a video camera, or the like.

A Graphics Processing Unit (GPU) is a microprocessor that performs graphics operations on an electronic device. The global memory is memory in the GPU that can be accessed by different threads; it can be understood as the GPU's own main memory (video memory).

In practical applications, optionally, the pixel value of each pixel point in the digital image may first be loaded into host memory and then loaded from host memory into the global memory, after which the pixel values can be obtained from the global memory.

S120, determining a target pixel value from each pixel value according to the storage capacity of the shared memory in the graphics processor, and loading each target pixel value to the shared memory.

The shared memory is a first-level cache in the GPU that can be controlled manually (i.e., managed by the programmer). It is an abstract concept and does not refer to one specific storage structure. It should be noted that any data that fits the storage structure of the shared memory (i.e., the two-dimensional physical structure that the shared memory has) and does not exceed its storage capacity can be loaded into the shared memory. Because the shared memory offers a faster data reading speed, loading data into it is a preferred way to increase the speed of data operations, and it effectively mitigates the access latency between memory and the GPU.

On this basis, the shared memory with its two-dimensional physical structure is well suited to storing two-dimensional images, so the digital image is stored in the shared memory in two-dimensional form. In other words, the shared memory stores the pixel values of the pixel points in the digital image in two-dimensional form, so that the relative positions of the pixel values in the shared memory are consistent with their relative positions in the digital image. Since the storage capacity of the shared memory is limited, it cannot accommodate all pixel values of the digital image at one time. Target pixel values can therefore be determined from the pixel values according to the storage capacity and loaded into the shared memory, where the target pixel values are the pixel values that the shared memory can hold in the current loading pass. For example, assuming that the storage capacity is N × N, N × N target pixel values that have not yet been loaded into the shared memory may be loaded into it.
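The capacity-limited loading described above amounts to walking the image tile by tile. The sketch below is a CPU-side illustration under stated assumptions: nested lists stand in for the global and shared memories, the helper names `iter_tiles` and `load_tile` are hypothetical, and tiles at the image border are simply allowed to be smaller than N × N (the disclosure does not say how partial tiles are handled).

```python
def iter_tiles(height, width, n):
    """Yield (row0, col0, rows, cols) for successive tiles of at most n x n pixels."""
    for row0 in range(0, height, n):
        for col0 in range(0, width, n):
            yield row0, col0, min(n, height - row0), min(n, width - col0)


def load_tile(image, row0, col0, rows, cols):
    """Copy one tile, preserving the relative 2-D positions of its pixels."""
    return [image[r][col0:col0 + cols] for r in range(row0, row0 + rows)]


# 5 x 5 test image whose value at (r, c) is 10*r + c.
image = [[10 * r + c for c in range(5)] for r in range(5)]

# One pass per tile until no pixel remains that has not been loaded.
tiles = [load_tile(image, *t) for t in iter_tiles(5, 5, 2)]
loaded_pixels = sum(len(row) for tile in tiles for row in tile)
```

Each tile keeps pixels in the same relative arrangement they have in the image, which is the property the paragraph above relies on.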

S130, reading each target pixel value from the shared memory, and respectively determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result.

The pixel position is the position of a target pixel value, that is, the position in the digital image of the pixel point corresponding to the target pixel value. The image gradient may be the gradient of the digital image at the pixel position in any direction, such as the gradient in the x direction and/or the gradient in the y direction. On this basis, after the target pixel values are read from the shared memory, the image gradient of the digital image at each pixel position can be determined from the target pixel value at that pixel position and the target pixel values at the adjacent pixel positions. Illustratively, as shown in FIG. 1, if the pixel position is (0, 0) and the target pixel values at (0, 0), (0, 1), and (1, 0) have been read, then the image gradient of the digital image at (0, 0) in the x direction can be determined from the target pixel values at (0, 0) and (0, 1), and the image gradient at (0, 0) in the y direction can be determined from the target pixel values at (0, 0) and (1, 0).
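The (0, 0) example can be written out as a small forward-difference calculation. This is a minimal sketch: the function name `forward_gradient` and the sample values are hypothetical, and the sign convention (neighbor minus current pixel) is an assumption, since the text only states which pixel values are involved.

```python
def forward_gradient(values, row, col):
    """Return (gx, gy) at (row, col) using forward differences.

    Indices are (row, col); x runs along columns and y along rows,
    matching the (0, 0), (0, 1), (1, 0) example in the text.
    """
    gx = values[row][col + 1] - values[row][col]
    gy = values[row + 1][col] - values[row][col]
    return gx, gy


# Hypothetical target pixel values already read from the shared memory.
tile = [[3, 7, 8],
        [5, 6, 9],
        [4, 2, 1]]
gx, gy = forward_gradient(tile, 0, 0)
```

Here `gx` uses the values at (0, 0) and (0, 1), and `gy` uses the values at (0, 0) and (1, 0).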

It should be noted that the shared memory stores the target pixel values in a data structure of buckets (banks). The buckets form a two-dimensional data structure in the GPU: each row has 32 storage spaces, and one storage space is generally 4 bytes. The GPU can generally have 32 threads running simultaneously, and ideally the 32 threads access, in parallel, the one column of data corresponding to each bucket, one thread per bucket. If two or more threads access different data in the same column, a bucket access conflict (bank conflict) may occur and delay the data read. On this basis, for target pixel values in the x direction, the target pixel values read by adjacent threads may be pixel values at the same address in the same bank (in the embodiments of the present invention a bank is a bucket, and the shared memory is implemented by buckets). When different threads access a given target pixel value of the same bank, an inter-thread broadcast occurs: one thread accesses the target pixel value in the bucket, and the value is then broadcast to the other threads that need it, so that not every thread actually performs the access. For target pixel values in the y direction, each thread reads pixel values at different addresses within the same bank. Consequently, reading target pixel values in neither the x direction nor the y direction produces a bank conflict, and the locality of the target pixel values in two-dimensional space is ensured. Because the target pixel values involved in determining the image gradient are all stored in the shared memory, no data needs to be loaded from memory at that point, which resolves the cache miss problem in both the x direction and the y direction and increases the speed of image gradient determination in a computer implementation.
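The bucket behavior described above can be modeled with simple modular arithmetic. The sketch below is a simplified model, not a description of any particular GPU: it assumes that successive 4-byte words are assigned to successive buckets, that a tile row is 32 words wide, and that requests for the same word are merged by broadcast. The names `bank_of` and `conflict_degree` are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32  # buckets per row, as stated in the description above


def bank_of(word_index):
    """Bucket holding a 4-byte word when consecutive words go to consecutive buckets."""
    return word_index % NUM_BANKS


def conflict_degree(word_indices):
    """Largest number of distinct words one bucket must serve for these requests.

    1 means conflict-free; requests for the same word count once (broadcast).
    """
    words_per_bank = defaultdict(set)
    for word in word_indices:
        words_per_bank[bank_of(word)].add(word)
    return max(len(words) for words in words_per_bank.values())


TILE_WIDTH = 32
row = 3
center = [row * TILE_WIDTH + t for t in range(32)]       # thread t reads (row, t)
below = [(row + 1) * TILE_WIDTH + t for t in range(32)]  # thread t reads (row + 1, t)
same_word = [row * TILE_WIDTH] * 32                      # every thread reads one word
one_column = [t * TILE_WIDTH for t in range(32)]         # thread t reads (t, 0)

degrees = [conflict_degree(a) for a in (center, below, same_word, one_column)]
```

In this model, reading a pixel and its y-direction neighbor keeps every thread in its own bucket, and 32 reads of one word collapse into a single access; only the last pattern, in which 32 threads read 32 different words of one bucket, is serialized.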

According to the technical scheme, the target pixel value is determined from the pixel value of each pixel point in the digital image loaded into the global memory of the graphic processor through the storage capacity of the shared memory in the graphic processor, and the shared memory has a two-dimensional physical structure, so that when each target pixel value is loaded into the shared memory, the locality of the target pixel value on a two-dimensional space is better ensured; furthermore, each target pixel value is read from the shared memory, and because each target pixel value has locality in a two-dimensional space, when the image gradient of the digital image at the pixel position where each target pixel value is located is determined respectively through the reading result, the condition of cache miss is avoided, and therefore the determination speed of the image gradient is improved in the computer implementation process.

Example two

FIG. 4 is a flowchart of an image gradient determination method provided in the second embodiment of the present invention. This embodiment is optimized on the basis of the above technical solutions. In this embodiment, optionally, loading the target pixel values into the shared memory may specifically include: for each of the threads running in the graphics processor, loading, by the thread, the first pixel values associated with the thread among the target pixel values into the shared memory. Correspondingly, reading the target pixel values from the shared memory and determining, based on the reading result, the image gradient of the digital image at the pixel position of each target pixel value may include: for each thread, after the thread receives a preset synchronization instruction, reading, by the thread, the second pixel values associated with the thread from the shared memory, and determining from the reading result the image gradient of the digital image at the pixel position of each second pixel value. Explanations of terms that are the same as or correspond to those in the above embodiments are omitted here.

Referring to fig. 4, the method of this embodiment may specifically include the following steps:

S210, obtaining the pixel value of each pixel point in the digital image loaded into the global memory of the graphics processor, and determining a target pixel value from each pixel value according to the storage capacity of the shared memory in the graphics processor.

S220, for each of the threads running in the graphics processor, loading, by the thread, the first pixel values associated with the thread among the target pixel values into the shared memory.

The GPU runs a plurality of threads, and the target pixel values can be loaded into the shared memory by these threads. Specifically, each of the threads may be responsible for loading into the shared memory some of the target pixel values, whose total number equals the storage capacity. These are the target pixel values that the thread is concerned with, i.e., the target pixel values associated with the thread during data loading, and for convenience of description they are referred to here as first pixel values. For example, for N × N target pixel values, a thread needs to load the M × M first pixel values associated with it from the N × N target pixel values into the shared memory. As another example, since the shared memory stores data in a data structure of buckets, the shared memory may include a plurality of buckets; thread 1 may then load the first pixel values associated with it into bucket 1, that is, its first pixel values are the target pixel values to be loaded into bucket 1; thread 2 may load the first pixel values to be loaded into bucket 2; and so on.

It should be noted that, after each thread loads the corresponding first pixel value into the shared memory, it may start to perform a wait operation, because each thread may execute in parallel when performing the data loading operation, and there may be a difference in the execution speed of each thread, so that after a thread loads the first pixel value associated with the thread into the shared memory, it does not know whether the other threads have loaded the corresponding first pixel value into the shared memory, and therefore the wait operation needs to be performed until all threads have loaded the corresponding first pixel value into the shared memory, and then perform the subsequent data reading operation.

S230, for each thread, after the thread receives a preset synchronization instruction, reading each second pixel value associated with the thread from the shared memory based on the thread, and respectively determining the image gradient of the digital image at the pixel position where each second pixel value is located from the reading result.

The image gradient determining device may be provided with a preset synchronization instruction, which may be an instruction sent to the threads after it is detected that every thread has completed its data loading operation. Therefore, for any one of the threads, when it is detected that the thread has received the preset synchronization instruction, which indicates that all threads have completed the data loading operation, the second pixel values associated with the thread may be read from the shared memory by the thread. Specifically, the target pixel values stored in the shared memory can be accessed by all threads. That is, if a certain target pixel value was loaded into the shared memory by thread 1, it can be accessed by thread 1 and also by threads other than thread 1, which is the meaning of "shared" in "shared memory". In the data reading process, a thread reads only the data it is concerned with, which may differ from the data it was concerned with during data loading. Continuing the above example, the data that thread 1 is concerned with during data loading is the target pixel values to be loaded into bucket 1, while the data it is concerned with during data reading is the target pixel values stored in bucket 1 and the target pixel values stored in bucket 2. Accordingly, for ease of description, the target pixel values that a thread is concerned with during data reading are referred to as the second pixel values associated with the thread.

Further, the image gradient of the digital image at the pixel position of each second pixel value may be determined based on the reading result of the thread; the specific implementation is described in the first embodiment of the present invention and is not repeated here. For example, as shown in FIG. 5, taking thread 1 as an example, the second pixel values associated with thread 1 may be the target pixel value in the second unit of bucket 1, the target pixel value in the third unit of bucket 1, and the target pixel value in the first unit of bucket 2. That is, thread 1 may access a target pixel value that thread 2 loaded into the shared memory, and the image gradient determined from these target pixel values may be the gradient of the digital image at the pixel position of the second unit.
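The load, synchronize, then read sequence described above can be imitated on a CPU with a barrier. This is only an analogy under stated assumptions: Python threads and `threading.Barrier` stand in for GPU threads and for the preset synchronization instruction, which the disclosure does not name (in CUDA, a block-level barrier such as `__syncthreads()` plays a comparable role). The list names and the sample values are hypothetical.

```python
import threading

NUM_THREADS = 4
source = [5, 9, 14, 20]                 # stand-in for pixel values in global memory
shared = [None] * NUM_THREADS           # stand-in for the shared memory
gradients = [None] * (NUM_THREADS - 1)
barrier = threading.Barrier(NUM_THREADS)


def worker(tid):
    # Loading phase: each thread writes only its own "first pixel value".
    shared[tid] = source[tid]
    # Stand-in for the preset synchronization instruction: block here
    # until every thread has finished its loading phase.
    barrier.wait()
    # Reading phase: a thread may now read a value loaded by another thread.
    if tid + 1 < NUM_THREADS:
        gradients[tid] = shared[tid + 1] - shared[tid]


threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the barrier, a fast thread could read `shared[tid + 1]` before its neighbor had written it, which is the hazard the waiting operation is meant to prevent.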

According to this technical solution, for each of the threads running in the graphics processor, the first pixel values associated with the thread among the target pixel values are loaded into the shared memory by the thread, and loading data in parallel across the threads increases the data loading speed. After all threads have completed the data loading operation, each thread reads the second pixel values associated with it from the shared memory, and the image gradient of the digital image at the pixel position of each second pixel value is determined from the reading result.

In an optional technical solution, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units, that is, the shared memory may store data through the storage units in the bucket; after reading each second pixel value associated with the thread from the shared memory based on the thread, the image gradient determining method may further include: and obtaining a second pixel value stored in the first unit in each storage unit based on the reading result, and broadcasting the obtained result based on the graphics processor.

The shared memory has O buckets, and the GPU can run O threads at a time. The ideal mode of operation is for different threads to access different buckets, which ensures the fastest data reading speed. By analogy, it is like a highway with O lanes, where traffic moves fastest when each lane carries only one vehicle. When different threads access the same bucket, a bank conflict occurs. In practical applications, however, it is quite likely that at least two threads need to access the target pixel value in the same storage unit of the same bucket, that is, data at the same address in the same bucket. Illustratively, as shown in FIG. 5, threads 1 and 2 both need to access the target pixel value in the first unit of bucket 2, which would result in a bank conflict if accessed directly.

To solve this problem, the embodiment of the present invention broadcasts the target pixel value through a broadcasting mechanism, so that at least two threads can access the same data on an equal footing. Specifically, the first unit is one of the storage units. After a thread reads the second pixel value that is associated with it and stored in the first unit, the read second pixel value can be broadcast by the GPU, so that the other threads that need to read the first unit obtain the second pixel value from the broadcast result. Threads with the same reading requirement are threads that need to read the data at the same address in the same bucket. In this technical solution, the broadcasting mechanism resolves the bank conflict problem in the x direction, so that the image gradient in the x direction is determined efficiently.
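The broadcasting idea can be illustrated by serving every distinct storage unit once and handing the value to all threads that asked for it. The sketch below is a software model only; on a GPU the broadcast is performed by the hardware, and the function name `broadcast_read`, the dictionary layout, and the sample values are hypothetical.

```python
def broadcast_read(buckets, requests):
    """Serve per-thread read requests, fetching each distinct (bucket, unit) once.

    requests maps a thread id to the (bucket, unit) pair it wants to read.
    Returns (value delivered to each thread, number of fetches performed).
    """
    fetched = {}
    for location in requests.values():
        if location not in fetched:
            bucket, unit = location
            fetched[location] = buckets[bucket][unit]  # single access to the unit
    values = {tid: fetched[location] for tid, location in requests.items()}
    return values, len(fetched)


buckets = {1: [11, 12, 13], 2: [21, 22, 23]}
# Threads 1 and 2 both want the value at index 0 of bucket 2, as in the FIG. 5 example.
requests = {1: (2, 0), 2: (2, 0)}
values, fetches = broadcast_read(buckets, requests)
```

Both threads receive the same value while the unit is accessed only once, which is why a shared read of one address does not behave like a conflict.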

In an optional technical solution, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units; determining from the read results the image gradient of the digital image at the pixel position at which the second pixel value is located, respectively, may include: determining a third pixel value in a second unit in each storage unit and a fourth pixel value in a third unit adjacent to the second unit and belonging to the same bucket in each storage unit based on each second pixel value obtained from the reading result; and determining the image gradient of the digital image at the pixel position where the third pixel value is located according to the third pixel value and the fourth pixel value.

The second unit and the third unit are adjacent storage units in the same bucket. The third pixel value is the target pixel value stored in the second unit, and the fourth pixel value is the target pixel value stored in the third unit. The third pixel value and the fourth pixel value are determined from the read second pixel values, and the image gradient of the digital image at the pixel position where the third pixel value is located is determined according to the third pixel value and the fourth pixel value. Illustratively, as shown in fig. 5, the third pixel value in the second unit and the fourth pixel value in the third unit of bucket 1 are obtained, and the image gradient of the digital image at the pixel position of the second unit is determined according to the third pixel value and the fourth pixel value. In this technical solution, the buckets can store two-dimensional data, that is, target pixel values that are adjacent in the y direction are stored in different storage units of the same bucket, so the corresponding target pixel values can be read directly from the shared memory during image gradient determination without reading data again from the global memory, which allows the image gradient in the y direction to be determined quickly.
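The layout described above can be sketched in a few lines. The sketch assumes that the tile width equals the number of buckets, so that vertically adjacent pixels fall into adjacent storage units of one bucket, and it assumes a forward difference for the gradient; neither assumption is stated in the text, and the function names are hypothetical.

```python
def bucket_and_unit(row, col, width, num_buckets):
    """Map a tile coordinate to (bucket index, storage-unit index)."""
    word = row * width + col
    return word % num_buckets, word // num_buckets


def y_gradient(tile, row, col):
    """Forward difference in y between two vertically adjacent entries."""
    third_value = tile[row][col]        # value held in the "second unit"
    fourth_value = tile[row + 1][col]   # value held in the adjacent "third unit"
    return fourth_value - third_value
```

With `width == num_buckets == 4`, the coordinates (0, 1) and (1, 1) map to units 0 and 1 of bucket 1, so both operands of the y-direction difference come from the same bucket.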

In an optional technical solution, determining a target pixel value from the pixel values may include: determining a target pixel value from the pixel values not yet loaded to the shared memory; correspondingly, after the image gradients of the digital image at the pixel positions where the second pixel values are located are respectively determined from the reading results, the image gradient determining method may further include: repeating the step of determining a target pixel value according to the storage capacity of the shared memory in the graphics processor until no pixel values remain that have not been loaded to the shared memory. Because the shared memory has a limited storage capacity, it cannot hold all pixel values of the digital image at one time, so the pixel values are loaded into the shared memory in batches, and each batch of target pixel values is determined, according to the storage capacity, from the pixel values not yet loaded. Correspondingly, after each thread in the GPU completes its image gradient determination operation, it loads the next batch of not-yet-loaded target pixel values, whose quantity equals the storage capacity, and reads the data in the shared memory to obtain the corresponding image gradients; this cycle repeats until all pixel values have been loaded into the shared memory, so that the image gradient of the digital image at each pixel position is determined. For example, assuming that the threads in the GPU have processed the N × N pixel values located at the top left corner of the digital image in one run, they may then process the N × N pixel values located at positions N+1 to 2N in the x direction and N+1 to 2N in the y direction, and so on until all pixel values have been processed.
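The batch-wise loading can be pictured as a loop over N × N tiles. The following is a minimal sketch, assuming a row-by-row tile order (the text does not fix the order) and using the hypothetical helpers `tile_origins` and `load_tile`; on a GPU the tiles would typically be handled by thread blocks rather than by a sequential loop.

```python
def tile_origins(width, height, n):
    """Yield the top-left (x, y) corner of every n-by-n tile, row by row."""
    for y in range(0, height, n):
        for x in range(0, width, n):
            yield x, y


def load_tile(image, x0, y0, n):
    """Copy one tile out of the image, clipping at the image border."""
    return [row[x0:x0 + n] for row in image[y0:y0 + n]]
```

For a 288 × 240 image and N = 8, `tile_origins(288, 240, 8)` yields 36 × 30 = 1080 tile corners, each of which corresponds to one load of target pixel values into the shared memory.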

Example three

Fig. 6 is a flowchart of an image gradient determining method provided in the third embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the loading each target pixel value to the shared memory may include: loading each target pixel value to a shared memory, and loading a neighborhood pixel value adjacent to the edge pixel value in each pixel value to a register of a graphic processor aiming at the edge pixel value loaded to the edge of the shared memory in each target pixel value; correspondingly, determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result may include: for each read target pixel value, if the target pixel value is an edge pixel value, reading a neighborhood pixel value adjacent to the edge pixel value from a register; and determining the image gradient of the digital image at the pixel position where the edge pixel value is located according to the neighborhood pixel value and the edge pixel value. The same or corresponding terms as those in the above embodiments are not explained in detail herein.

Referring to fig. 6, the method of this embodiment may specifically include the following steps:

S310, obtaining the pixel value of each pixel point in the digital image loaded into the global memory of the graphic processor, and determining a target pixel value from each pixel value according to the storage capacity of the shared memory in the graphic processor.

S320, loading each target pixel value into the shared memory, and for an edge pixel value at an edge of each target pixel value loaded into the shared memory, loading a neighborhood pixel value adjacent to the edge pixel value in each pixel value into a register of the graphics processor.

Because the shared memory has a limited storage capacity and cannot hold all pixel values of the digital image at one time, only the target pixel values determined from the pixel values according to the storage capacity are loaded into the shared memory. It should be noted that, for the target pixel values loaded into the shared memory, the image gradients at the pixel positions of the target pixel values on the edge cannot be calculated directly from the target pixel values stored in the shared memory. For example, as shown in fig. 5, determining the image gradient of the digital image at the pixel position of a target pixel value on the edge involves the target pixel value stored at that position and the neighborhood pixel value below it, but the neighborhood pixel value has not yet been loaded into the shared memory. The image gradient at that pixel position therefore cannot be obtained from the data stored in the shared memory, that is, a cache miss occurs when determining the image gradient at that pixel position.

To solve the above problem, the embodiments of the present invention effectively apply the global memory and the registers in the GPU together, wherein the registers are high-speed memory units with limited memory capacity inside the GPU, and can be used for temporarily storing instructions, data, and addresses. Specifically, for an edge pixel value loaded to an edge of the shared memory in each target pixel value, that is, the edge pixel value is a target pixel value located at the edge of the shared memory, a neighborhood pixel value adjacent to the edge pixel value in each pixel value is loaded to a register of the graphics processor, so that when the neighborhood pixel values are required to be involved later, the neighborhood pixel values can be directly loaded into the shared memory from the register.

S330, reading each target pixel value from the shared memory, and for each read target pixel value, if the target pixel value is an edge pixel value, reading a neighborhood pixel value adjacent to the edge pixel value from the register.

When each target pixel value is read from the shared memory, for the target pixel values at the edge of the shared memory (namely the edge pixel values), the neighborhood pixel values adjacent to the edge pixel values also need to be read from the register. The benefit of this arrangement is that the register and the shared memory have similar data reading speeds, so even if the corresponding neighborhood pixel values have to be read from the register during image gradient determination, the determination speed of the image gradient is not greatly affected, which further alleviates the slow image gradient determination caused by cache misses.

On this basis, the neighborhood pixel values described in the embodiments of the present invention may be combined with the repeated execution process in the foregoing technical solution. Continuing the above example, while each thread processes the N × N pixel values located at the upper left corner of the digital image, it may read the neighborhood pixel values adjacent to the edge pixel values of the current run from the register into the shared memory to determine the corresponding image gradients. After these N × N pixel values are processed, the threads may process the pixel values at positions N+1 to 2N in the x direction and N+1 to 2N in the y direction, and the process repeats until all pixel values are processed.

S340, determining the image gradient of the digital image at the pixel position where the edge pixel value is located according to the neighborhood pixel value and the edge pixel value.

According to the technical scheme of this embodiment, when the target pixel values are loaded, the neighborhood pixel values adjacent to the edge pixel values located at the edge of the shared memory are loaded into the register, which solves the cache miss problem when determining the image gradient at the pixel positions of the edge pixel values. The neighborhood pixel values then only need to be read from the register into the shared memory, so that the coalescing mechanism of global memory data access is used to further reduce the performance loss caused by cache misses.
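The handling of edge pixel values in steps S320 to S340 can be modelled on the host as follows. This is a sketch under several assumptions that the text does not state: the gradient is a forward difference, the edges that need neighborhood values are the right and bottom edges of the tile, the tile lies fully inside the image, and neighbors outside the image are replicated from the border. The names `load_tile_with_halo` and `gradient_at` are hypothetical; the `tile` list stands in for the shared memory and the two halo lists stand in for registers.

```python
def load_tile_with_halo(image, x0, y0, n):
    """Return (tile, right_halo, bottom_halo) for the n-by-n tile at (x0, y0).

    right_halo[r] is the neighbour to the right of the tile's last column
    in row r; bottom_halo[c] is the neighbour below the last row in column c.
    """
    height, width = len(image), len(image[0])
    tile = [[image[y0 + r][x0 + c] for c in range(n)] for r in range(n)]
    xr = min(x0 + n, width - 1)    # replicate the border if outside the image
    yb = min(y0 + n, height - 1)
    right_halo = [image[y0 + r][xr] for r in range(n)]
    bottom_halo = [image[yb][x0 + c] for c in range(n)]
    return tile, right_halo, bottom_halo


def gradient_at(tile, right_halo, bottom_halo, r, c):
    """Forward-difference gradient (gx, gy) at tile position (r, c)."""
    n = len(tile)
    # Interior pixels read the tile; edge pixels read the halo values.
    right = tile[r][c + 1] if c + 1 < n else right_halo[r]
    below = tile[r + 1][c] if r + 1 < n else bottom_halo[c]
    return right - tile[r][c], below - tile[r][c]
```

Only the pixels in the last row or last column of the tile touch the halo values, which mirrors the check in S330 of whether a target pixel value is an edge pixel value.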

Example four

Fig. 7 is a flowchart of an image gradient determining method provided in the fourth embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the target pixel value may include a target real part value or a target imaginary part value of the magnetic resonance signal, and the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are respectively loaded in different buckets of the shared memory; determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result, respectively, may include: acquiring a target real part value and a target imaginary part value which belong to the same magnetic resonance signal from each read target pixel value; and weighting the target real part value and the target imaginary part value, and determining the image gradient of the digital image at the pixel position corresponding to the weighting result based on the reading result. The same or corresponding terms as those in the above embodiments are not explained in detail herein.

Referring to fig. 7, the method of this embodiment may specifically include the following steps:

S410, obtaining the pixel value of each pixel point in the digital image loaded into the global memory of the graphic processor.

S420, determining a target pixel value from the pixel values according to the storage capacity of a shared memory in the graphic processor, and loading the target pixel values into the shared memory, wherein the target pixel value comprises a target real part value or a target imaginary part value of the magnetic resonance signal, and the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are respectively loaded in different buckets of the shared memory.

The MR signals acquired by scanning in MR are complex, that is, one MR signal is composed of two floating point values, a real part and an imaginary part. If a complex type is defined directly in the shared memory, bank conflicts inevitably occur in more buckets, because each complex value needs 2 adjacent banks for storage; many threads then cannot execute in parallel, and the operation performance is not improved. Therefore, the real part and the imaginary part of an MR signal can be stored in 2 separate buckets, that is, the target pixel value can be a target real part value or a target imaginary part value of the magnetic resonance signal, and when the target pixel values are loaded into the shared memory, the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are loaded into different buckets of the shared memory, thereby avoiding the bank conflicts that might otherwise occur when reading the target pixel values.

S430, reading each target pixel value from the shared memory, and acquiring a target real part value and a target imaginary part value which belong to the same magnetic resonance signal from each read target pixel value.

S440, weighting the target real part value and the target imaginary part value, and determining the image gradient of the digital image at the pixel position corresponding to the weighting result based on the reading result, that is, determining the image gradient of the digital image at the pixel position where the MR signal corresponding to the weighting result is located.

According to the technical scheme of the embodiment of the invention, the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are respectively loaded in different buckets of the shared memory, so that the problem of bank conflict possibly existing in the reading process of the target real part value and the target imaginary part value is solved, and the effect of quickly determining the image gradient is achieved.
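The difference between storing a complex type directly and storing real and imaginary parts separately can be illustrated with a simplified bank model. This is a sketch, assuming 32 buckets that are each one 4-byte float wide and one word read per thread; actual hardware may handle wider accesses differently, and the function names are hypothetical.

```python
NUM_BANKS = 32  # assumed number of buckets, each one 4-byte float wide


def serialized_passes(word_indices):
    """Largest number of distinct words that land in any single bucket."""
    words_per_bank = {}
    for word in word_indices:
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


def interleaved_real_words(num_threads):
    """Thread t reads the real part of element t stored as (re, im) pairs."""
    return [2 * t for t in range(num_threads)]


def split_real_words(num_threads):
    """Thread t reads real[t] from a region that holds real parts only."""
    return [t for t in range(num_threads)]
```

In this model, 32 threads reading the real parts of interleaved complex values need two passes, because the stride of two words maps pairs of threads to the same bucket, while the split layout needs one pass.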

To better understand the specific implementation of the above steps, the image gradient determining method of the embodiments of the present invention is described below with reference to a specific example. As shown in fig. 8, when the image gradient determination method is applied to the conjugate gradient descent method in a compressed sensing algorithm of MR, the MR signals stored in the global memory are loaded into the shared memory by the threads, where the target real part value and the target imaginary part value belonging to the same MR signal are loaded into different buckets of the shared memory, and the threads wait for synchronization until they are detected to have received a preset synchronization instruction. Then, for each thread, whether the target pixel value it loaded into the shared memory is located at the edge of the shared memory is determined according to the relationship between the index of the thread and the shared memory. If so, the MR signals in the global memory that correspond to the edge pixel values on the edge are loaded into a register and then from the register into the shared memory, so that the thread can read the data it needs directly from the shared memory, and the read target real part value and target imaginary part value belonging to the same magnetic resonance signal are weighted and then differenced to obtain the corresponding image gradient. Otherwise, the thread reads the data it needs directly from the shared memory, and the read target real part value and target imaginary part value belonging to the same magnetic resonance signal are weighted and then differenced to obtain the corresponding image gradient. Finally, the operation result is stored in a register.
Experiments show that, according to the analysis results obtained with the Nsight Compute performance analysis tool provided by NVIDIA, under the same thread configuration this scheme has significantly higher GPU (SM Frequency) utilization and memory bandwidth utilization than the prior art, and the single call time is also improved by 2 microseconds. When the scheme is executed in a compressed sensing algorithm, the Nsight Systems performance analysis tool shows that the total call time is reduced by about 25% compared with the prior art.

In addition to the above experiment, the applicant also performed experiments on digital images with a dimension of 288 × 240: if a shared memory with a dimension of 8 × 8 is used, the cache hit rate can be increased by more than 86%, and the algorithm execution speed can be increased by 25% on average.

Example five

Fig. 9 is a block diagram of a configuration of an image gradient determining apparatus according to a fifth embodiment of the present invention, where the apparatus is configured to execute the image gradient determining method according to any of the embodiments. The image gradient determining apparatus and the image gradient determining method of the above embodiments belong to the same inventive concept, and details that are not described in detail in the embodiments of the image gradient determining apparatus may refer to the embodiments of the image gradient determining method described above. As shown in fig. 9, the apparatus may specifically include: a pixel value obtaining module 510, a shared memory loading module 520, and an image gradient determining module 530.

The pixel value obtaining module 510 is configured to obtain a pixel value of each pixel point in a digital image loaded into a global memory of a graphics processor;

a shared memory loading module 520, configured to determine a target pixel value from each pixel value according to a storage capacity of a shared memory in the graphics processor, and load each target pixel value to the shared memory;

an image gradient determining module 530, configured to read each target pixel value from the shared memory, and determine an image gradient of the digital image at a pixel position where each target pixel value is located based on the read result.

Optionally, the shared memory loading module 520 may specifically include:

the shared memory loading unit is used for loading, for each of the threads running in the graphics processor, each first pixel value associated with the thread among the target pixel values into the shared memory based on the thread;

accordingly, the image gradient determination module 530 may be specifically configured to:

On this basis, optionally, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units; the image gradient determination apparatus may further include:

and the data broadcasting module is used for obtaining the second pixel values stored in the first units in the storage units based on the reading result after reading the second pixel values associated with the threads from the shared memory based on the threads, and broadcasting the obtained result based on the graphics processor.

On this basis, optionally, the shared memory stores each first pixel value in a data structure of a bucket, each bucket includes a plurality of storage units, and the first pixel values are stored in the corresponding storage units; the image gradient determining module 530 may specifically include:

a fourth pixel value determining unit, configured to determine, based on each second pixel value obtained from the reading result, a third pixel value located in the second unit in each storage unit and a fourth pixel value located in a third unit adjacent to the second unit and belonging to the same bucket in each storage unit;

and the first image gradient determining unit is used for determining the image gradient of the digital image at the pixel position where the third pixel value is located according to the third pixel value and the fourth pixel value.

On this basis, optionally, the shared memory loading module 520 may specifically include: a target pixel value determining unit, configured to determine a target pixel value from pixel values that are not loaded into the shared memory;

accordingly, the image gradient determination apparatus may further include:

and the repeated execution module is used for, after the image gradient of the digital image at the pixel position where each second pixel value is located is respectively determined from the reading result, repeatedly executing the step of determining a target pixel value according to the storage capacity of the shared memory in the graphic processor, until no pixel value remains that has not been loaded to the shared memory.

Optionally, the shared memory loading module 520 may specifically include:

the register loading unit is used for loading each target pixel value to the shared memory and loading a neighborhood pixel value adjacent to the edge pixel value in each pixel value to a register of the graphic processor aiming at the edge pixel value loaded to the edge of the shared memory in each target pixel value;

accordingly, the image gradient determining module 530 may specifically include:

a neighborhood pixel value reading unit, configured to, for each read target pixel value, read a neighborhood pixel value adjacent to an edge pixel value from the register if the target pixel value is the edge pixel value;

and the second image gradient determining unit is used for determining the image gradient of the digital image at the pixel position where the edge pixel value is located according to the neighborhood pixel value and the edge pixel value.

Optionally, the target pixel value includes a target real part value or a target imaginary part value of the magnetic resonance signal, and the target real part value and the target imaginary part value belonging to the same magnetic resonance signal are respectively loaded in different buckets of the shared memory, and the image gradient determining module 530 may specifically include:

a real part and imaginary part reading unit, configured to obtain a target real part value and a target imaginary part value belonging to the same magnetic resonance signal from each read target pixel value;

and the third image gradient determining unit is used for weighting the target real part value and the target imaginary part value and determining the image gradient of the digital image at the pixel position corresponding to the weighting result based on the reading result.

The image gradient determining apparatus provided in the fifth embodiment of the present invention determines, through mutual cooperation of the pixel value obtaining module and the shared memory loading module, a target pixel value from pixel values of each pixel point in a digital image loaded into a global memory of the graphics processor based on a storage capacity of the shared memory in the graphics processor, and because the shared memory has a two-dimensional physical structure, when each target pixel value is loaded into the shared memory, locality of the target pixel value on a two-dimensional space is better ensured; and then the image gradient determining module reads each target pixel value from the shared memory, and because each target pixel value has locality in a two-dimensional space, the condition of cache miss is avoided when the image gradient of the digital image at the pixel position where each target pixel value is located is determined respectively through the reading result, so that the determining speed of the image gradient is improved in the implementation process of a computer. According to the device, the locality of the target pixel value in the digital image on a two-dimensional space is ensured by loading the target pixel value in the digital image into the shared memory with a two-dimensional physical structure, and further, when the image gradient is determined based on the target pixel value read from the shared memory, the condition of cache miss is avoided, so that the effect of improving the determination speed of the image gradient in the computer implementation process is achieved.

The image gradient determining device provided by the embodiment of the invention can execute the image gradient determining method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.

It should be noted that, in the embodiment of the image gradient determining apparatus, the included units and modules are only divided according to the functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

Example six

Fig. 10 is a schematic structural diagram of an image gradient determining apparatus according to a sixth embodiment of the present invention, as shown in fig. 10, the apparatus includes a memory 610, a processor 620, an input device 630, and an output device 640. The number of processors 620 in the device may be one or more, and one processor 620 is taken as an example in fig. 10; the memory 610, processor 620, input device 630, and output device 640 in the apparatus may be connected by a bus or other means, such as by bus 650 in fig. 10.

The memory 610 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the image gradient determination method in the embodiment of the present invention (e.g., the pixel value obtaining module 510, the shared memory loading module 520, and the image gradient determination module 530 in the image gradient determination device). The processor 620 executes software programs, instructions and modules stored in the memory 610 to perform various functional applications of the device and data processing, i.e., to implement the image gradient determination method described above.

The memory 610 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 610 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 610 may further include memory located remotely from processor 620, which may be connected to devices through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the device. The output device 640 may include a display device such as a display screen.

Example seven

An embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for image gradient determination, the method comprising:

acquiring a pixel value of each pixel point in a digital image loaded into a global memory of a graphic processor; determining a target pixel value from each pixel value according to the storage capacity of a shared memory in the graphic processor, and loading each target pixel value to the shared memory;

Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the image gradient determination method provided by any embodiment of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. With this understanding in mind, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. An image gradient determination method, comprising:

acquiring a pixel value of each pixel point in a digital image loaded into a global memory of a graphic processor;

determining a target pixel value from each pixel value according to the storage capacity of a shared memory in the graphics processor, and loading each target pixel value to the shared memory;

2. The method of claim 1, wherein said loading each of said target pixel values into said shared memory comprises:

for each of the threads running in the graphics processor, loading the first pixel values associated with the thread among the target pixel values into the shared memory based on the thread;

correspondingly, the reading each target pixel value from the shared memory, and respectively determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result includes:

3. The method of claim 2, wherein the shared memory stores the first pixel values in a data structure of buckets, each bucket including a plurality of storage locations, and the first pixel values are stored in the respective storage locations; after the reading, based on the thread, each second pixel value associated with the thread from the shared memory, the method further includes:

and obtaining the second pixel value stored in the first unit in each storage unit based on the reading result, and broadcasting the obtained result based on the graphics processor.

4. The method of claim 2, wherein the shared memory stores the first pixel values in a bucket data structure, each bucket comprising a plurality of storage units, and each first pixel value is stored in a corresponding storage unit; the determining, from the reading result, the image gradient of the digital image at the pixel position where each second pixel value is located comprises:

determining, based on the second pixel values obtained from the reading result, a third pixel value located in a second unit among the storage units and a fourth pixel value located in a third unit among the storage units, the third unit being adjacent to the second unit and belonging to the same bucket;

and determining the image gradient of the digital image at the pixel position where the third pixel value is located according to the third pixel value and the fourth pixel value.

5. The method of claim 2, wherein determining a target pixel value from each of the pixel values comprises:

determining a target pixel value from each of the pixel values not loaded into the shared memory;

correspondingly, after the determining the image gradient of the digital image at the pixel position where each second pixel value is located from the reading result, the method further includes:

and repeating the step of determining target pixel values according to the storage capacity of the shared memory in the graphics processor, until no pixel values remain that have not been loaded into the shared memory.

6. The method of claim 1, wherein said loading each of said target pixel values into said shared memory comprises:

loading each target pixel value into the shared memory, and, for each edge pixel value among the target pixel values that is loaded at an edge of the shared memory, loading a neighborhood pixel value, among the pixel values, that is adjacent to the edge pixel value into a register of the graphics processor;

correspondingly, the determining the image gradient of the digital image at the pixel position where each target pixel value is located based on the reading result respectively includes:

for each read target pixel value, if the target pixel value is the edge pixel value, reading the neighborhood pixel value adjacent to the edge pixel value from the register;

and determining the image gradient of the digital image at the pixel position where the edge pixel value is located according to the neighborhood pixel value and the edge pixel value.

7. The method according to claim 1, wherein the target pixel value comprises a target real part value or a target imaginary part value of a magnetic resonance signal, the target real part value and the target imaginary part value belonging to the same magnetic resonance signal being loaded in different buckets of the shared memory, respectively;

the determining, based on the reading result, an image gradient of the digital image at a pixel position where each of the target pixel values is located, respectively includes:

acquiring the target real part value and the target imaginary part value which belong to the same magnetic resonance signal from each read target pixel value;

and weighting the target real part value and the target imaginary part value, and determining the image gradient of the digital image at the pixel position corresponding to the weighting result based on the reading result.

8. An image gradient determination apparatus, characterized by comprising:

a pixel value acquisition module, configured to acquire a pixel value of each pixel point in a digital image loaded into a global memory of a graphics processor;

a shared memory loading module, configured to determine a target pixel value from each pixel value according to a storage capacity of a shared memory in the graphics processor, and load each target pixel value to the shared memory;

9. An image gradient determination device, characterized by comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image gradient determination method of any one of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image gradient determination method according to any one of claims 1 to 7.
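
For illustration only, the tile-by-tile procedure recited in claims 1, 5 and 6 might be sketched on a CPU as follows. This is a minimal sketch under stated assumptions, not the GPU implementation of the embodiments: the function name `forward_gradient_tiled`, the `tile` parameter (standing in for the storage capacity of the shared memory), the forward-difference gradient, and the rule that differences reaching past the image border are set to 0 are all hypothetical choices made for the example. Ordinary Python lists play the roles of the global memory, the shared memory and the registers.

```python
# Illustrative CPU-side sketch only; names and boundary rule are assumptions.

def forward_gradient_tiled(image, tile=4):
    """Forward-difference gradient computed one tile at a time.

    image: list of equal-length rows of numbers ("global memory").
    tile:  side length of the square tile ("shared memory" capacity).
    Returns (gx, gy), each with the same shape as image. Differences that
    would reach past the image border are set to 0 (assumed rule).
    """
    h, w = len(image), len(image[0])
    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]

    # Repeat until no pixel value remains unloaded (compare claim 5).
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            y1, x1 = min(y0 + tile, h), min(x0 + tile, w)

            # "Shared memory": a local copy of the target pixel values.
            shared = [row[x0:x1] for row in image[y0:y1]]

            # "Registers": neighbours just outside the right and bottom
            # edges of the tile (compare claim 6); None past the border.
            right = [image[y][x1] if x1 < w else None for y in range(y0, y1)]
            below = [image[y1][x] if y1 < h else None for x in range(x0, x1)]

            th, tw = y1 - y0, x1 - x0
            for ty in range(th):
                for tx in range(tw):
                    v = shared[ty][tx]
                    # Interior pixels read their neighbour from the tile;
                    # edge pixels read it from the "registers".
                    nx = shared[ty][tx + 1] if tx + 1 < tw else right[ty]
                    ny = shared[ty + 1][tx] if ty + 1 < th else below[tx]
                    gx[y0 + ty][x0 + tx] = 0 if nx is None else nx - v
                    gy[y0 + ty][x0 + tx] = 0 if ny is None else ny - v
    return gx, gy
```

Under these assumptions the result does not depend on the tile size: calling the function with a small `tile` and with a `tile` larger than the image yields the same gradient, which is the property the edge handling of claim 6 is meant to preserve.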