CN113095286A - Big data image processing algorithm and system - Google Patents


Info

Publication number
CN113095286A
Authority
CN
China
Prior art keywords
image
big data
data
algorithm
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110480856.3A
Other languages
Chinese (zh)
Inventor
汪知礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110480856.3A
Publication of CN113095286A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/446 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering using Haar-like filters, e.g. using integral image techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses a big data image processing algorithm, which comprises the following steps: acquiring massive image data, and preprocessing the image data by image graying and gray stretching; performing feature extraction on the preprocessed images by using an image description feature extraction algorithm; optimizing the data partitioning strategy of the big data platform by using a partition optimization algorithm, storing the massive image data in the optimized big data platform, and using the image description features as the keys of image storage; adaptively segmenting the stored images by using an adaptive image segmentation algorithm; and extracting the semantic information of the segmented image blocks by using an image semantic feature extraction model, and taking the semantic information of all the segmented image blocks as the semantic information of the original image. The invention also provides a big data image processing system. The invention realizes image processing based on big data.

Description

Big data image processing algorithm and system
Technical Field
The invention relates to the technical field of image processing, in particular to a big data image processing algorithm and a big data image processing system.
Background
With the rapid development of image big data technology, multi-dimensional, multi-scale and high-resolution image data is growing explosively. Traditional image processing software takes a long time to process such massive image data.
Meanwhile, owing to high data throughput, information redundancy and similar problems, traditional image processing software depends heavily on the hardware resources of the server, cannot be managed centrally, is difficult to maintain, scales poorly and has low storage utilization.
In view of this, how to process massive data more efficiently by means of big data technology has become an urgent problem for those skilled in the art.
Disclosure of Invention
The invention provides a big data image processing algorithm, which is characterized in that the description characteristics of an image are extracted to be used as a storage key value of the image, a data partitioning strategy of a big data platform is optimized by using a partitioning optimization algorithm, and mass data are stored in the optimized big data platform; the self-adaptive image segmentation algorithm is used for carrying out self-adaptive segmentation processing on the stored image, the image semantic feature extraction model is used for extracting the semantic information of the segmented image blocks, and the semantic information of all the segmented image blocks is used as the information of the original image, so that the segmentation and semantic information extraction processing of the image are realized.
In order to achieve the above object, the present invention provides a big data image processing algorithm, including:
acquiring mass image data, and performing image graying and grayscale stretching preprocessing on the image data to obtain preprocessed mass image data;
performing feature extraction on the preprocessed image by using an image description feature extraction algorithm to obtain description features of massive images;
optimizing a data partitioning strategy of the big data platform by using a partitioning optimization algorithm, storing mass image data into the optimized big data platform, and taking image description characteristics as key values of image storage;
carrying out self-adaptive segmentation processing on the stored image by using a self-adaptive image segmentation algorithm to obtain a plurality of image blocks;
and extracting the semantic information of the segmented image blocks by using an image semantic feature extraction model, and taking the semantic information of all the segmented image blocks as the semantic information of the original image.
Optionally, the preprocessing of performing image graying and grayscale stretching on the image data includes:
1) finding the maximum of the three color components of each pixel in the acquired massive images, and setting that maximum as the gray value of the pixel to obtain the grayscale image, wherein the graying formula is as follows:
$\mathrm{Gray}(i,j) = \max\{R(i,j),\, G(i,j),\, B(i,j)\}$
wherein:
(i, j) is a pixel point in the image;
R(i, j), G(i, j) and B(i, j) are respectively the values of the pixel point (i, j) in the R, G and B color channels;
Gray(i, j) is the gray value of the pixel point (i, j);
2) for the gray-scale image, stretching the gray-scale of the image by using a piecewise linear transformation, wherein the formula of the gray-scale stretching is as follows:
Figure BDA0003049199050000011
wherein:
f(x, y) is the grayscale image;
$MAX_{f(x,y)}$ and $MIN_{f(x,y)}$ are respectively the maximum and minimum gray values of the grayscale image.
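The two preprocessing steps can be sketched in Python as follows. This is only an illustration: the stretching formula itself is an image that is not reproduced in this text, so the sketch assumes the simplest linear mapping of [MIN, MAX] onto [0, 255]; the function names `to_gray_max` and `stretch_gray` are hypothetical.

```python
import numpy as np

def to_gray_max(rgb: np.ndarray) -> np.ndarray:
    """Gray value of each pixel = maximum of its R, G, B components."""
    return rgb.max(axis=2)

def stretch_gray(gray: np.ndarray) -> np.ndarray:
    """Assumed stretch: map [MIN, MAX] of the image linearly onto [0, 255]."""
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:  # flat image: nothing to stretch
        return np.zeros_like(gray, dtype=np.uint8)
    out = (gray.astype(np.float64) - lo) / (hi - lo) * 255.0
    return np.round(out).astype(np.uint8)

# Tiny 2x2 RGB image used only for demonstration.
rgb = np.array([[[10, 50, 30], [200, 100, 150]],
                [[90, 90, 90], [0, 120, 60]]], dtype=np.uint8)
gray = to_gray_max(rgb)
stretched = stretch_gray(gray)
```

After stretching, the darkest pixel of the grayscale image maps to 0 and the brightest to 255, which is the usual purpose of this kind of contrast stretch.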
Optionally, the performing, by using an image description feature extraction algorithm, feature extraction on the preprocessed image includes:
1) constructing a Hessian matrix of the image:
Figure BDA0003049199050000021
wherein:
$h_{xx}(x, \sigma)$ is the second derivative of the image at pixel position x, and σ is the neighborhood standard deviation of the image pixels;
2) calculating the Hessian matrix extremum of each pixel: $D(H(x)) = h_{xx} \cdot h_{yy} - (0.9 \cdot h_{xy})^2$; selecting the K pixel points with the largest D(H(x)) in the image as the local feature points of the image;
3) constructing a Gaussian scale domain space:
Figure BDA0003049199050000022
wherein:
I(x, y) is the original image;
σ is the standard deviation of the pixels of the original image;
4) comparing each local feature point obtained from the Hessian matrix with all of its neighboring points in the image domain and the scale domain, and locating stable feature points by non-maximum suppression;
5) computing the Haar wavelet responses in a circular neighborhood of each feature point and determining the main direction of the feature point; then taking 4 × 4 rectangular sub-blocks around the feature point and, for each sub-block, computing the sum of the horizontal responses, the sum of the vertical responses, the sum of the absolute horizontal responses and the sum of the absolute vertical responses of the Haar wavelet features, to obtain a 64-dimensional feature vector that serves as the image description feature.
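Steps 1) and 2) can be illustrated with a short Python sketch. It makes two assumptions that are not in the text: the second derivatives are approximated with plain finite differences (`numpy.gradient`) instead of Gaussian derivatives at scale σ, and the response uses h_xx as the first factor, as in the usual determinant-of-Hessian form. The function names are hypothetical.

```python
import numpy as np

def hessian_response(img: np.ndarray) -> np.ndarray:
    """D(H) = hxx * hyy - (0.9 * hxy)^2, with finite-difference derivatives."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)     # first derivatives along rows (y) and columns (x)
    hxy, hxx = np.gradient(gx)    # d(gx)/dy and d(gx)/dx
    hyy, _ = np.gradient(gy)      # d(gy)/dy
    return hxx * hyy - (0.9 * hxy) ** 2

def top_k_points(resp: np.ndarray, k: int):
    """Return the (row, col) positions of the k largest responses."""
    flat = np.argsort(resp, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, resp.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Demonstration on a synthetic blob centred at (5, 5).
yy, xx = np.mgrid[0:11, 0:11]
blob = np.exp(-((yy - 5) ** 2 + (xx - 5) ** 2) / (2 * 2.0 ** 2))
points = top_k_points(hessian_response(blob), k=1)
```

On a blob-like structure the response peaks at the blob centre, which is why the K largest responses are used as candidate feature points before the scale-space comparison and non-maximum suppression of steps 3) and 4).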
Optionally, the optimizing the data partitioning policy of the big data platform by using the partition optimization algorithm includes:
in a specific embodiment of the present invention, the big data platform is a Hadoop platform;
the partition optimization algorithm flow comprises the following steps:
1) the big data platform receives the massive image data, uses the image description features as the keys of image storage, and samples the stored images at a sampling rate s. In a specific embodiment of the invention, a sampling rate evaluation model is constructed and the image sampling rate is determined from it; the sampling rate evaluation model is as follows:
$s = \arg\min_s (\alpha D_s + \beta T_s)$
Figure BDA0003049199050000023
Figure BDA0003049199050000024
wherein:
$cov_{s,i}$ is the error rate of the i-th sampling at sampling rate s, i.e. the difference between the sampled data distribution and the expected distribution, and $cov_{m,i}$ is the error rate of the i-th sampling at a sampling rate of 100%;
$t_{s,i}$ is the sampling time of the i-th sampling at sampling rate s;
α and β are the parameters of the sampling rate evaluation model, both set to 1;
2) obtaining the total load and the total number of distinct Keys from the sampling result, and calculating the average load of the big data platform from the number of Reduce tasks started by the client;
traversing the Key queue: if the load of a Key is larger than the average load, the images corresponding to that Key are called a large load, and the large load is split. Two cases arise: 1. if the large load equals the average load, the Key is assigned to a node whose Reduce load is 0, and the correspondence between the partition number and the Reduce node is recorded; 2. if the large load is several times the average load, it is split into pieces of the average load, and the correspondence between the partition number and the Key is recorded. After the large loads are processed, the small loads are assigned directly to Reduce nodes and the correspondence between Key and partition number is recorded.
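The load-splitting rule of step 2) can be sketched in plain Python, independent of any Hadoop API. The sketch rests on assumptions: loads are simple per-Key counts, a Key counts as a large load when its load is at least the average, and each piece goes to the currently least-loaded reducer. The name `partition_keys` is hypothetical.

```python
from collections import defaultdict

def partition_keys(key_loads: dict, num_reducers: int):
    """Split large loads into average-sized pieces, then place small loads."""
    average = sum(key_loads.values()) / num_reducers
    reducer_load = [0.0] * num_reducers
    assignment = defaultdict(list)  # key -> list of (reducer index, load share)

    def least_loaded() -> int:
        return min(range(num_reducers), key=lambda r: reducer_load[r])

    def place(key, share):
        r = least_loaded()
        reducer_load[r] += share
        assignment[key].append((r, share))

    by_load = sorted(key_loads.items(), key=lambda kv: -kv[1])
    # Large loads first: cut each one into pieces no bigger than the average.
    for key, load in by_load:
        if load >= average:
            remaining = float(load)
            while remaining > 0:
                piece = min(average, remaining)
                place(key, piece)
                remaining -= piece
    # Small loads are assigned whole.
    for key, load in by_load:
        if load < average:
            place(key, float(load))
    return dict(assignment), reducer_load

# Hypothetical sampled loads per Key, three reducers.
assignment, loads = partition_keys({"a": 8, "b": 2, "c": 1, "d": 1}, 3)
```

In this example the Key "a" carries twice the average load and is split across two reducers, while the three small Keys share the third, so every reducer ends up with the same load.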
Optionally, the performing, by using an adaptive image segmentation algorithm, an adaptive segmentation process on the stored image includes:
1) determining the number of image segments K, and distributing centroids randomly and uniformly over all pixel points of the whole image, i.e. assigning K corresponding cluster centers, wherein the area of each image block is S × S, S = sqrt(N/K), and N is the total number of image pixels;
2) calculating the spatial distance function between the cluster centers and all pixel points in the search region:
Figure BDA0003049199050000031
Figure BDA0003049199050000032
wherein $(x_i, y_i)$ is the cluster center of the i-th image block and $(x_j, y_j)$ is a pixel point in the i-th image block;
3) converting the image into an LAB image, calculating a scaling function m (i, j) of the image:
Figure BDA0003049199050000033
wherein:
$(x_{ij}, y_{ij})$ is the coordinate difference between pixel i and pixel j;
$L_{ij}, A_{ij}, B_{ij}$ express the correlation between pixel i and pixel j with respect to lightness, red-green color and yellow-blue color;
4) calculating the scaling function of each image block, and dividing each segmented block into two parts by means of the scaling function, namely a region larger than the scaling function and a region smaller than or equal to the scaling function; for each small-size image produced by the segmentation, calculating the pixel difference between the small-size image and an adjacent large-size image:
$t_q = |L - L_q| + |A - A_q| + |B - B_q|$
wherein:
$t_q$ is the pixel difference between the small-size image and the adjacent large-size image q;
L, A and B are respectively the mean lightness, mean red-green color value and mean yellow-blue color value of the small-size image;
$L_q, A_q, B_q$ are respectively the mean lightness, mean red-green color value and mean yellow-blue color value of the adjacent large-size image q;
if $t_q < T$, the small-size image is merged with the large-size image, where T is a preset image threshold;
5) iterating until the cluster centers and the areas of the clustered image blocks no longer change, so that the original image is segmented into a plurality of image blocks.
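Two pieces of this flow are fully specified in the text and can be sketched in Python: the block size S = sqrt(N/K) used when placing the cluster centers, and the merge test on mean LAB values. The distance and scaling functions are images not reproduced here, so they are left out. The sketch places centers on a regular grid, which is an assumption (the text speaks of a random and uniform distribution), and the function names are hypothetical.

```python
import math

def init_cluster_centers(height: int, width: int, k: int):
    """Place cluster centers on a grid with step S = sqrt(N / K) (assumed layout)."""
    n = height * width
    s = max(1, int(math.sqrt(n / k)))
    centers = [(y, x)
               for y in range(s // 2, height, s)
               for x in range(s // 2, width, s)]
    return s, centers

def lab_difference(mean_small, mean_large) -> float:
    """t_q = |L - L_q| + |A - A_q| + |B - B_q| on mean LAB values."""
    return float(sum(abs(a - b) for a, b in zip(mean_small, mean_large)))

def should_merge(mean_small, mean_large, threshold: float) -> bool:
    """Merge the small region into its neighbour when t_q < T."""
    return lab_difference(mean_small, mean_large) < threshold

s, centers = init_cluster_centers(100, 100, 25)
t_q = lab_difference((50, 10, -5), (52, 8, -1))
```

For a 100 × 100 image and K = 25 the step is S = 20, giving a 5 × 5 grid of centers; the merge test is strict, so a difference exactly equal to the threshold does not trigger a merge.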
Optionally, the extracting semantic information of the segmented image block by using the image semantic feature extraction model includes:
in a specific embodiment of the present invention, the network structure of the image semantic feature extraction model is a VGG16 model;
the image semantic feature extraction process comprises the following steps:
training the features starting from initial weights transferred from ImageNet, and extracting the low-level and high-level features of the image respectively; the lower layers of the network extract basic feature information of the image, such as lines, edges and shapes, and because this information is generic, the lower-layer network parameters can directly use the pre-trained weights; the higher layers of the network combine and map the low-level basic features to form feature representations for the specific problem;
inputting the extracted features into the fully connected layers, and training and adjusting the weights of the fully connected layers with a fine-tuning strategy so as to maximize the accuracy of semantic classification of the image blocks; and selecting the output of the first fully connected layer as the deep learning semantic information of the image block.
Further, to achieve the above object, the present invention also provides a big data image processing system, comprising:
the image acquisition device is used for acquiring mass image data;
the data processor is used for preprocessing the image data by image graying and gray stretching to obtain the preprocessed massive image data, and for extracting features from the preprocessed images by using an image description feature extraction algorithm to obtain the description features of the massive images;
the big data image processing device is used for optimizing a data partitioning strategy of the big data platform by utilizing a partitioning optimization algorithm, storing mass image data into the optimized big data platform, taking image description characteristics as key values of image storage, and performing self-adaptive segmentation processing on the stored image by utilizing a self-adaptive image segmentation algorithm to obtain a plurality of image blocks; and extracting the semantic information of the segmented image blocks by using an image semantic feature extraction model, and taking the semantic information of all the segmented image blocks as the semantic information of the original image.
Further, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon big data image processing program instructions executable by one or more processors to implement the steps of the implementation method of big data image processing as described above.
Compared with the prior art, the invention provides a big data image processing algorithm, which has the following advantages:
Firstly, the invention provides a partition optimization algorithm for a Hadoop platform, in which the big data platform receives massive image data, uses the image description features as the keys of image storage, and constructs a sampling rate evaluation model that jointly considers the influence of different sampling rates on data distribution and on sampling cost, so that the optimal image sampling rate is determined from the model and the stored images are sampled at the sampling rate s, wherein the sampling rate evaluation model is as follows:
$s = \arg\min_s (\alpha D_s + \beta T_s)$
Figure BDA0003049199050000041
Figure BDA0003049199050000042
wherein: $cov_{s,i}$ is the error rate of the i-th sampling at sampling rate s, i.e. the difference between the sampled data distribution and the expected distribution, and $cov_{m,i}$ is the error rate of the i-th sampling at a sampling rate of 100%; $t_{s,i}$ is the sampling time of the i-th sampling at sampling rate s; α and β are the parameters of the sampling rate evaluation model. The total load and the total number of distinct Keys are obtained from the sampling result, and the average load of the big data platform is calculated from the number of Reduce tasks started by the client. The Key queue is then traversed: if the load of a Key is larger than the average load, the images corresponding to that Key are treated as a large load and split. Two cases arise: 1. if the large load equals the average load, the Key is assigned to a node whose Reduce load is 0, and the correspondence between the partition number and the Reduce node is recorded; 2. if the large load is several times the average load, it is split into pieces of the average load, and the correspondence between the partition number and the Key is recorded. After the large loads are processed, the small loads are assigned directly to Reduce nodes and the correspondence between Key and partition number is recorded. In this way large-load images are placed on as few Reduce nodes as possible, which effectively mitigates the overload caused by skewed data distribution in the big data storage platform and relieves the storage pressure of image data.
Meanwhile, whereas traditional image segmentation algorithms require the segmentation parameters to be determined manually, the invention provides an adaptive image segmentation algorithm for adaptively segmenting the stored images. First, the number of image segments K is determined and centroids are distributed randomly and uniformly over all pixel points of the whole image, i.e. K corresponding cluster centers are assigned; the area of each image block is S × S with S = sqrt(N/K), where N is the total number of image pixels. The spatial distance function between the cluster centers and all pixel points in the search region is then calculated:
Figure BDA0003049199050000043
wherein $(x_i, y_i)$ is the cluster center of the i-th image block and $(x_j, y_j)$ is a pixel point in the i-th image block. The image is converted into an LAB image and the scaling function m(i, j) of the image is calculated:
Figure BDA0003049199050000044
wherein: $(x_{ij}, y_{ij})$ is the coordinate difference between pixel i and pixel j; $L_{ij}, A_{ij}, B_{ij}$ express the correlation between pixel i and pixel j with respect to lightness, red-green color and yellow-blue color. The scaling function of each image block is calculated, and each segmented block is divided into two parts by means of the scaling function, namely a region larger than the scaling function and a region smaller than or equal to the scaling function. For each small-size image produced by the segmentation, the pixel difference between the small-size image and an adjacent large-size image is calculated:
$t_q = |L - L_q| + |A - A_q| + |B - B_q|$
wherein: $t_q$ is the pixel difference between the small-size image and the adjacent large-size image q; L, A and B are respectively the mean lightness, mean red-green color value and mean yellow-blue color value of the small-size image; $L_q, A_q, B_q$ are respectively the mean lightness, mean red-green color value and mean yellow-blue color value of the adjacent large-size image q. If $t_q < T$, the small-size image is merged with the large-size image, where T is a preset image threshold. The computation is iterated until the cluster centers and the areas of the clustered image blocks no longer change, so that the original image is segmented into a plurality of image blocks. Compared with the traditional algorithm, the method obtains, through several iterations, the scaling function between each pixel point in an image region and the cluster center pixel of that region. The scaling function contains the lightness information, red-green color information and yellow-blue color information of the color space together with the coordinate-space distance. The scaling function of each image block obtained by iteration is used to divide the segmented block into two parts, namely a region larger than the scaling function and a region smaller than or equal to it, which yields a finer segmentation. For the small-size images that may result, the invention sets a threshold on the pixel difference between a small-size image and its adjacent large-size image; if the threshold condition is met, the small-size image is merged with the large-size image. Adaptive segmentation is thereby realized on the basis of both the color space and the coordinate space of the image.
Drawings
FIG. 1 is a schematic flow chart of a big data image processing algorithm according to an embodiment of the present invention;
FIG. 2 is a block diagram of a big data image processing system according to an embodiment of the present invention;
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Extracting description features of the image to serve as storage key values of the image, optimizing a data partitioning strategy of the big data platform by using a partitioning optimization algorithm, and storing mass data into the optimized big data platform; the self-adaptive image segmentation algorithm is used for carrying out self-adaptive segmentation processing on the stored image, the image semantic feature extraction model is used for extracting the semantic information of the segmented image blocks, and the semantic information of all the segmented image blocks is used as the information of the original image, so that the segmentation and semantic information extraction processing of the image are realized. Referring to fig. 1, a schematic diagram of a big data image processing algorithm according to an embodiment of the present invention is provided.
In this embodiment, the big data image processing algorithm includes:
S1: acquiring massive image data, and preprocessing the image data by image graying and gray stretching to obtain the preprocessed massive image data.
First, the invention acquires massive image data and preprocesses it by image graying and gray stretching, wherein the preprocessing flow is as follows:
1) finding the maximum of the three color components of each pixel in the acquired massive images, and setting that maximum as the gray value of the pixel to obtain the grayscale image, wherein the graying formula is as follows:
$\mathrm{Gray}(i,j) = \max\{R(i,j),\, G(i,j),\, B(i,j)\}$
wherein:
(i, j) is a pixel point in the image;
R(i, j), G(i, j) and B(i, j) are respectively the values of the pixel point (i, j) in the R, G and B color channels;
Gray(i, j) is the gray value of the pixel point (i, j);
2) for the gray-scale image, stretching the gray-scale of the image by using a piecewise linear transformation, wherein the formula of the gray-scale stretching is as follows:
Figure BDA0003049199050000051
wherein:
f(x, y) is the grayscale image;
$MAX_{f(x,y)}$ and $MIN_{f(x,y)}$ are respectively the maximum and minimum gray values of the grayscale image.
S2: performing feature extraction on the preprocessed images by using an image description feature extraction algorithm to obtain the description features of the massive images.
Further, the invention extracts features from the preprocessed images with an image description feature extraction algorithm to obtain the description features of the massive images; the flow of the image description feature extraction algorithm is as follows:
1) constructing a Hessian matrix of the image:
Figure BDA0003049199050000061
wherein:
$h_{xx}(x, \sigma)$ is the second derivative of the image at pixel position x, and σ is the neighborhood standard deviation of the image pixels;
2) calculating the Hessian matrix extremum of each pixel: $D(H(x)) = h_{xx} \cdot h_{yy} - (0.9 \cdot h_{xy})^2$; selecting the K pixel points with the largest D(H(x)) in the image as the local feature points of the image;
3) constructing a Gaussian scale domain space:
Figure BDA0003049199050000062
wherein:
I(x, y) is the original image;
σ is the standard deviation of the pixels of the original image;
4) comparing each local feature point obtained from the Hessian matrix with all of its neighboring points in the image domain and the scale domain, and locating stable feature points by non-maximum suppression;
5) computing the Haar wavelet responses in a circular neighborhood of each feature point and determining the main direction of the feature point; then taking 4 × 4 rectangular sub-blocks around the feature point and, for each sub-block, computing the sum of the horizontal responses, the sum of the vertical responses, the sum of the absolute horizontal responses and the sum of the absolute vertical responses of the Haar wavelet features, to obtain a 64-dimensional feature vector that serves as the image description feature.
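The way step 5) arrives at 64 dimensions (4 × 4 sub-blocks, four sums each) can be shown with a small Python sketch. It is a simplification under stated assumptions: the Haar responses are approximated by differences of adjacent pixels, the patch is not rotated to the main direction, and no Gaussian weighting is applied. The names `haar_responses` and `describe` are hypothetical.

```python
import numpy as np

def haar_responses(patch: np.ndarray):
    """Approximate horizontal/vertical Haar responses by neighbour differences."""
    patch = patch.astype(np.float64)
    dx = np.zeros_like(patch)
    dy = np.zeros_like(patch)
    dx[:, :-1] = patch[:, 1:] - patch[:, :-1]   # horizontal response
    dy[:-1, :] = patch[1:, :] - patch[:-1, :]   # vertical response
    return dx, dy

def describe(patch: np.ndarray) -> np.ndarray:
    """Build a 64-d vector: for each of 4 x 4 sub-blocks, (sum dx, sum dy, sum |dx|, sum |dy|)."""
    dx, dy = haar_responses(patch)
    step = patch.shape[0] // 4      # patch is assumed square, side divisible by 4
    feats = []
    for by in range(4):
        for bx in range(4):
            rows = slice(by * step, (by + 1) * step)
            cols = slice(bx * step, (bx + 1) * step)
            bdx, bdy = dx[rows, cols], dy[rows, cols]
            feats += [bdx.sum(), bdy.sum(), np.abs(bdx).sum(), np.abs(bdy).sum()]
    return np.array(feats)

# 20 x 20 horizontal ramp: intensity grows by 1 per column.
patch = np.tile(np.arange(20), (20, 1))
desc = describe(patch)
```

On the ramp every sub-block has a purely horizontal response, so only the first and third entries of each group of four are non-zero.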
S3: optimizing the data partitioning strategy of the big data platform by using a partition optimization algorithm, storing the massive image data in the optimized big data platform, and using the image description features as the keys of image storage.
Further, the data partitioning strategy of the big data platform is optimized by using a partition optimization algorithm; in a specific embodiment of the invention, the big data platform is a Hadoop platform;
the partition optimization algorithm flow comprises the following steps:
1) the big data platform receives the massive image data, uses the image description features as the keys of image storage, and samples the stored images at a sampling rate s. In a specific embodiment of the invention, a sampling rate evaluation model is constructed and the image sampling rate is determined from it; the sampling rate evaluation model is as follows:
$s = \arg\min_s (\alpha D_s + \beta T_s)$
Figure BDA0003049199050000063
Figure BDA0003049199050000064
wherein:
covs,tthe value of the error rate of the i-th sampling at the sampling rate s, which is the difference between the data distribution after sampling and the expected distribution, covm,iThe error rate value of the ith sampling when the sampling rate is 100 percent;
ts,ithe sampling time of j sampling when the sampling rate is s is represented;
α, β represent sampling rate evaluation model parameters, which are both set to 1;
2) obtaining the total load and the total number of distinct keys from the sampling result, and calculating the average load of the big data platform from the number of Reduce tasks started by the client;
traversing the key queue: if the load of a key is larger than the average load, the image data corresponding to that key is called a large load and is split, in one of two cases: (1) if the large load equals the average load, the key is assigned to a Reduce node whose load is 0, and the correspondence between the partition number and the Reduce node is recorded; (2) if the large load is a multiple of the average load, it is split into pieces of the average load, and the correspondence between the partition number and the key is recorded. After the large loads have been processed, the small loads are assigned directly to the Reduce nodes, and the correspondence between each key and its partition number is recorded.
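The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions. The formulas for D_s and T_s are given only as figures in the source, so `choose_sampling_rate` takes their values as precomputed inputs. The least-loaded-first assignment in `plan_partitions` is an assumption that stands in for the unspecified assignment order. All function and parameter names are hypothetical, and nothing here calls the Hadoop API.

```python
from typing import Dict, List, Tuple


def choose_sampling_rate(costs: Dict[float, Tuple[float, float]],
                         alpha: float = 1.0,
                         beta: float = 1.0) -> float:
    """Return the candidate rate s that minimizes alpha*D_s + beta*T_s.

    costs maps each candidate rate s to a precomputed pair (D_s, T_s).
    """
    return min(costs, key=lambda s: alpha * costs[s][0] + beta * costs[s][1])


def plan_partitions(key_loads: Dict[str, float],
                    num_reducers: int) -> List[List[Tuple[str, float]]]:
    """Split large loads by the average load and spread chunks over reducers."""
    average = sum(key_loads.values()) / num_reducers

    # Split every load larger than the average into average-sized chunks.
    chunks: List[Tuple[str, float]] = []
    for key, load in key_loads.items():
        remaining = load
        while remaining > average:
            chunks.append((key, average))
            remaining -= average
        if remaining > 0:
            chunks.append((key, remaining))

    # Assumed policy: largest chunk first, always to the least-loaded reducer.
    plan: List[List[Tuple[str, float]]] = [[] for _ in range(num_reducers)]
    totals = [0.0] * num_reducers
    for key, load in sorted(chunks, key=lambda c: -c[1]):
        target = min(range(num_reducers), key=totals.__getitem__)
        plan[target].append((key, load))
        totals[target] += load
    return plan
```

Each inner list of the returned plan records which (key, load) chunks a reducer receives, which corresponds to the key-to-partition mapping the text says must be recorded.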
S4, performing adaptive segmentation on the stored images with an adaptive image segmentation algorithm to obtain a plurality of image blocks.
Further, the invention performs adaptive segmentation on the stored images with an adaptive image segmentation algorithm, whose flow is as follows:
1) determining the number of image segments K and distributing K cluster centers randomly and uniformly over the pixels of the whole image, wherein the area of each image block is S × S, S = sqrt(N/K), and N is the total number of image pixels;
2) calculating the spatial distance function between each cluster center and all pixels in its search area:
Figure BDA0003049199050000071
Figure BDA0003049199050000072
wherein (x_i, y_i) is the cluster center of the i-th image block, and (x_j, y_j) is a pixel in the i-th image block;
3) converting the image into a LAB image and calculating the proportional function m(i, j) of the image:
Figure BDA0003049199050000073
wherein:
(x_ij, y_ij) is the coordinate difference between pixel i and pixel j;
L_ij, A_ij, B_ij are the correlations between pixel i and pixel j in lightness, red-green color and yellow-blue color;
4) calculating the proportional function of each image block, and using it to divide the segmented image into two parts, namely a region larger than the proportional function and a region smaller than or equal to the proportional function; for each segmented small-size image, calculating the pixel difference between the small-size image and an adjacent large-size image:
t_q = |L - L_q| + |A - A_q| + |B - B_q|
wherein:
t_q is the pixel difference between the small-size image and the adjacent large-size image q;
L, A and B are respectively the mean lightness, red-green color value and yellow-blue color value of the small-size image;
L_q, A_q, B_q are respectively the mean lightness, red-green color value and yellow-blue color value of the adjacent large-size image q;
if t_q is less than T, the small-size image is merged with the large-size image q, where T is a preset threshold;
5) iterating the above computation until the cluster centers and the areas of the clustered image blocks no longer change, and segmenting the original image to obtain a plurality of image blocks.
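Two parts of the flow above that are fully specified in the text, the block side length S = sqrt(N/K) and the merge test t_q < T, can be sketched as follows. This is a minimal illustration: placing the initial centers on a regular grid is an assumption (the text says only that they are distributed uniformly), the spatial distance and proportional functions are omitted because their formulas survive only as figures, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Lab = Tuple[float, float, float]  # mean (L, A, B) of a region


def block_side(num_pixels: int, k: int) -> float:
    """Side length S of each image block, S = sqrt(N / K)."""
    return math.sqrt(num_pixels / k)


def initial_centers(width: int, height: int, k: int) -> List[Tuple[float, float]]:
    """Place cluster centers on a regular grid with spacing S (assumed layout)."""
    step = block_side(width * height, k)
    centers = []
    y = step / 2
    while y < height:
        x = step / 2
        while x < width:
            centers.append((x, y))
            x += step
        y += step
    return centers


def pixel_difference(small: Lab, large: Lab) -> float:
    """t_q = |L - L_q| + |A - A_q| + |B - B_q| on region means."""
    return sum(abs(a - b) for a, b in zip(small, large))


def should_merge(small: Lab, large: Lab, threshold: float) -> bool:
    """Merge the small region into region q only when t_q < T."""
    return pixel_difference(small, large) < threshold
```

The grid yields exactly K centers only when the image dimensions are multiples of S; for other sizes the count is approximate, which is one reason the grid layout is labeled an assumption.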
S5, extracting semantic information from the segmented image blocks with an image semantic feature extraction model, and taking the semantic information of all segmented image blocks as the semantic information of the original image.
Further, the invention extracts the semantic information of the segmented image blocks with an image semantic feature extraction model; in a specific embodiment of the invention, the network structure of the model is the VGG16 model;
the image semantic feature extraction process comprises the following steps:
performing feature training with initial weights transferred from ImageNet pre-training, and extracting the low-level and high-level features of the image separately; the lower layers of the network extract basic feature information such as lines, edges and shapes, and because this information is generic, the lower-layer network parameters can directly use the pre-trained weights; the higher layers of the network combine and map the low-level basic features to form feature representations for the specific problem;
inputting the extracted features into the fully connected layers, and training and adjusting the weights of the fully connected layers with a fine-tuning strategy so as to maximize the accuracy of semantic classification of the image blocks; and selecting the output of the first fully connected layer as the deep-learning semantic information of the image block.
The following describes an embodiment of the invention through an algorithm experiment that tests the proposed processing method. The hardware test environment is an Intel(R) Core(TM) i7-6700K CPU, and the software is MATLAB R2018a; the comparison methods are a random-forest-based big data image processing algorithm and a Bayesian big data image processing algorithm.
In the algorithm experiment, the data set is 10 GB of image data. The image data is input into the algorithm model, and the image processing accuracy is used as the evaluation index of algorithm feasibility: the higher the accuracy, the more effective and feasible the algorithm.
According to the experimental results, the image processing accuracy is 81.31% for the random-forest-based algorithm, 86.38% for the Bayesian algorithm and 87.92% for the method of the invention, so the proposed big data image processing algorithm achieves a higher image processing accuracy than the comparison algorithms.
The invention also provides a big data image processing system. Fig. 2 is a schematic diagram illustrating an internal structure of a big data image processing system according to an embodiment of the present invention.
In the present embodiment, the big data image processing system 1 includes at least an image acquisition device 11, a data processor 12, a big data image processing device 13, a communication bus 14, and a network interface 15.
The image capturing device 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server.
The data processor 12 includes at least one type of readable storage medium, including flash memory, hard disks, multi-media cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments the data processor 12 may be an internal storage unit of the big data image processing system 1, such as a hard disk of the big data image processing system 1. In other embodiments the data processor 12 may be an external storage device of the big data image processing system 1, such as a plug-in hard disk provided on the big data image processing system 1, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and so on. Further, the data processor 12 may include both an internal storage unit and an external storage device of the big data image processing system 1. The data processor 12 can be used not only to store the application software installed in the big data image processing system 1 and various types of data, but also to temporarily store data that has been output or is to be output.
The big data image Processing apparatus 13 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data Processing chip for running program codes stored in the data processor 12 or Processing data, such as big data image Processing program instructions 16.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the system 1 and other electronic devices.
Optionally, the big data image processing system 1 may further include a user interface, the user interface may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface may further include a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the big data image processing system 1 and for displaying a visualized user interface.
While FIG. 2 shows only the big data image processing system 1 with components 11-15, those skilled in the art will appreciate that the configuration shown in FIG. 2 does not limit the big data image processing system 1, which may include fewer or more components than shown, combine some components, or arrange the components differently.
In the embodiment of the big-data image processing system 1 shown in fig. 2, big-data image processing program instructions 16 are stored in the data processor 12; the steps of the big-data image processing apparatus 13 executing the big-data image processing program instructions 16 stored in the data processor 12 are the same as the implementation method of the big-data image processing algorithm, and are not described here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon big-data image processing program instructions executable by one or more processors to implement the following operations:
acquiring mass image data, and performing image graying and grayscale stretching preprocessing on the image data to obtain preprocessed mass image data;
performing feature extraction on the preprocessed image by using an image description feature extraction algorithm to obtain description features of massive images;
optimizing a data partitioning strategy of the big data platform by using a partitioning optimization algorithm, storing mass image data into the optimized big data platform, and taking image description characteristics as key values of image storage;
carrying out self-adaptive segmentation processing on the stored image by using a self-adaptive image segmentation algorithm to obtain a plurality of image blocks;
and extracting the semantic information of the segmented image blocks by using an image semantic feature extraction model, and taking the semantic information of all the segmented image blocks as the semantic information of the original image.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A big data image processing algorithm, the method comprising:
acquiring mass image data, and performing image graying and grayscale stretching preprocessing on the image data to obtain preprocessed mass image data;
performing feature extraction on the preprocessed image by using an image description feature extraction algorithm to obtain description features of massive images;
optimizing a data partitioning strategy of the big data platform by using a partitioning optimization algorithm, storing mass image data into the optimized big data platform, and taking image description characteristics as key values of image storage;
carrying out self-adaptive segmentation processing on the stored image by using a self-adaptive image segmentation algorithm to obtain a plurality of image blocks;
and extracting the semantic information of the segmented image blocks by using an image semantic feature extraction model, and taking the semantic information of all the segmented image blocks as the semantic information of the original image.
2. The big data image processing algorithm according to claim 1, wherein the pre-processing of image graying and gray stretching for the image data comprises:
1) taking the maximum of the three color components of each pixel in the acquired massive images and setting it as the gray value of that pixel to obtain a gray-scale map of the image, wherein the formula of the graying process is as follows:
G(i, j) = max{R(i, j), G(i, j), B(i, j)}
wherein:
(i, j) is a pixel point in the image;
R(i, j), G(i, j) and B(i, j) on the right-hand side are respectively the values of pixel (i, j) in the R, G and B color channels;
G(i, j) on the left-hand side is the gray value of pixel (i, j);
2) for the gray-scale map, stretching the gray scale of the image with a piecewise linear transformation, wherein the formula of the gray stretching is as follows:
Figure FDA0003049199040000011
wherein:
f(x, y) is the gray-scale map;
MAX_{f(x,y)} and MIN_{f(x,y)} are respectively the maximum and minimum gray values of the gray-scale map.
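The preprocessing of claim 2 can be sketched as follows. The graying step follows the stated maximum-of-channels formula. The stretching formula itself is available only as a figure, so `stretch_gray` uses a plain min-max linear stretch to the range [0, 255] as an assumption consistent with the MAX and MIN terms defined above; it should not be read as the claimed formula. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def to_gray(image: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Gray value of each pixel is the maximum of its R, G and B values."""
    return [[max(pixel) for pixel in row] for row in image]


def stretch_gray(gray: Sequence[Sequence[float]],
                 out_max: float = 255.0) -> List[List[float]]:
    """Assumed min-max linear stretch: (f - MIN) * out_max / (MAX - MIN)."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # A flat image has no contrast to stretch; map everything to 0.
        return [[0.0 for _ in row] for row in gray]
    return [[(value - lo) * out_max / (hi - lo) for value in row] for row in gray]
```

For example, a gray-scale row with values 30, 115 and 200 is mapped to 0, 127.5 and 255 by this stretch.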
3. The big data image processing algorithm according to claim 2, wherein the performing the feature extraction on the preprocessed image by using the image description feature extraction algorithm comprises:
1) constructing a Hessian matrix of the image:
Figure FDA0003049199040000012
wherein:
h_xx(x, σ) is the second-order derivative of the image at pixel x, and σ is the neighborhood standard deviation of the image pixels;
2) calculating the Hessian matrix extremum value of each pixel: D(H(x)) = h_xx·h_yy - (0.9·h_xy)^2; selecting the K pixels with the largest D(H(x)) in the image as the local feature points of the image;
3) constructing a Gaussian scale domain space:
Figure FDA0003049199040000013
wherein:
I(x, y) is the original image;
σ is the standard deviation of the pixels of the original image;
4) comparing each local feature point obtained from the Hessian matrix with all of its neighboring points in the image domain and the scale domain, and locating stable feature points by non-maximum suppression;
5) computing the Haar wavelet responses in a circular neighborhood of each feature point and determining the main direction of the feature point; then extracting a 4×4 grid of rectangular sub-regions around the feature point and, for each sub-region, computing the sum of the horizontal responses, the sum of the vertical responses, the sum of the absolute horizontal responses and the sum of the absolute vertical responses of the Haar wavelet features, which yields a 64-dimensional (4 × 4 × 4) feature vector that serves as the image description feature.
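The candidate selection in step 2) of claim 3 can be sketched as follows. This is a minimal illustration that assumes the product term of the response is h_xx·h_yy, as in the usual SURF approximation of the Hessian determinant, and that the second derivatives are already available per pixel. The names `hessian_response` and `top_k_points` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


def hessian_response(hxx: float, hyy: float, hxy: float) -> float:
    """Approximate determinant D = hxx * hyy - (0.9 * hxy) ** 2."""
    return hxx * hyy - (0.9 * hxy) ** 2


def top_k_points(derivatives: Dict[Point, Tuple[float, float, float]],
                 k: int) -> List[Point]:
    """Return the k pixels with the largest response.

    derivatives maps a pixel to its precomputed (hxx, hyy, hxy) values.
    """
    responses = {p: hessian_response(*d) for p, d in derivatives.items()}
    return sorted(responses, key=responses.get, reverse=True)[:k]
```

The later scale-space comparison and non-maximum suppression of step 4) would then be applied to the points returned here.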
4. The big data image processing algorithm according to claim 3, wherein the optimizing the data partitioning policy of the big data platform by using the partitioning optimization algorithm comprises:
1) the big data platform receives the massive image data and uses the image description features as the keys for image storage; a sampling rate evaluation model is constructed, the image sampling rate is determined from the constructed model, and the stored images are sampled at the sampling rate s, the sampling rate evaluation model being:
s = argmin_s (α·D_s + β·T_s)
Figure FDA0003049199040000021
Figure FDA0003049199040000022
wherein:
cov_{s,i} is the error rate of the i-th sampling at sampling rate s, i.e. the difference between the data distribution after sampling and the expected distribution, and cov_{m,i} is the error rate of the i-th sampling at a sampling rate of 100%;
t_{s,i} is the sampling time of the i-th sampling at sampling rate s;
α, β represent sampling rate evaluation model parameters, which are both set to 1;
2) obtaining the total load and the total number of distinct keys from the sampling result, and calculating the average load of the big data platform from the number of Reduce tasks started by the client;
traversing the key queue: if the load of a key is larger than the average load, the image data corresponding to that key is called a large load and is split, in one of two cases: (1) if the large load equals the average load, the key is assigned to a Reduce node whose load is 0, and the correspondence between the partition number and the Reduce node is recorded; (2) if the large load is a multiple of the average load, it is split into pieces of the average load, and the correspondence between the partition number and the key is recorded. After the large loads have been processed, the small loads are assigned directly to the Reduce nodes, and the correspondence between each key and its partition number is recorded.
5. The big data image processing algorithm of claim 4, wherein the adaptively segmenting the stored image using an adaptive image segmentation algorithm comprises:
1) determining the number of image segments K and distributing K cluster centers randomly and uniformly over the pixels of the whole image, wherein the area of each image block is S × S, S = sqrt(N/K), and N is the total number of image pixels;
2) calculating the spatial distance function between each cluster center and all pixels in its search area:
Figure FDA0003049199040000023
Figure FDA0003049199040000024
wherein (x_i, y_i) is the cluster center of the i-th image block, and (x_j, y_j) is a pixel in the i-th image block;
3) converting the image into a LAB image and calculating the proportional function m(i, j) of the image:
Figure FDA0003049199040000025
wherein:
(x_ij, y_ij) is the coordinate difference between pixel i and pixel j;
L_ij, A_ij, B_ij are the correlations between pixel i and pixel j in lightness, red-green color and yellow-blue color;
4) calculating the proportional function of each image block, and using it to divide the segmented image into two parts, namely a region larger than the proportional function and a region smaller than or equal to the proportional function; for each segmented small-size image, calculating the pixel difference between the small-size image and an adjacent large-size image:
t_q = |L - L_q| + |A - A_q| + |B - B_q|
wherein:
t_q is the pixel difference between the small-size image and the adjacent large-size image q;
L, A and B are respectively the mean lightness, red-green color value and yellow-blue color value of the small-size image;
L_q, A_q, B_q are respectively the mean lightness, red-green color value and yellow-blue color value of the adjacent large-size image q;
if t_q is less than T, the small-size image is merged with the large-size image q, where T is a preset threshold;
5) iterating the above computation until the cluster centers and the areas of the clustered image blocks no longer change, and segmenting the original image to obtain a plurality of image blocks.
6. The big data image processing algorithm according to claim 5, wherein the extracting semantic information of the segmented image block by using the image semantic feature extraction model comprises:
the image semantic feature extraction process comprises the following steps:
performing feature training with initial weights transferred from ImageNet pre-training, and extracting the low-level and high-level features of the image separately;
inputting the extracted features into the fully connected layers, and training and adjusting the weights of the fully connected layers with a fine-tuning strategy so as to maximize the accuracy of semantic classification of the image blocks; and selecting the output of the first fully connected layer as the deep-learning semantic information of the image block.
7. A big data image processing system, the system comprising:
an image acquisition device for acquiring massive image data;
a data processor for performing image graying and gray stretching preprocessing on the image data to obtain preprocessed massive image data, and for performing feature extraction on the preprocessed images with an image description feature extraction algorithm to obtain the description features of the massive images;
a big data image processing device for optimizing the data partitioning strategy of the big data platform with a partition optimization algorithm, storing the massive image data in the optimized big data platform with the image description features as the keys for image storage, performing adaptive segmentation on the stored images with an adaptive image segmentation algorithm to obtain a plurality of image blocks, extracting the semantic information of the segmented image blocks with an image semantic feature extraction model, and taking the semantic information of all segmented image blocks as the semantic information of the original image.
8. A computer readable storage medium having stored thereon big data image processing program instructions executable by one or more processors to implement the steps of an implementation method of big data image processing as described above.
CN202110480856.3A 2021-04-30 2021-04-30 Big data image processing algorithm and system Pending CN113095286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110480856.3A CN113095286A (en) 2021-04-30 2021-04-30 Big data image processing algorithm and system

Publications (1)

Publication Number Publication Date
CN113095286A true CN113095286A (en) 2021-07-09

Family

ID=76680904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110480856.3A Pending CN113095286A (en) 2021-04-30 2021-04-30 Big data image processing algorithm and system

Country Status (1)

Country Link
CN (1) CN113095286A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858832A (en) * 2023-03-01 2023-03-28 天津市邱姆预应力钢绞线有限公司 Method and system for storing production data of steel strand

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230340A (en) * 2018-02-05 2018-06-29 南京邮电大学 A kind of SLIC super-pixel extraction Weighting and super-pixel extracting method based on MMTD
CN110135438A (en) * 2019-05-09 2019-08-16 哈尔滨工程大学 A kind of improvement SURF algorithm based on gradient magnitude pre-computation
US20190347767A1 (en) * 2018-05-11 2019-11-14 Boe Technology Group Co., Ltd. Image processing method and device
CN112287140A (en) * 2020-10-28 2021-01-29 汪礼君 Image retrieval method and system based on big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄伟健 等: "并行随机抽样贪心算法分区的MapReduce负载均衡研究", 《现代电子技术》, vol. 43, no. 16, pages 1 - 5 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination