CN116958717B - Intelligent geological big data cleaning method based on machine learning - Google Patents


Info

Publication number
CN116958717B
Authority
CN
China
Prior art keywords
value
image
gradient
downsampling
initial
Legal status
Active
Application number
CN202311210864.1A
Other languages
Chinese (zh)
Other versions
CN116958717A (en)
Inventor
薛立明
何祖臣
赵相博
徐进达
于广婷
刘同文
张志进
宋冠涵
段越洋
Current Assignee
Shandong Institute of Geological Surveying and Mapping
Original Assignee
Shandong Institute of Geological Surveying and Mapping
Application filed by Shandong Institute of Geological Surveying and Mapping
Priority to CN202311210864.1A
Publication of CN116958717A
Application granted
Publication of CN116958717B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a machine learning-based intelligent geological big data cleaning method, comprising the following steps: obtaining gradient images of the landform images at different scales; constructing a two-dimensional histogram; obtaining a loss score value for each gradient image from the information entropy of each gradient image and of its versions after each downsampling; obtaining an optimal variable value; using the optimal variable value to obtain the loss score value of the gradient image under each downsampling of any gradient image, determining the monotonicity of the resulting optimal loss score sequence, and obtaining an optimal scale value; obtaining a retention probability for each image in each group; obtaining a loss score sequence for each group according to the optimal scale; and obtaining a retention score value, selecting the images retained by frame extraction according to the retention score value, and thereby completing intelligent cleaning of the geological big data. By cleaning geological big data according to the retention scores of the images, the method ensures that the data remaining after frame extraction are valid.

Description

Intelligent geological big data cleaning method based on machine learning
Technical Field
The invention relates to the technical field of image processing, in particular to an intelligent geological big data cleaning method based on machine learning.
Background
Frame extraction from time-series landform image data means extracting key frames from a series of successive terrain images so that the process of landform change can be observed and analyzed more intuitively. This is valuable: the extracted sequence can be used to produce animations, visualizations and similar presentations that show landform evolution directly. However, because time-series landform image data are usually large in volume, data cleaning by frame extraction in preparation for visualization is computationally expensive.
Existing cleaning methods for time-series landform data compute the matching degree between consecutive images. Because individual landform images are large and the landform changes considerably over time, such methods perform poorly when extracting frames from time-series landform data, which is unfavorable for constructing a landform model. This scheme therefore provides an intelligent, machine learning-based method for cleaning geological big data.
Disclosure of Invention
The invention provides a machine learning-based intelligent geological big data cleaning method, which aims to solve the existing problems.
The intelligent geological big data cleaning method based on machine learning adopts the following technical scheme:
the embodiment of the invention provides a machine learning-based intelligent geological big data cleaning method, which comprises the following steps of:
acquiring a plurality of initial landform images, and performing downsampling on each initial landform image for a plurality of times to obtain a landform image of the initial landform image after downsampling each time and a gradient image of the initial landform image after downsampling each time;
acquiring the eight-neighborhood gray mean of each pixel in the landform image after each downsampling, and constructing a gradient-gray mean two-dimensional histogram of each initial landform image after each downsampling from the eight-neighborhood gray mean and the gradient value of each pixel;
obtaining the information entropy of the gradient image after each downsampling according to the gradient-gray average value two-dimensional histogram after each downsampling;
obtaining a loss score value of the gradient image after each downsampling according to the information entropy of that gradient image and the information entropy of the gradient image of the corresponding initial landform image;
acquiring the gradient image sequence of each initial landform image over the multiple downsamplings and its information entropy ratio sequence, and acquiring initial loss score values of these gradient images;
obtaining an optimal variable value according to the gradient image sequence, the information entropy ratio sequence and the initial loss score values of each initial landform image;
updating the loss score value of the gradient image after each downsampling with the optimal variable value to obtain a loss score update value after each downsampling, and obtaining an optimal scale value of each initial landform image according to the loss score update values after all downsamplings;
and performing frame extraction on all the initial landform images according to the optimal scale value to obtain a cleaning result.
Preferably, obtaining the gradient image of the initial landform image after each downsampling comprises the following specific steps:
selecting any pixel in the landform image after each downsampling and establishing a window of preset size centered on it; acquiring the maximum and minimum gray values of the pixels in the window, subtracting the minimum gray value from the maximum gray value, and assigning the difference to the central pixel of the window as its gradient value; the gradient values of all pixels in the landform image after each downsampling form the gradient image after that downsampling.
Preferably, constructing the gradient-gray mean two-dimensional histogram of each initial landform image after each downsampling from the eight-neighborhood gray mean and the gradient value of each pixel comprises the following specific steps:
combining the eight-neighborhood gray mean and the gradient value at each pixel position of the landform image after each downsampling into a pair, counting the occurrence frequency of each pair in that landform image, and constructing the gradient-gray mean two-dimensional histogram from these frequencies, wherein the first horizontal axis of the histogram represents the gradient value, the second horizontal axis represents the gray mean, and the vertical axis represents the occurrence frequency of each pair.
Preferably, the obtaining the information entropy of the gradient image after each downsampling according to the gradient-gray average two-dimensional histogram after each downsampling comprises the following specific steps:
the gradient-gray mean two-dimensional histogram gives the occurrence probability of each pair, where a pair consists of the eight-neighborhood gray mean and the gradient value of a pixel; the information entropy of the gradient image is obtained as:
$$H(G_{i,j}) = -\sum_{k=1}^{K_{i,j}} p_{k}\,\log p_{k}$$
where $H(G_{i,j})$ denotes the information entropy of gradient image $G_{i,j}$, $G_{i,j}$ denotes the gradient image of the $i$-th initial landform image after the $j$-th downsampling, $p_{k}$ denotes the probability of the $k$-th pair in the gradient-gray mean two-dimensional histogram of $G_{i,j}$, $K_{i,j}$ denotes the number of bins in that histogram, and $\log$ denotes a logarithmic function.
Preferably, obtaining the loss score value of the gradient image after each downsampling according to the information entropy of that gradient image and the information entropy of the gradient image of each initial landform image comprises the following specific steps:
the loss score value $S_{i,j}$ is computed from the downsampling count $j$, the information entropy ratio $r_{i,j}$ and a variable value $\alpha$ using an exponential function and min-max normalization,
where $S_{i,j}$ denotes the loss score value of gradient image $G_{i,j}$, the gradient image of the $i$-th initial landform image after the $j$-th downsampling; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as base; $\mathrm{Norm}(\cdot)$ denotes min-max normalization of its argument; $j$ denotes the $j$-th downsampling; $H(G_{i,j})$ denotes the information entropy of $G_{i,j}$; $H(I_i)$ denotes the information entropy of the $i$-th initial landform image; $\alpha$ denotes a preset initial variable value; and $r_{i,j} = H(G_{i,j})/H(I_i)$ denotes the information entropy ratio of $G_{i,j}$.
Preferably, obtaining the optimal variable value according to the correlation between the gradient image sequence, the information entropy ratio sequence and the initial loss score values of each initial landform image after multiple downsamplings comprises the following specific steps:
the downsampling counts of any initial landform image form its downsampling count sequence; the information entropy ratios of all gradient images in this sequence form the information entropy ratio sequence, and their loss score values form the loss score value sequence; the Pearson coefficient between the loss score value sequence and the downsampling count sequence of the initial landform image is calculated and recorded as $\rho^{(1)}_i$, and the Pearson coefficient between the loss score value sequence and the information entropy ratio sequence is calculated and recorded as $\rho^{(2)}_i$; from these, the accumulated value $C_i$ of the initial landform image is obtained. The first $M$ initial landform images are taken, $M$ being a first preset number; the accumulated values $C_i$ of these $M$ images are summed, the sum is taken as the objective function value of the variable value $\alpha$, and the variable value that maximizes the objective function is recorded as the optimal variable value.
Preferably, the obtaining the optimal scale value of each initial landform image according to the gradient image loss score updated values after all downsampling comprises the following specific steps:
obtaining, with the optimal variable value, an optimal loss score update value of the gradient image under each downsampling of any initial landform image to form an optimal loss score update value sequence, the optimal loss score update value being obtained by the same method as the loss score value; subtracting each term of the optimal loss score update value sequence from the following term to obtain a difference sequence; if the differences are all positive or all negative, the optimal loss score update value sequence is a monotone sequence, otherwise it is a fluctuating sequence; if the sequence is fluctuating, the downsampling count corresponding to the maximum loss score update value in the sequence is selected as the optimal scale value; if the sequence is monotone, the downsampling count at which the derivative of a curve fitted to the sequence is largest is selected as the optimal scale value of the initial landform image.
Preferably, the step of performing frame extraction on all the initial landform images according to the optimal scale value to obtain a cleaning result comprises the following specific steps:
in the time order of camera acquisition, every $N$ initial landform images form a group, $N$ being a second preset number; any initial landform image in a group is recorded as the target image; a Gaussian function model is established with the time value of the target image's acquisition moment as its mean and a preset variance $\sigma^2$; the time values of the acquisition moments of all initial landform images in the group are substituted into this Gaussian function model to obtain several function values, which are recorded as the Gaussian values of the target image, and the maximum Gaussian value is taken as the retention probability of the target image;
and selecting a preferred scale from the optimal scales of each group, recorded as $y$; obtaining a retention score value of each initial landform image in each group according to the preferred scale; retaining the initial landform image with the largest retention score value in each group and deleting the other initial landform images in the group; and finally downsampling the retained initial landform images of all groups $y$ times to obtain the cleaning result.
Preferably, obtaining the retention score value of each initial landform image in each group according to the preferred scale comprises the following specific steps:
the preferred scale is recorded as $y$; the gradient image of the $i$-th initial landform image after the $y$-th downsampling is recorded as $G_{i,y}$, and its loss score update value is recorded as $S'_{i,y}$; the product of the retention probability of each initial landform image in a group and its loss score update value is recorded as the retention score value of that image.
Preferably, the preferred scale is obtained from the optimal scales as follows:
the mode of the optimal scale values of the images in each group is taken as the preferred scale of that group.
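The grouping, mode and retention-score selection can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, the retention probabilities are taken as inputs (the Gaussian step is not reproduced), and the loss score update values are min-max normalized within the group, following the detailed description's "normalized loss score update value".

```python
import statistics
from typing import List, Sequence


def group_indices(n_images: int, n_per_group: int) -> List[List[int]]:
    """Split image indices (already in acquisition order) into consecutive groups of N."""
    return [list(range(s, min(s + n_per_group, n_images)))
            for s in range(0, n_images, n_per_group)]


def preferred_scale(best_scales: Sequence[int]) -> int:
    """Mode of the optimal scale values in one group (first-seen mode on ties, Python 3.8+)."""
    return statistics.mode(best_scales)


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max normalisation; a constant group maps to all 1.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def kept_image(retention_probs: Sequence[float],
               loss_updates_at_y: Sequence[float]) -> int:
    """Position within the group of the image with the largest retention score."""
    # Retention score = retention probability * normalised loss score update value.
    scores = [p * s for p, s in zip(retention_probs, min_max(loss_updates_at_y))]
    return max(range(len(scores)), key=lambda k: scores[k])
```

For example, `kept_image([0.4, 0.3, 0.3], [0.2, 0.9, 0.5])` keeps the second image, because it has the highest normalized loss score update value and a non-negligible retention probability.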
The technical scheme of the invention has the following beneficial effects: the images are downsampled to different degrees, loss scores are obtained at each scale, and an optimal scale value is derived from them, which keeps the frame-extracted data valid. Obtaining an optimal variable value makes the loss scores more accurate: because downsampling discards image detail, the optimal variable value makes the loss scores at different downsampling levels differ as much as possible, which makes the frame extraction decision easier and more accurate. The images are then grouped, a retention probability is obtained for each image in each group, and each image is screened by combining its loss score and retention probability to obtain the best frame to retain, achieving intelligent cleaning of geological big data while preserving image features as far as possible.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a machine learning-based intelligent cleaning method for geological big data.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description is given below of the specific implementation, structure, characteristics and effects of the intelligent cleaning method for geological big data based on machine learning according to the invention by combining the attached drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the intelligent geological big data cleaning method based on machine learning.
Referring to fig. 1, a flowchart of steps of a machine learning-based intelligent geological big data cleaning method according to an embodiment of the present invention is shown, and the method includes the following steps:
S001, acquiring gradient images of the landform images, and acquiring gradient images of the landform images at different scales through downsampling.
It should be noted that: a single landform image in time-series landform image data is large at any one moment and covers a wide area, so visual stitching of the acquired landform data is computationally expensive, and time-series landform images are acquired at a high frequency. Landform images separated by small time intervals are highly redundant and need to be removed, and removal requires a matching degree measurement. To reduce the cost of computing the matching degree between images at different moments, an image pyramid algorithm is used to obtain images at different scales for the single landform image at each moment, and the smaller-scale images are used to compute the matching degree between images at different moments, which reduces the computation in the data cleaning process.
Images of the geological landform are acquired by satellite or aerial cameras to obtain a plurality of time-series landform images, and the $i$-th landform image in the acquired time-series data is recorded as the $i$-th initial landform image. Each initial landform image is downsampled with an image pyramid algorithm: each downsampling scales the image down by a factor of 2 relative to the previous scale, so that the length and width of the scaled image are 0.5 times those before scaling. Downsampling each initial landform image a preset number of times $J$ yields its landform images at $J$ scales. For any landform image, a window of preset size is established centered on each pixel; other sizes may be set in implementation, and this embodiment does not specifically limit the size. The maximum and minimum gray values of the pixels in the window are obtained, the minimum is subtracted from the maximum, and the difference is assigned to the central pixel of the window as its gradient value. Processing all pixels of the landform image in this way gives the gradient values of all its pixels, and the matrix formed by these gradient values is recorded as its gradient image. In the same way, the gradient image of every landform image in the initial time-series data is obtained, and the gradient image of the $i$-th initial landform image after the $j$-th downsampling is denoted $G_{i,j}$.
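The downsampling and max-min gradient steps can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: 2×2 block averaging stands in for the image-pyramid step (a Gaussian pyramid would blur before decimating), the window size `k=3` is only a placeholder for the preset window size, windows are clipped at the border, and all function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(img: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (stand-in for a pyramid level)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def max_min_gradient(img: Image, k: int = 3) -> Image:
    """Gradient of each pixel = max gray - min gray in the k x k window around it."""
    h, w, half = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Window is clipped at the image border (assumption).
            window = [
                img[rr][cc]
                for rr in range(max(0, r - half), min(h, r + half + 1))
                for cc in range(max(0, c - half), min(w, c + half + 1))
            ]
            out[r][c] = max(window) - min(window)
    return out


def gradient_pyramid(img: Image, times: int, k: int = 3) -> List[Image]:
    """Return [G_1, ..., G_times]: the gradient image after each downsampling."""
    grads, cur = [], img
    for _ in range(times):
        cur = downsample_2x(cur)
        grads.append(max_min_gradient(cur, k))
    return grads
```

Calling `gradient_pyramid(image, J)` on one initial landform image gives its $J$ gradient images in order of increasing downsampling count.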
S002, acquiring a two-dimensional histogram according to the gradient image.
It should be noted that: downsampling causes the gradient image to lose information to varying degrees compared with the gradient image before downsampling, and the greater the loss, the worse the result when images at the same scale from different moments are matched. Therefore, a loss score is obtained for each image at each scale, based on the matchable coefficient between the information loss degrees of the landform images at different moments, and the optimal matching scale for the different moments is determined from these scores.
The eight-neighborhood gray mean of each pixel in the landform image after each downsampling is acquired, and the gradient value of each pixel is read from the corresponding gradient image, so that each pixel corresponds to one gray mean and one gradient value. The gray mean and gradient value of each pixel form a pair; the occurrence frequency of each pair in the downsampled landform image is counted, and a gradient-gray mean two-dimensional histogram is constructed from these frequencies, in which the first horizontal axis represents the gradient value, the second horizontal axis represents the gray mean, and the vertical axis represents the occurrence frequency of each pair. This yields a gradient-gray mean two-dimensional histogram for the landform image after each downsampling of each initial landform image.
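The pair counting behind the gradient-gray mean two-dimensional histogram can be sketched as follows. Two assumptions are made here and are not stated in the source: neighbors outside the image are skipped (so border pixels average fewer than eight values), and both values are rounded to integers to form bins. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Image = List[List[float]]


def eight_neighbor_mean(img: Image, r: int, c: int) -> float:
    """Mean gray value of the 8 neighbours of (r, c), excluding the centre pixel."""
    h, w = len(img), len(img[0])
    vals = [
        img[rr][cc]
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
        if (rr, cc) != (r, c) and 0 <= rr < h and 0 <= cc < w
    ]
    return sum(vals) / len(vals)


def gradient_gray_histogram(gray: Image, grad: Image) -> Dict[Tuple[int, int], int]:
    """Count (gradient, eight-neighbour gray mean) pairs over all pixels."""
    counts: Counter = Counter()
    for r in range(len(gray)):
        for c in range(len(gray[0])):
            # Rounding to integers defines the bins (assumption: unit bin width).
            key = (round(grad[r][c]), round(eight_neighbor_mean(gray, r, c)))
            counts[key] += 1
    return dict(counts)
```

Dividing each count by the number of pixels gives the pair probabilities used in the entropy step.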
S003, obtaining a loss score value of each gradient image at each scale, and obtaining the optimal scale value of each gradient image according to the loss score values.
The information entropy of each initial landform image at each scale is computed from its gradient-gray mean two-dimensional histogram at that scale. Taking the gradient image $G_{i,j}$ of the $i$-th initial landform image after the $j$-th downsampling as an example, the histogram gives the occurrence probability of each pair, a pair consisting of the eight-neighborhood gray mean and the gradient value of a pixel, and the information entropy of $G_{i,j}$ is:
$$H(G_{i,j}) = -\sum_{k=1}^{K_{i,j}} p_{k}\,\log p_{k}$$
where $H(G_{i,j})$ denotes the information entropy of gradient image $G_{i,j}$, $p_{k}$ denotes the probability of the $k$-th pair in the gradient-gray mean two-dimensional histogram of $G_{i,j}$, and $K_{i,j}$ denotes the number of bins in that histogram.
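The entropy formula above, and the entropy ratio used in the loss score, can be sketched as follows. The logarithm base is an assumption (base 2 here); the source only says "a logarithmic function", and the base only rescales the entropy and cancels in the ratio. Function names are hypothetical.

```python
import math
from typing import Dict, Hashable


def histogram_entropy(hist: Dict[Hashable, int], base: float = 2.0) -> float:
    """Shannon entropy -sum p_k log p_k of a pair-count histogram (empty bins ignored)."""
    total = sum(hist.values())
    entropy = 0.0
    for count in hist.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


def entropy_ratio(h_downsampled: float, h_original: float) -> float:
    """r_{i,j} = H(G_{i,j}) / H(I_i): share of the original entropy that survives."""
    return h_downsampled / h_original
```

A histogram with two equally frequent pairs has an entropy of exactly 1 bit, and a histogram with a single pair has zero entropy.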
A loss score value is then obtained for the gradient image after each downsampling from its information entropy and the information entropy of the gradient image of the corresponding initial landform image. The loss score value $S_{i,j}$ is computed from the downsampling count $j$, the information entropy ratio $r_{i,j}$ and the variable value $\alpha$ using an exponential function and min-max normalization,
where $S_{i,j}$ denotes the loss score value of gradient image $G_{i,j}$; $\exp(\cdot)$ denotes the exponential function with the natural constant $e$ as base; $\mathrm{Norm}(\cdot)$ denotes min-max normalization of its argument; $j$ denotes the $j$-th downsampling; $H(G_{i,j})$ denotes the information entropy of the $i$-th initial landform image after the $j$-th downsampling; $H(I_i)$ denotes the information entropy of the $i$-th initial landform image; $r_{i,j} = H(G_{i,j})/H(I_i)$; and $\alpha$ denotes the variable value, which starts from a preset initial traversal value. The ratio $r_{i,j}$ measures how closely the information entropy after a given scaling approximates that of the original image: the larger it is, the less detail is lost and the better the scaled image serves for image matching. The larger $j$ is, the more strongly the image has been scaled down. Taking both into account, a larger $S_{i,j}$ indicates that the corresponding size should be preferred when a size is chosen for image matching.
It should be noted that: different values of the variable $\alpha$ change the loss score values at different scales in different ways; the more the loss score values of an initial landform image fluctuate across downsampling levels under a given value of $\alpha$, the more that value of $\alpha$ is preferred.
The optimal variable value $\alpha^{*}$ is obtained as follows. The downsampling counts of the $i$-th initial landform image form its downsampling count sequence $D_i = (1, 2, \dots, J)$, where $j$ denotes the $j$-th downsampling. From this sequence the information entropy ratio sequence $R_i = (r_{i,1}, \dots, r_{i,J})$ is obtained, where $r_{i,j}$ is the ratio of the information entropy of the gradient image after the $j$-th downsampling to the information entropy of the $i$-th initial landform image. With the variable value set to its initial value, the loss score values of the gradient images of the $i$-th initial landform image at each downsampling are calculated and form the loss score value sequence $L_i$. The Pearson coefficient between $L_i$ and $D_i$ is recorded as $\rho^{(1)}_i$, and the Pearson coefficient between $L_i$ and $R_i$ is recorded as $\rho^{(2)}_i$; the Pearson coefficient is a standard statistic and is not detailed in this embodiment. From $\rho^{(1)}_i$ and $\rho^{(2)}_i$, the Pearson coefficient accumulated value $C_i$ of the $i$-th initial landform image is obtained.
The first $M$ initial landform images are taken, $M$ being the first preset number. Their accumulated values $C_i$ are obtained by the above method and summed, and the sum is used as the objective function of the variable value $\alpha$. The objective function is solved with a simulated annealing algorithm, and the variable value at which it is maximal is recorded as the optimal variable value $\alpha^{*}$.
The loss score value of the gradient image after each downsampling is updated with the optimal variable value: the loss score update value is obtained by the same method as the loss score value, except that $\alpha^{*}$ is used as the variable value. The update values form the optimal loss score update value sequence $L'_i = (S'_{i,1}, \dots, S'_{i,J})$. Its monotonicity is determined from its elements: subtracting each term from the following term gives a difference sequence; if all differences are positive or all are negative, the optimal loss score update value sequence is monotone, otherwise it is a fluctuating sequence.
The optimal scale value is then obtained from this monotonicity. If the optimal loss score update value sequence fluctuates, the downsampling count $j$ corresponding to its maximum loss score update value is selected as the optimal scale value. If it is monotone, the loss score update values are fitted nonlinearly by least squares to obtain a fitting function (this embodiment uses a fifth-degree polynomial), and the derivative of the fitting function is evaluated at each point of the sequence to obtain the derivative corresponding to each loss score update value. A larger derivative indicates a larger change in loss score value around the corresponding point, i.e. the best trade-off between scale and information loss, so the value of $j$ at which the derivative is maximal is selected as the optimal scale value. Differentiating the fitting function is well known and is not repeated here. Applying this processing to each initial landform image yields its optimal scale value.
The optimal scale obtained by maximizing the objective function ensures that the downsampled landform images do not lose too much detail as the number of downsamplings increases.
S004, acquiring image retention probability, and acquiring a retained target image frame according to a retention score value.
In the order in which the camera acquired them, the initial landform images are divided into groups of a preset number of consecutive images. For each initial landform image, a Gaussian function model with a preset variance is established, centered on the time value corresponding to that image's acquisition time. The time values corresponding to the acquisition times of all initial landform images in the group are substituted into the corresponding Gaussian function model, and the maximum value is taken as the retention probability of the image; the retention probabilities of all images in a group form that group's retention probability sequence.
When images are matched they must be the same size, but the optimal scale value calculated for each initial landform image may differ. Therefore, the number of occurrences of each optimal scale in the group is counted, and the mode is chosen as the preferred scale of the group. The loss score update value of each initial landform image at the group's preferred scale is then obtained, and these values, from the 1st image of the group to the last, form the group's loss score update value sequence. From the retention probability sequence and the loss score update value sequence of the group, the retention score value of each landform image in the group is obtained by multiplying its retention probability by its normalized loss score update value. Peak points are detected in the resulting sequence of retention score values of the group's landform images, and the landform images corresponding to the peak points are retained in the frame-extraction process, giving the retained landform images of the group. This frame-extraction operation is performed on all groups of initial landform images acquired by the camera; once frame extraction has been completed for all initial landform images, the intelligent cleaning of the geological big data is complete.
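The group-level selection can be sketched as below. Retention probabilities and per-image optimal scales are taken as inputs. Min-max normalization of the loss score update values, mapping a constant sequence to all ones, and breaking ties in the mode toward the smaller scale are assumptions, since the text does not specify them. The sketch keeps the single image with the largest retention score, as claim 6 states, rather than implementing the peak-point detection mentioned in the description; all function names are hypothetical.

```python
from collections import Counter


def preferred_scale(best_scales):
    """Mode of the per-image optimal scales of a group; ties go to the smaller scale (assumption)."""
    counts = Counter(best_scales)
    top = max(counts.values())
    return min(scale for scale, n in counts.items() if n == top)


def normalize(values):
    """Min-max normalization to [0, 1]; a constant sequence maps to all ones (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def retention_scores(retention_probs, loss_updates):
    """Retention score = retention probability x normalized loss score update value."""
    return [p * q for p, q in zip(retention_probs, normalize(loss_updates))]


def frame_to_keep(retention_probs, loss_updates):
    """Index of the image in the group with the largest retention score."""
    scores = retention_scores(retention_probs, loss_updates)
    return max(range(len(scores)), key=lambda i: scores[i])


print(preferred_scale([2, 3, 3, 4, 3, 2]))              # 3
print(frame_to_keep([0.5, 1.0, 0.8], [0.2, 0.6, 0.4]))  # 1
```

The loss score update values passed in are assumed to be those computed at the group's preferred scale.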
Through the steps, the intelligent geological big data cleaning method based on machine learning is completed.
In this embodiment of the invention, the images are downsampled to different degrees, the loss scores of the images at the different scales are obtained, and the optimal scale value is derived from those loss scores, which ensures that the data remain valid after frame extraction; obtaining the optimal variable value makes the loss scores more accurate. The images are divided into groups to obtain the retention probability of each image in each group, and frame-extraction screening is performed on each image using its loss score and retention probability to obtain the optimal retained frames. Geological big data are thus cleaned intelligently while retaining as many image features as possible: the cleaned landform images are fewer and smaller, which facilitates the rapid construction of subsequent landform models, while still preserving most of the information detail of the initial landform images.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. The intelligent geological big data cleaning method based on machine learning is characterized by comprising the following steps of:
acquiring a plurality of initial landform images, and performing downsampling on each initial landform image for a plurality of times to obtain a landform image of the initial landform image after downsampling each time and a gradient image of the initial landform image after downsampling each time;
acquiring the eight-neighborhood gray mean value of each pixel point on the landform image after each downsampling, and constructing a gradient-gray mean two-dimensional histogram of each initial landform image after each downsampling according to the eight-neighborhood gray mean value and the gradient value of each pixel point;
obtaining the information entropy of the gradient image after each downsampling according to the gradient-gray average value two-dimensional histogram after each downsampling;
obtaining a gradient image loss score value after each downsampling according to the information entropy of the gradient image after each downsampling and the information entropy of the gradient image of each initial landform image;
acquiring a gradient image sequence of each initial landform image after multiple downsamplings, acquiring an information entropy ratio sequence of each initial landform image after multiple downsamplings, and acquiring initial loss score values of the gradient images of each initial landform image after multiple downsamplings;
obtaining an optimal variable value according to the gradient image sequence, the information entropy ratio sequence, and the initial loss score values of each initial landform image after multiple downsamplings;
updating the gradient image loss score value after each downsampling using the optimal variable value to obtain a gradient image loss score update value after each downsampling; and obtaining an optimal scale value of each initial landform image according to the gradient image loss score update values after all downsamplings;
performing frame extraction on all initial landform images according to the optimal scale value to obtain a cleaning result;
wherein obtaining the gradient image loss score value after each downsampling according to the information entropy of the gradient image after each downsampling and the information entropy of the gradient image of each initial landform image comprises the following specific steps:
the loss score value is obtained as follows:
where the loss score value of the gradient image of the initial landform image after the b-th downsampling is computed using an exponential function with the natural constant e as its base, max-min normalization, the downsampling count b, the information entropy value of the gradient image, the information entropy value of the initial landform image, a preset initial variable value, and the information entropy ratio of the gradient image;
the method for obtaining the optimal variable value according to the correlation between the gradient image sequence and the information entropy ratio sequence of each initial landform image after multiple downsampling and the initial loss grading value comprises the following specific steps:
the downsampling counts of any initial landform image form its downsampling count sequence; the information entropy ratios of all gradient images in the downsampling count sequence form an information entropy ratio sequence, and the loss score values of all gradient images in the downsampling count sequence form a loss score value sequence; the Pearson correlation coefficient between each loss score value sequence and the downsampling count sequence of the corresponding initial landform image is calculated, and the Pearson correlation coefficient between each loss score value sequence and the corresponding information entropy ratio sequence is calculated; a value is obtained from these two coefficients for each initial landform image; the values corresponding to the first initial landform images, up to a first preset number, are accumulated; the accumulated result is taken as the objective function value of the variable value, and the variable value that maximizes the objective function value is recorded as the optimal variable value.
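The variable-value search can be outlined in Python. Because the loss score formula and the expression that combines the two Pearson coefficients are not legible in this text, both are left as caller-supplied functions (`loss_fn` and `combine`, hypothetical names); searching a finite list of candidate values is likewise an assumption about how the maximization is carried out. Pearson's correlation coefficient itself is computed in the standard way.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def objective(v, images, loss_fn, combine, first_n):
    """Accumulate combine(r1, r2) over the first `first_n` images for variable value v.

    images: list of (downsampling_counts, entropy_ratios) pairs, one per image.
    loss_fn(b, ratio, v): stand-in for the patent's loss score formula (hypothetical).
    combine(r1, r2): stand-in for how the two coefficients are combined (hypothetical).
    """
    total = 0.0
    for counts, ratios in images[:first_n]:
        losses = [loss_fn(b, r, v) for b, r in zip(counts, ratios)]
        r1 = pearson(losses, counts)  # loss scores vs. downsampling counts
        r2 = pearson(losses, ratios)  # loss scores vs. entropy ratios
        total += combine(r1, r2)
    return total


def best_variable(candidates, images, loss_fn, combine, first_n):
    """Candidate variable value that maximizes the objective."""
    return max(candidates, key=lambda v: objective(v, images, loss_fn, combine, first_n))


# Toy usage: the loss and combine functions below are illustrative only.
images = [([1, 2, 3, 4], [0.9, 0.7, 0.6, 0.4])]
toy_loss = lambda b, r, v: v * b + (1 - v) * r
toy_combine = lambda r1, r2: r1
print(best_variable([0.0, 0.5, 1.0], images, toy_loss, toy_combine, first_n=1))  # 1.0
```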
2. The intelligent cleaning method for geological big data based on machine learning according to claim 1, wherein obtaining the gradient image of the initial landform image after each downsampling comprises the following specific steps:
selecting any pixel point in the landform image after each downsampling and establishing a window centered on it; obtaining the maximum gray value and the minimum gray value of the pixel points in the window, taking the difference between the maximum and minimum gray values, and assigning the difference to the central pixel point of the window as its gradient value; the gradient values of all pixel points on the landform image after each downsampling form the gradient image after each downsampling.
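A minimal pure-Python sketch of this max-minus-min gradient (a morphological gradient) is shown below. The window size is not legible in this text, so a 3 × 3 window is assumed, and clipping the window at the image border is also an assumption; the function name is hypothetical.

```python
def gradient_image(img, size=3):
    """Return max - min gray value in a size x size window centered on each pixel.

    img is a list of rows of gray values. The window is clipped at the image
    border (assumption); size=3 is an assumed default.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = max(window) - min(window)
    return out


print(gradient_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# [[4, 5, 4], [7, 8, 7], [4, 5, 4]]
```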
3. The intelligent cleaning method for geological big data based on machine learning according to claim 1, wherein the construction of the gradient-gray average two-dimensional histogram after each downsampling of any initial landform image according to the eight neighborhood gray average and the gradient value of any pixel point comprises the following specific steps:
combining the gray mean value and the gradient value corresponding to each pixel position in the landform image after each downsampling into a binary group, counting the frequency of occurrence of each binary group in the landform image after each downsampling, and constructing the gradient-gray mean two-dimensional histogram from these frequencies, wherein one axis of the histogram represents the gradient value, a second axis represents the gray mean value, and the third axis represents the frequency of occurrence of each binary group.
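The construction of the binary groups and their frequencies can be sketched as follows. Excluding the center pixel from the eight-neighborhood mean, skipping neighbors that fall outside the image, and rounding the mean to an integer so that pairs can be counted as discrete bins are all assumptions; `collections.Counter` stands in for the histogram, and the function names are hypothetical.

```python
from collections import Counter


def eight_neighborhood_mean(img, i, j):
    """Mean gray value of the in-bounds eight neighbors of (i, j), center excluded."""
    h, w = len(img), len(img[0])
    vals = [img[y][x]
            for y in range(i - 1, i + 2)
            for x in range(j - 1, j + 2)
            if (y, x) != (i, j) and 0 <= y < h and 0 <= x < w]
    return sum(vals) / len(vals) if vals else float(img[i][j])


def gradient_gray_histogram(img, grad):
    """Frequency of each (gradient value, rounded eight-neighborhood mean) binary group."""
    counts = Counter()
    for i in range(len(img)):
        for j in range(len(img[0])):
            counts[(grad[i][j], round(eight_neighborhood_mean(img, i, j)))] += 1
    return counts


img = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
grad = [[0] * 3 for _ in range(3)]  # toy gradient values, for illustration only
print(gradient_gray_histogram(img, grad))
```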
4. The intelligent cleaning method for geological big data based on machine learning according to claim 1, wherein the obtaining the information entropy of the gradient image after each downsampling according to the gradient-gray average two-dimensional histogram after each downsampling comprises the following specific steps:
the gradient-gray level average two-dimensional histogram represents the occurrence probability of each binary group, and the binary group comprises an eight-neighborhood gray level average and a gradient value of any pixel point; the information entropy acquisition method of the gradient image comprises the following steps:
$$H = -\sum_{z=1}^{n} p_z \log p_z$$
where $H$ represents the information entropy of the gradient image of the initial landform image after the $b$-th downsampling, $p_z$ represents the probability of the $z$-th binary group $Q$ in the gradient-gray mean two-dimensional histogram of that gradient image, $n$ represents the number of binary groups in the histogram, and $\log$ represents the logarithmic function.
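Given the binary-group frequencies, the information entropy (the Shannon entropy, minus the sum of p_z log p_z over the binary-group probabilities) can be evaluated as in the sketch below. The source does not state the logarithm base; base 2 is assumed here, and the function name is hypothetical.

```python
import math


def histogram_entropy(counts):
    """Shannon entropy H = -sum(p_z * log2(p_z)) of a frequency table.

    counts maps each binary group (gradient value, gray mean) to its frequency;
    p_z is that frequency divided by the total number of pixels.
    """
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


print(histogram_entropy({(0, 1): 1, (1, 1): 1, (2, 1): 1, (3, 1): 1}))  # 2.0
```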
5. The intelligent cleaning method for geological big data based on machine learning according to claim 1, wherein the obtaining the optimal scale value of each initial geomorphic image according to the gradient image loss score updating values after all downsampling comprises the following specific steps:
obtaining, with the optimal variable value, the optimal loss score update value of the gradient image after each downsampling of any initial landform image to form an optimal loss score update value sequence, the optimal loss score update value being obtained in the same way as the loss score value; subtracting each term of the sequence from the following term to obtain a difference sequence, wherein if all differences in the difference sequence are positive, or all are negative, the optimal loss score update value sequence is a monotonic sequence, and otherwise it is a fluctuating sequence; if the sequence is a fluctuating sequence, selecting the number of downsamplings corresponding to the largest loss score update value in the sequence as the optimal scale value; and if the sequence is monotonic, selecting the number of downsamplings corresponding to the largest derivative value in the sequence as the optimal scale value of each initial landform image.
6. The intelligent cleaning method for geological big data based on machine learning according to claim 1, wherein the step of performing frame extraction on all initial landform images according to the optimal scale value to obtain cleaning results comprises the following specific steps:
dividing the initial landform images, in the order of camera acquisition, into groups each containing a second preset number of images; marking any initial landform image in each group as the target image, and establishing a Gaussian function model whose mean is the time value corresponding to the acquisition time of the target image and whose variance is a preset variance value; substituting the time values corresponding to the acquisition times of all initial landform images in the group into the corresponding Gaussian function models to obtain a plurality of function values, recording these as the Gaussian values of the target image, and taking the largest Gaussian value as the retention probability of the target image;
screening a preferred scale out of the optimal scales of all groups, denoted y; obtaining the retention score value of each initial landform image in each group according to the preferred scale; retaining the initial landform image with the largest retention score value in each group and deleting the other initial landform images in the group; and finally downsampling the retained initial landform images of all groups y times to obtain the cleaning result.
7. The intelligent cleaning method for geological big data based on machine learning according to claim 6, wherein the obtaining the retention score value of each initial geomorphic image in each group according to the optimal scale comprises the following specific steps:
the preferred scale is denoted x; the gradient image of each initial landform image after the x-th downsampling is obtained, together with the loss score update value of that gradient image; the product of the retention probability of each initial landform image in each group and the loss score update value of that image is recorded as the retention score value of each initial landform image in the group.
8. The intelligent cleaning method for geological big data based on machine learning according to claim 6, wherein the method for screening out the optimal scale from all groups is as follows:
the mode of all best scale values in all groups is taken as the preferred scale for each group.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311210864.1A CN116958717B (en) 2023-09-20 2023-09-20 Intelligent geological big data cleaning method based on machine learning

Publications (2)

Publication Number Publication Date
CN116958717A (en) 2023-10-27
CN116958717B (en) 2023-12-12

Family

ID=88451457

Country Status (1)

Country Link
CN (1) CN116958717B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015072736A1 (en) * 2013-11-14 2015-05-21 대한민국(기상청장) High resolution topographical data generating method and system
CN107818555A (en) * 2017-10-27 2018-03-20 武汉大学 A kind of more dictionary remote sensing images space-time fusion methods based on maximum a posteriori
WO2020156028A1 (en) * 2019-01-28 2020-08-06 南京航空航天大学 Outdoor non-fixed scene weather identification method based on deep learning
WO2020248471A1 (en) * 2019-06-14 2020-12-17 华南理工大学 Aggregation cross-entropy loss function-based sequence recognition method
CN112561307A (en) * 2020-12-11 2021-03-26 重庆市生态环境大数据应用中心 Watershed water environment big data image system and method
WO2021218336A1 (en) * 2020-04-30 2021-11-04 深圳壹账通智能科技有限公司 User information discrimination method and apparatus, and device and computer readable storage medium
WO2021259393A2 (en) * 2021-01-08 2021-12-30 北京安德医智科技有限公司 Image processing method and apparatus, and electronic device
CN113989454A (en) * 2021-12-30 2022-01-28 山东省地质测绘院 Fusion method, device and system suitable for geological data and geographic information data
CN114359722A (en) * 2021-12-24 2022-04-15 北京卫星信息工程研究所 Method, device and equipment for identifying distribution range of special landform
CN114880314A (en) * 2022-05-23 2022-08-09 烟台聚禄信息科技有限公司 Big data cleaning decision-making method applying artificial intelligence strategy and AI processing system
CN115311301A (en) * 2022-10-12 2022-11-08 江苏银生新能源科技有限公司 PCB welding spot defect detection method
CN115620163A (en) * 2022-10-28 2023-01-17 西南交通大学 Semi-supervised learning deep cut valley intelligent identification method based on remote sensing image
CN115880455A (en) * 2021-09-26 2023-03-31 中国石油化工股份有限公司 Three-dimensional intelligent interpolation method based on deep learning
CN116203646A (en) * 2023-05-04 2023-06-02 山东省地质测绘院 Exploration data processing system for determining geological resource quantity
CN116385902A (en) * 2023-04-18 2023-07-04 赵永兰 Remote sensing big data processing method, system and cloud platform
CN116597389A (en) * 2023-07-18 2023-08-15 山东省地质测绘院 Geological disaster monitoring and early warning method based on image processing

Non-Patent Citations (4)

Title
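任晓霞; 喻孟良; 张鸣之; 陈一超; 韩明伟; 曾青石. Discussion on a geological environment big data framework based on the Hadoop distributed system. The Chinese Journal of Geological Hazard and Control. 2018, (01), 136-140+148. *
刘同文. Research and implementation of an information platform for comprehensive geological results based on geological big data. Beijing Surveying and Mapping. 2020, 814-818. *
杨少敏; 张戬; 高雅. Research on deep-learning-based interpretation technology for high-resolution remote sensing images. Jiangsu Science and Technology Information. 2020, (04), 42-46. *
杨洲. A clustering-based gray-level feature extraction algorithm for SAR images. Technology Innovation and Application. 2015, (19), 78. *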

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant