CN111126434B - Random forest-based method and system for automatic picking of microseismic first-arrival times - Google Patents

Publication number
CN111126434B
CN111126434B CN201911135141.3A CN201911135141A CN111126434B CN 111126434 B CN111126434 B CN 111126434B CN 201911135141 A CN201911135141 A CN 201911135141A CN 111126434 B CN111126434 B CN 111126434B
Authority
CN
China
Prior art keywords
data
microseismic
arrival
random forest
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911135141.3A
Other languages
Chinese (zh)
Other versions
CN111126434A (en)
Inventor
胡宾鑫
高煜
朱峰
张华�
宋广东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Laser Institute of Shandong Academy of Science
Original Assignee
Laser Institute of Shandong Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Laser Institute of Shandong Academy of Science
Priority to CN201911135141.3A
Publication of CN111126434A
Application granted
Publication of CN111126434B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V1/00 Seismology; Seismic or acoustic prospecting or detecting
    • G01V1/28 Processing seismic data, e.g. for interpretation or for event detection
    • G01V1/288 Event detection in seismic signals, e.g. microseismics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Environmental & Geological Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Evolutionary Biology (AREA)
  • Emergency Management (AREA)
  • Business, Economics & Management (AREA)
  • Acoustics & Sound (AREA)
  • Geology (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Geophysics (AREA)
  • Geophysics And Detection Of Objects (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The random forest-based method and system for automatic picking of microseismic first-arrival times acquire historical waveform data containing microseismic events; extract feature attribute data and feature category data from the historical waveform data to form a sample data set T; randomly divide the sample data set T into a training set and a verification set; randomly draw samples from the training set to build N decision trees; and aggregate the predictions of the N decision trees to generate a random forest model. The arrival time of the microseismic first arrival is then determined from the random forest model. The method requires no threshold to be set, only a feature category label for each microseismic feature sample, which reduces errors caused by choosing a threshold empirically. It also requires no characteristic function or time-window length to be chosen, which removes the influence of the window length on automatic picking of the first-arrival time. As a result, both the automation and the accuracy of first-arrival picking can be improved.

Description

Random forest-based method and system for automatic picking of microseismic first-arrival times
Technical Field
The application relates to the technical field of microseismic monitoring, and in particular to a random forest-based method and system for automatic picking of microseismic first-arrival times.
Background
Microseismic monitoring technology is widely used in fields such as safety monitoring of dams and mines and hydraulic fracturing monitoring in shale gas production, and has produced many notable research results. It uses a microseismic network for on-site, real-time detection and combines this with source location techniques to determine the time, location, and energy of microseismic events, enabling scientific qualitative and quantitative evaluation of the extent, stability, and development trend of rock mass deformation and failure.
When a microseismic event occurs, the geophones installed underground begin to receive signals. The first effective microseismic wave received by a geophone is the first arrival, and picking it accurately is a key step in source location. The energy ratio method is currently one of the most widely used methods for automatically identifying first arrivals. Its principle is that, in microseismic data, the first-arrival time is a dividing point: before it the record contains only noise, and after it the record contains noise mixed with the microseismic signal, so the energy characteristics of the time windows before and after the first arrival differ greatly. When the energy ratio of the windows before and after a given moment exceeds a set threshold, that moment is taken as the first-arrival time. In practice, a time window is chosen and divided into a long window and a short window; depending on the characteristic function used, the LTA (long-time-window average of the signal) and the STA (short-time-window average of the signal) can be expressed as:
[Equation image BDA0002279381270000011: expressions for LTA and STA]
However, this scheme requires a time window and a threshold to be set manually. The threshold is chosen empirically and does not adapt to different settings; in particular, when weak seismic signals from deep rock mass reach the sensor with energy comparable to that of the noise, missed detections become more serious. As for the window length, if it is too small, the window's attributes are dominated by the values of a few local samples and the picking result is unstable; if it is too large, genuine microseismic waves are overlooked. Manually setting the time window or threshold during computation therefore reduces the automation and accuracy of microseismic data processing.
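For illustration, the energy ratio (STA/LTA) idea described above might be sketched as follows. This is a minimal sketch, not the formula from the equation image: the characteristic function (squared amplitude), the window placement (long window before the sample, short window starting at it), and the names `sta_lta_pick`, `n_sta`, `n_lta`, and `threshold` are all assumptions made here.

```python
def sta_lta_pick(signal, n_sta, n_lta, threshold):
    """Return the first index whose STA/LTA ratio exceeds threshold, or None."""
    # Assumed characteristic function: squared amplitude (energy) per sample.
    cf = [x * x for x in signal]

    # Prefix sums make each window average an O(1) computation.
    prefix = [0.0]
    for value in cf:
        prefix.append(prefix[-1] + value)

    for i in range(n_lta, len(cf) - n_sta + 1):
        lta = (prefix[i] - prefix[i - n_lta]) / n_lta   # long window before sample i
        sta = (prefix[i + n_sta] - prefix[i]) / n_sta   # short window starting at sample i
        if lta > 0 and sta / lta > threshold:
            return i
    return None
```

Because the short window looks ahead, the returned index can precede the true onset by up to `n_sta - 1` samples, and the result changes with `n_sta`, `n_lta`, and `threshold`, which is the sensitivity the paragraph above describes.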
Disclosure of Invention
The application provides a random forest-based method and system for automatic picking of microseismic first-arrival times, to solve the technical problem that the picking of microseismic first-arrival times is insufficiently automated and accurate.
To solve the above technical problem, the embodiments of the application disclose the following technical solutions:
the application provides a random forest-based method for automatic picking of microseismic first-arrival times, comprising the following steps:
acquiring historical waveform data containing microseismic events;
extracting feature attribute data and feature category data from the historical waveform data, wherein the feature attribute data and the feature category data form a sample data set T, and the sample data set T comprises a training set and a verification set;
randomly drawing data from the training set with replacement to build N decision trees, randomly selecting m (m ≤ M) feature attributes from the M feature attributes for each decision tree, and selecting the optimal attribute among the m attributes according to the minimum Gini coefficient principle to split internal nodes until the Gini coefficient is 0;
aggregating the predictions of the N decision trees to generate a random forest model;
tuning the parameters of the random forest model on the verification set to obtain an optimized random forest model;
inputting a test data set containing a microseismic event into the optimized random forest model, and outputting, for each sampling point of each test sample, the probability of being assigned to each class label;
when the probability is greater than 0.5, determining that the sampling point belongs to a microseismic wave; when the probability is less than 0.5, determining that it does not;
and extracting the first data point whose probability is greater than or equal to 0.5, the first data point being the arrival time of the microseismic first arrival.
Optionally, the training set and the verification set account for 70% and 30% of the sample data set T, respectively.
Optionally, the feature attributes include amplitude, energy, and the amplitude ratio of adjacent moments.
Optionally, extracting feature category data from the historical waveform data includes:
labeling the data from the arrival time of the microseismic first arrival to the end of the microseismic wave as 1;
labeling the remaining data as 0.
Optionally, the method includes the following:
the random forest is composed of N decision trees, each built from sample data randomly drawn from the training set.
Optionally, randomly drawing data from the training set with replacement to build N decision trees, randomly selecting m (m ≤ M) feature attributes from the M feature attributes for each decision tree, and selecting the optimal attribute among the m attributes according to the minimum Gini coefficient principle to split internal nodes until the Gini coefficient is 0 includes:
randomly drawing data from the training set with replacement to build N decision trees, and randomly selecting 2 of the 3 feature attributes for each decision tree;
calculating the Gini coefficient of each of the 2 feature attributes;
and selecting the feature attribute with the smallest Gini coefficient to split internal nodes until the Gini coefficient is 0.
Optionally, calculating the Gini coefficients of the 2 feature attributes includes:
calculating the Gini coefficient of a feature attribute according to
$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^{2}$$
wherein:
t denotes a given node, i denotes any label class, and p(i|t) denotes the proportion of label class i at node t.
Optionally, tuning the parameters of the random forest model on the verification set to obtain an optimized random forest model includes:
the parameters include the number of decision trees in the random forest and the maximum depth of the decision trees.
Optionally, determining that the sampling point belongs to a microseismic wave when the probability is greater than 0.5, and that it does not when the probability is less than 0.5, includes:
when the probability of output 1 among the results generated by the N decision trees is greater than 0.5, determining that the sampling point belongs to a microseismic wave;
and when the probability of output 1 among the results generated by the N decision trees is less than 0.5, determining that the sampling point does not belong to a microseismic wave.
In a second aspect, based on the foregoing random forest-based method for automatic picking of microseismic first-arrival times, the application further provides a random forest-based system for automatic picking of microseismic first-arrival times, comprising:
a historical data acquisition module, configured to acquire historical waveform data containing microseismic events;
a training set and verification set acquisition module, configured to extract feature attribute data and feature category data from the historical waveform data, wherein the feature attribute data and the feature category data form a sample data set T, and the sample data set T comprises a training set and a verification set;
a decision tree generation module, configured to randomly draw data from the training set with replacement to build N decision trees, randomly select m (m ≤ M) feature attributes from the M feature attributes for each decision tree, and select the optimal attribute among the m attributes according to the minimum Gini coefficient principle to split internal nodes until the Gini coefficient is 0;
a random forest generation module, configured to aggregate the predictions of the N decision trees to generate a random forest model;
a random forest optimization module, configured to tune the parameters of the random forest model on the verification set to obtain an optimized random forest model;
a probability output module, configured to input a test data set containing a microseismic event into the optimized random forest model and output, for each sampling point of each test sample, the probability of being assigned to each class label;
a microseismic wave determination module, configured to determine that a sampling point belongs to a microseismic wave when the probability is greater than 0.5, and that it does not when the probability is less than 0.5;
and a microseismic first-arrival extraction module, configured to extract the first data point whose probability is greater than or equal to 0.5, the first data point being the arrival time of the microseismic first arrival.
Compared with the prior art, the beneficial effects of this application are:
According to the above technical solutions, in the random forest-based method and system for automatic picking of microseismic first-arrival times, historical waveform data containing microseismic events are acquired; feature attribute data and feature category data are extracted from the historical waveform data to form a sample data set T, which comprises a training set and a verification set; data are randomly drawn from the training set with replacement to build N decision trees, and the predictions of the N decision trees are aggregated to generate a random forest model; then, based on the random forest model, the class of each test sample in the microseismic test set is determined from the probability value returned for it, and the arrival time of the microseismic first arrival is determined accordingly.
Compared with the STA/LTA method, the method has the following advantages. First, no threshold needs to be set; only a feature category label is required for each microseismic feature sample, which reduces errors caused by choosing a threshold empirically. Second, it is simpler to implement: no characteristic function or time-window length needs to be chosen, which removes the influence of the window length on automatic picking of the first-arrival time. Finally, both the automation and the accuracy of first-arrival picking can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
To illustrate the technical solutions of the application more clearly, the drawings used in the embodiments are briefly described below. It will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of the random forest-based method for automatic picking of microseismic first-arrival times;
FIG. 2 is a schematic diagram of the result of processing historical waveform data containing a microseismic event by manual picking in an embodiment of the invention;
FIG. 3 is a schematic diagram of the splitting procedure executed by each decision tree in the random forest in an embodiment of the invention;
FIG. 4 is a schematic diagram of the microseismic first-arrival time picked by the random forest in an embodiment of the invention;
FIG. 5 is a schematic diagram of the microseismic first-arrival time picked manually in an embodiment of the invention;
FIG. 6 is a schematic diagram of the microseismic first-arrival time picked by the STA/LTA method in an embodiment of the invention.
Detailed Description
To help those skilled in the art better understand the technical solutions of the application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art from the embodiments herein without inventive effort fall within the scope of protection of the application.
FIG. 1 shows a schematic flow chart of the random forest-based method for automatic picking of microseismic first-arrival times according to an embodiment of the invention. The method is described below with reference to FIG. 1.
As shown in FIG. 1, the application provides a random forest-based method for automatic picking of microseismic first-arrival times, which includes:
S110: Acquire historical waveform data containing microseismic events.
Microseismic sample data are acquired, and the arrival time and end time of the microseismic first arrival are recorded; the relevant microseismic feature attributes are extracted, and each sample is given a feature category label, forming a microseismic feature sample data set, which is then randomly divided into a training set and a verification set. The specific operations are as follows:
Waveform data of a microseismic event are acquired. The data were recorded by microseismic sensors installed in deep boreholes in the underground roadways of a coal mine, at a sampling frequency of 2000 Hz (a sampling interval of 0.0005 s); each channel lasts 4 seconds and therefore contains 8000 samples. The waveform data of a channel containing the microseismic event are selected, and the arrival time of the microseismic first arrival and the end time of the microseismic wave in that channel are recorded. FIG. 2 is a schematic diagram of the result of processing historical waveform data containing a microseismic event by manual picking in an embodiment of the invention; it shows the selected channel 1 waveform. Manual picking places the microseismic first arrival at 1.918 s, the left dashed line in FIG. 2; with a sampling interval of 0.0005 s, this is the 3836th sampling point. The microseismic wave ends at 2.251 s, the right dashed line in FIG. 2, which is the 4502nd sampling point.
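The conversion between pick times and sampling points used above is simple arithmetic; a minimal sketch follows. The function names are hypothetical, and the 2000 Hz default is the sampling frequency stated in this embodiment.

```python
SAMPLING_RATE_HZ = 2000  # sampling frequency stated in the embodiment


def time_to_sample(t_seconds, fs=SAMPLING_RATE_HZ):
    """Convert a time in seconds to the nearest sampling-point number."""
    return round(t_seconds * fs)


def sample_to_time(index, fs=SAMPLING_RATE_HZ):
    """Convert a sampling-point number back to a time in seconds."""
    return index / fs


first_arrival = time_to_sample(1.918)  # manual first-arrival pick in FIG. 2
event_end = time_to_sample(2.251)      # end of the microseismic wave in FIG. 2
channel_length = 4 * SAMPLING_RATE_HZ  # 4-second channel
```

With the values from the text, this gives sampling points 3836 and 4502 and a channel length of 8000 samples.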
S120: feature attribute data and feature class data are extracted from the historical waveform data, the feature attribute data and the feature class data form a sample data set T, and the sample data set T comprises a training set and a verification set.
When a microseismic event occurs, the main characteristics of microseismic records change, particularly the amplitude of a microseismic waveform is obviously increased at the moment when a first arrival wave arrives, and the energy at the moment is also obviously increased, so that the amplitude, the energy and the amplitude ratio of sampling points at adjacent moments are extracted as the characteristics; and then, carrying out characteristic category marking on each microseismic data sample, if the data sampling point of the microseismic first arrival wave is directly marked as a 1 label, and the rest data sampling points are marked as 0 labels, then sample distribution imbalance occurs to cause non-ideal training effect, and the number of positive samples is required to be enlarged, so that the 1 label is marked on the microseismic first arrival wave to the end time of the microseismic first arrival wave, namely the 3836 th sampling point to the 4502 th sampling point, and the rest data sampling points are marked as 0 labels, thereby forming a microseismic characteristic sample data set T and storing the microseismic characteristic sample data set T in a CSV format.
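The feature extraction and labeling described above might look like the following sketch. The text names the three features but does not define them precisely, so the definitions here are assumptions: amplitude as the absolute sample value, energy as the squared sample value, and the adjacent-moment amplitude ratio as the current amplitude divided by the previous one. The function names, the `eps` guard, and the inclusive treatment of the label range are likewise hypothetical.

```python
def extract_features(waveform, eps=1e-12):
    """Return one (amplitude, energy, adjacent_ratio) tuple per sample."""
    rows = []
    for i, x in enumerate(waveform):
        amplitude = abs(x)   # assumed definition
        energy = x * x       # assumed definition
        # The first sample has no predecessor, so its own amplitude is reused.
        previous = abs(waveform[i - 1]) if i > 0 else amplitude
        ratio = amplitude / (previous + eps)  # eps avoids division by zero
        rows.append((amplitude, energy, ratio))
    return rows


def label_samples(n_samples, first_arrival, event_end):
    """Label sampling points in [first_arrival, event_end] as 1 and all others as 0."""
    return [1 if first_arrival <= i <= event_end else 0 for i in range(n_samples)]


labels = label_samples(8000, 3836, 4502)
```

Labeling the whole interval rather than a single point yields several hundred positive samples out of 8000, which is the enlargement of the positive class that the paragraph calls for.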
The sample data set T is randomly divided into a training set and a verification set, which account for 70% and 30% of T, respectively.
The sample set T contains 8000 microseismic feature samples, each with 3 feature attributes: amplitude, energy, and the amplitude ratio of adjacent moments. From the training set produced by the split, which contains 5600 samples, 5600 draws are made at random with replacement: one sample is selected at random, returned to the set, and the next is drawn. The 5600 selected samples are used to train a decision tree.
The purpose of random sampling with replacement in this application is as follows. If sampling were done without replacement, the training samples of the trees would all be different, with no overlap, so each tree would be biased; that is, the trees would differ greatly from one another after training. The final classification of the random forest depends on the votes of many trees, and this vote should seek common ground. Training each tree on a completely different training set therefore does not help the final classification result and is akin to the blind men and the elephant.
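The 70/30 split and the draw with replacement can be sketched as below. This is an illustrative sketch only: the function names and the fixed `seed` arguments are assumptions, and integers stand in for the feature samples.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into a training set and a verification set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


def bootstrap_sample(train_set, seed=0):
    """Draw len(train_set) samples from train_set with replacement."""
    rng = random.Random(seed)
    return [rng.choice(train_set) for _ in range(len(train_set))]


samples = list(range(8000))  # stand-ins for the 8000 feature samples
train, validation = split_train_validation(samples)
tree_training_data = bootstrap_sample(train)
```

Each tree would receive its own bootstrap sample (for example, by varying `seed`), so the trees see overlapping but not identical data.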
S130: randomly extracting data from the training set with a place of return to establish N decision trees, randomly extracting M ((M is less than or equal to M)) characteristic attributes from the M characteristic attributes for each decision tree, and selecting the optimal attribute from the M attributes according to the minimum coefficient principle to perform internal node splitting until the coefficient is 0.
S140: and collecting the prediction results of the N decision trees to generate a random forest model.
In this embodiment of the present application, two random extraction are adopted, where the first random extraction is that the above-mentioned 5600 samples are put back for 5600 times, and the second random extraction is that M ((M is less than or equal to M)) feature attributes are randomly extracted from the M feature attributes, and two random extraction can achieve that when extracting more microseismic data features, feature selection is not required, and dimension reduction processing is not required.
When each node of the decision tree node splits, 2 nodes are randomly selected from 3 characteristic attributes, then the coefficient of the characteristic is calculated one by one, the minimum criterion of the coefficient of the characteristic is used as the splitting criterion of the node, and the current decision tree is split into a left subtree and a right subtree at the node according to the splitting function. Questions are asked of data based on a value of a certain feature, each question has a true or false answer to be node split, and the data moves downwards accordingly according to the answers. FIG. 3 is a schematic diagram of a flow chart for executing splitting of each decision tree in a random forest according to an embodiment of the present invention; as can be seen from fig. 3, the splitting is performed in the root node according to the characteristic attribute of "energy", and the coefficient of the base of the characteristic attribute of energy in the root node is 0.141, so that the coefficient of base of the characteristic attribute of energy is the smallest compared with the random selection of other characteristic attributes, wherein the coefficient of base is an index of the uncertainty, the larger the coefficient of base is, the smaller the uncertainty is, the selection criterion is that each child node reaches the highest purity, namely, all the characteristic attributes falling in the child nodes belong to the same class, and the coefficient of base is the smallest, the purity is the highest, and the uncertainty is the smallest.
The Gini coefficient of a feature attribute is calculated according to
$$\mathrm{Gini}(t) = 1 - \sum_{i} p(i \mid t)^{2}$$
where t denotes a given node, i denotes any label class, and p(i|t) denotes the proportion of label class i at node t.
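As a worked check, the Gini coefficient can be computed from the class counts at a node. The sketch assumes the standard Gini impurity, one minus the sum of the squared class proportions p(i|t); the function name `gini` is hypothetical. The counts used are the root-node values given for FIG. 3 in this description (5173 samples labeled 0 and 427 labeled 1).

```python
def gini(class_counts):
    """Gini coefficient of a node, given the number of samples in each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


root_gini = gini([5173, 427])  # about 0.141
pure_gini = gini([0, 10])      # 0.0 for a node containing a single class
```

The root-node value rounds to 0.141, which agrees with the Gini coefficient quoted for the root node of FIG. 3, and a node containing only one class gives 0, the stopping condition for splitting.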
The above steps are repeated until each tree has classified the microseismic data samples drawn for it. As shown in the left and right boxes of the third row of FIG. 3, where the Gini coefficient is 0, a leaf node has been reached and the final classification, class 1, is obtained.
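One way to picture a single node split is the sketch below. It is not taken from the patent: candidate thresholds at midpoints between sorted feature values and the sample-weighted Gini coefficient of the two children are the usual CART conventions and are assumed here, as are the names `best_split` and `n_candidates`.

```python
import random


def gini(labels):
    """Gini coefficient of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(rows, labels, n_candidates=2, rng=None):
    """Pick the (weighted_gini, feature_index, threshold) with the lowest Gini
    among n_candidates randomly chosen features; None if no split is possible."""
    rng = rng or random.Random(0)
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    best = None
    for f in candidates:
        values = sorted(set(row[f] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # assumed: midpoint between neighbours
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best
```

Applying `best_split` recursively to the left and right subsets until a node's Gini coefficient is 0, or a maximum depth is reached, would grow one tree.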
Multiple decision trees are built to form a random forest.
The random forest consists of N decision trees, each generated from a random draw from the training set. The trees are trained independently on their own microseismic sample data, so training can be parallelized and is faster.
S150: Tune the parameters of the random forest model on the verification set to obtain an optimized random forest model.
Finally, the trained random forest model is checked against the verification data set, and training ends when the verification result meets the preset accuracy requirement. In this embodiment, two model parameters are tuned: the number of decision trees in the random forest and the maximum depth of the trees. By adjusting these two parameters repeatedly, the model accuracy improves and gradually stabilizes. The final choice is 137 decision trees with a maximum depth of 6, giving an accuracy of 98.5% on the verification set.
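The text does not say how the two parameters were searched; one simple possibility is an exhaustive grid search, sketched below. The `evaluate` callable is hypothetical: it stands for "train a forest with this number of trees and this maximum depth on the training set and return its accuracy on the verification set".

```python
from itertools import product


def grid_search(evaluate, tree_counts, max_depths):
    """Return (accuracy, n_trees, max_depth) for the best-scoring parameter pair."""
    best = None
    for n_trees, max_depth in product(tree_counts, max_depths):
        accuracy = evaluate(n_trees, max_depth)
        if best is None or accuracy > best[0]:
            best = (accuracy, n_trees, max_depth)
    return best
```

Because ties are resolved in favour of the first pair tried, listing smaller values first favours the smaller forest once the accuracy has stabilized.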
S160: Input a test data set containing a microseismic event into the optimized random forest model, and output, for each sampling point of each test sample, the probability of being assigned to each class label.
S170: When the probability is greater than 0.5, the sampling point is judged to belong to a microseismic wave; when the probability is less than 0.5, it is judged not to.
S180: Extract the first data point whose probability is greater than or equal to 0.5; this data point is the arrival time of the microseismic first arrival.
A random forest increases the diversity among its classification models by constructing different training sets of microseismic data. Because two random processes are involved in training each decision tree, the trees can be grown to maximum depth and overfitting is unlikely to occur during training. When the random forest classifies a new microseismic test sample, each decision tree makes its own class judgment on whether the sample is a microseismic wave. Each tree's classification can be regarded as a vote; the random forest combines the votes of all trees and obtains the final classification by majority voting: the class with the most votes is the classification result for the test sample.
The data of a channel containing a microseismic event are selected as test data. The amplitude, the energy, and the amplitude ratio of adjacent moments of the waveform data are extracted as features to form a test data set, which is saved as a CSV file and input into the trained random forest model. The model returns, for each sampling point of each test sample, the probability of belonging to each class label; with several classes, several probabilities are returned. Because there are two classes here, microseismic wave and non-microseismic wave, a returned value greater than 0.5 is classified as 1 (microseismic wave) and a value less than 0.5 as 0 (not a microseismic wave). The first data point whose returned probability is greater than 0.5 is taken as the arrival time of the microseismic first arrival. In this way, the application determines the class of each test sample in the microseismic test set from the probability value returned for it, and from that determines the arrival time of the microseismic first arrival.
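Given the per-sample probabilities of class 1 returned by the model, the pick itself is a scan for the first value above 0.5. A minimal sketch follows; the function name is hypothetical, and the default sampling frequency is the 2000 Hz stated for this embodiment.

```python
def pick_first_arrival(probabilities, fs=2000, threshold=0.5):
    """Return (sample_index, time_in_seconds) of the first probability above
    threshold, or None if no sampling point qualifies."""
    for index, probability in enumerate(probabilities):
        if probability > threshold:
            return index, index / fs
    return None
```

For a probability trace that first exceeds 0.5 at index 2582, the function returns that index and a time of 1.291 s.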
Specifically, in fig. 3, samples: the number of observations in the node;
value: the number of samples in each class. For example, in the root node, 5173 samples belong to the class with label 0 and 427 samples belong to the class with label 1;
class: the class of the majority of points in the node; for example, most points in the root node are classified as label 0. For a leaf node, this is the prediction for all samples in the node; for example, the leftmost leaf node in the third row of the figure has predicted class label 1.
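As a worked example, the impurity of the root node can be computed from its class counts. The sketch below assumes the standard Gini impurity, 1 minus the sum of squared class proportions p(i|t); the function name `gini` is hypothetical, and the result is not claimed to match any value printed in fig. 3.

```python
def gini(counts):
    """Gini impurity of a node from its per-class sample counts.

    Assumes the standard definition: 1 - sum_i p(i|t)^2,
    where p(i|t) is the proportion of class i in node t.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Root node of the example: 5173 samples with label 0, 427 with label 1
root_gini = gini([5173, 427])

# A pure node (all samples in one class) has impurity 0,
# which is the stopping condition for splitting
pure_gini = gini([0, 120])
```

For the root node this gives roughly 0.141, which reflects the strong class imbalance: most sampling points are not microseismic waves.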
Suppose that, in the embodiment of the present application, 10 trees are built. If, for one sample, 8 trees output class 1 and 2 trees output class 0, the overall voting result is 1. The probabilities output for this sample are 0.8 for class 1 and 0.2 for class 0. Because 0.8 is greater than 0.5, the sample is classified as 1, and it is determined that the sample contains a microseismic event.
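The voting example above can be reproduced with a few lines of code. This is a minimal sketch; the function name `forest_vote` is hypothetical, and a tie at exactly 0.5 is resolved to class 0 here, which is an assumption.

```python
from collections import Counter


def forest_vote(tree_outputs):
    """Turn per-tree class outputs (0 or 1) into class probabilities and a label."""
    counts = Counter(tree_outputs)
    n_trees = len(tree_outputs)
    # Probability of a class = fraction of trees that voted for it
    probs = {label: counts.get(label, 0) / n_trees for label in (0, 1)}
    # Assumption: a probability of exactly 0.5 is not counted as class 1
    label = 1 if probs[1] > 0.5 else 0
    return probs, label


# 10 trees: 8 vote for class 1, 2 vote for class 0
probs, label = forest_vote([1] * 8 + [0] * 2)
```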
In the embodiment of the application, fig. 4 is a schematic diagram of a time point result of arrival time of a microseismic first arrival based on random forest pickup in the embodiment of the invention; FIG. 5 is a schematic diagram of a time point result of arrival time of a microseismic first arrival picked up based on a manual method in the embodiment of the invention; fig. 6 is a schematic diagram of a time point result of arrival of a microseismic first arrival based on STA/LTA pickup according to an embodiment of the present invention.
Fig. 4 shows the waveform of the selected channel 3 test data. The random forest method picks the arrival time of the microseismic first arrival at the 2582nd sampling point, i.e. 1.291 s (marked by the dotted line). Fig. 5 shows the manually marked arrival time of the microseismic first arrival, which is the 2579th sampling point, i.e. 1.2895 s (marked by the dotted line). Fig. 6 shows the arrival time of the microseismic first arrival obtained by the STA/LTA method, which is the 2588th sampling point, i.e. 1.2941 s (marked by the dotted line). Taking the manually picked sampling point as the reference, the random forest method picks the arrival time of the microseismic first arrival more accurately than the STA/LTA method: the random forest result differs from the manual pick by 1.5 ms, whereas the STA/LTA result differs from it by 4.6 ms.
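The arithmetic behind this comparison can be checked as follows. The sampling rate of 2000 Hz is not stated in this passage; it is inferred here from the pairing of the 2582nd sampling point with 1.291 s, so treat it as an assumption. The pick errors are computed from the times as stated in the text.

```python
SAMPLE_RATE_HZ = 2000  # assumption: inferred from 2582 samples <-> 1.291 s


def index_to_time(sample_index):
    """Convert a sampling-point index to seconds."""
    return sample_index / SAMPLE_RATE_HZ


# Sample indices quoted for the random forest and manual picks
rf_time_from_index = index_to_time(2582)
manual_time_from_index = index_to_time(2579)

# Arrival times in seconds, as stated in the text
manual_s = 1.2895
random_forest_s = 1.291
sta_lta_s = 1.2941

# Deviation from the manual reference pick, in milliseconds
rf_error_ms = round((random_forest_s - manual_s) * 1000, 3)
sta_lta_error_ms = round((sta_lta_s - manual_s) * 1000, 3)
```

Under the assumed sampling rate, the 1.5 ms deviation of the random forest pick corresponds to 3 sampling points.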
According to the above technical scheme, in the random forest-based automatic pickup method and system for the arrival time of the microseismic first arrival, historical waveform data containing microseismic events is acquired; characteristic attribute data and characteristic category data are extracted from the historical waveform data and together form a sample data set T, which includes a training set and a verification set; data is randomly drawn from the training set with replacement to establish N decision trees, and the prediction results of the N decision trees are aggregated to generate a random forest model; then, based on the random forest model, the category of each sample is determined from the probability value returned for each test sample in the microseismic test set, and the arrival time of the microseismic first arrival is determined accordingly.
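The two random processes used when building the trees (drawing training rows with replacement, and drawing a subset of the characteristic attributes for each tree) can be sketched as below. This only prepares the inputs for each tree and does not train anything; the function name `draw_tree_inputs`, the attribute names and the fixed seed are hypothetical.

```python
import random


def draw_tree_inputs(n_samples, feature_names, n_trees, m, seed=0):
    """For each tree, draw a bootstrap sample of row indices and m attributes.

    Rows are drawn WITH replacement, so a row index may repeat.
    Attributes are drawn without repetition, m <= len(feature_names).
    """
    if m > len(feature_names):
        raise ValueError("m must not exceed the number of attributes M")
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        feats = rng.sample(feature_names, m)
        plans.append((rows, feats))
    return plans


# N = 10 trees, M = 3 attributes, m = 2 attributes per tree
plans = draw_tree_inputs(
    n_samples=100,
    feature_names=["amplitude", "energy", "amp_ratio"],
    n_trees=10,
    m=2,
)
```

Because every tree sees a different bootstrap sample and attribute subset, the trees disagree on some samples, which is what makes the aggregated vote more robust than any single tree.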
Compared with the STA/LTA method, the method has the following advantages. First, no threshold needs to be set; only the characteristic category of each microseismic data feature sample needs to be labeled, which reduces the errors caused by choosing a threshold empirically. Second, it is simpler to implement: no characteristic function or time window length needs to be set, which removes the influence of different time window lengths on the automatic pickup of the arrival time of the microseismic first arrival. Finally, the automation and accuracy of picking up the arrival time of the microseismic first arrival can be improved.
Based on the random forest-based automatic pickup method for the arrival time of the microseismic first arrival provided by this application, this application further provides a random forest-based automatic pickup system for the arrival time of the microseismic first arrival, including:
the historical data acquisition module is used for acquiring historical waveform data containing microseismic events;
the training set and verification set acquisition module is used for extracting characteristic attribute data and characteristic category data from the historical waveform data, wherein the characteristic attribute data and the characteristic category data form a sample data set T, and the sample data set T comprises a training set and a verification set;
the decision tree generation module is used for randomly extracting data from the training set with replacement to establish N decision trees, randomly extracting m (m ≤ M) characteristic attributes from the M characteristic attributes for each decision tree, and selecting the optimal attribute from the m attributes according to the minimum Gini coefficient principle to perform internal node splitting until the Gini coefficient is 0;
the random forest generation module is used for gathering the prediction results of the N decision trees to generate a random forest model;
the random forest optimization module is used for carrying out parameter adjustment on the random forest model according to the verification set to obtain an optimized random forest model;
the probability output module is used for inputting a test data set containing a microseismic event into the optimized random forest model and outputting the probability of being divided into each type of label corresponding to each test sample sampling point;
the microseismic wave judging module is used for judging that the data is a microseismic wave when the probability is greater than 0.5, and judging that it is not a microseismic wave when the probability is less than 0.5;
and the microseismic first arrival wave extraction module is used for extracting a first data point with the probability larger than or equal to 0.5, and the first data point is the arrival time of the microseismic first arrival wave.
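The last two modules (thresholding the probabilities and extracting the first arrival) can be sketched as follows. The function names, the zero-based index convention and the 2000 Hz sampling rate in the example are assumptions made for illustration; the `>=` comparison follows the wording of the extraction module above.

```python
def classify_points(probabilities, threshold=0.5):
    """Label each sampling point: 1 = microseismic wave, 0 = not."""
    return [1 if p > threshold else 0 for p in probabilities]


def pick_first_arrival(probabilities, sample_rate_hz, threshold=0.5):
    """Return (index, time_in_seconds) of the first point with p >= threshold.

    Returns None when no sampling point reaches the threshold.
    Indices are zero-based here, which is an assumption.
    """
    for idx, p in enumerate(probabilities):
        if p >= threshold:
            return idx, idx / sample_rate_hz
    return None


# Toy class-1 probabilities for six consecutive sampling points
example_probs = [0.1, 0.2, 0.4, 0.7, 0.9, 0.3]
labels = classify_points(example_probs)
pick = pick_first_arrival(example_probs, sample_rate_hz=2000)
```

Note that a single noisy point above the threshold would be returned as the pick in this sketch; any smoothing or persistence check would be an addition beyond what is described here.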
The embodiments in this specification are described with reference to one another. Identical or similar parts of different embodiments may be cross-referenced and are not described again in detail here.
It should be noted that in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a circuit structure, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, an element introduced by the statement "comprises a …" does not exclude the presence of additional identical elements in the circuit structure, article, or apparatus that comprises the element.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure of the invention herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
The above-described embodiments of the present application are not intended to limit the scope of the present application.

Claims (8)

1. A random forest-based automatic pickup method for the arrival time of a microseismic first arrival, characterized by comprising the following steps:
acquiring historical waveform data containing microseismic events;
extracting feature attribute data and feature class data from the historical waveform data, wherein the extracting feature class data from the historical waveform data comprises: marking the data from the arrival time of the first arrival of the microseism to the end of the microseism as 1; marking the rest data as 0, wherein the characteristic attribute data and the characteristic category data form a sample data set T, and the sample data set T comprises a training set and a verification set;
randomly extracting data from the training set with replacement to establish N decision trees, randomly extracting m characteristic attributes from the M characteristic attributes for each decision tree, and selecting the optimal attribute from the m attributes according to the minimum Gini coefficient principle to perform internal node splitting until the Gini coefficient is 0, wherein m is less than or equal to M;
collecting the prediction results of N decision trees to generate a random forest model;
performing parameter adjustment on the random forest model according to the verification set to obtain an optimized random forest model;
inputting a test data set containing a microseismic event into the optimized random forest model, and outputting the probability of being divided into each type of label corresponding to each test sample sampling point;
when the probability value of output 1 in the results generated by the N decision trees is greater than 0.5, judging that the data is a microseismic wave; when the probability value of output 1 in the results generated by the N decision trees is less than 0.5, judging that the data is not a microseismic wave;
and extracting a first data point with the probability larger than or equal to 0.5, wherein the first data point is the arrival time of the first arrival of the microseismic.
2. The random forest based automatic picking up method for arrival time of microseismic first arrival as set forth in claim 1, wherein the proportion of the training set and the validation set to the sample data set T is 70% and 30%, respectively.
3. The random forest based microseismic first arrival time automatic picking method according to claim 1, wherein the characteristic properties comprise amplitude, energy and adjacent moment amplitude ratio.
4. The random forest based automatic picking up method for arrival time of microseismic first arrival as claimed in claim 1, wherein:
the random forest is composed of N decision trees, each built from sample data randomly drawn from the training set.
5. The automatic picking up method for arrival time of microseismic first arrival based on random forest according to claim 1, wherein said randomly extracting data from said training set with replacement to establish N decision trees, randomly extracting m characteristic attributes from the M characteristic attributes for each decision tree, and selecting the optimal attribute from the m attributes according to the minimum Gini coefficient principle to perform internal node splitting until the Gini coefficient is 0, wherein m is less than or equal to M, includes:
randomly extracting data from the training set with replacement to establish N decision trees, and randomly selecting 2 characteristic attributes from the 3 characteristic attributes for each decision tree;
respectively calculating the Gini coefficients of the 2 characteristic attributes;
and selecting the characteristic attribute with the minimum Gini coefficient, and performing internal node splitting until the Gini coefficient is 0.
6. The random forest based automatic picking up method for arrival time of microseismic first arrival as set forth in claim 5, wherein said respectively calculating the Gini coefficients of said 2 characteristic attributes comprises:
calculating the Gini coefficient of a characteristic attribute according to
$$\mathrm{Gini}(t) = 1 - \sum_{i} \left[ p(i \mid t) \right]^{2}$$
wherein:
t represents a given node, i represents any label class, and p(i|t) represents the proportion of label class i on node t.
7. The automatic picking up method for arrival time of microseismic first arrival based on random forest according to claim 1, wherein said performing parameter adjustment on the random forest model according to the verification set to obtain an optimized random forest model comprises:
the parameters include the number of decision trees contained in the random forest and the maximum depth of the decision trees.
8. A random forest-based microseismic first arrival time automatic pickup system, comprising:
the historical data acquisition module is used for acquiring historical waveform data containing microseismic events;
the training set and verification set acquisition module is used for extracting feature attribute data and feature category data from the historical waveform data, wherein the feature category data extraction from the historical waveform data comprises the following steps: marking the data from the arrival time of the first arrival of the microseism to the end of the microseism as 1; marking the rest data as 0, wherein the characteristic attribute data and the characteristic category data form a sample data set T, and the sample data set T comprises a training set and a verification set;
a decision tree generating module for randomly extracting data from the training set with replacement to establish N decision trees, randomly extracting m characteristic attributes from the M characteristic attributes for each decision tree, and selecting the optimal attribute from the m attributes according to the minimum Gini coefficient principle to perform internal node splitting until the Gini coefficient is 0, wherein m is less than or equal to M;
the random forest generation module is used for gathering the prediction results of the N decision trees to generate a random forest model;
the random forest optimization module is used for carrying out parameter adjustment on the random forest model according to the verification set to obtain an optimized random forest model;
the probability output module is used for inputting a test data set containing a microseismic event into the optimized random forest model and outputting the probability of being divided into each type of label corresponding to each test sample sampling point;
the microseismic wave judging module is used for judging that the data is a microseismic wave when the probability value of output 1 in the results generated by the N decision trees is greater than 0.5, and judging that the data is not a microseismic wave when the probability value of output 1 in the results generated by the N decision trees is less than 0.5;
and the microseismic first arrival wave extraction module is used for extracting a first data point with the probability larger than or equal to 0.5, and the first data point is the arrival time of the microseismic first arrival wave.
CN201911135141.3A 2019-11-19 2019-11-19 Random forest-based automatic pickup method and system for arrival time of microseismic first arrival Active CN111126434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911135141.3A CN111126434B (en) 2019-11-19 2019-11-19 Random forest-based automatic pickup method and system for arrival time of microseismic first arrival

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911135141.3A CN111126434B (en) 2019-11-19 2019-11-19 Random forest-based automatic pickup method and system for arrival time of microseismic first arrival

Publications (2)

Publication Number Publication Date
CN111126434A CN111126434A (en) 2020-05-08
CN111126434B true CN111126434B (en) 2023-07-11

Family

ID=70495809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911135141.3A Active CN111126434B (en) 2019-11-19 2019-11-19 Random forest-based automatic pickup method and system for arrival time of microseismic first arrival

Country Status (1)

Country Link
CN (1) CN111126434B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112505781A (en) * 2020-10-28 2021-03-16 中国石油天然气集团有限公司 Image processing method and device for seismic acquisition first-arrival picking
CN112348831B (en) * 2020-11-05 2022-11-11 中国石油大学(华东) Shale SEM image segmentation method based on machine learning
CN112966434B (en) * 2021-02-26 2023-06-23 四化信息科技(深圳)有限公司 Random forest sudden fault early warning method based on sliding window
CN113744869B (en) * 2021-09-07 2024-03-26 中国医科大学附属盛京医院 Method for establishing early screening light chain type amyloidosis based on machine learning and application thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110261900A (en) * 2019-06-10 2019-09-20 中北大学 A kind of underground shallow layer microseism positioning system based on velocity information

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292186B (en) * 2016-03-31 2021-01-12 阿里巴巴集团控股有限公司 Model training method and device based on random forest
CN106255116A (en) * 2016-08-24 2016-12-21 王瀚辰 A kind of recognition methods harassing number
CN106405640B (en) * 2016-08-26 2018-07-10 中国矿业大学(北京) Microseismic signals based on depth conviction neural network then automatic pick method
CN108388860B (en) * 2018-02-12 2020-04-28 大连理工大学 Aero-engine rolling bearing fault diagnosis method based on power entropy spectrum-random forest
CN108549954B (en) * 2018-03-26 2022-08-02 平安科技(深圳)有限公司 Risk model training method, risk identification device, risk identification equipment and risk identification medium

Also Published As

Publication number Publication date
CN111126434A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN111126434B (en) Random forest-based automatic pickup method and system for arrival time of microseismic first arrival
CN110032975B (en) Seismic facies picking method
CN105527650B (en) Microseismic signals and p ripple first arrival automatic identification algorithms under a kind of engineering yardstick
CN110058294A (en) A kind of tunnel micro seismic monitoring rock rupture event automatic identifying method
CN111008337B (en) Deep attention rumor identification method and device based on ternary characteristics
Saragiotis et al. Automatic P phase picking using maximum kurtosis and/spl kappa/-statistics criteria
CN110133714A (en) A kind of microseismic signals classification discrimination method based on deep learning
US20240078413A1 (en) Massive data-driven method for automatically locating mine microseismic source
CN112528774B (en) Intelligent unknown radar signal sorting system and method in complex electromagnetic environment
CN106382981A (en) Single station infrasonic wave signal recognition and extraction method
CN103994817A (en) Vibration source identification method based on long-distance optical fiber frequent occurring events
CN114152980B (en) Method and device for rapidly and automatically producing seismic source mechanism solution
CN106935038B (en) Parking detection system and detection method
Zaccarelli et al. Anomaly detection in seismic data–metadata using simple machine‐learning models
CN114330120B (en) 24-Hour PM prediction based on deep neural network2.5Concentration method
CN109409216B (en) Speed self-adaptive indoor human body detection method based on subcarrier dynamic selection
Cofré et al. End-to-End LSTM-based earthquake magnitude estimation with a single station
CN115952410B (en) Landslide hazard detection system based on deep learning
CN112215307B (en) Method for automatically detecting signal abnormality of earthquake instrument by machine learning
CN115963548B (en) Mine microseismic P wave arrival time pickup model construction method based on deduction learning
CN103994816A (en) Identification method based on optical fiber multiple events
CN115586570A (en) Microseismic positioning optimization method based on clustering model
CN109886420B (en) Self-adaptive intelligent prediction system for cutting height of coal mining machine
CN114121038A (en) Sound voice testing method, device, equipment and storage medium
CN112412390A (en) Method and device for evaluating second interface of well cementation based on deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant