CN114896467A - Neural network-based field matching method and intelligent data entry method - Google Patents

Neural network-based field matching method and intelligent data entry method

Info

Publication number
CN114896467A
Authority
CN
China
Prior art keywords
data
file
field
neural network
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210436149.9A
Other languages
Chinese (zh)
Other versions
CN114896467B (en
Inventor
任钰
申瑞彩
王博涵
武鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yuexin Times Technology Co ltd
Original Assignee
Beijing Yuexin Times Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yuexin Times Technology Co ltd
Priority to CN202210436149.9A
Publication of CN114896467A
Application granted
Publication of CN114896467B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the field of artificial intelligence, in particular to a neural network-based field matching method and a neural network-based intelligent data entry method, and aims to solve the low efficiency of manually identifying and uploading data. The neural network-based field matching method comprises the following steps: reading a column of data from pre-constructed second structured data to serve as a data column to be matched; and inputting the data column to be matched into a field matching model to obtain the field name corresponding to the data column to be matched. The neural network-based intelligent data entry method comprises the following steps: parsing a file to be entered and constructing first structured data; judging the type of the file to be entered according to the first structured data; selecting a field matching model according to the file type, and judging the field name corresponding to each column of data in the file; and uploading the data in the file to be entered to a data resource pool according to the file type and the field names of all columns of data. The invention greatly improves data uploading efficiency.

Description

Neural network-based field matching method and intelligent data entry method
Technical Field
The invention relates to the field of artificial intelligence, in particular to a field matching method and a data intelligent entry method based on a neural network.
Background
Seismic exploration is a geophysical exploration method that uses the differences in elasticity and density of the subsurface medium to infer the nature and morphology of the subsurface rock formations by observing and analyzing the response of the earth to artificially excited seismic waves. Seismic exploration is the most important method in geophysical exploration and is the most effective method for solving the problem of oil and gas exploration. It is an important means for surveying petroleum and natural gas resources before drilling, and is widely applied to the aspects of coal field and engineering geological exploration, regional geological research, crust research and the like. The construction of the exploration seismic geological data resource pool has very important significance for analyzing and researching geological structures.
In the process of constructing the exploration seismic geological data resource pool, the collected data are original well file data, and the data show the phenomena of large data volume, multiple file types and non-uniform field naming modes in files.
The traditional approach is to check manually, one by one, whether each file name matches the data content and whether each field name matches the field content. If they match, the file is uploaded directly; if not, it is uploaded after manual correction. Although this approach does get the data uploaded, it is time-consuming and inefficient.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a field matching method and a data intelligent entry method based on a neural network, and the data uploading efficiency is greatly improved.
In a first aspect of the present invention, a field matching method based on a neural network is provided, where the method includes:
reading a column of data from the pre-constructed second structured data to serve as a data column to be matched;
inputting a data column to be matched into a field matching model to obtain a field name corresponding to the data column to be matched;
wherein:
the second structured data is constructed after data is extracted from the original file of the seismic geological exploration;
the field matching model is a neural network model.
Preferably, the method of constructing the second structured data comprises:
deleting redundant empty rows in the original file of seismic geological exploration;
extracting data in the seismic geological exploration original file;
in the extracted data, one-hot coding is carried out on the non-numeric fields;
and adopting a tail zero padding mode to enable the number of data contained in each data column to reach a preset second data dimension so as to obtain the second structured data.
Preferably, the training method of the field matching model includes:
reading a column of data from the second training set in sequence each time and inputting the data into the field matching model to obtain an output result;
calculating a second loss function according to the output result and the field name label corresponding to the column of data;
and iterating by using a gradient descent method, gradually adjusting the model parameters and reducing the second loss function until a preset iteration number is reached.
Preferably, before the step of sequentially reading a column of data from the second training set each time and inputting the data into the field matching model to obtain an output result, the training method for the field matching model further includes:
acquiring all field names corresponding to specific file types from a data resource pool, and acquiring a preset number of data columns corresponding to the field names to obtain a structured second original data set;
taking the field name corresponding to each data column as the field name label of the data column;
counting all non-digital fields in the structured second original data set, and carrying out one-hot coding on the non-digital fields;
enabling the number of data contained in each data column in the structured second original data set to reach a preset second data dimension by adopting a tail zero padding mode, so as to obtain a structured second data set;
dividing the structured second data set according to a preset proportion to obtain a second training set and a second testing set;
wherein:
the second test set is used for carrying out effect verification on the trained field matching model;
the specific file types include: a well head file, a well trajectory file, a layered file, or a lithology file.
In a second aspect of the present invention, a data intelligent entry method based on a neural network is provided, where the method includes:
analyzing a file to be input, and constructing first structured data;
judging the type of the file to be input according to the first structured data;
selecting a corresponding field matching model according to the type of the file to be input;
extracting data in the file to be recorded;
in the extracted data, one-hot coding is carried out on the non-numeric fields;
the number of data contained in each data column reaches a preset second data dimension by adopting a tail zero padding mode, so that second structured data are obtained;
sequentially reading one data column to be matched from the second structured data each time, and judging the field name corresponding to the data column according to the field matching method based on the neural network based on the selected field matching model, so as to obtain the field name corresponding to each data column;
uploading the data in the file to be input to a data resource pool according to the type of the file to be input and the field names of all columns of data;
wherein:
the file to be recorded is a data file obtained in seismic geological exploration;
the types of the files to be input comprise: a well head file, a well trajectory file, a layered file, or a lithology file.
Preferably, the type of the file to be entered further includes: logging curve files;
before the step of analyzing the file to be entered and constructing the first structured data, the intelligent data entry method based on the neural network further comprises the following steps of:
judging whether the suffix of the file to be recorded is .las; if so, determining the file to be recorded as a logging curve file; if not,
judging whether all data in the file to be recorded are floating point data or not, wherein one row of data is an arithmetic progression; if so, determining the file to be input as a logging curve file;
and under the condition that the file to be recorded is a logging curve file, acquiring a well name, a curve name and data from the file to be recorded and uploading the well name, the curve name and the data to the data resource pool.
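The two checks above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the helper names, the representation of the file body as lists of string tokens, the case-insensitive suffix comparison, and the tolerance `tol` are all assumptions. The source's wording leaves open whether the arithmetic progression is looked for along rows or columns; the sketch checks columns, which is also an assumption.

```python
from pathlib import Path


def is_arithmetic_progression(values, tol=1e-6):
    """Return True if all consecutive differences are (nearly) equal."""
    if len(values) < 3:
        return False
    step = values[1] - values[0]
    return all(abs((b - a) - step) <= tol for a, b in zip(values, values[1:]))


def is_logging_curve_file(filename, rows):
    """Hypothetical helper: `rows` is the file body as a list of token lists."""
    # Check 1: a .las suffix identifies a logging curve file directly.
    if Path(filename).suffix.lower() == ".las":
        return True
    # Check 2a: every token must parse as a floating point number.
    try:
        table = [[float(token) for token in row] for row in rows]
    except ValueError:
        return False
    if not table or any(len(row) != len(table[0]) for row in table):
        return False
    # Check 2b: at least one series (here: a column) is an arithmetic progression.
    columns = [list(column) for column in zip(*table)]
    return any(is_arithmetic_progression(column) for column in columns)
```

A file that passes either check would skip the two-stage model matching and go straight to extraction of the well name, curve name and data.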
Preferably, the step of "parsing the file to be entered and constructing the first structured data" includes:
deleting redundant empty rows in the file to be recorded;
extracting data in the file to be recorded;
in the extracted data, one-hot coding is carried out on each non-numeric field;
and adopting a tail zero padding mode to enable the number of data contained in each data line to reach a preset first data dimension, thereby obtaining the first structured data.
Preferably, the step of "judging the type of the file to be entered according to the first structured data" includes:
reading a line of data from the first structured data in sequence each time and inputting the line of data to a file type matching model to respectively obtain a file type corresponding to each line of data;
and determining the type of the file to be recorded by a voting method according to the file type corresponding to each line of data.
Preferably, the file type matching model is a neural network model;
the training method of the file type matching model comprises the following steps:
reading a row of data from the first training set in sequence each time and inputting the data into the file type matching model to obtain an output result;
calculating a first loss function according to the output result and the label corresponding to the row of data;
and iterating by using a gradient descent method, gradually adjusting model parameters and reducing the first loss function until a preset iteration number is reached.
In a third aspect of the present invention, a computer-readable storage medium is further provided, storing a computer program that can be loaded by a processor to execute the above neural network-based field matching method or the above neural network-based intelligent data entry method.
Compared with the closest prior art, the invention has the following beneficial effects:
In the neural network-based field matching method, a column of data is read from pre-constructed second structured data and used as the data column to be matched; the data column to be matched is then input into the field matching model to obtain the corresponding field name. The second structured data is constructed after extracting data from the seismic geological exploration original file. When the model is trained, the data are preprocessed (redundant empty rows are deleted, non-numeric data are one-hot encoded, and samples of insufficient length are zero-padded at the tail), so that all training samples input into the neural network are numeric and have the same dimension. For each file type other than the logging curve file, a corresponding field matching model is trained. With this field matching method, the field name of each column of data can be identified automatically by the neural network, which improves identification efficiency.
In the neural network-based intelligent data entry method provided by the invention, it is first judged whether the file to be entered is a logging curve file; if so, the well name, curve name and data are obtained from the file and uploaded to the data resource pool. Otherwise, the file type is judged using the file type matching model, the corresponding field matching model is selected according to the file type, and the field name of each column of data in the file is judged. Finally, the data in the file, in their original form before one-hot encoding and zero padding, are uploaded to the data resource pool according to the file type and the field names. Because each file type may contain tens or even hundreds of fields, and the data characteristics of different fields may be similar, matching errors occur easily if field matching is applied to the data directly. The invention trains a corresponding field matching model for each file type and adopts a two-stage matching method: file type matching is performed first, and field matching is then performed based on the file type, which narrows the matching range and improves matching precision. The invention can integrate and manage data efficiently, avoids the tedious work of manually identifying field names, reduces the workload of data uploading personnel, greatly improves uploading efficiency, and increases the usability of the data resource platform.
Drawings
FIG. 1 is a schematic diagram of an algorithm for a BP neural network employed in an embodiment of the present invention;
FIG. 2 is a diagram illustrating the main steps of an embodiment of the neural network-based field matching method of the present invention;
FIG. 3 is a schematic diagram of a method for constructing second structured data according to an embodiment of the present invention;
FIG. 4 is a screenshot of a portion of data extracted from an original file in an embodiment of the invention;
FIG. 5 is a diagram illustrating the main steps of an embodiment of the field matching model training method according to the present invention;
FIG. 6 is a schematic diagram of the main steps of a first embodiment of the intelligent data entry method based on a neural network according to the present invention;
FIG. 7 is an example of a header included in a file to be entered in the embodiment of the present invention;
FIG. 8 is an example of a file to be recorded that does not include a header but includes a unit in the embodiment of the present invention;
FIG. 9 is a schematic diagram of the main steps of a second embodiment of the intelligent data entry method based on a neural network according to the present invention;
FIG. 10 is a screenshot of a well log file in an embodiment of the present invention;
FIG. 11 is a schematic diagram of the main steps of the training method of the file type matching model in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described in the present application are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first" and "second" in the description of the present invention are used for convenience of description only and do not indicate or imply relative importance of the devices, elements or parameters, and therefore should not be construed as limiting the present invention. In addition, the term "and/or" in the present invention merely describes an association relationship between associated objects and means that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship, unless otherwise specified.
A neural network is a network model inspired by biological neural networks. It consists of an input layer, a hidden layer and an output layer. Features are extracted layer by layer starting from the input layer, with the features extracted by one layer serving as the input of the next. The weights and thresholds are adjusted continuously by gradient descent so that the loss function approaches zero; the network weights are thereby determined and the final model is established, so that its output approaches the result of manual identification as closely as possible.
FIG. 1 is a schematic diagram of the algorithm of the BP neural network used in the embodiment of the present invention. As shown in FIG. 1, the number of input features of the model is $d$ and the number of output features is $l$, i.e., the model is an $l$-class classification model. Data $x = [x_1, x_2, \ldots, x_i, \ldots, x_d]$ is input into the input layer, and the input-layer neurons are weighted according to formula (1) to obtain the input $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_h, \ldots, \alpha_q]$ of the hidden layer:
$$\alpha_h = \sum_{i=1}^{d} v_{ih} x_i \qquad (1)$$
where $v_{ih}$ are weights.
The input $\alpha$ of the hidden layer undergoes a nonlinear transformation by an activation function to obtain the output $b = f(\alpha)$ of the hidden layer, where $f$ is the activation function and $b = [b_1, b_2, \ldots, b_h, \ldots, b_q]$. The hidden-layer nodes are then weighted according to formula (2) to obtain $\beta$:
$$\beta_j = \sum_{h=1}^{q} w_{hj} b_h \qquad (2)$$
where $\beta = [\beta_1, \beta_2, \ldots, \beta_j, \ldots, \beta_l]$ and $w_{hj}$ are weights. $\beta$ is input into the output layer, which applies a nonlinear transformation to obtain the final output $y = f(\beta)$, where $y = [y_1, y_2, \ldots, y_j, \ldots, y_l]$ and $y_j$ represents the probability that data $x$ belongs to the $j$-th class.
The loss function adopted in the embodiment of the invention is the cross-entropy function. In the case of binary classification, assume there are two classes in total, class A and class B; the expression of the loss function is shown in formula (3):
$$U = \frac{1}{M}\sum_{m=1}^{M} U_m = -\frac{1}{M}\sum_{m=1}^{M}\left[t_m \log q_m + (1 - t_m)\log(1 - q_m)\right] \qquad (3)$$
where $U$ is the loss value, $M$ is the number of training samples, $U_m$ is the loss corresponding to the $m$-th training sample, $t_m$ is the class label of the $m$-th training sample (1 for class A and 0 for class B), $q_m$ is the predicted probability that the $m$-th training sample belongs to class A, and $1 - q_m$ is the predicted probability that it belongs to class B.
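As a small numerical illustration, the following Python sketch evaluates a binary cross-entropy loss of the kind described for formula (3). It assumes the standard form averaged over the M training samples, with the label t_m equal to 1 for class A and 0 for class B; the function name and the clipping constant `eps` (used only to avoid taking the logarithm of zero) are not from the source.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average binary cross-entropy; labels are 1 (class A) or 0 (class B)."""
    total = 0.0
    for t, q in zip(labels, probs):
        q = min(max(q, eps), 1.0 - eps)  # keep log() finite (assumption)
        total += -(t * math.log(q) + (1 - t) * math.log(1.0 - q))
    return total / len(labels)


# A confident, correct prediction costs less than an uncertain one.
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])
```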
In the invention, neural network models are used to classify file types and field names. In this multi-class case, the expression of the loss function is shown in formula (4):
$$L = \frac{1}{N}\sum_{n=1}^{N} L_n = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{G} y_{nc}\log p_{nc} \qquad (4)$$
where $L$ is the loss value, $N$ is the number of training samples, $L_n$ is the loss corresponding to the $n$-th training sample, $G$ is the total number of classes, $y_{nc}$ is the probability that the $n$-th training sample actually belongs to class $c$, and $p_{nc}$ is the predicted probability that the $n$-th training sample belongs to class $c$.
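The multi-class loss can be illustrated with a short Python sketch. It assumes the standard categorical cross-entropy averaged over the N training samples, with each row of `targets` a one-hot vector over the G classes; the function name and the `eps` guard are hypothetical.

```python
import math


def cross_entropy(targets, probs, eps=1e-12):
    """Average multi-class cross-entropy over all samples.

    targets[n][c] is 1 if sample n truly belongs to class c, else 0;
    probs[n][c] is the predicted probability of class c for sample n.
    """
    total = 0.0
    for y_n, p_n in zip(targets, probs):
        total += -sum(y * math.log(max(p, eps)) for y, p in zip(y_n, p_n))
    return total / len(targets)


targets = [[1, 0, 0], [0, 1, 0]]            # G = 3 classes, N = 2 samples
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = cross_entropy(targets, probs)        # both samples contribute -log(0.5)
```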
In the embodiment of the invention, the leaky rectified linear unit (Leaky ReLU) activation function is used in place of the original rectified linear unit (ReLU) activation function. For a neuron whose input is negative, Leaky ReLU outputs a negative number close to 0 rather than exactly 0, which effectively prevents large numbers of neurons from dying and allows the model to be trained well. The Leaky ReLU activation function is shown in formula (5):
$$f(z) = \begin{cases} z, & z > 0 \\ \gamma z, & z \le 0 \end{cases} \qquad (5)$$
where $z$ is the input of the neuron and $\gamma$ is a parameter in the range 0 to 1, generally taken as 0.01.
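The forward pass described by formulas (1), (2) and (5) can be sketched in plain Python as below. This is an illustration under assumptions: thresholds (biases) are omitted because the formulas as described mention only weights, the same activation f is applied at the hidden and output layers because the text writes y = f(β) without naming a different function, and all function names and the example weights are hypothetical.

```python
def leaky_relu(z, gamma=0.01):
    """Formula (5): pass positive inputs through, scale the rest by gamma."""
    return z if z > 0 else gamma * z


def weighted_sums(inputs, weights):
    """Return [sum_i inputs[i] * weights[i][k]] for each output index k."""
    n_out = len(weights[0])
    return [
        sum(value * row[k] for value, row in zip(inputs, weights))
        for k in range(n_out)
    ]


def forward(x, v, w, f=leaky_relu):
    alpha = weighted_sums(x, v)   # formula (1): hidden-layer input
    b = [f(a) for a in alpha]     # hidden-layer output b = f(alpha)
    beta = weighted_sums(b, w)    # formula (2): output-layer input
    return [f(t) for t in beta]   # final output y = f(beta)


x = [1.0, 2.0]                    # d = 2 input features
v = [[1.0, -1.0], [0.5, 0.0]]     # d x q weights, q = 2 hidden neurons
w = [[1.0], [1.0]]                # q x l weights, l = 1 output
y = forward(x, v, w)              # alpha = [2.0, -1.0], b = [2.0, -0.01]
```

Note that the second hidden neuron receives a negative input and still contributes a small non-zero value (-0.01) instead of being cut to zero, which is the behaviour the text attributes to Leaky ReLU.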
FIG. 2 is a diagram illustrating the main steps of an embodiment of the field matching method based on neural network according to the present invention. As shown in FIG. 2, the field matching method of the present embodiment includes steps A10-A20:
Step A10, reading a column of data from the pre-constructed second structured data as the data column to be matched.
Step A20, inputting the data column to be matched into the field matching model, and obtaining the field name corresponding to the data column to be matched.
Wherein the second structured data is constructed after extracting data from the seismic geological exploration original file.
The field matching model in this embodiment is a neural network model, preferably a BP neural network model as shown in fig. 1, and includes an input layer, a hidden layer, and an output layer, which are three layers.
FIG. 3 is a schematic diagram of a method for constructing second structured data according to an embodiment of the present invention. As shown in FIG. 3, the method of constructing the second structured data in this embodiment includes steps B10-B40:
Step B10, deleting redundant empty rows in the original file of the seismic geological exploration.
Step B20, extracting data in the original file of the seismic geological exploration.
Fig. 4 is a screenshot of a part of data extracted from an original file according to an embodiment of the present invention, and the following one-hot encoding and zero padding operations are described with this part of data as an example.
In step B30, one-hot encoding is performed on the non-numeric fields in the extracted data.
As shown in FIG. 4, columns 1, 4, and 7 contain non-numeric data (combinations of English letters, digits, hyphens, and underscores). These three non-numeric columns therefore need to be one-hot encoded so that the data later input into the model is in numeric form. FIG. 4 contains 9 kinds of non-numeric data in total. Assuming that these 9 kinds are the only non-numeric data involved in the field matching model (that is, the non-numeric data appearing in all training samples of the model), the correspondence between the non-numeric data in FIG. 4 and the one-hot codes can be as shown in Table 1:
TABLE 1 Correspondence between non-numeric data and one-hot codes
Data                  One-hot code
TZ1-1                 (1,0,0,0,0,0,0,0,0)
TZ1-2                 (0,1,0,0,0,0,0,0,0)
TZ2-1H                (0,0,1,0,0,0,0,0,0)
TZ2-2C                (0,0,0,1,0,0,0,0,0)
F-OFFSHORE_BAR        (0,0,0,0,1,0,0,0,0)
F-FLOODPLAIN_FINES    (0,0,0,0,0,1,0,0,0)
F-BAY_MDST            (0,0,0,0,0,0,1,0,0)
F-BAY_SANDSTONE       (0,0,0,0,0,0,0,1,0)
F-BAY_MDST            (0,0,0,0,0,0,0,0,1)
When the field matching model is used to determine field names, one data column of the structured data is read at a time. After the non-numeric data in FIG. 4 are one-hot encoded, the data in columns 1 to 7 take the format shown in Table 2:
TABLE 2 Format of the data in each column after one-hot encoding
Column Data
1 (1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0)
2 (16345727.3,16345727.3,16344998.9,16344998.9)
3 (5126798,5126798,5125757.5,5125757.5)
4 (1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0)
5 (2757.3298,2765.5186,2766.2007,2773.7983,2993.911)
6 (2765.5186,2766.2007,2773.7983,2773.9114,2781.246)
7 (0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1)
The data in the groups "1" to "7" in table 2 correspond to the data in the columns 1 to 7 in fig. 4, respectively, and it can be seen from table 2 that the number of data included in the 7 groups is not uniform, and in order to adapt to the second data dimension preset in the field matching model, the following step B40 needs to be adopted to perform zero padding operation.
Step B40, adopting a tail zero padding mode to enable the number of data contained in each data column to reach a preset second data dimension, thereby obtaining the second structured data.
Assuming that the preset second data dimension is 45, tail zero padding is applied so that each group of data in Table 2 contains 45 values. Taking the 5th group of data (2757.3298, 2765.5186, 2766.2007, 2773.7983, 2993.911) as an example, there are 5 original values, so 40 zeros need to be appended after 2993.911. Because the data is too long after zero padding, it is not shown here.
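Steps B30 and B40 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the vocabulary is built from only the first four entries of Table 1 for brevity, and raising an error for a column longer than the preset dimension is an assumption, since the text does not say how that case is handled.

```python
def build_vocabulary(non_numeric_values):
    """Assign each distinct non-numeric value an index in first-seen order."""
    vocab = {}
    for value in non_numeric_values:
        vocab.setdefault(value, len(vocab))
    return vocab


def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False


def encode_column(column, vocab):
    """Step B30: keep numbers as floats, expand other tokens to one-hot codes."""
    encoded = []
    for token in column:
        if is_number(token):
            encoded.append(float(token))
        else:
            code = [0.0] * len(vocab)
            code[vocab[token]] = 1.0
            encoded.extend(code)
    return encoded


def pad_tail(values, dimension):
    """Step B40: append zeros until the column has `dimension` values."""
    if len(values) > dimension:
        raise ValueError("column is longer than the preset data dimension")
    return values + [0.0] * (dimension - len(values))


vocab = build_vocabulary(["TZ1-1", "TZ1-2", "TZ2-1H", "TZ2-2C"])
column_5 = ["2757.3298", "2765.5186", "2766.2007", "2773.7983", "2993.911"]
padded = pad_tail(encode_column(column_5, vocab), 45)  # 5 values + 40 zeros
```

The same vocabulary has to be reused at training time and at inference time so that identical values receive identical codes.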
FIG. 5 is a diagram illustrating the main steps of an embodiment of the field matching model training method according to the present invention. As shown in fig. 5, the field matching model training method of the present embodiment includes steps C10-C30:
Step C10, reading a column of data from the second training set in sequence each time and inputting the data into the field matching model to obtain an output result.
Step C20, calculating a second loss function according to the output result and the field name label corresponding to the column of data. The second loss function may be calculated using the method shown in formula (4) above.
Step C30, iterating by using a gradient descent method, gradually adjusting the model parameters and reducing the second loss function until a preset number of iterations is reached.
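The loop of steps C10-C30 can be sketched as below. To keep the example short and dependency-free, it trains a single-layer softmax classifier as a simplified stand-in for the three-layer BP network, and it accumulates the gradient over all columns before each update (batch gradient descent); both choices, along with the function names, learning rate and toy data, are assumptions rather than details from the source.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, weights):
    n_classes = len(weights[0])
    return [sum(xi * row[c] for xi, row in zip(x, weights)) for c in range(n_classes)]


def train(samples, labels, n_classes, iterations=200, lr=0.1):
    """Gradient descent on the cross-entropy loss for a fixed iteration count."""
    dim = len(samples[0])
    weights = [[0.0] * n_classes for _ in range(dim)]
    losses = []
    for _ in range(iterations):                  # preset number of iterations
        grad = [[0.0] * n_classes for _ in range(dim)]
        loss = 0.0
        for x, label in zip(samples, labels):    # C10: one data column at a time
            p = softmax(class_scores(x, weights))
            loss -= math.log(max(p[label], 1e-12))  # C20: cross-entropy term
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)
                for i in range(dim):
                    grad[i][c] += err * x[i]
        for i in range(dim):                     # C30: adjust the parameters
            for c in range(n_classes):
                weights[i][c] -= lr * grad[i][c] / len(samples)
        losses.append(loss / len(samples))
    return weights, losses


def predict(x, weights):
    scores = class_scores(x, weights)
    return max(range(len(scores)), key=lambda c: scores[c])


# Toy columns (already numeric and equal-length) with two field-name classes.
samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
weights, losses = train(samples, labels, n_classes=2)
```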
Data files acquired in seismic geological exploration can be in various formats, such as .dat, .las, .txt, .xls, .xlsx and .prn, and can be divided by file type into well head, well trajectory, layered, lithology, logging curve and other types. Because the format of the logging curve file is distinctive, it can be identified without using a model; how it is identified is described in the embodiments of the neural network-based intelligent data entry method below. In this embodiment, a separate field matching model is constructed for each of the four file types (well head file, well trajectory file, layered file, and lithology file), so four data sets are constructed, one for training each of the four field matching models.
In an alternative embodiment, step C10 is preceded by steps C1-C5 of constructing a training set:
Step C1, acquiring all field names corresponding to the specific file type from the data resource pool, and acquiring a preset number of data columns corresponding to the field names, to obtain a structured second original data set.
Wherein the specific file types include: a well head file, a well trajectory file, a layered file, or a lithology file.
Step C2, taking the field name corresponding to each data column as the field name label of the data column.
Step C3, counting all non-numeric fields in the structured second original data set, and carrying out one-hot encoding on the non-numeric fields.
It should be noted that all non-numeric data contained in the data set must be counted and encoded in the one-hot format. The one-hot encoding applied in step B30 of the field matching method embodiment above, to the data extracted from the seismic geological exploration original file, must correspond to the one-hot encoding used for the data set: the same non-numeric value, for example "TZ1-1", must receive the same code in both.
Step C4, adopting a tail zero padding mode to enable the number of data contained in each data column in the data set to reach a preset second data dimension, thereby obtaining a structured second data set.
Here, the "preset second data dimension" of the data set is equal to the "preset second data dimension" of the second structured data extracted and constructed from the seismic geological exploration original file in the field matching method embodiment above.
Step C5, dividing the structured second data set according to a preset proportion to obtain a second training set and a second test set.
The second test set is used to verify the effect of the trained field matching model.
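The division in step C5 can be sketched as follows. The text only says the data set is divided "according to a preset proportion"; the 0.8 ratio, the shuffling before the cut, the fixed seed, and the function name are all assumptions made for illustration.

```python
import random


def split_dataset(columns, labels, train_ratio=0.8, seed=0):
    """Shuffle (column, label) pairs and cut them into training and test sets."""
    indices = list(range(len(columns)))
    random.Random(seed).shuffle(indices)       # reproducible shuffle (assumption)
    cut = int(len(indices) * train_ratio)
    train = [(columns[i], labels[i]) for i in indices[:cut]]
    test = [(columns[i], labels[i]) for i in indices[cut:]]
    return train, test


# Ten padded columns with hypothetical field-name labels.
columns = [[float(i)] * 3 for i in range(10)]
labels = ["FIELD_%d" % i for i in range(10)]
train_set, test_set = split_dataset(columns, labels)
```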
Further, based on the field matching method, the invention also provides a first embodiment and a second embodiment of the data intelligent entry method based on the neural network, and the details are explained below.
Fig. 6 is a schematic main step diagram of a first embodiment of a neural network-based data intelligent entry method in the present invention. As shown in fig. 6, the entry method of the present embodiment includes steps D10-D80:
Step D10, parsing the file to be entered, and constructing first structured data.
The file to be entered is a data file obtained in seismic geological exploration; the types of the file to be entered include: a well head file, a well trajectory file, a layered file, a lithology file, etc.
Step D20, judging the type of the file to be entered according to the first structured data.
Step D30, selecting a corresponding field matching model according to the type of the file to be entered.
Step D40, extracting the data in the file to be entered.
Step D50, carrying out one-hot encoding on the non-numeric fields in the extracted data.
Step D60, adopting a tail zero padding mode to enable the number of data contained in each data column to reach a preset second data dimension, thereby obtaining second structured data.
Step D70, sequentially reading one data column to be matched from the second structured data each time, and judging the field name corresponding to the data column according to the neural network-based field matching method, based on the selected field matching model, so as to obtain the field name corresponding to each column of data.
Step D80, uploading the data in the file to be entered to a data resource pool according to the type of the file to be entered and the field names of the data in each column.
In an alternative embodiment, step D10 may specifically include steps D11-D14:
Step D11: delete redundant empty rows in the file to be entered.
Step D12: extract the data in the file to be entered.
Step D13: one-hot encode each non-numeric field in the extracted data.
Step D14: pad each data row with trailing zeros so that the number of data items it contains reaches a preset first data dimension, thereby obtaining the first structured data.
In another alternative embodiment, step D20 may specifically include steps D21-D22:
Step D21: read one row of data at a time, in sequence, from the first structured data and input it into the file type matching model, obtaining the file type corresponding to each row of data.
Step D22: determine the type of the file to be entered by voting over the file types obtained for the individual rows.
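The voting in step D22 can be illustrated with a simple majority vote over the per-row predictions. The source only says "a voting method", so plain majority voting, the function name, and the tie-breaking behavior are assumptions of this sketch.

```python
from collections import Counter
from typing import Sequence


def vote_file_type(row_predictions: Sequence[str]) -> str:
    """Return the file type predicted for the largest number of rows.

    Ties are resolved in favor of the type that was seen first, which is
    how Counter.most_common orders equal counts.
    """
    if not row_predictions:
        raise ValueError("no row predictions to vote on")
    return Counter(row_predictions).most_common(1)[0][0]
```

Voting over rows means a few misclassified rows do not change the type assigned to the whole file.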
In yet another alternative embodiment, step D50 may be preceded by steps (1)-(4), which parse the header and the units:
(1) For the first row of the extracted data, calculate the percentage of non-numeric characters.
Because the file to be entered may or may not include a header, it must first be determined whether the first row is header data; this is decided from the proportion of letters or Chinese characters in the first row. Fig. 7 is an example of a file to be entered that includes a header in the embodiment of the present invention. As shown in Fig. 7, the header in the first row of the file contains 4 field names: WELL_NAME, ELEV_TYPE, ground_elevet (ground elevation), and TOTAL_ZEPTH (well depth). The rows below the first row are data, where KB denotes the kelly bushing elevation.
(2) Determine whether the extracted data contains a header from the calculated percentage and a preset percentage threshold (80% in this embodiment).
(3) If the extracted data contains a header, delete the header.
(4) Determine the unit of each column of data in the file to be entered.
Analysis of existing files shows that the units are mainly drawn from ['m', 'us/m', 'g/cm3', 'mm', 'API', 'ohmm', 'g/cc', '%', 'mv', 'omm', 'fraction', 'd', 'm/s', 'm/s g/cc']. The corresponding data in the file is compared with this unit list, and if unit information for a field is present, it is extracted. Fig. 8 is an example of a file to be entered that includes units but no header in the embodiment of the present invention. As shown in Fig. 8, the first row of the file contains two units, both m (meters), which are the units of the data in the third and fourth columns respectively; the well name in the first column and the elevation type in the second column have no units.
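Steps (1)-(4) can be sketched as below. The 80% threshold and the unit list come from the text; everything else is an assumption made for illustration: the function names, treating digits, signs, and the decimal point as the "numeric" characters, ignoring whitespace, and using "greater than or equal to" for the threshold comparison.

```python
from typing import List, Optional, Sequence

# Unit list taken from the description above.
KNOWN_UNITS = {"m", "us/m", "g/cm3", "mm", "API", "ohmm", "g/cc", "%",
               "mv", "omm", "fraction", "d", "m/s", "m/s g/cc"}

# Assumption: these characters count as numeric, all others as non-numeric.
NUMERIC_CHARS = set("0123456789.+-")


def non_numeric_ratio(cells: Sequence[str]) -> float:
    """Fraction of non-whitespace characters in a row that are non-numeric."""
    chars = [c for cell in cells for c in cell if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c not in NUMERIC_CHARS for c in chars) / len(chars)


def is_header(cells: Sequence[str], threshold: float = 0.8) -> bool:
    """Treat the row as a header when the ratio reaches the threshold."""
    return non_numeric_ratio(cells) >= threshold


def extract_units(cells: Sequence[str]) -> List[Optional[str]]:
    """Return the unit found in each cell, or None where there is none."""
    return [cell.strip() if cell.strip() in KNOWN_UNITS else None
            for cell in cells]
```

For a unit row like the one in Fig. 8, `extract_units(["", "", "m", "m"])` yields no unit for the first two columns and `"m"` for the third and fourth.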
Since each file type may contain tens or even hundreds of fields, and the data characteristics of different fields may be similar, matching fields directly against the data is prone to errors. The invention therefore adopts a two-stage matching method: the file type is matched first, and the corresponding field matching model is then selected on the basis of the file type to perform field matching, which narrows the matching range and improves matching precision. A logging curve file differs from the other file types in that it already contains the name and data of every curve, so no field matching is needed; it can be uploaded to the data resource pool simply by parsing out the well name, the curve names, and the data.
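The second stage of the two-stage matching can be illustrated as a lookup of the model by file type followed by column-by-column matching. The interface here is hypothetical: a field matching model is represented as any callable from an encoded column to a field name, and `match_fields` and `models_by_type` are names invented for this sketch.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a field matching model is any callable that maps one
# encoded, zero-padded data column to a field name.
FieldModel = Callable[[Sequence[float]], str]


def match_fields(file_type: str,
                 columns: Sequence[Sequence[float]],
                 models_by_type: Dict[str, FieldModel]) -> List[str]:
    """Pick the model for the already-determined file type, then match
    each column independently against that model only."""
    if file_type not in models_by_type:
        raise KeyError(f"no field matching model for file type {file_type!r}")
    model = models_by_type[file_type]
    return [model(column) for column in columns]
```

Because each model only has to distinguish the fields of one file type, the number of candidate field names per column is much smaller than in a single model covering all file types.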
Fig. 9 is a schematic diagram of the main steps of a second embodiment of the neural network-based intelligent data entry method of the present invention. Compared with the first embodiment, the types of the file to be entered in this embodiment include logging curve files in addition to wellhead files, well trajectory files, layered files, and lithology files. As shown in Fig. 9, the entry method of this embodiment includes steps E10-E110:
Step E10: determine whether the suffix of the file to be entered is las; if so, determine that the file is a logging curve file and go to step E30; otherwise, go to step E20.
Step E20: determine whether all data in the file to be entered are floating-point data, one row of which forms an arithmetic progression; if so, determine that the file is a logging curve file and go to step E30; otherwise, go to step E40.
Step E30: where the file to be entered is a logging curve file, obtain the well name, the curve names, and the data from the file, upload them to the data resource pool, and end the procedure.
Fig. 10 is a screenshot of a logging curve file in the embodiment of the present invention. As shown in Fig. 10, the measured data lie below the "-Ascii" string in the file, and the file header information lies above it. The well name and the curve names can be obtained from the header information. For clarity, the locations of the well name, the curve names, and the data are marked in Fig. 10 with three boxes, from top to bottom.
Regarding acquisition of the well name: the string "UWI" or "WELL" can be searched for in the file header information, and the well name that follows it ("TZ 4" in the example) is obtained; if the search for "UWI" or "WELL" fails, the well name can be obtained by parsing the name of the file to be entered.
Regarding acquisition of the curve names: curve names correspond to data columns, one curve name per column. The data can therefore be read first to determine the number of columns, and the curve names are then located by reading a corresponding number of rows in the curve name section of the header.
Regarding acquisition of the data: the data of the file can be read from "Ascii" downward.
Step E40: parse the file to be entered and construct first structured data.
Step E50: determine the type of the file to be entered from the first structured data.
Step E60: select the corresponding field matching model according to the type of the file to be entered.
Step E70: extract the data in the file to be entered.
Step E80: one-hot encode the non-numeric fields in the extracted data.
Step E90: pad each data column with trailing zeros so that the number of data items it contains reaches the preset second data dimension, thereby obtaining second structured data.
Step E100: read one data column to be matched at a time, in sequence, from the second structured data and, using the selected field matching model, determine the field name of that column according to the neural network-based field matching method above, thereby obtaining the field name corresponding to each data column.
Step E110: upload the data in the file to be entered to the data resource pool according to the type of the file and the field name of each column of data.
It should be noted that, here too, the data finally uploaded to the data resource pool is the data as it appears in the file to be entered, without one-hot encoding or tail zero padding.
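The logging curve check of steps E10 and E20 can be sketched as follows. Several points are assumptions of this example rather than statements of the source: the function names, the case-insensitive suffix comparison, the numeric tolerance, and in particular the reading of the arithmetic progression as a column-wise series (such as a depth track sampled at a fixed step), whereas the translated text speaks of "one row of data".

```python
from pathlib import Path
from typing import Sequence


def is_arithmetic(values: Sequence[float], tol: float = 1e-6) -> bool:
    """True if consecutive differences are all equal within `tol`."""
    if len(values) < 3:
        return False
    step = values[1] - values[0]
    return all(abs((b - a) - step) <= tol for a, b in zip(values, values[1:]))


def is_log_curve_file(filename: str, rows: Sequence[Sequence[str]]) -> bool:
    """Step E10: suffix check. Step E20: all-float data with one series
    forming an arithmetic progression (checked column-wise here)."""
    if Path(filename).suffix.lower() == ".las":
        return True
    try:
        table = [[float(cell) for cell in row] for row in rows]
    except ValueError:
        return False  # at least one cell is not floating-point data
    if not table or any(len(row) != len(table[0]) for row in table):
        return False
    return any(is_arithmetic(column) for column in zip(*table))
```

A file that fails both checks would continue with step E40 and go through file type matching and field matching as in the first embodiment.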
Fig. 11 is a schematic diagram of the main steps of the training method of the file type matching model in the embodiment of the present invention. The file type matching model in this embodiment is a neural network model, preferably a BP neural network model as shown in Fig. 1, with a three-layer structure consisting of an input layer, a hidden layer, and an output layer. As shown in Fig. 11, the training method of the file type matching model of this embodiment includes steps F10-F30:
Step F10: read one row of data at a time, in sequence, from the first training set and input it into the file type matching model to obtain an output result.
Step F20: calculate a first loss function from the output result and the label corresponding to that row of data. The first loss function may be calculated using the method shown in equation (4) above.
Step F30: iterate using gradient descent, gradually adjusting the model parameters to reduce the first loss function, until a preset number of iterations is reached.
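Steps F10-F30 can be illustrated with a small three-layer network trained row by row with gradient descent. This is a sketch under stated assumptions, not the model of the patent: the tanh hidden activation, the softmax output with cross-entropy loss (equation (4) is not reproduced in this section), the hidden size, the learning rate, and all names are choices made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_bp(rows: Sequence[Sequence[float]], labels: Sequence[int],
             n_hidden: int = 8, n_classes: int = 4, lr: float = 0.1,
             n_iters: int = 200, seed: int = 0) -> Tuple[tuple, List[float]]:
    """Train an input-hidden-output network; return parameters and the
    mean loss recorded for each iteration over the training set."""
    rng = random.Random(seed)
    n_in = len(rows[0])
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.gauss(0, 0.1) for _ in range(n_classes)] for _ in range(n_hidden)]
    b2 = [0.0] * n_classes
    losses: List[float] = []
    for _ in range(n_iters):  # F30: stop after a preset number of iterations
        total = 0.0
        for x, label in zip(rows, labels):  # F10: one row at a time
            h = [math.tanh(b1[j] + sum(x[i] * w1[i][j] for i in range(n_in)))
                 for j in range(n_hidden)]
            z = [b2[k] + sum(h[j] * w2[j][k] for j in range(n_hidden))
                 for k in range(n_classes)]
            m = max(z)
            e = [math.exp(v - m) for v in z]
            s = sum(e)
            p = [v / s for v in e]
            total -= math.log(p[label] + 1e-12)  # F20: assumed cross-entropy
            # Backpropagation: gradients are computed before any update.
            dz = [p[k] - (1.0 if k == label else 0.0) for k in range(n_classes)]
            dh = [(1.0 - h[j] ** 2) * sum(w2[j][k] * dz[k] for k in range(n_classes))
                  for j in range(n_hidden)]
            for j in range(n_hidden):
                for k in range(n_classes):
                    w2[j][k] -= lr * h[j] * dz[k]
                b1[j] -= lr * dh[j]
            for k in range(n_classes):
                b2[k] -= lr * dz[k]
            for i in range(n_in):
                for j in range(n_hidden):
                    w1[i][j] -= lr * x[i] * dh[j]
        losses.append(total / len(rows))
    return (w1, b1, w2, b2), losses
```

In the setting described here, each row would be one 304-dimensional encoded data row and `n_classes` would be 4, one per file type label.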
In this embodiment, the method for constructing the first training set includes the following steps (1) to (7):
(1) Obtain a plurality of original files of known type.
The original files are data files obtained in seismic geological exploration.
(2) Delete redundant empty rows in the original files.
Empty rows are deleted so that redundant content does not interfere with parsing the data.
(3) Extract the data in the original files to form a structured first original data set.
(4) One-hot encode each non-numeric data item in the data set.
(5) Pad each data row with trailing zeros so that the number of data items it contains reaches a preset first data dimension.
(6) Take the type of the original file as the label of each row of data, obtaining a structured first data set.
In this embodiment, the category labels of the file types are {well hierarchy: 0, wellhead: 1, well trajectory: 2, lithology: 3}. The number of data rows of each type in the constructed first data set is {'well hierarchy': 83, 'wellhead': 65, 'well trajectory': 191, 'lithology': 2492}.
(7) Divide the structured first data set according to a preset ratio to obtain a first training set and a first test set. The first test set is used to verify the effect of the trained file type matching model.
In this embodiment, the preset ratio is 8:2, and the structured first data set is divided into a first training set and a first test set at that ratio. The first training set has size (2263, 304), i.e. 2263 rows of data, each of 304 dimensions (the dimension of a row is obtained by concatenating its numeric data with the one-hot codes of its non-numeric data); the first test set has size (568, 304), i.e. 568 rows of 304-dimensional data.
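The reported sizes can be checked against the per-type counts. The four counts sum to 2831 rows, and a plain 8:2 split of 2831 gives 2264.8 and 566.2, so some rounding rule must be involved. The source does not say which; one reading that reproduces the reported figures exactly is a split carried out separately for each file type with the test share rounded up. The sketch below shows that arithmetic and should be taken as an assumption, not as the documented procedure.

```python
from typing import Dict, Tuple

# Per-type row counts as reported in the description.
counts = {"well hierarchy": 83, "wellhead": 65,
          "well trajectory": 191, "lithology": 2492}


def split_counts(per_type: Dict[str, int],
                 test_tenths: int = 2) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split each type separately, rounding the test share up.

    Integer arithmetic is used for the ceiling to avoid float rounding.
    """
    test = {name: -(-n * test_tenths // 10) for name, n in per_type.items()}
    train = {name: n - test[name] for name, n in per_type.items()}
    return train, test


train, test = split_counts(counts)
print(sum(train.values()), sum(test.values()))  # prints: 2263 568
```

The per-type test counts under this reading are 17, 13, 39, and 499, which add up to the 568 rows of the first test set.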
Still further, based on the field matching method and the intelligent data entry method above, the present invention also provides an embodiment of a computer-readable storage medium storing a computer program that can be loaded by a processor to execute the neural network-based field matching method or the neural network-based intelligent data entry method described above.
Although the foregoing embodiments describe the steps in the sequential order given above, those skilled in the art will understand that, to achieve the effect of these embodiments, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations fall within the scope of protection of the present invention.
Those of skill in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described generally in terms of their functionality in the foregoing description for the purpose of clearly illustrating the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions to the related technical features without departing from the principle of the invention, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the invention.

Claims (10)

1. A neural network-based field matching method, the method comprising:
reading a column of data from the pre-constructed second structured data to serve as a data column to be matched;
inputting a data column to be matched into a field matching model to obtain a field name corresponding to the data column to be matched;
wherein,
the second structured data is constructed after data is extracted from the original file of the seismic geological exploration;
the field matching model is a neural network model.
2. The neural network-based field matching method of claim 1, wherein the method of constructing the second structured data comprises:
deleting redundant empty rows in the original file of seismic geological exploration;
extracting data in the seismic geological exploration original file;
in the extracted data, one-hot coding is carried out on the non-numeric fields;
and adopting a tail zero padding mode to enable the number of data contained in each data column to reach a preset second data dimension so as to obtain the second structured data.
3. The neural network-based field matching method of claim 1,
the training method of the field matching model comprises the following steps:
sequentially reading a column of data from the second training set each time and inputting the column of data into the field matching model to obtain an output result;
calculating a second loss function according to the output result and the field name label corresponding to the column of data;
and iterating by using a gradient descent method, gradually adjusting the model parameters and reducing the second loss function until a preset iteration number is reached.
4. The field matching method based on the neural network as claimed in claim 3, wherein before the step of reading a column of data from the second training set in sequence each time and inputting the data into the field matching model to obtain the output result, the training method of the field matching model further comprises:
acquiring all field names corresponding to specific file types from a data resource pool, and acquiring a preset number of data columns corresponding to the field names to obtain a structured second original data set;
taking the field name corresponding to each data column as the field name label of the data column;
counting all non-digital fields in the structured second original data set, and carrying out one-hot coding on the non-digital fields;
enabling the number of data contained in each data column in the structured second original data set to reach a preset second data dimension by adopting a tail zero padding mode, so as to obtain a structured second data set;
dividing the structured second data set according to a preset proportion to obtain a second training set and a second testing set;
wherein,
the second test set is used for carrying out effect verification on the trained field matching model;
the specific file types include: a well head file, a well trajectory file, a layered file, or a lithology file.
5. A data intelligent entry method based on a neural network is characterized by comprising the following steps:
analyzing a file to be input, and constructing first structured data;
judging the type of the file to be input according to the first structured data;
selecting a corresponding field matching model according to the type of the file to be input;
extracting data in the file to be recorded;
in the extracted data, one-hot coding is carried out on the non-numeric fields;
the number of data contained in each data column reaches a preset second data dimension by adopting a tail zero padding mode, so that second structured data are obtained;
sequentially reading one data column to be matched from the second structured data each time, and judging the field name corresponding to the data column according to the field matching method based on the neural network as claimed in any one of claims 1 to 4 based on the selected field matching model, so as to obtain the field name corresponding to each data column;
uploading the data in the file to be input to a data resource pool according to the type of the file to be input and the field names of all columns of data;
wherein,
the file to be recorded is a data file obtained in seismic geological exploration;
the types of the files to be recorded comprise: a well head file, a well trajectory file, a layered file, or a lithology file.
6. The intelligent data entry method based on the neural network as claimed in claim 5,
the type of the file to be entered further comprises: logging curve files;
before the step of analyzing the file to be entered and constructing the first structured data, the intelligent data entry method based on the neural network further comprises the following steps of:
judging whether the suffix of the file to be input is las, if so, determining the file to be input as a logging curve file; if not, then,
judging whether all data in the file to be recorded are floating point data or not, wherein one row of data is an arithmetic progression; if so, determining the file to be input as a logging curve file;
and under the condition that the file to be recorded is a logging curve file, acquiring a well name, a curve name and data from the file to be recorded and uploading the well name, the curve name and the data to the data resource pool.
7. The intelligent data entry method based on the neural network as claimed in claim 5, wherein the step of analyzing the file to be entered and constructing the first structured data comprises the following steps:
deleting redundant empty rows in the file to be recorded;
extracting data in the file to be recorded;
in the extracted data, one-hot coding is carried out on each non-numeric field;
and adopting a tail zero padding mode to enable the number of data contained in each data line to reach a preset first data dimension, thereby obtaining the first structured data.
8. The intelligent data entry method based on the neural network as claimed in claim 5, wherein the step of determining the type of the file to be entered according to the first structured data comprises:
reading a line of data from the first structured data in sequence each time and inputting the line of data to a file type matching model to respectively obtain a file type corresponding to each line of data;
and determining the type of the file to be recorded by a voting method according to the file type corresponding to each line of data.
9. The intelligent data entry method based on the neural network as claimed in claim 8,
the file type matching model is a neural network model;
the training method of the file type matching model comprises the following steps:
reading a row of data from the first training set in sequence each time and inputting the data into the file type matching model to obtain an output result;
calculating a first loss function according to the output result and the label corresponding to the row of data;
and iterating by using a gradient descent method, gradually adjusting model parameters and reducing the first loss function until a preset iteration number is reached.
10. A computer-readable storage medium, characterized in that it stores a computer program that can be loaded by a processor to execute the neural network-based field matching method as claimed in any one of claims 1 to 4, or the neural network-based intelligent data entry method as claimed in any one of claims 5 to 9.
CN202210436149.9A 2022-04-24 2022-04-24 Neural network-based field matching method and data intelligent input method Active CN114896467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210436149.9A CN114896467B (en) 2022-04-24 2022-04-24 Neural network-based field matching method and data intelligent input method

Publications (2)

Publication Number Publication Date
CN114896467A (en) 2022-08-12
CN114896467B CN114896467B (en) 2024-02-09

Family

ID=82718533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210436149.9A Active CN114896467B (en) 2022-04-24 2022-04-24 Neural network-based field matching method and data intelligent input method

Country Status (1)

Country Link
CN (1) CN114896467B (en)

Also Published As

Publication number Publication date
CN114896467B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant