CN116701931A - Water quality parameter inversion method and device, storage medium and electronic equipment

Info

Publication number: CN116701931A
Application number: CN202310615470.8A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 杨磊, 金和平, 周华杰, 黄忠初, 罗惠恒, 胡金艳, 许艳丽, 周超辉, 李德龙, 张晓萌, 姜鹏
Current and original assignee: China Three Gorges Corp
Application filed by China Three Gorges Corp; priority to CN202310615470.8A

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/27: Pattern recognition; regression, e.g. linear or logistic regression
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • Y02A20/152: Technologies for adaptation to climate change; water conservation; water filtration


Abstract

The invention discloses a water quality parameter inversion method, a device, a storage medium and electronic equipment. The method fuses water quality parameter data with remote sensing data and can perform accurate inversion of water quality parameters on that basis, solving the technical problem that the prior art does not consider the role of point-source monitoring data in water quality inversion; meanwhile, it can provide a training sample data set from which a neural network model readily obtains a good training result, solving the technical problems that the training sample data generated in the prior art is not suitable for training a neural network model and lacks diversity.

Description

Water quality parameter inversion method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a water quality parameter inversion method, a device, a storage medium and electronic equipment.
Background
At present, monitoring of the water environment of a reservoir area relies mainly on traditional point-source monitoring and manual monitoring: water quality at point sources is sampled periodically and then tested by laboratory staff, and intelligent monitoring means are scarce. Point-source monitoring data can hardly reflect the spatial distribution of water quality over a large area. Remote sensing monitoring data can compensate for the spatial discontinuity of point-source monitoring, but still require auxiliary verification against point-source monitoring data; accurate inversion of the water quality parameters of a reservoir area can therefore be realized by fusing remote sensing monitoring data with point-source monitoring data.
However, when water quality is monitored based on remote sensing data in the prior art, the role of point-source monitoring data in water quality inversion is not considered, so the accuracy of water quality parameter inversion is low; likewise, the influence of training-sample diversity on water quality inversion is not considered, so the generated training sample data is not suitable for training a neural network model.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a water quality parameter inversion method, a device, a storage medium, and an electronic apparatus, so as to solve the technical problems in the prior art that the accuracy of water quality parameter inversion is low and that training sample data lacks diversity, making the generated training sample data unsuitable for training a neural network model.
The technical scheme provided by the invention is as follows:
in a first aspect, an embodiment of the present invention provides a water quality parameter inversion method, including: acquiring a first unlabeled sample data set corresponding to water surface monitoring points, and selecting a second unlabeled sample data set from the first unlabeled sample data set based on a preset selection condition, wherein the first unlabeled sample data set includes a water quality parameter data set and a remote sensing data set; adding a first characteristic data set to the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and generating a training sample data set based on the first labeled sample data set; training a preset neural network model based on the training sample data set to obtain a target neural network model; acquiring a third unlabeled sample data set, and inputting the third unlabeled sample data set into the target neural network model to obtain a second characteristic data set corresponding to the third unlabeled sample data set; and processing the second characteristic data set by a preset regression analysis method to obtain a water quality parameter inversion result.
With reference to the first aspect, in a possible implementation manner of the first aspect, the first unlabeled sample data set further includes a fourth unlabeled sample data set that does not meet the preset selection condition; generating a training sample data set based on the first labeled sample data set includes: dividing the first labeled sample data set based on the first characteristic data set to obtain a first-class labeled sample data set and a second-class labeled sample data set; training a preset machine learning model based on the first-class labeled sample data set and the second-class labeled sample data set to obtain a target machine learning model; inputting the fourth unlabeled sample data set into the target machine learning model to obtain a first similarity score set, wherein the first similarity score set characterizes the similarity between the fourth unlabeled sample data set and the first-class labeled sample data set; comparing each first similarity score in the first similarity score set with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a first-class unlabeled sample data set; processing the first-class unlabeled sample data set by using the target machine learning model and a preset dividing method until a second labeled sample data set meeting a preset end condition is obtained; and determining the training sample data set based on the second labeled sample data set.
With reference to the first aspect, in another possible implementation manner of the first aspect, processing the first-class unlabeled sample data set by using the target machine learning model and a preset dividing method until a second labeled sample data set meeting a preset end condition is obtained includes: dividing the first-class unlabeled sample data set to obtain a preset first number of first-class unlabeled sample sub-data sets; obtaining a first similarity average score data set through processing by the target machine learning model based on each first-class unlabeled sample sub-data set; determining a preset second number of second-class unlabeled sample sub-data sets among the first-class unlabeled sample sub-data sets through a preset sorting and selection method based on the first similarity average score data set; determining a preset third number of third-class unlabeled sample sub-data sets in each second-class unlabeled sample sub-data set; labeling each third-class unlabeled sample sub-data set to obtain a third labeled sample data set; and judging whether the preset end condition is met: when the preset end condition is not met, repeating the steps from dividing the first-class unlabeled sample data set to obtain the preset first number of first-class unlabeled sample sub-data sets through labeling each third-class unlabeled sample sub-data set to obtain the corresponding third labeled sample data sets, stopping when the preset end condition is met, and obtaining the second labeled sample data set, wherein the second labeled sample data set includes the third labeled sample data sets corresponding to the third-class unlabeled sample sub-data sets.
With reference to the first aspect, in a further possible implementation manner of the first aspect, the method further includes: sorting the unlabeled sample sub-data sets of each second category based on the first similarity average score data set to obtain a sorting result; and selecting the preset third number of unlabeled sample sub-data sets of the third category in each unlabeled sample sub-data set of the second category based on the sorting result.
With reference to the first aspect, in a further possible implementation manner of the first aspect, after inputting the fourth unlabeled sample data set into the target machine learning model to obtain the first similarity score set, the method further includes: comparing each first similarity score in the first similarity score set with the preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a second-class unlabeled sample data set; processing the second-class unlabeled sample data set by using the target machine learning model and the preset dividing method until the second labeled sample data set meeting the preset end condition is obtained; and determining the training sample data set based on the second labeled sample data set.
With reference to the first aspect, in a further possible implementation manner of the first aspect, before training the preset neural network model based on the training sample data set to obtain the target neural network model, the method further includes: acquiring a first neuron and a second neuron corresponding to the preset neural network model; and dividing the preset neural network model by utilizing preset dividing conditions based on the first neuron and the second neuron to obtain a first neural network model and a second neural network model.
With reference to the first aspect, in a further possible implementation manner of the first aspect, training a preset neural network model based on the training sample data set to obtain a target neural network model includes: training the first neural network model by using the training sample data set to obtain a first neural network training model; inputting the training sample data set into the first neural network training model, and calculating a first error value between the output of the first neural network training model and each training sample datum in the training sample data set; determining a first training sample sub-data set in the training sample data set based on each first error value and a preset error threshold; training the second neural network model by using the first training sample sub-data set to obtain a second neural network training model; acquiring first model parameters of the first neural network training model and second model parameters of the second neural network training model; determining third model parameters of the preset neural network model based on the first model parameters and the second model parameters; and training the preset neural network model based on the training sample data set and the third model parameters until the second error value between the output value of the trained preset neural network model and each training sample datum in the training sample data set is smaller than or equal to the preset error threshold, so as to obtain the target neural network model.
In a second aspect, an embodiment of the present invention provides a water quality parameter inversion apparatus, including: the data acquisition module is used for acquiring a first unlabeled sample data set corresponding to the water surface monitoring point, and selecting a second unlabeled sample data set in the first unlabeled sample data set based on a preset selection condition, wherein the first unlabeled sample data set comprises a water quality parameter data set and a remote sensing data set; the adding and generating module is used for adding a first characteristic data set into the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and generating a training sample data set based on the first labeled sample data set; the training module is used for training a preset neural network model based on the training sample data set to obtain a target neural network model; the acquisition and input module is used for acquiring a third unlabeled sample data set, inputting the third unlabeled sample data set into the target neural network model, and obtaining a second characteristic data set corresponding to the third unlabeled sample data set; and the processing module is used for processing the second characteristic data set by a preset regression analysis method to obtain a water quality parameter inversion result.
In a third aspect, an embodiment of the present invention provides a computer readable storage medium, where a computer program is stored, where the computer program is configured to cause the computer to perform the water quality parameter inversion method according to any one of the first aspect and the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: the water quality parameter inversion system comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores a computer program, and the processor executes the computer program to execute the water quality parameter inversion method according to any one of the first aspect and the first aspect of the embodiment of the invention.
The technical scheme provided by the invention has the following effects:
according to the water quality parameter inversion method provided by the embodiment of the invention, water quality parameter data and remote sensing data are fused, accurate inversion of water quality parameters can be performed on that basis, and the technical problem that the prior art does not consider the role of point-source monitoring data in water quality inversion is solved; meanwhile, a training sample data set from which the neural network model readily obtains a good training result can be provided, solving the technical problems that the training sample data generated in the prior art is not suitable for training the neural network model and lacks diversity.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a water quality parameter inversion method provided according to an embodiment of the present invention;
FIG. 2 is a flow chart of a multi-remote sensing data inversion method based on statistical regression and deep learning provided according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a neural network model provided in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a water quality parameter inversion apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer-readable storage medium provided according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims and drawings of the present invention are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides a water quality parameter inversion method, as shown in fig. 1, which comprises the following steps:
step S1: and acquiring a first unlabeled sample data set corresponding to the water surface monitoring point, and selecting a second unlabeled sample data set in the first unlabeled sample data set based on a preset selection condition.
The first unlabeled sample data set may include a water quality parameter data set and a remote sensing data set corresponding to the plurality of water surface monitoring points.
In the embodiment of the invention, a water surface monitoring point is in fact a small area on the water surface, and the area it occupies is small relative to the water surface of the whole reservoir area.
Further, the water quality parameter data set can be obtained by sampling the water quality at the water surface monitoring points; the remote sensing data are the data related to the remote sensing images of the water surface monitoring points.
Specifically, the water quality parameter data set and the remote sensing data set are stored together to form the unlabeled sample data, namely the first unlabeled sample data set.
Further, a small amount of unlabeled sample data, i.e., a second unlabeled sample data set, is selected from the plurality of unlabeled sample data.
Step S2: adding a first characteristic data set to the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and generating a training sample data set based on the first labeled sample data set.
Specifically, the preset adding requirement is: adding the corresponding characteristic data to each unlabeled sample datum in the second unlabeled sample data set.
By adding the characteristic data, each unlabeled sample datum in the second unlabeled sample data set is labeled, yielding a small amount of labeled sample data, namely the first labeled sample data set, and a plurality of training sample data are generated based on the first labeled sample data set.
The labeling may be performed manually or by other automatic labeling methods; the embodiment of the present invention does not limit this, as long as the labeling requirements are satisfied.
Step S3: training a preset neural network model based on the training sample data set to obtain a target neural network model.
Specifically, the preset neural network model is trained with the obtained training sample data set to obtain the trained target neural network model.
Step S4: acquiring a third unlabeled sample data set, and inputting the third unlabeled sample data set into the target neural network model to obtain a second characteristic data set corresponding to the third unlabeled sample data set.
Specifically, after the trained target neural network model is obtained, a large amount of unlabeled sample data of the reservoir area, namely the third unlabeled sample data set, is acquired again and input into the target neural network model for processing, which outputs the characteristic data corresponding to each unlabeled sample datum in the third unlabeled sample data set.
Like the first unlabeled sample data set, the third unlabeled sample data set comprises a water quality parameter data set and a remote sensing data set corresponding to water surface monitoring points.
Step S5: processing the second characteristic data set by a preset regression analysis method to obtain a water quality parameter inversion result.
Specifically, based on each characteristic datum in the second characteristic data set, inversion calculation of the reservoir-area water quality parameters is realized with the preset regression analysis method, and the corresponding water quality parameter inversion result is obtained. The embodiment of the present invention does not particularly limit the preset regression analysis method, as long as the regression analysis requirements are satisfied.
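To make the flow of steps S1 to S5 concrete, the following minimal sketch runs the whole pipeline on synthetic data. The array shapes, the use of scikit-learn's MLPRegressor as a stand-in for the preset neural network model, and the linear regression standing in for the preset regression analysis method are all illustrative assumptions, not the reference implementation of this embodiment.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# S1: unlabeled samples fuse water quality parameters with remote sensing bands
water_quality = rng.random((500, 4))      # e.g. turbidity, pH, DO, conductivity
remote_sensing = rng.random((500, 6))     # e.g. six spectral band reflectances
first_unlabeled = np.hstack([water_quality, remote_sensing])
picked = rng.choice(len(first_unlabeled), size=50, replace=False)

# S2: add characteristic data (a synthetic target here) to the selected subset
characteristic = first_unlabeled[picked].sum(axis=1) + 0.1 * rng.random(50)

# S3: train a neural network to map fused samples to characteristic data
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
net.fit(first_unlabeled[picked], characteristic)

# S4: run the trained network on a large third unlabeled sample data set
third_unlabeled = np.hstack([rng.random((2000, 4)), rng.random((2000, 6))])
second_characteristic = net.predict(third_unlabeled).reshape(-1, 1)

# S5: regression analysis from characteristic data to the inverted parameter
reg = LinearRegression().fit(second_characteristic[:100], rng.random(100))
inversion_result = reg.predict(second_characteristic)
print(inversion_result[:5])
```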
According to the water quality parameter inversion method provided by the embodiment of the invention, water quality parameter data and remote sensing data are fused, accurate inversion of water quality parameters can be performed on that basis, and the technical problem that the prior art does not consider the role of point-source monitoring data in water quality inversion is solved; meanwhile, a training sample data set from which the neural network model readily obtains a good training result can be provided, solving the technical problems that the training sample data generated in the prior art is not suitable for training the neural network model and lacks diversity.
As an optional implementation manner of the embodiment of the present invention, generating a training sample data set based on the first labeling sample data set in the step S2 includes: dividing the first labeling sample data set based on the first characteristic data set to obtain a first class labeling sample data set and a second class labeling sample data set; training a preset machine learning model based on the first class labeling sample data set and the second class labeling sample data set to obtain a target machine learning model; inputting the fourth unlabeled sample data set into the target machine learning model to obtain a first similarity score set; comparing each first similarity score in the first similarity score sets with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a first class unlabeled sample data set; processing the first class unlabeled sample data set by using the target machine learning model and a preset dividing method until a second labeled sample data set meeting a preset ending condition is obtained; the training sample data set is determined based on the second labeled sample data set.
The first unlabeled sample data set further comprises a large number of unlabeled sample data which are not selected, namely a fourth unlabeled sample data set which does not meet preset selection conditions; the first set of similarity scores characterizes the similarity of the fourth unlabeled exemplar data set to the first class labeled exemplar data set.
Firstly, the small amount of labeled sample data, i.e., the first labeled sample data set, is divided into a first category and a second category according to the first characteristic data set, and the labeled sample data of the two categories are used to train the machine learning model, obtaining the trained target machine learning model.
Secondly, the fourth unlabeled sample data set, i.e., the unlabeled sample data not selected from the first unlabeled sample data set, is input into the trained target machine learning model, which outputs the similarity score of each unlabeled sample datum in the fourth unlabeled sample data set with respect to the first-class labeled sample data set, i.e., the first similarity score set.
Then, the unlabeled sample data in the fourth unlabeled sample data set whose first similarity score is greater than zero are divided into the first-class unlabeled sample data set.
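As a small sketch of this sign-based split, and assuming, as the embodiment later suggests, that the target machine learning model is a support vector machine, the decision score of an SVC can serve as the first similarity score; the data and names below are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
labeled_x = rng.random((60, 10))
labeled_y = (labeled_x[:, 0] > 0.5).astype(int)   # first vs. second class labels

model = SVC(kernel="rbf")                  # stand-in for the target model
model.fit(labeled_x, labeled_y)

pool = rng.random((400, 10))               # fourth unlabeled sample data set
scores = model.decision_function(pool)     # first similarity score set
first_class_pool = pool[scores > 0]        # score > 0: first-class unlabeled set
second_class_pool = pool[scores < 0]       # score < 0: second-class unlabeled set
```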
Further, processing the first-class unlabeled sample data set by using the target machine learning model and the preset dividing method until the second labeled sample data set meeting the preset end condition is obtained includes: dividing the first-class unlabeled sample data set to obtain a preset first number of first-class unlabeled sample sub-data sets; obtaining a first similarity average score data set through processing by the target machine learning model based on each first-class unlabeled sample sub-data set; determining a preset second number of second-class unlabeled sample sub-data sets among the first-class unlabeled sample sub-data sets through a preset sorting and selection method based on the first similarity average score data set; determining a preset third number of third-class unlabeled sample sub-data sets in each second-class unlabeled sample sub-data set; labeling each third-class unlabeled sample sub-data set to obtain a third labeled sample data set; and judging whether the preset end condition is met: when the preset end condition is not met, repeating the steps from dividing the first-class unlabeled sample data set to obtain the preset first number of first-class unlabeled sample sub-data sets through labeling each third-class unlabeled sample sub-data set to obtain the corresponding third labeled sample data sets, stopping when the preset end condition is met, and obtaining the second labeled sample data set, wherein the second labeled sample data set includes the third labeled sample data sets corresponding to the third-class unlabeled sample sub-data sets.
The first-class unlabeled sample data set is further divided into a preset first number of sub-sets of different classes, i.e., the preset first number of first-class unlabeled sample sub-data sets.
Further, each first-class unlabeled sample sub-data set is input into the target machine learning model, the similarity score of each sample in the sub-data set is calculated, and the average of these similarity scores is taken; the averages corresponding to the first-class unlabeled sample sub-data sets then form the first similarity average score data set.
Further, the similarity average scores in the first similarity average score data set are sorted in ascending order, and the preset second number of top-ranked sub-data sets are selected from the first-class unlabeled sample sub-data sets, i.e., the second-class unlabeled sample sub-data sets.
Further, a preset third number of third-class unlabeled sample sub-data sets are randomly selected from the second-class unlabeled sample sub-data sets, and each unlabeled sample datum in the third-class unlabeled sample sub-data sets is labeled to obtain the labeled third labeled sample data set.
Further, whether the preset end condition is met is judged. If it is met, the third labeled sample data sets corresponding to all the currently labeled third-class unlabeled sample sub-data sets, i.e., the second labeled sample data set, are used to form the training sample data. If it is not met, the first-class unlabeled sample data set is divided again into the preset first number of first-class unlabeled sample sub-data sets, and the process is repeated until the preset end condition is met and the second labeled sample data set is obtained.
Wherein the preset second number is smaller than the preset first number; the preset end condition may include that the number of times the machine learning model has been trained reaches a preset training count threshold, and that testing of the machine learning model shows the error between its output value and the actual value to be smaller than a preset error threshold.
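The iterative loop just described can be sketched as follows, under stated assumptions: the patent does not name a concrete dividing method, so k-means clustering stands in for it; a fixed round budget stands in for the preset end condition; and `model` is any trained scorer with a `decision_function`, such as the SVC above.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_for_labeling(model, pool, first_num=10, second_num=5,
                        third_num=3, rounds=4, seed=0):
    rng = np.random.default_rng(seed)
    chosen_samples = []
    for _ in range(rounds):                  # stand-in for the preset end condition
        km = KMeans(n_clusters=first_num, n_init=10, random_state=seed).fit(pool)
        # mean |decision score| per sub-data set; the absolute value lets the same
        # helper also serve the second-category pool, whose scores are negative
        means = np.array([np.abs(model.decision_function(pool[km.labels_ == c])).mean()
                          for c in range(first_num)])
        picked = []
        for c in np.argsort(means)[:second_num]:   # ascending: boundary-near first
            members = np.flatnonzero(km.labels_ == c)
            picked.extend(rng.choice(members, size=min(third_num, len(members)),
                                     replace=False))
        chosen_samples.append(pool[picked])  # these samples go out for labeling
        pool = np.delete(pool, picked, axis=0)   # labeled samples leave the pool
    return np.vstack(chosen_samples)
```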
As an optional implementation manner of the embodiment of the present invention, the method further includes: sorting the unlabeled sample sub-data sets of each second category based on the first similarity average score data set to obtain a sorting result; and selecting the preset third number of unlabeled sample sub-data sets of the third category in each unlabeled sample sub-data set of the second category based on the sorting result.
Specifically, when determining the preset third number of third-class unlabeled sample sub-data sets, all the unlabeled sample data in each second-class unlabeled sample sub-data set are sorted in ascending order of their average similarity scores in the first similarity average score data set, and the preset third number of top-ranked samples in the second-class unlabeled sample sub-data sets are selected as the third-class unlabeled sample sub-data sets.
As an optional implementation manner of the embodiment of the present invention, after inputting the fourth unlabeled sample data set into the target machine learning model to obtain the first similarity score set, the method further includes: comparing each first similarity score in the first similarity score sets with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a second class unlabeled sample data set; processing the second class unlabeled sample data set by using the target machine learning model and a preset dividing method until the second labeled sample data set meeting the preset ending condition is obtained; the training sample data set is determined based on the second labeled sample data set.
Specifically, in the embodiment of the present invention, the fourth unlabeled sample data set is divided to obtain a second class unlabeled sample data set, and the second class unlabeled sample data set is processed by using the target machine learning model and the preset dividing method, and a training sample data set is generated according to the processing result.
For the specific implementation process, refer to the process of dividing the fourth unlabeled sample data set to obtain the first-class unlabeled sample data set and generating the training sample data set based on it, which is not repeated here.
As an optional implementation manner of the embodiment of the present invention, before the step S3, the method further includes: acquiring a first neuron and a second neuron corresponding to the preset neural network model; and dividing the preset neural network model by utilizing preset dividing conditions based on the first neuron and the second neuron to obtain a first neural network model and a second neural network model.
Wherein the first neurons are the neurons of the different layers in the up-down direction of the preset neural network model, excluding the neurons of the input layer and the output layer; the second neurons are the neurons of the different layers in the front-rear direction of the preset neural network model.
Specifically, the preset neural network model is divided in units of combinations of first neurons and second neurons that can independently extract data features of different layers, obtaining the corresponding first neural network model and second neural network model.
As an optional implementation manner of the embodiment of the present invention, step S3 includes: training the first neural network model by using the training sample data set to obtain a first neural network training model; inputting the training sample data set into the first neural network training model, and calculating a first error value between the output of the first neural network training model and each training sample datum in the training sample data set; determining a first training sample sub-data set in the training sample data set based on each first error value and a preset error threshold; training the second neural network model by using the first training sample sub-data set to obtain a second neural network training model; acquiring first model parameters of the first neural network training model and second model parameters of the second neural network training model; determining third model parameters of the preset neural network model based on the first model parameters and the second model parameters; and training the preset neural network model based on the training sample data set and the third model parameters until the second error value between the output value of the trained preset neural network model and each training sample datum in the training sample data set is smaller than or equal to the preset error threshold, so as to obtain the target neural network model.
Firstly, the first neural network model is trained with the training sample data set to obtain the trained first neural network training model; the training sample data set is then input into it, the first error value of the trained model on each training sample datum is obtained, and the plurality of training sample data whose first error value is greater than the preset error threshold are determined, i.e., the first training sample sub-data set.
Secondly, the second neural network model is trained with the first training sample sub-data set to obtain the trained second neural network training model. Further, the parameters of the preset neural network model at the beginning of its training, i.e., the third model parameters, can be determined from the first model parameters of the trained first neural network training model and the second model parameters of the trained second neural network training model.
The preset neural network model is then trained with the training sample data set and the third model parameters, and whether the second error value between the output value of the trained preset neural network model and each training sample datum in the training sample data set is smaller than or equal to the preset error threshold is judged. If so, training of the preset neural network model ends and the trained target neural network model is obtained; if any second error value is greater than the preset error threshold, execution jumps back to step S1 and continues.
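A compact sketch of this two-stage scheme with toy fully connected models is given below; the architecture, the layer sizes, and the way the third model parameters are assembled from the two sub-models' weights are assumptions made for illustration only.

```python
import torch
from torch import nn

def fit(model, x, y, steps=500, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

torch.manual_seed(0)
x, y = torch.rand(200, 10), torch.rand(200, 1)
threshold = 0.05

# first neural network model, trained on the whole training sample data set
net1 = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
fit(net1, x, y)
with torch.no_grad():
    err1 = (net1(x) - y).abs().squeeze(1)          # first error values
hard_x, hard_y = x[err1 > threshold], y[err1 > threshold]

# second neural network model, trained on the high-error sub-data set
net2 = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
if len(hard_x):
    fit(net2, hard_x, hard_y)

# third model parameters: the full model starts from both sub-models' weights
full = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
with torch.no_grad():
    full[0].weight[:16], full[0].weight[16:] = net1[0].weight, net2[0].weight
    full[0].bias[:16], full[0].bias[16:] = net1[0].bias, net2[0].bias
    full[2].weight[:, :16], full[2].weight[:, 16:] = net1[2].weight, net2[2].weight

for _ in range(20):                                # retrain until errors are small
    fit(full, x, y)
    with torch.no_grad():
        if (full(x) - y).abs().max() <= threshold:
            break
```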
In an example, as shown in fig. 2, a multi-remote-sensing-data inversion method based on statistical regression and deep learning is provided, which is implemented mainly by performing the following steps:
firstly, acquiring the water quality parameter data of a plurality of water surface monitoring points in the reservoir area and the remote sensing data corresponding to each of them, and storing the water quality parameter data and remote sensing data of the same monitoring point together to form one unlabeled sample datum;
secondly, selecting a small amount of unlabeled sample data from the plurality of unlabeled sample data, adding corresponding characteristic data to them to obtain a small amount of labeled sample data, and generating a plurality of training sample data based on that small amount of labeled sample data;
finally, training the neural network model with the plurality of training sample data; meanwhile, acquiring a large amount of unlabeled sample data of the reservoir area again, inputting them into the trained neural network model, which outputs the characteristic data corresponding to each unlabeled sample datum; and obtaining the inversion result of the reservoir-area water quality parameters by applying a regression analysis method to the large amount of characteristic data.
Specifically, when performing inversion calculation of the water quality parameters of a reservoir area, point-source monitoring data can hardly reflect the spatial distribution of water quality over a large area, and although remote sensing monitoring data can compensate for the spatial discontinuity of point-source monitoring, they still require auxiliary verification against point-source monitoring data; inversion calculation of the reservoir-area water quality parameters by fusing remote sensing monitoring data with point-source monitoring data is therefore proposed. Firstly, the water quality parameter data and remote sensing data of the water surface monitoring points are acquired and combined into unlabeled sample data; a water surface monitoring point is in fact a small area on the water surface, small relative to the water surface of the whole reservoir area, the remote sensing data are the data related to the remote sensing images of the monitoring points, and the water quality parameter data are obtained by sampling the water quality at the monitoring points. Secondly, a small amount of unlabeled sample data is selected from the unlabeled sample data and labeled, that is, characteristic data are added to them, so that a small amount of labeled sample data is obtained; the labeling may be performed manually. After the small amount of labeled sample data is obtained, training sample data can be generated on its basis and used to train the neural network model, which realizes the fusion processing of the water quality parameter data and the remote sensing data. Finally, the collected unlabeled sample data are input into the trained neural network model, which returns the corresponding characteristic data, and inversion of the reservoir-area water quality parameters is realized by applying the regression analysis method to these characteristic data.
Further, generating a plurality of training sample data based on a small amount of labeled sample data comprises the following steps:
the first step: dividing the small amount of labeled sample data into a first category and a second category according to its characteristic data, training a machine learning model with the labeled sample data of the two categories, and applying the trained machine learning model to the large amount of unlabeled sample data not selected from the plurality of unlabeled sample data, whereupon the trained machine learning model outputs the similarity score of each unlabeled sample datum with respect to the first-category labeled sample data;
the second step: according to the similarity scores output by the machine learning model, dividing the unlabeled sample data whose similarity score is greater than zero into the first category, and the unlabeled sample data whose similarity score is smaller than zero into the second category;
the third step: dividing the unlabeled sample data of the first category and of the second category each into a preset first number of sub-categories; calculating, for each sub-category of the first category, the mean of the similarity scores of its unlabeled sample data, and, for each sub-category of the second category, the absolute value of that mean; sorting these values in ascending order and selecting the preset second number of top-ranked sub-categories from the first category or the second category;
the fourth step: randomly selecting a preset third number of unlabeled sample data from the unlabeled sample data of a selected sub-category of the first category or of the second category, and labeling the selected unlabeled sample data to obtain labeled sample data;
the fifth step: judging whether the end condition is met; if so, forming the training sample data from all the current labeled sample data and ending all the steps; if not, jumping back to the first step.
Further, the preset second number is smaller than the preset first number; in this embodiment, the first number is set to twice the second number.
Further, the end condition includes that the number of times the machine learning model has been trained reaches a preset training count threshold, and that testing of the machine learning model shows the error between its output value and the actual value to be smaller than the preset error threshold.
In particular, the inventors consider that the prior art generally acquires training sample data in advance before training a neural network model; however, such training sample data may not be uniformly distributed in the corresponding data space, so it lacks diversity and is not suitable for training the neural network model. To further solve this technical problem, the first step to the fifth step above are proposed for generating training sample data based on the labeled sample data.
In the first step, the labeled sample data are divided into a first category and a second category according to their different characteristic data, and a machine learning model is trained with the labeled sample data of the two categories, so that when other unlabeled sample data are input into the trained machine learning model, it outputs the similarity score of each of them with respect to the first-category labeled sample data; the higher the similarity score, the greater the probability that the corresponding unlabeled sample datum belongs to the first category, so the machine learning model can divide the other unlabeled sample data into the first category and the second category. The machine learning model may be a support vector machine. In the second step, based on the similarity scores output by the machine learning model, the other unlabeled sample data whose similarity score is greater than zero are divided into the first category and those whose similarity score is smaller than zero into the second category; unlabeled sample data whose similarity score is close to zero are close to the classification boundary, and their category is difficult for the machine learning model to estimate. In the third step, the unlabeled sample data of the first category and of the second category are each divided into a preset first number of sub-categories; the mean of the similarity scores is calculated for each sub-category of the first category and the absolute value of that mean for each sub-category of the second category, these values are sorted in ascending order, and the preset second number of top-ranked sub-categories are selected from the first category or the second category. In the fourth step, a preset third number of unlabeled sample data are selected from the selected sub-categories for labeling, which may be performed manually or by other automatic labeling methods, and the selected data are then removed from the unlabeled sample data. In the fifth step, if the end condition is met, all the current labeled sample data are used to form the training sample data; if it is not met, execution jumps back to the first step, the machine learning model is retrained with all the current labeled sample data, and the subsequent steps are executed again. By this method, training sample data suitable for training the neural network model can be generated, and a better training result is easily obtained by training the neural network model with such training sample data.
Further, instead of randomly selecting the preset third number of unlabeled sample data from a selected sub-category of the first category or of the second category, the unlabeled sample data of that sub-category may be sorted in ascending order of the absolute value of their similarity scores, and the top-ranked unlabeled sample data then selected.
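The two selection strategies can be contrasted in a few lines; both helpers are hypothetical illustrations rather than part of the claimed method.

```python
import numpy as np

def pick_random(scores, k, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice(len(scores), size=k, replace=False)

def pick_near_boundary(scores, k):
    # smallest |score| first: samples closest to the classification boundary
    return np.argsort(np.abs(scores))[:k]

scores = np.array([0.9, -0.1, 0.3, -1.2, 0.05, -0.4])
print(pick_random(scores, 3))         # spread over the whole sub-category
print(pick_near_boundary(scores, 3))  # indices 4, 1, 2: nearest the boundary
```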
Specifically, randomly selecting unlabeled sample data from the selected sub-category of the first category or of the second category lets the training sample data contain, as far as possible, both labeled sample data generated from unlabeled sample data close to the classification boundary of the machine learning model and labeled sample data generated from unlabeled sample data far from that boundary, which guarantees the diversity of the training sample data. Alternatively, the unlabeled sample data of the selected sub-category may be sorted in ascending order of the absolute value of their similarity scores and the top-ranked ones selected, which makes the training sample data contain more labeled sample data generated from unlabeled sample data close to the classification boundary and improves the learning effect of the neural network model, although it may somewhat reduce the diversity of the training sample data.
Further, before training the neural network model with the plurality of training sample data, the method further includes dividing the neural network model into a first neural network model and a second neural network model, in units of combinations of neurons of different layers that can independently extract data features; the division concerns the neurons of the different layers in the up-down direction, except those of the input layer and the output layer, together with the neurons of the different layers in the front-rear direction.
Further, the process of obtaining the first neural network model and the second neural network model is described with reference to fig. 3, which shows a schematic diagram of a neural network model composed of modules 1 to 6 and the connections between them (the neurons inside each module are omitted). Module 1 is the input layer of the neural network model and module 2 its output layer, while modules 3, 4, 5 and 6 each represent a combination of neurons of different layers that can independently extract data features; for example, when the neural network model is a convolutional neural network, modules 3 to 6 each contain a convolutional layer and a pooling layer. The first neural network model thus comprises module 1, module 3, module 5 and module 2, together with the connections between module 1 and module 3, between module 3 and module 5, and between module 5 and module 2; the second neural network model comprises module 1, module 4, module 6 and module 2, together with the connections between module 1 and module 4, between module 4 and module 6, and between module 6 and module 2. The connections between module 3 and module 6 and between module 4 and module 5 are deleted in the division.
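A hypothetical PyTorch rendering of this partition may help. The module sizes and the one-dimensional convolutional blocks are invented for illustration; only the connectivity, modules 1-3-5-2 for the first sub-network and 1-4-6-2 for the second, with the 3-6 and 4-5 cross connections left out, follows the description of fig. 3.

```python
import torch
from torch import nn

def block(c_in, c_out):                   # conv + pooling feature-extraction unit
    return nn.Sequential(nn.Conv1d(c_in, c_out, 3, padding=1),
                         nn.ReLU(), nn.MaxPool1d(2))

module1 = nn.Conv1d(1, 8, 3, padding=1)   # module 1: input layer
module2 = nn.Linear(16, 1)                # module 2: output layer
module3, module4 = block(8, 16), block(8, 16)
module5, module6 = block(16, 16), block(16, 16)

class SubNet(nn.Module):
    def __init__(self, first, second):
        super().__init__()
        self.m1, self.a, self.b, self.m2 = module1, first, second, module2
    def forward(self, x):
        h = self.b(self.a(self.m1(x)))    # module 1 -> first block -> second block
        return self.m2(h.mean(dim=-1))    # pool over length, then output layer

first_net = SubNet(module3, module5)      # modules 1, 3, 5, 2
second_net = SubNet(module4, module6)     # modules 1, 4, 6, 2
print(first_net(torch.rand(2, 1, 32)).shape)  # torch.Size([2, 1])
```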
Further, training the neural network model with the plurality of training sample data comprises the following steps:
the first step: training the first neural network model with all the training sample data, inputting all the training sample data into the trained first neural network model, calculating the error of the trained first neural network model on each training sample datum, and determining the plurality of training sample data whose error is greater than the preset error threshold;
the second step: training the second neural network model with the plurality of training sample data whose error is greater than the preset error threshold, setting the parameters of the neural network model at the beginning of training according to the trained parameters of the first neural network model and of the second neural network model, and training the neural network model with all the training sample data;
the third step: judging whether the error of the trained neural network model is smaller than or equal to the preset error threshold; if so, ending the training of the neural network model; if not, jumping back to the first step.
Specifically, after the first neural network model and the second neural network model are obtained, they are trained with different training sample data, and the full neural network model is then trained on the basis of their training results, which shortens the training time of the neural network model. In the first step, the first neural network model is trained with all the training sample data, all the training sample data are then input into the trained first neural network model, and the training sample data whose error is greater than the error threshold are found. In the second step, the second neural network model is trained with those training sample data. Once the first and second neural network models are trained, their parameters, such as the weights on the connections between different neurons, are determined, and the parameters of the neural network model at the beginning of its training can be set according to them. It should be noted that the parameters corresponding to the connections between module 4 and module 5 and between module 3 and module 6 cannot be set according to the trained parameters of the first and second neural network models. The neural network model is then further trained with all the training sample data. In the third step, if the error of the trained neural network model is smaller than or equal to the error threshold, the training ends; otherwise execution jumps back to the first step and continues until the error of the trained neural network model is smaller than or equal to the error threshold.
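Continuing the fig. 3 sketch above, the full model can be assembled so that its shared modules arrive pre-trained from the two sub-networks, while the parameters realizing the 3-6 and 4-5 cross connections, which have no counterpart in either sub-model, keep their fresh initialization; the scalar mixing weights below are purely an illustrative assumption.

```python
class FullNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.m1, self.m2 = module1, module2
        self.m3, self.m4, self.m5, self.m6 = module3, module4, module5, module6
        self.mix35 = nn.Parameter(torch.rand(1))  # weight of the 3->5 vs 4->5 path
        self.mix46 = nn.Parameter(torch.rand(1))  # weight of the 4->6 vs 3->6 path
    def forward(self, x):
        h = self.m1(x)
        h3, h4 = self.m3(h), self.m4(h)
        h5 = self.m5(self.mix35 * h3 + (1 - self.mix35) * h4)  # 4->5 cross link
        h6 = self.m6(self.mix46 * h4 + (1 - self.mix46) * h3)  # 3->6 cross link
        return self.m2((h5 + h6).mean(dim=-1))

full_net = FullNet()  # module1..module6 arrive already trained via the sub-nets
```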
The embodiment of the invention also provides a water quality parameter inversion device, as shown in fig. 4, which comprises:
The data acquisition module 401 is configured to acquire a first unlabeled sample data set corresponding to a water surface monitoring point, and to select a second unlabeled sample data set from the first unlabeled sample data set based on a preset selection condition, where the first unlabeled sample data set includes a water quality parameter data set and a remote sensing data set; for details, see the relevant description of step S1 in the method embodiment.
An adding and generating module 402, configured to add a first feature data set to the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and to generate a training sample data set based on the first labeled sample data set; for details, see the relevant description of step S2 in the method embodiment.
The training module 403 is configured to train a preset neural network model based on the training sample data set to obtain a target neural network model; for details, see the relevant description of step S3 in the method embodiment.
An acquiring and inputting module 404, configured to acquire a third unlabeled sample data set and input the third unlabeled sample data set into the target neural network model to obtain a second feature data set corresponding to the third unlabeled sample data set; for details, see the relevant description of step S4 in the method embodiment.
The processing module 405 is configured to obtain a water quality parameter inversion result from the second feature data set through processing by a preset regression analysis method; for details, see the relevant description of step S5 in the method embodiment.
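As a rough illustration of the processing module 405, the following minimal sketch fits an ordinary least-squares regressor on hypothetical calibration pairs and applies it to a feature matrix; the regression method, the calibration data, and all variable names are assumptions, since the embodiment only specifies a preset regression analysis method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
calib_features = rng.normal(size=(50, 8))  # features of labelled samples (assumed)
calib_params = rng.normal(size=50)         # measured water quality values (assumed)
features = rng.normal(size=(20, 8))        # stands in for the second feature data set

# Fit the regressor on the calibration pairs, then invert the water
# quality parameters for the new features.
reg = LinearRegression().fit(calib_features, calib_params)
water_quality_estimates = reg.predict(features)
print(water_quality_estimates[:5])
```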
According to the water quality parameter inversion device provided by the embodiment of the invention, water quality parameter data and remote sensing data are fused, so that accurate inversion of water quality parameters can be performed on the basis of both, which solves the technical problem in the prior art that point source monitoring data are not given a role in water quality inversion; at the same time, a training sample data set from which the neural network model can readily obtain a good training result can be provided, which solves the technical problems in the prior art that the generated training sample data are unsuitable for training the neural network model and lack diversity.
As an optional implementation manner of the embodiment of the present invention, the first unlabeled sample data set further includes a fourth unlabeled sample data set that does not meet the preset selection condition; the adding and generating module comprises: the dividing sub-module is used for dividing the first labeling sample data set based on the first characteristic data set to obtain a first class labeling sample data set and a second class labeling sample data set; the first training sub-module is used for training a preset machine learning model based on the first class labeling sample data set and the second class labeling sample data set to obtain a target machine learning model; the input sub-module is used for inputting the fourth unlabeled sample data set into the target machine learning model to obtain a first similarity score set, and the first similarity score set characterizes the similarity between the fourth unlabeled sample data set and the first class labeled sample data set; the first comparison sub-module is used for comparing each first similarity score in the first similarity score set with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a first class unlabeled sample data set; the first processing sub-module is used for processing the first class unlabeled sample data set by utilizing the target machine learning model and a preset dividing method until a second labeled sample data set meeting a preset ending condition is obtained; a first determination sub-module for determining the training sample data set based on the second labeled sample data set.
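A minimal sketch of the scoring-and-division flow implemented by the first training, input, and first comparison sub-modules might look as follows; the choice of a random forest classifier, the 0.5 threshold standing in for the preset value, and all data and names are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
class1 = rng.normal(loc=1.0, size=(40, 5))   # first-category labelled samples
class2 = rng.normal(loc=-1.0, size=(40, 5))  # second-category labelled samples
X = np.vstack([class1, class2])
y = np.array([1] * 40 + [0] * 40)

# The target machine learning model: its class-1 probability acts as the
# first similarity score of an unlabelled sample.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

unlabelled = rng.normal(size=(200, 5))       # fourth unlabelled sample set
scores = model.predict_proba(unlabelled)[:, 1]
threshold = 0.5                              # stands in for the preset value
first_category_pool = unlabelled[scores > threshold]    # first-class unlabelled set
second_category_pool = unlabelled[scores <= threshold]  # second-class unlabelled set
```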
As an optional implementation manner of the embodiment of the present invention, the first processing sub-module includes: the dividing unit is used for dividing the first class unlabeled sample data set to obtain a preset first number of first class unlabeled sample sub-data sets; the processing unit is used for obtaining a first similarity average score data set through processing by the target machine learning model based on each first class unlabeled sample sub-data set; the first determining unit is used for determining a preset second number of second class unlabeled sample sub-data sets among the first class unlabeled sample sub-data sets through a preset sorting selection method based on the first similarity average score data set; a second determining unit, configured to determine a preset third number of third class unlabeled sample sub-data sets in each of the second class unlabeled sample sub-data sets; the labeling unit is used for labeling each of the third-class unlabeled sample sub-data sets to obtain a third labeled sample data set corresponding to each of the third-class unlabeled sample sub-data sets; the judging unit is used for judging whether the preset ending condition is met; and the repeating unit is used for repeating, when the preset ending condition is not met, the steps from dividing the first class unlabeled sample data set to labeling each third class unlabeled sample sub-data set to obtain the corresponding third labeled sample data set, stopping when the preset ending condition is met, and obtaining the second labeled sample data set, wherein the second labeled sample data set comprises the third labeled sample data set corresponding to each third class unlabeled sample sub-data set.
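Continuing the previous sketch, one round of the first processing sub-module's iterative selection might be written as below; n_subsets, n_keep and n_label stand in for the preset first, second and third numbers, the end-condition check is left to the caller, and all names are assumptions:

```python
import numpy as np

def select_for_labelling(model, pool, n_subsets=10, n_keep=4, n_label=2):
    """One round of the iterative selection sketched above."""
    subsets = np.array_split(pool, n_subsets)      # preset first number
    # first similarity average score of each sub-data set
    avg = [model.predict_proba(s)[:, 1].mean() for s in subsets]
    order = np.argsort(avg)[::-1]                  # sort, highest first
    kept = [subsets[i] for i in order[:n_keep]]    # preset second number
    # a preset third number of samples from each kept sub-data set
    return [s[:n_label] for s in kept]             # send these for labelling

# e.g., reusing `model` and `first_category_pool` from the previous sketch:
batches = select_for_labelling(model, first_category_pool)
```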
As an alternative implementation manner of the embodiment of the present invention, the apparatus further includes: the sorting module is used for sorting the unlabeled sample sub-data sets of each second category based on the first similarity average score data set to obtain a sorting result; and the selection module is used for selecting the third class unlabeled sample sub-data set with the preset third number in each second class unlabeled sample sub-data set based on the sorting result.
As an optional implementation manner of the embodiment of the present invention, the adding and generating module further includes: the second comparison sub-module is used for comparing each first similarity score in the first similarity score set with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a second class unlabeled sample data set; the second processing sub-module is used for processing the second class unlabeled sample data set by utilizing the target machine learning model and a preset dividing method until the second labeled sample data set meeting the preset ending condition is obtained; a second determination sub-module for determining the training sample data set based on the second labeled sample data set.
As an alternative implementation manner of the embodiment of the present invention, the apparatus further includes: the neuron acquisition module is used for acquiring a first neuron and a second neuron corresponding to the preset neural network model; the division module is used for dividing the preset neural network model by utilizing preset division conditions based on the first neuron and the second neuron to obtain a first neural network model and a second neural network model.
As an optional implementation manner of the embodiment of the present invention, the training module includes: the second training sub-module is used for training the first neural network model by utilizing the training sample data set to obtain a first neural network training model; the input and calculation sub-module is used for inputting the training sample data set into the first neural network training model and calculating a first error value between the output of the first neural network training model and each training sample datum in the training sample data set; a second determining sub-module, configured to determine a first training sample sub-data set in the training sample data set based on each of the first error values and a preset error threshold; the third training sub-module is used for training the second neural network model by utilizing the first training sample sub-data set to obtain a second neural network training model; an acquisition sub-module for acquiring first model parameters of the first neural network training model and second model parameters of the second neural network training model; a third determining sub-module configured to determine a third model parameter of the preset neural network model based on the first model parameter and the second model parameter; and the fourth training sub-module is used for training the preset neural network model based on the training sample data set and the third model parameter, stopping training when the second error value between the output value of the trained preset neural network model and each training sample datum in the training sample data set is smaller than or equal to the preset error threshold value, so as to obtain the target neural network model.
The functional description of the water quality parameter inversion device provided by the embodiment of the invention is detailed in the description of the water quality parameter inversion method in the embodiment.
The embodiment of the present invention further provides a storage medium, as shown in fig. 5, on which a computer program 501 is stored which, when executed by a processor, implements the steps of the water quality parameter inversion method of the embodiment. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk (HDD), a Solid State Drive (SSD), or the like; the storage medium may also include a combination of the above types of memory.
Those skilled in the art will appreciate that all or part of the processes in the methods of the embodiments may be implemented by computer programs, which may be stored on a computer-readable storage medium and which, when executed, may include the processes of the embodiments of the respective methods. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk (HDD), a Solid State Drive (SSD), or the like; the storage medium may also include a combination of the above types of memory.
The embodiment of the present invention further provides an electronic device, as shown in fig. 6, which may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be connected by a bus or other means, and in fig. 6, the connection is exemplified by a bus.
The processor 61 may be a central processing unit (Central Processing Unit, CPU). The processor 61 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of the above types of chips.
The memory 62, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. By running the non-transitory software programs, instructions and modules stored in the memory 62, the processor 61 executes various functional applications and data processing, i.e., implements the water quality parameter inversion method in the method embodiment.
The memory 62 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor 61, and the like. In addition, the memory 62 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 62 may optionally include memory located remotely from the processor 61, which may be connected to the processor 61 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 62 and, when executed by the processor 61, perform the water quality parameter inversion method of the embodiments shown in figs. 1 to 3.
Specific details of the electronic device may be understood in view of the corresponding relevant descriptions and effects in the embodiments shown in fig. 1 to 3, which are not repeated here.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A water quality parameter inversion method, the method comprising:
acquiring a first unlabeled sample data set corresponding to a water surface monitoring point, and selecting a second unlabeled sample data set in the first unlabeled sample data set based on a preset selection condition, wherein the first unlabeled sample data set comprises a water quality parameter data set and a remote sensing data set;
adding a first characteristic data set into the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and generating a training sample data set based on the first labeled sample data set;
training a preset neural network model based on the training sample data set to obtain a target neural network model;
acquiring a third unlabeled sample data set, and inputting the third unlabeled sample data set into the target neural network model to obtain a second characteristic data set corresponding to the third unlabeled sample data set;
and processing by a preset regression analysis method based on the second characteristic data set to obtain a water quality parameter inversion result.
2. The method of claim 1, wherein the first unlabeled sample data set further includes a fourth unlabeled sample data set that does not meet the preset selection condition; generating a training sample data set based on the first labeled sample data set, comprising:
Dividing the first labeling sample data set based on the first characteristic data set to obtain a first class labeling sample data set and a second class labeling sample data set;
training a preset machine learning model based on the first class labeling sample data set and the second class labeling sample data set to obtain a target machine learning model;
inputting the fourth unlabeled sample data set into the target machine learning model to obtain a first similarity score set, wherein the first similarity score set characterizes the similarity between the fourth unlabeled sample data set and the first class labeled sample data set;
comparing each first similarity score in the first similarity score sets with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a first class unlabeled sample data set;
processing the first class unlabeled sample data set by using the target machine learning model and a preset dividing method until a second labeled sample data set meeting a preset ending condition is obtained;
the training sample data set is determined based on the second labeled sample data set.
3. The method of claim 2, wherein processing the first class unlabeled sample dataset using the target machine learning model and a preset partitioning method until a second labeled sample dataset is obtained that meets a preset end condition comprises:
dividing the first class unlabeled sample data set to obtain a preset first number of first class unlabeled sample sub-data sets;
based on each first-category unlabeled sample sub-data set, obtaining a first similarity average score data set through processing by the target machine learning model;
determining a preset second number of second-class unlabeled sample sub-data sets among the first-class unlabeled sample sub-data sets through a preset sorting selection method based on the first similarity average score data set;
determining a preset third number of third class unlabeled sample sub-data sets in each of the second class unlabeled sample sub-data sets;
labeling each of the third-class unlabeled sample sub-data sets to obtain a third labeled sample data set corresponding to each of the third-class unlabeled sample sub-data sets;
Judging whether the preset ending condition is met or not;
and when the preset end condition is not met, repeating the steps from dividing the first class unlabeled sample data set to obtain a preset first number of first class unlabeled sample sub-data sets, to labeling each third class unlabeled sample sub-data set to obtain the third labeled sample data set corresponding to each third class unlabeled sample sub-data set, stopping when the preset end condition is met, and obtaining the second labeled sample data set, wherein the second labeled sample data set comprises the third labeled sample data set corresponding to each third class unlabeled sample sub-data set.
4. A method according to claim 3, characterized in that the method further comprises:
sorting the unlabeled sample sub-data sets of each second category based on the first similarity average score data set to obtain a sorting result;
and selecting the preset third number of unlabeled sample sub-data sets of the third category in each unlabeled sample sub-data set of the second category based on the sorting result.
5. The method of claim 2, wherein after inputting the fourth unlabeled exemplar data set into the target machine learning model to obtain a first set of similarity scores, the method further comprises:
Comparing each first similarity score in the first similarity score sets with a preset value, and dividing the fourth unlabeled sample data set according to the comparison result to obtain a second class unlabeled sample data set;
processing the second class unlabeled sample data set by using the target machine learning model and a preset dividing method until the second labeled sample data set meeting the preset ending condition is obtained;
the training sample data set is determined based on the second labeled sample data set.
6. The method of claim 1, wherein prior to training a pre-set neural network model based on the training sample dataset to obtain a target neural network model, the method further comprises:
acquiring a first neuron and a second neuron corresponding to the preset neural network model;
and dividing the preset neural network model by utilizing preset dividing conditions based on the first neuron and the second neuron to obtain a first neural network model and a second neural network model.
7. The method of claim 6, wherein training a pre-set neural network model based on the training sample dataset to obtain a target neural network model comprises:
Training the first neural network model by using the training sample data set to obtain a first neural network training model;
inputting the training sample data set into the first neural network training model, and calculating a first error value between the output of the first neural network training model and each training sample datum in the training sample data set;
determining a first training sample sub-data set in the training sample data set based on each of the first error values and a preset error threshold;
training the second neural network model by using the first training sample sub-data set to obtain a second neural network training model;
acquiring first model parameters of the first neural network training model and second model parameters of the second neural network training model;
determining a third model parameter of the preset neural network model based on the first model parameter and the second model parameter;
training the preset neural network model based on the training sample data set and the third model parameter until the second error value between the output value of the trained preset neural network model and each training sample datum in the training sample data set is smaller than or equal to the preset error threshold value, to obtain the target neural network model.
8. A water quality parameter inversion apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a first unlabeled sample data set corresponding to the water surface monitoring point, and selecting a second unlabeled sample data set in the first unlabeled sample data set based on a preset selection condition, wherein the first unlabeled sample data set comprises a water quality parameter data set and a remote sensing data set;
the adding and generating module is used for adding a first characteristic data set into the second unlabeled sample data set based on a preset adding requirement to obtain a first labeled sample data set, and generating a training sample data set based on the first labeled sample data set;
the training module is used for training a preset neural network model based on the training sample data set to obtain a target neural network model;
the acquisition and input module is used for acquiring a third unlabeled sample data set, inputting the third unlabeled sample data set into the target neural network model, and obtaining a second characteristic data set corresponding to the third unlabeled sample data set;
and the processing module is used for processing the second characteristic data set by a preset regression analysis method to obtain a water quality parameter inversion result.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for causing the computer to execute the water quality parameter inversion method according to any one of claims 1 to 7.
10. An electronic device, comprising: a memory and a processor, said memory and said processor being communicatively connected to each other, said memory storing a computer program, said processor executing said computer program to perform the water quality parameter inversion method according to any one of claims 1 to 7.
CN202310615470.8A 2023-05-24 2023-05-24 Water quality parameter inversion method and device, storage medium and electronic equipment Pending CN116701931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310615470.8A CN116701931A (en) 2023-05-24 2023-05-24 Water quality parameter inversion method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310615470.8A CN116701931A (en) 2023-05-24 2023-05-24 Water quality parameter inversion method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116701931A true CN116701931A (en) 2023-09-05

Family

ID=87828526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310615470.8A Pending CN116701931A (en) 2023-05-24 2023-05-24 Water quality parameter inversion method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116701931A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023019908A1 (en) * 2021-08-19 2023-02-23 上海商汤智能科技有限公司 Method and apparatus for generating training sample set, and electronic device, storage medium and program
CN114154570A (en) * 2021-11-30 2022-03-08 深圳壹账通智能科技有限公司 Sample screening method and system and neural network model training method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHI Aiye; XU Lizhong; YANG Xianyi; HUANG Fengchen: "Neural network water quality inversion model based on knowledge and remote sensing images", Journal of Image and Graphics, no. 04, pages 521-528 *

Similar Documents

Publication Publication Date Title
KR102329152B1 (en) Automatic defect classification without sampling and feature selection
CN110046706B (en) Model generation method and device and server
CN111160469B (en) Active learning method of target detection system
CN111476302A (en) fast-RCNN target object detection method based on deep reinforcement learning
CN109117863B (en) Insulator sample expansion method and device based on deep convolution generation countermeasure network
CN109754089B (en) Model training system and method
CN111144561A (en) Neural network model determining method and device
CN112149721B (en) Target detection method for reducing labeling requirements based on active learning
EP3596655B1 (en) Method and apparatus for analysing an image
CN110705573A (en) Automatic modeling method and device of target detection model
CN113239975B (en) Target detection method and device based on neural network
CN111242176B (en) Method and device for processing computer vision task and electronic system
CN115456107A (en) Time series abnormity detection system and method
CN109978058B (en) Method, device, terminal and storage medium for determining image classification
JP7150918B2 (en) Automatic selection of algorithm modules for specimen inspection
CN117290719A (en) Inspection management method and device based on data analysis and storage medium
CN116701931A (en) Water quality parameter inversion method and device, storage medium and electronic equipment
CN114580517A (en) Method and device for determining image recognition model
CN110705695B (en) Method, device, equipment and storage medium for searching model structure
CN114722942A (en) Equipment fault diagnosis method and device, electronic equipment and storage medium
CN113139332A (en) Automatic model construction method, device and equipment
Wibowo Performance Analysis of Deep Learning Models for Sweet Potato Image Recognition
CN113793604B (en) Speech recognition system optimization method and device
CN116188998B (en) Method, device, equipment and storage medium for identifying defects of overhead transmission line
EP4300372A1 (en) Method, device and computer program for training an artificial neural network for object recognition in images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination