CN114943290B - Biological intrusion recognition method based on multi-source data fusion analysis - Google Patents


Info

Publication number
CN114943290B
Authority
CN
China
Prior art keywords
data
picture
text
biological
time
Prior art date
Legal status
Active
Application number
CN202210575412.2A
Other languages
Chinese (zh)
Other versions
CN114943290A (en)
Inventor
陈碧云
Current Assignee
Yancheng Teachers University
Original Assignee
Yancheng Teachers University
Priority date
Filing date
Publication date
Application filed by Yancheng Teachers University
Priority to CN202210575412.2A
Publication of CN114943290A
Priority to NL2034214A
Priority to NL2034409A
Application granted
Publication of CN114943290B


Classifications

    • G06F18/253 — Fusion techniques of extracted features
    • G06F16/29 — Geographical information databases
    • G06F16/353 — Clustering; classification of unstructured textual data into predefined classes
    • G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Classification techniques
    • G06F18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/2415 — Classification based on parametric or probabilistic models
    • G06F18/243 — Classification techniques relating to the number of classes
    • G06F18/24323 — Tree-organised classifiers
    • G06N20/10 — Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/09 — Supervised learning
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/454 — Filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/764 — Image or video recognition using classification
    • G06V10/82 — Image or video recognition using neural networks
    • Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change


Abstract

The invention discloses a biological intrusion recognition method based on multi-source data fusion analysis, comprising the following steps: acquire a multi-source data set containing invasive-species records (text data, picture data, time data, and geographic location data) and label the invasive-species records; classify the text data and output a labelled text probability matrix; locate the invading organism in each picture, determine its boundary and size, and train a labelled picture probability matrix; perform one-hot encoding on the time data and construct a spatio-temporal feature matrix from the encoded data and the geographic location data; build a multi-feature vector from the text probability matrix, the picture probability matrix, and the spatio-temporal feature matrix; assign weights to the multi-feature vector and train a binary classifier with a machine learning algorithm; finally, input the data to be predicted into the binary classifier to identify invasive-species records.

Description

Biological intrusion recognition method based on multi-source data fusion analysis
Technical Field
The invention relates to the technical field of big data and artificial intelligence, and in particular to a biological intrusion recognition method based on multi-source data fusion analysis.
Background
With accelerating globalization and changing land-use patterns, biological invasion has become a worldwide ecological security problem. Studies estimate that the total global cost of invasions from 1970 to 2017 reached at least US$1.288 trillion, averaging US$26.8 billion per year, with no sign that the growth rate is slowing. Research on biological invasion is still at an early stage; in recent years, against the backdrop of global ecological change, it has developed into a new field combining global-change science with sustainable ecological management. Current means of preventing and controlling biological invasion mainly include: establishing monitoring systems to determine the species, numbers, distribution, and roles of alien species; strengthening public education on the harm of biological invasion to raise societal awareness; and actively developing technology for identifying and intercepting alien invasive species to curb their spread. In summary, accurate identification of alien species is critical.
Artificial intelligence is becoming a new engine in the field of ecological resources. Research on species identification with AI started early and has surpassed traditional classifiers in identifying plants, animals, and specimens, and deep learning is now widely applied to species image recognition. In plant classification (Lee et al., 2015, 2016), Mohanty et al. (2016) achieved image-based classification of 38 plant diseases with deep learning. Carranza-Rojas et al. (2017) classified thousands of species from specimen pictures using convolutional networks and transfer learning. Beyond classifying individual images with a CNN, Taghavi et al. (2018) used an LSTM to classify phenotype and genotype from CNN features extracted from time-series images. Norouzzadeh et al. automatically identified animal species and counted individuals from camera-trap image data, enabling population monitoring, but recognition accuracy remains low against complex environmental backgrounds. To address the low accuracy of monitoring-image recognition caused by complex field backgrounds, the sounds emitted by animals have also been used as an important data source.
As observation technology advances, species monitoring systems continue to improve, and the capacity to acquire long-term, cross-scale, massive heterogeneous multi-source data has grown markedly. Research by G. P. Asner et al. published in Science performed functional classification of plants across the entire Peruvian forest by integrating massive, high-precision hyperspectral and lidar data, providing region-specific forest management and protection strategies and overcoming the previous inability to accurately monitor structurally complex, highly biodiverse plant communities. Notably, the fusion of multi-source data raises the question of whether data structures, precisions, and other properties match. The monitoring information obtained includes many different types of data; how to use such multi-feature data for rapid identification and intelligent diagnosis of alien species, and for risk analysis and forecasting based on it, is a highly valuable research problem.
Disclosure of Invention
Research in this area has rarely been reported so far. Against this background, a biological intrusion recognition method based on multi-source data fusion analysis is proposed. First, a deep learning method produces probability pre-judgments on the data; next, data weights are assigned with the entropy weight method; finally, an SVM makes the comprehensive judgment on the multi-feature data. Taking the Asian giant hornet invasion of Washington State as an example, the practicality of the algorithm is analyzed and verified. The results show that the algorithm can be applied to rapid identification and monitoring of species and can also predict how a species' distribution develops over time, providing a basis for formulating reasonable and efficient protection and management measures.
The invention provides a biological intrusion recognition method based on multi-source data fusion analysis, which comprises the following steps:
acquiring a multi-source data set containing the invasive biological data and marking the invasive biological data; the dataset comprises: text data, picture data, time data, geographic location data.
And classifying the text data and outputting a text probability matrix with marks.
And identifying the position of the invading organism in the picture according to the picture data, determining the boundary and the size, and training a picture probability matrix with marks.
Performing one-hot encoding on the time data, and constructing a spatio-temporal feature matrix from the encoded data and the geographic location data.
Constructing a multi-feature vector according to the text probability matrix, the picture probability matrix and the time-space feature matrix; and carrying out weight distribution on the multi-feature vectors, and training a binary classifier by using a machine learning algorithm.
Inputting the data to be predicted into a binary classifier to obtain the invasive biological data.
Further, classifying the text data specifically comprises: removing stop words from the text data; constructing N-gram features with Fast-Text by sliding a window of size N over the text content in byte order, forming byte-fragment sequences of length N; taking the generated sequences as the candidate set of text features and screening out the important ones; and outputting a labelled text probability matrix through Soft-Max.
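The sliding-window step above can be sketched in a few lines. The '<' prefix and '>' suffix markers follow the FastText subword convention; the function name and example word are illustrative only:

```python
def char_ngrams(word, n=3):
    """Slide a window of size n over a word wrapped in '<' (prefix)
    and '>' (suffix) markers, producing FastText-style subword
    n-gram features."""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

# Example: the trigrams of "wasp"
print(char_ngrams("wasp"))  # ['<wa', 'was', 'asp', 'sp>']
```

In the method described here, the union of such fragments over a record would form the candidate feature set from which the important features are screened.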
Further, training the labelled picture probability matrix specifically comprises: locating the invasive organism to be identified in the picture with an image recognition algorithm such as a CNN, enlarging that region, determining its boundary and the picture size, and training a labelled picture probability matrix with the CNN.
Further, assigning weights to the multi-feature vector and training a binary classifier with a machine learning algorithm specifically comprises: normalizing the multi-feature vector, assigning weights with the entropy weight method, and training a binary classifier with the machine learning algorithm SVM.
Further, inputting the data to be predicted into the binary classifier to obtain invasive-species records specifically comprises: feeding in the data to be predicted and letting the SVM produce the final label; when the output label is 1, the data uploaded by the user at that location is judged genuine, indicating that an invasive species has appeared there and should be dealt with promptly.
Further, the biological intrusion recognition method based on multi-source data fusion analysis also comprises: applying a GM model to the spatio-temporal feature matrix to predict the migration or reproduction patterns of future invading organisms.
Further, the binary classifier may alternatively be implemented with a random forest, logistic regression, or a neural network.
Compared with the prior art, the biological intrusion recognition method based on multi-source data fusion analysis has the following beneficial effects:
the biological intrusion recognition method based on multi-source data fusion analysis fuses text data, picture data, time data and geographic position data, is applied to rapid recognition and monitoring of species, can also pre-judge the change development trend of the species along with time, and provides a basis for formulating corresponding reasonable and efficient protection and management measures.
Drawings
FIG. 1 is a Fast-Text flow diagram;
FIG. 2 is a detailed flow chart from the full connection layer to the output layer;
FIG. 3 is a graph of probability distribution of 11-100 training sets;
FIG. 4 is a statistical plot of test set recall;
FIG. 5 is a chart of test set accuracy statistics;
FIG. 6 is a random predictive graph of a training model;
FIG. 7 is a plot of variability index for a training model;
FIG. 8 is a schematic representation of the actual geographic location of a training model;
FIG. 9 is a thermal prediction map;
fig. 10 is a flowchart of a biological intrusion recognition method based on multi-source data fusion analysis.
Detailed Description
The following describes the embodiments of the present invention further with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
The invention provides a biological intrusion recognition method based on multi-source data fusion analysis, which is shown in an overall flow chart in fig. 10, and comprises the following steps:
acquiring a multi-source data set containing the invasive biological data and marking the invasive biological data; the data set includes: text data, picture data, time data, geographic location data.
The data set consists of text, pictures, time, and geographic positions, which broadens its data coverage; dividing it into a training set and a test set enables comparison experiments that verify the effectiveness of the algorithm.
Classify the text data and output a labelled text probability matrix. Specifically: remove stop words from the text data; construct N-gram features with Fast-Text by sliding a window of size N over the text content in byte order, forming byte-fragment sequences of length N; take the generated sequences as the candidate set, screen out the important features, and output a labelled text probability matrix with Soft-Max.
Comment data about invading organisms provided by the public helps laboratory judgment, so the text data strongly influences the decision of whether an invading organism is present. To preserve the morphological features within each word, feature extraction is performed on each record's word vectors. A sliding window of size N is run over the text content in byte order, forming byte fragments of length N, where '<' marks a prefix and '>' marks a suffix. A word can then be represented by the trigrams enclosed between '<' and '>', and stacking these vectors represents the word vector more faithfully. An Embedding layer converts the discrete variables into continuous vectors, giving the record's word vector $W_j = [w_{1j}, w_{2j}, \ldots, w_{ij}, \ldots, w_{nj}]$, where $W_j$ denotes the word vector of the j-th record. As shown in FIG. 1, the Fast-Text flow chart, the hidden layer takes the Embedding-processed word vectors as input features and averages them. The negative log-likelihood is used as the loss function: $-\frac{1}{N}\sum_{n=1}^{N} y_n \log\!\bigl(f(B A x_n)\bigr)$, where $N$ is the number of texts, $x_n$ the word features of the n-th text, $y_n$ its label, $A$ and $B$ weight matrices ($A$ maps words to the text representation, $B$ linearly transforms it into class scores), and $f$ the Soft-Max function computing the probability of the final class. The text data is classified with a hierarchical structure based on a Huffman tree, and the probability of each category is output.
The hierarchical Soft-Max expression is $p(\omega_{ct}) = \prod_{j=1}^{L(\omega)-1} \sigma\!\left(\pm\, v_{n(\omega,j)}^{\top} W\right)$, the product of sigmoid branch decisions along the Huffman-tree path, where $p(\omega_{ct})$ is the final probability of the text and $W$ represents the word vector.
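The flat Soft-Max used at the output layer can be sketched directly; the hierarchical Huffman-tree variant described above only changes how the probability is computed, not what it represents. A minimal, numerically stable sketch:

```python
import math

def softmax(scores):
    """Numerically stable softmax: shift by the max score, exponentiate,
    and normalize so the class probabilities sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Each record's class scores become one row of the text probability matrix.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
```

Applying this to every record's score vector yields exactly the n × k probability matrix of Table 1.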
Finally, a probability matrix over the n records is obtained. As shown in Table 1, $T_i$ denotes the i-th of n records, $L_j$ the j-th of k categories, and $p_{ij}$ the probability of category j in the i-th record.
Table 1: text data probability matrix
Identify the position of the invading organism in the picture from the picture data, determine its boundary and size, and train a labelled picture probability matrix. Specifically: locate the invasive organism to be identified with the CNN image recognition algorithm, enlarge that region, determine the boundary and picture size, and train the labelled picture probability matrix with the CNN.
The image data uploaded by the public strongly influences laboratory judgment, so image feature extraction is essential. First the picture data is preprocessed: files that are not pictures are deleted and suffix names are corrected. A CNN consists mainly of convolutional layers (CONV), pooling layers (POOL), and fully connected layers (FC); it performs image recognition by extracting features from local to global through successive filters. Picture processing with the CNN proceeds through five stages: the input layer, convolution layers, downsampling layers, the fully connected layer, and the output layer. The input layer takes an RGB color image; the RGB components are convolved with the convolution-layer weights W to produce the C layers, which are then downsampled to produce the S layers. After the activation function, the outputs of these layers are called Feature-Maps. The fully connected layer flattens every element of all Feature-Maps into a single row, and classification is performed at the output layer with Soft-Max.
FIG. 2 shows the specific flow from the fully connected layer to the output layer. Asian invading organisms serve as the input-layer data. Since the data set consists of RGB color images, three separate 2D kernels scale and grayscale the pictures, quickly converting the 3-channel RGB images into 1-channel grayscale. After repeated convolution, pooling, and activation, features are extracted and passed through the fully connected layer, and a Soft-Max function outputs the probability that the picture shows an Asian invading organism.
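The RGB-to-grayscale reduction can be sketched as a per-pixel weighted sum. The patent only states that 3-channel RGB is reduced to 1-channel gray; the BT.601 luminance weights used below (0.299 R + 0.587 G + 0.114 B) are a common conventional choice, assumed here for illustration:

```python
def rgb_to_gray(image):
    """Convert an H x W x 3 nested-list RGB image to a 1-channel
    grayscale image using the common BT.601 luminance weights
    (an assumed choice; the source only specifies 3->1 channel)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b
             for (r, g, b) in row]
            for row in image]

# White and black pixels map to the two ends of the gray scale.
gray = rgb_to_gray([[(255, 255, 255), (0, 0, 0)]])
print(gray)
```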
Here X (height × width × channels) is the input pixel matrix and Y the output matrix. After convolution and pooling, the multi-dimensional data is flattened and connected to the fully connected layer, whose output is the class probability computed by the conventional Soft-Max. A $T \times L$ matrix is obtained in which each value represents a class probability for an input sample. This yields the classification probability of the image file, denoted by C.
Finally, the probability matrix of the picture data is obtained, as shown in Table 2, where $q_{ij}$ represents the probability of category j in the i-th record.
Table 2: picture data probability matrix
Perform one-hot encoding on the time data and construct a spatio-temporal feature matrix from the encoded data and the geographic location data. Construct a multi-feature vector from the text probability matrix, the picture probability matrix, and the spatio-temporal feature matrix; assign weights to the multi-feature vector and train a binary classifier with a machine learning algorithm. Specifically: standardize the multi-feature vector, assign weight coefficients with the entropy weight method, and train a binary classifier with the machine learning algorithm SVM. The binary classifier may alternatively be a random forest, logistic regression, or neural network.
Input the data to be predicted into the binary classifier to obtain the invasive-species records. Specifically: feed in the data to be predicted and let the SVM produce the final label; when the output label is 1, the data uploaded by the user at that location is judged genuine, indicating that an invasive species has appeared there and should be dealt with promptly.
The biological intrusion recognition method based on multi-source data fusion analysis further comprises: applying a GM model to the spatio-temporal feature matrix to predict the migration or reproduction patterns of future invading organisms.
The data set comprises witness reports aggregated by the Washington State Department of Agriculture through December 2020: a spreadsheet of 4,440 witness reports and 3,305 images uploaded by users. Reports already adjudicated by the laboratory are labelled: records confirmed as invasive organisms are marked 1, otherwise 0. 70% of the data is randomly assigned to the training set and the remainder to the test set.
Each witness report provided by the public is independent of the others, and its information and feature values are not continuous but discrete and unordered. Such features can be digitized with one-hot encoding, also known as one-bit-effective encoding: N states are encoded with an N-bit state register, each state having its own register bit, and only one bit is valid at any time.
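The one-hot scheme just described can be sketched as follows (encoding months of the year is an illustrative choice for the time feature, not something the source specifies):

```python
def one_hot(value, categories):
    """Encode a discrete value as an N-bit vector with exactly one
    bit set, matching the one-hot register scheme described above."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical example: encode a sighting month over the 12 months.
months = list(range(1, 13))
print(one_hot(3, months))
```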
Since the invading organism is an alien species, occurrence events may be few; the GM model has good applicability to small amounts of incomplete information, so it is used to estimate the variation range of the invasion's time and geographic location. The GM(1,1) model can be expressed as $Y = B\hat{u}$, and the prediction sample range represents the region division. To guarantee the feasibility of GM(1,1) modelling, the known data must first be verified. Let the original data sequence be $x^{(0)} = (x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n))$, where $x^{(0)}$ is the raw time series, and compute its level ratios $\lambda(k) = x^{(0)}(k-1)/x^{(0)}(k)$. If all level ratios fall within the admissible band $\left(e^{-2/(n+1)},\; e^{2/(n+1)}\right)$, the sequence can be fitted with a GM(1,1) model and grey prediction can be made. Otherwise the data is transformed appropriately first, e.g. by translation: $y^{(0)}(k) = x^{(0)}(k) + c,\; k = 1, 2, \ldots, n$.
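The feasibility check can be sketched directly. The admissible band $(e^{-2/(n+1)}, e^{2/(n+1)})$ used below is the textbook quasi-smoothness condition for GM(1,1), assumed here because the original coverage expression was garbled:

```python
import math

def level_ratio_check(x):
    """GM(1,1) quasi-smoothness test: every level ratio
    lambda(k) = x[k-1] / x[k] must fall inside the band
    (exp(-2/(n+1)), exp(2/(n+1))); otherwise a translation
    y(k) = x(k) + c should be applied before modelling."""
    n = len(x)
    lo, hi = math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))
    ratios = [x[k - 1] / x[k] for k in range(1, n)]
    return all(lo < r < hi for r in ratios)

print(level_ratio_check([1.0, 1.1, 1.2, 1.3]))  # a nearly flat series passes
```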
The model yields the regional range of event occurrence across the different records, from which trend ranges are planned, predicted, and collated. For the range predicted by GM, membership is first judged: 1 if the record falls within the range, 0 otherwise, so L (Location) ∈ {0, 1} represents the record's geographic feature value. Because event occurrence times are uncertain, the time feature T is one-hot encoded, which effectively distinguishes different discrete time periods.
First, data features are extracted and standardized according to four different feature types (text, picture, time, and position). Text and picture data provided by the public have high realism, but are relatively broad in time and geographic range and cannot by themselves represent a specific meaning, so the overall values need to be weighted. The assigned weights sum to 1.
For the extracted text-data probability feature F and picture-data probability feature C, missing values default to 1/k: since the events are mutually independent, the probability of each of k events occurring at the same moment is 1/k, and the k probabilities sum to 1. Missing time and geographic-location values are completed with the average of the neighbouring (preceding and following) records.
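The completion rule above (uniform 1/k defaults for probability features, neighbour averaging for time and position) might look like this; the record fields are hypothetical and boundary records are not handled:

```python
def impute(records, k):
    """Fill missing probability features with the uniform default 1/k and
    missing numeric fields (here latitude) with the mean of the neighbouring
    records, following the completion rule described in the text.
    Field names are illustrative; first/last-record gaps are not handled."""
    default = 1.0 / k
    out = []
    for i, r in enumerate(records):
        r = dict(r)
        if r.get("prob") is None:
            r["prob"] = default
        if r.get("lat") is None:
            r["lat"] = (records[i - 1]["lat"] + records[i + 1]["lat"]) / 2.0
        out.append(r)
    return out

rows = [{"prob": 0.9, "lat": 48.9},
        {"prob": None, "lat": None},
        {"prob": 0.8, "lat": 49.1}]
filled = impute(rows, k=4)
```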
For multi-feature events, entropy values are calculated to judge the randomness and disorder of the events and the degree of dispersion of each index; the larger an index's dispersion, the larger its influence on the comprehensive evaluation.
First, the feature vector X = {X_11, …, X_Nj} is Z-score normalized across the matrix: X'_ij = (X_ij − μ)/σ, where μ and σ are the mean and standard deviation of all values in the feature vector X.
The entropy weight of each index is calculated from the information entropy H_j = −k Σ_i p_ij ln p_ij, where H_j denotes the information entropy of the j-th index and p_ij the proportion of sample i under index j.
To ensure 0 ≤ H_j ≤ 1, k = 1/ln(m) is usually taken; the deviation degree of each index is then calculated as d_j = 1 − H_j, and the entropy weight as w_j = d_j / Σ_j d_j.
Multiplying the normalized eigenvector matrix by each index weight w_j gives the weighted multi-feature evaluation matrix V.
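A sketch of the entropy-weight computation; note it feeds column proportions p_ij into the logarithm, a common variant of the method since Z-scores can be negative, and the sample matrix is invented:

```python
import numpy as np

def entropy_weights(X):
    """Entropy-weight method. X is an (m samples x n indices) non-negative
    matrix; returns weights w_j that sum to 1, with larger weights for
    more dispersed (more informative) indices."""
    m, n = X.shape
    P = X / X.sum(axis=0)                 # p_ij, column-wise proportions
    k = 1.0 / np.log(m)                   # ensures 0 <= H_j <= 1
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(P > 0, np.log(P), 0.0)
    H = -k * (P * logs).sum(axis=0)       # information entropy H_j
    d = 1.0 - H                           # deviation degree d_j
    return d / d.sum()                    # entropy weights w_j

X = np.array([[1.0, 9.0], [2.0, 1.0], [3.0, 9.0], [4.0, 1.0]])
w = entropy_weights(X)
V = (X / np.abs(X).max(axis=0)) * w       # weighted evaluation matrix V
```

The second column is far more dispersed than the first, so it receives the larger weight.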
Because the first round of predictive classification has already produced probability statistics for the data, no overly complex algorithm is needed for the final multi-feature fusion classification even though many feature vectors are provided; a conventional SVM classification model is sufficient.
In the SVM classification problem, the input data X and the learning objective Y are given:
X = {X_1, …, X_N},
Y = {Y_1, …, Y_N}.
Here the input data is X = (F, C, L, T).
A brief explanation for F, C, L, T: since each sample of the input data contains multiple features, the samples constitute a feature space: X = [X_1, …, X_N] ∈ 𝒳,
while the learning targets are binary variables representing the negative class and the positive class. If a hyperplane exists in the feature space containing the input data, it acts as the decision boundary w^T X + b = 0, separating the learning targets into positive and negative classes and making the distance of every sample from the plane satisfy y_i(w^T X_i + b) ≥ 1, where the parameters w and b are the normal vector and the intercept of the hyperplane, respectively.
A decision boundary meeting this condition in fact constructs two parallel hyperplanes as interval boundaries to discriminate the class of each sample:
positive samples: w^T X_i + b ≥ +1,
negative samples: w^T X_i + b ≤ −1.
All samples on or above the upper interval boundary belong to the positive class, and all samples on or below the lower interval boundary belong to the negative class. The distance between the two interval boundaries, d = 2/‖w‖, is defined as the margin.
The positive and negative samples lying exactly on the interval boundaries are the support vectors.
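A minimal illustration of these margin ideas, using sub-gradient descent on the hinge loss as a stand-in for a full SVM solver; the toy data and hyper-parameters are invented:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=500, lr=0.1):
    """Primal linear SVM via sub-gradient descent on the hinge loss.
    A minimal stand-in for the conventional SVM solver; returns the
    hyperplane normal vector w and intercept b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:        # margin constraint violated
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                 # only regularization shrink
                w -= lr * lam * w
    return w, b

# Toy linearly separable data: positives in the upper-right quadrant.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
margin = 2.0 / np.linalg.norm(w)  # distance between the two interval boundaries
```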
The data is text-classified using the Fast-Text tool. A data-equalization operation on the training set yields 100 training-set probability distributions, and the probabilities converge to a peak over the course of training.
As shown in FIG. 3, the probabilities tend toward 0.9 or 0.1 from roughly the 10th iteration; with uneven samples, Fast-Text helps perform sample equalization and evaluates probability events well.
Training is carried out on shuffled sample data; as shown in FIG. 4 and FIG. 5, both the recall and the accuracy of the trained model on the Asian invasive-organism problem are 94.6%.
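The character-level N-gram features that Fast-Text-style models build by sliding a window of size N over the text (as the claims describe) can be sketched as:

```python
def char_ngrams(text, n=3):
    """Slide a window of size n over the text in byte order, producing the
    byte-fragment sequence used as the text-feature candidate set."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

feats = char_ngrams("hornet", 3)
```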
In practice, the CNN model is built with the PyTorch framework, using 70% of the data set for training and 30% for testing. Because the data samples are uneven, simple oversampling is applied. The final model performs well on the test set: several pictures were selected at random and predicted with the trained model, as shown in FIG. 6 (the true value is the actual class, the prediction is the model's output; negative indicates the picture shows an Asian invasive organism, positive indicates it does not).
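A minimal PyTorch sketch in the spirit of the patent's picture classifier; the layer sizes, input resolution, and class count are assumptions, since the patent does not publish the architecture:

```python
import torch
import torch.nn as nn

class InvasionCNN(nn.Module):
    """Illustrative two-block CNN for binary picture classification.
    Architecture details are assumed, not taken from the patent."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # After two 2x pools, a 64x64 input becomes 32 channels of 16x16.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = InvasionCNN()
logits = model(torch.randn(4, 3, 64, 64))  # batch of 4 RGB 64x64 pictures
```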
Finally, the metrics of the training model were evaluated as shown in table 3 and fig. 7.
Table 3: training metrics for models
Because invasion occurrence events are few, the GM model, with its good applicability to small amounts of incomplete information, is used to calculate the variation range of the occurrence time and geographic location of invasive organisms.
A constant c is chosen so that the level ratios of the data column fall within the acceptable coverage; after calculation, the level-ratio test values of both data columns lie in the standard interval [0.857, 1.166], meaning the data is suitable for GM(1,1) model construction. After the data passes the check, the development coefficient a, the grey action quantity b, and the posterior error ratio C of the GM model are calculated, as shown in Table 4:
table 4: results of model construction
The posterior error ratio C of both models is less than 0.65, and the longitude model's value of only 0.0468 is below 0.35, indicating a particularly good longitude model. After predicting longitude and latitude, the residuals are checked, including the relative errors and the level-ratio deviations: for both longitude and latitude, the maximum relative error of the two data groups is less than 0.1, and the maximum level-ratio deviation is likewise less than 0.1, so the model fit meets the higher requirement. The relevant range is then drawn from the geographic locations, as shown in FIG. 8.
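The GM(1,1) fit that produces a, b, and the posterior error ratio C can be sketched as follows; the sample series is a textbook-style example, not the patent's data:

```python
import numpy as np

def gm11(x0):
    """Fit a GM(1,1) model by least squares: returns the development
    coefficient a, the grey action quantity b, and the posterior error
    ratio C (residual std over data std) used to grade the fit."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                          # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])               # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    # Time-response function, then restore x0_hat by differencing x1_hat.
    k = np.arange(len(x0))
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.concatenate([[x0[0]], np.diff(x1_hat)])
    resid = x0 - x0_hat
    C = resid.std() / x0.std()                  # posterior error ratio
    return a, b, C

a, b, C = gm11([2.874, 3.278, 3.337, 3.390, 3.679])
```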
After predicting longitude and latitude, the distance between two points is calculated from the difference between their longitudes and latitudes, and the goodness of fit of the two models is computed: latitude R² = 71.45%, longitude R² = 95.31%. As can readily be seen from FIG. 8, the samples verified as true Asian invasive organisms fall in the latitude range [48.7775, 49.1494] and the longitude range [−123.9431, −122.4186].
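One common way to turn a latitude/longitude difference into a distance is the haversine formula; the patent does not state which formula it uses, so this choice is an assumption. The sample points are the corners of the verified range above:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Diagonal of the verified Washington-area range from the text.
d = haversine_km(48.7775, -123.9431, 49.1494, -122.4186)
```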
For the extracted text-data probability feature F and picture-data probability feature C, missing values default to 1/k, since mutually independent events each occur at a given moment with probability 1/k. For the range predicted by GM, each record is judged against the range: L (Location) = 1 if within the range and 0 otherwise. Because the time characteristic T of an event is uncertain, one-hot encoding is used to distinguish the discrete time periods effectively.
The relevant weights among the features must then be determined; the weight of each feature is calculated with the entropy weight method, as shown in Table 5:
table 5: different characteristic weight distribution table
Longitude 0.048440
Latitude 0.026739
Text 0.290734
Image 0.258996
LabText 0.157505
Year 0.115133
Month 0.102453
For these feature values, because the decision boundary in the feature space is a hyperplane, the linearly inseparable problem is handled by mapping it with a nonlinear function from the original feature space into a higher-dimensional Hilbert space. A linear regression calculation is first performed, yielding Table 6:
table 6: linear regression calculation table
MAE 0.08905792097395834
MSE 0.4136931156194732
R² -0.3932832269742397
Since R² < 0, the data may not have any linear relationship. For such a hyperplane multi-feature problem, an SVM whose kernel function is a radial basis function (RBF) kernel has good convergence. SVM multi-feature fusion analysis is therefore performed; its results are more accurate than the single Fast-Text and CNN neural-network models, and more universal than the common approach of predicting from a single data source.
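The radial basis function kernel that performs the implicit mapping into the higher-dimensional Hilbert space is simply:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), which measures
    similarity after an implicit mapping into an infinite-dimensional
    Hilbert space; gamma controls how quickly similarity decays."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0])  # identical points -> 1.0
far = rbf_kernel([1.0, 2.0], [5.0, 6.0])   # distant points decay toward 0
```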
Training was performed on 300 groups of data and classification evaluation on 521 groups, yielding Table 7:
table 7: classification evaluation table
The diagonal contains the correct predictions: among the 521 records, an invasive organism is misjudged only once, so the records of invasive organisms and their occurrence ranges can be determined accurately. The witness data not yet experimentally judged is then predicted, and a thermodynamic diagram is drawn from the predicted data and time, as shown in FIG. 9: parts of Washington may still carry traces of Asian invasive organisms in the second half of the year, and this hidden danger cannot be eliminated in the short term.
The multi-feature data fusion analysis algorithm is evaluated comprehensively; as can be seen from Table 8, the algorithm combines multi-feature data sources well and judges different events reasonably.
Table 8: comprehensive evaluation of multi-feature data fusion analysis algorithm
MSE 0.0007262164124909223
MAE 0.0007262164124909223
R² 0.9970754396397927
ACC 0.9992737835875091
Recall 0.9992088607594937
F2 0.9992737396242461
ROC 0.9992088607594937
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.

Claims (6)

1. A biological intrusion recognition method based on multi-source data fusion analysis, characterized by comprising the following steps:
acquiring a multi-source data set containing the invasive biological data and marking the invasive biological data; the dataset comprises: text data, picture data, time data, geographic location data;
removing stop words from the text data, constructing N-gram features using Fast-Text by performing a sliding-window operation of size N over the text content in byte order to form byte-fragment sequences of length N, taking the generated sequences as the text-feature candidate set, screening out the important features, and outputting a marked text probability matrix using Soft-Max;
identifying the position of an invading organism in the picture according to the picture data, determining the boundary and the size, and training a picture probability matrix with marks;
performing one-hot encoding on the time data, and constructing a time-space feature matrix from the encoded data and the geographic location data;
constructing a multi-feature vector according to the text probability matrix, the picture probability matrix and the time-space feature matrix; performing weight distribution on the multi-feature vectors, and training a binary classifier by using a machine learning algorithm;
inputting the data to be predicted into a binary classifier to obtain the invasive biological data.
2. The method for identifying biological intrusion based on multi-source data fusion analysis according to claim 1, wherein the method comprises the steps of:
the training of the marked picture probability matrix specifically comprises the following steps:
and determining the position of the invasive organism to be identified according to the picture data through a picture identification algorithm CNN, amplifying the position, determining the boundary and the picture size, and training a picture probability matrix with marks by utilizing the CNN.
3. The method for identifying biological intrusion based on multi-source data fusion analysis according to claim 1, wherein the method comprises the steps of:
the multi-feature vector is subjected to weight distribution, and a binary classifier is trained by using a machine learning algorithm, and the method specifically comprises the following steps:
and normalizing the multi-feature vectors, performing weight distribution by using an entropy weight method, and training into a binary classifier by using a machine learning algorithm SVM.
4. The method for identifying biological intrusion based on multi-source data fusion analysis according to claim 1, wherein the method comprises the steps of:
inputting data to be predicted into a binary classifier to obtain invasive biological data, wherein the method specifically comprises the following steps of:
and inputting data to be predicted, using the SVM as a final mark, and when the output mark is 1, representing that the data uploaded by a user at the time and place represented by the output mark is true, representing that invasive species appear at the time, and timely processing the data.
5. The method for biological intrusion identification based on multi-source data fusion analysis of claim 1, further comprising:
and predicting migration or reproduction rules of future invading organisms by using a GM model for the time-space characteristic matrix.
6. The method for identifying biological intrusion based on multi-source data fusion analysis according to claim 1, wherein the method comprises the steps of:
the classifier in the binary classifier comprises: random forest, logistic regression, neural network.
CN202210575412.2A 2022-05-25 2022-05-25 Biological intrusion recognition method based on multi-source data fusion analysis Active CN114943290B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210575412.2A CN114943290B (en) 2022-05-25 2022-05-25 Biological intrusion recognition method based on multi-source data fusion analysis
NL2034214A NL2034214A (en) 2022-05-25 2023-02-23 A biological invasion identification method based on multi-source data fusion analysis
NL2034409A NL2034409A (en) 2022-05-25 2023-03-23 A biological invasion identification method based on multi-source data fusion analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210575412.2A CN114943290B (en) 2022-05-25 2022-05-25 Biological intrusion recognition method based on multi-source data fusion analysis

Publications (2)

Publication Number Publication Date
CN114943290A CN114943290A (en) 2022-08-26
CN114943290B true CN114943290B (en) 2023-08-08

Family

ID=82908603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210575412.2A Active CN114943290B (en) 2022-05-25 2022-05-25 Biological intrusion recognition method based on multi-source data fusion analysis

Country Status (2)

Country Link
CN (1) CN114943290B (en)
NL (2) NL2034214A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117109664B * 2023-10-20 2023-12-22 South China Institute of Environmental Science, Ministry of Ecology and Environment (Ecological Environment Emergency Research Institute, MEE) Wetland ecological environment monitoring device and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110110895A (en) * 2010-04-02 2011-10-10 Jeju National University Industry-Academic Cooperation Foundation System for fusing realtime image and context data by using position and time information
CN107832718A (en) * 2017-11-13 2018-03-23 重庆工商大学 Finger vena anti false authentication method and system based on self-encoding encoder
CN109165387A (en) * 2018-09-20 2019-01-08 南京信息工程大学 A kind of Chinese comment sentiment analysis method based on GRU neural network
CN109347863A (en) * 2018-11-21 2019-02-15 成都城电电力工程设计有限公司 A kind of improved immune Network anomalous behaviors detection method
CN109934354A (en) * 2019-03-12 2019-06-25 北京信息科技大学 Abnormal deviation data examination method based on Active Learning
CN111046946A (en) * 2019-12-10 2020-04-21 昆明理工大学 Burma language image text recognition method based on CRNN
CN112990262A (en) * 2021-02-08 2021-06-18 内蒙古大学 Integrated solution system for monitoring and intelligent decision of grassland ecological data
CN113343770A (en) * 2021-05-12 2021-09-03 武汉大学 Face anti-counterfeiting method based on feature screening
CN113537355A (en) * 2021-07-19 2021-10-22 金鹏电子信息机器有限公司 Multi-element heterogeneous data semantic fusion method and system for security monitoring
CN113793405A (en) * 2021-09-15 2021-12-14 杭州睿胜软件有限公司 Method, computer system and storage medium for presenting distribution of plants
CN113822233A (en) * 2021-11-22 2021-12-21 青岛杰瑞工控技术有限公司 Method and system for tracking abnormal fishes cultured in deep sea

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9652915B2 (en) * 2014-02-28 2017-05-16 Honeywell International Inc. System and method having biometric identification intrusion and access control
US10846308B2 (en) * 2016-07-27 2020-11-24 Anomalee Inc. Prioritized detection and classification of clusters of anomalous samples on high-dimensional continuous and mixed discrete/continuous feature spaces
CA2992333C (en) * 2018-01-19 2020-06-02 Nymi Inc. User access authorization system and method, and physiological user sensor and authentication device therefor


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research progress on invasion mechanisms of alien plants; Li Mei et al.; Guangdong Agricultural Sciences (No. 02); 93-96 *

Also Published As

Publication number Publication date
CN114943290A (en) 2022-08-26
NL2034409A (en) 2023-05-19
NL2034214A (en) 2023-05-19


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220826

Assignee: Yancheng Guzhuo Technology Co.,Ltd.

Assignor: YANCHENG TEACHERS University

Contract record no.: X2024980003605

Denomination of invention: A biological intrusion recognition method based on multi-source data fusion analysis

Granted publication date: 20230808

License type: Common License

Record date: 20240328