Method for identifying cephalopod population and species
Technical Field
The invention relates to the technical field of fishery management, in particular to a method for identifying cephalopod populations and species.
Background
Cephalopods are mollusks and are exclusively marine. They are widely distributed across the world's oceans and comprise three major groups: squid, cuttlefish and octopus. About 700 cephalopod species are currently known. Cephalopods are typically short-lived, usually with an annual life cycle. Since the 1970s, the trend in world marine capture production has shifted because of overfishing and the decline of many demersal fish stocks, while cephalopods and other short-lived species have changed the composition of the catch and thereby kept the total catch growing at a certain level. The average annual growth rate of the cephalopod catch exceeds that of total marine capture production. With the drastic changes in the marine environment in recent years, cephalopod stock abundance has fluctuated markedly from year to year and has tended to decline. The rational development and utilization of cephalopod resources is therefore receiving attention from scholars in the field. Accurately distinguishing economically important cephalopod species is likewise one of the keys to accurate catch statistics.
Species identification is an important basis of fishery research, and a suitable identification method can both improve identification accuracy and better describe the morphology of a species. Many cephalopods share similar habitats and feeding modes, so most species look highly alike, and the species composition of a catch cannot be judged correctly in actual production. Traditional species identification relies on morphological observation in a laboratory, requires extensive technical experience and dedicated equipment to complete, is time-consuming, and cannot be carried out in batches.
The population is the basic unit of fishery resource assessment, and accurately distinguishing populations lays the foundation for accurate resource assessment and management. As with species identification, population discrimination depends largely on morphological parameters, and the corresponding results are obtained through a series of data analysis methods. However, traditional measurement methods involve large human error, so the discrimination results are often unsatisfactory and do not improve resource assessment.
Geometric morphometrics, which emerged in the 1980s, focuses not on changes in object size alone but on the analysis and reconstruction of shape. The method discards much of the redundant data of traditional measurement: after a number of corresponding landmark points are located, it uses statistical methods to analyze the underlying causes of changes in an object's shape, and it can redraw that shape, so the results are more intuitive and accurate. Meanwhile, with the rapid development of related technologies, machine learning methods, which integrate the strengths of various models, have been widely applied to classification analysis with good results. Combining geometric morphometrics with machine learning can therefore effectively improve the accuracy of species identification and population discrimination.
Disclosure of Invention
The invention provides a method for identifying the populations and species of cephalopods. The method extracts important landmark points based on the morphological characteristics of cephalopod hard tissues, performs dimensionality reduction on the landmark point information, and, in combination with machine learning methods, selects the most appropriate parameters and method to complete species identification and population discrimination.
A method for identifying cephalopod populations and species comprises the following steps:
(1) extracting hard tissues of the cephalopods (such as the beak, also called the horny jaw);
(2) extracting the main shape features of the hard tissue using landmark points;
(3) calculating the mean shape and centroid size using software;
(4) performing dimensionality reduction on the acquired data by principal component analysis, and retaining the first N principal components whose cumulative proportion of explained variation exceeds 80%;
(5) performing discriminant analysis with the first N principal components as explanatory variables.
Preferably, the discriminant analysis in step (5) analyzes the data using Linear Discriminant Analysis (LDA), Classification Tree (CT), Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM), respectively.
Preferably, 70% of the data is used for modeling and 30% is used for validation.
Preferably, the discriminant analysis in step (5) finally selects the optimal method according to the relevant parameters (sensitivity, specificity and Kappa coefficient) and the classification accuracy.
Beneficial effects: compared with traditional methods, the method can reconstruct the shape of hard tissues so that differences are seen more intuitively, effectively improves the accuracy of species or population discrimination, and can be extended to the analysis of individual differences, thereby providing a reference for fishery resource assessment and habitat assessment in different periods.
Drawings
FIG. 1 is a schematic diagram of hard tissue topographical features extracted according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the main shape features of hard tissue extracted using landmark points according to an embodiment of the present invention.
FIG. 3 is a diagram of the mean shape and centroid size calculated using software in an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings: the present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following embodiments.
A method for identifying the population and species of cephalopods comprises the following steps:
(1) extracting hard tissues of the cephalopods (such as the beak, also called the horny jaw);
(2) extracting the main shape features of the hard tissue using landmark points;
(3) calculating the mean shape and centroid size using software;
(4) performing dimensionality reduction on the acquired data by principal component analysis, and retaining the first N principal components whose cumulative proportion of explained variation exceeds 80%;
(5) performing discriminant analysis with the first N principal components as explanatory variables.
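By way of illustration only, the selection of N in step (4) can be sketched as follows. The function name, the toy variances, and the use of "greater than or equal to" at the 80% boundary are assumptions made for this sketch and are not taken from the embodiment; the variances themselves would come from whatever principal component analysis routine is used.

```python
def components_for_threshold(variances, threshold=0.80):
    """Return (n, cumulative), where n is the smallest number of leading
    principal components whose cumulative share of the total variance
    reaches `threshold`, and cumulative lists the running shares."""
    if not variances:
        raise ValueError("at least one component variance is required")
    # Principal components are ordered from largest to smallest variance.
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = []
    running = 0.0
    for value in ordered:
        running += value
        cumulative.append(running / total)
    for n, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return n, cumulative
    # Only reached through rounding error on the final share.
    return len(ordered), cumulative


# Toy variances, invented for illustration only.
n_components, shares = components_for_threshold([6.0, 2.5, 1.0, 0.5])
```

Only the first `n_components` score columns would then be passed on as explanatory variables in step (5).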
The discriminant analysis in step (5) analyzes the data using Linear Discriminant Analysis (LDA), Classification Tree (CT), Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM), respectively.
70% of the data is used for modeling and 30% for validation.
In the discriminant analysis of step (5), the optimal method is finally selected according to the relevant parameters (sensitivity, specificity and Kappa coefficient) and the classification accuracy.
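The selection parameters named above can be computed from a confusion matrix of the validation data. The following is a minimal sketch under stated assumptions: the function name and the toy matrix are hypothetical, sensitivity and specificity are computed per class in a one-versus-rest manner, and Kappa is taken to be Cohen's Kappa. The embodiment does not state how these values were actually computed.

```python
def classification_metrics(confusion):
    """Compute overall accuracy, Cohen's Kappa and per-class sensitivity
    and specificity.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Every class is assumed to occur at least
    once in the data (otherwise a division by zero would occur).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]  # true counts per class
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    accuracy = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = []
    for i in range(k):
        tp = confusion[i][i]
        fn = row_totals[i] - tp
        fp = col_totals[i] - tp
        tn = total - tp - fn - fp
        per_class.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return accuracy, kappa, per_class


# Toy two-class matrix, invented for illustration only.
accuracy, kappa, per_class = classification_metrics([[8, 2], [1, 9]])
```

Running the same function on the validation results of each of the five methods gives directly comparable figures from which the best-performing method can be chosen.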
The following description takes the analysis of the beaks (horny jaws) of four octopus species, Octopus ocellatus, ootheca ovata, Octopus variabilis and Octopus vulgaris, as an example. The samples were collected by trawl in the northern South China Sea, frozen immediately after collection and brought back to the laboratory, and 100 samples, each including the upper and lower beak, were selected from each species. The basic information is shown in Table 1.
TABLE 1 basic information on four Octopus species
The beaks are photographed with a digital camera. The upper and lower beaks are separated in each photograph, and the lateral edge profile of the beak is shown completely within the field of view with the edge clearly visible.
The photographed beaks are numbered, sorted and named in the form "species-number-sex-mantle length". Landmark points are then digitized on all beak samples using the tpsDig2 software. The specific landmark positions are shown in FIG. 2, in which solid black dots represent fixed landmark points and hollow dots represent sliding landmark points. All digitized sample data are saved in tps format.
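As an illustration of how the saved data might be read for further processing, the sketch below parses the common tps text layout, in which a line `LM=n` is followed by n coordinate lines and then by `KEY=value` lines such as `IMAGE=` and `ID=`. The function names, the sample file name and the assumption that the image name follows the four-part naming scheme are hypothetical. Curve records (`CURVES=`, `POINTS=`) are not interpreted, so sliding landmarks stored as curves would need extra handling.

```python
def parse_tps(text):
    """Parse tps-format text into a list of specimens.

    Each specimen is a dict with a "landmarks" list of (x, y) tuples plus
    any KEY=value fields that follow the landmark block (keys upper-cased).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    specimens = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.upper().startswith("LM="):
            count = int(line.split("=", 1)[1])
            coords = []
            for row in lines[i + 1:i + 1 + count]:
                x, y = row.split()[:2]
                coords.append((float(x), float(y)))
            specimens.append({"landmarks": coords})
            i += count + 1
        elif "=" in line and specimens:
            key, value = line.split("=", 1)
            specimens[-1][key.strip().upper()] = value.strip()
            i += 1
        else:
            i += 1  # e.g. curve point rows, which this sketch skips
    return specimens


def parse_specimen_name(name):
    """Split a name of the form species-number-sex-length (hypothetical
    file naming, with an optional image extension) into its four fields."""
    stem = name
    if name.lower().endswith((".jpg", ".jpeg", ".png", ".tif")):
        stem = name.rsplit(".", 1)[0]
    species, number, sex, length = stem.split("-")
    return {"species": species, "number": number,
            "sex": sex, "mantle_length": float(length)}
```

The unit of the length field is not stated in the naming scheme, so the sketch keeps it as a plain number.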
An R program is used to rotate, reconstruct and otherwise process the landmark data saved in tps format, yielding the centroid size and mean shape of the beak (FIG. 3, where the black point on each connecting line is the centroid size of each landmark point, and the outline formed by the connecting lines is the mean shape). The two-dimensional data are then converted into a one-dimensional format that can be handled by conventional data processing. Multivariate analysis of variance (MANOVA) was used to analyze the morphological differences between species and the effects of the upper and lower beaks. The results indicate that there were significant differences between the species and that beak size had no interaction effect on the morphological differences between species (Table 2).
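The alignment step can be illustrated with a simplified two-dimensional Procrustes superimposition. This is a sketch under stated assumptions and not the R program of the embodiment: all function names are hypothetical, landmarks are represented as complex numbers so that a rotation becomes a multiplication by a unit complex number, reflection is not considered, and sliding landmarks are treated as fixed.

```python
import math


def centre_and_scale(points):
    """Centre a landmark configuration on its centroid and scale it to
    unit centroid size. Returns (shape as complex numbers, centroid size)."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    centred = [p - centroid for p in z]
    # Centroid size: square root of the summed squared distances of the
    # landmarks from their centroid.
    size = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / size for p in centred], size


def rotate_onto(shape, reference):
    """Rotate a centred shape to minimise its squared distance to reference."""
    s = sum(r * w.conjugate() for r, w in zip(reference, shape))
    if abs(s) == 0:
        return list(shape)
    rotation = s / abs(s)  # unit complex number giving the optimal rotation
    return [rotation * w for w in shape]


def generalized_procrustes(configurations, iterations=10, tol=1e-12):
    """Iteratively align all configurations and estimate the mean shape.
    Returns (aligned shapes, mean shape, centroid sizes)."""
    scaled, sizes = zip(*(centre_and_scale(c) for c in configurations))
    reference = list(scaled[0])
    for _ in range(iterations):
        aligned = [rotate_onto(s, reference) for s in scaled]
        mean = [sum(col) / len(aligned) for col in zip(*aligned)]
        norm = math.sqrt(sum(abs(p) ** 2 for p in mean))
        mean = [p / norm for p in mean]
        change = sum(abs(m - r) ** 2 for m, r in zip(mean, reference))
        reference = mean
        if change < tol:
            break
    aligned = [rotate_onto(s, reference) for s in scaled]
    return aligned, reference, list(sizes)


def flatten(shape):
    """Convert a shape into one row: x1, y1, x2, y2, ..."""
    return [value for p in shape for value in (p.real, p.imag)]
```

Each aligned specimen passed through `flatten` becomes a single row of coordinates, which corresponds to the conversion from a two-dimensional to a one-dimensional format described above.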
TABLE 2 analysis of differences in morphology between the upper and lower jaws of four Octopus species
Note: ns indicates no significant difference; the significance level is α = 0.05.
Subsequently, principal component analysis showed that the first 46 principal components account for 100% of the interspecific variation, of which the first 22 principal components account for 80%. Finally, discriminant analysis was performed using the first 22 principal components, with 70% of the data used for modeling and 30% for validation. The results show that the overall classification accuracy of the five methods was roughly between 60% and 73%. The Support Vector Machine (SVM) performed best, with an accuracy exceeding 73% and the highest Kappa value, so the support vector machine method was selected for discrimination in this case.
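The 70%/30% division of the data can be sketched as below. The embodiment does not state how samples were assigned to the two subsets, so the stratification by species, the fixed random seed and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into modeling and validation subsets so that
    each class contributes about `train_fraction` of its samples to the
    modeling subset. Returns (train_indices, test_indices), both sorted."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Four species with 100 samples each, as in this embodiment; the short
# labels are placeholders.
labels = [species for species in ("sp1", "sp2", "sp3", "sp4") for _ in range(100)]
train_idx, test_idx = stratified_split(labels)
```

With 100 samples per species this yields 70 modeling and 30 validation samples for each species, so every classifier is trained and validated on the same balanced subsets.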
TABLE 3 Discriminant analysis of the four octopus species based on beak morphology
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.