CN112614547A - Microbial serum type classification model training and microbial serum type classification method

Publication number
CN112614547A
Authority
CN
China
Prior art keywords
microbial
serum type
hyperplane
classification model
edge distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011580705.7A
Other languages
Chinese (zh)
Inventor
黄福桂
彭真
杨俊林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Hexin Instrument Co Ltd
Original Assignee
Guangzhou Hexin Instrument Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Hexin Instrument Co Ltd
Priority to CN202011580705.7A
Priority to PCT/CN2020/142354 (WO2022141490A1)
Publication of CN112614547A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The application relates to a microbial serotype classification model training method and a microbial serotyping method. The training method comprises the following steps: acquiring training data for the microbial serotype classification model; determining a maximum-margin separating hyperplane for the training data; adjusting the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane; and determining the microbial serotype classification model from the target maximum-margin hyperplane, so that microbial serotypes can be classified with the model. With this method, microbial serotypes can be typed quickly by the classification model, and the typing results are more accurate.

Description

Microbial serum type classification model training and microbial serum type classification method
Technical Field
The application relates to the technical field of serotype classification, and in particular to a method, an apparatus, a computer device and a storage medium for training a microbial serotype classification model and for microbial serotyping.
Background
A serotype is a distinct variation within a species of viruses or bacteria, and such microorganisms are generally classified and named according to their cell surface antigens. Serotype differences arise from a number of factors, including the surface lipopolysaccharides of viruses and Gram-negative bacteria, exotoxins, bacterial plasmids and bacteriophages. Because of their different antigenicity, serotypes of the same microorganism cause different symptoms, disease courses and degrees of infection. Accurate serotype identification is therefore of great significance in clinical testing.
At present, serotype identification relies mainly on conventional biochemical tests or serological methods, which are time-consuming and costly and require highly trained personnel. Taking Salmonella, a common food-borne pathogen, as an example, different serotypes can be accurately distinguished by following GB 4789.4-2016 (National Food Safety Standard, Food Microbiological Examination: Salmonella), but the procedure takes more than 48 hours, and the antigens are easily affected by the source environment.
Therefore, current microbial serotyping techniques are slow and insufficiently accurate.
Disclosure of Invention
In view of the above, there is a need for a microbial serotype classification model training method, apparatus, computer device and storage medium capable of classifying serotypes quickly and accurately.
A method of training a microbial serotype classification model, the method comprising:
acquiring training data for the microbial serotype classification model;
determining a maximum-margin separating hyperplane for the training data;
adjusting the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane;
and determining the microbial serotype classification model from the target maximum-margin hyperplane, so as to classify microbial serotypes with the microbial serotype classification model.
In one embodiment, acquiring the training data set of the microbial serotype classification model includes:
acquiring microbial serotype data;
performing peak intensity normalization on the microbial serotype data to obtain normalized serotype data;
performing internal standard mass axis calibration on the normalized serotype data to obtain calibrated serotype data;
and denoising the calibrated serotype data to obtain the training data set of the microbial serotype classification model.
In one embodiment, acquiring the microbial serotype data includes:
obtaining a microbial serotype spectrum;
determining the credibility of the microbial serotype spectrum;
and if the credibility is lower than a preset threshold, deleting the microbial serotype spectrum.
In one embodiment, determining the maximum-margin separating hyperplane for the training data includes:
obtaining the maximum-margin separating hyperplane for the training data through a support vector machine algorithm.
In one embodiment, adjusting the maximum-margin hyperplane according to the preset m/z interval, allowable mass deviation and minimum peak occurrence frequency to obtain the target maximum-margin hyperplane includes:
obtaining the mass-to-charge ratio, mass deviation and peak occurrence frequency of the microbial serotype according to the maximum-margin hyperplane;
determining whether the maximum-margin hyperplane needs to be adjusted by comparing the mass-to-charge ratio with the m/z interval, the mass deviation with the allowable mass deviation, and the peak occurrence frequency with the minimum peak occurrence frequency;
if adjustment is needed, returning to the step of obtaining the maximum-margin separating hyperplane for the training data through the support vector machine algorithm;
and if no adjustment is needed, taking the maximum-margin hyperplane as the target maximum-margin hyperplane.
A microbial serotyping method, the method comprising:
obtaining a spectrum of a microorganism to be typed;
identifying the spectrum of the microorganism to be typed with a microbial serotype classification model to obtain a spectrum identification result;
and determining the microbial serotype according to the spectrum identification result.
A microbial serotype classification model training apparatus, the apparatus comprising:
a training data acquisition module, configured to acquire training data for the microbial serotype classification model;
a maximum-margin hyperplane determination module, configured to determine a maximum-margin separating hyperplane for the training data;
a maximum-margin hyperplane adjustment module, configured to adjust the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane;
and a model generation module, configured to determine the microbial serotype classification model from the target maximum-margin hyperplane, so as to classify microbial serotypes with the microbial serotype classification model.
A microbial serotyping apparatus, the apparatus comprising:
a spectrum acquisition module, configured to acquire a spectrum of a microorganism to be typed;
a spectrum identification module, configured to identify the spectrum of the microorganism to be typed with a microbial serotype classification model to obtain a spectrum identification result;
and a serotype determination module, configured to determine the microbial serotype according to the spectrum identification result.
A computer device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:
acquiring training data for the microbial serotype classification model;
determining a maximum-margin separating hyperplane for the training data;
adjusting the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane;
and determining the microbial serotype classification model from the target maximum-margin hyperplane, so as to classify microbial serotypes with the microbial serotype classification model.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
acquiring training data for the microbial serotype classification model;
determining a maximum-margin separating hyperplane for the training data;
adjusting the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane;
and determining the microbial serotype classification model from the target maximum-margin hyperplane, so as to classify microbial serotypes with the microbial serotype classification model.
With the above microbial serotype classification model training method, microbial serotyping method, apparatus, computer device and storage medium, training data for the microbial serotype classification model is acquired and a maximum-margin separating hyperplane is determined for it, so that microbial serotypes can be preliminarily typed on the basis of this hyperplane. The hyperplane is then adjusted according to the preset mass-to-charge ratio (m/z) interval, allowable mass deviation and minimum peak occurrence frequency to obtain a target maximum-margin hyperplane, with which microbial serotypes can be typed accurately. The microbial serotype classification model is determined from the target maximum-margin hyperplane, so that microbial serotypes can be typed quickly with the model and the typing results are more accurate.
Drawings
FIG. 1 is a diagram of an application environment of a microbial serotype classification model training method in one embodiment;
FIG. 2 is a schematic flow chart of a microbial serotype classification model training method in one embodiment;
FIG. 3 is a schematic diagram of an interface for setting the training conditions of the microbial serotype classification model in one embodiment;
FIG. 4 is a schematic diagram of the support vector machine algorithm in one embodiment;
FIG. 5 is a schematic flow chart of a microbial serotyping method in one embodiment;
FIG. 6 is a schematic flow chart of a microbial serotyping method in another embodiment;
FIG. 7 is a block diagram of a microbial serotype classification model training apparatus in one embodiment;
FIG. 8 is a block diagram of a microbial serotyping apparatus in one embodiment;
FIG. 9 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The microorganism identification method provided by the application can be applied in the application environment shown in FIG. 1. The mass spectrometer 102 may transmit the collected mass spectrum data to the terminal 104 over a wired or wireless connection. The mass spectrometer 102 may be, but is not limited to, a matrix-assisted laser desorption/ionization time-of-flight mass spectrometer (MALDI-TOF MS), and the terminal 104 may be, but is not limited to, a personal computer, laptop, smartphone, tablet or portable wearable device.
In one embodiment, as shown in FIG. 2, a microbial serotype classification model training method is provided. Taking its application to the terminal in FIG. 1 as an example, the method includes the following steps:
Step S210, acquiring training data for the microbial serotype classification model.
In a specific implementation, MALDI-TOF MS is a mass spectrometry technique that has been developed in recent years and is widely used for microorganism identification. It is mainly used to detect isolated bacteria and fungi, and can generally identify genera, species and even some subspecies rapidly. However, the antigens that determine microbial serotypes are complex and variable in their classification, the stability of different antigens differs markedly, no standardized procedure for strain treatment has been established, and the databases are not yet sufficiently detailed or complete, so MALDI-TOF MS cannot distinguish serotypes directly by spectrum identification.
Spectral data for the microbial serotype classification model can be collected by MALDI-TOF MS and sent to the terminal. Specifically, the target serotypes are selected first: a microorganism of practical value is chosen and its representative serotypes are screened (for example, if 20 serotypes of a certain bacterium are found but the first 5 account for more than 90% of detections, those 5 may be taken as the representative serotypes), and for each serotype a certain number of different strains are selected that differ as clearly as possible in, for example, isolation source, isolation time and storage time. Strain culture and pretreatment follow: all strains are cultured under uniform conditions, for which the conditions most common in the application scenario may be chosen; after 12 to 24 hours the strains are processed by a conventional protein extraction method; a spotting scheme is designed for each serotype and each strain sample; and the spots are covered with a uniform matrix. Finally, spectra are acquired by MALDI-TOF MS and screened: unified method parameters are set, spectra are acquired in automatic mode, basic mass calibration is performed with Escherichia coli before testing, and data with an incorrect strain identification or a credibility score below a preset threshold are removed manually. The remaining data are stored in separate folders by serotype, with the serotype subfolders of the same microorganism kept in the same parent directory, and additional spectra are acquired when the number of spectra is insufficient.
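The storage layout described above (one parent directory per microorganism, one subfolder per serotype) allows class labels to be derived from folder names. The following is a minimal sketch under that assumption; the function name `index_spectra`, the folder names and the `.txt` file names are hypothetical and are not taken from the application.

```python
import tempfile
from pathlib import Path


def index_spectra(parent_dir):
    """Map each serotype subfolder name to the sorted file names it contains."""
    root = Path(parent_dir)
    return {
        sub.name: sorted(p.name for p in sub.iterdir() if p.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


# Build a throwaway directory tree to illustrate the layout (hypothetical names).
layout = {"serotype_A": ["s1.txt", "s2.txt"], "serotype_B": ["s3.txt"]}
with tempfile.TemporaryDirectory() as tmp:
    for serotype, files in layout.items():
        folder = Path(tmp) / serotype
        folder.mkdir()
        for name in files:
            (folder / name).write_text("m/z intensity\n")
    index = index_spectra(tmp)

print(index)
```

Each key of `index` can then serve as the label for every spectrum file listed under it.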
After acquiring the serotype data, the terminal can process it further. Specifically, after the manually classified data for the different serotypes is imported into the terminal, peak intensity normalization and precise internal standard mass axis calibration can be applied to the data of each serotype (that is, the mass spectrum peaks appearing at the same position in different spectra are superimposed, the mean of their m/z values on the abscissa is calculated, and calibration is performed within a set tolerance), and spectra whose average noise level is higher than a set value are removed, yielding the training data of the microbial serotype classification model.
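By way of illustration only, the peak intensity normalization and noise screening described above might be sketched as follows. The application does not say how intensities are normalized or how the noise level is measured, so scaling to the strongest peak and a precomputed `noise` value per spectrum are both assumptions, as are the function names and sample values. Mass axis calibration is omitted here.

```python
def normalize_peak_intensity(peaks):
    """Scale intensities so that the strongest peak has intensity 1.0."""
    top = max(intensity for _mz, intensity in peaks)
    if top <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [(mz, intensity / top) for mz, intensity in peaks]


def drop_noisy_spectra(spectra, max_noise):
    """Keep only spectra whose average noise level does not exceed max_noise."""
    return [s for s in spectra if s["noise"] <= max_noise]


# Hypothetical spectra: (m/z, intensity) peaks plus a precomputed noise level.
spectra = [
    {"peaks": [(4365.2, 200.0), (5380.9, 800.0)], "noise": 0.02},
    {"peaks": [(4365.4, 50.0), (5381.1, 100.0)], "noise": 0.30},
]
normalized = [
    {"peaks": normalize_peak_intensity(s["peaks"]), "noise": s["noise"]}
    for s in spectra
]
kept = drop_noisy_spectra(normalized, max_noise=0.10)
```

Here the second spectrum is discarded because its assumed noise level of 0.30 exceeds the set value of 0.10.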
Step S220, determining a maximum-margin separating hyperplane for the training data.
In a specific implementation, the training conditions may be set first. These include the mass-to-charge ratio (m/z) range to be extracted (for example, a minimum and a maximum m/z value), the allowable mass deviation between different spectra of the same serotype, the allowable mass difference between different serotypes, and the minimum peak occurrence frequency required of a target peak during feature extraction (for example, if 50% is set, a characteristic peak is included in the feature extraction range only if its occurrence frequency reaches at least 50%).
FIG. 3 is a schematic diagram of an interface for setting the training conditions of the microbial serotype classification model, which can be displayed on the terminal. Here min_mz may be the minimum of the m/z range, max_mz may be the maximum of the m/z range, in_ppm may be the allowable mass deviation between different spectra of the same serotype, out_ppm may be the allowable mass deviation between different serotypes, and live_percent may be the minimum peak occurrence frequency required of a target peak during feature extraction.
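The five training conditions can be grouped into a single validated record. The sketch below is one possible representation, not the structure used in the application: the class name `TrainingConditions`, the validation rules, the interpretation of the tolerances as parts per million (suggested only by the field names) and the sample values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConditions:
    min_mz: float        # lower bound of the m/z range
    max_mz: float        # upper bound of the m/z range
    in_ppm: float        # allowable mass deviation within one serotype (assumed ppm)
    out_ppm: float       # allowable mass deviation between serotypes (assumed ppm)
    live_percent: float  # minimum peak occurrence frequency, in percent

    def __post_init__(self):
        if not 0 < self.min_mz < self.max_mz:
            raise ValueError("require 0 < min_mz < max_mz")
        if self.in_ppm <= 0 or self.out_ppm <= 0:
            raise ValueError("ppm tolerances must be positive")
        if not 0 <= self.live_percent <= 100:
            raise ValueError("live_percent must lie in [0, 100]")

    def in_range(self, mz):
        """Return True if mz lies inside the configured m/z range."""
        return self.min_mz <= mz <= self.max_mz


# Illustrative values only; the application does not state any defaults.
conditions = TrainingConditions(
    min_mz=2000.0, max_mz=20000.0, in_ppm=300.0, out_ppm=600.0, live_percent=50.0
)
```

Validating the values when the record is created means an inconsistent range is rejected before any training starts.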
After the settings are complete, the user can click the Train button. Based on the classified and processed data and the set preconditions, the terminal then applies a deeply optimized support vector machine algorithm: it solves for the maximum-margin separating hyperplane, that is, the hyperplane that divides the training data set correctly with the largest geometric margin, which defines the maximum-margin linear classifier in the feature space. Feature information is extracted and the model is trained, and a prediction model is finally generated and saved.
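The application does not detail how feature information is extracted. One plausible reading of the training conditions is that a candidate peak becomes a feature only if it lies inside the m/z range and occurs in a sufficient share of the spectra. The sketch below follows that reading; the function names, the ppm matching rule and the sample m/z values are assumptions.

```python
def peak_frequency(spectra, target_mz, in_ppm):
    """Percentage of spectra containing a peak within in_ppm of target_mz."""
    tolerance = target_mz * in_ppm / 1e6
    hits = sum(any(abs(mz - target_mz) <= tolerance for mz in s) for s in spectra)
    return 100.0 * hits / len(spectra)


def select_feature_peaks(spectra, candidates, min_mz, max_mz, in_ppm, live_percent):
    """Keep candidates inside [min_mz, max_mz] that occur often enough."""
    return [
        c
        for c in candidates
        if min_mz <= c <= max_mz
        and peak_frequency(spectra, c, in_ppm) >= live_percent
    ]


# Hypothetical peak lists (m/z values only) for four spectra of one serotype.
spectra = [
    [4365.0, 5381.0],
    [4365.5, 7274.0],
    [4364.8, 5381.2],
    [9060.0],
]
features = select_feature_peaks(
    spectra,
    candidates=[4365.0, 5381.0, 9060.0, 25000.0],
    min_mz=2000.0,
    max_mz=20000.0,
    in_ppm=300.0,
    live_percent=50.0,
)
```

In this example the peak near 9060 is dropped because it occurs in only one of four spectra, and 25000 is dropped because it lies outside the m/z range.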
FIG. 4 is a schematic diagram of the support vector machine (SVM) algorithm. An SVM is a generalized linear classifier that performs binary classification of data by supervised learning, and its decision boundary is the maximum-margin hyperplane solved for the learning samples. For example, given two classes of linearly separable points on a sheet of paper, the SVM algorithm looks for a straight line that separates the two classes while lying as far as possible from both. The advantages of SVMs are a low generalization error rate and results that are easy to interpret; the disadvantages are that they are hard to apply to large training sets, do not handle multi-class problems directly, and are sensitive to parameter tuning and kernel function choice. SVMs are widely used in fields such as text classification, face recognition and medical diagnosis.
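To make the idea of a maximum-margin linear classifier concrete, the sketch below trains a linear SVM on two classes of points in a plane. It uses batch sub-gradient descent on the regularized hinge loss, which is a simple stand-in chosen for illustration and is not the deeply optimized algorithm referred to in the application. The function names, hyperparameters and toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Two linearly separable classes of toy points.
X = [(2.0, 1.0), (3.0, -1.0), (-2.0, -1.0), (-3.0, 1.0)]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

In practice the feature vectors would be derived from spectra (for example, intensities at selected feature peaks) rather than toy coordinates, and more than two serotypes would require a multi-class scheme built on top of the binary classifier.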
Step S230, adjusting the maximum-margin hyperplane according to a preset mass-to-charge ratio (m/z) interval, an allowable mass deviation and a minimum peak occurrence frequency to obtain a target maximum-margin hyperplane.
In a specific implementation, after the model is obtained in step S220 it can be verified and optimized: the model is tested on spectral data of known target serotypes, and the conditions are tuned to improve prediction accuracy. Specifically, the model generated in step S220 may not meet the preset requirements on the m/z interval, the allowable mass deviation, the minimum peak occurrence frequency and so on. Some spectra may be selected as test data, and the mass-to-charge ratio, mass deviation and peak occurrence frequency of the microbial serotype are calculated according to the maximum-margin hyperplane determined in step S220. Whether the hyperplane needs to be adjusted can then be determined by comparing the calculated mass-to-charge ratio with the m/z range to be extracted, the calculated mass deviation with the allowable mass deviation between different spectra of the same serotype or the allowable mass difference between different serotypes, and the calculated peak occurrence frequency with the minimum peak occurrence frequency required of a target peak during feature extraction. If adjustment is needed, the SVM may be applied again to determine a new maximum-margin hyperplane; if no adjustment is needed, the maximum-margin hyperplane obtained in step S220 may be used as the target maximum-margin hyperplane.
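The three comparisons can be expressed as a single check. The sketch below assumes that the quantities measured from the trained model are already available in a dictionary; how they are derived from the hyperplane is not specified in the application, and the key names, function name and sample values are hypothetical.

```python
def needs_adjustment(measured, conditions):
    """Return True if any measured quantity violates the preset conditions."""
    mz_ok = all(
        conditions["min_mz"] <= mz <= conditions["max_mz"]
        for mz in measured["feature_mz"]
    )
    deviation_ok = measured["mass_deviation_ppm"] <= conditions["in_ppm"]
    frequency_ok = measured["peak_frequency"] >= conditions["live_percent"]
    return not (mz_ok and deviation_ok and frequency_ok)


# Illustrative conditions and measurements (all values hypothetical).
conditions = {"min_mz": 2000.0, "max_mz": 20000.0, "in_ppm": 300.0, "live_percent": 50.0}
acceptable = {"feature_mz": [4365.0, 5381.0], "mass_deviation_ppm": 120.0, "peak_frequency": 75.0}
too_rare = {"feature_mz": [4365.0, 5381.0], "mass_deviation_ppm": 120.0, "peak_frequency": 40.0}

print(needs_adjustment(acceptable, conditions))
print(needs_adjustment(too_rare, conditions))
```

The first call reports that no adjustment is needed, while the second reports that the hyperplane should be redetermined because the peak occurrence frequency is below the minimum.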
Step S240, determining the microbial serotype classification model from the target maximum-margin hyperplane, so as to classify microbial serotypes with the microbial serotype classification model.
In a specific implementation, the target maximum-margin hyperplane maximizes the geometric margin of the training data set while meeting the requirements on the mass-to-charge ratio (m/z) interval, the allowable mass deviation and the minimum peak occurrence frequency. Feature information is extracted and the model is trained according to the target maximum-margin hyperplane to obtain the microbial serotype classification model, with which microbial serotypes can be typed.
In the above microbial serotype classification model training method, training data for the model is acquired and a maximum-margin separating hyperplane is determined for it, so that microbial serotypes can be preliminarily typed on the basis of this hyperplane. The hyperplane is then adjusted according to the preset m/z interval, allowable mass deviation and minimum peak occurrence frequency to obtain a target maximum-margin hyperplane, with which microbial serotypes can be typed accurately. The microbial serotype classification model is determined from the target maximum-margin hyperplane, so that microbial serotypes can be typed quickly with the model and the typing results are more accurate.
In one embodiment, step S210 may specifically include: acquiring microbial serotype data; performing peak intensity normalization on the microbial serotype data to obtain normalized serotype data; performing internal standard mass axis calibration on the normalized serotype data to obtain calibrated serotype data; and denoising the calibrated serotype data to obtain the training data set of the microbial serotype classification model.
In a specific implementation, after the manually classified data for the different serotypes is imported into the terminal, peak intensity normalization can be applied to the data of each serotype, followed by precise internal standard mass axis calibration (that is, the mass spectrum peaks appearing at the same position in different spectra are superimposed, the mean of their m/z values on the abscissa is calculated, and calibration is performed within a set tolerance). Finally, spectra whose average noise level is higher than a set value are removed to obtain the training data of the microbial serotype classification model, and multiple items of training data form the training data set.
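The parenthetical description of mass axis calibration can be read as grouping peaks that coincide across spectra and moving each to the mean m/z of its group. The sketch below implements that reading only; it is not a full internal standard calibration, and the function name, the ppm grouping rule and the sample peaks are assumptions.

```python
def align_mass_axis(spectra, tolerance_ppm):
    """Snap peaks that coincide across spectra to the mean m/z of their group."""
    # Tag every peak with (m/z, spectrum index, peak index) and sort by m/z.
    tagged = sorted(
        (mz, i, j)
        for i, peaks in enumerate(spectra)
        for j, (mz, _intensity) in enumerate(peaks)
    )
    aligned = [list(peaks) for peaks in spectra]

    def snap(group):
        centre = sum(mz for mz, _, _ in group) / len(group)
        for _, i, j in group:
            aligned[i][j] = (centre, spectra[i][j][1])

    group = []
    for item in tagged:
        # Start a new group once a peak is outside the tolerance of the group's first peak.
        if group and abs(item[0] - group[0][0]) > group[0][0] * tolerance_ppm / 1e6:
            snap(group)
            group = []
        group.append(item)
    if group:
        snap(group)
    return aligned


# Two hypothetical spectra of the same serotype as (m/z, intensity) peaks.
spectra = [
    [(5000.0, 1.0), (6000.0, 0.5)],
    [(5001.0, 0.8), (6003.0, 0.4)],
]
aligned = align_mass_axis(spectra, tolerance_ppm=300.0)
```

With a 300 ppm tolerance the peaks at 5000.0 and 5001.0 are merged to 5000.5, while 6000.0 and 6003.0 differ by more than the tolerance and are left unchanged.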
In this embodiment, microbial serotype data is acquired, peak intensity normalization is applied to obtain normalized serotype data, internal standard mass axis calibration is applied to the normalized data to obtain calibrated serotype data, and the calibrated data is denoised to obtain the training data set of the microbial serotype classification model. The training data is thereby standardized, which improves the typing accuracy of the resulting microbial serotype classification model.
In one embodiment, step S210 may further include: obtaining a microbial serotype spectrum; determining the credibility of the microbial serotype spectrum; and if the credibility is lower than a preset threshold, deleting the microbial serotype spectrum.
In a specific implementation, after obtaining a microbial serotype spectrum acquired by MALDI-TOF MS, the terminal may determine the credibility of the spectrum, for example by performing basic mass calibration with Escherichia coli. The credibility is then compared with a preset credibility threshold: if it is greater than or equal to the threshold, the spectrum is retained; if it is lower than the threshold, the spectrum is deleted.
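The retain-or-delete rule can be sketched as a simple filter. The score scale, the threshold value, the record layout and the function name below are hypothetical; the application states only that spectra below a preset threshold are deleted.

```python
def split_by_credibility(spectra, threshold):
    """Separate spectra into those kept (score >= threshold) and those deleted."""
    kept = [s for s in spectra if s["score"] >= threshold]
    deleted = [s for s in spectra if s["score"] < threshold]
    return kept, deleted


# Hypothetical credibility scores for three spectra.
spectra = [
    {"id": "a", "score": 2.3},
    {"id": "b", "score": 1.7},
    {"id": "c", "score": 2.0},
]
kept, deleted = split_by_credibility(spectra, threshold=2.0)
```

A spectrum whose score equals the threshold is retained, matching the greater-than-or-equal rule described above.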
In this embodiment, a microbial serotype spectrum is obtained, its credibility is determined, and the spectrum is deleted if the credibility is lower than a preset threshold. This ensures that the training data of the microbial serotype classification model is highly credible and improves the typing accuracy of the model.
In one embodiment, step S220 may specifically include: obtaining the maximum-margin separating hyperplane for the training data through a support vector machine algorithm.
In a specific implementation, the training data set can be classified by a support vector machine algorithm to obtain the maximum-margin separating hyperplane, that is, the hyperplane that divides the training data set correctly with the largest geometric margin.
In this embodiment, obtaining the maximum-margin separating hyperplane for the training data through the support vector machine algorithm speeds up the determination of the hyperplane.
In an embodiment, the step S230 may specifically include: according to the maximum edge distance hyperplane obtained by separation solution, the charge-to-mass ratio, the mass deviation and the peak frequency of the microbial serum type are obtained; determining whether the maximum edge distance hyperplane needing to be adjusted and separated for solving is determined by comparing the charge-to-mass ratio with the charge-to-mass ratio interval, comparing the mass deviation with the allowable mass deviation and comparing the peak frequency with the lowest peak frequency; if the adjustment is needed, returning to the step of obtaining the maximum margin hyperplane obtained by the separation and solution of the training data through a support vector machine algorithm; and if the adjustment is not needed, obtaining the maximum edge distance hyperplane of the target separation solution according to the maximum edge distance hyperplane of the separation solution.
In a specific implementation, some spectrograms can be selected as test data, the charge-to-mass ratio, the mass deviation and the peak-appearing frequency of the microbial serum type are calculated according to the maximum edge distance hyperplane determined by the separation and solution in the step S220, comparing the obtained charge-to-mass ratio with the charge-to-mass ratio range to be intercepted, comparing the obtained mass deviation with different spectrogram allowable mass deviations of different serotypes or allowable mass differences among different serotypes, comparing the obtained peak frequency with the lowest peak frequency of a target peak required during characteristic extraction, it may be determined whether the maximum margin hyperplane of the separation solution needs to be adjusted, which, if necessary, the SVM may be reapplied to determine the maximum edge distance hyperplane for the separation solution, otherwise, if no adjustment is required, the maximum edge distance hyperplane of the separation solution obtained in step S220 may be used as the maximum edge distance hyperplane of the target separation solution.
In this embodiment, the charge-to-mass ratio, the mass deviation and the peak frequency of the microbial serum type are obtained according to the maximum margin hyperplane of the separation solution, and whether that hyperplane needs to be adjusted is determined by comparing the charge-to-mass ratio with the charge-to-mass ratio interval, the mass deviation with the allowable mass deviation, and the peak frequency with the lowest peak frequency. If adjustment is needed, the method returns to the step of obtaining the maximum margin hyperplane of the separation solution of the training data through a support vector machine algorithm; if not, the maximum margin hyperplane of the separation solution is taken as the maximum margin hyperplane of the target separation solution. This ensures that the maximum margin hyperplane of the target separation solution meets the requirements on charge-to-mass ratio, mass deviation and peak frequency, which improves the classification accuracy of the microbial serum type classification model.
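The check-and-retrain loop of step S230 can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the `PeakStats` record, the function names, the `train` and `summarize` callables and the `max_rounds` cap are all hypothetical, because the text does not specify how the three quantities are derived from the hyperplane or how one retraining round differs from the next.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


@dataclass(frozen=True)
class PeakStats:
    """Quantities derived from a candidate hyperplane (hypothetical record)."""
    charge_to_mass_ratio: float  # position of the feature peak
    mass_deviation: float        # same unit as the allowable deviation
    peak_frequency: float        # fraction of spectrograms showing the peak


def needs_adjustment(stats: PeakStats,
                     ratio_interval: Tuple[float, float],
                     allowable_deviation: float,
                     lowest_peak_frequency: float) -> bool:
    """Return True if any of the three preset criteria is violated."""
    low, high = ratio_interval
    ratio_ok = low <= stats.charge_to_mass_ratio <= high
    deviation_ok = abs(stats.mass_deviation) <= allowable_deviation
    frequency_ok = stats.peak_frequency >= lowest_peak_frequency
    return not (ratio_ok and deviation_ok and frequency_ok)


def fit_target_hyperplane(train: Callable[[int], Model],
                          summarize: Callable[[Model], PeakStats],
                          ratio_interval: Tuple[float, float],
                          allowable_deviation: float,
                          lowest_peak_frequency: float,
                          max_rounds: int = 10) -> Model:
    """Retrain until a candidate passes all checks (assumed round cap)."""
    for round_index in range(max_rounds):
        candidate = train(round_index)   # e.g. re-run the SVM step
        stats = summarize(candidate)     # derive the three quantities
        if not needs_adjustment(stats, ratio_interval,
                                allowable_deviation, lowest_peak_frequency):
            return candidate             # target hyperplane found
    raise RuntimeError("no hyperplane met the preset criteria")
```

The round cap is a design choice added here so that the loop terminates even when no candidate satisfies the criteria; the text itself describes only the return to the SVM step.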
In one embodiment, as shown in fig. 5, a microbial serum typing method is provided. Taking its application to the terminal in fig. 1 as an example, the method comprises the following steps:
step S510, acquiring a microbial spectrogram to be typed;
step S520, identifying a microbial spectrogram to be typed through a microbial serum type classification model to obtain a microbial spectrogram identification result;
and step S530, determining the microbial serum type according to the microbial spectrogram identification result.
In a specific implementation, the terminal may obtain a microbial spectrogram to be typed, identify the microbial spectrogram to be typed through the microbial serum type classification model trained in the above steps S210 to S240 to obtain an identification result, where the identification result may be a microbial serum type corresponding to the microbial spectrogram, and determine the microbial serum type of the microbial spectrogram to be typed according to the identification result.
In practical application, a trained microbial serum type classification model is loaded on the terminal, and acquired spectrogram data that has been identified as the microorganism with high reliability is imported. The terminal extracts information from the data, compares it with the trained microbial serum type classification model, and finally gives a serotype prediction result. Compared with the conventional biochemical serum typing method, this technique standardizes the processing of the microorganisms and of the acquired spectrogram data, establishes a classification model with a machine learning algorithm and continuously optimizes and improves it, and imports the data to be typed into the model for comparison. It can thereby achieve highly accurate prediction and, to a certain degree, rapid prediction of the serotype, which greatly shortens serotype identification and effectively reduces cost. Even if prediction accuracy cannot reach 100 percent, the method narrows the range of possible serotypes, so that fewer tests are needed when a serological method is then used.
In this embodiment, a microbial spectrogram to be typed is acquired, the microbial spectrogram to be typed is identified by the microbial serum type classification model to obtain a microbial spectrogram identification result, and the microbial serum type is determined according to that result, so that serotype prediction can be realized quickly, typing time can be shortened, and the accuracy of serotype prediction can be improved.
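Steps S510 to S530 can be illustrated with a small sketch in which each serotype is represented by one trained hyperplane and the spectrogram is assigned to the serotype with the largest decision value. The one-versus-rest arrangement, the `Hyperplane` alias, the function names and the serotype labels are assumptions made for illustration; the text does not state how more than two serotypes are handled.

```python
from typing import Dict, Sequence, Tuple

# A trained hyperplane as (weights w, bias b); hypothetical representation.
Hyperplane = Tuple[Sequence[float], float]


def decision_value(hyperplane: Hyperplane, features: Sequence[float]) -> float:
    """Signed score w . x + b of a feature vector against one hyperplane."""
    weights, bias = hyperplane
    if len(weights) != len(features):
        raise ValueError("feature vector and weights differ in length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_serotype(models: Dict[str, Hyperplane],
                     features: Sequence[float]) -> str:
    """Return the serotype whose hyperplane gives the largest score."""
    if not models:
        raise ValueError("no trained serotype models supplied")
    return max(models, key=lambda name: decision_value(models[name], features))


# Usage with made-up two-feature models and made-up serotype labels.
models = {
    "serotype_A": ([1.0, -1.0], 0.0),
    "serotype_B": ([-1.0, 1.0], 0.0),
}
prediction = predict_serotype(models, [0.9, 0.1])
```

Here `features` stands for the vector extracted from the preprocessed spectrogram to be typed, and `prediction` plays the role of the identification result from which the serotype is determined.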
Fig. 6 provides a schematic flow diagram of a microbial serum typing method, which may include steps of data acquisition, model training, and typing prediction. The data acquisition comprises acquisition of serotype data based on MALDI-TOF MS, model training comprises processing of the acquired serotype data and training of a microorganism serum type classification model, and typing prediction comprises data comparison and serotype prediction.
Specifically, the microbial serum typing method is based on MALDI-TOF MS. First, a certain number of representative mass spectrograms of different serotype strains of the same microorganism are systematically collected, and the common features of each serotype are extracted with a machine learning classification algorithm to train and generate a model. Then, a spectrogram that has been identified as that microorganism is matched against the model for analysis and prediction. The method is simple to operate, fast, repeatable and high-throughput. Because the purpose-built trained model already contains the common spectrogram features of the target microorganism, and even of all its serotypes, a serotype classification prediction can be obtained quickly once MALDI-TOF MS has identified the microorganism, by comparing the spectrogram directly with the model. Compared with the traditional biochemical method, this can greatly shorten serotyping time and improve the accuracy of serotyping prediction.
It should be understood that although the steps in the flowcharts of fig. 2, 5 and 6 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2, 5 and 6 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a microbial serum type classification model training device 700, comprising: a training data acquisition module 702, a maximum margin hyperplane determining module 704, a maximum margin hyperplane adjusting module 706, and a model generation module 708, wherein:
the training data acquisition module 702 is configured to acquire training data of the microbial serum type classification model;
the maximum margin hyperplane determining module 704 is configured to determine a maximum margin hyperplane of the separation solution of the training data;
the maximum margin hyperplane adjusting module 706 is configured to adjust the maximum margin hyperplane of the separation solution according to a preset charge-to-mass ratio interval, an allowable mass deviation and a lowest peak frequency to obtain a maximum margin hyperplane of the target separation solution;
the model generation module 708 is configured to determine the microbial serum type classification model according to the maximum margin hyperplane of the target separation solution, so as to classify the microbial serum type according to the microbial serum type classification model.
In one embodiment, the training data acquisition module 702 is further configured to: acquire microbial serum type data; perform peak intensity normalization on the microbial serum type data to obtain normalized serum type data; perform internal standard mass axis calibration on the normalized serum type data to obtain calibrated serum type data; and denoise the calibrated serum type data to obtain a training data set of the microbial serum type classification model.
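The three preprocessing operations can be sketched on a peak list as follows. The concrete choices are assumptions for illustration only: normalization divides by the strongest peak, calibration is a one-point shift that moves an observed internal standard peak onto its reference position, and denoising drops peaks below a relative intensity cutoff. The text names the operations but not these particular formulas, and all function names and numbers are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (charge-to-mass ratio, intensity)


def normalize_intensity(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities so that the strongest peak has intensity 1.0."""
    if not peaks:
        return []
    top = max(intensity for _, intensity in peaks)
    if top <= 0:
        raise ValueError("spectrum has no positive intensity")
    return [(ratio, intensity / top) for ratio, intensity in peaks]


def calibrate_mass_axis(peaks: List[Peak], observed_standard: float,
                        reference_standard: float) -> List[Peak]:
    """Shift the axis so the internal standard sits at its reference value."""
    offset = reference_standard - observed_standard
    return [(ratio + offset, intensity) for ratio, intensity in peaks]


def denoise(peaks: List[Peak], min_relative_intensity: float = 0.05) -> List[Peak]:
    """Keep only peaks at or above the relative intensity cutoff."""
    return [(ratio, intensity) for ratio, intensity in peaks
            if intensity >= min_relative_intensity]


def preprocess(peaks: List[Peak], observed_standard: float,
               reference_standard: float) -> List[Peak]:
    """Normalize, then calibrate, then denoise, in the order given in the text."""
    normalized = normalize_intensity(peaks)
    calibrated = calibrate_mass_axis(normalized, observed_standard,
                                     reference_standard)
    return denoise(calibrated)


# Usage with made-up values: the internal standard is observed 0.5 too high.
raw = [(4000.5, 200.0), (5000.5, 1000.0), (6000.5, 10.0)]
training_peaks = preprocess(raw, observed_standard=5000.5,
                            reference_standard=5000.0)
```

A real instrument workflow would likely use several internal standard peaks and a fitted calibration curve rather than a single offset; the one-point shift is used here only to keep the example short.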
In one embodiment, the training data acquisition module 702 is further configured to: acquire a microbial serum type spectrogram; determine the credibility of the microbial serum type spectrogram; and delete the microbial serum type spectrogram if the credibility is lower than a preset threshold.
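The credibility filter amounts to keeping only the spectrograms whose score reaches the preset threshold. In the sketch below, the `credibility` callable, the dictionary layout and the threshold value are hypothetical; the text does not say how credibility is computed.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")


def drop_low_credibility(spectrograms: Iterable[S],
                         credibility: Callable[[S], float],
                         threshold: float) -> List[S]:
    """Delete spectrograms whose credibility is lower than the threshold."""
    return [s for s in spectrograms if credibility(s) >= threshold]


# Usage with made-up records and a made-up threshold of 2.0.
collected = [
    {"id": "s1", "score": 2.3},
    {"id": "s2", "score": 1.5},
    {"id": "s3", "score": 2.0},
]
kept = drop_low_credibility(collected, lambda s: s["score"], threshold=2.0)
```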
In an embodiment, the maximum margin hyperplane determining module 704 is further configured to obtain the maximum margin hyperplane of the separation solution of the training data through a support vector machine algorithm.
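As a rough illustration of what obtaining a maximum margin hyperplane with a support vector machine involves, the sketch below trains a linear soft-margin SVM by sub-gradient descent on the hinge loss with L2 regularization. It only approximates the maximum margin solution, and the solver, its hyperparameters and the function names are assumptions; the text does not specify a kernel, a solver or a library, and in practice an established solver such as scikit-learn's `SVC` would normally be preferred.

```python
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     learning_rate: float = 0.01,
                     regularization: float = 0.01,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Return (w, b) of an approximate maximum margin hyperplane.

    Labels must be +1 or -1. Each pass visits the samples in order, so
    the result is deterministic.
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and aligned")
    if any(y not in (-1, 1) for y in labels):
        raise ValueError("labels must be +1 or -1")
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # The hinge loss is active when the functional margin is below 1.
            violated = y * (_dot(w, x) + b) < 1.0
            # L2 shrinkage keeps the weights small, which widens the margin.
            w = [wi * (1.0 - learning_rate * regularization) for wi in w]
            if violated:
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Side of the hyperplane on which x falls: +1 or -1."""
    return 1 if _dot(w, x) + b >= 0.0 else -1


# Usage on a made-up, linearly separable two-feature data set.
samples = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

The feature vectors here are toy values; in the setting of the text they would be derived from the preprocessed spectrograms of the training data.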
In an embodiment, the maximum margin hyperplane adjusting module 706 is further configured to: obtain the charge-to-mass ratio, the mass deviation and the peak frequency of the microbial serum type according to the maximum margin hyperplane of the separation solution; determine whether the maximum margin hyperplane of the separation solution needs to be adjusted by comparing the charge-to-mass ratio with the charge-to-mass ratio interval, the mass deviation with the allowable mass deviation, and the peak frequency with the lowest peak frequency; if adjustment is needed, return to the step of obtaining the maximum margin hyperplane of the separation solution of the training data through the support vector machine algorithm; and if adjustment is not needed, take the maximum margin hyperplane of the separation solution as the maximum margin hyperplane of the target separation solution.
In one embodiment, as shown in fig. 8, there is provided a microbial serum typing device 800 comprising: a spectrogram acquisition module 802, a spectrogram identification module 804, and a serotype determination module 806, wherein:
a spectrogram acquiring module 802 for acquiring a spectrogram of a microorganism to be typed;
the spectrogram identification module 804 is used for identifying the microbial spectrogram to be typed through a microbial serum type classification model to obtain a microbial spectrogram identification result;
and a serotype determination module 806 for determining a microbial serotype according to the microbial spectrogram identification result.
For specific limitations on the microbial serum type classification model training device and the microbial serum typing device, reference may be made to the above limitations on the microbial serum type classification model training method and the microbial serum typing method, which are not repeated here. All or part of the modules in the microbial serum type classification model training device and the microbial serum typing device can be implemented by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of training a microbial serum type classification model. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of a method of training a classification model of microbial serum type as described above. Here, the steps of a method for training a microorganism serum type classification model may be steps of a method for training a microorganism serum type classification model according to the above embodiments.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, causes the processor to carry out the steps of a method of training a classification model of microbial serum type as described above. Here, the steps of a method for training a microorganism serum type classification model may be steps of a method for training a microorganism serum type classification model according to the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for training a classification model of a microbial serum type, which is characterized by comprising the following steps:
acquiring training data of the microorganism serum type classification model;
determining a maximum margin hyperplane of the separation solution of the training data;
adjusting the maximum margin hyperplane of the separation solution according to a preset charge-to-mass ratio interval, an allowable mass deviation and a lowest peak frequency to obtain a maximum margin hyperplane of the target separation solution;
and determining the microbial serum type classification model according to the maximum margin hyperplane of the target separation solution, so as to classify the microbial serum type according to the microbial serum type classification model.
2. The method of claim 1, wherein said acquiring training data of the microbial serum type classification model comprises:
acquiring microbial serum type data;
performing peak intensity normalization processing on the microbial serum type data to obtain normalized serum type data;
performing internal standard mass axis calibration on the normalized serum type data to obtain calibrated serum type data;
and denoising the calibrated serum type data to obtain a training data set of the microbial serum type classification model.
3. The method of claim 2, wherein said acquiring microbial serum type data comprises:
acquiring a microbial serum type spectrogram;
determining the credibility of the microbial serum type spectrogram;
and if the credibility is lower than a preset threshold value, deleting the microbial serum type spectrogram.
4. The method of claim 1, wherein said determining a maximum margin hyperplane of the separation solution of the training data comprises:
obtaining the maximum margin hyperplane of the separation solution of the training data through a support vector machine algorithm.
5. The method according to claim 4, wherein said adjusting the maximum margin hyperplane of the separation solution according to a preset charge-to-mass ratio interval, an allowable mass deviation and a lowest peak frequency to obtain a maximum margin hyperplane of the target separation solution comprises:
obtaining the charge-to-mass ratio, the mass deviation and the peak frequency of the microbial serum type according to the maximum margin hyperplane of the separation solution;
determining whether the maximum margin hyperplane of the separation solution needs to be adjusted by comparing the charge-to-mass ratio with the charge-to-mass ratio interval, comparing the mass deviation with the allowable mass deviation, and comparing the peak frequency with the lowest peak frequency;
if adjustment is needed, returning to the step of obtaining the maximum margin hyperplane of the separation solution of the training data through the support vector machine algorithm;
and if adjustment is not needed, taking the maximum margin hyperplane of the separation solution as the maximum margin hyperplane of the target separation solution.
6. A method of typing a microbial serum, the method comprising:
obtaining a spectrogram of a microorganism to be typed;
identifying the microbial spectrogram to be typed through a microbial serum type classification model to obtain a microbial spectrogram identification result;
and determining the microbial serum type according to the microbial spectrogram identification result.
7. A microbial serum type classification model training device, characterized in that the device comprises:
the training data acquisition module is used for acquiring training data of the microbial serum type classification model;
the maximum margin hyperplane determining module is used for determining a maximum margin hyperplane of the separation solution of the training data;
the maximum margin hyperplane adjusting module is used for adjusting the maximum margin hyperplane of the separation solution according to a preset charge-to-mass ratio interval, an allowable mass deviation and a lowest peak frequency to obtain a maximum margin hyperplane of the target separation solution;
and the model generation module is used for determining the microbial serum type classification model according to the maximum margin hyperplane of the target separation solution, so as to classify the microbial serum type according to the microbial serum type classification model.
8. A microbial serum typing device, said device comprising:
the spectrogram acquiring module is used for acquiring a spectrogram of a microorganism to be typed;
the spectrogram identification module is used for identifying the microbial spectrogram to be typed through a microbial serum type classification model to obtain a microbial spectrogram identification result;
and the serotype determination module is used for determining the microbial serum type according to the microbial spectrogram recognition result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011580705.7A CN112614547A (en) 2020-12-28 2020-12-28 Microbial serum type classification model training and microbial serum type classification method
PCT/CN2020/142354 WO2022141490A1 (en) 2020-12-28 2020-12-31 Microbial serotyping model training method and microbial serotyping method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011580705.7A CN112614547A (en) 2020-12-28 2020-12-28 Microbial serum type classification model training and microbial serum type classification method

Publications (1)

Publication Number Publication Date
CN112614547A true CN112614547A (en) 2021-04-06

Family

ID=75248285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011580705.7A Pending CN112614547A (en) 2020-12-28 2020-12-28 Microbial serum type classification model training and microbial serum type classification method

Country Status (2)

Country Link
CN (1) CN112614547A (en)
WO (1) WO2022141490A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103308696A (en) * 2013-05-30 2013-09-18 中国疾病预防控制中心传染病预防控制所 Brucella rapid detection kit based on mass-spectrometric technique
CN103344695A (en) * 2013-06-18 2013-10-09 中国疾病预防控制中心传染病预防控制所 Kit for rapid mass spectrometric detection of leptospira
CN111007139A (en) * 2020-03-09 2020-04-14 中国疾病预防控制中心传染病预防控制所 Rapid brucella infection detection method based on serum
US20200118650A1 (en) * 2018-10-10 2020-04-16 Shimadzu Corporation Mass spectrometer, mass spectrometry method, and non-transitory computer readable medium

Also Published As

Publication number Publication date
WO2022141490A1 (en) 2022-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510535 No. 16, Xinrui Road, Huangpu District, Guangzhou, Guangdong

Applicant after: GUANGZHOU HEXIN INSTRUMENT Co.,Ltd.

Address before: 510530 Room 102, building A3, No. 11, Kaiyuan Avenue, Huangpu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HEXIN INSTRUMENT Co.,Ltd.