WO2018121082A1 - Self-learning-type qualitative analysis method based on Raman spectrum


Publication number
WO2018121082A1
WO2018121082A1 PCT/CN2017/109712 CN2017109712W WO2018121082A1 WO 2018121082 A1 WO2018121082 A1 WO 2018121082A1 CN 2017109712 W CN2017109712 W CN 2017109712W WO 2018121082 A1 WO2018121082 A1 WO 2018121082A1
Authority
WO
WIPO (PCT)
Prior art keywords
substance
list
self
similarity
learning
Prior art date
Application number
PCT/CN2017/109712
Other languages
French (fr)
Chinese (zh)
Inventor
赵自然
王红球
杨内
苟巍
Original Assignee
同方威视技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 同方威视技术股份有限公司
Publication of WO2018121082A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/62 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light
    • G01N21/63 Systems in which the material investigated is excited whereby it emits light or causes a change in wavelength of the incident light optically excited
    • G01N21/65 Raman scattering
    • G01N21/658 Raman scattering enhancement Raman, e.g. surface plasmons

Definitions

  • The present disclosure relates to the field of Raman spectroscopy, and in particular to a self-learning qualitative analysis method based on Raman spectroscopy.
  • Raman spectroscopy is a non-contact spectroscopic technique based on the Raman scattering of excitation light. It can be used to analyze the composition of a substance both qualitatively and quantitatively.
  • Raman spectroscopy is a molecular vibrational spectroscopy that reflects the fingerprint characteristics of a molecule, and the Raman spectrum of each substance is unique. The composition of a substance under test is identified by comparing its measured Raman spectrum with a database of known Raman spectra of various substances. The technique can therefore be used for substance detection and has been widely applied in liquid security inspection, jewelry testing, explosives testing, drug testing, and other fields.
  • A conventional Raman spectroscopy detection device generally performs qualitative analysis against a spectral database and finally displays the measurement result.
  • The approximate workflow can be summarized as: collecting spectral data; preprocessing the acquired spectrum; comparing the preprocessed spectrum with the spectral library; obtaining the qualitative analysis result; and displaying the qualitative analysis result.
  • The similarity between the Raman spectra of two substances can be represented quantitatively, for example by a "similarity" parameter computed with one of the commonly used similarity functions.
  • This conventional Raman qualitative analysis method generally has high false alarm and false negative rates for substances of low purity. It merely compares the measured spectrum, exhaustively and mechanically, against the whole spectral database until a consistent match is obtained, so the analysis takes a long time. For samples whose compositions differ only slightly, a simple, repeated global comparison of Raman spectral similarity makes the samples difficult to distinguish by the similarity result. The conventional similarity calculation method and similarity discrimination threshold therefore also encounter difficulties.
  • The purpose of the present disclosure is to address at least one aspect of the above problems and deficiencies existing in the prior art.
  • The embodiments of the present disclosure provide a self-learning qualitative analysis method based on Raman spectroscopy that combines self-learning with manual comparison to complete the Raman spectral analysis. The method can reduce the incidence of false alarms and false negatives caused by insufficient substance purity in qualitative analysis, improve the accuracy of qualitative analysis, shorten the analysis processing time, and shorten the system startup time.
  • An embodiment of the present disclosure provides a self-learning qualitative analysis method based on Raman spectroscopy, comprising: a Raman spectrum acquisition step of collecting a Raman spectrum of an item to be measured; a feature extraction and comparison step of extracting the Raman spectral data and comparing it with the spectral feature library in the spectral library to obtain an original identified-substance ID list; a similarity comparison step of calculating, for the Raman spectrum, the similarity of each substance ID in the original identified-substance ID list to generate a similarity list, and comparing the similarity list with the similarity threshold library in the spectral library; and a substance ID selection step of performing, based on the self-learning library, verification detection on the similarity-identified substance ID list (the substance IDs whose similarity exceeds the similarity threshold after the comparison), the verification detection including false alarm detection and false negative detection.
  • When the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, the false alarm detection is performed; when the similarity list contains no such substance ID, the false negative detection is performed.
  • The false negative detection is additionally performed after the false alarm detection is performed.
  • Either of the false alarm detection and the false negative detection is configured to selectively perform three parallel substance ID selection methods: a statistical selection method, which makes a statistical selection over all false-alarm or false-negative substance IDs in the self-learning library; a feature recognition method, which makes a selection over the false-alarm or false-negative substance IDs whose "self-learning type" value in the self-learning library is "feature recognition"; and a secondary recognition method, which makes a selection over the false-alarm or false-negative substance IDs whose "self-learning type" value in the self-learning library is "secondary recognition".
  • Either of the false alarm detection and the false negative detection is configured to include a pre-processing step and a post-processing step. The pre-processing step comprises comparing the identified-substance ID list with, respectively, all false-alarm or false-negative substance IDs in the self-learning library, the false-alarm or false-negative substance IDs whose "self-learning type" value is "feature recognition", and the false-alarm or false-negative substance IDs whose "self-learning type" value is "secondary recognition", to generate the highest correct substance ID count for the statistical selection method, the feature recognition method, and the secondary recognition method, respectively.
  • The post-processing step comprises selectively performing the three substance ID selection methods based on a comparison of the highest correct substance ID counts of the statistical selection method, the feature recognition method, and the secondary recognition method with their respective count thresholds.
  • The identified-substance ID list in the pre-processing step of the false alarm detection is selected to be the similarity-identified substance ID list.
  • The identified-substance ID list in the pre-processing step of the false negative detection is selected to be the original identified-substance ID list.
  • The count threshold for the highest correct substance ID count obtained over all false-alarm or false-negative substance IDs in the self-learning library is set to be larger than the count threshold for the highest correct substance ID count obtained over the false-alarm or false-negative substance IDs whose "self-learning type" value in the self-learning library is one of "feature recognition" and "secondary recognition".
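  • The pre-processing and post-processing described above can be read as a counting step followed by a threshold test. The sketch below is one possible interpretation only: the record layout (`substance_id`, `self_learning_type`), the function names, and the rule that a method is selected when its count is strictly greater than its threshold are all assumptions, not details given in the text.

```python
from collections import Counter

def highest_count(candidate_ids, records, type_value=None):
    """Largest number of self-learning records that share one candidate ID.

    `records` is a hypothetical list of dicts with the keys
    "substance_id" and "self_learning_type".
    """
    counts = Counter(
        r["substance_id"]
        for r in records
        if r["substance_id"] in candidate_ids
        and (type_value is None or r["self_learning_type"] == type_value)
    )
    return max(counts.values(), default=0)

def choose_methods(candidate_ids, records, thresholds):
    """Return the selection methods whose count exceeds its own threshold."""
    counts = {
        # Statistical selection looks at every record in the library.
        "statistical selection": highest_count(candidate_ids, records),
        # The other two methods only look at records of their own type.
        "feature recognition": highest_count(
            candidate_ids, records, "feature recognition"),
        "secondary recognition": highest_count(
            candidate_ids, records, "secondary recognition"),
    }
    return [method for method, n in counts.items() if n > thresholds[method]]
```

  • In this reading, the threshold passed for "statistical selection" would be chosen larger than the other two, since its count is taken over the whole library.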
  • When the at least two generated identified-substance ID lists are equal, that list is confirmed as the identified-substance ID list after verification detection.
  • When the at least two generated identified-substance ID lists are not equal, their intersection is confirmed as the identified-substance ID list after verification detection.
  • The substance ID selection step is performed again for the portion of the at least two generated identified-substance ID lists that lies outside the intersection.
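  • The reconciliation of several candidate lists can be sketched with set operations. This is a minimal illustration, assuming that the intersection rule applies whenever the lists differ; the function name `reconcile` and the returned pair are hypothetical.

```python
def reconcile(id_lists):
    """Split candidate substance IDs into confirmed IDs and IDs to re-check.

    Equal lists are confirmed as they are; otherwise the intersection is
    confirmed and everything outside it is returned for another selection pass.
    """
    sets = [set(ids) for ids in id_lists]
    if all(s == sets[0] for s in sets):
        return sets[0], set()
    confirmed = set.intersection(*sets)
    recheck = set.union(*sets) - confirmed
    return confirmed, recheck
```

  • For example, `reconcile([["S1", "S2", "S3"], ["S2", "S3", "S4"]])` confirms `S2` and `S3` and returns `S1` and `S4` for the repeated selection step.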
  • The substance ID selection step performed again includes enhanced detection, in which an enhancer is added to the item to be measured to obtain an enhanced Raman spectrum.
  • The post-processing step of the false alarm detection is performed only when the counted number of false alarms is greater than a false alarm count threshold.
  • After the qualitative analysis of the item to be measured, the method further includes adding the resulting false-alarm substance ID list and false-negative substance ID list to the self-learning library according to the "self-learning type" field.
  • Before the qualitative analysis of the item to be measured, the method further includes creating the self-learning library either by performing initial learning or by inputting a preset initial self-learning library.
  • The method further comprises selectively identifying the substance using a manual comparison method.
  • An embodiment of the present disclosure further provides an electronic device, including: a memory for storing executable instructions; and a processor for executing the executable instructions stored in the memory to perform the method described above.
  • FIG. 1 shows a schematic diagram of a basic process according to an embodiment of the present disclosure, illustrated as divided into two phases: a learning phase and an actual detection phase;
  • FIG. 2 is a schematic diagram showing the overall flow of an actual detection stage according to an embodiment of the present disclosure as shown in FIG. 1;
  • FIGS. 3(a) and 3(b) respectively show schematic diagrams of Raman spectra before and after the preprocessing step in the overall flow of the actual detection phase shown in FIG. 2;
  • FIG. 4(a) shows an exemplary similarity list obtained in step S31 in the overall flow diagram shown in FIG. 2;
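  • FIG. 4(b) shows an exemplary threshold library for threshold comparison, included in the Raman spectral library and used in step S32 in the overall flow diagram shown in FIG. 2;
  • FIG. 4(c) shows an exemplary over-threshold substance list generated after the threshold comparison in step S32 in the overall flow diagram shown in FIG. 2;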
  • FIG. 4(d) shows schematic content of an exemplary self-learning library generated by step S10 in the overall flow diagram shown in FIG. 2;
  • FIG. 5 shows a basic schematic flow chart of false alarm detection in the actual detection phase as shown in FIG. 2;
  • Figure 6 is a schematic flow chart showing an extension of the "three method election" implementation of false alarm detection in the actual detection phase as shown in Figure 2;
  • Figure 7 is a schematic flow diagram of an extended exemplary embodiment of false alarm detection as shown in Figure 6;
  • Figure 8 is a schematic flow chart of another extended exemplary embodiment of false alarm detection as shown in Figure 6;
  • Figure 9 is a sub-flow diagram of the repeated false alarm detection performed using enhanced Raman spectroscopy in the other extended exemplary embodiment of false alarm detection shown in Figure 8, showing exemplary decomposition steps of the repeated false alarm detection shown in Figure 8;
  • Figure 10 shows a basic schematic flow chart of the false negative detection in the actual detection phase as shown in Figure 2;
  • Figure 11 is a schematic flow chart showing an extension of the "three method election" implementation of the false negative detection in the actual detection phase as shown in Figure 2;
  • Figure 12 is a schematic flow chart of an extended exemplary embodiment of the false negative detection shown in Figure 11;
  • Figure 13 is a schematic flow chart of another extended exemplary embodiment of the false negative detection shown in Figure 11;
  • Figure 14 is a sub-flow diagram of the repeated false negative detection performed using enhanced Raman spectroscopy in the other extended exemplary embodiment of the false negative detection shown in Figure 13, showing exemplary decomposition steps of the repeated false negative detection shown in Figure 13;
  • Figure 15 shows an operational schematic of the method of the embodiment of Figure 1 in accordance with the present disclosure;
  • FIG. 16 shows still another flow diagram according to an embodiment of the present disclosure, illustrated as divided into two phases, a learning phase and an actual detection phase, and showing a possible detection method when false alarms and false negatives occur simultaneously;
  • FIG. 17 is a block diagram showing an example hardware arrangement of an electronic device in accordance with still another embodiment of the present invention.
  • A self-learning qualitative analysis method based on Raman spectroscopy, comprising: a Raman spectrum acquisition step of acquiring a Raman spectrum of an item to be measured; a feature extraction and comparison step of extracting the Raman spectral data and comparing it with the spectral feature library in the spectral library to obtain an original identified-substance ID list; a similarity comparison step of calculating, for the Raman spectrum, the similarity of each substance ID in the original identified-substance ID list to generate a similarity list, and comparing the similarity list with the similarity threshold library in the spectral library; and a substance ID selection step of performing, based on the self-learning library, verification detection on the similarity-identified substance ID list (the substance IDs whose similarity exceeds the similarity threshold after the comparison), the verification detection including false alarm detection and false negative detection.
  • FIG. 1 shows a schematic diagram of a basic process according to an embodiment of the present disclosure. The process is divided into two phases: a learning phase and an actual detection phase.
  • In the learning phase, the main purpose is to establish a Raman spectral self-learning library for the samples used in actual testing.
  • In the actual detection phase, the self-learning library is used, in combination with manual comparison of Raman spectra, to detect the actual sample under test and obtain the result of the qualitative analysis.
  • The above learning phase can equivalently be regarded as a presetting or calibration phase of the self-learning library. It typically includes the following steps: measuring the Raman spectrum of a learning sample, for example by extracting its spectral features and comparing them with the spectral feature library; obtaining a similarity list from the comparison of spectral features and comparing it with the similarity threshold library; and determining whether any substance exceeds its threshold. Based on the result: (1) if a substance ID exceeds the similarity threshold listed in the threshold library, false alarm detection is performed, that is, it is determined whether a substance detected as exceeding the similarity threshold is in fact absent from the current learning sample; the false alarm detection selects the false-alarm substance ID, for example by comparison with the false-alarm substance IDs or names in the existing self-learning library and by selectively applying methods of different self-learning types; and (2) if no substance ID exceeds the listed similarity threshold, false negative detection is performed, which selects the false-negative substance ID, for example by comparison with the false-negative substance IDs or names in the existing self-learning library and by selectively applying methods of different self-learning types. It is then optionally determined whether to perform manual comparison, and manual comparison is selectively performed based on that determination. Finally, information such as the correctly identified substance IDs and their correction type (false alarm or false negative) is entered into the self-learning library as part of its initial preset values.
  • The above process can be performed separately for one or more learning samples, until no further learning samples require Raman spectrum acquisition and qualitative detection.
  • The actual detection phase described above can equivalently be regarded as a phase in which a sample under test is qualitatively analyzed on the basis of the generated self-learning library. It typically includes the following steps: measuring the Raman spectrum of the sample to be measured, for example by extracting its spectral features and comparing them with the spectral feature library; obtaining a similarity list from the comparison of spectral features and comparing it with the similarity threshold library; and determining whether any substance exceeds its threshold. Based on the result: (1) if a substance ID exceeds the similarity threshold listed in the threshold library, false alarm detection is performed, that is, it is determined whether any of the currently detected substances exceeding the similarity threshold is a false alarm; the false alarm detection selects the false-alarm substance ID, for example by comparison with the false-alarm substance IDs or names in the existing self-learning library and by selectively applying methods of different self-learning types; and (2) if no substance ID exceeds the listed similarity threshold, false negative detection is performed, which selects the false-negative substance ID in the same manner. It is then optionally determined whether to perform manual comparison, and manual comparison is selectively performed based on that determination. The identification result of the qualitative analysis is then displayed, and finally information such as the correctly identified substance IDs and their correction type (false alarm or false negative) is entered into the self-learning library.
  • With conventional Raman spectroscopy methods, if the measured sample is tested and judged directly on the basis of the original Raman spectral data alone, the accuracy of detection is difficult to guarantee for certain samples, such as samples of insufficient purity. If only manual comparison is used, the result usually depends on the experience of the tester, and objective, accurate test results cannot be obtained. A conventional Raman detection method at most generates an initial calibration sample database for direct comparison; it has no self-learning capability and therefore lacks flexibility when performing qualitative analysis, for example on substances in mixtures of different components. Moreover, conventional Raman spectroscopy methods generally suffer from long analysis processing times.
  • The self-learning qualitative analysis method based on Raman spectroscopy tests the measured sample using a combination of self-learning and manual comparison.
  • The self-learning library is continuously supplemented and refined, both by learning from learning samples in a preliminary learning phase and by learning from the results of qualitative analysis of different samples in actual use. This improves the accuracy and efficiency of the recognition results, so that the detection efficiency and accuracy of qualitative analysis based on Raman spectroscopy can be optimized, especially where insufficient substance purity prevents direct identification by conventional Raman detection methods.
  • The learning sample may, for example, be selected to be a sample of a substance whose spectrum has clear characteristic peaks, evenly distributed peak positions, and little interference. Moreover, it is desirable that the learning samples have relatively uniform peak spacing with a certain separation between peaks, to facilitate more accurate pre-learning.
  • The learning sample is, for example, a liquid or solid sample. Considering that the sample actually tested may be a mixture of a plurality of substances, the learning sample is selected, for example, as a mixture of a plurality of components in which no single component is of absolutely dominant purity, so as to suit the comparison requirements of later measurements.
  • the Raman spectrum of the learning sample has, for example, at least four characteristic peaks.
  • the greater number of characteristic peaks is beneficial to the accuracy of the initial learning to improve the accuracy of subsequent qualitative detection operations based on the self-learning library.
  • However, this is not essential, and the learning sample can also have, for example, two or three characteristic peaks.
  • On the one hand, an initial self-learning library may be established using representative learning sample items; on the other hand, the above learning phase is not mandatory.
  • The operator can perform a qualitative analysis of the sample substance to be measured using a self-learning library that was input in advance rather than a newly generated one.
  • The above pre-self-learning phase need not be performed long before the actual detection. For example, self-learning may instead be carried out while sample substances are being detected at the inspection site, with newly tested sample substances accumulated into the self-learning library during use.
  • The overall spectral library used in conventional Raman detection is subdivided into a plurality of sub-libraries: a spectral feature library, generated by extracting basic features of each spectrum such as the number of peaks, peak positions, and peak intensities, used for algorithmic comparison and identification, and loaded at software startup; a (similarity) threshold library, containing the similarity threshold for spectrum recognition, substance ID, library number, and other information, used for display processing, and loaded at software startup; and a substance name library, containing substance ID, name, alias, and other information, used in software display processing.
  • The subdivided sub-libraries are loaded separately for comparison at their respective detection steps, so it is not necessary to load the complete spectral library as a whole or repeatedly, which shortens the response time of each step and improves detection speed.
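  • The idea of loading each sub-library independently, and only once, can be sketched as follows. This is a minimal illustration under stated assumptions: the class `SubLibrary`, the loader callables, and the example records are hypothetical, and the sketch loads on first use rather than reproducing the startup-time loading described for the feature and threshold libraries.

```python
class SubLibrary:
    """A sub-library that is loaded once, the first time it is needed."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader      # callable returning the library content
        self._data = None
        self._loaded = False
        self.load_count = 0        # kept only to make the behaviour visible

    def get(self):
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
            self.load_count += 1
        return self._data


# Hypothetical content keyed by substance ID.
feature_library = SubLibrary(
    "spectral features",
    lambda: {"S001": {"peak_positions": [1001.0, 1602.0]}})
threshold_library = SubLibrary(
    "similarity thresholds", lambda: {"S001": 0.9})
name_library = SubLibrary(
    "substance names",
    lambda: {"S001": {"name": "example substance", "aliases": []}})
```

  • A display step would call `name_library.get()` without touching the other two sub-libraries, and repeated calls reuse the content already in memory.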
  • The actual detection phase includes, for example:
  • Step S0: start;
  • Step S1: generating the Raman spectrum to be detected and extracting the Raman spectral data;
  • Step S2: comparing the extracted Raman spectral data with the spectral feature library;
  • Step S3: using similarity calculation and similarity threshold comparison to generate a preliminarily determined substance list;
  • Step S4: determining whether any substance exceeds the threshold;
  • Step S5: performing false alarm detection when it is determined that a substance exceeds the threshold;
  • Step S6: performing false negative detection when it is determined that no substance exceeds the threshold;
  • Step S7: generating a list of substances confirmed by the false alarm (or false negative) detection;
  • Step S8: performing manual comparison detection of the Raman spectra;
  • Step S9: generating the list of substances confirmed by the final detection, and looking up the substance names in the substance name library;
  • Step S10: writing all results of the current detection into the self-learning library;
  • Step S11: displaying the detection result of the qualitative analysis, after which the current detection process ends.
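  • The branch at steps S4 to S7 can be sketched as a small dispatch function. The sketch is illustrative only: `verify`, `make_checks`, and the two simplified checks (dropping IDs recorded as false alarms, keeping IDs recorded as previously missed) are assumptions standing in for the selection methods described elsewhere in the text.

```python
def verify(original_ids, over_threshold_ids,
           false_alarm_check, false_negative_check):
    """Steps S4 to S7: choose the verification branch and return its list."""
    if over_threshold_ids:                            # S4: threshold exceeded
        return false_alarm_check(over_threshold_ids)  # S5
    return false_negative_check(original_ids)         # S6


def make_checks(known_false_alarms, known_false_negatives):
    """Build simplified checks from hypothetical self-learning records."""
    def false_alarm_check(ids):
        # Drop candidates the library has recorded as false alarms.
        return [i for i in ids if i not in known_false_alarms]

    def false_negative_check(ids):
        # Keep candidates the library has recorded as previously missed.
        return [i for i in ids if i in known_false_negatives]

    return false_alarm_check, false_negative_check
```

  • The false alarm branch works on the over-threshold list, while the false negative branch falls back to the original candidate list, matching the choice of input lists stated for the two pre-processing steps.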
  • Step S1 specifically includes:
  • Step S11: collecting a Raman spectrum, which can be obtained, for example, by known processes such as beam emission, collection, and splitting;
  • Step S12: preprocessing the collected Raman spectrum to obtain the original Raman spectrum to be tested;
  • Step S13: extracting spectral data from the original Raman spectrum to be tested.
  • The measured raw spectral data needs to be preprocessed as in step S12 above to facilitate the subsequent extraction of valid information.
  • The spectral preprocessing of step S12 generally includes interpolation, denoising, baseline correction, normalization, and the like; its main purpose is to smooth and denoise the input spectral signal. The spectral signals before and after preprocessing are shown in FIGS. 3(a) and 3(b), respectively.
  • The collected original spectrum generally needs to be preprocessed; for brevity, this is not repeated below.
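  • The four preprocessing operations named above can be sketched with NumPy. The specific choices here are assumptions made for illustration, since the text does not prescribe them: linear interpolation, a moving-average filter, a low-order polynomial baseline, and scaling to the range 0 to 1. The function name and parameters are hypothetical.

```python
import numpy as np

def preprocess(shift, intensity, grid, smooth_window=5, baseline_degree=2):
    """Interpolate, denoise, baseline-correct and normalise one spectrum.

    `shift` must be increasing; `smooth_window` must be odd.
    """
    # Interpolation: resample onto a common Raman-shift grid.
    y = np.interp(grid, shift, intensity)

    # Denoising: moving average, with edge padding to keep the length.
    pad = smooth_window // 2
    kernel = np.ones(smooth_window) / smooth_window
    y = np.convolve(np.pad(y, pad, mode="edge"), kernel, mode="valid")

    # Baseline correction: subtract a fitted low-order polynomial.
    # The axis is rescaled to keep the fit well conditioned.
    x = (grid - grid.mean()) / (grid.max() - grid.min())
    y = y - np.polyval(np.polyfit(x, y, baseline_degree), x)

    # Normalisation: shift to zero minimum, scale to unit maximum.
    y = y - y.min()
    peak = y.max()
    return y / peak if peak > 0 else y
```

  • A polynomial fitted to the whole spectrum is a crude baseline estimate; it is used here only to keep the sketch short.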
  • Step S3 specifically includes, for example:
  • Step S31: calculating the similarities to obtain a similarity list;
  • Step S32: comparing the similarity list with the similarity threshold library to obtain a list of substances exceeding the threshold.
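  • Step S32 amounts to a per-substance filter. A minimal sketch follows, assuming that both the similarity list and the threshold library are mappings from substance ID to a number; the function name, the fallback `default_threshold`, and the example IDs are hypothetical.

```python
def over_threshold(similarities, thresholds, default_threshold=0.9):
    """Return substance IDs whose similarity exceeds their own threshold,
    ordered from most to least similar."""
    hits = [
        sid for sid, score in similarities.items()
        if score > thresholds.get(sid, default_threshold)
    ]
    return sorted(hits, key=lambda sid: similarities[sid], reverse=True)


similarity_list = {"S001": 0.95, "S002": 0.85, "S003": 0.70}
threshold_library = {"S001": 0.9, "S002": 0.8, "S003": 0.9}
print(over_threshold(similarity_list, threshold_library))
```

  • An empty result corresponds to the "no substance exceeds the threshold" branch, which leads to false negative detection.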
  • FIG. 4(a) shows an exemplary similarity list obtained in step S31 in the overall flow diagram shown in FIG. 2;
  • FIG. 4(b) shows an exemplary threshold library for threshold comparison, included in the Raman spectral library and used in step S32 in the overall flow diagram shown in FIG. 2;
  • FIG. 4(c) shows an exemplary over-threshold substance list generated after the threshold comparison in step S32 in the overall flow diagram shown in FIG. 2;
  • FIG. 4(d) shows schematic content of an exemplary self-learning library generated in step S10 in the overall flow diagram shown in FIG. 2.
  • The qualitative analysis of the Raman spectrum of the sample to be measured is still based on the typical idea of Raman spectroscopy, namely comparison with a reference Raman spectrum: it is determined whether the error between the measured Raman spectrum of the sample and the reference Raman spectrum is within a predetermined range, for example by calculating the similarity between the two.
  • The similarity in step S31 above can be calculated by a variety of methods, for example on the basis of the Euclidean distance algorithm, an industry-standard algorithm for spectral search. More specifically, as an example, assume that the reference Raman spectrum curve of an already-learned sample is A(x) and the measured Raman spectrum curve of the sample to be measured is B(x).
  • In one example, a maximum likelihood algorithm based on the Euclidean distance algorithm is used.
  • The similarity between the two can be calculated by equation (1):
  • where Corr represents the similarity between the reference Raman spectrum of the already-learned sample and the measured Raman spectrum of the sample to be measured, and "·" indicates the dot product operation.
  • The similarity may also be calculated with an algorithm similar to the one described above, except that the mean of each spectrum is subtracted before the algorithm is executed.
  • A(x) and B(x) may be sampled separately to obtain n sampling points each, denoted A_1, A_2, ..., A_n and B_1, B_2, ..., B_n.
  • The similarity Corr between the learned reference Raman spectrum and the measured Raman spectrum of the sample to be measured can then be calculated according to formula (2):
  • where "·" also represents the dot product operation.
  • A(x) and B(x) may also be sampled separately to obtain n sampling points each, denoted A_1, A_2, ..., A_n and B_1, B_2, ..., B_n, and the similarity Corr between the learned reference Raman spectrum and the measured Raman spectrum of the sample to be measured can be calculated according to formula (3):
  • The above similarity calculation may be performed over the entire Raman spectrum, or only over the part of the Raman spectrum that contains the characteristic portion. The closer the similarity value is to 1, the higher the degree of similarity.
  • The similarity threshold may be set to 0.9, 0.8, and the like.
  • The similarity threshold is chosen, for example, according to the required detection sensitivity, the accuracy of the detection instrument, and the like.
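  • Formulas (1) to (3) are not reproduced in this text, so the sketch below does not claim to be the patent's formula. It assumes one common form that fits the description (built from dot products, bounded by 0 and 1, equal to 1 for proportional spectra): the squared dot product normalised by the two self dot products, plus a mean-centred variant. Function names are hypothetical.

```python
import numpy as np

def dot_similarity(a, b):
    """Assumed form: (A.B)^2 / ((A.A) (B.B)); lies in [0, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.dot(a, a) * np.dot(b, b)
    if denom == 0.0:
        return 0.0               # an all-zero spectrum matches nothing
    return float(np.dot(a, b) ** 2 / denom)

def centered_similarity(a, b):
    """Same measure after subtracting each spectrum's mean."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return dot_similarity(a - a.mean(), b - b.mean())
```

  • Mean-centring makes the measure insensitive to a constant offset: `[1, 2, 3]` and `[11, 12, 13]` score about 0.90 with `dot_similarity` but 1.0 with `centered_similarity`.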
  • the term "characteristic portion" refers to a key portion of a Raman spectrum curve of a sample to be tested that differs from other samples in a Raman spectrum curve.
  • the feature portion may be one or more feature peaks, feature valleys, phase inflection points, and the like.
  • the above similarity may be weighted based on the peak position, the peak width, and/or the peak height of the characteristic peak.
  • the feature peaks may also be searched and sorted prior to calculating the similarity.
  • Since the Raman spectrum of each substance reflects the molecular structure of that substance, it has unique structural and pattern characteristics.
  • A Raman spectrum can be expressed as a pattern vector in a pattern space, so that analyzing the similarity between N spectra is transformed into computing the similarity of N pattern vectors in the pattern space.
  • A similarity calculation such as the cosine (angle cosine) method, or the Jaccard similarity coefficient method based on the Jaccard distance, may be used. With these methods the HQI value is simple and fast to calculate, and the calculated value, like that of the similarity calculation based on the Euclidean distance algorithm described above, lies in a fixed range between 0 and 1, which makes it easy to evaluate. Further, an adjusted cosine similarity algorithm can also be selectively employed.
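  • The two measures named above can be sketched as follows. The cosine measure operates on sampled intensity vectors; for the Jaccard coefficient the sketch assumes that each spectrum is reduced to a set of binned peak positions, which is an illustrative choice rather than something stated in the text.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors.

    For non-negative intensities the value lies between 0 and 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / norm) if norm > 0 else 0.0

def jaccard_similarity(peaks_a, peaks_b):
    """Shared peak positions divided by all peak positions."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

  • For example, two spectra with binned peaks at `{1000, 1200, 1600}` and `{1000, 1600, 1700}` share two of four distinct positions, giving a Jaccard coefficient of 0.5.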
  • determining whether the error of the Raman spectrum and the reference Raman spectrum of the sample to be measured is within a predetermined range or directly passing peak intensity detection (amplitude detection) and peak position detection (Phase detection or inflection detection) to extract the information of the characteristic peaks, thereby directly comparing the measured Raman spectrum with the information of the characteristic peaks in the reference Raman spectrum.
  • the Raman spectrum is biased due to the difference in sample uniformity, instrument noise, fluorescence background, etc., and in the spectral processing process, denoising, baseline correction, etc. will also produce errors.
  • the accuracy of the substance recognition using only the similarity in the recognition process is not high. Therefore, in the embodiment of the present disclosure, the object to be inspected is further qualitatively analyzed, for example, by introducing a combination of the self-learning recognition method and the manual contrast recognition method.
  • Fig. 5 shows a basic schematic flow chart of the false alarm detecting step S5 in the actual detecting phase as shown in Fig. 2.
  • the false positive detection step S5 is further performed.
  • the false positive detection step S5 includes two stages: the false positive detection pre-processing steps S50, S50' and S50"; and the false positive detection post-processing step S51.
  • the false positive detection pre-processing steps S50, S50', and S50" are three logically parallel sub-flows, respectively corresponding to the nth (n = 1, 2, 3) substance ID selection method to be used in the subsequent post-processing step S51.
  • S50 corresponds to the first substance ID selection method, that is, verifying one by one using a statistical method, also called the "statistical selection" method.
  • S50' corresponds to the second substance ID selection method, that is, calling the corresponding algorithm of the preset "feature recognition interface" to select the verified substance ID, also referred to as the "feature recognition" method.
  • S50" corresponds to the third substance ID selection method, that is, calling the corresponding algorithm of the preset "secondary recognition interface" to select the verified substance ID, also referred to as the "secondary recognition" method.
  • S50 is also referred to as the pre-processing step of "statistical selection".
  • S50' is also referred to as the pre-processing step of "feature recognition".
  • S50" is also referred to as the pre-processing step of "secondary recognition".
  • "logically parallel" means that the above three pre-processing steps S50, S50' and S50" are performed independently of each other, for example substantially simultaneously, sequentially, or at mutually independent times.
  • the false positive detection pre-processing steps, that is, the pre-processing step S50 of "statistical selection", the pre-processing step S50' of "feature recognition", and the pre-processing step S50" of "secondary recognition", include, for example:
  • Steps S500, S500', S500": the false positive check subroutine starts.
  • Steps S501, S501', S501": the substance IDs in the list of recognized substance IDs whose similarity exceeds the threshold, acquired after the threshold comparison (hereinafter referred to as the "threshold identification list"), are sequentially compared with the (entire or corresponding single) "false positive substance ID" field in the self-learning library.
  • specifically, step S501 sequentially compares the IDs in the threshold identification list with the "false positive substance ID" field of the entire self-learning library; step S501' sequentially compares them with the "false positive substance ID" field of the records whose "self-learning type" field takes the value "feature recognition"; and step S501" sequentially compares them with the "false positive substance ID" field of the records whose "self-learning type" field takes the value "secondary recognition".
  • Steps S502, S502', S502": it is determined whether the same ID is matched (that is, whether a false positive substance ID is recognized as present).
  • Steps S503, S503', S503": if the same substance ID is matched, a false positive substance ID has been found, and the false positive count counter is incremented by one.
  • Steps S504, S504', S504": if the same substance ID is not matched, the current substance ID is not a false positive but is considered to be actually present, and the correct substance ID count counter is incremented by one.
  • Steps S505, S505', S505": it is determined whether the comparison of the identified substance ID list is complete. If it is not complete, the process returns to steps S501, S501', S501" and executes cyclically; if it is complete, the process proceeds to the next steps S506, S506', S506".
  • Steps S506, S506', S506": it is determined whether the number of false positives is greater than 10. If the number of false positives is less than or equal to 10, the number of false positives is considered insufficient to ensure that self-learning detection proceeds smoothly, and the process jumps to manual comparison recognition; if the number is greater than 10, the process enters the assignment step for the "highest correct substance ID times" field.
  • the false positive count threshold of 10 is an empirical value.
  • once the number of false positives exceeds this value, it is determined that enough false positives have been generated to produce a sufficiently large set of substance IDs to be verified, on which the subsequent post-processing step S51 performs substance ID selection.
  • Steps S507, S507', S507": the value of each current "correct substance ID count counter" is assigned to the corresponding "highest correct substance ID times" field MaxRightIDNum(n), which serves as the criterion by which the post-processing step S51 determines whether to perform the corresponding nth substance ID selection method.
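The counting loop of steps S501 to S507 can be sketched as below for a single sub-flow. This is an illustrative reading of the steps rather than the disclosed code: the function name, the use of a Python set for the "false positive substance ID" field, and returning `None` to signal the jump to manual comparison are all assumptions.

```python
def false_positive_precheck(threshold_list, false_positive_ids, min_false_positives=10):
    """Run one pre-processing sub-flow and return MaxRightIDNum(n), or None.

    threshold_list: substance IDs whose similarity exceeded the threshold.
    false_positive_ids: IDs stored in the "false positive substance ID" field
        of the relevant part of the self-learning library (assumed to be a set).
    """
    false_count = 0
    correct_count = 0
    for substance_id in threshold_list:          # S501 / S505: iterate the list
        if substance_id in false_positive_ids:   # S502: same ID matched?
            false_count += 1                     # S503: count a false positive
        else:
            correct_count += 1                   # S504: count a correct ID
    if false_count <= min_false_positives:       # S506: empirical limit of 10
        return None                              # fall back to manual comparison
    return correct_count                         # S507: value for MaxRightIDNum(n)


# Hypothetical data: 15 candidate IDs, 11 of which are known false positives.
candidates = [f"ID{i}" for i in range(15)]
library = {f"ID{i}" for i in range(11)}
print(false_positive_precheck(candidates, library))
```

Running the same function three times, once per library subset, would give the three values MaxRightIDNum(1), MaxRightIDNum(2) and MaxRightIDNum(3).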
  • the false positive detection post-processing step S51 includes, for example:
  • S511: it is judged whether, for the above three sub-flows S50, S50' and S50", the comparison "MaxRightIDNum(n) > corresponding threshold THR(n)" holds for at least two of them. This judgment is the criterion for deciding whether the highest correct substance ID count is sufficient to ensure execution of the corresponding substance ID selection method; if it is satisfied, at least two substance ID selection methods are available for acquiring at least two substance ID lists with which to jointly verify, in a program-controlled manner, the substance IDs that can be identified as present.
  • S514: at least two identical substance lists are taken as the list of identified substances that have been recognized separately and confirmed jointly by the corresponding at least two substance ID selection methods.
  • the thresholds THR(n) corresponding to the field MaxRightIDNum(n) are set separately, for example, for the "statistical selection" method, the "feature recognition" method, and the "secondary recognition" method.
  • the "feature recognition" method is a dimensionality reduction method used in pattern recognition to remove irrelevant or redundant features from the original feature set, and the "secondary recognition" method trains a classifier for classification and recognition by, for example, estimating the mean and covariance matrix after feature extraction, so both can reduce the number of features, improve detection accuracy, and reduce running time; the "statistical selection" method, by contrast, simply compares and confirms IDs one by one, so its reliability is lower than that of the "feature recognition" method or the "secondary recognition" method, which use pattern recognition. Accordingly, the first threshold THR(1) is set larger than the second threshold THR(2) and the third threshold THR(3).
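The S511 judgment, "does MaxRightIDNum(n) exceed THR(n) for at least two of the three sub-flows", can be sketched as follows. The values 5 and 6 for THR(2) and THR(3) are the ones used in the example of this description; the value 8 for THR(1) is a purely hypothetical placeholder chosen only to be larger than the other two, and the function names are assumptions.

```python
def methods_passing(max_right_id_num, thresholds):
    """Return the method indices n for which MaxRightIDNum(n) > THR(n)."""
    return [n for n in sorted(max_right_id_num) if max_right_id_num[n] > thresholds[n]]


def s511_holds(max_right_id_num, thresholds):
    """S511: at least two substance ID selection methods must be available."""
    return len(methods_passing(max_right_id_num, thresholds)) >= 2


# THR(1) = 8 is a hypothetical value; THR(2) = 5 and THR(3) = 6 follow the example.
THR = {1: 8, 2: 5, 3: 6}
counts = {1: 7, 2: 6, 3: 7}
print(methods_passing(counts, THR), s511_holds(counts, THR))
```

When the check fails, the flow described above falls back to manual comparison instead of running the selection methods.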
  • the "feature recognition" method is a dimensionality reduction method in pattern recognition for rejecting irrelevant or redundant features from the original feature set; in the embodiment of the present disclosure it is implemented, for example, by calling a plurality of feature recognition interfaces preset in the "feature recognition interface" field of the self-learning library, and may be selected as at least one of the following:
  • Filter method: an indicator is selected to characterize the importance of each feature, and the features are then sorted by their indicator values; feature selection is performed, for example, by setting a threshold and removing the features that do not reach it, or by setting the number of features to be selected and keeping the top N features or a certain top percentage of the ranking.
  • the weight assigned to each dimension represents the importance of that feature, and the features are then sorted by weight.
  • the usual filter method uses the characteristics of the training set itself to screen out feature subsets. Generally, the independence of the features or their relationship with the dependent variable is considered, using, for example, the chi-square test, information gain, or the correlation coefficient.
  • Wrapper method: the selection of a feature subset is essentially treated as a search optimization problem; different combinations (feature subsets) are generated, and each combination is evaluated and compared with the others, with, for example, the classification accuracy used as the measure of how good a feature subset is. Because subset selection is treated as an optimization problem, it can be solved by many optimization algorithms, especially heuristic optimization algorithms such as the genetic algorithm, the particle swarm optimization algorithm, the differential evolution algorithm, and the artificial bee colony algorithm. Wrapper methods include, for example, the recursive feature elimination algorithm.
  • Embedded method: certain machine learning algorithms and models are trained to obtain a weight coefficient for each feature, and features are then selected in descending order of weight coefficient. This is similar to the Filter method, except that the merit of each feature is determined by training, that is, the attributes that most improve the accuracy of the model are learned while the model is being built. Specifically, during model building, the features that matter most for training the model (for example, those contributing most to accuracy) are selected. The most common Embedded methods are regularization methods.
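Two of the feature selection styles listed above can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the interfaces of the self-learning library: the Filter example ranks features by the absolute Pearson correlation with a numeric class label, and the Wrapper example uses a simple greedy backward elimination scored by the training accuracy of a nearest-centroid classifier, which stands in for the heuristic optimizers named in the text. All function names and data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_select(samples, labels, top_n):
    """Filter style: rank feature columns by |correlation| and keep the top N."""
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_n]]


def centroid_accuracy(samples, labels, features):
    """Training accuracy of a nearest-centroid classifier on chosen features."""
    centroids = {}
    for label in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for sample, label in zip(samples, labels):
        point = [sample[j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(samples)


def wrapper_select(samples, labels, keep):
    """Wrapper style: greedily drop the feature whose removal scores best."""
    features = list(range(len(samples[0])))
    while len(features) > keep:
        candidates = [[f for f in features if f != drop] for drop in features]
        features = max(candidates, key=lambda sub: centroid_accuracy(samples, labels, sub))
    return features


# Hypothetical feature matrix: rows are samples, columns are peak features.
X = [[1.0, 5.0, 0.2], [2.0, 5.0, 0.9], [3.0, 5.0, 0.1], [4.0, 5.0, 0.8]]
y = [0, 0, 1, 1]
print(filter_select(X, y, top_n=2))
print(wrapper_select(X, y, keep=1))
```

The Filter ranking never trains a model, whereas the Wrapper search re-evaluates a classifier for every candidate subset, which is why wrapper methods are usually the more expensive of the two.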
  • the "secondary recognition" method is implemented, for example, by calling a plurality of secondary recognition interfaces preset in a "secondary recognition interface" field of the self-learning library, and is constructed, for example, using a quadratic discriminant function (QDF) classifier commonly used in pattern recognition, a modified quadratic discriminant function (MQDF) classifier, or the like. The classifier is trained by estimating the mean and covariance matrix; the covariance matrix reflects the spread among the features, and the greater the covariance, the more information is included and the more accurate the final classification.
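A quadratic discriminant classifier of the kind mentioned above can be sketched as follows. To stay within the standard library, this sketch makes a simplifying assumption that is not in the description: each class covariance matrix is taken to be diagonal, so the discriminant reduces to g_c(x) = -Σ_j [(x_j - μ_cj)² / σ²_cj + ln σ²_cj] with equal class priors. The function names, the small regularizer `eps`, and the data are hypothetical.

```python
import math


def train_qdf(samples, labels, eps=1e-6):
    """Estimate the per-class mean and diagonal variance of the features."""
    model = {}
    for label in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == label]
        dim = len(rows[0])
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
        # eps keeps the variance positive when a feature is constant in a class
        var = [
            sum((r[j] - mean[j]) ** 2 for r in rows) / len(rows) + eps
            for j in range(dim)
        ]
        model[label] = (mean, var)
    return model


def qdf_score(x, mean, var):
    """Quadratic discriminant value for one class (larger means more likely)."""
    return -sum((xi - m) ** 2 / v + math.log(v) for xi, m, v in zip(x, mean, var))


def classify(model, x):
    """Assign x to the class with the largest discriminant value."""
    return max(model, key=lambda label: qdf_score(x, *model[label]))


# Hypothetical two-feature training data for two substances "A" and "B".
train_x = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.5, 4.5], [4.5, 5.5]]
train_y = ["A", "A", "A", "B", "B", "B"]
model = train_qdf(train_x, train_y)
print(classify(model, [1.1, 0.9]), classify(model, [5.2, 4.9]))
```

A full QDF would use the complete covariance matrix and its inverse and determinant; the diagonal form keeps the same quadratic structure while ignoring correlations between features.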
  • the statistical selection substance list ID1 is obtained by statistical selection from the "false positive substance ID" field of the entire self-learning library; if MaxRightIDNum(2) > 5 holds, the feature recognition interface is called to obtain the feature recognition substance list ID2; if MaxRightIDNum(3) > 6 holds, the secondary recognition interface is called to obtain the secondary recognition substance list ID3.
  • substance identification verification is performed independently using at least two substance ID selection methods, and the confirmed substance ID lists are then compared. If they are identical, the list of identified substance IDs has been jointly confirmed by at least two independent methods in addition to the similarity judgment, which yields a more accurate self-learned list of identified substance IDs than conventional Raman spectroscopy detection based only on similarity judgment or on manual comparison.
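The list comparison described above can be sketched as a search for two methods that returned the same set of substance IDs. This is an illustrative sketch: comparing the lists as unordered sets, skipping methods that were not run (`None`), and the function name are assumptions rather than details given in the description.

```python
def jointly_confirmed(id_lists):
    """Return the ID list that at least two methods agree on, else None.

    id_lists: the lists ID1, ID2, ID3; an entry is None when the
    corresponding selection method was not executed.
    """
    as_sets = [frozenset(ids) for ids in id_lists if ids is not None]
    for i in range(len(as_sets)):
        for j in range(i + 1, len(as_sets)):
            if as_sets[i] == as_sets[j]:   # two methods produced the same list
                return sorted(as_sets[i])
    return None                            # no agreement between any two methods


# Hypothetical results of the three selection methods.
id1 = ["ethanol", "acetone"]
id2 = ["acetone", "ethanol"]
id3 = ["methanol"]
print(jointly_confirmed([id1, id2, id3]))
```

If no two lists agree, the caller would either fall back to manual comparison or, in the extended embodiments, examine the intersection of the lists.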
  • the process then jumps to S7 to generate the substance list confirmed by the false positive check.
  • Fig. 10 shows a basic schematic flow chart of the false negative detection in the actual detection phase as shown in Fig. 2.
  • the false negative (missed report) detection step S6 is further performed.
  • the false negative detection step S6 includes two stages: the false negative detection pre-processing steps S60, S60' and S60"; and the false negative detection post-processing step S61.
  • the false negative detection pre-processing steps S60, S60', and S60" are three logically parallel sub-flows, respectively corresponding to the nth (n = 1, 2, 3) substance ID selection method to be used in the subsequent post-processing step S61: S60 corresponds to the first, that is, the aforementioned "statistical selection" method; S60' corresponds to the second, that is, the aforementioned "feature recognition" method; and S60" corresponds to the third, that is, the aforementioned "secondary recognition" method.
  • S60 is also referred to as the pre-processing step of "statistical selection".
  • S60' is also referred to as the pre-processing step of "feature recognition".
  • S60" is also referred to as the pre-processing step of "secondary recognition".
  • "logically parallel" means that the above three pre-processing steps S60, S60' and S60" are executed independently of each other, for example substantially simultaneously, sequentially, or at mutually independent times.
  • the false negative detection pre-processing steps, that is, the pre-processing step S60 of "statistical selection", the pre-processing step S60' of "feature recognition", and the pre-processing step S60" of "secondary recognition", include, for example:
  • Steps S600, S600', S600": the false negative check subroutine starts.
  • Steps S601, S601', S601": the substance IDs in the original identification substance ID list are sequentially compared with the (entire or corresponding single) "false negative substance ID" field in the self-learning library.
  • specifically, step S601 sequentially compares the IDs in the original identification substance ID list with the "false negative substance ID" field of the entire self-learning library;
  • step S601' sequentially compares the IDs in the original identification substance ID list with the "false negative substance ID" field of the records whose "self-learning type" field takes the value "feature recognition";
  • step S601" sequentially compares the IDs in the original identification substance ID list with the "false negative substance ID" field of the records whose "self-learning type" field takes the value "secondary recognition".
  • Steps S602, S602', S602": it is judged whether the same ID is matched (that is, whether a false negative substance ID is recognized as present).
  • Steps S603, S603', S603": if the same substance ID is matched, a false negative substance ID has been found once, and the counter of the correct substance ID count (here equivalent to the number of false negative substance IDs) is incremented by one.
  • Steps S604, S604', S604": it is determined whether the comparison of the identification substance ID list is complete. If it is not complete, the process returns to steps S601, S601', S601" and executes cyclically; if it is complete, the process proceeds to the next steps S605, S605', S605".
  • Steps S605, S605', S605": the value of each current "correct substance ID count counter" is assigned to the corresponding "highest correct substance ID times" field MaxRightIDNum(n), which serves as the criterion by which the post-processing step S61 determines whether to perform the corresponding nth substance ID selection method.
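The three false negative sub-flows differ only in which part of the self-learning library they read, which can be sketched as below. The record layout (a list of dictionaries with `self_learning_type` and `false_negative_id` keys) and the function names are hypothetical; the description only names the corresponding fields.

```python
def count_matches(original_ids, library, learning_type=None):
    """Count original IDs found in the "false negative substance ID" field.

    learning_type=None reads the entire library (S601); otherwise only the
    records with the given "self-learning type" are read (S601', S601").
    """
    known = {
        record["false_negative_id"]
        for record in library
        if learning_type is None or record["self_learning_type"] == learning_type
    }
    return sum(1 for substance_id in original_ids if substance_id in known)


def false_negative_precheck(original_ids, library):
    """Return MaxRightIDNum(n) for n = 1, 2, 3 (steps S605, S605', S605")."""
    return {
        1: count_matches(original_ids, library),
        2: count_matches(original_ids, library, "feature recognition"),
        3: count_matches(original_ids, library, "secondary recognition"),
    }


# Hypothetical self-learning library records.
library = [
    {"self_learning_type": "feature recognition", "false_negative_id": "A"},
    {"self_learning_type": "secondary recognition", "false_negative_id": "B"},
    {"self_learning_type": "feature recognition", "false_negative_id": "C"},
]
print(false_negative_precheck(["A", "B", "D"], library))
```

Unlike the false positive sub-flows, no separate false positive counter is kept here: every match directly increments the count that becomes MaxRightIDNum(n).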
  • the false negative detection post-processing step S61 includes, for example:
  • S611: it is judged whether, for the above three sub-flows S60, S60' and S60", the comparison "MaxRightIDNum(n) > corresponding threshold THR(n)'" holds for at least two of them. This judgment is the criterion for deciding whether the highest correct substance ID count is sufficient to ensure execution of the corresponding substance ID selection method; if it is satisfied, at least two substance ID selection methods are available for acquiring at least two substance ID lists with which to jointly verify, in a program-controlled manner, the substance IDs that can be identified as present.
  • S614: at least two identical substance lists are taken as the list of identified substances that have been recognized separately and confirmed jointly by the corresponding at least two substance ID selection methods.
  • the selection and setting of the thresholds THR(n)' corresponding to the field MaxRightIDNum(n) are the same as or similar to those in false positive detection.
  • the first threshold THR(1)' is set larger than the second threshold THR(2)' and the third threshold THR(3)'.
  • the "feature recognition" method and the "secondary recognition” method are also the same or similar, and are respectively executed by calling a plurality of different "feature recognition interfaces" and a plurality of "secondary recognition interfaces".
  • the statistical selection substance list ID1' is obtained by statistical selection from the "false negative substance ID" field of the entire self-learning library; if the comparison for the field MaxRightIDNum(2) holds, the feature recognition interface is called to obtain the feature recognition substance list ID2'; if MaxRightIDNum(3)' > 6 holds, the secondary recognition interface is called to obtain the secondary recognition substance list ID3'.
  • in step S614, substance identification verification is performed independently using at least two substance ID selection methods, and the confirmed substance ID lists are then compared. If they are identical, the list of identified substance IDs has been jointly confirmed by at least two independent methods in addition to the similarity judgment, which yields a more accurate self-learned list of identified substance IDs than conventional Raman spectroscopy detection based only on similarity judgment or on manual comparison.
  • the process then jumps to S7 to generate the substance list confirmed by the false negative check.
  • FIG. 15 shows a schematic diagram of an operation for detecting a Raman spectrum of a test sample using a method according to an embodiment of the present disclosure.
  • the main processes in this example include:
  • Figure 6 shows a schematic flow chart of an extension of the "three method elections" implementation of false positive detection in the actual detection phase as shown in Figure 2.
  • the false positive detection flow S5 in the example of FIG. 6 differs from that in the example of FIG. 5 mainly in that, as shown in FIG. 6, after the at least two identified substance ID lists have been recognized separately (by the various substance ID selection methods), the false positive detection post-processing step S51 additionally includes an optional step S515, that is, a further "three-method election" based on "intersection". For the sake of brevity, the remaining sub-steps are not described again.
  • FIG. 7 is a schematic flow chart of a substantially extended exemplary embodiment of false alarm detection as shown in FIG. 6.
  • the false positive detection flow S5 in the example of FIG. 7 differs from that in the example of FIG. 6 mainly in that, as shown in FIG. 7, the optional step S515 of the false positive detection post-processing step S51 specifically includes, for example:
  • Step S5150: it is judged whether at least two of the generated substance lists ID1, ID2, and ID3 have an intersection. If yes, the process proceeds to step S5151; that is, the substance ID lists selected separately by at least two independent methods overlap, and the overlapping portion can be used to generate a list of jointly recognized substance IDs. Otherwise, the process jumps to manual comparison recognition.
  • Step S5151 In the case where step S5150 is established, the intersection is assigned to the first identification list.
  • the first identification list is directly used as a list of substances confirmed after the false positive check in the subsequent step S7.
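Steps S5150 and S5151 can be sketched as collecting every substance ID that was selected by at least two of the three methods. Reading "intersection of at least two of the lists" as "IDs present in two or more lists" is an interpretation made for this sketch, and the function name and the `None` return value for the manual fallback are assumptions.

```python
from collections import Counter


def first_identification_list(id_lists):
    """Return IDs selected by at least two methods, or None if there are none."""
    # Each list is de-duplicated first so that one method counts once per ID.
    counts = Counter(sid for ids in id_lists for sid in set(ids))
    shared = sorted(sid for sid, n in counts.items() if n >= 2)
    return shared or None  # None: jump to manual comparison recognition


# Hypothetical lists ID1, ID2 and ID3.
print(first_identification_list([["A", "B"], ["B", "C"], ["C", "D"]]))
```

The returned list plays the role of the first identification list that step S7 reports as confirmed.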
  • the extended flowchart of the false positive detection S5 shown in FIG. 7 further utilizes at least two independent methods after jointly identifying the identification substance ID list based on the similarity recognition and the determination of the same result using at least two independent methods.
  • Figure 8 is a schematic flow diagram of another further expanded exemplary embodiment of false alarm detection as shown in Figure 6.
  • the false positive detection flow S5 in the example of FIG. 8 differs from that in the example of FIG. 7 mainly in that, as shown in FIG. 8, for the substance ID lists selected by each of the at least two independent methods, the optional step S515 of the false positive detection post-processing step S51 not only confirms the intersection portion but also further verifies the non-intersection portion.
  • the optional step S515 of the false positive detection post-processing step S51 additionally includes:
  • S5152: the intersection is subtracted from the at least two substance lists ID1, ID2, and ID3 to obtain a list of substances to be rechecked.
  • S5154: it is judged whether a newly confirmed substance list has been generated after the enhanced false positive detection is performed again. If yes, the process proceeds to step S5155; otherwise, it goes to step S5156.
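The subtraction in step S5152 can be sketched as a set difference between everything the methods selected and the portion already confirmed by the intersection. This is a minimal sketch with a hypothetical function name; it assumes the confirmed intersection has already been computed elsewhere.

```python
def recheck_list(id_lists, confirmed):
    """S5152: IDs selected by some method but not yet jointly confirmed."""
    selected = set().union(*(set(ids) for ids in id_lists))
    return sorted(selected - set(confirmed))


# Hypothetical lists ID1, ID2, ID3 and their already confirmed intersection.
print(recheck_list([["A", "B"], ["B", "C"], ["C", "D"]], confirmed=["B", "C"]))
```

The resulting list is what the repeated, enhanced detection would examine before step S5154 checks whether anything new was confirmed.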
  • FIG. 9 is a sub-flow diagram of re-false alarm detection S5153 performed using enhanced Raman spectroscopy in another extended exemplary embodiment of false alarm detection as shown in FIG.
  • the figure shows exemplary decomposed steps of the repeated false positive detection S5153.
  • the repeated false positive detection S5153 includes, for example:
  • S51531 Acquire enhanced Raman spectroscopy by mixing the sample to be tested and the enhancer.
  • S51532: false positive detection is performed. Specifically, for example, the aforementioned step S5 is invoked in a nested manner on the basis of the enhanced Raman spectrum.
  • S51533: it is determined (for example, by human confirmation) whether to jump to manual comparison.
  • S51535 Generate a list of substances that are confirmed to be present by re-execution of false alarm detection using enhanced Raman spectroscopy.
  • the mixture of the sample to be tested and the enhancer is formed by directly mixing the sample to be tested with the enhancer, or by mixing an aqueous solution or an organic solution of the sample to be tested with the enhancer.
  • the enhancer may comprise any one of metal nanoparticle materials, metal nanowires, metal nanoclusters, carbon nanotubes, and carbon nanoparticles, or a combination thereof.
  • the enhancer may comprise a metal nanomaterial, and may also contain chloride ions, bromide ions, sodium ions, potassium ions, or sulfate ions.
  • the metal may include, for example, any one of gold, silver, copper, magnesium, aluminum, iron, cobalt, nickel, palladium, or platinum, or a combination thereof.
  • Figure 11 shows a schematic flow chart of an extension of the "three method elections" implementation of false negative detection in the actual detection phase as shown in Figure 2.
  • the false negative detection flow S6 in the example of FIG. 11 differs from that in the preferred embodiment of FIG. 10 mainly in that, as shown in FIG. 11, after the at least two identified substance ID lists have been recognized separately (by the various substance ID selection methods), the false negative detection post-processing step S61 additionally includes an optional step S615, that is, a further "three-method election" based on "intersection". For the sake of brevity, the remaining sub-steps are not described again.
  • FIG. 12 is a schematic flow chart of a substantially expanded exemplary embodiment of the false negative detection shown in FIG.
  • the optional step S615 of the false negative detection post-processing step S61 specifically includes:
  • Step S6150: it is judged whether at least two of the generated substance lists ID1', ID2', and ID3' have an intersection. If yes, the process proceeds to step S6151; that is, the substance ID lists selected separately by at least two independent methods overlap, and the overlapping portion can be used to generate a list of jointly recognized substance IDs. Otherwise, the process jumps to manual comparison recognition.
  • Step S6151 In the case where step S6150 is established, the intersection is assigned to the first identification list.
  • the first identification list is directly used as a list of substances confirmed by the missing report test in the subsequent step S7.
  • the extended flowchart of the false negative detection S6 shown in FIG. 12 further utilizes at least two independent methods after jointly identifying the identification substance ID list based on the similarity recognition and the determination of the same result using at least two independent methods.
  • FIG. 13 is a schematic flow chart of another further extended exemplary embodiment of the false negative detection shown in FIG.
  • the false negative detection flow S6 in the example of FIG. 13 differs from that in the example of FIG. 12 mainly in that, as shown in FIG. 13, for the substance ID lists selected by each of the at least two independent methods, the optional step S615 of the false negative detection post-processing step S61 not only confirms the intersection portion but also further verifies the non-intersection portion.
  • the optional step S615 of the false negative detection post-processing step S61 additionally includes:
  • S6152: the intersection is subtracted from the at least two substance lists ID1', ID2', and ID3' to obtain a list of substances to be rechecked.
  • S6154: it is judged whether a newly confirmed substance list has been generated after the enhanced false negative detection is performed again. If yes, the process proceeds to step S6155; otherwise, it goes to step S6156.
  • the false negative detection S6 of FIG. 13 builds on the example shown in FIG. 12 and essentially performs further analysis and verification of the portion outside the intersection that cannot be confirmed by the "intersection judgment". The specific steps are explained in detail below.
  • FIG. 14 is a sub-flowchart of the repeated false negative detection S6153 performed using enhanced Raman spectroscopy in another extended exemplary embodiment of the false negative detection shown in FIG. 13; it shows exemplary decomposed steps of the repeated false negative detection S6153.
  • the re-missing detection S6153 includes, for example:
  • step S61532 Perform a false negative detection. Specifically, for example, based on the enhanced Raman spectrum, the above-described step S6 is nested.
  • S61533: it is determined (for example, by human confirmation) whether to jump to manual comparison.
  • S61535 Generate a list of substances that are confirmed to be present by re-executing the false negative detection using the enhanced Raman spectrum.
  • the above specific operation flow has strict logic and can avoid the abnormal operation of the user.
  • the self-learning described above may also be replaced, for example, by a self-learning mixture analysis method.
  • FIG. 16 shows a further flow diagram in accordance with an embodiment of the present disclosure, illustrated as being divided into two phases, a learning phase and an actual detection phase, in which a detection manner regarding the simultaneous presence of false positives and false negatives is shown.
  • FIG. 17 is a block diagram showing an example hardware arrangement 100 of the electronic device.
  • the hardware arrangement 100 includes a processor 106 (e.g., a microprocessor (μP), a digital signal processor (DSP), etc.).
  • processor 106 may be a single processing unit or a plurality of processing units for performing different acts of the method steps described herein.
  • the arrangement 100 may also include an input unit 102 for receiving signals from other entities, and an output unit 104 for providing signals to other entities.
  • Input unit 102 and output unit 104 may be arranged as a single entity or as separate entities.
  • arrangement 100 can include at least one readable storage medium 108 in the form of a non-volatile or volatile memory, such as an electrically erasable programmable read only memory (EEPROM), flash memory, and/or a hard drive.
  • the readable storage medium 108 includes a computer program 110 that includes code/computer readable instructions that, when executed by the processor 106 in the arrangement 100, cause the hardware arrangement 100 and/or the device including the hardware arrangement 100 to perform the flows described above in connection with the above embodiments and any variations thereof.
  • Computer program 110 can be configured as computer program code having a computer program module 110A-110C architecture, for example.
  • the code in the computer program of arrangement 100 includes a plurality of modules, including but not limited to, for example, the illustrated modules 110A, 110B, and 110C, the plurality of modules being respectively configured to perform different determinations or operational steps, such as any of the processes, sub-processes, and/or steps shown in the previous Figures 1-2 and 5-16.
  • the computer program module can substantially perform the various actions in the flow described in the above embodiments to simulate the device.
  • when different computer program modules are executed in processor 106, they may correspond to the different units described above in the device.
  • although the code means in the embodiment disclosed above in connection with FIG. 17 are implemented as computer program modules that, when executed in processor 106, cause the hardware arrangement 100 to perform the actions described above in connection with the above-described embodiments, in alternative embodiments at least one of the code means can be implemented at least partially as a hardware circuit.
  • the processor may be a single CPU (Central Processing Unit), but may also include two or more processing units.
  • a processor can include a general purpose microprocessor, an instruction set processor, and/or a related chipset and/or a special purpose microprocessor (eg, an application specific integrated circuit (ASIC)).
  • The processor can also include onboard memory for caching purposes.
  • The computer program can be carried by a computer program product connected to the processor.
  • The computer program product can comprise a computer-readable medium on which the computer program is stored.
  • The computer program product can be flash memory, random access memory (RAM), read-only memory (ROM), or EEPROM, and in alternative embodiments the computer program modules described above can be distributed among different computer program products in the form of memory within the UE.
  • The present disclosure has at least the following advantage: by making full use of the similarity method, the self-learning method, and their combination with the optional manual recognition method, it achieves efficient and rapid spectral processing for substance recognition.

Abstract

Disclosed is a method for self-learning qualitative analysis based on a Raman spectrum. The method comprises: a Raman spectrum acquisition step of acquiring a Raman spectrum; a feature extraction and comparison step of extracting spectral data and comparing it with the spectral feature library of a spectrum library to obtain an original recognized substance ID list; a similarity comparison step of calculating, for the Raman spectrum, the similarity of the substance IDs in the original recognized substance ID list to generate a similarity list, and comparing the similarity list with a similarity threshold library in the spectrum library; and a substance ID selection step of verifying, based on a self-learning library, the list of similarity-recognized substance IDs whose similarity exceeds the threshold after that comparison. The verification comprises: performing false positive detection when the similarity list contains a substance ID whose similarity exceeds the threshold stored for that substance ID in the similarity threshold library; and performing false negative detection when the similarity list contains no such substance ID.

Description

Self-learning qualitative analysis method based on Raman spectroscopy
Cross-reference to related applications
This application claims the benefit of Chinese Patent Application No. 201611220308.2, filed with the Chinese Patent Office on December 26, 2016, the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of Raman spectroscopy detection, and in particular to a self-learning qualitative analysis method based on Raman spectroscopy.
Background
Raman spectroscopy is a non-contact spectral analysis technique based on the Raman scattering of excitation light; it can analyze the composition of a substance both qualitatively and quantitatively. A Raman spectrum is a molecular vibrational spectrum that reflects the fingerprint characteristics of a molecule, and the Raman spectrum of each substance is unique. The composition of a substance under test is identified by comparing its measured Raman spectrum with the spectra in a database of Raman spectra of known substances. The technique can therefore be used for substance detection and has been widely applied in fields such as liquid security inspection, jewelry testing, explosives detection, narcotics detection, and pharmaceutical testing.
In the prior art, a conventional Raman spectroscopy detection device usually performs qualitative analysis by searching a spectral database and finally displays the measurement result. Its workflow can be summarized as: acquiring spectral data; preprocessing the acquired spectrum; comparing the preprocessed spectrum with the spectrum library; obtaining the qualitative analysis result; and displaying the qualitative analysis result.
The similarity between the Raman spectra of two substances can be expressed quantitatively, for example, by a "similarity" parameter, which is commonly computed with a similarity function.
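The background does not say which similarity function is meant. As a minimal sketch, assuming cosine similarity between two intensity vectors sampled on the same Raman-shift grid (the function name and the choice of cosine similarity are illustrative assumptions, not taken from this document), a similarity score could be computed as follows:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors.

    Assumption: both spectra are sampled on the same Raman-shift grid.
    Cosine similarity is only one possible "similarity function".
    """
    if len(a) != len(b):
        raise ValueError("spectra must be sampled on the same grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A flat (all-zero) spectrum carries no shape information.
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this choice, two spectra with the same shape but different overall intensity score close to 1.0, which is one reason a scale-invariant measure is a plausible candidate here.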
However, this conventional Raman detection method for qualitative analysis usually has high false positive and false negative rates for substances of low purity. It merely repeats, mechanically, an exhaustive comparison against the spectral database until a matching result is obtained, so the analysis takes a long time. Moreover, when two samples differ only slightly in composition, a global, simply repeated comparison of Raman spectral similarity can hardly distinguish them from the similarity results, so the conventional similarity calculation methods and similarity discrimination thresholds also run into difficulty.
Therefore, there is an urgent need for an improved method of qualitative analysis of Raman spectra that has self-learning capability and can make full use of the similarity method, the self-learning method, and their combination with an optional manual recognition method to achieve efficient, rapid screening in spectral processing, and thereby fast-converging and accurate substance detection.
Summary of the invention
The present disclosure aims to address at least one aspect of the above problems and deficiencies of the prior art. Embodiments of the present disclosure provide a self-learning qualitative analysis method based on Raman spectroscopy that performs Raman detection by combining self-learning with manual comparison. It can reduce the incidence of false positives and false negatives caused by insufficient substance purity in qualitative analysis, improve the accuracy of qualitative analysis, shorten analysis time, and shorten system startup time.
To achieve the above object, according to a first aspect of the present disclosure, embodiments of the present disclosure provide a method for self-learning qualitative analysis based on Raman spectroscopy, comprising: a Raman spectrum acquisition step of acquiring the Raman spectrum of an item to be measured; a feature extraction and comparison step of extracting the Raman spectral data and comparing it with the spectral feature library in the spectrum library to obtain an original recognized substance ID list; a similarity comparison step of calculating, for the Raman spectrum, the similarity of each substance ID in the original recognized substance ID list to generate a similarity list, and comparing the similarity list with the similarity threshold library in the spectrum library; and a substance ID selection step of performing verification detection, based on a self-learning library, on the similarity-recognized substance ID list, that is, the list of substance IDs whose similarity exceeds the similarity threshold after the comparison. The verification detection includes false positive detection and false negative detection: when the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, false positive detection is performed; when the similarity list contains no such substance ID, false negative detection is performed.
According to an embodiment of the present disclosure, when the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, false positive detection is performed first and false negative detection is then additionally performed.
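The branching between false positive detection and false negative detection can be sketched as follows. This is an illustration under stated assumptions only: the similarity list and the threshold library are modeled as plain dictionaries keyed by substance ID, substance IDs without a stored threshold are skipped, and all function and parameter names are hypothetical.

```python
def exceeding_ids(similarity_list, threshold_library):
    """Return the substance IDs whose similarity exceeds their own threshold.

    Assumption: both arguments map substance ID -> float. IDs that have
    no entry in the threshold library are ignored.
    """
    return [
        sid
        for sid, similarity in similarity_list.items()
        if sid in threshold_library and similarity > threshold_library[sid]
    ]


def choose_checks(similarity_list, threshold_library, also_check_missed=False):
    """Decide which verification checks to run for one measured spectrum.

    With at least one exceeding ID, false positive detection runs; the
    optional flag models the embodiment in which false negative detection
    is additionally run afterwards. With no exceeding ID, only false
    negative detection runs.
    """
    hits = exceeding_ids(similarity_list, threshold_library)
    if hits:
        checks = ["false_positive"]
        if also_check_missed:
            checks.append("false_negative")
    else:
        checks = ["false_negative"]
    return hits, checks
```

The per-substance threshold lookup reflects the text's statement that each substance ID has its own stored similarity threshold rather than one global cut-off.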
According to an embodiment of the present disclosure, each of the false positive detection and the false negative detection is configured to selectively execute three parallel substance ID selection methods: a statistical selection method, in which statistical selection is performed over all false positive or false negative substance IDs in the self-learning library; a feature recognition method, in which selection by feature recognition is performed for the false positive or false negative substance IDs whose "self-learning type" in the self-learning library is "feature recognition"; and a secondary recognition method, in which selection by secondary recognition is performed for the false positive or false negative substance IDs whose "self-learning type" in the self-learning library is "secondary recognition".
According to an embodiment of the present disclosure, each of the false positive detection and the false negative detection is configured to include a pre-processing step and a post-processing step. The pre-processing step includes comparing the IDs in an identified substance ID list with, respectively, all false positive or false negative substance IDs in the self-learning library, the false positive or false negative substance IDs whose "self-learning type" is "feature recognition", and the false positive or false negative substance IDs whose "self-learning type" is "secondary recognition", so as to generate the highest correct-substance-ID count for the statistical selection method, the feature recognition method, and the secondary recognition method, respectively. The post-processing step selectively executes the three substance ID selection methods based on a comparison of the highest correct-substance-ID count of each method with its respective count threshold.
According to an embodiment of the present disclosure, the identified substance ID list in the pre-processing step of the false positive detection is the similarity-recognized substance ID list.
According to an embodiment of the present disclosure, the identified substance ID list in the pre-processing step of the false negative detection is the original recognized substance ID list.
According to an embodiment of the present disclosure, the count threshold applied to the highest correct-substance-ID count obtained over all false positive or false negative substance IDs in the self-learning library is set greater than the count threshold applied to the highest correct-substance-ID count obtained over the false positive or false negative substance IDs whose "self-learning type" is either "feature recognition" or "secondary recognition".
According to an embodiment of the present disclosure, when the highest correct-substance-ID counts of the statistical selection method, the feature recognition method, and the secondary recognition method are compared with their respective count thresholds, and the condition "highest correct-substance-ID count greater than count threshold" holds for at least two of them, those of the three parallel substance ID selection methods that satisfy the condition are then executed to generate at least two corresponding recognized substance ID lists.
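The threshold gate in the post-processing step can be illustrated with a small sketch. The method keys, the dictionary layout, and the returned flag are assumptions made for illustration; this passage does not specify what happens when fewer than two methods pass, so the sketch only reports that fact.

```python
METHODS = ("statistical", "feature", "secondary")


def eligible_methods(counts, thresholds):
    """Select the methods whose count strictly exceeds its own threshold.

    Assumption: `counts` and `thresholds` map each method key to an int,
    where a count stands for the highest correct-substance-ID count
    produced by the pre-processing step for that method.
    Returns the passing methods and whether at least two passed, which
    is the condition the text requires before the methods are executed.
    """
    passing = [m for m in METHODS if counts[m] > thresholds[m]]
    return passing, len(passing) >= 2
```

For example, with counts of 12, 4, and 1 against thresholds of 10, 3, and 3, the statistical and feature methods pass and the gate is open. Setting the statistical threshold above the other two follows the embodiment stated just before this one.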
According to an embodiment of the present disclosure, if the at least two generated recognized substance ID lists are equal, that list is confirmed as the recognized substance ID list after verification detection.
According to an embodiment of the present disclosure, if the at least two generated recognized substance ID lists intersect, the intersection is confirmed as the recognized substance ID list after verification detection.
According to an embodiment of the present disclosure, the substance ID selection step is performed again for the part of the at least two generated recognized substance ID lists that lies outside the intersection.
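The three embodiments above (equal lists, intersecting lists, and re-checking the remainder) amount to a set reconciliation. A minimal sketch, assuming each method returns a list of substance IDs and using a hypothetical function name:

```python
def reconcile(candidate_lists):
    """Split the candidate IDs into confirmed and unresolved groups.

    Assumption: `candidate_lists` holds at least two lists of substance
    IDs, one per executed selection method. IDs present in every list
    are confirmed; IDs present in only some lists are left unresolved,
    so that the selection step can be run again on them.
    """
    if len(candidate_lists) < 2:
        raise ValueError("need at least two candidate lists")
    sets = [set(ids) for ids in candidate_lists]
    confirmed = set.intersection(*sets)
    unresolved = set.union(*sets) - confirmed
    return sorted(confirmed), sorted(unresolved)
```

When all lists are equal, the intersection is the whole list and nothing is left unresolved, so the equal-list embodiment falls out as a special case of the intersection rule.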
According to an embodiment of the present disclosure, the substance ID selection step performed again includes enhanced detection, in which the item to be measured is mixed with an enhancer to obtain an enhanced Raman spectrum.
According to an embodiment of the present disclosure, in the pre-processing step of the false positive detection, the post-processing step of the false positive detection is performed only when the counted number of false positives is greater than a false positive count threshold.
According to an embodiment of the present disclosure, the method further includes, after the qualitative analysis of the item to be measured is complete, adding the obtained false positive substance ID list and false negative substance ID list to the self-learning library according to the "self-learning type" field.
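One way to picture the self-learning library update is as an append-only list of records. The record layout below is an assumption made for illustration (the actual library content is shown in FIG. 4(d), which is not reproduced here), and the field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    substance_id: str
    correction: str      # "false_positive" or "false_negative"
    learning_type: str   # e.g. "feature_recognition" or "secondary_recognition"


def record_results(library, false_positive_ids, false_negative_ids, learning_type):
    """Append the outcome of one qualitative analysis to the library."""
    for sid in false_positive_ids:
        library.append(LibraryEntry(sid, "false_positive", learning_type))
    for sid in false_negative_ids:
        library.append(LibraryEntry(sid, "false_negative", learning_type))
    return library


def ids_for(library, correction, learning_type=None):
    """Look up substance IDs by correction type and, optionally, learning type.

    Passing no learning type returns all entries of that correction type,
    which is the population the statistical selection method works over.
    """
    return [
        entry.substance_id
        for entry in library
        if entry.correction == correction
        and (learning_type is None or entry.learning_type == learning_type)
    ]
```

Keeping the "self-learning type" on every record lets the feature recognition and secondary recognition methods filter their own subset while the statistical method reads the whole library.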
According to an embodiment of the present disclosure, before qualitative analysis is performed on the item to be measured, the method further includes creating the self-learning library by one of: performing initial learning for the self-learning library using learning sample substances, and inputting a preset initial self-learning library.
According to an embodiment of the present disclosure, the method further includes selectively identifying the substance using a manual comparison method.
According to another aspect of the present disclosure, embodiments of the present disclosure further provide an electronic device, comprising: a memory for storing executable instructions; and a processor for executing the executable instructions stored in the memory to perform the method described above.
Brief description of the drawings
Embodiments of the present disclosure will now be described, by way of example only, with reference to the accompanying schematic drawings, in which corresponding reference numerals denote corresponding parts. A brief description of the drawings follows:
FIG. 1 is a schematic diagram of a basic flow according to an embodiment of the present disclosure, illustrated as divided into two stages, a learning stage and an actual detection stage;
FIG. 2 is a schematic diagram of the overall flow of the actual detection stage shown in FIG. 1 according to an embodiment of the present disclosure;
FIGS. 3(a) and 3(b) are schematic diagrams of a Raman spectrum before and after, respectively, the preprocessing step in the overall flow of the actual detection stage shown in FIG. 2;
FIG. 4(a) shows an exemplary similarity list obtained in step S31 of the overall flow shown in FIG. 2; FIG. 4(b) shows an exemplary threshold library, included in the Raman spectrum library, used for threshold comparison in step S32 of the overall flow shown in FIG. 2; FIG. 4(c) shows an exemplary list of substances exceeding the threshold, generated after the threshold comparison in step S32 of the overall flow shown in FIG. 2; FIG. 4(d) shows the schematic content of an exemplary self-learning library generated in step S10 of the overall flow shown in FIG. 2;
FIG. 5 is a basic schematic flowchart of false positive detection in the actual detection stage shown in FIG. 2;
FIG. 6 is a schematic flowchart of an extension, implementing a "three-method election", of the false positive detection in the actual detection stage shown in FIG. 2;
FIG. 7 is a schematic flowchart of one extended exemplary embodiment of the false positive detection shown in FIG. 6;
FIG. 8 is a schematic flowchart of another extended exemplary embodiment of the false positive detection shown in FIG. 6;
FIG. 9 is a sub-flowchart of the repeated false positive detection performed using an enhanced Raman spectrum in the extended exemplary embodiment of false positive detection shown in FIG. 8, showing exemplary detailed steps of the repeated false positive detection of FIG. 8;
FIG. 10 is a basic schematic flowchart of false negative detection in the actual detection stage shown in FIG. 2;
FIG. 11 is a schematic flowchart of an extension, implementing a "three-method election", of the false negative detection in the actual detection stage shown in FIG. 2;
FIG. 12 is a schematic flowchart of one extended exemplary embodiment of the false negative detection shown in FIG. 11;
FIG. 13 is a schematic flowchart of another extended exemplary embodiment of the false negative detection shown in FIG. 11;
FIG. 14 is a sub-flowchart of the repeated false negative detection performed using an enhanced Raman spectrum in the extended exemplary embodiment of false negative detection shown in FIG. 13, showing exemplary detailed steps of the repeated false negative detection of FIG. 13;
FIG. 15 is an operational schematic of the method of the embodiment shown in FIG. 1 according to the present disclosure;
FIG. 16 is a schematic diagram of a further flow according to an embodiment of the present disclosure, likewise illustrated as divided into a learning stage and an actual detection stage, showing a detection approach for the case where false positives and false negatives may both be present;
FIG. 17 is a block diagram showing an example hardware arrangement of an electronic device according to a further embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of the present disclosure more apparent, the technical solutions of the present disclosure are described in further detail below through embodiments and with reference to the accompanying drawings. In the specification, the same or similar reference numerals denote the same or similar parts. The following description of embodiments of the present disclosure with reference to the drawings is intended to explain the general inventive concept of the present disclosure and should not be construed as limiting the present disclosure.
In addition, in the following detailed description, numerous specific details are set forth for ease of explanation in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in diagrammatic form to simplify the drawings.
According to the general concept of the present disclosure, a self-learning qualitative analysis method based on Raman spectroscopy is provided, comprising: a Raman spectrum acquisition step of acquiring the Raman spectrum of an item to be measured; a feature extraction and comparison step of extracting the Raman spectral data and comparing it with the spectral feature library in the spectrum library to obtain an original recognized substance ID list; a similarity comparison step of calculating, for the Raman spectrum, the similarity of each substance ID in the original recognized substance ID list to generate a similarity list, and comparing the similarity list with the similarity threshold library in the spectrum library; and a substance ID selection step of performing verification detection, based on a self-learning library, on the similarity-recognized substance ID list, that is, the list of substance IDs whose similarity exceeds the similarity threshold after the comparison. The verification detection includes false positive detection and false negative detection: when the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, false positive detection is performed; when the similarity list contains no such substance ID, false negative detection is performed.
FIG. 1 shows a schematic diagram of a basic flow according to an embodiment of the present disclosure, illustrated as divided into two stages: a learning stage and an actual detection stage.
In the learning stage, the main purpose is to build a Raman spectral self-learning library of samples for use in actual detection. In the actual detection stage, the self-learning library is used, for example in combination with manual comparison of Raman spectra, to test the actual sample to be measured and obtain the qualitative analysis result.
The learning stage can equivalently be regarded as a presetting or calibration stage of the self-learning library. It typically includes, for example, the following steps: measuring the Raman spectrum of a learning sample, for instance by extracting its spectral features and comparing them with the spectral feature library; obtaining a similarity list, for example on the basis of the spectral feature comparison, and comparing it with the similarity threshold library; and determining whether any substance exceeds its threshold. Based on the result: (1) if the comparison with the threshold library yields a substance ID exceeding the similarity threshold listed there, false positive detection is performed (that is, determining whether any of the substances currently detected as exceeding the similarity threshold is a false positive that is not actually contained in the current learning sample), for example by comparing against the false positive substance IDs or names in the existing self-learning library and further selectively applying methods of different self-learning types to select the false positive substance IDs; and (2) if no substance ID exceeds the similarity threshold listed there, false negative detection is performed (that is, determining whether any substance is actually contained in the current learning sample but was measured as not exceeding the similarity threshold and so was missed), for example by comparing against the false negative substance IDs or names in the existing self-learning library and further selectively applying methods of different self-learning types to select the false negative substance IDs. Next, it is optionally determined whether to perform manual comparison, and manual comparison is selectively performed based on that determination. Finally, information such as the correctly recognized substance IDs and their correction types (false positive or false negative) is entered into the self-learning library as part of its initial preset values. This process can be carried out separately for one or more learning samples until no further learning sample requires Raman spectrum acquisition and qualitative detection.
The actual detection stage can equivalently be regarded as the stage in which the sample to be measured is qualitatively analyzed on the basis of the generated self-learning library. It typically includes, for example, the following steps: measuring the Raman spectrum of the sample to be measured, for instance by extracting its spectral features and comparing them with the spectral feature library; obtaining a similarity list, for example on the basis of the spectral feature comparison, and comparing it with the similarity threshold library; and determining whether any substance exceeds its threshold. Based on the result: (1) if the comparison with the threshold library yields a substance ID exceeding the similarity threshold listed there, false positive detection is performed (that is, determining whether any of the substances currently detected as exceeding the similarity threshold is a false positive that is not actually contained in the current sample to be measured), for example by comparing against the false positive substance IDs or names in the existing self-learning library and further selectively applying methods of different self-learning types to select the false positive substance IDs; and (2) if no substance ID exceeds the similarity threshold listed there, false negative detection is performed (that is, determining whether any substance is actually contained in the current sample to be measured but was measured as not exceeding the similarity threshold and so was missed), for example by comparing against the false negative substance IDs or names in the existing self-learning library and further selectively applying methods of different self-learning types to select the false negative substance IDs. Next, it is optionally determined whether to perform manual comparison, and manual comparison is selectively performed based on that determination. The recognition result of the qualitative analysis is then displayed. Finally, information such as the correctly recognized substance IDs and their correction types (false positive or false negative) is entered into the self-learning library as part of the initial preset values of the self-learning library.
对于常规的拉曼光谱测量方法而言,如果仅采用直接对待实测样品进行检测并根据原始拉曼光谱数据进行判定,在一些情况下检测的准确性对于某些样品例如纯度不足的样品难以保证;且如果仅采用人工比对的方法,通常基于检测者的经验进行,也无法得到客观准确的测试结果;并且,常规的拉曼光谱检测方法至多生成初始的标定样品数据库用于直接对比,没有自学习能力,在例如对不同组分的混合物中的物质执行定性分析时适应灵活性不足。而且,常规的拉曼光谱测量方法普遍存在分析处理时间较长的问题。For conventional Raman spectroscopy methods, if only the measured samples are directly tested and judged based on the original Raman spectroscopy data, in some cases the accuracy of the detection is difficult to guarantee for certain samples such as samples of insufficient purity; And if only the artificial comparison method is used, usually based on the experience of the tester, objective and accurate test results cannot be obtained; and the conventional Raman spectroscopy detection method generates at most the initial calibration sample database for direct comparison, and there is no self. Learning ability to adapt to lack of flexibility when performing qualitative analysis, for example, on substances in mixtures of different components. Moreover, the conventional Raman spectroscopy method generally has a problem that the analysis processing time is long.
从上述如图1所示的本公开实施例的示意性基础流程图可知,根据本公开的实施例的基于拉曼光谱的自学习式定性分析方法利用了自学习与人工对比相结合的方式对待实测样品进行检测。这种方式诸如通过在预先进行的学习阶段中利用学习样本进行的学习、以及在实际使用中对于不同的待测物质样品的定性分析的结果,来不断对自学习库进行增补完善,从而通过自学习而提升识别结果准确度和效率,使得能够将对基于拉曼光谱的物质定性分析的检测效率和检测准确性进行最佳的优化,特别是在物质纯度不足,无法被常规的拉曼检测方法所直接识别的情况下。It can be seen from the above-described schematic basic flowchart of the embodiment of the present disclosure as shown in FIG. 1 that the self-learning qualitative analysis method based on Raman spectroscopy according to an embodiment of the present disclosure utilizes a combination of self-learning and manual comparison. The measured sample is tested. In this way, the self-learning library is continuously supplemented and perfected, such as learning by using learning samples in a pre-staged learning phase, and the results of qualitative analysis of different samples of the substance to be tested in actual use. Learning to improve the accuracy and efficiency of the recognition results, so that the detection efficiency and detection accuracy of qualitative analysis based on Raman spectroscopy can be optimally optimized, especially in the case of insufficient material purity, which cannot be used by conventional Raman detection methods. In case of direct identification.
As an example, to prevent the learning sample itself from introducing error into the calibration, the learning sample may be selected as a substance sample whose generated spectrum has clear characteristic peaks, evenly distributed peak positions, and little interference. It is also desirable that the peak positions of the learning sample be evenly spaced with a certain separation, which allows more accurate pre-learning. In embodiments of the present disclosure, the learning sample is, for example, a liquid or solid sample. In addition, considering that the samples actually measured are usually mixtures of several substances, the learning sample is, for example, chosen as a mixture of several components in which no single component is overwhelmingly dominant in purity, to suit the comparison and matching required in later measurements.
As an example, the Raman spectrum of the learning sample has at least four characteristic peaks. A larger number of characteristic peaks benefits the accuracy of the initial learning and, in turn, the accuracy of subsequent qualitative detection based on the self-learning library. This is not mandatory, however; the learning sample may also have, for example, two or three characteristic peaks.
In the Raman-spectrum-based self-learning qualitative analysis method according to the present disclosure, on the one hand, an initial self-learning library may first be built from representative learning samples. On the other hand, this learning phase is not essential: for example, the operator can perform qualitative analysis of the sample under test using a self-learning library that was input in advance, rather than one newly generated by training. Furthermore, the preliminary self-learning phase need not take place long before actual detection; for example, self-learning may instead be performed while measured samples are being tested at the inspection site, with entries for newly tested sample substances accumulated into the self-learning library during use.
FIG. 2 shows the overall flow of the actual detection phase according to the embodiment of the present disclosure shown in FIG. 1. In the qualitative analysis of embodiments of the present disclosure, to shorten the analysis processing time and the system start-up time, the single spectral library used in ordinary Raman detection is subdivided into several sub-libraries: a spectral feature library, generated for example by extracting basic features of each spectrum such as the number of peaks, peak positions, and peak intensities, which is used by the comparison and identification algorithm and is loaded at software start-up; a (similarity) threshold library, containing the minimum similarity threshold for identifying a spectrum, the substance ID, the library number, and similar information, which is used for display processing and is loaded at software start-up; and a substance name library, containing the substance ID, name, aliases, and similar information, which is used by the software during display processing. Each sub-library is thus loaded for comparison at the detection step that needs it, instead of the complete spectral library being loaded as a whole or loaded repeatedly, which shortens the response time of each step and increases detection speed.
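The split into sub-libraries can be pictured with a small data-structure sketch. Everything below is an illustrative assumption rather than the layout used in the disclosure: the class and field names are hypothetical, and loading the substance name library lazily on first display is one possible reading of "used during display processing".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectralFeature:
    """One entry of the spectral feature library (hypothetical layout)."""
    substance_id: int
    peak_positions: tuple    # Raman shift of each peak
    peak_intensities: tuple  # intensity of each peak


@dataclass(frozen=True)
class ThresholdEntry:
    """One entry of the (similarity) threshold library (hypothetical layout)."""
    substance_id: int
    library_no: int
    min_similarity: float


@dataclass(frozen=True)
class NameEntry:
    """One entry of the substance name library (hypothetical layout)."""
    substance_id: int
    name: str
    aliases: tuple = ()


class SubLibraries:
    """Feature and threshold libraries are held from start-up;
    the name library is fetched only when a name is first displayed."""

    def __init__(self, features, thresholds, name_loader):
        self.features = {f.substance_id: f for f in features}
        self.thresholds = {t.substance_id: t for t in thresholds}
        self._name_loader = name_loader  # callable returning NameEntry items
        self._names = None               # not loaded yet

    def name_of(self, substance_id):
        if self._names is None:          # load once, on first use
            self._names = {n.substance_id: n for n in self._name_loader()}
        return self._names[substance_id].name
```

Keeping the three libraries keyed by substance ID means the identification step only touches the compact feature and threshold tables, and the name lookup happens once per displayed result.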
As shown in FIG. 2, the actual detection phase includes, for example:
Step S0: start;
Step S1: generate the Raman spectrum to be tested and extract the Raman spectral data;
Step S2: compare the extracted Raman spectral data with the spectral feature library;
Step S3: use similarity calculation and similarity threshold comparison to generate a preliminary list of identified substances;
Step S4: determine whether any substance exceeds the threshold;
Step S5: if a substance exceeding the threshold is found, further perform false-positive checking;
Step S6: if no substance exceeding the threshold is found, further perform false-negative checking;
Step S7: generate the list of substances confirmed by the false-positive (or false-negative) check;
(Optional) Step S8: manual comparison of the Raman spectra;
Step S9: generate the final confirmed substance list and look up the substance names in the substance library;
Step S10: write all results of the current test into the self-learning library;
and Step S11: display the result of the qualitative analysis; the current detection process ends.
As an exemplary embodiment, step S1 above further includes, for example:
Step S11: acquire the Raman spectrum, for example through known processes such as beam emission, collection, and dispersion;
Step S12: preprocess the acquired Raman spectrum to obtain the original Raman spectrum to be tested;
Step S13: extract spectral data from the original Raman spectrum to be tested.
The raw sample spectrum acquired by a Raman spectrometer contains interference such as fluorescence background, detector (CCD) noise, and fluctuations in emitter power, all of which affect subsequent comparison and signal processing. The measured raw spectral data therefore needs the preprocessing of step S12 above so that useful information can be extracted afterwards. The spectral preprocessing of step S12 generally includes interpolation, denoising, baseline correction, normalization, and the like; its main purpose is to smooth and denoise the input spectral signal. The spectral signals before and after preprocessing are shown in FIG. 3(a) and FIG. 3(b), respectively. In embodiments of the present disclosure, acquired raw spectra generally need to be preprocessed; for brevity, this is not repeated below.
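A minimal sketch of such a preprocessing chain is given below. It is not the preprocessing used in the disclosure: the moving-average smoother, the constant-offset baseline (subtracting the minimum), and the max-normalization are deliberately simple stand-ins chosen for illustration, and the function names are hypothetical. Interpolation onto a common Raman-shift grid is omitted.

```python
def moving_average(values, window=3):
    """Smooth a spectrum with a centered moving average (edges use a shorter window)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def preprocess(intensities, window=3):
    """Denoise, remove a constant baseline, and normalize to a maximum of 1."""
    smoothed = moving_average(intensities, window)   # denoising
    baseline = min(smoothed)                         # crude baseline estimate
    corrected = [v - baseline for v in smoothed]     # baseline correction
    peak = max(corrected)
    if peak == 0:                                    # flat spectrum: nothing to scale
        return corrected
    return [v / peak for v in corrected]             # normalization
```

A real fluorescence background is rarely a constant offset, so a practical implementation would replace the baseline step with a polynomial or iterative fit.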
In addition, step S3 above specifically includes, for example:
Step S31: calculate and obtain the similarity list;
Step S32: compare the similarity list with the similarity threshold library and obtain the list of substances that exceed the threshold.
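Steps S31 and S32 amount to a per-substance threshold filter, which might be sketched as follows. The dictionary layout, the function name, and the descending sort of the result are assumptions made for this sketch; the source only states that substances exceeding their threshold are collected.

```python
def over_threshold(similarities, thresholds):
    """Return (substance_id, similarity) pairs whose similarity exceeds
    that substance's minimum threshold, best match first.

    similarities: {substance_id: similarity}       (output of step S31)
    thresholds:   {substance_id: min_similarity}   (the threshold library)
    """
    hits = [
        (sid, sim)
        for sid, sim in similarities.items()
        if sid in thresholds and sim > thresholds[sid]   # step S32 comparison
    ]
    hits.sort(key=lambda pair: pair[1], reverse=True)    # ordering is an assumption
    return hits
```

An empty result corresponds to the "no substance exceeds the threshold" branch of step S4, which leads to the false-negative check of step S6.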
For clarity in the above embodiment, FIG. 4(a) shows an exemplary similarity list obtained in step S31 of the overall flow shown in FIG. 2; FIG. 4(b) shows an exemplary threshold library, contained in the Raman spectral library, used for threshold comparison in step S32; FIG. 4(c) shows an exemplary list of over-threshold substances generated after the threshold comparison in step S32; and FIG. 4(d) shows schematic content of an exemplary self-learning library generated in step S10.
Qualitative analysis of the Raman spectrum of the sample under test still follows the typical approach to Raman identification, namely comparison with a reference Raman spectrum: it is determined whether the error between the measured Raman spectrum of the sample and the reference Raman spectrum lies within a predetermined range, for example by calculating the similarity between the two. The similarity in step S31 above can be calculated in several ways, for example with the Euclidean distance algorithm, which is an industry-standard algorithm for spectral search. More specifically, let the reference Raman spectrum curve of a learned sample be A(x) and the measured Raman spectrum curve of the sample under test be B(x). In one example, using a maximum likelihood algorithm based on the Euclidean distance algorithm, the similarity between the two can be calculated by formula (1):
Figure PCTCN2017109712-appb-000001
where Corr denotes the similarity between the reference Raman spectrum of the learned sample and the measured Raman spectrum of the sample under test, and "·" denotes the dot product.
In another alternative example, the similarity is calculated with an algorithm similar to the one above, but the mean of each spectrum is subtracted before the algorithm is applied. Specifically, A(x) and B(x) may each be sampled at n points, denoted A_1, A_2, …, A_n and B_1, B_2, …, B_n respectively; the similarity Corr between the learned reference Raman spectrum and the measured Raman spectrum of the sample under test can then be calculated by formula (2):
Figure PCTCN2017109712-appb-000002
where "·" again denotes the dot product.
In yet another alternative example, A(x) and B(x) may likewise each be sampled at n points, denoted A_1, A_2, …, A_n and B_1, B_2, …, B_n respectively; the similarity Corr between the learned reference Raman spectrum and the measured Raman spectrum of the sample under test can then be calculated by formula (3):
Figure PCTCN2017109712-appb-000003
The similarity calculation above may be performed over the entire Raman spectrum or only over local regions of the spectrum that contain characteristic portions. The closer the similarity value is to 1, the more similar the two spectra are. These are only examples; other similarity calculations known to those skilled in the art are also feasible. Whether the error between the measured Raman spectrum of the sample under test and the learned reference Raman spectrum lies within the predetermined range can be decided by checking whether the similarity exceeds a given threshold. As an example, the similarity threshold may be set to 0.9, 0.8, and so on. The threshold is chosen, for example, according to the detection sensitivity actually required, the precision of the detection instrument, and similar factors.
In the present disclosure, the term "characteristic portion" refers to a key part of the Raman spectrum curve of a sample that distinguishes it from the Raman spectrum curves of other samples. For example, the characteristic portion may be one or more characteristic peaks, characteristic valleys, phase inflection points, and so on. Where the Raman spectrum curve of the sample under test includes characteristic peaks, the similarity may, for example, be calculated with weights based on the peak position, peak width, and/or peak height of those peaks. In one example, the characteristic peaks may also be searched and sorted before the similarity is calculated.
The above are only some examples of similarity calculation; other similarity calculations known to those skilled in the art are also feasible. For example, instead of the similarity algorithm above, which is based on the Euclidean distance used as the industry-standard algorithm for spectral search, the similarity may be calculated from the Minkowski distance with a value of p other than the p = 2 of the Euclidean distance. The Minkowski distance is given by formula (4) below; with p = 2 it reduces to the Euclidean distance.
Figure PCTCN2017109712-appb-000004
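For reference, the standard textbook Minkowski distance between two sampled spectra is the p-th root of the sum of |A_i − B_i|^p over the n sampling points. The sketch below implements that standard definition; it is not copied from formula (4), whose exact notation is not reproduced here, and the function name is hypothetical. How a distance is mapped to a similarity score is left open because the text does not specify it.

```python
def minkowski_distance(a, b, p=2.0):
    """Standard Minkowski distance between two equally sampled spectra.

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
    """
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of sampling points")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
```

For example, `minkowski_distance([0, 0], [3, 4], p=2)` is the familiar 3-4-5 Euclidean case, while `p=1` sums the absolute differences instead.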
As a further embodiment, the Raman spectrum of each substance reflects the molecular structure of that substance and therefore has distinctive structural and pattern characteristics. If the number of spectral data points is taken as the dimension of a pattern space, a Raman spectrum can be expressed as a pattern vector in that space, and analyzing the similarity among N spectra becomes calculating the similarity of N pattern vectors in the pattern space. Accordingly, the similarity may alternatively be calculated with the cosine (included-angle cosine) method or with the Jaccard similarity coefficient based on the Jaccard distance. These make the calculation of the HQI value simple and fast, and, like the similarity based on the Euclidean distance algorithm above, the result lies in a fixed interval from 0 to 1 and is easy to evaluate. An adjusted cosine similarity algorithm may optionally be used as well.
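The two alternative measures named above can be sketched in a few lines. The cosine function follows the usual definition for vectors; for non-negative intensities its value lies between 0 and 1. Applying the Jaccard coefficient to sets of (binned) peak positions is an assumption made for this sketch, since the text does not say which sets are compared, and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra treated as pattern vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:   # an all-zero spectrum matches nothing
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(peaks_a, peaks_b):
    """Jaccard coefficient of two sets of binned peak positions (assumed usage)."""
    set_a, set_b = set(peaks_a), set(peaks_b)
    if not set_a and not set_b:      # two empty sets are treated as identical
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Because the cosine measure ignores overall scale, two spectra that differ only by a constant intensity factor score as identical, which is often desirable when laser power or integration time varies between measurements.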
As an example, additionally or alternatively, whether the error between the Raman spectrum of the sample under test and the reference Raman spectrum lies within the predetermined range may also be determined by extracting characteristic-peak information directly through peak intensity detection (amplitude detection) and peak position detection (phase detection or inflection-point detection), and then comparing that information between the measured Raman spectrum and the reference Raman spectrum.
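One way to picture this direct peak comparison is sketched below, under stated assumptions: peaks are taken as simple local maxima above a minimum height, and a measured spectrum matches when every reference peak has a measured counterpart within a position tolerance and a height tolerance. The function names, tolerances, and matching rule are all hypothetical.

```python
def find_peaks(shifts, intensities, min_height=0.1):
    """Return (position, height) for each local maximum at or above min_height."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        is_local_max = (intensities[i] > intensities[i - 1]
                        and intensities[i] > intensities[i + 1])
        if is_local_max and intensities[i] >= min_height:
            peaks.append((shifts[i], intensities[i]))
    return peaks


def peaks_match(measured, reference, pos_tol=5.0, height_tol=0.2):
    """True if every reference peak has a measured peak close in position and height."""
    for ref_pos, ref_height in reference:
        found = any(
            abs(pos - ref_pos) <= pos_tol and abs(height - ref_height) <= height_tol
            for pos, height in measured
        )
        if not found:
            return False
    return True
```

The comparison is one-directional: extra peaks in the measured spectrum do not cause a mismatch, which suits mixtures in which the reference substance is only one component.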
In Raman spectral measurement, differences in sample uniformity, instrument noise, fluorescence background, and similar effects cause deviations in the Raman spectrum; denoising, baseline correction, and other processing steps also introduce error. Identifying a substance by similarity alone is therefore not very accurate. For this reason, embodiments of the present disclosure further analyze the item under inspection qualitatively by, for example, combining a self-learning identification method with a manual comparison identification method.
FIG. 5 shows a basic schematic flowchart of the false-positive checking step S5 in the actual detection phase shown in FIG. 2. As shown, in an exemplary example of the present disclosure, when the similarity comparison against the threshold library indicates that a substance exceeds the threshold, the false-positive checking step S5 is further performed. Step S5 comprises two stages: the false-positive check pre-processing steps S50, S50′, and S50″, and the false-positive check post-processing step S51.
On the one hand, as an exemplary example of the present disclosure, and as shown in FIG. 5, the false-positive check pre-processing steps S50, S50′, and S50″ are three logically parallel sub-flows, each corresponding to the n-th (n = 1, 2, 3) substance ID selection method to be used in the subsequent post-processing step S51. S50 corresponds to the first substance ID selection method, which selects by statistically verifying entries one by one and is also called the "statistical selection" method. S50′ corresponds to the second substance ID selection method, which calls the corresponding algorithm of a preset "feature recognition interface" to select the verified substance IDs and is also called the "feature recognition" method. S50″ corresponds to the third substance ID selection method, which calls the corresponding algorithm of a preset "secondary recognition interface" to select the verified substance IDs and is also called the "secondary recognition" method. Accordingly, based on the substance ID selection method each one feeds, S50 is also called the "statistical selection" pre-processing step, S50′ the "feature recognition" pre-processing step, and S50″ the "secondary recognition" pre-processing step. That the three pre-processing steps S50, S50′, and S50″ are logically parallel means that they are performed independently of one another; in time they may be performed substantially simultaneously, sequentially, or with no fixed temporal relation to one another.
Specifically, as shown in FIG. 5, the false-positive check pre-processing steps, namely the "statistical selection" pre-processing step S50, the "feature recognition" pre-processing step S50′, and the "secondary recognition" pre-processing step S50″, include, for example:
Steps S500, S500′, S500″: the false-positive check sub-flow starts.
Steps S501, S501′, S501″: the substance IDs in the list of identified substance IDs whose similarity exceeded the threshold in the threshold comparison (hereinafter the "threshold identification list") are compared in turn with the "false-positive substance ID" field in the self-learning library (the whole field, or the corresponding individual subset).
Specifically, as shown in FIG. 5, step S501 compares the IDs in the threshold identification list in turn with the "false-positive substance ID" field of the entire self-learning library; step S501′ compares them in turn with the "false-positive substance ID" field of those self-learning library entries whose "self-learning type" field is "feature recognition"; and step S501″ compares them in turn with the "false-positive substance ID" field of those entries whose "self-learning type" field is "secondary recognition".
Steps S502, S502′, S502″: determine whether an identical ID has been matched (that is, whether a false-positive substance ID has been recognized).
Steps S503, S503′, S503″: if an identical substance ID is matched, a false-positive substance ID has been found, and the false-positive counter is incremented by 1.
Steps S504, S504′, S504″: if no identical substance ID is matched, the current substance ID is regarded not as a false positive but as actually present, and the correct-substance-ID counter is incremented by 1.
Steps S505, S505′, S505″: determine whether the comparison of the identified substance ID list is complete. If not, return to steps S501, S501′, S501″ and repeat; if so, proceed to the next steps S506, S506′, S506″.
Steps S506, S506′, S506″: determine whether the number of false positives is greater than 10. If it is less than or equal to 10, the number of false positives is considered insufficient for the self-learning check to proceed reliably, and the flow jumps to manual comparison identification; if it is greater than 10, the flow proceeds to the step that assigns the "highest correct substance ID count" field.
Here, setting the false-positive count to 10 is an empirical choice. When the confirmed number of false positives exceeds this value, the false positives are considered numerous enough to yield a sufficiently large set of substance IDs to be verified, from which the subsequent post-processing step S51 selects substance IDs. Specifically, the three false-positive check pre-processing steps, namely the "statistical selection" step S50, the "feature recognition" step S50′, and the "secondary recognition" step S50″, correspond respectively to the n-th (n = 1, 2, 3) substance ID selection method used in the post-processing step: the first method selects by statistically verifying entries one by one; the second calls the corresponding algorithm of the preset "feature recognition interface" to select the verified substance IDs; and the third calls the corresponding algorithm of the preset "secondary recognition interface" to select the verified substance IDs.
Steps S507, S507′, S507″: the current value of each "correct-substance-ID counter" is assigned to the corresponding "highest correct substance ID count" field MaxRightIDNum(n), which serves in the post-processing step S51 as the criterion for deciding whether the corresponding n-th substance ID selection method should be performed.
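The counting loop of one pre-processing sub-flow (steps S501 to S507) could be sketched as follows. The function name and the use of `None` to signal the fall-back to manual comparison are assumptions for illustration; the limit of 10 is the empirical value stated in the text.

```python
FALSE_POSITIVE_MIN = 10  # empirical value from the text


def preprocess_false_positive_check(threshold_ids, false_positive_ids):
    """Run one pre-processing sub-flow and return MaxRightIDNum(n).

    threshold_ids:      substance IDs in the threshold identification list
    false_positive_ids: "false-positive substance ID" values from the
                        (whole or type-filtered) self-learning library
    Returns None when the flow should fall back to manual comparison.
    """
    known_false_positives = set(false_positive_ids)
    false_count = 0
    correct_count = 0
    for substance_id in threshold_ids:                # S501 / S505 loop
        if substance_id in known_false_positives:     # S502
            false_count += 1                          # S503
        else:
            correct_count += 1                        # S504
    if false_count <= FALSE_POSITIVE_MIN:             # S506
        return None                                   # jump to manual comparison
    return correct_count                              # S507: MaxRightIDNum(n)
```

The three sub-flows S50, S50′, and S50″ would call this with different `false_positive_ids`: the whole library, the entries of type "feature recognition", and the entries of type "secondary recognition", respectively.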
On the other hand, as an exemplary example of the present disclosure, and as shown in FIG. 5, the false-positive check post-processing step S51 includes, for example:
S511: determine whether, among the three sub-flows S50, S50′, and S50″, the comparison "MaxRightIDNum(n) > corresponding threshold THR(n)" holds for at least two. This test decides whether the highest correct substance ID count is large enough to justify running the corresponding substance ID selection method. If it is satisfied, at least two substance ID selection methods can be used to obtain at least two substance ID lists, which together verify the presence of substance IDs that can be identified programmatically. Conversely, if the comparison holds for none or for only one of the three sub-flows, the substance ID lists identified by at least two selection methods cannot be put to a vote for qualitative analysis; continuing the self-learning process is then pointless, so the operation is terminated and the flow jumps to manual comparison identification.
S512: for each n for which "MaxRightIDNum(n) > corresponding threshold THR(n)" holds, obtain the corresponding substance list IDn (for example ID1, ID2, or ID3) with the n-th method.
S513: determine whether at least two of the generated substance lists IDn (that is, the lists confirmed by the false-positive check) are identical. If so, continue with step S514; if not, continuing the self-learning process is pointless, so the operation is terminated and the flow jumps to manual comparison identification.
S514: the at least two identical substance lists are taken as the list of identified substances, each recognized by its corresponding substance ID selection method and confirmed jointly.
For step S511 above, the thresholds THR(n) for the field MaxRightIDNum(n) are set as follows: for the "statistical selection", "feature recognition", and "secondary recognition" methods, the thresholds are the first threshold THR(1), the second threshold THR(2), and the third threshold THR(3), respectively. The "feature recognition" method is a dimensionality reduction method used in pattern recognition to remove irrelevant or redundant features from the original feature set, and the "secondary recognition" method identifies by, for example, estimating the mean and covariance matrix after feature extraction, building and training a classifier, and classifying; both can reduce the number of features, improve detection accuracy, and reduce running time. The "statistical selection" method, by contrast, compares and confirms entries one by one without any selection, so it is less reliable than the "feature recognition" and "secondary recognition" methods, which use pattern recognition. Accordingly, the first threshold THR(1) is set larger than the second threshold THR(2) and the third threshold THR(3). For example, in embodiments of the present disclosure, THR(1) = 10, THR(2) = 5, and THR(3) = 6.
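The voting logic of steps S511 to S514, with the example thresholds above, might be sketched as follows. The function name, the callable-based interface for the three selection methods, the comparison of lists as unordered sets, and the `None` return for the manual-comparison fall-back are all assumptions for this sketch.

```python
THR = {1: 10, 2: 5, 3: 6}  # example thresholds THR(1), THR(2), THR(3) from the text


def postprocess_false_positive_check(max_right_id_num, selectors):
    """Return the jointly confirmed substance IDs, or None for manual comparison.

    max_right_id_num: {n: MaxRightIDNum(n)} for n in 1..3 (missing or None = not available)
    selectors:        {n: callable returning the substance list IDn}
    """
    # S511: which selection methods have a large enough correct-ID count?
    eligible = [n for n in (1, 2, 3)
                if (max_right_id_num.get(n) or 0) > THR[n]]
    if len(eligible) < 2:
        return None

    # S512: obtain a substance list from each eligible method.
    lists = {n: frozenset(selectors[n]()) for n in eligible}

    # S513: do at least two of the lists agree?
    for i, n in enumerate(eligible):
        for m in eligible[i + 1:]:
            if lists[n] == lists[m]:
                return sorted(lists[n])   # S514: jointly confirmed list
    return None
```

Requiring agreement between two independent selection methods is what lets the flow skip manual comparison; any disagreement hands the decision back to the operator.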
For step S512 above, on the one hand, in an exemplary example of the present disclosure, the "feature recognition" method is a dimensionality reduction method used in pattern recognition to remove irrelevant or redundant features from the original feature set. In embodiments of the present disclosure it is implemented, for example, by calling one of several feature recognition interfaces preset in the "feature recognition interface" field of the self-learning library, and may be chosen as at least one of the following:
Filter method: an indicator is chosen to characterize the importance of each feature, and the features are then ranked by their indicator values. Features are screened, for example, by setting a threshold and discarding the features that fall below it, or by fixing the number of features to select and keeping the top N features or a given top percentage. In other words, each feature dimension is assigned a weight representing its importance, and the features are ranked by weight. A typical filter method selects a feature subset using properties of the training set itself, generally the independence of the features or their relationship to the dependent variable, for example the chi-square test, information gain, or the correlation coefficient.
Wrapper method: according to an objective function (usually an evaluation of predictive performance), several groups of features are selected from, or excluded from, the training set at each iteration. In other words, the wrapper method treats feature subset selection as a search and optimization problem: different combinations (feature subsets) are generated, each combination is evaluated, and it is compared with the other combinations, for example using classification accuracy as the measure of subset quality. Subset selection is thus regarded as an optimization problem that can be solved by many optimization algorithms, especially heuristic ones such as genetic algorithms, particle swarm optimization, differential evolution, and the artificial bee colony algorithm. One example of a wrapper method is recursive feature elimination.
嵌入/集成法(Embedded):其先使用某些机器学习的算法和模型进行训练,得到各个特征的权重系数,再根据权重系数从大到小选择特征。类似于Filter方法,但是通过训练来确定特征的优劣,即在模型既定的情况下学习出对提高模型准确性最好的属性。具体而言,是在确立模型的过程中,挑选出对模型的训练有重要意义(例如对于提升准确率贡献最大)的特征。最常见的Embedded方法例如正则化方法。Embedded: It uses some machine learning algorithms and models to train, obtains the weight coefficients of each feature, and then selects features according to the weight coefficients from large to small. Similar to the Filter method, but through training to determine the pros and cons of the feature, that is, to learn the best attributes to improve the accuracy of the model in the case of the model. Specifically, in the process of establishing the model, it is important to select the characteristics that are important for the training of the model (for example, the greatest contribution to improving the accuracy). The most common Embedded methods are the regularization methods.
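As a minimal sketch of the filter method described above, the following Python ranks features by the absolute Pearson correlation with the target and keeps the top N. The function names `pearson` and `rank_features`, and the choice of the correlation coefficient as the index, are illustrative assumptions and not part of the disclosure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target, top_n):
    """Score each feature column by |correlation with target| and keep the top_n names."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A threshold-based variant would keep every feature whose score exceeds a chosen cut-off instead of slicing the first `top_n` entries.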
On the other hand, in an exemplary example of the present disclosure, the "secondary recognition" method is implemented, for example, by calling a plurality of secondary recognition interfaces preset in the "feature recognition interface" field of the self-learning library. It is constructed, for example, using classifiers commonly used in pattern recognition, such as the quadratic discriminant function (QDF) classifier or the modified quadratic discriminant function (MQDF) classifier. The classifier is trained by estimating the mean and the covariance matrix. The covariance matrix reflects the scatter among the features; the larger the covariance, the more information is contained and the more accurate the final classification.
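To make the QDF idea concrete, here is a hedged sketch for two-dimensional feature vectors with equal class priors. Each class is modelled by its estimated mean and covariance matrix, and a sample is assigned to the class with the largest score g(x) = -(x-μ)ᵀ Σ⁻¹ (x-μ) - ln|Σ|. The function names, the restriction to 2-D, and the equal-prior assumption are illustrative choices, not details taken from the disclosure.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and 2x2 covariance (sxx, sxy, syy) of 2-D samples.

    Needs at least two samples that are not all collinear, so that the
    covariance matrix is invertible.
    """
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def qdf_score(x, mean, cov):
    """Quadratic discriminant score: -(Mahalanobis distance squared) - ln(det)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Explicit inverse of the 2x2 covariance matrix applied to (dx, dy).
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -maha - math.log(det)

def classify(x, models):
    """Return the label whose (mean, cov) model gives the highest QDF score."""
    return max(models, key=lambda label: qdf_score(x, *models[label]))
```

An MQDF variant would additionally regularize the small eigenvalues of each covariance matrix; that step is omitted here.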
Thus, in an exemplary embodiment of the present disclosure, with the thresholds set as described above, for example to THR(1)=10, THR(2)=5, and THR(3)=6: if MaxRightIDNum(1)>10 holds, the statistical-selection substance list ID1 is selected statistically from the "false-positive substance ID" field of the entire self-learning library; if MaxRightIDNum(2)>5 holds, the feature recognition interface is called to obtain the feature-recognition substance list ID2; and if MaxRightIDNum(3)>6 holds, the secondary recognition interface is called to obtain the secondary-recognition substance list ID3.
In step S514 above, at least two substance ID selection methods each perform substance identification and verification independently, and the resulting confirmed substance ID lists are then compared. If the lists are identical, the identified-substance ID list has been jointly confirmed by at least two independent methods on top of the similarity-based judgment. This yields a self-learning substance identification ID list that is more accurate than conventional Raman spectral testing based on similarity alone or Raman spectral testing performed manually.
In an exemplary embodiment of the present disclosure, after the false-positive detection post-processing step S51 above, the flow jumps to S7, where the substance list confirmed by the false-positive check is generated.
False-negative detection S6 is discussed similarly. FIG. 10 shows a basic schematic flowchart of the false-negative detection in the actual detection stage shown in FIG. 2. As shown, in an exemplary example of the present disclosure, when the similarity comparison against the threshold library determines that no substance exceeds its threshold, the false-negative detection step S6 is further performed. The false-negative detection step S6 comprises two stages: the false-negative detection pre-processing steps S60, S60' and S60″, and the false-negative detection post-processing step S61.
On the one hand, as an exemplary example of the present disclosure, and as shown for example in FIG. 10, the false-negative detection pre-processing steps S60, S60' and S60″ are three logically parallel sub-flows, each corresponding to the n-th (n=1, 2, 3) substance ID selection method to be used in the subsequent post-processing step S61: S60 corresponds to the first method, the aforementioned "statistical selection" method; S60' corresponds to the second, the "feature recognition" method; and S60″ corresponds to the third, the "secondary recognition" method. Accordingly, based on the characteristics of the substance ID selection method each one serves, S60 is also called the pre-processing step of "statistical selection", S60' the pre-processing step of "feature recognition", and S60″ the pre-processing step of "secondary recognition". That these three pre-processing steps are logically parallel means that they are executed independently of one another; in time they may, for example, be executed substantially simultaneously, sequentially, or with no temporal relation to one another.
Specifically, as shown in FIG. 10, and broadly similar to the false-positive detection pre-processing steps S50, S50' and S50″ described above, the false-negative detection pre-processing steps, namely the "statistical selection" pre-processing step S60, the "feature recognition" pre-processing step S60' and the "secondary recognition" pre-processing step S60″, include for example:
Steps S600, S600', S600″: the false-negative check sub-flow starts.
Steps S601, S601', S601″: the substance IDs in the original identified-substance ID list are compared in turn with the "false-negative substance ID" field in the self-learning library (either the entire field or the corresponding subset).
Specifically, as shown in FIG. 10, for example: step S601 compares the IDs in the original identified-substance ID list in turn with the "false-negative substance ID" field of the entire self-learning library; step S601' compares them in turn with the "false-negative substance ID" field of those records in the self-learning library whose "self-learning type" field has the value "feature recognition"; and step S601″ compares them in turn with the "false-negative substance ID" field of those records whose "self-learning type" field has the value "secondary recognition".
Steps S602, S602', S602″: determine whether an identical ID has been matched (that is, whether a false-negative substance ID has been found).
Steps S603, S603', S603″: if an identical substance ID is matched, one false-negative substance ID has been found, so the correct-substance-ID counter (here equivalent to a count of false-negative substance IDs) is incremented by 1.
Steps S604, S604', S604″: determine whether the comparison over the identified-substance ID list is complete. If not, return to steps S601, S601', S601″ and repeat; if so, proceed to steps S605, S605', S605″.
Steps S605, S605', S605″: the current value of each "correct-substance-ID counter" is assigned to the corresponding "maximum correct substance ID count" field MaxRightIDNum(n), which serves in post-processing step S61 as the criterion for deciding whether the corresponding n-th substance ID selection method needs to be executed.
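The counting loop of steps S601 to S605 can be sketched as follows. The record layout (dictionaries with the keys `false_negative_id` and `learning_type`) and the function name `count_matches` are hypothetical; the disclosure only names the "false-negative substance ID" and "self-learning type" fields.

```python
def count_matches(original_ids, library_records, learning_type=None):
    """Return the value to store in MaxRightIDNum(n) for one pre-processing sub-flow.

    learning_type=None compares against the whole library (S601); passing
    "feature recognition" or "secondary recognition" restricts the comparison
    to records of that self-learning type (S601' / S601'').
    """
    known_ids = {
        record["false_negative_id"]
        for record in library_records
        if learning_type is None or record["learning_type"] == learning_type
    }
    counter = 0
    for substance_id in original_ids:   # S601/S604: walk the original ID list
        if substance_id in known_ids:   # S602: identical ID matched?
            counter += 1                # S603: increment the counter
    return counter                      # S605: becomes MaxRightIDNum(n)
```

Calling the function three times, once per sub-flow, yields MaxRightIDNum(1), MaxRightIDNum(2) and MaxRightIDNum(3) independently of one another, which matches the logically parallel structure described above.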
On the other hand, as an exemplary example of the present disclosure, and as shown in FIG. 10, the false-negative detection post-processing step S61 is broadly similar to the false-positive detection post-processing step S51 shown in FIG. 5 and includes, for example:
S611: determine whether the comparison "MaxRightIDNum(n) > corresponding threshold THR(n)'" holds for at least two of the three sub-flows S60, S60' and S60″. This test decides whether the maximum correct substance ID counts are high enough to justify executing the corresponding substance ID selection methods. If it is satisfied, at least two substance ID selection methods can be used to obtain at least two substance ID lists that jointly verify the presence of substance IDs identifiable programmatically. Conversely, if the comparison holds for none or only one of the three sub-flows, a qualitative analysis by election among the substance ID lists identified by at least two selection methods is impossible; continuing the self-learning process would then be pointless, so the operation terminates and jumps to manual comparison and identification.
S612: for each n for which "MaxRightIDNum(n) > corresponding threshold THR(n)'" holds, the n-th method is used to obtain the corresponding substance list IDn' (for example, ID1', ID2' or ID3').
S613: determine whether at least two of the generated substance lists IDn' (that is, the lists confirmed by the false-negative check) are identical. If yes, continue to step S614; if no, continuing the self-learning process would be pointless, so the operation terminates and jumps to manual comparison and identification.
S614: the at least two identical substance lists are taken as the identified-substance list that the corresponding at least two substance ID selection methods have each recognized and jointly confirmed.
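Steps S611 to S614 amount to a gated election, which might be sketched as below. The function name `elect_confirmed_list`, the use of callables for the three selection methods, and the decision to compare lists as unordered sets are assumptions made for illustration; a return value of `None` stands for the jump to manual comparison and identification.

```python
def elect_confirmed_list(counts, thresholds, selectors):
    """Return the jointly confirmed ID list, or None to fall back to manual comparison.

    counts[n]     -- MaxRightIDNum(n) from the pre-processing sub-flows
    thresholds[n] -- THR(n)'
    selectors[n]  -- callable returning the substance ID list of the n-th method
    """
    # S611: at least two methods must have a count above their threshold.
    eligible = [n for n in selectors if counts[n] > thresholds[n]]
    if len(eligible) < 2:
        return None

    # S612: run only the eligible methods to obtain the lists IDn'.
    lists = {n: frozenset(selectors[n]()) for n in eligible}

    # S613: look for at least two identical lists (order ignored).
    methods_by_list = {}
    for n, ids in lists.items():
        methods_by_list.setdefault(ids, []).append(n)
    for ids, methods in methods_by_list.items():
        if len(methods) >= 2:
            return sorted(ids)  # S614: jointly confirmed list
    return None
```

The same shape applies to the false-positive post-processing step S51, with the thresholds THR(n) and the lists ID1, ID2 and ID3.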
For step S611 above, the thresholds THR(n)' for the field MaxRightIDNum(n) are selected and set in the same or a similar way as in false-positive detection. For example, the first threshold THR(1)' is set larger than the second threshold THR(2)' and the third threshold THR(3)'. In an embodiment of the present disclosure, for example, THR(n)' is set to THR(1)'=10, THR(2)'=5, and THR(3)'=6. The "feature recognition" and "secondary recognition" methods used are likewise the same or similar, and are executed by calling a plurality of different "feature recognition interfaces" and a plurality of "secondary recognition interfaces", respectively.
Thus, in an exemplary embodiment of the present disclosure, with the thresholds set as described above, for example to THR(1)'=10, THR(2)'=5, and THR(3)'=6: if MaxRightIDNum(1)'>10 holds, the statistical-selection substance list ID1' is selected statistically from the "false-negative substance ID" field of the entire self-learning library; if MaxRightIDNum(2)'>5 holds, the feature recognition interface is called to obtain the feature-recognition substance list ID2'; and if MaxRightIDNum(3)'>6 holds, the secondary recognition interface is called to obtain the secondary-recognition substance list ID3'.
In step S614 above, at least two substance ID selection methods each perform substance identification and verification independently, and the resulting confirmed substance ID lists are then compared. If the lists are identical, the identified-substance ID list has been jointly confirmed by at least two independent methods on top of the similarity-based judgment. This yields a self-learning substance identification ID list that is more accurate than conventional Raman spectral testing based on similarity alone or Raman spectral testing performed manually.
In an exemplary embodiment of the present disclosure, after the false-negative detection post-processing step S61 above, the flow jumps to S7, where the substance list confirmed by the false-negative check is generated.
For illustration, FIG. 15 shows a schematic diagram of the operation of detecting the Raman spectrum of a sample under test using a method according to an embodiment of the present disclosure. The main flow in this example includes:
1) After the sample is prepared, acquire data;
2) Call the algorithm interface to preprocess the spectrum and extract spectral feature data;
3) Compare with the spectral feature library;
4) Obtain the similarity list, as shown in FIG. 2;
5) Compare with the threshold library, as shown in FIG. 3;
6) Obtain the list of substances exceeding the threshold, as shown in FIG. 4;
7) Does any substance exceed the threshold? If "No", jump to 14);
8) If "Yes", look up whether the self-learning library contains a false-positive substance ID; if "No", jump to 21);
9) If "Yes", call the "statistical selection" algorithm to select false-positive substance IDs;
10) Call the "feature selection" algorithm to select false-positive substance IDs;
11) Call the "secondary recognition" algorithm to select false-positive substance IDs;
12) Call the "three-scheme election to determine the result" algorithm to select the final, probably correct substance ID;
13) Find the substance name in the spectrum library according to the substance ID, then jump to 21);
14) Does the self-learning library contain a false-negative substance ID? If "No", jump to 21);
15) If "Yes", call the "statistical selection" algorithm to select false-negative substance IDs;
16) Call the "feature selection" algorithm to select false-negative substance IDs;
17) Call the "secondary recognition" algorithm to select false-negative substance IDs;
18) Call the "three-scheme election to determine the result" algorithm to select the final, probably correct substance ID;
19) "Is there a false-negative substance ID?" If "No", jump to 21);
20) If "Yes", find the substance name in the spectrum library;
21) Display the measurement result;
22) Perform a "manual comparison"? If "No" is selected, jump to 26);
23) If "Yes", list the results of comparing the sample spectrum with every spectrum in the spectrum library, including similarity, number of peaks, peak positions, peak intensities, and so on, for manual analysis, screening and judgment;
24) "Any false positives or false negatives?" If "No", jump to 26);
25) If "Yes", select the correct substance, the type and other information and write them to the self-learning library for analysis and self-learning;
26) End.
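The branching in items 5) to 8) and 14) of the flow above can be sketched as a single decision function. Everything here is an illustrative assumption: the function name `choose_branch`, the use of one threshold per substance ID, and the branch labels returned as strings.

```python
def choose_branch(similarities, thresholds, has_false_positive_ids, has_false_negative_ids):
    """Pick the next stage of the flow from the similarity list.

    similarities -- mapping substance ID -> similarity to the sample spectrum
    thresholds   -- mapping substance ID -> threshold from the threshold library
    Returns (branch, above), where above lists the IDs exceeding their threshold.
    """
    # Items 5) and 6): compare with the threshold library.
    above = [sid for sid, score in similarities.items() if score > thresholds[sid]]
    if above:
        # Item 8): a false-positive check only makes sense if the
        # self-learning library holds false-positive substance IDs.
        branch = "false_positive_check" if has_false_positive_ids else "display_result"
    else:
        # Item 14): otherwise look for false-negative substance IDs.
        branch = "false_negative_check" if has_false_negative_ids else "display_result"
    return branch, above
```

Either check ends at item 21), after which the optional manual comparison in items 22) to 25) can feed corrections back into the self-learning library.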
Similarly, in other embodiments, multiple modifications and variations are implemented on the basis of the preferred embodiment described above.
FIG. 6 shows a schematic flowchart of an extension of the "three-method election" implementation of the false-positive detection in the actual detection stage shown in FIG. 2. The false-positive detection flow S5 in the example of FIG. 6 differs from that in the example of FIG. 5 mainly in that, as shown in FIG. 6, after the processing based on "at least two identified substance ID lists (each produced by one of the substance ID selection methods)" is complete, the false-positive detection post-processing step S51 additionally includes an optional step S515, namely a further "three-method election" based on the "intersection". For brevity, the remaining identical sub-steps are not repeated.
Further, FIG. 7 is a schematic flowchart of an exemplary embodiment of a basic extension of the false-positive detection shown in FIG. 6. The false-positive detection flow S5 in the example of FIG. 7 differs from the false-positive detection flow S5 in the example of FIG. 15 mainly in that, as shown in FIG. 7, the optional step S515 of the false-positive detection post-processing step S51 specifically includes, for example:
Step S5150: determine whether at least two of the generated substance lists ID1, ID2 and ID3 have an intersection. If so, continue to step S5151: the substance ID lists selected by at least two independent methods are considered to overlap, and the overlapping part can be used to generate a jointly confirmed identified-substance ID list. Otherwise, jump to manual comparison and identification.
Step S5151: if the condition in step S5150 holds, the intersection is assigned to the first identification list.
Thereafter, in the subsequent step S7, the first identification list is used directly as the substance list confirmed by the false-positive check.
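A possible reading of steps S5150 and S5151 is sketched below. The disclosure does not say how overlaps among three lists are combined, so this sketch assumes the first identification list is the union of all pairwise intersections; the function name `first_identification_list` is likewise hypothetical. `None` stands for the jump to manual comparison and identification.

```python
from itertools import combinations

def first_identification_list(lists):
    """Collect every ID shared by at least two of the given substance ID lists."""
    overlap = set()
    for a, b in combinations(lists, 2):  # S5150: check each pair for an intersection
        overlap |= set(a) & set(b)
    # S5151: a non-empty overlap becomes the first identification list.
    return sorted(overlap) or None
```

Compared with requiring two identical lists, this relaxes the election: partial agreement between methods is enough to confirm the shared IDs.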
In the extended flowchart of false-positive detection S5 shown in FIG. 7, after the identified-substance ID list has been jointly confirmed on the basis of similarity recognition and of identical results from at least two independent methods, it is further jointly confirmed by checking the overlap, that is, the intersection, of the results of at least two independent methods, which further improves identification accuracy.
FIG. 8 is a schematic flowchart of an exemplary embodiment of another, further extension of the false-positive detection shown in FIG. 6. The false-positive detection flow S5 in the example of FIG. 8 differs from that in the example of FIG. 7 mainly in that, as shown in FIG. 8, the optional step S515 of the false-positive detection post-processing step S51 additionally provides that, for the substance ID lists selected by each of the at least two independent methods, the non-intersecting part is further verified in addition to the intersecting part being confirmed. For example, after steps S5150 and S5151, the optional step S515 of the false-positive detection post-processing step S51 additionally includes:
S5152: subtract the intersection from the union of the at least two substance lists ID1, ID2, ID3 to obtain the list of substances to be rechecked.
S5153: perform enhanced false-positive detection on the list of substances to be rechecked.
S5154: determine whether a newly confirmed substance list has been generated by the enhanced false-positive detection. If so, continue to step S5155; otherwise jump to step S5156.
S5155: generate the re-identification list.
S5156: the re-identification list is assigned the empty value (NONE).
S5157: assign the re-identification list to the second identification list.
S5158: merge the first and second identification lists to generate the identified-substance list.
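Steps S5152 to S5158 can be sketched as set arithmetic around a recheck callback. The function name `merge_with_recheck` and the `recheck` callable, which stands in for the enhanced false-positive detection S5153, are assumptions for illustration only.

```python
def merge_with_recheck(lists, first_list, recheck):
    """Recheck the IDs outside the intersection and merge the result.

    lists      -- the substance ID lists ID1, ID2, ID3 that were produced
    first_list -- the first identification list (the intersection)
    recheck    -- callable taking the IDs to recheck and returning those confirmed
    """
    union = set().union(*(set(ids) for ids in lists))
    to_recheck = sorted(union - set(first_list))         # S5152: union minus intersection
    confirmed = recheck(to_recheck)                      # S5153: enhanced detection
    second_list = list(confirmed) if confirmed else []   # S5154 to S5157
    return sorted(set(first_list) | set(second_list))    # S5158: merge both lists
```

If the recheck confirms nothing, the second identification list is empty and the result equals the first identification list.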
As described in the sub-steps above, the false-positive detection S5 of FIG. 8 builds on the example shown in FIG. 7 and essentially performs further analysis and verification on the "complement outside the intersection", which remains unconfirmed after the "intersection judgment". The specific steps are described in detail below. For example, FIG. 9 is a sub-flowchart of the repeated false-positive detection S5153 performed with an enhanced Raman spectrum in the further extended exemplary embodiment of false-positive detection shown in FIG. 8, showing exemplary sub-steps of S5153.
In an exemplary embodiment of the present disclosure, as shown in FIG. 9, for the complement outside the intersection, the repeated false-positive detection S5153 includes, for example:
S51531: mix the sample under test with an enhancer to acquire an enhanced Raman spectrum.
S51532: perform false-positive detection. Specifically, for example, the aforementioned step S5 is applied in a nested manner on the basis of the enhanced Raman spectrum.
S51533: determine (for example, by human confirmation) whether to jump to manual comparison.
S51534: jump to false-positive detection by manual comparison.
S51535: generate the list of substances confirmed to be present by re-executing false-positive detection with the enhanced Raman spectrum.
As an example, in step S51531 above, when detection uses the enhanced Raman spectral data of the measured substance sample, the mixture of the sample under test and the enhancer may be formed by mixing the sample directly with the enhancer, or by mixing an aqueous or organic solution of the sample with the enhancer. The same applies to the mixture of the measured substance sample and the enhancer. As an example, the enhancer may comprise any one of metal nanoparticle materials, metal nanowires, metal nanoclusters, carbon nanotubes and carbon nanoparticles, or a combination thereof. In another example, the enhancer may comprise a metal nanomaterial, and may additionally contain chloride, bromide, sodium, potassium or sulfate ions. The metal may include, for example, any one of gold, silver, copper, magnesium, aluminum, iron, cobalt, nickel, palladium and platinum, or a combination thereof. In the mixture, the molecules of the measured substance sample adhere to the surface of the enhancer material, and the electromagnetic field at that surface enhances the Raman spectral signal of the sample.
Similarly, in other exemplary embodiments of the present disclosure, variations of the false-negative detection S6 are also obtained.
FIG. 11 shows a schematic flowchart of an extension of the "three-method election" implementation of the false-negative detection in the actual detection stage shown in FIG. 2. The false-negative detection flow S6 in the example of FIG. 11 differs from that in the preferred embodiment of FIG. 10 mainly in that, as shown in FIG. 11, after the processing based on "at least two identified substance ID lists (each produced by one of the substance ID selection methods)" is complete, the false-negative detection post-processing step S61 additionally includes an optional step S615, namely a further "three-method election" based on the "intersection". For brevity, the remaining identical sub-steps are not repeated.
Further, FIG. 12 is a schematic flowchart of an exemplary embodiment of a basic extension of the false-negative detection shown in FIG. 11. The false-negative detection flow S6 in the example of FIG. 12 differs from the false-negative detection flow S6 in the example of FIG. 15 mainly in that, as shown in FIG. 12, the optional step S615 of the false-negative detection post-processing step S61 specifically includes, for example:
Step S6150: determine whether at least two of the generated substance lists ID1', ID2' and ID3' have an intersection. If so, continue to step S6151: the substance ID lists selected by at least two independent methods are considered to overlap, and the overlapping part can be used to generate a jointly confirmed identified-substance ID list. Otherwise, jump to manual comparison and identification.
Step S6151: if the condition in step S6150 holds, the intersection is assigned to the first identification list.
Thereafter, in the subsequent step S7, the first identification list is used directly as the substance list confirmed by the false-negative check.
In the extended flowchart of false-negative detection S6 shown in FIG. 12, after the identified-substance ID list has been jointly confirmed on the basis of similarity recognition and of identical results from at least two independent methods, it is further jointly confirmed by checking the overlap, that is, the intersection, of the results of at least two independent methods, which further improves identification accuracy.
FIG. 13 is a schematic flowchart of an exemplary embodiment of another, further extension of the false-negative detection shown in FIG. 11. The false-negative detection flow S6 in the example of FIG. 13 differs from that in the example of FIG. 12 mainly in that, as shown in FIG. 13, the optional step S615 of the false-negative detection post-processing step S61 additionally provides that, for the substance ID lists selected by each of the at least two independent methods, the non-intersecting part is further verified in addition to the intersecting part being confirmed. For example, after steps S6150 and S6151, the optional step S615 of the false-negative detection post-processing step S61 additionally includes:
S6152: Subtract the intersection of the at least two substance lists ID1, ID2, and ID3 from their union to obtain a list of substances to be rechecked.
S6153: Perform enhanced false-negative detection on the list of substances to be rechecked.
S6154: Determine whether a newly confirmed substance list has been generated after the enhanced false-negative detection was performed again. If so, proceed to step S6155; otherwise, jump to step S6156.
S6155: Generate a re-identification list.
S6156: Assign the re-identification list the empty value (NONE).
S6157: Assign the re-identification list to the second identification list.
S6158: Merge the first and second identification lists to generate the list of identified substances.
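Steps S6152 to S6158 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function name `recheck_and_merge` and the callback `enhanced_detection` (standing in for the enhanced false-negative detection S6153) are hypothetical, and the first identification list is assumed to be the intersection confirmed in the preceding steps.

```python
def recheck_and_merge(id_lists, enhanced_detection):
    """Sketch of S6152-S6158 for at least two substance ID lists.

    id_lists: list of iterables of substance IDs, one per independent method.
    enhanced_detection: hypothetical callback that takes the set of IDs to be
        rechecked and returns the IDs it confirms as present.
    """
    if len(id_lists) < 2:
        raise ValueError("at least two substance ID lists are required")
    sets = [set(ids) for ids in id_lists]
    union = set().union(*sets)
    intersection = set.intersection(*sets)

    # Assumption: the first identification list is the confirmed intersection.
    first_list = intersection

    # S6152: union minus intersection gives the substances to be rechecked.
    to_recheck = union - intersection

    # S6153: enhanced false-negative detection on the recheck list.
    newly_confirmed = set(enhanced_detection(to_recheck)) if to_recheck else set()

    # S6154-S6156: keep the new list if anything was confirmed, else empty.
    re_identified = newly_confirmed if newly_confirmed else set()

    # S6157: the re-identification list becomes the second identification list.
    second_list = re_identified

    # S6158: merge the first and second identification lists.
    return first_list | second_list
```

With two method results `["a", "b"]` and `["b", "c"]`, the intersection `{"b"}` is kept as confirmed, while `"a"` and `"c"` are passed to the recheck callback and are added only if that callback confirms them.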
As the sub-steps above show, the false-negative detection S6 of FIG. 13 builds on the example shown in FIG. 12 and, in essence, further analyzes and verifies the complement outside the intersection, that is, the portion that still cannot be confirmed after the intersection judgment. The specific steps are described in detail below. For example, FIG. 14 is a sub-flowchart of the repeated false-negative detection S6153, performed using enhanced Raman spectroscopy, in the further extended exemplary embodiment of the false-negative detection shown in FIG. 13; it shows exemplary sub-steps of the repeated false-negative detection S6153 of FIG. 13.
In an exemplary embodiment of the present disclosure, as shown in FIG. 14, for the complement outside the intersection, the repeated false-negative detection S6153 includes, for example:
S61531: Mix the sample to be tested with an enhancer and acquire an enhanced Raman spectrum.
S61532: Perform false-negative detection. Specifically, for example, apply the aforementioned step S6 in a nested manner on the basis of the enhanced Raman spectrum.
S61533: Determine (for example, by manual confirmation) whether to jump to manual comparison.
S61534: Jump to false-negative detection by manual comparison.
S61535: Generate a list of substances confirmed to be present by the false-negative detection re-executed using the enhanced Raman spectrum.
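The sub-steps S61531 to S61535 can be sketched as a small Python control flow. All names here are hypothetical: the four callbacks stand in for the spectrum acquisition, the nested step S6, the operator's decision, and the manual comparison. The sketch also assumes that, when manual comparison is chosen, its result replaces the automated result; the text above does not spell out that branch.

```python
def enhanced_recheck(candidates, acquire_enhanced_spectrum,
                     detect_false_negatives, needs_manual_comparison,
                     manual_comparison):
    """Sketch of the repeated false-negative detection S6153.

    candidates: substance IDs outside the intersection, to be rechecked.
    The remaining arguments are hypothetical callbacks supplied by the caller.
    """
    # S61531: acquire the enhanced Raman spectrum (sample mixed with enhancer).
    spectrum = acquire_enhanced_spectrum()

    # S61532: nested false-negative detection (step S6) on that spectrum.
    confirmed = list(detect_false_negatives(spectrum, candidates))

    # S61533: decide, for example by operator confirmation, whether to
    # hand over to manual comparison.
    if needs_manual_comparison(confirmed):
        # S61534: manual comparison (assumed to replace the automated result).
        confirmed = list(manual_comparison(spectrum, candidates))

    # S61535: list of substances confirmed to be present.
    return confirmed
```

Passing the steps in as callbacks keeps the sketch independent of any particular spectrometer interface or matching algorithm.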
The specific operation flow described above is logically rigorous and can guard against abnormal operation by the user. To achieve the purpose of the present disclosure, the self-learning described above may, for example, alternatively be replaced with a self-learning mixture analysis method.
In another variant embodiment of the present disclosure, there is in practice a possibility in step S4, in which it is determined whether a substance exceeding the threshold exists, that even though a substance exceeding the threshold is determined to exist, a substance actually present may still have been missed. Specifically, FIG. 16 shows a further schematic flowchart according to an embodiment of the present disclosure, likewise divided into two stages, a learning stage and an actual detection stage, and illustrates a detection mode for the case in which false positives and false negatives may both be present.
As schematically shown in FIG. 16, if it is determined that no substance exceeds the threshold, only the false-negative detection is performed, as in the foregoing embodiments. If, however, it is determined that a substance exceeding the threshold exists, the false-positive detection and the false-negative detection are, for example, performed in sequence. A more comprehensive qualitative identification of substance IDs can thereby be achieved.
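The branching of FIG. 16 can be illustrated with a short Python sketch. The function name `select_verification`, the dictionary-based inputs, and the pass labels are assumptions made for illustration; the sketch also assumes that every substance ID in the similarity list has its own entry in the threshold library.

```python
def select_verification(similarities, thresholds):
    """Decide which verification passes to run for one measured spectrum.

    similarities: dict mapping substance ID -> computed similarity.
    thresholds:   dict mapping substance ID -> similarity threshold.
    Returns (ids_over_threshold, passes), where passes lists the
    verification passes in the order in which they would be executed.
    """
    # Substance IDs whose similarity exceeds their own threshold.
    over = [sid for sid, sim in similarities.items() if sim > thresholds[sid]]

    if over:
        # A substance exceeds its threshold: check for false positives
        # first, then additionally check for missed substances.
        passes = ["false_positive", "false_negative"]
    else:
        # Nothing exceeds its threshold: only check for missed substances.
        passes = ["false_negative"]
    return over, passes
```

For example, with similarities `{"a": 0.9, "b": 0.5}` and thresholds `{"a": 0.8, "b": 0.7}`, only `"a"` exceeds its threshold, so both passes are scheduled in sequence.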
According to yet another embodiment of the present disclosure, an electronic device is also provided. FIG. 17 is a block diagram showing an example hardware arrangement 100 of the electronic device. The hardware arrangement 100 includes a processor 106 (for example, a microprocessor (μP) or a digital signal processor (DSP)). The processor 106 may be a single processing unit or a plurality of processing units for performing the different actions of the method steps described herein. The arrangement 100 may also include an input unit 102 for receiving signals from other entities and an output unit 104 for providing signals to other entities. The input unit 102 and the output unit 104 may be arranged as a single entity or as separate entities.
In addition, the arrangement 100 may include at least one readable storage medium 108 in the form of non-volatile or volatile memory, for example an electrically erasable programmable read-only memory (EEPROM), a flash memory, and/or a hard disk drive. The readable storage medium 108 includes a computer program 110 comprising code/computer-readable instructions which, when executed by the processor 106 in the arrangement 100, enable the hardware arrangement 100 and/or a device including the hardware arrangement 100 to perform, for example, the flows described above in connection with the foregoing embodiments and any variations thereof.
The computer program 110 may be configured as computer program code having, for example, an architecture of computer program modules 110A to 110C. Accordingly, in an example embodiment in which the hardware arrangement 100 is used in, for example, a device, the code in the computer program of the arrangement 100 includes a plurality of modules, including but not limited to the illustrated modules 110A, 110B, and 110C. The modules are each configured to perform different determination or execution steps, such as one or more determination and/or execution steps in any of the flows, branch flows, and sub-flows shown in FIGS. 1-2 and FIGS. 5-16.
The computer program modules can perform essentially each action in the flows described in the foregoing embodiments so as to emulate the device. In other words, when the different computer program modules are executed in the processor 106, they may correspond to the different units of the device described above.
Although the code means in the embodiment disclosed above in connection with FIG. 17 are implemented as computer program modules which, when executed in the processor 106, cause the hardware arrangement 100 to perform the actions described above in connection with the foregoing embodiments, in alternative embodiments at least one of the code means may be implemented at least partially as a hardware circuit.
The processor may be a single CPU (central processing unit) but may also include two or more processing units. For example, the processor may include a general-purpose microprocessor, an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)). The processor may also include on-board memory for caching purposes. The computer program may be carried by a computer program product connected to the processor. The computer program product may include a computer-readable medium on which the computer program is stored. For example, the computer program product may be a flash memory, a random access memory (RAM), a read-only memory (ROM), or an EEPROM, and in alternative embodiments the computer program modules described above may be distributed among different computer program products in the form of memory within the UE.
The present disclosure has at least the following beneficial effect: it makes full use of the similarity method and the self-learning method, optionally combined with manual identification, to achieve efficient and fast spectral processing for substance identification.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar among the embodiments, reference may be made from one to another; they are not described again here.
Those skilled in the art will understand that the embodiments described above are exemplary and may be modified by those skilled in the art, and that the structures described in the various embodiments may be freely combined provided that no conflict in structure or principle arises.
Although the present disclosure has been described with reference to the accompanying drawings, the embodiments disclosed in the drawings are intended to illustrate preferred implementations of the present disclosure and are not to be construed as limiting it. The dimensional proportions in the drawings are merely schematic and are likewise not to be construed as limiting the present disclosure.
Although some embodiments of the general concept of the present disclosure have been shown and described, those of ordinary skill in the art will understand that changes may be made to these embodiments without departing from the principles and spirit of the general inventive concept, the scope of the present disclosure being defined by the claims and their equivalents.

Claims (17)

  1. A method for self-learning qualitative analysis based on Raman spectroscopy, comprising:
    a Raman spectrum acquisition step: acquiring a Raman spectrum of an item to be measured;
    a feature extraction and comparison step: extracting Raman spectral data and comparing it with a spectral feature library in a spectrum library to obtain an original identified substance ID list;
    a similarity comparison step: calculating, for the Raman spectrum, the similarity of each substance ID in the original identified substance ID list to generate a similarity list, and comparing the similarity list with a similarity threshold library in the spectrum library; and
    a substance ID selection step: performing, on the basis of a self-learning library, verification detection on a similarity-identified substance ID list, namely the list of substance IDs whose similarity exceeds the similarity threshold after the comparison, the verification detection comprising false-positive detection and false-negative detection, wherein the false-positive detection is performed when the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, and the false-negative detection is performed when the similarity list contains no such substance ID.
  2. The method according to claim 1, wherein, when the similarity list contains a substance ID whose similarity exceeds the similarity threshold stored for that substance ID in the similarity threshold library, the false-positive detection is performed first and the false-negative detection is then additionally performed.
  3. The method according to claim 1, wherein each of the false-positive detection and the false-negative detection is configured to selectively perform three parallel substance ID selection methods, comprising:
    a statistical selection method: performing statistical selection on all false-positive or false-negative substance IDs in the self-learning library;
    a feature identification method: performing selection by feature identification on the false-positive or false-negative substance IDs whose "self-learning type" in the self-learning library has the value "feature identification"; and
    a secondary identification method: performing selection by secondary identification on the false-positive or false-negative substance IDs whose "self-learning type" in the self-learning library has the value "secondary identification".
  4. The method according to claim 3, wherein each of the false-positive detection and the false-negative detection is configured to comprise a pre-processing step and a post-processing step,
    the pre-processing step comprising: comparing the IDs in an identified substance ID list with, respectively, all false-positive or false-negative substance IDs in the self-learning library, the false-positive or false-negative substance IDs whose "self-learning type" in the self-learning library has the value "feature identification", and the false-positive or false-negative substance IDs whose "self-learning type" in the self-learning library has the value "secondary identification", so as to generate a highest correct substance ID count for each of the statistical selection method, the feature identification method, and the secondary identification method; and
    the post-processing step selectively performing the three substance ID selection methods on the basis of a comparison of the highest correct substance ID counts of the statistical selection method, the feature identification method, and the secondary identification method with their respective count thresholds.
  5. The method according to claim 4, wherein the identified substance ID list in the pre-processing step of the false-positive detection is selected to be the similarity-identified substance ID list.
  6. The method according to claim 4, wherein the identified substance ID list in the pre-processing step of the false-negative detection is selected to be the original identified substance ID list.
  7. The method according to claim 4, wherein the count threshold for the highest correct substance ID count obtained for all false-positive or false-negative substance IDs in the self-learning library is set to be greater than the count threshold for the highest correct substance ID count obtained for the false-positive or false-negative substance IDs whose "self-learning type" in the self-learning library has the value of one of "feature identification" and "secondary identification".
  8. The method according to claim 4 or 7, wherein, when the highest correct substance ID counts of the statistical selection method, the feature identification method, and the secondary identification method are compared with their respective count thresholds and the condition "highest correct substance ID count is greater than the count threshold" holds at least twice, those of the three parallel substance ID selection methods that satisfy the condition are selectively performed to generate at least two corresponding identified substance ID lists.
  9. The method according to claim 8, wherein, if the at least two generated identified substance ID lists are equal, they are confirmed as the identified substance ID list after verification detection.
  10. The method according to claim 8, wherein, if the at least two generated identified substance ID lists have an intersection, the intersection is confirmed as the identified substance ID list after verification detection.
  11. The method according to claim 10, wherein the substance ID selection step is performed again on the portion of the at least two generated identified substance ID lists outside the intersection.
  12. The method according to claim 11, wherein the substance ID selection step performed again comprises enhanced detection performed by mixing the item to be measured with an enhancer to acquire an enhanced Raman spectrum.
  13. The method according to claim 4, wherein, in the pre-processing step of the false-positive detection, the post-processing step of the false-positive detection is performed only when the counted number of false positives is greater than a false-positive count threshold.
  14. The method according to any one of claims 3 to 13, further comprising:
    after the qualitative analysis of the item to be measured has been completed, adding the obtained false-positive substance ID list and false-negative substance ID list to the self-learning library according to the "self-learning type" field.
  15. The method according to claim 1, further comprising, before the qualitative analysis is performed on the item to be measured:
    creating the self-learning library by one of: performing initial learning for the self-learning library using learning sample substances; and importing a preset initial self-learning library.
  16. The method according to claim 1, further comprising:
    selectively identifying substances by a manual comparison method.
  17. An electronic device, comprising:
    a memory for storing executable instructions; and
    a processor for executing the executable instructions stored in the memory to perform the method according to any one of claims 1 to 16.
PCT/CN2017/109712 2016-12-26 2017-11-07 Self-learning-type qualitative analysis method based on raman spectrum WO2018121082A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611220308.2 2016-12-26
CN201611220308.2A CN108240978B (en) 2016-12-26 2016-12-26 Self-learning qualitative analysis method based on Raman spectrum

Publications (1)

Publication Number Publication Date
WO2018121082A1 true WO2018121082A1 (en) 2018-07-05

Family

ID=62702114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/109712 WO2018121082A1 (en) 2016-12-26 2017-11-07 Self-learning-type qualitative analysis method based on raman spectrum

Country Status (2)

Country Link
CN (1) CN108240978B (en)
WO (1) WO2018121082A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709637A (en) * 2020-06-11 2020-09-25 中国科学院西安光学精密机械研究所 Qualitative analysis method for interference degree of spectral curve
CN112395803A (en) * 2020-09-11 2021-02-23 北京工商大学 ICP-AES multimodal spectral line separation method based on particle swarm optimization
CN113466206A (en) * 2021-06-23 2021-10-01 上海仪电(集团)有限公司中央研究院 Raman spectrum analysis system based on big data
CN114814593A (en) * 2022-04-29 2022-07-29 哈尔滨工业大学(威海) Min's distance and two-step detection strategy-based battery pack multi-fault diagnosis method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210559B (en) * 2019-05-31 2021-10-08 北京小米移动软件有限公司 Object screening method and device and storage medium
CN112763477B (en) * 2020-12-30 2022-11-08 山东省食品药品检验研究院 Rapid evaluation system for pharmaceutical imitation quality based on Raman spectrum

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050225758A1 (en) * 2004-03-23 2005-10-13 Knopp Kevin J Raman optical identification tag
CN101324544A (en) * 2007-06-15 2008-12-17 徐向阳 Method for recognizing sample using characteristic fingerprint pattern
CN104215623A (en) * 2013-05-31 2014-12-17 欧普图斯(苏州)光学纳米科技有限公司 Multi-industry detection-oriented laser Raman spectrum intelligent identification method and system
CN104458693A (en) * 2013-09-25 2015-03-25 同方威视技术股份有限公司 Raman spectrum measuring method for drug detection
CN104749158A (en) * 2013-12-27 2015-07-01 同方威视技术股份有限公司 Jade jewelry appraisal method and device thereof
CN106198482A (en) * 2015-05-04 2016-12-07 清华大学 The method whether being added with Western medicine in detection health product based on Raman spectrum

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4406768A1 (en) * 1994-03-02 1995-09-07 Mnogotrasslevoe N Proizv Ob Ed Optical spectroscopic method for identification of esp. ruby
CN101458214A (en) * 2008-12-15 2009-06-17 浙江大学 Organic polymer solution concentration detecting method
CN101995395B (en) * 2009-08-14 2013-07-31 上海镭立激光科技有限公司 Method for online detecting material by laser induction multiple spectrum united fingerprint network
WO2013001549A1 (en) * 2011-06-29 2013-01-03 Shetty Ravindra K Devices connect and operate universally by learning
CN102507532A (en) * 2011-11-11 2012-06-20 上海化工研究院 Chemical composition instant recognition system based on Raman spectrum
CN106020508A (en) * 2016-07-18 2016-10-12 南京医健通信息科技有限公司 Self-learning method for rapid and intelligent input of data

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709637A (en) * 2020-06-11 2020-09-25 中国科学院西安光学精密机械研究所 Qualitative analysis method for interference degree of spectral curve
CN111709637B (en) * 2020-06-11 2023-08-22 中国科学院西安光学精密机械研究所 Qualitative analysis method for interference degree of spectrum curve
CN112395803A (en) * 2020-09-11 2021-02-23 北京工商大学 ICP-AES multimodal spectral line separation method based on particle swarm optimization
CN112395803B (en) * 2020-09-11 2023-10-13 北京工商大学 ICP-AES multimodal spectral line separation method based on particle swarm optimization
CN113466206A (en) * 2021-06-23 2021-10-01 上海仪电(集团)有限公司中央研究院 Raman spectrum analysis system based on big data
CN114814593A (en) * 2022-04-29 2022-07-29 哈尔滨工业大学(威海) Min's distance and two-step detection strategy-based battery pack multi-fault diagnosis method

Also Published As

Publication number Publication date
CN108240978A (en) 2018-07-03
CN108240978B (en) 2020-01-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17888677

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17888677

Country of ref document: EP

Kind code of ref document: A1