CN116755724B - PCA software installation method - Google Patents

Info

Publication number
CN116755724B
Authority
CN
China
Prior art keywords
file
pca
software
mass spectrum
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310763998.XA
Other languages
Chinese (zh)
Other versions
CN116755724A (en)
Inventor
刘敏
李晔
杨静
何天豪
黄晔
何尔凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Publication of CN116755724A
Application granted
Publication of CN116755724B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses PCA software and an installation method, and relates in particular to the technical field of PCA software application. The software is named Advanced Spectra PCA Toolbox, and its download address and method of use are published in the application. The application provides the download address and detailed download location of the PCA software, and gives a complete and clear explanation of the installation, parameter modification and use of the downloaded software. This solves the problem that existing PCA software cannot be downloaded and used conveniently and quickly, so that mistakes made when users install the software themselves after downloading easily cause errors that prevent normal use of the PCA software.

Description

PCA software installation method
Technical Field
The invention relates to the technical field of the use of PCA software, and in particular to a PCA software installation method.
Background
PCA is the abbreviation of Principal Component Analysis, a commonly used data analysis method. Through a linear transformation, PCA converts the raw data into a representation whose dimensions are linearly uncorrelated. It can be used to extract the main feature components of the data and is commonly used for dimensionality reduction of high-dimensional data. Dimensionality reduction is a common problem in unsupervised learning.
Existing PCA software cannot be downloaded and used conveniently and quickly, so mistakes made when users install the software themselves after downloading easily cause errors that prevent normal use of the PCA software.
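As an illustration of the dimensionality reduction described above, the following minimal sketch applies PCA with scikit-learn. The data are synthetic and every variable name is an assumption made for this example; it is not taken from the patented script.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 30 samples with 8 features,
# where almost all variance lies along 2 hidden directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(30, 8))

# Reduce the 8 original dimensions to 2 principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # shape (30, 2)

# The principal component scores are linearly uncorrelated:
# the off-diagonal entries of their covariance matrix are close to zero.
cov = np.cov(scores, rowvar=False)

print(scores.shape)
print(pca.explained_variance_ratio_.sum())
```

Because the synthetic data were built from two hidden directions, two components retain nearly all of the variance here; real spectra will generally need more.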
Disclosure of Invention
In order to achieve the above purpose, the present invention provides the following technical solutions:
A PCA software comprising a Python script and editable software files. The Python script uses the following packages: PANDAS, NUMPY, SCIKIT-LEARN and MATPLOTLIB. The whole workflow comprises file input and output, interaction with the user, data preprocessing, principal component analysis, drawing of visualizations and calculation of metrics. The input is text data of the original spectra, containing all peak positions and peak intensities; the input files are divided into groups as required, each group representing a sample group. The output principal component analysis results comprise a bar chart of the proportion of variance explained by each principal component, score scatter plots using the principal components as extracted features, and a loading plot of the peaks for each principal component. The software automatically reads mass spectrum information from txt data files in a specified format and extracts the principal components. The first 5 most important principal components are combined in pairs, serving as the x axis and the y axis respectively, to draw score scatter plots; at the same time, for the first 5 most important principal components, the program calculates the mean center and variance of the values and then draws a 90% confidence interval.
The PCA software is named Advanced Spectra PCA Toolbox, and the download address of the Advanced Spectra PCA Toolbox software is (https://docs.anaconda.com/anaconda/install/).
The installation method of the PCA software comprises the following steps:
Step S1: after the software is downloaded, open the installation package and decompress it;
Step S2: locate the unit mass spectrum txt data in the decompressed software files and export the unit mass spectrum txt data;
Step S3: locate the PCA7.py file and its accompanying PCA subfolder, then change the path of the PCA7.py file and its accompanying PCA subfolder;
Step S4: group the mass spectrum files by column name and, once grouping is complete, encode and replace the mass spectrum files;
Step S5: determine the output files, locate the output plots, and set the group names shown in the plots;
Step S6: locate the Anaconda Powershell Prompt.exe program and run it as administrator;
Step S7: enter the program and view the PCA output results.
Compared with the prior art, the invention has the beneficial effects that:
The application provides the download address and detailed download location of the PCA software, and gives a complete and clear explanation of the installation, parameter modification and use of the downloaded software. This solves the problem that existing PCA software cannot be downloaded and used conveniently and quickly, so that mistakes made when users install the software themselves after downloading easily cause errors that prevent normal use of the PCA software.
Detailed Description
In the following, the technical solutions in the embodiments of the present invention will be clearly and completely described in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the embodiment of the invention, the following technical scheme is provided:
The SIMS mass spectrometry data batch processing workflow and the automated PCA analysis are performed by custom-written Python scripts and editable software files (Editable Files of software.zip), in a free, open-source and portable Python-based scientific environment named WINPYTHON (v3.6.7.0, https://winpython.github.io/). The packages used in the script are PANDAS (v0.23.4), NUMPY (v1.15), SCIKIT-LEARN (v0.20.2) and MATPLOTLIB (v3.0.2). The whole workflow comprises file input and output, interaction with the user, data preprocessing, principal component analysis, drawing of visualizations and calculation of metrics. The input is text data of the original spectra, containing all peak positions and peak intensities. The input files must be divided into groups as required, each group representing a sample group. The output principal component analysis results include a bar chart of the proportion of variance explained by each principal component, score scatter plots using the principal components as extracted features, and a loading plot of the peaks for each principal component. Details of downloading and installing the Python packages are given in the manual (Document S1, Manual.docx, in the supporting information).
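Since the description names specific package versions, it can help to check what is actually installed before running the script. The sketch below is one possible check and is not part of the patented software; it assumes Python 3.8 or later for `importlib.metadata` (newer than the WINPYTHON release quoted above), and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata  # assumption: Python 3.8+ is available

# Versions quoted in the description, keyed by the usual distribution names.
EXPECTED = {
    "pandas": "0.23.4",
    "numpy": "1.15",
    "scikit-learn": "0.20.2",
    "matplotlib": "3.0.2",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Print the quoted version next to the installed one for a quick comparison.
for name, version in installed_versions(EXPECTED).items():
    print(f"{name}: described {EXPECTED[name]}, installed {version}")
```

A mismatch does not necessarily mean the script will fail; the comparison only shows where the environment differs from the one described.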
The program automatically reads the mass spectrum information from txt data files in a specified format (details of the txt data format are given in the operation manual) and extracts the principal components. The first 5 most important principal components are combined in pairs, serving as the x axis and the y axis respectively, to draw score scatter plots. For each individual data group, the program calculates the mean center and variance of the values and then draws a 90% confidence interval. Combining the score plots and loading plots obtained in this way reveals deeper information about compositional differences between samples.
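The pairing of components and the 90% confidence region can be sketched as follows. The description does not say how the interval is constructed, so this example assumes a bivariate normal model and derives an ellipse from the group mean and covariance; the function name `confidence_ellipse` is hypothetical.

```python
import math
from itertools import combinations

import numpy as np

# Pairwise combinations of the first 5 principal components: 10 plots in total.
pc_pairs = list(combinations(range(1, 6), 2))


def confidence_ellipse(points, level=0.90):
    """Return (center, semi_axes, angle_deg) for one group of 2-D scores.

    Assumes the scores are roughly bivariate normal (an assumption of this
    sketch, not a statement about the patented script).
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    # Chi-square quantile with 2 degrees of freedom has a closed form.
    chi2_q = -2.0 * math.log(1.0 - level)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_axes = np.sqrt(eigvals[::-1] * chi2_q)  # (major, minor)
    major = eigvecs[:, -1]  # direction of the largest variance
    angle_deg = math.degrees(math.atan2(major[1], major[0]))
    return center, semi_axes, angle_deg


# Small made-up group of scores on two components.
group_scores = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
center, semi_axes, angle_deg = confidence_ellipse(group_scores)
print(len(pc_pairs), center, semi_axes, angle_deg)
```

The returned center, axis lengths and angle are the values a plotting library would need to draw the ellipse around each group.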
For verification, in one embodiment the software is named Advanced Spectra PCA Toolbox. The software was uploaded to the website on the filing date, and the download address of the Advanced Spectra PCA Toolbox software is (https://docs.anaconda.com/anaconda/install/). The program is downloaded by following the installation steps on the web page, and once the download is complete, the option for installing on Microsoft Windows is selected.
The method for installing the software comprises the following steps:
Step S1: after the software is downloaded, open the installation package and decompress it;
After the software is installed, the PCA file and the PCA folder are placed in one folder of the software (in this application the two are typically placed on the C drive, in the scripts subfolder of the Anaconda folder).
Step S2: locate the unit mass spectrum txt data in the decompressed software files and export the unit mass spectrum txt data;
Step S3: locate the PCA7.py file and its accompanying PCA subfolder, then change the path of the PCA7.py file and its accompanying PCA subfolder;
The default working path is changed via the PCA7.py file and its accompanying PCA folder. For example: the PCA file is stored on the C drive in the scripts subfolder of the Anaconda folder, and the accompanying PCA folder is stored as the PCA subfolder of that scripts folder. Open the PCA7.py file with Notepad and find the line that sets the default working path; the value of pcaDir points to the PCA subfolder of the scripts subfolder of the Anaconda folder on the C drive. If necessary, adjust the path in the PCA7.py file so that it matches the actual storage path.
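A working-path setting of this kind might look like the sketch below. Only the name `pcaDir` comes from the description; the path value and the helper `check_working_dir` are hypothetical, and how PCA7.py actually uses the variable is not shown in the source.

```python
from pathlib import Path

# Hypothetical value: replace it with the folder where the PCA subfolder
# is actually stored on your machine.
pcaDir = Path("C:/Anaconda/scripts/PCA")


def check_working_dir(path):
    """Return a short message saying whether the working folder exists."""
    path = Path(path)
    if path.is_dir():
        return f"OK: {path}"
    return f"Missing: {path} (edit pcaDir to match the actual storage path)"


print(check_working_dir(pcaDir))
```

Checking the folder before the analysis starts turns a path mismatch into a clear message instead of a file-not-found error later in the run.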
Step S4: group the mass spectrum files by column name and, once grouping is complete, encode and replace the mass spectrum files;
To group the mass spectrum files by column name, first open the txt data file in Excel, cut the relevant data columns and paste them into a new Excel file. The mass spectrum data are grouped by changing the column names. Make sure that the first column is named Mass (u). In this process, the Arabic numeral of each group is added in front of the corresponding mass spectrum column name.
Reference example one:
(1) The original column names are file names written in English letters; after regrouping and renaming, an Arabic numeral is added in front of each name. The new file is then saved under the name test (for example on the desktop) as a TXT file;
(2) The new file is then pasted to the corresponding path: the data folder in the PCA subfolder of the scripts subfolder of the Anaconda folder on the C drive.
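The column-name grouping can be illustrated with pandas. The exact txt layout is defined in the operation manual, which is not reproduced here, so this sketch assumes a tab-separated file whose first column is `Mass (u)` and whose remaining column names start with the group number; the sample data and the function `read_grouped_spectra` are made up for the example.

```python
import io
import re

import pandas as pd

# Made-up example in the assumed layout: tab-separated, first column "Mass (u)",
# other column names prefixed with the Arabic numeral of their group.
example_txt = (
    "Mass (u)\t1A\t1B\t2C\t2D\n"
    "1.0\t10\t12\t3\t4\n"
    "12.0\t5\t6\t20\t22\n"
)


def read_grouped_spectra(source):
    """Read the table and map each group number to its column names."""
    table = pd.read_csv(source, sep="\t")
    if table.columns[0] != "Mass (u)":
        raise ValueError("the first column must be named 'Mass (u)'")
    table = table.set_index("Mass (u)")
    groups = {}
    for name in table.columns:
        match = re.match(r"\d+", str(name))
        if match is None:
            raise ValueError(f"column {name!r} has no leading group number")
        groups.setdefault(int(match.group()), []).append(name)
    return table, groups


table, groups = read_grouped_spectra(io.StringIO(example_txt))
print(groups)
```

Passing a file path instead of the `io.StringIO` object reads a file on disk in the same way.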
Step S5: determine the output files, locate the output plots, and set the group names shown in the plots;
Reference example two:
(1) Determine each group name shown in the output plots: locate the group name file in the PCA subfolder of the scripts subfolder of the Anaconda folder on the C drive;
(2) Open the file, which holds the group names displayed in the final plots, and rename or enter each group name after its Arabic numeral; the numerals are written with a leading 0.
Step S6: locate the Anaconda Powershell Prompt.exe or Anaconda.exe program and run it as administrator;
Reference example three:
(1) Find the program named Anaconda Powershell Prompt (Anaconda).exe in the Start menu and run it as administrator;
(2) Enter the command "cd C:\Anacondas\scripts" and press Enter to change to that folder, then enter the command "python pca7.py" and press Enter to run it.
Step S7: enter the program and view the PCA output results.
Reference example four:
Open the output folder in the PCA subfolder of the scripts subfolder of the Anaconda folder on the C drive and check the PCA output results, which comprise the following parts:
a. 10 two-dimensional score plots of pairwise combinations of PC1-PC5, with confidence intervals;
b. 10 two-dimensional score plots of pairwise combinations of PC1-PC5, without confidence intervals;
c. 5 individual one-dimensional score plots of PC1-PC5;
d. a PC1-PC5 score table;
e. 5 PC1-PC5 loading plots;
f. 5 PC1-PC5 top-20 loading tables;
g. a PC1-PC10 "percent explained variance" bar chart;
h. a PC1-PC10 "percent explained variance" table;
i. a PC1-PC5 loading table;
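Two of the tabular outputs listed above, the "percent explained variance" table and a top-20 loading table, can be approximated with scikit-learn as in this sketch. The spectra are random numbers, the peak positions are hypothetical, and ranking loadings by absolute value is an assumption; the patented script may select positive and negative loadings differently.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in data (an assumption): 12 spectra with 40 peak positions.
rng = np.random.default_rng(1)
masses = np.arange(1, 41)
X = rng.random((12, masses.size))

pca = PCA(n_components=10).fit(X)

# Table of percent explained variance for PC1-PC10.
explained = pd.DataFrame({
    "PC": [f"PC{i}" for i in range(1, 11)],
    "percent explained variance": 100 * pca.explained_variance_ratio_,
})


def top_loadings(pc_index, n=20):
    """Return the n peaks with the largest absolute loading on one component."""
    loadings = pd.Series(pca.components_[pc_index], index=masses)
    order = loadings.abs().sort_values(ascending=False).index[:n]
    return loadings.loc[order]


print(explained.head())
print(top_loadings(0).head())
```

`explained` can be written out with `DataFrame.to_csv` or plotted as a bar chart, and `top_loadings(k)` gives the rows for component k+1.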
To change the size of the plot coordinate system or the image resolution, open the pca7.py file, locate the corresponding parameters on lines 16-32, edit the parameter values, then save the file and run it again. The values can also be modified according to the confidence limits in the following table; see the table for details.
TABLE 1 confidence limits
Reference example five:
If the number of principal components (PCs) needs to be increased or decreased, open the pca7.py file, locate the corresponding content, modify the corresponding parameters, then save the pca7.py file and run it again.
Reference example six:
In the 10 two-dimensional score plots of pairwise PC1-PC5 combinations with confidence intervals, the 10 two-dimensional score plots without confidence intervals, and the 5 individual one-dimensional PC1-PC5 score plots, if the sizes and colors in the plots are to be changed, open the pca7.py file, find the highlighted items and edit them, then save the file and run it again.
Reference example seven:
In the 10 two-dimensional score plots of pairwise PC1-PC5 combinations with confidence intervals, if the plot proportions, the label font size, font family and font weight, or the line style, line width, color and transparency of the confidence interval are to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example eight:
In the 10 two-dimensional score plots of pairwise PC1-PC5 combinations without confidence intervals, if the plot proportions or the label font size, font family and font weight are to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example nine:
In the 5 individual one-dimensional PC1-PC5 score plots, if the plot proportions or the label font size, font family and font weight are to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example ten:
In the 5 individual PC1-PC5 loading plots, if the plot proportions, the number of extracted loadings, the label font size, font family and font weight, or the size, color and text size of the bars are to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example eleven:
In the PC1-PC10 "percent explained variance" bar chart, if the plot proportions, the label font size, font family and font weight, or the size and color of the bars are to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example twelve:
In the 5 individual PC1-PC5 loading plots, if the number of labeled loading peaks needs to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
Reference example thirteen:
In the 5 individual PC1-PC5 loading tables, if the number of positive and negative loadings in the tables is to be changed, open the pca7.py file, find the highlighted parameters and edit them, then save the file and run it again.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (1)

1. A method for installing PCA software, characterized in that:
the PCA software comprises a Python script and editable software files, wherein the packages used in the Python script comprise PANDAS, NUMPY, SCIKIT-LEARN and MATPLOTLIB; the whole workflow comprises file input and output, interaction with the user, data preprocessing, principal component analysis, drawing of visualizations and calculation of metrics; the input is text data of the original spectra, containing all peak positions and peak intensities; the input files are divided into groups as required, each group representing a sample group; the output principal component analysis results comprise a bar chart of the proportion of variance explained by each principal component, score scatter plots using the principal components as extracted features, and a loading plot of the peaks for each principal component; mass spectrum information is automatically read from txt data files in a specified format and the principal components are extracted; the first 5 most important principal components are combined in pairs, serving as the x axis and the y axis respectively, to draw score scatter plots; at the same time, for the first 5 most important principal components, the program calculates the mean center and variance of the values and then draws a 90% confidence interval;
the installation method of the PCA software comprises the following steps:
step S1: after the PCA software is downloaded, opening the installation package and decompressing it;
step S2: locating the unit mass spectrum txt data in the decompressed files of the PCA software and exporting the unit mass spectrum txt data;
step S3: locating the PCA7.py file and its accompanying PCA subfolder, then changing the path of the PCA7.py file and its accompanying PCA subfolder;
step S4: grouping the unit mass spectrum txt data files by column name and, once grouping is complete, encoding and replacing the unit mass spectrum txt data files;
step S5: determining the output files, locating the output plots, and setting the group names shown in the plots;
step S6: locating the program named Anaconda Powershell Prompt.exe and running it as administrator;
step S7: entering the program and viewing the output results of the PCA software.
CN202310763998.XA 2022-11-29 2023-06-27 PCA software installation method Active CN116755724B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022115098632 2022-11-29
CN202211509863 2022-11-29

Publications (2)

Publication Number Publication Date
CN116755724A CN116755724A (en) 2023-09-15
CN116755724B CN116755724B (en) 2024-02-02

Family

ID=87960609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310763998.XA Active CN116755724B (en) 2022-11-29 2023-06-27 PCA software installation method

Country Status (1)

Country Link
CN (1) CN116755724B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6301708B1 (en) * 1998-11-12 2001-10-09 Hewlett-Packard Company Software installation process using abstract data and program files
CN104820602A (en) * 2015-05-18 2015-08-05 北京瑞星信息技术有限公司 Method, device and system for publishing software package
WO2016000623A1 (en) * 2014-07-01 2016-01-07 北京奇虎科技有限公司 Method, apparatus and system for initializing intelligent terminal device

Also Published As

Publication number Publication date
CN116755724A (en) 2023-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant