CN112309577B - Multi-mode feature selection method for optimizing parkinsonism voice data - Google Patents

Publication number
CN112309577B
CN112309577B CN202011078465.0A CN202011078465A CN112309577B CN 112309577 B CN112309577 B CN 112309577B CN 202011078465 A CN202011078465 A CN 202011078465A CN 112309577 B CN112309577 B CN 112309577B
Authority
CN
China
Prior art keywords
individual
optimal
individuals
population
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011078465.0A
Other languages
Chinese (zh)
Other versions
CN112309577A (en)
Inventor
胡晓敏
张首荣
李敏
陈伟能
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202011078465.0A
Publication of CN112309577A
Application granted
Publication of CN112309577B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application discloses a multi-mode feature selection method for optimizing parkinsonism voice data, which comprises the following steps: establishing a parkinsonism voice data set, initializing a population based on a particle swarm algorithm, and determining the characteristic character string of each individual according to a real number coding scheme; dividing the individuals in the population into niches according to their adaptation values; updating the historical optimal value and the historical optimal position of each individual, and updating the position and the adaptation value of the optimal individual in each niche; updating the position and the speed of each individual, and evaluating the adaptation value of each individual according to its characteristic character string in combination with the parkinsonism voice data set; taking the updated individuals as a new population and comparing it with the initialized population to obtain a new generation population; screening the two populations, retaining their optimal individuals and eliminating repeated individuals to obtain the new generation population for evolution; and outputting all optimal individuals of each generation, whose feature combinations are used to assist in judging whether Parkinson's disease is present.

Description

Multi-mode feature selection method for optimizing parkinsonism voice data
Technical Field
The application relates to the technical fields of medical technology and evolutionary computing, and in particular to a multi-modal feature selection method for reducing the dimensionality of parkinsonism voice data.
Background
The etiology of parkinsonism is currently unknown, and the disease cannot be completely cured. Detecting the disease at an early stage is therefore of great significance for improving patients' quality of life and for treating parkinsonism. Researchers have proposed various approaches to assist doctors in diagnosing Parkinson's disease, including diagnosis from hand-drawn signals and prediction from voice data. In recent years, more and more researchers have analyzed voice data with voice signal processing algorithms, machine learning algorithms, support vector machines and the like to determine Parkinson's disease. Voice data collected by researchers has also been published in the UCI database, such as the parkinsonism voice data set of Max Little at Oxford University and the parkinsonism voice data set of Olcay et al. However, voice data sets collected in real life often have a large number of dimensions, so the calculation cost and time cost increase greatly when such a data set is analyzed and trained on a large number of instances.
Evolutionary algorithms have low calculation cost, fast convergence and a simple, understandable structure, and have been widely applied to the feature selection problem. However, when applied to feature selection, conventional evolutionary algorithms can provide only one solution, whether the problem is formulated as multi-objective optimization or as unimodal optimization.
Disclosure of Invention
The application aims to provide a multi-mode feature selection method for optimizing parkinsonism voice data, which is used for overcoming the defects that the existing algorithm is high in time cost and can only provide a single prediction scheme.
In order to realize the tasks, the application adopts the following technical scheme:
a multi-modal feature selection method for optimizing parkinsonism voice data comprises the following steps:
extracting original parkinsonism voice data, determining attributes and labels and establishing parkinsonism voice data sets; wherein the attribute represents a collection standard of voice data, and the label represents whether a person corresponding to the voice data is healthy or diseased;
initializing a population based on a particle swarm algorithm, and initializing a position range of an individual according to the dimension of the Parkinson voice data set, thereby determining a search space; determining characteristic character strings of individuals according to a real number coding scheme;
randomly dividing the whole population according to the individual adaptation value, and dividing the individuals in the population into niches;
updating the historical optimal value and the historical optimal position of each individual, and guiding the searching direction of the individual; updating the position and the adaptation value of the optimal individual in each niche, taking the optimal individual of each niche as the global optimal individual of all the individuals of the niche, and further guiding the searching direction of the individuals;
updating the position and the speed of each individual, and evaluating the adaptation value of the individual according to the characteristic character string of each individual and combining the Parkinson voice data set; the updated individuals are used as new populations and compared with the old populations to obtain new generation populations;
screening the new generation population and the old population, retaining the optimal individuals of the two populations, and eliminating repeated individuals to obtain the new generation population for evolution;
outputting all optimal individuals of each generation; the combination of features of all of the optimal individuals will be applied to the prediction of parkinson's disease.
Further, the step of initializing the position range of the individual according to the dimension of the Parkinson voice data set so as to determine a search space; determining the characteristic string of the individual according to the real number coding scheme comprises:
the position of the individual in each dimension is restricted to the interval [0,1], and each individual randomly initializes a real number in this interval for each dimension; each dimension of an individual corresponds to one attribute of the data set, and each individual is a potential solution; the continuous real value of each position is converted into a discrete 0/1 character string by a coding scheme that maps $x_i^d$, the position of the i-th individual in the d-th dimension, to a binary value for that dimension;
after conversion to a character string, each individual represents a potential solution and each bit of an individual represents an attribute of the data set, where 0 represents that the attribute is valid and 1 represents that the attribute is invalid.
Further, the random division of the whole population according to the individual adaptation value, and the division of the individuals in the population into niches comprises the following steps:
firstly, setting the size N of each sub-population, then sorting the individuals of the whole population according to the adaptive value, selecting the optimal individuals P in the current population, calculating the distance between all the individuals and the optimal individuals P, and finding out N-1 individuals nearest to the optimal individuals P, wherein the N-1 individuals and the optimal individuals P form a niche; and finally, removing N individuals forming the niche from the population, and repeating the steps until all the individuals are separated in the corresponding niche.
Further, the updating the historical optimal value and the historical optimal position of each individual includes:
if the current adaptation value of the individual is better than the historical optimal value, the current position $x_i(t+1)$ and adaptation value $fit(x_i(t+1))$ of the individual replace the historical optimal position and the historical optimal adaptation value;
if the adaptation value of the current individual is equal to the historical optimal adaptation value, a random value is drawn and compared with a threshold of 0.5, so that the historical optimal position is updated with the position of the current individual with probability one half; if the adaptation value of the current individual is worse than the historical optimal adaptation value, the historical optimal position and the historical optimal adaptation value are retained.
Further, the position and the speed of each individual are updated according to the following formulas:
$$v_i^d(t+1) = w\,v_i^d(t) + c_1 r_1\left(pbest_i^d(t) - x_i^d(t)\right) + c_2 r_2\left(lbest_k^d(t) - x_i^d(t)\right)$$
$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$$
where k denotes the k-th niche, i denotes the i-th individual, d denotes the d-th dimension, $v_i^d(t)$ represents the speed of the individual at generation t, $pbest_i^d(t)$ represents the historical optimal position of the individual, $lbest_k^d(t)$ represents the position of the optimal individual of the k-th niche, $x_i^d(t)$ represents the position of the individual at generation t, w is the inertia weight, c1 and c2 are the two acceleration coefficients, and r1 and r2 are random values drawn from a uniform distribution on the interval [0,1].
Further, the step of comparing the updated individuals as a new population with the initialized population to obtain a new generation population includes:
firstly, preserving the optimal individuals of two populations, and if the number of the optimal individuals is larger than or equal to the size of the population at the moment, preserving the individuals with the number equal to the size of the population; if the optimal individual number is smaller than the population size, all the optimal individuals are saved, the optimal individuals are selected from the two populations to make up for the missing population number, and finally the population evolving for the new generation is obtained.
Further, the outputting all optimal individuals of each generation, the feature combinations of all optimal individuals being used to assist in determining whether the parkinson's disease is present, includes:
selecting optimal individuals from the new generation population, and storing the optimal individuals in an external set, and if the optimal individuals in the current generation are better than the optimal individuals stored in the external set in the previous old population, firstly emptying the external set and then storing the optimal individuals in the current generation; if the current optimal individual is equivalent to the optimal individual stored in the external set in the adaptation value, judging whether the optimal individual is repeated, adding the external set without repeating, and discarding the repetition; if the current optimal individual is worse than the optimal individual previously stored in the external set, directly discarding the current optimal individual; whereby this external set is updated continuously with evolving algebra; when the search is completed, all the optimal individuals of each generation of external set are output, and the characteristic combination of the optimal individuals represents each scheme capable of correctly judging the Parkinson disease.
A multi-modal feature selection apparatus for optimizing parkinsonism data, comprising:
the input module is used for extracting original parkinsonism voice data, determining attributes and labels and establishing parkinsonism voice data sets; wherein the attribute represents a collection standard of voice data, and the label represents whether a person corresponding to the voice data is healthy or diseased;
the initialization module is used for initializing a population based on a particle swarm algorithm and initializing the position range of an individual according to the dimension of the Parkinson voice data set so as to determine a search space; determining characteristic character strings of individuals according to a real number coding scheme;
the division module is used for randomly dividing the whole population according to the individual adaptation value and dividing the individuals in the population into niches;
the individual updating module is used for updating the historical optimal value and the historical optimal position of each individual and guiding the searching direction of the individual; updating the position and the adaptation value of the optimal individual in each niche, taking the optimal individual of each niche as the global optimal individual of all the individuals of the niche, and further guiding the searching direction of the individuals;
the population updating module is used for updating the position and the speed of each individual and evaluating the adaptation value of the individual according to the characteristic character string of each individual and combining the Parkinson voice data set; the updated individuals are used as new populations and compared with the old populations to obtain new generation populations;
the screening module is used for screening the new generation population and the old population, retaining the optimal individuals of the two populations, and eliminating repeated individuals to obtain the new generation population for evolution;
the output module is used for outputting all optimal individuals of each generation; the combination of features of all of the optimal individuals will be applied to the prediction of parkinson's disease.
A computer comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor executing the computer program to perform the steps of the multimodal feature selection method for optimizing parkinsonism data.
A computer readable storage medium storing a computer program which when executed by a processor performs the steps of the multimodal feature selection method for optimizing parkinsonism data.
Compared with the prior art, the application has the following technical characteristics:
1. the feature selection problem of optimizing the parkinsonism voice data is treated as a multi-mode discrete optimization problem, so that the data dimension can be effectively reduced, the time and the cost can be reduced, and a plurality of alternatives can be found out to predict parkinsonism.
2. A niche technique based on species clusters is used to divide the sub-populations. Conventional particle swarm algorithms tend to be unimodal optimized when used to study feature selection problems, i.e., find an optimal solution. According to the application, the particle swarm algorithm is expanded from unimodal optimization to multi-modal optimization by using a niche technology based on species swarm, and individuals evolve in each niche, so that more optimal solutions can be found.
3. A screening rule for generating a next generation population is designed, selection is made among optimal individuals of the new and old populations, and the optimal non-repeated individuals can be ensured to be continued to the next generation population. The screening rule can save different optimal schemes to guide the searching direction of the population, is beneficial to the maintenance of the diversity of the population and avoids sinking into a certain local optimal area.
4. Screening and preservation rules for the most preferred solution are designed. The optimal individuals generated in each generation are stored in an external set, so that the searching capability of an algorithm can be fully embodied, and the loss of the optimal individuals in the exploration process is avoided. And at the end of the search, all the alternatives for predicting parkinson's disease can be obtained by analyzing all the individuals retained by the external set.
Drawings
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is a pseudo-code schematic of a species-based clustered niche algorithm;
FIG. 3 is a pseudo code schematic diagram of a screening process to generate a next generation population;
FIG. 4 is a schematic diagram of pseudocode for screening and preserving optimal individuals using an external set.
Detailed Description
Because the parkinsonism voice data collected in real life has a large number of dimensions, training on it directly significantly increases the time cost and the calculation cost. In addition, because the attributes of the voice data include redundant and irrelevant features, using the voice data directly without screening may even reduce the classification accuracy and lead medical staff to misjudge Parkinson's disease.
In order to reduce the dimension of voice data and the detection cost, and rapidly judge the result, the parkinsonism voice data set needs to be subjected to feature selection preprocessing operation. However, the evolutionary algorithm for researching the characteristic selection problem is biased to unimodal optimization, and only one solution is found. The multi-mode particle swarm algorithm designed by the application not only can effectively reduce the dimensionality of parkinsonism voice data, but also provides a plurality of reliable schemes for predicting parkinsonism.
The method initializes a population in combination with the parkinsonism voice data set, divides it into a plurality of sub-populations according to a niche rule, and evolves each individual within its sub-population to find an optimal position. After the individuals have found new positions, the new population is compared with the old population to determine the next generation population, and the optimal individuals are saved as feature sets for determining Parkinson's disease.
Referring to fig. 1, the multi-modal feature selection method for optimizing parkinsonism voice data according to the present application comprises the following steps:
Step 1, extracting the original parkinsonism data, determining attributes and labels, and establishing a parkinsonism data set.
The original parkinsonism data is extracted, the attributes and labels are determined, a data set is established, and the data is stored in an external document in LIBSVM format. The attributes represent the collection criteria of the voice data, while the labels represent whether the person is healthy or ill; the external document holding the parkinsonism data is used to evaluate the performance of each individual in the population. In the LIBSVM format, each entry has the form (dimension value: data value): each attribute of the data set corresponds to a dimension value, and the data under that attribute corresponds to the data value of that dimension. Each record stored in LIBSVM format therefore shows clearly how many dimensions it has and the specific data value of each dimension.
Since the external document holds the pre-processed parkinsonism data set, the subsequent step obtains parkinsonism data by reading the external document.
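As an illustration of how such an external document might be read, the following minimal sketch parses one record in the standard LIBSVM text layout (`label index:value index:value ...`, with 1-based indices). The function name `parse_libsvm_line`, the sample record and the label convention are hypothetical and are not taken from the application.

```python
from typing import List, Tuple


def parse_libsvm_line(line: str, n_features: int) -> Tuple[int, List[float]]:
    """Parse one 'label idx:val idx:val ...' record into (label, dense features)."""
    parts = line.split()
    label = int(float(parts[0]))          # first token is the class label
    features = [0.0] * n_features         # dimensions not listed default to 0.0
    for item in parts[1:]:
        idx, val = item.split(":")
        features[int(idx) - 1] = float(val)  # LIBSVM indices are 1-based
    return label, features


# Hypothetical record: label 1, values given for dimensions 1 and 3 of 4.
sample = "1 1:0.5 3:2.25"
label, features = parse_libsvm_line(sample, n_features=4)
print(label, features)
```

A dense list is used here only for readability; a sparse representation would serve equally well for high-dimensional voice data.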
Step 2, initializing a population based on a particle swarm algorithm. The population size is set according to user requirements; in consideration of the calculation cost, it is set to 30 here. The position range of each individual is initialized according to the dimension of the parkinsonism data set, which determines the search space.
The position of the individual in each dimension is restricted to the interval [0,1], and each individual randomly initializes a real number in this interval for each dimension; each dimension of an individual corresponds to one attribute of the data set, and each individual is a potential solution. Because the position of each individual in the traditional particle swarm algorithm is a continuous real value, which is not suitable for the discrete feature selection problem, a real number coding scheme is adopted to convert the continuous real value of each position into a discrete 0/1 character string. The coding scheme maps $x_i^d$, the position of the i-th individual in the d-th dimension, to a binary value for that dimension.
After conversion to a character string, each individual represents a potential solution and each bit of an individual represents an attribute of the data set, where 0 represents that the attribute is valid and 1 represents that the attribute is invalid.
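The initialization and encoding described above could look roughly like the sketch below. The exact conversion rule is not given in the text, so the threshold comparison (`threshold=0.5`) is purely an assumption made for illustration, as are the function names. The convention that 0 marks a valid attribute follows the description.

```python
import random
from typing import List


def init_position(n_dims: int, rng: random.Random) -> List[float]:
    """Draw one real number per dimension (rng.random() yields values in [0, 1))."""
    return [rng.random() for _ in range(n_dims)]


def encode(position: List[float], threshold: float = 0.5) -> List[int]:
    """ASSUMPTION: a simple threshold rule; the application's own rule is not shown."""
    return [1 if x > threshold else 0 for x in position]


def valid_attributes(bits: List[int]) -> List[int]:
    """Per the description, a 0 bit marks the attribute as valid (selected)."""
    return [d for d, b in enumerate(bits) if b == 0]


rng = random.Random(0)
position = init_position(5, rng)
bits = encode(position)
print(bits, valid_attributes(bits))
```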
Step 3, randomly dividing the whole population according to the individual adaptation value, and dividing the individuals in the population into niches.
The whole population is divided by a niche rule based on species clusters: firstly, the size N of each sub-population is set; then the individuals of the whole population are sorted by adaptation value, the optimal individual P in the current population is selected, the distance between every individual and the optimal individual P is calculated, and the N-1 individuals nearest to the optimal individual P are found, these N-1 individuals and the optimal individual P forming a niche; finally, the N individuals forming the niche are removed from the population, and the steps are repeated until every individual has been assigned to a niche.
It should be noted that the Hamming distance must be used as the distance criterion, since the feature selection problem for parkinsonism voice data is solved in a discrete space. At this point, additional space is required to hold all the individuals of this population, designated oldSwarm, for later comparison with the new population. Fig. 2 illustrates the process of dividing the niches.
In this step, a niche technique based on species clusters is adopted to divide the sub-populations, each of which represents a potential peak. In the subsequent steps, operations suited to multi-modal optimization are chosen so that individuals evolve and search within their respective niches, and several alternatives for predicting Parkinson's disease may thus be available at the end of evolution.
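The niche division of step 3 can be sketched as follows. This is a minimal illustration, assuming that a larger adaptation value is better and that individuals are already encoded as 0/1 lists; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two 0/1 strings differ."""
    return sum(x != y for x, y in zip(a, b))


def partition_into_niches(bits: List[List[int]], fitness: List[float],
                          niche_size: int) -> List[List[int]]:
    """Return niches as lists of individual indices; the first index is the niche best."""
    # Sort indices by adaptation value, best first (ASSUMPTION: larger is better).
    remaining = sorted(range(len(bits)), key=lambda i: fitness[i], reverse=True)
    niches = []
    while remaining:
        seed = remaining[0]  # optimal individual P of the remaining population
        others = sorted(remaining[1:], key=lambda i: hamming(bits[i], bits[seed]))
        members = [seed] + others[:niche_size - 1]  # P plus its N-1 nearest
        niches.append(members)
        chosen = set(members)
        remaining = [i for i in remaining if i not in chosen]
    return niches


toy_bits = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
toy_fit = [0.9, 0.8, 0.7, 0.6]
print(partition_into_niches(toy_bits, toy_fit, niche_size=2))
```

If the population size is not a multiple of the niche size, the last niche in this sketch simply ends up smaller.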
Step 4, updating the historical optimal position $pbest_i(t)$ and the historical optimal value $fit(pbest_i(t))$ of each individual, which guide the search direction of the individual. If the current adaptation value of the individual is better than the historical optimal value, the current position $x_i(t+1)$ and adaptation value $fit(x_i(t+1))$ of the individual replace the historical optimal position and the historical optimal adaptation value;
if the adaptation value of the current individual is equal to the historical optimal adaptation value, a random value is drawn and compared with a threshold of 0.5, so that the historical optimal position is updated with the position of the current individual with probability one half; if the adaptation value of the current individual is worse than the historical optimal adaptation value, the historical optimal position and the historical optimal adaptation value are retained. The specific formula is as follows:
$$pbest_i(t+1) = \begin{cases} x_i(t+1), & fit(x_i(t+1)) > fit(pbest_i(t)) \\ x_i(t+1), & fit(x_i(t+1)) = fit(pbest_i(t)) \text{ and } rand < 0.5 \\ pbest_i(t), & \text{otherwise} \end{cases}$$
where i represents the i-th individual of the population, t represents the current generation, and rand is a random value in [0,1].
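The three cases of the historical-best update can be written compactly as below. This is a sketch under the assumption that a larger adaptation value is better; `update_pbest` and its parameter names are illustrative only.

```python
import random
from typing import List, Tuple


def update_pbest(pbest_pos: List[float], pbest_fit: float,
                 new_pos: List[float], new_fit: float,
                 rng: random.Random) -> Tuple[List[float], float]:
    """Return the (position, adaptation value) to keep as the historical best."""
    if new_fit > pbest_fit:
        # Strictly better: replace both position and value.
        return list(new_pos), new_fit
    if new_fit == pbest_fit and rng.random() < 0.5:
        # Tie: adopt the new position with probability one half.
        return list(new_pos), pbest_fit
    # Worse (or tie lost): keep the historical best unchanged.
    return pbest_pos, pbest_fit


rng = random.Random(42)
print(update_pbest([0.2, 0.8], 0.75, [0.6, 0.1], 0.80, rng))
```

Accepting equally good positions half of the time lets an individual drift between feature subsets of the same quality, which suits the multi-modal goal of finding several equivalent solutions.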
Step 5, updating the position $lbest_k$ and the adaptation value $fit(lbest_k)$ of the optimal individual of each niche; the optimal individual of each niche serves as the global optimal individual for all individuals of that niche, further guiding the search direction of the individuals.
The traditional global optimal individual is suitable only for unimodal optimization. Once the population has been divided into several sub-populations, the traditional approach is no longer appropriate for multi-modal optimization, so the optimal individual of each niche is used in its place in this scheme.
Step 6, updating the position and the speed of each individual, wherein the speed and position updating formulas are as follows:
$$v_i^d(t+1) = w\,v_i^d(t) + c_1 r_1\left(pbest_i^d(t) - x_i^d(t)\right) + c_2 r_2\left(lbest_k^d(t) - x_i^d(t)\right)$$
$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$$
where k denotes the k-th niche, i denotes the i-th individual, d denotes the d-th dimension, $v_i^d(t)$ represents the speed of the individual at generation t, $pbest_i^d(t)$ represents the historical optimal position of the individual, $lbest_k^d(t)$ represents the position of the optimal individual of the k-th niche, and $x_i^d(t)$ represents the position of the individual at generation t; w is the inertia weight, which controls the influence of the previous speed on the current speed; c1 and c2 are two acceleration coefficients, which control the influence of the individual's historical optimal position and of the niche's optimal position on the search process; r1 and r2 are random values drawn from a uniform distribution on the interval [0,1].
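A minimal sketch of this update is given below, assuming the canonical particle swarm form with the niche best taking the place of the global best. The default values of `w`, `c1` and `c2` are commonly used settings and are not taken from the application; clamping positions back into [0,1] is likewise an assumption based on the stated position range.

```python
import random
from typing import List, Tuple


def update_particle(x: List[float], v: List[float],
                    pbest: List[float], lbest: List[float],
                    w: float = 0.7298, c1: float = 1.49618, c2: float = 1.49618,
                    rng: random.Random = random.Random(0)
                    ) -> Tuple[List[float], List[float]]:
    """Return (new_position, new_velocity) for one individual of niche k."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # uniform random values in [0, 1)
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])   # pull towards the individual's own best
              + c2 * r2 * (lbest[d] - x[d]))  # pull towards the niche best
        xd = min(1.0, max(0.0, x[d] + vd))    # ASSUMPTION: keep position in [0, 1]
        new_v.append(vd)
        new_x.append(xd)
    return new_x, new_v


x_new, v_new = update_particle([0.2, 0.9], [0.0, 0.1], [0.4, 0.8], [0.1, 0.6])
print(x_new, v_new)
```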
In this step, the generated new and old populations are combined and the next generation population is screened from them. To reduce the probability of collisions among optimal individuals and to slow the loss of optimal individuals, the optimal individuals are selected from the new and old populations, and at most N non-repeated optimal individuals are stored, where N does not exceed the population size M. If the number of optimal individuals does not reach the population size, (M-N) individuals are randomly selected from the sub-optimal individuals to make up the remainder. This rule ensures that the discovered optimal individuals are carried over to the next generation as far as possible.
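The screening rule just described might be sketched as follows. Individuals are simplified here to `(bits, adaptation value)` pairs with `bits` as a tuple, a larger adaptation value is assumed to be better, and the function name is hypothetical; the full method would also carry positions and speeds along.

```python
import random
from typing import List, Tuple

Individual = Tuple[Tuple[int, ...], float]  # (0/1 string, adaptation value)


def screen_next_generation(old: List[Individual], new: List[Individual],
                           pop_size: int, rng: random.Random) -> List[Individual]:
    """Keep non-repeated optimal individuals, then fill up from the rest at random."""
    merged = old + new
    best_fit = max(f for _, f in merged)
    best, seen = [], set()
    for bits, f in merged:
        if f == best_fit and bits not in seen:  # optimal and not a duplicate
            seen.add(bits)
            best.append((bits, f))
    if len(best) >= pop_size:
        return best[:pop_size]                  # N is capped at the population size M
    rest = [ind for ind in merged if ind[1] < best_fit]  # sub-optimal individuals
    fill = rng.sample(rest, min(pop_size - len(best), len(rest)))
    return best + fill


old_pop = [((0, 1), 0.9), ((1, 1), 0.5)]
new_pop = [((0, 1), 0.9), ((1, 0), 0.9)]
print(screen_next_generation(old_pop, new_pop, pop_size=3, rng=random.Random(0)))
```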
Step 7, after the state of each individual is updated, a 1NN classifier is applied according to the characteristic character string of each individual, in combination with the parkinsonism voice data, to evaluate the adaptation value of the individual, the adaptation value being the classification accuracy. The updated individuals are then taken as a new population and compared with the population oldSwarm of step 3 to obtain a new generation population; the specific process is as follows:
firstly, preserving the optimal individuals of two populations, and if the number of the optimal individuals is larger than or equal to the size of the population, preserving the individuals with the number corresponding to the size of the population. If the optimal number of individuals is less than the population size, then all optimal individuals are saved and the preferred individual from the two populations is selected to compensate for the missing population number. Finally, the population evolving for the new generation is obtained. Fig. 3 illustrates the population screening process visually.
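The adaptation value evaluation in step 7 can be illustrated with a small 1NN sketch. The evaluation protocol is not specified in the text, so leave-one-out validation and squared Euclidean distance are assumptions made here, and the toy data is invented purely to exercise the code. Bits equal to 0 mark valid attributes, as stated in the description.

```python
from typing import List


def one_nn_accuracy(X: List[List[float]], y: List[int], bits: List[int]) -> float:
    """Leave-one-out 1NN accuracy using only attributes whose bit is 0 (valid)."""
    cols = [d for d, b in enumerate(bits) if b == 0]
    if not cols:
        return 0.0  # no attribute selected: treat as the worst adaptation value
    correct = 0
    for i in range(len(X)):
        best_j, best_dist = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # leave the query sample out of its own neighbour search
            dist = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None and y[best_j] == y[i]:
            correct += 1
    return correct / len(X)


# Invented toy data: attribute 0 separates the classes, attribute 1 misleads.
X_toy = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [1.1, -5.0]]
y_toy = [0, 0, 1, 1]
print(one_nn_accuracy(X_toy, y_toy, [0, 1]), one_nn_accuracy(X_toy, y_toy, [1, 0]))
```

On this toy set, selecting only the informative attribute gives a higher accuracy than selecting only the misleading one, which is the signal the search relies on.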
Step 8, outputting all optimal individuals of each generation, wherein the feature combinations of all optimal individuals are used to assist in judging whether Parkinson's disease is present.
Selecting optimal individuals from the new generation population, and storing the optimal individuals in an external set, and if the current generation optimal individuals are better than the optimal individuals stored in the external set in the previous old population, firstly emptying the external set and then storing the current generation optimal individuals; if the current optimal individual is equivalent to the optimal individual stored in the external set in the adaptation value, judging whether the optimal individual is repeated, adding the external set without repeating, and discarding the repetition; if the current optimal individual is worse than the optimal individual previously stored in the external set of storage, directly discarding the current optimal individual; whereby this external set is updated continuously with evolving algebra; when the search is completed, all the optimal individuals of each generation of external set are output, and the characteristic combination of the optimal individuals represents each scheme capable of correctly judging the Parkinson disease. Figure 4 illustrates the screening process for the external set of optimal individuals.
This step uses an external archive set to preserve the optimal non-duplicate individuals of each generation. Because optimal individuals may be lost during the search, storing each generation's optimal individuals in an external set allows the search capability of the algorithm to be fully reflected. After the next-generation population has been produced, all of its optimal non-duplicate individuals are extracted and screened against the external set, which ensures that the external set always holds the optimal non-duplicate individuals. From the 01 feature string of each optimal individual, the truly effective attribute columns that can accurately predict Parkinson's disease are identified, and whether a person suffers from Parkinson's disease is determined from that combination of attribute columns.
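The three-way update rule for the external set can be sketched as below. This is a minimal illustration under stated assumptions: the archive is held as a list of 01 feature strings that all share one fitness value, a larger fitness value is better, and the name `update_archive` is hypothetical.

```python
def update_archive(archive, archive_fit, candidates, cand_fit):
    """Screen one generation's optimal individuals against the external set.

    archive     -- feature strings currently stored (all with fitness archive_fit)
    archive_fit -- fitness of the stored individuals, or None if the set is empty
    candidates  -- feature strings of the current generation's optimal individuals
    cand_fit    -- their common fitness value
    """
    if archive_fit is None or cand_fit > archive_fit:
        # Better than everything stored: empty the set, then store.
        archive, archive_fit = [], cand_fit
    elif cand_fit < archive_fit:
        # Worse than the stored individuals: discard directly.
        return list(archive), archive_fit
    else:
        archive = list(archive)

    # Equal fitness (or freshly emptied set): add only non-duplicates.
    for bits in candidates:
        if bits not in archive:
            archive.append(bits)
    return archive, archive_fit
```

Calling the function once per generation keeps the archive limited to the best non-duplicate feature strings seen so far.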
Step 9: Apply all optimal individuals of each generation in the external set to predict Parkinson's disease. As the population evolves through the search space, the fitness value of each individual is evaluated on the Parkinson's data set by a 1NN classifier. In the present application the fitness value is the classification accuracy; it measures the quality of an individual and thereby guides the search of the population. The external set stores all optimal non-duplicate individuals found during population evolution, that is, all non-duplicate individuals with the highest classification accuracy. These optimal individuals have been verified on the Parkinson's data set and are therefore credible.
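A fitness evaluation of this kind might look like the following sketch. It assumes leave-one-out evaluation and squared Euclidean distance, neither of which is specified in the text, and it follows the convention of the worked example in which a 1 in the feature string marks a selected attribute. The function name `fitness_1nn` is hypothetical.

```python
def fitness_1nn(bits, X, y):
    """Leave-one-out 1NN accuracy using only the columns selected by `bits`.

    bits -- 01 feature string, one character per attribute (1 = selected)
    X    -- list of samples, each a list of numeric attribute values
    y    -- list of labels (e.g. 0 = healthy, 1 = Parkinson's disease)
    """
    cols = [d for d, b in enumerate(bits) if b == "1"]
    if not cols:
        return 0.0  # no attribute selected: treated as worst case (assumption)

    correct = 0
    for i, xi in enumerate(X):
        best_j, best_dist = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue  # leave the query sample out
            dist = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)
```

The returned accuracy can be used directly as the fitness value of the individual whose feature string is `bits`.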
The feature combination of each optimal individual represents the selected attribute columns of the data set. A healthcare worker or researcher therefore only needs to select or match these attribute columns and can then determine, from a speech perspective, whether a person suffers from Parkinson's disease on the basis of that combination of attribute columns. In practical application, healthcare workers or researchers first collect voice data from a number of individuals and label each recording as healthy or Parkinson's disease according to the individual's health state. They then train with the method provided by the present application to find all optimal feature combinations, and finally use these feature combinations to determine whether an individual has Parkinson's disease, that is, to assign a label to an individual whose label is unknown.
Based on the above process, once the Parkinson's voice data has been obtained, the multi-modal particle swarm algorithm designed in the present application iteratively selects the attributes that accurately discriminate the voice data and removes redundant attributes, which reduces computational and time cost and can even further improve the accuracy of the determination. In addition, while preserving accuracy, the application can find multiple optimal feature solutions; these 01 feature strings represent multiple schemes for determining Parkinson's disease.
Examples:
To demonstrate the effect and significance of the multi-modal feature selection method provided by the present application, the Oxford University Parkinson's disease detection data set is used as an example. The data set contains 195 voice recordings, each with 22 dimensions. The experimental results are shown in the table:
In the table, a 1 in the character string indicates that the corresponding attribute of the voice data is selected, and a 0 indicates that it is not selected. Without feature selection, that is, when all attribute columns are used, the classification accuracy is only 96.41%. With the feature selection method designed in the present application, the classification accuracy rises to 99.49%; the number of required attributes is reduced and at least three prediction schemes are found. This demonstrates that the feature selection method of the present application not only reduces the data dimension and improves the classification accuracy of predicting Parkinson's disease, but also provides several prediction schemes.
According to another aspect of the present application, there is provided a multi-modal feature selection apparatus for optimizing parkinsonism voice data, comprising:
the input module is used for extracting original parkinsonism voice data, determining attributes and labels and establishing parkinsonism voice data sets; wherein the attribute represents a collection standard of voice data, and the label represents whether a person corresponding to the voice data is healthy or diseased;
the initialization module is used for initializing a population based on a particle swarm algorithm and initializing the position range of an individual according to the dimension of the Parkinson voice data set so as to determine a search space; determining characteristic character strings of individuals according to a real number coding scheme;
the division module is used for randomly dividing the whole population according to the individual adaptation value and dividing the individuals in the population into niches;
the individual updating module is used for updating the historical optimal value and the historical optimal position of each individual and guiding the searching direction of the individual; updating the position and the adaptation value of the optimal individual in each niche, taking the optimal individual of each niche as the global optimal individual of all the individuals of the niche, and further guiding the searching direction of the individuals;
the population updating module is used for updating the position and the speed of each individual and evaluating the adaptation value of the individual according to the characteristic character string of each individual and combining the Parkinson voice data set; the updated individuals are used as new populations and compared with the old populations to obtain new generation populations;
the screening module is used for screening the new generation population and the old population, retaining the optimal individuals of the two populations, and eliminating repeated individuals to obtain the new generation population for evolution;
the output module is used for outputting all optimal individuals of each generation; the combination of features of all of the optimal individuals will be applied to the prediction of parkinson's disease.
It should be noted that, for the specific functions and explanations of the above modules, reference is made to the corresponding steps 1 to 9 of the foregoing method embodiment; they are not repeated here.
An embodiment of the present application further provides a computer, which may be a terminal device, a controller, a server, or the like. The computer comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the steps of the multi-modal feature selection method for optimizing parkinsonism voice data, for example the aforementioned steps 1 to 9.
The computer program may also be split into one or more modules/units that are stored in the memory and executed by the processor to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program in the terminal device. For example, the computer program may be divided into an input module, an initialization module, a division module, an individual update module, a population update module, a screening module, and an output module; the functions of each module are described in the foregoing apparatus and are not repeated here.
An embodiment of the present application further provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described multi-modal feature selection method for optimizing parkinsonism voice data, for example the aforementioned steps 1 to 9.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (8)

1. A multi-modal feature selection method for optimizing parkinsonism voice data, comprising the steps of:
extracting original parkinsonism voice data, determining attributes and labels and establishing parkinsonism voice data sets; wherein the attribute represents a collection standard of voice data, and the label represents whether a person corresponding to the voice data is healthy or diseased;
initializing a population based on a particle swarm algorithm, and initializing a position range of an individual according to the dimension of the Parkinson voice data set, thereby determining a search space; determining characteristic character strings of individuals according to a real number coding scheme;
randomly dividing the whole population according to the individual adaptation value, and dividing the individuals in the population into niches;
updating the historical optimal value and the historical optimal position of each individual, and guiding the searching direction of the individual; updating the position and the adaptation value of the optimal individual in each niche, taking the optimal individual of each niche as the global optimal individual of all the individuals of the niche, and further guiding the searching direction of the individuals;
updating the position and the speed of each individual, and evaluating the adaptation value of the individual according to the characteristic character string of each individual and combining the Parkinson voice data set; the updated individuals are used as new populations and compared with the old populations to obtain new generation populations;
screening the new generation population and the old population, retaining the optimal individuals of the two populations, and eliminating repeated individuals to obtain the new generation population for evolution;
outputting all optimal individuals of each generation; all of the optimal individual feature combinations will be applied to the prediction of parkinson's disease;
the updating of the historical optimal value and the historical optimal position of each individual comprises:
if the current fitness value of the individual is better than the historical optimal value, the position $x_i(t+1)$ of the current individual and its fitness value $fit(x_i(t+1))$ are used to replace the historical optimal position and the historical optimal fitness value;
if the adaptation value of the current individual is equal to the historical optimal adaptation value, selecting a random value to be compared with a threshold value of 0.5, and updating the historical optimal position by using the position of the current individual with half probability; if the adaptation value of the current individual is smaller than the historical optimal adaptation value, the historical optimal position and the optimal adaptation value are reserved;
the update of the position and velocity of each individual is formulated as follows:
$$v_{i,d}^{k}(t+1) = w\,v_{i,d}^{k}(t) + c_1 r_1 \left(pbest_{i,d}^{k} - x_{i,d}^{k}(t)\right) + c_2 r_2 \left(nbest_{d}^{k} - x_{i,d}^{k}(t)\right)$$
$$x_{i,d}^{k}(t+1) = x_{i,d}^{k}(t) + v_{i,d}^{k}(t+1)$$
where k denotes the k-th niche, i denotes the i-th individual, d denotes the d-th dimension, $v_{i,d}^{k}(t)$ denotes the velocity of the individual at generation t, $pbest_{i,d}^{k}$ denotes the historical optimal position of the individual, $nbest_{d}^{k}$ denotes the position of the optimal individual of the k-th niche, $x_{i,d}^{k}(t)$ denotes the position of the individual at generation t, w is the inertia weight, c1 and c2 are two acceleration coefficients, and r1 and r2 are random values drawn from a uniform distribution on the interval [0,1].
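For illustration, the historical-best update and the movement of one individual within its niche might be sketched as follows. The sketch assumes the standard particle swarm velocity and position update, which matches the symbol definitions given above (inertia weight w, acceleration coefficients c1 and c2, uniform random values r1 and r2). Drawing r1 and r2 independently for every dimension and clamping positions to [0,1] are assumptions, and the names `update_pbest` and `move` are hypothetical.

```python
import random

def update_pbest(pbest_pos, pbest_fit, pos, fit, rng=random):
    """Update an individual's historical optimal position and fitness."""
    if fit > pbest_fit:
        return list(pos), fit                 # strictly better: replace both
    if fit == pbest_fit and rng.random() < 0.5:
        return list(pos), pbest_fit           # tie: replace position with probability 1/2
    return pbest_pos, pbest_fit               # worse: keep the historical optimum

def move(pos, vel, pbest_pos, nbest_pos, w, c1, c2, rng=random):
    """One velocity/position update, guided by the niche's optimal individual."""
    new_pos, new_vel = [], []
    for x, v, p, n in zip(pos, vel, pbest_pos, nbest_pos):
        r1, r2 = rng.random(), rng.random()   # per-dimension draws (assumption)
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (n - x)
        new_vel.append(v_next)
        new_pos.append(min(1.0, max(0.0, x + v_next)))  # clamp to [0,1] (assumption)
    return new_pos, new_vel
```

Accepting an equally fit position half of the time lets an individual drift between equally good solutions instead of staying fixed on the first one found.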
2. The method of claim 1, wherein the determining the search space is performed by initializing a range of locations of the individual based on dimensions of the parkinson's voice data set; determining the characteristic string of the individual according to the real number coding scheme comprises:
the range of each dimension position of the individual is set in the interval of [0,1], and the individual randomly initializes a real number for each dimension position in the interval; each dimension of an individual corresponds to each attribute of the dataset, each individual being a potential solution; the continuous real value of the individual position is converted into a discrete 01 character string, and the coding scheme is as follows:
where $b_{i,d}$ denotes the binary value of the d-th dimension of the i-th individual, and $x_{i,d}$ denotes the position of the d-th dimension of the i-th individual; after conversion to a string, each individual represents a potential solution, each bit of an individual represents an attribute of the dataset, 0 represents that the attribute is valid, and 1 represents that the attribute is invalid.
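A minimal sketch of the initialization and of the real-to-binary conversion is given below. The coding formula itself is not reproduced in this text, so the threshold of 0.5 used here is an assumption, as are the function names `init_position` and `to_feature_string`.

```python
import random

def init_position(n_dims, rng=random):
    """Randomly initialize one real number in [0, 1) per attribute of the data set."""
    return [rng.random() for _ in range(n_dims)]

def to_feature_string(position, threshold=0.5):
    """Convert continuous positions to a discrete 01 string.

    A dimension becomes '1' when its value exceeds `threshold` (assumed rule),
    otherwise '0'.
    """
    return "".join("1" if x > threshold else "0" for x in position)
```

For a 22-dimensional data set, `to_feature_string(init_position(22))` yields a 22-character string with one character per attribute.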
3. The method for multimodal feature selection for optimizing parkinsonian speech data according to claim 1, wherein said randomly partitioning the entire population according to the individual fitness value, partitioning the individuals in the population into niches, comprises:
firstly, setting the size N of each sub-population, then sorting the individuals of the whole population according to the adaptive value, selecting the optimal individuals P in the current population, calculating the distance between all the individuals and the optimal individuals P, and finding out N-1 individuals nearest to the optimal individuals P, wherein the N-1 individuals and the optimal individuals P form a niche; and finally, removing N individuals forming the niche from the population, and repeating the steps until all the individuals are separated in the corresponding niche.
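The niche construction described in this claim can be sketched as follows. Euclidean distance between positions is an assumption (the text only says "distance"), a larger fitness value is taken to be better, and the name `partition_into_niches` is hypothetical. If the population size is not a multiple of N, the last niche is simply smaller.

```python
import math

def partition_into_niches(positions, fitness, niche_size):
    """Split a population into niches; returns lists of individual indices.

    The best remaining individual seeds each niche and takes its
    (niche_size - 1) nearest remaining neighbours with it.
    """
    # Indices sorted by fitness, best first.
    remaining = sorted(range(len(positions)), key=lambda i: fitness[i], reverse=True)
    niches = []
    while remaining:
        seed = remaining[0]  # optimal individual P of the current population
        others = sorted(
            remaining[1:],
            key=lambda i: math.dist(positions[i], positions[seed]),
        )
        members = [seed] + others[: niche_size - 1]
        niches.append(members)
        taken = set(members)
        # Remove the niche's members; the fitness order of the rest is preserved.
        remaining = [i for i in remaining if i not in taken]
    return niches
```

The first index in each returned niche is that niche's optimal individual, which then guides the other members of the niche.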
4. The method of claim 1, wherein comparing the updated individuals as new populations to the initialized populations to obtain new generation populations comprises:
firstly, preserving the optimal individuals of two populations, and if the number of the optimal individuals is larger than or equal to the size of the population at the moment, preserving the individuals with the number equal to the size of the population; if the optimal individual number is smaller than the population size, all the optimal individuals are saved, the optimal individuals are selected from the two populations to make up for the missing population number, and finally the population evolving for the new generation is obtained.
5. The method of claim 1, wherein outputting all optimal individuals for each generation, the combination of features of all optimal individuals being used to assist in determining whether a parkinson's disease is present, comprises:
selecting optimal individuals from the new generation population, and storing the optimal individuals in an external set, and if the optimal individuals in the current generation are better than the optimal individuals stored in the external set in the previous old population, firstly emptying the external set and then storing the optimal individuals in the current generation; if the current optimal individual is equivalent to the optimal individual stored in the external set in the adaptation value, judging whether the optimal individual is repeated, adding the external set without repeating, and discarding the repetition; if the current optimal individual is worse than the optimal individual previously stored in the external set, directly discarding the current optimal individual; whereby this external set is updated continuously with evolving algebra; when the search is completed, all the optimal individuals of each generation of external set are output, and the characteristic combination of the optimal individuals represents each scheme capable of correctly judging the Parkinson disease.
6. A multi-modal feature selection apparatus for optimizing parkinsonism data, comprising:
the input module is used for extracting original parkinsonism voice data, determining attributes and labels and establishing parkinsonism voice data sets; wherein the attribute represents a collection standard of voice data, and the label represents whether a person corresponding to the voice data is healthy or diseased;
the initialization module is used for initializing a population based on a particle swarm algorithm and initializing the position range of an individual according to the dimension of the Parkinson voice data set so as to determine a search space; determining characteristic character strings of individuals according to a real number coding scheme;
the division module is used for randomly dividing the whole population according to the individual adaptation value and dividing the individuals in the population into niches;
the individual updating module is used for updating the historical optimal value and the historical optimal position of each individual and guiding the searching direction of the individual; updating the position and the adaptation value of the optimal individual in each niche, taking the optimal individual of each niche as the global optimal individual of all the individuals of the niche, and further guiding the searching direction of the individuals;
the population updating module is used for updating the position and the speed of each individual and evaluating the adaptation value of the individual according to the characteristic character string of each individual and combining the Parkinson voice data set; the updated individuals are used as new populations and compared with the old populations to obtain new generation populations;
the screening module is used for screening the new generation population and the old population, retaining the optimal individuals of the two populations, and eliminating repeated individuals to obtain the new generation population for evolution;
the output module is used for outputting all optimal individuals of each generation; all of the optimal individual feature combinations will be applied to the prediction of parkinson's disease;
the updating of the historical optimal value and the historical optimal position of each individual comprises:
if the current fitness value of the individual is better than the historical optimal value, the position $x_i(t+1)$ of the current individual and its fitness value $fit(x_i(t+1))$ are used to replace the historical optimal position and the historical optimal fitness value;
if the adaptation value of the current individual is equal to the historical optimal adaptation value, selecting a random value to be compared with a threshold value of 0.5, and updating the historical optimal position by using the position of the current individual with half probability; if the adaptation value of the current individual is smaller than the historical optimal adaptation value, the historical optimal position and the optimal adaptation value are reserved;
the update of the position and velocity of each individual is formulated as follows:
$$v_{i,d}^{k}(t+1) = w\,v_{i,d}^{k}(t) + c_1 r_1 \left(pbest_{i,d}^{k} - x_{i,d}^{k}(t)\right) + c_2 r_2 \left(nbest_{d}^{k} - x_{i,d}^{k}(t)\right)$$
$$x_{i,d}^{k}(t+1) = x_{i,d}^{k}(t) + v_{i,d}^{k}(t+1)$$
where k denotes the k-th niche, i denotes the i-th individual, d denotes the d-th dimension, $v_{i,d}^{k}(t)$ denotes the velocity of the individual at generation t, $pbest_{i,d}^{k}$ denotes the historical optimal position of the individual, $nbest_{d}^{k}$ denotes the position of the optimal individual of the k-th niche, $x_{i,d}^{k}(t)$ denotes the position of the individual at generation t, w is the inertia weight, c1 and c2 are two acceleration coefficients, and r1 and r2 are random values drawn from a uniform distribution on the interval [0,1].
7. A computer comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the multimodal feature selection method for optimizing parkinsonism data according to any of claims 1 to 5 are carried out by the processor when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the multimodal feature selection method for optimizing parkinsonism data according to any of claims 1 to 5.
CN202011078465.0A 2020-10-10 2020-10-10 Multi-mode feature selection method for optimizing parkinsonism voice data Active CN112309577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011078465.0A CN112309577B (en) 2020-10-10 2020-10-10 Multi-mode feature selection method for optimizing parkinsonism voice data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011078465.0A CN112309577B (en) 2020-10-10 2020-10-10 Multi-mode feature selection method for optimizing parkinsonism voice data

Publications (2)

Publication Number Publication Date
CN112309577A CN112309577A (en) 2021-02-02
CN112309577B true CN112309577B (en) 2023-10-13

Family

ID=74489493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011078465.0A Active CN112309577B (en) 2020-10-10 2020-10-10 Multi-mode feature selection method for optimizing parkinsonism voice data

Country Status (1)

Country Link
CN (1) CN112309577B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361563B (en) * 2021-04-22 2022-11-25 重庆大学 Parkinson's disease voice data classification system based on sample and feature double transformation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663100A (en) * 2012-04-13 2012-09-12 西安电子科技大学 Two-stage hybrid particle swarm optimization clustering method
CN108595499A (en) * 2018-03-18 2018-09-28 西安财经学院 A kind of population cluster High dimensional data analysis method of clone's optimization

Also Published As

Publication number Publication date
CN112309577A (en) 2021-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant