WO2021139112A1 - Data dimensionality reduction processing method and apparatus, computer device, and storage medium - Google Patents

Data dimensionality reduction processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2021139112A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
initial
feature
standard
target
Application number
PCT/CN2020/099242
Other languages
French (fr)
Chinese (zh)
Inventor
张旭
刘伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139112A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a data dimensionality reduction processing method, device, computer equipment and storage medium.
  • a data dimensionality reduction processing method is provided.
  • a data dimensionality reduction processing method including:
  • Non-linear dimensionality reduction processing is performed on the standard data to obtain target data of preset dimensions.
  • a data dimensionality reduction processing device including:
  • the analysis module is used to extract relevant historical analysis models from the database, and analyze the historical analysis models to obtain initial features with a priority higher than the target priority;
  • the target feature generation module is used to obtain the frequency of the initial feature, and select the initial feature whose frequency meets the requirements as the target feature;
  • the data extraction module is used to extract the initial data corresponding to the target feature in the multi-dimensional data;
  • the same data level data processing module is used to perform the same data level data processing on different types of said initial data to obtain standard data;
  • the dimensionality reduction processing module is used to perform non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
  • a computer device includes a memory and one or more processors, and the memory stores computer-readable instructions.
  • the one or more processors execute the following step:
  • Non-linear dimensionality reduction processing is performed on the standard data to obtain target data of preset dimensions.
  • One or more computer-readable storage media storing computer-readable instructions.
  • the one or more processors perform the following steps:
  • Non-linear dimensionality reduction processing is performed on the standard data to obtain target data of preset dimensions.
  • the generated target data is generated based on multi-dimensional data, and there is an association with the multi-dimensional data, so that the characteristics of the multi-dimensional data can be maintained, and subsequent data processing and analysis can be performed through the target data.
  • it can save the resource consumption of the system for data processing and analysis, and can improve the efficiency of data processing.
  • Fig. 1 is an application scenario diagram of the data dimensionality reduction processing method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a data dimensionality reduction processing method according to one or more embodiments.
  • Fig. 3 is a schematic flowchart of a target feature determination step according to one or more embodiments.
  • Fig. 4 is a schematic flowchart of a data missing detection step according to one or more embodiments.
  • Fig. 5 is a structural block diagram of a data dimensionality reduction processing device according to one or more embodiments.
  • Fig. 6 is a block diagram of a computer device according to one or more embodiments.
  • the data dimensionality reduction processing method provided in this application can be applied to the application environment as shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through the network.
  • The user can start the data processing by triggering the terminal 102.
  • The server 104 extracts the historical analysis model from the database and analyzes it to obtain the initial features.
  • The server 104 may select the target feature according to the frequency of the acquired initial features, and then extract the initial data from the multi-dimensional data based on the selected target feature.
  • After obtaining the initial data, the server 104 may also perform data processing of the same data level and dimensionality reduction processing on it to obtain target data of preset dimensions, thereby reducing the amount of data in subsequent processing and improving the efficiency of data processing.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • a data dimensionality reduction processing method is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step 202 Extract a historical analysis model from the database, and analyze the historical analysis model to obtain an initial feature with a priority higher than the target priority.
  • the database may be the database of the server.
  • the historical analysis model may be a model pre-configured in the server database.
  • the historical analysis model configured in the server database may include models corresponding to different data types and different business types, including but not limited to models related to medical insurance or models related to disease types, for example, a diabetes surgery analysis model, a routine examination analysis model, etc.
  • the server may filter the historical analysis models stored in the database according to the filter conditions, and then extract the historical analysis models obtained by the filter.
  • the server uses the medical insurance field as a filter condition to filter and extract a historical analysis model related to medical insurance from the database.
  • the initial characteristics refer to the characteristics obtained after analyzing the historical analysis model, which can include, but are not limited to, cost characteristics, frequency characteristics, and various indicator characteristics.
  • the initial feature mentioned here refers to the meaning of the feature, and does not involve specific feature data of the feature.
  • the cost characteristics may include, but are not limited to, surgical fees, drug fees, inspection fees, etc.
  • the frequency characteristics include, but are not limited to, the number of medical visits, the number of inspections, the number of operations, the number of purchases of drugs, etc.
  • the various indicator characteristics may include, but are not limited to, height, weight, heart rate, blood pressure, hemoglobin content, platelet count, glucose content, urine protein, etc.
  • the target priority refers to a preset priority level, which can be a priority level preset by the server, such as high priority, low priority, or the like, or it can be level 1, level 2, or level 3, etc.
  • the target priority can be different according to the data type or business type, and this application does not impose any restrictions on this.
  • the priority of the initial feature can be associated with the historical analysis model.
  • for different historical analysis models, the initial features extracted by the server can be different.
  • for example, for the diabetes surgery analysis model, the initial features obtained by the server after analysis can be features such as operation fee, number of operations, and glucose content.
  • for the routine examination analysis model, the initial features obtained by the server after analysis can be features such as height, weight, blood pressure, heart rate, and vision.
  • Step 204 Obtain the frequency of the initial feature, and select the initial feature whose frequency meets the requirements as the target feature.
  • the frequency of the initial feature may be the frequency of the initial feature in different historical analysis models.
  • for different initial features, the frequency of appearance in the historical analysis models can differ. For example, the operation fee may appear in many historical analysis models and therefore have a higher frequency, while the glucose content may appear only in the diabetes surgery analysis model and therefore have a lower frequency.
  • the frequency of an initial feature can be determined from how often the feature appears across the different historical analysis models.
  • a frequency that meets the requirements can mean that the frequency of the initial feature in the historical analysis models satisfies a certain threshold condition, or that the initial features are sorted by their frequency in the historical analysis models and those whose rank satisfies a certain requirement are taken as the initial features whose frequency meets the requirements.
  • the server may select the initial feature according to the frequency of the initial feature in the extracted historical analysis model to determine the target feature.
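  • The threshold variant of Step 204 can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_target_features`, the model names, the feature names, and the threshold of 2 are hypothetical and are not specified in the application.

```python
from collections import Counter


def select_target_features(model_features, min_frequency):
    """Return the features that appear in at least `min_frequency` models."""
    counts = Counter()
    for features in model_features.values():
        # Count each feature at most once per historical analysis model.
        counts.update(set(features))
    return sorted(name for name, n in counts.items() if n >= min_frequency)


# Hypothetical initial features obtained from two historical analysis models.
models = {
    "diabetes_surgery": ["operation_fee", "examination_fee", "glucose_content"],
    "routine_examination": ["height", "weight", "eyesight", "examination_fee"],
}

# Assumed requirement: a feature must appear in at least 2 models.
target_features = select_target_features(models, min_frequency=2)
print(target_features)
```

  • A ranking variant would sort `counts` by frequency and keep the top entries instead of applying a threshold.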
  • Step 206 Extract the initial data corresponding to the target feature in the multi-dimensional data.
  • multi-dimensional data can refer to all data stored in the database, and can include new data every time the data is changed and historical data before the change.
  • the initial data refers to the medical treatment data generated by the user's medical visits and stored under the user name. It can include historical medical treatment data and current medical treatment data, specifically including but not limited to the location of the consultation, the time of the consultation, the International Classification of Diseases (ICD), the registration department, registered doctor information, registration fee, payment method, inspection items, inspection fee, condition description, medical advice, drug list, drug price, drug dosage, payment window, drug collection window, whether to follow up, follow-up time, number of consultations, and other data.
  • the server may extract the initial data from the multi-dimensional data based on the selected target feature.
  • the initial data extracted by the server can be divided into multiple categories.
  • for medical insurance data, it may include, but is not limited to, the expense data of the current medical visit, the ICD data of the current medical visit, and the historical medical treatment data.
  • the expense data of the current medical visit can include, but is not limited to, surgery fees, drug fees, examination fees, etc.
  • the ICD data of the current medical visit can include, but is not limited to, the cost of the confirmed ICD, the average cost of the ICD, etc.
  • the historical medical treatment data can include, but is not limited to, data such as the number of local outpatient clinics, the number of local hospitalizations, the number of off-site outpatient clinics, the number of off-site hospitalizations, the proportion of local outpatient clinics, and the proportion of off-site outpatient clinics.
  • Step 208 Perform data processing of the same data level on the different types of the initial data to obtain standard data.
  • the data level (order of magnitude) of the extracted initial data may differ greatly.
  • for example, if the drug cost is 500 and the total cost is 1,000,000, the data levels of the two are quite different.
  • the server can process initial data of different data levels with a same-data-level data processing method to obtain standard data of the same data level. Continuing the previous example, data processing of the same data level is performed on the drug cost and the total cost so that both fall between 0 and 100; that is, the standard drug cost is 0.05 and the standard total cost is 100.
  • the data processing method of the same data magnitude can be selected according to the data type or the data magnitude. For example, methods such as square root, square, cube, exponent, or logarithm can be selected, which is not limited in this application.
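  • The numbers in the example above (500 and 1,000,000 becoming 0.05 and 100) are consistent with a linear rescaling by the largest value. The sketch below reproduces that arithmetic and shows one logarithmic alternative. The function names `scale_to_level` and `log_compress` are hypothetical, and the application does not state which formula is actually used.

```python
import math


def scale_to_level(values, upper=100.0):
    """Linearly rescale non-negative values so the largest one maps to `upper`."""
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [v * upper / largest for v in values]


def log_compress(values):
    """Alternative (assumed): compress magnitudes with log10(1 + v)."""
    return [math.log10(1 + v) for v in values]


costs = {"drug_cost": 500, "total_cost": 1_000_000}
standard = dict(zip(costs, scale_to_level(list(costs.values()))))
print(standard)
```

  • Linear rescaling keeps the ratio between the two costs, while a logarithm or square root narrows the gap between them; which is preferable depends on the later analysis.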
  • Step 210 Perform non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
  • the preset dimension may be a dimension preset by the user on the server through the terminal according to subsequent data processing requirements, and the data volume of the target data of the preset dimension may be less than the data volume of the standard data.
  • Non-linear dimensionality reduction processing methods can include, but are not limited to, Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), Modified Locally Linear Embedding (MLLE), Hessian Eigenmapping, Spectral Embedding, Local Tangent Space Alignment (LTSA), Multi-dimensional Scaling (MDS), t-distributed Stochastic Neighbor Embedding (t-SNE), and so on.
  • linear dimensionality reduction methods can also be used, including but not limited to principal component analysis (PCA), kernel PCA, incremental principal component analysis (Incremental PCA), etc.
  • the server can map the multi-dimensional standard data to a low dimension, for example, to 2 dimensions, by using the clustering characteristics of the data in the multi-dimensional Riemann space according to the above method, to obtain the target data.
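  • As an illustration of mapping standard data to a preset dimension, the sketch below implements classical multi-dimensional scaling (MDS), one of the methods named above, in plain Python using power iteration. It is an assumption-laden sketch: the function name `classical_mds` and the sample points are hypothetical, and with Euclidean distances classical MDS behaves like a linear projection, so it only illustrates the mapping step. A production system would more likely rely on a library implementation of Isomap, LLE, or t-SNE.

```python
import math
import random


def classical_mds(points, n_components=2, iterations=500, seed=0):
    """Embed `points` into `n_components` dimensions, preserving distances."""
    n = len(points)
    # Squared Euclidean distances between all pairs of points.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    # Double centering: B = -1/2 * J * D2 * J.
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for k in range(n_components):
        # Power iteration finds the leading eigenvector of the current B.
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation removes the component that was just extracted.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical 3-dimensional standard data reduced to a preset dimension of 2.
standard_data = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
target_data = classical_mds(standard_data, n_components=2)
```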
  • the target feature is obtained through the historical analysis model, the initial data corresponding to the target feature in the multi-dimensional data is then extracted, standard data is obtained after data processing of the same data level, and non-linear dimensionality reduction processing is performed on the standard data to obtain target data of preset dimensions.
  • the generated target data is generated based on multi-dimensional data, and there is an association with the multi-dimensional data, so that the characteristics of the multi-dimensional data can be maintained, and subsequent data processing and analysis can be performed through the target data.
  • it can save the resource consumption of the system for data processing and analysis, and can improve the efficiency of data processing.
  • the historical analysis model may be multiple, and may be models related to different data types or different business types.
  • analyzing the historical analysis model to obtain the initial feature with a priority higher than the target priority may include: analyzing each of the historical analysis models to obtain, for the corresponding historical analysis model, the initial features with a priority greater than the target priority.
  • for example, for the diabetes surgery analysis model and the routine examination analysis model, the server can analyze the two models separately to obtain initial features such as operation fee, examination fee, and glucose content, whose priority is higher than the target priority in the diabetes surgery analysis model, and features such as height, weight, eyesight, and examination fee, whose priority is higher than the target priority in the routine examination analysis model.
  • acquiring the frequency of the initial feature and selecting the initial feature whose frequency meets the requirements as the target feature may include the following process steps:
  • Step S302 Determine the feature priority and appearance frequency of each of the initial features in each of the historical analysis models.
  • the feature priority refers to the metric standard of the feature corresponding to the target priority in the historical analysis model, which can be a high, medium, or low metric.
  • for example, a feature that has high priority in the diabetes surgery analysis model can be given a high feature priority, while features such as height and weight have low priority in the diabetes surgery analysis model, so their feature priority can be low.
  • the feature priority can also be a numerical metric. For example, a high priority corresponds to 10 points, a low priority corresponds to 1 point, etc.
  • this application does not limit this.
  • Step S304 Perform a comprehensive calculation on the feature priority and appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models.
  • after the server obtains the feature priority and appearance frequency of each initial feature, it can perform a comprehensive calculation on each initial feature according to the set ratio between feature priority and appearance frequency to obtain the relative priority level of each initial feature.
  • the relative priority level can be a high, medium, or low priority metric, or it can be a priority metric expressed in the form of numerical scores such as 60, 70, 80, 100, or a combination of the two Determined priority level.
  • Step S306 Compare the relative priority level of each of the initial features in each of the historical analysis models, and determine that the initial feature whose relative priority level meets the requirements is the target feature.
  • the server can determine the target feature by comparing the relative priority levels of the initial features. For example, the initial features with a high relative priority level can be determined as the target features.
  • the feature priority and appearance frequency of each initial feature are comprehensively calculated to obtain the relative priority level of each initial feature, and the target feature is selected according to the relative priority level. Therefore, the determined target feature has a strong correlation with the multi-dimensional data, and the accuracy of the initial data extracted based on the target feature can be improved.
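  • Steps S302 to S306 can be sketched as a weighted combination of a priority score and an appearance frequency. Everything numeric here is an assumption made for illustration: the 0.6 weight, the score of 5 for medium priority, the 0 to 100 scale, and the cut-off of 60 are hypothetical. Only the 10-point and 1-point values for high and low priority come from the text.

```python
# High = 10 and low = 1 follow the text; the "medium" value is assumed.
PRIORITY_SCORE = {"high": 10, "medium": 5, "low": 1}


def relative_priority(priority, frequency, total_models, priority_weight=0.6):
    """Blend a 0-1 priority score and a 0-1 appearance frequency into 0-100."""
    priority_part = PRIORITY_SCORE[priority] / 10
    frequency_part = frequency / total_models
    blended = priority_weight * priority_part + (1 - priority_weight) * frequency_part
    return 100 * blended


# Hypothetical (feature priority, number of models containing the feature).
features = {
    "operation_fee": ("high", 2),
    "glucose_content": ("high", 1),
    "height": ("low", 1),
}
scores = {
    name: relative_priority(priority, frequency, total_models=2)
    for name, (priority, frequency) in features.items()
}

# Assumed requirement: a relative priority level of at least 60.
targets = sorted(name for name, score in scores.items() if score >= 60)
print(scores, targets)
```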
  • the data dimensionality reduction processing method may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address.
  • the initial data extracted from the multi-dimensional data may include, but is not limited to, the current medical treatment expense data, the current medical treatment ICD data, and the historical medical treatment data.
  • historical medical treatment data may include, but are not limited to, data such as the number of local outpatient clinics, the number of local hospitalizations, the number of off-site outpatient clinics, the number of off-site hospitalizations, the proportion of local outpatient visits, and the proportion of off-site outpatient visits.
  • according to the data source of the initial data and the user address corresponding to the initial data, the server may divide the historical data (the number of local outpatient visits, the number of local hospitalizations, the number of off-site outpatient visits, the number of off-site hospitalizations, the proportion of local outpatient visits, the proportion of off-site outpatient visits, and so on) into local data (the number of local outpatient visits, the number of local hospitalizations, the proportion of local outpatient visits, etc.) and remote data (the number of off-site outpatient visits, the number of off-site hospitalizations, the proportion of off-site outpatient visits, etc.).
  • after the server divides the initial data into local data and remote data, performing data processing of the same data level on different types of the initial data to obtain standard data may include: performing data processing of the same data level on the different types of local data to obtain local standard data; and performing data processing of the same data level on the different types of remote data to obtain remote standard data.
  • when the server performs data processing of the same data level on the local data and the remote data, the obtained standard data can be of the same data level or of different data levels.
  • for example, the local standard data can be data between 0 and 10 while the remote standard data can be data between 0 and 100, or both can be data between 0 and 100, which is not limited in this application.
  • the methods for performing data processing of the same order of magnitude on local data and remote data may be the same or different.
  • performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension may include: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimensions.
  • the server may select different non-linear dimensionality reduction processing methods to perform dimensionality reduction processing on the local standard data and the remote standard data according to the different characteristics of the local standard data and the remote standard data.
  • when the server performs non-linear dimensionality reduction on the local standard data and the remote standard data, the obtained local target data and remote target data can have the same or different dimensions, depending on the selected dimensionality reduction method and the data dimension required by subsequent data processing; this application does not limit this.
  • the server can continue to divide the remote medical treatment data, for example, according to specific provinces or cities, it is divided into Beijing data, Shanghai data, Guangzhou data, etc.
  • the server can also first perform data processing of the same data level on the initial data, then divide the standard data geographically based on the data source and user address, and then perform dimensionality reduction on each part to obtain local target data and remote target data of preset dimensions.
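  • The division into local and remote data, and the further division of remote data by city, can be sketched as follows. The record fields `user_city` and `source_city` and the sample records are hypothetical; the application only says that the division is based on the data source and the user address.

```python
from collections import defaultdict


def split_local_remote(records):
    """Split records by comparing the data source city with the user's city."""
    local, remote = [], []
    for record in records:
        bucket = local if record["source_city"] == record["user_city"] else remote
        bucket.append(record)
    return local, remote


def group_by_source_city(records):
    """Further divide records by the city the data came from."""
    groups = defaultdict(list)
    for record in records:
        groups[record["source_city"]].append(record)
    return dict(groups)


# Hypothetical historical records for a user whose address is in Shenzhen.
history = [
    {"user_city": "Shenzhen", "source_city": "Shenzhen", "visit_type": "outpatient"},
    {"user_city": "Shenzhen", "source_city": "Beijing", "visit_type": "outpatient"},
    {"user_city": "Shenzhen", "source_city": "Shanghai", "visit_type": "hospitalization"},
]
local_data, remote_data = split_local_remote(history)
remote_by_city = group_by_source_city(remote_data)
```

  • Each resulting group can then go through same-data-level processing and dimensionality reduction on its own, as the text describes.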
  • the above data processing may further include: dividing the initial data into expense data and frequency data according to data type.
  • the initial data may include data such as surgical fees, drug fees, inspection fees, the number of local outpatient clinics, the number of local hospitalizations, the number of off-site clinics, and the number of off-site hospitalizations.
  • the server can classify data such as surgical fees, drug fees, and inspection fees into expense data, and classify data such as the number of local outpatient clinics, the number of local hospitalizations, the number of off-site outpatient clinics, and the number of off-site hospitalizations into frequency data.
  • performing data processing of the same data level on the different types of the initial data to obtain the standard data may include: obtaining the preset formulas corresponding to the expense data and the frequency data respectively; according to the preset formula corresponding to the expense data and the corresponding standard magnitude, performing data processing of the same data level on expense data of different data magnitudes to obtain the standard data corresponding to the expense data; and according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, performing data processing of the same data level on frequency data of different data magnitudes to obtain the standard data corresponding to the frequency data.
  • the preset formula is a function formula corresponding to the data processing method of the same data magnitude described above, for example, a square root formula, a square formula, a cube formula, an exponential formula, a logarithmic formula, etc.
  • the preset formulas can be the same or different. Or, different types of data can be associated with different preset formulas for data processing.
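  • One way to associate each data type with its own preset formula and standard magnitude is a lookup table, sketched below. The choice of a logarithm for expense data, a square root for frequency data, and a standard magnitude of 10 for both are assumptions for illustration; the application leaves these choices open.

```python
import math

# Assumed presets: each data type gets a formula and a standard magnitude.
PRESETS = {
    "expense": {"formula": lambda v: math.log10(1 + v), "standard_magnitude": 10.0},
    "frequency": {"formula": math.sqrt, "standard_magnitude": 10.0},
}


def standardize(values, data_type):
    """Apply the preset formula, then rescale so the maximum hits the magnitude."""
    preset = PRESETS[data_type]
    transformed = [preset["formula"](v) for v in values]
    largest = max(transformed)
    scale = preset["standard_magnitude"] / largest if largest else 0.0
    return [t * scale for t in transformed]


expense_standard = standardize([9, 99, 999], "expense")
frequency_standard = standardize([4, 16, 100], "frequency")
```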
  • after the server extracts the initial data corresponding to the target feature in the multi-dimensional data, it can also check the integrity and polarity of the extracted initial data. For details, refer to the schematic flowchart of the data missing detection step shown in Fig. 4.
  • the above data dimensionality reduction processing method may also include the following steps:
  • Step S402 Perform data missing detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data.
  • for example, after the server extracts the initial data, it determines from the initial data that the user has been ill 3 times. However, the treatment cost data in the initial data contains only two values, 2000 and 500. The data on the number of illnesses can therefore be used to determine that treatment cost data is missing.
  • Step S404 When it is detected that the initial data has data missing, perform data filling on the data of the data type according to the data of the same type as the missing data.
  • after the server detects that treatment cost data is missing, it can determine the fill value based on the two values, 2000 and 500, contained in the treatment cost data. For example, the average or median of 2000 and 500 can be taken, and the treatment cost data can be filled with that value.
  • the server can also perform data evaluation based on the user's historical data in the initial data, and perform data filling based on the evaluation data obtained by the evaluation.
  • the data evaluation may also be based on a combination of the user's historical data and the data of the user with the same medical record to obtain the evaluation data.
  • the integrity of the acquired initial data can be improved, and the accuracy of subsequent data processing can be improved.
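  • Steps S402 and S404 can be sketched with the numbers from the example: 3 recorded illnesses but only two treatment cost values, 2000 and 500. The function name `fill_missing_costs` is hypothetical, and filling with the mean or median is just the option mentioned in the text, not the only possible strategy.

```python
from statistics import mean, median


def fill_missing_costs(visit_count, costs, strategy=mean):
    """Detect missing cost entries from the visit count and fill them in."""
    missing = visit_count - len(costs)
    if missing <= 0 or not costs:
        # Nothing is missing, or there is no same-type data to fill from.
        return list(costs)
    fill_value = strategy(costs)
    return list(costs) + [fill_value] * missing


# The user was ill 3 times, but only two treatment costs were recorded.
filled_with_mean = fill_missing_costs(3, [2000, 500])
filled_with_median = fill_missing_costs(3, [2000, 500], strategy=median)
print(filled_with_mean, filled_with_median)
```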
  • a data dimensionality reduction processing device is provided, which may include: an analysis module 100, a target feature generation module 200, a data extraction module 300, a same data level data processing module 400, and a dimensionality reduction processing module 500, in which:
  • the analysis module 100 is configured to extract relevant historical analysis models from the database, and analyze the historical analysis models to obtain initial features with a priority higher than the target priority.
  • the target feature generating module 200 is configured to obtain the frequency of the initial feature, and select the initial feature whose frequency meets the requirement as the target feature.
  • the data extraction module 300 is used to extract the initial data corresponding to the target feature in the multi-dimensional data.
  • the same data level data processing module 400 is used to perform the same data level data processing on different types of the initial data to obtain standard data.
  • the dimensionality reduction processing module 500 is configured to perform non-linear dimensionality reduction processing on the standard data to obtain target data of preset dimensions.
  • the analysis module 100 is configured to analyze each historical analysis model to obtain the initial features whose priority in the corresponding historical analysis model is greater than the target priority.
  • the target feature generation module 200 may include:
  • the first determining sub-module is used to determine the feature priority and the appearance frequency of the initial feature in the historical analysis model.
  • the calculation sub-module is used to comprehensively calculate the feature priority and appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models.
  • the comparison and determination sub-module is used to compare the relative priority level of each of the initial features in each of the historical analysis models, and determine that the initial feature whose relative priority level meets the requirements is the target feature.
  • the above-mentioned data dimensionality reduction processing device may further include:
  • the data classification module is used to, after the data extraction module 300 extracts the initial data corresponding to the target feature in the multi-dimensional data, obtain the data source of the initial data and the user address corresponding to the initial data, and divide the historical data in the initial data into local data and remote data based on the data source and the user address.
  • the data processing module 400 of the same data level may include:
  • the first data processing sub-module of the same data level is used to perform data processing of the same data level on different types of the local data to obtain local standard data.
  • the second data processing sub-module of the same data level is used to perform data processing of the same data level on different types of said remote data to obtain remote standard data.
  • the dimensionality reduction processing module 500 is configured to perform non-linear dimensionality reduction processing on the local standard data and the remote standard data, respectively, to obtain the local target data and the remote target data of a preset dimension.
  • the above-mentioned data dimensionality reduction processing device may further include:
  • the classification module is used to divide the initial data into expense type data and frequency type data according to data type, before the same data level data processing module 400 performs data processing of the same data level on different types of the initial data to obtain standard data.
  • the same data level data processing module 400 may include:
  • the obtaining sub-module is used to obtain preset formulas corresponding to the expense type data and the frequency type data respectively.
  • the third data processing sub-module of the same data level is used to perform data processing of the same data level on the expense type data of different data levels, according to the preset formula corresponding to the expense type data and the corresponding standard magnitude, to obtain the standard data corresponding to the expense type data.
  • the fourth data processing sub-module of the same data level is used to perform data processing of the same data level on the frequency type data of different data levels, according to the preset formula corresponding to the frequency type data and the corresponding standard magnitude, to obtain the standard data corresponding to the frequency type data.
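As a rough illustration of bringing expense type data and frequency type data to the same data level, the sketch below rescales each series by a power of ten so that its largest value lands at a chosen standard magnitude. The source does not give the preset formulas, so this power-of-ten rule, the function name `scale_to_magnitude`, and the sample values are assumptions.

```python
import math


def scale_to_magnitude(values, standard_magnitude=0):
    """Rescale values by a power of ten so the largest has the given order of magnitude.

    This rule is an assumed stand-in for the unspecified preset formula.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)
    # Number of decimal orders separating the data from the standard magnitude.
    shift = math.floor(math.log10(largest)) - standard_magnitude
    return [v / 10 ** shift for v in values]


expense = [1200.0, 35000.0, 480.0]  # expense type data, e.g. fees
visits = [3, 12, 1]                 # frequency type data, e.g. visit counts
standard_expense = scale_to_magnitude(expense)
standard_visits = scale_to_magnitude(visits)
```

After scaling, the largest value of each series lies between 1 and 10, so the two series can be compared or fed into the same dimensionality reduction step without one dominating the other.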
  • the above-mentioned data dimensionality reduction processing device may further include:
  • the detection module is used to, after the data extraction module 300 extracts the initial data corresponding to the target feature in the multi-dimensional data, perform missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data.
  • the filling module is used to, when missing data is detected in the initial data, fill in the data of the affected data type according to data of the same type as the missing data.
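The detection and filling modules can be sketched as below, assuming records are dictionaries, one field serves as the reference data type, and gaps are filled with the mean of the observed values of the same field. The fill rule and the names `find_missing` and `fill_missing` are hypothetical; the source only states that filling uses data of the same type as the missing data.

```python
from statistics import mean


def find_missing(records, reference_field):
    """Return (row index, field) pairs that are empty although the reference field is present."""
    fields = {f for r in records for f in r} - {reference_field}
    return [
        (i, f)
        for i, r in enumerate(records)
        if r.get(reference_field) is not None
        for f in sorted(fields)
        if r.get(f) is None
    ]


def fill_missing(records, missing):
    """Fill each gap with the mean of the observed values of the same field (assumed rule)."""
    filled = [dict(r) for r in records]  # leave the input untouched
    for i, f in missing:
        observed = [r[f] for r in records if r.get(f) is not None]
        if observed:
            filled[i][f] = mean(observed)
    return filled


records = [
    {"visit_count": 2, "exam_fee": 100.0},
    {"visit_count": 1, "exam_fee": None},
    {"visit_count": 3, "exam_fee": 200.0},
]
missing = find_missing(records, reference_field="visit_count")
filled = fill_missing(records, missing)
```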
  • Each module in the above-mentioned data dimensionality reduction processing device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile or volatile storage medium and internal memory.
  • the non-volatile or volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store historical analysis model data and various data during data processing.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instruction is executed by the processor to realize a data dimensionality reduction processing method.
  • FIG. 6 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. Specifically, the computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • a computer device includes a memory and one or more processors, the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps: extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features with a priority higher than the target priority; obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as the target features; extracting the initial data corresponding to the target features from the multi-dimensional data; performing data processing of the same data level on different types of the initial data to obtain standard data; and performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
  • the analyzing of the historical analysis models to obtain initial features with a priority higher than the target priority may include: analyzing each of the historical analysis models to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority.
  • the obtaining of the frequency of the initial features and the selecting of the initial features whose frequency meets the requirement as the target features may include: determining the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models; performing a comprehensive calculation on the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and comparing the relative priority levels of the initial features in each of the historical analysis models, and determining the initial features whose relative priority level meets the requirement as the target features.
  • after the extracting of the initial data corresponding to the target features from the multi-dimensional data, the steps performed when the processor executes the computer-readable instructions may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address.
  • the performing of data processing of the same data level on the different types of the initial data to obtain standard data includes: performing data processing of the same data level on the different types of the local data to obtain local standard data; and performing data processing of the same data level on the different types of the remote data to obtain remote standard data.
  • the performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension includes: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimension.
  • before the performing of data processing of the same data level on the different types of the initial data to obtain the standard data, the steps performed when the processor executes the computer-readable instructions may further include: dividing the initial data into expense type data and frequency type data according to data type.
  • the performing of data processing of the same data level on the different types of the initial data to obtain the standard data may include: obtaining the preset formulas corresponding to the expense type data and the frequency type data respectively; performing data processing of the same data level on the expense type data of different data levels, according to the preset formula corresponding to the expense type data and the corresponding standard magnitude, to obtain the standard data corresponding to the expense type data; and performing data processing of the same data level on the frequency type data of different data levels, according to the preset formula corresponding to the frequency type data and the corresponding standard magnitude, to obtain the standard data corresponding to the frequency type data.
  • after the extracting of the initial data corresponding to the target features from the multi-dimensional data, the steps performed when the processor executes the computer-readable instructions may further include: performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the data of the affected data type according to data of the same type as the missing data.
  • one or more computer-readable storage media store computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features with a priority higher than the target priority; obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as the target features; extracting the initial data corresponding to the target features from the multi-dimensional data; performing data processing of the same data level on different types of the initial data to obtain standard data; and performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the analyzing of the historical analysis models to obtain initial features with a priority higher than the target priority may include: analyzing each of the historical analysis models to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority.
  • the obtaining of the frequency of the initial features and the selecting of the initial features whose frequency meets the requirement as the target features may include: determining the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models; performing a comprehensive calculation on the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and comparing the relative priority levels of the initial features in each of the historical analysis models, and determining the initial features whose relative priority level meets the requirement as the target features.
  • after the extracting of the initial data corresponding to the target features from the multi-dimensional data, the steps performed when the computer-readable instructions are executed by the processor may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address.
  • the performing of data processing of the same data level on the different types of the initial data to obtain standard data includes: performing data processing of the same data level on the different types of the local data to obtain local standard data; and performing data processing of the same data level on the different types of the remote data to obtain remote standard data.
  • the performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension includes: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimension.
  • before the performing of data processing of the same data level on the different types of the initial data to obtain the standard data, the steps performed when the computer-readable instructions are executed by the processor may further include: dividing the initial data into expense type data and frequency type data according to data type.
  • the performing of data processing of the same data level on the different types of the initial data to obtain the standard data may include: obtaining the preset formulas corresponding to the expense type data and the frequency type data respectively; performing data processing of the same data level on the expense type data of different data levels, according to the preset formula corresponding to the expense type data and the corresponding standard magnitude, to obtain the standard data corresponding to the expense type data; and performing data processing of the same data level on the frequency type data of different data levels, according to the preset formula corresponding to the frequency type data and the corresponding standard magnitude, to obtain the standard data corresponding to the frequency type data.
  • after the extracting of the initial data corresponding to the target features from the multi-dimensional data, the steps performed when the computer-readable instructions are executed by the processor may further include: performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the data of the affected data type according to data of the same type as the missing data.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data dimensionality reduction processing method, relating to the technical field of artificial intelligence, and comprising: extracting a historical analysis model from a database, and analyzing the historical analysis model to obtain initial features the priorities of which are greater than a target priority (S202); obtaining the frequencies of the initial features, and selecting an initial feature having a frequency satisfying the requirements as a target feature (S204); extracting initial data corresponding to the target feature from multi-dimensional data (S206); performing data processing of the same order of magnitude on the different types of initial data to obtain standard data (S208); and performing nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension (S210).

Description

Data dimensionality reduction processing method, device, computer equipment, and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010014342.4, entitled "Data dimensionality reduction processing method, device, computer equipment, and storage medium", filed with the Chinese Patent Office on January 7, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a data dimensionality reduction processing method, device, computer equipment, and storage medium.
Background
As society develops, data of all kinds keeps growing. To obtain more valuable data, more and more professionals have begun to research and analyze various kinds of data.
However, for data that is large in volume and varied in type, the inventor realized that analyzing and processing the data directly consumes a great deal of processing time. This often leaves the hardware of the data processing system with low processing efficiency, costing both time and resources.
Summary of the invention
According to various embodiments disclosed in this application, a data dimensionality reduction processing method, device, computer equipment, and storage medium are provided.
A data dimensionality reduction processing method, including:
extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features whose priority is greater than a target priority;
obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
extracting initial data corresponding to the target features from multi-dimensional data;
performing data processing of the same data level (that is, the same order of magnitude) on the different types of the initial data to obtain standard data; and
performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
A data dimensionality reduction processing device, including:
an analysis module, configured to extract relevant historical analysis models from a database, and analyze the historical analysis models to obtain initial features whose priority is greater than a target priority;
a target feature generation module, configured to obtain the frequency of the initial features, and select the initial features whose frequency meets the requirement as target features;
a data extraction module, configured to extract initial data corresponding to the target features from multi-dimensional data;
a same data level data processing module, configured to perform data processing of the same data level on the different types of the initial data to obtain standard data; and
a dimensionality reduction processing module, configured to perform non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features whose priority is greater than a target priority;
obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
extracting initial data corresponding to the target features from multi-dimensional data;
performing data processing of the same data level on the different types of the initial data to obtain standard data; and
performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
One or more computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features whose priority is greater than a target priority;
obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
extracting initial data corresponding to the target features from multi-dimensional data;
performing data processing of the same data level on the different types of the initial data to obtain standard data; and
performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
With the above data dimensionality reduction processing method, device, computer equipment, and storage medium, target features are obtained from the historical analysis models, the initial data corresponding to the target features is extracted from the multi-dimensional data, data processing of the same data level is applied to obtain standard data, and non-linear dimensionality reduction processing is performed on the standard data to obtain target data of a preset dimension. Because the target data is generated from the multi-dimensional data and remains associated with it, it preserves the characteristics of the multi-dimensional data and can be used for subsequent data processing and analysis. Compared with processing and analyzing the multi-dimensional data directly, this saves the resources the system spends on data processing and analysis and improves data processing efficiency.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an application scenario diagram of the data dimensionality reduction processing method according to one or more embodiments.
FIG. 2 is a schematic flowchart of the data dimensionality reduction processing method according to one or more embodiments.
FIG. 3 is a schematic flowchart of the target feature determination step according to one or more embodiments.
FIG. 4 is a schematic flowchart of the missing-data detection step according to one or more embodiments.
FIG. 5 is a structural block diagram of the data dimensionality reduction processing device according to one or more embodiments.
FIG. 6 is a block diagram of the computer device according to one or more embodiments.
Detailed description
To make the technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it.
The data dimensionality reduction processing method provided in this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 over a network. The user can trigger an operation on the terminal 102 to start data processing. After receiving the data processing instruction sent by the terminal 102, the server 104 extracts the historical analysis models from the database and analyzes them to obtain initial features. The server 104 can then select target features according to the frequency of the obtained initial features and, based on the selected target features, extract the initial data from the multi-dimensional data. Further, to facilitate subsequent data processing, after obtaining the initial data the server 104 can perform data processing of the same data level and dimensionality reduction processing on it to obtain target data of a preset dimension, which reduces the amount of data in subsequent processing and improves processing efficiency. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a data dimensionality reduction processing method is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:
Step 202: extract historical analysis models from the database, and analyze the historical analysis models to obtain initial features whose priority is greater than the target priority.
The database may be the server's database. A historical analysis model may be a model preconfigured in the server database. The historical analysis models configured in the server database may include models for different data types and models related to different business types, including but not limited to models related to medical insurance or to disease types, for example a diabetes surgery analysis model or a routine examination analysis model.
Specifically, the server may filter the historical analysis models stored in the database according to a filter condition and then extract the models that pass the filter. For example, using a medical insurance field as the filter condition, the server filters and extracts the historical analysis models related to medical insurance from the database.
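The filtering described above might look like the following minimal sketch. It assumes each stored model carries a metadata dictionary with a `business_type` field; that field, the function name `filter_models`, and the sample entries are hypothetical and not taken from the source.

```python
def filter_models(models, field, value):
    """Return the stored models whose metadata `field` equals `value`."""
    return [m for m in models if m.get(field) == value]


# Hypothetical contents of the server database.
stored_models = [
    {"name": "diabetes_surgery_analysis", "business_type": "medical_insurance"},
    {"name": "routine_exam_analysis", "business_type": "medical_insurance"},
    {"name": "vehicle_claim_analysis", "business_type": "auto_insurance"},
]
selected = filter_models(stored_models, "business_type", "medical_insurance")
```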
Initial features are the features obtained by analyzing a historical analysis model. They may include, but are not limited to, expense features, frequency features, and various indicator features. Those skilled in the art will understand that an initial feature here refers to what the feature means, not to the specific feature data. Specifically, expense features may include, but are not limited to, surgery fees, drug fees, and examination fees; frequency features may include, but are not limited to, the number of medical visits, examinations, operations, and drug purchases or collections; indicator features may include, but are not limited to, height, weight, heart rate, blood pressure, hemoglobin content, platelet count, glucose content, and urine protein.
The target priority is a preset priority level. It may be a level preset by the server such as high priority or low priority, or a level such as level 1, level 2, or level 3. The target priority may differ by data type or business type, and this application places no restriction on it.
The priority of an initial feature may be associated with the historical analysis model, and the initial features the server extracts may differ between historical analysis models. For example, for the diabetes surgery analysis model, the initial features obtained after analysis may be surgery fee, number of operations, and glucose content; for the routine examination analysis model, they may be height, weight, blood pressure, heart rate, and vision.
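To make the priority comparison concrete, the sketch below keeps, for each model, the features whose priority exceeds the target priority. It assumes priorities are integers where a larger number means a higher priority; that encoding, the function name `initial_features`, and the sample priorities are illustrative assumptions.

```python
def initial_features(model_features, target_priority):
    """Keep, per model, the features whose priority is strictly greater than the target.

    Priorities are modelled as integers (larger means higher); this is an assumption.
    """
    return {
        model: [f for f, p in feats.items() if p > target_priority]
        for model, feats in model_features.items()
    }


# Hypothetical per-model feature priorities.
model_features = {
    "diabetes_surgery_analysis": {
        "surgery_fee": 3, "surgery_count": 3, "glucose": 2, "payment_window": 1,
    },
    "routine_exam_analysis": {
        "blood_pressure": 3, "heart_rate": 2, "pickup_window": 1,
    },
}
result = initial_features(model_features, target_priority=1)
```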
Step 204: Obtain the frequency of each initial feature, and select the initial features whose frequency meets the requirement as target features.
Here, the frequency of an initial feature may be how often it appears across the different historical analysis models. Different initial features may appear in the models with different frequencies. For example, the surgery fee may appear in many different historical analysis models and so has a high frequency, whereas glucose content may appear only in the diabetes surgery analysis model and so has a low frequency.
Specifically, the more often an initial feature appears across the historical analysis models (for example, when it appears in every model), the more important that feature can be considered. The initial features can therefore be assessed according to how frequently they appear in the different historical analysis models.
A frequency meets the requirement when it satisfies a threshold condition, or when the initial features are ranked by frequency and a feature's rank satisfies a given requirement, for example being among the top 10. Specifically, the server may select initial features according to how frequently they appear in the extracted historical analysis models in order to determine the target features.
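The frequency-based selection described in step 204 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `select_target_features`, the parameters `min_fraction` and `top_k`, and the model and feature names are all assumptions made for the example.

```python
from collections import Counter


def select_target_features(model_features, min_fraction=None, top_k=None):
    """Pick features by how many models they appear in.

    model_features maps a (hypothetical) model name to its initial features.
    """
    counts = Counter()
    for features in model_features.values():
        counts.update(set(features))  # count each feature once per model
    n_models = len(model_features)
    frequency = {name: count / n_models for name, count in counts.items()}
    # Rank by descending frequency, breaking ties alphabetically.
    ranked = sorted(frequency, key=lambda name: (-frequency[name], name))
    if min_fraction is not None:  # threshold condition
        ranked = [name for name in ranked if frequency[name] >= min_fraction]
    if top_k is not None:  # rank condition, e.g. top 10
        ranked = ranked[:top_k]
    return ranked, frequency


# Illustrative models and features (assumed names, not from the source).
models = {
    "diabetes_surgery": ["surgery_fee", "surgery_count", "glucose"],
    "routine_checkup": ["height", "weight", "examination_fee"],
    "general_surgery": ["surgery_fee", "examination_fee", "surgery_count"],
}
targets, freq = select_target_features(models, min_fraction=0.5)
print(targets)
```

Either condition can be used alone or both together; the source leaves the exact threshold and rank cut-off open.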
Step 206: Extract, from the multi-dimensional data, the initial data corresponding to the target features.
Here, multi-dimensional data may refer to all data stored in the database, including the data added each time the data changes as well as the historical data from before the change. For example, for the medical insurance data described above, the initial data is the medical treatment data generated after a user's medical visit and stored under the user's name. It may include historical and current medical treatment data, specifically including but not limited to the consultation location, consultation time, International Classification of Diseases (ICD), registration department, registered doctor information, registration fee, payment method, examination items, examination fee, description of the condition, treatment advice, drug list, drug prices, dosage, payment window, drug collection window, whether a follow-up visit is needed, follow-up time, and number of consultations.
Specifically, the server may extract the initial data from the multi-dimensional data based on the selected target features. In this embodiment, the extracted initial data may fall into several categories. For medical insurance data, for example, these may include, but are not limited to, cost data for the current visit, ICD data for the current visit, and historical medical treatment data. The cost data for the current visit may include, but is not limited to, surgery fees, drug fees, and examination fees; the ICD data for the current visit may include, but is not limited to, the cost of the ICD diagnosed in this visit and the average cost for that ICD; the historical medical treatment data may include, but is not limited to, the number of local outpatient visits, the number of local hospitalizations, the number of remote outpatient visits, the number of remote hospitalizations, the proportion of local outpatient visits, and the proportion of remote outpatient visits.
Step 208: Process the different types of initial data to the same data magnitude to obtain standard data.
Specifically, because the extracted initial data is of different types, its magnitudes may differ greatly. For example, a drug fee may be 500 while a total fee is 1,000,000, a very large difference in magnitude.
The server may apply same-magnitude processing to initial data of different magnitudes to obtain standard data of the same magnitude. Continuing the example, processing the drug fee and the total fee so that both fall between 0 and 100 yields a standard drug fee of 0.05 and a standard total fee of 100.
Specifically, the same-magnitude processing method may be chosen according to the data type or the data magnitude; for example, a square root, square, cube, exponential, or logarithm may be used. This application places no restriction on this.
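The drug-fee example above is consistent with a simple linear rescaling to the range 0 to 100. The sketch below reproduces those numbers and, for comparison, shows a logarithmic compression of the kind the text lists as an option. The function names and the choice of `log10(1 + v)` are assumptions for illustration only.

```python
import math


def scale_to_range(values, upper=100.0):
    """Linearly rescale values so the largest magnitude maps to `upper`."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak * upper for v in values]


def log_compress(values):
    """Compress large magnitudes with log10(1 + v); assumes v >= 0."""
    return [math.log10(1 + v) for v in values]


raw = {"drug_fee": 500, "total_fee": 1_000_000}
scaled = dict(zip(raw, scale_to_range(list(raw.values()))))
compressed = dict(zip(raw, log_compress(list(raw.values()))))
print(scaled)
print(compressed)
```

Linear rescaling keeps the ratio between the two fees, while the logarithm brings them much closer together; which behavior is preferable depends on the later analysis.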
Step 210: Perform non-linear dimensionality reduction on the standard data to obtain target data of a preset dimension.
Here, the preset dimension may be a dimension that the user sets on the server in advance through a terminal, according to the needs of subsequent data processing. The data volume of the target data of the preset dimension may be smaller than that of the standard data.
Non-linear dimensionality reduction methods may include, but are not limited to, Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), Modified Locally Linear Embedding (MLLE), Hessian Eigenmapping, Spectral Embedding, Local Tangent Space Alignment (LTSA), Multi-dimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE).
In practice, linear dimensionality reduction methods may also be used, including but not limited to Principal Component Analysis (PCA), kernel PCA, and Incremental PCA.
Specifically, the server may use the above methods to exploit the clustering characteristics of the data in a multi-dimensional Riemannian space and map the multi-dimensional standard data to a lower dimension, for example to 2 dimensions, to obtain the target data.
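To make the mapping to 2 dimensions concrete, the sketch below projects a few rows of standard data onto their two leading principal components. Note the substitution: this is plain linear PCA (one of the alternatives the text mentions), written in pure Python with power iteration so that it has no dependencies. The non-linear methods listed above, such as Isomap or t-SNE, would normally come from a numerical library instead. The function name, parameters, and sample values are assumptions.

```python
import random


def pca_project(rows, n_components=2, iterations=200, seed=0):
    """Project rows onto their leading principal components (linear PCA)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start_norm = sum(x * x for x in v) ** 0.5
        v = [x / start_norm for x in v]
        for _step in range(iterations):
            # Power iteration step: multiply by the covariance matrix.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Remove any part lying along components already found.
            for comp in components:
                dot = sum(w[j] * comp[j] for j in range(d))
                w = [w[j] - dot * comp[j] for j in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        components.append(v)
    # Coordinates of each centered row along each component.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Hypothetical standard data: three features per record.
standard = [
    [0.05, 100.0, 3.0],
    [0.10, 80.0, 1.0],
    [0.50, 20.0, 7.0],
    [0.70, 10.0, 9.0],
    [0.30, 55.0, 4.0],
]
low_dim = pca_project(standard, n_components=2)
for point in low_dim:
    print(point)
```

Each input record becomes a 2-element point, with the first coordinate carrying the largest share of the variance.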
In the above data dimensionality reduction method, target features are obtained from historical analysis models, the initial data corresponding to the target features is extracted from the multi-dimensional data, same-magnitude processing is applied to the initial data to obtain standard data, and non-linear dimensionality reduction is performed on the standard data to obtain target data of a preset dimension. Because the target data is generated from the multi-dimensional data and remains associated with it, it retains the characteristics of the multi-dimensional data and can be used for subsequent data processing and analysis. Compared with processing and analyzing the multi-dimensional data directly, this saves system resources and improves data processing efficiency.
As noted above, there may be multiple historical analysis models, each related to a different data type or business type.
In one embodiment, parsing the historical analysis models to obtain initial features whose priority is higher than the target priority may include: parsing each historical analysis model separately to obtain, for each model, the initial features whose priority is higher than the target priority.
For example, the diabetes surgery analysis model and the routine examination analysis model may each be parsed, yielding initial features such as the surgery fee, examination fee, and glucose content, whose priority in the diabetes surgery analysis model is higher than the target priority, and features such as height, weight, vision, and the examination fee, whose priority in the routine examination analysis model is higher than the target priority.
Referring to the flowchart of the target feature determination step shown in FIG. 3, obtaining the frequency of the initial features and selecting the initial features whose frequency meets the requirement as target features may include the following steps:
Step S302: Determine the feature priority and frequency of occurrence of each initial feature in each historical analysis model.
Here, the feature priority is a measure of a feature's priority in a historical analysis model relative to the target priority; it may be high, medium, low, and so on. For example, features such as the surgery fee and glucose content have high priority in the diabetes surgery analysis model, so their feature priority may be high, whereas features such as height and weight have low priority in that model, so their feature priority may be low.
Those skilled in the art will understand that this is only an example. In a specific application, the feature priority may also be a numerical measure, for example 10 points for high priority and 1 point for low priority. This application places no restriction on this.
Step S304: Combine the feature priority and frequency of occurrence of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model.
For example, after obtaining the feature priority and frequency of occurrence of each initial feature, the server may combine them for each initial feature according to a preset weight for the feature priority and a preset weight for the frequency, obtaining the relative priority level of each initial feature.
Those skilled in the art will understand that this is only an example. In practice, the combined calculation may also be a simple addition or subtraction, or may use squares, square roots, exponentials, and similar operations. This application places no restriction on this.
The relative priority level may be a grade such as high, medium, or low; a numerical score such as 60, 70, 80, or 100; or a combination of the two.
Step S306: Compare the relative priority levels of the initial features in the historical analysis models, and determine the initial features whose relative priority level meets the requirement as target features.
Specifically, after calculating the relative priority level of each initial feature, the server may compare these levels to determine the target features. For example, it may select the initial features with high relative priority scores, or the initial features whose relative priority level is high.
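Steps S302 to S306 can be sketched as a weighted combination of feature priority and frequency followed by a cut-off. The score table (10, 5, 1), the weights 0.6 and 0.4, the threshold of 60, and all names below are assumptions chosen for illustration; the source does not fix any of them.

```python
# Assumed numeric scores for the priority labels (the text suggests e.g. 10 and 1).
PRIORITY_SCORES = {"high": 10, "medium": 5, "low": 1}


def relative_priority(feature_stats, priority_weight=0.6, frequency_weight=0.4):
    """Combine priority label and frequency (0..1) into a 0..100 score."""
    scores = {}
    for name, (label, frequency) in feature_stats.items():
        priority = PRIORITY_SCORES[label] / 10  # normalize to 0..1
        combined = priority_weight * priority + frequency_weight * frequency
        scores[name] = 100 * combined
    return scores


# Hypothetical per-feature statistics: (priority label, fraction of models).
stats = {
    "surgery_fee": ("high", 1.0),
    "glucose": ("high", 0.25),
    "height": ("low", 0.5),
}
scores = relative_priority(stats)
targets = [name for name, score in scores.items() if score >= 60]
print(scores)
print(targets)
```

With these assumed weights, a high-priority feature that appears in only a quarter of the models still passes the cut-off, while a low-priority feature that appears in half of them does not.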
By obtaining each initial feature's feature priority in the different historical analysis models and its frequency of occurrence, combining the two to obtain a relative priority level, and selecting target features by relative priority level, the target features determined are strongly correlated with the multi-dimensional data, which can improve the accuracy of the initial data extracted on the basis of those target features.
In one embodiment, after the initial data corresponding to the target features is extracted from the multi-dimensional data, the data dimensionality reduction method may further include: obtaining the data source location of the initial data and the user address corresponding to the initial data, and, based on the data source location and the user address, dividing the historical data in the initial data into local data and remote data.
As described above, for medical insurance data, the initial data extracted from the multi-dimensional data may include, but is not limited to, cost data for the current visit, ICD data for the current visit, and historical medical treatment data. The historical medical treatment data may include, but is not limited to, the number of local outpatient visits, the number of local hospitalizations, the number of remote outpatient visits, the number of remote hospitalizations, the proportion of local outpatient visits, and the proportion of remote outpatient visits.
Specifically, based on the data source location of the initial data and the corresponding user address, the server may divide the historical data (the numbers of local and remote outpatient visits and hospitalizations, and the proportions of local and remote outpatient visits) into local data (the number of local outpatient visits, the number of local hospitalizations, the proportion of local outpatient visits, and so on) and remote data (the number of remote outpatient visits, the number of remote hospitalizations, the proportion of remote outpatient visits, and so on).
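One way to picture the local/remote split is to compare each record's source location with the user's address. The sketch below assumes, purely for illustration, that both are available as city names under the hypothetical keys `source_city` and `user_city`; the source does not specify the record layout or the matching rule.

```python
def split_by_region(records):
    """Separate records into local and remote by comparing two city fields."""
    local, remote = [], []
    for record in records:
        if record["source_city"] == record["user_city"]:
            local.append(record)
        else:
            remote.append(record)
    return local, remote


# Hypothetical historical visit records for one user.
history = [
    {"visit": "outpatient", "source_city": "Shenzhen", "user_city": "Shenzhen"},
    {"visit": "inpatient", "source_city": "Shenzhen", "user_city": "Shenzhen"},
    {"visit": "outpatient", "source_city": "Beijing", "user_city": "Shenzhen"},
    {"visit": "outpatient", "source_city": "Shenzhen", "user_city": "Shenzhen"},
]
local, remote = split_by_region(history)

outpatient_total = sum(1 for r in history if r["visit"] == "outpatient")
local_outpatient = sum(1 for r in local if r["visit"] == "outpatient")
local_outpatient_share = local_outpatient / outpatient_total
print(len(local), len(remote), local_outpatient_share)
```

The same comparison could be made at province level, or the remote group could be subdivided further by city.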
In combination with the embodiments described above, after the server divides the initial data into local data and remote data, processing the different types of initial data to the same data magnitude to obtain standard data may include: applying same-magnitude processing to the different types of local data to obtain local standard data, and applying same-magnitude processing to the different types of remote data to obtain remote standard data.
Specifically, the server applies same-magnitude processing to the local data and the remote data separately. The resulting standard data may share one magnitude or differ: for example, the local standard data may lie between 0 and 10 while the remote standard data lies between 0 and 100, or both may lie between 0 and 100. This application places no restriction on this. Optionally, the same-magnitude processing methods applied to the local data and the remote data may be the same or different.
Further, performing non-linear dimensionality reduction on the standard data to obtain target data of a preset dimension may include: performing non-linear dimensionality reduction on the local standard data and the remote standard data separately to obtain local target data and remote target data of a preset dimension.
Specifically, the server may choose different non-linear dimensionality reduction methods for the local standard data and the remote standard data according to their different characteristics.
Optionally, the local target data and remote target data obtained by non-linear dimensionality reduction may have the same or different dimensions, depending on the dimensionality reduction method chosen and the dimension required by subsequent data processing. This application places no restriction on this.
Dividing the initial data into local data and remote data, and applying same-magnitude processing and non-linear dimensionality reduction to each, yields two categories of target data. This supports region-specific processing later on, making subsequent data processing more targeted and more accurate.
Optionally, after obtaining the local data and the remote data, the server may further divide the remote medical treatment data, for example by province or city into Beijing data, Shanghai data, Guangzhou data, and so on.
In practice, the server may also first apply same-magnitude processing to the initial data, then divide the resulting standard data by region based on the data source location and the user address, and then perform dimensionality reduction on each part to obtain local target data and remote target data of a preset dimension.
In one embodiment, before the different types of initial data are processed to the same data magnitude to obtain standard data, the above data dimensionality reduction method may further include: dividing the initial data by data type into cost data and count data.
As described above, for medical insurance data, the initial data may include surgery fees, drug fees, examination fees, the number of local outpatient visits, the number of local hospitalizations, the number of remote outpatient visits, the number of remote hospitalizations, and so on. Specifically, the server may classify the surgery fees, drug fees, examination fees, and similar data as cost data, and classify the numbers of local and remote outpatient visits and hospitalizations and similar data as count data.
Further, processing the different types of initial data to the same data magnitude to obtain standard data may include: obtaining the preset formulas corresponding to the cost data and to the count data; applying same-magnitude processing to cost data of different magnitudes according to the preset formula and standard magnitude corresponding to the cost data, to obtain the standard data corresponding to the cost data; and applying same-magnitude processing to count data of different magnitudes according to the preset formula and standard magnitude corresponding to the count data, to obtain the standard data corresponding to the count data.
Here, a preset formula is a function corresponding to one of the same-magnitude processing methods described above, for example a square-root, square, cube, exponential, or logarithmic formula. The preset formulas for different types of data may be the same or different; alternatively, different types of data may each be associated with a different preset formula for processing.
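The per-category processing can be sketched as a lookup from data category to a preset formula and a standard magnitude. The pairing below (logarithm and 0 to 100 for costs, square root and 0 to 10 for counts) is an assumption for illustration; the source only says that each category has its own formula and standard magnitude.

```python
import math

# Assumed presets: category -> (formula, upper bound of the standard magnitude).
PRESETS = {
    "cost": (lambda v: math.log10(1 + v), 100.0),
    "count": (math.sqrt, 10.0),
}


def standardize(values, category):
    """Apply the category's preset formula, then rescale to its magnitude."""
    formula, magnitude = PRESETS[category]
    transformed = [formula(v) for v in values]
    peak = max(transformed)
    if peak == 0:
        return [0.0 for _ in transformed]
    return [t / peak * magnitude for t in transformed]


costs = [500, 2000, 1_000_000]  # hypothetical fees
counts = [1, 4, 9]              # hypothetical visit counts
standard_costs = standardize(costs, "cost")
standard_counts = standardize(counts, "count")
print(standard_costs)
print(standard_counts)
```

Keeping the formula and the magnitude in one table makes it straightforward to associate a different formula with each data type.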
Specifically, count data and cost data differ greatly in magnitude. Applying same-magnitude processing to count data and cost data separately yields more precise standard data, which can improve the accuracy of subsequent data processing.
In one embodiment, after extracting the initial data corresponding to the target features from the multi-dimensional data, the server may also check the integrity of the extracted initial data. Specifically, referring to the flowchart of the missing-data detection step shown in FIG. 4, the above data dimensionality reduction method may further include the following steps:
Step S402: Based on the data of at least one data type in the initial data, perform missing-data detection on the data of the remaining data types in the initial data.
For example, continuing with medical insurance data, after extracting the initial data the server determines from it that the user has been ill 3 times. If the treatment cost data in the initial data contains only two entries, 2000 and 500, the illness count shows that the treatment cost data is incomplete.
Step S404: When missing data is detected in the initial data, fill in the data of that data type based on existing data of the same type as the missing data.
Continuing the example, after detecting that the treatment cost data is incomplete, the server may determine the fill value from the two existing entries, 2000 and 500, and fill in the data. For example, it may take the mean or median of 2000 and 500 and fill in the treatment cost data with that value.
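The detection and filling in steps S402 and S404 can be sketched with the numbers from the example: three recorded illnesses but only two treatment costs. The function name and the `strategy` parameter are assumptions; the mean and median come from Python's standard `statistics` module.

```python
from statistics import mean, median


def fill_missing_costs(visit_count, costs, strategy="mean"):
    """Pad `costs` up to `visit_count` entries using the mean or median."""
    missing = visit_count - len(costs)
    if missing <= 0 or not costs:
        # Nothing is missing, or there is no same-type data to fill from.
        return list(costs)
    fill_value = mean(costs) if strategy == "mean" else median(costs)
    return list(costs) + [fill_value] * missing


illness_count = 3
treatment_costs = [2000, 500]
filled = fill_missing_costs(illness_count, treatment_costs)
print(filled)
```

With only two known values the mean and median coincide at 1250; with more entries the median is less sensitive to a single unusually large bill.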
In this embodiment, the server may also estimate the missing values from the user's historical data in the initial data and fill in the data with the resulting estimates. Preferably, the estimate may combine the user's historical data with data from users who have the same medical history.
Detecting missing values in the acquired initial data and filling them in improves the completeness of the initial data, which can improve the accuracy of subsequent data processing.
It should be understood that although the steps in the flowcharts of FIGS. 2 to 4 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 to 4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time and may be performed at different times; nor are they necessarily performed one after another, and they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, a data dimensionality reduction apparatus is provided, which may include a parsing module 100, a target feature generation module 200, a data extraction module 300, a same-magnitude data processing module 400, and a dimensionality reduction processing module 500, wherein:
The parsing module 100 is configured to extract the relevant historical analysis models from the database and parse them to obtain initial features whose priority is higher than the target priority.
The target feature generation module 200 is configured to obtain the frequency of the initial features and select the initial features whose frequency meets the requirement as target features.
The data extraction module 300 is configured to extract, from the multi-dimensional data, the initial data corresponding to the target features.
The same-magnitude data processing module 400 is configured to process the different types of initial data to the same data magnitude to obtain standard data.
The dimensionality reduction processing module 500 is configured to perform non-linear dimensionality reduction on the standard data to obtain target data of a preset dimension.
In one embodiment, there are multiple historical analysis models, and the parsing module 100 is configured to parse each historical analysis model separately to obtain, for each model, the initial features whose priority is higher than the target priority.
The target feature generation module 200 may include:
A first determination sub-module, configured to determine the feature priority and frequency of occurrence of each initial feature in each historical analysis model.
A calculation sub-module, configured to combine the feature priority and frequency of occurrence of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model.
A comparison and determination sub-module, configured to compare the relative priority levels of the initial features in the historical analysis models and determine the initial features whose relative priority level meets the requirement as target features.
In one embodiment, the data dimensionality reduction processing apparatus may further include:
A data classification module, configured to, after the data extraction module 300 extracts the initial data corresponding to the target features from the multidimensional data, obtain the data source location of the initial data and the user address corresponding to the initial data, and to divide the historical data in the initial data into local data and remote data based on the data source location and the user address.
The same-magnitude data processing module 400 may include:
A first same-magnitude data processing sub-module, configured to process the different types of local data to the same data magnitude to obtain local standard data.
A second same-magnitude data processing sub-module, configured to process the different types of remote data to the same data magnitude to obtain remote standard data.
The dimensionality reduction processing module 500 is configured to perform nonlinear dimensionality reduction separately on the local standard data and the remote standard data to obtain local target data and remote target data of a preset dimension.
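The local/remote split described above could look roughly like the sketch below. The record fields `source_region` and `user_region`, and the rule that a record is local when the two regions match, are hypothetical; the embodiment only states that the split is based on the data source location and the user address.

```python
def split_local_remote(records):
    """Divide records into local and remote groups.

    Each record is a dict with the assumed keys "source_region"
    (where the data was produced) and "user_region" (derived from
    the user address).
    """
    local, remote = [], []
    for record in records:
        if record["source_region"] == record["user_region"]:
            local.append(record)
        else:
            remote.append(record)
    return local, remote
```

Each group would then be standardized and reduced on its own, so that the two populations do not distort each other's scale.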
In one embodiment, the data dimensionality reduction processing apparatus may further include:
A classification module, configured to divide the initial data into expense-type data and count-type data according to data type, before the same-magnitude data processing module 400 processes the different types of initial data to the same data magnitude to obtain the standard data.
The same-magnitude data processing module 400 may include:
An obtaining sub-module, configured to obtain the preset formulas corresponding to the expense-type data and the count-type data, respectively.
A third same-magnitude data processing sub-module, configured to process expense-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain the standard data corresponding to the expense-type data.
A fourth same-magnitude data processing sub-module, configured to process count-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain the standard data corresponding to the count-type data.
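A minimal sketch of per-type standardization is given below. The `PRESETS` table, the choice of `math.log1p` for expense-type data, and the power-of-ten rescaling are all assumptions made for illustration; the passage above does not state the actual preset formulas or standard orders of magnitude.

```python
import math

# Hypothetical presets: (formula, standard order of magnitude) per data type.
PRESETS = {
    "expense": (math.log1p, 0),  # compress large monetary amounts
    "count": (float, 0),         # counts are used unchanged
}


def scale_to_magnitude(values, standard_magnitude):
    """Divide by a power of ten so the largest absolute value
    has the requested order of magnitude."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    shift = math.floor(math.log10(peak)) - standard_magnitude
    return [v / 10 ** shift for v in values]


def standardize(values, data_type):
    """Apply the preset formula for the data type, then rescale."""
    formula, standard_magnitude = PRESETS[data_type]
    return scale_to_magnitude([formula(v) for v in values], standard_magnitude)
```

With both presets set to order of magnitude 0, the largest expense value and the largest count value each end up between 1 and 10, so neither type dominates the later dimensionality reduction.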
In one embodiment, the data dimensionality reduction processing apparatus may further include:
A detection module, configured to, after the data extraction module 300 extracts the initial data corresponding to the target features from the multidimensional data, perform missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data.
A filling module, configured to, when missing data is detected in the initial data, fill in the data of the affected data type according to data of the same data type as the missing data.
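One way to picture the detection and filling steps is the sketch below. It assumes that one data type serves as the reference for which record keys should exist, and that gaps are filled with the mean of the present values of the same data type; both the table layout and the mean-fill rule are assumptions, not details taken from the embodiment.

```python
from statistics import mean


def fill_missing(table, reference_type):
    """Detect and fill gaps using one data type as the reference.

    table: dict mapping data type -> dict mapping record key -> value.
    A key present in the reference type but absent (or None) in another
    type is treated as missing and filled from that type's own values.
    """
    expected_keys = list(table[reference_type])
    filled = {}
    for data_type, column in table.items():
        present = [v for v in column.values() if v is not None]
        fallback = mean(present) if present else None
        filled[data_type] = {
            key: column[key] if column.get(key) is not None else fallback
            for key in expected_keys
        }
    return filled
```

Keys that appear only in a non-reference type are dropped by this sketch, which keeps all data types aligned on the same records.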
For specific limitations of the data dimensionality reduction processing apparatus, refer to the limitations of the data dimensionality reduction processing method above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile or volatile storage medium and an internal memory. The non-volatile or volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores historical analysis model data and the various data produced during data processing. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a data dimensionality reduction processing method.
Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: extracting relevant historical analysis models from a database, and parsing the historical analysis models to obtain initial features whose priority is greater than a target priority; obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features; extracting, from multidimensional data, the initial data corresponding to the target features; processing the different types of initial data to the same data magnitude to obtain standard data; and performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension.
In one embodiment, when the processor executes the computer-readable instructions, there are multiple historical analysis models. Parsing the historical analysis models to obtain initial features whose priority is greater than the target priority may include: parsing each historical analysis model separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority. Obtaining the frequency of the initial features and selecting the initial features whose frequency meets the requirement as target features may include: determining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model; combining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model into a relative priority level of each initial feature in each historical analysis model; and comparing the relative priority levels of the initial features across the historical analysis models, and determining the initial features whose relative priority level meets the requirement as the target features.
In one embodiment, when the processor executes the computer-readable instructions, the following may further be performed after extracting the initial data corresponding to the target features from the multidimensional data: obtaining the data source location of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source location and the user address. Processing the different types of initial data to the same data magnitude to obtain standard data includes: processing the different types of local data to the same data magnitude to obtain local standard data; and processing the different types of remote data to the same data magnitude to obtain remote standard data. Performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension includes: performing nonlinear dimensionality reduction separately on the local standard data and the remote standard data to obtain local target data and remote target data of the preset dimension.
In one embodiment, when the processor executes the computer-readable instructions, the following may further be performed before processing the different types of initial data to the same data magnitude to obtain standard data: dividing the initial data into expense-type data and count-type data according to data type. Processing the different types of initial data to the same data magnitude to obtain standard data may include: obtaining the preset formulas corresponding to the expense-type data and the count-type data, respectively; processing expense-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain the standard data corresponding to the expense-type data; and processing count-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain the standard data corresponding to the count-type data.
In one embodiment, when the processor executes the computer-readable instructions, the following may further be performed after extracting the initial data corresponding to the target features from the multidimensional data: performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the data of the affected data type according to data of the same data type as the missing data.
One or more computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: extracting relevant historical analysis models from a database, and parsing the historical analysis models to obtain initial features whose priority is greater than a target priority; obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features; extracting, from multidimensional data, the initial data corresponding to the target features; processing the different types of initial data to the same data magnitude to obtain standard data; and performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension.
The computer-readable storage medium may be non-volatile or volatile.
In one embodiment, when the computer-readable instructions are executed by the processor, there are multiple historical analysis models. Parsing the historical analysis models to obtain initial features whose priority is greater than the target priority may include: parsing each historical analysis model separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority. Obtaining the frequency of the initial features and selecting the initial features whose frequency meets the requirement as target features may include: determining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model; combining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model into a relative priority level of each initial feature in each historical analysis model; and comparing the relative priority levels of the initial features across the historical analysis models, and determining the initial features whose relative priority level meets the requirement as the target features.
In one embodiment, when the computer-readable instructions are executed by the processor, the following may further be performed after extracting the initial data corresponding to the target features from the multidimensional data: obtaining the data source location of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source location and the user address. Processing the different types of initial data to the same data magnitude to obtain standard data includes: processing the different types of local data to the same data magnitude to obtain local standard data; and processing the different types of remote data to the same data magnitude to obtain remote standard data. Performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension includes: performing nonlinear dimensionality reduction separately on the local standard data and the remote standard data to obtain local target data and remote target data of the preset dimension.
In one embodiment, when the computer-readable instructions are executed by the processor, the following may further be performed before processing the different types of initial data to the same data magnitude to obtain standard data: dividing the initial data into expense-type data and count-type data according to data type. Processing the different types of initial data to the same data magnitude to obtain standard data may include: obtaining the preset formulas corresponding to the expense-type data and the count-type data, respectively; processing expense-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain the standard data corresponding to the expense-type data; and processing count-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain the standard data corresponding to the count-type data.
In one embodiment, when the computer-readable instructions are executed by the processor, the following may further be performed after extracting the initial data corresponding to the target features from the multidimensional data: performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the data of the affected data type according to data of the same data type as the missing data.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments describe only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims (20)

  1. A data dimensionality reduction processing method, comprising:
    extracting a historical analysis model from a database, and parsing the historical analysis model to obtain initial features whose priority is greater than a target priority;
    obtaining the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
    extracting, from multidimensional data, the initial data corresponding to the target features;
    processing the different types of initial data to the same data magnitude to obtain standard data; and
    performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension.
  2. The method according to claim 1, wherein there are multiple historical analysis models, and parsing the historical analysis model to obtain initial features whose priority is greater than the target priority comprises:
    parsing each historical analysis model separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority;
    and wherein obtaining the frequency of the initial features and selecting the initial features whose frequency meets the requirement as target features comprises:
    determining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model;
    combining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model into a relative priority level of each initial feature in each historical analysis model; and
    comparing the relative priority levels of the initial features across the historical analysis models, and determining the initial features whose relative priority level meets the requirement as the target features.
  3. The method according to claim 1 or 2, wherein, after extracting the initial data corresponding to the target features from the multidimensional data, the method further comprises:
    obtaining the data source location of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source location and the user address;
    wherein processing the different types of initial data to the same data magnitude to obtain standard data comprises:
    processing the different types of local data to the same data magnitude to obtain local standard data; and
    processing the different types of remote data to the same data magnitude to obtain remote standard data;
    and wherein performing nonlinear dimensionality reduction on the standard data to obtain target data of a preset dimension comprises:
    performing nonlinear dimensionality reduction separately on the local standard data and the remote standard data to obtain local target data and remote target data of the preset dimension.
  4. The method according to claim 1 or 2, wherein, before processing the different types of initial data to the same data magnitude to obtain standard data, the method further comprises:
    dividing the initial data into expense-type data and count-type data according to data type;
    and wherein processing the different types of initial data to the same data magnitude to obtain standard data comprises:
    obtaining the preset formulas corresponding to the expense-type data and the count-type data, respectively;
    processing expense-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain the standard data corresponding to the expense-type data; and
    processing count-type data of different data magnitudes to the same data magnitude, according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain the standard data corresponding to the count-type data.
  5. The method according to claim 1 or 2, wherein, after extracting the initial data corresponding to the target features from the multidimensional data, the method further comprises:
    performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and
    when missing data is detected in the initial data, filling in the data of the affected data type according to data of the same data type as the missing data.
  6. A data dimensionality reduction processing apparatus, comprising:
    a parsing module, configured to extract a historical analysis model from a database, and to parse the historical analysis model to obtain initial features whose priority is greater than a target priority;
    a target feature generation module, configured to obtain the frequency of the initial features, and to select the initial features whose frequency meets the requirement as target features;
    a data extraction module, configured to extract, from multidimensional data, the initial data corresponding to the target features;
    a same-magnitude data processing module, configured to process the different types of initial data to the same data magnitude to obtain standard data; and
    a dimensionality reduction processing module, configured to perform dimensionality reduction on the standard data to obtain target data of a preset dimension.
  7. The apparatus according to claim 6, wherein there are multiple historical analysis models, and the parsing module is configured to parse each historical analysis model separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority;
    the target feature generation module comprises:
    a first determining sub-module, configured to determine the feature priority and the frequency of occurrence of the initial features in the historical analysis models;
    a calculation sub-module, configured to combine the feature priority and the frequency of occurrence of each initial feature in each historical analysis model into a relative priority level of each initial feature in each historical analysis model; and
    a comparison and determination sub-module, configured to compare the relative priority levels of the initial features across the historical analysis models, and to determine the initial features whose relative priority level meets the requirement as the target features.
  8. The apparatus according to claim 6 or 7, wherein the apparatus further comprises:
    a data classification module, configured to, after the data extraction module extracts the initial data corresponding to the target features from the multidimensional data, obtain the data source location of the initial data and the user address corresponding to the initial data, and to divide the historical medical-visit data in the initial data into local medical-visit data and remote medical-visit data based on the data source location and the user address;
    the same-magnitude data processing module comprises:
    a first same-magnitude data processing sub-module, configured to process the different types of local medical-visit data to the same data magnitude to obtain local standard data;
    a second same-magnitude data processing sub-module, configured to process the different types of remote medical-visit data to the same data magnitude to obtain remote standard data; and
    the dimensionality reduction processing module is configured to perform dimensionality reduction separately on the local standard data and the remote standard data to obtain local target data and remote target data of a preset dimension.
  9. The apparatus according to claim 6 or 7, wherein the data dimensionality reduction processing apparatus further comprises:
    a classification module, configured to divide the initial data into expense-type data and count-type data according to data type before the same-data-magnitude processing is performed on the different types of the initial data to obtain the standard data;
    the same-data-magnitude data processing module comprises:
    an acquisition sub-module, configured to acquire the preset formulas corresponding respectively to the expense-type data and the count-type data;
    a third same-data-magnitude data processing sub-module, configured to perform same-data-magnitude processing on expense-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain standard data corresponding to the expense-type data;
    a fourth same-data-magnitude data processing sub-module, configured to perform same-data-magnitude processing on count-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain standard data corresponding to the count-type data.
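Claim 9 leaves the preset formulas and the standard order of magnitude unspecified. The sketch below is one possible reading, not the claimed formulas: it assumes a log transform followed by min-max scaling for expense-type data, plain min-max scaling for count-type data, and a common target range `[0, scale]` standing in for the standard magnitude. All names (`PRESET_FORMULAS`, `to_standard`, `scale`) are hypothetical.

```python
import math


def min_max(values, scale=1.0):
    """Linearly rescale values into [0, scale]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [scale * (v - lo) / (hi - lo) for v in values]


def normalize_expense(values, scale=1.0):
    # Assumed formula: compress large monetary ranges with log1p, then rescale.
    return min_max([math.log1p(v) for v in values], scale)


def normalize_count(values, scale=1.0):
    # Assumed formula: counts are rescaled directly.
    return min_max(values, scale)


# One preset formula per data category (hypothetical registry).
PRESET_FORMULAS = {"expense": normalize_expense, "count": normalize_count}


def to_standard(columns, kinds, scale=1.0):
    """Apply the category-specific formula to every column."""
    return {name: PRESET_FORMULAS[kinds[name]](vals, scale)
            for name, vals in columns.items()}


standard = to_standard(
    {"total_fee": [0, 99, 9999], "visits": [1, 3, 5]},
    {"total_fee": "expense", "visits": "count"},
)
```

After this step both columns lie in the same range, so neither dominates the later dimensionality reduction purely because of its units.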
  10. The apparatus according to claim 6 or 7, wherein the apparatus further comprises:
    a detection module, configured to, after the data extraction module extracts the initial data corresponding to the target feature from the multidimensional data, perform missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data;
    a filling module, configured to, when missing data is detected in the initial data, fill in the data of the missing data type according to data of the same data type as the missing data.
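The detection and filling steps of claim 10 can be sketched as follows. The claim does not say how one data type is used to check another or which fill value is chosen, so this example assumes that a non-zero reference field (`visits`) implies the target field (`fee`) should be present, and that gaps are filled with the mean of the observed values of the same field. The field and function names are hypothetical.

```python
def detect_missing(records, reference, target):
    """Return indices where the reference field is non-zero but the target is absent.

    Assumption: a truthy reference value means the target should exist.
    """
    return [i for i, rec in enumerate(records)
            if rec.get(reference) and rec.get(target) is None]


def fill_missing(records, target, missing_idx):
    """Fill flagged entries with the mean of observed values of the same field."""
    observed = [rec[target] for rec in records if rec.get(target) is not None]
    filled = [dict(rec) for rec in records]  # leave the input untouched
    if not observed:
        return filled
    mean = sum(observed) / len(observed)
    for i in missing_idx:
        filled[i][target] = mean
    return filled


rows = [
    {"visits": 2, "fee": 100.0},
    {"visits": 1, "fee": None},   # visit recorded but fee missing: flagged
    {"visits": 0, "fee": None},   # no visit, so no fee is expected
    {"visits": 3, "fee": 300.0},
]
gaps = detect_missing(rows, reference="visits", target="fee")
completed = fill_missing(rows, "fee", gaps)
```

Only the second row is filled; the third row stays empty because the reference field gives no evidence that a fee should exist.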
  11. A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    extracting a historical analysis model from a database, and parsing the historical analysis model to obtain initial features whose priority is higher than a target priority;
    acquiring the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
    extracting, from multidimensional data, the initial data corresponding to the target features;
    performing same-data-magnitude processing on the different types of the initial data to obtain standard data; and
    performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
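The final step of claim 11 calls for non-linear dimensionality reduction to a preset dimension but does not name an algorithm in the claim itself. As one possible instance, the sketch below uses kernel PCA with an RBF kernel, computed in pure Python by power iteration with deflation. The choice of kernel PCA, the `gamma` value, and every function name are assumptions for illustration; a production system would more likely rely on a numerical library.

```python
import math


def rbf_kernel(rows, gamma):
    """Pairwise RBF (Gaussian) kernel matrix of the input rows."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in rows] for x in rows]


def center(K):
    """Double-center a symmetric kernel matrix (centering in feature space)."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    total = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def kernel_pca(rows, n_components, gamma=0.1, iters=500):
    """Embed each row into n_components dimensions via kernel PCA."""
    n = len(rows)
    M = center(rbf_kernel(rows, gamma))
    embedding = [[] for _ in range(n)]
    for c in range(n_components):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        for _ in range(iters):                        # power iteration
            w = matvec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient
        coord = math.sqrt(max(lam, 0.0))
        for i in range(n):
            embedding[i].append(coord * v[i])
        # Deflate so the next pass converges to the next eigenvector.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return embedding
```

Here `n_components` plays the role of the preset dimension, and the input rows are assumed to be standard data that has already been brought to a common magnitude.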
  12. The method according to claim 11, wherein there are a plurality of historical analysis models, and the parsing of the historical analysis model to obtain initial features whose priority is higher than the target priority, as implemented when the processor executes the computer-readable instructions, comprises:
    parsing each of the historical analysis models separately to obtain, for each historical analysis model, the initial features whose priority is higher than the target priority;
    the acquiring of the frequency of the initial features and the selecting of the initial features whose frequency meets the requirement as target features, as implemented when the processor executes the computer-readable instructions, comprises:
    determining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model;
    performing a combined calculation on the feature priority and the frequency of occurrence of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model; and
    comparing the relative priority levels of the initial features across the historical analysis models, and determining the initial features whose relative priority level meets the requirement to be the target features.
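The feature selection described in claim 12 combines a per-model priority with a frequency of occurrence, but the claim does not give the combining formula or the acceptance threshold. The sketch below assumes each historical model is represented as a mapping from feature name to a numeric priority, takes frequency to be the share of models in which the feature passes the target priority, and scores a feature as mean priority times frequency. The representation, the scoring rule, and all names are hypothetical.

```python
def initial_features(model, target_priority):
    """Features of one model whose priority exceeds the target priority."""
    return {name: p for name, p in model.items() if p > target_priority}


def select_target_features(models, target_priority, min_score):
    """Score each initial feature across models and keep those above min_score.

    Assumed score: (mean priority where the feature appears) * (share of
    models in which it appears). The claim does not prescribe this formula.
    """
    priorities = {}
    for model in models:
        for name, p in initial_features(model, target_priority).items():
            priorities.setdefault(name, []).append(p)
    scores = {}
    for name, ps in priorities.items():
        frequency = len(ps) / len(models)
        scores[name] = (sum(ps) / len(ps)) * frequency
    return sorted(name for name, s in scores.items() if s >= min_score)


# Hypothetical feature priorities parsed from three historical models.
history = [
    {"age": 0.9, "total_fee": 0.8, "city": 0.2},
    {"age": 0.7, "visits": 0.6},
    {"age": 0.8, "total_fee": 0.6, "visits": 0.3},
]
targets = select_target_features(history, target_priority=0.5, min_score=0.4)
```

In this example `age` appears in every model and `total_fee` in two of three, so both pass the threshold, while `visits` is dropped because it clears the target priority in only one model.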
  13. The method according to claim 11 or 12, wherein, after the extracting of the initial data corresponding to the target features from the multidimensional data, as implemented when the processor executes the computer-readable instructions, the following steps are further implemented:
    acquiring the data source location of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source location and the user address;
    the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, as implemented when the processor executes the computer-readable instructions, comprises:
    performing same-data-magnitude processing on the different types of the local data to obtain local standard data; and
    performing same-data-magnitude processing on the different types of the remote data to obtain remote standard data;
    the performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension, as implemented when the processor executes the computer-readable instructions, comprises:
    performing non-linear dimensionality reduction processing separately on the local standard data and the remote standard data to obtain local target data and remote target data of a preset dimension.
  14. The method according to claim 11 or 12, wherein, before the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, as implemented when the processor executes the computer-readable instructions, the following steps are further implemented:
    dividing the initial data into expense-type data and count-type data according to data type;
    the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, as implemented when the processor executes the computer-readable instructions, comprises:
    acquiring the preset formulas corresponding respectively to the expense-type data and the count-type data;
    performing same-data-magnitude processing on expense-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain standard data corresponding to the expense-type data; and
    performing same-data-magnitude processing on count-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain standard data corresponding to the count-type data.
  15. The method according to claim 11 or 12, wherein, after the extracting of the initial data corresponding to the target features from the multidimensional data, as implemented when the processor executes the computer-readable instructions, the following steps are further implemented:
    performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and
    when missing data is detected in the initial data, filling in the data of the missing data type according to data of the same data type as the missing data.
  16. One or more computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    extracting a historical analysis model from a database, and parsing the historical analysis model to obtain initial features whose priority is higher than a target priority;
    acquiring the frequency of the initial features, and selecting the initial features whose frequency meets the requirement as target features;
    extracting, from multidimensional data, the initial data corresponding to the target features;
    performing same-data-magnitude processing on the different types of the initial data to obtain standard data; and
    performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
  17. The method according to claim 16, wherein there are a plurality of historical analysis models, and the parsing of the historical analysis model to obtain initial features whose priority is higher than the target priority, performed when the computer-readable instructions are executed by the processor, comprises:
    parsing each of the historical analysis models separately to obtain, for each historical analysis model, the initial features whose priority is higher than the target priority;
    the acquiring of the frequency of the initial features and the selecting of the initial features whose frequency meets the requirement as target features, performed when the computer-readable instructions are executed by the processor, comprises:
    determining the feature priority and the frequency of occurrence of each initial feature in each historical analysis model;
    performing a combined calculation on the feature priority and the frequency of occurrence of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model; and
    comparing the relative priority levels of the initial features across the historical analysis models, and determining the initial features whose relative priority level meets the requirement to be the target features.
  18. The method according to claim 16 or 17, wherein, after the extracting of the initial data corresponding to the target features from the multidimensional data, performed when the computer-readable instructions are executed by the processor, the following steps are further performed:
    acquiring the data source location of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source location and the user address;
    the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, performed when the computer-readable instructions are executed by the processor, comprises:
    performing same-data-magnitude processing on the different types of the local data to obtain local standard data; and
    performing same-data-magnitude processing on the different types of the remote data to obtain remote standard data;
    the performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension, performed when the computer-readable instructions are executed by the processor, comprises:
    performing non-linear dimensionality reduction processing separately on the local standard data and the remote standard data to obtain local target data and remote target data of a preset dimension.
  19. The method according to claim 16 or 17, wherein, before the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, performed when the computer-readable instructions are executed by the processor, the following steps are further performed:
    dividing the initial data into expense-type data and count-type data according to data type;
    the performing of same-data-magnitude processing on the different types of the initial data to obtain standard data, performed when the computer-readable instructions are executed by the processor, comprises:
    acquiring the preset formulas corresponding respectively to the expense-type data and the count-type data;
    performing same-data-magnitude processing on expense-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the expense-type data, to obtain standard data corresponding to the expense-type data; and
    performing same-data-magnitude processing on count-type data of different data magnitudes according to the preset formula and the standard order of magnitude corresponding to the count-type data, to obtain standard data corresponding to the count-type data.
  20. The method according to claim 16 or 17, wherein, after the extracting of the initial data corresponding to the target features from the multidimensional data, performed when the computer-readable instructions are executed by the processor, the following steps are further performed:
    performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and
    when missing data is detected in the initial data, filling in the data of the missing data type according to data of the same data type as the missing data.
PCT/CN2020/099242 2020-01-07 2020-06-30 Data dimensionality reduction processing method and apparatus, computer device, and storage medium WO2021139112A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010014342.4 2020-01-07
CN202010014342.4A CN111221876A (en) 2020-01-07 2020-01-07 Data dimension reduction processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021139112A1 (en) 2021-07-15

Family

ID=70828129

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099242 WO2021139112A1 (en) 2020-01-07 2020-06-30 Data dimensionality reduction processing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111221876A (en)
WO (1) WO2021139112A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221876A (en) * 2020-01-07 2020-06-02 平安科技(深圳)有限公司 Data dimension reduction processing method and device, computer equipment and storage medium
CN113377761A (en) * 2021-07-16 2021-09-10 贵州电网有限责任公司电力科学研究院 Overvoltage data cleaning method and device, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354198A (en) * 2014-08-19 2016-02-24 中国移动通信集团湖北有限公司 Data processing method and apparatus
CN107798563A (en) * 2017-11-09 2018-03-13 山东师范大学 Internet advertising effect assessment method and system based on multi-modal feature
CN110047015A (en) * 2019-04-22 2019-07-23 水利部信息中心 A kind of water total amount prediction technique merging KPCA and thinking Optimized BP Neural Network
US10419470B1 (en) * 2015-06-15 2019-09-17 Thetaray Ltd System and method for anomaly detection in dynamically evolving data using hybrid decomposition
CN110263074A (en) * 2019-06-26 2019-09-20 东南大学 A method of illegal accident corresponding relationship is excavated based on LLE and K averaging method
CN111221876A (en) * 2020-01-07 2020-06-02 平安科技(深圳)有限公司 Data dimension reduction processing method and device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650314A (en) * 2016-11-25 2017-05-10 中南大学 Method and system for predicting amino acid mutation
US20190115028A1 (en) * 2017-08-02 2019-04-18 Veritone, Inc. Methods and systems for optimizing engine selection
CN108985462B (en) * 2018-07-12 2021-03-12 北京航空航天大学 Unsupervised feature selection method based on mutual information and fractal dimension
CN110060166A (en) * 2019-03-13 2019-07-26 平安科技(深圳)有限公司 Intelligence Claims Resolution method, apparatus, computer equipment and storage medium
CN110516818A (en) * 2019-05-13 2019-11-29 南京江行联加智能科技有限公司 A kind of high dimensional data prediction technique based on integrated study technology

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116990450A (en) * 2023-07-18 2023-11-03 欧几里德(苏州)医疗科技有限公司 Defect detection method and system for cornea shaping mirror
CN116990450B (en) * 2023-07-18 2024-04-26 欧几里德(苏州)医疗科技有限公司 Defect detection method and system for cornea shaping mirror

Also Published As

Publication number Publication date
CN111221876A (en) 2020-06-02

Similar Documents

Publication Publication Date Title
WO2021139112A1 (en) Data dimensionality reduction processing method and apparatus, computer device, and storage medium
WO2021184571A1 (en) Dynamic form generation method, apparatus, computer device, and storage medium
Hegde et al. MICE vs PPCA: Missing data imputation in healthcare
CN108109700B (en) Method and device for evaluating curative effect of chronic disease
CN110504028A (en) A kind of disease way of inquisition, device, system, computer equipment and storage medium
WO2021114624A1 (en) Artificial intelligence-based medication recommendation method, apparatus, device, and storage medium
US9443002B1 (en) Dynamic data analysis and selection for determining outcomes associated with domain specific probabilistic data sets
US10430716B2 (en) Data driven featurization and modeling
CN111180086B (en) Data matching method, device, computer equipment and storage medium
CN111145910A (en) Abnormal case identification method and device based on artificial intelligence and computer equipment
WO2020119098A1 (en) Health evaluation method and apparatus, and computer readable storage medium
CN109597805A (en) A kind of data processing method, electronic equipment and storage medium
WO2020034801A1 (en) Medical feature screening method and apparatus, computer device, and storage medium
KR20200088498A (en) Create reference range
CN110729054B (en) Abnormal diagnosis behavior detection method and device, computer equipment and storage medium
CN109783788A (en) Tables of data complementing method, device, computer equipment and storage medium
WO2019062186A1 (en) Diabetes analysis method, application server and computer readable storage medium
CN110727711B (en) Method and device for detecting abnormal data in fund database and computer equipment
CN111383725A (en) Adverse reaction data identification method and device, electronic equipment and readable medium
CN113821641B (en) Method, device, equipment and storage medium for classifying medicines based on weight distribution
US20210256091A1 (en) Multi-dimensional data analysis and database generation
CN111274231B (en) Abnormal medical insurance data checking method and device, computer equipment and storage medium
CN113299402A (en) Medical data processing method, device, computer equipment and storage medium
CN113643783A (en) Sub-health population drug recommendation method, system, equipment and storage medium
US20130311207A1 (en) Medical Record Processing

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20911991; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20911991; Country of ref document: EP; Kind code of ref document: A1)