WO2021139112A1

WO2021139112A1 - Data dimensionality reduction processing method and apparatus, computer device, and storage medium

Info

Publication number: WO2021139112A1
Application number: PCT/CN2020/099242
Authority: WO
Inventors: 张旭; 刘伟
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-01-07
Filing date: 2020-06-30
Publication date: 2021-07-15
Also published as: CN111221876A

Abstract

A data dimensionality reduction processing method, relating to the technical field of artificial intelligence, comprising: extracting a historical analysis model from a database, and analyzing the historical analysis model to obtain initial features whose priorities are higher than a target priority (S202); obtaining the frequencies of the initial features, and selecting the initial features whose frequencies satisfy a requirement as target features (S204); extracting initial data corresponding to the target features from multi-dimensional data (S206); processing the different types of initial data to the same order of magnitude to obtain standard data (S208); and performing nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension (S210).

Description

Data dimensionality reduction processing method, device, computer equipment and storage medium

Cross-references to related applications

This application claims priority to Chinese patent application No. 202010014342.4, filed with the Chinese Patent Office on January 7, 2020 and entitled "Data dimensionality reduction processing methods, devices, computer equipment and storage media", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of artificial intelligence, and in particular to a data dimensionality reduction processing method, device, computer equipment and storage medium.

Background

As society develops, the kinds of data keep increasing. To extract more valuable information, more and more professionals have begun to research and analyze various kinds of data.

However, the inventor realized that for data that is huge in volume and diverse in type, analyzing and processing the data directly consumes a great deal of processing time. This often leads to low processing efficiency, ties up the hardware of the data processing system for long periods, and consumes system resources.

Summary of the invention

According to various embodiments disclosed in the present application, a data dimensionality reduction processing method, device, computer equipment, and storage medium are provided.

A data dimensionality reduction processing method, including:

Extracting relevant historical analysis models from the database, and analyzing the historical analysis models to obtain initial features with a priority higher than the target priority;

Acquiring the frequencies of the initial features, and selecting the initial features whose frequencies meet the requirements as target features;

Extracting the initial data corresponding to the target features from the multi-dimensional data;

Performing same-data-level processing on the different types of initial data to obtain standard data; and

Performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.

A data dimensionality reduction processing device, including:

An analysis module, configured to extract relevant historical analysis models from the database and analyze the historical analysis models to obtain initial features with a priority higher than the target priority;

A target feature generation module, configured to obtain the frequencies of the initial features and select the initial features whose frequencies meet the requirements as target features;

A data extraction module, configured to extract the initial data corresponding to the target features from the multi-dimensional data;

A same-data-level processing module, configured to perform same-data-level processing on the different types of initial data to obtain standard data; and

A dimensionality reduction processing module, configured to perform non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.

A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the above data dimensionality reduction processing method.

One or more computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above data dimensionality reduction processing method.

In the above data dimensionality reduction processing method, device, computer equipment and storage medium, target features are obtained from the historical analysis models; the initial data corresponding to the target features is then extracted from the multi-dimensional data and processed to the same data level to obtain standard data; and non-linear dimensionality reduction processing is performed on the standard data to obtain target data of a preset dimension. Because the target data is generated from, and remains associated with, the multi-dimensional data, it preserves the characteristics of the multi-dimensional data, so subsequent data processing and analysis can be performed on the target data instead. Compared with processing and analyzing the multi-dimensional data directly, this reduces the system resources consumed and improves data processing efficiency.

The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

Fig. 1 is an application scenario diagram of the data dimensionality reduction processing method according to one or more embodiments.

Fig. 2 is a schematic flowchart of a data dimensionality reduction processing method according to one or more embodiments.

Fig. 3 is a schematic flowchart of a target feature determination step according to one or more embodiments.

Fig. 4 is a schematic flowchart of a data missing detection step according to one or more embodiments.

Fig. 5 is a structural block diagram of a data dimensionality reduction processing device according to one or more embodiments.

Fig. 6 is a block diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

The data dimensionality reduction processing method provided in this application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. A user can trigger data processing through the terminal 102. After receiving the data processing instruction sent by the terminal 102, the server 104 extracts the historical analysis models from the database and analyzes them to obtain initial features. The server 104 may then select target features according to the acquired frequencies of the initial features, and extract the initial data from the multi-dimensional data based on the selected target features. Further, to facilitate subsequent data processing, after obtaining the initial data the server 104 may also perform same-data-level processing and dimensionality reduction processing on it to obtain target data of a preset dimension, thereby reducing the amount of data handled in subsequent processing and improving data processing efficiency. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one of the embodiments, as shown in FIG. 2, a data dimensionality reduction processing method is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:

Step 202: Extract a historical analysis model from the database, and analyze the historical analysis model to obtain an initial feature with a priority higher than the target priority.

Here, the database may be the database of the server, and the historical analysis models may be models pre-configured in the server database. The historical analysis models configured in the server database may include models corresponding to different data types and different business types, including but not limited to models related to medical insurance or models related to disease types, for example, a diabetes surgery analysis model or a routine examination analysis model.

Specifically, the server may filter the historical analysis models stored in the database according to a filter condition and then extract the models that pass the filter. For example, the server may use the medical insurance field as the filter condition to extract the historical analysis models related to medical insurance from the database.

The initial features are the features obtained by analyzing the historical analysis models, and may include, but are not limited to, cost features, frequency features, and various indicator features. Those skilled in the art will understand that an initial feature here refers to what the feature means, not to any specific data values of the feature. Specifically, cost features may include, but are not limited to, surgery fees, drug fees, and examination fees; frequency features may include, but are not limited to, the numbers of medical visits, examinations, operations, and drug purchases; and indicator features may include, but are not limited to, height, weight, heart rate, blood pressure, hemoglobin content, platelet count, glucose content, and urine protein.

The target priority refers to a preset priority level, which can be a priority level preset by the server, such as high priority, low priority, or the like, or it can be level 1, level 2, or level 3, etc. The target priority can be different according to the data type or business type, and this application does not impose any restrictions on this.

The priority of an initial feature can be associated with the historical analysis model, and the initial features extracted by the server can differ between historical analysis models. For example, for a diabetes surgery analysis model, the initial features obtained after analysis may be features such as the operation fee, the number of operations, and glucose content; for a routine examination analysis model, they may be features such as height, weight, blood pressure, heart rate, and vision.
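The filtering and priority extraction of Step 202 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the `HistoricalModel` structure, the `PRIORITY` scale, the domain labels, and the example feature priorities are all hypothetical, since the text does not specify how models or priorities are stored.

```python
from dataclasses import dataclass, field

# Hypothetical ordinal scale for priority labels (larger = higher priority).
PRIORITY = {"low": 1, "medium": 2, "high": 3}

@dataclass
class HistoricalModel:
    """Hypothetical record for a historical analysis model stored in the database."""
    name: str
    domain: str                                   # business field used as the filter condition
    features: dict = field(default_factory=dict)  # feature name -> priority label

def extract_initial_features(models, domain, target_priority):
    """Keep models in `domain`, then the features whose priority is higher than `target_priority`."""
    threshold = PRIORITY[target_priority]
    return {
        m.name: [f for f, p in m.features.items() if PRIORITY[p] > threshold]
        for m in models
        if m.domain == domain  # filter condition, e.g. the medical insurance field
    }

models = [
    HistoricalModel("diabetes_surgery", "medical_insurance",
                    {"operation_fee": "high", "operation_count": "high",
                     "glucose": "high", "height": "low"}),
    HistoricalModel("routine_exam", "medical_insurance",
                    {"height": "medium", "weight": "medium", "blood_pressure": "medium",
                     "heart_rate": "medium", "vision": "medium", "operation_fee": "low"}),
    HistoricalModel("credit_scoring", "finance", {"income": "high"}),
]
initial = extract_initial_features(models, "medical_insurance", "low")
print(initial)
```

The same feature (here `operation_fee`) can be an initial feature in one model and fall below the target priority in another, which matches the statement that the priority is associated with the model.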

Step 204: Obtain the frequency of the initial feature, and select the initial feature whose frequency meets the requirements as the target feature.

Here, the frequency of an initial feature may be how often it appears across different historical analysis models, and this can differ between features. For example, the operation fee may appear in many historical analysis models and thus have a high frequency, while glucose content may appear only in the diabetes surgery analysis model and thus have a low frequency.

Specifically, the more frequently an initial feature appears across different historical analysis models (for example, if it appears in all of them), the more important it can be considered. Therefore, the target features can be determined according to how frequently the initial features appear in the different historical analysis models.

A frequency that meets the requirements may mean that the frequency of the initial feature across the historical analysis models satisfies a threshold condition; alternatively, the initial features may be sorted by frequency, and those whose ranking meets a certain requirement, for example the top 10, are taken as the initial features whose frequency meets the requirements. Specifically, the server may select among the initial features according to their frequencies in the extracted historical analysis models to determine the target features.
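The two selection rules described for Step 204 (a frequency threshold, or a top-k ranking) can be sketched as below, assuming the initial features have already been extracted per model as a dictionary. The function name, the `min_count` and `top_k` parameters, the alphabetical tie-break, and the sample data are hypothetical.

```python
from collections import Counter

def select_by_frequency(initial_features, min_count=None, top_k=None):
    """initial_features: model name -> list of initial feature names."""
    # Count in how many historical analysis models each initial feature appears.
    freq = Counter(f for feats in initial_features.values() for f in set(feats))
    if min_count is not None:
        # Threshold condition: frequency must reach min_count.
        return sorted(f for f, c in freq.items() if c >= min_count)
    # Ranking condition: most frequent first, ties broken alphabetically.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f for f, _ in ranked[:top_k]]

initial = {
    "diabetes_surgery": ["operation_fee", "operation_count", "glucose"],
    "routine_exam": ["height", "weight", "exam_fee"],
    "cardiac_surgery": ["operation_fee", "operation_count", "heart_rate"],
}
print(select_by_frequency(initial, min_count=2))  # ['operation_count', 'operation_fee']
print(select_by_frequency(initial, top_k=3))
```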

Step 206: Extract the initial data corresponding to the target feature in the multi-dimensional data.

Here, the multi-dimensional data may refer to all data stored in the database, and may include both the new data produced at each data change and the historical data from before the change. For example, for the medical insurance data mentioned above, the multi-dimensional data may be the medical treatment data generated by a user's visits and stored under the user's name, which may include historical and current medical treatment data, specifically including but not limited to: the location of the consultation, the time of the consultation, the International Classification of Diseases (ICD) code, the registration department, registered doctor information, the registration fee, the payment method, examination items, examination fees, the condition description, medical advice, the drug list, drug prices, drug dosages, the payment window, the drug collection window, whether a follow-up visit is needed, the follow-up time, and the number of consultations.

Specifically, the server may extract the initial data from the multi-dimensional data based on the selected target features. In this embodiment, the extracted initial data can be divided into multiple categories. For medical insurance data, for example, these may include, but are not limited to, expense data for the current visit, ICD data for the current visit, and historical medical treatment data. The expense data for the current visit may include, but is not limited to, surgery fees, drug fees, and examination fees; the ICD data for the current visit may include, but is not limited to, the cost associated with the confirmed ICD and the average cost of that ICD; and the historical medical treatment data may include, but is not limited to, the number of local outpatient visits, the number of local hospitalizations, the number of off-site outpatient visits, the number of off-site hospitalizations, the proportion of local outpatient visits, and the proportion of off-site outpatient visits.
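Step 206 amounts to projecting each multi-dimensional record onto the target features and, optionally, grouping the result into categories. The sketch below assumes records are plain dictionaries keyed by field name; `extract_initial_data`, `group_by_category`, the field names, and the category labels are illustrative assumptions.

```python
def extract_initial_data(records, target_features):
    """Project each multi-dimensional record onto the target features (absent fields -> None)."""
    return [{f: r.get(f) for f in target_features} for r in records]

# Hypothetical assignment of target features to data categories.
CATEGORY = {
    "drug_fee": "current_expense",
    "exam_fee": "current_expense",
    "icd_avg_cost": "current_icd",
    "local_outpatient": "history",
}

def group_by_category(row, categories=CATEGORY):
    """Split one extracted row into the categories its features belong to."""
    groups = {}
    for feature, value in row.items():
        groups.setdefault(categories.get(feature, "other"), {})[feature] = value
    return groups

records = [
    {"visit_time": "2020-01-02", "department": "Endocrinology", "icd": "E11",
     "drug_fee": 500, "exam_fee": 120, "icd_avg_cost": 800, "local_outpatient": 4},
    {"visit_time": "2020-02-10", "department": "Cardiology", "icd": "I10",
     "drug_fee": 300, "icd_avg_cost": 650, "local_outpatient": 5},
]
rows = extract_initial_data(records, ["drug_fee", "exam_fee", "icd_avg_cost", "local_outpatient"])
print(group_by_category(rows[1]))
```

Fields that are not target features (visit time, department, ICD code) are dropped, and an absent field surfaces as `None`, which the missing-data detection step can later pick up.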

Step 208: Perform data processing of the same data level on the different types of the initial data to obtain standard data.

Specifically, because the data types differ, the extracted initial data may differ greatly in data level (order of magnitude). For example, a drug cost may be 500 while the total cost is 1,000,000, several orders of magnitude apart.

The server can apply the same data-level processing method to initial data of different data levels to obtain standard data at a common data level. Continuing the previous example, processing the drug cost and the total cost to the same data level yields values between 0 and 100: the standard drug cost is 0.05 and the standard total cost is 100.

Specifically, the method used for same-data-level processing can be selected according to the data type or the data magnitude; for example, square-root, square, cube, exponential, or logarithmic transformations may be used, which is not limited in this application.
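The numerical example above appears consistent with a linear rescaling x' = 100 * x / x_max, since 500 * 100 / 1,000,000 = 0.05 and 1,000,000 * 100 / 1,000,000 = 100. The sketch below shows that rescaling alongside a logarithmic option; the function names and the log10(1 + x) form are assumptions, not formulas stated in the text.

```python
import math

def scale_to_range(values, upper=100.0):
    """Linear rescaling so the largest absolute value maps to `upper` (here 0-100)."""
    peak = max(abs(v) for v in values)
    return [v * upper / peak for v in values] if peak else [0.0 for _ in values]

def log_scale(values):
    """Logarithmic compression, log10(1 + x), for non-negative values."""
    return [math.log10(1 + v) for v in values]

costs = [500, 1_000_000]      # drug cost, total cost
print(scale_to_range(costs))  # [0.05, 100.0]
print(log_scale(costs))
```

Linear rescaling keeps ratios intact but leaves small values near zero; a logarithmic transform compresses the gap between magnitudes, which is one reason the choice may depend on the data type.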

Step 210: Perform non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.

Here, the preset dimension may be a dimension that the user presets on the server through the terminal according to subsequent data processing requirements, and the data volume of the target data of the preset dimension may be smaller than that of the standard data.

Non-linear dimensionality reduction methods may include, but are not limited to, Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), Modified Locally Linear Embedding (MLLE), Hessian Eigenmapping, Spectral Embedding, Local Tangent Space Alignment (LTSA), Multi-dimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE).

In practical applications, linear dimensionality reduction methods can also be used, including but not limited to principal component analysis (PCA), kernel PCA, and incremental principal component analysis (Incremental PCA).

Specifically, using one of the above methods, the server can exploit the clustering structure of the data in a high-dimensional Riemannian space to map the multi-dimensional standard data to a low dimension, for example 2 dimensions, thereby obtaining the target data.
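Of the non-linear methods listed, Isomap is perhaps the simplest to show end to end: it builds a k-nearest-neighbour graph, approximates geodesic distances on the data manifold by shortest paths, and embeds those distances with classical MDS. The NumPy sketch below assumes the standard data is a numeric matrix with one row per record; the neighbourhood size and the helix test data are illustrative choices, and a library implementation such as scikit-learn's `sklearn.manifold.Isomap` would normally be preferred in practice.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap: kNN graph -> shortest-path (geodesic) distances -> classical MDS."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances between standard-data rows.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Keep edges only to each row's k nearest neighbours (column 0 of argsort is the row itself).
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1].ravel()
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nbrs] = D[rows, nbrs]
    G = np.minimum(G, G.T)  # make the neighbourhood graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest paths approximate geodesic distances on the manifold.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS: double-centre the squared distances and keep the top eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Illustrative "standard data": 100 points on a 3-D helix (a curved 1-D manifold).
t = np.linspace(0, 4 * np.pi, 100)
standard = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])
target = isomap(standard, n_neighbors=5, n_components=2)
print(target.shape)  # (100, 2)
```

Because the helix is a one-dimensional curve in three dimensions, the first output coordinate should track the position along the curve. The O(n^3) shortest-path step limits this sketch to small inputs.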

In the above data dimensionality reduction processing method, target features are obtained from the historical analysis models; the initial data corresponding to the target features is then extracted from the multi-dimensional data and processed to the same data level to obtain standard data; and non-linear dimensionality reduction processing is performed on the standard data to obtain target data of a preset dimension. Because the target data is generated from, and remains associated with, the multi-dimensional data, it preserves the characteristics of the multi-dimensional data, so subsequent data processing and analysis can be performed on the target data instead. Compared with processing and analyzing the multi-dimensional data directly, this reduces the system resources consumed and improves data processing efficiency.

As mentioned above, the historical analysis model may be multiple, and may be models related to different data types or different business types.

In one embodiment, analyzing the historical analysis models to obtain initial features with a priority higher than the target priority may include: analyzing each historical analysis model separately to obtain, for each model, the initial features whose priority is higher than the target priority.

For example, the diabetes surgery analysis model and the routine examination analysis model can be analyzed separately: for the diabetes surgery analysis model, initial features with a priority higher than the target priority, such as the operation fee, examination fee, and glucose content, are obtained; for the routine examination analysis model, features with a priority higher than the target priority, such as height, weight, vision, and examination fee, are obtained.

Referring to the schematic flowchart of the target feature determination step shown in FIG. 3, acquiring the frequency of the initial feature and selecting the initial feature whose frequency meets the requirements as the target feature may include the following process steps:

Step S302: Determine the feature priority and appearance frequency of each of the initial features in each of the historical analysis models.

Here, the feature priority measures how a feature ranks against the target priority within a given historical analysis model, and can be high, medium, or low. For example, features such as the operation fee and glucose content have high priority in the diabetes surgery analysis model, so their feature priority can be high, while features such as height and weight have low priority in that model, so their feature priority can be low.

Those skilled in the art will understand that this is only an example. In specific applications, the feature priority can also be a numerical measure, for example 10 points for high priority and 1 point for low priority, which is not limited in this application.

Step S304: Perform a comprehensive calculation on the feature priority and appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models.

For example, after obtaining the feature priority and appearance frequency of each initial feature, the server can combine them according to a preset weighting ratio between feature priority and appearance frequency to obtain the relative priority level of each initial feature.

Those skilled in the art will understand that this is only an example. In practical applications, the feature priority and appearance frequency of each initial feature in each historical analysis model can also be combined by simple addition or subtraction, or by operations such as squares, square roots, and exponents, which is not limited in this application.

The relative priority level can be a high, medium, or low priority measure, a numerical score such as 60, 70, 80, or 100, or a priority level determined by combining the two.

Step S306: Compare the relative priority level of each of the initial features in each of the historical analysis models, and determine that the initial feature whose relative priority level meets the requirements is the target feature.

Specifically, after calculating the relative priority level of each initial feature, the server can determine the target features by comparing these levels; for example, the initial features with a high relative priority level may be determined as the target features.
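Steps S302 to S306 can be sketched as a weighted score. The text leaves the combination rule open, so the sketch below assumes, hypothetically, that labels map to the 10-point/1-point scale mentioned above (with 5 points for medium as an added assumption), that appearance frequency is the share of models containing the feature, that the two are combined with weights 0.7 and 0.3 into a 0-100 score, and that features scoring at least 60 are selected.

```python
# Hypothetical point scale: high -> 10, low -> 1 (from the text); medium -> 5 is assumed.
PRIORITY_SCORE = {"high": 10, "medium": 5, "low": 1}

def relative_priority(per_model, w_priority=0.7, w_frequency=0.3):
    """per_model: model name -> {feature: priority label}. Returns feature -> score in 0-100."""
    n_models = len(per_model)
    scores = {}
    for feats in per_model.values():           # S302: priority and appearance per model
        for f, label in feats.items():
            scores.setdefault(f, []).append(PRIORITY_SCORE[label])
    top = max(PRIORITY_SCORE.values())
    result = {}
    for f, pts in scores.items():              # S304: weighted combination
        mean_priority = sum(pts) / len(pts) / top  # normalised to [0, 1]
        frequency = len(pts) / n_models            # share of models containing f
        result[f] = 100 * (w_priority * mean_priority + w_frequency * frequency)
    return result

def select_targets(rel, threshold=60):
    """S306: features whose relative priority meets the (hypothetical) threshold."""
    return sorted(f for f, s in rel.items() if s >= threshold)

per_model = {
    "diabetes_surgery": {"operation_fee": "high", "glucose": "high", "height": "low"},
    "routine_exam": {"height": "medium", "weight": "medium", "exam_fee": "medium"},
    "cardiac_surgery": {"operation_fee": "high", "heart_rate": "high", "height": "low"},
}
rel = relative_priority(per_model)
print(select_targets(rel))  # ['glucose', 'heart_rate', 'operation_fee']
```

Here `height` appears in every model but has mostly low priority, so it scores about 46 and is not selected, illustrating how the weighting trades frequency against priority.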

By obtaining the feature priority of each initial feature in each historical analysis model together with its appearance frequency, combining them to obtain each feature's relative priority level, and selecting the target features according to that level, the determined target features are strongly correlated with the multi-dimensional data, which improves the accuracy of the initial data extracted based on them.

In one embodiment, after the initial data corresponding to the target features is extracted from the multi-dimensional data, the data dimensionality reduction processing method may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address.

As mentioned above, for medical insurance data, the initial data extracted from the multi-dimensional data may include, but is not limited to, expense data for the current visit, ICD data for the current visit, and historical medical treatment data, where the historical medical treatment data may include, but is not limited to, the number of local outpatient visits, the number of local hospitalizations, the number of off-site outpatient visits, the number of off-site hospitalizations, the proportion of local outpatient visits, and the proportion of off-site outpatient visits.

Specifically, according to the data source of the initial data and the corresponding user address, the server may divide the historical data (the numbers of local and off-site outpatient visits and hospitalizations, and the proportions of local and off-site outpatient visits) into local data (the number of local outpatient visits, the number of local hospitalizations, the proportion of local outpatient visits, etc.) and remote data (the number of off-site outpatient visits, the number of off-site hospitalizations, the proportion of off-site outpatient visits, etc.).
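The local/remote division can be sketched as below, assuming each historical visit record carries the city of the treating institution (its data source) and a visit type, and that the user address has been reduced to a city. The record layout, the city-equality rule, and computing each outpatient proportion against all outpatient visits are hypothetical choices, since the text does not define them.

```python
def split_local_remote(visits, user_city):
    """Divide historical visits into local and remote data by comparing each visit's
    data source (city of the institution) with the user's address (city)."""
    local = [v for v in visits if v["city"] == user_city]
    remote = [v for v in visits if v["city"] != user_city]
    # Assumed denominator: all outpatient visits; `or 1` avoids division by zero.
    total_outpatient = sum(v["type"] == "outpatient" for v in visits) or 1

    def summarise(group):
        outpatient = sum(v["type"] == "outpatient" for v in group)
        inpatient = sum(v["type"] == "inpatient" for v in group)
        return {"outpatient": outpatient, "inpatient": inpatient,
                "outpatient_share": outpatient / total_outpatient}

    return summarise(local), summarise(remote)

visits = [
    {"city": "Shenzhen", "type": "outpatient"},
    {"city": "Shenzhen", "type": "outpatient"},
    {"city": "Shenzhen", "type": "outpatient"},
    {"city": "Shenzhen", "type": "inpatient"},
    {"city": "Beijing", "type": "outpatient"},
    {"city": "Shanghai", "type": "inpatient"},
]
local_data, remote_data = split_local_remote(visits, "Shenzhen")
print(local_data)   # {'outpatient': 3, 'inpatient': 1, 'outpatient_share': 0.75}
print(remote_data)  # {'outpatient': 1, 'inpatient': 1, 'outpatient_share': 0.25}
```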

In combination with the foregoing embodiment, after the server divides the initial data into local data and remote data, performing same-data-level processing on the different types of initial data to obtain standard data may include: performing same-data-level processing on the different types of local data to obtain local standard data, and performing same-data-level processing on the different types of remote data to obtain remote standard data.

Specifically, when the server performs same-data-level processing on local data and remote data separately, the resulting standard data may be at the same or different data levels. For example, the local standard data may lie between 0 and 10 while the remote standard data lies between 0 and 100, or both may lie between 0 and 100, which is not limited in this application. Optionally, the methods used to bring local data and remote data to the same order of magnitude may be the same or different.

Further, performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension may include: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimensions.

Specifically, the server may select different non-linear dimensionality reduction processing methods to perform dimensionality reduction processing on the local standard data and the remote standard data according to the different characteristics of the local standard data and the remote standard data.

Optionally, when the server performs non-linear dimensionality reduction on the local standard data and the remote standard data, the resulting local target data and remote target data may have the same or different dimensions, depending on the selected dimensionality reduction method and the data dimensions required by subsequent processing, which is not limited in this application.

Dividing the initial data into local data and remote data, and applying same-data-level processing and non-linear dimensionality reduction to each, yields two different types of target data. This allows subsequent processing to be tailored to different regions, making it more targeted and more accurate.

Optionally, after the server obtains the local data and the remote data, it can continue to divide the remote medical treatment data, for example, according to specific provinces or cities, it is divided into Beijing data, Shanghai data, Guangzhou data, etc.

In practical applications, the server can also first perform same-data-level processing on the initial data, then divide the resulting standard data geographically based on the data source and user address, and then perform dimensionality reduction on each part to obtain local target data and remote target data of the preset dimensions.

In one embodiment, before same-data-level processing is performed on the different types of initial data to obtain standard data, the above data dimensionality reduction processing method may further include: dividing the initial data into expense data and count data according to data type.

As mentioned earlier, for medical insurance data, the initial data may include data such as surgery fees, drug fees, examination fees, the number of local outpatient visits, the number of local hospitalizations, the number of off-site outpatient visits, and the number of off-site hospitalizations. Specifically, the server can classify surgery fees, drug fees, examination fees, and the like as expense data, and classify the numbers of local and off-site outpatient visits and hospitalizations as count data.

Further, performing same-data-level processing on the different types of initial data to obtain standard data may include: obtaining the preset formulas corresponding to the expense data and the count data respectively; performing same-data-level processing on expense data of different magnitudes according to the preset formula and standard magnitude corresponding to expense data, to obtain the standard data corresponding to the expense data; and performing same-data-level processing on count data of different magnitudes according to the preset formula and standard magnitude corresponding to count data, to obtain the standard data corresponding to the count data.

Here, a preset formula is the function corresponding to one of the same-data-level processing methods described above, for example a square-root, square, cube, exponential, or logarithmic formula. The preset formulas for different types of data may be the same or different; that is, each type of data can be associated with its own preset formula.

Specifically, count data and expense data differ greatly in magnitude. Performing same-data-level processing on them separately yields more accurate standard data and thus improves the accuracy of subsequent data processing.
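A sketch of per-type preset formulas follows. The text names candidate transformations but not which applies to which type, so the pairing here (a log10(1 + x) formula with standard magnitude 100 for expense data, a square-root formula with standard magnitude 10 for count data), the field-to-type mapping, and rescaling each transformed series so its largest value equals the standard magnitude are all illustrative assumptions.

```python
import math

# Hypothetical mapping from field name to data type.
DATA_TYPE = {
    "surgery_fee": "expense", "drug_fee": "expense", "exam_fee": "expense",
    "local_outpatient": "count", "local_inpatient": "count",
    "remote_outpatient": "count", "remote_inpatient": "count",
}

# Hypothetical preset formula and standard magnitude for each data type.
PRESET = {
    "expense": (lambda x: math.log10(1 + x), 100.0),  # compresses fees spanning many magnitudes
    "count": (math.sqrt, 10.0),                      # milder compression for small counts
}

def classify(row):
    """Split one record into expense data and count data by field name."""
    groups = {"expense": {}, "count": {}}
    for field_name, value in row.items():
        groups[DATA_TYPE[field_name]][field_name] = value
    return groups

def standardise(values, data_type):
    """Apply the type's preset formula, then rescale so the maximum equals its standard magnitude."""
    formula, magnitude = PRESET[data_type]
    transformed = [formula(v) for v in values]
    peak = max(transformed)
    return [x * magnitude / peak for x in transformed] if peak > 0 else transformed

row = {"surgery_fee": 20000, "drug_fee": 500, "exam_fee": 300,
       "local_outpatient": 9, "remote_outpatient": 1}
groups = classify(row)
print(standardise(list(groups["expense"].values()), "expense"))
print(standardise(list(groups["count"].values()), "count"))
```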

In one embodiment, after extracting the initial data corresponding to the target features from the multi-dimensional data, the server can also check the extracted initial data for completeness and missing values. Specifically, referring to the schematic flowchart of the data missing detection step shown in FIG. 4, the above data dimensionality reduction processing method may further include the following steps:

Step S402: Perform data missing detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data.

For example, again taking medical insurance data as an example: after extracting the initial data, the server determines from it that the user has been ill three times, but the treatment cost data obtained contains only two entries, 2000 and 500. The illness-count data therefore reveals that treatment cost data is missing.
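A minimal sketch of the detection in step S402, assuming a user's initial data carries an illness count (the reference data type) and a list of per-visit treatment costs. The `UserRecord` fields and the `count_missing_costs` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserRecord:
    # Hypothetical fields standing in for two data types of the initial data.
    illness_count: int  # reference data type
    treatment_costs: List[float] = field(default_factory=list)


def count_missing_costs(record: UserRecord) -> int:
    """Number of treatment-cost entries missing relative to illness_count."""
    return max(record.illness_count - len(record.treatment_costs), 0)


user = UserRecord(illness_count=3, treatment_costs=[2000.0, 500.0])
missing = count_missing_costs(user)
print(missing)  # 1
```

A positive result corresponds to the case in the example above: three illnesses but only two cost entries.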

Step S404: When missing data is detected in the initial data, fill in the missing values of that data type using the existing data of the same type as the missing data.

Continuing the previous example, after detecting that treatment cost data is missing, the server can estimate the missing value from the two existing entries, 2000 and 500, for example by taking their mean or median, and fill in the missing treatment cost with the result.
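The mean-or-median filling can be sketched with Python's standard `statistics` module. The `fill_missing` helper and its `strategy` parameter are hypothetical; the text only names the mean and the median as possible estimates.

```python
import statistics
from typing import List


def fill_missing(costs: List[float], expected_count: int,
                 strategy: str = "mean") -> List[float]:
    """Pad costs up to expected_count with the mean or median of known values.

    Hypothetical helper: the source names mean or median as options.
    """
    if not costs:
        raise ValueError("no same-type data available to estimate from")
    if strategy == "mean":
        estimate = statistics.mean(costs)
    elif strategy == "median":
        estimate = statistics.median(costs)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    missing = max(expected_count - len(costs), 0)
    return costs + [estimate] * missing


filled = fill_missing([2000.0, 500.0], expected_count=3)
print(filled)  # [2000.0, 500.0, 1250.0]
```

With only two known values the mean and the median coincide (both 1250.0); the choice matters once more entries are available and outliers appear.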

In this embodiment, the server can also estimate the missing data from the user's historical data in the initial data and fill it in with the resulting estimate. Preferably, the estimate may be obtained by combining the user's historical data with the data of users who have similar medical records.

By detecting missing values in the acquired initial data and filling them in, the completeness of the initial data is improved, which in turn improves the accuracy of subsequent data processing.

It should be understood that although the steps in the flowcharts of FIGS. 2-4 are displayed in the sequence indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time and may be executed at different times; nor are they necessarily executed sequentially, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one of the embodiments, as shown in FIG. 5, a data dimensionality reduction processing device is provided, which may include an analysis module 100, a target feature generation module 200, a data extraction module 300, a same-magnitude data processing module 400 and a dimensionality reduction processing module 500, where:

The analysis module 100 is configured to extract relevant historical analysis models from the database, and analyze the historical analysis models to obtain initial features with a priority higher than the target priority.

The target feature generation module 200 is configured to obtain the frequency of each initial feature and select the initial features whose frequency meets the requirement as the target features.

The data extraction module 300 is used to extract the initial data corresponding to the target feature in the multi-dimensional data.

The same-magnitude data processing module 400 is configured to perform data processing of the same order of magnitude on different types of the initial data to obtain standard data.

The dimensionality reduction processing module 500 is configured to perform non-linear dimensionality reduction processing on the standard data to obtain target data of preset dimensions.
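The description does not name the non-linear technique used by module 500. As one possible sketch, assuming kernel PCA with a Gaussian (RBF) kernel is chosen (a swapped-in technique, not necessarily the method of this application), the reduction to a preset dimension can be written with the Python standard library alone. The function names, `gamma` value and sample data are illustrative.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

# Assumption: kernel PCA with an RBF kernel stands in for the unspecified
# non-linear dimensionality reduction method.


def rbf_kernel(data: Matrix, gamma: float) -> Matrix:
    """Gaussian (RBF) kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(data)
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
         for j in range(n)]
        for i in range(n)
    ]


def center_kernel(k: Matrix) -> Matrix:
    """Double-center K so the mapped samples have zero mean in feature space."""
    n = len(k)
    row_means = [sum(row) / n for row in k]  # K is symmetric: column means match
    total_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def top_eigenpairs(m: Matrix, count: int, iters: int = 500,
                   seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Leading eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(m)
    work = [row[:] for row in m]
    pairs: List[Tuple[float, List[float]]] = []
    for _ in range(count):
        v = [rng.random() + 0.1 for _ in range(n)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]  # start from a unit vector
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def kernel_pca(data: Matrix, dims: int, gamma: float = 1.0) -> Matrix:
    """Map each row of `data` to `dims` kernel principal components."""
    centered = center_kernel(rbf_kernel(data, gamma))
    pairs = top_eigenpairs(centered, dims)
    # Projection of sample i on component k is sqrt(lambda_k) * v_k[i].
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(data))]


# Hypothetical standard data: 4 samples x 3 same-magnitude features.
standard = [
    [0.55, 0.17, 0.30],
    [0.45, 0.20, 0.10],
    [0.90, 1.00, 0.80],
    [0.10, 0.05, 0.20],
]
target = kernel_pca(standard, dims=2)
print(len(target), len(target[0]))  # 4 2
```

Power iteration keeps the sketch dependency-free; for real data volumes a library eigensolver (or an existing kernel PCA implementation) would be the practical choice.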

In one of the embodiments, there are multiple historical analysis models, and the analysis module 100 is configured to analyze each historical analysis model to obtain, for each model, the initial features whose priority is higher than the target priority.

The target feature generation module 200 may include:

The first determining sub-module is configured to determine the feature priority and the appearance frequency of each initial feature in each historical analysis model.

The calculation sub-module is configured to perform a comprehensive calculation on the feature priority and appearance frequency of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model.

The comparison and determination sub-module is configured to compare the relative priority levels of the initial features in each historical analysis model and determine the initial features whose relative priority level meets the requirement as the target features.
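The "comprehensive calculation" is not spelled out. A minimal sketch, assuming relative priority is the mean priority of a feature over the models in which it exceeds the target priority multiplied by its appearance frequency (the share of models in which it does so), and assuming the requirement is a numeric threshold. All names and scores below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def select_target_features(
    model_features: List[Dict[str, float]],
    target_priority: float,
    threshold: float,
) -> List[str]:
    """Select target features from several historical analysis models.

    model_features holds one dict per model: feature name -> priority score.
    Assumed scoring: relative priority = mean priority over the models in which
    the feature exceeds target_priority, times its appearance frequency (share
    of all models in which it does so). That product equals total / n_models.
    """
    totals: Dict[str, float] = defaultdict(float)
    for features in model_features:
        for name, priority in features.items():
            if priority > target_priority:  # only initial features count
                totals[name] += priority
    n_models = len(model_features)
    relative = {name: total / n_models for name, total in totals.items()}
    # Rank by relative priority and keep those that meet the threshold.
    ranked = sorted(relative, key=lambda name: relative[name], reverse=True)
    return [name for name in ranked if relative[name] >= threshold]


models = [
    {"treatment_cost": 0.9, "visit_count": 0.8, "age": 0.3},
    {"treatment_cost": 0.7, "visit_count": 0.6, "region": 0.55},
    {"treatment_cost": 0.8, "age": 0.65},
]
selected = select_target_features(models, target_priority=0.5, threshold=0.4)
print(selected)  # ['treatment_cost', 'visit_count']
```

Multiplying by appearance frequency favors features that several historical models agree on over a feature that scores highly in only one model.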

In one of the embodiments, the above-mentioned data dimensionality reduction processing device may further include:

The data classification module is configured to, after the data extraction module 300 extracts the initial data corresponding to the target feature from the multi-dimensional data, obtain the data source of the initial data and the user address corresponding to the initial data, and divide the historical data in the initial data into local data and remote data based on the data source and the user address.

The same-magnitude data processing module 400 may include:

The first same-magnitude data processing sub-module is configured to perform data processing of the same order of magnitude on different types of the local data to obtain local standard data.

The second same-magnitude data processing sub-module is configured to perform data processing of the same order of magnitude on different types of the remote data to obtain remote standard data.

The dimensionality reduction processing module 500 is configured to perform non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively, to obtain local target data and remote target data of a preset dimension.
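A minimal sketch of the local/remote split, assuming "local" means the city where a record was produced matches the city in the user's address. The `HistoricalRecord` fields and the city-level comparison are assumptions for illustration; the text does not specify how the data source and user address are compared.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HistoricalRecord:
    # Hypothetical fields: where the record was produced and the user's address.
    source_city: str
    user_city: str
    expense: float


def split_local_remote(
    records: List[HistoricalRecord],
) -> Tuple[List[HistoricalRecord], List[HistoricalRecord]]:
    """Local if the data source matches the user's address city, else remote."""
    local: List[HistoricalRecord] = []
    remote: List[HistoricalRecord] = []
    for rec in records:
        same_place = rec.source_city.strip().lower() == rec.user_city.strip().lower()
        (local if same_place else remote).append(rec)
    return local, remote


records = [
    HistoricalRecord("Shenzhen", "Shenzhen", 2000.0),
    HistoricalRecord("Guangzhou", "Shenzhen", 500.0),
    HistoricalRecord("shenzhen ", "Shenzhen", 300.0),
]
local, remote = split_local_remote(records)
print([r.expense for r in local], [r.expense for r in remote])
# [2000.0, 300.0] [500.0]
```

Each returned group would then go through its own same-magnitude processing and non-linear reduction, as the two sub-modules and module 500 above describe.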

The classification module is configured to divide the initial data into expense data and frequency data according to data type, before the same-magnitude data processing module 400 performs data processing of the same order of magnitude on the different types of initial data to obtain the standard data.

The same-magnitude data processing module 400 may include:

The obtaining sub-module is configured to obtain the preset formulas corresponding to the expense data and the frequency data respectively.

The third same-magnitude data processing sub-module is configured to perform, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data of the expense data.

The fourth same-magnitude data processing sub-module is configured to perform, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data of the frequency data.

The detection module is configured to, after the data extraction module 300 extracts the initial data corresponding to the target feature from the multi-dimensional data, perform missing-data detection on the data of the remaining data types in the initial data based on the data of at least one data type in the initial data.

The filling module is configured to, when missing data is detected in the initial data, fill in the missing values of that data type using the existing data of the same type as the missing data.

For the specific definition of the data dimensionality reduction processing device, refer to the definition of the data dimensionality reduction processing method above; it is not repeated here. Each module in the device may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

In one of the embodiments, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile or volatile storage medium and an internal memory. The storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the storage medium. The database stores historical analysis model data and the various data generated during data processing. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a data dimensionality reduction processing method.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. Specifically, the computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.

A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features whose priority is higher than the target priority; obtaining the frequency of each initial feature, and selecting the initial features whose frequency meets the requirement as target features; extracting the initial data corresponding to the target features from the multi-dimensional data; performing data processing of the same order of magnitude on different types of the initial data to obtain standard data; and performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.

In one of the embodiments, there are multiple historical analysis models, and the analyzing of the historical analysis models to obtain initial features whose priority is higher than the target priority, implemented when the processor executes the computer-readable instructions, may include: analyzing each historical analysis model to obtain, for each model, the initial features whose priority is higher than the target priority. The obtaining of the frequency of the initial features and selecting of the initial features whose frequency meets the requirement as target features may include: determining the feature priority and the appearance frequency of each initial feature in each historical analysis model; performing a comprehensive calculation on the feature priority and appearance frequency of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model; and comparing the relative priority levels of the initial features in each historical analysis model and determining the initial features whose relative priority level meets the requirement as the target features.

In one of the embodiments, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the processor executes the computer-readable instructions, the steps may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address. The performing of data processing of the same order of magnitude on different types of the initial data to obtain standard data includes: performing data processing of the same order of magnitude on different types of the local data to obtain local standard data; and performing data processing of the same order of magnitude on different types of the remote data to obtain remote standard data. The performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension includes: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimension.

In one of the embodiments, before the performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data, implemented when the processor executes the computer-readable instructions, the steps may further include: dividing the initial data into expense data and frequency data according to data type. The performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data may include: obtaining the preset formulas corresponding to the expense data and the frequency data respectively; performing, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data corresponding to the expense data; and performing, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data corresponding to the frequency data.

In one of the embodiments, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the processor executes the computer-readable instructions, the steps may further include: performing missing-data detection on the data of the remaining data types in the initial data based on the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the missing values of that data type using the existing data of the same type as the missing data.

One or more computer-readable storage media store computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: extracting relevant historical analysis models from a database, and analyzing the historical analysis models to obtain initial features whose priority is higher than the target priority; obtaining the frequency of each initial feature, and selecting the initial features whose frequency meets the requirement as target features; extracting the initial data corresponding to the target features from the multi-dimensional data; performing data processing of the same order of magnitude on different types of the initial data to obtain standard data; and performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.

The computer-readable storage medium may be non-volatile or volatile.

In one of the embodiments, there are multiple historical analysis models, and the analyzing of the historical analysis models to obtain initial features whose priority is higher than the target priority, implemented when the computer-readable instructions are executed by the processor, may include: analyzing each historical analysis model to obtain, for each model, the initial features whose priority is higher than the target priority. The obtaining of the frequency of the initial features and selecting of the initial features whose frequency meets the requirement as target features may include: determining the feature priority and the appearance frequency of each initial feature in each historical analysis model; performing a comprehensive calculation on the feature priority and appearance frequency of each initial feature in each historical analysis model to obtain the relative priority level of each initial feature in each historical analysis model; and comparing the relative priority levels of the initial features in each historical analysis model and determining the initial features whose relative priority level meets the requirement as the target features.

In one of the embodiments, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the computer-readable instructions are executed by the processor, the steps may further include: obtaining the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address. The performing of data processing of the same order of magnitude on different types of the initial data to obtain standard data includes: performing data processing of the same order of magnitude on different types of the local data to obtain local standard data; and performing data processing of the same order of magnitude on different types of the remote data to obtain remote standard data. The performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension includes: performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimension.

In one of the embodiments, before the performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data, implemented when the computer-readable instructions are executed by the processor, the steps may further include: dividing the initial data into expense data and frequency data according to data type. The performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data may include: obtaining the preset formulas corresponding to the expense data and the frequency data respectively; performing, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data corresponding to the expense data; and performing, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data corresponding to the frequency data.

In one of the embodiments, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the computer-readable instructions are executed by the processor, the steps may further include: performing missing-data detection on the data of the remaining data types in the initial data based on the data of at least one data type in the initial data; and, when missing data is detected in the initial data, filling in the missing values of that data type using the existing data of the same type as the missing data.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.

The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims

1. A data dimensionality reduction processing method, comprising:

Extracting a historical analysis model from a database, and analyzing the historical analysis model to obtain an initial feature with a priority higher than a target priority;

Acquiring the frequency of the initial feature, and selecting the initial feature whose frequency meets the requirement as a target feature;

Extracting the initial data corresponding to the target feature from multi-dimensional data;

Performing data processing of the same order of magnitude on different types of the initial data to obtain standard data; and

Performing non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
2. The method according to claim 1, wherein there are multiple historical analysis models, and the analyzing of the historical analysis models to obtain an initial feature with a priority higher than a target priority comprises:

Analyzing each of the historical analysis models respectively to obtain, for each historical analysis model, the initial features whose priority is higher than the target priority;

The acquiring of the frequency of the initial feature and selecting of the initial feature whose frequency meets the requirement as the target feature comprises:

Determining the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models;

Comprehensively calculating the feature priority and the appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and

Comparing the relative priority levels of the initial features in each of the historical analysis models, and determining the initial feature whose relative priority level meets the requirement as the target feature.
3. The method according to claim 1 or 2, wherein, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, the method further comprises:

Acquiring the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address;

The performing of data processing of the same order of magnitude on different types of the initial data to obtain standard data comprises:

Performing data processing of the same order of magnitude on different types of the local data to obtain local standard data; and

Performing data processing of the same order of magnitude on different types of the remote data to obtain remote standard data;

The performing of non-linear dimensionality reduction processing on the standard data to obtain target data of a preset dimension comprises:

Performing non-linear dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of the preset dimension.
4. The method according to claim 1 or 2, wherein, before the performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data, the method further comprises:

Dividing the initial data into expense data and frequency data according to data type;

The performing of data processing of the same order of magnitude on different types of the initial data to obtain the standard data comprises:

Obtaining preset formulas corresponding to the expense data and the frequency data respectively;

Performing, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data corresponding to the expense data; and

Performing, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data corresponding to the frequency data.
5. The method according to claim 1 or 2, wherein, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, the method further comprises:

Performing missing-data detection on the data of the remaining data types in the initial data based on the data of at least one data type in the initial data; and

When missing data is detected in the initial data, filling in the missing values of that data type using the data of the same type as the missing data.
6. A data dimensionality reduction processing device, comprising:

An analysis module, configured to extract a historical analysis model from a database and analyze the historical analysis model to obtain an initial feature with a priority higher than a target priority;

A target feature generation module, configured to acquire the frequency of the initial feature and select the initial feature whose frequency meets the requirement as a target feature;

A data extraction module, configured to extract the initial data corresponding to the target feature from multi-dimensional data;

A same-magnitude data processing module, configured to perform data processing of the same order of magnitude on different types of the initial data to obtain standard data; and

A dimensionality reduction processing module, configured to perform dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
7. The device according to claim 6, wherein there are multiple historical analysis models, and the analysis module is configured to analyze each of the historical analysis models to obtain, for each historical analysis model, the initial features whose priority is higher than the target priority;

The target feature generation module comprises:

A first determining sub-module, configured to determine the feature priority and the appearance frequency of each initial feature in each historical analysis model;

A calculation sub-module, configured to comprehensively calculate the feature priority and appearance frequency of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and

A comparison and determination sub-module, configured to compare the relative priority levels of the initial features in each of the historical analysis models and determine the initial feature whose relative priority level meets the requirement as the target feature.
8. The device according to claim 6 or 7, further comprising:

A data classification module, configured to, after the data extraction module extracts the initial data corresponding to the target feature from the multi-dimensional data, acquire the data source of the initial data and the user address corresponding to the initial data, and divide the historical medical treatment data in the initial data into local medical treatment data and remote medical treatment data based on the data source and the user address;

The same-magnitude data processing module comprises:

A first same-magnitude data processing sub-module, configured to perform data processing of the same order of magnitude on different types of the local medical treatment data to obtain local standard data;

A second same-magnitude data processing sub-module, configured to perform data processing of the same order of magnitude on different types of the remote medical treatment data to obtain remote standard data; and

The dimensionality reduction processing module is configured to perform dimensionality reduction processing on the local standard data and the remote standard data respectively to obtain local target data and remote target data of a preset dimension.
9. The device according to claim 6 or 7, wherein the data dimensionality reduction processing device further comprises:

A classification module, configured to divide the initial data into expense data and frequency data according to data type before the data processing of the same order of magnitude is performed on the different types of the initial data to obtain the standard data;

The same-magnitude data processing module comprises:

An obtaining sub-module, configured to obtain the preset formulas corresponding to the expense data and the frequency data respectively;

A third same-magnitude data processing sub-module, configured to perform, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data of the expense data; and

A fourth same-magnitude data processing sub-module, configured to perform, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data of the frequency data.
10. The device according to claim 6 or 7, further comprising:

A detection module, configured to, after the data extraction module extracts the initial data corresponding to the target feature from the multi-dimensional data, perform missing-data detection on the data of the remaining data types in the initial data based on the data of at least one data type in the initial data; and

A filling module, configured to, when missing data is detected in the initial data, fill in the missing values of that data type using the data of the same type as the missing data.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more The processor performs the following steps:

Extracting a historical analysis model from the database, and analyzing the historical analysis model to obtain an initial feature with a priority higher than the target priority;

Acquiring the frequency of the initial feature, and selecting the initial feature whose frequency meets the requirements as the target feature;

Extracting the initial data corresponding to the target feature in the multidimensional data;

Performing data processing of the same order of magnitude on the different types of the initial data to obtain standard data; and

Performing nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
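The claims do not name the nonlinear dimensionality reduction technique. As one illustration of this final step, the sketch below swaps in RBF kernel PCA, a standard nonlinear method, as an assumption; the function name `rbf_kernel_pca`, the `gamma` value, and the representation of the standard data as a numeric matrix (one row per record) are all hypothetical.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Project the rows of X onto the top kernel principal components (RBF kernel)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped to avoid tiny negative rounding errors.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    # Center the kernel matrix in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    lambdas = np.clip(eigvals[order], 0.0, None)
    alphas = eigvecs[:, order]
    # The projection of training point i onto component k is sqrt(lambda_k) * alpha_ik.
    return alphas * np.sqrt(lambdas)
```

Here the preset dimension corresponds to `n_components`: calling `rbf_kernel_pca(standard_data, n_components=2, gamma=0.5)` would return one two-dimensional row per record.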
The computer device according to claim 11, wherein there are multiple historical analysis models, and the analyzing of the historical analysis model to obtain initial features whose priority is greater than the target priority, implemented when the processor executes the computer-readable instructions, comprises:

Analyzing each of the historical analysis models separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority;

The acquiring of the frequency of the initial features and the selecting of the initial feature whose frequency meets the requirements as the target feature, implemented when the processor executes the computer-readable instructions, comprises:

Determining the feature priority and the frequency of appearance of each of the initial features in each of the historical analysis models;

Performing a combined calculation on the feature priority and the frequency of appearance of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and

Comparing the relative priority levels of each of the initial features in each of the historical analysis models, and determining that the initial feature whose relative priority level meets the requirements is the target feature.
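The claims leave the combined calculation of feature priority and frequency of appearance unspecified. A minimal sketch, assuming each historical analysis model is represented as a dictionary of feature priorities on a 0-to-1 scale, and that the relative priority level is a weighted sum of a feature's mean priority and the fraction of models in which it clears the target priority (both assumptions, as are the names `select_target_features`, `weight`, and `top_k`):

```python
from collections import defaultdict


def select_target_features(models, target_priority, weight=0.5, top_k=2):
    """Rank initial features across several historical analysis models.

    models: list of dicts mapping a feature name to its priority in that model
    (assumed to lie on a 0-to-1 scale so it is comparable with a frequency).
    """
    priority_sum = defaultdict(float)
    appearances = defaultdict(int)
    for model in models:
        for feature, priority in model.items():
            # Keep only initial features whose priority exceeds the target priority.
            if priority > target_priority:
                priority_sum[feature] += priority
                appearances[feature] += 1
    scores = {}
    for feature, count in appearances.items():
        mean_priority = priority_sum[feature] / count
        frequency = count / len(models)  # share of models in which the feature qualifies
        # Assumed combination rule: weighted sum of mean priority and frequency.
        scores[feature] = weight * mean_priority + (1 - weight) * frequency
    # Highest relative priority first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:top_k], scores
```

With `weight=0.5`, a feature that clears the target priority in every model can outrank one with a slightly higher mean priority that qualifies in fewer models.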
The computer device according to claim 11 or 12, wherein, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the processor executes the computer-readable instructions, the following step is further implemented:

Acquiring the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address;

The performing of data processing of the same order of magnitude on the different types of the initial data to obtain standard data, implemented when the processor executes the computer-readable instructions, comprises:

Performing data processing of the same order of magnitude on the different types of the local data to obtain local standard data; and

Performing data processing of the same order of magnitude on the different types of the remote data to obtain remote standard data;

The nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension, implemented when the processor executes the computer-readable instructions, comprises:

Performing nonlinear dimensionality reduction processing on the local standard data and the remote standard data separately to obtain local target data and remote target data of a preset dimension.
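The claims divide the initial data into local and remote data based on the data source and the user address without stating the rule. A minimal sketch, assuming each record carries a region for its data source and a region for the user's address, and treating a record as local when the two match (the `Record` fields, the function `split_local_remote`, and the matching rule are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source_region: str  # region of the institution that produced the record (assumed field)
    user_region: str    # region of the user's registered address (assumed field)
    values: dict = field(default_factory=dict)


def split_local_remote(records):
    """Divide records into local and remote lists by comparing source and user regions."""
    local, remote = [], []
    for record in records:
        # Assumed rule: a record is local when it was produced in the user's own region.
        if record.source_region == record.user_region:
            local.append(record)
        else:
            remote.append(record)
    return local, remote
```

Each partition would then go through same-magnitude processing and nonlinear dimensionality reduction separately, as the claim describes.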
The computer device according to claim 11 or 12, wherein, before the data processing of the same order of magnitude is performed on the different types of the initial data to obtain standard data, the following step is further implemented when the processor executes the computer-readable instructions:

Dividing the initial data into expense data and frequency data according to data type;

The performing of data processing of the same order of magnitude on the different types of the initial data to obtain the standard data, implemented when the processor executes the computer-readable instructions, comprises:

Obtaining preset formulas corresponding to the expense data and the frequency data respectively;

Performing, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data corresponding to the expense data; and

Performing, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data corresponding to the frequency data.
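The preset formulas and standard magnitudes are not disclosed. A minimal sketch, assuming the preset formula for each data type is a power-of-ten rescaling that moves the largest absolute value into the decade set by that type's standard exponent (the functions `to_standard_magnitude` and `standardize_by_type` and the exponent values are hypothetical):

```python
import math


def to_standard_magnitude(values, standard_exponent=0):
    """Rescale by a power of ten so the largest |value| lies in
    [10**standard_exponent, 10**(standard_exponent + 1))."""
    values = list(values)
    peak = max((abs(v) for v in values), default=0)
    if peak == 0:
        return [float(v) for v in values]
    # Number of decades between the data's current magnitude and the standard one.
    shift = math.floor(math.log10(peak)) - standard_exponent
    factor = 10.0 ** shift
    return [v / factor for v in values]


def standardize_by_type(data_by_type, standard_exponents):
    """Apply the (assumed) preset formula to each data type with its own standard exponent."""
    return {
        data_type: to_standard_magnitude(values, standard_exponents.get(data_type, 0))
        for data_type, values in data_by_type.items()
    }
```

With a standard exponent of 0 for both types, expense values in the tens of thousands and visit counts in the tens both end up in the single-digit range, so neither type dominates the later dimensionality reduction.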
The computer device according to claim 11 or 12, wherein, after the extracting of the initial data corresponding to the target feature from the multi-dimensional data, implemented when the processor executes the computer-readable instructions, the following steps are further implemented:

Performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and

When missing data is detected in the initial data, filling in the data of that data type according to data of the same type as the missing data.
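The claims do not specify how missing values are detected or which same-type data is used to fill them. A minimal sketch, assuming records are dictionaries, that a present reference field (for example a record ID) means the other listed fields should also be present, and that a missing value is filled with the mean of the observed values of the same field (all assumptions; `detect_and_fill` is a hypothetical name):

```python
from statistics import mean


def detect_and_fill(records, key_field, fields):
    """Detect missing values in `fields` and fill them with the mean of the same field.

    A record is checked only when its reference field `key_field` is present.
    Returns filled copies of the records and the (record_index, field) pairs that were missing.
    """
    # Fill value per field: mean of the observed values of that same field (assumed strategy).
    fill_values = {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        fill_values[f] = mean(observed) if observed else None
    filled, missing = [], []
    for i, record in enumerate(records):
        new_record = dict(record)  # copy so the input records are left unchanged
        if record.get(key_field) is not None:
            for f in fields:
                if new_record.get(f) is None:
                    missing.append((i, f))
                    new_record[f] = fill_values[f]
        filled.append(new_record)
    return filled, missing
```

Mean filling is only one choice; a median or a value from the same user's other records would fit the claim wording equally well.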
One or more computer-readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:

Extracting a historical analysis model from the database, and analyzing the historical analysis model to obtain an initial feature with a priority higher than the target priority;

Acquiring the frequency of the initial feature, and selecting the initial feature whose frequency meets the requirements as the target feature;

Extracting the initial data corresponding to the target feature in the multidimensional data;

Performing data processing of the same order of magnitude on the different types of the initial data to obtain standard data; and

Performing nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension.
The one or more computer-readable storage media according to claim 16, wherein there are multiple historical analysis models, and the analyzing of the historical analysis model to obtain initial features whose priority is greater than the target priority, performed when the computer-readable instructions are executed by the processor, comprises:

Analyzing each of the historical analysis models separately to obtain, for each historical analysis model, the initial features whose priority is greater than the target priority;

The acquiring of the frequency of the initial features and the selecting of the initial feature whose frequency meets the requirements as the target feature, performed when the computer-readable instructions are executed by the processor, comprises:

Determining the feature priority and the frequency of appearance of each of the initial features in each of the historical analysis models;

Performing a combined calculation on the feature priority and the frequency of appearance of each of the initial features in each of the historical analysis models to obtain the relative priority level of each of the initial features in each of the historical analysis models; and

Comparing the relative priority levels of each of the initial features in each of the historical analysis models, and determining that the initial feature whose relative priority level meets the requirements is the target feature.
The one or more computer-readable storage media according to claim 16 or 17, wherein, after the initial data corresponding to the target feature is extracted from the multi-dimensional data when the computer-readable instructions are executed by the processor, the following step is further performed:

Acquiring the data source of the initial data and the user address corresponding to the initial data, and dividing the historical data in the initial data into local data and remote data based on the data source and the user address;

The performing of data processing of the same order of magnitude on the different types of the initial data to obtain standard data, when the computer-readable instructions are executed by the processor, comprises:

Performing data processing of the same order of magnitude on the different types of the local data to obtain local standard data; and

Performing data processing of the same order of magnitude on the different types of the remote data to obtain remote standard data;

The nonlinear dimensionality reduction processing on the standard data to obtain target data of a preset dimension, when the computer-readable instructions are executed by the processor, comprises:

Performing nonlinear dimensionality reduction processing on the local standard data and the remote standard data separately to obtain local target data and remote target data of a preset dimension.
The one or more computer-readable storage media according to claim 16 or 17, wherein, when the computer-readable instructions are executed by the processor, the following step is further performed before the data processing of the same order of magnitude is performed on the different types of the initial data to obtain the standard data:

Dividing the initial data into expense data and frequency data according to data type;

The performing of data processing of the same order of magnitude on the different types of the initial data to obtain standard data, when the computer-readable instructions are executed by the processor, comprises:

Obtaining preset formulas corresponding to the expense data and the frequency data respectively;

Performing, according to the preset formula corresponding to the expense data and the corresponding standard magnitude, data processing of the same order of magnitude on expense data of different magnitudes to obtain the standard data corresponding to the expense data; and

Performing, according to the preset formula corresponding to the frequency data and the corresponding standard magnitude, data processing of the same order of magnitude on frequency data of different magnitudes to obtain the standard data corresponding to the frequency data.
The one or more computer-readable storage media according to claim 16 or 17, wherein, after the initial data corresponding to the target feature is extracted from the multi-dimensional data when the computer-readable instructions are executed by the processor, the following steps are further performed:

Performing missing-data detection on the data of the remaining data types in the initial data according to the data of at least one data type in the initial data; and

When missing data is detected in the initial data, filling in the data of that data type according to data of the same type as the missing data.