WO2021196457A1 - Data correlation analysis method, device, computer system and readable storage medium - Google Patents

Data correlation analysis method, device, computer system and readable storage medium

Info

Publication number
WO2021196457A1
WO2021196457A1 PCT/CN2020/103829 CN2020103829W WO2021196457A1 WO 2021196457 A1 WO2021196457 A1 WO 2021196457A1 CN 2020103829 W CN2020103829 W CN 2020103829W WO 2021196457 A1 WO2021196457 A1 WO 2021196457A1
Authority
WO
WIPO (PCT)
Prior art keywords
qualitative
quantitative
data set
information
dimension
Prior art date
Application number
PCT/CN2020/103829
Other languages
English (en)
French (fr)
Inventor
吴锐
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021196457A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 - Relational databases
    • G06F16/285 - Clustering or classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 - Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 - Credit; Loans; Processing thereof
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Definitions

  • This application relates to the field of computer technology, specifically to the knowledge representation and reasoning technology of artificial intelligence, and in particular to a data correlation analysis method, device, computer system and readable storage medium.
  • Diversion refers to the process in which a platform forwards a customer application to a funder, that is, the process of converting a transaction product that the customer applies for on the platform side into the product information of a funder.
  • The platform is connected to multiple pieces of product information, and each piece of product information has different requirements for customers. Some product information is restricted to a business area and therefore places requirements on the customer's region; some restricts the customer's loan amount. How to correctly determine a piece of product information based on business data is a problem the platform must solve.
  • Current platforms adopt a tree-like management method: product information that has requirements on the business area is placed in one category, and product information without such requirements in another; on this basis, product information that restricts the loan amount is placed in one category and product information that does not in another, and so on. However, the inventor realized that this rough division of customer applications based on the funder's requirements can only divide customer applications along a relatively single dimension to meet the funder's rigid requirements. It cannot identify factors outside those rigid requirements (for example, the funder's loan preference factors and the risk control dimensions derived from analysis of its historical data), so customer applications cannot be matched accurately and the success rate of the product information recommended by the platform is low.
  • The purpose of this application is to provide a data correlation analysis method, device, computer system and readable storage medium that solve the problem in the prior art that factors outside the funder's rigid requirements cannot be identified, so that customer applications cannot be matched accurately and the success rate of the product information recommended by the platform is low.
  • To this end, this application provides a data correlation analysis method based on artificial intelligence, including:
  • Extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimension of the data set, formulate the qualitative judgment condition of the data set according to the qualitative information under each qualitative analysis dimension, and send it to the qualitative knowledge base; wherein the qualitative judgment condition reflects the qualitative information in the data set that has a degree of recognition;
  • Extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimension of the data set, formulate the quantitative judgment condition of the data set according to each quantitative analysis dimension and its maximum density range, and send it to the quantitative knowledge base; wherein the quantitative judgment condition reflects the quantitative information in the data set that has a degree of recognition;
  • Receive the data to be evaluated, which records the user's quantitative information and qualitative information and is output by the man-machine interface; extract the qualitative judgment conditions and quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively; according to the qualitative judgment condition and the quantitative judgment condition, calculate the correlation between the data to be evaluated and each data set to obtain a related evaluation value; and send the product information of the data set with the highest related evaluation value to the man-machine interface.
  • This application also provides an artificial intelligence-based data correlation analysis device, including:
  • a data processing module, used to obtain historical business data and extract the product information therein, classify the historical business data according to the product information, obtain at least one data set composed of historical business data with the same product information, and send it to a comprehensive database; the product information is the name of the product consumed by the user, as reflected in the historical business data;
  • a qualitative analysis module, used to extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimension of the data set, formulate the qualitative judgment condition of the data set according to the qualitative information under each qualitative analysis dimension, and send it to the qualitative knowledge base; wherein the qualitative judgment condition reflects the qualitative information in the data set that has a degree of recognition;
  • a quantitative analysis module, used to extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimension of the data set, formulate the quantitative judgment condition of the data set according to each quantitative analysis dimension and its maximum density range, and send it to the quantitative knowledge base; wherein the quantitative judgment condition reflects the quantitative information in the data set that has a degree of recognition;
  • an inference engine module, used to receive the data to be evaluated, which records the user's quantitative information and qualitative information and is output by the man-machine interface, extract the qualitative judgment conditions and quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculate the correlation between the data to be evaluated and each data set according to the qualitative judgment condition and the quantitative judgment condition to obtain a related evaluation value, and send the product information of the data set with the highest related evaluation value to the man-machine interface;
  • a man-machine interface, used to output the data to be evaluated and receive the product information.
  • This application also provides a computer system comprising a plurality of computer devices, each of which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processors of the plurality of computer devices execute the computer program, they jointly implement the steps of the above data correlation analysis method.
  • This application also provides a computer-readable storage medium comprising a plurality of storage media, each of which stores a computer program; when the computer programs stored in the plurality of storage media are executed by a processor, they jointly implement the steps of the above data correlation analysis method.
  • The data correlation analysis method, device, computer system and readable storage medium classify historical business data and obtain data sets through the comprehensive database. Each data set contains the rigid requirements of its product information as well as all factors outside those rigid requirements.
  • The qualitative knowledge base computes on the data set to obtain the qualitative dimensions that have a degree of recognition and sets them as qualitative analysis dimensions; according to each qualitative analysis dimension, the most recognizable judgment value range and the judgment method under that dimension are obtained, so that all requirements of the product information in the qualitative dimensions are identified. The quantitative knowledge base computes on the data set to obtain the quantitative dimensions that have a degree of recognition and sets them as quantitative analysis dimensions; according to each quantitative analysis dimension, the corresponding judgment value range and judgment method are obtained.
  • The inference engine evaluates the data to be evaluated output by the man-machine interface against both the quantitative judgment conditions and the qualitative judgment conditions to obtain the related evaluation value between the data to be evaluated and each data set. The degree of matching between the data to be evaluated and the product information is thus judged from both quantitative and qualitative dimensions, which achieves precise matching between the data to be evaluated and the product information and improves the success rate of the product information recommended by the platform. This solves the problem in the prior art that factors outside the funder's rigid requirements cannot be identified, which leads to a low success rate of recommended product information.
  • FIG. 1 is a flowchart of Embodiment 1 of the data correlation analysis method of this application;
  • FIG. 3 is a flowchart of determining the qualitative analysis dimension of the data set in S2 of the first embodiment of the data correlation analysis method of this application;
  • FIG. 4 is a flowchart of formulating the qualitative judgment conditions of the data set in S2 of the first embodiment of the data correlation analysis method of this application;
  • FIG. 5 is a flowchart of determining the quantitative analysis dimension of the data set in S3 of the first embodiment of the data correlation analysis method of this application;
  • FIG. 6 is a flowchart of formulating the quantitative judgment conditions of the data set in S3 of the first embodiment of the data correlation analysis method of this application;
  • FIG. 7 is a flow chart of obtaining the relevant evaluation value describing the matching degree between the data to be evaluated and each data set in S4 of the first embodiment of the data correlation analysis method of this application;
  • FIG. 8 is a schematic diagram of program modules of Embodiment 2 of the data correlation analysis device of this application.
  • FIG. 9 is a schematic diagram of the hardware structure of the computer equipment in the third embodiment of the computer system of this application.
  • Reference signs: 1. Data correlation analysis device; 2. Computer equipment; 11. Data processing module
  • The data correlation analysis method, device, computer system and readable storage medium provided in this application are applicable to the computer field and provide a data correlation analysis method based on a comprehensive database, a qualitative knowledge base, a quantitative knowledge base, an inference engine, and a man-machine interface.
  • This application obtains historical business data and extracts the product information therein, classifies the historical business data according to the product information to obtain at least one data set composed of historical business data with the same product information, and sends the data set to the qualitative knowledge base and the quantitative knowledge base. It calculates the information entropy of the data set to determine the qualitative analysis dimension of the data set, formulates the qualitative judgment condition of the data set according to the qualitative information under each qualitative analysis dimension, and sends the qualitative judgment condition to the inference engine. It calculates the maximum density range of the data set to determine the quantitative analysis dimension of the data set, formulates the quantitative judgment condition of the data set according to each quantitative analysis dimension and its maximum density range, and sends the quantitative judgment condition to the inference engine. It then receives the data to be evaluated output by the man-machine interface, evaluates that data against the qualitative and quantitative judgment conditions of each data set to obtain a related evaluation value describing the degree of matching between the data to be evaluated and each data set, and sends the product information of the data set with the highest related evaluation value to the man-machine interface.
  • The artificial intelligence-based data correlation analysis method of this embodiment includes:
  • S1 Obtain historical business data and extract the product information therein, classify the historical business data according to the product information, obtain at least one data set composed of historical business data with the same product information, and send it to a comprehensive database; the product information is the name of the product consumed by the user, as reflected in the historical business data;
  • S2 Extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimension of the data set, formulate the qualitative judgment condition of the data set according to the qualitative information under each qualitative analysis dimension, and send it to the qualitative knowledge base; wherein the qualitative judgment condition reflects the qualitative information in the data set that has a degree of recognition;
  • S3 Extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimension of the data set, formulate the quantitative judgment condition of the data set according to each quantitative analysis dimension and its maximum density range, and send it to the quantitative knowledge base; wherein the quantitative judgment condition reflects the quantitative information in the data set that has a degree of recognition;
  • S4 Receive the data to be evaluated, which records the user's quantitative information and qualitative information and is output by the man-machine interface; extract the qualitative judgment conditions and quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively; calculate the correlation between the data to be evaluated and each data set according to the qualitative judgment condition and the quantitative judgment condition to obtain a related evaluation value; and send the product information of the data set with the highest related evaluation value to the man-machine interface.
  • Historical business data is obtained from a database that stores historical business data.
  • The dimensional features of the historical business data include qualitative dimensions, quantitative dimensions, and product information. The information under a qualitative dimension is qualitative information, and the information under a quantitative dimension is quantitative information. A qualitative dimension is a dimensional feature that describes the user in the form of text, such as last name, gender, or occupation; a quantitative dimension is a dimensional feature that describes the user in the form of numbers, such as age or working experience.
  • The product information is the dimensional feature that reflects the product purchased by the user in the past, and includes at least the product name. A data set is the collection of historical business data corresponding to the same product information. For example, if the product information includes product A and product B, two data sets are obtained: one covers all historical business data in which product A was purchased, and the other covers all historical business data in which product B was purchased.
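The grouping described in S1 can be illustrated with a minimal Python sketch. The record layout, the field name `product`, and the helper `build_data_sets` are assumptions made for illustration; the source does not specify how historical business data is stored.

```python
from collections import defaultdict

def build_data_sets(records, product_key="product"):
    """Group historical business records into one data set per product name."""
    data_sets = defaultdict(list)
    for record in records:
        data_sets[record[product_key]].append(record)
    return dict(data_sets)

# Hypothetical historical business data: one dict per past purchase.
history = [
    {"product": "A", "gender": "male", "age": 31},
    {"product": "B", "gender": "female", "age": 45},
    {"product": "A", "gender": "male", "age": 29},
]

data_sets = build_data_sets(history)
# data_sets["A"] holds the two product A records, data_sets["B"] the single product B record.
```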
  • The qualitative information with the highest occurrence probability best reflects the degree of recognition of the data set.
  • The maximum density range of the historical quantitative information under each quantitative dimension of the data set is calculated with a mean shift model; the quantitative dimension is set as a quantitative analysis dimension of the data set, and the quantitative judgment condition is obtained from the quantitative analysis dimension and its maximum density range. The mean shift model is a non-parametric method based on density gradient ascent that finds a target position through iterative calculation and is used for target tracking. In this application, the maximum density range is taken as the target position: an iterative algorithm finds the region where the values under each quantitative dimension are densest, and this region is set as the maximum density range.
  • The qualitative evaluation value and the quantitative evaluation value are combined by weighted calculation to obtain the related evaluation value of the business data to be evaluated with respect to the data set. The related evaluation values of the business data to be evaluated with respect to each data set are compared, and the product information corresponding to the data set with the highest related evaluation value is set as the recommended product and output to the man-machine interface.
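As a rough illustration of the weighted calculation and the final selection, the sketch below combines a qualitative and a quantitative evaluation value per data set and picks the product with the highest result. The equal weights, the example scores, and the function names are assumptions; the source does not state the weights or the scale of the evaluation values.

```python
def related_evaluation(qualitative_value, quantitative_value,
                       w_qualitative=0.5, w_quantitative=0.5):
    """Weighted combination of the two evaluation values (weights are assumed)."""
    return w_qualitative * qualitative_value + w_quantitative * quantitative_value

def recommend(related_values):
    """Return the product whose data set has the highest related evaluation value."""
    return max(related_values, key=related_values.get)

# Hypothetical evaluation values of one application against two data sets.
related_values = {
    "A": related_evaluation(1.0, 0.5),
    "B": related_evaluation(0.0, 1.0),
}
recommended_product = recommend(related_values)
```

With these assumed inputs, product A scores 0.75 and product B scores 0.5, so product A would be sent to the man-machine interface.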
  • The step of obtaining historical business data and extracting product information in S1 includes:
  • S101 Set the training quantity through the configuration module, and obtain from the historical database a number of historical business data records equal to the training quantity.
  • The historical database is a database used to store historical business data. Setting the training quantity lets data managers control how much historical business data is used for training, which helps ensure the accuracy of the trained qualitative and quantitative judgment conditions; the training quantity can be set as required.
  • DMCTextFilter can be used as the configuration module.
  • DMCTextFilter is a general-purpose library for plain-text extraction. It can remove special control information from data in various document formats, or from inserted OLE objects, and quickly extract the plain-text data, which makes it convenient for users to manage, edit, retrieve, and browse the information in multiple document data resources in a unified way.
  • S102 Obtain the dimension value types in the historical business data through the dimension module. Set the dimension ID and dimension code whose dimension value type is character as a qualitative dimension, and set the information corresponding to the qualitative dimension as qualitative information; set the dimension ID and dimension code whose dimension value type is code value, date, or numeric value as a quantitative dimension, and set the information corresponding to the quantitative dimension as quantitative information. The dimension ID is the numeric identifier that marks a dimensional feature in the historical business data.
  • the historical business data is as follows:
  • The re module is used as the dimension module; re is a module built into Python that directly implements regular-expression matching.
  • The product information of the aforementioned historical business data is extracted as product A, and the historical business data is classified according to the product information; for example, the historical business data whose product information is product A is classified into one data set.
  • The re module is also used as the product module.
  • The step of calculating the information entropy of the data set in S2 to determine the qualitative analysis dimension of the data set includes:
  • S201 Summarize the qualitative information under each qualitative dimension in the historical business data of the data set through the qualitative summary module to obtain a qualitative set.
  • The historical qualitative information under the qualitative dimension is extracted and summarized to obtain the qualitative set; for example, if the qualitative dimension is "gender", the qualitative set is {male, male, male, male, female}.
  • The re module can be used as the qualitative summary module. The re module is built into Python and directly implements regular-expression matching.
  • S202 Use the probability module to calculate, through a preset information gain model, the occurrence probability of each type of qualitative information in the qualitative set, so as to obtain the information entropy of the qualitative dimension corresponding to the qualitative set.
  • The number of pieces of historical qualitative information in the qualitative set is obtained and set as the qualitative total; the qualitative set is deduplicated to obtain a qualitative category set; the number of occurrences of each qualitative category in the qualitative set (its qualitative single quantity) is obtained in turn; and the occurrence probability of each qualitative category is calculated from its qualitative single quantity and the qualitative total. In the above example, the qualitative total is 5 and the qualitative category set is {male, female}; the qualitative single quantity of "male" is 4 and that of "female" is 1, so the occurrence probability of "male" is 80% and that of "female" is 20%.
  • In the information gain formula, E is the information entropy and p_i is the occurrence probability of the i-th qualitative category.
  • The math module of Python can be used to construct the information gain formula of the probability module. The math module defines mathematical functions; because it ships with the Python standard library, it can be imported without additional installation.
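The formula itself does not survive in this text. The sketch below assumes it is the standard Shannon entropy, E = -Σ p_i log p_i, and uses base-2 logarithms; both the formula and the base are assumptions, as is the helper name `information_entropy`. It reproduces the "gender" example with four "male" values and one "female" value.

```python
import math
from collections import Counter

def information_entropy(qualitative_set, base=2):
    """Return (occurrence probabilities, Shannon entropy) of a qualitative set.

    Assumes E = -sum(p_i * log(p_i)); the log base is an assumption.
    """
    total = len(qualitative_set)                       # qualitative total
    counts = Counter(qualitative_set)                  # qualitative single quantities
    probabilities = {category: n / total for category, n in counts.items()}
    entropy = -sum(p * math.log(p, base) for p in probabilities.values())
    return probabilities, entropy

probabilities, entropy = information_entropy(["male"] * 4 + ["female"])
# probabilities: male 0.8, female 0.2; entropy is about 0.722 bits.
```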
  • S203 Use the qualitative judgment module to set each qualitative dimension whose information entropy is less than the preset information threshold as a qualitative analysis dimension of the data set.
  • The information entropy is filtered against the preset information threshold to eliminate qualitative dimensions with large information entropy. Information entropy is a quantitative measure of the information content of a system. The larger the information entropy, the more disordered the information content, and the lower the reliability of identifying the system through the dimension corresponding to that information entropy; conversely, the smaller the information entropy, the less disordered the information content.
  • For example, if the information entropy of gender in a class is relatively large, the gender distribution is very disordered, so the reliability of identifying this class by gender is relatively low; conversely, if there are 19 boys and 1 girl in a class, the information entropy is relatively small, which means the gender distribution of the class is very regular, so the reliability of identifying this class by gender is relatively high.
  • A computer module written in code with an "IF" function can be used as the qualitative judgment module to set each qualitative dimension whose information entropy is less than the information threshold as a qualitative analysis dimension of the data set.
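The threshold filter of S203 amounts to a simple conditional selection. In the sketch below, the entropy values, the threshold of 1.0, and the function name are hypothetical and chosen only to show the comparison.

```python
def select_qualitative_dimensions(entropy_by_dimension, info_threshold):
    """Keep the qualitative dimensions whose entropy is below the threshold."""
    return [dimension
            for dimension, entropy in entropy_by_dimension.items()
            if entropy < info_threshold]

# Hypothetical entropies for two qualitative dimensions of one data set.
entropy_by_dimension = {"gender": 0.72, "occupation": 2.10}
qualitative_analysis_dimensions = select_qualitative_dimensions(
    entropy_by_dimension, info_threshold=1.0)
```

Here only "gender" is kept, because its entropy is below the assumed threshold.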
  • The step of formulating the qualitative judgment condition of the data set according to the qualitative information under each qualitative analysis dimension in S2 includes:
  • S211 Use the range module to set the qualitative category with the highest occurrence probability under the qualitative analysis dimension of the data set as the judgment value range.
  • A computer module written in code with a conditional counting function ("COUNTIF") can be used as the range module to determine the qualitative category with the highest occurrence probability under the qualitative analysis dimension as the judgment value range.
  • The qualitative condition module obtains the judgment method corresponding to the qualitative analysis dimension from the preset qualitative mapping table, and combines the judgment value range and the judgment method to generate the qualitative judgment condition of the data set.
  • The preset qualitative mapping table holds the mapping relationship between qualitative analysis dimensions and judgment methods. In this embodiment, the mapping relationship maps the dimension value type of the qualitative dimension to a judgment method; for example, the judgment method for the dimension value type "code value" is "belongs to", and the judgment method for the dimension value type "character" is "contains".
  • The qualitative judgment condition therefore consists of the judgment value range, which is the qualitative category with the highest occurrence probability under the qualitative analysis dimension, and the judgment method, which specifies how the relationship between the qualitative information of the data to be evaluated and the judgment value range is judged.
  • The map() mapping function can be used as the qualitative condition module to obtain the judgment method corresponding to the qualitative analysis dimension from the qualitative mapping table and to combine the judgment value range and the judgment method into the qualitative judgment condition of the data set.
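A minimal sketch of S211 and the condition-building step is shown below. The dictionary representation of a judgment condition, the table `QUALITATIVE_MAPPING`, and the function name are assumptions for illustration; only the two judgment methods "belongs to" and "contains" come from the text.

```python
from collections import Counter

# Assumed form of the qualitative mapping table: dimension value type -> judgment method.
QUALITATIVE_MAPPING = {"code value": "belongs to", "character": "contains"}

def make_qualitative_condition(dimension, value_type, qualitative_set):
    """Build a qualitative judgment condition from the most frequent category."""
    judgment_value_range, _count = Counter(qualitative_set).most_common(1)[0]
    return {
        "dimension": dimension,
        "method": QUALITATIVE_MAPPING[value_type],
        "value_range": judgment_value_range,
    }

condition = make_qualitative_condition(
    "gender", "code value", ["male"] * 4 + ["female"])
```

For the "gender" example this yields the judgment method "belongs to" with the judgment value range "male".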
  • The step of calculating the maximum density range of the data set in S3 to determine the quantitative analysis dimension of the data set includes:
  • S301 Use the drift module to calculate the maximum density range of the quantitative information under each quantitative dimension of the data set through a preset mean shift model.
  • In the mean shift model, S is the high-dimensional sphere region, k is the number of points in the region, X is the center point of the region, X_i is a piece of quantitative information falling in the region, and M is the average offset between the center point of the region and the historical quantitative information falling in the region. The region is moved continuously until M is minimal; the center point of the region is then extracted, the radius is subtracted from the center point to obtain the quantitative lower limit, and the radius is added to the center point to obtain the quantitative upper limit; the maximum density range is obtained from the quantitative upper limit and the quantitative lower limit.
  • The math module of Python can be used to construct a drift module with a mean shift model.
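The mean shift formula is missing from this text. The sketch below assumes the usual form M = (1/k) Σ (X_i - X) over the points X_i inside the region S, and works on a single quantitative dimension, so the "sphere" is an interval. Trying every data point as a starting center and keeping the densest window is an added assumption to avoid stopping at a sparse local mode; the radius and the sample ages are hypothetical.

```python
def shift_to_mode(values, radius, start, tol=1e-9, max_iter=100):
    """Move a window of the given radius until its mean offset M is about zero."""
    center = float(start)
    for _ in range(max_iter):
        inside = [v for v in values if abs(v - center) <= radius]
        if not inside:
            break
        # M: average offset between the center X and the points X_i in region S.
        m = sum(v - center for v in inside) / len(inside)
        center += m
        if abs(m) < tol:
            break
    return center

def max_density_range(values, radius):
    """Return (lower limit, upper limit) of the densest window found."""
    best_center, best_count = None, -1
    for start in values:  # assumption: try each data point as an initial center
        center = shift_to_mode(values, radius, start)
        count = sum(1 for v in values if abs(v - center) <= radius)
        if count > best_count:
            best_center, best_count = center, count
    return best_center - radius, best_center + radius

ages = [22, 30, 31, 32, 33, 34, 58]  # hypothetical quantitative information
low, high = max_density_range(ages, radius=3)
```

For these ages the window settles on a center of 32, giving a maximum density range of 29 to 35.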
  • S302 Extract the quantity of quantitative information within the maximum density range through the quantitative judgment module; if the quantity is greater than a preset quantitative threshold, set the quantitative dimension corresponding to the maximum density range as a quantitative analysis dimension of the data set.
  • The quantitative threshold is set according to the user's needs. The quantity of quantitative information in the high-dimensional sphere region corresponding to the maximum density range is extracted and compared with the quantitative threshold; if the quantity is greater than the quantitative threshold, the quantitative dimension corresponding to the maximum density range is set as a quantitative analysis dimension of the data set.
  • A computer module written in code with an "IF" function can be used as the quantitative judgment module, so that if the quantity is greater than the preset quantitative threshold, the quantitative dimension corresponding to the maximum density range is set as a quantitative analysis dimension of the data set.
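The check in S302 can be sketched as a count followed by a comparison. The sample ages, the range of 29 to 35, the threshold of 4, and the function name are hypothetical values used only to show the logic.

```python
def is_quantitative_analysis_dimension(values, density_range, quantity_threshold):
    """True if more values fall in the maximum density range than the threshold."""
    low, high = density_range
    quantity = sum(1 for v in values if low <= v <= high)
    return quantity > quantity_threshold

ages = [22, 30, 31, 32, 33, 34, 58]  # hypothetical quantitative information
keep_age = is_quantitative_analysis_dimension(ages, (29, 35), quantity_threshold=4)
```

Five of the seven ages fall inside the range, which exceeds the assumed threshold of 4, so "age" would be kept as a quantitative analysis dimension.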
  • The step of formulating the quantitative judgment condition of the data set according to each quantitative analysis dimension and its maximum density range in S3 includes:
  • S311 Use the method value range module to obtain the judgment method of the quantitative analysis dimension from the preset quantitative mapping table, and use the maximum density range as the judgment value range.
  • The preset quantitative mapping table holds the mapping relationship between quantitative analysis dimensions and judgment methods. In this embodiment, the mapping relationship maps the dimension value type of the quantitative dimension to a judgment method; for example, if the dimension value type is numeric or date, the judgment method is "range".
  • The map() mapping function can be used as the method value range module to obtain the judgment method of the quantitative analysis dimension from the quantitative mapping table and to use the maximum density range as the judgment value range.
  • S312 Combine the judgment value range and the judgment method through the quantitative condition module to generate the quantitative judgment condition of the quantitative analysis dimension.
  • the quantitative judgment condition is formed as follows:
  • classification and summary function SUBTOTAL can be used to make a quantitative condition module to summarize the judgment range and judgment method to generate the quantitative judgment condition of the quantitative analysis dimension.
  • A creation success signal is generated according to the qualitative judgment condition and the quantitative judgment condition and is output to the man-machine interface.
  • In an exemplary embodiment, receiving the data to be evaluated output by the man-machine interface in S4 includes:
  • In a preferred embodiment, referring to FIG. 7, the step in S4 of calculating the relevance between the data to be evaluated and each data set according to the qualitative judgment condition and the quantitative judgment condition to obtain the relevance evaluation value includes:
  • S401: Through the qualitative evaluation module, evaluate the qualitative information of the data to be evaluated against the qualitative judgment condition of the data set to obtain the qualitative evaluation value.
  • In this step, if the qualitative information in the data to be evaluated that corresponds to a qualitative judgment condition satisfies that condition's judgment method and judgment value range, the qualitative evaluation value of that qualitative information is set to 1; if it does not, the qualitative evaluation value is set to 0.
  • It should be noted that a computer module written in code with the conditional counting function COUNTIF can be used as the qualitative evaluation module to evaluate the qualitative information of the data to be evaluated against the qualitative judgment condition of the data set and obtain the qualitative evaluation value.
  • For example, the qualitative information in the data to be evaluated that corresponds to the qualitative judgment condition is as follows:
  • Therefore, the qualitative information in the data to be evaluated that corresponds to the qualitative judgment condition, together with its qualitative evaluation value, is as follows:
  • S402: Through the quantitative evaluation module, evaluate the quantitative information of the data to be evaluated against the quantitative judgment condition of the data set to obtain the quantitative evaluation value.
  • In this step, if the quantitative information in the data to be evaluated that corresponds to a quantitative judgment condition satisfies that condition's judgment method and judgment value range, the quantitative evaluation value of that quantitative information is set to 1; if it does not, the quantitative evaluation value is set to 0.
  • It should be noted that a computer module written in code with the conditional counting function COUNTIF can be used as the quantitative evaluation module to evaluate the quantitative information of the data to be evaluated against the quantitative judgment condition of the data set and obtain the quantitative evaluation value.
  • For example, the quantitative information in the data to be evaluated that corresponds to the quantitative judgment condition is as follows:
  • S403: Perform a weighted calculation on the quantitative evaluation value and the qualitative evaluation value through a calculation module to obtain the relevance evaluation value describing the degree of matching between the data to be evaluated and each data set.
  • It should be noted that the math module of Python may be used to construct the calculation module, which performs the weighted calculation on the quantitative evaluation value and the qualitative evaluation value to obtain the relevance evaluation value.
  • In an exemplary embodiment, the step in S4 of sending the product information of the data set with the highest relevance evaluation value to the man-machine interface includes:
  • Referring to FIG. 8, an artificial-intelligence-based data correlation analysis apparatus 1 of this embodiment includes:
  • the data processing module 11, used to acquire historical business data and extract product information from it, classify the historical business data by product information, obtain at least one data set composed of historical business data sharing the same product information, and send it to the comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
  • the qualitative analysis module 12, used to extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulate the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and send them to the qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
  • the directional analysis module 13, used to extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulate the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and send them to the quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
  • the inference engine module 14, used to receive data to be evaluated that is output by the man-machine interface and records a user's quantitative information and qualitative information, extract the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculate the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and send the product information of the data set with the highest relevance evaluation value to the man-machine interface;
  • the man-machine interface 15, used to output data to be evaluated and receive product information.
  • This application is based on intelligent decision-making technology in the field of artificial intelligence and adopts an expert system built from at least a comprehensive database, a qualitative knowledge base, a quantitative knowledge base, an inference engine, and a man-machine interface. Because an expert system (ExpertSystem) is an artificial intelligence computer program, or a group of such programs, that applies a large body of expert knowledge and reasoning methods to solve complex problems in specific fields, this application builds on the expert system a classification model for similarity matching of the data to be evaluated.
  • To achieve the above purpose, the present application also provides a computer system that includes a plurality of computer devices 2. The components of the data correlation analysis apparatus 1 of the second embodiment can be distributed across different computer devices, and each computer device can be a smartphone, tablet, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers) that executes the program.
  • The computer device in this embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus, as shown in FIG. 9. It should be pointed out that FIG. 9 shows only a computer device with components; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
  • In this embodiment, the memory 21 (that is, a readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
  • In some embodiments, the memory 21 may be an internal storage unit of the computer device, such as the hard disk or main memory of the computer device.
  • In other embodiments, the memory 21 may also be an external storage device of the computer device, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card with which the computer device is equipped.
  • Of course, the memory 21 may also include both an internal storage unit of the computer device and an external storage device thereof.
  • In this embodiment, the memory 21 is generally used to store the operating system and the application software installed on the computer device, such as the program code of the data correlation analysis apparatus of the second embodiment.
  • In addition, the memory 21 can also be used to temporarily store data that has been output or is about to be output.
  • In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 22 is generally used to control the overall operation of the computer device.
  • In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the data correlation analysis apparatus, so as to implement the data correlation analysis method of the first embodiment.
  • To achieve the above purpose, this application also provides a computer-readable storage system that includes multiple storage media. The storage media may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, App store, and the like. Computer programs are stored on them, and the programs realize the corresponding functions when executed by the processor 22.
  • The computer-readable storage medium of this embodiment is used to store the data correlation analysis apparatus, which implements the data correlation analysis method of the first embodiment when executed by the processor 22.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

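A data correlation analysis method, apparatus, computer system, and readable storage medium, based on artificial intelligence, comprising: acquiring historical business data and extracting product information from it, classifying the historical business data by product information, and obtaining at least one data set composed of historical business data sharing the same product information; calculating the information entropy of the data set to determine its qualitative analysis dimensions, and formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension; calculating the maximum density range of the data set to determine its quantitative analysis dimensions, and formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range; and calculating, according to the qualitative and quantitative judgment conditions, the relevance between the data to be evaluated and each data set to obtain a relevance evaluation value. This solves the current problem that customer applications cannot be matched precisely, which leads to a low success rate for recommended product information.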

Description

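Data correlation analysis method, apparatus, computer system, and readable storage medium
This application claims priority to Chinese patent application No. CN 202010253260.5, filed with the China Patent Office on April 2, 2020 and entitled "Data correlation analysis method, apparatus, computer system, and readable storage medium", the entire contents of which are incorporated herein by reference.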
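Technical Field
This application relates to the field of computer technology, involves knowledge representation and reasoning techniques of artificial intelligence, and in particular relates to a data correlation analysis method, apparatus, computer system, and readable storage medium.
Background
Traffic diversion refers to the process in which a platform forwards a customer application to a funding party, that is, converts a transaction product the customer applied for on the platform into the funding party's product information. As Internet finance develops, a platform connects to product information from many providers, each with different customer requirements. Some product information is restricted to certain business operating regions and therefore imposes requirements on the customer's region; some limits the customer's loan amount. How to correctly select a provider's product information based on business data is a problem the platform must solve.
To address this, current platforms use a tree-diagram style of management: product information with regional requirements is placed in one class and product information without them in another; on that basis, product information with loan amount limits is placed in one class and product information without them in another, and so on. However, the inventor realized that this method of roughly dividing customer applications according to the funding party's requirements can only divide applications along a fairly narrow set of dimensions to satisfy the funding party's hard requirements. It cannot identify factors beyond those hard requirements (for example, loan preference factors the funding party has specified from analysis of its historical data, or risk-control dimensions), so customer applications cannot be matched precisely and the success rate of the product information recommended by the platform is low.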
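Summary
The purpose of this application is to provide a data correlation analysis method, apparatus, computer system, and readable storage medium that solve the prior-art problem that factors beyond the funding party's hard requirements cannot be identified, so that customer applications cannot be matched precisely and the success rate of the product information recommended by the platform is low.
To achieve the above purpose, this application provides an artificial-intelligence-based data correlation analysis method, comprising:
acquiring historical business data and extracting product information from it, classifying the historical business data by product information, obtaining at least one data set composed of historical business data sharing the same product information, and sending it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
extracting a data set from the comprehensive database, calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sending them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
extracting a data set from the comprehensive database, calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sending them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
receiving data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extracting the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and sending the product information of the data set with the highest relevance evaluation value to the man-machine interface.
To achieve the above purpose, this application also provides an artificial-intelligence-based data correlation analysis apparatus, comprising:
a data processing module, used to acquire historical business data and extract product information from it, classify the historical business data by product information, obtain at least one data set composed of historical business data sharing the same product information, and send it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
a qualitative analysis module, used to extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulate the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and send them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
a directional analysis module, used to extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulate the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and send them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
an inference engine module, used to receive data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extract the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculate the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and send the product information of the data set with the highest relevance evaluation value to the man-machine interface;
a man-machine interface, used to output data to be evaluated and receive product information.
To achieve the above purpose, this application also provides a computer system that includes a plurality of computer devices, each computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processors of the plurality of computer devices jointly implement the steps of the above data correlation analysis method when executing the computer program.
To achieve the above purpose, this application also provides a computer-readable storage medium that includes a plurality of storage media, each storing a computer program; the computer programs stored on the plurality of storage media jointly implement the steps of the above data correlation analysis method when executed by a processor.
With the data correlation analysis method, apparatus, computer system, and readable storage medium provided by this application, the comprehensive database is used to classify the historical business data into data sets, each of which embodies the hard requirements of a piece of product information as well as all factors beyond those hard requirements. The qualitative knowledge base computes over the data sets to find the discriminative qualitative dimensions and sets them as qualitative analysis dimensions, and from these obtains the most discriminative judgment value range and judgment method under each qualitative analysis dimension, thereby identifying all requirements of the product information along qualitative dimensions. The quantitative knowledge base computes over the data sets to find the discriminative quantitative dimensions and sets them as quantitative analysis dimensions, and from these obtains the most discriminative judgment value range and judgment method under each quantitative analysis dimension, thereby identifying all requirements of the product information along quantitative dimensions. The inference engine evaluates the data to be evaluated output by the man-machine interface against both the quantitative and the qualitative judgment conditions to obtain the relevance evaluation value between the data to be evaluated and each data set, so that the degree of matching between the data to be evaluated and each piece of product information is judged along both quantitative and qualitative dimensions. This achieves precise matching between the data to be evaluated and the product information and raises the success rate of the product information recommended by the platform, thereby solving the prior-art problem that factors beyond the funding party's hard requirements cannot be identified, customer applications cannot be matched precisely, and the success rate of recommended product information is low.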
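Brief Description of the Drawings
FIG. 1 is a flowchart of Embodiment 1 of the data correlation analysis method of this application;
FIG. 2 is a flowchart of acquiring historical business data and extracting product information from it in S1 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 3 is a flowchart of determining the qualitative analysis dimensions of the data set in S2 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 4 is a flowchart of formulating the qualitative judgment conditions of the data set in S2 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 5 is a flowchart of determining the quantitative analysis dimensions of the data set in S3 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 6 is a flowchart of formulating the quantitative judgment conditions of the data set in S3 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 7 is a flowchart of obtaining the relevance evaluation value describing the degree of matching between the data to be evaluated and each data set in S4 of Embodiment 1 of the data correlation analysis method of this application;
FIG. 8 is a schematic diagram of the program modules of Embodiment 2 of the data correlation analysis apparatus of this application;
FIG. 9 is a schematic diagram of the hardware structure of a computer device in Embodiment 3 of the computer system of this application.
Reference signs:
1. Data correlation analysis apparatus  2. Computer device               11. Data processing module
12. Qualitative analysis module         13. Directional analysis module  14. Inference engine module
15. Man-machine interface               21. Memory                       22. Processor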
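Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
The data correlation analysis method, apparatus, computer system, and readable storage medium provided by this application are applicable to the computer field and provide a data correlation analysis method based on a comprehensive database, a qualitative knowledge base, a quantitative knowledge base, an inference engine, and a man-machine interface. This application acquires historical business data and extracts product information from it, classifies the historical business data by product information, obtains at least one data set composed of historical business data sharing the same product information, and sends the data set to the qualitative knowledge base and the quantitative knowledge base; calculates the information entropy of the data set to determine its qualitative analysis dimensions, formulates the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sends the qualitative judgment conditions to the inference engine; calculates the maximum density range of the data set to determine its quantitative analysis dimensions, formulates the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sends the quantitative judgment conditions to the inference engine; and receives the data to be evaluated output by the man-machine interface, evaluates it against the qualitative and quantitative judgment conditions of each data set to obtain the relevance evaluation value describing the degree of matching between the data to be evaluated and each data set, and sends the product information of the data set with the highest relevance evaluation value to the man-machine interface.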
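Embodiment 1:
Referring to FIG. 1, an artificial-intelligence-based data correlation analysis method of this embodiment includes:
S1: acquiring historical business data and extracting product information from it, classifying the historical business data by product information, obtaining at least one data set composed of historical business data sharing the same product information, and sending it to the comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
S2: extracting a data set from the comprehensive database, calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sending them to the qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
S3: extracting a data set from the comprehensive database, calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sending them to the quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
S4: receiving data to be evaluated that is output by the man-machine interface and records a user's quantitative information and qualitative information, extracting the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and sending the product information of the data set with the highest relevance evaluation value to the man-machine interface.
In an exemplary embodiment, historical business data is acquired from a database that stores it. The dimension features of the historical business data include qualitative dimensions, quantitative dimensions, and product information; information under a qualitative dimension is qualitative information, and information under a quantitative dimension is quantitative information. A qualitative dimension is a dimension feature that describes a user characteristic in words, such as surname, sex, or occupation; a quantitative dimension is a dimension feature that describes a user characteristic in numbers, such as age or years of service; the product information is a dimension feature reflecting the products the user has purchased in the past and includes at least the product name. A data set is the collection of historical business data that corresponds to the same product information. For example, if the product information includes product A and product B, two data sets are obtained: one covers all historical business data in which product A was purchased, and the other covers all historical business data in which product B was purchased.
The information entropy of the historical business data in the data set is calculated with an information gain model; according to the information entropy, a qualitative dimension of the data set is determined to be a qualitative analysis dimension; the qualitative information under that qualitative analysis dimension is obtained from each item of historical business data, and the qualitative judgment condition is formulated from the qualitative information with the highest probability of occurrence and its judgment method. Information entropy is a quantitative measure of the information content of a system. The larger the information entropy, the more disordered the content, and the less reliably the system can be identified through the dimension that the entropy belongs to; conversely, the smaller the information entropy, the less disordered the content, and the more reliably the system can be identified through that dimension. Therefore, the smaller the information entropy of a qualitative dimension, the higher its discriminative power, and the qualitative information with the highest probability of occurrence under that dimension best reflects what distinguishes the data set.
The maximum density range of the historical quantitative information under each quantitative dimension of the data set is calculated with a mean shift model; the quantitative dimension is set as a quantitative analysis dimension of the data set, and the quantitative judgment condition is obtained from the quantitative analysis dimension and its maximum density range. The mean shift model is a non-parametric method based on density gradient ascent that finds a target position through iteration and is used for target tracking. In this application, the maximum density range is treated as the target position: the iterative algorithm finds the region where the values under each quantitative dimension are densest, and that region is set as the maximum density range.
The data to be evaluated output by the man-machine interface is received; it is judged against the qualitative judgment conditions to obtain qualitative evaluation values and against the quantitative judgment conditions to obtain quantitative evaluation values; the qualitative and quantitative evaluation values are combined by weighted calculation to obtain the relevance evaluation value of the data to be evaluated for the data set. The relevance evaluation values of the data to be evaluated for all data sets are compared, and the product information corresponding to the data set with the highest relevance evaluation value is set as the recommended product and output to the man-machine interface.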
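In a preferred embodiment, referring to FIG. 2, the step in S1 of acquiring historical business data and extracting product information from it includes:
S101: setting a training quantity through a configuration module, and acquiring from the historical database a number of items of historical business data equal to the training quantity.
The historical database is a database used to store historical business data. Setting a training quantity lets data administrators guarantee the amount of historical business data used for training, which in turn safeguards the accuracy of the trained qualitative and quantitative judgment conditions; the training quantity can be set as needed.
It should be noted that DMCTextFilter can be used as the configuration module. DMCTextFilter is a general-purpose library for plain-text extraction that can completely strip special control information from data in a wide variety of document formats, or from embedded OLE objects, and quickly extract the plain-text information. This makes it convenient for users to manage, edit, search, and browse information from many kinds of document data sources in a unified way.
S102: obtaining the dimension value types in the historical business data through a dimension module; setting the dimension ID and dimension code whose dimension value type is character as a qualitative dimension, and setting the information corresponding to the qualitative dimension as qualitative information; setting the dimension ID and dimension code whose dimension value type is code value, date, or numeric as a quantitative dimension, and setting the information corresponding to the quantitative dimension as quantitative information; wherein the dimension ID is a numeric identifier that labels a dimension feature in the historical business data.
For example, the historical business data is as follows:
Figure PCTCN2020103829-appb-000001
Figure PCTCN2020103829-appb-000002
It should be noted that the re module is used as the dimension module; re is a module in the Python standard library that implements regular-expression matching directly.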
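The classification rule of S102 can be sketched as follows. This is only an illustrative sketch under stated assumptions: the record layout (a list of dictionaries with the hypothetical keys `dim_id`, `dim_code`, `value_type`, and `info`) and the value-type labels are not taken from the source, and plain dictionary lookups are used here instead of the re module named in the text.

```python
# Assumed value-type labels; S102 treats "character" as qualitative and
# "code value", "date", and "numeric" as quantitative.
QUALITATIVE_TYPES = {"char"}
QUANTITATIVE_TYPES = {"code", "date", "numeric"}


def split_dimensions(record):
    """Split one historical business record into qualitative and quantitative parts."""
    qualitative, quantitative = {}, {}
    for dim in record:
        key = (dim["dim_id"], dim["dim_code"])  # dimension ID plus dimension code
        if dim["value_type"] in QUALITATIVE_TYPES:
            qualitative[key] = dim["info"]
        elif dim["value_type"] in QUANTITATIVE_TYPES:
            quantitative[key] = dim["info"]
    return qualitative, quantitative


# Hypothetical record with one numeric and one character dimension.
record = [
    {"dim_id": 125, "dim_code": "app_amt", "value_type": "numeric", "info": 500},
    {"dim_id": 127, "dim_code": "job", "value_type": "char", "info": "lawyer"},
]
qual, quant = split_dimensions(record)
```

Keying each group by the (dimension ID, dimension code) pair mirrors the wording of S102, which assigns both identifiers to the qualitative or quantitative dimension.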
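S103: extracting the product information of the historical business data through a product module.
Based on the above example, the product information extracted from the historical business data is product A, so that the historical business data can be classified by product information; for instance, the historical business data whose product information is product A is assigned to one data set.
It should be noted that the re module is used as the product module; re is a module in the Python standard library that implements regular-expression matching directly.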
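Grouping the historical business data into one data set per product can be sketched as below. The record shape and the `product` key are assumptions made for illustration; the source does not specify how records are stored.

```python
from collections import defaultdict


def group_by_product(records):
    """Build one data set (a list of records) per product name."""
    data_sets = defaultdict(list)
    for rec in records:
        data_sets[rec["product"]].append(rec)
    return dict(data_sets)


# Hypothetical historical business data for two products.
records = [
    {"product": "A", "app_amt": 500},
    {"product": "B", "app_amt": 900},
    {"product": "A", "app_amt": 700},
]
data_sets = group_by_product(records)
```

Each value of `data_sets` then plays the role of a data set that would be sent to the comprehensive database.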
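In a preferred embodiment, referring to FIG. 3, the step in S2 of calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set includes:
S201: collecting, through a qualitative aggregation module, the qualitative information under each qualitative dimension in the historical business data of the data set to obtain qualitative sets.
Exemplarily, for a qualitative dimension in the data set, the historical qualitative information under that dimension is extracted and collected to obtain a qualitative set; for example, if the qualitative dimension is "sex", the qualitative set is {male, male, male, male, female}.
It should be noted that the re module can be used as the qualitative aggregation module; re is a module in the Python standard library that implements regular-expression matching directly.
S202: using a probability module to calculate, through a preset information gain model, the probability of occurrence of each type of qualitative information in the qualitative set, so as to obtain the information entropy of the qualitative dimension corresponding to the qualitative set.
Exemplarily, the number of items of historical qualitative information in the qualitative set is obtained and set as the qualitative total; the qualitative set is deduplicated to obtain the set of qualitative types; the number of occurrences of each qualitative type in the qualitative set is obtained in turn and set as the qualitative count; and the probability of occurrence of each qualitative type is calculated from its qualitative count. Based on the above example, the qualitative total is 5 and the set of qualitative types is {male, female}; the qualitative count of "male" is 4 and that of "female" is 1, so the probability of occurrence is 80% for male and 20% for female.
The probability of occurrence of each qualitative type is entered into the information gain model, which contains the information gain formula, to calculate the information entropy of the qualitative dimension in the data set;
The information gain formula is: $E(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$
where E is the information entropy, $p_i$ is the probability of occurrence of the i-th qualitative type, and n is the number of qualitative types.
It should be noted that the math module of Python can be used to build the information gain formula of the probability module. The math module defines mathematical functions, and because it ships with the Python standard library, it can be imported without any additional installation to build the formula of the probability module.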
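The entropy calculation of S202 can be sketched with the standard-library math module the text mentions. The function name `information_entropy` is a hypothetical choice; the calculation follows the formula E(X) = -Σ p_i log2(p_i) and the male/female example given in the text.

```python
import math
from collections import Counter


def information_entropy(qualitative_set):
    """Shannon entropy (base 2) of the qualitative types in one qualitative set."""
    total = len(qualitative_set)        # the "qualitative total"
    counts = Counter(qualitative_set)   # the "qualitative count" of each type
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Example from the text: four "male" and one "female", i.e. p = 0.8 and p = 0.2.
entropy_sex = information_entropy(["male", "male", "male", "male", "female"])
```

For the example set the result is roughly 0.72 bits, while an evenly split set of two types gives exactly 1 bit, which matches the text's point that an even distribution is less discriminative.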
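S203: using a qualitative judgment module to set the qualitative dimensions whose information entropy is less than a preset information threshold as the qualitative analysis dimensions of the data set.
In this step, the information entropy is filtered with the preset information threshold to eliminate the qualitative dimensions with larger information entropy. Information entropy is a quantitative measure of the information content of a system. The larger the information entropy, the more disordered the content, and the less reliably the system can be identified through the dimension that the entropy belongs to; conversely, the smaller the information entropy, the less disordered the content, and the more reliably the system can be identified through that dimension. For example, if a class has 10 male and 10 female students, the information entropy is relatively large; the sex distribution of the class is highly disordered, so identifying the class by sex is unreliable. Conversely, if a class has 19 male students and 1 female student, the information entropy is relatively small; the sex distribution of the class is highly regular, so identifying the class by sex is reliable.
With this method, the discriminative qualitative dimensions can therefore be found in massive amounts of data, and distinguishing historical business data by such dimensions is highly accurate and reliable.
It should be noted that a computer module written in code with an "IF" function can be used as the qualitative judgment module to set the qualitative dimensions whose information entropy is less than the information threshold as the qualitative analysis dimensions of the data set.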
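The threshold filter of S203 can be sketched as follows, using the classroom example from the text. The threshold value 0.5 and the function names are assumptions for illustration; the source only says the information threshold is preset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of qualitative values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def select_qualitative_dimensions(qualitative_sets, info_threshold):
    """Keep the dimensions whose entropy is below the information threshold."""
    return [dim for dim, values in qualitative_sets.items()
            if entropy(values) < info_threshold]


# 19 male / 1 female is skewed (entropy about 0.29); 10 / 10 is even (entropy 1.0).
qualitative_sets = {
    "sex_class_1": ["male"] * 19 + ["female"],
    "sex_class_2": ["male"] * 10 + ["female"] * 10,
}
selected = select_qualitative_dimensions(qualitative_sets, info_threshold=0.5)
```

Only the skewed dimension passes the filter, which is the behaviour the text describes: low entropy marks a dimension as discriminative.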
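In a preferred embodiment, referring to FIG. 4, the step in S2 of formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension includes:
S211: setting, through a value range module, the qualitative type with the highest probability of occurrence under the qualitative analysis dimension in the data set as the judgment value range.
Based on the above example, suppose the information entropy of the qualitative dimension "sex" is less than the information threshold and the dimension has been set as a qualitative analysis dimension. Because the probability of occurrence is 80% for male and 20% for female, "male" is set as the judgment value range.
It should be noted that a computer module written in code with the conditional counting function COUNTIF can be used as the value range module to find the qualitative type with the highest probability of occurrence under the qualitative analysis dimension and set it as the judgment value range.
S212: the qualitative condition module holds a qualitative mapping table; the qualitative condition module obtains from the qualitative mapping table the judgment method corresponding to the qualitative analysis dimension, and combines the judgment value range and the judgment method to generate the qualitative judgment condition of the data set.
Exemplarily, the preset mapping table holds the mapping between qualitative analysis dimensions and judgment methods. In this embodiment, the mapping is between the dimension value type of a qualitative dimension and the judgment method; for example, the judgment method for the dimension value type code value is "belongs to", and the judgment method for the dimension value type character is "contains". The judgment value range is the qualitative type with the highest probability of occurrence under the qualitative analysis dimension in the historical business data of the data set. The qualitative judgment condition also includes the judgment method, which is the operation that tests the relationship between the qualitative information of the data to be evaluated and the judgment value range.
For example, the qualitative analysis dimensions and qualitative judgment conditions obtained by the above method are shown in the following table:
Figure PCTCN2020103829-appb-000003
It should be noted that a map() mapping function can be used as the qualitative condition module to obtain from the qualitative mapping table the judgment method corresponding to the qualitative analysis dimension and to combine the judgment value range and the judgment method into the qualitative judgment condition of the data set.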
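Steps S211 and S212 can be sketched together as below. The mapping table contents follow the example in the text ("belongs to" for code values, "contains" for characters), but the dictionary layout, the key names, and the function name are hypothetical.

```python
from collections import Counter

# Assumed qualitative mapping table: dimension value type -> judgment method.
QUALITATIVE_MAPPING = {"code": "belongs", "char": "contains"}


def build_qualitative_condition(dim_code, value_type, qualitative_set):
    """Combine the most frequent qualitative type with its judgment method."""
    most_common_value, _count = Counter(qualitative_set).most_common(1)[0]
    return {
        "dim_code": dim_code,
        "method": QUALITATIVE_MAPPING[value_type],
        "value_range": most_common_value,  # the judgment value range
    }


condition = build_qualitative_condition(
    "SEX", "code", ["male", "male", "male", "male", "female"]
)
```

With the example set from the text, the judgment value range is "male", the type with an 80% probability of occurrence.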
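In a preferred embodiment, referring to FIG. 5, the step in S3 of calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set includes:
S301: using a drift module to calculate, through a preset mean shift model, the maximum density range of the quantitative information under each quantitative dimension of the data set.
Exemplarily, the historical quantitative information under a quantitative dimension of the data set is obtained, and the historical business data of the data set is entered into the mean shift model so that each item of quantitative information exists in the model as a coordinate point. For example, if the dimension ID of the quantitative dimension is 125, the dimension code is app_amt, and the quantitative information of the historical business data is 500, then in the mean shift model this quantitative information exists as the coordinate X1=500. A high-dimensional sphere region of radius h is created, and the density in the region is calculated with the density formula, which is:
Figure PCTCN2020103829-appb-000004
In the density formula, S is the high-dimensional sphere region, k is the number of points falling within the region, X is the center point of the region, Xi is an item of quantitative information falling within the region, and M is the average distance between the center point of the region and the historical quantitative information falling within it. The region is moved repeatedly and stops when M reaches its minimum. The center point of the region is then extracted; subtracting the radius from the center point gives the quantitative lower limit, and adding the radius to the center point gives the quantitative upper limit. The maximum density range is obtained from the quantitative upper and lower limits.
It should be noted that the math module of Python can be used to construct a drift module containing the mean shift model.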
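The iteration in S301 can be sketched in one dimension as below. Because the density formula is available only as a figure, the update used here is an assumption consistent with the symbol definitions in the text: M is taken as the mean offset (1/k) Σ (Xi − X) over the k points inside the window of radius h, and the window center moves by M until the offset is negligible. Note that a single run from one starting point converges to a local density peak; the function and parameter names are hypothetical.

```python
def max_density_range(points, h, start=None, tol=1e-6, max_iter=1000):
    """1-D mean shift with a flat window; returns (lower limit, upper limit)."""
    center = float(points[0] if start is None else start)
    for _ in range(max_iter):
        window = [x for x in points if abs(x - center) <= h]  # points inside S
        if not window:
            break  # no points in the window, nothing to move toward
        shift = sum(x - center for x in window) / len(window)  # assumed form of M
        center += shift
        if abs(shift) < tol:
            break
    return center - h, center + h


# Hypothetical application amounts with one outlier.
amounts = [480, 500, 510, 520, 2000]
lower, upper = max_density_range(amounts, h=50)
```

Starting at 480, the window moves to the mean of the four clustered values (502.5) and stops there, so the returned range is 452.5 to 552.5 and the outlier 2000 is left outside it.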
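S302: extracting, through a quantitative judgment module, the number of quantitative information items within the maximum density range; if this number is greater than a preset quantitative threshold, setting the quantitative dimension corresponding to the maximum density range as a quantitative analysis dimension of the data set.
Exemplarily, the quantitative threshold is set according to the user's needs. The number of quantitative information items in the high-dimensional sphere region corresponding to the maximum density range is extracted and compared with the quantitative threshold, and a quantitative dimension whose maximum density range holds more items than the threshold is set as a quantitative analysis dimension of the data set.
It should be noted that a computer module written in code with an "IF" function can be used as the quantitative judgment module, so that if the number is greater than the preset quantitative threshold, the quantitative dimension corresponding to the maximum density range is set as a quantitative analysis dimension of the data set.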
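The check in S302 can be sketched as a simple count-and-compare. The sample values, the range, and the threshold below are hypothetical; the source only states that the quantitative threshold is set by the user.

```python
def is_quantitative_analysis_dimension(values, density_range, quantity_threshold):
    """True if more than quantity_threshold values fall inside the density range."""
    lower, upper = density_range
    count = sum(1 for v in values if lower <= v <= upper)
    return count > quantity_threshold


# Hypothetical application amounts and a range such as mean shift might return.
amounts = [480, 500, 510, 520, 2000]
selected = is_quantitative_analysis_dimension(amounts, (452.5, 552.5), quantity_threshold=3)
```

Four of the five values fall inside the range, so the dimension is kept with a threshold of 3 and would be dropped with a threshold of 4, since the comparison in the text is strictly "greater than".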
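In a preferred embodiment, referring to FIG. 6, the step in S3 of formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range includes:
S311: using a method-and-range module to obtain the judgment method of the quantitative analysis dimension from a preset quantitative mapping table, and using the maximum density range as the judgment value range.
Exemplarily, the preset quantitative mapping table holds the mapping between quantitative analysis dimensions and judgment methods. In this embodiment, the mapping is between the dimension value type of a quantitative dimension and the judgment method; for example, for the dimension value types numeric and date, the judgment method is "range".
It should be noted that a map() mapping function can be used as the method-and-range module to obtain the judgment method of the quantitative analysis dimension from the quantitative mapping table, with the maximum density range used as the judgment value range.
S312: combining the judgment value range and the judgment method through a quantitative condition module to generate the quantitative judgment condition of the quantitative analysis dimension.
For example, the judgment method is obtained according to the dimension value type of the quantitative analysis dimension, and the quantitative judgment condition is formed as follows:
Figure PCTCN2020103829-appb-000005
It should be noted that the subtotal function SUBTOTAL can be used to build the quantitative condition module, which combines the judgment value range and the judgment method to generate the quantitative judgment condition of the quantitative analysis dimension.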
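Steps S311 and S312 can be sketched as a table lookup followed by a merge. The mapping follows the example in the text ("range" for numeric and date types); the dictionary layout, key names, and sample range are assumptions, and a plain dictionary is used here in place of the map() and SUBTOTAL functions the text names.

```python
# Assumed quantitative mapping table: dimension value type -> judgment method.
QUANTITATIVE_MAPPING = {"numeric": "range", "date": "range"}


def build_quantitative_condition(dim_code, value_type, density_range):
    """Combine the maximum density range with the mapped judgment method."""
    return {
        "dim_code": dim_code,
        "method": QUANTITATIVE_MAPPING[value_type],
        "value_range": tuple(density_range),  # the judgment value range
    }


# Hypothetical maximum density range for the application amount dimension.
condition = build_quantitative_condition("app_amt", "numeric", (452.5, 552.5))
```

The resulting dictionary carries everything the later evaluation step needs: which dimension to read, how to compare, and against which range.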
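In an exemplary embodiment, after the quantitative judgment conditions of the data set are formulated in S3 from each quantitative analysis dimension and its maximum density range, the method further includes:
generating a creation success signal according to the qualitative judgment conditions and the quantitative judgment conditions, and outputting it to the man-machine interface.
In an exemplary embodiment, receiving the data to be evaluated output by the man-machine interface in S4 includes:
receiving the data to be evaluated that the man-machine interface outputs in response to the creation success signal; for example, the data to be evaluated is as follows:
Dimension ID  Dimension name      Information  Dimension code  Dimension value type  Code value group
123           Sex                              SEX             1-code value          sex_type
124           Application time    2019-3       app_time        2-date
125           Application amount  1000         app_amt         3-numeric
126           Name                Li Si        name            4-character
127           Occupation          lawyer       job             4-character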
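In a preferred embodiment, referring to FIG. 7, the step in S4 of calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain the relevance evaluation value includes:
S401: through a qualitative evaluation module, evaluating the qualitative information of the data to be evaluated against the qualitative judgment conditions of the data set to obtain the qualitative evaluation values;
In this step, if the qualitative information in the data to be evaluated that corresponds to a qualitative judgment condition satisfies that condition's judgment method and judgment value range, the qualitative evaluation value of that qualitative information is set to 1; if it does not, the qualitative evaluation value is set to 0.
It should be noted that a computer module written in code with the conditional counting function COUNTIF can be used as the qualitative evaluation module to evaluate the qualitative information of the data to be evaluated against the qualitative judgment conditions of the data set and obtain the qualitative evaluation values.
For example, the qualitative information in the data to be evaluated that corresponds to the qualitative judgment conditions is as follows:
Dimension ID  Dimension code  Information
123           SEX
126           name            Li Si
The qualitative judgment conditions are:
Figure PCTCN2020103829-appb-000006
Therefore, the qualitative information in the data to be evaluated that corresponds to the qualitative judgment conditions, together with its qualitative evaluation values, is as follows:
Dimension ID  Dimension code  Information  Qualitative evaluation value
123           SEX                          0
126           name            Li Si        0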
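The 1-or-0 scoring of S401 can be sketched as below. The source gives its qualitative judgment conditions only as a figure, so the two conditions here are hypothetical, and the readings of the two judgment methods are assumptions: "belongs" is treated as equality with the judgment value, and "contains" as the information containing the judgment value as a substring.

```python
def qualitative_score(info, condition):
    """Return 1 if the qualitative information satisfies the condition, else 0."""
    method = condition["method"]
    value_range = condition["value_range"]
    if not info:
        return 0  # missing information cannot satisfy the condition
    if method == "belongs":    # assumed reading: info equals the judgment value
        return int(info == value_range)
    if method == "contains":   # assumed reading: info contains the judgment value
        return int(value_range in info)
    raise ValueError(f"unknown judgment method: {method}")


# Hypothetical conditions; they are not the ones shown in the source figure.
sex_condition = {"dim_code": "SEX", "method": "belongs", "value_range": "male"}
name_condition = {"dim_code": "name", "method": "contains", "value_range": "Zhang"}
qualitative_scores = [
    qualitative_score("female", sex_condition),
    qualitative_score("Li Si", name_condition),
]
```

Under these assumed conditions neither item matches, so both qualitative evaluation values are 0.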
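S402: through a quantitative evaluation module, evaluating the quantitative information of the data to be evaluated against the quantitative judgment conditions of the data set to obtain the quantitative evaluation values;
In this step, if the quantitative information in the data to be evaluated that corresponds to a quantitative judgment condition satisfies that condition's judgment method and judgment value range, the quantitative evaluation value of that quantitative information is set to 1; if it does not, the quantitative evaluation value is set to 0.
It should be noted that a computer module written in code with the conditional counting function COUNTIF can be used as the quantitative evaluation module to evaluate the quantitative information of the data to be evaluated against the quantitative judgment conditions of the data set and obtain the quantitative evaluation values.
For example, the quantitative information in the data to be evaluated that corresponds to the quantitative judgment conditions is as follows:
Dimension ID  Dimension code  Information
124           app_time        2019-3
125           app_amt         1000
The quantitative judgment conditions are:
Figure PCTCN2020103829-appb-000007
Therefore, the quantitative information in the data to be evaluated that corresponds to the quantitative judgment conditions, together with its quantitative evaluation values, is as follows:
Dimension ID  Dimension code  Information  Quantitative evaluation value
124           app_time        2019-3       1
125           app_amt         1000         1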
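The range check of S402 can be sketched as below. The source shows its quantitative judgment conditions only as a figure, so the ranges used here are hypothetical; dates are represented as (year, month) tuples, an assumption that lets Python compare them in order.

```python
def quantitative_score(info, condition):
    """Return 1 if the quantitative information lies inside the judgment range."""
    lower, upper = condition["value_range"]
    return int(lower <= info <= upper)


# Hypothetical "range" conditions for the two quantitative dimensions.
time_condition = {"dim_code": "app_time", "method": "range",
                  "value_range": ((2019, 1), (2019, 12))}
amount_condition = {"dim_code": "app_amt", "method": "range",
                    "value_range": (500, 1500)}
quantitative_scores = [
    quantitative_score((2019, 3), time_condition),   # application time 2019-3
    quantitative_score(1000, amount_condition),      # application amount 1000
]
```

With these assumed ranges both items fall inside their judgment value ranges, giving the evaluation values 1 and 1 shown in the example table.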
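S403: performing, through a calculation module, a weighted calculation on the quantitative evaluation values and the qualitative evaluation values to obtain the relevance evaluation value describing the degree of matching between the data to be evaluated and each data set.
For example:
Figure PCTCN2020103829-appb-000008
It follows that the relevance evaluation value of the data to be evaluated for product information A is 2.
It should be noted that the math module of Python can be used to construct the calculation module, which performs the weighted calculation on the quantitative evaluation values and the qualitative evaluation values to obtain the relevance evaluation value.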
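The weighted calculation of S403 can be sketched as a weighted sum. The weights are an assumption: the source shows its calculation only as a figure, and unit weights are used here because they reproduce the stated result of 2 from the example evaluation values (0 and 0 qualitative, 1 and 1 quantitative).

```python
def relevance_score(qualitative_values, quantitative_values,
                    w_qualitative=1.0, w_quantitative=1.0):
    """Weighted sum of the qualitative and quantitative evaluation values."""
    return (w_qualitative * sum(qualitative_values)
            + w_quantitative * sum(quantitative_values))


# Evaluation values from the worked example in the text.
score_a = relevance_score([0, 0], [1, 1])
```

Changing the two weight parameters would let one kind of evidence count for more than the other without altering the rest of the pipeline.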
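In an exemplary embodiment, the step in S4 of sending the product information of the data set with the highest relevance evaluation value to the man-machine interface includes:
comparing the relevance evaluation values between the data to be evaluated and each data set, and sending the product information corresponding to the data set with the highest relevance evaluation value to the man-machine interface.
Based on the above example: if the relevance evaluation value of the data to be evaluated is 2 for product A, 0 for product B, 1 for product C, and 4 for product D, product D is taken as the recommended product and output to the man-machine interface, thereby achieving precise traffic diversion for the product information.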
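Selecting the recommended product is then a maximum over the relevance evaluation values. The sketch below uses the four values from the example in the text; the dictionary layout and the function name are hypothetical.

```python
def recommend_product(relevance_by_product):
    """Return the product whose data set has the highest relevance evaluation value."""
    return max(relevance_by_product, key=relevance_by_product.get)


# Relevance evaluation values from the example in the text.
relevance_by_product = {"A": 2, "B": 0, "C": 1, "D": 4}
recommended = recommend_product(relevance_by_product)
```

If two products tie, `max` returns the first one encountered; the source does not say how ties should be broken, so that behaviour is only a default of this sketch.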
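Embodiment 2:
Referring to FIG. 8, an artificial-intelligence-based data correlation analysis apparatus 1 of this embodiment includes:
the data processing module 11, used to acquire historical business data and extract product information from it, classify the historical business data by product information, obtain at least one data set composed of historical business data sharing the same product information, and send it to the comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
the qualitative analysis module 12, used to extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulate the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and send them to the qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
the directional analysis module 13, used to extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulate the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and send them to the quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
the inference engine module 14, used to receive data to be evaluated that is output by the man-machine interface and records a user's quantitative information and qualitative information, extract the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculate the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and send the product information of the data set with the highest relevance evaluation value to the man-machine interface;
the man-machine interface 15, used to output data to be evaluated and receive product information.
This application is based on intelligent decision-making technology in the field of artificial intelligence and adopts an expert system built from at least a comprehensive database, a qualitative knowledge base, a quantitative knowledge base, an inference engine, and a man-machine interface. Because an expert system (ExpertSystem) is an artificial intelligence computer program, or a group of such programs, that applies a large body of expert knowledge and reasoning methods to solve complex problems in specific fields, this application builds on the expert system a classification model for similarity matching of the data to be evaluated.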
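Embodiment 3:
To achieve the above purpose, this application also provides a computer system that includes a plurality of computer devices 2. The components of the data correlation analysis apparatus 1 of Embodiment 2 can be distributed across different computer devices, and each computer device can be a smartphone, tablet, laptop, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers) that executes the program. The computer device of this embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that can be communicatively connected to each other through a system bus, as shown in FIG. 9. It should be pointed out that FIG. 9 shows only a computer device with components; it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 21 (that is, a readable storage medium) includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device, such as the hard disk or main memory of the computer device. In other embodiments, the memory 21 may also be an external storage device of the computer device, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card with which the computer device is equipped. Of course, the memory 21 may also include both an internal storage unit of the computer device and an external storage device thereof. In this embodiment, the memory 21 is generally used to store the operating system and the application software installed on the computer device, such as the program code of the data correlation analysis apparatus of Embodiment 2. In addition, the memory 21 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device. In this embodiment, the processor 22 is used to run the program code or process the data stored in the memory 21, for example to run the data correlation analysis apparatus, so as to implement the data correlation analysis method of Embodiment 1.
Embodiment 4:
To achieve the above purpose, this application also provides a computer-readable storage system that includes multiple storage media. The storage media may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, App store, and the like. Computer programs are stored on them, and the programs realize the corresponding functions when executed by the processor 22. The computer-readable storage medium of this embodiment is used to store the data correlation analysis apparatus, which implements the data correlation analysis method of Embodiment 1 when executed by the processor 22.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of this application.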

Claims (20)

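  1. An artificial-intelligence-based data correlation analysis method, comprising:
    acquiring historical business data and extracting product information from it, classifying the historical business data by product information, obtaining at least one data set composed of historical business data sharing the same product information, and sending it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
    extracting a data set from the comprehensive database, calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sending them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
    extracting a data set from the comprehensive database, calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sending them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
    receiving data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extracting the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and sending the product information of the data set with the highest relevance evaluation value to the man-machine interface.
  2. The data correlation analysis method according to claim 1, wherein the step of acquiring historical business data and extracting product information from it comprises:
    setting a training quantity, and acquiring from a historical database a number of items of historical business data equal to the training quantity;
    obtaining the dimension value types in the historical business data; setting the dimension ID and dimension code whose dimension value type is character as a qualitative dimension, and setting the information corresponding to the qualitative dimension as qualitative information; setting the dimension ID and dimension code whose dimension value type is code value, date, or numeric as a quantitative dimension, and setting the information corresponding to the quantitative dimension as quantitative information; wherein the dimension ID is a numeric identifier that labels a dimension feature in the historical business data;
    extracting the product information of the historical business data.
  3. The data correlation analysis method according to claim 1, wherein the step of calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set comprises:
    collecting the qualitative information under each qualitative dimension in the historical business data of the data set to obtain qualitative sets;
    calculating, through a preset information gain model, the probability of occurrence of each type of qualitative information in the qualitative set, so as to obtain the information entropy of the qualitative dimension corresponding to the qualitative set;
    setting the qualitative dimensions whose information entropy is less than a preset information threshold as the qualitative analysis dimensions of the data set.
  4. The data correlation analysis method according to claim 1, wherein the step of formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension comprises:
    setting the qualitative type with the highest probability of occurrence under the qualitative analysis dimension in the data set as the judgment value range;
    obtaining from a preset qualitative mapping table the judgment method corresponding to the qualitative analysis dimension, and combining the judgment value range and the judgment method to generate the qualitative judgment condition of the data set.
  5. The data correlation analysis method according to claim 1, wherein the step of calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set comprises:
    calculating, through a preset mean shift model, the maximum density range of the quantitative information under each quantitative dimension of the data set;
    extracting the number of quantitative information items within the maximum density range, and if this number is greater than a preset quantitative threshold, setting the quantitative dimension corresponding to the maximum density range as a quantitative analysis dimension of the data set.
  6. The data correlation analysis method according to claim 1, wherein the step of formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range comprises:
    obtaining the judgment method of the quantitative analysis dimension from a preset quantitative mapping table, and using the maximum density range as the judgment value range;
    combining the judgment value range and the judgment method to generate the quantitative judgment condition of the quantitative analysis dimension.
  7. The data correlation analysis method according to claim 1, wherein the step of calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain the relevance evaluation value comprises:
    calculating, according to the qualitative judgment conditions of each data set, the relevance between the qualitative information of the data to be evaluated and each data set to obtain qualitative evaluation values;
    calculating, according to the quantitative judgment conditions of each data set, the relevance between the quantitative information of the data to be evaluated and each data set to obtain quantitative evaluation values;
    performing a weighted calculation on the quantitative evaluation values and the qualitative evaluation values to obtain the relevance evaluation value reflecting the degree of matching between the data to be evaluated and each data set.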
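  8. An artificial-intelligence-based data correlation analysis apparatus, comprising:
    a data processing module, used to acquire historical business data and extract product information from it, classify the historical business data by product information, obtain at least one data set composed of historical business data sharing the same product information, and send it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
    a qualitative analysis module, used to extract a data set from the comprehensive database, calculate the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulate the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and send them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
    a directional analysis module, used to extract a data set from the comprehensive database, calculate the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulate the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and send them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
    an inference engine module, used to receive data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extract the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculate the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and send the product information of the data set with the highest relevance evaluation value to the man-machine interface;
    a man-machine interface, used to output data to be evaluated and receive product information.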
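  9. A computer system comprising a plurality of computer devices, each computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processors of the plurality of computer devices jointly implement the following steps when executing the computer program:
    acquiring historical business data and extracting product information from it, classifying the historical business data by product information, obtaining at least one data set composed of historical business data sharing the same product information, and sending it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
    extracting a data set from the comprehensive database, calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sending them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
    extracting a data set from the comprehensive database, calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sending them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
    receiving data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extracting the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and sending the product information of the data set with the highest relevance evaluation value to the man-machine interface.
  10. The computer system according to claim 9, wherein the step of acquiring historical business data and extracting product information from it comprises:
    setting a training quantity, and acquiring from a historical database a number of items of historical business data equal to the training quantity;
    obtaining the dimension value types in the historical business data; setting the dimension ID and dimension code whose dimension value type is character as a qualitative dimension, and setting the information corresponding to the qualitative dimension as qualitative information; setting the dimension ID and dimension code whose dimension value type is code value, date, or numeric as a quantitative dimension, and setting the information corresponding to the quantitative dimension as quantitative information; wherein the dimension ID is a numeric identifier that labels a dimension feature in the historical business data;
    extracting the product information of the historical business data.
  11. The computer system according to claim 9, wherein the step of calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set comprises:
    collecting the qualitative information under each qualitative dimension in the historical business data of the data set to obtain qualitative sets;
    calculating, through a preset information gain model, the probability of occurrence of each type of qualitative information in the qualitative set, so as to obtain the information entropy of the qualitative dimension corresponding to the qualitative set;
    setting the qualitative dimensions whose information entropy is less than a preset information threshold as the qualitative analysis dimensions of the data set.
  12. The computer system according to claim 9, wherein the step of formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension comprises:
    setting the qualitative type with the highest probability of occurrence under the qualitative analysis dimension in the data set as the judgment value range;
    obtaining from a preset qualitative mapping table the judgment method corresponding to the qualitative analysis dimension, and combining the judgment value range and the judgment method to generate the qualitative judgment condition of the data set;
    and the step of calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set comprises:
    calculating, through a preset mean shift model, the maximum density range of the quantitative information under each quantitative dimension of the data set;
    extracting the number of quantitative information items within the maximum density range, and if this number is greater than a preset quantitative threshold, setting the quantitative dimension corresponding to the maximum density range as a quantitative analysis dimension of the data set.
  13. The computer system according to claim 9, wherein the step of formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range comprises:
    obtaining the judgment method of the quantitative analysis dimension from a preset quantitative mapping table, and using the maximum density range as the judgment value range;
    combining the judgment value range and the judgment method to generate the quantitative judgment condition of the quantitative analysis dimension.
  14. The computer system according to claim 9, wherein the step of calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain the relevance evaluation value comprises:
    calculating, according to the qualitative judgment conditions of each data set, the relevance between the qualitative information of the data to be evaluated and each data set to obtain qualitative evaluation values;
    calculating, according to the quantitative judgment conditions of each data set, the relevance between the quantitative information of the data to be evaluated and each data set to obtain quantitative evaluation values;
    performing a weighted calculation on the quantitative evaluation values and the qualitative evaluation values to obtain the relevance evaluation value reflecting the degree of matching between the data to be evaluated and each data set.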
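  15. A computer-readable storage medium comprising a plurality of storage media, each storing a computer program, wherein the computer programs stored on the plurality of storage media jointly implement the following steps when executed by a processor:
    acquiring historical business data and extracting product information from it, classifying the historical business data by product information, obtaining at least one data set composed of historical business data sharing the same product information, and sending it to a comprehensive database; wherein the product information is the name information of the product consumed by the user as reflected in the historical business data;
    extracting a data set from the comprehensive database, calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set, formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension, and sending them to a qualitative knowledge base; wherein the qualitative judgment conditions reflect the discriminative qualitative information in the data set;
    extracting a data set from the comprehensive database, calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set, formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range, and sending them to a quantitative knowledge base; wherein the quantitative judgment conditions reflect the discriminative quantitative information in the data set;
    receiving data to be evaluated that is output by a man-machine interface and records a user's quantitative information and qualitative information, extracting the qualitative judgment conditions and the quantitative judgment conditions from the qualitative knowledge base and the quantitative knowledge base respectively, calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain a relevance evaluation value, and sending the product information of the data set with the highest relevance evaluation value to the man-machine interface.
  16. The computer-readable storage medium according to claim 15, wherein the step of acquiring historical business data and extracting product information from it comprises:
    setting a training quantity, and acquiring from a historical database a number of items of historical business data equal to the training quantity;
    obtaining the dimension value types in the historical business data; setting the dimension ID and dimension code whose dimension value type is character as a qualitative dimension, and setting the information corresponding to the qualitative dimension as qualitative information; setting the dimension ID and dimension code whose dimension value type is code value, date, or numeric as a quantitative dimension, and setting the information corresponding to the quantitative dimension as quantitative information; wherein the dimension ID is a numeric identifier that labels a dimension feature in the historical business data;
    extracting the product information of the historical business data.
  17. The computer-readable storage medium according to claim 15, wherein the step of calculating the information entropy of the data set to determine the qualitative analysis dimensions of the data set comprises:
    collecting the qualitative information under each qualitative dimension in the historical business data of the data set to obtain qualitative sets;
    calculating, through a preset information gain model, the probability of occurrence of each type of qualitative information in the qualitative set, so as to obtain the information entropy of the qualitative dimension corresponding to the qualitative set;
    setting the qualitative dimensions whose information entropy is less than a preset information threshold as the qualitative analysis dimensions of the data set.
  18. The computer-readable storage medium according to claim 15, wherein the step of formulating the qualitative judgment conditions of the data set from the qualitative information under each qualitative analysis dimension comprises:
    setting the qualitative type with the highest probability of occurrence under the qualitative analysis dimension in the data set as the judgment value range;
    obtaining from a preset qualitative mapping table the judgment method corresponding to the qualitative analysis dimension, and combining the judgment value range and the judgment method to generate the qualitative judgment condition of the data set;
    and the step of calculating the maximum density range of the data set to determine the quantitative analysis dimensions of the data set comprises:
    calculating, through a preset mean shift model, the maximum density range of the quantitative information under each quantitative dimension of the data set;
    extracting the number of quantitative information items within the maximum density range, and if this number is greater than a preset quantitative threshold, setting the quantitative dimension corresponding to the maximum density range as a quantitative analysis dimension of the data set.
  19. The computer-readable storage medium according to claim 15, wherein the step of formulating the quantitative judgment conditions of the data set from each quantitative analysis dimension and its maximum density range comprises:
    obtaining the judgment method of the quantitative analysis dimension from a preset quantitative mapping table, and using the maximum density range as the judgment value range;
    combining the judgment value range and the judgment method to generate the quantitative judgment condition of the quantitative analysis dimension.
  20. The computer-readable storage medium according to claim 15, wherein the step of calculating the relevance between the data to be evaluated and each data set according to the qualitative and quantitative judgment conditions to obtain the relevance evaluation value comprises:
    calculating, according to the qualitative judgment conditions of each data set, the relevance between the qualitative information of the data to be evaluated and each data set to obtain qualitative evaluation values;
    calculating, according to the quantitative judgment conditions of each data set, the relevance between the quantitative information of the data to be evaluated and each data set to obtain quantitative evaluation values;
    performing a weighted calculation on the quantitative evaluation values and the qualitative evaluation values to obtain the relevance evaluation value reflecting the degree of matching between the data to be evaluated and each data set.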
PCT/CN2020/103829 2020-04-02 2020-07-23 Data correlation analysis method, apparatus, computer system, and readable storage medium WO2021196457A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010253260.5 2020-04-02
CN202010253260.5A CN111581296B (zh) 2020-04-02 2020-04-02 Data correlation analysis method, apparatus, computer system, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021196457A1 (zh) 2021-10-07

Family

ID=72119173

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/103829 WO2021196457A1 (zh) 2020-04-02 2020-07-23 Data correlation analysis method, apparatus, computer system, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111581296B (zh)
WO (1) WO2021196457A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269179B (zh) * 2021-06-24 2024-04-05 中国平安人寿保险股份有限公司 Data processing method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
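CN104867037A (zh) * 2015-05-29 2015-08-26 北京京东尚科信息技术有限公司 Data processing method and apparatus for profile features
CN106548375A (zh) * 2016-11-04 2017-03-29 东软集团股份有限公司 Method and apparatus for constructing a product profile
CN110189164A (zh) * 2019-05-09 2019-08-30 杭州览众数据科技有限公司 Commodity-store recommendation scheme based on information entropy measurement and random feature sampling
CN110727857A (zh) * 2019-09-04 2020-01-24 口碑(上海)信息技术有限公司 Method and apparatus for identifying key features of potential users for a business object
CN110751533A (zh) * 2019-09-09 2020-02-04 上海陆家嘴国际金融资产交易市场股份有限公司 Product profile generation method, apparatus, computer device, and storage medium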
US20200051098A1 (en) * 2018-08-08 2020-02-13 Adp, Llc Method and System for Predictive Modeling of Consumer Profiles

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180052886A1 (en) * 2015-05-30 2018-02-22 The Power Player Inc. Data aggregation system
EP3159838A1 (en) * 2015-10-23 2017-04-26 Tata Consultancy Services Limited System and method for evaluating reviewer's ability to provide feedback
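CN108108488A (zh) * 2018-01-12 2018-06-01 中译语通科技股份有限公司 Data statistical analysis method and system based on stream computing, and computer program
CN108734567A (zh) * 2018-04-03 2018-11-02 杭州连银科技有限公司 Asset management system based on big-data artificial-intelligence risk control and evaluation method thereof
CN110516164B (zh) * 2019-07-25 2023-06-30 上海喜马拉雅科技有限公司 Information recommendation method, apparatus, device, and storage medium
CN110599040A (zh) * 2019-09-16 2019-12-20 中国人民解放军陆军工程大学 Evaluation method, system, and terminal device for maintenance training
CN110889082B (zh) * 2019-12-03 2021-08-31 中国航空综合技术研究所 Comprehensive evaluation method for ergonomic equipment based on systems engineering theory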

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
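CN117573728A (zh) * 2024-01-17 2024-02-20 杭银消费金融股份有限公司 Method and system for dimension-raising processing of the information dimensions of data information
CN117573728B (zh) * 2024-01-17 2024-04-23 杭银消费金融股份有限公司 Method and system for dimension-raising processing of the information dimensions of data information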

Also Published As

Publication number Publication date
CN111581296B (zh) 2022-08-16
CN111581296A (zh) 2020-08-25

Similar Documents

Publication Publication Date Title
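WO2021068610A1 (zh) Resource recommendation method, apparatus, electronic device, and storage medium
WO2022105115A1 (zh) Question-answer pair matching method, apparatus, electronic device, and storage medium
JP7403605B2 (ja) Training method for a multi-target image-text matching model, and image-text retrieval method and apparatus
CN104081385A (zh) Representing information from documents
WO2022048363A1 (zh) Website classification method, apparatus, computer device, and storage medium
CN111814910B (zh) Anomaly detection method, apparatus, electronic device, and storage medium
CN111444956B (zh) Low-load information prediction method, apparatus, computer system, and readable storage medium
WO2021196457A1 (zh) Data correlation analysis method, apparatus, computer system, and readable storage medium
CN110728313B (zh) Classification model training method and apparatus for intent classification and recognition
CN108241867B (zh) Classification method and apparatus
CN111783039B (zh) Risk determination method, apparatus, computer system, and storage medium
US20230004979A1 (en) Abnormal behavior detection method and apparatus, electronic device, and computer-readable storage medium
CN111966886A (zh) Object recommendation method, object recommendation apparatus, electronic device, and storage medium
CN112528315A (zh) Method and apparatus for identifying sensitive data
CN111125658A (zh) Method, apparatus, server, and storage medium for identifying fraudulent users
CN112597135A (zh) User classification method, apparatus, electronic device, and readable storage medium
CN113379469A (zh) Abnormal traffic detection method, apparatus, device, and storage medium
Taghizadeh et al. How meaningful are similarities in deep trajectory representations?
CN113988878B (zh) Anti-fraud method and system based on graph database technology
CN113780675B (zh) Consumption prediction method, apparatus, storage medium, and electronic device
CN113705201B (zh) Text-based event probability prediction and evaluation algorithm, electronic device, and storage medium
CN114549174A (zh) User behavior prediction method, apparatus, computer device, and storage medium
CN111414699A (zh) Information analysis and prediction method, apparatus, computer system, and readable storage medium
CN110852392A (zh) User grouping method, apparatus, device, and medium
CN113127573B (zh) Method, apparatus, computer device, and storage medium for determining related data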

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20928660; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase. Ref country code: DE

122 Ep: pct application non-entry in european phase. Ref document number: 20928660; Country of ref document: EP; Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.03.2023)

122 Ep: pct application non-entry in european phase. Ref document number: 20928660; Country of ref document: EP; Kind code of ref document: A1