CN114266443A - Data evaluation method and device, electronic equipment and storage medium - Google Patents

Data evaluation method and device, electronic equipment and storage medium

Info

Publication number
CN114266443A
Authority
CN
China
Prior art keywords
data
evaluation
target
dimension
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111436315.7A
Other languages
Chinese (zh)
Inventor
洪博然
张宇
杜自然
邵雷
董传晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shujuwan District Big Data Research Institute
Yu Shiyang
Original Assignee
Shenzhen Shujuwan District Big Data Research Institute
Yu Shiyang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shujuwan District Big Data Research Institute, Yu Shiyang filed Critical Shenzhen Shujuwan District Big Data Research Institute
Priority to CN202111436315.7A priority Critical patent/CN114266443A/en
Publication of CN114266443A publication Critical patent/CN114266443A/en
Pending legal-status Critical Current

Abstract

The embodiments of the application provide a data evaluation method and device, electronic equipment and a storage medium, and relate to the technical fields of data circulation, data processing and artificial intelligence. The data evaluation method comprises the following steps: acquiring target data; evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, wherein the number of the target evaluation dimensions is at least two; performing natural language processing on the target data to obtain target keywords; performing content recommendation processing on the target data according to the target keywords to obtain recommendation data related to the target data; searching preset historical evaluation data according to the recommendation data to obtain historical scores of the recommendation data; obtaining a scoring matrix according to the historical scores and the scoring data; obtaining weight information according to the scoring matrix; and obtaining an evaluation result according to the weight information and the scoring data. The technical scheme of the embodiments of the application can improve the efficiency and accuracy of data evaluation.

Description

Data evaluation method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of data circulation, data processing and artificial intelligence, in particular to a data evaluation method and device, electronic equipment and a storage medium.
Background
As big data technology matures, big data is applied ever more widely in business and the demand for big data transactions keeps growing. Unlike physical commodities, big data commodities are difficult to value, and the industry lacks an evaluation standard. How to evaluate data comprehensively and objectively is therefore a widespread problem in the industry.
Current data evaluation methods are mainly single-dimensional or multi-dimensional. Single-dimensional evaluation tends to assess a quality index of the data in a single dimension. The evaluation is mostly scored manually, so it is subject to errors caused by differences in the evaluators' experience or interpretation standards, and manual scoring is time-consuming and labor-intensive, which makes evaluation inefficient. Multi-dimensional evaluation tends to assess several dimensions, such as data quality, data asset and data security. It is also mostly scored manually, after which the weights of the different types of indexes are calculated using the Delphi method. However, because the field of data circulation contains different classifications, the weight of each type of index readily changes with the market, and the Delphi method cannot keep up with rapid market changes, so the evaluation accuracy is low.
Disclosure of Invention
The embodiment of the application mainly aims to provide a data evaluation method and device, electronic equipment and a storage medium, and the efficiency and the accuracy of data evaluation can be improved.
To achieve the above object, a first aspect of an embodiment of the present application provides a data evaluation method, including:
acquiring target data;
evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, wherein the number of the target evaluation dimensions is at least two;
performing natural language processing on the target data to obtain target keywords;
performing content recommendation processing on the target data according to the target keyword to obtain recommendation data related to the target data;
searching in preset historical evaluation data according to the recommendation data to obtain a historical score of the recommendation data;
obtaining a scoring matrix according to the historical scoring and the scoring data;
obtaining weight information according to the scoring matrix;
and obtaining an evaluation result according to the weight information and the scoring data.
In some embodiments, the target data includes full data, the target evaluation dimension includes a data matching dimension, and the evaluating the target data according to a preset data evaluation model to obtain scoring data of the target evaluation dimension includes:
extracting samples from the full data according to a preset sample extraction rule to obtain sample data;
and evaluating the data matching degree according to the full data and the sample data to obtain the scoring data of the target data in the data matching dimension.
In some embodiments, the target evaluation dimension includes a data contribution dimension, and the evaluating the target data according to a preset data evaluation model to obtain scoring data of the target evaluation dimension includes:
obtaining a model sub-dimension score and a user use feedback score;
performing variable characteristic analysis on the target data to obtain information quantity corresponding to each variable characteristic;
obtaining a feature sub-dimension score according to the information quantity corresponding to each variable feature;
and obtaining the scoring data of the target data in the data contribution dimension according to the model sub-dimension score, the user use feedback score and the feature sub-dimension score.
In some embodiments, the target evaluation dimension includes a data application dimension, and the evaluating the target data according to a preset data evaluation model to obtain scoring data of the target evaluation dimension includes:
acquiring historical transaction data;
obtaining an application behavior score according to the historical transaction data;
and obtaining the scoring data of the target data in the data application dimension according to the application behavior score and a preset application effect score.
In some embodiments, the obtaining weight information according to the scoring matrix includes:
carrying out data standardization processing on the scoring matrix to obtain a standardized matrix;
calculating the information entropy of each evaluation dimension according to the standardized matrix;
and obtaining the weight information corresponding to each evaluation dimension according to the information entropy.
In some embodiments, the performing content recommendation processing on the target data according to the target keyword to obtain recommendation data related to the target data includes:
acquiring a preset recommended quantity;
calculating similarity information of the target data according to the target keywords;
and obtaining the preset recommendation quantity of the recommendation data related to the target data according to the similarity information.
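The selection step above can be sketched as a top-N pick over similarity values. This is a minimal illustration, not the claimed implementation: the function name `top_n_recommendations`, the dataset IDs and the similarity values are all hypothetical, and the similarity values are assumed to have been computed already from the target keywords.

```python
import heapq

def top_n_recommendations(similarity, n):
    """Return the IDs of the n candidates with the highest similarity, best first."""
    if n <= 0:
        return []
    # Iterating a dict yields its keys; each key is ranked by its similarity value.
    return heapq.nlargest(n, similarity, key=similarity.get)

# Hypothetical similarity of each historical dataset to the target data.
similarity = {"dataset_a": 0.91, "dataset_b": 0.35, "dataset_c": 0.78, "dataset_d": 0.12}

# The preset recommended quantity is assumed to be 2 here.
print(top_n_recommendations(similarity, n=2))  # ['dataset_a', 'dataset_c']
```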
In some embodiments, the data evaluation method further comprises:
and updating the historical evaluation data according to the evaluation result.
To achieve the above object, a second aspect of the present application proposes a data evaluation apparatus comprising:
the acquisition module is used for acquiring target data;
the evaluation module is used for evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, and the number of the target evaluation dimensions is at least two;
the natural language processing module is used for carrying out natural language processing on the target data to obtain target keywords;
the recommending module is used for recommending the content of the target data according to the target keyword to obtain recommended data related to the target data;
the searching module is used for searching in preset historical evaluation data according to the recommendation data to obtain the historical score of the recommendation data;
the matrix module is used for obtaining a scoring matrix according to the historical scoring and the scoring data;
the weighting module is used for obtaining weighting information according to the scoring matrix;
and the evaluation result module is used for obtaining an evaluation result according to the weight information and the scoring data.
To achieve the above object, a third aspect of the present application provides an electronic apparatus comprising:
at least one memory;
at least one processor;
at least one program;
the at least one program is stored in the at least one memory, and the at least one processor executes the at least one program to implement the method of the present application as described in the above first aspect.
To achieve the above object, a fourth aspect of the present application proposes a storage medium that is a computer-readable storage medium storing computer-executable instructions for causing a computer to execute:
a method as described in the first aspect above.
With the data evaluation method and device, electronic device and storage medium of the present application, the target data is evaluated by a preset data evaluation model to obtain scoring data for the target evaluation dimensions, and natural language processing is performed on the target data to obtain target keywords. Content recommendation processing is performed on the target data according to the target keywords to obtain recommendation data related to the target data, and the preset historical evaluation data is searched according to the recommendation data to obtain historical scores of the recommendation data. A scoring matrix is obtained by combining the historical scores with the scoring data of the target data, weight information is obtained from the scoring matrix, and the evaluation result is calculated from the weight information and the scoring data, which improves evaluation efficiency and accuracy.
Drawings
Fig. 1 is a flowchart of a data evaluation method according to an embodiment of the present application.
FIG. 2 is a flow diagram of step 120 of FIG. 1 in one embodiment.
FIG. 3 is a flow chart of step 120 of FIG. 1 in another embodiment.
FIG. 4 is a flowchart of step 120 of FIG. 1 in yet another embodiment.
Fig. 5 is a flowchart of step 170 in fig. 1.
Fig. 6 is a flowchart of step 140 in fig. 1.
Fig. 7 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
First, several terms referred to in the present application are explained:
Delphi Method (DM): also known as the expert survey method, it was pioneered by the RAND Corporation of the United States in 1946. It is essentially an anonymous inquiry method with feedback. In its general procedure, the opinions of experts on the problem to be predicted are collected, then sorted, summarized and counted, and fed back anonymously to each expert; opinions are solicited again, consolidated and fed back again, until a consistent opinion is obtained. In an enterprise setting, a dedicated forecasting organization comprising several experts and the enterprise's forecasting organizers follows a prescribed procedure to ask the experts, back to back, for their opinions or judgments about the future market, and then makes the forecast. The process can be summarized as: anonymous solicitation of expert opinions, then induction and statistics, then anonymous feedback, then induction and statistics again, stopping after several rounds. The Delphi method is therefore a collective, anonymous exchange of ideas conducted by inquiry. Three characteristics clearly distinguish it from other expert prediction methods: anonymity, repeated feedback, and statistical group responses.
Natural Language Processing (NLP), also called Natural Language Understanding (NLU) or computational linguistics, is on the one hand a branch of linguistic information processing and on the other hand one of the core topics of Artificial Intelligence (AI). Taking tutoring and answering systems as an example, such systems can be divided by implementation difficulty into three types: simple matching, fuzzy matching and paragraph understanding. A simple matching system uses plain keyword matching to match a student's question against the answer items in an answer library, and thereby answers the question or provides related tutoring automatically. A fuzzy matching system adds synonym and antonym matching on top of simple matching, so that even if the original keywords in the question match no answer directly, a relevant answer item can still be found when a synonym or antonym of a keyword matches. A paragraph understanding system is the most ideal and the only truly intelligent of the three. However, it requires paragraph-level understanding of natural language, which for Chinese involves complex NLP techniques such as automatic word segmentation, part-of-speech analysis, syntactic analysis and semantic analysis, so it is difficult to implement.
Information Value (IV): in binary classification problems in machine learning, the IV value is mainly used to encode input variables and to estimate their predictive power. The magnitude of a feature variable's IV value indicates the strength of that variable's predictive power. IV takes values in [0, +∞); if a group contains only responding clients or only non-responding clients, the IV is +∞.
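As an illustration of the IV definition, the sketch below uses the commonly used formula IV = Σ (p_resp − p_non) · ln(p_resp / p_non) over groups, where p_resp and p_non are the shares of all responders and non-responders that fall in a group. The patent text does not spell out this formula, so treat it as an assumption; the function name and the counts are hypothetical.

```python
import math

def information_value(bins):
    """bins: list of (responders, non_responders) counts, one pair per group."""
    total_resp = sum(resp for resp, _ in bins)
    total_non = sum(non for _, non in bins)
    iv = 0.0
    for resp, non in bins:
        if resp == 0 and non == 0:
            continue  # an empty group contributes nothing
        if resp == 0 or non == 0:
            # A group holding only one class makes the log term unbounded.
            return math.inf
        p_resp = resp / total_resp
        p_non = non / total_non
        iv += (p_resp - p_non) * math.log(p_resp / p_non)
    return iv

# Hypothetical variable split into two groups.
print(information_value([(80, 20), (20, 80)]))  # strongly separating variable
print(information_value([(50, 50), (50, 50)]))  # no separation, IV is 0.0
print(information_value([(10, 0), (5, 20)]))    # one-class group, IV is inf
```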
Word segmentation: word segmentation is the basis of natural language processing, and its accuracy directly determines the quality of subsequent part-of-speech tagging, syntactic analysis, word vectors and text analysis. English sentences separate words with spaces, so apart from certain multi-word terms such as New York, segmentation rarely needs to be considered. Chinese, by contrast, has no natural separators, and the reader must divide words and break sentences unaided. Chinese natural language processing therefore starts with word segmentation. A complete Chinese natural language processing pipeline generally includes five core technologies: word segmentation, part-of-speech tagging, named entity recognition, dependency syntax analysis and semantic analysis. Chinese word segmentation is used in search engines, text mining, machine translation, keyword extraction and automatic summarization, as well as in more recent applications such as chat robots and text similarity.
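To make the segmentation problem concrete, here is a minimal sketch of forward maximum matching, one classic dictionary-based technique. The patent does not say which segmenter it uses, so this is an assumed stand-in; the function name, the toy vocabulary and the `max_len` parameter are hypothetical.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedily split text into the longest dictionary words, left to right."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words

# Toy dictionary; a real segmenter would use a large lexicon or a statistical model.
vocabulary = {"自然", "语言", "自然语言", "处理", "数据", "评估"}
print(forward_max_match("自然语言处理", vocabulary))  # ['自然语言', '处理']
```

Greedy matching can mis-split ambiguous strings, which is one reason practical segmenters combine dictionaries with statistical models.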
Entity Identification (EI) is an information extraction technique for acquiring entity data such as person names and place names from text data. In NLP, entities usually refer to names of people, places and organizations; in the news domain, for example, one wants to know the people, places and organizations involved in an emergency. In a broader sense, an entity can be any word the business cares about: in a product title, for instance, the brand words, item words and item attribute words; combining these with sentiment polarity words gives a more detailed picture of customers' shopping intent. Recognition takes two steps: the first identifies the entity word boundary, namely the start and end positions of the entity; the second identifies the entity type, such as person name, place name or organization name. In terms of recognition methods, the first type is rule-based, relying on the word-formation rules of entity words and on high-frequency context words.
Semantic analysis is the core of natural language processing technology. It is a method for analyzing the semantic information of natural language: beyond grammatical-level analysis such as lexical and syntactic analysis, it deals with the meanings carried by words, phrases, sentences and paragraphs, and aims to represent the structure of the language through the semantic structure of its sentences. Semantic analysis techniques include lexical analysis, syntactic analysis, pragmatic analysis and contextual analysis. Lexical analysis covers morphological analysis and vocabulary analysis: morphological analysis mainly examines the prefixes, suffixes and other parts of words, while vocabulary analysis commands the vocabulary system as a whole, so that the characteristics of the user's input can be analyzed accurately and the search completed accurately. Syntactic analysis analyzes the words and phrases of the user's natural-language input in order to identify the syntactic structure of a sentence automatically. Pragmatic analysis, compared with semantic analysis, adds the analysis of context and language background, extracting additional information such as imagery and interpersonal relationships from the structure of a text; it is a higher-level linguistic analysis that associates the content of a sentence with real-life details to form a dynamic structure of meaning. Contextual analysis refers mainly to techniques that analyze the many "gaps" lying outside the original query in order to interpret the query more accurately. These "gaps" include general knowledge, domain-specific knowledge and the needs of the querying user.
A Recommendation Algorithm (RA) is an algorithm from computer science that uses some of a user's behaviors, together with mathematical methods, to infer what the user may like; it is applied mainly on the Internet. Recommendation algorithms have a long history and were needed and applied before machine learning emerged. They include content-based recommendation, collaborative filtering recommendation, hybrid recommendation, rule-based recommendation, demographic-based recommendation and others. Content-based recommendation generally relies on natural language processing (NLP): the user's preferences are obtained by mining TF-IDF feature vectors of texts, and recommendations are made accordingly. It can discover a user's distinctive preferences and is comparatively interpretable. Collaborative filtering is currently the most popular type, comes in many variants and is widely applied in industry. Its advantage is that it requires little domain-specific knowledge and can obtain good recommendation results through statistics-based machine learning algorithms; its greatest advantage is that it is easy to engineer and convenient to apply in products. At present, most recommendation algorithms used in practice are collaborative filtering algorithms. Hybrid recommendation is similar to ensemble learning in machine learning: several recommendation algorithms are combined to obtain a better one, for example by building a model for each algorithm and deciding the final recommendation by voting. In theory, hybrid recommendation is no worse than any single recommendation algorithm, but it increases algorithmic complexity; it is used in practice, though less widely than single algorithms such as collaborative filtering or binary-classification recommenders like logistic regression. Rule-based recommendation, for example recommending the items most clicked or most browsed by users, is a form of popularity-based recommendation and is no longer mainstream in the big data era. Demographic-based recommendation is the simplest recommendation algorithm: it finds the degree of association between users from their basic information and recommends accordingly, and it is now rarely used in large-scale systems.
Cosine Similarity (CS) evaluates the similarity of two vectors by calculating the cosine of the angle between them. The vectors are first placed in a vector space, such as the common two-dimensional space, according to their coordinate values. The cosine of a 0-degree angle is 1, the cosine of any other angle is no greater than 1, and the minimum value is -1, so the cosine of the angle between two vectors indicates whether they point in approximately the same direction. When two vectors have the same direction, the cosine similarity is 1; when the angle between them is 90 degrees, it is 0; when they point in exactly opposite directions, it is -1. The result depends only on the directions of the vectors, not on their lengths. In general the value lies between -1 and 1; cosine similarity is commonly used in the positive space, where it lies between 0 and 1. In information retrieval, each term is assigned a different dimension and a document is represented by a vector whose value in each dimension corresponds to the frequency with which that term appears in the document. Cosine similarity can thus give the similarity of two documents in terms of their subject matter. It is also commonly used for document comparison in text mining and, in data mining, to measure the cohesion within a cluster.
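The three reference cases in the definition (same direction, right angle, opposite direction) can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name and the sample vectors are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [2, 0]))   # same direction: 1.0
print(cosine_similarity([1, 0], [0, 3]))   # 90 degrees: 0.0
print(cosine_similarity([1, 0], [-4, 0]))  # opposite direction: -1.0
```

Scaling either vector leaves the result unchanged, which matches the statement that only direction matters.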
Euclidean distance (EM), also known as the Euclidean metric, is mathematically the distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space, and the associated norm is called the Euclidean norm. Earlier literature refers to it as the Pythagorean metric. The Euclidean distance is a commonly used definition of distance: it is the true distance between two points in m-dimensional space, or the natural length of a vector, and in two-dimensional and three-dimensional space it is the actual distance between two points.
The Pearson Correlation Coefficient (PCC) is used in statistics to measure the linear correlation between two variables X and Y; its value lies between -1 and 1.
Spearman's rank correlation coefficient is mainly used to measure correlation for nominal and ordinal data. It applies to two columns of variables that are rank (ordinal) variables with a linear relationship. It was derived by the British psychologist and statistician Spearman from the concept of product-moment correlation, and some regard Spearman's rank correlation as a special form of product-moment correlation.
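The relationship between the two coefficients can be sketched in plain Python: Spearman's coefficient is computed here as Pearson's coefficient applied to ranks, with tied values sharing their average rank. The function names and sample data are illustrative, and constant inputs (zero variance) are not handled.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # undefined if either input is constant

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic in x but not linear
print(pearson(x, y))   # below 1 because the relation is not linear
print(spearman(x, y))  # 1.0 because the ranks agree exactly
```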
Entropy weight method: according to the basic principles of information theory, information is a measure of the degree of order of a system, and entropy is a measure of its degree of disorder. By the definition of information entropy, the entropy value can be used to judge the degree of dispersion of an index: the smaller the information entropy, the greater the dispersion of the index and the greater its influence (that is, its weight) on the comprehensive evaluation; if all values of an index are equal, the index plays no role in the comprehensive evaluation. The weight of each index can therefore be calculated with information entropy, providing a basis for multi-index comprehensive evaluation. In information theory, entropy is a measure of uncertainty. The entropy weight method is an objective weighting method with the following calculation steps: a. construct a judgment matrix of the evaluation indexes for each year; b. normalize the judgment matrix to obtain a normalized judgment matrix; c. according to the definition of entropy and the evaluation indexes for each year, determine the entropy of each evaluation index; d. define the entropy weight: once the entropy of the n-th index is defined, the entropy weight of the n-th index can be obtained; e. calculate the weight values of the system.
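The calculation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's exact procedure: min-max scaling is assumed for the normalization step, every index is assumed to be "larger is better", equal weights are an assumed fallback when no index varies, and the function name and scores are hypothetical.

```python
import math

def entropy_weights(matrix):
    """matrix[i][j] is the score of item i on index j; returns one weight per index."""
    m, n = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n)]
    divergences = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            divergences.append(0.0)  # a constant index plays no role in the evaluation
            continue
        normalized = [(v - lo) / (hi - lo) for v in col]  # assumed min-max scaling
        total = sum(normalized)
        probs = [v / total for v in normalized]
        # Information entropy scaled to [0, 1]; terms with p == 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)  # lower entropy means higher weight
    total_div = sum(divergences)
    if total_div == 0:
        return [1.0 / n] * n  # assumed fallback: equal weights when nothing varies
    return [d / total_div for d in divergences]

# Hypothetical scoring matrix: 3 items (rows) scored on 3 indexes (columns).
scores = [[80, 60, 70],
          [80, 90, 75],
          [80, 30, 72]]
weights = entropy_weights(scores)
print(weights)  # the first index is constant, so its weight is 0.0
```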
The Population Stability Index (PSI) measures the difference in score distribution between the test sample and the modeling sample, and is a common indicator of model stability. Generally, a PSI below 0.1 indicates high model stability, a PSI between 0.1 and 0.2 indicates moderate stability, and a PSI above 0.2 suggests that the model should be re-iterated.
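As a sketch of how the PSI thresholds might be applied, the code below uses the commonly used formula PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the shares of the modeling and test samples in score bin i. The formula is not spelled out in the text above and is an assumption here, as are the function names, the sample proportions and the small `eps` floor that avoids taking the logarithm of zero.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """expected, actual: per-bin proportions of the modeling and test samples."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # assumed floor so empty bins stay finite
        psi += (a - e) * math.log(a / e)
    return psi

def stability_label(psi):
    """Map a PSI value to the stability bands 0.1 and 0.2."""
    if psi < 0.1:
        return "high"
    if psi <= 0.2:
        return "moderate"
    return "re-iterate"

modeling = [0.5, 0.5]  # hypothetical share of modeling-sample scores per bin
test = [0.6, 0.4]      # hypothetical share of test-sample scores per bin
psi = population_stability_index(modeling, test)
print(round(psi, 4), stability_label(psi))
```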
As big data technology matures, big data is applied ever more widely in business and the demand for big data transactions keeps growing. Unlike physical commodities, big data commodities are difficult to value, and the industry lacks an evaluation standard, so how to evaluate data comprehensively and objectively is a widespread problem in the industry.
Current data evaluation methods are mainly single-dimensional or multi-dimensional. Single-dimensional evaluation tends to assess the quality of the data in a single dimension. The evaluation is mostly scored manually, so it is subject to errors caused by differences in the evaluators' experience or interpretation standards, and manual scoring is time-consuming and labor-intensive, which makes evaluation inefficient. Multi-dimensional evaluation tends to assess several dimensions, such as data quality, data asset and data security. It is also mostly scored manually, after which the weights of the different types of indexes are calculated using the Delphi method. However, because the field of data circulation contains different classifications, the weight of each type of index readily changes with the market, and the Delphi method cannot keep up with rapid market changes, so the evaluation accuracy is low.
Based on this, the embodiments of the present application provide a data evaluation method and apparatus, an electronic device, and a storage medium, which can improve the efficiency and accuracy of data evaluation.
The data evaluation method and apparatus, electronic device and storage medium provided by the embodiments of the present application are described in detail through the following embodiments, beginning with the data evaluation method.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The data evaluation method provided by the embodiments of the present application relates to the technical fields of data circulation, data processing and artificial intelligence, and in particular to data mining. The method can be applied to a terminal, to a server, or to software running on a terminal or server. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, smart watch, or the like; the server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms; the software may be an application that implements the data evaluation method, but is not limited to these forms.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments of the present application provide a data evaluation method comprising the following steps: acquiring target data; evaluating the target data according to a preset data evaluation model to obtain scoring data for the target evaluation dimensions, where there are at least two target evaluation dimensions; performing natural language processing on the target data to obtain target keywords; performing content recommendation processing on the target data according to the target keywords to obtain recommendation data related to the target data; searching preset historical evaluation data according to the recommendation data to obtain the historical scores of the recommendation data; obtaining a scoring matrix from the historical scores and the scoring data; obtaining weight information from the scoring matrix; and obtaining an evaluation result from the weight information and the scoring data.
Fig. 1 is a flowchart of a data evaluation method provided in some embodiments. As shown in Fig. 1, the data evaluation method includes, but is not limited to, steps S110 to S180:
S110, acquiring target data;
S120, evaluating the target data according to a preset data evaluation model to obtain scoring data for the target evaluation dimensions, where there are at least two target evaluation dimensions;
S130, performing natural language processing on the target data to obtain target keywords;
S140, performing content recommendation processing on the target data according to the target keywords to obtain recommendation data related to the target data;
S150, searching preset historical evaluation data according to the recommendation data to obtain the historical scores of the recommendation data;
S160, obtaining a scoring matrix from the historical scores and the scoring data;
S170, obtaining weight information from the scoring matrix;
S180, obtaining an evaluation result from the weight information and the scoring data.
In step S110, the target data is the data to be evaluated. In a specific embodiment, a data transaction involves a buyer, a seller and an evaluator: the seller uploads the target data to be sold, the evaluator evaluates it, and the evaluation result is used to price the target data, so that the buyer can complete the transaction for the priced target data as a commodity.
In step S120, the preset data evaluation model is used to score the target data in multiple dimensions. Each target evaluation dimension has its own evaluation rules, and there are at least two target evaluation dimensions.
In a specific embodiment, there are eight target evaluation dimensions (S1 to S8): S1, the data matching dimension; S2, the data quality dimension; S3, the data contribution dimension; S4, the data credibility dimension; S5, the data appraisal dimension; S6, the data application dimension; S7, the data asset dimension; and S8, the data management maturity dimension. To address the shortcomings of current data evaluation processes, the data evaluation method provided by the present application takes all eight target evaluation dimensions into account. In particular, the S1 data matching dimension, the S3 data contribution dimension and the S6 data application dimension evaluate the data from new angles with evaluation rules matched to those angles, which improves the accuracy of data evaluation.
In step S130, the target data further includes data information, which includes, but is not limited to, the name of the target data as a commodity, the description of the target data, the industry category to which the target data belongs, the application scenario, and the data update time. When uploading the target data, the seller also uploads this data information to the evaluation system, so that the evaluation system can perform natural language processing on the data information and extract the corresponding target keywords.
Specifically, natural language processing includes, but is not limited to, word segmentation, entity recognition, part-of-speech tagging, keyword extraction and semantic understanding, using algorithms such as TF-IDF, Transformer, TextRank and encoder-decoder models.
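As a minimal sketch of the keyword extraction mentioned above, the following example ranks the tokens of one data description by TF-IDF against a small corpus. The function name `tfidf_keywords`, the smoothed IDF formula and the sample descriptions are illustrative assumptions, not the implementation used in the embodiments; word segmentation is assumed to have been done already.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus_tokens, top_k=3):
    """Rank the tokens of one document by TF-IDF against a small corpus."""
    n_docs = len(corpus_tokens)
    tf = Counter(doc_tokens)
    scores = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for tokens in corpus_tokens if term in tokens)
        # Smoothed IDF (an assumed variant) so rare terms never divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    # Sort by score (descending), then alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]

# Hypothetical, already-segmented data descriptions.
corpus = [
    ["bank", "risk", "control", "data"],
    ["e-commerce", "user", "tag", "data"],
    ["insurance", "user", "data"],
]
target = ["telecom", "user", "tag", "data", "tag"]
keywords = tfidf_keywords(target, corpus + [target], top_k=2)
print(keywords)
```

Terms that appear in every description, such as "data", receive the lowest weight, so the extracted keywords tend to be the terms that distinguish the commodity being evaluated.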
In step S140, the recommendation data includes other commodities related to the characteristics of the seller's commodity. In a specific embodiment, a content recommendation algorithm is applied to the target keywords obtained in step S130 to obtain the associated recommendation data, which includes a list of commodities associated with the seller's commodity.
In some embodiments, content recommendation algorithms include, but are not limited to, content-based recommendation, collaborative filtering recommendation, hybrid recommendation, rule-based recommendation, and recommendation based on demographic information; the similarity calculation methods used in the content recommendation algorithm include, but are not limited to, cosine similarity, Euclidean distance, the Pearson correlation coefficient, and the Spearman rank correlation coefficient.
Specifically, if the commodity to be evaluated is a telecom operator's user tag data, the extracted keyword is "user tag data" and the related commodities are the user tag data of mobile operators, telecom operators, banks and e-commerce platforms. If the extracted keyword is "risk control", the related commodities are bank risk control data, e-commerce risk control data, credit-reporting user data and insurance user data.
In step S150, the preset historical evaluation data consists of the evaluation records obtained in past evaluations, including the name of the target data evaluated each time, its evaluation score in each dimension, and so on. Using the recommendation data obtained in step S140, the historical scores of the related data commodities are looked up and used as a reference when evaluating the target data.
In step S180, if the scoring data of the target data in the eight dimensions are X1 to X8 and the weights of the eight dimensions are W1 to W8, the evaluation result is Y = X1 × W1 + X2 × W2 + ... + X8 × W8.
In steps S160 to S180, the scoring data of the target data and the historical scores of the recommendation data are concatenated into a table to form the scoring matrix. The weight information corresponding to each target evaluation dimension is derived from the scoring matrix, and the evaluation result of the target data is then obtained from the weight information and the scoring data. The evaluation result is the final evaluation score of the target data across the eight dimensions.
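The weighted sum of step S180 can be sketched as follows. The scores are the seller commodity scores from Table 1 of this description; the equal weights are a placeholder assumption used only to make the arithmetic easy to follow, since the actual weights come from the scoring matrix in step S170. The function name `evaluation_result` is illustrative.

```python
def evaluation_result(scores, weights):
    """Weighted sum Y = X1*W1 + X2*W2 + ... + Xn*Wn."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(x * w for x, w in zip(scores, weights))

# Seller commodity scores X1..X8 (Table 1).
scores = [70, 80, 30, 70, 60, 10, 50, 70]
# Placeholder weights W1..W8: equal weighting is an assumption for illustration.
weights = [0.125] * 8

y = evaluation_result(scores, weights)
print(y)  # 55.0
```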
In the data evaluation method provided by the embodiments of the present application, the target data is evaluated by a preset data evaluation model to obtain scoring data for the target evaluation dimensions. Natural language processing is performed on the target data to obtain target keywords, and content recommendation processing is performed according to the target keywords to obtain recommendation data related to the target data. The preset historical evaluation data is then searched according to the recommendation data to obtain the historical scores of the recommendation data. The historical scores are combined with the scoring data of the target data to form a scoring matrix, weight information is derived from the scoring matrix, and the evaluation result is calculated from the weight information and the scoring data. This completes the evaluation of the target data and improves evaluation efficiency and accuracy. The evaluation result can also serve as a pricing reference for the seller and as supporting input for the buyer's purchasing decision.
In some embodiments, the target data includes full data, the target evaluation dimension includes a data matching dimension, and the evaluation processing is performed on the target data according to a preset data evaluation model to obtain score data of the target evaluation dimension, including: extracting the full-scale data according to a preset sample extraction rule to obtain sample data; and evaluating the data matching degree according to the full data and the sample data to obtain the grading data of the target data in the data matching dimension.
Fig. 2 is a flowchart of step S120 in Fig. 1. As illustrated in Fig. 2, step S120 includes, but is not limited to, steps S210 to S220:
S210, extracting sample data from the full data according to a preset sample extraction rule;
S220, evaluating the data matching degree according to the full data and the sample data to obtain the scoring data of the target data in the data matching dimension.
In step S210, the target data includes full data and sample data, where the full data is all data in the data warehouse of the target data, and the sample data is a sample extracted from the full data and used to characterize sample characteristics of the full data.
In some embodiments, the sample data can be extracted in ways that include, but are not limited to, the following two: (1) manual extraction by the seller's business personnel according to actual business conditions; (2) automatic extraction according to an extraction rule that takes into account factors such as the proportion of column labels, the proportion of rows, and the distribution of the sample as a whole. To ensure that the sample data and the full data have the same or a similar distribution, the data matching degree between them needs to be checked.
As an example of an extraction rule, suppose full data set A contains 10,000 records divided among 10 labels. The sample is then drawn from the full data in the same proportions (the sync ratio). The sample data generally amounts to 10%-15% of the full data, which keeps the sample as small as possible without distorting the data characteristics. The sync ratio represents how closely the labels of the sample data match those of the full data; for the distribution of the sample data to be substantially consistent with that of the full data, the sync ratio needs to be close to 100%.
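One way to realize such an extraction rule is proportional (stratified) sampling per label, sketched below for the 10,000-record, 10-label example. The function name `stratified_sample`, the record layout and the fixed random seed are assumptions made for illustration; the description does not prescribe a particular sampling procedure.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, label_of, fraction, seed=0):
    """Draw the same fraction of records from every label group."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for rec in records:
        by_label[label_of(rec)].append(rec)
    sample = []
    for group in by_label.values():
        # Keep at least one record per label so no label disappears.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical full data: 10,000 records spread evenly over 10 labels.
records = [(i, i % 10) for i in range(10000)]
sample = stratified_sample(records, lambda r: r[1], fraction=0.10)
counts = Counter(label for _, label in sample)
print(len(sample), dict(counts))
```

Because every label contributes the same fraction, the label proportions of the sample match those of the full data, which is the property the sync ratio is meant to capture.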
In step S220, the data matching dimension, also referred to as the sample data dimension, examines how similar the trial data set is to the overall data. The indexes referenced when scoring the data matching dimension include, but are not limited to: the sample size ratio of the sampled trial data set, its feature count ratio, and data distribution consistency. The sample size ratio is the ratio of the number of samples in the trial data set to the number of samples in the full data; the feature count ratio is the ratio of the number of features in the trial data set to the number of features in the full data; and data distribution consistency examines the difference between the feature distribution of the trial data set and that of the overall data.
In a specific embodiment, the scoring process includes, but is not limited to, the following steps: (1) when the overall data set is loaded, store the boundary thresholds that divide each feature into ten equal parts, and group the trial data set by the same thresholds; (2) calculate the population stability index (PSI) of each feature, and also examine the sample size ratio and feature count ratio of the trial data set; (3) obtain the scoring data of the target data in the data matching dimension from the number of features of the target data and their PSI ranges.
It should be noted that if every feature has PSI < 0.1, the distribution difference between the overall data set and the trial data set is slight and the similarity is high; if some features have 0.1 ≤ PSI < 0.25, the difference between the trial data set and the overall data set is small; if some features have PSI ≥ 0.25, the difference between the trial data set and the overall data set is large.
Specifically, the scoring rules include, but are not limited to, the following: if every feature has PSI < 0.1, the score is 80-100; if some features have 0.1 ≤ PSI < 0.25, the score is 50-80; if a few features have PSI ≥ 0.25, the score is 30-50; if many features have PSI ≥ 0.25, the score is below 30.
Note that, if the target data does not have corresponding sample data, the score data of the target data in the data matching dimension is 0.
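The PSI check for a single numeric feature can be sketched as follows: decile boundaries are taken from the overall data, the trial data is grouped by the same boundaries, and the PSI is computed with the commonly used formula, the sum over bins of (a - e) × ln(a / e), where e and a are the bin shares in the overall and trial data. The helper names, the `eps` floor for empty bins and the synthetic data are assumptions for illustration.

```python
import math
from bisect import bisect_right

def decile_edges(values):
    """Nine interior boundaries splitting `values` into ten equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // 10] for i in range(1, 10)]

def bin_shares(values, edges, eps=1e-6):
    """Share of values in each bin; eps (assumed) avoids log(0) for empty bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]

def psi(full, trial):
    """Population stability index of one feature, trial data vs. overall data."""
    edges = decile_edges(full)
    expected = bin_shares(full, edges)
    actual = bin_shares(trial, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

def psi_band(value):
    """Map a PSI value to the difference levels described in the text."""
    if value < 0.1:
        return "slight"
    if value < 0.25:
        return "small"
    return "large"

full = list(range(1000))              # hypothetical overall feature values
matched = list(range(0, 1000, 10))    # evenly drawn trial set
skewed = list(range(500, 1000))       # trial set covering only the upper half
print(psi_band(psi(full, matched)), psi_band(psi(full, skewed)))
```

How the per-feature bands are combined into the 80-100, 50-80, 30-50 and below-30 score ranges depends on how "a few" and "many" features are defined, which the description leaves open.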
In some embodiments, the target evaluation dimension includes a data contribution dimension, and evaluating the target data according to a preset data evaluation model to obtain scoring data for the target evaluation dimension includes: obtaining a model sub-dimension score and a user usage feedback score; performing variable feature analysis on the target data to obtain the information value corresponding to each variable feature; obtaining a feature sub-dimension score from the information value corresponding to each variable feature; and obtaining the scoring data of the target data in the data contribution dimension from the model sub-dimension score, the user usage feedback score and the feature sub-dimension score.
Fig. 3 is a flowchart of another embodiment of step S120 in Fig. 1. As illustrated in Fig. 3, step S120 includes, but is not limited to, steps S310 to S340:
S310, obtaining a model sub-dimension score and a user usage feedback score;
S320, performing variable feature analysis on the target data to obtain the information value corresponding to each variable feature;
S330, obtaining a feature sub-dimension score from the information value corresponding to each variable feature;
S340, obtaining the scoring data of the target data in the data contribution dimension from the model sub-dimension score, the user usage feedback score and the feature sub-dimension score.
In a specific embodiment, the data contribution dimension examines how effective the seller's data is in practical applications.
In step S310, the indexes referenced by the model sub-dimension score include, but are not limited to: maximum AUC improvement rate, mean AUC improvement rate, maximum KS improvement rate, mean KS improvement rate, maximum precision improvement rate, mean precision improvement rate, maximum recall improvement rate, mean recall improvement rate, maximum F1-score improvement rate and mean F1-score improvement rate. These indexes represent the maximum and the mean of the improvement in modeling effect after a given seller's data is introduced, compared with the modeling effect of the data before introduction. The data before introduction may be the data of a single participant or the data of a federation. The scoring principle is that the larger the improvement rate, the higher the score. Specifically: if the model's improvement rate is greater than 20%, the score is 4-5; if it is greater than 10%, the score is 3; if it is greater than 0, the score is 1-2.
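The improvement-rate indexes and score bands can be sketched as follows. The AUC values are hypothetical, the function names are illustrative, and two points are assumptions because the description does not settle them: the band for a rate of zero or below, and whether the band is applied to the maximum or to the mean improvement rate (both are shown).

```python
def improvement_rate(before, after):
    """Relative improvement of a metric after the seller's data is introduced."""
    return (after - before) / before

def score_band(rate):
    """Score range (low, high) for a given improvement rate."""
    if rate > 0.20:
        return (4, 5)
    if rate > 0.10:
        return (3, 3)
    if rate > 0:
        return (1, 2)
    return (0, 0)  # assumption: no improvement earns no points

# Hypothetical (before, after) AUC values from three modelling experiments.
experiments = [(0.60, 0.75), (0.80, 0.92), (0.80, 0.84)]
rates = [improvement_rate(b, a) for b, a in experiments]
max_rate = max(rates)
mean_rate = sum(rates) / len(rates)
print(score_band(max_rate), score_band(mean_rate))
```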
In addition, the indexes referenced by the model sub-dimension score also include the maximum Shapley value and the mean Shapley value. These indexes represent the maximum contribution and the mean contribution of the seller, calculated by the Shapley value method across federations with different numbers of participants, after the seller has cooperated with multiple customers. The scoring principle is that, for the same number of participants in a transaction, the higher the contribution of the seller's data, the higher the score.
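For a small federation, the Shapley value of each participant can be computed exactly by averaging its marginal contribution over every joining order, as in the sketch below. The coalition values (model AUC per coalition) and the participant names are hypothetical, and the choice of AUC as the coalition value is an assumption; the description does not specify the value function.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}

# Hypothetical model AUC achieved by each coalition of participants.
auc = {
    frozenset(): 0.50,
    frozenset({"buyer"}): 0.70,
    frozenset({"seller"}): 0.60,
    frozenset({"buyer", "seller"}): 0.80,
}
phi = shapley_values(["buyer", "seller"], auc.__getitem__)
print(phi)
```

The contributions sum to the gain of the full federation over the empty coalition (0.80 - 0.50). Exact enumeration grows factorially with the number of participants, so larger federations would need a sampling approximation.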
In step S310, the indexes referenced by the user usage feedback score include, but are not limited to: the number of application scenes, the number of cooperating buyers, the maximum cooperation period, the cumulative number of calls after going online, the purchase proportion after customer trial, and the deviation between production data and trial data.
The number of application scenes is the number of scenes in which transactions have been completed on the platform and cooperation has taken place, covering both the cooperation scenes declared by the seller and the scenes actually served through the exchange; the more scenes actually served through the exchange, the higher the score. The number of cooperating buyers is the cumulative number of buyers that have cooperated on the seller's product; the more customers the seller has cooperated with through the exchange, the higher the score. The maximum cooperation period is the longest cooperation period, in years, among the buyers that have cooperated; the longer the cooperation with customers of the exchange, the higher the score. The cumulative number of calls after going online is the cumulative number of calls to the seller's product since it went online; the more calls, the higher the score. The purchase proportion after customer trial is the rate at which customers purchase the data product after trying it; the higher the purchase rate after trial, the higher the score. The deviation between production data and trial data is the deviation in model effect between the production data and the trial data after the transaction is concluded and the data goes online; the smaller this deviation, the higher the score.
In step S320, variable feature analysis is performed during data mining using an off-the-shelf third-party privacy computing platform. A variable feature is a variable that is useful to the model when the data is used for modeling. The analysis yields the information value corresponding to each variable feature, which is used in the subsequent scoring.
The information quantity here is the Information Value (IV), which indicates the predictive strength of a variable.
In step S330, the scoring rule of the feature sub-dimension score is that the more features with large IV values, the higher the score. Specifically, the features are classified by information value to obtain the number of features with an IV value greater than 0.5. If the proportion of features with an IV value greater than 0.5 is greater than 20%, the score is 5; if the proportion is between 5% and 20%, the score is 2-4; if the proportion is less than 5%, the score is less than 2.
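The feature sub-dimension score can be sketched in two parts: computing the IV of a binned feature with the usual weight-of-evidence formula, and mapping the proportion of features with IV greater than 0.5 to a score band. The bin counts, function names and `eps` floor are illustrative assumptions, and the 5%-20% middle band is inferred from the bands on either side of it.

```python
import math

def information_value(good_counts, bad_counts, eps=1e-6):
    """IV of one binned feature: sum over bins of (g - b) * ln(g / b),
    where g and b are the bin's shares of good and bad samples."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pg = max(g / total_good, eps)  # eps (assumed) guards against log(0)
        pb = max(b / total_bad, eps)
        iv += (pg - pb) * math.log(pg / pb)
    return iv

def feature_score_band(iv_values, threshold=0.5):
    """Score band from the proportion of features whose IV exceeds the threshold."""
    share = sum(1 for iv in iv_values if iv > threshold) / len(iv_values)
    if share > 0.20:
        return "5"
    if share >= 0.05:
        return "2-4"  # assumed middle band (5% to 20%)
    return "below 2"

# Hypothetical per-bin counts of good and bad samples for two features.
strong = information_value([80, 20], [20, 80])
weak = information_value([50, 50], [50, 50])
print(round(strong, 3), weak, feature_score_band([strong, weak]))
```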
In step S340, the model sub-dimension score, the user usage feedback score and the feature sub-dimension score are combined in a weighted sum according to their respective score proportions to obtain the scoring data of the target data in the data contribution dimension, which completes the evaluation of the target data in that dimension.
In some embodiments, the target evaluation dimension includes a data application dimension, and evaluating the target data according to a preset data evaluation model to obtain scoring data for the target evaluation dimension includes: acquiring historical transaction data; obtaining an application behavior score from the historical transaction data; and obtaining the scoring data of the target data in the data application dimension from the application behavior score and a preset application effect score.
Fig. 4 is a flowchart of another embodiment of step S120 in Fig. 1. As illustrated in Fig. 4, step S120 includes, but is not limited to, steps S410 to S430:
S410, acquiring historical transaction data;
S420, obtaining an application behavior score from the historical transaction data;
S430, obtaining the scoring data of the target data in the data application dimension from the application behavior score and a preset application effect score.
In step S410, the historical transaction data includes, but is not limited to: the number of institutions that have purchased the data commodity, the cumulative number of application scenes of the data commodity, the number of historical transactions (number of purchases), the historical usage (volume), the cumulative online service call volume, the total online service call volume over the last year, the stability of historical transaction prices, and buyers' evaluation scores for the data application.
In step S420, the application behavior score is obtained by manually scoring and summing the items according to the historical transaction data.
In step S430, the indexes referenced by the preset application effect score include, but are not limited to, two data application service improvement indexes, both of which are filled in manually: the annual efficiency improvement, with a value range of 0-100%, and the annual cost reduction, in units of ten thousand yuan. The scoring data of the data application dimension is used to examine how the data product and the merchant have been evaluated.
In some embodiments, deriving the weight information from the scoring matrix comprises: carrying out data standardization processing on the scoring matrix to obtain a standardized matrix; calculating the information entropy of each evaluation dimension according to the standardized matrix; and obtaining the weight information corresponding to each evaluation dimension according to the information entropy.
Fig. 5 is a flowchart of an embodiment of step S170 in Fig. 1. As illustrated in Fig. 5, step S170 includes, but is not limited to, steps S510 to S530:
S510, carrying out data standardization processing on the scoring matrix to obtain a standardized matrix;
S520, calculating the information entropy of each evaluation dimension according to the standardized matrix;
S530, obtaining weight information corresponding to each evaluation dimension according to the information entropy.
In step S510, the data standardization processing includes, but is not limited to, normalization. Because the indexes are measured in different units, they must be standardized before being used to calculate a composite index; that is, the absolute value of each index is converted into a relative value so that heterogeneous index values become comparable. In addition, positive and negative indexes have opposite meanings: the higher a positive index the better, and the lower a negative index the better. Different algorithms are therefore needed to standardize positive indexes and negative indexes.
In steps S520 to S530, the information entropy of each index is calculated with the information entropy formula, and the weight of each index is then calculated from its information entropy to obtain the weight information corresponding to each evaluation dimension. These weights are subsequently applied to the scoring data of the seller's commodity to obtain the final evaluation result.
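Steps S510 to S530 correspond to what is commonly called the entropy weight method. The sketch below uses one widely used formulation, which is an assumption here because the description does not give the exact formulas: min-max standardization (reversed for negative indexes), column shares p_ij, entropy e_j = -(1 / ln n) × Σ p_ij × ln p_ij, and weights proportional to 1 - e_j. The function name and the small matrix are illustrative.

```python
import math

def entropy_weights(matrix, negative=()):
    """Entropy-weight method: one weight per column of the scoring matrix.

    `negative` holds the column indexes of negative indexes (lower is better).
    """
    n, m = len(matrix), len(matrix[0])
    redundancy = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            norm = [1.0] * n  # a constant column carries no information
        elif j in negative:
            norm = [(hi - x) / span for x in col]
        else:
            norm = [(x - lo) / span for x in col]
        total = sum(norm)
        shares = [v / total for v in norm]
        # Terms with a zero share contribute 0 to the entropy by convention.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        redundancy.append(max(0.0, 1.0 - entropy))
    total_redundancy = sum(redundancy)
    if total_redundancy <= 1e-12:
        return [1.0 / m] * m  # all columns constant: fall back to equal weights
    return [d / total_redundancy for d in redundancy]

# Hypothetical 3 x 2 scoring matrix: column 0 varies, column 1 is constant.
weights = entropy_weights([[1, 5], [2, 5], [3, 5]])
print(weights)
```

A dimension whose scores barely differ across commodities has entropy close to 1 and therefore receives a weight close to 0, while dimensions that discriminate between commodities receive larger weights.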
In some embodiments, performing content recommendation processing on target data according to the target keyword to obtain recommendation data related to the target data includes: acquiring a preset recommended quantity; calculating similarity information of the target data according to the target keywords; and obtaining a preset recommendation quantity of recommendation data related to the target data according to the similarity information.
Fig. 6 is a flowchart of another embodiment of step S140 in Fig. 1. As illustrated in Fig. 6, step S140 includes, but is not limited to, steps S610 to S630:
S610, acquiring a preset recommended quantity;
S620, calculating similarity information of the target data according to the target keywords;
S630, obtaining the preset recommended quantity of recommendation data related to the target data according to the similarity information.
In step S610, the preset recommended number is used to indicate the number of the searched related data products, and in a specific embodiment, the preset recommended number is 10.
In steps S620 to S630, the similarity between the target keywords and the keywords of the data products on the existing transaction platform is calculated. The calculation methods include, but are not limited to, cosine similarity, Euclidean distance, the Pearson correlation coefficient and the Spearman rank correlation coefficient. The related commodities are sorted by similarity, and the preset recommended number of commodities with the highest similarity are selected as the recommendation data, which is then used to generate the scoring matrix and calculate the weights, completing the data evaluation process.
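Steps S610 to S630 can be sketched with cosine similarity over keyword counts followed by a top-N selection. The catalog contents, the function names and the decision to drop commodities with zero similarity are illustrative assumptions; any of the other similarity measures listed above could be substituted.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two keyword lists, treated as count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target_keywords, catalog, top_n=10):
    """Names of the top_n catalog commodities most similar to the target."""
    scored = [
        (cosine_similarity(target_keywords, keywords), name)
        for name, keywords in catalog.items()
    ]
    # Highest similarity first; ties broken by name for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Assumption: commodities sharing no keyword at all are not recommended.
    return [name for score, name in scored[:top_n] if score > 0]

# Hypothetical keywords of data products already on the transaction platform.
catalog = {
    "mobile user tags": ["user", "tag", "data"],
    "e-commerce user tags": ["user", "tag", "data", "e-commerce"],
    "bank risk control": ["risk", "control"],
    "weather records": ["weather", "data"],
}
related = recommend(["user", "tag", "data"], catalog, top_n=2)
print(related)
```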
In some embodiments, the data evaluation method further comprises: and updating the historical evaluation data according to the evaluation result.
The historical evaluation data of step S150 is updated with the evaluation result of step S180, so that the evaluation data of every evaluated commodity is added to the historical evaluation data and serves as a reference for the next data evaluation. The weight information is thereby updated dynamically, which improves the accuracy of data evaluation.
In a specific embodiment, the scoring data in step S160 is shown in table 1, where table 1 illustrates the scores of the seller products in eight dimensions from dimension 1 to dimension 8, for example, the score of the seller product in dimension 1 is 70 points; the seller commodity scores 50 points in dimension 7.
                  Dimension 1  Dimension 2  Dimension 3  Dimension 4  Dimension 5  Dimension 6  Dimension 7  Dimension 8
Seller commodity  70           80           30           70           60           10           50           70
TABLE 1
The historical scores of the recommendation data are shown in Table 2, which lists the scores of the 10 related data commodities, Correlation 1 to Correlation 10, in the eight dimensions from dimension 1 to dimension 8. Each of these is a data commodity associated with the seller commodity. For example, the data commodity "Correlation 1" scores 70 points in dimension 1, and the data commodity "Correlation 2" scores 60 points in dimension 6.
                Dimension 1  Dimension 2  Dimension 3  Dimension 4  Dimension 5  Dimension 6  Dimension 7  Dimension 8
Correlation 1   70           80           60           50           50           70           50           80
Correlation 2   80           70           90           70           60           60           70           80
Correlation 3   70           70           50           50           70           60           50           80
Correlation 4   50           80           70           70           60           70           50           70
Correlation 5   30           80           80           40           30           60           70           80
Correlation 6   70           80           70           50           60           10           50           100
Correlation 7   80           70           40           80           80           60           50           80
Correlation 8   60           80           30           60           60           70           70           60
Correlation 9   90           70           70           70           70           60           90           70
Correlation 10  40           30           40           50           60           70           80           60
TABLE 2
The scoring matrix is shown in Table 3, which is obtained by combining Table 1 and Table 2. For example, the seller commodity scores 10 points in dimension 6, and the data commodity "Correlation 3" scores 60 points in dimension 6.
                  Dimension 1  Dimension 2  Dimension 3  Dimension 4  Dimension 5  Dimension 6  Dimension 7  Dimension 8
Seller commodity  70           80           30           70           60           10           50           70
Correlation 1     70           80           60           50           50           70           50           80
Correlation 2     80           70           90           70           60           60           70           80
Correlation 3     70           70           50           50           70           60           50           80
Correlation 4     50           80           70           70           60           70           50           70
Correlation 5     30           80           80           40           30           60           70           80
Correlation 6     70           80           70           50           60           10           50           100
Correlation 7     80           70           40           80           80           60           50           80
Correlation 8     60           80           30           60           60           70           70           60
Correlation 9     90           70           70           70           70           60           90           70
Correlation 10    40           30           40           50           60           70           80           60
TABLE 3
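The splicing of Table 1 and Table 2 into the scoring matrix of Table 3 can be sketched as follows. The scores are copied from the tables; the dictionary layout and the function name `build_scoring_matrix` are illustrative assumptions.

```python
# Scores of the seller commodity (Table 1) and of the related commodities (Table 2).
seller_scores = {"Seller commodity": [70, 80, 30, 70, 60, 10, 50, 70]}
historical_scores = {
    "Correlation 1": [70, 80, 60, 50, 50, 70, 50, 80],
    "Correlation 2": [80, 70, 90, 70, 60, 60, 70, 80],
    "Correlation 3": [70, 70, 50, 50, 70, 60, 50, 80],
    "Correlation 4": [50, 80, 70, 70, 60, 70, 50, 70],
    "Correlation 5": [30, 80, 80, 40, 30, 60, 70, 80],
    "Correlation 6": [70, 80, 70, 50, 60, 10, 50, 100],
    "Correlation 7": [80, 70, 40, 80, 80, 60, 50, 80],
    "Correlation 8": [60, 80, 30, 60, 60, 70, 70, 60],
    "Correlation 9": [90, 70, 70, 70, 70, 60, 90, 70],
    "Correlation 10": [40, 30, 40, 50, 60, 70, 80, 60],
}

def build_scoring_matrix(target, history):
    """Stack the target's scores on top of the historical scores (Table 3)."""
    labels = list(target) + list(history)
    rows = [*target.values(), *history.values()]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("every commodity needs a score in every dimension")
    return labels, rows

labels, matrix = build_scoring_matrix(seller_scores, historical_scores)
print(len(matrix), "rows x", len(matrix[0]), "dimensions")
```

Each column of the resulting matrix holds the scores of all eleven commodities in one dimension, which is the form the weight calculation of step S170 works on.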
In a specific embodiment, in addition to the S1 data matching dimension, the S3 data contribution dimension and the S6 data application dimension described above, the eight target evaluation dimensions also include the S2 data quality dimension, the S4 data credibility dimension, the S5 data appraisal dimension, the S7 data asset dimension and the S8 data management maturity dimension, which are described one by one below.
The data quality dimension (S2) examines the quality of the overall data and of the trial data set. Its sub-dimensions include, but are not limited to, the overall data quality and the quality of the sampled trial data. Taking overall data quality as an example, the reference indexes include, but are not limited to: overall data coverage, data dimension type, timeliness, desensitization rate, and applicability-feature dimension.
The overall data coverage examines the number of people in the customer group covered by the overall data on sale; the more people covered, the higher the score. The specific scoring rule is: a covered customer group of billions of people scores 8-10 points; a covered customer group of more than ten million people scores 5-8 points; a covered customer group of fewer than ten million people scores 3-5 points.
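The coverage rule can be sketched as a simple band lookup. In the Python sketch below, the function name, the exact threshold used for the "billions" tier (1,000,000,000), and the handling of a customer group of exactly ten million people are assumptions; the text gives only the three score bands.

```python
def coverage_score_band(covered_people):
    """Return the (low, high) score band for overall data coverage."""
    if covered_people < 0:
        raise ValueError("covered_people must be non-negative")
    if covered_people >= 1_000_000_000:   # "billions" tier, threshold assumed
        return (8, 10)
    if covered_people > 10_000_000:       # more than ten million people
        return (5, 8)
    # Ten million or fewer; the boundary case itself is an assumption.
    return (3, 5)
```

The function returns a band rather than a single value because the rule leaves the exact score within each band open.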
The data dimension type measures how scarce a data product is in the seller market; the scarcer the product dimension, the higher the score. The specific scoring rule is: if only 1-2 products of the same dimension type are sold in the seller market and the seller completely controls the price, the score is 8-10 points; if 3-10 products of the same dimension type are sold and substitutes with different prices exist, the score is 5-8 points; if more than 10 products of the same dimension type are sold and their prices are almost the same, the score is 3-5 points.
Timeliness includes time sequence, update frequency, and time-point-based timeliness. Time sequence refers to the relative temporal relationship between data elements of the same entity in a data set; the longer the time span of the stored data set sample, the higher the score, with a maximum of 5 points. Update frequency covers daily, weekly, monthly, quarterly, and yearly updates and the like; the higher the update frequency, the higher the score. For time-point-based timeliness, the shorter the data delay, the higher the score. The scoring principle for update frequency and timeliness is: real-time updating scores 5 points; T+1 scores 4 points; T+2 to T+7 scores 3 points; T+7 to T+30 scores 2 points; more than T+30 scores 1 point, where T is the updating period.
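The scoring principle for update frequency and timeliness amounts to a step function over the delay. The Python sketch below is illustrative only: the function name, the use of a day count for the n in T+n, the convention that 0 means real-time updating, and the choice to give the overlapping boundary T+7 a score of 3 are all assumptions.

```python
def timeliness_score(delay_days):
    """Map an update delay of T+delay_days to a score from 1 to 5."""
    if delay_days < 0:
        raise ValueError("delay_days must be non-negative")
    if delay_days == 0:      # real-time updating
        return 5
    if delay_days <= 1:      # T+1
        return 4
    if delay_days <= 7:      # T+2 to T+7 (T+7 assigned here by assumption)
        return 3
    if delay_days <= 30:     # after T+7, up to T+30
        return 2
    return 1                 # more than T+30
```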
The desensitization rate reflects whether the returned result of the feature data is an interval value or a specific value; the lower the desensitization rate, the higher the score. The scoring principle is: a desensitization rate below 10% scores 4-5 points; a desensitization rate of 10%-50% scores 2-3 points; a desensitization rate above 50% scores 1 point.
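A minimal Python sketch of the desensitization scoring principle follows. The function name, the representation of the rate as a fraction between 0 and 1, and the assignment of exactly 10% and exactly 50% to the middle band are assumptions.

```python
def desensitization_score_band(rate):
    """Return the (low, high) score band for a desensitization rate in [0, 1]."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.10:
        return (4, 5)
    if rate <= 0.50:
        return (2, 3)
    return (1, 1)
```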
The applicability-feature dimension index examines the feature dimensions covered by the data.
The data credit dimension (S4) considers the seller enterprise itself and data notarization. Its sub-dimensions include, but are not limited to, enterprise credit and data notarization. Taking data notarization as an example, the reference indexes include, but are not limited to: whether the legal person is admitted, whether the data is admitted, whether the contract is notarized, model auditing, evidence storage status, and the like. The data notarization sub-dimension examines the compliance and qualification of the enterprise legal person, the data, and so on, and converts the results filled in by the client into a score by accessing the data registration platform owned by the exchange.
The data evaluation dimension (S5) considers the evaluations of the data products and the merchants. Its sub-dimensions include data product evaluation, merchant evaluation, and the like. Data product evaluation includes, but is not limited to, data practicability, price rationality, data consistency, data timeliness, data authenticity, the comprehensive historical evaluation score of the commodity, and the score of the commodity evaluation in related categories. Merchant evaluation includes, but is not limited to, merchant response timeliness, merchant demand matching, merchant service capability, and merchant credit. The scoring is performed manually.
The data asset dimension (S7) considers the asset condition of the data commodity. Its sub-dimensions include risk, quality, cost, and application. The indexes considered for risk include compliance, regionality, and security; the indexes considered for quality include accuracy, authenticity, and completeness; the indexes considered for cost include storage, processing, and operation and maintenance; and the indexes considered for application include scenario applicability, timeliness, scarcity, and multidimensionality. The scoring is performed manually.
The data management maturity dimension (S8) considers the maturity of data management. Its sub-dimensions include data strategy, data governance, data architecture, data application, data security, data quality, data standards, and data life cycle. The scoring is performed manually.
An embodiment of the present application further provides a data evaluation device that can implement the data evaluation method described above. The device includes: an acquisition module, used for acquiring target data; an evaluation module, used for evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, the number of the target evaluation dimensions being at least two; a natural language processing module, used for performing natural language processing on the target data to obtain target keywords; a recommendation module, used for performing content recommendation processing on the target data according to the target keywords to obtain recommendation data related to the target data; a searching module, used for searching preset historical evaluation data according to the recommendation data to obtain historical scores of the recommendation data; a matrix module, used for obtaining a scoring matrix according to the historical scores and the scoring data; a weight module, used for obtaining weight information according to the scoring matrix; and an evaluation result module, used for obtaining an evaluation result according to the weight information and the scoring data.
The specific implementation of the data evaluation apparatus of this embodiment is substantially the same as the specific implementation of the data evaluation method, and is not described herein again.
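The module structure of the device can be pictured as a chain of pluggable steps. The Python sketch below is a hypothetical wiring only: the class name `DataEvaluationDevice`, the callable signatures, the dictionary used as historical evaluation data, the stub lambdas, and the weighted-sum evaluation result are all assumptions and do not reflect a confirmed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Scores = List[float]


@dataclass
class DataEvaluationDevice:
    """Hypothetical wiring of the modules; each field stands in for one module."""
    evaluate: Callable[[str], Scores]               # evaluation module
    extract_keywords: Callable[[str], List[str]]    # natural language processing module
    recommend: Callable[[List[str]], List[str]]     # recommendation module
    history: Dict[str, Scores]                      # preset historical evaluation data
    derive_weights: Callable[[List[Scores]], Scores]  # weight module

    def run(self, target_data: str) -> float:
        scoring_data = self.evaluate(target_data)
        keywords = self.extract_keywords(target_data)
        recommended = self.recommend(keywords)
        # Searching module: look up historical scores of the recommended items.
        historical = [self.history[item] for item in recommended
                      if item in self.history]
        # Matrix module: target row first, historical rows after it.
        matrix = [scoring_data] + historical
        weights = self.derive_weights(matrix)
        # Evaluation result module: weighted sum (an assumption).
        return sum(w * s for w, s in zip(weights, scoring_data))


# Usage with stub modules and two evaluation dimensions.
device = DataEvaluationDevice(
    evaluate=lambda data: [70.0, 80.0],
    extract_keywords=lambda data: data.lower().split(),
    recommend=lambda keywords: ["Correlation 1", "Correlation 2"],
    history={"Correlation 1": [70.0, 80.0], "Correlation 2": [80.0, 70.0]},
    derive_weights=lambda matrix: [1.0 / len(matrix[0])] * len(matrix[0]),
)
result = device.run("credit risk data set")
```

Passing each module in as a callable keeps the sketch independent of any particular evaluation model, keyword extractor, or recommendation algorithm.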
An embodiment of the present application further provides an electronic device, including:
at least one memory;
at least one processor;
at least one program;
the at least one program is stored in the memory, and the processor executes the at least one program to implement the data evaluation method described above. The electronic device may be any intelligent terminal, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer, and the like.
Referring to fig. 7, fig. 7 illustrates a hardware structure of an electronic device according to another embodiment, where the electronic device includes:
the processor 701 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiment of the present application;
the memory 702 may be implemented in a ROM (read only memory), a static memory device, a dynamic memory device, or a RAM (random access memory). The memory 702 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present disclosure is implemented by software or firmware, the relevant program codes are stored in the memory 702 and called by the processor 701 to execute the data evaluation method according to the embodiments of the present disclosure;
an input/output interface 703 for realizing information input and output;
the communication interface 704 is used for realizing communication interaction between the device and other devices, and can realize communication in a wired manner (for example, USB, network cable, etc.) or in a wireless manner (for example, mobile network, Wi-Fi, Bluetooth, etc.); and
a bus 705 that transfers information between the various components of the device (e.g., the processor 701, the memory 702, the input/output interface 703, and the communication interface 704);
wherein the processor 701, the memory 702, the input/output interface 703 and the communication interface 704 are communicatively connected to each other within the device via a bus 705.
The embodiment of the application also provides a storage medium which is a computer-readable storage medium, and the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used for causing a computer to execute the data evaluation method.
According to the data evaluation method, the data evaluation device, the electronic device, and the storage medium of the embodiments of the present application, the target data is evaluated by a preset data evaluation model to obtain scoring data of the target evaluation dimensions; natural language processing is performed on the target data to obtain target keywords; content recommendation processing is performed on the target data according to the target keywords to obtain recommendation data related to the target data; preset historical evaluation data is searched according to the recommendation data to obtain historical scores of the recommendation data; a scoring matrix is obtained by combining the historical scores with the scoring data of the target data; weight information is obtained from the scoring matrix; and an evaluation result is calculated from the weight information and the scoring data. This completes the evaluation of the target data and improves evaluation efficiency and accuracy.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described herein are intended to illustrate the technical solutions of the embodiments of the present application more clearly and do not limit them. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 are not intended to limit the embodiments of the present application and may include more or fewer steps than those shown, or some of the steps may be combined, or different steps may be included.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and the scope of the claims of the embodiments of the present application is not limited thereto. Any modifications, equivalents and improvements that may occur to those skilled in the art without departing from the scope and spirit of the embodiments of the present application are intended to be within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A data evaluation method, comprising:
acquiring target data;
evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, wherein the number of the target evaluation dimensions is at least two;
performing natural language processing on the target data to obtain target keywords;
performing content recommendation processing on the target data according to the target keyword to obtain recommendation data related to the target data;
searching in preset historical evaluation data according to the recommendation data to obtain a historical score of the recommendation data;
obtaining a scoring matrix according to the historical scoring and the scoring data;
obtaining weight information according to the scoring matrix;
and obtaining an evaluation result according to the weight information and the scoring data.
2. The data evaluation method of claim 1, wherein the target data comprises full data, the target evaluation dimension comprises a data matching dimension, and the evaluation of the target data according to a preset data evaluation model to obtain scoring data of the target evaluation dimension comprises:
extracting the full data according to a preset sample extraction rule to obtain sample data;
and evaluating the data matching degree according to the full data and the sample data to obtain the grading data of the target data in the data matching dimension.
3. The data evaluation method of claim 1, wherein the evaluation dimension comprises a data contribution dimension, and the evaluation processing of the target data according to a preset data evaluation model to obtain the scoring data of the target evaluation dimension comprises:
obtaining a model sub-dimension score and a user use feedback score;
performing variable characteristic analysis on the target data to obtain information quantity corresponding to each variable characteristic;
obtaining a feature sub-dimension score according to the information quantity corresponding to each variable feature;
and obtaining the scoring data of the target data in the data contribution dimension according to the model sub-dimension score, the user use feedback score and the feature sub-dimension score.
4. The data evaluation method of claim 1, wherein the evaluation dimension comprises a data application dimension, and the evaluation processing of the target data according to a preset data evaluation model to obtain the scoring data of the target evaluation dimension comprises:
acquiring historical transaction data;
obtaining an application behavior score according to the historical transaction data;
and obtaining the scoring data of the target data in the data application dimension according to the application behavior score and a preset application effect score.
5. The data evaluation method of claim 1, wherein the deriving weight information from the scoring matrix comprises:
carrying out data standardization processing on the scoring matrix to obtain a standardized matrix;
calculating the information entropy of each evaluation dimension according to the standardized matrix;
and obtaining the weight information corresponding to each evaluation dimension according to the information entropy.
6. The data evaluation method of claim 1, wherein the performing content recommendation processing on the target data according to the target keyword to obtain recommendation data related to the target data comprises:
acquiring a preset recommended quantity;
calculating similarity information of the target data according to the target keywords;
and obtaining the preset recommendation quantity of the recommendation data related to the target data according to the similarity information.
7. The data evaluation method according to any one of claims 1 to 6, characterized by further comprising:
and updating the historical evaluation data according to the evaluation result.
8. A data evaluation device, comprising:
the acquisition module is used for acquiring target data;
the evaluation module is used for evaluating the target data according to a preset data evaluation model to obtain scoring data of target evaluation dimensions, and the number of the target evaluation dimensions is at least two;
the natural language processing module is used for carrying out natural language processing on the target data to obtain target keywords;
the recommending module is used for recommending the content of the target data according to the target keyword to obtain recommended data related to the target data;
the searching module is used for searching in preset historical evaluation data according to the recommendation data to obtain the historical score of the recommendation data;
the matrix module is used for obtaining a scoring matrix according to the historical scoring and the scoring data;
the weight module is used for obtaining weight information according to the scoring matrix;
and the evaluation result module is used for obtaining an evaluation result according to the weight information and the scoring data.
9. An electronic device, comprising:
at least one memory;
at least one processor;
at least one program;
the at least one program is stored in the at least one memory, and the at least one processor executes the at least one program to implement:
the method of any one of claims 1 to 7.
10. A storage medium that is a computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform:
the method of any one of claims 1 to 7.
CN202111436315.7A 2021-11-29 2021-11-29 Data evaluation method and device, electronic equipment and storage medium Pending CN114266443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111436315.7A CN114266443A (en) 2021-11-29 2021-11-29 Data evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114266443A true CN114266443A (en) 2022-04-01

Family

ID=80825857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111436315.7A Pending CN114266443A (en) 2021-11-29 2021-11-29 Data evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114266443A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297367A (en) * 2022-07-06 2022-11-04 北京快乐茄信息技术有限公司 Recommendation method, recommendation device, electronic equipment and storage medium
CN115297367B (en) * 2022-07-06 2024-02-09 北京快乐茄信息技术有限公司 Recommendation method, recommendation device, electronic equipment and storage medium
CN116955736A (en) * 2023-09-15 2023-10-27 北京南天智联信息科技股份有限公司 Data constraint condition recommendation method and system in data standard
CN116955736B (en) * 2023-09-15 2023-12-01 北京南天智联信息科技股份有限公司 Data constraint condition recommendation method and system in data standard
CN117093596A (en) * 2023-10-12 2023-11-21 北京固加数字科技有限公司 Bond transaction data collecting and processing system
CN117093596B (en) * 2023-10-12 2024-01-12 北京固加数字科技有限公司 Bond transaction data collecting and processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination