CN111539562A - Data evaluation method and system based on model - Google Patents

Data evaluation method and system based on model

Info

Publication number
CN111539562A
CN111539562A (Application CN202010280896.9A)
Authority
CN
China
Prior art keywords
data
reliability
report
days
target value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010280896.9A
Other languages
Chinese (zh)
Inventor
陈召群
谢理达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010280896.9A priority Critical patent/CN111539562A/en
Publication of CN111539562A publication Critical patent/CN111539562A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a model-based data evaluation method and system. The method includes determining a predicted reliability of each of all reports published by all data publishers for all objects, summarizing the predicted reliability of each data publisher for all individual objects based on the predicted reliability of each report, and ranking the reliability of all reports for each of all individual objects based on the predicted reliability of the data publishers for all individual objects.

Description

Data evaluation method and system based on model
Technical Field
The present disclosure relates generally to data information processing, and more particularly to a model-based data evaluation method and system.
Background
Data publishers, such as various data research organizations and the like, typically publish or predict a wide variety of data for various businesses that consumers or users typically employ in making purchasing decisions or other decisions.
However, there is currently no method for evaluating data publishers or the prediction data they publish, so it is difficult for users to screen the most important and most valuable reports out of a large number of data publication or prediction reports, which greatly reduces users' decision-making efficiency and degrades the user experience.
In addition, for any given object there are often many data publishers that publish or predict data values for it, and each data publisher may publish multiple research reports on different objects. Moreover, each data publisher has industries and data types it is particularly good at researching, and conventional research reports neither show the overall research capability of a data publisher nor reflect its prediction or research ability for a specific object.
Accordingly, the present disclosure provides a model-based data evaluation method and system to solve the problems of the prior art.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In view of the shortcomings of the prior art, the present disclosure examines the object research reports (each including predicted values for one or more data items of an object) issued by data publishers, evaluates the prediction reliability of each report, summarizes a data publisher's prediction reliability for a given object to evaluate that publisher's research or prediction level for the object, and ranks the research reports based on these evaluations. Reports with higher reference value are thereby ranked ahead, so that users can obtain more valuable data predictions or research reports.
In one embodiment of the present disclosure, a method for model-based data evaluation is provided, the method comprising:
determining the prediction reliability of each report among all reports issued by all data publishers for all objects;
summarizing the predicted reliability of each data publisher on all the single objects based on the predicted reliability of each report; and
all the reports for each of all the individual objects are reliability ranked based on their data publisher predicted reliability.
In one embodiment of the disclosure, a data evaluation system is provided and includes a report reliability evaluation module, a data publisher reliability evaluation module, and a report sorting module.
The report reliability evaluation module determines the predicted reliability of each report among all reports issued by all data publishers for all objects, and transmits the determined predicted reliability of each report to the data publisher reliability evaluation module.
The data publisher reliability evaluation module summarizes the predicted reliability of each data publisher for each individual object based on the predicted reliability of each report, and transmits the summarized predicted reliability to the report ranking module.
A report ranking module ranks all reports for each of all individual objects for reliability based on their predicted reliability by data publishers.
Other aspects, features and embodiments of the disclosure will become apparent to those ordinarily skilled in the art upon review of the following description of specific exemplary embodiments of the disclosure in conjunction with the accompanying figures. While features of the disclosure may be discussed below with respect to certain embodiments and figures, all embodiments of the disclosure may include one or more of the advantageous features discussed herein. In other words, while one or more embodiments may have been discussed as having certain advantageous features, one or more of such features may also be used in accordance with the various embodiments of the present disclosure discussed herein. In a similar manner, although example embodiments may be discussed below as device, system, or method embodiments, it should be appreciated that such example embodiments may be implemented in a variety of devices, systems, and methods.
Drawings
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to aspects, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only certain typical aspects of this disclosure and are therefore not to be considered limiting of its scope, for the description may admit to other equally effective aspects.
FIG. 1 illustrates an application environment in which embodiments of the present disclosure may be implemented.
FIG. 2 illustrates a block diagram of a data profiling application, according to one embodiment of the present disclosure.
FIG. 3 illustrates a block diagram of a report reliability evaluation module according to one embodiment of the present disclosure.
FIG. 4 illustrates a block diagram of a data publisher reliability evaluation module, according to one embodiment of the present disclosure.
FIG. 5 shows a block diagram of a report ranking module according to one embodiment of the present disclosure.
FIG. 6 illustrates a flow diagram of a method for model-based data profiling according to one embodiment of the present disclosure.
Detailed Description
Various embodiments will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary embodiments. Embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of these embodiments to those skilled in the art. Embodiments may be implemented as a method, system or device. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
FIG. 1 illustrates an application environment 100 in which embodiments of the present disclosure may be implemented.
In environment 100, a user 102 uses a computing device to obtain predictions or reports of data published by various data publishers via a server 104. These prediction data are typically stored in data store 106. Additionally, the data store 106 also stores real data (daily updates) of the predicted objects.
The computing device may be a desktop computing device. A desktop computing device includes at least one processing unit and system memory. The system memory may include an operating system and one or more programming modules. The one or more programming modules may include an application module installed on a desktop computing device for obtaining data reviews (such as the data profiling application 200 described in detail below). When executed on a processing unit, the programming modules may perform various processes, including operations relating to methods as described below.
The computing device may also be a mobile computing device, by way of example and not limitation, a smartphone, tablet device, or the like. The mobile computing device is a handheld computer having both input elements and output elements. The input elements may include a touch screen display and input buttons that allow a user to input information into the mobile computing device. The mobile computing device may also incorporate optional side input elements that allow further user input. The optional side input element may be a rotary switch, a button, or any other type of manual input element. The mobile computing device may also include at least one processing unit and a system memory. The system memory may include an operating system and one or more programming modules. The one or more programming modules may include an application module installed on the mobile computing device for obtaining data reviews (such as the data profiling application 200 described in detail below). When executed on a processing unit, the programming modules may perform various processes, including operations relating to methods as described below.
According to one embodiment of the disclosure, the server 104 includes a processor coupled to volatile memory and a large capacity nonvolatile memory (such as a disk drive). The server 104 may also include a floppy disk drive, Compact Disk (CD) or DVD disk drive coupled to the processor. The server 104 may also include a network access port coupled to the processor for establishing a data connection with a network, such as a local area network coupled to other broadcast system computers and servers or to the internet.
As can be appreciated by those skilled in the art, the various steps and functional modules of the model-based data profiling methods and systems provided in the present disclosure may be implemented on the server 104, or may be implemented on both the server 104 and the computing devices of the user 102 (including desktop computing devices and mobile computing devices).
Data publishers, such as various data research organizations and the like, typically publish or predict a wide variety of data for various businesses that consumers or users typically employ in making purchasing decisions or other decisions. The data publisher may predict the data value or trend, such as rise and fall and target value, of a particular piece of data for an object.
In one embodiment of the disclosure, the objects may be, by way of example and not limitation, social network personalities, and the data publisher may predict the daily visits, clicks, attention, etc. of a personality on the social network. In another embodiment of the disclosure, the object may be, by way of example and not limitation, an advertisement, and the data publisher may predict the daily visits and exposure of a certain advertisement after it is placed on a certain platform. In yet another embodiment of the disclosure, by way of example and not limitation, the object may be a stock, and the data publisher may predict the trend of a certain stock. Thus, users (such as social network users, advertising publishers, and data publisher users) can use the data items published or predicted by the data publishers as the basis for their next action. As can be appreciated by those skilled in the art, an object herein can be any other object or entity.
The data reports published by data publishers generally consist of a title, publication time, publisher name, body, and conclusion; in the conclusion, the data publisher typically gives a rating, a predicted target value, and a target value achievement time. The rating is a general description of whether the data publisher is optimistic about the predicted object's future, and can generally be classified into three categories: "positive", "neutral", and "negative". The target value is the data publisher's prediction of the object's future data, and it has reference value only in combination with the target value achievement time; it is thus a quantitative prediction or evaluation of the object's prospects by the data publisher.
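To make the report structure just described concrete, the following minimal Python sketch models these fields as a simple data record. The class name and field names are illustrative assumptions for this description, not identifiers defined anywhere in the disclosure.

```python
# Minimal sketch of an object report record; names are illustrative assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class ObjectReport:
    report_id: int
    title: str
    publish_date: date      # report publication time
    publisher: str          # data publisher name or code
    object_code: str        # the object the report studies
    rating: str             # e.g. "positive", "neutral", "negative"
    target_value: float     # predicted target value
    target_date: date       # target value achievement time

# Example instance mirroring the sample report on object A used later in this description.
example_report = ObjectReport(
    report_id=1,
    title="Outlook for object A",
    publish_date=date(2017, 6, 30),
    publisher="PubA",
    object_code="ObjA",
    rating="positive",
    target_value=2_000_000,
    target_date=date(2017, 9, 30),
)
```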
The present disclosure provides a novel data prediction reliability evaluation method that solves the problem that users cannot distinguish the value of reports when reading data research reports.
The above aspects of the present disclosure will be described in detail below by means of various block diagrams and method flow diagrams.
FIG. 2 illustrates a block diagram of a data profiling application 200 according to one embodiment of the present disclosure.
The functionality of the profiling application 200 may be implemented on the server 104 in FIG. 1, or may be implemented on both the server 104 and the computing devices of the users 102. The data needed by the data profiling application 200 may be stored in the data store 106 in FIG. 1.
As shown in FIG. 2, the data profiling application 200 includes a report reliability evaluation module 202, a data publisher reliability evaluation module 204, a report ranking module 206, and a data processing model 208. The data processing model 208 includes a data analysis layer 210, a data summarization layer 212, and a data ordering layer 214. The functions of these three layers in the data processing model 208 can be implemented by the report reliability evaluation module 202, the data publisher reliability evaluation module 204, and the report ranking module 206 to realize the functions disclosed in the embodiments of the present disclosure. These functions are described in detail below.
In particular, the report reliability evaluation module 202 may be configured to calculate a reliability score for every report prediction. The report reliability evaluation module receives two inputs and produces one output, introduced as follows:
inputting data one: the object report data is prediction data published by a data publisher. Each study is distinguished by a study id. The report data may include the release time, the investigated subject, the data release person, the target value of the data, the achievement time of the target value, and other critical data included in the subject report described above.
By way of example and not limitation, a report about an object A (e.g., a social network personality) published by data publisher A on June 30, 2017 includes two important data items: a rating (increase, hold, decrease) of the degree of interest in the personality (i.e., follower count), and a target value of 2 million followers reached within 3 months. A report about object B (e.g., an advertisement) published by data publisher B on June 13, 2017 includes a rating (good, average, poor) for advertisement B and a target value of 1 million clicks (i.e., exposures) reached within 6 months. A report about object C (e.g., a stock) published by data publisher C on May 30, 2017 includes a rating (buy, hold, sell) for object C and a target value of 15.5 yuan reached within 3 months, and so on.
Input data 2: object real data, i.e., all the real data of every object covered by the reports issued by all data publishers, for each day between the report publication time and the target value achievement time.
By way of example and not limitation: the follower count of social network personality A for each day in the three months from June 30, 2017 to September 30, 2017; the click volume of advertisement B for each day in the six months from June 13, 2017 to December 13, 2017; and the closing price of stock C for each day in the three months from May 30, 2017 to August 30, 2017; and so on.
The two input data described above may be stored in the data store 106 of FIG. 1, but those skilled in the art will appreciate that these data may be stored in databases other than the data store 106 of FIG. 1, and that the data store 106 in this disclosure is by way of example only and not by way of limitation.
With these two inputs, the report reliability evaluation module 202 may score the reliability of the report predictions. In this step, the following intermediate variables are also introduced and calculated:
Observation period: the time window between the report publication time and the target value achievement time. In the above example, by way of example and not limitation, the observation period of the report on object A published by data publisher A is from June 30, 2017 to September 30, 2017.
Observation days: the number of days in the observation period. In the above example, by way of example and not limitation, the number of observation days for object A is 92 days.
Initial value: the object's real data value on the day the report is published. In the above example, by way of example and not limitation, the follower count of object A on June 30, 2017.
Actual number of days to reach the target value: the number of days taken for the real value to reach the target value after the report is published. If the initial value is less than the target value, this is the number of days until the real value is first greater than or equal to the target value; if the initial value is greater than the target value, it is the number of days until the real value is first less than or equal to the target value; if the initial value equals the target value, the report has made no informative prediction about the object's future data, and this term is 0.
Days co-trending with the target value or rating: if the initial value is less than the target value or the rating is positive, indicating that the report predicts the object's data will develop favorably, this is the number of days in the observation period on which the real value is higher than the initial value; if the initial value is greater than the target value or the rating is negative, indicating that the report predicts the object's data will develop unfavorably, it is the number of days on which the real value is lower than the initial value; if the initial value equals the target value, no informative prediction about the future data trend has been made and this term is 0.
Prediction deviation degree: the rate of change of the target value relative to the initial value, calculated according to the following formula:
[Equation (1), shown as an image in the original: prediction deviation degree]
where abs denotes the absolute value, e.g., abs(1) = 1 and abs(-1) = 1, and min denotes taking the minimum, e.g., min(1, 2) = 1. The prediction deviation degree ranges from 0 to 1; the smaller the deviation, the more conservative the prediction, and the larger the deviation, the more aggressive the prediction.
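The following Python sketch illustrates how these intermediate variables could be computed from a report and the object's daily real values. Equation (1) is only available as an image in the original, so the prediction deviation below, min(abs(target - initial) / abs(initial), 1), is just one plausible reading of "the rate of change of the target value relative to the initial value" that uses abs and min and stays within 0 to 1; treat it, and all function names, as assumptions rather than the patented formula.

```python
# Hedged sketch of the intermediate variables; the deviation formula is an assumption.
from datetime import date, timedelta
from typing import Dict, Optional

def observation_days(publish_date: date, target_date: date) -> int:
    """Number of days in the observation period (publication date to target date)."""
    return (target_date - publish_date).days

def actual_days_to_target(daily_values: Dict[date, float], publish_date: date,
                          target_date: date, initial: float, target: float) -> Optional[int]:
    """Days taken after publication for the real value to first reach the target,
    or None if it never does within the observation period."""
    if initial == target:                       # no informative prediction
        return 0
    d = publish_date
    while d <= target_date:
        v = daily_values.get(d)
        if v is not None:
            if (initial < target and v >= target) or (initial > target and v <= target):
                return (d - publish_date).days
        d += timedelta(days=1)
    return None

def days_co_trending(daily_values: Dict[date, float], publish_date: date,
                     target_date: date, initial: float, target: float) -> int:
    """Days in the observation period on which the real value moved in the predicted
    direction: above the initial value for an optimistic prediction, below it otherwise."""
    if initial == target:
        return 0
    optimistic = initial < target
    count, d = 0, publish_date
    while d <= target_date:
        v = daily_values.get(d)
        if v is not None and ((optimistic and v > initial) or (not optimistic and v < initial)):
            count += 1
        d += timedelta(days=1)
    return count

def prediction_deviation(initial: float, target: float) -> float:
    """Assumed form of equation (1): relative change of the target, capped at 1."""
    if initial == 0:
        return 1.0
    return min(abs(target - initial) / abs(initial), 1.0)
```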
Then, the reliability score of the report is calculated from these intermediate variables as follows:
if the actual value of the object in the observation period reaches the target value, then:
[Equation (2), shown as an image in the original: reliability score when the target value is reached within the observation period]
where the variables a and b can be set according to actual needs. In one embodiment of the present disclosure, by way of example and not limitation, a = 60 and b = 40; other values may be employed in other embodiments of the present disclosure.
If the actual value of the object in the observation period does not reach the target value, then:
[Equation (3), shown as an image in the original: reliability score when the target value is not reached within the observation period]
where the variable a can be set according to actual needs. In one embodiment of the present disclosure, by way of example and not limitation, a = 60; other values may be employed in other embodiments of the present disclosure.
In formulas (2) and (3) above, the reliability score of a report ranges from 0 to 100. If the target value is reached within the observation window, the report receives the base score of 60, and the additional score of up to 40 depends on the prediction deviation degree and on the number of days taken to reach the target value: the larger the deviation and the faster the target is reached, the higher the additional score; the smaller the deviation and the slower the target is reached, the lower the additional score. If the target value is not reached within the observation window, the maximum score is 60: the more days the real value trends in the same direction as the target value or rating, the higher the score, and the fewer such days, the lower the score.
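Equations (2) and (3) are likewise only images in the original, so the scoring function below is a hedged sketch whose exact functional forms are assumptions chosen to match the prose: a base score of a = 60 when the target value is reached, plus an additional score of up to b = 40 that grows with the prediction deviation degree and with how quickly the target is reached; when the target is not reached, the score is capped at 60 and grows with the share of co-trending days. The function name and signature are illustrative.

```python
# Hedged sketch of the report reliability score; the forms of equations (2)/(3) are assumptions.
def report_reliability_score(target_reached: bool, deviation: float,
                             days_to_target: int, co_trending_days: int,
                             obs_days: int, a: float = 60.0, b: float = 40.0) -> float:
    if obs_days <= 0:
        return 0.0
    if target_reached:
        # Assumed equation (2): faster attainment and larger deviation earn a bigger bonus.
        speed = max(0.0, 1.0 - days_to_target / obs_days)
        return a + b * deviation * speed
    # Assumed equation (3): capped at the base score, scaled by co-trending days.
    return a * co_trending_days / obs_days

# Example: target reached after 30 of 92 observation days with deviation 0.6.
print(round(report_reliability_score(True, 0.6, 30, 0, 92), 1))   # about 76.2
```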
Then, by combining the object real data, i.e., the real data of every object covered by the reports for each day between the report publication time and the target value achievement time, the report reliability evaluation module 202 can use equations (1) to (3) above to calculate the predicted reliability scores of all reports issued by all data publishers for all objects.
By way of example and not limitation: the reliability score of the report on object A published by data publisher A on June 30, 2017 is 90 points; the report on object A published by data publisher A on November 30, 2017 scores 80 points; the report on object B published by data publisher B on June 13, 2017 scores 70 points; the report on object B published by data publisher B on October 13, 2017 scores 80 points; the report on object C published by data publisher C on May 30, 2017 scores 70 points; the report on object C published by data publisher C on January 30, 2018 scores 70 points; and so on.
As can be seen from the above examples, the report reliability evaluation module 202 can calculate the prediction reliability scores of all reports issued by all data publishers for all objects using equations (1) to (3), given the object real data and the report data.
After the report reliability evaluation module 202 has calculated the predicted reliability scores of all reports issued by all data publishers for all objects using equations (1) to (3) from the object real data and the report data, it transmits these scores to the data publisher reliability evaluation module 204.
The data publisher reliability evaluation module 204 may be configured to aggregate each data publisher's predicted reliability for every individual object based on the predicted reliability of each report. In particular, the data publisher reliability evaluation module 204 determines a predicted reliability score of a data publisher for a given object (e.g., object A). This step involves one input: the predicted reliability scores of all reports published by all data publishers for all objects, from the report reliability evaluation module 202.
In other words, from the predicted reliability score data of all reports published by all data publishers for all objects received from the report reliability evaluation module 202, the data publisher reliability evaluation module 204 screens out and aggregates each data publisher's reports on each single object.
By way of example and not limitation, the data publisher reliability evaluation module 204 screens out three reports with scores of 90, 60, and 75 issued by the data publisher a for the object a, two reports with scores of 80 and 60 issued by the data publisher B for the object a, two reports with scores of 70 and 80 issued by the data publisher C for the object B, and three reports with scores of 50, 70, and 90 issued by the data publisher D for the object C, respectively, and so on.
Subsequently, the data publisher reliability evaluation module 204 scores the overall predicted reliability of each data publisher for the object according to the predicted reliability scoring data of each data publisher for all the reports of all the single objects. This step involves introducing and calculating the following intermediate variables:
Report freshness: the closer the report's publication time is to the current time, the higher its freshness, and vice versa. The specific calculation formula is as follows:
[Equation (4), shown as an image in the original: report freshness]
where the variable c can be set according to actual needs. In one embodiment of the present disclosure, by way of example and not limitation, c = 130. In other embodiments of the present disclosure, other values may also be employed.
Here c is an attenuation coefficient, meaning that freshness has dropped to half a particular number of days after the report is published. Report freshness is introduced to reflect a data publisher's current research and judgment capability for a given object and to down-weight its past capability, because an object's performance can differ across periods and the market environment can change rapidly. In one embodiment of the present disclosure, by way of example and not limitation, the particular number of days may be 90 days; other numbers of days may be employed in other embodiments.
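Equation (4) is also an image in the original; the sketch below uses an exponential decay exp(-days / c), which with the example attenuation coefficient c = 130 gives a freshness of roughly 0.5 about 90 days after publication, consistent with the description above. This decay form and the function name are assumptions, not the patented formula.

```python
# Hedged sketch of report freshness; the exponential form is an assumption.
import math

def report_freshness(days_since_publication: int, c: float = 130.0) -> float:
    return math.exp(-max(days_since_publication, 0) / c)

print(round(report_freshness(0), 2))    # 1.0, a report published today
print(round(report_freshness(90), 2))   # about 0.5 after roughly 90 days
```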
The data publisher reliability evaluation module 204 then calculates a data publisher predicted reliability score for the single object according to the following equation:
[Equation (5), shown as an image in the original: the data publisher's predicted reliability score for a single object]
where N denotes the number of the data publisher's reports on the object. As the formula shows, the data publisher's predicted reliability score for the object is the average of the product of freshness and reliability score over all of that publisher's reports on the object.
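Since equation (5) is described in words as the average of freshness times reliability score over the publisher's N reports on the object, it can be sketched directly as below; the function and parameter names are illustrative assumptions.

```python
# Sketch of equation (5) as described: mean of freshness * reliability score.
from typing import List, Tuple

def publisher_object_score(reports: List[Tuple[float, float]]) -> float:
    """reports: (freshness, reliability_score) pairs for one publisher-object pair."""
    if not reports:
        return 0.0
    return sum(freshness * score for freshness, score in reports) / len(reports)

# Example: three reports by one publisher on one object.
print(publisher_object_score([(1.0, 90.0), (0.7, 60.0), (0.5, 75.0)]))  # 56.5
```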
By way of example and not limitation, the reliability score for three reports published by data publisher a for object a is 75 points, the reliability score for two reports published by data publisher B for object a is 70 points, the reliability score for three reports published by data publisher C for object B is 75 points, the reliability score for three reports published by data publisher D for object C is 70 points, and so on.
The data publisher reliability evaluation module 204 then outputs and passes the data publisher prediction reliability scores for all individual objects to the report ranking module 206.
The report ranking module 206 may be used to rank all reports for each individual object by reliability, based on the data publishers' predicted reliability for that object. This step involves two inputs and one output, as follows:
Input 1: the predicted reliability scores of all data publishers for all individual objects, from the data publisher reliability evaluation module 204.
Inputting a second: the current report data for the subject, this input being the subject report data presented to the user via the user's computing device. For example, if the user is interested in object A and chooses to view the report for object A, then this input is the report data about object A that all data publishers have published.
Subsequently, the report ranking module 206 ranks the report data based on the predicted reliability scores of all data publishers for the individual object, received from the data publisher reliability evaluation module 204, assigning the highest priority to the highest-ranked report and presenting the report of the data publisher with the highest predicted reliability to the user.
In one embodiment of the present disclosure, the higher a data publisher's predicted reliability score, the higher the ranking of the reports it has written, and vice versa. If a data publisher has written multiple reports on an object at different times, the more recently published report is ranked higher, and vice versa. In another embodiment of the present disclosure, if a data publisher has written multiple reports on an object at different times, the report with the higher report reliability score (from the report reliability evaluation module 202) is ranked higher, and vice versa. As will be appreciated by one skilled in the art, the above ordering rules are merely exemplary and do not limit the scope of the disclosure.
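The first ranking rule above (publishers ordered by their predicted reliability for the object, and a publisher's own reports ordered by recency) can be sketched as follows; the dictionary keys and function name are illustrative assumptions, and the second rule, ordering by per-report score, would simply swap the secondary sort key.

```python
# Hedged sketch of the ranking rule: publisher score first, then recency within a publisher.
from datetime import date
from typing import Dict, List

def rank_reports(reports: List[dict], publisher_scores: Dict[str, float]) -> List[dict]:
    """reports: dicts with 'publisher' and 'publish_date' keys, all about one object."""
    return sorted(
        reports,
        key=lambda r: (-publisher_scores.get(r["publisher"], 0.0),
                       -r["publish_date"].toordinal()),
    )

ranked = rank_reports(
    [{"title": "Title 1", "publisher": "PubA", "publish_date": date(2017, 6, 30)},
     {"title": "Title 2", "publisher": "PubB", "publish_date": date(2017, 6, 13)},
     {"title": "Title 3", "publisher": "PubA", "publish_date": date(2017, 5, 30)}],
    {"PubA": 75.0, "PubB": 70.0},
)
print([r["title"] for r in ranked])  # ['Title 1', 'Title 3', 'Title 2']
```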
The various modules in the data profiling application 200 are described further below.
FIG. 3 illustrates a block diagram of a report reliability evaluation module 302, according to one embodiment of the present disclosure.
In particular, the report reliability evaluation module 302 may be configured to calculate a reliability score for a single report prediction. The report reliability evaluation module receives two inputs and produces one output, introduced as follows:
inputting data one: the object report data is prediction data published by a data publisher. Each study is distinguished by a study id. The report data may include the key data of the subject reports described above including the time of release, the subject being studied, the publisher of the release report data, and the ratings, target values, time to achieve the target values, etc.
By way of example and not limitation: a report about object A published by data publisher A on June 30, 2017 includes two important data items, an "increase" rating for object A and a target value of 2 million users reached within 3 months; a report about object B published by data publisher B on June 13, 2017 includes a favorable rating for object B and a target value of 14.5 yuan reached within 6 months; and a report about object C published by data publisher C on May 30, 2017 includes a "buy" rating for object C and a target value of 15.5 yuan reached within 3 months; and so on, as shown in Table 1 below (by way of example only):
[Table 1, shown as an image in the original: example object report data]
TABLE 1
Input data 2: object real data, i.e., all the real data of every object covered by the reports issued by all data publishers, for each day between the report publication time and the target value achievement time.
By way of example and not limitation: the follower count of social network personality A for each day in the three months from June 30, 2017 to September 30, 2017; the click volume of advertisement B for each day in the six months from June 13, 2017 to December 13, 2017; and the closing price of stock C for each day in the three months from May 30, 2017 to August 30, 2017; and so on, as shown in Table 2 below (by way of example only):
Object code | Date | Real value
ObjA | 2017/07/01 | 1.20 million
ObjA | 2017/07/02 | 1.22 million
ObjA | 2017/07/03 | 1.25 million
…… | …… | ……
TABLE 2
The two input data described above may be stored in the data store 106 of FIG. 1, but those skilled in the art will appreciate that these data may be stored in databases other than the data store 106 of FIG. 1, and that the data store 106 in this disclosure is by way of example only and not by way of limitation.
With these two inputs, the report reliability evaluation module 302 may score the reliability of the report predictions. In this step, the following intermediate variables are also introduced and calculated:
Observation period: the time window between the target value achievement time and the publication time in the object report. In the above example, by way of example and not limitation, the observation period of the report on object A published by data publisher A is from June 30, 2017 to September 30, 2017.
Observation days: the number of days in the observation period. In the above example, by way of example and not limitation, the number of observation days for object A is 92 days.
Initial value: the real value of the object on the day the report is published. In the above example, by way of example and not limitation, the real data value of object A on June 30, 2017.
Actual number of days to reach the target value: the number of days taken for the real value to reach the target value after the report is published. If the initial value is less than the target value, this is the number of days until the real value is first greater than or equal to the target value; if the initial value is greater than the target value, it is the number of days until the real value is first less than or equal to the target value; if the initial value equals the target value, the report has made no informative prediction about the object's future data trend and this term is 0.
Days co-trending with the target value or rating: if the initial value is less than the target value or the rating is positive, indicating that the report predicts a good outlook for the object, this is the number of days in the observation period on which the real value is higher than the initial value; if the initial value is greater than the target value or the rating is negative, indicating a poor predicted outlook, it is the number of days on which the real value is lower than the initial value; if the initial value equals the target value, no informative prediction about the future data trend has been made and this term is 0.
Prediction deviation degree: the rate of change of the target value relative to the initial value, calculated as in equation (1) above. The prediction deviation degree ranges from 0 to 1; the smaller the deviation, the more conservative the prediction, and the larger the deviation, the more aggressive the prediction.
Then, the reliability score of the study is calculated according to the intermediate variables as follows:
if the true value of the object during the observation period reaches the target value, a reliability score is calculated using equation (2) above. And if the true value of the object during the observation period does not reach the target value, the reliability score is calculated using equation (3) above.
In formulas (2) and (3) above, the reliability score of a report ranges from 0 to 100. If the target value is reached within the observation window, the report receives the base score of 60, and the additional score of up to 40 depends on the prediction deviation degree and on the number of days taken to reach the target value: the larger the deviation and the faster the target is reached, the higher the additional score; the smaller the deviation and the slower the target is reached, the lower the additional score. If the target value is not reached within the observation window, the maximum score is 60: the more days the real value trends in the same direction as the target value or rating, the higher the score, and the fewer such days, the lower the score.
Then, by combining the object real data, i.e., the real data of every object covered by the reports for each day between the report publication time and the target value achievement time, the report reliability evaluation module 302 can use equations (1) to (3) above to calculate the predicted reliability scores of all reports issued by all data publishers for all objects, as shown in Table 3 below (by way of example only):
Report id | Object code | Data publisher code | Reliability score
1 | ObjA | PubA | 90
2 | ObjB | PubB | 60
3 | ObjC | PubC | 70
…… | …… | …… | ……
TABLE 3
After the report reliability evaluation module 302 has calculated the predicted reliability scores of all reports issued by all data publishers for all objects using equations (1) to (3) from the object real data and the report data, it transmits these scores to the data publisher reliability evaluation module 402.
FIG. 4 illustrates a block diagram of a data publisher reliability evaluation module 402, according to one embodiment of the present disclosure.
The data publisher reliability evaluation module 402 may be configured to aggregate each data publisher's predicted reliability for every individual object based on the predicted reliability of each report. In particular, the data publisher reliability evaluation module 402 computes a data publisher's predicted reliability score for an object. This step involves one input: the predicted reliability scores of the data publisher's reports on the object, from the report reliability evaluation module 302.
In other words, the data publisher reliability evaluation module 402 screens out and aggregates all of a data publisher's reports on a single object from the predicted reliability score data of all reports published by all data publishers for all objects, received from the report reliability evaluation module 302, as shown in Table 4 below (by way of example only):
[Table 4, shown as an image in the original: reliability scores of each data publisher's reports on a single object]
TABLE 4
Subsequently, the data publisher reliability evaluation module 402 scores each data publisher's overall predicted reliability for the object according to the predicted reliability score data of that publisher's reports on each single object. This step involves introducing and calculating the following intermediate variable: report freshness, i.e., the closer the report's publication time is to the current time, the higher the freshness, and vice versa; the specific calculation formula is shown in equation (4) above.
Report freshness is introduced to reflect a data publisher's current research and judgment capability for a given object and to down-weight its past capability, because an object's performance can differ across periods and the market environment can change rapidly.
Then, the data publisher reliability evaluation module 402 calculates the data publisher's predicted reliability score for the object according to equation (5) above, where N denotes the number of the data publisher's reports on the object. As the formula shows, the data publisher's predicted reliability score for the object is the average of the product of freshness and reliability score over all of that publisher's reports on the object.
By way of example and not limitation, the output of the data publisher reliability evaluation module 402 is shown in Table 5 below (by way of example only):
[Table 5, shown as an image in the original: data publisher predicted reliability scores for individual objects]
TABLE 5
The data publisher reliability evaluation module 402 then outputs and passes the data publisher predicted reliability scores for the objects to the report ranking module 502.
FIG. 5 shows a block diagram of a report ranking module 502 according to one embodiment of the present disclosure.
The report ranking module 502 may be used to rank all reports for each individual object by reliability, based on the data publishers' predicted reliability for that object. This step involves two inputs and one output, as follows:
Input 1: the predicted reliability scores of all data publishers for all individual objects, from the data publisher reliability evaluation module 402.
Inputting a second: the input is the subject's current report data for presentation to the user via the user's computing device. For example, if the user is interested in object a and chooses to view the report for object a, then this input is the report data about object a that all data publishers published, as shown in table 6 below (by way of example only):
Report title | Publication time | Object code | Data publisher code
Title 1 | 2017/06/30 | ObjA | PubA
Title 2 | 2017/06/13 | ObjA | PubB
Title 3 | 2017/05/30 | ObjA | PubA
…… | …… | …… | ……
TABLE 6
Subsequently, the report ranking module 502 ranks the object's report data based on the predicted reliability scores of all data publishers for the individual object, received from the data publisher reliability evaluation module 402, so as to present the report of the data publisher with the highest predicted reliability to the user, as shown in Table 7 below (by way of example only):
[Table 7, shown as an image in the original: report list for the object ranked by data publisher predicted reliability]
TABLE 7
In one embodiment of the present disclosure, the higher a data publisher's predicted reliability score, the higher the ranking of the reports it has written, and vice versa. If a data publisher has written multiple reports on an object at different times, the more recently published report is ranked higher, and vice versa. In another embodiment of the present disclosure, if a data publisher has written multiple reports on an object at different times, the report with the higher report reliability score (from the report reliability evaluation module 302) is ranked higher, and vice versa. As will be appreciated by one skilled in the art, the above ordering rules are merely exemplary and do not limit the scope of the disclosure.
Additionally, as will be appreciated by those skilled in the art, the data in the tables above are exemplary only, and not limiting.
FIG. 6 illustrates a flow diagram of a method 600 for model-based data profiling according to one embodiment of the present disclosure.
At 602, the predicted reliability of each of all reports published by all data publishers for all objects is determined. This determination is based on all reports issued by all data publishers for all objects, each identified by a report id. The report data may include the publication time, the object studied, the publisher of the report, and key data such as the rating, target value, and target value achievement time contained in the object report. The determination is further based on all the real data of every object covered by those reports for each day between the report publication time and the target value achievement time. This step also introduces the observation period, observation days, initial value, actual number of days to reach the target value, days co-trending with the target value or rating, and prediction deviation degree. The prediction deviation degree is calculated as in equation (1); the smaller the deviation, the more conservative the prediction, and the larger the deviation, the more aggressive the prediction.
This step then calculates the predicted reliability of each report based on equation (2) or (3) above. Equation (2) is used when the real value of the object reaches the target value during the observation period, and equation (3) is used when it does not.
At 604, each data publisher's predicted reliabilities for all individual objects are aggregated based on the predicted reliability of each report. This step screens out and aggregates each data publisher's reports on a single object from the predicted reliability score data of all reports published by all data publishers for all objects obtained in step 602. It introduces and calculates the report freshness variable (calculated as in equation (4)) and then calculates the data publisher's predicted reliability score for the object using equation (5).
At 606, all reports for each individual object are ranked by reliability based on the data publishers' predicted reliability for that object. This step is based on the predicted reliability scores of all data publishers for all individual objects from step 604 and on the object's report data, i.e., the report data to be presented to the user through the user's computing device. Among all the reports, this step selects and presents to the user the report published by the data publisher with the highest predicted reliability.
Embodiments of the present invention are described above with reference to block diagrams and/or operational illustrations of methods, systems, and computer program products according to embodiments of the invention. The functions/acts noted in the blocks may occur out of the order noted in any flowchart. By way of example, and not limitation, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.

Claims (17)

1. A method of evaluating research report data, comprising the steps of:
receiving object real data of an object and object report data in a report issued by a data publisher;
determining a predicted reliability of the report based on the object report data and the object real data, the predicted reliability relating to a prediction deviation degree;
aggregating the predicted reliability of the data publisher for the individual object based on the predicted reliabilities of the reports; and
reliability ranking the reports for each object based on the predicted reliability for the individual object to push the ranked reports to the user.
2. The method of claim 1, wherein the object report data includes a title, a release time, a release data publisher name, a rating, a target value achievement time.
3. The method of claim 1, wherein the object truth data comprises a truth value for the object for each day between a time of publication to a time of attainment of a target value.
4. The method of claim 3, wherein the determining step further combines an observation period, a number of observation days, an initial value, a number of actual days to reach a target value, a number of days co-trending with a target value or rating.
5. The method of claim 4, wherein the observation period refers to the period from the time the report is published to the time the target value is to be achieved, the number of observation days refers to the number of days from the time the report is published to the time the target value is to be achieved, the initial value refers to the real value of the object on the day the report is published, the actual number of days to reach the target value refers to the number of days taken for the real value to reach the target value after the report is published, the number of days co-trending with the target value or rating refers to the number of days in the observation period on which the real value is higher than the initial value or the number of days on which the real value is lower than the initial value, and the prediction deviation degree refers to the rate of change of the target value relative to the initial value.
6. The method of claim 4, wherein the determining step further comprises, when the real value within the observation period reaches the target value, determining the prediction reliability based on the prediction deviation degree, the number of observation days, and the actual number of days to reach the target value.
7. The method of claim 4, wherein the determining step further comprises determining the predicted reliability based on a number of days with the same trend as the target value or rating and a number of observation days when the true value in the observation period does not reach the target value.
8. The method of claim 1, wherein the aggregating step further comprises screening all the reports for a single object for each data publisher from the predicted reliability score data for all the reports published by all the data publishers for all the objects.
9. The method of claim 1, wherein said aggregating step further incorporates a report freshness, said report freshness referring to how close the report's publication time is to the current time and being determined based on the number of days between the report's publication and the current time.
10. The method of claim 9, wherein the aggregating step further comprises calculating a predicted reliability of each data publisher for all individual objects based on a predicted reliability of the reports for all objects and a freshness of the reports.
11. The method of claim 1, wherein the ranking step further incorporates report data for objects of interest to the user, the report data being from more than one data publisher.
12. The method of claim 11, wherein the ranking step further comprises presenting the report of the data publisher with the highest predicted reliability to the user.
13. The method of claim 12, wherein:
the higher the predicted reliability, the higher the ranking of the reports written by the data publisher; and
if a data publisher writes multiple reports for an object at different times, the reports with closer publication times are ranked more forward.
14. A system for evaluating research report data, comprising a report reliability evaluation module, a data publisher reliability evaluation module, and a report ranking module, characterized in that:
the report reliability evaluation module determines the predicted reliability of a report based on the object real data of an object and the object report data in the report issued by a data publisher, and transmits the determined predicted reliability to the data publisher reliability evaluation module, wherein the predicted reliability relates to a prediction deviation degree;
the data publisher reliability evaluation module aggregates the predicted reliability of the data publisher for a single object based on the predicted reliability of the reports, and transmits the aggregated predicted reliability to the report ranking module;
the report ranking module ranks the reports for each object for reliability based on the predicted reliability of the data publishers for the individual object to push the ranked reports to the user.
15. The system of claim 14, wherein the report reliability evaluation module determines the predicted reliability of each report based on a number of observation days, an actual number of days to reach the target value, a number of days co-trending with the target value or rating, the target value, and an initial value,
wherein the initial value refers to the real value of the object on the day the report is published, the number of observation days refers to the number of days from the time the report is published to the time the target value is to be achieved, the actual number of days to reach the target value refers to the number of days taken for the real value to reach the target value after the report is published, the number of days co-trending with the target value or rating refers to the number of days in the observation period on which the real value is higher than the initial value or the number of days on which the real value is lower than the initial value, and the prediction deviation degree refers to the rate of change of the target value relative to the initial value and is determined based on the target value and the initial value.
16. The system of claim 14, wherein the data publisher reliability evaluation module further screens out all reports for a single object from the predicted reliability scores of all reports published by all data publishers for all objects, and determines the predicted reliability of each data publisher for all individual objects by the predicted reliability of each report for all objects and the freshness of the reports, the freshness of a report being determined based on the number of days between its publication and the current day.
17. The system of claim 14, wherein the report ranking module further presents the data publisher report with the highest predicted reliability to the user by:
the higher the predicted reliability, the higher the ranking of the reports written by the data publisher; and
if a data publisher writes multiple reports for an object at different times, the reports with closer publication times are ranked more forward.
CN202010280896.9A 2020-04-10 2020-04-10 Data evaluation method and system based on model Pending CN111539562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010280896.9A CN111539562A (en) 2020-04-10 2020-04-10 Data evaluation method and system based on model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010280896.9A CN111539562A (en) 2020-04-10 2020-04-10 Data evaluation method and system based on model

Publications (1)

Publication Number Publication Date
CN111539562A (en) 2020-08-14

Family

ID=71974944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010280896.9A Pending CN111539562A (en) 2020-04-10 2020-04-10 Data evaluation method and system based on model

Country Status (1)

Country Link
CN (1) CN111539562A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447036A (en) * 2014-08-29 2016-03-30 华为技术有限公司 Opinion mining-based social media information credibility evaluation method and apparatus
US20170124464A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. Rapid predictive analysis of very large data sets using the distributed computational graph
CN106126567A (en) * 2016-06-17 2016-11-16 西安电子科技大学 Method based on trust data recommendation service
CN110659810A (en) * 2019-09-09 2020-01-07 创新奇智(南京)科技有限公司 Method for calculating credibility of analysts

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
尹成鑫 (Yin Chengxin): 《社交网络服务中信任对消费者购买意向的影响》 [The Influence of Trust in Social Networking Services on Consumers' Purchase Intention], 31 January 2020, Southwest Jiaotong University Press *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200814