CN116340752A - Predictive analysis result-oriented data story generation method and system - Google Patents

Predictive analysis result-oriented data story generation method and system

Info

Publication number
CN116340752A
Authority
CN
China
Prior art keywords
story
generating
character
event
predictive analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310155761.3A
Other languages
Chinese (zh)
Inventor
朝乐门
张晨
靳庆文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN202310155761.3A priority Critical patent/CN116340752A/en
Publication of CN116340752A publication Critical patent/CN116340752A/en
Pending legal-status Critical Current

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data story generation method and system for predictive analysis results. The method comprises the following steps: treating the predictive analysis system as a black box and generating story parameters based on the input parameters of a user t; generating story characters based on the story parameters; generating story events for the story characters according to the story parameters; generating a storyline in a plurality of story stages according to the story events; and generating a visual data story curve and a data story report according to the storyline. The technical scheme of the invention provides a new solution for testing three key problems in current big data applications, namely reliability, fairness and resolvability, supports two analysis tasks, What-if analysis and Why-not analysis, has strong practicability, and helps resolve the contradiction between the usability and the interpretability of models in the big data era.

Description

Predictive analysis result-oriented data story generation method and system
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data story generation method and system for predictive analysis results.
Background
The interpretability and trustworthiness of predictive analysis results are a focus of attention for future society. With the widespread deployment of automated decision-making applications such as personalized recommendation, autonomous driving, intelligent medicine and machine translation, attention is increasingly focused on their interpretability and on the moral, ethical and legal problems behind them. The difficulty in interpreting predictive analysis results is twofold: the theoretical and technical details of the model should not be over-explained to non-professionals, and the implementation details of the model and the trade secrets behind it must not be leaked.
Existing research focuses mainly on the interpretability of predictive algorithms and models. This problem has become a hot research topic in related fields and has evolved into a new field, interpretable machine learning (Interpretable Machine Learning). Interpretable machine learning has made great progress in methodology, key technologies and application development, and model-independent, local interpretation techniques represented by the LIME algorithm can provide a theoretical basis for story descriptions of predictive analysis results. However, "interpretation of algorithms or models" and "interpretation of analysis results" are related but distinct concepts, and the field of interpretable machine learning is mainly concerned with the interpretability of machine learning algorithms and of the models trained with them.
Therefore, how to provide a data story scheme for predictive analysis results has become a technical problem to be solved.
Disclosure of Invention
In view of the above problems, the invention aims to provide a data story generation method and system for predictive analysis results, which produce a data story for the predictive analysis results of a predictive analysis system independently of the model of that system, thereby resolving the contradiction between usability and interpretability.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
One aspect of the present invention provides a data story generation method for predictive analysis results, including the following steps:
treating the predictive analysis system as a black box, and generating story parameters based on the input parameters of a user t;
generating story characters based on the story parameters;
generating story events for the story characters according to the story parameters;
generating a storyline in a plurality of story stages according to the story events;
and generating a visual data story curve and a data story report according to the storyline.
Further, treating the predictive analysis system as a black box to generate story parameters specifically includes:
acquiring the input information $t[X]$ of user t, the prediction result $y_{target}$ expected by user t, and the actual prediction result $t[y]$ returned to user t by the predictive analysis system, wherein X is the feature set and y is the target vector; acquiring the feature subset $X^{*}$ of the feature set X for which bias needs to be detected, whose corresponding target vector is $y^{*}$; acquiring the complement of the feature subset $X^{*}$ as the feature subset $\overline{X^{*}}$ for which bias does not need to be detected, whose corresponding target vector is $\overline{y^{*}}$; acquiring the invariable feature subset $X''$ of the feature set X, whose corresponding target vector is $y''$; acquiring the complement of the feature subset $X''$ as the variable feature subset $\overline{X''}$, whose corresponding target vector is $\overline{y''}$;
taking the input information $t[X]$, the prediction result $t[y]$, the prediction result $y_{target}$ expected by user t, the feature subset $X^{*}$ and its target vector $y^{*}$, the feature subset $\overline{X^{*}}$ and its target vector $\overline{y^{*}}$, the feature subset $X''$ and its target vector $y''$, and the feature subset $\overline{X''}$ and its target vector $\overline{y''}$ as the story parameters for generating the story characters.
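The story parameters listed above can be pictured as a small record. The following is a minimal sketch, not the patented implementation: the class name `StoryParameters`, its field names and the loan-style sample values are all hypothetical, and the target vectors of the feature subsets are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class StoryParameters:
    """Story parameters for one user t (all names here are illustrative)."""
    t_X: Dict[str, Any]              # input information t[X]
    t_y: Any                         # actual prediction result t[y]
    y_target: Any                    # prediction result expected by user t
    bias_features: FrozenSet[str]    # X*: features to be checked for bias
    fixed_features: FrozenSet[str]   # X'': features the user cannot change

    @property
    def unchecked_features(self) -> FrozenSet[str]:
        """Complement of X* within the feature set X."""
        return frozenset(self.t_X) - self.bias_features

    @property
    def variable_features(self) -> FrozenSet[str]:
        """Complement of X'' within the feature set X."""
        return frozenset(self.t_X) - self.fixed_features


# Hypothetical sample values, used only to exercise the sketch.
params = StoryParameters(
    t_X={"gender": "F", "age": 30, "income": 4000},
    t_y="rejected",
    y_target="approved",
    bias_features=frozenset({"gender"}),
    fixed_features=frozenset({"gender", "age"}),
)
```

Deriving the two complements from the feature set, rather than storing them, keeps each pair of subsets consistent by construction.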
Further, generating the story characters according to the story parameters specifically includes:
taking the user t as the protagonist t, and determining the input information $t[X]$ and the actual prediction result $t[y]$ of the protagonist t;
determining the characters $t^{=}$ of the same type as the protagonist t and the characters $t^{\neq}$ of a different type, the criterion being whether their values on the feature subset $X^{*}$ for which bias needs to be detected are the same as those of the protagonist t; if so, the character is called a "character $t^{=}$ of the same type as the protagonist", and otherwise a "character $t^{\neq}$ of a different type from the protagonist";
determining the positive characters $t^{+}$ and the negative characters $t^{-}$ relative to the protagonist t, the criterion being whether their prediction result is the same as the actual prediction result of the protagonist t; if so, the character is called a "positive character $t^{+}$", and otherwise a "negative character $t^{-}$".
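The two classification criteria above can be sketched as a single function. This is an illustrative sketch only: the function name `classify_character`, the string labels it returns and the sample feature names are assumptions, not part of the source.

```python
from typing import Any, Dict, Iterable, Tuple


def classify_character(
    z_X: Dict[str, Any],
    z_y: Any,
    t_X: Dict[str, Any],
    t_y: Any,
    bias_features: Iterable[str],
) -> Tuple[str, str]:
    """Classify character z relative to protagonist t.

    First label: "same" if z agrees with t on every feature in X*,
    otherwise "different".
    Second label: "positive" if z has the same prediction result as t,
    otherwise "negative".
    """
    same_type = all(z_X[f] == t_X[f] for f in bias_features)
    positive = z_y == t_y
    return (
        "same" if same_type else "different",
        "positive" if positive else "negative",
    )


# Hypothetical example: same gender as the protagonist, different outcome.
labels = classify_character(
    {"gender": "F", "income": 9000}, "approved",
    {"gender": "F", "income": 4000}, "rejected",
    ["gender"],
)
```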
Further, generating story events for the story characters according to the story parameters includes generating a reliability test event, a fairness test event and a resolvable test event; wherein generating the reliability test event refers to submitting the input information $t[X]$ of the protagonist t to the predictive analysis system multiple times and judging whether the prediction labels $t[y]'$ corresponding to the actual prediction results $t[y]$ output each time are the same or fluctuate within a range smaller than a reliability threshold; generating the fairness test event refers to judging whether the absolute value of the difference between the probability that the bias rule expression holds among the characters $t^{=}$ of the same type as the protagonist and the probability that it holds among all users is within a range smaller than a bias threshold;
the bias rule expression Bias is:
$$\mathrm{Bias} = \left\{ \left| P\left( (y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]) \right) - P\left( y == y_{target} \right) \right| < \varepsilon_{1} \right\}$$
wherein Bias indicates whether the detected character features may be biased; $t[X^{=}]$ denotes the features of the character group $t^{=}$ of the same type as the protagonist, for which bias is to be detected; $\varepsilon_{1}$ denotes the threshold of the acceptable range of bias; $P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]))$ denotes the probability that the expected result is obtained within the same-type character group $t^{=}$; and $P(y == y_{target})$ denotes the probability that it is obtained among all characters;
generating the resolvable test event refers to applying a minimal change to several feature attribute values in the variable feature subset $\overline{X''}$ of the protagonist t so as to achieve the prediction result $y_{target}$ expected by the user.
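The bias rule expression can be evaluated empirically from a sample of characters and their prediction results. The sketch below estimates both probabilities as simple frequencies; the function name `bias_rule_holds`, the record layout and the sample data are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Tuple[Dict[str, Any], Any]  # (features, prediction result)


def bias_rule_holds(
    records: List[Record],
    t_X: Dict[str, Any],
    bias_features: Iterable[str],
    y_target: Any,
    eps1: float,
) -> Tuple[bool, float]:
    """Return (rule holds, gap) for |P(y_target | same type) - P(y_target)| < eps1."""
    bias_features = list(bias_features)
    if not records:
        raise ValueError("records must not be empty")
    # Prediction results of the characters that share the protagonist's X* values.
    same_type = [y for x, y in records
                 if all(x[f] == t_X[f] for f in bias_features)]
    if not same_type:
        raise ValueError("no same-type characters in records")
    p_group = sum(y == y_target for y in same_type) / len(same_type)
    p_all = sum(y == y_target for _, y in records) / len(records)
    gap = abs(p_group - p_all)
    return gap < eps1, gap


# Hypothetical sample: 1 of 2 "F" characters approved, 3 of 4 overall.
records = [
    ({"gender": "F"}, "approved"),
    ({"gender": "F"}, "rejected"),
    ({"gender": "M"}, "approved"),
    ({"gender": "M"}, "approved"),
]
holds, gap = bias_rule_holds(records, {"gender": "F"}, ["gender"], "approved", eps1=0.1)
```

In this sample the gap is 0.25, which exceeds the threshold of 0.1, so the rule does not hold and the feature would be flagged as possibly biased.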
Further, generating the storyline in a plurality of story stages according to the story events includes generating the storyline according to a beginning stage, a rising stage, a climax stage, a falling stage and an ending stage, and specifically includes:
in the beginning stage, setting the story parameters, which include the input information $t[X]$ and the actual prediction result $t[y]$ of the protagonist t;
in the rising stage, setting the reliability test event as the "inciting event" at the starting point, and plotting the fairness test result at the fairness test event position; for the period before the protagonist t first finds $y_{target}$, sorting the multiple What-if analysis events by their distance from $y_{target}$ and setting the median event $Q_b^2$, the upper-quartile event $Q_b^1$ and the lower-quartile event $Q_b^3$;
in the climax stage, setting the climax event, namely the event in which the protagonist t first finds, in the What-if analysis, a prediction result matching the expected $y_{target}$, or the event recommended to the protagonist t by the predictive analysis system;
in the falling stage, performing Why-not analysis after the protagonist t has first found $y_{target}$, sorting the Why-not analysis events by their distance from $y_{target}$ and setting the median event $Q_a^2$, the upper-quartile event $Q_a^1$ and the lower-quartile event $Q_a^3$;
in the ending stage, setting the suggestion event, which contains the suggestions made by the predictive analysis system to the protagonist.
Further, generating the visual data story curve according to the storyline specifically includes:
taking the development time of the storyline as the abscissa and the similarity to the expected prediction result $y_{target}$ first found by user t as the ordinate, using points of different types to indicate whether a character is of the same type as the protagonist t and whether it is a positive character, and drawing the curve according to the pyramid model with the point at which user t first finds the expected prediction result $y_{target}$ as the apex.
Further, the columns of the data story report include reliability, fairness and resolvability, and the rows of the data story report include the input sample, the prediction result, the analysis method, the analysis result, the analysis conclusion, and the suggestion of the predictive analysis system to user t.
In another aspect, the present invention further provides a data story generation system for predictive analysis results, including:
a parameter generation module, used for treating the predictive analysis system as a black box and generating story parameters based on the input parameters of a user t;
a character generation module, used for generating story characters based on the story parameters;
an event generation module, used for generating story events for the story characters according to the story parameters;
a plot generation module, used for generating a storyline in a plurality of story stages according to the story events;
a view generation module, used for generating a visual data story curve according to the storyline;
and a report generation module, used for generating a data story report according to the storyline.
In a further aspect, the invention provides a processing device comprising at least a processor and a memory, the memory having stored thereon a computer program, wherein the processor, when running the computer program, executes the steps of the data story generation method for predictive analysis results.
Yet another aspect of the invention provides a computer storage medium having stored thereon computer readable instructions executable by a processor to perform the steps of the data story generation method for predictive analysis results.
Due to the adoption of the technical scheme, the invention has the following advantages:
1. The invention provides a model-independent data story method, which can be applied to any predictive analysis algorithm and business, has strong universality and flexibility, and can serve as a new functional module in common data analysis software such as SPSS, SAS and Excel.
2. The invention provides a new solution for testing three key problems in current big data applications, namely reliability, fairness and resolvability, supports two analysis tasks, What-if analysis and Why-not analysis, has strong practicability, and provides a new approach to resolving the contradiction between the usability and the interpretability of models in the big data era.
3. The invention presents its results as a data story, providing a visual story curve and a readable data story report; it is highly readable and understandable, and is not limited by the knowledge level or professional field of the target audience.
4. The invention supports selecting different sources of story data, thereby supporting various application purposes.
When the story data are provided by the management side of the business system (such as a merchant), the right of interpretation belongs to the management side, and the invention better supports business logic and business purposes; when the story data are generated by the user, the invention better supports user experience and interactive story generation; when the story data are generated by a third-party institution or taken from standard test datasets, the invention better supports third-party testing and evaluation scenarios.
5. The data story generated by the invention consists of a visual data story curve and a readable data story report, which can be read not only by human users but also by machines, thereby supporting the development of new functions and the upgrading of functional modules, and giving the invention strong extensibility.
6. The final stage of the data story provided by the invention supports commercial advertising and algorithmic recommendation, and has a strong commercial application prospect.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Like parts are designated with like reference numerals throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a data story generation method for predictive analysis results in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of feature matrix and target vector in the predictive analysis result according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a data story generation flow for predictive analysis results in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a visual data story curve according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which are obtained by a person skilled in the art based on the described embodiments of the invention, fall within the scope of protection of the invention.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Data storytelling is an important means of interpreting predictive analysis results. In terms of how a subject receives data, perception is the precondition of cognition, and cognition is the continuation of perception. Data visualization and data storytelling address the perception and the cognition of data, respectively. Data visualization is easy to understand, easy to perceive and easy to gain insight from, whereas a data story is easy to remember, easy to comprehend and easy to experience. Data stories are therefore widely used to interpret predictive analysis results for non-professionals and thereby gain their trust in those results.
One aspect of the embodiments of the invention provides a data story generation method for predictive analysis results, which is independent of the model and can be applied to any predictive analysis algorithm and business system. In the data story generation method, the predictive analysis system is first treated as a black box, and story parameters are generated based on the input parameters of a user t; story characters are generated based on the story parameters; story events are generated for the story characters according to the story parameters; a storyline is generated in a plurality of story stages according to the story events;
and a visual data story curve and a data story report are generated according to the storyline. In another aspect, a data story generation system for predictive analysis results, corresponding to the data story generation method, is also provided.
Example 1
This embodiment provides a data story generation method for predictive analysis results. As shown in fig. 1, the method includes the following steps:
S1, treating the predictive analysis system as a black box, and generating story parameters based on the input parameters of a user t;
S2, generating story characters based on the story parameters;
S3, generating story events for the story characters according to the story parameters;
S4, generating a storyline in a plurality of story stages according to the story events;
and S5, generating a visual data story curve and a data story report according to the storyline.
In step S1, story parameters are generated.
The predictive analysis system S is treated as a black box, and its inputs and outputs are abstracted (or mapped) into a new relational schema R(X, y), where X and y are the name of the feature set and the name of the target vector, respectively. The data form of X and y is shown in fig. 2. The story parameters are provided by the story audience (i.e., the user of the predictive analysis system). The user input data and their meanings are as follows:
(1) $t[X]$: the input information submitted to the predictive analysis system S by user t;
(2) $t[y]$: the prediction result returned to user t by the predictive analysis system S;
(3) $y_{target}$: the prediction result that user t expects (or wants);
(4) $X^{*}$: a subset of the feature set X, representing the feature subset for which user t needs to detect bias, $X^{*} \subseteq X$. $\overline{X^{*}}$ is the complement of $X^{*}$, i.e., $\overline{X^{*}} = X - X^{*}$. The target vectors corresponding to $X^{*}$ and $\overline{X^{*}}$ are denoted $y^{*}$ and $\overline{y^{*}}$.
(5) $t[X^{=}]$: the feature subset of the character group $t^{=}$ of the same type as the protagonist, for which bias is to be detected.
(6) $X''$: a subset of the feature set X, representing the feature subset that user t cannot change, such as gender, skin color, belief and ethnicity. $\overline{X''}$ is the complement of $X''$, i.e., $\overline{X''} = X - X''$. The target vectors corresponding to $X''$ and $\overline{X''}$ are denoted $y''$ and $\overline{y''}$.
The input information $t[X]$, the prediction result $t[y]$, the prediction result $y_{target}$ expected by user t, the feature subset $X^{*}$ and its target vector $y^{*}$, the feature subset $\overline{X^{*}}$ and its target vector $\overline{y^{*}}$, the feature subset $X''$ and its target vector $y''$, and the feature subset $\overline{X''}$ and its target vector $\overline{y''}$ serve as the story parameters for generating the story characters.
In step S2, story characters are generated. Story characters are classified into the protagonist, characters of the same type as or of a different type from the protagonist, and positive or negative characters relative to the protagonist, wherein:
(1) Protagonist t: a particular user t of the predictive analysis system S, whose feature information and prediction result are $t[X]$ and $t[y]$, respectively. The values of $t[X]$ and $t[y]$ may be the user's current input values, historical input values, or default values of the predictive analysis system S.
(2) Characters $t^{=}$ of the same type as the protagonist t and characters $t^{\neq}$ of a different type: the criterion is whether the values on the feature subset $X^{*}$ that may be biased are the same as those of the protagonist. If they are the same, the character is called a "character of the same type as the protagonist"; otherwise it is called a "character of a different type from the protagonist".
The characters $t^{=}$ of the same type as the protagonist t are generated as follows: $X^{*}$ takes the same values as for t, and the complement attributes $\overline{X^{*}}$ take random values within the domains of the corresponding attributes. The characters of the same type are
$$t^{=} = \{\, z \mid z[X^{*}] = t[X^{*}] \land z[\overline{X^{*}}] \text{ is a random value} \,\}$$
The characters $t^{\neq}$ of a different type from the protagonist t are generated as follows: $X^{*}$ takes values different from those of t, and the complement attributes $\overline{X^{*}}$ take random values within the domains of the corresponding attributes. The characters of a different type are
$$t^{\neq} = \{\, z \mid z[X^{*}] \neq t[X^{*}] \land z[\overline{X^{*}}] \text{ is a random value} \,\}$$
(3) Positive characters $t^{+}$ and negative characters $t^{-}$ relative to the protagonist t: the criterion is whether the prediction result (or classification result) is the same as that of the protagonist. If it is the same, the character is called a "positive character $t^{+}$"; otherwise it is a "negative character $t^{-}$".
The positive characters $t^{+}$ relative to the protagonist t are defined and generated as follows:
if the value of y (or the classification result) is the same as $t[y]$, k (k > 0) nearest-neighbor samples are randomly selected as the positive characters $t^{+}$. If no positive character $t^{+}$ can be found in the history, a positive character $t^{+}$ is generated by fine-tuning the variable attributes of the protagonist t:
[Formula image not reproduced in the source text]
where p is the number of variable attributes $\overline{X''}$ of the protagonist t; $x_j$ is a member attribute of the variable attribute set $\overline{X''}$ of user t; and $x_j$ and $x'_j$ are the j-th variable attribute value of the protagonist t and its value after fine-tuning, respectively.
The negative characters $t^{-}$ relative to the protagonist t are defined and generated as follows:
if the value of y (or the classification result) differs from $t[y]$, k (k > 0) nearest-neighbor samples are selected as the negative characters $t^{-}$. If no negative character $t^{-}$ can be found in the history, a negative character $t^{-}$ is generated by the following method:
[Formula image not reproduced in the source text]
where p is the number of variable attributes $\overline{X''}$ of the protagonist t; $x_j$ is a member attribute of the variable attribute set $\overline{X''}$ of user t; and $x_j$ and $x'_j$ are the j-th variable attribute value of the protagonist t and its value after fine-tuning, respectively.
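The generation rules for same-type and different-type characters in item (2) above can be sketched as random sampling over attribute domains. The sketch makes two assumptions that the source does not state: each attribute has a finite list of admissible values (the hypothetical `domains` mapping), and a different-type character differs from the protagonist on every feature in X*, which is a stricter reading than differing on at least one. The function names are likewise illustrative.

```python
import random
from typing import Any, Dict, Iterable, List


def generate_same_type(
    t_X: Dict[str, Any],
    bias_features: Iterable[str],
    domains: Dict[str, List[Any]],
    rng: random.Random,
) -> Dict[str, Any]:
    """Copy X* from the protagonist; draw all other attributes at random."""
    keep = set(bias_features)
    return {f: (t_X[f] if f in keep else rng.choice(domains[f])) for f in t_X}


def generate_different_type(
    t_X: Dict[str, Any],
    bias_features: Iterable[str],
    domains: Dict[str, List[Any]],
    rng: random.Random,
) -> Dict[str, Any]:
    """Change every X* attribute; draw all other attributes at random."""
    change = set(bias_features)
    z: Dict[str, Any] = {}
    for f in t_X:
        if f in change:
            # Requires at least one alternative value in the domain.
            alternatives = [v for v in domains[f] if v != t_X[f]]
            z[f] = rng.choice(alternatives)
        else:
            z[f] = rng.choice(domains[f])
    return z


# Hypothetical domains and protagonist.
domains = {"gender": ["F", "M"], "age": [20, 30, 40], "income": [3000, 4000, 5000]}
t_X = {"gender": "F", "age": 30, "income": 4000}
rng = random.Random(0)  # seeded only to make the sketch repeatable
same = generate_same_type(t_X, ["gender"], domains, rng)
diff = generate_different_type(t_X, ["gender"], domains, rng)
```

Passing the random generator in as a parameter makes the generated cast of characters reproducible, which helps when the same story has to be regenerated for review.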
In step S3, story events are generated, the story events including a reliability test event, a fairness test event and a resolvable test event.
Reliability refers to the credibility of the prediction results obtained when the same protagonist t repeats the same behavior multiple times; fairness refers to whether the characters $t^{=}$ of the same type as the protagonist are subject to discrimination or bias, i.e., whether the bias rule (Bias_benchmark) is satisfied; resolvability refers to whether a particular user can reach the expected prediction result $y_{target}$ by modifying its variable feature set $\overline{X''}$.
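The notion of reliability just defined can be sketched as a repeated-query check against the black box. Because the exact formula is not available in this text, the criterion used here is an assumption: the outputs count as reliable if they are all equal, or, for numeric outputs, if the spread between the largest and smallest value is below the threshold. The names `reliability_test` and `make_scorer` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


def reliability_test(
    predict: Callable[[Dict[str, Any]], Any],
    t_X: Dict[str, Any],
    n: int = 5,
    eps2: float = 0.0,
) -> bool:
    """Submit t[X] to the black box n times and compare the outputs."""
    if n < 2:
        raise ValueError("n must be at least 2")
    outputs = [predict(dict(t_X)) for _ in range(n)]
    if all(o == outputs[0] for o in outputs):
        return True
    # Assumed criterion for numeric outputs: spread below the threshold.
    numeric = all(
        isinstance(o, (int, float)) and not isinstance(o, bool) for o in outputs
    )
    return numeric and (max(outputs) - min(outputs)) < eps2


def make_scorer(scores: Iterable[float]) -> Callable[[Dict[str, Any]], float]:
    """Stand-in black box that replays a fixed sequence of scores."""
    it = iter(scores)
    return lambda x: next(it)


stable = reliability_test(lambda x: "approved", {"income": 4000}, n=3)
jittery = reliability_test(make_scorer([0.70, 0.71, 0.69]), {"income": 4000}, n=3, eps2=0.05)
```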
(1) The reliability test event is generated as follows: the feature information $t[X]$ of user t is submitted to the predictive analysis system S n times (n ≥ 2), and it is checked whether the corresponding labels $t[y]'$ are equal or fluctuate within a negligible reliability threshold range ($\varepsilon_{2}$), namely:
[Formula image not reproduced in the source text]
(2) The fairness test event is generated as follows:
the absolute value of the difference between the probability that the bias rule expression Bias holds in the user group $t^{=}$ and the probability that it holds among all users is within the acceptable range $\varepsilon_{1}$, namely:
$$\mathrm{Bias} = \left\{ \left| P\left( (y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]) \right) - P\left( y == y_{target} \right) \right| < \varepsilon_{1} \right\}$$
wherein Bias indicates whether the detected character features may be biased; $t[X^{=}]$ denotes the features of the character group $t^{=}$ of the same type as the protagonist, for which bias is to be detected; $\varepsilon_{1}$ denotes the threshold of the acceptable range of bias; $P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]))$ denotes the probability that the expected result is obtained within the same-type character group $t^{=}$; and $P(y == y_{target})$ denotes the probability that it is obtained among all characters;
the meaning of the bias expression Bias is: compute the difference between the probability that the prediction result expected by user t is obtained given the features to be checked for possible bias and the probability that it is obtained over all features; if the difference is smaller than the threshold $\varepsilon_{1}$, the features to be checked are considered unbiased.
(3) The resolvable test event is generated as follows:
for user t, a minimal change is applied to its variable features $\overline{X''}$ to help achieve the target $y_{target}$. The specific method is:
[Formula image not reproduced in the source text]
where p is the number of variable attributes $\overline{X''}$ of the protagonist t; $x_j$ is a member attribute of the variable attribute set $\overline{X''}$ of t; and $x_j$ and $x'_j$ are the j-th variable attribute value of the protagonist t and its value after fine-tuning, respectively.
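The resolvable test in item (3) can be illustrated with a brute-force search for the smallest change to the variable attributes that reaches the target. The formula itself is not available in this text, so the cost used here, the sum of absolute differences between $x_j$ and $x'_j$ over numeric variable attributes, is an assumption, as are the function name `minimal_change`, the candidate grid and the stand-in predictor.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def minimal_change(
    predict: Callable[[Dict[str, Any]], Any],
    t_X: Dict[str, Any],
    candidates: Dict[str, List[float]],
    y_target: Any,
) -> Tuple[Optional[Dict[str, Any]], Optional[float]]:
    """Search candidate values of the variable attributes for the cheapest
    modification of t[X] whose prediction equals y_target.

    Only the attributes named in `candidates` are changed; all other
    attributes (the invariable subset) are left untouched.
    """
    features = sorted(candidates)
    best: Optional[Dict[str, Any]] = None
    best_cost: Optional[float] = None
    for values in product(*(candidates[f] for f in features)):
        x_new = dict(t_X)
        x_new.update(zip(features, values))
        if predict(x_new) != y_target:
            continue
        # Assumed cost: total absolute change over the variable attributes.
        cost = sum(abs(x_new[f] - t_X[f]) for f in features)
        if best_cost is None or cost < best_cost:
            best, best_cost = x_new, cost
    return best, best_cost


# Hypothetical black box: approve when income is high enough and debt low enough.
def predict(x: Dict[str, Any]) -> str:
    return "approved" if x["income"] >= 5000 and x["debt"] <= 1000 else "rejected"


t_X = {"gender": "F", "income": 4000, "debt": 1500}
candidates = {"income": [4000, 4500, 5000, 6000], "debt": [1500, 1000, 500]}
best, cost = minimal_change(predict, t_X, candidates, "approved")
```

An exhaustive grid is only practical for a handful of attributes; it is used here because it makes the "minimal change" idea easy to check by hand.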
In step S4, a storyline is generated. The story line adopts a pyramid mode, and is divided into five stages:
a start phase, an up phase, a climax phase, a down phase, and a final phase, as shown in fig. 3.
(1) Beginning stage: contains only one event, the setting of the story parameters, which include the user's feature information and the prediction result of the predictive analysis system S.
(2) Rising stage: contains the events that occur before the user finds y_target, which are of three types:
the reliability test event is set as the "inciting event" at the starting point;
the fairness test result is plotted at the fairness test event position;
the remaining length of the rising stage is divided into 4 equal parts, and the what-if analysis events performed repeatedly by user t are sorted by their distance from y_target; the median event (Q_2^b), the upper-quartile event (Q_1^b), and the lower-quartile event (Q_3^b) are then placed to represent the various attempts and efforts user t made before finding y_target.
(3) Climax stage: contains only one event, namely the event recommended by the predictive analysis system S, or the event in which the user first finds, in the what-if analysis, the prediction result y_target that matches the user's own expectation.
(4) Falling stage: contains at least three events. After user t first finds y_target, why-not analysis is performed to analyze why successful users succeeded. The length of the falling stage is divided into 4 equal parts, and the why-not analysis events performed repeatedly by user t are sorted by their distance from y_target; the median event (Q_2^a), the upper-quartile event (Q_1^a), and the lower-quartile event (Q_3^a) are then placed to represent the various causal analyses user t performed after finding y_target.
(5) Ending stage: contains only one event, the suggestion made by the predictive analysis system S to user t. The suggestion may be determined by the business logic of the predictive analysis system S, for example the corresponding advertisement or algorithmic recommendation information.
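The five-stage layout above can be sketched in Python. This is only an illustration under stated assumptions, not the toolkit mentioned in the patent: the names `Event`, `pick_quartile_events`, and `build_storyline` are hypothetical, each event is assumed to carry a numeric distance from y_target, the rank rule used to choose the quartile events is an assumption (the source does not fix one), and the ordering of the selected events inside the rising and falling stages is likewise assumed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str              # e.g. "what-if", "why-not", "reliability"
    distance: float = 0.0  # assumed numeric distance from y_target


def pick_quartile_events(events: List[Event]) -> List[Event]:
    """Sort events by distance from y_target and return the events at the
    first-quartile, median, and third-quartile ranks (assumed rank rule).
    With fewer than three events the returned ranks repeat."""
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e.distance)
    last = len(ordered) - 1
    ranks = [last // 4, last // 2, last - last // 4]
    return [ordered[r] for r in ranks]


def build_storyline(parameters: Event, reliability: Event, fairness: Event,
                    what_if: List[Event], climax: Event,
                    why_not: List[Event], suggestion: Event
                    ) -> List[Tuple[str, Event]]:
    """Arrange events into the five stages of the pyramid model."""
    # Assumption: rising events approach y_target (distance decreasing),
    # falling events move away from it (distance increasing).
    rising = [reliability, fairness] + sorted(
        pick_quartile_events(what_if), key=lambda e: -e.distance)
    falling = sorted(pick_quartile_events(why_not), key=lambda e: e.distance)
    return ([("beginning", parameters)]
            + [("rising", e) for e in rising]
            + [("climax", climax)]
            + [("falling", e) for e in falling]
            + [("ending", suggestion)])
```

Keeping the stage label next to each event makes it straightforward to derive both the curve and the report from the same list.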
In step S5, a data story is generated: first a visual data story curve, then a readable data story report. The two complement each other.
The generated data story curve is shown in Fig. 4. It is drawn as follows:
abscissa: the development time of the storyline;
ordinate: the similarity to the y_target first found by the user;
curve type: pyramid model, whose apex is the y_target first found by the user;
point shape: indicates whether the character is of the same type as user t and whether it is a positive character.
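The drawing rules above can be illustrated by computing the points of the curve before handing them to any plotting library. The sketch below rests on assumptions: the names `CurvePoint`, `MARKERS`, `similarity`, `curve_points`, and `apex_index` are hypothetical, the event index stands in for development time, the mapping 1 / (1 + distance) is one arbitrary way to turn a distance from y_target into a similarity, and the marker symbols are placeholders for the two Boolean flags (same type as user t, positive character).

```python
from typing import List, NamedTuple, Sequence, Tuple


class CurvePoint(NamedTuple):
    x: float     # development time (here: position in the storyline)
    y: float     # similarity to the y_target first found by the user
    marker: str  # point shape encoding (same_type, positive)


# Hypothetical marker choice for the four flag combinations.
MARKERS = {
    (True, True): "o",
    (True, False): "s",
    (False, True): "^",
    (False, False): "x",
}


def similarity(distance: float) -> float:
    """Assumed mapping from distance to similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


def curve_points(events: Sequence[Tuple[float, bool, bool]]) -> List[CurvePoint]:
    """events: (distance, same_type, positive) tuples in storyline order."""
    return [CurvePoint(float(i), similarity(d), MARKERS[(same, pos)])
            for i, (d, same, pos) in enumerate(events)]


def apex_index(points: Sequence[CurvePoint]) -> int:
    """Index of the apex of the pyramid, i.e. the most similar point."""
    return max(range(len(points)), key=lambda i: points[i].y)
```

With this representation, the climax event (distance 0) has similarity 1 and therefore forms the apex of the pyramid.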
As can be seen from Fig. 4, the protagonist first performs a reliability test in the data story; this event is the inciting event. The character then acts repeatedly to find y_target; the events occurring at this stage are what-if analysis events, and the data story keeps advancing. When the story character finds y_target for the first time, the data story reaches its climax. The character then continues to act on the basis of having obtained y_target; the events occurring at this stage are why-not analysis events, y_target no longer changes, and the data story enters its falling stage. Finally, the story character stops acting, the data story reaches its ending, and the user receives a suggestion for making a decision or taking action.
The readable data story report consists of 3 columns and 6 rows, as shown in Table 1. The 3 columns are reliability, fairness, and solvability; the 6 rows are the input sample, the prediction result, the analysis method, the analysis result, the analysis conclusion, and the suggestion of the predictive analysis system S to user t.
Table 1. Structure of the readable data story report
[Table 1 is an image in the source: Figure BDA0004092280570000091]
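The 3-column, 6-row report structure can be represented as a nested mapping. The following is a minimal sketch, assuming plain-text cells; the names `COLUMNS`, `ROWS`, `empty_report`, and `render` are hypothetical, the third column is labelled "solvability" here, and the cell contents are left empty because the source does not reproduce them.

```python
from typing import Dict

COLUMNS = ("reliability", "fairness", "solvability")
ROWS = (
    "input sample",
    "prediction result",
    "analysis method",
    "analysis result",
    "analysis conclusion",
    "suggestion",
)


def empty_report() -> Dict[str, Dict[str, str]]:
    """Create a 6-row by 3-column report with empty cells."""
    return {row: {col: "" for col in COLUMNS} for row in ROWS}


def render(report: Dict[str, Dict[str, str]]) -> str:
    """Render the report as tab-separated text, one line per row."""
    lines = ["\t".join(("",) + COLUMNS)]
    for row in ROWS:
        lines.append("\t".join([row] + [report[row][col] for col in COLUMNS]))
    return "\n".join(lines)
```

A caller would fill individual cells, for example `report["analysis result"]["reliability"] = "stable"`, and then call `render(report)`.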
For the data-story narration of predictive analysis results, the invention provides a model-agnostic, post-hoc explanation data story method. The technical solution of the invention can serve as a new functional module in common data analysis software (such as SPSS, SAS, and Excel). At present, such software has no data story function; the data model, the training of the surrogate analysis model, the definition and verification of the formal script, and the Python toolkit of the data story method oriented to predictive analysis results can provide a theoretical basis and tool support for adding a data story module to that software.
Example 2
Embodiment 1 above provides a data story generation method oriented to predictive analysis results; correspondingly, this embodiment provides a data story generation system oriented to predictive analysis results. The system provided in this embodiment can implement the method of Embodiment 1 and may be realized in software, hardware, or a combination of both. For example, the system may include integrated or separate functional modules or functional units that perform the corresponding steps of the method of Embodiment 1. Because the system of this embodiment is substantially similar to the method embodiment, it is described only briefly here; for the relevant details, refer to the description of Embodiment 1. The system described in this embodiment is illustrative only.
The data story generation system oriented to predictive analysis results provided in this embodiment comprises:
a parameter generation module, configured to treat the predictive analysis system as a black box and generate story parameters based on the input parameters of user t;
a character generation module, configured to generate story characters based on the story parameters;
an event generation module, configured to generate story events for the story characters according to the story parameters;
a plot generation module, configured to generate a storyline from the story events according to a plurality of story stages;
a view generation module, configured to generate a visual data story curve according to the storyline;
and a report generation module, configured to generate a data story report according to the storyline.
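The way the six modules feed one another can be sketched as a small pipeline. This is an illustrative composition only: the class name `DataStorySystem`, its field names, and the choice to pass plain dictionaries and lists between modules are assumptions, and each module is injected as a callable so that any concrete implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class DataStorySystem:
    """Wires the six modules together; each field is one module."""
    generate_parameters: Callable[[dict], dict]        # parameter generation
    generate_characters: Callable[[dict], list]        # character generation
    generate_events: Callable[[list, dict], list]      # event generation
    generate_plot: Callable[[list], list]              # plot generation
    generate_view: Callable[[list], Any]               # view generation
    generate_report: Callable[[list], Any]             # report generation

    def run(self, user_input: dict) -> Tuple[Any, Any]:
        """Run the modules in order and return (story curve, story report)."""
        params = self.generate_parameters(user_input)
        characters = self.generate_characters(params)
        events = self.generate_events(characters, params)
        plot = self.generate_plot(events)
        return self.generate_view(plot), self.generate_report(plot)
```

Both the view and the report are derived from the same storyline, which mirrors the statement that the curve and the report are two complementary outputs.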
Example 3
This embodiment provides a processing device corresponding to the data story generation method oriented to predictive analysis results provided in Embodiment 1. The processing device may be a client-side device, such as a mobile phone, notebook computer, tablet computer, or desktop computer, that performs the method of Embodiment 1.
The processing device comprises a processor, a memory, a communication interface, and a bus; the processor, the memory, and the communication interface are connected through the bus and communicate with one another over it. The memory stores a computer program executable on the processor, and when the processor runs the computer program, it performs the data story generation method oriented to predictive analysis results provided in Embodiment 1.
In some embodiments, the memory may be a high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
In other embodiments, the processor may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or other general purpose processor, which is not limited herein.
Example 4
The data story generation method oriented to predictive analysis results of Embodiment 1 may be embodied as a computer program product, which may include a computer-readable storage medium carrying computer-readable program instructions for performing that method.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination of the preceding.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating a data story oriented to predictive analysis results, the method comprising the steps of:
treating the predictive analysis system as a black box, and generating story parameters based on input parameters of a user t;
generating a story character based on the story parameters;
generating a story event for the story character according to the story parameters;
generating a storyline from the story events according to a plurality of story stages;
and generating a visual data story curve and a data story report according to the story line.
2. The method for generating a data story oriented to predictive analysis results according to claim 1, wherein
treating the predictive analysis system as a black box to generate story parameters specifically comprises:
acquiring the input information t[X] of user t, the prediction result y_target expected by user t, and the actual prediction result t[y] returned to user t by the predictive analysis system, wherein X is the feature set and y is the target vector;
acquiring the feature subset X* of the feature set X for which bias needs to be detected, whose corresponding target vector is y*;
acquiring the complement of the feature subset X* as the feature subset for which bias need not be detected, together with its corresponding target vector;
acquiring the immutable feature subset X'' of the feature set X, whose corresponding target vector is y'';
acquiring the complement of the feature subset X'' as the mutable feature subset, together with its corresponding target vector;
taking the input information t[X], the prediction result t[y], the prediction result y_target expected by user t, the feature subset X* and its target vector y*, the complement of X* and its target vector, the feature subset X'' and its target vector y'', and the complement of X'' and its target vector as the story parameters used for generating story characters.
3. The method for generating a data story oriented to predictive analysis results according to claim 2, wherein
generating the story character according to the story parameters specifically comprises:
taking user t as the protagonist t, and determining the input information t[X] and the actual prediction result t[y] of protagonist t;
determining the characters of the same type as protagonist t and the characters of a different type, the criterion being whether a character's values on the feature subset X*, for which bias needs to be detected, are the same as those of protagonist t: if they are the same, the character is called a "character of the same type as the protagonist", otherwise a "character of a different type from the protagonist";
determining the positive characters t+ and the negative characters t- relative to protagonist t, the criterion being whether a character's prediction result is the same as the actual prediction result of protagonist t: if so, the character is called a "positive character t+", otherwise a "negative character t-".
4. The method for generating a data story oriented to predictive analysis results according to claim 3, wherein
generating story events for the story characters according to the story parameters comprises generating a reliability test event, a fairness test event, and a solvability test event;
generating the reliability test event means submitting the input information t[X] of protagonist t to the predictive analysis system multiple times and judging whether the prediction labels t[y]' corresponding to the actual prediction results t[y] output each time are the same, or fluctuate within a range smaller than a reliability threshold;
generating the fairness test event means judging whether the absolute value of the difference between the probability that the bias detection rule expression holds among the characters of the same type as the protagonist and the probability that it holds among all users is within a range smaller than a bias threshold;
the bias rule expression Bias is:
Bias = { |P((y* == y_target) | (X* == t[X=])) - P(y == y_target)| < ε_1 }
wherein Bias indicates whether the detected character features may be biased; t[X=] denotes the features, among those for which the protagonist wishes to detect bias, of the group of characters of the same type as the protagonist; ε_1 denotes the threshold of the acceptable range of bias; P((y* == y_target) | (X* == t[X=])) denotes the probability that the condition y* == y_target of the bias expression holds within the same-type character group; and P(y == y_target) denotes the probability that the condition holds among all characters;
generating the solvability test event means applying minimal changes to several feature attribute values in the mutable feature subset of protagonist t so as to achieve the prediction result y_target expected by the user.
5. The method for generating a data story oriented to predictive analysis results according to claim 4, wherein
generating the storyline from the story events according to a plurality of story stages comprises generating the storyline according to a beginning stage, a rising stage, a climax stage, a falling stage, and an ending stage:
in the beginning stage, story parameters are set, the set story parameters comprising the input information t[X] and the actual prediction result t[y] of protagonist t;
in the rising stage, the reliability test event is set as the "inciting event" at the starting point, and the fairness test result is plotted at the fairness test event position; the what-if analysis events performed before protagonist t first finds y_target are sorted by their distance from y_target, and the median event Q_2^b, the upper-quartile event Q_1^b, and the lower-quartile event Q_3^b are set;
in the climax stage, the climax event is set, namely the event in which the expected prediction result y_target of protagonist t is first found in the what-if analysis, or the event recommended by the predictive analysis system for protagonist t;
in the falling stage, why-not analysis is performed after protagonist t first finds y_target; the why-not analysis events are sorted by their distance from y_target, and the median event Q_2^a, the upper-quartile event Q_1^a, and the lower-quartile event Q_3^a are set;
in the ending stage, a suggestion event is set, comprising the suggestion made by the predictive analysis system to the protagonist.
6. The method for generating a data story oriented to predictive analysis results according to claim 5, wherein
generating the visual data story curve according to the storyline specifically comprises:
taking the development time of the storyline as the abscissa and the similarity to the expected prediction result y_target first found by user t as the ordinate, using points of different types to indicate whether a character is of the same type as protagonist t and whether it is a positive character, and drawing the curve according to the pyramid model with the point at which the expected prediction result y_target of user t is first found as the apex.
7. The method for generating a data story oriented to predictive analysis results according to claim 5, wherein
the columns of the data story report comprise reliability, fairness, and solvability, and the rows of the data story report comprise the input sample, the prediction result, the analysis method, the analysis result, the analysis conclusion, and the suggestion of the predictive analysis system to user t.
8. A predictive analysis result-oriented data story generation system, comprising:
a parameter generation module, configured to treat the predictive analysis system as a black box and generate story parameters based on the input parameters of user t;
a character generation module, configured to generate story characters based on the story parameters;
an event generation module, configured to generate story events for the story characters according to the story parameters;
a plot generation module, configured to generate a storyline from the story events according to a plurality of story stages;
a view generation module, configured to generate a visual data story curve according to the storyline;
and a report generation module, configured to generate a data story report according to the storyline.
9. A processing device comprising at least a processor and a memory, said memory having stored thereon a computer program, characterized in that the processor executes the steps of the predictive analysis result oriented data story generation method of any of claims 1 to 7 when running said computer program.
10. A computer storage medium having stored thereon computer readable instructions executable by a processor to implement the steps of the predictive analysis result oriented data story generation method of any of claims 1 to 7.
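As an illustration of the bias rule expression recited in claim 4, the fairness check can be sketched as a comparison of two empirical frequencies. The sketch makes several assumptions: the function names `bias_gap` and `within_bias_threshold` are hypothetical, each record is assumed to be a dictionary with a feature mapping under "X" and a prediction under "y", and probabilities are estimated as simple relative frequencies over the supplied records.

```python
from typing import Dict, List, Sequence


def bias_gap(records: List[dict], user_features: Dict[str, object],
             bias_features: Sequence[str], y_target: object) -> float:
    """Return |P(y == y_target | X* == t[X=]) - P(y == y_target)|,
    estimated as relative frequencies over the given records."""
    # Same-type group: records matching user t on every bias-sensitive feature.
    same_type = [r for r in records
                 if all(r["X"][f] == user_features[f] for f in bias_features)]
    if not records or not same_type:
        raise ValueError("need at least one record and one same-type record")
    p_same = sum(r["y"] == y_target for r in same_type) / len(same_type)
    p_all = sum(r["y"] == y_target for r in records) / len(records)
    return abs(p_same - p_all)


def within_bias_threshold(records: List[dict], user_features: Dict[str, object],
                          bias_features: Sequence[str], y_target: object,
                          epsilon_1: float) -> bool:
    """True when the gap is smaller than the bias threshold epsilon_1."""
    return bias_gap(records, user_features, bias_features, y_target) < epsilon_1
```

For example, if half of the same-type group but three quarters of all users obtain y_target, the gap is 0.25, so the expression holds for ε_1 = 0.3 and fails for ε_1 = 0.1.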

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310155761.3A CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310155761.3A CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Publications (1)

Publication Number Publication Date
CN116340752A true CN116340752A (en) 2023-06-27

Family

ID=86886765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310155761.3A Pending CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Country Status (1)

Country Link
CN (1) CN116340752A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573859A (en) * 2024-01-15 2024-02-20 杭州数令集科技有限公司 Data processing method, system and equipment for automatically advancing scenario and dialogue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination