CN116340752A - Predictive analysis result-oriented data story generation method and system - Google Patents


Publication number
CN116340752A
Authority
CN
China
Prior art keywords
story
generating
character
event
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310155761.3A
Other languages
Chinese (zh)
Inventor
朝乐门
张晨
靳庆文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority to CN202310155761.3A
Publication of CN116340752A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data story generation method and system for predictive analysis results. The method comprises the following steps: treating the predictive analysis system as a black box and generating story parameters based on the input parameters of user t; generating story characters based on the story parameters; generating story events for the story characters according to the story parameters; generating a storyline in multiple story stages according to the story events; and generating a visual data story curve and a data story report according to the storyline. The technical solution provides a new approach to testing three key problems in current big data applications (reliability, fairness and solvability), supports two analysis tasks (What-if analysis and Why-not analysis), is highly practical, and helps resolve the tension between model usability and interpretability in the era of big data.

Description

A data story generation method and system for predictive analysis results

Technical field

The present invention relates to the technical field of artificial intelligence, and in particular to a data story generation method and system for predictive analysis results.

Background

The interpretability and trustworthiness of predictive analysis results will be a focus of attention for future society. With the widespread deployment of automated decision-making applications such as personalized recommendation, autonomous driving, intelligent medical care and machine translation, people are paying increasing attention to the explainability of the moral, ethical and legal issues behind them. The difficulty in explaining predictive analysis results is that it is inappropriate to explain too much model-related theory and technology to non-professionals, and the implementation details of the model and the trade secrets behind it must not be disclosed.

Existing research focuses mainly on the interpretability of predictive algorithms and models. This has become a popular research topic and has gradually developed into a new field, interpretable machine learning. Interpretable machine learning has made considerable progress in methodology, key technologies and application development, and model-agnostic, local explanation techniques represented by the LIME algorithm can provide a theoretical basis for describing predictive analysis results as stories. However, "explaining an algorithm or model" and "explaining an analysis result" are related but distinct notions: interpretable machine learning is mainly concerned with the interpretability of machine learning algorithms and the models they train, not with explaining individual analysis results.

Therefore, how to provide a data storytelling scheme for predictive analysis results is a technical problem that currently needs to be solved.

Summary of the invention

In view of the above problems, the object of the present invention is to provide a data story generation method and system for predictive analysis results, which gives a data storytelling scheme for the results of a predictive analysis system without depending on that system's model, and thereby resolves the tension between usability and interpretability.

To achieve the above object, the present invention adopts the following technical solutions:

One aspect of the present invention provides a data story generation method for predictive analysis results, comprising the following steps:

treating the predictive analysis system as a black box, and generating story parameters based on the input parameters of user t;

generating story characters based on the story parameters;

generating story events for the story characters according to the story parameters;

generating a storyline in multiple story stages according to the story events;

generating a visual data story curve and a data story report according to the storyline.

Further, treating the predictive analysis system as a black box and generating story parameters specifically includes:

obtaining the input information t[X] of user t, the prediction result y_target expected by user t, and the actual prediction result t[y] returned by the predictive analysis system to user t, where X is the feature set and y is the target vector; obtaining the feature subset X* of X for which bias needs to be detected, with corresponding target vector y*; and taking the complement of X*, written $\overline{X^*}$, as the feature subset for which bias need not be detected, with corresponding target vector $\overline{y^*}$;

obtaining the immutable feature subset X" of X, with corresponding target vector y"; and taking the complement of X", written $\overline{X''}$, as the mutable feature subset, with corresponding target vector $\overline{y''}$;

using the input information t[X], the prediction result t[y], the prediction result y_target expected by user t, the feature subset X* and its target vector y*, the feature subset $\overline{X^*}$ and its target vector $\overline{y^*}$, the feature subset X" and its target vector y", and the feature subset $\overline{X''}$ and its target vector $\overline{y''}$ as the story parameters for generating story characters.
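The story parameters listed above can be pictured as one record per user. The following is a minimal sketch, not the patent's implementation: the class name `StoryParameters`, its field names and the sample loan-style values are all hypothetical, and the two complements are derived from the keys of t[X] on the assumption that every feature of X appears in the user's input.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class StoryParameters:
    """Hypothetical container for the story parameters of one user t."""
    t_X: Dict[str, Any]                 # t[X]: input submitted by user t
    t_y: Any                            # t[y]: prediction actually returned
    y_target: Any                       # prediction the user expected
    bias_features: FrozenSet[str]       # X*: features to check for bias
    immutable_features: FrozenSet[str]  # X": features the user cannot change

    @property
    def non_bias_features(self) -> FrozenSet[str]:
        """Complement of X* within the feature set X."""
        return frozenset(self.t_X) - self.bias_features

    @property
    def mutable_features(self) -> FrozenSet[str]:
        """Complement of X" within the feature set X."""
        return frozenset(self.t_X) - self.immutable_features


# Illustrative values only; none of them come from the patent.
params = StoryParameters(
    t_X={"gender": "F", "age": 40, "income": 30},
    t_y="reject",
    y_target="approve",
    bias_features=frozenset({"gender"}),
    immutable_features=frozenset({"gender", "age"}),
)
```

Deriving the complements instead of storing them keeps the four subsets consistent with each other by construction.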

Further, generating story characters according to the story parameters specifically includes:

taking user t as the protagonist t, and determining the protagonist's input information t[X] and actual prediction result t[y];

determining characters of the same type as the protagonist, t^=, and characters of a different type, t^≠: the criterion is whether a character takes the same values as the protagonist t on the feature subset X* for which bias needs to be detected; if so, it is called a "character of the same type as the protagonist, t^=", otherwise a "character of a different type from the protagonist, t^≠";

determining positive characters t+ and negative characters t- relative to the protagonist t: the criterion is whether a character's prediction result is the same as the protagonist's actual prediction result; if so, it is called a "positive character t+", otherwise a "negative character t-".
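The two criteria above are independent, so each character carries two labels. A minimal sketch, assuming characters are represented as feature dictionaries (the function name `classify_character` and the sample values are hypothetical):

```python
def classify_character(protagonist_X, protagonist_y, other_X, other_y, bias_features):
    """Label another character relative to the protagonist t.

    Returns a pair (type_label, polarity_label):
      type_label     is "same" when the character matches t on every feature
                     in X* (bias_features), otherwise "different";
      polarity_label is "positive" when the character received the same
                     prediction as t, otherwise "negative".
    """
    same_type = all(other_X[f] == protagonist_X[f] for f in bias_features)
    positive = other_y == protagonist_y
    return ("same" if same_type else "different",
            "positive" if positive else "negative")


# Illustrative values only.
t_X = {"gender": "F", "income": 30}
labels = classify_character(t_X, "reject", {"gender": "F", "income": 55}, "approve", {"gender"})
```

Here `labels` describes a character of the same type as the protagonist who nevertheless received a different prediction.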

Further, generating story events for the story characters according to the story parameters includes generating reliability test events, fairness test events and solvability test events. Generating a reliability test event means submitting the protagonist's input information t[X] to the predictive analysis system multiple times and judging whether the prediction labels t[y]' corresponding to the actual prediction results t[y] of the repeated runs are identical, or fluctuate within a range smaller than the reliability threshold. Generating a fairness test event means checking that the absolute value of the difference between the probability that the bias detection expression holds among characters of the same type as the protagonist, t^=, and the probability that it holds among all users is smaller than the deviation threshold.

The bias rule expression Bias is:

$$\mathrm{Bias} = \left\{\, \left| P\big((y^* == y_{target}) \mid (X^* == t[X^{=}])\big) - P(y == y_{target}) \right| < \varepsilon_1 \,\right\}$$

where Bias indicates whether the character features may be subject to bias; $t[X^{=}]$ denotes the features of the same-type character group t^=, which shares the protagonist's values on the features to be checked for bias; $\varepsilon_1$ is the acceptable threshold for bias; $P((y^* == y_{target}) \mid (X^* == t[X^{=}]))$ is the probability that the expression $y^* == y_{target}$ holds within the same-type character group t^=; and $P(y == y_{target})$ is the probability that the expression holds among all characters.

Generating a solvability test event means applying minimal changes to the values of multiple feature attributes in the protagonist's mutable feature subset $\overline{X''}$ so as to reach the user's expected prediction result y_target.
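The reliability and fairness checks described above can be sketched with empirical frequencies. This is an illustrative reading under stated assumptions, not the patent's implementation: the function names, the record layout (`{"X": ..., "y": ...}`) and the sample data are hypothetical, probabilities are estimated as simple relative frequencies over a list of records, and the reliability check covers only the "identical labels" case, not the threshold-based variant.

```python
def is_reliable(predict, t_X, runs=5):
    """Submit t[X] to the black-box `predict` several times; reliable if all labels agree."""
    labels = [predict(t_X) for _ in range(runs)]
    return len(set(labels)) == 1


def bias_rule_holds(records, t_X, bias_features, y_target, eps1):
    """Evaluate |P(y == y_target | X* == t[X*]) - P(y == y_target)| < eps1.

    `records` is a list of {"X": feature dict, "y": label}. The conditional
    probability is estimated over records matching t on every bias feature.
    """
    same_type = [r for r in records
                 if all(r["X"][f] == t_X[f] for f in bias_features)]
    if not records or not same_type:
        raise ValueError("need at least one record overall and one of the same type")
    p_same = sum(r["y"] == y_target for r in same_type) / len(same_type)
    p_all = sum(r["y"] == y_target for r in records) / len(records)
    return abs(p_same - p_all) < eps1


# Illustrative records only.
records = [
    {"X": {"gender": "F"}, "y": "approve"},
    {"X": {"gender": "F"}, "y": "reject"},
    {"X": {"gender": "M"}, "y": "approve"},
    {"X": {"gender": "M"}, "y": "reject"},
]
fair = bias_rule_holds(records, {"gender": "F"}, {"gender"}, "approve", eps1=0.1)
```

When the function returns `True`, the outcome frequency for the protagonist's group stays within `eps1` of the overall frequency, which is the condition the fairness test event records.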

Further, generating a storyline in multiple story stages according to the story events means generating the storyline in an initial stage, a rising stage, a climax stage, a falling stage and an ending stage, specifically including:

in the initial stage, setting the story parameters, which include the protagonist's input information t[X] and actual prediction result t[y];

in the rising stage, setting the reliability test event as the "inciting event" at the starting point, and plotting the fairness test result at the fairness test event position; then, given that the protagonist t has found y_target for the first time, sorting the multiple What-if analysis events by their distance from y_target and setting the median event $Q^b_2$, the upper quartile event $Q^b_1$ and the lower quartile event $Q^b_3$;

in the climax stage, setting the climax event, which is either the first event in the What-if analysis that matches the protagonist's expected prediction result y_target or an event recommended to the protagonist t by the predictive analysis system;

in the falling stage, given that the protagonist t has found y_target for the first time, performing Why-not analysis, sorting the multiple Why-not analysis events by their distance from y_target, and setting the median event $Q^a_2$, the upper quartile event $Q^a_1$ and the lower quartile event $Q^a_3$;

in the ending stage, setting the suggestion event, which includes the suggestions given to the protagonist by the predictive analysis system.
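Selecting the three representative events of the rising and falling stages reduces to sorting by distance and picking order statistics. The sketch below makes several assumptions that the text does not fix: events are sorted by ascending distance, the quartile events are taken at positions n//4, n//2 and 3n//4 of the sorted list, and the function name `quartile_events` and the `"d"` distance field are hypothetical.

```python
def quartile_events(events, distance):
    """Pick three representative events after sorting by distance to y_target.

    `distance` maps an event to its distance from y_target. Returns the events
    at the first-quartile, median and third-quartile positions of the sorted
    list (positional picks, so the results are actual events, not interpolated
    values).
    """
    if not events:
        raise ValueError("at least one analysis event is required")
    ranked = sorted(events, key=distance)
    n = len(ranked)
    return {"Q1": ranked[n // 4], "Q2": ranked[n // 2], "Q3": ranked[(3 * n) // 4]}


# Illustrative What-if events with made-up distances.
what_if_events = [{"id": i, "d": d}
                  for i, d in enumerate([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8])]
picks = quartile_events(what_if_events, lambda e: e["d"])
```

The same helper can serve both the What-if events of the rising stage and the Why-not events of the falling stage, since both are ranked by the same distance.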

Further, generating the visual data story curve according to the storyline specifically includes:

taking the development time of the storyline as the horizontal axis and the similarity to the first-found expected prediction result y_target of user t as the vertical axis; using different point types to indicate whether a character is of the same type as the protagonist t and whether it is a positive character; and drawing the curve according to the pyramid model, with the point at which y_target is first found as the apex.

Further, the columns of the data story report include reliability, fairness and solvability, and the rows of the data story report include input sample, prediction result, analysis method, analysis result, analysis conclusion, and the predictive analysis system's suggestions for user t.

Another aspect of the present invention provides a data story generation system for predictive analysis results, including:

a parameter generation module, for treating the predictive analysis system as a black box and generating story parameters based on the input parameters of user t;

a character generation module, for generating story characters based on the story parameters;

an event generation module, for generating story events for the story characters according to the story parameters;

a plot generation module, for generating a storyline in multiple story stages according to the story events;

a view generation module, for generating a visual data story curve according to the storyline;

a report generation module, for generating a data story report according to the storyline.

Another aspect of the present invention provides a processing device comprising at least a processor and a memory, the memory storing a computer program which, when run by the processor, implements the steps of the data story generation method for predictive analysis results.

Another aspect of the present invention provides a computer storage medium storing computer-readable instructions that can be executed by a processor to implement the steps of the data story generation method for predictive analysis results.

By adopting the above technical solutions, the present invention has the following advantages:

1. The present invention proposes a model-agnostic data storytelling method that can be applied to any predictive analysis algorithm and business. It is general and flexible, and can serve as a new functional module in common data analysis software such as SPSS, SAS and Excel.

2. The present invention provides a new approach to testing three key problems in current big data applications (reliability, fairness and solvability) and supports two analysis tasks, What-if analysis and Why-not analysis. It is highly practical and offers a new way to resolve the tension between model usability and interpretability in the era of big data.

3. The present invention presents results as data stories, providing a visual storyline curve and a readable data story report. These are highly readable and understandable and place no restriction on the knowledge level or professional field of the target audience.

4. The present invention supports different choices of story data source, and thus multiple application purposes. When the story data is provided by the operator of the business system (such as a merchant), the right of interpretation belongs to the operator, and the invention supports its business logic and purposes; when the data is generated by the users themselves, the invention supports user experience and interactive story generation; when the data is generated by a third-party institution or taken from a standard test data set, the invention supports third-party testing and evaluation scenarios.

5. The data stories generated by the present invention are visual data story curves and readable data story reports, which are readable not only by human users but also by machines. This supports the development of new functions and the upgrading of functional modules, giving strong extensibility.

6. The last stage of the data story supports commercial advertising and algorithmic recommendation, and therefore has strong prospects for commercial application.

Description of drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:

Fig. 1 is a flow chart of the data story generation method for predictive analysis results according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the feature matrix and target vector in the predictive analysis results according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the data story generation process for predictive analysis results according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a visual data story curve according to an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the described embodiments fall within the scope of protection of the present invention.

It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used here, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. It should also be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

Data storytelling is an important means of explaining predictive analysis results. From the perspective of how a subject takes in data, perception is the premise of cognition, and cognition is the continuation of perception. Data visualization addresses the perception of data, and data storytelling addresses its cognition. Data visualization is easy to understand, perceive and draw insight from, while data storytelling is easy to remember, comprehend and experience. Data storytelling is therefore well suited to scenarios in which predictive analysis results are explained to non-professionals, and helps earn their trust in those results.

One aspect of the embodiments of the present invention provides a data story generation method for predictive analysis results. The method is independent of the model itself and can be applied to any predictive analysis algorithm and business system. In the method, the predictive analysis system is first treated as a black box, and story parameters are generated based on the input parameters of user t; story characters are generated based on the story parameters; story events are generated for the story characters according to the story parameters; a storyline is generated in multiple story stages according to the story events; and a visual data story curve and a data story report are generated according to the storyline. Corresponding to this method, another aspect provides a data story generation system for predictive analysis results.

Example 1

This embodiment provides a data story generation method for predictive analysis results. As shown in Fig. 1, the method includes the following steps:

S1, treating the predictive analysis system as a black box, and generating story parameters based on the input parameters of user t;

S2, generating story characters based on the story parameters;

S3, generating story events for the story characters according to the story parameters;

S4, generating a storyline in multiple story stages according to the story events;

S5, generating a visual data story curve and a data story report according to the storyline.

In step S1, story parameters are generated.

The predictive analysis system S is treated as a black box, and its input and output are abstracted (or mapped) into a new relational schema R(X, y), where X and y are the names of the feature set and the target vector, respectively. Fig. 2 shows the data form of X and y. The story parameters are provided by the story audience (or the user of the predictive analysis system). The user input data and their meanings are as follows:

(1) t[X]: the input information submitted by user t to the predictive analysis system S;

(2) t[y]: the prediction result returned by the predictive analysis system S to user t;

(3) y_target: the prediction result that user t expects (or wants);

(4) X*: a subset of the feature set X ($X^* \subseteq X$), the subset of features for which user t needs to detect bias. $\overline{X^*}$ is the complement of X*, i.e. $\overline{X^*} = X \setminus X^*$. The target vectors corresponding to X* and $\overline{X^*}$ are denoted y* and $\overline{y^*}$.

(5) t[X^=]: the feature subset of the same-type character group t^=, which shares the protagonist's values on the features to be checked for bias.

(6) X": a subset of the feature set X ($X'' \subseteq X$), the subset of features that user t cannot change, such as gender, skin color, belief and ethnicity. $\overline{X''}$ is the complement of X", i.e. $\overline{X''} = X \setminus X''$. The target vectors corresponding to X" and $\overline{X''}$ are denoted y" and $\overline{y''}$.

The input information t[X], the prediction result t[y], the prediction result y_target expected by user t, the feature subset X* and its target vector y*, the feature subset $\overline{X^*}$ and its target vector $\overline{y^*}$, the feature subset X" and its target vector y", and the feature subset $\overline{X''}$ and its target vector $\overline{y''}$ serve as the story parameters used to generate story characters.

In step S2, story characters are generated. Story characters are divided into the protagonist, characters of the same or a different type as the protagonist, and positive or negative characters relative to the protagonist, as follows:

(1) Protagonist t: a specific user t of the predictive analysis system S, whose feature information and prediction result are t[X] and t[y] respectively. The values of t[X] and t[y] may be the user's current input, a historical input, or the default values of the predictive analysis system S.

(2) Characters of the same type as the protagonist, t^=, and characters of a different type, t^≠: the criterion is whether a character takes the same values as the protagonist on the feature subset X* in which bias may appear. If so, it is called a "character of the same type as the protagonist"; otherwise it is a "character of a different type from the protagonist".

A character of the same type, t^=, is generated as follows: X* takes the same values as t, and the complement attributes $\overline{X^*}$ take random values within the domains of the corresponding attributes:

$$t^{=} = \{\, z \mid z[X^*] = t[X^*] \wedge z[\overline{X^*}] \text{ is a random value} \,\}$$

A character of a different type, t^≠, is generated as follows: X* takes values different from t, and the complement attributes $\overline{X^*}$ take random values within the domains of the corresponding attributes:

$$t^{\neq} = \{\, z \mid z[X^*] \neq t[X^*] \wedge z[\overline{X^*}] \text{ is a random value} \,\}$$

(3) Positive characters t+ and negative characters t- relative to the protagonist t: the criterion is whether a character's prediction result (or classification result) is the same as the protagonist's. If so, it is called a "positive character t+"; otherwise it is a "negative character t-".
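The generation rule for same-type and different-type characters in item (2) can be sketched as random sampling over attribute domains. This is a minimal illustration under assumptions: the function name `generate_character`, the `domains` mapping (each feature to a list of admissible values) and the sample values are hypothetical, and "different on X*" is read as differing from t on at least one bias feature.

```python
import random


def generate_character(t_X, bias_features, domains, same_type, rng):
    """Generate one synthetic character z relative to the protagonist t.

    Features outside X* (bias_features) get a random value from their domain.
    On X*, a same-type character copies t; a different-type character is
    forced to differ from t on one randomly chosen bias feature.
    """
    z = {f: rng.choice(domains[f]) for f in t_X if f not in bias_features}
    if same_type:
        z.update({f: t_X[f] for f in bias_features})
        return z
    for f in bias_features:
        z[f] = rng.choice(domains[f])
    forced = rng.choice(sorted(bias_features))  # requires a non-empty X*
    alternatives = [v for v in domains[forced] if v != t_X[forced]]
    if not alternatives:
        raise ValueError(f"domain of {forced!r} has no value different from t")
    z[forced] = rng.choice(alternatives)
    return z


# Illustrative domains and protagonist only.
rng = random.Random(0)
t_X = {"gender": "F", "income": 30}
domains = {"gender": ["F", "M"], "income": [10, 30, 50, 70]}
same = generate_character(t_X, {"gender"}, domains, same_type=True, rng=rng)
diff = generate_character(t_X, {"gender"}, domains, same_type=False, rng=rng)
```

Passing an explicit `random.Random` instance keeps generated characters reproducible, which helps when the same story has to be regenerated for review.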

Relative to the protagonist t, the positive character $t^{+}$ is defined and generated as follows:

Among the historical records whose value of y (or classification result) is the same as t[y], k (k > 0) nearest-neighbor samples are randomly selected as positive characters $t^{+}$. If no positive character $t^{+}$ can be found in the historical records, one is generated by fine-tuning the mutable attributes of the protagonist t:

[Formula image BDA0004092280570000071]

where p is the number of mutable attributes of the protagonist t (the symbol of the mutable attribute set appears in the original as images BDA0004092280570000072 and BDA0004092280570000073); $x_j$ is a member attribute of the mutable attribute set of user t; and $x_j$ and $x'_j$ are the value of the j-th mutable attribute of the protagonist t and its fine-tuned value, respectively.

Relative to the protagonist t, the negative character $t^{-}$ is defined and generated as follows:

Among the historical records whose value of y (or classification result) differs from t[y], k (k > 0) nearest-neighbor samples are selected as negative characters $t^{-}$. If no negative character $t^{-}$ can be found in the historical records, one is generated as follows:

[Formula image BDA0004092280570000074]

where p is the number of mutable attributes of the protagonist t (the symbol of the mutable attribute set appears in the original as images BDA0004092280570000075 and BDA0004092280570000076); $x_j$ is a member attribute of the mutable attribute set of user t; and $x_j$ and $x'_j$ are the value of the j-th mutable attribute of the protagonist t and its fine-tuned value, respectively.
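
The character definitions above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `make_character` and `nearest_characters`, the dictionary and tuple data layouts, and the use of Euclidean distance are assumptions made for illustration. The sketch also takes the k closest records deterministically, and for different-type characters it changes every feature in X*, which is a stricter reading of z[X*] ≠ t[X*]. The fine-tuning fallback is not implemented, because its formula is available only as an image.

```python
import math
import random


def make_character(protagonist, domains, bias_features, same_type, rng):
    """Build one synthetic character z from the protagonist t.

    protagonist   -- dict mapping feature name to value, i.e. t[X]
    domains       -- dict mapping feature name to its list of admissible values
    bias_features -- names of the features in X* (checked for bias)
    same_type     -- True for a same-type character, False for a different-type one
    rng           -- random.Random instance, passed in for reproducibility
    """
    z = {}
    for name, values in domains.items():
        if name not in bias_features:
            z[name] = rng.choice(values)  # complement of X*: random value
        elif same_type:
            z[name] = protagonist[name]   # z[X*] = t[X*]
        else:
            # Assumption: every feature in X* is changed (stricter than "!=").
            z[name] = rng.choice([v for v in values if v != protagonist[name]])
    return z


def nearest_characters(t_x, t_y, history, k, positive):
    """Pick the k records closest to t_x whose label equals t_y (positive
    characters) or differs from t_y (negative characters).

    history -- list of (numeric feature vector, label) pairs
    Returns [] when no such record exists; the fine-tuning fallback
    described in the text is not implemented here.
    """
    pool = [(x, y) for x, y in history if (y == t_y) == positive]
    pool.sort(key=lambda record: math.dist(record[0], t_x))
    return pool[:k]
```

Passing a seeded `random.Random` keeps the generated characters reproducible, which matters if the same story is to be regenerated later.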

In step S3, story events are generated. The story events include a reliability test event, a fairness test event, and a solvability test event.

Reliability refers to the consistency of the prediction results obtained when the same protagonist t repeats the same action several times. Fairness refers to whether discrimination or bias exists among the characters $t^{=}$ of the same type as the protagonist t, that is, whether the bias rule (Bias_benchmark) is satisfied. Solvability refers to whether a specific user can reach the expected prediction result $y_{target}$ by modifying his or her mutable feature set (image BDA0004092280570000078 in the original).

(1) The reliability test event is generated as follows: the feature information t[X] of user t is submitted to the predictive analysis system S n times (n ≥ 2), and the corresponding labels t[y]' are checked to see whether they are equal or whether their fluctuation stays within a negligible reliability threshold $\varepsilon_2$, namely:

[Formula image BDA0004092280570000077]

(2) The fairness test event is generated as follows:

The bias rule expression Bias requires that the absolute value of the difference between the probability of obtaining $y_{target}$ within the user group $t^{=}$ and the probability of obtaining $y_{target}$ among all users lie within the acceptable range $\varepsilon_1$:

$$\mathrm{Bias} = \{\, |P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}])) - P(y == y_{target})| < \varepsilon_1 \,\}$$

where Bias indicates whether the character features under test may be biased; $t[X^{=}]$ denotes the features of the same-type character group $t^{=}$ that coincide with the features in which the protagonist wishes to detect bias; $\varepsilon_1$ is the threshold of the acceptable range of bias; $P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]))$ is the probability of obtaining $y_{target}$ within the same-type character group $t^{=}$; and $P(y == y_{target})$ is the probability of obtaining $y_{target}$ among all characters.

The bias expression Bias has the following meaning: compute the difference between the probability of obtaining the prediction result expected by user t based on the features under test that may be biased and the probability of obtaining it based on all features. If the absolute value of this difference is smaller than the threshold $\varepsilon_1$, the features under test are considered free of bias.

(3) The solvability test event is generated as follows:

For user t, the smallest possible change is made to his or her mutable features (image BDA0004092280570000081 in the original) so as to help the user reach the target $y_{target}$. The specific method is:

[Formula image BDA0004092280570000082]

where p is the number of mutable attributes of the protagonist t (the symbol of the mutable attribute set appears in the original as images BDA0004092280570000083 and BDA0004092280570000084); $x_j$ is a member attribute of the mutable attribute set of t; and $x_j$ and $x'_j$ are the value of the j-th mutable attribute of the protagonist t and its fine-tuned value, respectively.
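
The three test events can be sketched in Python as follows. This is a hedged illustration rather than the method of the patent. The function names, the `predict` callable standing in for the black-box system S, and the data layouts are all hypothetical. The reliability criterion (spread of the outputs not exceeding ε2) and the solvability objective (sum of absolute changes over the mutable attributes, found by exhaustive search) are assumptions, since the corresponding formulas appear only as images in the original.

```python
import itertools


def reliability_test(predict, features, n=5, eps2=0.0):
    """Submit the same features n times; reliable if the outputs agree
    exactly (labels) or their spread does not exceed eps2 (numbers)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    outputs = [predict(features) for _ in range(n)]
    if all(isinstance(o, (int, float)) for o in outputs):
        return max(outputs) - min(outputs) <= eps2, outputs
    return all(o == outputs[0] for o in outputs), outputs


def fairness_test(records, protagonist, bias_features, y_target, eps1):
    """Evaluate |P(y == y_target | same type) - P(y == y_target)| < eps1.

    records -- list of (feature dict, predicted label) pairs
    """
    group = [y for x, y in records
             if all(x[f] == protagonist[f] for f in bias_features)]
    if not group:
        raise ValueError("no same-type characters in records")
    p_group = sum(y == y_target for y in group) / len(group)
    p_all = sum(y == y_target for _, y in records) / len(records)
    gap = abs(p_group - p_all)
    return gap < eps1, gap


def solvability_test(predict, features, mutable_domains, y_target):
    """Search all candidate values of the mutable (numeric) features and
    return the cheapest change that makes predict() return y_target."""
    names = list(mutable_domains)
    best, best_cost = None, float("inf")
    for combo in itertools.product(*(mutable_domains[n] for n in names)):
        candidate = dict(features)
        candidate.update(zip(names, combo))
        cost = sum(abs(candidate[n] - features[n]) for n in names)
        if cost < best_cost and predict(candidate) == y_target:
            best, best_cost = candidate, cost
    return best, best_cost
```

Exhaustive search is practical only for small candidate domains; for larger ones a counterfactual search heuristic would be needed, which this sketch does not attempt.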

In step S4, the storyline is generated. The storyline follows a pyramid model with five stages: the initial stage, the rising stage, the climax stage, the falling stage, and the ending stage, as shown in Figure 3.

(1) Initial stage: contains a single event, namely the setting of the story parameters, which include the user's feature information and the prediction result of the predictive analysis system S.

(2) Rising stage: contains the events that occur before the user finds $y_{target}$, of three kinds:

The reliability test event is set at the starting point as the "inciting event";

The fairness test result is plotted at the position of the fairness test event;

The remaining length of the rising stage is divided into 4 equal parts. User t's what-if analysis events are sorted by their distance from $y_{target}$, and the median event ($Q^{b}_{2}$), the upper quartile event ($Q^{b}_{1}$), and the lower quartile event ($Q^{b}_{3}$) are placed along it, representing the various attempts and efforts that user t made before finding $y_{target}$.

(3) Climax stage: contains a single event, namely the event recommended by the predictive analysis system S, or the first event in the what-if analysis that yields the prediction result $y_{target}$ expected by user t.

(4) Falling stage: contains at least three events. Once user t has found $y_{target}$ for the first time, a why-not analysis is carried out to analyze why users who have already succeeded did so. The length of the falling stage is divided into 4 equal parts. User t's why-not analysis events are sorted by their distance from $y_{target}$, and the median event ($Q^{a}_{2}$), the upper quartile event ($Q^{a}_{1}$), and the lower quartile event ($Q^{a}_{3}$) are placed along it, representing the various causal analyses that user t performs after finding $y_{target}$.

(5) Ending stage: contains a single event, namely the suggestion given to user t by the predictive analysis system S. The suggestion may be determined by the business logic of the predictive analysis system S, for example a corresponding commercial advertisement or algorithmic recommendation.
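
The five-stage layout can be sketched as follows. The helper names `quartile_events` and `build_storyline` are hypothetical, and the nearest-rank rule used to pick the quartile events is an assumption: the text says only that the events are sorted by distance from y_target. The sketch sorts in ascending order of distance and does not decide which end of that ordering the original calls the "upper" quartile.

```python
import math


def quartile_events(events, distance):
    """Return the events at the 1/4, 1/2 and 3/4 positions (nearest-rank
    rule) after sorting by distance to y_target in ascending order."""
    if len(events) < 3:
        raise ValueError("need at least three events")
    ordered = sorted(events, key=distance)
    n = len(ordered)
    return [ordered[math.ceil(q * n / 4) - 1] for q in (1, 2, 3)]


def build_storyline(setup, reliability, fairness, what_if,
                    climax, why_not, advice, distance):
    """Arrange events into the five stages of the pyramid model."""
    return {
        "initial": [setup],
        "rising": [reliability, fairness, *quartile_events(what_if, distance)],
        "climax": [climax],
        "falling": quartile_events(why_not, distance),
        "ending": [advice],
    }
```

Any object can serve as an event as long as `distance` maps it to a number, so the same helper works for what-if and why-not events alike.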

In step S5, the data story is generated. This involves generating a visual data story curve and a readable data story report, which complement each other.

The generated data story curve is shown in Figure 4. It is drawn as follows:

·Abscissa: the development time of the storyline;

·Ordinate: the similarity to the $y_{target}$ first found by the user;

·Curve type: pyramid model, whose apex is the $y_{target}$ first found by the user;

·Point shape: indicates whether a character is of the same type as user t and whether it is a positive character.

As Figure 4 shows, the protagonist first carries out the reliability test in the data story, and this event serves as the inciting event. The character then keeps acting in order to find $y_{target}$; the events in this stage are what-if analysis events, and the data story advances steadily. When the character finds $y_{target}$ for the first time, the data story reaches its climax. The character then continues to act on the basis of the $y_{target}$ obtained; the events in this stage are why-not analysis events, $y_{target}$ no longer changes, and the data story enters its falling period. Finally the character stops acting, the data story comes to an end, and the user obtains a suggestion for making a decision or taking action.
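
A minimal sketch of how the points of such a curve might be assembled is given below. It is an assumption-laden illustration: the function names, the [0, 1] similarity scale with 1.0 at the apex, the ordering of events within each stage, and the marker symbols are all hypothetical. Actual plotting is left to whichever charting library is used.

```python
def story_curve(start_sim, rising_sims, falling_sims, end_sim):
    """Return (time, stage, similarity) points for a pyramid-shaped curve.

    Similarities are assumed to lie in [0, 1], with 1.0 meaning that the
    event reaches y_target. Time is the event's position in the story.
    """
    staged = [("initial", start_sim)]
    staged += [("rising", s) for s in sorted(rising_sims)]
    staged.append(("climax", 1.0))
    staged += [("falling", s) for s in sorted(falling_sims, reverse=True)]
    staged.append(("ending", end_sim))
    return [(time, stage, sim) for time, (stage, sim) in enumerate(staged)]


def marker(same_type, positive):
    """Map a character's type and polarity to a point shape (assumed symbols)."""
    shapes = {(True, True): "o", (True, False): "x",
              (False, True): "s", (False, False): "^"}
    return shapes[(same_type, positive)]
```

Sorting the rising events upward and the falling events downward is what gives the curve its pyramid shape in this sketch; the original may order the events differently.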

The readable data story report consists of 3 columns and 6 rows, as shown in Table 1. The 3 columns are reliability, fairness, and solvability; the 6 rows are the input sample, the prediction result, the analysis method, the analysis result, the analysis conclusion, and the suggestion of the predictive analysis system S to user t.

Table 1 Structure of the readable data story report

[Table image BDA0004092280570000091]
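
The 3-column, 6-row layout described above can be represented by a small data structure. The sketch below is illustrative only: the names `COLUMNS`, `ROWS`, and `build_report` and the cell-addressing scheme are assumptions, and the cell contents of Table 1 are not reproduced because the table is available only as an image.

```python
COLUMNS = ("Reliability", "Fairness", "Solvability")
ROWS = (
    "Input sample",
    "Prediction result",
    "Analysis method",
    "Analysis result",
    "Analysis conclusion",
    "Suggestion of system S to user t",
)


def build_report(cells):
    """Return the report as a list of rows: [row label, cell, cell, cell].

    cells -- dict mapping (row label, column label) to the cell text;
             missing cells are left empty.
    """
    return [[row] + [cells.get((row, col), "") for col in COLUMNS]
            for row in ROWS]
```

For example, `build_report({("Analysis method", "Fairness"): "Bias expression"})` returns six rows of four entries each, with only the fairness cell of the analysis-method row filled in.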

To describe predictive analysis results as data stories, the present invention proposes a model-agnostic, post-hoc explanatory data storytelling method for predictive analysis. The technical solution of the present invention can serve as a new functional module in common data analysis software (such as SPSS, SAS, and Excel). At present, data analysis software does not provide data storytelling functions. The data model, the training of the surrogate analysis model, the definition and verification of the formal script, and the Python toolkit in the proposed data storytelling method for predictive analysis results can provide a theoretical basis and tool support for adding such a module to this kind of software.

Embodiment 2

Embodiment 1 above provides a data story generation method oriented to predictive analysis results. Correspondingly, this embodiment provides a data story generation system oriented to predictive analysis results. The system provided in this embodiment can carry out the method of Embodiment 1 and can be implemented in software, hardware, or a combination of the two. For example, the system may include integrated or separate functional modules or functional units that execute the corresponding steps of the method of Embodiment 1. Because the system of this embodiment is essentially similar to the method embodiment, it is described only briefly here; for related details, refer to the description of Embodiment 1. The system embodiment provided here is merely illustrative.

The data story generation system oriented to predictive analysis results provided in this embodiment includes:

a parameter generation module, configured to treat the predictive analysis system as a black box and generate story parameters based on the input parameters of user t;

a character generation module, configured to generate story characters based on the story parameters;

an event generation module, configured to generate story events for the story characters according to the story parameters;

a plot generation module, configured to generate a storyline in a plurality of story stages according to the story events;

a view generation module, configured to generate a visual data story curve according to the storyline;

a report generation module, configured to generate a data story report according to the storyline.
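
The way the six modules hand data to one another can be sketched as a simple pipeline. The class name `DataStorySystem`, its field names, and the callable signatures are hypothetical; the original does not specify any programming interface, so this only mirrors the order of the modules listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DataStorySystem:
    """Each field is a callable standing in for one module of the system."""
    parameter_module: Callable[[Any], Any]
    character_module: Callable[[Any], Any]
    event_module: Callable[[Any, Any], Any]
    plot_module: Callable[[Any], Any]
    view_module: Callable[[Any], Any]
    report_module: Callable[[Any], Any]

    def run(self, user_input):
        """Run the modules in order and return (story curve, story report)."""
        params = self.parameter_module(user_input)
        characters = self.character_module(params)
        events = self.event_module(params, characters)
        storyline = self.plot_module(events)
        return self.view_module(storyline), self.report_module(storyline)
```

Keeping each module as an injected callable reflects the statement that the modules may be integrated or separate, and makes each one replaceable in isolation.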

Embodiment 3

This embodiment provides a processing device corresponding to the data story generation method oriented to predictive analysis results provided in Embodiment 1. The processing device may be a client-side processing device, such as a mobile phone, a notebook computer, a tablet computer, or a desktop computer, that executes the method of Embodiment 1.

The processing device includes a processor, a memory, a communication interface, and a bus. The processor, the memory, and the communication interface are connected through the bus to communicate with one another. The memory stores a computer program that can run on the processor, and when the processor runs the computer program it executes the data story generation method oriented to predictive analysis results provided in Embodiment 1.

In some embodiments, the memory may be a high-speed random access memory (RAM: Random Access Memory) and may also include a non-volatile memory, for example at least one disk memory.

In other embodiments, the processor may be any type of general-purpose processor, such as a central processing unit (CPU) or a digital signal processor (DSP), which is not limited here.

Embodiment 4

The data story generation method oriented to predictive analysis results of Embodiment 1 may be embodied as a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for executing the data story generation method oriented to predictive analysis results described in Embodiment 1.

The computer-readable storage medium may be a tangible device that holds and stores instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination of the foregoing.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating a data story oriented to predictive analysis results, characterized by comprising the following steps:
treating the predictive analysis system as a black box, and generating story parameters based on input parameters of a user t;
generating story characters based on the story parameters;
generating story events for the story characters according to the story parameters;
generating a storyline in a plurality of story stages according to the story events;
and generating a visual data story curve and a data story report according to the storyline.
2. The method for generating a data story oriented to predictive analysis results according to claim 1, characterized in that
treating the predictive analysis system as a black box and generating story parameters specifically comprises:
acquiring the input information t[X] of user t, the prediction result $y_{target}$ expected by user t, and the actual prediction result t[y] returned to user t by the predictive analysis system, where X is the feature set and y is the target vector;
acquiring the feature subset $X^{*}$ of the feature set X for which bias is to be detected, whose corresponding target vector is $y^{*}$;
acquiring the complement of the feature subset $X^{*}$ as the feature subset for which bias need not be detected [image FDA0004092280560000011], whose corresponding target vector is [image FDA0004092280560000012];
acquiring the immutable feature subset X'' of the feature set X, whose corresponding target vector is y'';
acquiring the complement of the feature subset X'' as the mutable feature subset [image FDA0004092280560000013], whose corresponding target vector is [image FDA0004092280560000014];
and taking the input information t[X], the prediction result t[y], the prediction result $y_{target}$ expected by user t, the feature subset $X^{*}$ and its target vector $y^{*}$, the feature subset [image FDA0004092280560000015] and its target vector [image FDA0004092280560000016], the feature subset X'' and its target vector y'', and the feature subset [image FDA0004092280560000017] and its target vector [image FDA0004092280560000018] as the story parameters used for generating the story characters.
3. The method for generating a data story oriented to predictive analysis results according to claim 2, characterized in that
generating the story characters according to the story parameters specifically comprises:
taking the user t as the protagonist t, and determining the input information t[X] and the actual prediction result t[y] of the protagonist t;
determining characters $t^{=}$ of the same type as the protagonist t and characters $t^{\neq}$ of a different type, the criterion being whether a character is the same as the protagonist t on the feature subset $X^{*}$ for which bias is to be detected; if so, the character is called a "character $t^{=}$ of the same type as the protagonist", and otherwise a "character $t^{\neq}$ of a different type from the protagonist";
determining positive characters $t^{+}$ and negative characters $t^{-}$ relative to the protagonist t, the criterion being whether a character's prediction result is the same as the actual prediction result of the protagonist t; if so, the character is called a "positive character $t^{+}$", and otherwise a "negative character $t^{-}$".
4. The method for generating a data story oriented to predictive analysis results according to claim 3, characterized in that
generating story events for the story characters according to the story parameters comprises generating a reliability test event, a fairness test event, and a solvability test event;
generating the reliability test event means submitting the input information t[X] of the protagonist t to the predictive analysis system several times and judging whether the prediction labels t[y]' corresponding to the actual prediction results t[y] output each time are the same or fluctuate within a range smaller than a reliability threshold;
generating the fairness test event means judging whether the absolute value of the difference between the probability that the bias detection rule expression holds among the characters $t^{=}$ of the same type as the protagonist and the probability that it holds among all users lies within a range smaller than a bias threshold;
the bias rule expression Bias is:
$$\mathrm{Bias} = \{\, |P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}])) - P(y == y_{target})| < \varepsilon_1 \,\}$$
where Bias indicates whether the character features under test may be biased; $t[X^{=}]$ denotes the features of the same-type character group $t^{=}$ that coincide with the features in which the protagonist wishes to detect bias; $\varepsilon_1$ is the threshold of the acceptable range of bias; $P((y^{*} == y_{target}) \mid (X^{*} == t[X^{=}]))$ is the probability that the bias expression Bias holds in the same-type character group $t^{=}$; and $P(y == y_{target})$ is the probability that the bias expression Bias holds among all characters;
generating the solvability test event means applying a minimum change to several feature attribute values of the mutable feature subset [image FDA0004092280560000021] of the protagonist t so as to achieve the prediction result $y_{target}$ expected by the user.
5. The method for generating a data story oriented to predictive analysis results according to claim 4, characterized in that
generating the storyline in a plurality of story stages according to the story events comprises generating the storyline over an initial stage, a rising stage, a climax stage, a falling stage, and an ending stage:
in the initial stage, the story parameters are set, the set story parameters comprising the input information t[X] and the actual prediction result t[y] of the protagonist t;
in the rising stage, the reliability test event is set at the starting point as the "inciting event", and the fairness test result is plotted at the position of the fairness test event; for the period before the protagonist t finds $y_{target}$ for the first time, the what-if analysis events are sorted by their distance from $y_{target}$, and the median event $Q^{b}_{2}$, the upper quartile event $Q^{b}_{1}$, and the lower quartile event $Q^{b}_{3}$ are set;
in the climax stage, the climax event is set, namely the first event in the what-if analysis that yields the prediction result $y_{target}$ expected by the protagonist t, or the event recommended to the protagonist t by the predictive analysis system;
in the falling stage, once the protagonist t has found $y_{target}$ for the first time, a why-not analysis is carried out, the why-not analysis events are sorted by their distance from $y_{target}$, and the median event $Q^{a}_{2}$, the upper quartile event $Q^{a}_{1}$, and the lower quartile event $Q^{a}_{3}$ are set;
in the ending stage, the suggestion event is set, which comprises the suggestion given to the protagonist by the predictive analysis system.
6. The method for generating a data story oriented to predictive analysis results according to claim 5, characterized in that
generating the visual data story curve according to the storyline specifically comprises:
taking the development time of the storyline as the abscissa and the similarity to the expected prediction result $y_{target}$ first found by user t as the ordinate, using different types of points to indicate whether a character is of the same type as the protagonist t and whether it is a positive character, and drawing the curve according to the pyramid model with the point at which user t first finds the expected prediction result $y_{target}$ as the apex.
7. The method for generating a data story oriented to predictive analysis results according to claim 5, characterized in that
the columns of the data story report comprise reliability, fairness, and solvability, and the rows of the data story report comprise the input sample, the prediction result, the analysis method, the analysis result, the analysis conclusion, and the suggestion of the predictive analysis system to user t.
8. A data story generation system oriented to predictive analysis results, characterized by comprising:
a parameter generation module, configured to treat the predictive analysis system as a black box and generate story parameters based on input parameters of a user t;
a character generation module, configured to generate story characters based on the story parameters;
an event generation module, configured to generate story events for the story characters according to the story parameters;
a plot generation module, configured to generate a storyline in a plurality of story stages according to the story events;
a view generation module, configured to generate a visual data story curve according to the storyline;
and a report generation module, configured to generate a data story report according to the storyline.
9. A processing device comprising at least a processor and a memory, said memory having stored thereon a computer program, characterized in that the processor executes the steps of the predictive analysis result oriented data story generation method of any of claims 1 to 7 when running said computer program.
10. A computer storage medium having stored thereon computer readable instructions executable by a processor to implement the steps of the predictive analysis result oriented data story generation method of any of claims 1 to 7.
CN202310155761.3A 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system Pending CN116340752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310155761.3A CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310155761.3A CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Publications (1)

Publication Number Publication Date
CN116340752A true CN116340752A (en) 2023-06-27

Family

ID=86886765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310155761.3A Pending CN116340752A (en) 2023-02-23 2023-02-23 Predictive analysis result-oriented data story generation method and system

Country Status (1)

Country Link
CN (1) CN116340752A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573859A (en) * 2024-01-15 2024-02-20 杭州数令集科技有限公司 Data processing method, system and equipment for automatically advancing scenario and dialogue

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination