CN116484805B - Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis

Info

Publication number
CN116484805B
Authority
CN
China
Legal status
Active
Application number
CN202310502167.7A
Other languages
Chinese (zh)
Other versions
CN116484805A (en)
Inventor
胡若云
姚冰峰
郭兰兰
郭大琦
夏霖
唐健毅
张潇匀
刘铭
楼洁妮
陈洲泓
包挺华
潘鑫
金红霞
张磊
万志锦
Current Assignee
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Application filed by State Grid Zhejiang Electric Power Co Ltd and Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202310502167.7A
Publication of CN116484805A
Application granted
Publication of CN116484805B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G06F40/30 Semantic analysis
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The present invention provides an intelligent cleaning and processing method for power reports that combines a knowledge graph with semantic analysis. The method comprises: constructing a knowledge graph from the triple relationships of the configured knowledge information; taking the words that have a data attribute as first words and the other words as second words; generating a data requirement list from the calculation requirements the user has configured for the current power report, and selecting, based on that list, the first words with the corresponding analysis attributes as third words; verifying the correctness of the third words based on a cleaning verification strategy and historical data; determining a cleaning processing strategy for each third word according to its verification doubt type, which is either an unreasonable doubt type or a pending-verification doubt type; and inputting the third words into a preset model to calculate the cleaned analysis data, from which, together with the knowledge graph, the corresponding processing data is obtained.

Description

Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis

Technical field

The invention relates to data processing technology, and in particular to an intelligent cleaning and processing method for power reports that combines a knowledge graph with semantic analysis.

Background art

With the digital transformation of power grid enterprises, paper power reports are gradually being converted into electronic power reports for data aggregation. Power reports generally contain statistics across several dimensions, such as residential, industrial, and industrial-park power consumption, and are an important basis for data analysis at power grid companies.

Because there are many power reports and each contains a large amount of data, the total data volume is large. When staff analyze power reports, they often have to read through many of them and cannot quickly retrieve the data relevant to their needs, so the useful data in power reports needs to be organized. In the prior art, power-report data is usually extracted and organized manually. For large volumes of reports, however, this is an enormous workload, and unavoidable human error introduces mistakes into the organized data.

How to intelligently clean power reports and automatically verify the cleaned data has therefore become an urgent problem.

Summary of the invention

Embodiments of the present invention provide an intelligent cleaning and processing method for power reports that combines a knowledge graph with semantic analysis. The method can intelligently clean power reports and automatically verify the cleaned data, and can also analyze the data automatically according to the needs of staff to obtain the corresponding analysis data.

A first aspect of the embodiments of the present invention provides an intelligent cleaning and processing method for power reports that combines a knowledge graph with semantic analysis, comprising:

receiving knowledge information configured by a user for the intelligent cleaning and processing of power reports, and constructing a corresponding knowledge graph from the triple relationships of that knowledge information, the knowledge graph comprising a plurality of knowledge nodes;

performing word segmentation on the sentences in the power report to obtain a plurality of words; taking the words that have a data attribute as first words and the remaining words as second words; and performing semantic analysis on each first word in combination with the second words in its sentence to obtain the analysis attribute of the first word;

generating a data requirement list from the calculation requirements the user has configured for the current power report, and selecting, based on the data requirement list, the first words with the corresponding analysis attributes as third words;

determining a cleaning verification strategy for each third word according to its analysis attribute, and verifying the correctness of the third word based on the cleaning verification strategy and historical data, each type of analysis attribute having a preset cleaning verification strategy;

if the verification fails, determining the verification doubt type corresponding to the failure, and determining a cleaning processing strategy for the third word according to its verification doubt type, the verification doubt type being either an unreasonable doubt type or a pending-verification doubt type;

after all third words corresponding to the data requirement list have been obtained, inputting the third words into a preset model to calculate the cleaned analysis data, and obtaining the corresponding processing data based on the analysis data and the knowledge graph.

Optionally, in a possible implementation of the first aspect, receiving the knowledge information configured by the user for the intelligent cleaning and processing of power reports and constructing the corresponding knowledge graph, comprising a plurality of knowledge nodes, from the triple relationships of that knowledge information comprises:

the knowledge information comprises first knowledge information corresponding to the analysis data and second knowledge information corresponding to the processing data, each piece of first or second knowledge information having a corresponding knowledge node;

connecting the corresponding knowledge nodes according to the triple relationships the user has configured between the first knowledge information and the second knowledge information, thereby constructing the corresponding knowledge graph.
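The graph construction step can be sketched as an adjacency list built from the user-configured triples. This is a minimal illustration, not the patented implementation: the `(head, relation, tail)` tuple format, the `build_knowledge_graph` helper, and the example node names are all hypothetical, since the patent specifies neither a storage model nor concrete knowledge entries.

```python
from collections import defaultdict

# Hypothetical triple format: (head node, relation, tail node).
Triple = tuple[str, str, str]


def build_knowledge_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Connect knowledge nodes as an adjacency list: node -> [(relation, neighbour)]."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # register the tail as a node even without outgoing edges
    return dict(graph)


# Hypothetical configuration linking first knowledge (analysis data)
# to second knowledge (processing data).
graph = build_knowledge_graph([
    ("residential consumption", "triggers", "load warning"),
    ("industrial consumption", "suggests", "energy-saving advice"),
])
print(graph["residential consumption"])  # [('triggers', 'load warning')]
```

An adjacency list keeps one-hop lookups cheap, which is all the later lookup step needs; a graph database could replace it without changing the idea.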

Optionally, in a possible implementation of the first aspect, performing word segmentation on the sentences in the power report to obtain a plurality of words, taking the words that have a data attribute as first words and the other words as second words, and performing semantic analysis on each first word in combination with the second words in its sentence to obtain its analysis attribute comprises:

performing word segmentation on the sentences in the power report to obtain a plurality of words, and taking the words that have a data attribute as first words, the words with a data attribute including at least Arabic numerals, uppercase Chinese numerals (e.g. 壹, 贰, 叁), and traditional Chinese numerals;

taking all segmented words other than the first words as second words, and comparing each second word with preset words; if a second word corresponds to a preset word, determining, based on that preset word, that it is a second word to be analyzed;
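The first/second word split and the preset-word filter can be sketched as follows, assuming the sentence has already been segmented (for example by a Chinese word segmenter such as jieba). The numeral character set, the `classify_tokens` helper, the preset words, and the sample tokens are assumptions; the patent names the numeral types but does not enumerate their characters.

```python
import re

# Assumed character sets: the patent names Arabic, uppercase and traditional
# numerals as data attributes but does not list the characters.
ARABIC_NUMBER = re.compile(r"\d+(?:\.\d+)?")
CHINESE_NUMERALS = set("零〇一二两三四五六七八九十百千万亿壹贰叁肆伍陆柒捌玖拾佰仟萬億")


def has_data_attribute(token: str) -> bool:
    """True if the token is a number written in Arabic or Chinese numerals."""
    if ARABIC_NUMBER.fullmatch(token):
        return True
    return bool(token) and all(ch in CHINESE_NUMERALS for ch in token)


def classify_tokens(tokens: list[str], preset_words: set[str]):
    """Split tokens into first words, second words and second words to analyse.

    Entries are (position, token) pairs so later positional rules can use them.
    """
    first, second, to_analyse = [], [], []
    for i, token in enumerate(tokens):
        if has_data_attribute(token):
            first.append((i, token))
        else:
            second.append((i, token))
            if token in preset_words:  # matches a hypothetical preset word
                to_analyse.append((i, token))
    return first, second, to_analyse


# Hypothetical pre-segmented sentence: "residential consumption is 3200 kWh,
# industrial consumption is 壹万 (10,000)".
tokens = ["居民", "用电量", "为", "3200", "千瓦时", "，", "工业", "用电量", "为", "壹万"]
preset = {"居民", "工业", "用电量"}
first, second, to_analyse = classify_tokens(tokens, preset)
print(first)       # [(3, '3200'), (9, '壹万')]
print(to_analyse)  # [(0, '居民'), (1, '用电量'), (6, '工业'), (7, '用电量')]
```

A unit such as 千瓦时 is not mistaken for a number because only some of its characters are numerals.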

determining, according to the positional relationship between each first word and the second words to be analyzed, the to-be-analyzed second words associated with that first word, and performing semantic analysis on the first word based on the associated second words to obtain its analysis attribute;

if the format of a first word is inconsistent with the preset format, determining a preset first conversion template according to the first word's format, and converting the first word with that template to obtain a first word in the required format.

Optionally, in a possible implementation of the first aspect, determining the second words associated with each first word according to the positional relationship between the first word and the second words to be analyzed, and performing semantic analysis on the first word based on the associated second words to obtain its analysis attribute, comprises:

if a sentence contains a plurality of first words and a preset merge word, determining that those first words can be merged into one first word, and adding a corresponding merge tag to each mergeable first word so that they are merged on the basis of the merge tag when the first words are subsequently processed;

if a sentence contains a single first word, or a plurality of first words that can be merged into one, associating all to-be-analyzed second words in that sentence with that first word or with the merged first word;

if a sentence contains a plurality of first words that cannot be merged into one, segmenting the sentence at the positions of the first words and determining the second words associated with each first word from the segmentation result;

the analysis attribute is any one or more of a subject analysis attribute, a trend-change analysis attribute, and a concept analysis attribute carried by the associated second words.
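The single-first-word and merge-word rules, together with attribute tagging, might look like the sketch below. The merge words, the attribute lexicon, and the helper names are hypothetical; the patent only states that merge words and the attribute-bearing second words are preset. A `None` result signals the unmergeable case, which is handled by positional segmentation.

```python
# Hypothetical lexicons: the patent presets merge words and attribute-bearing
# second words but does not list them.
MERGE_WORDS = {"加", "合计"}  # "plus", "in total"
ATTRIBUTE_LEXICON = {
    "居民": "subject", "工业": "subject",    # who the figure refers to
    "增长": "trend", "下降": "trend",        # direction of change
    "用电量": "concept", "负荷": "concept",  # which quantity is measured
}


def associate_whole_sentence(tokens, first_idx, analyse_idx):
    """Apply the single-first-word and mergeable-first-words rules.

    Returns (association, merged): association maps a tuple of first-word
    positions to every to-be-analysed second word position in the sentence,
    and merged says whether those first words carry a merge tag. Returns
    None when the sentence must instead be segmented by position.
    """
    if not first_idx:
        return None
    has_merge_word = any(token in MERGE_WORDS for token in tokens)
    if len(first_idx) == 1 or has_merge_word:
        return {tuple(first_idx): list(analyse_idx)}, len(first_idx) > 1
    return None


def analysis_attributes(tokens, second_idx):
    """Collect the subject/trend/concept attributes carried by associated second words."""
    return {ATTRIBUTE_LEXICON[tokens[i]] for i in second_idx if tokens[i] in ATTRIBUTE_LEXICON}


tokens = ["工业", "用电量", "3000", "加", "500", "千瓦时"]
association, merged = associate_whole_sentence(tokens, [2, 4], [0, 1])
print(association, merged)                          # {(2, 4): [0, 1]} True
print(sorted(analysis_attributes(tokens, [0, 1])))  # ['concept', 'subject']
```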

Optionally, in a possible implementation of the first aspect, segmenting a sentence that contains a plurality of unmergeable first words at the positions of the first words, and determining the second words associated with each first word from the segmentation result, comprises:

determining the positions of all first words in the sentence, dividing the sentence at those positions into a plurality of sub-segments, and taking the sub-segment immediately preceding each first word as its associated segment;

taking the second words in the associated segment as the second words associated with the corresponding first word.
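Positional segmentation for unmergeable first words reduces to slicing the sentence at each first word. The sketch below, with a hypothetical `associate_by_segments` helper working on token positions, assumes that the sub-segment between the previous first word and the current one is the "front, adjacent" segment; under that reading, second words after the last first word remain unassociated.

```python
def associate_by_segments(first_idx: list[int], analyse_idx: list[int]) -> dict[int, list[int]]:
    """Split a sentence at its first words and link each first word to the
    to-be-analysed second words in the segment directly in front of it."""
    analyse = set(analyse_idx)
    association: dict[int, list[int]] = {}
    segment_start = 0
    for pos in sorted(first_idx):
        # The associated segment runs from just after the previous first word up to this one.
        association[pos] = [i for i in range(segment_start, pos) if i in analyse]
        segment_start = pos + 1
    return association


# "居民 用电量 100 工业 用电量 200": two unmergeable first words at positions 2 and 5.
print(associate_by_segments([2, 5], [0, 1, 3, 4]))  # {2: [0, 1], 5: [3, 4]}
```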

Optionally, in a possible implementation of the first aspect, determining a preset first conversion template according to the format of a first word whose format is inconsistent with the preset format, and converting the first word with that template to obtain a first word in the required format, comprises:

comparing the format of the first word with the preset format, the first word's format being Arabic-numeral, uppercase-numeral, or traditional-numeral format, and the preset format being Arabic-numeral format;

if the formats are inconsistent, determining the preset first conversion template according to the first word's format, decomposing the first word into its numeral characters (digits) and position characters (place-value units such as 十, 百, 千), and filling the numeral characters into the corresponding slots of the template based on the position characters;

if any slot in the first conversion template remains unfilled, filling it with 0;

once all slots in the first conversion template are filled, taking the number formed by the slots as the first word in the required format.
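The template filling and zero-padding described above can be illustrated with a small helper. The slot representation (a list of digit strings, highest place on the left) and the `fill_template` name are assumptions made for illustration.

```python
def fill_template(placements: dict[int, int], num_slots: int) -> int:
    """Fill a conversion template and zero-pad any slot left empty.

    placements maps a place value (0 = units, 1 = tens, 2 = hundreds, ...)
    to the digit decoded for it; num_slots is the template width.
    """
    slots = ["0"] * num_slots  # every slot starts out as an unfilled 0
    for place, digit in placements.items():
        slots[num_slots - 1 - place] = str(digit)  # highest place is leftmost
    return int("".join(slots))


# 三千零五: 三 sits in the thousands slot (千), 五 in the units slot.
print(fill_template({3: 3, 0: 5}, 4))  # 3005
```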

Optionally, in a possible implementation of the first aspect, determining the preset first conversion template according to the first word's format, decomposing the first word into numeral characters and position characters, and filling the numeral characters into the corresponding slots of the template based on the position characters comprises:

determining the preset first conversion template, which has a plurality of slots, according to the format of the first word; decomposing the first word into numeral characters and position characters; and treating each numeral character together with the position character immediately following it as one numeric group;

extracting the leftmost (highest-order) position character of the whole first word to obtain the corresponding preset number of slots, each position character having its own preset number of slots;

reserving that number of slots in the first conversion template and assigning a position label to each reserved slot in turn, from back to front;

determining, from the position character in each numeric group, the corresponding position label and slot, and filling the group's numeral character into that slot.
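The decomposition, slot-reservation, and labelling steps can be combined into one converter. This sketch assumes the character tables below and a one-slot-per-position-character scheme, so it covers forms such as 三千零五 or 壹佰贰拾叁 but not compound units like 三千五百万, and it reads colloquial shorthand such as 三千五 (3,500) literally as 3005. `numeral_to_arabic`, `DIGITS`, and `POSITIONS` are hypothetical names.

```python
# Hypothetical tables: the patent says each position character has a preset
# slot count but does not list the characters or counts.
DIGITS = {
    "零": 0, "〇": 0, "一": 1, "壹": 1, "二": 2, "两": 2, "贰": 2, "三": 3, "叁": 3,
    "四": 4, "肆": 4, "五": 5, "伍": 5, "六": 6, "陆": 6, "七": 7, "柒": 7,
    "八": 8, "捌": 8, "九": 9, "玖": 9,
}
POSITIONS = {"十": 1, "拾": 1, "百": 2, "佰": 2, "千": 3, "仟": 3, "万": 4, "萬": 4}  # place value


def numeral_to_arabic(word: str) -> int:
    # 1. Decompose into numeric groups: a digit plus the position character after it.
    groups: list[tuple[int, int]] = []  # (digit, place value)
    pending = None  # digit still waiting for its position character
    for ch in word:
        if ch in DIGITS:
            pending = DIGITS[ch]  # a 零 placeholder is simply overwritten
        elif ch in POSITIONS:
            # A bare position character, as in 十五, implies the digit 1.
            groups.append((1 if pending is None else pending, POSITIONS[ch]))
            pending = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if pending:  # a trailing non-zero digit goes in the units slot
        groups.append((pending, 0))
    # 2. The leftmost position character fixes how many slots to reserve.
    num_slots = (groups[0][1] + 1) if groups else 1
    # 3. Label the reserved slots from back to front (label 0 = units).
    labels = list(range(num_slots - 1, -1, -1))
    slots = dict.fromkeys(labels, 0)  # unfilled slots stay 0
    # 4. Drop each group's digit into the slot its position label points to.
    for digit, place in groups:
        slots[place] = digit
    return int("".join(str(slots[label]) for label in labels))


for w in ["三千零五", "十五", "三万零五百", "壹佰贰拾叁"]:
    print(w, numeral_to_arabic(w))  # prints 3005, 15, 30500, 123 after each word
```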

Optionally, a possible implementation of the first aspect further comprises:

if several first words share a corresponding merge tag, computing them into a single first word according to the merge word between them, each merge word having a preset calculation method.
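Merging tagged first words amounts to folding their values with the operation preset for the merge word. The word-to-operation table and `merge_first_words` below are hypothetical examples; the patent does not list the merge words or their calculations.

```python
import operator
from functools import reduce

# Hypothetical preset calculation for each merge word.
MERGE_OPERATIONS = {
    "加": operator.add,    # "plus"
    "合计": operator.add,  # "in total"
    "减": operator.sub,    # "minus"
}


def merge_first_words(values: list[float], merge_word: str) -> float:
    """Combine first words that share a merge tag using the merge word's preset operation."""
    return reduce(MERGE_OPERATIONS[merge_word], values)


print(merge_first_words([3000, 500], "加"))  # 3500
print(merge_first_words([3000, 500], "减"))  # 2500
```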

Optionally, in a possible implementation of the first aspect, generating the data requirement list from the calculation requirements the user has configured for the current power report, and selecting, based on that list, the first words with the corresponding analysis attributes as third words, comprises:

generating a data statistics table from all first words in the preset format, the merged first words, and their respectively associated second words;

generating the data requirement list from the calculation requirements the user has configured for the current power report, and extracting the fourth words in the data requirement list;

comparing the fourth words with the second words in the data statistics table; if a fourth word corresponds to a second word, taking the first word associated with that second word in the table as a third word.
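The matching of fourth words against the data statistics table can be sketched as a filter over table rows. As an assumption beyond the patent text, each demand item here is a set of fourth words that must all appear among a row's second words; a single-word item reduces to the one-to-one comparison described above. `StatRow`, `select_third_words`, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StatRow:
    """One row of the (hypothetical) data statistics table."""
    value: float             # first word, already converted to Arabic numerals
    second_words: list[str]  # second words associated with that first word


def select_third_words(table: list[StatRow], demand_list: dict[str, set[str]]) -> dict[str, list[float]]:
    """Return, per demand item, the first words whose second words match all its fourth words."""
    return {
        item: [row.value for row in table if fourth_words <= set(row.second_words)]
        for item, fourth_words in demand_list.items()
    }


table = [StatRow(3200.0, ["居民", "用电量"]), StatRow(5100.0, ["工业", "用电量"])]
demand = {"residential": {"居民", "用电量"}, "all sectors": {"用电量"}}
print(select_third_words(table, demand))
# {'residential': [3200.0], 'all sectors': [3200.0, 5100.0]}
```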

Optionally, in a possible implementation of the first aspect, determining the cleaning verification strategy for each third word according to its analysis attribute and verifying the correctness of the third word based on that strategy and historical data, each type of analysis attribute having a preset cleaning verification strategy, comprises:

determining the second words that give the third word its analysis attribute, and determining the corresponding cleaning verification strategy from those second words, each class of second word carrying a subject or concept analysis attribute having a preset cleaning verification strategy;

comparing the third word with the first threshold interval contained in the cleaning verification strategy; if its value lies within the first threshold interval, determining that the third word passes the correctness verification of the cleaning verification strategy;

determining the corresponding historical data according to the subject or concept analysis attribute of the second words, calculating the average value of the historical data, and converting that average into a second threshold interval according to a preset proportion;

if the value of the third word lies within the second threshold interval, determining that the third word passes the historical-data correctness verification.
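The two correctness checks can be sketched as interval tests. The patent does not say how the preset proportion turns the historical average into an interval; this sketch assumes the symmetric form mean × (1 ± ratio). The helper names and sample history are hypothetical.

```python
from statistics import mean


def in_interval(value: float, interval: tuple[float, float]) -> bool:
    low, high = interval
    return low <= value <= high


def historical_interval(history: list[float], ratio: float) -> tuple[float, float]:
    """Second threshold interval: the historical mean widened by +/- ratio (assumed form)."""
    avg = mean(history)
    bounds = (avg * (1 - ratio), avg * (1 + ratio))
    return min(bounds), max(bounds)  # keeps low <= high even for a negative mean


def verify(value: float, first_interval: tuple[float, float],
           history: list[float], ratio: float) -> tuple[bool, bool]:
    """Return (passes the strategy's first interval, passes the historical interval)."""
    return in_interval(value, first_interval), in_interval(value, historical_interval(history, ratio))


history = [3000.0, 3100.0, 3200.0]               # hypothetical past values, mean 3100
print(verify(3200.0, (0.0, 1e6), history, 0.2))  # (True, True): inside roughly 2480..3720
print(verify(5000.0, (0.0, 1e6), history, 0.2))  # (True, False)
```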

Optionally, in a possible implementation of the first aspect, determining, when verification fails, the verification doubt type corresponding to the failure, and determining the cleaning processing strategy for the third word according to its verification doubt type, the verification doubt type being either an unreasonable doubt type or a pending-verification doubt type, comprises:

if the value of the third word is not within the first threshold interval, determining that its verification doubt type is the unreasonable doubt type;

if the value of the third word is not within the second threshold interval, determining that its verification doubt type is the pending-verification doubt type;

if the third word is of the unreasonable doubt type, marking it as an error and outputting it together with its corresponding second words so that the user can update it directly;

if the third word is of the pending-verification doubt type, marking it as pending verification and outputting it together with its corresponding second words; if the user enters positive verification information, keeping the third word as the final third word;

if the user enters negative verification information, updating the third word according to the user's input.
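The mapping from failed checks to doubt types and the resulting user interaction can be sketched as below. The `ask_user` callback stands in for whatever interface shows the flagged value and its second words to staff, and the precedence rule (a value outside the first interval is reported as unreasonable even if it also fails the historical check) is an assumption, since the patent does not say which type applies when both checks fail. `Doubt`, `doubt_type`, and `clean` are hypothetical names.

```python
from enum import Enum
from typing import Callable, Optional


class Doubt(Enum):
    UNREASONABLE = "unreasonable"     # outside the strategy's first threshold interval
    PENDING = "pending verification"  # reasonable, but outside the historical interval


def doubt_type(passes_first: bool, passes_second: bool) -> Optional[Doubt]:
    # Assumed precedence: failing the first interval outranks failing the second.
    if not passes_first:
        return Doubt.UNREASONABLE
    if not passes_second:
        return Doubt.PENDING
    return None


# Hypothetical callback: shows the flagged value with its second words and
# returns None to confirm it, or a corrected number.
AskUser = Callable[[float, list[str], Doubt], Optional[float]]


def clean(value: float, second_words: list[str], doubt: Optional[Doubt], ask_user: AskUser) -> float:
    if doubt is None:
        return value  # passed both checks, nothing to clean
    reply = ask_user(value, second_words, doubt)
    if doubt is Doubt.UNREASONABLE:
        if reply is None:
            raise ValueError("a value marked as an error must be replaced")
        return reply  # the user updates the erroneous value directly
    return value if reply is None else reply  # confirm or correct a pending value


print(clean(5000.0, ["工业", "用电量"], doubt_type(True, False), lambda v, w, d: None))  # 5000.0
print(clean(-5.0, ["工业", "用电量"], doubt_type(False, False), lambda v, w, d: 50.0))   # 50.0
```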

Optionally, in a possible implementation of the first aspect, inputting the third words into the preset model, after all third words corresponding to the data requirement list have been obtained, to calculate the cleaned analysis data, and obtaining the corresponding processing data based on the analysis data and the knowledge graph, comprises:

after all third words corresponding to the data requirement list have been obtained, inputting them into the preset model to calculate the cleaned analysis data, the analysis data including the corresponding segmented words;

inputting the segmented words into the knowledge graph to determine the knowledge nodes carrying the corresponding first knowledge information, and determining, from those nodes, the other knowledge nodes with second knowledge information that are connected to them;

collecting the second knowledge information of those other knowledge nodes to obtain and output the corresponding processing data.
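The knowledge-graph lookup that turns analysis data into processing data can be sketched as a one-hop traversal. The node-kind labels "first"/"second", the adjacency format, and the example nodes are hypothetical; the patent does not describe how nodes are stored or how the collected second knowledge information is summarised.

```python
def processing_data(words: list[str], node_kind: dict[str, str],
                    edges: dict[str, list[tuple[str, str]]]) -> dict[str, list[str]]:
    """Map each word that matches a first-knowledge node to the second-knowledge
    nodes connected to it in the knowledge graph."""
    result: dict[str, list[str]] = {}
    for word in words:
        if node_kind.get(word) != "first":
            continue  # not an analysis-data node
        linked = [tail for _, tail in edges.get(word, []) if node_kind.get(tail) == "second"]
        if linked:
            result[word] = linked
    return result


# Hypothetical graph contents.
node_kind = {"residential consumption": "first", "load warning": "second",
             "industrial consumption": "first", "energy-saving advice": "second"}
edges = {"residential consumption": [("triggers", "load warning")],
         "industrial consumption": [("suggests", "energy-saving advice")]}
print(processing_data(["residential consumption", "growth"], node_kind, edges))
# {'residential consumption': ['load warning']}
```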

A second aspect of the embodiments of the present invention provides an intelligent cleaning and processing system for power reports that combines a knowledge graph with semantic analysis, comprising:

a construction module, configured to receive knowledge information configured by a user for the intelligent cleaning and processing of power reports, and to construct a corresponding knowledge graph from the triple relationships of that knowledge information, the knowledge graph comprising a plurality of knowledge nodes;

a processing module, configured to perform word segmentation on the sentences in the power report to obtain a plurality of words, to take the words that have a data attribute as first words and the remaining words as second words, and to perform semantic analysis on each first word in combination with the second words in its sentence to obtain the analysis attribute of the first word;

a generation module, configured to generate a data requirement list from the calculation requirements the user has configured for the current power report, and to select, based on that list, the first words with the corresponding analysis attributes as third words;

a verification module, configured to determine a cleaning verification strategy for each third word according to its analysis attribute, and to verify the correctness of the third word based on that strategy and historical data, each type of analysis attribute having a preset cleaning verification strategy;

a judgment module, configured to determine, when verification fails, the verification doubt type corresponding to the failure, and to determine a cleaning processing strategy for the third word according to its verification doubt type, the verification doubt type being either an unreasonable doubt type or a pending-verification doubt type;

a calculation module, configured to input, after all third words corresponding to the data requirement list have been obtained, the third words into a preset model to calculate the cleaned analysis data, and to obtain the corresponding processing data based on the analysis data and the knowledge graph.

Beneficial effects:

1. The solution processes and analyzes power reports at the word level, obtaining first words with data attributes and second words of other types, and analyzes the two together to obtain the analysis data of the first words. When the user has calculation requirements, a data requirement list is generated, and the correctness of the third words is verified using the cleaning verification strategy and historical data; when anomalies occur, the system interacts with staff so that the analyzed data is error-free. Finally, a knowledge graph is formed from the knowledge information the user has configured for intelligent cleaning and processing of power reports, and the graph is used to output the processing results for the analysis data. In summary, the solution can intelligently clean power reports, automatically verify the cleaned data, and automatically analyze the data according to staff needs to obtain the corresponding analysis data.

2. When processing and analyzing the power report at the word level, this solution classifies words by their attributes into first words and second words, and uses the positional relationship between each first word and the second words to be analyzed to determine which second words are associated with it, thereby analyzing and linking the statements in the power report. The format of each first word is also compared with a preset format; if they do not match, the word is converted using the first conversion template, with numeral characters and position characters used to place and pad the digits. In this way, the data is converted into the required format for subsequent processing and judgment.

3. When calculating according to the calculation requirements configured by the user for the current power report, this solution first verifies the correctness of the third words using the cleaning verification strategy and historical data. Two verification modes are provided: one uses a first threshold interval to determine whether the data is reasonable, and the other uses historical data to determine whether the data is doubtful. The verification results are also used to interact with staff and correct the data so that it is accurate. Finally, this solution processes the analysis data with the knowledge graph to obtain the corresponding processing results, intelligently assisting staff in cleaning, analyzing and processing power reports.

Description of the drawings

Figure 1 is a schematic flowchart of a method for intelligent cleaning and processing of power reports combining a knowledge graph and semantic analysis, provided by an embodiment of the present invention;

Figure 2 is a schematic structural diagram of a system for intelligent cleaning and processing of power reports combining a knowledge graph and semantic analysis, provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

The terms "first", "second", "third", "fourth", etc. (if present) in the description, claims and drawings of the present invention are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments in any way.

It should be understood that, in the present invention, "comprising", "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.

It should be understood that, in the present invention, "a plurality of" means two or more. "And/or" merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B and C are included; "comprising A, B or C" means that one of A, B and C is included; and "comprising A, B and/or C" means that any one, any two, or all three of A, B and C are included.

It should be understood that, in the present invention, "B corresponding to A", "B that corresponds to A", "A corresponds to B" or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matching B means that the similarity between A and B is greater than or equal to a preset threshold.

Depending on the context, "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting".

The technical solution of the present invention is described in detail below with specific embodiments. These embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some of them.

Referring to Figure 1, which is a schematic flowchart of the method for intelligent cleaning and processing of power reports combining a knowledge graph and semantic analysis provided by an embodiment of the present invention, the method includes S1 to S6, as follows:

S1: Receive the knowledge information configured by the user for intelligent cleaning and processing of power reports, and construct a corresponding knowledge graph based on the triple relationships of that knowledge information, the knowledge graph including a plurality of knowledge nodes.

Because this solution processes power reports with the help of a knowledge graph, it receives the knowledge information configured by the user for intelligent cleaning and processing of power reports and then constructs the corresponding knowledge graph from the triple relationships of that knowledge information.

The knowledge graph includes a plurality of knowledge nodes.

In some embodiments, S1 (receiving the knowledge information configured by the user for intelligent cleaning and processing of power reports, and constructing a corresponding knowledge graph including a plurality of knowledge nodes based on the triple relationships of that knowledge information) includes S11 to S12:

S11: The knowledge information includes first knowledge information corresponding to analysis data and second knowledge information corresponding to processing data, and each piece of first or second knowledge information has a corresponding knowledge node.

That is, the knowledge information consists of first knowledge information, which corresponds to analysis data, and second knowledge information, which corresponds to processing data; each piece of first or second knowledge information has its own knowledge node.

Both the analysis data and the processing data can be preset. Examples of analysis data are "residential electricity consumption too high" and "industrial electricity consumption too high". Examples of processing data are increasing the residential power supply when residential consumption is too high, or limiting industrial power when industrial consumption is too high. These are merely examples and are not limiting.

S12: Connect the corresponding knowledge nodes according to the triple relationships configured by the user between the first knowledge information and the second knowledge information, to construct the corresponding knowledge graph.

This solution connects the corresponding knowledge nodes according to the triple relationships configured by the user between the first and second knowledge information, thereby constructing the corresponding knowledge graph. This process is prior art and is not described further.
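The patent treats graph construction as prior art. As a minimal sketch only, assuming each user-configured triple is a `(head, relation, tail)` tuple and that a plain adjacency dictionary is an adequate graph store, S12 could look like this; the function name, the relation name `handled_by` and the node labels are hypothetical.

```python
from collections import defaultdict


def build_knowledge_graph(triples):
    """Connect knowledge nodes from (head, relation, tail) triples.

    Returns an adjacency dict mapping each node to a list of (relation, neighbour) edges.
    """
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
        graph.setdefault(tail, [])  # register tail nodes even if they have no outgoing edges
    return dict(graph)


# Analysis-data nodes (first knowledge information) linked to processing-data nodes
# (second knowledge information); labels are illustrative only.
triples = [
    ("residential consumption too high", "handled_by", "increase residential supply"),
    ("industrial consumption too high", "handled_by", "limit industrial supply"),
]
kg = build_knowledge_graph(triples)
print(kg["industrial consumption too high"])  # [('handled_by', 'limit industrial supply')]
```

Looking up an analysis-data node then yields the processing data it is linked to, which is how the graph is later used to turn analysis data into processing results.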

S2: Perform word segmentation on the sentences in the power report to obtain a plurality of words; determine the words that have a data attribute as first words and the remaining words as second words; and perform semantic analysis on each first word using the second words in its sentence to obtain the analysis attributes of the first word.

This solution segments the sentences in the power report into words. For example, segmenting "根据本次统计居民用电量为百分之二十" ("according to this statistic, residential electricity consumption is twenty percent") yields "根据" (according to), "本次" (this), "统计" (statistic), "居民用电量" (residential electricity consumption), "百分之二十" (twenty percent), and so on.

After the words are obtained, those with a data attribute are taken as first words and the others as second words, and each first word is semantically analyzed using the second words in its sentence to obtain its analysis attributes.

In some embodiments, S2 (segmenting the sentences in the power report into words, taking the words with a data attribute as first words and the others as second words, and semantically analyzing each first word using the second words in its sentence to obtain its analysis attributes) includes S21 to S24:

S21: Perform word segmentation on the sentences in the power report to obtain a plurality of words, and determine the words that have a data attribute as first words, where words with a data attribute include at least Arabic numerals, uppercase numerals and traditional numerals.

After the words are obtained, this solution takes the words with a data attribute as first words; words with a data attribute include at least Arabic numerals, uppercase numerals and traditional numerals.

For example, the word "20" has the Arabic-numeral attribute, the word "二十" (twenty written with ordinary Chinese numeral characters) has the uppercase-numeral attribute, and the word "贰拾" (twenty written with formal Chinese numeral characters) has the traditional-numeral attribute.
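To make the three data attributes concrete, here is a minimal character-set sketch of S21, assuming Python 3.9+ (for `str.removeprefix`). The character sets, the treatment of "百分之" (percent) and "%" as a number wrapper, and the name `data_attribute` are assumptions for illustration, not the patent's classifier.

```python
from typing import Optional

ARABIC = set("0123456789.")
UPPERCASE = set("零〇一二两三四五六七八九十百千万亿")  # "uppercase" numerals, as the source uses the term
TRADITIONAL_ONLY = set("壹贰叁肆伍陆柒捌玖拾佰仟")    # characters unique to the formal form
TRADITIONAL = TRADITIONAL_ONLY | set("零万亿")


def data_attribute(word: str) -> Optional[str]:
    """Return 'arabic', 'uppercase' or 'traditional' for a numeral word, else None."""
    core = word.removeprefix("百分之").removesuffix("%")  # assumption: percentages count as numbers
    if not core:
        return None
    if all(ch in ARABIC for ch in core) and any(ch.isdigit() for ch in core):
        return "arabic"
    if all(ch in TRADITIONAL for ch in core) and any(ch in TRADITIONAL_ONLY for ch in core):
        return "traditional"
    if all(ch in UPPERCASE for ch in core):
        return "uppercase"
    return None


words = ["根据", "本次", "统计", "居民用电量", "百分之二十"]
first_words = [w for w in words if data_attribute(w)]
print(first_words)  # ['百分之二十']
```

Every word that receives a non-`None` attribute becomes a first word; all the rest are second words.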

S22: Take all segmented words other than the first words as second words, and traverse the second words comparing each with the preset words; if a second word corresponds to a preset word, determine, based on that preset word, that it is a second word to be analyzed.

In other words, the first words are the data words, and all other words are marked as second words.

This solution traverses the second words and compares them with the preset words; if a second word corresponds to a preset word, it is determined, based on that preset word, to be a second word to be analyzed.

Note that segmentation yields many second words, only some of which are useful. For example, "根据" (according to), "本次" (this) and "统计" (statistic) are not useful, whereas "居民用电量" (residential electricity consumption) is, so it becomes a second word to be analyzed. The second words to be analyzed are identified with the help of the preset words, for example "居民用电量" (residential electricity consumption) and "工业用电量" (industrial electricity consumption).
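The document defines "A matches B" as a similarity at or above a preset threshold. As a hedged sketch of S22, the comparison below uses the standard library's `difflib.SequenceMatcher` ratio as the similarity measure and 0.8 as the threshold; both choices, the preset list and the function names are assumptions rather than the patent's method.

```python
from difflib import SequenceMatcher

PRESET_WORDS = ["居民用电量", "工业用电量"]  # hypothetical preset vocabulary


def match_preset(word, threshold=0.8):
    """Return the preset word most similar to `word` if the similarity reaches the threshold."""
    best = max(PRESET_WORDS, key=lambda p: SequenceMatcher(None, word, p).ratio())
    return best if SequenceMatcher(None, word, best).ratio() >= threshold else None


def second_words_to_analyze(words, first_words):
    """Keep only the second words that match a preset word, paired with that preset word."""
    firsts = set(first_words)
    pairs = []
    for w in words:
        if w in firsts:
            continue  # first (data) words are not second words
        preset = match_preset(w)
        if preset is not None:
            pairs.append((w, preset))
    return pairs


words = ["根据", "本次", "统计", "居民的用电量", "为", "百分之二十"]
print(second_words_to_analyze(words, ["百分之二十"]))  # [('居民的用电量', '居民用电量')]
```

A similarity threshold rather than exact equality lets a variant such as "居民的用电量" still map to the preset "居民用电量", while unrelated words like "统计" are discarded.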

S23: According to the positional relationship between each first word and the second words to be analyzed, determine the second words to be analyzed that are associated with each first word, and semantically analyze each first word based on its associated second words to obtain its analysis attributes.

After the first and second words have been determined, this solution uses the positional relationship between each first word and the second words to be analyzed to determine which of those second words are associated with that first word.

It then semantically analyzes each first word based on its associated second words to obtain the analysis attributes of the first word.

Note that this solution relies on the associated words to determine the meaning of a first word and hence its analysis attributes. The analysis attributes are any one or more of the subject analysis attribute, the trend-change analysis attribute and the concept analysis attribute contained in the second words. For example, subject analysis attributes may be "居民用电" (residential electricity use) or "工业用电" (industrial electricity use); trend-change analysis attributes may be "过高" (too high) or "过低" (too low); and concept analysis attributes may be "". For instance, after association analysis, the analysis attribute obtained for "百分之二十" (twenty percent) is "居民用电" (residential electricity use).
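A minimal sketch of how associated second words might be mapped to subject and trend-change analysis attributes, assuming simple substring lookup against hypothetical term lists. The concept attribute is left out because the source gives no example of it, and the patent does not say that substring matching is its semantic-analysis method.

```python
# Hypothetical term lists per attribute kind; not taken from the patent beyond its examples.
ATTRIBUTE_TERMS = {
    "subject": ["居民用电", "工业用电"],  # residential / industrial electricity use
    "trend": ["过高", "过低"],            # too high / too low
}


def analysis_attributes(associated_second_words):
    """Collect the attribute terms that occur in the second words associated with one first word."""
    result = {kind: [] for kind in ATTRIBUTE_TERMS}
    for word in associated_second_words:
        for kind, terms in ATTRIBUTE_TERMS.items():
            for term in terms:
                if term in word and term not in result[kind]:
                    result[kind].append(term)
    return result


print(analysis_attributes(["居民用电量", "过高"]))
# {'subject': ['居民用电'], 'trend': ['过高']}
```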

In some embodiments, S23 (determining, according to the positional relationship between each first word and the second words to be analyzed, the second words associated with each first word, and semantically analyzing each first word based on its associated second words to obtain its analysis attributes) includes S231 to S233:

S231: If a sentence contains a plurality of first words and a preset merge word, determine that those first words can be merged into one first word, and add a corresponding merge tag to the mergeable first words, so that they are merged according to the merge tag when the first words are processed later.

That is, when a sentence contains several first words together with a preset merge word, those first words are judged mergeable into a single first word and receive a merge tag, which causes them to be merged during subsequent processing.

For example, the preset merge word may be "至" (to) in "百分之二十至百分之三十" (twenty percent to thirty percent); in this case, this solution merges the corresponding first words into a single first word.

On the basis of the above embodiments, the method further includes:

If it is determined that a plurality of first words carry a corresponding merge tag, calculate a single first word from them according to the merge word between them, each merge word having a preset calculation method.

For example, the merge may take the midpoint: merging "百分之二十至百分之三十" (twenty percent to thirty percent) yields the single first word "百分之二十五" (twenty-five percent). Note that merged data is a rough statistic and may be somewhat less accurate.
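The midpoint merge can be sketched as a scan over the sentence's tokens, assuming the first words have already been converted to numbers (percentages kept as plain numbers such as 20 for twenty percent) and that "至" is the only merge word, with the midpoint as its preset calculation method. The rule table and function name are hypothetical.

```python
# Preset calculation method per merge word; "至" ("to") takes the midpoint (assumption).
MERGE_RULES = {"至": lambda a, b: (a + b) / 2}


def merge_first_words(tokens):
    """Collapse every `number, merge word, number` run into one computed number."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        is_merge = (
            isinstance(tok, str) and tok in MERGE_RULES
            and bool(out) and isinstance(out[-1], (int, float))
            and i + 1 < len(tokens) and isinstance(tokens[i + 1], (int, float))
        )
        if is_merge:
            out[-1] = MERGE_RULES[tok](out[-1], tokens[i + 1])
            i += 2  # skip the merge word and the right-hand number
        else:
            out.append(tok)
            i += 1
    return out


print(merge_first_words(["居民用电量", "为", 20, "至", 30]))  # ['居民用电量', '为', 25.0]
```

Numbers joined by a word that is not a merge word, such as "和" (and), are left separate, which is the situation S233 handles.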

S232: If a sentence contains one first word, or a plurality of first words that can be merged into one, associate all second words to be analyzed in that sentence with that first word or with the merged first word.

If a sentence contains one first word, or several first words that can be merged into one, this solution associates all second words to be analyzed in that sentence with that first word or with the merged first word.

S233: If a sentence contains a plurality of first words that cannot be merged into one, segment the sentence based on the positions of the first words to obtain a segmentation result, and determine the second words associated with each first word according to that result.

For example, in the sentence "统计结果为居民的用电量为44455和工业用电量为600000" ("the statistics show residential electricity consumption of 44455 and industrial electricity consumption of 600000"), the first words "44455" and "600000" cannot be merged into one. In this case, this solution segments the sentence based on the positions of the first words and then determines the second words associated with each first word from the resulting segments.

Specifically, S233 (if a sentence contains a plurality of first words that cannot be merged into one, segmenting the sentence based on the positions of the first words and determining the second words associated with each first word from the segmentation result) includes S2331 to S2332:

S2331: Determine the positions of all first words in the sentence, divide the sentence into a plurality of sub-segments at those positions, and take the segment immediately preceding each first word as its associated segment.

For example, for "统计结果为居民的用电量为44455和工业用电量为600000", the first words are "44455" and "600000".

Splitting the sentence at the positions of the first words yields the sub-segments "统计结果为居民的用电量为" (the statistics show residential electricity consumption of) and "和工业用电量为" (and industrial electricity consumption of).

This solution then takes the segment immediately preceding each first word as its associated segment: "统计结果为居民的用电量为" is the associated segment of "44455", and "和工业用电量为" is the associated segment of "600000".

S2332: Take the second words in the associated segment as the second words associated with the corresponding first word.

This solution takes the second words in each associated segment as the second words associated with the corresponding first word.
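S2331 and S2332 can be sketched as a single pass over the token list: each first word closes the sub-segment that precedes it, and that segment's second words are attached to it. The tokenization below is hypothetical (it normalizes "居民的用电量" to the preset form "居民用电量"), and exact membership in the preset set stands in for the preset-word matching of S22.

```python
def associate_by_segments(tokens, first_words, preset_words):
    """Split tokens at each first word (S2331) and attach the preceding
    segment's matching second words to that first word (S2332)."""
    firsts = set(first_words)
    associations = []
    segment = []
    for tok in tokens:
        if tok in firsts:
            # the segment just before this first word is its associated segment
            associations.append((tok, [w for w in segment if w in preset_words]))
            segment = []  # the next sub-segment starts after this first word
        else:
            segment.append(tok)
    return associations


tokens = ["统计", "结果", "为", "居民用电量", "为", "44455", "和", "工业用电量", "为", "600000"]
print(associate_by_segments(tokens, ["44455", "600000"], {"居民用电量", "工业用电量"}))
# [('44455', ['居民用电量']), ('600000', ['工业用电量'])]
```

A list of pairs rather than a dict is returned so that a value appearing twice in one sentence keeps both associations.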

In this way, different strategies are applied to different situations to obtain the second words associated with each first word.

S24: If the format of a first word is inconsistent with the preset format, determine a preset first conversion template according to the format of the first word, and convert the first word using that template to obtain a first word that meets the format requirement.

In some cases the format of a first word differs from the preset format, which may be the Arabic-numeral format.

If the format of a first word is inconsistent with the preset format, a preset first conversion template is determined according to the format of the first word, and the first word is converted using that template into a first word that meets the format requirement. The conversion allows the data to be organized in a uniform format in subsequent steps.

In some embodiments, S24 (if the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to that format and converting the first word with it to obtain a first word that meets the format requirement) includes S241 to S244:

S241: Compare the format of the first word with the preset format, where the format of the first word is the Arabic-numeral, uppercase-numeral or traditional-numeral format, and the preset format is the Arabic-numeral format.

This solution compares the format of the first word with the Arabic-numeral format; if they do not match, the first word must be converted.

S242: If the format of the first word is inconsistent with the preset format, determine a preset first conversion template according to the format of the first word, decompose the first word into its numeral characters and position characters, and fill the numeral characters into the corresponding slots of the first conversion template based on the position characters.

Specifically, a first conversion template is preset; the first word is decomposed into its numeral characters and position characters, and the numeral characters are filled into the corresponding slots of the template according to the position characters.

Specifically, S242 (if the format of the first word is inconsistent with the preset format, determining the preset first conversion template according to that format, decomposing the first word into numeral characters and position characters, and filling the numeral characters into the corresponding slots of the template based on the position characters) includes S2421 to S2424:

S2421: Determine the preset first conversion template according to the format of the first word, the template having a plurality of slots; decompose and recognize the first word to obtain its numeral characters and position characters, and treat each numeral character together with the position character immediately after it as a value group.

The first conversion template has a plurality of slots, which are filled with the corresponding digits.

First, this solution decomposes and recognizes the first word to obtain its numeral characters and position characters, and then treats each numeral character together with the position character immediately following it as a value group.

Note that a numeral character gives the digit value and a position character gives the place value. For example, in "二百三十" (two hundred thirty), "二" and "三", i.e. 2 and 3, are numeral characters, and "百" (hundred) and "十" (ten) are position characters; 2 and "百" form one value group, and 3 and "十" form another.
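A minimal sketch of the decomposition in S2421, covering ordinary and formal numerals up to the thousands place. Three details are assumptions not stated in the source: a bare leading "十" (as in "十五") counts as 1 × ten, a trailing numeral with no position character goes to the ones place "个", and "零" is simply dropped because empty slots are zero-filled later (S243).

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5, "六": 6,
          "七": 7, "八": 8, "九": 9, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
          "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
# Position characters, with the formal forms normalised to the ordinary ones.
POSITIONS = {"十": "十", "拾": "十", "百": "百", "佰": "百", "千": "千", "仟": "千"}


def value_groups(word):
    """Pair each numeral character with the position character right after it (S2421)."""
    groups = []
    pending = None
    for ch in word:
        if ch in DIGITS:
            pending = DIGITS[ch]  # a later numeral overwrites a "零" placeholder
        elif ch in POSITIONS:
            # assumption: a bare leading "十" (as in "十五") means 1 x ten
            groups.append((1 if pending is None else pending, POSITIONS[ch]))
            pending = None
        else:
            raise ValueError(f"unexpected character {ch!r} in {word!r}")
    if pending is not None:
        groups.append((pending, "个"))  # assumption: a trailing numeral is the ones digit
    return groups


print(value_groups("二百三十"))  # [(2, '百'), (3, '十')]
print(value_groups("贰佰零伍"))  # [(2, '百'), (5, '个')]
```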

S2422: Extract the leading position character of the entire first word to obtain the corresponding preset number of slots, each position character having a corresponding preset number of slots.

This solution extracts the first position character in the first word to obtain the corresponding preset number of slots; each position character has its own preset slot count.

For example, if the leading position character is "千" (thousand), the preset number of slots is 4; if it is "百" (hundred), the number is 3; and if it is "十" (ten), the number is 2.

S2423: Reserve the preset number of slots in the first conversion template, and add a position label to each reserved slot in order from back to front.

This solution reserves the preset number of slots in the first conversion template and labels each reserved slot in turn, from back to front.

For example, if the leading position character is "百" (hundred), 3 slots are reserved, and labeling them from back to front gives the position labels "个" (ones), "十" (tens) and "百" (hundreds).

S2424: Determine the position label and slot corresponding to the position character in each value group, and fill the numeral character of the group into that slot.

Finally, this solution determines the position label and slot that match the position character of each value group, and fills the group's numeral character into that slot.

For example, 2 is filled into the slot labeled "百" (hundreds), and 3 into the slot labeled "十" (tens).

S243: If the first conversion template has unfilled slots, fill them with 0.

For example, after the above filling step the ones slot is still empty, so this solution fills it with 0 and obtains 230.

S244: Once all slots in the first conversion template have been filled, take the number formed by all the slots as the first word that meets the format requirement.

When every slot in the first conversion template has been filled, the number formed by all the slots is used as the first word that meets the format requirement.
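The slot-template conversion of S2422 to S244 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the numeral and position-word tables, the function names `split_groups` and `convert_with_template`, and the handling of "零" and of a bare leading "十" are all hypothetical. It covers only numbers below 10000 whose position words are 千, 百 and 十.

```python
# Hypothetical lookup tables; only the position words 千/百/十 come from the text.
NUMERAL_WORDS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
                 "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# S2422: preset slot count for the leading position word.
SLOT_COUNT = {"千": 4, "百": 3, "十": 2}
# S2423: position labels assigned to the reserved slots from back to front.
LABELS_BACK_TO_FRONT = ["个", "十", "百", "千"]


def split_groups(word: str) -> list[tuple[int, str]]:
    """Decompose e.g. '二百三十' into numeric groups [(2, '百'), (3, '十')]."""
    groups = []
    pending = None
    for ch in word:
        if ch == "零":
            continue  # assumed: 零 only marks a skipped position, which S243 pads anyway
        if ch in NUMERAL_WORDS:
            pending = NUMERAL_WORDS[ch]
        elif ch in SLOT_COUNT:
            # Assumed: a bare leading 十 (as in 十五) means one ten.
            groups.append((1 if pending is None else pending, ch))
            pending = None
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if pending is not None:
        groups.append((pending, "个"))  # a final numeral word belongs in the ones slot
    return groups


def convert_with_template(word: str) -> str:
    groups = split_groups(word)
    if not groups:
        raise ValueError("no numeral found")
    leading_position = groups[0][1]
    # Reserve the preset number of slots (one slot if there is no position word).
    labels = LABELS_BACK_TO_FRONT[:SLOT_COUNT.get(leading_position, 1)]
    slots = {label: None for label in labels}
    # S2424: each numeral word goes into the slot labelled with its position word.
    for value, position in groups:
        if position not in slots:
            raise ValueError(f"position {position!r} exceeds the template")
        slots[position] = value
    # S243: unfilled slots become 0; S244: read the slots front to back.
    return "".join("0" if slots[label] is None else str(slots[label])
                   for label in reversed(labels))
```

For instance, `convert_with_template("二百三十")` returns `"230"`: the leading "百" reserves three slots, "2" and "3" fill the hundreds and tens slots, and the empty ones slot is padded with 0.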

Note that for the less common case where the data ends with a position word, the solution converts the trailing position word accordingly. For example, in "二百三十万" (2.3 million), the trailing "万" (ten thousand) is converted into four zeros, giving the converted value 2300000.
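A trailing position word can be handled by stripping it and appending the corresponding zeros. The sketch below assumes that "万" stands for four zeros, as in the example above, and that "亿" stands for eight. For brevity, the part before the trailing word is converted with a plain accumulate loop (the hypothetical `head_to_int`) rather than the slot template, so this is a substitute technique for that sub-step. Mixed forms such as "二十三万五千" are out of scope and raise `ValueError`.

```python
# Hypothetical: zeros appended for a trailing position word (万 = 10^4, 亿 = 10^8).
TRAILING_ZEROS = {"万": 4, "亿": 8}
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def head_to_int(head: str) -> int:
    """Convert a numeral below 10000, e.g. '二百三十' -> 230 (accumulate, not slot template)."""
    total, pending = 0, 0
    for ch in head:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in UNITS:
            total += (pending or 1) * UNITS[ch]  # a bare 十 counts as one ten
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending


def convert_trailing_position_word(word: str) -> str:
    zeros = 0
    # Strip a trailing 万/亿 and remember how many zeros it stands for.
    if word and word[-1] in TRAILING_ZEROS:
        zeros = TRAILING_ZEROS[word[-1]]
        word = word[:-1]
    return str(head_to_int(word)) + "0" * zeros
```

With this sketch, `convert_trailing_position_word("二百三十万")` yields `"2300000"`, matching the example above.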

S3: Generate a data requirement list from the calculation requirements that the user has configured for this power report, and, based on that list, select the first words with the corresponding analysis attributes as third words.

Different users have different needs and therefore analyze different data. The solution generates a data requirement list from the calculation requirements configured by the user for this power report.

Once the data requirement list is available, the first words whose analysis attributes match the list are selected as third words.

In some embodiments, S3 (generating the data requirement list from the user's configured calculation requirements and selecting the first words with the corresponding analysis attributes as third words) includes S31 to S33:

S31: Generate a data statistics table from all first words in the preset format, the merged first words, and their respectively associated second words.

First, the solution tallies all first words in the preset format, the merged first words, and their respectively associated second words, and generates the corresponding data statistics table.

S32: Generate a data requirement list from the calculation requirements configured by the user for this power report, and extract the fourth words in the list.

In some cases the user may have several calculation requirements; the corresponding data statistics tables can then be generated from all first words in the preset format, the merged first words, and their respectively associated second words.

A calculation requirement might be, for example, computing residential or industrial electricity consumption; the corresponding fourth words would then be "residential electricity consumption", "industrial electricity consumption", and so on.

S33: Compare each fourth word with the second words in the data statistics table. If a fourth word corresponds to a second word, take the first word associated with that second word in the table as a third word.

The solution compares each fourth word with the second words in the data statistics table. A match means the table contains the corresponding data, and the first word associated with the matching second word is then taken as a third word.
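Steps S31 to S33 amount to a lookup from requirement terms to table rows. The following is a minimal sketch, assuming that the data statistics table is a list of records and that "corresponds" means an exact string match between a fourth word and a second word; `Record` and `select_third_words` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One row of the data statistics table (S31)."""
    value: str               # first word in the preset (Arabic numeral) format
    second_words: list[str]  # second words associated with that first word


def select_third_words(table: list[Record], fourth_words: list[str]) -> dict[str, list[str]]:
    """S33: for each fourth word, collect the first words whose second words match it."""
    third_words: dict[str, list[str]] = {}
    for fourth in fourth_words:
        for record in table:
            if fourth in record.second_words:  # exact match stands in for "corresponds"
                third_words.setdefault(fourth, []).append(record.value)
    return third_words
```

With illustrative rows `Record("1200", ["residential electricity consumption"])` and `Record("5600", ["industrial electricity consumption"])`, requesting `["industrial electricity consumption", "street lighting"]` returns `{"industrial electricity consumption": ["5600"]}`; fourth words with no matching row are simply absent from the result.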

S4: Determine the cleaning verification strategy for each third word according to its analysis attribute, and verify the correctness of the third word based on that strategy and historical data. Each type of analysis attribute has a preset cleaning verification strategy.

After obtaining the third words, the solution determines the cleaning verification strategy for each one from its analysis attribute; each type of analysis attribute has a preset cleaning verification strategy.

The correctness of each third word is then verified using the cleaning verification strategy and historical data. Statistics in a power report may be inaccurate: a reported residential electricity consumption of 1 kWh, for instance, is clearly wrong. The data is therefore verified before it is analyzed.

In some embodiments, S4 (determining the cleaning verification strategy for each third word from its analysis attribute and verifying the third word against that strategy and historical data) includes S41 to S44:

S41: Determine the second word that carries the analysis attribute of the third word, and determine the corresponding cleaning verification strategy from that second word. Each class of second word with a subject analysis attribute or a concept analysis attribute has a preset cleaning verification strategy.

For example, when the subject analysis attribute is residential electricity consumption, the preset cleaning verification strategy may define a consumption range, namely the first threshold interval; the residential consumption value is then checked against that interval.

S42: Compare the third word with the first threshold interval included in the cleaning verification strategy. If the value of the third word lies within the first threshold interval, the third word is judged to pass the correctness check of the cleaning verification strategy.

The solution first compares the third word with the first threshold interval of the cleaning verification strategy. If the value of the third word lies within that interval, the third word passes the strategy's correctness check.
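S41 and S42 can be modeled as a registry keyed by the second word, each entry holding a first threshold interval. In this sketch the bounds are placeholders rather than values from the text, the interval is assumed to be closed, and `CleaningStrategy`, `STRATEGIES` and `check_first_interval` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleaningStrategy:
    """Preset cleaning verification strategy (S41); only the first threshold interval is modeled."""
    low: float
    high: float

    def passes_first_interval(self, value: float) -> bool:
        # S42: the third word passes if its value lies inside the closed interval [low, high].
        return self.low <= value <= self.high


# Hypothetical registry: second word (subject or concept attribute) -> preset strategy.
# The bounds below are placeholders chosen for illustration only.
STRATEGIES = {
    "residential electricity consumption": CleaningStrategy(low=10.0, high=1_000_000.0),
}


def check_first_interval(second_word: str, value: float) -> bool:
    strategy = STRATEGIES[second_word]  # S41: look up the strategy by its second word
    return strategy.passes_first_interval(value)
```

Under these placeholder bounds, `check_first_interval("residential electricity consumption", 1)` returns `False`, mirroring the 1 kWh example in S4.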

Note that this step checks whether the data is plausible: a value outside the first threshold interval is definitely wrong.

S43: Determine the historical data corresponding to the subject analysis attribute or concept analysis attribute of the second word, calculate the average of that historical data, and expand the average into an interval using a preset ratio to obtain the second threshold interval.

This step retrieves the relevant historical data and uses it to judge the third word.

After the average is computed from the historical data, it is turned into the second threshold interval using the preset ratio. For example, with a preset ratio of 0.5 and an average of 1000, the second threshold interval may be 500 to 1500.
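With a historical average a and a preset ratio r, the example above corresponds to the interval [a(1 - r), a(1 + r)]. The sketch below assumes this symmetric reading, a closed interval, and non-negative consumption values; the function names are hypothetical.

```python
from statistics import mean


def second_threshold_interval(history: list[float], ratio: float) -> tuple[float, float]:
    """S43: widen the historical average by the preset ratio on both sides."""
    if not history:
        raise ValueError("no historical data for this attribute")
    avg = mean(history)
    return avg * (1 - ratio), avg * (1 + ratio)


def passes_history_check(value: float, history: list[float], ratio: float) -> bool:
    # S44: a value inside the second threshold interval passes the historical-data check.
    low, high = second_threshold_interval(history, ratio)
    return low <= value <= high
```

For example, `second_threshold_interval([900, 1000, 1100], 0.5)` gives `(500.0, 1500.0)`, reproducing the 500 to 1500 range above.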

Note that this step checks whether the data is doubtful: a value outside the second threshold interval is not necessarily wrong, but it is questionable.

S44: If the value of the third word lies within the second threshold interval, the third word is judged to pass the historical-data correctness check.

If the value of the third word lies within the second threshold interval, the solution judges that the third word passes the historical-data correctness check.

S5: If verification fails, determine the verification doubt type corresponding to the failure, and determine the cleaning processing strategy for the third word according to that type. The verification doubt types include the unreasonable doubt type and the to-be-verified doubt type.

In some embodiments, S5 (determining, when verification fails, the verification doubt type and the corresponding cleaning processing strategy for the third word) includes S51 to S55:

S51: If the value of the third word is not within the first threshold interval, the verification doubt type of the third word is determined to be the unreasonable doubt type.

A value outside the first threshold interval means the data is definitely wrong, so the solution assigns the third word the unreasonable doubt type.

S52: If the value of the third word is not within the second threshold interval, the verification doubt type of the third word is determined to be the to-be-verified doubt type.

A value outside the second threshold interval means the data is questionable, so the solution assigns the third word the to-be-verified doubt type.
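S51 and S52 map the two interval checks to a verification doubt type. This sketch assumes that a value failing both checks is classified as unreasonable, since the first interval marks definite errors; the enum and function names are hypothetical.

```python
from enum import Enum


class DoubtType(Enum):
    PASSED = "passed"
    UNREASONABLE = "unreasonable"      # S51: outside the first threshold interval
    TO_BE_VERIFIED = "to_be_verified"  # S52: outside the second threshold interval only


def classify(value: float,
             first_interval: tuple[float, float],
             second_interval: tuple[float, float]) -> DoubtType:
    low1, high1 = first_interval
    if not (low1 <= value <= high1):
        return DoubtType.UNREASONABLE    # definitely wrong
    low2, high2 = second_interval
    if not (low2 <= value <= high2):
        return DoubtType.TO_BE_VERIFIED  # questionable, needs user confirmation
    return DoubtType.PASSED
```

With an illustrative first interval of (10, 1e6) and a second interval of (500, 1500), a value of 1 is unreasonable, 2000 is to be verified, and 1000 passes.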

S53: If the third word is of the unreasonable doubt type, mark it as an error and output it together with its corresponding second word, so that the user can update the third word directly.

If the third word is of the unreasonable doubt type, it is marked as an error and output together with its corresponding second word so that the user can correct it directly.

S54: If the third word is of the to-be-verified doubt type, mark it as pending verification and output it together with its corresponding second word. If the user enters positive verification information, the third word is kept as the final third word.

If the third word is of the to-be-verified doubt type, it is marked as pending verification and output together with its corresponding second word to help the user make a further judgment. If the user enters positive verification information, the third word is kept as the final third word.

S55: If the user enters negative verification information, update the corresponding third word according to the user's input.

If the user enters negative verification information, the corresponding third word is updated with the value the user supplies.
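S53 to S55 can be expressed as a small dispatcher around a user-interaction callback. The callback `ask_user` and its convention (returning `None` to confirm the value, or a number to replace it) are assumptions made for this sketch, as is the choice to raise an error when an unreasonable value is left uncorrected; the text does not specify the interface.

```python
from typing import Callable, Optional

# Hypothetical callback: (status label, value, second word) -> None to confirm, or a corrected value.
AskUser = Callable[[str, float, str], Optional[float]]


def resolve(value: float, second_word: str, doubt_type: str, ask_user: AskUser) -> float:
    """Apply the cleaning processing strategy for one third word."""
    if doubt_type == "unreasonable":
        # S53: mark as an error and output with its second word; the user updates it directly.
        corrected = ask_user("error", value, second_word)
        if corrected is None:
            raise ValueError(f"{second_word}: value {value} is marked as an error and needs a correction")
        return corrected
    if doubt_type == "to_be_verified":
        # S54: mark as pending; a positive answer (None) keeps the value as final.
        # S55: a negative answer supplies the replacement value.
        answer = ask_user("pending verification", value, second_word)
        return value if answer is None else answer
    return value  # passed both checks, nothing to clean
```

A caller might wire `ask_user` to a review screen; in tests, a lambda such as `lambda status, v, w: None` simulates a user who confirms every value.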

S6: Once all third words corresponding to the data requirement list have been obtained, input them into a preset model to calculate the cleaned analysis data, and obtain the corresponding processing data based on the analysis data and the knowledge graph.

The solution provides a preset model: the third words are fed into it to compute the cleaned analysis data, and the corresponding processing data is then derived from the analysis data and the knowledge graph.

In some embodiments, S6 (computing the cleaned analysis data from the third words with the preset model and deriving the processing data from the analysis data and the knowledge graph) includes S61 to S63:

S61: Once all third words corresponding to the data requirement list have been obtained, input them into the preset model to calculate the cleaned analysis data, which includes the corresponding segmented words.

After all third words corresponding to the data requirement list have been obtained, the solution feeds them into the preset model to compute the cleaned analysis data; the analysis data contains the corresponding segmented words.

For example, the preset model may compare residential and industrial electricity consumption and produce analysis data such as "industrial electricity consumption accounts for a relatively high proportion", from which the corresponding segmented words are obtained.
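The preset model itself is not specified. As a hypothetical stand-in, the sketch below computes the industrial share of total consumption and reports a finding when that share exceeds an assumed threshold of 0.5; the threshold, the function name and the returned phrasing are illustrative only.

```python
from typing import Optional


def compare_consumption(residential: float, industrial: float,
                        threshold: float = 0.5) -> Optional[str]:
    """Hypothetical preset model: compare two cleaned third words and emit a finding."""
    total = residential + industrial
    if total <= 0:
        raise ValueError("total consumption must be positive")
    industrial_share = industrial / total
    # The 0.5 default is an assumption; the text only says the share is "relatively high".
    if industrial_share > threshold:
        return "industrial electricity consumption accounts for a high proportion"
    return None  # no finding under this simplified rule
```

The returned phrase is what would then be segmented and matched against the knowledge graph in S62.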

S62: Input the segmented words into the knowledge graph to identify the knowledge node holding the corresponding first knowledge information, and, from the identified node, determine the other knowledge nodes linked to it that hold second knowledge information.

The solution inputs the segmented words into the knowledge graph to find the knowledge node holding the corresponding first knowledge information, and then determines the other knowledge nodes linked to it that hold second knowledge information.

The second knowledge nodes may be processing nodes linked to the first knowledge node. For example, the first knowledge node "industrial electricity consumption accounts for a high proportion" may be linked to second knowledge nodes such as "limit industrial electricity consumption" and "advise factories to increase the share of clean energy in their power supply".

S63: Collect the second knowledge information of these other knowledge nodes to obtain and output the corresponding processing data.

Finally, the solution collects the second knowledge information of the other knowledge nodes to obtain and output the corresponding processing data. A single first knowledge node may be linked to several second knowledge nodes, in which case all of the corresponding second knowledge information is output as processing data.
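S62 and S63 reduce to a neighbor lookup in the triple-based graph described in claim 2. In this sketch the graph is an adjacency list built from (head, relation, tail) triples, a segmented finding is matched to a node by exact string equality, and the relation name "suggests" and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical user-configured triples: head = first knowledge information,
# tail = second knowledge information (processing suggestion).
TRIPLES = [
    ("industrial electricity consumption accounts for a high proportion",
     "suggests", "limit industrial electricity consumption"),
    ("industrial electricity consumption accounts for a high proportion",
     "suggests", "advise factories to increase the share of clean energy in their power supply"),
]


def build_graph(triples):
    """Connect knowledge nodes according to the configured triples (adjacency list)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def processing_data(graph, analysis_words):
    """S62/S63: find the node for each finding and collect its linked second knowledge."""
    result = []
    for word in analysis_words:
        # dict.get does not trigger the defaultdict factory, so unknown words add nothing.
        for _relation, tail in graph.get(word, []):
            result.append(tail)
    return result
```

Querying with the finding from S61 returns both linked suggestions, matching the one-to-many case described above.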

Refer to Figure 2, a schematic structural diagram of an intelligent cleaning and processing system for power reports combining a knowledge graph and semantic analysis, provided by an embodiment of the present invention. The system includes:

A construction module, configured to receive the knowledge information configured by the user for intelligent cleaning and processing of power reports, and to build the corresponding knowledge graph from the triple relationships of that knowledge information, the knowledge graph including multiple knowledge nodes;

A processing module, configured to segment the sentences in the power report into words, take the words with a data attribute as first words and the remaining words as second words, and perform semantic analysis on each first word using the second words in the same sentence to obtain the analysis attribute of the first word;

A generation module, configured to generate a data requirement list from the calculation requirements configured by the user for this power report, and to select, based on the list, the first words with the corresponding analysis attributes as third words;

A verification module, configured to determine the cleaning verification strategy for each third word according to its analysis attribute, and to verify the correctness of the third word based on that strategy and historical data, each type of analysis attribute having a preset cleaning verification strategy;

A judgment module, configured to determine, when verification fails, the verification doubt type corresponding to the failure, and to determine the cleaning processing strategy for the third word according to that type, the verification doubt types including the unreasonable doubt type and the to-be-verified doubt type;

A calculation module, configured to input, once all third words corresponding to the data requirement list have been obtained, the third words into a preset model to calculate the cleaned analysis data, and to obtain the corresponding processing data based on the analysis data and the knowledge graph.

The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the methods provided by the embodiments above.

The storage medium may be a computer storage medium or a communication medium. A communication medium includes any medium that facilitates transfer of a computer program from one place to another. A computer storage medium may be any available medium accessible by a general-purpose or special-purpose computer. For example, the storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC), and the ASIC may reside in user equipment. Alternatively, the processor and the storage medium may reside as discrete components in a communication device. The storage medium may be read-only memory (ROM), random access memory (RAM), CD-ROM, magnetic tape, a floppy disk, an optical data storage device, and the like.

The present invention also provides a program product comprising execution instructions stored in a storage medium. At least one processor of a device can read the execution instructions from the storage medium and execute them, causing the device to implement the methods provided by the embodiments above.

In the terminal or server embodiments above, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), and so on. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the present invention may be carried out directly by a hardware processor, or by a combination of hardware and software modules in the processor.

Finally, it should be noted that the embodiments above merely illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to these embodiments, a person of ordinary skill in the art should understand that the technical solutions described in them can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1.结合知识图谱和语义分析的电力报告智能清洗处理方法,其特征在于,包括:1. An intelligent cleaning processing method for power reports that combines knowledge graph and semantic analysis, which is characterized by: 接收用户对电力报告智能清洗及处理所配置的知识信息,基于相应知识信息所对应的三元组关系构建相对应的知识图谱,所述知识图谱中包括多个知识节点;Receive the knowledge information configured by the user for intelligent cleaning and processing of the power report, and construct a corresponding knowledge graph based on the triplet relationship corresponding to the corresponding knowledge information, and the knowledge graph includes multiple knowledge nodes; 对电力报告内的语句进行分词处理得到多个词语,确定词语中有数据属性的词语作为第一词语,将其他词语作为第二词语,结合所述语句中的第二词语对相应的第一词语进行语义分析,得到第一词语的分析属性;Perform word segmentation processing on the sentences in the power report to obtain multiple words, determine the words with data attributes in the words as the first words, use other words as the second words, and combine the second words in the sentences to the corresponding first words Perform semantic analysis to obtain the analytical attributes of the first word; 根据用户对本次电力报告所配置的计算需求生成相对应的数据需求清单,基于所述数据需求清单选择具有相应分析属性的第一词语作为第三词语;Generate a corresponding data requirement list based on the user's configured calculation requirements for this power report, and select the first word with the corresponding analysis attribute as the third word based on the data requirement list; 根据所述第三词语的分析属性确定与所述第三词语所对应的清洗验证策略,基于所述清洗验证策略、历史数据对第三词语进行正确性的验证,每个类型的分析属性具有预设的清洗验证策略;The cleaning verification strategy corresponding to the third word is determined according to the analysis attribute of the third word, and the correctness of the third word is verified based on the cleaning verification strategy and historical data. 
Each type of analysis attribute has a predetermined Designed cleaning and verification strategies; 若判断验证不通过则确定验证不通过所对应的验证存疑类型,根据所述第三词语的验证存疑类型确定与第三词语对应的清洗处理策略,所述验证存疑类型包括不合理存疑类型或待验证存疑类型;If it is determined that the verification fails, the verification doubt type corresponding to the verification failure is determined, and the cleaning processing strategy corresponding to the third word is determined according to the verification doubt type of the third word, and the verification doubt type includes an unreasonable doubt type or an unreasonable doubt type. Verify doubtful types; 在判断得到数据需求清单所对应的所有第三词语后,将所述第三词语输入至预设模型中计算得到数据清洗后的分析数据,基于所述分析数据和知识图谱得到相对应的处理数据。After determining that all third words corresponding to the data requirement list are obtained, the third words are input into the preset model to calculate the analyzed data after data cleaning, and the corresponding processing data is obtained based on the analyzed data and the knowledge graph. . 2.根据权利要求1所述的结合知识图谱和语义分析的电力报告智能清洗处理方法,其特征在于,2. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 1, characterized in that, 所述接收用户对电力报告智能清洗及处理所配置的知识信息,基于相应知识信息所对应的三元组关系构建相对应的知识图谱,所述知识图谱中包括多个知识节点,包括:The receiving user intelligently cleans and processes the configured knowledge information for power reports, and constructs a corresponding knowledge graph based on the triplet relationship corresponding to the corresponding knowledge information. 
The knowledge graph includes multiple knowledge nodes, including: 所述知识信息包括与分析数据所对应的第一知识信息,以及与处理数据所对应的第二知识信息,每个第一知识信息或第二知识信息具有相对应的知识节点;The knowledge information includes first knowledge information corresponding to the analysis data, and second knowledge information corresponding to the processing data, and each first knowledge information or second knowledge information has a corresponding knowledge node; 根据用户对第一知识信息、第二知识信息配置的三元组关系对相应的知识节点进行连接,构建生成相对应的知识图谱。The corresponding knowledge nodes are connected according to the triplet relationship configured by the user on the first knowledge information and the second knowledge information, and a corresponding knowledge graph is constructed and generated. 3.根据权利要求2所述的结合知识图谱和语义分析的电力报告智能清洗处理方法,其特征在于,3. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 2, characterized in that, 所述对电力报告内的语句进行分词处理得到多个词语,确定词语中有数据属性的词语作为第一词语,将其他词语作为第二词语,结合所述语句中的第二词语对相应的第一词语进行语义分析,得到第一词语的分析属性,包括:Perform word segmentation processing on the sentences in the power report to obtain multiple words, determine the words with data attributes in the words as the first words, use other words as the second words, and combine the second words in the sentences to the corresponding third words. Perform semantic analysis on a word to obtain the analytical attributes of the first word, including: 对电力报告内的语句进行分词处理得到多个词语,确定词语中有数据属性的词语作为第一词语,所述数据属性的词语至少包括阿拉伯数字、大写数字、繁体数字;Perform word segmentation processing on the statements in the power report to obtain multiple words, and determine the words with data attributes among the words as the first words. The words with data attributes at least include Arabic numerals, uppercase numerals, and traditional Chinese numerals; 将所有分词中第一词语以外的其他词语作为第二词语,遍历所述第二词语与预设词语进行比对,若判断第二词语与预设词语相对应则基于所述预设词语确定其为待分析的第二词语;Use words other than the first word in all participles as second words, traverse the second words and compare them with the preset words. 
If it is determined that the second words correspond to the preset words, determine the second words based on the preset words. is the second word to be analyzed; 根据每个第一词语与待分析的第二词语之间的位置关系,确定与相应第一词语所关联的待分析的第二词语,基于关联的第二词语对相应的第一词语进行语义分析,得到第一词语的分析属性;According to the positional relationship between each first word and the second word to be analyzed, the second word to be analyzed associated with the corresponding first word is determined, and the corresponding first word is semantically analyzed based on the associated second word. , get the analytical attributes of the first word; 若判断所述第一词语的格式与预设格式不一致,则根据第一词语的格式确定预设的第一转换模板,基于所述第一转换模板将第一词语进行转换,得到满足格式要求的第一词语。If it is determined that the format of the first word is inconsistent with the preset format, a preset first conversion template is determined according to the format of the first word, and the first word is converted based on the first conversion template to obtain a format that meets the format requirements. First word. 4.根据权利要求3所述的结合知识图谱和语义分析的电力报告智能清洗处理方法,其特征在于,4. 
The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 3, characterized in that the step of determining, according to the positional relationship between each first word and the second words to be analyzed, the second words associated with the corresponding first word, and performing semantic analysis on the corresponding first word based on the associated second words to obtain the analysis attributes of the first word, comprises:
If it is determined that a sentence contains multiple first words and a preset merge word, determining that the corresponding first words can be merged into one first word, and adding corresponding merge tags to the mergeable first words, so that these first words are merged based on the merge tags in subsequent processing;
If it is determined that a sentence contains one first word, or multiple first words that can be merged into one first word, associating all the second words to be analyzed in that sentence with that first word or with the merged first word;
If it is determined that a sentence contains multiple first words that cannot be merged into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining, according to the segmentation result, the second words associated with each first word;
The analysis attributes are any one or more of a subject analysis attribute, a trend-change analysis attribute, and a concept analysis attribute contained in the second words.
5. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 4, characterized in that the step of, if it is determined that a sentence contains multiple first words that cannot be merged into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result and determining, according to the segmentation result, the second words associated with each first word, comprises:
Determining the positions of all first words in the sentence, segmenting the sentence based on these positions to obtain multiple sub-segments, and taking the segment immediately preceding and adjacent to each first word as its associated segment;
Taking the second words in the associated segment as the second words associated with the corresponding first word.
6. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 4, characterized in that the step of, if it is determined that the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word and converting the first word based on the first conversion template to obtain a first word that meets the format requirements, comprises:
Comparing the format of the first word with the preset format, wherein the format of the first word is an Arabic-numeral format, an uppercase Chinese numeral format, or a traditional Chinese numeral format, and the preset format is the Arabic-numeral format;
If it is determined that the format of the first word is inconsistent with the preset format, determining the preset first conversion template according to the format of the first word, decomposing the first word into its corresponding numeral characters and position characters, and filling the numeral characters into the corresponding slots of the first conversion template based on the position characters;
If it is determined that the first conversion template has unfilled slots, filling the unfilled slots with 0;
Once all slots in the first conversion template have been filled, taking the number formed by all the slots as the first word that meets the format requirements.
7.
The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 6, characterized in that the step of, if it is determined that the format of the first word is inconsistent with the preset format, determining the preset first conversion template according to the format of the first word, decomposing the first word into its corresponding numeral characters and position characters, and filling the numeral characters into the corresponding slots of the first conversion template based on the position characters, comprises:
Determining the preset first conversion template according to the format of the first word, the first conversion template having multiple slots; decomposing and recognizing the first word to obtain the corresponding numeral characters and position characters, and treating each numeral character together with the position character immediately following it as a numeral group;
Extracting the frontmost position character in the entire first word to obtain the corresponding preset number of slots, each position character having its own preset number of slots;
Reserving that preset number of slots in the first conversion template, and adding a position label to each reserved slot in turn from back to front;
Determining, according to the position character in each numeral group, the corresponding position label and slot, and filling the numeral character of that numeral group into the determined slot.
8.
The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 6, characterized in that it further comprises:
If it is determined that multiple first words carry corresponding merge tags, calculating the multiple first words according to the merge word that links them to obtain a calculated first word, each merge word having a preset calculation method.
9. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 6, characterized in that the step of generating a corresponding data requirement list according to the calculation requirements configured by the user for the current power report, and selecting, based on the data requirement list, first words having the corresponding analysis attributes as third words, comprises:
Generating a corresponding data statistics table from all first words in the preset format, the merged first words, and their respectively associated second words;
Generating the corresponding data requirement list according to the calculation requirements configured by the user for the current power report, and extracting the fourth words in the data requirement list;
Comparing the fourth words with the second words in the data statistics table and, if a fourth word corresponds to a second word, taking the first word that corresponds to that second word in the data statistics table as a third word.
10.
The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 9, characterized in that the step of determining, according to the analysis attributes of the third word, the cleaning verification strategy corresponding to the third word, and verifying the correctness of the third word based on the cleaning verification strategy and historical data, each type of analysis attribute having a preset cleaning verification strategy, comprises:
Determining the second word of the analysis attribute corresponding to the third word, and determining the corresponding cleaning verification strategy according to that second word, wherein the second words of each type of subject analysis attribute or concept analysis attribute have a preset cleaning verification strategy;
Comparing the third word with a first threshold interval included in the cleaning verification strategy and, if the value of the third word lies within the first threshold interval, determining that the third word passes the correctness verification of the cleaning verification strategy;
Determining the corresponding historical data according to the subject analysis attribute or concept analysis attribute of the second word, calculating an average value from the historical data, and converting the average value into an interval according to a preset proportion value to obtain a second threshold interval;
If the value of the third word lies within the second threshold interval, determining that the third word passes the correctness verification against the historical data.
11. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 10, characterized in that the step of, if the verification fails, determining the verification doubt type corresponding to the failure and determining the cleaning processing strategy for the third word according to its verification doubt type, the verification doubt type being an unreasonable doubt type or a to-be-verified doubt type, comprises:
If it is determined that the value of the third word does not lie within the first threshold interval, determining that the verification doubt type of the third word is the unreasonable doubt type;
If it is determined that the value of the third word does not lie within the second threshold interval, determining that the verification doubt type of the third word is the to-be-verified doubt type;
If the third word is of the unreasonable doubt type, marking the third word as an error and outputting the third word and its corresponding second word, so that the user updates the third word directly;
If the third word is of the to-be-verified doubt type, marking the third word as to be verified and outputting the third word and its corresponding second word; if the user inputs affirmative verification information, taking the third word as the final third word;
If the user inputs negative verification information, updating the third word according to the user input.
12. The intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis according to claim 11, characterized in that the step of, after all third words corresponding to the data requirement list have been obtained, inputting the third words into a preset model to calculate the cleaned analysis data, and obtaining corresponding processed data based on the analysis data and the knowledge graph, comprises:
After all third words corresponding to the data requirement list have been obtained, inputting the third words into the preset model to calculate the cleaned analysis data, the analysis data including corresponding segmented words;
Inputting the segmented words into the knowledge graph to determine the knowledge nodes having the corresponding first knowledge information, and determining, from the determined knowledge nodes, the other knowledge nodes having second knowledge information;
Counting the second knowledge information of the other knowledge nodes having second knowledge information, to obtain and output the corresponding processed data.
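The association rules of claims 4 and 5 can be illustrated with a minimal Python sketch. It assumes that an upstream step has already tagged each token as a first word (a value), a second word (a descriptor), a preset merge word, or other text; the `Token` layout, the `associate` function and the `+`-joined key used for a merged group are hypothetical names chosen for illustration, not part of the patent.

```python
from typing import Dict, List, Tuple

# (text, kind): kind is "first" (a value), "second" (a descriptor),
# "merge" (a preset merge word) or "other". Tagging is assumed to happen upstream.
Token = Tuple[str, str]


def associate(sentence: List[Token]) -> Dict[str, List[str]]:
    """Map each first word (or merged group of first words) to its associated second words."""
    firsts = [i for i, (_, kind) in enumerate(sentence) if kind == "first"]
    seconds = [text for text, kind in sentence if kind == "second"]
    has_merge_word = any(kind == "merge" for _, kind in sentence)

    # One first word, or several that a merge word allows to be merged:
    # every second word in the sentence is associated with it.
    if len(firsts) == 1 or (len(firsts) > 1 and has_merge_word):
        key = "+".join(sentence[i][0] for i in firsts)  # hypothetical merge-tag label
        return {key: seconds}

    # Several unmergeable first words: cut the sentence at their positions and give each
    # first word the second words of the segment immediately preceding it.
    result: Dict[str, List[str]] = {}
    start = 0
    for i in firsts:
        result[sentence[i][0]] = [text for text, kind in sentence[start:i] if kind == "second"]
        start = i + 1
    return result


if __name__ == "__main__":
    tokens = [("sales", "second"), ("grew", "second"), ("12", "first"),
              ("line loss rate", "second"), ("3.2", "first")]
    print(associate(tokens))  # {'12': ['sales', 'grew'], '3.2': ['line loss rate']}
```

Using the text of each first word as the dictionary key keeps the sketch short; two identical values in one sentence would collide, so a real implementation would more likely key on token positions.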
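Claims 6 and 7 describe converting a numeral written in Chinese characters into Arabic digits by filling a slot template. The sketch below is one possible reading. It assumes that position characters are place-value units (十, 百, 千, 万 and their uppercase forms) and that the preset slot count of each unit is the number of digits it implies. The lookup tables and `to_arabic` are hypothetical, and compound units such as 十二万 are rejected rather than guessed at.

```python
# Hypothetical lookup tables: lower-case and uppercase (financial) Chinese numerals,
# and the preset slot count of each position character (how many digits it implies).
DIGITS = {"零": 0, "〇": 0, "一": 1, "壹": 1, "二": 2, "两": 2, "贰": 2, "三": 3, "叁": 3,
          "四": 4, "肆": 4, "五": 5, "伍": 5, "六": 6, "陆": 6, "七": 7, "柒": 7,
          "八": 8, "捌": 8, "九": 9, "玖": 9}
POSITION_SLOTS = {"十": 2, "拾": 2, "百": 3, "佰": 3, "千": 4, "仟": 4, "万": 5, "萬": 5}


def to_arabic(word: str) -> int:
    """Convert a Chinese-numeral first word to an int by filling a slot template."""
    if word.isascii() and word.isdigit():
        return int(word)  # already in the preset (Arabic) format

    # Decompose into numeral groups: a numeral character plus the position character
    # immediately after it. A trailing numeral with no position character is treated
    # as the units digit (slot label 1); 零 only marks a skipped place.
    groups = []
    pending = None
    for ch in word:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in POSITION_SLOTS:
            # Assumption: a bare position character (the 十 in 十五) stands for 1.
            groups.append((pending or 1, POSITION_SLOTS[ch]))
            pending = None
        else:
            raise ValueError(f"unexpected character {ch!r} in {word!r}")
    if pending:
        groups.append((pending, 1))
    if not groups:
        return 0

    # The frontmost position character fixes how many slots are reserved; slots are
    # labelled 1..size from back to front, so label k lives at index size - k.
    size = groups[0][1]
    slots = [None] * size
    for digit, label in groups:
        if label > size:
            raise ValueError("compound units such as 十二万 are outside this sketch")
        slots[size - label] = digit

    # Unfilled slots are padded with 0 and the slots are read as one number.
    return int("".join(str(d if d is not None else 0) for d in slots))
```

For example, `to_arabic("三千零五")` splits into the groups (三, 千) and (五, units), reserves four slots because 千 implies four digits, fills them as 3, _, _, 5, and pads the gaps with 0 to give 3005.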
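Claims 8 and 9 (merging tagged first words, building the data statistics table, and selecting third words from the data requirement list) might look like the following sketch. The merge words `plus` and `minus`, their operations, and exact string matching between fourth and second words are all assumptions made for illustration; the function names are hypothetical.

```python
from functools import reduce
from operator import add, sub
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical merge words and their preset calculation methods (claim 8).
MERGE_OPS: Dict[str, Callable[[float, float], float]] = {"plus": add, "minus": sub}


def merge_first_words(values: List[float], merge_word: str) -> float:
    """Combine first words that share a merge tag using the merge word's operation."""
    return reduce(MERGE_OPS[merge_word], values)


def build_statistics_table(entries: Iterable[Tuple[float, List[str]]]) -> Dict[str, List[float]]:
    """Index every first-word value under each second word associated with it."""
    table: Dict[str, List[float]] = {}
    for value, second_words in entries:
        for word in second_words:
            table.setdefault(word, []).append(value)
    return table


def select_third_words(table: Dict[str, List[float]],
                       fourth_words: List[str]) -> Dict[str, List[float]]:
    """Keep the values whose second word matches a fourth word from the requirement list.

    Exact string equality stands in for "corresponds"; the patent does not specify the match.
    """
    return {word: table[word] for word in fourth_words if word in table}


if __name__ == "__main__":
    total = merge_first_words([120.0, 30.0], "plus")  # 150.0
    table = build_statistics_table([(total, ["total sales"]), (3.2, ["line loss rate"])])
    print(select_third_words(table, ["line loss rate", "peak load"]))  # {'line loss rate': [3.2]}
```

Fourth words with no matching second word (here `peak load`) are simply dropped; a production system would probably report them back to the user as unmet requirements.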
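The two-stage check of claims 10 and 11 reduces to interval tests. In this sketch the first threshold interval comes from a hypothetical per-second-word strategy table, and the "interval processing" of the historical average is assumed to mean average × (1 ± ratio). The patent fixes neither the ratio nor the formula, and `doubt_type` and `resolve` are illustrative names.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]

# Hypothetical cleaning verification strategies: second word -> first threshold interval.
FIRST_INTERVALS: Dict[str, Interval] = {"line loss rate": (0.0, 100.0)}


def second_interval(history: List[float], ratio: float) -> Interval:
    """Assumed reading of the interval processing: average +/- ratio * average."""
    avg = mean(history)
    bounds = (avg * (1 - ratio), avg * (1 + ratio))
    return (min(bounds), max(bounds))  # keeps the interval ordered for negative averages


def doubt_type(value: float, second_word: str, history: List[float],
               ratio: float = 0.2) -> Optional[str]:
    """Return None if both checks pass, otherwise the verification doubt type."""
    low, high = FIRST_INTERVALS[second_word]
    if not low <= value <= high:
        return "unreasonable"    # marked as an error; the user updates it directly
    low, high = second_interval(history, ratio)
    if not low <= value <= high:
        return "to_be_verified"  # shown to the user for confirmation
    return None


def resolve(value: float, doubt: Optional[str], confirmed: bool = False,
            correction: Optional[float] = None) -> Optional[float]:
    """Apply the user's response: keep a passing or confirmed value, else use the correction."""
    if doubt is None or (doubt == "to_be_verified" and confirmed):
        return value
    return correction


if __name__ == "__main__":
    history = [3.0, 3.4, 3.1, 3.3]
    for value in (3.2, 9.0, 150.0):
        print(value, doubt_type(value, "line loss rate", history))
```

With this history (average 3.2) and the default ratio of 0.2, a value of 9.0 passes the first check (0 to 100) but falls outside roughly 2.56 to 3.84, so it is flagged for user confirmation, while 150.0 fails the first check outright.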
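Claim 12's final step, which counts the knowledge information of other nodes reached from the matched nodes, can be approximated on a toy graph. The graph contents, the reading of "other knowledge nodes" as the direct neighbours of a matched node, and `neighbour_statistics` are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy knowledge graph: node -> (knowledge information, linked nodes).
GRAPH: Dict[str, Tuple[str, List[str]]] = {
    "line loss": ("indicator", ["substation A", "feeder 12", "loss reduction plan"]),
    "substation A": ("asset", ["line loss"]),
    "feeder 12": ("asset", ["line loss"]),
    "loss reduction plan": ("measure", ["line loss"]),
}


def neighbour_statistics(segmented_words: List[str],
                         graph: Dict[str, Tuple[str, List[str]]] = GRAPH) -> Counter:
    """Count the knowledge information of the nodes linked to each matched node.

    Words that match no node in the graph are skipped.
    """
    counts: Counter = Counter()
    for word in segmented_words:
        if word not in graph:
            continue
        _, neighbours = graph[word]
        for node in neighbours:
            counts[graph[node][0]] += 1
    return counts


if __name__ == "__main__":
    print(neighbour_statistics(["line loss", "unknown"]))  # Counter({'asset': 2, 'measure': 1})
```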
CN202310502167.7A 2023-05-06 2023-05-06 Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis Active CN116484805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310502167.7A CN116484805B (en) 2023-05-06 2023-05-06 Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis

Publications (2)

Publication Number Publication Date
CN116484805A CN116484805A (en) 2023-07-25
CN116484805B true CN116484805B (en) 2023-09-15

Family

ID=87219378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310502167.7A Active CN116484805B (en) 2023-05-06 2023-05-06 Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis

Country Status (1)

Country Link
CN (1) CN116484805B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118332139B (en) * 2024-06-17 2024-09-06 国网山东省电力公司电力科学研究院 A power big data analysis system based on knowledge graph and big model

Citations (7)

Publication number Priority date Publication date Assignee Title
CN105183949A (en) * 2015-08-13 2015-12-23 中国铁道科学研究院 Railway main data cleaning method and system
CN106227872A (en) * 2016-08-01 2016-12-14 浪潮软件集团有限公司 Data cleaning and verifying method based on e-commerce platform
CN110727668A (en) * 2019-09-30 2020-01-24 北京百度网讯科技有限公司 Data cleaning method and device
CN111914568A (en) * 2020-07-31 2020-11-10 平安科技(深圳)有限公司 Method, device and equipment for generating text modifying sentence and readable storage medium
CN113297744A (en) * 2021-05-28 2021-08-24 国网浙江省电力有限公司营销服务中心 Charging pile data cleaning method suitable for error monitoring calculation and charging station
WO2022110454A1 (en) * 2020-11-25 2022-06-02 中译语通科技股份有限公司 Automatic text generation method and apparatus, and electronic device and storage medium
CN115098488A (en) * 2022-07-19 2022-09-23 苏州数猎科技有限公司 Automatic data cleaning method based on knowledge graph

Non-Patent Citations (1)

Title
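Research on knowledge graph completion under active learning (关于主动学习下的知识图谱补全研究); 陈钦况 (Chen Qinkuang) et al.; 计算机科学与探索 (Journal of Frontiers of Computer Science and Technology); Vol. 14, No. 05; pp. 769-782 *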

Also Published As

Publication number Publication date
CN116484805A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
EP3796176B1 (en) Fault root cause analysis method and apparatus
WO2021051630A1 (en) Knowledge fusion method and apparatus based on data relationship analysis, and computer device and storage medium
CN114430363B (en) Fault cause location method, device, equipment and storage medium
CN111626047A (en) Intelligent text error correction method and device, electronic equipment and readable storage medium
CN112231431B (en) Abnormal address identification method and device and computer readable storage medium
WO2023093116A1 (en) Method and apparatus for determining industrial chain node of enterprise, and terminal and storage medium
WO2022178994A1 (en) Table structure recognition method and apparatus, electronic device, and storage medium
WO2022089227A1 (en) Address parameter processing method, and related device
CN116484805B (en) Intelligent cleaning processing method for power reports combining knowledge graph and semantic analysis
CN112433874A (en) Fault positioning method, system, electronic equipment and storage medium
CN116467468B (en) Abnormal information processing method for power management system based on knowledge graph technology
CN116541698A (en) XGBoost-based network anomaly intrusion detection method and system
CN115859191A (en) Fault diagnosis method, device, computer readable storage medium and computer equipment
WO2024031930A1 (en) Error log detection method and apparatus, and electronic device and storage medium
CN118672990A (en) Log analysis method, device, program product and medium
CN116842949A (en) Event extraction method, device, electronic equipment and storage medium
CN111784246A (en) Estimation method of logistics route
CN109344255B (en) Label filling method and terminal equipment
CN117633518B (en) Industrial chain construction method and system
CN117095230A (en) Air quality low-consumption assessment method and system based on image big data intelligent analysis
CN115587190A (en) Construction method and device of knowledge graph in power field and electronic equipment
CN115330350A (en) Financial data collaborative management method and device for secure persistent memory
CN118410775B (en) Form data processing method and system for big model pre-training in bidding field
CN111784248B (en) Logistics traceability method
CN114548765B (en) Method and device for risk identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant