CN116484805A - Intelligent cleaning processing method for power report combining knowledge graph and semantic analysis

Info

Publication number: CN116484805A
Application number: CN202310502167.7A
Authority: CN
Inventors: 胡若云; 姚冰峰; 郭兰兰; 郭大琦; 夏霖; 唐健毅; 张潇匀; 刘铭; 楼洁妮; 陈洲泓; 包挺华; 潘鑫; 金红霞; 张磊; 万志锦
Original assignee: State Grid Zhejiang Electric Power Co Ltd; Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd; Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2023-05-06
Filing date: 2023-05-06
Publication date: 2023-07-25
Anticipated expiration: 2043-05-06
Also published as: CN116484805B

Abstract

The invention provides an intelligent cleaning processing method for power reports that combines a knowledge graph and semantic analysis. The method comprises: constructing a knowledge graph based on the triplet relations corresponding to the knowledge information; determining the words having data attributes as first words and taking the other words as second words; generating a data demand list according to the calculation demand configured by the user for the current power report, and selecting, based on the data demand list, the first words having the corresponding analysis attributes as third words; verifying the correctness of each third word based on a cleaning verification strategy and historical data; determining a cleaning processing strategy for the third word according to its verification doubt type, wherein the verification doubt type is an unreasonable doubt type or a to-be-verified doubt type; and inputting the third words into a preset model to calculate the cleaned analysis data, and obtaining the corresponding processing data based on the analysis data and the knowledge graph.

Description

Intelligent cleaning processing method for power report combining knowledge graph and semantic analysis

Technical Field

The invention relates to a data processing technology, in particular to an intelligent cleaning processing method for a power report by combining a knowledge graph and semantic analysis.

Background

With the digital transformation of power grid enterprises, traditional paper power reports are gradually being converted into electronic power reports for data collation. A power report typically includes statistics in multiple dimensions, such as residential electricity data, industrial electricity data, and campus electricity data. Power reports are an important basis for data analysis by power grid enterprises.

Because power reports are numerous and each contains a large amount of data, staff analyzing them often have to read many reports and cannot quickly obtain the relevant data for a given requirement, so the reports need to be organized effectively. In the prior art, the data in power reports is usually extracted and organized manually; for reports with large data volumes the workload is huge, and unavoidable human error introduces mistakes into the organized data.

Therefore, how to intelligently clean the power report and automatically verify the cleaned data becomes an urgent problem to be solved.

Disclosure of Invention

The embodiment of the invention provides an intelligent cleaning processing method for a power report by combining a knowledge graph and semantic analysis, which can be used for intelligently cleaning the power report and automatically verifying cleaned data, and can also be used for automatically analyzing the data by combining the requirements of staff to obtain corresponding analysis data.

In a first aspect of the embodiment of the present invention, there is provided an intelligent cleaning processing method for a power report by combining a knowledge graph and semantic analysis, including:

receiving knowledge information configured by a user for intelligent cleaning and processing of the power report, and constructing a corresponding knowledge graph based on the triplet relations corresponding to the knowledge information, wherein the knowledge graph comprises a plurality of knowledge nodes;

word segmentation processing is carried out on sentences in the power report to obtain a plurality of words, words with data attributes in the words are determined to be used as first words, other words are used as second words, and semantic analysis is carried out on the corresponding first words by combining the second words in the sentences to obtain analysis attributes of the first words;

generating a corresponding data demand list according to the calculation demand configured by a user for the current power report, and selecting a first word with corresponding analysis attribute as a third word based on the data demand list;

Determining a cleaning verification strategy corresponding to the third word according to the analysis attribute of the third word, verifying the correctness of the third word based on the cleaning verification strategy and historical data, wherein each type of analysis attribute has a preset cleaning verification strategy;

if the verification is judged to be failed, determining a verification doubt type corresponding to the failure of the verification, and determining a cleaning treatment strategy corresponding to the third word according to the verification doubt type of the third word, wherein the verification doubt type comprises an unreasonable doubt type or a to-be-verified doubt type;

after judging and obtaining all third words corresponding to the data demand list, inputting the third words into a preset model to calculate and obtain analysis data after data cleaning, and obtaining corresponding processing data based on the analysis data and a knowledge graph.

Optionally, in one possible implementation manner of the first aspect, the receiving the configured knowledge information of the intelligent cleaning and processing of the power report by the user constructs a corresponding knowledge graph based on the triplet relationship corresponding to the corresponding knowledge information, where the knowledge graph includes a plurality of knowledge nodes, and includes:

the knowledge information comprises first knowledge information corresponding to the analysis data and second knowledge information corresponding to the processing data, and each first knowledge information or second knowledge information has a corresponding knowledge node;

And connecting corresponding knowledge nodes according to the triplet relation configured by the first knowledge information and the second knowledge information by the user, and constructing and generating a corresponding knowledge graph.

Optionally, in one possible implementation manner of the first aspect, the word segmentation processing is performed on a sentence in the power report to obtain a plurality of words, determining a word with a data attribute in the words as a first word, and other words as a second word, and performing semantic analysis on the corresponding first word in combination with the second word in the sentence to obtain an analysis attribute of the first word, where the step includes:

word segmentation processing is carried out on sentences in the power report to obtain a plurality of words, and the words with data attributes are determined to be first words, wherein the words with data attributes at least comprise Arabic numerals, capital numerals and traditional-character numerals;

taking other words except the first word in all the word segmentation as second words, traversing the second words and comparing the second words with preset words, and determining the second words to be analyzed based on the preset words if the second words are judged to correspond to the preset words;

determining second words to be analyzed associated with the corresponding first words according to the position relation between each first word and the second words to be analyzed, and carrying out semantic analysis on the corresponding first words based on the associated second words to obtain analysis attributes of the first words;

If the format of the first word is inconsistent with the preset format, a preset first conversion template is determined according to the format of the first word, and the first word is converted based on the first conversion template to obtain the first word meeting the format requirement.

Optionally, in one possible implementation manner of the first aspect, the determining, according to a positional relationship between each first word and the second word to be analyzed, the second word associated with the corresponding first word, performing semantic analysis on the corresponding first word based on the associated second word, to obtain an analysis attribute of the first word includes:

if a sentence is judged to have a plurality of first words and preset merging words, judging that the corresponding first words can be merged into one first word, and adding corresponding merging labels to the merged first words so as to enable the corresponding first words to be merged based on the merging labels when the first words are processed later;

if judging that one first word or a plurality of first words which can be combined into one first word exists in one sentence, associating all second words to be analyzed in the corresponding sentence with the corresponding first word or the combined first word;

If a plurality of first words are included in one sentence and cannot be combined into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining a second word associated with each first word according to the segmentation result;

the analysis attribute is any one or more of a subject analysis attribute, a trend change analysis attribute and a concept analysis attribute which are included in the second word.

Optionally, in one possible implementation manner of the first aspect, if it is determined that there are multiple first words in a sentence and the multiple words cannot be combined into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining, according to the segmentation result, a second word associated with each first word includes:

determining the positions of all first words in the sentence, segmenting the sentence at the positions of the first words to obtain a plurality of subsegments, and determining the subsegment immediately preceding and adjacent to each first word as its associated segment;

and using the second words in the associated segment as the second words associated with the corresponding first words.
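The segmentation step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `associate_second_words` and the `is_first_word` predicate are hypothetical, and the sketch assumes the sentence has already been split into words.

```python
def associate_second_words(tokens, is_first_word):
    """Pair each first word with the second words in the segment just before it.

    The sentence is cut at every first word, so the segment that immediately
    precedes a first word is treated as its associated segment.
    """
    pairs = []
    segment = []  # second words collected since the previous first word
    for token in tokens:
        if is_first_word(token):
            pairs.append((token, segment))
            segment = []  # start a new segment after this first word
        else:
            segment.append(token)
    # Words after the last first word have no following first word and are
    # left unassociated in this sketch.
    return pairs


tokens = ["resident", "electricity", "20", "industrial", "electricity", "35"]
pairs = associate_second_words(tokens, str.isdigit)
```

Here `str.isdigit` stands in for the data-attribute check; any predicate that recognizes first words could be passed instead.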

Optionally, in one possible implementation manner of the first aspect, if the determining that the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word, and converting the first word based on the first conversion template to obtain the first word meeting the format requirement includes:

comparing the format of the first word with a preset format, wherein the format of the first word is an Arabic numeral format, a capital numeral format or a traditional-character numeral format, and the preset format is the Arabic numeral format;

if the format of the first word is not consistent with the preset format, determining a preset first conversion template according to the format of the first word, decomposing the first word to obtain a corresponding numerical word and a corresponding position word, and filling the numerical word into a relative vacancy in the first conversion template based on the position word;

if the first conversion template is judged to have unfilled vacancies, filling 0 into the unfilled vacancies;

and if all the gaps in the first conversion template are filled, taking the numbers formed by all the gaps as the first words meeting the format requirement.

Optionally, in one possible implementation manner of the first aspect, if the determining that the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word, decomposing the first word to obtain a corresponding numeric word and a position word, and filling the numeric word into a relative empty position in the first conversion template based on the position word includes:

Determining a preset first conversion template according to the format of a first word, wherein the first conversion template is provided with a plurality of gaps, decomposing and identifying the first word to obtain corresponding numerical words and position words, and taking each numerical word and the position words adjacent to the rear part of each numerical word as a numerical group;

extracting the forefront position word in the whole first word to obtain the corresponding preset vacancy number, wherein each position word has the corresponding preset vacancy number;

reserving the preset number of vacancies in the first conversion template, and adding position labels to the reserved vacancies in order from back to front;

and determining corresponding position labels and gaps according to the position words in each numerical value group, and filling the numerical value words in the numerical value groups into the determined gaps.
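The template-filling conversion described above can be sketched for Chinese-character numerals below ten thousand. The character tables, the vacancy counts per position word, and the function name `to_arabic` are assumptions made for illustration, since the translated text does not list them. The sketch also assumes every position word is preceded by a numeric word, so a bare leading position word is not handled.

```python
# Assumed numeric words and their values (not listed in the source text).
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
# Assumed position words. The value is both the preset number of vacancies the
# position word implies and its position label counted from the back.
VACANCIES = {"十": 2, "百": 3, "千": 4}


def to_arabic(word):
    """Convert a Chinese-character numeral to an int by filling a template."""
    # The frontmost position word decides how many vacancies to reserve.
    leading = next((ch for ch in word if ch in VACANCIES), None)
    size = VACANCIES[leading] if leading else 1
    slots = [None] * size  # slots[0] is the most significant digit
    i = 0
    while i < len(word):
        ch = word[i]
        following = word[i + 1] if i + 1 < len(word) else ""
        if ch in DIGITS and following in VACANCIES:
            # A numeric word plus the position word behind it is one value group.
            slots[size - VACANCIES[following]] = DIGITS[ch]
            i += 2
        else:
            # A trailing numeric word without a position word fills the units slot.
            if ch in DIGITS and ch != "零":
                slots[size - 1] = DIGITS[ch]
            i += 1
    # Any vacancy that was never filled is filled with 0.
    return int("".join("0" if d is None else str(d) for d in slots))
```

For example, under these assumptions `to_arabic("二千零五")` reserves four vacancies, fills the first with 2 and the last with 5, and pads the two unfilled vacancies with 0.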

Optionally, in one possible implementation manner of the first aspect, the method further includes:

if the first words are judged to have corresponding merging labels, the calculated first words are obtained by calculating the first words according to the corresponding merging words among the first words, and each merging word has a preset calculation mode.

Optionally, in a possible implementation manner of the first aspect, the generating a corresponding data requirement list according to a computing requirement configured by a user for the current power report, selecting, based on the data requirement list, a first word with a corresponding analysis attribute as a third word includes:

Generating a corresponding data statistics table according to all the first words in the preset format, the combined first words and the respectively associated second words;

generating a corresponding data demand list according to the calculation demand configured by the user for the current power report, and extracting a fourth word in the data demand list;

and comparing the fourth word with the second word of the data statistics table, and if the fourth word corresponds to the second word, taking the first word corresponding to the corresponding second word of the data statistics table as a third word.
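As a rough sketch of the selection step above, the data statistics table can be modeled as rows of a first word plus its associated second words, and the data demand list as a list of fourth words. The row layout and the name `select_third_words` are hypothetical, and exact string equality stands in for whatever matching the actual method uses.

```python
def select_third_words(statistics_table, demand_list):
    """Map each fourth word in the demand list to the matching first word.

    statistics_table: list of (first_word, [second_word, ...]) rows.
    demand_list: list of fourth words extracted from the user's demand.
    """
    third_words = {}
    for fourth_word in demand_list:
        for first_word, second_words in statistics_table:
            if fourth_word in second_words:
                # The first word of the matching row becomes a third word.
                third_words[fourth_word] = first_word
                break
    return third_words


table = [
    (20, ["resident electricity consumption", "increase"]),
    (35, ["industrial electricity consumption", "decrease"]),
]
third = select_third_words(table, ["industrial electricity consumption"])
```

Fourth words with no matching row are simply absent from the result, which lets a caller detect demands that the report cannot satisfy.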

Optionally, in one possible implementation manner of the first aspect, the determining, according to the analysis attribute of the third word, a cleaning verification policy corresponding to the third word, verifying correctness of the third word based on the cleaning verification policy and historical data, where each type of analysis attribute has a preset cleaning verification policy includes:

determining a second word of the analysis attribute corresponding to the third word, and determining a corresponding cleaning and verification strategy according to the corresponding second word, wherein the second word of each category of subject analysis attribute or concept analysis attribute has a preset cleaning and verification strategy;

Comparing the third word with a first threshold interval included in the cleaning verification strategy, and if the numerical value of the third word is in the first threshold interval, judging that the corresponding third word meets the verification of the correctness of the cleaning verification strategy;

determining corresponding historical data according to subject analysis attributes or concept analysis attributes of the second words, calculating according to the historical data to obtain an average value, and carrying out interval treatment on the average value according to a preset proportion value to obtain a second threshold interval;

if the numerical value of the third word is in the second threshold value interval, judging that the corresponding third word meets the verification of the correctness of the historical data.
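The two checks above can be illustrated with a short sketch. The text does not define the "interval treatment" precisely, so this example assumes the second threshold interval is the historical average widened by the preset proportion on both sides. The function names are hypothetical.

```python
from statistics import mean


def second_threshold_interval(history, ratio):
    """Build an interval around the historical average.

    Assumption: the interval is [avg * (1 - ratio), avg * (1 + ratio)].
    """
    avg = mean(history)
    return (avg * (1 - ratio), avg * (1 + ratio))


def in_interval(value, interval):
    """Return True when value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


first_interval = (0, 500)  # illustrative preset range from the strategy
second_interval = second_threshold_interval([100, 110, 90], 0.2)

passes_strategy = in_interval(105, first_interval)
passes_history = in_interval(105, second_interval)
```

Keeping the two checks separate mirrors the text: the first interval tests whether a value is reasonable at all, while the second tests whether it is consistent with history.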

Optionally, in one possible implementation manner of the first aspect, the step of, if the verification fails, determining the verification doubt type corresponding to the failure and determining a cleaning processing strategy corresponding to the third word according to the verification doubt type of the third word, where the verification doubt type includes an unreasonable doubt type or a to-be-verified doubt type, includes:

if the numerical value of the third word is judged not to be in the first threshold value interval, determining that the verification doubt type corresponding to the third word is an unreasonable doubt type;

If the numerical value of the third word is judged not to be in the second threshold value interval, determining that the verification doubtful type corresponding to the third word is the to-be-verified doubtful type;

if the third word is of an unreasonable doubt type, marking the corresponding third word as an error, and outputting the third word and the corresponding second word so as to enable a user to update the third word directly;

if the third word is of the type to be verified and suspicious, marking the corresponding third word as to be verified, outputting the third word and the corresponding second word, and if the user inputs positive verification information, taking the corresponding third word as a final third word;

if the negative verification information is input by the user, updating the corresponding third word according to the user input.
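The doubt-type handling above can be sketched as a small classifier plus a resolution step. The names `Doubt`, `classify`, and `resolve` are hypothetical, and checking the first threshold interval before the second is an assumption, since the text does not state which check takes precedence when both fail.

```python
from enum import Enum


class Doubt(Enum):
    NONE = "verified"
    UNREASONABLE = "unreasonable"
    TO_BE_VERIFIED = "to be verified"


def classify(value, first_interval, second_interval):
    """Return the verification doubt type of a third word's value."""
    low1, high1 = first_interval
    if not low1 <= value <= high1:
        return Doubt.UNREASONABLE  # outside the strategy's allowed range
    low2, high2 = second_interval
    if not low2 <= value <= high2:
        return Doubt.TO_BE_VERIFIED  # plausible, but unlike the history
    return Doubt.NONE


def resolve(value, doubt, user_confirms=False, user_value=None):
    """Apply the cleaning processing strategy for the given doubt type."""
    if doubt is Doubt.UNREASONABLE:
        return user_value  # marked as an error; the user supplies the update
    if doubt is Doubt.TO_BE_VERIFIED:
        # Positive verification keeps the value; negative replaces it.
        return value if user_confirms else user_value
    return value
```

In this sketch the user interaction is reduced to the `user_confirms` and `user_value` arguments; a real system would obtain them from a review interface.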

Optionally, in one possible implementation manner of the first aspect, after determining that all third words corresponding to the data requirement list are obtained, inputting the third words into a preset model to calculate analysis data after data cleaning, and obtaining corresponding processing data based on the analysis data and a knowledge graph, where the processing data includes:

after judging all third words corresponding to the data demand list, inputting the third words into a preset model to calculate analysis data after data cleaning, wherein the analysis data comprises corresponding word segmentation words;

inputting the word-segmentation words into the knowledge graph to determine the knowledge nodes having the corresponding first knowledge information, and determining, according to the determined knowledge nodes, the other knowledge nodes that have second knowledge information;

and counting the second knowledge information of other knowledge nodes with the second knowledge information, obtaining corresponding processing data and outputting the corresponding processing data.
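A minimal sketch of this lookup, assuming the knowledge graph is stored as an adjacency map from each node to its `(relation, neighbor)` pairs. The function name `processing_data`, the relation label `handled_by`, and the node strings are illustrative assumptions, not terms from the source.

```python
from collections import Counter


def processing_data(analysis_words, graph, second_knowledge):
    """Count the second-knowledge nodes connected to the matched nodes.

    analysis_words: word-segmentation words taken from the analysis data.
    graph: dict mapping a node to a list of (relation, neighbor) pairs.
    second_knowledge: set of nodes that carry second knowledge information.
    """
    counts = Counter()
    for word in analysis_words:
        # A word that matches a first-knowledge node selects that node.
        for _relation, neighbor in graph.get(word, []):
            if neighbor in second_knowledge:
                counts[neighbor] += 1
    return counts


graph = {
    "resident electricity consumption too high": [
        ("handled_by", "increase resident power supply"),
    ],
    "industrial electricity consumption too high": [
        ("handled_by", "limit industrial power"),
    ],
}
second_knowledge = {"increase resident power supply", "limit industrial power"}
result = processing_data(
    ["resident electricity consumption too high"], graph, second_knowledge
)
```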

In a second aspect of the embodiment of the present invention, there is provided an intelligent cleaning processing system for power report combining knowledge graph and semantic analysis, including:

the construction module is used for receiving the knowledge information configured by the intelligent cleaning and processing of the power report by the user, constructing a corresponding knowledge graph based on the triplet relation corresponding to the corresponding knowledge information, wherein the knowledge graph comprises a plurality of knowledge nodes;

the processing module is used for carrying out word segmentation processing on sentences in the power report to obtain a plurality of words, determining words with data attributes in the words as first words, taking other words as second words, and carrying out semantic analysis on the corresponding first words by combining the second words in the sentences to obtain analysis attributes of the first words;

the generation module is used for generating a corresponding data demand list according to the calculation demand configured by the user for the current power report, and selecting a first word with corresponding analysis attribute as a third word based on the data demand list;

The verification module is used for determining a cleaning verification strategy corresponding to the third word according to the analysis attribute of the third word, verifying the correctness of the third word based on the cleaning verification strategy and the historical data, and each type of analysis attribute has a preset cleaning verification strategy;

the judging module is used for determining that the verification fails the corresponding verification doubt type if the verification fails, and determining a cleaning treatment strategy corresponding to the third word according to the verification doubt type of the third word, wherein the verification doubt type comprises an unreasonable doubt type or a doubt type to be verified;

and the calculation module is used for inputting the third words into a preset model to calculate the analysis data after the data are cleaned after judging and obtaining all the third words corresponding to the data demand list, and obtaining corresponding processing data based on the analysis data and the knowledge graph.

The beneficial effects are that:

1. The scheme processes and analyzes the power report at the word level to obtain first words having data attributes and second words of other types, and obtains the analysis attributes of the first words by analyzing the first words in combination with the second words. When the user has a calculation demand, a data demand list is generated and the correctness of the third words is verified using the cleaning verification strategy and historical data; when an anomaly occurs, the scheme interacts with staff so that the analyzed data is free of errors. Finally, the scheme builds a knowledge graph from the knowledge information configured by the user for intelligent cleaning and processing of the power report, and uses the knowledge graph to output the processing result corresponding to the analysis data. In summary, the scheme can intelligently clean the power report, automatically verify the cleaned data, and automatically analyze the data according to staff requirements to obtain the corresponding analysis data.

2. When processing and analyzing the power report at the word level, the scheme classifies words by their attributes into first words and second words, and determines the second words to be analyzed that are associated with each first word according to the positional relation between them, thereby parsing and associating the sentences in the power report. The format of each first word is also compared with a preset format; if they do not match, the first word is converted using a first conversion template, in which the numeric words and position words are used together to place the digits and fill in the missing ones.

3. When performing the calculation demand configured by the user for the current power report, the scheme verifies the correctness of the third words using the cleaning verification strategy and historical data. Two verification modes are provided: one determines whether the data is reasonable using a first threshold interval, and the other determines whether the data is doubtful using historical data. Depending on the verification result, the scheme interacts with staff to correct the data so that it is accurate. Finally, the scheme processes the analysis data in combination with the knowledge graph to obtain the corresponding processing result, intelligently assisting staff in cleaning, analyzing, and processing power reports.

Drawings

FIG. 1 is a schematic flow chart of an intelligent cleaning processing method for a power report combining knowledge graph and semantic analysis according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an intelligent cleaning processing system for power report combining knowledge graph and semantic analysis according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.

It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present invention.

It should be understood that in the present invention, "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements that are expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present invention, "plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship. "Comprising A, B and C" and "comprising A, B, C" mean that all three of A, B, and C are included; "comprising A, B or C" means that one of A, B, and C is included; and "comprising A, B and/or C" means that any one, any two, or all three of A, B, and C are included.

It should be understood that in the present invention, "B corresponding to A", "A corresponding to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. A matches B when the similarity between A and B is greater than or equal to a preset threshold value.

As used herein, "if" may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

Referring to FIG. 1, a flow chart of an intelligent cleaning processing method for a power report combining a knowledge graph and semantic analysis according to an embodiment of the present invention includes S1-S6, specifically as follows:

S1. Receiving knowledge information configured by a user for intelligent cleaning and processing of the power report, and constructing a corresponding knowledge graph based on the triplet relations corresponding to the knowledge information, where the knowledge graph includes a plurality of knowledge nodes.

Because the power report is processed in combination with a knowledge graph, the scheme receives the knowledge information configured by the user for intelligent cleaning and processing of the power report, and then constructs the knowledge graph from the triplet relations corresponding to that knowledge information.

The knowledge graph comprises a plurality of knowledge nodes.

In some embodiments, S1 (receiving knowledge information configured by a user for intelligent cleaning and processing of a power report, and constructing a corresponding knowledge graph based on a triplet relationship corresponding to the corresponding knowledge information, where the knowledge graph includes a plurality of knowledge nodes) includes S11-S12:

S11. The knowledge information comprises first knowledge information corresponding to the analysis data and second knowledge information corresponding to the processing data, and each piece of first knowledge information or second knowledge information has a corresponding knowledge node.

The knowledge information comprises first knowledge information corresponding to the analysis data and second knowledge information corresponding to the processing data. Each of the first knowledge information or the second knowledge information has a corresponding knowledge node.

Both the analysis data and the processing data may be preset. The analysis data may be, for example, that resident electricity consumption is too high or that industrial electricity consumption is too high; the processing data may be, for example, increasing the resident power supply when resident electricity consumption is too high, or limiting industrial power when industrial electricity consumption is too high. These are merely examples and are not limiting.

S12. Connecting the corresponding knowledge nodes according to the triplet relations configured by the user for the first knowledge information and the second knowledge information, and constructing the corresponding knowledge graph.

The scheme connects the corresponding knowledge nodes according to the triplet relations configured by the user for the first knowledge information and the second knowledge information, and constructs the corresponding knowledge graph. This process is prior art and is not described in detail here.
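Although graph construction is treated as prior art here, a minimal sketch may help fix ideas. It assumes each triplet has the form (head, relation, tail) and stores the graph as an adjacency map. The function name `build_knowledge_graph` and the relation label `handled_by` are hypothetical.

```python
from collections import defaultdict


def build_knowledge_graph(triples):
    """Build a node set and adjacency map from (head, relation, tail) triples."""
    nodes = set()
    edges = defaultdict(list)
    for head, relation, tail in triples:
        nodes.update((head, tail))  # every endpoint is a knowledge node
        edges[head].append((relation, tail))  # directed, labeled edge
    return nodes, dict(edges)


# First knowledge information (analysis data) linked to
# second knowledge information (processing data).
triples = [
    ("resident electricity consumption too high", "handled_by",
     "increase resident power supply"),
    ("industrial electricity consumption too high", "handled_by",
     "limit industrial power"),
]
nodes, edges = build_knowledge_graph(triples)
```

A production system would more likely use a graph database or a graph library; the dictionary form is used only to keep the example self-contained.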

S2, word segmentation processing is carried out on sentences in the power report to obtain a plurality of words, words with data attributes in the words are determined to be used as first words, other words are used as second words, and semantic analysis is carried out on the corresponding first words by combining the second words in the sentences to obtain analysis attributes of the first words.

The scheme performs word segmentation on the sentences in the power report to obtain a plurality of words. For example, segmenting a sentence such as "according to the statistics this time, resident electricity consumption is twenty percent" yields words such as "according to", "this time", "statistics", "resident electricity consumption", and "twenty percent".

After the words are obtained, determining the words with data attributes in the words as first words, using other words as second words, and carrying out semantic analysis on the corresponding first words by combining the second words in the sentences to obtain the analysis attributes of the first words.

In some embodiments, S2 (word segmentation processing is performed on a sentence in the power report to obtain a plurality of words, determining that a word with a data attribute in the words is used as a first word, other words are used as second words, and performing semantic analysis on the corresponding first word by combining the second word in the sentence to obtain an analysis attribute of the first word) includes S21-S24:

S21, word segmentation processing is carried out on sentences in the power report to obtain a plurality of words, and the words with data attributes among them are determined as first words, wherein the words with data attributes at least comprise Arabic numerals, capital numerals and complex-form numerals (written numeral styles, not complex numbers in the mathematical sense).

After the words are obtained, the scheme determines the words with data attributes among them as first words.

Illustratively, the data attribute of the word "20" is Arabic numeral, the data attribute of the word "twenty" is capital numeral, and the data attribute of the word "two" is complex-form numeral.

S22, taking all words in the segmentation result other than the first words as second words, traversing the second words and comparing them with preset words, and, if a second word is judged to correspond to a preset word, determining that second word to be a second word to be analyzed.

It will be appreciated that the first word is a data word and that other words will be labeled as second words.

The second words are traversed and compared with the preset words; a second word that corresponds to a preset word is determined to be a second word to be analyzed.

It should be noted that word segmentation may yield many second words, only some of which are useful. For example, "according to", "this time" and "statistics" are not useful, whereas "resident electricity consumption" is useful and is therefore a second word to be analyzed. The second words to be analyzed are determined by comparison with the preset words, which are, for example, "resident electricity consumption", "industrial electricity consumption", and the like.
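A minimal sketch of S21 and S22 follows, assuming the sentence has already been segmented. The regular expression for Arabic numerals, the small lexicon `NUMERAL_WORDS` and the exact-match comparison against `PRESET_WORDS` are all assumptions of this sketch; the specification does not say how the data attribute is detected or how "corresponds to" is evaluated.

```python
import re

# Hypothetical lexicon of spelled-out numeral words (assumption of this sketch).
NUMERAL_WORDS = {"twenty percent", "twenty", "thirty"}
# Preset words taken from the examples in the text.
PRESET_WORDS = {"resident electricity consumption", "industrial electricity consumption"}


def split_words(words):
    """Return (first_words, second_words_to_analyze) for one segmented sentence."""
    first_words, to_analyze = [], []
    for word in words:
        if re.fullmatch(r"\d+(\.\d+)?%?", word) or word in NUMERAL_WORDS:
            first_words.append(word)   # word with a data attribute
        elif word in PRESET_WORDS:
            to_analyze.append(word)    # useful second word
        # any other second word is treated as not useful and dropped
    return first_words, to_analyze


words = ["according to", "this time", "statistics",
         "resident electricity consumption", "twenty percent"]
first_words, to_analyze = split_words(words)
```

With the example sentence, `first_words` holds "twenty percent" and `to_analyze` holds "resident electricity consumption".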

S23, determining second words to be analyzed associated with the corresponding first words according to the position relation between each first word and the second words to be analyzed, and carrying out semantic analysis on the corresponding first words based on the associated second words to obtain analysis attributes of the first words.

After the first words and the second words are determined, the scheme can determine the second words to be analyzed, which are associated with the corresponding first words, according to the position relation between each first word and the second words to be analyzed.

And then carrying out semantic analysis on the corresponding first words based on the associated second words to obtain analysis attributes of the first words.

It should be noted that the meaning of a first word can only be obtained in combination with its associated words, which yields the analysis attribute of the first word. The analysis attribute is any one or more of the subject analysis attribute, the trend change analysis attribute and the concept analysis attribute carried by the associated second words. The subject analysis attribute may be "resident electricity", "industrial electricity", etc., and the trend change analysis attribute may be "too high", "too low", etc. For example, the analysis attribute obtained for "twenty percent" through association analysis is "resident electricity".

In some embodiments, S23 (determining, according to the positional relationship between each first term and the second term to be analyzed, the second term associated with the corresponding first term, performing semantic analysis on the corresponding first term based on the associated second term, resulting in the analysis attribute of the first term) includes S231-S233:

S231, if it is judged that a sentence has a plurality of first words and a preset merging word, judging that the corresponding first words can be merged into one first word, and adding a merging label to the merged first word, so that the corresponding first words are treated as merged, based on the merging label, in subsequent processing.

That is, when a sentence contains several first words together with a preset merging word, the scheme merges them into one first word and marks the result with a merging label for later processing.

For example, the preset merging word is the "to" in "twenty to thirty percent"; in this case the scheme merges the corresponding first words into one first word.

On the basis of the above embodiment, the merging may take the intermediate value. For example, "twenty to thirty percent" is merged into a first word of "twenty-five percent". It is worth mentioning that merged data are a rough statistic, so their accuracy may be slightly lower.
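The intermediate-value merge can be sketched as below. This is only one possible reading: it assumes the first words have already been converted to numbers, that the merging word sits directly between them, and that the merging label is stored as a boolean field; none of these details is fixed by the text.

```python
MERGE_WORDS = {"to"}  # preset merging words (example from the text)


def merge_first_words(tokens):
    """Merge 'number, merging word, number' runs into one labeled midpoint value.

    tokens mixes numbers (first words) and strings (other words).
    """
    merged, i = [], 0
    while i < len(tokens):
        is_range = (
            i + 2 < len(tokens)
            and isinstance(tokens[i], (int, float))
            and tokens[i + 1] in MERGE_WORDS
            and isinstance(tokens[i + 2], (int, float))
        )
        if is_range:
            midpoint = (tokens[i] + tokens[i + 2]) / 2
            merged.append({"value": midpoint, "merged": True})  # merging label
            i += 3
        else:
            merged.append(tokens[i])
            i += 1
    return merged


result = merge_first_words([20, "to", 30, "percent"])
```

For "twenty to thirty percent" the merged first word carries the value 25 and the merging label.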

S232, if judging that one first word or a plurality of first words which can be combined into one first word exists in one sentence, associating all second words to be analyzed in the corresponding sentence with the corresponding first word or the combined first word.

If it is judged that a sentence has one first word, or a plurality of first words that can be merged into one first word, the scheme associates all second words to be analyzed in that sentence with that first word or with the merged first word.

S233, if it is determined that a sentence has a plurality of first words and the plurality of words cannot be combined into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining a second word associated with each first word according to the segmentation result.

For example, one sentence is "the statistical result is that the electricity consumption of residents is 44455 and the industrial electricity consumption is 600000", wherein a plurality of first words "44455" and "600000" cannot be combined into one first word, at this time, the sentence is segmented based on the positions of the first words to obtain a segmentation result, and then the second word associated with each first word is determined according to the segmentation result.

Wherein S233 (if it is determined that there are multiple first words in a sentence and the multiple words cannot be combined into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining the second word associated with each first word according to the segmentation result) includes S2331-S2332:

S2331, determining the positions of all the first words in the sentence, segmenting the sentence based on the positions of the first words to obtain a plurality of sub-segments, and determining the sub-segment immediately preceding each first word as its associated segment.

Illustratively, for "the statistical result is that the electricity consumption of residents is 44455 and the industrial electricity consumption is 600000", the first words are "44455" and "600000".

Segmenting the sentence at the positions of the first words yields the sub-segments "the statistical result is that the electricity consumption of residents is" and "and the industrial electricity consumption is".

The scheme then determines the sub-segment immediately preceding each first word as its associated segment: the associated segment of "44455" is "the statistical result is that the electricity consumption of residents is", and the associated segment of "600000" is "and the industrial electricity consumption is".

S2332, taking the second words in the associated segment as the second words associated with the corresponding first word.

The scheme takes the second words in each associated segment as the second words associated with the corresponding first word.

In this way, a different strategy is applied to each situation, and the second words associated with each first word are obtained.
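The segmentation of S2331 and S2332 can be sketched as follows, assuming the sentence is available as an ordered word list and that the first words and the second words to be analyzed are already known. The function name and the dictionary return type are assumptions of this sketch; repeated identical first words in one sentence are not handled.

```python
def associate_by_segments(words, first_words, words_to_analyze):
    """Map each first word to the useful second words in the segment before it."""
    associations = {}
    segment = []  # words seen since the previous first word
    for word in words:
        if word in first_words:
            # the segment immediately preceding this first word is its associated segment
            associations[word] = [w for w in segment if w in words_to_analyze]
            segment = []
        else:
            segment.append(word)
    return associations


words = ["the statistical result is", "resident electricity consumption", "44455",
         "and", "industrial electricity consumption", "600000"]
associations = associate_by_segments(
    words,
    first_words={"44455", "600000"},
    words_to_analyze={"resident electricity consumption",
                      "industrial electricity consumption"},
)
```

Each number ends up associated only with the second word from its own preceding segment.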

S24, if the format of the first word is not consistent with the preset format, determining a preset first conversion template according to the format of the first word, and converting the first word based on the first conversion template to obtain the first word meeting the format requirement.

In some cases, the format of the first word may not conform to the preset format, which may be the Arabic numeral format.

If the format of the first word does not conform to the preset format, a preset first conversion template is determined according to the format of the first word, and the first word is converted based on the first conversion template to obtain a first word that meets the format requirement. Converting the format allows the subsequent data to be organized in a uniform format.

In some embodiments, S24 (if it is determined that the format of the first word does not conform to the preset format, determining a preset first conversion template according to the format of the first word, and converting the first word based on the first conversion template to obtain the first word meeting the format requirement) includes S241-S244:

S241, comparing the format of the first word with the preset format, wherein the format of the first word is an Arabic numeral format, a capital numeral format or a complex-form numeral format, and the preset format is the Arabic numeral format.

The format of the first word is compared with the Arabic number format, and if the format of the first word does not correspond to the Arabic number format, the format of the first word needs to be converted.

S242, if the format of the first word does not conform to the preset format, determining a preset first conversion template according to the format of the first word, decomposing the first word to obtain the corresponding numerical words and position words, and filling each numerical word into the corresponding slot in the first conversion template based on its position word.

It should be noted that the scheme is provided with preset first conversion templates. The first word is decomposed to obtain the corresponding numerical words and position words, and each numerical word is filled into the corresponding slot in the first conversion template based on its position word.

wherein S242 (if it is determined that the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word, decomposing the first word to obtain a corresponding numeric word and a position word, and filling the numeric word into a relative space in the first conversion template based on the position word) includes S2421-S2424:

S2421, determining a preset first conversion template according to the format of the first word, wherein the first conversion template has a plurality of slots; decomposing and identifying the first word to obtain the corresponding numerical words and position words; and taking each numerical word together with its adjacent position word as a numerical group.

The first conversion template has a plurality of empty slots into which the corresponding data are filled.

First, the first word is decomposed and identified to obtain the corresponding numerical words and position words, and then each numerical word together with its adjacent position word is taken as a numerical group.

It should be noted that a numerical word gives the digit, and a position word gives the place that the digit occupies. For example, for "two hundred and thirty", the "2" and "3" corresponding to "two" and "three" are the numerical words, and "hundred" and "ten" are the position words; "2" and "hundred" form one numerical group, and "3" and "ten" form another.

S2422, extracting the foremost position word in the whole first word to obtain the corresponding preset number of slots, wherein each position word has a corresponding preset number of slots.

The scheme extracts the foremost position word in the whole first word to obtain the corresponding preset number of slots; each position word has its own preset number of slots.

Illustratively, if the foremost position word is "thousand", the corresponding preset number of slots is 4; if the foremost position word is "hundred", the corresponding preset number of slots is 3; if the foremost position word is "ten", the corresponding preset number of slots is 2.

S2423, reserving the preset number of slots in the first conversion template, and adding a position label to each reserved slot in sequence from back to front.

The scheme reserves the preset number of slots in the first conversion template and adds a position label to each reserved slot in sequence from back to front.

For example, if the foremost position word is "hundred", the corresponding preset number of slots is 3, and the position labels added to the 3 slots from back to front are "one", "ten" and "hundred", respectively.

S2424, determining the corresponding position label and slot according to the position word in each numerical group, and filling the numerical word of the group into the determined slot.

Finally, the scheme determines the corresponding position label and slot according to the position word in each numerical group, and fills the numerical word of the group into the determined slot.

For example, "2" is filled into the slot labeled "hundred", and "3" is filled into the slot labeled "ten".

S243, if it is judged that the first conversion template has unfilled slots, filling 0 into the unfilled slots.

Illustratively, in the filling process above, the ones slot has not been filled; the scheme fills this unfilled slot with 0 and finally obtains the data 230.

S244, if all the slots in the first conversion template are filled, taking the number formed by all the slots as the first word meeting the format requirement.

It should be noted that, for the less common data that end in a position word, the scheme converts the trailing position word accordingly. For example, for "two hundred and thirty ten-thousands", the trailing "ten thousand" is converted into four 0s, giving the converted data 2300000.
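The slot-filling conversion of S2421 to S2424, S243 and S244 can be sketched as follows. The sketch assumes the first word has already been decomposed into a token list of English glosses (for example "two", "hundred", "three", "ten"), and that a digit without a following position word belongs in the ones slot; the table names and the function `to_arabic` are hypothetical.

```python
DIGITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
          "six": 6, "seven": 7, "eight": 8, "nine": 9}
# Preset number of slots per position word; it also equals the slot's place from the right.
POSITION_SLOTS = {"ten": 2, "hundred": 3, "thousand": 4}
# Trailing position words that are converted into zeros.
TRAILING_ZEROS = {"ten thousand": 4}


def to_arabic(tokens):
    """Convert alternating numerical/position tokens into an Arabic-numeral integer."""
    tokens = list(tokens)
    zeros = TRAILING_ZEROS.get(tokens[-1], 0) if tokens else 0
    if zeros:
        tokens.pop()
    # S2421: pair each numerical word with its adjacent position word
    groups, i = [], 0
    while i < len(tokens):
        digit = DIGITS[tokens[i]]
        if i + 1 < len(tokens) and tokens[i + 1] in POSITION_SLOTS:
            groups.append((digit, POSITION_SLOTS[tokens[i + 1]]))
            i += 2
        else:
            groups.append((digit, 1))  # no position word: ones slot (assumption)
            i += 1
    # S2422/S2423: the foremost position word fixes how many slots are reserved
    n_slots = groups[0][1]
    slots = [None] * n_slots
    # S2424: fill each digit into the slot labeled by its position word
    for digit, place in groups:
        slots[n_slots - place] = digit
    # S243: unfilled slots become 0; S244: read the slots as one number
    filled = [0 if s is None else s for s in slots]
    return int("".join(str(d) for d in filled) + "0" * zeros)


value = to_arabic(["two", "hundred", "three", "ten"])
large = to_arabic(["two", "hundred", "three", "ten", "ten thousand"])
```

The two calls reproduce the examples in the text, 230 and 2300000.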

S3, generating a corresponding data demand list according to the calculation demand configured by the user for the current power report, and selecting the first words with the corresponding analysis attributes as third words based on the data demand list.

The data to be analyzed is also different due to the different demands of the users. The scheme can generate a corresponding data demand list according to the calculation demand configured by the user for the current power report.

After the data requirement list is obtained, the first word with the corresponding analysis attribute can be selected as the third word in combination with the data requirement list.

In some embodiments, S3 (generating a corresponding data demand list according to the computing demand configured by the user for the current power report, selecting the first word having the corresponding analysis attribute as the third word based on the data demand list) includes S31-S33:

S31, generating a corresponding data statistics table according to all the first words in the preset format, the merged first words and the respectively associated second words.

Firstly, the scheme can count all the first words in the preset format, the combined first words and the respectively associated second words to generate a corresponding data statistics table.

S32, generating a corresponding data demand list according to the calculation demand configured by the user for the current power report, and extracting a fourth word in the data demand list.

In some cases, the user may have multiple computing requirements, and at this time, a corresponding data statistics table may be generated for all the first words in the preset format, the combined first words, and the second words respectively associated therewith.

The calculating requirement is, for example, a requirement of calculating the electricity consumption of the residents, the industrial electricity consumption and the like, and the corresponding fourth word can be the electricity consumption of the residents, the industrial electricity consumption and the like.

S33, comparing the fourth word with the second word of the data statistics table, and if the fourth word corresponds to the second word, taking the first word corresponding to the corresponding second word of the data statistics table as a third word.

The fourth word is compared with the second words in the data statistics table; if the fourth word corresponds to a second word, this indicates that the data statistics table contains the corresponding data. In that case, the scheme takes the first word associated with that second word in the data statistics table as the third word.
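Steps S31 to S33 can be sketched as a simple table lookup. The row layout of the data statistics table and the use of exact string equality for "corresponds" are assumptions of this sketch.

```python
def select_third_words(stats_table, demand_list):
    """Return {fourth_word: first_word} for every demanded item found in the table."""
    third_words = {}
    for fourth_word in demand_list:
        for row in stats_table:
            if fourth_word in row["second_words"]:  # fourth word matches a second word
                third_words[fourth_word] = row["first_word"]
    return third_words


# S31: data statistics table built from first words and their associated second words
stats_table = [
    {"first_word": 44455, "second_words": ["resident electricity consumption"]},
    {"first_word": 600000, "second_words": ["industrial electricity consumption"]},
]
# S32: fourth words extracted from the data demand list
demand_list = ["industrial electricity consumption"]
# S33: matching first words become third words
third_words = select_third_words(stats_table, demand_list)
```

Only the data actually demanded by the user are carried forward as third words.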

S4, determining a cleaning verification strategy corresponding to the third word according to the analysis attribute of the third word, and verifying the correctness of the third word based on the cleaning verification strategy and the historical data, wherein each type of analysis attribute has a preset cleaning verification strategy.

After the third word is obtained, the scheme can determine a cleaning verification strategy corresponding to the third word according to the analysis attribute of the third word, and each type of analysis attribute has a preset cleaning verification strategy.

Then, the correctness of the third word is verified in combination with the cleaning verification strategy and the historical data. It will be appreciated that the statistical data in the power report may be inaccurate. For example, if the data obtained for resident electricity consumption is 1 kWh (1 "degree"), that data must be erroneous; the correctness of the data is therefore verified before the data are analyzed.

In some embodiments, S4 (determining, according to the analytical attribute of the third word, a cleaning verification policy corresponding to the third word, verifying correctness of the third word based on the cleaning verification policy and the historical data, where each type of analytical attribute has a preset cleaning verification policy) includes S41-S44:

S41, determining the second word of the analysis attribute corresponding to the third word, and determining the corresponding cleaning verification strategy according to that second word, wherein the second words of each category of subject analysis attribute or concept analysis attribute have a preset cleaning verification strategy.

For example, when the subject analysis attribute is resident electricity consumption, the preset cleaning verification strategy may be to set an electricity consumption interval, namely a first threshold interval, and then to verify the resident electricity consumption by judging whether it falls within the first threshold interval.

S42, comparing the third word with a first threshold interval included in the cleaning verification strategy, and if the numerical value of the third word is in the first threshold interval, judging that the corresponding third word meets the verification of the correctness of the cleaning verification strategy.

Firstly, the third word is compared with a first threshold interval included in the cleaning verification policy. It can be understood that if the value of the third word is within the first threshold interval, it is determined that the corresponding third word satisfies the verification of the correctness of the cleaning verification policy.

It is worth mentioning that the above step verifies whether the data are reasonable, and its result is definite.

S43, determining the corresponding historical data according to the subject analysis attribute or the concept analysis attribute of the second word, calculating an average value from the historical data, and expanding the average value into an interval according to a preset proportion value to obtain a second threshold interval.

It will be appreciated that this step obtains the corresponding historical data so that the judgment can be made in combination with the historical data.

After the average value is obtained from the historical data, it is expanded into an interval using the preset proportion value to obtain the second threshold interval. For example, if the preset proportion value is 0.5 and the average value is 1000, the second threshold interval may be 500-1500.

It should be noted that the above steps verify whether the data are in doubt: data outside the second threshold interval are doubtful, but not necessarily wrong.

S44, if the numerical value of the third word is within the second threshold interval, judging that the corresponding third word passes the correctness verification against the historical data.

If the numerical value of the third word is within the second threshold interval, the scheme can judge that the corresponding third word meets the verification of the correctness of the historical data.
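The two interval checks of S42 to S44 can be sketched as follows. The formula average × (1 ± proportion) is inferred from the worked example (proportion 0.5, average 1000, interval 500-1500); the history values and the first threshold interval below are made-up illustration data.

```python
from statistics import mean


def second_threshold_interval(history, ratio):
    """S43: expand the historical average into [avg*(1-ratio), avg*(1+ratio)]."""
    avg = mean(history)
    return (avg * (1 - ratio), avg * (1 + ratio))


def in_interval(value, interval):
    """S42/S44: a value passes when it lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


first_interval = (50, 100000)                 # hypothetical reasonableness bounds
history = [900, 1000, 1100]                   # hypothetical historical data, average 1000
second_interval = second_threshold_interval(history, 0.5)

passes_first = in_interval(1200, first_interval)
passes_second = in_interval(1200, second_interval)
```

With these inputs the second threshold interval is 500 to 1500, matching the example in the text, and the value 1200 passes both checks.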

S5, if it is judged that the verification is not passed, determining the verification doubt type corresponding to the failed verification, and determining a cleaning processing strategy corresponding to the third word according to the verification doubt type of the third word, wherein the verification doubt type comprises an unreasonable doubt type or a to-be-verified doubt type.

In some embodiments, S5 (if it is judged that the verification is not passed, determining the verification doubt type corresponding to the failed verification, and determining a cleaning processing strategy corresponding to the third word according to the verification doubt type of the third word, wherein the verification doubt type comprises an unreasonable doubt type or a to-be-verified doubt type) includes S51-S55:

S51, if it is judged that the numerical value of the third word is not within the first threshold interval, determining that the verification doubt type corresponding to the third word is the unreasonable doubt type.

It can be understood that if the numerical value of the third word is not within the first threshold interval, the data are certainly wrong, and the scheme determines that the verification doubt type corresponding to the third word is the unreasonable doubt type.

S52, if it is judged that the numerical value of the third word is not within the second threshold interval, determining that the verification doubt type corresponding to the third word is the to-be-verified doubt type.

It can be understood that if the numerical value of the third word is not within the second threshold interval, the data are doubtful, and the scheme determines that the verification doubt type corresponding to the third word is the to-be-verified doubt type.

S53, if the third word is of the unreasonable doubt type, marking the corresponding third word as an error, and outputting the third word and the corresponding second word, so that the user can update the third word directly.

For a third word of the unreasonable doubt type, the scheme therefore marks it as an error and outputs it together with its second word for the user to correct.

S54, if the third word is of the to-be-verified doubt type, marking the corresponding third word as to be verified, and outputting the third word and the corresponding second word; if the user inputs positive verification information, taking the corresponding third word as the final third word.

For a third word of the to-be-verified doubt type, the scheme marks it as to be verified and outputs it together with its second word to assist the user's further judgment; if the user inputs positive verification information, the third word is taken as the final third word.

S55, if the user inputs negative verification information, updating the corresponding third word according to the user input.

If the user's verification input is negative, the corresponding third word is updated according to the user input.
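The doubt-type decision and the resulting cleaning handling (S51 to S55) can be sketched as below. The text does not state which type applies when a value lies outside both intervals; this sketch assumes the unreasonable doubt type takes precedence. The string labels, function names and the way user input is passed in are likewise assumptions.

```python
def doubt_type(value, first_interval, second_interval):
    """Return 'unreasonable', 'to_be_verified', or None when both checks pass."""
    if not first_interval[0] <= value <= first_interval[1]:
        return "unreasonable"        # S51: certainly wrong
    if not second_interval[0] <= value <= second_interval[1]:
        return "to_be_verified"      # S52: doubtful, needs the user's judgment
    return None


def clean(value, kind, user_confirms=False, user_value=None):
    """Apply the cleaning processing strategy for one third word."""
    if kind is None:
        return value                 # verification passed, keep as is
    if kind == "unreasonable":
        return user_value            # S53: the user updates the value directly
    if user_confirms:
        return value                 # S54: positive verification keeps the value
    return user_value                # S55: negative verification replaces it


first_interval, second_interval = (50, 100000), (500, 1500)
kind = doubt_type(2000, first_interval, second_interval)
kept = clean(2000, kind, user_confirms=True)
replaced = clean(2000, kind, user_confirms=False, user_value=1200)
```

In a real system the user's answer would come from an interactive prompt rather than function arguments.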

S6, after it is judged that all third words corresponding to the data demand list have been obtained, inputting the third words into a preset model to calculate the data-cleaned analysis data, and obtaining the corresponding processing data based on the analysis data and the knowledge graph.

The method is provided with a preset model, the third word can be input into the preset model to be calculated to obtain analysis data after data cleaning, and then the analysis data and the knowledge graph are combined to obtain corresponding processing data.

In some embodiments, S6 (after determining that all third words corresponding to the data requirement list are obtained, inputting the third words into a preset model to calculate analysis data after data cleaning, and obtaining corresponding processing data based on the analysis data and a knowledge graph) includes S61-S63:

S61, after judging that all third words corresponding to the data demand list are obtained, inputting the third words into a preset model to calculate analysis data after data cleaning, wherein the analysis data comprises corresponding word segmentation words.

After all third words corresponding to the data demand list are judged, the third words can be input into a preset model to be calculated to obtain analysis data after data cleaning, wherein the analysis data comprises corresponding word segmentation words.

For example, the preset model may compare the magnitudes of the resident electricity data and the industrial electricity data to obtain the analysis data; the analysis data may be, for example, that the proportion of industrial electricity is relatively high. Finally, the word segmentation words of "the proportion of industrial electricity is relatively high" are obtained.

S62, inputting the word segmentation words into the knowledge graph to determine the knowledge node having the corresponding first knowledge information, and determining, according to the determined knowledge node, the other connected knowledge nodes having second knowledge information.

In this scheme, the word segmentation words are input into the knowledge graph to determine the knowledge node having the corresponding first knowledge information, and the other knowledge nodes having second knowledge information are then determined from the knowledge nodes connected to the determined knowledge node.

The second knowledge node may be a processing node corresponding to the first knowledge node, for example, for the first knowledge node with relatively high industrial power consumption, the corresponding second knowledge node may be a node such as "limit industrial power consumption", "suggest factory to improve clean energy power supply ratio", and the like.

S63, counting second knowledge information of other knowledge nodes with the second knowledge information, obtaining corresponding processing data and outputting the corresponding processing data.

And finally, counting the second knowledge information of other knowledge nodes with the second knowledge information, obtaining corresponding processing data and outputting the corresponding processing data. It will be appreciated that for a first knowledge node, which may have a plurality of second knowledge nodes, the present solution outputs a corresponding plurality of second knowledge information as processing data.
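Steps S61 to S63 can be sketched end to end as follows. The comparison-based `analyze` function stands in for the preset model, which the text does not specify, and the tie case (equal values) is arbitrarily assigned to the resident branch; the graph is a hypothetical mapping from first knowledge nodes to their connected second knowledge nodes, using node texts from the examples above.

```python
# Hypothetical knowledge graph: first knowledge node -> connected second knowledge nodes.
KNOWLEDGE_GRAPH = {
    "industrial electricity proportion is relatively high": [
        "limit industrial power consumption",
        "suggest factory to improve clean energy power supply ratio",
    ],
    "resident electricity proportion is relatively high": [
        "resident power supply increase processing",
    ],
}


def analyze(resident, industrial):
    """S61: stand-in for the preset model, comparing the two cleaned values."""
    if industrial > resident:
        return "industrial electricity proportion is relatively high"
    return "resident electricity proportion is relatively high"


def processing_data(analysis, graph):
    """S62/S63: find the first knowledge node and collect its second knowledge information."""
    return list(graph.get(analysis, []))


analysis = analyze(resident=44455, industrial=600000)
actions = processing_data(analysis, KNOWLEDGE_GRAPH)
```

One first knowledge node may lead to several second knowledge nodes, all of which are output as processing data.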

Referring to fig. 2, a schematic structural diagram of an intelligent cleaning processing system for power report combining knowledge graph and semantic analysis according to an embodiment of the present invention includes:

The present invention also provides a storage medium having stored therein a computer program for implementing the methods provided by the various embodiments described above when executed by a processor.

The storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media can be any available media that can be accessed by a general purpose or special purpose computer. For example, a storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). In addition, the ASIC may reside in a user device. Alternatively, the processor and the storage medium may reside as discrete components in a communication device. The storage medium may be read-only memory (ROM), random-access memory (RAM), a CD-ROM, magnetic tape, a floppy disk, an optical data storage device, etc.

The present invention also provides a program product comprising execution instructions stored in a storage medium. The at least one processor of the device may read the execution instructions from the storage medium, the execution instructions being executed by the at least one processor to cause the device to implement the methods provided by the various embodiments described above.

In the above embodiments of the terminal or the server, it should be understood that the processor may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or the like. A general purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An intelligent cleaning processing method for a power report combining a knowledge graph and semantic analysis, characterized by comprising the following steps:

2. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 1, wherein,

the receiving of knowledge information configured by a user for intelligent cleaning processing of a power report, and the constructing of a corresponding knowledge graph based on the triplet relation corresponding to the knowledge information, wherein the knowledge graph comprises a plurality of knowledge nodes, comprises the following steps:

3. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 2, wherein,

the word segmentation processing is performed on the sentences in the power report to obtain a plurality of words, the words with data attributes in the words are determined to be used as first words, other words are used as second words, the corresponding first words are subjected to semantic analysis by combining the second words in the sentences to obtain analysis attributes of the first words, and the method comprises the following steps:

4. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 3, wherein,

determining a second word associated with the corresponding first word according to the position relation between each first word and the second word to be analyzed, and carrying out semantic analysis on the corresponding first word based on the associated second word to obtain the analysis attribute of the first word, wherein the method comprises the following steps:

5. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 4, wherein,

if it is determined that a sentence has a plurality of first words and the plurality of first words cannot be combined into one first word, segmenting the sentence based on the positions of the first words to obtain a segmentation result, and determining a second word associated with each first word according to the segmentation result, including:

6. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 4, wherein,

if the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word, and converting the first word based on the first conversion template to obtain the first word meeting the format requirement, wherein the method comprises the following steps:

7. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 6, wherein,

if the format of the first word is inconsistent with the preset format, determining a preset first conversion template according to the format of the first word, decomposing the first word to obtain a corresponding numerical word and a position word, and filling the numerical word into a relative vacancy in the first conversion template based on the position word, wherein the method comprises the following steps:

8. The intelligent cleaning processing method for the power report combining knowledge graph and semantic analysis according to claim 6, further comprising:

9. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 6, wherein,

the generating a corresponding data demand list according to the calculation demand configured by the user for the current power report, selecting a first word with a corresponding analysis attribute as a third word based on the data demand list, including:

10. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 9, wherein,

determining a cleaning verification policy corresponding to the third word according to the analysis attribute of the third word, verifying the correctness of the third word based on the cleaning verification policy and historical data, wherein each type of analysis attribute has a preset cleaning verification policy, and the method comprises the following steps:

11. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 10, wherein,

if the verification is judged not to be passed, determining a verification in-doubt type corresponding to the verification not to be passed, and determining a cleaning treatment strategy corresponding to the third word according to the verification in-doubt type of the third word, wherein the verification in-doubt type comprises an unreasonable in-doubt type or an in-doubt type to be verified, and comprises the following steps:

12. The intelligent cleaning processing method for the power report combining the knowledge graph and the semantic analysis according to claim 11, wherein,

after judging all third words corresponding to the data demand list, inputting the third words into a preset model to calculate analysis data after data cleaning, and obtaining corresponding processing data based on the analysis data and a knowledge graph, wherein the processing data comprises the following steps: