CN113641877A - Intelligent comparison method for relay protection fixed values - Google Patents

Intelligent comparison method for relay protection fixed values

Publication number
CN113641877A
Authority
CN
China
Prior art keywords
fixed value
dictionary
name
value
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110941813.0A
Other languages
Chinese (zh)
Other versions
CN113641877B (en)
Inventor
戴志辉
方伟
李金铄
耿宏贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University
Priority to CN202110941813.0A
Publication of CN113641877A
Application granted
Publication of CN113641877B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Emergency Protection Circuit Devices (AREA)

Abstract

The invention belongs to the technical field of safe and stable power grid operation and provides an intelligent comparison method for relay protection fixed values, specifically a method for comparing the fixed values of relay protection devices based on Chinese word segmentation. First, Chinese word segmentation is selected to process the text of relay protection device fixed value names; the naming rules of relay protection devices from different manufacturers and of different types are analyzed and sorted out, and a relay protection fixed value name dictionary is established. Then, on the basis of the dictionary, an improved forward maximum matching algorithm segments the fixed value names on the fixed value list and the fixed value names on the operating equipment, yielding word segmentation results and word segmentation arrays. The word segmentation arrays are compared to screen out the fixed value items of the operating device that match the fixed value items on the fixed value list, and the fixed values are then compared. Finally, for the small number of special synonym problems, a sequence similarity calculation and an anti-mismatch mechanism are introduced to further improve accuracy and coverage, and the validity of the method is verified through example analysis.

Description

Intelligent comparison method for relay protection fixed values
Technical Field
The invention belongs to the technical field of safe and stable operation of a power grid and relates to an intelligent comparison method for relay protection fixed values, in particular to an intelligent comparison method for relay protection device fixed values based on Chinese word segmentation.
Background
The rationality and accuracy of relay protection fixed values are crucial to ensuring the safe and stable operation of a power grid and to making full use of the performance of relay protection devices. As the operation mode of the power grid changes, the protection fixed values of a device are changed through the processes of setting, checking, comparing, downloading and so on. Many organizations at home and abroad have carried out targeted research and proposed online setting systems and online checking systems, which have greatly improved the operating efficiency of relay protection systems. Because the correctness of the operating fixed values in a relay protection device directly affects the operation mode of the power grid and is of great significance to its safe and stable operation, substations of 220 kV and above must fully compare the actual operating fixed values of all equipment in the station with the latest dispatching fixed value list every half year, and substations of 110 kV and below must do so once a year. In addition, the fixed values of a relay protection device must be compared again before equipment is put into operation and before new equipment is commissioned. However, there is currently little research on relay protection fixed value comparison, and the comparison is still completed manually by means of paper, telephone, fax, mail and the like[8]. This approach has the following disadvantages:
(1) Operation and maintenance personnel must print the latest fixed value list of a protection device from the dispatching system, carry it to the substation site, look up the operating fixed values through the human-machine interface of the protection device, and compare them one by one. The fixed value checking workload for the equipment already in operation is large, and the capacity of the existing personnel is limited.
(2) Relay protection devices from different manufacturers and of different types display numerical items, control words and soft pressing plate function modules in different ways, so items are easily omitted or missed.
(3) In addition, the original fixed value data come from many sources, the human-machine interface window of the equipment is small, and operating flexibility is limited. This increases the difficulty of on-site fixed value checking and lowers its accuracy.
In view of the above situation, relevant organizations at home and abroad have proposed several solutions. For example, the online intelligent comparison method for relay protection fixed values based on a hybrid professional dictionary matches the names of setting fixed values and operating fixed values through a forward maximum matching algorithm and an order-independent Jaccard similarity calculation, thereby realizing intelligent online comparison; however, the comparison takes too long, and synonyms with different written forms that arise from technical terminology cannot be matched. Another approach combines an online fixed value comparison method based on multi-source data with a fixed value solidification system based on an expert system, so as to guarantee the safe and stable operation of the smart grid; because this system adopts fuzzy matching, high accuracy cannot be guaranteed and comparison errors remain possible. The fixed value reading and comparison technology based on AR character recognition exploits the structure of the table and does not separate fixed value names from data during information acquisition; however, it requires a paper fixed value list in table format and is poorly compatible with fixed value lists in other formats. A further approach obtains the fixed value parameters through the printing port of the relay protection equipment and compares them: the fixed values of the operating zone, the soft pressing plates and the switching-quantity print data are downloaded from the printing interface of the relay protection device, and the fixed value file in the system database and the print data are searched item by item and fuzzily matched to complete the comparison. This method avoids interfering with the protection equipment and is highly safe, but because low-voltage protection equipment has no printing interface, its applicability is poor.
Disclosure of Invention
The invention aims to provide an intelligent comparison method for relay protection fixed values, characterized in that it is an intelligent comparison method for relay protection device fixed values based on Chinese word segmentation. First, Chinese word segmentation is selected to process the text of relay protection device fixed value names; the naming rules of relay protection devices from different manufacturers and of different types are analyzed and sorted out, and a relay protection fixed value name dictionary is established. Then, on the basis of the dictionary, an improved forward maximum matching algorithm segments the fixed value names on the fixed value list and the fixed value names on the operating equipment to obtain word segmentation results and word segmentation arrays. Finally, the word segmentation arrays are compared to screen out the fixed value items of the operating device that match the fixed value items on the fixed value list, and the fixed values are compared; if they differ, a warning is issued and a fixed value downloading application is provided.
The method specifically comprises the following steps:
1. Mechanical word segmentation algorithm based on character string matching
Given the practical situation of fixed value name matching, most naming rules for protection fixed value items differ from everyday language, so the two kinds of word segmentation methods based on understanding and on semantics cannot be applied to the segmentation of fixed value name strings. In addition, statistics-based word segmentation carries considerable uncertainty, which reduces segmentation accuracy and leads to failed or incorrect name matching. A mechanical word segmentation algorithm based on character string matching is therefore used to standardize, unify and split the fixed value names so as to facilitate subsequent name matching.
2. Establishment of the relay protection fixed value name dictionary
A relay protection fixed value name is composed of various relay protection terms, so a complete fixed value name dictionary must be established from these terms before a mechanical algorithm based on character string matching can be applied. To this end, the naming rules of relay protection devices from different manufacturers and of different types are comprehensively sorted out and analyzed, the traditional mechanical word segmentation dictionary mechanism is improved, and a relay protection fixed value name dictionary suited to fixed value comparison is established, which effectively improves the efficiency of subsequent fixed value name matching.
3. Improved dictionary lookup mechanism
Traditional mechanical word segmentation segments the string to be segmented with a forward maximum matching algorithm according to the established dictionary mechanism, which cannot meet the requirements of fixed value comparison. The dictionary lookup mechanism is therefore improved as follows:
3.1 Improved forward maximum matching principle
Taking into account the naming rules of relay protection fixed value items, the improved forward maximum matching algorithm first takes the first character A1 of the fixed value item name S to be segmented and calculates its hash value H1. Using H1, it looks up the position of A1 in the first-character Hash index table and reads the maximum word length B1 stored in field B of that unit. Starting from the front of S, it cuts out a substring T1 of length B1 and searches for it among the entries of the fixed value item name dictionary. If the dictionary contains an entry identical to T1, the match succeeds: T1 is taken as an independent substring and its dictionary position number a1 is output. If the entry has no synonym mark, the dictionary position number of T1 itself is output; if it has a synonym mark, the dictionary position number of the standard synonym is output. If no entry is identical to T1, the match fails; the last character of T1 is removed and the shortened substring is searched in the dictionary again. This is repeated until the substring length is 1, at which point A1 is cut out as a single character and one round of segmentation is finished. Segmentation and matching then continue from the next character in the same way until S has been completely segmented. Finally, the normalized word sequence with segmentation symbols and the corresponding numbering sequence are output.
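The lookup loop described in 3.1 can be sketched as follows. This is a minimal illustration, not the patented implementation: a Python `dict` stands in for the first-character Hash index table, the names `Entry`, `build_max_len` and `segment` are hypothetical, the toy entries use ASCII strings in place of Chinese fixed value terms, and the position numbers are invented. Characters that are not in the dictionary are cut out as single characters and, as an assumption, contribute no number to the numbering sequence.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    position: int        # dictionary position number of this entry
    synonym_of: int = 0  # 0: standard expression; otherwise position of the standard synonym


def build_max_len(entries: dict[str, Entry]) -> dict[str, int]:
    """For each first character, record the maximum length of entries starting with it."""
    max_len: dict[str, int] = {}
    for word in entries:
        max_len[word[0]] = max(max_len.get(word[0], 0), len(word))
    return max_len


def segment(name: str, entries: dict[str, Entry]) -> tuple[list[str], list[int]]:
    """Forward maximum matching that outputs words and normalized position numbers."""
    max_len = build_max_len(entries)
    words: list[str] = []
    numbers: list[int] = []
    i = 0
    while i < len(name):
        # Start from the maximum word length recorded for this first character.
        length = min(max_len.get(name[i], 1), len(name) - i)
        # Drop the last character until the substring is an entry or a single character.
        while length > 1 and name[i:i + length] not in entries:
            length -= 1
        word = name[i:i + length]
        entry = entries.get(word)
        if entry is not None:
            # A non-zero synonym mark redirects to the standard synonym's number.
            numbers.append(entry.synonym_of or entry.position)
        words.append(word)
        i += length
    return words, numbers


# Hypothetical toy dictionary; "TV" is marked as a synonym of the standard entry "PT".
TOY = {
    "PT": Entry(1),
    "TV": Entry(2, synonym_of=1),
    "over": Entry(3),
    "overcurrent": Entry(4),
    "protection": Entry(5),
}
print(segment("TVovercurrentprotection", TOY))
```

Because the candidate substring always starts at the longest length recorded for its first character, "overcurrent" is preferred over the shorter entry "over", and "TV" is reported under the number of its standard synonym "PT".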
3.2 Output sequence adjustment
Based on the improved fixed value item name dictionary, the string is segmented according to the improved forward maximum matching algorithm. To work efficiently with the subsequent fixed value name matching mechanism, the output sequence is adjusted according to the part of speech of the entries in the fixed value name dictionary: if the fixed value name contains a sequence-number word, the word order is adjusted so that the sequence-number word is placed at the head of the segmented word sequence, and the numbering sequence is adjusted accordingly.
4. Fixed value name matching based on word segmentation
Thanks to the improved dictionary mechanism and the improved word segmentation algorithm, most synonym problems in fixed value names are solved: the output sequences of a fixed value item on the fixed value list and of the corresponding operating fixed value item have a similarity of 1, so after matching the two can be compared directly and the fixed value downloaded. A small number of synonym problems, however, cannot be matched completely because of the limitations of the dictionary mechanism; to solve them, a method combining a sequence similarity calculation with an anti-mismatch mechanism is provided.
4.1 Numbering sequence similarity calculation
Indexes for measuring the similarity of two numerical sequences fall mainly into two types, sequence position indexes and sequence numerical indexes. Because of the linguistic logic of relay protection fixed value item names and the processing of the output word sequence in step 3.2, the similarity of two numerical sequences is measured here with a sequence numerical index. The similarity between the string on the fixed value list and each target string on the operating equipment is calculated in turn to obtain the fixed value items that meet a given similarity threshold, and these are arranged in descending order of similarity. If the similarity of a sequence meets the threshold but does not reach 1, fixed value item names with similar numbering sequences can interfere with similarity matching and cause the matching of fixed value items to fail. Considering that two fixed value items are extremely unlikely to have exactly the same value and unit, an anti-mismatch mechanism is introduced to improve the accuracy of fixed value comparison.
4.2 Anti-mismatch mechanism
As noted in step 4.1, fixed value item names with similar numbering sequences interfere with similarity matching and can cause the matching of fixed value items to fail. Considering that two fixed value items are extremely unlikely to have exactly the same value and unit, an anti-mismatch mechanism is introduced to improve the accuracy of fixed value comparison: the fixed value item with the greatest similarity is taken for fixed value comparison, and the result is used to judge whether the segmentation-based fixed value name matching has succeeded.
4.3 Intelligent comparison process for relay protection fixed values
The intelligent comparison process compares the actual operating fixed values of all equipment in the station with the latest dispatching fixed value list. First, the fixed value items on the fixed value list and the fixed value items of the operating device are segmented, translated and reordered using the improved forward maximum matching algorithm, and the corresponding word sequence groups are output. The two word sequence groups are then traversed to check whether two word sequences are identical. If they are, the corresponding fixed values are compared. If they are not, the similarity of the two word sequences is calculated and, in combination with the anti-mismatch mechanism, it is judged whether they meet the conditions: if they do not, the two items are not matched; if they do, the corresponding fixed values are compared.
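The overall flow of 4.3 can be sketched in a few lines, under stated assumptions: each fixed value item is reduced to a hypothetical `(numbering sequence, value, unit)` tuple, the similarity function is passed in by the caller, the threshold of 0.9 is an arbitrary placeholder, and the anti-mismatch condition is simplified to "the units agree". The function name `compare_item` and the returned labels are illustrative only.

```python
from typing import Callable, Sequence

Seq = tuple[int, ...]
Item = tuple[Seq, float, str]  # (numbering sequence, value, unit)


def compare_item(
    sheet_item: Item,
    device_items: Sequence[Item],
    similarity: Callable[[Seq, Seq], float],
    threshold: float = 0.9,  # placeholder value, not taken from the source
) -> str:
    """Return 'consistent', 'alarm' or 'unmatched' for one item on the fixed value list."""
    seq, value, unit = sheet_item

    # Step 1: identical word sequences are compared directly.
    for d_seq, d_value, d_unit in device_items:
        if d_seq == seq:
            return "consistent" if (d_value, d_unit) == (value, unit) else "alarm"

    # Step 2: otherwise rank the device items by similarity, most similar first.
    ranked = sorted(device_items, key=lambda d: similarity(seq, d[0]), reverse=True)
    for d_seq, d_value, d_unit in ranked:
        if similarity(seq, d_seq) < threshold:
            break  # the remaining candidates are even less similar
        # Simplified anti-mismatch check: only unit-consistent candidates are accepted.
        if d_unit == unit:
            return "consistent" if d_value == value else "alarm"
    return "unmatched"
```

Passing the similarity function as a parameter keeps the sketch independent of any particular similarity formula.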
The core idea of applying the mechanical word segmentation algorithm based on character string matching to fixed value name matching in step 1 is as follows. The naming standards and habits for protection fixed value item names used by the manufacturers of in-service equipment and by the dispatching master station are comprehensively sorted out and analyzed, and a fixed value name dictionary is established accordingly. Without using grammatical knowledge or statistical information, the fixed value name string to be segmented is matched against the entries stored in the fixed value name dictionary one by one according to a given strategy. If an entry identical to the string is found, the match succeeds; otherwise the fixed value name string is cut again by adding or removing characters and searched in the dictionary again. The mechanical word segmentation algorithm based on character string matching is therefore the most suitable word segmentation algorithm for fixed value item name matching.
Step 2 specifically comprises:
2.1 Analysis of the composition of relay protection fixed value names
A relay protection fixed value name can be composed of four parts: the station name, the primary equipment name, the protection device model and the fixed value item name. When fixed value names are matched, matching is carried out in the order of station name, primary equipment name, protection device model and fixed value item name.
(1) Station name: generally in the form "voltage level + place name + substation/plant/power plant" or "place name + substation/plant/power plant". When the dictionary is constructed, the three words "substation", "plant" and "power plant" are stored in the fixed value name dictionary as fixed words, and the "voltage level" and "place name" are stored according to the actual place of application.
(2) Primary equipment name: generally in the form "place + line" or "# + number + main transformer/transformer" (the number is generally no greater than 3 and may be stored in the dictionary as a synonym of the numbers in the protection device model below). When the dictionary is constructed, "line", "transformer", "main transformer" and "#" are stored as fixed words, and the places are stored according to the actual place of application.
(3) Protection device model: generally in the form "English letters + numbers". When the dictionary is constructed, the 26 English letters in upper and lower case and the numbers 0-10 are stored in the dictionary as fixed words.
(4) Fixed value item name: generally composed of various relay protection terms, which must be split before being stored in the dictionary. When the dictionary is constructed, it must be borne in mind that the naming standards and habits for protection fixed value item names differ between the manufacturers of in-service equipment and the dispatching master station, which gives rise to three categories of fixed value item matching failure.
The category 3 problem is matching failure caused by different word order: owing to the diversity of Chinese expression, when a name has several modifiers their order is not fixed, for example "overcurrent protection II-segment current constant value" and "overcurrent protection II-segment current constant value".
2.2 Fixed value name dictionary construction
According to the characteristics of fixed value names described in step 2.1, the original mechanical word segmentation dictionary mechanism is improved. The improved fixed value item dictionary has a three-layer structure (as shown in Figure 1):
(1) First-character Hash index table: each unit in the first-character Hash index table mainly contains the following. A: the first character, which is stored by calculating the hash value of the first character of the fixed value item and using that value as its serial number, as shown in formula (1):
offset=(c1-0xB0)*94+(c2-0xA1) (1)
In formula (1), offset is the serial number of the first character in the Hash table, c1 and c2 are the high byte and low byte of the machine code of the first character, and 0xB0 and 0xA1 are the first high byte and the first low byte of the Chinese character coding. B: the maximum word length, i.e. the number of characters in the longest dictionary entry that begins with this Chinese character. C: the first pointer, which points to the position of the next-layer word index table.
(2) Word index table: each unit in the word index table mainly contains the following. A: all word lengths, i.e. the lengths of all dictionary entries that begin with this Chinese character. B: the dictionary text pointer, which points to the position in the dictionary text of the entries that begin with this Chinese character and have the given word length.
(3) Dictionary text: each unit in the dictionary text mainly contains the following. A: the entry, a professional term related to fixed value names, which may be a Chinese word, an English word or a sequence-number word. B: the dictionary position number of the entry, i.e. its sequence number in the dictionary. C: the synonym mark; "0" indicates that the entry is the standard expression among its synonyms or has no synonyms, and a non-zero integer indicates that the entry is not the standard expression, the integer being the dictionary position number of the corresponding standard expression. D: the part-of-speech mark; "1" indicates that the entry is a sequence-number word or a compounding word, such as "I", "II", "III" and "section" in fixed value names; "2" indicates that the entry is a single-type structural noun, such as "current", "voltage" and "protection" in fixed value names; "3" indicates that the entry is a special alias entry, such as "instantaneous current quick-break protection", "time-limited current quick-break protection" and "time-limited overcurrent protection". The synonym translation of such entries is handled later in the dictionary lookup process.
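Formula (1) and the three-layer structure can be illustrated with a short sketch. It assumes that the "machine code" in formula (1) is the two-byte GB2312 encoding, in which the first Chinese character is encoded as 0xB0 0xA1; the class and function names (`DictEntry`, `FirstCharUnit`, `first_char_offset`, `add_entry`) and the two sample entries are hypothetical.

```python
from dataclasses import dataclass, field


def first_char_offset(ch: str) -> int:
    """Formula (1): offset = (c1 - 0xB0) * 94 + (c2 - 0xA1), assuming GB2312 bytes."""
    raw = ch.encode("gb2312")
    if len(raw) != 2:
        # ASCII characters such as letters, digits or '#' have no two-byte code.
        raise ValueError(f"{ch!r} is not a two-byte GB2312 character")
    c1, c2 = raw
    return (c1 - 0xB0) * 94 + (c2 - 0xA1)


@dataclass
class DictEntry:
    """Layer 3: one unit of the dictionary text."""
    word: str
    position: int          # dictionary position number
    synonym_mark: int = 0  # 0, or the position number of the standard expression
    pos_mark: int = 2      # 1: sequence-number/compounding, 2: single-type noun, 3: special alias


@dataclass
class FirstCharUnit:
    """Layer 1 unit; `word_index` plays the role of the layer 2 word index table."""
    max_word_length: int = 0
    word_index: dict[int, list[DictEntry]] = field(default_factory=dict)


def add_entry(table: dict[int, FirstCharUnit], entry: DictEntry) -> None:
    """Insert an entry under the offset of its first character."""
    unit = table.setdefault(first_char_offset(entry.word[0]), FirstCharUnit())
    unit.max_word_length = max(unit.max_word_length, len(entry.word))
    unit.word_index.setdefault(len(entry.word), []).append(entry)


table: dict[int, FirstCharUnit] = {}
add_entry(table, DictEntry("电流", 1))      # hypothetical entry ("current")
add_entry(table, DictEntry("电流速断", 2))  # hypothetical entry ("current quick-break")
print(table[first_char_offset("电")].max_word_length)
```

Keying the word index by word length lets a lookup go straight from the first character to the candidate entries of a given length, which is the access pattern the forward maximum matching loop needs.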
Of the three categories of fixed value item matching failure in step 2.1-(4):
the category 1 problem is matching failure caused by synonyms, which can be divided into:
1) sequence-number synonyms: in "overcurrent protection section I" and "overcurrent protection section 1", the sequence numbers "I" and "1" are synonyms with different written forms;
2) Chinese synonyms with different written forms, such as the two Chinese spellings rendered as "overcurrent" and the two rendered as "startup";
3) English synonyms, such as "TV" and "PT";
the category 2 problem is matching failure caused by technical terminology, for example "overcurrent protection section I" versus "current quick-break protection", and "overcurrent protection section II" versus "time-limit overcurrent protection".
In step 3.2, the output sequence is ordered according to the part of speech of the entries in the fixed value name dictionary. The whole output word sequence is first traversed to check whether it contains a sequence-number word. If it does, the part-of-speech mark of the dictionary entry for the word that follows it is checked; if that mark is "1", the two words are combined into a compound word and placed at the head of the sequence. After this processing, the single-type words (that is, the entries whose part-of-speech mark is "2") are kept in their original order.
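The reordering rule can be sketched as below. The sketch assumes that each segmented word arrives as a hypothetical `(word, part-of-speech mark)` pair and that sequence-number words are recognized through an illustrative set `ORDINALS`; joining the compound with a hyphen is a presentational choice for the English example. When the sequence-number word is not followed by a mark-"1" word, the sketch moves the sequence-number word alone to the head, which is an assumption based on step 3.2.

```python
Token = tuple[str, int]  # (word, part-of-speech mark)

ORDINALS = {"I", "II", "III"}  # illustrative set of sequence-number words


def reorder(tokens: list[Token]) -> list[str]:
    """Move the sequence-number word (merged with a following mark-1 word) to the head."""
    for i, (word, _mark) in enumerate(tokens):
        if word in ORDINALS:
            head = word
            end = i + 1
            # Merge with the next word if it is also marked "1" (e.g. "section").
            if end < len(tokens) and tokens[end][1] == 1:
                head = f"{word}-{tokens[end][0]}"
                end += 1
            # The remaining words keep their original relative order.
            rest = [w for w, _ in tokens[:i] + tokens[end:]]
            return [head] + rest
    return [w for w, _ in tokens]


print(reorder([("overcurrent", 2), ("I", 1), ("segment", 1), ("protection", 2)]))
```

With this rule, names that differ only in where the sequence number appears produce the same word order after segmentation.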
The special alias entries, that is, entries whose part-of-speech mark is "3", such as "instantaneous current quick-break protection", "time-limited current quick-break protection" and "time-limited overcurrent protection", account for only a small share of the vocabulary and are therefore stored in a dedicated area of the hybrid dictionary. If such an entry occurs in the string, the standard synonym sequence is output according to its dictionary position number. For example, "instantaneous current quick-break protection" is translated, according to dictionary position number "215", into the word segmentation output sequence "198, 24, 47, 10" of "I-segment overcurrent protection"; "time-limited current quick-break protection" is translated, according to dictionary position number "216", into the output sequence "29, 24, 47, 10" of "II-segment overcurrent protection"; and "time-limited overcurrent protection" is translated, according to dictionary position number "217", into the output sequence "90, 24, 47, 10" of "III-segment overcurrent protection".
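The translation of special alias entries amounts to replacing one position number with a stored standard sequence. A minimal sketch using the three position numbers and output sequences quoted above follows; the names `SPECIAL_ALIASES` and `expand_special` are hypothetical, and storing the mapping as a Python `dict` is an assumption about representation only.

```python
# Position number of the special alias entry -> output sequence of its standard synonym.
SPECIAL_ALIASES: dict[int, list[int]] = {
    215: [198, 24, 47, 10],  # instantaneous current quick-break -> I-segment overcurrent protection
    216: [29, 24, 47, 10],   # time-limited current quick-break -> II-segment overcurrent protection
    217: [90, 24, 47, 10],   # time-limited overcurrent -> III-segment overcurrent protection
}


def expand_special(numbers: list[int]) -> list[int]:
    """Replace every special alias number by its standard synonym sequence."""
    out: list[int] = []
    for n in numbers:
        out.extend(SPECIAL_ALIASES.get(n, [n]))
    return out


print(expand_special([215]))
```

Numbers that are not special aliases pass through unchanged, so the function can be applied to every numbering sequence after segmentation.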
In step 4.1, the similarity calculation for numerical sequences distinguishes four cases: (1) comparison of a sequence containing a sequence-number word with a sequence containing none; (2) comparison of two sequences containing different sequence-number words; (3) comparison of two sequences that contain no sequence-number words; (4) comparison of two sequences that contain the same sequence-number word. In the first two cases the similarity of the two sequences is 0. In the latter two cases the dot product ratio of sequence A and sequence B is used as the sequence numerical index; the specific formulas are formula (2) and formula (3).
[Formula (2): equation image BDA0003215176100000091, not reproduced in the text]
[Formula (3): equation image BDA0003215176100000092, not reproduced in the text]
In formula (2), DPR(A, B) denotes the dot product ratio of sequence A and sequence B; n denotes the length of the sequence; one term (image BDA0003215176100000093) denotes the sum of the products of the elements at corresponding positions of the two sequences; another term (image BDA0003215176100000094) denotes the sum of the squares of the sequence elements; i denotes the index of a sequence element and takes the values 0, 1, …, n-1. The larger the dot product ratio, the more similar the two sequences. In formula (3), NDPR(A, B) denotes the normalized dot product ratio, whose value range is [0, 1]. For example, after word segmentation the names "overcurrent I-segment protection" and "current I-segment protection" are output as the sequences "198, 24, 47, 10" and "198, 24, 27, 10" respectively, and their sequence similarity is calculated as 0.9932;
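The four-case rule can be sketched as follows. Because formulas (2) and (3) are only available as images, this sketch substitutes plain cosine similarity as a stand-in for the normalized dot product ratio; it is not the patent's formula and does not reproduce the value 0.9932 quoted above. The set `ORDINAL_IDS` (taken from the leading numbers 198, 29 and 90 of the example sequences), the zero-padding of shorter sequences and the function names are all assumptions.

```python
import math

# Assumed position numbers of sequence-number compound words (leading numbers in the examples).
ORDINAL_IDS = {198, 29, 90}


def leading_ordinal(seq: list[int]):
    """Return the leading sequence-number id, or None if the sequence has none."""
    return seq[0] if seq and seq[0] in ORDINAL_IDS else None


def similarity(a: list[int], b: list[int]) -> float:
    """Four-case similarity; cosine similarity stands in for formulas (2) and (3)."""
    # Cases (1) and (2): one sequence lacks a sequence number, or the numbers differ.
    if leading_ordinal(a) != leading_ordinal(b):
        return 0.0
    # Cases (3) and (4): compare the numerical sequences (shorter one padded with zeros).
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(similarity([198, 24, 47, 10], [198, 24, 27, 10]))
print(similarity([198, 24, 47, 10], [29, 24, 47, 10]))
```

The early return makes sequences for different sections, such as section I and section II, score 0 even though most of their remaining numbers coincide.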
In step 4.1, for the fixed value sequences that meet the similarity threshold and are arranged in descending order of similarity, the fixed value item whose name has the greatest similarity is taken first for fixed value comparison. If its value and unit are consistent, the fixed value name is judged to be successfully matched, the fixed value of the operating equipment is consistent with the setting value, and no change is needed; otherwise the value of the next fixed value item is compared. If that comparison is consistent, the name of that fixed value item is judged to correspond to the fixed value name on the fixed value list, and the fixed value is correct and needs no change; otherwise the next fixed value item is compared in the same way. If the whole sequence has been compared and no fixed value is consistent, the fixed value item with the greatest similarity whose unit is consistent is judged to be the corresponding item on the fixed value list and its fixed value is judged to be wrong; alarm information is sent, the fixed value is downloaded after the difference has been confirmed, and once downloading is complete the operating fixed value is called up again for comparison.
in step 4.2, the fixed value item with the highest similarity is compared, thereby judging whether the word-segmentation-based fixed value name matching is successful: if the value and the unit of the fixed value item are consistent, the name of the fixed value item is judged to be successfully matched and consistent with the setting value, and no further operation is needed; if the units are consistent but the values are not, there are two possibilities: (1) the name matching succeeded, but the running fixed value is inconsistent with the setting value, so alarm information must be sent and the fixed value downloaded; (2) the name matching failed; if the units of the fixed value items are inconsistent, the name matching failed;
The method is simple, practical and easy to implement. Because protection fixed value names contain few word classes, the constructed dictionary is small yet complete, which effectively overcomes the biggest drawback of mechanical word segmentation algorithms; in addition, the longest word in a protection fixed value name is short, which speeds up the mechanical word segmentation algorithm to a certain extent and reduces the complexity of the matching operation. The invention has the following advantages:
1) On the basis of a traditional dictionary, the grammatical composition of fixed value names and their Chinese and English terms are taken into account to form a more comprehensive fixed value item name dictionary; meanwhile, data items such as synonym flags, serial number and compound word flags, special noun flags and dictionary position numbers are added to the dictionary text, so that synonymous expressions are handled better.
2) On the basis of the traditional matching algorithm, processing methods such as word order adjustment, synonym substitution, special class word translation, word segmentation sequence output and the like are provided. Through an improved dictionary mechanism and a matching algorithm, most of constant value item name matching problems caused by word orders, professional terms, false words and the like are better processed.
3) Aiming at the rare matching which can not be realized due to the dictionary mechanism problem, an intelligent comparison method combining sequence similarity calculation and an anti-error comparison mechanism is provided, so that the constant value comparison is more comprehensive and accurate.
Drawings
FIG. 1 is a fixed value item name dictionary.
FIG. 2 is a flow chart of the traditional forward maximum matching algorithm.
FIG. 3 is a flow chart of the improved forward maximum matching algorithm.
FIG. 4 is a flow chart of the intelligent comparison of relay protection fixed values.
Detailed Description
The invention provides an intelligent comparison method for relay protection fixed values, which compares the fixed values of relay protection devices on the basis of Chinese word segmentation. The method comprises: 1. a mechanical word segmentation algorithm based on character string matching; 2. establishment of a relay protection fixed value name dictionary; 3. an improved dictionary lookup mechanism; 4. fixed value name matching based on word segmentation. First, Chinese word segmentation is selected to process the text of relay protection device fixed value names; the naming rules of relay protection devices of different manufacturers and types are analyzed and sorted out, and a relay protection fixed value name dictionary is established. Then, on the basis of the dictionary, an improved forward maximum matching algorithm is used to segment the fixed value names on the fixed value list and on the operating equipment, yielding word segmentation results and word segmentation arrays. Finally, by comparing the word segmentation arrays, the fixed value items of the operating device that match the items on the fixed value list are screened out and their fixed values are compared; if they differ, a warning is issued and a fixed value download request is raised. The invention is further described below with reference to the figures and examples. The method specifically comprises the following steps:
1. Mechanical word segmentation algorithm based on character string matching
Because the diversity of the components of relay protection device fixed value names and the existence of synonyms are the main factors limiting the level of online intelligent fixed value comparison, the invention uses Chinese word segmentation to process the text of the fixed value names, standardizing and unifying them and segmenting them to facilitate subsequent name matching.
Considering the actual situation of fixed value name matching: because the naming rules of most protection fixed value items differ from everyday language, the two word segmentation approaches based on understanding and on semantics cannot be applied to fixed value name strings. In addition, statistics-based word segmentation carries considerable uncertainty, which affects segmentation accuracy and leads to failed or wrong name matches. These three approaches are therefore unsuitable for segmenting fixed value name strings. In contrast, the core idea of applying the mechanical word segmentation algorithm based on string matching to fixed value name matching is as follows: the naming standards and habits for protection fixed value item names at operating equipment manufacturers and at the dispatching master station are comprehensively sorted out and analyzed, and a fixed value name dictionary is built accordingly; without using grammatical knowledge or statistical information, the fixed value name string to be segmented is matched one by one, according to a given strategy, against the entries stored in the dictionary; if an entry identical to the string is found, the match succeeds; otherwise the string is re-truncated by adding or removing characters and looked up in the dictionary again. The method is simple, practical and easy to implement. Because protection fixed value names contain few word classes, the constructed dictionary is small yet complete, which effectively overcomes the biggest drawback of mechanical word segmentation algorithms.
In addition, the longest word in a protection fixed value name is short, which speeds up the mechanical word segmentation algorithm to a certain extent and reduces the complexity of the matching operation. The mechanical word segmentation algorithm based on character string matching is therefore the most suitable word segmentation algorithm for fixed value item name matching.
2. Establishment of the relay protection fixed value name dictionary
A relay protection fixed value name is composed of various relay protection terms, and a complete fixed value name dictionary must be built from these terms before the mechanical algorithm based on character string matching can be applied. This section comprehensively sorts out and analyzes the naming rules of relay protection devices of different manufacturers and types, improves the traditional mechanical word segmentation dictionary mechanism, and establishes a relay protection fixed value name dictionary suited to fixed value comparison, which effectively improves the efficiency of subsequent fixed value name matching.
2.1 Composition analysis of relay protection fixed value names
A relay protection fixed value name can be composed of four parts: a station name, a primary equipment name, a protection device model and a fixed value item name. When fixed value names are matched, the station name, the primary equipment name, the protection device model and the fixed value item name are matched in that order.
(1) Station name: generally in the form "voltage level + place name + substation/plant/power plant" or "place name + substation/plant/power plant". When the dictionary is constructed, the three words "substation", "plant" and "power plant" must be stored in the fixed value name dictionary as fixed words, and the "voltage level" and "place name" are stored in the dictionary according to the actual place of application.
(2) Primary equipment name: typically in the form "place + line" or "# + digits + main transformer/transformer" (where the digits are typically no greater than 3 and may be stored in the dictionary as synonyms of the digits of the protection device model below). When the dictionary is constructed, "line", "transformer", "main transformer" and "#" are stored in the dictionary as fixed words, and the places are stored in the dictionary according to the actual place of application.
(3) Protection device model: typically in the form "English letters + numbers". When the dictionary is constructed, the 26 English letters, in upper and lower case, and the numbers 0-10 are stored in the dictionary as fixed words.
(4) Fixed value item name: generally composed of various relay protection terms, which must be split and stored in the dictionary when it is constructed. When constructing the dictionary, attention must be paid to three types of fixed value item matching failure caused by differing naming standards and habits for protection fixed value item names between operating equipment manufacturers and the dispatching master station. Type 1 is matching failure caused by synonyms that are written differently, which can be divided into serial number synonyms (such as "overcurrent protection I segment" and "overcurrent protection 1 segment"), Chinese synonyms (such as two different Chinese spellings of "overcurrent" or of "starting") and English synonyms (such as "TV" and "PT"). Type 2 is matching failure caused by professional terms, such as "overcurrent protection I segment" and "current quick-break protection", or "overcurrent protection II segment" and "time-limited overcurrent protection". Type 3 is matching failure caused by different word orders: owing to the diversity of Chinese expression, when several modifiers are present their order is not always fixed (such as "overcurrent protection II-segment current fixed value" and "II-segment overcurrent protection current fixed value").
2.2 Fixed value name dictionary construction
According to the characteristics of the fixed value name items described above, the original mechanical word segmentation dictionary mechanism is improved. The improved constant value item dictionary shown in fig. 1 has a three-layer structure.
(1) First-character Hash index table: each unit in the first-character Hash index table mainly comprises the following contents. A: the first character; the hash value of the first character of the fixed value item is calculated (as shown in formula 1) and used as the serial number under which the character is stored.
offset=(c1-0xB0)*94+(c2-0xA1) (1)
In formula (1), offset is the serial number of the first character in the Hash table, c1 and c2 are the high and low bytes of the machine code of the first character, and 0xB0 and 0xA1 are the high byte and low byte of the first Chinese character in the character encoding. B: the maximum word length, i.e. the number of characters of the longest word in the dictionary text that begins with this Chinese character. C: the first pointer, which points to the position of the next-level word index table.
(2) Word index table: each unit in the word index table mainly comprises the following contents. A: all word lengths, i.e. all lengths of the words in the dictionary text that begin with this Chinese character. B: the dictionary text pointer, which points to the position in the dictionary text of the words that begin with this Chinese character and have the given word length.
(3) Dictionary text: each unit in the dictionary text mainly comprises the following contents. A: the entry, a professional term related to fixed value names, which may be a Chinese term (such as "current" or "voltage"), an English term (such as "PT") or a serial number term (such as "I" or "1"). B: the dictionary position number, i.e. the sequence number of the entry in the dictionary. C: the synonym flag; "0" indicates that the entry is the standard expression among its synonyms or has no synonym, and a non-zero integer indicates that the entry is not the standard expression, the integer being the dictionary position number of the corresponding standard expression. D: the part-of-speech flag; "1" indicates that the entry is a serial number word or a compound-forming word, such as "I", "II", "III" and "segment" in a fixed value name; "2" indicates that the entry is a single-structure noun, such as "current", "voltage" and "protection" in a fixed value name; "3" indicates that the entry is a special alias term, such as "instantaneous current quick-break protection", "time-limited current quick-break protection" and "time-limited overcurrent protection", whose translation into its standard synonym is performed during dictionary lookup.
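The first-character index of formula (1) and the four fields of a dictionary text unit can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the machine code is GB2312 (in which the first Chinese character is encoded as 0xB0A1 and each row holds 94 characters), and the names `first_char_offset` and `DictEntry` are hypothetical.

```python
from dataclasses import dataclass


def first_char_offset(ch: str) -> int:
    """Serial number of a Chinese character per formula (1).

    Assumption: the machine code is GB2312, so a Chinese character
    encodes to exactly two bytes (high byte c1, low byte c2).
    """
    c1, c2 = ch.encode("gb2312")
    return (c1 - 0xB0) * 94 + (c2 - 0xA1)


@dataclass
class DictEntry:
    term: str          # A: the entry itself
    position: int      # B: dictionary position number
    synonym_flag: int  # C: 0 = standard form, else position of the standard form
    pos_flag: int      # D: 1 = serial/compound word, 2 = single-structure noun, 3 = special alias

    def standard_position(self) -> int:
        """Position number to output: the standard synonym's if one is flagged."""
        return self.synonym_flag or self.position
```

A non-Chinese character (one that does not encode to two bytes) makes the unpacking fail, so a real lookup would need a separate path for letters and digits.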
3. Improved dictionary lookup mechanism
The traditional mechanical word segmentation method segments the character string using the forward maximum matching algorithm on the established dictionary mechanism; as shown in FIG. 2, the forward maximum matching algorithm scans forward and matches the characters to be segmented one by one, starting from the longest candidate word and dropping one character at a time. After segmentation by the traditional matching algorithm, the original character string merely becomes a word sequence with segmentation symbols, which cannot meet the needs of fixed value comparison.
3.1 Improved forward maximum matching principle
Taking into account the naming rules of relay protection fixed value items, the invention adopts an improved forward maximum matching algorithm, as shown in FIG. 3. For the fixed value item name S to be segmented, the first character A1 is taken and its hash value H1 is calculated. According to H1, the position of A1 in the first-character Hash index table is found, and the maximum word length B1 stored in item B of that unit is read. A substring T1 of length B1 is cut from S in the forward direction and looked up in the fixed value item name dictionary. If an entry identical to T1 exists in the dictionary, the match succeeds: T1 is cut out as an independent substring and its dictionary position number a1 is output (if T1 has no synonym flag, its own dictionary position number is output; if it has a synonym flag, the dictionary position number of the standard synonym is output). If no entry is identical to T1, the match fails: the last character of T1 is dropped and the dictionary is searched again, and this is repeated until the substring to be cut has length 1, in which case A1 is cut out as a single character. This completes one round of segmentation. The next character is then segmented and matched by the same process until S is completely segmented. Finally, the standardized word sequence with segmentation symbols and the corresponding number sequence are output.
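The matching loop just described can be sketched in a few lines. The sketch below makes several simplifying assumptions: a flat Python dict stands in for the three-layer Hash-indexed dictionary, one global maximum word length replaces the per-first-character length B1, the Chinese terms are illustrative back-translations, the position number 48 is invented, and the position numbers 198, 24, 47, 27 and 10 are inferred from the example sequences given elsewhere in this description.

```python
# Toy dictionary: term -> (position number, synonym flag).
# A synonym flag of 0 means the term is itself the standard form;
# otherwise the flag is the position number of the standard form.
TOY_DICT = {
    "过流": (47, 0),     # "overcurrent", standard form
    "过电流": (48, 47),  # synonym of the above (position 48 is made up)
    "电流": (27, 0),     # "current"
    "保护": (10, 0),     # "protection"
    "段": (24, 0),       # "segment"
    "I": (198, 0),       # serial number word
}
MAX_WORD_LEN = max(len(term) for term in TOY_DICT)


def segment(name):
    """Forward maximum matching; returns (words, normalized position numbers)."""
    words, numbers = [], []
    i = 0
    while i < len(name):
        length = min(MAX_WORD_LEN, len(name) - i)
        # Try the longest candidate first; drop the last character on failure.
        while length > 1 and name[i:i + length] not in TOY_DICT:
            length -= 1
        sub = name[i:i + length]
        # An unknown single character is cut out on its own and numbered 0.
        position, synonym = TOY_DICT.get(sub, (0, 0))
        words.append(sub)
        numbers.append(synonym or position)
        i += length
    return words, numbers
```

With this toy dictionary, both spellings of "overcurrent I-segment protection" produce the same number sequence, which is the point of emitting the standard synonym's position number instead of the matched entry's own.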
3.2 Improved forward maximum matching algorithm
Based on the improved fixed value item name dictionary, the character string is segmented according to the improved forward maximum matching algorithm. To work with the subsequent fixed value name matching mechanism and make it more efficient, the output sequence is adjusted according to the part-of-speech flags in the fixed value item name dictionary, as shown in the flow chart of the improved forward maximum matching algorithm (FIG. 3): if the fixed value name contains a serial number word, the word order is adjusted so that the serial number word is placed at the head of the segmented word sequence, and the number sequence is adjusted accordingly. First, the whole output word sequence is traversed to check whether a serial number word exists; if so, it is checked whether the part-of-speech flag in the dictionary text of the word that follows it is "1"; if so, the two words are combined into a compound word and placed at the head. After this processing, the single-type words (i.e. the entries whose part-of-speech flag is "2") keep their original order.
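The word order adjustment can be illustrated with a short sketch. It assumes each segmented token is a hypothetical `(word, position_number, pos_flag)` tuple, treats the first token with part-of-speech flag 1 as the serial number word, and moves it alone when the following token does not carry flag 1, a case the description does not spell out.

```python
def reorder(tokens):
    """Move the serial number word (and a following flag-1 word) to the head.

    tokens: list of (word, position_number, pos_flag) in segmentation order.
    """
    for i, (_, _, flag) in enumerate(tokens):
        if flag == 1:  # first serial number word found
            end = i + 1
            # Merge with the next word if it also carries flag 1 (e.g. "segment").
            if end < len(tokens) and tokens[end][2] == 1:
                end += 1
            # Head group first, then all other words in their original order.
            return tokens[i:end] + tokens[:i] + tokens[end:]
    return list(tokens)  # no serial number word: order unchanged
```

For the tokens of "overcurrent I segment protection" this yields the number sequence 198, 24, 47, 10, the same as for "I segment overcurrent protection", so word order no longer affects the comparison.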
Special alias entries (i.e. entries whose part-of-speech flag is "3"), such as "instantaneous current quick-break protection", "time-limited current quick-break protection" and "time-limited overcurrent protection", account for only a small share of entries and are therefore stored in a specific area of the hybrid dictionary. If a special alias entry occurs in the character string, the number sequence of its standard synonym is output according to the entry's dictionary position number. For example, "instantaneous current quick-break protection" is translated, according to its dictionary position number "215", into the word segmentation output sequence "198, 24, 47, 10" of "I-segment overcurrent protection"; "time-limited current quick-break protection" is translated, according to its dictionary position number "216", into the word segmentation output sequence "29, 24, 47, 10" of "II-segment overcurrent protection"; and "time-limited overcurrent protection" is translated, according to its dictionary position number "217", into the word segmentation output sequence "90, 24, 47, 10" of "III-segment overcurrent protection".
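The translation of special alias entries amounts to a table lookup keyed by dictionary position number. The sketch below uses the three position numbers and output sequences quoted above; the names `SPECIAL_TERMS` and `expand_special` are hypothetical, and storing the table as a Python dict is an assumption about representation only.

```python
# Dictionary position number of a special alias -> number sequence of its
# standard synonym, as quoted in the description.
SPECIAL_TERMS = {
    215: [198, 24, 47, 10],  # instantaneous current quick-break -> I-segment overcurrent protection
    216: [29, 24, 47, 10],   # time-limited current quick-break -> II-segment overcurrent protection
    217: [90, 24, 47, 10],   # time-limited overcurrent -> III-segment overcurrent protection
}


def expand_special(numbers):
    """Replace each special alias number with its standard synonym sequence."""
    expanded = []
    for number in numbers:
        expanded.extend(SPECIAL_TERMS.get(number, [number]))
    return expanded
```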
4. Fixed value name matching based on word segmentation
Thanks to the improved dictionary mechanism and the improved word segmentation algorithm, most synonym problems in fixed value names (section 2.1) are solved: the output sequences of the fixed value item on the fixed value list and of the running fixed value item have a similarity of 1, so that after matching the fixed values can be compared and downloaded directly. A few synonym problems cannot be matched exactly because of limitations of the dictionary mechanism; for example, "overcurrent I-segment protection" and "current I-segment protection" yield the output sequences "198, 24, 47, 10" and "198, 24, 27, 10" respectively after word segmentation. To solve this problem, the invention combines a sequence similarity calculation with an anti-error comparison mechanism.
4.1 numbering sequence similarity calculation
The similarity of two number sequences can be measured mainly by two kinds of index: sequence position indexes and sequence value indexes. Given the language logic of relay protection fixed value item names and the processing of the output word sequence in step 3.2, the invention measures the similarity of two number sequences by comparing their value indexes.
The comparison of two number sequences falls into four cases: (1) a sequence containing a serial number word compared with one that contains none; (2) two sequences that both contain serial number words, but different ones; (3) two sequences neither of which contains a serial number word; (4) two sequences that contain the same serial number word. In the first two cases, the similarity of the two sequences is 0. For the latter two cases, the invention takes the dot product ratio of sequences A and B as the sequence value index; the specific formulas are formula (2) and formula (3).
[Formula (2), defining DPR(A, B): equation image not reproduced]
[Formula (3), defining NDPR(A, B): equation image not reproduced]
In formulas (2) and (3), DPR(A, B) represents the dot product ratio of sequence A and sequence B; n represents the length of the sequences; $\sum_{i=0}^{n-1} a_i b_i$ represents the sum of the products of corresponding positions of the two sequences, and $\sum_{i=0}^{n-1} a_i^2$ represents the sum of the squares of the elements of a sequence, where $a_i$ and $b_i$ denote the elements of A and B and i, the serial number of the sequence element, takes the values 0, 1, …, n-1. A larger dot product ratio indicates that the two sequences are more similar. NDPR(A, B) represents the normalized dot product ratio, with value range [0, 1]. For example, "overcurrent I-segment protection" and "current I-segment protection" yield the output sequences "198, 24, 47, 10" and "198, 24, 27, 10" respectively after word segmentation, and the sequence similarity is calculated as 0.9932.
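The four-case rule can be sketched as below. Because formulas (2) and (3) are not reproduced in this text, the sketch substitutes plain cosine similarity as a stand-in for NDPR(A, B); this is an assumption, not the patented formula, and for the example pair it gives about 0.995 rather than the 0.9932 stated above. The set `SERIAL_NUMBERS` holds the position numbers of "I", "II" and "III" as inferred from the example sequences, and all function names are hypothetical.

```python
import math

# Position numbers of the serial number words I, II, III (inferred from
# the example sequences "198, ...", "29, ..." and "90, ...").
SERIAL_NUMBERS = {198, 29, 90}


def cosine_similarity(a, b):
    """Stand-in for NDPR(A, B); NOT the formula of the patent."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sequence_similarity(a, b):
    """Apply the four-case rule, then the value index for cases (3) and (4)."""
    serial_a = [x for x in a if x in SERIAL_NUMBERS]
    serial_b = [x for x in b if x in SERIAL_NUMBERS]
    if serial_a != serial_b:
        return 0.0  # cases (1) and (2): serial number words missing or different
    return cosine_similarity(a, b)  # cases (3) and (4)
```

Checking the serial number words first means that "I-segment" and "II-segment" items are never confused, however close the rest of their number sequences may be.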
4.2 Mechanism for preventing mismatching
The similarity between the character string on the fixed value list and each target string on the operating equipment is calculated in turn to obtain the fixed value items that meet a given similarity threshold, and these are arranged in descending order of similarity. The fixed value item with the highest similarity is compared first: if its value and unit are consistent, the fixed value item name is judged to be successfully matched and consistent with the setting value, and no further operation is needed. If the units are consistent but the values are not, there are two possibilities: (1) the name matching succeeded, but the running fixed value is inconsistent with the setting value, so alarm information must be sent and the fixed value downloaded; (2) the name matching failed. If the units are inconsistent, the name matching failed. Every such matching failure arises because the sequence similarity meets the threshold without reaching 1, i.e. fixed value item names with similar number sequences interfere with similarity matching. Since the probability that two different fixed value items have exactly the same value and unit is extremely low, the invention introduces an anti-error comparison mechanism to improve the accuracy of fixed value comparison.
For the fixed value items that meet the similarity threshold, arranged in descending order of similarity, the fixed value item whose name has the highest similarity is first taken for fixed value comparison. If its value and unit are consistent, the fixed value name is judged to be successfully matched, the fixed value of the operating equipment is consistent with the setting value, and no change is needed; otherwise, the value consistency comparison is performed on the next fixed value item. If that comparison is consistent, the name of that fixed value item is judged to be the name corresponding to the item on the fixed value list, the fixed value is correct, and no change is needed; otherwise, the comparison continues with the next fixed value item. If the whole sequence has been compared and no fixed value is consistent, the fixed value item with the highest similarity and a consistent unit is judged to be the item corresponding to the one on the fixed value list and its fixed value is judged to be wrong; alarm information is sent, the fixed value is downloaded after the difference is confirmed, and the running fixed value is called again for comparison after the download is finished.
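The anti-error comparison procedure can be sketched as follows. The function name `resolve_match`, the tuple layout and the returned pair are hypothetical; the candidates are assumed to have already passed the similarity threshold, and values are compared with plain equality, which a real system would likely replace with a tolerance.

```python
def resolve_match(list_item, candidates):
    """Pick the device item that corresponds to a fixed value list item.

    list_item:  (value, unit) taken from the fixed value list.
    candidates: [(name, similarity, value, unit), ...] above the threshold.
    Returns (matched_name, needs_download).
    """
    value, unit = list_item
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Walk the candidates from most to least similar, looking for one whose
    # value and unit both agree with the fixed value list.
    for name, _, cand_value, cand_unit in ranked:
        if cand_value == value and cand_unit == unit:
            return name, False  # matched and consistent: nothing to change
    # No consistent candidate: take the most similar one with the same unit,
    # treat its fixed value as wrong, and request an alarm plus download.
    for name, _, _, cand_unit in ranked:
        if cand_unit == unit:
            return name, True
    return None, False  # units never agree: name matching fails
```

In the first branch a less similar candidate can win over a more similar one when only its value agrees, which is how the value check corrects a misleading similarity ranking.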
4.3 Relay protection constant value intelligent comparison process
The intelligent comparison process for relay protection fixed values is shown in FIG. 4; the actual running fixed values of the equipment of the whole station are compared with the latest dispatching fixed value list. First, the improved forward maximum matching algorithm is used to translate, reorder and segment the fixed value items on the fixed value list and those of the operating device, and the corresponding word sequence groups are output. The two word sequence groups are then traversed to check whether they are identical. If they are, the corresponding fixed values are compared; if not, the similarity of the two word sequences is calculated, and the anti-error comparison mechanism is used to judge whether they meet the conditions. If they do not, the two items do not match; if they do, the corresponding fixed values are compared.
Examples
To verify the practicability of the method, the naming standards and habits for protection fixed value item names at operating equipment manufacturers and at the dispatching master station were comprehensively sorted out and analyzed; the names used by the devices of several relay protection manufacturers differ from those used by the dispatching center. Several typical and common differences were chosen for example verification.
5.1 Improved forward maximum matching algorithm
Example analysis is carried out on "fan start overcurrent", "compound voltage blocking I segment" and "instantaneous current quick-break protection". The conventional word segmentation process is shown in Table 1, and the improved word segmentation process is shown in Table 2.
TABLE 1 Conventional forward maximum matching algorithm
[table image not reproduced]
TABLE 2 Improved forward maximum matching algorithm
[table image not reproduced]
The traditional word segmentation method only outputs a string of word sequences with segmentation symbols, and does not perform other processing on entries and word sequences. Compared with the first group of character string segmentation process, the improved segmentation algorithm is added with the synonym mark, and synonym query and replacement are carried out on the characters matched with the sub strings. For example, in the first group of word segmentation process of 'fan start over current', synonym marks exist in the query character strings of 'start' and 'over current', synonym marks are obtained from synonym terms in the constant value name dictionary text and are replaced, and the word segmentation result is 'fan/start/over current'. The synonym replacement process solves the problem that the fixed value name matching fails due to the synonym and heteromorphism (described in section 2.1), and facilitates subsequent matching work.
Compared with the second group of character string word segmentation process, the improved word segmentation algorithm adjusts the word sequence of the word segmentation result. For example, in the word segmentation process of the second group of the "re-pressing locked I section", after replacing the serial number type entry "I" with the standard word "1", the serial number type entry and the standard word "section" are combined into a compound word which is placed at the head for output. The adjustment of the word segmentation sequence solves the problem of failure in matching the fixed value name due to different semantemes of the word sequence, which is described in section 2.1, and facilitates the similarity calculation of the subsequent data set.
Compared with the third group of character string word segmentation process, the improved word segmentation algorithm adds the query and conversion of the special word mark. For example, the word segmentation process of the instantaneous current snap protection is that the phrase has a special word mark and exists in a special word area of the dictionary. The output word segmentation result is '1 segment/overcurrent/protection' according to the position of the word segmentation result in the dictionary. The introduction of the special word mark solves the problem of synonymy of the special words caused by the professional terms (described in section 2.1), and facilitates subsequent matching work.
5.2 Example of special string comparison
Take the fixed value list string s = 'overcurrent I section protection' and the operating equipment fixed value item string space T = {t1, t2, t3, t4}, where t1 = 'section II overcurrent protection', t2 = 'current I segment protection', t3 = 'zero sequence I time' and t4 = 'zero sequence I section current'. The word segmentation results are shown in Table 3. The similarity between the string s and each string in the target space T is calculated, and the results are shown in Table 4.
The threshold, defined by the similarity of the most distant synonym sequences in the dictionary, is 0.95. As Table 4 shows, the strings that satisfy the threshold are "current I segment protection" and "zero sequence I segment current". That is, by formulas (2) and (3), the similarities of s with t2 and with t4 both meet the threshold, so the values of the fixed value items are then compared. As shown in Table 5, the setting value of s is consistent with that of t2 and differs from that of t4, and s has the highest similarity with t2. The target string corresponding to s is therefore determined to be t2; the setting value is qualified and no further operation is needed. The reverse correction by the value of the fixed value item (that is, the anti-error comparison mechanism) can therefore improve the accuracy of the matching result to a certain extent.
Table 3. Word segmentation results
[Table image BDA0003215176100000231]
Table 4. Similarity calculation results
[Table image BDA0003215176100000232]
Table 5. Fixed value names and setting values of the source string and target strings
[Table image BDA0003215176100000233]

Claims (7)

1. An intelligent comparison method for relay protection fixed values, characterized in that it is an intelligent comparison method for relay protection device fixed values based on Chinese word segmentation. The diversity of the components of relay protection device fixed value names and the existence of synonyms are the main factors limiting the level of online intelligent fixed value comparison. First, Chinese word segmentation is selected to process the text of relay protection device fixed value names, the naming rules of relay protection devices of different manufacturers and types are sorted out and analyzed, and a relay protection fixed value name dictionary is established; then, on the basis of the dictionary, an improved forward maximum matching algorithm is used to segment the fixed value names on the fixed value list and on the operating equipment, yielding word segmentation results and word segmentation arrays; finally, by comparing the word segmentation arrays, the operating device fixed value items that match the fixed value list items are screened out and their fixed values are compared, and if they differ, a warning is issued and a fixed value download request is raised; the method specifically comprises the following steps:
1. Mechanical word segmentation algorithm based on character string matching
In light of the practical situation of fixed value name matching, the naming specifications and habits that operating equipment manufacturers and the dispatching master station apply to protection fixed value item names are comprehensively sorted out and analyzed, and a fixed value name dictionary is established accordingly; without using grammatical knowledge or statistical information, the fixed value name string to be segmented is matched one by one against the entries stored in the fixed value name dictionary according to a given strategy; the fixed value names are thereby standardized, unified and segmented to facilitate subsequent name matching;
2. Establishment of the relay protection fixed value name dictionary
Relay protection fixed value names are composed of various relay protection terms, and a complete fixed value name dictionary must be built from these terms before the mechanical algorithm based on character string matching is applied; therefore, the naming rules of relay protection devices of different manufacturers and types are comprehensively sorted out and analyzed, the traditional mechanical word segmentation dictionary mechanism is improved, and a relay protection fixed value name dictionary suited to fixed value comparison is established, which effectively improves the efficiency of subsequent fixed value name matching;
3. Improved dictionary lookup mechanism
Traditional mechanical word segmentation, which segments the string with a forward maximum matching algorithm according to the established dictionary mechanism, cannot meet the requirements of fixed value comparison work; the dictionary lookup mechanism is therefore improved as follows:
3.1 Improved forward maximum matching principle
Taking into account the characteristics of the naming rules of relay protection fixed value items, the improved forward maximum matching algorithm first takes the first character A1 of the fixed value item name S to be segmented and calculates its hash value H1; using H1, it looks up the position of A1 in the first-character Hash index table and reads the maximum word length B1 from field B of that table unit; it then cuts a substring T1 of length B1 from S in the forward direction and searches the fixed value item name dictionary for it; if the dictionary contains an entry identical to T1, the match succeeds, T1 is taken as an independent substring, and its dictionary position number a1 is output: if T1 has no synonym mark, the dictionary position number of T1 itself is output; if T1 has a synonym mark, the dictionary position number of the standard synonym is output; if no entry is identical to T1, the match fails, the last character of T1 is removed (the character reduction method) and the shortened substring is searched in the dictionary again; this is repeated until the length of the substring is 1, in which case A1 is cut out as a single character, completing one round of segmentation; the following characters are then segmented and matched in the same way until S has been completely segmented; finally, the normalized word sequence with segmentation symbols and the corresponding numbering sequence are output;
3.2 Improved forward maximum matching principle
Based on the improved fixed value item name dictionary, the string is segmented with the improved forward maximum matching algorithm; to make the subsequent fixed value name matching mechanism more efficient, the output order is adjusted according to the part of speech of the entries in the fixed value name dictionary: if the fixed value name contains a serial number word, the word order is adjusted so that the serial number word is placed at the head of the segmented word sequence, and the numbering sequence is adjusted correspondingly;
4. Fixed value name matching based on word segmentation
Because the improved dictionary mechanism and the improved word segmentation algorithm resolve most fixed value name synonym problems, in most cases the similarity between the output sequences of a fixed value list item and the corresponding operating fixed value item is 1, and after matching the fixed values can be compared and downloaded directly; a small number of fixed value name synonym problems cannot be matched completely because of the limitations of the dictionary mechanism, and to address these a method combining sequence similarity calculation with an anti-error comparison mechanism is provided;
4.1 Numbering sequence similarity calculation
Indexes measuring the similarity of two numeric sequences fall mainly into two types, sequence position indexes and sequence numerical indexes; given the linguistic logic of relay protection fixed value item names and the processing of the output word sequence in step 3.2, the similarity of two numeric sequences is measured by comparing sequence numerical indexes; the similarity between the string on the fixed value list and each target string on the operating equipment is calculated in turn to obtain the fixed value item sequence that satisfies a given similarity threshold, arranged in descending order of similarity; if the similarity of a sequence meets the threshold but does not reach 1, matching of the fixed value items may fail, because fixed value item names with similar numbering sequences interfere with similarity matching; considering that the probability of two different fixed value items having exactly the same value and unit is extremely low, an anti-error comparison mechanism is introduced to improve the accuracy of fixed value comparison;
4.2 Anti-error comparison mechanism
As described in step 4.1, fixed value item names with similar numbering sequences interfere with similarity matching, so that matching of the fixed value items fails; considering that the probability of two different fixed value items having exactly the same value and unit is extremely low, an anti-error comparison mechanism is introduced to improve the accuracy of fixed value comparison; the fixed value item with the greatest similarity is taken for fixed value comparison, thereby determining whether the segmentation-based fixed value name matching succeeded;
4.3 Relay protection fixed value intelligent comparison process
The relay protection fixed value intelligent comparison process compares the actual operating fixed values of all equipment in the station with the latest dispatching fixed value list; first, the improved forward maximum matching algorithm is used to translate, reorder and segment the fixed value items of the fixed value list and of the operating device, and the corresponding word sequence groups are output; then, the two word sequence groups are traversed to check whether a pair of word sequences is identical; if they are identical, the corresponding fixed values are compared; if not, the similarity of the two word sequences is calculated and the anti-error comparison mechanism is applied to judge whether they satisfy the conditions: if they do not, the word sequences are not matched; if they do, the corresponding fixed values are compared.
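The segmentation of step 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Entry` class, the toy `ENTRIES` table, its position numbers and the choice of which spelling is the standard form are all hypothetical, a Python `dict` stands in for the first-character Hash index table, and characters missing from the dictionary are given the number 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    position: int        # dictionary position number
    synonym_of: int = 0  # 0: standard form; otherwise position of the standard form


# Hypothetical toy dictionary (illustrative entries and numbers only).
ENTRIES = [
    Entry("过流", 1),                  # "overcurrent", assumed standard form
    Entry("过电流", 2, synonym_of=1),  # variant spelling of "overcurrent"
    Entry("保护", 3),                  # "protection"
    Entry("启动", 6),                  # "start", assumed standard form
    Entry("起动", 7, synonym_of=6),    # variant spelling of "start"
    Entry("风机", 8),                  # "fan"
]


def build_index(entries):
    """Build word and position lookups plus the maximum word length per first character."""
    by_word = {e.word: e for e in entries}
    by_position = {e.position: e for e in entries}
    max_len = {}
    for e in entries:
        first = e.word[0]
        max_len[first] = max(max_len.get(first, 1), len(e.word))
    return by_word, by_position, max_len


def segment(name, entries=ENTRIES):
    """Forward maximum matching with synonym replacement.

    Returns (word sequence, numbering sequence).
    """
    by_word, by_position, max_len = build_index(entries)
    words, numbers = [], []
    i = 0
    while i < len(name):
        # Start from the longest word that begins with this character.
        length = min(max_len.get(name[i], 1), len(name) - i)
        # Character reduction: drop the last character until a match or length 1.
        while length > 1 and name[i:i + length] not in by_word:
            length -= 1
        piece = name[i:i + length]
        entry = by_word.get(piece)
        if entry is None:
            # Single character that is not in the dictionary (assumed number 0).
            words.append(piece)
            numbers.append(0)
        else:
            if entry.synonym_of:
                entry = by_position[entry.synonym_of]  # replace with standard form
            words.append(entry.word)
            numbers.append(entry.position)
        i += length
    return words, numbers


print(segment("风机起动过电流"))
```

With the toy dictionary above, both variant spellings are replaced by their assumed standard forms, so two differently written names yield the same numbering sequence.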
2. The intelligent relay protection fixed value comparison method according to claim 1, wherein the step 2 specifically comprises:
2.1 Relay protection fixed value name composition analysis
A relay protection fixed value name can be composed of four parts: the station name, the primary equipment name, the protection device model and the fixed value item name. When fixed value names are matched, matching proceeds in that order: station name, primary equipment name, protection device model, fixed value item name;
(1) station name: generally takes the form "voltage class + place name + substation/plant/power plant" or "place name + substation/plant/power plant". When the dictionary is constructed, the three words "substation", "plant" and "power plant" must be stored in the fixed value name dictionary as fixed words, while the "voltage class" and "place name" are stored according to the actual site of application;
(2) primary equipment name: typically takes the form "place + line" or "# + digit + main transformer/transformer" (the digit is typically no greater than 3 and can be stored in the dictionary as a synonym of the digits of the protection device model below). When the dictionary is constructed, "line", "transformer", "main transformer" and "#" are stored as fixed words, and places are stored according to the actual site of application;
(3) protection device model: takes the form "English letters + digits"; when the dictionary is constructed, the 26 English letters in upper and lower case and the numbers 0-10 are stored in the dictionary as fixed vocabulary;
(4) fixed value item name: generally composed of various relay protection terms; when the dictionary is constructed, these terms must be split and then stored in the dictionary; note that operating equipment manufacturers and the dispatching master station differ in their naming specifications and habits for protection fixed value item names, which leads to three types of fixed value item matching failure:
the 3rd type of problem is matching failure caused by different word order: owing to the diversity of Chinese expression, when there are several modifiers their order is not fixed, for example "overcurrent protection II-segment current constant value" and "overcurrent protection II-segment current constant value";
2.2 Fixed value name dictionary construction
Based on the characteristics of fixed value names described in step 2.1, the original mechanical word segmentation dictionary mechanism is improved; the improved fixed value item dictionary has a three-layer structure:
(1) first-character Hash index table: each unit in the first-character Hash index table mainly comprises the following contents. A: the first character, stored by calculating the hash value of the first character of the fixed value item and using that value as its serial number, as shown in formula (1),
offset = (c1 - 0xB0) * 94 + (c2 - 0xA1)    (1)
in formula (1), offset is the serial number of the first character in the Hash table, c1 and c2 are the high byte and low byte of the machine code of the first character, and 0xB0 and 0xA1 are the first high byte and the first low byte of the Chinese character coding; B: the maximum word length, i.e. the number of characters of the longest word in the dictionary text that begins with this Chinese character; C: the first pointer, which points to the position of the next-layer word index table;
(2) word index table: each unit in the word index table mainly comprises the following contents. A: all word lengths, i.e. all lengths of the words in the dictionary text that begin with this Chinese character; B: the dictionary text pointer, which points to the position in the dictionary text of the words that begin with this Chinese character and have that word length;
(3) dictionary text: each unit in the dictionary text mainly comprises the following contents. A: the entry, a specialist term related to fixed value names, which may be a Chinese word, an English word or a serial number word; B: the dictionary position number of the entry, i.e. the sequence number of the entry in the dictionary; C: the synonym mark: "0" indicates that the entry is the standard expression among its synonyms or has no synonym, and a non-zero integer indicates that the entry is not the standard expression, the integer being the position number of the corresponding standard expression in the dictionary; D: the part-of-speech mark: "1" indicates that the entry is a serial number word or a compound word, such as "I", "II", "III" and "section" in fixed value names; "2" indicates that the entry is a single-meaning structural noun, such as "current", "voltage" and "protection" in fixed value names; "3" indicates that the entry is a special alternative-name entry, such as "instantaneous current quick-break protection", "time-limited current quick-break protection" and "time-limited overcurrent protection"; the synonym translation of such entries is handled subsequently in the dictionary lookup process.
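Formula (1) can be illustrated with a short sketch. It assumes that the "machine code" is the two-byte GB2312 encoding of the character, which is consistent with the constants 0xB0, 0xA1 and 94 but is not stated explicitly in the text; the function name `first_char_offset` is hypothetical.

```python
def first_char_offset(ch: str) -> int:
    """Serial number of a Chinese character in the first-character Hash index table.

    Implements offset = (c1 - 0xB0) * 94 + (c2 - 0xA1), where c1 and c2 are
    assumed to be the high and low bytes of the GB2312 encoding.
    """
    raw = ch.encode("gb2312")
    if len(raw) != 2:
        raise ValueError("expected a single two-byte GB2312 character")
    c1, c2 = raw
    return (c1 - 0xB0) * 94 + (c2 - 0xA1)


# The first Chinese character of the GB2312 table is encoded as 0xB0 0xA1.
print(first_char_offset("啊"))  # 0
print(first_char_offset("阿"))  # 1
```

Letters, digits and symbols such as "#" are not two-byte Chinese characters in this range, so an actual dictionary would need a separate rule for them; the text does not say how they are indexed.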
3. The intelligent relay protection fixed value comparison method according to claim 1, wherein, among the three types of fixed value item matching failure in step 2.1-(4):
the 1st type of problem is matching failure caused by the existence of synonyms, which can be classified as:
1) serial number synonyms: in "overcurrent protection section I" and "overcurrent protection section 1", the serial numbers "I" and "1" are synonyms written in different forms;
2) Chinese synonyms written in different forms: "overcurrent" and "overcurrent", "startup" and "startup";
3) English synonyms: "TV" and "PT";
the 2nd type of problem is matching failure caused by specialist terminology: for example, "overcurrent protection section I" and "current quick-break protection", or "overcurrent protection section II" and "time-limited overcurrent protection".
4. The intelligent relay protection fixed value comparison method according to claim 1, wherein in step 3.2 the output sequence is reordered according to the part of speech of the entries in the fixed value item name dictionary as follows: first, the whole output word sequence is traversed to find whether a serial number word exists; if so, it is judged whether the part-of-speech mark, in the dictionary text, of the word following it is "1"; if so, the two words are combined into a compound word and placed at the head. After this processing, the single words, namely the entries whose part-of-speech mark is "2", are ordered according to their original order;
entries of the special category, namely entries whose part-of-speech mark is "3", make up a small proportion and are stored in a specific area of the mixed dictionary. If the string contains a special alternative-name entry, the standard synonym sequence is output according to the dictionary position number of that entry.
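The reordering described in this claim can be sketched as follows. The `Token` type, the sample words and their numbers are hypothetical, and because the text does not say how a compound word is numbered, the sketch simply keeps the numbers of both parts at the head of the numbering sequence.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    number: int  # dictionary position number
    mark: int    # part-of-speech mark: 1 = serial number / compound, 2 = single word


def serial_to_head(tokens: List[Token]) -> Tuple[List[str], List[int]]:
    """Move the first serial number word (and a following mark-1 word) to the head."""
    for i, tok in enumerate(tokens):
        if tok.mark == 1:
            end = i + 1
            # Combine with the next word when it also carries mark 1, e.g. "1" + "section".
            if end < len(tokens) and tokens[end].mark == 1:
                end += 1
            head = tokens[i:end]
            rest = tokens[:i] + tokens[end:]  # remaining words keep their original order
            words = ["".join(t.word for t in head)] + [t.word for t in rest]
            numbers = [t.number for t in head] + [t.number for t in rest]
            return words, numbers
    # No serial number word: the sequence is output unchanged.
    return [t.word for t in tokens], [t.number for t in tokens]


# Hypothetical tokens for "overcurrent / 1 / section / protection".
tokens = [Token("过流", 1, 2), Token("1", 4, 1), Token("段", 5, 1), Token("保护", 3, 2)]
print(serial_to_head(tokens))
```

Names that differ only in where the section number appears then produce the same word and numbering sequences.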
5. The intelligent relay protection fixed value comparison method according to claim 1, wherein the numeric sequence similarity calculation of step 4.1 distinguishes four cases: (1) comparison of a sequence containing a serial number word with a sequence that does not; (2) comparison of two sequences containing different serial number words; (3) comparison of two sequences that contain no serial number word; (4) comparison of two sequences that both contain the same serial number word; for the first two cases, the similarity of the two sequences is 0. For the latter two cases, the dot product ratio of sequence A and sequence B is used as the sequence numerical index; the specific formulas are formula (2) and formula (3),
Figure FDA0003215176090000071
Figure FDA0003215176090000072
in formula (2), DPR(A, B) represents the dot product ratio of sequence A and sequence B; n represents the length of the sequence;
Figure FDA0003215176090000073
represents the sum of products of corresponding positions of the two sequences;
Figure FDA0003215176090000074
represents the sum of squares of the elements of the sequence; i is the index of the sequence element and takes the values 0, 1, …, n-1; a larger dot product ratio indicates that the two sequences are more similar; in formula (3), NDPR(A, B) represents the normalized dot product ratio, whose value range is [0, 1].
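The four cases of this claim can be sketched in code. Formulas (2) and (3) are available only as images, so the normalization used below (dot product divided by the larger of the two sums of squares) is an assumption chosen because it stays within [0, 1] for non-negative position numbers and equals 1 only for identical sequences; it should not be read as the patent's exact formula. Padding the shorter sequence with zeros and the function name `sequence_similarity` are likewise assumptions.

```python
from typing import Optional, Sequence


def sequence_similarity(
    a: Sequence[int],
    b: Sequence[int],
    serial_a: Optional[str] = None,
    serial_b: Optional[str] = None,
) -> float:
    """Similarity of two numbering sequences of non-negative position numbers.

    serial_a / serial_b hold the serial number word of each name, or None.
    """
    # Cases (1) and (2): only one sequence has a serial number word, or they differ.
    if serial_a != serial_b:
        return 0.0
    # Cases (3) and (4): compare the numbers (assumed normalization, see lead-in).
    n = max(len(a), len(b))
    pa = list(a) + [0] * (n - len(a))  # zero padding is an assumption
    pb = list(b) + [0] * (n - len(b))
    dot = sum(x * y for x, y in zip(pa, pb))
    denom = max(sum(x * x for x in pa), sum(y * y for y in pb))
    if denom == 0:
        return 1.0  # both sequences are empty or all zeros, hence identical
    return dot / denom


print(sequence_similarity([1, 3], [1, 3], "1", "1"))  # identical sequences
print(sequence_similarity([1, 3], [1, 3], "1", "2"))  # different serial numbers
```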
6. The intelligent relay protection fixed value comparison method according to claim 1, wherein in step 4.1, for the fixed value sequence that satisfies the similarity threshold and is arranged in descending order of similarity, the fixed value item corresponding to the fixed value name with the greatest similarity is first taken for fixed value comparison; if the numerical value and the unit of the fixed value are consistent, the fixed value name is judged to be successfully matched, the fixed value of the operating equipment is consistent with the setting value, and no change is needed; otherwise, the value consistency comparison is performed for the next fixed value item; if the next comparison is consistent, the fixed value name of that item is judged to be the name corresponding to the item on the fixed value list, and the fixed value is correct and needs no change; otherwise, the comparison proceeds to the next fixed value item; if the whole fixed value sequence has been compared and no fixed value is consistent, the fixed value item that has the greatest similarity and a consistent unit is determined to be the item corresponding to the fixed value list item and its fixed value is judged to be wrong: alarm information is sent, the difference is confirmed, the fixed value is downloaded, and after the download is finished the operating fixed value is called up again for comparison.
7. The intelligent relay protection fixed value comparison method according to claim 1, wherein in step 4.2 the fixed value item with the greatest similarity is taken for fixed value comparison, thereby determining whether the segmentation-based fixed value name matching succeeded: if the numerical value and the unit of the fixed value item are both consistent, the fixed value item name is judged to be successfully matched and consistent with the setting value, and no subsequent operation is needed; if the units of the fixed value items are consistent but the values are not, there are two possibilities: (1) the fixed value item name matching succeeded, but the operating fixed value is inconsistent with the setting value, so alarm information must be sent and the fixed value downloaded; (2) if the units of the fixed value items are inconsistent, the fixed value item name matching fails.
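The value check of claims 6 and 7 can be sketched as follows. The `Candidate` type, the `pick_match` function, the result labels and the sample values are hypothetical; the 0.95 threshold is the one used in the example of section 5.2, and exact equality of numeric values is a simplification.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    name: str          # fixed value item name on the operating equipment
    similarity: float  # similarity to the fixed value list item name
    value: float
    unit: str


def pick_match(
    list_value: float,
    list_unit: str,
    candidates: List[Candidate],
    threshold: float = 0.95,
) -> Tuple[Optional[Candidate], str]:
    """Select the operating item that corresponds to one fixed value list item."""
    # Keep candidates that meet the threshold, highest similarity first.
    ranked = sorted(
        (c for c in candidates if c.similarity >= threshold),
        key=lambda c: c.similarity,
        reverse=True,
    )
    # Walk down the ranking until value and unit both agree.
    for c in ranked:
        if c.value == list_value and c.unit == list_unit:
            return c, "consistent"
    # No value agrees: the most similar item with the same unit is treated as the
    # counterpart, and its differing value triggers an alarm and a download.
    for c in ranked:
        if c.unit == list_unit:
            return c, "alarm"
    return None, "unmatched"


# Hypothetical numbers; the real values are in Tables 4 and 5.
candidates = [
    Candidate("current I segment protection", 0.99, 5.0, "A"),
    Candidate("zero sequence I segment current", 0.96, 2.0, "A"),
]
print(pick_match(5.0, "A", candidates))
```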
CN202110941813.0A 2021-08-17 2021-08-17 Intelligent comparison method for relay protection fixed values Active CN113641877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110941813.0A CN113641877B (en) 2021-08-17 2021-08-17 Intelligent comparison method for relay protection fixed values

Publications (2)

Publication Number Publication Date
CN113641877A true CN113641877A (en) 2021-11-12
CN113641877B CN113641877B (en) 2023-07-14

Family

ID=78422277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110941813.0A Active CN113641877B (en) 2021-08-17 2021-08-17 Intelligent comparison method for relay protection fixed values

Country Status (1)

Country Link
CN (1) CN113641877B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6031703A (en) * 1997-02-10 2000-02-29 Schneider Electric Sa Protection relay and process
CN104539047A (en) * 2014-12-23 2015-04-22 国家电网公司 Intelligent substation fault diagnosing and positioning method based on multi-factor comparison visualization
US20170173262A1 (en) * 2017-03-01 2017-06-22 François Paul VELTZ Medical systems, devices and methods
US20190109451A1 (en) * 2016-04-22 2019-04-11 Mitsubishi Electric Corporation Circuit breaker failure protection relay and protection relay system
CN109753683A (en) * 2018-11-29 2019-05-14 国家电网有限公司 A kind of forming method of relay protection setting software protecting equipment model
CN110991184A (en) * 2019-12-10 2020-04-10 国网青海省电力公司 Relay protection fixed value self-adaptive checking method based on comprehensive dictionary characteristics
CN112003234A (en) * 2020-09-02 2020-11-27 广西电网有限责任公司河池供电局 Intelligent calibration system and method for relay protection equipment fixed value information
CN112180186A (en) * 2019-07-05 2021-01-05 国网新疆电力有限公司 Automatic checking method for fixed value of intelligent substation relay protection device
CN112182313A (en) * 2020-09-30 2021-01-05 国网青海省电力公司 Relay protection setting value name matching method and system
CN112415314A (en) * 2020-11-17 2021-02-26 华北电力大学(保定) Hidden fault identification method for relay protection system
CN112436477A (en) * 2020-11-10 2021-03-02 云南电网有限责任公司昆明供电局 Instrument is checked fast to relay protection device definite value of transformer substation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAO YANG 等: ""Research and Application on the Comparison System for Relay Protection Settings Based on Document Comparison Tool"", 《2020 IEEE IAS INDUSTRIAL AND COMMERCIAL POWER SYSTEM ASIA TECHNICAL CONFERENCE》, pages 954 - 958 *
蒙亮 等: ""广西电网继电保护定值在线比对系统"", 《红水河》, pages 84 - 88 *

Also Published As

Publication number Publication date
CN113641877B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant