CN109522559B - Method and system for Chinese word segmentation in power grid operation and distribution system - Google Patents

Info

Publication number
CN109522559B
Authority
CN
China
Prior art keywords
word
word segmentation
distribution
power grid
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811417689.2A
Other languages
Chinese (zh)
Other versions
CN109522559A (en)
Inventor
李志�
夏同飞
章玉龙
郭振
王超
张学敏
岳想想
费晓璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd, Anhui Jiyuan Software Co Ltd
Priority to CN201811417689.2A
Publication of CN109522559A
Application granted
Publication of CN109522559B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a method for Chinese word segmentation in a power grid operation and distribution system, comprising the following steps: step one, establishing a power grid operation and distribution word segmentation word bank; step two, selecting the word segmentation word bank corresponding to a preset scene; step three, hash-indexing the first 2 characters of the data to be processed one by one according to the word bank selected in step two; step four, arranging the remaining word strings of the processed data in a preset order and performing word-by-word matching on the arranged data according to the word bank selected in step two; step five, extracting sample data to form a big data training set and a verification set; and step six, evaluating the word feature indexes. Building on the classical dictionary-based word segmentation method, the invention provides a word segmentation method based on an improved Trie index tree, and further provides a double-array Trie word segmentation method that is better suited to the power service environment. By tailoring Chinese word segmentation to the scene requirements of the power business, the method extracts the feature information of power business objects efficiently and accurately, and the feature extraction meets specified targets for synonym recognition rate, ambiguity recognition rate and new word recognition rate.

Description

Method and system for Chinese word segmentation in power grid operation and distribution system
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for Chinese word segmentation in a power grid operation and distribution system.
Background
Power distribution and utilization are core services of power grid enterprises, and the operation and distribution ledger is an important basis for developing these services. Because the operation, distribution and dispatching businesses of the power grid are closely related, and their basic ledgers (such as lines, areas, transformers and users) are managed by different departments yet overlap, connecting these ledgers and establishing the correspondence between them is one of the difficulties of the power business.
At present, domestic scholars have carried out a large amount of research on matching Chinese unstructured text and have achieved certain results. Word segmentation and matching are the focus of this research, and the matching process generally also includes feature extraction and weight calculation. Word segmentation belongs to natural language understanding, is the first step of semantic understanding, and is the technique of correctly separating the words in a sentence. Unlike English, where words are separated by spaces, Chinese has no fixed separator between words; together with the problems of ambiguity and new word recognition, this makes Chinese word segmentation relatively difficult.
Existing Chinese word segmentation methods can generally be divided into 3 types: dictionary-based methods, statistics-based methods and understanding-based methods, of which dictionary-based mechanical word segmentation is the most mature. However, the dictionary-based method is limited by the size of the dictionary, has difficulty identifying unregistered new words, and is also troubled by ambiguity. The ideal method is understanding-based word segmentation, in which the computer learns grammatical and semantic rules as a human does and makes the correct segmentation choice according to those rules.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a system for Chinese word segmentation in a power grid operation and distribution system, which can efficiently and accurately extract the characteristic information of a power business object, and the characteristic extraction meets certain synonymy recognition rate, ambiguity recognition rate and new word recognition rate indexes.
In order to achieve the purpose, the invention is realized by the following technical scheme:
A method for Chinese word segmentation in a power grid operation and distribution system comprises the following steps:
step one, establishing a power grid marketing and distribution word segmentation word bank;
step two, selecting a word segmentation word bank corresponding to a preset scene;
step three, carrying out hash indexing on the first 2 characters of the data to be processed one by one according to the word segmentation word bank in step two;
step four, arranging the remaining word strings of the processed data according to a preset sequence, and performing word-by-word matching on the arranged data according to the word segmentation word bank in step two;
step five, extracting sample data to form a big data training set and a verification set;
step six, evaluating the word feature indexes.
Further, step two specifically includes: selecting the matching of distribution line names across the dispatching, operation and inspection, and marketing systems; selecting the matching of transformer substation names across the dispatching and marketing systems; and selecting the matching of distribution station names across the electric power operation and inspection and marketing systems.
Further, each node in the method is expressed by two array elements with the same index, one in an array for determining state transitions and one in an array for checking the correctness of the transition.
Furthermore, the word segmentation characteristic indexes comprise accuracy and recall rate. The accuracy is calculated as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
the recall rate is calculated as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
A system for Chinese word segmentation in a power grid operation and distribution system comprises:
the word bank establishing module is used for establishing a power grid marketing and distribution word division word bank;
the scene selection module is used for selecting a word segmentation word bank corresponding to a preset scene;
the Trie node index module is used for carrying out hash index one by one on the first 2 characters of the data to be processed according to the word segmentation word bank selected by the scene selection module;
the Trie mechanism index module is used for arranging the residual word strings of the processed data according to a preset sequence and performing word-by-word matching on the arranged data according to the word segmentation word bank selected by the scene selection module;
the set generation module is used for extracting sample data to form a big data training set and a verification set;
and the characteristic index evaluation module is used for evaluating the word characteristic indexes.
Further, the scene selection module comprises:
the distribution circuit selection submodule is used for selecting the matching of distribution circuit names across the dispatching, operation and inspection, and marketing systems;
the transformer substation selection submodule is used for selecting the matching of transformer substation names across the dispatching and marketing systems;
and the power distribution area selection submodule is used for selecting the matching of power distribution area names across the electric power operation and inspection and marketing systems.
Further, the set generation module includes:
the training set generation submodule is used for extracting sample data to form a big data training set;
and the verification set generation submodule is used for extracting sample data to form a big data verification set.
Further, the characteristic index evaluation module includes:
an accuracy calculation submodule, which calculates the accuracy as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
a recall rate calculation submodule, which calculates the recall rate as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a word segmentation method for improving a TRIE index tree on the basis of a classical dictionary word segmentation method, and further provides a double-array Trie word segmentation method which is more suitable for a power service environment; the Chinese word segmentation method is provided by combining with the scene requirements of the power service, the feature information of the power service object is efficiently and accurately extracted, and the feature extraction meets certain synonymy recognition rate, ambiguity recognition rate and new word recognition rate indexes.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of the system architecture of the present invention;
FIG. 3 is a block diagram of a scene selection module according to the present invention;
FIG. 4 is a block diagram of the structure of a collection generation module according to the present invention;
FIG. 5 is a block diagram of a feature index evaluation module according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a method for Chinese word segmentation in a power grid marketing and distribution system, which comprises the following steps:
S1, establishing a power grid marketing and distribution word segmentation word bank;
S2, selecting a word segmentation word bank corresponding to a preset scene;
S3, performing hash indexing one by one on the first 2 characters of the data to be processed according to the word segmentation word bank in S2;
S4, arranging the remaining word strings of the processed data according to a preset sequence, and performing word-by-word matching on the arranged data according to the word segmentation word bank in S2;
S5, extracting sample data to form a big data training set and a verification set;
S6, evaluating the word feature indexes.
Specifically, S2 includes: selecting the matching of distribution line names across the dispatching, operation and inspection, and marketing systems; selecting the matching of transformer substation names across the dispatching and marketing systems; and selecting the matching of distribution station names across the electric power operation and inspection and marketing systems.
Specifically, each node in the method is expressed by two array elements with the same subscript, one in an array for determining state transitions and one in an array for checking the correctness of the transition.
Specifically, the word segmentation characteristic indexes comprise accuracy and recall rate. The accuracy is calculated as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
the recall rate is calculated as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
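As a minimal illustration of the two indexes, the following Python sketch computes accuracy and recall from a segmenter output and a reference segmentation. Counting a word as correctly segmented when its character span appears in both segmentations is an assumption made here; the text only defines the counts b, a and n. The function names and the sample strings are hypothetical.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    result = set()
    pos = 0
    for word in words:
        result.add((pos, pos + len(word)))
        pos += len(word)
    return result


def accuracy_and_recall(predicted, reference):
    """Return (accuracy, recall) = (b / a, b / n).

    b: number of correctly segmented words (spans present in both lists)
    a: total number of words produced by the segmenter
    n: total number of words that should have been segmented
    """
    b = len(spans(predicted) & spans(reference))
    a = len(predicted)
    n = len(reference)
    accuracy = b / a if a else 0.0
    recall = b / n if n else 0.0
    return accuracy, recall


# Hypothetical example: the segmenter wrongly splits the last word in two.
predicted = ["10kV", "城东", "线", "路"]
reference = ["10kV", "城东", "线路"]
acc, rec = accuracy_and_recall(predicted, reference)
```

In this example b = 2, a = 4 and n = 3, so the accuracy is 2/4 and the recall is 2/3.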
The invention also provides a system for Chinese word segmentation in a power grid marketing and distribution system, which comprises:
the word bank establishing module 201 is used for establishing a power grid marketing and distribution word bank;
a scene selection module 202, configured to select a word segmentation lexicon corresponding to a preset scene;
the Trie node index module 203 is configured to perform hash indexing one by one on the first 2 characters of the data to be processed according to the word segmentation lexicon selected by the scene selection module 202;
the Trie mechanism index module 204 is configured to arrange the remaining word strings of the processed data according to a preset sequence, and perform word-by-word matching on the arranged data according to the word segmentation lexicon selected by the scene selection module 202;
the set generating module 205 is configured to extract sample data to form a big data training set and a verification set;
and the feature index evaluation module 206 is used for evaluating the word feature indexes.
Specifically, the scene selection module 202 includes:
the distribution circuit selection submodule 301 is used for selecting the naming matching of the distribution circuit naming in the dispatching, operation and inspection and marketing system;
the transformer substation selection submodule 302 is used for selecting naming matching of a transformer substation in a dispatching and marketing system;
and the power distribution area selection submodule 303 is used for selecting naming matching of the power distribution area in the electric power operation inspection and marketing system.
Specifically, the set generating module 205 includes:
a training set generation submodule 401, configured to extract sample data to form a big data training set;
the verification set generation submodule 402 is configured to extract sample data to form a big data verification set.
Specifically, the feature index evaluation module 206 includes:
the accuracy calculation submodule 501 calculates the accuracy as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
the recall rate calculation submodule 502 calculates the recall rate as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
In order to adapt to the naming habits of electric power objects in different regions, systems and time periods, the key features of electric power objects are extracted from their names. The recognition performance of classical dictionary-based and statistics-based Chinese word segmentation methods in electric power service scenes is studied and evaluated with key indexes such as synonym recognition rate, ambiguity recognition rate and new word recognition rate. Building on classical Chinese word segmentation and with reference to current mainstream research directions, the invention provides an improved Trie index tree and a double-array Trie Chinese word segmentation method, both oriented to electric power service scenes:
the classical Chinese word segmentation method relies on a machine dictionary: every segmentation step consults a word list, namely the word segmentation dictionary, and draws little on linguistic knowledge such as lexical, semantic or syntactic information. The dictionary lists entries by category, and the number of entries, the selection of entries and the organization of the dictionary directly affect the final word segmentation result.
The basic idea of classical word segmentation is to first build a lexicon, i.e. a word segmentation dictionary, that contains as many of the possible words as it can. For a given Chinese character string s to be segmented, a substring of s is taken according to a fixed principle (forward or reverse). If the substring matches an entry in the dictionary, it is a word and is split off, and segmentation continues on the remainder until the remainder is empty; otherwise the substring is not a word, and the next substring is taken for matching. Classical word segmentation methods can be divided by scanning direction into forward matching and reverse matching; by which length is matched preferentially into maximum (longest) matching and minimum (shortest) matching; and by whether they are combined with part-of-speech tagging into simple word segmentation methods and integrated methods that combine segmentation with tagging.
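The classical procedure described above can be sketched in Python for one of its variants, forward maximum (longest) matching. This is an illustrative sketch rather than the patented implementation: the function name, the toy lexicon, and the choice to emit an unmatched character as a single-character word are all assumptions.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment `text` by forward maximum matching against `dictionary`.

    At each position the longest candidate substring is tried first and
    shortened until it matches a dictionary entry. A character that starts
    no dictionary word is emitted on its own (an assumption of this sketch).
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon of power-grid terms.
lexicon = {"配电", "线路", "配电线路", "变电站", "台区"}
result = forward_max_match("配电线路变电站", lexicon)
```

Because the longest candidate is tried first, "配电线路" is preferred over the shorter entry "配电". Reverse matching would scan from the end of the string instead.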
The invention selects different scenes, such as matching distribution line names across the dispatching, operation and inspection, and marketing systems, matching transformer substation names across the dispatching and marketing systems, and matching distribution substation names across the power operation and inspection and marketing systems, applies different classical Chinese word segmentation methods to extract features, and verifies how well the segmented features express the objects. The research work comprises: sample data extraction, training set and verification set setting, implementation of the word segmentation algorithm, Chinese word segmentation feature extraction, word segmentation feature index evaluation and the like.
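The setting of a training set and a verification set from extracted sample data can be illustrated with a simple random split. The text does not specify how the sets are formed, so the split ratio, the fixed seed, the function name and the sample names below are all assumptions.

```python
import random


def split_samples(samples, validation_ratio=0.2, seed=0):
    """Shuffle the samples and split them into (training, verification) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    n_validation = round(len(shuffled) * validation_ratio)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample data: ten distribution line names.
names = [f"10kV线路{i}" for i in range(10)]
train, validation = split_samples(names)
```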
The Trie index tree is a key tree expressed as multiple linked lists and consists of two parts: the Trie index tree nodes and the Trie index mechanism. The tree structure expresses the covering and preferential matching relations among the entries of the Chinese dictionary. In word segmentation, the sentence being segmented only needs to be matched word by word along the tree chain, without predicting the length of the word to be queried.
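A plain Trie illustrates why the length of the queried word need not be known in advance: matching simply follows the tree one character at a time and remembers the last position at which a complete entry ended. The sketch below stores children in a Python dict rather than in linked lists, which is a simplification, and all names and the toy entries are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a dictionary entry ends at this node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def longest_match(root, text, start):
    """Return the end offset of the longest entry starting at `start`."""
    node = root
    best_end = start
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.is_word:
            best_end = i + 1
    return best_end


def segment(root, text):
    words = []
    i = 0
    while i < len(text):
        end = longest_match(root, text, i)
        if end == i:          # no entry starts here: emit a single character
            end = i + 1
        words.append(text[i:end])
        i = end
    return words


trie = build_trie(["变电", "变电站", "调度", "营销"])
tokens = segment(trie, "变电站调度")
```

Here "变电" is covered by the longer entry "变电站", and the longer entry is the one that is matched.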
Because two-character words are common in Chinese, the dictionary indexing mechanism of the Trie index tree is improved: the first 2 characters are hash-indexed one by one, the remaining character strings are stored in sorted order, and the query proceeds by word-by-word matching. That is, words of up to 2 characters are handled by the Trie index tree mechanism, while the remaining parts of long words of 3 or more characters are organized in linear tables. This avoids deep searches and improves word segmentation speed without increasing the maintenance complexity of a typical dictionary mechanism.
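One possible reading of this improved index is sketched below in Python: two nested hash tables cover the first and second characters, and the remainder of each longer entry is kept in a sorted list searched with `bisect`. The data layout, the function names and the sample entries are assumptions made for illustration; the text does not fix these details.

```python
from bisect import bisect_left


def build_index(words):
    """Hash-index the first two characters; keep the remaining strings sorted."""
    index = {}
    for word in words:
        if not word:
            continue
        first = index.setdefault(word[0], {"is_word": False, "second": {}})
        if len(word) == 1:
            first["is_word"] = True
            continue
        second = first["second"].setdefault(
            word[1], {"is_word": False, "tails": []})
        if len(word) == 2:
            second["is_word"] = True
        else:
            second["tails"].append(word[2:])
    for first in index.values():
        for second in first["second"].values():
            second["tails"] = sorted(set(second["tails"]))
    return index


def contains(index, word):
    """Exact lookup: two hash probes, then a binary search in the sorted tails."""
    first = index.get(word[0]) if word else None
    if first is None:
        return False
    if len(word) == 1:
        return first["is_word"]
    second = first["second"].get(word[1])
    if second is None:
        return False
    if len(word) == 2:
        return second["is_word"]
    tails = second["tails"]
    pos = bisect_left(tails, word[2:])
    return pos < len(tails) and tails[pos] == word[2:]


def longest_match(index, text, start):
    """Return the end offset of the longest entry starting at `start`."""
    best = start
    first = index.get(text[start]) if start < len(text) else None
    if first is None:
        return best
    if first["is_word"]:
        best = start + 1
    if start + 1 >= len(text):
        return best
    second = first["second"].get(text[start + 1])
    if second is None:
        return best
    if second["is_word"]:
        best = start + 2
    rest = text[start + 2:]
    for tail in second["tails"]:      # linear table of remaining strings
        if rest.startswith(tail):
            best = max(best, start + 2 + len(tail))
    return best


index = build_index(["配电", "配电线路", "配电变压器", "变电站"])
end = longest_match(index, "配电线路名称", 0)
```

Lookups never descend more than two hash levels, which is the sense in which deep searches are avoided; the cost moves to scanning a short list of remainders.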
On the basis of a classical Chinese word segmentation method, the invention researches an establishment method of an improved Trie index tree, a maintenance method of the improved Trie index tree, an application method of the improved Trie index tree in a typical power service scene and a feature extraction effect.
The double-array Trie is a variant of the Trie tree, a data structure proposed to improve space utilization while preserving the retrieval speed of the Trie tree. In essence it is a deterministic finite automaton: each node represents one state of the automaton, state transitions are made according to the input variable, and the query ends when an end state is reached or no transition is possible. Two linear arrays (base and check) are used to express the Trie tree; each node of the Trie tree is expressed by two array elements with the same subscript, where the base array determines the state transition and the check array checks that the transition is correct.
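The base/check mechanism can be illustrated with a small static double-array Trie in Python. From state s and a character with integer code c, the candidate next state is t = base[s] + c, and the transition is accepted only if check[t] == s. The construction strategy (breadth-first, first free base value), the end-of-word marker and all names are assumptions of this sketch, not details given in the text; entries are assumed not to contain the marker character.

```python
from collections import deque

END = "\0"  # end-of-word marker (an assumption of this sketch)


class DoubleArrayTrie:
    def __init__(self, words):
        alphabet = sorted({ch for word in words for ch in word})
        self.code = {END: 1}
        self.code.update({ch: i + 2 for i, ch in enumerate(alphabet)})
        tree = {}                       # ordinary nested-dict trie, built first
        for word in words:
            node = tree
            for ch in word + END:
                node = node.setdefault(ch, {})
        self.base = [1]                 # state 0 is the root
        self.check = [0]                # -1 marks a free slot
        queue = deque([(0, tree)])
        while queue:
            state, node = queue.popleft()
            if not node:
                continue                # leaf: no outgoing transitions
            codes = [self.code[ch] for ch in node]
            b = 1
            while any(self._used(b + c) for c in codes):
                b += 1                  # first base value with all slots free
            self.base[state] = b
            for ch, child in node.items():
                t = b + self.code[ch]
                self._grow(t + 1)
                self.check[t] = state   # slot t now belongs to `state`
                queue.append((t, child))

    def _used(self, t):
        return t < len(self.check) and self.check[t] != -1

    def _grow(self, size):
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(-1)

    def _step(self, state, ch):
        """Follow one transition; return the next state or -1 if impossible."""
        c = self.code.get(ch)
        if c is None:
            return -1
        t = self.base[state] + c        # base determines the transition
        if t < len(self.check) and self.check[t] == state:
            return t                    # check confirms it is correct
        return -1

    def contains(self, word):
        state = 0
        for ch in word + END:
            state = self._step(state, ch)
            if state < 0:
                return False
        return True

    def longest_match(self, text, start):
        state, best = 0, start
        for i in range(start, len(text)):
            state = self._step(state, text[i])
            if state < 0:
                break
            if self._step(state, END) >= 0:
                best = i + 1
        return best


dat = DoubleArrayTrie(["变电", "变电站", "台区"])
match_end = dat.longest_match("变电站台区", 0)
```

Both arrays are flat lists of integers, so a lookup costs one addition and one comparison per character regardless of how many children a node has.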
On the basis of the improved Trie index tree Chinese word segmentation method, the invention researches a method for establishing the double-array Trie index tree, a method for maintaining it, its application in typical electric power service scenes, and the resulting feature extraction effect.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for Chinese word segmentation in a power grid operation and distribution system is characterized by comprising the following steps:
step one, establishing a power grid marketing and distribution word segmentation word bank;
step two, selecting a word segmentation word bank corresponding to a preset scene;
step three, carrying out hash index on the first 2 characters of the data to be processed one by one according to the word segmentation word bank in the step two;
step four, arranging the rest word strings of the processed data according to a preset sequence, and performing word-by-word matching on the arranged data according to the word segmentation word bank in the step two;
step five, extracting sample data to form a big data training set and a verification set;
and step six, evaluating the word feature indexes.
2. The method for Chinese word segmentation in the power grid operation and distribution system according to claim 1, wherein the second step specifically comprises: selecting the naming matching of the distribution line naming in the dispatching, operation and inspection and marketing system; selecting naming matching of the transformer substation in a dispatching and marketing system; and selecting the naming matching of the distribution station in the electric power operation inspection and marketing system.
3. The method for Chinese word segmentation in the power grid operation and distribution system according to claim 1, wherein each node in the method is expressed by two array elements with the same subscript, one in an array for determining state transitions and one in an array for checking the correctness of the transition.
4. The method for Chinese word segmentation in the power grid operation and distribution system according to claim 1, wherein the word segmentation characteristic indexes comprise accuracy and recall rate, and the accuracy is calculated as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
the recall rate is calculated as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
5. A system for chinese word segmentation in a power grid operation and distribution system, the system comprising:
the word bank establishing module is used for establishing a power grid operation and distribution word bank;
the scene selection module is used for selecting a word segmentation word bank corresponding to a preset scene;
the Trie node index module is used for carrying out hash index one by one on the first 2 characters of the data to be processed according to the word segmentation word bank selected by the scene selection module;
the Trie mechanism index module is used for arranging the residual word strings of the processed data according to a preset sequence and performing word-by-word matching on the arranged data according to the word segmentation word bank selected by the scene selection module;
the set generation module is used for extracting sample data to form a big data training set and a verification set;
and the characteristic index evaluation module is used for evaluating the word characteristic indexes.
6. The system for Chinese word segmentation in the power grid operation and distribution system according to claim 5, wherein the scene selection module comprises:
the distribution circuit selecting submodule is used for selecting the naming matching of the distribution circuit naming in the dispatching, operation and inspection and marketing system;
the transformer substation selection submodule is used for selecting naming matching of the transformer substation in the dispatching and marketing system;
and the power distribution area selection submodule is used for selecting the naming matching of the power distribution area in the electric power operation inspection and marketing system.
7. The system for Chinese word segmentation in the power grid operation and distribution system according to claim 5, wherein the set generation module comprises:
the training set generation submodule is used for extracting sample data to form a big data training set;
and the verification set generation submodule is used for extracting sample data to form a big data verification set.
8. The system for Chinese word segmentation in the power grid operation and distribution system according to claim 5, wherein the characteristic index evaluation module comprises:
an accuracy calculation submodule, which calculates the accuracy as
$$\text{accuracy} = \frac{b}{a}$$
where b represents the number of correctly segmented words and a represents the total number of segmented words;
a recall rate calculation submodule, which calculates the recall rate as
$$\text{recall} = \frac{b}{n}$$
where b denotes the number of correctly segmented words and n denotes the total number of words that should be segmented.
CN201811417689.2A 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system Active CN109522559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811417689.2A CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811417689.2A CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Publications (2)

Publication Number Publication Date
CN109522559A CN109522559A (en) 2019-03-26
CN109522559B true CN109522559B (en) 2023-03-31

Family

ID=65793677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811417689.2A Active CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Country Status (1)

Country Link
CN (1) CN109522559B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1069493A (en) * 1996-08-29 1998-03-10 Matsushita Electric Ind Co Ltd Dictionary preparation device and word segmentation device
CN102411568A (en) * 2010-09-20 2012-04-11 苏州同程旅游网络科技有限公司 Chinese word segmentation method based on travel industry feature word stock
WO2015032120A1 (en) * 2013-09-03 2015-03-12 盈世信息科技(北京)有限公司 Method and device for filtering spam mail based on short text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
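Improvement of a Chinese word segmentation system by new word discovery based on an improved mutual information algorithm; Xia Tongfei et al.; Electronic Components and Information Technology (《电子元器件与信息技术》); 20180920 (No. 09); full text *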

Also Published As

Publication number Publication date
CN109522559A (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN102799577B (en) A kind of Chinese inter-entity semantic relation extraction method
CN109033307A (en) Word polyarch vector based on CRP cluster indicates and Word sense disambiguation method
CN114065758B (en) Document keyword extraction method based on hypergraph random walk
WO2014209810A2 (en) Methods and apparatuses for mining synonymous phrases, and for searching related content
CN101510221A (en) Enquiry statement analytical method and system for information retrieval
CN104615593A (en) Method and device for automatic detection of microblog hot topics
CN108874896B (en) Humor identification method based on neural network and humor characteristics
CN104199965A (en) Semantic information retrieval method
CN113377897B (en) Multi-language medical term standard standardization system and method based on deep confrontation learning
CN102214166A (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN109522547A (en) Chinese synonym iteration abstracting method based on pattern learning
CN107526841A (en) A kind of Tibetan language text summarization generation method based on Web
CN111949774A (en) Intelligent question answering method and system
CN107341188A (en) Efficient data screening technique based on semantic analysis
Alshaina et al. Multi-document abstractive summarization based on predicate argument structure
CN108536724A (en) Main body recognition methods in a kind of metro design code based on the double-deck hash index
CN111428031A (en) Graph model filtering method fusing shallow semantic information
CN110929518A (en) Text sequence labeling algorithm using overlapping splitting rule
Kessler et al. Extraction of terminology in the field of construction
Wang et al. Semi-supervised chinese open entity relation extraction
Nguyen et al. An ontology-based approach for key phrase extraction
Chader et al. Sentiment Analysis for Arabizi: Application to Algerian Dialect.
CN107562774A (en) Generation method, system and the answering method and system of rare foreign languages word incorporation model
CN109522559B (en) Method and system for Chinese word segmentation in power grid operation and distribution system
Al Taawab et al. Transliterated bengali comment classification from social media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant