CN109522559A - Method and system for Chinese word segmentation in power grid operation and distribution system - Google Patents

Method and system for Chinese word segmentation in power grid operation and distribution system

Info

Publication number
CN109522559A
Authority
CN
China
Prior art keywords
word
operation and distribution
word segmentation
dictionary
power grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811417689.2A
Other languages
Chinese (zh)
Other versions
CN109522559B (en)
Inventor
李志�
夏同飞
章玉龙
郭振
王超
张学敏
岳想想
费晓璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd
Anhui Jiyuan Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Electric Power Research Institute of State Grid Anhui Electric Power Co Ltd, Anhui Jiyuan Software Co Ltd
Priority to CN201811417689.2A
Publication of CN109522559A
Application granted
Publication of CN109522559B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for Chinese word segmentation in a power grid operation and distribution system, comprising the steps of: establishing a power grid operation and distribution word segmentation dictionary; selecting the word segmentation dictionary corresponding to a preset scenario; hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary selected in step 2; arranging the remaining character string of the data to be processed in a preset order and matching the arranged data character by character against the word segmentation dictionary selected in step 2; extracting sample data to form a big-data training set and validation set; and evaluating the word segmentation feature indexes. Building on the classical dictionary-based word segmentation method, the invention proposes a segmentation method that improves the Trie index tree and further proposes a double-array Trie segmentation method, both suited to the power business environment. By tailoring the Chinese word segmentation method to the needs of power business scenarios, the characteristic information of power business objects is extracted efficiently and accurately, and the feature extraction meets given targets for synonym recognition rate, ambiguity recognition rate, and new word recognition rate.

Description

Method and system for Chinese word segmentation in power grid operation and distribution system
Technical field
The present invention relates to the technical field of natural language processing, and in particular to a method and system for Chinese word segmentation in a power grid operation and distribution system.
Background art
Power distribution and consumption is the core business of power grid enterprises, and operation and distribution ledgers are an important foundation for carrying out that business. Because the operation, distribution and dispatching businesses of the power grid are closely interrelated, and the ledger records (such as lines, station areas, transformers, and users) are managed by different specialties yet overlap, connecting and matching the basic operation, distribution and dispatching ledgers has long been one of the difficulties of the power business.
At present, domestic scholars have carried out extensive research on matching unstructured Chinese text and have achieved certain results. Word segmentation and matching methods are the focus of this research, and the work can generally be attributed to feature extraction and weight computation within the matching process. Word segmentation belongs to natural language understanding technology and is the first step of semantic understanding: it is the technique of correctly splitting a sentence into words. Unlike English words, which are separated by spaces, Chinese words have no fixed separator between them; together with the ambiguity problem and the new word recognition problem, this makes Chinese word segmentation comparatively difficult.
Current Chinese word segmentation methods can generally be divided into three classes: dictionary-based, statistics-based, and understanding-based. The dictionary-based mechanical segmentation method is the most mature; it is efficient, accurate, and simple to implement, and is therefore the most widely used. However, it is limited by the size of the dictionary, has some difficulty recognizing new words not registered in the dictionary, and is also troubled by the ambiguity problem. The ideal approach is understanding-based segmentation, in which the computer learns grammatical and semantic rules as a human does and makes the correct segmentation choice according to those rules.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a method and system for Chinese word segmentation in a power grid operation and distribution system, which can efficiently and accurately extract the characteristic information of power business objects, with feature extraction meeting given targets for synonym recognition rate, ambiguity recognition rate, and new word recognition rate.
To achieve the above object, the present invention is implemented through the following technical solutions:
A method for Chinese word segmentation in a power grid operation and distribution system, comprising the steps of:
Step 1: establishing a power grid operation and distribution word segmentation dictionary;
Step 2: selecting the word segmentation dictionary corresponding to a preset scenario;
Step 3: hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary of step 2;
Step 4: arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary of step 2;
Step 5: extracting sample data to form a big-data training set and validation set;
Step 6: evaluating the word segmentation feature indexes.
Further, step 2 specifically comprises: selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems; selecting the name matching of substations across the dispatching and marketing systems; and selecting the name matching of distribution station areas across the power inspection and marketing systems.
Further, in the method each node is represented by two array elements with the same subscript, namely an element of an array for determining state transitions and an element of an array for checking the correctness of a transition.
Further, the word segmentation feature indexes include the accuracy rate (precision) and the recall rate. The accuracy rate is calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
the recall rate is calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
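The two indexes can be computed by comparing character spans of the produced words with those of a reference segmentation. The following is a minimal sketch under that assumption; the function names `spans` and `precision_recall` and the sample station name are hypothetical and do not come from the patent.

```python
def spans(words):
    """Convert a word sequence into a set of (start, end) character spans."""
    result, pos = set(), 0
    for word in words:
        result.add((pos, pos + len(word)))
        pos += len(word)
    return result


def precision_recall(predicted, reference):
    """Return (accuracy rate, recall rate) of a predicted segmentation.

    b = words segmented correctly, a = words produced,
    n = words that should have been produced.
    """
    pred, gold = spans(predicted), spans(reference)
    b = len(pred & gold)
    a = len(pred)
    n = len(gold)
    precision = b / a if a else 0.0
    recall = b / n if n else 0.0
    return precision, recall


# Hypothetical example: the segmenter merges two reference words into one.
predicted = ["10kV", "城东线"]
reference = ["10kV", "城东", "线"]
p, r = precision_recall(predicted, reference)
```

Here one of the two produced words is correct (b = 1, a = 2) and three words were expected (n = 3), so the accuracy rate is 1/2 and the recall rate is 1/3.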
A system for Chinese word segmentation in a power grid operation and distribution system, comprising:
a dictionary establishing module, for establishing a power grid operation and distribution word segmentation dictionary;
a scenario selection module, for selecting the word segmentation dictionary corresponding to a preset scenario;
a Trie node index module, for hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary selected by the scenario selection module;
a Trie mechanism index module, for arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary selected by the scenario selection module;
a set generation module, for extracting sample data to form a big-data training set and validation set;
a feature index evaluation module, for evaluating the word segmentation feature indexes.
Further, the scenario selection module comprises:
a distribution line selection submodule, for selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems;
a substation selection submodule, for selecting the name matching of substations across the dispatching and marketing systems;
a distribution station area selection submodule, for selecting the name matching of distribution station areas across the power inspection and marketing systems.
Further, the set generation module comprises:
a training set generation submodule, for extracting sample data to form a big-data training set;
a validation set generation submodule, for extracting sample data to form a big-data validation set.
Further, the feature index evaluation module comprises:
an accuracy rate calculation submodule, in which the accuracy rate (precision) is calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
a recall rate calculation submodule, in which the recall rate is calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
Compared with the prior art, the present invention has the following beneficial effects:
Building on the classical dictionary-based word segmentation method, the present invention proposes a segmentation method that improves the Trie index tree and further proposes a double-array Trie segmentation method, both suited to the power business environment. By tailoring the Chinese word segmentation method to the needs of power business scenarios, the characteristic information of power business objects is extracted efficiently and accurately, and the feature extraction meets given targets for synonym recognition rate, ambiguity recognition rate, and new word recognition rate.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a structural block diagram of the system of the present invention;
Fig. 3 is a structural block diagram of the scenario selection module of the present invention;
Fig. 4 is a structural block diagram of the set generation module of the present invention;
Fig. 5 is a structural block diagram of the feature index evaluation module of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The present invention provides a method for Chinese word segmentation in a power grid operation and distribution system, comprising the steps of:
S1, establishing a power grid operation and distribution word segmentation dictionary;
S2, selecting the word segmentation dictionary corresponding to a preset scenario;
S3, hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary of S2;
S4, arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary of S2;
S5, extracting sample data to form a big-data training set and validation set;
S6, evaluating the word segmentation feature indexes.
Specifically, S2 comprises: selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems; selecting the name matching of substations across the dispatching and marketing systems; and selecting the name matching of distribution station areas across the power inspection and marketing systems.
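Scenario selection in S2 can be pictured as a lookup from a preset scenario to the dictionary built for it. The sketch below is only an illustration: the scenario keys, system names, and the `select_dictionary` helper are hypothetical, and the patent does not specify how dictionaries are stored.

```python
# Hypothetical scenario keys mapped to the systems whose names are matched.
SCENARIO_SYSTEMS = {
    "distribution_line": ("dispatching", "operation_inspection", "marketing"),
    "substation": ("dispatching", "marketing"),
    "distribution_station_area": ("power_inspection", "marketing"),
}


def select_dictionary(scenario, dictionaries):
    """Return the segmentation dictionary registered for a preset scenario."""
    if scenario not in SCENARIO_SYSTEMS:
        raise KeyError(f"unknown scenario: {scenario}")
    return dictionaries[scenario]


# Hypothetical dictionaries, one set of entries per scenario.
dictionaries = {
    "distribution_line": {"线路", "支线"},
    "substation": {"变电站"},
    "distribution_station_area": {"台区", "配电"},
}
substation_words = select_dictionary("substation", dictionaries)
```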
Specifically, in the method each node is represented by two array elements with the same subscript, namely an element of an array for determining state transitions and an element of an array for checking the correctness of a transition.
Specifically, the word segmentation feature indexes include the accuracy rate (precision) and the recall rate. The accuracy rate is calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
the recall rate is calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
The present invention also provides a system for Chinese word segmentation in a power grid operation and distribution system, comprising:
a dictionary establishing module 201, for establishing a power grid operation and distribution word segmentation dictionary;
a scenario selection module 202, for selecting the word segmentation dictionary corresponding to a preset scenario;
a Trie node index module 203, for hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary selected by the scenario selection module 202;
a Trie mechanism index module 204, for arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary selected by the scenario selection module 202;
a set generation module 205, for extracting sample data to form a big-data training set and validation set;
a feature index evaluation module 206, for evaluating the word segmentation feature indexes.
Specifically, the scenario selection module 202 comprises:
a distribution line selection submodule 301, for selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems;
a substation selection submodule 302, for selecting the name matching of substations across the dispatching and marketing systems;
a distribution station area selection submodule 303, for selecting the name matching of distribution station areas across the power inspection and marketing systems.
Specifically, the set generation module 205 comprises:
a training set generation submodule 401, for extracting sample data to form a big-data training set;
a validation set generation submodule 402, for extracting sample data to form a big-data validation set.
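The patent does not state how the sample data are divided between the two sets. As one possible reading, the sketch below shuffles the extracted samples and splits them by a fixed ratio; the 80/20 ratio, the seed, and the name `split_samples` are assumptions made for illustration.

```python
import random


def split_samples(samples, validation_ratio=0.2, seed=0):
    """Shuffle samples reproducibly and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical object names standing in for extracted sample data.
names = [f"sample-{i}" for i in range(10)]
training_set, validation_set = split_samples(names)
```

A fixed seed keeps the split repeatable, so index values measured on the validation set can be compared across runs.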
Specifically, the feature index evaluation module 206 comprises:
an accuracy rate calculation submodule 501, in which the accuracy rate (precision) is calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
a recall rate calculation submodule 502, in which the recall rate is calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
To accommodate the naming habits of power objects across different regions, systems, and periods, and to extract the key features of power objects from their names, the invention studies how well classical Chinese word segmentation methods (dictionary-based, statistics-based, and so on) recognize words in power business scenarios, and evaluates them using key indexes such as synonym recognition rate, ambiguity recognition rate, and new word recognition rate. Building on classical Chinese word segmentation and with reference to current mainstream research directions, the invention proposes two methods: an improved Trie index tree oriented to power business scenarios, and a double-array Trie Chinese word segmentation method oriented to power business scenarios.
The classical Chinese word segmentation method relies on a machine-readable dictionary: every segmentation pass goes through a vocabulary, that is, the segmentation dictionary, and involves little knowledge of the language itself, such as morphology, semantics, or syntax. The dictionary lists vocabulary entries by category, and the number of entries, the choice of entries, and the organization of the dictionary all directly affect the final segmentation result.
The basic idea of the classical method is as follows. First build a dictionary, i.e. the segmentation dictionary, containing as many of the words that may occur as possible. Given a Chinese character string s to be segmented, take a substring of s according to a chosen principle (forward or reverse). If the substring matches an entry in the dictionary, it is a word and is cut off, and the remainder continues to be segmented until it is empty; otherwise the substring is not a word, and another substring of s is taken and matched. By scanning direction, the classical method is divided into forward matching and reverse matching; by which length is matched first, into maximum (longest) matching and minimum (shortest) matching; and by whether it is combined with part-of-speech tagging, into pure segmentation and integrated segmentation-and-tagging methods.
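As an illustration of the classical approach, the following is a minimal sketch of forward maximum (longest) matching. The function name `forward_max_match`, the fallback of emitting an unmatched character as a single-character word, and the sample dictionary are assumptions for illustration rather than details taken from the patent.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text left to right, always taking the longest dictionary entry."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink by one character on failure.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan can move on.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical dictionary and object name.
dictionary = {"城东", "变电站", "线路"}
result = forward_max_match("城东变电站线路", dictionary)
```

Reverse matching follows the same pattern but scans from the end of the string, and minimum matching tries the shortest candidate first.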
The present invention selects different scenarios, namely the name matching of distribution line names across the dispatching, operation inspection, and marketing systems, the name matching of substations across the dispatching and marketing systems, and the name matching of distribution station areas across the power inspection and marketing systems, performs feature extraction with different classical Chinese word segmentation methods, and verifies how well the segmented features represent the objects. The research work includes: sample data extraction, training and validation set setup, implementation of the segmentation algorithms, Chinese word segmentation feature extraction, and evaluation of the segmentation feature indexes.
A Trie index tree is a key tree represented as a multi-linked list of a tree and consists of two parts, the Trie index tree nodes and the Trie index mechanism. Its tree structure expresses the Chinese dictionary and the covering and priority-matching relationships among the entries in the dictionary. When used for segmentation, the sentence to be split only needs to be matched character by character along the tree links, and the length of the word being looked up does not need to be predicted.
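Character-by-character matching along a Trie can be sketched with nested dictionaries. This is a simplified stand-in for the multi-linked-list representation described above; the `END` marker, the helper names, and the sample entries are hypothetical.

```python
END = "$end"  # hypothetical marker key flagging that a word ends at this node


def build_trie(words):
    """Build a nested-dict Trie in which each key is one character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_match(trie, text, start=0):
    """Return the length of the longest dictionary word starting at text[start]."""
    node, best = trie, 0
    for offset, ch in enumerate(text[start:], 1):
        node = node.get(ch)
        if node is None:
            break
        if END in node:
            best = offset
    return best


def segment(trie, text):
    """Split text by repeated longest match; unknown characters stand alone."""
    words, i = [], 0
    while i < len(text):
        size = longest_match(trie, text, i) or 1
        words.append(text[i:i + size])
        i += size
    return words


trie = build_trie(["配电", "配电台区", "台区"])
```

The walk stops as soon as no child exists for the next character, so the word length never has to be guessed in advance.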
Because two-character words are numerous in Chinese, the dictionary index mechanism of the Trie index tree is improved: the first 2 characters are hash-indexed one by one, the remaining strings are stored in sorted order, and queries use character-by-character matching. This amounts to implementing words of up to 2 characters with the Trie index tree mechanism and organizing the remainder of longer words of 3 or more characters as a linear table, which avoids deep searches and raises segmentation speed without increasing the maintenance complexity of the existing typical dictionary mechanism.
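One way to read this structure is a two-level hash on the first and second characters, with the remainders of longer words kept in a sorted list. The sketch below follows that reading; the class name `TwoCharHashDictionary`, its field names, and the choice to ignore single-character entries are assumptions, not details given in the patent.

```python
from bisect import bisect_left


class TwoCharHashDictionary:
    """Hash index on the first two characters, sorted list for the remainder."""

    def __init__(self, words):
        self.index = {}  # first char -> second char -> entry
        for word in words:
            if len(word) < 2:
                continue  # single characters are left to the caller's fallback
            entry = self.index.setdefault(word[0], {}).setdefault(
                word[1], {"is_word": False, "tails": []})
            if len(word) == 2:
                entry["is_word"] = True
            else:
                entry["tails"].append(word[2:])
        for second_level in self.index.values():
            for entry in second_level.values():
                entry["tails"] = sorted(set(entry["tails"]))

    def contains(self, word):
        """Exact lookup: two hash steps, then binary search in the sorted tails."""
        if len(word) < 2:
            return False
        entry = self.index.get(word[0], {}).get(word[1])
        if entry is None:
            return False
        if len(word) == 2:
            return entry["is_word"]
        tails = entry["tails"]
        pos = bisect_left(tails, word[2:])
        return pos < len(tails) and tails[pos] == word[2:]

    def longest_match(self, text, start=0):
        """Length of the longest entry starting at text[start], or 0 if none."""
        entry = self.index.get(text[start:start + 1], {}).get(text[start + 1:start + 2])
        if entry is None:
            return 0
        best = 2 if entry["is_word"] else 0
        for tail in entry["tails"]:  # linear scan of the remainder table
            if len(tail) + 2 > best and text.startswith(tail, start + 2):
                best = len(tail) + 2
        return best


lexicon = TwoCharHashDictionary(["变电", "变电站", "变电所", "配电台区"])
```

Lookups never descend more than two hash levels, which reflects the stated goal of avoiding deep searches for long words.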
On the basis of the classical Chinese word segmentation method, the present invention studies the method for building the improved Trie index tree, the method for maintaining it, and the application method and feature extraction effect of the improved Trie index tree in typical power business scenarios.
The double-array Trie is a variant of the Trie tree, a data structure proposed to improve space utilization while preserving the retrieval speed of the Trie tree. In essence it is a deterministic finite automaton: each node represents one state of the automaton, state transitions are made according to the input variable, and a query finishes when an end state is reached or no transition is possible. The Trie tree is expressed with two linear arrays (base and check); each node of the Trie tree is represented by the two array elements with the same subscript, where the base array determines state transitions and the check array checks the correctness of a transition.
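The role of the base and check arrays can be shown with a small construction. The sketch below uses the common convention that a transition from state s on character code c goes to t = base[s] + c and is valid only if check[t] == s. The first-fit search for base values, the terminal symbol, and the class name `DoubleArrayTrie` are simplifying assumptions and are not taken from the patent.

```python
class DoubleArrayTrie:
    """Simplified double-array Trie: t = base[s] + code(c), valid if check[t] == s."""

    TERMINAL = "\0"  # assumed end-of-word symbol appended to every entry

    def __init__(self, words):
        alphabet = sorted({ch for word in words for ch in word})
        self.code = {self.TERMINAL: 1}
        self.code.update({ch: i + 2 for i, ch in enumerate(alphabet)})
        # Build an ordinary nested-dict Trie first, then pack it into arrays.
        root = {}
        for word in words:
            node = root
            for ch in word + self.TERMINAL:
                node = node.setdefault(ch, {})
        self.base = [0]
        self.check = [0]  # -1 marks a free slot; slot 0 is the root state
        stack = [(root, 0)]
        while stack:
            node, state = stack.pop()
            if not node:
                continue
            codes = [self.code[ch] for ch in node]
            b = 1
            # First-fit: smallest base whose target slots are all free.
            while not all(self._free(b + c) for c in codes):
                b += 1
            self.base[state] = b
            for ch, child in node.items():
                t = b + self.code[ch]
                self._grow(t)
                self.check[t] = state
                stack.append((child, t))

    def _grow(self, index):
        while len(self.base) <= index:
            self.base.append(0)
            self.check.append(-1)

    def _free(self, t):
        return t >= len(self.check) or self.check[t] == -1

    def _step(self, state, ch):
        """Follow one transition; return the next state or -1 if it is invalid."""
        c = self.code.get(ch)
        if c is None:
            return -1
        t = self.base[state] + c
        if t < len(self.check) and self.check[t] == state:
            return t
        return -1

    def contains(self, word):
        state = 0
        for ch in word + self.TERMINAL:
            state = self._step(state, ch)
            if state < 0:
                return False
        return True


dat = DoubleArrayTrie(["变电", "变电站", "台区"])
```

Each lookup step costs one addition and one comparison, and the whole Trie lives in two flat integer arrays instead of per-node child tables, which is where the space saving comes from.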
On the basis of the improved Trie index tree Chinese word segmentation method, the present invention studies the method for building the double-array Trie index tree, the method for maintaining it, and the application method and feature extraction effect of the double-array Trie index tree in typical power business scenarios.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for Chinese word segmentation in a power grid operation and distribution system, characterized in that the method comprises the steps of:
Step 1: establishing a power grid operation and distribution word segmentation dictionary;
Step 2: selecting the word segmentation dictionary corresponding to a preset scenario;
Step 3: hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary of step 2;
Step 4: arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary of step 2;
Step 5: extracting sample data to form a big-data training set and validation set;
Step 6: evaluating the word segmentation feature indexes.
2. The method for Chinese word segmentation in a power grid operation and distribution system according to claim 1, characterized in that step 2 specifically comprises: selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems; selecting the name matching of substations across the dispatching and marketing systems; and selecting the name matching of distribution station areas across the power inspection and marketing systems.
3. The method for Chinese word segmentation in a power grid operation and distribution system according to claim 1, characterized in that in the method each node is represented by two array elements with the same subscript, namely an element of an array for determining state transitions and an element of an array for checking the correctness of a transition.
4. The method for Chinese word segmentation in a power grid operation and distribution system according to claim 1, characterized in that the word segmentation feature indexes include the accuracy rate (precision) and the recall rate, the accuracy rate being calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
and the recall rate being calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
5. A system for Chinese word segmentation in a power grid operation and distribution system, characterized in that the system comprises:
a dictionary establishing module, for establishing a power grid operation and distribution word segmentation dictionary;
a scenario selection module, for selecting the word segmentation dictionary corresponding to a preset scenario;
a Trie node index module, for hash-indexing the first 2 characters of the data to be processed, one by one, against the word segmentation dictionary selected by the scenario selection module;
a Trie mechanism index module, for arranging the remaining character string of the data to be processed in a preset order, and matching the arranged data character by character against the word segmentation dictionary selected by the scenario selection module;
a set generation module, for extracting sample data to form a big-data training set and validation set;
a feature index evaluation module, for evaluating the word segmentation feature indexes.
6. The system for Chinese word segmentation in a power grid operation and distribution system according to claim 5, characterized in that the scenario selection module comprises:
a distribution line selection submodule, for selecting the name matching of distribution line names across the dispatching, operation inspection, and marketing systems;
a substation selection submodule, for selecting the name matching of substations across the dispatching and marketing systems;
a distribution station area selection submodule, for selecting the name matching of distribution station areas across the power inspection and marketing systems.
7. The system for Chinese word segmentation in a power grid operation and distribution system according to claim 5, characterized in that the set generation module comprises:
a training set generation submodule, for extracting sample data to form a big-data training set;
a validation set generation submodule, for extracting sample data to form a big-data validation set.
8. The system for Chinese word segmentation in a power grid operation and distribution system according to claim 5, characterized in that the feature index evaluation module comprises:
an accuracy rate calculation submodule, in which the accuracy rate (precision) is calculated as
$$\text{accuracy rate} = \frac{b}{a}$$
where b is the number of correctly segmented words and a is the total number of words produced by segmentation;
a recall rate calculation submodule, in which the recall rate is calculated as
$$\text{recall rate} = \frac{b}{n}$$
where b is the number of correctly segmented words and n is the total number of words that should have been produced.
CN201811417689.2A 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system Active CN109522559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811417689.2A CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811417689.2A CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Publications (2)

Publication Number Publication Date
CN109522559A (en) 2019-03-26
CN109522559B (en) 2023-03-31

Family

ID=65793677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811417689.2A Active CN109522559B (en) 2018-11-26 2018-11-26 Method and system for Chinese word segmentation in power grid operation and distribution system

Country Status (1)

Country Link
CN (1) CN109522559B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1069493A (en) * 1996-08-29 1998-03-10 Matsushita Electric Ind Co Ltd Dictionary preparation device and word segmentation device
CN102411568A (en) * 2010-09-20 2012-04-11 苏州同程旅游网络科技有限公司 Chinese word segmentation method based on travel industry feature word stock
WO2015032120A1 (en) * 2013-09-03 2015-03-12 盈世信息科技(北京)有限公司 Method and device for filtering spam mail based on short text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
夏同飞等: "基于互信息改进算法的新词发现对中文分词系统改进", 《电子元器件与信息技术》 *

Also Published As

Publication number Publication date
CN109522559B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN109033307A (en) Word polyarch vector based on CRP cluster indicates and Word sense disambiguation method
CN104573028B (en) Realize the method and system of intelligent answer
CN110134946B (en) Machine reading understanding method for complex data
CN107562918A (en) A kind of mathematical problem knowledge point discovery and batch label acquisition method
CN110516256A (en) A kind of Chinese name entity extraction method and its system
CN108121700A (en) A kind of keyword extracting method, device and electronic equipment
CN104063387A (en) Device and method abstracting keywords in text
CN110377901B (en) Text mining method for distribution line trip filling case
CN108829658A (en) The method and device of new word discovery
CN111274814B (en) Novel semi-supervised text entity information extraction method
CN109857457B (en) Function level embedding representation method in source code learning in hyperbolic space
CN103970730A (en) Method for extracting multiple subject terms from single Chinese text
CN102662923A (en) Entity instance leading method based on machine learning
CN102214166A (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN103813279A (en) Junk short message detecting method and device
CN103544266A (en) Method and device for generating search suggestion words
CN113592037B (en) Address matching method based on natural language inference
CN103646112A (en) Dependency parsing field self-adaption method based on web search
CN103294820B (en) WEB page classifying method and system based on semantic extension
CN108197175A (en) The treating method and apparatus of technical supervision data, storage medium, processor
CN109614626A (en) Keyword Automatic method based on gravitational model
CN113761890A (en) BERT context sensing-based multi-level semantic information retrieval method
CN112749265A (en) Intelligent question-answering system based on multiple information sources
CN102768679A (en) Searching method and searching system
CN114491081A (en) Electric power data tracing method and system based on data blood relationship graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant