CN110413998A - An adaptive Chinese word segmentation method for the power industry, and system and medium thereof - Google Patents

An adaptive Chinese word segmentation method for the power industry, and system and medium thereof

Info

Publication number
CN110413998A
Authority
CN
China
Prior art keywords
word segmentation
candidate
text
word
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910638948.2A
Other languages
Chinese (zh)
Other versions
CN110413998B (en
Inventor
张云翔
饶竹一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Bureau Co Ltd
Priority to CN201910638948.2A
Publication of CN110413998A
Application granted
Publication of CN110413998B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an adaptive Chinese word segmentation method for the power industry, and a system and medium thereof. The method comprises: S1, obtaining a candidate text, the candidate text being a short sentence or paragraph to be segmented; S2, splitting the candidate text to obtain multiple candidate text sentences; S3, segmenting each candidate text sentence to obtain one or more word segments; S4, replacing the word segments in the candidate text, one by one, with words of identical meaning and performing semantic discrimination, returning to S3 if ambiguity arises and retaining the word segment as a candidate word segment if it does not; S5, obtaining one or more power-domain technical terms semantically similar to the candidate word segment, calculating the similarity between the candidate word segment and those terms, and determining the final word segment according to the similarity; S6, sorting the final word segments by the frequency with which they occur in the candidate text and outputting them.

Description

An adaptive Chinese word segmentation method for the power industry, and system and medium thereof
Technical field
The present invention relates to the technical field of power equipment data processing, and in particular to an adaptive Chinese word segmentation method for the power industry, a corresponding system, and a computer-readable storage medium.
Background
In recent years, as networks have become increasingly widespread, the volume of text on the Internet has grown steadily and information resources have kept increasing. To retrieve and mine valuable information from this mass of resources, Internet companies have invested heavily in natural language processing. Chinese word segmentation is the basis and prerequisite of natural language processing; it plays an important role in information processing tasks such as information retrieval, machine translation, and information filtering, and is both a key technology and a difficult point of information processing. To date, the State Grid Corporation has established a large number of data management systems, and the volume of business data is very large.
This gives rise to the following technical problem: because each business department and each business system defines data under different rules, the same source data can appear inconsistently (for example, under different names) in different business systems. One piece of data thus has multiple sources, which makes it difficult to keep data consistent across business systems.
Summary of the invention
An object of the present invention is to provide an adaptive Chinese word segmentation method for the power industry, a corresponding system, and a computer-readable storage medium that solve the above technical problem.
To achieve this object, according to a first aspect of the present invention, an embodiment of the present invention provides an adaptive Chinese word segmentation method for the power industry, comprising the following steps:
Step S1: obtain a candidate text, the candidate text being a short sentence or paragraph to be segmented;
Step S2: split the candidate text to obtain multiple candidate text sentences;
Step S3: segment each candidate text sentence to obtain one or more word segments;
Step S4: replace the word segments in the candidate text, one by one, with words of identical meaning and perform semantic discrimination; if ambiguity arises between the text before and after replacement, return to step S3; if no ambiguity arises, retain the word segment as a candidate word segment;
Step S5: obtain one or more power-domain technical terms semantically similar to the candidate word segment, calculate the similarity between the candidate word segment and the one or more power-domain technical terms, and determine the final word segment according to the similarity;
Step S6: sort the final word segments by the frequency with which they occur in the candidate text, and output them.
Preferably, step S2 comprises:
separating the candidate text at its punctuation marks and spaces to obtain multiple text portions, and removing the punctuation marks and spaces from the text portions to obtain multiple text sentences to be filtered;
judging whether each character in a text sentence to be filtered is a power-industry technical word segment; if so, extracting all identical characters in the text sentence and segmenting them into words; if not, extracting all identical characters in the text sentence and discarding them; wherein segmenting into words means segmenting the character together with the text that follows it to obtain a candidate text sentence.
Preferably, step S3 comprises:
extracting, as word segments, the words in the candidate text sentence that correspond to words in a dictionary database; wherein the words in the dictionary database are words in a segmentation dictionary dedicated to the power domain.
Preferably, step S4 comprises:
when one candidate text sentence corresponds to multiple candidate word segments, calculating the similarity value between each candidate word segment in the candidate text sentence and the one or more power-domain technical terms, and accumulating these values to obtain the similarity value of that candidate word segment;
selecting the candidate word segment with the highest similarity value as the final word segment of the candidate text sentence.
Preferably, step S6 comprises:
outputting the sorted final word segments separated by spaces, displaying the top ten of the sorted results with emphasis, and hiding the other final word segments.
According to a second aspect of the present invention, an embodiment of the present invention provides an adaptive Chinese word segmentation system for the power industry, comprising:
a text acquisition unit, configured to obtain a candidate text, the candidate text being a short sentence or paragraph to be segmented;
a text splitting unit, configured to split the candidate text to obtain multiple candidate text sentences;
a segmentation unit, configured to segment each candidate text sentence to obtain one or more word segments;
a first segment screening unit, configured to replace the word segments in the candidate text, one by one, with words of identical meaning and perform semantic discrimination; if ambiguity arises between the text before and after replacement, return to step S3; if no ambiguity arises, retain the word segment as a candidate word segment;
a second segment screening unit, configured to obtain one or more power-domain technical terms semantically similar to the candidate word segment, calculate the similarity between the candidate word segment and the one or more power-domain technical terms, and determine the final word segment according to the similarity;
an output unit, configured to sort the final word segments by the frequency with which they occur in the candidate text, and output them.
Preferably, the text splitting unit comprises:
a first splitting unit, configured to separate the candidate text at its punctuation marks and spaces to obtain multiple text portions, and to remove the punctuation marks and spaces from the text portions to obtain multiple text sentences to be filtered;
a second splitting unit, configured to judge whether each character in a text sentence to be filtered is a power-industry technical word segment; if so, to extract all identical characters in the text sentence and segment them into words; if not, to extract all identical characters in the text sentence and discard them; wherein segmenting into words means segmenting the character together with the text that follows it to obtain a candidate text sentence.
Preferably, the segmentation unit is specifically configured to extract, as word segments, the words in the candidate text sentence that correspond to words in a dictionary database; wherein the words in the dictionary database are words in a segmentation dictionary dedicated to the power domain;
the output unit comprises:
a similarity calculation unit, configured to calculate, when one candidate text sentence corresponds to multiple candidate word segments, the similarity value between each candidate word segment in the candidate text sentence and the one or more power-domain technical terms, and to accumulate these values to obtain the similarity value of that candidate word segment;
a final segment determination unit, configured to select the candidate word segment with the highest similarity value as the final word segment of the candidate text sentence.
Preferably, the output unit comprises:
a display unit, configured to output the sorted final word segments separated by spaces, display the top ten of the sorted results with emphasis, and hide the other final word segments.
According to a third aspect of the present invention, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the adaptive Chinese word segmentation method for the power industry.
In the embodiments of the present invention, a segmentation dictionary dedicated to the power domain is built around the characteristics of electric power data. Candidate text sentences are split against the words in this dictionary and checked for ambiguity to obtain candidate word segments, and the final word segment is then determined from the similarity between each candidate word segment and similar words in the dictionary, which improves segmentation accuracy. Matching and analyzing data across business systems on this basis can improve working efficiency and the efficiency with which data is used.
Other features and advantages of the present invention are set out in the description that follows; in part they will be apparent from the description or may be learned by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings. Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an adaptive Chinese word segmentation method for the power industry in embodiment one of the present invention.
Fig. 2 is a schematic diagram of an adaptive Chinese word segmentation system for the power industry in embodiment two of the present invention.
Detailed description of the embodiments
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically noted.
In addition, numerous specific details are given in the embodiments below to better illustrate the present invention. A person skilled in the art will understand that the present invention can be practiced without certain of these details. In some instances, means well known to a person skilled in the art are not described in detail, so as to highlight the gist of the present invention.
As shown in Fig. 1, embodiment one of the present invention provides an adaptive Chinese word segmentation method for the power industry, comprising the following steps:
Step S1: obtain a candidate text, the candidate text being a short sentence or paragraph to be segmented;
Step S2: split the candidate text to obtain multiple candidate text sentences;
Step S3: segment each candidate text sentence to obtain one or more word segments;
Step S4: replace the word segments in the candidate text, one by one, with words of identical meaning and perform semantic discrimination; if ambiguity arises between the text before and after replacement, return to step S3; if no ambiguity arises, retain the word segment as a candidate word segment;
Step S5: obtain one or more power-domain technical terms semantically similar to the candidate word segment, calculate the similarity between the candidate word segment and the one or more power-domain technical terms, and determine the final word segment according to the similarity;
Step S6: sort the final word segments by the frequency with which they occur in the candidate text, and output them.
Step S2 specifically comprises:
separating the candidate text at its punctuation marks and spaces to obtain multiple text portions, and removing the punctuation marks and spaces from the text portions to obtain multiple text sentences to be filtered;
judging whether each character in a text sentence to be filtered is a power-industry technical word segment; if so, extracting all identical characters in the text sentence and segmenting them into words; if not, extracting all identical characters in the text sentence and discarding them; wherein segmenting into words means segmenting the character together with the text that follows it to obtain a candidate text sentence.
Specifically, for one text sentence to be filtered, the first character is extracted first, and it is judged whether this first character is a power-industry technical word segment; if so, all identical characters in the text sentence are extracted and segmented into words; if not, all identical characters in the text sentence are extracted and discarded. Subsequent characters are then judged in the same way until the last character of the text sentence to be filtered has been taken out, which completes the filtering of the candidate text sentence. Here, using the constructed power-industry terminology table and an everyday-vocabulary segmentation dictionary, each character taken from the text sentence is compared against the power-industry vocabulary to judge whether it is a power-industry word segment.
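As an illustration only, the splitting and filtering of step S2 might be sketched in Python as follows. The function names, the punctuation set, and the rule that a character is kept when it occurs in at least one power-industry term are assumptions made for this sketch; the description above does not specify them.

```python
import re
from typing import List, Set

# Assumption: an illustrative set of ASCII and Chinese punctuation marks;
# the description does not enumerate which marks act as separators.
_SEPARATORS = re.compile(r"[\s,.;:!?，。；：！？、]+")


def split_candidate_text(text: str) -> List[str]:
    """Split the candidate text at punctuation and whitespace.

    The separators themselves are dropped, so the returned text sentences
    contain no punctuation marks or spaces.
    """
    return [part for part in _SEPARATORS.split(text) if part]


def filter_characters(sentence: str, power_terms: Set[str]) -> str:
    """Keep only characters that occur in at least one power-industry term.

    This is one possible reading of the character-by-character filtering,
    not a definition taken from the description.
    """
    allowed = {ch for term in power_terms for ch in term}
    return "".join(ch for ch in sentence if ch in allowed)


if __name__ == "__main__":
    sentences = split_candidate_text("变压器 过载，线路跳闸。")
    print(sentences)
    print(filter_characters("该变压器", {"变压器"}))
```

Building the set of allowed characters once means each character of a sentence is checked with a single set lookup, which matches the stated aim of not comparing every occurrence of a repeated character separately.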
Step S3 comprises:
extracting, as word segments, the words in the candidate text sentence that correspond to words in a dictionary database; wherein the words in the dictionary database are words in a segmentation dictionary dedicated to the power domain.
Specifically, both the words that correspond to words in the dictionary database and semantically similar words are extracted; one candidate text sentence may yield zero or more word segments.
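The dictionary lookup of step S3 could, for example, be sketched with forward maximum matching, a common dictionary-based segmentation technique. The description does not name a matching strategy, so the choice of forward maximum matching, the function name, and the sample dictionary are assumptions of this sketch.

```python
from typing import List, Set


def extract_dictionary_words(sentence: str, dictionary: Set[str]) -> List[str]:
    """Extract dictionary words from a sentence by forward maximum matching.

    At each position the longest dictionary word starting there is taken;
    characters that start no dictionary word are skipped. The result may
    be empty, i.e. a sentence can yield zero word segments.
    """
    if not dictionary:
        return []
    max_len = max(len(word) for word in dictionary)
    found: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest possible window first, then shrink it.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in dictionary:
                found.append(piece)
                i += size
                break
        else:
            i += 1  # no dictionary word starts at this character
    return found


if __name__ == "__main__":
    # Hypothetical power-domain dictionary entries.
    power_dictionary = {"变压器", "变压", "过载"}
    print(extract_dictionary_words("主变压器过载", power_dictionary))
```

Preferring the longest match keeps a compound term such as 变压器 intact instead of splitting off its prefix 变压.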
Step S4 comprises:
when one candidate text sentence corresponds to multiple candidate word segments, calculating the similarity value between each candidate word segment in the candidate text sentence and the one or more power-domain technical terms, and accumulating these values to obtain the similarity value of that candidate word segment;
selecting the candidate word segment with the highest similarity value as the final word segment of the candidate text sentence.
Specifically, one candidate text sentence may correspond to multiple candidate word segments. In this step the candidate word segments are screened by similarity value, so that each candidate text sentence finally outputs only one word segment, which reduces the segmentation error rate.
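The accumulate-and-select screening described above might look like the following sketch. The description does not state which similarity measure is used; character-level Jaccard similarity is substituted here purely for illustration, and the function names are hypothetical.

```python
from typing import List, Optional


def char_jaccard(a: str, b: str) -> float:
    """Character-level Jaccard similarity (an assumed stand-in measure)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def pick_final_segment(candidates: List[str],
                       power_terms: List[str]) -> Optional[str]:
    """Return the candidate whose accumulated similarity is highest.

    For each candidate, its similarity to every power-domain term is
    summed; the candidate with the largest sum is selected. Ties are
    resolved in favor of the earlier candidate. Returns None when there
    are no candidates.
    """
    if not candidates:
        return None
    scores = {
        cand: sum(char_jaccard(cand, term) for term in power_terms)
        for cand in candidates
    }
    return max(candidates, key=scores.__getitem__)


if __name__ == "__main__":
    print(pick_final_segment(["变压", "变压器"], ["变压器", "电压互感器"]))
```

Any similarity function with the same signature, for example one based on word embeddings, could replace `char_jaccard` without changing the selection logic.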
Step S6 comprises:
outputting the sorted final word segments separated by spaces, displaying the top ten of the sorted results with emphasis, and hiding the other final word segments.
Specifically, in this embodiment each computed segmentation result is sorted by frequency of occurrence, and the sorted results are output separated by spaces. The top ten are displayed with emphasis and the remaining results are hidden; when the user needs to view them, clicking the corresponding button displays the remaining segmentation results. All segmentation results are also output to a display device as a bar chart and shown to the user.
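The ordering and grouping of step S6 can be sketched as below. The function name and return shape are assumptions of this sketch, and the display side (emphasis, the button that reveals hidden results, the bar chart) is left out.

```python
from typing import List, Tuple


def rank_segments(final_segments: List[str], candidate_text: str,
                  top_n: int = 10) -> Tuple[str, List[str], List[str]]:
    """Sort final word segments by their frequency in the candidate text.

    Returns the space-separated output line, the segments to emphasize
    (the first top_n), and the segments to hide (the rest).
    """
    # Remove duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(final_segments))
    # sorted() is stable, so equally frequent segments keep their order.
    ordered = sorted(unique, key=candidate_text.count, reverse=True)
    return " ".join(ordered), ordered[:top_n], ordered[top_n:]


if __name__ == "__main__":
    text = "变压器过载，线路过载，变压器跳闸，过载"
    line, emphasized, hidden = rank_segments(["变压器", "过载", "跳闸"], text, top_n=2)
    print(line)
    print(emphasized, hidden)
```

The default `top_n=10` mirrors the top-ten rule in the text; the example passes a smaller value only to make the split between emphasized and hidden results visible.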
In the embodiments of the present invention, segmentation data is drawn from the segmentation dictionary dedicated to the power domain. The extracted candidate text is separated at punctuation marks and spaces, split into multiple text sentences, and output. This preprocessing reduces the segmentation interference caused by punctuation and spaces in the text and improves preprocessing efficiency, addressing the efficiency problem of existing text processing. One character at a time is taken from each split text sentence and compared to judge whether it is a power-industry word segment, until the last character of the text sentence has been taken out. Because all identical characters are taken out together, not every character needs to be compared, which reduces the comparison workload and makes the character judgment more efficient. The filtered candidate text is then segmented, and the resulting word segments are checked for ambiguity until none remains. This reduces cases in which the text is still ambiguous after segmentation, avoids misleading the user who views the results, and improves segmentation accuracy. The weight scores of all segmentation results are calculated and accumulated, the result with the largest value is selected, and the results are sorted by frequency of occurrence for output. The word segments obtained from the text are thus output in order, which makes them more intuitive and better organized to view and makes the user's reasoning clearer, improving working efficiency and the efficiency with which data is used.
As shown in Fig. 2, embodiment two of the present invention provides an adaptive Chinese word segmentation system for the power industry, comprising:
a text acquisition unit 1, configured to obtain a candidate text, the candidate text being a short sentence or paragraph to be segmented;
a text splitting unit 2, configured to split the candidate text to obtain multiple candidate text sentences;
a segmentation unit 3, configured to segment each candidate text sentence to obtain one or more word segments;
a first segment screening unit 4, configured to replace the word segments in the candidate text, one by one, with words of identical meaning and perform semantic discrimination; if ambiguity arises between the text before and after replacement, return to step S3; if no ambiguity arises, retain the word segment as a candidate word segment;
a second segment screening unit 5, configured to obtain one or more power-domain technical terms semantically similar to the candidate word segment, calculate the similarity between the candidate word segment and the one or more power-domain technical terms, and determine the final word segment according to the similarity;
an output unit 6, configured to sort the final word segments by the frequency with which they occur in the candidate text, and output them.
The text splitting unit 2 comprises:
a first splitting unit, configured to separate the candidate text at its punctuation marks and spaces to obtain multiple text portions, and to remove the punctuation marks and spaces from the text portions to obtain multiple text sentences to be filtered;
a second splitting unit, configured to judge whether each character in a text sentence to be filtered is a power-industry technical word segment; if so, to extract all identical characters in the text sentence and segment them into words; if not, to extract all identical characters in the text sentence and discard them; wherein segmenting into words means segmenting the character together with the text that follows it to obtain a candidate text sentence.
The segmentation unit 3 is specifically configured to extract, as word segments, the words in the candidate text sentence that correspond to words in a dictionary database; wherein the words in the dictionary database are words in a segmentation dictionary dedicated to the power domain;
the output unit 6 comprises:
a similarity calculation unit, configured to calculate, when one candidate text sentence corresponds to multiple candidate word segments, the similarity value between each candidate word segment in the candidate text sentence and the one or more power-domain technical terms, and to accumulate these values to obtain the similarity value of that candidate word segment;
a final segment determination unit, configured to select the candidate word segment with the highest similarity value as the final word segment of the candidate text sentence.
The output unit 6 further comprises:
a display unit, configured to output the sorted final word segments separated by spaces, display the top ten of the sorted results with emphasis, and hide the other final word segments.
It should be noted that the system of embodiment two corresponds to the method of embodiment one and is used to implement that method; therefore, any content of the system of embodiment two not described here can be found in the description of the method of embodiment one and is not repeated.
It should also be understood that the method of embodiment one and the system of embodiment two can be implemented in many ways, including as a process, an apparatus, or a system. The method described herein can be implemented in part by program instructions that direct a processor to perform the method, with the instructions recorded on a non-transitory computer-readable storage medium such as a hard disk drive, a floppy disk, an optical disc (for example a compact disc (CD) or a digital versatile disc (DVD)), or flash memory. In some embodiments, the program instructions can be stored remotely and transmitted over a network via an optical or electronic communication link.
Embodiment three of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the adaptive Chinese word segmentation method for the power industry described in embodiment one.
Various embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to a person of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An adaptive Chinese word segmentation method for the power industry, characterized by comprising the following steps:
Step S1: obtaining a candidate text, the candidate text being a short sentence or paragraph to be segmented;
Step S2: splitting the candidate text to obtain multiple candidate text sentences;
Step S3: segmenting each candidate text sentence to obtain one or more word segments;
Step S4: replacing the word segments in the candidate text, one by one, with words of identical meaning and performing semantic discrimination; if ambiguity arises between the text before and after replacement, returning to step S3; if no ambiguity arises, retaining the word segment as a candidate word segment;
Step S5: obtaining one or more power-domain technical terms semantically similar to the candidate word segment, calculating the similarity between the candidate word segment and the one or more power-domain technical terms, and determining the final word segment according to the similarity;
Step S6: sorting the final word segments by the frequency with which they occur in the candidate text, and outputting them.
2. The adaptive Chinese word segmentation method for the power industry according to claim 1, characterized in that step S2 comprises:
separating the candidate text at its punctuation marks and spaces to obtain multiple text portions, and removing the punctuation marks and spaces from the text portions to obtain multiple text sentences to be filtered;
judging whether each character in a text sentence to be filtered is a power-industry technical word segment; if so, extracting all identical characters in the text sentence and segmenting them into words; if not, extracting all identical characters in the text sentence and discarding them; wherein segmenting into words means segmenting the character together with the text that follows it to obtain a candidate text sentence.
3. The adaptive Chinese word segmentation method for the power industry according to claim 1, characterized in that step S3 comprises:
extracting, as word segments, the words in the candidate text sentence that correspond to words in a dictionary database; wherein the words in the dictionary database are words in a segmentation dictionary dedicated to the power domain.
4. The adaptive Chinese word segmentation method for the power industry according to claim 1, characterized in that step S4 comprises:
when one candidate text sentence corresponds to multiple candidate word segments, calculating the similarity value between each candidate word segment in the candidate text sentence and the one or more power-domain technical terms, and accumulating these values to obtain the similarity value of that candidate word segment;
selecting the candidate word segment with the highest similarity value as the final word segment of the candidate text sentence.
5. The adaptive Chinese word segmentation method for the power industry according to claim 4, characterized in that step S6 comprises:
outputting the sorted final word segments separated by spaces, displaying the top ten of the sorted results with emphasis, and hiding the other final word segments.
6. An adaptive Chinese word segmentation system for the power industry, characterized by comprising:
A text acquiring unit, for obtaining candidate text terms, the candidate text terms being a short sentence or paragraph to be segmented;
A text segmentation unit, for splitting the candidate text terms to obtain multiple candidate text sentences;
A participle unit, for cutting each candidate text sentence to obtain one or more participles;
A first participle screening unit, for replacing the participles in the candidate text terms, one by one, with vocabulary having the same meaning as the participle and performing semantic discrimination; if the text terms before and after replacement are ambiguous, returning to step S3; if the text terms before and after replacement are not ambiguous, retaining the participle as a candidate participle;
A second participle screening unit, for obtaining one or more power-domain specialized vocabulary items semantically similar to the candidate participle, calculating the similarity between the candidate participle and the one or more power-domain specialized vocabulary items, and determining the final participle according to the similarity;
An output unit, for sorting the final participles by the frequency with which each participle occurs in the candidate text terms and then outputting them.
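The frequency-based ordering performed by the output unit can be sketched as follows. The claim does not state the sort direction or how ties are handled; descending order with ties kept in their original order is an assumption here, and the function name `sort_by_frequency` is hypothetical.

```python
from typing import List


def sort_by_frequency(final_participles: List[str], text: str) -> List[str]:
    """Order distinct participles by how often each occurs in the input text."""
    unique = list(dict.fromkeys(final_participles))  # de-duplicate, keep order
    # str.count counts non-overlapping occurrences of each participle.
    counts = {w: text.count(w) for w in unique}
    # sorted() is stable, so participles with equal counts keep their input order.
    return sorted(unique, key=lambda w: counts[w], reverse=True)


# Hypothetical input text and final participles.
text = "变压器故障，变压器检修，断路器故障"
ordered = sort_by_frequency(["断路器", "变压器", "故障"], text)
```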
7. The adaptive Chinese word segmentation system for the power industry according to claim 6, characterized in that the text segmentation unit comprises:
A first cutting unit, for separating the candidate text terms at punctuation marks and spaces to obtain multiple text portions, and removing the punctuation marks and spaces from the multiple text portions to obtain multiple text sentences to be filtered;
A second cutting unit, for judging whether a character in each text sentence to be filtered is a power-industry professional participle; if so, extracting all identical characters in the text sentence and cutting them into words; if not, extracting all identical characters in the text sentence and discarding them; wherein cutting into a word means cutting the character together with the text following it to obtain a candidate text sentence.
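The behaviour of the first cutting unit can be sketched in Python as below. The claim does not enumerate the punctuation marks, so the delimiter set in `_DELIMS` is an assumption, and the function name `split_text` is hypothetical. The second cutting unit is not sketched because the claim does not give enough detail to implement it faithfully.

```python
import re
from typing import List

# Assumed delimiter set: whitespace plus common Chinese and ASCII punctuation.
_DELIMS = re.compile(r"[\s，。！？；：、,.!?;:]+")


def split_text(text: str) -> List[str]:
    """Split on punctuation and spaces, dropping the delimiters and empty parts."""
    return [part for part in _DELIMS.split(text) if part]


# Hypothetical input paragraph.
parts = split_text("主变压器跳闸，值班员 立即汇报。")
```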
8. The adaptive Chinese word segmentation system for the power industry according to claim 6, characterized in that the participle unit is specifically used for extracting, from the candidate text sentence, vocabulary that corresponds to vocabulary in a dictionary database as participles; wherein the vocabulary in the dictionary database is vocabulary from a dedicated power-domain word segmentation dictionary;
The output unit comprises:
A similarity calculating unit, for calculating, when one candidate text sentence corresponds to multiple candidate participles, the similarity value between each candidate participle in the candidate text sentence and one or more power-domain specialized vocabulary items, and accumulating these values to obtain the similarity value corresponding to that candidate participle;
A final participle determination unit, for selecting the candidate participle with the highest similarity value as the final participle of the candidate text sentence.
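The dictionary-based extraction performed by the participle unit can be illustrated with the following sketch. The claim only says that vocabulary matching the dictionary database is extracted; the plain substring lookup used here, the function name `dictionary_participles`, and the sample dictionary entries are assumptions for illustration.

```python
from typing import List, Set


def dictionary_participles(sentence: str, dictionary: Set[str]) -> List[str]:
    """Return every dictionary word found in the sentence, ordered by position."""
    # Sort key: first position in the sentence, then longer words first.
    hits = [(sentence.find(w), -len(w), w) for w in dictionary if w in sentence]
    return [w for _, _, w in sorted(hits)]


# Hypothetical entries of a power-domain word segmentation dictionary.
power_dict = {"变压器", "断路器", "主变压器"}
found = dictionary_participles("主变压器跳闸", power_dict)
```

Because overlapping dictionary words are all returned, one candidate text sentence can yield several participles, which is the situation the similarity calculating unit is described as resolving.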
9. The adaptive Chinese word segmentation system for the power industry according to claim 8, characterized in that the output unit comprises:
A display unit, for outputting the sorted final participles separated by spaces, and selecting the top ten after sorting for highlighted display, while the other final word segmentation results are hidden.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the adaptive Chinese word segmentation method for the power industry according to any one of claims 1 to 5.
CN201910638948.2A 2019-07-16 2019-07-16 Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof Active CN110413998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910638948.2A CN110413998B (en) 2019-07-16 2019-07-16 Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910638948.2A CN110413998B (en) 2019-07-16 2019-07-16 Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof

Publications (2)

Publication Number Publication Date
CN110413998A (en) 2019-11-05
CN110413998B (en) 2023-04-21

Family

ID=68361553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910638948.2A Active CN110413998B (en) 2019-07-16 2019-07-16 Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof

Country Status (1)

Country Link
CN (1) CN110413998B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079428A (en) * 2019-12-27 2020-04-28 出门问问信息科技有限公司 Word segmentation and industry dictionary construction method and device and readable storage medium
CN112257425A (en) * 2020-09-29 2021-01-22 国网天津市电力公司 Power data analysis method and system based on data classification model
CN112926320A (en) * 2021-03-24 2021-06-08 山东亿云信息技术有限公司 Text key content intelligent extraction method and system based on subject term optimization

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077275A (en) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 Method and device for performing word segmentation based on context
CN106844326A (en) * 2015-12-04 2017-06-13 北京国双科技有限公司 Method and device for obtaining words
CN107608968A (en) * 2017-09-22 2018-01-19 深圳市易图资讯股份有限公司 Chinese word segmentation method and device oriented to text big data
CN107918604A (en) * 2017-11-13 2018-04-17 彩讯科技股份有限公司 Chinese word segmentation method and device
CN109828981A (en) * 2017-11-22 2019-05-31 阿里巴巴集团控股有限公司 Data processing method and computing device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079428A (en) * 2019-12-27 2020-04-28 出门问问信息科技有限公司 Word segmentation and industry dictionary construction method and device and readable storage medium
CN111079428B (en) * 2019-12-27 2023-09-19 北京羽扇智信息科技有限公司 Word segmentation and industry dictionary construction method and device and readable storage medium
CN112257425A (en) * 2020-09-29 2021-01-22 国网天津市电力公司 Power data analysis method and system based on data classification model
CN112926320A (en) * 2021-03-24 2021-06-08 山东亿云信息技术有限公司 Text key content intelligent extraction method and system based on subject term optimization
CN112926320B (en) * 2021-03-24 2022-12-27 山东亿云信息技术有限公司 Text key content intelligent extraction method and system based on subject term optimization

Also Published As

Publication number Publication date
CN110413998B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN106649783B (en) Synonym mining method and device
US10558754B2 (en) Method and system for automating training of named entity recognition in natural language processing
US20150074112A1 (en) Multimedia Question Answering System and Method
CN110297988A (en) Hot topic detection method based on weighted LDA and improved Single-Pass clustering algorithm
CN104881458B (en) Method and device for labeling web page topics
CN110413998A (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN112148881A (en) Method and apparatus for outputting information
CN111861596A (en) Text classification method and device
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN107577713B (en) Text processing method based on electric power dictionary
CN109885641A (en) Method and system for Chinese full-text retrieval in a database
CN112052397A (en) User feature generation method and device, electronic equipment and storage medium
CN112784009A (en) Subject term mining method and device, electronic equipment and storage medium
CN110704638A (en) Clustering algorithm-based electric power text dictionary construction method
CN110413997A (en) New word discovery method and system for the power industry, and readable storage medium
CN111475607B (en) Web data clustering method based on Mashup service function feature representation and density peak detection
CN107291952B (en) Method and device for extracting meaningful strings
US20170140010A1 (en) Automatically Determining a Recommended Set of Actions from Operational Data
CN108733733B (en) Biomedical text classification method, system and storage medium based on machine learning
CN113779983B (en) Text data processing method and device, storage medium and electronic device
CN106933797B (en) Target information generation method and device
CN113221538B (en) Event library construction method and device, electronic equipment and computer readable medium
Wei et al. Automatic structuring of it problem ticket data for enhanced problem resolution
CN115905297B (en) Method, apparatus and medium for retrieving data
CN117150046B (en) Automatic task decomposition method and system based on context semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant