CN110704638A - Clustering algorithm-based electric power text dictionary construction method - Google Patents

Clustering algorithm-based electric power text dictionary construction method

Info

Publication number
CN110704638A
Authority
CN
China
Prior art keywords
text
word
clustering
electric power
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910940220.5A
Other languages
Chinese (zh)
Inventor
邓松
徐雨楠
朱博宇
付雄
岳东
吴新新
袁新雅
陈福林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Application filed by Nanjing Post and Telecommunication University
Priority to CN201910940220.5A
Publication of CN110704638A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a clustering algorithm-based electric power text dictionary construction method whose system mainly comprises four parts: a data classification preprocessor, a data word segmentation processor, a clustering processor, and a data processing operation core. The method is a strategic method mainly used for constructing a dictionary during text classification in the electric power field. With the model of the invention, the key phrases that represent the text types in power-field text can be found more accurately, and the dictionary is constructed from these key phrases.

Description

Clustering algorithm-based electric power text dictionary construction method
Technical Field
The invention relates to the field of data processing of power systems, in particular to a clustering algorithm-based electric power text dictionary construction method, which is mainly used for text data processing in the field of electric power.
Background
Power grid enterprises are asset-intensive, managing the health of power equipment is one of their core tasks, and using big data for scientific management is an inevitable trend. However, grid data is generally considered to be large in volume, varied in type, low in value density, and fast-changing, which makes it difficult to use. Low value density means that most of the data is normal grid data and only a small amount is abnormal. This severe imbalance degrades the mining performance of artificial intelligence methods based on machine learning, deep learning, and the like. Fortunately, electric power data comes in many types, and text data has a high value density because "important things are often recorded", so it has good mining prospects; electric power text mining is therefore one of the key technologies of interest for power equipment health management. Existing data mining for power grids has been researched and applied mainly on structured grid data, whereas research on text within unstructured grid data is largely unexplored, and so far there are almost no research reports on Chinese text processing for power grids. No technical approach or solution for acquiring electric power text information is yet available, and a detailed electric power corpus cannot be constructed. It is therefore necessary to construct a dictionary for the power grid field.
During equipment operation and maintenance management, power grid enterprises record information such as equipment faults, defects, overhauls, and defect elimination in Chinese. This information is stored as text in an information management system; it reflects the health history of individual pieces of power equipment and also holds rich reliability information about equipment of the same type. Chinese text classification has long been recognized as an important and difficult technique, especially in professional fields, where it must be closely combined with domain knowledge. All fields are developing rapidly, and new words, new concepts, and new relations keep emerging; a method that relies only on traditional word analysis falls far short of what is needed. Domain dictionaries can largely solve this problem: constructing a dictionary that collects the latest concepts and their interrelations can support research in a specific field.
Dictionary construction mainly involves two difficulties: (1) the wording in power grid text is highly specialized, which makes dictionary construction difficult; (2) many texts in the power field do not strictly follow Chinese grammar and contain irregular formats, which complicates text processing and semantic analysis in the power field.
Disclosure of Invention
In order to solve the technical problems, the invention provides a clustering algorithm-based electric power text dictionary construction method to solve the problem of electric power system text dictionary construction.
The invention relates to a clustering algorithm-based electric power text dictionary construction method, which adopts the following technical scheme: the equipment used by the method comprises a data classification preprocessor, a data word segmentation processor, a clustering processor, and a data processing operation core;
the electric power text dictionary is constructed by the following steps:
step 1: creating the electric power field corpus to be processed from electric power field documents, preparing to process the text in the corpus, and entering step 2;
step 2: preprocessing the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and entering step 3;
step 3: performing word segmentation on the text preprocessed in step 2 by using a general dictionary to obtain a batch of segmented words, and entering step 4;
step 4: using the tf-idf algorithm to find keywords that can represent the text segmented in step 3, and entering step 5;
step 5: constructing word vectors for the keywords obtained in step 4 by using a word2vec model, and entering step 6;
step 6: clustering the constructed word vectors by using the k-means clustering algorithm, and entering step 7;
step 7: selecting k of the word vectors constructed by the word2vec model as the cluster centers (μ1, μ2, ..., μk−1, μk), and entering step 8;
step 8: calculating the cosine distance from each word vector to each of the k cluster centers, and entering step 9;
step 9: assigning each word vector to the one of the k clusters whose center is at the minimum cosine distance, calculating the mean of the data points in each resulting cluster, using that mean as the new cluster center, and entering step 10;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stopping the algorithm and entering step 11;
step 11: checking whether the keywords obtained by clustering reach a preset threshold, keeping the words that reach the threshold as keywords, discarding the words that do not, and entering step 12;
step 12: constructing the dictionary from the keywords obtained in step 4 and step 11, and entering step 13;
step 13: end.
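The clustering loop of steps 6 to 10 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the random choice of the initial centers, the `max_iter` default, and the handling of empty clusters are all hypothetical, since the text does not specify them.

```python
import math
import random

def cosine_distance(u, v):
    """Return 1 - cos(u, v); a zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_kmeans(vectors, k, max_iter=100, seed=0):
    """k-means over word vectors using cosine distance (steps 7 to 10)."""
    rng = random.Random(seed)
    # Step 7: pick k of the word vectors as initial centers (random pick is an assumption).
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(max_iter):  # step 10: stop at the maximum iteration count
        # Steps 8 and 9: assign each vector to the center at minimum cosine distance.
        labels = [
            min(range(k), key=lambda j: cosine_distance(v, centers[j]))
            for v in vectors
        ]
        # Step 9: the mean of each cluster becomes its new center.
        new_centers = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_vector(members) if members else centers[j])
        if new_centers == centers:  # step 10: centers no longer change
            break
        centers = new_centers
    return labels, centers
```

Calling `cosine_kmeans(word_vectors, k=2)` returns one cluster label per vector plus the final centers; the choice of k is left to the user, as in the text.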
Further, the data classification preprocessor performs text preprocessing on the test text to be classified according to the electric power field corpus and the stop word list, and removes meaningless words, numbers, and symbols from the text.
Further, the stop word list contains words, numbers, and symbols that appear frequently in text and have no practical meaning.
Further, the stop word list is established by creating a data statistics knowledge rule base, setting a threshold that determines whether a given number or symbol is placed in the stop word list, and comparing against the threshold to confirm whether numbers and symbols in the text are added to the stop word list.
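As an illustration of the threshold rule for numbers and symbols, the sketch below uses document frequency as the statistic. That choice is an assumption: the text does not say which statistic the rule base records, and the names `build_stop_list`, `is_number_or_symbol`, and the default `threshold=0.8` are hypothetical.

```python
from collections import Counter

def is_number_or_symbol(token):
    """True for non-empty tokens that contain no letters (digits, punctuation)."""
    return bool(token) and not any(ch.isalpha() for ch in token)

def build_stop_list(documents, base_stop_words, threshold=0.8):
    """Extend a base stop word list with numbers/symbols that are too common.

    documents: list of token lists. A number or symbol is added when the
    fraction of documents containing it reaches the threshold (assumed statistic).
    """
    stop_words = set(base_stop_words)
    n_docs = len(documents)
    if n_docs == 0:
        return stop_words
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each token once per document
    for token, df in doc_freq.items():
        if is_number_or_symbol(token) and df / n_docs >= threshold:
            stop_words.add(token)
    return stop_words
```

Under this rule a separator such as ":" that occurs in nearly every record is dropped, while a rarely occurring value such as "35" is kept as potentially informative.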
Further, the data word segmentation processor segments the preprocessed text by the following method:
(1) performing word segmentation on the preprocessed text by using a general dictionary, and performing vectorization representation on each word after the word segmentation;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm,
$$\mathrm{tf} = \frac{a}{b}, \qquad \mathrm{idf} = \log\frac{c}{e+1}$$
wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator to avoid a denominator of 0; the value tf multiplied by idf is calculated for each word, and the words with the largest results are selected as keywords;
(3) calculating word vectors for the keywords obtained in (2) by using a word2vec model.
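The tf-idf selection in (2) can be written directly from the definitions of a, b, c, and e: the term frequency is a divided by b, and the inverse document frequency is the logarithm of c divided by e plus 1. The sketch below is a hedged illustration; the natural logarithm, the function names, and the tie-breaking by alphabetical order are assumptions, since the text does not specify them.

```python
import math
from collections import Counter

def tf_idf(a, b, c, e):
    """tf * idf with tf = a / b and idf = log(c / (e + 1))."""
    return (a / b) * math.log(c / (e + 1))

def top_keywords(doc_tokens, corpus, n):
    """Return the n words of doc_tokens with the largest tf-idf values.

    corpus is a list of token lists (one per document).
    """
    counts = Counter(doc_tokens)
    b = len(doc_tokens)  # total number of words in the text
    c = len(corpus)      # total number of documents in the corpus
    scores = {}
    for word, a in counts.items():
        e = sum(1 for doc in corpus if word in doc)  # documents containing the word
        scores[word] = tf_idf(a, b, c, e)
    # Highest score first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

Note that with the +1 in the denominator, a word that occurs in every document receives a negative idf, so it ranks below all rarer words.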
Further, the word2vec model used in step (3) is a skip-gram model.
Further, the clustering processor clusters the word vectors obtained by the word2vec algorithm by using the k-means algorithm to obtain a batch of new keywords, removes unreasonable keywords obtained by clustering by means of a preset threshold, and constructs the dictionary from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
Further, the data processing operation core includes all specific operations required for data processing after the data is subjected to feature selection.
The invention has the following beneficial effects: the invention provides a clustering algorithm-based electric power text dictionary construction method, which is a strategic method mainly used for constructing a dictionary during text classification in the electric power field. With the model of the invention, the key phrases that represent the text types in power-field text can be found more accurately, and the dictionary is constructed from these key phrases.
Drawings
In order that the present invention may be more readily and clearly understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
FIG. 1 is a schematic diagram of a system architecture.
FIG. 2 is a schematic flow diagram of the process of the present invention.
Detailed Description
As shown in fig. 1 and 2, in the clustering algorithm-based electric power text dictionary construction method of the present invention, the equipment used comprises a data classification preprocessor, a data word segmentation processor, a clustering processor, and a data processing operation core;
the electric power text dictionary is constructed by the following steps:
step 1: creating the electric power field corpus to be processed from electric power field documents, preparing to process the text in the corpus, and entering step 2;
step 2: preprocessing the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and entering step 3;
step 3: performing word segmentation on the text preprocessed in step 2 by using a general dictionary to obtain a batch of segmented words, and entering step 4;
step 4: using the tf-idf algorithm to find keywords that can represent the text segmented in step 3, and entering step 5;
step 5: constructing word vectors for the keywords obtained in step 4 by using a word2vec model, and entering step 6;
step 6: clustering the constructed word vectors by using the k-means clustering algorithm, and entering step 7;
step 7: selecting k of the word vectors constructed by the word2vec model as the cluster centers (μ1, μ2, ..., μk−1, μk), and entering step 8;
step 8: calculating the cosine distance from each word vector to each of the k cluster centers, and entering step 9;
step 9: assigning each word vector to the one of the k clusters whose center is at the minimum cosine distance, calculating the mean of the data points in each resulting cluster, using that mean as the new cluster center, and entering step 10;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stopping the algorithm and entering step 11;
step 11: checking whether the keywords obtained by clustering reach a preset threshold, keeping the words that reach the threshold as keywords, discarding the words that do not, and entering step 12;
step 12: constructing the dictionary from the keywords obtained in step 4 and step 11, and entering step 13;
step 13: end.
The data classification preprocessor is mainly used to preprocess the data and the training data set during text classification. Text preprocessing is a necessary stage that converts semi-structured or unstructured text into a suitable text representation. Usually, characters that carry no information, such as special characters, punctuation marks, and numbers, are deleted first. However, text in the power field generally contains a large number of numbers and symbols, so this part receives special handling during preprocessing, and the meaningful numbers and symbols in the text are retained.
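One way to picture this special handling is a tokenizer that keeps a number only when it carries a unit. The sketch below is purely illustrative: the text does not define which numbers and symbols count as meaningful, so the unit whitelist `UNITS`, the pattern, and the function name `preprocess` are hypothetical assumptions.

```python
import re

# Hypothetical unit whitelist; a real system would derive it from the corpus.
UNITS = ("kV", "kW", "MW", "Hz", "A", "V", "%")

# Either a number directly followed by a whitelisted unit, or a run of letters.
TOKEN_PATTERN = re.compile(
    r"\d+(?:\.\d+)?(?:" + "|".join(re.escape(u) for u in UNITS) + r")(?![A-Za-z])"
    + r"|[^\W\d_]+"
)

def preprocess(text):
    """Keep words and number-with-unit tokens; drop bare digits and punctuation."""
    return TOKEN_PATTERN.findall(text)
```

For example, `preprocess("No.3 transformer, 110kV bus: load 85% !!")` keeps `110kV` and `85%` but drops the bare `3` and the punctuation.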
In text classification, common words need to be removed from the text. Common words are words that appear frequently in text, such as 'a' and 'the' in English and comparable function words in Chinese, as well as numbers and symbols. These words do not help classification and are collected into a set called the "stop word list". Stop words contained in the text should be deleted during preprocessing, but text in the power field necessarily contains a large number of numbers and symbols. Moreover, depending on the context of the text classification application, stop words are not limited to the entries of a fixed stop word list. Because this method deals with power-domain text, it establishes a data statistics knowledge rule base, sets a threshold that determines whether a given number or symbol is placed in the stop word list, and decides by comparison with that threshold whether a number or symbol in the text is added to the list. Deleting stop words can greatly improve text classification performance.
Documents in the electric power field mostly concern equipment status, equipment overhaul, and the like, and most of them are short. The preprocessed text needs to be segmented into words, and because power-field text is highly specialized, a data word segmentation processor is used to segment it and to cope with that specialization.
Word segmentation is an important part of the text classification process. It splits the text with an existing word segmentation tool and yields a series of segmented words, called the word segmentation set.
The method first uses the data word segmentation processor to segment the preprocessed short text, which yields a series of words. The data word segmentation processor then uses a statistical model (the tf-idf algorithm) for a first round of feature selection to obtain words that can represent the text, namely the keywords. However, owing to the particularity of the power field, some words with the same meaning as the keywords may be missed, so word vectors are calculated for the keywords using the word2vec algorithm.
The data word segmentation processor is used for segmenting the preprocessed text by the following method:
(1) performing word segmentation on the preprocessed text by using a general dictionary, and performing vectorization representation on each word after the word segmentation;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm,
$$\mathrm{tf} = \frac{a}{b}, \qquad \mathrm{idf} = \log\frac{c}{e+1}$$
wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator to avoid a denominator of 0; the value tf multiplied by idf is calculated for each word, and the words with the largest results are selected as keywords;
(3) calculating word vectors for the keywords obtained in step (2) by using a word2vec model. word2vec is an algorithm that converts words into vectors, so that similarity computed in the vector space represents semantic similarity in the text. This embodiment uses the skip-gram model of the word2vec algorithm, which takes a word as input and predicts the context around it. The essence of this model is to find $u_x^T v_c$ (i.e., the similarity of two words), where $v_c$ is the word vector of the target word and $u_x$ is the word vector of the x-th word other than the target word. Here $v_c = W w_c$, where $W$ is the matrix of target word vectors, a $d \times V$ matrix in which $V$ is the number of all words and $d$ is the dimension of the word vectors, and $w_c$ is the one-hot vector of the target word.
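The lookup of the target word vector and the similarity score can be made concrete with a small pure-Python sketch. Multiplying the d × V matrix W by a one-hot vector simply selects one column of W, and the inner product with each context vector gives the score. The softmax normalization at the end is the standard skip-gram formulation and is an assumption here, as are the function names and the toy matrices; the text only states the inner product.

```python
import math

def one_hot(index, size):
    """One-hot vector w_c of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def mat_vec(matrix, vector):
    """Multiply a d x V matrix (list of d rows) by a length-V vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_probabilities(W, U, center_index):
    """Return v_c = W w_c and a softmax over the scores u_x^T v_c.

    W: d x V matrix of target word vectors (one column per word).
    U: list of V context vectors u_x, each of length d.
    """
    vocab_size = len(W[0])
    v_c = mat_vec(W, one_hot(center_index, vocab_size))
    scores = [dot(u_x, v_c) for u_x in U]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return v_c, [e / total for e in exps]
```

In practice a library implementation would train W and the context vectors jointly; this sketch only shows how a trained pair would be used to score context words.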
The data word segmentation processor can ensure a degree of professionalism in the words it produces, but the result of word segmentation is limited. The word vectors obtained are therefore clustered to obtain more professional vocabulary in preparation for the subsequent dictionary construction.
A series of keywords is obtained by the data word segmentation processor, and word vectors of these keywords are obtained by the word2vec algorithm. The clustering processor clusters the word vectors with the k-means clustering algorithm to obtain a series of new keywords, removes unreasonable keywords obtained by clustering by means of a preset threshold, and constructs the dictionary from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
The data processing operation core comprises all specific operations required to process the data after feature selection. The other parts added to the method do not interfere with data processing, and they allow it to proceed more smoothly and effectively.
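The filtering and merging described in this paragraph might look like the sketch below. The text does not say what quantity the preset threshold is compared against, so this example assumes it is the cosine similarity between a candidate word's vector and the center of its cluster; that criterion, the function names, and the default `threshold=0.8` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v); returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm

def build_dictionary(tfidf_keywords, clustered, centers, threshold=0.8):
    """Merge tf-idf keywords with clustered keywords that pass the threshold.

    clustered: dict mapping word -> (vector, cluster_index).
    centers: list of cluster center vectors.
    """
    kept = {
        word
        for word, (vector, cluster_index) in clustered.items()
        # Assumed criterion: similarity to the word's own cluster center.
        if cosine_similarity(vector, centers[cluster_index]) >= threshold
    }
    return sorted(set(tfidf_keywords) | kept)
```

The result is a sorted list of dictionary entries; words far from their cluster center are treated as unreasonable and dropped.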
For convenience of description, the following application example is used:
an electric power enterprise wants to analyze a series of previously recorded texts about customer complaints and customer repairs in order to mine user demands, improve users' evaluation of the enterprise, and improve the user experience.
The method proposed by this patent can then be used to construct a dictionary for the enterprise's complaint and repair texts, and this dictionary can then be used to mine the text data.
The specific implementation scheme is as follows:
(1) preprocessing the text to be processed, namely removing stop words, and then performing word segmentation on the text;
(2) using tf-idf feature selection to select the keywords of the preprocessed and segmented text;
(3) constructing word vectors for the keywords from (2) by using the word2vec algorithm;
(4) clustering the word vectors constructed in (3) by using the k-means algorithm to obtain a series of new keywords;
(5) constructing the dictionary by using the keywords obtained in (2) and (4) as root words.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations made by using the contents of the present specification and the drawings are within the protection scope of the present invention.

Claims (8)

1. A clustering algorithm-based electric power text dictionary construction method, characterized in that the equipment used by the method comprises a data classification preprocessor, a data word segmentation processor, a clustering processor, and a data processing operation core;
the electric power text dictionary is constructed by the following steps:
step 1: creating the electric power field corpus to be processed from electric power field documents, preparing to process the text in the corpus, and entering step 2;
step 2: preprocessing the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and entering step 3;
step 3: performing word segmentation on the text preprocessed in step 2 by using a general dictionary to obtain a batch of segmented words, and entering step 4;
step 4: using the tf-idf algorithm to find keywords that can represent the text segmented in step 3, and entering step 5;
step 5: constructing word vectors for the keywords obtained in step 4 by using a word2vec model, and entering step 6;
step 6: clustering the constructed word vectors by using the k-means clustering algorithm, and entering step 7;
step 7: selecting k of the word vectors constructed by the word2vec model as the cluster centers (μ1, μ2, ..., μk−1, μk), and entering step 8;
step 8: calculating the cosine distance from each word vector to each of the k cluster centers, and entering step 9;
step 9: assigning each word vector to the one of the k clusters whose center is at the minimum cosine distance, calculating the mean of the data points in each resulting cluster, using that mean as the new cluster center, and entering step 10;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stopping the algorithm and entering step 11;
step 11: checking whether the keywords obtained by clustering reach a preset threshold, keeping the words that reach the threshold as keywords, discarding the words that do not, and entering step 12;
step 12: constructing the dictionary from the keywords obtained in step 4 and step 11, and entering step 13;
step 13: end.
2. The method as claimed in claim 1, wherein the data classification preprocessor performs text preprocessing on the test text to be classified according to the electric power field corpus and the stop word list, and removes meaningless words, numbers, and symbols from the text.
3. The method as claimed in claim 1, wherein the stop word list comprises words, numbers, and symbols that appear frequently in text and have no practical meaning.
4. The method as claimed in claim 1, wherein the stop word list is established by creating a data statistics knowledge rule base, setting a threshold that determines whether a given number or symbol is placed in the stop word list, and comparing against the threshold to confirm whether numbers and symbols in the text are added to the stop word list.
5. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 1, wherein the data word segmentation processor segments the preprocessed text by the following method:
(1) performing word segmentation on the preprocessed text by using a general dictionary, and performing vectorization representation on each word after the word segmentation;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm,
$$\mathrm{tf} = \frac{a}{b}, \qquad \mathrm{idf} = \log\frac{c}{e+1}$$
wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator to avoid a denominator of 0; the value tf multiplied by idf is calculated for each word, and the words with the largest results are selected as keywords;
(3) calculating word vectors for the keywords obtained in (2) by using a word2vec model.
6. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 5, wherein the word2vec model used in (3) is a skip-gram model.
7. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 1, wherein the clustering processor clusters the word vectors obtained by the word2vec algorithm by using the k-means algorithm to obtain a batch of new keywords, removes unreasonable keywords obtained by clustering by means of a preset threshold, and constructs the dictionary from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
8. The method as claimed in claim 1, wherein the data processing operation core includes all specific operations required for data processing after feature selection of data.
CN201910940220.5A 2019-09-30 2019-09-30 Clustering algorithm-based electric power text dictionary construction method Pending CN110704638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940220.5A CN110704638A (en) 2019-09-30 2019-09-30 Clustering algorithm-based electric power text dictionary construction method

Publications (1)

Publication Number Publication Date
CN110704638A true CN110704638A (en) 2020-01-17

Family

ID=69197391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940220.5A Pending CN110704638A (en) 2019-09-30 2019-09-30 Clustering algorithm-based electric power text dictionary construction method

Country Status (1)

Country Link
CN (1) CN110704638A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107436875A (en) * 2016-05-25 2017-12-05 华为技术有限公司 File classification method and device
CN111368539A (en) * 2020-03-02 2020-07-03 贵州电网有限责任公司 Hotspot analysis modeling method
CN111931483A (en) * 2020-06-22 2020-11-13 中国电力科学研究院有限公司 Extraction method and device for structuring electric power equipment information
CN112148880A (en) * 2020-09-28 2020-12-29 深圳壹账通智能科技有限公司 Customer service dialogue corpus clustering method, system, equipment and storage medium
WO2024179519A1 (en) * 2023-03-01 2024-09-06 维沃移动通信有限公司 Semantic recognition method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649662A (en) * 2016-12-13 2017-05-10 成都数联铭品科技有限公司 Construction method of domain dictionary
CN108628824A (en) * 2018-04-08 2018-10-09 上海熙业信息科技有限公司 A kind of entity recognition method based on Chinese electronic health record
CN109284397A (en) * 2018-09-27 2019-01-29 深圳大学 A kind of construction method of domain lexicon, device, equipment and storage medium
CN110287321A (en) * 2019-06-26 2019-09-27 南京邮电大学 A kind of electric power file classification method based on improvement feature selecting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
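石爱辉: "Research on human action recognition methods based on spatio-temporal interest points and the bag-of-words model", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series *
聂卉 et al.: "Automatic acquisition of business competitive intelligence based on online reviews", Journal of Intelligence (《情报杂志》) *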

Similar Documents

Publication Publication Date Title
CN108304468B (en) Text classification method and text classification device
CN106844346B (en) Short text semantic similarity discrimination method and system based on deep learning model Word2Vec
CN110704638A (en) Clustering algorithm-based electric power text dictionary construction method
CN109800310B (en) Electric power operation and maintenance text analysis method based on structured expression
CA2777520C (en) System and method for phrase identification
CN113011533A (en) Text classification method and device, computer equipment and storage medium
CN104881458B (en) A mask method and device for Web page subjects
CN107992633A (en) Electronic document automatic classification method and system based on keyword feature
CN110781671A (en) Knowledge mining method for intelligent IETM fault maintenance record text
CN109002473A (en) A sentiment analysis method based on term vectors and part of speech
CN116628173B (en) Intelligent customer service information generation system and method based on keyword extraction
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
CN112926340B (en) Semantic matching model for knowledge point positioning
CN110287321A (en) An electric power file classification method based on improved feature selection
CN111310467B (en) Topic extraction method and system combining semantic inference in long text
CN114266256A (en) Method and system for extracting new words in field
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN115563512A (en) Semantic matching model generation method and system based on remote supervision
CN117291192B (en) Government affair text semantic understanding analysis method and system
CN112528640A (en) Automatic domain term extraction method based on abnormal subgraph detection
CN116738979A (en) Power grid data searching method and system based on core data identification and electronic equipment
US20220083581A1 (en) Text classification device, text classification method, and text classification program
CN115718791A (en) Specific ordering of text elements and applications thereof
Thilagavathi et al. Document clustering in forensic investigation by hybrid approach
CN113901219A (en) Data analysis method and system based on intention recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-01-17