CN110704638A - Clustering algorithm-based electric power text dictionary construction method - Google Patents
- Publication number
- CN110704638A (application number CN201910940220.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- word
- clustering
- electric power
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a clustering algorithm-based method for constructing an electric power text dictionary. The system used by the method mainly comprises four parts: a data classification preprocessor, a data word segmentation processor, a clustering processor and a data processing operation core. The method is a strategic method, mainly used to construct a dictionary during text classification in the electric power field. With the model of the invention, key phrases that represent the text categories of electric power texts can be found more accurately, and the dictionary is constructed from these key phrases.
Description
Technical Field
The invention relates to the field of data processing of power systems, in particular to a clustering algorithm-based electric power text dictionary construction method, which is mainly used for text data processing in the field of electric power.
Background
A power grid enterprise is an asset-intensive enterprise. Managing the health state of power equipment is its core task, and scientific management based on big data is an inevitable trend. However, grid data is generally considered to be large in volume, varied in type, low in value density and fast-changing, which makes it difficult to use. Low value density means that most of the data is normal grid data and only a small amount is abnormal. This severe imbalance degrades the mining performance of artificial intelligence methods based on machine learning, deep learning and the like. Fortunately, electric power data comes in many types, and text data has a high value density because "important things are often recorded", so its mining prospects are good. Electric power text mining is therefore one of the key technologies of interest in the health management of power equipment. Existing data mining for the power grid has mostly been researched and applied to structured data, whereas research on text within the grid's unstructured data is almost entirely new; so far there are almost no research reports on Chinese text processing for the power grid. For the time being there is no technical approach or solution for acquiring electric power text information, and a detailed electric power corpus cannot be constructed. It is therefore necessary to construct a dictionary for the power grid field.
During equipment operation and maintenance management, power grid enterprises record information such as equipment faults, defects, overhaul and defect elimination in Chinese. This information is stored as text in an information management system. It reflects the history of the health state of an individual piece of power equipment and also contains rich reliability information about equipment of the same type. Chinese text classification has long been recognized as an important and difficult technique, especially when applied to professional domains, where it must be closely combined with domain knowledge. All fields are developing rapidly, and new words, new concepts and new relations keep emerging; traditional word analysis alone falls far short of what is needed. Domain dictionaries can solve this problem to a large extent: constructing a dictionary that collects the latest concepts and their interrelations can considerably benefit research in a specific field.
Dictionary construction must address two main problems: (1) the wording of power grid texts is highly specialized, which makes dictionary construction difficult; (2) many texts in the power field do not strictly follow Chinese grammar and contain irregular formats, which complicates text processing and semantic analysis in the power field.
Disclosure of Invention
In order to solve the technical problems, the invention provides a clustering algorithm-based electric power text dictionary construction method to solve the problem of electric power system text dictionary construction.
The invention relates to a clustering algorithm-based electric power text dictionary construction method, which adopts the technical scheme that: the equipment used by the electric power text dictionary construction method comprises a data classification preprocessor, a data word segmentation processor, a clustering processor and a data processing operation core;
the electric power text dictionary is constructed in the following steps:
step 1: create the electric power domain corpus to be processed from documents related to the electric power field, prepare to process the texts in the corpus, and go to step 2;
step 2: preprocess the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and go to step 3;
step 3: segment the text preprocessed in step 2 into words using a general dictionary to obtain a batch of segmented words, and go to step 4;
step 4: use the tf-idf algorithm to find keywords that represent the text among the words segmented in step 3, and go to step 5;
step 5: construct word vectors for the keywords obtained in step 4 using a word2vec model, and go to step 6;
step 6: cluster the constructed word vectors using the k-means clustering algorithm, and go to step 7;
step 7: select k of the word vectors constructed by the word2vec model as the cluster centers $(\mu_1, \mu_2, \ldots, \mu_{k-1}, \mu_k)$, and go to step 8;
step 8: calculate the cosine distance from each word vector to each of the k cluster centers, and go to step 9;
step 9: assign each word vector to whichever of the k clusters has the minimum cosine distance, calculate the mean of the data points in each resulting cluster, and use that mean as the new cluster center;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stop the algorithm and go to step 11;
step 11: check whether the keywords obtained by clustering reach a preset threshold, keep the words that reach the threshold as keywords, discard the words that do not, and go to step 12;
step 12: construct the dictionary from the keywords obtained in step 4 and step 11, and go to step 13;
step 13: end.
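Steps 6 to 10 amount to k-means with cosine distance. A minimal Python sketch follows, assuming the first k vectors serve as the initial centers and exact equality serves as the convergence test; the source specifies neither. The names `cosine_distance` and `kmeans_cosine` and the `max_iter` default are illustrative, not taken from the source.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def kmeans_cosine(vectors, k, max_iter=100):
    """Cluster vectors with k-means under cosine distance (steps 6 to 10)."""
    # Step 7: take the first k vectors as initial centers (an assumption;
    # the source does not say how the k vectors are chosen).
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Steps 8 and 9: assign each vector to the center at minimum distance.
        labels = [
            min(range(k), key=lambda j: cosine_distance(v, centers[j]))
            for v in vectors
        ]
        # Step 9: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                new_centers.append(centers[j])  # keep the center of an empty cluster
                continue
            dim = len(members[0])
            new_centers.append(
                [sum(m[d] for m in members) / len(members) for d in range(dim)]
            )
        # Step 10: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

In practice a small tolerance on the center movement is a common substitute for the exact-equality test used here.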
Further, the data classification preprocessor preprocesses the test text to be classified according to the electric power domain corpus and the stop word list, removing meaningless words, numbers and symbols from the text.
Further, the stop word list contains words, numbers and symbols that appear frequently in text but carry no practical meaning.
Further, the stop word list is built by establishing a data statistics knowledge rule base and setting a threshold that decides whether a given number or symbol is entered into the stop word list; numbers and symbols in the text are compared against this threshold to confirm whether they are added to the stop word list.
Further, the data word segmentation processor segments the preprocessed text by the following method:
(1) performing word segmentation on the preprocessed text using a general dictionary, and representing each segmented word as a vector;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm, $tf = \frac{a}{b}$, $idf = \log\frac{c}{e + 1}$, wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator so that the denominator cannot be 0; the value tf × idf is calculated for each word, and the words with the largest values are selected as keywords;
(3) calculating a word vector for each keyword obtained in (2) using a word2vec model.
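The scoring in item (2) can be sketched in Python as follows, assuming the usual form implied by the variable definitions: tf = a / b and idf = log(c / (e + 1)). The natural logarithm and the helper names `tf_idf_scores` and `top_keywords` are assumptions made for illustration; the source does not state the base of the logarithm or how many top words to keep.

```python
import math
from collections import Counter


def tf_idf_scores(doc_tokens, corpus_tokens):
    """Score each word of one document as (a / b) * log(c / (e + 1))."""
    counts = Counter(doc_tokens)   # a: occurrences of each word in the document
    b = len(doc_tokens)            # b: total number of words in the document
    c = len(corpus_tokens)         # c: total number of documents in the corpus
    scores = {}
    for word, a in counts.items():
        # e: number of corpus documents that contain the word
        e = sum(1 for doc in corpus_tokens if word in doc)
        scores[word] = (a / b) * math.log(c / (e + 1))
    return scores


def top_keywords(doc_tokens, corpus_tokens, n):
    """Return the n words with the largest tf-idf score."""
    scores = tf_idf_scores(doc_tokens, corpus_tokens)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Note that with e + 1 in the denominator, a word that occurs in every document receives a slightly negative score, so it is ranked below all other words.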
Further, the word2vec model used in step (3) is a skip-gram model.
Further, the clustering processor clusters the word vectors obtained by the word2vec algorithm using the k-means algorithm to obtain a batch of new keywords, removes unreasonable keywords produced by clustering using a preset threshold, and constructs the dictionary from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
Further, the data processing operation core includes all specific operations required for data processing after the data is subjected to feature selection.
The beneficial effects of the invention are as follows: the invention provides a clustering algorithm-based method for constructing an electric power text dictionary, a strategic method mainly used to construct a dictionary during text classification in the electric power field. With the model of the invention, key phrases that represent the text categories of electric power texts can be found more accurately, and the dictionary is constructed from these key phrases.
Drawings
In order that the present invention may be more readily and clearly understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
FIG. 1 is a schematic diagram of a system architecture.
FIG. 2 is a schematic flow diagram of the process of the present invention.
Detailed Description
As shown in fig. 1 and 2, the electric power text dictionary construction method based on clustering algorithm of the present invention is characterized in that the apparatus used in the electric power text dictionary construction method includes a data classification preprocessor, a data word segmentation processor, a clustering processor, a data processing operation core;
the electric power text dictionary is constructed in the following steps:
step 1: create the electric power domain corpus to be processed from documents related to the electric power field, prepare to process the texts in the corpus, and go to step 2;
step 2: preprocess the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and go to step 3;
step 3: segment the text preprocessed in step 2 into words using a general dictionary to obtain a batch of segmented words, and go to step 4;
step 4: use the tf-idf algorithm to find keywords that represent the text among the words segmented in step 3, and go to step 5;
step 5: construct word vectors for the keywords obtained in step 4 using a word2vec model, and go to step 6;
step 6: cluster the constructed word vectors using the k-means clustering algorithm, and go to step 7;
step 7: select k of the word vectors constructed by the word2vec model as the cluster centers $(\mu_1, \mu_2, \ldots, \mu_{k-1}, \mu_k)$, and go to step 8;
step 8: calculate the cosine distance from each word vector to each of the k cluster centers, and go to step 9;
step 9: assign each word vector to whichever of the k clusters has the minimum cosine distance, calculate the mean of the data points in each resulting cluster, and use that mean as the new cluster center;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stop the algorithm and go to step 11;
step 11: check whether the keywords obtained by clustering reach a preset threshold, keep the words that reach the threshold as keywords, discard the words that do not, and go to step 12;
step 12: construct the dictionary from the keywords obtained in step 4 and step 11, and go to step 13;
step 13: end.
The data classification preprocessor is mainly used to preprocess the data and the training data set during text classification. Text preprocessing is a necessary stage that converts semi-structured or unstructured text into a suitable text representation. Usually, characters that carry no information, such as special characters, punctuation marks and numbers, are deleted first. However, texts in the power field typically contain a large number of numbers and symbols, so the preprocessing handles them specially and retains the meaningful numbers and symbols in the text.
In text classification, common words need to be removed from the text. Common words are words that appear frequently in text, such as 'a' and 'the' in English and comparable function words in Chinese, together with numbers and symbols. These words do not help classification and are collected into a set called the "stop word list"; stop words contained in the text should be deleted during preprocessing. However, because of the particularity of the power field, the text necessarily contains a large number of numbers and symbols, and, depending on the context of the text classification application, the stop words are not limited to the vocabulary of a fixed stop word list. The method therefore establishes a data statistics knowledge rule base and sets a threshold that decides whether a given number or symbol is entered into the stop word list; a number or symbol in the text is compared against this threshold to confirm whether it is added to the stop word list. Deleting stop words can greatly improve text classification performance.
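The source does not say which statistic is compared with the threshold. The Python sketch below assumes, purely for illustration, that the statistic is document frequency: a number or symbol is added to the stop word list when it occurs in more than a given fraction of the documents. The pattern `NUM_OR_SYMBOL` and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: a token made only of digits (optionally decimal)
# or only of non-word, non-space symbols.
NUM_OR_SYMBOL = re.compile(r"^(?:\d+(?:\.\d+)?|[^\w\s]+)$")


def build_numeric_stop_list(docs, threshold):
    """Return numbers/symbols whose document-frequency ratio exceeds threshold."""
    doc_freq = Counter()
    for tokens in docs:
        for tok in set(tokens):          # count each token once per document
            if NUM_OR_SYMBOL.match(tok):
                doc_freq[tok] += 1
    return {tok for tok, df in doc_freq.items() if df / len(docs) > threshold}


def remove_stop_tokens(tokens, stop_list):
    """Drop every token that is in the stop list, keeping the original order."""
    return [tok for tok in tokens if tok not in stop_list]
```

Under this assumption, a symbol that appears in nearly every record is dropped, while a less frequent number such as a voltage level is retained as potentially meaningful.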
Documents in the electric power field mostly concern equipment states and equipment overhaul and are mostly short. The preprocessed text must be segmented into words, and because texts in this field contain much highly specialized wording, a data word segmentation processor is used to segment them and to deal with that specialization.
Word segmentation is an important part of text classification. It splits the text with an existing word segmentation tool and yields a series of segmented words, called the word segmentation set.
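The source only states that an existing segmentation tool and a general dictionary are used. As a stand-in, the Python sketch below shows forward maximum matching, one classic dictionary-based segmentation strategy; it is not claimed to be the tool used in the source. The function name, the `max_len` parameter and the sample dictionary are illustrative assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, preferring the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Sample dictionary: "transformer", "oil leak", "severe".
sample_dictionary = {"变压器", "漏油", "严重"}
segmented = forward_max_match("变压器漏油严重", sample_dictionary)
```

A term missing from the general dictionary falls apart into single characters, which illustrates why a domain dictionary is useful for specialized text.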
The method first segments the preprocessed short text with the data word segmentation processor, obtaining a series of words. The data word segmentation processor then performs a first round of feature selection with a statistical model (namely the tf-idf algorithm) to obtain words that represent the text, namely keywords. However, because of the particularity of the power field, some words with the same meaning as the keywords may be missed, so word vectors are calculated for the keywords using the word2vec algorithm.
The data word segmentation processor segments the preprocessed text by the following method:
(1) performing word segmentation on the preprocessed text using a general dictionary, and representing each segmented word as a vector;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm, $tf = \frac{a}{b}$, $idf = \log\frac{c}{e + 1}$, wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator so that the denominator cannot be 0; the value tf × idf is calculated for each word, and the words with the largest values are selected as keywords;
(3) calculating a word vector for each keyword obtained in step (2) using a word2vec model. Word2vec is an algorithm that converts words into vectors; similarity computed in the vector space represents the semantic similarity of the text. This embodiment uses the skip-gram model of the word2vec algorithm, which takes a word as input and predicts the context around it. The essence of the model is to find $u_x^T v_c$ (i.e., the similarity of two words), where $v_c$ is the word vector of the target word and $u_x$ is the word vector of the $x$-th word other than the target word. Here $v_c = W w_c$, where $W$ is the matrix of target words, a $d \times V$ matrix in which $V$ is the number of all words and $d$ is the dimension of the target word vector, and $w_c$ is the one-hot vector of the target word.
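The lookup v_c = W w_c and the score u_x^T v_c can be illustrated with a small Python sketch. The softmax normalization over the scores is the standard skip-gram formulation and is an assumption here, since the source mentions only the inner product. The matrices passed in are placeholders and the helper names are hypothetical.

```python
import math


def one_hot(index, size):
    """Return the one-hot vector w_c of length V for the word at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def mat_vec(matrix, vec):
    """Multiply a d x V matrix (a list of d rows) by a length-V vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def context_probabilities(W, U, target_index):
    """Softmax over u_x^T v_c for every word x, with v_c = W w_c.

    W is the d x V target-word matrix; U holds one length-d context
    vector u_x per vocabulary word.
    """
    V = len(W[0])
    v_c = mat_vec(W, one_hot(target_index, V))   # selects column c of W
    scores = [sum(u * v for u, v in zip(u_x, v_c)) for u_x in U]
    peak = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Multiplying by the one-hot vector simply picks out one column of W, which is why implementations usually replace the product with an index lookup.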
The data word segmentation processor can ensure that the segmented vocabulary is professional, but the result of word segmentation is limited. The word vectors obtained are therefore clustered to obtain more professional vocabulary in preparation for the subsequent dictionary construction.
A series of keywords is obtained by the data word segmentation processor, and their word vectors are obtained by the word2vec algorithm. The word vectors are clustered with the k-means clustering algorithm to obtain a series of new keywords, unreasonable keywords produced by clustering are removed using a preset threshold, and the dictionary is constructed from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
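The source does not define the quantity that is compared with the preset threshold. The Python sketch below assumes, for illustration only, that a clustered word is kept when the cosine similarity between its vector and its cluster center is at least the threshold, and that the dictionary is the union of those words with the tf-idf keywords. The function names and the input layout are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_dictionary(tfidf_keywords, clustered_words, centers, threshold):
    """Union of tf-idf keywords and clustered words close enough to their center.

    clustered_words maps each word to a (vector, cluster_index) pair.
    """
    accepted = {
        word
        for word, (vec, cluster) in clustered_words.items()
        if cosine_similarity(vec, centers[cluster]) >= threshold
    }
    return sorted(set(tfidf_keywords) | accepted)
```

Other criteria, such as a minimum cluster size or a minimum word frequency, would fit the same description; the similarity test is only one plausible reading.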
The data processing operation core comprises all the specific operations required for data processing after feature selection. Adding the other parts to the method does not interfere with data processing and allows it to proceed more smoothly and effectively.
For convenience of description, the following application example is used:
An electric power enterprise wants to analyze a series of previously recorded texts about customer complaints and customer repair requests, mine user demands, and improve users' evaluation of the enterprise and their experience.
The method proposed in this patent can be used to construct a dictionary for the enterprise's complaint texts and repair texts, and this dictionary can then be used to mine the text data.
The specific implementation is as follows:
(1) Preprocess the text to be processed, that is, remove stop words, and then segment the text into words.
(2) Use tf-idf to perform feature selection on the preprocessed and segmented text and select its keywords.
(3) Construct word vectors for the keywords from step (2) using the word2vec algorithm.
(4) Cluster the word vectors constructed in step (3) using the k-means algorithm to obtain a series of new keywords.
(5) Construct the dictionary using the keywords obtained in steps (2) and (4) as root words.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations made by using the contents of the present specification and the drawings are within the protection scope of the present invention.
Claims (8)
1. A clustering algorithm-based electric power text dictionary construction method is characterized in that equipment used by the electric power text dictionary construction method comprises a data classification preprocessor, a data word segmentation processor, a clustering processor and a data processing operation core;
the electric power text dictionary is constructed in the following steps:
step 1: create the electric power domain corpus to be processed from documents related to the electric power field, prepare to process the texts in the corpus, and go to step 2;
step 2: preprocess the text to be processed, deleting words that do not affect the text semantics according to the stop word list, and go to step 3;
step 3: segment the text preprocessed in step 2 into words using a general dictionary to obtain a batch of segmented words, and go to step 4;
step 4: use the tf-idf algorithm to find keywords that represent the text among the words segmented in step 3, and go to step 5;
step 5: construct word vectors for the keywords obtained in step 4 using a word2vec model, and go to step 6;
step 6: cluster the constructed word vectors using the k-means clustering algorithm, and go to step 7;
step 7: select k of the word vectors constructed by the word2vec model as the cluster centers $(\mu_1, \mu_2, \ldots, \mu_{k-1}, \mu_k)$, and go to step 8;
step 8: calculate the cosine distance from each word vector to each of the k cluster centers, and go to step 9;
step 9: assign each word vector to whichever of the k clusters has the minimum cosine distance, calculate the mean of the data points in each resulting cluster, and use that mean as the new cluster center;
step 10: if the cluster centers no longer change or the maximum number of iterations is reached, stop the algorithm and go to step 11;
step 11: check whether the keywords obtained by clustering reach a preset threshold, keep the words that reach the threshold as keywords, discard the words that do not, and go to step 12;
step 12: construct the dictionary from the keywords obtained in step 4 and step 11, and go to step 13;
step 13: end.
2. The method as claimed in claim 1, wherein the data classification preprocessor performs text preprocessing on the test text to be classified according to the electric power domain corpus and the stop word list, and removes some meaningless words and numerical symbols of the text.
3. The method as claimed in claim 1, wherein the stop word list comprises words, numbers and symbols which appear frequently in text and have no practical meaning.
4. The method as claimed in claim 1, wherein the stop word list is built by establishing a data statistics knowledge rule base, setting a threshold that decides whether a given number or symbol is entered into the stop word list, and comparing against the threshold to confirm whether a number or symbol in the text is added to the stop word list.
5. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 1, wherein the data word segmentation processor segments the preprocessed text by:
(1) performing word segmentation on the preprocessed text using a general dictionary, and representing each segmented word as a vector;
(2) performing feature selection on the large number of word vectors using the tf-idf algorithm, $tf = \frac{a}{b}$, $idf = \log\frac{c}{e + 1}$, wherein a is the number of times the word appears in the text, b is the total number of words in the text, c is the total number of documents in the power field corpus, and e is the number of documents containing the word; 1 is added to the denominator so that the denominator cannot be 0; the value tf × idf is calculated for each word, and the words with the largest values are selected as keywords;
(3) calculating a word vector for each keyword obtained in (2) using a word2vec model.
6. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 5, wherein the word2vec model used in (3) is a skip-gram model.
7. The electric power text dictionary construction method based on the clustering algorithm as claimed in claim 1, wherein the clustering processor clusters the word vectors obtained by the word2vec algorithm using the k-means algorithm to obtain a batch of new keywords, removes unreasonable keywords produced by clustering using a preset threshold, and constructs the dictionary from the clustered keywords above the threshold together with the keywords initially obtained by the tf-idf algorithm.
8. The method as claimed in claim 1, wherein the data processing operation core includes all specific operations required for data processing after feature selection of data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910940220.5A CN110704638A (en) | 2019-09-30 | 2019-09-30 | Clustering algorithm-based electric power text dictionary construction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910940220.5A CN110704638A (en) | 2019-09-30 | 2019-09-30 | Clustering algorithm-based electric power text dictionary construction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110704638A (en) | 2020-01-17 |
Family
ID=69197391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910940220.5A Pending CN110704638A (en) | 2019-09-30 | 2019-09-30 | Clustering algorithm-based electric power text dictionary construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110704638A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107436875A (en) * | 2016-05-25 | 2017-12-05 | 华为技术有限公司 | File classification method and device |
CN111368539A (en) * | 2020-03-02 | 2020-07-03 | 贵州电网有限责任公司 | Hotspot analysis modeling method |
CN111931483A (en) * | 2020-06-22 | 2020-11-13 | 中国电力科学研究院有限公司 | Extraction method and device for structuring electric power equipment information |
CN112148880A (en) * | 2020-09-28 | 2020-12-29 | 深圳壹账通智能科技有限公司 | Customer service dialogue corpus clustering method, system, equipment and storage medium |
WO2024179519A1 (en) * | 2023-03-01 | 2024-09-06 | 维沃移动通信有限公司 | Semantic recognition method and apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649662A (en) * | 2016-12-13 | 2017-05-10 | 成都数联铭品科技有限公司 | Construction method of domain dictionary |
CN108628824A (en) * | 2018-04-08 | 2018-10-09 | 上海熙业信息科技有限公司 | A kind of entity recognition method based on Chinese electronic health record |
CN109284397A (en) * | 2018-09-27 | 2019-01-29 | 深圳大学 | A kind of construction method of domain lexicon, device, equipment and storage medium |
CN110287321A (en) * | 2019-06-26 | 2019-09-27 | 南京邮电大学 | A kind of electric power file classification method based on improvement feature selecting |
2019-09-30: CN application CN201910940220.5A, published as CN110704638A (en), status: Pending
Non-Patent Citations (2)
Title |
---|
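Shi Aihui: "Research on human behavior recognition methods based on spatio-temporal interest points and the bag-of-words model", China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology * |
Nie Hui et al.: "Automatic acquisition of business competitive intelligence based on online reviews", Journal of Intelligence (情报杂志) * |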
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108304468B (en) | Text classification method and text classification device | |
CN106844346B (en) | Short text semantic similarity discrimination method and system based on deep learning model Word2Vec | |
CN110704638A (en) | Clustering algorithm-based electric power text dictionary construction method | |
CN109800310B (en) | Electric power operation and maintenance text analysis method based on structured expression | |
CA2777520C (en) | System and method for phrase identification | |
CN113011533A (en) | Text classification method and device, computer equipment and storage medium | |
CN104881458B (en) | A kind of mask method and device of Web page subject | |
CN107992633A (en) | Electronic document automatic classification method and system based on keyword feature | |
CN110781671A (en) | Knowledge mining method for intelligent IETM fault maintenance record text | |
CN109002473A (en) | A kind of sentiment analysis method based on term vector and part of speech | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN110413998B (en) | Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof | |
CN112926340B (en) | Semantic matching model for knowledge point positioning | |
CN110287321A (en) | A kind of electric power file classification method based on improvement feature selecting | |
CN111310467B (en) | Topic extraction method and system combining semantic inference in long text | |
CN114266256A (en) | Method and system for extracting new words in field | |
CN115759119A (en) | Financial text emotion analysis method, system, medium and equipment | |
CN115563512A (en) | Semantic matching model generation method and system based on remote supervision | |
CN117291192B (en) | Government affair text semantic understanding analysis method and system | |
CN112528640A (en) | Automatic domain term extraction method based on abnormal subgraph detection | |
CN116738979A (en) | Power grid data searching method and system based on core data identification and electronic equipment | |
US20220083581A1 (en) | Text classification device, text classification method, and text classification program | |
CN115718791A (en) | Specific ordering of text elements and applications thereof | |
Thilagavathi et al. | Document clustering in forensic investigation by hybrid approach | |
CN113901219A (en) | Data analysis method and system based on intention recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200117 |