CN112115260A - Method for automatically calculating Chinese word classification - Google Patents
- Publication number
- CN112115260A (application number CN202010689433.8A)
- Authority
- CN
- China
- Prior art keywords
- chinese
- module
- word
- new
- classification
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for automatically calculating Chinese word classification, comprising the following steps: S1, establishing an automatic calculation Chinese word classification system comprising an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module, and a new word classification and labeling module; and S2, inputting a Chinese article through the input module. The invention studies the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the language, establishes classification samples, and collects language habits in order to calculate the classification of each word. It thereby achieves automatic classification, automatically adds multiple labels, and effectively overcomes the restrictions on word classification in the prior art.
Description
Technical Field
The invention relates to the technical field of natural language semantic recognition, and in particular to a method for automatically calculating Chinese word classification.
Background
Natural language is the crystallization of human intelligence. Natural language processing is one of the most difficult problems in artificial intelligence, and research on it is both attractive and challenging. In theory, natural language processing (NLP) offers an attractive mode of human-computer interaction. Early language processing systems such as SHRDLU worked reasonably well in a restricted "blocks world" with a limited vocabulary, which made researchers quite optimistic; however, they quickly lost confidence when the systems were extended to real-world environments full of ambiguity and uncertainty.
Because understanding natural language requires extensive knowledge about the world and the ability to apply that knowledge, natural language understanding is regarded as an AI-complete problem. In addition, defining what "understanding" means is itself a major problem in natural language processing and has long attracted research interest.
Existing NLP is well established in areas such as word segmentation, word vectors, part-of-speech correlation, and IDF (inverse document frequency); these applications are built on the assumption that parts of speech are already correct. Automatic classification of words by computer, by contrast, is an entirely new field.
However, current parts of speech and word classifications are produced by manual curation, dictionaries, and historical documents, and this approach has the following technical problems:
1. Word classes are too coarse (nouns, verbs, adjectives, …); a single large class such as nouns contains millions of words, which is very inconvenient in NLP applications;
2. New words emerge endlessly, and manual curation lacks a mechanism for discovering them quickly and accurately;
3. A word may belong to several classes, but in the traditional approach it can belong to only one class and carry a single label, which restricts how words are classified.
Therefore, a method for automatically calculating Chinese word classification is provided.
Disclosure of Invention
In view of these defects in the prior art, the invention provides a method for automatically calculating Chinese word classification. By studying the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the language, establishing classification samples, and collecting language habits, the method calculates the classification of each word, achieves automatic classification, and automatically adds multiple labels. It thereby overcomes the restrictions on word classification in the prior art and solves the problems described in the Background.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for automatically calculating Chinese word classification comprises the following steps:
S1: establishing an automatic calculation Chinese word classification system comprising an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module, and a new word classification and labeling module;
S2: inputting a Chinese article through the input module;
S3: the new word recognition module processes the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent calculation module;
S4: the automatic intelligent calculation module analyzes the received sentences; if a new word is found, it substitutes the new word into other sentences in the new word verification module to verify its part of speech, and if the new word is confirmed, it passes the confirmed new word to the new word classification and labeling module;
S5: the new word classification and labeling module classifies and labels the confirmed new words.
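Steps S1 to S5 are specified only at the module level. The following Python sketch shows one way the data could flow between the modules, assuming each module is supplied as a plain function; the function name `run_pipeline`, its parameters, and the toy stages in the usage lines are hypothetical and not taken from this disclosure.

```python
from collections.abc import Callable


def run_pipeline(
    article: str,
    find_candidate_sentences: Callable[[str], list[str]],  # S3: new word recognition module
    extract_new_words: Callable[[str], list[str]],         # S4: automatic intelligent calculation
    verify: Callable[[str], bool],                         # S4: new word verification module
    classify: Callable[[str], list[str]],                  # S5: classification and labeling module
) -> tuple[dict[str, list[str]], set[str]]:
    """Hypothetical wiring of steps S2-S5; returns (labeled new words, rejected strings)."""
    labeled: dict[str, list[str]] = {}
    rejected: set[str] = set()
    for sentence in find_candidate_sentences(article):  # S2/S3: article in, candidate sentences out
        for word in extract_new_words(sentence):
            if word in labeled or word in rejected:
                continue  # each candidate is examined only once
            if verify(word):
                labeled[word] = classify(word)  # S5: a word may receive several labels
            else:
                rejected.add(word)
    return labeled, rejected


# Toy usage with stand-in stages (illustrative only)
labeled, rejected = run_pipeline(
    "甲乙。丙丁。",
    find_candidate_sentences=lambda text: [s for s in text.split("。") if s],
    extract_new_words=lambda sentence: [sentence],
    verify=lambda word: word == "甲乙",
    classify=lambda word: ["noun"],
)
print(labeled, rejected)
```

Keeping each stage as an injected function mirrors the separate modules of FIG. 2, so any one module could be replaced without touching the others.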
Furthermore, the automatic calculation Chinese word classification system also comprises a communication module for connecting the system to the Internet or a cloud server.
Further, the automatic calculation Chinese word classification system also comprises a big database, which comprises a Chinese part-of-speech database, a Chinese new word storage database, and a wrong Chinese word database, all three of which are connected to the automatic intelligent calculation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores data on candidate words that were verified in step S4 not to be new words.
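As a rough illustration of the three stores in the big database (FIG. 3), the sketch below uses in-memory Python containers. The class name `BigDatabase`, its fields, and its methods are hypothetical; a real system would presumably use persistent storage rather than dictionaries and sets.

```python
from dataclasses import dataclass, field


@dataclass
class BigDatabase:
    """Hypothetical in-memory stand-in for the big database."""
    pos_db: dict[str, str] = field(default_factory=dict)            # known word -> part of speech
    new_word_db: dict[str, list[str]] = field(default_factory=dict)  # S5 output: word -> labels
    wrong_word_db: set[str] = field(default_factory=set)             # strings rejected in S4

    def is_known(self, word: str) -> bool:
        # A string already stored in any of the three databases need not be re-examined
        return word in self.pos_db or word in self.new_word_db or word in self.wrong_word_db

    def store_new_word(self, word: str, labels: list[str]) -> None:
        self.new_word_db[word] = list(labels)

    def store_wrong_word(self, word: str) -> None:
        self.wrong_word_db.add(word)


db = BigDatabase(pos_db={"父亲": "noun"})
db.store_new_word("老豆", ["noun", "appellation"])
db.store_wrong_word("豆今")
print(db.is_known("老豆"), db.is_known("豆今"), db.is_known("今天"))  # True True False
```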
Further, the input module is used for inputting Chinese articles needing new word classification and labeling.
Further, the new word recognition module is configured to process the input Chinese article, find sentences that may contain new words, and send those sentences to the automatic intelligent calculation module.
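How candidate sentences are detected is not specified. One simple stand-in, shown below, is a frequency-based character n-gram heuristic: a two-character string that recurs in the article but is absent from the known lexicon is treated as a possible new word. This heuristic, the function names, and the thresholds are assumptions rather than the disclosed method, and the heuristic will over-generate candidates that later steps must filter out.

```python
import re
from collections import Counter


def split_sentences(article: str) -> list[str]:
    # Split on common Chinese sentence-ending punctuation and line breaks
    return [s for s in re.split(r"[。！？；\n]", article) if s.strip()]


def find_candidate_sentences(
    article: str, lexicon: set[str], n: int = 2, min_count: int = 2
) -> dict[str, list[str]]:
    """Hypothetical: map each sentence to the recurring, out-of-lexicon n-grams it contains."""
    sentences = split_sentences(article)
    counts = Counter(s[i:i + n] for s in sentences for i in range(len(s) - n + 1))
    candidates = {g for g, c in counts.items() if c >= min_count and g not in lexicon}
    result: dict[str, list[str]] = {}
    for s in sentences:
        found = sorted(g for g in candidates if g in s)
        if found:
            result[s] = found
    return result


article = "小明叫他老豆。老豆今天回家。"
print(find_candidate_sentences(article, lexicon={"小明", "今天", "回家"}))
# {'小明叫他老豆': ['老豆'], '老豆今天回家': ['老豆']}
```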
Further, the automatic intelligent calculation module is configured to analyze the received sentences that may contain new words, substitute each found new word into other sentences in the new word verification module to verify its part of speech, and, once a new word is confirmed, pass it to the new word classification and labeling module.
Furthermore, the new word verification module contains predefined template sentences, into which a new word is substituted for verification.
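Only the substitution of new words into predefined template sentences is described. A minimal sketch, assuming that a template counts as satisfied when the filled-in fragment occurs in a reference corpus, might look like this; the templates, the corpus, the `min_hits` threshold, and the function name are all hypothetical.

```python
# Hypothetical template sentences per part of speech; "{}" marks where the new word goes
TEMPLATES: dict[str, list[str]] = {
    "noun": ["这是{}", "{}来了", "一个{}"],
    "verb": ["我想{}", "正在{}", "{}了一下"],
}


def verify_part_of_speech(
    word: str, corpus: list[str], templates: dict[str, list[str]] = TEMPLATES, min_hits: int = 2
) -> list[str]:
    """Return the parts of speech whose filled templates are attested in the corpus."""
    text = "\n".join(corpus)  # newline separator stops matches across sentence boundaries
    verified = []
    for pos, patterns in templates.items():
        hits = sum(1 for p in patterns if p.format(word) in text)
        if hits >= min_hits:
            verified.append(pos)
    return verified  # an empty list would route the word to the wrong Chinese word database


corpus = ["我老豆来了", "这是老豆的车"]
print(verify_part_of_speech("老豆", corpus))  # ['noun']
```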
Further, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words into the Chinese new word storage database.
Furthermore, the classification method calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
Furthermore, the labeled content comprises the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the Chinese language.
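How classification samples and language habits are turned into labels is not disclosed. As one possible interpretation, the sketch below represents each word by its neighboring characters in a corpus (a crude proxy for the "sequence" characteristic only), compares it with the pooled contexts of each class's sample words using Jaccard similarity, and assigns every label that clears a threshold, so a word can receive several labels. The feature choice, the similarity measure, the threshold, the sample data, and all names are assumptions.

```python
def context_features(word: str, corpus: list[str]) -> set[str]:
    """Collect the left and right neighboring characters of every occurrence of word."""
    feats: set[str] = set()
    for sentence in corpus:
        start = sentence.find(word)
        while start != -1:
            if start > 0:
                feats.add("L:" + sentence[start - 1])
            end = start + len(word)
            if end < len(sentence):
                feats.add("R:" + sentence[end])
            start = sentence.find(word, start + 1)
    return feats


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_multilabel(
    word: str, samples: dict[str, list[str]], corpus: list[str], threshold: float = 0.3
) -> list[str]:
    """Hypothetical: return every class whose sample words share enough context with word."""
    word_feats = context_features(word, corpus)
    labels = []
    for label, sample_words in samples.items():
        class_feats = set().union(*(context_features(w, corpus) for w in sample_words))
        if jaccard(word_feats, class_feats) >= threshold:
            labels.append(label)  # several labels may pass: multi-label output
    return labels


samples = {"appellation": ["爸爸", "妈妈"], "food": ["米饭"]}
corpus = ["我爸爸来了", "我妈妈来了", "我老豆来了", "吃米饭吧"]
print(classify_multilabel("老豆", samples, corpus))  # ['appellation']
```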
In summary, the invention mainly has the following beneficial effects:
The invention studies the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the language, establishes classification samples, and collects language habits in order to calculate the classification of each word. It thereby achieves automatic classification, automatically adds multiple labels, and effectively overcomes the restrictions on word classification in the prior art.
Drawings
FIG. 1 is a flow diagram of a method for automatically calculating Chinese word classification according to an embodiment;
FIG. 2 is a block diagram of the automatic calculation Chinese word classification system used in the method according to an embodiment;
FIG. 3 is a schematic structural diagram of the big database in the automatic calculation Chinese word classification system according to an embodiment.
Detailed Description
The present invention is described in further detail below with reference to figures 1-3.
Example 1
A method for automatically calculating Chinese word classification, as shown in FIGS. 1-2, comprises the following steps:
S1: establishing an automatic calculation Chinese word classification system comprising an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module, and a new word classification and labeling module;
S2: inputting a Chinese article through the input module;
S3: the new word recognition module processes the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent calculation module;
S4: the automatic intelligent calculation module analyzes the received sentences; if a new word is found, it substitutes the new word into other sentences in the new word verification module to verify its part of speech, and if the new word is confirmed, it passes the confirmed new word to the new word classification and labeling module;
S5: the new word classification and labeling module classifies and labels the confirmed new words.
Preferably, as shown in FIG. 2, the automatic calculation Chinese word classification system also comprises a communication module for connecting the system to the Internet or a cloud server.
Preferably, as shown in FIGS. 2 and 3, the automatic calculation Chinese word classification system also comprises a big database, which comprises a Chinese part-of-speech database, a Chinese new word storage database, and a wrong Chinese word database, all three of which are connected to the automatic intelligent calculation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores data on candidate words that were verified in step S4 not to be new words.
Preferably, the input module is used for inputting Chinese articles needing new word classification and labeling.
Preferably, the new word recognition module is used to process the input Chinese article, find sentences that may contain new words, and send those sentences to the automatic intelligent calculation module.
Preferably, the automatic intelligent calculation module is configured to analyze the received sentences that may contain new words, substitute each found new word into other sentences in the new word verification module to verify its part of speech, and, once a new word is confirmed, pass it to the new word classification and labeling module.
Preferably, the new word verification module contains predefined template sentences, into which a new word is substituted for verification.
Preferably, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words into the Chinese new word storage database.
Preferably, the classification method calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
Preferably, the labeled content comprises the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the Chinese language.
Example 2
This example illustrates the scheme proposed by the present invention:
Forms of address for people include father, mother, and the like, and such forms of address are first split off from nouns to form their own class. As times change, many new forms of address appear, such as "father", "father", and even "crime". An algorithm can recognize such a form of address as a new word as soon as it appears, and the automatic calculation Chinese word classification system then determines that its part of speech is noun and classifies it as a form of address.
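Example 2 implies a class hierarchy in which forms of address are split off from nouns as a subclass. A minimal sketch of such a hierarchy, assuming a nested-dictionary taxonomy in which one word may be stored under several class paths, is shown below; the node layout, the function names, the sample word 老豆, and the second "dialect" label are hypothetical.

```python
def make_node() -> dict:
    # Each class node holds its member words and its named subclasses
    return {"words": set(), "children": {}}


def add_word(root: dict, path: tuple[str, ...], word: str) -> None:
    """Store word under a class path such as ("noun", "appellation"), creating classes as needed."""
    node = root
    for name in path:
        node = node["children"].setdefault(name, make_node())
    node["words"].add(word)


def labels_of(root: dict, word: str, prefix: tuple[str, ...] = ()) -> list[str]:
    """Return every class path, written as 'a/b', under which word is stored."""
    found: list[str] = []
    for name, child in root["children"].items():
        path = prefix + (name,)
        if word in child["words"]:
            found.append("/".join(path))
        found.extend(labels_of(child, word, path))
    return found


taxonomy = make_node()
add_word(taxonomy, ("noun", "appellation"), "父亲")  # existing form of address
add_word(taxonomy, ("noun", "appellation"), "老豆")  # newly confirmed word from S4
add_word(taxonomy, ("noun", "dialect"), "老豆")      # hypothetical second label
print(labels_of(taxonomy, "老豆"))  # ['noun/appellation', 'noun/dialect']
```

Storing the word under more than one path is what lets a single word carry several labels, in contrast with the one-class-per-word limitation described in the Background.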
In summary, the invention studies the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the language, establishes classification samples, and collects language habits in order to calculate the classification of each word. It thereby achieves automatic classification, automatically adds multiple labels, and effectively overcomes the restrictions on word classification in the prior art.
Parts not described in detail here are the same as the prior art or can be implemented using the prior art. This embodiment merely explains the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the invention.
Claims (10)
1. A method for automatically calculating Chinese word classification, characterized in that the method comprises the following steps:
S1: establishing an automatic calculation Chinese word classification system comprising an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module, and a new word classification and labeling module;
S2: inputting a Chinese article through the input module;
S3: the new word recognition module processes the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent calculation module;
S4: the automatic intelligent calculation module analyzes the received sentences; if a new word is found, it substitutes the new word into other sentences in the new word verification module to verify its part of speech, and if the new word is confirmed, it passes the confirmed new word to the new word classification and labeling module;
S5: the new word classification and labeling module classifies and labels the confirmed new words.
2. The method according to claim 1, wherein the automatic calculation Chinese word classification system further comprises a communication module for connecting the system to the Internet or a cloud server.
3. The method according to claim 1, wherein the automatic calculation Chinese word classification system further comprises a big database, which comprises a Chinese part-of-speech database, a Chinese new word storage database, and a wrong Chinese word database, all three of which are connected to the automatic intelligent calculation module; the Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores data on candidate words that were verified in step S4 not to be new words.
4. The method according to claim 1, wherein the input module is used for inputting Chinese articles requiring new word classification and labeling.
5. The method according to claim 1, wherein the new word recognition module is used for processing the input Chinese article, finding sentences that may contain new words, and sending those sentences to the automatic intelligent calculation module.
6. The method according to claim 1, wherein the automatic intelligent calculation module is used for analyzing the received sentences that may contain new words, substituting each found new word into other sentences in the new word verification module to verify its part of speech, and passing a new word to the new word classification and labeling module once it is confirmed.
7. The method according to claim 1, wherein the new word verification module comprises predefined template sentences into which new words are substituted for verification.
8. The method according to claim 3, wherein the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words in the Chinese new word storage database.
9. The method according to claim 8, wherein the classification method calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
10. The method according to claim 8, wherein the labeled content comprises the inheritance characteristics of vocabulary, grammatical structure, sequence, and characters in the Chinese language.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010689433.8A (CN112115260A) | 2020-07-17 | 2020-07-17 | Method for automatically calculating Chinese word classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112115260A | 2020-12-22 |
Family ID: 73799640
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010689433.8A | Pending | CN112115260A | 2020-07-17 | 2020-07-17 | Method for automatically calculating Chinese word classification |
Country Status (1)
Country | Publication |
---|---|
CN (1) | CN112115260A |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101046809A (en) * | 2006-03-28 | 2007-10-03 | 吴风勇 | New word identification method based on association rule model |
CN104915327A (en) * | 2014-03-14 | 2015-09-16 | 腾讯科技(深圳)有限公司 | Text information processing method and device |
CN105138510A (en) * | 2015-08-10 | 2015-12-09 | 昆明理工大学 | Microblog-based neologism emotional tendency judgment method |
US20160364377A1 (en) * | 2015-06-12 | 2016-12-15 | Satyanarayana Krishnamurthy | Language Processing And Knowledge Building System |
CN106815189A (en) * | 2015-11-27 | 2017-06-09 | 镇江诺尼基智能技术有限公司 | A kind of new verb identifying system of Chinese and method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |