CN112115260A - Method for automatically calculating Chinese word classification - Google Patents

Publication number
CN112115260A
Authority
CN
China
Prior art keywords
Chinese, module, word, new, classification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010689433.8A
Other languages
Chinese (zh)
Inventor
张莹
彭瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Entertainment Interactive Technology Beijing Co ltd
Original Assignee
Entertainment Interactive Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-17
Filing date
2020-07-17
Publication date
2020-12-22
Application filed by Entertainment Interactive Technology Beijing Co ltd
Priority to CN202010689433.8A
Publication of CN112115260A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for automatically calculating Chinese word classification, which comprises the following steps: S1, establishing an automatic calculation Chinese word classification system, which comprises an input module, a new word recognition module, an automatic intelligent computation module, a new word verification module and a new word classification and labeling module; and S2, inputting a Chinese article through the input module. The invention calculates the classification of each word by studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits. It thereby achieves automatic classification and automatically adds multiple labels, which removes the prior-art limitation that a word is assigned to only one class.

Description

Method for automatically calculating Chinese word classification
Technical Field
The invention relates to the technical field of natural semantic recognition, in particular to a method for automatically calculating Chinese word classification.
Background
Natural language is a crystallization of human intelligence. Natural language processing (NLP) is one of the hardest problems in artificial intelligence, and research on it is both attractive and challenging. In theory, NLP is an attractive approach to human-computer interaction. Early language processing systems such as SHRDLU worked reasonably well inside a restricted "blocks world" with a restricted vocabulary. This made researchers quite optimistic, but they quickly lost confidence once the systems were extended to environments full of real-world ambiguity and uncertainty.
Natural language understanding is also regarded as an AI-complete problem, because understanding natural language requires extensive knowledge about the world and the ability to use that knowledge. At the same time, defining what "understanding" means is itself a major problem in natural language processing, and it has long been a subject of research.
Existing NLP work is well established in areas such as word segmentation, word vectors, part-of-speech analysis and IDF (inverse document frequency), but these are all applied after the part of speech is already known to be correct. Having a computer classify words automatically is a brand-new field.
However, parts of speech and word classifications are currently produced by manual curation, dictionaries and historical documents. This approach has the following technical problems:
1. Word classes are too coarse, such as noun, verb, adjective and so on; a single large class such as nouns holds millions of words, which is very inconvenient in NLP applications;
2. New words emerge endlessly, and manual curation lacks a mechanism for discovering them quickly and accurately;
3. A word may belong to several classes, but in the traditional approach a word can belong to only one class and receives only one label, which limits word classification.
Therefore, a method for automatically calculating Chinese word classification is provided.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a method for automatically calculating Chinese word classification. The method calculates the classification of each word by studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits. It thereby achieves automatic classification and automatically adds multiple labels, which removes the prior-art limitation on word classification and solves the problems described in the Background.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for automatically calculating Chinese word classification comprises the following steps:
S1, establishing an automatic calculation Chinese word classification system, which comprises an input module, a new word recognition module, an automatic intelligent computation module, a new word verification module and a new word classification and labeling module;
S2, inputting a Chinese article through the input module;
S3, the new word recognition module scans the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent computation module;
S4, the automatic intelligent computation module analyzes the received sentences; if it finds a candidate new word, it substitutes the candidate into other sentences in the new word verification module to verify its part of speech, and if the candidate is confirmed as a new word, it passes the confirmed new word to the new word classification and labeling module;
S5, the new word classification and labeling module classifies and labels the confirmed new words it receives.
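The flow of steps S2 to S5 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the class name `ClassificationPipeline`, the helper `split_sentences` and the three callables `propose`, `verify` and `label` are hypothetical stand-ins for the automatic intelligent computation module, the new word verification module and the new word classification and labeling module, whose internal algorithms the text does not specify.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def split_sentences(article: str) -> List[str]:
    """Split an article on common Chinese and Western sentence-ending marks."""
    parts = re.split(r"[。！？!?]", article)
    return [p.strip() for p in parts if p.strip()]


@dataclass
class ClassificationPipeline:
    # All fields are assumptions made for this sketch.
    known_words: Set[str]                    # words already in the dictionary
    propose: Callable[[str], List[str]]      # sentence -> candidate new words
    verify: Callable[[str], bool]            # candidate -> confirmed as a new word?
    label: Callable[[str], List[str]]        # confirmed word -> one or more labels
    rejected: Set[str] = field(default_factory=set)

    def run(self, article: str) -> Dict[str, List[str]]:
        labelled: Dict[str, List[str]] = {}
        for sentence in split_sentences(article):        # S2 and S3
            for candidate in self.propose(sentence):     # S4: analysis
                if candidate in self.known_words or candidate in self.rejected:
                    continue
                if self.verify(candidate):               # S4: verification
                    labelled[candidate] = self.label(candidate)  # S5
                else:
                    self.rejected.add(candidate)
        return labelled
```

Keeping the three stages as injected callables lets each module be replaced independently, which mirrors the modular structure described in step S1.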
Furthermore, the automatic calculation Chinese word classification system further comprises a communication module, and the communication module is used for connecting the automatic calculation Chinese word classification system with the internet or a cloud server.
Further, the automatic calculation Chinese word classification system further comprises a large database, which comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all connected to the automatic intelligent computation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores the candidates that were verified in step S4 not to be new words.
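As a rough illustration of how the three databases relate to one another, the sketch below models them as in-memory Python containers. The class name `WordDatabases`, its method names and the status strings are assumptions made for this example; the text does not describe a schema or storage technology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class WordDatabases:
    # Chinese part-of-speech database: known word -> part of speech.
    part_of_speech: Dict[str, str] = field(default_factory=dict)
    # Chinese new word storage database: new word -> labels assigned in S5.
    new_words: Dict[str, List[str]] = field(default_factory=dict)
    # Wrong Chinese word database: candidates rejected during S4.
    wrong_words: Set[str] = field(default_factory=set)

    def status(self, word: str) -> str:
        """Report which of the three stores, if any, already holds the word."""
        if word in self.part_of_speech:
            return "known"
        if word in self.new_words:
            return "new"
        if word in self.wrong_words:
            return "wrong"
        return "unseen"

    def store_new_word(self, word: str, labels: List[str]) -> None:
        """Record a confirmed new word together with its labels (step S5)."""
        self.wrong_words.discard(word)
        self.new_words[word] = list(labels)

    def store_wrong_word(self, word: str) -> None:
        """Record a candidate that failed verification (step S4)."""
        if word not in self.new_words:
            self.wrong_words.add(word)
```

Checking `status` before analysing a candidate avoids recomputing words that were already accepted or rejected.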
Further, the input module is used for inputting Chinese articles needing new word classification and labeling.
Further, the new word recognition module is configured to scan the input Chinese article, find sentences that may contain new words, and send those sentences to the automatic intelligent computation module.
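The text does not say how sentences with possible new words are found. One common dictionary-based heuristic, shown here purely as an assumed stand-in, is forward maximum matching: segment the sentence against the known vocabulary and treat any run of two or more adjacent unmatched characters as a candidate new word. The function names and the `max_len` parameter are hypothetical, and the sketch assumes punctuation has already been stripped from the sentence.

```python
from typing import List, Set


def forward_max_match(sentence: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation; unknown characters become 1-char tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def candidate_new_words(sentence: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Join adjacent out-of-vocabulary tokens; keep runs of two or more characters."""
    candidates: List[str] = []
    run = ""
    for token in forward_max_match(sentence, vocab, max_len):
        if token in vocab:
            if len(run) >= 2:
                candidates.append(run)
            run = ""
        else:
            run += token
    if len(run) >= 2:
        candidates.append(run)
    return candidates


def sentences_with_possible_new_words(sentences: List[str], vocab: Set[str]) -> List[str]:
    """Return only the sentences that contain at least one candidate."""
    return [s for s in sentences if candidate_new_words(s, vocab)]
```

With a toy vocabulary of `{"我", "的", "回来", "了"}`, the sentence `我的老爸回来了` yields the single candidate `老爸`, so that sentence would be forwarded for further analysis.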
Further, the automatic intelligent computation module is configured to analyze the received sentences that may contain new words, substitute any candidate new word into other sentences in the new word verification module to verify its part of speech, and pass the candidate to the new word classification and labeling module once it is confirmed as a new word.
Furthermore, the new word verification module comprises predefined template sentences, into which the new word is substituted for verification.
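Template-based verification might look like the following sketch. The template sentences, the `is_plausible` callback and the `min_ratio` threshold are all assumptions introduced for illustration; the text states only that the new word is substituted into template sentences, not how the resulting sentences are judged.

```python
from typing import Callable, Sequence

# Hypothetical templates in which a noun is expected at the "{}" slot.
NOUN_TEMPLATES = ("这是我的{}。", "我看见了{}。")


def verify_with_templates(
    word: str,
    templates: Sequence[str],
    is_plausible: Callable[[str], bool],
    min_ratio: float = 0.5,
) -> bool:
    """Fill each template with the word and accept it if enough sentences pass.

    `is_plausible` stands in for whatever check judges a filled sentence,
    for example a corpus lookup or a language-model score threshold.
    """
    if not templates:
        raise ValueError("at least one template sentence is required")
    filled = [template.format(word) for template in templates]
    passed = sum(1 for sentence in filled if is_plausible(sentence))
    return passed / len(filled) >= min_ratio
```

A candidate that fits noun templates but fails verb templates would then be treated as a noun; running the same function with several template sets gives one verdict per part of speech.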
Further, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words into the Chinese new word storage database.
Furthermore, the classification method automatically calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
Furthermore, the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.
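One possible reading of classification by samples with inherited character features is sketched below: each label has a few sample words, and a new word receives every label whose samples share enough characters with it. The scoring rule, the `threshold` value and the sample labels are assumptions made for this example, since the text does not give a formula.

```python
from typing import Dict, List


def multi_label(
    word: str,
    samples_by_label: Dict[str, List[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Return every label whose sample words share enough characters with `word`."""
    if not word:
        return []
    labels: List[str] = []
    for label, samples in samples_by_label.items():
        # Characters seen in this label's classification samples.
        profile = {ch for sample in samples for ch in sample}
        # Fraction of the word's characters inherited from the samples.
        overlap = sum(1 for ch in word if ch in profile) / len(word)
        if overlap >= threshold:
            labels.append(label)
    return sorted(labels)
```

Because each label is scored independently, a word can receive several labels at once instead of being forced into a single class.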
In summary, the invention mainly has the following beneficial effects:
The invention calculates the classification of each word by studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits. It thereby achieves automatic classification and automatically adds multiple labels, which removes the prior-art limitation on word classification.
Drawings
FIG. 1 is a flow diagram of a method for automatically calculating Chinese word classification according to one embodiment;
FIG. 2 is a block diagram of the automatic calculation Chinese word classification system used in the method according to one embodiment;
FIG. 3 is a schematic structural diagram of the large database in the automatic calculation Chinese word classification system according to one embodiment.
Detailed Description
The present invention is described in further detail below with reference to FIGS. 1-3.
Example 1
A method for automatically calculating Chinese word classification, as shown in FIGS. 1-2, comprises the following steps:
S1, establishing an automatic calculation Chinese word classification system, which comprises an input module, a new word recognition module, an automatic intelligent computation module, a new word verification module and a new word classification and labeling module;
S2, inputting a Chinese article through the input module;
S3, the new word recognition module scans the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent computation module;
S4, the automatic intelligent computation module analyzes the received sentences; if it finds a candidate new word, it substitutes the candidate into other sentences in the new word verification module to verify its part of speech, and if the candidate is confirmed as a new word, it passes the confirmed new word to the new word classification and labeling module;
S5, the new word classification and labeling module classifies and labels the confirmed new words it receives.
Preferably, as shown in FIG. 2, the automatic calculation Chinese word classification system further comprises a communication module, which is used for connecting the system to the Internet or a cloud server.
Preferably, as shown in FIGS. 2 and 3, the automatic calculation Chinese word classification system further comprises a large database, which comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all connected to the automatic intelligent computation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores the candidates that were verified in step S4 not to be new words.
Preferably, the input module is used for inputting Chinese articles that need new word classification and labeling.
Preferably, the new word recognition module is used for scanning the input Chinese article, finding sentences that may contain new words, and sending those sentences to the automatic intelligent computation module.
Preferably, the automatic intelligent computation module is used for analyzing the received sentences that may contain new words, substituting any candidate new word into other sentences in the new word verification module to verify its part of speech, and passing the candidate to the new word classification and labeling module once it is confirmed as a new word.
Preferably, the new word verification module comprises predefined template sentences, into which the new word is substituted for verification.
Preferably, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words in the Chinese new word storage database.
Preferably, the classification method automatically calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
Preferably, the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.
Example 2
This example illustrates the scheme proposed by the present invention:
The names used for people include "father", "mother" and the like; these names are first separated out from the nouns to form a classification of their own. As times change, many new names appear, such as "father", "father" and even "crime". As soon as such a name appears, an algorithm can recognize it as a new word; the automatic calculation Chinese word classification system then calculates that its part of speech is noun and classifies it as a name.
In summary, the invention calculates the classification of each word by studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits. It thereby achieves automatic classification and automatically adds multiple labels, which removes the prior-art limitation on word classification.
The parts not covered by the present invention are the same as the prior art or can be implemented with it. The present embodiment only explains the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the invention.

Claims (10)

1. A method for automatically calculating Chinese word classification, characterized in that the method comprises the following steps:
S1, establishing an automatic calculation Chinese word classification system, which comprises an input module, a new word recognition module, an automatic intelligent computation module, a new word verification module and a new word classification and labeling module;
S2, inputting a Chinese article through the input module;
S3, the new word recognition module scans the input Chinese article, finds sentences that may contain new words, and sends those sentences to the automatic intelligent computation module;
S4, the automatic intelligent computation module analyzes the received sentences; if it finds a candidate new word, it substitutes the candidate into other sentences in the new word verification module to verify its part of speech, and if the candidate is confirmed as a new word, it passes the confirmed new word to the new word classification and labeling module;
S5, the new word classification and labeling module classifies and labels the confirmed new words it receives.
2. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic calculation Chinese word classification system further comprises a communication module, which is used for connecting the system to the Internet or a cloud server.
3. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic calculation Chinese word classification system further comprises a large database, which comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all connected to the automatic intelligent computation module; the Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores the candidates that were verified in step S4 not to be new words.
4. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the input module is used for inputting Chinese articles that need new word classification and labeling.
5. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the new word recognition module is used for scanning the input Chinese article, finding sentences that may contain new words, and sending those sentences to the automatic intelligent computation module.
6. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic intelligent computation module is used for analyzing the received sentences that may contain new words, substituting any candidate new word into other sentences in the new word verification module to verify its part of speech, and passing the candidate to the new word classification and labeling module once it is confirmed as a new word.
7. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the new word verification module comprises predefined template sentences, into which the new word is substituted for verification.
8. The method for automatically calculating Chinese word classification according to claim 3, characterized in that: the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words in the Chinese new word storage database.
9. The method for automatically calculating Chinese word classification according to claim 8, characterized in that: the classification method automatically calculates the classification of each word by establishing classification samples and collecting language habits, and automatically adds multiple labels.
10. The method for automatically calculating Chinese word classification according to claim 8, characterized in that: the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.
CN202010689433.8A 2020-07-17 2020-07-17 Method for automatically calculating Chinese word classification Pending CN112115260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010689433.8A CN112115260A (en) 2020-07-17 2020-07-17 Method for automatically calculating Chinese word classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010689433.8A CN112115260A (en) 2020-07-17 2020-07-17 Method for automatically calculating Chinese word classification

Publications (1)

Publication Number Publication Date
CN112115260A (en) 2020-12-22

Family

ID=73799640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010689433.8A Pending CN112115260A (en) 2020-07-17 2020-07-17 Method for automatically calculating Chinese word classification

Country Status (1)

Country Link
CN (1) CN112115260A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101046809A (en) * 2006-03-28 2007-10-03 吴风勇 New word identification method based on association rule model
CN104915327A (en) * 2014-03-14 2015-09-16 腾讯科技(深圳)有限公司 Text information processing method and device
CN105138510A (en) * 2015-08-10 2015-12-09 昆明理工大学 Microblog-based neologism emotional tendency judgment method
US20160364377A1 (en) * 2015-06-12 2016-12-15 Satyanarayana Krishnamurthy Language Processing And Knowledge Building System
CN106815189A (en) * 2015-11-27 2017-06-09 镇江诺尼基智能技术有限公司 A kind of new verb identifying system of Chinese and method

Similar Documents

Publication Publication Date Title
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN109241255B (en) Intention identification method based on deep learning
CN106407333B (en) Spoken language query identification method and device based on artificial intelligence
CN108304468B (en) Text classification method and text classification device
CN110110327B (en) Text labeling method and equipment based on counterstudy
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
Boltužić et al. Fill the gap! analyzing implicit premises between claims from online debates
CN116795973B (en) Text processing method and device based on artificial intelligence, electronic equipment and medium
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN115080750B (en) Weak supervision text classification method, system and device based on fusion prompt sequence
CN112580330B (en) Vietnam news event detection method based on Chinese trigger word guidance
CN113157859A (en) Event detection method based on upper concept information
CN114329225A (en) Search method, device, equipment and storage medium based on search statement
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113051922A (en) Triple extraction method and system based on deep learning
CN113326702A (en) Semantic recognition method and device, electronic equipment and storage medium
CN107480197B (en) Entity word recognition method and device
Yao Attention-based BiLSTM neural networks for sentiment classification of short texts
CN117332789A (en) Semantic analysis method and system for dialogue scene
CN111783464A (en) Electric power-oriented domain entity identification method, system and storage medium
CN115600595A (en) Entity relationship extraction method, system, equipment and readable storage medium
CN112115260A (en) Method for automatically calculating Chinese word classification
CN113468311B (en) Knowledge graph-based complex question and answer method, device and storage medium
CN113626553B (en) Cascade binary Chinese entity relation extraction method based on pre-training model
CN115238077A (en) Text analysis method, device and equipment based on artificial intelligence and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination