CN112115260A - Method for automatically calculating Chinese word classification

Info

Publication number: CN112115260A
Application number: CN202010689433.8A
Authority: CN
Inventors: 张莹; 彭瑶
Original assignee: Entertainment Interactive Technology Beijing Co ltd
Current assignee: Entertainment Interactive Technology Beijing Co ltd
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2020-12-22

Abstract

The invention discloses a method for automatically calculating Chinese word classification, comprising the following steps: S1, establishing an automatic calculation Chinese word classification system, which comprises an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module and a new word classification and labeling module; and S2, inputting a Chinese article through the input module. By studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits, the invention calculates the classification of each word, achieves automatic classification and automatically adds multiple labels, thereby effectively overcoming the limitations on word classification in the prior art.

Description

Method for automatically calculating Chinese word classification

Technical Field

The invention relates to the technical field of natural semantic recognition, in particular to a method for automatically calculating Chinese word classification.

Background

Natural language is a crystallization of human intelligence. Natural language processing is one of the most difficult problems in artificial intelligence, and research on it is both attractive and challenging. In theory, natural semantic recognition (NLP) is an attractive means of human-computer interaction. Early language processing systems such as SHRDLU worked reasonably well when operating in a restricted "blocks world" with a limited vocabulary. This made researchers quite optimistic about such systems; however, they quickly lost confidence once the systems were extended to environments full of real-world ambiguity and uncertainty.

Natural language understanding is also regarded as an artificial intelligence complete (AI-complete) problem, because understanding natural language requires extensive knowledge about the world and the ability to use that knowledge. Meanwhile, in natural language processing, the definition of "understanding" has itself become a major problem, and research on this definitional question has attracted continuing interest.

Existing NLP is relatively mature in areas such as word segmentation, word vectors, part-of-speech correlation and IDF, but all of these are applications that presuppose correct part-of-speech information. Automatic classification of words by a computer is a brand-new field.

However, parts of speech and word classifications are currently produced by manual sorting, dictionaries and historical documents, and this approach has the following technical problems:

1. the classification of words is too coarse, for example nouns, verbs, adjectives and so on; a broad class such as nouns contains millions of words, which is very inconvenient in NLP applications;

2. new words emerge endlessly, and manual sorting lacks both a rapid discovery mechanism and accuracy;

3. a word may belong to several classes, but in the traditional approach a word can belong to only one class and carry only one label, which limits the classification of words to a certain extent.

Therefore, a method for automatically calculating Chinese word classification is provided.

Disclosure of Invention

In view of the defects of the prior art, the invention aims to provide a method for automatically calculating Chinese word classification. By studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits, the method calculates the classification of each word, achieves automatic classification and automatically adds multiple labels, thereby effectively overcoming the limitations on word classification in the prior art and solving the problems described in the Background.

In order to achieve the above purpose, the invention provides the following technical solution:

A method for automatically calculating Chinese word classification comprises the following steps:

S1, establishing an automatic calculation Chinese word classification system, wherein the automatic calculation Chinese word classification system comprises an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module and a new word classification and labeling module;

S2, inputting a Chinese article through the input module;

S3, the new word recognition module recognizes the input Chinese article, finds sentences that may contain new words, and sends the found sentences to the automatic intelligent calculation module;

S4, the automatic intelligent calculation module analyzes the received sentences that may contain new words; if a new word is found, the module substitutes it into other sentences in the new word verification module to verify its part of speech; if the word is verified to be a new word, the module passes the confirmed new word to the new word classification and labeling module;

S5, the new word classification and labeling module classifies and labels the confirmed new words passed to it.
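As a minimal, non-limiting sketch, the control flow of steps S2 to S5 might be arranged as follows. The specification does not disclose the concrete algorithms, so everything here is an assumption made for illustration: the function name `run_pipeline`, the pre-segmented input (lists of tokens), the rule that a token outside the known vocabulary is a candidate new word, and the `verify` and `label` callables that stand in for the new word verification module and the new word classification and labeling module. The sample words are likewise hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def run_pipeline(
    article: Iterable[List[str]],
    known_words: Set[str],
    verify: Callable[[str], bool],
    label: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Run steps S2-S5 over an article given as pre-segmented sentences."""
    labelled: Dict[str, List[str]] = {}
    rejected: Set[str] = set()
    for sentence in article:  # S2: the article enters through the input module
        # S3 (assumed rule): tokens outside the known vocabulary are candidates
        candidates = [tok for tok in sentence if tok not in known_words]
        for word in candidates:
            if word in labelled or word in rejected:
                continue  # already handled in an earlier sentence
            if verify(word):  # S4: verification of the candidate
                labelled[word] = label(word)  # S5: classification and labeling
            else:
                rejected.add(word)
    return labelled, rejected


# Hypothetical toy input: two segmented sentences and a tiny known vocabulary.
article = [["我", "的", "爸比", "很", "高"], ["今天", "天气", "很", "好啊啊"]]
known = {"我", "的", "很", "高", "今天", "天气"}
labelled, rejected = run_pipeline(
    article,
    known,
    verify=lambda w: w == "爸比",              # stand-in for the verification module
    label=lambda w: ["noun", "appellation"],  # stand-in for the labeling module
)
```

Passing the verification and labeling behavior in as callables keeps the sketch neutral about how those modules work internally, which the specification leaves open.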

Furthermore, the automatic calculation Chinese word classification system further comprises a communication module, and the communication module is used for connecting the automatic calculation Chinese word classification system with the internet or a cloud server.

Further, the automatic calculation Chinese word classification system further comprises a large database, which comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all of which are connected to the automatic intelligent calculation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores the wrong words that were verified in step S4 not to be new words.
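The three stores described above can be pictured with a small in-memory structure. This is only a sketch under stated assumptions: the class name `WordStores`, its field and method names, and the choice of dictionaries and sets are hypothetical, and the specification does not say how the databases are actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class WordStores:
    """Hypothetical in-memory stand-in for the large database."""
    # Chinese part-of-speech database: known word -> parts of speech
    part_of_speech: Dict[str, Set[str]] = field(default_factory=dict)
    # Chinese new word storage database: confirmed new word -> labels (step S5)
    new_words: Dict[str, Set[str]] = field(default_factory=dict)
    # Wrong Chinese word database: candidates rejected in step S4
    wrong_words: Set[str] = field(default_factory=set)

    def is_known(self, word: str) -> bool:
        """True if the word is already classified, whether old or new."""
        return word in self.part_of_speech or word in self.new_words

    def record_new(self, word: str, labels: Iterable[str]) -> None:
        """Store a confirmed new word; repeated calls accumulate labels."""
        self.new_words.setdefault(word, set()).update(labels)

    def record_wrong(self, word: str) -> None:
        """Store a candidate that was verified not to be a new word."""
        self.wrong_words.add(word)


stores = WordStores(part_of_speech={"父亲": {"noun"}})
stores.record_new("爸比", ["noun"])
stores.record_new("爸比", ["appellation"])
stores.record_wrong("好啊啊")
```

Storing labels as a set per word reflects the stated aim that one word may carry several labels.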

Further, the input module is used for inputting Chinese articles needing new word classification and labeling.

Further, the new word recognition module is configured to recognize the input Chinese article, find sentences that may contain new words, and send the found sentences to the automatic intelligent calculation module.

Further, the automatic intelligent calculation module is configured to analyze the received sentences that may contain new words, substitute any new word found into other sentences in the new word verification module to verify its part of speech, and, when the word is confirmed to be a new word, pass it to the new word classification and labeling module.

Furthermore, the new word verification module comprises predefined template sentences, into which a new word is substituted for verification.
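One way to read the template-sentence verification is sketched below. The specification does not state how a filled template is judged, so this sketch assumes a caller-supplied `is_acceptable` function (for example a corpus lookup or a language-model score threshold). The function name `verify_part_of_speech`, the `{}` slot convention and the sample templates are all hypothetical.

```python
from typing import Callable, Dict, List


def verify_part_of_speech(
    word: str,
    templates: Dict[str, List[str]],
    is_acceptable: Callable[[str], bool],
) -> List[str]:
    """Return the parts of speech whose template sentences all accept the word.

    `templates` maps a part of speech to template sentences containing one
    "{}" slot; the candidate word is substituted into the slot.
    """
    accepted = []
    for pos, sentences in templates.items():
        filled = [s.format(word) for s in sentences]
        # Assumed rule: every filled template of this part of speech must be acceptable
        if filled and all(is_acceptable(f) for f in filled):
            accepted.append(pos)
    return accepted


# Hypothetical templates and a toy "corpus" of attested sentences.
templates = {"noun": ["这是我的{}。"], "verb": ["我想{}一下。"]}
attested = {"这是我的爸比。"}
result = verify_part_of_speech("爸比", templates, lambda s: s in attested)
```

An empty result would correspond to a candidate that fails verification and is recorded as a wrong word.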

Further, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words into the Chinese new word storage database.

Furthermore, the classification method is to automatically calculate the classification of each word by establishing classification samples and collecting language habits, and to automatically add multiple labels.

Furthermore, the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.
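To make the idea of classification samples and multiple labels concrete, the following sketch assigns every label whose sample words share enough characters with the new word. This character-overlap score is a simple stand-in chosen for illustration and is not the calculation disclosed by the specification; the function name `classify`, the `threshold` parameter and the sample data are assumptions.

```python
from typing import Dict, List


def classify(word: str, samples: Dict[str, List[str]], threshold: float = 0.5) -> List[str]:
    """Return every label whose sample words share enough characters with `word`."""
    if not word:
        return []
    labels = []
    for label, sample_words in samples.items():
        sample_chars = set("".join(sample_words))
        # Fraction of the word's characters that also occur in this label's samples
        overlap = sum(ch in sample_chars for ch in word) / len(word)
        if overlap >= threshold:
            labels.append(label)  # a word may collect several labels
    return labels


# Hypothetical classification samples: label -> example words.
samples = {
    "noun": ["爸爸", "米饭", "父亲"],
    "appellation": ["爸爸", "妈妈", "父亲"],
    "food": ["米饭", "面条"],
}
labels = classify("爸比", samples)
```

Because each label is tested independently, a word is not forced into a single class, which matches the stated goal of adding multiple labels.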

In summary, the invention mainly has the following beneficial effects:

By studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits, the invention calculates the classification of each word, achieves automatic classification and automatically adds multiple labels, thereby effectively overcoming the limitations on word classification in the prior art.

Drawings

FIG. 1 is a flow diagram of the method for automatically calculating Chinese word classification according to an embodiment;

FIG. 2 is a block diagram of the automatic calculation Chinese word classification system used in the method according to an embodiment;

FIG. 3 is a schematic structural diagram of the large database in the automatic calculation Chinese word classification system according to an embodiment.

Detailed Description

The present invention is described in further detail below with reference to figures 1-3.

Example 1

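A method for automatically calculating Chinese word classification, as shown in FIGS. 1-2, comprises the following steps:

S1, establishing an automatic calculation Chinese word classification system, wherein the automatic calculation Chinese word classification system comprises an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module and a new word classification and labeling module;

S2, inputting a Chinese article through the input module;

S3, the new word recognition module recognizes the input Chinese article, finds sentences that may contain new words, and sends the found sentences to the automatic intelligent calculation module;

S4, the automatic intelligent calculation module analyzes the received sentences that may contain new words; if a new word is found, the module substitutes it into other sentences in the new word verification module to verify its part of speech; if the word is verified to be a new word, the module passes the confirmed new word to the new word classification and labeling module;

S5, the new word classification and labeling module classifies and labels the confirmed new words passed to it.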

Preferably, as shown in FIG. 2, the automatic calculation Chinese word classification system further comprises a communication module, which is used for connecting the automatic calculation Chinese word classification system to the Internet or a cloud server.

Preferably, as shown in FIGS. 2 and 3, the automatic calculation Chinese word classification system further comprises a large database, which comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all of which are connected to the automatic intelligent calculation module. The Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database stores the new word data classified and labeled in step S5; and the wrong Chinese word database stores the wrong words that were verified in step S4 not to be new words.

Preferably, the input module is used for inputting Chinese articles needing new word classification and labeling.

Preferably, the new word recognition module is used for recognizing the input Chinese article, finding sentences that may contain new words, and sending the found sentences to the automatic intelligent calculation module.

Preferably, the automatic intelligent calculation module is configured to analyze the received sentences that may contain new words, substitute any new word found into other sentences in the new word verification module to verify its part of speech, and, when the word is confirmed to be a new word, pass it to the new word classification and labeling module.

Preferably, the new word verification module comprises predefined template sentences, into which a new word is substituted for verification.

Preferably, the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words into the Chinese new word storage database.

Preferably, the classification method is to automatically calculate the classification of each word by establishing classification samples and collecting language habits, and to automatically add multiple labels.

Preferably, the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.

Example 2

This example illustrates the scheme proposed by the present invention:

Terms of address for people include "father", "mother" and the like. These terms are first separated out from nouns to form a class of their own. As times change, many new terms of address appear (rendered in this translation as "father", "father" and even "crime"). Once such a term appears, an algorithm can recognize it as a new word at the first opportunity; the automatic calculation Chinese word classification system then calculates that the part of speech of the word is noun and classifies it as a term of address.

In summary, by studying the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the language, and by establishing classification samples and collecting language habits, the invention calculates the classification of each word, achieves automatic classification and automatically adds multiple labels, thereby effectively overcoming the limitations on word classification in the prior art.

Matters not addressed in the present invention are the same as the prior art or can be implemented with it. This embodiment merely explains the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law insofar as they fall within the scope of the claims of the invention.

Claims

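1. A method for automatically calculating Chinese word classification, characterized in that the method comprises the following steps:

S1, establishing an automatic calculation Chinese word classification system, wherein the automatic calculation Chinese word classification system comprises an input module, a new word recognition module, an automatic intelligent calculation module, a new word verification module and a new word classification and labeling module;

S2, inputting a Chinese article through the input module;

S3, the new word recognition module recognizes the input Chinese article, finds sentences that may contain new words, and sends the found sentences to the automatic intelligent calculation module;

S4, the automatic intelligent calculation module analyzes the received sentences that may contain new words; if a new word is found, the module substitutes it into other sentences in the new word verification module to verify its part of speech; if the word is verified to be a new word, the module passes the confirmed new word to the new word classification and labeling module;

S5, the new word classification and labeling module classifies and labels the confirmed new words passed to it.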

2. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic calculation Chinese word classification system further comprises a communication module, and the communication module is used for connecting the automatic calculation Chinese word classification system to the Internet or a cloud server.

3. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic calculation Chinese word classification system further comprises a large database; the large database comprises a Chinese part-of-speech database, a Chinese new word storage database and a wrong Chinese word database, all of which are connected to the automatic intelligent calculation module; the Chinese part-of-speech database stores all known Chinese part-of-speech data; the Chinese new word storage database is used for storing the new word data classified and labeled in step S5; and the wrong Chinese word database is used for storing the wrong words that were verified in step S4 not to be new words.

4. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the input module is used for inputting Chinese articles that require new word classification and labeling.

5. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the new word recognition module is used for recognizing the input Chinese article, finding sentences that may contain new words, and sending the found sentences to the automatic intelligent calculation module.

6. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the automatic intelligent calculation module is used for analyzing the received sentences that may contain new words, substituting any new word found into other sentences in the new word verification module to verify its part of speech, and, when the word is confirmed to be a new word, passing it to the new word classification and labeling module.

7. The method for automatically calculating Chinese word classification according to claim 1, characterized in that: the new word verification module comprises predefined template sentences, into which a new word is substituted for verification.

8. The method for automatically calculating Chinese word classification according to claim 3, characterized in that: the new word classification and labeling module is used for classifying and labeling the confirmed new words and storing the classified and labeled new words in the Chinese new word storage database.

9. The method for automatically calculating Chinese word classification according to claim 8, characterized in that: the classification method is to automatically calculate the classification of each word by establishing classification samples and collecting language habits, and to automatically add multiple labels.

10. The method for automatically calculating Chinese word classification according to claim 8, characterized in that: the labeled content is the inheritance characteristics of vocabulary, grammatical structure, sequence and character features in the Chinese language.