CN114417893A - Language couplet information processing system, method and construction method - Google Patents

Language couplet information processing system, method and construction method

Info

Publication number
CN114417893A
CN114417893A (application CN202111517180.7A)
Authority
CN
China
Prior art keywords
couplet
language
input
information processing
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111517180.7A
Other languages
Chinese (zh)
Inventor
赵川
尹中
贺鹏
吴畏
黄静雯
周宣志
涂德志
王圆圆
郑雪
唐健
岳鹏
朱洪波
陈永俊
李晓喆
杜玲
卢尧
李晓
彭敦峰
李晟
马源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202111517180.7A priority Critical patent/CN114417893A/en
Publication of CN114417893A publication Critical patent/CN114417893A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/49Data-driven translation using very large corpora, e.g. the web
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/44Statistical methods, e.g. probability models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of natural language processing and discloses a language couplet information processing system, a processing method and a construction method. The system specifically comprises: an input module, used for entering the upper line of a couplet through an input box; a corpus, used for storing couplet-related data; a couplet generation module, used for triggering the corresponding algorithm logic to generate the lower line from the upper line supplied by the input module; and an output module, used for displaying the generated lower line in an output box. The couplet system realized by the invention is a concrete response to the classic "couplet matching" (duiduizi) problem. By exploring the topological properties of language, the invention can understand the syntactic structure of Chinese more comprehensively and holistically. From a cognitive angle, the invention studies the rules of the three dimensions of semantics, syntax and pragmatics, moving from semantic space to grammar space and from plane space to three-dimensional topological space, thereby breaking the boundaries among the three.

Description

Language couplet information processing system, method and construction method
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a language couplet information processing system, a processing method and a construction method.
Background
Chinese belongs to the Sino-Tibetan language family and has at least 1.5 billion users. Its structure differs considerably from that of English, so research methods must differ accordingly. The existing natural language understanding methods used for Chinese were developed on the basis of English, and applying them directly to Chinese has proved difficult. A major difference is that Chinese text must be segmented into words, whereas English words are naturally delimited by spaces. The natural language understanding of Chinese and English is therefore very different.
Natural language understanding of Chinese must therefore find a new solution. In the course of cross-disciplinary reading, the inventor encountered a passage in which Mr. Ji Xianlin, commemorating Mr. Zhao Yuanren, pointed out that a clarified grammar of Chinese has not yet been established. The inventor took this debate seriously and named the issue the "couplet matching" (duiduizi) problem; see the monograph "Frontiers of Intelligent Science Research" published by Science Press. It is also believed that resolving this problem must be approached from the viewpoints of mathematics, formal semantics and artificial intelligence within natural language understanding. Research has continued in the intelligent science laboratory where, through the practice of many students, the work has moved from the symmetry of language to the topological study of language. The present work is a brief stage in this ongoing study of the symmetry and programming of language.
Topology is a sub-field of mathematics that mainly studies the properties of a geometric figure or space that remain unchanged under continuous deformation. It is widely applied in disciplines such as physics and economics. In recent years, linguists have begun to apply topological concepts to language studies, such as bilingual translation, cognitive linguistics, and the study of recursion in language. Systematic study of topological theory reveals three major properties — topological equivalence, connectedness of topological spaces, and homeomorphic mapping — each of which is expressed in different forms in language.
Natural language generally refers to a language that develops naturally with human culture. French, German and Korean are natural languages, while Esperanto is an artificial language, one created for a specific purpose. Sometimes, however, all languages people speak are considered "natural" languages, as distinguished from programming languages, the languages understood by computers; this usage is common in the field of natural language processing. Natural language is the primary tool of human communication and thought.
Natural Language Understanding (NLU), as a branch of artificial intelligence, is a point of origin and breakthrough for artificial intelligence and intelligent science research generally. It studies how to use computers to understand and imitate the generation of human language, so that computers can understand and use the natural languages of human society, such as Chinese and English, achieving natural language communication between humans and computers. It can also take over some human mental activities, including searching for information, answering questions, extracting documents, compiling data, and processing all manner of natural language information.
Natural Language Processing (NLP) is the technology of performing various kinds of processing on human written and spoken language by computer. It is an important branch of computer science and artificial intelligence, studying theories and methods that enable effective natural-language communication between people and computers. Natural language processing combines linguistics, computer science, mathematics and related fields.
At present, natural language processing technology is widely applied in people's daily life, study and work, and brings great convenience. However, Chinese search, Chinese speech recognition and Chinese OCR are all less mature than their English counterparts. Natural language processing covers many tasks, such as text classification, dialogue systems and machine translation. The query function people use most often relies on search engine technology.
When a user enters a long question into a search engine, the computer should be able to give an accurate answer. Although internet search engines can navigate a very large range of knowledge, they remain far from intelligent assistants. One obvious difference is that an internet search query is broken down into a few keywords rather than the actual semantics expressed in natural language. For a long query, a search engine can give only poor results or return nothing of value, so people must translate their question into one or more appropriate keywords to obtain a reasonably relevant response. A long-standing goal of information retrieval research has been to develop retrieval models that provide accurate results for longer, more specific queries. This requires a better understanding of textual information.
In the early stages, when computers were first invented, research already began on analyzing human natural language with computers, though at that time mainly for strategic reasons. Inspired by the deciphering of military ciphers, researchers supposed that the difference between languages was merely a different encoding of the "same semantics". This was the idea behind the earliest machine translation theory and represents the first era of natural language processing.
After the 1970s, with great improvements in computer hardware, processing of medium-scale (million-word) corpora became possible. Over more than a decade of development, natural language processing gradually became an independent field of artificial intelligence. At this time it also split into two schools: the rule-based school and the statistical school. The rule-based school is grounded in linguistic theory: linguists use rules to describe or explain ambiguous phenomena based on their understanding of language. The statistical school takes an empirical approach based on corpus statistics: it emphasizes mathematics, seeking knowledge that captures the regularities of natural language in large-scale real text and extracting statistical rules from it.
By the 1990s, the appearance of the internet had drastically changed people's way of life. The basic language component of a Chinese internet search engine is Chinese word segmentation. Accompanying these breakthroughs was the emergence of a series of new algorithmic systems, collectively referred to as "machine learning". Most of these methods simulate human cognitive behavior, building on theories of how neurons and the brain work. Machine learning achieves accurate and stable results on multi-dimensional, non-linear problems, such as Chinese word segmentation and part-of-speech tagging of large-scale corpora.
In the early 21st century, in 2006, several scientists including Hinton, after nearly 20 years of effort, designed the first multi-layer neural network algorithms. Because a multi-layer architecture realizes the ability to learn abstract representations, the approach was named deep learning. Starting from word2vec (a word-vector computing tool) in 2013, deep learning methods have been widely used in the field of natural language processing. In 2014 the focus was on various new word-vector representations; from 2015 the work extended to the sentence level, and models such as CNNs (convolutional neural networks), RNNs (recurrent neural networks) and LSTMs (long short-term memory networks) appeared in succession, bringing great progress in machine translation, document summarization, question answering and other tasks. In March 2016, AlphaGo, developed by DeepMind, defeated the top Korean Go player Lee Sedol in a match watched worldwide, pushing public attention to deep learning to a new climax.
In summary, the problems of the prior art are as follows:
(1) Existing Chinese search technology, Chinese speech recognition and Chinese OCR are not mature enough; the understanding of Chinese is not deep enough and does not follow the rules of Chinese.
(2) The actual problems expressed in natural language cannot be fully described and queried; existing Chinese text information is not well understood.
(3) Existing couplet systems simply match an input against upper couplets already stored in a database, and cannot truly generate couplets automatically. The symmetry of language is especially prominent in Chinese, and this is the very starting point of the computer system implemented here. The Qing-dynasty writer Li Yu summarized this symmetry in "Li Weng Dui Yun" (笠翁对韵), but only as a collection and arrangement at the level of phenomena, without a rational, logical classification. "Li Weng Dui Yun" is the earliest corpus of the couplet system of the invention: although it contains only about three thousand characters, it completes an arrangement of the semantically symmetrical words of Chinese, so it is also a rule base. This is the view of "Li Weng Dui Yun" from the perspective of natural language processing.
The difficulty of solving these technical problems lies in two points: the earliest corpus, "Li Weng Dui Yun", is too small, and the corpus had to be expanded by successive students; and an evaluation mechanism is needed for judging the quality of a matching line — the meaning of the couplet and the relationships between words and sentences — that is, how to evaluate and select the corresponding lower line.
The significance of solving these problems: Chinese can be understood with more certainty, and meaningful language understanding and generation can be realized. The first step is to grasp the symmetry of language, a very significant feature of Chinese that must be mastered. This brings an understanding and appreciation of the method of thought behind it, and provides an idea for grasping human languages and their history more globally. Solving the couplet evaluation mechanism also inspires reflection on human thinking during the couplet-matching process, ordering the contents of the mind into a better knowledge structure, with applications in Chinese understanding and human-machine interaction.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a language couplet information processing system, a processing method and a construction method.
The present invention is achieved as follows. A language couplet information processing system comprises:
an input module: used for inputting the upper line of a couplet through an input box;
a corpus: used for storing couplet-related data;
a couplet generation module: used for triggering the corresponding algorithm logic to generate the lower line from the upper line supplied by the input module;
an output module: used for displaying the generated lower line in an output box.
Further, the corpus specifically comprises: "Li Weng Dui Yun" as the basic corpus, expanded and enriched with dual (parallel) sentences and classical Chinese couplets.
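The four modules in the claim above can be sketched as a minimal pipeline. This is an illustrative sketch only, not the patent's implementation: the class layout, the `generate_fn` signature, and the toy dictionary-lookup generator are assumptions standing in for the trained model.

```python
# Hypothetical sketch of the four-module couplet pipeline.
class CoupletSystem:
    def __init__(self, corpus, generate_fn):
        self.corpus = corpus            # corpus module: couplet-related data
        self.generate_fn = generate_fn  # couplet generation module

    def input_module(self, text):
        # accept the upper line entered in the input box
        return text.strip()

    def output_module(self, lower):
        # display the generated lower line in the output box
        return f"Lower couplet: {lower}"

    def run(self, upper):
        upper = self.input_module(upper)
        lower = self.generate_fn(upper, self.corpus)
        return self.output_module(lower)

# toy generator: dictionary lookup stands in for the trained Seq2Seq model
demo = CoupletSystem(
    corpus={"晚风摇树树还挺": "晨露润花花更红"},
    generate_fn=lambda up, corpus: corpus.get(up, ""),
)
print(demo.run("晚风摇树树还挺"))  # Lower couplet: 晨露润花花更红
```

In the real system, `generate_fn` would invoke the incrementally extended Seq2Seq model described below rather than a lookup table.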
Another object of the present invention is to provide a language couplet information processing method executed by the above system, the method comprising:
first, inputting the upper line through the input box; after processing by the input module, the upper-line information enters the computer;
second, after the upper-line data input in the first step is stored in the computer, the program's couplet generation module triggers the corresponding algorithm logic to generate the lower line from the input;
third, the lower line generated in the second step is processed by the output module and then output, displayed on the software interface.
The couplet generation module of the method requires a corpus, which specifically comprises "Li Weng Dui Yun" as the basic corpus, expanded and enriched with dual sentences and classical Chinese couplets. Before use, the system is constructed using natural language processing techniques.
Another object of the present invention is to provide a method for constructing the language couplet information processing system, which specifically comprises:
step one, selecting "Li Weng Dui Yun" and modern couplets from the internet as the corpus;
step two, performing statistical learning on the data in the corpus;
step three, selecting the Seq2Seq model as the probability model;
step four, incrementally extending the Seq2Seq model to produce the language model of the couplet system;
step five, building the algorithm model framework with TensorFlow to produce a couplet generation system with a polishing algorithm;
step six, checking the system model with an evaluation function and continuously optimizing parameters and algorithms.
Further, in step four, incrementally extending the Seq2Seq model specifically comprises:
(1) the sequential couplet system model receives an input clause and applies a recurrent neural network (RNN) over its characters to capture the meaning of the clause, obtaining a single vector representing the preceding clause; another recurrent neural network then decodes this vector into the subsequent clause, generating it character by character; through this encoding and decoding, the process generates a sequence of sentences on the basis of a global hierarchy;
(2) an attention mechanism is introduced into the model so that the decoder can dynamically select and combine different portions of the input sequence with different weights; the attention mechanism essentially models the alignment between input and output positions and can therefore be treated as a local matching model; at the same time, it alleviates the bottleneck of encoding the whole input into a single fixed-length vector;
(3) finally, a polishing algorithm for couplet generation is added, so that the generator can polish the generated couplet several times, performing one or more iterations to improve the wording; the draft from the previous iteration is carried in a hidden state, and a polished, revised couplet is produced in the next iteration; that is, information representing the previously generated draft clause is used again as input, as semantically coherent additional information.
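The alignment step in (2) can be illustrated with a minimal dot-product attention computation, which weights encoder positions by their match with the current decoder state. This is a generic NumPy sketch of the technique named in the text, not the patent's code; the dimensions and random vectors are illustrative assumptions.

```python
import numpy as np

def attention(decoder_state, encoder_states):
    """Dot-product attention: score each encoder position against the
    decoder state, softmax the scores, and return the weighted context."""
    scores = encoder_states @ decoder_state        # (T,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over positions
    context = weights @ encoder_states             # weighted combination
    return context, weights

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))   # 5 input characters, hidden size 8
dec = rng.normal(size=8)        # current decoder hidden state
ctx, w = attention(dec, enc)
print(round(w.sum(), 6))        # weights form a distribution over positions
```

The context vector `ctx` is what the decoder would combine with its own state at each output step, realizing the "local matching" described above.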
Further, in step six, the evaluation function is:
cross entropy is selected as the loss function; the formula is as follows:
H(p, q) = -\sum_{x} p(x) \log q(x)
where p and q represent two different distributions, the true distribution and the predicted distribution respectively; the formula measures the similarity between the distribution on the test set and the real situation, thereby judging the accuracy of the model on the test set.
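The cross-entropy loss above can be checked with a small numerical example. This is a NumPy sketch; the two predicted distributions are invented for illustration.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x); p is the true distribution,
    q the predicted one. Lower values mean q is closer to p."""
    q = np.clip(q, eps, 1.0)  # guard against log(0)
    return -np.sum(p * np.log(q))

p = np.array([1.0, 0.0, 0.0])         # true next-character distribution
q_good = np.array([0.9, 0.05, 0.05])  # confident, correct prediction
q_bad = np.array([0.1, 0.45, 0.45])   # poor prediction
print(cross_entropy(p, q_good) < cross_entropy(p, q_bad))  # True
```

A better prediction gives a lower loss, which is exactly the property used in step six to compare model variants on the test set.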
Another object of the present invention is to provide a computer program implementing the language couplet information processing method.
Another object of the present invention is to provide an information data processing terminal implementing the language couplet information processing method.
Another object of the present invention is to provide a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to execute the language couplet information processing method.
In summary, the advantages and positive effects of the invention are:
the realization of the antithetical system of the invention is a perfect response to 'pair' of 'the problem of' the old-fashioned adhesion and 'the pair' of the old-fashioned adhesion; the invention can comprehensively understand the syntactic structure of Chinese more integrally by exploring the topological property of the language. The invention researches the rule of the three dimensions of semantics, syntax and pragmatics from the semantic space to the grammar space and then from the plane space to the three-dimensional topological space in a cognitive angle, thereby breaking the boundary among the three dimensions.
The invention can realize the automatic generation of couplets, can be leisurely coped with both the ancient couplets and the modern couplets, and has strong robustness.
The invention applies the seq2seq model based on the neural network to the couplet system to realize the automatic generation of the couplet, and the continuous training effect of the model can be gradually improved.
Drawings
FIG. 1 is a schematic structural diagram of a language couplet information processing system provided by an embodiment of the present invention;
in the figure: 1. an input module; 2. a corpus; 3. a couplet generation module; 4. and an output module.
Fig. 2 is a flowchart of a method for constructing a language couplet information processing system according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for constructing a language couplet information processing system according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the sequential couplet generation system according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of the attention-based couplet generation system provided by an embodiment of the invention.
Fig. 6 is a schematic diagram of the couplet generation system with a polishing algorithm according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a loss function provided by an embodiment of the invention.
Fig. 8 is a schematic diagram of a neuron according to an embodiment of the present invention.
Fig. 9 is a diagram of a neural network structure according to an embodiment of the present invention.
Fig. 10 is a simplified schematic diagram of a recurrent neural network according to an embodiment of the present invention.
Fig. 11 is a complete structure diagram of the recurrent neural network provided by the embodiment of the present invention.
Figs. 12-15 are application effect diagrams of the language couplet information processing system provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
As shown in fig. 1, the language couplet information processing system provided in the embodiment of the present invention specifically includes:
the system comprises an input module 1, a corpus 2, a couplet generation module 3 and an output module 4;
an input module 1: used for inputting the upper line of a couplet through an input box;
a corpus 2: used for storing couplet-related data;
a couplet generation module 3: used for triggering the corresponding algorithm logic to generate the lower line from the upper line supplied by input module 1;
an output module 4: used for displaying the generated lower line in an output box.
The corpus 2 provided by the embodiment of the present invention specifically comprises "Li Weng Dui Yun" as the basic corpus, expanded and enriched with dual sentences and classical Chinese couplets.
As shown in fig. 2 to fig. 3, the method for constructing the language couplet information processing system according to the embodiment of the present invention specifically comprises:
s101, selecting Ching Pang rhyme and modern couplets on a network as a corpus;
s102, carrying out statistical learning on data in a corpus;
s103, selecting a Seq2Seq model as a probability model;
s104, performing incremental expansion on the Seq2Seq model to generate a language model of the couplet system;
s105, building an algorithm model framework by using the TensorFlow, and generating a couplet generation system with a smoothing algorithm;
and S106, checking the system model by using the evaluation function, and continuously optimizing the parameters and the algorithm.
In step S104, as shown in fig. 4 to fig. 6, the incremental extension of the Seq2Seq model provided in the embodiment of the present invention specifically comprises:
(1) firstly, the sequential couplet system model receives an input clause and applies a recurrent neural network (RNN) over its characters to capture the meaning of the clause, obtaining a single vector representing the preceding clause; another recurrent neural network then decodes this vector into the subsequent clause, generating it character by character; through this encoding and decoding, the process generates a sequence of sentences on the basis of a global hierarchy;
(2) secondly, an attention mechanism is introduced into the model so that the decoder can dynamically select and combine different portions of the input sequence with different weights; the attention mechanism essentially models the alignment between input and output positions and can therefore be treated as a local matching model; at the same time, it alleviates the bottleneck of encoding the whole input into a single fixed-length vector;
(3) finally, a polishing algorithm for couplet generation is added, so that the generator can polish the generated couplet several times, performing one or more iterations to improve the wording; the draft from the previous iteration is carried in a hidden state, and a polished, revised couplet is produced in the next iteration; that is, information representing the previously generated draft clause is used again as input, as semantically coherent additional information.
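The iterative polishing in (3) can be outlined as a simple loop in which each iteration re-uses the previous draft as additional input. The `generate` stub below is a placeholder assumption for the trained decoder, not the patent's model; it only demonstrates the control flow of feeding drafts back.

```python
# Illustrative sketch of the polishing loop: the draft from the previous
# iteration is carried forward and conditioned on in the next iteration.

def generate(upper, draft=None):
    # stub: a real decoder would condition on both the upper line and the
    # hidden-state representation of the previous draft
    base = "draft-0" if draft is None else draft
    return base + "+polish"

def polish(upper, iterations=3):
    draft = None
    for _ in range(iterations):
        draft = generate(upper, draft)  # previous draft re-used as input
    return draft

print(polish("海内存知己"))  # draft-0+polish+polish+polish
```

In the real system each pass would rewrite the lower line rather than append a marker, but the data flow — draft in, revised draft out — is the same.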
In step S106, the evaluation function provided in the embodiment of the present invention is:
cross entropy is selected as the loss function; the formula is as follows:
H(p, q) = -\sum_{x} p(x) \log q(x)
where p and q represent two different distributions, the true distribution and the predicted distribution respectively; the formula measures the similarity between the distribution on the test set and the real situation, thereby judging the accuracy of the model on the test set.
The application of the principles of the present invention will now be described in further detail with reference to specific embodiments.
Example 1:
1. Development environment:
Hardware: notebook computer
Operating system: Windows 10
Development platform: PyCharm
Development language: Python 3.6
Computing framework: TensorFlow 1.6
2. The corpus used is stored in text format, one character per line, 9,131 characters in total.
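A corpus stored one character per line can be loaded into character/id lookup tables, as a character-level Seq2Seq front end typically requires. This is an illustrative sketch, not the patent's code; the in-memory sample stands in for the real 9,131-character file, whose name is not given in the text.

```python
import io

def load_vocab(fileobj):
    """Build char->id and id->char tables from a file that stores
    one character per line (the storage format described above)."""
    chars = [line.strip() for line in fileobj if line.strip()]
    char2id = {c: i for i, c in enumerate(chars)}
    id2char = {i: c for c, i in char2id.items()}
    return char2id, id2char

# in practice this would read the 9,131-character corpus file with
# open(path, encoding="utf-8"); here an in-memory sample is used
sample = io.StringIO("天\n地\n人\n和\n")
char2id, id2char = load_vocab(sample)
print(char2id["地"], id2char[3])  # 1 和
```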
3. Interface display:
The interface provided by the embodiment of the invention is simple: two text boxes and one button. The user first enters an upper line in the upper text box according to personal preference; the length is designed to be between 1 and 50 characters, which meets the requirements of most couplets. The user then clicks the generate button, which triggers the corresponding algorithm logic, and the system presents the resulting lower line in the other text box.
4. System testing
(1) First, five-character and seven-character lines of Tang poetry are tested.
When five-character and seven-character lines from traditional Tang poems are input, the system directly gives a brand-new lower line, completely different from the original poem and of good quality. This shows that the seq2seq model learns from and imitates the input corpus. It should also be noted that couplets and poems are significantly different: a poem does not have to strictly follow the rules of a couplet. For example, when "the bright moonlight before the window" is input, the result is "the white clouds idle on the mountain top". In this example the upper and lower lines do not correspond well, and the characters in corresponding positions do not form proper pairs, because the input line is not a strict dual sentence.
(2) Second, ancient and modern couplets are tested separately.
For the ancient couplet, a line from "Li Weng Dui Yun" is selected. The upper line is "dispelling intoxication and knowing tea power", and the result given by the system is "getting drunk spirit in the first time", while the lower line in the original text is "worries about knowing the right to drink". It is not hard to see that the generated lower line differs from the original sentence, yet its effect as a matching line is just as exact.
A modern couplet was then selected for testing. The upper line was "the east China sea white crane lives in the autumn", whose original lower line is "the south Ling Chun Song Wan Carun"; the couplet system instead gave "the south China Hongmei Wanguxiang". At each corresponding position, the characters and parts of speech pair correctly. The couplet system thus copes gracefully with both classical and modern couplets, which shows strong robustness.
(3) And (3) boundary testing:
if nothing is entered in the upper-line box, i.e. the number of characters is zero, the lower-line box gives the text prompt "your input is too short". This matches everyday logic: if the upper line has no content, the lower line naturally has no corresponding result.
When the input exceeds 50 characters, the reply "your input is too long" is given. The system's logic limits the length: any input of more than 50 characters produces this result. Common couplets do not exceed 50 characters, and couplets longer than that have little practical value.
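The boundary behaviour just described, where an empty input and an input over 50 characters are both rejected with a text prompt, can be sketched as a small validation routine. The function name and message strings here are illustrative, not taken from the patent's actual code:

```python
from typing import Optional

def validate_upper_line(upper: str) -> Optional[str]:
    """Return an error message for an invalid upper line, or None if acceptable.

    Mirrors the boundary rules above: zero characters is too short,
    more than 50 characters is too long.
    """
    text = upper.strip()
    if len(text) == 0:
        return "your input is too short"
    if len(text) > 50:
        return "your input is too long"
    return None  # acceptable: 1 to 50 characters
```

Only inputs that pass this check would be handed to the generation module.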
(4) Word overlapping test:
Testing reduplicated characters is key for couplets and an important index of performance. A tricky upper line found on the Internet was tried: "is whether or not, or whether or not, is not", and the result was not ideal. The second clause, "black and white", is clearly wrong and should be "white, black and white". Evidently the system cannot recognize words well in the face of reduplication, and the algorithm needs improvement.
(5) Single word testing:
Three groups of tests were performed; in most cases the system performs poorly on the single-character task. The lower line given by the system is generally not a character of matching, antithetical meaning; rather, the given result and the upper character may together form a word. This means that when the number of characters is small, the semantics and lexical patterns the language model has learned are still limited, and it only produces results according to nearby probabilities.
5. System evaluation
The couplet system of the invention adopts a statistical method from natural language processing: following the principles of machine learning, the model must be trained continuously, and only when the model is optimized can the couplet system perform well. Model training optimizes the parameters and algorithms, while an evaluation criterion checks the model's quality. The loss function is such an evaluation function; the invention selects cross entropy as the loss function. The mathematical definition of cross entropy is:
H(p, q) = -Σ_x p(x)·log q(x)
where p and q are two different distributions, the true one and the predicted one respectively. The formula measures how close the distribution on the test set is to the real situation, and thus the model's accuracy on the test set. Over a period of training the loss gradually decreases (iteration count on the horizontal axis, loss on the vertical axis). A plot of the loss function is shown in fig. 7.
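As a concrete illustration, the cross-entropy formula above can be computed directly over two distributions; this is a minimal sketch, not the system's training code:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x).

    p is the true distribution, q the predicted one; the value shrinks
    as q approaches p, which is why it serves as a loss function.
    Terms with p(x) = 0 contribute nothing and are skipped.
    """
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

# The loss is lower when the prediction matches the true distribution better.
p = [1.0, 0.0, 0.0]                      # one-hot target
good = cross_entropy(p, [0.9, 0.05, 0.05])
bad = cross_entropy(p, [0.1, 0.45, 0.45])
```

Minimizing this quantity over the training corpus is what drives the iterative decrease shown in fig. 7.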
Example 2:
the couplet is a unique form of Chinese writing and art and embodies a distinctive Chinese mode of thinking and cognition. The intelligent science laboratory began researching and building couplet systems in 2013. The couplet system is chosen as the foothold and entry point of the invention's language research: through couplets, the symmetry and topology of Chinese are shown to the greatest extent, organically combining theory and practice and opening a new path for Chinese natural language processing along the way.
1. Features of couplets
Couplets have characteristics that distinguish them from other language forms. Their length varies: the shortest couplets have only one or two characters, while the longest can reach hundreds. Couplets take many forms, including regular pairs, opposing pairs, flowing-water pairs and collected-sentence pairs (Liu Lisheng, 2011). But whatever its form, a couplet must have the following characteristics:
First, the number of characters is equal and the phrase breaks are consistent.
Second, the level and oblique tones alternate harmoniously. The level-oblique (ping-ze) distinction is based on the four tones, which are generally divided into two categories, hence the term "ping-ze". This is roughly based on comparing ancient and modern pronunciation: the first and second tones of modern pinyin correspond approximately to the ancient level tones, and the third and fourth tones to the ancient oblique tones. Typically the last character of the upper line takes an oblique tone, while the last character of the lower line takes a level tone (director, 2011).
Third, the parts of speech are matched and the positions correspond: characters at corresponding positions must have consistent parts of speech and must appear in the same places.
Fourth, the content is related and the two lines connect. The content of the upper and lower lines must be closely related but not identical. The horizontal scroll serves as the soul of a couplet: it summarizes the couplet's central idea and must tie the whole together.
Fifth, a couplet generally need not rhyme; only rhymed forms require rhyme.
2. Topology of antithetical couplets
Language has topology, and couplets belong to the category of language. Accordingly, couplets should also have some topological properties. From the characteristics of couplets it is natural to recall the concept of homeomorphism in topology: if a mapping between topological spaces f: X → Y is bijective and both f and f⁻¹ are continuous, then f is called a homeomorphism. If a homeomorphism from X to Y exists, X and Y are said to be homeomorphic, often written
f: X ≅ Y
This process is also called a topological transformation. The upper line of a couplet corresponds to X and the lower line to Y: an upper line is input and a lower line is generated, so couplet generation realizes a topological mapping from the upper line to the lower line. Here the topological space is a semantic space, and the pairing rules between the two lines correspond to the mapping's functional relationship. Couplet pairing is thus a topological transformation: the character count, tone pattern, parts of speech and semantic relations of the two lines are preserved, so the couplet system has topological invariance. Because couplets have topology, that is, a sense of space and imagery, the central idea they convey is three-dimensional and full. This is what linguists and literary workers pursue. Better still, interpreting couplets from a mathematical perspective gives a completely new experience, letting one savor them from another angle. Here science and art meet.
In summary, the couplet, as a custom, is an important component of traditional Chinese culture and has a good mathematical structure. Explaining and generating couplets can help the invention grasp Chinese better, which is of profound significance for developing traditional Chinese culture.
3. Construction of antithetical couplet system
The development of the couplet system builds on the symmetry between the upper and lower lines of a couplet and is an attempt to apply the theory in practice. Many couplet systems have been developed before, but they can only match against the existing upper lines in a database; most amount to a database query process. To generate couplets automatically, a new method is needed: a statistics-based language model combined with machine-learning algorithms. The couplet system is a fitting application of the principles of computational linguistics, which simulates human language ability with machines by building formal mathematical models and then using computer programs to analyze and process natural language. Computational linguistics and machine learning are inseparable, with machine learning supplying the methods. Machine learning has three elements: algorithm, model, and data. The design of the couplet system therefore proceeds from these three elements and must solve its problems within them.
4. Corpus
The first problem to be solved is the data problem, i.e. the corpus problem in natural language processing. Without a corpus, language cannot be analyzed even if an algorithm exists; conversely, the quality, scale and other characteristics of the corpus strongly influence the analysis results. Thanks to computers' large storage capacity, real text data and fast, accurate information extraction, linguists can use electronic corpora to describe language from multiple angles and levels, verify various linguistic theories and hypotheses, and even establish new language models and concepts.
A corpus (plural corpora) is defined as language material stored in a computer for language research and applications, collected from naturally occurring written or spoken samples that represent a particular language or language variety.
A corpus is not material piled together at random; it has the following four characteristics:
(1) the content of the corpus should derive from real text and be representative;
(2) the corpus should be machine-readable, in a format that computer techniques can process;
(3) the language material recorded in the corpus should be properly annotated and processed;
(4) the size of the corpus is finite.
The basic corpus adopted by the couplet system is Li Weng Dui Yun. It has long been used by the intelligent science laboratory, which gives it natural advantages as the corpus of a couplet system. Li Weng Dui Yun is a primer for learning antithesis, rhythm and vocabulary organization, formerly used by people learning to write poetry. Its author, Li Yu, was styled Li Weng, hence the book's title. Li Yu was a famous dramatist of the Ming-Qing period and the author of "Xianqing Ouji". The whole book is divided into two volumes arranged by rhyme, containing paired sentences on categories such as astronomy, geography, flowers, trees, birds and beasts. The paired lines range from one and two characters to five, seven, nine and even eleven characters, with harmonious rhythm, exercising the reader in phonology, vocabulary and rhetoric.
Using Li Weng Dui Yun alone, the corpus is insufficient and the pairing effect is poor. According to the ideas of machine-learning algorithms, only enough data can train a good couplet system, so expanding the corpus is necessary. Some language material was therefore collected from the Internet as an organic supplement to improve the pairing effect. The invention selects antithetical sentences and classical Chinese couplets to enrich the existing corpus, which has been built up to its present state.
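Since the testing section describes the corpus vocabulary as a text file with one character per line (9131 characters in total), loading it into the lookup tables a character-level model needs might look as follows; the file layout and helper names are assumptions for illustration:

```python
def build_vocab(chars):
    """Build the two lookup tables a character-level model needs:
    character -> id and id -> character."""
    char_to_id = {c: i for i, c in enumerate(chars)}
    id_to_char = {i: c for c, i in char_to_id.items()}
    return char_to_id, id_to_char

def load_vocab(path):
    """Read a vocabulary file stored one character per line,
    skipping blank lines, and build the lookup tables."""
    with open(path, encoding="utf-8") as f:
        chars = [line.rstrip("\n") for line in f if line.rstrip("\n")]
    return build_vocab(chars)
```

With these tables, each upper line can be converted to the id sequence a seq2seq model consumes, and the model's output ids converted back to characters.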
5. Language model
In machine learning, a model represents the conditional probability distribution or decision function to be learned. The machine-learning approach generates a model from a training set and analyzes new instances with it. On this basis, a language model is a statistical model for computing the probability of a sentence in some language. Since the statistical unit of natural language is generally the sentence, a language model can also be regarded as a probability model over sentences. There are many types of language models, such as the naive Bayes model, the hidden Markov model, the maximum entropy model, the conditional random field model, and the sequence-to-sequence (Seq2Seq) model. According to the characteristics of the couplet task, the Seq2Seq model is selected: a couplet has an upper line and a lower line, the lower line must be produced from the upper line, and a definite logical relationship holds between them. This matches the Seq2Seq model exactly. As the name implies, "sequence to sequence" means mapping one sequence to another. The model was originally proposed for translation, independently in 2014 by a team at Google Brain (Sutskever et al., 2014) and by Yoshua Bengio's group (Cho et al., 2014), both working in the field of machine translation.
This is effectively a translation model: the input is a sequence, such as an English sentence, and the output is also a sequence, such as a Chinese sentence. The model is trained to translate one language sequence into another. The whole process maps an input sequence to an output sequence with a recurrent neural network. The most important feature of this structure is that the input and output sequences may have different lengths. At first such models were mostly used for machine translation, but as the technique matured they were gradually applied to conversational robots, automatic document summarization, automatic image captioning, and other fields. Though the forms differ, the essence is the same: because they are end-to-end models, both pictures and text can serve as the input sequence. Applying the seq2seq model to the couplet system should therefore work well.
6. seq2seq model core algorithm
(1) Artificial neural network
Like topology, a neural network is also a cognitive model. Neural networks are not only the core of deep learning but also an important component of the seq2seq generation model. The neural networks discussed in this invention are artificial neural networks. As the name suggests, the artificial neural network takes its inspiration from the human brain and simulates the mechanism of human thought. The human brain consists of roughly 100 billion neurons in an extremely complex structure. Similarly, an artificial neural network is composed of a large number of "neurons", each with an activation function. Every connection between two neurons carries a weight, which acts as a temporary memory for the input; the activation function then processes the input accordingly. The network's output is determined jointly by the weights and the activation functions. The simplest neural network consists of a single neuron, as shown in fig. 8.
A single neuron can also be called a perceptron, and any complex neural network is composed of such units. A neuron consists of two parts: a weighted input and an activation function.
1) Weighted input:
x_j = Σ_i (w_ij · y_i)
This formula says that a layer's input variable is the sum of all the previous layer's variables multiplied by their weights.
2) Activating a function
The activation function is the output function that performs a second processing step on the intermediate result. It is defined as:
y_j = f(x_j)
There are many concrete activation functions; common ones include the logistic, sigmoid, tanh (hyperbolic tangent), ReLU (rectified linear unit) and softmax functions. The most common is the logistic function, because its curve is continuous, unbroken and smooth, which eases later computation.
A complex neural network is an architecture of neurons interconnected between layers; its structure is shown in fig. 9, a multi-level neural network. Expressed in vector form, the mathematical expression is:
Net = w^T·x + b
where w is the weight vector of the nodes, x the input value of each layer, and b the bias vector, generally initialized to 1. Reducing all the values to vector form expresses the function of the artificial neural network more clearly and intuitively.
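The weighted-input and activation formulas above combine into a single perceptron unit. A minimal sketch, using the logistic activation the text singles out; names and the default bias of 1 follow the description above, everything else is illustrative:

```python
import math

def logistic(x):
    """Logistic activation: continuous and smooth, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(w, x, b=1.0, f=logistic):
    """One perceptron unit: Net = w^T x + b, output y = f(Net).

    w: weight vector, x: input vector, b: bias (the text gives an
    initial value of 1), f: activation function.
    """
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(net)
```

Stacking many such units in layers, with each layer's outputs feeding the next layer's weighted inputs, yields the multi-level network of fig. 9.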
(2) Recurrent neural network
The seq2seq model used in the invention is the abbreviated form of "sequence to sequence": as the name implies, conversion from one sequence to another. In this invention a sequence represents text, such as a line of characters. The recurrent neural network serves as the processing tool for encoding and decoding data between the two sequences.
The recurrent neural network is an improvement on the artificial neural network: it has a more complex structure and is suited to processing sequential data. One of the simplest recurrent network structures is shown in fig. 10.
As fig. 10 shows, suppose the network is A; at a certain state it receives an input x_t and outputs a value h_t. Because of the loop, information is passed from the current time step to the next. Put differently: the same network cycles many times through different time states, and each neuron transmits its updated result to the next time point.
A complete recurrent neural network is formed from many such simple recurrent units and thereby processes sequence data. Analyzing its principle, both the circular and rectangular shapes in the network diagram represent vectors: x and y are the input and output vectors respectively and may also represent states. In this invention the x and y vectors represent text units, i.e. single characters. The recurrent network's biggest difference is the hidden state h added in the middle, whose role is to extract features of the input data and convert them into output. That is the vertical view; horizontally, the hidden state is also passed along in sequence order, as shown in fig. 11.
Each arrow represents one transfer step of a vector. The mathematical formulas are as follows:
1) Horizontal transfer
h_1 = f(U·x_1 + W·h_0 + b)
Later states iterate this first step with the same parameters U, W and b, and so on.
2) Vertical transfer
y_1 = softmax(V·h_1 + c)
Similarly, subsequent outputs follow the same law as the first. Each arrow is one transformation and can be computed in turn. In the field of natural language processing specifically, c is defined as a context vector used to connect the encoding and decoding steps. Through this continuous iteration, the entire input sequence is finally converted into the output sequence.
The softmax function is one of the activation functions, mainly applied to multi-class classification: it normalizes output values into probabilities. Here it is used to optimize the output part of the recurrent network. Let C be the number of classes to predict and let the output vector a also have C components, a_1, a_2, ..., a_C. Then for a sample, the probability that it belongs to class i is given by the softmax function. The mathematical expression is:
S_i = exp(a_i) / Σ_{j=1..C} exp(a_j)
(3) seq2Seq model
The Seq2Seq model is a variant of the recurrent neural network. A plain recurrent network can only handle equal-length input and output sequences; as an improvement, the Seq2Seq model copes with inputs and outputs of different lengths. The most basic Seq2Seq model consists of three parts: the encoder, the decoder, and the intermediate state vector connecting them, which has a fixed size. The model contains two recurrent neural networks: the encoder is one, the decoder the other. The encoder receives the input data and encodes it into a fixed-size state vector S, then passes S to the decoder; the decoder learns from S and finally outputs the result. Before introducing the seq2seq model further, the following concepts are defined:
1) Input. The model divides into input and output. The seq2seq model assumes the given upper line is a set A,
A = {x_1, x_2, ..., x_m}, x_i ∈ V,
where x_i is a character and V is the vocabulary; an abstract representation of the upper line A can then be grasped.
2) Output. The model then generates the lower line S = {y_1, y_2, ..., y_m} from the set A, with coherent semantics and y_i ∈ V. More specifically, each character y_i in S corresponds to and pairs with the character x_i in A, as determined by the couplet constraints.
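The encoder-decoder flow just defined, where the upper line A is encoded into a fixed-size state and then decoded character by character into the lower line, can be sketched as a skeleton. The `encode` and `decode_step` arguments stand in for trained recurrent networks; the toy versions below are purely illustrative stand-ins, not a working model:

```python
def seq2seq_generate(upper, encode, decode_step, max_len=50):
    """Skeleton of the encoder-decoder flow: encode the upper line
    into a state, then unroll the decoder one character at a time.
    A couplet's lower line has the same length as the upper line."""
    state = encode(upper)                 # A -> fixed-size state vector
    out = []
    for _ in range(min(len(upper), max_len)):
        char, state = decode_step(state)  # emit y_i, update hidden state
        out.append(char)
    return "".join(out)

# Toy stand-ins so the skeleton runs: the "state" is simply the
# remaining characters, and each step echoes one character back.
def toy_encode(s):
    return list(s)

def toy_decode(state):
    return state[0], state[1:]
</```

A real system would replace the toys with the trained encoder and decoder networks; only the control flow shown here would remain the same.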
The input line is encoded into a hidden vector, and that vector is then decoded into the output line, making the two lines effectively a one-to-one pair. Because of the particularity of couplet generation, different neural network models are proposed, and the model is incrementally extended so that the final model can solve the complex problem of couplet generation.
The most basic model is introduced first: the sequence-to-sequence couplet system. This model accepts the input line and captures its meaning by running a recurrent neural network (RNN) over the characters, obtaining a single vector that represents the upper line. Another recurrent network then decodes that vector into the lower line by generating characters. Essentially the process generates a sequence by encoding and decoding, based on a sentence-level global representation. Figure 4 shows the sequential couplet generation.
The second model generates couplets based on an attention mechanism. The basic Seq2Seq model in fact has many drawbacks. The encoder first encodes the input into a fixed-size state vector, a process that is essentially "lossy compression of information": the larger the volume of information, the greater the loss when converting it into the vector. At the same time, a longer sequence means the time dimension is long, and the recurrent network then exhibits gradient vanishing. Finally, the only link between the encoder and decoder modules of the basic model is that fixed-size state vector, so the decoder cannot directly access many details of the input. Because of these unavoidable drawbacks of the Seq2Seq model, the attention mechanism was later introduced.
Analysis reveals a particular phenomenon in couplets: characters at the same position in the upper and lower lines, i.e. x_i and y_i, usually stand in a "pairing" or "matching" relationship. The neural couplet model should therefore simulate this one-to-one correspondence between x_i and y_i. The attention mechanism lets the decoder dynamically select and combine different parts of the input sequence with different weights. Attention essentially models the alignment between input and output positions and can thus be regarded as a local matching model. Furthermore, the tone-pattern coding problem can also be handled by a pairwise attention mechanism. Figure 5 shows the attention extension of the sequence couplet generation model.
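The mechanism described here, where the decoder weighs different input positions differently and combines them, can be sketched with dot-product scoring. Dot-product attention is one common scoring choice and an assumption on my part, not necessarily the form used in the patent:

```python
import math

def attention(query, keys, values):
    """Dot-product attention over encoder states.

    Score each input position's key against the decoder query,
    softmax the scores into weights, and return the weights plus
    the weighted combination of the values (the context vector).
    """
    scores = [sum(qi * ki for qi, ki in zip(query, k)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return weights, context
```

When the decoder is producing y_i, the weight on position i of the input is expected to dominate, which is precisely the x_i-to-y_i alignment the text describes.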
Third, a polishing algorithm for couplet generation. Couplet writing is an art form, and art usually needs polishing. Compared with the traditional single-pass generators of previous neural network models, the proposed couplet generator can polish the generated couplet multiple times, improving the wording over one or more iterations. The model is essentially the same as the attention-based sequence generation model, except that the previously generated draft of the lower line is fed back in as additional, semantically coherent input. The principle is shown in fig. 6: the draft from the previous iteration is carried into the hidden state, and a polished revision of the couplet is produced in the next iteration.
7. TensorFlow
The task now is to implement this algorithmic model, which requires the support of a computational framework. TensorFlow is such a framework for executing machine-learning algorithms; equivalently, it is an artificial-intelligence learning system and an interface for implementing machine-learning algorithms. TensorFlow is the second-generation artificial-intelligence system developed by Google on the basis of its first-generation system, DistBelief, and its name comes from its own operating principle.
"Tensor" means an N-dimensional array, and "Flow" means flowing: the TensorFlow framework performs computation on a dataflow graph, realizing a process in which tensors flow from one end of the graph to the other. TensorFlow is a computational framework, an open-source software library, that converts large data structures into a dataflow graph for analysis and processing in neural networks.
TensorFlow stands out among the many deep-learning frameworks and is preferred by many researchers and working programmers for its superior performance and algorithm execution efficiency. Its internal implementation is complex and ingenious. A TensorFlow computation is represented as a directed graph, each node of which represents a mathematical operation. The connections between nodes are called edges and represent the multidimensional data arrays, i.e. tensors, exchanged between nodes. The directed graph describes the data's computation flow and is also responsible for maintaining and updating state; the user can apply conditional control or loop operations to branches of the graph. The user can build such a dataflow graph in several languages: Python, C++, Go, Java, and so on. Each node in the graph may have any number of inputs and outputs; each node describes an operation, and a node can be an instantiation of an operation. An important TensorFlow component is the client, which connects through the Session interface to the master and to multiple workers. Each worker can connect to many hardware devices, which must be CPUs (central processing units) or GPUs (graphics processing units), and is responsible for managing them; the master directs all the workers in executing the directed graph according to the flow.
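The dataflow-graph style of computation described above, where the graph of operations is built first and values then flow along its edges when the graph is run, can be illustrated with a toy graph in a few lines. This is a pedagogical model of the idea, not TensorFlow's actual API:

```python
class Node:
    """A node in a toy dataflow graph: an operation plus its input edges.

    Calling run() makes values ("tensors") flow along the edges from
    the inputs toward this node, mirroring the build-then-execute
    style of a dataflow framework.
    """
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self):
        # Evaluate every input edge first, then apply this node's operation.
        return self.op(*(n.run() for n in self.inputs))

def constant(v):
    """A source node with no inputs that always emits the value v."""
    return Node(lambda: v)

# Build the graph first, then execute it: (2 * 3) + 4
g = Node(lambda a, b: a + b,
         Node(lambda a, b: a * b, constant(2), constant(3)),
         constant(4))
```

In a real framework the same separation holds: graph construction is cheap and declarative, and execution (here `run()`, in TensorFlow a session or function call) is where the tensors actually flow.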
FIG. 12 shows the upper line "Chinese in accord" input into the system and the lower line output: "turn to big work". For the further development of the system of the invention, the aim is to enter the research object into the system, extend it toward literature, and integrate more Chinese. An upper line taken from classical material was then input, and the system returned "question-and-sentence just-in-the-way bullet petty yarn" (fig. 13). Taking "question-and-sentence-and-disturb play petty yarn" as the upper line, the system gave the lower line "wind-and-must-question Chinese cabbage" (fig. 14); the next pairing linked "who has to ask something before the wind" with "Wanyue can think of hometown" (fig. 15). Taking "Wan Yue Fang and thinking about hometown" as the upper line and continuing in the same way, the test cycled to the 24th round. The invention tested "one-character", "two-character" and "multi-character" couplets; the screenshots above are representative.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A language couplet information processing system, comprising:
an input module: for inputting the upper line through an input box;
a corpus: for storing couplet-related data;
a couplet generation module: for triggering the corresponding algorithm logic to generate the lower line from the upper line received by the input module;
an output module: for displaying the generated lower line in an output box.
2. The language couplet information processing system of claim 1, wherein the corpus specifically comprises a basic corpus, enriched and expanded with antithetical sentences and classical Chinese couplets.
3. A language couplet information processing method executed by the language couplet information processing system according to claim 1, characterized in that the language couplet information processing method comprises:
firstly, inputting the upper line through the input box; after being processed by the input module, the upper-line information enters the computer;
secondly, after the upper-line data input in the first step are stored in the computer, the couplet generation module of the program triggers corresponding algorithm logic to generate the lower line according to the upper-line data input by the input module;
thirdly, the lower line generated in the second step is processed by the output module and then output, and the output is displayed on the software interface;
the couplet generation module of the language couplet information processing method uses a corpus; the corpus specifically comprises: the Chinese rhyming primer Li Weng Dui Yun (笠翁对韵) as a basic corpus, enriched by adding parallel sentences and classical Chinese couplets; prior to use, the system is constructed using natural language processing techniques.
4. A method for constructing the language couplet information processing system according to claim 1, characterized in that the construction method specifically comprises:
step one, selecting a basic corpus and modern couplets collected from the network as the corpus;
step two, performing statistical learning on the data in the corpus;
step three, selecting a Seq2Seq model as the probability model;
step four, incrementally extending the Seq2Seq model to obtain the language model of the couplet system;
step five, building the algorithm model framework with TensorFlow to obtain a couplet generation system with a smoothing algorithm;
step six, checking the system model with an evaluation function, and continuously optimizing the parameters and algorithms.
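The encoder-decoder construction named in steps three and four can be sketched as follows. This is a minimal character-level illustration only, not the patent's implementation: the toy vocabulary and the randomly initialised (untrained) weights are hypothetical.

```python
import numpy as np

# Minimal character-level Seq2Seq sketch: an encoder RNN compresses the
# upper line into one vector, and a decoder RNN emits the lower line.
# Vocabulary and weights below are hypothetical toy values.

rng = np.random.default_rng(0)
vocab = list("abcd")              # hypothetical character vocabulary
V, H = len(vocab), 8              # vocabulary size, hidden size

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

# Encoder RNN: compresses the upper line into a single fixed-length vector.
Wxh_e = rng.normal(0, 0.1, (H, V))
Whh_e = rng.normal(0, 0.1, (H, H))

def encode(indices):
    h = np.zeros(H)
    for i in indices:
        h = np.tanh(Wxh_e @ one_hot(i) + Whh_e @ h)
    return h                      # single vector representing the upper line

# Decoder RNN: emits one character of the lower line per step.
Wxh_d = rng.normal(0, 0.1, (H, V))
Whh_d = rng.normal(0, 0.1, (H, H))
Why_d = rng.normal(0, 0.1, (V, H))

def decode(h, steps):
    out, x = [], np.zeros(V)
    for _ in range(steps):
        h = np.tanh(Wxh_d @ x + Whh_d @ h)
        j = int(np.argmax(Why_d @ h))   # greedy choice of the next character
        out.append(j)
        x = one_hot(j)
    return out

# A couplet's lower line has the same length as its upper line.
upper = [vocab.index(c) for c in "abba"]
lower = decode(encode(upper), len(upper))
```

In a trained system these weights would be learned from the corpus; the TensorFlow framework of step five would replace this hand-rolled loop with its own RNN layers.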
5. The method for constructing a language couplet information processing system according to claim 4, wherein incrementally extending the Seq2Seq model in step four specifically comprises:
(1) the Seq2Seq couplet system model receives an input clause and captures its meaning character by character with a recurrent neural network (RNN), obtaining a single vector that represents the preceding clause; another recurrent neural network decodes this vector into the subsequent clause by intelligent character generation; through this encoding and decoding, sentence sequences are generated under a global hierarchy;
(2) an attention mechanism is introduced into the model so that the decoder can dynamically select and linearly combine different portions of the input sequence with different weights; the attention mechanism essentially models the alignment between input and output positions and can therefore be regarded as a local matching model; at the same time, the attention mechanism also alleviates the fixed-length encoding problem;
(3) a smoothing algorithm is added to couplet generation so that the couplet generator can polish the generated couplet multiple times, performing one or more iterations to improve the wording; the draft generated in the previous iteration is carried in a hidden state, and a polished couplet is produced in the next iteration; that is, information representing the previously generated clause draft is used again as input, serving as semantically coherent additional information.
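The alignment described in item (2) — scoring each input position against the current decoder state and combining the encoder states with the resulting weights — can be sketched as a minimal dot-product attention. The toy vectors are hypothetical; this illustrates the alignment idea, not the model's exact attention variant.

```python
import numpy as np

# Minimal dot-product attention sketch: the decoder state is scored against
# every encoder position, and encoder states are combined with the weights.

def softmax(x):
    e = np.exp(x - x.max())           # subtract max for numerical stability
    return e / e.sum()

def attention_context(dec_state, enc_states):
    scores = enc_states @ dec_state   # alignment score per input position
    weights = softmax(scores)         # different weights for different parts
    context = weights @ enc_states    # weighted combination of encoder states
    return weights, context

# Three encoder positions with 2-dimensional states (hypothetical values).
enc_states = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
dec_state = np.array([1.0, 0.0])

w, ctx = attention_context(dec_state, enc_states)
# Positions 0 and 2 match the decoder state best, so position 1 receives the
# smallest weight; the weights always sum to 1.
```

Because the context vector is rebuilt at every decoding step, the decoder is no longer limited to the single fixed-length vector produced by the encoder.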
6. The method for constructing a language couplet information processing system according to claim 4, wherein the evaluation function in step six is:
cross entropy is selected as the loss function; the formula is as follows:
H(p, q) = -Σ_x p(x) log q(x)
wherein p and q denote two different distributions, the true distribution and the predicted distribution respectively; the above formula computes the similarity between the distribution on the test set and the real situation, thereby judging the accuracy of the model on the test set.
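The cross-entropy evaluation of claim 6 can be illustrated numerically. The distributions below are hypothetical: p is a one-hot "true" distribution, and the two candidate predictions show that a distribution closer to the truth yields a lower loss.

```python
import math

# Numerical sketch of H(p, q) = -sum_x p(x) * log q(x).

def cross_entropy(p, q, eps=1e-12):
    # eps guards against log(0) for zero predicted probabilities
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

p    = [1.0, 0.0, 0.0]    # true distribution (one-hot)
good = [0.9, 0.05, 0.05]  # prediction close to the truth
bad  = [0.1, 0.8, 0.1]    # prediction far from the truth

# The closer q is to p, the smaller H(p, q): here cross_entropy(p, good)
# equals -log(0.9) ≈ 0.105, while cross_entropy(p, bad) ≈ 2.303.
```

Averaging this quantity over the test set gives the model accuracy check described in step six.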
7. A computer program for implementing the language couplet information processing method according to claim 3.
8. An information data processing terminal for implementing the language couplet information processing method according to claim 3.
9. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the language couplet information processing method according to claim 3.
CN202111517180.7A 2021-12-08 2021-12-08 Language association information processing system, method and construction method Pending CN114417893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111517180.7A CN114417893A (en) 2021-12-08 2021-12-08 Language association information processing system, method and construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111517180.7A CN114417893A (en) 2021-12-08 2021-12-08 Language association information processing system, method and construction method

Publications (1)

Publication Number Publication Date
CN114417893A true CN114417893A (en) 2022-04-29

Family

ID=81264936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111517180.7A Pending CN114417893A (en) 2021-12-08 2021-12-08 Language association information processing system, method and construction method

Country Status (1)

Country Link
CN (1) CN114417893A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532328A (en) * 2019-08-26 2019-12-03 哈尔滨工程大学 A kind of text concept figure building method
CN110837546A (en) * 2019-09-24 2020-02-25 平安科技(深圳)有限公司 Hidden head pair generation method, device, equipment and medium based on artificial intelligence
CN111126061A (en) * 2019-12-24 2020-05-08 北京百度网讯科技有限公司 Method and device for generating antithetical couplet information
CN111797611A (en) * 2020-07-24 2020-10-20 中国平安人寿保险股份有限公司 Couplet generation model, couplet generation method, couplet generation device, computer device, and medium
CN112883709A (en) * 2021-04-18 2021-06-01 沈阳雅译网络技术有限公司 Method for automatically generating couplet by using natural language processing technology


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
尹中 (Yin Zhong): "Exploration of the Topological Nature of Language and Construction of a Couplet System" *

Similar Documents

Publication Publication Date Title
Eisenstein Introduction to natural language processing
CN111259653B (en) Knowledge graph question-answering method, system and terminal based on entity relationship disambiguation
CN106484682B (en) Machine translation method, device and electronic equipment based on statistics
CN108628935B (en) Question-answering method based on end-to-end memory network
Zhang et al. Deep Learning + Student Modeling + Clustering: A Recipe for Effective Automatic Short Answer Grading.
CN112541356B (en) Method and system for recognizing biomedical named entities
CN113779220A (en) Mongolian multi-hop question-answering method based on three-channel cognitive map and graph attention network
CN113408430B (en) Image Chinese description system and method based on multi-level strategy and deep reinforcement learning framework
CN113657123A (en) Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN107679225A A keyword-based reply generation method
CN112818118A (en) Reverse translation-based Chinese humor classification model
CN114881042A (en) Chinese emotion analysis method based on graph convolution network fusion syntax dependence and part of speech
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
Munir et al. Adaptive convolution for semantic role labeling
CN116186216A (en) Question generation method and system based on knowledge enhancement and double-graph interaction
Zhu Machine reading comprehension: Algorithms and practice
CN112417170B (en) Relationship linking method for incomplete knowledge graph
CN112100342A (en) Knowledge graph question-answering method based on knowledge representation learning technology
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
Lee Natural Language Processing: A Textbook with Python Implementation
CN115796187A (en) Open domain dialogue method based on dialogue structure diagram constraint
Wang et al. Predicting the Chinese poetry prosodic based on a developed BERT model
CN114417893A (en) Language association information processing system, method and construction method
CN113360606A (en) Knowledge graph question-answer joint training method based on Filter
Santana et al. Reducing the impact of out of vocabulary words in the translation of natural language questions into SPARQL queries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination