CN108021547B - Natural language generation method, natural language generation device and electronic equipment - Google Patents

Publication number: CN108021547B (granted); other version: CN108021547A (Chinese-language publication)
Application number: CN201610965589.8A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: sentence pattern; sentence pattern template; semantics; sentence
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 丁磊, 郑继川, 董滨, 姜珊珊, 童毅轩
Assignee (original and current): Ricoh Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Ricoh Co Ltd
Priority: CN201610965589.8A (granted as CN108021547B); related application JP2017204160A (granted as JP6601470B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/30: Semantic analysis
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/3331: Query processing
    • G06F 16/334: Query execution
    • G06F 16/3344: Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a natural language generation method, a natural language generation device and electronic equipment. Sentence pattern templates are extracted directly from a corpus, which ensures that the sentence patterns of the subsequently generated natural sentences are correct; and because a template is extracted merely by deleting the sentence components predefined in the input pattern, excessive manual work is avoided. In addition, the method selects candidate sentence pattern templates based on the matching degree between the input semantics and each sentence pattern template, which improves the correctness of the generated natural sentences.

Description

Natural language generation method, natural language generation device and electronic equipment
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a natural language generation method, a natural language generation device, and an electronic device.
Background
With the development of artificial intelligence, intelligent systems such as human-machine dialogue are applied ever more widely, and the demand for anthropomorphic output, that is, for directly outputting natural language, keeps growing. Prior-art schemes for generating and outputting natural language include: 1) generating natural sentences through a predefined language model; 2) generating natural sentences through manually defined templates.
Both methods have problems in practical application. In the first scheme, it is difficult for a mathematical model to express the grammar and logical relationships of natural language well, so the correctness of the generated language is hard to guarantee. The second, manual-template-based approach is generally applicable only to a specific field or a single use, lacks flexibility, and requires a great deal of manual work.
Therefore, a natural language generation method is needed that improves the flexibility of the scheme, reduces the manual workload, and improves the correctness of the language generation result.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a natural language generation method, a natural language generation device, and an electronic device, so as to improve the flexibility of natural sentence generation, reduce the manual workload, and improve the correctness of a language generation result.
In order to solve the above technical problem, a method for generating a natural language according to an embodiment of the present invention includes:
generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;
obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;
and generating a natural sentence according to the input semantics and the candidate sentence pattern template.
Wherein, in the above method, after the step of generating at least one sentence template matching the predefined input pattern, the method further comprises: calculating the similarity between every two sentence pattern templates;
and in the process of calculating the matching degree between the input semantics and the sentence pattern templates, determining the next sentence pattern template for which the matching degree is calculated according to the similarity between the sentence pattern template currently being scored and the other sentence pattern templates.
In the above method, the step of calculating the similarity between every two sentence pattern templates includes:
calculating the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

$$\mathrm{Sim}(p_1, p_2) = \sum_{s} Y(p_1, p_2, s)$$

wherein:

$$Y(p_1, p_2, s) = \frac{\mathrm{num}\left(T(p_1, s) \cap T(p_2, s)\right)}{\mathrm{num}\left(T(p_1, s) \cup T(p_2, s)\right)}$$

w represents a word corresponding to a sub-semantic; p1 and p2 respectively represent the first and the second sentence pattern template of every two sentence pattern templates; s represents a fill position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled in fill position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); n represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); cos(w, x) represents the cosine similarity of the words w and x; T(p1, s) ∩ T(p2, s) denotes the intersection and T(p1, s) ∪ T(p2, s) the union of the two word sets; and the sum runs over all fill positions s in the sentence pattern template.
In the above method, the step of calculating the matching degree between the input semantics and the sentence pattern template includes:
aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity;
and calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.
In the above method, the step of generating a natural sentence according to the input semantic and the candidate sentence pattern template includes:
filling words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences, wherein the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold value;
and calculating, for each candidate natural sentence, the matching degree between the filling semantics formed by the sub-semantics at each fill position and the corresponding candidate sentence pattern template, and screening out, according to the matching degree, the natural sentences whose matching degree reaches a preset threshold.
An embodiment of the present invention further provides a natural language generating apparatus, including:
the template obtaining module is used for generating at least one sentence pattern template matched with a predefined input mode according to the sentences in the corpus;
the template selection module is used for obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template of which the matching degree meets a preset condition;
and the sentence generation module is used for generating a natural sentence according to the input semantics and the candidate sentence pattern template.
The above apparatus further includes:
the similarity calculation module is used for calculating the similarity between every two sentence pattern templates after the template acquisition module generates at least one sentence pattern template matched with a predefined input pattern;
the template selection module is further used for determining, in the process of calculating the matching degree between the input semantics and the sentence pattern templates, the next sentence pattern template for which the matching degree is calculated according to the similarity between the sentence pattern template currently being scored and the other sentence pattern templates.
In the above apparatus, the similarity calculation module is specifically configured to:
calculating the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

$$\mathrm{Sim}(p_1, p_2) = \sum_{s} Y(p_1, p_2, s)$$

wherein:

$$Y(p_1, p_2, s) = \frac{\mathrm{num}\left(T(p_1, s) \cap T(p_2, s)\right)}{\mathrm{num}\left(T(p_1, s) \cup T(p_2, s)\right)}$$

w represents a word corresponding to a sub-semantic; p1 and p2 respectively represent the first and the second sentence pattern template of every two sentence pattern templates; s represents a fill position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled in fill position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); n represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); cos(w, x) represents the cosine similarity of the words w and x; T(p1, s) ∩ T(p2, s) denotes the intersection and T(p1, s) ∪ T(p2, s) the union of the two word sets; and the sum runs over all fill positions s in the sentence pattern template.
In the above apparatus, the template selecting module is specifically configured to:
aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity;
and calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.
In the above apparatus, the sentence generation module is specifically configured to: fill words of the input semantics and/or the replacement semantics into the corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences, where the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and calculate, for each candidate natural sentence, the matching degree between the filling semantics formed by the sub-semantics at each fill position and the corresponding candidate sentence pattern template, and screen out, according to the matching degree, the natural sentences whose matching degree reaches a preset threshold.
The embodiment of the present invention further provides an electronic device, including:
a processor;
and a memory having computer program instructions stored therein,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:
generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;
obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;
and generating a natural sentence according to the input semantics and the candidate sentence pattern template.
Compared with the prior art, the natural language generation method, natural language generation device and electronic equipment provided by the embodiments of the present invention have at least the following beneficial effects. The sentence pattern templates are extracted directly from the corpus, which ensures that the sentence patterns of the subsequently generated natural sentences are correct; because a template is extracted merely by deleting the sentence components predefined in the input pattern, excessive manual work is avoided. In addition, the embodiments select candidate sentence pattern templates based on the matching degree between the input semantics and the sentence pattern templates, which improves the correctness of the generated natural sentences; and by filtering the generated natural sentences through the matching degree, both the correctness and the diversity of the obtained natural sentences can be taken into account.
Drawings
Fig. 1 is a schematic flowchart of a method for generating a natural language according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a natural language generation method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a natural language generating apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of another natural language generating apparatus according to a third embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In the following description, specific details such as specific configurations and components are provided only to help fully understand the embodiments of the present invention in order to make technical problems, technical solutions and advantages to be solved more clear. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not imply an execution order; the execution order of each process should be determined by its function and inherent logic, and should not limit the implementation of the embodiments of the present invention. It should be understood that the term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter objects are in an "or" relationship. In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may be determined from A and/or other information.
First, the related concepts related to the following embodiments of the present invention will be explained.
In the embodiments of the present invention, the input pattern refers to a classification of the input words. Specifically, the classification may include nouns, verbs, adjectives, numerals, quantifiers, adverbs, pronouns, conjunctions, prepositions, auxiliary words and modal particles; for example, an input pattern may be: a noun and a verb are input. The input pattern may also be the component, or role, that an input word plays in the grammatical structure; specifically, the component may be a subject, a predicate, an object, an attributive, an adverbial, a complement, or the like. That is, the input pattern defines the sentence components of the input words.
Input semantics refers to the input words or word vectors (a word vector being another representation of a word). Since input semantics may include several words or word vectors, each word or word vector in the input semantics is referred to herein as a sub-semantic. For example, if the input semantics are "Jingdong" and "shop", the two together form one input semantics, and "Jingdong" and "shop" are each a sub-semantic of it.
A sentence pattern template is obtained by removing from a sentence the sentence components defined in the input pattern. For example, for the sentence "we buy clothes in the mall", if the predefined input pattern is subject and predicate, the sentence pattern template obtained after deleting the subject "we" and the predicate "buy" is: [subject] [predicate] clothes in the mall. The parts in [ ] are the fill positions of the components defined in the input pattern, whose content has been deleted. Subsequently, when a natural sentence is generated, each sub-semantic of input semantics conforming to the input pattern is filled into the corresponding fill position, yielding a natural-language sentence. For example, if the input semantics are "Miss Wang" and "sells", filling them into the above sentence pattern template gives the sentence: Miss Wang sells clothes in the mall.
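The extraction-and-filling procedure just described can be sketched in a few lines. This is an illustrative toy, not the patented implementation: the helper names (`extract_template`, `fill_template`), the role labels, and the pre-tokenized input are all assumptions, whereas a real system would draw parsed sentences from a corpus.

```python
def extract_template(tokens, roles, input_pattern):
    """Replace every token whose grammatical role belongs to the input
    pattern with a [role] fill position (template extraction)."""
    return ["[%s]" % role if role in input_pattern else tok
            for tok, role in zip(tokens, roles)]

def fill_template(template, semantics):
    """Fill each [role] position with the sub-semantic given for that role."""
    return " ".join(
        semantics.get(tok.strip("[]"), tok) if tok.startswith("[") else tok
        for tok in template)

# "we buy clothes in the mall", input pattern = {subject, predicate}
tokens = ["we", "buy", "clothes", "in", "the", "mall"]
roles = ["subject", "predicate", "object", "prep", "det", "noun"]
tpl = extract_template(tokens, roles, {"subject", "predicate"})
# tpl == ['[subject]', '[predicate]', 'clothes', 'in', 'the', 'mall']
sentence = fill_template(tpl, {"subject": "Miss Wang", "predicate": "sells"})
# sentence == "Miss Wang sells clothes in the mall"
```

Because the fill positions keep the deleted role names, any input semantics that conforms to the same input pattern can be filled into the same template.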
The invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
< example one >
As shown in fig. 1, an embodiment of the present invention provides a natural language generating method, which may be applied in an environment such as a human-computer dialog system or an image description generating system. Referring to fig. 1, the method includes:
step 11, generating at least one sentence pattern template matching the predefined input pattern according to the sentences in the corpus.
The embodiment of the invention directly deletes the composition components in the sentence defined in the input mode from the sentence in the preset corpus to obtain the sentence pattern template. In the sentence pattern template, the positions of the deleted composition components are left blank as filling positions for subsequently filling corresponding words in the input semantics. Since a large number of sentences are usually stored in the corpus, there may be a plurality of sentences matching the input pattern, and a plurality of sentence templates can be extracted for the matching sentences.
Here, the input mode may be an input mode defined or determined by a user, or an input mode generated by a system, for example, the image description generation system may recognize content in an image and describe the content in a natural language, and in this case, the input mode may be an input mode generated after the system recognizes the content in the image.
And step 12, obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition.
In step 12, the matching degree between the input semantics and the sentence pattern template is calculated, and then the sentence pattern template with the matching degree satisfying the predetermined condition is selected as the candidate sentence pattern template. The predetermined condition may be set according to the scene requirement or the calculation amount, for example, it may be a sentence pattern template with a matching degree exceeding a preset numerical threshold, or may be N sentence pattern templates with the highest matching degree, where N is a positive integer. Similarly, the input semantics may be user input semantics or semantics that are self-generated by a system, such as the aforementioned semantics generated by the image description generation system.
In step 12, when calculating the matching degree between the input semantics and the sentence pattern template, the calculation may be specifically performed in the following manner:
step 121, determining, for each sub-semantic in the input semantics, a first set of words that can be filled in the filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; and calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity.
In step 121, the first set of words that can fill a given fill position is determined from the words occupying that position in the corpus sentences that match the sentence pattern template. The cosine similarity between the sub-semantic and each word in the first set is then calculated; a preferred approach is to compute the cosine similarity (cosine distance) between the word vector of the sub-semantic and the word vector of each word in the first set. The matching factor is then calculated from these cosine similarities and is positively correlated with them: the larger the cosine similarity, the larger the matching factor, i.e., the better the match; conversely, the smaller the cosine similarity, the smaller the matching factor, i.e., the worse the match. One way of calculating the matching factor is provided below; the embodiments of the present invention are not limited thereto.
$$\mathrm{AM}(p, s, w) = \frac{\theta_w}{n} \sum_{x \in T(p, s)} \cos(w, x) \qquad (1)$$

In the above formula (1), w represents the word corresponding to the sub-semantic; s represents a fill position in the sentence pattern template p; AM(p, s, w) represents the matching factor of the word w with the fill position s in the sentence pattern template p; θw represents a preset weight coefficient of the word w; T(p, s) represents the set of words in the corpus that can be filled in fill position s of the sentence pattern template p; n represents the number of words in T(p, s); x represents a word in T(p, s); and cos(w, x) represents the cosine similarity of the words w and x.
And step 122, calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.
In step 122, an average value of the matching factors of each sub-semantic and the corresponding filling position in the sentence pattern template may be calculated, and the average value is used as the matching degree between the input semantic and the sentence pattern template, or a sum value of the matching factors of all sub-semantics and the corresponding filling position in the sentence pattern template may be calculated, and the sum value is used as the matching degree between the input semantic and the sentence pattern template.
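Steps 121 and 122 can be sketched as follows. The closed form used for the matching factor (a θw-weighted mean of cosine similarities over T(p, s)) is one plausible reading of formula (1); the function names and the toy 2-d "word vectors" are illustrative assumptions, and step 122 is shown in its averaging variant.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matching_factor(w_vec, fillers, theta_w=1.0):
    """AM(p, s, w): mean cosine similarity between the sub-semantic w and
    the n corpus words allowed at fill position s, scaled by theta_w."""
    return theta_w / len(fillers) * sum(cos_sim(w_vec, x) for x in fillers)

def matching_degree(sub_semantics, slot_fillers):
    """Step 122 (mean variant): average the matching factor of every
    sub-semantic with its corresponding fill position."""
    factors = [matching_factor(vec, slot_fillers[slot])
               for slot, vec in sub_semantics.items()]
    return sum(factors) / len(factors)

# The subject slot accepts two orthogonal words (half match); the
# predicate slot exactly matches its input sub-semantic (full match).
degree = matching_degree(
    {"subject": (1.0, 0.0), "predicate": (0.0, 1.0)},
    {"subject": [(1.0, 0.0), (0.0, 1.0)], "predicate": [(0.0, 1.0)]},
)
# degree == (0.5 + 1.0) / 2 == 0.75
```

Replacing the final average with a sum gives the sum variant of step 122 described above.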
And step 13, generating a natural sentence according to the input semantic and the candidate sentence pattern template.
In step 13, one way to obtain the natural sentence is: after the candidate sentence pattern template is selected, the words in the input semantics can be filled to the corresponding filling positions in the candidate sentence pattern template, and the natural sentence is obtained.
In order to obtain diverse natural sentences, one implementation of step 13 is as follows: determine several replacement semantics whose semantic similarity with the input semantics is higher than a preset threshold, and then fill words of the input semantics and/or the replacement semantics into the corresponding positions in the candidate sentence pattern template to obtain natural sentences in more styles. The semantic similarity may be calculated from the cosine similarity between word vectors.
In order to balance the correctness and diversity of the obtained natural sentences, another implementation of step 13 is: fill words of the input semantics and/or the replacement semantics into the corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences; then calculate, for each candidate natural sentence, the matching degree between the filling semantics formed by the sub-semantics at each fill position and the corresponding candidate sentence pattern template, and screen out, according to the matching degree, the natural sentences whose matching degree reaches a preset threshold. For the calculation of the matching degree, refer to steps 121 to 122 above; the details are not repeated here.
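The generate-then-screen implementation of step 13 can be sketched as below. The helper names are assumptions, and the scorer passed in is a stand-in for the matching-degree calculation of steps 121 to 122; only the enumeration and threshold filtering are illustrated.

```python
from itertools import product

def generate_candidates(template, slot_options):
    """Enumerate candidate natural sentences by filling each [slot] with
    the input sub-semantic or any replacement semantic for that slot."""
    slots = sorted(slot_options)
    for combo in product(*(slot_options[s] for s in slots)):
        filling = dict(zip(slots, combo))
        sentence = " ".join(
            filling.get(tok.strip("[]"), tok) if tok.startswith("[") else tok
            for tok in template)
        yield sentence, filling

def screen_candidates(template, slot_options, match_fn, threshold):
    """Keep only candidates whose filling semantics reach the threshold
    matching degree with the template (correctness/diversity filter)."""
    return [sentence
            for sentence, filling in generate_candidates(template, slot_options)
            if match_fn(filling) >= threshold]

template = ["[subject]", "sells", "clothes", "in", "the", "mall"]
options = {"subject": ["Miss Wang", "banana"]}  # input + one replacement
# Stand-in scorer: a real system would compute the matching degree.
score = lambda filling: 0.9 if filling["subject"] == "Miss Wang" else 0.1
kept = screen_candidates(template, options, score, 0.5)
# kept == ["Miss Wang sells clothes in the mall"]
```

The implausible filler is generated but screened out, which is how the scheme trades diversity against correctness.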
Through the above steps, the sentence pattern templates are extracted directly from the corpus, which ensures that the sentence patterns of the subsequently generated natural sentences are correct; because a template is extracted merely by deleting the sentence components predefined in the input pattern, excessive manual work is avoided. In addition, the embodiment selects candidate sentence pattern templates based on the matching degree between the input semantics and the sentence pattern templates, which improves the correctness of the generated natural sentences; by filtering the generated natural sentences through the matching degree, both the correctness and the diversity of the obtained natural sentences can be taken into account.
< example two >
As shown in fig. 2, in order to make the subsequent selection of candidate sentence pattern templates more efficient, the natural language generation method according to the second embodiment of the present invention additionally calculates the similarity between every two sentence pattern templates after the templates are obtained, and uses these inter-template similarities to speed up the selection of candidate sentence pattern templates. Referring to fig. 2, the method includes:
step 21, generating at least one sentence template matching the predefined input pattern according to the sentences in the corpus.
Here, the specific implementation of generating the sentence pattern template may refer to embodiment one, and details are not described here.
And step 22, calculating the similarity between every two sentence pattern templates in the at least one sentence pattern template.
Here, in the above step 22, the similarity Sim(p1, p2) between every two sentence pattern templates can be calculated according to the following formula:

Sim(p1, p2) = Σ_s Y(p1, p2, s)

Wherein:

Y(p1, p2, s) = num(T(p1, s) ∩ T(p2, s)) / num(T(p1, s) ∪ T(p2, s))

AM(p, s, w) = (θw / N) · Σ_{x ∈ T(p, s)} cos(w, x)

In the above formulas, w represents the word corresponding to a sub-semantic; p1 and p2 respectively represent the first sentence pattern template and the second sentence pattern template of each pair; s represents a filling position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled into the filling position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); AM(p, s, w) represents the matching factor of the word w with the filling position s in the sentence pattern template p; N represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); and cos(w, x) represents the cosine similarity of the words w and x.

T(p1, s) ∩ T(p2, s) denotes the intersection of the two sets, and T(p1, s) ∪ T(p2, s) denotes their union; Σ_s indicates that the Y values corresponding to all the filling positions s in the sentence pattern template are summed.
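The similarity computation above can be sketched in code. The Jaccard-style Y term is one plausible reading of the formula, and the template representation (a dict mapping each filling position s to its fillable word set T(p, s)) is an assumption for illustration:

```python
def template_similarity(p1_slots, p2_slots):
    """Sim(p1, p2): sum, over fill positions s shared by both templates,
    of Y(p1, p2, s) = num(T(p1,s) & T(p2,s)) / num(T(p1,s) | T(p2,s)).
    Each argument maps a fill position to the set of corpus words that
    can fill it (a hypothetical representation, not from the patent)."""
    sim = 0.0
    for s in set(p1_slots) & set(p2_slots):
        inter = p1_slots[s] & p2_slots[s]
        union = p1_slots[s] | p2_slots[s]
        if union:
            sim += len(inter) / len(union)
    return sim

# Toy templates with two fill positions each, e.g. "<X> eats <Y>"
p1 = {0: {"cat", "dog", "bird"}, 1: {"fish", "seed"}}
p2 = {0: {"cat", "dog"}, 1: {"fish", "meat"}}
print(round(template_similarity(p1, p2), 3))  # → 1.0  (2/3 + 1/3)
```

Position 0 contributes 2/3 (two shared words out of three in the union) and position 1 contributes 1/3, so the toy similarity sums to 1.0.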
And step 23, obtaining input semantics based on the input pattern, calculating a matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition, wherein in the calculation process of the matching degree, the next sentence pattern template for calculating the matching degree is determined according to the similarity between the sentence pattern templates.
Here, in step 23, in the process of calculating the matching degree between the input semantics and the sentence pattern templates, the next sentence pattern template for which the matching degree is to be calculated is determined according to the similarity between the sentence pattern template currently being evaluated and the other sentence pattern templates, so as to improve the efficiency of selecting candidate sentence pattern templates.
For example, first, a sentence pattern template is selected from the at least one sentence pattern template obtained in step 21 as the current sentence pattern template. Then, the matching degree between the input semantics and the current sentence pattern template is calculated: if the matching degree does not reach a preset first threshold, one sentence pattern template is selected from the remaining sentence pattern templates as the next sentence pattern template for which the matching degree is calculated; if the matching degree reaches the preset first threshold, a not-yet-evaluated sentence pattern template whose similarity with the current sentence pattern template exceeds a preset second threshold is selected, according to the similarities between sentence pattern templates, as the next sentence pattern template for which the matching degree is calculated. When no sentence pattern templates remain, or when the number of sentence pattern templates whose matching degree with the input semantics reaches the preset threshold reaches a preset count, the calculation process may be ended, and at least one candidate sentence pattern template satisfying the predetermined condition is selected according to the calculated matching degrees between the input semantics and the individual sentence pattern templates.
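The example procedure above can be sketched as follows. The function name `select_candidates`, both threshold values, and the toy data are illustrative assumptions, not names from the patent; `similarity` and `match_degree` stand for the precomputed template similarity and the matching-degree calculation:

```python
def select_candidates(templates, similarity, match_degree,
                      first_threshold=0.5, second_threshold=0.7,
                      max_candidates=3):
    """Similarity-guided search over sentence pattern templates:
    after a good match, jump to a not-yet-scored template that is
    similar to it; otherwise take an arbitrary remaining one."""
    remaining = list(templates)
    scores = {}
    current = remaining.pop(0)
    while True:
        scores[current] = match_degree(current)
        hits = [t for t, d in scores.items() if d >= first_threshold]
        if not remaining or len(hits) >= max_candidates:
            break  # nothing left, or enough candidates found
        if scores[current] >= first_threshold:
            similar = [t for t in remaining
                       if similarity(current, t) > second_threshold]
            nxt = similar[0] if similar else remaining[0]
        else:
            nxt = remaining[0]
        remaining.remove(nxt)
        current = nxt
    # candidate templates whose matching degree satisfies the condition
    return sorted(hits, key=lambda t: -scores[t])

degrees = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.1}
sims = {frozenset({"a", "c"}): 0.9}  # only a and c are similar
picked = select_candidates(
    ["a", "b", "c", "d"],
    lambda x, y: sims.get(frozenset({x, y}), 0.1),
    degrees.get)
print(picked)  # ['a', 'c']
```

In the toy run, template "a" matches well, so the search jumps directly to the similar template "c" before falling back to the rest.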
Of course, the above example is only one example of accelerating the selection process of the candidate sentence pattern templates, and the embodiment of the present invention may also adopt other algorithms based on the similarity between the sentence pattern templates to improve the selection efficiency.
As to how to calculate the matching degree between the input semantics and the sentence pattern template, reference may be made to the implementation process in the first embodiment, which is not described herein again.
Step 24, generating a natural sentence according to the input semantics and the candidate sentence pattern templates.
Here, in step 24, words in the input semantics and/or in replacement semantics may first be filled into the corresponding positions of the candidate sentence pattern templates to obtain candidate natural sentences, where the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold. The sub-semantics (words) of the replacement semantics may be selected from the same corpus as in step 21 or from other corpora, for example an Internet corpus, so as to improve the diversity of the generated natural sentences. Then, for each candidate natural sentence, the matching degree between the filling semantics formed by the sub-semantics at each filling position and the corresponding candidate sentence pattern template is calculated, and the natural sentences whose matching degree reaches a preset threshold are screened out according to the matching degree.
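The fill-then-screen step can be sketched as follows. Representing a sentence pattern template as a Python format string with named slots, and the names `realize`, `generate`, and the toy matcher, are all assumptions made for illustration:

```python
def realize(template, slot_words):
    """Fill a sentence pattern template (here a format string with named
    slots -- an assumed representation) with the chosen words."""
    return template.format(**slot_words)

def generate(template, fillings, fill_match, threshold=0.6):
    """Produce candidate natural sentences and keep only those whose
    filling semantics match the template well enough; fill_match is the
    caller-supplied matching-degree function."""
    kept = []
    for slot_words in fillings:
        if fill_match(template, slot_words) >= threshold:
            kept.append(realize(template, slot_words))
    return kept

sentences = generate(
    "The {subj} booked a {obj}.",
    [{"subj": "traveler", "obj": "flight"},
     {"subj": "stone", "obj": "flight"}],
    # toy matcher: score implausible subject fillings low
    lambda t, w: 0.9 if w["subj"] == "traveler" else 0.2)
print(sentences)  # ['The traveler booked a flight.']
```

The implausible filling ("stone") is screened out because its matching degree falls below the threshold, mirroring the screening described above.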
< example three >
Referring to fig. 3, an embodiment of the present invention provides a device for implementing a natural language generating method according to the foregoing embodiments, where the natural language generating device 30 includes:
a template obtaining module 31, configured to generate at least one sentence pattern template matching a predefined input pattern according to the sentences in the corpus;
a template selection module 32, configured to obtain input semantics based on the input pattern, calculate a matching degree between the input semantics and a sentence pattern template, and select at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition;
and a sentence generating module 33 for generating a natural sentence according to the input semantic and the candidate sentence pattern template.
To improve the selection efficiency of the candidate sentence pattern templates, as shown in fig. 4, the natural language generating apparatus according to the embodiment of the present invention may further include: a similarity calculation module 34, configured to calculate the similarity between every two sentence pattern templates after the template obtaining module generates at least one sentence pattern template matching the predefined input pattern. In this case, the template selection module 32 is further configured to determine, in the process of calculating the matching degree between the input semantics and the sentence pattern templates, the next sentence pattern template for which the matching degree is to be calculated according to the similarity between the sentence pattern template currently being evaluated and the other sentence pattern templates.
The template selection module 32 may specifically include: a first selection submodule, configured to select one sentence pattern template from the at least one sentence pattern template as the current sentence pattern template; a calculation submodule, configured to calculate the matching degree between the input semantics and the current sentence pattern template; a first processing submodule, configured to, when the matching degree between the input semantics and the current sentence pattern template does not reach a preset first threshold, select one sentence pattern template from the remaining sentence pattern templates as the next sentence pattern template for which the matching degree is calculated; and a second processing submodule, configured to, when the matching degree between the input semantics and the current sentence pattern template reaches the preset first threshold, select, according to the similarities between sentence pattern templates, a not-yet-evaluated sentence pattern template whose similarity with the current sentence pattern template exceeds a preset second threshold as the next sentence pattern template for which the matching degree is calculated.
The similarity calculation module 34 is specifically configured to:
calculate the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

Sim(p1, p2) = Σ_s Y(p1, p2, s)

Wherein:

Y(p1, p2, s) = num(T(p1, s) ∩ T(p2, s)) / num(T(p1, s) ∪ T(p2, s))

AM(p, s, w) = (θw / N) · Σ_{x ∈ T(p, s)} cos(w, x)

w represents the word corresponding to a sub-semantic; p1 and p2 respectively represent the first sentence pattern template and the second sentence pattern template of each pair; s represents a filling position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled into the filling position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); AM(p, s, w) represents the matching factor of the word w with the filling position s in the sentence pattern template p; N represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); and cos(w, x) represents the cosine similarity of the words w and x.

T(p1, s) ∩ T(p2, s) denotes the intersection of the two sets, and T(p1, s) ∪ T(p2, s) denotes their union; Σ_s indicates that the Y values corresponding to all the filling positions s in the sentence pattern template are summed.
Here, when calculating the matching degree between the input semantics and a sentence pattern template, the template selection module 32 is specifically configured to: for each sub-semantic in the input semantics, determine, according to the filling position of the sub-semantic in the sentence pattern template, a first set of words in the corpus that can be filled into that filling position; calculate a matching factor of the sub-semantic with the corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, the matching factor being positively correlated with the cosine similarity; and then calculate the matching degree between the input semantics and the sentence pattern template (for example, as the average or the sum of the matching factors) according to the matching factor of each sub-semantic with its corresponding filling position.
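Reading the matching factor AM(p, s, w) as a θw-weighted average cosine similarity between the sub-semantic word and the N words fillable at position s (one plausible interpretation of the definitions above), the matching-degree computation can be sketched as:

```python
import math

def cosine(u, v):
    """Cosine similarity of two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matching_factor(w_vec, slot_vecs, theta_w=1.0):
    """AM(p, s, w): θw-scaled average cosine similarity between the
    sub-semantic word w and the words fillable at position s
    (an assumed reading of the patent's matching factor)."""
    if not slot_vecs:
        return 0.0
    return theta_w * sum(cosine(w_vec, x) for x in slot_vecs) / len(slot_vecs)

def matching_degree(sub_semantics, slots):
    """Matching degree of the input semantics with a template: here the
    mean of the per-position matching factors, as the text suggests
    (average or sum)."""
    factors = [matching_factor(vec, slots[s])
               for s, vec in sub_semantics.items()]
    return sum(factors) / len(factors) if factors else 0.0

# Toy 2-d word vectors: one sub-semantic at one filling position
deg = matching_degree({"s0": [1.0, 0.0]},
                      {"s0": [[1.0, 0.0], [0.0, 1.0]]})
print(deg)  # 0.5
```

The matching factor is positively correlated with the cosine similarities by construction: raising any cos(w, x) raises AM.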
Here, the sentence generating module 33 is specifically configured to fill words in the input semantics and/or the replacement semantics into corresponding positions in the candidate sentence pattern templates to obtain candidate natural sentences, where the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and to calculate, for each candidate natural sentence, the matching degree between the filling semantics formed by the sub-semantics at each filling position and the corresponding candidate sentence pattern template, and screen out the natural sentences whose matching degree reaches a preset threshold according to the matching degree.
< example four >
Referring to fig. 5, an embodiment of the present invention further provides an electronic device, which can implement the processes of the embodiments shown in fig. 1 or fig. 2 of the embodiments of the present invention. The electronic device may be a personal computer (PC), a tablet computer, various smart devices (including smart glasses or smart phones), and the like. As shown in fig. 5, the electronic device 50 may include:
a processor 51;
and a memory having computer program instructions stored therein. Specifically, the memory may include a RAM (random access memory) 52, a ROM (read only memory) 53.
Wherein the computer program instructions, when executed by the processor 51, cause the processor 51 to perform the steps of:
generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;
obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;
and generating a natural sentence according to the input semantic and the candidate sentence pattern template.
Referring to fig. 5, the electronic device according to the embodiment of the present invention may further include a hard disk 54, an input device 55, a display device 56, and other components. In particular, the input device 55 may be a device with input function and/or receiving function, such as a keyboard, a touch screen, various interfaces to obtain predefined input modes and input semantics. The display device 56 may be an LED display panel or display and may be used to display information such as generated natural language sentences.
The processor 51, the RAM 52, the ROM 53, the hard disk 54, the input device 55, and the display device 56 described above may be interconnected by a bus architecture, which may include any number of interconnected buses and bridges. The bus architecture connects together various circuits of one or more central processing units (CPUs), represented by the processor 51, and one or more memories, represented by the RAM 52 and the ROM 53. It may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described in further detail herein.
The input device 55 is used for inputting data, which may be stored in the hard disk 54.
The RAM52 and the ROM 53 are used to store programs and data necessary for system operation, and data such as intermediate results in the calculation process of the processor.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A method for generating a natural language, comprising:
generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;
obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;
generating a natural sentence according to the input semantic and the candidate sentence pattern template,
after the step of generating at least one sentence pattern template matching the predefined input pattern, the method further comprises: calculating the similarity between every two sentence pattern templates;
in the process of calculating the matching degree between the input semantics and the sentence pattern template, determining the next sentence pattern template for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates,
wherein, the step of calculating the similarity between every two sentence pattern templates comprises the following steps:
calculating the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

Sim(p1, p2) = Σ_s Y(p1, p2, s)

Wherein:

Y(p1, p2, s) = num(T(p1, s) ∩ T(p2, s)) / num(T(p1, s) ∪ T(p2, s))

w represents the word corresponding to a sub-semantic; p1 and p2 respectively represent the first sentence pattern template and the second sentence pattern template of each pair; s represents a filling position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled into the filling position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); N represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); cos(w, x) represents the cosine similarity of the words w and x;

T(p1, s) ∩ T(p2, s) denotes the intersection of the two sets, and T(p1, s) ∪ T(p2, s) denotes their union;

Σ_s indicates that the Y values corresponding to all the filling positions s in the sentence pattern template are summed.
2. The method of claim 1,
the step of calculating the matching degree between the input semantics and the sentence pattern template comprises the following steps:
aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity;
and calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.
3. The method of claim 1,
the step of generating a natural sentence according to the input semantics and the candidate sentence pattern template includes:
filling words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences, wherein the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold value;
and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.
4. A natural language generation apparatus, comprising:
the template obtaining module is used for generating at least one sentence pattern template matched with a predefined input mode according to the sentences in the corpus;
the template selection module is used for obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template of which the matching degree meets a preset condition;
the sentence generating module generates a natural sentence according to the input semantics and the candidate sentence pattern template;
the similarity calculation module is used for calculating the similarity between every two sentence pattern templates after the template acquisition module generates at least one sentence pattern template matched with a predefined input pattern;
the template selection module is also used for determining the next sentence pattern template for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates in the process of calculating the matching degree between the input semantics and the sentence pattern template,
the similarity calculation module is specifically configured to:
calculating the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

Sim(p1, p2) = Σ_s Y(p1, p2, s)

Wherein:

Y(p1, p2, s) = num(T(p1, s) ∩ T(p2, s)) / num(T(p1, s) ∪ T(p2, s))

w represents the word corresponding to a sub-semantic; p1 and p2 respectively represent the first sentence pattern template and the second sentence pattern template of each pair; s represents a filling position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled into the filling position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); N represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); cos(w, x) represents the cosine similarity of the words w and x;

T(p1, s) ∩ T(p2, s) denotes the intersection of the two sets, and T(p1, s) ∪ T(p2, s) denotes their union;

Σ_s indicates that the Y values corresponding to all the filling positions s in the sentence pattern template are summed.
5. The apparatus of claim 4,
the template selection module is specifically configured to:
aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity;
and calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.
6. The apparatus of claim 4,
the sentence generating module is specifically configured to fill words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain a natural sentence, where a semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.
7. An electronic device, comprising:
a processor;
and a memory having computer program instructions stored therein,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:
generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;
obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;
generating a natural sentence according to the input semantic and the candidate sentence pattern template,
after the step of generating at least one sentence pattern template matching the predefined input pattern, the following steps are also performed: calculating the similarity between every two sentence pattern templates;
in the process of calculating the matching degree between the input semantics and the sentence pattern template, determining the next sentence pattern template for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates,
wherein, the step of calculating the similarity between every two sentence pattern templates comprises the following steps:
calculating the similarity Sim(p1, p2) between every two sentence pattern templates according to the following formula:

Sim(p1, p2) = Σ_s Y(p1, p2, s)

Wherein:

Y(p1, p2, s) = num(T(p1, s) ∩ T(p2, s)) / num(T(p1, s) ∪ T(p2, s))

w represents the word corresponding to a sub-semantic; p1 and p2 respectively represent the first sentence pattern template and the second sentence pattern template of each pair; s represents a filling position in the sentence pattern template; T(p, s) represents the set of words in the corpus that can be filled into the filling position s of the sentence pattern template p; num(T(·)) represents the number of words in the set T(·); N represents the number of words in T(p, s); θw represents a preset weight coefficient of the word w; x represents a word in T(p, s); cos(w, x) represents the cosine similarity of the words w and x;

T(p1, s) ∩ T(p2, s) denotes the intersection of the two sets, and T(p1, s) ∪ T(p2, s) denotes their union;

Σ_s indicates that the Y values corresponding to all the filling positions s in the sentence pattern template are summed.
CN201610965589.8A 2016-11-04 2016-11-04 Natural language generation method, natural language generation device and electronic equipment Active CN108021547B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610965589.8A CN108021547B (en) 2016-11-04 2016-11-04 Natural language generation method, natural language generation device and electronic equipment
JP2017204160A JP6601470B2 (en) 2016-11-04 2017-10-23 NATURAL LANGUAGE GENERATION METHOD, NATURAL LANGUAGE GENERATION DEVICE, AND ELECTRONIC DEVICE

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610965589.8A CN108021547B (en) 2016-11-04 2016-11-04 Natural language generation method, natural language generation device and electronic equipment

Publications (2)

Publication Number Publication Date
CN108021547A CN108021547A (en) 2018-05-11
CN108021547B true CN108021547B (en) 2021-05-04

Family

ID=62084445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610965589.8A Active CN108021547B (en) 2016-11-04 2016-11-04 Natural language generation method, natural language generation device and electronic equipment

Country Status (2)

Country Link
JP (1) JP6601470B2 (en)
CN (1) CN108021547B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086272B (en) * 2018-08-01 2023-02-17 浙江蓝鸽科技有限公司 Sentence pattern recognition method and system
CN108959271B (en) * 2018-08-10 2020-06-16 广州太平洋电脑信息咨询有限公司 Description text generation method and device, computer equipment and readable storage medium
CN109284502B (en) * 2018-09-13 2024-02-13 广州财盟科技有限公司 Text similarity calculation method and device, electronic equipment and storage medium
CN111353293A (en) * 2018-12-21 2020-06-30 深圳市优必选科技有限公司 Statement material generation method and terminal equipment
CN109815486A (en) * 2018-12-25 2019-05-28 出门问问信息科技有限公司 Spatial term method, apparatus, equipment and readable storage medium storing program for executing
CN111666384A (en) * 2019-03-05 2020-09-15 京东数字科技控股有限公司 Task-oriented dialog system intention recognition-oriented corpus generation method and device
SG11202111653XA (en) * 2019-05-02 2021-11-29 The Clinician Pte Ltd System and method for phrase comparison consolidation and reconciliation
CN112101037A (en) * 2019-05-28 2020-12-18 云义科技股份有限公司 Semantic similarity calculation method
CN110222154A (en) * 2019-06-10 2019-09-10 武汉斗鱼鱼乐网络科技有限公司 Similarity calculating method, server and storage medium based on text and semanteme
CN110399499B (en) * 2019-07-18 2022-02-18 珠海格力电器股份有限公司 Corpus generation method and device, electronic equipment and readable storage medium
KR102445497B1 (en) * 2020-12-15 2022-09-21 주식회사 엘지유플러스 Apparatus for generating lexical pattern and training sentence and operating method thereof
WO2023206267A1 (en) * 2022-04-28 2023-11-02 西门子股份公司 Method and apparatus for adjusting natural language statement, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737042B (en) * 2011-04-08 2015-03-25 北京百度网讯科技有限公司 Method and device for establishing question generation model, and question generation method and device
JP5620349B2 (en) * 2011-07-22 2014-11-05 株式会社東芝 Dialogue device, dialogue method and dialogue program
CN103377239B (en) * 2012-04-26 2020-08-07 深圳市世纪光速信息技术有限公司 Method and device for calculating similarity between texts
CN104391969B (en) * 2014-12-04 2018-01-30 百度在线网络技术(北京)有限公司 Determine the method and device of user's query statement syntactic structure
CN105183848A (en) * 2015-09-07 2015-12-23 百度在线网络技术(北京)有限公司 Human-computer chatting method and device based on artificial intelligence
CN105868313B (en) * 2016-03-25 2019-02-12 浙江大学 A kind of knowledge mapping question answering system and method based on template matching technique

Also Published As

Publication number Publication date
CN108021547A (en) 2018-05-11
JP6601470B2 (en) 2019-11-06
JP2018073411A (en) 2018-05-10

Similar Documents

Publication Publication Date Title
CN108021547B (en) Natural language generation method, natural language generation device and electronic equipment
US20220180202A1 (en) Text processing model training method, and text processing method and apparatus
US20190164064A1 (en) Question and answer interaction method and device, and computer readable storage medium
CN107832432A (en) A kind of search result ordering method, device, server and storage medium
CN109299280B (en) Short text clustering analysis method and device and terminal equipment
CN109284502B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN109117474A (en) Calculation method, device and the storage medium of statement similarity
CN110874528B (en) Text similarity obtaining method and device
JP2020004382A (en) Method and device for voice interaction
CN113255328B (en) Training method and application method of language model
CN111368037A (en) Text similarity calculation method and device based on Bert model
JP2023541742A (en) Sorting model training method and device, electronic equipment, computer readable storage medium, computer program
CN116797695A (en) Interaction method, system and storage medium of digital person and virtual whiteboard
CN111402864A (en) Voice processing method and electronic equipment
CN116821307B (en) Content interaction method, device, electronic equipment and storage medium
CN110895656A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN116738956A (en) Prompt template generation method and device, computer equipment and storage medium
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN114490969B (en) Question and answer method and device based on table and electronic equipment
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN112528646B (en) Word vector generation method, terminal device and computer-readable storage medium
CN111506715B (en) Query method and device, electronic equipment and storage medium
CN110428814B (en) Voice recognition method and device
JPWO2019106758A1 (en) Language processing apparatus, language processing system, and language processing method
CN112988993A (en) Question answering method and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant