CN108021547B

CN108021547B - Natural language generation method, natural language generation device and electronic equipment

Info

Publication number: CN108021547B
Application number: CN201610965589.8A
Authority: CN
Inventors: 丁磊; 郑继川; 董滨; 姜珊珊; 童毅轩
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2016-11-04
Filing date: 2016-11-04
Publication date: 2021-05-04
Anticipated expiration: 2036-11-04
Also published as: CN108021547A; JP6601470B2; JP2018073411A

Abstract

The invention provides a natural language generation method, a natural language generation device and electronic equipment. The sentence pattern template is directly extracted from the corpus, the sentence pattern correctness of the subsequently generated natural sentences is ensured, and the sentence pattern template is extracted only by deleting the composition components in the sentences predefined in the input mode, so that excessive manual work is avoided. In addition, the method selects the candidate sentence pattern template based on the matching degree between the input semantics and the sentence pattern template, thereby improving the correctness of the generated natural sentence.

Description

Natural language generation method, natural language generation device and electronic equipment

Technical Field

The present invention relates to the field of natural language processing technologies, and in particular, to a natural language generation method, a natural language generation device, and an electronic device.

Background

With the development of artificial intelligence, the application range of intelligent systems such as man-machine conversation and the like is wider and wider, and the demand for anthropomorphic output, namely the demand for directly outputting natural language, is higher and higher. The prior art generates and outputs natural language implementation schemes, including: 1) generating a natural sentence through a predefined language model; 2) natural sentences are generated through a manually defined template.

The two methods have certain problems in the practical application process. For example, in the implementation of the 1 st embodiment, it is difficult to express the grammar and logic relationship of natural language well through a mathematical model, so it is difficult to ensure the correctness of the generated language; the 2 nd manual template-based approach, which is generally applicable only to a specific field or single use, lacks flexibility and requires a lot of manual work.

Therefore, there is a need for a method for generating natural language, which can improve the implementation flexibility of the solution, reduce the workload of human beings, and improve the correctness of the language generation result.

Disclosure of Invention

The technical problem to be solved by the embodiments of the present invention is to provide a natural language generation method, a natural language generation device, and an electronic device, so as to improve the flexibility of natural sentence generation, reduce the manual workload, and improve the correctness of a language generation result.

In order to solve the above technical problem, a method for generating a natural language according to an embodiment of the present invention includes:

generating at least one sentence pattern template matched with a predefined input pattern according to the sentences in the corpus;

obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and sentence pattern templates, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition;

and generating a natural sentence according to the input semantic and the candidate sentence pattern template.

Wherein, in the above method, after the step of generating at least one sentence template matching the predefined input pattern, the method further comprises: calculating the similarity between every two sentence pattern templates;

and in the process of calculating the matching degree between the input semantics and the sentence pattern template, determining a next sentence pattern module for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates.

In the above method, the step of calculating the similarity between every two sentence pattern templates includes:

calculating the similarity Sim (p) between every two sentence pattern templates according to the following formula₁，p₂)：

Wherein:

w represents a word corresponding to the sub-semantics; p is a radical of₁、p₂Respectively representing a first sentence pattern template and a second sentence pattern template in every two sentence pattern templates; s represents a fill position in the sentence pattern template; t (p, s) represents a set of words in the corpus that can be filled in filling positions s of the sentence pattern template p; num (T ()) represents the number of words in the set T (); n represents the number of words in T (p, s); theta_wRepresenting a preset weight coefficient of a word w, x representing a word in T (p, s), cos (w, x) representing the cosine similarity of the word w and x;

T(p₁,s)∩T(p₂s) denotes the intersection of the two sets, T (p)₁,s)∪T(p₂S) represents the union of the two sets;

the representation is calculated by summing the Y values corresponding to all the filling positions s in the sentence pattern template.

In the above method, the step of calculating the matching degree between the input semantics and the sentence pattern template includes:

aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity;

and calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.

In the above method, the step of generating a natural sentence according to the input semantic and the candidate sentence pattern template includes:

filling words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences, wherein the semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold value;

and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.

An embodiment of the present invention further provides a natural language generating apparatus, including:

the template obtaining module is used for generating at least one sentence pattern template matched with a predefined input mode according to the sentences in the corpus;

the template selection module is used for obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template of which the matching degree meets a preset condition;

and the sentence generating module generates a natural sentence according to the input semantics and the candidate sentence pattern template.

Wherein, above-mentioned device still includes:

the similarity calculation module is used for calculating the similarity between every two sentence pattern templates after the template acquisition module generates at least one sentence pattern template matched with a predefined input pattern;

the template selection module is also used for determining the next sentence pattern module for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates in the process of calculating the matching degree between the input semantics and the sentence pattern template.

In the above apparatus, the similarity calculation module is specifically configured to:

Wherein:

In the above apparatus, the template selecting module is specifically configured to:

In the above apparatus, the sentence generation module is specifically configured to fill words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain a natural sentence, where a semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.

The embodiment of the invention also provides an electronic device for people counting, which comprises:

a processor;

and a memory having computer program instructions stored therein,

wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:

Compared with the prior art, the natural language generation method, the natural language generation device and the electronic equipment provided by the embodiment of the invention at least have the following beneficial effects: the embodiment of the invention directly extracts the sentence pattern template from the corpus, ensures the sentence pattern accuracy of the subsequently generated natural sentences, and only needs to delete the composition components in the pre-defined sentences in the input mode to avoid excessive manual work. In addition, the embodiment of the invention selects the candidate sentence pattern template based on the matching degree between the input semantics and the sentence pattern template, improves the correctness of the generated natural sentence, filters the generated natural sentence through the matching degree, and can give consideration to the correctness and diversity of the obtained natural sentence.

Drawings

Fig. 1 is a schematic flowchart of a method for generating a natural language according to an embodiment of the present invention;

fig. 2 is a schematic flow chart of a natural language generation method according to a second embodiment of the present invention;

fig. 3 is a schematic structural diagram of a natural language generating apparatus according to a third embodiment of the present invention;

fig. 4 is a schematic structural diagram of another natural language generating apparatus according to a third embodiment of the present invention;

fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In the following description, specific details such as specific configurations and components are provided only to help fully understand the embodiments of the present invention in order to make technical problems, technical solutions and advantages to be solved more clear. Thus, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention. It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. In the embodiments provided herein, it should be understood that "B corresponding to a" means that B is associated with a from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.

First, the related concepts related to the following embodiments of the present invention will be explained.

In the embodiment of the present invention, the input mode refers to a classification of the input word, specifically, the classification may include nouns, verbs, adjectives, numerators, quantifiers, adverbs, pronouns, conjunctions, prepositions, auxiliary words, and word atmosphere words, for example, the input mode is: a noun and a verb are input. The input pattern may be a component of or a role that the input word plays in a grammatical structure, and specifically, the component may be a subject, a predicate, an object, a fixed term, a subject, a complement, or the like. That is, the input pattern defines the composition of the input word in the sentence.

Input semantics refers to the input word or word vector (another representation of a word). Since a plurality of words or word vectors may be included in an input semantic, each word or word vector in the input semantic is referred to herein as a sub-semantic, for example, the input semantic is: "Jingdong" and "shop" are taken as an input semantic meaning, and "Jingdong" and "shop" are respectively a sub-semantic meaning of the input semantic meanings.

The sentence pattern template is obtained after sentence components defined in the input pattern are removed from the sentence. For example, for the sentence "we buy clothes in the mall", if the predefined input templates are subject and predicate, the resulting sentence template after deleting the subject "we" and the predicate "buy" from the above sentence is: [ subject ] clothing in a mall [ predicate ]. Wherein, the part in [ is the filling position of the composition components defined in the input mode in the sentence, and the components of the filling position are deleted. Subsequently, when a natural sentence is generated, each sub-semantic in the input semantics conforming to the input pattern is filled in the corresponding filling position, and the sentence (natural sentence) in the natural language can be obtained. For example, the input semantics are "miss king" and "sell", and after filling into the above sentence pattern template, a sentence is obtained: miss king sells clothes in the mall.

The invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

< example one >

As shown in fig. 1, an embodiment of the present invention provides a natural language generating method, which may be applied in an environment such as a human-computer dialog system or an image description generating system. Referring to fig. 1, the method includes:

step 11, generating at least one sentence pattern template matching the predefined input pattern according to the sentences in the corpus.

The embodiment of the invention directly deletes the composition components in the sentence defined in the input mode from the sentence in the preset corpus to obtain the sentence pattern template. In the sentence pattern template, the positions of the deleted composition components are left blank as filling positions for subsequently filling corresponding words in the input semantics. Since a large number of sentences are usually stored in the corpus, there may be a plurality of sentences matching the input pattern, and a plurality of sentence templates can be extracted for the matching sentences.

Here, the input mode may be an input mode defined or determined by a user, or an input mode generated by a system, for example, the image description generation system may recognize content in an image and describe the content in a natural language, and in this case, the input mode may be an input mode generated after the system recognizes the content in the image.

And step 12, obtaining input semantics based on the input mode, calculating the matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition.

In step 12, the matching degree between the input semantics and the sentence pattern template is calculated, and then the sentence pattern template with the matching degree satisfying the predetermined condition is selected as the candidate sentence pattern template. The predetermined condition may be set according to the scene requirement or the calculation amount, for example, it may be a sentence pattern template with a matching degree exceeding a preset numerical threshold, or may be N sentence pattern templates with the highest matching degree, where N is a positive integer. Similarly, the input semantics may be user input semantics or semantics that are self-generated by a system, such as the aforementioned semantics generated by the image description generation system.

In step 12, when calculating the matching degree between the input semantics and the sentence pattern template, the calculation may be specifically performed in the following manner:

step 121, determining, for each sub-semantic in the input semantics, a first set of words that can be filled in the filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; and calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity.

In the step 121, when determining the first set of words that can be filled in the corpus at the filling position, determining and obtaining the words in the first set according to the words in the filling position of the sentence in the corpus that is matched with the sentence pattern template, and then calculating cosine similarity between the sub-semantic and each word in the first set, a preferred calculation method is to calculate cosine similarity (cosine distance) between the word vector corresponding to the sub-semantic and the word vector corresponding to each word in the first set, and then calculate the matching factor according to the cosine similarity, where the matching factor is positively correlated with the cosine similarity, that is, the larger the cosine similarity is, the larger the value of the matching factor is, that is, the more the two are matched; conversely, the smaller the cosine similarity, the smaller the value of the matching factor, i.e. the more mismatched the two. One way of calculating the matching factor is provided below, and it should be noted that the embodiments of the present invention are not limited thereto.

In the above formula (1), w represents a word corresponding to the sub-semantic, s represents a filling position in the sentence pattern template p, and AM (p, s, w) represents the words w andmatching factor, θ, of fill position s in sentence pattern template p_wThe preset weight coefficient represents a word w, T (p, s) represents a set of words in a corpus capable of being filled in filling positions s of a sentence pattern template p, n represents the number of words in T (p, s), x represents a word in T (p, s), and cos (w, x) represents cosine similarity of the words w and x.

And step 122, calculating the matching degree between the input semantics and the sentence pattern template according to the matching factor of each sub-semantics and the corresponding filling position in the sentence pattern template.

In step 122, an average value of the matching factors of each sub-semantic and the corresponding filling position in the sentence pattern template may be calculated, and the average value is used as the matching degree between the input semantic and the sentence pattern template, or a sum value of the matching factors of all sub-semantics and the corresponding filling position in the sentence pattern template may be calculated, and the sum value is used as the matching degree between the input semantic and the sentence pattern template.

And step 13, generating a natural sentence according to the input semantic and the candidate sentence pattern template.

In step 13, one way to obtain the natural sentence is: after the candidate sentence pattern template is selected, the words in the input semantics can be filled to the corresponding filling positions in the candidate sentence pattern template, and the natural sentence is obtained.

In order to obtain the diversity of natural sentences, one implementation manner of the step 13 is as follows: and determining a plurality of alternative semantics of which the semantic similarity with the input semantics is higher than a preset threshold, and then filling words in the input semantics and/or the alternative semantics to corresponding positions in the candidate sentence pattern template to obtain more styles of natural sentences, wherein the semantic similarity can be calculated according to the cosine similarity between word vectors.

In order to balance the correctness and diversity of the obtained natural language sentence, one implementation manner in the step 13 is: filling words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences; and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree. The calculation method of the matching degree here can refer to the implementation of the above steps 121 to 122, and is not described here again.

Through the steps, the sentence pattern template is directly extracted from the corpus, the sentence pattern accuracy of the subsequently generated natural sentences is guaranteed, and the sentence pattern template is extracted only by deleting the components in the sentences predefined in the input mode, so that excessive manual work is avoided. In addition, the embodiment of the invention selects the candidate sentence pattern template based on the matching degree between the input semantics and the sentence pattern template, improves the correctness of the generated natural sentence, filters the generated natural sentence through the matching degree, and can give consideration to the correctness and diversity of the obtained natural sentence.

< example two >

As shown in fig. 2, in order to improve the efficiency of subsequently selecting candidate sentence pattern templates, the method for generating natural language according to the second embodiment of the present invention further calculates the similarity between every two sentence pattern templates after obtaining the sentence pattern templates, and further improves the selection efficiency of the subsequent candidate sentence pattern templates by using the similarity between the sentence pattern templates. Referring to fig. 2, the method includes:

step 21, generating at least one sentence template matching the predefined input pattern according to the sentences in the corpus.

Here, the specific implementation of generating the sentence pattern template may refer to embodiment one, and details are not described here.

And step 22, calculating the similarity between every two sentence pattern templates in the at least one sentence pattern template.

Here, in the above step 22, the similarity Sim (p) between every two sentence pattern templates can be calculated according to the following formula₁，p₂)：

Wherein:

in the above formula, w represents the word corresponding to the sub-semantics, p₁、p₂Respectively representing a first sentence pattern template and a second sentence pattern template in every two sentence pattern templates; s represents a fill position in the sentence pattern template; t (p, s) represents a set of words in the corpus that can be filled in filling positions s of the sentence pattern template p; num (T ()) represents the number of words in the set T (); AM (p, s, w) represents the matching factor of the word w with the filling position s in the sentence pattern template p; n represents the number of words in T (p, s); theta_wRepresenting a preset weight coefficient of a word w, x representing a word in T (p, s), cos (w, x) representing the cosine similarity of the word w and x;

And step 23, obtaining input semantics based on the input pattern, calculating a matching degree between the input semantics and the sentence pattern template, and selecting at least one candidate sentence pattern template with the matching degree meeting a preset condition, wherein in the calculation process of the matching degree, the next sentence pattern template for calculating the matching degree is determined according to the similarity between the sentence pattern templates.

Here, in step 23, in the process of calculating the matching degree between the input semantics and the sentence pattern template, a sentence pattern module for calculating the matching degree next is determined according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates, so as to improve the efficiency of selecting candidate sentence pattern templates.

For example, firstly, a sentence pattern template is selected from the at least one sentence pattern template obtained in step 21 as the current sentence pattern template; then, calculating the matching degree between the input semantics and the current sentence pattern template: if the matching degree does not reach a preset first threshold, selecting one sentence pattern template from the rest sentence pattern templates in the at least one sentence pattern template as a next sentence pattern module for calculating the matching degree; and if the matching degree reaches a preset first threshold, selecting a sentence pattern template to be calculated, the similarity of which with the current sentence pattern template exceeds a preset second threshold, as a next sentence pattern module for calculating the matching degree according to the similarity between sentence pattern templates. When the number of the remaining sentence pattern templates in the at least one sentence pattern template is 0, or the number of the sentence pattern templates with the input semantic matching degree reaching the preset threshold reaches the preset number, the calculation process of the matching degree may be ended, and at least one candidate sentence pattern template satisfying the preset condition is selected according to the calculated matching degree between the input semantic and each sentence pattern template.

Of course, the above example is only one example of accelerating the selection process of the candidate sentence pattern templates, and the embodiment of the present invention may also adopt other algorithms based on the similarity between the sentence pattern templates to improve the selection efficiency.

As to how to calculate the matching degree between the input semantics and the sentence pattern template, reference may be made to the implementation process in the first embodiment, which is not described herein again.

And 24, generating a natural sentence according to the input semantic and the candidate sentence pattern template.

Here, in step 24, first, words in the input semantics and/or the replacement semantics may be filled into corresponding positions in the candidate sentence pattern template to obtain candidate natural sentences, where semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold, and sub-semantics (words) in the replacement semantics may be selected from the same corpus in step 21 or from other corpora, for example, from an internet corpus, so as to improve diversity of subsequently generated natural sentences. Then, calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and matching degrees between the filling semantics and corresponding candidate sentence pattern templates, and screening out natural sentences of which the matching degrees reach a preset threshold according to the matching degrees.

< example three >

Referring to fig. 3, an embodiment of the present invention provides a device for implementing a natural language generating method according to the foregoing embodiments, where the natural language generating device 30 includes:

a template obtaining module 31, configured to generate at least one sentence pattern template matching a predefined input pattern according to the sentences in the corpus;

a template selection module 32, configured to obtain input semantics based on the input pattern, calculate a matching degree between the input semantics and a sentence pattern template, and select at least one candidate sentence pattern template whose matching degree satisfies a predetermined condition;

and a sentence generating module 33 for generating a natural sentence according to the input semantic and the candidate sentence pattern template.

To improve the selection message of the candidate sentence pattern template, as shown in fig. 4, the natural language generating apparatus according to the embodiment of the present invention may further include: a similarity calculation module 34, configured to calculate a similarity between every two sentence pattern templates after the template obtaining module generates at least one sentence pattern template matching the predefined input pattern. At this time, the template selecting module 32 is further configured to determine a next sentence pattern module for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates in the process of calculating the matching degree between the input semantics and the sentence pattern template.

The template selecting module 32 may specifically include: a first selection submodule, configured to select a sentence pattern template from the at least one sentence pattern template as a current sentence pattern template; the calculation submodule is used for calculating the matching degree between the input semantics and the current sentence pattern template; a first processing sub-module, configured to select one sentence pattern template from the remaining sentence pattern templates in the at least one sentence pattern template as a next sentence pattern module for calculating the matching degree when the matching degree between the input semantics and the current sentence pattern template does not reach a preset first threshold; and the second processing sub-module is used for selecting the sentence pattern template to be calculated, the similarity of which with the current sentence pattern template exceeds a preset second threshold, as a next sentence pattern module for calculating the matching degree according to the similarity between the sentence pattern templates when the matching degree between the input semantics and the current sentence pattern template reaches a preset first threshold.

The similarity calculation module 34 is specifically configured to:

Wherein:

Here, the template selecting module 32, when calculating the matching degree between the input semantic meaning and the sentence pattern template, is specifically configured to: aiming at each sub-semantic in the input semantics, respectively determining a first set of words which can be filled in a filling position in the corpus according to the filling position of the sub-semantic in the sentence pattern template; calculating a matching factor of the sub-semantic and a corresponding filling position in the sentence pattern template according to the cosine similarity between the sub-semantic and each word in the first set, wherein the matching factor is positively correlated with the cosine similarity; then, according to the matching factor of each sub-semantic and the corresponding filling position in the sentence pattern template, calculating the matching degree (such as taking the average value or sum value of the matching factors) between the input semantic and the sentence pattern template.

Here, the sentence generating module 33 is specifically configured to fill words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain a natural sentence, where a semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.

< example four >

Referring to fig. 5, an embodiment of the present invention further provides an electronic device for people counting, which can implement the processes of the embodiments shown in fig. 1 or fig. 2 in the embodiments of the present invention. The electronic device may be a Personal Computer (PC), a tablet PC, various smart devices (including smart glasses or smart phones), and the like. As shown in fig. 5, the electronic device 50 may include:

a processor 51;

and a memory having computer program instructions stored therein. Specifically, the memory may include a RAM (random access memory) 52, a ROM (read only memory) 53.

Wherein the computer program instructions, when executed by the processor 51, cause the processor 51 to perform the steps of:

Referring to fig. 5, the electronic device according to the embodiment of the present invention may further include a hard disk 54, an input device 55, a display device 56, and other components. In particular, the input device 55 may be a device with input function and/or receiving function, such as a keyboard, a touch screen, various interfaces to obtain predefined input modes and input semantics. The display device 56 may be an LED display panel or display and may be used to display information such as generated natural language sentences.

The processor 51, RAM52, ROM 53, hard disk 54, input device 55, and display device 56 described above may be interconnected by a bus architecture. A bus architecture may be any architecture that may include any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 52, and one or more memories, represented by RAM52 and ROM 53, are connected together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, etc., which are well known in the art, and therefore, will not be described in any detail herein.

The input device 55 is used for inputting and storing the sample of the network request data in the hard disk 54.

The RAM52 and the ROM 53 are used to store programs and data necessary for system operation, and data such as intermediate results in the calculation process of the processor.

In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the transceiving method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for generating a natural language, comprising:

generating a natural sentence according to the input semantic and the candidate sentence pattern template,

after the step of generating at least one sentence template matching the predefined input pattern, the method further comprises: calculating the similarity between every two sentence pattern templates;

in the process of calculating the matching degree between the input semantics and the sentence pattern template, determining the next sentence pattern module for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates,

wherein, the step of calculating the similarity between every two sentence pattern templates comprises the following steps:

Wherein:

T(p₁，s)∩T(p₂s) denotes the intersection of the two sets, T (p)₁，s)∪T(p₂S) represents the union of the two sets;

2. The method of claim 1,

the step of calculating the matching degree between the input semantics and the sentence pattern template comprises the following steps:

3. The method of claim 1,

the step of generating a natural sentence according to the input semantics and the candidate sentence pattern template includes:

4. A natural language generation apparatus, comprising:

the sentence generating module generates a natural sentence according to the input semantics and the candidate sentence pattern template;

the template selection module is also used for determining the next sentence pattern module for calculating the matching degree according to the similarity between the current sentence pattern template for calculating the matching degree and other sentence pattern templates in the process of calculating the matching degree between the input semantics and the sentence pattern template,

the similarity calculation module is specifically configured to:

Wherein:

w represents a word corresponding to the sub-semantics; p is a radical of₁、p₂Respectively representing a first sentence pattern template and a second sentence pattern template in every two sentence pattern templates; s represents a fill position in the sentence pattern template; t (p, s) represents a set of words in the corpus that can be filled in filling positions s of the sentence pattern template p; num (T ()) represents the number of words in the set T (); n represents the number of words in T (p, s); theta_wA preset weight coefficient representing the word w, x represents the word in T (p, s),cos (w, x) represents the cosine similarity of the words w and x;

5. The apparatus of claim 4,

the template selection module is specifically configured to:

6. The apparatus of claim 4,

the sentence generating module is specifically configured to fill words in the input semantics and/or the replacement semantics to corresponding positions in the candidate sentence pattern template to obtain a natural sentence, where a semantic similarity between the replacement semantics and the input semantics is higher than a preset threshold; and calculating filling semantics formed by sub-semantics of each filling position in the candidate natural sentences and the matching degree between the filling semantics and the corresponding candidate sentence pattern template, and screening out the natural sentences the matching degree of which reaches a preset threshold according to the matching degree.

7. An electronic device, comprising:

a processor;

and a memory having computer program instructions stored therein,

after the step of generating at least one sentence template matching the predefined input pattern, the following steps are also performed: calculating the similarity between every two sentence pattern templates;

Wherein:

w represents a word corresponding to the sub-semantics; p is a radical of₁、p₂Respectively represent every two sentence patternsA first sentence pattern template and a second sentence pattern template in the template; s represents a fill position in the sentence pattern template; t (p, s) represents a set of words in the corpus that can be filled in filling positions s of the sentence pattern template p; num (T ()) represents the number of words in the set T (); n represents the number of words in T (p, s); theta_wRepresenting a preset weight coefficient of a word w, x representing a word in T (p, s), cos (w, x) representing the cosine similarity of the word w and x;