WO2020220702A1 - Generating natural language - Google Patents

Generating natural language

Info

Publication number
WO2020220702A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2019/127634
Other languages
English (en)
French (fr)
Inventor
付圣
任冬淳
丁曙光
钱德恒
王志超
朱炎亮
Original Assignee
北京三快在线科技有限公司
Application filed by 北京三快在线科技有限公司
Publication of WO2020220702A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/253: Grammatical analysis; Style critique
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • This application relates to the field of artificial intelligence, in particular to generating natural language.
  • Artificial intelligence devices are widely used in daily life, and natural language generation devices are one of them.
  • After obtaining a target instruction that needs to be understood by the user, the generating device generates natural language (that is, the language used by humans for communication) to describe the target instruction for the user to understand. Therefore, how natural language is generated is key to users quickly and correctly understanding target instructions.
  • a method for generating natural language includes:
  • obtaining a target vocabulary included in the content of a target instruction and a description vocabulary indicated by environment elements in an environment picture of the target instruction;
  • based on the target vocabulary and the description vocabulary, calling a natural language model to generate one or more initial natural sentences according to a reference grammar, the natural language model being a language model trained on a training data set, and the training data set including natural language used by users to describe training instructions;
  • obtaining a score of each initial natural sentence, and selecting a natural sentence that satisfies a condition as the natural language of the target instruction based on the score of each initial natural sentence, the score being used to indicate the accuracy of the initial natural sentence.
  • Obtaining the score of each initial natural sentence includes:
  • for any initial natural sentence, obtaining a first score of the initial natural sentence, where the first score is used to indicate the degree of matching between the initial natural sentence and the training data set;
  • obtaining a second score of the initial natural sentence according to a public data set, where the second score is used to indicate the degree of matching between the initial natural sentence and the environment picture, and the public data set includes multiple pictures marked with environmental elements;
  • using the product of the first score and the second score as the score of the initial natural sentence.
  • Obtaining the second score of the initial natural sentence according to the public data set includes:
  • encoding the initial natural sentence to obtain encoded natural sentence information; performing a convolution calculation on the encoded natural sentence information and the information in the environment picture according to the convolution parameters in a score model to obtain a convolution result; and calculating the convolution result according to the classification parameters in the score model to obtain the second score of the initial natural sentence, where the convolution parameters and the classification parameters are parameters obtained by training on the public data set.
  • Selecting a natural sentence that satisfies a condition as the natural language of the target instruction based on the score of each initial natural sentence includes:
  • selecting the initial natural sentence with the largest score from the initial natural sentences, and if the score of the initial natural sentence with the largest score is not lower than a reference threshold, using the initial natural sentence with the largest score as the natural language of the target instruction.
  • the method further includes:
  • if the score of the initial natural sentence with the largest score is lower than the reference threshold, reacquiring a target natural sentence whose score is not lower than the reference threshold, and using the target natural sentence as the natural language of the target instruction.
  • Reacquiring the target natural sentence whose score is not lower than the reference threshold includes:
  • calling the natural language model to generate one or more first natural sentences according to a first grammar based on the target vocabulary and the description vocabulary, the first grammar being any grammar other than the reference grammar; and if the average of the scores of the first natural sentences is greater than the average of the scores of the initial natural sentences, and the score of the first natural sentence with the largest score is greater than that of the initial natural sentence with the largest score, using the first natural sentence with the largest score as the target natural sentence.
  • the method further includes:
  • calling the natural language model, and generating one or more second natural sentences according to the reference grammar based on the initial natural sentence with the largest score and the description vocabulary, where the number of description words in a second natural sentence is greater than the number of description words in the initial natural sentence with the largest score;
  • obtaining the score of each second natural sentence, and using the second natural sentence with the largest score as the target natural sentence.
  • the method further includes:
  • obtaining a predicted value, and if the predicted value is greater than a reference value, invoking the natural language model to generate a candidate natural sentence, and replacing the natural sentence that satisfies the condition with the candidate natural sentence as the natural language of the target instruction.
  • Obtaining the predicted value includes:
  • obtaining a first predicted value used to indicate the probability that, after the environmental element is updated, the environment picture is updated from the current state to a predicted state, where the current state refers to the state before the environmental element is updated; obtaining a second predicted value used to indicate the probability of observing the current state and the environmental element update; obtaining a third predicted value used to indicate the degree of influence on the natural sentence that satisfies the condition if the environment picture is updated from the current state to the predicted state;
  • using the product of the first predicted value, the second predicted value, and the third predicted value as the predicted value.
  • a device for generating natural language includes:
  • the first obtaining module is used to obtain the target vocabulary included in the content of the target instruction and the description vocabulary indicated by the environment element in the environment picture of the target instruction;
  • the generating module is used to call the natural language model to generate one or more initial natural sentences according to the reference grammar based on the target vocabulary and the description vocabulary, where the natural language model is a language model trained on the training data set, and the training data set includes natural language used by users to describe training instructions;
  • the second obtaining module is used to obtain the score of each initial natural sentence
  • the selection module is configured to select a natural sentence that satisfies the condition as the natural language of the target instruction based on the score of each initial natural sentence, and the score is used to indicate the accuracy of the initial natural sentence.
  • the second acquiring module is configured to acquire a first score value of the initial natural sentence for any initial natural sentence, and the first score value is used to indicate the relationship between the initial natural sentence and the training The degree of matching of the data set; the second score value of the initial natural sentence is obtained according to the public data set, the second score is used to indicate the degree of matching between the initial natural sentence and the environmental picture, the public data set It includes a plurality of pictures marked with environmental elements; the product of the first score and the second score is used as the score of the initial natural sentence.
  • the second acquiring module is configured to encode the initial natural sentence to obtain encoded natural sentence information; perform a convolution calculation on the encoded natural sentence information and the information in the environment picture according to the convolution parameters in the score model to obtain a convolution result; and calculate the convolution result according to the classification parameters in the score model to obtain the second score of the initial natural sentence, where the convolution parameters and the classification parameters are parameters obtained by training on the public data set.
  • the selection module is configured to select an initial natural sentence with the largest score from the initial natural sentences, and if the score of the initial natural sentence with the largest score is not lower than a reference threshold, the score is The largest initial natural sentence is used as the natural language of the target instruction.
  • the device further includes: a third acquiring module, configured to reacquire a target natural sentence whose score is not lower than the reference threshold if the score of the initial natural sentence with the largest score is lower than the reference threshold, and use the target natural sentence as the natural language of the target instruction.
  • the third acquiring module is configured to call the natural language model, and generate one or more first natural sentences according to the first grammar based on the target vocabulary and the description vocabulary, the first grammar being any grammar other than the reference grammar; obtain the average of the scores of the first natural sentences and the average of the scores of the initial natural sentences; and if the average of the scores of the first natural sentences is greater than the average of the scores of the initial natural sentences, and the score of the first natural sentence with the largest score is greater than that of the initial natural sentence with the largest score, use the first natural sentence with the largest score as the target natural sentence.
  • the third acquiring module is further configured to: if the average of the scores of the first natural sentences is not greater than the average of the scores of the initial natural sentences, or the score of the first natural sentence with the largest score is not greater than that of the initial natural sentence with the largest score, call the natural language model and generate one or more second natural sentences according to the reference grammar based on the initial natural sentence with the largest score and the description vocabulary, where the number of description words in a second natural sentence is greater than the number of description words in the initial natural sentence with the largest score; obtain the score of each second natural sentence, and use the second natural sentence with the largest score as the target natural sentence.
  • the device further includes: a prediction module configured to obtain a predicted value, the predicted value being used to indicate the degree of influence of the environmental element update on the natural sentence that satisfies the condition; and if the predicted value is greater than a reference value, call the natural language model to generate a candidate natural sentence, and replace the natural sentence that satisfies the condition with the candidate natural sentence as the natural language of the target instruction.
  • the prediction module is configured to obtain a first predicted value used to indicate the probability that, after the environmental element is updated, the environment picture is updated from the current state to a predicted state, where the current state refers to the state before the environmental element is updated; obtain a second predicted value used to indicate the probability of observing the current state and the environmental element update; obtain a third predicted value used to indicate the degree of influence on the natural sentence that satisfies the condition if the environment picture is updated from the current state to the predicted state; and use the product of the first predicted value, the second predicted value, and the third predicted value as the predicted value.
  • a device for generating natural language includes a memory and a processor.
  • the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for generating natural language provided by the embodiments of the present application.
  • a readable storage medium is provided, and at least one instruction is stored in the readable storage medium, and the instruction is loaded and executed by a processor to implement the method for generating natural language provided by the embodiment of the present application.
  • In the embodiments of the present application, the initial natural sentences are generated by a natural language model trained on the training data set, and a natural sentence that satisfies the condition is selected from the initial natural sentences as the natural language of the target instruction. This is not only more efficient, but the generated natural language also has clear semantics and is easy to understand, giving users a better experience.
  • Figure 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
  • Figure 2 is a flowchart of a method for generating natural language provided by an embodiment of the present application;
  • Figure 3 is a schematic diagram of a process of generating natural language provided by an embodiment of the present application;
  • Figure 4 is a schematic diagram of a process of generating natural language provided by an embodiment of the present application;
  • Figure 5 is a schematic structural diagram of an apparatus for generating natural language provided by an embodiment of the present application;
  • Figure 6 is a schematic structural diagram of a device for generating natural language provided by an embodiment of the present application;
  • Figure 7 is a schematic structural diagram of an apparatus for generating natural language provided by an embodiment of the present application;
  • Figure 8 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • natural language generation devices are widely used in life. After obtaining the target instruction that needs to be understood by the user, the natural language generating device generates natural language, that is, the language used by humans for communication, to describe the target instruction for the user to understand.
  • In a first related technology, the target vocabulary is first obtained based on the content of the target instruction, the vocabulary corresponding to environmental elements in the implementation environment of the target instruction is used as description vocabulary, and the target vocabulary and all of the description vocabulary are arranged in different orders to form a reference number of sentences. After that, the probability of each sentence being correctly understood by the user is calculated, and the sentence with the highest probability of being correctly understood by the user is used as the natural language describing the target instruction.
  • For example, the target vocabulary "black car" is acquired based on the target instruction "come to the black car", and vocabulary corresponding to environmental elements in the traffic environment, such as "traffic police" and "overpass", is used as description vocabulary. Sentences such as "the black car behind the overpass next to the traffic police" and "the black car beside the traffic police behind the overpass" are then obtained by arrangement.
  • the sentence with the highest probability of being correctly understood by the user is selected as the natural language describing the target instruction.
  • the second related technology takes the sentence with the highest probability of being correctly understood by the user as the target sentence, and adds historical information to the target sentence to form one or more update sentences. After that, the probability of each update sentence being correctly understood by the user is calculated, and the update sentence with the highest probability of being correctly understood by the user is used as the natural language describing the target instruction.
  • For example, "the black car behind the overpass next to the traffic police" is taken as the target sentence.
  • Historical information is added to the target sentence, for example changing "the black car behind the overpass next to the traffic police" to updated sentences such as "the black car behind the overpass next to the traffic police that just turned on its tail lights".
  • A selection is then made through calculation to obtain the natural language describing the target instruction.
  • Related technology one requires a large amount of calculation, is inefficient, and the semantics of the natural language it generates are not accurate enough.
  • Related technology two improves the semantic accuracy of the natural language by adding historical information.
  • However, since users often do not have a deep impression of historical information, the semantics of the natural language generated by related technology two are still not accurate enough and are easily misunderstood by users. It can be seen that the user experience of the related technologies is poor.
  • the embodiment of the present application provides a method for generating natural language, which can be applied in the implementation environment as shown in FIG. 1.
  • at least one terminal 11 and a server 12 are included.
  • the terminal 11 can communicate with the server 12 to obtain a target language model from the server 12. If the terminal 11 can train the model by itself, the method provided in the embodiment of the present application may not rely on the server 12, and the terminal 11 may execute the overall method flow.
  • The terminal 11 may be any electronic product that can interact with the user through one or more methods such as a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device, for example a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a Pocket PC, a tablet, a smart car, a smart TV, or a smart speaker.
  • the server 12 may be one server, a server cluster composed of multiple servers, or a cloud computing service center.
  • Those skilled in the art should understand that the above terminal 11 and server 12 are only examples; other existing or future terminals or servers that are applicable to this application are also covered by the protection scope of this application and are hereby incorporated by reference.
  • an embodiment of the present application provides a method for generating natural language, which can be applied to the terminal shown in FIG. 1. As shown in Figure 2, the method includes:
  • Step 201 Obtain the target vocabulary included in the content of the target instruction and the description vocabulary indicated by the environment element in the environment picture of the target instruction.
  • the target instruction is an instruction that needs to be understood or executed by the user, and the target vocabulary refers to the vocabulary to be described included in the target instruction.
  • the target instruction is "come to the black car”
  • the target vocabulary is "black car”.
  • the environmental picture of the target instruction is used to indicate the implementation environment of the target instruction.
  • the environmental elements include but are not limited to the people, objects or words in the implementation environment of the target instruction.
  • The names of these persons, objects, or words are the description vocabulary indicated by the environmental elements, and the number of description words may be one or more.
  • the description vocabulary includes but is not limited to "traffic police", "crossing overpass", and "take”.
  • The method of obtaining the environment picture of the target instruction includes collecting it with a collection device such as a camera. After that, feature extraction can be performed on the environment picture of the target instruction through a CNN (Convolutional Neural Network) to obtain the environmental elements in the environment picture; the environmental elements extracted by the CNN can then be classified by a classifier to obtain the names of the environmental elements, that is, the description vocabulary indicated by the environmental elements (a sketch of this extraction step is given below).
  • the description vocabulary can be used to describe the target vocabulary, so that the described target vocabulary is easy for users to understand.
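  • As a minimal sketch of the extraction step described above, the snippet below assumes a PyTorch/torchvision environment and uses a pretrained object detector as a stand-in for the CNN feature extraction plus classifier; the function name, the score threshold, and the choice of detector are illustrative and are not taken from the filing.

```python
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
detector = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # class names such as "person", "car"

def description_vocabulary(environment_picture, score_threshold=0.6):
    """Return the names of detected environment elements as description vocabulary.

    environment_picture: float tensor of shape [3, H, W] with values in [0, 1].
    """
    with torch.no_grad():
        prediction = detector([environment_picture])[0]
    return sorted({categories[label]
                   for label, score in zip(prediction["labels"].tolist(),
                                           prediction["scores"].tolist())
                   if score > score_threshold})
```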
  • Step 202 Based on the target vocabulary and the description vocabulary, the natural language model is invoked to generate one or more initial natural sentences according to the reference grammar.
  • the natural language model is a language model trained according to a training data set, and the training data set includes a natural language for a user to describe training instructions.
  • each training instruction corresponds to an environment picture for indicating the implementation environment of the training instruction.
  • the natural language used by the user to describe the training instruction refers to the natural language used by the user to describe the training instruction by using grammar and vocabulary according to the user's habits after observing the environmental picture of the training instruction. Therefore, the natural language model trained according to the training data set has the ability to generate natural sentences based on grammar, description vocabulary and target vocabulary, and the description vocabulary used by the natural language model includes one or more of all description vocabulary.
  • In implementation, the natural language with which users describe each training instruction is used as the training data set, so that the training data set contains more natural language, which in turn makes the natural language model trained on the training data set better at generating natural sentences.
  • the user may be a user who needs to understand or execute the target instruction, or may be multiple other users selected through sample extraction, which is not limited in this embodiment.
  • In this way, the natural language model is called based on the target vocabulary and the description vocabulary, and one or more initial natural sentences can be generated according to the reference grammar. Different initial natural sentences differ in at least one of the description words used or the number of description words used.
  • initial natural sentences such as "the black car has a traffic police standing next to it”, "there is a Chinese character “take” behind the black car”, and “there is an overpass in front of the black car.”
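  • As a rough illustration of this step only, the toy generator below arranges the target vocabulary and subsets of the description vocabulary according to a single hard-coded reference-grammar template, so that different candidates use different description words and different numbers of them. In the embodiments the sentences are produced by the trained natural language model rather than by templates; all names here are illustrative.

```python
from itertools import combinations

def generate_initial_sentences(target_word, description_words, max_descriptions=2):
    """Toy stand-in for the trained natural language model."""
    sentences = []
    for r in range(1, max_descriptions + 1):
        for subset in combinations(description_words, r):
            sentences.append(
                f"the {target_word} has {' and '.join(subset)} next to it")
    return sentences

print(generate_initial_sentences("black car", ["a traffic police officer", "an overpass"]))
```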
  • Step 203 Obtain the score of each initial natural sentence, and select the natural sentence that satisfies the condition as the natural language of the target instruction based on the score of each initial natural sentence.
  • the score is used to indicate the accuracy of the initial natural sentence.
  • Obtaining the score of each initial natural sentence includes:
  • Step 2031 For any initial natural sentence, obtain a first score of the initial natural sentence, where the first score is used to indicate the degree of matching between the initial natural sentence and the training data set.
  • the matching degree between the initial natural sentence and the training data set refers to the matching degree between the initial natural sentence and the natural language of the user describing the training instruction included in the training data set.
  • In implementation, obtaining the first score of the initial natural sentence includes:
  • expressing the first score as C_d^k, where C_d denotes the first score and k is a positive integer used to distinguish different initial natural sentences, the maximum value of k being no greater than the number of initial natural sentences. For example, when there are two initial natural sentences, the first score of one initial natural sentence is C_d^1 and the first score of the other is C_d^2.
  • Step 2032 Obtain a second score of the initial natural sentence according to the public data set.
  • the second score is used to indicate the degree of matching between the initial natural sentence and the environmental picture.
  • the public data set includes multiple pictures with environmental elements marked.
  • Since the natural language model is trained on the training data set, and the amount of natural language included in the training data set is limited, the initial natural sentences generated by the natural language model may be overfitted.
  • over-fitting is defined as: the initial natural sentence generated by the natural language model has a high degree of matching with the natural language included in the training data set, while the degree of matching with the environmental picture is low.
  • Therefore, this embodiment obtains, according to the public data set, a second score indicating the degree of matching between the initial natural sentence and the environment picture, so as to avoid selecting an initial natural sentence with a low degree of matching with the environment picture as the natural language of the target instruction in the subsequent selection process.
  • the public data set includes multiple pictures marked with environmental elements. Marking environmental elements refers to marking environmental elements as words.
  • the public data sets include but are not limited to Oxford-102, KITTI and CityScope data sets.
  • Since the degree of matching between the initial natural sentence and the environment picture depends on the description vocabulary used in the initial natural sentence, it can be expressed indirectly by the degree of matching between the description vocabulary used in the initial natural sentence and the vocabulary marked in the public data set.
  • For example, for the environment picture of the target instruction, if the description vocabulary used in the initial natural sentence is "white sky" while the vocabulary marked in the public data set is "blue sky", the degree of matching between the description vocabulary used in the initial natural sentence and the vocabulary marked in the public data set is low, and the degree of matching between the initial natural sentence and the environment picture is also low.
  • Obtaining the second score of the initial natural sentence according to the public data set includes: encoding the initial natural sentence to obtain encoded natural sentence information; performing a convolution calculation on the encoded natural sentence information and the information in the environment picture according to the convolution parameters in a score model to obtain a convolution result; and calculating the convolution result according to the classification parameters in the score model to obtain the second score of the initial natural sentence.
  • In implementation, this embodiment uses an LSTM (Long Short-Term Memory) encoder to encode the initial natural sentence.
  • The score model is then called to convolve the encoded natural sentence information with the information x in the environment picture to obtain the convolution result f, where Wx, bx, Wc, and bc are the convolution parameters in the score model and tanh is the hyperbolic tangent function. The convolution result is then classified using the classification parameters Wm and bm and the softmax classification function to obtain the second score.
  • By training on the public data set, the aforementioned convolution parameters (Wx, bx, Wc, bc) and classification parameters (Wm, bm) can be obtained.
  • In implementation, a test environment picture and a test natural sentence can also be input into the score model to obtain a test second score. The convolution parameters and classification parameters are then adjusted by analyzing the test second score, that is, the values of one or more of the convolution parameters and classification parameters are changed so that the second score output by the score model indicates the degree of matching between initial natural sentences and environment pictures more accurately.
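  • The sketch below, assuming PyTorch, shows one plausible form of the score model described above: an LSTM encoder for the initial natural sentence, convolution parameters (playing the role of Wx, bx, Wc, bc) combining sentence and picture information with tanh, and classification parameters (playing the role of Wm, bm) with softmax producing the second score. The exact formulas, layer shapes, and names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ScoreModel(nn.Module):
    """Second-score model: sentence encoder + convolution + classification."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, img_channels=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # LSTM encoder
        self.conv_x = nn.Conv2d(img_channels, hidden_dim, 3, padding=1)  # role of (Wx, bx)
        self.conv_c = nn.Conv2d(hidden_dim, hidden_dim, 1)               # role of (Wc, bc)
        self.classifier = nn.Linear(hidden_dim, 2)                       # role of (Wm, bm)

    def forward(self, sentence_tokens, picture_features):
        # Encode the initial natural sentence; use the final hidden state as its code.
        _, (h, _) = self.encoder(self.embed(sentence_tokens))
        code = h[-1].unsqueeze(-1).unsqueeze(-1)
        code = code.expand(-1, -1, picture_features.size(2), picture_features.size(3))
        # Convolve picture and sentence information and combine with tanh.
        f = torch.tanh(self.conv_x(picture_features) + self.conv_c(code))
        # Classify the pooled convolution result; the "match" probability is the second score.
        logits = self.classifier(f.mean(dim=(2, 3)))
        return torch.softmax(logits, dim=-1)[:, 1]
```

  • Here picture_features would come from a CNN applied to the environment picture, and the parameters would be trained on sentence-picture pairs built from the public data set, as described above.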
  • Step 2033 Use the product of the first score and the second score as the score of the initial natural sentence.
  • the first score indicates the matching degree between the initial natural sentence and the training data set
  • the second score indicates the matching degree between the initial natural sentence and the environmental picture.
  • The product of the first score and the second score is used as the score of the initial natural sentence, which simultaneously reflects the degree of matching of the initial natural sentence with the training data set and with the environment picture, thereby indicating the accuracy of the initial natural sentence.
  • It should be noted that, in addition to taking the product, this embodiment may also calculate the first score and the second score in other ways to obtain the score of the initial natural sentence.
  • the weighted sum of the first score and the second score may be used as the score of the initial natural sentence.
  • the weights corresponding to the first score and the second score may be the same or different to meet different needs.
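  • Writing C_d^k for the first score and C_i^k for the second score of the k-th initial natural sentence (the symbol C_i is assumed here for illustration), the two combination rules just described can be written as:

```latex
C_k = C_d^k \cdot C_i^k
\qquad \text{or} \qquad
C_k = w_1 \, C_d^k + w_2 \, C_i^k ,
```

  • where w_1 and w_2 are the weights of the first score and the second score, which may be equal or different according to need.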
  • After the score of each initial natural sentence is obtained, the natural sentence that satisfies the condition can be selected as the natural language of the target instruction based on the score of each initial natural sentence; the larger the score of an initial natural sentence, the higher its accuracy.
  • In implementation, the initial natural sentence with the largest score is selected from the initial natural sentences, and if the score of the initial natural sentence with the largest score is not lower than the reference threshold, the initial natural sentence with the largest score is used as the natural language of the target instruction.
  • this embodiment selects the most accurate initial natural sentence from one or more initial natural sentences generated by the natural language model. If the accuracy of the initial natural sentence is not lower than the accuracy indicated by the reference threshold, then It can be shown that the accuracy of the initial natural sentence has reached a standard that is easy to be understood by the user, so the initial natural sentence can be used as the natural language of the target instruction.
  • the reference threshold can be selected based on experience, which is not limited in this embodiment.
  • Since the score of an initial natural sentence is the product (or weighted sum, etc.) of the first score and the second score, a low value of either the first score or the second score causes the score of the initial natural sentence to fall below the reference threshold, so that the initial natural sentence cannot be selected as the natural language of the target instruction.
  • In this way, initial natural sentences with a low degree of matching with the training data set or with the environment picture are eliminated, ensuring that the initial natural sentence used as the natural language of the target instruction matches both the training data set and the environment picture well, which makes it easier for users to understand or execute.
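  • A minimal sketch of this selection step follows, assuming the two scores are available as callables; the function name, the return layout, and the threshold value 0.5 are illustrative and do not come from the filing.

```python
def select_natural_language(initial_sentences, first_score, second_score,
                            reference_threshold=0.5):
    """Combine the two scores per sentence and apply the reference-threshold rule."""
    scores = {s: first_score(s) * second_score(s) for s in initial_sentences}
    best = max(scores, key=scores.get)
    if scores[best] >= reference_threshold:
        return best, scores   # accurate enough: use it as the natural language
    return None, scores       # below threshold: reacquire a target natural sentence
```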
  • The method provided in this embodiment further includes: if the score of the initial natural sentence with the largest score is lower than the reference threshold, reacquiring a target natural sentence whose score is not lower than the reference threshold, and using the target natural sentence as the natural language of the target instruction.
  • If the score of the initial natural sentence with the largest score is lower than the reference threshold, the scores of all the initial natural sentences are lower than the reference threshold, that is, none of the initial natural sentences is accurate enough to be easily understood by users.
  • In implementation, reacquiring the target natural sentence whose score is not lower than the reference threshold includes: calling the natural language model to generate one or more first natural sentences according to a first grammar based on the target vocabulary and the description vocabulary, the first grammar being any grammar other than the reference grammar; obtaining the average of the scores of the first natural sentences and the average of the scores of the initial natural sentences; and if the average of the scores of the first natural sentences is greater than the average of the scores of the initial natural sentences, and the score of the first natural sentence with the largest score is greater than that of the initial natural sentence with the largest score, using the first natural sentence with the largest score as the target natural sentence.
  • the natural language included in the training data set uses multiple grammars, so the natural language model trained on the training data set can also use multiple grammars, so that different natural sentences can be generated based on the same target vocabulary and description vocabulary.
  • the grammar used by the natural language model is the reference grammar, and the reference grammar is any one of the above-mentioned multiple grammars.
  • the reference grammar may not be the most suitable natural language grammar for generating the target instruction among multiple grammars.
  • For example, the reference grammar used to generate the initial natural sentences is "subject + adverbial". Using the same target vocabulary "black car" and description vocabulary "traffic police", a first grammar other than the reference grammar, such as "attributive + subject", can be used to generate the first natural sentence "there is a black car standing by the traffic police".
  • the average value of the scores of the first natural sentence and the average value of the initial natural sentences are compared to determine the natural language grammar that is more suitable for generating the target instruction in the first grammar and the reference grammar.
  • the method of obtaining the score of the first natural sentence is the same as the method of obtaining the score of the initial natural sentence, please refer to the above description, and will not be repeated here.
  • the sum of the scores of the first natural sentence divided by the number of the first natural sentence is the average of the scores of the first natural sentence.
  • the sum of the scores of the initial natural sentence is divided by the number of the initial natural sentence That is, the average value of the initial natural sentences.
  • In Figure 3, this process of obtaining the average score Q is expressed as the sum of the scores divided by the number of sentences, that is, Q = (C_1 + C_2 + ... + C_n) / n, where C_k is the score of the k-th natural sentence and n is the number of natural sentences.
  • If the average of the scores of the first natural sentences is greater, the first grammar is a grammar more suitable for generating the natural language of the target instruction. In addition, the score of the first natural sentence with the largest score is also required to be greater than that of the initial natural sentence with the largest score, which avoids the case where, despite the higher average, the best first natural sentence still scores lower than the best initial natural sentence. Only when both conditions are met is the first natural sentence with the largest score taken as the target natural sentence and used as the natural language of the target instruction, which improves the accuracy of the natural language of the target instruction.
  • Otherwise, the reference grammar is already the most suitable, among the multiple grammars available to the natural language model, for generating the natural language of the target instruction; that is, the accuracy of the natural language of the target instruction cannot be improved by changing the grammar. Since the score of the initial natural sentence with the largest score is lower than the reference threshold, its score can instead be increased by increasing the number of description words in the sentence.
  • Therefore, the natural language model is called and, based on the initial natural sentence with the largest score and the description vocabulary, one or more second natural sentences are generated according to the reference grammar, where the number of description words in a second natural sentence is greater than the number of description words in the initial natural sentence with the largest score. The score of each second natural sentence is then obtained, and the second natural sentence with the largest score is used as the target natural sentence.
  • The description vocabulary used by the initial natural sentence with the largest score is one or more of all the description words. Generating a second natural sentence based on the initial natural sentence with the largest score and the description vocabulary means adding, to the initial natural sentence with the largest score, one or more description words other than those it already uses, to form the second natural sentence. The initial natural sentence with the largest score and the second natural sentence therefore both use the reference grammar, but the second natural sentence uses more description words, which allows its score to increase.
  • The score of each second natural sentence is obtained in the same way as the score of an initial natural sentence, and the second natural sentence with the largest score is used as the target natural sentence, thereby ensuring the accuracy of the natural language of the target instruction.
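  • The sketch below summarizes this re-acquisition strategy under stated assumptions: generate() and score() are placeholders for the natural language model and the scoring step, all parameter names are illustrative, and the two-condition comparison and the vocabulary-expansion fallback follow the description above.

```python
def reacquire_target_sentence(generate, score, target_word, description_words,
                              reference_grammar, first_grammar,
                              best_initial, initial_scores):
    """First try a different grammar; otherwise add description words."""
    first_sentences = generate(target_word, description_words, grammar=first_grammar)
    first_scores = [score(s) for s in first_sentences]
    better_on_average = (sum(first_scores) / len(first_scores)
                         > sum(initial_scores) / len(initial_scores))
    better_at_top = max(first_scores) > score(best_initial)
    if better_on_average and better_at_top:
        return first_sentences[first_scores.index(max(first_scores))]
    # Keep the reference grammar but extend the best initial sentence with
    # description words it does not yet use, then take the best-scoring result.
    second_sentences = generate(target_word, description_words,
                                grammar=reference_grammar, base_sentence=best_initial)
    return max(second_sentences, key=score)
```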
  • The method provided in this embodiment further includes: obtaining a predicted value, and if the predicted value is greater than a reference value, calling the natural language model to generate a candidate natural sentence and replacing the natural sentence that satisfies the condition with the candidate natural sentence as the natural language describing the target instruction.
  • the predicted value is used to indicate the degree of influence of the environmental element update on the natural sentence that satisfies the condition, and the environmental element update includes the position update of the environmental element in the environmental picture. For example, if the environmental element is "traffic police", the movement of the traffic police can be regarded as an update of the environmental element.
  • the method of obtaining the predicted value includes: obtaining the first predicted value, the second predicted value, and the third predicted value, and the product of the first predicted value, the second predicted value, and the third predicted value is used as the predicted value.
  • the first predicted value is used to indicate the probability that the environmental picture will be updated from the current state to the predicted state after the environmental element is updated.
  • the current state refers to the state before the environmental element is updated, and the predicted state is the state at a future time.
  • The first predicted value applies the idea of an MDP (Markov Decision Process): it is assumed that the future state (corresponding to the predicted state in this embodiment) is related only to the current state and to the action taken in the current state (corresponding to the environmental element update in this embodiment), and not to other factors.
  • The first predicted value can be expressed as P(s'_k | s_k, a_k), where s_k denotes the current state, s'_k the predicted state, and a_k the environmental element update.
  • Taking the movement of the traffic police as the environmental element update as an example, the current state is the traffic police standing still, and the first predicted value indicates the probability that, after the traffic police moves, the environment picture is updated from the traffic police standing still to a predicted state, such as the traffic police leaving the implementation environment.
  • the second predicted value is used to indicate the probability of observing the current state and environmental element update.
  • The second predicted value can be expressed as O(s_k, a_k), where s_k still denotes the current state and a_k still denotes the environmental element update. Still taking the movement of the traffic police as an example, the second predicted value indicates the probability of observing that the traffic police is standing still and that the traffic police is moving. The second predicted value is thus the basis of the first predicted value: the current state and the environmental element update must be observed first, and only then can the probability that the environment picture is updated from the current state to the predicted state after the environmental element update be obtained.
  • the third predicted value is used to indicate the degree of influence on the target natural sentence if the environment picture is updated from the current state to the predicted state.
  • The third predicted value can be expressed as d(s'_k, s_k).
  • The third predicted value is a non-negative number. The closer the third predicted value is to 0, the smaller the influence of updating the environment picture from the current state to the predicted state on the natural sentence that satisfies the condition; correspondingly, the larger the third predicted value, the greater that influence.
  • the value range of the third predicted value may be [0, 1].
  • the value of the third predicted value is 0 or 1. That is, when the value of the third predicted value is 0, it indicates that updating the current state to the predicted state has no effect on the natural sentence that meets the condition, and the target natural sentence that meets the condition is not updated. When the value of the third predicted value is 1, it indicates that the current state is updated to the predicted state so that the natural sentence that meets the condition needs to be updated.
  • the rules for setting the third predicted value to 1 include but are not limited to the following three types:
  • the first case the environment element is updated, so that the description vocabulary used by the natural sentence that meets the condition is inconsistent with the description vocabulary indicated by the updated environment element.
  • the natural sentence that satisfies the conditions is "the black car has a traffic police standing next to it", and the descriptive vocabulary used in this natural sentence is "traffic police.”
  • the description vocabulary indicated by the updated environmental element does not include "traffic police", so it is necessary to update the natural sentences that meet the conditions.
  • the second situation the target vocabulary changes.
  • Still taking the natural sentence that satisfies the condition, "the black car has a traffic police standing next to it", as an example, the target word in the natural sentence is "black car". If the black car leaves the implementation environment, the black car can no longer serve as the target vocabulary of the natural sentence, so the natural sentence that satisfies the condition needs to be updated.
  • the third case the distance between the user and the implementation environment is not greater than the reference distance.
  • the natural sentence generated from the user's perspective is easier for users to understand than the natural sentence that meets the above conditions.
  • the natural sentence from the user's perspective is, for example, "the black car on your left ". Therefore, it is also necessary to update the natural sentences that meet the conditions.
  • the method for obtaining the first predicted value, the second predicted value, and the third predicted value can all be obtained through an empirical data set, which is not limited in this embodiment.
  • Based on the above, the predicted value x_k can be expressed as x_k = P(s'_k | s_k, a_k) · O(s_k, a_k) · d(s'_k, s_k).
  • this embodiment does not limit the calculation method of the predicted value obtained by calculating the first predicted value, the second predicted value, and the third predicted value, and the calculation method may be selected according to needs or experience. For example, in addition to using the product of the first predicted value, the second predicted value, and the third predicted value as the predicted value, the first predicted value, the second predicted value, and the third predicted value can also be weighted and summed to obtain the predicted value. Numerical value. In the calculation process, the weights corresponding to the first predicted value, the second predicted value, and the third predicted value may be the same or different. The weights may be determined based on experience, and the weights are not limited in this embodiment.
  • If the predicted value is greater than the reference value, the possibility that the environmental element update makes the natural sentence satisfying the condition no longer suitable as the natural language of the target instruction is greater than the possibility indicated by the reference value; in other words, the environmental element update has a large influence on the natural sentence that satisfies the condition. Therefore, the predicted value exceeding the reference value is taken as the occasion for updating the natural sentence that satisfies the condition: the natural language model is called to generate a candidate natural sentence, and the candidate natural sentence replaces the natural sentence that satisfies the condition as the natural language describing the target instruction.
  • In this way, the natural language describing the target instruction remains suitable for the environment picture after the environmental element is updated, thereby ensuring the accuracy of the natural language describing the target instruction when the environmental element is updated.
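  • A minimal sketch of this update trigger follows, with the three quantities passed in as plain numbers and an illustrative reference value of 0.5; all names and example values are assumptions.

```python
def should_regenerate(transition_prob, observation_prob, influence,
                      reference_value=0.5):
    """x_k = P(s'_k | s_k, a_k) * O(s_k, a_k) * d(s'_k, s_k), compared with the reference value."""
    predicted_value = transition_prob * observation_prob * influence
    return predicted_value > reference_value

# Example: the traffic police starts moving and will probably leave the scene,
# and the current sentence mentions the traffic police, so d(s'_k, s_k) = 1.
if should_regenerate(transition_prob=0.8, observation_prob=0.9, influence=1.0):
    print("call the natural language model to generate a candidate natural sentence")
```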
  • In summary, the embodiments of the present application generate initial natural sentences with a natural language model trained on the training data set and select, from the initial natural sentences, a natural sentence that satisfies the condition as the natural language of the target instruction. This is not only more efficient, but the generated natural language also has clear semantics, is easy to understand, and provides a good user experience.
  • an embodiment of the present application provides a device for generating natural language.
  • the device includes:
  • the first obtaining module 501 is configured to obtain the target vocabulary included in the content of the target instruction and the description vocabulary indicated by the environment element in the environment picture of the target instruction;
  • the generating module 502 is used to call the natural language model to generate one or more initial natural sentences according to the reference grammar based on the target vocabulary and the description vocabulary, where the natural language model is a language model trained on the training data set, and the training data set includes natural language used by users to describe training instructions;
  • the second obtaining module 503 is used to obtain the score of each initial natural sentence
  • the selection module 504 is configured to select a natural sentence that satisfies the condition as the natural language of the target instruction based on the score of each initial natural sentence, and the score is used to indicate the accuracy of the initial natural sentence.
  • The second obtaining module 503 is configured to obtain, for any initial natural sentence, a first score of the initial natural sentence, the first score being used to indicate the degree of matching between the initial natural sentence and the training data set; obtain a second score of the initial natural sentence according to the public data set, the second score being used to indicate the degree of matching between the initial natural sentence and the environment picture, and the public data set including multiple pictures marked with environmental elements; and use the product of the first score and the second score as the score of the initial natural sentence.
  • the second acquisition module 503 is used to encode the initial natural sentence to obtain the encoded natural sentence information; according to the convolution parameter in the score model, the encoded natural sentence information and the information in the environment picture Convolution calculation to obtain the convolution result; the convolution result is calculated according to the classification parameters in the score model to obtain the second score of the initial natural sentence.
  • the convolution parameters and the classification parameters are parameters obtained by training according to the public data set.
  • the selection module 504 is configured to select the initial natural sentence with the largest score from the initial natural sentences, and if the score of the initial natural sentence with the largest score is not lower than the reference threshold, then the initial natural sentence with the largest score is selected The natural language as the target instruction.
  • the device further includes: a third acquiring module 505, configured to re-acquire the target natural sentence whose score is not lower than the reference threshold if the score of the initial natural sentence with the largest score is lower than the reference threshold , The natural language of the target natural sentence as the target instruction.
  • The third acquiring module 505 is used to call the natural language model and generate one or more first natural sentences according to the first grammar based on the target vocabulary and the description vocabulary, the first grammar being any grammar other than the reference grammar; obtain the average of the scores of the first natural sentences and the average of the scores of the initial natural sentences; and if the average of the scores of the first natural sentences is greater than the average of the scores of the initial natural sentences, and the score of the first natural sentence with the largest score is greater than that of the initial natural sentence with the largest score, use the first natural sentence with the largest score as the target natural sentence.
  • The third acquiring module 505 is further configured to: if the average of the scores of the first natural sentences is not greater than the average of the scores of the initial natural sentences, or the score of the first natural sentence with the largest score is not greater than that of the initial natural sentence with the largest score, call the natural language model and generate one or more second natural sentences according to the reference grammar based on the initial natural sentence with the largest score and the description vocabulary, where the number of description words in a second natural sentence is greater than the number of description words in the initial natural sentence with the largest score; obtain the score of each second natural sentence, and use the second natural sentence with the largest score as the target natural sentence.
  • The device further includes: a prediction module 506 configured to obtain a predicted value, the predicted value being used to indicate the degree of influence of the environmental element update on the natural sentence that satisfies the condition; and if the predicted value is greater than the reference value, call the natural language model to generate a candidate natural sentence and replace the natural sentence that satisfies the condition with the candidate natural sentence as the natural language describing the target instruction.
  • The prediction module 506 is configured to obtain a first predicted value used to indicate the probability that, after the environmental element is updated, the environment picture is updated from the current state to the predicted state, where the current state refers to the state before the environmental element is updated; obtain a second predicted value used to indicate the probability of observing the current state and the environmental element update; obtain a third predicted value used to indicate the degree of influence on the natural sentence that satisfies the condition if the environment picture is updated from the current state to the predicted state; and use the product of the first predicted value, the second predicted value, and the third predicted value as the predicted value.
  • In this way, the embodiments of the present application generate initial natural sentences with a natural language model trained on the training data set and select, from the initial natural sentences, a natural sentence that satisfies the condition as the natural language of the target instruction. This is not only more efficient, but the generated natural language also has clear semantics, is easy to understand, and provides a good user experience.
  • The terminal 800 may be a portable mobile terminal, such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • the terminal 800 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 800 includes a processor 801 and a memory 802.
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • The processor 801 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • The processor 801 may also include a main processor and a coprocessor. The main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 801 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used to render and draw content that needs to be displayed on the display screen 805.
  • the processor 801 may further include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • AI Artificial Intelligence
  • the memory 802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, and the at least one instruction is executed by the processor 801 to implement the method for generating natural language provided by the embodiments of the present application.
  • the terminal 800 may optionally further include: a peripheral device interface 803 and at least one peripheral device.
  • the processor 801, the memory 802, and the peripheral device interface 803 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 803 through a bus, a signal line or a circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 804, a touch display screen 805, a camera assembly 806, an audio circuit 807, a positioning component 808, and a power supply 809.
  • the peripheral device interface 803 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 801 and the memory 802.
  • in some embodiments, the processor 801, the memory 802, and the peripheral device interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral device interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 804 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 804 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 804 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 805 is used to display UI (User Interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • when the display screen 805 is a touch display screen, the display screen 805 also has the ability to collect touch signals on or above the surface of the display screen 805.
  • the touch signal can be input to the processor 801 as a control signal for processing.
  • the display screen 805 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 805, which is provided on the front panel of the terminal 800; in other embodiments, there may be at least two display screens 805, which are respectively arranged on different surfaces of the terminal 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen, which is arranged on a curved surface or a folding surface of the terminal 800. Furthermore, the display screen 805 may also be set as a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
  • the camera assembly 806 is used to capture images or videos.
  • the camera assembly 806 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 806 may also include a flash.
  • the flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • the audio circuit 807 may include a microphone and a speaker.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 801 for processing, or to the radio frequency circuit 804 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively arranged in different parts of the terminal 800.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 801 or the radio frequency circuit 804 into sound waves.
  • the speaker can be a traditional membrane speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 807 may also include a headphone jack.
  • the positioning component 808 is used to locate the current geographic position of the terminal 800 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 809 is used to supply power to various components in the terminal 800.
  • the power source 809 may be alternating current, direct current, disposable batteries, or rechargeable batteries.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 800 further includes one or more sensors 810.
  • the one or more sensors 810 include, but are not limited to, an acceleration sensor 811, a gyroscope sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
  • the acceleration sensor 811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 800.
  • the acceleration sensor 811 can be used to detect the components of the gravitational acceleration on three coordinate axes.
  • the processor 801 may control the touch screen 805 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 811.
  • the acceleration sensor 811 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 812 can detect the body direction and rotation angle of the terminal 800, and the gyroscope sensor 812 can cooperate with the acceleration sensor 811 to collect the user's 3D actions on the terminal 800.
  • the processor 801 can implement the following functions according to the data collected by the gyroscope sensor 812: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 813 may be disposed on the side frame of the terminal 800 and/or the lower layer of the touch screen 805.
  • when the pressure sensor 813 is disposed on the side frame of the terminal 800, it can detect the user's holding signal on the terminal 800, and the processor 801 performs left/right hand recognition or quick operations according to the holding signal collected by the pressure sensor 813.
  • when the pressure sensor 813 is disposed on the lower layer of the touch display screen 805, the processor 801 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 805.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 814 is used to collect the user's fingerprint.
  • the processor 801 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user's identity according to the collected fingerprint.
  • when the user's identity is identified as a trusted identity, the processor 801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings.
  • the fingerprint sensor 814 may be provided on the front, back or side of the terminal 800. When a physical button or a manufacturer logo is provided on the terminal 800, the fingerprint sensor 814 may be integrated with the physical button or the manufacturer logo.
  • the optical sensor 815 is used to collect the ambient light intensity.
  • the processor 801 may control the display brightness of the touch screen 805 according to the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the touch screen 805 is increased; when the ambient light intensity is low, the display brightness of the touch screen 805 is decreased.
  • the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815.
  • the proximity sensor 816, also called a distance sensor, is usually arranged on the front panel of the terminal 800.
  • the proximity sensor 816 is used to collect the distance between the user and the front of the terminal 800.
  • when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually decreases, the processor 801 controls the touch display screen 805 to switch from the bright-screen state to the off-screen state; when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually increases, the processor 801 controls the touch display screen 805 to switch from the off-screen state to the bright-screen state.
  • those skilled in the art can understand that the structure shown in FIG. 8 does not constitute a limitation on the terminal 800, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
  • an embodiment of the present application provides a device for generating natural language.
  • the device includes a memory and a processor.
  • the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method for generating natural language provided by the embodiments of the present application.
  • an embodiment of the present application provides a readable storage medium, where the readable storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement any of the methods for generating natural language provided by the embodiments of the present application.

Abstract

This application discloses generating natural language, and belongs to the field of artificial intelligence. The method includes: obtaining a target vocabulary included in the content of a target instruction, and a description vocabulary indicated by environmental elements in an environment picture of the target instruction; based on the target vocabulary and the description vocabulary, calling a natural language model to generate one or more initial natural sentences according to a reference grammar, where the natural language model is a language model trained on a training data set; and obtaining the score of each initial natural sentence, and selecting, based on the score of each initial natural sentence, a natural sentence that meets a condition as the natural language of the target instruction.

Description

生成自然语言
本公开要求于2019年04月29日提交的申请号为201910357502.2、申请名称为“生成自然语言的方法、装置、设备及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本申请涉及人工智能领域,特别涉及生成自然语言。
背景技术
随着人工智能技术的发展,人工智能装置被广泛应用于生活中,自然语言的生成装置便是其中一种。该生成装置在获取需要用户理解的目标指令后,生成自然语言(即人类沟通所使用的语言)来描述该目标指令,以便用户进行理解。因此,如何生成自然语言,成为用户快速、正确地理解目标指令的关键。
发明内容
本申请实施例提供了生成自然语言,所述技术方案如下:
一方面,提供了一种生成自然语言的方法,所述方法包括:
获取目标指令的内容所包括的目标词汇,以及所述目标指令的环境图片中的环境元素所指示的描述词汇;
基于所述目标词汇及所述描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句,所述自然语言模型是根据训练数据集训练过的语言模型,所述训练数据集包括用户描述训练指令的自然语言;
获取每条初始自然语句的分值,基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,所述分值用于指示所述初始自然语句的准确程度。
可选地,所述获取每条初始自然语句的分值,包括:
对于任一初始自然语句,获取所述初始自然语句的第一分值,所述第一分值用于指示所述初始自然语句与所述训练数据集的匹配程度;
根据公开数据集获取所述初始自然语句的第二分值,所述第二分值用于指示所述初始自然语句与所述环境图片的匹配程度,所述公开数据集包括标注了环境元素的多张图片;
将所述第一分值与所述第二分值的乘积作为所述初始自然语句的分值。
可选地,所述根据公开数据集获取所述初始自然语句的第二分值,包括:
对所述初始自然语句进行编码,得到编码后的自然语句信息;
根据分值模型中的卷积参数对所述编码后的自然语句信息和所述环境图片中的信息进行卷积计算,得到卷积结果;
根据所述分值模型中的分类参数对所述卷积结果进行计算,得到所述初始自然语句的第二分值,所述卷积参数和所述分类参数是根据所述公开数据集训练得到的参数。
可选地,所述基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,包括:
从初始自然语句中选择分值最大的初始自然语句,若所述分值最大的初始自然语句的分值不低于参考阈值,则将所述分值最大的初始自然语句作为所述目标指令的自然语言。
可选地,所述方法还包括:
若所述分值最大的初始自然语句的分值低于所述参考阀值,重新获取分值不低于所述参考阈值的目标自然语句,将所述目标自然语句作为所述目标指令的自然语言。
可选地,所述重新获取分值不低于所述参考阈值的目标自然语句,包括:
调用所述自然语言模型,基于所述目标词汇和所述描述词汇,按照第一语法生成一条或多条第一自然语句,所述第一语法为除所述参考语法外的任一语法;
获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值;
若所述第一自然语句的分值的平均值大于所述初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于所述分值最大的初始自然语句,将所述分值最大的第一自然语句作为所述目标自然语句。
可选地,所述获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值之后,还包括:
若所述第一自然语句的分值的平均值不大于所述初始自然语句的分值的平均值,或者,所述分值最大的第一自然语句的分值不大于所述分值最大的初始自然语句,调用所述自然语言模型,基于所述分值最大的初始自然语句和所述描述词汇,按照所述参考语法生成一条或多条第二自然语句,所述第二自然语句中的描述词汇的数量大于所述分值最大的初始自然语句中的描述词汇的数量;
获取所述第二自然语句的分值,将分值最大的第二自然语句作为所述目标自然语句。
可选地,所述基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言之后,所述方法还包括:
获取预测数值,所述预测数值用于指示所述环境元素更新对所述满足条件的自然语句的影响程度;
若所述预测数值大于参考数值,调用所述自然语言模型生成备选自然语句,将所述备选自然语句代替所述满足条件的自然语句作为所述目标指令的自然语言。
可选地,所述获取预测数值,包括:
获取第一预测数值,所述第一预测数值用于指示所述环境元素更新后,所述环境图片由当前状态更新为预测状态的概率,其中,所述当前状态是指所述环境元素更新之前的状态;
获取第二预测数值,所述第二预测数值用于指示观测到所述当前状态及所述环境元素更新的概率;
获取第三预测数值,所述第三预测数值用于指示若所述环境图片由当前状态更新为预测状态,对所述满足条件的自然语句的影响程度;
将所述第一预测数值、所述第二预测数值与所述第三预测数值的乘积作为所述预测数值。
一方面,提供了一种生成自然语言的装置,所述装置包括:
第一获取模块,用于获取目标指令的内容所包括的目标词汇,以及所述目标指令的环境图片中的环境元素所指示的描述词汇;
生成模块,用于基于所述目标词汇及所述描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句,所述自然语言模型是根据训练数据集训练过的语言模型,所述训练数据集包括用户描述训练指令的自然语言;
第二获取模块,用于获取每条初始自然语句的分值;
选择模块,用于基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,所述分值用于指示所述初始自然语句的准确程度。
可选地,所述第二获取模块,用于对于任一初始自然语句,获取所述初始自然语句的第一分值,所述第一分值用于指示所述初始自然语句与所述训练数据集的匹配程度;根据公开数据集获取所述初始自然语句的第二分值,所述第二分值用于指示所述初始自然语句与所述环境图片的匹配程度,所述公开数据集包括标注了环境元素的多张图片;将所述第一分值与所述第二分值的乘积作为所述初始自然语句的分值。
可选地,所述第二获取模块,用于对所述初始自然语句进行编码,得到编码后的自然语句信息;根据分值模型中的卷积参数对所述编码后的自然语句信息和所述环境图片中的信息进行卷积计算,得到卷积结果;根据所述分值模型中的分类参数对所述卷积结果进行计算,得到所述初始自然语句的第二分值,所述卷积参数和所述分类参数是根据所述公开数据集训练得到的参数。
可选地,所述选择模块,用于从初始自然语句中选择分值最大的初始自然语句,若所述分值最大的初始自然语句的分值不低于参考阈值,则将所述分值最大的初始自然语句作为所述目标指令的自然语言。
可选地,所述装置还包括:第三获取模块,用于若所述分值最大的初始自然语句的分值低于所述参考阀值,重新获取分值不低于所述参考阈值的目标自然语句,将所述目标自然语句作为所述目标指令的自然语言。
可选地,所述第三获取模块,用于调用所述自然语言模型,基于所述目标词汇和所述描述词汇,按照第一语法生成一条或多条第一自然语句,所述第一语法为除所述参考语法外的任一语法;获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值;若所述第一自然语句的分值的平均值大于所述初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于所述分值最大的初始自然语句,将所述分值最大的第一自然语句作为所述目标自然语句。
可选地,所述第三获取模块,还用于若所述第一自然语句的分值的平均值不大于所述初始自然语句的分值的平均值,或者,所述分值最大的第一自然语句的分值不大于所述分值最大的初始自然语句,调用所述自然语言模型,基于所述分值最大的初始自然语句和所述描述词汇,按照所述参考语法生成一条或多条第二自然语句,所述第二自然语句中的描述词汇的数量大于所述分值最大的初始自然语句中的描述词汇的数量;获取所述第二自然语句的分值,将分值最大的第二自然语句作为所述目标自然语句。
可选地,所述装置还包括:预测模块,用于获取预测数值,所述预测数值用于指示所述环境元素更新对所述满足条件的自然语句的影响程度;若所述预测数值大于参考数值,调用所述自然语言模型生成备选自然语句,将所述备选自然语句代替所述满足条件的自然 语句作为所述目标指令的自然语言。
可选地,所述预测模块,用于获取第一预测数值,所述第一预测数值用于指示所述环境元素更新后,所述环境图片由当前状态更新为预测状态的概率,其中,所述当前状态是指所述环境元素更新之前的状态;获取第二预测数值,所述第二预测数值用于指示观测到所述当前状态及所述环境元素更新的概率;获取第三预测数值,所述第三预测数值用于指示若所述环境图片由当前状态更新为预测状态,对所述满足条件的自然语句的影响程度;将所述第一预测数值、所述第二预测数值与所述第三预测数值的乘积作为所述预测数值。
一方面,提供了一种生成自然语言的设备,所述设备包括存储器及处理器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行,以实现本申请实施例提供的生成自然语言的方法。
另一方面,提供了一种可读存储介质,所述可读存储介质中存储有至少一条指令,所述指令由处理器加载并执行,以实现本申请实施例提供的生成自然语言的方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
本申请实施例通过根据训练数据集训练过的自然语言模型来生成初始自然语句,再从初始自然语句中选择满足条件的自然语句来作为目标指令的自然语言,不仅效率较高,而且生成的自然语言语义明确、易于理解,用户的使用体验较好。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。
图1是本申请实施例提供的实施环境示意图;
图2是本申请实施例提供的生成自然语言的方法的流程图;
图3是本申请实施例提供的生成自然语言的流程示意图;
图4是本申请实施例提供的生成自然语言的流程示意图;
图5是本申请实施例提供的生成自然语言的装置的结构示意图;
图6是本申请实施例提供的生成自然语言的装置的结构示意图;
图7是本申请实施例提供的生成自然语言的装置的结构示意图;
图8是本申请实施例提供的终端的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
随着人工智能技术的发展,自然语言的生成装置被广泛的应用于生活中。自然语言的生成装置在获取需要用户理解的目标指令后,生成自然语言,即人类沟通所使用的语言,来描述该目标指令,以便用户进行理解。
在相关技术一中,首先基于目标指令的内容获取目标词汇,然后将该目标指令的实施环境中的环境元素所对应的词汇作为描述词汇,并按照不同顺序对目标词汇和所有描述词汇进行排列,形成参考数量的语句。之后,计算每个语句被用户正确理解的概率,将被用户正确理解的概率最高的语句作为描述目标指令的自然语言。
例如,以交通环境为例,基于目标指令“到黑车这来”获取目标词汇为“黑车”,将该交通环境中的环境元素所对应的词汇如“交警”及“过街天桥”作为描述词汇,从而排列得到“交警旁边的过街天桥后面的黑车”、“过街天桥后面的交警旁边的黑车”等语句。通过计算选择被用户正确理解的概率最高的语句作为描述目标指令的自然语言。
在相关技术一的基础上,相关技术二将被用户正确理解的概率最高的语句作为目标语句,向该目标语句中添加历史信息,形成一个或多个更新语句。之后,计算每个更新语句被用户正确理解的概率,将被用户正确理解的概率最高的更新语句作为描述目标指令的自然语言。
仍以上述交通环境为例,假设排列得到的“交警旁边的过街天桥后面的黑车”为目标语句。向该目标语句中添加历史信息,如将“交警旁边的过街天桥后面的黑车”变为“交警旁边的过街天桥后面的刚亮了一下尾灯的黑车”等更新语句。之后,再通过计算进行选择,从而得到描述目标指令的自然语言。
然而,对于包括较多环境元素的复杂环境,描述词汇也较多。从而导致相关技术一的计算量较大、效率低,且生成的自然语言的语义不够准确。相关技术二通过添加历史信息来提高自然语言的语义的准确程度,然而由于用户对历史信息常常印象不深,因而相关技术二生成的自然语言的语义仍不够准确,易被用户误解。可以看出,用户对相关技术的使用体验较差。
本申请实施例提供了一种生成自然语言的方法,该方法可应用于如图1所示的实施环境中。图1中,包括至少一个终端11和服务器12,终端11可与服务器12进行通信连接,以从服务器12上获取目标语言模型。若终端11能够自行训练模型,本申请实施例所提供的方法也可不依赖服务器12,而由终端11来执行整体方法流程。
其中,终端11可以是任何一种可与用户通过键盘、触摸板、触摸屏、遥控器、语音交互或手写设备等一种或多种方式进行人机交互的电子产品,例如PC(Personal Computer,个人计算机)、手机、智能手机、PDA(Personal Digital Assistant,个人数字助手)、可穿戴设备、掌上电脑PPC(Pocket PC)、平板电脑、智能车机、智能电视、智能音箱等。
服务器12可以是一台服务器,也可以是由多台服务器组成的服务器集群,或者是一个云计算服务中心。
本领域技术人员应能理解上述终端11和服务器12仅为举例,其他现有的或今后可能出现的终端或服务器如可适用于本申请,也应包含在本申请保护范围以内,并在此以引用方式包含于此。
基于上述图1所示的实施环境,参见图2,本申请实施例提供了一种生成自然语言的方法,该方法可应用于图1所示的终端中。如图2所示,该方法包括:
步骤201,获取目标指令的内容所包括的目标词汇,以及目标指令的环境图片中的环境元素所指示的描述词汇。
其中,目标指令是需要用户理解或执行的指令,目标词汇是指目标指令中包括的待描述的词汇。例如,目标指令为“到黑车这来”,则目标词汇为“黑车”。目标指令的环境图片用于指示目标指令的实施环境,环境元素包括但不限于目标指令的实施环境中的人、物或文 字,人、物或文字所对应的名称即为环境元素所指示的描述词汇,描述词汇的数量可以为一个或多个。例如,参见图3所示的环境图片,则描述词汇包括但不限于“交警”、“过街天桥”以及“搭”。
In this embodiment, the environment picture of the target instruction may be acquired by a collection device such as a camera. Then, feature extraction may be performed on the environment picture of the target instruction through a CNN (Convolutional Neural Network) to obtain the environmental elements in the environment picture; the environmental elements extracted by the CNN are classified by a classifier to obtain the names of the environmental elements, that is, the description vocabulary indicated by the environmental elements.
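The following is a rough sketch, under stated assumptions, of how an environment picture could be turned into description vocabulary with an off-the-shelf detector; the pretrained Faster R-CNN, the label_names list, and the confidence threshold are illustrative assumptions and not the specific CNN and classifier used in this embodiment.

```python
# Sketch: extract description vocabulary from an environment picture with a
# pretrained detector standing in for the CNN + classifier described above.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

def description_vocabulary(image_path, label_names, threshold=0.7):
    """Return the names of environment elements detected in the picture.
    label_names maps class indices to words, e.g. "traffic police", "footbridge"."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    words = set()
    for label, score in zip(output["labels"], output["scores"]):
        if score >= threshold:                     # keep confident detections only
            words.add(label_names[label.item()])
    return sorted(words)
```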
在获取目标词汇以及描述词汇之后,可通过描述词汇来对目标词汇进行描述,以使得描述后的目标词汇便于用户理解。
步骤202,基于目标词汇及描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句。
其中,自然语言模型是根据训练数据集训练过的语言模型,训练数据集包括用户描述训练指令的自然语言。需要说明的是,每条训练指令均对应有用于指示该训练指令的实施环境的环境图片。用户描述训练指令的自然语言是指,用户观察训练指令的环境图片后,根据用户的习惯来应用语法及词汇对训练指令进行描述所使用的自然语言。因此,根据训练数据集训练过的自然语言模型具有基于语法、描述词汇以及目标词汇生成自然语句的能力,且自然语言模型所使用的描述词汇包括所有描述词汇中的一个或多个。
在本实施例中,采集不同类型环境、不同时刻的训练指令以及训练指令的环境图片,将用户描述每个训练指令的自然语言作为训练数据集,以使得训练数据集所包含的自然语言的数量较多,进而使得根据该训练数据集训练过的自然语言模型生成自然语句的能力较强。其中,用户可以是需要对目标指令进行理解或执行的用户,也可以是通过样本抽取选中的多个其他用户,本实施例对此不加以限定。
根据以上说明可知,调用自然语言模型基于目标词汇以及描述词汇,按照参考语法可生成一条或多条初始自然语句,不同的初始自然语句所使用的描述词汇或描述词汇的数量中的至少一项不同。例如,仍参见图3所示的环境图片,则可生成“黑车有一个交警站在旁边”、“黑车后面有一个‘搭’的汉字”以及“黑车前面有一个过街天桥”等初始自然语句。
步骤203,获取每条初始自然语句的分值,基于每条初始自然语句的分值选择满足条件的自然语句作为目标指令的自然语言。
其中,分值用于指示初始自然语句的准确程度。初始自然语句的分值越高,则该初始自然语句的准确程度越高,也就是说,该初始自然语句的语义越易于被用户所理解。可选地,获取每条初始自然语句的分值,包括:
步骤2031,对于任一初始自然语句,获取初始自然语句的第一分值,第一分值用于指示初始自然语句与训练数据集的匹配程度。
初始自然语句与训练数据集的匹配程度是指,初始自然语句与训练数据集所包括的用户描述训练指令的自然语言的匹配程度。在一种可选的实施方式中,获取初始自然语句的第一分值,包括:
The initial natural sentence is encoded by an encoder to obtain a vector, and the vector is input into a fully connected layer and a classifier to obtain the first score of the initial natural sentence. The first score can be expressed as $C_d^k$, where $C_d$ denotes the first score and the superscript $k$ is used to distinguish different initial natural sentences; $k$ is a positive integer, and the maximum value of $k$ is not greater than the number of initial natural sentences. For example, when there are two initial natural sentences, the first score of one initial natural sentence is $C_d^1$ and the first score of the other is $C_d^2$.
步骤2032,根据公开数据集获取初始自然语句的第二分值,第二分值用于指示初始自然语句与环境图片的匹配程度,公开数据集包括标注了环境元素的多张图片。
由于自然语言模型是根据训练数据集训练得到的,而训练数据集中所包括的自然语言数量有限,因而可能导致自然语言模型所生成的初始自然语句过拟合。其中,过拟合定义为:自然语言模型所生成的初始自然语句与训练数据集所包括的自然语言之间的匹配程度高,而与环境图片的匹配程度低。
因此,本实施例根据公开数据集来获取用于指示初始自然语句与环境图片的匹配程度的第二分值,以避免后续选择过程中,与环境图片的匹配程度低的初始自然语句被作为目标指令的自然语言。其中,公开数据集包括标注了环境元素的多张图片,标注环境元素是指将环境元素标注为词汇。公开数据集包括但不限于Oxford-102、KITTI及CityScope等数据集。
其中,考虑到初始自然语句与环境图片的匹配程度取决于初始自然语句所使用的描述词汇,因而可通过初始自然语句所使用的描述词汇与公开数据集中所标注的词汇的匹配程度,来间接表示初始自然语句与环境图片的匹配程度。例如,针对目标指令的环境图片,初始自然语句使用的描述词汇为“白色的天空”,而公开数据集所标注的词汇为“蓝色的天空”,则初始自然语句所使用的描述词汇与公开数据集所标注的词汇的匹配程度较低,从而得出初始自然语句与环境图片的匹配程度也较低。
可选地,根据公开数据集获取初始自然语句的第二分值,包括:
对初始自然语句进行编码,得到编码后的自然语句信息;根据分值模型中的卷积参数对编码后的自然语句信息和环境图片中的信息进行卷积计算,得到卷积结果;根据分值模型中的分类参数对卷积结果进行计算,得到初始自然语句的第二分值。
In this embodiment, an LSTM (Long Short-Term Memory) encoder is used to encode the initial natural sentence according to the following formula:

$c = \mathrm{LSTM}(y)$

where $c$ is the encoded natural sentence information, in vector form, and $y$ is the initial natural sentence.

Then, the score model is called to perform a convolution calculation on the encoded natural sentence information and the information in the environment picture according to the following formula, to obtain the convolution result:

$f = \tanh(W_x \cdot x + b_x) \odot \tanh(W_c \cdot c + b_c)$

where $f$ is the convolution result, $x$ is the information in the environment picture, $W_x$, $b_x$, $W_c$, and $b_c$ are the convolution parameters in the score model, $\tanh$ is the hyperbolic tangent function, and $\odot$ is the convolution operation symbol.

Next, the convolution result is processed according to the following formula to obtain the second score:

$\mathrm{score}_2^{(k)} = \mathrm{softmax}(W_m \cdot f + b_m)$

where $\mathrm{score}_2^{(k)}$ denotes the second score of the $k$-th initial natural sentence; as with the first score, $k$ is used to distinguish different initial natural sentences, which is not repeated here. In addition, $W_m$ and $b_m$ are the classification parameters in the score model, and $\mathrm{softmax}$ is the classification function.
It should be noted that the above convolution parameters ($W_x$, $b_x$, $W_c$, $b_c$) and classification parameters ($W_m$, $b_m$) are obtained by training the score model on the public data set. After training, a test environment picture and a test natural sentence may also be input into the score model to obtain a test second score. The convolution parameters and classification parameters are then adjusted by analyzing this test second score, that is, the values of one or more of these parameters are changed, so that the second score output by the score model more accurately indicates the degree of matching between the initial natural sentence and the environment picture.
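To make the structure of the score model concrete, the following is a minimal PyTorch sketch with assumed dimensions; it only mirrors the formulas above (c = LSTM(y), f = tanh(Wx·x + bx) ⊙ tanh(Wc·c + bc), second score = softmax(Wm·f + bm)), reads ⊙ as an element-wise product, and leaves the image feature extractor and training on the public data set out of scope.

```python
# Minimal sketch of the score model implied by the formulas above.
import torch
import torch.nn as nn

class ScoreModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, image_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.w_x = nn.Linear(image_dim, hidden_dim)   # Wx, bx
        self.w_c = nn.Linear(hidden_dim, hidden_dim)  # Wc, bc
        self.w_m = nn.Linear(hidden_dim, 2)           # Wm, bm (no-match / match)

    def forward(self, sentence_ids, image_features):
        # c = LSTM(y): encode the candidate sentence into a vector.
        _, (h_n, _) = self.encoder(self.embed(sentence_ids))
        c = h_n[-1]
        # f = tanh(Wx·x + bx) ⊙ tanh(Wc·c + bc), with ⊙ as element-wise product.
        f = torch.tanh(self.w_x(image_features)) * torch.tanh(self.w_c(c))
        # second score = softmax(Wm·f + bm); return the "match" probability.
        return torch.softmax(self.w_m(f), dim=-1)[:, 1]
```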
步骤2033,将第一分值与第二分值的乘积作为初始自然语句的分值。
根据以上说明可知,第一分值指示初始自然语句与训练数据集的匹配程度,第二分值指示初始自然语句与环境图片的匹配程度。则将第一分值与第二分值的乘积作为初始自然语句的分值,该分值可同时体现初始自然语句与训练数据集、初始自然语句与环境图片的匹配程度,从而指示初始自然语句的准确程度。
另外,除了将第一分值与第二分值的乘积作为初始自然语句的分值,本实施例还可采用其他的方式对第一分值与第二分值进行计算,得到初始自然语句的分值。例如,可将第一分值与第二分值的加权求和值作为初始自然语句的分值。此时,第一分值与第二分值所对应的权值可以相同,也可以不同,以满足不同需求。
需要说明的是,若初始自然语句的分值越大指示初始自然语句的准确程度越高,则相应地,第一分值及第二分值越大,其指示的匹配程度也越高。若初始自然语句的分值越小指示初始自然语句的准确程度越高,则相应地,第一分值及第二分值越小,其指示的匹配程度越高。本实施例根据实际情况采用以上两种情况中的一种,在此不加以限定。
在获取每条初始自然语句的分值后,便可基于每条初始自然语句的分值选择满足作为目标指令的自然语言。以初始自然语句的分值越大指示初始自然语句的准确程度越高为例,可选地,从初始自然语句中选择分值最大的初始自然语句,若分值最大的初始自然语句的分值不低于参考阈值,则将分值最大的初始自然语句作为目标指令的自然语言。
可以看出,本实施例从自然语言模型生成的一条或多条初始自然语句中选择准确程度最高的初始自然语句,若该初始自然语句的准确程度不低于参考阈值所指示的准确程度,则可说明该初始自然语句的准确程度已达到易于被用户所理解的标准,因而可将该初始自然语句作为目标指令的自然语言。其中,参考阈值可根据经验选取,本实施例对此不加以限定。
Since the score of an initial natural sentence is the product (or weighted sum, etc.) of the first score and the second score, a low value of either the first score or the second score causes the score of the initial natural sentence to fall below the reference threshold, so that the initial natural sentence cannot be selected as the natural sentence of the target instruction. In other words, initial natural sentences with a low degree of matching with the training data set or with the environment picture are eliminated, which ensures that the initial natural sentence used as the natural language of the target instruction matches both the training data set and the environment picture to a high degree, making it easy for the user to understand or execute.
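A small sketch of the score combination and threshold check discussed in the preceding paragraphs, assuming scores are plain floats; the weighted-sum weights and the choice of returning None when no candidate clears the threshold are illustrative, not mandated by the embodiment.

```python
# Combine the training-set match score and the picture match score, then
# select the best candidate only if it clears the reference threshold.

def sentence_score(first_score, second_score, mode="product", w1=0.5, w2=0.5):
    """Combine the two scores by product (default) or weighted sum."""
    if mode == "product":
        return first_score * second_score
    return w1 * first_score + w2 * second_score

def pick_by_threshold(candidates, scores, reference_threshold):
    """Return the best candidate if it clears the threshold, otherwise None,
    signalling that other sentences must be regenerated."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    if scores[best_index] >= reference_threshold:
        return candidates[best_index]
    return None
```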
当然,上述说明针对于分值最大的初始自然语句的分值不低于参考阈值的情况。而对于分值最大的初始自然语句的分值低于参考阈值的情况,可选地,本实施例提供的方法还包括:若分值最大的初始自然语句的分值低于参考阀值,重新获取分值不低于参考阈值的目标自然语句,将目标自然语句作为目标指令的自然语言。
其中,若分值最大的初始自然语句的分值低于参考阈值,则说明所有初始自然语句的分值均低于参考阈值,也就是所有初始自然语句的准确程度均未达到易于被用户所理解的标准。因此,需要重新获取其他的自然语句来作为目标指令的自然语言,以保证目标指令的自然语句满足易于被用户所理解的标准。
在一种可选的实施方式中,重新获取分值不低于参考阈值的目标自然语句,包括:调 用自然语言模型,基于目标词汇和描述词汇,按照第一语法生成一条或多条第一自然语句,第一语法为除参考语法外的任一语法;获取第一自然语句的分值的平均值和初始自然语句的分值的平均值;若第一自然语句的分值的平均值大于初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于分值最大的初始自然语句,将分值最大的第一自然语句作为目标自然语句。
训练数据集所包括的自然语言应用了多种语法,因而根据训练数据集训练得到的自然语言模型也可使用多种语法,从而可基于相同的目标词汇及描述词汇生成不同的自然语句。在生成初始自然语句的过程中,自然语言模型所使用的语法为参考语法,参考语法为上述多种语法中的任意一种。
由于初始自然语句的分值均低于参考阈值,因而说明参考语法可能不是多种语法中最适合用于生成目标指令的自然语言的语法,则可从多种语法中选择除参考语法以外的第一语法,并使用目标词汇以及与初始自然语句所使用的描述词汇相同的描述词汇来生成一条或多条第一自然语句。例如,仍参见图3所示的环境图片,若分值最大的初始自然语句为“黑车有一个交警站在旁边”,则目标词汇为黑车,描述词汇为交警,生成初始自然语句所使用的参考语法为“主语+状语”。使用相同的目标词汇“黑车”及描述词汇“交警”,通过参考语法以外的第一语法如“定语+主语”,则可生成第一自然语句“有一个交警站在旁边的黑车”。
Then, the average of the scores of the first natural sentences is compared with the average of the scores of the initial natural sentences to determine which of the first grammar and the reference grammar is more suitable for generating the natural language of the target instruction. The score of a first natural sentence is obtained in the same way as the score of an initial natural sentence, as described above, and is not repeated here. The average score of the first natural sentences is the sum of their scores divided by the number of first natural sentences; likewise, the average score of the initial natural sentences is the sum of their scores divided by the number of initial natural sentences. The process of obtaining the average score $Q$, shown as a formula in FIG. 3, can be written as:

$Q = \frac{1}{n}\sum_{k=1}^{n} \mathrm{score}_k$

where $n$ is the number of sentences generated under the given grammar and $\mathrm{score}_k$ is the score of the $k$-th such sentence.
若第一自然语句的分值的平均值大于初始自然语句的分值的平均值,则可初步确认第一语法是更适合用于生成目标指令的自然语言的语法。进一步地,还需要分值最大的第一自然语句的分值大于分值最大的初始自然语句,从而避免了分值最大的第一自然语句与分值最大的初始自然语句中分值更大的那个被剔除。只有在满足上述两个条件的情况下,才将分值最大的第一自然语句作为目标自然语句,将该目标自然语句作为目标指令的自然语言,提高了目标指令的自然语言的准确程度。
需要说明的是,参见图3,在确定第一语法与参考语法中更适合用于生成目标指令的自然语言的语法之后,还可从自然语言模型所能使用的多种语法选择其他的语法来与更适合的那个语法进行对比。对比过程可多次进行,从而最终得到多种语法中最适合用于生成目标指令的自然语言的语法,将根据该语法所生成的自然语句中分值最大的自然语句作为目标自然语句,从而保证了目标指令的自然语言的准确程度。
若经过多次对比,确定自然语言模型所能使用的多种语法中参考语法已经是最适合用于生成目标指令的自然语言的语法,也就是说通过改变语法的方式不能使得目标指令的自然语言的准确程度更高。由于分值最大的初始自然语句的分值低于参考阈值,因而可通过增加初始自然语句中描述词汇的数量的方式来进一步增大分值最大的初始自然语句的分值。
基于上述考虑,可选地,获取第一自然语句的分值的平均值和初始自然语句的分值的平均值之后,还包括:若第一自然语句的分值的平均值不大于初始自然语句的分值的平均值,或者,分值最大的第一自然语句的分值不大于分值最大的初始自然语句,调用自然语言模型,基于分值最大的初始自然语句和描述词汇,按照参考语法生成一条或多条第二自然语句,第二自然语句中的描述词汇的数量大于分值最大的初始自然语句中的描述词汇的数量;获取第二自然语句的分值,将分值最大的第二自然语句作为目标自然语句。在实施中,可以对每条第二自然语句的分值均进行获取。
其中,分值最大的初始自然语句所使用的描述词汇为所有描述词汇中的一个或多个。基于分值最大的初始自然语句和描述词汇生成第二自然语句的过程是指:在分值最大的初始自然语句的基础上,从未被分值最大的初始自然语句所使用的描述词汇中选择一个或多个添加到分值最大的初始自然语句中,从而形成第二自然语句。可以看出,分值最大的初始自然语句与第二自然语句均使用了参考语法,只是第二自然语句所使用的描述词汇的数量比分值最大的初始自然语句的数量更多,从而实现了分值的增大。
之后,同样按照与获取初始自然语句的分值相同的方式来获取每条第二自然语句的分值,并将分值最大的第二自然语句作为目标自然语句,从而保证了目标指令的自然语言的准确程度。
在选择满足条件的自然语句作为目标指令的自然语言之后,可能由于环境图片中所包括的环境元素发生更新,而导致满足条件的自然语句不再适用于环境元素更新后的环境图片。因此,本实施例提供的方法还包括:获取预测数值,若预测数值大于参考数值,调用自然语言模型生成备选自然语句,将备选自然语句代替满足条件的自然语句作为描述目标指令的自然语言。
其中,预测数值用于指示环境元素更新对满足条件的自然语句的影响程度,环境元素更新包括环境元素在环境图片中的位置更新。例如环境元素为“交警”,则交警发生移动可以看作是环境元素发生了更新。获取预测数值的方式包括:获取第一预测数值、第二预测数值以及第三预测数值,将第一预测数值、第二预测数值与第三预测数值的乘积作为预测数值。
其中,第一预测数值用于指示环境元素更新后,环境图片由当前状态更新为预测状态的概率,当前状态是指环境元素更新之前的状态,预测状态是未来时刻的状态。第一预测数值应用了MDP(Markov Decision Process,马尔可夫决策过程)思想,即假设未来状态(对应于本实施例的预测状态)仅与当前状态(对应于本实施例的当前状态)及当前状态下的动作(对应于本实施例中的环境元素更新)有关,而与其他因素无关。第一预测数值可表示为P(s′ k|s k,a k),其中s k表示当前状态,a k表示环境元素更新这一动作,s′ k表示预测状态。
例如,仍以交警发生移动作为环境元素更新为例,则当前状态是指交警原地不动的状态,而第一预测数值指示了交警发生移动后,环境图片由交警原地不动更新为预测状态,如交警离开实施环境的概率。
第二预测数值用于指示观测到当前状态及环境元素更新的概率。第二预测数值可表示为O(s k,a k),其中s k仍表示当前状态,a k仍表示环境元素更新这一动作。仍然以交警发生移动作为环境元素为例,则第二预测数值用于指示观测到交警原地不动以及交警发生移动的概率。可以看出,第二预测数值是第一预测数值的基础,即首先需要观测到当前状态及 环境元素更新,才能根据进一步获取到环境元素更新后环境图片由当前状态更新为预测状态的概率。
第三预测数值用于指示若环境图片由当前状态更新为预测状态,对目标自然语句的影响程度,第三预测数值可表示为d(s′ k,s k)。在本实施例中,第三预测数值为正数,第三预测数值与0的差值越小,则说明环境图片由当前状态更新为预测状态对目标自然语句的影响程度越小,相应地,第三预测数值与0的差值越大则说明环境图片由当前状态更新为预测状态对满足条件的自然语句的影响程度越大。例如,第三预测数值的取值范围可以为[0,1]。
在一种可选的实施方式中,第三预测数值的值为0或1。也就是说,当第三预测数值的值为0时,指示当前状态更新为预测状态对满足条件的自然语句无影响,不对满足条件的目标自然语句进行更新。而当第三预测数值的值为1时,则指示当前状态更新为预测状态使得满足条件的自然语句需要被更新。其中,令第三预测数值的值为1的规则包括但不限于以下三种:
第一种情况:环境元素更新,使得满足条件的自然语句所使用的描述词汇与更新后的环境元素所指示的描述词汇不一致。例如,满足条件的自然语句为“黑车有一个交警站在旁边”,该自然语句所使用的描述词汇为“交警”。在环境元素更新为交警离开实施环境的情况下,更新后的环境元素所指示的描述词汇不包括“交警”,因而需要对满足条件的自然语句进行更新。
第二种情况:目标词汇发生改变。仍以满足条件的自然语句为“黑车有一个交警站在旁边”为例,该自然语句中的目标词汇为“黑车”。若黑车离开实施环境,则黑车不能继续作为自然语句的目标词汇,因而也需要对满足条件的自然语句进行更新。
第三种情况:用户距离实施环境的距离不大于参考距离。在该情况下,由于用户能够看见真实的实施环境中的环境元素,因而生成用户视角的自然语句比上述满足条件的自然语句更易于用户理解,用户视角的自然语句例如为“您左手边的黑车”。因此,也需要对满足条件的自然语句进行更新。
根据以上三种规则对满足条件的自然语句更新的过程可参见图4。当然,本实施例还可根据需要增加或减少令第三预测数值的值为1的规则,此处不再一一举例说明。
In addition, the first predicted value, the second predicted value, and the third predicted value may each be obtained from an empirical data set, which is not limited in this embodiment. After the first predicted value, the second predicted value, and the third predicted value are obtained, their product is used as the predicted value, so the predicted value $x_k$ can be expressed by the following formula:

$x_k = P(s'_k \mid s_k, a_k)\, O(s_k, a_k)\, d(s'_k, s_k)$
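The following is a brief sketch of the update decision built on the formula above; the three probability estimates are assumed to be supplied by upstream estimators (for example, from an empirical data set), and the reference value shown is an illustrative default rather than a value specified by the embodiment.

```python
# x_k = P(s'_k | s_k, a_k) * O(s_k, a_k) * d(s'_k, s_k): decide whether the
# chosen natural sentence must be regenerated after an environment update.

def prediction_value(p_transition, p_observation, influence):
    """Predicted value x_k indicating how strongly the update affects the sentence."""
    return p_transition * p_observation * influence

def should_regenerate(p_transition, p_observation, influence, reference_value=0.5):
    """Regenerate an alternative natural sentence when x_k exceeds the reference value."""
    return prediction_value(p_transition, p_observation, influence) > reference_value
```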
需要说明的是,本实施例不对通过第一预测数值、第二预测数值以及第三预测数值计算得到预测数值的计算方式加以限定,计算方式跟可以需要或经验进行选取。例如,除了将第一预测数值、第二预测数值以及第三预测数值的乘积作为预测数值以外,也可以对第一预测数值、第二预测数值以及第三预测数值进行加权求和,从而得到预测数值。计算过程中,第一预测数值、第二预测数值以及第三预测数值所对应的权重可以相同,也可以不同,权重可根据经验进行确定,本实施例不对权重加以限定。
若预测数值大于参考数值,则说明环境元素更新导致满足条件的自然语句不再适用于 作为目标指令的自然语言的可能性较大,大于参考数值所指示的可能性,也就是环境元素更新对满足条件的自然语言的影响程度较大。因此,将预测数值大于参考数值作为更新满足条件的自然语句的时机,再调用自然语言模型生成备选自然语句,将备选自然语句代替满足条件的自然语句作为描述目标指令的自然语言,使得该描述指令的自然语言适用于环境元素更新后的环境图片,从而保证了环境元素更新的情况下描述目标指令的自然语言的准确性。
综上所述,本申请实施例通过根据训练数据集训练过的自然语言模型来生成初始自然语句,从初始自然语句中选择满足条件的自然语句作为目标指令的自然语言,不仅效率较高,而且生成的自然语言语义明确、易于理解,用户的使用体验好。
基于相同构思,本申请实施例提供了一种生成自然语言的装置,参见图5,该装置包括:
第一获取模块501,用于获取目标指令的内容所包括的目标词汇,以及目标指令的环境图片中的环境元素所指示的描述词汇;
生成模块502,用于基于目标词汇及描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句,自然语言模型是根据训练数据集训练过的语言模型,训练数据集包括用户描述训练指令的自然语言;
第二获取模块503,用于获取每条初始自然语句的分值;
选择模块504,用于基于每条初始自然语句的分值选择满足条件的自然语句作为目标指令的自然语言,分值用于指示初始自然语句的准确程度。
可选地,第二获取模块503,用于对于任一初始自然语句,获取初始自然语句的第一分值,第一分值用于指示初始自然语句与训练数据集的匹配程度;根据公开数据集获取初始自然语句的第二分值,第二分值用于指示初始自然语句与环境图片的匹配程度,公开数据集包括标注了环境元素的多张图片;将第一分值与第二分值的乘积作为初始自然语句的分值。
可选地,第二获取模块503,用于对初始自然语句进行编码,得到编码后的自然语句信息;根据分值模型中的卷积参数对编码后的自然语句信息和环境图片中的信息进行卷积计算,得到卷积结果;根据分值模型中的分类参数对卷积结果进行计算,得到初始自然语句的第二分值,卷积参数和分类参数是根据公开数据集训练得到的参数。
可选地,选择模块504,用于从初始自然语句中选择分值最大的初始自然语句,若分值最大的初始自然语句的分值不低于参考阈值,则将分值最大的初始自然语句作为目标指令的自然语言。
可选地,参见图6,装置还包括:第三获取模块505,用于若分值最大的初始自然语句的分值低于参考阀值,重新获取分值不低于参考阈值的目标自然语句,将目标自然语句作为目标指令的自然语言。
可选地,第三获取模块505,用于调用自然语言模型,基于目标词汇和描述词汇,按照第一语法生成一条或多条第一自然语句,第一语法为除参考语法外的任一语法;获取第一自然语句的分值的平均值和初始自然语句的分值的平均值;若第一自然语句的分值的平均值大于初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于分值最大的初始自然语句,将分值最大的第一自然语句作为目标自然语句。
可选地,第三获取模块505,还用于若第一自然语句的分值的平均值不大于初始自然语句的分值的平均值,或者,分值最大的第一自然语句的分值不大于分值最大的初始自然语句,调用自然语言模型,基于分值最大的初始自然语句和描述词汇,按照参考语法生成一条或多条第二自然语句,第二自然语句中的描述词汇的数量大于分值最大的初始自然语句中的描述词汇的数量;获取第二自然语句的分值,将分值最大的第二自然语句作为目标自然语句。
可选地,参见图7,装置还包括:预测模块506,用于获取预测数值,预测数值用于指示环境元素更新对满足条件的自然语句的影响程度;若预测数值大于参考数值,调用自然语言模型生成备选自然语句,将备选自然语句代替满足条件的自然语句作为描述目标指令的自然语言。
可选地,预测模块506,用于获取第一预测数值,第一预测数值用于指示环境元素更新后,环境图片由当前状态更新为预测状态的概率,其中,当前状态是指环境元素更新之前的状态;获取第二预测数值,第二预测数值用于指示观测到当前状态及环境元素更新的概率;获取第三预测数值,第三预测数值用于指示若环境图片由当前状态更新为预测状态,对满足条件的自然语句的影响程度;将第一预测数值、第二预测数值与第三预测数值的乘积作为预测数值。
综上所述,本申请实施例通过根据训练数据集训练过的自然语言模型来生成初始自然语句,从初始自然语句中选择满足条件的自然语句作为目标指令的自然语言,不仅效率较高,而且生成的自然语言语义明确、易于理解,用户的使用体验好。
需要说明的是,上述实施例提供的装置在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
参见图8,其示出了本申请实施例提供的一种生成自然语言的终端800的结构示意图。该终端800可以是便携式移动终端,比如:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端800还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端800包括有:处理器801和存储器802。
处理器801可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器801可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器801也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中, 处理器801可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏805所需要显示的内容的渲染和绘制。一些实施例中,处理器801还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器802还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器802中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器801所执行以实现本申请实施例提供的生成自然语言的方法。
在一些实施例中,终端800还可选包括有:外围设备接口803和至少一个外围设备。处理器801、存储器802和外围设备接口803之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口803相连。具体地,外围设备包括:射频电路804、触摸显示屏805、摄像头808、音频电路807、定位组件808和电源809中的至少一种。
外围设备接口803可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器801和存储器802。在一些实施例中,处理器801、存储器802和外围设备接口803被集成在同一芯片或电路板上;在一些其他实施例中,处理器801、存储器802和外围设备接口803中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路804用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路804通过电磁信号与通信网络以及其他通信设备进行通信。射频电路804将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路804包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路804可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及8G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路804还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏805用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏805是触摸显示屏时,显示屏805还具有采集在显示屏805的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器801进行处理。此时,显示屏805还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏805可以为一个,设置终端800的前面板;在另一些实施例中,显示屏805可以为至少两个,分别设置在终端800的不同表面或呈折叠设计;在再一些实施例中,显示屏805可以是柔性显示屏,设置在终端800的弯曲表面上或折叠面上。甚至,显示屏805还可以设置成非矩形的不规则图形,也即异形屏。显示屏805可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件806用于采集图像或视频。可选地,摄像头组件806包括前置摄像头和后 置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件806还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路807可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器801进行处理,或者输入至射频电路804以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端800的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器801或射频电路804的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路807还可以包括耳机插孔。
定位组件808用于定位终端800的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件808可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统、俄罗斯的格雷纳斯系统或欧盟的伽利略系统的定位组件。
电源809用于为终端800中的各个组件进行供电。电源809可以是交流电、直流电、一次性电池或可充电电池。当电源809包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中,终端800还包括有一个或多个传感器810。该一个或多个传感器810包括但不限于:加速度传感器811、陀螺仪传感器812、压力传感器813、指纹传感器814、光学传感器815以及接近传感器816。
加速度传感器810可以检测以终端800建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器811可以用于检测重力加速度在三个坐标轴上的分量。处理器801可以根据加速度传感器811采集的重力加速度信号,控制触摸显示屏805以横向视图或纵向视图进行用户界面的显示。加速度传感器811还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器812可以检测终端800的机体方向及转动角度,陀螺仪传感器812可以与加速度传感器811协同采集用户对终端800的3D动作。处理器801根据陀螺仪传感器812采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器813可以设置在终端800的侧边框和/或触摸显示屏805的下层。当压力传感器813设置在终端800的侧边框时,可以检测用户对终端800的握持信号,由处理器801根据压力传感器813采集的握持信号进行左右手识别或快捷操作。当压力传感器813设置在触摸显示屏805的下层时,由处理器801根据用户对触摸显示屏805的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器814用于采集用户的指纹,由处理器801根据指纹传感器814采集到的指纹识别用户的身份,或者,由指纹传感器814根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器801授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器814可以被设置终端800的正面、背面或侧面。当终端800上设置有物理按键或厂商Logo时,指纹传感器814可以与物理按键或厂商Logo集成在一起。
光学传感器815用于采集环境光强度。在一个实施例中,处理器801可以根据光学传感器815采集的环境光强度,控制触摸显示屏805的显示亮度。具体地,当环境光强度较高时,调高触摸显示屏805的显示亮度;当环境光强度较低时,调低触摸显示屏805的显示亮度。在另一个实施例中,处理器801还可以根据光学传感器815采集的环境光强度,动态调整摄像头组件806的拍摄参数。
接近传感器816,也称距离传感器,通常设置在终端800的前面板。接近传感器816用于采集用户与终端800的正面之间的距离。在一个实施例中,当接近传感器816检测到用户与终端800的正面之间的距离逐渐变小时,由处理器801控制触摸显示屏805从亮屏状态切换为息屏状态;当接近传感器816检测到用户与终端800的正面之间的距离逐渐变大时,由处理器801控制触摸显示屏805从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图8中示出的结构并不构成对终端800的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
基于相同构思,本申请实施例提供了一种生成自然语言的设备,该设备包括存储器及处理器,存储器中存储有至少一条指令,至少一条指令由处理器加载并执行,以实现本申请实施例提供的上述任一种生成自然语言的方法。
基于相同构思,本申请实施例提供了一种可读存储介质,该可读存储介质中存储有至少一条指令,指令由处理器加载并执行,以实现本申请实施例提供的上述任一种生成自然语言的方法。
上述所有可选技术方案,可以采用任意结合形成本申请的可选实施例,在此不再一一赘述。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种生成自然语言的方法,其中,所述方法包括:
    获取目标指令的内容所包括的目标词汇,以及所述目标指令的环境图片中的环境元素所指示的描述词汇;
    基于所述目标词汇及所述描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句,所述自然语言模型是根据训练数据集训练过的语言模型,所述训练数据集包括用户描述训练指令的自然语言;
    获取每条初始自然语句的分值,基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,所述分值用于指示所述初始自然语句的准确程度。
  2. 根据权利要求1所述的方法,其中,所述获取每条初始自然语句的分值,包括:
    对于任一初始自然语句,获取所述初始自然语句的第一分值,所述第一分值用于指示所述初始自然语句与所述训练数据集的匹配程度;
    根据公开数据集获取所述初始自然语句的第二分值,所述第二分值用于指示所述初始自然语句与所述环境图片的匹配程度,所述公开数据集包括标注了环境元素的多张图片;
    将所述第一分值与所述第二分值的乘积作为所述初始自然语句的分值。
  3. 根据权利要求2所述的方法,其中,所述根据公开数据集获取所述初始自然语句的第二分值,包括:
    对所述初始自然语句进行编码,得到编码后的自然语句信息;
    根据分值模型中的卷积参数对所述编码后的自然语句信息和所述环境图片中的信息进行卷积计算,得到卷积结果;
    根据所述分值模型中的分类参数对所述卷积结果进行计算,得到所述初始自然语句的第二分值,所述卷积参数和所述分类参数是根据所述公开数据集训练得到的参数。
  4. 根据权利要求1-3任一所述的方法,其中,所述基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,包括:
    从初始自然语句中选择分值最大的初始自然语句,若所述分值最大的初始自然语句的分值不低于参考阈值,则将所述分值最大的初始自然语句作为所述目标指令的自然语言。
  5. 根据权利要求4所述的方法,其中,所述方法还包括:
    若所述分值最大的初始自然语句的分值低于所述参考阀值,重新获取分值不低于所述参考阈值的目标自然语句,将所述目标自然语句作为所述目标指令的自然语言。
  6. 根据权利要求5所述的方法,其中,所述重新获取分值不低于所述参考阈值的目标自然语句,包括:
    调用所述自然语言模型,基于所述目标词汇和所述描述词汇,按照第一语法生成一条或多条第一自然语句,所述第一语法为除所述参考语法外的任一语法;
    获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值;
    若所述第一自然语句的分值的平均值大于所述初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于所述分值最大的初始自然语句,将所述分值最大的第一自然语句作为所述目标自然语句。
  7. 根据权利要求6所述的方法,其中,所述获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值之后,还包括:
    若所述第一自然语句的分值的平均值不大于所述初始自然语句的分值的平均值,或者,所述分值最大的第一自然语句的分值不大于所述分值最大的初始自然语句,调用所述自然语言模型,基于所述分值最大的初始自然语句和所述描述词汇,按照所述参考语法生成一条或多条第二自然语句,所述第二自然语句中的描述词汇的数量大于所述分值最大的初始自然语句中的描述词汇的数量;
    获取所述第二自然语句的分值,将分值最大的第二自然语句作为所述目标自然语句。
  8. 根据权利要求1-3任一所述的方法,其中,所述基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言之后,所述方法还包括:
    获取预测数值,所述预测数值用于指示所述环境元素更新对所述满足条件的自然语句的影响程度;
    若所述预测数值大于参考数值,调用所述自然语言模型生成备选自然语句,将所述备选自然语句代替所述满足条件的自然语句作为所述目标指令的自然语言。
  9. 根据权利要求8所述的方法,其中,所述获取预测数值,包括:
    获取第一预测数值,所述第一预测数值用于指示所述环境元素更新后,所述环境图片由当前状态更新为预测状态的概率,其中,所述当前状态是指所述环境元素更新之前的状态;
    获取第二预测数值,所述第二预测数值用于指示观测到所述当前状态及所述环境元素更新的概率;
    获取第三预测数值,所述第三预测数值用于指示若所述环境图片由当前状态更新为预测状态,对所述满足条件的自然语句的影响程度;
    将所述第一预测数值、所述第二预测数值与所述第三预测数值的乘积作为所述预测数值。
  10. 一种生成自然语言的装置,其中,所述装置包括:
    第一获取模块,用于获取目标指令的内容所包括的目标词汇,以及所述目标指令的环境图片中的环境元素所指示的描述词汇;
    生成模块,用于基于所述目标词汇及所述描述词汇,调用自然语言模型按照参考语法生成一条或多条初始自然语句,所述自然语言模型是根据训练数据集训练过的语言模型,所述训练数据集包括用户描述训练指令的自然语言;
    第二获取模块,用于获取每条初始自然语句的分值;
    选择模块,用于基于每条初始自然语句的分值选择满足条件的自然语句作为所述目标指令的自然语言,所述分值用于指示所述初始自然语句的准确程度。
  11. 根据权利要求10所述的装置,其中,所述第二获取模块,用于对于任一初始自然语句,获取所述初始自然语句的第一分值,所述第一分值用于指示所述初始自然语句与所述训练数据集的匹配程度;根据公开数据集获取所述初始自然语句的第二分值,所述第二分值用于指示所述初始自然语句与所述环境图片的匹配程度,所述公开数据集包括标注了环境元素的多张图片;将所述第一分值与所述第二分值的乘积作为所述初始自然语句的分值。
  12. 根据权利要求11所述的装置,其中,所述第二获取模块,用于对所述初始自然语句进行编码,得到编码后的自然语句信息;根据分值模型中的卷积参数对所述编码后的自 然语句信息和所述环境图片中的信息进行卷积计算,得到卷积结果;根据所述分值模型中的分类参数对所述卷积结果进行计算,得到所述初始自然语句的第二分值,所述卷积参数和所述分类参数是根据所述公开数据集训练得到的参数。
  13. 根据权利要求10-12任一所述的装置,其中,所述选择模块,用于从初始自然语句中选择分值最大的初始自然语句,若所述分值最大的初始自然语句的分值不低于参考阈值,则将所述分值最大的初始自然语句作为所述目标指令的自然语言。
  14. 根据权利要求13所述的装置,其中,所述装置还包括:第三获取模块,用于若所述分值最大的初始自然语句的分值低于所述参考阀值,重新获取分值不低于所述参考阈值的目标自然语句,将所述目标自然语句作为所述目标指令的自然语言。
  15. 根据权利要求14所述的装置,其中,所述第三获取模块,用于调用所述自然语言模型,基于所述目标词汇和所述描述词汇,按照第一语法生成一条或多条第一自然语句,所述第一语法为除所述参考语法外的任一语法;获取所述第一自然语句的分值的平均值和所述初始自然语句的分值的平均值;若所述第一自然语句的分值的平均值大于所述初始自然语句的分值的平均值,且分值最大的第一自然语句的分值大于所述分值最大的初始自然语句,将所述分值最大的第一自然语句作为所述目标自然语句。
  16. 根据权利要求15所述的装置,其中,所述第三获取模块,还用于若所述第一自然语句的分值的平均值不大于所述初始自然语句的分值的平均值,或者,所述分值最大的第一自然语句的分值不大于所述分值最大的初始自然语句,调用所述自然语言模型,基于所述分值最大的初始自然语句和所述描述词汇,按照所述参考语法生成一条或多条第二自然语句,所述第二自然语句中的描述词汇的数量大于所述分值最大的初始自然语句中的描述词汇的数量;获取所述第二自然语句的分值,将分值最大的第二自然语句作为所述目标自然语句。
  17. 根据权利要求10-12任一所述的装置,其中,所述装置还包括:预测模块,用于获取预测数值,所述预测数值用于指示所述环境元素更新对所述满足条件的自然语句的影响程度;若所述预测数值大于参考数值,调用所述自然语言模型生成备选自然语句,将所述备选自然语句代替所述满足条件的自然语句作为所述目标指令的自然语言。
  18. 根据权利要求17所述的装置,其中,所述预测模块,用于获取第一预测数值,所述第一预测数值用于指示所述环境元素更新后,所述环境图片由当前状态更新为预测状态的概率,其中,所述当前状态是指所述环境元素更新之前的状态;获取第二预测数值,所述第二预测数值用于指示观测到所述当前状态及所述环境元素更新的概率;获取第三预测数值,所述第三预测数值用于指示若所述环境图片由当前状态更新为预测状态,对所述满足条件的自然语句的影响程度;将所述第一预测数值、所述第二预测数值与所述第三预测数值的乘积作为所述预测数值。
  19. 一种生成自然语言的设备,其中,所述设备包括存储器及处理器;所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行,以实现权利要求1-9中任一所述的生成自然语言的方法。
  20. 一种可读存储介质,其中,所述存储介质中存储有至少一条指令,所述指令由处理器加载并执行以实现如权利要求1-9中任一所述的生成自然语言的方法。
PCT/CN2019/127634 2019-04-29 2019-12-23 生成自然语言 WO2020220702A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910357502.2 2019-04-29
CN201910357502.2A CN110096707B (zh) 2019-04-29 2019-04-29 生成自然语言的方法、装置、设备及可读存储介质

Publications (1)

Publication Number Publication Date
WO2020220702A1 true WO2020220702A1 (zh) 2020-11-05

Family

ID=67446560

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127634 WO2020220702A1 (zh) 2019-04-29 2019-12-23 生成自然语言

Country Status (2)

Country Link
CN (1) CN110096707B (zh)
WO (1) WO2020220702A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096707B (zh) * 2019-04-29 2020-09-29 北京三快在线科技有限公司 生成自然语言的方法、装置、设备及可读存储介质

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040030540A1 (en) * 2002-08-07 2004-02-12 Joel Ovil Method and apparatus for language processing
CN102455786B (zh) * 2010-10-25 2014-09-03 三星电子(中国)研发中心 一种对中文句子输入法的优化系统及方法
EP2615541A1 (en) * 2012-01-11 2013-07-17 Siemens Aktiengesellschaft Computer implemented method, apparatus, network server and computer program product
CN105975558B (zh) * 2016-04-29 2018-08-10 百度在线网络技术(北京)有限公司 建立语句编辑模型的方法、语句自动编辑方法及对应装置
CN106650789B (zh) * 2016-11-16 2023-04-07 同济大学 一种基于深度lstm网络的图像描述生成方法
CN108319581B (zh) * 2017-01-17 2021-10-08 科大讯飞股份有限公司 一种自然语言语句评价方法及装置
CN107133209B (zh) * 2017-03-29 2020-11-03 北京百度网讯科技有限公司 基于人工智能的评论生成方法及装置、设备与可读介质
CN107193807B (zh) * 2017-05-12 2021-05-28 北京百度网讯科技有限公司 基于人工智能的语言转换处理方法、装置及终端
CN107274903B (zh) * 2017-05-26 2020-05-19 北京搜狗科技发展有限公司 文本处理方法和装置、用于文本处理的装置
CN107702706B (zh) * 2017-09-20 2020-08-21 Oppo广东移动通信有限公司 路径确定方法、装置、存储介质及移动终端
CN108039988B (zh) * 2017-10-31 2021-04-30 珠海格力电器股份有限公司 设备控制处理方法及装置
CN108399427A (zh) * 2018-02-09 2018-08-14 华南理工大学 基于多模态信息融合的自然交互方法
CN108846063B (zh) * 2018-06-04 2020-12-22 北京百度网讯科技有限公司 确定问题答案的方法、装置、设备和计算机可读介质
CN108959250A (zh) * 2018-06-27 2018-12-07 众安信息技术服务有限公司 一种基于语言模型和词特征的纠错方法及其系统
CN109034147B (zh) * 2018-09-11 2020-08-11 上海唯识律简信息科技有限公司 基于深度学习和自然语言的光学字符识别优化方法和系统
CN109614613B (zh) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 图像的描述语句定位方法及装置、电子设备和存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120260263A1 (en) * 2011-04-11 2012-10-11 Analytics Intelligence Limited Method, system and program for data delivering using chatbot
CN105279495A (zh) * 2015-10-23 2016-01-27 天津大学 一种基于深度学习和文本总结的视频描述方法
CN105678297A (zh) * 2015-12-29 2016-06-15 南京大学 一种基于标签转移及lstm模型的人像语义分析的方法及系统
CN109073404A (zh) * 2016-05-02 2018-12-21 谷歌有限责任公司 用于基于地标和实时图像生成导航方向的系统和方法
CN106056207A (zh) * 2016-05-09 2016-10-26 武汉科技大学 一种基于自然语言的机器人深度交互与推理方法与装置
CN110096707A (zh) * 2019-04-29 2019-08-06 北京三快在线科技有限公司 生成自然语言的方法、装置、设备及可读存储介质

Also Published As

Publication number Publication date
CN110096707A (zh) 2019-08-06
CN110096707B (zh) 2020-09-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19927530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19927530

Country of ref document: EP

Kind code of ref document: A1