CN111242721B

CN111242721B - Voice meal ordering method and device, electronic equipment and storage medium

Info

Publication number: CN111242721B
Application number: CN201911401837.6A
Authority: CN
Inventors: 胡江鹭; 李和瀚; 孙辉丰; 丁鑫哲; 孙叔琦; 孙珂; 李婷婷
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2023-10-31
Anticipated expiration: 2039-12-30
Also published as: CN111242721A

Abstract

The application discloses a voice meal ordering method and device, and relates to the technical field of artificial intelligence. The specific implementation scheme is as follows: collecting voice of a user, and performing voice recognition and semantic recognition on the voice to obtain semantic information, the semantic information including intention information and slot information; when the intention information is to place an order, determining an object to be ordered according to the semantic information and generating pre-order information of the object to be ordered; comparing the pre-order information with a necessary information list corresponding to the object to be ordered, and judging whether the pre-order information lacks necessary information; and when the pre-order information does not lack necessary information, executing the order placing operation by combining the pre-order information and the personal information of the user.

Description

Voice meal ordering method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of computer application, in particular to the technical field of artificial intelligence, and especially relates to a voice meal ordering method, a voice meal ordering device, electronic equipment and a computer readable storage medium.

Background

There is a general demand for online meal ordering in daily life. At present, there are two main ways of ordering online: in the first, the user directly contacts a human customer service agent or the merchant to place an order; in the second, the user orders by self-service on a visual page through a meal ordering application or website. In the second way, the user needs to select items from the many commodities on a menu and then place the order and pay, so the ordering process is complex and ordering efficiency is low. The first way is convenient and quick for users and saves them time and effort; for merchants, however, more human customer service agents are required to respond to user demands in time, so the required manpower and cost are high.

Disclosure of Invention

The application provides a voice meal ordering method, a voice meal ordering device, electronic equipment and a computer readable storage medium, wherein order information of a user is determined and ordering is completed by carrying out voice recognition, semantic understanding and order management on voice of the user, so that a large number of customer service personnel are prevented from being arranged at a merchant side, the ordering process of the user is simplified, the ordering cost is reduced, and the ordering efficiency is improved.

An embodiment of a first aspect of the present application provides a voice meal ordering method, including: collecting voice of a user, and performing voice recognition and semantic recognition on the voice to obtain semantic information, the semantic information including intention information and slot information; when the intention information is to place an order, determining an object to be ordered according to the semantic information and generating pre-order information of the object to be ordered; comparing the pre-order information with a necessary information list corresponding to the object to be ordered, and judging whether the pre-order information lacks necessary information; and when the pre-order information does not lack necessary information, executing the order placing operation by combining the pre-order information and the personal information of the user.

In one embodiment of the present application, the voice ordering method further includes: when the necessary information is absent in the pre-order information, acquiring the absent first necessary information, generating inquiry voice by combining the first necessary information and playing the inquiry voice to the user so as to acquire the first necessary information by combining the voice replied by the user; and updating the pre-order information according to the first requisite information until the requisite information is not absent in the pre-order information.

In one embodiment of the present application, the voice ordering method further includes: when the intention information is to place an order, if the object to be ordered cannot be determined according to the semantic information, generating a voice inquiring about the object to be ordered and playing the voice to the user, so as to determine the object to be ordered in combination with the voice replied by the user.

In one embodiment of the present application, the voice ordering method further includes: when the intention information is recommendation, determining an object to be recommended by combining the semantic information; and generating recommended voice according to the object to be recommended and playing the recommended voice to the user so as to be convenient for the user to select.

In one embodiment of the present application, the voice ordering method further includes: in the process of playing the voice, if the voice of the user is collected, carrying out voice recognition and semantic recognition on the collected voice to obtain a voice recognition result and a semantic recognition result; stopping playing the voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition, and processing the collected voice; the preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value, a preset interrupt keyword exists in the voice recognition result, and intention information exists in the semantic recognition result; and continuing to play the voice after the collected voice processing is completed.

According to the voice meal ordering method of the embodiments of the application, voice of a user is collected, and voice recognition and semantic recognition are performed on the voice to obtain semantic information, the semantic information including intention information and slot information; when the intention information is to place an order, an object to be ordered is determined according to the semantic information and pre-order information of the object to be ordered is generated; the pre-order information is compared with a necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks necessary information; and when the pre-order information does not lack necessary information, the order placing operation is executed by combining the pre-order information and the personal information of the user. By performing voice recognition, semantic understanding and order management on the user's voice, the method determines the user's order information and completes the order, which avoids stationing a large number of customer service personnel on the merchant side, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

An embodiment of a second aspect of the present application provides a voice meal ordering apparatus, including: an acquisition module, used for collecting voice of a user and performing voice recognition and semantic recognition on the voice to obtain semantic information, the semantic information including intention information and slot information; a generation module, used for determining an object to be ordered according to the semantic information and generating pre-order information of the object to be ordered when the intention information is to place an order; a comparison module, used for comparing the pre-order information with a necessary information list corresponding to the object to be ordered and judging whether the pre-order information lacks necessary information; and an order placing module, used for executing the order placing operation by combining the pre-order information and the personal information of the user when the pre-order information does not lack necessary information.

In one embodiment of the present application, the voice ordering apparatus further comprises: the first acquisition module and the updating module; the first acquisition module is used for acquiring the first missing necessary information when the necessary information is missing in the pre-order information, generating inquiry voice by combining the first necessary information and playing the inquiry voice to the user so as to acquire the first necessary information by combining the voice replied by the user; the updating module is configured to update the pre-order information according to the first necessary information until the pre-order information is not missing necessary information.

In one embodiment of the present application, when the intention information is to place an order, if the object to be ordered cannot be determined according to the semantic information, the generation module is further configured to generate a voice inquiring about the object to be ordered and play the voice to the user, so as to determine the object to be ordered in combination with the voice replied by the user.

In one embodiment of the present application, the voice ordering apparatus further comprises: a determining module; the determining module is used for determining an object to be recommended according to the semantic information when the intention information is recommendation; the generation module is further used for generating recommended voice according to the object to be recommended and playing the recommended voice to the user so as to be convenient for the user to select.

In one embodiment of the present application, the voice ordering apparatus further comprises: the second acquisition module and the processing module; the second acquisition module is used for carrying out voice recognition and semantic recognition on the collected voice if the voice of the user is collected in the voice playing process, and acquiring a voice recognition result and a semantic recognition result; the processing module is used for stopping playing the voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition and processing the collected voice; the preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value, a preset interrupt keyword exists in the voice recognition result, and intention information exists in the semantic recognition result; and the processing module is also used for continuing to play the voice after the collected voice is processed.

According to the voice meal ordering device of the embodiments of the application, voice of a user is collected, and voice recognition and semantic recognition are performed on the voice to obtain semantic information, the semantic information including intention information and slot information; when the intention information is to place an order, an object to be ordered is determined according to the semantic information and pre-order information of the object to be ordered is generated; the pre-order information is compared with a necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks necessary information; and when the pre-order information does not lack necessary information, the order placing operation is executed by combining the pre-order information and the personal information of the user. By performing voice recognition, semantic understanding and order management on the user's voice, the device determines the user's order information and completes the order, which avoids stationing a large number of customer service personnel on the merchant side, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

An embodiment of a third aspect of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the voice ordering method according to the embodiment of the application.

An embodiment of a fourth aspect of the present application proposes a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the voice ordering method of the embodiment of the present application.

Other effects of the above alternative will be described below in connection with specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:

FIG. 1 is a schematic diagram of a first embodiment according to the present application;

FIG. 2 is a schematic diagram of a recommendation according to one embodiment of the application;

FIG. 3 is a schematic diagram of a recommendation flow according to one embodiment of the application;

FIG. 4 is a schematic diagram of a second embodiment according to the present application;

FIG. 5 is a schematic diagram of a third embodiment according to the present application;

FIG. 6 is a schematic diagram of a fourth embodiment according to the present application;

FIG. 7 is a schematic diagram of a fifth embodiment according to the present application;

FIG. 8 is a block diagram of an electronic device for implementing a voice ordering method according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The following describes a voice meal ordering method, a voice meal ordering device, an electronic device and a computer readable storage medium according to the embodiments of the present application with reference to the accompanying drawings. The voice meal ordering method of the embodiment of the application is executed by a voice meal ordering device.

Fig. 1 is a schematic diagram according to a first embodiment of the present application.

As shown in fig. 1, the voice ordering method may include:

Step 101, collecting voice of a user, performing voice recognition and semantic recognition on the voice, and obtaining semantic information.

In the embodiment of the application, the voice ordering device can provide a voice input interface for a user, and voice data input by the user can be acquired through the interface. For example, the mobile terminal collects voice of the user through the microphone and uploads the collected voice data to the voice ordering device through the voice input interface, so that the voice ordering device obtains the voice data input by the user.

Voice recognition and semantic recognition are then performed on the collected voice data of the user to obtain semantic information, which may include, but is not limited to, intention information and slot information. For example, when voice recognition and semantic recognition are performed on "give me a cup of full-sugar hot latte", the intention information parsed from the sentence is: ordering, and the slot information is: one cup (quantity), full sugar (sugar amount), hot (temperature), and latte (drink type).
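The semantic information for the latte example can be pictured as a small record holding one intent and a set of slots. The following is a minimal sketch only: the class name `SemanticInfo` and the slot keys are hypothetical and are not defined by the application.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticInfo:
    """Hypothetical container for the result of semantic recognition."""
    intent: str                                 # e.g. "order" or "recommend"
    slots: dict = field(default_factory=dict)   # slot name -> recognized value


# Hand-written parse of the latte utterance from the text above.
parsed = SemanticInfo(
    intent="order",
    slots={
        "quantity": 1,           # "one cup"
        "sugar": "full sugar",   # sugar amount
        "temperature": "hot",    # temperature
        "drink": "latte",        # drink type
    },
)
```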

As an example, voice recognition and semantic recognition may be performed on the collected user voice using speech recognition technology and semantic recognition technology. The speech recognition technology may be ASR (Automatic Speech Recognition) technology, such as Baidu speech recognition. Semantic recognition technologies may include, but are not limited to: semantic recognition based on combined semantic derivation, semantic recognition based on an intent classification model and a word-slot entity recognition model, semantic recognition based on an integrated intent and word-slot recognition model, semantic recognition based on sample instance generalization, and semantic recognition by reference resolution.

Thus, the collected voice of the user can be subjected to semantic recognition through different semantic recognition technologies, and examples are as follows:

First example: semantic recognition may be performed on the collected user voice based on combined semantic derivation, that is, the user's requirement is parsed into a structure derived from rules. It will be appreciated that a class of requirements expressed by users generally follows a pattern, and requirements with the same pattern can be generalized into a template that describes them. For example, "I want a cup of coffee" and "give me a cup of milk tea" share the same pattern and can be generalized into the template [kw_want: want] [kw_num: quantity] [kw_drink: drink].

It should be noted that, when no corpus (or only a small corpus) is available, a large number of templates can be constructed quickly by summarizing which component segments a user requirement can be decomposed into; these templates are then used in the combined semantic derivation to parse out intention information and slot information. The corpus can be collected in advance by the relevant personnel.
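One way to picture the template [kw_want] [kw_num] [kw_drink] is as a regular expression assembled from keyword lists. The sketch below is an illustration under stated assumptions: the keyword lists, the function name `parse_with_template`, and the use of regular expressions are all hypothetical, and the application does not specify how templates are matched.

```python
import re

# Hypothetical keyword dictionaries, one per template segment.
KW_WANT = ["i want", "give me"]
KW_NUM = {"a cup of": 1, "one cup of": 1, "two cups of": 2}
KW_DRINK = ["coffee", "milk tea", "latte"]


def _alt(words):
    # Build a regex alternation, longest keyword first.
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# Template: [kw_want] [kw_num] [kw_drink]
TEMPLATE = re.compile(
    rf"^(?P<want>{_alt(KW_WANT)})\s+(?P<num>{_alt(KW_NUM)})\s+(?P<drink>{_alt(KW_DRINK)})$"
)


def parse_with_template(text):
    """Return intent and slots if the text fits the template, else None."""
    match = TEMPLATE.match(text.strip().lower())
    if match is None:
        return None
    return {
        "intent": "order",
        "slots": {"quantity": KW_NUM[match.group("num")], "drink": match.group("drink")},
    }
```

Both example sentences from the text fit this one template, which is the point of generalizing by pattern rather than by exact wording.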

Second example: semantic recognition is performed on the collected user voice based on an intent classification model and a word-slot entity recognition model. The collected user voice is input into the intent classification model to obtain the corresponding intention information, and into the word-slot entity recognition model to obtain the corresponding slot information. It should be noted that, to ensure the accuracy of the two models, both need to be trained on a large corpus.

Third example: semantic recognition is performed on the collected user voice based on an integrated intent and word-slot recognition model. When the collected user voice is input into this model, the model outputs the corresponding intention information and slot information at the same time. It should be noted that, to ensure the model's accuracy, it needs to be trained on a large corpus, with the loss function in the model optimized.

Fourth example: semantic recognition is performed on the collected user voice based on sample instance generalization. A template is automatically abstracted from labeled samples, and sentences with the same or a similar pattern as the currently collected user voice are generalized on the basis of that template to obtain the corresponding intention information and slot information. For example, suppose "I want a cup of coffee" has been labeled with the intention information "ordering" and the slot information one cup (quantity) and coffee (drink type). Because "I want two cups of milk tea" has the same sentence pattern as the labeled sample, its intention information and slot information can be identified in the same way through sample instance generalization. It should be noted that the templates used in sample instance generalization can be produced in two ways: generated automatically, or constructed by the relevant technicians.

Fifth example: the content that a pronoun in the user's voice refers to is determined by reference resolution. That is, on the basis of any of the previous four examples, if the user's voice data contains a pronoun, reference resolution can determine which noun or phrase the pronoun points to, so that the corresponding intention information and slot information can be parsed. The pronouns may include, but are not limited to, ordinal references and fuzzy references. An ordinal reference arises when, in the dialogue between the voice ordering device and the user, the device's sentence offers several options and the user's answer picks one of them directly by its ordinal number. For example, the voice ordering device says: "Would you like coffee, milk tea, or juice?", and the user answers: "The first one", where the ordinal refers to coffee. A fuzzy reference arises when the device's sentence offers several options and the user's answer picks one of them by a keyword. For example, the voice ordering device says: "Would you like pearl milk tea, red bean fresh milk tea, or mango green?", and the user answers: "The red bean one", where "red bean" refers to red bean fresh milk tea.
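The two kinds of reference can be sketched as a lookup against the options the device has just offered. This is a minimal sketch under stated assumptions: the `ORDINALS` lexicon, the function name `resolve_reference`, and the word-overlap rule for fuzzy references are hypothetical stand-ins, not the resolution method of the application.

```python
# Hypothetical ordinal lexicon: ordinal word -> index into the offered options.
ORDINALS = {"first": 0, "second": 1, "third": 2}


def resolve_reference(reply, options):
    """Map a user's reply onto one of the options the device just offered."""
    words = reply.lower().replace(",", " ").split()

    # Ordinal reference: "the first one" selects options[0].
    for word in words:
        index = ORDINALS.get(word)
        if index is not None and index < len(options):
            return options[index]

    # Fuzzy reference: accept only if the reply's words single out one option.
    hits = [o for o in options if any(w in o.lower().split() for w in words)]
    return hits[0] if len(hits) == 1 else None


drinks = ["coffee", "milk tea", "juice"]
teas = ["pearl milk tea", "red bean fresh milk tea", "mango green"]
```

Returning `None` when several options match leaves room for the device to ask again rather than guess.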

Step 102, when the intention information is to place an order, determining an object to be placed according to the semantic information and generating pre-order information of the object to be placed.

In the embodiment of the application, when the intention information is to place an order, whether the object to be ordered can be determined is judged according to the semantic information.

As an example, when the intention information is to place an order, the object to be ordered is determined according to the semantic information and pre-order information of the object to be ordered is generated. In the embodiment of the application, the pre-order information can be formed from the slot information in the semantic information. For example, in "I want a cup of coffee", the object to be ordered is "coffee" and the corresponding pre-order information is "one cup" and "coffee".

As another example, when the intention information is to place an order but the object to be ordered cannot be determined from the semantic information, a voice inquiring about the object to be ordered is generated and played to the user, so that the object can be determined from the user's spoken reply. For example, the user says "I want a drink"; the voice ordering device asks which drink specifically, and the user replies with a drink type (such as orange juice), so that the voice ordering device can determine the object to be ordered.

Step 103, comparing the pre-order information with a necessary information list corresponding to the object to be ordered, and judging whether the pre-order information lacks necessary information.

Step 104, when the necessary information is not absent in the pre-order information, the order placing operation is executed by combining the pre-order information and the personal information of the user.

In the embodiment of the application, the pre-order information can be compared with the necessary information list corresponding to the object to be ordered to judge whether the pre-order information lacks necessary information. The necessary information list corresponding to the object to be ordered may be stored in advance. As shown in Table 1, taking coffee as an example, the necessary information list may include the primary category (coffee), the secondary category (e.g. latte), the temperature (e.g. hot), the cup size (e.g. small cup), the sugar amount (e.g. full sugar), the add-ons (e.g. none) and the like.

Table 1
Primary category	Secondary category	Temperature	Cup size	Sugar amount	Add-ons
Coffee	Latte	Hot	Small cup	Full sugar	None

As an example, after comparing the pre-order information with the necessary information list corresponding to the object to be ordered, when the necessary information is not missing in the pre-order information, the pre-order information and the personal information of the user are combined to execute the order placing operation.

For example, the user says "give me a small cup of full-sugar hot latte, nothing else added". The voice ordering device performs voice recognition on the user's voice data to obtain semantic information, generates the corresponding pre-order information from the semantic information, and compares the pre-order information with the necessary information list (such as primary category, secondary category, temperature, cup size, sugar amount and add-ons) corresponding to the object to be ordered (coffee). After the comparison determines that the pre-order information does not lack necessary information, the order placing operation is executed by combining the pre-order information with the personal information of the user (such as seat number, telephone number, name and address).
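The comparison in steps 103 and 104 amounts to checking each required field against the pre-order and placing the order only when nothing is missing. The sketch below assumes hypothetical field names, a hypothetical `REQUIRED_INFO` mapping and hypothetical functions `missing_info` and `place_order`; the customer details are placeholders.

```python
# Hypothetical necessary-information lists, keyed by primary category.
REQUIRED_INFO = {
    "coffee": ["primary", "secondary", "temperature", "cup_size", "sugar", "add_ons"],
}


def missing_info(pre_order):
    """Return the required fields that the pre-order does not yet contain."""
    required = REQUIRED_INFO[pre_order["primary"]]
    return [name for name in required if pre_order.get(name) is None]


def place_order(pre_order, personal_info):
    """Combine pre-order and personal information once nothing is missing."""
    if missing_info(pre_order):
        raise ValueError("pre-order still lacks necessary information")
    return {"items": [pre_order], "customer": personal_info}


complete = {
    "primary": "coffee", "secondary": "latte", "temperature": "hot",
    "cup_size": "small", "sugar": "full sugar", "add_ons": "none",
}
order = place_order(complete, {"name": "example user", "seat": "12"})
```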

As another example, when the pre-order information lacks necessary information, the first missing item of necessary information is identified, and an inquiry voice is generated for that item and played to the user, so that the item can be obtained from the voice replied by the user; the pre-order information is then updated with that item, and the process repeats until the pre-order information no longer lacks necessary information.

For example, a user says "a cup of hot coffee". The voice ordering device performs voice recognition on the user's voice data to obtain semantic information, generates the corresponding pre-order information from the semantic information, and compares it with the necessary information list (such as primary category, secondary category, temperature, cup size, sugar amount and add-ons) corresponding to the object to be ordered (coffee). After the comparison determines that necessary information (such as secondary category, cup size, sugar amount and add-ons) is missing from the pre-order information, the device generates an inquiry voice according to the missing information and plays it to the user. For example, the voice ordering device asks whether the user wants a latte, a mocha or an Americano, and the user answers "mocha". The voice ordering device updates the pre-order information according to the user's answer and determines which necessary information is still missing; it then continues to ask about the missing information and to update the pre-order according to the user's answers until no necessary information is missing from the pre-order information.
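The ask-and-update cycle in the hot coffee example can be sketched as a loop that keeps asking for the first missing field. This is an illustrative sketch: `fill_missing`, the field names, and the `ask` callback (standing in for playing a question and parsing the spoken reply) are hypothetical.

```python
def fill_missing(pre_order, required, ask):
    """Ask for each missing field in turn until the pre-order is complete.

    `ask` is a hypothetical callback: it receives the missing field name,
    plays a question to the user, and returns the value parsed from the reply.
    A reply of None leaves the field missing, so it is asked again.
    """
    order = dict(pre_order)
    while True:
        missing = [name for name in required if order.get(name) is None]
        if not missing:
            return order
        first = missing[0]          # the first missing necessary information
        order[first] = ask(first)   # update the pre-order with the reply


# Scripted replies standing in for the user's spoken answers.
REQUIRED = ["primary", "secondary", "temperature", "cup_size", "sugar", "add_ons"]
replies = {"secondary": "mocha", "cup_size": "small", "sugar": "full sugar", "add_ons": "none"}
asked = []


def scripted_ask(name):
    asked.append(name)
    return replies[name]


completed = fill_missing({"primary": "coffee", "temperature": "hot"}, REQUIRED, scripted_ask)
```

Recomputing the missing list on every pass means a single reply that fills several fields would shorten the dialogue, although this sketch fills one field per question.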

It should be noted that, in the embodiment of the present application, each pre-order update is determined jointly by the current semantic information and the current pre-order state.

As an example, the user's voice data can be recognized to obtain the corresponding semantic information, the slot information in the semantic information can be segmented, and the pre-order can be updated according to the segmentation result and the current pre-order state. For example, the user says "I want two Americanos, one medium cup and one small cup; the medium cup should be hot and the small cup at room temperature". The slot information in this sentence, "two Americanos, one medium cup, one small cup, the medium cup hot, the small cup at room temperature", can be segmented into: (1) two cups [2], Americano [Americano coffee], one cup [1], medium cup [medium cup], one cup [1], small cup [small cup]; (2) medium cup [medium cup], hot [hot]; (3) small cup [small cup], room temperature [room temperature]. Segments (2) and (3) supplement segment (1), so the updated pre-order information is as shown in Table 2:

Table 2
Primary category	Secondary category	Temperature	Cup size	Sugar amount	Add-ons
Coffee	Americano	Hot	Medium cup		
Coffee	Americano	Room temperature	Small cup		
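The way supplementary segments are folded into the pre-order can be sketched as a merge keyed on a shared slot. This is a minimal sketch: `apply_supplements`, the field names, and the choice of cup size as the matching key are assumptions made for this example only.

```python
def apply_supplements(items, supplements, key="cup_size"):
    """Merge supplementary segments into the pre-order items they refer to.

    A supplement is attached to every item that shares its value for `key`.
    """
    updated = [dict(item) for item in items]   # leave the inputs unchanged
    for extra in supplements:
        for item in updated:
            if item.get(key) == extra.get(key):
                item.update(extra)
    return updated


# Segment (1): two Americanos, one medium cup and one small cup.
items = [
    {"primary": "coffee", "secondary": "Americano", "cup_size": "medium"},
    {"primary": "coffee", "secondary": "Americano", "cup_size": "small"},
]
# Segments (2) and (3): temperature details for each cup size.
supplements = [
    {"cup_size": "medium", "temperature": "hot"},
    {"cup_size": "small", "temperature": "room temperature"},
]
pre_orders = apply_supplements(items, supplements)
```

The sugar amount and add-ons remain unset after the merge, so the device would still need to ask for them before placing the order.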

In conclusion, by performing voice recognition, semantic understanding and order management on the user's voice, the order information of the user is determined and the order is placed, which avoids stationing a large number of customer service personnel on the merchant side, simplifies the user's ordering process, reduces ordering cost and improves ordering efficiency.

To make voice ordering more intelligent, a user who does not know which items are on the menu can actively ask the voice ordering device for a recommendation. Optionally, when the intention information is recommendation, an object to be recommended is determined in combination with the semantic information, and a recommendation voice is generated according to the object to be recommended and played to the user for the user to choose from. For example, as shown in FIG. 2, in a coffee shop ordering scenario, the recommendation depends on which primary-category and secondary-category slot information appears in the user's semantic information.

For example, FIG. 3 is a schematic diagram of a recommendation flow according to an embodiment of the application. In a coffee shop ordering scenario, beverage types such as coffee, milk tea and fruit juice are primary categories, while specific beverages such as latte, pearl milk tea and freshly squeezed orange juice are secondary categories. For example, when a user telephones to order a meal, the voice ordering device automatically plays a welcome message once the dialogue starts, and then converses with the user to collect the information required for the order. When the user does not know which categories are on the menu, the user can actively ask the voice ordering device for recommendations. At this point, the voice ordering device checks, according to the semantic information corresponding to the user's voice, whether a secondary category is present. When a secondary category is present, the category information is considered collected; when a primary category is present but no secondary category, the device asks about the secondary category; when neither is present, the device asks about the primary category. After the user selects a category based on the recommendation and the category information is determined, the voice ordering device judges whether the pre-order lacks necessary information; if so, it asks for the missing information until the necessary information in each pre-order is complete, and then confirms the order with the user. Finally, the device collects the user's personal information, including telephone number, name and address, plays a closing message, and the dialogue ends with the order completed.
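The category check in the recommendation flow reduces to a three-way decision. The sketch below assumes that the middle branch applies when a primary category is present without a secondary one; the function name `next_category_step`, the slot keys and the returned labels are hypothetical.

```python
def next_category_step(slots):
    """Decide what the device should do next about category information."""
    if slots.get("secondary"):
        return "category_collected"   # e.g. "latte" already identifies the item
    if slots.get("primary"):
        return "ask_secondary"        # e.g. "coffee": recommend latte, mocha, ...
    return "ask_primary"              # nothing known: recommend coffee, milk tea, ...
```

Checking the secondary category first reflects that a specific beverage already identifies what the user wants.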

It should be noted that the voice ordering device plays voice to the user at several points during the interaction. For example, when the intention information is to place an order but the object to be ordered cannot be determined according to the semantic information, a voice inquiring about the object to be ordered is generated and played to the user; or, when the intention information is recommendation, the object to be recommended is determined in combination with the semantic information, and recommended voice is generated according to the object to be recommended and played to the user. If the voice of the user is collected during playing, voice recognition and semantic recognition are performed on the collected voice to obtain a voice recognition result and a semantic recognition result; when the voice recognition result or the semantic recognition result meets a preset interrupt condition, playing of the voice is stopped and the collected voice is processed. The preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value; a preset interrupt keyword exists in the voice recognition result; intention information exists in the semantic recognition result. After processing of the collected voice is completed, playing of the voice continues.

For example, if the user suddenly starts speaking during playing, the voice ordering device performs voice recognition and semantic recognition on the voice of the user. When the number of words spoken by the user exceeds the preset word number threshold, the user is considered to have interrupted, and the device stops playing the voice and starts listening to what the user is saying; if the number of words does not reach the preset word number threshold, the words spoken by the user are ignored and playing of the voice is not stopped. As another example, when an actual intention can be parsed from what the user says, for example "I'm a bit thirsty today, bring me a cup of boiled water first", the voice ordering device stops playing the voice, generates new pre-order information according to steps 101-104, then continues playing the previous voice and finishes completing and placing the previous pre-order, after which it completes and places the new pre-order. As yet another example, when what the user says includes a specific custom keyword, for example "wait, let me hear that again", the voice ordering device stops playing the voice and replays the previously played voice.
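The preset interrupt condition described above can be sketched as a single predicate. This is a minimal illustration, not the implementation of the application: the `Recognition` type, the default threshold of 5 and the keyword list are hypothetical, and counting words by splitting on whitespace is an assumption that would need to be replaced (for example by character counting) for languages such as Chinese.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Recognition:
    text: str              # voice recognition result
    intent: Optional[str]  # intention from semantic recognition; None if none was parsed


def should_interrupt(
    result: Recognition,
    word_threshold: int = 5,                     # hypothetical preset word number threshold
    keywords: Sequence[str] = ("wait", "stop"),  # hypothetical preset interrupt keywords
) -> bool:
    """Return True if any one of the three interrupt conditions is met."""
    too_long = len(result.text.split()) > word_threshold
    has_keyword = any(k in result.text.lower() for k in keywords)
    has_intent = result.intent is not None
    return too_long or has_keyword or has_intent
```

Under these assumptions, a short acknowledgement such as "uh huh" with no parsed intention does not stop playback, whereas an utterance carrying an order intention does.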

In the embodiment of the application, the voice of a user is collected, and voice recognition and semantic recognition are performed on the voice to obtain semantic information; the semantic information includes: intention information and slot information; when the intention information is to place an order, an object to be ordered is determined according to the semantic information and pre-order information of the object to be ordered is generated; the pre-order information is compared with a necessary information list corresponding to the object to be ordered, and whether the pre-order information lacks necessary information is judged; when the pre-order information does not lack necessary information, the order placing operation is executed in combination with the pre-order information and the personal information of the user. In this method, voice recognition, semantic understanding and order management are performed on the voice of the user, the order information of the user is determined and the order is placed, so that the merchant does not need to staff a large number of customer service personnel, the ordering flow for the user is simplified, the ordering cost is reduced, and the ordering efficiency is improved.

Corresponding to the voice meal ordering methods provided in the above embodiments, an embodiment of the present application further provides a voice meal ordering device. Since the voice meal ordering device provided in this embodiment corresponds to the voice meal ordering methods provided in the above embodiments, the implementations of the voice meal ordering method are also applicable to the voice meal ordering device and are not described in detail again here. Fig. 4 is a schematic diagram according to a second embodiment of the present application. As shown in fig. 4, the voice ordering device 400 includes: an acquisition module 410, a generation module 420, a comparison module 430 and an ordering module 440.

The acquisition module 410 is configured to collect voice of a user, perform voice recognition and semantic recognition on the voice, and obtain semantic information; the semantic information includes: intention information and slot information; the generation module 420 is configured to determine an object to be ordered according to the semantic information and generate pre-order information of the object to be ordered when the intention information is to place an order; the comparison module 430 is configured to compare the pre-order information with a necessary information list corresponding to the object to be ordered, and determine whether the pre-order information lacks necessary information; the ordering module 440 is configured to perform an order placing operation in combination with the pre-order information and the personal information of the user when the pre-order information does not lack necessary information.
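The check performed by the comparison module 430 can be sketched as a lookup against a per-object list of necessary information. The following Python sketch is illustrative only: the `REQUIRED_INFO` table, its field names and the dictionary representation of a pre-order are assumptions, not details given in the application.

```python
from typing import Dict, List

# Hypothetical necessary information lists, keyed by the object to be ordered.
REQUIRED_INFO: Dict[str, List[str]] = {
    "latte": ["size", "temperature", "quantity"],
    "boiled water": ["quantity"],
}


def missing_info(pre_order: Dict[str, str]) -> List[str]:
    """Return the necessary fields that the pre-order does not yet contain."""
    required = REQUIRED_INFO.get(pre_order["item"], [])
    # A field counts as missing if it is absent or empty.
    return [field for field in required if not pre_order.get(field)]
```

An empty result corresponds to the case in which the pre-order information does not lack necessary information, so the order placing operation can proceed.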

As a possible implementation manner of the embodiment of the present application, as shown in fig. 5, fig. 5 is a schematic diagram according to a third embodiment of the present application on the basis of the one shown in fig. 4. The voice ordering device 400 further includes: a first acquisition module 450 and an update module 460.

The first acquisition module 450 is configured to, when the pre-order information lacks necessary information, acquire the missing first necessary information, generate inquiry voice in combination with the first necessary information, and play the inquiry voice to the user, so as to obtain the first necessary information in combination with the voice replied by the user; the update module 460 is configured to update the pre-order information according to the first necessary information until the pre-order information does not lack necessary information.
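The cooperation of the first acquisition module 450 and the update module 460 amounts to a loop that asks for one missing item at a time. A minimal Python sketch follows; the `ask` callback stands in for generating the inquiry voice, playing it and recognizing the reply, and both it and the `max_turns` safeguard are hypothetical additions for illustration.

```python
from typing import Callable, Dict, List


def complete_pre_order(
    pre_order: Dict[str, str],
    required: List[str],
    ask: Callable[[str], str],  # hypothetical: asks the user for a field, returns the reply
    max_turns: int = 10,        # hypothetical safeguard against endless questioning
) -> Dict[str, str]:
    """Ask for missing necessary fields one at a time and update the pre-order."""
    order = dict(pre_order)  # work on a copy so the caller's pre-order is untouched
    for _ in range(max_turns):
        missing = [f for f in required if not order.get(f)]
        if not missing:
            return order  # no necessary information is missing any more
        first = missing[0]
        reply = ask(first)
        if reply:
            order[first] = reply
    return order  # may still be incomplete if the user never answered
```

The caller would re-check the returned pre-order against the necessary information list before placing the order, since the sketch stops asking after `max_turns` attempts.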

As a possible implementation manner of the embodiment of the present application, the generation module 420 is further configured to, when the intention information is to place an order and the object to be ordered is not determined according to the semantic information, generate a voice inquiring about the object to be ordered and play the voice to the user, so as to determine the object to be ordered in combination with the voice replied by the user.

As a possible implementation manner of the embodiment of the present application, as shown in fig. 6, fig. 6 is a schematic diagram according to a fourth embodiment of the present application on the basis of the one shown in fig. 4. The voice ordering device 400 further includes: a determination module 470.

The determining module 470 is configured to determine, when the intent information is recommendation, an object to be recommended in combination with the semantic information; the generating module 420 is further configured to generate recommended voices according to the object to be recommended and play the recommended voices to the user for selection by the user.

As a possible implementation manner of the embodiment of the present application, as shown in fig. 7, fig. 7 is a schematic diagram according to a fifth embodiment of the present application on the basis of fig. 6. The voice ordering device 400 further includes: a second acquisition module 480 and a processing module 490.

The second acquisition module 480 is configured to, if the voice of the user is collected while the voice is being played, perform voice recognition and semantic recognition on the collected voice, and obtain a voice recognition result and a semantic recognition result; the processing module 490 is configured to stop playing the voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition, and process the collected voice; the preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value, a preset interrupt keyword exists in the voice recognition result, and intention information exists in the semantic recognition result; the processing module 490 is further configured to continue playing the voice after processing of the collected voice is completed.

The voice meal ordering device of the embodiment of the application collects the voice of a user and performs voice recognition and semantic recognition on the voice to obtain semantic information; the semantic information includes: intention information and slot information; when the intention information is to place an order, an object to be ordered is determined according to the semantic information and pre-order information of the object to be ordered is generated; the pre-order information is compared with a necessary information list corresponding to the object to be ordered, and whether the pre-order information lacks necessary information is judged; when the pre-order information does not lack necessary information, the order placing operation is executed in combination with the pre-order information and the personal information of the user. In this way, voice recognition, semantic understanding and order management are performed on the voice of the user, the order information of the user is determined and the order is placed, so that the merchant does not need to staff a large number of customer service personnel, the ordering flow for the user is simplified, the ordering cost is reduced, and the ordering efficiency is improved.

According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

Fig. 8 is a block diagram of an electronic device for the voice ordering method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 8, the electronic device includes: one or more processors 801, memory 802, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is taken as an example in fig. 8.

Memory 802 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to execute the voice meal ordering method provided by the application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the voice ordering method provided by the present application.

The memory 802 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the voice ordering method in the embodiment of the present application (e.g., the acquisition module 410, the generation module 420, the comparison module 430, the ordering module 440 shown in fig. 4, the first acquisition module 450 and the update module 460 shown in fig. 5, the determination module 470 shown in fig. 6, the second acquisition module 480 and the processing module 490 shown in fig. 7). The processor 801 executes various functional applications of the server and data processing, i.e., implements the voice ordering method in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 802.

Memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and at least one application program required for a function; the storage data area may store data created from the use of the electronic device for voice ordering, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to the electronic device for voice ordering over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the voice ordering method may further include: an input device 803 and an output device 804. The processor 801, memory 802, input device 803, and output device 804 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 8.

The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for voice ordering, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output device 804 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solution disclosed in the present application are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims

1. A voice meal ordering method is characterized by comprising the following steps:

collecting voice of a user, performing voice recognition and semantic recognition on the voice, and obtaining semantic information; the semantic information includes: intention information and slot information;

when the intention information is to place an order, determining an object to be ordered according to the semantic information and generating pre-order information of the object to be ordered;

comparing the pre-order information with a necessary information list corresponding to the object to be ordered, and judging whether the pre-order information lacks necessary information;

when the pre-order information does not lack necessary information, executing an order placing operation by combining the pre-order information and the personal information of the user;

the method further comprises the steps of:

when the necessary information is absent in the pre-order information, acquiring the absent first necessary information, generating inquiry voice by combining the first necessary information and playing the inquiry voice to the user so as to acquire the first necessary information by combining the voice replied by the user;

updating the pre-order information according to the first necessary information until the pre-order information does not lack necessary information, wherein each update of the pre-order information is jointly determined by combining the current semantic information and the current pre-order state;

in the process of playing the voice, if the voice of the user is collected, carrying out voice recognition and semantic recognition on the collected voice to obtain a voice recognition result and a semantic recognition result;

stopping playing the voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition, and processing the collected voice; the preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value, a preset interrupt keyword exists in the voice recognition result, and intention information exists in the semantic recognition result;

and continuing to play the voice after the collected voice processing is completed.

2. The method as recited in claim 1, further comprising:

and when the intention information is to make an order, if the object to be made is not determined according to the semantic information, generating voice for inquiring the object to be made and playing the voice to the user so as to determine the object to be made by combining the voice replied by the user.

3. The method as recited in claim 1, further comprising:

when the intention information is recommendation, determining an object to be recommended by combining the semantic information;

and generating recommended voice according to the object to be recommended and playing the recommended voice to the user so as to be convenient for the user to select.

4. A voice ordering device, comprising:

the acquisition module is used for acquiring voice of a user, carrying out voice recognition and semantic recognition on the voice and acquiring semantic information; the semantic information includes: intention information and slot information;

the generation module is used for determining an object to be ordered according to the semantic information and generating pre-order information of the object to be ordered when the intention information is to place an order;

the comparison module is used for comparing the pre-order information with a necessary information list corresponding to the object to be ordered and judging whether the pre-order information lacks necessary information or not;

the order placing module is used for executing order placing operation by combining the pre-order information and the personal information of the user when the pre-order information does not lack necessary information;

the device further comprises: the first acquisition module and the updating module;

the first acquisition module is used for acquiring the missing first necessary information when the necessary information is missing in the pre-order information, generating inquiry voice by combining the first necessary information and playing the inquiry voice to the user so as to acquire the first necessary information by combining the voice replied by the user;

the updating module is configured to update the pre-order information according to the first necessary information until the pre-order information does not lack necessary information, where each update of the pre-order information is jointly determined by combining the current semantic information and the current pre-order state;

the device further comprises: the second acquisition module and the processing module;

the second acquisition module is used for carrying out voice recognition and semantic recognition on the collected voice if the voice of the user is collected in the voice playing process, and acquiring a voice recognition result and a semantic recognition result;

the processing module is used for stopping playing the voice when the voice recognition result or the semantic recognition result meets a preset interrupt condition and processing the collected voice; the preset interrupt condition includes any one or more of the following conditions: the number of words in the voice recognition result is larger than a preset word number threshold value, a preset interrupt keyword exists in the voice recognition result, and intention information exists in the semantic recognition result;

and the processing module is also used for continuing to play the voice after the collected voice is processed.

5. The apparatus of claim 4, wherein the generating module is further configured to, when the intention information is to place an order, if the object to be placed is not determined according to the semantic information, generate a voice for querying the object to be placed and play the voice to the user, so as to determine the object to be placed in combination with the voice replied by the user.

6. The apparatus as recited in claim 4, further comprising: a determining module;

the determining module is used for determining an object to be recommended according to the semantic information when the intention information is recommendation;

the generation module is further used for generating recommended voice according to the object to be recommended and playing the recommended voice to the user so as to be convenient for the user to select.

7. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.

8. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-3.