CN113012697A - Information interaction method and device and electronic equipment

Information interaction method and device and electronic equipment

Info

Publication number
CN113012697A
CN113012697A (application CN202110246012.2A)
Authority
CN
China
Prior art keywords
word
awakening
wake
audio data
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110246012.2A
Other languages
Chinese (zh)
Inventor
孙建伟
王飞
罗讷
李武波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202110246012.2A
Publication of CN113012697A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/26 Speech to text systems
    • G10L2015/223 Execution procedure of a spoken command

Abstract

Embodiments of the invention disclose an information interaction method and apparatus, and an electronic device. Upon receiving audio information, the method determines the wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model, creates a task according to the trip information corresponding to that wake-up word, and sends the task creation result to the target user terminal. A user can thus create a task in one step by speaking a wake-up word, which simplifies the operation of task creation and improves the user experience.

Description

Information interaction method and device and electronic equipment
Technical Field
The invention relates to the field of computer technology, and in particular to an information interaction method and apparatus, and an electronic device.
Background
Smart devices have greatly facilitated users' daily lives, yet some users, such as the elderly, cannot easily cope with the complicated interactive operations of smart devices owing to factors such as declining memory, which causes them inconvenience. In the ride-hailing field, for example, memory decline may make hailing a car troublesome for elderly users, and some elderly users cannot adapt to the complicated interactions of a smartphone app, which increases the difficulty of their travel.
Disclosure of Invention
In view of this, embodiments of the present invention provide an information interaction method and apparatus, and an electronic device, which enable a user to create a task in one step via a wake-up word, thereby simplifying the operation of task creation and improving the user experience.
In a first aspect, an embodiment of the present invention provides an information interaction method, the method comprising:
receiving audio information;
determining a wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model;
creating a task according to the trip information corresponding to the wake-up word; and
sending the task creation result to the target user terminal.
In a second aspect, an embodiment of the present invention provides an information interaction apparatus, the apparatus comprising:
a receiving unit configured to receive audio information;
a wake-up word determining unit configured to determine a wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model;
a task creating unit configured to create a task according to the trip information corresponding to the wake-up word; and
a sending unit configured to send the task creation result to the target user terminal.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, where the one or more computer program instructions are executed by the processor to implement the method according to the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method according to the first aspect of the embodiment of the present invention.
In a fifth aspect, embodiments of the present invention provide a computer program product, which when run on a computer causes the computer to perform the method according to the first aspect of embodiments of the present invention.
In an embodiment of the invention, upon receiving audio information, the wake-up word corresponding to the audio information is determined according to at least one pre-trained wake-up word model, a task is created according to the trip information corresponding to the wake-up word, and the task creation result is sent to the target user terminal. A user can thus create a task in one step via a wake-up word, which simplifies the operation of task creation and improves the user experience.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an information interaction method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a wake-up word setting method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a wake-up word setting method according to an embodiment of the present invention;
FIG. 4 is a flowchart of another wake-up word setting method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another wake-up word setting method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an information interaction process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an information interaction apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the following embodiments, task creation is described mainly in the context of the ride-hailing application field. It should be understood that this embodiment does not limit the application field; other fields, for example logistics fields such as express delivery, may also use the interaction method of this embodiment for task creation.
It should be understood that, in any of the embodiments herein, the relevant user information obtained, such as account information or location information, is used to create tasks for the user only after the user's authorization has been obtained.
FIG. 1 is a flowchart of an information interaction method according to an embodiment of the present invention. As shown in FIG. 1, the information interaction method of the embodiment includes the following steps:
Step S110, receiving audio information. In an optional implementation, the target user uploads the audio information through an application in the target user terminal or through an applet embedded in such an application. Optionally, a task creation control is provided in a relevant page of the application, and the audio information is uploaded by triggering the task creation control.
Step S120, determining the wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model.
In an optional implementation, each wake-up word has a corresponding wake-up word model, and the wake-up word model is trained on audio data that includes samples of the corresponding wake-up word. Optionally, assuming the wake-up word is "go home", audio data is collected that comprises a plurality of wake-word samples containing "go home" and a plurality of speech samples not containing "go home". Optionally, the "go home" audio may be collected from the same user or from different users, and the content of the samples without the wake-up word "go home" may be the same or may differ.
The wake-word samples containing "go home" are taken as positive sample data, and the speech samples not containing "go home" are taken as negative sample data. Optionally, the positive sample data is text-labeled. Optionally, the positive and negative sample data are preprocessed, for example by frequency-division processing, and fbank features (for example, 30-dimensional fbank features) are extracted from the preprocessed positive and negative sample data. The fbank features of the positive and negative samples are then fed into the initial model for training, yielding the wake-up word model corresponding to the wake-up word "go home", as sketched below.
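For illustration, a minimal feature-preparation sketch follows, using the torchaudio library. The 30-dimensional fbank setting comes from the example above; the file paths, the helper names, and the simplified binary labeling (the embodiment's CTC training would instead use the text labels of the positive samples) are assumptions rather than the configuration of this embodiment.

```python
# A minimal sketch of preparing fbank features for wake-word training.
# Assumes mono WAV clips; paths and helper names are illustrative.
import torch
import torchaudio

def extract_fbank(path: str, num_mel_bins: int = 30) -> torch.Tensor:
    """Load one clip and return a (frames, num_mel_bins) fbank matrix."""
    waveform, sample_rate = torchaudio.load(path)
    return torchaudio.compliance.kaldi.fbank(
        waveform, num_mel_bins=num_mel_bins, sample_frequency=sample_rate
    )

def build_dataset(positive_paths: list[str], negative_paths: list[str]):
    """Clips containing the wake word are labeled 1, all others 0."""
    features, labels = [], []
    for label, paths in ((1, positive_paths), (0, negative_paths)):
        for path in paths:
            features.append(extract_fbank(path))
            labels.append(label)
    return features, labels
```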
In an optional implementation, the wake-up word model in this embodiment is a network model based on a CRNN (Convolutional Recurrent Neural Network). A CRNN is a CNN (convolutional layers) + RNN (recurrent layers) + CTC (loss) architecture for end-to-end recognition of variable-length data sequences. The CNN extracts features from the audio data; the RNN predicts over the feature sequence, learning each feature vector in the sequence and outputting a predicted label distribution; and the CTC loss converts the series of label distributions produced by the RNN layer into a final label sequence. Because the RNN and CTC allow the model to learn the contextual relationships within the audio data, this embodiment can improve both recognition accuracy and model robustness. A rough model sketch follows.
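The following is a rough CRNN sketch in PyTorch consistent with the architecture described above (convolutional layers, recurrent layers, CTC loss); the layer sizes and label vocabulary are illustrative assumptions, not the actual model of this embodiment.

```python
# A rough CRNN sketch, assuming 30-dim fbank input as described above.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, feat_dim: int = 30, hidden: int = 128, num_labels: int = 10):
        super().__init__()
        # CNN: extract local features from the (frames, feat_dim) spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool only along the feature axis
        )
        # RNN: model temporal context over the frame sequence.
        self.rnn = nn.GRU(32 * (feat_dim // 2), hidden, batch_first=True,
                          bidirectional=True)
        # Per-frame label distribution; index 0 is reserved for the CTC blank.
        self.fc = nn.Linear(2 * hidden, num_labels + 1)
        self.ctc = nn.CTCLoss(blank=0)  # applied during training

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, feat_dim) -> (batch, 1, frames, feat_dim)
        h = self.cnn(x.unsqueeze(1))
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.rnn(h)
        # CTCLoss expects (frames, batch, labels) log-probabilities.
        return self.fc(h).log_softmax(dim=-1).transpose(0, 1)
```

During training, the CTC loss would be applied to the per-frame log-probabilities returned by `forward` together with the text labels of the positive samples.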
In an optional implementation, the target user may set multiple wake-up words bound to multiple different pieces of trip information. Optionally, the wake-up words set by the target user may consist only of fixed wake-up words provided by the server, only of user-defined wake-up words, or of a mix of both. That is, the target user may select a server-provided fixed wake-up word and enable one-step task creation based on it, and may also define a custom wake-up word, upload audio data of that custom wake-up word for training to obtain its wake-up word model, and thereby enable one-step task creation based on the custom wake-up word. Optionally, the setting methods for fixed and custom wake-up words are shown in FIG. 2 and FIG. 4, respectively.
FIG. 2 is a flowchart of a wake-up word setting method according to an embodiment of the present invention. In an optional implementation, the wake-up word model of at least one fixed wake-up word is trained in advance for the target user to select from. As shown in FIG. 2, the wake-up word setting method of the embodiment includes the following steps:
and step S210, controlling the fixed awakening words to be displayed on the page of the target user terminal. Optionally, the fixed wakeup word of the wakeup word model trained in advance is controlled to be displayed on the page of the application program in the target user terminal or the applet embedded in the application program in the target user terminal, so that the user can select the fixed wakeup word.
Step S220, receiving wake-word audio information input by the target user. Optionally, the target user may input audio corresponding to the selected fixed wake-up word, that is, audio containing the wake-up word, through a voice input box of the application in the target user terminal or of an applet embedded in that application.
Step S230, determining the wake-up word model corresponding to the wake-word audio information. In an optional implementation, speech recognition is performed on the wake-word audio to determine the corresponding text information, and semantic recognition is performed on that text to determine the matching wake-up word model. Optionally, an ASR (Automatic Speech Recognition) model performs the speech recognition and determines the corresponding text. Optionally, an NLU (Natural Language Understanding) model performs semantic recognition on the text, with fuzzy matching applied to the semantic recognition result to determine the wake-up word model corresponding to the wake-word audio, as sketched below.
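As a simplified illustration of this matching step: `asr_transcribe` below stands in for whatever ASR model the platform uses, and the fuzzy match uses `difflib` from the Python standard library, which is an assumption, since the embodiment does not name a specific matching algorithm.

```python
# A simplified sketch of matching recognized text to a registered wake word.
import difflib

def match_wake_word(audio_path: str, registered_wake_words: list[str],
                    asr_transcribe, cutoff: float = 0.6) -> str | None:
    text = asr_transcribe(audio_path)  # speech recognition -> text
    # Fuzzy-match the recognized text against the registered wake words.
    hits = difflib.get_close_matches(text, registered_wake_words,
                                     n=1, cutoff=cutoff)
    return hits[0] if hits else None
```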
Step S240, binding the fixed wake-up word selected by the target user to the trip information input by the target user. In an optional implementation, the wake-up word model corresponding to the selected fixed wake-up word is sent to the target user terminal for online or offline use, the target user terminal is controlled to display a trip information input box on its page, the trip information entered by the target user in that box is obtained, and the selected fixed wake-up word is bound to that trip information. The trip information may include a travel route. Taking the ride-hailing application scenario as an example, it may further include information such as the ride-hailing vehicle type, for example taxi, express car, or carpool. For instance, the wake-up word may be "go home", the corresponding travel route may be "east door of cell A to building B", and the ride-hailing type may be taxi.
FIG. 3 is a schematic diagram of a wake-up word setting method according to an embodiment of the present invention. Taking the ride-hailing application scenario as an example, as shown in FIG. 3, a wake-up word is set in the ride-hailing application page 31 of the target user terminal, where the fixed wake-up words "go to work", "go home", and "go to school" are displayed. In an optional implementation, the target user inputs wake-word audio through the voice input box 311 of page 31; speech recognition is performed on this audio to determine the corresponding text, and semantic recognition is performed on the text to determine the matching wake-up word model. Optionally, assuming the fixed wake-up word selected by the user is "go to work", after the target user terminal downloads the wake-up word model for "go to work", the terminal page switches to page 32, which may include a trip information input box 321. Optionally, the target user may input trip information through the voice input box 322; speech recognition is performed on this audio at the target user terminal or at the service platform, and the recognized text is filled into the trip information input box 321. Alternatively, the target user may manually modify the information in the input box 321 to ensure its accuracy, or may fill in the trip information directly. After confirming that the wake-up word and trip information are correct, the target user binds them by triggering the confirmation key "OK". As shown in FIG. 3, the wake-up word "go to work" is bound to the trip information "taxi from the east door of cell A to building B".
In another optional implementation, the display boxes showing the fixed wake-up words "go to work", "go home", and "go to school" may themselves serve as selection controls; that is, triggering one of these boxes determines the selected fixed wake-up word. The wake-up word model corresponding to the selected word is then sent to the target user terminal for online or offline use, the terminal is controlled to display a trip information input box on its page, the trip information entered by the target user is obtained, and the selected fixed wake-up word is bound to that trip information.
FIG. 4 is a flowchart of another wake-up word setting method according to an embodiment of the present invention. In an optional implementation, the user may define custom wake-up words. As shown in FIG. 4, the wake-up word setting method of the embodiment includes the following steps:
step S310, receiving a plurality of wake-up word audio data with the self-defined wake-up word input by the target user. Optionally, the target user is guided to input multiple pieces of wakeup word audio data in response to the target user triggering the custom wakeup word control.
Step S320, obtaining a plurality of pieces of non-wake-word audio data that do not contain the custom wake-up word. Optionally, the non-wake-word audio data may be obtained from an audio database, or the user may be guided to input it; this embodiment does not limit the source.
Step S330, training a wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data. Optionally, positive samples are determined from the wake-word audio data and negative samples from the non-wake-word audio data, and the wake-up word model corresponding to the custom wake-up word is trained on these samples. Optionally, at least part of the wake-word audio data is set aside as a positive test set and at least part of the non-wake-word audio data as a negative test set; the trained model is tested against these sets to determine its performance parameters, and the trained model is accepted in response to the performance parameters meeting a predetermined condition. Optionally, the predetermined condition may be that the recognition accuracy of the wake-up word model exceeds a predetermined threshold, as illustrated below.
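A minimal sketch of such an acceptance test follows; `model_predict` is a hypothetical stand-in for the trained model's inference call, and the 0.95 accuracy threshold is an illustrative example of the predetermined condition, which the embodiment does not fix.

```python
# An illustrative acceptance test against held-out positive and negative clips.
def passes_acceptance(model_predict, positive_test, negative_test,
                      threshold: float = 0.95) -> bool:
    correct = sum(1 for clip in positive_test if model_predict(clip) == 1)
    correct += sum(1 for clip in negative_test if model_predict(clip) == 0)
    accuracy = correct / (len(positive_test) + len(negative_test))
    return accuracy >= threshold
```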
Optionally, in this embodiment the wake-up word model is a CRNN-based network model, and the training procedure based on positive and negative samples is as described above, so it is not repeated here.
When collecting audio data, requiring too many recordings burdens the target user, while too few recordings reduce the recognition accuracy of the trained wake-up word model. Therefore, in one implementation of this embodiment, the wake-word audio data is augmented by noise-adding and/or speed-changing processing, and positive samples are determined from the augmented wake-word audio data. In this way more positive samples are obtained without increasing the user's burden, improving the recognition accuracy and robustness of the trained model, as sketched below.
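A minimal augmentation sketch follows, using additive noise and a speed change implemented by resampling; the signal-to-noise ratios and speed factors are illustrative assumptions, as the embodiment does not specify them.

```python
# A minimal sketch of noise-adding and speed-changing augmentation.
import numpy as np
from scipy.signal import resample

def add_noise(waveform: np.ndarray, snr_db: float = 20.0) -> np.ndarray:
    """Add Gaussian noise at the given signal-to-noise ratio."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), waveform.shape)
    return waveform + noise

def change_speed(waveform: np.ndarray, factor: float = 1.1) -> np.ndarray:
    # Resample to fewer/more samples while keeping the nominal rate,
    # so playback is faster/slower.
    return resample(waveform, int(len(waveform) / factor))

def augment(waveform: np.ndarray) -> list[np.ndarray]:
    """Produce several augmented variants of one positive clip."""
    return ([add_noise(waveform, snr) for snr in (10.0, 20.0)]
            + [change_speed(waveform, f) for f in (0.9, 1.1)])
```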
Step S340, binding the custom wake-up word to the trip information input by the target user. In an optional implementation, the wake-up word model corresponding to the custom wake-up word is sent to the target user terminal for online or offline use, the terminal is controlled to display a trip information input box on its page, the trip information entered by the target user is obtained, and the custom wake-up word is bound to that trip information. The trip information may include a travel route. Taking the ride-hailing application scenario as an example, it may further include information such as the ride-hailing vehicle type, for example taxi, express car, or carpool. For instance, the custom wake-up word may be "go home", the corresponding travel route may be "east door of cell A to building B", and the ride-hailing type may be taxi.
FIG. 5 is a schematic diagram of another wake-up word setting method according to an embodiment of the present invention. Taking the ride-hailing application scenario as an example, as shown in FIG. 5, a wake-up word is set in the ride-hailing application page 51 of the target user terminal, where the fixed wake-up words "go to work", "go home", and "go to school" and a custom-wake-word control 511 are displayed. In this embodiment, in response to the control 511 being triggered, or to recognizing that the audio input through the voice input box 512 is a sentence like "set a custom wake-up word", the terminal page switches to page 52. In page 52, the target user terminal is controlled to guide the target user to input multiple pieces of wake-word audio data. Optionally, a prompt bubble 522 is displayed in page 52 containing prompt information such as "please long-press and say at least 30 wake-word sentences", guiding the target user to input multiple sentences containing the custom wake-up word by long-pressing the voice input control 521. In other optional implementations, when jumping to page 52, the target terminal is controlled to prompt the target user by voice broadcast, for example announcing "please say at least 30 wake-word sentences".
In an optional implementation, after collection of the wake-word audio data is complete, the target user may further be guided to input multiple sentences not containing the custom wake-up word, by long-pressing the voice input control 521 on page 52, to obtain the non-wake-word audio data. In other optional implementations, the non-wake-word audio data may be obtained from an audio database; this embodiment does not limit how it is obtained.
After the wake-word and non-wake-word audio data are collected, the wake-up word model corresponding to the custom wake-up word is trained from them, and the trained model is sent to the target user terminal for online or offline use.
After the target user terminal downloads the wake-up word model corresponding to the custom wake-up word, the terminal page switches to page 53, which may include a trip information input box 531. Optionally, the target user may input trip information through the voice input box 532; speech recognition is performed on this audio at the target user terminal or at the service platform, and the recognized text is filled into the input box 531. Alternatively, the target user may manually modify the information in the input box 531 to ensure its accuracy, or may fill in the trip information directly. After confirming that the wake-up word and trip information are correct, the target user binds them by triggering the confirmation key "OK". As shown in FIG. 5, the custom wake-up word "train station" is bound to the trip information "taxi from the east door of cell A to train station C".
In an optional implementation, if the task is still not created successfully after the target user has spoken the wake-up word for a predetermined time or a number of times, the recognition accuracy of the corresponding wake-up word model may be low. In this case the wake-up word model may be retrained to further improve its accuracy; the retraining process is similar to steps S310-S330 and is not repeated here.
Thus, by providing several fixed wake-up words for the user to select and bind to corresponding travel routes, this embodiment reduces the interaction flow and makes setting wake-up words convenient. By additionally supporting custom wake-up words, the user can set personalized wake-up words and personalized travel routes, further improving the user experience.
Step S130, creating a task according to the trip information corresponding to the wake-up word matched from the target user's currently input audio. As described above, each wake-up word is bound to corresponding trip information, so the trip information can be retrieved from the wake-up word matched against the currently input audio, and a corresponding task can be created from it. The trip information may include a travel route; in the ride-hailing scenario it may further include the ride-hailing vehicle type, for example taxi, express car, or carpool. Assuming the wake-up word spoken by the target user is "go to work", bound to the travel route "east door of cell A to building B" with vehicle type taxi, a ride-hailing task is created for that route and vehicle type, that is, a ride-hailing order is generated for a driver to accept through a driver terminal. A ride-hailing order can thus be created without a complex interaction process, which offers convenience to groups such as the elderly and further improves the user experience. A schematic sketch of this lookup is given below.
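In the sketch below, the `TripInfo` fields and `create_ride_hailing_order` are hypothetical names, since the embodiment describes the behavior rather than a concrete API.

```python
# A schematic sketch of the binding lookup in step S130.
from dataclasses import dataclass

@dataclass
class TripInfo:
    origin: str
    destination: str
    vehicle_type: str

# Wake word -> trip info bindings established during setup (steps S240/S340).
bindings = {
    "go to work": TripInfo("east door of cell A", "building B", "taxi"),
}

def create_task(wake_word: str, create_ride_hailing_order) -> str:
    trip = bindings.get(wake_word)
    if trip is None:
        return "no trip information bound to this wake-up word"
    order_id = create_ride_hailing_order(trip.origin, trip.destination,
                                         trip.vehicle_type)
    return f"order {order_id} created, waiting for a driver to accept"
```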
Step S140, sending the task creation result to the target user terminal. Optionally, the user may be notified of the task creation result and the task's current state through an application in the target user terminal or through a service platform applet embedded in another application. For example, the target user terminal is controlled to announce or display "your ride-hailing order from the east door of cell A to building B has been created; waiting for a driver to accept".
In the embodiment of the invention, upon receiving audio information, the wake-up word corresponding to the audio information is determined according to at least one pre-trained wake-up word model, a task is created according to the trip information corresponding to the wake-up word, and the task creation result is sent to the target user terminal. A user can thus create a task in one step via a wake-up word, which simplifies the operation of task creation and improves the user experience.
FIG. 6 is a schematic diagram of an information interaction process according to an embodiment of the present invention. Taking the ride-hailing application scenario as an example, as shown in FIG. 6, the ride-hailing app page 61 of the target user terminal includes a one-touch wake-word control 611. The target user inputs audio by triggering the control 611; in response to receiving the audio, the target user terminal determines the wake-up word corresponding to it according to at least one pre-trained wake-up word model, a task is created according to the trip information bound to the wake-up word, and the task creation result is sent to the target user terminal. As shown in FIG. 6, when the task is created successfully, the target user terminal announces a prompt such as "your ride-hailing order from the east door of cell A to building B has been created; waiting for a driver to accept", and simultaneously displays the task's details in a floating window 612 in the interface 61, for example: current state: waiting for a driver to accept; origin: east door of cell A; destination: building B.
It should be understood that the terminal pages in FIGS. 3, 5 and 6 are shown only for ease of understanding and do not limit the actual application pages to which the information interaction method of this embodiment applies.
FIG. 7 is a schematic diagram of an information interaction apparatus according to an embodiment of the present invention. The information interaction apparatus 7 of the embodiment includes a receiving unit 71, a wake-up word determining unit 72, a task creating unit 73, and a sending unit 74.
The receiving unit 71 is configured to receive audio information. The wake-up word determining unit 72 is configured to determine the wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model. The task creating unit 73 is configured to create a task according to the trip information corresponding to the wake-up word. The sending unit 74 is configured to send the task creation result to the target user terminal.
In an optional implementation, each wake-up word has a corresponding wake-up word model trained on audio data that includes wake-word samples.
In an optional implementation, the wake-up words include a fixed wake-up word, and the apparatus further includes a first setting unit comprising a first display subunit, a first receiving subunit, a model determining subunit, and a first binding subunit.
The first display subunit is configured to control fixed wake-up words to be displayed on the target user terminal page. The first receiving subunit is configured to receive wake-word audio information input by the target user, the audio containing the fixed wake-up word selected by the target user. The model determining subunit is configured to determine the wake-up word model corresponding to the wake-word audio information. The first binding subunit is configured to bind the fixed wake-up word selected by the target user to the trip information input by the target user.
In an optional implementation, the model determining subunit includes a speech recognition module and a semantic recognition module. The speech recognition module is configured to perform speech recognition on the wake-word audio information and determine the corresponding text information. The semantic recognition module is configured to perform semantic recognition on the text information to determine the matching wake-up word model.
In an optional implementation, the wake-up words include a custom wake-up word, and the apparatus 7 further includes a second setting unit comprising a second receiving subunit, a data acquiring subunit, a training subunit, and a second binding subunit.
The second receiving subunit is configured to receive a plurality of pieces of wake-word audio data containing the custom wake-up word, input by the target user. The data acquiring subunit is configured to acquire a plurality of pieces of non-wake-word audio data not containing the custom wake-up word. The training subunit is configured to train a wake-up word model corresponding to the custom wake-up word from the wake-word and non-wake-word audio data. The second binding subunit is configured to bind the custom wake-up word to the trip information input by the target user.
In an optional implementation, the training subunit includes a sample determining module configured to determine positive samples from the wake-word audio data and negative samples from the non-wake-word audio data, and a training module configured to train the wake-up word model corresponding to the custom wake-up word from the positive and negative samples.
In an optional implementation, the training subunit further includes a test set determining module configured to determine at least part of the wake-word audio data as a positive test set and at least part of the non-wake-word audio data as a negative test set; a testing module configured to test the trained wake-up word model against the positive and negative test sets and determine its performance parameters; and a model acquiring module configured to obtain the trained wake-up word model in response to the performance parameters meeting a predetermined condition.
In an optional implementation, the sample determining module includes a data expansion submodule configured to perform noise-adding and/or speed-changing processing on the wake-word audio data to augment it, and a positive sample determining submodule configured to determine positive samples from the augmented wake-word audio data.
FIG. 8 is a schematic diagram of an electronic device according to an embodiment of the invention. As shown in FIG. 8, the electronic device 8 is a general-purpose data processing apparatus with a general-purpose computer hardware structure, including at least a processor 81 and a memory 82 connected by a bus 83. The memory 82 is adapted to store instructions or programs executable by the processor 81. The processor 81 may be a stand-alone microprocessor or a collection of one or more microprocessors; it implements data processing and the control of other devices by executing the instructions stored in the memory 82, thereby performing the method flows of the embodiments of the present invention described above. The bus 83 connects these components together and also connects them to a display controller 84, a display device, and input/output (I/O) devices 85. The input/output (I/O) devices 85 may be a mouse, keyboard, modem, network interface, touch input device, motion-sensing input device, printer, or other devices known in the art. Typically, the input/output devices 85 are coupled to the system through an input/output (I/O) controller 86.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus (device) or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may employ a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations of methods, apparatus (devices) and computer program products according to embodiments of the application. It will be understood that each flow in the flow diagrams can be implemented by computer program instructions.
These computer program instructions may be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
Another embodiment of the invention relates to a computer program product for causing a computer to perform some or all of the above method embodiments when the computer program product runs on a computer.
Another embodiment of the invention is directed to a non-transitory storage medium storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the above embodiments may be accomplished by a program instructing relevant hardware. The program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
An embodiment of the invention discloses TS1, an information interaction method, the method comprising:
receiving audio information;
determining a wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model;
creating a task according to the trip information corresponding to the wake-up word; and
sending the task creation result to the target user terminal.
TS2. The method of TS1, wherein each wake-up word has a corresponding wake-up word model, the wake-up word model being trained on audio data that includes wake-word samples.
TS3. The method of TS1 or TS2, wherein the wake-up words include a fixed wake-up word, set through the following steps:
controlling fixed wake-up words to be displayed on a target user terminal page;
receiving wake-word audio information input by a target user, the audio information containing the fixed wake-up word selected by the target user;
determining the wake-up word model corresponding to the wake-word audio information; and
binding the fixed wake-up word selected by the target user to the trip information input by the target user.
TS4. The method of TS3, wherein determining the wake-up word model corresponding to the wake-word audio information comprises:
performing speech recognition on the wake-word audio information to determine the corresponding text information; and
performing semantic recognition on the text information to determine the matching wake-up word model.
TS5. The method of any one of TS1-TS4, wherein the wake-up words include a custom wake-up word, set through the following steps:
receiving a plurality of pieces of wake-word audio data containing the custom wake-up word, input by a target user;
obtaining a plurality of pieces of non-wake-word audio data not containing the custom wake-up word;
training a wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data; and
binding the custom wake-up word to the trip information input by the target user.
TS6. The method of TS5, wherein training the wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data comprises:
determining positive samples from the wake-word audio data and negative samples from the non-wake-word audio data; and
training the wake-up word model corresponding to the custom wake-up word from the positive and negative samples.
TS7. The method of TS5 or TS6, wherein training the wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data further comprises:
determining at least part of the wake-word audio data as a positive test set and at least part of the non-wake-word audio data as a negative test set;
testing the trained wake-up word model against the positive and negative test sets to determine its performance parameters; and
obtaining the trained wake-up word model in response to the performance parameters meeting a predetermined condition.
TS8. The method of TS6, wherein determining positive samples from the wake-word audio data comprises:
performing noise-adding and/or speed-changing processing on the wake-word audio data to augment it; and
determining positive samples from the augmented wake-word audio data.
An embodiment of the invention discloses TS9, an information interaction apparatus, the apparatus comprising:
a receiving unit configured to receive audio information;
a wake-up word determining unit configured to determine a wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model;
a task creating unit configured to create a task according to the trip information corresponding to the wake-up word; and
a sending unit configured to send the task creation result to the target user terminal.
TS10. The apparatus of TS9, wherein each wake-up word has a corresponding wake-up word model, the wake-up word model being trained on audio data that includes wake-word samples.
TS11. The apparatus of TS9 or TS10, wherein the wake-up words include a fixed wake-up word and the apparatus further comprises a first setting unit, the first setting unit comprising:
a first display subunit configured to control fixed wake-up words to be displayed on the target user terminal page;
a first receiving subunit configured to receive wake-word audio information input by a target user, the audio information containing the fixed wake-up word selected by the target user;
a model determining subunit configured to determine the wake-up word model corresponding to the wake-word audio information; and
a first binding subunit configured to bind the fixed wake-up word selected by the target user to the trip information input by the target user.
TS12. The apparatus of TS11, wherein the model determining subunit comprises:
a speech recognition module configured to perform speech recognition on the wake-word audio information and determine the corresponding text information; and
a semantic recognition module configured to perform semantic recognition on the text information to determine the matching wake-up word model.
TS13. The apparatus of any one of TS9-TS12, wherein the wake-up words include a custom wake-up word and the apparatus further comprises a second setting unit, the second setting unit comprising:
a second receiving subunit configured to receive a plurality of pieces of wake-word audio data containing the custom wake-up word, input by a target user;
a data acquiring subunit configured to acquire a plurality of pieces of non-wake-word audio data not containing the custom wake-up word;
a training subunit configured to train a wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data; and
a second binding subunit configured to bind the custom wake-up word to the trip information input by the target user.
TS14. The apparatus of TS13, wherein the training subunit comprises:
a sample determining module configured to determine positive samples from the wake-word audio data and negative samples from the non-wake-word audio data; and
a training module configured to train the wake-up word model corresponding to the custom wake-up word from the positive and negative samples.
TS15. The apparatus of TS13 or TS14, wherein the training subunit further comprises:
a test set determining module configured to determine at least part of the wake-word audio data as a positive test set and at least part of the non-wake-word audio data as a negative test set;
a testing module configured to test the trained wake-up word model against the positive and negative test sets and determine its performance parameters; and
a model acquiring module configured to obtain the trained wake-up word model in response to the performance parameters meeting a predetermined condition.
TS16. The apparatus of TS14, wherein the sample determining module comprises:
a data expansion submodule configured to perform noise-adding and/or speed-changing processing on the wake-word audio data to augment it; and
a positive sample determining submodule configured to determine positive samples from the augmented wake-word audio data.
An embodiment of the invention discloses TS17, an electronic device comprising a memory and a processor, the memory storing one or more computer program instructions which, when executed by the processor, implement the method of any one of TS1-TS8.
An embodiment of the invention discloses TS18, a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of TS1-TS8.
An embodiment of the invention discloses TS19, a computer program product which, when run on a computer, causes the computer to perform the method of any one of TS1-TS8.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An information interaction method, characterized in that the method comprises:
receiving audio information;
determining a wake-up word corresponding to the audio information according to at least one pre-trained wake-up word model;
creating a task according to the trip information corresponding to the wake-up word; and
sending the task creation result to the target user terminal.
2. The method of claim 1, wherein each wake-up word has a corresponding wake-up word model, the wake-up word model being trained on audio data that includes wake-word samples.
3. The method of claim 1 or 2, wherein the wake-up words include a fixed wake-up word, set through the following steps:
controlling fixed wake-up words to be displayed on a target user terminal page;
receiving wake-word audio information input by a target user, the audio information containing the fixed wake-up word selected by the target user;
determining the wake-up word model corresponding to the wake-word audio information; and
binding the fixed wake-up word selected by the target user to the trip information input by the target user.
4. The method of claim 3, wherein determining the wake-up word model corresponding to the wake-word audio information comprises:
performing speech recognition on the wake-word audio information to determine the corresponding text information; and
performing semantic recognition on the text information to determine the matching wake-up word model.
5. The method of any one of claims 1-4, wherein the wake-up words include a custom wake-up word, set through the following steps:
receiving a plurality of pieces of wake-word audio data containing the custom wake-up word, input by a target user;
obtaining a plurality of pieces of non-wake-word audio data not containing the custom wake-up word;
training a wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data; and
binding the custom wake-up word to the trip information input by the target user.
6. The method of claim 5, wherein training the wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data comprises:
determining positive samples from the wake-word audio data and negative samples from the non-wake-word audio data; and
training the wake-up word model corresponding to the custom wake-up word from the positive and negative samples.
7. The method of claim 5 or 6, wherein training the wake-up word model corresponding to the custom wake-up word from the wake-word audio data and the non-wake-word audio data further comprises:
determining at least part of the wake-word audio data as a positive test set and at least part of the non-wake-word audio data as a negative test set;
testing the trained wake-up word model against the positive and negative test sets to determine its performance parameters; and
obtaining the trained wake-up word model in response to the performance parameters meeting a predetermined condition.
8. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-7.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
10. A computer program product, characterized in that, when the computer program product is run on a computer, it causes the computer to perform the method according to any of claims 1-7.
CN202110246012.2A, priority date 2021-03-05, filed 2021-03-05: Information interaction method and device and electronic equipment, Pending, CN113012697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110246012.2A CN113012697A (en) 2021-03-05 2021-03-05 Information interaction method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN113012697A, published 2021-06-22

Family

ID=76407008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110246012.2A Pending CN113012697A (en) 2021-03-05 2021-03-05 Information interaction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113012697A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107564517A (en) * 2017-07-05 2018-01-09 百度在线网络技术(北京)有限公司 Voice awakening method, equipment and system, cloud server and computer-readable recording medium
CN107704275A (en) * 2017-09-04 2018-02-16 百度在线网络技术(北京)有限公司 Smart machine awakening method, device, server and smart machine
CN109916423A (en) * 2017-12-12 2019-06-21 上海博泰悦臻网络技术服务有限公司 Intelligent navigation equipment and its route planning method and automatic driving vehicle
CN111653276A (en) * 2020-06-22 2020-09-11 四川长虹电器股份有限公司 Voice awakening system and method
CN111833881A (en) * 2020-08-07 2020-10-27 斑马网络技术有限公司 Travel voice service generation method, travel accompanying assistant system and electronic equipment
CN112163685A (en) * 2020-09-11 2021-01-01 广州宸祺出行科技有限公司 Intelligent trip matching method and system based on voice AI



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210622)