CN117672227A - Question-answer control method and device based on intelligent sound box, computer equipment and medium - Google Patents


Info

Publication number
CN117672227A
Authority
CN
China
Prior art keywords: intention, text data, router, intent, recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410101435.9A
Other languages
Chinese (zh)
Other versions
CN117672227B (en)
Inventor
Fang Bin (方斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Tech Development Co ltd
Original Assignee
New Tech Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Tech Development Co ltd
Priority to CN202410101435.9A
Publication of CN117672227A
Application granted
Publication of CN117672227B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The embodiment provides a question-answering control method, device, computer equipment and medium based on an intelligent sound box. A first intelligent sound box sends first voice data to a first router, and the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through the first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics. The first router sends the first text data and the first intention result to a total router, which comprises a third intention recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model and outputs a third intention result according to the third intention characteristics. A distributed artificial intelligence model is thereby realized without high hardware cost, which reduces the hardware cost, shortens the training time, improves the operation efficiency and improves the voice recognition rate of the intelligent sound box.

Description

Question-answer control method and device based on intelligent sound box, computer equipment and medium
Technical Field
The invention relates to the technical field of computers, in particular to a question-answer control system based on an intelligent sound box, a question-answer control method based on the intelligent sound box, a question-answer control device based on the intelligent sound box, computer equipment and a storage medium.
Background
With the progress of science and technology, speakers offer more and more functions, and intelligent sound box technology has gradually matured and been commercialized. An intelligent sound box can output and play voice, music and the like, and can also interact with the user through voice conversation. The intelligent sound box generally recognizes the user's question through intention recognition so as to output a relevant answer; however, the existing intention recognition models place high demands on hardware in terms of running speed, response time and the like, the model training time is long, the hardware cost is high, and the voice recognition rate of existing intelligent sound boxes is difficult to improve significantly.
Disclosure of Invention
In view of the above problems, the present embodiment is proposed in order to provide a question-answering control method based on an intelligent sound box, a question-answering control system based on an intelligent sound box, a question-answering control device based on an intelligent sound box, a computer device, and a storage medium, which overcome or at least partially solve the above problems.
In order to solve the above problems, this embodiment discloses a question-answer control method based on an intelligent sound box, including:
the method comprises the steps that a first intelligent sound box obtains first voice data of a first specific space where the first intelligent sound box is located;
the first intelligent sound box sends the first voice data to a first router, wherein the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through a first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intent keyword;
the first router sends first text data and a first intention result to a total router; the total router includes a third intent recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
the total router calculates the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result to obtain a first feature similarity and a third feature similarity, determines the higher of the first feature similarity and the third feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the first intelligent sound box.
Preferably, the method further comprises:
the second intelligent sound box acquires second voice data of a second specific space where the second intelligent sound box is located;
the second intelligent sound box sends the second voice data to a second router, and the second router comprises a second intention recognition model; the second router converts the second voice data into second text data through a second intention recognition model, converts the second text data into second intention characteristics, and outputs a second intention result according to the second intention characteristics; wherein the second text data includes an intent keyword;
the second router sends second text data and a second intention result to the total router; the total router includes a third intent recognition model; the total router converts the second text data into fourth intention characteristics through the third intention recognition model, and outputs a fourth intention result according to the fourth intention characteristics;
and the total router calculates the similarity between the intention keyword and the second intention result and between the intention keyword and the fourth intention result to obtain a second feature similarity and a fourth feature similarity, determines the higher of the second feature similarity and the fourth feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the second intelligent sound box.
Preferably, the first intent recognition model comprises a trained song intent model; the training step of the song intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and song classification intention labels; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining a song intention loss function according to the estimated intention and song classification intention labels;
and adjusting parameters of the intent recognition model according to the song intent loss function to obtain a trained song intent model.
Preferably, the second intention recognition model comprises a trained appliance control intention model; the training step of the electric appliance control intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and an electrical appliance control classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining an appliance control intention loss function according to the estimated intention and the appliance control classification intention label;
and adjusting parameters of the intention recognition model according to the appliance control intention loss function to obtain a trained appliance control intention model.
Preferably, the third intent recognition model comprises a trained search intent model; the training step of the search intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and a search classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining a search intention loss function according to the estimated intention and a search classification intention label;
and adjusting parameters of the intent recognition model according to the searching intent loss function to obtain a trained search intent model.
Preferably, the method further comprises:
calculating a comprehensive loss function of the song intention loss function, the electric appliance control intention loss function and the search intention loss function;
and adjusting parameters of the intention recognition model according to the comprehensive loss function to obtain a trained comprehensive intention recognition model.
Preferably, the method further comprises:
the method comprises the steps of obtaining voice data which are not repeatedly questioned after a preset time interval, converting the voice data into a text data training set, determining an intention result corresponding to the voice data as an intention label, and training an initial intention recognition model through the text data training set and the intention label.
The embodiment discloses a question-answering control system based on intelligent audio amplifier, includes:
the system comprises a first intelligent sound box, a first router, a second intelligent sound box, a second router, a third intelligent sound box and a total router;
the total router is respectively connected with the first router, the second router and the third intelligent sound box, the first router is connected with the first intelligent sound box, and the second router is connected with the second intelligent sound box;
the first router and the first intelligent sound box are arranged in a first specific space; the second router and the second intelligent sound box are arranged in a second specific space; the total router and the third intelligent sound box are arranged in a third specific space; the first specific space, the second specific space and the third specific space are three adjacent spaces;
The first intelligent sound box, the second intelligent sound box and the third intelligent sound box are used for receiving voice data; the first router is provided with a first intention recognition model; the second router is provided with a second intention recognition model; the total router is provided with a third intent recognition model.
The embodiment discloses question-answering control device based on intelligent audio amplifier, includes:
the first acquisition module is used for acquiring first voice data of a first specific space where the first intelligent sound box is located;
the first output module is used for transmitting the first voice data to a first router by a first intelligent sound box, and the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through a first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intent keyword;
the first sending module is used for sending the first text data and the first intention result to the total router by the first router; the total router includes a third intent recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
the first determining module is configured to have the total router calculate the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result to obtain a first feature similarity and a third feature similarity, determine the higher of the first feature similarity and the third feature similarity as the intention similarity, and output the intention result corresponding to the intention similarity to the first intelligent sound box.
The embodiment also discloses a computer device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of question-answer control based on the intelligent sound box when executing the computer program.
The embodiment also discloses a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of question-answering control based on the intelligent sound box.
This embodiment includes the following advantages:
in the embodiment of the invention, the chip and the storage of the router are provided with the artificial intelligent model, the distributed artificial intelligent model is used for carrying out multi-space voice recognition, the distributed artificial intelligent model is realized, the high hardware cost is not needed, the hardware cost is reduced, the training time is shortened, the operation efficiency is improved, and the voice recognition rate of the intelligent sound box is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present embodiment, the drawings required for the description of the embodiment will be briefly described below, and it will be apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for those skilled in the art
Fig. 1 is a flowchart of steps of an embodiment of a question-answer control method based on an intelligent sound box in the present embodiment;
fig. 2 is a schematic diagram of a specific application scenario of the present embodiment;
fig. 3 is a block diagram of an embodiment of a question-answering control device based on an intelligent sound box according to the present embodiment;
FIG. 4 is an internal block diagram of a computer device of one embodiment.
Detailed Description
In order to make the technical problems, technical schemes and beneficial effects solved by the present embodiment more clear, the present embodiment is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In one core concept of the embodiment of the invention, each node (router) in a specific space (a room of a two-bedroom or three-bedroom house) is provided with an artificial intelligence model trained with samples of a different intention type. When voice data is received in one of the spaces, the trained artificial intelligence model at that node recognizes the intention, and the voice data is also sent to the artificial intelligence model at another node (router), which was trained with samples of a different intention type and likewise recognizes the intention. The intentions output by the two artificial intelligence models are then compared, taking as the reference the intention closest to the original intention-type sample (the intention keyword). While intention recognition is performed, the artificial intelligence models trained with other intention-type samples can continue to be trained, so that eventually multiple artificial intelligence models can each recognize multiple intention types. A distributed artificial intelligence model is thereby realized without high hardware cost, which reduces the hardware cost, shortens the training time and improves the voice recognition rate of the intelligent sound box.
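For illustration only, the following Python sketch mirrors the distributed flow described above; the class names (LocalRouter, TotalRouter), the IntentResult structure and the callable model/similarity interfaces are assumptions introduced here and are not part of the patent text.

```python
# Minimal sketch of the distributed intent-recognition flow (assumed names).
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class IntentResult:
    text: str      # text data converted from the voice data
    keyword: str   # intention keyword extracted from the text
    intent: str    # intention result output by the local model


class LocalRouter:
    """Router in one room; hosts one locally trained intention recognition model."""

    def __init__(self, model: Callable[[str], Tuple[str, str]]):
        self.model = model  # maps text -> (intention keyword, intention result)

    def recognize(self, speech_text: str) -> IntentResult:
        keyword, intent = self.model(speech_text)
        return IntentResult(speech_text, keyword, intent)


class TotalRouter:
    """Central router; hosts a model trained with a different intention type."""

    def __init__(self, model: Callable[[str], str],
                 similarity: Callable[[str, str], float]):
        self.model = model
        self.similarity = similarity

    def arbitrate(self, local: IntentResult) -> str:
        # Run the central model on the forwarded text data.
        central_intent = self.model(local.text)
        # Compare both intention results against the intention keyword and
        # output whichever is more similar to it.
        s_local = self.similarity(local.keyword, local.intent)
        s_central = self.similarity(local.keyword, central_intent)
        return local.intent if s_local >= s_central else central_intent
```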
Referring to fig. 1, a step flowchart of an embodiment of a question-answer control method based on an intelligent sound box in this embodiment is shown, which specifically may include the following steps:
step 101, a first intelligent sound box acquires first voice data of a first specific space;
in the embodiment of the invention, the question-answering control method based on the intelligent sound box can be applied to a question-answering control system, and the system can comprise a first intelligent sound box, a first router, a second intelligent sound box, a second router, a third intelligent sound box and a total router;
the total router is respectively connected with the first router, the second router and the third intelligent sound box, the first router is connected with the first intelligent sound box, and the second router is connected with the second intelligent sound box;
the first router and the first intelligent sound box are arranged in a first specific space; the second router and the second intelligent sound box are arranged in a second specific space; the total router and the third intelligent sound box are arranged in a third specific space; the first specific space, the second specific space and the third specific space are three adjacent spaces;
the first intelligent sound box, the second intelligent sound box and the third intelligent sound box are used for receiving voice data; the first router is provided with a first intention recognition model; the second router is provided with a second intention recognition model; the total router is provided with a third intent recognition model.
Referring to fig. 2, a schematic diagram of a specific application scenario of the embodiment of the present invention is shown. For example, the third intelligent sound box and the total router may be arranged in the living room of a house, the first intelligent sound box and the first router may be arranged in the master bedroom, and the second intelligent sound box and the second router may be arranged in the second bedroom. Through data processing and conversion between the routers in the plurality of different areas, the question-answer recognition rate in each independent space is improved, and the accuracy and applicability of question-answer control are improved.
In practical application, the first intelligent sound box, the second intelligent sound box and the third intelligent sound box may each comprise a loudspeaker, a microphone, a processor, a power supply, a wireless network adapter and the like, and may also comprise a rotating component, a moving component and the like; the embodiment of the invention does not unduly limit these components. Each intelligent sound box may be connected to its corresponding router through the wireless network adapter. Further, the router may comprise a processor and a memory, and an intention recognition model may be deployed in the memory; that is, the router can be used both for network data transmission and exchange and for running the intention recognition model.
Step 102, a first intelligent sound box sends the first voice data to a first router, wherein the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through a first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intent keyword;
wherein the first intent recognition model comprises a trained song intent model; the training step of the song intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and song classification intention labels; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining a song intention loss function according to the estimated intention and song classification intention labels;
and adjusting parameters of the intent recognition model according to the song intent loss function to obtain a trained song intent model.
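As a non-authoritative illustration of the training steps listed above, the following PyTorch sketch lets a standard cross-entropy classification loss stand in for the song intention loss and assumes the model maps encoded token ids to class logits; the function name, tensor shapes and hyperparameters are placeholders, not the patent's specification.

```python
# Sketch of the song-intention training steps (assumed PyTorch implementation).
import torch
import torch.nn as nn


def train_song_intent_model(model: nn.Module,
                            encoded_texts: torch.Tensor,  # (N, seq_len) token ids
                            labels: torch.Tensor,         # (N,) song classification labels
                            epochs: int = 10,
                            lr: float = 1e-3) -> nn.Module:
    """Adjust model parameters with a song-intention loss (cross-entropy assumed)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(encoded_texts)      # encoding -> pooling -> full connection
        loss = criterion(logits, labels)   # song intention loss function
        loss.backward()
        optimizer.step()                   # parameter adjustment step
    return model
```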
Step 103, the first router sends the first text data and the first intention result to a total router; the total router includes a third intent recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
the third intention recognition model comprises a trained search intention model; the training step of the search intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and a search classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining a search intention loss function according to the estimated intention and a search classification intention label;
and adjusting parameters of the intent recognition model according to the searching intent loss function to obtain a trained search intent model.
In the embodiment of the invention, distributed model deployment is performed through three different intention recognition models, which may include a song intention model, a search intention model and an appliance control intention model; they may also be divided into other types of intention recognition models, and the embodiment of the invention does not unduly limit this. Specifically, the intention recognition models may be trained with different training samples to obtain different trained intention models. In a specific example, the text data training set of the song intention model may be text data containing song-related keywords (such as a song title or a singer's name appearing in a query like "how is a certain song sung by a certain singer"), and the song classification intention label may be the corresponding song name, such as different versions of that song performed by the named singer or by other singers.
For the text data training set and search classification intention labels of the search intention model, the text data training set may be text data containing search keywords, and the search classification intention label is the corresponding intention result. For example, the text data may be "help me find nearby restaurants", and the search classification intention label is a restaurant name and address; such text data and search classification intention labels together form the training data of the search intention model.
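A minimal sketch of assembling such a search-intention training set is shown below; the keyword list is an assumption introduced here, and the example query/label pair follows the restaurant example in the text above.

```python
# Sketch of assembling a search-intention training set (keyword list assumed).
SEARCH_KEYWORDS = ("find", "search", "nearby", "look up")


def build_search_training_set(utterances):
    """Return (text, label) pairs for queries that contain a search keyword.

    `utterances` is an iterable of (text, intent_label) tuples, where the label
    is the expected search result (e.g. a restaurant name and address).
    """
    training_set = []
    for text, label in utterances:
        if any(kw in text.lower() for kw in SEARCH_KEYWORDS):
            training_set.append((text, label))
    return training_set


# Example usage with the restaurant query from the description:
samples = [("help me find nearby restaurants", "restaurant name and address")]
print(build_search_training_set(samples))
```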
In terms of the specific composition of the intention recognition model, it may also comprise an input layer, a max-pooling layer, a full connection layer, a hidden layer, a softmax layer and the like; the specific composition is not unduly limited, as long as the effect of intention recognition can be achieved. The intention recognition model may be another supervised neural network model such as a recurrent neural network, or an unsupervised neural network model may be adopted to realize intention recognition, such as a K-means algorithm model, a hierarchical clustering algorithm model or a PCA principal component analysis model; the embodiment of the invention does not unduly limit this.
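The layer stack named above can be sketched as follows; this is one assumed PyTorch arrangement with placeholder dimensions, not the patent's definitive architecture (when training with a cross-entropy loss, the softmax layer is typically omitted and raw logits are used).

```python
# Sketch of one possible layer stack for the intention recognition model.
import torch
import torch.nn as nn


class IntentRecognitionModel(nn.Module):
    def __init__(self, vocab_size: int = 10000, embed_dim: int = 128,
                 hidden_dim: int = 64, num_intents: int = 8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # input layer
        self.pool = nn.AdaptiveMaxPool1d(1)                    # max-pooling layer
        self.fc = nn.Linear(embed_dim, hidden_dim)             # full connection layer
        self.hidden = nn.Linear(hidden_dim, num_intents)       # hidden layer
        self.softmax = nn.Softmax(dim=-1)                      # softmax layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)                  # (batch, seq, embed)
        x = self.pool(x.transpose(1, 2)).squeeze(-1)   # (batch, embed)
        x = torch.relu(self.fc(x))                     # (batch, hidden)
        return self.softmax(self.hidden(x))            # intention probabilities
```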
Step 104, the total router calculates the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result to obtain a first feature similarity and a third feature similarity, determines the higher of the first feature similarity and the third feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the first intelligent sound box.
Specifically, the first feature similarity and the third feature similarity may be cosine similarities, the total router calculates the first feature similarity between the intention keyword and the first intention result, calculates the third feature similarity between the intention keyword and the third intention result, compares the magnitudes of the first feature similarity and the third feature similarity, and selects a higher feature similarity of the first feature similarity and the third feature similarity, that is, if the third feature similarity is greater than the first feature similarity, outputs an intention result corresponding to the third feature similarity to the first intelligent sound box.
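A minimal sketch of this cosine-similarity arbitration is given below; the feature vectors are assumed to come from whatever text/feature encoder the routers use, and the function names are illustrative only.

```python
# Sketch of the cosine-similarity arbitration performed by the total router.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def select_intent(keyword_vec, first_intent, first_vec, third_intent, third_vec):
    """Return the intention result whose feature vector is closer to the keyword."""
    first_sim = cosine_similarity(keyword_vec, first_vec)   # first feature similarity
    third_sim = cosine_similarity(keyword_vec, third_vec)   # third feature similarity
    # The higher similarity becomes the intention similarity; its intent is output.
    return first_intent if first_sim >= third_sim else third_intent
```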
In the embodiment of the invention, the chip and the storage of the router are provided with the artificial intelligent model, the distributed artificial intelligent model is used for carrying out multi-space voice recognition, the distributed artificial intelligent model is realized, the high hardware cost is not needed, the hardware cost is reduced, the training time is shortened, the operation efficiency is improved, and the voice recognition rate of the intelligent sound box is improved.
In a preferred embodiment, the second intelligent sound box may also acquire second voice data of a second specific space where the second intelligent sound box is located; for example, the second intelligent sound box may acquire the second voice data of a user in the bedroom;
the second intelligent sound box sends the second voice data to a second router, and the second router comprises a second intention recognition model; the second router converts the second voice data into second text data through a second intention recognition model, converts the second text data into second intention characteristics, and outputs a second intention result according to the second intention characteristics; wherein the second text data includes an intent keyword;
in one specific application, the second intent recognition model includes a trained appliance control intent model; the training step of the electric appliance control intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and an electrical appliance control classification intention label; performing coding feature conversion on the text data training set to obtain coding features; pooling the coding features to obtain pooled voice features; inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining an appliance control intention loss function according to the estimated intention and the appliance control classification intention label; and adjusting parameters of the intention recognition model according to the appliance control intention loss function to obtain a trained appliance control intention model.
In a specific example, the text data training set may be text data containing appliance control keywords, and the appliance control classification intention label is the corresponding intention result. For example, the text data may be "help me turn off the air conditioner", and the appliance control classification intention label is the action of controlling the air conditioner to turn off and reporting whether it was turned off successfully; such text data and appliance control classification intention labels together form the training data of the appliance control intention model.
Further, the second router sends second text data and a second intention result to the total router; the total router includes a third intent recognition model; the total router converts the second text data into fourth intention characteristics through the third intention recognition model, and outputs a fourth intention result according to the fourth intention characteristics;
the total router calculates the similarity between the intention keyword and the second intention result and between the intention keyword and the fourth intention result to obtain a second feature similarity and a fourth feature similarity, determines the higher of the second feature similarity and the fourth feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the second intelligent sound box.
It should be noted that, the feature similarity may be cosine similarity, euclidean distance similarity, etc., which is not limited in this embodiment of the present invention.
In a preferred embodiment of the present invention, the method further includes:
calculating a comprehensive loss function of the song intention loss function, the electric appliance control intention loss function and the search intention loss function;
in an embodiment of the present invention, the song intention loss function may include the following formula:
wherein,a loss function for song intent;is a keyword influence coefficient;is the number of samples;a predicted value of estimated intent for a model output of the ith song intent model sample;classifying the true value of the intent label for the song;
further, the appliance control intent loss function may include the following formula:
wherein,controlling an intent loss function for the appliance;is an environmental impact coefficient;is the number of samples;a predicted value of estimated intent for a model output of the ith appliance control intent model sample;classifying the true value of the intent label for the appliance control;
further, the search intent loss function may include the following formula:
wherein,controlling an intent loss function for the appliance; To control the influence coefficient;is the number of samples;predicted values of estimated intents output for the model of the ith search intent model sample;the true value of the intent label is classified for searching;
specifically, the comprehensive loss function can be calculated from the song intention loss function, the electric appliance control intention loss function and the search intention loss function;
comprehensive loss functionI.e. song intention loss function, appliance control intentionThe distance between the loss function and the search intention loss function is integrated.
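The exact way the three losses are combined is not spelled out in the text above; the sketch below assumes a simple weighted sum purely for illustration.

```python
# Sketch of combining the three task losses into a comprehensive loss
# (weighted sum assumed; the patent text does not fix the combination).
import torch


def comprehensive_loss(song_loss: torch.Tensor,
                       appliance_loss: torch.Tensor,
                       search_loss: torch.Tensor,
                       weights=(1.0, 1.0, 1.0)) -> torch.Tensor:
    w1, w2, w3 = weights
    return w1 * song_loss + w2 * appliance_loss + w3 * search_loss
```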
The parameters of the intention recognition model are then adjusted according to the comprehensive loss function to obtain a trained comprehensive intention recognition model. In the embodiment of the invention, the intention recognition model can thus be trained through the three loss functions, so that a trained comprehensive intention recognition model is obtained and the performance of the model is greatly improved.
The embodiment of the invention also realizes automatic generation of the training set, reducing the cost of model training. Specifically, voice data corresponding to questions that are not repeated after a preset time interval can be acquired and converted into a text data training set, the intention result corresponding to that voice data can be determined as the intention label, and an initial intention recognition model can be trained with this text data training set and intention label. In other words, voice data that the user asked only once, together with the corresponding intention result, is converted into a text data training set and intention labels for training an initial intention recognition model, which reduces the cost of model training.
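The following sketch illustrates one possible way to harvest such once-asked questions as training samples; the QuestionLog class, the in-memory log and the 24-hour interval are assumptions made here for illustration.

```python
# Sketch of automatic training-set generation from questions asked only once
# within a preset time interval (class name, log structure and interval assumed).
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

PRESET_INTERVAL_SECONDS = 24 * 3600  # preset time interval (assumed value)


class QuestionLog:
    def __init__(self) -> None:
        self._timestamps: Dict[str, List[float]] = defaultdict(list)
        self._intents: Dict[str, str] = {}

    def record(self, text: str, intent_result: str) -> None:
        """Log each recognized question together with the intention result output."""
        self._timestamps[text].append(time.time())
        self._intents[text] = intent_result

    def harvest_training_samples(self, now: Optional[float] = None) -> List[Tuple[str, str]]:
        """Return (text, intention-label) pairs for questions asked exactly once
        whose single occurrence is older than the preset interval."""
        now = time.time() if now is None else now
        samples = []
        for text, stamps in self._timestamps.items():
            if len(stamps) == 1 and now - stamps[0] >= PRESET_INTERVAL_SECONDS:
                samples.append((text, self._intents[text]))
        return samples
```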
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may be performed in other order or simultaneously in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments and that the actions involved are not necessarily required for the present embodiment.
Referring to fig. 3, a block diagram of an embodiment of a question-answering control device based on an intelligent sound box in this embodiment is shown, which may specifically include the following modules:
the first obtaining module 301 is configured to obtain first voice data of a first specific space where the first smart speaker is located;
a first output module 302, configured to send the first voice data to a first router by using a first smart speaker, where the first router includes a first intent recognition model; the first router converts the first voice data into first text data through a first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intent keyword;
A first sending module 303, configured to send, by the first router, the first text data and the first intention result to a total router; the total router includes a third intent recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
the first determining module 304 is configured to have the total router calculate the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result to obtain a first feature similarity and a third feature similarity, determine the higher of the first feature similarity and the third feature similarity as the intention similarity, and output the intention result corresponding to the intention similarity to the first intelligent sound box.
Preferably, the apparatus further comprises:
the second acquisition module is used for acquiring second voice data of a second specific space where the second intelligent sound box is positioned;
the second output module is used for sending the second voice data to a second router by a second intelligent sound box, and the second router comprises a second intention recognition model; the second router converts the second voice data into second text data through a second intention recognition model, converts the second text data into second intention characteristics, and outputs a second intention result according to the second intention characteristics; wherein the second text data includes an intent keyword;
The second sending module is used for sending second text data and a second intention result to the total router by the second router; the total router includes a third intent recognition model; the total router converts the second text data into fourth intention characteristics through the third intention recognition model, and outputs a fourth intention result according to the fourth intention characteristics;
the second determining module is configured to have the total router calculate the similarity between the intention keyword and the second intention result and between the intention keyword and the fourth intention result to obtain a second feature similarity and a fourth feature similarity, determine the higher of the second feature similarity and the fourth feature similarity as the intention similarity, and output the intention result corresponding to the intention similarity to the second intelligent sound box.
Preferably, the first intent recognition model comprises a trained song intent model; the training module of the song intention model comprises:
the first coding feature acquisition sub-module is used for converting voice data into text data to obtain a text data training set and song classification intention labels; performing coding feature conversion on the text data training set to obtain coding features;
The first pooling sub-module is used for pooling the coding features to obtain pooled voice features;
the first loss function acquisition submodule is used for inputting the voice characteristics to the full-connection layer to obtain an output estimated intention, and obtaining a song intention loss function according to the estimated intention and a song classification intention label;
and the first adjusting sub-module is used for adjusting parameters of the intention recognition model according to the song intention loss function to obtain a trained song intention model.
Preferably, the second intention recognition model comprises a trained appliance control intention model; the training module of the electrical appliance control intention model comprises:
the second coding feature acquisition sub-module is used for converting the voice data into text data to obtain a text data training set and an electric appliance control classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
the second pooling sub-module is used for pooling the coding features to obtain pooled voice features;
the second loss function obtaining submodule is used for inputting the voice characteristics to the full-connection layer to obtain an output estimated intention, and obtaining an electric appliance control intention loss function according to the estimated intention and an electric appliance control classification intention label;
And the second adjusting sub-module is used for adjusting parameters of the intention recognition model according to the appliance control intention loss function to obtain a trained appliance control intention model.
Preferably, the third intent recognition model comprises a trained search intent model; the training module of the search intention model comprises:
the third coding feature acquisition sub-module is used for converting the voice data into text data to obtain a text data training set and a search classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
the third pooling sub-module is used for pooling the coding features to obtain pooled voice features;
the third loss function obtaining sub-module is used for inputting the voice characteristics to the full-connection layer to obtain an output estimated intention, and obtaining a search intention loss function according to the estimated intention and the search classification intention label;
and the third adjusting sub-module is used for adjusting parameters of the intention recognition model according to the searching intention loss function to obtain a trained search intention model.
Preferably, the apparatus further comprises:
the comprehensive loss function calculation module is used for calculating the comprehensive loss functions of the song intention loss function, the electric appliance control intention loss function and the search intention loss function;
The comprehensive intention recognition model obtaining module is used for adjusting parameters of the intention recognition model according to the comprehensive loss function to obtain a trained comprehensive intention recognition model.
Preferably, the apparatus further comprises:
the conversion module is used for obtaining voice data which is not repeatedly questioned after a preset time interval, converting the voice data into a text data training set, determining an intention result corresponding to the voice data as an intention label, and training an initial intention recognition model through the text data training set and the intention label.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
For specific limitations regarding the intelligent speaker based question-answer control apparatus, reference may be made to the above limitations regarding the intelligent speaker based question-answer control method, and no further description is given herein. All or part of the modules in the question-answering control device based on the intelligent sound box can be realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The question-answering control device based on the intelligent sound box can be used for executing the question-answering control method based on the intelligent sound box, provided by any embodiment, and has corresponding functions and beneficial effects.
In one embodiment, a computer device is provided, which may be a smart speaker or router, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the question-answering control method based on the intelligent sound box. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covering the display screen, keys, a track ball or a touch pad arranged on the shell of the computer equipment, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 4 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps as described in fig. 1 when executing the computer program.
in one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, implements the steps as described in fig. 1.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present embodiments may be provided as a method, apparatus, or computer program product. Thus, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present embodiments may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present embodiments are described with reference to flowchart illustrations and/or block diagrams of apparatus, terminal devices (systems), and computer program products according to the embodiments. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present embodiments have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiment.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or terminal device comprising the element.
The invention provides a question-answering control method based on an intelligent sound box, a question-answering control device based on the intelligent sound box, a computer device and a storage medium, which are described in detail, wherein specific examples are applied to illustrate the principle and the implementation of the invention, and the description of the above examples is only used for helping to understand the method and the core idea of the invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims (10)

1. The question-answering control method based on the intelligent sound box is characterized by comprising the following steps of:
the method comprises the steps that a first intelligent sound box obtains first voice data of a first specific space where the first intelligent sound box is located;
the first intelligent sound box sends the first voice data to a first router, wherein the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through a first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intent keyword;
the first router sends first text data and a first intention result to a total router; the total router includes a third intent recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
the total router calculates the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result to obtain a first feature similarity and a third feature similarity, determines the higher of the first feature similarity and the third feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the first intelligent sound box.
2. The method according to claim 1, wherein the method further comprises:
the second intelligent sound box acquires second voice data of a second specific space where the second intelligent sound box is located;
the second intelligent sound box sends the second voice data to a second router, and the second router comprises a second intention recognition model; the second router converts the second voice data into second text data through a second intention recognition model, converts the second text data into second intention characteristics, and outputs a second intention result according to the second intention characteristics; wherein the second text data includes an intent keyword;
the second router sends second text data and a second intention result to the total router; the total router includes a third intent recognition model; the total router converts the second text data into fourth intention characteristics through the third intention recognition model, and outputs a fourth intention result according to the fourth intention characteristics;
and the total router calculates the similarity between the intention keyword and the second intention result and between the intention keyword and the fourth intention result to obtain a second feature similarity and a fourth feature similarity, determines the higher of the second feature similarity and the fourth feature similarity as the intention similarity, and outputs the intention result corresponding to the intention similarity to the second intelligent sound box.
3. The method of claim 1, wherein the first intent recognition model comprises a trained song intent model; the training step of the song intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and song classification intention labels; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining a song intention loss function according to the estimated intention and song classification intention labels;
and adjusting parameters of the intent recognition model according to the song intent loss function to obtain a trained song intent model.
4. The method of claim 2, wherein the second intent recognition model comprises a trained appliance control intent model; the training step of the electric appliance control intention model comprises the following steps:
converting the voice data into text data to obtain a text data training set and an electrical appliance control classification intention label; performing coding feature conversion on the text data training set to obtain coding features;
Pooling the coding features to obtain pooled voice features;
inputting the voice characteristics to a full connection layer to obtain an output estimated intention, and obtaining an appliance control intention loss function according to the estimated intention and the appliance control classification intention label;
and adjusting parameters of the intention recognition model according to the appliance control intention loss function to obtain a trained appliance control intention model.
5. The method of claim 1, wherein the third intention recognition model comprises a trained search intention model; the training step of the search intention model comprises the following steps:
converting sample voice data into text data to obtain a text data training set and search classification intention labels; performing coding feature conversion on the text data training set to obtain coding features;
pooling the coding features to obtain pooled voice features;
inputting the voice features into a fully connected layer to obtain an output estimated intention, and obtaining a search intention loss function according to the estimated intention and the search classification intention labels;
and adjusting parameters of the intention recognition model according to the search intention loss function to obtain the trained search intention model.
6. The method of claim 3, 4 or 5, further comprising:
calculating a comprehensive loss function from the song intention loss function, the appliance control intention loss function and the search intention loss function;
and adjusting parameters of the intention recognition model according to the comprehensive loss function to obtain a trained comprehensive intention recognition model.
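Claim 6 does not state how the three loss functions are combined into the comprehensive loss; a weighted sum, as sketched below in Python, is one plausible reading, and the equal default weights are assumptions.

# Sketch of one possible comprehensive loss; the weights are illustrative.
def comprehensive_loss(song_loss, appliance_loss, search_loss,
                       w_song=1.0, w_appliance=1.0, w_search=1.0):
    # Works with plain floats or torch tensors; with tensors the result can be
    # backpropagated to jointly adjust the intention recognition models.
    return w_song * song_loss + w_appliance * appliance_loss + w_search * search_loss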
7. The method of claim 3, 4 or 5, further comprising:
acquiring voice data for which the question is not asked again after a preset time interval, converting the voice data into a text data training set, determining the intention result corresponding to the voice data as an intention label, and training an initial intention recognition model with the text data training set and the intention label.
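One possible reading of claim 7 is sketched below: a query that is not asked again once the preset interval has elapsed is treated as correctly answered, and its output intention result is reused as a training label. The log format, the 24-hour interval value and the collect_self_labels helper are illustrative assumptions, not part of the claim.

# Illustrative self-labelling sketch under the assumptions stated above.
from datetime import datetime, timedelta

PRESET_INTERVAL = timedelta(hours=24)   # assumed value; the claim only says "preset"

def collect_self_labels(query_log, now: datetime):
    """query_log: list of (timestamp, text, intent_result) tuples, sorted by time."""
    samples = []
    for i, (ts, text, intent) in enumerate(query_log):
        if now - ts < PRESET_INTERVAL:
            continue                          # preset interval has not elapsed yet
        repeated = any(text == later_text for _, later_text, _ in query_log[i + 1:])
        if not repeated:                      # question never re-asked: presumed answered correctly
            samples.append((text, intent))    # the output intention result becomes the label
    return samples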
8. A question-answer control device based on an intelligent sound box, characterized by comprising:
the first acquisition module, used for acquiring first voice data of a first specific space where a first intelligent sound box is located;
the first output module, used for the first intelligent sound box to send the first voice data to a first router, wherein the first router comprises a first intention recognition model; the first router converts the first voice data into first text data through the first intention recognition model, converts the first text data into first intention characteristics, and outputs a first intention result according to the first intention characteristics; wherein the first text data includes an intention keyword;
the first sending module, used for the first router to send the first text data and the first intention result to the total router; the total router comprises a third intention recognition model; the total router converts the first text data into third intention characteristics through the third intention recognition model, and outputs a third intention result according to the third intention characteristics;
and the first determining module, used for the total router to calculate the similarity between the intention keyword and the first intention result and between the intention keyword and the third intention result, respectively, to obtain a first characteristic similarity and a third characteristic similarity, determine the higher of the first characteristic similarity and the third characteristic similarity as the intention similarity, and output the intention result corresponding to the intention similarity to the first intelligent sound box.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the question-answer control method based on an intelligent sound box according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the question-answer control method based on an intelligent sound box according to any one of claims 1 to 7.
CN202410101435.9A 2024-01-25 2024-01-25 Question-answer control method and device based on intelligent sound box, computer equipment and medium Active CN117672227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410101435.9A CN117672227B (en) 2024-01-25 2024-01-25 Question-answer control method and device based on intelligent sound box, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410101435.9A CN117672227B (en) 2024-01-25 2024-01-25 Question-answer control method and device based on intelligent sound box, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN117672227A 2024-03-08
CN117672227B CN117672227B (en) 2024-04-05

Family

ID=90079109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410101435.9A Active CN117672227B (en) 2024-01-25 2024-01-25 Question-answer control method and device based on intelligent sound box, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN117672227B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006053203A (en) * 2004-08-10 2006-02-23 Sony Corp Speech processing device and method, recording medium and program
CN110347789A (en) * 2019-06-14 2019-10-18 平安科技(深圳)有限公司 Text is intended to intelligent method for classifying, device and computer readable storage medium
CN110517686A (en) * 2019-09-26 2019-11-29 合肥飞尔智能科技有限公司 Intelligent sound box end voice opens the method and system of application
CN111159346A (en) * 2019-12-27 2020-05-15 深圳物控智联科技有限公司 Intelligent answering method based on intention recognition, server and storage medium
CN112565207A (en) * 2020-11-20 2021-03-26 南京大学 Non-invasive intelligent sound box safety evidence obtaining system and method thereof
CN112687270A (en) * 2020-12-22 2021-04-20 苏州思必驰信息科技有限公司 Intelligent voice routing method and device
CN113343709A (en) * 2021-06-22 2021-09-03 北京三快在线科技有限公司 Method for training intention recognition model, method, device and equipment for intention recognition
CN113377899A (en) * 2020-03-09 2021-09-10 华为技术有限公司 Intention recognition method and electronic equipment
CN113886545A (en) * 2021-09-29 2022-01-04 平安银行股份有限公司 Knowledge question answering method, knowledge question answering device, computer readable medium and electronic equipment
US20220101839A1 (en) * 2020-09-25 2022-03-31 Genesys Telecommunications Laboratories, Inc. Systems and methods relating to bot authoring by mining intents from conversation data via intent seeding
US20230097940A1 (en) * 2021-09-27 2023-03-30 David Sandai Kurokawa System and method for extracting and using groups of features for interpretability analysis
CN116016002A (en) * 2022-12-01 2023-04-25 海尔优家智能科技(北京)有限公司 Intelligent household appliance network distribution method and device and electronic device
CN117093687A (en) * 2023-08-03 2023-11-21 京东科技信息技术有限公司 Question answering method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117672227B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
Zhang et al. Cooperative learning and its application to emotion recognition from speech
CN108694940B (en) Voice recognition method and device and electronic equipment
CN103280216B (en) Improve the speech recognition device the relying on context robustness to environmental change
KR20190120353A (en) Speech recognition methods, devices, devices, and storage media
JP7300435B2 (en) Methods, apparatus, electronics, and computer-readable storage media for voice interaction
WO2011126458A1 (en) Automatic frequently asked question compilation from community-based question answering archive
CN109684456B (en) Scene ability intelligent question-answering system based on Internet of things ability knowledge graph
CN112328849A (en) User portrait construction method, user portrait-based dialogue method and device
KR102261199B1 (en) Method, system and computer program for artificial intelligence answer
CN111414513B (en) Music genre classification method, device and storage medium
CN113813609B (en) Game music style classification method and device, readable medium and electronic equipment
Wang et al. Personalized music emotion recognition via model adaptation
CN116956835A (en) Document generation method based on pre-training language model
CN117672227B (en) Question-answer control method and device based on intelligent sound box, computer equipment and medium
CN111583938B (en) Electronic device and voice recognition method
CN111026908B (en) Song label determining method, device, computer equipment and storage medium
Kai [Retracted] Optimization of Music Feature Recognition System for Internet of Things Environment Based on Dynamic Time Regularization Algorithm
Yang et al. Personalized keyword spotting through multi-task learning
CN110210035B (en) Sequence labeling method and device and training method of sequence labeling model
CN115101052A (en) Audio recognition method and computer equipment
CN111552778B (en) Audio resource management method, device, computer readable storage medium and equipment
CN103474063A (en) Voice recognition system and method
CN113112969A (en) Buddhism music score recording method, device, equipment and medium based on neural network
CN112052320A (en) Information processing method and device and computer readable storage medium
CN113838466B (en) Speech recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant