WO2023212993A1 - Appliance control method, storage medium, and electronic device - Google Patents

Appliance control method, storage medium, and electronic device

Info

Publication number
WO2023212993A1
WO2023212993A1 (PCT/CN2022/096401)
Authority
WO
WIPO (PCT)
Prior art keywords
target
recognition
sentence
substructure
loss value
Application number
PCT/CN2022/096401
Other languages
French (fr)
Chinese (zh)
Inventor
刘建国
王迪
朱毅
Original Assignee
青岛海尔科技有限公司
海尔智家股份有限公司
Application filed by 青岛海尔科技有限公司 (Qingdao Haier Technology Co., Ltd.) and 海尔智家股份有限公司 (Haier Smart Home Co., Ltd.)
Publication of WO2023212993A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/2803 Home automation networks
    • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
    • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home

Definitions

  • the present application relates to the field of communications, and specifically to a device control method, a storage medium, and an electronic device.
  • NLU (Natural Language Understanding)
  • Embodiments of the present application provide a device control method, a storage medium, and an electronic device to at least solve the technical problem in the related art that two models are required to recognize voice control instructions, resulting in high pressure on the server.
  • a device control method, including: when target audio is obtained, identifying the target audio to obtain a sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; and inputting the sentence to be recognized into a target sentence recognition model to obtain a target meaning and a target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of the sentence, and the second recognition substructure is used to identify the word data in the sentence.
  • the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data.
  • sending the control instruction to the target controlled device according to the target meaning and the target word includes: obtaining the control instruction and instruction sending information according to the target meaning and the target word, wherein the instruction sending information is used to instruct that the control instruction be sent to the target controlled device among the at least one controlled device.
  • obtaining the control instruction and the instruction sending information based on the target meaning and the target word includes: determining the execution authority of the control instruction based on the target meaning; and determining the instruction sending information and the instruction content of the control instruction based on the target word.
  • the method, before acquiring the target audio, includes: acquiring multiple sample sentence data; marking the sentence data in each sample sentence data to obtain multiple marked sample sentence data, wherein each marked sample sentence data includes a marked ideographic identifier and a word identifier, the ideographic identifier is used to mark the ideographic direction of the sentence data, and the word identifier is used to mark at least one word data in the sentence data; determining current sample sentence data from the marked multiple sample sentence data, and determining an initial sentence recognition model, wherein the initial sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of the sentence data, and the second recognition substructure is used to identify the word data in the sentence data; and inputting the current sample sentence data into the first recognition substructure and the second recognition substructure respectively.
  • obtaining the current training loss value based on the first training loss value and the second training loss value includes: integrating the first training loss value and the second training loss value according to a first target weight value and a second target weight value respectively to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, and the second target weight value is the weight value of the second training loss value.
  • the method further includes: when the word identifier of the current sample sentence data includes a target hidden identifier, determining a third training loss value of the target hidden identifier according to a third recognition result output by the second recognition substructure and the word identifier of the current sample sentence data; and integrating the first training loss value, the second training loss value, and the third training loss value according to a first target weight value, a second target weight value, and a third target weight value respectively to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, the second target weight value is the weight value of the second training loss value, and the third target weight value is the weight value of the third training loss value.
  • obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the initial sentence recognition model includes: when the current training loss value does not reach the recognition convergence condition, adjusting the parameters of the first recognition substructure and the second recognition substructure to obtain an adjusted initial sentence recognition model; and obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the adjusted initial sentence recognition model.
  • a device control device, including: an identification module configured to identify the target audio when the target audio is obtained, and obtain the sentence to be recognized corresponding to the target audio.
  • the target audio is used to request control of at least one controlled device;
  • an input module configured to input the sentence to be recognized into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure.
  • the first recognition substructure is used to identify the ideographic direction of a sentence, and the second recognition substructure is used to identify the word data in the sentence.
  • the target sentence recognition model is a model for identifying the ideographic direction and word data of a sentence, obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data;
  • a sending module configured to send a control instruction to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • a computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the above device control method when run.
  • an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above device control method through the computer program.
  • when the target audio is obtained, the target audio is identified and a sentence to be recognized corresponding to the target audio is obtained, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word.
  • the target sentence recognition model includes a first recognition substructure and a second recognition substructure; the first recognition substructure is used to identify the ideographic direction of the sentence, and the second recognition substructure is used to identify the word data in the sentence.
  • the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, and is used to recognize the ideographic direction and word data of the sentence; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • Figure 1 is a hardware structure block diagram of a computer terminal for an optional device control method according to an embodiment of the present application;
  • Figure 2 is a flow chart of an optional device control method according to an embodiment of the present application.
  • Figure 3 is a schematic diagram of an optional device control method according to an embodiment of the present application.
  • Figure 4 is a schematic diagram of another optional device control method according to an embodiment of the present application.
  • Figure 5 is a structural block diagram of an optional device control apparatus according to an embodiment of the present application.
  • a device control method is provided.
  • This device control method is widely applied in whole-house intelligent digital control application scenarios such as smart homes, smart home device ecosystems, and intelligent house ecosystems.
  • the above device control method can be applied to the hardware environment composed of the terminal device 102 and the server 104 as shown in FIG. 1 .
  • the server 104 is connected to the terminal device 102 through the network and can be used to provide services (such as application services, etc.) for the terminal or the client installed on the terminal.
  • a database can be set up on the server or independently from the server.
  • cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data computing services for the server 104.
  • the above-mentioned network may include but is not limited to at least one of the following: wired network, wireless network.
  • the above-mentioned wired network may include but is not limited to at least one of the following: wide area network, metropolitan area network, and local area network.
  • the above-mentioned wireless network may include at least one of the following: Wi-Fi (Wireless Fidelity) and Bluetooth.
  • the terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, or a smart projection device.
  • it may also be a smart TV, smart clothes-drying rack, smart curtains, smart audio-visual equipment, smart sockets, smart audio, smart speakers, smart fresh-air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, smart sweeping robot, smart window-cleaning robot, smart mopping robot, smart air purification equipment, smart steamer, smart microwave oven, smart kitchen appliances, smart purifier, smart water dispenser, smart door lock, etc.
  • Figure 2 is a flow chart of the device control method according to the embodiment of the present application. The process includes the following steps:
  • Step S202 When the target audio is obtained, identify the target audio and obtain the sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;
  • Step S204 Input the sentence to be recognized into the target sentence recognition model to obtain the target meaning and the target word.
  • the target sentence recognition model includes a first recognition substructure and a second recognition substructure.
  • the first recognition substructure is used to identify the ideographic direction of the sentence.
  • the second recognition substructure is used to identify the word data in the sentence.
  • the target sentence recognition model is a model for recognizing the ideographic direction and word data of sentences, obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data;
  • Step S206 Send a control instruction to the target controlled device according to the target meaning and the target word, where at least one controlled device includes the target controlled device.
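Steps S202 to S206 can be sketched as a minimal pipeline. This is a hedged illustration only: `transcribe` and `joint_model` are hypothetical stand-ins for the speech-to-text step and the target sentence recognition model, not the patent's actual implementation.

```python
# Minimal sketch of steps S202-S206 with hypothetical stand-in components.

def transcribe(target_audio):
    # S202: identify the target audio, yielding the sentence to be recognized.
    return "turn on the TV in the living room"

def joint_model(sentence):
    # S204: one model returns both the target meaning (ideographic direction)
    # and the target words (word data) via its two recognition substructures.
    target_meaning = "home_appliance_control"
    target_words = {"device": "TV", "location": "living room", "action": "turn on"}
    return target_meaning, target_words

def control(target_audio):
    # S206: build the control instruction for the target controlled device.
    sentence = transcribe(target_audio)
    meaning, words = joint_model(sentence)
    return {
        "target_device": f"{words['location']} {words['device']}",
        "instruction": f"{words['action']} {words['device']}",
        "meaning": meaning,
    }
```

The key point of the embodiment is that `joint_model` is a single model producing both outputs, rather than two separately served models.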
  • the user may, but is not limited to, control the controlled device through voice control instructions;
  • the target audio may include, but is not limited to, the audio used to control the controlled device.
  • the target sentence recognition model may include, but is not limited to, two recognition substructures, respectively used to identify the ideographic direction and word data of the sentence.
  • the two recognition substructures may, but are not limited to, be obtained through joint training.
  • the target controlled device and the control instruction sent to the target controlled device can be determined according to the target meaning and the target word.
  • when the target audio is obtained, the target audio is recognized and the sentence to be recognized corresponding to the target audio is obtained, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target words.
  • the target sentence recognition model includes a first recognition substructure and a second recognition substructure; the first recognition substructure is used to identify the ideographic direction of the sentence, and the second recognition substructure is used to identify the word data in the sentence.
  • the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, and is used to identify the ideographic direction and word data of the sentence; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • two substructures are set up in the target sentence recognition model to identify the ideographic direction and the word data respectively, so that a single model can recognize both the intention and the slots of a sentence. This reduces the pressure on the server and solves the technical problem that two separate models are needed to recognize voice control commands, which places heavy pressure on the server.
  • sending a control instruction to the target controlled device according to the target meaning and the target word includes: obtaining the control instruction and instruction sending information according to the target meaning and the target word, wherein the instruction sending information is used to indicate that the control instruction is to be sent to the target controlled device among the at least one controlled device.
  • the intention of the target audio can be determined based on the target meaning, and the target controlled device and the instruction content of the control instruction can be determined based on the target words. For example, if the target audio is "turn on the TV in the living room", then the target meaning is "home appliance control", the target controlled device is "the TV in the living room", and the command content of the control instruction is "turn on the TV in the living room".
  • the controlled device is controlled through voice interaction, which improves the user experience.
  • obtaining the control instruction and instruction sending information based on the target meaning and the target word includes: determining the execution authority of the control instruction based on the target meaning; and determining the instruction sending information and the instruction content of the control instruction based on the target word.
  • the intention of the target audio can be determined based on the target meaning, and the target controlled device and the instruction content of the control instruction can be determined based on the target words. For example, if the target audio is "turn on the air conditioner in the living room", then the target meaning is "home appliance control", the target controlled device is "the air conditioner in the living room", and the command content of the control instruction is "turn on the air conditioner in the living room".
  • the controlled device is controlled through voice interaction, which improves the user experience.
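The derivation above can be illustrated with a short sketch: execution authority is determined from the target meaning, while the instruction sending information and instruction content come from the target words. All names below (`AUTHORIZED_MEANINGS`, the slot keys) are illustrative assumptions, not the patent's API.

```python
# Hedged sketch: authority from the target meaning, routing and content
# from the target words. Names are illustrative assumptions.

AUTHORIZED_MEANINGS = {"home_appliance_control"}

def build_instruction(target_meaning, target_words):
    # Execution authority is determined from the target meaning.
    if target_meaning not in AUTHORIZED_MEANINGS:
        raise PermissionError("meaning not authorized to issue control instructions")
    # Instruction sending information: which controlled device receives it.
    send_to = f"{target_words['location']} {target_words['device']}"
    # Instruction content: what the target controlled device should do.
    content = (f"{target_words['action']} the {target_words['device']} "
               f"in the {target_words['location']}")
    return {"send_to": send_to, "content": content}

instr = build_instruction(
    "home_appliance_control",
    {"device": "air conditioner", "location": "living room", "action": "turn on"},
)
```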
  • the method, before acquiring the target audio, includes: acquiring multiple sample sentence data; and marking the sentence data in each sample sentence data to obtain multiple marked sample sentence data, wherein each marked sample sentence data includes a marked ideographic identifier and a word identifier.
  • the ideographic identifier is used to mark the ideographic direction of the sentence data, and the word identifier is used to mark at least one word data in the sentence data; the current sample sentence data is determined from the marked multiple sample sentence data, and an initial sentence recognition model is determined, wherein the initial sentence recognition model includes a first recognition substructure and a second recognition substructure; the first recognition substructure is used to identify the ideographic direction of the sentence data, and the second recognition substructure is used to identify the word data in the sentence data.
  • the current sample sentence data is input into the first recognition substructure and the second recognition substructure respectively to obtain the first recognition result output by the first recognition substructure and the second recognition result output by the second recognition substructure; the first training loss value of the first recognition substructure is determined according to the first recognition result and the ideographic identifier of the current sample sentence data; the second training loss value of the second recognition substructure is determined according to the second recognition result and the word identifier of the current sample sentence data; and the current training loss value is obtained according to the first training loss value and the second training loss value, where the current training loss value is used to determine the training status of the initial sentence recognition model.
  • when the current training loss value does not reach the recognition convergence condition, the next sample sentence data is obtained from the marked multiple sample sentence data and input into the initial sentence recognition model; when the current training loss value reaches the recognition convergence condition, the initial sentence recognition model is determined as the target sentence recognition model.
  • multiple sample sentence data can be obtained in advance from a local database or server, where the types of the multiple sample sentence data can be the same or different, No limitation is made here.
  • marking the sentence data in each sample sentence data may include adding an ideographic identifier in front of each sample sentence data.
  • the ideographic identifier may be, for example, the identifier CLS, which is not limited here.
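A marked sample might look like the following. This assumes a BIO-style scheme for the word identifiers and a CLS token carrying the ideographic identifier; the text above fixes only the CLS marker, so the tagging scheme and label names here are illustrative assumptions.

```python
# Illustrative marked sample sentence: the CLS token carries the ideographic
# identifier (intent), and each token gets a word identifier (slot tag).
sample = {
    "tokens": ["CLS", "turn", "on", "the", "living", "room", "TV"],
    # ideographic identifier: marks the ideographic direction of the sentence
    "ideographic_identifier": "home_appliance_control",
    # word identifiers: mark the word data, one tag per token (BIO assumed)
    "word_identifiers": ["O", "B-action", "I-action", "O",
                         "B-location", "I-location", "B-device"],
}
assert len(sample["tokens"]) == len(sample["word_identifiers"])
```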
  • the first identification substructure and the second identification substructure may be different substructures, or may be partially the same and partially different substructures, which are not limited here.
  • after the first recognition substructure outputs the first recognition result, the first training loss value of the first recognition substructure may be calculated based on the first recognition result and the real or expected ideographic identifier of the current sample sentence data, that is, the difference between the first recognition result and the real or expected ideographic identifier; after the second recognition substructure outputs the second recognition result, the second training loss value of the second recognition substructure may be calculated based on the second recognition result and the real or expected word identifier of the current sample sentence data, that is, the difference between the second recognition result and the real or expected word identifier.
  • different weight values can be used to integrate the first training loss value and the second training loss value respectively to obtain the training loss value of the entire sentence recognition model, which is not limited here.
  • the convergence condition of the sentence recognition model can be set in advance; after obtaining the training loss value of the entire sentence recognition model, the overall training loss value is compared with the convergence condition to determine whether it meets the convergence condition; if not, training of the sentence recognition model continues; if it is met, the target sentence recognition model is obtained.
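The joint-training step described above can be sketched as follows. The loss functions here are toy stand-ins (real substructures would be neural networks with differentiable losses), and the convergence threshold is illustrative.

```python
# Toy sketch of one joint-training step: compute the first (intent) and
# second (slot) losses, integrate them by weight values, and check a preset
# recognition convergence condition.

def intent_loss(predicted, expected):
    # First training loss value: 0/1 difference from the ideographic identifier.
    return 0.0 if predicted == expected else 1.0

def slot_loss(predicted, expected):
    # Second training loss value: fraction of word identifiers that differ.
    return sum(p != e for p, e in zip(predicted, expected)) / len(expected)

CONVERGENCE_THRESHOLD = 0.1  # illustrative recognition convergence condition

def training_step(pred_intent, gold_intent, pred_slots, gold_slots,
                  w1=0.5, w2=0.5):
    loss1 = intent_loss(pred_intent, gold_intent)
    loss2 = slot_loss(pred_slots, gold_slots)
    current_loss = w1 * loss1 + w2 * loss2  # integrate by the weight values
    converged = current_loss <= CONVERGENCE_THRESHOLD
    return current_loss, converged
```

When `converged` is false, training would continue on the next sample after a parameter adjustment, as described below.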
  • a single trained model can be used to simultaneously identify the intention and the slots of the sentence, thereby reducing the training cost.
  • obtaining the current training loss value according to the first training loss value and the second training loss value includes: integrating the first training loss value and the second training loss value according to the first target weight value and the second target weight value respectively to obtain the current training loss value, where the first target weight value is the weight value of the first training loss value, and the second target weight value is the weight value of the second training loss value.
  • different weight values can be used to integrate the first training loss value and the second training loss value respectively to obtain the training loss value of the entire sentence recognition model, which is not limited here.
  • the two weight values can be adjusted differently for different training tasks.
  • the method further includes: when the word identifier of the current sample sentence data includes a target hidden identifier, determining the third training loss value of the target hidden identifier according to the third recognition result output by the second recognition substructure and the word identifier of the current sample sentence data; and integrating the first training loss value, the second training loss value, and the third training loss value according to the first target weight value, the second target weight value, and the third target weight value respectively to obtain the current training loss value, where the first target weight value is the weight value of the first training loss value, the second target weight value is the weight value of the second training loss value, and the third target weight value is the weight value of the third training loss value.
  • the sentence recognition model is used to predict the word identifier corresponding to the hidden identifier, and the third training loss value of the hidden identifier can be calculated based on the recognition result of the sentence recognition model and the real or expected word identifier corresponding to the hidden identifier; furthermore, different weight values can be used to integrate the first, second, and third training loss values respectively to obtain the overall training loss value of the sentence recognition model, which is not limited here.
  • for example, the first target weight value may be the weight value of the first training loss value Loss 1, the second target weight value the weight value of the second training loss value Loss 2, and the third target weight value the weight value of the third training loss value Loss 3.
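The three-way weighted integration can be sketched as below: when the sample's word identifiers include a target hidden (masked) identifier, a third loss for predicting it is added. The weight values `w1`, `w2`, `w3` are illustrative, not values fixed by the text above.

```python
# Sketch of the weighted loss integration: Loss 1 (intent), Loss 2 (slots),
# and, when a target hidden identifier is present, Loss 3 (masked word
# identifier prediction). Weight values are illustrative.

def integrate_losses(loss1, loss2, loss3=None, w1=0.4, w2=0.4, w3=0.2):
    current = w1 * loss1 + w2 * loss2
    if loss3 is not None:  # sample contained a target hidden identifier
        current += w3 * loss3
    return current
```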
  • obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the initial sentence recognition model includes: when the current training loss value does not reach the recognition convergence condition, adjusting the parameters of the first recognition substructure and the second recognition substructure to obtain an adjusted initial sentence recognition model; and obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the adjusted initial sentence recognition model.
  • when the current training loss value does not reach the recognition convergence condition, the parameters of the first recognition substructure and the second recognition substructure can be adjusted so that the current training loss value reaches the recognition convergence condition as soon as possible. The parameters of the two substructures can be adjusted simultaneously or individually: in one parameter adjustment, the parameters of the first and second recognition substructures can be adjusted at the same time, or adjusted sequentially, that is, the parameters of the first recognition substructure are adjusted this time and the parameters of the second recognition substructure next time, which is not limited here.
  • In this way, the convergence speed of the model can be improved.
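The convergence-checking loop described above can be sketched as follows; a toy model with two scalar parameters stands in for the real recognition model, and all names here are illustrative rather than taken from the patent:

```python
class ToyModel:
    """Stand-in for the sentence recognition model: w1 and w2 play the roles
    of the first and second recognition substructures' parameters."""
    def __init__(self):
        self.w1, self.w2 = 4.0, -3.0

    def loss(self, x):
        return (self.w1 * x) ** 2 + (self.w2 * x) ** 2

    def step(self, x, lr=0.1):
        # adjust the parameters of both substructures (here simultaneously;
        # the embodiment also allows alternating updates)
        self.w1 -= lr * 2 * x * (self.w1 * x)
        self.w2 -= lr * 2 * x * (self.w2 * x)


def train(model, samples, eps=1e-3, max_rounds=200):
    for _ in range(max_rounds):
        for x in samples:
            if model.loss(x) < eps:   # recognition convergence condition met
                return model          # this becomes the target model
            model.step(x)             # otherwise adjust, then take next sample
    return model
```

The early return models "determine the initial model to be the target model once the convergence condition is reached"; until then, each sample triggers a parameter adjustment.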
  • the method includes: performing a data preprocessing operation on the multiple sample sentence data to obtain processed multiple sample sentence data, wherein the data preprocessing operation includes at least one of the following: converting full-width characters to half-width, converting uppercase numerals to lowercase numerals, converting uppercase letters to lowercase letters, removing emoticons, word segmentation, and stop-word filtering.
  • after obtaining the multiple sample sentence data, data preprocessing operations can first be performed on the sample sentence data, mainly including: converting full-width characters to half-width, converting uppercase numerals to lowercase numerals, converting uppercase letters to lowercase letters, removing emoticons, word segmentation, and stop-word filtering.
  • word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain specifications; in English writing, spaces serve as natural delimiters between words, and each word can also be regarded as part of a sequence, which is not limited here.
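A minimal sketch of these preprocessing steps in Python; the stop-word list and emoji code-point range are illustrative, and conversion of Chinese uppercase numerals (e.g. 壹 to 一) is omitted for brevity:

```python
import re
import unicodedata

STOPWORDS = {"the", "a", "an"}  # illustrative stop-word list

def preprocess(text):
    # full-width -> half-width (NFKC also folds full-width letters/digits)
    text = unicodedata.normalize("NFKC", text)
    # uppercase letters -> lowercase letters
    text = text.lower()
    # remove emoticons (a common emoji code-point range; not exhaustive)
    text = re.sub(r"[\U0001F300-\U0001FAFF]", "", text)
    # naive whitespace word segmentation; Chinese text needs a real segmenter
    tokens = text.split()
    # stop-word filtering
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("Turn ON the ＡＣ 😀")` folds the full-width "ＡＣ", lowercases, drops the emoji, and filters "the".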
  • This embodiment provides a model structure of a sentence recognition model, as shown in Figure 3.
  • Word Embedding maps sparse one-hot encodings into dense vectors.
  • PE t,2i = sin(t / 1000^(2i/d))
  • PE t,2i+1 = cos(t / 1000^(2i/d))
  • t is the absolute position of the text
  • d represents the vector dimension of each text
  • i represents the index of the dimension
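The two sinusoidal formulas above can be computed directly; this sketch assumes d is even and keeps the base as printed in the text:

```python
import math

def positional_encoding(t, d):
    """Sinusoidal position vector for absolute text position t and vector
    dimension d (d assumed even). Base 1000 follows the formula as printed;
    the standard Transformer uses 10000."""
    pe = []
    for i in range(d // 2):
        angle = t / (1000 ** (2 * i / d))
        pe.append(math.sin(angle))  # dimension index 2i
        pe.append(math.cos(angle))  # dimension index 2i + 1
    return pe
```

At position t = 0 every sine component is 0 and every cosine component is 1, regardless of the base.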
  • the feature extraction layer is shown in Figure 4. It uses the self-attention (Self-Attention) mechanism, defining three matrices K, Q, and V.
  • the input vector is multiplied by the three matrices K, Q, and V respectively to obtain three corresponding feature vectors K 1 , Q 1 , and V 1 ; the three feature vectors are then brought into the formula Attention = softmax(Q 1 K 1 ^T / √d) · V 1 , where T represents the transpose of the matrix, to obtain the attention weights we need.
  • 12 groups, that is, 12 K, 12 Q, and 12 V matrices, can be defined to better extract features.
  • Each repetition of this step is called one layer.
  • This model has four such structures, that is, four layers.
  • a residual network is also added to compensate for information loss, that is, the input of the previous layer is added to the output of this layer.
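A single attention head with the residual connection can be sketched in pure Python as follows; this is one head of the twelve, and the multi-head concatenation and feed-forward sublayer are omitted:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a residual connection.
    X is a list of token vectors; Wq/Wk/Wv are the Q, K, V projections."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d = len(Q[0])
    # attention weights: softmax(Q K^T / sqrt(d))
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, V)
    # residual: add the layer input to the layer output
    return [[o + x for o, x in zip(orow, xrow)]
            for orow, xrow in zip(out, X)]
```

Stacking four such layers gives the four-layer structure described above.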
  • an identifier CLS is added to represent the sentence.
  • the features output by the feature extraction layer of the sentence recognition model at the CLS position are brought into a feedforward neural network (FeedForward) layer, which is then connected to a SigMoid layer with as many outputs as there are categories; the final output is put into the cross-entropy loss function (Cross Entropy Loss) to obtain LOSS 1 .
  • the features output by the feature extraction layer of the sentence recognition model for each word in the sentence are brought into a conditional random field (CRF), and the highest-scoring predicted sequence and LOSS 2 are obtained.
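The CRF's highest-scoring sequence can be decoded with the standard Viterbi algorithm; a minimal sketch, where the emission and transition scores are assumed to be given and the loss computation is omitted:

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path and its score for a linear-chain
    CRF. emissions[t][j] scores tag j at step t; transitions[i][j] scores
    moving from tag i to tag j."""
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + em[j])
        score = new_score
        backpointers.append(ptr)
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[best_last]
```

Dynamic programming over tags keeps decoding linear in sentence length rather than exponential.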
  • Loss total = α·Loss 1 + β·Loss 2 + γ·Loss 3 , where α, β, and γ are hyperparameters that need to be tuned, with different settings for different tasks.
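The weighted combination is a plain linear blend of the three head losses, with the weights as ordinary floats:

```python
def total_loss(loss1, loss2, loss3, alpha, beta, gamma):
    # Loss_total = alpha * Loss1 (intent head, cross entropy)
    #            + beta  * Loss2 (slot head, CRF)
    #            + gamma * Loss3 (masked-token head)
    # alpha, beta, gamma are hyperparameters tuned per task
    return alpha * loss1 + beta * loss2 + gamma * loss3
```

Raising one weight steers training toward the corresponding subtask, which is why the weights must be re-tuned when the task mix changes.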
  • the method according to the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the related art, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions to cause a terminal device (which can be a mobile phone, a computer, a server, a network device, etc.) to execute the methods of the various embodiments of the present application.
  • FIG. 5 is a structural block diagram of an equipment control device according to an embodiment of the present application; as shown in Figure 5, it includes:
  • the identification module 501 is configured to, when the target audio is obtained, identify the target audio and obtain the sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;
  • the input module 502 is configured to input the sentence to be recognized into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data;
  • the sending module 503 is configured to send control instructions to the target controlled device according to the target meaning and the target word, wherein at least one controlled device includes the target controlled device.
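The three modules can be sketched as one pipeline; every class, function, and key name below is hypothetical, chosen only to illustrate the module structure:

```python
class DeviceController:
    def __init__(self, recognize_audio, sentence_model, devices):
        self.recognize_audio = recognize_audio  # module 501: audio -> sentence
        self.sentence_model = sentence_model    # module 502: one model, two heads
        self.devices = devices                  # name -> controlled device

    def handle(self, audio):
        sentence = self.recognize_audio(audio)
        # a single forward pass yields both the meaning (intent) and the
        # words (slots), instead of calling two separate models
        meaning, words = self.sentence_model(sentence)
        target = self.devices[words["device"]]  # module 503: route the command
        return target(meaning, words)
```

Because module 502 returns intent and slots together, the server runs one model per utterance instead of two.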
  • When the target audio is obtained, the target audio is recognized to obtain the sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • With the above technical solution, two substructures are set up in the target sentence recognition model to identify the ideographic direction and the word data respectively, so that a single model can identify both the intent and the slots of a sentence, thereby reducing the pressure on the server and solving the technical problem that the need for two models to recognize voice control commands puts great pressure on the server.
  • Embodiments of the present application also provide a storage medium that includes a stored program, wherein any of the above methods is executed when the program is run.
  • the above-mentioned storage medium may be configured to store program codes for performing the following steps:
  • the target sentence recognition model includes a first recognition substructure and a second recognition substructure, wherein the first recognition substructure is used to identify the ideographic direction of the sentence, the second recognition substructure is used to identify the word data in the sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of the sentence and the word data;
  • S3: Send a control instruction to the target controlled device according to the target meaning and the target word, where the at least one controlled device includes the target controlled device.
  • When the target audio is obtained, the target audio is recognized to obtain the sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • With the above technical solution, two substructures are set up in the target sentence recognition model to identify the ideographic direction and the word data respectively, so that a single model can identify both the intent and the slots of a sentence, thereby reducing the pressure on the server and solving the technical problem that the need for two models to recognize voice control commands puts great pressure on the server.
  • An embodiment of the present application also provides an electronic device, including a memory and a processor.
  • a computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
  • the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.
  • the above-mentioned processor may be configured to perform the following steps through a computer program:
  • the target sentence recognition model includes a first recognition substructure and a second recognition substructure, wherein the first recognition substructure is used to identify the ideographic direction of the sentence, the second recognition substructure is used to identify the word data in the sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of the sentence and the word data;
  • S3: Send a control instruction to the target controlled device according to the target meaning and the target word, where the at least one controlled device includes the target controlled device.
  • When the target audio is obtained, the target audio is recognized to obtain the sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
  • With the above technical solution, two substructures are set up in the target sentence recognition model to identify the ideographic direction and the word data respectively, so that a single model can identify both the intent and the slots of a sentence, thereby reducing the pressure on the server and solving the technical problem that the need for two models to recognize voice control commands puts great pressure on the server.
  • the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a mobile hard disk, a magnetic disk, an optical disk, and other media that can store program code.
  • the above modules or steps of the present application can be implemented using general-purpose computing devices; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. Optionally, they can be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be performed in a sequence different from that described herein, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. As such, the present application is not limited to any specific combination of hardware and software.
  • When the target audio is obtained, the target audio is recognized to obtain the sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.


Abstract

An appliance control method, a storage medium, and an electronic device. The appliance control method comprises: if a target audio is obtained, recognizing the target audio to obtain a statement to be recognized corresponding to the target audio, wherein the target audio is used for requesting to control at least one controlled appliance (S202); inputting the statement to be recognized into a target statement recognition model to obtain a target intention and a target word (S204); and sending a control instruction to a target controlled appliance according to the target intention and the target word, wherein the at least one controlled appliance comprises the target controlled appliance (S206). The method solves the technical problem of large pressure on a server caused by the requirement of two models for the recognition of a voice control instruction.

Description

Equipment control method, storage medium and electronic device
Technical Field
The present application relates to the field of communications, and specifically, to an equipment control method, a storage medium, and an electronic device.
Background
Currently, with the development of smart homes, more and more smart home devices can be controlled by voice. In task-oriented dialogue, intent recognition and slot filling are problems that the natural language understanding (Natural Language Understanding, NLU) module cannot avoid.
In the related art, traditional task-oriented dialogue requires at least two models to identify the intent and the slots respectively. That is, the user's voice control instruction needs to be input into two models. As the number of smart devices increases, the server will be put under excessive pressure.
For the technical problem in the related art that the need for two models to recognize voice control instructions causes great pressure on the server, no effective solution has yet been proposed.
Summary of the Invention
Embodiments of the present application provide an equipment control method, a storage medium, and an electronic device, to at least solve the technical problem in the related art that the need for two models to recognize voice control instructions causes great pressure on the server.
According to one embodiment of the present application, an equipment control method is provided, including: when target audio is obtained, recognizing the target audio to obtain a sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; inputting the sentence to be recognized into a target sentence recognition model to obtain a target meaning and a target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; and sending a control instruction to a target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
In an exemplary embodiment, sending the control instruction to the target controlled device according to the target meaning and the target word includes: obtaining the control instruction and instruction sending information according to the target meaning and the target word, wherein the instruction sending information is used to instruct that the control instruction be sent to the target controlled device among the at least one controlled device.
In an exemplary embodiment, obtaining the control instruction and the instruction sending information according to the target meaning and the target word includes: determining the execution authority of the control instruction according to the target meaning; and determining the instruction sending information and the instruction content of the control instruction according to the target word.
In an exemplary embodiment, before acquiring the target audio, the method includes: acquiring multiple sample sentence data; marking the sentence data in each of the sample sentence data to obtain the marked multiple sample sentence data, wherein each marked sample sentence data includes a marked ideographic identifier and word identifiers, the ideographic identifier is used to mark the ideographic direction of the sentence data, and the word identifier is used to mark at least one word data in the sentence data; determining current sample sentence data from the marked multiple sample sentence data, and determining an initial sentence recognition model, wherein the initial sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of the sentence data, and the second recognition substructure is used to identify the word data in the sentence data; inputting the current sample sentence data into the first recognition substructure and the second recognition substructure respectively, to obtain a first recognition result output by the first recognition substructure and a second recognition result output by the second recognition substructure; determining a first training loss value of the first recognition substructure according to the first recognition result and the ideographic identifier of the current sample sentence data; determining a second training loss value of the second recognition substructure according to the second recognition result and the word identifiers of the current sample sentence data; obtaining a current training loss value according to the first training loss value and the second training loss value, wherein the current training loss value is used to determine the training status of the initial sentence recognition model; when the current training loss value does not reach the recognition convergence condition, obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the initial sentence recognition model; and when the current training loss value reaches the recognition convergence condition, determining the initial sentence recognition model to be the target sentence recognition model.
In an exemplary embodiment, obtaining the current training loss value according to the first training loss value and the second training loss value includes: integrating the first training loss value and the second training loss value according to a first target weight value and a second target weight value respectively, to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, and the second target weight value is the weight value of the second training loss value.
In an exemplary embodiment, the method further includes: when the word identifiers of the current sample sentence data include a target hidden identifier, determining a third training loss value of the target hidden identifier according to a third recognition result output by the second recognition substructure and the word identifiers of the current sample sentence data; and integrating the first training loss value, the second training loss value, and the third training loss value according to a first target weight value, a second target weight value, and a third target weight value respectively, to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, the second target weight value is the weight value of the second training loss value, and the third target weight value is the weight value of the third training loss value.
In an exemplary embodiment, when the current training loss value does not reach the recognition convergence condition, obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the initial sentence recognition model includes: when the current training loss value does not reach the recognition convergence condition, adjusting the parameters of the first recognition substructure and the second recognition substructure to obtain an adjusted initial sentence recognition model; and obtaining the next sample sentence data from the marked multiple sample sentence data and inputting it into the adjusted initial sentence recognition model.
According to another embodiment of the present application, an equipment control device is also provided, including: a recognition module, configured to, when target audio is obtained, recognize the target audio to obtain a sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; an input module, configured to input the sentence to be recognized into a target sentence recognition model to obtain a target meaning and a target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; and a sending module, configured to send a control instruction to a target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device.
According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute the above equipment control method when running.
According to yet another aspect of the embodiments of the present application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above equipment control method through the computer program.
In the embodiments of the present application, when target audio is obtained, the target audio is recognized to obtain the sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into the target sentence recognition model to obtain the target meaning and the target word, wherein the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to identify the ideographic direction of a sentence, the second recognition substructure is used to identify the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure using multiple sample sentence data, used to identify the ideographic direction of a sentence and the word data; a control instruction is sent to the target controlled device according to the target meaning and the target word, wherein the at least one controlled device includes the target controlled device. With the above technical solution, two substructures are set up in the target sentence recognition model to identify the ideographic direction and the word data respectively, so that a single model can identify both the intent and the slots of a sentence, thereby reducing the pressure on the server and solving the technical problem that the need for two models to recognize voice control instructions puts great pressure on the server.
附图说明Description of the drawings
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
为了更清楚地说明本申请实施例或相关技术中的技术方案,下面将对实施例或相关技术描述中所需要使用的附图作简单地介绍,显而易见地,对于本领域普通技术人员而言,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application or related technologies, the following will briefly introduce the drawings needed to describe the embodiments or related technologies. Obviously, for those of ordinary skill in the art, Other drawings can also be obtained based on these drawings without incurring any creative effort.
图1是根据本申请实施例的一种可选的设备控制方法的计算机终端的硬件结构框图;Figure 1 is a hardware structure block diagram of a computer terminal according to an optional device control method according to an embodiment of the present application;
图2是根据本申请实施例的一种可选的设备控制方法的流程图;Figure 2 is a flow chart of an optional device control method according to an embodiment of the present application;
图3是根据本申请实施例的一种可选的设备控制方法的示意图;Figure 3 is a schematic diagram of an optional device control method according to an embodiment of the present application;
图4是根据本申请实施例的另一种可选的设备控制方法的示意图;Figure 4 is a schematic diagram of another optional device control method according to an embodiment of the present application;
图5是根据本申请实施例的一种可选的设备控制装置的结构框图。Figure 5 is a structural block diagram of an optional equipment control device according to an embodiment of the present application.
具体实施方式Detailed Description of Embodiments
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请保护的范围。In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first", "second", etc. in the description and claims of this application and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or apparatus.
根据本申请实施例的一个方面,提供了一种设备控制方法。该设备控制方法广泛应用于智慧家庭(Smart Home)、智能家居、智能家用设备生态、智慧住宅(Intelligence House)生态等全屋智能数字化控制应用场景。可选地,在本实施例中,上述设备控制方法可以应用于如图1所示的由终端设备102和服务器104所构成的硬件环境中。如图1所示,服务器104通过网络与终端设备102进行连接,可用于为终端或终端上安装的客户端提供服务(如应用服务等),可在服务器上或独立于服务器设置数据库,用于为服务器104提供数据存储服务,可在服务器上或独立于服务器配置云计算和/或边缘计算服务,用于为服务器104提供数据运算服务。According to an aspect of the embodiments of the present application, a device control method is provided. This device control method is widely used in whole-house intelligent digital control application scenarios such as the smart family (Smart Home), smart home, smart household device ecology, and intelligent residence (Intelligence House) ecology. Optionally, in this embodiment, the above device control method can be applied to the hardware environment composed of the terminal device 102 and the server 104 as shown in Figure 1. As shown in Figure 1, the server 104 is connected to the terminal device 102 through a network and can be used to provide services (such as application services) for the terminal or for a client installed on the terminal; a database can be set up on the server or independently of the server to provide data storage services for the server 104; and cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data computing services for the server 104.
上述网络可以包括但不限于以下至少之一:有线网络,无线网络。上述有线网络可以包括但不限于以下至少之一:广域网,城域网,局域网,上述无线网络可以包括但不限于以下至少之一:WIFI(Wireless Fidelity,无线保真),蓝牙。终端设备102可以并不限定于为PC、手机、平板电脑、智能空调、智能烟机、智能冰箱、智能烤箱、智能炉灶、智能洗衣机、智能热水器、智能洗涤设备、智能洗碗机、智能投影设备、智能电视、智能晾衣架、智能窗帘、智能影音、智能插座、智能音响、智能音箱、智能新风设备、智能厨卫设备、智能卫浴设备、智能扫地机器人、智能擦窗机器人、智能拖地机器人、智能空气净化设备、智能蒸箱、智能微波炉、智能厨宝、智能净化器、智能饮水机、智能门锁等。The above-mentioned network may include but is not limited to at least one of the following: wired network, wireless network. The above-mentioned wired network may include but is not limited to at least one of the following: wide area network, metropolitan area network, and local area network. The above-mentioned wireless network may include at least one of the following: WIFI (Wireless Fidelity, Wireless Fidelity), Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet, a smart air conditioner, a smart hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, a smart washing equipment, a smart dishwasher, or a smart projection device. , smart TV, smart clothes drying rack, smart curtains, smart audio and video, smart sockets, smart audio, smart speakers, smart fresh air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, smart sweeping robot, smart window cleaning robot, smart mopping robot, Smart air purification equipment, smart steamers, smart microwave ovens, smart kitchen appliances, smart purifiers, smart water dispensers, smart door locks, etc.
在本实施例中提供了一种设备控制方法,图2是根据本申请实施例的设备控制方法的流程图,该流程包括如下步骤:This embodiment provides a device control method. Figure 2 is a flow chart of the device control method according to the embodiment of the present application. The process includes the following steps:
步骤S202,在获取到目标音频的情况下,识别目标音频,得到目标音频对应的待识别语句,其中,目标音频用于请求控制至少一个受控设备;Step S202: When the target audio is obtained, identify the target audio and obtain the sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;
步骤S204,将待识别语句输入至目标语句识别模型,得到目标表意和目标词语,其中,目标语句识别模型包括第一识别子结构以及第二识别子结构,第一识别子结构用于识别语句的表意指向,第二识别子结构用于识别语句中的词语数据,目标语句识别模型为利用多个样本语句数据对第一识别子结构以及第二识别子结构进行联合训练得到的、用于识别语句的表意指向以及词语数据的模型;Step S204: input the sentence to be recognized into a target sentence recognition model to obtain a target meaning and a target word, where the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure with multiple sample sentence data, for recognizing the ideographic direction and word data of a sentence;
步骤S206,根据目标表意和目标词语向目标受控设备发送控制指令,其中,至少一个受控设备包括目标受控设备。Step S206: Send a control instruction to the target controlled device according to the target meaning and the target word, where at least one controlled device includes the target controlled device.
可选地,在本实施例中,用户可以但不限于通过语音控制指令来控制受控设备;目标音频可以包括但不限于控制受控设备的音频。Optionally, in this embodiment, the user may, but is not limited to, control the controlled device through voice control instructions; the target audio may include, but is not limited to, the audio used to control the controlled device.
可选地,在本实施例中,目标语句识别模型可以包括但不限于包括两个识别子结构,分别用于识别语句的表意指向及词语数据,两个识别子结构可以但不限于通过联合训练得到。Optionally, in this embodiment, the target sentence recognition model may include, but is not limited to, two recognition substructures, respectively used to identify the ideographic direction and word data of the sentence. The two recognition substructures may, but are not limited to, be trained through joint training. get.
可选地,在本实施例中,根据目标表意和目标词语可以确定目标受控设备及发送至目标受控设备的控制指令。Optionally, in this embodiment, the target controlled device and the control instruction sent to the target controlled device can be determined according to the target meaning and the target word.
通过本申请实施例提供的方案,在获取到目标音频的情况下,识别目标音频,得到目标音频对应的待识别语句,其中,目标音频用于请求控制至少一个受控设备;将待识别语句输入至目标语句识别模型,得到目标表意和目标词语,其中,目标语句识别模型包括第一识别子结构以及第二识别子结构,第一识别子结构用于识别语句的表意指向,第二识别子结构用于识别语句中的词语数据,目标语句识别模型为利用多个样本语句数据对第一识别子结构以及第二识别子结构进行联合训练得到的、用于识别语句的表意指向以及词语数据的模型;根据目标表意和目标词语向目标受控设备发送控制指令,其中,至少一个受控设备包括目标受控设备。采用上述技术方案,通过在目标语句识别模型中设置两个子结构,来分别识别表意指向和词语数据,从而利用一个模型可以同时识别语句的意图和槽位,进而减小了服务器的压力,解决了由于需要两个模型来识别语音控制指令,导致服务器压力较大的技术问题。Through the solution provided by the embodiments of the present application, when the target audio is obtained, the target audio is recognized to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into a target sentence recognition model to obtain a target meaning and a target word, where the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize the word data in a sentence, and the target sentence recognition model is a model obtained by jointly training the first recognition substructure and the second recognition substructure with multiple sample sentence data, for recognizing the ideographic direction and word data of a sentence; a control instruction is sent to a target controlled device according to the target meaning and the target word, where the at least one controlled device includes the target controlled device. With the above technical solution, two substructures are set up in the target sentence recognition model to recognize the ideographic direction and the word data respectively, so that a single model can recognize the intent and the slots of a sentence at the same time, which reduces the pressure on the server and solves the technical problem that the server is under heavy pressure because two models are needed to recognize voice control instructions.
在一个示例性实施例中,根据目标表意和目标词语向目标受控设备发送控制指令,包括:根据目标表意和目标词语获取控制指令以及指令发送信息,其中,指令发送信息用于指示将控制指令发送至至少一个受控设备中的目标受控设备。In an exemplary embodiment, sending a control instruction to the target controlled device according to the target meaning and the target word includes: obtaining the control instruction and instruction sending information according to the target meaning and the target word, where the instruction sending information is used to indicate that the control instruction is to be sent to the target controlled device among the at least one controlled device.
可选地,在本实施例中,根据目标表意可以确定目标音频的意图,根据目标词语可以确定目标受控设备以及控制指令的指令内容。例如,目标音频为“打开客厅的电视”,那么,目标表意为“家电控制”,目标受控设备为“客厅电视”,控制指令的指令内容为“打开客厅电视”。Optionally, in this embodiment, the intent of the target audio can be determined from the target meaning, and the target controlled device and the instruction content of the control instruction can be determined from the target word. For example, if the target audio is "turn on the TV in the living room", then the target meaning is "home appliance control", the target controlled device is "the living room TV", and the instruction content of the control instruction is "turn on the living room TV".
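As a toy illustration of the mapping just described, the sketch below turns a recognized intent and slot values into a device control command; all names (the intent string, slot keys, command fields) are hypothetical stand-ins for illustration, not identifiers from this application.

```python
# Hypothetical sketch: map a recognized intent plus slot values
# to a control command for one device (all names are illustrative).
def build_command(intent: str, slots: dict) -> dict:
    """Turn model outputs (intent + slots) into a control instruction."""
    if intent != "appliance_control":
        raise ValueError(f"unsupported intent: {intent}")
    return {
        "device": f'{slots["room"]}_{slots["appliance"]}',  # e.g. living_room_tv
        "action": slots["action"],                          # e.g. turn_on
    }

cmd = build_command(
    "appliance_control",
    {"room": "living_room", "appliance": "tv", "action": "turn_on"},
)
```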
通过本申请实施例提供的方案,通过语音交互来控制受控设备,提高了用户体验。Through the solution provided by the embodiments of this application, the controlled device is controlled through voice interaction, which improves the user experience.
在一个示例性实施例中,根据目标表意和目标词语获取控制指令以及指令发送信息,包括:根据目标表意确定控制指令的执行权限;以及,根据目标词语确定指令发送信息以及控制指令的指令内容。In an exemplary embodiment, obtaining the control instruction and instruction sending information based on the target meaning and the target word includes: determining the execution authority of the control instruction based on the target meaning; and determining the instruction sending information and the instruction content of the control instruction based on the target word.
可选地,在本实施例中,根据目标表意可以确定目标音频的意图,根据目标词语可以确定目标受控设备以及控制指令的指令内容。例如,目标音频为“打开客厅的空调”,那么,目标表意为“家电控制”,目标受控设备为“客厅空调”,控制指令的指令内容为“打开客厅空调”。Optionally, in this embodiment, the intention of the target audio can be determined based on the target meaning, and the target controlled device and the instruction content of the control instruction can be determined based on the target words. For example, if the target audio is "turn on the air conditioner in the living room", then the target meaning is "home appliance control", the target controlled device is "the air conditioner in the living room", and the command content of the control instruction is "turn on the air conditioner in the living room".
通过本申请实施例提供的方案,通过语音交互来控制受控设备,提高了用户体验。Through the solution provided by the embodiments of this application, the controlled device is controlled through voice interaction, which improves the user experience.
在一个示例性实施例中,在获取所述目标音频之前,包括:获取多个样本语句数据;对每个样本语句数据中的语句数据进行标记,得到标记后的多个样本语句数据,其中,每个标记后的样本语句数据中包括标记的表意标识和词语标识,表意标识用于标记语句数据的表意指向,词语标识用于标记语句数据中的至少一个词语数据;从标记后的多个样本语句数据中确定出当前样本语句数据,并确定初始语句识别模型,其中,初始语句识别模型包括第一识别子结构以及第二识别子结构,第一识别子结构用于识别语句数据的表意指向,第二识别子结构用于识别语句数据中的词语数据;将当前样本语句数据分别输入第一识别子结构以及第二识别子结构,得到第一识别子结构输出的第一识别结果、以及第二识别子结构输出的第二识别结果;根据第一识别结果和当前样本语句数据的表意标识,确定第一识别子结构的第一训练损失值;以及,根据第二识别结果和当前样本语句数据的词语标识,确定第二识别子结构的第二训练损失值;根据第一训练损失值以及第二训练损失值,得到当前训练损失值,其中,当前训练损失值用于确定初始语句识别模型的训练状况;在当前训练损失值未达到识别收敛条件的情况下,从标记后的多个样本语句数据中获取下一个样本语句数据输入初始语句识别模型;在当前训练损失值达到识别收敛条件的情况下,确定初始语句识别模型为目标语句识别模型。In an exemplary embodiment, before the target audio is obtained, the method includes: obtaining multiple sample sentence data; marking the sentence data in each sample sentence data to obtain multiple marked sample sentence data, where each marked sample sentence data includes a marked ideographic identifier and word identifiers, the ideographic identifier is used to mark the ideographic direction of the sentence data, and a word identifier is used to mark at least one word data in the sentence data; determining current sample sentence data from the multiple marked sample sentence data, and determining an initial sentence recognition model, where the initial sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of sentence data, and the second recognition substructure is used to recognize the word data in sentence data; inputting the current sample sentence data into the first recognition substructure and the second recognition substructure respectively, to obtain a first recognition result output by the first recognition substructure and a second recognition result output by the second recognition substructure; determining a first training loss value of the first recognition substructure according to the first recognition result and the ideographic identifier of the current sample sentence data; and determining a second training loss value of the second recognition substructure according to the second recognition result and the word identifiers of the current sample sentence data; obtaining a current training loss value according to the first training loss value and the second training loss value, where the current training loss value is used to determine the training status of the initial sentence recognition model; when the current training loss value does not meet the recognition convergence condition, obtaining the next sample sentence data from the multiple marked sample sentence data and inputting it into the initial sentence recognition model; and when the current training loss value meets the recognition convergence condition, determining the initial sentence recognition model as the target sentence recognition model.
可选地,在本实施例中,在对语句识别模型进行训练之前,可以从本地数据库或服务器中预先获取多个样本语句数据,其中,多个样本语句数据的类型可以相同,也可以不同,在此不作限定。Optionally, in this embodiment, before the sentence recognition model is trained, multiple sample sentence data can be obtained in advance from a local database or a server, where the types of the multiple sample sentence data may be the same or different, which is not limited here.
可选地,在本实施例中,对每个样本语句数据中的语句数据进行标记可以包括在每个样本语句数据的前面添加表意标识,例如,表意标识可以为标识符CLS,在此不作限定。Optionally, in this embodiment, marking the sentence data in each sample sentence data may include adding an ideographic identifier in front of each sample sentence data; for example, the ideographic identifier may be the identifier CLS, which is not limited here.
可选地,在本实施例中,第一识别子结构和第二识别子结构可以是不同的子结构,也可以是部分相同、部分不同的子结构,在此不作限定。Optionally, in this embodiment, the first identification substructure and the second identification substructure may be different substructures, or may be partially the same and partially different substructures, which are not limited here.
可选地,在本实施例中,在第一识别子结构输出第一识别结果后,可以根据第一识别结果和当前样本语句数据真实或期望的表意标识,计算第一识别子结构的第一训练损失值,即第一识别结果与当前样本语句数据真实或期望的表意标识之间的差异大小;在第二识别子结构输出第二识别结果后,可以根据第二识别结果和当前样本语句数据真实或期望的词语标识,计算第二识别子结构的第二训练损失值,即第二识别结果与当前样本语句数据真实或期望的词语标识之间的差异大小。Optionally, in this embodiment, after the first recognition substructure outputs the first recognition result, the first training loss value of the first recognition substructure can be calculated according to the first recognition result and the true or expected ideographic identifier of the current sample sentence data, that is, the magnitude of the difference between the first recognition result and the true or expected ideographic identifier of the current sample sentence data; after the second recognition substructure outputs the second recognition result, the second training loss value of the second recognition substructure can be calculated according to the second recognition result and the true or expected word identifiers of the current sample sentence data, that is, the magnitude of the difference between the second recognition result and the true or expected word identifiers of the current sample sentence data.
可选地,在本实施例中,可以利用不同的权重值分别对第一训练损失值以及第二训练损失值进行整合,得到语句识别模型整体的训练损失值,在此不作限定。Optionally, in this embodiment, different weight values can be used to integrate the first training loss value and the second training loss value respectively to obtain the training loss value of the entire sentence recognition model, which is not limited here.
可选地,在本实施例中,可以预先设置语句识别模型的收敛条件,在得到语句识别模型整体的训练损失值之后,将该整体的训练损失值与收敛条件进行比较,确定该整体的训练损失值是否满足收敛条件;在不满足的情况下,继续对该语句识别模型进行训练;在满足的情况下,即可得到目标语句识别模型。Optionally, in this embodiment, the convergence condition of the sentence recognition model can be set in advance. After obtaining the training loss value of the entire sentence recognition model, the overall training loss value is compared with the convergence condition to determine the overall training loss value. Whether the loss value meets the convergence condition; if not, continue training the sentence recognition model; if it is satisfied, the target sentence recognition model can be obtained.
通过本申请实施例提供的方案,通过在样本语句数据中添加表意标识,从而 利用一个训练模型可以同时识别语句的意图和槽位,进而减少了训练成本。Through the solution provided by the embodiments of this application, by adding ideographic identifiers to the sample sentence data, a training model can be used to simultaneously identify the intention and slot of the sentence, thereby reducing the training cost.
在一个示例性实施例中,根据第一训练损失值以及第二训练损失值,得到当前训练损失值,包括:对第一训练损失值和第二训练损失值分别按照第一目标权重值和第二目标权重值进行整合,得到当前训练损失值,其中,第一目标权重值为第一训练损失值的权重值,第二目标权重值为第二训练损失值的权重值。In an exemplary embodiment, obtaining the current training loss value according to the first training loss value and the second training loss value includes: integrating the first training loss value and the second training loss value according to a first target weight value and a second target weight value respectively, to obtain the current training loss value, where the first target weight value is the weight value of the first training loss value, and the second target weight value is the weight value of the second training loss value.
可选地,在本实施例中,可以利用不同的权重值分别对第一训练损失值以及第二训练损失值进行整合,得到语句识别模型整体的训练损失值,在此不作限定。Optionally, in this embodiment, different weight values can be used to integrate the first training loss value and the second training loss value respectively to obtain the training loss value of the entire sentence recognition model, which is not limited here.
例如,设α为第一训练损失值Loss_1的权重值,β为第二训练损失值Loss_2的权重值,那么,当前训练损失值为Loss_total = α·Loss_1 + β·Loss_2。其中,α和β针对不同的训练任务,可以做不同的调整。For example, let α be the weight value of the first training loss value Loss_1 and β be the weight value of the second training loss value Loss_2; then the current training loss value is Loss_total = α·Loss_1 + β·Loss_2. Here, α and β can be adjusted differently for different training tasks.
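The weighted combination above can be sketched in a few lines; the weight values chosen here are illustrative only, not values from this application.

```python
# Sketch of Loss_total = α·Loss_1 + β·Loss_2 (α, β chosen arbitrarily for illustration).
def combine_losses(loss1: float, loss2: float, alpha: float, beta: float) -> float:
    """Integrate the intent loss and the slot loss into one training loss value."""
    return alpha * loss1 + beta * loss2

total = combine_losses(0.8, 0.4, alpha=0.6, beta=0.4)  # 0.6*0.8 + 0.4*0.4 = 0.64
```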
通过本申请实施例提供的方案,通过利用权重值对训练损失值进行整合,可以确保语句识别模型整体的训练损失值的准确性。Through the solution provided by the embodiments of this application, by using weight values to integrate the training loss values, the accuracy of the overall training loss value of the sentence recognition model can be ensured.
在一个示例性实施例中,还包括:在当前样本语句数据的词语标识包括目标隐藏标识的情况下,根据第二识别子结构输出的第三识别结果和当前样本语句数据的词语标识,确定目标隐藏标识的第三训练损失值;对第一训练损失值、第二训练损失值和第三训练损失值分别按照第一目标权重值、第二目标权重值和第三目标权重值进行整合,得到当前训练损失值,其中,第一目标权重值为第一训练损失值的权重值,第二目标权重值为第二训练损失值的权重值,第三目标权重值为第三训练损失值的权重值。In an exemplary embodiment, the method further includes: when the word identifiers of the current sample sentence data include a target hidden identifier, determining a third training loss value of the target hidden identifier according to a third recognition result output by the second recognition substructure and the word identifiers of the current sample sentence data; and integrating the first training loss value, the second training loss value, and the third training loss value according to the first target weight value, the second target weight value, and a third target weight value respectively, to obtain the current training loss value, where the first target weight value is the weight value of the first training loss value, the second target weight value is the weight value of the second training loss value, and the third target weight value is the weight value of the third training loss value.
需要说明的是,由于语句数据中会包括各种类型的词语,有一些词语,例如,未登录词,可能无法被语句识别模型识别,因此,我们将无法被语句识别模型识别的词语利用隐藏标识来代替,并利用语句识别模型来预测该隐藏标识所对应的词语标识,并可以根据语句识别模型的识别结果和该隐藏标识所对应的真实或期望词语标识计算该隐藏标识的第三训练损失值;进而可以利用不同的权重值分别对第一训练损失值、第二训练损失值以及第三训练损失值进行整合,得到语句识别模型整体的训练损失值,在此不作限定。It should be noted that, since sentence data includes various types of words, some words, for example, out-of-vocabulary words, may not be recognizable by the sentence recognition model. Therefore, words that cannot be recognized by the sentence recognition model are replaced with a hidden identifier, the sentence recognition model is used to predict the word identifier corresponding to the hidden identifier, and the third training loss value of the hidden identifier can be calculated according to the recognition result of the sentence recognition model and the true or expected word identifier corresponding to the hidden identifier; then, different weight values can be used to integrate the first training loss value, the second training loss value, and the third training loss value to obtain the overall training loss value of the sentence recognition model, which is not limited here.
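The hidden-identifier replacement described above might look like the following sketch; the token name "[MASK]" and the tiny vocabulary are assumptions for illustration, not taken from this application.

```python
# Illustrative sketch: replace out-of-vocabulary words with a hidden identifier.
# The "[MASK]" token name and the vocabulary are assumptions, not from the patent.
VOCAB = {"打开", "客厅", "电视"}
HIDDEN = "[MASK]"

def mask_unknown(tokens):
    """Words the model cannot recognize are replaced by the hidden identifier;
    the model is later trained to predict the original word for these positions."""
    return [tok if tok in VOCAB else HIDDEN for tok in tokens]

masked = mask_unknown(["打开", "卧室", "电视"])  # "卧室" is out of vocabulary here
```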
例如,设α为第一训练损失值Loss_1的权重值,β为第二训练损失值Loss_2的权重值,γ为第三训练损失值Loss_3的权重值,那么,当前训练损失值为Loss_total = α·Loss_1 + β·Loss_2 + γ·Loss_3。其中,α+β+γ=1,α、β和γ针对不同的训练任务,可以做不同的调整。For example, let α be the weight value of the first training loss value Loss_1, β be the weight value of the second training loss value Loss_2, and γ be the weight value of the third training loss value Loss_3; then the current training loss value is Loss_total = α·Loss_1 + β·Loss_2 + γ·Loss_3. Here α+β+γ=1, and α, β, and γ can be adjusted differently for different training tasks.
通过本申请实施例提供的方案,通过利用权重值对训练损失值进行整合,可以确保语句识别模型整体的训练损失值的准确性。Through the solution provided by the embodiments of this application, by using weight values to integrate the training loss values, the accuracy of the overall training loss value of the sentence recognition model can be ensured.
在一个示例性实施例中,在当前训练损失值未达到识别收敛条件的情况下,从标记后的多个样本语句数据中获取下一个样本语句数据输入初始语句识别模型,包括:在当前训练损失值未达到识别收敛条件的情况下,对第一识别子结构和第二识别子结构的参数进行调整,得到调整后的初始语句识别模型;从标记后的多个样本语句数据中获取下一个样本语句数据输入调整后的初始语句识别模型。In an exemplary embodiment, when the current training loss value does not meet the recognition convergence condition, obtaining the next sample sentence data from the multiple marked sample sentence data and inputting it into the initial sentence recognition model includes: when the current training loss value does not meet the recognition convergence condition, adjusting the parameters of the first recognition substructure and the second recognition substructure to obtain an adjusted initial sentence recognition model; and obtaining the next sample sentence data from the multiple marked sample sentence data and inputting it into the adjusted initial sentence recognition model.
可选地,在本实施例中,在当前训练损失值未达到识别收敛条件的情况下,可以通过对第一识别子结构和第二识别子结构的参数进行调整,使得当前训练损失值尽快达到识别收敛条件,其中,可以同时对第一识别子结构和第二识别子结构的参数进行调整,也可以单独对第一识别子结构和第二识别子结构的参数进行调整。例如,在一次参数调整中,可以同时调整第一识别子结构和第二识别子结构的参数,也可以依次调整第一识别子结构和第二识别子结构的参数,即当前调整第一识别子结构的参数,下一次调整第二识别子结构的参数,在此不作限定。Optionally, in this embodiment, when the current training loss value does not reach the recognition convergence condition, the parameters of the first recognition substructure and the second recognition substructure can be adjusted so that the current training loss value reaches the recognition convergence condition as soon as possible. Identify the convergence condition, wherein the parameters of the first identification substructure and the second identification substructure can be adjusted simultaneously, or the parameters of the first identification substructure and the second identification substructure can be adjusted individually. For example, in a parameter adjustment, the parameters of the first identification substructure and the second identification substructure can be adjusted simultaneously, or the parameters of the first identification substructure and the second identification substructure can be adjusted sequentially, that is, the first identification substructure is currently adjusted. The parameters of the structure, the parameters of the second identification substructure will be adjusted next time, and are not limited here.
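As a toy illustration of this adjust-and-retrain loop, the sketch below repeatedly updates two scalar "substructure parameters" by gradient descent until a convergence threshold is met; the quadratic objective, learning rate, and threshold are all invented for the example and are not the patent's actual model.

```python
# Toy sketch of the parameter-adjustment loop (the objective, learning rate,
# and convergence threshold are illustrative stand-ins, not from the patent).
def train_until_converged(threshold=1e-3, lr=0.1, max_steps=1000):
    w1, w2 = 5.0, -3.0              # stand-ins for the two substructures' parameters
    loss = w1 ** 2 + w2 ** 2        # current training loss (toy objective)
    for _ in range(max_steps):
        if loss <= threshold:       # recognition convergence condition met
            break
        w1 -= lr * 2 * w1           # adjust first-substructure parameters
        w2 -= lr * 2 * w2           # adjust second-substructure parameters
        loss = w1 ** 2 + w2 ** 2    # recompute the current training loss
    return loss

final_loss = train_until_converged()
```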
通过本申请实施例提供的方案,通过对语句识别模型的参数进行调整,可以提高模型收敛的速度。Through the solutions provided by the embodiments of this application, by adjusting the parameters of the sentence recognition model, the convergence speed of the model can be improved.
在一个示例性实施例中,在获取多个样本语句数据之后,包括:对多个样本语句数据进行数据预处理操作,得到处理后的多个样本语句数据,其中,数据预处理操作包括以下至少之一:全角转半角,大写数字转小写数字,大写字母转小写字母,表情符号去除,分词,停用词过滤。In an exemplary embodiment, after the multiple sample sentence data are obtained, the method includes: performing a data preprocessing operation on the multiple sample sentence data to obtain processed multiple sample sentence data, where the data preprocessing operation includes at least one of the following: converting full-width characters to half-width characters, converting uppercase numerals to lowercase numerals, converting uppercase letters to lowercase letters, emoji removal, word segmentation, and stop-word filtering.
可选地,在本实施例中,在获取多个样本语句数据之后,可以先对样本语句 数据进行数据预处理操作,主要包括:全角转半角,大写数字转小写数字,大写字母转小写字母,表情符号去除,分词,停用词过滤。其中,分词就是将连续的字序列按照一定的规范重新组合成词序列的过程。在英文的行文中,单词之间是以空格作为自然分界符的,也可以将每一个字都看作一个序列的部分,在此不作限定。Optionally, in this embodiment, after obtaining multiple sample sentence data, you can first perform data preprocessing operations on the sample sentence data, which mainly includes: converting full-width to half-width, converting uppercase numbers to lowercase numbers, converting uppercase letters to lowercase letters, Emoji removal, word segmentation, stop word filtering. Among them, word segmentation is the process of recombining continuous word sequences into word sequences according to certain specifications. In English writing, spaces are used as natural delimiters between words. Each word can also be regarded as part of a sequence, which is not limited here.
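Two of the listed preprocessing steps, full-width to half-width conversion and lowercasing of letters, can be sketched with the standard library alone (the other steps, such as word segmentation and stop-word filtering, typically rely on external tooling):

```python
# Sketch of full-width → half-width conversion plus lowercasing.
# Full-width ASCII variants live in U+FF01–U+FF5E, offset 0xFEE0 from ASCII;
# the ideographic space U+3000 maps to an ordinary space.
def to_half_width(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:              # ideographic space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:  # full-width ASCII block
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

def preprocess(text: str) -> str:
    return to_half_width(text).lower()

cleaned = preprocess("ＡＢＣ１２３")  # full-width "ABC123"
```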
通过本申请实施例提供的方案,通过对样本语句数据进行数据预处理操作,可以提高语句识别模型的训练速度和训练准确度。Through the solution provided by the embodiments of this application, by performing data preprocessing operations on the sample sentence data, the training speed and training accuracy of the sentence recognition model can be improved.
为了更好的理解上述语句识别模型的训练方法的过程,以下再结合可选实施例对上述语句识别模型的训练的实现方法进行说明,但不用于限定本申请实施例的技术方案。In order to better understand the process of the training method of the above sentence recognition model, the implementation method of training the above sentence recognition model will be described below with reference to optional embodiments, but this is not intended to limit the technical solutions of the embodiments of the present application.
在本实施例中提供了一种语句识别模型的模型结构,如图3所示。This embodiment provides a model structure of a sentence recognition model, as shown in Figure 3.
1)输入层1) Input layer
由于每个词语都用One-hot编码,而One-hot编码是一种离散编码,对于语句识别模型来说,稀疏向量非常不利于深度模型学习,也非常浪费存储空间,所以需要使用词嵌入(Word Embedding),将稀疏的One-hot编码,映射到稠密向量中。Since each word is One-hot encoded, and One-hot encoding is a discrete encoding, sparse vectors are very unfavorable for deep model learning in the sentence recognition model and also waste storage space; therefore, word embedding (Word Embedding) is used to map the sparse One-hot encoding into dense vectors.
由于语句识别模型使用双向自编码结构,一个语句中的所有文字都是同时输入到语句识别模型,语句识别模型需要知道文字与文字之间的位置信息,所以加入了位置编码(Position Embedding),其公式为:Since the sentence recognition model uses a two-way auto-encoding structure, all the words in a sentence are input to the sentence recognition model at the same time. The sentence recognition model needs to know the position information between words, so position encoding (Position Embedding) is added. The formula is:
PE_{t,2i} = sin(t / 1000^{2i/d})
PE_{t,2i+1} = cos(t / 1000^{2i/d})
其中,t为文字的绝对位置,d代表每个文字的向量维度,i表示维度的索引。Among them, t is the absolute position of the text, d represents the vector dimension of each text, and i represents the index of the dimension.
将词嵌入与位置嵌入做加法,作为模型的输入。Add word embedding and position embedding as input to the model.
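Following the two formulas above, the position encoding matrix can be sketched with the standard library (base 1000 as written in this application; the widely used Transformer variant uses base 10000):

```python
# Sketch of PE_{t,2i} = sin(t/1000^(2i/d)), PE_{t,2i+1} = cos(t/1000^(2i/d)).
import math

def position_encoding(seq_len: int, d: int):
    """Return a seq_len × d matrix of position encodings."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for t in range(seq_len):            # t: absolute position of the character
        for i in range(0, d, 2):        # i here plays the role of 2i in the formula
            angle = t / (1000 ** (i / d))
            pe[t][i] = math.sin(angle)
            if i + 1 < d:
                pe[t][i + 1] = math.cos(angle)
    return pe

pe = position_encoding(4, 8)  # 4 positions, vector dimension 8
```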
2)特征提取层2) Feature extraction layer
特征提取层如图4所示,使用自注意力(Self-Attention)机制,分别定义K、Q、V三个矩阵,用输入向量分别与K、Q、V三个矩阵相乘,得到三个对应的特征向量K_1、Q_1、V_1,将得到的三个特征向量带入公式
Attention(Q_1, K_1, V_1) = softmax(Q_1·K_1^T / √d)·V_1
其中,T表示矩阵的转置,即可以得到我们所需的注意力权重。The feature extraction layer is shown in Figure 4. It uses the self-attention (Self-Attention) mechanism: three matrices K, Q, and V are defined, and the input vector is multiplied by each of the three matrices to obtain three corresponding feature vectors K_1, Q_1, and V_1. The three feature vectors are brought into the formula above, where T denotes matrix transpose, which yields the attention weights we need.
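A pure-Python sketch of the scaled dot-product attention computation described above; the matrices are tiny illustrative values, and a real implementation would use a tensor library.

```python
# Sketch of attention: softmax(Q·Kᵀ/√d)·V on small illustrative matrices.
import math

def softmax(row):
    m = max(row)                         # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])                        # vector dimension of each token
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]               # Q·Kᵀ/√d
    weights = [softmax(r) for r in scores]   # attention weights per token
    return [[sum(w * V[j][c] for j, w in enumerate(wr)) for c in range(len(V[0]))]
            for wr in weights]

out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```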
以三个矩阵为一组,因为特征可能有多重情况,所以可以定义多组,例如,定义12组,即12个K,12个Q,12个V,来更好地提取特征。Taking three matrices as a group, because features may have multiple situations, multiple groups can be defined. For example, 12 groups, that is, 12 K, 12 Q, and 12 V, can be defined to better extract features.
One repetition of this step is called a layer; this model stacks four such structures, i.e. four layers. A residual connection is added between the layers to compensate for information loss: the input of the previous layer is added to the output of the current layer.
Before data enter the model, a batch of sentences must be normalized to a uniform length — sentences that are too short are zero-padded and sentences that are too long are truncated — which introduces error in a batch normalization (BatchNormalize) stage. This application therefore uses layer normalization (LayerNormalize) instead. Unlike batch normalization, layer normalization operates on each individual sample rather than on a batch of data, which greatly reduces the error caused by unifying sentence lengths.
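The residual connection and layer normalization can be sketched together (a NumPy illustration; the sublayer is a stand-in, and the learnable scale/shift parameters of layer normalization are omitted for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalizes each token's feature vector independently (per sample),
    # so zero-padding added to other sentences in the batch has no effect.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    # The previous layer's input is added to this layer's output, then normalized.
    return layer_norm(x + sublayer(x))

x = np.random.default_rng(4).normal(size=(5, 8))
y = residual_block(x, lambda v: v * 0.5)      # stand-in sublayer
assert np.allclose(y.mean(axis=-1), 0.0, atol=1e-6)
```

After normalization each token's features have zero mean and approximately unit variance, regardless of how other sentences in the batch were padded.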
3) Ideographic recognition
An identifier CLS is added to each sentence to represent that sentence. The feature that the feature extraction layer of the sentence recognition model outputs at the CLS position is fed into one feed-forward (FeedForward) layer and then into a number of Sigmoid layers (one per category), and the final output is substituted into the cross-entropy loss function (Cross Entropy Loss function) to obtain LOSS1.
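A minimal sketch of this classification head, assuming the per-class sigmoid outputs are scored with binary cross-entropy summed over categories (the feature, weights, and label are random/hypothetical placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def intent_loss(cls_feature, W, b, labels, eps=1e-12):
    """Feed-forward layer over the CLS feature, one sigmoid per class,
    then cross-entropy against the labels (LOSS_1)."""
    p = sigmoid(cls_feature @ W + b)
    return -np.sum(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

rng = np.random.default_rng(5)
d, n_classes = 8, 3
cls_feature = rng.normal(size=d)              # extractor output at the CLS position
W, b = rng.normal(size=(d, n_classes)), np.zeros(n_classes)
labels = np.array([1.0, 0.0, 0.0])            # hypothetical intent label
loss1 = intent_loss(cls_feature, W, b, labels)
assert loss1 > 0
```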
4) Word recognition
The feature that the feature extraction layer of the sentence recognition model outputs for each word in the sentence is fed into a conditional random field (CRF) to obtain the highest-scoring predicted tag sequence and LOSS2.
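The CRF loss can be sketched as the negative log-likelihood of a tag path under a linear-chain CRF (a NumPy illustration with random emission and transition scores; the number of tags is hypothetical — the patent does not specify these details):

```python
import numpy as np

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def crf_nll(emissions, transitions, tags):
    """NLL of a tag path: path score is the sum of emission and transition
    scores; the partition function is computed by the forward algorithm
    in log space. This accumulates over the sequence, which is why LOSS_2
    is on a different scale from LOSS_1 and LOSS_3."""
    T, K = emissions.shape
    score = emissions[0, tags[0]]
    alpha = emissions[0]
    for t in range(1, T):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0) - score    # log Z - path score >= 0

rng = np.random.default_rng(6)
T, K = 4, 3                                    # 4 tokens, 3 hypothetical tags
emissions = rng.normal(size=(T, K))            # per-token extractor features
transitions = rng.normal(size=(K, K))
loss2 = crf_nll(emissions, transitions, np.array([0, 1, 1, 2]))
assert loss2 >= 0
```

Decoding the highest-scoring sequence would use the Viterbi algorithm over the same scores; it is omitted here for brevity.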
5) Mask
In practical applications a large number of out-of-vocabulary (Out-Of-Vocab, OOV) words are encountered; for example, "5:1" in "the red team beat the blue team 5:1" is an out-of-vocabulary word. To address this, this application introduces [MASK] to randomly replace some characters. In the final output of the sentence recognition model, the feature corresponding to [MASK] is taken out, and this feature is used to predict the masked word, yielding LOSS3.
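The random replacement step can be sketched as follows (a NumPy illustration; the [MASK] token id and the masking probability are hypothetical choices, not values given in the patent):

```python
import numpy as np

MASK_ID = 0                                    # hypothetical id of the [MASK] token

def apply_mask(token_ids, mask_prob=0.15, rng=None):
    """Randomly replace some token ids with [MASK]; return the masked
    sequence and the positions whose original tokens must be predicted."""
    rng = rng if rng is not None else np.random.default_rng()
    ids = token_ids.copy()
    positions = rng.random(len(ids)) < mask_prob
    ids[positions] = MASK_ID
    return ids, positions

tokens = np.array([5, 9, 2, 7, 4, 8])
masked, pos = apply_mask(tokens, mask_prob=0.5, rng=np.random.default_rng(7))
assert np.all(masked[pos] == MASK_ID)          # masked positions replaced
assert np.all(masked[~pos] == tokens[~pos])    # other positions untouched
```

The model's output features at `pos` are then scored against the original tokens to give LOSS3.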
6) Loss function
Because LOSS1 and LOSS3 are values between 0 and 1 while the CRF loss is an accumulation over a sequence, the gap between them is too large, so hyperparameters are needed for balancing. The loss function of this model is therefore Loss_total = αLoss1 + βLoss2 + γLoss3, where α, β, and γ are hyperparameters that must be tuned differently for different tasks.
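The combination is a simple weighted sum; the weight values below are purely illustrative (the patent gives no concrete settings), chosen to shrink the larger CRF term toward the scale of the other two:

```python
# Weighted combination of the three training losses.
alpha, beta, gamma = 1.0, 0.1, 1.0             # illustrative hyperparameters
loss1, loss2, loss3 = 0.4, 12.5, 0.7           # e.g. CE, CRF NLL, mask CE
loss_total = alpha * loss1 + beta * loss2 + gamma * loss3
assert abs(loss_total - (0.4 + 1.25 + 0.7)) < 1e-9
```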
From the description of the above embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application — in essence, or the part contributing beyond the related art — may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the various embodiments of this application.
Figure 5 is a structural block diagram of a device control apparatus according to an embodiment of this application. As shown in Figure 5, the apparatus includes:
a recognition module 501, configured to, when target audio is obtained, recognize the target audio to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;

an input module 502, configured to input the sentence to be recognized into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize word data in the sentence, and the target sentence recognition model is a model for recognizing the ideographic direction of sentences and word data, obtained by jointly training the first recognition substructure and the second recognition substructure with multiple pieces of sample sentence data;

a sending module 503, configured to send a control instruction to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device.
With the solution provided by this embodiment of the application, when target audio is obtained, the target audio is recognized to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure used to recognize the ideographic direction of a sentence and a second recognition substructure used to recognize word data in the sentence, and the model is obtained by jointly training the two substructures with multiple pieces of sample sentence data; and a control instruction is sent to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device. With this technical solution, the two substructures in the target sentence recognition model recognize the ideographic direction and the word data respectively, so that a single model can recognize both the intent and the slots of a sentence. This reduces the load on the server and solves the technical problem that recognizing voice control instructions with two separate models places heavy pressure on the server.
An embodiment of this application further provides a storage medium that includes a stored program, where any of the above methods is executed when the program runs.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: when target audio is obtained, recognize the target audio to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;

S2: input the sentence to be recognized into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize word data in the sentence, and the target sentence recognition model is a model for recognizing the ideographic direction of sentences and word data, obtained by jointly training the first recognition substructure and the second recognition substructure with multiple pieces of sample sentence data;

S3: send a control instruction to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device.
With the solution provided by this embodiment of the application, when target audio is obtained, the target audio is recognized to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure used to recognize the ideographic direction of a sentence and a second recognition substructure used to recognize word data in the sentence, and the model is obtained by jointly training the two substructures with multiple pieces of sample sentence data; and a control instruction is sent to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device. With this technical solution, the two substructures in the target sentence recognition model recognize the ideographic direction and the word data respectively, so that a single model can recognize both the intent and the slots of a sentence. This reduces the load on the server and solves the technical problem that recognizing voice control instructions with two separate models places heavy pressure on the server.
An embodiment of this application further provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input-output device, where the transmission device and the input-output device are each connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S1: when target audio is obtained, recognize the target audio to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device;

S2: input the sentence to be recognized into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize word data in the sentence, and the target sentence recognition model is a model for recognizing the ideographic direction of sentences and word data, obtained by jointly training the first recognition substructure and the second recognition substructure with multiple pieces of sample sentence data;

S3: send a control instruction to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device.
With the solution provided by this embodiment of the application, when target audio is obtained, the target audio is recognized to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure used to recognize the ideographic direction of a sentence and a second recognition substructure used to recognize word data in the sentence, and the model is obtained by jointly training the two substructures with multiple pieces of sample sentence data; and a control instruction is sent to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device. With this technical solution, the two substructures in the target sentence recognition model recognize the ideographic direction and the word data respectively, so that a single model can recognize both the intent and the slots of a sentence. This reduces the load on the server and solves the technical problem that recognizing voice control instructions with two separate models places heavy pressure on the server.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of this application described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that herein, or the modules may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of this application, and such improvements and refinements shall also fall within the protection scope of this application.
Industrial applicability
When target audio is obtained, the target audio is recognized to obtain a sentence to be recognized corresponding to the target audio, where the target audio is used to request control of at least one controlled device; the sentence to be recognized is input into a target sentence recognition model to obtain a target ideographic meaning and target words, where the target sentence recognition model includes a first recognition substructure used to recognize the ideographic direction of a sentence and a second recognition substructure used to recognize word data in the sentence, and the model is obtained by jointly training the two substructures with multiple pieces of sample sentence data; and a control instruction is sent to a target controlled device according to the target ideographic meaning and the target words, where the at least one controlled device includes the target controlled device. With this technical solution, the two substructures in the target sentence recognition model recognize the ideographic direction and the word data respectively, so that a single model can recognize both the intent and the slots of a sentence. This reduces the load on the server and solves the technical problem that recognizing voice control instructions with two separate models places heavy pressure on the server.

Claims (10)

  1. A device control method, comprising:
    when target audio is obtained, recognizing the target audio to obtain a sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device;
    inputting the sentence to be recognized into a target sentence recognition model to obtain a target ideographic meaning and target words, wherein the target sentence recognition model comprises a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize word data in the sentence, and the target sentence recognition model is a model for recognizing the ideographic direction of sentences and word data, obtained by jointly training the first recognition substructure and the second recognition substructure with multiple pieces of sample sentence data; and
    sending a control instruction to a target controlled device according to the target ideographic meaning and the target words, wherein the at least one controlled device comprises the target controlled device.
  2. The method according to claim 1, wherein sending a control instruction to a target controlled device according to the target ideographic meaning and the target words comprises:
    obtaining the control instruction and instruction sending information according to the target ideographic meaning and the target words, wherein the instruction sending information is used to indicate that the control instruction is to be sent to the target controlled device among the at least one controlled device.
  3. The method according to claim 2, wherein obtaining the control instruction and instruction sending information according to the target ideographic meaning and the target words comprises:
    determining the execution authority of the control instruction according to the target ideographic meaning; and determining the instruction sending information and the instruction content of the control instruction according to the target words.
  4. The method according to claim 1, wherein, before obtaining the target audio, the method comprises:
    obtaining multiple pieces of sample sentence data;
    labeling the sentence data in each piece of sample sentence data to obtain the multiple pieces of labeled sample sentence data, wherein each piece of labeled sample sentence data comprises a labeled ideographic identifier and word identifiers, the ideographic identifier is used to label the ideographic direction of the sentence data, and the word identifiers are used to label at least one piece of word data in the sentence data;
    determining current sample sentence data from the multiple pieces of labeled sample sentence data, and determining an initial sentence recognition model, wherein the initial sentence recognition model comprises a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of the sentence data, and the second recognition substructure is used to recognize word data in the sentence data;
    inputting the current sample sentence data into the first recognition substructure and the second recognition substructure respectively, to obtain a first recognition result output by the first recognition substructure and a second recognition result output by the second recognition substructure;
    determining a first training loss value of the first recognition substructure according to the first recognition result and the ideographic identifier of the current sample sentence data; and determining a second training loss value of the second recognition substructure according to the second recognition result and the word identifiers of the current sample sentence data;
    obtaining a current training loss value according to the first training loss value and the second training loss value, wherein the current training loss value is used to determine the training status of the initial sentence recognition model;
    when the current training loss value does not reach a recognition convergence condition, obtaining next sample sentence data from the multiple pieces of labeled sample sentence data and inputting it into the initial sentence recognition model; and
    when the current training loss value reaches the recognition convergence condition, determining the initial sentence recognition model as the target sentence recognition model.
  5. The method according to claim 4, wherein obtaining the current training loss value according to the first training loss value and the second training loss value comprises:
    integrating the first training loss value and the second training loss value according to a first target weight value and a second target weight value respectively, to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, and the second target weight value is the weight value of the second training loss value.
  6. The method according to claim 4, further comprising:
    when the word identifiers of the current sample sentence data include a target hidden identifier, determining a third training loss value of the target hidden identifier according to a third recognition result output by the second recognition substructure and the word identifiers of the current sample sentence data; and
    integrating the first training loss value, the second training loss value, and the third training loss value according to a first target weight value, a second target weight value, and a third target weight value respectively, to obtain the current training loss value, wherein the first target weight value is the weight value of the first training loss value, the second target weight value is the weight value of the second training loss value, and the third target weight value is the weight value of the third training loss value.
  7. The method according to claim 4, wherein, when the current training loss value does not reach the recognition convergence condition, obtaining the next sample sentence data from the multiple pieces of labeled sample sentence data and inputting it into the initial sentence recognition model comprises:
    when the current training loss value does not reach the recognition convergence condition, adjusting the parameters of the first recognition substructure and the second recognition substructure to obtain an adjusted initial sentence recognition model; and
    obtaining the next sample sentence data from the multiple pieces of labeled sample sentence data and inputting it into the adjusted initial sentence recognition model.
  8. A device control apparatus, comprising:
    a recognition module, configured to, when target audio is obtained, recognize the target audio to obtain a sentence to be recognized corresponding to the target audio, wherein the target audio is used to request control of at least one controlled device;
    an input module, configured to input the sentence to be recognized into a target sentence recognition model to obtain a target ideographic meaning and target words, wherein the target sentence recognition model comprises a first recognition substructure and a second recognition substructure, the first recognition substructure is used to recognize the ideographic direction of a sentence, the second recognition substructure is used to recognize word data in the sentence, and the target sentence recognition model is a model for recognizing the ideographic direction of sentences and word data, obtained by jointly training the first recognition substructure and the second recognition substructure with multiple pieces of sample sentence data; and
    a sending module, configured to send a control instruction to a target controlled device according to the target ideographic meaning and the target words, wherein the at least one controlled device comprises the target controlled device.
  9. A computer-readable storage medium, comprising a stored program, wherein the method according to any one of claims 1 to 7 is executed when the program runs.
  10. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to execute the method according to any one of claims 1 to 7 through the computer program.
PCT/CN2022/096401 2022-05-05 2022-05-31 Appliance control method, storage medium, and electronic device WO2023212993A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210481784.9 2022-05-05
CN202210481784.9A CN117059083A (en) 2022-05-05 2022-05-05 Equipment control method, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2023212993A1 true WO2023212993A1 (en) 2023-11-09

Family

ID=88646180

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/096401 WO2023212993A1 (en) 2022-05-05 2022-05-31 Appliance control method, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN117059083A (en)
WO (1) WO2023212993A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160147736A1 (en) * 2014-11-26 2016-05-26 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
CN106649694A (en) * 2016-12-19 2017-05-10 北京云知声信息技术有限公司 Method and device for identifying user's intention in voice interaction
CN108628830A (en) * 2018-04-24 2018-10-09 北京京东尚科信息技术有限公司 A kind of method and apparatus of semantics recognition
CN110866094A (en) * 2018-08-13 2020-03-06 珠海格力电器股份有限公司 Instruction recognition method, instruction recognition device, storage medium, and electronic device
CN110895936A (en) * 2018-09-13 2020-03-20 珠海格力电器股份有限公司 Voice processing method and device based on household appliance
CN111429903A (en) * 2020-03-19 2020-07-17 百度在线网络技术(北京)有限公司 Audio signal identification method, device, system, equipment and readable medium


Also Published As

Publication number Publication date
CN117059083A (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Han et al. An improved evolutionary extreme learning machine based on particle swarm optimization
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN108733508B (en) Method and system for controlling data backup
Cuong Nguyen et al. Reduced‐order observer design for one‐sided Lipschitz time‐delay systems subject to unknown inputs
CN109829299A (en) A kind of unknown attack recognition methods based on depth self-encoding encoder
TW201917602A (en) Semantic encoding method and device for text capable of enabling mining of semantic relationships of text and of association between text and topics, and realizing fixed semantic encoding of text data having an indefinite length
JP2019513246A (en) Training method of random forest model, electronic device and storage medium
Shin Application of boosting regression trees to preliminary cost estimation in building construction projects
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
CN109710953A (en) A kind of interpretation method and device calculate equipment, storage medium and chip
CN110321430B (en) Domain name recognition and domain name recognition model generation method, device and storage medium
Bugeja et al. Functional classification and quantitative analysis of smart connected home devices
CN115510186A (en) Instant question and answer method, device, equipment and storage medium based on intention recognition
CN113256438B (en) Role identification method and system for network user
WO2023212993A1 (en) Appliance control method, storage medium, and electronic device
Du et al. Structure tuning method on deep convolutional generative adversarial network with nondominated sorting genetic algorithm II
CN114064125B (en) Instruction analysis method and device and electronic equipment
CN114329744B (en) House type reconstruction method and computer readable storage medium
Noor et al. Reverse engineering sparse gene regulatory networks using cubature kalman filter and compressed sensing
KR102554750B1 (en) Method and system for transfer learning of deep learning model based on document similarity learning
CN114925158A (en) Sentence text intention recognition method and device, storage medium and electronic device
CN114091021A (en) Malicious code detection method for electric power enterprise safety protection
CN110516717B (en) Method and apparatus for generating image recognition model
CN115705464A (en) Information processing method, device and equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940687

Country of ref document: EP

Kind code of ref document: A1