WO2021147041A1 - Semantic analysis method, apparatus, device, and storage medium - Google Patents

Semantic analysis method, apparatus, device, and storage medium

Info

Publication number
WO2021147041A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
text
attention
feature
module
Application number
PCT/CN2020/073914
Other languages
English (en)
French (fr)
Inventor
李宏广
聂为然
高益
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to PCT/CN2020/073914
Priority to CN202080004415.XA
Publication of WO2021147041A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities

Definitions

  • This application relates to the technical field of natural language understanding, and in particular to a semantic analysis method, apparatus, device, and storage medium.
  • Natural language understanding (NLU) is a technology by which a computer analyzes the semantics of natural-language text. It aims to enable the computer to understand the meaning of natural language, so that users can communicate with it in natural language. NLU has been widely applied in many scenarios. For example, in the vehicle field, after a driver speaks a natural-language command, the in-vehicle terminal can convert the voice into text, perform semantic analysis on the text to obtain its semantic information, and execute the corresponding instruction according to that information, thereby realizing a voice interaction function.
  • In one approach, the text to be analyzed is segmented to obtain the words it contains, and each word is input into the word2vector model (a model that converts words into vectors), which represents each word as a vector. The semantic information of the text is then analyzed according to the vector corresponding to each word.
  • However, text often contains specific entities, such as songs and locations, which strongly affect the semantics of the text. With the above method, the ability to recognize entities in the text is poor, resulting in insufficient semantic understanding by the computer.
  • This application provides a semantic analysis method, device, equipment, and storage medium, which can improve the computer's semantic understanding ability.
  • In a first aspect, a semantic analysis method is provided.
  • In the method, an entity in a text to be analyzed is obtained, and a structured entity vector corresponding to the entity is obtained according to the entity in the text to be analyzed.
  • The structured entity vector is used to indicate the identity of the entity and the attributes of the entity. Feature extraction is performed on the structured entity vector to obtain an entity feature; the entity feature, the lexical features of the text, and the syntactic features of the text are merged to obtain the semantic features of the text, and the semantic features are used to obtain the semantic information of the text.
  • In this method, the entity feature is extracted from the structured entity vector and merged with the lexical feature and the syntactic feature, yielding semantic features that incorporate entity, lexical, and syntactic information; these semantic features are decoded to obtain the semantic information. Since the structured entity vector contains the identity of the entity and the attributes of the entity, the attributes of the entity can be used to enhance the ability of semantic understanding.
  • In a possible implementation, the structured entity vector may be obtained from an entity construction table according to the entity in the text to be analyzed, where the entity construction table is used to store the mapping relationship between entities and structured entity vectors.
  • In this way, the structured entity vector is obtained by a simple table lookup, which enhances the subsequent recognition performed by the pre-training model on the basis of the structured entity vector.
  • In a possible implementation, the entity construction table includes entities associated with the vehicle domain, and the text is obtained by recognizing voice collected by the vehicle terminal. This helps build structured knowledge entities for the vehicle field.
  • In a possible implementation, the entity construction table includes at least one of: entities with irregular names, entities whose names exceed a threshold number of characters, and entities whose names have a word frequency below a threshold. Because the names of these entities are prone to ambiguity or have multiple meanings, it is difficult for the machine to understand the correct semantics. By pre-storing the vector representations of these entities in the entity construction table, the machine can look up the table to obtain an accurate vector representation, and incorporating the resulting entity features helps improve the accuracy of semantic understanding; a minimal lookup sketch is shown below.
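  • As an illustrative sketch (the entity names, attributes, and vector values below are hypothetical, not taken from the patent), such an entity construction table can be modeled as a lookup from entity name to a structured entity vector encoding identity and attributes:

```python
import numpy as np

# Hypothetical entity construction table: maps an entity name to a structured
# entity vector that encodes the entity's identity and its attributes.
# Real vectors would be constructed or learned, not hand-written.
ENTITY_TABLE = {
    "世界之花": np.array([0.12, -0.48, 0.93, 0.07]),  # a place name (holiday square)
    "七里香": np.array([0.81, 0.33, -0.25, 0.54]),    # a song name
}

def lookup_structured_entity_vectors(entities):
    """Return the structured entity vector of each entity present in the table."""
    return [ENTITY_TABLE[e] for e in entities if e in ENTITY_TABLE]

vectors = lookup_structured_entity_vectors(["世界之花"])
```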
  • In a possible implementation, fusing the entity feature, the lexical feature, and the syntactic feature includes: performing a weighted summation of the entity feature, the lexical feature, and the syntactic feature to obtain a fusion feature, and performing a nonlinear transformation on the fusion feature through an activation function to obtain the semantic feature.
  • Considering that lexical, syntactic, and entity features lie in different vector spaces, that is, they are heterogeneous information, the weighted summation fuses the three features together and thereby achieves heterogeneous information fusion.
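  • A minimal sketch of this fusion step (the weights and the use of tanh as the activation are illustrative assumptions; the patent only specifies a weighted summation followed by an activation function):

```python
import numpy as np

def fuse_features(entity_feat, lexical_feat, syntactic_feat,
                  w_ent=1.0, w_lex=1.0, w_syn=1.0):
    """Weighted summation of the three heterogeneous features, followed by a
    nonlinear transformation through an activation function."""
    fusion = w_ent * entity_feat + w_lex * lexical_feat + w_syn * syntactic_feat
    return np.tanh(fusion)  # the semantic feature

d = 8  # illustrative feature dimension
rng = np.random.default_rng(0)
semantic = fuse_features(*(rng.standard_normal(d) for _ in range(3)))
```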
  • In a possible implementation, the lexical features and syntactic features of the text are extracted as follows: the text is input into a semantic understanding model, where the semantic understanding model is obtained by performing migration training on a pre-training model according to a first sample that includes text annotated with semantic information, and the pre-training model is obtained by training according to a second sample that includes masked text; the lexical feature and the syntactic feature are then extracted from the text through the semantic understanding model.
  • Trained on masked text, the pre-training model acquires basic natural language processing capabilities.
  • The pre-training model is then fine-tuned using the text annotated with semantic information, so that during fine-tuning it learns the relationship between text and semantic information and gains the ability to extract lexical, syntactic, and semantic features. In the model application stage, the semantic understanding model can therefore extract accurate lexical, syntactic, and semantic features.
  • In a possible implementation, the semantic understanding model extracts the lexical features and syntactic features as follows: an attention operation is performed on the text to obtain a first output result, where the first output result is used to indicate the dependency relationships between words in the text; the first output result is normalized to obtain a second output result; linear and nonlinear transformations are performed on the second output result to obtain a third output result; and the third output result is normalized to obtain the lexical feature and the syntactic feature.
  • the semantic understanding model includes a first multi-head attention model.
  • In a possible implementation, the attention operation includes: inputting the text into the first multi-head attention model; performing, by each attention module of the first multi-head attention model, an attention operation on the text to obtain the output result of each attention module; splicing the output results of the attention modules to obtain a spliced result; and performing a linear transformation on the spliced result to obtain the first output result.
  • the multi-head attention mechanism can be used to capture long-distance features in the text, and can extract rich contextual and semantic representation information, and enhance the ability to extract lexical and syntactic features.
  • the method of extracting entity features includes: inputting the structured entity vector into a second multi-head attention model; using each attention module in the second multi-head attention model to separate the structured entity vector Perform attention operations to obtain the output result of each attention module; splice the output results of each attention module to obtain the splicing result; perform linear transformation on the splicing result to obtain the entity feature.
  • In this way, the multi-head attention mechanism can capture the correlations between words in the structured entity vector and helps capture long-distance features, so that the extracted entity features express the semantics accurately and are therefore more precise.
  • In a second aspect, a semantic analysis device is provided, which has the function of realizing the semantic analysis of the first aspect or any one of the optional methods of the first aspect.
  • the semantic analysis device includes at least one module, and at least one module is used to implement the semantic analysis method provided in the first aspect or any one of the optional methods of the first aspect.
  • In a third aspect, an execution device is provided. The execution device includes a processor configured to execute instructions so that the execution device executes the semantic analysis method provided in the first aspect or any one of the optional manners of the first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the instruction is read by a processor to enable an execution device to execute the semantic analysis method provided in the first aspect or any one of the optional manners of the first aspect.
  • In a fifth aspect, a computer program product is provided. When the computer program product runs on an execution device, the execution device executes the semantic analysis method provided in the first aspect or any one of the optional manners of the first aspect.
  • In a sixth aspect, a chip is provided. The chip includes a processor and a data interface.
  • The processor reads, through the data interface, instructions stored in a memory, and executes the semantic analysis method provided in the first aspect or any one of the optional manners of the first aspect.
  • Optionally, the chip may further include a memory in which instructions are stored; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor executes the semantic analysis method provided in the first aspect or any one of the optional manners of the first aspect.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of extracting lexical features and syntactic features according to a semantic understanding model provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for training a semantic understanding model provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a semantic analysis method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of extracting a structured entity vector provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of fusing entity features, lexical features, and syntactic features according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a method for in-vehicle voice interaction based on a semantic understanding model and a structured entity vector provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of semantic intent understanding and semantic slot extraction provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a semantic analysis device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a training device for a semantic understanding model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the hardware structure of a semantic analysis device provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the hardware structure of a training device for a semantic understanding model provided by an embodiment of the present application.
  • the semantic analysis method provided by the embodiments of the present application can be applied to human-computer interaction scenarios and other scenarios that require a computer to understand natural language.
  • the semantic analysis method of the embodiment of the present application can be applied to a voice interaction scenario, for example, in a vehicle voice interaction scenario.
  • The voice interaction scenario and the in-vehicle voice interaction scenario are briefly introduced below.
  • Voice interaction refers to the transmission of information between humans and devices through natural voice.
  • the in-vehicle voice interaction scene is a scene in which the user conducts voice interaction with the in-vehicle terminal mounted on the car.
  • The user can issue a voice containing an instruction, and the vehicle-mounted terminal can convert the user's voice into instructions the machine can understand and execute them, thereby realizing intelligent functions such as voice calls, turning the vehicle air conditioner on and off, automatic seat height/temperature adjustment, and music playback.
  • In this way, users can free their hands and eyes to handle other things. For example, when users want to listen to music, they can request songs by voice, so that their hands and eyes remain available for driving, which greatly improves driving safety and convenience in vehicle-mounted scenarios.
  • natural language understanding is the key technology to realize the vehicle voice interaction system.
  • Natural language understanding is part of natural language processing (NLP); it is both the core and the difficulty of NLP.
  • The goal of natural language understanding technology is for the machine to understand natural language the way human beings do.
  • Given natural language input, the machine should output correct semantic information (such as the correct semantic intent and semantic slots).
  • Natural language is the common mode of expression in people's daily life. For example, when describing a hunched back, the natural-language expression may be "I have a hunchback", while a non-natural-language expression may be "my back is curved".
  • In practice, in-vehicle terminals often show insufficient semantic intent understanding.
  • In particular, in-vehicle terminals cannot understand some structured knowledge entities and abstract semantic representations. For example, it is difficult for the vehicle-mounted terminal to recognize basic entities such as song names with irregular grammar, place names with long character strings, and place names with low-frequency characters, and insufficient entity recognition greatly affects the accuracy of semantic understanding. For example, a user who wants to go to a holiday square in Beijing called "The Flower of the World" says "Search for the Flower of the World" to the vehicle-mounted terminal.
  • the user's intention expressed in this sentence is navigation, and the destination is the flower of the world.
  • If the vehicle-mounted terminal recognizes the four words "flower of the world" as, for example, a song name rather than a place name, the user's intention is misunderstood: the navigation service that should have been executed is replaced by the music playback service, and the service performed by the vehicle-mounted terminal fails to meet the user's expectation.
  • To this end, the embodiments of the present application provide a semantic understanding method that combines a pre-training model with structured entity vectors.
  • First, a pre-training model is obtained and fine-tuned to obtain a semantic understanding model, so that the semantic understanding model can extract lexical features, syntactic features, and semantic features.
  • Through the pre-training process and the model fine-tuning process, the semantic understanding model improves semantic intent understanding and semantic slot extraction; in particular, extracting lexical, syntactic, and semantic features in the vehicle field gives it strong semantic intent understanding.
  • Second, by constructing structured entity vectors, the representation of entities is realized, and the attributes of the entities enhance the semantic intent understanding ability of the semantic understanding model.
  • For the vehicle terminal, this helps it recognize basic structured entity vectors and improves both semantic intent understanding and semantic slot extraction.
  • Third, by fusing entity features, lexical features, and syntactic features, heterogeneous information fusion is realized: semantic information from three different vector spaces is combined to identify semantics, thereby improving the accuracy of semantic understanding.
  • The semantic understanding model training method relates to the understanding of natural language, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning.
  • Using training data such as text annotated with semantic information (for example, semantic intent and semantic slots), a trained semantic understanding model is obtained.
  • The semantic analysis method provided in the embodiments of this application can use the above-mentioned trained semantic understanding model: input data (such as the text to be analyzed in the embodiments of this application) is input into the trained semantic understanding model to obtain output data (such as semantic information, for example the semantic intent and semantic slot, in this application).
  • It should be noted that the training method of the semantic understanding model and the semantic analysis method provided in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of one system, or as two stages of an overall process, namely the model training stage and the model application stage.
  • the self-attention mechanism is an improvement of the attention mechanism, which reduces the dependence on external information and is better at capturing the internal correlation of data or features.
  • The essence of the self-attention mechanism is to compute attention of a sequence with respect to itself; the target sequence and the source sequence in the self-attention mechanism are the same.
  • Using the self-attention mechanism in the NLP field, the inter-word dependencies within a sentence itself can be extracted, such as common phrases and the things referred to by pronouns.
  • When a sentence is input, as the machine encodes each word it attends not only to the word being encoded but also to the other words in the input sentence; by computing attention between each word and all words in the sentence, it learns the word dependency relationships within the sentence and captures the internal structure of the sentence.
  • The process of attention calculation can be encapsulated in an attention function, which can be written as Attention(X, X, X); after the machine obtains the input text sequence, the sequence can be regarded as X, and the attention function is called to perform the self-attention calculation.
  • The self-attention mechanism has many advantages. For example, from the perspective of learning long-distance dependencies: since self-attention computes attention between every word and all words, no matter how far apart two words are, the maximum path length between them is only 1. It can therefore ignore the distance between words, compute dependency relationships, and learn the internal structure of a sentence.
  • the process of using vectors to implement self-attention operations may include the following steps S10 to S14:
  • Step S10: Generate three vectors for each word in the input sequence.
  • The three vectors include a query vector, a key vector, and a value vector. Normally, these three vectors are created by multiplying the word embedding by three weight matrices. For example, if the input sentence is "thinking machine", the first word in this sentence is "Thinking"; the word embedding of "Thinking" is X1, and multiplying X1 by the W_Q weight matrix yields q1, the query vector of this word.
  • Step S11: Calculate the scores. Assuming the self-attention vector of the first word "Thinking" is being calculated, each word in the input sentence scores "Thinking", yielding a score per word. The score expresses how important the other parts of the sentence are when the word "Thinking" is encoded, and is calculated as the dot product of the key vector of each word in the input sentence with the query vector of "Thinking".
  • For example, suppose the word embedding of the first word is x1, with query vector q1, key vector k1, and value vector v1, and the word embedding of the second word is x2, with query vector q2, key vector k2, and value vector v2. Then the first score is the dot product of q1 and k1, and the second score is the dot product of q1 and k2.
  • Step S12: Process the word scores: divide each score by a default value, and then apply the softmax function to the result of the division to obtain the softmax score of the word.
  • Dividing the score by the default value (typically the square root of the key-vector dimension) reduces the score to a smaller value range and keeps the softmax scores from saturating at 0 or 1.
  • The softmax operation normalizes the scores of all words, so that the softmax score of each word is a positive number and the softmax scores of all words in the sentence sum to 1.
  • The softmax score determines the contribution of each word to the encoding of the current word (for example, the contributions of "Thinking" and "machine" to encoding "Thinking").
  • Step S13: Multiply each value vector by its softmax score.
  • Step S14: Sum the weighted value vectors to obtain the output of the self-attention layer at this position (for example, the output for the first word "Thinking").
  • At this point the self-attention calculation is complete, and the resulting vector can be passed to the feedforward neural network.
  • In practice, the above steps S10 to S14 can be carried out in matrix form, which makes the calculation faster.
  • Specifically, the calculation of self-attention can be realized with matrices as follows.
  • Step S20: Calculate the query matrix, the key matrix, and the value matrix. Specifically, the word vector of each word in the input sentence is loaded into the matrix X, and the matrix X is multiplied by the query weight matrix W_Q, the key weight matrix W_K, and the value weight matrix W_V respectively, to obtain the query matrix Q, the key matrix K, and the value matrix V.
  • The query matrix Q can be calculated by formula (1): Q = X · W_Q (1)
  • The key matrix K can be calculated by formula (2): K = X · W_K (2)
  • The value matrix V can be calculated by formula (3): V = X · W_V (3)
  • Each row of matrix X corresponds to one word in the input sentence; that is, each row of X is the word vector of one word.
  • Matrix Q represents the query (Queries) matrix of the input sentence, and each row of Q is the Query vector of one word.
  • Matrix K represents the key matrix of the input sentence, and each row of K is the Key vector of one word.
  • Matrix V represents the value matrix of the input sentence, and each row of V is the Value vector of one word.
  • Step S21: Compute the self-attention output, which can be expressed by the following formula (4): Z = softmax(Q · K^T / √dk) · V (4)
  • Formula (4) is a combination of the above steps S11 to S14.
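  • A minimal numpy sketch of formulas (1) to (4) (the matrix sizes below are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_Q, W_K, W_V):
    """Matrix-form self-attention, steps S20 and S21."""
    Q = X @ W_Q                      # formula (1): query matrix
    K = X @ W_K                      # formula (2): key matrix
    V = X @ W_V                      # formula (3): value matrix
    dk = K.shape[-1]
    scores = Q @ K.T / np.sqrt(dk)   # scaled dot-product scores
    return softmax(scores) @ V       # formula (4): weighted sum of value vectors

# Illustrative sizes: 2 words ("thinking machine"), embedding dim 4, head dim 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 4))
Z = self_attention(X, *(rng.standard_normal((4, 3)) for _ in range(3)))
```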
  • the multi-headed attention model is called multi-headed because the multi-headed attention model contains h attention modules.
  • Each attention module can implement the self-attention mechanism shown in (1) above.
  • h is a positive integer greater than 1, for example, h can be 8.
  • Each attention module maintains an independent query weight matrix, key weight matrix, and value weight matrix. Therefore, after the input matrix X is multiplied by the query weight matrix W_Q, the key weight matrix W_K, and the value weight matrix W_V of each attention module, h query matrices Q, key matrices K, and value matrices V are generated, and then h matrices Z are generated, namely matrix Z_0, matrix Z_1, ..., matrix Z_h.
  • Because the network after the multi-head attention model (such as the feedforward network) does not take h matrices as input but a single matrix composed of the representation vector of each word, the h matrices Z are compressed into one matrix.
  • One way to achieve this compression is to splice the h matrices (matrix Z_0, matrix Z_1, ..., matrix Z_h) together and then multiply the spliced result by an additional weight matrix W_O; the result of the multiplication is the fused matrix Z that contains the information of all attention modules, and it can be used for subsequent operations, such as being sent to the feedforward network.
  • The number of columns (that is, the per-word vector dimension) of the spliced output equals the sum of the numbers of columns of the spliced inputs, while the number of rows of the spliced output equals the number of rows of the spliced inputs.
  • In other words, the output is one large matrix containing the h matrices: its number of columns is the sum of the numbers of columns of the h matrices, and its number of rows equals the number of rows of each of the h matrices.
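  • Continuing the numpy sketch above (reusing self_attention, rng, and X from it; the number of heads h and the shape of W_O are illustrative assumptions), the splicing and linear transformation can look like this:

```python
def multi_head_attention(X, heads, W_O):
    """heads: one (W_Q, W_K, W_V) triple per attention module; W_O fuses the outputs."""
    Zs = [self_attention(X, W_Q, W_K, W_V) for (W_Q, W_K, W_V) in heads]
    concat = np.concatenate(Zs, axis=-1)  # splice the h matrices column-wise
    return concat @ W_O                   # linear transformation -> fused matrix Z

h, d_model, dk = 8, 4, 3
heads = [tuple(rng.standard_normal((d_model, dk)) for _ in range(3)) for _ in range(h)]
W_O = rng.standard_normal((h * dk, d_model))
Z = multi_head_attention(X, heads, W_O)   # shape: (number of words, d_model)
```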
  • The multi-head attention model has several benefits.
  • First, the multi-head attention model uses multiple attention modules, and the weight matrices of each attention module are initialized randomly. After training, each set of weight matrices projects the input word embeddings (or vectors from lower encoders/decoders) into a different representation subspace, allowing the model to learn relevant information in different representation subspaces. The multi-head attention model is therefore very strong at extracting semantic features.
  • the multi-head attention model is a model based on the self-attention mechanism, it has the benefits of the self-attention mechanism and can learn the internal structure of a sentence.
  • the multi-head attention model uses multiple attention modules to expand the model's ability to focus on different positions, thereby further enhancing the ability to capture long-distance features.
  • Moreover, the multi-head attention model performs well in lexical, syntactic, and semantic processing, context handling, long-distance feature capture, and other aspects, so its comprehensive feature extraction ability is very strong.
  • Finally, because the attention modules do not depend on one another's previous calculations, the multi-head attention model can operate in parallel.
  • the above describes the self-attention mechanism involved in the semantic understanding model of the embodiment of the present application.
  • the semantic understanding model of the embodiment of the present application also relates to some concepts in the AI field. For ease of understanding, these concepts are introduced below.
  • Activation function: a function used to perform nonlinear transformations.
  • The Gaussian error linear unit (GELU) is a high-performance activation function.
  • The nonlinear transformation of the GELU function is a random regularization transformation that matches expectations, so it performs well in the NLP field and performs best in self-attention models; it also avoids the vanishing gradient problem.
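  • For reference, GELU is commonly written as follows (a standard formulation, not quoted from the patent), where Φ is the cumulative distribution function of the standard normal distribution:

```latex
\mathrm{GELU}(x) = x\,\Phi(x) \approx 0.5\,x\left(1 + \tanh\left(\sqrt{2/\pi}\,\left(x + 0.044715\,x^{3}\right)\right)\right)
```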
  • Loss function: an equation that measures the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (loss), the greater the difference; training the model then becomes a process of reducing this loss as much as possible.
  • The model can use the error backpropagation (BP) algorithm to modify the sizes of the parameters of the initial model during training, so that the error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation process dominated by the error loss, aiming to obtain optimal model parameters, such as the weight matrices.
  • The above describes the concepts in the AI field involved in the semantic understanding model of the embodiments of the present application.
  • the semantic understanding model of the embodiment of the present application also involves some concepts in the field of knowledge graph technology. In order to facilitate understanding, these concepts are introduced below.
  • Entity refers to something that is distinguishable and exists independently.
  • An entity can be a concrete object, such as a certain person, a certain city, a certain kind of plant, a certain kind of commodity, and so on.
  • An entity can also be an abstract event, such as the borrowing of a book or a ball game. Everything in the world is made up of concrete things, and such things can be called entities.
  • Entity extraction refers to extracting the entities in a text, such as the names of persons, organizations/institutions, geographic locations, events/dates, character values, and monetary values. Entity extraction includes the detection (find) and classification (classify) of entities. In layman's terms, entity extraction finds entities in sentences and labels them, as illustrated below.
  • Attribute: an entity has many characteristics, and each characteristic is called an attribute. Each attribute has a value range, and its type can be integer, real, or string. For example, a student (entity) has attributes such as student ID, name, age, and gender, whose corresponding value types are character, string, integer, and string.
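  • As an illustration (the sentence, labels, and attribute values below are hypothetical), the result of entity extraction plus attribute lookup can be represented as follows:

```python
# Entity extraction finds an entity span in the sentence (detection) and
# labels it with an entity type (classification); the attributes come from
# the stored knowledge about the entity.
sentence = "Search for the Flower of the World"
entities = [
    {"text": "the Flower of the World",                      # detected span
     "type": "LOCATION",                                     # classified type
     "attributes": {"city": "Beijing", "category": "holiday square"}},
]
```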
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 16 is used to collect training data.
  • the training data includes: text marked with semantic information, such as text marked with semantic intent and semantic slot.
  • the training data also includes masked text, such as samples processed by a random multivariate mask strategy; the data collection device 16 stores the training data in the database 13.
  • the training device 12 trains to obtain the semantic understanding model 200 based on the training data maintained in the database 13.
  • The following uses Embodiment 1 to describe in more detail how the training device 12 obtains the semantic understanding model 200 based on the training data.
  • The semantic understanding model 200 can be used to implement the function of extracting the lexical features and the syntactic features in the embodiments of the present application; that is, after relevant preprocessing, the text to be analyzed is input into the semantic understanding model 200, and the lexical features and the syntactic features are obtained.
  • the semantic understanding model 200 in the embodiments of the present application may specifically be a model based on the attention mechanism.
  • The semantic understanding model 200 is obtained by fine-tuning a pre-training model (such as a multi-head attention model and some weight matrices).
  • the training data maintained in the database 13 may not all come from the collection of the data collection device 16, and may also be received from other devices.
  • The training device 12 does not necessarily train the semantic understanding model 200 entirely on the training data maintained in the database 13; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the semantic understanding model 200 trained according to the training device 12 can be applied to different systems or devices.
  • For example, the semantic understanding model 200 is applied to the execution device 11 shown in FIG. 1; the execution device 11 may be a terminal, such as a vehicle-mounted terminal, a mobile terminal, a tablet computer, a notebook computer, or an AR/VR device, and may also be a server or a cloud, etc.
  • the execution device 11 is configured with an I/O interface 112 for data interaction with external devices.
  • the system architecture shown in FIG. 1 can be applied to voice interaction scenarios.
  • The product form of the voice interaction solution provided by the embodiments of this application may be a voice personalized adaptive algorithm module of a voice interaction software system, implemented as computer programs running on various terminal devices.
  • the voice interaction product provided by the embodiment of the present application can understand the semantic intent of the vehicle user control instruction, and realize the function of the corresponding vehicle module.
  • the user can input voice to the I/O interface 112 through the audio collection device 14.
  • the audio collection device 14 may include a distributed microphone array, which is used to collect voice control commands of the user.
  • the audio collection device 14 may perform some audio signal preprocessing operations such as sound source localization, echo cancellation, and signal enhancement.
  • the voice recognition module 113 is configured to perform voice recognition according to the input data (such as the voice signal) received by the I/O interface 112 to obtain the text to be analyzed. In this way, the input data is converted from a voice signal to a text signal, and output to the semantic understanding module 111.
  • the semantic understanding module 111 is used for understanding semantics, for example, extracting semantic intentions and semantic slots of users.
  • the semantic understanding module 111 may include a semantic understanding model 200, an entity extraction module 210, an entity construction module 220, a heterogeneous information fusion module 230, and a semantic decoding module 240.
  • the specific functions of each module are as follows:
  • the semantic understanding model 200 is obtained after migration training according to the pre-training model.
  • The semantic understanding model 200 is responsible for extracting the lexical, syntactic, and semantic features of the text input, realizing preliminary semantic intent understanding of user commands.
  • the entity extraction module 210 is used to extract entities in the text input to obtain effective entities.
  • the entity construction module 220 is used to vectorize an entity to obtain a representation of the entity and its attributes.
  • The heterogeneous information fusion module 230 fuses the lexical features, syntactic features, and entity features of the text input to obtain semantic features. These semantic features combine effective information from different vector spaces, enhancing the ability to understand semantic intent and to extract semantic slots.
  • the semantic decoding module 240 is used to decode semantic features to obtain semantic information, such as semantic intent understanding of user command input and semantic slot extraction, and output control commands.
  • When the execution device 11 preprocesses the input data, or when the semantic understanding module 111 of the execution device 11 performs computation and other related processing, the execution device 11 can call data, code, and the like in the data storage system 15 for the corresponding processing, and may also store the resulting data and instructions in the data storage system 15. In addition, after the execution device 11 determines the user's semantic intent and semantic slot, it can issue the control command to the I/O interface 112.
  • the I/O interface 112 returns the control command to the vehicle-mounted execution system 18, and the vehicle-mounted execution system 18 executes corresponding control commands, such as listening to songs, voice navigation, answering calls, controlling car temperature, etc., supporting intelligent vehicle-mounted scenes.
  • the training device 12 can also generate corresponding semantic understanding models 200 based on different training data for different tasks.
  • The corresponding semantic understanding models 200 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • the above system architecture can also be applied to a machine translation scene or a robot question answering scene.
  • the audio collection device 14 shown in FIG. 1 can also be replaced with a mobile phone, a personal computer or other user equipment.
  • the user can manually set the input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • the user equipment can automatically send input data to the I/O interface 112. If the user equipment is required to automatically send input data and the user's authorization is required, the user can set the corresponding authority in the user equipment.
  • the user can view the result output by the execution device 11 on the user equipment, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the user equipment can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data and store it in the database 13 as shown in the figure.
  • In FIG. 1, the I/O interface 112 directly stores, in the database 13, the input data received by the I/O interface 112 and the output results of the I/O interface 112 as new sample data, as shown in the figure.
  • Fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 15 is an external memory relative to the execution device 11. In other cases, the data storage system 15 may also be placed in the execution device 11.
  • the semantic understanding model 200 is obtained by training according to the training device 12.
  • The semantic understanding model 200 provided in this embodiment of the present application may include: a first multi-head attention model 201, a first vector normalization layer 202, a forward transfer layer 203, and a second vector normalization layer 204.
  • the first multi-head attention model 201 is used to receive the input text, perform attention operations on the text, and send the output result to the first vector normalization layer 202.
  • The first multi-head attention model 201 includes a plurality of attention modules, each of which is also called an attention head.
  • For example, the first multi-head attention model 201 includes attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7.
  • Each attention module implements an attention calculation; for the technical details of each attention module's calculation, refer to the description in (1) above.
  • the first vector normalization layer 202 is used to receive the input of the first multi-head attention model 201, perform normalization calculation on it, and send the output result to the forward transfer layer 203.
  • The normalization calculation performed by the first vector normalization layer 202 normalizes the mean and variance of the samples, simplifying the overall learning difficulty.
  • the forward transfer layer 203 is configured to receive the input of the first vector normalization layer 202, perform forward transfer calculation on it, and send the output result to the second vector normalization layer 204.
  • Through the forward transfer calculation, the forward transfer layer 203 implements linear transformation and nonlinear transformation, mapping the input of the first vector normalization layer 202 to a high-dimensional vector space.
  • the second vector normalization layer 204 is used to receive the input of the forward transfer layer 203, perform normalization calculation on it, and output the output result.
  • The second vector normalization layer 204 likewise performs normalization calculations that normalize the mean and variance of the samples, simplifying the overall learning difficulty.
  • FIG. 3 is a method for training a semantic understanding model provided in the first embodiment of this application.
  • the first embodiment can be specifically executed by the training device 12 shown in FIG. 1.
  • the first embodiment involves the pre-training process and fine-tuning of the model.
  • the samples used in the pre-training process can be different from the samples used in the model fine-tuning process.
  • this embodiment refers to the sample used in the model fine-tuning process as the first sample, and the sample used in the pre-training process as the second sample.
  • the first sample and the second sample may be the training data maintained in the database 13 as shown in FIG. 1.
  • S301 and S302 in the first embodiment may be executed in the training device 12 or in a cloud device.
  • That is, the cloud device first preprocesses the second sample received or obtained from the database 13 (the pre-training process of S301 and S302) to obtain the pre-training model; the pre-training model and the first sample are then used as the input of the training device 12, and the training device 12 executes S303 and S304.
  • the first embodiment includes the following S301 to S304:
  • S301: The training device obtains a second sample.
  • the second sample is the text processed based on the mask strategy, and the second sample includes the masked text.
  • The second sample can be labeled with the lexeme corresponding to each mask; that is, the label of the second sample records, for each [mask], the word replaced at that position in the sentence.
  • the large-scale corpus can be obtained, the masking strategy can be used to process the large-scale corpus, and the processed large-scale corpus can be annotated to obtain the second sample.
  • the mask strategy may include at least one of a random mask strategy and a multiple mask (N-gram Mask) strategy.
  • the method of using the mask strategy to train the model can be called random multivariate dynamic mask training.
  • the original text is "Turn on the air conditioner in the car”
  • the second sample obtained is "Play [mask] empty in the car [mask]”.
  • the original text is "Navigate to Pudong Avenue”.
  • the second sample obtained is “Navigate to [mask][mask] Avenue”.
  • the original text is "I want to hear Jay Chou's Qilixiang”.
  • the second sample obtained is "I want to hear [mask][mask][mask ] Of Qilixiang”.
  • the original text is "I want to call my home”.
  • the second sample obtained is "I want to call my home” [Mask][mask]”.
  • the original text is "heat the passenger seat”.
  • the second sample obtained is "given [mask]drive[mask]seat[ mask] hot”.
  • S302: The training device performs model training according to the second sample to obtain a pre-training model.
  • Model training can be achieved through loss function and backpropagation algorithm. For details, please refer to the description of (5) and (6) above.
  • S303: The training device obtains a first sample.
  • the first sample includes text annotated with semantic information.
  • the first sample is marked with semantic intent and semantic slot.
  • When the method is applied in the vehicle field, the first sample may be text from the vehicle field, such as a corpus from vehicle voice interaction scenarios.
  • S304: The training device performs migration training on the pre-training model according to the first sample to obtain a semantic understanding model.
  • the migration training can be fine-tuning of the model.
  • Model fine-tuning is conceptually different from model training.
  • Model training usually means that before training, the parameters of the model are randomly initialized, and a new network is trained from the beginning according to the randomly initialized parameters.
  • Model fine-tuning refers to fine-tuning the parameters of the model based on the pre-training model according to specific tasks.
  • Fine-tuning can reuse the parameters already trained in the pre-training model, so compared with training from scratch it saves a large amount of computing resources and computing time, and improves computing efficiency and accuracy.
  • Model fine-tuning can be achieved through loss function and back propagation algorithm. For details, please refer to the description of (5) and (6) above.
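  • A hedged PyTorch-style sketch of the difference between fine-tuning and training from scratch (the encoder, sizes, and intent head below are stand-ins for illustration; the patent does not prescribe a specific framework):

```python
import torch
import torch.nn as nn

num_intents, hidden = 10, 32                      # illustrative sizes
encoder = nn.Sequential(nn.Linear(16, hidden), nn.GELU())  # stand-in encoder
# Fine-tuning: instead of random initialization, the encoder would load the
# parameters already trained in the pre-training model, e.g.:
# encoder.load_state_dict(torch.load("pretrained.pt"))
intent_head = nn.Linear(hidden, num_intents)      # task-specific head

optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(intent_head.parameters()), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 16)                            # dummy batch of text features
y = torch.randint(0, num_intents, (4,))           # dummy intent labels (first sample)
loss = loss_fn(intent_head(encoder(x)), y)        # loss function, cf. (5) above
optimizer.zero_grad()
loss.backward()                                   # backpropagation, cf. (6) above
optimizer.step()                                  # small learning rate: fine-tune
```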
  • the manner of obtaining the semantic understanding model described in the above manner is only an example, and the semantic understanding model may also be other large-scale pre-training language models based on pre-training and fine-tuning paradigms.
  • This embodiment provides a model training method for implementing a semantic understanding function.
  • a pre-training model is trained by using a mask strategy, so that the pre-training model has basic natural language processing capabilities.
  • Then, the pre-training model is fine-tuned using the text annotated with semantic information, so that during fine-tuning it learns the relationship between text and semantic information and gains the ability to extract lexical, syntactic, and semantic features.
  • the semantic understanding model can be used to extract accurate lexical features, syntactic features and semantic features.
  • Fig. 4 is a semantic analysis method provided in the second embodiment of the application.
  • The second embodiment can be executed by the execution device 11 shown in Fig. 1. The text to be analyzed in the second embodiment may be converted from the speech collected by the audio collection device 14 shown in Fig. 1; the speech recognition module 113 in the execution device 11 can be used to execute S401 in the second embodiment, and the semantic understanding module 111 in the execution device 11 can be used to execute S402 to S407.
  • The second embodiment can be processed by a central processing unit (CPU), or jointly by a CPU and a graphics processing unit (GPU), or by another processor suitable for neural network computation instead of a GPU, which is not limited in this application.
  • the second embodiment includes S401 to S407.
  • S401: The execution device obtains the text to be analyzed.
  • the execution device will collect the voice signal, perform voice recognition on the voice signal, and obtain the text.
  • the voice signal contains a control command to the vehicle-mounted terminal, and the form of the text may be a text signal.
  • S401 may include the following steps A to B.
  • n1 represents the length of the user's voice control command.
  • n2 represents the length of the text input; n2 and n1 may or may not be equal.
  • S402: The execution device extracts lexical features and syntactic features from the text.
  • the execution device extracts lexical features and syntactic features by performing steps 1 to 2 described below.
  • Step 1 The execution device inputs the text into the semantic understanding model.
  • the text can be input into the semantic understanding model in the form of a vector or a matrix.
  • the execution device can extract the character word vector, relative position word vector, and character type word vector of the text, and input a matrix composed of the character word vector, relative position word vector, and character type word vector into the semantic understanding model.
  • the text is "I want to listen to Qilixiang" and the input is ([CLS] I want to listen to Qilixiang SEPpad padpad).
  • Character word vector is (E [CLS] E E I want to listen to E E E incense in seven E E [SEP] E [pad] E [pad] E [pad] E [pad]).
  • the relative position word vector is (E 0 E 1 E 2 E 3 E 4 E 5 E 6 E 7 E 8 E 9 E 10 ).
  • the type word vector is (E 1 E 1 E 1 E 1 E 1 E 1 E 1 E 1 E 1 E 0 E 0 E 0 ).
  • E is the abbreviation of embedding, and E represents the word vector.
  • [CLS] and [SEP] are separators.
  • Pad is a padding element, used to pad input texts to the same length.
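  • A sketch of how the three kinds of word vectors can be combined into the model input (summing the three embeddings per position is a common convention and an assumption here; the sizes are illustrative):

```python
import numpy as np

vocab = {"[CLS]": 0, "我": 1, "想": 2, "听": 3, "七": 4,
         "里": 5, "香": 6, "[SEP]": 7, "[pad]": 8}
d = 16                                            # illustrative embedding dimension
rng = np.random.default_rng(0)
char_emb = rng.standard_normal((len(vocab), d))   # character word vectors
pos_emb = rng.standard_normal((11, d))            # relative position word vectors
type_emb = rng.standard_normal((2, d))            # character type word vectors

tokens = ["[CLS]", "我", "想", "听", "七", "里", "香",
          "[SEP]", "[pad]", "[pad]", "[pad]"]
types = [1] * 8 + [0] * 3                         # 1 = real token, 0 = padding

# One row per position: character + position + type embedding.
X = np.stack([char_emb[vocab[t]] + pos_emb[i] + type_emb[types[i]]
              for i, t in enumerate(tokens)])     # input matrix for model 201
```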
  • Step 2 The execution device extracts lexical features and syntactic features from the text through the semantic understanding model.
  • the execution device executes the following steps 2.1 to 2.4.
  • Step 2.1 The execution device performs an attention operation on the text to obtain a first output result, which indicates the dependency relationship between words in the text.
  • the execution device uses a multi-head attention mechanism to implement step 2.1.
  • the execution device executes the following steps 2.1.1 to 2.1.4.
  • Step 2.1.1 the execution device inputs the text into the first multi-head attention model 201.
  • A multi-head attention model can be set in the pre-training model, and a multi-head attention model can also be used in the entity feature extraction stage.
  • To distinguish the two, this embodiment refers to the multi-head attention model included in the pre-training model as the first multi-head attention model, and the multi-head attention model used in the entity feature extraction stage as the second multi-head attention model.
  • the first multi-head attention model 201 includes m-layer transform (transformer) units, each transformer unit is used to implement a multi-head attention mechanism, and each transformer unit includes h self-attention modules.
  • For example, the first multi-head attention model 201 includes attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7.
  • The matrix composed of the character word vectors, relative position word vectors, and character type word vectors can be used as the input matrix X of the first multi-head attention model 201, and the input matrix X is input into attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7 respectively.
  • Step 2.1.2 the execution device performs an attention operation on the text through each attention module in the first multi-head attention model 201, and obtains the output result of each attention module.
  • Attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7 each perform an attention calculation on the input matrix X, yielding the output results of attention modules 0 through 7 respectively.
  • each attention module can use the following formulas (5) to (7) to perform attention calculations, and the output result of the attention module can be expressed by formula (8).
  • the attention calculation is the Attention in the following formula (8).
  • Q = W_Q · X_1 (5)
  • K = W_K · X_1 (6)
  • V = W_V · X_1 (7)
  • X_1 is the input text signal.
  • W_Q in formula (5) represents the query weight matrix of one attention module in the first multi-head attention model 201, and Q represents the query matrix of that attention module.
  • W_K in formula (6) represents the key weight matrix of one attention module in the first multi-head attention model 201, and K represents the key matrix of that attention module.
  • W_V in formula (7) represents the value weight matrix of one attention module in the first multi-head attention model 201, and V represents the value matrix of that attention module.
  • head(i) = Attention(Q, K, V) = softmax(Q · K^T / √dk) · V (8)
  • head(i) in formula (8) represents the output matrix of the current self-attention mechanism, and each row of head(i) is the self-attention vector of one word.
  • The self-attention vector represents the contribution of each word in the sentence (the current word itself and the other words) to the current word, that is, the score of each word with respect to the current word. Here i denotes the i-th attention module, where i is a positive integer less than or equal to h, and the number of columns of head(i) equals the number of columns of the Value vectors.
  • dk is the dimension of the corresponding hidden neural units, and Attention denotes the attention calculation.
  • Step 2.1.3 The execution device splices the output results of each attention module to obtain the spliced result.
  • The output result of each attention module is a matrix, and the splicing result is also a matrix; the number of columns of the splicing result equals the sum of the numbers of columns of the output results of the attention modules.
  • the splicing method can be horizontal splicing, and the splicing process can be realized by calling the concat (splicing) function. It should be understood that the way of horizontal splicing is only an exemplary illustration.
  • Other splicing methods may also be used to splice the output results of the attention modules; for example, with vertical splicing, the number of rows of the splicing result equals the sum of the numbers of rows of the output results of the attention modules. This embodiment does not specifically limit how the splicing is performed.
  • Step 2.1.4 The execution device performs linear transformation on the splicing result to obtain the first output result.
  • the linear transformation method may be multiplication with a weight matrix, that is, step 2.1.4 may specifically be: the execution device multiplies the splicing result by the weight matrix, and uses the product as the first output result.
  • Besides multiplying by a weight matrix, the linear transformation can also adopt other methods, for example, multiplying the splicing result by a certain constant, or adding a certain constant to the splicing result; this embodiment does not limit the method of linear transformation.
  • Step 2.1.3 and step 2.1.4 can be expressed by the following formula (9-1) and formula (9-2):
  • MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O  (9-1)
  • head_i = Attention(Q_i, K_i, V_i)  (9-2)
  • The splicing in step 2.1.3 is the Concat in formula (9-1), and the linear transformation in step 2.1.4 is the multiplication by W^O in formula (9-1).
  • W^O is a weight matrix obtained by joint training in the first multi-head attention model, and Concat represents the splicing operation.
  • MultiHead is the output of the first multi-head attention model; it is a matrix formed by fusing the h self-attention matrices.
  • h represents the number of attention modules in the first multi-head attention model and is a positive integer greater than 1; head_1 represents attention module 1 and head_h represents attention module h, so "head_1, …, head_h" denotes the h attention modules. h · d_k is the overall dimensional size of the multi-head attention mechanism of the current transformer unit.
  • Through the above manner, the multi-head attention mechanism can capture long-distance features in the text, extract rich contextual and semantic representation information, and enhance the ability to extract lexical and syntactic features.
  • Step 2.2 The execution device normalizes the first output result to obtain the second output result.
  • For example, the execution device performs the operation using the following formula (10), in which the normalization is achieved by the LayerNorm function:
  • x = LayerNorm(MultiHead(Q, K, V) + Sublayer(MultiHead(Q, K, V)))  (10)
  • The LayerNorm function is only an exemplary implementation; the execution device may also use other methods to perform normalization, and this embodiment does not specifically limit how the normalization is performed.
  • x represents the second output result, and LayerNorm represents the standardized calculation operation.
  • MultiHead denotes multi-head attention; MultiHead(Q, K, V) is the first output result, i.e., the output of the multi-head attention mechanism and the result of formula (9-1).
  • Sublayer represents the residual calculation operation.
  • Through step 2.2, vector standardization is realized; vector standardization normalizes the mean and variance of the samples, thus reducing the learning difficulty.
  • Step 2.3 The execution device performs linear transformation and non-linear transformation on the second output result to obtain the third output result.
  • For example, the second output result produced by the first vector normalization layer 202 may be input to the forward pass layer 203, and the forward pass layer 203 performs linear transformation and nonlinear transformation to obtain the third output result.
  • the forward pass calculation is used to realize the high-dimensional mapping of the vector space, and the lexical, syntactic and semantic features are extracted.
  • the linear transformation may include an operation of multiplying with a matrix and an operation of adding an offset
  • the nonlinear transformation may be realized by a nonlinear function.
  • For example, the execution device may perform the operation using the following formula (11): the linear transformation is realized by multiplying by W_1 and adding b_1 (and likewise by W_2 and b_2), and the nonlinear transformation (taking the maximum value) is realized by the max function:
  • FFN(x) = max(0, x · W_1 + b_1) · W_2 + b_2  (11)
  • The max function is only an exemplary implementation of the nonlinear transformation; the execution device can also use other methods to perform the nonlinear transformation, for example, operating through an activation function. This embodiment does not specifically limit how the nonlinear transformation is performed.
  • Multiplying by W_1 and adding b_1 are likewise only exemplary implementations of the linear transformation; the execution device may also use other methods, and this embodiment does not specifically limit how the linear transformation is performed.
  • FFN denotes the feed-forward neural network, and max represents the operation of taking the maximum value.
  • W_1 and W_2 both represent weight matrices of the forward pass, and b_1 and b_2 both represent the corresponding bias parameters.
  • x represents the output of the vector normalization, i.e., the result of formula (10), which is the second output result.
  • Step 2.4 The execution device normalizes the third output result to obtain lexical features and syntactic features.
  • For example, the output result of the forward pass layer 203 may be input to the second vector normalization layer 204, and the second vector normalization layer 204 performs normalization to obtain the lexical features and syntactic features.
  • In this way, the mean and variance of the samples are normalized, reducing the overall learning difficulty.
  • For example, the execution device performs the operation using the following formula (12), in which the normalization is achieved by the LayerNorm function:
  • V = LayerNorm(FFN(x) + Sublayer(FFN(x)))  (12)
  • The LayerNorm function is only an exemplary implementation; the execution device may also use other methods to normalize the third output result, and this embodiment does not specifically limit how the normalization is performed.
  • LayerNorm represents the standardized calculation operation, FFN(x) is the output of the forward pass, and Sublayer represents the residual calculation operation.
  • V represents the output matrix of the transformer unit; the dimension of V is the total number of lexical features and syntactic features as a whole.
  • In the process of performing the above method, it is determined whether the computation of the m layers of transformer units has been completed. If not, the computation continues until all m transformer layers have been computed, and the final tensor structure of the pre-trained language model is output.
  • In this way, the semantic understanding model obtained by fine-tuning the pre-trained model is used to extract the lexical and syntactic features contained in the input text. Because the model has gone through both the pre-training process and the model fine-tuning process, the overall model has strong semantic comprehension capabilities, such as semantic intent comprehension and semantic slot extraction. In particular, fine-tuning the model with text from the vehicle domain as samples gives the model as a whole a strong ability to understand semantic intent in the vehicle domain. In addition, when the semantic understanding model is implemented with the self-attention mechanism, attention operations can capture the correlations between words in the text and help capture long-distance features, so the extracted syntactic features are more accurate.
  • the execution device acquires entities in the text to be analyzed.
  • the execution device extracts entities from the text to obtain the entities in the text.
  • the execution device obtains a structured entity vector corresponding to the entity according to the entity in the text to be analyzed.
  • the structured entity vector is used to indicate the identity of the entity and the attribute of the entity.
  • The structured entity vector is a vector representation of the entity. Due to the use of the vector data format, the data structure is more regular and complete. For example, the number of dimensions of the structured entity vector is 100. Of course, the structured entity vector may also be a vector of other dimensions; this embodiment does not specifically limit the dimension of the structured entity vector.
  • For example, "Mo" (the song 默) is the identifier of an entity, the attribute of "Mo" is the song name, and the structured entity vector of "Mo" is (-0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518…).
  • The ellipsis indicates the 94 dimensions that are not shown, and -0.0369, -0.1494, 0.0732, 0.0774, 0.0518, and 0.0518 are the values of 6 of the dimensions.
  • That is, (-0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518…) represents both the entity "Mo" and the attribute of song name.
  • The execution device obtains the structured entity vector from the entity construction table according to the entities in the text. For example, referring to Figure 5, the execution device, according to the entity "Qilixiang" (七里香), obtains from the entity construction table the structured entity vector (-0.7563 -0.6532 0.2182 0.3914 0.3628 0.5528…).
  • the entity construction table is used to store the mapping relationship between the entity and the structured entity vector.
  • the entity construction table is also called the knowledge entity mapping table, which is used to map the entity into a structured entity vector to realize the representation of the entity.
  • the entity construction table is pre-stored in the execution device.
  • the execution device uses the entity as an index to query the entity construction table to obtain a structured entity vector, thereby mapping the entity to a vector representation.
  • the entity construction table is set according to experience, for example, each word in the Chinese thesaurus is input into the word embedding model in advance, each word is processed through the word embedding model, and the word vector of each word is output.
  • the word embedding model may be a neural network model.
  • the entity construction table may be as shown in Table 1 below.
  • The meaning of the entity construction table is: "Mo" is an entity whose structured entity vector is (-0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518…); "Laojiumen" (Old Nine Gates) is an entity whose structured entity vector is (-0.0154 -0.2385 0.1943 0.4892 0.7531 0.9021…); "Bird in the Forest" is an entity whose structured entity vector is (-0.1692 -0.4494 0.7911 0.9651 0.7226 0.3128…); and "Qilixiang" is an entity whose structured entity vector is (-0.7563 -0.6532 0.2182 0.3914 0.3628 0.5528…).
  • Each structured entity vector is a 100-dimensional vector; the ellipsis in each structured entity vector in Figure 5 and Table 1 represents the 94 dimension values that are not shown, and the last row of Table 1 represents other entities that are included in the entity construction table but not shown in Table 1.
  • Table 1:
  • Entity | Structured entity vector
  • Mo | -0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518 …
  • Laojiumen | -0.0154 -0.2385 0.1943 0.4892 0.7531 0.9021 …
  • Bird in the Forest | -0.1692 -0.4494 0.7911 0.9651 0.7226 0.3128 …
  • Qilixiang | -0.7563 -0.6532 0.2182 0.3914 0.3628 0.5528 …
  • … | …
  • the application is in the vehicle field
  • the entity construction table includes entities associated with the vehicle field.
  • For example, the in-vehicle field includes a navigation business, music playback business, radio business, communication business, short message sending and receiving business, instant messaging application business, schedule query business, news push business, smart question-and-answer business, air conditioning control business, vehicle control business, maintenance business, and other business areas.
  • the entity construction table includes entities related to these business areas.
  • Considering that navigation scenes and song-listening scenes are common in the vehicle field, the entity construction table may include locations and songs. In this way, it helps to build structured knowledge entities for the vehicle field.
  • The execution device extracts the entity of this text and obtains the entity "Mo", whose attribute is the song name.
  • The execution device queries the above Table 1 according to "Mo", and the structured entity vector is (-0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518…), which represents the entity "Mo" together with the attribute of song name; the semantic intent is subsequently determined as "listen to the song" based on this structured entity vector.
  • The execution device extracts entities from this text and obtains the entity "Laojiumen", whose attribute is the singer name.
  • The execution device queries the above Table 1 according to "Laojiumen", and the structured entity vector is (-0.0154 -0.2385 0.1943 0.4892 0.7531 0.9021…), which represents the entity "Laojiumen" together with the attribute of singer name.
  • The execution device extracts entities from this text and obtains the entity "Bird in the Forest", whose attribute is the song name.
  • The execution device queries the above Table 1 according to "Bird in the Forest", and the structured entity vector is (-0.1692 -0.4494 0.7911 0.9651 0.7226 0.3128…), which represents the entity "Bird in the Forest" together with the attribute of song name.
  • the entity construction table includes at least one of entities with irregular names, entities with the number of characters in the name exceeding the threshold, and entities with the word frequency of the name lower than the threshold.
  • An entity with an irregular name is, for example, a grammatically irregular song.
  • An entity whose name has a number of characters exceeding the threshold is, for example, a long-character place name.
  • An entity whose name has a word frequency lower than the threshold is, for example, a place name containing low-frequency characters. Because the names of these entities are prone to ambiguity or have multiple meanings, it is difficult for the machine to understand the correct semantics.
  • By pre-storing the vector representations of these entities in the entity construction table, the machine can look up the table to obtain an accurate vector representation; incorporating entity features into the semantic understanding process helps improve the accuracy of semantic understanding.
  • In this way, after semantic analysis based on the vector, the execution device can determine the semantic intent of the sentence as "navigation" instead of "listen to songs", thereby improving the accuracy of semantic intent recognition.
  • In some embodiments, the execution device adopts the following formula (13) to realize the extraction of the structured entity vector, where obtaining the entities in the text to be analyzed is the Extract operation and obtaining the structured entity vector is the mapping F:
  • E_1 = F(Extract(x_1, …, x_n))  (13)
  • x_1 … x_n represents the text to be analyzed, where x_1 represents the first word in the text, x_n represents the n-th word, and the ellipsis represents the words contained in the text that are not shown.
  • Extract represents the entity extraction operation, and F represents the mapping function used to construct the entity.
  • E_1 represents the structured entity vector, and e_1 represents the vector representation of each extracted entity.
  • In this way, the entities in the input text are extracted, and a structured entity vector is constructed to vectorize each entity. Since the entity vector can represent both the entity and the attributes of the entity, the vectorized representation is effective and realizes effective embedding of entities. Therefore, when the subsequent pre-trained model performs further recognition based on the structured entity vector, the on-board semantic intent understanding capability and the semantic slot extraction capability of the pre-trained model are enhanced.
  • S402 and S403 can be executed sequentially. For example, S402 may be executed first, and then S403; or S403 may be executed first, and then S402. In other embodiments, S402 and S403 can also be executed in parallel, that is, S402 and S403 can be executed simultaneously.
  • the execution device performs feature extraction on the structured entity vector to obtain entity features.
  • the execution device performs an attention operation on the structured entity vector to obtain the entity feature, so that the entity feature can capture the internal structure and dependency relationship of the structured entity vector.
  • the execution device uses the multi-head attention model to perform the following steps (1) to (4) to perform feature extraction on the structured entity vector.
  • Step (1) The execution device inputs the structured entity vector into the second multi-head attention model.
  • the second multi-head attention model includes m-layer transformer units, each transformer unit is used to implement the multi-head attention mechanism, and each transformer unit includes h self-attention modules.
  • the second multi-head attention model includes attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6 and Attention module 7.
  • For example, the structured entity vector of Qilixiang (-0.7563 -0.6532 0.2182 0.3914 0.3628 0.5528…) can be used as the input matrix X of the second multi-head attention model, and the input matrix X can be input to attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7.
  • Step (2) The execution device performs attention operations on the structured entity vector through each attention module in the second multi-head attention model to obtain the output result of each attention module.
  • Attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7 can each perform attention calculations on the input matrix X, respectively yielding the output results of attention module 0 through attention module 7.
  • Specifically, each attention module can use the following formulas (14) to (17) to perform the attention calculation, and the output result of the attention module can be expressed by formula (18):
  • Q = W^Q · X_2  (14)
  • K = W^K · X_2  (15)
  • V = W^V · X_2  (16)
  • Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V  (17)
  • head(i) = Attention(Q, K, V)  (18)
  • X_2 represents the input structured entity vector.
  • W^Q in formula (14) represents the query weight matrix of an attention module in the second multi-head attention model, and Q represents the query matrix of that attention module.
  • W^K in formula (15) represents the key weight matrix of an attention module in the second multi-head attention model, and K represents the key matrix of that attention module.
  • W^V in formula (16) represents the value weight matrix of an attention module in the second multi-head attention model, and V represents the value matrix of that attention module.
  • head(i) represents the output matrix of the current self-attention mechanism; the number of columns of head(i) equals the number of columns of the value (Value) vector.
  • d_k is the corresponding hidden neural unit dimension, Attention denotes the attention calculation, and softmax denotes operating through the softmax function.
  • Step (3) The execution device splices the output results of each attention module to obtain the spliced result.
  • the data form of the output result of the attention module is a matrix
  • the data form of the splicing result is also a matrix
  • the number of dimensions of the splicing result is equal to the sum of the number of dimensions of the output result of each attention module.
  • The splicing method can be horizontal splicing, and the splicing process can be realized by calling the concat (splicing) function. It should be understood that horizontal splicing is only an exemplary illustration.
  • The execution device may also use other splicing methods to splice the output results of the attention modules. For example, if vertical splicing is used to obtain the splicing result, the number of rows of the splicing result equals the sum of the numbers of rows of the output results of the attention modules. This embodiment does not specifically limit how the splicing is performed.
  • For example, if the multi-head attention model has 12 attention modules and the output result of each of these 12 attention modules is a matrix with 10 rows and 64 columns, the splicing result is a matrix with 10 rows and 768 columns.
  • The 1st to 64th columns of the splicing result are the output result of the 1st attention module, the 65th to 128th columns are the output result of the 2nd attention module, the 129th to 192nd columns are the output result of the 3rd attention module, and so on, until the 705th to 768th columns are the output result of the 12th attention module.
  • The output result of each attention module is head_i in formula (20), and the output results of the h attention modules are head_1, …, head_h in formula (19), where head_1 is the output result of attention module 1, head_h is the output result of attention module h, and the ellipsis indicates the output results of the attention modules that are not shown.
  • The splicing can be performed by the Concat function in formula (19):
  • MultiHead(Q, K, V) = Concat(head_1, …, head_h) · W^O  (19)
  • head_i = Attention(Q_i, K_i, V_i)  (20)
  • Concat in formula (19) represents the splicing operation, and h represents the number of attention modules, a positive integer greater than 1.
  • W^O represents a weight matrix obtained by joint training in the second multi-head attention model, and MultiHead is the output of the second multi-head attention model.
  • Q_i denotes the Q matrix corresponding to attention module head_i, K_i denotes the K matrix corresponding to attention module head_i, and V_i denotes the V matrix corresponding to attention module head_i.
  • Step (4) The execution device performs linear transformation on the splicing result to obtain the entity feature.
  • For example, the linear transformation may be multiplication by a weight matrix.
  • step (4) may specifically be: the execution device multiplies the splicing result by the weight matrix, and uses the product as the entity feature.
  • For example, the weight matrix used in the linear transformation is W^O, and step (4) can specifically be: multiplying Concat(head_1, …, head_h) by W^O; the product is MultiHead(Q, K, V), which is the entity feature.
  • Besides multiplying by a weight matrix, the linear transformation can also adopt other methods, for example, multiplying the splicing result by a certain constant, or adding a certain constant to the splicing result; this embodiment does not limit the method of linear transformation.
  • For example, step (3) and step (4) can be expressed by the above formula (19), formula (20), and the following formula (21):
  • E_2 = MultiHead(Q, K, V)  (21)
  • E_2 represents the entity feature extracted from the structured entity vectors of the text.
  • The data form of E_2 is a matrix; each row of E_2 corresponds to the structured entity vector of one entity in the text, and the number of dimensions of E_2 equals the number of dimensions of a structured entity vector.
  • For example, if the text contains N entities, E_2 is a matrix of N rows: the first row of E_2 corresponds to the first entity in the text, the second row to the second entity, and so on. If a structured entity vector is a 100-dimensional vector, the number of dimensions of E_2 equals 100. N is a positive integer.
  • Through the above manner, the multi-head attention mechanism can capture the correlations between words within the structured entity vector and helps to capture long-distance features, so that the extracted entity features can accurately express the semantics; the entity features are therefore more precise.
  • the execution device fuses the entity feature of the text, the lexical feature of the text, and the syntactic feature of the text to obtain the semantic feature of the text.
  • the execution device realizes the preliminary semantic intent understanding of the text information by extracting the lexical, syntactic and entity features from the text. Next, the execution device fuses lexical features, syntactic features, and entity features to combine the three features.
  • the fused semantic features include entity features, lexical features, and syntactic features, which contain a wealth of semantic-related information. Therefore, the semantic features can be used to obtain the semantic information of the text, and the use of the fused semantic features can further enhance the on-board semantic intent understanding ability and the semantic slot extraction ability of the pre-training model itself.
  • For example, the output of the semantic understanding model is (w1 w2 w3 w4 w5 w6 w7 w8 w9), which contains the lexical features and the syntactic features of the text; the fusion of the lexical features and the syntactic features is performed in the internal calculation process of the semantic understanding model.
  • The entity features obtained through the entity feature extraction step are (e5 e6 e7).
  • e5 is the entity feature of one structured entity vector, e6 is the entity feature of another structured entity vector, and e7 is the entity feature of yet another structured entity vector; e5, e6, and e7 are each vectors.
  • Since (w1 w2 w3 w4 w5 w6 w7 w8 w9) already contains the lexical features and syntactic features, after fusing it with the entity features, the semantic features will include the lexical features, the syntactic features, and the entity features.
  • the execution device can perform feature fusion through the following steps one to two.
  • Step 1 The execution device performs a weighted summation on the entity features of the text, the lexical features of the text, and the syntax features of the text to obtain the fusion feature.
  • Since lexical features, syntactic features, and entity features are features in different vector spaces, in other words heterogeneous information, the three features can be fused together by performing a weighted summation over the entity feature, the lexical feature, and the syntactic feature, thereby achieving heterogeneous information fusion.
  • Step 2 The execution device performs nonlinear transformation on the fusion feature through the activation function to obtain the semantic feature.
  • the activation function can adopt the GELU function.
  • For example, the execution device can use the following formula (22) and formula (23) to perform the operations; formula (22) and formula (23) can be provided as the heterogeneous information fusion strategy.
  • w_i represents the output of the semantic understanding model 200; the form of w_i can be a text sequence. The V obtained by LayerNorm in the above formula (12) can be in the form of a matrix, and w_i in formula (22) is a row of that V.
  • e_i represents the output result of the entity building module; the form of e_i can be a knowledge sequence, that is, a structured entity vector, and e_i can be a row of the matrix E_2 obtained by formula (21).
  • Φ(x) represents the probability distribution function of the normal distribution N(0, 1), as used in the GELU activation.
  • the execution device decodes the semantic feature to obtain the semantic information of the text.
  • S407 is an optional step, and this embodiment does not limit whether to perform S407.
  • the semantic information includes at least one of semantic intent and semantic slot.
  • the execution device can calculate the probability distribution of semantic intent to obtain the current semantic intent and semantic slot.
  • y1 represents the semantic intent, and y2 … yn+1 represent the semantic slot information of the text signal.
  • the execution device uses the following formula (24) and formula (25) for calculation.
  • In formula (24), y1 represents the semantic intent, W_h1 represents a weight matrix, b1 represents a bias parameter, and F represents the function used for decoding.
  • In formula (25), y_i represents the semantic slot, W_h2 represents a weight matrix, and b2 represents a bias parameter.
  • the execution device executes a corresponding operation according to the semantic information.
  • the execution device is an in-vehicle terminal, and the in-vehicle terminal controls the in-vehicle execution system to operate according to semantic information, so as to perform in-vehicle voice interaction.
  • the execution device can wait. If a new voice signal arrives, the execution device re-executes the above process to understand the semantics of the new voice signal.
  • the method provided in this embodiment constructs a structured entity vector to represent the identity of the entity and the attribute of the entity in the form of a vector, extracts the entity feature from the structured entity vector, and fuses the entity feature with the lexical feature and the syntactic feature, Semantic features including entity features, lexical features, and syntactic features are obtained. Semantic information is obtained after the semantic features are decoded. Since the structured entity vector contains the identity of the entity and the attributes of the entity, the attributes of the entity can be used to enhance the ability of semantic understanding.
  • the second embodiment is illustrated by the third embodiment below.
  • the execution device is a vehicle-mounted terminal
  • the text to be recognized is obtained by recognizing the voice collected by the vehicle-mounted terminal.
  • The third embodiment describes how the vehicle-mounted terminal uses the second embodiment to perform voice interaction with the user. It should be understood that the steps of the third embodiment that are the same as those of the second embodiment are described in the second embodiment and are not repeated in the third embodiment.
  • FIG. 7 is a schematic flowchart of in-vehicle voice interaction based on a semantic understanding model and structured entity vectors, provided by the third embodiment of this application.
  • the third embodiment may be specifically executed by a vehicle-mounted terminal.
  • the third embodiment includes S701 to S704.
  • the audio device of the vehicle-mounted terminal collects the voice input by the user, the voice is a control command signal, and the audio device is, for example, a distributed microphone array.
  • the voice recognition module of the vehicle terminal converts the voice signal into a text signal, and the text signal is input into the semantic understanding module of the vehicle terminal.
  • the steps corresponding to the semantic understanding module include S7031 to S7039.
  • Based on the multi-head attention mechanism, the vehicle-mounted terminal performs attention operations on the text signal through multiple attention modules to obtain the output result of each attention module, and obtains the first output result after splicing and linear transformation.
  • the vehicle-mounted terminal performs a vector normalization operation on the first output result, so that the first output result is normalized to the second output result.
  • the vehicle-mounted terminal performs a forward transfer operation on the second output result, so that the second output result is converted into a third output result after being subjected to linear transformation and non-linear transformation.
  • the vehicle-mounted terminal performs a vector normalization operation on the third output result, so that the third output result is normalized into a syntactic feature and a lexical feature.
  • the knowledge entity extraction module of the vehicle terminal extracts entities from the text input to obtain effective entities.
  • the knowledge entity building module of the vehicle-mounted terminal performs vectorized representation of the entity to obtain a characterization of the entity's attributes.
  • the vehicle-mounted terminal uses multiple attention modules to perform attention operations on the characterization of the entity's attributes, to obtain the output result of each attention module, and to obtain the entity characteristics through splicing and linear transformation.
  • the heterogeneous information fusion module of the vehicle terminal realizes effective information fusion of the syntactic, lexical, and entity features of the text input in different vector spaces.
  • the vehicle-mounted terminal calculates the semantic intent probability distribution through the semantic decoder to obtain the user's current semantic intent and semantic slot.
  • the vehicle-mounted function module receives the control command signal, and performs operations according to the control command signal.
  • In summary, this embodiment provides a vehicle-mounted voice interaction method based on a semantic understanding model and structured entity vectors for the vehicle field. Because a semantic understanding model that has undergone pre-training and model fine-tuning is used, entity features are extracted on the basis of structured entity vectors, and the entity features, lexical features, and syntactic features are fused, the method can address the problems of insufficient semantic intent understanding and incomplete recognition of basic structured knowledge entities in in-vehicle voice interaction scenarios, thereby further enhancing the semantic intent understanding capability and the semantic slot extraction capability in the vehicle domain.
  • The first embodiment is the training stage of the semantic understanding model (the stage performed by the training device 12 shown in FIG. 1); the specific training adopts the first embodiment or any possible implementation based on the first embodiment.
  • The second embodiment can be understood as the application stage of the semantic understanding model (the stage executed by the execution device 11 shown in FIG. 1), which can be specifically embodied as using the semantic understanding model obtained through the training of the first embodiment to obtain the output semantic information according to the voice or text input by the user.
  • The third embodiment is an embodiment included in the second embodiment.
  • the semantic analysis method of the embodiment of the present application is introduced above, and the semantic analysis device of the embodiment of the present application is introduced below. It should be understood that the semantic analysis device has any function of the execution device in the foregoing method.
  • FIG. 9 is a schematic structural diagram of a semantic analysis apparatus provided by an embodiment of the present application.
  • The semantic analysis apparatus 900 includes: an acquisition module 901 for executing S403 to S404; an extraction module 902 for executing S405; and a fusion module 903 for executing S406.
  • the fusion module 903 includes: a weighted sum sub-module for performing step one in S406; and a transform sub-module for performing step two in S406.
  • In some embodiments, the extraction module 902 includes: an attention sub-module for performing step 2.1 in S402; a normalization sub-module for performing step 2.2 in S402; and a transformation sub-module for performing step 2.3 in S402.
  • The normalization sub-module is also used to perform step 2.4 in S402.
  • the attention sub-module is used to execute step 2.1.1 to step 2.1.4 in S402.
  • the extraction module 902 includes: an input sub-module for performing step (1) in S405; an attention sub-module for performing step (2) in S405; and a splicing sub-module for performing S405 Step (3) in S405; a transformation sub-module for executing step (4) in S405.
  • The semantic analysis apparatus 900 provided in the embodiment of FIG. 9 corresponds to the execution device in the foregoing method embodiments; the modules in the semantic analysis apparatus 900 and the foregoing other operations and/or functions respectively implement the various steps and methods implemented by the execution device in the method embodiments. For specific details, refer to the foregoing method embodiments; for brevity, they are not repeated here.
  • It should be noted that, when the semantic analysis device provided in the embodiment of FIG. 9 analyzes semantics, the division into the above functional modules is only used as an example. In actual application, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the semantic analysis device is divided into different functional modules to complete all or part of the functions described above.
  • the semantic analysis device provided in the foregoing embodiment belongs to the same concept as the foregoing embodiment 2, and its specific implementation process is detailed in the method embodiment, which will not be repeated here.
  • FIG. 10 is a schematic structural diagram of a training device for a semantic understanding model provided by an embodiment of the present application.
  • the training device 1000 for the semantic understanding model includes: an acquisition module 1001 for performing S301; a training module 1002, It is used to perform S302; the acquisition module 1001 is also used to perform S303, and the training module 1002 is also used to perform S304.
  • The training device 1000 of the semantic understanding model provided in the embodiment of FIG. 10 corresponds to the training device in the foregoing method embodiments; the modules in the training device 1000 and the foregoing other operations and/or functions respectively implement the various steps and methods implemented by the training device in the method embodiments. For specific details, refer to the foregoing method embodiments; for brevity, they are not repeated here.
  • It should be noted that, when the training device for the semantic understanding model provided in the embodiment of FIG. 10 trains the semantic understanding model, the division into the above functional modules is only used as an example. In actual application, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the training device is divided into different functional modules to complete all or part of the functions described above.
  • the training device for the semantic understanding model provided in the foregoing embodiment belongs to the same concept as the foregoing embodiment 1, and the specific implementation process is detailed in the method embodiment, and will not be repeated here.
  • FIG. 11 is a schematic diagram of the hardware structure of a semantic analysis device provided by an embodiment of the present application.
  • the semantic analysis apparatus 1100 shown in FIG. 11 includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104.
  • the memory 1101, the processor 1102, and the communication interface 1103 implement communication connections between each other through the bus 1104.
  • the memory 1101 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 1101 may store a program. When the program stored in the memory 1101 is executed by the processor 1102, the processor 1102 and the communication interface 1103 are used to execute each step of the semantic analysis method of the embodiment of the present application.
  • The processor 1102 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), or one or more integrated circuits, for executing related programs to realize the functions required by the units in the semantic analysis device of the embodiments of this application, or to execute the semantic analysis method of the method embodiments of this application.
  • the processor 1102 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the semantic analysis method of the present application can be completed by an integrated logic circuit of hardware in the processor 1102 or instructions in the form of software.
  • The aforementioned processor 1102 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1101; the processor 1102 reads the information in the memory 1101 and, in combination with its hardware, completes the functions required by the units included in the semantic analysis device of the embodiments of this application, or executes the semantic analysis method of the method embodiments of this application.
  • The communication interface 1103 uses a transceiving device, such as but not limited to a transceiver, to implement communication between the apparatus 1100 and other devices or a communication network.
  • the text (such as the text to be analyzed in the second embodiment of the present application) can be obtained through the communication interface 1103.
  • the bus 1104 may include a path for transferring information between various components of the device 1100 (for example, the memory 1101, the processor 1102, and the communication interface 1103).
  • The extraction module 902, the fusion module 903, and the decoding module in the semantic analysis apparatus 900 may be equivalent to the processor 1102.
  • FIG. 12 is a schematic diagram of the hardware structure of a training device for a semantic understanding model provided by an embodiment of the present application.
  • the training apparatus 1200 of the semantic understanding model shown in FIG. 12 includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204.
  • the memory 1201, the processor 1202, and the communication interface 1203 implement communication connections between each other through the bus 1204.
  • the memory 1201 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 1201 may store a program. When the program stored in the memory 1201 is executed by the processor 1202, the processor 1202 and the communication interface 1203 are used to execute each step of the semantic understanding model training method of the embodiment of the present application.
  • The processor 1202 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), or one or more integrated circuits, for executing related programs to realize the functions required by the units in the training device of the semantic understanding model of the embodiments of this application, or to execute the training method of the semantic understanding model of the method embodiments of this application.
  • the processor 1202 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the training method of the semantic understanding model of the present application can be completed by the integrated logic circuit of the hardware in the processor 1202 or the instructions in the form of software.
  • The aforementioned processor 1202 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1201; the processor 1202 reads the information in the memory 1201 and, in combination with its hardware, completes the functions required by the units included in the training device for the semantic understanding model of the embodiments of this application, or executes the training method of the semantic understanding model of the method embodiments of this application.
  • The communication interface 1203 uses a transceiving device, such as but not limited to a transceiver, to implement communication between the apparatus 1200 and other devices or a communication network.
  • the training data (such as the masked text in the first embodiment of the present application or the text marked with semantic information such as semantic intent and semantic slot) can be obtained through the communication interface 1203.
  • the bus 1204 may include a path for transferring information between various components of the device 1200 (for example, the memory 1201, the processor 1202, and the communication interface 1203).
  • the acquisition module 1001 in the training device 1000 of the semantic understanding model is equivalent to the communication interface 1203 in the training device 1200 of the semantic understanding model, and the training module 1002 can be equivalent to the processor 1202.
  • It should be noted that although the apparatuses 1200 and 1100 shown in FIG. 12 and FIG. 11 only show a memory, a processor, and a communication interface, in a specific implementation process, those skilled in the art should understand that the apparatuses 1200 and 1100 also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the apparatuses 1200 and 1100 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the apparatuses 1200 and 1100 may also include only the devices necessary for implementing the embodiments of this application, and need not include all the devices shown in FIG. 12 or FIG. 11.
  • the device 1200 is equivalent to the training device 12 in FIG. 1, and the device 1100 is equivalent to the execution device 11 in FIG. 1.
  • A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. Skilled professionals may use different methods for each specific application to implement the described functions, but such implementations should not be considered to go beyond the scope of this application.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of units is only a logical functional division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the unit described as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may also be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of this application essentially, or the part contributing to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods in the embodiments of this application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.


Abstract

A semantic analysis method, apparatus, device, and storage medium, relating to the field of artificial intelligence and specifically to the field of natural language understanding. The method includes: extracting a structured entity vector from the text to be analyzed, the structured entity vector being used to indicate the identity of the entity and the attribute of the entity; performing feature extraction on the structured entity vector to obtain an entity feature; fusing the entity feature, the lexical feature of the text, and the syntactic feature of the text to obtain the semantic feature of the text; and decoding the semantic feature to obtain the semantic information of the text. The method can use the attributes of entities to enhance the capability of semantic understanding.

Description

Semantic analysis method, apparatus, device, and storage medium
Technical Field
This application relates to the technical field of natural language understanding, and in particular to a semantic analysis method, apparatus, device, and storage medium.
Background
Natural language understanding (natural language understanding, NLU) is a technology in which a computer analyzes the semantics of text in natural-language form. It aims to make the computer understand the meaning of natural language, so that users can conveniently communicate with the computer using natural language. NLU technology has been widely applied in many scenarios. For example, in the vehicle field, after a driver speaks an utterance in natural language, the vehicle-mounted terminal can convert the speech into text, perform semantic analysis on the text to obtain the semantic information of the text, and execute the corresponding instruction according to the semantic information, thereby realizing the function of voice interaction.
At present, the text to be analyzed can be segmented to obtain each word contained in the text; each word is input into a word2vector model (a model that converts words into vectors), each word is represented as a vector through the word2vector model, and the semantic information of the text is analyzed according to the vector corresponding to each word.
Text often contains certain specific entities, such as songs and places, and these entities have a great influence on the semantics of the text. With the above method, the ability to recognize entities in the text is poor, resulting in insufficient semantic understanding capability of the computer.
Summary
This application provides a semantic analysis method, apparatus, device, and storage medium, which can improve the semantic understanding capability of a computer.
According to a first aspect, a semantic analysis method is provided. In the method, an entity in the text to be analyzed is acquired; according to the entity in the text to be analyzed, a structured entity vector corresponding to the entity is acquired, the structured entity vector being used to indicate the identity of the entity and the attribute of the entity; feature extraction is performed on the structured entity vector to obtain an entity feature; and the entity feature, the lexical feature of the text, and the syntactic feature of the text are fused to obtain the semantic feature of the text, the semantic feature being used to obtain the semantic information of the text.
In the above method, a structured entity vector is constructed to represent the identity and the attribute of the entity in vector form; the entity feature is extracted from the structured entity vector and fused with the lexical feature and the syntactic feature to obtain a semantic feature containing the entity feature, the lexical feature, and the syntactic feature, and the semantic information is obtained after the semantic feature is decoded. Since the structured entity vector contains the identity of the entity and the attribute of the entity, the attribute of the entity can be used to enhance the semantic understanding capability.
Optionally, the manner of extracting the structured entity vector may include: according to the entity in the text to be analyzed, acquiring the structured entity vector from an entity construction table, the entity construction table being used to store the mapping relationship between entities and structured entity vectors. In this way, since the entity vector can represent the entity and the attribute of the entity, the vectorized representation of the entity is effective and effective embedding of the entity is realized; therefore, when the subsequent pre-trained model performs further recognition based on the structured entity vector, the in-vehicle semantic intent understanding capability and semantic slot extraction capability of the pre-trained model can be enhanced.
Optionally, the entity construction table includes entities associated with the vehicle field, and the text is obtained by recognizing the speech collected by the vehicle-mounted terminal. In this way, it helps to construct structured knowledge entities in the vehicle field.
Optionally, the entity construction table includes at least one of: entities with irregular names, entities whose names have a number of characters exceeding a threshold, and entities whose names have a word frequency lower than a threshold. Because the names of these entities are prone to ambiguity or have multiple meanings, it is difficult for the machine to understand the correct semantics; by pre-storing the vector representations of these entities in the entity construction table, the machine can look up the table to obtain accurate vector representations. Incorporating entity features into the semantic understanding process helps improve the accuracy of semantic understanding.
Optionally, the manner of fusing the entity feature, the lexical feature, and the syntactic feature includes: performing a weighted summation on the entity feature, the lexical feature, and the syntactic feature to obtain a fusion feature; and performing a nonlinear transformation on the fusion feature through an activation function to obtain the semantic feature. Since the lexical feature, the syntactic feature, and the entity feature are features in different vector spaces, in other words heterogeneous information, the three features can be fused together by performing a weighted summation on them, thereby realizing heterogeneous information fusion.
Optionally, the lexical feature and the syntactic feature of the text are extracted in the following manner: the text is input into a semantic understanding model, the semantic understanding model being obtained by performing transfer training on a pre-trained model according to first samples, the first samples including text annotated with semantic information, and the pre-trained model being obtained by training according to second samples, the second samples including masked text; the lexical feature and the syntactic feature are extracted from the text through the semantic understanding model. By training the pre-trained model with a masking strategy, the pre-trained model acquires basic natural language processing capability. On the basis of the pre-trained model, in combination with the goal of semantic understanding, the pre-trained model is fine-tuned using text annotated with semantic information, so that during fine-tuning the model learns the association between text and semantic information and acquires the capability to extract lexical, syntactic, and semantic features. Then, in the model application stage, this semantic understanding model can be used to extract accurate lexical, syntactic, and semantic features.
Optionally, the manner in which the semantic understanding model extracts the lexical feature and the syntactic feature may include: performing an attention operation on the text to obtain a first output result, the first output result being used to indicate the dependency relationships between words in the text; normalizing the first output result to obtain a second output result; performing linear transformation and nonlinear transformation on the second output result to obtain a third output result; and normalizing the third output result to obtain the lexical feature and the syntactic feature.
Optionally, the semantic understanding model includes a first multi-head attention model. Correspondingly, the manner of the attention operation includes: inputting the text into the first multi-head attention model; performing an attention operation on the text through each attention module in the first multi-head attention model to obtain the output result of each attention module; splicing the output results of the attention modules to obtain a splicing result; and performing a linear transformation on the splicing result to obtain the first output result. In this way, the multi-head attention mechanism can be used to capture long-distance features in the text, extract rich contextual and semantic representation information, and enhance the capability to extract lexical and syntactic features.
Optionally, the manner of extracting the entity feature includes: inputting the structured entity vector into a second multi-head attention model; performing an attention operation on the structured entity vector through each attention module in the second multi-head attention model to obtain the output result of each attention module; splicing the output results of the attention modules to obtain a splicing result; and performing a linear transformation on the splicing result to obtain the entity feature. In this way, the multi-head attention mechanism can capture the correlations between words within the structured entity vector and helps to capture long-distance features, so that the extracted entity feature can accurately express the semantics; the entity feature is therefore more precise.
According to a second aspect, a semantic analysis apparatus is provided. The semantic analysis apparatus has the function of implementing semantic analysis in the first aspect or any optional manner of the first aspect. The semantic analysis apparatus includes at least one module, the at least one module being configured to implement the semantic analysis method provided in the first aspect or any optional manner of the first aspect. For specific details of the semantic analysis apparatus provided in the second aspect, refer to the first aspect or any optional manner of the first aspect; they are not repeated here.
According to a third aspect, an execution device is provided. The execution device includes a processor, the processor being configured to execute instructions so that the execution device performs the semantic analysis method provided in the first aspect or any optional manner of the first aspect. For specific details, refer to the first aspect or any optional manner of the first aspect; they are not repeated here.
According to a fourth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, which is read by a processor to cause an execution device to perform the semantic analysis method provided in the first aspect or any optional manner of the first aspect.
According to a fifth aspect, a computer program product is provided. When the computer program product runs on an execution device, it causes the execution device to perform the semantic analysis method provided in the first aspect or any optional manner of the first aspect.
According to a sixth aspect, a chip is provided. The chip includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and performs the semantic analysis method provided in the first aspect or any optional manner of the first aspect.
Optionally, as an implementation, the chip may further include a memory storing instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the semantic analysis method provided in the first aspect or any optional manner of the first aspect.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of this application;
FIG. 2 is a schematic diagram of extracting lexical features and syntactic features according to a semantic understanding model, provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a training method for a semantic understanding model provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of a semantic analysis method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of extracting a structured entity vector provided by an embodiment of this application;
FIG. 6 is a schematic diagram of fusing entity features, lexical features, and syntactic features provided by an embodiment of this application;
FIG. 7 is a schematic flowchart of a method for in-vehicle voice interaction based on a semantic understanding model and structured entity vectors, provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of semantic intent understanding and semantic slot extraction provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of a semantic analysis apparatus provided by an embodiment of this application;
FIG. 10 is a schematic structural diagram of a training apparatus for a semantic understanding model provided by an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware structure of a semantic analysis apparatus provided by an embodiment of this application;
FIG. 12 is a schematic diagram of the hardware structure of a training apparatus for a semantic understanding model provided by an embodiment of this application.
Detailed Description

The technical solutions of this application are described below with reference to the accompanying drawings.

The semantic analysis method provided by the embodiments of this application can be applied to human-computer interaction and to any other scenario in which a computer must understand natural language. In particular, it can be applied to voice interaction, for example in-vehicle voice interaction. Voice interaction and in-vehicle voice interaction are briefly introduced below.

Voice interaction is the transfer of information between humans and devices through natural speech. The in-vehicle voice interaction scenario is one in which a user interacts by voice with the vehicle-mounted terminal of a car. For example, while driving, the user can utter speech containing an instruction; the vehicle-mounted terminal converts the speech into an instruction the machine can understand and executes it, enabling intelligent, everyday functions such as voice calls, switching the vehicle air conditioning on and off, automatic seat height/temperature adjustment, and music playback. With this form of human-computer interaction, the user's hands and eyes are freed for other things: to hear music, for example, the user requests a song by voice while hands and eyes stay devoted to driving, which greatly improves safety and convenience in the in-vehicle scenario.

In voice interaction applications, natural language understanding (NLU) is the key technology behind an in-vehicle voice interaction system. NLU is part of natural language processing (NLP) — its core, and also its hardest part. Put plainly, NLU aims to give machines a human-like ability to understand natural language: given an input text, the machine should output the correct semantic information (such as the correct semantic intent and semantic slots). Natural language is the way people ordinarily express themselves in daily life; to describe a hunched back, for instance, a natural-language expression would be "my back is a bit hunched", whereas an unnatural one would be "my back assumes a curved shape".

At present, however, natural language understanding still falls short in some respects, especially in in-vehicle voice interaction: vehicle-mounted terminals often understand semantic intent poorly, and they cannot handle certain structured knowledge entities or abstract semantic representations. For example, basic entities such as grammatically irregular song titles, long-character place names, and low-frequency-character place names are difficult for the terminal to recognize, and weak entity recognition severely harms the accuracy of semantic understanding. Suppose a user wants to go to a holiday plaza in Beijing named "世界之花" and says "搜索世界之花" ("search for 世界之花"). The intent expressed is navigation, with 世界之花 as the destination. Yet when recognizing the four characters "世界之花", the terminal easily takes them for a song title and misreads the sentence as an intent to listen to music with 世界之花 as the song name. The terminal, which should have run the navigation service, runs music playback instead because it misunderstood the user's intent, so the executed service fails to meet the user's expectation.

Improving semantic understanding is therefore crucial in in-vehicle voice interaction, and it is a hot research direction for the vehicle-mounted field.

Some embodiments of this application provide a semantic understanding method that combines a pre-trained model with structured entity vectors. First, a pre-trained model is obtained through random multi-gram dynamic mask training on a large-scale corpus and then fine-tuned into a semantic understanding model capable of extracting lexical, syntactic, and semantic features; through pre-training and fine-tuning, the model's intent understanding and slot extraction improve — in particular the extraction of lexical, syntactic, and semantic features in the vehicle-mounted domain — giving it strong semantic-intent understanding. Second, constructing structured entity vectors accomplishes the representation of entities, and the attributes of entities strengthen the model's intent understanding; in particular, setting up an entity construction table for the vehicle-mounted domain helps the terminal recognize basic structured entity vectors and improves intent understanding and slot extraction. Third, fusing entity, lexical, and syntactic features achieves heterogeneous information fusion, combining semantic information from three different vector spaces to recognize semantics and thereby improving accuracy.

The method provided by this application is described below from the model training side and the model application side:

The training method for a semantic understanding model provided by the embodiments of this application concerns the understanding of natural language and can specifically be applied to data-processing methods such as data training, machine learning, and deep learning, performing symbolic and formalized intelligent information modeling, extraction, preprocessing, pre-training, fine-tuning, and so on, on training data (such as the masked texts, or texts annotated with semantic information like intents and slots, in this application) to finally obtain a trained semantic understanding model. Moreover, the semantic analysis method provided by the embodiments can use that trained model: input data (such as the text to be analyzed in the embodiments) is fed into the trained semantic understanding model to obtain output data (such as semantic information like intents and slots). Note that the training method and the semantic analysis method provided by the embodiments are inventions arising from the same conception and can also be viewed as two parts of one system, or two stages of one overall flow: the model training stage and the model application stage.

Because the semantic understanding model of this application applies the attention mechanism to natural language understanding, related concepts of the attention mechanism involved in the embodiments are introduced first for ease of understanding.
(1) Self-attention mechanism.

Self-attention is a refinement of the attention mechanism: it reduces dependence on external information and is better at capturing the internal correlations of data or features. Its essence is to compute attention over a sequence with respect to itself; in self-attention, the target sequence and the source sequence are the same. Applying self-attention in NLP extracts the dependencies between a sentence's own words, such as common phrases or the things pronouns refer to. Given an input sentence, when encoding each word the machine attends not only to the word being encoded but also to the other words of the sentence; by computing attention between each word and all the words of the sentence, it learns the intra-sentence word dependencies and thereby captures the sentence's internal structure. The attention computation can be wrapped in an attention function, written Attention(X, X, X): once the machine receives the input text sequence, it takes the sequence as X and calls the attention function to perform self-attention. Self-attention has advantages in many respects. From the perspective of long-distance dependency learning, since every word computes attention with every word, the maximum path length between any two words is only 1 no matter how far apart they are; dependencies can thus be computed regardless of distance, and the internal structure of a sentence can be learned.

The following first describes how self-attention is computed with vectors, and then how it is computed with matrices.

Computing self-attention with vectors may include the following steps S10 to S14.

Step S10: For each word of the input sequence, generate three vectors: a query vector, a key vector, and a value vector. Normally these three vectors are created by multiplying the word's embedding by three weight matrices. For example, if the input sentence is "thinking machine", its first word is "Thinking"; with word embedding X1, multiplying X1 by the weight matrix WQ gives q1, the query vector associated with this word.

Step S11: Compute scores. Suppose the self-attention vector is being computed for the first word, "Thinking", of this example; every word of the input sentence scores "Thinking", yielding each word's score (Score). The score of "Thinking" expresses how much weight is placed on the other parts of the sentence while encoding "Thinking", and is obtained as the dot product of the query vector of "Thinking" with the key vector of each scoring word (every word of the input sentence). For example, in a two-word sentence where the first word has embedding x1, query vector q1, key vector k1, and value vector v1, and the second word has embedding x2, query vector q2, key vector k2, and value vector v2, processing the self-attention of the first word gives a first score of q1·k1 and a second score of q1·k2.

Step S12: Process the scores, for example dividing each score by a default value and passing the quotients through the softmax function to obtain softmax scores. Dividing by the default value shrinks the scores into a smaller range, preventing the softmax scores from collapsing to either zero or one; the softmax normalizes the scores of all words so that each word's softmax score is positive and the softmax scores of all words in the sentence sum to 1. The softmax score determines each word's contribution to the encoding of the current word (for example, the contribution of "Thinking" and "machine" to "Thinking").

Step S13: Multiply each value vector by its softmax score.

Step S14: Sum the weighted value vectors to obtain the output of the self-attention layer at that position (for example, the output for the first word, "Thinking").
Executing steps S10 to S14 above completes the self-attention computation, and the resulting vector can be passed to a feed-forward neural network. In some cases, steps S10 to S14 can be carried out in matrix form for faster computation; for example, self-attention can be computed with matrices through steps S20 and S21 below.

Step S20: Compute the query, key, and value matrices. Specifically, pack the word vector of every word of the input sentence into a matrix X, and multiply X by the query weight matrix W_Q, the key weight matrix W_K, and the value weight matrix W_V respectively to obtain the query matrix Q, the key matrix K, and the value matrix V. Q can be computed with formula (1), K with formula (2), and V with formula (3) below.

Q = W_Q · X   (1)

K = W_K · X   (2)

V = W_V · X   (3)

Here each row of X corresponds to one word of the input sentence and holds that word's word vector; Q is the queries matrix of the input sentence, each row being one word's Query vector; K is the key matrix, each row being one word's Key vector; and V is the value matrix, each row being one word's Value vector.

Step S21 can be expressed by formula (4) below, which merges steps S11 to S14.
Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V   (4)
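To make the matrix form concrete, the following is a minimal NumPy sketch of the scaled dot-product self-attention of formulas (1) to (4). The randomly initialized weight matrices and the sentence length are illustrative assumptions rather than values from this application; in a trained model, the weights come from training.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Formulas (1)-(3): project every word's vector, packed as matrices
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Formula (4): score every word against every word, then weight the values
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

rng = np.random.default_rng(0)
n, d_model, d_k = 6, 32, 8                 # a 6-word sentence
X = rng.normal(size=(n, d_model))          # each row: one word's word vector
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Z = self_attention(X, W_q, W_k, W_v)
print(Z.shape)                             # (6, 8): one output row per word
```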
(2) Multi-head attention model.

The multi-head attention model is called "multi-head" because it contains h attention modules, each of which implements the self-attention mechanism of (1) above, so the h modules perform h attention computations; h is a positive integer greater than 1, for example 8. Each attention module keeps its own query weight matrix, key weight matrix, and value weight matrix, so multiplying the input matrix X by each module's W_Q, W_K, and W_V produces h query matrices Q, key matrices K, and value matrices V, which in turn produce h matrices Z, namely Z_0, Z_1, ..., Z_h. Usually, however, the network after the multi-head attention model (such as a feed-forward network) does not take h matrices as input; it takes a single matrix composed of one representation vector per word. The h matrices Z can therefore be compressed into one matrix. One way to do this is to concatenate the h matrices (Z_0, Z_1, ..., Z_h) and multiply the concatenation by an additional weight matrix W_O; the product is a matrix Z that fuses the information of all attention modules, and this matrix Z can be used in subsequent computation, for example fed to the feed-forward network. Optionally, the number of dimensions of the concatenated output equals the sum of the dimension counts of the concatenated inputs, and its row count equals theirs; for example, concatenating the h matrices yields a large matrix whose dimension count is the sum of the h matrices' dimension counts and whose row count equals that of each of the h matrices.
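Continuing the sketch above (reusing `self_attention`, `rng`, `X`, `d_model`, and `d_k`), here is a minimal illustration of the multi-head computation just described: h independent attention modules, concatenation of the h output matrices, then compression with the weight matrix W_O.

```python
def multi_head_attention(X, heads, W_o):
    # heads: one (W_q, W_k, W_v) triple per attention module
    Z = [self_attention(X, W_q, W_k, W_v) for (W_q, W_k, W_v) in heads]
    # Concatenate Z_0..Z_{h-1} side by side, then fuse with W_O
    return np.concatenate(Z, axis=-1) @ W_o

h = 8
heads = [tuple(rng.normal(size=(d_model, d_k)) for _ in range(3)) for _ in range(h)]
W_o = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, heads, W_o)
print(out.shape)   # (6, 32): a single matrix, one fused row per word
```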
The multi-head attention model is effective in many respects.

In terms of semantic feature extraction, because the model uses multiple attention modules whose various weight matrices are each randomly initialized, after training each weight matrix is used to project the input word embedding (or the vector from a lower encoder/decoder) into a different representation subspace, allowing the model to learn relevant information in different representation subspaces. The model's ability to extract semantic features is therefore strong.

In terms of capturing long-distance features, first, being built on the self-attention mechanism, the model inherits its benefit of learning a sentence's internal structure; on that basis, the use of multiple attention modules extends the model's ability to focus on different positions, further strengthening long-distance feature capture.

In terms of overall task feature extraction, the model performs well across lexical, syntactic, semantic, and contextual processing as well as long-distance feature capture, so its comprehensive feature-extraction capability is strong.

In terms of parallel computation, the model does not depend on the computation of the previous time step and can therefore run in parallel.
The self-attention mechanism involved in the semantic understanding model of the embodiments has been introduced above; the model also involves some concepts from the AI field, which are introduced below for ease of understanding.

(3) Activation functions: functions used to perform nonlinear transformations.

(4) Gaussian error linear units (GELU): a high-performance activation function whose nonlinear transformation is a stochastic regularization that matches expectation; it performs well in NLP — best of all in self-attention models — and helps avoid the vanishing-gradient problem.

(5) Loss function

When training a model, because the model's output should be as close as possible to the value one truly wants to predict, the network's current predicted value can be compared with the truly desired target value, and the weight vectors of each layer of the neural network updated according to the difference between the two (of course, there is usually an initialization before the first update, i.e., parameters are pre-configured for each layer of the model). For example, if the network's prediction is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the model can predict the truly desired target value or a value very close to it. Hence "how to compare the difference between the predicted value and the target value" must be defined in advance; this is the loss function (or objective function), an important equation for measuring that difference. Taking the loss function as an example, a higher output value (loss) means a larger difference, and model training becomes the process of reducing this loss as much as possible.
(6) Backpropagation algorithm

A model can use the back propagation (BP) algorithm to correct the magnitudes of the parameters of the initial model during training, so that the model's reconstruction error loss becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial model are updated by propagating the error-loss information backwards, so that the error loss converges. Backpropagation is a loss-driven backward movement aimed at obtaining the optimal parameters of the model, such as the weight matrices.
The self-attention mechanism involved in the semantic understanding model of the embodiments has been introduced above; the model also involves some concepts from the field of knowledge-graph technology, introduced below for ease of understanding.

(7) Entity: something distinguishable that exists independently. An entity can be a concrete object, such as a particular person, a particular city, a kind of plant, or a kind of product; it can also be an abstract event, such as the borrowing of a book or a ball game. Everything in the world consists of concrete things, and all such things may be called entities.

(8) Entity extraction: extracting the entities in a text, for example pulling out the person names, organization/institution names, geographic locations, events/dates, character values, and monetary amounts it contains. Entity extraction includes detecting (find) and classifying (classify) entities. Put plainly, entity extraction means finding the entities in a sentence and labeling them.

(9) Attribute: an entity has many characteristics, each of which is called an attribute. Each attribute has a value domain whose type may be integer, real, or string. For example, a student (an entity) has attributes such as student ID, name, age, and gender, with corresponding value domains of character, string, integer, and string types.
The system architecture provided by the embodiments of this application is introduced below.

Referring to Fig. 1, an embodiment of this application provides a system architecture 100. As shown in the system architecture 100, the data collection device 16 collects training data, which in this embodiment includes texts annotated with semantic information, for example with semantic intents and semantic slots. Optionally, the training data also includes masked texts, for example samples processed with the random multi-gram masking strategy. The data collection device 16 stores the training data in the database 13, and the training device 12 trains the semantic understanding model 200 on the training data maintained in the database 13. Embodiment 1 below describes in more detail how the training device 12 obtains the semantic understanding model 200 from the training data. The model 200 can be used to implement the extraction of lexical and syntactic features in the embodiments of this application; that is, after relevant preprocessing, the text to be analyzed is input into the model 200 to obtain the lexical and syntactic features.

The semantic understanding model 200 in the embodiments may specifically be an attention-based model; in some embodiments of this application, the model 200 is obtained by fine-tuning a pre-trained model (such as a multi-head attention model together with some weight matrices). Note that in practice the training data maintained in the database 13 need not all come from the data collection device 16; it may also be received from other devices. Note also that the training device 12 need not train the model 200 entirely on the data maintained in the database 13; it may also obtain training data from the cloud or elsewhere. The above description should not be taken as limiting the embodiments of this application.

The semantic understanding model 200 trained by the training device 12 can be applied to different systems or devices; for example, it is applied to the execution device 11 shown in Fig. 1, which may be a terminal — such as a vehicle-mounted terminal, a mobile phone, a tablet, a laptop, or an AR/VR device — or a server, the cloud, and so on. In Fig. 1, the execution device 11 is configured with an I/O interface 112 for data interaction with external devices.

The system architecture of Fig. 1 can be applied to voice interaction scenarios. The voice interaction solution of the embodiments may take the product form of a voice personalized-adaptation algorithm module in a voice interaction software system, implemented as a computer program running on various terminal devices. Applied to in-vehicle voice interaction, for example, the voice interaction product of the embodiments can understand the semantic intent of the vehicle user's control instructions and invoke the function of the corresponding vehicle-mounted module.
The functions of the modules in the system architecture are illustrated below.

A user can input speech to the I/O interface 112 through the audio collection device 14. The audio collection device 14 may include a distributed microphone array for collecting the user's voice control commands; in addition, it may perform audio-signal preprocessing such as sound-source localization, echo cancellation, and signal enhancement.

The speech recognition module 113 performs speech recognition on the input data (the speech signal) received by the I/O interface 112 to obtain the text to be analyzed. The input data is thus converted from a speech signal into a text signal and output to the semantic understanding module 111.

The semantic understanding module 111 understands semantics, for example extracting the user's semantic intent and semantic slots. It may include the semantic understanding model 200, an entity extraction module 210, an entity construction module 220, a heterogeneous information fusion module 230, and a semantic decoding module 240. Their specific functions are as follows:

The semantic understanding model 200, obtained by transfer training from the pre-trained model, is responsible for extracting the lexical and syntactic semantic features of the text input and for the preliminary understanding of the semantic intent of the user command.

The entity extraction module 210 extracts the entities from the text input to obtain valid entities.

The entity construction module 220 represents the entities as vectors, obtaining representations of the entities and their attributes.

The heterogeneous information fusion module 230 fuses the lexical, syntactic, and entity features of the text input into a semantic feature; because this feature combines valid information from different vector spaces, it strengthens intent understanding and slot extraction.

The semantic decoding module 240 decodes the semantic feature into semantic information — for example, understanding the semantic intent of the user's command input and extracting its semantic slots — and outputs a control command.

While the execution device 11 preprocesses the input data, or while its semantic understanding module 111 performs computation and related processing, the execution device 11 may call data, code, and the like in the data storage system 15 for the corresponding processing, and may also store the resulting data and instructions in the data storage system 15. In addition, after determining the user's semantic intent and slots, the execution device 11 delivers the control command to the I/O interface 112.

Finally, the I/O interface 112 returns the control command to the vehicle-mounted execution system 18, which executes the corresponding control command — playing music, voice navigation, answering incoming calls, controlling the cabin temperature, and so on — supporting intelligent in-vehicle scenarios.

It should be noted that in-vehicle voice interaction is only an example. For different tasks, the training device 12 can also generate corresponding semantic understanding models 200 from different training data, and those models can then be used to achieve the above goals or complete the above tasks, providing the user with the desired results.

For instance, the above system architecture can also be applied to machine translation or question-answering robot scenarios, and the audio collection device 14 of Fig. 1 can be replaced with a mobile phone, a personal computer, or other user equipment. The user may provide input data manually through the interface offered by the I/O interface 112. Alternatively, the user equipment may automatically send input data to the I/O interface 112; if automatic sending requires the user's authorization, the user can set the corresponding permission on the user equipment. The user can view the results output by the execution device 11 on the user equipment, presented in specific forms such as display, sound, or action. The user equipment can also act as a data collection end, collecting the input data fed to the I/O interface 112 and the output results of the I/O interface 112 as new sample data and storing them in the database 13. Alternatively, collection may bypass the user equipment, with the I/O interface 112 directly storing the input data and output results as new sample data in the database 13.

It is worth noting that Fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of this application; the positional relationships among the devices, components, and modules shown impose no limitation. For example, in Fig. 1 the data storage system 15 is external memory relative to the execution device 11; in other cases, the data storage system 15 may also be placed inside the execution device 11.
As shown in Fig. 2, the semantic understanding model 200 trained by the training device 12, as provided by the embodiments of this application, may include: a first multi-head attention model 201, a first vector normalization layer 202, a forward-pass layer 203, and a second vector normalization layer 204.

The first multi-head attention model 201 receives the input text, performs attention computation on it, and sends the output to the first vector normalization layer 202. The first multi-head attention model 201 contains multiple attention modules, each also called an attention head. In Fig. 2, for example, the first multi-head attention model 201 contains attention module 0, attention module 1, attention module 2, attention module 3, attention module 4, attention module 5, attention module 6, and attention module 7. For the overall technical details of the model 201, see the description in (2) above; each attention module performs attention computation, whose technical details are described in (1) above.

The first vector normalization layer 202 receives the output of the first multi-head attention model 201, performs normalization computation on it, and sends the result to the forward-pass layer 203. Through normalization it normalizes the mean and variance of the samples, simplifying the overall learning.

The forward-pass layer 203 receives the output of the first vector normalization layer 202, performs forward-pass computation on it, and sends the result to the second vector normalization layer 204. Through forward-pass computation, it realizes linear and nonlinear transformations, mapping the input from layer 202 into a high-dimensional vector space.

The second vector normalization layer 204 receives the output of the forward-pass layer 203, performs normalization computation on it, and outputs the result. It likewise normalizes the mean and variance of the samples, simplifying the overall learning.
Embodiment 1:

Fig. 3 shows a training method for a semantic understanding model provided by Embodiment 1 of this application. Embodiment 1 may specifically be performed by the training device 12 of Fig. 1 and involves a pre-training process and a model fine-tuning process; the samples used in pre-training may differ from those used in fine-tuning. To distinguish them, this embodiment calls the samples used in the fine-tuning process the first samples and those used in the pre-training process the second samples. The first and second samples may be the training data maintained in the database 13 of Fig. 1. Optionally, S301 and S302 of Embodiment 1 may be executed in the training device 12, or may be pre-executed by another functional module before the training device 12; for example, a cloud device may first preprocess the second samples received or obtained from the database 13 — the pre-training process of S301 and S302 — to obtain the pre-trained model, which together with the first samples serves as input to the training device 12, which then executes S303 to S304.

Illustratively, Embodiment 1 includes the following S301 to S304.

S301: The training device obtains the second samples.

The second samples are texts processed with a masking (Mask) strategy; they include masked text. A second sample may be annotated with the word positions corresponding to the masks; that is, the label of a second sample is the position in the sentence of the words replaced by [mask].

In one possible implementation, a large-scale corpus can be obtained, processed with the masking strategy, and annotated to obtain the second samples. The masking strategy may include at least one of a random masking strategy and a multi-gram masking (N-gram Mask) strategy. Training a model with these masking strategies may be called random multi-gram dynamic mask training.

For example, for the original text "打开车内空调" ("turn on the air conditioning in the car"), processing with the masking strategy yields the second sample "打[mask]车内空[mask]".

For another example, for the original text "导航去浦东大道" ("navigate to 浦东大道"), the resulting second sample is "导航去[mask][mask]大道".

For another example, for the original text "我想听周杰伦的七里香" ("I want to hear 周杰伦's 七里香"), the resulting second sample is "我想听[mask][mask][mask]的七里香".

For another example, for the original text "我要给我家里打个电话" ("I want to call my home"), the resulting second sample is "我要给我家里打个[mask][mask]".

For another example, for the original text "给副驾驶座椅加热" ("heat the front passenger seat"), the resulting second sample is "给[mask]驾[mask]座椅[mask]热".
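As an illustration of how such samples might be produced, here is a minimal sketch of character-level random n-gram dynamic masking. The masking rate and the maximum n-gram length are illustrative assumptions, not values specified by this application; "dynamic" here simply means each call draws fresh masks.

```python
import random

def ngram_mask(text, mask_rate=0.15, max_n=3, token="[mask]"):
    """Mask random n-gram spans (n = 1..max_n) until roughly mask_rate of
    the characters are covered; return the masked text and the positions
    of the masked characters, which serve as the labels."""
    chars = list(text)
    budget = max(1, int(len(chars) * mask_rate))
    positions = set()
    while len(positions) < budget:
        n = random.randint(1, max_n)
        start = random.randrange(0, max(1, len(chars) - n + 1))
        positions.update(range(start, min(start + n, len(chars))))
    masked = [token if i in positions else c for i, c in enumerate(chars)]
    return "".join(masked), sorted(positions)

print(ngram_mask("我想听周杰伦的七里香"))
# masks vary from call to call, e.g. ('我想听[mask][mask]伦的七里香', [3, 4])
```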
S302: The training device performs model training on the second samples to obtain the pre-trained model.

Model training can be implemented with a loss function and the backpropagation algorithm; for specific details, see the descriptions in (5) and (6) above.

S303: The training device obtains the first samples.

The first samples include texts annotated with semantic information; for example, the first samples are annotated with semantic intents and semantic slots. Optionally, when applied to the vehicle-mounted domain, the first samples may be vehicle-mounted-domain texts, such as a corpus from in-vehicle voice interaction scenarios.

S304: The training device performs transfer training on the pre-trained model with the first samples to obtain the semantic understanding model.

In S304, the transfer training may be model fine-tuning. Fine-tuning differs conceptually from model training: in model training, the parameters of the model are usually randomly initialized before training and a new network is trained from scratch from those random parameters, whereas fine-tuning adjusts the parameters of a pre-trained model for a specific task and can reuse the parameters already trained in the pre-trained model. Compared with training from scratch, fine-tuning therefore saves a great deal of computing resources and computing time and improves efficiency and accuracy. Fine-tuning can likewise be implemented with a loss function and the backpropagation algorithm; for details, see (5) and (6) above.

Of course, the way of obtaining the semantic understanding model described above is only an example; the semantic understanding model may also be another large-scale pre-trained language model based on the pre-train-and-fine-tune paradigm.

This embodiment provides a model training method for implementing the semantic understanding function. Training the pre-trained model with a masking strategy gives it basic natural-language-processing capability. On that basis, in line with the goal of semantic understanding, the pre-trained model is fine-tuned with texts annotated with semantic information, so that during fine-tuning it learns the association between text and semantic information and acquires the ability to extract lexical, syntactic, and semantic features. At the model application stage, the semantic understanding model can then extract accurate lexical, syntactic, and semantic features.
Embodiment 2:

Fig. 4 shows a semantic analysis method provided by Embodiment 2 of this application. Embodiment 2 may specifically be performed by the execution device 11 of Fig. 1. The text to be analyzed in Embodiment 2 may be obtained by converting the speech provided by the audio collection device 14 of Fig. 1; the speech recognition module 113 of the execution device 11 may be used to perform S401 of Embodiment 2, and its semantic understanding module 111 may be used to perform S402 to S407.

Optionally, Embodiment 2 may be processed by a central processing unit (CPU), or jointly by a CPU and a graphics processing unit (GPU), or by another processor suitable for neural-network computation instead of a GPU; this application imposes no limitation. Embodiment 2 includes S401 to S407.

S401: The execution device obtains the text to be analyzed.

For example, applied in the voice interaction field, after the user speaks, the execution device collects the speech signal and performs speech recognition on it to obtain the text. The speech signal contains the control command for the vehicle-mounted terminal, and the text may take the form of a text signal.

Referring to the system architecture shown in Fig. 1, S401 may include the following steps A and B.

Step A: While the car is starting or driving, the audio collection device (such as a distributed microphone array) collects the speech signal T = (t1, t2, ..., tn1) and then passes it to the ASR system of the vehicle-mounted terminal, where n1 denotes the length of the user's voice control command.

Step B: The automatic speech recognition (ASR) system of the vehicle-mounted terminal receives the speech signal T = (t1, t2, ..., tn1) collected by the audio device, performs speech recognition on it to obtain the text signal X = (x1, x2, ..., xn2), and passes X = (x1, x2, ..., xn2) on to the semantic understanding module, where n2 denotes the length of the text input; n2 may or may not equal n1.
S402: The execution device extracts the lexical and syntactic features from the text.

For example, the execution device extracts the lexical and syntactic features through steps 1 and 2 below.

Step 1: The execution device inputs the text into the semantic understanding model.

Optionally, the text can be input into the semantic understanding model in vector or matrix form. For example, the execution device may extract the text's character word vectors, relative-position word vectors, and character-type word vectors, and input the matrix formed from them into the model. Referring to Fig. 2, for the text "我想听七里香" the input is ([CLS] 我 想 听 七 里 香 [SEP] [pad] [pad] [pad]); the character word vectors are (E_[CLS], E_我, E_想, E_听, E_七, E_里, E_香, E_[SEP], E_[pad], E_[pad], E_[pad]); the relative-position word vectors are (E_0, E_1, E_2, E_3, E_4, E_5, E_6, E_7, E_8, E_9, E_10); and the type word vectors are (E_1, E_1, E_1, E_1, E_1, E_1, E_1, E_1, E_0, E_0, E_0). In these parameters, E (short for embedding) denotes a word vector; [CLS] and [SEP] are separators; and [pad] is a padding element used to bring input texts to the same length.
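A minimal sketch of assembling such an input matrix from three embedding tables (character, relative position, character type). The vocabulary, dimensions, and random tables are illustrative assumptions; this application does not specify whether the three vectors are summed or stacked, and the sketch sums them BERT-style as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
max_len, d_model = 11, 32
vocab = {"[pad]": 0, "[CLS]": 1, "[SEP]": 2,
         "我": 3, "想": 4, "听": 5, "七": 6, "里": 7, "香": 8}
char_emb = rng.normal(size=(len(vocab), d_model))   # character word vectors
pos_emb = rng.normal(size=(max_len, d_model))       # relative-position word vectors
type_emb = rng.normal(size=(2, d_model))            # type 1: real token, type 0: padding

def embed(text):
    tokens = ["[CLS]"] + list(text) + ["[SEP]"]
    tokens += ["[pad]"] * (max_len - len(tokens))   # pad to a fixed length
    ids = [vocab[t] for t in tokens]
    types = [0 if t == "[pad]" else 1 for t in tokens]
    # Input matrix X: sum of the three embeddings, one row per position
    return char_emb[ids] + pos_emb[np.arange(max_len)] + type_emb[types]

X = embed("我想听七里香")
print(X.shape)   # (11, 32)
```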
Step 2: The execution device extracts the lexical and syntactic features from the text through the semantic understanding model.

For example, the execution device performs steps 2.1 to 2.4 below.

Step 2.1: The execution device performs attention computation on the text to obtain the first output, which indicates the word-to-word dependencies in the text.

For the technical details of the attention computation, see (1) and (2) of the concept introduction above.

Optionally, the execution device implements step 2.1 with the multi-head attention mechanism, for example by performing steps 2.1.1 to 2.1.4 below.

Step 2.1.1: The execution device inputs the text into the first multi-head attention model 201.

In this embodiment, a multi-head attention model may be set in the pre-trained model, and a multi-head attention model may also be used at the entity-feature-extraction stage. To distinguish them, this embodiment calls the multi-head attention model contained in the pre-trained model the first multi-head attention model and the one used at the entity-feature-extraction stage the second multi-head attention model.

For example, the first multi-head attention model 201 includes m layers of transformer units, each of which implements the multi-head attention mechanism and contains h self-attention modules. Referring to Fig. 2, the first multi-head attention model 201 includes attention modules 0 through 7.

For example, referring to Fig. 2, the matrix formed from the character, relative-position, and character-type word vectors can serve as the input matrix X of the first multi-head attention model 201 and be input to attention modules 0 through 7 respectively.

Step 2.1.2: The execution device performs attention computation on the text through each attention module of the first multi-head attention model 201 to obtain each attention module's output.

For example, referring to Fig. 2, attention modules 0 through 7 each perform attention computation on the input matrix X, yielding the output of attention module 0, the output of attention module 1, and so on through the output of attention module 7.

Each attention module may perform the attention computation with formulas (5) to (7) below, and its output can be expressed by formula (8), in which Attention denotes the attention computation.

Q = W_Q · X_1   (5)

K = W_K · X_1   (6)

V = W_V · X_1   (7)

head(i) = Attention(Q, K, V)   (8)

Here X_1 is the input text signal. In formula (5), W_Q denotes the query weight matrix of one attention module of the first multi-head attention model 201 and Q its query matrix; in formula (6), W_K denotes the key weight matrix of one attention module and K its key matrix; in formula (7), W_V denotes the value weight matrix of one attention module and V its value matrix. In formula (8), head(i) denotes the output matrix of the current self-attention computation; each row of head(i) is one word's self-attention vector, which expresses the contribution of every word of the sentence (the current word itself and the other words) to the current word — that is, each word's score for the current word. i denotes the i-th attention module, i being a positive integer with i ≤ h; the number of columns of head(i) equals the number of columns of the Value vectors; d_k is the corresponding hidden-unit dimension; and Attention denotes the attention computation.
Step 2.1.3: The execution device concatenates the outputs of the attention modules to obtain the concatenation result.

Optionally, the output of an attention module takes the form of a matrix, as does the concatenation result, whose dimension count equals the sum of the dimension counts of the modules' outputs. The concatenation may be horizontal and may be implemented by calling a concat function. It should be understood that horizontal concatenation is only an illustrative description; the outputs may also be concatenated in other ways, for example vertically, in which case the row count of the concatenation result equals the sum of the row counts of the modules' outputs. This embodiment does not specifically limit how concatenation is done.

Step 2.1.4: The execution device linearly transforms the concatenation result to obtain the first output.

The linear transformation may be multiplication by a weight matrix; that is, step 2.1.4 may specifically be: the execution device multiplies the concatenation result by a weight matrix and takes the product as the first output. Optionally, the linear transformation may also take forms other than multiplication by a weight matrix, for example multiplying the concatenation result by a constant or adding a constant to it; this embodiment does not limit which form of linear transformation is used.

Illustratively, steps 2.1.3 and 2.1.4 can be expressed by formulas (9-1) and (9-2) below: the concatenation of step 2.1.3 is the Concat in formula (9-1), and the linear transformation of step 2.1.4 is the multiplication by W_O in formula (9-1).

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W_O   (9-1)

head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)   (9-2)

Here W_O is a weight matrix obtained by joint training within the first multi-head attention model, and Concat denotes the concatenation operation. MultiHead, the output of the first multi-head attention model, is a matrix that fuses the h self-attention matrices. h denotes the number of attention modules in the first multi-head attention model, a positive integer greater than 1; head_1 denotes attention module 1 and head_h denotes attention module h, so "head_1, ..., head_h" denotes the h attention modules from module 1 through module h; and h·d_k is the overall dimension of the multi-head attention mechanism of the current transformer unit. Attention denotes the attention computation.

In this way, the multi-head attention mechanism captures long-distance features in the text, extracts rich contextual semantic representation information, and strengthens the extraction of lexical and syntactic features.
Step 2.2: The execution device normalizes the first output to obtain the second output.

For example, the execution device computes with formula (10) below, the normalization being implemented by the LayerNorm function in formula (10). Of course, the LayerNorm function is only one illustrative implementation; the execution device may also normalize in other ways, and this embodiment does not specifically limit how normalization is done.

x = LayerNorm(MultiHead(Q, K, V) + sublayer(MultiHead(Q, K, V)))   (10)

In formula (10), x denotes the second output; LayerNorm denotes the normalization computation; MultiHead means multi-head attention, and MultiHead(Q, K, V) is the first output, i.e., the output of the multi-head attention mechanism and the result of formula (9-1); and sublayer denotes the residual computation.

Step 2.2 achieves vector normalization, which normalizes the mean and variance of the samples and thus simplifies the difficulty of learning.
Step 2.3: The execution device applies a linear transformation and a nonlinear transformation to the second output to obtain the third output.

For example, referring to Fig. 2, the output of the first vector normalization layer 202 can be input into the forward-pass layer 203, which performs the linear and nonlinear transformations to obtain the third output. In this way, after the normalized output is obtained, forward-pass computation realizes the high-dimensional mapping of the vector space, and the lexical, syntactic, and semantic features are extracted.

The linear transformation may include multiplication by a matrix and addition of a bias; the nonlinear transformation may be implemented by a nonlinear function, for example a maximum operation. For instance, the execution device may compute with formula (11) below, in which the linear transformation is realized by multiplying by W_1 and adding b_1, and the nonlinear transformation by the max function. The max function is only an illustrative implementation of the nonlinear transformation; the execution device may also use other methods, such as computing through an activation function, and this embodiment does not specifically limit how the nonlinear transformation is done. Likewise, multiplying by W_1 and adding b_1 is only an illustrative implementation of the linear transformation; other methods may also be used, and this embodiment does not specifically limit how the linear transformation is done.

FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2   (11)

Here FFN denotes the feed-forward neural network; max denotes the maximum operation; W_1 and W_2 denote the forward-pass weight matrices; b_1 and b_2 denote the bias parameters of the weight matrices; and x denotes the vector-normalized output, i.e., the result of formula (10) — the second output.
Step 2.4: The execution device normalizes the third output to obtain the lexical and syntactic features.

For example, referring to Fig. 2, the output of the forward-pass layer 203 can be input into the second vector normalization layer 204, which normalizes it to yield the lexical and syntactic features. Step 2.4 normalizes the mean and variance of the samples, simplifying the overall learning.

For example, the execution device computes with formula (12) below, the normalization being implemented by the LayerNorm function in formula (12). Of course, LayerNorm is only one illustrative implementation; the execution device may also normalize the third output in other ways, and this embodiment does not specifically limit how normalization is done.

V = LayerNorm(FFN(x) + sublayer(FFN(x)))   (12)

Here LayerNorm denotes the normalization computation, FFN(x) is the forward-pass output, sublayer denotes the residual computation, and V denotes the output matrix of the transformer unit, whose dimension count is the total dimension count of the lexical and syntactic features.
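Putting steps 2.1 to 2.4 together, and continuing the NumPy sketches above (with `multi_head_attention` as defined earlier), one transformer unit might look like the following; the residual-plus-normalization structure follows formulas (10) to (12), with a simplified LayerNorm that omits the learned gain and bias.

```python
def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def transformer_unit(X, heads, W_o, W_1, b_1, W_2, b_2):
    attn = multi_head_attention(X, heads, W_o)      # step 2.1: first output
    x = layer_norm(X + attn)                        # step 2.2, formula (10)
    ffn = np.maximum(0, x @ W_1 + b_1) @ W_2 + b_2  # step 2.3, formula (11)
    return layer_norm(x + ffn)                      # step 2.4, formula (12)
```

Stacking m such units and feeding in the embedding matrix then yields the output tensor whose rows carry the lexical and syntactic features.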
In some embodiments, while the above method is executed, it is determined whether the m layers of transformer units have finished computing; if not, computation continues until all m layers are done, and the final tensor structure of the pre-trained language model is output.

In this way, the semantic understanding model obtained by fine-tuning the pre-trained model extracts the lexical and syntactic features contained in the input text. Because the model has gone through the pre-training flow and the fine-tuning process, the model as a whole has strong semantic understanding, for example strong semantic-intent understanding and slot-information extraction. In particular, fine-tuning on vehicle-mounted-domain texts as samples gives the model strong semantic-intent understanding in the vehicle-mounted domain. Moreover, when the semantic understanding model is implemented with the self-attention mechanism, the attention computation captures the word-to-word correlations within the text and helps capture long-distance features, making the extracted syntactic features more precise.
S403: The execution device obtains the entities in the text to be analyzed.

For example, the execution device performs entity extraction on the text to obtain its entities. Say the input text obtained by the execution device is X = (x1, x2, ..., xn); the execution device applies the Extract operation (the entity-extraction operation) to X = (x1, x2, ..., xn) to obtain the entities (e1, ..., ej). For example, referring to Fig. 5, (E_[CLS], E_我, E_想, E_听, E_七, E_里, E_香, E_[SEP], E_[pad], E_[pad], E_[pad]) can serve as the input of the entity extraction module, which performs entity extraction on it and obtains the extracted entity (E_七, E_里, E_香). Here ej denotes the j-th entity in the text, j being a positive integer.

S404: According to the entities in the text to be analyzed, the execution device obtains the structured entity vectors corresponding to the entities, a structured entity vector being used to indicate the identifier of the entity and the attributes of the entity.

A structured entity vector is the vector representation of an entity. Because the vector data form is used, the data structure is relatively regular and complete. For example, a structured entity vector may have 100 dimensions; of course, it may instead be a vector of another number of dimensions, and this embodiment does not limit the specific dimensionality. For example, 默 is the identifier of an entity whose attribute is song title; the structured entity vector of 默 is (-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518, ...), where the ellipsis stands for the 94 dimensions not shown and -0.0369, -0.1494, 0.0732, 0.0774, 0.0518, and 0.0518 are the values of six dimensions. The vector (-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518, ...) represents 默 together with the attribute "song title".

In some embodiments, the execution device retrieves the structured entity vector from the entity construction table according to the entity in the text. For example, referring to Fig. 5, from the entity (E_七, E_里, E_香) the execution device retrieves from the entity construction table the structured entity vector of 七里香: (-0.7563, -0.6532, 0.2182, 0.3914, 0.3628, 0.5528).

The entity construction table stores the mapping between entities and structured entity vectors. Also called the knowledge-entity mapping table, it maps entities to structured entity vectors and so accomplishes the representation of entities. Optionally, the entity construction table is stored in the execution device in advance. Optionally, the execution device queries the table with the entity as the index to obtain the structured entity vector, thereby mapping the entity to a vector representation. Optionally, the table is set up from experience: for example, each word of a Chinese lexicon is input into a word-embedding model in advance, which processes each word and outputs its word vector; a user selects the entities from the words of the lexicon based on experience, filters out the word vectors that represent entities from all the word vectors output by the model, takes the filtered vectors as structured entity vectors, and stores them in the entity construction table. The word-embedding model may be a neural network model.

Schematically, referring to Fig. 5, the entity construction table may be as shown in Table 1 below. The table means: 默 is an entity with structured entity vector (-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518, ...); 老九门 is an entity with structured entity vector (-0.0154, -0.2385, 0.1943, 0.4892, 0.7531, 0.9021, ...); 林中的小鸟 is an entity with structured entity vector (-0.1692, -0.4494, 0.7911, 0.9651, 0.7226, 0.3128, ...); and 七里香 is an entity with structured entity vector (-0.7563, -0.6532, 0.2182, 0.3914, 0.3628, 0.5528, ...). In Fig. 5 and Table 1, each structured entity vector has 100 dimensions; the ellipses stand for the values of the 94 dimensions not shown, and the last row of Table 1 stands for the other entities the entity construction table contains but Table 1 does not show.
Table 1

Entity          Structured entity vector
默              -0.0369 -0.1494 0.0732 0.0774 0.0518 0.0518 ……
老九门          -0.0154 -0.2385 0.1943 0.4892 0.7531 0.9021 ……
林中的小鸟      -0.1692 -0.4494 0.7911 0.9651 0.7226 0.3128 ……
七里香          -0.7563 -0.6532 0.2182 0.3914 0.3628 0.5528 ……
……              ……
In some embodiments, applied to the vehicle-mounted domain, the entity construction table contains entities associated with that domain. For example, the vehicle-mounted domain covers the navigation, music playback, radio, communications, SMS, instant-messaging application, schedule query, news push, intelligent question-answering, air-conditioning control, vehicle control, and maintenance service domains, and the entity construction table contains the entities associated with these service domains. Since the navigation and music scenarios are especially common in the vehicle-mounted domain, the table may contain places and songs. This helps build structured knowledge entities for the vehicle-mounted domain.

For example, if the text to be analyzed is "播放那英的默" ("play 那英's 默"), the execution device performs entity extraction on the text and obtains the entity 默, whose attribute is song title. Querying Table 1 above with 默 yields the structured entity vector (-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518, ...), which represents the entity 默 and the attribute "song title"; the intent is subsequently determined from the structured entity vector to be "listen to music". For another example, if the text to be analyzed is "老九门的歌曲" ("songs of 老九门"), the execution device extracts the entity 老九门, whose attribute is artist name; querying Table 1 yields the structured entity vector (-0.0154, -0.2385, 0.1943, 0.4892, 0.7531, 0.9021, ...), which represents the entity 老九门 and the attribute "artist name". For another example, if the text to be analyzed is "帮我找一找林中的小鸟" ("help me find 林中的小鸟"), the execution device extracts the entity 林中的小鸟, whose attribute is song title; querying Table 1 yields the structured entity vector (-0.1692, -0.4494, 0.7911, 0.9651, 0.7226, 0.3128, ...), which represents the entity 林中的小鸟 and the attribute "song title".

In some embodiments, the entity construction table contains at least one of: entities with irregular names, entities whose names exceed a threshold number of characters, and entities whose names have a word frequency below a threshold. An entity with an irregular name is, for example, a grammatically irregular song; an entity whose name exceeds the character threshold is, for example, a long-character place name; an entity whose name falls below the frequency threshold is, for example, a low-frequency-character place name. Because such names invite ambiguity or carry several meanings, a machine struggles to infer the correct semantics; by storing the vector representations of these entities in the entity construction table beforehand, the machine obtains an accurate representation by a table lookup, and weaving entity features into the semantic-understanding process helps improve its accuracy.

For example, when understanding the sentence "搜索世界之花", it is easy to recognize 世界之花 as a song title and wrongly judge the semantic intent of the sentence to be "listen to music". By instead constructing a structured entity vector for 世界之花 in advance — one vector expressing the entity 世界之花 and the attribute "place name" — and saving the vector in the entity construction table, when the user says "搜索世界之花" the execution device takes the sentence as the text to be recognized, extracts the entity 世界之花, queries the table, and obtains the vector representation (i.e., the structured entity vector) built in advance for 世界之花. Since that vector indicates that the attribute of 世界之花 is a place name rather than a song title, semantic analysis based on the vector determines the semantic intent of the sentence to be "navigation" rather than "listen to music", improving the accuracy of intent recognition.

Summarizing S403 and S404 above, the execution device implements the extraction of structured entity vectors, for example, with formula (13) below, in which obtaining the entities of the text to be analyzed is the Extract of formula (13) and obtaining the structured entity vectors is the F of formula (13).

E1 = {e1, ..., ej} = F(Extract{x1, ..., xn})   (13)

Here x1, ..., xn denotes the text to be analyzed, x1 being its first character, xn its n-th character, and the ellipsis the characters contained but not shown; Extract denotes the entity-extraction operation; F denotes the mapping function used for entity construction; E1 denotes the structured entity vectors; and e1, ..., ej denote the vector representations of the extracted entities.

Through the above method, the entities of the input text are extracted, and constructing structured entity vectors gives the entities a vectorized representation. Since an entity's vector can represent both the entity and its attributes, the vectorized representation is effective and achieves meaningful entity embedding; when the pre-trained model later performs further recognition on the structured entity vectors, its in-vehicle semantic-intent understanding and semantic-slot extraction are strengthened.
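A minimal sketch of the table lookup of formula (13), with a toy entity construction table holding the truncated vectors of Table 1; a real table would map each entity to a full 100-dimensional vector, and the extractor here is a naive longest-match stand-in for a real entity-extraction model.

```python
# Toy entity construction table: entity -> structured entity vector
# (vectors truncated to the six dimensions shown in Table 1)
ENTITY_TABLE = {
    "默":        [-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518],
    "老九门":     [-0.0154, -0.2385, 0.1943, 0.4892, 0.7531, 0.9021],
    "林中的小鸟":  [-0.1692, -0.4494, 0.7911, 0.9651, 0.7226, 0.3128],
    "七里香":     [-0.7563, -0.6532, 0.2182, 0.3914, 0.3628, 0.5528],
}

def extract_entities(text):
    """A naive stand-in for Extract: longest table match at each position."""
    found, i = [], 0
    while i < len(text):
        match = max((e for e in ENTITY_TABLE if text.startswith(e, i)),
                    key=len, default=None)
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1
    return found

def structured_entity_vectors(text):
    """Formula (13): E1 = F(Extract{x1...xn}), with F as a table lookup."""
    return {e: ENTITY_TABLE[e] for e in extract_entities(text)}

print(structured_entity_vectors("播放那英的默"))
# {'默': [-0.0369, -0.1494, 0.0732, 0.0774, 0.0518, 0.0518]}
```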
It should be understood that this embodiment does not limit the order of S402 and S403. In some embodiments, S402 and S403 are executed sequentially — for example, S402 first and then S403, or S403 first and then S402. In other embodiments, S402 and S403 are executed in parallel, i.e., simultaneously.
S405: The execution device performs feature extraction on the structured entity vectors to obtain the entity features.

Optionally, the execution device performs attention computation on the structured entity vectors to obtain the entity features, so that the entity features can capture the structure and dependencies inside the structured entity vectors. Illustratively, the execution device uses a multi-head attention model and performs steps (1) to (4) below to extract features from the structured entity vectors.

Step (1): The execution device inputs the structured entity vectors into the second multi-head attention model.

For example, the second multi-head attention model includes m layers of transformer units, each of which implements the multi-head attention mechanism and contains h self-attention modules. Referring to Fig. 5, the second multi-head attention model includes attention modules 0 through 7.

For example, referring to Fig. 5, the structured entity vector of 七里香, (-0.7563, -0.6532, 0.2182, 0.3914, 0.3628, 0.5528, ...), can serve as the input matrix X of the second multi-head attention model and be input to attention modules 0 through 7 respectively.

Step (2): The execution device performs attention computation on the structured entity vectors through each attention module of the second multi-head attention model to obtain each attention module's output.

For example, referring to Fig. 5, attention modules 0 through 7 each perform attention computation on the input matrix X, yielding the output of attention module 0, the output of attention module 1, and so on through the output of attention module 7.

Each attention module may perform the attention computation with formulas (14) to (17) below, and its output can be expressed by formula (18).

Q = W_Q · X_2   (14)

K = W_K · X_2   (15)

V = W_V · X_2   (16)

Attention(Q, K, V) = softmax(Q·K^T / √d_k) · V   (17)

head(i) = Attention(Q, K, V)   (18)

Here X_2 denotes the input structured entity vector. In formula (14), W_Q denotes the query weight matrix of one attention module of the second multi-head attention model and Q its query matrix; in formula (15), W_K is the key weight matrix of one attention module and K its key matrix; in formula (16), W_V is the value weight matrix of one attention module and V its value matrix. head(i) denotes the output matrix of the current self-attention computation, and its number of columns equals the number of columns of the Value vectors; d_k is the corresponding hidden-unit dimension; Attention denotes the attention computation; and softmax denotes computation through the softmax function.
Step (3): The execution device concatenates the outputs of the attention modules to obtain the concatenation result.

Optionally, the output of an attention module takes the form of a matrix, as does the concatenation result, whose dimension count equals the sum of the dimension counts of the modules' outputs. The concatenation may be horizontal and may be implemented by calling a concat function. It should be understood that horizontal concatenation is only an illustrative description; the outputs may also be concatenated in other ways, for example vertically, in which case the row count of the concatenation result equals the sum of the row counts of the modules' outputs. This embodiment does not specifically limit how concatenation is done.

For example, if the multi-head attention model has 12 attention modules and the output of each is a matrix of 10 rows and 64 columns, the concatenation result is a matrix of 10 rows and 768 columns, in which columns 1 to 64 are the output of the first attention module, columns 65 to 128 the output of the second, columns 129 to 192 the output of the third, and so on, with columns 705 to 768 the output of the twelfth. Referring to formulas (19) and (20) below: each module's output is the head_i of formula (20); the outputs of the h modules are the head_1, ..., head_h of formula (19), with head_1 the output of attention module 1, head_h the output of attention module h, and the ellipsis the outputs of the other modules not shown; and the concatenation may be the operation computed by the Concat function in formula (19).

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) · W_O   (19)

head_i = Attention(Q_i·W_i^Q, K_i·W_i^K, V_i·W_i^V)   (20)

In formula (19), Concat denotes the concatenation operation; h denotes the number of attention modules, a positive integer greater than 1; W_O denotes a weight matrix obtained by joint training within the second multi-head attention model; MultiHead is the output of the second multi-head attention model; and Q_i, K_i, and V_i denote the Q, K, and V matrices corresponding to attention module head_i.
Step (4): The execution device linearly transforms the concatenation result to obtain the entity features.

Optionally, the linear transformation is multiplication by a weight matrix; that is, step (4) may specifically be: the execution device multiplies the concatenation result by a weight matrix and takes the product as the entity features. For example, referring to formula (19) above, the weight matrix used for the linear transformation is W_O, so step (4) may specifically be: Concat(head_1, ..., head_h) is multiplied by W_O, and the resulting product MultiHead(Q, K, V) is the entity features. Optionally, the linear transformation may also take forms other than multiplication by a weight matrix, for example multiplying the concatenation result by a constant or adding a constant to it; this embodiment does not limit which form of linear transformation is used.

Illustratively, steps (3) and (4) can be expressed by formulas (19) and (20) above and formula (21) below.

E2 = MultiHead(Q, K, V)   (21)

E2 denotes the entity features extracted from the text's structured entity vectors. Optionally, E2 takes the form of a matrix, each row of which is the structured entity vector corresponding to one entity in the text, and the dimension count of E2 equals the dimension count of one structured entity vector. For example, if the text to be analyzed contains N entities in total, E2 is a matrix of N rows: its first row is the structured entity vector corresponding to the first entity in the text, its second row the structured entity vector corresponding to the second entity, and if a structured entity vector is a 100-dimensional vector, the dimension count of E2 equals 100. N is a positive integer.

In this way, the multi-head attention mechanism captures the correlations between words inside the structured entity vectors and helps capture long-distance features, so the extracted entity features can express the semantics accurately and are therefore more precise.
S406: The execution device fuses the entity features, lexical features, and syntactic features of the text to obtain the semantic features of the text.

By extracting the lexical, syntactic, and entity features from the text, the execution device achieves a preliminary understanding of the semantic intent of the text information. Next, the execution device fuses the lexical, syntactic, and entity features, combining the three kinds of features: the fused semantic feature contains the entity, lexical, and syntactic features and carries rich semantics-related information, so the semantic feature can be used to obtain the semantic information of the text, and using the fused semantic feature further strengthens the pre-trained model's own in-vehicle semantic-intent understanding and semantic-slot extraction.

For example, referring to Fig. 6, the output of the semantic understanding model is (w1, w2, w3, w4, w5, w6, w7, w8, w9), which contains the text's lexical and syntactic features; (w1, ..., w9) is the fusion of the lexical and syntactic features, the two being fused within the internal computation of the semantic understanding model. In addition, the entity features obtained through S405 are (e5, e6, e7). Then (w1, ..., w9) can be fused with (e5, e6, e7) and the fusion result taken as the semantic feature, where e5 is the entity feature of one structured entity vector and is a vector, e6 the entity feature of another structured entity vector and is a vector, and e7 the entity feature of yet another structured entity vector and is also a vector. Since (w1, ..., w9) already contains the lexical and syntactic features, fusing it with the entity features gives a semantic feature that contains the lexical, syntactic, and entity features.

For example, the execution device may perform feature fusion through steps 1 and 2 below.

Step 1: The execution device computes a weighted sum of the entity features, lexical features, and syntactic features of the text to obtain the fused feature.

Since the lexical, syntactic, and entity features are features in different vector spaces — in other words, heterogeneous information — computing a weighted sum of the entity, lexical, and syntactic features fuses the three together, achieving heterogeneous information fusion.

Step 2: The execution device applies a nonlinear transformation to the fused feature through an activation function to obtain the semantic feature.

The activation function may be the GELU function. For example, the execution device may compute with formulas (22) and (23) below, which can be provided as the heterogeneous-information-fusion strategy.

h = GELU(W_t·w_i + W_e·e_i + b)   (22)

GELU(x) = x·P(X ≤ x) = x·Φ(x), Φ(x) ~ N(0, 1)   (23)

Here GELU denotes the activation function; W_t denotes a weight matrix; W_e denotes a weight matrix; b denotes a bias parameter; w_i denotes the output of the semantic understanding model 200 and may take the form of a text sequence — for example, the V obtained by LayerNorm in formula (12) may take matrix form, and the w_i of formula (22) is one row of that V; e_i denotes the output of the entity construction module and may take the form of a knowledge sequence, i.e., a structured entity vector — e_i may be one row of the matrix E2 obtained by formula (21); and Φ(x) denotes the probability distribution function of the (0, 1) normal distribution.
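A minimal sketch of the heterogeneous fusion of formulas (22) and (23), using the common tanh approximation of GELU; the shapes and random weights are illustrative assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of x * Phi(x), formula (23)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def fuse(w_i, e_i, W_t, W_e, b):
    """Formula (22): weighted sum of the lexical/syntactic row w_i and
    the entity-feature row e_i, followed by the GELU nonlinearity."""
    return gelu(W_t @ w_i + W_e @ e_i + b)

d_w, d_e, d_h = 32, 100, 64
rng = np.random.default_rng(0)
w_i = rng.normal(size=d_w)          # one row of V from formula (12)
e_i = rng.normal(size=d_e)          # one row of E2 from formula (21)
W_t = rng.normal(size=(d_h, d_w))
W_e = rng.normal(size=(d_h, d_e))
b = np.zeros(d_h)
h = fuse(w_i, e_i, W_t, W_e, b)     # fused semantic feature, shape (64,)
```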
S407: The execution device decodes the semantic features to obtain the semantic information of the text.

S407 is optional; this embodiment does not limit whether S407 is performed.

For example, the semantic information includes at least one of a semantic intent and semantic slots. The execution device can compute the probability distribution of semantic intents to obtain the current semantic intent and slots. For example, the semantic understanding encoder of the execution device may process the text-signal sequence X = (x1, x2, ..., xn) and generate a new sequence Z = (z1, z2, ..., zn), n denoting the length of the text input; afterwards, the semantic understanding decoder continues processing the sequence Z to obtain the final output sequence Y = (y1, y2, ..., yn+1), where y1 denotes the semantic intent and y2, ..., yn+1 denote the slot information of the text signal. For example, the execution device computes with formulas (24) and (25) below.

y1 = F(W_h1 · h_i + b_1)   (24)

y_i = F(W_h2 · h_i + b_2)   (25)

In formula (24), y1 denotes the semantic intent, W_h1 a weight matrix, b_1 a bias parameter, and F the function used for decoding. In formula (25), y_i denotes a semantic slot, W_h2 a weight matrix, and b_2 a bias parameter.
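A minimal sketch of the two decoding heads of formulas (24) and (25): a softmax over intent classes applied to a pooled sequence feature for y1, and a per-position softmax over slot labels for y2, ..., yn+1. The label sets, the pooling choice, and the random weights are illustrative assumptions, not values from this application.

```python
import numpy as np

INTENTS = ["play_music", "navigate", "make_call", "climate_control"]
SLOTS = ["O", "B-song", "I-song", "B-place", "I-place"]

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decode(H, W_h1, b_1, W_h2, b_2):
    """H: (n, d_h) fused semantic features, one row h_i per position."""
    intent_logits = W_h1 @ H.mean(axis=0) + b_1     # formula (24), pooled feature
    slot_logits = H @ W_h2.T + b_2                  # formula (25), per position
    y1 = INTENTS[int(softmax(intent_logits).argmax())]
    slots = [SLOTS[int(k)] for k in softmax(slot_logits).argmax(axis=-1)]
    return y1, slots

rng = np.random.default_rng(0)
n, d_h = 6, 64
H = rng.normal(size=(n, d_h))
W_h1 = rng.normal(size=(len(INTENTS), d_h)); b_1 = np.zeros(len(INTENTS))
W_h2 = rng.normal(size=(len(SLOTS), d_h));   b_2 = np.zeros(len(SLOTS))
print(decode(H, W_h1, b_1, W_h2, b_2))
# e.g. ('navigate', ['B-place', 'O', ...]) -- random weights, illustrative only
```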
Optionally, after understanding the semantic information, the execution device performs the corresponding operation according to it. For example, applied to the vehicle-mounted domain with the execution device being a vehicle-mounted terminal, the terminal controls the vehicle-mounted execution system according to the semantic information, thereby carrying out in-vehicle voice interaction. The execution device may then wait; when a new speech signal arrives, it re-executes the above process to understand the semantics of the new speech signal.

In the method provided by this embodiment, a structured entity vector is constructed so that the identifier and attributes of an entity are represented in vector form; an entity feature is extracted from the structured entity vector and fused with the lexical and syntactic features, yielding a semantic feature that contains the entity, lexical, and syntactic features; and decoding the semantic feature yields the semantic information. Because the structured entity vector contains both the identifier and the attributes of the entity, the attributes can be used to strengthen semantic understanding.
Embodiment 2 is illustrated below through Embodiment 3. In the method shown in Embodiment 3, the execution device is a vehicle-mounted terminal and the text to be recognized is obtained by recognizing speech collected by the terminal; in other words, Embodiment 3 concerns how the vehicle-mounted terminal uses Embodiment 2 to interact with the user by voice. It should be understood that for the steps of Embodiment 3 that parallel Embodiment 2, refer to Embodiment 2; they are not repeated in Embodiment 3.

Embodiment 3

Fig. 7 shows Embodiment 3 of this application: in-vehicle voice interaction based on the semantic understanding model and structured entity vectors. Embodiment 3 may specifically be performed by the vehicle-mounted terminal and includes S701 to S704.

S701: The audio device of the vehicle-mounted terminal collects the speech input by the user, the speech being a control-command signal; the audio device is, for example, a distributed microphone array.

S702: The speech recognition module of the vehicle-mounted terminal converts the speech signal into a text signal and inputs the text signal into the terminal's semantic understanding module.

S703: Referring to Fig. 8, the steps corresponding to the semantic understanding module include S7031 to S7039.

S7031: Based on the multi-head attention mechanism, the vehicle-mounted terminal performs attention computation on the text signal through multiple attention modules to obtain each module's output; after concatenation and linear transformation, the first output is obtained.

S7032: The vehicle-mounted terminal performs the vector normalization operation on the first output, normalizing it into the second output.

S7033: The vehicle-mounted terminal performs the forward-pass operation on the second output, converting it through linear and nonlinear transformations into the third output.

S7034: The vehicle-mounted terminal performs the vector normalization operation on the third output, normalizing it into the syntactic and lexical features.

Through S7031 to S7034, the lexical, syntactic, and semantic features of the text input are extracted, and a preliminary understanding of the semantic intent of the user command is achieved.

S7035: The knowledge-entity extraction module of the vehicle-mounted terminal extracts the entities from the text input to obtain valid entities.

S7036: The knowledge-entity construction module of the vehicle-mounted terminal represents the entities as vectors, obtaining representations of the entities' attributes.

S7037: Based on the multi-head attention mechanism, the vehicle-mounted terminal performs attention computation on the representations of the entities' attributes through multiple attention modules to obtain each module's output; after concatenation and linear transformation, the entity features are obtained.

S7038: The heterogeneous information fusion module of the vehicle-mounted terminal fuses the syntactic, lexical, and entity features of the text input effectively across their different vector spaces.

S7039: Through the semantic decoder, the vehicle-mounted terminal computes the probability distribution of semantic intents, obtaining the user's current semantic intent and semantic slots.

S704: The vehicle-mounted function module receives the control-command signal and performs the operation according to it.

The method provided by this embodiment offers in-vehicle voice interaction based on a semantic understanding model and structured entity vectors in the vehicle-mounted domain. Because it uses a semantic understanding model that has gone through pre-training and fine-tuning, extracts entity features based on structured entity vectors, and fuses the entity, lexical, and syntactic features, it can, in the in-vehicle voice interaction scenario, solve the problems of insufficient semantic-intent understanding and of failure to fully recognize basic structured knowledge entities, thereby further strengthening intent understanding and slot-information extraction in the vehicle-mounted domain.

It can be understood that Embodiment 1 is the training stage of the semantic understanding model (the stage executed by the training device 12 of Fig. 1), the specific training being carried out with the pre-trained model provided by Embodiment 1 or any possible implementation based on Embodiment 1; Embodiment 2 can then be understood as the application stage of the semantic understanding model (the stage executed by the execution device 11 of Fig. 1), concretely embodied as using the semantic understanding model trained in Embodiment 1 to obtain the output semantic information from the speech or text input by the user; and Embodiment 3 is one embodiment included in Embodiment 2.
The semantic analysis method of the embodiments of this application has been described above; the semantic analysis apparatus of the embodiments is described below. It should be understood that the apparatus has all the functions of the execution device in the above methods.

Fig. 9 is a schematic structural diagram of a semantic analysis apparatus provided by an embodiment of this application. As shown in Fig. 9, the semantic analysis apparatus 900 includes: an obtaining module 901 for performing S403 to S404; an extraction module 902 for performing S405; and a fusion module 903 for performing S406.

Optionally, the fusion module 903 includes: a weighted-summation submodule for performing step 1 of S406, and a transformation submodule for performing step 2 of S406.

Optionally, the extraction module 902 includes: an attention submodule for performing step 2.1 of S402; a normalization submodule for performing step 2.2 of S402; and a transformation submodule for performing step 2.3 of S402; the normalization submodule is also used to perform step 2.4 of S402.

Optionally, the attention submodule is used to perform steps 2.1.1 to 2.1.4 of S402.

Optionally, the extraction module 902 includes: an input submodule for performing step (1) of S405; an attention submodule for performing step (2) of S405; a concatenation submodule for performing step (3) of S405; and a transformation submodule for performing step (4) of S405.

It should be understood that the semantic analysis apparatus 900 of the Fig. 9 embodiment corresponds to the execution device of the above method embodiments; the modules of the apparatus 900 and the other operations and/or functions above respectively implement the various steps and methods implemented by the execution device in the method embodiments. For specific details, see the method embodiments above; for brevity, they are not repeated here.

It should be understood that when the semantic analysis apparatus of the Fig. 9 embodiment analyzes semantics, the division into the above functional modules is only illustrative; in practical application, the above functions can be assigned to different functional modules as needed, i.e., the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. Moreover, the semantic analysis apparatus provided by the above embodiment arises from the same conception as Embodiment 2 above; for its specific implementation process, see the method embodiments, not repeated here.

Fig. 10 is a schematic structural diagram of a training apparatus for a semantic understanding model provided by an embodiment of this application. As shown in Fig. 10, the training apparatus 1000 includes: an obtaining module 1001 for performing S301, and a training module 1002 for performing S302; the obtaining module 1001 is also used to perform S303, and the training module 1002 is also used to perform S304.

It should be understood that the training apparatus 1000 of the Fig. 10 embodiment corresponds to the training device of the above method embodiments; the modules of the apparatus 1000 and the other operations and/or functions above respectively implement the various steps and methods implemented by the training device in the method embodiments. For specific details, see the method embodiments above; for brevity, they are not repeated here.

It should be understood that when the training apparatus of the Fig. 10 embodiment trains the semantic understanding model, the division into the above functional modules is only illustrative; in practical application, the above functions can be assigned to different functional modules as needed, i.e., the internal structure of the training apparatus can be divided into different functional modules to complete all or part of the functions described above. Moreover, the training apparatus provided by the above embodiment arises from the same conception as Embodiment 1 above; for its specific implementation process, see the method embodiments, not repeated here.
Fig. 11 is a schematic diagram of the hardware structure of a semantic analysis apparatus provided by an embodiment of this application. The semantic analysis apparatus 1100 shown in Fig. 11 (which may specifically be a computer device) includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104; the memory 1101, the processor 1102, and the communication interface 1103 are communicatively connected with one another through the bus 1104.

The memory 1101 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1101 may store a program; when the program stored in the memory 1101 is executed by the processor 1102, the processor 1102 and the communication interface 1103 are used to perform the steps of the semantic analysis method of the embodiments of this application.

The processor 1102 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and executes the relevant programs to implement the functions to be performed by the units of the semantic analysis apparatus of the embodiments of this application, or to perform the semantic analysis method of the method embodiments of this application.

The processor 1102 may also be an integrated circuit chip with signal-processing capability. During implementation, the steps of the semantic analysis method of this application may be completed by integrated logic circuits of hardware in the processor 1102 or by instructions in software form. The processor 1102 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being completed by a hardware decoding processor, or completed by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, or a register. The storage medium resides in the memory 1101; the processor 1102 reads the information in the memory 1101 and, combined with its hardware, completes the functions to be performed by the units included in the semantic analysis apparatus of the embodiments of this application, or performs the semantic analysis method of the method embodiments of this application.

The communication interface 1103 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1100 and other devices or communication networks. For example, text (such as the text to be analyzed in Embodiment 2 of this application) can be obtained through the communication interface 1103.

The bus 1104 may include a path for transferring information among the components of the apparatus 1100 (for example, the memory 1101, the processor 1102, and the communication interface 1103).

It should be understood that the extraction module 902, the fusion module 903, and the decoding module in the semantic analysis apparatus 900 may be equivalent to the processor 1102.

Fig. 12 is a schematic diagram of the hardware structure of a training apparatus for a semantic understanding model provided by an embodiment of this application. The training apparatus 1200 shown in Fig. 12 (which may specifically be a computer device) includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204; the memory 1201, the processor 1202, and the communication interface 1203 are communicatively connected with one another through the bus 1204.

The memory 1201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1201 may store a program; when the program stored in the memory 1201 is executed by the processor 1202, the processor 1202 and the communication interface 1203 are used to perform the steps of the training method for a semantic understanding model of the embodiments of this application.

The processor 1202 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and executes the relevant programs to implement the functions to be performed by the units of the training apparatus of the embodiments of this application, or to perform the training method of the method embodiments of this application.

The processor 1202 may also be an integrated circuit chip with signal-processing capability. During implementation, the steps of the training method of this application may be completed by integrated logic circuits of hardware in the processor 1202 or by instructions in software form. The processor 1202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being completed by a hardware decoding processor, or completed by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, or a register. The storage medium resides in the memory 1201; the processor 1202 reads the information in the memory 1201 and, combined with its hardware, completes the functions to be performed by the units included in the training apparatus of the embodiments of this application, or performs the training method of the method embodiments of this application.

The communication interface 1203 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1200 and other devices or communication networks. For example, training data (such as the masked texts, or texts annotated with semantic information like intents and slots, of Embodiment 1 of this application) can be obtained through the communication interface 1203.

The bus 1204 may include a path for transferring information among the components of the apparatus 1200 (for example, the memory 1201, the processor 1202, and the communication interface 1203).

It should be understood that the obtaining module 1001 in the training apparatus 1000 is equivalent to the communication interface 1203 in the training apparatus 1200, and the training module 1002 may be equivalent to the processor 1202.

It should be noted that although the apparatuses 1200 and 1100 shown in Figs. 12 and 11 show only the memory, processor, and communication interface, in the specific implementation process those skilled in the art should understand that the apparatuses 1200 and 1100 also include the other components necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatuses 1200 and 1100 may also include hardware components implementing other additional functions. Moreover, those skilled in the art should understand that the apparatuses 1200 and 1100 may also include only the components necessary to implement the embodiments of this application, without necessarily including all the components shown in Fig. 12 or Fig. 11.

It can be understood that the apparatus 1200 is equivalent to the training device 12 of Fig. 1, and the apparatus 1100 is equivalent to the execution device 11 of Fig. 1. Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

Those skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; they are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division of the units is only a logical functional division, and there may be other ways of division in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application — in essence, or the part contributing to the prior art, or a part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in this application, which shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. A semantic analysis method, characterized in that the method comprises:
    obtaining an entity in a text to be analyzed;
    obtaining, according to the entity in the text to be analyzed, a structured entity vector corresponding to the entity, wherein the structured entity vector is used to indicate an identifier of the entity and an attribute of the entity;
    performing feature extraction on the structured entity vector to obtain an entity feature;
    fusing the entity feature, a lexical feature of the text, and a syntactic feature of the text to obtain a semantic feature of the text, wherein the semantic feature is used to obtain semantic information of the text.
  2. The method according to claim 1, characterized in that the obtaining, according to the entity in the text to be analyzed, the structured entity vector corresponding to the entity comprises:
    obtaining, according to the entity in the text to be analyzed, the structured entity vector from an entity construction table, wherein the entity construction table is used to store a mapping relationship between entities and structured entity vectors.
  3. The method according to claim 1, characterized in that the fusing the entity feature, the lexical feature of the text, and the syntactic feature of the text to obtain the semantic feature of the text comprises:
    performing a weighted summation of the entity feature, the lexical feature, and the syntactic feature to obtain a fused feature;
    performing a nonlinear transformation on the fused feature through an activation function to obtain the semantic feature.
  4. The method according to claim 1, characterized in that before the fusing the entity feature, the lexical feature of the text, and the syntactic feature of the text, the method further comprises:
    inputting the text into a semantic understanding model, wherein the semantic understanding model is obtained by performing transfer training on a pre-trained model according to first samples, the first samples comprising texts annotated with semantic information, and the pre-trained model is obtained by training according to second samples, the second samples comprising masked texts;
    extracting the lexical feature and the syntactic feature from the text through the semantic understanding model.
  5. The method according to claim 4, characterized in that the extracting the lexical feature and the syntactic feature from the text through the semantic understanding model comprises:
    performing an attention operation on the text to obtain a first output result, wherein the first output result is used to indicate dependency relationships between words in the text;
    normalizing the first output result to obtain a second output result;
    performing a linear transformation and a nonlinear transformation on the second output result to obtain a third output result;
    normalizing the third output result to obtain the lexical feature and the syntactic feature.
  6. The method according to claim 5, characterized in that the semantic understanding model comprises a first multi-head attention model, and the performing the attention operation on the text to obtain the first output result comprises:
    inputting the text into the first multi-head attention model;
    performing the attention operation on the text through each attention module of the first multi-head attention model to obtain an output result of each attention module;
    concatenating the output results of the attention modules to obtain a concatenation result;
    performing a linear transformation on the concatenation result to obtain the first output result.
  7. The method according to claim 1, characterized in that the performing feature extraction on the structured entity vector to obtain the entity feature comprises:
    inputting the structured entity vector into a second multi-head attention model;
    performing an attention operation on the structured entity vector through each attention module of the second multi-head attention model to obtain an output result of each attention module;
    concatenating the output results of the attention modules to obtain a concatenation result;
    performing a linear transformation on the concatenation result to obtain the entity feature.
  8. A semantic analysis apparatus, characterized in that the apparatus comprises:
    an obtaining module, configured to obtain an entity in a text to be analyzed and obtain, according to the entity in the text to be analyzed, a structured entity vector corresponding to the entity, wherein the structured entity vector is used to indicate an identifier of the entity and an attribute of the entity;
    an extraction module, configured to perform feature extraction on the structured entity vector to obtain an entity feature;
    a fusion module, configured to fuse the entity feature, a lexical feature of the text, and a syntactic feature of the text to obtain a semantic feature of the text, wherein the semantic feature is used to obtain semantic information of the text.
  9. The apparatus according to claim 8, characterized in that the obtaining module is configured to obtain, according to the entity in the text to be analyzed, the structured entity vector from an entity construction table, wherein the entity construction table is used to store a mapping relationship between entities and structured entity vectors.
  10. The apparatus according to claim 8, characterized in that the fusion module comprises:
    a weighted-summation submodule, configured to perform a weighted summation of the entity feature, the lexical feature, and the syntactic feature to obtain a fused feature;
    a transformation submodule, configured to perform a nonlinear transformation on the fused feature through an activation function to obtain the semantic feature.
  11. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    an input module, configured to input the text into a semantic understanding model, wherein the semantic understanding model is obtained by performing transfer training on a pre-trained model according to first samples, the first samples comprising texts annotated with semantic information, and the pre-trained model is obtained by training according to second samples, the second samples comprising masked texts;
    the extraction module being further configured to extract the lexical feature and the syntactic feature from the text through the semantic understanding model.
  12. The apparatus according to claim 11, characterized in that the extraction module comprises:
    an attention submodule, configured to perform an attention operation on the text to obtain a first output result, wherein the first output result is used to indicate dependency relationships between words in the text;
    a normalization submodule, configured to normalize the first output result to obtain a second output result;
    a transformation submodule, configured to perform a linear transformation and a nonlinear transformation on the second output result to obtain a third output result;
    the normalization submodule being further configured to normalize the third output result to obtain the lexical feature and the syntactic feature.
  13. The apparatus according to claim 12, characterized in that the semantic understanding model comprises a first multi-head attention model, and the attention submodule is configured to: input the text into the first multi-head attention model; perform the attention operation on the text through each attention module of the first multi-head attention model to obtain an output result of each attention module; concatenate the output results of the attention modules to obtain a concatenation result; and perform a linear transformation on the concatenation result to obtain the first output result.
  14. The apparatus according to claim 8, characterized in that the extraction module comprises:
    an input submodule, configured to input the structured entity vector into a second multi-head attention model;
    an attention submodule, configured to perform an attention operation on the structured entity vector through each attention module of the second multi-head attention model to obtain an output result of each attention module;
    a concatenation submodule, configured to concatenate the output results of the attention modules to obtain a concatenation result;
    a transformation submodule, configured to perform a linear transformation on the concatenation result to obtain the entity feature.
  15. An execution device, characterized in that the execution device comprises a processor, the processor being configured to execute instructions so that the execution device performs the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, the instruction being read by a processor to cause an execution device to perform the method according to any one of claims 1 to 7.