WO2019200923A1 - Pinyin-based semantic recognition method, apparatus, and human-machine dialogue system - Google Patents

Pinyin-based semantic recognition method, apparatus, and human-machine dialogue system (基于拼音的语义识别方法、装置以及人机对话系统)

Info

Publication number
WO2019200923A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
recognized
statement
pinyin
vector
Prior art date
Application number
PCT/CN2018/117626
Other languages
English (en)
French (fr)
Inventor
李英杰
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd.
Priority to US16/464,381 (published as US11100921B2)
Publication of WO2019200923A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks

Definitions

  • Embodiments of the present disclosure relate to the field of human-machine dialogue, and in particular, to a semantic recognition method, apparatus, and human-machine dialog system.
  • a method for semantic recognition is provided.
  • the pinyin sequence of the sentence to be recognized is obtained.
  • the pinyin sequence includes a plurality of pinyin segments.
  • word vectors of the plurality of pinyin segments are obtained.
  • the word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized.
  • an output vector of the sentence to be recognized is obtained using a neural network.
  • the semantics of the statement to be recognized are identified as the semantics of the reference sentence.
  • In some embodiments, each pinyin segment is the pinyin of a word in the sentence to be recognized.
  • In other embodiments, each pinyin segment is a pinyin letter group of a word in the sentence to be recognized, i.e., an initial or a final.
  • In some embodiments, in the step of determining, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized, the distance between the output vector of the sentence to be recognized and the output vectors of candidate reference sentences in a reference sentence set is calculated. When the distance is less than a threshold, the candidate reference sentence is determined to be a reference sentence that is semantically similar to the sentence to be recognized.
  • the word vectors of the plurality of Pinyin segments are obtained using a word embedding model.
  • the method further comprises training the word embedding model using the first training data.
  • the first training data includes a pinyin sequence of a plurality of training sentences.
  • In some embodiments, the method further comprises: obtaining a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set have similar semantics; and, for each set of training sentences: obtaining a word vector of each pinyin segment in the pinyin sequence of each training sentence; combining the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and training the neural network using the sentence vector of each training sentence, such that the neural network produces the same output vector for each training sentence.
  • In some embodiments, in the step of obtaining the pinyin sequence of the sentence to be recognized, a pinyin sequence of a sentence to be recognized that the user inputs through a pinyin input method is obtained.
  • In other embodiments, voice information of the sentence to be recognized uttered by the user is obtained. Then, speech recognition is performed on the voice information to obtain text information corresponding to the voice information. Next, the text information is converted into the pinyin sequence of the sentence to be recognized.
  • In another aspect, an apparatus for semantic recognition includes at least one processor and at least one memory storing a computer program. When the computer program is executed by the at least one processor, the apparatus is caused to: obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence including a plurality of pinyin segments; obtain word vectors of the plurality of pinyin segments; combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized; and identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
  • In another aspect, an apparatus for semantic recognition includes: a pinyin sequence obtaining module configured to obtain a pinyin sequence of a sentence to be recognized; a word embedding module configured to obtain word vectors of the plurality of pinyin segments; a sentence vector obtaining module configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; a neural network module configured to obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; and a semantic recognition module configured to determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized, and to identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
  • In yet another aspect, a system for human-machine dialogue is provided, comprising: an obtaining device configured to acquire a sentence to be recognized from a user; an apparatus for semantic recognition according to any one of the embodiments of the present disclosure; and an output device configured to, in response to determining a reference sentence that is semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
  • In still another aspect, a computer readable storage medium is provided, storing computer executable instructions that, when executed by a computer, cause the computer to perform a method for semantic recognition according to any one of the embodiments of the present disclosure.
  • In still another aspect, a computer system is provided, comprising a processor and a memory connected to the processor, the memory storing program instructions, the processor being configured to perform a method for semantic recognition according to any one of the embodiments of the present disclosure by loading and executing the program instructions in the memory.
  • FIG. 1 shows a schematic structural diagram of an exemplary human-machine dialog system in which a semantic recognition method and apparatus can be implemented in accordance with an embodiment of the present disclosure
  • Figure 2 shows a schematic dialog flow diagram of the human-machine dialog system shown in Figure 1;
  • FIG. 3 illustrates a flow chart of a semantic recognition method in accordance with an embodiment of the present disclosure
  • FIG. 4 illustrates an exemplary training process for the word embedding model in a semantic recognition method in accordance with an embodiment of the present disclosure
  • FIG. 5 illustrates an exemplary training process for the neural network in a semantic recognition method in accordance with an embodiment of the present disclosure
  • FIG. 6 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure
  • FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure.
  • In the related art, an erroneous-word detection model is mainly used: target words are paired with general words and checked one by one against the erroneous-word-pair features in the model. If the detection result is that a target word is an erroneous word, the erroneous word is replaced with the corresponding general word.
  • The implementation of this method is cumbersome, and the erroneous word pairs must be labeled manually, which further increases the cost.
  • FIG. 1 shows a schematic block diagram of an exemplary human-machine dialog system 100 in which a semantic recognition method and apparatus may be implemented in accordance with an embodiment of the present disclosure.
  • the human-machine dialog system may include a smart terminal unit 110, a voice recognition server 120, a web server 130, and a semantic server 140.
  • the smart terminal unit 110 may be a smart terminal such as a personal computer, a smart phone, a tablet computer, or the like.
  • The smart terminal unit 110 may have a voice collection function, so that the user's voice information can be collected; a network communication function, so that the collected voice information can be sent to the voice recognition server 120 for processing and the information recognized by the voice recognition server 120 can be sent to the web server 130; and certain computing and storage capabilities, enabling the storage and computation related to the collection and transmission of voice information and to other functions.
  • The voice recognition server 120 can be a server computer system with a voice recognition function, which may use a third-party voice recognition service, such as the voice recognition functions provided by companies such as iFLYTEK (Keda Xunfei) and Baidu. After the smart terminal unit 110 sends the collected voice information to the voice recognition server 120, the voice recognition server 120 performs voice recognition on the voice information, generates the corresponding text information, and returns the text information to the smart terminal unit 110.
  • the smart terminal unit 110 may itself have a voice recognition function, and in this case, the human-machine dialog system 100 may not include the separate voice recognition server 120.
  • the web server 130 can be a computer system having web service functionality and providing a web access interface.
  • The web server 130 receives the text information sent by the smart terminal unit 110 as question information, sends the text information to the semantic server 140, and sends the result returned by the semantic server 140 as the reply to the smart terminal unit 110.
  • The semantic server 140 may be a computer system having a semantic understanding function for processing question information.
  • A matching question is sought by matching the question information against the questions stored in a database that includes question-answer pairs.
  • The question information is recognized by means of the matched question, and the corresponding reply is then returned.
  • The semantic server 140 includes the functionality to provide semantic understanding services, as well as the functionality to train the models on which semantic understanding relies.
  • In other embodiments, the semantic server 140 may include only the semantic understanding service function, using a trained model to provide the semantic understanding service; the training of the model can then be located on a separate server.
  • the web server 130 and the semantic server 140 can be combined into a single server and implemented on a single computer system.
  • the intelligent terminal unit 110, the voice recognition server 120, the web server 130, and the semantic server 140 may be communicably connected to each other through a network.
  • the network may be, for example, any one or more of a computer network and/or a telecommunications network such as the Internet, a local area network, a wide area network, an intranet, and the like.
  • Referring now to FIG. 2, there is shown a schematic dialogue flow diagram of the human-machine dialogue system shown in FIG. 1. As shown in FIG. 2, the dialogue flow includes the following steps:
  • In step 201, the smart terminal unit 110 collects voice information through a microphone or the like, and then sends the collected voice information to the voice recognition server 120 through the network.
  • In step 202, the voice recognition server 120 performs voice recognition on the voice information collected by the smart terminal unit 110, generates text information (for example, Chinese character text or text in another language) as the voice recognition result, and returns it to the smart terminal unit 110.
  • In step 203, after receiving the text information as the voice recognition result, the smart terminal unit 110 sends it as question information (for example, packaged as question information having a specific format) to the web server 130.
  • In step 204, the web server 130 extracts the text information from the question information sent by the smart terminal unit 110 as the question text and sends it to the semantic server 140.
  • In step 205, after receiving the question text, the semantic server 140 performs semantic recognition by matching the question text against the questions in a database that includes question-answer pairs. After finding the best-matching question, the semantic server 140 returns the corresponding reply.
  • semantic recognition methods and apparatus in accordance with embodiments of the present disclosure are primarily implemented in the semantic server 140 of the dialog system 100.
  • The composition and dialogue flow of the exemplary dialogue system 100 in which the semantic recognition method and apparatus according to embodiments of the present disclosure may be implemented have been described above with reference to the accompanying drawings; it should be noted that this description is only an example, not a limitation on the systems in which the present disclosure can be implemented.
  • For example, the web server can also be implemented by another type of server or by a local computer system. Some systems may not include a web server at all; instead, the smart terminal unit communicates directly with the semantic server.
  • the semantic recognition method and apparatus according to an embodiment of the present disclosure may also be implemented in other systems than the dialog system 100.
  • For example, the semantic recognition method and apparatus according to an embodiment of the present disclosure can also be applied wherever a pinyin input method is used, to semantically recognize text (for example, Chinese text) entered with the pinyin input method.
  • For example, when search text is entered in a browser's search box or text is entered in a word-processing application using a pinyin input method, the semantic recognition method and apparatus according to an embodiment of the present disclosure may be used to semantically recognize the text output by the pinyin input method, so as to identify and/or replace typos.
  • In this case, the system in which the semantic recognition method and apparatus according to an embodiment of the present disclosure may be applied may not include a voice recognition server, but may include: a smart terminal unit for accepting the user's pinyin input and generating corresponding text information; a web server for receiving the text information from the smart terminal unit; and a semantic server for receiving the text information from the web server, semantically recognizing the text information, and returning a semantic recognition result.
  • the smart terminal unit may include a device having a pinyin input method, such as a keyboard, a touch screen, etc., so that text can be input using the pinyin input method.
  • the intelligent terminal unit may not include a voice collection function.
  • Referring now to FIG. 3, a flowchart of a semantic recognition method in accordance with an embodiment of the present disclosure is shown. At least a portion of the semantic recognition method can be performed, for example, in the dialogue system 100 shown in FIG. 1 and described above (e.g., primarily by the semantic server 140), or in other systems (e.g., systems using a pinyin input method).
  • the semantic recognition method may include the following steps:
  • a pinyin sequence of the sentence to be recognized is obtained.
  • the pinyin sequence includes a plurality of pinyin segments.
  • This step 301 can be performed by, for example, the semantic server 140 in the dialogue system 100 shown in FIG. 1; in that case, the semantic server 140 can obtain, from the web server 130 or the smart terminal unit 110, the text information converted from the user's speech, and convert it into the corresponding pinyin sequence.
  • This step 301 can also be performed jointly by, for example, the semantic server 140, the smart terminal unit 110, the voice recognition server 120, and the web server 130 in the dialogue system 100 shown in FIG. 1.
  • The sentence to be recognized may include, for example, the characters or words of a Chinese sentence, or the words of a sentence in another language such as English.
  • the step 301 of obtaining the Pinyin sequence of the sentence to be recognized includes the substep of obtaining a Pinyin sequence of the sentence to be recognized input by the user through the Pinyin input method. This sub-step can be performed, for example, by a smart terminal unit using a pinyin input method.
  • the step 301 of obtaining a Pinyin sequence of the statement to be recognized includes the following sub-steps:
  • Sub-step 1: Obtain the voice information of the sentence to be recognized uttered by the user.
  • This sub-step can be performed, for example, by the intelligent terminal unit 110 in the dialog system 100.
  • For example, the smart terminal unit 110 can obtain the voice information of the sentence "这幅画是哪年画的" ("What year was this picture painted") uttered by the user.
  • Sub-step 2 Perform speech recognition on the speech information to obtain text information corresponding to the speech information.
  • This sub-step can be performed, for example, by the speech recognition server 120 in the dialog system 100.
  • For example, the voice recognition server 120 can perform voice recognition on the voice information of the sentence "这幅画是哪年画的" ("What year was this picture painted") and obtain the corresponding text information.
  • Sub-step 3 Convert the text information into a pinyin sequence of the sentence to be recognized.
  • This sub-step can be performed, for example, by the semantic server 140 in the dialog system 100.
  • For example, the semantic server 140 can receive the text information "这幅画是哪年画的" ("What year was this picture painted"). After word segmentation, the text information is converted into the pinyin sequence "zhe fu hua shi na nian hua de".
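  • As a concrete illustration of this conversion, the following is a minimal sketch that assumes the open-source pypinyin package; the patent itself does not prescribe any particular tool:

```python
# A minimal sketch of the text-to-pinyin conversion in step 301,
# assuming the open-source pypinyin package (not named by the patent).
from pypinyin import lazy_pinyin

text = "这幅画是哪年画的"  # "What year was this picture painted"
# lazy_pinyin returns tone-free pinyin, one item per character.
pinyin_sequence = lazy_pinyin(text)
print(pinyin_sequence)
# ['zhe', 'fu', 'hua', 'shi', 'na', 'nian', 'hua', 'de']
```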
  • step 302 word vectors for the plurality of pinyin segments are obtained.
  • This step 302 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in Figure 1 or a semantic server in other systems.
  • In some exemplary embodiments, the plurality of pinyin segments are the pinyin of each word in the sentence to be recognized.
  • For example, the pinyin segments in the pinyin sequence "zhe fu hua shi na nian hua de" are "zhe", "fu", "hua", "shi", "na", "nian", "hua", "de".
  • In other exemplary embodiments, before step 302, the method further includes a step 303 in which the pinyin of each word in the sentence to be recognized is split into an initial and a final, which serve as the pinyin segments of the pinyin sequence.
  • For example, the pinyin "zhe", "fu", "hua", "shi", "na", "nian", "hua", "de" of each word in the pinyin sequence "zhe fu hua shi na nian hua de" is split into initials and finals, forming the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e".
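  • The splitting into initials and finals can likewise be sketched with pypinyin, whose Style.INITIALS and Style.FINALS options produce exactly these two parts; the handling of empty initials and the interleaving order below are implementation assumptions:

```python
# A sketch of step 303: splitting each syllable into an initial and a final,
# again assuming pypinyin; the interleaving logic is an assumption.
from pypinyin import lazy_pinyin, Style

text = "这幅画是哪年画的"
initials = lazy_pinyin(text, style=Style.INITIALS, strict=False)
finals = lazy_pinyin(text, style=Style.FINALS, strict=False)

segments = []
for ini, fin in zip(initials, finals):
    if ini:                 # syllables such as "e" or "ai" have no initial
        segments.append(ini)
    if fin:
        segments.append(fin)
print(segments)
# ['zh', 'e', 'f', 'u', 'h', 'ua', 'sh', 'i', 'n', 'a', 'n', 'ian', 'h', 'ua', 'd', 'e']
```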
  • the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
  • The word embedding model may be a trained word embedding model; its training method may be as described later.
  • the word embedding model can be any type of word embedding model known in the art.
  • The word embedding model can be used to map words from a vocabulary (in this application, for example, Chinese characters, the pinyin of Chinese characters, or the initials and finals of Chinese pinyin; words of other languages such as English are also possible) to vectors in a vector space (which may be called word vectors).
  • the word embedding model receives each pinyin segment in the pinyin sequence as an input, and outputs a word vector of each pinyin segment.
  • For example, the word embedding model receives the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e" and outputs the word vector of each pinyin segment.
  • the word embedding model is a Word2vec model.
  • The Word2vec model is a common group of word embedding models. These models are two-layer neural networks trained to reconstruct the linguistic contexts of words.
  • Word2vec takes a text corpus as input and produces a vector space, typically of several hundred dimensions. Each word in the corpus is assigned a corresponding vector in that space, its word vector. The word vectors are distributed in the vector space such that words sharing a common context in the corpus have word vectors located close to one another.
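  • A minimal training sketch with the gensim library is shown below; gensim's Word2Vec is one possible realization of the word embedding model, and the toy two-sentence corpus and all hyperparameters are illustrative assumptions:

```python
# A minimal sketch, assuming gensim's Word2Vec as the word embedding model.
# The corpus is a toy list of pinyin-fragment sequences, one per sentence.
from gensim.models import Word2Vec

corpus = [
    ["zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e"],
    ["zh", "e", "f", "u", "h", "ua", "sh", "i", "sh", "ui", "h", "ua", "d", "e"],
]
model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

vec_zh = model.wv["zh"]   # the 100-dimensional word vector of fragment "zh"
print(vec_zh.shape)       # (100,)
```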
  • the word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized.
  • Each element of the sentence vector is a word vector of each pinyin segment in the pinyin sequence of the sentence to be recognized.
  • the sentence vector can be a multi-dimensional vector.
  • For example, the sentence vector of the sentence "这幅画是哪年画的" ("What year was this picture painted") is composed of the word vectors of the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e".
  • This step 304 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
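  • One simple way to realize this combination is to stack the fragment word vectors row by row into a fixed-size matrix; the zero-padding to a fixed length below is an implementation assumption, since the patent does not specify how sentences of different lengths are handled:

```python
# A sketch of step 304: stacking fragment word vectors into a sentence matrix.
# "wv" may be any mapping from fragment to vector, e.g. model.wv from gensim.
import numpy as np

def sentence_vector(fragments, wv, dim, max_len=32):
    mat = np.zeros((max_len, dim), dtype=np.float32)  # zero-padded (assumption)
    for i, frag in enumerate(fragments[:max_len]):
        if frag in wv:
            mat[i] = wv[frag]
    return mat

# Standalone toy usage with a plain dict standing in for trained embeddings:
toy_wv = {"zh": np.ones(4, dtype=np.float32), "e": np.full(4, 2.0, dtype=np.float32)}
print(sentence_vector(["zh", "e"], toy_wv, dim=4).shape)  # (32, 4)
```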
  • an output vector of the statement to be recognized is obtained using a neural network based on the sentence vector of the sentence to be recognized.
  • This step 305 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
  • the neural network can for example be stored in the form of software in the memory of the semantic server.
  • the neural network may be a trained neural network, and the training method thereof may be as described later.
  • the neural network may be any neural network or combination of several neural networks that are known in the art to be capable of analyzing natural language.
  • For example, the neural network may be a deep learning neural network such as a convolutional neural network (CNN) or a long short-term memory network (LSTM).
  • As is known in the art, a CNN can generally include an input layer, several convolutional layers with activation function layers, several sub-sampling layers interleaved with the convolutional layers, and an output layer.
  • the input layer is for receiving input data.
  • the convolution layer is used to perform convolution processing on data output from the previous layer.
  • The convolutional layer has weights and biases.
  • A weight represents a convolution kernel, and a bias is a scalar added to the output of the convolutional layer.
  • each convolutional layer can include tens or hundreds of convolution kernels.
  • Each CNN can include multiple convolution layers.
  • the activation function layer is used to perform function transformation on the output data of the previous convolutional layer.
  • The sub-sampling layer is used to sub-sample the data from the previous layer; sub-sampling methods include, but are not limited to, max-pooling, avg-pooling, random pooling, undersampling (decimation, for example selecting fixed pixels), and demultiplexing the output (demuxout, splitting the input image into multiple smaller images).
  • the output layer can include an activation function and is used to output output data.
  • Neural networks usually go through the training phase and the use phase.
  • the neural network is trained using training data, which includes input data and expected output data.
  • input data is input into the neural network to obtain output data.
  • the parameters inside the neural network are adjusted by comparison with the expected output data.
  • The trained neural network can then be used to perform tasks such as image recognition and semantic recognition; that is, input data is fed into the trained neural network to obtain the corresponding output data.
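  • As an illustration of step 305, the following is a minimal PyTorch sketch of an LSTM-based encoder that maps a sentence vector (the sequence of fragment word vectors) to a fixed-size output vector; the architecture and all sizes are illustrative assumptions, not the patent's prescribed design:

```python
# A minimal PyTorch sketch of step 305: an LSTM encoder producing the
# output vector of a sentence. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=128, out_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):            # x: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])      # output vector: (batch, out_dim)

encoder = SentenceEncoder()
sentence = torch.randn(1, 32, 100)   # e.g. the matrix built in step 304
output_vector = encoder(sentence)
print(output_vector.shape)           # torch.Size([1, 64])
```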
  • In step 306, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized is determined.
  • In step 307, the semantics of the sentence to be recognized are identified as the semantics of the reference sentence.
  • the steps 306 and 307 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
  • In some embodiments, the output of the neural network may be used directly, based on the output vector of the sentence to be recognized, to determine a reference sentence that is semantically similar to the sentence to be recognized.
  • The reference sentences may be, for example, question sentences from a database including question-answer pairs.
  • The database may include a plurality of question sentences that may be involved in the dialogue system 100 and a reply corresponding to each question sentence.
  • the database may be stored, for example, in a memory associated with the semantic server 140 or in a memory accessible by the semantic server 140.
  • the neural network can be used to obtain the output vector of the statement in step 305.
  • a sentence vector for each question statement in the database (which may be obtained by step 304 above) is input to the neural network to obtain an output vector for each question statement.
  • It is then determined whether the sentence to be recognized is semantically similar to a question sentence. If it is determined that the sentence to be recognized is semantically similar to a question sentence in the database, the reply corresponding to that question sentence may be obtained from the database and provided to the user as the reply to the sentence to be recognized.
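  • The question-matching flow just described can be sketched as follows; the nearest-neighbor helper, the cosine measure, the threshold value, and the in-memory question database are all illustrative assumptions rather than the patent's prescribed implementation:

```python
# A sketch of the matching flow of steps 305-307 for a question-answer database,
# assuming an encoder like the SentenceEncoder sketched above.
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest_question(query_vec, question_vecs):
    """Index and distance of the question whose output vector is closest."""
    dists = [cosine_distance(query_vec, q) for q in question_vecs]
    idx = int(np.argmin(dists))
    return idx, dists[idx]

# question_vecs[i]: precomputed output vector of database question i;
# replies[i]: the reply stored for that question (toy data here).
question_vecs = [np.random.rand(64) for _ in range(3)]
replies = ["reply 0", "reply 1", "reply 2"]

query_vec = np.random.rand(64)   # output vector of the sentence to be recognized
idx, dist = nearest_question(query_vec, question_vecs)
if dist < 0.2:                   # threshold value is an assumption (step 306)
    print(replies[idx])          # step 307: reply of the matched question
```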
  • In other embodiments, the reference sentences may come from, for example, the search sentence library of a search system.
  • the search statement library can include a large number of search statements that may be involved in the search system.
  • the neural network can be used in step 305 to obtain an output vector of the statement to be recognized.
  • a sentence vector of each search sentence in the search sentence library (which can be obtained by the same steps described above) is input to the neural network to obtain an output vector of each search sentence.
  • It is then determined whether the sentence to be recognized is semantically similar to a certain search sentence. If so, that search sentence may be presented to the user to replace the search sentence input by the user, which may contain incorrect pinyin.
  • In some exemplary embodiments, step 306, which determines whether the sentence to be recognized and a reference sentence are semantically similar by comparing the output vector of the sentence to be recognized with the output vector of the reference sentence, may include the following sub-steps:
  • Sub-step 1: The distance between the output vector of the sentence to be recognized and the output vector of each candidate reference sentence in the reference sentence set is calculated.
  • Sub-step 2: When the distance is less than a threshold, the candidate reference sentence is determined to be a reference sentence that is semantically similar to the sentence to be recognized.
  • The distance may be, for example, a cosine distance (also called cosine similarity), a Euclidean distance, or a Mahalanobis distance.
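  • For reference, all three distance measures are available off the shelf, for example in SciPy (one possible library; the patent does not prescribe any):

```python
# A sketch of the candidate distance measures, using scipy.spatial.distance.
import numpy as np
from scipy.spatial.distance import cosine, euclidean, mahalanobis

a, b = np.random.rand(64), np.random.rand(64)
print(cosine(a, b))      # cosine distance = 1 - cosine similarity
print(euclidean(a, b))   # Euclidean distance

# Mahalanobis distance requires the inverse covariance matrix of the data;
# the identity matrix here is only a placeholder assumption.
VI = np.eye(64)
print(mahalanobis(a, b, VI))
```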
  • the word embedding model used in the step 302 can be a trained word embedding model.
  • the neural network used in the step 305 can be a trained neural network.
  • the semantic recognition method may further include implementing a training process for the word embedding model and a training process for the neural network.
  • the training process for the word embedding model can be completed prior to step 302 of using the word embedding model.
  • the training process for the neural network can be completed prior to step 305 of using the neural network.
  • These training processes may be performed by, for example, the semantic server 140 in the dialog system 100 shown in FIG. 1, or may also be performed by a semantic server in other systems.
  • With the technical solution of the embodiments of the present disclosure, it is possible to obtain pinyin sequences whose pronunciation is highly similar to that of the words in the sentence to be recognized, so as to remove the interference caused by words with the same pronunciation but different meanings that appear during speech recognition or spelling. This improves the accuracy of speech understanding and pinyin input.
  • Moreover, the pre-processing steps required by the technical solution of the embodiments of the present disclosure are simple and efficient, making it a low-cost solution.
  • Fig. 4 illustrates an exemplary training process for the word embedding model in a semantic recognition method, in accordance with an embodiment of the present disclosure.
  • the training process for the word embedding model in the semantic recognition method includes the following steps:
  • the word embedding model is trained using the first training data.
  • the first training data includes a pinyin sequence of a plurality of training sentences.
  • the first training data can be generated, for example, by acquiring a large number of sentences from a text corpus, converting each sentence into a pinyin sequence, and obtaining a plurality of pinyin segments in the pinyin sequence of each sentence.
  • The pinyin segments may be, for example, the pinyin of each character (or word), or pinyin segments formed by further splitting the pinyin of each character (or word) into initials and finals.
  • the text corpus may for example be a text corpus for a particular kind of dialog system.
  • the statement in the text corpus is the statement used in the particular kind of dialog system.
  • For example, a text corpus for a dialogue system providing technical support for a certain product or class of products will include the various sentences used in the technical support process for that product or class of products.
  • the text corpus may also be a corpus of statements used in some other context.
  • the text corpus may also be a corpus of common sentences in a language (eg, Chinese, English).
  • the pinyin segments in the pinyin sequence of each sentence in the first training data are input into the word embedding model.
  • the word embedding model outputs a word vector for each pinyin piece in the pinyin sequence of each sentence.
  • During training, the parameters of the word embedding model are continuously adjusted such that the word vectors of pinyin segments having a common context in the first training data (e.g., appearing in the same sentence within less than a specified distance of each other) are located closer together in the vector space.
  • In this way, the trained word embedding model can output word vectors that are close to one another for pinyin segments having a common context.
  • the word embedding model can be used in the step 302.
  • Fig. 5 illustrates an exemplary training process for the neural network in a semantic recognition method, in accordance with an embodiment of the present disclosure.
  • the training process for the neural network in the semantic recognition method includes the following steps:
  • a pinyin sequence for each of the at least one set of training statements is obtained.
  • the semantics of the training statements in each set of training statements are similar.
  • For example, the training sentence "Who drew this picture" and the training sentence "Who is the author of this picture" form a set of semantically similar training sentences.
  • the at least one set of training statements can be derived, for example, from a text corpus.
  • the text corpus may for example be a text corpus for a particular kind of dialog system.
  • the statement in the text corpus is the statement used in the particular kind of dialog system.
  • For example, a text corpus for a dialogue system providing technical support for a certain product or class of products will include the various sentences used in the technical support process for that product or class of products.
  • the text corpus may also be a corpus of statements used in some other context.
  • the text corpus may also be a corpus of common sentences in a language (eg, Chinese, English).
  • each training sentence can be converted to a pinyin sequence. Then, a plurality of pinyin segments in the Pinyin sequence of each training sentence are obtained.
  • The pinyin segments may be, for example, the pinyin of each character (or word), or pinyin segments formed by further splitting the pinyin of each character (or word) into initials and finals.
  • a word vector for each pinyin segment in each of the Pinyin sequences of the training statement is obtained.
  • the word vector of each Pinyin fragment in the Pinyin sequence is obtained using the word embedding model.
  • the word embedding model may be, for example, a word embedding model trained in the above step 401.
  • the word vectors of each pinyin segment in the Pinyin sequence of each training sentence are combined into a sentence vector for each training sentence.
  • Each element of the sentence vector of each training sentence is a word vector of each pinyin segment in the Pinyin sequence of each training sentence.
  • the sentence vector can be a multi-dimensional vector.
  • the neural network is trained using a sentence vector of each of the at least one set of training statements.
  • a sentence vector of each of the set of semantically similar training sentences is input to the neural network to obtain an output of the neural network.
  • The internal parameters of the neural network are then adjusted with the goal of making the outputs for all training sentences in the set of semantically similar training sentences identical.
  • After such training, the neural network will be able to output the same or similar results for sentences that are semantically identical or similar but textually different, thereby acquiring semantic recognition capability.
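  • A minimal PyTorch sketch of this training loop is shown below: pairs of sentence vectors drawn from the same group of semantically similar training sentences are pushed toward identical output vectors with an MSE loss. The encoder, optimizer, learning rate, and pair sampling are illustrative assumptions:

```python
# A sketch of the training process of FIG. 5: make output vectors of
# semantically similar training sentences (nearly) identical.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compact stand-in for the LSTM encoder sketched under step 305."""
    def __init__(self, embed_dim=100, hidden_dim=128, out_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

encoder = Encoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy pairs: two sentence matrices (batch, seq_len, embed_dim) per pair,
# both drawn from one group of semantically similar training sentences.
pairs = [(torch.randn(1, 32, 100), torch.randn(1, 32, 100)) for _ in range(4)]

for epoch in range(10):
    for sent_a, sent_b in pairs:
        out_a, out_b = encoder(sent_a), encoder(sent_b)
        loss = loss_fn(out_a, out_b)   # pull the two output vectors together
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

  • Note that a pure pull-together loss like this can collapse to a constant output; in practice a contrastive or triplet loss with dissimilar (negative) pairs is usually added. The patent text itself only requires that semantically similar training sentences yield the same output vector.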
  • the semantic recognition method has been described above with reference to the accompanying drawings, and it is to be noted that the above description is only an example, and is not a limitation of the present disclosure.
  • the method may have more, fewer, or different steps, and the order, inclusion, and function relationships between the various steps may differ from those described and illustrated.
  • multiple functions that are typically completed in one step can also be performed in a number of separate steps.
  • Multiple steps to perform different functions can be combined into one step to perform these functions.
  • Some steps can be performed in any order or in parallel. All such variations are within the spirit and scope of the disclosure.
  • a semantic recognition apparatus is also provided.
  • Referring to FIG. 6, there is shown a schematic block diagram of a semantic recognition apparatus 600 in accordance with an embodiment of the present disclosure.
  • the functions or operations performed by the components in the semantic recognition device 600 correspond to at least some of the above-described semantic recognition methods according to embodiments of the present disclosure.
  • the semantic recognition device is implemented by, for example, the semantic server 140 in the dialog system 100 shown in FIG. 1, or by a semantic server in other systems.
  • The semantic recognition apparatus may be implemented, for example, by a combination of general-purpose computer hardware (the processor, memory, etc. of the semantic server) and semantic recognition software. When the memory loads the semantic recognition software into the processor and the software is executed by the processor, the modules of the semantic recognition apparatus are formed and their functions or operations are performed.
  • the semantic recognition apparatus 600 includes a Pinyin sequence obtaining module 601, a word embedding module 602, a sentence vector obtaining module 603, a neural network module 604, and a semantic recognition module 605.
  • the Pinyin Sequence Acquisition Module 601 is configured to obtain a Pinyin sequence of the sentence to be recognized.
  • the word embedding module 602 is configured to obtain word vectors for the plurality of pinyin segments.
  • the sentence vector obtaining module 603 is configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized.
  • the neural network module 604 is configured to obtain an output vector of the statement to be recognized using a neural network based on a sentence vector of the sentence to be recognized.
  • the semantic recognition module 605 is configured to determine a reference sentence that is semantically similar to the sentence to be recognized based on an output vector of the sentence to be recognized, and to identify a semantic of the sentence to be recognized as a semantic of the reference sentence.
  • the pinyin segment is a pinyin of a word in the sentence to be recognized.
  • the semantic recognition apparatus further includes:
  • The splitting module 606 is configured to split the pinyin of each word of the sentence to be recognized in the pinyin sequence into an initial and a final, which serve as the pinyin segments of the pinyin sequence.
  • In some embodiments, the semantic recognition module 605 is further configured to: calculate the distance between the output vector of the sentence to be recognized and the output vectors of candidate reference sentences in the reference sentence set; and, when the distance is less than a threshold, determine the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
  • the word embedding model is a Word2vec model.
  • the word embedding module is further configured to be trained using the first training data.
  • the first training data includes a pinyin sequence of a plurality of training sentences.
  • the Pinyin Sequence Acquisition Module 601 is further configured to obtain a Pinyin sequence of words in each of the at least one set of second training sentences.
  • the semantics of the training statements in each set of second training sentences are similar.
  • the word embedding module 602 is further configured to obtain a word vector for each of the pinyin segments of each training sentence.
  • the sentence vector obtaining module 603 is further configured to combine the word vectors of each pinyin segment in the Pinyin sequence of each training sentence into a sentence vector of each training sentence.
  • the neural network module 604 is further configured to train the neural network using a sentence vector of each training statement such that the neural network has the same output vector for each training statement.
  • the Pinyin Sequence Acquisition Module 601 is further configured to obtain a Pinyin sequence of a sentence to be recognized input by a user through a Pinyin input method.
  • FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus 700 according to an embodiment of the present disclosure.
  • the apparatus 700 can include a processor 701 and a memory 702 that stores a computer program.
  • When the computer program is executed by the processor 701, the apparatus 700 is caused to perform the steps of the semantic recognition method shown in FIG. 3. That is, the apparatus 700 can obtain a pinyin sequence of the sentence to be recognized.
  • the pinyin sequence includes a plurality of pinyin segments.
  • the device 700 can then obtain the word vectors for the plurality of pinyin segments.
  • the apparatus 700 may combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized.
  • Apparatus 700 can then obtain an output vector of the statement to be recognized using a neural network based on the sentence vector of the statement to be recognized.
  • the device 700 may determine a reference sentence that is semantically similar to the statement to be recognized based on an output vector of the statement to be recognized.
  • Apparatus 700 can identify the semantics of the statement to be recognized as the semantics of the reference statement.
  • The processor 701 may be, for example, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a processor based on a multi-core processor architecture, or the like.
  • The memory 702 can be any type of memory implemented using data storage technology, including, but not limited to, random access memory, read-only memory, semiconductor-based memory, flash memory, magnetic disk memory, and the like.
  • In some embodiments, the apparatus 700 may also include an input device 703, such as a keyboard, mouse, or microphone, for inputting the sentence to be recognized. Additionally, the apparatus 700 may also include an output device 704, such as a display, for outputting the reply.
  • In some embodiments, the apparatus 700 may determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized by calculating the distance between the output vector of the sentence to be recognized and the output vectors of the candidate reference sentences in the reference sentence set; when the distance is less than the threshold, the candidate reference sentence is determined to be a reference sentence that is semantically similar to the sentence to be recognized.
  • apparatus 700 may also train the word embedding model using first training data.
  • the first training data includes a pinyin sequence of a plurality of training sentences.
  • In some embodiments, the apparatus 700 may also obtain a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set have similar semantics. For each set of training sentences, the apparatus 700 may further: obtain a word vector of each pinyin segment in the pinyin sequence of each training sentence; combine the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and train the neural network using the sentence vector of each training sentence, such that the neural network produces the same output vector for each training sentence.
  • the apparatus 700 may obtain a Pinyin sequence of a sentence to be recognized by obtaining a Pinyin sequence of a sentence to be recognized input by a user through a Pinyin input method.
  • In some embodiments, the apparatus 700 may obtain the pinyin sequence of the sentence to be recognized by obtaining voice information of a sentence to be recognized uttered by the user, performing voice recognition on the voice information to obtain text information corresponding to the voice information, and converting the text information into the pinyin sequence of the sentence to be recognized.
  • The apparatus may have more, fewer, or different modules, and the order, containment, and functional relationships between the various modules may differ from those described and illustrated.
  • multiple functions performed by one module can also be performed by multiple separate modules. Multiple modules that perform different functions can be combined into one module that performs these functions.
  • the functions performed by one module can also be performed by another module. All such variations are within the spirit and scope of the disclosure.
  • a human-machine dialog system is also provided.
  • the human-machine dialog system may be, for example, the human-machine dialog system 100 shown in FIG. 1, or a part thereof or a variant thereof.
  • The human-machine dialogue system may include: an acquisition device, a semantic recognition apparatus 600, 700 according to any one of the embodiments of the present disclosure, and an output device.
  • the obtaining means is configured to acquire a statement to be recognized from the user.
  • the output device is configured to, in response to determining a reference statement that is semantically similar to the statement to be recognized, obtain a reply associated with the reference statement and output the reply to the user.
  • a computer readable storage medium storing computer executable instructions.
  • the computer executable instructions when executed by a computer, cause the computer to perform a semantic recognition method in accordance with any one of the embodiments of the present disclosure.
  • In still another aspect of the present disclosure, a computer system is provided, which includes a processor and a memory connected to the processor.
  • Program instructions are stored in the memory, the processor being configured to perform a semantic recognition method according to any one of the embodiments of the present disclosure by loading and executing program instructions in the memory.
  • the computer system may also include other components, such as various input and output components, communication components, etc., as such components may be components of existing computer systems and therefore will not be described again.
  • In summary, in the technical solutions provided by the embodiments of the present disclosure, text information is converted to pinyin during the training phase.
  • The pinyin of each word may be further divided into two parts, an initial and a final, after which word embedding is performed.
  • After the text information has been converted into a sentence vector, it is used to train the neural network.
  • When the service is provided, the text information is likewise converted into a pinyin sequence, and a forward pass of the neural network is then used to obtain the sentence with the highest similarity as the matching result.
  • This can tolerate more erroneous words and removes the interference caused by like-sounding words with different meanings in speech recognition or spelling, while the original network design can be kept unchanged with only simple pre-processing added.
  • The technical solution provided by the embodiments of the present disclosure thus ultimately improves the accuracy of semantic understanding in the entire system, and is a low-cost solution.
  • the semantic recognition method and apparatus and human-machine dialog system may be implemented by hardware, software, firmware, or any combination thereof.
  • the semantic recognition method and apparatus and human-machine dialog system according to embodiments of the present disclosure may be implemented in a centralized manner in one computer system, or in a distributed manner, in which different components are distributed over several interconnected In a computer system.
  • a typical combination of hardware and software can be a general purpose computer system with a computer program.
  • The program code modules in the computer program correspond to the modules in the semantic recognition apparatus according to an embodiment of the present disclosure, and when the computer program is loaded and executed, the computer system is controlled to carry out the operations and functions of each module in the semantic recognition apparatus according to an embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method and apparatus for semantic recognition and a human-machine dialogue system. The method includes: obtaining a pinyin sequence of a sentence to be recognized, the pinyin sequence including a plurality of pinyin segments; obtaining word vectors of the plurality of pinyin segments; combining the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; obtaining, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; determining, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized; and identifying the semantics of the sentence to be recognized as the semantics of the reference sentence.

Description

Pinyin-based semantic recognition method, apparatus, and human-machine dialogue system
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201810354766.8, filed on April 19, 2018, the entire disclosure of which is incorporated herein by reference as a part of this application.
Technical field
Embodiments of the present disclosure relate to the field of human-machine dialogue, and in particular to a semantic recognition method, an apparatus, and a human-machine dialogue system.
Background
With the rapid growth in the number of networked smart devices, devices take a wide variety of forms and offer a wide variety of interaction modes. In particular, with the rise of speech recognition technology companies, speech recognition technology has become increasingly mature and is applied ever more widely. Speech-based human-machine interaction is commonly regarded as the more popular mode of human-machine interaction. In practical use, current speech recognition functions generally convert the speech signal into text information, and a reply is then provided to the user based on the converted text information. In addition, there are text-input-based modes of human-machine interaction, such as text search and online consulting.
Summary
In one aspect of the present disclosure, a method for semantic recognition is provided. In the method, a pinyin sequence of a sentence to be recognized is obtained. The pinyin sequence includes a plurality of pinyin segments. Then, word vectors of the plurality of pinyin segments are obtained. The word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized. Next, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized is obtained using a neural network. Based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized is determined. The semantics of the sentence to be recognized are identified as the semantics of the reference sentence.
In some embodiments of the present disclosure, the pinyin segments are the pinyin of the words in the sentence to be recognized.
In some embodiments of the present disclosure, the pinyin segments are the pinyin letters of the words in the sentence to be recognized.
In some embodiments of the present disclosure, in the step of determining, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized, the distance between the output vector of the sentence to be recognized and the output vectors of candidate reference sentences in a reference sentence set is calculated. When the distance is less than a threshold, the candidate reference sentence is determined to be a reference sentence that is semantically similar to the sentence to be recognized.
In some embodiments of the present disclosure, the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
In some embodiments of the present disclosure, the method further includes training the word embedding model using first training data. The first training data includes pinyin sequences of a plurality of training sentences.
In some embodiments of the present disclosure, the method further includes: obtaining a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set have similar semantics; and, for each set of training sentences: obtaining a word vector of each pinyin segment in the pinyin sequence of each training sentence; combining the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and training the neural network using the sentence vector of each training sentence, such that the neural network produces the same output vector for each training sentence.
In some embodiments of the present disclosure, in the step of obtaining a pinyin sequence of a sentence to be recognized, a pinyin sequence of a sentence to be recognized that a user inputs through a pinyin input method is obtained.
In some embodiments of the present disclosure, in the step of obtaining a pinyin sequence of a sentence to be recognized, voice information of a sentence to be recognized uttered by a user is obtained. Then, speech recognition is performed on the voice information to obtain text information corresponding to the voice information. Next, the text information is converted into the pinyin sequence of the sentence to be recognized.
In another aspect of the present disclosure, an apparatus for semantic recognition is provided. The apparatus includes at least one processor and at least one memory storing a computer program. When the computer program is executed by the at least one processor, the apparatus is caused to: obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence including a plurality of pinyin segments; obtain word vectors of the plurality of pinyin segments; combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized; and identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
In another aspect of the present disclosure, an apparatus for semantic recognition is provided. The apparatus includes: a pinyin sequence obtaining module configured to obtain a pinyin sequence of a sentence to be recognized; a word embedding module configured to obtain word vectors of the plurality of pinyin segments; a sentence vector obtaining module configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; a neural network module configured to obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; and a semantic recognition module configured to determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized, and to identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
In yet another aspect of the present disclosure, a system for human-machine dialogue is further provided, including: an acquisition device configured to acquire a sentence to be recognized from a user; an apparatus for semantic recognition according to any one of the embodiments of the present disclosure; and an output device configured to, in response to determining a reference sentence that is semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
In still another aspect of the present disclosure, a computer readable storage medium is further provided, storing computer executable instructions that, when executed by a computer, cause the computer to perform a method for semantic recognition according to any one of the embodiments of the present disclosure.
In still another aspect of the present disclosure, a computer system is further provided, including a processor and a memory connected to the processor, the memory storing program instructions, the processor being configured to perform a method for semantic recognition according to any one of the embodiments of the present disclosure by loading and executing the program instructions in the memory.
Brief description of the drawings
FIG. 1 shows a schematic structural diagram of an exemplary human-machine dialogue system in which a semantic recognition method and apparatus according to an embodiment of the present disclosure can be implemented;
FIG. 2 shows a schematic dialogue flow diagram of the human-machine dialogue system shown in FIG. 1;
FIG. 3 shows a flowchart of a semantic recognition method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic training process for the word embedding model in a semantic recognition method according to an embodiment of the present disclosure;
FIG. 5 shows a schematic training process for the neural network in a semantic recognition method according to an embodiment of the present disclosure;
FIG. 6 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure; and
FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure.
Detailed description
To enable those skilled in the art to better understand the solutions of the present disclosure, the semantic recognition method, apparatus, and human-machine dialogue system provided by the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings and specific implementations. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
For speech-based human-machine interaction, in practical use it is difficult to guarantee the accuracy of the text information converted by speech recognition. Sometimes the recognition result returns words that sound similar but differ greatly in meaning, which makes the subsequent semantic understanding inaccurate. Speech recognition is the front-end input of the entire dialogue system, so its accuracy has a great influence on the subsequent processing. For example, "这幅画是哪年画的" ("What year was this picture painted") is sometimes recognized as "这幅画是打电话的" ("This picture is making a phone call"), and "哪年画的" ("painted in which year") is sometimes recognized as "那年画的" ("painted in that year"); such errors make it impossible to obtain a correct reply based on the recognized semantics.
In addition, for text-input-based human-machine interaction such as text search, many users of pinyin input methods frequently type the wrong characters. In such cases, because the entered words sound similar but have different meanings, accurate searching also becomes impossible.
There are existing technical solutions for speech recognition post-processing that use deep learning methods. In such solutions, an erroneous-word detection model is mainly used: target words are paired with general words and checked one by one against the erroneous-word-pair features in the model. If the detection result is that a target word is an erroneous word, the erroneous word is replaced with the corresponding general word. The implementation of this method is cumbersome, and the erroneous word pairs must be labeled manually, which further increases the cost.
It can be seen that a technical solution capable of improving the accuracy of speech recognition and pinyin input is needed in the art.
Referring now to FIG. 1, there is shown a schematic structural diagram of an exemplary human-machine dialogue system 100 in which a semantic recognition method and apparatus according to an embodiment of the present disclosure can be implemented.
As shown in FIG. 1, the human-machine dialogue system may include a smart terminal unit 110, a voice recognition server 120, a web server 130, and a semantic server 140.
The smart terminal unit 110 may be a smart terminal such as a personal computer, a smartphone, or a tablet computer. The smart terminal unit 110 may have a voice collection function, so that the user's voice information can be collected; a network communication function, so that the collected voice information can be sent to the voice recognition server 120 for processing and the information recognized by the voice recognition server 120 can be sent to the web server 130; and certain computing and storage capabilities, enabling the storage and computation related to the collection and transmission of voice information and to other functions.
The voice recognition server 120 may be a server computer system with a voice recognition function, which may use a third-party voice recognition service, such as the voice recognition functions provided by companies such as iFLYTEK (科大讯飞) and Baidu. After the smart terminal unit 110 sends the collected voice information to the voice recognition server 120, the voice recognition server 120 performs voice recognition on the voice information, generates the corresponding text information, and returns the text information to the smart terminal unit 110.
In some embodiments, the smart terminal unit 110 may itself have a voice recognition function, in which case the human-machine dialogue system 100 may not include a separate voice recognition server 120.
The web server 130 may be a computer system that has web service functionality and provides a web access interface. The web server 130 receives the text information sent by the smart terminal unit 110 as question information, sends the text information to the semantic server 140, and sends the result returned by the semantic server 140 as the reply to the smart terminal unit 110.
The semantic server 140 may be a computer system with a semantic understanding function, used to process the question information. A matching question is sought by matching the question information against the questions stored in a database that includes question-answer pairs. The question information is recognized by means of the matched question, and the corresponding reply is then returned. The semantic server 140 includes the functionality to provide semantic understanding services as well as the functionality to train the models on which semantic understanding relies. In other embodiments, the semantic server 140 may include only the semantic understanding service function, using a trained model to provide the semantic understanding service, while the training of the model is located on a separate server.
In some embodiments, the web server 130 and the semantic server 140 may be combined into a single server and implemented on a single computer system.
The smart terminal unit 110, the voice recognition server 120, the web server 130, and the semantic server 140 may be communicatively connected to one another through a network. The network may be, for example, any one or more computer networks and/or telecommunication networks, such as the Internet, a local area network, a wide area network, or an intranet.
Referring now to FIG. 2, there is shown a schematic dialogue flow diagram of the human-machine dialogue system shown in FIG. 1. As shown in FIG. 2, the dialogue flow includes the following steps:
In step 201, the smart terminal unit 110 collects voice information through a microphone or the like, and then sends the collected voice information to the voice recognition server 120 through the network.
In step 202, the voice recognition server 120 performs voice recognition on the voice information collected by the smart terminal unit 110, generates text information (for example, Chinese character text or text in another language) as the voice recognition result, and returns it to the smart terminal unit 110.
In step 203, after receiving the text information as the voice recognition result, the smart terminal unit 110 sends it as question information (for example, packaged as question information with a specific format) to the web server 130.
In step 204, the web server 130 extracts the text information from the question information sent by the smart terminal unit 110 as the question text and sends it to the semantic server 140.
In step 205, after receiving the question text, the semantic server 140 performs semantic recognition by matching the question text against the questions in a database that includes question-answer pairs. After finding the best-matching question, the semantic server 140 returns the corresponding reply.
In some embodiments, the semantic recognition method and apparatus according to embodiments of the present disclosure are implemented mainly in the semantic server 140 of the dialogue system 100.
The composition and dialogue flow of the exemplary dialogue system 100 in which the semantic recognition method and apparatus according to embodiments of the present disclosure can be implemented have been described above with reference to the accompanying drawings. It should be noted that the above description is only an example and does not limit the systems in which the present disclosure can be implemented. For example, the web server may also be implemented by another type of server or by a local computer system. Some systems may not include a web server at all; instead, the smart terminal unit communicates directly with the semantic server.
The semantic recognition method and apparatus according to embodiments of the present disclosure may also be implemented in systems other than the dialogue system 100. For example, they may be applied wherever a pinyin input method is used, to semantically recognize text (for example, Chinese text) entered with the pinyin input method. For example, when search text is entered in a browser's search box or text is entered in a word-processing application using a pinyin input method, the semantic recognition method and apparatus according to an embodiment of the present disclosure may be used to semantically recognize the text output by the pinyin input method, so as to identify and/or replace typos. In this case, the system in which the semantic recognition method and apparatus according to an embodiment of the present disclosure may be applied may not include a voice recognition server, but may include: a smart terminal unit for accepting the user's pinyin input and generating corresponding text information; a web server for receiving the text information from the smart terminal unit; and a semantic server for receiving the text information from the web server, semantically recognizing the text information, and returning a semantic recognition result. Accordingly, the smart terminal unit may include a device with a pinyin input method, such as a keyboard or a touch screen, so that text can be entered using the pinyin input method; moreover, the smart terminal unit may not include a voice collection function.
Referring now to FIG. 3, there is shown a flowchart of a semantic recognition method according to an embodiment of the present disclosure. At least part of the method may be executed in, for example, the dialog system 100 shown in FIG. 1 and described above (for example, mainly by the semantic server 140), or in other systems (for example, systems using a pinyin input method).
As shown in FIG. 3, the semantic recognition method according to an embodiment of the present disclosure may include the following steps:
In step 301, a pinyin sequence of the sentence to be recognized is obtained. The pinyin sequence includes a plurality of pinyin segments. Step 301 may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1; in that case, the semantic server 140 may obtain, from the web server 130 or the intelligent terminal unit 110, the text information converted from the user's speech and convert it into the corresponding pinyin sequence. Step 301 may also be executed jointly by, for example, the semantic server 140, the intelligent terminal unit 110, the speech recognition server 120, and the web server 130 of the dialog system 100 shown in FIG. 1.
The sentence to be recognized may include, for example, characters or words of a Chinese sentence, or words of a sentence in another language such as English.
In some exemplary embodiments, step 301 of obtaining the pinyin sequence of the sentence to be recognized includes the sub-step of obtaining the pinyin sequence of the sentence to be recognized entered by a user through a pinyin input method. This sub-step may be executed by, for example, an intelligent terminal unit using a pinyin input method.
In other exemplary embodiments, step 301 of obtaining the pinyin sequence of the sentence to be recognized includes the following sub-steps:
Sub-step 1: obtain voice information of the sentence to be recognized uttered by the user. This sub-step may be executed by, for example, the intelligent terminal unit 110 of the dialog system 100. For example, the intelligent terminal unit 110 may obtain the voice information of the user's utterance "这幅画是哪年画的" ("In which year was this painting painted?").
Sub-step 2: perform speech recognition on the voice information to obtain text information corresponding to the voice information. This sub-step may be executed by, for example, the speech recognition server 120 of the dialog system 100. For example, the speech recognition server 120 may perform speech recognition on the voice information of the sentence "这幅画是哪年画的" and obtain the text information "这幅画是哪年画的".
Sub-step 3: convert the text information into the pinyin sequence of the sentence to be recognized. This sub-step may be executed by, for example, the semantic server 140 of the dialog system 100. For example, the semantic server 140 may receive the text information "这幅画是哪年画的". After word segmentation, the text information is converted into the pinyin sequence "zhe fu hua shi na nian hua de".
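By way of illustration, sub-step 3 could be sketched in Python with the third-party pypinyin package; using this particular package (and eliding the word-segmentation step) is an assumption of this sketch, not something named by the present disclosure.

    # Toy sketch of sub-step 3: Chinese text -> toneless pinyin sequence.
    # Assumes the third-party pypinyin package; word segmentation is elided.
    from pypinyin import lazy_pinyin

    text = "这幅画是哪年画的"
    pinyin_sequence = lazy_pinyin(text)   # one toneless syllable per character
    print(" ".join(pinyin_sequence))      # "zhe fu hua shi na nian hua de"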
In step 302, word vectors of the plurality of pinyin segments are obtained. Step 302 may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or a semantic server in another system.
In some exemplary embodiments, the plurality of pinyin segments are the pinyin segments of each word in the sentence to be recognized. For example, the pinyin segments of the pinyin sequence "zhe fu hua shi na nian hua de" are "zhe", "fu", "hua", "shi", "na", "nian", "hua", and "de".
In other exemplary embodiments, before step 302 the method further includes step 303, in which the pinyin of each word in the sentence to be recognized is split into an initial (shengmu) and a final (yunmu), which serve as the pinyin segments of the pinyin sequence. For example, the syllables "zhe", "fu", "hua", "shi", "na", "nian", "hua", and "de" of the pinyin sequence "zhe fu hua shi na nian hua de" are split into initials and finals, forming the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", and "e".
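A minimal sketch of this splitting step follows. The table of initials is the standard Hanyu Pinyin inventory, but the longest-prefix matching rule (and treating "y"/"w" as initials) is a simplification assumed for illustration only.

    # Sketch of step 303: split each pinyin syllable into initial and final.
    INITIALS = [
        "zh", "ch", "sh",      # two-letter initials must be matched first
        "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
        "j", "q", "x", "r", "z", "c", "s", "y", "w",
    ]

    def split_syllable(syllable):
        """Return (initial, final); syllables without an initial give ('', syllable)."""
        for ini in INITIALS:
            if syllable.startswith(ini):
                return ini, syllable[len(ini):]
        return "", syllable

    segments = []
    for syl in ["zhe", "fu", "hua", "shi", "na", "nian", "hua", "de"]:
        initial, final = split_syllable(syl)
        segments.extend(part for part in (initial, final) if part)
    print(segments)
    # ['zh', 'e', 'f', 'u', 'h', 'ua', 'sh', 'i', 'n', 'a', 'n', 'ian', 'h', 'ua', 'd', 'e']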
In some exemplary embodiments, the word vectors of the plurality of pinyin segments are obtained using a word embedding model. The word embedding model may be an already trained word embedding model, whose training method may be as described later.
The word embedding model may be any word embedding model known in the art. As known in the art, a word embedding model maps words from a vocabulary (which in this application may be Chinese characters, the pinyin of Chinese characters, or the initials and finals of Chinese pinyin, and may also be words of another language such as English) to vectors in a vector space, referred to as word vectors. In embodiments of the present disclosure, the word embedding model receives each pinyin segment of the pinyin sequence as input and outputs the word vector of each pinyin segment. For example, the word embedding model receives the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", and "e" and outputs the word vector of each segment.
In exemplary embodiments of the present disclosure, the word embedding model is a Word2vec model. As known in the art, Word2vec refers to a family of common word embedding models. These models are two-layer neural networks trained to reconstruct the linguistic context of words. Word2vec takes a text corpus as input and produces a vector space, typically of several hundred dimensions. Each word in the corpus is assigned a corresponding vector in that space, i.e., a word vector. The word vectors are distributed in the vector space such that the vectors of words sharing a common context in the corpus lie close to one another.
In step 304, the word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized. Each element of the sentence vector is the word vector of one pinyin segment of the pinyin sequence of the sentence to be recognized, so the sentence vector may be a multi-dimensional vector. For example, the sentence vector of the sentence "这幅画是哪年画的" consists of the word vectors of the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", and "e". Step 304 may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or a semantic server in another system.
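One common way to realize such a sentence vector in code is to stack the segment vectors row by row into a fixed-size, zero-padded matrix. The sketch below assumes this layout and invented dimensions; the disclosure itself does not fix either choice.

    # Sketch of step 304: stack segment word vectors into a sentence matrix.
    import numpy as np

    def build_sentence_matrix(word_vectors, max_len=32, dim=100):
        sentence = np.zeros((max_len, dim), dtype=np.float32)   # zero padding
        for i, vec in enumerate(word_vectors[:max_len]):        # truncate if too long
            sentence[i] = vec
        return sentence

    # e.g. 16 segments for "zh e f u h ua sh i n a n ian h ua d e"
    dummy_vectors = [np.random.rand(100) for _ in range(16)]
    print(build_sentence_matrix(dummy_vectors).shape)           # (32, 100)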
In step 305, an output vector of the sentence to be recognized is obtained using a neural network, based on the sentence vector of the sentence to be recognized. Step 305 may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or a semantic server in another system. The neural network may be stored, for example, in the form of software in the memory of the semantic server.
The neural network may be an already trained neural network, whose training method may be as described later.
The neural network may be any neural network known in the art capable of analyzing and processing natural language, or a combination of several such networks. For example, the neural network may be a deep learning neural network such as a convolutional neural network (CNN) or a long short-term memory (LSTM) network.
Taking a CNN as an example, as known in the art, a CNN typically includes an input layer, several convolutional layers with activation function layers, several sub-sampling layers interleaved with the convolutional layers, and an output layer. The input layer receives the input data. A convolutional layer performs convolution on the data output by the preceding layer; it has weights and biases, where the weights represent a convolution kernel and the bias is a scalar added to the convolutional layer's output. Each convolutional layer may typically include dozens or hundreds of convolution kernels, and a CNN may include multiple convolutional layers. An activation function layer applies a function transformation to the output of the preceding convolutional layer. A sub-sampling layer sub-samples the data from the preceding layer; sub-sampling methods include, but are not limited to, max-pooling, average pooling (avg-pooling), stochastic pooling, decimation (for example, selecting fixed pixels), and demuxout (splitting the input image into multiple smaller images). The output layer may include an activation function and outputs the output data.
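Purely for illustration, the PyTorch sketch below wires together a network of the kind just described, one convolutional layer with an activation, one max-pooling sub-sampling layer, and an output layer; all layer sizes are assumptions of this sketch rather than the architecture of the disclosure.

    # Illustrative CNN mapping a (max_len x dim) sentence matrix to an output vector.
    import torch
    import torch.nn as nn

    class SentenceCNN(nn.Module):
        def __init__(self, dim=100, max_len=32, out_dim=128):
            super().__init__()
            self.conv = nn.Conv1d(dim, 64, kernel_size=3, padding=1)  # convolution layer
            self.act = nn.ReLU()                                      # activation layer
            self.pool = nn.MaxPool1d(2)                               # max-pooling sub-sampling
            self.out = nn.Linear(64 * (max_len // 2), out_dim)        # output layer

        def forward(self, x):                 # x: (batch, max_len, dim)
            x = x.transpose(1, 2)             # -> (batch, dim, max_len) for Conv1d
            x = self.pool(self.act(self.conv(x)))
            return self.out(x.flatten(1))     # one output vector per sentence

    model = SentenceCNN()
    print(model(torch.randn(4, 32, 100)).shape)   # torch.Size([4, 128])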
A neural network typically goes through a training phase and a use phase. In the training phase, the network is trained with training data comprising input data and expected output data: the input data are fed into the network to obtain output data, which are compared with the expected output data in order to adjust the network's internal parameters. In the use phase, the trained network can be used for tasks such as image or semantic recognition, i.e., input data are fed into the trained network to obtain the corresponding output data.
In step 306, a reference sentence semantically similar to the sentence to be recognized is determined based on the output vector of the sentence to be recognized.
In step 307, the semantics of the sentence to be recognized are recognized as the semantics of the reference sentence.
Steps 306 and 307 may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or a semantic server in another system. In some exemplary embodiments, the output layer of the neural network may be used directly to determine, based on the output vector of the sentence to be recognized, the reference sentence semantically similar to it.
When the semantic recognition method is applied in the dialog system 100 shown in FIG. 1, the reference sentences may be, for example, question sentences from a database of question-reply pairs. The database may include a large number of question sentences likely to arise in the dialog system 100, together with the reply corresponding to each question sentence, and may be stored, for example, in a memory associated with or accessible to the semantic server 140. The output vector of the sentence to be recognized is obtained with the neural network in step 305. In addition, the sentence vector of each question sentence in the database (obtainable through step 304 above) is fed into the neural network to obtain the output vector of each question sentence. Then, by comparing the output vector of the sentence to be recognized with the output vector of each question sentence, it is judged whether the sentence to be recognized is semantically similar to some question sentence. If so, the reply corresponding to that question sentence can be fetched from the database and provided to the user as the reply to the sentence to be recognized.
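A toy version of this database matching might look as follows; the vector dimensions, the reply table, and the threshold are placeholders of this sketch, and cosine distance anticipates the distance test of step 306 described below.

    # Illustrative matching of a query output vector against a question database.
    import numpy as np

    def cosine_distance(u, v):
        return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    question_vectors = [np.random.rand(128) for _ in range(1000)]  # precomputed once
    replies = [f"reply #{i}" for i in range(1000)]

    def answer(query_vector, threshold=0.2):
        distances = [cosine_distance(query_vector, q) for q in question_vectors]
        best = int(np.argmin(distances))
        if distances[best] < threshold:      # similarity test of step 306
            return replies[best]             # reply of the matched question
        return None                          # no semantically similar question found

    print(answer(np.random.rand(128)))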
When the semantic recognition method is applied in a system using a pinyin input method, such as a search system, the reference sentences may be, for example, search sentences from the search system's search sentence library, which may include a large number of search sentences likely to arise in that system. The output vector of the sentence to be recognized is obtained with the neural network in step 305. In addition, the sentence vector of each search sentence in the library (obtainable through the same steps described above) is fed into the neural network to obtain the output vector of each search sentence. Then, by comparing the output vector of the sentence to be recognized with the output vector of each search sentence, it is judged whether the sentence to be recognized is semantically similar to some search sentence. If so, that search sentence can be presented to the user to replace the entered search sentence, which may contain wrong pinyin.
In some exemplary embodiments, step 306 of identifying whether the sentence to be recognized and a reference sentence are semantically similar by comparing their output vectors may include the following sub-steps:
Sub-step 1: calculate the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set.
Sub-step 2: when the distance is less than a threshold, determine the candidate reference sentence as the reference sentence semantically similar to the sentence to be recognized.
Any method known in the art, such as cosine distance (also called cosine similarity), Euclidean distance, or Mahalanobis distance, may be used to calculate the distance between the output vector of the sentence to be recognized and the output vector of the candidate reference sentence.
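For instance, with SciPy's distance helpers the threshold test of sub-step 2 can be written as below; the threshold value is an arbitrary assumption and would in practice be tuned on validation data.

    # Distance test between two output vectors (sub-steps 1 and 2).
    import numpy as np
    from scipy.spatial.distance import cosine, euclidean

    u = np.random.rand(128)   # output vector of the sentence to be recognized
    v = np.random.rand(128)   # output vector of a candidate reference sentence

    THRESHOLD = 0.2           # assumed value
    if cosine(u, v) < THRESHOLD:   # SciPy's cosine() returns a distance, not a similarity
        print("candidate accepted as semantically similar")
    print("euclidean distance:", euclidean(u, v))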
As noted above, the word embedding model used in step 302 may be an already trained word embedding model, and the neural network used in step 305 may be an already trained neural network. Therefore, in some exemplary embodiments, the semantic recognition method may further include a training process for the word embedding model and a training process for the neural network. The training process for the word embedding model may be completed before step 302, in which the model is used, and the training process for the neural network may be completed before step 305, in which the network is used. These training processes may be executed by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or by a semantic server in another system.
With the technical solutions according to embodiments of the present disclosure, pinyin sequences whose pronunciation is highly similar to that of the words in the sentence to be recognized can be matched, removing the interference caused during speech recognition or typing by words that sound the same but differ in meaning. This improves the accuracy of speech understanding and pinyin input. Moreover, the preprocessing required by the technical solutions according to embodiments of the present disclosure is simple and efficient, making this a low-cost solution.
Referring now to FIG. 4, there is shown a schematic training process for the word embedding model in the semantic recognition method according to an embodiment of the present disclosure. As shown in FIG. 4, in some exemplary embodiments, the training process for the word embedding model includes the following step:
In step 401, the word embedding model is trained with first training data, the first training data comprising pinyin sequences of a plurality of training sentences.
The first training data may be generated, for example, by taking a large number of sentences from a text corpus, converting each sentence into a pinyin sequence, and obtaining a plurality of pinyin segments of the pinyin sequence of each sentence. The pinyin segments may be, for example, the pinyin of each word (or character), or the segments formed by further splitting the pinyin of each word (or character) into initials and finals.
The text corpus may be, for example, a corpus for a specific kind of dialog system, in which case the sentences in the corpus are those used in that kind of dialog system. For example, the corpus for a dialog system providing technical support for a certain product or product category will include the various sentences used during technical support for that product or category. Alternatively, the text corpus may be a corpus of sentences used in some other setting, or a corpus of common sentences in a certain language (for example, Chinese or English).
As understood by those skilled in the art, during training of the word embedding model, the pinyin segments of the pinyin sequence of each sentence in the first training data are fed into the model, and the model outputs the word vector of each segment. In this process, the parameters of the word embedding model are continually adjusted so that the word vectors of pinyin segments sharing a common context in the first training data (for example, appearing in the same sentence within a specified distance of each other) lie closer together in the vector space. When training is complete, the trained word embedding model can output close word vectors for pinyin segments with a common context, and can then be used in step 302.
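A minimal gensim sketch of this training step is shown below; the two-sentence corpus is a toy stand-in for the first training data, and the hyperparameters are assumptions of the sketch.

    # Sketch of step 401: train Word2vec on pinyin-segment "sentences".
    from gensim.models import Word2Vec

    corpus = [
        ["zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e"],
        ["zh", "e", "f", "u", "h", "ua", "sh", "i", "sh", "ui", "h", "ua", "d", "e"],
    ]
    model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, epochs=50)
    print(model.wv["zh"].shape)   # (100,) -- word vector of the segment "zh"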
Referring now to FIG. 5, there is shown a schematic training process for the neural network in the semantic recognition method according to an embodiment of the present disclosure. As shown in FIG. 5, in some exemplary embodiments, the training process for the neural network includes the following steps:
In step 501, a pinyin sequence of each training sentence in at least one group of training sentences is obtained.
The training sentences within each group are semantically similar. For example, the training sentences "这幅画是谁画的" ("Who painted this painting?") and "这幅画的作者是谁" ("Who is the author of this painting?") form one group of semantically similar training sentences. The at least one group of training sentences may come, for example, from a text corpus. The text corpus may be a corpus for a specific kind of dialog system, in which case the sentences in the corpus are those used in that kind of dialog system; for example, the corpus for a dialog system providing technical support for a certain product or product category will include the various sentences used during technical support for that product or category. Alternatively, the corpus may be a corpus of sentences used in some other setting, or a corpus of common sentences in a certain language (for example, Chinese or English).
After the at least one group of training sentences is obtained from, for example, the text corpus, each training sentence can be converted into a pinyin sequence, and the plurality of pinyin segments of each training sentence's pinyin sequence are then obtained. The pinyin segments may be, for example, the pinyin of each word (or character), or the segments formed by further splitting the pinyin of each word (or character) into initials and finals.
In step 502, the word vector of each pinyin segment in the pinyin sequence of each training sentence is obtained, using the word embedding model, which may be, for example, the model trained in step 401 above.
In step 503, the word vectors of the pinyin segments of each training sentence are combined into the sentence vector of that training sentence. Each element of a training sentence's sentence vector is the word vector of one pinyin segment of its pinyin sequence, so the sentence vector may be a multi-dimensional vector.
In step 504, the neural network is trained with the sentence vector of each training sentence in the at least one group. During training, the sentence vector of each training sentence in a group of semantically similar training sentences is fed into the network to obtain the network's output, and the network's internal parameters are adjusted with the goal of making the outputs for all training sentences in the group identical. After training on a large number of training sentences, the network can output identical or similar results for sentences that are semantically identical or similar but textually different, thereby acquiring the ability to recognize semantics.
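The PyTorch sketch below illustrates this objective under stated assumptions: the encoder is a stand-in for the network of step 305 rather than the patented architecture, and minimizing the mean-squared error between the outputs of a paraphrase pair is just one simple way to push them toward identical vectors. A practical system would also separate non-paraphrases (for example, with a contrastive loss) so that the network cannot collapse to a constant output.

    # Sketch of step 504: make paraphrased sentences yield the same output vector.
    import torch
    import torch.nn as nn

    encoder = nn.Sequential(           # stand-in for the network of step 305
        nn.Flatten(),                  # (batch, 32, 100) -> (batch, 3200)
        nn.Linear(32 * 100, 256),
        nn.ReLU(),
        nn.Linear(256, 128),           # 128-dim output vector per sentence
    )
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # each batch pairs sentence matrices of two paraphrased training sentences,
    # e.g. "这幅画是谁画的" and "这幅画的作者是谁" (random tensors stand in here)
    sent_a = torch.randn(8, 32, 100)
    sent_b = torch.randn(8, 32, 100)

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(encoder(sent_a), encoder(sent_b))  # pull outputs together
        loss.backward()
        optimizer.step()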
The semantic recognition method according to embodiments of the present disclosure has been described above with reference to the drawings. It should be noted that the above description is merely an example and does not limit the present disclosure. In other embodiments, the method may have more, fewer, or different steps, and the order, inclusion, and functional relationships among the steps may differ from those described and illustrated. For example, multiple functions usually completed in one step may be performed by multiple separate steps, multiple steps performing different functions may be merged into one step performing those functions, and some steps may be executed in any order or in parallel. All such variations fall within the spirit and scope of the present disclosure.
In another aspect of the present disclosure, a semantic recognition apparatus is provided. Referring now to FIG. 6, there is shown a schematic structural block diagram of a semantic recognition apparatus 600 according to an embodiment of the present disclosure. The functions or operations performed by the components of the semantic recognition apparatus 600 correspond to at least some of the steps of the semantic recognition method according to embodiments of the present disclosure described above. For brevity, some details repeating the above description are omitted below; a more detailed understanding of the apparatus 600 can therefore be obtained by referring to the description above. In some embodiments, the semantic recognition apparatus is implemented by, for example, the semantic server 140 of the dialog system 100 shown in FIG. 1, or by a semantic server in another system. Specifically, the apparatus may be implemented by, for example, a combination of general-purpose computer hardware (such as a processor and memory) implementing the semantic server and semantic recognition software: when the memory loads the semantic recognition software into the processor and the processor executes it, the components of the semantic recognition apparatus are formed and perform their functions or operations.
As shown in FIG. 6, the semantic recognition apparatus 600 according to an embodiment of the present disclosure includes: a pinyin sequence obtaining module 601, a word embedding module 602, a sentence vector obtaining module 603, a neural network module 604, and a semantic recognition module 605.
The pinyin sequence obtaining module 601 is configured to obtain a pinyin sequence of a sentence to be recognized.
The word embedding module 602 is configured to obtain word vectors of the plurality of pinyin segments.
The sentence vector obtaining module 603 is configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized.
The neural network module 604 is configured to obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network.
The semantic recognition module 605 is configured to determine, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized, and to recognize the semantics of the sentence to be recognized as the semantics of the reference sentence.
In some exemplary embodiments, the pinyin segments are the pinyin of the words in the sentence to be recognized.
In some exemplary embodiments, the semantic recognition apparatus further includes:
a splitting module 606 configured to split the pinyin corresponding to the words of the sentence to be recognized in the pinyin sequence into initials and finals, which serve as the pinyin segments of the pinyin sequence.
In some exemplary embodiments, the semantic recognition module 605 is further configured to:
calculate the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set; and
when the distance is less than a threshold, determine the candidate reference sentence as the reference sentence semantically similar to the sentence to be recognized.
In some exemplary embodiments, the word embedding model is a Word2vec model.
In some exemplary embodiments, the word embedding module is further configured to be trained with first training data, the first training data comprising pinyin sequences of a plurality of training sentences.
In some exemplary embodiments, the pinyin sequence obtaining module 601 is further configured to obtain a pinyin sequence of the words in each training sentence in at least one group of second training sentences, wherein the training sentences within each group of second training sentences are semantically similar.
The word embedding module 602 is further configured to obtain the word vector of each pinyin segment in the pinyin sequence of each training sentence.
The sentence vector obtaining module 603 is further configured to combine the word vectors of the pinyin segments of each training sentence into the sentence vector of that training sentence.
The neural network module 604 is further configured to train the neural network with the sentence vector of each training sentence so that the network produces the same output vector for every training sentence.
In some exemplary embodiments, the pinyin sequence obtaining module 601 is further configured to obtain the pinyin sequence of a sentence to be recognized entered by a user through a pinyin input method.
In another aspect of the present disclosure, a semantic recognition apparatus is provided. FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus 700 according to an embodiment of the present disclosure.
As shown in FIG. 7, the apparatus 700 may include a processor 701 and a memory 702 storing a computer program. When the computer program is executed by the processor 701, the apparatus 700 can perform the steps of the semantic recognition method shown in FIG. 3. That is, the apparatus 700 can obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence comprising a plurality of pinyin segments; obtain the word vectors of the plurality of pinyin segments; combine those word vectors into the sentence vector of the sentence to be recognized; obtain, based on the sentence vector, the output vector of the sentence to be recognized using a neural network; determine, based on the output vector, a reference sentence semantically similar to the sentence to be recognized; and recognize the semantics of the sentence to be recognized as the semantics of the reference sentence.
In some embodiments of the present disclosure, the processor 701 may be, for example, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), or a processor with a multi-core architecture. The memory 702 may be any type of memory implemented with data storage technology, including but not limited to random access memory, read-only memory, semiconductor-based memory, flash memory, and magnetic disk memory.
Furthermore, in some embodiments of the present disclosure, the apparatus 700 may also include an input device 703, such as a keyboard, mouse, or microphone, for inputting the sentence to be recognized, and an output device 704, such as a display, for outputting the reply.
In some embodiments of the present disclosure, the apparatus 700 may determine, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to it by: calculating the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set; and, when the distance is less than a threshold, determining the candidate reference sentence as the reference sentence semantically similar to the sentence to be recognized.
In some embodiments of the present disclosure, the apparatus 700 may also train the word embedding model with first training data, the first training data comprising pinyin sequences of a plurality of training sentences.
In some embodiments of the present disclosure, the apparatus 700 may also obtain a pinyin sequence of each training sentence in at least one group of training sentences, the training sentences within each group being semantically similar. For each group of training sentences, the apparatus 700 may further: obtain the word vector of each pinyin segment in the pinyin sequence of each training sentence; combine the word vectors of the pinyin segments of each training sentence into that sentence's sentence vector; and train the neural network with the sentence vector of each training sentence so that the network produces the same output vector for every training sentence.
In some embodiments of the present disclosure, the apparatus 700 may obtain the pinyin sequence of the sentence to be recognized by obtaining a pinyin sequence of the sentence to be recognized entered by a user through a pinyin input method.
In some embodiments of the present disclosure, the apparatus 700 may obtain the pinyin sequence of the sentence to be recognized by: obtaining voice information of the sentence to be recognized uttered by the user; performing speech recognition on the voice information to obtain corresponding text information; and converting the text information into the pinyin sequence of the sentence to be recognized.
The semantic recognition apparatus according to embodiments of the present disclosure has been described above with reference to the drawings. It should be noted that the above description is merely an example and does not limit the present disclosure. In other embodiments, the apparatus may have more, fewer, or different modules, and the connection, inclusion, and functional relationships among the modules may differ from those described and illustrated. For example, multiple functions usually performed by one module may be performed by multiple separate modules, multiple modules performing different functions may be merged into one module performing those functions, and a function performed by one module may be performed by another. All such variations fall within the spirit and scope of the present disclosure.
In yet another aspect of the present disclosure, a human-machine dialog system is provided. The system may be, for example, the human-machine dialog system 100 shown in FIG. 1, or a part or variant thereof.
According to embodiments of the present disclosure, the human-machine dialog system may include: an acquisition device, the semantic recognition apparatus 600, 700 according to any embodiment of the present disclosure, and an output device.
The acquisition device is configured to acquire a sentence to be recognized from a user.
The output device is configured to, in response to determining a reference sentence semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
In a further aspect of the present disclosure, a computer-readable storage medium is provided, storing computer-executable instructions that, when executed by a computer, cause the computer to perform the semantic recognition method according to any embodiment of the present disclosure.
In another aspect of the present disclosure, a computer system is provided, comprising a processor and a memory connected to the processor. The memory stores program instructions, and the processor is configured to perform the semantic recognition method according to any embodiment of the present disclosure by loading and executing the program instructions in the memory. As those skilled in the art will understand, the computer system may also include other components, such as various input/output and communication components; since these may be components of existing computer systems, they are not described further here.
It can be seen that, in embodiments of the present disclosure, text information is converted into pinyin during the training phase; in some embodiments, the pinyin of each word is further split into an initial part and a final part, after which word embedding is performed. After the text information is converted into sentence vectors, the neural network is trained on them. When the service is provided, text information is converted into a pinyin sequence, and a forward pass through the neural network yields the sentence with the highest similarity as the matching result. This accommodates more wrong-word cases and removes the interference caused during speech recognition or typing by words with different meanings. Moreover, the original network design can be kept unchanged; only simple preprocessing needs to be added. The technical solutions provided by embodiments of the present disclosure ultimately improve the accuracy of semantic understanding in the whole system, at low cost.
The semantic recognition method and apparatus and the human-machine dialog system according to embodiments of the present disclosure may be implemented by hardware, software, firmware, or any combination thereof. They may be implemented in a centralized manner in one computer system, or in a distributed manner in which different components are spread across several interconnected computer systems. A typical combination of hardware and software may be a general-purpose computer system with a computer program whose program code modules correspond to the modules of the semantic recognition apparatus according to embodiments of the present disclosure; when the computer program is loaded and executed, it controls the computer system to perform the operations and functions of those modules.
It is to be understood that the above embodiments of the present disclosure are merely exemplary embodiments employed to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. Those of ordinary skill in the art may make various modifications and improvements without departing from the spirit and essence of the present disclosure, and such modifications and improvements are also considered to be within the protection scope of the present disclosure, which is defined only by the meaning of the language of the appended claims and equivalents thereof.

Claims (21)

  1. A method for semantic recognition, comprising:
    obtaining a pinyin sequence of a sentence to be recognized, the pinyin sequence comprising a plurality of pinyin segments;
    obtaining word vectors of the plurality of pinyin segments;
    combining the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized;
    obtaining, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network;
    determining, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized; and
    recognizing the semantics of the sentence to be recognized as the semantics of the reference sentence.
  2. The method according to claim 1, wherein
    the pinyin segments are the pinyin of words in the sentence to be recognized.
  3. The method according to claim 1, wherein
    the pinyin segments are the pinyin letters of words in the sentence to be recognized.
  4. The method according to claim 1, wherein determining, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized comprises:
    calculating a distance between the output vector of the sentence to be recognized and an output vector of a candidate reference sentence in a reference sentence set; and
    when the distance is less than a threshold, determining the candidate reference sentence as the reference sentence semantically similar to the sentence to be recognized.
  5. The method according to claim 1, wherein the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
  6. The method according to claim 5, further comprising:
    training the word embedding model with first training data, wherein the first training data comprise pinyin sequences of a plurality of training sentences.
  7. The method according to claim 1, further comprising:
    obtaining a pinyin sequence of each training sentence in at least one group of training sentences, wherein the training sentences in each group are semantically similar; and
    for each group of training sentences:
    obtaining a word vector of each pinyin segment in the pinyin sequence of each training sentence;
    combining the word vectors of the pinyin segments of each training sentence into a sentence vector of the training sentence; and
    training the neural network with the sentence vector of each training sentence such that the neural network produces the same output vector for every training sentence.
  8. The method according to claim 1, wherein obtaining the pinyin sequence of the sentence to be recognized comprises:
    obtaining a pinyin sequence of the sentence to be recognized entered by a user through a pinyin input method.
  9. The method according to claim 1, wherein obtaining the pinyin sequence of the sentence to be recognized comprises:
    obtaining voice information of the sentence to be recognized uttered by a user;
    performing speech recognition on the voice information to obtain text information corresponding to the voice information; and
    converting the text information into the pinyin sequence of the sentence to be recognized.
  10. An apparatus for semantic recognition, comprising:
    at least one processor; and
    at least one memory storing a computer program;
    wherein, when the computer program is executed by the at least one processor, the apparatus is caused to:
    obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence comprising a plurality of pinyin segments;
    obtain word vectors of the plurality of pinyin segments;
    combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized;
    obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network;
    determine, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized; and
    recognize the semantics of the sentence to be recognized as the semantics of the reference sentence.
  11. The apparatus according to claim 10, wherein
    the pinyin segments are the pinyin of words in the sentence to be recognized.
  12. The apparatus according to claim 10, wherein
    the pinyin segments are the pinyin letters of words in the sentence to be recognized.
  13. The apparatus according to claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to determine, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized by:
    calculating a distance between the output vector of the sentence to be recognized and an output vector of a candidate reference sentence in a reference sentence set; and
    when the distance is less than a threshold, determining the candidate reference sentence as the reference sentence semantically similar to the sentence to be recognized.
  14. The apparatus according to claim 10, wherein the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
  15. The apparatus according to claim 14, wherein the computer program, when executed by the at least one processor, further causes the apparatus to:
    train the word embedding model with first training data, wherein the first training data comprise pinyin sequences of a plurality of training sentences.
  16. The apparatus according to claim 10, wherein the computer program, when executed by the at least one processor, further causes the apparatus to:
    obtain a pinyin sequence of each training sentence in at least one group of training sentences, wherein the training sentences in each group are semantically similar; and
    for each group of training sentences:
    obtain a word vector of each pinyin segment in the pinyin sequence of each training sentence;
    combine the word vectors of the pinyin segments of each training sentence into a sentence vector of the training sentence; and
    train the neural network with the sentence vector of each training sentence such that the neural network produces the same output vector for every training sentence.
  17. The apparatus according to claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to obtain the pinyin sequence of the sentence to be recognized by:
    obtaining a pinyin sequence of the sentence to be recognized entered by a user through a pinyin input method.
  18. The apparatus according to claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to obtain the pinyin sequence of the sentence to be recognized by:
    obtaining voice information of the sentence to be recognized uttered by a user;
    performing speech recognition on the voice information to obtain text information corresponding to the voice information; and
    converting the text information into the pinyin sequence of the sentence to be recognized.
  19. An apparatus for semantic recognition, comprising:
    a pinyin sequence obtaining module configured to obtain a pinyin sequence of a sentence to be recognized;
    a word embedding module configured to obtain word vectors of the plurality of pinyin segments;
    a sentence vector obtaining module configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized;
    a neural network module configured to obtain, based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized using a neural network; and
    a semantic recognition module configured to determine, based on the output vector of the sentence to be recognized, a reference sentence semantically similar to the sentence to be recognized, and to recognize the semantics of the sentence to be recognized as the semantics of the reference sentence.
  20. A system for human-machine dialog, comprising:
    an acquisition device configured to acquire a sentence to be recognized from a user;
    the apparatus for semantic recognition according to any one of claims 10-18; and
    an output device configured to, in response to determining a reference sentence semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
  21. A computer-readable storage medium storing computer-executable instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-9.
PCT/CN2018/117626 2018-04-19 2018-11-27 Pinyin-based semantic recognition method and apparatus, and human-machine dialog system WO2019200923A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/464,381 US11100921B2 (en) 2018-04-19 2018-11-27 Pinyin-based method and apparatus for semantic recognition, and system for human-machine dialog

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810354766.8 2018-04-19
CN201810354766.8A CN108549637A (zh) 2018-04-19 Pinyin-based semantic recognition method and apparatus, and human-machine dialog system

Publications (1)

Publication Number Publication Date
WO2019200923A1 true WO2019200923A1 (zh) 2019-10-24

Family

ID=63515638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117626 WO2019200923A1 (zh) 2018-04-19 2018-11-27 基于拼音的语义识别方法、装置以及人机对话系统

Country Status (3)

Country Link
US (1) US11100921B2 (zh)
CN (1) CN108549637A (zh)
WO (1) WO2019200923A1 (zh)


Also Published As

Publication number Publication date
US20210264903A9 (en) 2021-08-26
US20200335096A1 (en) 2020-10-22
CN108549637A (zh) 2018-09-18
US11100921B2 (en) 2021-08-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18915564

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18915564

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2021)
