WO2023137903A1 - Reply statement determination method and apparatus based on rough semantics, and electronic device - Google Patents

Info

Publication number
WO2023137903A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
rough
feature vector
voice information
reply
Prior art date
Application number
PCT/CN2022/090129
Other languages
French (fr)
Chinese (zh)
Inventor
舒畅 (Shu Chang)
陈又新 (Chen Youxin)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2023137903A1 publication Critical patent/WO2023137903A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/268 Morphological analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a method, device and electronic equipment for determining reply sentences based on rough semantics.
  • the previous round of dialogue text is usually encoded, the hidden layer state feature of the resulting encoded information is used as one of the inputs of the decoder, and the dialogue reply is then automatically generated by the decoder in time order.
  • the hidden layer state features encoded by the previous round of dialogue text are used as one of the basis for the generation of reply sentences in the current round of dialogue, so that the reply sentence generation process includes the information characteristics of the previous round of dialogue.
  • the embodiments of the present application provide a method, device and electronic device for determining reply sentences based on rough semantics, which can simultaneously extract key information and rough information in the previous round of dialogue, and then make the generated reply sentences more accurate.
  • the implementation of the present application provides a method for determining a reply sentence based on rough semantics, including:
  • the previous round of voice information adjacent to the voice information is obtained, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
  • the embodiment of the present application provides a device for determining reply sentences based on rough semantics, including:
  • the acquisition module is used to obtain the previous round of voice information adjacent to the voice information according to the occurrence time of the voice information at the current moment of the user, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
  • the processing module is used to perform rough semantic extraction on the voice information according to the previous round of voice information, obtain rough semantic features corresponding to the voice information, perform word segmentation processing on the voice information, obtain keyword groups, and perform multiple hidden feature extraction processing on the keyword groups to obtain initial hidden layer state feature vectors;
  • the generation module is used to perform multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word, and to splice the obtained at least one reply word according to the generation order of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  • an embodiment of the present application provides an electronic device, which includes a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the one or more programs include instructions for performing the following steps:
  • the previous round of voice information adjacent to the voice information is acquired, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
  • the at least one reply word is spliced according to the generation sequence of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following steps:
  • the previous round of voice information adjacent to the voice information is acquired, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
  • the at least one reply word is spliced according to the generation sequence of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  • an embodiment of the present application provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer is operable to cause the computer to execute the method in the first aspect.
  • the semantic features that can contain high-level abstract information in the previous round of voice information are obtained, which are used as the rough semantic features of the user's current voice information, thereby realizing synchronous extraction of key information and rough information in the previous round of voice information.
  • word segmentation is performed on the voice information of the user at the current moment, and multiple hidden feature extraction processes are performed on the obtained multiple keywords to obtain the initial hidden layer state feature vector of the voice information of the user at the current moment.
  • multiple reply words are generated according to the rough semantic features and the initial hidden layer state feature vector, and the obtained at least one reply word is spliced according to the generation order of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  • the rough semantic features that contain both key information and rough information in the previous round of dialogue are used as one of the basis for generating reply sentences in this round of dialogue, so that the reply sentence generation process includes more comprehensive information features of the previous round of dialogue.
  • the generated reply sentences are more accurate, can better fit with the main body of the dialogue, and improve user experience.
  • FIG. 1 is a schematic diagram of the hardware structure of an apparatus for determining a reply sentence based on rough semantics provided in an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for determining a reply sentence based on rough semantics provided in an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a method for extracting rough semantics from voice information based on the previous round of voice information to obtain rough semantic features corresponding to the voice information provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a gated recurrent unit encoder provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a multi-layer perceptron provided in an embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of a method for inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of speech information provided by an embodiment of the present application;
  • FIG. 7 is a block flow diagram of a reply word generation process provided by an embodiment of the present application.
  • FIG. 8 is a block diagram of functional modules of a device for determining a reply sentence based on rough semantics provided in an embodiment of the present application
  • FIG. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a hardware structure of an apparatus for determining a reply sentence based on rough semantics provided by an embodiment of the present application.
  • the apparatus 100 for determining reply sentences based on rough semantics includes at least one processor 101 , a communication line 102 , a memory 103 and at least one communication interface 104 .
  • the processor 101 may be a general-purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
  • the communication line 102, which may include a path, transmits information between the aforementioned components.
  • the communication interface 104 may be any device such as a transceiver (such as an antenna) for communicating with other devices or communication networks, such as Ethernet, RAN, wireless local area networks (wireless local area networks, WLAN) and the like.
  • the memory 103 may be a read-only memory (read-only memory, ROM) or other type of static storage device capable of storing static information and instructions, a random access memory (random access memory, RAM) or other type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), a compact disc read-only memory (compact disc read-only memory, CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without limitation.
  • the memory 103 may exist independently and be connected to the processor 101 through the communication line 102 .
  • the memory 103 can also be integrated with the processor 101 .
  • the memory 103 provided in this embodiment of the present application may generally be non-volatile.
  • the memory 103 is used to store computer-executed instructions for implementing the solutions of the present application, and the execution is controlled by the processor 101 .
  • the processor 101 is configured to execute computer-executed instructions stored in the memory 103, so as to implement the methods provided in the following embodiments of the present application.
  • computer-executed instructions may also be referred to as application code, which is not specifically limited in the present application.
  • the processor 101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 1 .
  • the apparatus 100 for determining a reply sentence based on rough semantics may include multiple processors, such as the processor 101 and the processor 107 in FIG. 1 .
  • Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the device 100 for determining the reply statement based on rough semantics is a server, for example, it can be an independent server, or it can be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (Content Delivery Network, CDN), and big data and artificial intelligence platforms.
  • the apparatus 100 for determining a reply sentence based on rough semantics may further include an output device 105 and an input device 106 .
  • Output device 105 is in communication with processor 101 and may display information in a variety of ways.
  • the output device 105 may be a liquid crystal display (liquid crystal display, LCD), a light emitting diode (light emitting diode, LED) display device, a cathode ray tube (cathode ray tube, CRT) display device, or a projector (projector), etc.
  • the input device 106 communicates with the processor 101 and can receive user input in various ways.
  • the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
  • the apparatus 100 for determining a reply sentence based on rough semantics may be a general-purpose device or a special-purpose device.
  • the embodiment of the present application does not limit the type of the device 100 for determining a reply sentence based on rough semantics.
  • artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the method for determining reply sentences based on rough semantics in this application can be applied to scenarios such as telephone consultation, e-commerce sales, offline physical sales, business promotion, outbound calls by agents, and promotion on social platforms.
  • the telephone consultation scenario is used as an example to illustrate the method for determining the reply sentence based on rough semantics.
  • the method for determining the reply sentence based on rough semantics in other scenarios is similar to that in the telephone consultation scenario, and will not be described here.
  • FIG. 2 is a schematic flowchart of a method for determining a reply sentence based on rough semantics provided in an embodiment of the present application.
  • the method for determining reply sentences based on rough semantics includes the following steps:
  • the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest.
  • the previous round of voice information is the last sentence spoken by the user before the voice information at the current moment.
  • the previous round of voice information can be determined by querying historical dialogue data that records dialogue data generated before the current time by the dialogue event to which the user's current voice information belongs, based on the occurrence time of the user's current voice information.
  • two interrelated sentence queues can be saved in the historical dialogue data, one of which is used to store the user statements issued by the user, and the other is used to store the reply statements made by the AI to the user's statements.
  • each user statement in the user statement queue and each reply statement in the reply statement queue includes a dialogue identifier and a dialogue occurrence time; the dialogue identifier forms the user statement and the reply statement with the same identifier into a question-answer pair, that is, the reply statement with the same dialogue identifier is the reply to that user statement.
  • the question-and-answer logic in the historical dialogue data can be guaranteed, and the sentences of the user and AI can be saved separately for easy search.
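As a rough sketch, the two interrelated queues and their dialogue identifiers might be modelled as follows; the field names and the sample utterances are illustrative assumptions, not taken from the application:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    dialogue_id: int    # the same id links a user statement to its reply
    occurred_at: float  # dialogue occurrence time (e.g. a Unix timestamp)
    text: str

# One queue for user statements, one for the AI's replies.
user_queue = [
    Utterance(1, 100.0, "What are your opening hours?"),
    Utterance(2, 160.0, "Is parking available?"),
]
reply_queue = [
    Utterance(1, 105.0, "We open at 9 a.m. every day."),
    Utterance(2, 166.0, "Yes, there is an on-site car park."),
]

def previous_round(queue, current_time):
    """Return the user utterance closest in time before `current_time`."""
    earlier = [u for u in queue if u.occurred_at < current_time]
    return min(earlier, key=lambda u: abs(current_time - u.occurred_at)) if earlier else None

def reply_for(utterance, replies):
    """Pair a user statement with the reply sharing its dialogue identifier."""
    return next((r for r in replies if r.dialogue_id == utterance.dialogue_id), None)

prev = previous_round(user_queue, 200.0)
print(prev.text)                        # "Is parking available?"
print(reply_for(prev, reply_queue).text)
```

Storing the statements in two queues keyed by a shared identifier keeps the question-answer logic intact while letting either side be searched on its own.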
  • rough semantic features can be understood as semantic features that include high-level abstract information in the previous round of speech information.
  • multiple high-order parallel sequences can be obtained by actively constructing a high-level coarse sequence representation and then analyzing it. Then, through the layered structure, the low-order coarse sequence is generated first, so that the information in the multiple high-order parallel sequences flows into the low-order coarse sequence, realizing synchronous extraction of the key information and the rough information in the speech information and allowing information at multiple levels to be presented simultaneously.
  • the model that generates reply sentences can also better remember and understand long-term content, and then generate meaningful replies that are closely related to the topic, improving user experience.
  • this embodiment provides a method for performing rough semantic extraction on voice information according to the previous round of voice information, and obtaining rough semantic features corresponding to the voice information, as shown in FIG. 3 , the method includes:
  • 301 Detect the previous round of voice information, and obtain at least one first word contained in the previous round of voice information.
  • the detection process may be to perform word segmentation after converting the previous round of speech information into text, and then take all the words obtained through the word segmentation processing as the at least one first word.
  • each of the at least one first word may include a word tag, and the word tag may be part-of-speech information of the corresponding first word, for example: noun, verb, named entity, and the like.
  • the named entity information in the text converted from the speech can be extracted through a conditional random field model (Conditional Random Fields, CRF), and the type of the named entity, such as the name of a person or the name of an institution, can be marked through the CRF.
  • a part-of-speech (POS) tagging tool can also be used for noun recognition and extraction, and then the first word containing part-of-speech information can be obtained.
  • at least one first word obtained by word segmentation can be input into a gate recurrent unit (Gate Recurrent Unit, GRU) encoder for encoding to obtain a second hidden layer state feature vector. The second hidden layer state feature vector is then input into a multilayer perceptron (MultiLayer Perceptron, MLP) to obtain a linear output result. Finally, the linear output result is input into the tense classifier to obtain the tense information of the previous round of speech information.
  • the structure of the GRU is shown in Figure 4; it includes a reset gate r_t, an update gate z_t, a candidate memory unit h̃_t, and the current memory unit h_t.
  • the operating logic of the reset gate r_t can be expressed by formula 1: r_t = σ(W_r x_t + U_r h_{t-1} + b_r), where σ is the activation function; W_r and U_r are the parameter matrices corresponding to the reset gate r_t, whose initialized values are random and whose new values are obtained by training the model; b_r is the bias corresponding to the reset gate r_t, which is also trainable.
  • similarly, the update gate satisfies z_t = σ(W_z x_t + U_z h_{t-1} + b_z), where W_z and U_z are the parameter matrices corresponding to the update gate z_t, initialized randomly and learned by training the model, and b_z is the bias corresponding to the update gate z_t, which is also trainable.
  • the candidate memory unit satisfies h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}) + b), where tanh is the activation function; W and U are the parameter matrices corresponding to the candidate memory unit, initialized randomly and learned by training the model; b is the bias corresponding to the candidate memory unit, which is also trainable.
  • the current memory unit is h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t, where the weight z_t is trainable.
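A minimal sketch of a single GRU step, assuming the standard sigmoid/tanh gate formulation with randomly initialized (untrained) parameters standing in for learned weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step: reset gate, update gate, candidate memory, current memory."""
    r_t = sigmoid(params["Wr"] @ x_t + params["Ur"] @ h_prev + params["br"])  # reset gate
    z_t = sigmoid(params["Wz"] @ x_t + params["Uz"] @ h_prev + params["bz"])  # update gate
    h_cand = np.tanh(params["W"] @ x_t + params["U"] @ (r_t * h_prev) + params["b"])
    return z_t * h_prev + (1.0 - z_t) * h_cand  # current memory unit h_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # illustrative input and hidden sizes
params = {k: rng.standard_normal((d_h, d_in if k in ("Wr", "Wz", "W") else d_h))
          for k in ("Wr", "Ur", "Wz", "Uz", "W", "U")}
params.update({k: rng.standard_normal(d_h) for k in ("br", "bz", "b")})

h = np.zeros(d_h)
for x in rng.standard_normal((5, d_in)):  # encode a 5-step input sequence
    h = gru_step(x, h, params)
print(h.shape)  # (3,) — a hidden layer state feature vector
```

Because h_t is a convex combination of h_{t-1} and a tanh output, every component of the hidden state stays in [-1, 1].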
  • the structure of the MLP is shown in Figure 5. It consists of two linear layers (Linear) and a ReLU activation function. After the last linear layer outputs the linear output result, that result is input into the softmax function for multi-label classification, and the tense classifier finally judges the tense of the current sentence.
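A minimal sketch of the two-linear-layer MLP with a ReLU between the layers and a softmax over hypothetical tense labels; the dimensions, the number of tense classes, and the random weights are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def mlp_tense_scores(h, W1, b1, W2, b2):
    """Linear -> ReLU -> Linear, then softmax over tense labels (Figure 5)."""
    hidden = np.maximum(0.0, W1 @ h + b1)  # first linear layer + ReLU
    logits = W2 @ hidden + b2              # second linear layer (linear output result)
    return softmax(logits)

rng = np.random.default_rng(1)
d_h, d_mid, n_tense = 3, 8, 4  # 4 illustrative tense classes
probs = mlp_tense_scores(
    rng.standard_normal(d_h),
    rng.standard_normal((d_mid, d_h)), rng.standard_normal(d_mid),
    rng.standard_normal((n_tense, d_mid)), rng.standard_normal(n_tense))
tense = int(np.argmax(probs))  # index of the predicted tense label
print(probs.sum())             # sums to 1.0
```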
  • the voice message "I'm running" is in the present continuous tense, but because it does not contain standalone tense-marking words such as "guo" (过), "zhe" (着) and "le" (了), it would be missed by the traditional recognition method.
  • the second word is the first word in the word tag with the temporal information of the corresponding voice information added.
  • the second word carries the corresponding part-of-speech information and tense information of the speech on the basis of carrying the information of the speech itself, so that the reply sentences generated subsequently are more accurate.
  • the coarse encoder may be a GRU encoder.
  • the encoder outputs corresponding rough context information and a first hidden layer state feature vector.
  • each second word in the encoder corresponds to a hidden layer state feature vector, that is, the first hidden layer state feature vector.
  • the rough context information can be input into the decoder, and the decoder will calculate the similarity between the feature vector of the current decoding process (the output of the current decoding process of the decoder) and the hidden layer state features decoded from the input rough context information.
  • a similarity value is calculated for each piece of rough context information, and then these similarities are normalized to obtain a weight corresponding to each piece of rough context information.
  • this embodiment provides a method of inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of speech information. As shown in FIG. 6 , the method includes:
  • i is an integer greater than or equal to 1 and less than or equal to j
  • j is the number of at least one rough context information
  • j is an integer greater than or equal to 1
  • the input feature vector A i is the first rough context information in at least one rough context information.
  • the similarity D i can be obtained by calculating the cosine similarity between the output feature vector B i and the ith first hidden layer state feature vector C i .
  • the similarity degree D i can be input into the softmax function for normalization processing to obtain the weight E i of the input feature vector A i .
  • the output at the previous moment is used as the input at the next moment; the final output obtained after multiple decoding processes is the rough semantic feature of the speech information.
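The similarity calculation and normalisation described above (D_i and E_i) can be sketched as follows, assuming cosine similarity and toy random vectors in place of real decoder outputs:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity D_i between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def context_weights(output_feats, hidden_feats):
    """Compute D_i = cos(B_i, C_i) for each rough context, then
    softmax-normalise the similarities into weights E_i."""
    sims = np.array([cosine(b, c) for b, c in zip(output_feats, hidden_feats)])
    e = np.exp(sims - sims.max())
    return e / e.sum()

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 4))  # output feature vectors B_i, j = 3 rough contexts
C = rng.standard_normal((3, 4))  # first hidden layer state feature vectors C_i
E = context_weights(B, C)
print(E)  # three positive weights summing to 1
```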
  • the voice information may be converted into text, and then the text may be segmented to obtain at least one first keyword. Then, at least one second keyword is obtained by combining any two different first adjacent words and second adjacent words in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is smaller than the first threshold.
  • the first adjacent word and the second adjacent word are any two different adjacent first keywords whose field interval is smaller than the first threshold, and the field interval can be understood as the number of characters between the corresponding positions of the first adjacent word and the second adjacent word in the text.
  • the first keywords can be obtained: “Shanghai”, “2016”, “Disney”, “Paradise”, “Pudong” and “New District”.
  • the number of characters between the corresponding positions of the first keyword "2016” and “Disney” in the text is 3, so the character distance between the first keyword “2016” and “Disney” is 3.
  • the number of characters between the corresponding positions of the first keyword “Disney” and “Paradise” in the text is 0, so the character distance between the first keyword “Disney” and “Paradise” is 0.
  • the first threshold can be set to 1.
  • the first keywords that meet the requirement are “Disney” and “Paradise”, and “Pudong” and “New District”.
  • the second keywords “Disneyland” and “Pudong New Area” can thus be obtained.
  • each second keyword in the at least one second keyword is matched with a preset entity library, and second keywords that fail to be matched are screened out to obtain at least one third keyword.
  • the first keyword constituting each third keyword in the at least one third keyword is deleted to obtain at least one fourth keyword.
  • the fourth keyword is the remaining first keyword after removing the first keyword constituting each third keyword in the at least one third keyword.
  • the determined third keyword is "Disneyland”
  • the third keyword “Disneyland” is composed of the first keywords "Disney” and “Paradise”
  • the first keywords "Disney” and “Paradise” are removed from the original several first keywords: “Shanghai”, “2016”, “Disney”, “Paradise”, “Pudong” and “New District”
  • the remaining first keywords “Shanghai”, “2016”, “Pudong” and “New District” are the fourth keywords.
  • At least one third keyword and at least one fourth keyword are combined to obtain a keyword group.
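A toy sketch of the keyword-group construction described above; the character positions, the entity library and the English concatenations stand in for the original Chinese text and are purely illustrative:

```python
def build_keyword_group(first_keywords, positions, entity_library, threshold=1):
    """Combine adjacent first keywords whose character distance is below
    `threshold`, keep combinations found in the entity library (the third
    keywords), and keep the unused first keywords (the fourth keywords)."""
    third, used = [], set()
    pairs = zip(zip(first_keywords, positions), zip(first_keywords[1:], positions[1:]))
    for (w1, p1), (w2, p2) in pairs:
        gap = p2 - (p1 + len(w1))      # characters between the two words
        candidate = w1 + w2
        if gap < threshold and candidate in entity_library:
            third.append(candidate)    # matched against the preset entity library
            used.update({w1, w2})
    fourth = [w for w in first_keywords if w not in used]
    return third + fourth              # the keyword group

words = ["Shanghai", "2016", "Disney", "Paradise", "Pudong", "New District"]
# Illustrative character positions in the source text (chosen so that
# "2016"-"Disney" are 3 apart and the two entity pairs are 0 apart).
pos = [0, 9, 16, 22, 31, 37]
library = {"DisneyParadise", "PudongNew District"}  # toy stand-in entity library
group = build_keyword_group(words, pos, library)
print(group)
```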
  • the keyword group may include at least one keyword, and the at least one keyword is arranged according to the order of each keyword in the at least one keyword in the voice information. Based on this, this embodiment provides a method for performing multiple hidden feature extraction processing on keyword groups to obtain the initial hidden layer state feature vector, specifically as follows:
  • the first input hidden feature H n is input to the GRU encoder to obtain the first output hidden feature I n , wherein n is an integer greater than or equal to 1 and less than or equal to m, m is the number of at least one keyword, and m is an integer greater than or equal to 1.
  • the input word vector K p , the second input hidden feature L p and the rough semantic feature can be input into the gated recurrent unit decoder to obtain the reply word O p and the second output hidden feature R p , wherein p is an integer greater than or equal to 1 and less than or equal to q, and q is an integer greater than or equal to 1 determined by the voice information.
  • word embedding is performed on the reply word O p to obtain the reply word vector S p .
  • the reply word vector S p is used as the input word vector K p +1 of the p+1-th reply word generation process
  • the second output hidden feature R p is used as the second input hidden feature L p +1 of the p+1-th reply word generation process for the p+1-th reply word generation process until at least one reply word is obtained after multiple reply word generation processes.
  • the generation process generates one reply word at a time: the reply word O p is generated at the p-th time, and the reply word O p+1 is then generated at the p+1-th time.
  • the word vector of the reply word O p generated last time (that is, the pth time) is also used as one of the inputs of the p+1th time.
  • the other input is the rough semantic feature, that is, the reply word O p+1 is determined by the word vector of the reply word O p , the second output hidden feature R p generated at the p-th time, and the rough semantic feature.
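The reply-word generation loop described above can be sketched as follows; `toy_decoder` and its canned reply are stand-ins for the trained gated recurrent unit decoder, and the stop token is an assumption:

```python
def generate_reply(rough_semantic, h0, decoder_step, start_vec, max_words=10):
    """Autoregressive loop: each step feeds the previous reply word's vector
    K_p, the previous hidden feature L_p and the fixed rough semantic feature
    into the decoder, which returns (reply word O_p, word vector S_p, hidden R_p).
    S_p becomes K_{p+1} and R_p becomes L_{p+1}."""
    words, k, h = [], start_vec, h0
    for _ in range(max_words):
        word, word_vec, h = decoder_step(k, h, rough_semantic)
        if word == "<eos>":   # stop token ends generation
            break
        words.append(word)
        k = word_vec          # S_p is the next step's input word vector
    return " ".join(words)    # splice the reply words in generation order

# Toy decoder that "decodes" a canned reply one word at a time.
canned = ["the", "park", "opens", "at", "nine", "<eos>"]
state = {"i": 0}
def toy_decoder(k, h, rough):
    w = canned[state["i"]]
    state["i"] += 1
    return w, [float(len(w))], h

reply = generate_reply(rough_semantic=None, h0=[0.0],
                       decoder_step=toy_decoder, start_vec=[0.0])
print(reply)  # → "the park opens at nine"
```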
  • the method for determining reply sentences based on rough semantics provided by this application, by obtaining the previous round of voice information of the user's current voice information, and then performing rough semantic extraction on the previous round of voice information, the semantic features that can contain the high-level abstract information in the previous round of voice information are obtained, which are used as the rough semantic features of the user's current voice information, thereby realizing synchronous extraction of key information and rough information in the previous round of voice information. Then, word segmentation is performed on the voice information of the user at the current moment, and multiple hidden feature extraction processes are performed on the obtained multiple keywords to obtain the initial hidden layer state feature vector of the voice information of the user at the current moment.
  • multiple reply words are generated according to the rough semantic features and the initial hidden layer state feature vector, and the obtained at least one reply word is spliced according to the generation order of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  • the rough semantic features containing both key information and rough information in the previous round of dialogue are used as one of the basis for generating reply sentences in the current round of dialogue, so that the reply sentence generation process includes more comprehensive information features of the previous round of dialogue.
  • the generated reply sentences are more accurate, can better fit with the main body of the dialogue, and improve user experience.
  • FIG. 8 is a block diagram of functional modules of an apparatus for determining reply sentences based on rough semantics provided in an embodiment of the present application.
  • the device 800 for determining a reply sentence based on rough semantics includes:
  • the acquisition module 801 is used to acquire, according to the occurrence time of the voice information at the user's current moment, the previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;
  • the processing module 802 is used to perform rough semantic extraction on the voice information according to the previous round of voice information, obtain rough semantic features corresponding to the voice information, perform word segmentation processing on the voice information, obtain keyword groups, and perform multiple hidden feature extraction processing on the keyword groups to obtain initial hidden layer state feature vectors;
  • the generation module 803 is used to perform multiple reply word generation processing according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word, and splice at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
  • in performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information, the processing module 802 is specifically used for:
  • inputting at least one second word into a rough encoder for encoding to obtain at least one piece of rough context information and at least one first hidden layer state feature vector, wherein the at least one piece of rough context information corresponds one-to-one with the at least one second word, and the at least one first hidden layer state feature vector corresponds one-to-one with the at least one second word;
  • the processing module 802 is specifically used to:
  • the processing module 802 is specifically used to:
  • the target output feature vector G i is used as the input feature vector A i+1 of the (i+1)th decoding process, which is then performed, until multiple decoding processes have been performed and the rough semantic features of the voice information are obtained.
  • the keyword group includes at least one keyword, and the at least one keyword is arranged according to the sequence of each keyword in the at least one keyword in the voice information.
  • the processing module 802 is specifically used for:
  • the first output hidden feature I n is used as the first input hidden feature H n+1 of the (n+1)th hidden feature extraction process, which is then performed, until the hidden feature extraction process has been performed multiple times and the initial hidden layer state feature vector is obtained.
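The recurrence described above amounts to folding a hidden feature extractor over the keyword group in order. The sketch below is a hypothetical illustration only: `extract_hidden` is a placeholder for the real extractor, and the initial state and vector contents are invented.

```python
# Hypothetical sketch: the first output hidden feature I_n of the nth
# extraction becomes the first input hidden feature H_{n+1} of the (n+1)th
# extraction; the final state is the initial hidden layer state feature
# vector passed to the reply decoder.

def extract_hidden(word_vec, h):
    """Toy hidden feature extraction step combining a keyword vector with h."""
    return [(a + b) / 2.0 for a, b in zip(word_vec, h)]

def initial_hidden_state(keyword_vectors, h1):
    """Fold over the keyword group in the order the keywords appear."""
    h = h1
    for vec in keyword_vectors:     # keywords in their order in the utterance
        h = extract_hidden(vec, h)  # I_n becomes H_{n+1}
    return h
```

Because each step consumes the previous step's output, the final vector reflects the keywords in sequence, matching the ordering requirement stated for the keyword group.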
  • the generating module 803 is specifically used for:
  • the input word vector K p is input into the gated recurrent unit decoder to obtain the reply word O p and the second output hidden feature R p , wherein p is an integer greater than or equal to 1 and less than or equal to q, and q is an integer greater than or equal to 1 that is determined by the voice information.
  • the reply word vector S p is used as the input word vector K p+1 of the (p+1)th reply word generation process, and the second output hidden feature R p is used as the second input hidden feature L p+1 of the (p+1)th reply word generation process, which is then performed, until at least one reply word is obtained after multiple reply word generation processes.
  • the processing module 802 is specifically used for:
  • the first adjacent word and the second adjacent word are merged to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is less than the first threshold;
  • the first keywords forming each third keyword in the at least one third keyword are deleted to obtain at least one fourth keyword;
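The adjacent-word merging step above can be sketched as follows. This is a hypothetical illustration: the representation of keyword positions, the meaning of "field interval" as the gap between one keyword's end and the next keyword's start, and the threshold value are all assumptions, not the patent's definitions.

```python
# Hypothetical sketch: any two first keywords whose field interval (assumed
# here to be the gap between positions in the text) is below a threshold are
# merged into a single second keyword.

def merge_adjacent(keywords, threshold):
    """keywords: list of (word, start_index) pairs sorted by position."""
    merged = []
    i = 0
    while i < len(keywords):
        word, start = keywords[i]
        # Merge forward while the next keyword starts close enough to the
        # end of the keyword accumulated so far.
        while (i + 1 < len(keywords)
               and keywords[i + 1][1] - (start + len(word)) < threshold):
            word += keywords[i + 1][0]
            i += 1
        merged.append(word)
        i += 1
    return merged
```

For example, two fragments that sit side by side in the utterance collapse into one keyword, while a distant word stays separate.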
  • FIG. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
  • an electronic device 900 includes a transceiver 901, a processor 902 and a memory 903, which are connected through a bus 904.
  • the memory 903 is used to store computer programs and data, and can transmit the data stored in the memory 903 to the processor 902 .
  • the processor 902 is used to read the computer program in the memory 903 to perform the following operations:
  • the previous round of voice information adjacent to the voice information is obtained, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;
  • the at least one reply word is spliced according to the generation sequence of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
  • the processor 902 is specifically configured to perform the following operations in terms of performing rough semantic extraction on the voice information based on the previous round of voice information to obtain rough semantic features corresponding to the voice information:
  • at least one second word is input into the rough encoder for encoding to obtain at least one piece of rough context information and at least one first hidden layer state feature vector, wherein the at least one piece of rough context information corresponds one-to-one with the at least one second word, and the at least one first hidden layer state feature vector corresponds one-to-one with the at least one second word;
  • in determining the temporal information of the previous round of voice information according to the at least one first word, the processor 902 is specifically configured to perform the following operations:
  • the processor 902 is specifically configured to perform the following operations in terms of inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of speech information:
  • the target output feature vector G i is used as the input feature vector A i+1 of the (i+1)th decoding process, which is then performed, until multiple decoding processes have been performed and the rough semantic features of the voice information are obtained.
  • the keyword group includes at least one keyword, and the at least one keyword is arranged according to the sequence of each keyword in the at least one keyword in the voice information.
  • the processor 902 is specifically configured to perform the following operations in performing multiple hidden feature extraction processes on the keyword group to obtain the initial hidden layer state feature vector:
  • the first output hidden feature I n is used as the first input hidden feature H n+1 of the (n+1)th hidden feature extraction process, which is then performed, until the hidden feature extraction process has been performed multiple times and the initial hidden layer state feature vector is obtained.
  • the processor 902 is specifically configured to perform the following operations in terms of generating at least one reply word based on the rough semantic feature and the initial hidden layer state feature vector for multiple times of reply word generation:
  • the input word vector K p is input into the gated recurrent unit decoder to obtain the reply word O p and the second output hidden feature R p , wherein p is an integer greater than or equal to 1 and less than or equal to q, and q is an integer greater than or equal to 1 that is determined by the voice information.
  • the reply word vector S p is used as the input word vector K p+1 of the (p+1)th reply word generation process, and the second output hidden feature R p is used as the second input hidden feature L p+1 of the (p+1)th reply word generation process, which is then performed, until at least one reply word is obtained after multiple reply word generation processes.
  • the processor 902 is specifically configured to perform the following operations in performing word segmentation processing on the speech information to obtain keyword groups:
  • the first adjacent word and the second adjacent word are merged to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is less than the first threshold;
  • the first keywords forming each third keyword in the at least one third keyword are deleted to obtain at least one fourth keyword;
  • the apparatus for determining reply sentences based on rough semantics in the present application may include smart phones (such as Android phones, iOS phones, Windows Phone phones, etc.), tablet computers, palmtop computers, notebook computers, mobile Internet devices (MID), robots, wearable devices, and the like.
  • the above devices for determining a reply sentence based on rough semantics are merely examples, not an exhaustive list; the apparatus includes, but is not limited to, the above devices.
  • the apparatus for determining reply sentences based on rough semantics may also include: intelligent vehicle-mounted terminals, computer equipment, and the like.
  • the embodiments of the present application also provide a computer-readable storage medium, which stores a computer program; the computer program is executed by a processor to implement some or all of the steps of any method for determining a reply sentence based on rough semantics described in the above method embodiments.
  • the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the embodiment of the present application also provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause the computer to execute some or all of the steps of any method for determining a reply sentence based on rough semantics as described in the above-mentioned method embodiments.

Abstract

Disclosed in the present application are a reply statement determination method and apparatus based on rough semantics, and an electronic device. The method comprises: according to an occurrence time of speech information of a user at the current moment, acquiring a previous round of speech information adjacent to the speech information; according to the previous round of speech information, performing rough semantic extraction on the speech information, so as to obtain a rough semantic feature corresponding to the speech information; performing word segmentation processing on the speech information, so as to obtain a keyword group; performing a plurality of instances of hidden feature extraction processing on the keyword group, so as to obtain an initial hidden layer state feature vector; according to the rough semantic feature and the initial hidden layer state feature vector, performing a plurality of instances of reply word generation processing, so as to obtain at least one reply word; and splicing the at least one reply word according to a generation sequence of each reply word from among the at least one reply word, so as to obtain a reply statement of the speech information.

Description

Method, apparatus and electronic device for determining a reply sentence based on rough semantics

Priority statement

This application claims priority to Chinese patent application No. 202210083351.8, entitled "Method, apparatus and electronic device for determining a reply sentence based on rough semantics", filed with the China Patent Office on January 22, 2022, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the technical field of artificial intelligence, and in particular to a method, apparatus and electronic device for determining a reply sentence based on rough semantics.

Background

At present, a traditional dialogue model usually encodes the previous round of dialogue text, uses the hidden layer state features of the resulting encoded information as one of the inputs of a decoder, and then automatically generates a dialogue reply through the decoder in time sequence. In this method, the hidden layer state features encoded from the previous round of dialogue text serve as one of the bases for generating the reply sentence in the current round of dialogue, so that the reply sentence generation process incorporates the information features of the previous round of dialogue.

However, the inventor realized that in the traditional solution, in order to enable the model to construct reply sentences around the key information in the dialogue, the extracted features tend to focus on the key information of the previous round of dialogue; during actual extraction, this key information is extracted as features, while some rough information in the dialogue is often discarded. This overlooks the fact that, in some texts, rough information better reflects the real focus of the dialogue, resulting in less accurate reply sentences.

Summary of the invention

To solve the above problems in the prior art, the embodiments of this application provide a method, apparatus and electronic device for determining a reply sentence based on rough semantics, which can simultaneously extract the key information and the rough information of the previous round of dialogue, thereby making the generated reply sentences more accurate.

In a first aspect, an embodiment of this application provides a method for determining a reply sentence based on rough semantics, including:
according to the occurrence time of the user's voice information at the current moment, acquiring the previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;

performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;

performing word segmentation on the voice information to obtain a keyword group;

performing multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector;

performing multiple reply word generation processes according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;

splicing the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
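The six steps of the method above can be summarized as a single pipeline. The sketch below is purely illustrative: every helper function is a named placeholder with a trivial toy body, not the patent's implementation, and the utterance representation (dicts with `time` and `text` keys) is an assumption made for the example.

```python
# Hypothetical end-to-end sketch of the six claimed steps; all helpers are
# toy placeholders standing in for the real models described in the patent.

def extract_rough_semantics(cur_text, prev_text):
    return [len(cur_text) % 7, len(prev_text) % 7]   # placeholder feature

def segment(text):
    return text.split()                              # placeholder segmentation

def initial_hidden_state(keywords):
    return [float(len(keywords))]                    # placeholder fold

def generate_reply_words(rough, h0):
    return ["ok"] * max(1, int(h0[0]) % 3 + 1)       # placeholder generator

def determine_reply(current_utterance, history):
    # 1. Acquire the adjacent previous round: the most recent utterance
    #    whose occurrence time precedes the current one.
    previous = max((u for u in history if u["time"] < current_utterance["time"]),
                   key=lambda u: u["time"])
    # 2. Rough semantic extraction conditioned on the previous round.
    rough = extract_rough_semantics(current_utterance["text"], previous["text"])
    # 3. Word segmentation into a keyword group.
    keywords = segment(current_utterance["text"])
    # 4. Multiple hidden feature extractions -> initial hidden state vector.
    h0 = initial_hidden_state(keywords)
    # 5. Multiple reply word generation passes.
    reply_words = generate_reply_words(rough, h0)
    # 6. Splice in generation order (no separator, as for Chinese text).
    return "".join(reply_words)
```

Step 1 encodes the "adjacent previous round" condition directly: among all earlier utterances, the one with the maximum occurrence time minimizes the absolute time difference.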
In a second aspect, an embodiment of this application provides an apparatus for determining a reply sentence based on rough semantics, including:

an acquisition module, configured to acquire, according to the occurrence time of the user's voice information at the current moment, the previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;

a processing module, configured to perform rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information, perform word segmentation on the voice information to obtain a keyword group, and perform multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector;

a generation module, configured to perform multiple reply word generation processes according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word, and splice the at least one reply word according to the generation order of each reply word to obtain a reply sentence of the voice information.
In a third aspect, an embodiment of this application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the one or more programs include instructions for performing the following steps:

according to the occurrence time of the user's voice information at the current moment, acquiring the previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;

performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;

performing word segmentation on the voice information to obtain a keyword group;

performing multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector;

performing multiple reply word generation processes according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;

splicing the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, and the computer program is executed by a processor to implement the following steps:

according to the occurrence time of the user's voice information at the current moment, acquiring the previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;

performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;

performing word segmentation on the voice information to obtain a keyword group;

performing multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector;

performing multiple reply word generation processes according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;

splicing the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.

In a fifth aspect, an embodiment of this application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program; the computer program is operable to cause a computer to execute the method of the first aspect.
Implementing the embodiments of this application has the following beneficial effects:

In the embodiments of this application, the previous round of voice information preceding the user's current voice information is acquired, and rough semantic extraction is performed on it to obtain semantic features containing the high-level abstract information of that previous round of voice information; these serve as the rough semantic features of the user's current voice information, thereby achieving synchronous extraction of the key information and the rough information of the previous round of voice information. Then, word segmentation is performed on the user's current voice information, and multiple hidden feature extraction processes are applied to the resulting keywords to obtain the initial hidden layer state feature vector of the user's current voice information. Finally, multiple reply word generation processes are performed according to the rough semantic features and the initial hidden layer state feature vector, and the resulting at least one reply word is spliced according to the generation order of each reply word to obtain a reply sentence of the voice information. On this basis, the rough semantic features, which contain both the key information and the rough information of the previous round of dialogue, are used as one of the bases for generating the reply sentence in the current round of dialogue, so that the reply sentence generation process incorporates more comprehensive information features of the previous round of dialogue. As a result, the generated reply sentences are more accurate, fit the subject of the dialogue better, and improve the user experience.
Brief description of the drawings

In order to more clearly illustrate the technical solutions in the embodiments of this application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of the hardware structure of an apparatus for determining a reply sentence based on rough semantics provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of a method for determining a reply sentence based on rough semantics provided by an embodiment of this application;

FIG. 3 is a schematic flowchart of a method for performing rough semantic extraction on voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information, provided by an embodiment of this application;

FIG. 4 is a schematic structural diagram of a gated recurrent unit encoder provided by an embodiment of this application;

FIG. 5 is a schematic structural diagram of a multi-layer perceptron provided by an embodiment of this application;

FIG. 6 is a schematic flowchart of a method for inputting at least one piece of rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of voice information, provided by an embodiment of this application;

FIG. 7 is a block flow diagram of a reply word generation process provided by an embodiment of this application;

FIG. 8 is a block diagram of the functional modules of an apparatus for determining a reply sentence based on rough semantics provided by an embodiment of this application;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
具体实施方式Detailed ways
下面将结合本申请实施方式中的附图,对本申请实施方式中的技术方案进行清楚、完整地描述,显然,所描述的实施方式是本申请一部分实施方式,而不是全部的实施方式。基于本申请中的实施方式,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施方式,都属于本申请保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the application with reference to the drawings in the embodiments of the application. Apparently, the described embodiments are part of the embodiments of the application, not all of them. Based on the implementation manners in this application, all other implementation manners obtained by persons of ordinary skill in the art without creative efforts shall fall within the scope of protection of this application.
首先,参阅图1,图1为本申请实施方式提供的一种基于粗糙语义的回复语句确定装置的硬件结构示意图。该基于粗糙语义的回复语句确定装置100包括至少一个处理器101,通信线路102,存储器103以及至少一个通信接口104。First, please refer to FIG. 1 , which is a schematic diagram of a hardware structure of an apparatus for determining a reply sentence based on rough semantics provided by an embodiment of the present application. The apparatus 100 for determining reply sentences based on rough semantics includes at least one processor 101 , a communication line 102 , a memory 103 and at least one communication interface 104 .
在本实施方式中,处理器101,可以是一个通用中央处理器(central processing unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。In this embodiment, the processor 101 may be a general-purpose central processing unit (central processing unit, CPU), microprocessor, application-specific integrated circuit (application-specific integrated circuit, ASIC), or one or more integrated circuits for controlling the execution of the program of the present application.
通信线路102,可以包括一通路,在上述组件之间传送信息。 Communication line 102, which may include a path, transmits information between the aforementioned components.
通信接口104,可以是任何收发器一类的装置(如天线等),用于与其他设备或通信网络通信,例如以太网,RAN,无线局域网(wireless local area networks,WLAN)等。The communication interface 104 may be any device such as a transceiver (such as an antenna) for communicating with other devices or communication networks, such as Ethernet, RAN, wireless local area networks (wireless local area networks, WLAN) and the like.
存储器103,可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。The memory 103 may be a read-only memory (ROM) or other types of static storage devices capable of storing static information and instructions, a random access memory (random access memory, RAM) or other types of dynamic storage devices capable of storing information and instructions, or may be an electrically erasable programmable read-only memory (EEPROM), a compact disc (compact disc) read-only memory, CD-ROM) or other optical disc storage, optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, without limitation.
在本实施方式中,存储器103可以独立存在,通过通信线路102与处理器101相连接。存储器103也可以和处理器101集成在一起。本申请实施方式提供的存储器103通常可以具有非易失性。其中,存储器103用于存储执行本申请方案的计算机执行指令,并由处理器101来控制执行。处理器101用于执行存储器103中存储的计算机执行指令,从而实现本申请下述实施方式中提供的方法。In this embodiment, the memory 103 may exist independently and be connected to the processor 101 through the communication line 102 . The memory 103 can also be integrated with the processor 101 . The memory 103 provided in this embodiment of the present application may generally be non-volatile. Wherein, the memory 103 is used to store computer-executed instructions for implementing the solutions of the present application, and the execution is controlled by the processor 101 . The processor 101 is configured to execute computer-executed instructions stored in the memory 103, so as to implement the methods provided in the following embodiments of the present application.
在可选的实施方式中,计算机执行指令也可以称之为应用程序代码,本申请对此不作具体限定。In an optional implementation manner, computer-executed instructions may also be referred to as application code, which is not specifically limited in the present application.
在可选的实施方式中,处理器101可以包括一个或多个CPU,例如图1中的CPU0和CPU1。In an optional implementation manner, the processor 101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 1 .
在可选的实施方式中,该基于粗糙语义的回复语句确定装置100可以包括多个处理器,例如图1中的处理器101和处理器107。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。In an optional implementation manner, the apparatus 100 for determining a reply sentence based on rough semantics may include multiple processors, such as the processor 101 and the processor 107 in FIG. 1 . Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
在可选的实施方式中，若基于粗糙语义的回复语句确定装置100为服务器(例如，可以是独立的服务器，也可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network，CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器)，则基于粗糙语义的回复语句确定装置100还可以包括输出设备105和输入设备106。输出设备105和处理器101通信，可以以多种方式来显示信息。例如，输出设备105可以是液晶显示器(liquid crystal display，LCD)，发光二极管(light emitting diode，LED)显示设备，阴极射线管(cathode ray tube，CRT)显示设备，或投影仪(projector)等。输入设备106和处理器101通信，可以以多种方式接收用户的输入。例如，输入设备106可以是鼠标、键盘、触摸屏设备或传感设备等。In an optional embodiment, if the apparatus 100 for determining a reply sentence based on rough semantics is a server (for example, an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms), the apparatus 100 may further include an output device 105 and an input device 106. The output device 105 communicates with the processor 101 and can display information in a variety of ways. For example, the output device 105 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 106 communicates with the processor 101 and can receive user input in various ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device.
上述的基于粗糙语义的回复语句确定装置100可以是一个通用设备或者是一个专用设备。本申请实施方式不限定基于粗糙语义的回复语句确定装置100的类型。The apparatus 100 for determining a reply sentence based on rough semantics may be a general-purpose device or a special-purpose device. The embodiment of the present application does not limit the type of the device 100 for determining a reply sentence based on rough semantics.
其次,需要说明的是,本申请所公开的实施方式可以基于人工智能技术对相关的数据进行获取和处理。其中,人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。Secondly, it should be noted that the embodiments disclosed in this application can acquire and process relevant data based on artificial intelligence technology. Among them, artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、机器人技术、生物识别技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
最后，本申请中的基于粗糙语义的回复语句确定方法可以应用到电话咨询、电商销售、线下实体销售、业务推广、坐席电话外呼、社交平台推广等场景。本申请中主要以电话咨询场景为例说明该基于粗糙语义的回复语句确定方法，其他场景中的基于粗糙语义的回复语句确定方法与电话咨询场景下的实现方式类似，在此不再赘述。Finally, the method for determining a reply sentence based on rough semantics in this application can be applied to scenarios such as telephone consultation, e-commerce sales, offline physical sales, business promotion, outbound agent calls, and promotion on social platforms. In this application, the telephone consultation scenario is mainly used as an example to illustrate the method; the implementation in other scenarios is similar to that in the telephone consultation scenario and will not be repeated here.
以下,将对本申请所公开的基于粗糙语义的回复语句确定方法进行说明:In the following, the method for determining reply sentences based on rough semantics disclosed in this application will be described:
参阅图2,图2为本申请实施方式提供的一种基于粗糙语义的回复语句确定方法的流程示意图。该基于粗糙语义的回复语句确定方法包括以下步骤:Referring to FIG. 2 , FIG. 2 is a schematic flowchart of a method for determining a reply sentence based on rough semantics provided in an embodiment of the present application. The method for determining reply sentences based on rough semantics includes the following steps:
201:根据用户当前时刻的语音信息的发生时间,获取与语音信息相邻的前一轮语音信息。201: Acquire the previous round of voice information adjacent to the voice information according to the occurrence time of the voice information at the current moment of the user.
在本实施方式中，该前一轮语音信息的发生时间早于语音信息的发生时间，且前一轮语音信息的发生时间与语音信息的发生时间之间的差值的绝对值最小。简单而言，该前一轮语音信息即为用户在说出当前时刻的语音信息之前说的上一句话。In this embodiment, the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the two occurrence times is the smallest. In simple terms, the previous round of voice information is the last sentence spoken by the user before the voice information at the current moment.
示例性的，可以通过用户当前时刻的语音信息的发生时间，查询记录该语音信息所属的对话事件在当前时刻前所产生的对话数据的历史对话数据，来确定该前一轮语音信息。具体而言，历史对话数据中可以保存两条相互关联的语句队列，其中，一条用于存储用户所发出的用户语句，另一条用于存储AI对用户的语句做出的回复语句。同时，用户语句队列中的每个用户语句，以及回复语句队列中的每个回复语句均包含有对话标识和对话发生时间，通过对话标识将标识相同的用户语句和回复语句组成一个问答对，即对话标识相同的回复语句为对用户语句的答复。由此，既可以保证历史对话数据中的问答逻辑性，同时将用户和AI的语句分开保存，便于查找。Exemplarily, the previous round of voice information can be determined by using the occurrence time of the user's voice information at the current moment to query historical dialogue data, which records the dialogue data generated before the current moment by the dialogue event to which the voice information belongs. Specifically, two interrelated sentence queues can be stored in the historical dialogue data: one queue stores the user sentences uttered by the user, and the other stores the reply sentences made by the AI to the user's sentences. Each user sentence in the user sentence queue and each reply sentence in the reply sentence queue includes a dialogue identifier and a dialogue occurrence time; a user sentence and a reply sentence with the same identifier form a question-answer pair, that is, the reply sentence with the same dialogue identifier is the reply to that user sentence. In this way, the question-and-answer logic of the historical dialogue data is guaranteed, while the sentences of the user and the AI are stored separately for easy retrieval.
因此，在本实施方式中，可以通过查询用户语句队列，确定对话发生时间早于用户当前时刻的语音信息的发生时间，且发生时间与该语音信息的发生时间之间的差的绝对值最小的语音信息，作为前一轮语音信息。Therefore, in this embodiment, by querying the user sentence queue, the voice information whose dialogue occurrence time is earlier than the occurrence time of the user's voice information at the current moment, and whose absolute value of the difference from that occurrence time is the smallest, can be determined as the previous round of voice information.
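As a concrete illustration of the queue lookup described above, the following Python sketch shows one way the previous round could be retrieved from the user sentence queue. The `Utterance` structure and its field names are hypothetical illustrations, not part of the application itself.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Utterance:
    dialogue_id: str   # dialogue identifier linking a user sentence to its reply
    time: float        # occurrence time of the sentence
    text: str

def previous_round(user_queue: List[Utterance], current_time: float) -> Optional[Utterance]:
    """Return the user utterance whose occurrence time is earlier than
    `current_time` and closest to it, i.e. the previous round of voice
    information; None if no earlier utterance exists."""
    earlier = [u for u in user_queue if u.time < current_time]
    if not earlier:
        return None
    return min(earlier, key=lambda u: current_time - u.time)
```

Because replies are stored in a separate, identifier-linked queue, the same lookup could retrieve the matching reply by `dialogue_id` without scanning the user queue again.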
202:根据前一轮语音信息对语音信息进行粗糙语义提取,得到对应于语音信息的粗糙语义特征。202: Perform rough semantic extraction on the voice information according to the previous round of voice information, and obtain rough semantic features corresponding to the voice information.
在本实施方式中，粗糙语义特征可以理解为包含前一轮语音信息中高层次的抽象信息的语义特征。示例性的，可以通过主动构造一个高阶粗糙序列表示(high level coarse sequence representation)，继而对该高阶粗糙序列表示进行分析，得到多个高阶平行序列。再通过分层结构，先生成低阶粗糙(coarse)序列，让多个高阶平行序列中的信息流向低阶粗糙序列，实现对语音信息中的关键信息和粗糙信息的同步提取，使多个层次的信息可以同步体现。同时，转化为低阶粗糙序列后，生成回复语句的模型也能够更好地对长期内容进行记忆和理解，继而生成与主题密切相关的有意义的回复，提升用户体验。In this embodiment, rough semantic features can be understood as semantic features that contain the high-level abstract information of the previous round of speech information. Exemplarily, a high-level coarse sequence representation can be actively constructed and then analyzed to obtain multiple high-level parallel sequences. Then, through a layered structure, a low-level coarse sequence is first generated so that the information in the multiple high-level parallel sequences flows into the low-level coarse sequence, realizing synchronous extraction of the key information and the rough information in the speech information, so that information at multiple levels can be reflected simultaneously. Meanwhile, after the conversion into the low-level coarse sequence, the model that generates reply sentences can better memorize and understand long-term content, and thus generate meaningful replies closely related to the topic, improving user experience.
示例性的，本实施方式提供了一种根据前一轮语音信息对语音信息进行粗糙语义提取，得到对应于语音信息的粗糙语义特征的方法，如图3所示，该方法包括：Exemplarily, this embodiment provides a method for performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information. As shown in FIG. 3 , the method includes:
301:对前一轮语音信息进行检测,得到前一轮语音信息包含的至少一个第一词语。301: Detect the previous round of voice information, and obtain at least one first word contained in the previous round of voice information.
在本实施方式中，检测过程可以是将前一轮语音信息进行文本转换后进行分词，继而获取通过分词处理可以得到的所有词语作为该至少一个第一词语。同时，该至少一个第一词语中的每个第一词语可以包括词语标签，该词语标签可以是对应的第一词语的词性信息，例如：名词、动词、命名实体等。In this embodiment, the detection process may be to perform text conversion on the previous round of speech information and then perform word segmentation, and obtain all the words resulting from the word segmentation as the at least one first word. Meanwhile, each of the at least one first word may include a word tag, and the word tag may be the part-of-speech information of the corresponding first word, for example, a noun, a verb, or a named entity.
由此,在本实施方式中,可以通过条件随机场模型(Conditional Random Fields,CRF)对文本转换得到的文字文本中的命名实体信息进行提取,并通过CRF标注出命名实体的类型,比如人名或者机构名等。再使用词性标注工具(Part-Of-Speech,POS)对文字文本进行分词和词性标注,提取文字文本中的名词和动词。在这个过程中需要结合CRF和POS的综合结果,因为POS的识别只是在词上,而CRF可以是一个完整的短语,例如:我在上海复旦大学工作,CRF可以完整识别出“上海复旦大学”这个机构名实体,而POS只能识别出名词:“上海”、“复旦”和“大学”。所以在处理实体词上,如果POS的结果被CRF所包含,会优先使用CRF的结果,动词方面则只会使用POS的结果。由此,即可得到包含词性信息标注的第一词语。Therefore, in this embodiment, the named entity information in the text converted from the text can be extracted through the conditional random field model (Conditional Random Fields, CRF), and the type of the named entity, such as the name of the person or the name of the institution, can be marked through the CRF. Then use the part-of-speech tagging tool (Part-Of-Speech, POS) to perform word segmentation and part-of-speech tagging on the text, and extract the nouns and verbs in the text. In this process, it is necessary to combine the comprehensive results of CRF and POS, because the recognition of POS is only on words, and CRF can be a complete phrase. For example: I work at Fudan University in Shanghai, and CRF can fully recognize the institution name entity of "Shanghai Fudan University", while POS can only recognize nouns: "Shanghai", "Fudan" and "University". Therefore, when dealing with entity words, if the result of POS is included in CRF, the result of CRF will be used first, and the result of POS will only be used for verbs. Thus, the first word tagged with part-of-speech information can be obtained.
在可选的实施方式中，如果用户使用的语言为英语，则可以预先构造一个对应领域的动词和命名实体的集合，继而通过匹配的方式提取原句中的动词和命名实体；对英文名词的提取则可以同样使用POS进行名词识别和提取，继而得到包含词性信息标注的第一词语。In an optional implementation, if the language used by the user is English, a set of verbs and named entities of the corresponding domain can be constructed in advance, and the verbs and named entities in the original sentence can then be extracted by matching; for English nouns, POS can likewise be used for noun recognition and extraction, thereby obtaining the first words tagged with part-of-speech information.
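The CRF/POS merging rule described above (prefer the CRF entity when it covers a POS noun; take verbs only from POS) can be sketched as follows. The tag names and data shapes here are illustrative assumptions, not the application's actual interfaces.

```python
def merge_crf_pos(crf_entities, pos_tokens):
    """Merge CRF named-entity spans with POS tokens: a POS noun contained in
    a CRF entity phrase is dropped in favour of the full entity; verbs are
    taken from the POS result only.

    crf_entities: list of (phrase, entity_type), e.g. [("上海复旦大学", "ORG")]
    pos_tokens:   list of (word, tag) with tag "n" for nouns, "v" for verbs
    """
    words = list(crf_entities)
    for word, tag in pos_tokens:
        if tag == "n" and any(word in phrase for phrase, _ in crf_entities):
            continue  # noun already covered by a CRF entity: prefer the CRF result
        if tag in ("n", "v"):
            words.append((word, tag))
    return words
```

On the document's own example "我在上海复旦大学工作", the POS nouns "上海", "复旦", and "大学" are all contained in the CRF entity "上海复旦大学" and are therefore replaced by it, while the verb "工作" comes from POS.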
302:根据至少一个第一词语确定前一轮语音信息的时态信息。302: Determine temporal information of the previous round of voice information according to at least one first word.
在本实施方式中,可以将分词所得到的至少一个第一词语输入门控循环单元(Gate Recurrent Unit,GRU)编码器进行编码,得到第二隐藏层状态特征向量。继而将第二隐藏层状态特征向量输入多层感知器(MultiLayer Perceptron,MLP),得到线性输出结果。最后,将线性输出结果输入时态分类器,得到前一轮语音信息的时态信息。In this embodiment, at least one first word obtained by word segmentation can be input into a gate recurrent unit (Gate Recurrent Unit, GRU) encoder for encoding to obtain the second hidden layer state feature vector. Then input the state feature vector of the second hidden layer into a multilayer perceptron (MultiLayer Perceptron, MLP), and obtain a linear output result. Finally, input the linear output result into the temporal classifier to obtain the temporal information of the previous round of speech information.
具体而言，GRU的结构如图4所示，其中包括重置门 $r_t$、更新门 $z_t$、候选记忆单元 $\tilde{h}_t$ 和当前时刻记忆单元 $h_t$。Specifically, the structure of the GRU is shown in FIG. 4 , which includes a reset gate $r_t$, an update gate $z_t$, a candidate memory cell $\tilde{h}_t$, and a current-moment memory cell $h_t$.

具体而言，重置门 $r_t$ 的运行逻辑可以通过公式①进行表示：Specifically, the operating logic of the reset gate $r_t$ can be expressed by formula ①:

$$r_t=\sigma(W_r X_t+U_r h_{t-1}+b_r)\tag{①}$$

其中，$\sigma$ 是激活函数，$W_r$ 和 $U_r$ 是重置门 $r_t$ 对应的参数矩阵，初始化的值都是随机的，可以通过对模型的训练得到新的值；$b_r$ 是重置门 $r_t$ 对应的偏置，也是可训练的。Here, $\sigma$ is an activation function; $W_r$ and $U_r$ are the parameter matrices of the reset gate $r_t$, whose initial values are random and are updated by training the model; $b_r$ is the trainable bias of the reset gate $r_t$.

进一步的，更新门 $z_t$ 的运行逻辑可以通过公式②进行表示：Further, the operating logic of the update gate $z_t$ can be expressed by formula ②:

$$z_t=\sigma(W_z X_t+U_z h_{t-1}+b_z)\tag{②}$$

其中，$W_z$ 和 $U_z$ 是更新门 $z_t$ 对应的参数矩阵，初始化的值都是随机的，可以通过对模型的训练得到新的值；$b_z$ 是更新门 $z_t$ 对应的偏置，也是可训练的。Here, $W_z$ and $U_z$ are the parameter matrices of the update gate $z_t$, whose initial values are random and are updated by training; $b_z$ is the trainable bias of the update gate $z_t$.

进一步的，候选记忆单元 $\tilde{h}_t$ 的运行逻辑可以通过公式③进行表示：Further, the operating logic of the candidate memory cell $\tilde{h}_t$ can be expressed by formula ③:

$$\tilde{h}_t=\tanh\left(W X_t+U\left(r_t\odot h_{t-1}\right)+b\right)\tag{③}$$

其中，$\tanh$ 是激活函数，$W$ 和 $U$ 是候选记忆单元 $\tilde{h}_t$ 对应的参数矩阵，初始化的值都是随机的，可以通过对模型的训练得到新的值；$b$ 是候选记忆单元 $\tilde{h}_t$ 对应的偏置，也是可训练的。Here, $\tanh$ is the activation function; $W$ and $U$ are the parameter matrices of the candidate memory cell $\tilde{h}_t$, whose initial values are random and are updated by training; $b$ is the trainable bias of the candidate memory cell $\tilde{h}_t$.

进一步的，当前时刻记忆单元 $h_t$ 的运行逻辑可以通过公式④进行表示：Further, the operating logic of the current-moment memory cell $h_t$ can be expressed by formula ④:

$$h_t=z_t\odot h_{t-1}+\left(1-z_t\right)\odot\tilde{h}_t\tag{④}$$

其中，$z_t$ 为权重，是可训练的，$\odot$ 表示逐元素相乘。Here, $z_t$ acts as the weight and is trainable, and $\odot$ denotes element-wise multiplication.
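Formulas ①-④ can be sketched as a single NumPy update step. The parameter names follow the formulas; the shapes, initialization, and the dictionary layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, params):
    """One GRU step: reset gate r_t (①), update gate z_t (②),
    candidate memory (③), and current-moment memory h_t (④)."""
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    W,  U,  b  = params["W"],  params["U"],  params["b"]
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)           # formula ①
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)           # formula ②
    h_cand = np.tanh(W @ x_t + U @ (r_t * h_prev) + b)   # formula ③
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand            # formula ④
    return h_t
```

Because both gates pass through a sigmoid and the candidate through tanh, every component of the new hidden state stays in (-1, 1) when the previous state does.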
在本实施方式中,MLP的结构如图5所示,由两个线性层Linear和一个ReLu激活函数组成,在通过最后一个线性层输出线性输出结果后,会将该线性输出结果再次输入进softmax函数进行多标签分类,最后由时态分类器判断当前句子的时态。由此,避免了传统时态识别中单纯使用“过”、“着”、“了”等独立词引发的误识别和漏识别。比如:语音信息“我在跑步”是现在进行时,但因为没有包含“过”、“着”、“了”等独立词,所以在传统识别方式中会被漏识别。In this embodiment, the structure of MLP is shown in Figure 5. It consists of two linear layers Linear and a ReLu activation function. After outputting the linear output result through the last linear layer, the linear output result will be input into the softmax function again for multi-label classification, and finally the tense classifier is used to judge the tense of the current sentence. As a result, misrecognition and missed recognition caused by simply using independent words such as "Guo", "Zhe", and "Le" in traditional tense recognition are avoided. For example: the voice message "I'm running" is in the present continuous tense, but because it does not contain independent words such as "pass", "zhe" and "le", it will be missed in the traditional recognition method.
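The MLP of Figure 5 (two linear layers with a ReLU in between, followed by softmax for classification) can be sketched as below. The layer dimensions are arbitrary illustrative choices, and the integer label returned is simply the argmax over the softmax probabilities.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def tense_classifier(h, W1, b1, W2, b2):
    """Two linear layers with a ReLU in between, then softmax; the argmax
    index over the probabilities is the predicted tense label."""
    z = relu(W1 @ h + b1)      # first Linear layer + ReLU
    logits = W2 @ z + b2       # second Linear layer
    probs = softmax(logits)    # probability over tense classes
    return int(np.argmax(probs)), probs
```

In the described pipeline, `h` would be the second hidden layer state feature vector produced by the GRU encoder, and each output index would map to a tense class.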
303:将时态信息添加进每个第一词语的词语标签中,得到与至少一个第一词语一一对应的至少一个第二词语。303: Add temporal information to the word label of each first word to obtain at least one second word corresponding to at least one first word one-to-one.
简单而言,在本实施方式中,第二词语即为词语标签中添加了对应的语音信息的时态信息的第一词语。由此,使第二词语在携带语音本身的信息的基础上,还携带有相应的词性信息和语音的时态信息,使后续生成的回复语句更加精准。In short, in this embodiment, the second word is the first word in the word tag with the temporal information of the corresponding voice information added. Thus, the second word carries the corresponding part-of-speech information and tense information of the speech on the basis of carrying the information of the speech itself, so that the reply sentences generated subsequently are more accurate.
304:将至少一个第二词语输入粗糙编码器进行编码,得到与至少一个第二词语一一对应的至少一个粗糙上下文信息,和与至少一个第二词语一一对应的至少一个第一隐藏层状态特征向量。304: Input at least one second word into a rough encoder for encoding, and obtain at least one rough context information one-to-one corresponding to at least one second word, and at least one first hidden layer state feature vector one-to-one corresponding to at least one second word.
在本实施方式中,粗糙编码器可以是GRU编码器。具体而言,在编码时,将至少一个第二词语按照顺序依次输入GRU编码器,由编码器输出对应的粗糙上下文信息和第一隐藏层状态特征向量。在编码的过程中,输入GRU编码器的除了当前编码的第二词语外,还可以将上一次编码过程输出的第一隐藏层状态特征向量也作为当前编码的输入。即,对第x个第二词语编码时,可以将第x个第二词语,和第x-1个第一隐藏层状态特征向量输入GRU 编码器,得到第x个粗糙上下文信息和第x个第一隐藏层状态特征向量。且当x=1时,由于不存在第0个第二词语,此时,只将第1个第二词语输入GRU编码器进行编码即可。In this embodiment, the coarse encoder may be a GRU encoder. Specifically, during encoding, at least one second word is sequentially input into a GRU encoder, and the encoder outputs corresponding rough context information and a first hidden layer state feature vector. In the encoding process, in addition to the second word currently encoded, the input to the GRU encoder can also use the first hidden layer state feature vector output from the previous encoding process as the input of the current encoding. That is, when encoding the xth second word, the xth second word and the x-1th first hidden layer state feature vector can be input into the GRU encoder to obtain the xth rough context information and the xth first hidden layer state feature vector. And when x=1, since there is no 0th second word, at this time, only the 1st second word is input into the GRU encoder for encoding.
305：将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征。305: Input the at least one piece of rough context information and the at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of the speech information.
在本实施方式中,在提取粗糙语义特征时,对于语音信息而言,拆分所得的每一个第二词语的重要程度是不一样的。因此,在将粗糙上下文信息输入粗糙解码器前,可以对这些信息进行注意力处理,以获取各个粗糙上下文信息的重要程度。In this embodiment, when extracting rough semantic features, for speech information, the importance of each second word obtained by splitting is different. Therefore, before the coarse context information is fed into the coarse decoder, attention processing can be performed on these information to obtain the importance of each coarse context information.
示例性的，每一个第二词语在编码器中都分别对应了一个隐藏层状态特征向量，即第一隐藏层状态特征向量。简单而言，有多少个粗糙上下文信息就有多少个第一隐藏层状态特征向量。因此，可以将粗糙上下文信息输入进解码器，解码器在解码时会计算当前解码过程的特征向量(解码器当前解码过程的输出)与该粗糙上下文信息编码所得的隐藏层状态特征向量之间的相似度。由此，对每一个粗糙上下文信息都会计算出一个相似度的值，然后对这些相似度进行归一化，获得每一个粗糙上下文信息对应的权重。再将每一个粗糙上下文信息对应的权重与该粗糙上下文信息输入编码器获得的隐藏层状态特征向量进行相乘，得到注意力特征，再与该粗糙上下文信息输入解码器时获得的输出特征向量相加，得到该粗糙上下文信息输入解码器得到的最终特征。Exemplarily, each second word corresponds to a hidden layer state feature vector in the encoder, namely a first hidden layer state feature vector; in short, there are as many first hidden layer state feature vectors as there are pieces of rough context information. Therefore, the rough context information can be input into the decoder, and during decoding the decoder computes the similarity between the feature vector of the current decoding step (the output of the decoder's current decoding step) and the hidden layer state feature vector obtained by encoding that rough context information. Thus, a similarity value is computed for each piece of rough context information, and these similarities are then normalized to obtain the weight corresponding to each piece. The weight of each piece of rough context information is then multiplied by the hidden layer state feature vector obtained when it was input into the encoder, yielding the attention feature, which is then added to the output feature vector obtained when it was input into the decoder, giving the final feature for that piece of rough context information.
基于此，本实施方式提供了一种将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征的方法，如图6所示，该方法包括：Based on this, this embodiment provides a method of inputting at least one piece of rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of the speech information. As shown in FIG. 6 , the method includes:
601：在第i次解码处理中，将输入特征向量 $A_i$ 输入粗糙解码器，得到输出特征向量 $B_i$。601: In the i-th decoding process, input the input feature vector $A_i$ into the rough decoder to obtain the output feature vector $B_i$.

在本实施方式中，i为大于或等于1且小于或等于j的整数，j为至少一个粗糙上下文信息的数量，j为大于或等于1的整数；当i=1时，输入特征向量 $A_i$ 为至少一个粗糙上下文信息中的第1个粗糙上下文信息。In this embodiment, i is an integer greater than or equal to 1 and less than or equal to j, where j is the number of the at least one piece of rough context information and is an integer greater than or equal to 1; when i=1, the input feature vector $A_i$ is the first piece of rough context information among the at least one piece.

602：计算输出特征向量 $B_i$ 和至少一个第一隐藏层状态特征向量中第i个第一隐藏层状态特征向量 $C_i$ 之间的相似度 $D_i$。602: Calculate the similarity $D_i$ between the output feature vector $B_i$ and the i-th first hidden layer state feature vector $C_i$ among the at least one first hidden layer state feature vector.

在本实施方式中，可以通过计算输出特征向量 $B_i$ 和第i个第一隐藏层状态特征向量 $C_i$ 之间的余弦相似度来得到相似度 $D_i$。In this embodiment, the similarity $D_i$ can be obtained by calculating the cosine similarity between the output feature vector $B_i$ and the i-th first hidden layer state feature vector $C_i$.

603：对相似度 $D_i$ 进行归一化处理，得到输入特征向量 $A_i$ 的权重 $E_i$。603: Normalize the similarity $D_i$ to obtain the weight $E_i$ of the input feature vector $A_i$.

在本实施方式中，可以将相似度 $D_i$ 输入softmax函数进行归一化处理，得到输入特征向量 $A_i$ 的权重 $E_i$。In this embodiment, the similarity $D_i$ can be input into the softmax function for normalization to obtain the weight $E_i$ of the input feature vector $A_i$.

604：将权重 $E_i$ 与第i个第一隐藏层状态特征向量 $C_i$ 相乘，得到权重特征向量 $F_i$。604: Multiply the weight $E_i$ by the i-th first hidden layer state feature vector $C_i$ to obtain the weight feature vector $F_i$.

605：将权重特征向量 $F_i$ 与输出特征向量 $B_i$ 相加，得到目标输出特征向量 $G_i$。605: Add the weight feature vector $F_i$ to the output feature vector $B_i$ to obtain the target output feature vector $G_i$.

606：将目标输出特征向量 $G_i$ 作为第i+1次解码处理的输入特征向量 $A_{i+1}$ 进行第i+1次解码处理，直至完成多次解码处理，得到语音信息的粗糙语义特征。606: Use the target output feature vector $G_i$ as the input feature vector $A_{i+1}$ of the (i+1)-th decoding process and perform the (i+1)-th decoding process, until the multiple decoding processes are completed and the rough semantic features of the speech information are obtained.
具体而言,在多次解码处理的过程中,上一时刻的输出会作为下一时刻的输入,直至进行完多次解码处理后,得到的最终输出,即为语音信息的粗糙语义特征。Specifically, in the process of multiple decoding processes, the output at the previous moment will be used as the input at the next moment, until the final output obtained after multiple decoding processes is the rough semantic feature of the speech information.
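Steps 601-606 can be sketched as the following loop. Here `decoder_step` is a placeholder for the rough decoder, and normalizing each similarity by a softmax over the scores against all encoder hidden states is one reading of the normalization in step 603; both are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attentive_decode(contexts, enc_hiddens, decoder_step):
    """Run the decoding passes of steps 601-606: only the first coarse
    context seeds the loop, since each target output G_i becomes the next
    input A_{i+1}."""
    a = contexts[0]                                           # A_1 (601)
    for i, c in enumerate(enc_hiddens):
        b = decoder_step(a)                                   # 601: B_i
        sims = np.array([cosine(b, h) for h in enc_hiddens])  # 602: similarities
        e = softmax(sims)[i]                                  # 603: weight E_i
        f = e * c                                             # 604: F_i = E_i * C_i
        g = f + b                                             # 605: G_i = F_i + B_i
        a = g                                                 # 606: A_{i+1} = G_i
    return a  # final output: the rough semantic feature
```

A usage sketch with an identity decoder shows the loop runs once per encoder hidden state and returns a vector of the same dimensionality.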
203:对语音信息进行分词处理,得到关键词组。203: Perform word segmentation processing on the speech information to obtain keyword groups.
在本实施方式中,可以将语音信息转化为文字文本,继而对文字文本进行切分处理,得到至少一个第一关键词。然后,将至少一个第一关键词中任意两个不同的第一相邻词和第二相邻词进行组合,得到至少一个第二关键词,该第一相邻词和第二相邻词之间的字段间隔是小于第一阈值的。In this embodiment, the voice information may be converted into text, and then the text may be segmented to obtain at least one first keyword. Then, at least one second keyword is obtained by combining any two different first adjacent words and second adjacent words in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is smaller than the first threshold.
具体而言，第一相邻词和第二相邻词为至少一个第一关键词中字段间隔小于第一阈值的任意两个不同的相邻词语，该字段间隔可以理解为第一相邻词和第二相邻词在其对应的文字文本中相应位置之间的字符数量。示例性的，对于文字文本“上海市与2016年开园的迪士尼乐园坐落于浦东新区”，经过分词并筛选后可以得到第一关键词：“上海市”、“2016年”、“迪士尼”、“乐园”、“浦东”和“新区”。此时，第一关键词“2016年”和“迪士尼”在文字文本中相应位置之间的字符数量为3，所以第一关键词“2016年”和“迪士尼”之间的字符距离为3。而第一关键词“迪士尼”和“乐园”在文字文本中相应位置之间的字符数量为0，所以第一关键词“迪士尼”和“乐园”之间的字符距离为0。Specifically, the first adjacent word and the second adjacent word are any two different adjacent words among the at least one first keyword whose field interval is smaller than the first threshold, where the field interval can be understood as the number of characters between the corresponding positions of the first adjacent word and the second adjacent word in the text. Exemplarily, for the text "上海市与2016年开园的迪士尼乐园坐落于浦东新区", the first keywords obtained after word segmentation and screening are "上海市", "2016年", "迪士尼", "乐园", "浦东", and "新区". Here, the number of characters between the corresponding positions of the first keywords "2016年" and "迪士尼" in the text is 3, so the character distance between them is 3; the number of characters between the corresponding positions of "迪士尼" and "乐园" is 0, so the character distance between them is 0.
在本实施方式中，可以将第一阈值设置为1。由此，以上述文字文本“上海市与2016年开园的迪士尼乐园坐落于浦东新区”为例，满足要求的第一关键词为：“迪士尼”和“乐园”，以及“浦东”和“新区”。由此，可以得到第二关键词“迪士尼乐园”和“浦东新区”。In this embodiment, the first threshold can be set to 1. Thus, taking the above text "上海市与2016年开园的迪士尼乐园坐落于浦东新区" as an example, the first keywords that meet the requirement are "迪士尼" and "乐园", as well as "浦东" and "新区". Thus, the second keywords "迪士尼乐园" and "浦东新区" can be obtained.
然后,将至少一个第二关键词中的每个第二关键词与预设的实体库进行匹配,并筛除匹配失败的第二关键词,得到至少一个第三关键词。再在至少一个第一关键词中,将组成至少一个第三关键词中的每个第三关键词的第一关键词删除,得到至少一个第四关键词。Then, each second keyword in the at least one second keyword is matched with a preset entity library, and second keywords that fail to be matched are screened out to obtain at least one third keyword. In the at least one first keyword, the first keyword constituting each third keyword in the at least one third keyword is deleted to obtain at least one fourth keyword.
具体而言，第四关键词即为剔除了组成至少一个第三关键词中的每个第三关键词的第一关键词后剩下的第一关键词。示例性的，沿用上述文字文本“上海市与2016年开园的迪士尼乐园坐落于浦东新区”的示例，假定确定的第三关键词为“迪士尼乐园”，则由于第三关键词“迪士尼乐园”是由第一关键词“迪士尼”和“乐园”组成的，因此，将第一关键词“迪士尼”和“乐园”从原来得到的若干个第一关键词：“上海市”、“2016年”、“迪士尼”、“乐园”、“浦东”和“新区”中剔除，则剩下的第一关键词：“上海市”、“2016年”、“浦东”和“新区”即为第四关键词。Specifically, the fourth keywords are the first keywords remaining after removing the first keywords that make up each of the at least one third keyword. Exemplarily, following the example of the text "上海市与2016年开园的迪士尼乐园坐落于浦东新区", assuming the determined third keyword is "迪士尼乐园", since it is composed of the first keywords "迪士尼" and "乐园", these two are removed from the originally obtained first keywords "上海市", "2016年", "迪士尼", "乐园", "浦东", and "新区"; the remaining first keywords "上海市", "2016年", "浦东", and "新区" are the fourth keywords.
最后,将至少一个第三关键词和至少一个第四关键词进行组合,得到关键词组。Finally, at least one third keyword and at least one fourth keyword are combined to obtain a keyword group.
具体而言,沿用上述文字文本“上海市与2016年开园的迪士尼乐园坐落于浦东新区”的示例,将第三关键词“迪士尼乐园”和第四关键词:“上海市”、“2016年”、“浦东”和“新区”进行组合,即可得到关键词组:“上海市”、“2016年”、“迪士尼乐园”、“浦东”和“新区”。Specifically, following the example of the above-mentioned text "Shanghai and Disneyland, which opened in 2016, are located in Pudong New District", combine the third keyword "Disneyland" with the fourth keyword: "Shanghai City", "2016", "Pudong" and "New District" to obtain the keyword group: "Shanghai City", "2016", "Disneyland", "Pudong" and "New District".
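The keyword-group construction of step 203 (merging adjacent first keywords under a character-gap threshold, matching merged words against the entity library, removing their components, and combining the rest) can be sketched as follows. Merging only list-adjacent keywords and the simple `find`-based position lookup are simplifying assumptions.

```python
def build_keyword_group(text, first_keywords, entity_lib, threshold=1):
    """Merge adjacent first keywords whose character gap in `text` is below
    `threshold` into second keywords, keep those found in `entity_lib` as
    third keywords, drop their component words, and combine the result with
    the remaining (fourth) keywords."""
    merged = []
    for w1, w2 in zip(first_keywords, first_keywords[1:]):
        i1 = text.find(w1)
        i2 = text.find(w2, i1 + len(w1))
        # character gap = characters between the end of w1 and the start of w2
        if i2 != -1 and i2 - (i1 + len(w1)) < threshold:
            merged.append(w1 + w2)
    third = [w for w in merged if w in entity_lib]           # entity-library match
    used = {p for t in third for p in first_keywords if p in t}
    fourth = [w for w in first_keywords if w not in used]    # leftover components removed
    return third + fourth
```

On the document's own example sentence, only "迪士尼"+"乐园" and "浦东"+"新区" have a gap below 1, reproducing the keyword group from the text.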
204:对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量。204: Perform multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector.
在本实施方式中,关键词组可以包括至少一个关键词,且至少一个关键词按照至少一个关键词中的每个关键词在语音信息中的先后位置顺序进行排列。基于此,本实施方式提供了一种对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量的方法,具体如下:In this embodiment, the keyword group may include at least one keyword, and the at least one keyword is arranged according to the order of each keyword in the at least one keyword in the voice information. Based on this, this embodiment provides a method for performing multiple hidden feature extraction processing on keyword groups to obtain the initial hidden layer state feature vector, specifically as follows:
在第n次隐藏特征提取处理中，将第一输入隐藏特征 $H_n$ 输入GRU编码器，得到第一输出隐藏特征 $I_n$，其中，n为大于或等于1且小于或等于m的整数，m为至少一个关键词的数量，m为大于或等于1的整数；当n=1时，第一输入隐藏特征 $H_n$ 为至少一个关键词中的第1个关键词；将第一输出隐藏特征 $I_n$ 作为第n+1次隐藏特征提取处理的第一输入隐藏特征 $H_{n+1}$ 进行第n+1次隐藏特征提取处理，直至进行多次隐藏特征提取处理后，得到初始隐藏层状态特征向量。In the n-th hidden feature extraction process, the first input hidden feature $H_n$ is input into the GRU encoder to obtain the first output hidden feature $I_n$, where n is an integer greater than or equal to 1 and less than or equal to m, and m is the number of the at least one keyword and is an integer greater than or equal to 1; when n=1, the first input hidden feature $H_n$ is the first keyword among the at least one keyword; the first output hidden feature $I_n$ is used as the first input hidden feature $H_{n+1}$ of the (n+1)-th hidden feature extraction process to perform that process, until the multiple hidden feature extraction processes are completed and the initial hidden layer state feature vector is obtained.
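The recurrence of step 204 — the n-th output hidden feature becomes the (n+1)-th input hidden feature — reduces to a simple fold over the keyword list. Here `embed` and `gru_step` are placeholders for the word embedding and the GRU encoder step.

```python
def extract_initial_hidden(keywords, embed, gru_step, h0):
    """Feed the keywords through the encoder in order; the output of step n
    is the input of step n+1, and the final output is the initial
    hidden-layer state feature vector."""
    h = h0
    for w in keywords:
        h = gru_step(embed(w), h)
    return h
```

With toy placeholders the fold behaves as expected: each keyword updates the running state exactly once, in order.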
205:根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词。205: Perform multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word.
在本实施方式中,在第p次回复词生成处理时,可以将输入词向量K p、第二输入隐藏特征L p和粗糙语义特征输入门控循环单元解码器,得到回复词O p和第二输出隐藏特征R p,其中,p为大于或等于1,且小于或等于q的整数,q由语音信息决定为大于或等于1的整数,当p=1时,输入词向量K p为初始隐藏层状态特征向量。然后,对回复词O p进行词嵌入处理,得到回复词向量S p。最后,将回复词向量S p作为第p+1次回复词生成处理的输入词向量K p+1,将第二输出隐藏特征R p作为第p+1次回复词生成处理的第二输入隐藏特征L p+1进行第p+1次回复词生成处理,直至进行多次回复词生成处理后,得到至少一个回复词。 In this embodiment, during the p-time reply word generation process, the input word vector K p , the second input hidden feature L p and the rough semantic feature can be input into the gated recurrent unit decoder to obtain the reply word Op and the second output hidden feature R p , wherein p is an integer greater than or equal to 1 and less than or equal to q, and q is determined as an integer greater than or equal to 1 by the voice information. When p=1, the input word vector K p is the initial hidden layer state feature vector. Then, word embedding is performed on the reply word O p to obtain the reply word vector S p . Finally, the reply word vector S p is used as the input word vector K p +1 of the p+1-th reply word generation process, and the second output hidden feature R p is used as the second input hidden feature L p +1 of the p+1-th reply word generation process for the p+1-th reply word generation process until at least one reply word is obtained after multiple reply word generation processes.
具体而言，如图7所示，生成过程是每次生成一个回复词，在第p次生成回复词O p，然后在第p+1次生成回复词O p+1。但是，在第p+1次的时候会把上一次(即第p次)生成的回复词O p的词向量也作为第p+1次的输入之一。而另一个输入就是粗糙语义特征，即回复词O p+1是由回复词O p的词向量、第p次生成的第二输出隐藏特征R p以及粗糙语义特征三者共同决定的。 Specifically, as shown in FIG. 7, the generation process produces one reply word at a time: the reply word O p is generated at the p-th time, and the reply word O p+1 at the (p+1)-th time. At the (p+1)-th time, however, the word vector of the reply word O p generated at the previous (i.e., the p-th) time is also used as one of the inputs. The other input is the rough semantic feature; that is, the reply word O p+1 is jointly determined by the word vector of the reply word O p, the second output hidden feature R p generated at the p-th time, and the rough semantic feature.
206:将至少一个回复词按照至少一个回复词中每个回复词的生成顺序进行拼接,得到语音信息的回复语句。206: Concatenate at least one reply word according to the generation sequence of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
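Steps 205 and 206 can be sketched together as a toy greedy decoding loop. This is an illustrative stand-in rather than the application's gated recurrent unit decoder: `decoder_step` mixes the three inputs named above (previous word vector, previous hidden feature, rough semantic feature) with a simple tanh, and the vocabulary, the one-hot embedding, and the `<eos>` stop token are all assumptions; q emerges from when `<eos>` is produced or the length cap is reached.

```python
import numpy as np

VOCAB = ["<eos>", "你好", "请", "稍等", "谢谢"]   # assumed toy vocabulary

def decoder_step(word_vec, hidden, rough_feat, W):
    """Toy stand-in for the GRU decoder: the next hidden feature mixes the
    input word vector K_p, the second input hidden feature L_p and the rough
    semantic feature; a linear layer W scores the vocabulary."""
    hidden = np.tanh(word_vec + hidden + rough_feat)   # second output hidden feature R_p
    return int(np.argmax(W @ hidden)), hidden          # reply word O_p and R_p

def embed(word_id, dim):
    """Toy word embedding producing the reply word vector S_p."""
    v = np.zeros(dim)
    v[word_id % dim] = 1.0
    return v

def generate_reply(init_state, rough_feat, W, max_len=10):
    """Generate reply words one at a time, then splice them in generation order."""
    words, hidden = [], np.zeros_like(init_state)
    word_vec = init_state                    # K_1 is the initial hidden layer state vector
    for _ in range(max_len):                 # q is bounded by max_len here
        word_id, hidden = decoder_step(word_vec, hidden, rough_feat, W)
        if VOCAB[word_id] == "<eos>":        # assumed stop token ends generation
            break
        words.append(VOCAB[word_id])
        word_vec = embed(word_id, init_state.size)   # S_p becomes K_{p+1}
    return "".join(words)                    # reply sentence, words in generation order
```

The final `"".join(words)` is the splicing of step 206: the reply words are concatenated strictly in the order they were generated.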
综上所述，本申请所提供的基于粗糙语义的回复语句确定方法中，通过获取用户当前时刻的语音信息的前一轮语音信息，继而对该前一轮语音信息进行粗糙语义提取，得到可以包含该前一轮语音信息中高层次的抽象信息的语义特征，作为用户当前时刻的语音信息的粗糙语义特征，由此，实现了对前一轮语音信息中关键信息和粗糙信息的同步提取。然后，对用户当前时刻的语音信息进行分词处理，并对得到的多个关键词进行多次隐藏特征提取处理，得到用户当前时刻的语音信息的初始隐藏层状态特征向量。最后，根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理，将得到的至少一个回复词按照至少一个回复词中每个回复词的生成顺序进行拼接，得到语音信息的回复语句。基于此，将同时包含前一轮对话中的关键信息和粗糙信息的粗糙语义特征作为本轮对话中回复语句的生成依据之一，使回复语句生成过程包含了前一轮对话的更加全面的信息特征。由此，生成的回复语句的精准度更高，可以与对话的主体更好地契合，提升用户体验。To sum up, in the method for determining reply sentences based on rough semantics provided by this application, by obtaining the previous round of voice information of the user's current voice information, and then performing rough semantic extraction on the previous round of voice information, the semantic features that can contain the high-level abstract information in the previous round of voice information are obtained, which are used as the rough semantic features of the user's current voice information, thereby realizing synchronous extraction of key information and rough information in the previous round of voice information. Then, word segmentation is performed on the voice information of the user at the current moment, and multiple hidden feature extraction processes are performed on the obtained multiple keywords to obtain the initial hidden layer state feature vector of the voice information of the user at the current moment. Finally, according to the rough semantic features and the initial hidden layer state feature vector, multiple reply words are generated, and the obtained at least one reply word is spliced according to the generation order of each reply word in the at least one reply word to obtain the reply sentence of the voice information. 
Based on this, the rough semantic features containing both key information and rough information in the previous round of dialogue are used as one of the basis for generating reply sentences in the current round of dialogue, so that the reply sentence generation process includes more comprehensive information features of the previous round of dialogue. As a result, the generated reply sentences are more accurate, can better fit with the main body of the dialogue, and improve user experience.
参阅图8,图8为本申请实施方式提供的一种基于粗糙语义的回复语句确定装置的功能模块组成框图。如图8所示,该基于粗糙语义的回复语句确定装置800包括:Referring to FIG. 8 , FIG. 8 is a block diagram of functional modules of an apparatus for determining reply sentences based on rough semantics provided in an embodiment of the present application. As shown in FIG. 8, the device 800 for determining a reply sentence based on rough semantics includes:
获取模块801,用于根据用户当前时刻的语音信息的发生时间,获取与语音信息相邻的前一轮语音信息,其中,前一轮语音信息的发生时间小于语音信息的发生时间,且前一轮语音信息的发生时间与语音信息的发生时间之间的差值的绝对值最小;The acquisition module 801 is used to acquire the previous round of voice information adjacent to the voice information according to the occurrence time of the voice information at the user's current moment, wherein the occurrence time of the previous round of voice information is less than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
处理模块802,用于根据前一轮语音信息对语音信息进行粗糙语义提取,得到对应于语音信息的粗糙语义特征,对语音信息进行分词处理,得到关键词组,并对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量;The processing module 802 is used to perform rough semantic extraction on the voice information according to the previous round of voice information, obtain rough semantic features corresponding to the voice information, perform word segmentation processing on the voice information, obtain keyword groups, and perform multiple hidden feature extraction processing on the keyword groups to obtain initial hidden layer state feature vectors;
生成模块803,用于根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词,并将至少一个回复词按照至少一个回复词中每个回复词的生成顺序进行拼接,得到语音信息的回复语句。The generation module 803 is used to perform multiple reply word generation processing according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word, and splice at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
在本申请的实施方式中,在根据前一轮语音信息对语音信息进行粗糙语义提取,得到 对应于语音信息的粗糙语义特征方面,处理模块802,具体用于:In the embodiment of the present application, in performing rough semantic extraction on the voice information according to the previous round of voice information, and obtaining rough semantic features corresponding to the voice information, the processing module 802 is specifically used for:
对前一轮语音信息进行检测,得到前一轮语音信息包含的至少一个第一词语,其中,至少一个第一词语中的每个第一词语包括词语标签;Detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word includes a word label;
根据至少一个第一词语确定前一轮语音信息的时态信息;determining the temporal information of the previous round of speech information according to at least one first word;
将时态信息添加进每个第一词语的词语标签中,得到至少一个第二词语,其中,至少一个第二词语与至少一个第一词语一一对应;adding temporal information into the word tags of each first word to obtain at least one second word, wherein at least one second word is in one-to-one correspondence with at least one first word;
将至少一个第二词语输入粗糙编码器进行编码,得到至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量,其中,至少一个粗糙上下文信息与至少一个第二词语一一对应,至少一个第一隐藏层状态特征向量与至少一个第二词语一一对应;Inputting at least one second word into a rough encoder for encoding to obtain at least one rough context information and at least one first hidden layer state feature vector, wherein at least one rough context information is in one-to-one correspondence with at least one second word, and at least one first hidden layer state feature vector is in one-to-one correspondence with at least one second word;
将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征。Inputting at least one piece of rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of speech information.
在本申请的实施方式中,在根据至少一个第一词语确定前一轮语音信息的时态信息方面,处理模块802,具体用于:In the embodiment of the present application, in terms of determining the temporal information of the previous round of speech information according to at least one first word, the processing module 802 is specifically used to:
将至少一个第一词语输入门控循环单元编码器进行编码,得到第二隐藏层状态特征向量;Inputting at least one first word into a gated recurrent unit encoder for encoding to obtain a second hidden layer state feature vector;
将第二隐藏层状态特征向量输入多层感知器,得到线性输出结果;Input the state feature vector of the second hidden layer into the multi-layer perceptron to obtain a linear output result;
将线性输出结果输入时态分类器,得到前一轮语音信息的时态信息。Input the linear output result into the temporal classifier to obtain the temporal information of the previous round of speech information.
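The three-stage temporal pipeline above (gated recurrent unit encoding, then a multilayer perceptron, then a temporal classifier) can be sketched as follows. The GRU stage is elided: the input is taken to be the second hidden layer state feature vector already produced by the encoder. The tense label set, the two-layer MLP shapes, and the softmax classifier are assumptions made for illustration, not details fixed by the application.

```python
import numpy as np

TENSES = ["past", "present", "future"]   # assumed tense label set

def mlp(h, W1, b1, W2, b2):
    """Multilayer perceptron producing the linear output result."""
    return W2 @ np.maximum(W1 @ h + b1, 0.0) + b2

def classify_tense(hidden_state, W1, b1, W2, b2):
    """Map the second hidden layer state feature vector to a tense label
    via the MLP's linear output and a softmax temporal classifier."""
    logits = mlp(hidden_state, W1, b1, W2, b2)
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    return TENSES[int(np.argmax(probs))]
```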
在本申请的实施方式中，在将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征方面，处理模块802，具体用于：In the embodiment of the present application, in terms of inputting the at least one rough context information and the at least one first hidden layer state feature vector into the rough decoder for multiple decoding processes to obtain the rough semantic features of the voice information, the processing module 802 is specifically used to:
在第i次解码处理中,将输入特征向量A i输入粗糙解码器,得到输出特征向量B i,其中,i为大于或等于1,且小于或等于j的整数,j为至少一个粗糙上下文信息的数量,j为大于或等于1的整数,当i=1时,输入特征向量A i为至少一个粗糙上下文信息中的第1个粗糙上下文信息; In the i-th decoding process, the input feature vector A i is input to the rough decoder to obtain the output feature vector B i , wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the quantity of at least one rough context information, j is an integer greater than or equal to 1, when i=1, the input feature vector A i is the first rough context information in at least one rough context information;
计算输出特征向量B i和至少一个第一隐藏层状态特征向量中第i个第一隐藏层状态特征向量C i之间的相似度D iCalculating the similarity D i between the output feature vector B i and the i-th first hidden layer state feature vector C i in at least one first hidden layer state feature vector;
对相似度D i进行归一化处理,得到输入特征向量A i的权重E iNormalize the similarity D i to obtain the weight E i of the input feature vector A i ;
将权重E i与第i个第一隐藏层状态特征向量C i相乘,得到权重特征向量F iMultiply the weight E i with the i-th first hidden layer state feature vector C i to obtain the weight feature vector F i ;
将权重特征向量F i与输出特征向量B i相加,得到目标输出特征向量G iAdd the weight feature vector F i to the output feature vector B i to get the target output feature vector G i ;
将目标输出特征向量G i作为第i+1次解码处理的输入特征向量A i+1进行第i+1次解码处理，直至进行多次解码处理，得到语音信息的粗糙语义特征。 The target output feature vector G i is used as the input feature vector A i+1 of the (i+1)-th decoding process to perform the (i+1)-th decoding process, until the multiple decoding processes are performed and the rough semantic features of the speech information are obtained.
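The i-th decoding step above maps directly onto a short loop. The sketch below is made under stated assumptions: `decode_step` stands in for the rough decoder, the similarity D i is taken as cosine similarity, and, since a single scalar is being normalized at each step, the normalization yielding E i is sketched as a logistic squashing; the application does not pin down either choice. Note that, as in the text, only the first rough context information is fed in directly; later steps receive the previous target output G i.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, guarded against zero-norm vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rough_decode(first_context, hidden_states, decode_step):
    """B_i = decode(A_i); D_i = sim(B_i, C_i); E_i = norm(D_i);
    F_i = E_i * C_i; G_i = B_i + F_i; and G_i is the next input A_{i+1}."""
    a = first_context                  # A_1: the first rough context information
    for c in hidden_states:            # C_1 .. C_j, j = number of contexts
        b = decode_step(a)             # output feature vector B_i
        d = cosine(b, c)               # similarity D_i
        e = 1.0 / (1.0 + np.exp(-d))   # weight E_i (normalization choice assumed)
        f = e * c                      # weighted feature vector F_i
        a = b + f                      # target output G_i feeds step i+1
    return a                           # rough semantic feature after j steps
```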
在本申请的实施方式中,关键词组包括至少一个关键词,且至少一个关键词按照至少一个关键词中的每个关键词在语音信息中的先后位置顺序进行排列。基于此,在对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量方面,处理模块802,具体用于:In the implementation manner of the present application, the keyword group includes at least one keyword, and the at least one keyword is arranged according to the sequence of each keyword in the at least one keyword in the voice information. Based on this, the processing module 802 is specifically used for:
在第n次隐藏特征提取处理中，将第一输入隐藏特征H n输入门控循环单元编码器，得到第一输出隐藏特征I n，其中，n为大于或等于1，且小于或等于m的整数，m为至少一个关键词的数量，m为大于或等于1的整数，当n=1时，第一输入隐藏特征H n为至少一个关键词中的第1个关键词；In the n-th hidden feature extraction process, the first input hidden feature H n is input into the gated recurrent unit encoder to obtain the first output hidden feature I n, where n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, and m is an integer greater than or equal to 1; when n=1, the first input hidden feature H n is the first keyword in the at least one keyword;
将第一输出隐藏特征I n作为第n+1次隐藏特征提取处理的第一输入隐藏特征H n+1进行第n+1次隐藏特征提取处理，直至进行多次隐藏特征提取处理后，得到初始隐藏层状态特征向量。 The first output hidden feature I n is used as the first input hidden feature H n+1 of the (n+1)-th hidden feature extraction process to perform the (n+1)-th hidden feature extraction process, until the multiple hidden feature extraction processes are performed and the initial hidden layer state feature vector is obtained.
在本申请的实施方式中,在根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词方面,生成模块803,具体用于:In the embodiment of the present application, in terms of generating at least one reply word by performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector, the generating module 803 is specifically used for:
在第p次回复词生成处理时，将输入词向量K p、第二输入隐藏特征L p和粗糙语义特征输入门控循环单元解码器，得到回复词O p和第二输出隐藏特征R p，其中，p为大于或等于1，且小于或等于q的整数，q为由语音信息决定的大于或等于1的整数，当p=1时，输入词向量K p为初始隐藏层状态特征向量；In the p-th reply word generation process, the input word vector K p, the second input hidden feature L p and the rough semantic feature are input into the gated recurrent unit decoder to obtain the reply word O p and the second output hidden feature R p, where p is an integer greater than or equal to 1 and less than or equal to q, and q is an integer greater than or equal to 1 that is determined by the voice information; when p=1, the input word vector K p is the initial hidden layer state feature vector;
对回复词O p进行词嵌入处理,得到回复词向量S pPerform word embedding processing on the reply word O p to obtain the reply word vector S p ;
将回复词向量S p作为第p+1次回复词生成处理的输入词向量K p+1，将第二输出隐藏特征R p作为第p+1次回复词生成处理的第二输入隐藏特征L p+1进行第p+1次回复词生成处理，直至进行多次回复词生成处理后，得到至少一个回复词。 The reply word vector S p is used as the input word vector K p+1 of the (p+1)-th reply word generation process, and the second output hidden feature R p is used as the second input hidden feature L p+1 of the (p+1)-th reply word generation process to perform the (p+1)-th reply word generation process, until at least one reply word is obtained after the multiple reply word generation processes.
在本申请的实施方式中,在对语音信息进行分词处理,得到关键词组方面,处理模块802,具体用于:In the embodiment of the present application, the processing module 802 is specifically used for:
将语音信息转化为文字文本,并对文字文本进行切分处理,得到至少一个第一关键词;Converting the speech information into text, and performing segmentation processing on the text to obtain at least one first keyword;
将第一相邻词和第二相邻词进行组合，得到至少一个第二关键词，其中，第一相邻词和第二相邻词为至少一个第一关键词中任意两个不同的第一关键词，且第一相邻词和第二相邻词之间的字段间隔小于第一阈值；Combining the first adjacent word and the second adjacent word to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is less than the first threshold;
将至少一个第二关键词中的每个第二关键词与预设的实体库进行匹配,并筛除匹配失败的第二关键词,得到至少一个第三关键词;Match each second keyword in the at least one second keyword with a preset entity library, and filter out the second keywords that fail to match, to obtain at least one third keyword;
在至少一个第一关键词中,将组成至少一个第三关键词中的每个第三关键词的第一关键词删除,得到至少一个第四关键词;In the at least one first keyword, the first keyword forming each third keyword in the at least one third keyword is deleted to obtain at least one fourth keyword;
将至少一个第三关键词和至少一个第四关键词进行组合,得到关键词组。Combining at least one third keyword and at least one fourth keyword to obtain a keyword group.
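The five-step keyword-group construction above (segment, combine adjacent words, match against the entity library, drop absorbed constituents, merge) can be sketched in a few lines. The adjacency threshold `max_gap`, the plain string concatenation used to combine adjacent words, and the in-memory set standing in for the preset entity library are all illustrative assumptions.

```python
def build_keyword_group(tokens, entity_lib, max_gap=1):
    """tokens: first keywords from segmenting the transcribed text;
    entity_lib: set standing in for the preset entity library."""
    firsts = list(tokens)                                   # first keywords
    pairs = [(a, b) for i, a in enumerate(firsts)
             for b in firsts[i + 1:i + 1 + max_gap]]        # adjacent combinations (second keywords)
    thirds = [a + b for a, b in pairs if a + b in entity_lib]   # entity matches (third keywords)
    used = {w for a, b in pairs if a + b in thirds for w in (a, b)}
    fourths = [w for w in firsts if w not in used]          # leftover first keywords (fourth keywords)
    return thirds + fourths                                 # keyword group
```

For example, with tokens `["平安", "科技", "你好"]` and an entity library containing `"平安科技"`, the two constituents are absorbed into the matched entity and only `"你好"` survives as a fourth keyword.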
参阅图9,图9为本申请实施方式提供的一种电子设备的结构示意图。如图9所示,电子设备900包括收发器901、处理器902和存储器903。它们之间通过总线904连接。存储器903用于存储计算机程序和数据,并可以将存储器903存储的数据传输给处理器902。Referring to FIG. 9 , FIG. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in FIG. 9 , an electronic device 900 includes a transceiver 901 , a processor 902 and a memory 903 . They are connected through a bus 904 . The memory 903 is used to store computer programs and data, and can transmit the data stored in the memory 903 to the processor 902 .
处理器902用于读取存储器903中的计算机程序执行以下操作:The processor 902 is used to read the computer program in the memory 903 to perform the following operations:
根据用户当前时刻的语音信息的发生时间,获取与语音信息相邻的前一轮语音信息,其中,前一轮语音信息的发生时间小于语音信息的发生时间,且前一轮语音信息的发生时间与语音信息的发生时间之间的差值的绝对值最小;According to the time of occurrence of the voice information of the user at the current moment, the previous round of voice information adjacent to the voice information is obtained, wherein the time of occurrence of the previous round of voice information is less than the time of occurrence of the voice information, and the absolute value of the difference between the time of occurrence of the previous round of voice information and the time of occurrence of the voice information is the smallest;
根据前一轮语音信息对语音信息进行粗糙语义提取,得到对应于语音信息的粗糙语义特征;Perform rough semantic extraction on the voice information according to the previous round of voice information, and obtain rough semantic features corresponding to the voice information;
对语音信息进行分词处理,得到关键词组;Perform word segmentation processing on the voice information to obtain keyword groups;
对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量;Perform multiple hidden feature extraction processing on the keyword group to obtain the initial hidden layer state feature vector;
根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词;Perform multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word;
将至少一个回复词按照至少一个回复词中每个回复词的生成顺序进行拼接,得到语音信息的回复语句。The at least one reply word is spliced according to the generation sequence of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
在本申请的实施方式中,在根据前一轮语音信息对语音信息进行粗糙语义提取,得到对应于语音信息的粗糙语义特征方面,处理器902,具体用于执行以下操作:In the embodiment of the present application, the processor 902 is specifically configured to perform the following operations in terms of performing rough semantic extraction on the voice information based on the previous round of voice information to obtain rough semantic features corresponding to the voice information:
对前一轮语音信息进行检测,得到前一轮语音信息包含的至少一个第一词语,其中,至少一个第一词语中的每个第一词语包括词语标签;Detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word includes a word label;
根据至少一个第一词语确定前一轮语音信息的时态信息;determining the temporal information of the previous round of speech information according to at least one first word;
将时态信息添加进每个第一词语的词语标签中,得到至少一个第二词语,其中,至少一个第二词语与至少一个第一词语一一对应;adding temporal information into the word tags of each first word to obtain at least one second word, wherein at least one second word is in one-to-one correspondence with at least one first word;
将至少一个第二词语输入粗糙编码器进行编码，得到至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量，其中，至少一个粗糙上下文信息与至少一个第二词语一一对应，至少一个第一隐藏层状态特征向量与至少一个第二词语一一对应；Inputting the at least one second word into a rough encoder for encoding to obtain at least one rough context information and at least one first hidden layer state feature vector, wherein the at least one rough context information is in one-to-one correspondence with the at least one second word, and the at least one first hidden layer state feature vector is in one-to-one correspondence with the at least one second word;
将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征。Inputting at least one piece of rough context information and at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain rough semantic features of speech information.
在本申请的实施方式中,在根据至少一个第一词语确定前一轮语音信息的时态信息方面,处理器902,具体用于执行以下操作:In the embodiment of the present application, in terms of determining the temporal information of the previous round of speech information according to at least one first word, the processor 902 is specifically configured to perform the following operations:
将至少一个第一词语输入门控循环单元编码器进行编码,得到第二隐藏层状态特征向量;Inputting at least one first word into a gated recurrent unit encoder for encoding to obtain a second hidden layer state feature vector;
将第二隐藏层状态特征向量输入多层感知器,得到线性输出结果;Input the state feature vector of the second hidden layer into the multi-layer perceptron to obtain a linear output result;
将线性输出结果输入时态分类器,得到前一轮语音信息的时态信息。Input the linear output result into the temporal classifier to obtain the temporal information of the previous round of speech information.
在本申请的实施方式中，在将至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到语音信息的粗糙语义特征方面，处理器902，具体用于执行以下操作：In the embodiment of the present application, in terms of inputting the at least one rough context information and the at least one first hidden layer state feature vector into the rough decoder for multiple decoding processes to obtain the rough semantic features of the speech information, the processor 902 is specifically configured to perform the following operations:
在第i次解码处理中,将输入特征向量A i输入粗糙解码器,得到输出特征向量B i,其中,i为大于或等于1,且小于或等于j的整数,j为至少一个粗糙上下文信息的数量,j为大于或等于1的整数,当i=1时,输入特征向量A i为至少一个粗糙上下文信息中的第1个粗糙上下文信息; In the i-th decoding process, the input feature vector A i is input to the rough decoder to obtain the output feature vector B i , wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the quantity of at least one rough context information, j is an integer greater than or equal to 1, when i=1, the input feature vector A i is the first rough context information in at least one rough context information;
计算输出特征向量B i和至少一个第一隐藏层状态特征向量中第i个第一隐藏层状态特征向量C i之间的相似度D iCalculating the similarity D i between the output feature vector B i and the i-th first hidden layer state feature vector C i in at least one first hidden layer state feature vector;
对相似度D i进行归一化处理,得到输入特征向量A i的权重E iNormalize the similarity D i to obtain the weight E i of the input feature vector A i ;
将权重E i与第i个第一隐藏层状态特征向量C i相乘,得到权重特征向量F iMultiply the weight E i with the i-th first hidden layer state feature vector C i to obtain the weight feature vector F i ;
将权重特征向量F i与输出特征向量B i相加,得到目标输出特征向量G iAdd the weight feature vector F i to the output feature vector B i to get the target output feature vector G i ;
将目标输出特征向量G i作为第i+1次解码处理的输入特征向量A i+1进行第i+1次解码处理，直至进行多次解码处理，得到语音信息的粗糙语义特征。 The target output feature vector G i is used as the input feature vector A i+1 of the (i+1)-th decoding process to perform the (i+1)-th decoding process, until the multiple decoding processes are performed and the rough semantic features of the speech information are obtained.
在本申请的实施方式中,关键词组包括至少一个关键词,且至少一个关键词按照至少一个关键词中的每个关键词在语音信息中的先后位置顺序进行排列。基于此,在对关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量方面,处理器902,具体用于执行以下操作:In the implementation manner of the present application, the keyword group includes at least one keyword, and the at least one keyword is arranged according to the sequence of each keyword in the at least one keyword in the voice information. Based on this, the processor 902 is specifically configured to perform the following operations in performing multiple hidden feature extraction processes on the keyword group to obtain the initial hidden layer state feature vector:
在第n次隐藏特征提取处理中，将第一输入隐藏特征H n输入门控循环单元编码器，得到第一输出隐藏特征I n，其中，n为大于或等于1，且小于或等于m的整数，m为至少一个关键词的数量，m为大于或等于1的整数，当n=1时，第一输入隐藏特征H n为至少一个关键词中的第1个关键词；In the n-th hidden feature extraction process, the first input hidden feature H n is input into the gated recurrent unit encoder to obtain the first output hidden feature I n, where n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, and m is an integer greater than or equal to 1; when n=1, the first input hidden feature H n is the first keyword in the at least one keyword;
将第一输出隐藏特征I n作为第n+1次隐藏特征提取处理的第一输入隐藏特征H n+1进行第n+1次隐藏特征提取处理，直至进行多次隐藏特征提取处理后，得到初始隐藏层状态特征向量。 The first output hidden feature I n is used as the first input hidden feature H n+1 of the (n+1)-th hidden feature extraction process to perform the (n+1)-th hidden feature extraction process, until the multiple hidden feature extraction processes are performed and the initial hidden layer state feature vector is obtained.
在本申请的实施方式中,在根据粗糙语义特征和初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词方面,处理器902,具体用于执行以下操作:In the embodiment of the present application, the processor 902 is specifically configured to perform the following operations in terms of generating at least one reply word based on the rough semantic feature and the initial hidden layer state feature vector for multiple times of reply word generation:
在第p次回复词生成处理时，将输入词向量K p、第二输入隐藏特征L p和粗糙语义特征输入门控循环单元解码器，得到回复词O p和第二输出隐藏特征R p，其中，p为大于或等于1，且小于或等于q的整数，q为由语音信息决定的大于或等于1的整数，当p=1时，输入词向量K p为初始隐藏层状态特征向量；In the p-th reply word generation process, the input word vector K p, the second input hidden feature L p and the rough semantic feature are input into the gated recurrent unit decoder to obtain the reply word O p and the second output hidden feature R p, where p is an integer greater than or equal to 1 and less than or equal to q, and q is an integer greater than or equal to 1 that is determined by the voice information; when p=1, the input word vector K p is the initial hidden layer state feature vector;
对回复词O p进行词嵌入处理,得到回复词向量S pPerform word embedding processing on the reply word O p to obtain the reply word vector S p ;
将回复词向量S p作为第p+1次回复词生成处理的输入词向量K p+1，将第二输出隐藏特征R p作为第p+1次回复词生成处理的第二输入隐藏特征L p+1进行第p+1次回复词生成处理，直至进行多次回复词生成处理后，得到至少一个回复词。 The reply word vector S p is used as the input word vector K p+1 of the (p+1)-th reply word generation process, and the second output hidden feature R p is used as the second input hidden feature L p+1 of the (p+1)-th reply word generation process to perform the (p+1)-th reply word generation process, until at least one reply word is obtained after the multiple reply word generation processes.
在本申请的实施方式中,在对语音信息进行分词处理,得到关键词组方面,处理器902,具体用于执行以下操作:In the embodiment of the present application, the processor 902 is specifically configured to perform the following operations in performing word segmentation processing on the speech information to obtain keyword groups:
将语音信息转化为文字文本,并对文字文本进行切分处理,得到至少一个第一关键词;Converting the speech information into text, and performing segmentation processing on the text to obtain at least one first keyword;
将第一相邻词和第二相邻词进行组合，得到至少一个第二关键词，其中，第一相邻词和第二相邻词为至少一个第一关键词中任意两个不同的第一关键词，且第一相邻词和第二相邻词之间的字段间隔小于第一阈值；Combining the first adjacent word and the second adjacent word to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is less than the first threshold;
将至少一个第二关键词中的每个第二关键词与预设的实体库进行匹配,并筛除匹配失败的第二关键词,得到至少一个第三关键词;Match each second keyword in the at least one second keyword with a preset entity library, and filter out the second keywords that fail to match, to obtain at least one third keyword;
在至少一个第一关键词中,将组成至少一个第三关键词中的每个第三关键词的第一关键词删除,得到至少一个第四关键词;In the at least one first keyword, the first keyword forming each third keyword in the at least one third keyword is deleted to obtain at least one fourth keyword;
将至少一个第三关键词和至少一个第四关键词进行组合,得到关键词组。Combining at least one third keyword and at least one fourth keyword to obtain a keyword group.
应理解,本申请中的基于粗糙语义的回复语句确定装置可以包括智能手机(如Android 手机、iOS手机、Windows Phone手机等)、平板电脑、掌上电脑、笔记本电脑、移动互联网设备MID(Mobile Internet Devices,简称:MID)、机器人或穿戴式设备等。上述基于粗糙语义的回复语句确定装置仅是举例,而非穷举,包含但不限于上述基于粗糙语义的回复语句确定装置。在实际应用中,上述基于粗糙语义的回复语句确定装置还可以包括:智能车载终端、计算机设备等等。It should be understood that the apparatus for determining reply sentences based on rough semantics in the present application may include smart phones (such as Android phones, iOS phones, Windows Phone phones, etc.), tablet computers, palmtop computers, notebook computers, mobile Internet devices MID (Mobile Internet Devices, referred to as: MID), robots or wearable devices, etc. The above device for determining a reply sentence based on rough semantics is only an example, not exhaustive, including but not limited to the above device for determining a reply sentence based on rough semantics. In practical applications, the apparatus for determining reply sentences based on rough semantics may also include: intelligent vehicle-mounted terminals, computer equipment, and the like.
因此,本申请实施方式还提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行以实现如上述方法实施方式中记载的任何一种基于粗糙语义的回复语句确定方法的部分或全部步骤。例如,所述存储介质可以包括硬盘、软盘、光盘、磁带、磁盘、优盘、闪存等。所述计算机可读存储介质可以是非易失性,也可以是易失性。Therefore, the embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement some or all steps of any method for determining a reply sentence based on rough semantics as described in the above-mentioned method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like. The computer-readable storage medium may be non-volatile or volatile.
本申请实施方式还提供一种计算机程序产品,所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,所述计算机程序可操作来使计算机执行如上述方法实施方式中记载的任何一种基于粗糙语义的回复语句确定方法的部分或全部步骤。The embodiment of the present application also provides a computer program product, the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause the computer to execute some or all of the steps of any method for determining a reply sentence based on rough semantics as described in the above-mentioned method embodiments.
以上对本申请实施方式进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施方式的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。The above is a detailed introduction to the implementation of the present application. In this paper, specific examples are used to illustrate the principle and implementation of the application. The description of the above implementation is only used to help understand the method and core idea of the application; at the same time, for those of ordinary skill in the art, according to the thinking of the application, there will be changes in the specific implementation and application scope. In summary, the content of this specification should not be understood as limiting the application.

Claims (20)

  1. 一种基于粗糙语义的回复语句确定方法,其中,所述方法包括:A method for determining a reply sentence based on rough semantics, wherein the method includes:
    根据用户当前时刻的语音信息的发生时间，获取与所述语音信息相邻的前一轮语音信息，其中，所述前一轮语音信息的发生时间小于所述语音信息的发生时间，且所述前一轮语音信息的发生时间与所述语音信息的发生时间之间的差值的绝对值最小；According to the occurrence time of the voice information at the user's current moment, the previous round of voice information adjacent to the voice information is acquired, wherein the occurrence time of the previous round of voice information is less than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
    根据所述前一轮语音信息对所述语音信息进行粗糙语义提取,得到对应于所述语音信息的粗糙语义特征;performing rough semantic extraction on the voice information according to the previous round of voice information, to obtain rough semantic features corresponding to the voice information;
    对所述语音信息进行分词处理,得到关键词组;performing word segmentation processing on the voice information to obtain keyword groups;
    对所述关键词组进行多次隐藏特征提取处理,得到初始隐藏层状态特征向量;Performing multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector;
    根据所述粗糙语义特征和所述初始隐藏层状态特征向量进行多次回复词生成处理,得到至少一个回复词;Perform multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word;
    将所述至少一个回复词按照所述至少一个回复词中每个回复词的生成顺序进行拼接,得到所述语音信息的回复语句。The at least one reply word is spliced according to the generation sequence of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
  2. 根据权利要求1所述的方法,其中,所述根据所述前一轮语音信息对所述语音信息进行粗糙语义提取,得到对应于所述语音信息的粗糙语义特征,包括:The method according to claim 1, wherein said performing rough semantic extraction on said voice information according to said previous round of voice information to obtain rough semantic features corresponding to said voice information, comprising:
    对所述前一轮语音信息进行检测,得到所述前一轮语音信息包含的至少一个第一词语,其中,所述至少一个第一词语中的每个第一词语包括词语标签;Detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word includes a word label;
    根据所述至少一个第一词语确定所述前一轮语音信息的时态信息;determining temporal information of the previous round of voice information according to the at least one first word;
    将所述时态信息添加进所述每个第一词语的词语标签中,得到至少一个第二词语,其中,所述至少一个第二词语与所述至少一个第一词语一一对应;adding the temporal information to the word label of each first word to obtain at least one second word, wherein the at least one second word corresponds to the at least one first word;
    将所述至少一个第二词语输入粗糙编码器进行编码,得到至少一个粗糙上下文信息和至少一个第一隐藏层状态特征向量,其中,所述至少一个粗糙上下文信息与所述至少一个第二词语一一对应,所述至少一个第一隐藏层状态特征向量与所述至少一个第二词语一一对应;Inputting the at least one second word into a rough encoder for encoding to obtain at least one rough context information and at least one first hidden layer state feature vector, wherein the at least one rough context information is in one-to-one correspondence with the at least one second word, and the at least one first hidden layer state feature vector is in one-to-one correspondence with the at least one second word;
    将所述至少一个粗糙上下文信息和所述至少一个第一隐藏层状态特征向量输入粗糙解码器进行多次解码处理，得到所述语音信息的粗糙语义特征。Inputting the at least one rough context information and the at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes to obtain the rough semantic features of the speech information.
  3. The method according to claim 2, wherein determining the temporal information of the previous round of voice information according to the at least one first word comprises:
    inputting the at least one first word into a gated recurrent unit encoder for encoding, to obtain a second hidden layer state feature vector;
    inputting the second hidden layer state feature vector into a multi-layer perceptron, to obtain a linear output result;
    inputting the linear output result into a temporal classifier, to obtain the temporal information of the previous round of voice information.
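The tense-detection pipeline of claim 3 (gated recurrent unit encoder, then a multi-layer perceptron, then a temporal classifier) can be sketched as below. This is a minimal illustrative sketch, not the patented implementation: the dimensions, the single-layer stand-in for the MLP, and the softmax choice for the temporal classifier are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wn, Un):
    """One gated recurrent unit step (PyTorch gate convention)."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    n = np.tanh(Wn @ x + Un @ (r * h))         # candidate state
    return (1 - z) * n + z * h

rng = np.random.default_rng(0)
emb, hid, n_tenses = 8, 16, 3                  # illustrative sizes (assumed)
W = [rng.normal(0, 0.1, (hid, emb)) for _ in range(3)]
U = [rng.normal(0, 0.1, (hid, hid)) for _ in range(3)]

first_words = rng.normal(size=(5, emb))        # embeddings of the first words
h = np.zeros(hid)
for x in first_words:                          # GRU encoder over the first words
    h = gru_step(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
second_hidden_state = h                        # second hidden layer state feature vector

W_mlp = rng.normal(0, 0.1, (n_tenses, hid))    # single linear layer standing in for the MLP
logits = W_mlp @ second_hidden_state           # linear output result
probs = np.exp(logits) / np.exp(logits).sum()  # temporal classifier (softmax assumed)
tense = int(np.argmax(probs))                  # predicted temporal (tense) class
```

The three-class tense set and the softmax head are placeholders; the claim only fixes the encoder → MLP → classifier ordering.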
  4. The method according to claim 2, wherein inputting the at least one piece of rough context information and the at least one first hidden layer state feature vector into the rough decoder for multiple decoding processes to obtain the rough semantic feature of the voice information comprises:
    in an i-th decoding process, inputting an input feature vector A_i into the rough decoder to obtain an output feature vector B_i, wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of rough context information and is an integer greater than or equal to 1, and when i = 1, the input feature vector A_1 is the first piece of rough context information in the at least one piece of rough context information;
    calculating a similarity D_i between the output feature vector B_i and an i-th first hidden layer state feature vector C_i of the at least one first hidden layer state feature vector;
    normalizing the similarity D_i to obtain a weight E_i of the input feature vector A_i;
    multiplying the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i;
    adding the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i;
    taking the target output feature vector G_i as the input feature vector A_{i+1} of an (i+1)-th decoding process and performing the (i+1)-th decoding process, until the multiple decoding processes have been performed, to obtain the rough semantic feature of the voice information.
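Claim 4 describes an attention-style loop: each decoder output B_i is scored against the matching encoder hidden state C_i, the score is normalized into a weight, and the weighted hidden state is added back before the next step. A minimal sketch follows; the tanh stand-in for the rough decoder, the dot-product similarity, and the sigmoid normalization of the scalar D_i are assumptions (the claim does not fix any of them).

```python
import numpy as np

rng = np.random.default_rng(1)
dim, j = 16, 4                              # j pieces of rough context information (assumed)

decoder_W = rng.normal(0, 0.1, (dim, dim))  # placeholder weights for the rough decoder
contexts = rng.normal(size=(j, dim))        # rough context information vectors
hidden = rng.normal(size=(j, dim))          # first hidden layer state feature vectors C_i

def rough_decode(a):
    """Stand-in for one pass of the rough decoder."""
    return np.tanh(decoder_W @ a)

a = contexts[0]                             # A_1 = first piece of rough context information
for i in range(j):
    b = rough_decode(a)                     # output feature vector B_i
    d = float(b @ hidden[i])                # similarity D_i (dot product assumed)
    e = 1.0 / (1.0 + np.exp(-d))            # normalized weight E_i (sigmoid assumed)
    f = e * hidden[i]                       # weighted feature vector F_i
    g = f + b                               # target output feature vector G_i
    a = g                                   # becomes A_{i+1} for the next decoding process

rough_semantic_feature = g                  # result after the j-th decoding process
```

Because each G_i feeds the next pass, the final G_j mixes every hidden state into the rough semantic feature, which matches the recurrence the claim sets out.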
  5. The method according to claim 1, wherein
    the keyword group comprises at least one keyword, and the at least one keyword is arranged according to the positional order, in the voice information, of each keyword of the at least one keyword; and
    performing multiple hidden feature extraction processes on the keyword group to obtain the initial hidden layer state feature vector comprises:
    in an n-th hidden feature extraction process, inputting a first input hidden feature H_n into a gated recurrent unit encoder to obtain a first output hidden feature I_n, wherein n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword and is an integer greater than or equal to 1, and when n = 1, the first input hidden feature H_1 is the first keyword of the at least one keyword;
    taking the first output hidden feature I_n as the first input hidden feature H_{n+1} of an (n+1)-th hidden feature extraction process and performing the (n+1)-th hidden feature extraction process, until the multiple hidden feature extraction processes have been performed, to obtain the initial hidden layer state feature vector.
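The chained extraction of claim 5 (each output I_n becomes the next input H_{n+1}) can be sketched as a short recurrence. All sizes are illustrative, and feeding keyword n alongside the carried hidden feature at each step is an assumption the claim leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, m = 16, 5                               # m keywords in the keyword group (assumed)

W = rng.normal(0, 0.1, (dim, dim))           # stand-in GRU-encoder weights
U = rng.normal(0, 0.1, (dim, dim))

keywords = rng.normal(size=(m, dim))         # keyword embeddings, in utterance order
h = keywords[0]                              # H_1 = the first keyword
for n in range(1, m):
    # I_n is reused as the next first input hidden feature H_{n+1}.
    h = np.tanh(W @ keywords[n] + U @ h)

initial_hidden_state = h                     # initial hidden layer state feature vector
```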
  6. The method according to claim 1, wherein performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain the at least one reply word comprises:
    in a p-th reply word generation process, inputting an input word vector K_p, a second input hidden feature L_p and the rough semantic feature into a gated recurrent unit decoder, to obtain a reply word O_p and a second output hidden feature R_p, wherein p is an integer greater than or equal to 1 and less than or equal to q, q is an integer greater than or equal to 1 determined by the voice information, and when p = 1, the input word vector K_1 is the initial hidden layer state feature vector;
    performing word embedding on the reply word O_p, to obtain a reply word vector S_p;
    taking the reply word vector S_p as the input word vector K_{p+1} of a (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1} of the (p+1)-th reply word generation process, and performing the (p+1)-th reply word generation process, until the multiple reply word generation processes have been performed, to obtain the at least one reply word.
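Claim 6's generation loop feeds each produced word's embedding back as the next input, conditioned throughout on the rough semantic feature. A minimal sketch, assuming a tanh decoder cell, a greedy argmax over a toy vocabulary, and a fixed q of 4 — none of which the claim specifies:

```python
import numpy as np

rng = np.random.default_rng(3)
dim, vocab, q = 16, 50, 4                        # q reply words to generate (assumed)

W_in = rng.normal(0, 0.1, (dim, dim))            # stand-in GRU-decoder weights
W_h = rng.normal(0, 0.1, (dim, dim))
W_c = rng.normal(0, 0.1, (dim, dim))
W_out = rng.normal(0, 0.1, (vocab, dim))         # projection to reply-word logits
embed = rng.normal(size=(vocab, dim))            # word-embedding table

rough = rng.normal(size=dim)                     # rough semantic feature
k = rng.normal(size=dim)                         # K_1 = initial hidden layer state vector
h = np.zeros(dim)                                # second input hidden feature L_1

reply_ids = []
for p in range(q):
    h = np.tanh(W_in @ k + W_h @ h + W_c @ rough)  # decoder step -> R_p
    word = int(np.argmax(W_out @ h))               # reply word O_p (greedy pick)
    reply_ids.append(word)
    k = embed[word]                                # S_p becomes the next input K_{p+1}

# Splice the reply words in generation order to form the reply sentence.
reply_sentence = " ".join(f"w{i}" for i in reply_ids)
```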
  7. The method according to claim 1, wherein performing word segmentation on the voice information to obtain the keyword group comprises:
    converting the voice information into text, and segmenting the text to obtain at least one first keyword;
    combining a first adjacent word and a second adjacent word to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords of the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is less than a first threshold;
    matching each second keyword of the at least one second keyword against a preset entity library, and filtering out the second keywords that fail to match, to obtain at least one third keyword;
    deleting, from the at least one first keyword, the first keywords that constitute each third keyword of the at least one third keyword, to obtain at least one fourth keyword;
    combining the at least one third keyword and the at least one fourth keyword to obtain the keyword group.
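The keyword-group construction of claim 7 can be sketched as follows. The function name, the set-based entity library, and the simplification of the "field interval below a threshold" condition to immediate adjacency are all illustrative assumptions.

```python
def build_keyword_group(tokens, entity_lib):
    """Sketch of claim 7: combine adjacent first keywords, keep entity-library
    matches (third keywords), drop their constituents, merge the remainder."""
    # Combine adjacent first keywords into second-keyword candidates
    # (adjacency stands in for the field-interval threshold here).
    pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
    second = ["".join(p) for p in pairs]
    # Keep only candidates that match the preset entity library.
    third = [w for w in second if w in entity_lib]
    # Remove first keywords absorbed into a matched entity; the rest are fourth keywords.
    used = {t for w, p in zip(second, pairs) if w in third for t in p}
    fourth = [t for t in tokens if t not in used]
    return third + fourth                     # the keyword group

tokens = ["平安", "科技", "语音", "助手"]     # segmented first keywords (example input)
lib = {"平安科技"}                            # hypothetical entity library
group = build_keyword_group(tokens, lib)      # ['平安科技', '语音', '助手']
```

In the example, "平安" and "科技" merge into the entity "平安科技" and are deleted individually, while the unmatched tokens survive as fourth keywords.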
  8. An apparatus for determining a reply sentence based on rough semantics, wherein the apparatus comprises:
    an acquisition module, configured to acquire, according to the occurrence time of a user's voice information at the current moment, a previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
    a processing module, configured to perform rough semantic extraction on the voice information according to the previous round of voice information to obtain a rough semantic feature corresponding to the voice information, perform word segmentation on the voice information to obtain a keyword group, and perform multiple hidden feature extraction processes on the keyword group to obtain an initial hidden layer state feature vector; and
    a generation module, configured to perform multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain at least one reply word, and splice the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence for the voice information.
  9. An electronic device, comprising a processor, a memory, a communication interface and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the one or more programs comprise instructions for performing the following steps:
    acquiring, according to the occurrence time of a user's voice information at the current moment, a previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
    performing rough semantic extraction on the voice information according to the previous round of voice information, to obtain a rough semantic feature corresponding to the voice information;
    performing word segmentation on the voice information, to obtain a keyword group;
    performing multiple hidden feature extraction processes on the keyword group, to obtain an initial hidden layer state feature vector;
    performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector, to obtain at least one reply word;
    splicing the at least one reply word according to the generation order of each reply word in the at least one reply word, to obtain a reply sentence for the voice information.
  10. The electronic device according to claim 9, wherein performing rough semantic extraction on the voice information according to the previous round of voice information to obtain the rough semantic feature corresponding to the voice information comprises:
    detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word carries a word label;
    determining temporal information of the previous round of voice information according to the at least one first word;
    adding the temporal information to the word label of each first word to obtain at least one second word, wherein the at least one second word corresponds one-to-one to the at least one first word;
    inputting the at least one second word into a rough encoder for encoding, to obtain at least one piece of rough context information and at least one first hidden layer state feature vector, wherein the at least one piece of rough context information corresponds one-to-one to the at least one second word, and the at least one first hidden layer state feature vector corresponds one-to-one to the at least one second word;
    inputting the at least one piece of rough context information and the at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes, to obtain the rough semantic feature of the voice information.
  11. The electronic device according to claim 10, wherein determining the temporal information of the previous round of voice information according to the at least one first word comprises:
    inputting the at least one first word into a gated recurrent unit encoder for encoding, to obtain a second hidden layer state feature vector;
    inputting the second hidden layer state feature vector into a multi-layer perceptron, to obtain a linear output result;
    inputting the linear output result into a temporal classifier, to obtain the temporal information of the previous round of voice information.
  12. The electronic device according to claim 10, wherein inputting the at least one piece of rough context information and the at least one first hidden layer state feature vector into the rough decoder for multiple decoding processes to obtain the rough semantic feature of the voice information comprises:
    in an i-th decoding process, inputting an input feature vector A_i into the rough decoder to obtain an output feature vector B_i, wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of rough context information and is an integer greater than or equal to 1, and when i = 1, the input feature vector A_1 is the first piece of rough context information in the at least one piece of rough context information;
    calculating a similarity D_i between the output feature vector B_i and an i-th first hidden layer state feature vector C_i of the at least one first hidden layer state feature vector;
    normalizing the similarity D_i to obtain a weight E_i of the input feature vector A_i;
    multiplying the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i;
    adding the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i;
    taking the target output feature vector G_i as the input feature vector A_{i+1} of an (i+1)-th decoding process and performing the (i+1)-th decoding process, until the multiple decoding processes have been performed, to obtain the rough semantic feature of the voice information.
  13. The electronic device according to claim 9, wherein
    the keyword group comprises at least one keyword, and the at least one keyword is arranged according to the positional order, in the voice information, of each keyword of the at least one keyword; and
    performing multiple hidden feature extraction processes on the keyword group to obtain the initial hidden layer state feature vector comprises:
    in an n-th hidden feature extraction process, inputting a first input hidden feature H_n into a gated recurrent unit encoder to obtain a first output hidden feature I_n, wherein n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword and is an integer greater than or equal to 1, and when n = 1, the first input hidden feature H_1 is the first keyword of the at least one keyword;
    taking the first output hidden feature I_n as the first input hidden feature H_{n+1} of an (n+1)-th hidden feature extraction process and performing the (n+1)-th hidden feature extraction process, until the multiple hidden feature extraction processes have been performed, to obtain the initial hidden layer state feature vector.
  14. The electronic device according to claim 9, wherein performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain the at least one reply word comprises:
    in a p-th reply word generation process, inputting an input word vector K_p, a second input hidden feature L_p and the rough semantic feature into a gated recurrent unit decoder, to obtain a reply word O_p and a second output hidden feature R_p, wherein p is an integer greater than or equal to 1 and less than or equal to q, q is an integer greater than or equal to 1 determined by the voice information, and when p = 1, the input word vector K_1 is the initial hidden layer state feature vector;
    performing word embedding on the reply word O_p, to obtain a reply word vector S_p;
    taking the reply word vector S_p as the input word vector K_{p+1} of a (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1} of the (p+1)-th reply word generation process, and performing the (p+1)-th reply word generation process, until the multiple reply word generation processes have been performed, to obtain the at least one reply word.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following steps:
    acquiring, according to the occurrence time of a user's voice information at the current moment, a previous round of voice information adjacent to the voice information, wherein the occurrence time of the previous round of voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is the smallest;
    performing rough semantic extraction on the voice information according to the previous round of voice information, to obtain a rough semantic feature corresponding to the voice information;
    performing word segmentation on the voice information, to obtain a keyword group;
    performing multiple hidden feature extraction processes on the keyword group, to obtain an initial hidden layer state feature vector;
    performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector, to obtain at least one reply word;
    splicing the at least one reply word according to the generation order of each reply word in the at least one reply word, to obtain a reply sentence for the voice information.
  16. The computer-readable storage medium according to claim 15, wherein performing rough semantic extraction on the voice information according to the previous round of voice information to obtain the rough semantic feature corresponding to the voice information comprises:
    detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word carries a word label;
    determining temporal information of the previous round of voice information according to the at least one first word;
    adding the temporal information to the word label of each first word to obtain at least one second word, wherein the at least one second word corresponds one-to-one to the at least one first word;
    inputting the at least one second word into a rough encoder for encoding, to obtain at least one piece of rough context information and at least one first hidden layer state feature vector, wherein the at least one piece of rough context information corresponds one-to-one to the at least one second word, and the at least one first hidden layer state feature vector corresponds one-to-one to the at least one second word;
    inputting the at least one piece of rough context information and the at least one first hidden layer state feature vector into a rough decoder for multiple decoding processes, to obtain the rough semantic feature of the voice information.
  17. The computer-readable storage medium according to claim 16, wherein determining the temporal information of the previous round of voice information according to the at least one first word comprises:
    inputting the at least one first word into a gated recurrent unit encoder for encoding, to obtain a second hidden layer state feature vector;
    inputting the second hidden layer state feature vector into a multi-layer perceptron, to obtain a linear output result;
    inputting the linear output result into a temporal classifier, to obtain the temporal information of the previous round of voice information.
  18. The computer-readable storage medium according to claim 16, wherein inputting the at least one piece of rough context information and the at least one first hidden layer state feature vector into the rough decoder for multiple decoding processes to obtain the rough semantic feature of the voice information comprises:
    in an i-th decoding process, inputting an input feature vector A_i into the rough decoder to obtain an output feature vector B_i, wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of rough context information and is an integer greater than or equal to 1, and when i = 1, the input feature vector A_1 is the first piece of rough context information in the at least one piece of rough context information;
    calculating a similarity D_i between the output feature vector B_i and an i-th first hidden layer state feature vector C_i of the at least one first hidden layer state feature vector;
    normalizing the similarity D_i to obtain a weight E_i of the input feature vector A_i;
    multiplying the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i;
    adding the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i;
    taking the target output feature vector G_i as the input feature vector A_{i+1} of an (i+1)-th decoding process and performing the (i+1)-th decoding process, until the multiple decoding processes have been performed, to obtain the rough semantic feature of the voice information.
  19. The computer-readable storage medium according to claim 15, wherein
    the keyword group comprises at least one keyword, and the at least one keyword is arranged according to the positional order, in the voice information, of each keyword of the at least one keyword; and
    performing multiple hidden feature extraction processes on the keyword group to obtain the initial hidden layer state feature vector comprises:
    in an n-th hidden feature extraction process, inputting a first input hidden feature H_n into a gated recurrent unit encoder to obtain a first output hidden feature I_n, wherein n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword and is an integer greater than or equal to 1, and when n = 1, the first input hidden feature H_1 is the first keyword of the at least one keyword;
    taking the first output hidden feature I_n as the first input hidden feature H_{n+1} of an (n+1)-th hidden feature extraction process and performing the (n+1)-th hidden feature extraction process, until the multiple hidden feature extraction processes have been performed, to obtain the initial hidden layer state feature vector.
  20. The computer-readable storage medium according to claim 15, wherein performing multiple reply word generation processes according to the rough semantic feature and the initial hidden layer state feature vector to obtain the at least one reply word comprises:
    in a p-th reply word generation process, inputting an input word vector K_p, a second input hidden feature L_p and the rough semantic feature into a gated recurrent unit decoder, to obtain a reply word O_p and a second output hidden feature R_p, wherein p is an integer greater than or equal to 1 and less than or equal to q, q is an integer greater than or equal to 1 determined by the voice information, and when p = 1, the input word vector K_1 is the initial hidden layer state feature vector;
    performing word embedding on the reply word O_p, to obtain a reply word vector S_p;
    taking the reply word vector S_p as the input word vector K_{p+1} of a (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1} of the (p+1)-th reply word generation process, and performing the (p+1)-th reply word generation process, until the multiple reply word generation processes have been performed, to obtain the at least one reply word.
PCT/CN2022/090129 2022-01-22 2022-04-29 Reply statement determination method and apparatus based on rough semantics, and electronic device WO2023137903A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210083351.8A CN114417891B (en) 2022-01-22 2022-01-22 Reply statement determination method and device based on rough semantics and electronic equipment
CN202210083351.8 2022-01-22

Publications (1)

Publication Number Publication Date
WO2023137903A1 true WO2023137903A1 (en) 2023-07-27

Family

ID=81278095

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090129 WO2023137903A1 (en) 2022-01-22 2022-04-29 Reply statement determination method and apparatus based on rough semantics, and electronic device

Country Status (2)

Country Link
CN (1) CN114417891B (en)
WO (1) WO2023137903A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842169A * 2023-09-01 2023-10-03 国网山东省电力公司聊城供电公司 Power grid session management method, system, terminal and storage medium
CN116842169B * 2023-09-01 2024-01-12 国网山东省电力公司聊城供电公司 Power grid session management method, system, terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016090891A (en) * 2014-11-07 2016-05-23 トヨタ自動車株式会社 Response generation apparatus, response generation method, and response generation program
CN109241262A (en) * 2018-08-31 2019-01-18 出门问问信息科技有限公司 The method and device of revert statement is generated based on keyword
CN110851574A (en) * 2018-07-27 2020-02-28 北京京东尚科信息技术有限公司 Statement processing method, device and system
CN111460115A (en) * 2020-03-17 2020-07-28 深圳市优必选科技股份有限公司 Intelligent man-machine conversation model training method, model training device and electronic equipment
CN113035179A (en) * 2021-03-03 2021-06-25 科大讯飞股份有限公司 Voice recognition method, device, equipment and computer readable storage medium
CN113378557A (en) * 2021-05-08 2021-09-10 重庆邮电大学 Automatic keyword extraction method, medium and system based on fault-tolerant rough set

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413729B (en) * 2019-06-25 2023-04-07 江南大学 Multi-turn dialogue generation method based on clause-context dual attention model
CN112732340B (en) * 2019-10-14 2022-03-15 思必驰科技股份有限公司 Man-machine conversation processing method and device
CN111368538B (en) * 2020-02-29 2023-10-24 平安科技(深圳)有限公司 Voice interaction method, system, terminal and computer readable storage medium
CN112528989B (en) * 2020-12-01 2022-10-18 重庆邮电大学 Description generation method for semantic fine granularity of image

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842169A (en) * 2023-09-01 2023-10-03 State Grid Shandong Electric Power Company Liaocheng Power Supply Company Power grid session management method, system, terminal and storage medium
CN116842169B (en) * 2023-09-01 2024-01-12 State Grid Shandong Electric Power Company Liaocheng Power Supply Company Power grid session management method, system, terminal and storage medium

Also Published As

Publication number Publication date
CN114417891B (en) 2023-05-09
CN114417891A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN108920622B (en) Training method, training device and recognition device for intention recognition
CN112528672B Aspect-level sentiment analysis method and device based on graph convolutional neural network
JP7301922B2 (en) Semantic retrieval method, device, electronic device, storage medium and computer program
CN112949415B (en) Image processing method, apparatus, device and medium
CN110737758A (en) Method and apparatus for generating a model
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN112632226B (en) Semantic search method and device based on legal knowledge graph and electronic equipment
CN111026840B (en) Text processing method, device, server and storage medium
CN116304748B (en) Text similarity calculation method, system, equipment and medium
WO2023134083A1 (en) Text-based sentiment classification method and apparatus, and computer device and storage medium
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN114912450B (en) Information generation method and device, training method, electronic device and storage medium
US20220358955A1 (en) Method for detecting voice, method for training, and electronic devices
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN112906368B (en) Industry text increment method, related device and computer program product
CN111767720B (en) Title generation method, computer and readable storage medium
CN114120166A (en) Video question and answer method and device, electronic equipment and storage medium
WO2021129411A1 (en) Text processing method and device
CN113761923A (en) Named entity recognition method and device, electronic equipment and storage medium
CN116881446A (en) Semantic classification method, device, equipment and storage medium thereof
CN116186244A (en) Method for generating text abstract, method and device for training abstract generation model
CN115827865A (en) Method and system for classifying objectionable texts by fusing multi-feature map attention mechanism
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22921334

Country of ref document: EP

Kind code of ref document: A1