CN114417891A - Reply sentence determination method and device based on rough semantics and electronic equipment - Google Patents


Info

Publication number
CN114417891A
CN114417891A (application number CN202210083351.8A)
Authority
CN
China
Prior art keywords
word
voice information
reply
feature vector
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210083351.8A
Other languages
Chinese (zh)
Other versions
CN114417891B (en)
Inventor
舒畅 (Shu Chang)
陈又新 (Chen Youxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210083351.8A priority Critical patent/CN114417891B/en
Priority to PCT/CN2022/090129 priority patent/WO2023137903A1/en
Publication of CN114417891A publication Critical patent/CN114417891A/en
Application granted granted Critical
Publication of CN114417891B publication Critical patent/CN114417891B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06F 40/35: Discourse or dialogue representation
    • G06F 40/20: Natural language analysis
    • G06F 40/268: Morphological analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking


Abstract

The application discloses a reply sentence determination method and apparatus based on rough semantics, and an electronic device. The method comprises the following steps: acquiring the previous round of voice information adjacent to the user's voice information at the current moment according to the occurrence time of that voice information; performing rough semantic extraction on the voice information according to the previous round of voice information to obtain the rough semantic features corresponding to the voice information; performing word segmentation on the voice information to obtain a keyword group; performing multiple hidden-feature extractions on the keyword group to obtain an initial hidden-layer state feature vector; performing multiple reply-word generation passes according to the rough semantic features and the initial hidden-layer state feature vector to obtain at least one reply word; and splicing the at least one reply word according to the generation order of each reply word to obtain the reply sentence for the voice information.

Description

Reply sentence determination method and device based on rough semantics and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a reply sentence determination method and device based on rough semantics and electronic equipment.
Background
At present, a conventional dialog model usually encodes the previous dialog text, uses the hidden-layer state features of the encoded information as one of the inputs to a decoder, and the decoder then generates the dialog reply automatically in time-step order. Because the hidden-layer state features encoded from the previous dialog text serve as one of the bases for generating the reply sentence of the current dialog, the reply sentence generation process incorporates the information features of the previous dialog.
However, in the conventional scheme, so that the model can construct a reply sentence around the key information of the dialog, feature extraction focuses on the key information of the previous dialog; in the actual extraction process only the key information is extracted as features, and the rough information in the dialog is often discarded. Yet in some texts this rough information reflects the real focus of the conversation, so the generated reply sentence is not accurate enough.
Disclosure of Invention
In order to solve the above problems in the prior art, embodiments of the present application provide a reply sentence determination method and apparatus based on rough semantics, and an electronic device, which can extract the key information and the rough information of the previous round of conversation simultaneously, so that the generated reply sentence is more accurate.
In a first aspect, an embodiment of the present application provides a reply sentence determination method based on rough semantics, including:
acquiring the previous round of voice information adjacent to the voice information according to the occurrence time of the user's voice information at the current moment, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;
performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;
performing word segmentation processing on the voice information to obtain a key phrase;
carrying out multiple hidden feature extraction processing on the key phrase to obtain an initial hidden layer state feature vector;
performing multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;
and splicing the obtained at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
In a second aspect, an embodiment of the present application provides a reply sentence determination apparatus based on rough semantics, including:
the acquisition module is used for acquiring the previous round of voice information adjacent to the voice information according to the occurrence time of the user's voice information at the current moment, wherein the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest;
the processing module is used for performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information, performing word segmentation processing on the voice information to obtain a key word group, and performing multiple hidden feature extraction processing on the key word group to obtain an initial hidden layer state feature vector;
and the generating module is used for performing multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word, and splicing the obtained at least one reply word according to the generation sequence of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, wherein the memory is used for storing a computer program and the processor is coupled to the memory and used for executing the computer program stored in the memory, so as to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, the computer program causing a computer to perform the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The implementation of the embodiment of the application has the following beneficial effects:
in the embodiments of the application, the previous round of voice information of the user at the current moment is acquired, and rough semantic extraction is performed on it to obtain semantic features that contain the high-level abstract information of the previous round of voice information; these features serve as the rough semantic features of the user's voice information at the current moment, so that the key information and the rough information of the previous round of voice information are extracted synchronously. Then, word segmentation is performed on the user's voice information at the current moment, and multiple hidden-feature extractions are performed on the resulting keywords to obtain the initial hidden-layer state feature vector of that voice information. Finally, multiple reply-word generation passes are performed according to the rough semantic features and the initial hidden-layer state feature vector, and the obtained at least one reply word is spliced according to the generation order of each reply word to obtain the reply sentence of the voice information. Because the rough semantic features, which contain both the key information and the rough information of the previous round of conversation, serve as one of the bases for generating the reply sentence of the current round, the generation process incorporates more comprehensive information features of the previous round of conversation. The generated reply sentence is therefore more accurate, matches the subject of the conversation better, and improves user experience.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic hardware configuration diagram of a reply statement determination apparatus based on rough semantics according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a reply statement determination method based on rough semantics according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for performing coarse semantic extraction on voice information according to previous round of voice information to obtain coarse semantic features corresponding to the voice information according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a gated cyclic unit encoder according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a multilayer perceptron according to an embodiment of the present disclosure;
fig. 6 is a flowchart illustrating a method for inputting at least one coarse context information and at least one first hidden layer state feature vector into a coarse decoder for performing a plurality of decoding processes to obtain coarse semantic features of speech information according to an embodiment of the present application;
fig. 7 is a block flow diagram of a reply word generation process according to an embodiment of the present application;
fig. 8 is a block diagram illustrating functional modules of a reply sentence determination apparatus based on rough semantics according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
First, referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of a reply sentence determination device based on a rough semantic meaning according to an embodiment of the present disclosure. The reply sentence determination apparatus 100 based on the rough semantics includes at least one processor 101, a communication line 102, a memory 103, and at least one communication interface 104.
In this embodiment, the processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication link 102, which may include a path, carries information between the aforementioned components.
The communication interface 104 may be any transceiver or other device (e.g., an antenna, etc.) for communicating with other devices or communication networks, such as an ethernet, RAN, Wireless Local Area Network (WLAN), etc.
The memory 103 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In this embodiment, the memory 103 may be independent and connected to the processor 101 through the communication line 102. The memory 103 may also be integrated with the processor 101. The memory 103 provided in the embodiments of the present application may generally have a nonvolatile property. The memory 103 is used for storing computer-executable instructions for executing the scheme of the application, and is controlled by the processor 101 to execute. The processor 101 is configured to execute computer-executable instructions stored in the memory 103, thereby implementing the methods provided in the embodiments of the present application described below.
In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.
In alternative embodiments, processor 101 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 1.
In an alternative embodiment, the reply sentence determination apparatus 100 based on the coarse semantics may include a plurality of processors, such as the processor 101 and the processor 107 in fig. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In an alternative embodiment, if the reply sentence determination apparatus 100 based on the rough semantics is a server, for example, the apparatus may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The reply sentence determination apparatus 100 based on the rough semantics may further include an output device 105 and an input device 106. The output device 105 is in communication with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 106 is in communication with the processor 101 and may receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
The above-described reply sentence determination apparatus 100 based on the rough semantics may be a general-purpose device or a special-purpose device. The present embodiment does not limit the type of the reply sentence determination apparatus 100 based on the rough semantics.
Next, it should be noted that the embodiments disclosed in the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Finally, the reply sentence determination method based on the rough semantics can be applied to telephone consultation, e-commerce sales, off-line entity sales, service promotion, seat telephone outbound, social platform promotion and other scenes. In the present application, the phone consultation scenario is mainly taken as an example to illustrate the reply sentence determination method based on the rough semantics, and the reply sentence determination method based on the rough semantics in other scenarios is similar to the implementation manner in the phone consultation scenario and will not be described here.
In the following, the reply sentence determination method based on rough semantics disclosed in the present application will be explained:
referring to fig. 2, fig. 2 is a schematic flowchart of a reply statement determination method based on rough semantics according to an embodiment of the present disclosure. The reply sentence determination method based on the rough semantics comprises the following steps:
201: and acquiring the previous round of voice information adjacent to the voice information according to the occurrence time of the voice information of the user at the current moment.
In the present embodiment, the occurrence time of the previous round of voice information is earlier than that of the voice information, and the absolute value of the difference between the two occurrence times is the smallest. In brief, the previous round of voice information is the last sentence spoken by the user before the voice information at the current moment.
For example, the previous round of voice information may be determined from the occurrence time of the user's voice information at the current moment by querying historical dialogue data, which records the dialogue data generated before the current moment in the dialogue event to which that voice information belongs. Specifically, two interrelated sentence queues may be maintained in the historical dialogue data: one queue stores the user sentences issued by the user, and the other stores the reply sentences made by the AI to those user sentences. Each user sentence in the user sentence queue and each reply sentence in the reply sentence queue carries a dialog mark and a dialog occurrence time; a user sentence and a reply sentence with the same mark form a question-answer pair, i.e., the reply sentence with the same dialog mark is the reply to that user sentence. This guarantees the question-answer logic of the historical dialogue data while storing the user's and the AI's sentences separately, which facilitates searching.
Therefore, in the present embodiment, by querying the user sentence queue, the voice information whose dialog occurrence time is earlier than the occurrence time of the user's voice information at the current moment, and whose absolute time difference from it is the smallest, can be determined as the previous round of voice information.
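The previous-round lookup over such a user-sentence queue can be sketched as follows; this is a minimal illustration, and the field names (`dialog_mark`, `time`) are hypothetical stand-ins for the dialog mark and dialog occurrence time described above:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    dialog_mark: str   # links a user sentence to its AI reply (hypothetical name)
    time: float        # dialog occurrence time of the sentence
    text: str

def previous_round(user_queue, current_time):
    """Return the user sentence whose time is earlier than current_time
    and whose absolute time difference from it is the smallest."""
    earlier = [u for u in user_queue if u.time < current_time]
    if not earlier:
        return None
    return min(earlier, key=lambda u: abs(current_time - u.time))

user_queue = [
    Utterance("q1", 10.0, "What plans do you offer?"),
    Utterance("q2", 25.0, "How much is the premium?"),
]
prev = previous_round(user_queue, current_time=30.0)
```

Here `prev` is the sentence marked `q2`, the last one the user spoke before the current moment.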
202: and performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information.
In this embodiment, the rough semantic features may be understood as semantic features containing the high-level abstract information of the previous round of voice information. Illustratively, a high-order coarse sequence representation can be actively constructed and then analyzed to obtain a plurality of high-order parallel sequences. A low-order coarse sequence is then generated through a hierarchical structure, and the information in the high-order parallel sequences flows into the low-order coarse sequence, so that the key information and the coarse information in the voice information are extracted synchronously and information from multiple levels is embodied at the same time. Meanwhile, after conversion to the low-order coarse sequence, the model that generates the reply sentence can better memorize and understand long-range content, and in turn generate meaningful replies closely related to the topic, improving user experience.
For example, the present embodiment provides a method for performing rough semantic extraction on voice information according to a previous round of voice information to obtain rough semantic features corresponding to the voice information, as shown in fig. 3, where the method includes:
301: and detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information.
In this embodiment, the detection process may be to perform text conversion on the previous round of voice information and then perform word segmentation, and then obtain all words obtained through word segmentation processing as the at least one first word. Meanwhile, each of the at least one first word may include a word tag, and the word tag may be part-of-speech information of the corresponding first word, for example: nouns, verbs, named entities, etc.
Thus, in the present embodiment, named entity information in the text obtained by text conversion may be extracted by a conditional random field (CRF), and the type of each named entity, such as a person name or an organization name, may be marked by the CRF. A part-of-speech (POS) tagging tool is then used to perform word segmentation and part-of-speech tagging on the text, extracting its nouns and verbs. The results of CRF and POS are combined in this process because POS identifies only individual words, while CRF can identify a complete phrase. For example, for "I work at Shanghai Fudan University", CRF can fully identify the organization-name entity "Shanghai Fudan University", while POS can only identify the nouns "Shanghai", "Fudan", and "University". Therefore, when processing entity words, if a POS result is contained in a CRF result, the CRF result is used preferentially, while verbs are taken only from the POS result. In this way, the first words carrying part-of-speech tags are obtained.
In an optional embodiment, if the language used by the user is english, a set of verbs and named entities in a corresponding domain may be constructed in advance, and then verbs and named entities in the original sentence are extracted in a matching manner, and then the extraction of english nouns may be performed by using POS for noun recognition and extraction, and then the first term including the part-of-speech information tag is obtained.
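The CRF/POS merging rule above (prefer the CRF phrase when a POS noun is contained in it; take verbs only from POS) can be sketched as follows; the tag strings and the containment test are illustrative assumptions, not the patent's exact representation:

```python
def merge_entities(crf_spans, pos_tokens):
    """Merge CRF phrase entities with POS-tagged tokens: keep every CRF span,
    add POS nouns only when not covered by a CRF span, and add POS verbs."""
    merged = list(crf_spans)
    for word, tag in pos_tokens:
        covered = any(word in span for span in crf_spans)
        if tag == "n" and not covered:
            merged.append(word)          # uncovered noun survives on its own
        elif tag == "v":
            merged.append(word)          # verbs come only from POS
    return merged

crf = ["Shanghai Fudan University"]
pos = [("I", "pron"), ("work", "v"), ("Shanghai", "n"),
       ("Fudan", "n"), ("University", "n")]
result = merge_entities(crf, pos)
```

The three POS nouns are each contained in the CRF span, so only the full entity and the verb "work" survive, matching the preference rule described above.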
302: and determining the temporal information of the previous round of voice information according to the at least one first word.
In this embodiment, the at least one first word obtained by word segmentation may be input to a gated recurrent unit (GRU) encoder for encoding to obtain a second hidden-layer state feature vector. The second hidden-layer state feature vector is then input to a multilayer perceptron (MLP) to obtain a linear output result. Finally, the linear output result is input to a temporal classifier to obtain the temporal information of the previous round of voice information.
Specifically, the structure of the GRU is shown in FIG. 4; it comprises a reset gate r_t, an update gate z_t, a candidate memory cell h̃_t, and a current-time memory cell h_t.

Specifically, the operation logic of the reset gate r_t can be expressed by formula ①:

r_t = σ(W_r·X_t + U_r·h_{t-1} + b_r) .........①

where σ is the activation function; W_r and U_r are the parameter matrices corresponding to the reset gate r_t, whose initialized values are random and which obtain new values through training of the model; and b_r is the bias corresponding to the reset gate r_t, which is also trainable.

Further, the operation logic of the update gate z_t can be expressed by formula ②:

z_t = σ(W_z·X_t + U_z·h_{t-1} + b_z) .........②

where W_z and U_z are the parameter matrices corresponding to the update gate z_t, whose initialized values are random and which obtain new values through training of the model; and b_z is the bias corresponding to the update gate z_t, which is also trainable.

Further, the operation logic of the candidate memory cell h̃_t can be expressed by formula ③:

h̃_t = tanh(W·X_t + U·(r_t ⊙ h_{t-1}) + b) .........③

where tanh is the activation function; W and U are the parameter matrices corresponding to the candidate memory cell h̃_t, whose initialized values are random and which obtain new values through training of the model; and b is the bias corresponding to the candidate memory cell h̃_t, which is also trainable.

Further, the operation logic of the current-time memory cell h_t can be expressed by formula ④:

h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t .........④

where z_t serves as the trainable weighting and ⊙ denotes element-wise multiplication.
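A minimal NumPy sketch of one GRU step (reset gate, update gate, candidate memory cell, current-time memory cell) as described above; the parameters are randomly initialized here rather than trained, and all dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step over input x_t and previous hidden state h_prev."""
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])           # reset gate
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])           # update gate
    h_cand = np.tanh(p["W"] @ x_t + p["U"] @ (r_t * h_prev) + p["b"])   # candidate cell
    return z_t * h_prev + (1.0 - z_t) * h_cand                          # new memory

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
p = {"Wr": rng.normal(size=(d_h, d_in)), "Ur": rng.normal(size=(d_h, d_h)), "br": np.zeros(d_h),
     "Wz": rng.normal(size=(d_h, d_in)), "Uz": rng.normal(size=(d_h, d_h)), "bz": np.zeros(d_h),
     "W":  rng.normal(size=(d_h, d_in)), "U":  rng.normal(size=(d_h, d_h)), "b":  np.zeros(d_h)}
h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), p)
```

Because the candidate cell passes through tanh, each component of the new hidden state stays inside (-1, 1) when the previous state is zero.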
In this embodiment, as shown in fig. 5, the MLP consists of two Linear layers and a ReLU activation function. The result output by the last Linear layer is fed into a softmax function for multi-label classification, and the temporal classifier finally determines the tense of the current sentence. This avoids the false recognition and missed recognition caused by traditional tense recognition that relies only on standalone tense-marker words. For example, the voice information "I am running" is in the present progressive tense, but because it contains no standalone tense-marker word, it would be missed by the traditional recognition approach.
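The Linear → ReLU → Linear → softmax pipeline described above can be sketched as follows; the hidden width and the four-way tense label set are illustrative assumptions, and the weights are random rather than trained:

```python
import numpy as np

def mlp_temporal_classifier(h, W1, b1, W2, b2):
    """Two Linear layers with a ReLU between them, then softmax over tenses."""
    hidden = np.maximum(0.0, W1 @ h + b1)     # first Linear + ReLU
    logits = W2 @ hidden + b2                 # second (last) Linear layer
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
d_h, d_mid, n_tense = 3, 8, 4    # e.g. past / present / progressive / future
probs = mlp_temporal_classifier(
    rng.normal(size=d_h),
    rng.normal(size=(d_mid, d_h)), np.zeros(d_mid),
    rng.normal(size=(n_tense, d_mid)), np.zeros(n_tense))
```

The temporal classifier would then take the label with the highest probability (or all labels above a threshold, for multi-label classification).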
303: adding the temporal information into the word label of each first word to obtain at least one second word corresponding to at least one first word one by one.
In brief, in this embodiment, a second word is a first word whose word tag has been supplemented with the temporal information of the corresponding voice information. The second word thus carries the part-of-speech information and the temporal information of the utterance in addition to the content of the utterance itself, which makes the subsequently generated reply sentence more accurate.
304: and inputting at least one second word into a coarse encoder to be encoded, and obtaining at least one coarse context information in one-to-one correspondence with the at least one second word and at least one first hidden layer state feature vector in one-to-one correspondence with the at least one second word.
In this embodiment, the coarse encoder may be a GRU encoder. Specifically, during encoding, the at least one second word is input into the GRU encoder one by one in order, and the encoder outputs the corresponding coarse context information and first hidden layer state feature vector. In addition to the currently encoded second word, the first hidden layer state feature vector output by the previous encoding step also serves as an input to the current step. That is, when the x-th second word is encoded, the x-th second word and the (x−1)-th first hidden layer state feature vector are input into the GRU encoder to obtain the x-th coarse context information and the x-th first hidden layer state feature vector. When x = 1, since there is no 0-th second word, only the 1st second word is input into the GRU encoder for encoding.
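A minimal sketch of this sequential encoding, using a simplified GRU cell built from the candidate memory cell h̃_t and the weighted update h_t described earlier (the reset gate of a full GRU is omitted, the sigmoid parameterisation of z_t is one common choice rather than anything the text specifies, and all dimensions and parameters are illustrative):

```python
import math

def sigmoid(v): return 1.0 / (1.0 + math.exp(-v))
def matvec(W, x): return [sum(a * b for a, b in zip(row, x)) for row in W]

def gru_cell(x, h_prev, P):
    """One step: weight z_t, candidate h~_t (formula 3), memory unit h_t (formula 4)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(P["Wz"], x), matvec(P["Uz"], h_prev))]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(P["W"], x), matvec(P["U"], h_prev), P["b"])]
    # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

def coarse_encode(word_vecs, P, dim):
    """Feed each second word plus the previous hidden state into the cell;
    collect per-step (coarse context info, hidden state) pairs -- in this
    simplified sketch both equal the cell output h_t."""
    h = [0.0] * dim
    contexts, hiddens = [], []
    for x in word_vecs:
        h = gru_cell(x, h, P)
        contexts.append(h)
        hiddens.append(h)
    return contexts, hiddens

# toy parameters: input dim 2, hidden dim 2; two "second words" as vectors
P = {
    "Wz": [[0.5, 0.0], [0.0, 0.5]], "Uz": [[0.1, 0.0], [0.0, 0.1]],
    "W":  [[0.3, 0.1], [0.1, 0.3]], "U":  [[0.2, 0.0], [0.0, 0.2]],
    "b":  [0.0, 0.0],
}
contexts, hiddens = coarse_encode([[1.0, 0.0], [0.0, 1.0]], P, dim=2)
```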
305: and inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder to perform decoding processing for multiple times to obtain rough semantic features of the voice information.
In this embodiment, when extracting the rough semantic features, the second words obtained by splitting differ in importance to the voice information. Therefore, before the coarse context information is input into the coarse decoder, attention processing may be applied to obtain the importance of each piece of coarse context information.
For example, each second word corresponds to one hidden layer state feature vector in the encoder (the first hidden layer state feature vector), so there are as many first hidden layer state feature vectors as there are pieces of coarse context information. The coarse context information is then input into the decoder. During decoding, the decoder computes the similarity between the feature vector of the current decoding step (the decoder's current output) and the hidden layer state feature vector corresponding to the input coarse context information. A similarity value is thus obtained for each piece of coarse context information, and the similarities are normalized to obtain a weight for each piece of coarse context information. Each weight is multiplied by the hidden layer state feature vector that the corresponding coarse context information produced in the encoder, yielding the attention feature; this attention feature is added to the output feature vector obtained when the coarse context information is input into the decoder, giving the final feature for that piece of coarse context information.
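The attention computation described in this paragraph can be sketched as follows. This is a hedged illustration: the decoder outputs B_i are assumed given, and the chaining of each G_i into the next decoding step (step 606 below) is elided for clarity:

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(decoder_outputs, encoder_hiddens):
    """Weight each coarse context by the similarity between the decoder
    output B_i and the encoder hidden state C_i, then add the attention
    feature back onto the decoder output."""
    sims = [cosine(b, c) for b, c in zip(decoder_outputs, encoder_hiddens)]  # D_i
    weights = softmax(sims)                                                  # E_i
    targets = []
    for b, c, e in zip(decoder_outputs, encoder_hiddens, weights):
        f = [e * ci for ci in c]                           # F_i = E_i * C_i
        targets.append([fi + bi for fi, bi in zip(f, b)])  # G_i = F_i + B_i
    return targets
```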
Based on this, the present embodiment provides a method for inputting at least one coarse context information and at least one first hidden layer state feature vector into a coarse decoder for performing a plurality of decoding processes to obtain coarse semantic features of speech information, as shown in fig. 6, the method includes:
601: in the i-th decoding process, input the input feature vector A_i into the coarse decoder to obtain an output feature vector B_i.

In this embodiment, i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of coarse context information, j is an integer greater than or equal to 1, and when i = 1 the input feature vector A_i is the 1st piece of coarse context information in the at least one piece of coarse context information.

602: compute the similarity D_i between the output feature vector B_i and the i-th first hidden layer state feature vector C_i in the at least one first hidden layer state feature vector.

In this embodiment, the cosine similarity between the output feature vector B_i and the i-th first hidden layer state feature vector C_i may be computed to obtain the similarity D_i.

603: normalize the similarity D_i to obtain the weight E_i of the input feature vector A_i.

In this embodiment, the similarity D_i is input into a softmax function for normalization to obtain the weight E_i of the input feature vector A_i.

604: multiply the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i.

605: add the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i.

606: use the target output feature vector G_i as the input feature vector A_{i+1} of the (i+1)-th decoding process and perform the (i+1)-th decoding, until the rough semantic features of the voice information are obtained after multiple decoding processes.
Specifically, in the process of multiple decoding processes, the output at the previous time is used as the input at the next time, and the final output obtained after the multiple decoding processes is the rough semantic feature of the voice information.
203: and performing word segmentation processing on the voice information to obtain a key phrase.
In this embodiment, the voice information may be converted into a text, and then the text may be segmented to obtain at least one first keyword. Then, any two different first adjacent words and second adjacent words in the at least one first keyword are combined to obtain at least one second keyword, and the field interval between the first adjacent words and the second adjacent words is smaller than a first threshold value.
Specifically, the first adjacent word and the second adjacent word are any two different adjacent first keywords whose field interval is smaller than the first threshold, where the field interval can be understood as the number of characters between the positions of the first adjacent word and the second adjacent word in the corresponding text. For example, for the text "the Disney paradise located in the Pudong New District of Shanghai City opened in 2016", the first keywords obtained after word segmentation and screening are: "Shanghai City", "2016", "Disney", "paradise", "Pudong", and "New District". Here, the number of characters between the positions of the first keywords "2016" and "Disney" in the text is 3, so their field interval is 3; the number of characters between the positions of the first keywords "Disney" and "paradise" is 0, so their field interval is 0.
In this embodiment, the first threshold may be set to 1. Taking the same text as an example, the pairs of first keywords satisfying this condition are "Disney" and "paradise", and "Pudong" and "New District". Combining them yields the second keywords "Disney paradise" and "Pudong New District".
And then, matching each second keyword in the at least one second keyword with a preset entity library, and screening out the second keywords failing to be matched to obtain at least one third keyword. And deleting the first keywords of each third keyword in the at least one third keyword to obtain at least one fourth keyword.
Specifically, the fourth keywords are the first keywords that remain after removing the first keywords that form each third keyword in the at least one third keyword. Continuing the example above, assume the determined third keyword is "Disney paradise". Since this third keyword is composed of the first keywords "Disney" and "paradise", these two are removed from the original first keywords ("Shanghai City", "2016", "Disney", "paradise", "Pudong", and "New District"), and the remaining first keywords ("Shanghai City", "2016", "Pudong", and "New District") are the fourth keywords.
And finally, combining the at least one third keyword and the at least one fourth keyword to obtain a keyword group.
Specifically, continuing the example above, the third keyword "Disney paradise" and the fourth keywords "Shanghai City", "2016", "Pudong", and "New District" are combined to obtain the keyword group: "Shanghai City", "2016", "Disney paradise", "Pudong", and "New District".
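The whole keyword-group construction can be sketched as below. This assumes the first keywords arrive in text order and uses a made-up text plus a one-entry stand-in entity library; the substring test used to remove the constituents of a third keyword is a simplification of the removal step:

```python
def build_keyword_group(text, first_keywords, entity_lib, threshold=1):
    # locate each first keyword in the text (first occurrence, left to right)
    spans, cursor = [], 0
    for w in first_keywords:
        i = text.index(w, cursor)
        spans.append((w, i, i + len(w)))
        cursor = i + len(w)
    # combine adjacent first keywords whose character gap is below the threshold
    second = []
    for (w1, _, end1), (w2, start2, _) in zip(spans, spans[1:]):
        if start2 - end1 < threshold:
            second.append(w1 + w2)
    # keep only combinations found in the entity library -> third keywords
    third = [w for w in second if w in entity_lib]
    # drop the first keywords absorbed into a third keyword -> fourth keywords
    absorbed = {w for t in third for (w, _, _) in spans if w in t}
    fourth = [w for w in first_keywords if w not in absorbed]
    return third + fourth

text = "DisneyParadise opened in PudongNewDistrict in 2016"
first = ["Disney", "Paradise", "Pudong", "NewDistrict", "2016"]
group = build_keyword_group(text, first, entity_lib={"DisneyParadise"})
```

In this toy run, "PudongNewDistrict" is formed as a second keyword but is screened out because it is absent from the stand-in entity library, so "Pudong" and "NewDistrict" survive as fourth keywords, mirroring the worked example above.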
204: and carrying out multiple hidden feature extraction processing on the key phrase to obtain an initial hidden layer state feature vector.
In this embodiment, the keyword group may include at least one keyword, and the at least one keyword is arranged according to a sequence of a position of each keyword in the at least one keyword in the voice message. Based on this, the present embodiment provides a method for performing multiple hidden feature extraction processing on a keyword group to obtain an initial hidden layer state feature vector, which includes:
in the n-th hidden feature extraction process, input the first input hidden feature H_n into the GRU encoder to obtain a first output hidden feature I_n, where n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, m is an integer greater than or equal to 1, and when n = 1 the input hidden feature H_n is the 1st keyword in the at least one keyword; then use the first output hidden feature I_n as the first input hidden feature H_{n+1} of the (n+1)-th hidden feature extraction process and perform the (n+1)-th extraction, until the initial hidden layer state feature vector is obtained after multiple hidden feature extraction processes.
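Read literally, the extraction chain above can be sketched as follows. The stand-in cell and vectors are illustrative only; a practical GRU encoder would also consume the n-th keyword vector at each step, which the text elides:

```python
import math

def cell(h):
    # stand-in recurrent cell: an elementwise squashing of the hidden feature
    return [math.tanh(0.5 * v + 0.1) for v in h]

def initial_hidden_state(keyword_vecs):
    # literal reading: H_1 is the 1st keyword (as a vector), I_n = cell(H_n),
    # and I_n is reused as H_{n+1}; the final I_m is the initial hidden
    # layer state feature vector
    h = keyword_vecs[0]
    for _ in keyword_vecs:   # m extraction passes
        h = cell(h)
    return h

vec = initial_hidden_state([[0.2, 0.4], [0.1, 0.3], [0.0, 0.5]])
```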
205: and performing repeated reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word.
In this embodiment, in the p-th reply word generation process, the input word vector K_p, the second input hidden feature L_p, and the rough semantic features are input into a gated recurrent unit decoder to obtain a reply word O_p and a second output hidden feature R_p, where p is an integer greater than or equal to 1 and less than or equal to q, q is determined by the voice information and is an integer greater than or equal to 1, and when p = 1 the input word vector K_p is the initial hidden layer state feature vector. Then, word embedding is performed on the reply word O_p to obtain a reply word vector S_p. Finally, the reply word vector S_p is used as the input word vector K_{p+1} of the (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1}, and the (p+1)-th generation process is performed, until at least one reply word is obtained after multiple reply word generation processes.
Specifically, as shown in FIG. 7, the generation process produces one reply word at a time: the reply word O_p is generated at the p-th step, and the reply word O_{p+1} at the (p+1)-th step. At the (p+1)-th step, the reply word O_p generated at the previous (p-th) step also serves as one of the inputs, together with the rough semantic features. That is, the reply word O_{p+1} is generated from the word vector of the reply word O_p, the second output hidden feature R_p generated at the p-th step, and the rough semantic features.
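The generation loop O_p → S_p → K_{p+1} can be sketched as below, with a deterministic stand-in for the GRU decoder and an assumed zero initialisation for L_1 (neither the decoder internals nor the L_1 initialisation is specified in the text):

```python
def generate_reply(initial_hidden, coarse_sem, decoder_step, embed, q):
    # K_1 is the initial hidden layer state feature vector;
    # L_1 is an assumed zero initialisation
    words = []
    k = initial_hidden
    l = [0.0] * len(initial_hidden)
    for _ in range(q):
        word, r = decoder_step(k, l, coarse_sem)  # O_p, R_p
        words.append(word)
        k = embed(word)                           # S_p becomes K_{p+1}
        l = r                                     # R_p becomes L_{p+1}
    return " ".join(words)  # splice the reply words in generation order

def make_toy_decoder(script):
    # deterministic stand-in for the GRU decoder, emitting a fixed script
    it = iter(script)
    def step(k, l, coarse):
        return next(it), [v + 1.0 for v in l]     # dummy hidden update
    return step

reply = generate_reply(
    initial_hidden=[0.1, 0.2],
    coarse_sem=[0.3, 0.4],
    decoder_step=make_toy_decoder(["I", "am", "fine"]),
    embed=lambda w: [float(len(w)), 0.0],
    q=3,
)
```

The space-delimited join fits the English toy script; for Chinese replies the words would be concatenated directly.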
206: and splicing the at least one reply word according to the generation sequence of each reply word in the at least one reply word to obtain the reply sentence of the voice message.
In summary, in the reply sentence determination method based on rough semantics provided by the present invention, the previous round of voice information of the user at the current moment is obtained, and rough semantic extraction is performed on it to obtain semantic features containing the high-level abstract information of the previous round of voice information. These serve as the rough semantic features of the user's voice information at the current moment, realizing synchronous extraction of the key information and rough information in the previous round of voice information. Then, word segmentation is performed on the user's voice information at the current moment, and multiple hidden feature extraction processes are performed on the resulting keywords to obtain the initial hidden layer state feature vector of that voice information. Finally, multiple reply word generation processes are performed according to the rough semantic features and the initial hidden layer state feature vector, and the resulting reply words are spliced in their generation order to obtain the reply sentence of the voice information. Because the rough semantic features, which contain both the key information and the rough information of the previous round of conversation, serve as one of the bases for generating the reply sentence in the current round, the generation process incorporates more comprehensive information features of the previous round. The generated reply sentence is therefore more precise, matches the subject of the conversation better, and improves the user experience.
Referring to fig. 8, fig. 8 is a block diagram illustrating functional modules of a reply sentence determination apparatus based on rough semantics according to an embodiment of the present disclosure. As shown in fig. 8, the reply sentence determination apparatus 800 based on rough semantics includes:
an obtaining module 801, configured to obtain previous round of voice information adjacent to the voice information according to occurrence time of the voice information at a current moment of a user, where the occurrence time of the previous round of voice information is less than the occurrence time of the voice information, and an absolute value of a difference between the occurrence time of the previous round of voice information and the occurrence time of the voice information is minimum;
the processing module 802 is configured to perform coarse semantic extraction on the voice information according to the previous round of voice information to obtain coarse semantic features corresponding to the voice information, perform word segmentation processing on the voice information to obtain a keyword group, and perform multiple hidden feature extraction processing on the keyword group to obtain an initial hidden layer state feature vector;
the generating module 803 is configured to perform multiple reply word generation processing according to the coarse semantic features and the initial hidden layer state feature vector to obtain at least one reply word, and splice the at least one reply word according to a generation sequence of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
In an embodiment of the present invention, in terms of performing rough semantic extraction on voice information according to previous round of voice information to obtain rough semantic features corresponding to the voice information, the processing module 802 is specifically configured to:
detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word comprises a word label;
determining the temporal information of the previous round of voice information according to at least one first word;
adding the temporal information into the word label of each first word to obtain at least one second word, wherein the at least one second word is in one-to-one correspondence with the at least one first word;
inputting at least one second word into a coarse encoder to be encoded to obtain at least one coarse context information and at least one first hidden layer state feature vector, wherein the at least one coarse context information corresponds to the at least one second word one by one, and the at least one first hidden layer state feature vector corresponds to the at least one second word one by one;
and inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder to perform decoding processing for multiple times to obtain rough semantic features of the voice information.
In an embodiment of the present invention, in terms of determining temporal information of a previous round of speech information according to at least one first word, the processing module 802 is specifically configured to:
inputting at least one first word into a gated cyclic unit encoder for encoding to obtain a second hidden layer state feature vector;
inputting the second hidden layer state feature vector into a multilayer perceptron to obtain a linear output result;
and inputting the linear output result into a temporal classifier to obtain temporal information of the previous round of voice information.
In an embodiment of the present invention, in terms of inputting at least one coarse context information and at least one first hidden layer state feature vector into a coarse decoder for performing multiple decoding processes to obtain coarse semantic features of speech information, the processing module 802 is specifically configured to:
in the i-th decoding process, input the input feature vector A_i into the coarse decoder to obtain an output feature vector B_i, where i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of coarse context information, j is an integer greater than or equal to 1, and when i = 1 the input feature vector A_i is the 1st piece of coarse context information in the at least one piece of coarse context information;

compute the similarity D_i between the output feature vector B_i and the i-th first hidden layer state feature vector C_i in the at least one first hidden layer state feature vector;

normalize the similarity D_i to obtain the weight E_i of the input feature vector A_i;

multiply the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i;

add the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i;

use the target output feature vector G_i as the input feature vector A_{i+1} of the (i+1)-th decoding process and perform the (i+1)-th decoding, until the rough semantic features of the voice information are obtained after multiple decoding processes.
In an embodiment of the present invention, the keyword group includes at least one keyword, and the at least one keyword is arranged according to a sequence of a position of each keyword in the at least one keyword in the voice message. Based on this, in the aspect of extracting and processing the hidden features of the keyword group for multiple times to obtain the initial hidden layer state feature vector, the processing module 802 is specifically configured to:
in the n-th hidden feature extraction process, input the first input hidden feature H_n into the gated recurrent unit encoder to obtain a first output hidden feature I_n, where n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, m is an integer greater than or equal to 1, and when n = 1 the input hidden feature H_n is the 1st keyword in the at least one keyword;

use the first output hidden feature I_n as the first input hidden feature H_{n+1} of the (n+1)-th hidden feature extraction process and perform the (n+1)-th extraction, until the initial hidden layer state feature vector is obtained after multiple hidden feature extraction processes.
In an embodiment of the present invention, in terms of performing multiple reply word generation processing according to the coarse semantic features and the initial hidden layer state feature vector to obtain at least one reply word, the generating module 803 is specifically configured to:
in the p-th reply word generation process, input the input word vector K_p, the second input hidden feature L_p, and the rough semantic features into the gated recurrent unit decoder to obtain a reply word O_p and a second output hidden feature R_p, where p is an integer greater than or equal to 1 and less than or equal to q, q is determined by the voice information and is an integer greater than or equal to 1, and when p = 1 the input word vector K_p is the initial hidden layer state feature vector;

perform word embedding on the reply word O_p to obtain a reply word vector S_p;

use the reply word vector S_p as the input word vector K_{p+1} of the (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1}, and perform the (p+1)-th generation process, until at least one reply word is obtained after multiple reply word generation processes.
In the embodiment of the present invention, in terms of performing word segmentation processing on the voice information to obtain a keyword group, the processing module 802 is specifically configured to:
converting the voice information into a text, and segmenting the text to obtain at least one first keyword;
combining the first adjacent words and the second adjacent words to obtain at least one second keyword, wherein the first adjacent words and the second adjacent words are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent words and the second adjacent words is smaller than a first threshold value;
matching each second keyword in the at least one second keyword with a preset entity library, and screening out the second keywords which fail to be matched to obtain at least one third keyword;
deleting the first keywords of each third keyword in the at least one third keyword to obtain at least one fourth keyword;
and combining the at least one third keyword and the at least one fourth keyword to obtain a keyword group.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 9, the electronic device 900 includes a transceiver 901, a processor 902, and a memory 903, which are connected to each other by a bus 904. The memory 903 is used to store computer programs and data, and can transfer the stored data to the processor 902.
The processor 902 is configured to read the computer program in the memory 903 to perform the following operations:
acquiring previous round voice information adjacent to the voice information according to the occurrence time of the voice information of the current moment of the user, wherein the occurrence time of the previous round voice information is less than the occurrence time of the voice information, and the absolute value of the difference between the occurrence time of the previous round voice information and the occurrence time of the voice information is minimum;
performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;
performing word segmentation processing on the voice information to obtain a key phrase;
carrying out multiple hidden feature extraction processing on the key phrase to obtain an initial hidden layer state feature vector;
performing multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;
and splicing the at least one reply word according to the generation sequence of each reply word in the at least one reply word to obtain the reply sentence of the voice message.
In an embodiment of the present invention, in terms of performing a rough semantic extraction on voice information according to a previous round of voice information to obtain a rough semantic feature corresponding to the voice information, the processor 902 is specifically configured to perform the following operations:
detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word comprises a word label;
determining the temporal information of the previous round of voice information according to at least one first word;
adding the temporal information into the word label of each first word to obtain at least one second word, wherein the at least one second word is in one-to-one correspondence with the at least one first word;
inputting at least one second word into a coarse encoder to be encoded to obtain at least one coarse context information and at least one first hidden layer state feature vector, wherein the at least one coarse context information corresponds to the at least one second word one by one, and the at least one first hidden layer state feature vector corresponds to the at least one second word one by one;
and inputting at least one rough context information and at least one first hidden layer state feature vector into a rough decoder to perform decoding processing for multiple times to obtain rough semantic features of the voice information.
In an embodiment of the present invention, in determining temporal information of a previous round of speech information according to at least one first word, the processor 902 is specifically configured to:
inputting at least one first word into a gated cyclic unit encoder for encoding to obtain a second hidden layer state feature vector;
inputting the second hidden layer state feature vector into a multilayer perceptron to obtain a linear output result;
and inputting the linear output result into a temporal classifier to obtain temporal information of the previous round of voice information.
In an embodiment of the present invention, in inputting at least one coarse context information and at least one first hidden layer state feature vector into a coarse decoder for performing a plurality of decoding processes to obtain coarse semantic features of speech information, the processor 902 is specifically configured to perform the following operations:
in the i-th decoding process, input the input feature vector A_i into the coarse decoder to obtain an output feature vector B_i, where i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of coarse context information, j is an integer greater than or equal to 1, and when i = 1 the input feature vector A_i is the 1st piece of coarse context information in the at least one piece of coarse context information;

compute the similarity D_i between the output feature vector B_i and the i-th first hidden layer state feature vector C_i in the at least one first hidden layer state feature vector;

normalize the similarity D_i to obtain the weight E_i of the input feature vector A_i;

multiply the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weighted feature vector F_i;

add the weighted feature vector F_i to the output feature vector B_i to obtain a target output feature vector G_i;

use the target output feature vector G_i as the input feature vector A_{i+1} of the (i+1)-th decoding process and perform the (i+1)-th decoding, until the rough semantic features of the voice information are obtained after multiple decoding processes.
In an embodiment of the present invention, the keyword group includes at least one keyword, and the at least one keyword is arranged according to a sequence of a position of each keyword in the at least one keyword in the voice message. Based on this, in terms of performing multiple hidden feature extraction processing on the keyword group to obtain an initial hidden layer state feature vector, the processor 902 is specifically configured to perform the following operations:
in the n-th hidden feature extraction process, input the first input hidden feature H_n into the gated recurrent unit encoder to obtain a first output hidden feature I_n, where n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, m is an integer greater than or equal to 1, and when n = 1 the input hidden feature H_n is the 1st keyword in the at least one keyword;

use the first output hidden feature I_n as the first input hidden feature H_{n+1} of the (n+1)-th hidden feature extraction process and perform the (n+1)-th extraction, until the initial hidden layer state feature vector is obtained after multiple hidden feature extraction processes.
In an embodiment of the present invention, in terms of performing multiple reply word generation processing according to the coarse semantic features and the initial hidden layer state feature vector to obtain at least one reply word, the processor 902 is specifically configured to perform the following operations:
in the p-th reply word generation process, input the input word vector K_p, the second input hidden feature L_p, and the rough semantic features into the gated recurrent unit decoder to obtain a reply word O_p and a second output hidden feature R_p, where p is an integer greater than or equal to 1 and less than or equal to q, q is determined by the voice information and is an integer greater than or equal to 1, and when p = 1 the input word vector K_p is the initial hidden layer state feature vector;

perform word embedding on the reply word O_p to obtain a reply word vector S_p;

use the reply word vector S_p as the input word vector K_{p+1} of the (p+1)-th reply word generation process and the second output hidden feature R_p as the second input hidden feature L_{p+1}, and perform the (p+1)-th generation process, until at least one reply word is obtained after multiple reply word generation processes.
In an embodiment of the present invention, in terms of performing a word segmentation process on the voice information to obtain a keyword group, the processor 902 is specifically configured to perform the following operations:
converting the voice information into a text, and segmenting the text to obtain at least one first keyword;
combining the first adjacent words and the second adjacent words to obtain at least one second keyword, wherein the first adjacent words and the second adjacent words are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent words and the second adjacent words is smaller than a first threshold value;
matching each second keyword in the at least one second keyword with a preset entity library, and screening out the second keywords which fail to be matched to obtain at least one third keyword;
deleting the first keywords of each third keyword in the at least one third keyword to obtain at least one fourth keyword;
and combining the at least one third keyword and the at least one fourth keyword to obtain a keyword group.
It should be understood that the reply sentence determination device based on rough semantics in the present application may include a smart phone (e.g., an Android phone, an iOS phone, or a Windows phone), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a robot, or a wearable device. The above devices are merely examples and are not exhaustive; the reply sentence determination device based on rough semantics includes, but is not limited to, them. In practical applications, the device may further include an intelligent vehicle-mounted terminal, computer equipment, and the like.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software in combination with a hardware platform. Based on this understanding, all or the part of the technical solutions of the present invention that contributes over the background art can be embodied in the form of a software product. The software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present invention.
Accordingly, the present application also provides a computer readable storage medium, which stores a computer program, wherein the computer program is executed by a processor to implement part or all of the steps of any one of the reply sentence determination methods based on rough semantics as described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the coarse semantics based reply sentence determination methods as set forth in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the description of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, and the memory may include: flash Memory disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the methods and their core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A method for determining a reply sentence based on rough semantics, the method comprising:
acquiring previous round voice information adjacent to the voice information according to the occurrence time of the voice information of the user at the current moment, wherein the occurrence time of the previous round voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the two occurrence times is minimum;
performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information;
performing word segmentation processing on the voice information to obtain a key word group;
carrying out multiple hidden feature extraction processing on the key phrase to obtain an initial hidden layer state feature vector;
performing multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word;
and concatenating the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain the reply sentence of the voice information.
2. The method of claim 1, wherein the performing rough semantic extraction on the speech information according to the previous round of speech information to obtain rough semantic features corresponding to the speech information comprises:
detecting the previous round of voice information to obtain at least one first word contained in the previous round of voice information, wherein each first word in the at least one first word comprises a word label;
determining the temporal information of the previous round of voice information according to the at least one first word;
adding the temporal information into a word label of each first word to obtain at least one second word, wherein the at least one second word is in one-to-one correspondence with the at least one first word;
inputting the at least one second word into a coarse encoder for encoding to obtain at least one coarse context information and at least one first hidden layer state feature vector, wherein the at least one coarse context information corresponds to the at least one second word one by one, and the at least one first hidden layer state feature vector corresponds to the at least one second word one by one;
and inputting the at least one rough context information and the at least one first hidden layer state feature vector into a rough decoder to perform decoding processing for multiple times to obtain rough semantic features of the voice information.
3. The method of claim 2, wherein the determining temporal information of the previous round of speech information according to the at least one first word comprises:
inputting the at least one first word into a gated cyclic unit encoder for encoding to obtain a second hidden layer state feature vector;
inputting the second hidden layer state feature vector into a multilayer perceptron to obtain a linear output result;
and inputting the linear output result into a temporal classifier to obtain temporal information of the previous round of voice information.
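The three-stage pipeline of claim 3 — gated recurrent unit encoder, multilayer perceptron, temporal classifier — can be sketched as below. The tanh fold, the single-layer perceptron, and the three tense labels are illustrative stand-ins; the real model's dimensions, trained weights, and label set are not given in the claim:

```python
import math

def gru_encode(word_vectors):
    """Fold the first words into the second hidden layer state feature vector
    (toy recurrent update standing in for the trained GRU encoder)."""
    h = [0.0] * len(word_vectors[0])
    for v in word_vectors:
        h = [math.tanh(x + y) for x, y in zip(h, v)]
    return h

def mlp(hidden, weights):
    """A single linear layer standing in for the multilayer perceptron."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in weights]

def classify_tense(logits, labels=("past", "present", "future")):
    """Temporal classifier: softmax over the linear output, then argmax."""
    exps = [math.exp(z) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return labels[probs.index(max(probs))]
```

The returned label is the temporal information that is then written into the word label of every first word.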
4. The method according to claim 2, wherein said inputting the at least one coarse context information and the at least one first hidden layer state feature vector into a coarse decoder for a plurality of decoding processes to obtain coarse semantic features of the speech information comprises:
in the i-th decoding process, inputting the input feature vector A_i into the coarse decoder to obtain an output feature vector B_i, wherein i is an integer greater than or equal to 1 and less than or equal to j, j is the number of the at least one piece of coarse context information, j is an integer greater than or equal to 1, and when i is equal to 1, the input feature vector A_i is the 1st piece of coarse context information in the at least one piece of coarse context information;
calculating a similarity D_i between the output feature vector B_i and an i-th first hidden layer state feature vector C_i of the at least one first hidden layer state feature vector;
performing normalization processing on the similarity D_i to obtain a weight E_i of the input feature vector A_i;
multiplying the weight E_i by the i-th first hidden layer state feature vector C_i to obtain a weight feature vector F_i;
adding the weight feature vector F_i and the output feature vector B_i to obtain a target output feature vector G_i;
taking the target output feature vector G_i as the input feature vector A_{i+1} of the (i+1)-th decoding process, and performing the (i+1)-th decoding process until the coarse semantic features of the voice information are obtained after performing the decoding process multiple times.
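The steps of claim 4 can be sketched as one loop. The dot-product similarity and the sigmoid used for "normalization processing" are assumptions — the claim names neither the similarity measure nor the normalizer (a softmax over all similarities would be the conventional attention choice) — and `decoder_step` stands in for the trained coarse decoder:

```python
import math

def coarse_decode(contexts, hiddens, decoder_step):
    """Run the j decoding passes of claim 4 and return the coarse semantic features."""
    a_i = contexts[0]                       # A_1 is the 1st coarse context
    g_i = None
    for i in range(len(contexts)):          # passes i = 1 .. j
        b_i = decoder_step(a_i)             # B_i from the coarse decoder
        c_i = hiddens[i]                    # i-th first hidden layer state feature vector
        d_i = sum(b * c for b, c in zip(b_i, c_i))    # similarity D_i (dot product)
        e_i = 1.0 / (1.0 + math.exp(-d_i))  # weight E_i (sigmoid as the normalizer)
        f_i = [e_i * c for c in c_i]        # weight feature vector F_i
        g_i = [f + b for f, b in zip(f_i, b_i)]       # target output feature vector G_i
        a_i = g_i                           # G_i becomes A_{i+1}
    return g_i
```

Feeding G_i back as A_{i+1} is what lets each coarse context refine the running summary rather than being decoded in isolation.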
5. The method of claim 1,
the keyword group comprises at least one keyword, and the at least one keyword is arranged according to the sequence of the position of each keyword in the at least one keyword in the voice information;
the multiple hidden feature extraction processing is performed on the keyword group to obtain an initial hidden layer state feature vector, and the method comprises the following steps:
in the n-th hidden feature extraction process, inputting the first input hidden feature H_n into a gated recurrent unit encoder to obtain a first output hidden feature I_n, wherein n is an integer greater than or equal to 1 and less than or equal to m, m is the number of the at least one keyword, m is an integer greater than or equal to 1, and when n is 1, the first input hidden feature H_n is the 1st keyword in the at least one keyword;
taking the first output hidden feature I_n as the first input hidden feature H_{n+1} of the (n+1)-th hidden feature extraction process, and performing the (n+1)-th hidden feature extraction process until the initial hidden layer state feature vector is obtained after performing the hidden feature extraction process multiple times.
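A sketch of this chained extraction, assuming each keyword has already been mapped to a vector and that the n-th pass consumes the n-th keyword alongside the carried hidden state (the claim only states that I_n becomes H_{n+1}); the tanh update is a toy stand-in for the trained gated recurrent unit encoder:

```python
import math

def extract_initial_hidden(keyword_vectors):
    """Chain m hidden feature extraction passes over the keyword group; the
    final output I_m is the initial hidden layer state feature vector."""
    h_n = list(keyword_vectors[0])          # H_1 is the 1st keyword's vector
    for vec in keyword_vectors[1:]:
        # toy recurrent update standing in for the GRU cell: I_n becomes H_{n+1}
        h_n = [math.tanh(h + v) for h, v in zip(h_n, vec)]
    return h_n
```

The returned vector is exactly what claim 6 feeds in as K_1 at the first reply word generation pass.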
6. The method according to claim 1, wherein performing a plurality of reply word generation processes according to the coarse semantic features and the initial hidden layer state feature vector to obtain at least one reply word comprises:
when performing the p-th reply word generation processing, inputting the input word vector K_p, a second input hidden feature L_p and the coarse semantic features into a gated recurrent unit decoder to obtain a reply word O_p and a second output hidden feature R_p, wherein p is an integer greater than or equal to 1 and less than or equal to q, q is an integer greater than or equal to 1 determined by the voice information, and when p is 1, the input word vector K_p is the initial hidden layer state feature vector;
performing word embedding processing on the reply word O_p to obtain a reply word vector S_p;
taking the reply word vector S_p as the input word vector K_{p+1} of the (p+1)-th reply word generation processing and the second output hidden feature R_p as the second input hidden feature L_{p+1} of the (p+1)-th reply word generation processing, and performing the (p+1)-th reply word generation processing until the at least one reply word is obtained after performing the reply word generation processing multiple times.
7. The method of claim 1, wherein the performing word segmentation processing on the voice message to obtain a keyword group comprises:
converting the voice information into a text, and segmenting the text to obtain at least one first keyword;
combining a first adjacent word and a second adjacent word to obtain at least one second keyword, wherein the first adjacent word and the second adjacent word are any two different first keywords in the at least one first keyword, and the field interval between the first adjacent word and the second adjacent word is smaller than a first threshold value;
matching each second keyword in the at least one second keyword with a preset entity library, and filtering out the second keywords that fail to match, so as to obtain at least one third keyword;
deleting the first keywords forming each third keyword in the at least one third keyword from the at least one first keyword to obtain at least one fourth keyword;
and combining the at least one third keyword and the at least one fourth keyword to obtain the keyword group.
8. An apparatus for determining a reply sentence based on a rough semantic, the apparatus comprising:
the acquisition module is used for acquiring previous round voice information adjacent to the voice information according to the occurrence time of the voice information of the user at the current moment, wherein the occurrence time of the previous round voice information is earlier than the occurrence time of the voice information, and the absolute value of the difference between the two occurrence times is minimum;
the processing module is used for performing rough semantic extraction on the voice information according to the previous round of voice information to obtain rough semantic features corresponding to the voice information, performing word segmentation processing on the voice information to obtain a key word group, and performing multiple hidden feature extraction processing on the key word group to obtain an initial hidden layer state feature vector;
and the generating module is used for performing multiple reply word generation processing according to the rough semantic features and the initial hidden layer state feature vector to obtain at least one reply word, and concatenating the at least one reply word according to the generation order of each reply word in the at least one reply word to obtain a reply sentence of the voice information.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.
CN202210083351.8A 2022-01-22 2022-01-22 Reply statement determination method and device based on rough semantics and electronic equipment Active CN114417891B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210083351.8A CN114417891B (en) 2022-01-22 2022-01-22 Reply statement determination method and device based on rough semantics and electronic equipment
PCT/CN2022/090129 WO2023137903A1 (en) 2022-01-22 2022-04-29 Reply statement determination method and apparatus based on rough semantics, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083351.8A CN114417891B (en) 2022-01-22 2022-01-22 Reply statement determination method and device based on rough semantics and electronic equipment

Publications (2)

Publication Number Publication Date
CN114417891A true CN114417891A (en) 2022-04-29
CN114417891B CN114417891B (en) 2023-05-09

Family

ID=81278095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083351.8A Active CN114417891B (en) 2022-01-22 2022-01-22 Reply statement determination method and device based on rough semantics and electronic equipment

Country Status (2)

Country Link
CN (1) CN114417891B (en)
WO (1) WO2023137903A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842169B (en) * 2023-09-01 2024-01-12 国网山东省电力公司聊城供电公司 Power grid session management method, system, terminal and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110413729A (en) * 2019-06-25 2019-11-05 江南大学 Talk with generation method based on the more wheels of tail sentence-dual attention model of context
CN111368538A (en) * 2020-02-29 2020-07-03 平安科技(深圳)有限公司 Voice interaction method, system, terminal and computer readable storage medium
CN112528989A (en) * 2020-12-01 2021-03-19 重庆邮电大学 Description generation method for semantic fine granularity of image
WO2021072914A1 (en) * 2019-10-14 2021-04-22 苏州思必驰信息科技有限公司 Human-machine conversation processing method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP6299563B2 (en) * 2014-11-07 2018-03-28 トヨタ自動車株式会社 Response generation method, response generation apparatus, and response generation program
CN110851574A (en) * 2018-07-27 2020-02-28 北京京东尚科信息技术有限公司 Statement processing method, device and system
CN109241262B (en) * 2018-08-31 2021-01-05 出门问问信息科技有限公司 Method and device for generating reply sentence based on keyword
CN111460115B (en) * 2020-03-17 2023-05-26 深圳市优必选科技股份有限公司 Intelligent man-machine conversation model training method, model training device and electronic equipment
CN113035179B (en) * 2021-03-03 2023-09-26 中国科学技术大学 Voice recognition method, device, equipment and computer readable storage medium
CN113378557B (en) * 2021-05-08 2022-08-23 重庆邮电大学 Automatic keyword extraction method, medium and system based on fault-tolerant rough set


Also Published As

Publication number Publication date
WO2023137903A1 (en) 2023-07-27
CN114417891B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN108920622B (en) Training method, training device and recognition device for intention recognition
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN111738016A (en) Multi-intention recognition method and related equipment
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN112017643B (en) Speech recognition model training method, speech recognition method and related device
CN113987169A (en) Text abstract generation method, device and equipment based on semantic block and storage medium
CN111666766A (en) Data processing method, device and equipment
CN114298035A (en) Text recognition desensitization method and system thereof
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
CN115983271A (en) Named entity recognition method and named entity recognition model training method
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN110263304B (en) Statement encoding method, statement decoding method, device, storage medium and equipment
CN113221553A (en) Text processing method, device and equipment and readable storage medium
CN114742016A (en) Chapter-level event extraction method and device based on multi-granularity entity differential composition
CN111767720B (en) Title generation method, computer and readable storage medium
CN110019952B (en) Video description method, system and device
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN113705315A (en) Video processing method, device, equipment and storage medium
CN115269768A (en) Element text processing method and device, electronic equipment and storage medium
CN116595023A (en) Address information updating method and device, electronic equipment and storage medium
CN116562291A (en) Chinese nested named entity recognition method based on boundary detection
CN113779202B (en) Named entity recognition method and device, computer equipment and storage medium
CN116186244A (en) Method for generating text abstract, method and device for training abstract generation model
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN114722832A (en) Abstract extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant