WO2021051521A1 - Method, apparatus, computer device and storage medium for obtaining response information - Google Patents

Method, apparatus, computer device and storage medium for obtaining response information

Info

Publication number
WO2021051521A1
Authority
WO
WIPO (PCT)
Prior art keywords
text information
target
recognized
target text
sentence
Application number
PCT/CN2019/116944
Inventor
王健宗
程宁
于凤英
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051521A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/31 Indexing; Data structures therefor; Storage structures
                • G06F16/316 Indexing structures
                  • G06F16/319 Inverted lists
              • G06F16/33 Querying
                • G06F16/332 Query formulation
                  • G06F16/3329 Natural language query formulation or dialogue systems
                • G06F16/3331 Query processing
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/22 Matching criteria, e.g. proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, computer equipment, and storage medium for obtaining response information.
  • In traditional customer service, the agent must judge which service the user needs from the content of the voice or text conversation, and then make further judgments and answers based on the user's responses and questions.
  • Users differ in language ability and expression habits. A human agent therefore needs good verbal skills to explain intent and strong logical reasoning to understand the user's intention and make correct judgments. These business capabilities, such as how to persuade users, vary greatly with each agent's ability, training time and work experience. Service quality is therefore uneven across agents, which can lower the overall quality of human customer service.
  • To this end, the embodiments of the present application provide a method, device, computer equipment and storage medium for obtaining response information, which are used to output response information and improve the quality of customer service.
  • In a first aspect, an embodiment of the present application provides a method for obtaining response information, including:
  • inputting each candidate document in the candidate document set, together with the target text information to be recognized, into a similarity recognition model, and outputting the similarity between each candidate document and the target text information through the similarity recognition model;
  • each candidate document corresponds to an intent;
  • if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document; each intent has associated response information;
  • An embodiment of the present application also provides a device for obtaining response information, including:
  • an obtaining module, configured to obtain the target text information to be recognized;
  • a retrieval module, configured to input the target text information to be recognized, obtained by the obtaining module, into the ES database for retrieval, and to obtain, through an inverted index, a set of candidate documents similar to the target text information;
  • a similarity recognition module, configured to input each candidate document in the candidate document set retrieved by the retrieval module, together with the target text information to be recognized, into the similarity recognition model, and to output through the similarity recognition model the similarity between each candidate document and the target text information; each candidate document corresponds to an intent;
  • an intent determination module, configured to determine the target intent corresponding to a target candidate document when the similarity score, determined by the similarity recognition module, between the target candidate document and the target text information to be recognized is greater than a threshold; each intent has associated response information;
  • a response information determination module, configured to determine the target response information corresponding to the target intent determined by the intent determination module; and
  • an output module, configured to output the target response information determined by the response information determination module.
  • An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the method described in the first aspect is implemented.
  • The embodiments of the present application further provide one or more readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors implement the method described in the first aspect.
  • FIG. 1 is a schematic diagram of a scenario of a communication system in an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the steps of an embodiment of the method for obtaining response information in an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of the steps of another embodiment of the method for obtaining response information in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of an embodiment of the apparatus for obtaining response information in an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of another embodiment of the apparatus for obtaining response information in an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer device in an embodiment of the present application.
  • An embodiment of the present application provides a method for obtaining response information.
  • The method can be applied to a communication system that includes a server and a terminal.
  • The server can be an independent server or a cluster of multiple servers.
  • The terminal includes, but is not limited to, personal computers, notebook computers, smartphones, tablet computers and portable wearable devices.
  • In the embodiments of the present application, the method is described with the server as the execution subject.
  • The server obtains the target text information to be recognized and inputs it into the ES database for retrieval. Through the inverted index it obtains a set of candidate documents similar to the target text information, which narrows the search scope. Each candidate document in the set is then input, together with the target text information, into the similarity recognition model, which outputs the similarity between each candidate document and the text information to be recognized; because only the retrieved candidates are scored, the processing efficiency of the model improves. If the similarity score between a target candidate document and the target text information is greater than the threshold, the target intent corresponding to that document is determined. The target response information corresponding to the correctly identified target intent is then determined and output. Outputting standardized response information in this AI-based way greatly improves the quality of customer service.
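The overall flow above can be sketched as follows. This is a minimal Python sketch, not the applicant's implementation: retrieval is passed in as a function, token-overlap (Jaccard) similarity is a hypothetical stand-in for the HMM plus LR similarity recognition model, and 0.7 is the example threshold used later in this description.

```python
from typing import Callable, Dict, List, Optional


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of lower-cased whitespace tokens (not the patented model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def get_response(
    text: str,
    retrieve: Callable[[str], List[str]],
    doc_intent: Dict[str, str],
    responses: Dict[str, List[str]],
    threshold: float = 0.7,
) -> Optional[str]:
    """Retrieve candidates, score them, apply the threshold, map to an intent, return a response."""
    candidates = retrieve(text)  # narrowed candidate document set
    scored = [(jaccard(doc, text), doc) for doc in candidates]
    hits = [pair for pair in scored if pair[0] > threshold]
    if not hits:
        return None  # no candidate document exceeds the threshold
    _, best_doc = max(hits, key=lambda pair: pair[0])
    intent = doc_intent[best_doc]  # each candidate document corresponds to one intent
    return responses[intent][0]  # first pre-stored response for that intent
```

Passing retrieval in as a function keeps the sketch independent of any particular search backend.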
  • An embodiment of the present application provides a schematic flowchart of an embodiment of the method for obtaining response information.
  • The method for obtaining response information may specifically include the following steps:
  • S101 Obtain the target text information to be recognized. The server receives the voice information or text information sent by the terminal. If the server receives text information, no conversion is needed; if the terminal sends voice information, the voice information is converted into text information.
  • The received text information is then preprocessed, where preprocessing includes de-duplication, removal of modal particles and error correction, to obtain the target text information.
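The preprocessing step can be illustrated with a minimal Python sketch. It assumes the text has already been split into tokens; the modal-particle set and the correction dictionary are hypothetical inputs, and dictionary lookup is only one simple stand-in for the unspecified error-correction method.

```python
from typing import Dict, Iterable, List, Set


def preprocess(tokens: Iterable[str], particles: Set[str], corrections: Dict[str, str]) -> List[str]:
    """De-duplicate, drop modal particles and correct known errors in a token sequence."""
    cleaned: List[str] = []
    for token in tokens:
        token = corrections.get(token, token)  # dictionary-based error correction
        if token in particles:
            continue  # remove modal particles / fillers
        if cleaned and cleaned[-1] == token:
            continue  # drop immediate repeats (e.g. speech-recognition stutter)
        cleaned.append(token)
    return cleaned
```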
  • The terminal may be a voice output device, such as a fixed-line telephone or a mobile phone, or a smart terminal device, such as a smartphone, a computer or a palmtop computer.
  • A mobile phone is taken as the example terminal below.
  • For example, the server receives voice information that the user sends over the phone. The voice information can be a consultation, such as "I want to find an insurance that is suitable for children".
  • The server converts the voice information into the corresponding text information and preprocesses it to obtain the target text information.
  • Alternatively, the server can actively place an outbound call to the mobile phone with a voice message such as "Hello, you are our VIP customer. We are holding an event to give back to long-standing customers, and insurance A is very suitable for you."
  • The server then receives the user's voice reply from the mobile phone, for example "Is insurance A a wealth-management type or a consumption type?"
  • The server converts the received voice message into the text information to be recognized and preprocesses it to obtain the target text information.
  • S102 Input the target text information to be recognized into the ES database for retrieval, and obtain, through the inverted index, a candidate document set similar to the target text information.
  • A document in the embodiments of the present application is a storage object in text form and covers multiple formats: Word, PDF, HTML, XML and other formats can all be called documents.
  • As another example, an email, a short message or a Weibo post can also be called a document.
  • In the embodiments of the present application, documents represent text information.
  • A document collection containing a large number of documents is pre-stored in the ES database.
  • The document collection can include multiple sub-collections corresponding to different business scenarios. For example, life insurance corresponds to sub-collection A, which stores documents related to life insurance, and auto insurance corresponds to sub-collection B, which stores documents related to auto insurance.
  • ES: Elasticsearch.
  • ES is a distributed full-text search engine built on Lucene. An ES index contains multiple fields, and each field has its own inverted index.
  • The target text information to be recognized is segmented to obtain a word sequence.
  • For example, if the text is "Is insurance A a wealth-management type or a consumption type",
  • the word sequence after segmentation is "insurance A / is / wealth-management type / or / consumption type",
  • and the keywords are "insurance A", "wealth management" and "consumption type".
  • Inverted index: from the perspective of words, it records the association between words and document IDs; that is, document IDs are looked up by word.
  • Forward index: from the perspective of documents, it records the association between a document ID and the document's content and words; that is, the content of a document is obtained by its ID.
  • An inverted index entry (posting) mainly contains the following information:
  • the document ID, used to obtain the information corresponding to the document;
  • the word frequency, which records the number of times the word appears in the document.
  • Each document corresponds to a document ID,
  • and the content of the document is represented as a collection of keywords.
  • The keywords have also been converted to keyword IDs, for example …
  • When the document to be recognized is segmented, m keywords (words) are extracted.
  • For each word, the corresponding document IDs are found through the inverted index, and the complete content of each document ID is then queried through the forward index. Finally, the complete content corresponding to each document ID is returned.
  • After the word sequence is obtained, the text IDs containing its words are retrieved through the inverted index, and the frequency of each word of the word sequence in each of those text IDs is calculated.
  • From the weight of each word in the corresponding text ID, the similarity between the target text information to be recognized and each candidate document is determined, similar candidate documents are found, and a preset number of them are selected in descending order of similarity,
  • for example the TOP30 candidate documents; this preset number of candidate documents constitutes the candidate document set.
  • The inverted index is a concrete storage form that implements the "word-document matrix". Through it, the list of documents containing a word can be obtained quickly from the word, so a preset number of documents similar to the target text information to be recognized can be found quickly.
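The inverted-index lookup, forward-index lookup and TOP-k selection described above can be sketched in Python. This is a simplified in-memory model, not Elasticsearch itself. Each word is weighted by term frequency times a smoothed inverse document frequency, whereas Elasticsearch scores with BM25 by default. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_indexes(docs: Dict[int, List[str]]):
    """docs maps document ID -> segmented word list."""
    inverted = defaultdict(list)  # word -> postings [(doc_id, term frequency)]
    forward = {}                  # doc_id -> full document content
    for doc_id, words in docs.items():
        forward[doc_id] = " ".join(words)
        for word, tf in Counter(words).items():
            inverted[word].append((doc_id, tf))
    return inverted, forward


def retrieve(query_words: List[str], inverted, forward, top_k: int = 30) -> List[Tuple[int, str]]:
    """Score documents sharing query words, keep the top_k, and return their content."""
    n_docs = len(forward)
    scores = defaultdict(float)
    for word in set(query_words):
        postings = inverted.get(word, [])
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer words weigh more
        for doc_id, tf in postings:  # word -> document IDs via the inverted index
            scores[doc_id] += tf * idf
    ranked = sorted(scores, key=lambda d: (-scores[d], d))[:top_k]
    return [(d, forward[d]) for d in ranked]  # document ID -> content via the forward index
```

Only documents that share at least one word with the query are ever scored, which is what narrows the candidate set.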
  • S103 Input each candidate document in the candidate document set, together with the target text information to be recognized, into a similarity recognition model to obtain the similarity between each candidate document and the target text information to be recognized.
  • The similarity recognition model includes a word segmentation model, which may be a hidden Markov model (HMM).
  • The word segmentation model is obtained by training an HMM on a first training sample set.
  • The training samples in the first training sample set are segmented word sequences in which each word is labeled and annotated with features.
  • The features can be part of speech, entity position, context, etc.
  • The similarity recognition model also includes a recognition model.
  • The recognition model is a logistic regression (LR) model. For sample points in a multi-dimensional space, it fits the distribution and trajectory of the points with a linear combination of features (feature weighting).
  • LR: Logistic Regression.
  • The LR model is simple and efficient, easy to parallelize, and supports online learning (dynamic expansion).
  • The recognition model is obtained by training the LR model on a second sample data set.
  • Each sample in the second sample data set includes a corpus pair and the similarity of that pair. The corpus pair consists of a first corpus and a second corpus. The first corpus is segmented into a first word sequence and the second corpus into a second word sequence; the first word sequence is converted into a first word-vector sequence and the second word sequence into a second word-vector sequence, and the distance between the two word-vector sequences is calculated. This distance indicates the similarity between the first corpus and the second corpus. Features and feature values are then extracted from the first word sequence and from the second word sequence.
  • The features include, but are not limited to, part of speech, word meaning, context, sentence components (subject, predicate, object, etc.) and category (such as character, phrase, idiom, etc.).
  • The feature vector and similarity of each sample are input into formula 1 above for training, and the parameter w in the …
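Formula 1 is not reproduced in this extract, so the sketch below assumes the standard logistic regression form, sigmoid(w·x + b), trained by batch gradient descent on cross-entropy with the pair similarity as a soft label. The label is taken here as the cosine similarity of averaged word vectors, clipped to [0, 1], which is one hypothetical way to turn the word-vector distance into a similarity. The embeddings in the usage are toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_vector(words: Sequence[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of a word sequence, skipping out-of-vocabulary words."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def pair_similarity(words1, words2, embeddings) -> float:
    """Cosine similarity of the averaged word vectors, clipped to [0, 1] for use as a label."""
    a, b = mean_vector(words1, embeddings), mean_vector(words2, embeddings)
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return min(1.0, max(0.0, sum(x * y for x, y in zip(a, b)) / norm))


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Logistic regression output sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(samples: List[Tuple[List[float], float]], learning_rate=0.5, epochs=2000):
    """Batch gradient descent on cross-entropy loss with soft labels y in [0, 1]."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in samples:
            err = predict(w, b, x) - y  # derivative of cross-entropy w.r.t. z
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - learning_rate * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / len(samples)
    return w, b
```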
  • The word segmentation model segments the target text information into a first word sequence, and each word in the first word sequence can be annotated with features and the value of each feature. Likewise, the word segmentation model segments each candidate document into a second word sequence, and each word in the second word sequence can be annotated with features and the value of each feature.
  • The first word sequence and the second word sequence are input into the recognition model, which determines the feature vector corresponding to each feature.
  • To do so, the recognition model calculates the similarity of the features at the same position in the first and second word sequences.
  • For example, the first word sequence is "I want to know a child insurance",
  • and the second word sequence is "I want to know the role of child insurance".
  • For the feature part of speech, the value at the first position of the first word sequence is a pronoun,
  • and the value at the first position of the second word sequence is also a pronoun,
  • so the feature values are the same and the degree of association is 1.
  • The degrees of association of the first feature's values at every position form the first feature vector, x1;
  • the degrees of association of the second feature (such as entity position)
  • at every position form the second feature vector, x2;
  • and so on, until the degrees of association of the n-th feature at every position form the n-th feature vector, xn.
  • The similarity between the target text information and the candidate document can then be output:
  • x1, x2, ..., xn are input into formula 1 above to obtain the similarity between the target text information and the candidate document.
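The per-position comparison can be sketched as follows. This rests on three assumptions: each word is a dict of feature values, agreement is scored as 1.0 or 0.0, and the feature vectors x1 ... xn are concatenated before the logistic function is applied. The weights and bias in the usage are placeholders, not learned values.

```python
import math
from typing import Dict, List, Sequence


def feature_vectors(seq1: Sequence[Dict[str, str]], seq2: Sequence[Dict[str, str]],
                    features: List[str]) -> List[List[float]]:
    """One vector per feature: 1.0 where the words at the same position share the value, else 0.0."""
    length = min(len(seq1), len(seq2))  # sequences are assumed trimmed to equal length
    return [[1.0 if seq1[p][f] == seq2[p][f] else 0.0 for p in range(length)] for f in features]


def lr_similarity(vectors: List[List[float]], weights: List[float], bias: float) -> float:
    """Concatenate x1 ... xn and apply sigmoid(w . x + b)."""
    x = [value for vector in vectors for value in vector]
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```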
  • The number of words is a preset number, for example 6, 7 or 8;
  • the specific number is not limited.
  • The preset number is obtained empirically: in a dialogue, about 7 feature words usually suffice to express the semantics of a sentence. The candidate documents stored in the ES database may therefore be documents whose feature words have already been screened.
  • For example, the corpus "I want to know the role of child insurance"
  • becomes "child insurance role" after processing, and the processed corpus still clearly expresses the semantics of the original.
  • When the similarity is calculated, the first word sequence first keeps the same number of feature words as the second word sequence, and only then is the similarity between
  • the first and second word sequences computed, which effectively reduces the data-processing dimension and increases speed.
  • After the word segmentation model segments the target text information into the first word sequence and the candidate document into the second word sequence, and before the first and second word sequences are input into the recognition model to determine the feature vector corresponding to each feature, the method may further include the following step:
  • a preset number of feature words are selected from the first word sequence and used as the first word sequence input to the recognition model,
  • so that the number of feature words in the first word sequence equals the number of feature words in the second word sequence.
  • Methods for selecting feature words include, but are not limited to, methods based on information gain and methods based on word frequency.
  • Information gain is the amount of information that a feature word contributes to the whole document; the importance of the feature word is measured by that amount.
  • Word frequency is how often a word occurs in the whole document; the more often it occurs, the more important the feature word is to the document.
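The word-frequency method can be sketched as follows. It keeps the k most frequent distinct words in their original order, so that both word sequences can be cut to the same length. The stop-word set and the tie-breaking by first occurrence are assumptions. The information-gain method is not shown.

```python
from collections import Counter
from typing import Iterable, List


def select_feature_words(words: List[str], k: int, stopwords: Iterable[str] = ()) -> List[str]:
    """Keep the k most frequent distinct words (word-frequency method), in original order."""
    stop = set(stopwords)
    counts = Counter(w for w in words if w not in stop)
    first_seen = {}
    for i, w in enumerate(words):
        first_seen.setdefault(w, i)
    # Rank by frequency (descending), breaking ties by first occurrence.
    ranked = sorted(counts, key=lambda w: (-counts[w], first_seen[w]))
    keep = set(ranked[:k])
    selected: List[str] = []
    for w in words:
        if w in keep and w not in selected:
            selected.append(w)
    return selected
```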
  • Retrieving the ES database through the inverted index first speeds up retrieval and yields a candidate document set similar to the target text information to be processed.
  • Because the candidate document set contains only a preset number of candidate documents, the number of documents to compare is greatly reduced, which improves the efficiency with which the similarity recognition model compares the target text information with the candidate document set.
  • S104 If the similarity between a target candidate document and the target text information to be recognized is greater than a threshold, determine the target intent corresponding to the target candidate document.
  • A target candidate document is a candidate document in the candidate document set whose similarity to the target text information is greater than the threshold.
  • The number of target candidate documents is not limited:
  • there can be one or more (in this embodiment, "multiple" means two or more).
  • One target candidate document is taken as the example below.
  • Each candidate document corresponds to an intent, which is stored in association with the candidate document; the intent can include, but is not limited to, a subject information field and a status field.
  • For example, a candidate document is "I want to learn about child insurance".
  • The subject information field indicates the demand;
  • here it indicates "child insurance".
  • The status field indicates "Yes" or "No": in this example, wanting to learn about it is "Yes" and not wanting to is "No".
  • The intent of this candidate document is therefore "child insurance-Yes".
  • The threshold can be 0.6, 0.7, 0.8, etc.; 0.7 is used as the example here. If the similarity between the target candidate document ("I want to learn about child insurance") and the target text information is greater than 0.7, the target intent corresponding to the target candidate document (e.g., "child insurance-Yes") is determined.
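A minimal sketch of the intent structure and the threshold test follows. It assumes an intent is just a subject field plus a yes/no status field, and that the candidate similarities have already been computed. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Intent:
    subject: str  # subject information field, e.g. "child insurance"
    wanted: bool  # status field: True means "Yes", False means "No"

    def label(self) -> str:
        return f"{self.subject}-{'Yes' if self.wanted else 'No'}"


def target_intents(similarities: Dict[str, float], doc_intents: Dict[str, Intent],
                   threshold: float = 0.7) -> List[Intent]:
    """Return the intents of candidate documents whose similarity is strictly above the threshold."""
    return [doc_intents[doc] for doc, score in similarities.items() if score > threshold]
```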
  • S105 Determine the target response information corresponding to the target intent; each intent has associated response information.
  • At least one piece of response information is associated with each intent, and the response information is pre-stored based on experience.
  • For example, the intent is "child insurance-Yes":
  • the status "Yes" indicates that the customer needs the product,
  • and the subject information indicates "child insurance".
  • The main purpose of the associated response information is therefore to explain child insurance, and the response should emphasize
  • the role and cost of child insurance. If the intent is "child insurance-No", the customer does not need it, and the subject information again indicates "child insurance".
  • The associated response information can then be the first script, which focuses on the advantages of child insurance in the hope that the customer gains a correct understanding of it,
  • or the second script, which widens the scope to other insurance types, gives examples of each type, briefly explains the groups each type suits, and asks the customer which type of insurance they need.
  • The target response information can be output by voice or as text, which is not limited here.
  • The target response information is then output. If the target intent is associated with at least two pieces of target response information, the response to output first can be determined by the preset priority of the response information. Alternatively, the customer information of the customer to whom the terminal belongs is obtained and the response is selected based on it: for example, if the customer is a VIP customer, is married, has purchased other types of insurance and has no record of purchasing child insurance, the first type of response information is output as the target response information.
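The selection rule can be sketched as follows. The priority encoding (a smaller number wins), the `Customer` fields and the profile check are hypothetical simplifications of the example in the text, not the applicant's actual rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Customer:
    vip: bool
    married: bool
    purchased: Set[str] = field(default_factory=set)  # insurance types already bought


def fits_first_script(c: Customer) -> bool:
    """Hypothetical rule mirroring the example: VIP, married, owns other insurance, no child insurance."""
    return c.vip and c.married and bool(c.purchased) and "child insurance" not in c.purchased


def choose_response(responses: List[Tuple[int, str]], customer: Optional[Customer] = None) -> str:
    """responses: (priority, text) pairs in script order; a smaller priority number wins."""
    if customer is not None and fits_first_script(customer):
        return responses[0][1]  # first script
    return min(responses, key=lambda r: r[0])[1]  # otherwise fall back to preset priority
```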
  • If no candidate document's similarity to the target text information exceeds the threshold, the target text information to be recognized is input into a sentence segmentation model, which splits it. Each piece of split text information is then used as target text information to be recognized and input into the ES database for retrieval, and steps S20-S60 are repeated.
  • The sentence segmentation model combines a long short-term memory network (LSTM) with a conditional random field (CRF). Outputting the split text information through the sentence segmentation model includes:
  • inputting the target text information to be recognized into the LSTM, which annotates each word in the target text information with features to obtain feature-annotated target text information; the LSTM is obtained by training on a sample set that includes multiple feature-annotated corpora.
  • LSTM: Long Short-Term Memory.
  • RNN: Recurrent Neural Network.
  • The LSTM model in this example performs feature annotation on the target text to be recognized.
  • The feature annotation in this example includes, but is not limited to, part-of-speech tagging, sentence-component tagging and entity recognition.
  • Entities here can be names of people, places or organizations, proper names in a particular field, etc. Because the sentences and words used in each field follow certain regularities, the sentence segmentation model can be trained separately for different application fields, such as finance, manufacturing, and science and technology.
  • The sentence segmentation model in the embodiments of the present application is explained with the financial field as an example.
  • The samples in the third sample data set are financial-domain corpora, and the words in each corpus carry feature annotations.
  • The LSTM learns from the third sample data set to obtain the model parameters.
  • The target text information to be recognized is then input into the LSTM, which outputs the annotated features of each word.
  • For example, the target text information to be recognized is "I want to know about children's health insurance".
  • Taking one feature (part of speech) as the example, the annotation is: I (pronoun) want (modal verb) to know (verb) child insurance (proper noun) disease insurance (proper noun).
  • The feature-annotated target text information is input into the CRF, which inserts separators into the target text according to the feature annotations and outputs at least two pieces of split text information.
  • The LSTM includes an input layer, a hidden layer and an output layer,
  • and the output layer of the LSTM is the CRF.
  • The feature-annotated text information is input into the CRF.
  • The CRF's feature functions capture the relationships between words within a sliding window, namely each word's position in the sentence and the feature tag of the current word. From these, the CRF calculates the probability of inserting a separator between every two adjacent words and determines the separator positions from those probabilities. In this way the CRF applies conditional constraints to the LSTM output and selects the optimal path with the highest probability.
  • Because the CRF considers all features jointly when segmenting, the segmentation is more accurate. Take "I want to know about children's health insurance" as an example: "want" and "know" are both verbs, and the probability of a sentence break between two verbs is very low; in terms of sentence components, "want" is an adverbial and "know" is the predicate, and the probability of a break between an adverbial and a predicate is also low; "know" is a verb and "child insurance" is a noun.
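The CRF's choice of the highest-probability path amounts to Viterbi decoding over a linear chain. The sketch below is a simplification under stated assumptions. Each gap between adjacent words is labeled SPLIT or NOSPLIT. The per-gap emission scores, which in the described model would come from the LSTM, are hand-set log-scores. The transition scores are hypothetical.

```python
from typing import Dict, List

LABELS = ("NOSPLIT", "SPLIT")


def viterbi(emissions: List[Dict[str, float]], transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Highest-scoring label path for a linear-chain CRF given log-space emission/transition scores."""
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + transitions[p][cur])
            best[t][cur] = best[t - 1][prev] + transitions[prev][cur] + emissions[t][cur]
            back[t][cur] = prev
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers to recover the best path
    return path[::-1]


def split_at(words: List[str], gap_labels: List[str]) -> List[List[str]]:
    """Cut the word list wherever the gap after a word is labeled SPLIT."""
    pieces, current = [], [words[0]]
    for word, label in zip(words[1:], gap_labels):
        if label == "SPLIT":
            pieces.append(current)
            current = []
        current.append(word)
    pieces.append(current)
    return pieces
```

A negative SPLIT-to-SPLIT transition score is one way to discourage fragmenting a sentence into single words.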
  • Segmenting the text information to be recognized splits long sentences into short ones, which improves the efficiency of ES database retrieval and the recognition accuracy of the similarity recognition model.
  • The split sentences can also be stored in the ES database, so that the candidate documents in the ES database are continuously updated and their volume keeps growing, which improves the accuracy of subsequent ES retrieval.
  • In step S101, obtaining the target text information to be recognized may further include the following:
  • the server can place an outbound call with a query sentence to the customer's phone.
  • Here, "query sentence" does not require the sentence to be a question; the name is functional and distinguishes it from the received sentence.
  • The sentence sent out is called the "query sentence" and the sentence received is called the "answer sentence"; this is not repeated below.
  • For example, the query sentence is an outbound query about a particular insurance product:
  • "Our current child insurance offer for existing customers has very broad coverage; I hope you will take a look."
  • The query sentence carries a category label, here "Children's Insurance"; the label is only an example and does not limit the application.
  • The answer sentence can be "OK, I want to know about children's medical insurance", and it also carries the category label.
  • the answer sentence is input into the target sub-database corresponding to the category label in the ES database for retrieval.
  • the function of the category label is used for index retrieval in the ES database.
  • Candidate documents also known as candidate words
  • the question and answer sentence carries the category label, and the server can use the category label. Input the question and answer sentence into the sub-database (children's insurance database) for retrieval, which improves the retrieval efficiency of the ES database.
  • determining, according to the target intent, the target response information corresponding to the target intent includes:
  • the target response information corresponding to the highest priority target intent is selected.
  • in step S103, if at least two target candidate documents have a similarity to the target text information greater than the threshold, there may also be two target intents; or, if the similarity between the target candidate document and the target text information to be recognized is less than or equal to the threshold, the target text information needs to be segmented.
  • the segmented target text information may also correspond to at least two target candidate documents, in which case there may also be at least two target intents.
  • complaint intents have the highest priority
  • intents to learn about a particular type of insurance have the second priority
  • intents seeking a basic understanding of insurance have the third priority. Note that this example only illustrates the priority levels of intents; intents and priority levels cannot be listed exhaustively here, and the corresponding intents and priorities need to be set for other categories.
  • the two intents are the "complaint intent" and the "intent to learn about a particular type of insurance"
  • the target response information corresponding to the "complaint intent" should be selected first and output.
  • the method further includes:
  • the user's feedback sentence can be received, for example, the feedback sentence is "OK", “I don't want to know” and so on.
  • the feature information can include, but is not limited to, pitch and intonation. A microphone converts the voice signal into an electrical signal, which is rendered as a speech waveform. Sound is a kind of wave, and frequency (the number of times the sound source vibrates per second) and amplitude are important attributes describing it: frequency corresponds to what is usually called pitch, and amplitude determines loudness.
  • the feature information includes, but is not limited to, evaluation words, emoticons, and punctuation marks.
  • speech emotion recognition can be performed on the feature information using a Gaussian mixture model (GMM), a support vector machine (SVM), or a hidden Markov model (HMM).
  • Gaussian mixture model (GMM)
  • support vector machine (SVM)
  • hidden Markov model (HMM)
  • the feedback sentence is text information
  • the indication result can be output based on the attention mechanism of a neural network, combining contextual information with features such as evaluation words and emoticons.
  • the indication result indicates whether the emotion is positive or negative.
  • the target response information is scored according to the emotional tendency indicated by the indication result.
  • each piece of target response information has a base score.
  • the base score can be 50 points, which is increased or decreased by a certain amount according to the indication result; the increment and the decrement can be the same or different.
  • the user's emotion (satisfied or dissatisfied, happy or unhappy) can be identified from the feedback sentences, and the emotion indication result is then fed back to the target response information. For example, if a feedback sentence indicates negative emotion, the score is reduced by 2 points each time; if the cumulative deduction reaches or exceeds the limit (for example, 10 points, i.e., five feedback sentences with negative emotion), the target response information needs to be revised.
  • the limit is, for example, 10 points
  • FIG. 3 is a schematic diagram of a scene of an embodiment of this application.
  • S301 Acquire the target text information to be recognized; it may be input text entered as text or converted from voice.
  • S302 Preprocess the input text by removing duplication, removing modal particles and correcting errors.
  • S304 Use the LR model to score the similarity between each of the TOP30 candidate scripts and the target text information.
  • S306 Use LSTM+CRF to segment the target text information, and output the segmented text information.
  • S307 Determine whether any sentence in the segmented text information has more characters than 60% of the character count before segmentation.
  • S309 Determine the number of target intents; if there are multiple (two or more), perform step S310; if there is one, perform step S311.
  • S312 Return target response information corresponding to the target intent according to the returned final target intent.
  • the text information to be recognized is input into the ES database for retrieval, and a preset number (for example, 20) of candidate documents are filtered out through the inverted search function of the ES database, which can greatly reduce the number of documents input into the similarity recognition model
  • the purpose of identifying the similarity is to find the target intent corresponding to a candidate document. There are two cases: 1) when the similarity is greater than the threshold, the target intent corresponding to the target candidate document is determined directly; 2) when the similarity is not greater than the threshold, the final target intent cannot be determined.
  • the target text information to be recognized is input into the sentence segmentation model (LSTM+CRF), which segments it.
  • the long sentence is broken into short sentences, which are input into the ES database again for a new retrieval, finally yielding the final target intent and its corresponding target response information.
  • an embodiment of the present application also provides an embodiment of a device for obtaining response information, and the device 400 corresponds to the foregoing method embodiment.
  • the apparatus 400 for obtaining response information includes:
  • the obtaining module 401 is used to obtain the target text information to be recognized
  • the retrieval module 402 is configured to input the to-be-recognized target text information acquired by the acquisition module 401 into the ES database for retrieval, and obtain a set of candidate documents similar to the target text information through an inverted index;
  • the similarity recognition module 403 is configured to input each candidate document in the candidate document set retrieved by the retrieval module 402, together with the target text information to be recognized, into the similarity recognition model, which outputs the similarity between each candidate document and the target text information; each candidate document corresponds to one intent;
  • the intention determination module 404 is configured to determine the target intention corresponding to the target candidate document when the similarity score between the target candidate document determined by the similarity recognition module 403 and the target text information to be recognized is greater than a threshold;
  • each intent has associated response information;
  • the response information determining module 405 is configured to determine the target response information corresponding to the target intention determined by the intention determining module 404;
  • the output module 406 is configured to output the target response information determined by the response information determining module 405.
  • the device 400 further includes a sentence segmentation module 407;
  • the sentence segmentation module 407 is configured to, when the similarity score between the target candidate document determined by the similarity recognition module 403 and the target text information to be recognized is less than or equal to the threshold, input the target text information to be recognized into the sentence segmentation model and output the segmented text information through the model; the segmented text information is used as the target text information to be recognized;
  • the retrieval module 402 is further configured to perform the step of inputting the text information to be recognized into the ES database for retrieval.
  • the sentence segmentation model is a combined model of a long short-term memory network LSTM and a conditional random field model CRF;
  • the sentence segmentation module 407 is further configured to input the target text information to be recognized into the LSTM and annotate each word in it with features to obtain feature-annotated target text information, the LSTM being obtained by training on a sample set that includes multiple feature-annotated corpora; and to input the feature-annotated target text information into the CRF, which inserts separators into the target text according to the feature annotations and outputs at least two segmented pieces of text information.
  • the similarity recognition module 403 is further configured to segment the target text information with the word segmentation model to obtain a first word sequence, in which each word can be annotated with features, each feature having a feature value; and to segment each candidate document with the word segmentation model to obtain a second word sequence.
  • each word in the second word sequence can be annotated with features and a feature value for each feature; the first and second word sequences are input into the recognition model to determine the feature vector corresponding to each feature, and the similarity between the target text information and the candidate document is output according to each feature vector and the weight of each feature.
  • the acquisition module 401 is further configured to output an inquiry sentence to the terminal, the inquiry sentence carrying a category label; receiving an answer sentence corresponding to the inquiry sentence sent by the terminal, the answer sentence carrying the category label;
  • the answer sentence is used as the target text information to be recognized;
  • inputting the target text information to be recognized into the ES database for retrieval includes: inputting the answer sentence into the target sub-database corresponding to the category label in the ES database for retrieval.
  • the number of the target intentions is at least two, and each target intention corresponds to a priority
  • the intention determination module 404 is further configured to select the target response information corresponding to the highest priority target intention according to the priority corresponding to each target intention.
  • an embodiment of the present application also provides another embodiment of a device for obtaining response information.
  • the device 500 further includes a receiving module 408, an extraction module 409, an emotion recognition module 410, and a scoring module 411;
  • the receiving module 408 is configured to receive a feedback sentence fed back by the terminal, where the feedback sentence is a sentence corresponding to the target response information;
  • the extraction module 409 is configured to extract the characteristic information in the feedback sentence received by the receiving module 408;
  • the emotion recognition module 410 is configured to perform emotion recognition on the feedback sentence according to the feature information extracted by the extraction module 409 to obtain an indication result, and the indication result is used to indicate an emotional tendency;
  • the scoring module 411 is configured to score the target response information according to the emotional tendency indicated by the indication result obtained by the emotion recognition module 410.
  • Each module in the above-mentioned device can be implemented in whole or in part by software, hardware and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device 600 is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
  • the computer device includes a processor 601, a memory 602, and a network interface 603 connected through a system bus 604.
  • the processor 601 of the computer device is used to provide calculation and control capabilities.
  • the memory 602 of the computer device includes a readable storage medium and an internal memory.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the network interface 603 of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instruction is executed by the processor 601, a method for obtaining response information is realized.
  • a computer device is provided, which includes a memory 602, a processor 601, and computer-readable instructions stored in the memory and executable on the processor 601.
  • when executing the computer-readable instructions, the processor 601 implements the method for obtaining response information in the foregoing embodiments, such as steps S101-S106 shown in FIG. 2 or the steps shown in FIG. 3; to avoid repetition, details are not repeated here.
  • alternatively, when the processor 601 executes the computer-readable instructions, the functions of the modules/units in the embodiment of the device for obtaining response information are implemented.
  • one or more readable storage media storing computer readable instructions are provided.
  • the readable storage media include non-volatile readable storage media and volatile readable storage media. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the method for obtaining response information in the foregoing embodiments, for example steps S101-S106 shown in FIG. 2 or the steps shown in FIG. 3; to avoid repetition, details are not repeated here. Alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the embodiment of the device for obtaining response information are implemented.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

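A method and apparatus for obtaining response information, a computer device, and a storage medium. The method includes: obtaining target text information to be recognized (101); inputting the target text information to be recognized into an ES database for retrieval, and obtaining, through an inverted index, a set of candidate documents similar to the target text information (102); inputting each candidate document in the candidate document set and the target text information to be recognized into a similarity recognition model, and outputting, through the similarity recognition model, the similarity between each candidate document and the target text information, each candidate document corresponding to one intent; if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document (104), each intent having associated response information; determining target response information corresponding to the target intent; and outputting the target response information. The method outputs standard response information based on artificial intelligence, greatly improving the quality of customer service.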

Description

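Method and apparatus for obtaining response information, computer device, and storage medium
This application is based on, and claims priority to, Chinese invention patent application No. 201910883201.3, filed on September 18, 2019 and entitled "Method and apparatus for obtaining response scripts, computer device, and storage medium".
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method and apparatus for obtaining response information, a computer device, and a storage medium.
Background
Most enterprises, such as Internet companies, financial service companies, and manufacturers, have customer service departments to improve service quality, and human customer service agents can learn about user needs through voice or text conversations.
A customer service agent needs to judge the service a user requires from the content of the voice or text conversation, and make further judgments and answers based on the user's reactions and questions. Users differ in their verbal expression abilities and habits, so traditional human customer service requires good expressive ability to explain intentions and strong logical thinking to understand the user's intent before a correct judgment can be made and the user persuaded. These professional abilities vary greatly with each agent's competence, training time, and work experience, so the service quality of different agents is uneven, which may lower the overall quality of human customer service.
Summary
Embodiments of this application provide a method and apparatus for obtaining response information, a computer device, and a storage medium, which are used to output response information and improve customer service quality.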
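In a first aspect, an embodiment of this application provides a method for obtaining response information, including:
obtaining target text information to be recognized;
inputting the target text information to be recognized into an ES database for retrieval, and obtaining, through an inverted index, a set of candidate documents similar to the target text information;
inputting each candidate document in the candidate document set and the target text information to be recognized into a similarity recognition model, and outputting, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document, wherein each intent has associated response information;
determining target response information corresponding to the target intent;
outputting the target response information.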
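In a second aspect, an embodiment of this application provides an apparatus for obtaining response information, including:
an obtaining module, configured to obtain target text information to be recognized;
a retrieval module, configured to input the target text information to be recognized obtained by the obtaining module into an ES database for retrieval, and obtain, through an inverted index, a set of candidate documents similar to the target text information;
a similarity recognition module, configured to input each candidate document in the candidate document set retrieved by the retrieval module and the target text information to be recognized into a similarity recognition model, and output, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
an intent determining module, configured to determine the target intent corresponding to a target candidate document when the similarity score, determined by the similarity recognition module, between the target candidate document and the target text information to be recognized is greater than a threshold, wherein each intent has associated response information;
a response information determining module, configured to determine target response information corresponding to the target intent determined by the intent determining module;
an output module, configured to output the target response information determined by the response information determining module.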
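In a third aspect, an embodiment of this application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the method of the first aspect when executing the computer-readable instructions.
In a fourth aspect, an embodiment of this application provides one or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the method of the first aspect.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below; other features and advantages of this application will become apparent from the description, the drawings, and the claims.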
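Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic diagram of a scenario of a communication system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of the steps of one embodiment of a method for obtaining response information according to an embodiment of this application;
FIG. 3 is a schematic flowchart of the steps of another embodiment of the method for obtaining response information according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of one embodiment of an apparatus for obtaining response information according to an embodiment of this application;
FIG. 5 is a schematic structural diagram of another embodiment of the apparatus for obtaining response information according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of one embodiment of a computer device according to an embodiment of this application.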
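Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.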
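An embodiment of this application provides a method for obtaining response information, which can be applied to a communication system. As shown in FIG. 1, the communication system includes a server and a terminal. The server may be implemented as an independent server or as a server cluster composed of multiple servers; the terminal includes, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. In the embodiments, the method is described with the server as the executing entity. The server obtains the target text information to be recognized; inputs it into the ES database for retrieval and obtains, through an inverted index, a set of candidate documents similar to the target text information, which narrows the search scope; then inputs each candidate document in the set together with the target text information into the similarity recognition model, which outputs the similarity between each candidate document and the text information to be recognized, improving the processing efficiency of the similarity recognition model; if the similarity score between a target candidate document and the target text information is greater than the threshold, determines the target intent corresponding to the target candidate document; and, based on the correct target intent, determines the corresponding target response information and outputs it. Outputting standard response information based on artificial intelligence greatly improves customer service quality.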
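Referring to FIG. 2, an embodiment of this application provides a schematic flowchart of one embodiment of a method for obtaining response information. The method may specifically include the following steps:
S101. Obtain target text information to be recognized.
The server receives voice information or text information sent by the terminal. If the server receives text information, no conversion is needed; if the terminal sends voice information, the voice information may be converted into text information. The received text information is then preprocessed, where the preprocessing includes deduplication, removal of modal particles, and error correction, to obtain the target text information.
The terminal may be a voice output device, such as a landline or mobile phone, or a smart terminal device, such as a smartphone, computer, or handheld computer. In the embodiments, a mobile phone is used as the example terminal.
In one application scenario, the server receives voice information sent from a mobile phone, which may be an inquiry. For example, the server receives voice information sent by a user over the phone, such as "I'm looking for an insurance product, one suitable for children." The server converts the voice information into the corresponding text information and preprocesses it to obtain the target text information.
In another application scenario, the server may also proactively place outbound voice calls to the mobile phone. For example, the voice information may be "Hello, you are one of our VIP customers. We are currently running a reward campaign for existing customers, and Insurance A is very suitable for you." The server then receives voice information sent by the user through the mobile phone, for example, "Is Insurance A an investment-type or a consumption-type product?" The server converts the received voice information into text information to be recognized and preprocesses it to obtain the target text information.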
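S102. Input the target text information to be recognized into the ES database for retrieval, and obtain, through an inverted index, a set of candidate documents similar to the target text information.
A document in the embodiments is a storage object that exists in text form and may take many forms: files in formats such as Word, PDF, HTML, and XML can all be called documents, as can an e-mail, a text message, or a microblog post. In the embodiments, a document represents text information.
A document collection is pre-stored in the ES database. The collection includes a large number of documents and may be divided into multiple sub-collections, with different business scenarios corresponding to different sub-collections. For example, life insurance corresponds to sub-collection A, which stores documents related to life insurance, and auto insurance corresponds to sub-collection B, which stores documents related to auto insurance.
ES (Elasticsearch) is a distributed full-text search framework built on Lucene. ES contains multiple fields, and each field has its own inverted index. First, the target text information to be recognized is input into the ES database; ES uses an analyzer to split the text into words according to certain rules (for example, splitting on non-letter characters or on whitespace), and then further processes the words, for example by lowercasing, removing, or adding terms.
The target text information to be recognized is segmented into a word sequence. For example, if the text is "Is Insurance A investment-type or consumption-type?", the segmented word sequence is "Insurance A / is / investment-type / or / consumption-type", and keywords such as "Insurance A", "investment-type", and "consumption-type" are then extracted from it.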
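Inverted index: from the word's perspective, the mapping from words to document IDs; that is, document IDs are found by searching for a word.
Forward index: from the document's perspective, the mapping from document IDs to document content and words; that is, the content of a document is obtained by its ID.
An inverted index entry (posting) mainly contains the following information:
1. Document ID, used to obtain the information of the document.
2. Term frequency, recording the number of times the word appears in the document.
3. Position, recording the word's position in the document after segmentation.
The retrieval process of the inverted index is as follows. In the ES database, each document corresponds to a document ID and its content is represented as a set of keywords; in the search engine, the keywords have also been converted into keyword IDs. For example, in the embodiments, after the document to be recognized is segmented and m keywords (words) are extracted, the corresponding document IDs are first found through the inverted index of the words, the full content of each document ID is then queried through the forward index, and finally the full content corresponding to the document ID is returned.
In the embodiments, after the target text information to be recognized is segmented into a word sequence, the text IDs corresponding to the words are retrieved through the inverted index, and the frequency of each word in the texts corresponding to the different text IDs is computed. The similarity between the target text information and each candidate document is determined according to the weight of each word's position in the corresponding text, similar candidate documents are found, and a preset number of candidate documents, for example the TOP30, are selected in descending order of similarity to form the candidate document set. In the embodiments, the inverted index is a concrete storage form of the "word-document matrix": through it, the list of documents containing a given word can be obtained quickly, so that a preset number of documents similar to the target text information can be found quickly.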
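The posting structure and two-step lookup described above (inverted index to document IDs, then forward index to content) can be illustrated with a minimal in-memory sketch. It is not Elasticsearch: the class and method names are hypothetical, and ranking is a plain sum of term frequencies, whereas Elasticsearch ranks with BM25 by default and the position weighting mentioned in the description is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Posting:
    """One inverted-index entry: document ID, term frequency, and word positions."""
    doc_id: int
    term_freq: int = 0
    positions: List[int] = field(default_factory=list)


class InvertedIndex:
    def __init__(self) -> None:
        # Inverted index: word -> {document ID -> Posting}
        self.postings: Dict[str, Dict[int, Posting]] = defaultdict(dict)
        # Forward index: document ID -> the document's word sequence
        self.forward: Dict[int, List[str]] = {}

    def add(self, doc_id: int, words: List[str]) -> None:
        self.forward[doc_id] = list(words)
        for position, word in enumerate(words):
            posting = self.postings[word].setdefault(doc_id, Posting(doc_id))
            posting.term_freq += 1
            posting.positions.append(position)

    def search(self, query_words: List[str], top_n: int = 30) -> List[Tuple[int, List[str]]]:
        """Rank documents by the summed term frequency of the query words (simplified scoring)."""
        scores: Dict[int, int] = defaultdict(int)
        for word in set(query_words):
            for doc_id, posting in self.postings.get(word, {}).items():
                scores[doc_id] += posting.term_freq
        ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
        # Use the forward index to return each hit's full content.
        return [(doc_id, self.forward[doc_id]) for doc_id in ranked[:top_n]]
```

With `top_n=30`, `search` mirrors the TOP30 candidate set; documents sharing no word with the query are never touched, which is why the inverted index narrows the search scope.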
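S103. Input each candidate document in the candidate document set and the target text information to be recognized into the similarity recognition model to obtain the similarity between each candidate document and the target text information to be recognized.
The similarity recognition model includes a word segmentation model, which may be a hidden Markov model (HMM). The word segmentation model is obtained by training an HMM on a first training sample set whose training samples are segmented word sequences in which each word is annotated with features, such as part of speech, entity position, and contextual relations.
The similarity recognition model further includes a recognition model, which is a logistic regression (LR) model: for sample points in a multi-dimensional space, it fits the distribution and trajectory of the points with a linear combination of features (weighted features). The LR model is simple, efficient, easy to parallelize, and supports online learning (dynamic extension).
Given a supervised training data set (X, Y), where X denotes the features, Y denotes the similarity, and w denotes the weight of a feature, the final linear model is:
$$Y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \qquad \text{(Formula 1)}$$
The recognition model is obtained by training the LR model on a second sample data set, each sample of which includes a corpus pair and the similarity of that pair. The corpus pair includes a first corpus and a second corpus. The first corpus is segmented into a first word sequence and the second corpus into a second word sequence; the first word sequence is converted into a first word-vector sequence and the second into a second word-vector sequence, and the distance between the two word-vector sequences is computed to indicate the similarity between the first and second corpora. The features and feature values of the first and second word sequences are extracted; the features include, but are not limited to, part of speech, word meaning, contextual relations, sentence constituent (subject, predicate, object, etc.), and category (character, phrase, idiom, etc.). Each sample's feature vector and similarity are input into Formula 1 for training to obtain the parameters w, yielding the trained similarity recognition model.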
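How the weights w of Formula 1 might be learned from (feature vector, similarity) samples can be sketched as follows. The description names an LR model but writes Formula 1 as a plain linear combination, so this sketch fits Formula 1 exactly as written by ordinary least squares (normal equations solved with Gauss-Jordan elimination); that estimator is a swapped-in choice for illustration. A true logistic regression would pass the sum through a sigmoid and be fitted by maximum likelihood. The function names are hypothetical.

```python
def fit_formula_1(feature_rows, similarities):
    """Fit w0..wn of Formula 1 by ordinary least squares (normal equations)."""
    rows = [[1.0] + [float(x) for x in row] for row in feature_rows]  # leading 1 multiplies w0
    k = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, similarities))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; weights are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def predict(weights, x):
    """Formula 1: Y = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
```

For samples generated exactly by Y = 0.1 + 0.5 x1 + 0.3 x2, the fit recovers w = [0.1, 0.5, 0.3] up to rounding.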
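Inputting each candidate document and the target text information to be recognized into the trained similarity recognition model to obtain their similarity may specifically proceed as follows:
First, the target text information is segmented by the word segmentation model to obtain a first word sequence, in which each word may be annotated with features and a feature value for each feature; each candidate document is segmented by the same model to obtain a second word sequence, in which each word may likewise be annotated with features and a feature value for each feature.
Then, the first and second word sequences are input into the recognition model to determine the feature vector corresponding to each feature.
The recognition model computes the similarity of the features at the same position in the first and second word sequences. For example, the first word sequence is 我想了解一款儿童险 ("I want to learn about a child insurance product") and the second is 我想知道儿童险的作用 ("I want to know what child insurance does"). The part of speech of the word at each position is shown in the following table: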
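Position | 1 | 2 | 3 | 4 | 5 | 6 | 7
First word sequence | 我 (I) | 想 (want) | 了解 (learn about) | 一款 (one) | 儿童险 (child insurance) | [separator] | 0
Part of speech | pronoun | verb | verb | quantifier | noun | separator | padding
Second word sequence | 我 (I) | 想 (want) | 知道 (know) | 儿童险 (child insurance) | 的 (particle) | 作用 (function) | [separator]
Part of speech | pronoun | verb | verb | noun | function word | noun | separator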
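As the table shows, at position 1 the value of the feature (part of speech) is pronoun in both the first and the second word sequence; the values match, so the relevance is 1. Likewise, at positions 2 and 3 the values match and the relevance is 1, whereas at positions 4 and 5 the values differ (quantifier vs. noun, noun vs. function word) and the relevance is 0; the remaining positions are not enumerated. The relevance values of the first feature (e.g., part of speech) at all positions form the first feature vector $x_1$; those of the second feature (e.g., entity position) form the second feature vector $x_2$; and those of the n-th feature form the n-th feature vector $x_n$.
Finally, the similarity between the target text information and the candidate document can be output according to each feature vector and the weight corresponding to each feature.
That is, $x_1, x_2, \ldots, x_n$ are input into Formula 1 to obtain the similarity between the target text information and the candidate document.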
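The position-wise comparison in the table and its use in Formula 1 can be sketched as below. Formula 1 multiplies each weight by a single value, but each $x_i$ here is a per-position vector, so this sketch assumes (hypothetically) that each relevance vector is reduced to its mean before weighting; the weight 0.7 in the example is likewise invented for illustration.

```python
def relevance_vector(first_values, second_values):
    """Per-position relevance: 1 if the feature values match, else 0."""
    return [1 if a == b else 0 for a, b in zip(first_values, second_values)]


def formula_1_score(feature_pairs, weights, w0=0.0):
    """Y = w0 + w1*x1 + ... + wn*xn, with x_i summarising feature i's relevance vector.

    Assumption: each relevance vector is reduced to a scalar by taking its mean.
    """
    y = w0
    for (first_values, second_values), w in zip(feature_pairs, weights):
        vector = relevance_vector(first_values, second_values)
        x = sum(vector) / len(vector) if vector else 0.0
        y += w * x
    return y


# Part-of-speech rows from the table above (positions 1-7).
FIRST_POS = ["pronoun", "verb", "verb", "quantifier", "noun", "separator", "padding"]
SECOND_POS = ["pronoun", "verb", "verb", "noun", "function_word", "noun", "separator"]
```

Here `relevance_vector(FIRST_POS, SECOND_POS)` gives [1, 1, 1, 0, 0, 0, 0], and with a single hypothetical weight of 0.7 the score is 0.7 × 3/7 = 0.3.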
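Optionally, after a candidate document is segmented by the word segmentation model, its number of words is a preset number, for example 6, 7, or 8; the specific number is not limited and is obtained empirically. For example, in a dialogue, 7 feature words are usually enough to convey the meaning of a sentence, so the candidate documents stored in the ES database may already be reduced to their feature words. For example, the corpus 我想知道儿童险的作用 ("I want to know what child insurance does") can be processed by term frequency, information gain, or similar methods to keep only its feature words, giving 儿童险作用 ("child insurance function"); clearly, the processed corpus alone conveys the meaning. Likewise, if the first word sequence (the target text information to be recognized) is also reduced to the same number of feature words as the second word sequence before the similarity between the two sequences is computed, the data-processing dimension is effectively reduced and processing is faster.
After the steps of segmenting the target text information with the word segmentation model to obtain the first word sequence and segmenting the candidate document to obtain the second word sequence, and before the step of inputting the two sequences into the recognition model to determine the feature vector for each feature, the method may further include the following step:
If the number of words in the first word sequence is greater than a limit, a preset number of feature words are selected from the first word sequence and used as the first word sequence input to the recognition model, so that the number of feature words in the first word sequence equals the number of feature words in the second word sequence.
Methods for selecting feature words include, but are not limited to, information-gain-based and term-frequency-based methods. Information gain is the amount of information a feature word contributes to the whole document and is used to measure the word's importance. Term frequency is how often a word occurs in the whole document; the higher the frequency, the more important the word may be in the document.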
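A term-frequency-based selection of feature words might look like the sketch below. It counts frequencies within the given word sequence rather than over a whole document collection, and the stop-word list in the example is hypothetical; information gain is not shown. Setting `k` to the length of the second word sequence gives both sequences the same number of feature words, as the step above requires.

```python
from collections import Counter


def select_feature_words(words, k=7, stopwords=frozenset()):
    """Keep up to k distinct non-stopwords with the highest term frequency.

    Selected words keep their first-appearance order so positions stay comparable.
    """
    counts = Counter(w for w in words if w not in stopwords)
    keep = {w for w, _ in counts.most_common(k)}
    selected = []
    for w in words:
        if w in keep and w not in selected:
            selected.append(w)
    return selected
```

With a hypothetical stop-word list {我, 想, 知道, 的}, the sequence 我 / 想 / 知道 / 儿童险 / 的 / 作用 reduces to 儿童险 / 作用, matching the "child insurance function" example.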
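In this embodiment, retrieval through the inverted index of the ES database is performed first to speed up retrieval and find a candidate document set similar to the target text information. The set contains a preset number of candidate documents, which greatly reduces the number of candidates and improves the efficiency with which the similarity recognition model computes the similarity between the target text information and the candidate documents.
S104. If the similarity between a target candidate document and the target text information to be recognized is greater than the threshold, determine the target intent corresponding to the target candidate document.
A target candidate document is a candidate document in the set whose similarity to the target text information is greater than the threshold. The number of target candidate documents is not limited: there may be one or several (in this embodiment "several" means two or more); this example uses one. Each candidate document corresponds to one intent, which is stored in association with the candidate document. An intent may include, but is not limited to, a subject information field and a status field. For example, one candidate document is 我想要了解儿童险 ("I want to learn about child insurance"). The subject information field indicates the need, here "child insurance"; the status field indicates "yes" or "no", where "yes" means the customer wants to learn about it and "no" means not. The intent corresponding to this candidate document is therefore: child insurance-yes.
The threshold may be 0.6, 0.7, 0.8, etc.; this example uses 0.7. If the similarity between the target candidate document (I want to learn about child insurance) and the target text information is greater than 0.7, the target intent corresponding to that document (e.g., child insurance-yes) is determined.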
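The intent structure and the strict "greater than the threshold" test of S104 can be sketched as follows, assuming each candidate arrives as a (similarity, intent) pair; the `Intent` class and `target_intents` function are hypothetical names. An empty result corresponds to the case handled by S107, where the text is segmented and retrieved again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    subject: str   # subject information field, e.g. "child insurance"
    wanted: bool   # status field: True for "yes", False for "no"

    def label(self) -> str:
        return f"{self.subject}-{'yes' if self.wanted else 'no'}"


def target_intents(scored_candidates, threshold=0.7):
    """Distinct intents of candidates scoring strictly above the threshold, best first.

    An empty result means no target intent was found and segmentation (S107) is needed.
    """
    intents = []
    for score, intent in sorted(scored_candidates, key=lambda pair: pair[0], reverse=True):
        if score > threshold and intent not in intents:
            intents.append(intent)
    return intents
```

A candidate scoring exactly 0.7 does not qualify, which matches the "less than or equal to the threshold" branch of S107.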
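S105. Determine the target response information corresponding to the target intent, where each intent has associated response information.
Each intent is associated with at least one piece of response information, pre-stored based on experience. For example, for the intent child insurance-yes, the status "yes" indicates that the customer wants it and the subject indicates "child insurance", so the associated response mainly explains child insurance, focusing on its function, cost, and so on. For the intent child insurance-no, indicating that the customer does not want it, the associated response may be one of two scripts: the first emphasizes the advantages of child insurance, in the hope that the customer will understand it correctly; the second broadens the range of insurance types, gives examples, briefly explains whom each type suits, and asks the customer which type is needed.
S106. Output the target response information.
The target response information is sent to the terminal; it may be output as voice or as text, which is not limited here.
If the target intent is associated with one piece of target response information, that information is output; if it is associated with at least two, which one to output first can be determined according to preset priorities of the response information. Alternatively, the customer information of the customer to whom the terminal belongs is obtained and the response is selected according to it; for example, if the customer is a VIP, married, has purchased other insurance types, and has no record of buying child insurance, the first script is selected as the target response information and output.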
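Further, S107: if the similarity between the target candidate document and the target text information to be recognized is less than or equal to the threshold, input the target text information to be recognized into a sentence segmentation model and output the segmented text information through the model; the segmented text information is used as the target text information to be recognized.
If the similarity of every candidate document in the set to the target text information is less than or equal to 0.7, the target text information may be too long, reducing the accuracy of the similarity recognition model. The target text information is then input into the sentence segmentation model, which segments it; the segmented text is used as the target text information to be recognized and input into the ES database for retrieval, and steps S102-S106 are repeated.
Optionally, the sentence segmentation model is a combined model of a long short-term memory (LSTM) network and a conditional random field (CRF) model, and outputting the segmented text information through the model includes:
First, the target text information to be recognized is input into the LSTM, which annotates each word with features to obtain feature-annotated target text information; the LSTM is obtained by training on a sample set containing multiple feature-annotated corpora.
LSTM (Long Short-Term Memory) is a special type of RNN (recurrent neural network) capable of learning long-term dependencies. In this example, the LSTM model annotates features of the target text to be recognized; feature annotation here includes, but is not limited to, part-of-speech tagging, sentence-constituent tagging, and entity recognition. Entities here may be person names, place names, organization names, domain-specific proper names, and so on. Because the sentences and words used in each domain follow certain regularities, the sentence segmentation model may be trained per application domain, such as finance, manufacturing, or technology; the embodiments use the finance domain as the example. A third sample data set for training the LSTM is obtained, whose samples are finance-domain corpora with feature-annotated words, and the LSTM learns the model parameters from this third sample set.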
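The target text information to be recognized is input into the LSTM, which outputs the features annotated on each word. For example, if the target text information is 我想了解儿童险疾病险 ("I want to learn about child insurance disease insurance"), taking one feature (part of speech) as an example: I (pronoun) / want (modal verb) / learn about (verb) / child insurance (proper noun) / disease insurance (proper noun).
Then, the feature-annotated target text information is input into the CRF, which inserts separators into the target text according to the feature annotations and outputs at least two segmented pieces of text information.
The LSTM includes an input layer, a hidden layer, and an output layer, where the output layer is the CRF. The feature-annotated text is input into the CRF, whose feature functions determine the relationships between words within a sliding window (that is, the position at which a word appears in the sentence and the features tagged on the current word), compute the probability of inserting a separator between every two words, and determine the separator positions from these probabilities. The CRF thus conditions the output of the LSTM and selects the optimal path with the highest probability.
Because features such as part of speech, sentence constituents, and entities clearly influence sentence segmentation, computing them together in the CRF makes segmentation more accurate. Take "I want to learn about child insurance disease insurance": "want" and "learn about" are both verbs, and the probability of a break between two verbs is very low; in terms of sentence constituents, "want" is an adverbial and "learn about" is the predicate, and a break between an adverbial and a predicate is unlikely; "learn about" is a verb and "child insurance" is a noun, and a break between a verb and a noun is fairly unlikely; moreover, "learn about" is the predicate and "child insurance" is the object, and a break between a predicate and an object is unlikely; "child insurance" and "disease insurance" are both proper nouns and both objects, with no relation between them, so a separator is inserted between them. The segmented output is "I want to learn about child insurance, disease insurance".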
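The "optimal path with the highest probability" step can be illustrated with Viterbi decoding over break / no-break labels for the gaps between adjacent words. This is a simplified sketch, not a trained LSTM+CRF: the break probabilities and the transition penalty for consecutive breaks are hypothetical hand-set numbers standing in for the LSTM outputs and learned CRF weights, and each gap (rather than each word) is labeled.

```python
import math

# Hypothetical break probabilities for adjacent (left tag, right tag) pairs.
BREAK_PROB = {
    ("pronoun", "modal_verb"): 0.05,
    ("modal_verb", "verb"): 0.05,
    ("verb", "proper_noun"): 0.10,
    ("proper_noun", "proper_noun"): 0.80,
}
DEFAULT_BREAK_PROB = 0.20
NO_BREAK, BREAK = 0, 1
LABELS = (NO_BREAK, BREAK)
# Hypothetical transition log-scores between consecutive gap labels: discourage two breaks in a row.
TRANSITION = {
    (NO_BREAK, NO_BREAK): 0.0,
    (NO_BREAK, BREAK): 0.0,
    (BREAK, NO_BREAK): 0.0,
    (BREAK, BREAK): math.log(0.3),
}


def emission(left_tag, right_tag, label):
    p = BREAK_PROB.get((left_tag, right_tag), DEFAULT_BREAK_PROB)
    return math.log(p if label == BREAK else 1.0 - p)


def viterbi_breaks(tags):
    """Most probable break/no-break label for every gap between adjacent words."""
    gaps = list(zip(tags, tags[1:]))
    if not gaps:
        return []
    score = {lab: emission(*gaps[0], lab) for lab in LABELS}
    backpointers = []
    for left_tag, right_tag in gaps[1:]:
        new_score, pointers = {}, {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[(p, lab)])
            new_score[lab] = score[prev] + TRANSITION[(prev, lab)] + emission(left_tag, right_tag, lab)
            pointers[lab] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the highest-scoring final label.
    labels = [max(LABELS, key=lambda lab: score[lab])]
    for pointers in reversed(backpointers):
        labels.append(pointers[labels[-1]])
    return labels[::-1]


def split_sentence(words, tags):
    """Insert separators where the decoded label is BREAK and return the pieces."""
    if not words:
        return []
    pieces, current = [], [words[0]]
    for word, label in zip(words[1:], viterbi_breaks(tags)):
        if label == BREAK:
            pieces.append(current)
            current = []
        current.append(word)
    pieces.append(current)
    return pieces
```

For the tags pronoun / modal_verb / verb / proper_noun / proper_noun, the decoded path places a single separator between "child insurance" and "disease insurance", reproducing the example above.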
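If any returned sentence has more characters than 60% of the character count before segmentation, only that sentence is returned; if no returned sentence exceeds 60% of the pre-segmentation character count, all sentences are returned.
Steps S102-S106 are then repeated for the returned sentences.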
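The 60% rule can be written as a small filter over the segmentation output. This is a sketch with a hypothetical function name; it counts characters with `len()`, which for Chinese text counts characters as the description does. When the pieces partition the original text, at most one piece can exceed 60% of its length.

```python
def sentences_to_requery(original, pieces, ratio=0.6):
    """Apply the 60% rule to the output of sentence segmentation.

    If a piece is longer than ratio * len(original) characters, only that piece is
    returned; otherwise all pieces are returned for another round of retrieval.
    """
    limit = ratio * len(original)
    dominant = [p for p in pieces if len(p) > limit]
    return dominant[:1] if dominant else list(pieces)
```

For 我想了解儿童险疾病险 (10 characters) split into 我想了解儿童险 (7) and 疾病险 (3), the limit is 6, so only the first sentence is retrieved again.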
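In this embodiment, the text information to be recognized is segmented, splitting long sentences into short ones, to improve the efficiency of ES database retrieval and the recognition accuracy of the similarity recognition model. After segmentation, the segmented utterances can also be stored in the ES database to continuously update its candidate documents and grow their volume, improving the accuracy of subsequent ES retrieval.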
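Further, in step S101, obtaining the target text information to be recognized may specifically further include:
outputting an inquiry sentence to the terminal, where the inquiry sentence carries a category label;
Taking voice information as an example, the server can place an outbound call to the customer's phone with an inquiry sentence. "Inquiry sentence" here does not mean the sentence must be a question; it is a functional name that distinguishes the outgoing sentence from the received one. The outgoing sentence is called the "inquiry sentence" and the received sentence the "answer sentence"; this is not repeated below.
For example, the inquiry sentence is an outbound inquiry about a particular insurance product: "The child insurance we have launched for existing customers offers very broad coverage; we hope you will take a look." The inquiry sentence carries the category label "child insurance"; this label is only an example and does not limit this application.
receiving an answer sentence, sent by the terminal, corresponding to the inquiry sentence, where the answer sentence carries the category label and is used as the target text information to be recognized.
The answer sentence sent by the user through the mobile phone is received; for example, it may be "OK, I'd like to learn about children's medical insurance", and it carries the category label.
The answer sentence is input into the target sub-database corresponding to the category label in the ES database for retrieval.
The category label is used for index retrieval in the ES database, in which candidate documents (which can also be understood as candidate scripts) can be stored by category. Because the answer sentence carries the category label, the server can input it into the corresponding sub-database (the child insurance database) for retrieval, improving the retrieval efficiency of the ES database.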
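Optionally, if there are at least two target intents, each corresponding to a priority, determining, according to the target intent, the target response information corresponding to the target intent includes:
selecting, according to the priority of each target intent, the target response information corresponding to the target intent with the highest priority.
In step S103, if at least two target candidate documents have a similarity to the target text information greater than the threshold, there may also be two target intents; or, if the similarity between the target candidate document and the target text information to be recognized is less than or equal to the threshold, the target text information needs to be segmented, and the segmented text may then correspond to at least two target candidate documents, in which case there may also be at least two target intents.
Intent priorities are preset: for example, complaint intents have the highest priority, intents to learn about a particular type of insurance have the second priority, and intents seeking a basic understanding of insurance have the third priority. Note that this example only illustrates priority levels; intents and priority levels cannot be listed exhaustively here, and for other categories the corresponding intents and priorities need to be set.
For example, when there are two intents, a "complaint intent" and an "intent to learn about a particular type of insurance", the target response information corresponding to the complaint intent is selected first and output.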
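Priority-based selection reduces to taking the minimum over a priority table. The intent keys and the table below are hypothetical stand-ins for the three example levels; intents missing from the table are ranked last.

```python
# Hypothetical priority table: a smaller number means a higher priority.
INTENT_PRIORITY = {
    "complaint": 1,
    "specific_insurance": 2,
    "basic_insurance_info": 3,
}


def select_response(target_intents, responses, priority=INTENT_PRIORITY):
    """Return the response for the highest-priority intent; unknown intents rank last."""
    if not target_intents:
        raise ValueError("at least one target intent is required")
    best = min(target_intents, key=lambda intent: priority.get(intent, float("inf")))
    return responses[best]
```

With a complaint intent and a specific-insurance intent, the complaint response is returned, as in the example above.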
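Optionally, after the target response information is output, the method further includes:
receiving a feedback sentence from the terminal;
After the target response information is output to the terminal, the user's feedback sentence can be received, for example "OK", "I don't want to know", and so on.
extracting feature information from the feedback sentence;
If the feedback sentence is voice information, the feature information may include, but is not limited to, pitch and intonation. A microphone converts the voice signal into an electrical signal, which can be rendered as a speech waveform. Sound is a kind of wave, and frequency (the number of times the sound source vibrates per second) and amplitude are important attributes describing it: frequency corresponds to what is usually called pitch, and amplitude determines loudness. If the feedback sentence is text information, the feature information includes, but is not limited to, evaluation words, emoticons, and punctuation marks.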
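The frequency/pitch and amplitude/loudness relationship can be illustrated with two crude features computed from raw samples. Counting upward zero crossings per second is only a rough pitch estimate for clean, periodic signals and is a stand-in chosen for illustration; the description does not specify how pitch is extracted, and the GMM/SVM/HMM emotion classifiers are not shown.

```python
import math


def sine_wave(freq_hz, sample_rate, seconds=1.0, amplitude=1.0):
    """Synthetic test tone: `amplitude` sets loudness, `freq_hz` sets pitch."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch_hz(samples, sample_rate):
    """Crude pitch estimate: number of upward zero crossings per second."""
    if len(samples) < 2:
        return 0.0
    upward = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return upward * sample_rate / len(samples)


def peak_amplitude(samples):
    """Largest absolute sample value; a larger amplitude means a louder sound."""
    return max((abs(s) for s in samples), default=0.0)
```

For a one-second 220 Hz tone sampled at 8 kHz, the pitch estimate comes out within a couple of hertz of 220, and the peak amplitude tracks the `amplitude` argument.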
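performing emotion recognition on the feedback sentence according to the feature information to obtain an indication result, where the indication result indicates an emotional tendency;
Speech emotion recognition can be performed on the feature information based on a Gaussian mixture model (GMM), a support vector machine (SVM), or a hidden Markov model (HMM).
If the feedback sentence is text information, the indication result can be output based on the attention mechanism of a neural network, combining contextual information with features such as evaluation words and emoticons; for example, the indication result indicates whether the emotion is positive or negative.
scoring the target response information according to the emotional tendency indicated by the indication result.
In this example, each piece of target response information has a base score, for example 50 points, which is increased or decreased by a certain amount according to the indication result; the increment and decrement may be the same or different. The user's emotion (satisfied or dissatisfied, happy or unhappy) can be recognized from the feedback sentence, and the indication result is then fed back to the target response information. For example, if a feedback sentence indicates negative emotion, 2 points are deducted each time; if the cumulative deduction reaches or exceeds a limit (for example, 10 points, i.e., five feedback sentences with negative emotion), the target response information needs to be revised.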
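The scoring rule in the example can be sketched as a small running-score object. The base score, 2-point deduction, and 10-point limit follow the example; the 2-point reward for positive emotion is a hypothetical value, since the description only says the increment may differ from the decrement. Only cumulative deductions count toward the revision limit, so positive feedback does not offset them.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseScore:
    """Running score of one piece of target response information."""
    base: int = 50          # base score
    reward: int = 2         # hypothetical increment for positive emotion
    penalty: int = 2        # decrement for negative emotion
    revise_limit: int = 10  # cumulative deduction that triggers revision
    score: int = field(init=False, default=0)
    deducted: int = field(init=False, default=0)

    def __post_init__(self):
        self.score = self.base

    def record(self, positive: bool) -> bool:
        """Apply one indication result; return True once the response needs revising."""
        if positive:
            self.score += self.reward
        else:
            self.score -= self.penalty
            self.deducted += self.penalty
        return self.deducted >= self.revise_limit
```

Five negative feedback sentences bring the score from 50 to 40, and the fifth call is the first to report that the response should be revised.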
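For ease of understanding, refer to FIG. 3, which is a schematic diagram of a scenario according to an embodiment of this application.
S301. Obtain the target text information to be recognized; it may be input text entered as text or converted from voice.
S302. Preprocess the input text by deduplication, removal of modal particles, and error correction.
S303. Input the preprocessed target text information into the target sub-database corresponding to the category label in the ES database for retrieval, and obtain, through the inverted index, a set of candidate documents similar to the target text information (e.g., the TOP30 candidate scripts).
S304. Use the LR model to score the similarity between each of the TOP30 candidate scripts and the target text information.
S305. If the similarity score is greater than 0.7, go to step S308; if it is less than or equal to 0.7, go to step S306.
S306. Segment the target text information with LSTM+CRF and output the segmented text information.
S307. Determine whether any sentence in the segmented text information has more characters than 60% of the character count before segmentation.
S3071. If a sentence in the segmented text exceeds 60% of the pre-segmentation character count, return only that sentence; S3072. if no sentence exceeds 60% of the pre-segmentation character count, return all sentences. Repeat steps S303-S305 on the returned sentences.
S308. Return the target intent corresponding to the target candidate script.
S309. Determine the number of target intents; if there are multiple (two or more), go to step S310; if there is one, go to step S311.
S310. If there are multiple target intents, return the final target intent according to the priority rule of each target intent.
S311. Determine the final target intent.
S312. According to the returned final target intent, return the target response information corresponding to it.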
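The control flow of FIG. 3 can be sketched as one function that receives each stage as a callable. The parameter names and the `max_rounds` guard are hypothetical; the callables stand in for ES retrieval, the LR similarity model, the intent lookup, and LSTM+CRF segmentation, none of which are implemented here. The sketch only shows how S303-S312 and the 60% rule fit together.

```python
from typing import Callable, Dict, List, Optional


def obtain_response(
    text: str,
    retrieve: Callable[[str], List[str]],      # S303: ES top-N candidate scripts
    similarity: Callable[[str, str], float],   # S304: LR similarity score
    intent_of: Callable[[str], str],           # candidate script -> associated intent
    segment: Callable[[str], List[str]],       # S306: LSTM+CRF sentence segmentation
    priority: Dict[str, int],                  # S310: smaller number = higher priority
    responses: Dict[str, str],                 # S312: intent -> response information
    threshold: float = 0.7,
    max_rounds: int = 3,
) -> Optional[str]:
    queries = [text]
    for _ in range(max_rounds):
        intents: List[str] = []
        for query in queries:
            for candidate in retrieve(query):
                if similarity(query, candidate) > threshold:   # S305
                    intent = intent_of(candidate)              # S308
                    if intent not in intents:
                        intents.append(intent)
        if intents:
            # S309-S311: with several intents, keep the highest-priority one.
            final = min(intents, key=lambda i: priority.get(i, float("inf")))
            return responses.get(final)                        # S312
        # S306-S3072: segment, then keep the dominant sentence or all sentences.
        next_queries: List[str] = []
        for query in queries:
            pieces = segment(query)
            dominant = [p for p in pieces if len(p) > 0.6 * len(query)]
            next_queries.extend(dominant[:1] or pieces)
        if next_queries == queries:
            break  # segmentation produced nothing new
        queries = next_queries
    return None
```

In a toy run where the unsegmented text matches no script above 0.7 but each segmented sentence does, two intents are found in the second round and the priority table decides which response is returned.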
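In this embodiment, the text information to be recognized is first input into the ES database for retrieval, and a preset number (for example, 20) of candidate documents are filtered out through the inverted retrieval function of ES, which greatly reduces the number of documents input into the similarity recognition model. The similarity recognition model then recognizes the similarity between the target text information and each candidate document in order to find the target intent corresponding to a candidate document. There are two cases: 1) when the similarity is greater than the threshold, the target intent corresponding to the target candidate document is determined directly; 2) when the similarity is not greater than the threshold, the final target intent cannot be determined. In the latter case, the sentence may be too long, lowering the recognition accuracy of the similarity model, so the target text information to be recognized is input into the sentence segmentation model (LSTM+CRF), which breaks the long sentence into short sentences. The short sentences are then input into the ES database again for a new retrieval, finally yielding the final target intent and the target response information corresponding to it.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.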
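Referring to FIG. 4, an embodiment of this application further provides an embodiment of an apparatus for obtaining response information; the apparatus 400 corresponds to the foregoing method embodiment. The apparatus 400 for obtaining response information includes:
an obtaining module 401, configured to obtain the target text information to be recognized;
a retrieval module 402, configured to input the target text information to be recognized obtained by the obtaining module 401 into the ES database for retrieval, and obtain, through an inverted index, a set of candidate documents similar to the target text information;
a similarity recognition module 403, configured to input each candidate document in the candidate document set retrieved by the retrieval module 402 and the target text information to be recognized into the similarity recognition model, and output, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
an intent determining module 404, configured to determine the target intent corresponding to a target candidate document when the similarity score, determined by the similarity recognition module 403, between the target candidate document and the target text information to be recognized is greater than the threshold, wherein each intent has associated response information;
a response information determining module 405, configured to determine the target response information corresponding to the target intent determined by the intent determining module 404;
an output module 406, configured to output the target response information determined by the response information determining module 405.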
可选的,该装置400还包括断句模块407;
断句模块407,用于当相似度识别模块403确定的目标候选文档与所述待识别的目标文本信息的相似度分值小于或者等于所述阈值时,将所述待识别的目标文本信息输入到断句模型中,通过所述断句模型输出断句后的文本信息;所述断句后的文本信息作为待识别的目标文本信息;
检索模块402,还用于将所述待识别的文本信息输入到ES数据库中进行检索的步骤。
可选的,所述断句模型为长短期记忆网络LSTM和条件随机场模型CRF的组合模型;
断句模块407,还用于将所述待识别的目标文本信息输入到所述LSTM中,对所述目标文本信息中的每个词进行特征标注,得到具有特征标注的目标文本信息;该LSTM是通过对样本集进行训练得到的,该样本集中包括多个具有特征标注的语料;将具有特征标注的目标文本信息输入到CRF中,采用CRF根据特征标注在该目标文本中插入分隔符,输出断句后的至少两个文本信息。
可选的,相似度识别模块403,还用于通过分词模型对目标文本信息进行分词,得到第一词序列,且该第一词序列中的每个单词可以标注特征,及每个特征对应的特征值;通过该分词模型对每个候选文档进行分词,得到第二词序列,该第二词序列中的每个单词可以标注特征,及每个特征对应的特征值;将第一词序列和第二词序列输入到识别模型,确定每个特征对应的特征向量;根据每个特征向量,及每个特征对应的权重,输出目标文本信息与候选文档的相似度。
可选的,获取模块401,还用于向终端输出询问语句,所述询问语句携带类别标签;接收终端发送的所述询问语句对应的回答语句,所述回答语句携带所述类别标签;所述回答语句作为所述待识别的目标文本信息;所述将所述待识别的目标文本信息输入到ES数据库中进行检索,包括:将所述回答语句输入到所述ES数据库中所述类别标签对应的目标子数据库中进行检索。
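Optionally, there are at least two target intents, and each target intent corresponds to a priority.
The intent determination module 404 is further configured to select, according to the priority of each target intent, the target response information corresponding to the target intent with the highest priority.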
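Referring to FIG. 5, on the basis of the embodiment corresponding to FIG. 4, an embodiment of this application further provides another embodiment of the apparatus for acquiring response information. The apparatus 500 further includes a receiving module 408, an extraction module 409, a sentiment recognition module 410, and a scoring module 411.
The receiving module 408 is configured to receive a feedback sentence from the terminal, where the feedback sentence corresponds to the target response information.
The extraction module 409 is configured to extract feature information from the feedback sentence received by the receiving module 408.
The sentiment recognition module 410 is configured to perform sentiment recognition on the feedback sentence according to the feature information extracted by the extraction module 409 to obtain an indication result, where the indication result indicates a sentiment tendency.
The scoring module 411 is configured to score the target response information according to the sentiment tendency indicated by the indication result obtained by the sentiment recognition module 410.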
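The feedback loop of modules 408 to 411 can be illustrated with a deliberately simple lexicon-based classifier. The word lists, the +1/0/-1 sentiment tendency, and the running-total scoring rule are hypothetical choices for the example. The application does not specify how feature information is extracted or how scores are computed, and a trained sentiment model would normally replace the lexicon.

```python
from typing import Dict, List

POSITIVE = {"thanks", "great", "helpful"}  # hypothetical lexicon
NEGATIVE = {"useless", "wrong", "bad"}     # hypothetical lexicon


def extract_features(feedback: str) -> List[str]:
    """Stand-in for module 409: lower-cased words with surrounding punctuation removed."""
    return [w.strip(".,!?").lower() for w in feedback.split()]


def sentiment(features: List[str]) -> int:
    """Stand-in for module 410: 1 = positive, -1 = negative, 0 = neutral."""
    pos = sum(f in POSITIVE for f in features)
    neg = sum(f in NEGATIVE for f in features)
    return (pos > neg) - (pos < neg)


def rate(scores: Dict[str, int], response_id: str, feedback: str) -> int:
    """Stand-in for module 411: add the sentiment tendency to the response's score."""
    scores[response_id] = scores.get(response_id, 0) + sentiment(extract_features(feedback))
    return scores[response_id]
```

Keeping a running total per response lets consistently poorly received responses sink over time. This is one possible reading of "scoring" and is an assumption here.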
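For specific details of the apparatus for acquiring response information, refer to the description of the method embodiments above; they are not repeated here. Each module in the above apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form. They may also be stored in a memory of the computer device in software form, so that the processor can invoke them to perform the operations of each module.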
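In one embodiment, a computer device 600 is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor 601, a memory 602, and a network interface 603 connected through a system bus 604. The processor 601 provides computing and control capabilities. The memory 602 includes a readable storage medium and an internal memory; the internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The network interface 603 is used to communicate with external terminals over a network connection. When executed by the processor 601, the computer-readable instructions implement a method for acquiring response information.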
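In one embodiment, a computer device is provided, including a memory 602, a processor 601, and computer-readable instructions stored in the memory and executable on the processor 601. When executing the computer-readable instructions, the processor 601 implements the method for acquiring response information in the above embodiments, for example steps S101-S106 shown in FIG. 2 or the steps shown in FIG. 3; to avoid repetition, these are not described again here. Alternatively, when executing the computer-readable instructions, the processor 601 implements the functions of the modules/units in the apparatus embodiment for acquiring response information.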
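In one embodiment, one or more readable storage media storing computer-readable instructions are provided; the readable storage media include non-volatile readable storage media and volatile readable storage media. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the steps of the method for acquiring response information in the above embodiments, for example steps S101-S106 shown in FIG. 2 or the steps shown in FIG. 3; to avoid repetition, these are not described again here. Alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units in the apparatus embodiment for acquiring response information.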
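A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be carried out by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).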
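The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.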

Claims (20)

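  1. A method for acquiring response information, characterized by comprising:
    acquiring target text information to be recognized;
    inputting the target text information to be recognized into an ES database for retrieval, and obtaining, through an inverted index, a candidate document set similar to the target text information;
    inputting each candidate document in the candidate document set, together with the target text information to be recognized, into a similarity recognition model, and outputting, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
    if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document, wherein each intent has associated response information;
    determining the target response information corresponding to the target intent; and
    outputting the target response information.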
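  2. The method according to claim 1, characterized in that the method further comprises:
    if the similarity score between the target candidate document and the target text information to be recognized is less than or equal to the threshold, inputting the target text information to be recognized into a sentence segmentation model, and outputting segmented text information through the sentence segmentation model, wherein the segmented text information serves as the target text information to be recognized; and
    performing again the steps from inputting the text information to be recognized into the ES database for retrieval through outputting the target response information.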
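  3. The method according to claim 2, characterized in that the sentence segmentation model is a combined model of a long short-term memory network (LSTM) and a conditional random field model (CRF); and inputting the target text information to be recognized into the sentence segmentation model and outputting the segmented text information through the sentence segmentation model comprises:
    inputting the target text information to be recognized into the LSTM, and labeling each word in the target text information with features to obtain feature-labeled target text information, wherein the LSTM is obtained by training on a sample set that includes a plurality of feature-labeled corpora; and
    inputting the feature-labeled target text information into the CRF, using the CRF to insert separators into the target text according to the feature labels, and outputting at least two pieces of segmented text information.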
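  4. The method according to claim 1, characterized in that inputting each candidate document in the candidate document set, together with the text information to be recognized, into the similarity recognition model and outputting, through the similarity recognition model, the similarity between each candidate document and the text information to be recognized comprises:
    segmenting the target text information into words through a word segmentation model to obtain a first word sequence, wherein each word in the first word sequence can be labeled with features and a feature value corresponding to each feature; and segmenting each candidate document into words through the word segmentation model to obtain a second word sequence, wherein each word in the second word sequence can be labeled with features and a feature value corresponding to each feature;
    inputting the first word sequence and the second word sequence into a recognition model, and determining a feature vector corresponding to each feature; and
    outputting the similarity between the target text information and the candidate document according to each feature vector and a weight corresponding to each feature.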
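  5. The method according to claim 1, characterized in that acquiring the target text information to be recognized comprises:
    outputting a query sentence to a terminal, wherein the query sentence carries a category label;
    receiving an answer sentence, sent by the terminal, corresponding to the query sentence, wherein the answer sentence carries the category label and serves as the target text information to be recognized;
    and inputting the target text information to be recognized into the ES database for retrieval comprises:
    inputting the answer sentence into a target sub-database, corresponding to the category label, in the ES database for retrieval.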
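  6. The method according to claim 1, characterized in that there are at least two target intents, each target intent corresponding to a priority, and determining, according to the target intent, the target response information corresponding to the target intent comprises:
    selecting, according to the priority corresponding to each target intent, the target response information corresponding to the target intent with the highest priority.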
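  7. The method according to claim 1, characterized in that, after outputting the target response information, the method further comprises:
    receiving a feedback sentence from a terminal;
    extracting feature information from the feedback sentence;
    performing sentiment recognition on the feedback sentence according to the feature information to obtain an indication result, wherein the indication result indicates a sentiment tendency; and
    scoring the target response information according to the sentiment tendency indicated by the indication result.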
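  8. An apparatus for acquiring response information, characterized by comprising:
    an acquisition module, configured to acquire target text information to be recognized;
    a retrieval module, configured to input the target text information to be recognized, acquired by the acquisition module, into an ES database for retrieval, and to obtain, through an inverted index, a candidate document set similar to the target text information;
    a similarity recognition module, configured to input each candidate document in the candidate document set retrieved by the retrieval module, together with the target text information to be recognized, into a similarity recognition model, and to output, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
    an intent determination module, configured to determine the target intent corresponding to a target candidate document when the similarity score between that document and the target text information to be recognized, as determined by the similarity recognition module, is greater than a threshold, wherein each intent has associated response information;
    a response information determination module, configured to determine the target response information corresponding to the target intent determined by the intent determination module; and
    an output module, configured to output the target response information determined by the response information determination module.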
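  9. The apparatus for acquiring response information according to claim 8, characterized in that the apparatus further comprises a sentence segmentation module;
    the sentence segmentation module is configured to, when the similarity score between the target candidate document and the target text information to be recognized, as determined by the similarity recognition module, is less than or equal to the threshold, input the target text information to be recognized into a sentence segmentation model and output segmented text information through the sentence segmentation model, wherein the segmented text information serves as the target text information to be recognized; and
    the retrieval module is further configured to perform again the step of inputting the text information to be recognized into the ES database for retrieval.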
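  10. The apparatus for acquiring response information according to claim 9, characterized in that the sentence segmentation model is a combined model of a long short-term memory network (LSTM) and a conditional random field model (CRF); and the sentence segmentation module is specifically configured to:
    input the target text information to be recognized into the LSTM, and label each word in the target text information with features to obtain feature-labeled target text information, wherein the LSTM is obtained by training on a sample set that includes a plurality of feature-labeled corpora; and
    input the feature-labeled target text information into the CRF, use the CRF to insert separators into the target text according to the feature labels, and output at least two pieces of segmented text information.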
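  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring target text information to be recognized;
    inputting the target text information to be recognized into an ES database for retrieval, and obtaining, through an inverted index, a candidate document set similar to the target text information;
    inputting each candidate document in the candidate document set, together with the target text information to be recognized, into a similarity recognition model, and outputting, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
    if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document, wherein each intent has associated response information;
    determining the target response information corresponding to the target intent; and
    outputting the target response information.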
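  12. The computer device according to claim 11, characterized in that the processor, when executing the computer-readable instructions, further implements the following steps:
    if the similarity score between the target candidate document and the target text information to be recognized is less than or equal to the threshold, inputting the target text information to be recognized into a sentence segmentation model, and outputting segmented text information through the sentence segmentation model, wherein the segmented text information serves as the target text information to be recognized; and
    performing again the steps from inputting the text information to be recognized into the ES database for retrieval through outputting the target response information.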
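  13. The computer device according to claim 12, characterized in that the sentence segmentation model is a combined model of a long short-term memory network (LSTM) and a conditional random field model (CRF); and inputting the target text information to be recognized into the sentence segmentation model and outputting the segmented text information through the sentence segmentation model comprises:
    inputting the target text information to be recognized into the LSTM, and labeling each word in the target text information with features to obtain feature-labeled target text information, wherein the LSTM is obtained by training on a sample set that includes a plurality of feature-labeled corpora; and
    inputting the feature-labeled target text information into the CRF, using the CRF to insert separators into the target text according to the feature labels, and outputting at least two pieces of segmented text information.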
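  14. The computer device according to claim 11, characterized in that inputting each candidate document in the candidate document set, together with the text information to be recognized, into the similarity recognition model and outputting, through the similarity recognition model, the similarity between each candidate document and the text information to be recognized comprises:
    segmenting the target text information into words through a word segmentation model to obtain a first word sequence, wherein each word in the first word sequence can be labeled with features and a feature value corresponding to each feature; and segmenting each candidate document into words through the word segmentation model to obtain a second word sequence, wherein each word in the second word sequence can be labeled with features and a feature value corresponding to each feature;
    inputting the first word sequence and the second word sequence into a recognition model, and determining a feature vector corresponding to each feature; and
    outputting the similarity between the target text information and the candidate document according to each feature vector and a weight corresponding to each feature.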
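  15. The computer device according to claim 11, characterized in that acquiring the target text information to be recognized comprises:
    outputting a query sentence to a terminal, wherein the query sentence carries a category label;
    receiving an answer sentence, sent by the terminal, corresponding to the query sentence, wherein the answer sentence carries the category label and serves as the target text information to be recognized;
    and inputting the target text information to be recognized into the ES database for retrieval comprises:
    inputting the answer sentence into a target sub-database, corresponding to the category label, in the ES database for retrieval.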
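  16. One or more readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring target text information to be recognized;
    inputting the target text information to be recognized into an ES database for retrieval, and obtaining, through an inverted index, a candidate document set similar to the target text information;
    inputting each candidate document in the candidate document set, together with the target text information to be recognized, into a similarity recognition model, and outputting, through the similarity recognition model, the similarity between each candidate document and the target text information, wherein each candidate document corresponds to one intent;
    if the similarity score between a target candidate document and the target text information to be recognized is greater than a threshold, determining the target intent corresponding to the target candidate document, wherein each intent has associated response information;
    determining the target response information corresponding to the target intent; and
    outputting the target response information.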
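  17. The readable storage media according to claim 16, characterized in that the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps:
    if the similarity score between the target candidate document and the target text information to be recognized is less than or equal to the threshold, inputting the target text information to be recognized into a sentence segmentation model, and outputting segmented text information through the sentence segmentation model, wherein the segmented text information serves as the target text information to be recognized; and
    performing again the steps from inputting the text information to be recognized into the ES database for retrieval through outputting the target response information.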
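  18. The readable storage media according to claim 17, characterized in that the sentence segmentation model is a combined model of a long short-term memory network (LSTM) and a conditional random field model (CRF); and inputting the target text information to be recognized into the sentence segmentation model and outputting the segmented text information through the sentence segmentation model comprises:
    inputting the target text information to be recognized into the LSTM, and labeling each word in the target text information with features to obtain feature-labeled target text information, wherein the LSTM is obtained by training on a sample set that includes a plurality of feature-labeled corpora; and
    inputting the feature-labeled target text information into the CRF, using the CRF to insert separators into the target text according to the feature labels, and outputting at least two pieces of segmented text information.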
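  19. The readable storage media according to claim 16, characterized in that inputting each candidate document in the candidate document set, together with the text information to be recognized, into the similarity recognition model and outputting, through the similarity recognition model, the similarity between each candidate document and the text information to be recognized comprises:
    segmenting the target text information into words through a word segmentation model to obtain a first word sequence, wherein each word in the first word sequence can be labeled with features and a feature value corresponding to each feature; and segmenting each candidate document into words through the word segmentation model to obtain a second word sequence, wherein each word in the second word sequence can be labeled with features and a feature value corresponding to each feature;
    inputting the first word sequence and the second word sequence into a recognition model, and determining a feature vector corresponding to each feature; and
    outputting the similarity between the target text information and the candidate document according to each feature vector and a weight corresponding to each feature.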
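  20. The readable storage media according to claim 16, characterized in that acquiring the target text information to be recognized comprises:
    outputting a query sentence to a terminal, wherein the query sentence carries a category label;
    receiving an answer sentence, sent by the terminal, corresponding to the query sentence, wherein the answer sentence carries the category label and serves as the target text information to be recognized;
    and inputting the target text information to be recognized into the ES database for retrieval comprises:
    inputting the answer sentence into a target sub-database, corresponding to the category label, in the ES database for retrieval.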
PCT/CN2019/116944 2019-09-18 2019-11-11 Method and apparatus for acquiring response information, computer device, and storage medium WO2021051521A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910883201.3 2019-09-18
CN201910883201.3A CN110765244B (zh) 2019-09-18 2019-09-18 Method and apparatus for acquiring response scripts, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021051521A1 (zh) 2021-03-25

Family

ID=69330148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116944 WO2021051521A1 (zh) 2019-09-18 2019-11-11 Method and apparatus for acquiring response information, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110765244B (zh)
WO (1) WO2021051521A1 (zh)

Also Published As

Publication number Publication date
CN110765244B (zh) 2023-06-06
CN110765244A (zh) 2020-02-07

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19945691

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19945691

Country of ref document: EP

Kind code of ref document: A1