WO2021008015A1 - Intention recognition method, device and computer readable storage medium - Google Patents

Intention recognition method, device and computer readable storage medium

Info

Publication number
WO2021008015A1
WO2021008015A1 PCT/CN2019/116240 CN2019116240W WO2021008015A1 WO 2021008015 A1 WO2021008015 A1 WO 2021008015A1 CN 2019116240 W CN2019116240 W CN 2019116240W WO 2021008015 A1 WO2021008015 A1 WO 2021008015A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
search sentence
sentence
target
result
Prior art date
Application number
PCT/CN2019/116240
Other languages
French (fr)
Chinese (zh)
Inventor
石志娟
徐小方
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910653241.9A (CN110472027B)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021008015A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to an intention recognition method, device, and computer-readable storage medium.
  • Search engines can recognize the intent of a search sentence input by the user, so as to provide the user with search results based on the recognized intent.
  • Search sentences generally include those with question and answer intent and those without. If a search sentence is recognized as having question and answer intent, its search results can present multiple items of question and answer data for the user to view, in order to solve the user's problem as quickly as possible and improve the user experience.
  • Whether a search sentence has question and answer intent is generally judged by checking whether it includes question words. If it does, the search sentence is determined to have question and answer intent; otherwise, it is determined not to have question and answer intent.
  • However, some search sentences with question and answer intent do not include question words, so identifying question and answer intent from question words alone is unreliable and the accuracy of intent recognition is poor.
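The question-word rule described above, and the way it misses some queries, can be illustrated with a minimal sketch. The question-word list and the example queries are assumptions made for illustration; the source does not enumerate either.

```python
# Hypothetical question-word list; the source does not enumerate one.
QUESTION_WORDS = {"what", "how", "why", "when", "where", "who", "which"}


def has_question_word(segments):
    """Baseline rule: report Q&A intent only if some segment is a question word."""
    return any(seg.lower() in QUESTION_WORDS for seg in segments)


# A query that contains a question word is caught by the baseline rule.
with_question_word = has_question_word(["how", "to", "claim", "car", "insurance"])

# A query that plausibly seeks an answer but has no question word is missed.
without_question_word = has_question_word(["car", "insurance", "claim", "procedure"])
```

The second query is the kind of case the application targets: the rule returns a negative result even though the user may well want question and answer data.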
  • The embodiments of the present application provide an intent recognition method, device, and computer-readable storage medium, which can train an intent recognition model based on search event information associated with search sentence sets in order to recognize question and answer intent, which helps improve the accuracy of intent recognition.
  • An intention recognition method, including:
  • The intent recognition model is obtained by training on multiple target search sentence sets and the search event information associated with each target search sentence set, where each target search sentence set includes at least one search sentence, the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result is used to indicate whether the target search sentence has a question and answer attribute;
  • the intent recognition result indicates that the target search sentence has a question and answer attribute
  • An embodiment of the present application provides an intention recognition device, which includes a unit for executing the method of the first aspect.
  • The embodiments of the present application provide another intention recognition device, including a processor and a memory that are connected to each other, wherein the memory is used to store a computer program that supports the intention recognition device in executing the above method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect described above.
  • The intent recognition device may further include a user interface and/or a communication interface.
  • An embodiment of the present application provides a computer non-volatile readable storage medium that stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.
  • An intent recognition model can be trained based on search event information associated with search sentence sets to perform question and answer intent recognition, so that the accuracy of intent recognition is improved and question and answer intent recognition is more reliable.
  • FIG. 1 is a schematic flowchart of an intention recognition method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another intention recognition method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an intention recognition device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another intention recognition device provided by an embodiment of the present application.
  • The technical solution of this application can be applied to an intent recognition device, which may include a server, a terminal, a robot, or another recognition device, and which is used for training an intent recognition model, recognizing the intent of a user's search sentence, and so on.
  • The terminal involved in this application may be a mobile phone, computer, tablet, personal computer, smart watch, etc., which is not limited in this application.
  • In this application, the intent recognition result of the target search sentence can be obtained by inputting the target search sentence, on which intent recognition is to be performed, into an intent recognition model trained on multiple search sentence sets and their associated search event information. The result determines whether the target search sentence has the question and answer attribute, and question and answer search results are output when it does. In other words, the intent recognition model can be trained on the search event information associated with search sentence sets and used for question and answer intent recognition, which improves the accuracy of intent recognition and makes question and answer intent recognition more reliable. Detailed descriptions are given below.
  • FIG. 1 is a schematic flowchart of an intention recognition method provided by an embodiment of the present application. Specifically, the technical solution of this embodiment can be applied to the aforementioned intention recognition device. As shown in Figure 1, the intention recognition method may include the following steps:
  • The target search sentence is the search sentence on which intent recognition is to be performed. It can be understood that, in other embodiments, the target search sentence may also be obtained in other ways, such as from a search queue, and may be input by text, voice, etc. This application does not limit how the target search sentence is obtained or input.
  • The word segmentation result of the target search sentence may include the multiple word segments (also referred to as words, entries, etc.) that make up the target search sentence.
  • The multiple word segments may be all the word segments of the target search sentence, or only some of them, such as the word segments that remain after meaningless word segments (such as stop words) are removed.
  • A filter list can be preset that includes various stop words or other meaningless words, such as "ah" and "oh". After the target search sentence is segmented, the stop words and other meaningless words in it can be identified by matching against the words in the filter list and then removed, which reduces the detection overhead of determining whether the search sentence has the question and answer attribute. Other approaches are not listed here.
  • The word segmentation method may be Jieba ("stuttering") word segmentation, Stanford word segmentation, or another word segmentation method, which is not limited in this application.
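The filter-list step can be sketched as follows. This is a minimal illustration that assumes segmentation has already been performed by some segmenter; the filter list contents and the example segments are hypothetical.

```python
# Hypothetical filter list; the source names only "ah" and "oh" as examples.
FILTER_LIST = {"ah", "oh"}


def filter_segments(segments, filter_list=FILTER_LIST):
    """Drop segments found in the filter list, keeping the original order."""
    return [seg for seg in segments if seg.lower() not in filter_list]


# The input is assumed to be the output of a word segmenter (not shown here).
segments = ["oh", "how", "to", "claim", "insurance", "ah"]
kept = filter_segments(segments)
```

Only the retained segments would then be passed to the intent recognition model, which is where the reduced detection overhead mentioned above comes from.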
  • The intention recognition model can be used to identify whether a search sentence has the question and answer attribute.
  • The intent recognition model may be trained based on multiple target search sentence sets and the search event information associated with each of those target search sentence sets.
  • Each target search sentence set may include at least one search sentence.
  • The search event information may include the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result may be used to indicate whether the target search sentence has a question and answer attribute.
  • Before the word segmentation result of the target search sentence is input into the preset intent recognition model to obtain the corresponding intent recognition result, the intent recognition model can be obtained by pre-training. Specifically, multiple target search sentence sets and their associated search event information can be obtained, for example from a preset search sentence database, and the intent of each target search sentence set can be determined according to the search order of each search sentence in the set and/or the search result click information of each search sentence, for example by determining whether the search sentences included in each target search sentence set have a question and answer attribute. The intent recognition model is then trained on the search sentences included in each target search sentence set together with the determined result of whether they have the question and answer attribute.
  • The search order can be used to indicate the order in which each search sentence was searched, and the search result click information can be used to indicate information about the search result items clicked by the user.
  • The search order can be text information or identification information (such as 1, 2, 3...). Alternatively, the search event information can include the search time of each search sentence, and the search order of each search sentence can be indicated by its search time, etc., which is not limited in this application.
  • The search result click information may include the total number of clicks on search result items, the number of clicks on Q&A search result items, the number of clicks on non-Q&A search result items (for example, a label can be set to indicate whether an item is Q&A or non-Q&A), and/or the browsing time of each clicked search result item, etc.
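One way to picture the search event information is as a small record per search sentence. The sketch below is an assumed layout, not a structure given in the source; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClickInfo:
    """Search result click information for one search sentence (assumed fields)."""
    total_clicks: int = 0
    qa_clicks: int = 0       # clicks on items labelled as Q&A
    non_qa_clicks: int = 0   # clicks on items labelled as non-Q&A
    browse_seconds: List[float] = field(default_factory=list)


@dataclass
class SearchEvent:
    """One search sentence together with its order, time and click information."""
    sentence: str
    search_order: int        # 1, 2, 3, ... within the set
    search_time: float       # seconds; may stand in for the order
    clicks: ClickInfo = field(default_factory=ClickInfo)


@dataclass
class SearchSentenceSet:
    """A set of at least one search sentence with its associated events."""
    events: List[SearchEvent] = field(default_factory=list)

    def latest(self) -> SearchEvent:
        """Return the event with the largest search order (the most recent search)."""
        return max(self.events, key=lambda e: e.search_order)


example_set = SearchSentenceSet([
    SearchEvent("car insurance claim", 1, 0.0),
    SearchEvent("car insurance claim procedure", 2, 30.0,
                ClickInfo(total_clicks=4, qa_clicks=3, non_qa_clicks=1,
                          browse_seconds=[12.0, 40.0, 8.5, 3.0])),
])
```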
  • The multiple target search sentence sets may be search sentence sets whose number of occurrences in the search sentence database is greater than a first number threshold; or search sentence sets whose proportion in the search sentence database is greater than a preset proportion value; or, where the search sentence database also records the search time of each search sentence, search sentence sets within a historical time window such as the previous month; or they may be determined in combination with the application field of the intent recognition model to be trained; or they may be selected by combining any two or more of the selection methods mentioned above, etc., which are not listed here. This helps to improve the reliability of the selected model training data.
  • Determining whether the search sentences included in each target search sentence set have a question and answer attribute can also be referred to as determining whether each target search sentence set has a question and answer attribute.
  • The determination can be made according to the search result click information of some of the search sentences in the search event information of the target search sentence set, for example according to the search result click information of the first M search sentences of the target search sentence set (that is, the M search sentences with the latest search time), where M is an integer greater than or equal to 1; alternatively, it can be made according to the search result click information of all search sentences in the target search sentence set; or it can be made according to the weighting coefficient of each search sentence in the target search sentence set and the search result click information of each search sentence, etc., which are not listed here.
  • The question and answer data is the search result items of the question and answer category corresponding to the target search sentence.
  • For example, the Q&A search result items can be displayed at the front of the output interface, ordered by generation time or by relevance to the target search sentence, with the non-Q&A search result items displayed after all the Q&A search result items.
  • As another example, only some of the Q&A search result items can be selected for display on the output interface, such as the top N items with the latest generation time or the top M items with the highest relevance to the target search sentence, and the non-Q&A search result items corresponding to the target search sentence are displayed after those N or M items (for example, the top E items with the latest generation time or the top F items with the highest relevance to the target search sentence), where N, M, E, and F are all integers greater than 0. As yet another example, the output interface may display only the search result items of the question and answer category corresponding to the target search sentence, etc., which are not listed here.
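The display options above can be sketched as a single ordering function. This is a minimal illustration that ranks by relevance only (generation time could be substituted as the sort key); the item fields and parameter names are assumptions.

```python
def arrange_results(items, top_qa=None, top_other=None):
    """Place Q&A items first, then non-Q&A items, each ranked by relevance.

    items: list of dicts with 'title', 'is_qa' (bool) and 'relevance' (float).
    top_qa / top_other: optional caps on how many items of each kind to show;
    top_other=0 shows Q&A items only.
    """
    def by_relevance(item):
        return item["relevance"]

    qa = sorted((i for i in items if i["is_qa"]), key=by_relevance, reverse=True)
    other = sorted((i for i in items if not i["is_qa"]), key=by_relevance, reverse=True)
    if top_qa is not None:
        qa = qa[:top_qa]
    if top_other is not None:
        other = other[:top_other]
    return qa + other


items = [
    {"title": "a", "is_qa": False, "relevance": 0.9},
    {"title": "b", "is_qa": True, "relevance": 0.4},
    {"title": "c", "is_qa": True, "relevance": 0.8},
]
all_titles = [i["title"] for i in arrange_results(items)]
capped_titles = [i["title"] for i in arrange_results(items, top_qa=1)]
qa_only_titles = [i["title"] for i in arrange_results(items, top_other=0)]
```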
  • When acquiring the target search sentence input by the user, the intention recognition device can perform word segmentation on the target search sentence to obtain a word segmentation result, and input the word segmentation result into the intent recognition model trained on multiple search sentence sets and their associated search event information, so as to determine whether the target search sentence has the question and answer attribute. When it does, search results that include question and answer search result items are output. Training the intent recognition model on the search event information associated with search sentence sets and using it for question and answer intent recognition improves the accuracy of intent recognition and makes question and answer intent recognition more reliable.
  • FIG. 2 is a schematic flowchart of another intention recognition method provided by an embodiment of the present application. Specifically, as shown in FIG. 2, the intention recognition method may include the following steps:
  • The search sentence database records multiple search sentence sets and the search event information associated with each search sentence set.
  • Each search sentence set includes one or more search sentences, that is, at least one search sentence, and the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence, which is not repeated here.
  • If a search sentence set includes multiple search sentences, the search time interval between any two of the multiple search sentences does not exceed a preset time threshold, and the keyword overlap rate between any two of the multiple search sentences is higher than a preset overlap rate threshold. That is to say, the search sentences included in a search sentence set may be similar search sentences within a preset time range (such as within 2 minutes of the first search), where similar means that their keyword overlap rate is higher than the preset overlap rate threshold (for example, the word segments that remain after modal particles and stop words are removed from a search sentence are used as its keywords).
  • For example, suppose the preset time threshold is 2 min and the overlap rate threshold is 70%. The search time interval of two search sentences is 30 s, which does not exceed the preset time threshold, and the two search sentences have 5 and 6 keywords respectively, with each keyword given the same weight. If the resulting keyword overlap rate (which may be computed relative to the larger of the two keyword counts, the smaller of the two, or their average, etc., which are not listed here) is greater than the overlap rate threshold, then the two search sentences can be put into the same search sentence set.
  • Weighting coefficients can be set in advance for preset keywords (such as domain-specific words or words with a high frequency of occurrence), and the weighting coefficients of the preset keywords may be the same or different. When a search sentence contains such a specific keyword, the keyword overlap rate can be weighted according to the weighting coefficient of that keyword, that is, the keyword overlap rate is increased before judging whether the search sentences are similar, and the search sentence set is then determined according to the search time and the weighted overlap rate of the search sentences. This helps to improve the reliability of search sentence set determination.
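The grouping test (time interval plus keyword overlap, optionally weighted) can be sketched as below. The weighting scheme shown, where a weighted keyword counts more in both the shared total and the per-sentence totals, is one possible reading and is an assumption; the keyword lists and the number of shared keywords are also hypothetical, since the source does not state them.

```python
def overlap_rate(kw_a, kw_b, weights=None, denominator="max"):
    """Keyword overlap rate between two search sentences.

    weights: optional dict mapping a preset keyword to its weighting
    coefficient; keywords that are not listed count as 1.0.
    denominator: 'max', 'min' or 'mean' of the two (weighted) keyword totals.
    """
    weights = weights or {}
    a, b = set(kw_a), set(kw_b)

    def total(keywords):
        return sum(weights.get(k, 1.0) for k in keywords)

    size_a, size_b = total(a), total(b)
    if denominator == "max":
        denom = max(size_a, size_b)
    elif denominator == "min":
        denom = min(size_a, size_b)
    elif denominator == "mean":
        denom = (size_a + size_b) / 2
    else:
        raise ValueError("denominator must be 'max', 'min' or 'mean'")
    return total(a & b) / denom if denom else 0.0


def same_set(time_a, kw_a, time_b, kw_b,
             time_threshold=120.0, rate_threshold=0.7, **rate_options):
    """True if the two searches are close in time and similar in keywords."""
    close_in_time = abs(time_a - time_b) <= time_threshold
    return close_in_time and overlap_rate(kw_a, kw_b, **rate_options) > rate_threshold


# Two sentences with 5 and 6 keywords, searched 30 s apart.
# Assumption for illustration: all 5 keywords of the first appear in the second.
kw_first = ["k1", "k2", "k3", "k4", "k5"]
kw_second = ["k1", "k2", "k3", "k4", "k5", "k6"]
grouped = same_set(0.0, kw_first, 30.0, kw_second)
```

With these assumed keywords the unweighted rate relative to the larger count is 5/6, which exceeds the 70% threshold, so the two sentences would fall into the same set.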
  • The multiple target search sentence sets may be search sentence sets in the search sentence database whose number of occurrences is greater than a first number threshold; or search sentence sets whose proportion in the search sentence database is greater than a preset proportion value; or, where the search sentence database also records the search time of each search sentence, the search sentence sets within a historical time window, such as the previous month; or they may be determined in combination with the application field of the intent recognition model to be trained; or they may be selected by combining any two or more of the selection methods described above, etc., which are not listed here. This helps to improve the reliability of the selected model training data.
  • The intent recognition device may determine, from the search sentence database, the search sentence sets whose number of occurrences is greater than a preset second number threshold, and use them as the multiple target search sentence sets; or it may determine, from the search sentence database, the search sentence sets for which a second ratio, between the number of occurrences and the total number of search sentences in the search sentence database, is greater than a preset second ratio threshold, and use them as the multiple target search sentence sets; or it may use as the multiple target search sentence sets the search sentence sets whose number of occurrences is greater than the second number threshold and whose second ratio is greater than the preset second ratio threshold, etc., which are not listed here.
  • The number of occurrences of a search sentence set may be the sum of the numbers of occurrences of the search sentences included in the set, or the average number of occurrences of those search sentences, or the highest number of occurrences among those search sentences, etc., which are not listed here.
  • The number of occurrences of a search sentence may refer to the number of times the search sentence appears in the search sentence database, or to the number of search sentences in the search sentence database whose similarity to the search sentence is higher than a threshold, and so on, which is not limited in this application.
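The selection of target search sentence sets by occurrence count and ratio can be sketched as follows. The function names, the dictionary layout, and the threshold values in the example are assumptions made for illustration.

```python
def set_occurrences(sentence_counts, mode="sum"):
    """Aggregate per-sentence occurrence counts into one count for the set."""
    if not sentence_counts:
        return 0
    if mode == "sum":
        return sum(sentence_counts)
    if mode == "mean":
        return sum(sentence_counts) / len(sentence_counts)
    if mode == "max":
        return max(sentence_counts)
    raise ValueError("mode must be 'sum', 'mean' or 'max'")


def select_target_sets(sets, total_sentences,
                       count_threshold=None, ratio_threshold=None, mode="sum"):
    """Return the names of the sets that pass every threshold that is given.

    sets: dict mapping a set name to the occurrence counts of its sentences.
    total_sentences: total number of search sentences in the database.
    """
    selected = []
    for name, counts in sets.items():
        occurrences = set_occurrences(counts, mode)
        if count_threshold is not None and occurrences <= count_threshold:
            continue
        if ratio_threshold is not None and occurrences / total_sentences <= ratio_threshold:
            continue
        selected.append(name)
    return selected


# Hypothetical database summary: three sets, 1000 search sentences in total.
sets = {"claims": [40, 25, 10], "login": [5, 3], "rates": [60]}
by_count = select_target_sets(sets, 1000, count_threshold=50)
by_both = select_target_sets(sets, 1000, count_threshold=50, ratio_threshold=0.07)
```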
  • The intention recognition device may determine the application field information of the intent recognition model to be trained, determine a target sub-database from the multiple sub-databases included in the search sentence database according to the application field information, and then select the multiple target search sentence sets from the target sub-database.
  • The sub-databases have a one-to-one correspondence with the application fields. Each sub-database includes multiple search sentence sets under the corresponding application field (more than the number of target search sentence sets to be selected) and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information.
  • In other words, the search sentence database may include a sub-database for each application field, and each sub-database includes the search sentence sets under one application field together with the search order and search result click information of each search sentence associated with each set. When selecting the target search sentence sets, the sub-database (such as the sub-database carrying the field label) can therefore be determined from the application field information (such as the field label) of the intent recognition model to be trained, and the target search sentence sets selected from it. The reliability of the selected model training data can thereby be further improved, and the training effect improved.
  • The word segmentation result of each target search sentence set includes the multiple word segments that make up the search sentences of that set. The multiple word segments may be all the word segments of the search sentences of the target search sentence set, or only some of them, for example the word segments that remain after meaningless word segments (such as stop words) are removed.
  • For example, a filter list can be preset that includes various stop words or other meaningless words, such as "ah" and "oh", so that after the search sentences of the target search sentence set are segmented, the stop words and other meaningless words in them can be identified by matching against the words in the filter list and removed, which reduces the detection overhead of determining whether the search sentences have the question and answer attribute. Alternatively, the multiple word segments may be the word segments (all or some of them) of the search sentence with the largest search order (that is, the most recent search) in the target search sentence set, which will not be repeated here.
  • For each target search sentence set, it is determined whether the search sentences included in that set have a question and answer attribute.
  • The search result click information may include the total number of clicks on search result items and the number of clicks on Q&A search result items. When determining whether a search sentence has the question and answer attribute according to its search result click information, the total number of clicks on search result items can be compared with a preset first number threshold, and a first ratio between the number of clicks on Q&A search result items and the total number of clicks on search result items can be calculated and compared with a preset first ratio threshold. If the total number of clicks on search result items is greater than the preset first number threshold and the first ratio is greater than the preset first ratio threshold, it can be determined that the search sentence has a question and answer attribute; otherwise, it can be determined that it does not have a question and answer attribute (or the determination can be refined in combination with other methods).
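The two-threshold rule above can be written as a short function. The threshold values used as defaults are placeholders chosen for illustration, not values from the source.

```python
def has_qa_attribute(total_clicks, qa_clicks,
                     first_number_threshold=10, first_ratio_threshold=0.5):
    """Click-based rule for one search sentence.

    The sentence is treated as having the Q&A attribute only if the total
    number of clicks exceeds the number threshold AND the share of clicks
    on Q&A items exceeds the ratio threshold.
    """
    if total_clicks <= 0 or total_clicks <= first_number_threshold:
        return False
    first_ratio = qa_clicks / total_clicks
    return first_ratio > first_ratio_threshold


mostly_qa = has_qa_attribute(total_clicks=20, qa_clicks=15)   # ratio 0.75
mostly_other = has_qa_attribute(total_clicks=20, qa_clicks=5)  # ratio 0.25
too_few_clicks = has_qa_attribute(total_clicks=5, qa_clicks=5)
```

Requiring a minimum total keeps a handful of clicks from deciding the label on their own.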
  • The search result click information may include the total number of clicks on search result items, the number of clicks on Q&A search result items, and the browsing time of each clicked search result item. When determining whether the search sentence has a question and answer attribute according to its search result click information, the number of clicks on Q&A search result items included in that information can be compared with another preset number threshold; if the number of clicks on Q&A search result items is greater than that other number threshold, it can be determined that the search sentence has the question and answer attribute, etc., which are not listed here.
  • When determining whether the search sentences included in a target search sentence set have the question and answer attribute, the search order of each search sentence in the search event information associated with the set may be used to determine the search sentence with the largest search order among the at least one search sentence included in the set; whether the search sentences included in the target search sentence set have the question and answer attribute is then determined according to the search result click information of that search sentence.
  • For the manner of determining whether the search sentences included in the target search sentence set have the question and answer attribute, refer to the above method of determining whether a search sentence has the question and answer attribute according to its search result click information, which is not repeated here. If the search sentence with the largest search order has the question and answer attribute, it can be determined that the search sentences included in the target search sentence set have the question and answer attribute. That is to say, when determining whether a target search sentence set has a question and answer attribute, the most recent of the related search events, that is, the search event of the search sentence with the largest search order, can be selected, and its search result click information used for the determination. Because the search results obtained from earlier searches may not be what the user wanted, relying on the later clicks improves judgment efficiency while preserving judgment accuracy.
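A minimal sketch of the "most recent search decides" rule follows. The tuple layout and threshold defaults are assumptions; the click test reuses the two-threshold comparison described earlier in the text.

```python
def set_has_qa_attribute_latest(events,
                                first_number_threshold=10,
                                first_ratio_threshold=0.5):
    """Label a search sentence set from its most recent search only.

    events: list of (search_order, total_clicks, qa_clicks) tuples, one per
    search sentence in the set.
    """
    if not events:
        return False
    # The sentence with the largest search order is the most recent search.
    _, total_clicks, qa_clicks = max(events, key=lambda e: e[0])
    if total_clicks <= 0 or total_clicks <= first_number_threshold:
        return False
    return qa_clicks / total_clicks > first_ratio_threshold


# Earlier searches drew few Q&A clicks; the final reformulation drew many.
events = [(1, 12, 1), (2, 15, 2), (3, 20, 16)]
latest_decides = set_has_qa_attribute_latest(events)
```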
  • When determining whether the search sentences included in the target search sentence set have a question and answer attribute, the determination can also be made according to the search result click information of all the search sentences in the set, for example by summing the numbers of clicks on question and answer search result items in the search result click information of all the search sentences and judging whether the sum exceeds a preset number threshold. If it does, the search sentences included in the target search sentence set can be considered to have the question and answer attribute, etc., which will not be repeated here.
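The all-sentences variant reduces to a sum and a comparison. The sketch below assumes a hypothetical threshold value.

```python
def set_has_qa_attribute_sum(qa_clicks_per_sentence, number_threshold=20):
    """Label a set by the summed Q&A clicks over all of its search sentences."""
    return sum(qa_clicks_per_sentence) > number_threshold


# Q&A click counts for three sentences in one set (hypothetical values).
summed_decides = set_has_qa_attribute_sum([4, 7, 12])
```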
  • the weighting coefficient of each search sentence may be preset, for example, the weighting coefficient of the search sentence including the question word is higher than the weighting coefficient of the search sentence not including the question word, and/or,
  • the weighting coefficient of the search sentence with a higher search order is higher than the weighting coefficient of the search sentence with a lower search order (that is, the higher the search order, the higher the weighting coefficient is), and/or the search result included in the click information for the search result
  • the click item has a display result of a specific question and answer website or a search sentence that has a display result of a specific question and answer website in the search result, and its weighting coefficient is higher than the weight coefficient of a search sentence that does not have a display result of the specific question and answer website, and so on.
  • the weighting coefficient corresponding to each search sentence in the at least one search sentence included in the target search sentence set can be determined; then, according to the weighting coefficient corresponding to each search sentence and the search result click information in the search event information associated with the target search sentence set, it is determined whether the search sentences included in the target search sentence set have a question and answer attribute.
  • combining the weighting coefficient with the search result click information can mean using the weighting coefficient to weight the parameters of the search result click information, such as the number of clicked question-and-answer search result items, the browsing time, etc.
  • for the specific determination method, refer to the above method of determining whether a search sentence has question and answer attributes based on its search result click information. For example, the number of clicks on the Q&A search result items corresponding to each search sentence can be weighted by the weighting coefficient of that search sentence (for example, the number of clicks on the Q&A search result items corresponding to a search sentence is 2, and the weighting coefficient is 1.5).
  • if the first ratio between the (weighted) sum of the number of clicks on question-and-answer search result items and the total number of clicks on the search result items of the search sentences is greater than the preset first ratio threshold, it can be determined that the search sentences included in the target search sentence set have question and answer attributes.
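A minimal sketch of the weighted ratio test follows, under stated assumptions: the `SentenceClicks` record and function names are hypothetical, and only the question-and-answer click counts are weighted while the denominator is the unweighted total, which is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceClicks:
    qa_clicks: int     # clicks on question-and-answer result items
    total_clicks: int  # clicks on all result items
    weight: float      # preset weighting coefficient of this search sentence


def weighted_qa_ratio(sentences: List[SentenceClicks]) -> float:
    """Weighted Q&A clicks divided by total clicks (0.0 if nothing was clicked)."""
    weighted_qa = sum(s.weight * s.qa_clicks for s in sentences)
    total = sum(s.total_clicks for s in sentences)
    return weighted_qa / total if total else 0.0


def has_qa_attribute(sentences: List[SentenceClicks],
                     first_ratio_threshold: float) -> bool:
    return weighted_qa_ratio(sentences) > first_ratio_threshold


# Second sentence: 2 Q&A clicks weighted by 1.5 contributes 3 to the numerator.
session = [SentenceClicks(1, 4, 1.0), SentenceClicks(2, 4, 1.5)]
print(weighted_qa_ratio(session))
```

Because the weights can exceed 1, the weighted ratio is not bounded by 1; the first ratio threshold would need to be chosen with that in mind.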
  • when determining whether a target search sentence set has the question and answer attribute, the determination can be based on the search results clicked by the user in each search and the weight of each search. This helps to improve the reliability of the question and answer attribute determined for the search sentence set.
  • step 203 can be performed first, and then step 202 can be performed, or steps 202 and 203 can be performed simultaneously, which is not limited in this application.
  • the word segmentation results of the target search sentence set with question and answer attributes in the multiple target search sentence sets are used as positive samples, and the word segmentation results of the target search sentence sets without question answering attributes in the multiple target search sentence sets are used as negative samples.
  • the word segmentation results of the target search sentence sets with the question and answer attribute among the multiple target search sentence sets may include the segmented words of the search sentences with the question and answer attribute in those sets; these may be all of the segmented words of one or more target search sentence sets, or only part of them. That is, the positive samples may include the segmented words of search sentences with question and answer attributes, and the negative samples may include the segmented words of search sentences without question and answer attributes. For example, when it is determined that the search sentences of a certain target search sentence set have a question and answer attribute, all the segmented words of the search sentences of that set (meaningless segmented words can be removed) can be used as a positive sample.
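The construction of positive and negative samples can be sketched as below. The whitespace tokenizer is only a stand-in for a real word segmenter, and the stop-word list, function names, and example sentences are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical "meaningless" segmented words to be removed.
STOPWORDS: Set[str] = {"the", "a", "of"}


def segment(sentence: str) -> List[str]:
    """Stand-in segmenter: lowercase, split on whitespace, drop stop words."""
    return [tok for tok in sentence.lower().split() if tok not in STOPWORDS]


def build_samples(
    labelled_sets: Iterable[Tuple[List[str], bool]]
) -> Tuple[List[List[str]], List[List[str]]]:
    """Each input item is (search sentences of one set, has Q&A attribute)."""
    positives: List[List[str]] = []
    negatives: List[List[str]] = []
    for sentences, has_qa in labelled_sets:
        tokens = [tok for s in sentences for tok in segment(s)]
        (positives if has_qa else negatives).append(tokens)
    return positives, negatives


labelled_sets = [
    (["how to renew a passport", "passport renewal steps"], True),
    (["cheap flights"], False),
]
positives, negatives = build_samples(labelled_sets)
print(positives, negatives)
```

Here one sample holds all segmented words of a set; using only part of them, as the text allows, would be a filtering step inside `build_samples`.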
  • the segmented words of the search sentences with the question and answer attribute may be used as positive training samples, and the segmented words of the search sentences without the question and answer attribute may be used as negative training samples, so as to train the intention recognition model on the positive and negative training samples. The intent recognition model can then quickly identify whether an input search sentence has question and answer attributes. Furthermore, information can be returned to the user according to the recognition result of whether the input search sentence has a question and answer attribute.
  • a search sentence with question and answer attributes may indicate that there is a question and answer requirement, and
  • a search sentence without question and answer attributes may indicate that there is no question and answer requirement, so that different pages (interfaces) with different demand content can be returned to the user according to whether the search sentence has a question and answer requirement.
  • the description of the steps 201-204 and the related description of the embodiment shown in FIG. 1 may refer to each other, which is not repeated here.
  • whether the numbers of positive and negative samples are balanced can be determined; for example, it is determined whether the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples exceeds a preset number threshold, such as a preset third number threshold; if it does, the numbers of positive and negative samples are unbalanced.
  • the numbers of positive and negative samples can be counted separately.
  • if the difference between the two is too large, for example if it exceeds the preset third number threshold, the numbers of positive and negative samples can be balanced according to the preset sample balance rule before training.
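The imbalance check reduces to one comparison. The sketch below uses a hypothetical function name and example counts; only the rule itself (absolute difference against the third number threshold) comes from the text.

```python
def is_unbalanced(num_positive: int, num_negative: int,
                  third_count_threshold: int) -> bool:
    """True when the sample counts differ by more than the preset threshold."""
    return abs(num_positive - num_negative) > third_count_threshold


print(is_unbalanced(900, 100, third_count_threshold=500))
print(is_unbalanced(520, 480, third_count_threshold=500))
```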
  • there may be multiple preset sample balance rules, and one can be selected according to the numbers of positive and negative samples or according to the training scenario.
  • for example, the positive and negative samples can be balanced by synthesizing samples; as another example, for scenarios with high reliability requirements (for example, the scenario label is high reliability), the positive and negative samples can be balanced by changing the sample weights. The specific balance rule corresponding to each sample count and each scenario can be preset.
  • the way to balance positive and negative samples can be as follows:
  • Upsampling: increase the class with fewer samples by directly copying the original samples. For example, it can be used when the total number of samples is small.
  • Downsampling: reduce the class with more samples by discarding the redundant samples. For example, it can be used when there are many samples.
  • for example, the target search sentences can be sorted by total number of clicks, and the samples corresponding to target search sentences with a small total number of clicks are discarded.
  • Synthetic samples: increase the class with fewer samples by synthesis, which means combining features of existing samples to generate new samples. Specifically, a new sample can be generated by randomly selecting some features from the available features, or by selecting specific features through some method (such as features whose number of occurrences is higher than a threshold, or features shared between samples whose similarity is higher than a threshold, for example samples whose Euclidean distance is less than a threshold), and then splicing the selected features into a new sample, thereby increasing the number of samples in the smaller class. Unlike upsampling, which simply copies samples, synthesis splices features to obtain new samples, which can further improve the reliability of model training.
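The three balancing approaches can be sketched as follows. All names are hypothetical, a sample is assumed to be a list of segmented words, the minority class is assumed to be non-empty, and the synthesis step uses the simplest variant named in the text (random feature selection), not the threshold-based selections.

```python
import random
from typing import List, Tuple

Sample = List[str]  # one sample = segmented words of a search sentence set


def upsample(minority: List[Sample], target_size: int,
             rng: random.Random) -> List[Sample]:
    """Grow the smaller class by copying randomly chosen original samples."""
    missing = max(0, target_size - len(minority))
    return minority + [list(rng.choice(minority)) for _ in range(missing)]


def downsample_by_clicks(majority: List[Tuple[Sample, int]],
                         target_size: int) -> List[Sample]:
    """Keep the samples with the most total clicks and discard the rest."""
    ranked = sorted(majority, key=lambda pair: pair[1], reverse=True)
    return [sample for sample, _ in ranked[:target_size]]


def synthesize(minority: List[Sample], count: int, features_per_sample: int,
               rng: random.Random) -> List[Sample]:
    """Splice new samples from features drawn at random from existing ones."""
    pool = sorted({feat for sample in minority for feat in sample})
    k = min(features_per_sample, len(pool))
    return [rng.sample(pool, k) for _ in range(count)]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
pos = [["how", "renew", "passport"], ["what", "is", "visa"]]
print(len(upsample(pos, 5, rng)), len(synthesize(pos, 3, 2, rng)))
```

Which function to call would be decided by the preset sample balance rule, for instance by sample count or scenario label.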
  • the intent recognition model can be trained, so that the subsequent intent recognition model can quickly identify whether the input search sentence has question and answer intent.
  • the intention recognition model can be used to identify whether the input search sentence has question and answer attributes.
  • the model may be a model based on a binary tree, a model based on a multi-tree, or a neural network model, etc., which is not limited in this application.
  • the question and answer category of the search sentence can be further determined, for example, whether it is an explicit question-and-answer search sentence that includes question words or an implicit question-and-answer search sentence that does not include question words.
  • the intent recognition model can be trained based on the positive and negative samples and the question and answer category (such as a category label) to which each positive sample belongs. Therefore, when the intention recognition model is used to identify the question and answer attribute of a search sentence, it can not only identify question-and-answer search sentences, that is, search sentences with question and answer intent, but also determine the question and answer category to which that intent belongs. Further optionally, the correspondence between each question and answer category and the displayed content/page (keyword or content title format) can be preset, so that content can be displayed to users according to the question and answer category, which improves the flexibility of page display.
  • the description of the steps 209-212 and the related description of the steps 101-104 in the embodiment shown in FIG. 1 may refer to each other, and details are not repeated here.
  • the intent recognition device can perform word segmentation processing on the search sentences included in the selected multiple target search sentence sets to obtain the word segmentation result of each target search sentence set, and determine, according to the search event information associated with each target search sentence set, whether the search sentences included in each target search sentence set have question and answer attributes. The word segmentation results of search sentences with question and answer attributes in the multiple target search sentence sets can then be used as positive samples, and the word segmentation results of search sentences without question and answer attributes as negative samples. After the positive and negative samples are balanced according to the preset sample balance rule, the intent recognition model is trained on the balanced positive and negative samples to perform question and answer intent recognition. An obtained target search sentence is then input to the intent recognition model to identify whether it has question and answer attributes, and when it does, search results including question-and-answer search result items are output. This improves the accuracy of intent recognition, so that the reliability and recall rate of question and answer intent recognition are high.
  • FIG. 3 is a schematic structural diagram of an intention recognition device provided by an embodiment of the present application.
  • the intention recognition device of the embodiment of the present application includes a unit for executing the above-mentioned intention recognition method.
  • the intention recognition device 300 of this embodiment may include an acquiring unit 301 and a processing unit 302, wherein:
  • the obtaining unit 301 is configured to receive the target search sentence input by the user;
  • the processing unit 302 is configured to perform word segmentation processing on the target search sentence to obtain a word segmentation result of the target search sentence, and the word segmentation result of the target search sentence includes multiple word segmentation that constitute the target search sentence;
  • the processing unit 302 is further configured to input the word segmentation result of the target search sentence into a preset intent recognition model to obtain an intent recognition result corresponding to the target search sentence, wherein the intent recognition model is trained based on multiple target search sentence sets and the search event information associated with each target search sentence set in the multiple target search sentence sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result is used to indicate whether the target search sentence has a question and answer attribute;
  • the processing unit 302 is further configured to output a search result including a question and answer type search result item corresponding to the target search sentence if the intention recognition result indicates that the target search sentence has a question and answer attribute.
  • the acquiring unit 301 is further configured to select multiple target search sentence sets from a search sentence database; wherein the search sentence database records multiple search sentence sets and the search event information associated with each search sentence set, each search sentence set includes at least one search sentence, and the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence;
  • the processing unit 302 is further configured to separately perform word segmentation processing on the search sentences included in each target search sentence set in the multiple target search sentence sets to obtain the word segmentation result of each target search sentence set, wherein the word segmentation result of each target search sentence set includes the multiple segmented words that constitute the search sentences of that target search sentence set;
  • the processing unit 302 may be further configured to determine whether the search sentences included in each target search sentence set have question and answer attributes according to the search event information associated with each target search sentence set; to use the word segmentation results of the target search sentence sets that have the question and answer attribute among the multiple target search sentence sets as positive samples, and the word segmentation results of the target search sentence sets that do not have the question and answer attribute as negative samples; and to train with the positive samples and negative samples corresponding to the multiple target search sentence sets to obtain an intention recognition model; wherein the intention recognition model is used to identify whether an input search sentence has question and answer attributes.
  • the processing unit 302 may be specifically configured to: determine, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence corresponding to the largest search order among the at least one search sentence included in the target search sentence set; and determine, according to the search result click information of the search sentence corresponding to the largest search order, whether the search sentences included in the target search sentence set have a question and answer attribute.
  • the search result click information includes the total number of clicks on search result items and the number of clicks on Q&A search result items;
  • when determining, according to the search result click information of the search sentence corresponding to the maximum search order, whether the search sentences included in the target search sentence set have a question and answer attribute, the processing unit 302 may be specifically configured to: calculate the first ratio between the number of clicks on question-and-answer search result items and the total number of clicks on search result items included in the search result click information of the search sentence corresponding to the maximum search order; and if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determine that the search sentences included in the target search sentence set have question and answer attributes.
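The maximum-search-order rule can be sketched as follows. The `SearchEvent` record, its field names, and the example thresholds are assumptions for illustration; the two conditions (total clicks above the first number threshold, Q&A ratio above the first ratio threshold) follow the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchEvent:
    sentence: str
    search_order: int  # position of this search within the session
    total_clicks: int  # clicks on all search result items
    qa_clicks: int     # clicks on question-and-answer result items


def set_has_qa_attribute(events: List[SearchEvent],
                         first_count_threshold: int,
                         first_ratio_threshold: float) -> bool:
    """Judge the whole set from the search sentence with the largest order."""
    last = max(events, key=lambda e: e.search_order)
    if last.total_clicks == 0 or last.total_clicks <= first_count_threshold:
        return False  # too few clicks to trust the ratio
    first_ratio = last.qa_clicks / last.total_clicks
    return first_ratio > first_ratio_threshold


events = [
    SearchEvent("passport", 1, 5, 0),
    SearchEvent("how to renew passport", 2, 10, 8),
]
print(set_has_qa_attribute(events, first_count_threshold=5,
                           first_ratio_threshold=0.6))
```

Requiring a minimum total click count keeps a single stray click from producing a ratio of 1.0 and labelling the set as question-and-answer.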
  • the processing unit 302 may be specifically configured to: determine a weighting coefficient corresponding to each search sentence in the at least one search sentence included in the target search sentence set, wherein the weighting coefficient of a search sentence with a higher search order among the at least one search sentence is higher than the weighting coefficient of a search sentence with a lower search order; and determine, according to the weighting coefficient corresponding to each search sentence and the search result click information in the search event information associated with the target search sentence set, whether the search sentences included in the target search sentence set have question and answer attributes.
  • the acquiring unit 301 may be specifically configured to: determine, from the search sentence database, the search sentence sets whose number of occurrences is greater than a preset second number threshold, and use the determined search sentence sets as the multiple target search sentence sets; or determine, from the search sentence database, the search sentence sets for which the second ratio between the number of occurrences and the total number of search sentences in the search sentence database is greater than a preset second ratio threshold, and use the determined search sentence sets whose second ratio is greater than the second ratio threshold as the multiple target search sentence sets.
  • the number of occurrences of the search sentence set is the sum of the number of occurrences of the search sentences included in the search sentence set, or the number of occurrences of the search sentence set is the average number of occurrences of the search sentences included in the search sentence set.
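Selecting target search sentence sets by frequency can be sketched as below. The set identifiers, counts, and function names are hypothetical; the count-based and ratio-based criteria and the sum/average definitions of a set's number of occurrences follow the text.

```python
from typing import Dict, List


def set_occurrences(sentence_counts: List[int],
                    use_average: bool = False) -> float:
    """Occurrences of a set: sum (default) or average over its sentences."""
    total = sum(sentence_counts)
    return total / len(sentence_counts) if use_average else total


def select_by_count(occurrences: Dict[str, float],
                    second_count_threshold: float) -> List[str]:
    return [sid for sid, n in occurrences.items()
            if n > second_count_threshold]


def select_by_ratio(occurrences: Dict[str, float], total_sentences: int,
                    second_ratio_threshold: float) -> List[str]:
    return [sid for sid, n in occurrences.items()
            if n / total_sentences > second_ratio_threshold]


# Hypothetical database summary: set id -> occurrences of its sentences.
sets = {
    "s1": set_occurrences([30, 50]),
    "s2": set_occurrences([2, 3]),
    "s3": set_occurrences([40]),
}
print(select_by_count(sets, 20))
print(select_by_ratio(sets, total_sentences=1000, second_ratio_threshold=0.05))
```

Either criterion filters out rarely issued queries, whose click information is too sparse to label reliably.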
  • the acquiring unit 301 may be specifically configured to: determine the application field information of the intent recognition model to be trained; determine, according to the application field information, the target sub-database from multiple sub-databases included in the search sentence database; and select the multiple target search sentence sets from the target sub-database.
  • each sub-database has a one-to-one correspondence with an application field, each sub-database includes multiple search sentence sets under the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information.
  • the processing unit 302 may also be configured to, before the intent recognition model is obtained by training with the positive samples and negative samples corresponding to the multiple target search sentence sets: calculate the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples; determine whether the absolute value exceeds a preset third number threshold; and if the absolute value exceeds the third number threshold, process the positive samples and/or the negative samples according to the preset sample balance rule to obtain processed positive samples and negative samples;
  • the processing unit 302 may be specifically configured to use the processed positive samples and negative samples to train to obtain the intention recognition model.
  • the intention recognition device can implement part or all of the steps in the intention recognition method in the embodiment shown in FIG. 1 to FIG. 2 through the foregoing unit.
  • the embodiment of the present application is an apparatus embodiment corresponding to the method embodiment, and the description of the method embodiment is also applicable to the embodiment of the present application, and will not be repeated here.
  • FIG. 4 is a schematic structural diagram of another intention recognition device provided by an embodiment of the present application.
  • the intention recognition device is used to perform the above-mentioned method.
  • the intention recognition device 400 in this embodiment may include: one or more processors 401 and a memory 402.
  • the intention recognition device may further include one or more user interfaces 403 and/or one or more communication interfaces 404.
  • the above-mentioned processor 401, user interface 403, communication interface 404, and memory 402 may be connected through a bus 405, or may be connected in other ways, as illustrated in FIG. 4 by way of a bus.
  • the memory 402 is used to store a computer program, and the computer program includes program instructions, and the processor 401 is used to execute the program instructions stored in the memory 402.
  • the processor 401 may be used to call the program instructions to perform the following steps: call the user interface 403 to receive the target search sentence input by the user; perform word segmentation processing on the target search sentence to obtain the word segmentation result of the target search sentence, wherein the word segmentation result of the target search sentence includes the multiple segmented words that make up the target search sentence; input the word segmentation result of the target search sentence into a preset intention recognition model to obtain the intention recognition result corresponding to the target search sentence, wherein the intent recognition model is trained based on multiple target search sentence sets and the search event information associated with each target search sentence set in the multiple target search sentence sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result is used to indicate whether the target search sentence has a question and answer attribute; and if the intent recognition result indicates that the target search sentence has a question and answer attribute, call the user interface 403 to output a search result including a question and answer type search result item corresponding to the target search sentence.
  • before inputting the word segmentation result of the target search sentence into the preset intent recognition model to obtain the intent recognition result corresponding to the target search sentence, the processor 401 may perform the following steps: select multiple target search sentence sets from the search sentence database, wherein the search sentence database records multiple search sentence sets and the search event information associated with each search sentence set, each search sentence set includes at least one search sentence, and the search event information includes the search order of each search sentence in the at least one search sentence and/or the search result click information of each search sentence; separately perform word segmentation processing on the search sentences included in each target search sentence set in the multiple target search sentence sets to obtain the word segmentation result of each target search sentence set, wherein the word segmentation result of each target search sentence set includes the multiple segmented words that constitute the search sentences of that target search sentence set; determine, according to the search event information associated with each target search sentence set, whether the search sentences included in each target search sentence set have question and answer attributes; use the word segmentation results of the target search sentence sets that have the question and answer attribute as positive samples and the word segmentation results of the target search sentence sets that do not have the question and answer attribute as negative samples; and train with the positive samples and negative samples to obtain the intention recognition model.
  • the processor 401 may specifically execute the following steps: determine, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence corresponding to the largest search order among the at least one search sentence included in the target search sentence set; and determine, according to the search result click information of the search sentence corresponding to the maximum search order, whether the search sentences included in the target search sentence set have a question and answer attribute.
  • the processor 401 may specifically perform the following steps: determine the weighting coefficient corresponding to each search sentence in the at least one search sentence included in the target search sentence set; and determine, according to the weighting coefficient corresponding to each search sentence and the search result click information in the search event information associated with the target search sentence set, whether the search sentences included in the target search sentence set have question and answer attributes.
  • the weighting coefficient of a search sentence with a higher search order in the at least one search sentence is higher than the weighting coefficient of a search sentence with a lower search order, and/or the weighting coefficient of a search sentence including a question word in the at least one search sentence is higher than the weighting coefficient of a search sentence that does not include a question word, etc., which will not be repeated here.
  • the search result click information includes the total number of clicks on search result items and the number of clicks on Q&A search result items; when determining, according to the search result click information of the search sentence corresponding to the maximum search order, whether the search sentences included in the target search sentence set have question and answer attributes, the processor 401 may specifically execute the following steps: calculate the first ratio between the number of clicks on question-and-answer search result items and the total number of clicks on search result items included in the search result click information of the search sentence corresponding to the maximum search order; and if the total number of clicks on search result items is greater than the preset first number threshold and the first ratio is greater than the preset first ratio threshold, determine that the search sentences included in the target search sentence set have question and answer attributes.
  • when selecting multiple target search sentence sets from the search sentence database, the processor 401 may specifically perform the following steps: determine, from the search sentence database, the search sentence sets whose number of occurrences is greater than the preset second number threshold, and use the determined search sentence sets as the multiple target search sentence sets; or determine, from the search sentence database, the search sentence sets for which the second ratio between the number of occurrences and the total number of search sentences in the search sentence database is greater than a preset second ratio threshold, and use the determined search sentence sets whose second ratio is greater than the second ratio threshold as the multiple target search sentence sets.
  • the number of occurrences of the search sentence set is the sum of the number of occurrences of the search sentences included in the search sentence set, or the number of occurrences of the search sentence set is the average number of occurrences of the search sentences included in the search sentence set.
  • when selecting multiple target search sentence sets from the search sentence database, the processor 401 may specifically perform the following steps: determine the application field information of the intent recognition model to be trained; determine, according to the application field information, a target sub-database from multiple sub-databases included in the search sentence database; and select the multiple target search sentence sets from the target sub-database.
  • each sub-database has a one-to-one correspondence with an application field, each sub-database includes multiple search sentence sets under the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information.
  • the processor 401 may also perform the following steps: calculate the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples; determine whether the absolute value exceeds a preset third number threshold; and if the absolute value exceeds the third number threshold, process the positive samples and/or the negative samples according to the preset sample balance rule to obtain processed positive samples and negative samples;
  • the processor 401 may specifically perform the following step: train using the processed positive samples and negative samples to obtain the intention recognition model.
  • the processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the user interface 403 may include an input device and an output device.
  • the input device may include a touch panel, a microphone, etc.
  • the output device may include a display (LCD, etc.), a speaker, etc.
  • the communication interface 404 may include a receiver and a transmitter for communicating with other devices.
  • the memory 402 may include a read-only memory and a random access memory, and provides instructions and data to the processor 401. A part of the memory 402 may also include a non-volatile random access memory. For example, the memory 402 may also store the aforementioned multiple search sentence sets, search event information associated with each search sentence set, and so on.
  • the processor 401 and other components described in the embodiment of the present application can execute the implementations described in the method embodiments shown in FIG. 1 to FIG. 2, and can also execute the implementations of the units described in FIG. 3, which will not be repeated here.
  • the embodiment of the present application also provides a computer non-volatile readable storage medium storing a computer program. When executed by a processor, the computer program can implement part or all of the steps of the intention recognition method described in the embodiments corresponding to FIG. 1 to FIG. 2, and can also implement the functions of the intention recognition device in the embodiment shown in FIG. 3 or FIG. 4 of this application, which will not be repeated here.
  • the embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute part or all of the steps in the above method.
  • the computer non-volatile readable storage medium may be an internal storage unit of the intention recognition device described in any of the foregoing embodiments, such as the hard disk or memory of the intention recognition device.
  • the computer non-volatile readable storage medium may also be an external storage device of the intention recognition device, for example, a plug-in hard disk equipped on the intention recognition device, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the term "and/or" only describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone.
  • the character "/" in this text generally indicates that the associated objects before and after it are in an "or" relationship.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Abstract

An intention recognition method, a device and a computer-readable storage medium, applicable to the technical field of artificial intelligence. The method comprises: receiving a target search statement input by a user; performing word segmentation on the target search statement to obtain a word segmentation result of the target search statement; inputting the word segmentation result of the target search statement into a preset intention recognition model to obtain an intention recognition result corresponding to the target search statement, the intention recognition result being used to indicate whether the target search statement has a question-answer property; and, if the intention recognition result indicates that the target search statement has a question-answer property, outputting a search result that includes question-answer search result items corresponding to the target search statement. The method helps improve intention recognition accuracy.

Description

Intention recognition method, device and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on July 18, 2019, with application number 201910653241.9 and entitled "Intention Recognition Method, Device and Computer-Readable Storage Medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to an intention recognition method, device, and computer-readable storage medium.
Background
Currently, a search engine can recognize the intention of a search sentence input by a user and provide search results based on the recognized intention. Search sentences generally include those with question-and-answer intention and those without. If a search sentence is recognized as having question-and-answer intention, its search results can provide multiple pieces of question-and-answer data for the user to view, so as to solve the user's problem as soon as possible and improve the user experience. At present, whether a search sentence has question-and-answer intention is generally judged by whether it includes a question word: if it does, the search sentence is determined to have question-and-answer intention; otherwise, it is determined not to. In practice, however, some search sentences with question-and-answer intention do not include question words, which makes this question-word-based approach unreliable and the accuracy of intention recognition poor.
Summary of the invention
The embodiments of the present application provide an intention recognition method, device, and computer-readable storage medium, in which an intention recognition model trained on the search event information associated with search sentence sets is used for question-and-answer intention recognition, which helps improve the accuracy of intention recognition.
In a first aspect, an embodiment of the present application provides an intention recognition method, including:
receiving a target search sentence input by a user;
performing word segmentation processing on the target search sentence to obtain a word segmentation result of the target search sentence, the word segmentation result including a plurality of segmented words constituting the target search sentence;
inputting the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, wherein the intention recognition model is trained on multiple target search sentence sets and the search event information associated with each of the target search sentence sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result is used to indicate whether the target search sentence has a question-and-answer attribute;
if the intention recognition result indicates that the target search sentence has a question-and-answer attribute, outputting a search result that includes question-and-answer search result items corresponding to the target search sentence.
In a second aspect, an embodiment of the present application provides an intention recognition device, which includes units for executing the method of the first aspect.
In a third aspect, an embodiment of the present application provides another intention recognition device, including a processor and a memory that are connected to each other, wherein the memory is used to store a computer program that supports the intention recognition device in executing the above method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect. Optionally, the intention recognition device may further include a user interface and/or a communication interface.
In a fourth aspect, an embodiment of the present application provides a computer non-volatile readable storage medium that stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
In the embodiments of the present application, an intention recognition model can be trained on the search event information associated with search sentence sets and used for question-and-answer intention recognition, which improves the accuracy of intention recognition and makes question-and-answer intention recognition more reliable.
Description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are introduced below.
FIG. 1 is a schematic flowchart of an intention recognition method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another intention recognition method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an intention recognition device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another intention recognition device provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The technical solution of the present application can be applied to an intention recognition device, which may include a server, a terminal, a robot, or another recognition device, and which is used for training an intention recognition model, recognizing the intention of a user's search sentence, and so on. The terminal involved in this application may be a mobile phone, computer, tablet, personal computer, smart watch, etc., which is not limited in this application.
Specifically, in this application a target search sentence whose intention is to be recognized is input into an intention recognition model trained on multiple search sentence sets and their associated search event information, so as to obtain the intention recognition result of the target search sentence and determine whether it has a question-and-answer attribute; when it does, question-and-answer search results are output. Because the intention recognition model used for question-and-answer intention recognition is trained on the search event information associated with the search sentence sets, the accuracy of intention recognition is improved and question-and-answer intention recognition is more reliable. Detailed descriptions are given below.
Please refer to FIG. 1, which is a schematic flowchart of an intention recognition method provided by an embodiment of the present application. Specifically, the technical solution of this embodiment can be applied to the intention recognition device described above. As shown in FIG. 1, the intention recognition method may include the following steps:
101. Receive a target search sentence input by a user.
The target search sentence is the search sentence whose intention is to be recognized. It can be understood that, in other embodiments, the target search sentence may also be obtained in other ways, for example from a search queue; it may be input as text or by voice, and so on. This application does not limit how the target search sentence is obtained or input.
102. Perform word segmentation processing on the target search sentence to obtain a word segmentation result of the target search sentence.
The word segmentation result of the target search sentence may include multiple segmented words (also called words, terms, entries, and so on) that make up the target search sentence. Optionally, the multiple segmented words may be all segmented words of the target search sentence, or only some of them, for example the segmented words that remain after meaningless ones (such as stop words) are removed. For example, a filter list may be preset that includes various stop words or other meaningless words, such as "啊" (ah), "哦" (oh), and "的" (a structural particle). After the target search sentence is segmented, stop words and other meaningless words in it can be identified by matching against the filter list and removed, which reduces the overhead of subsequently determining whether the search sentence has a question-and-answer attribute. Other options are not listed one by one here.
Optionally, the word segmentation method used may be Jieba word segmentation, the Stanford word segmentation method, or another word segmentation method, which is not limited in this application.
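The filter-list step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the sentence has already been segmented by some segmenter, and the names `FILTER_LIST` and `filter_segments` are hypothetical.

```python
# Hypothetical filter list; the description names "啊", "哦" and "的" as examples.
FILTER_LIST = {"啊", "哦", "的"}


def filter_segments(segments, filter_list=FILTER_LIST):
    """Return the segmented words that are not in the filter list, in order."""
    return [word for word in segments if word not in filter_list]


# Assumed output of a word segmenter for an example query.
segments = ["感冒", "的", "症状", "啊"]
kept = filter_segments(segments)
print(kept)  # ['感冒', '症状']
```

Using a set for the filter list keeps each membership check constant-time, which matters when the list holds many stop words.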
103. Input the word segmentation result of the target search sentence into a preset intention recognition model to obtain the intention recognition result corresponding to the target search sentence. The intention recognition model can be used to identify whether a search sentence has a question-and-answer attribute. The intention recognition model may be trained on multiple target search sentence sets and the search event information associated with each of the target search sentence sets. Each target search sentence set may include at least one search sentence, the search event information may include the search order of each of the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result may be used to indicate whether the target search sentence has a question-and-answer attribute.
Before the word segmentation result of the target search sentence is input into the preset intention recognition model, the intention recognition model can be obtained by training in advance. Specifically, multiple target search sentence sets and their associated search event information can be obtained, for example from a preset search sentence database. The intention corresponding to each target search sentence set, for example whether the search sentences it includes have a question-and-answer attribute, is then determined from the search order and/or the search result click information of each search sentence in that set. The intention recognition model is then trained on the search sentences included in each target search sentence set together with the determination of whether they have a question-and-answer attribute.
The search order indicates the order in which the search sentences were searched, and the search result click information indicates information about the search result items clicked by the user. Optionally, the search order may be text information or identification information (such as 1, 2, 3...), or the search event information may also include the search time of each search sentence, in which case the search order of each search sentence is indicated by its search time, and so on, which is not limited in this application. The search result click information may include the total number of clicked search result items, the number of clicked question-and-answer search result items, the number of clicked non-question-and-answer search result items (a label may, for example, indicate whether an item is of the question-and-answer type), and/or the browsing duration of each clicked search result item, and so on.
Optionally, the multiple target search sentence sets may be the search sentence sets whose number of occurrences in the search sentence database is greater than a first number threshold; or the search sentence sets whose proportion in the search sentence database is greater than a preset proportion value; or, where the search sentence database also records the search time of each search sentence, the search sentence sets within a historical time window, such as the previous month; or they may be determined according to the application field of the intention recognition model to be trained; or they may be selected by combining any two or more of the above selection methods, and so on, which are not listed one by one here. This helps improve the reliability of the selected model training data.
Determining whether the search sentences included in each target search sentence set have a question-and-answer attribute may also be referred to as determining whether each target search sentence set has a question-and-answer attribute. Optionally, this determination for a target search sentence set may be based on the search result click information of only some of the search sentences in the associated search event information, for example the M search sentences ranked first in search order in the set (that is, the M search sentences with the most recent/latest search time), where M is an integer greater than or equal to 1; or on the search result click information of all search sentences in the set; or on the weighting coefficient of each search sentence in the set together with its search result click information, and so on, which are not listed one by one here.
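One way to picture the labelling step is the sketch below. The text above only says that the determination uses the click information of the M most recent search sentences; the concrete rule used here (share of question-and-answer clicks at or above a threshold) is an assumption made for illustration, as are the names `SearchEvent`, `has_qa_attribute`, and `ratio_threshold`.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    sentence: str
    order: int         # search order in the set; larger means more recent (assumption)
    qa_clicks: int     # clicks on question-and-answer result items
    total_clicks: int  # clicks on all result items


def has_qa_attribute(events, m=1, ratio_threshold=0.5):
    """Label a search sentence set from the M most recent search sentences.

    Assumed rule: the set has the question-and-answer attribute when the
    share of question-and-answer clicks among those sentences reaches
    ratio_threshold.
    """
    recent = sorted(events, key=lambda e: e.order, reverse=True)[:m]
    total = sum(e.total_clicks for e in recent)
    if total == 0:
        return False  # no clicks, nothing to infer
    qa = sum(e.qa_clicks for e in recent)
    return qa / total >= ratio_threshold


events = [
    SearchEvent("first query", order=1, qa_clicks=0, total_clicks=3),
    SearchEvent("rephrased query", order=2, qa_clicks=2, total_clicks=3),
]
label_recent = has_qa_attribute(events, m=1)  # uses only the rephrased query
label_all = has_qa_attribute(events, m=2)     # uses both queries
```

The resulting labels, paired with the search sentences of each set, would serve as training examples for the intention recognition model.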
104. If the intention recognition result indicates that the target search sentence has a question-and-answer attribute, output a search result that includes question-and-answer search result items corresponding to the target search sentence.
After it is determined that the target search sentence has a question-and-answer attribute, the question-and-answer data, that is, the question-and-answer search result items corresponding to the target search sentence, can be obtained and displayed to the user. For example, the question-and-answer search result items may be displayed at the front of the output interface, ordered by generation time or by relevance to the target search sentence, with the non-question-and-answer search result items displayed after all question-and-answer search result items. As another example, only some of the question-and-answer search result items may be selected, such as the N most recently generated items or the M items most relevant to the target search sentence, and displayed on the output interface, followed by the non-question-and-answer search result items corresponding to the target search sentence (for example, again the E most recently generated items or the F items most relevant to the target search sentence), where N, M, E, and F are all integers greater than 0. As yet another example, the output interface may display only the question-and-answer search result items corresponding to the target search sentence, and so on, which are not listed one by one here.
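The display options above can be sketched as a small ordering function. This is only an illustration under assumptions: result items are modeled as dictionaries with hypothetical keys `title`, `is_qa`, and `relevance`, ordering is by relevance only, and the optional limits `max_qa` and `max_other` stand in for the counts described in the text.

```python
def arrange_results(items, max_qa=None, max_other=None):
    """Place question-and-answer items first, each group sorted by relevance.

    max_qa / max_other optionally cap how many items of each kind are shown.
    """
    def by_relevance(item):
        return -item["relevance"]

    qa = sorted((it for it in items if it["is_qa"]), key=by_relevance)
    other = sorted((it for it in items if not it["is_qa"]), key=by_relevance)
    if max_qa is not None:
        qa = qa[:max_qa]
    if max_other is not None:
        other = other[:max_other]
    return qa + other


items = [
    {"title": "news page", "is_qa": False, "relevance": 0.9},
    {"title": "answer B", "is_qa": True, "relevance": 0.6},
    {"title": "answer A", "is_qa": True, "relevance": 0.8},
    {"title": "product page", "is_qa": False, "relevance": 0.4},
]
titles = [it["title"] for it in arrange_results(items, max_qa=2, max_other=1)]
print(titles)  # ['answer A', 'answer B', 'news page']
```

Setting `max_other=0` corresponds to the variant in which only question-and-answer items are displayed.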
In this embodiment, when the intention recognition device obtains the target search sentence input by the user, it performs word segmentation processing on the target search sentence to obtain a word segmentation result, and inputs the word segmentation result into the intention recognition model trained on multiple search sentence sets and their associated search event information to determine whether the target search sentence has a question-and-answer attribute. When the target search sentence has a question-and-answer attribute, a search result that includes question-and-answer search result items is output. Because the intention recognition model used for question-and-answer intention recognition is trained on the search event information associated with the search sentence sets, the accuracy of intention recognition is improved and question-and-answer intention recognition is more reliable.
Please refer to FIG. 2, which is a schematic flowchart of another intention recognition method provided by an embodiment of the present application. Specifically, as shown in FIG. 2, the intention recognition method may include the following steps:
201. Select multiple target search sentence sets from a search sentence database, where the search sentence database records multiple search sentence sets and the search event information associated with each search sentence set.
Each search sentence set includes one or more search sentences, that is, at least one search sentence, and the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence, which will not be repeated here.
Optionally, if a search sentence set includes multiple search sentences, the search time interval between any two of them does not exceed a preset time threshold, and the overlap rate of keywords (such as the segmented words that remain after meaningless words are removed) between any two of them is higher than a preset overlap rate threshold. That is to say, the search sentences in a search sentence set are similar search sentences issued within a preset time range (for example, within 2 minutes of the first search), where similar means that the keyword overlap rate is higher than the preset overlap rate threshold and the keywords are, for example, the segmented words of a search sentence with modal particles and stop words removed. For example, suppose the preset time threshold is 2 minutes and the overlap rate threshold is 70%. Two search sentences are searched 30 s apart, which does not exceed the preset time threshold. They have 5 and 6 keywords respectively, 4 of which overlap (are identical), and every keyword has the same weight, with no weighting coefficient applied. The overlap rate is then 4/5 = 80%, where the denominator is the smaller of the two keyword counts (in other embodiments, the larger keyword count or the average of the two may be used instead, and so on, which are not listed one by one here). This is greater than the overlap rate threshold, so the two search sentences can be put into the same search sentence set. The reason is that a user who is not satisfied with the first search results may change the sentence pattern or structure of the search sentence and search again. Further optionally, weighting coefficients can be set in advance for preset keywords (such as domain-specific words or words that occur frequently), and the weighting coefficients of different preset keywords may be the same or different. When similar search sentences are identified by keyword in order to determine a search sentence set, each search sentence can be checked for a preset keyword. If a preset keyword is present, it is weighted according to its weighting coefficient; in other words, the keyword overlap rate is weighted, that is, increased, before the similarity judgment is made, and the search sentence set is then determined from the search time and the weighted overlap rate. This helps improve the reliability with which search sentence sets are determined.
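The worked example above (5 and 6 keywords, 4 shared, searched 30 s apart) can be checked with a short sketch. It covers only the unweighted case with the smaller keyword count as the denominator; the function names and default thresholds are illustrative assumptions, and keyword weighting coefficients are not modeled.

```python
def overlap_rate(keywords_a, keywords_b):
    """Shared keywords divided by the smaller of the two keyword counts."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def belong_to_same_set(time_a_s, keywords_a, time_b_s, keywords_b,
                       max_gap_s=120, min_overlap=0.7):
    """True when the time gap does not exceed the threshold and the
    keyword overlap rate is higher than the overlap threshold."""
    within_time = abs(time_a_s - time_b_s) <= max_gap_s
    return within_time and overlap_rate(keywords_a, keywords_b) > min_overlap


# Placeholder keywords: 5 in the first sentence, 6 in the second, 4 shared.
kw_first = ["k1", "k2", "k3", "k4", "k5"]
kw_second = ["k1", "k2", "k3", "k4", "x1", "x2"]

rate = overlap_rate(kw_first, kw_second)                   # 4 / 5
grouped = belong_to_same_set(0, kw_first, 30, kw_second)   # 30 s apart
print(rate, grouped)  # 0.8 True
```

Swapping `min` for `max` or for an average in `overlap_rate` gives the alternative denominators mentioned in the text.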
Further optionally, the multiple target search sentence sets may be the search sentence sets whose number of occurrences in the search sentence database is greater than a first number threshold; or the search sentence sets whose proportion in the search sentence database is greater than a preset proportion value; or, where the search sentence database also records the search time of each search sentence, the search sentence sets within a historical time window, such as the previous month; or they may be determined according to the application field of the intention recognition model to be trained; or they may be selected by combining any two or more of the above selection methods, and so on, which are not listed one by one here. This helps improve the reliability of the selected model training data.
For example, in a possible implementation, when selecting the multiple target search sentence sets, the intention recognition device may determine, from the search sentence database, the search sentence sets whose number of occurrences is greater than a preset second number threshold and use them as the multiple target search sentence sets. Alternatively, it may determine the search sentence sets for which a second ratio, namely the ratio of the number of occurrences to the total number of search sentences in the search sentence database, is greater than a preset second proportion threshold, and use them as the multiple target search sentence sets. Alternatively, it may determine the search sentence sets whose number of occurrences is greater than the preset second number threshold and whose second ratio is greater than the preset second proportion threshold, and use them as the multiple target search sentence sets, and so on, which are not listed one by one here. The number of occurrences of a search sentence set may be the sum of the numbers of occurrences of the search sentences it includes, or their average, or the highest number of occurrences among them, and so on, which are not listed one by one here. The number of occurrences of a search sentence may refer to the number of times that search sentence appears in the search sentence database, or to the number of search sentences in the search sentence database whose similarity to that search sentence is higher than a threshold, and so on, which is not limited in this application.
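The count-based and ratio-based selection rules above can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes the number of occurrences of each search sentence set has already been computed by one of the aggregation options described (sum, average, or maximum) and that the total number of search sentences in the database is known.

```python
def select_target_sets(set_counts, total_sentences,
                       count_threshold=None, ratio_threshold=None):
    """Select search sentence sets by occurrence count and/or ratio.

    set_counts maps a set identifier to its number of occurrences.
    A threshold left as None is not applied.
    """
    selected = []
    for set_id, count in set_counts.items():
        if count_threshold is not None and count <= count_threshold:
            continue
        if ratio_threshold is not None:
            if total_sentences <= 0 or count / total_sentences <= ratio_threshold:
                continue
        selected.append(set_id)
    return selected


set_counts = {"set_a": 50, "set_b": 5, "set_c": 45}
by_count = select_target_sets(set_counts, 100, count_threshold=10)
by_ratio = select_target_sets(set_counts, 100, ratio_threshold=0.46)
by_both = select_target_sets(set_counts, 100, count_threshold=10,
                             ratio_threshold=0.46)
print(by_count, by_ratio, by_both)
# ['set_a', 'set_c'] ['set_a'] ['set_a']
```

Both comparisons are strict, matching the "greater than" wording of the thresholds in the text.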
As another example, in one possible implementation, when selecting the multiple target search sentence sets, the intention recognition device may determine application field information of the intention recognition model to be trained, determine a target sub-database from the multiple sub-databases included in the search sentence database according to that application field information, and then select the multiple target search sentence sets from the target sub-database. The sub-databases correspond one-to-one to application fields. Each sub-database includes multiple search sentence sets in the corresponding application field (more than the number of target search sentence sets to be selected) and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information. In other words, the search sentence database may include a sub-database for each application field, and each sub-database includes the search sentence sets in one application field together with the search order and search result click information of each search sentence in each set. When selecting target search sentence sets, the device can therefore determine the application field information (such as a field label) of the intention recognition model to be trained, identify the matching sub-database (such as the sub-database carrying that field label), and select the target search sentence sets from it. This further improves the reliability of the selected model training data and thus the training effect.
202. Perform word segmentation on the search sentences included in each of the multiple target search sentence sets, to obtain a word segmentation result for each target search sentence set.
The word segmentation result of each target search sentence set includes multiple segmented words that make up the search sentences of that set. The multiple segmented words may be all the segmented words of the search sentences in the set, or only some of them, for example the words remaining after meaningless words (such as stop words) are removed. For instance, a filter list may be preset that includes various stop words or other meaningless words, such as "啊" ("ah"), "哦" ("oh"), and "的" (a structural particle). After the search sentences of the target search sentence set are segmented, the stop words and other meaningless words in them can be identified by matching against the filter list and then removed, which reduces the cost of subsequently determining whether the search sentences have the question-and-answer attribute. Alternatively, the multiple segmented words may be the segmented words (all or some of them) of the search sentence with the largest search order, that is, the most recent search, in the target search sentence set; details are not repeated here.
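A minimal sketch of the filter-list step, assuming the search sentences have already been segmented into words by some segmenter (the source does not name one). `FILTER_LIST`, `filter_tokens`, and `segmentation_result` are hypothetical names used only for illustration.

```python
# Hypothetical filter list containing the stop words given as examples in the text.
FILTER_LIST = {"啊", "哦", "的"}


def filter_tokens(tokens, filter_list=FILTER_LIST):
    """Drop segmented words that match the filter list, preserving order."""
    return [t for t in tokens if t not in filter_list]


def segmentation_result(segmented_sentences, latest_only=False):
    """Build the word segmentation result of one search sentence set.

    segmented_sentences: list of token lists, assumed ordered by search order
    (last item = most recent search). With latest_only=True, only the most
    recent sentence contributes, as in the alternative described in the text.
    """
    chosen = segmented_sentences[-1:] if latest_only else segmented_sentences
    result = []
    for tokens in chosen:
        result.extend(filter_tokens(tokens))
    return result
```

Using a set for the filter list keeps each membership check constant-time, which matters when the list is long.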
203. According to the search event information associated with each target search sentence set, determine whether the search sentences included in that target search sentence set have the question-and-answer attribute.
Optionally, the search result click information may include the total number of clicks on search result items and the number of clicks on question-and-answer search result items. When determining whether a search sentence has the question-and-answer attribute according to its search result click information, the total number of clicks on search result items may be compared with a preset first number threshold, and a first ratio, namely the ratio of the number of clicks on question-and-answer search result items to the total number of clicks on search result items, may be calculated and compared with a preset first ratio threshold. If the total number of clicks is greater than the first number threshold and the first ratio is greater than the first ratio threshold, the search sentence can be determined to have the question-and-answer attribute; otherwise, it can be determined not to have the attribute (or the determination can be refined in combination with other methods).
Alternatively, the search result click information may include the total number of clicks on search result items, the number of clicks on question-and-answer search result items, and the browsing duration of each clicked search result item. In this case, search result items whose browsing duration is less than a preset duration threshold may be filtered out first. The device then determines the total number of clicks on the remaining search result items (that is, the total number of clicks minus the number of search result items whose browsing duration is less than the duration threshold), the number of clicks on the remaining question-and-answer search result items (that is, the number of clicks on question-and-answer search result items minus those whose browsing duration is less than the duration threshold), and the first ratio between these two remaining quantities. If the remaining total number of clicks is greater than the first number threshold and the first ratio is greater than the first ratio threshold, the search sentence can be determined to have the question-and-answer attribute.
Alternatively, the number of clicks on question-and-answer search result items included in the search result click information may be compared with another preset number threshold; if that number of clicks is greater than the other number threshold, the search sentence can be determined to have the question-and-answer attribute, and so on. The options are not listed exhaustively here.
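The first two determination rules can be sketched as below. This is an illustration under assumptions: the `Click` record, the function name, and the parameter names are hypothetical, and each clicked search result item is modeled as one record.

```python
from dataclasses import dataclass


@dataclass
class Click:
    is_qa: bool                 # whether the clicked item is a question-and-answer result
    dwell_seconds: float = 0.0  # browsing duration of the clicked item


def has_qa_attribute(clicks, min_total, min_ratio, min_dwell=None):
    """Return True if the click information indicates the question-and-answer attribute.

    min_total: first number threshold (on the total number of clicks)
    min_ratio: first ratio threshold (on Q&A clicks / total clicks)
    min_dwell: optional duration threshold; shorter visits are filtered out first
    """
    if min_dwell is not None:
        clicks = [c for c in clicks if c.dwell_seconds >= min_dwell]
    total = len(clicks)
    if total == 0 or total <= min_total:
        return False
    qa_clicks = sum(1 for c in clicks if c.is_qa)
    return qa_clicks / total > min_ratio
```

Filtering before counting means both the numerator and the denominator of the first ratio are computed over the same remaining clicks.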
For example, in one possible implementation, when determining whether the search sentences included in a target search sentence set have the question-and-answer attribute, the search sentence with the largest search order among the at least one search sentence in the set may be determined according to the search order of each search sentence in the associated search event information; whether the search sentences included in the set have the question-and-answer attribute is then determined according to the search result click information of that search sentence. For how to make this determination from the click information, refer to the methods described above; details are not repeated here. If the search sentence with the largest search order has the question-and-answer attribute, the search sentences included in the target search sentence set can be determined to have the attribute. In other words, the determination can be based on the most recent of the associated search events, that is, the search event of the search sentence with the largest search order, using the search results the user clicked in that search. Because the results of the earlier searches may not have been what the user wanted, relying on the later clicks improves the efficiency of the determination while preserving its accuracy.
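A short sketch of the "most recent search decides" rule, assuming each search event is a dictionary with a `search_order` key and a `clicks` list of booleans (True for a clicked question-and-answer item). These field names and the `majority_qa` predicate are hypothetical stand-ins for whichever click-based rule is actually used.

```python
def latest_search_event(events):
    """Return the event with the largest search order (the most recent search)."""
    return max(events, key=lambda e: e["search_order"])


def majority_qa(clicks):
    """Illustrative per-sentence rule: more than half of the clicks are Q&A items."""
    return len(clicks) > 0 and sum(clicks) * 2 > len(clicks)


def set_has_qa_attribute(events, sentence_has_qa=majority_qa):
    """Decide the attribute of the whole set from its most recent search only."""
    return sentence_has_qa(latest_search_event(events)["clicks"])
```

Only one sentence's click information is examined per set, which is where the efficiency gain mentioned in the text comes from.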
As another example, in one possible implementation, whether the search sentences included in a target search sentence set have the question-and-answer attribute may be determined according to the search result click information of all the search sentences in the set, following the methods described above. For instance, the numbers of clicks on question-and-answer search result items in the click information of all the search sentences may be summed, and the sum compared with a preset number threshold; if the sum exceeds the threshold, the search sentences included in the target search sentence set can be determined to have the question-and-answer attribute, and so on. Details are not repeated here.
As another example, in one possible implementation, a weighting coefficient may be preset for each search sentence. For example, a search sentence that includes a question word has a higher weighting coefficient than one that does not; and/or a search sentence with a larger search order has a higher weighting coefficient than one with a smaller search order (that is, the later the search, the higher the coefficient); and/or a search sentence for which results from a specific question-and-answer website appear among the clicked search result items, or among the search results, has a higher weighting coefficient than one for which no such results appear; and so on. When determining whether the search sentences included in a target search sentence set have the question-and-answer attribute, the weighting coefficient of each of the at least one search sentence in the set may be determined, and the determination is then made according to the weighting coefficient of each search sentence and the search result click information in the search event information associated with the set. Here, the weighting coefficients are applied to parameters of the search result click information, such as the number of clicked question-and-answer search result items or the browsing duration; after weighting, the question-and-answer attribute is determined in the manner described above. For example, the number of clicks on question-and-answer search result items of each search sentence may be weighted by that sentence's coefficient (if a search sentence has 2 clicks on question-and-answer search result items and a weighting coefficient of 1.5, its weighted number of clicks is 2*1.5=3). If the total number of clicks on search result items over all search sentences in the target search sentence set is greater than the preset first number threshold, and the first ratio between the sum of the weighted numbers of clicks on question-and-answer search result items and that total number of clicks is greater than the preset first ratio threshold, the search sentences included in the target search sentence set can be determined to have the question-and-answer attribute. In other words, whether a target search sentence set has the question-and-answer attribute can be determined from the search results the user clicked in each search together with the weight assigned to each search. This helps improve the reliability of the question-and-answer attribute determined for the search sentence set.
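The weighted rule can be sketched as follows, using the worked figure from the text (2 clicks with a coefficient of 1.5 count as 3). The dictionary keys and the function name are hypothetical; how the coefficients are assigned is assumed to be decided elsewhere.

```python
def weighted_set_has_qa_attribute(sentences, min_total, min_ratio):
    """Weighted determination over all search sentences of one set.

    sentences: list of dicts with hypothetical keys
        "weight"       - preset weighting coefficient of the search sentence
        "qa_clicks"    - clicks on question-and-answer search result items
        "total_clicks" - clicks on all search result items
    """
    total = sum(s["total_clicks"] for s in sentences)
    if total == 0 or total <= min_total:
        return False
    weighted_qa = sum(s["weight"] * s["qa_clicks"] for s in sentences)
    return weighted_qa / total > min_ratio
```

Note that only the numerator is weighted here, matching the example in the text; with coefficients above 1 the ratio can exceed 1, so the ratio threshold may need to be tuned accordingly.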
Optionally, in this application, the execution order of steps 202 and 203 is not limited. For example, step 203 may be performed before step 202, or steps 202 and 203 may be performed simultaneously; this application does not limit this.
204. Use the word segmentation results of the target search sentence sets that have the question-and-answer attribute as positive samples, and use the word segmentation results of the target search sentence sets that do not have the question-and-answer attribute as negative samples.
Optionally, the word segmentation results of the target search sentence sets that have the question-and-answer attribute may include the segmented words of the search sentences with that attribute; these may be all the segmented words of one or more target search sentence sets, or only some of them. That is, the positive samples may include the segmented words of search sentences that have the question-and-answer attribute, and the negative samples may include the segmented words of search sentences that do not. For example, when the search sentences of a certain target search sentence set are determined to have the question-and-answer attribute, all the segmented words of those search sentences (optionally with meaningless words removed) may be used as a positive sample.
In some embodiments, the segmented words of search sentences that have the question-and-answer attribute may be used as positive training samples, and the segmented words of search sentences that do not have the attribute as negative training samples. An intention recognition model is trained on these positive and negative training samples, so that it can subsequently be used to quickly identify whether an input search sentence has the question-and-answer attribute, and information can be returned to the user according to that recognition result. For example, a search sentence with the question-and-answer attribute indicates a question-and-answer need, and a search sentence without it indicates no such need; different pages (interfaces) with different content can therefore be returned to the user depending on whether the search sentence has a question-and-answer need.
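Step 204 amounts to partitioning the segmentation results by the attribute determined in step 203. A minimal sketch, assuming each target search sentence set is represented as a dictionary with hypothetical keys `tokens` (its word segmentation result) and `has_qa` (the outcome of step 203):

```python
def build_samples(target_sets):
    """Split segmentation results into positive and negative samples."""
    positives, negatives = [], []
    for s in target_sets:
        if s["has_qa"]:
            positives.append(s["tokens"])
        else:
            negatives.append(s["tokens"])
    return positives, negatives
```

Because the labels come from click behavior rather than manual annotation, the resulting training set inherits whatever noise the thresholds in step 203 let through.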
Optionally, for steps 201-204, refer also to the related description of the embodiment shown in FIG. 1; details are not repeated here.
205. Calculate the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples.
206. Determine whether the absolute value exceeds a preset number threshold.
207. If the absolute value exceeds the number threshold, process the positive samples and/or the negative samples according to a preset sample balance rule, to obtain processed positive samples and negative samples.
Optionally, after the positive and negative samples corresponding to the multiple target search sentence sets are determined, it may further be determined whether the numbers of positive and negative samples are balanced, for example by checking whether the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number corresponding to the negative samples exceeds a preset number threshold, such as a preset third number threshold. If it does, the positive and negative samples are imbalanced. In practice the positive and negative samples are often imbalanced when a model is trained, which leads to poor recognition accuracy: the model tends to overfit the class with the larger share of samples, so its predictions are biased toward that class. This greatly reduces the generalization ability of the model and makes its recognition results unreliable. Therefore, before training, the numbers of positive and negative samples may be counted separately, and when the gap between them is too large, for example greater than the preset third number threshold, the numbers may be balanced according to a preset sample balance rule before training. Several sample balance rules may be preset, and one may be chosen according to the numbers of positive and negative samples or according to the training scenario. For example, when there are fewer positive samples, positive samples can be added to balance the two classes; when there are fewer negative samples, negative samples can be added; for scenarios that require a large number of training samples (for example, the scenario label is "many samples"), the classes can be balanced by synthesizing samples; and for scenarios with high reliability requirements (for example, the scenario label is "high reliability"), the classes can be balanced by changing sample weights. The sample balance rules and the scenarios in which each is used may be preset. Optionally, the positive and negative samples may be balanced in the following ways:
1) Upsampling: increase the number of samples in the class with fewer samples by directly copying existing samples. This can be used, for example, when the overall number of samples is small.
2) Downsampling: reduce the number of samples in the class with more samples by discarding the surplus samples. This can be used, for example, when the overall number of samples is large. For instance, the target search sentences may be sorted by total number of clicks, and the samples corresponding to target search sentences with fewer total clicks discarded.
3) Synthesizing samples: increase the number of samples in the class with fewer samples, where synthesis means generating new samples by combining features of existing samples. Specifically, a new sample may be produced by selecting some features at random, or by selecting particular features in some way (such as features whose number of occurrences is higher than a threshold, or features shared by samples whose similarity is higher than a threshold, for example whose Euclidean distance is less than a threshold), and then splicing the selected features into a new sample. Unlike upsampling, which merely copies samples, this produces genuinely new samples by splicing, which can further improve the reliability of model training.
4) Changing sample weights: increase the weight of key segmented words. For example, for positive samples, segmented words with a clear question-and-answer character may be multiplied by a weight, to improve the reliability of the determination.
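Steps 205-207 together with the upsampling and downsampling rules can be sketched as below. This is an illustration under assumptions: the function names and the `rule` and `seed` parameters are hypothetical, each sample is taken to correspond to one search sentence set, and for downsampling the majority list is assumed to be already sorted by descending total clicks so that truncation drops the least-clicked samples. Synthesis and re-weighting are not shown.

```python
import random


def is_imbalanced(n_pos, n_neg, threshold):
    """Steps 205-206: compare |n_pos - n_neg| with the preset number threshold."""
    return abs(n_pos - n_neg) > threshold


def balance(positives, negatives, threshold, rule="upsample", seed=0):
    """Step 207: equalize class sizes by copying or discarding samples."""
    if not is_imbalanced(len(positives), len(negatives), threshold):
        return positives, negatives
    pos_is_minority = len(positives) < len(negatives)
    minority, majority = (positives, negatives) if pos_is_minority else (negatives, positives)
    if rule == "upsample":
        if not minority:
            raise ValueError("cannot upsample an empty class")
        rng = random.Random(seed)
        copies = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        minority = minority + copies
    elif rule == "downsample":
        # Assumes majority is sorted by descending total clicks.
        majority = majority[:len(minority)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return (minority, majority) if pos_is_minority else (majority, minority)
```

Copying with replacement is the simplest form of upsampling; it adds no new information, which is the limitation the synthesis rule in item 3) is meant to address.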
After the positive and negative samples are obtained, the intention recognition model can be trained, so that it can subsequently be used to quickly identify whether an input search sentence has question-and-answer intent.
208. Train the intention recognition model using the processed positive samples and negative samples.
The intention recognition model can be used to identify whether an input search sentence has the question-and-answer attribute. The model may be a binary-tree-based model, a multi-way-tree-based model, a neural network model, and so on; this application does not limit this.
Optionally, after a search sentence (set) with question-and-answer intent is determined, the question-and-answer category of the search sentence may further be determined, for example whether it is an explicit question-and-answer search sentence that includes a question word or an implicit one that does not. The intention recognition model can then be trained on the positive and negative samples described above together with the question-and-answer category (such as a category label) of each positive sample. When the model is later used to identify the question-and-answer attribute of a search sentence, it can therefore not only identify question-and-answer search sentences, that is, search sentences with question-and-answer intent, but also determine the category to which that intent belongs. Further optionally, a correspondence between each question-and-answer category and the content or page to be displayed (for example, keywords or a content title format) may be preset, so that content can be displayed to the user differently according to the question-and-answer category, which makes page display more flexible.
209. Receive a target search sentence input by a user.
210. Perform word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence.
211. Input the word segmentation result of the target search sentence into the preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence.
212. If the intention recognition result indicates that the target search sentence has the question-and-answer attribute, output a search result that includes the question-and-answer search result items corresponding to the target search sentence.
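Steps 209-212 can be sketched as a small pipeline. Everything here is hypothetical scaffolding: `segment`, `model`, and `search` are caller-supplied callables standing in for the word segmenter, the trained intention recognition model, and the search backend, and placing question-and-answer items first is only one possible reading of "output a search result that includes" them.

```python
def handle_query(query, segment, model, search):
    """Segment the query, run intention recognition, and shape the results.

    segment(query) -> list of segmented words
    model(tokens)  -> True if the query has the question-and-answer attribute
    search(query)  -> list of dicts, each with a "type" key ("qa" or other)
    """
    tokens = segment(query)                # step 210
    has_qa = model(tokens)                 # step 211
    results = search(query)
    if not has_qa:
        return results
    # Step 212 (assumed presentation): surface Q&A items ahead of the rest.
    qa_items = [r for r in results if r["type"] == "qa"]
    others = [r for r in results if r["type"] != "qa"]
    return qa_items + others
```

A usage example with stub components: `handle_query("感冒 怎么办", str.split, lambda toks: "怎么办" in toks, my_search)`, where `my_search` is any function returning result dictionaries.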
Optionally, for steps 209-212, refer also to the related description of steps 101-104 in the embodiment shown in FIG. 1; details are not repeated here.
In this embodiment, the intention recognition device performs word segmentation on the search sentences included in the selected multiple target search sentence sets to obtain a word segmentation result for each set, and determines, according to the search event information associated with each set, whether the search sentences included in that set have the question-and-answer attribute. The word segmentation results of search sentences that have the question-and-answer attribute are used as positive samples and those of search sentences that do not are used as negative samples. After the positive and negative samples are balanced according to the preset sample balance rule, the intention recognition model is trained on the balanced samples and used for question-and-answer intent recognition: an obtained target search sentence is input to the model to identify whether it has the question-and-answer attribute, and if it does, a search result including question-and-answer search result items is output. This improves the accuracy of intention recognition and gives question-and-answer intent recognition high reliability and a high recall rate.
The foregoing method embodiments are all examples of the intention recognition method of this application, and the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
Refer to FIG. 3, which is a schematic structural diagram of an intention recognition device provided by an embodiment of this application. The intention recognition device of this embodiment includes units for executing the foregoing intention recognition method. Specifically, the intention recognition device 300 of this embodiment may include an acquiring unit 301 and a processing unit 302, where:
The acquiring unit 301 is configured to receive a target search sentence input by a user;
The processing unit 302 is configured to perform word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence, where the word segmentation result includes a plurality of segmented words that make up the target search sentence;
The processing unit 302 is further configured to input the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, where the intention recognition model is trained on a plurality of target search sentence sets and the search event information associated with each of those sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result indicates whether the target search sentence has a question-and-answer attribute;
The processing unit 302 is further configured to, if the intention recognition result indicates that the target search sentence has the question-and-answer attribute, output a search result that includes the question-and-answer search result items corresponding to the target search sentence.
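The flow handled by the acquiring unit and the processing unit can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the function `handle_search` and its callable parameters (`segment`, `is_qa_intent`, `search`, `search_qa`) are hypothetical names, and returning only the ordinary results when the sentence lacks the question-and-answer attribute is an assumption, since the text above specifies only the positive case.

```python
from typing import Callable, List


def handle_search(
    sentence: str,
    segment: Callable[[str], List[str]],        # hypothetical word segmenter
    is_qa_intent: Callable[[List[str]], bool],  # hypothetical trained intention model
    search: Callable[[str], List[str]],         # hypothetical ordinary search backend
    search_qa: Callable[[str], List[str]],      # hypothetical Q&A search backend
) -> List[str]:
    """Segment the sentence, classify its intention, and build the result list."""
    tokens = segment(sentence)        # word segmentation result
    results = list(search(sentence))  # ordinary search result items
    if is_qa_intent(tokens):
        # The sentence has the question-and-answer attribute:
        # include the Q&A search result items in the output.
        results = list(search_qa(sentence)) + results
    return results
```

In practice `segment` would be a real word segmenter and `is_qa_intent` would wrap the trained model; here they are left as parameters so that the control flow is the only thing the sketch commits to.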
Optionally, the acquiring unit 301 is further configured to select a plurality of target search sentence sets from a search sentence database, where the search sentence database records a plurality of search sentence sets and the search event information associated with each search sentence set, each search sentence set includes at least one search sentence, and the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence;
The processing unit 302 is further configured to perform word segmentation on the search sentences included in each of the plurality of target search sentence sets to obtain a word segmentation result for each target search sentence set, where the word segmentation result of each target search sentence set includes a plurality of segmented words that make up the search sentences of that set;
The processing unit 302 may be further configured to determine, according to the search event information associated with each target search sentence set, whether the search sentences included in that set have the question-and-answer attribute; to use the word segmentation results of the target search sentence sets that have the question-and-answer attribute as positive samples and the word segmentation results of the target search sentence sets that do not have it as negative samples; and to train an intention recognition model using the positive and negative samples corresponding to the plurality of target search sentence sets, where the intention recognition model is used to identify whether an input search sentence has the question-and-answer attribute.
Optionally, the processing unit 302 may be specifically configured to determine, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence with the largest search order among the at least one search sentence in the set; and to determine, according to the search result click information of that search sentence, whether the search sentences included in the target search sentence set have the question-and-answer attribute.
Optionally, the search result click information includes the total number of clicks on search result items and the number of clicks on question-and-answer search result items;
When determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences included in the target search sentence set have the question-and-answer attribute, the processing unit 302 may be specifically configured to: calculate a first ratio between the number of clicks on question-and-answer search result items and the total number of clicks on search result items, both taken from the search result click information of the search sentence with the largest search order; and, if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determine that the search sentences included in the target search sentence set have the question-and-answer attribute.
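The labeling rule described above (take the search sentence with the largest search order, then check its click counts against the two thresholds) can be sketched as follows. The record type `SearchEvent`, the function name, and the default threshold values are illustrative assumptions; the application does not give concrete values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchEvent:
    sentence: str
    order: int         # search order within the set (larger = searched later)
    total_clicks: int  # total clicks on search result items
    qa_clicks: int     # clicks on question-and-answer search result items


def has_qa_attribute(
    events: List[SearchEvent],
    first_number_threshold: int = 10,    # assumed value
    first_ratio_threshold: float = 0.5,  # assumed value
) -> bool:
    """Label a search sentence set from the clicks on its last search sentence."""
    last = max(events, key=lambda e: e.order)  # largest search order
    if last.total_clicks == 0 or last.total_clicks <= first_number_threshold:
        return False
    first_ratio = last.qa_clicks / last.total_clicks
    return first_ratio > first_ratio_threshold
```

The explicit zero check only guards the division when a non-positive threshold is configured.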
Optionally, the processing unit 302 may be specifically configured to determine a weighting coefficient for each of the at least one search sentence included in the target search sentence set, where a search sentence with a larger search order has a higher weighting coefficient than a search sentence with a smaller search order; and to determine, according to the weighting coefficient of each search sentence and the search result click information in the search event information associated with the target search sentence set, whether the search sentences included in the set have the question-and-answer attribute.
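One possible reading of the weighted variant is sketched below. The application states only that later search sentences receive higher weighting coefficients; making the coefficient proportional to the search order, combining the per-sentence question-and-answer click ratios as a weighted sum, and comparing against a threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Each event is (search_order, total_clicks, qa_clicks); orders start at 1.
Event = Tuple[int, int, int]


def weighted_qa_ratio(events: List[Event]) -> float:
    """Weighted sum of per-sentence Q&A click ratios.

    Assumption: the weighting coefficient of a sentence is its search order
    divided by the sum of all search orders, so later sentences weigh more.
    """
    order_sum = sum(order for order, _, _ in events)
    score = 0.0
    for order, total_clicks, qa_clicks in events:
        if total_clicks > 0:  # sentences without clicks contribute nothing
            score += (order / order_sum) * (qa_clicks / total_clicks)
    return score


def has_qa_attribute_weighted(events: List[Event], threshold: float = 0.5) -> bool:
    # threshold is an assumed value, not taken from the application
    return weighted_qa_ratio(events) > threshold
```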
Optionally, the acquiring unit 301 may be specifically configured to: determine, from the search sentence database, the search sentence sets whose number of occurrences is greater than a preset second number threshold, and use those sets as the plurality of target search sentence sets; or determine, from the search sentence database, the search sentence sets for which a second ratio between the number of occurrences and the total number of search sentences in the search sentence database is greater than a preset second ratio threshold, and use those sets as the plurality of target search sentence sets.
Here, the number of occurrences of a search sentence set is either the sum or the average of the numbers of occurrences of the search sentences included in that set.
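The two selection criteria above can be sketched as a single filter. The data layout (a dict from a set identifier to the occurrence counts of its search sentences) and all names are assumptions; in particular, the "total number of search sentences in the database" is interpreted here as the sum of all occurrence counts, which the text does not spell out.

```python
from statistics import mean
from typing import Dict, List, Optional


def set_occurrences(counts: List[int], mode: str = "sum") -> float:
    """Occurrences of a set: sum or average of its sentences' occurrences."""
    return sum(counts) if mode == "sum" else mean(counts)


def select_target_sets(
    database: Dict[str, List[int]],
    second_number_threshold: Optional[float] = None,
    second_ratio_threshold: Optional[float] = None,
    mode: str = "sum",
) -> List[str]:
    """Return ids of sets that pass the count threshold or the ratio threshold."""
    # Assumed interpretation of "total number of search sentences".
    total = sum(sum(counts) for counts in database.values())
    selected = []
    for set_id, counts in database.items():
        occurrences = set_occurrences(counts, mode)
        if second_number_threshold is not None:
            keep = occurrences > second_number_threshold
        else:
            keep = total > 0 and occurrences / total > second_ratio_threshold
        if keep:
            selected.append(set_id)
    return selected
```

Exactly one of the two thresholds is expected to be supplied, mirroring the "or" between the two alternatives in the text.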
Optionally, the acquiring unit 301 may be specifically configured to: determine application field information of the intention recognition model to be trained; determine a target sub-database from the plurality of sub-databases included in the search sentence database according to the application field information; and select the plurality of target search sentence sets from the target sub-database.
Here, the sub-databases correspond one-to-one to application fields, each sub-database includes a plurality of search sentence sets in the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information.
Optionally, the processing unit 302 may be further configured to, before the intention recognition model is trained using the positive and negative samples corresponding to the plurality of target search sentence sets: calculate the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples; determine whether the absolute value exceeds a preset third number threshold; and, if the absolute value exceeds the third number threshold, process the positive samples and/or the negative samples according to a preset sample balancing rule to obtain processed positive and negative samples;
The processing unit 302 may be specifically configured to train the intention recognition model using the processed positive and negative samples.
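The balancing check can be sketched as follows. This passage does not say what the preset sample balancing rule is; random undersampling of the larger class is used here purely as an assumed example of such a rule, and the function name and default threshold are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balance_samples(
    positives: Sequence,
    negatives: Sequence,
    third_number_threshold: int = 100,  # assumed value
    seed: int = 0,
) -> Tuple[List, List]:
    """Undersample the larger class when the class sizes differ too much."""
    pos, neg = list(positives), list(negatives)
    gap = abs(len(pos) - len(neg))
    if gap <= third_number_threshold:
        return pos, neg  # already balanced enough, leave untouched
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    target = min(len(pos), len(neg))
    if len(pos) > target:
        pos = rng.sample(pos, target)
    else:
        neg = rng.sample(neg, target)
    return pos, neg
```

Oversampling the smaller class would be an equally valid instance of a balancing rule; undersampling is shown only because it needs no synthetic data.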
Specifically, the intention recognition device may implement, through the foregoing units, some or all of the steps of the intention recognition method in the embodiments shown in FIG. 1 and FIG. 2. It should be understood that this embodiment is the apparatus embodiment corresponding to the method embodiments; the description of the method embodiments also applies to this embodiment and is not repeated here.
Refer to FIG. 4, which is a schematic structural diagram of another intention recognition device provided by an embodiment of this application. The intention recognition device is used to execute the foregoing method. As shown in FIG. 4, the intention recognition device 400 in this embodiment may include one or more processors 401 and a memory 402. Optionally, the intention recognition device may further include one or more user interfaces 403 and/or one or more communication interfaces 404. The processor 401, the user interface 403, the communication interface 404, and the memory 402 may be connected through a bus 405 or in other ways; FIG. 4 uses a bus connection as an example. The memory 402 is used to store a computer program that includes program instructions, and the processor 401 is used to execute the program instructions stored in the memory 402.
The processor 401 may be used to call the program instructions to perform the following steps: calling the user interface 403 to receive a target search sentence input by a user; performing word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence, where the word segmentation result includes a plurality of segmented words that make up the target search sentence; inputting the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, where the intention recognition model is trained on a plurality of target search sentence sets and the search event information associated with each of those sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result indicates whether the target search sentence has a question-and-answer attribute; and, if the intention recognition result indicates that the target search sentence has the question-and-answer attribute, calling the user interface 403 to output a search result that includes the question-and-answer search result items corresponding to the target search sentence.
Optionally, before inputting the word segmentation result of the target search sentence into the preset intention recognition model to obtain the intention recognition result corresponding to the target search sentence, the processor 401 may further perform the following steps: selecting a plurality of target search sentence sets from a search sentence database, where the search sentence database records a plurality of search sentence sets and the search event information associated with each search sentence set, each search sentence set includes at least one search sentence, and the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence; performing word segmentation on the search sentences included in each of the plurality of target search sentence sets to obtain a word segmentation result for each target search sentence set, where the word segmentation result of each target search sentence set includes a plurality of segmented words that make up the search sentences of that set; determining, according to the search event information associated with each target search sentence set, whether the search sentences included in that set have the question-and-answer attribute; and using the word segmentation results of the search sentences that have the question-and-answer attribute in the plurality of target search sentence sets as positive samples and the word segmentation results of the search sentences that do not have it as negative samples, and training an intention recognition model using the positive and negative samples corresponding to the plurality of target search sentence sets, where the intention recognition model is used to identify whether an input search sentence has the question-and-answer attribute.
Optionally, when determining, according to the search event information associated with the target search sentence set, whether the search sentences included in the set have the question-and-answer attribute, the processor 401 may specifically perform the following steps: determining, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence with the largest search order among the at least one search sentence in the set; and determining, according to the search result click information of that search sentence, whether the search sentences included in the target search sentence set have the question-and-answer attribute.
Optionally, when determining, according to the search event information associated with the target search sentence set, whether the search sentences included in the set have the question-and-answer attribute, the processor 401 may specifically perform the following steps: determining a weighting coefficient for each of the at least one search sentence included in the target search sentence set; and determining, according to the weighting coefficient of each search sentence and the search result click information in the search event information associated with the target search sentence set, whether the search sentences included in the set have the question-and-answer attribute.
Further optionally, among the at least one search sentence, a search sentence with a larger search order has a higher weighting coefficient than a search sentence with a smaller search order, and/or a search sentence that includes an interrogative word has a higher weighting coefficient than a search sentence that does not, and so on; details are not repeated here.
Optionally, the search result click information includes the total number of clicks on search result items and the number of clicks on question-and-answer search result items. When determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences included in the target search sentence set have the question-and-answer attribute, the processor 401 may specifically perform the following steps: calculating a first ratio between the number of clicks on question-and-answer search result items and the total number of clicks on search result items, both taken from the search result click information of the search sentence with the largest search order; and, if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determining that the search sentences included in the target search sentence set have the question-and-answer attribute.
Optionally, when selecting the plurality of target search sentence sets from the search sentence database, the processor 401 may specifically perform the following steps: determining, from the search sentence database, the search sentence sets whose number of occurrences is greater than a preset second number threshold, and using those sets as the plurality of target search sentence sets; or determining, from the search sentence database, the search sentence sets for which a second ratio between the number of occurrences and the total number of search sentences in the search sentence database is greater than a preset second ratio threshold, and using those sets as the plurality of target search sentence sets;
Here, the number of occurrences of a search sentence set is either the sum or the average of the numbers of occurrences of the search sentences included in that set.
Optionally, when selecting the plurality of target search sentence sets from the search sentence database, the processor 401 may specifically perform the following steps: determining application field information of the intention recognition model to be trained; determining a target sub-database from the plurality of sub-databases included in the search sentence database according to the application field information; and selecting the plurality of target search sentence sets from the target sub-database.
Here, the sub-databases correspond one-to-one to application fields, each sub-database includes a plurality of search sentence sets in the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information.
Optionally, before training the intention recognition model using the positive and negative samples corresponding to the plurality of target search sentence sets, the processor 401 may further perform the following steps: calculating the absolute value of the difference between the number of search sentences corresponding to the positive samples and the number of search sentences corresponding to the negative samples; determining whether the absolute value exceeds a preset third number threshold; and, if the absolute value exceeds the third number threshold, processing the positive samples and/or the negative samples according to a preset sample balancing rule to obtain processed positive and negative samples;
When training the intention recognition model using the positive and negative samples corresponding to the plurality of target search sentence sets, the processor 401 may specifically perform the following step: training the intention recognition model using the processed positive and negative samples.
The processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The user interface 403 may include an input device and an output device. The input device may include a touchpad, a microphone, and the like, and the output device may include a display (such as an LCD), a speaker, and the like.
The communication interface 404 may include a receiver and a transmitter for communicating with other devices.
The memory 402 may include read-only memory and random access memory, and provides instructions and data to the processor 401. Part of the memory 402 may also include non-volatile random access memory. For example, the memory 402 may also store the foregoing search sentence sets, the search event information associated with each search sentence set, and so on.
In a specific implementation, the processor 401 and the other components described in this embodiment may execute the implementations described in the method embodiments shown in FIG. 1 and FIG. 2, and may also execute the implementations of the units described with reference to FIG. 3; details are not repeated here.
An embodiment of this application further provides a computer non-volatile readable storage medium that stores a computer program. When executed by a processor, the computer program can implement some or all of the steps of the intention recognition method described in the embodiments corresponding to FIG. 1 and FIG. 2, and can also implement the functions of the intention recognition device of the embodiment shown in FIG. 3 or FIG. 4; details are not repeated here.
An embodiment of this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to execute some or all of the steps of the foregoing method.
The computer non-volatile readable storage medium may be an internal storage unit of the intention recognition device described in any of the foregoing embodiments, such as a hard disk or memory of the device. It may also be an external storage device of the intention recognition device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device.
In this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it. In the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and the numbering should not limit the implementation of the embodiments of this application in any way.
The foregoing describes only some implementations of this application, but the protection scope of this application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and all such modifications or replacements shall fall within the protection scope of this application.

Claims (20)

  1. An intention recognition method, characterized by comprising:
    receiving a target search sentence input by a user;
    performing word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence, wherein the word segmentation result of the target search sentence includes a plurality of segmented words that make up the target search sentence;
    inputting the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, wherein the intention recognition model is trained on a plurality of target search sentence sets and the search event information associated with each of the plurality of target search sentence sets, each target search sentence set includes at least one search sentence, the search event information includes the search order of each of the at least one search sentence and/or the search result click information of each search sentence, and the intention recognition result indicates whether the target search sentence has a question-and-answer attribute;
    if the intention recognition result indicates that the target search sentence has the question-and-answer attribute, outputting a search result that includes the question-and-answer search result items corresponding to the target search sentence.
  2. The method according to claim 1, wherein, before inputting the word segmentation result of the target search sentence into the preset intention recognition model to obtain the intention recognition result corresponding to the target search sentence, the method further comprises:
    selecting a plurality of target search sentence sets from a search sentence database, wherein the search sentence database records a plurality of search sentence sets and the search event information associated with each search sentence set, each search sentence set comprises at least one search sentence, and the search event information comprises a search order of each of the at least one search sentence and/or search result click information of each search sentence;
    performing word segmentation on the search sentences in each of the plurality of target search sentence sets to obtain a word segmentation result of each target search sentence set, the word segmentation result of each target search sentence set comprising the word segments of the search sentences that make up that target search sentence set;
    determining, according to the search event information associated with each target search sentence set, whether the search sentences in that target search sentence set have the question-and-answer attribute; and
    using the word segmentation results of the target search sentence sets that have the question-and-answer attribute as positive samples and the word segmentation results of the target search sentence sets that do not have the question-and-answer attribute as negative samples, and training on the positive samples and negative samples to obtain the intention recognition model, wherein the intention recognition model is used to identify whether an input search sentence has the question-and-answer attribute.
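The sample-building step of claim 2 could look roughly like the sketch below. The `SentenceSet` type, the `has_qa` field (standing in for the label derived from search event information) and `build_samples` are hypothetical names introduced for illustration; the claim does not prescribe a data layout or a particular classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SentenceSet:
    sentences: List[str]  # search sentences belonging to one set
    has_qa: bool          # label already derived from the search event information


def build_samples(
    sets: List[SentenceSet],
    segment: Callable[[str], List[str]],  # hypothetical word segmenter
) -> Tuple[List[List[str]], List[List[str]]]:
    """Split segmented sentence sets into positive and negative training samples."""
    positives: List[List[str]] = []
    negatives: List[List[str]] = []
    for s in sets:
        # The segmentation result of a set is the word segments of all its sentences.
        tokens = [tok for sentence in s.sentences for tok in segment(sentence)]
        if s.has_qa:
            positives.append(tokens)
        else:
            negatives.append(tokens)
    return positives, negatives
```

The two lists can then be fed to any binary text classifier; which model is used is left open here.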
  3. The method according to claim 2, wherein determining, according to the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute comprises:
    determining, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence with the largest search order among the at least one search sentence in the target search sentence set; and
    determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute.
  4. The method according to claim 3, wherein the search result click information comprises a total number of clicks on search result items and a number of clicks on question-and-answer search result items, and determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute comprises:
    calculating a first ratio of the number of clicks on question-and-answer search result items to the total number of clicks on search result items, both taken from the search result click information of the search sentence with the largest search order; and
    if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determining that the search sentences in the target search sentence set have the question-and-answer attribute.
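The labelling rule of claims 3 and 4 can be illustrated with a short Python sketch. The `SearchEvent` record and the default threshold values (`min_clicks=5`, `min_ratio=0.5`) are assumptions chosen for the example; the claims only require "preset" thresholds without giving values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchEvent:
    sentence: str
    order: int         # search order within the set (1 = searched first)
    total_clicks: int  # total clicks on search result items
    qa_clicks: int     # clicks on question-and-answer result items


def has_qa_attribute(
    events: List[SearchEvent],
    min_clicks: int = 5,     # assumed value for the first number threshold
    min_ratio: float = 0.5,  # assumed value for the first ratio threshold
) -> bool:
    """Label a sentence set using only the sentence with the largest search order."""
    last = max(events, key=lambda e: e.order)
    # Guard against division by zero and enforce the click-count threshold.
    if last.total_clicks == 0 or last.total_clicks <= min_clicks:
        return False
    first_ratio = last.qa_clicks / last.total_clicks
    return first_ratio > min_ratio
```

Using the last sentence in the set reflects the idea that the final reformulation of a query is the one the user was satisfied with; both thresholds are strict inequalities, as in the claim.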
  5. The method according to claim 2, wherein determining, according to the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute comprises:
    determining a weighting coefficient for each of the at least one search sentence in the target search sentence set, wherein a search sentence with a larger search order has a higher weighting coefficient than a search sentence with a smaller search order; and
    determining, according to the weighting coefficient of each search sentence and the search result click information of each search sentence in the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute.
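One possible reading of the weighted variant in claim 5 is sketched below. The claim only states that later sentences receive higher weights; the specific weighting (search order divided by the sum of all orders) and the combination rule (a weighted sum of per-sentence Q&A click ratios) are assumptions made for this example, as is the function name `weighted_qa_score`.

```python
from typing import List, Tuple

# Each event is (search_order, total_clicks, qa_clicks); orders start at 1.
Event = Tuple[int, int, int]


def weighted_qa_score(events: List[Event]) -> float:
    """Weighted sum of per-sentence Q&A click ratios, favouring later sentences."""
    order_sum = sum(order for order, _, _ in events)
    score = 0.0
    for order, total_clicks, qa_clicks in events:
        if total_clicks == 0:
            continue  # a sentence without clicks contributes nothing
        weight = order / order_sum  # assumed scheme: larger order, larger weight
        score += weight * (qa_clicks / total_clicks)
    return score
```

The resulting score would then be compared with a preset threshold to decide whether the set has the question-and-answer attribute.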
  6. The method according to claim 2, wherein the search event information comprises search result click information of each of the at least one search sentence, and the search result click information comprises a total number of clicks on search result items, a number of clicks on question-and-answer search result items, and a browsing duration of each clicked search result item;
    determining, according to the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute comprises:
    filtering out search result items whose browsing duration is less than a preset duration threshold, determining the total number of clicks on the search result items remaining after the filtering, determining the number of clicks on the question-and-answer search result items remaining after the filtering, and calculating a first ratio of the number of clicks on the remaining question-and-answer search result items to the total number of clicks on the remaining search result items; and
    if the total number of clicks on the remaining search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determining that the search sentences in the target search sentence set have the question-and-answer attribute.
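The browsing-duration filter of claim 6 can be sketched as follows. The `Click` record and all default threshold values are hypothetical; each record is taken to represent one click on one result item, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Click:
    is_qa: bool           # whether the clicked item is a question-and-answer result
    dwell_seconds: float  # browsing duration of the clicked item


def has_qa_after_dwell_filter(
    clicks: List[Click],
    min_dwell: float = 10.0,  # assumed duration threshold, in seconds
    min_clicks: int = 2,      # assumed first number threshold
    min_ratio: float = 0.5,   # assumed first ratio threshold
) -> bool:
    """Drop short visits, then apply the click-count and ratio thresholds."""
    kept = [c for c in clicks if c.dwell_seconds >= min_dwell]
    total = len(kept)
    if total == 0 or total <= min_clicks:
        return False
    qa = sum(1 for c in kept if c.is_qa)
    return qa / total > min_ratio
```

Filtering by browsing duration removes clicks where the user left the page almost immediately, which are weak evidence that the result was useful.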
  7. The method according to any one of claims 2-6, wherein selecting the plurality of target search sentence sets from the search sentence database comprises:
    determining application field information of the intention recognition model to be trained;
    determining, according to the application field information, a target sub-database from a plurality of sub-databases included in the search sentence database, wherein the sub-databases correspond one-to-one to application fields, each sub-database comprises a plurality of search sentence sets in the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information; and
    selecting the plurality of target search sentence sets from the target sub-database.
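The field-specific selection in claim 7 amounts to a keyed lookup followed by sampling. A minimal sketch, assuming the sub-databases are held in a dictionary keyed by application field and that taking the first `limit` sets is an acceptable selection rule (both assumptions; `select_target_sets` is a hypothetical name):

```python
from typing import Dict, List

# A sentence set is represented here simply as a list of search sentences.
SentenceSet = List[str]


def select_target_sets(
    sub_databases: Dict[str, List[SentenceSet]],  # one sub-database per application field
    field: str,                                   # application field of the model to train
    limit: int,                                   # how many sets to select
) -> List[SentenceSet]:
    """Pick target sentence sets from the sub-database matching the application field."""
    if field not in sub_databases:
        raise KeyError(f"no sub-database for application field {field!r}")
    return sub_databases[field][:limit]
```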
  8. An intention recognition device, comprising an acquisition unit and a processing unit, wherein:
    the acquisition unit is configured to receive a target search sentence input by a user;
    the processing unit is configured to perform word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence, the word segmentation result comprising a plurality of word segments that make up the target search sentence;
    the processing unit is further configured to input the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, wherein the intention recognition model is trained on a plurality of target search sentence sets and the search event information associated with each of the target search sentence sets, each target search sentence set comprises at least one search sentence, the search event information comprises a search order of each of the at least one search sentence and/or search result click information of each search sentence, and the intention recognition result indicates whether the target search sentence has a question-and-answer attribute; and
    the processing unit is further configured to, if the intention recognition result indicates that the target search sentence has the question-and-answer attribute, output search results that include question-and-answer search result items corresponding to the target search sentence.
  9. The device according to claim 8, wherein:
    the acquisition unit is further configured to select a plurality of target search sentence sets from a search sentence database, wherein the search sentence database records a plurality of search sentence sets and the search event information associated with each search sentence set, each search sentence set comprises at least one search sentence, and the search event information comprises a search order of each of the at least one search sentence and/or search result click information of each search sentence;
    the processing unit is further configured to perform word segmentation on the search sentences in each of the plurality of target search sentence sets to obtain a word segmentation result of each target search sentence set, the word segmentation result of each target search sentence set comprising the word segments of the search sentences that make up that target search sentence set; and
    the processing unit is further configured to determine, according to the search event information associated with each target search sentence set, whether the search sentences in that target search sentence set have the question-and-answer attribute; to use the word segmentation results of the target search sentence sets that have the question-and-answer attribute as positive samples and the word segmentation results of the target search sentence sets that do not have the question-and-answer attribute as negative samples; and to train on the positive samples and negative samples to obtain the intention recognition model, wherein the intention recognition model is used to identify whether an input search sentence has the question-and-answer attribute.
  10. The device according to claim 9, wherein:
    the processing unit is specifically configured to determine, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence with the largest search order among the at least one search sentence in the target search sentence set; and
    to determine, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute.
  11. The device according to claim 10, wherein the search result click information comprises a total number of clicks on search result items and a number of clicks on question-and-answer search result items, and the processing unit, when determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute, is specifically configured to:
    calculate a first ratio of the number of clicks on question-and-answer search result items to the total number of clicks on search result items, both taken from the search result click information of the search sentence with the largest search order; and
    if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determine that the search sentences in the target search sentence set have the question-and-answer attribute.
  12. The device according to claim 9, wherein:
    the processing unit is specifically configured to determine a weighting coefficient for each of the at least one search sentence in the target search sentence set, wherein a search sentence with a larger search order has a higher weighting coefficient than a search sentence with a smaller search order; and to determine, according to the weighting coefficient of each search sentence and the search result click information of each search sentence in the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute.
  13. The device according to claim 9, wherein the search event information comprises search result click information of each of the at least one search sentence, and the search result click information comprises a total number of clicks on search result items, a number of clicks on question-and-answer search result items, and a browsing duration of each clicked search result item;
    the processing unit is specifically configured to filter out search result items whose browsing duration is less than a preset duration threshold, determine the total number of clicks on the search result items remaining after the filtering, determine the number of clicks on the question-and-answer search result items remaining after the filtering, and calculate a first ratio of the number of clicks on the remaining question-and-answer search result items to the total number of clicks on the remaining search result items; and, if the total number of clicks on the remaining search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determine that the search sentences in the target search sentence set have the question-and-answer attribute.
  14. The device according to any one of claims 9-13, wherein:
    the acquisition unit is specifically configured to determine application field information of the intention recognition model to be trained; determine, according to the application field information, a target sub-database from a plurality of sub-databases included in the search sentence database, wherein the sub-databases correspond one-to-one to application fields, each sub-database comprises a plurality of search sentence sets in the corresponding application field and the search event information associated with each search sentence set, and the application field corresponding to the target sub-database is the same as the application field indicated by the application field information; and select the plurality of target search sentence sets from the target sub-database.
  15. An intention recognition device, comprising a processor and a memory connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the following steps:
    receiving a target search sentence input by a user;
    performing word segmentation on the target search sentence to obtain a word segmentation result of the target search sentence, the word segmentation result comprising a plurality of word segments that make up the target search sentence;
    inputting the word segmentation result of the target search sentence into a preset intention recognition model to obtain an intention recognition result corresponding to the target search sentence, wherein the intention recognition model is trained on a plurality of target search sentence sets and the search event information associated with each of the target search sentence sets, each target search sentence set comprises at least one search sentence, the search event information comprises a search order of each of the at least one search sentence and/or search result click information of each search sentence, and the intention recognition result indicates whether the target search sentence has a question-and-answer attribute; and
    if the intention recognition result indicates that the target search sentence has the question-and-answer attribute, outputting search results that include question-and-answer search result items corresponding to the target search sentence.
  16. The device according to claim 15, wherein, before inputting the word segmentation result of the target search sentence into the preset intention recognition model to obtain the intention recognition result corresponding to the target search sentence, the processor further performs the following steps:
    selecting a plurality of target search sentence sets from a search sentence database, wherein the search sentence database records a plurality of search sentence sets and the search event information associated with each search sentence set, each search sentence set comprises at least one search sentence, and the search event information comprises a search order of each of the at least one search sentence and/or search result click information of each search sentence;
    performing word segmentation on the search sentences in each of the plurality of target search sentence sets to obtain a word segmentation result of each target search sentence set, the word segmentation result of each target search sentence set comprising the word segments of the search sentences that make up that target search sentence set;
    determining, according to the search event information associated with each target search sentence set, whether the search sentences in that target search sentence set have the question-and-answer attribute; and
    using the word segmentation results of the target search sentence sets that have the question-and-answer attribute as positive samples and the word segmentation results of the target search sentence sets that do not have the question-and-answer attribute as negative samples, and training on the positive samples and negative samples to obtain the intention recognition model, wherein the intention recognition model is used to identify whether an input search sentence has the question-and-answer attribute.
  17. The device according to claim 16, wherein, when determining, according to the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute, the processor specifically performs the following steps:
    determining, according to the search order of each search sentence included in the search event information associated with the target search sentence set, the search sentence with the largest search order among the at least one search sentence in the target search sentence set; and
    determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute.
  18. The device according to claim 17, wherein the search result click information comprises a total number of clicks on search result items and a number of clicks on question-and-answer search result items, and, when determining, according to the search result click information of the search sentence with the largest search order, whether the search sentences in the target search sentence set have the question-and-answer attribute, the processor specifically performs the following steps:
    calculating a first ratio of the number of clicks on question-and-answer search result items to the total number of clicks on search result items, both taken from the search result click information of the search sentence with the largest search order; and
    if the total number of clicks on search result items is greater than a preset first number threshold and the first ratio is greater than a preset first ratio threshold, determining that the search sentences in the target search sentence set have the question-and-answer attribute.
  19. The device according to claim 16, wherein, when determining, according to the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute, the processor specifically performs the following steps:
    determining a weighting coefficient for each of the at least one search sentence in the target search sentence set, wherein a search sentence with a larger search order has a higher weighting coefficient than a search sentence with a smaller search order; and
    determining, according to the weighting coefficient of each search sentence and the search result click information of each search sentence in the search event information associated with the target search sentence set, whether the search sentences in the target search sentence set have the question-and-answer attribute.
  20. A non-volatile computer-readable storage medium storing a computer program, wherein the computer program comprises program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
PCT/CN2019/116240 2019-07-18 2019-11-07 Intention recognition method, device and computer readable storage medium WO2021008015A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910653241.9 2019-07-18
CN201910653241.9A CN110472027B (en) 2019-07-18 Intent recognition method, apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021008015A1 (en) 2021-01-21

Family

ID=68509723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116240 WO2021008015A1 (en) 2019-07-18 2019-11-07 Intention recognition method, device and computer readable storage medium

Country Status (1)

Country Link
WO (1) WO2021008015A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121810A1 (en) * 2016-10-31 2018-05-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for analyzing intention based on artificial intelligence
CN108446286A (en) * 2017-02-16 2018-08-24 阿里巴巴集团控股有限公司 A kind of generation method, device and the server of the answer of natural language question sentence
CN109933779A (en) * 2017-12-18 2019-06-25 苏宁云商集团股份有限公司 User's intension recognizing method and system
CN109062977A (en) * 2018-06-29 2018-12-21 厦门快商通信息技术有限公司 A kind of automatic question answering text matching technique, automatic question-answering method and system based on semantic similarity
CN109543012A (en) * 2018-10-25 2019-03-29 苏宁易购集团股份有限公司 A kind of user's intension recognizing method and device based on Word2Vec and RNN
CN109815314A (en) * 2019-01-04 2019-05-28 平安科技(深圳)有限公司 A kind of intension recognizing method, identification equipment and computer readable storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860867A (en) * 2021-02-25 2021-05-28 电子科技大学 Attribute selecting method and storage medium for Chinese question-answering system based on convolution neural network
CN113343051A (en) * 2021-06-04 2021-09-03 全球能源互联网研究院有限公司 Abnormal SQL detection model construction method and detection method
CN113343051B (en) * 2021-06-04 2024-04-16 全球能源互联网研究院有限公司 Abnormal SQL detection model construction method and detection method
CN113641803A (en) * 2021-06-30 2021-11-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN113641803B (en) * 2021-06-30 2023-06-06 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN116628315A (en) * 2023-04-07 2023-08-22 百度在线网络技术(北京)有限公司 Search method, training method and device of deep learning model and electronic equipment
CN116628315B (en) * 2023-04-07 2024-03-22 百度在线网络技术(北京)有限公司 Search method, training method and device of deep learning model and electronic equipment

Also Published As

Publication number Publication date
CN110472027A (en) 2019-11-19

Similar Documents

Publication Publication Date Title
WO2020140372A1 (en) Recognition model-based intention recognition method, recognition device, and medium
WO2021008015A1 (en) Intention recognition method, device and computer readable storage medium
WO2020140373A1 (en) Intention recognition method, recognition device and computer-readable storage medium
WO2020108608A1 (en) Search result processing method, device, terminal, electronic device, and storage medium
CN107609101B (en) Intelligent interaction method, equipment and storage medium
CN107818781B (en) Intelligent interaction method, equipment and storage medium
CN107832286B (en) Intelligent interaction method, equipment and storage medium
CN107515877B (en) Sensitive subject word set generation method and device
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
CN107797984B (en) Intelligent interaction method, equipment and storage medium
CN111814770B (en) Content keyword extraction method of news video, terminal device and medium
WO2020207074A1 (en) Information pushing method and device
WO2019228203A1 (en) Short text classification method and system
KR20160030943A (en) Performing an operation relative to tabular data based upon voice input
WO2021189951A1 (en) Text search method and apparatus, and computer device and storage medium
WO2014008139A2 (en) Generating search results
CN108287848B (en) Method and system for semantic parsing
CN113722478B (en) Multi-dimensional feature fusion similar event calculation method and system and electronic equipment
CN112199588A (en) Public opinion text screening method and device
CN111767713A (en) Keyword extraction method and device, electronic equipment and storage medium
WO2023240878A1 (en) Resource recognition method and apparatus, and device and storage medium
CN111061876A (en) Event public opinion data analysis method and device
CN111737607B (en) Data processing method, device, electronic equipment and storage medium
CN111274366A (en) Search recommendation method and device, equipment and storage medium
CN108628875B (en) Text label extraction method and device and server

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19937514; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19937514; Country of ref document: EP; Kind code of ref document: A1)