WO2021012483A1 - Information identification method, apparatus, computer device and storage medium - Google Patents

Information identification method, apparatus, computer device and storage medium

Info

Publication number
WO2021012483A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
vector
log
combined
session
Prior art date
Application number
PCT/CN2019/116508
Other languages
English (en)
French (fr)
Inventor
刘利
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021012483A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Definitions

  • This application relates to an information identification method, device, computer equipment and storage medium.
  • With the development of search engine technology, more and more websites use search engines so that users can quickly find the information they want.
  • the current search engine technology can recognize the user's intention according to the user's input, and return corresponding information according to the user's intention.
  • an information recognition method is provided.
  • An information identification method including:
  • Start multiple threads, use the multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
  • An information identification device including:
  • the log acquisition module is used to acquire the query log.
  • the query log includes multiple query sessions;
  • the filtering module is used to filter according to the query time and the number of queries of the query session to obtain the target query log;
  • the feature extraction module is used to extract query feature information from the target query log, and digitize the query feature information to obtain the query vector corresponding to each query session in the target query log;
  • the target vector selection module is used for selecting and combining query vectors corresponding to a preset number of query sessions to obtain a target query vector;
  • the intent class obtaining module is used to calculate the similarity between the target query vector and the preset intent clustering model to obtain the intent class corresponding to the target query vector;
  • the vector combination module is used to combine the query vectors corresponding to each query session according to preset rules to obtain each combined query vector;
  • the information recognition module is configured to start multiple threads, use the multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtain an information recognition result according to the similarity between each combined query vector and the intent class.
  • A computer device including a memory and one or more processors, where the memory stores computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:
  • Start multiple threads, use the multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
  • One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • Start multiple threads, use the multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
  • Fig. 1 is an application scenario diagram of an information recognition method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of an information identification method according to one or more embodiments.
  • Fig. 3 is a schematic flow chart of filtering query logs according to one or more embodiments.
  • Fig. 4 is a schematic diagram of a process for obtaining a query vector according to one or more embodiments.
  • Fig. 5 is a schematic flowchart of obtaining a combined query vector according to one or more embodiments.
  • Fig. 6 is a schematic diagram of a flow of information identification according to one or more embodiments.
  • Fig. 7 is a schematic diagram of a process of pushing recommendation information according to one or more embodiments.
  • Fig. 8 is a schematic flowchart of obtaining a preset intent clustering model according to one or more embodiments.
  • Fig. 9 is a structural block diagram of an information recognition device according to one or more embodiments.
  • Fig. 10 is an internal structure diagram of a computer device according to one or more embodiments.
  • the information identification method provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the server 104 obtains the query log sent by the terminal 102, where the query log includes multiple query sessions; filters the query log according to the query time and the number of queries to obtain the target query log; extracts query features from the target query log and digitizes them to obtain the query vector corresponding to each query session in the target query log; selects and combines the query vectors corresponding to a preset number of query sessions to obtain the target query vector; and calculates the similarity between the target query vector and the preset intent clustering model to obtain the intent class corresponding to the target query vector. The server 104 then combines the query vectors corresponding to each query session according to preset rules to obtain each combined query vector.
  • the server 104 starts multiple threads, uses the multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtains the information recognition result according to the similarity between each combined query vector and the intent class.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by an independent server or a server cluster composed of multiple servers.
  • an information identification method is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • the query log is based on the log information generated when the user uses the search engine.
  • a query session is a series of continuous interactions that a user performs within a period of time to obtain certain information, from submitting a query to submitting the next query or exiting the search engine. For example, a user may submit a navigational query (such as pingan bank), click the official website, and stop searching; this yields one query session.
  • when a query session is stored on the server, it is stored as multiple fields, including query time, query sentence, click time, clicked URL (uniform resource locator), and so on. A query log can include multiple query sessions.
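The stored fields described above could be modelled roughly as follows. This is a minimal sketch: the class names, field names, `user_id`, and the example URL are assumptions for illustration and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuerySession:
    # Field names are illustrative; the text only lists the kinds of fields stored.
    query_time: float      # when the query was submitted (seconds)
    query_sentence: str    # text entered by the user
    click_time: float      # when a result was clicked (seconds)
    click_url: str         # URL of the clicked result


@dataclass
class QueryLog:
    user_id: str           # hypothetical identifier, not named in the text
    sessions: List[QuerySession] = field(default_factory=list)


# One navigational session: query "pingan bank", click the official site, stop.
log = QueryLog(user_id="u1")
log.sessions.append(
    QuerySession(0.0, "pingan bank", 4.0, "https://example.com/official")
)
```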
  • the server obtains the query log, and the query log may be different query logs obtained from multiple different terminals.
  • Each query log includes query sessions generated by users when searching.
  • there can be multiple query sessions.
  • S204 Filter the query log according to the query time and the number of queries of the query log to obtain the target query log.
  • the query time of the query log refers to the total time it takes to complete the user's query, and the number of queries refers to the number of queries included in the user's query log.
  • the server compares the elapsed time from the start to the end of each user's query log, and the number of query sessions it contains, with the preset query time and the preset number of queries. Based on the comparison result, it deletes the query logs that do not conform to the preset query time and number of queries, and uses the filtered query log as the target query log.
  • S206 Extract query features from the target query log, and digitize the query feature information to obtain query vectors corresponding to each query session in the target query log.
  • the query feature is used to represent the feature of the query session.
  • the query feature is preset, and can include query sentence features, clicked URL features, and combined features.
  • the combined feature is a combination of query sentence features and clicked URL features.
  • the server extracts the query features of each query session from the target query log and digitizes them to obtain the query vector corresponding to each query session in the target query log. Binarization or tf-idf can be used to digitize the query features into the query vector.
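As an illustration of the two digitization options named above, the following sketch turns per-session feature lists into binary and tf-idf vectors. The function names, the toy feature lists, and the particular tf-idf variant (term frequency times log of inverse document frequency, without smoothing) are assumptions; the application does not specify a formula.

```python
import math
from collections import Counter


def binarize(features, vocabulary):
    """1 if the feature occurs in the session, else 0, for each vocabulary term."""
    present = set(features)
    return [1 if term in present else 0 for term in vocabulary]


def tf_idf(sessions, vocabulary):
    """Plain tf-idf: (count / session length) * log(N / document frequency)."""
    n = len(sessions)
    df = Counter(term for feats in sessions for term in set(feats))
    vectors = []
    for feats in sessions:
        counts = Counter(feats)
        total = len(feats) or 1  # avoid division by zero for empty sessions
        vectors.append([
            (counts[t] / total) * math.log(n / df[t]) if df[t] else 0.0
            for t in vocabulary
        ])
    return vectors


# Hypothetical query features for three sessions.
sessions = [["dog", "food"], ["dog", "toys"], ["gym", "membership"]]
vocabulary = sorted({t for feats in sessions for t in feats})
binary_vectors = [binarize(f, vocabulary) for f in sessions]
tfidf_vectors = tf_idf(sessions, vocabulary)
```

In this variant a feature shared by several sessions ("dog") receives a lower weight than one that appears in a single session ("food").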
  • S208 Select and combine query vectors corresponding to a preset number of query sessions to obtain a target query vector.
  • the query vector is used to characterize query sessions, and each query session corresponds to a query vector.
  • the preset number is a number set in the server in advance and the set number is less than the number of query vectors.
  • the number can be manually set, or can be obtained by counting the average number of query vectors included in the historical query log.
  • the server selects a preset number of query vectors from query vectors corresponding to each query session in the target query log, and combines the number of query vectors to obtain the target query vector.
  • the query vectors corresponding to the preset number of query sessions can be selected from earliest to latest according to the time order of the query sessions in the query log. For example, let q_1, q_2, q_3, ..., q_n be the n query vectors corresponding to n query sessions, and let d be the average number of query sessions included in the historical query log. If the preset number is set to d, the target query vector obtained can be (q_1, q_2, ..., q_d).
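The selection in the example above amounts to taking the first d query vectors in time order. A minimal sketch, assuming the query vectors are already sorted by session time and that "combining" means grouping them into a tuple, as the notation (q_1, q_2, ..., q_d) suggests; the function name is hypothetical.

```python
def select_target_vector(query_vectors, preset_number):
    """Return the first `preset_number` query vectors as the target query vector.

    Assumes `query_vectors` is ordered from earliest to latest session.
    """
    # The text requires the preset number to be less than the number of vectors.
    if not 0 < preset_number < len(query_vectors):
        raise ValueError("preset_number must be in 1..len(query_vectors)-1")
    return tuple(query_vectors[:preset_number])


q = [[1, 0], [0, 1], [1, 1]]           # q_1, q_2, q_3
target = select_target_vector(q, 2)     # (q_1, q_2)
```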
  • S210 Calculate the similarity between the target query vector and the preset intent clustering model, and obtain the intent class corresponding to the target query vector.
  • the preset intention clustering model refers to a model that uses a clustering algorithm to cluster according to the user's historical query log to obtain the user's various intention classes.
  • the server uses a similarity algorithm to calculate the similarity between the target query vector and the preset intent clustering model to obtain the intent class corresponding to the target query vector, that is, to find the intent class to which the target query vector belongs.
  • the similarity algorithm can be Euclidean distance algorithm, cosine distance algorithm and so on.
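The two similarity measures mentioned above, and one possible way to pick an intent class, might look like this. The sketch assumes each intent class is summarised by a centroid vector and that the similarity of a group of query vectors to a class is the mean cosine similarity of its members; neither detail is stated in the application, and the class names and numbers are made up.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_similarity(vectors, centroid):
    """Assumed rule: mean cosine similarity of the group's vectors to the centroid."""
    return sum(cosine_similarity(v, centroid) for v in vectors) / len(vectors)


def assign_intent_class(target_vectors, centroids):
    """Return the name of the intent class whose centroid is most similar."""
    return max(centroids, key=lambda name: group_similarity(target_vectors, centroids[name]))


# Hypothetical intent-class centroids and a two-session target query vector.
centroids = {"pets": [1.0, 0.9, 0.0], "fitness": [0.0, 0.1, 1.0]}
intent = assign_intent_class([[0.8, 1.0, 0.1], [1.0, 0.7, 0.0]], centroids)
```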
  • the preset rule refers to the pre-set combination rule of the query vectors corresponding to each query session.
  • the query vectors corresponding to each query session can be combined in turn, or query vectors can be selected from those corresponding to each query session for combination.
  • the server combines the query vectors corresponding to each query session according to the preset rules to obtain each combined query vector. For example, the first query vector is used as the first combined query vector; the second query vector is combined with the first combined query vector to get the second combined query vector; the third query vector is then combined with the second combined query vector to get the third combined query vector; and so on until all the query vectors have been combined, which yields each combined query vector.
  • S214 Start multiple threads, use multiple threads to calculate the similarity between each combined query vector and the intent class in parallel, and obtain an information recognition result according to the similarity between each combined query vector and the intent class.
  • information refers to the user's intention in the session.
  • information recognition refers to identifying whether the query intention has changed between two adjacent query sessions. For example, one query session queries information related to "dog", and the next query session does not continue to query information related to "dog" but instead queries information related to "gym".
  • in that case the intents of the two queries are unrelated, that is, the intent has changed between the two query sessions, and the information recognition result is that the information has changed.
  • a thread is the smallest unit that the operating system can perform operation scheduling.
  • the server starts multiple threads, and the multiple threads can run in parallel.
  • the multiple threads running in parallel are used to calculate the similarity between each combined query vector and the intent class, and compare the similarity between each combined query vector and the intent class. According to the comparison result, the intent information change result of the query session corresponding to each combined query vector is determined.
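A sketch of the parallel similarity step using Python's standard `concurrent.futures.ThreadPoolExecutor`. The similarity rule (mean cosine similarity of the member vectors to an intent-class centroid), the function names, and the toy vectors are assumptions made for illustration only.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_similarity(combined, centroid):
    """Assumed rule: mean cosine similarity of the member vectors to the centroid."""
    return sum(cosine_similarity(v, centroid) for v in combined) / len(combined)


def parallel_similarities(combined_vectors, centroid, max_workers=4):
    """Compute one similarity per combined query vector on a pool of threads.

    `pool.map` returns results in input order, so result i belongs to
    combined query vector i.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: group_similarity(c, centroid), combined_vectors))


q1, q2, q3 = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
combined_vectors = [[q1], [q1, q2], [q1, q2, q3]]
similarities = parallel_similarities(combined_vectors, centroid=[1.0, 0.0])
```

In CPython the global interpreter lock limits the speed-up threads give for pure-Python arithmetic, so a process pool or a vectorised library may be a better fit for large workloads; the thread pool is used here only because the text describes threads.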
  • in the above information identification method, the query vectors are obtained from the query log, the target query vector is obtained from the query vectors, and the intent class corresponding to the target query vector is obtained. The query vectors are then combined to obtain each combined query vector, multiple threads calculate the similarity between each combined query vector and the intent class in parallel, and the information recognition result is determined from these similarities. Calculating the similarities in parallel improves the efficiency of the similarity calculation, and therefore the efficiency of obtaining information recognition results on large amounts of data.
  • step S204 that is, filtering the query log according to the query time and the number of queries of the query log to obtain the target query log, includes the steps:
  • S302 Search for the first query log whose query time is greater than the preset time, and delete the first query log from the query log.
  • the server searches the user's query log for a log whose query time is greater than a preset time, and uses the searched log as the first query log.
  • the first query log refers to a log generated during a relatively long query. For example, if a user's query log spans more than one hour, it is treated as a first query log and deleted from the query log. Since users may change their intentions frequently during long queries, such a query log does not meet the requirements, and deleting it ensures the consistency of the query log data.
  • S304 Search for a second query log in which the number of queries of the query log is less than the preset number of times, and delete the second query log from the query log to obtain the target query log.
  • the server searches the user's query log for logs whose number of queries is less than the preset number, that is, logs in which the number of query sessions is less than the preset number, uses each log found as a second query log, and deletes the second query log from the query log to obtain the target query log.
  • when the number of query sessions is too small, the user's intent information cannot be derived. For example, a query log that contains only one query session may not be complete enough to describe the user's entire intention. Deleting such query logs ensures the consistency of the query log.
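Steps S302 and S304 can be sketched as a single filtering pass. The one-hour limit comes from the example in the text; the minimum of two sessions, the class, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryLog:
    # Hypothetical summary of one user's query log.
    start_time: float      # seconds
    end_time: float        # seconds
    session_count: int     # number of query sessions in the log


def filter_query_logs(logs: List[QueryLog],
                      max_duration_s: float = 3600.0,
                      min_sessions: int = 2) -> List[QueryLog]:
    """Keep only logs that are neither too long (S302) nor too short (S304)."""
    kept = []
    for log in logs:
        if log.end_time - log.start_time > max_duration_s:
            continue  # S302: query time greater than the preset time
        if log.session_count < min_sessions:
            continue  # S304: fewer queries than the preset number
        kept.append(log)
    return kept


logs = [
    QueryLog(0.0, 600.0, 5),     # kept
    QueryLog(0.0, 7200.0, 12),   # dropped: longer than one hour
    QueryLog(0.0, 30.0, 1),      # dropped: only one session
]
target_logs = filter_query_logs(logs)
```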
  • step S206 that is, extracting query feature information from the target query log, and digitizing the query feature information to obtain query vectors corresponding to each query session in the target query log, includes the steps:
  • S402 Obtain query text from each query session of the target query log, and extract keywords in the query text using a bag-of-words strategy to obtain query keyword features.
  • the query text refers to the text obtained according to the query sentence entered by the user. That is, it can be obtained from the query statement field in each query session of the query log. Bag-of-words strategy refers to the use of machine learning algorithms to extract features from text. Bag-of-words is a representation of the text in which words appear in a document.
  • the server obtains the query text from the query sentence field of each query session in the target query log, uses the bag-of-words strategy to extract keywords in the query text, and obtains the query keyword characteristics. That is, the query text of each query session in the target query log is obtained, and the query keyword characteristics of each query session are obtained.
  • S404 Obtain uniform resource locator information from each query session of the target query log, search for a classification catalog corresponding to the uniform resource locator information, and obtain uniform resource locator characteristics.
  • the server obtains the uniform resource locator information, that is, the URL information, from the URL field in each query session of the target query log, finds the category directory corresponding to the URL information in the Open Directory Project (ODP) category directory, and obtains the uniform resource locator feature according to that category directory.
  • the server combines the keywords in the query text with the classification catalog corresponding to the uniform resource locator information, and uses the combined information as the combined feature. For example, if a user searches for "Blue Moon laundry detergent" on Taobao, the corresponding product category directory can be found according to the URL of the product the user clicked.
  • the category directory may be "daily necessities - washing - laundry detergent".
  • the keywords can be "Blue Moon" and "laundry detergent".
  • combining each item in the category directory with each keyword gives the combined features "daily necessities-Blue Moon", "daily necessities-laundry detergent", "washing-Blue Moon", "washing-laundry detergent", "laundry detergent-Blue Moon" and "laundry detergent-laundry detergent".
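The combined features in the example above are the Cartesian product of the category directory levels and the query keywords. A minimal sketch, with a hypothetical function name and the English category labels chosen here for illustration:

```python
from itertools import product


def combined_features(category_path, keywords):
    """Pair every category level with every query keyword."""
    return [f"{category}-{keyword}" for category, keyword in product(category_path, keywords)]


features = combined_features(
    ["daily necessities", "washing", "laundry detergent"],  # category directory levels
    ["Blue Moon", "laundry detergent"],                     # query keywords
)
```

Three category levels and two keywords give six combined features, matching the six listed in the example.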
  • S408 Obtain the query feature according to the query keyword feature, the uniform resource locator feature, and the combination feature, and digitize the query feature to obtain a query vector.
  • the server combines the query keyword feature, the uniform resource locator feature, and the combined feature to obtain the query feature corresponding to each query session in the target query log, and digitizes the query feature to obtain the query vector corresponding to each query session in the target query log.
  • in the above embodiment, the query keyword feature is obtained from the query sentence, the URL feature is obtained from the clicked URL, and the combined feature is obtained from both the query sentence and the clicked URL, so that the obtained query feature, and hence the obtained query vector, is more accurate.
  • step S212 that is, combining the query vectors corresponding to each query session according to preset rules to obtain each combined query vector, includes the steps:
  • S502 Obtain an initial query vector corresponding to an initial query session in each query session, and use the initial query vector as a first combined query vector.
  • the initial query session refers to the query session corresponding to the start time in the target query log, and the start time is the time when the user starts the query.
  • the server obtains the initial query session among the query sessions corresponding to the target query log, obtains the initial query vector corresponding to the initial query session, and uses the initial query vector as the first combined query vector.
  • S504 Obtain a query vector immediately adjacent to the first combined query vector, and combine the first combined query vector with the immediately adjacent query vector to obtain a second combined query vector.
  • the server obtains the query vector immediately adjacent to the initial query vector, that is, the query vector corresponding to the second query session in the target query log, and combines the initial query vector with the immediately adjacent query vector to obtain the second combined query vector. For example, if the initial query vector is q_1 and the next query vector is q_2, the second combined query vector obtained is (q_1, q_2).
  • the second combined query vector is then used as the first combined query vector, and the method returns to the step of obtaining the query vector immediately adjacent to the first combined query vector, repeating until the query vectors corresponding to all query sessions have been combined and each combined query vector is obtained.
  • the server takes the second combined query vector as the first combined query vector and returns to step S504, that is, to the step of obtaining the query vector immediately adjacent to the first combined query vector. For example, using (q_1, q_2) as the first combined query vector, the query vector immediately adjacent to it is q_3, and the resulting second combined query vector is (q_1, q_2, q_3). The loop ends when the combined query vector includes the query vectors corresponding to all query sessions.
  • in the above embodiment, obtaining each combined query vector in this way makes it convenient to calculate the similarity between each combined query vector and the intent class.
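The loop over steps S502 and S504 produces every prefix of the ordered query vectors. A minimal sketch, assuming the query vectors are already in session order and that a combined query vector is simply the ordered list of its member vectors; the function name is hypothetical.

```python
def build_combined_vectors(query_vectors):
    """Return [(q_1), (q_1, q_2), ..., (q_1, ..., q_n)] as lists."""
    combined = []
    current = []
    for vec in query_vectors:
        current = current + [vec]   # new list, so earlier prefixes stay unchanged
        combined.append(current)
    return combined


combined_vectors = build_combined_vectors(["q_1", "q_2", "q_3"])
```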
  • step S508 that is, obtaining the information recognition result according to the similarity between each combined query vector and the intent class, includes the steps:
  • S602 Obtain a first similarity between the first combined query vector and the intent class, and obtain a second similarity between the second combined query vector and the intent class.
  • the first combined query vector is the combined query vector obtained from the initial query vector
  • the second combined query vector is obtained by combining the initial query vector and the query vector immediately adjacent to the initial query vector.
  • the server obtains the first similarity between the first combined query vector and the intent class and obtains the second similarity between the second combined query vector and the intent class.
  • the server compares the first similarity with the second similarity. When the first similarity exceeds the second similarity, the information has changed between the query session corresponding to the first combined query vector (the initial query session) and the query session corresponding to the second combined query vector (the immediately adjacent query session). In other words, the intent has changed: the query intent of the initial query session differs from that of the query session immediately adjacent to it.
  • the information recognition result can be written into an information change record table for storage.
  • the information change record table is used to record the result of information recognition, including the query session field before the information change and the query session field after the information change.
  • when the first similarity does not exceed the second similarity, the intent information has not changed between the query session corresponding to the first combined query vector (the initial query session) and the query session corresponding to the second combined query vector (the immediately adjacent query session), and no processing is performed.
  • the similarity of all the combined query vectors is sequentially compared to obtain the intent information recognition result between each query session and the next query session, and the recognition result is written into the information change record table for storage.
  • in the above embodiment, the first similarity between the first combined query vector and the intent class is compared with the second similarity between the second combined query vector and the intent class to obtain the intent change result between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector, which improves the accuracy of detecting intent changes.
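The comparison rule described above can be sketched as a scan over the similarities of successive combined query vectors: a drop from one similarity to the next marks an intent change between the corresponding adjacent sessions. The function name and the list-of-pairs return format (standing in for the information change record table) are assumptions.

```python
def detect_intent_changes(similarities):
    """Return (before_index, after_index) pairs where the similarity drops.

    similarities[i] is the similarity between the (i+1)-th combined query
    vector and the intent class.
    """
    changes = []
    for i in range(len(similarities) - 1):
        if similarities[i] > similarities[i + 1]:
            changes.append((i, i + 1))   # intent changed between session i and i+1
    return changes


# Sessions 0 and 1 share an intent; session 2 lowers the similarity.
change_records = detect_intent_changes([1.0, 1.0, 0.67])
```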
  • after step S212, that is, after the query vectors corresponding to each query session are combined according to preset rules to obtain each combined query vector and the similarity between each combined query vector and the intent class is calculated to obtain the information recognition result, the method further includes the steps:
  • S702 Acquire the query session before the information change and the query session after the information change from the query log according to the information recognition result.
  • the query session before the information change refers to all query sessions in the query log when the intent has not changed.
  • the query session after the information change refers to all the query sessions corresponding to the changed intent after the intent is changed.
  • the query log includes user session a1, user session a2, user session a3, user session a4, and user session a5.
  • the query session before the information change includes user session a1, user session a2, and user session a3.
  • the query session after the information change includes user session a4 and user session a5.
  • the server obtains the query session before the information change and the query session after the information change from the query log according to the information recognition result. In this example, the query sessions obtained before the information change are user session a1, user session a2, and user session a3, and the query sessions obtained after the information change are user session a4 and user session a5.
  • S704 Obtain the query time in the query session before the information change and the query session after the information change, and obtain the weight of the intent corresponding to the query session before the information change and the weight of the intent corresponding to the query session after the information change according to the query time.
  • the server obtains the query time in the query session before the information change and the query time in the query session after the information change, and determines, according to the size of the query time, the weight of the intent corresponding to the query session before the information change and the weight of the intent corresponding to the query session after the information change.
  • the query session before the information change includes user session a1, user session a2, and user session a3.
  • the query time of the intent corresponding to the query session before the information change is the query time s1 of a1 plus the query time s2 of a2 plus the query time s3 of a3.
  • the query session after the information change includes user session a4 and user session a5, and the query time of the intent corresponding to the query session after the information change is the query time s4 of a4 plus the query time s5 of a5. The corresponding weights are obtained according to the size of these query times.
  • S706 Compare the weight of the intent corresponding to the query session before the information change and the weight of the intent corresponding to the query session after the information change, obtain corresponding recommended information according to the comparison result, and push the recommended information to the query terminal.
  • the server compares the weight of the intent corresponding to the query session before the information change with the weight of the intent corresponding to the query session after the information change.
  • when the weight of the intent corresponding to the query session before the information change is greater than the weight of the intent corresponding to the query session after the information change, the server obtains the recommended information corresponding to the intent of the query session before the information change and pushes it to the query terminal.
  • when the weight of the intent corresponding to the query session before the information change is less than the weight of the intent corresponding to the query session after the information change, the server obtains the recommended information corresponding to the intent of the query session after the information change and pushes it to the query terminal.
  • the intent weight corresponding to the query session is determined according to the query time of the query session, and the recommended information is determined according to the weight, so that the obtained recommended information can be more accurate and meet the needs of users.
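Steps S704 and S706 could be sketched as follows. The sketch assumes the weight of an intent is simply the total query time of its sessions; the text only says the weight is derived from the size of the query time. The function names, the example times, and the handling of a tie (returning nothing) are also assumptions.

```python
def intent_weight(query_times):
    """Assumed weighting: total query time spent on the intent."""
    return sum(query_times)


def choose_recommendation(before_times, after_times, before_info, after_info):
    """Pick the recommended information for the intent with the larger weight."""
    before_w = intent_weight(before_times)
    after_w = intent_weight(after_times)
    if before_w > after_w:
        return before_info
    if before_w < after_w:
        return after_info
    return None  # equal weights: the text does not say what to push


# Sessions a1..a3 precede the intent change, a4..a5 follow it (times in seconds).
recommended = choose_recommendation(
    before_times=[30.0, 45.0, 20.0],   # s1, s2, s3
    after_times=[15.0, 25.0],          # s4, s5
    before_info="dog-related recommendations",
    after_info="gym-related recommendations",
)
```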
  • the steps of generating historical intent clustering results include:
  • S802 Obtain a historical query log, and filter the historical query log according to the query time and the number of queries of the historical query log in the historical query log to obtain the target historical query log.
  • the server obtains the historical query logs and, according to the query time of each historical query log, deletes the historical query logs whose query time is greater than the preset threshold. According to the query count of each historical query log, it then deletes the historical query logs whose query count is less than the preset number, to obtain the target historical query log.
  • S804 Extract historical query features of each historical query session in the target historical query log, and digitize the historical query feature information to obtain historical query vectors corresponding to each historical query session.
  • the server extracts the historical query features of each historical query session in the target historical query log. The historical query feature information includes keyword feature information, URL feature information, and combined keyword and URL feature information. The server then digitizes the historical query feature information to obtain the historical query vector corresponding to each historical query session.
  • clustering is performed using a hierarchical clustering algorithm according to the historical query vector, and when the clustering is completed, a preset intention clustering model is obtained.
  • the server uses a hierarchical clustering algorithm to perform clustering according to the historical query vector, and when the clustering is completed, the historical intention clustering result is obtained.
  • cluster completion refers to dividing all historical query vectors into a preset number of cluster categories.
  • the target historical query log is obtained by filtering the historical query log, and the hierarchical clustering algorithm is used for clustering according to the target historical query log to obtain the historical intent clustering result.
  • the historical intent clustering result can be obtained in advance. When information is recognized, it can be called directly, which is convenient and quick, and improves the efficiency of obtaining information recognition results.
  • an information recognition device 900 is provided, which includes: a log acquisition module 902, a filtering module 904, a feature extraction module 906, a target vector selection module 908, an intent class acquisition module 910, a vector combination module 912, and an information recognition module 914, where:
  • the log obtaining module 902 is configured to obtain a query log, and the query log includes multiple query sessions;
  • the filtering module 904 is configured to filter according to the query time and the number of queries of the query session to obtain the target query log;
  • the feature extraction module 906 is configured to extract query feature information from the target query log, and digitize the query feature information to obtain query vectors corresponding to each query session in the target query log;
  • the target vector selection module 908 is configured to select and combine query vectors corresponding to a preset number of query sessions to obtain a target query vector;
  • the intent class obtaining module 910 is used to calculate the similarity between the target query vector and the historical intent clustering result to obtain the intent class corresponding to the target query vector;
  • the vector combination module 912 is configured to combine the query vectors corresponding to each query session according to preset rules to obtain each combined query vector;
  • the information recognition module 914 is configured to start multiple threads, use the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
  • the filtering module 904 includes:
  • the first log deletion module is used to find the first query log, whose query time is greater than the preset time, and delete the first query log from the query log;
  • the second log deletion module is used to find the second query log, whose number of queries is less than the preset number, and delete the second query log from the query log to obtain the target query log.
  • the feature extraction module 906 includes:
  • the word extraction module is used to obtain the query text from each query session of the target query log, use the bag-of-words strategy to extract the keywords in the query text, and obtain the query keyword characteristics;
  • the category catalog obtaining module is used to obtain uniform resource locator information from each query session of the target query log, find the category catalog corresponding to the uniform resource locator information, and obtain uniform resource locator characteristics;
  • the combined feature obtaining module is used to combine the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain the combined features;
  • the query feature obtaining module is used to obtain query features based on query keyword features, uniform resource locator features, and combination features, and digitize the query feature information to obtain query vectors corresponding to each query session in the target query log.
  • the vector combination module 912 includes:
  • the first combined query vector obtaining module is used to obtain the initial query vector corresponding to the initial query session among the query sessions, and use the initial query vector as the first combined query vector;
  • the second combined query vector obtaining module is used to obtain the query vector immediately adjacent to the first combined query vector, and combine the first combined query vector with the immediately adjacent query vector to obtain the second combined query vector;
  • the loop module is used to take the second combined query vector as the first combined query vector and return to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, which yields the combined query vectors.
  • the information identification module 914 includes:
  • a similarity obtaining module configured to obtain the first similarity between the first combined query vector and the intent class and obtain the second similarity between the second combined query vector and the intent class;
  • the similarity comparison module is used to compare the first similarity with the second similarity. When the first similarity exceeds the second similarity, it is determined that the information has changed between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector.
  • the information recognition device 900 further includes:
  • the session acquisition module is used to acquire the query session before the information change and the query session after the information change from the query log according to the information recognition result;
  • the weight calculation module is used to obtain the query time in the query session before the information change and the query session after the information change, and obtain the weight of the intent corresponding to the query session before the information change and the weight of the intent corresponding to the query session after the information change according to the query time ;
  • the weight comparison module is used to compare the weight of the intent corresponding to the query session before the information change and the weight of the intent corresponding to the query session after the information change, obtain corresponding recommended information according to the comparison result, and push the recommended information to the query terminal.
  • the information recognition device 900 includes:
  • the historical log obtaining module is used to obtain the historical query log, filter the historical query log according to the query time and the number of queries in the historical query log, and obtain the target historical query log;
  • the historical vector obtaining module is used to extract the historical query features of each historical query session in the target historical query log, and digitize the historical query feature information to obtain the historical query vector corresponding to each historical query session;
  • the clustering module is used to perform clustering using a hierarchical clustering algorithm according to the historical query vector, and when the clustering is completed, the historical intention clustering result is obtained.
  • Each module in the above-mentioned information recognition device can be implemented in whole or in part by software, hardware, or a combination thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store query log data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instruction is executed by the processor to realize an information recognition method.
  • FIG. 10 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the processor, the steps of the information recognition method provided in any embodiment of the present application are implemented.
  • One or more non-volatile storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the information recognition method provided in any embodiment of the present application.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

Abstract

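An information recognition method, including: obtaining a query log, the query log including multiple query sessions; filtering according to the query time and the number of queries of the query sessions to obtain a target query log; extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log; selecting the query vectors corresponding to a preset number of query sessions to obtain a target query vector; calculating the similarity between the target query vector and a preset intent clustering model to obtain the intent class corresponding to the target query vector; combining the query vectors corresponding to the query sessions in sequence to obtain combined query vectors; and starting multiple threads and using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, to obtain an information result.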

Description

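Information identification method, device, computer equipment and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 201910666381X, entitled "Information identification method, device, computer equipment and storage medium", filed with the Chinese Patent Office on July 23, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to an information identification method, device, computer equipment and storage medium.
Background
With the development of search engine technology, more and more websites use search engine technology to enable users to quickly query the information they want. Current search engine technology can recognize the user's intent from the user's input and return corresponding information according to that intent.
However, when a website identifies a user's different search intents within a certain period, it has to identify each of the user's search intents one after another. For example, the user first searches for the animal "dog", and the website returns dog-related information. When the user then searches for the animal "cat", the website returns cat-related information. To identify the user's search intent information, "dog" is recognized first, then "cat", and the two are compared to determine that the intent information has changed. When a large amount of user data has to be processed, this way of detecting changes in intent information is inefficient.
Summary
According to various embodiments disclosed in this application, an information recognition method, apparatus, computer device and storage medium are provided.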
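An information recognition method, including:
obtaining a query log, the query log including multiple query sessions;
filtering the query log according to the query time and the number of queries of the query log to obtain a target query log;
extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
calculating the similarity between the target query vector and a historical intent clustering result to obtain the intent class corresponding to the target query vector;
combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
An information recognition apparatus, including:
a log obtaining module, configured to obtain a query log, the query log including multiple query sessions;
a filtering module, configured to filter according to the query time and the number of queries of the query sessions to obtain a target query log;
a feature extraction module, configured to extract query feature information from the target query log, and digitize the query feature information to obtain a query vector corresponding to each query session in the target query log;
a target vector selection module, configured to select and combine the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
an intent class obtaining module, configured to calculate the similarity between the target query vector and a preset intent clustering model to obtain the intent class corresponding to the target query vector;
a vector combination module, configured to combine the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
an information recognition module, configured to start multiple threads, use the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtain an information recognition result according to the similarity between each combined query vector and the intent class.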
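A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining a query log, the query log including multiple query sessions;
filtering according to the query time and the number of queries of the query sessions to obtain a target query log;
extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
calculating the similarity between the target query vector and a preset intent clustering model to obtain the intent class corresponding to the target query vector;
combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a query log, the query log including multiple query sessions;
filtering according to the query time and the number of queries of the query sessions to obtain a target query log;
extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
calculating the similarity between the target query vector and a preset intent clustering model to obtain the intent class corresponding to the target query vector;
combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.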
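Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the information recognition method according to one or more embodiments.
FIG. 2 is a schematic flowchart of the information recognition method according to one or more embodiments.
FIG. 3 is a schematic flowchart of filtering the query log according to one or more embodiments.
FIG. 4 is a schematic flowchart of obtaining the query vectors according to one or more embodiments.
FIG. 5 is a schematic flowchart of obtaining the combined query vectors according to one or more embodiments.
FIG. 6 is a schematic flowchart of information recognition according to one or more embodiments.
FIG. 7 is a schematic flowchart of pushing recommended information according to one or more embodiments.
FIG. 8 is a schematic flowchart of obtaining the preset intent clustering model according to one or more embodiments.
FIG. 9 is a structural block diagram of the information recognition apparatus according to one or more embodiments.
FIG. 10 is an internal structure diagram of the computer device according to one or more embodiments.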
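Detailed Description
To make the technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
The information recognition method provided in this application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The server 104 obtains the query log sent by the terminal 102, the query log including multiple query sessions; filters according to the query time and the number of queries of the query sessions to obtain the target query log; extracts query features from the target query log and digitizes them to obtain the query vector corresponding to each query session in the target query log; selects and combines the query vectors corresponding to a preset number of query sessions to obtain the target query vector; and calculates the similarity between the target query vector and the preset intent clustering model to obtain the intent class corresponding to the target query vector. The server 104 then combines the query vectors corresponding to the query sessions according to a preset rule to obtain the combined query vectors, starts multiple threads, uses them to calculate in parallel the similarity between each combined query vector and the intent class, and obtains the information recognition result according to these similarities. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.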
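In one of the embodiments, as shown in FIG. 2, an information recognition method is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
S202: Obtain a query log, where the query log includes multiple query sessions.
A query log is log information generated when a user uses a search engine. A query session is a series of consecutive interactions performed by a user within a time interval to satisfy an information need, from submitting a query until submitting the next query or exiting the search engine. For example, a user may submit a navigational query (such as pingan bank), click the official website, and stop searching, which yields the corresponding query session. When stored on the server, the query session is stored as multiple fields, including the query time, the query statement, the click time, the clicked URL (Uniform Resource Locator), and so on. A query log may include multiple query sessions.
Specifically, the server obtains the query logs, which may be different query logs obtained from multiple different terminals. Each query log includes the query sessions generated while the user was searching, and there may be multiple such query sessions.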
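S204: Filter the query log according to the query time and the number of queries of the query log to obtain the target query log.
The query time of a query log is the total time the user spent completing the queries, and the number of queries is the number of queries included in that user's query log.
Specifically, the server compares the time from the start to the end of the user's query log, and the number of query sessions included in the query log, with the preset query time and the preset number of queries. Based on the comparison result, it deletes the query logs that do not meet the preset query time and number of queries, and takes the filtered query logs as the target query log.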
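S206: Extract query features from the target query log, and digitize the query feature information to obtain the query vector corresponding to each query session in the target query log.
Query features are features used to represent a query session. They are set in advance and may include query statement features, clicked-URL features, and combined features, where a combined feature is obtained by combining a query statement feature with a clicked-URL feature.
Specifically, the server extracts the query features of each query session from the target query log and digitizes them to obtain the query vector corresponding to each query session in the target query log. Binarization or tf-idf may be used to digitize the query features into query vectors.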
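The binarization option mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `build_vocabulary` and `binarize` and the sample features are hypothetical, and the source does not specify how the feature vocabulary is built.

```python
from typing import Dict, List, Sequence


def build_vocabulary(feature_lists: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct feature a column index, in first-seen order (assumed convention)."""
    vocab: Dict[str, int] = {}
    for features in feature_lists:
        for feature in features:
            if feature not in vocab:
                vocab[feature] = len(vocab)
    return vocab


def binarize(features: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector: 1 where the session contains the feature, 0 elsewhere."""
    vector = [0] * len(vocab)
    for feature in features:
        index = vocab.get(feature)
        if index is not None:  # features outside the vocabulary are ignored
            vector[index] = 1
    return vector


# Hypothetical per-session features (keywords and URL categories)
session_features = [["dog", "pets"], ["cat", "pets"], ["gym", "fitness"]]
vocab = build_vocabulary(session_features)
query_vectors = [binarize(f, vocab) for f in session_features]
```

Each session becomes a fixed-length vector over the same vocabulary, so sessions that share features (here "pets") overlap in the corresponding column. A tf-idf weighting could replace the 0/1 values without changing the overall shape.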
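S208: Select and combine the query vectors corresponding to a preset number of query sessions to obtain the target query vector.
A query vector is used to represent a query session, and each query session corresponds to one query vector. The preset number is a number set in advance on the server and is smaller than the number of query vectors. It may be set manually, or obtained as the average number of query vectors included in the historical query logs.
Specifically, the server selects the preset number of query vectors from the query vectors corresponding to the query sessions in the target query log, and combines them to obtain the target query vector. The query vectors may be selected in chronological order of the query sessions in the query log, from earliest to latest, until the preset number is reached. For example, let q_1, q_2, q_3, …, q_n be the n query vectors corresponding to n query sessions, and let d be the average number of query sessions included in the historical query logs. If the preset number is set to d, the target query vector may be (q_1, q_2, ..., q_d).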
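A minimal sketch of this selection step is shown below. The function names are hypothetical, and taking the integer part of the historical average and capping it at the number of available vectors are assumptions; the source only states that d may be the historical average and that it is smaller than the number of query vectors.

```python
from typing import List, Sequence, Tuple


def preset_count(historical_session_counts: Sequence[int], available: int) -> int:
    """Derive d from the average number of sessions per historical log (assumed rounding: floor)."""
    average = sum(historical_session_counts) / len(historical_session_counts)
    d = max(1, int(average))
    return min(d, available)  # never ask for more vectors than the log contains


def select_target(vectors_in_time_order: Sequence[Sequence[int]], d: int) -> Tuple[Tuple[int, ...], ...]:
    """Take the d earliest query vectors and keep them together as (q_1, ..., q_d)."""
    return tuple(tuple(v) for v in vectors_in_time_order[:d])


# Hypothetical data: four session vectors, historical logs with 2, 3 and 4 sessions
vectors = [[1, 0], [0, 1], [1, 1], [0, 0]]
d = preset_count([2, 3, 4], available=len(vectors))
target_query_vector = select_target(vectors, d)
```

With these inputs d is 3, so the target query vector groups the first three session vectors in time order.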
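S210: Calculate the similarity between the target query vector and the preset intent clustering model to obtain the intent class corresponding to the target query vector.
The preset intent clustering model is a model of the user's intent classes obtained in advance by clustering the user's historical query logs with a clustering algorithm.
Specifically, the server uses a similarity algorithm to calculate the similarity between the target query vector and the preset intent clustering model and obtains the intent class corresponding to the target query vector, that is, it finds the intent class to which the target query vector belongs. The similarity algorithm may be a Euclidean distance algorithm, a cosine distance algorithm, and so on.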
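The lookup of the intent class can be illustrated with cosine similarity, one of the options named above. The sketch rests on two assumptions that the source does not state: each intent class is represented by a centroid vector, and the grouped session vectors are reduced to one vector by element-wise averaging so that they can be compared with a centroid. All names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length session vectors element-wise (assumed pooling rule)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_intent(target_vectors: Sequence[Sequence[float]],
                   centroids: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the intent class whose centroid is most similar to the pooled target vector."""
    pooled = mean_pool(target_vectors)
    best = max(centroids, key=lambda name: cosine(pooled, centroids[name]))
    return best, cosine(pooled, centroids[best])


# Hypothetical centroids for two intent classes
centroids = {"pets": [1.0, 1.0, 0.0], "fitness": [0.0, 0.0, 1.0]}
intent, score = nearest_intent([[1, 0, 0], [0, 1, 0]], centroids)
```

Euclidean distance could be substituted by selecting the smallest distance instead of the largest similarity.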
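S212: Combine the query vectors corresponding to the query sessions according to a preset rule to obtain the combined query vectors.
The preset rule is a rule, set in advance, for combining the query vectors corresponding to the query sessions. For example, the query vectors corresponding to the query sessions may be combined in sequence, or query vectors may be selected from those corresponding to the query sessions and then combined.
Specifically, the server combines the query vectors corresponding to the query sessions according to the preset rule to obtain the combined query vectors. For example, the first query vector is taken as the first combined query vector; the second query vector is combined with the first combined query vector to obtain the second combined query vector; the third query vector is then combined with the second combined query vector to obtain the third combined query vector; and so on, until all query vectors have been combined, which yields the combined query vectors.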
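S214: Start multiple threads, use the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
Here, information refers to the intent in a user session, and information recognition refers to identifying whether the query intent changes between two adjacent query sessions. For example, one query session asks for information about "dog", while the immediately following query session does not continue to ask about "dog" but asks for information about "gym". These are clearly unrelated intents, that is, the intent has changed between the two query sessions, so the information recognition result is that the information has changed. A thread is the smallest unit that an operating system can schedule for execution.
Specifically, the server starts multiple threads that can run in parallel, uses them to calculate the similarity between each combined query vector and the intent class, compares these similarities, and determines from the comparison result whether the intent information of the query sessions corresponding to the combined query vectors has changed.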
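The multi-threaded similarity step might be organized as in the sketch below, which uses a thread pool from the Python standard library. The pooling rule (element-wise mean), the cosine measure, the function names and the sample data are assumptions for illustration. Note that in standard CPython the global interpreter lock limits the speed-up for pure-Python arithmetic, so the sketch shows the structure of the step rather than a performance claim.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length session vectors element-wise (assumed pooling rule)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def parallel_similarities(combined_vectors, intent_centroid, workers: int = 4) -> List[float]:
    """Score every combined query vector against the intent class using a pool of threads."""
    def score(combined):
        return cosine(mean_pool(combined), intent_centroid)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so index i still refers to combined vector i
        return list(pool.map(score, combined_vectors))


# Hypothetical combined vectors (q_1), (q_1, q_2), (q_1, q_2, q_3)
combined = [
    [[1, 0]],
    [[1, 0], [1, 0]],
    [[1, 0], [1, 0], [0, 1]],
]
sims = parallel_similarities(combined, [1.0, 0.0])
```

In this example the third combined vector includes a session pointing in a different direction, so its similarity to the intent class is lower than that of the first two.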
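In the above information recognition method, query vectors are obtained from the query log, a target query vector is obtained from the query vectors, and the intent class corresponding to the target query vector is obtained. The query vectors are then combined to obtain the combined query vectors, multiple threads are started to calculate in parallel the similarity between each combined query vector and the intent class, and the information recognition result is determined from these similarities. Calculating the similarities in parallel improves the efficiency of the similarity calculation and, in turn, the efficiency of determining information recognition results over large amounts of data.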
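In one of the embodiments, as shown in FIG. 3, step S204, that is, filtering the query log according to the query time and the number of queries of the query log to obtain the target query log, includes the steps:
S302: Find the first query logs, whose query time is greater than a preset time, and delete the first query logs from the query logs.
Specifically, the server searches the users' query logs for logs whose query time is greater than the preset time and takes them as first query logs. A first query log is a log generated when a user queries over a long period, for example a query log whose query time exceeds one hour. The first query logs are deleted from the query logs. Because a user's intent may change frequently during a long period of querying, such a query log does not meet the requirements; deleting it keeps the query log data consistent.
S304: Find the second query logs, whose number of queries is less than a preset number, and delete the second query logs from the query logs to obtain the target query log.
Specifically, the server searches the users' query logs for logs whose number of queries is less than the preset number, that is, logs in which the number of query sessions is less than the number set in advance, takes them as second query logs, and deletes them from the query logs to obtain the target query log. When there are too few query sessions, for example a query log with only one query session, the user's intent cannot have changed at all. Such a query log may also be incomplete and unable to describe the user's whole intent, so it is deleted to keep the query logs consistent.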
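Steps S302 and S304 can be sketched as a simple filter. The data classes, field names, and the thresholds of one hour and two sessions are hypothetical; the source gives "more than one hour" and "only one query session" only as examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuerySession:
    query_time: float  # seconds since the start of the log (hypothetical field)
    query_text: str
    clicked_url: str


@dataclass
class QueryLog:
    user_id: str
    sessions: List[QuerySession]

    def duration(self) -> float:
        """Total time spanned by the log, from the first session to the last."""
        if not self.sessions:
            return 0.0
        times = [s.query_time for s in self.sessions]
        return max(times) - min(times)


def filter_logs(logs: List[QueryLog],
                max_duration: float = 3600.0,
                min_sessions: int = 2) -> List[QueryLog]:
    """Drop logs that run too long (S302) or contain too few sessions (S304)."""
    kept = []
    for log in logs:
        if log.duration() > max_duration:      # first query log: query time too long
            continue
        if len(log.sessions) < min_sessions:   # second query log: too few queries
            continue
        kept.append(log)
    return kept


logs = [
    QueryLog("u1", [QuerySession(0, "dog", "a"), QuerySession(120, "cat", "b")]),
    QueryLog("u2", [QuerySession(0, "dog", "a"), QuerySession(7200, "gym", "c")]),
    QueryLog("u3", [QuerySession(0, "dog", "a")]),
]
target_logs = filter_logs(logs)
```

Only the log of user u1 survives: u2 spans two hours and u3 contains a single session.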
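In one of the embodiments, as shown in FIG. 4, step S206, that is, extracting query feature information from the target query log and digitizing it to obtain the query vector corresponding to each query session in the target query log, includes the steps:
S402: Obtain the query text from each query session of the target query log, and use a bag-of-words strategy to extract the keywords in the query text to obtain the query keyword features.
The query text is the text obtained from the query statement entered by the user, and it can be taken from the query statement field of each query session in the query log. The bag-of-words strategy is a method of extracting features from text for use with machine learning algorithms; a bag of words is a representation of text that describes the occurrence of words within a document.
Specifically, the server obtains the query text from the query statement field of each query session in the target query log and uses the bag-of-words strategy to extract the keywords in the query text to obtain the query keyword features. That is, it obtains the query text of each query session in the target query log and, from it, the query keyword features of each query session.
S404: Obtain the uniform resource locator information from each query session of the target query log, and look up the category directory corresponding to the uniform resource locator information to obtain the uniform resource locator features.
Specifically, the server obtains the uniform resource locator information, that is, the URL information, from the URL field of each query session in the target query log, looks up the category directory corresponding to the URL information in the open category directory search system (ODP), and obtains the uniform resource locator features from that category directory.
S406: Combine the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain the combined features.
Specifically, the server combines the keywords in the query text with the category directory corresponding to the uniform resource locator information and takes the result as the combined features. For example, if the user searches Taobao for "Blue Moon laundry detergent", the corresponding product directory can be found from the URL of the product the user clicked. The category directory may be "Daily necessities-Washing-Laundry detergent", and the keywords may be "Blue Moon" and "laundry detergent". Pairing each item of the category directory with each keyword gives the combined features "Daily necessities-Blue Moon", "Daily necessities-laundry detergent", "Washing-Blue Moon", "Washing-laundry detergent", "Laundry detergent-Blue Moon" and "Laundry detergent-laundry detergent".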
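The pairwise combination in step S406 amounts to a Cartesian product of directory levels and keywords. The sketch below reproduces the laundry detergent example; the function name and the choice of "-" as separator are illustrative assumptions.

```python
from itertools import product
from typing import List, Sequence


def combined_features(category_levels: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """Pair every level of the category directory with every query keyword."""
    return [f"{level}-{keyword}" for level, keyword in product(category_levels, keywords)]


# Category directory found from the clicked URL, split into its levels
category_levels = "Daily necessities-Washing-Laundry detergent".split("-")
keywords = ["Blue Moon", "laundry detergent"]
features = combined_features(category_levels, keywords)
```

Three directory levels and two keywords give six combined features, in the same order as listed in the example above.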
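S408: Obtain the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitize the query features to obtain the query vectors.
Specifically, the server combines the query keyword features, the uniform resource locator feature information and the combined features to obtain the query features corresponding to each query session in the target query log, and digitizes the query features to obtain the query vector corresponding to each query session in the target query log.
In the above embodiment, the query keyword features are obtained from the query statement, the URL features are obtained from the clicked URL, and the combined features are obtained from the query statement and the clicked URL, which makes the resulting query features, and therefore the resulting query vectors, more precise.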
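In one of the embodiments, as shown in FIG. 5, step S212, that is, combining the query vectors corresponding to the query sessions according to a preset rule to obtain the combined query vectors, includes the steps:
S502: Obtain the initial query vector corresponding to the initial query session among the query sessions, and use the initial query vector as the first combined query vector.
The initial query session is the query session in the target query log that corresponds to the start time, where the start time is the time at which the user began querying.
Specifically, the server obtains the initial query session among the query sessions of the target query log, obtains the initial query vector corresponding to the initial query session, and uses the initial query vector as the first combined query vector.
S504: Obtain the query vector immediately adjacent to the first combined query vector, and combine the first combined query vector with the immediately adjacent query vector to obtain the second combined query vector.
Specifically, the server obtains from the query vectors the one immediately adjacent to the initial query vector, that is, the query vector corresponding to the second query session in the target query log, and combines the initial query vector with it to obtain the second combined query vector. For example, if the initial query vector is q_1, the immediately adjacent query vector is q_2, and the second combined query vector is (q_1, q_2).
S506: Take the second combined query vector as the first combined query vector and return to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, which yields the combined query vectors.
Specifically, the server takes the second combined query vector as the first combined query vector and returns to step S504, that is, to the step of obtaining the query vector immediately adjacent to the first combined query vector. When the query vectors corresponding to all query sessions have been combined, that is, when the resulting combined query vector includes the query vectors corresponding to all query sessions, the combined query vectors are obtained. For example, (q_1, q_2) is now taken as the first combined query vector, its immediately adjacent query vector is q_3, and combining them gives the second combined query vector (q_1, q_2, q_3). Then (q_1, q_2, q_3) is taken as the first combined query vector and the step of obtaining the immediately adjacent query vector is repeated, until the query vectors q_1, q_2, ..., q_n corresponding to all query sessions have been combined, that is, until the last combined query vector (q_1, q_2, ..., q_n) is obtained, which yields the combined query vectors.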
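The loop in steps S502 to S506 produces every prefix of the session sequence. A minimal sketch, with a hypothetical function name and with strings standing in for the query vectors:

```python
from typing import List, Sequence, Tuple, TypeVar

Q = TypeVar("Q")


def combine_prefixes(query_vectors: Sequence[Q]) -> List[Tuple[Q, ...]]:
    """Return (q_1), (q_1, q_2), ..., (q_1, ..., q_n) in order."""
    combined: List[Tuple[Q, ...]] = []
    current: Tuple[Q, ...] = ()
    for q in query_vectors:
        current = current + (q,)   # S504: append the immediately adjacent query vector
        combined.append(current)   # S506: the result becomes the next "first" combined vector
    return combined


prefixes = combine_prefixes(["q_1", "q_2", "q_3"])
```

For n query vectors the loop yields n combined query vectors, the last of which contains all of them.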
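In the above embodiment, the query vectors corresponding to the query sessions are combined to obtain the combined query vectors, which makes it convenient to calculate the similarity between the combined query vectors and the intent class.
In one of the embodiments, as shown in FIG. 6, step S214, that is, obtaining the information recognition result according to the similarity between each combined query vector and the intent class, includes the steps:
S602: Obtain the first similarity between the first combined query vector and the intent class, and obtain the second similarity between the second combined query vector and the intent class.
The first combined query vector is the combined query vector obtained from the initial query vector, and the second combined query vector is obtained by combining the initial query vector with the query vector immediately adjacent to it.
Specifically, the server obtains the first similarity between the first combined query vector and the intent class and the second similarity between the second combined query vector and the intent class.
S604: Compare the first similarity with the second similarity; when the first similarity exceeds the second similarity, it is determined that the information has changed between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector.
Specifically, the server compares the first similarity with the second similarity. When the first similarity exceeds the second similarity, it determines that the information has changed between the query session corresponding to the first combined query vector, that is, the initial query session, and the query session corresponding to the second combined query vector, that is, the immediately adjacent query session. In other words, the intent has changed: the intent information queried in the initial query session differs from that queried in the adjacent query session. The information recognition result may be written to and saved in an information change record table, which records the results of information recognition and includes a field for the query session before the information change and a field for the query session after the information change.
When the first similarity does not exceed the second similarity, the intent information has not changed between the query session corresponding to the first combined query vector, that is, the initial query session, and the query session corresponding to the second combined query vector, that is, the immediately adjacent query session, and no processing is performed.
In one of the embodiments, the similarities of all combined query vectors are compared in sequence to obtain the intent information recognition result between each query session and its adjacent query session, and the recognition results are written to and saved in the information change record table.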
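The sequential comparison can be sketched as below. The input is the list of similarities between each combined query vector and the intent class, in order; the function name and the dictionary layout of the record table are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def detect_intent_changes(similarities: Sequence[float]) -> List[Dict[str, int]]:
    """Compare each similarity with the next one and record where it drops.

    similarities[i] belongs to the combined query vector that ends at session i
    (0-based), so a drop from i to i + 1 marks a change between those two sessions.
    """
    record_table: List[Dict[str, int]] = []
    for i in range(len(similarities) - 1):
        first, second = similarities[i], similarities[i + 1]
        if first > second:  # S604: first similarity exceeds the second
            record_table.append({"before_session": i, "after_session": i + 1})
        # otherwise the intent is treated as unchanged and nothing is recorded
    return record_table


changes = detect_intent_changes([0.90, 0.95, 0.70, 0.70])
```

Here only the step from the second to the third combined query vector lowers the similarity, so a single change is recorded between sessions 1 and 2 (0-based).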
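In the above embodiment, the first similarity between the first combined query vector and the intent class is compared with the second similarity between the second combined query vector and the intent class to obtain the intent change result between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector, which improves the accuracy of detecting intent changes.
In one of the embodiments, as shown in FIG. 7, after step S214, that is, after the query vectors corresponding to the query sessions have been combined according to the preset rule to obtain the combined query vectors and the similarity between each combined query vector and the intent class has been calculated to obtain the information recognition result, the method further includes the steps:
S702: Obtain, from the query log and according to the information recognition result, the query sessions before the information change and the query sessions after the information change.
The query sessions before the information change are all query sessions in the query log from before the intent changed. The query sessions after the information change are all query sessions that correspond to the changed intent, after the intent changed. For example, a query log includes user sessions a1, a2, a3, a4 and a5, and the intent changes only between user session a3 and user session a4. In this case, the query sessions before the information change include user sessions a1, a2 and a3, and the query sessions after the information change include user sessions a4 and a5.
Specifically, the server obtains, from the query log and according to the information recognition result, the query sessions before the information change and the query sessions after the information change. For example, the obtained query sessions before the information change include user sessions a1, a2 and a3, and the query sessions after the information change include user sessions a4 and a5.
S704: Obtain the query times in the query sessions before the information change and in the query sessions after the information change, and obtain, according to the query times, the weight of the intent corresponding to the query sessions before the information change and the weight of the intent corresponding to the query sessions after the information change.
Specifically, the query times in the query sessions before the information change and in the query sessions after the information change are obtained, and the two weights are determined according to the magnitude of the query times. For example, if the query sessions before the information change include user sessions a1, a2 and a3, the query time of the corresponding intent is the query time s1 of a1 plus the query time s2 of a2 plus the query time s3 of a3. If the query sessions after the information change include user sessions a4 and a5, the query time of the corresponding intent is the query time s4 of a4 plus the query time s5 of a5. The corresponding weights are obtained according to the magnitude of these query times.
S706: Compare the weight of the intent corresponding to the query sessions before the information change with the weight of the intent corresponding to the query sessions after the information change, obtain the corresponding recommended information according to the comparison result, and push the recommended information to the query terminal.
Specifically, the server compares the two weights. When the weight of the intent corresponding to the query sessions before the information change is greater than the weight of the intent corresponding to the query sessions after the information change, it obtains the recommended information for the intent of the query sessions before the information change and pushes it to the query terminal. When the weight of the intent corresponding to the query sessions before the information change is less than the weight of the intent corresponding to the query sessions after the information change, it obtains the recommended information for the intent of the query sessions after the information change and pushes it to the query terminal.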
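Steps S704 and S706 can be illustrated as follows. The sketch assumes the weight of an intent is simply the total query time of its sessions, and that a tie falls back to the more recent intent; the source states only that the weight follows from the magnitude of the query time and does not cover the tie case. Function names and sample values are hypothetical.

```python
from typing import Sequence


def intent_weight(session_query_times: Sequence[float]) -> float:
    """Assumed weighting: total query time spent on the intent (s1 + s2 + ...)."""
    return sum(session_query_times)


def choose_recommendation(times_before: Sequence[float],
                          times_after: Sequence[float],
                          recommendation_before: str,
                          recommendation_after: str) -> str:
    """Pick the recommendation of the intent with the larger weight (S706)."""
    weight_before = intent_weight(times_before)
    weight_after = intent_weight(times_after)
    if weight_before > weight_after:
        return recommendation_before
    # Ties are not specified in the source; this sketch prefers the newer intent.
    return recommendation_after


# Sessions a1..a3 precede the change, a4..a5 follow it (query times in seconds)
pushed = choose_recommendation([30, 45, 20], [60, 50], "dog items", "gym items")
```

The earlier intent accumulates 95 seconds and the later one 110 seconds, so the recommendation for the later intent is selected.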
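In the above embodiment, the intent weight corresponding to the query sessions is determined according to their query time, and the recommended information is determined according to the magnitude of that weight, which makes the resulting recommended information more precise and better matched to the user's needs.
In one of the embodiments, as shown in FIG. 8, the steps of generating the historical intent clustering result include:
S802: Obtain historical query logs, and filter them according to the query time and the number of queries of each historical query log to obtain the target historical query log.
Specifically, the server obtains the historical query logs and, according to the query time of each historical query log, deletes those whose query time is greater than the preset threshold. According to the number of queries of each historical query log, it deletes those whose number of queries is less than the preset number, which yields the target historical query log.
S804: Extract the historical query features of each historical query session in the target historical query log, and digitize the historical query feature information to obtain the historical query vector corresponding to each historical query session.
Specifically, the server extracts the historical query features of each historical query session in the target historical query log. The historical query feature information includes keyword feature information, URL feature information, and combined keyword and URL feature information. The server digitizes the historical query feature information to obtain the historical query vector corresponding to each historical query session.
S806: Perform clustering with a hierarchical clustering algorithm according to the historical query vectors; when the clustering is complete, the preset intent clustering model is obtained.
Specifically, the server performs clustering with a hierarchical clustering algorithm according to the historical query vectors, and when the clustering is complete, the historical intent clustering result is obtained. Clustering is complete when all historical query vectors have been assigned to a preset number of cluster categories.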
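Step S806 names only "a hierarchical clustering algorithm" and a preset number of cluster categories. The sketch below is one possible reading: bottom-up (agglomerative) clustering that repeatedly merges the two clusters with the closest centroids, measured by Euclidean distance, until the preset number remains. The linkage rule, distance measure, names and sample data are all assumptions.

```python
import math
from typing import List, Sequence


def centroid(cluster: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the vectors in a cluster."""
    count = len(cluster)
    return [sum(column) / count for column in zip(*cluster)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(vectors: Sequence[Sequence[float]], k: int) -> List[List[List[float]]]:
    """Merge the closest pair of clusters until only k clusters remain."""
    k = max(1, k)
    clusters = [[list(v)] for v in vectors]  # start with one cluster per vector
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical historical query vectors forming two obvious groups
history = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
clusters = agglomerative(history, k=2)
intent_centroids = [centroid(c) for c in clusters]
```

The resulting centroids can then serve as the intent classes against which target and combined query vectors are compared. The pairwise search makes this sketch cubic in the number of vectors, so it is suitable only for small illustrations.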
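In the above embodiment, the target historical query log is obtained by filtering the historical query logs, and clustering is performed with a hierarchical clustering algorithm according to the target historical query log to obtain the historical intent clustering result. Because the historical intent clustering result is obtained in advance, it can be called directly during information recognition, which is convenient and fast and improves the efficiency of obtaining the information recognition result.
It should be understood that although the steps in the flowcharts of FIGS. 2-8 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-8 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.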
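In one of the embodiments, as shown in FIG. 9, an information recognition apparatus 900 is provided, including: a log obtaining module 902, a filtering module 904, a feature extraction module 906, a target vector selection module 908, an intent class obtaining module 910, a vector combination module 912 and an information recognition module 914, where:
the log obtaining module 902 is configured to obtain a query log, the query log including multiple query sessions;
the filtering module 904 is configured to filter according to the query time and the number of queries of the query sessions to obtain the target query log;
the feature extraction module 906 is configured to extract query feature information from the target query log, and digitize the query feature information to obtain the query vector corresponding to each query session in the target query log;
the target vector selection module 908 is configured to select and combine the query vectors corresponding to a preset number of query sessions to obtain the target query vector;
the intent class obtaining module 910 is configured to calculate the similarity between the target query vector and the historical intent clustering result to obtain the intent class corresponding to the target query vector;
the vector combination module 912 is configured to combine the query vectors corresponding to the query sessions according to a preset rule to obtain the combined query vectors;
the information recognition module 914 is configured to start multiple threads, use the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtain the information recognition result according to the similarity between each combined query vector and the intent class.
In one of the embodiments, the filtering module 904 includes:
a first log deletion module, configured to find the first query log, whose query time is greater than the preset time, and delete the first query log from the query log;
a second log deletion module, configured to find the second query log, whose number of queries is less than the preset number, and delete the second query log from the query log to obtain the target query log.
In one of the embodiments, the feature extraction module 906 includes:
a word extraction module, configured to obtain the query text from each query session of the target query log, and use a bag-of-words strategy to extract the keywords in the query text to obtain the query keyword features;
a category directory obtaining module, configured to obtain the uniform resource locator information from each query session of the target query log, and look up the category directory corresponding to the uniform resource locator information to obtain the uniform resource locator features;
a combined feature obtaining module, configured to combine the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain the combined features;
a query feature obtaining module, configured to obtain the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitize the query feature information to obtain the query vector corresponding to each query session in the target query log.
In one of the embodiments, the vector combination module 912 includes:
a first combined query vector obtaining module, configured to obtain the initial query vector corresponding to the initial query session among the query sessions, and use the initial query vector as the first combined query vector;
a second combined query vector obtaining module, configured to obtain the query vector immediately adjacent to the first combined query vector, and combine the first combined query vector with the immediately adjacent query vector to obtain the second combined query vector;
a loop module, configured to take the second combined query vector as the first combined query vector and return to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, which yields the combined query vectors.
In one of the embodiments, the information recognition module 914 includes:
a similarity obtaining module, configured to obtain the first similarity between the first combined query vector and the intent class and the second similarity between the second combined query vector and the intent class;
a similarity comparison module, configured to compare the first similarity with the second similarity and, when the first similarity exceeds the second similarity, determine that the information has changed between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector.
In one of the embodiments, the information recognition apparatus 900 further includes:
a session obtaining module, configured to obtain, from the query log and according to the information recognition result, the query sessions before the information change and the query sessions after the information change;
a weight calculation module, configured to obtain the query times in the query sessions before the information change and in the query sessions after the information change, and obtain, according to the query times, the weight of the intent corresponding to the query sessions before the information change and the weight of the intent corresponding to the query sessions after the information change;
a weight comparison module, configured to compare the weight of the intent corresponding to the query sessions before the information change with the weight of the intent corresponding to the query sessions after the information change, obtain the corresponding recommended information according to the comparison result, and push the recommended information to the query terminal.
In one of the embodiments, the information recognition apparatus 900 includes:
a historical log obtaining module, configured to obtain historical query logs, and filter them according to the query time and the number of queries of each historical query log to obtain the target historical query log;
a historical vector obtaining module, configured to extract the historical query features of each historical query session in the target historical query log, and digitize the historical query feature information to obtain the historical query vector corresponding to each historical query session;
a clustering module, configured to perform clustering with a hierarchical clustering algorithm according to the historical query vectors and, when the clustering is complete, obtain the historical intent clustering result.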
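For the specific limitations of the information recognition apparatus, refer to the limitations of the information recognition method above, which are not repeated here. Each module in the above information recognition apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device is used to store query log data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions, when executed by the processor, implement an information recognition method.
A person skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the processor, implement the steps of the information recognition method provided in any embodiment of this application.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the information recognition method provided in any embodiment of this application.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.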

Claims (20)

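  1. An information recognition method, including:
    obtaining a query log, the query log including multiple query sessions;
    filtering the query log according to the query time and the number of queries of the query log to obtain a target query log;
    extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
    selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
    calculating the similarity between the target query vector and a historical intent clustering result to obtain the intent class corresponding to the target query vector;
    combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
    starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
  2. The method according to claim 1, wherein filtering the query log according to the query time and the number of queries of the query log to obtain the target query log includes:
    finding a first query log, whose query time is greater than a preset time, and deleting the first query log from the query log; and
    finding a second query log, whose number of queries is less than a preset number, and deleting the second query log from the query log to obtain the target query log.
  3. The method according to claim 1, wherein extracting query features from the target query log and digitizing the query features to obtain the query vector corresponding to each query session in the target query log includes:
    obtaining query text from each query session of the target query log, and using a bag-of-words strategy to extract the keywords in the query text to obtain query keyword features;
    obtaining uniform resource locator information from each query session of the target query log, and looking up the category directory corresponding to the uniform resource locator information to obtain uniform resource locator features;
    combining the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain combined features; and
    obtaining the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitizing the query features to obtain the query vector corresponding to each query session in the target query log.
  4. The method according to claim 1, wherein combining the query vectors corresponding to the query sessions according to a preset rule to obtain the combined query vectors includes:
    obtaining the initial query vector corresponding to the initial query session among the query sessions, and using the initial query vector as a first combined query vector;
    obtaining the query vector immediately adjacent to the first combined query vector, and combining the first combined query vector with the immediately adjacent query vector to obtain a second combined query vector; and
    taking the second combined query vector as the first combined query vector and returning to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, to obtain the combined query vectors.
  5. The method according to claim 1, wherein obtaining the information recognition result according to the similarity between each combined query vector and the intent class includes:
    obtaining a first similarity between a first combined query vector and the intent class, and obtaining a second similarity between a second combined query vector and the intent class; and
    comparing the first similarity with the second similarity and, when the first similarity exceeds the second similarity, determining that the information has changed between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector.
  6. The method according to claim 1, wherein after starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining the information recognition result according to the similarity between each combined query vector and the intent class, the method further includes:
    obtaining, from the query log and according to the information recognition result, the query sessions before the information change and the query sessions after the information change;
    obtaining the query times in the query sessions before the information change and in the query sessions after the information change, and obtaining, according to the query times, the weight of the intent corresponding to the query sessions before the information change and the weight of the intent corresponding to the query sessions after the information change; and
    comparing the weight of the intent corresponding to the query sessions before the information change with the weight of the intent corresponding to the query sessions after the information change, obtaining the corresponding recommended information according to the comparison result, and pushing the recommended information to the query terminal.
  7. The method according to claim 1, wherein the steps of generating the historical intent clustering result include:
    obtaining historical query logs, and filtering the historical query logs according to the query time and the number of queries of each historical query log to obtain a target historical query log;
    extracting the historical query features of each historical query session in the target historical query log, and digitizing the historical query features to obtain the historical query vector corresponding to each historical query session; and
    performing clustering with a hierarchical clustering algorithm according to the historical query vectors and, when the clustering is complete, obtaining the historical intent clustering result.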
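  8. An information recognition apparatus, including:
    a log obtaining module, configured to obtain a query log, the query log including multiple query sessions;
    a filtering module, configured to filter according to the query time and the number of queries of the query sessions to obtain a target query log;
    a feature extraction module, configured to extract query features from the target query log, and digitize the query features to obtain a query vector corresponding to each query session in the target query log;
    a target vector selection module, configured to select and combine the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
    an intent class obtaining module, configured to calculate the similarity between the target query vector and a historical intent clustering result to obtain the intent class corresponding to the target query vector;
    a vector combination module, configured to combine the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
    an information recognition module, configured to start multiple threads, use the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtain an information recognition result according to the similarity between each combined query vector and the intent class.
  9. The apparatus according to claim 8, wherein the filtering module includes:
    a first log deletion module, configured to find a first query log, whose query time is greater than a preset time, and delete the first query log from the query log;
    a second log deletion module, configured to find a second query log, whose number of queries is less than a preset number, and delete the second query log from the query log to obtain the target query log.
  10. The apparatus according to claim 8, wherein the feature extraction module includes:
    a word extraction module, configured to obtain query text from each query session of the target query log, and use a bag-of-words strategy to extract the keywords in the query text to obtain query keyword features;
    a category directory obtaining module, configured to obtain uniform resource locator information from each query session of the target query log, and look up the category directory corresponding to the uniform resource locator information to obtain uniform resource locator features;
    a combined feature obtaining module, configured to combine the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain combined features;
    a query feature obtaining module, configured to obtain the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitize the query feature information to obtain the query vector corresponding to each query session in the target query log.
  11. The apparatus according to claim 8, wherein the vector combination module includes:
    a first combined query vector obtaining module, configured to obtain the initial query vector corresponding to the initial query session among the query sessions, and use the initial query vector as a first combined query vector;
    a second combined query vector obtaining module, configured to obtain the query vector immediately adjacent to the first combined query vector, and combine the first combined query vector with the immediately adjacent query vector to obtain a second combined query vector;
    a loop module, configured to take the second combined query vector as the first combined query vector and return to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, to obtain the combined query vectors.
  12. The apparatus according to claim 8, wherein the information recognition module includes:
    a similarity obtaining module, configured to obtain a first similarity between a first combined query vector and the intent class and a second similarity between a second combined query vector and the intent class;
    a similarity comparison module, configured to compare the first similarity with the second similarity and, when the first similarity exceeds the second similarity, determine that the information has changed between the query session corresponding to the first combined query vector and the query session corresponding to the second combined query vector.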
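  13. A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    obtaining a query log, the query log including multiple query sessions;
    filtering the query log according to the query time and the number of queries of the query log to obtain a target query log;
    extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
    selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
    calculating the similarity between the target query vector and a historical intent clustering result to obtain the intent class corresponding to the target query vector;
    combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
    starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    finding a first query log, whose query time is greater than a preset time, and deleting the first query log from the query log; and
    finding a second query log, whose number of queries is less than a preset number, and deleting the second query log from the query log to obtain the target query log.
  15. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining query text from each query session of the target query log, and using a bag-of-words strategy to extract the keywords in the query text to obtain query keyword features;
    obtaining uniform resource locator information from each query session of the target query log, and looking up the category directory corresponding to the uniform resource locator information to obtain uniform resource locator features;
    combining the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain combined features; and
    obtaining the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitizing the query features to obtain the query vector corresponding to each query session in the target query log.
  16. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further performs the following steps:
    obtaining the initial query vector corresponding to the initial query session among the query sessions, and using the initial query vector as a first combined query vector;
    obtaining the query vector immediately adjacent to the first combined query vector, and combining the first combined query vector with the immediately adjacent query vector to obtain a second combined query vector; and
    taking the second combined query vector as the first combined query vector and returning to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, to obtain the combined query vectors.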
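  17. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a query log, the query log including multiple query sessions;
    filtering the query log according to the query time and the number of queries of the query log to obtain a target query log;
    extracting query features from the target query log, and digitizing the query features to obtain a query vector corresponding to each query session in the target query log;
    selecting and combining the query vectors corresponding to a preset number of query sessions to obtain a target query vector;
    calculating the similarity between the target query vector and a historical intent clustering result to obtain the intent class corresponding to the target query vector;
    combining the query vectors corresponding to the query sessions according to a preset rule to obtain combined query vectors; and
    starting multiple threads, using the multiple threads to calculate in parallel the similarity between each combined query vector and the intent class, and obtaining an information recognition result according to the similarity between each combined query vector and the intent class.
  18. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    finding a first query log, whose query time is greater than a preset time, and deleting the first query log from the query log; and
    finding a second query log, whose number of queries is less than a preset number, and deleting the second query log from the query log to obtain the target query log.
  19. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining query text from each query session of the target query log, and using a bag-of-words strategy to extract the keywords in the query text to obtain query keyword features;
    obtaining uniform resource locator information from each query session of the target query log, and looking up the category directory corresponding to the uniform resource locator information to obtain uniform resource locator features;
    combining the keywords in the query text with the category directory corresponding to the uniform resource locator information to obtain combined features; and
    obtaining the query features according to the query keyword features, the uniform resource locator features and the combined features, and digitizing the query features to obtain the query vector corresponding to each query session in the target query log.
  20. The storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further cause the following steps to be performed:
    obtaining the initial query vector corresponding to the initial query session among the query sessions, and using the initial query vector as a first combined query vector;
    obtaining the query vector immediately adjacent to the first combined query vector, and combining the first combined query vector with the immediately adjacent query vector to obtain a second combined query vector; and
    taking the second combined query vector as the first combined query vector and returning to the step of obtaining the query vector immediately adjacent to the first combined query vector, until the query vectors corresponding to all query sessions have been combined, to obtain the combined query vectors.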
PCT/CN2019/116508 2019-07-23 2019-11-08 Information identification method, device, computer equipment and storage medium WO2021012483A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910666381.XA CN110555165B (zh) 2019-07-23 2019-07-23 Information identification method, device, computer equipment and storage medium
CN201910666381.X 2019-07-23

Publications (1)

Publication Number Publication Date
WO2021012483A1 (zh) 2021-01-28

Family

ID=68735838

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116508 WO2021012483A1 (zh) 2019-07-23 2019-11-08 Information identification method, device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110555165B (zh)
WO (1) WO2021012483A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
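CN111079448A (zh) * 2019-12-31 2020-04-28 出门问问信息科技有限公司 Intent recognition method and device
CN112070416B (zh) * 2019-12-31 2024-04-16 北京来也网络科技有限公司 AI-based RPA process generation method, device, equipment and medium
CN112214588B (zh) * 2020-10-16 2024-04-02 深圳赛安特技术服务有限公司 Multi-intent recognition method, device, electronic equipment and storage medium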

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090319517A1 (en) * 2008-06-23 2009-12-24 Google Inc. Query identification and association
CN102609433A (zh) * 2011-12-16 2012-07-25 北京大学 Method and system for query recommendation based on user logs
CN107256267A (zh) * 2017-06-19 2017-10-17 北京百度网讯科技有限公司 Query method and device
CN108304444A (zh) * 2017-11-30 2018-07-20 腾讯科技(深圳)有限公司 Information query method and device
CN109145934A (zh) * 2017-12-22 2019-01-04 北京数安鑫云信息技术有限公司 Log-based user behavior data processing method, medium, equipment and device
CN109583472A (zh) * 2018-10-30 2019-04-05 中国科学院计算技术研究所 Web log user identification method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7877389B2 (en) * 2007-12-14 2011-01-25 Yahoo, Inc. Segmentation of search topics in query logs
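CN104217030B (zh) * 2014-09-28 2018-12-11 北京奇虎科技有限公司 Method and device for classifying users according to server search log data
CN104217031B (zh) * 2014-09-28 2019-08-02 北京奇虎科技有限公司 Method and device for classifying users according to server search log data
CN109145213B (zh) * 2018-08-22 2020-07-28 清华大学 Query recommendation method and device based on historical information
CN109857848A (zh) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Interactive content generation method, device, computer equipment and storage medium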

Also Published As

Publication number Publication date
CN110555165B (zh) 2023-04-07
CN110555165A (zh) 2019-12-10

Similar Documents

Publication Publication Date Title
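WO2021004333A1 (zh) Knowledge graph-based event processing method, device, equipment and storage medium
CN110765275B (zh) Search method, device, computer equipment and storage medium
CN109446302B (zh) Machine learning-based question-and-answer data processing method, device and computer equipment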
US11176124B2 (en) Managing a search
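WO2019136993A1 (zh) Text similarity calculation method, device, computer equipment and storage medium
WO2020057022A1 (zh) Associative recommendation method, device, computer equipment and storage medium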
US20180165370A1 (en) Methods and systems for object recognition
US20160034512A1 (en) Context-based metadata generation and automatic annotation of electronic media in a computer network
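WO2021012483A1 (zh) Information identification method, device, computer equipment and storage medium
CN110046298B (zh) Query word recommendation method, device, terminal equipment and computer-readable medium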
US9720979B2 (en) Method and system of identifying relevant content snippets that include additional information
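CN110377558B (zh) Document query method, device, computer equipment and storage medium
WO2021012790A1 (zh) Page data generation method, device, computer equipment and storage medium
CN112015900B (zh) Medical attribute knowledge graph construction method, device, equipment and medium
CN111177405A (zh) Data search and matching method, device, computer equipment and storage medium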
US20140379719A1 (en) System and method for tagging and searching documents
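WO2020020287A1 (zh) Method, device, equipment and readable storage medium for obtaining text similarity
CN112651236B (zh) Method, device, computer equipment and storage medium for extracting text information
CN112560444A (zh) Text processing method, device, computer equipment and storage medium
CN110532229B (zh) Evidence file retrieval method, device, computer equipment and storage medium
CN109086386B (zh) Data processing method, device, computer equipment and storage medium
CN109656947B (zh) Data query method, device, computer equipment and storage medium
CN110688516A (zh) Image retrieval method, device, computer equipment and storage medium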
US11507593B2 (en) System and method for generating queryeable structured document from an unstructured document using machine learning
CN113676505B (zh) Information push method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19938796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19938796

Country of ref document: EP

Kind code of ref document: A1