CN106874492B - Searching method and device - Google Patents

Searching method and device

Info

Publication number
CN106874492B
Authority
CN
China
Prior art keywords
site
search
user
search keyword
tuples
Legal status
Active
Application number
CN201710099795.XA
Other languages
Chinese (zh)
Other versions
CN106874492A (en)
Inventor
寿如阳
朱健
林睿
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710099795.XA
Publication of CN106874492A
Application granted
Publication of CN106874492B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a search method and a search device. One embodiment of the method comprises: performing text segmentation on the search keywords a user enters in a search engine, and combining the resulting segmented words into a plurality of user search keyword tuples; finding, among a plurality of in-site search keyword tuples, the in-site search keyword tuples that match each user search keyword tuple; selecting, from the found in-site search keyword tuples, those that meet preset conditions, and selecting core words from the selected tuples; and presenting to the user the in-site search results corresponding to the core words. Core words that better represent the user's interests and intentions are mined from user search keywords, such as those imported from a search engine; the core words are then used to search within a site, such as an e-commerce site, so that search results for the items the user is interested in, such as commodities on the e-commerce site, are presented to the user.

Description

Searching method and device
Technical Field
The application relates to the field of the Internet, in particular to the field of search, and more specifically to a search method and a search device.
Background
Search engines can bring more clicks and attention to partner e-commerce sites through means such as Search Engine Marketing (SEM). An e-commerce company directs search engine users to its site by purchasing targeted keywords on the search engine. The e-commerce site can provide an intermediate page as the entry point for traffic imported from the search engine, to stimulate the user's interest in purchasing. At present, a user search keyword imported from a search engine is generally searched directly in the commodity retrieval system of the e-commerce site, and the search results are presented to the user on the intermediate page.
However, the knowledge systems of the search engine and of the e-commerce site's commodity retrieval system differ significantly. The search engine serves more general application scenarios and, when searching, tends to weigh general properties such as the popularity of information, whereas the commodity retrieval system of the e-commerce site is deeply optimized for its commodity set and tends to restrict the retrieval target to known commodities. Consequently, when a user search keyword imported from a search engine is searched directly in the commodity retrieval system of the e-commerce site, the commodities the user is interested in are difficult to retrieve and cannot be presented on the intermediate page, which harms both the user experience and the final conversion.
Disclosure of Invention
The present application provides a search method and apparatus to solve the technical problems described in the Background section.
In a first aspect, the present application provides a search method, comprising: performing text segmentation on user search keywords input by a user in a search engine, and combining the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples; finding, from a plurality of in-site search keyword tuples, the in-site search keyword tuples that match each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining segmented words obtained by performing text segmentation on in-site search keywords input by users in a site; selecting, from the found in-site search keyword tuples, the in-site search keyword tuples that meet preset conditions, and selecting core words from the selected in-site search keyword tuples, wherein the preset conditions comprise: the strength of the search intent for at least one category within the corresponding site is greater than a threshold; and presenting the in-site search results corresponding to the core words to the user.
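The preset condition above leaves open how the "strength of the search intent" for a category is measured. A minimal sketch, assuming (hypothetically) that strength is the share of a tuple's historical conversions that land in a given category, with an illustrative threshold of 0.5; the function name `meets_preset_condition` and the default threshold are assumptions, not taken from the application:

```python
from typing import Mapping


def meets_preset_condition(category_counts: Mapping[str, int],
                           threshold: float = 0.5) -> bool:
    """Return True if at least one category's share of the tuple's historical
    conversions exceeds `threshold` (hypothetical intent-strength measure)."""
    total = sum(category_counts.values())
    if total == 0:
        return False  # no conversion history: no measurable intent
    return any(count / total > threshold for count in category_counts.values())


print(meets_preset_condition({"mobile phones": 8, "phone cases": 2}))
print(meets_preset_condition({"phones": 1, "books": 1, "food": 1}))
```

Under this assumption, a tuple whose conversions split 8:2 between two categories passes, while one spread evenly over three categories does not.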
In a second aspect, the present application provides a search apparatus, comprising: a processing unit configured to perform text segmentation on user search keywords input by a user in a search engine, and to combine the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples; a searching unit configured to find, from a plurality of in-site search keyword tuples, the in-site search keyword tuples that match each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining segmented words obtained by text segmentation of in-site search keywords input by users in a site; a core word screening unit configured to select, from the found in-site search keyword tuples, the in-site search keyword tuples that meet preset conditions, and to select core words from the selected in-site search keyword tuples, wherein the preset conditions comprise: the strength of the search intent for at least one category within the corresponding site is greater than a threshold; and an in-site search unit configured to present the in-site search results corresponding to the core words to the user.
According to the search method and device provided by the present application, text segmentation is performed on the user search keywords input by a user in a search engine, and the segmented words are combined to obtain a plurality of user search keyword tuples; the in-site search keyword tuples matching each user search keyword tuple are found among a plurality of in-site search keyword tuples; the in-site search keyword tuples meeting preset conditions are selected from those found, and core words are selected from them; and the in-site search results corresponding to the core words are presented to the user. In this way, core words that better represent the user's interests and intentions are mined from user search keywords, such as those imported from a search engine, and are used to search within a site, such as an e-commerce site, so that results for the items the user is interested in are presented to the user.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which the search method of the present application may be applied;
FIG. 2 shows a flow chart of the search method of the present application;
FIG. 3 illustrates an exemplary flow chart of a search method of the present application;
FIG. 4 shows a schematic structural diagram of the search device of the present application.
Detailed Description of Embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in combination with the embodiments.
FIG. 1 shows an exemplary system architecture to which the search method of the present application can be applied.
As shown in FIG. 1, the system architecture may include a search engine 101, a network 102, and a site 103. Network 102 provides the medium for the transmission link between search engine 101 and site 103. Network 102 may include various connection types, such as wired or wireless transmission links or fiber optic cables.
Search engine 101 may import traffic to site 103. For example, site 103 may be an e-commerce site, and search engine 101 may import traffic to site 103 through a search engine marketing tool. A server of site 103 can mine, from the user search keywords imported by search engine 101, core words that better represent the user's interests and intentions, and can search the retrieval system of site 103 using those core words to obtain search results the user is interested in, which are then presented to the user on the intermediate search page.
Referring to FIG. 2, a flowchart of the search method of the present application is shown. The method may be performed by a server, such as a server of site 103 in FIG. 1; accordingly, the search apparatus may be provided in that server. The method comprises the following steps:
Step 201: process the search keywords input by the user in the search engine.
Take as an example a site that is an e-commerce site and a search engine that imports traffic to it. The goal is to rewrite the user search keyword imported by the search engine into core words that better represent the user's interests and intentions, and then to search the commodity retrieval system of the e-commerce site with those core words so that commodities the user is interested in are returned. To this end, the user search keyword imported by the search engine may first be obtained. Text segmentation may then be performed on it to obtain a plurality of segmented words, and the segmented words may be combined to obtain user search keyword tuples.
In some embodiments, because out-of-vocabulary (unregistered) words affect the quality of text segmentation, a preset vocabulary containing the site's out-of-vocabulary words may be prepared in advance. During segmentation, in addition to a dictionary of common words, this preset vocabulary allows the out-of-vocabulary words in the user search keyword to be segmented accurately.
Taking an e-commerce site as an example, the in-site search words users enter when searching within the site usually relate to commodity names, brands, and the like. Most of these words are out-of-vocabulary, yet they express a strong search intent toward commodities. To improve segmentation quality, preset vocabularies such as a category vocabulary, a commodity vocabulary, and a brand vocabulary may be updated regularly; they contain, respectively, the keywords that indicate commodity categories, commodity names, and commodity brands in the e-commerce site. When the user search keywords are segmented, keywords representing commodity categories, names, and brands can then be segmented accurately according to the regularly updated vocabularies, which improves the accuracy of text segmentation.
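The application does not name a particular segmentation algorithm. As an illustration only, the sketch below uses dictionary-based forward maximum matching (a common technique, swapped in here) over the union of a general dictionary and the regularly updated site vocabularies; the function name and the vocabulary contents are hypothetical:

```python
from typing import Iterable, List


def fmm_segment(text: str, vocab: Iterable[str], max_len: int = 8) -> List[str]:
    """Forward maximum matching: at each position, take the longest word found in
    `vocab`; if none matches, emit a single character and move on."""
    words = set(vocab)
    longest = min(max_len, max((len(w) for w in words), default=1))
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # whitespace only separates tokens
            i += 1
            continue
        match = text[i]  # fallback: a single character
        # Try the longest possible window first, shrinking down to two characters.
        for size in range(min(longest, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in words:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical vocabularies: a general dictionary plus site vocabularies that
# would be refreshed regularly from category, commodity, and brand lists.
general_vocab = {"小米", "手机"}
site_vocab = {"小米6", "手机壳"}

print(fmm_segment("小米6手机壳", general_vocab))               # ['小米', '6', '手机', '壳']
print(fmm_segment("小米6手机壳", general_vocab | site_vocab))  # ['小米6', '手机壳']
```

With the general dictionary alone, the product name "小米6" (Xiaomi 6) and "手机壳" (phone case) are broken apart; adding the site vocabularies keeps these out-of-vocabulary terms whole.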
In some embodiments, after text segmentation of the user search keyword yields a plurality of segmented words, an N-tuple (N-gram) model may be used to combine the segmented words into user search keyword tuples.
For example, suppose the segmented words obtained from a user search keyword include "apple". Depending on context, "apple" may refer to a brand or to a commodity. The segmented words may therefore be combined using the N-gram model: each segmented word, together with consecutive adjacent words to its left and right, forms a user search keyword tuple, and the maximum length N is an adjustable parameter. A user search keyword tuple thus contains a phrase plus some context, which helps bring out the user's search intent. For example, if a user search keyword tuple contains both "apple" and "mobile phone", it can be determined that "apple" refers to a brand, and the user's search intent can be identified more precisely as a mobile phone of that brand.
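One reading of this combination step, sketched under the assumption that a tuple is any run of 1 to N consecutive segmented words (contiguous n-grams); `keyword_tuples` and `max_n` are illustrative names, not taken from the application:

```python
from typing import List, Tuple


def keyword_tuples(tokens: List[str], max_n: int = 3) -> List[Tuple[str, ...]]:
    """Return every run of 1..max_n consecutive tokens (contiguous n-grams),
    shortest runs first, each run in left-to-right order."""
    result: List[Tuple[str, ...]] = []
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            result.append(tuple(tokens[start:start + n]))
    return result


print(keyword_tuples(["apple", "mobile phone", "price"], max_n=2))
```

This returns the three single words plus `("apple", "mobile phone")` and `("mobile phone", "price")`; the bigram `("apple", "mobile phone")` is the tuple that carries the brand reading of "apple".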
Step 202: find the in-site search keyword tuples that match each user search keyword tuple.
After the user search keyword tuples are obtained in step 201 (for example, by segmenting the user search keywords imported from the search engine with the help of the preset vocabularies and then combining the segmented words with the N-gram model), the in-site search keyword tuples that match each user search keyword tuple can be found among a plurality of in-site search keyword tuples obtained in advance; in other words, each user search keyword tuple is looked up among the pre-built in-site search keyword tuples.
In some embodiments, a large number of in-site search keywords that users entered in historical searches within the site, and whose corresponding in-site search results were clicked, may be collected in advance. The collected in-site search keywords are then segmented according to the preset vocabulary containing the site's out-of-vocabulary words, and the segmented words are combined using the N-gram model to obtain the in-site search keyword tuples.
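A minimal sketch of building the in-site tuple set from historical clicked queries and matching user tuples against it, assuming queries are already segmented and that "matching" means exact tuple equality (the application does not say whether fuzzier matching is used); all names are hypothetical:

```python
from typing import Iterable, List, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int) -> List[Gram]:
    """Contiguous runs of 1..max_n tokens."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def build_site_index(clicked_queries: Iterable[List[str]], max_n: int = 3) -> Set[Gram]:
    """Collect every tuple from segmented historical in-site queries that led to a click."""
    index: Set[Gram] = set()
    for tokens in clicked_queries:
        index.update(ngrams(tokens, max_n))
    return index


def match_tuples(user_tokens: List[str], index: Set[Gram], max_n: int = 3) -> List[Gram]:
    """Keep only the user-query tuples that also occur as in-site tuples."""
    return [g for g in ngrams(user_tokens, max_n) if g in index]


site_index = build_site_index([["apple", "mobile phone"], ["samsung", "mobile phone"]])
print(match_tuples(["apple", "samsung", "which", "better"], site_index))
```

Here only `("apple",)` and `("samsung",)` survive; the conversational words from the search-engine query have no in-site counterpart and drop out.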
Step 203: mine core words from the found in-site search keyword tuples.
After the in-site search keyword tuples matching each user search keyword tuple are found in step 202, core words may be further mined from them.
In some embodiments, the information gain corresponding to each in-site search keyword tuple may be calculated in advance, and the core words are mined based on these information gains. For any search keyword tuple, the information gain can be defined as the difference in the certainty of the search intent with and without that tuple. Taking an e-commerce site as an example, assume that the final conversions of a search with no keyword are uniformly distributed over all commodities. If the search keyword "mobile phone" is then added, it can be inferred that the conversion target is limited to commodities in the mobile phone category. This narrowing of the target range, that is, the increase in certainty brought about by adding the keyword, can be quantified by information gain from information theory.
In some embodiments, the set of historical conversion categories corresponding to each in-site search keyword tuple, and the number of conversions into each category, may be determined in advance. For example, if a user enters an in-site search keyword tuple in a historical in-site search and clicks a search result of some category among the corresponding in-site search results, that category can be taken as a historical conversion category of the tuple, and the number of times users clicked search results of that category can be taken as the category's conversion count. After the historical conversion category set and per-category conversion counts have been calculated for each in-site search keyword tuple, the information gain of each tuple can be calculated: it is the entropy of the category conversion probabilities over the whole site minus the conditional entropy of the category conversion probabilities given that the tuple participates in the in-site search. Once the information gain of every in-site search keyword tuple has been calculated, a dictionary mapping each in-site search keyword tuple to its information gain can be constructed.
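Written out, the calculation described above can be expressed as follows. Here $C$ is the set of categories in the site, $n_c$ the site-wide number of conversions into category $c$, $C_t$ the historical conversion category set of tuple $t$, and $n_{t,c}$ the number of conversions into $c$ from searches containing $t$. The notation and the base-2 logarithm are assumptions chosen for illustration; any base rescales all gains by the same positive constant and leaves the ranking unchanged.

$$
p(c) = \frac{n_c}{\sum_{c' \in C} n_{c'}}, \qquad
p(c \mid t) = \frac{n_{t,c}}{\sum_{c' \in C_t} n_{t,c'}}
$$

$$
H(C) = -\sum_{c \in C} p(c)\,\log_2 p(c), \qquad
H(C \mid t) = -\sum_{c \in C_t} p(c \mid t)\,\log_2 p(c \mid t)
$$

$$
\mathrm{IG}(t) = H(C) - H(C \mid t)
$$

With $K$ equally likely categories, $H(C) = \log_2 K$; if every conversion after searching "mobile phone" falls in the mobile phone category, $H(C \mid t) = 0$ and $\mathrm{IG}(t) = \log_2 K$, which matches the "mobile phone" example given earlier. As the text describes it, this is a per-tuple quantity rather than the expected information gain averaged over the presence and absence of $t$.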
In some embodiments, the in-site search keyword tuples matching the user search keyword tuples may be looked up in this dictionary of in-site search keyword tuples and their information gains; that is, each user search keyword tuple is looked up in the dictionary.
If no in-site search keyword tuple matching a user search keyword tuple exists in the dictionary, that is, the user search keyword tuple is not found, its information gain can be taken to be zero.
If matching in-site search keyword tuples exist in the dictionary, that is, the user search keyword tuples are found, their information gains can be retrieved and the found tuples sorted from high to low by information gain, with the top-ranked tuples selected as core word candidates. Core words selected in this way better express the user's interests and intentions.
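The lookup-and-rank logic can be sketched as follows, assuming the dictionary maps tuples (Python tuples of words) to precomputed gains and that `top_k`, a hypothetical parameter, sets how many candidates are kept. Tuples absent from the dictionary get a gain of zero as described; this sketch additionally drops them rather than ranking them last:

```python
from typing import Dict, List, Tuple

Gram = Tuple[str, ...]


def rank_candidates(user_tuples: List[Gram], ig_dict: Dict[Gram, float],
                    top_k: int = 3) -> List[Tuple[Gram, float]]:
    """Look up each user tuple (missing tuples score 0.0) and return the top_k
    tuples with positive gain, highest gain first."""
    scored = {t: ig_dict.get(t, 0.0) for t in user_tuples}  # also removes repeated tuples
    ranked = sorted(((t, g) for t, g in scored.items() if g > 0.0),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


ig = {("apple", "mobile phone"): 2.5, ("apple",): 1.2, ("phone",): 0.4}
user = [("apple",), ("which",), ("apple", "mobile phone"), ("phone",)]
print(rank_candidates(user, ig, top_k=2))
```

The conversational tuple `("which",)` is not in the dictionary, scores zero, and never becomes a candidate.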
Taking an e-commerce site as an example, suppose a user enters search words such as "Apple or Samsung, which is better" or "when will the Xiaomi 6 go on sale" in a search engine. The method can determine whether useful morphemes such as "Apple phone", "Samsung phone", or "Xiaomi 5" are searched for within the e-commerce site. Even though the Xiaomi 6 is not actually on sale, the user's interest in commodities on the site can still be analyzed: the user is considered to be interested in Xiaomi, and the Xiaomi 5 can be recommended according to its historical popularity within the site.
Because the information gain in this method is built on categories rather than on individual commodities, it avoids the unstable values that long-tail commodities with too few sales records would produce, and it meets the requirements of subsequent intermediate-page optimization.
Step 204: present the in-site search results corresponding to the core words to the user.
After the core words are obtained in step 203, they may be combined into a core word combination. The core word combination can be used to search within the site to obtain search results the user is interested in, and these results are presented to the user.
In some embodiments, after the in-site search with the core word combination yields search results of interest to the user, the results may be presented to the user on an intermediate search page.
Taking an e-commerce site and a search engine that imports traffic to it as an example, the core word combination can be used to search the site's commodity retrieval system, and the search results are presented to the user on the intermediate search page. Because the core word combination better expresses the user's interests and intentions, searching the site's commodity retrieval system with it returns commodities the user is interested in, and presenting these on the intermediate search page improves the accuracy of commodity display.
Referring to FIG. 3, an exemplary flowchart of a search method provided by the present application is shown.
The in-site search keywords are segmented, processed with the N-gram model, and aggregated to obtain the in-site search keyword tuples. The user search keywords are likewise segmented, processed with the N-gram model, and aggregated to obtain the user search keyword tuples. In advance, the historical conversion category set of each in-site search keyword tuple and the conversion count of each category are determined from the click histories of the in-site search keywords, and the information gain of each in-site search keyword tuple is then calculated as the entropy of the category conversion probabilities over all categories minus the conditional entropy of the category conversion probabilities given the tuple. The in-site search keyword tuples and their information gains form a dictionary.
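A minimal sketch of the dictionary-building step, assuming conversion counts have already been aggregated from click logs into site-wide per-category totals and per-tuple per-category totals; `entropy` and `build_ig_dictionary` are illustrative names, and base-2 logarithms are an assumption:

```python
import math
from typing import Dict, Hashable, Mapping, Tuple

Gram = Tuple[str, ...]


def entropy(counts: Mapping[Hashable, int]) -> float:
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


def build_ig_dictionary(site_counts: Mapping[str, int],
                        tuple_counts: Mapping[Gram, Mapping[str, int]]) -> Dict[Gram, float]:
    """site_counts[c]: site-wide conversions into category c.
    tuple_counts[t][c]: conversions into c from searches containing tuple t.
    Gain = site entropy minus the entropy conditioned on the tuple."""
    h_site = entropy(site_counts)
    return {t: h_site - entropy(per_category) for t, per_category in tuple_counts.items()}


site = {"phones": 25, "laptops": 25, "food": 25, "books": 25}
per_tuple = {("mobile phone",): {"phones": 40}, ("apple",): {"phones": 10, "food": 10}}
print(build_ig_dictionary(site, per_tuple))
```

In this toy site, conversions are spread evenly over four categories, so the site entropy is 2 bits; a tuple whose conversions all fall in one category gets a gain of 2.0, and one split evenly between two categories gets 1.0.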
The user search keyword tuples are then looked up in the dictionary. If a tuple is not found, its gain is taken to be zero. If it is found, the found user search keyword tuples are sorted from high to low by information gain, and the top-ranked tuples are selected as core word candidates. Among the several top-ranked tuples, some may be subsets of others, so deduplication and removal of sensitive words can be performed to obtain core words that better represent the user's interests and intentions. The core words can then be arranged and combined to obtain the rewrite target, which is used to search the site's retrieval system; search results the user is interested in are obtained and presented to the user.
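The post-processing at the end of this flow can be sketched as below. The application does not specify which tuple survives when one is a subset of another or how the core words are combined, so this sketch assumes (hypothetically) that a tuple is dropped when its words are a proper subset of another candidate's words, that sensitive words come from a supplied block list, and that the rewrite target is the surviving words joined in rank order:

```python
from typing import Iterable, List, Sequence, Tuple

Gram = Tuple[str, ...]


def rewrite_query(candidates: Sequence[Gram], sensitive: Iterable[str]) -> str:
    """Turn ranked candidate tuples (highest gain first) into one rewritten query."""
    blocked = set(sensitive)
    unique = list(dict.fromkeys(candidates))  # drop exact duplicates, keep rank order
    # Drop a tuple whose words are a proper subset of another candidate's words.
    kept = [t for t in unique if not any(set(t) < set(other) for other in unique)]
    words: List[str] = []
    for t in kept:
        for w in t:
            if w not in blocked and w not in words:  # remove sensitive words and repeats
                words.append(w)
    return " ".join(words)


ranked = [("apple", "mobile phone"), ("apple",), ("mobile phone", "counterfeit")]
print(rewrite_query(ranked, sensitive={"counterfeit"}))
```

Here `("apple",)` is absorbed by the longer tuple, the block-listed word is removed, and the rewrite target becomes `"apple mobile phone"`.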
Taking an e-commerce site as an example, the advantages of the search method of the present application are as follows. The search data of the e-commerce site can be used to build a model reflecting users' search intent toward different categories; the model is constructed from the in-site search phrases and their corresponding information gains. Search behavior imported from the search engine can be mapped onto this model, and the rewritten search phrases are used to search the site's commodity retrieval system, so that commodities the user is interested in are obtained and presented on the intermediate search page. This effectively solves the problem that, when user search keywords entered in a search engine are fed directly into the e-commerce site's search, the returned content is of low quality because search-engine queries are non-standard and search habits differ, and it improves the search recall rate. Rewriting the query before the e-commerce site search returns commodities the user is interested in and improves the accuracy of commodity display.
Please refer to FIG. 4, which shows a schematic structural diagram of the search apparatus of the present application. The search apparatus includes a processing unit 401, a searching unit 402, a core word screening unit 403, and an in-site search unit 404. The processing unit 401 is configured to perform text segmentation on the user search keywords input by a user in a search engine, and to combine the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples; the searching unit 402 is configured to find, from a plurality of in-site search keyword tuples, the in-site search keyword tuples matching each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining segmented words obtained by text segmentation of in-site search keywords input by users in a site; the core word screening unit 403 is configured to select, from the found in-site search keyword tuples, the in-site search keyword tuples meeting preset conditions, and to select core words from the selected tuples, wherein the preset conditions include: the strength of the search intent for at least one category within the corresponding site is greater than a threshold; and the in-site search unit 404 is configured to present the in-site search results corresponding to the core words to the user.
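The unit structure of FIG. 4 can be pictured as a simple pipeline. The sketch below is only an architectural illustration: the class name, field names, and the toy unit implementations (whitespace tokenization, exact-match lookup, collecting all matched words as core words) are hypothetical stand-ins for the real units 401 to 404:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Gram = Tuple[str, ...]


@dataclass
class SearchApparatus:
    """Wires the four units of FIG. 4 together; each unit is injected as a callable."""
    processing_unit: Callable[[str], List[Gram]]           # 401: segment and combine
    searching_unit: Callable[[List[Gram]], List[Gram]]     # 402: match in-site tuples
    core_word_unit: Callable[[List[Gram]], List[str]]      # 403: screen and pick core words
    in_site_search_unit: Callable[[List[str]], List[str]]  # 404: search the site

    def handle(self, engine_query: str) -> List[str]:
        user_tuples = self.processing_unit(engine_query)
        matched = self.searching_unit(user_tuples)
        core = self.core_word_unit(matched)
        return self.in_site_search_unit(core)


# Toy stand-ins for each unit (hypothetical, for illustration only).
SITE_TUPLES = {("apple",), ("apple", "phone")}


def toy_processing(query: str) -> List[Gram]:
    tokens = query.split()  # whitespace split stands in for real segmentation
    return [tuple(tokens[i:i + n]) for n in (1, 2) for i in range(len(tokens) - n + 1)]


app = SearchApparatus(
    processing_unit=toy_processing,
    searching_unit=lambda tuples: [t for t in tuples if t in SITE_TUPLES],
    core_word_unit=lambda tuples: sorted({w for t in tuples for w in t}),
    in_site_search_unit=lambda words: ["result for " + " ".join(words)],
)
results = app.handle("apple phone review")
print(results)
```

Injecting each unit as a callable keeps the units independently replaceable, which mirrors the way the application describes them as separately configured units.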
The application also provides a server, which may include the search apparatus described in FIG. 4. The server may be configured with one or more processors and a memory for storing one or more programs, which may include instructions for performing the operations described in steps 201 to 204 above. When executed by the one or more processors, the one or more programs cause the one or more processors to perform the operations described in steps 201 to 204 above.
The present application also provides a computer-readable medium, which may be included in the server or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to: perform text segmentation on user search keywords input by a user in a search engine, and combine the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples; find, from a plurality of in-site search keyword tuples, the in-site search keyword tuples matching each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining segmented words obtained by text segmentation of in-site search keywords input by users in a site; select, from the found in-site search keyword tuples, the in-site search keyword tuples meeting preset conditions, and select core words from the selected in-site search keyword tuples, wherein the preset conditions comprise: the strength of the search intent for at least one category within the corresponding site is greater than a threshold; and present the in-site search results corresponding to the core words to the user.
It should be noted that the computer-readable medium may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of computer-readable storage media include, but are not limited to:

- an electrical connection having one or more wires;
- a portable computer diskette;
- a hard disk;
- a random access memory (RAM);
- a read-only memory (ROM);
- an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber;
- a portable compact disc read-only memory (CD-ROM);
- an optical storage device;
- a magnetic storage device;
- or any suitable combination of the foregoing.

In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
The above description is only a preferred embodiment of the present application and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to technical solutions formed by the specific combination of the above features. It also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept. An example is a technical solution formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (10)

1. A method of searching, the method comprising:
performing, based on a preset word list, text segmentation on a user search keyword entered by a user in a search engine, and combining the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples;
finding, from a plurality of in-site search keyword tuples, the in-site search keyword tuple matching each user search keyword tuple, which comprises: looking up, in a dictionary, the in-site search keyword tuple matching each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining the segmented words obtained by text segmentation, based on the preset word list, of in-site search keywords entered by users within a site, and the dictionary contains the in-site search keyword tuples and the information gain corresponding to each in-site search keyword tuple;
selecting, from the found in-site search keyword tuples, those satisfying a preset condition, and selecting core words from the selected in-site search keyword tuples, wherein the preset condition comprises: the strength of the search intent for at least one category within the corresponding site is greater than a threshold;
and presenting to the user the in-site search results corresponding to the core words.
2. The method of claim 1, wherein performing text segmentation on the user search keyword entered by the user in the search engine, and combining the segmented words obtained by the text segmentation to obtain the plurality of user search keyword tuples, comprises:
performing text segmentation on the user search keyword based on the preset word list to obtain segmented words, wherein the preset word list comprises product category keywords, product name keywords and product brand keywords;
and combining the segmented words using an N-gram model to obtain the user search keyword tuples.
3. The method of claim 2, further comprising:
acquiring in-site search keywords entered by users within a site;
performing text segmentation on the in-site search keywords based on the preset word list to obtain segmented words;
and combining the segmented words using an N-gram model to obtain the in-site search keyword tuples.
4. The method of claim 3, further comprising:
calculating the information gain corresponding to each in-site search keyword tuple, wherein the information gain is obtained by subtracting the conditional entropy of the category conversion probabilities given the in-site search keyword tuple from the entropy of the conversion probabilities over all categories in the site;
and constructing a dictionary containing the in-site search keyword tuples and the information gain corresponding to each in-site search keyword tuple.
5. The method of claim 4, wherein selecting, from the found in-site search keyword tuples, those satisfying the preset condition comprises:
sorting the found in-site search keyword tuples by their corresponding information gain in the dictionary;
and selecting the in-site search keyword tuples ranked above a preset position.
6. The method of claim 5, wherein selecting core words from the selected in-site search keyword tuples comprises:
removing duplicate words and sensitive words from the words in the in-site search keyword tuples ranked above the preset position, to obtain the core words.
7. The method of claim 6, wherein presenting to the user the in-site search results corresponding to the core words comprises:
permuting and combining the core words to obtain core word combinations;
searching within the site using the core word combinations to obtain in-site search results;
and presenting the in-site search results to the user on an intermediate search page.
8. A search apparatus, characterized in that the apparatus comprises:
a processing unit, configured to perform, based on a preset word list, text segmentation on a user search keyword entered by a user in a search engine, and to combine the segmented words obtained by the text segmentation to obtain a plurality of user search keyword tuples;
a searching unit, configured to find, from a plurality of in-site search keyword tuples, the in-site search keyword tuple matching each user search keyword tuple by looking up, in a dictionary, the in-site search keyword tuple matching each user search keyword tuple, wherein the in-site search keyword tuples are generated in advance by combining the segmented words obtained by text segmentation, based on the preset word list, of in-site search keywords entered by users within a site, and the dictionary contains the in-site search keyword tuples and the information gain corresponding to each in-site search keyword tuple;
a core word screening unit, configured to select, from the found in-site search keyword tuples, those satisfying a preset condition, and to select core words from the selected in-site search keyword tuples, wherein the preset condition comprises: the strength of the search intent for at least one category within the corresponding site is greater than a threshold;
and an in-site search unit, configured to present to the user the in-site search results corresponding to the core words.
9. A server, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
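Dependent claims 4 to 7 can be sketched together in Python.

The sketch reads claim 4 literally and computes IG(t) = H(C) - H(C | t):

- H(C) is the entropy of the category conversion distribution over the whole site.
- H(C | t) is the entropy of the category conversion distribution among in-site searches containing tuple t.

The information gain usually used for feature selection additionally weights by P(t) and P(not t); that variant is not what is computed here.

The following are hypothetical: the input count format, the value of k, the sensitive-word set and all function names. "Permuting and combining" is approximated with unordered combinations. The in-site search itself and the intermediate search page are not shown.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

KeywordTuple = Tuple[str, ...]


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a distribution given as raw counts."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    return -sum(c / total * math.log2(c / total) for c in positive)


def build_dictionary(
    site_counts: Dict[str, int],
    tuple_counts: Dict[KeywordTuple, Dict[str, int]],
) -> Dict[KeywordTuple, float]:
    """Claim 4, read literally: IG(t) = H(C) - H(C | t).

    site_counts: conversions per category over all in-site searches.
    tuple_counts: conversions per category for searches containing tuple t.
    """
    h_site = entropy(site_counts.values())
    return {t: h_site - entropy(c.values()) for t, c in tuple_counts.items()}


def top_k(found: List[KeywordTuple],
          dictionary: Dict[KeywordTuple, float], k: int) -> List[KeywordTuple]:
    """Claim 5: sort found tuples by information gain (descending, stable)
    and keep those ranked above position k. Found tuples come from the
    dictionary, so every lookup succeeds."""
    return sorted(found, key=lambda t: dictionary[t], reverse=True)[:k]


def core_words(selected: List[KeywordTuple], sensitive: Set[str]) -> List[str]:
    """Claim 6: flatten the tuples, drop duplicates (keeping the first
    occurrence) and drop sensitive words."""
    seen: Set[str] = set()
    words: List[str] = []
    for t in selected:
        for w in t:
            if w not in seen and w not in sensitive:
                seen.add(w)
                words.append(w)
    return words


def core_word_combinations(words: List[str],
                           max_size: Optional[int] = None) -> List[str]:
    """Claim 7 (approximated): unordered combinations of core words,
    largest first, each joined into one in-site query string."""
    size = len(words) if max_size is None else min(max_size, len(words))
    return [" ".join(c) for r in range(size, 0, -1)
            for c in combinations(words, r)]


# Hypothetical in-site conversion counts.
site = {"phones": 50, "fruit": 50}
per_tuple = {
    ("apple",): {"phones": 5, "fruit": 5},
    ("phone case",): {"phones": 10},
    ("apple", "phone case"): {"phones": 8},
}
dictionary = build_dictionary(site, per_tuple)
ranked = top_k(list(per_tuple), dictionary, k=2)
words = core_words(ranked, sensitive={"cheap"})
print(ranked)                         # [('phone case',), ('apple', 'phone case')]
print(words)                          # ['phone case', 'apple']
print(core_word_combinations(words))  # ['phone case apple', 'phone case', 'apple']
```

Generating the largest combinations first means the most specific in-site query comes first. For n core words there are 2^n - 1 non-empty combinations, so `max_size` is provided to cap the number of in-site queries.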

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710099795.XA CN106874492B (en) 2017-02-23 2017-02-23 Searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710099795.XA CN106874492B (en) 2017-02-23 2017-02-23 Searching method and device

Publications (2)

Publication Number Publication Date
CN106874492A CN106874492A (en) 2017-06-20
CN106874492B true CN106874492B (en) 2021-01-26

Family

ID=59168524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710099795.XA Active CN106874492B (en) 2017-02-23 2017-02-23 Searching method and device

Country Status (1)

Country Link
CN (1) CN106874492B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330023B (en) * 2017-06-21 2021-02-12 北京百度网讯科技有限公司 Text content recommendation method and device based on attention points
CN107609192A (en) * 2017-10-12 2018-01-19 北京京东尚科信息技术有限公司 The supplement searching method and device of a kind of search engine
CN107944166A (en) * 2017-11-30 2018-04-20 中州大学 The implementation method and device of a kind of Electronic Design
CN108268617B (en) * 2018-01-05 2021-10-29 创新先进技术有限公司 User intention determining method and device
CN110209827B (en) * 2018-02-07 2023-09-19 腾讯科技(深圳)有限公司 Search method, search device, computer-readable storage medium, and computer device
CN108228907B (en) * 2018-02-08 2021-04-23 北京三快在线科技有限公司 Information recommending method and device, electronic equipment and storage medium
CN110232581A (en) * 2018-03-06 2019-09-13 北京京东尚科信息技术有限公司 It is a kind of to provide the method and apparatus of discount coupon for user
CN110633352A (en) * 2018-06-01 2019-12-31 北京嘀嘀无限科技发展有限公司 Semantic retrieval method and device
CN111695022B (en) * 2019-01-18 2023-07-04 创新奇智(重庆)科技有限公司 Interest searching method based on knowledge graph visualization
CN111597297A (en) * 2019-02-21 2020-08-28 北京京东尚科信息技术有限公司 Article recall method, system, electronic device and readable storage medium
CN110941694A (en) * 2019-10-14 2020-03-31 珠海格力电器股份有限公司 Knowledge graph searching and positioning method and system, electronic equipment and storage medium
CN111428022B (en) * 2020-03-25 2023-06-02 北京明略软件系统有限公司 Information retrieval method, device and storage medium
CN111553765A (en) * 2020-04-27 2020-08-18 广州探途网络技术有限公司 E-commerce search sorting method and device and computing equipment
CN111797205B (en) * 2020-06-30 2024-03-12 百度在线网络技术(北京)有限公司 Vocabulary retrieval method and device, electronic equipment and storage medium
CN112769823A (en) * 2021-01-07 2021-05-07 北京码牛科技有限公司 Information management-based secure network auditing method and system
CN113360755A (en) * 2021-05-31 2021-09-07 北京乐我无限科技有限责任公司 Information pushing and displaying method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289436A (en) * 2010-06-18 2011-12-21 阿里巴巴集团控股有限公司 Method and device for determining weighted value of search term and method and device for generating search results
CN102456058A (en) * 2010-11-02 2012-05-16 阿里巴巴集团控股有限公司 Method and device for providing category information
CN104376115A (en) * 2014-12-01 2015-02-25 北京奇虎科技有限公司 Fuzzy word determining method and device based on global search

Also Published As

Publication number Publication date
CN106874492A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874492B (en) Searching method and device
US10795939B2 (en) Query method and apparatus
US20210256047A1 (en) System and method for providing technology assisted data review with optimizing features
US11321759B2 (en) Method, computer program product and system for enabling personalized recommendations using intelligent dialog
US9594826B2 (en) Co-selected image classification
CN109325182B (en) Information pushing method and device based on session, computer equipment and storage medium
US9230025B2 (en) Searching for information based on generic attributes of the query
US20130060769A1 (en) System and method for identifying social media interactions
US10102246B2 (en) Natural language consumer segmentation
CN103514299A (en) Information searching method and device
US20100191758A1 (en) System and method for improved search relevance using proximity boosting
CN104933100A (en) Keyword recommendation method and device
CN108121814B (en) Search result ranking model generation method and device
CN111444304A (en) Search ranking method and device
CN107330079B (en) Method and device for presenting rumor splitting information based on artificial intelligence
US20200334697A1 (en) Generating survey responses from unsolicited messages
CN103530364A (en) Method and system for providing download link
US20190163828A1 (en) Method and apparatus for outputting information
CN112199602A (en) Post recommendation method, recommendation platform and server
CN112579729A (en) Training method and device for document quality evaluation model, electronic equipment and medium
CN112328889A (en) Method and device for determining recommended search terms, readable medium and electronic equipment
CN110245357B (en) Main entity identification method and device
CN111444424A (en) Information recommendation method and information recommendation system
CN112148958A (en) Method, apparatus, and computer storage medium for information recommendation
US10191970B2 (en) Systems and methods for customized data parsing and paraphrasing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant