CN113177061B - Searching method and device and electronic equipment - Google Patents

Info

Publication number
CN113177061B
Authority
CN
China
Prior art keywords
target
search
word
search word
preset value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110571293.9A
Other languages
Chinese (zh)
Other versions
CN113177061A (en)
Inventor
赵伟刚
郭剑霓
罗展松
吴鹏
蒋宁
吴海英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202110571293.9A
Publication of CN113177061A
Application granted
Publication of CN113177061B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a search method, a search device and an electronic device. The search method comprises the following steps: acquiring a search word; determining a target search word based on the number of characters of the search word, wherein the target search word is the search word itself when the number of characters of the search word is less than or equal to a first preset value, and the target search word is the word segmentation result obtained by segmenting the search word when the number of characters of the search word is greater than the first preset value; and searching in a target database based on the target search word and outputting a search result. The search method, the search device and the electronic device can solve the problem that existing search methods have a poor search effect.

Description

Searching method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a search method, apparatus, and electronic device.
Background
An existing search method generally proceeds as follows: a search word is received from the user, word segmentation is performed on the search word to obtain a word segmentation result, and a search is then performed based on the word segmentation result. However, this approach often returns a large number of irrelevant results, so existing search methods have a poor search effect.
Disclosure of Invention
The search method, the search device and the electronic equipment can solve the problem that the existing search method is poor in search effect.
In order to solve the above technical problem, the specific implementation scheme of the invention is as follows:
In a first aspect, an embodiment of the present invention provides a search method, including:
acquiring a search word;
determining a target search word based on the number of characters of the search word, wherein the target search word is the search word when the number of characters of the search word is smaller than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is larger than the first preset value;
searching in a target database based on the target search word, and outputting a search result.
In a second aspect, an embodiment of the present invention further provides a search apparatus, including:
the acquisition module is used for acquiring the search word;
the determining module is used for determining a target search word based on the number of characters of the search word, wherein the target search word is the search word when the number of characters of the search word is smaller than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is larger than the first preset value;
the search module is used for searching in the target database based on the target search word and outputting a search result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program is executed by the processor to implement the steps of the method described in the first aspect.
In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
According to the method and the device, the target search word is determined based on the number of characters of the search word: when the number of characters of the search word is less than or equal to a certain value, the search word is used directly as the target search word, and when the number of characters of the search word is greater than that value, the word segmentation result obtained by segmenting the search word is used as the target search word. This avoids the large number of irrelevant search results that would be returned if a short search word were segmented further, and thus improves the search effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a first flowchart of a search method provided by an embodiment of the present invention;
FIG. 2 is a second flowchart of a search method provided by an embodiment of the present invention;
FIG. 3 is a block diagram of a search device provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, FIG. 1 shows a search method provided in the present application, where the search method includes:
Step 101, acquiring a search word;
Step 102, determining a target search word based on the number of characters of the search word, wherein the target search word is the search word itself when the number of characters of the search word is less than or equal to a first preset value, and the target search word is the word segmentation result obtained by segmenting the search word when the number of characters of the search word is greater than the first preset value;
Step 103, searching in a target database based on the target search word, and outputting a search result.
The number of characters may include at least: the number of Chinese characters, the number of English characters, the number of punctuation marks, the number of special characters, and so on. When the search word input by the user is a Chinese search word, the number of Chinese characters (the number of words) can be determined from the number of characters. The method provided by the embodiment of the application is further explained below, taking a Chinese search word as an example.
The number of characters of the search word is the number of bytes contained in the search word. For example, when the search word is "mobile phone number is registered", the number of bytes is 14 and the number of words is 7, i.e. two bytes per Chinese character in this example. Alternatively, the number of words of the search word can be identified directly.
The first preset value may be set according to actual needs. For example, the first preset value may be 4, which corresponds to 2 words. In this case, when the search word contains 2 words or fewer, the search word is determined directly as the target search word, that is, no word segmentation is performed on the search word during the search. Correspondingly, when the search word contains more than 2 words, the search word is segmented during the search, the word segmentation result is used as the target search word, and the search is then performed based on the target search word.
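The length-based choice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `determine_target_search_word` and `toy_segment` are hypothetical, the threshold is counted in words (Chinese characters) rather than bytes, and the sliding-window segmenter is only a stand-in for a real segmenter such as IK.

```python
from typing import Callable, List, Union


def toy_segment(text: str) -> List[str]:
    """Hypothetical stand-in segmenter: emits every 2-character window."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def determine_target_search_word(
    search_word: str,
    first_preset: int = 2,  # assumed threshold, counted in words (characters)
    segment: Callable[[str], List[str]] = toy_segment,
) -> Union[str, List[str]]:
    """Return the search word itself when it is short, else its segments."""
    if len(search_word) <= first_preset:
        return search_word       # short word: used directly, no segmentation
    return segment(search_word)  # long word: the segmentation result is used
```

With these assumptions, a two-character search word is returned unchanged, while a longer one is replaced by the list of segments produced by whatever segmenter is plugged in.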
In one embodiment of the present application, when the number of characters of the search word is less than or equal to the first preset value, a wildcard search may be performed in the target database based on the target search word, and the search result is output. Correspondingly, when the number of characters of the search word is greater than the first preset value, a segmented fuzzy search may be performed with a multi-field matching query (multi-match) based on the target search word, and the search result is output.
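As a rough sketch of how the two query types might be selected, the function below builds a request body in an Elasticsearch-style query DSL. The description names the multi-match query and the IK segmenter but not the search engine, so the DSL, the function name `build_query`, the field names `title`, `keywords` and `body`, and the choice of the title field for the wildcard query are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_query(search_word: str, first_preset: int = 2) -> Dict[str, Any]:
    """Pick a wildcard query for short words and multi_match for long ones."""
    if len(search_word) <= first_preset:
        # Short word: substring-style match on the unsegmented search word.
        return {"query": {"wildcard": {"title": {"value": f"*{search_word}*"}}}}
    # Long word: the analyzer configured on the fields segments the text.
    fields: List[str] = ["title", "keywords", "body"]  # hypothetical field names
    return {"query": {"multi_match": {"query": search_word, "fields": fields}}}
```

The returned dictionary would be sent as the search request body; segmentation itself is left to the analyzer configured on the index in this sketch.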
When segmenting the search word, the input search word may be segmented using the maximum (finest-granularity) splitting mode of a Chinese word segmentation plug-in. For example, the IK segmenter, an open-source Chinese word segmenter based on a dictionary and rules, may be used to segment the search word in its fine-granularity mode.
The target database can be determined according to a specific search scene, wherein the target database is used for storing search results. For example, when the method is applied to a customer service intelligent question-answering system, the target database can store answers to various common questions maintained by the system, so that when a user inputs search words, questions and answers related to the search words can be accurately searched in the target database based on the search words of the user.
It will be appreciated that each search result stored in the target database may be stored in segmented form. For example, a search result includes a title portion, a keyword portion, and a text portion. Before the search result is stored in the target database, word segmentation is performed on the title portion, the keyword portion, and the text portion separately, and the word segmentation results of the three portions are then stored in different sub-portions of the same search result. Thus, when searching in the target database based on the target search word, the target search word may be matched against the segmented words in each search result in the target database to determine the desired search result.
In addition, compared with the existing wildcard search method, this search method can improve the search effect. The reason is as follows: a wildcard search does not split the search word, and the search is executed in the manner of the LIKE operator of an SQL statement. If the search word is long, for example the user inputs "can be repaid with WeChat", it is difficult to retrieve a valid result, and searching multiple fields is time-consuming. Because the wildcard search does not segment the search word, if the maintained content only contains "WeChat repayment", the user cannot find that result.
Correspondingly, compared with the existing multi-field matching search method, this search method can improve the search effect. The reason is as follows: in the prior art, if the search word is short, for example the user inputs "bill", a search in the multi-field matching mode lists every item whose title, keywords, or content contains "bill", so a large amount of irrelevant data is displayed. If the search word is long, for example the user inputs "can be repaid with WeChat", and no weights or minimum matching degree parameter are set, a query in the multi-field matching mode also returns many irrelevant items, which are not necessarily the valid information the user wants.
In this embodiment, the target search word is determined based on the number of characters of the search word: when the number of characters of the search word is less than or equal to a certain value, the search word is used directly as the target search word, and when the number of characters of the search word is greater than that value, the word segmentation result obtained by segmenting the search word is used as the target search word. This avoids the large number of irrelevant search results that would be returned if a short search word were segmented further, and thus improves the search effect.
Optionally, when the number of characters of the search word is greater than the first preset value, the target search word includes n segmentation words, where n is a positive integer, and the searching is performed in a target database based on the target search word, and a search result is output, including:
searching in the target database based on the n segmented words under the condition that the n is smaller than or equal to a second preset value, and outputting a first target object, wherein the second preset value is larger than 1, and the first target object is an object comprising the n segmented words in the target database;
and under the condition that n is larger than the second preset value, searching in the target database based on the n segmented words, and outputting a second target object, wherein the second target object is an object comprising part of segmented words in the n segmented words in the target database.
Here, n is the number of segmented words (or clauses) obtained after word segmentation of the search word. For example, when the search word is "mobile phone number is registered", the word segmentation result obtained with the IK segmenter includes: "mobile phone number", "mobile phone", "phone number", "registered". The search word is segmented into 7 segmented words in total, so in this case n=7.
Specifically, the longer the search word, the larger the number of segmented words obtained after segmentation, in general. A longer search word also makes the search condition more specific. When the search word is too long, it may contain words that are irrelevant to the search result, such as typos or modal particles, and in this case an object that contains all n segmented words at the same time may not be found among the search results stored in the target database. Conversely, when the search word is shorter, the number of segmented words obtained after segmentation is generally smaller, and in this case the search word generally consists of keywords that are closely related to the search result.
Based on this, in one embodiment of the present application, when n is less than or equal to a second preset value, the number of segmented words is considered small. To ensure the accuracy of the search result, a first target object is searched for in the target database based on the n segmented words, the first target object being an object in the target database that includes the n segmented words. Searching the target database for the first target object based on the n segmented words means: the n segmented words are matched against each object in the target database, all first target objects that include the n segmented words are found, and all of them are output. That is, the number of output first target objects may be greater than or equal to 1. Of course, when no first target object is found based on the target search word, a result such as "no matching result found" may be output.
When the first target object includes a plurality of different parts, the first target object including the n segmented words may mean the following: the different parts of the first target object are regarded as a whole, and this whole only needs to include the n segmented words. In other words, the n segmented words may be distributed across different parts of the first target object. For example, when the first target object includes a title portion, a keyword portion, and a text portion, the n segmented words may be distributed among the title portion, the keyword portion, and the text portion.
It will be appreciated that the n segmented words are n different segmented words. The first target object including the n segmented words means that it includes these n different segmented words, not that it contains exactly n segmented-word occurrences; in fact, the same segmented word may appear multiple times in the first target object.
Correspondingly, when n is greater than the second preset value, the number of segmented words is considered large, that is, the search word is long. To ensure that a search result can still be found in the target database based on the n segmented words, a second target object may be queried in the target database based on the n segmented words, the second target object being an object in the target database that includes part of the n segmented words. Specifically, the second target object may be an object in the target database that includes s of the n segmented words, where s is smaller than n and the value of s may be selected according to the actual situation; for example, s may be two thirds of n. The s segmented words are s different segmented words among the n segmented words. Thus, when the target search word contains many segmented words, an object only needs to include part of the n segmented words to be determined as a search result and output, which improves the probability of finding a search result in the target database when the search word is long.
Searching the target database for the second target object based on the n segmented words means: the n segmented words are matched against each object in the target database, all second target objects are found, and all of them are output. That is, the number of output second target objects may be greater than or equal to 1. Of course, when no second target object is found based on the target search word, a result such as "no matching result found" may be output.
The second preset value may be selected according to the actual situation. For example, in an embodiment of the present application, the second preset value may be 2; that is, when the target search word includes 2 segmented words or fewer, the searched first target object must include all the segmented words in the target search word. When the target search word includes more than 2 segmented words, the searched second target object only needs to include part of the segmented words in the target search word, and the second target object is output as the search result.
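The switch between full matching and partial matching can be sketched as below. This is an illustrative sketch only: the function name `search_objects`, the in-memory dictionary standing in for the target database, and the use of two thirds of n (rounded down, at least 1) as the partial threshold s are assumptions; the description only gives two thirds as one possible choice of s.

```python
from typing import Dict, List, Set


def search_objects(
    tokens: List[str],
    database: Dict[str, Set[str]],  # object id -> set of stored segmented words
    second_preset: int = 2,
) -> List[str]:
    """Require all tokens when n is small, only part of them when n is large."""
    distinct = set(tokens)
    n = len(distinct)
    if n <= second_preset:
        required = n                      # first target object: all n words
    else:
        required = max(1, (2 * n) // 3)   # second target object: s words, s < n
    return [
        obj_id
        for obj_id, words in database.items()
        if len(distinct & words) >= required
    ]
```

Each object is treated as a single set of segmented words here, which matches the idea that the title, keyword, and text portions are regarded as a whole when counting matches.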
In this embodiment, the target objects with different matching degrees are selected as the output results in the target database based on the difference of the number of the segmented words included in the target search word, so that the accuracy of the search results can be improved, and meanwhile, the probability of searching the search results in the target database when the search word is longer can be improved.
Optionally, the second target object includes a first sub-target object and a second sub-target object, the searching in the target database based on the n segmentation words, outputting the second target object, including:
searching in the target database based on the n segmented words under the condition that the n is smaller than a third preset value, and outputting the first sub-target object, wherein the first sub-target object is an object comprising at least k segmented words in the n segmented words in the target database;
searching in the target database based on the n segmented words when the n is greater than or equal to the third preset value, and outputting the second sub-target object, wherein the second sub-target object is an object comprising at least m segmented words in the n segmented words in the target database;
the third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is calculated based on n, the first probability value is greater than the second probability value, the first probability value is greater than 0 and less than 1, and the second probability value is greater than 0 and less than 1.
Specifically, the larger the number of segmented words included in the target search word, that is, the larger the value of n, the smaller the probability that an object including all n segmented words at the same time can be retrieved. Based on this, in one embodiment of the present application, the following may be provided: when the value of n is larger, the matching degree between the search result determined in the target database and the target search word may be lower. That is, when the value of n is larger, the ratio of the number of segmented words included in the output target object to n may be smaller, and correspondingly, when the value of n is smaller, this ratio may be larger. Specifically, when n is smaller than the third preset value, the value of n is considered small, and when n is greater than or equal to the third preset value, the value of n is considered large. The third preset value may be selected according to the actual situation; for example, in an embodiment of the present application, the third preset value may be 6.
In one embodiment, a minimum matching degree parameter may be set. When the number of different segmented words of the target search word contained in a retrieved object is greater than or equal to the minimum matching degree parameter, that object is determined to be a target search result. Different minimum matching degree parameters can be set for different values of n.
Specifically, a basic matching degree coefficient, which serves as the target coefficient, may be calculated based on n; the basic matching degree coefficient may be equal to n or smaller than n. When n is smaller than the third preset value, a first sub-target object is searched for in the target database based on the n segmented words, the first sub-target object being an object in the target database that includes at least k of the n segmented words, where k is greater than the product of the target coefficient and a first probability value; in this case, the minimum matching degree parameter is k. When n is greater than or equal to the third preset value, a second sub-target object is searched for in the target database based on the target search word, the second sub-target object being an object in the target database that includes at least m of the n segmented words, where m is greater than the product of the target coefficient and a second probability value; in this case, the minimum matching degree parameter is m. The first probability value is greater than the second probability value. In this way, a higher proportion of the segmented words must be matched when n is smaller than the third preset value, and a lower proportion when n is greater than or equal to the third preset value.
In one embodiment of the present application, the basic matching degree coefficient may have a value of "0.8 times n and rounded down". The first probability value may be 0.75 and the second probability value may be 0.6.
For example, suppose the third preset value is 6 and the search word is "mobile phone number is registered". The word segmentation result obtained with the IK segmenter includes: "mobile phone number", "mobile phone", "phone number", "registered". The search word is segmented into 7 segmented words in total, so n=7. The basic matching degree coefficient is (0.8 × 7) rounded down, i.e. 5. Since n=7 is greater than the third preset value 6, the second sub-target object is queried in the target database, so m is greater than (0.6 × 5), i.e. m is greater than 3. Since m is a positive integer, the minimum value of m is 4. In this case, the search is performed in the target database based on the 7 segmented words, and the second sub-target object is output, which is an object in the target database that includes at least 4 of the 7 segmented words. Verification shows that when the search word input by the user is "mobile phone number is registered", the search result is output in about 800 milliseconds; compared with the prior art, the search time is reduced and the search speed is improved.
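The arithmetic of this example can be reproduced with a short sketch. The function name `minimum_match` is hypothetical, and the constants (0.8, 0.75, 0.6, and a third preset value of 6) are the example values given in this embodiment; exact fractions are used instead of floating-point numbers so that the rounding is unambiguous.

```python
import math
from fractions import Fraction


def minimum_match(n: int, third_preset: int = 6) -> int:
    """Smallest integer strictly greater than coefficient * probability value."""
    base = math.floor(Fraction(4, 5) * n)  # basic coefficient: 0.8 * n, rounded down
    # First probability value (0.75) for small n, second (0.6) for large n.
    prob = Fraction(3, 4) if n < third_preset else Fraction(3, 5)
    return math.floor(base * prob) + 1     # strict "greater than" the product


print(minimum_match(7))  # 4
```

For n=7 the coefficient is 5 and the product with 0.6 is 3, so at least 4 of the 7 segmented words must be matched, in agreement with the example above.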
In this embodiment, when n is greater than the second preset value, n is further compared with the third preset value, and different minimum matching degree parameters are set for the case where n is smaller than the third preset value and the case where n is greater than or equal to the third preset value. This further improves the probability that a search result can be found in the target database when the search word is long.
Optionally, the search result includes at least two target objects, the target objects including at least one of: a title portion, a keyword portion, and a body portion, each portion including a corresponding weight value;
the outputting search results includes:
scoring the at least two target objects according to the number of the segmented words contained in each part of the target objects and the weight value of each part of the target objects to obtain the scoring value of each target object;
and sequencing the at least two target objects according to the magnitude of the scoring value, and outputting the sequenced at least two target objects.
The content of the title portion is generally the topic of the full text, the content of the keyword portion generally summarizes the key content of the full text, and the content of the text portion generally includes both topic-related content and some unrelated content. Thus, the weight value of the title portion may be greater than the weight value of the keyword portion, which in turn may be greater than the weight value of the text portion. In this way, when at least two target objects are found in the target database based on the target search word, the at least two target objects can each be scored based on the number of segmented words contained in each portion of the target object and the weight value of each portion, so as to obtain the score value of each target object. The at least two target objects are then sorted according to their score values, and the sorted target objects are output. The found target objects are therefore displayed in order of their degree of association with the target search word, so that the user can quickly locate the desired search result.
In one embodiment of the present application, the weight value of the title portion is 3, the weight value of the keyword portion is 2, and the weight value of the text portion is 1. Suppose three target objects A, B, and C are found. In target object A, the title portion includes 1 segmented word, the keyword portion includes 0 segmented words, and the text portion includes 1 segmented word. In target object B, the title portion includes 0 segmented words, the keyword portion includes 0 segmented words, and the text portion includes 3 segmented words. In target object C, the title portion includes 0 segmented words, the keyword portion includes 2 segmented words, and the text portion includes 2 segmented words. The score value of target object A is (1×3 + 0×2 + 1×1) = 4 points. The score value of target object B is (0×3 + 0×2 + 3×1) = 3 points. The score value of target object C is (0×3 + 2×2 + 2×1) = 6 points. Therefore, the sorted order is: target object C, target object A, target object B.
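The weighted scoring and sorting in this example can be written as a short sketch. The names `WEIGHTS`, `rank`, and `hits` are hypothetical; the weights and per-portion counts are the ones given in the example above.

```python
from typing import Dict, List, Tuple

# Example weights from this embodiment: title 3, keyword 2, text (body) 1.
WEIGHTS: Dict[str, int] = {"title": 3, "keyword": 2, "body": 1}


def rank(objects: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Score each object by weighted per-portion counts, highest score first."""
    scored = [
        (name, sum(WEIGHTS[part] * count for part, count in counts.items()))
        for name, counts in objects.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


hits = {
    "A": {"title": 1, "keyword": 0, "body": 1},
    "B": {"title": 0, "keyword": 0, "body": 3},
    "C": {"title": 0, "keyword": 2, "body": 2},
}
print(rank(hits))  # [('C', 6), ('A', 4), ('B', 3)]
```

The output order C, A, B matches the sorted order stated in the example.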
It is understood that the target object may be the first target object or the first sub-target object or the second sub-target object in the above embodiments.
In this embodiment, when at least two target objects are found, each of them is scored based on the number of segmented words contained in each portion of the target object and the weight value of each portion, so as to obtain a scoring value for each target object. The at least two target objects are then sorted by scoring value and output in sorted order. The found target objects can therefore be displayed in order of their degree of association with the target search word, so that the user can quickly locate the desired search result.
Optionally, in the case that the number of characters of the search word is less than or equal to the first preset value, searching in a target database based on the target search word, and outputting a search result includes:
searching in the target database based on the target search word, and outputting a third target object, wherein the third target object is an object in the target database whose title portion includes the target search word.
Specifically, when the number of characters of the search word is less than or equal to the first preset value, the title portion of each object in the target database may be queried using a wildcard search mode, and a third target object whose title portion includes the target search word may be output. For example, when the first preset value is 2, if the search word input by the user is "deferred", the wildcard search mode may be invoked to query only the title portion of each object in the target database, so as to find search results that meet the requirement; testing shows that this process takes about 500 milliseconds. By contrast, the conventional approach searches every portion of each object in the database (specifically including the title portion, the keyword portion, the body portion, and the like) based on the search word, which returns a large number of irrelevant results and takes about 2 seconds. The search method provided by the present application can therefore improve the accuracy of the search results while reducing search time and improving search speed.
In this embodiment, when the number of characters of the search word is less than or equal to the first preset value, that is, when the search word is short, performing a full-text search on each object in the target database based on the search word (that is, searching the title portion, the keyword portion, and the body portion at the same time) may return a large number of irrelevant results. To avoid this, in the embodiment of the present application the target search word is matched only against the title portion of each object in the target database, and the third target object whose title portion includes the target search word is output. This ensures that the topic of each found object is related to the target search word, thereby improving the accuracy of the search results, reducing search time, and improving search speed.
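The title-only wildcard match for short search words can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the objects are held in an in-memory list of dictionaries, the wildcard match is approximated with Python's standard `fnmatch` module, the search word is assumed to contain no wildcard metacharacters, and the sample data and the names `search_titles` and `search_short_word` are hypothetical.

```python
from fnmatch import fnmatchcase


def search_titles(target_word, objects):
    """Return objects whose title portion matches the pattern *target_word*."""
    pattern = f"*{target_word}*"
    return [obj for obj in objects if fnmatchcase(obj["title"], pattern)]


def search_short_word(search_word, objects, first_preset=2):
    """Title-only search for words no longer than the first preset value."""
    if len(search_word) > first_preset:
        raise ValueError("longer search words go through word segmentation")
    # A short search word is used directly as the target search word.
    return search_titles(search_word, objects)


# Hypothetical sample objects: only the first one has "QA" in its title.
DOCS = [
    {"title": "QA checklist", "keyword": "testing", "body": "..."},
    {"title": "Release notes", "keyword": "release", "body": "QA sign-off is required"},
]

print([d["title"] for d in search_short_word("QA", DOCS)])  # ['QA checklist']
```

The second object mentions the search word only in its body portion, so it is not returned, which is the behaviour the embodiment relies on to filter out irrelevant results.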
Optionally, before the search term is obtained, the method further includes:
under the condition that a user logs in a search interface, acquiring identity information of the user;
recommending target search results to the user based on the identity information of the user, wherein the target search results are search results which are matched with the identity information of the user in the target database and have the highest click rate.
The identity information of the user may be the type of the user; for example, users may be classified according to features such as age and sex. Alternatively, users may be classified into guest users, general users, member users, senior users, and the like. During user searches, the search results with the highest click rate for each type of user can then be counted. As another example, in another embodiment of the present application, different labels may be set for different types of users, where the types may be registered users, real-name users, authorized users, and so on. During a user's search, the user's search word and the search result clicked by the user may be reported to an event index. A scheduled task then aggregates statistics by the label dimension and the event dimension to determine the search result with the highest click rate under each label, so that, once the classification label of the user is obtained, the search result with the highest click rate under that label is recommended to the user.
In this embodiment, when a user logs in to the search interface, the classification of the user is determined according to the identity information of the user, then, the search result with the highest click rate in the classification of the user is determined, and the search result with the highest click rate is recommended to the user. Therefore, the time spent in the searching process of the user can be saved, and the searching effect is further improved.
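The label-based recommendation can be sketched as a small in-memory counter. The sketch assumes that the click rate is approximated by a raw click count per label and that click events are reported one at a time; the class name `ClickStats`, its methods, and the sample labels and result identifiers are hypothetical and are not taken from the application.

```python
from collections import Counter, defaultdict


class ClickStats:
    """Counts clicked search results per user label (hypothetical helper)."""

    def __init__(self):
        # label -> Counter mapping result id -> number of clicks
        self._clicks = defaultdict(Counter)

    def report_click(self, label, result_id):
        """Record that a user with the given label clicked a search result."""
        self._clicks[label][result_id] += 1

    def recommend(self, label):
        """Return the most-clicked result for the label, or None if unknown."""
        counts = self._clicks.get(label)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


stats = ClickStats()
for label, result in [
    ("registered", "how-to-repay"),
    ("registered", "how-to-repay"),
    ("registered", "raise-limit"),
    ("real-name", "raise-limit"),
]:
    stats.report_click(label, result)

print(stats.recommend("registered"))  # how-to-repay
print(stats.recommend("authorized"))  # None
```

In a deployed system the counting would more plausibly run as the scheduled statistics task described above, with only the per-label winner being read at login.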
In addition, compared with the prior art, the search method provided by the embodiment of the present application has the following beneficial effects: (1) Users find content more quickly and efficiently, and the search results match user expectations as closely as possible. (2) The search method provides higher-quality basic data; through statistical analysis of the data, common user questions are shown to users in advance, which reduces the number of searches and the load on the server. (3) Compared with the conventional search method, the search method provided by the present application improves the problem resolution rate and reduces the consultation rate of manual customer service by 42%, greatly saving labor cost.
Referring to fig. 2, a flowchart of a search method according to an embodiment of the present application is provided. The method includes the following steps: receiving a search word input by a user; and judging whether the number of characters in the search word is greater than a first preset value. When the number of characters is not greater than the first preset value, the search word is determined as the target search word, a search is performed in a target database based on the target search word, and a third target object is output, where the third target object is an object in the target database whose title portion includes the target search word. All found third target objects are then scored according to the number of segmented words contained in the title portion of each third target object, sorted according to their scoring values, and output in sorted order.
When the number of characters contained in the search word is greater than the first preset value, a word segmentation method is called to segment the search word, and the word segmentation result is taken as the target search word, where the target search word includes n segmented words. It is then further judged whether the number of segmented words included in the target search word is greater than a second preset value. When the number of segmented words included in the target search word is not greater than the second preset value, a search is performed in the target database based on the target search word, and a first sub-target object is output, where the first sub-target object is an object in the target database that includes at least k of the n segmented words. The found first sub-target objects are then scored according to the number of segmented words contained in each portion of the first sub-target object and the weight value of each portion, to obtain the scoring value of each first sub-target object; the scoring process is the same as that of the target objects in the above embodiment and, to avoid repetition, is not described again here. Finally, the first sub-target objects are sorted according to their scoring values, and all sorted first sub-target objects are output.
Correspondingly, when the number of segmented words included in the target search word is greater than the second preset value, a search is performed in the target database based on the target search word, and a second sub-target object is output, where the second sub-target object is an object in the target database that includes at least m of the n segmented words. The found second sub-target objects are then scored according to the number of segmented words contained in each portion of the second sub-target object and the weight value of each portion, to obtain the scoring value of each second sub-target object; the scoring process is the same as that of the target objects in the above embodiment and, to avoid repetition, is not described again here. Finally, the second sub-target objects are sorted according to their scoring values, and all sorted second sub-target objects are output.
The third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is calculated based on n, and the first probability value is greater than the second probability value. In addition, the specific implementation process in this embodiment may refer to the above-mentioned embodiment, and in order to avoid repetition, the description is omitted here.
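The relationship between k, m, the target coefficient, and the two probability values can be illustrated numerically. The sketch below follows the wording of claim 1, in which the target coefficient is obtained by multiplying n by a preset value and rounding down, and it takes k (or m) to be the smallest integer strictly greater than the product. The preset multiplier of 1 and the probability values 0.8 and 0.6 are hypothetical; the application does not state concrete values, and the function names are illustrative only.

```python
from fractions import Fraction
from math import floor


def target_coefficient(n, scale=1):
    """floor(n * scale); scale is the preset multiplier (value assumed)."""
    return floor(Fraction(n) * Fraction(scale))


def min_match_count(n, probability, scale=1):
    """Smallest integer strictly greater than coefficient * probability."""
    product = target_coefficient(n, scale) * Fraction(probability)
    count = floor(product) + 1
    if count > n:
        raise ValueError("threshold exceeds the number of segmented words")
    return count


# Hypothetical probability values; the first must exceed the second.
FIRST_PROBABILITY = "0.8"
SECOND_PROBABILITY = "0.6"

print(min_match_count(6, FIRST_PROBABILITY))    # k = 5 for n = 6
print(min_match_count(10, SECOND_PROBABILITY))  # m = 7 for n = 10
```

Passing the probabilities as strings lets `Fraction` represent them exactly, which avoids floating-point rounding at products that land on an integer boundary.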
Referring to fig. 3, a search apparatus 300 provided in an embodiment of the present application includes:
an obtaining module 301, configured to obtain a search term;
a determining module 302, configured to determine a target search word based on the number of characters of the search word, where the target search word is the search word when the number of characters of the search word is less than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is greater than the first preset value;
and the searching module 303 is used for searching in the target database based on the target search word and outputting a search result.
Optionally, when the number of characters of the search term is greater than the first preset value, the target search term includes n segmented words, where n is a positive integer, and the search module 303 is configured to search in the target database based on the n segmented words when n is less than or equal to a second preset value, and output a first target object, where the second preset value is greater than 1, and the first target object is an object including the n segmented words in the target database;
The searching module 303 is further configured to search in the target database based on the n segmented words and output a second target object when the n is greater than the second preset value, where the second target object is an object in the target database that includes a part of the n segmented words.
Optionally, the second target object includes a first sub-target object and a second sub-target object, and the search module 303 is further configured to search in the target database based on the n segmented words and output the first sub-target object when the n is smaller than a third preset value, where the first sub-target object is an object in the target database that includes at least k segmented words of the n segmented words;
the searching module 303 is further configured to search in the target database based on the target search word when the n is greater than or equal to the third preset value, and output the second sub-target object, where the second sub-target object is an object in the target database that includes at least m segmented words of the n segmented words;
the third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is calculated based on n, and the first probability value is greater than the second probability value.
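The three ranges of n handled by the search module 303 can be sketched as a selection of the minimum number of segmented words an object must contain, followed by a simple filter. This is a minimal sketch under stated assumptions: objects are in-memory dictionaries of text portions, matching is a plain substring test over all portions, the segmented words are distinct, and the preset values, k, m, and sample data are hypothetical.

```python
def required_matches(n, k, m, second_preset, third_preset):
    """Minimum number of the n segmented words an object must contain."""
    if n <= second_preset:
        return n  # first target object: all n segmented words
    if n < third_preset:
        return k  # first sub-target object: at least k segmented words
    return m      # second sub-target object: at least m segmented words


def matching_objects(words, objects, required):
    """Objects whose combined text contains at least `required` of the words."""
    hits = []
    for obj in objects:
        text = " ".join(obj.values())
        found = sum(1 for word in words if word in text)
        if found >= required:
            hits.append(obj)
    return hits


# Hypothetical data: three segmented words, second preset 2, third preset 5.
WORDS = ["reset", "card", "password"]
DOCS = [
    {"title": "reset password", "keyword": "account", "body": "steps"},
    {"title": "card limits", "keyword": "limit", "body": "daily caps"},
]

need = required_matches(len(WORDS), k=2, m=3, second_preset=2, third_preset=5)
print(need)                                                        # 2
print([d["title"] for d in matching_objects(WORDS, DOCS, need)])  # ['reset password']
```

Because n = 3 lies between the second and third preset values, the "at least k" rule applies, so the first object (two matching words) is kept and the second (one matching word) is dropped.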
Optionally, the search result includes at least two target objects, the target objects including at least one of: a title portion, a keyword portion, and a body portion, each portion including a corresponding weight value; the search module 303 includes:
the scoring sub-module is used for scoring the at least two target objects according to the word segmentation quantity contained in each part of the target objects and the weight value of each part to obtain the scoring value of each target object;
and the output sub-module is used for sequencing the at least two target objects according to the magnitude of the scoring value and outputting the sequenced at least two target objects.
Optionally, in the case that the number of characters of the search word is less than or equal to the first preset value, the search module 303 is configured to search in the target database based on the target search word, and output a third target object, where the third target object is an object in the target database whose title portion includes the target search word.
Optionally, the obtaining module 301 is further configured to obtain identity information of the user if the user logs in to a search interface;
The apparatus further comprises:
and the recommending module is used for recommending target search results to the user based on the identity information of the user, wherein the target search results are search results which are matched with the identity information of the user in the target database and have the highest click rate.
Optionally, the searching module 303 is configured to search in the target database according to a wildcard method based on the target search word, and output a search result.
The searching apparatus 300 provided in the embodiment of the present invention can implement each process in the above method embodiment, and in order to avoid repetition, details are not repeated here.
Referring to fig. 4, fig. 4 is a block diagram of an electronic device 400 according to still another embodiment of the present invention, and as shown in fig. 4, the electronic device 400 includes: a processor 401, a memory 402 and a computer program stored on the memory 402 and executable on the processor, the components in the electronic device 400 being coupled together by a bus interface 403, the computer program when executed by the processor 401 realizing the steps of:
acquiring search words;
determining a target search word based on the number of characters of the search word, wherein the target search word is the search word when the number of characters of the search word is smaller than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is larger than the first preset value;
Searching in a target database based on the target search word, and outputting a search result.
Optionally, when the number of characters of the search word is greater than the first preset value, the target search word includes n segmentation words, where n is a positive integer, and the searching is performed in a target database based on the target search word, and a search result is output, including:
searching in the target database based on the n segmented words under the condition that the n is smaller than or equal to a second preset value, and outputting a first target object, wherein the second preset value is larger than 1, and the first target object is an object comprising the n segmented words in the target database;
and under the condition that n is larger than the second preset value, searching in the target database based on the n segmented words, and outputting a second target object, wherein the second target object is an object comprising part of segmented words in the n segmented words in the target database.
Optionally, the second target object includes a first sub-target object and a second sub-target object, the searching in the target database based on the n segmentation words, outputting the second target object, including:
Searching in the target database based on the n segmented words under the condition that the n is smaller than a third preset value, and outputting the first sub-target object, wherein the first sub-target object is an object comprising at least k segmented words in the n segmented words in the target database;
searching in the target database based on the n segmented words when the n is greater than or equal to the third preset value, and outputting the second sub-target object, wherein the second sub-target object is an object comprising at least m segmented words in the n segmented words in the target database;
the third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is calculated based on n, and the first probability value is greater than the second probability value.
Optionally, the search result includes at least two target objects, the target objects including at least one of: a title portion, a keyword portion, and a body portion, each portion including a corresponding weight value;
The outputting search results includes:
scoring the at least two target objects according to the number of the segmented words contained in each part of the target objects and the weight value of each part of the target objects to obtain the scoring value of each target object;
and sequencing the at least two target objects according to the magnitude of the scoring value, and outputting the sequenced at least two target objects.
Optionally, in the case that the number of characters of the search word is less than or equal to the first preset value, searching in a target database based on the target search word, and outputting a search result includes:
searching in the target database based on the target search word, and outputting a third target object, wherein the third target object is an object in the target database whose title portion includes the target search word.
Optionally, before the search term is obtained, the method further includes:
under the condition that a user logs in a search interface, acquiring identity information of the user;
recommending target search results to the user based on the identity information of the user, wherein the target search results are search results which are matched with the identity information of the user in the target database and have the highest click rate.
Optionally, in the case that the number of characters of the search word is less than or equal to the first preset value, searching in a target database based on the target search word, and outputting a search result includes:
searching in the target database according to a wildcard method based on the target search word, and outputting a search result.
The embodiment of the invention also provides an electronic device, which comprises a processor, a memory, and a computer program stored in the memory and capable of running on the processor, wherein the computer program realizes the processes of the method embodiment when being executed by the processor, and can achieve the same technical effects, and the repetition is avoided, and the description is omitted here.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the processes of the above method embodiment, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here. The computer readable storage medium is selected from a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing an electronic device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those having ordinary skill in the art, in light of the present invention, may devise many other forms without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.

Claims (9)

1. A search method, comprising:
acquiring search words;
determining a target search word based on the number of characters of the search word, wherein the target search word is the search word when the number of characters of the search word is smaller than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is larger than the first preset value;
searching in a target database based on the target search word, and outputting a search result;
under the condition that the number of characters of the search word is larger than the first preset value, the target search word comprises n segmentation words, n is a positive integer, searching is performed in a target database based on the target search word, and a search result is output, wherein the method comprises the following steps:
Searching in the target database based on the n segmented words under the condition that the n is larger than a second preset value, and outputting a second target object, wherein the second target object is an object comprising part of segmented words in the n segmented words in the target database;
the second target object comprises a first sub-target object and a second sub-target object, the searching is performed in the target database based on the n segmentation words, the second target object is output, and the method comprises the following steps:
searching in the target database based on the n segmented words under the condition that the n is smaller than a third preset value, and outputting the first sub-target object, wherein the first sub-target object is an object comprising at least k segmented words in the n segmented words in the target database;
searching in the target database based on the n segmented words when the n is greater than or equal to the third preset value, and outputting the second sub-target object, wherein the second sub-target object is an object comprising at least m segmented words in the n segmented words in the target database;
the third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is obtained by multiplying n by the preset value and then rounding down, and the first probability value is greater than the second probability value.
2. The method of claim 1, wherein searching in a target database based on the target search term, outputting search results, comprises:
and under the condition that n is smaller than or equal to a second preset value, searching in the target database based on the n segmented words, and outputting a first target object, wherein the second preset value is larger than 1, and the first target object is an object comprising the n segmented words in the target database.
3. The method of claim 1, wherein the search result comprises at least two target objects, the target objects comprising at least one of: a title portion, a keyword portion, and a body portion, each portion including a corresponding weight value;
the outputting search results includes:
scoring the at least two target objects according to the number of the segmented words contained in each part of the target objects and the weight value of each part of the target objects to obtain the scoring value of each target object;
and sequencing the at least two target objects according to the magnitude of the scoring value, and outputting the sequenced at least two target objects.
4. The method according to claim 1, wherein in the case that the number of characters of the search term is less than or equal to the first preset value, the searching in the target database based on the target search term, and outputting a search result, includes:
searching in the target database based on the target search word, and outputting a third target object, wherein the third target object is an object in the target database whose title portion includes the target search word.
5. The method of claim 1, wherein prior to the obtaining the search term, the method further comprises:
under the condition that a user logs in a search interface, acquiring identity information of the user;
recommending target search results to the user based on the identity information of the user, wherein the target search results are search results which are matched with the identity information of the user in the target database and have the highest click rate.
6. The method according to claim 1, wherein in the case that the number of characters of the search term is less than or equal to the first preset value, the searching in the target database based on the target search term, and outputting a search result, includes:
searching in the target database according to a wildcard method based on the target search word, and outputting a search result.
7. A search apparatus, comprising:
the acquisition module is used for acquiring the search word;
the determining module is used for determining a target search word based on the number of characters of the search word, wherein the target search word is the search word when the number of characters of the search word is smaller than or equal to a first preset value, and the target search word is a word segmentation result obtained by word segmentation of the search word when the number of characters of the search word is larger than the first preset value;
the search module is used for searching in the target database based on the target search word and outputting a search result;
under the condition that the number of characters of the search word is larger than the first preset value, the target search word comprises n segmentation words, wherein n is a positive integer;
the searching module is further configured to search in the target database based on the n segmented words and output a second target object when the n is greater than a second preset value, where the second target object is an object in the target database that includes a part of the n segmented words;
the second target object comprises a first sub-target object and a second sub-target object, and the search module is further configured to search in the target database based on the n segmented words and output the first sub-target object when the n is smaller than a third preset value, where the first sub-target object is an object in the target database that includes at least k segmented words of the n segmented words;
the searching module is further configured to search in the target database based on the target search word when the n is greater than or equal to the third preset value, and output the second sub-target object, where the second sub-target object is an object in the target database that includes at least m segmented words of the n segmented words;
the third preset value is greater than the second preset value, k is greater than the product of a target coefficient and a first probability value, m is greater than the product of the target coefficient and a second probability value, the target coefficient is obtained by multiplying n by the preset value and then rounding down, and the first probability value is greater than the second probability value.
8. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the search method according to any one of claims 1 to 6.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the search method according to any of claims 1 to 6.
CN202110571293.9A 2021-05-25 2021-05-25 Searching method and device and electronic equipment Active CN113177061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571293.9A CN113177061B (en) 2021-05-25 2021-05-25 Searching method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571293.9A CN113177061B (en) 2021-05-25 2021-05-25 Searching method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113177061A CN113177061A (en) 2021-07-27
CN113177061B true CN113177061B (en) 2023-05-16

Family

ID=76928193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571293.9A Active CN113177061B (en) 2021-05-25 2021-05-25 Searching method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113177061B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637601A (en) * 2022-03-02 2022-06-17 马上消费金融股份有限公司 Information acquisition method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145555A (en) * 2017-04-28 2017-09-08 北京安数云信息技术有限公司 A kind of fuzzy sentence searching method based on participle
CN110968664A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Document retrieval method, device, equipment and medium
CN111666448A (en) * 2020-04-21 2020-09-15 北京奇艺世纪科技有限公司 Search method, search device, electronic equipment and computer-readable storage medium
CN111708911A (en) * 2020-06-17 2020-09-25 北京字节跳动网络技术有限公司 Search method, search device, electronic equipment and computer-readable storage medium
CN111737571A (en) * 2020-06-11 2020-10-02 北京字节跳动网络技术有限公司 Searching method and device and electronic equipment

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP1668541A1 (en) * 2003-09-30 2006-06-14 British Telecommunications Public Limited Company Information retrieval
CN101794307A (en) * 2010-03-02 2010-08-04 光庭导航数据(武汉)有限公司 Vehicle navigation POI (Point of Interest) search engine based on internetwork word segmentation idea
CN102541960A (en) * 2010-12-31 2012-07-04 北大方正集团有限公司 Method and device of fuzzy retrieval
CN106503130A (en) * 2016-10-20 2017-03-15 深圳铂睿智恒科技有限公司 Application search method and system for an application market, and application market
CN108241667B (en) * 2016-12-26 2019-10-15 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN112380244B (en) * 2020-12-02 2022-06-14 杭州筑龙信息技术股份有限公司 Word segmentation searching method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

Title
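Efficient dynamic multi-keyword fuzzy search over encrypted cloud data; Hong Zhong et al.; Journal of Network and Computer Applications; Vol. 149; 1-10 *
Research on a semantic retrieval model based on word vector expansion; Zhang Li; China Master's Theses Full-text Database, Information Science and Technology (No. 07); I138-1624 *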

Also Published As

Publication number Publication date
CN113177061A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN108268619B (en) Content recommendation method and device
US20200081977A1 (en) Keyword extraction method and apparatus, storage medium, and electronic apparatus
CN110263248B (en) Information pushing method, device, storage medium and server
CN105824959B (en) Public opinion monitoring method and system
CN111767716B (en) Method and device for determining enterprise multi-level industry information and computer equipment
CN109918560A (en) Answering method and device based on a search engine
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
US10586174B2 (en) Methods and systems for finding and ranking entities in a domain specific system
CN109858626B (en) Knowledge base construction method and device
CN113254643B (en) Text classification method and device, electronic equipment and text classification program
CN108287848B (en) Method and system for semantic parsing
CN113569011B (en) Training method, device and equipment of text matching model and storage medium
CN112000776A (en) Topic matching method, device and equipment based on voice semantics and storage medium
CN111881283A (en) Business keyword library creating method, intelligent chat guiding method and device
CN111079029A (en) Sensitive account detection method, storage medium and computer equipment
CN113127621A (en) Dialogue module pushing method, device, equipment and storage medium
CN113177061B (en) Searching method and device and electronic equipment
CN113570380A (en) Service complaint processing method, device and equipment based on semantic analysis and computer readable storage medium
CN111708870A (en) Deep neural network-based question answering method and device and storage medium
CN115098766B (en) Bidding information recommendation method and system for electronic bidding transaction platform
CN115827990A (en) Searching method and device
CN111382265A (en) Search method, apparatus, device and medium
CN115080741A (en) Questionnaire survey analysis method, device, storage medium and equipment
CN114780678A (en) Text retrieval method, device, equipment and storage medium
CN114255067A (en) Data pricing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant