CN116738065B - Enterprise searching method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116738065B
Authority
CN
China
Prior art keywords
enterprise
determining
matching
search
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311021653.3A
Other languages
Chinese (zh)
Other versions
CN116738065A (en)
Inventor
石南
周平
马超
朱雷明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tongxin Enterprise Credit Service Co ltd
Original Assignee
Zhejiang Tongxin Enterprise Credit Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Tongxin Enterprise Credit Service Co ltd
Priority to CN202311021653.3A
Publication of CN116738065A
Application granted
Publication of CN116738065B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an enterprise search method, apparatus, device, and storage medium in the field of data search. The method comprises: acquiring and preprocessing an enterprise query, generating keywords with a preset word segmentation system, and determining a keyword weight for each keyword; determining the matching fields that correspond to the keywords, determining a matching weight for each matching field, and generating a search statement from the matching fields and matching weights; and executing the query syntax corresponding to the search statement, determining a recall score for each enterprise in the query results, ranking the enterprises with a preset precise ranking rule based on the recall scores, determining the target enterprises that satisfy a preset score condition, and generating the corresponding enterprise search results. Recall and ranking are thus optimized using the analysis of the query keywords, and each dimension of enterprise data is weighted according to business requirements, which yields precise ranking and better enterprise search results.

Description

Enterprise searching method, device, equipment and storage medium
Technical Field
The present invention relates to the field of data searching, and in particular, to an enterprise searching method, apparatus, device, and storage medium.
Background
Enterprise search refers to retrieving and ranking enterprises using enterprise database data and search keywords. Unlike general web search, enterprise data is more structured and covers multiple dimensions of information about an enterprise, so many methods based on free text and network graphs are not suitable for enterprise search scenarios.
At present there are few technical solutions aimed specifically at enterprise search. Because enterprise data is stored in the same way a database is built, the mainstream methods are all data-retrieval-based, for example Apache Solr and Elasticsearch. These open-source search engines provide rich search and ranking algorithms and can perform full-text search, word segmentation, semantic analysis, and other operations on enterprise database data. However, data-retrieval-based approaches are limited in how well they rank enterprise search results. For example, ranking with the TF-IDF algorithm (term frequency-inverse document frequency, a weighting technique commonly used in information retrieval and data mining) matches only on term frequency and document frequency and cannot handle complex semantic relations. More accurate ranking requires more advanced algorithms and techniques; the existing approaches cannot meet the special requirements of enterprise search services and are poorly extensible. How to obtain more accurate enterprise search results is therefore a problem to be solved in the art.
Disclosure of Invention
Accordingly, the present invention aims to provide an enterprise search method, apparatus, device and storage medium, which can optimize recall and ranking of enterprises through analysis results of query keywords, and set weights of data of each dimension of the enterprises according to service requirements, so as to achieve accurate ranking and obtain better enterprise search effects. The specific scheme is as follows:
in a first aspect, the present application provides an enterprise search method, including:
acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by using a preset word segmentation system, and determining keyword weights of the keywords;
determining a corresponding matching field in a preset search engine according to the keywords, determining the matching weight of the matching field, and generating a corresponding search statement based on the matching field and the matching weight;
querying with the preset search engine according to the query syntax corresponding to the search statement, determining recall scores of the enterprises in the query results, ranking the enterprises with a preset precise ranking rule based on the recall scores, determining target enterprises that satisfy a preset score condition based on the ranking results, and generating the enterprise search results corresponding to the enterprise query words from the target enterprises.
Optionally, the determining, according to the keyword, a corresponding matching field in a preset search engine includes:
And determining enterprise search business requirements corresponding to the enterprise query words according to the keywords, determining corresponding matching modes according to the enterprise search business requirements, and determining matching fields corresponding to the enterprise search business requirements in the preset search engine based on the matching modes.
Optionally, after determining the keyword weight of the keyword, the method further includes:
determining the keyword attribute of the keyword, determining the keyword level corresponding to the keyword according to the keyword attribute and the keyword weight, and determining the level weight corresponding to the keyword level.
Optionally, determining recall scores of enterprises corresponding to the grammar query results includes:
determining a matching degree score of the search statement and enterprise information in the preset search engine according to a preset matching degree calculation rule;
and determining recall scores of enterprises corresponding to the grammar query results according to the matching degree scores, the level weights and the matching weights.
Optionally, after determining the matching weight of the matching field, the method further includes:
determining the data distribution of the matching fields and the search log data, and assigning corresponding target weights to preset non-matching fields according to a preset weight assignment rule, so that the matching weights are adjusted according to the data distribution, the search log data, and the target weights, and the corresponding search statement is generated based on the matching fields and the adjusted matching weights.
Optionally, the determining the keyword weight of the keyword includes:
if the keyword comprises an enterprise name, determining a first weight corresponding to the enterprise name through a preset enterprise name matching algorithm;
Processing the enterprise name according to a preset universal suffix word dictionary and a preset loss coefficient, and determining a second weight corresponding to the enterprise name;
and determining the keyword weight of the enterprise name according to the first weight and the second weight.
Optionally, the enterprise search method further includes:
if the keywords include an address word, judging whether the address word lies on an address link of a pre-constructed address tree, and if so, determining a field score for the field corresponding to the address word;
correspondingly, determining the recall score of the enterprise corresponding to the grammar query result further comprises:
and determining recall scores of enterprises corresponding to the grammar query results based on the field scores.
In a second aspect, the present application provides an enterprise search apparatus, comprising:
The keyword determining module is used for acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by utilizing a preset word segmentation system, and determining keyword weights of the keywords;
The search statement generation module is used for determining a corresponding matching field in a preset search engine according to the keyword, determining the matching weight of the matching field and generating a corresponding search statement based on the matching field and the matching weight;
The search result generation module is used for inquiring according to the inquiry grammar corresponding to the search statement by utilizing the preset search engine, determining recall scores of enterprises corresponding to the grammar inquiry result, sorting the enterprises by utilizing a preset accurate sorting rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sorting result, and generating enterprise search results corresponding to the enterprise inquiry words according to the target enterprises.
In a third aspect, the present application provides an electronic device comprising a processor and a memory; the memory is used for storing a computer program, and the computer program is loaded and executed by the processor to realize the enterprise searching method.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which when executed by a processor implements the enterprise search method described above.
The method comprises the steps of obtaining enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by using a preset word segmentation system, and determining keyword weights of the keywords; determining a corresponding matching field in a preset search engine according to the keywords, determining the matching weight of the matching field, and generating a corresponding search statement based on the matching field and the matching weight; and inquiring according to the inquiry grammar corresponding to the search statement by using the preset search engine, determining recall scores of enterprises corresponding to grammar inquiry results, sorting the enterprises by using a preset accurate sorting rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sorting results, and generating enterprise search results corresponding to the enterprise inquiry words according to the target enterprises. In this way, the query can be analyzed according to the query keywords, related results are recalled from the database, the results are accurately ordered, the recall and the ordering of enterprises are optimized through the analysis results of the query keywords, the weight setting is carried out on the dimension data of the enterprises according to the service requirements, the accurate ordering is realized, and the enterprise search effect with better expansibility is obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an enterprise search method provided by the application;
FIG. 2 is a flow chart of an enterprise search framework provided by the present application;
FIG. 3 is a flow chart of a specific enterprise search result precision ordering provided by the present application;
FIG. 4 is a schematic diagram of an enterprise search device according to the present application;
FIG. 5 is a block diagram of an electronic device according to the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, enterprise search is mainly based on data retrieval, but a scheme based on data retrieval can only be matched based on word frequency and document frequency, cannot process complex semantic relations, cannot meet some special requirements on enterprise search business, and is poor in expandability. According to the application, recall and sequencing of enterprises can be optimized through analysis results of query keywords, and weight setting is performed on data of each dimension of the enterprises according to service requirements, so that accurate sequencing is realized, and an enterprise search effect with better expansibility is obtained.
Referring to fig. 1, the embodiment of the invention discloses an enterprise searching method, which comprises the following steps:
Step S11, acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by using a preset word segmentation system, and determining keyword weights of the keywords.
In this embodiment, the enterprise query words input by the user are first acquired and preprocessed; preprocessing includes, but is not limited to, removing stop words and extracting word stems. A preset word segmentation system then generates the keywords corresponding to the preprocessed query words and determines the keyword weight of each keyword. The preset word segmentation system may be an off-the-shelf segmenter with its own word-weight scores, or it may be retrained on a supplied corpus. In this embodiment, the structure of the Chinese word segmenter follows the existing literature; to meet the needs of the enterprise search scenario, a dedicated Chinese word segmenter is retrained on enterprise database information and public-opinion documents. The design and structure of the segmenter are not repeated here. The word segmentation system segments the user's query words and scores the importance of each word to obtain its weight. The user's query is finally split into a {keyword: weight} format for subsequent use.
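The preprocessing and segmentation step can be sketched as follows. This is a minimal illustration only: the stop-word set, the weight table, and the whitespace-based tokenizer are hypothetical stand-ins for the trained segmenter, which the source does not specify.

```python
import re

# Hypothetical stop words and term weights; a real system would use a
# trained word segmenter and learned importance scores.
STOP_WORDS = {"the", "of", "a"}
TERM_WEIGHTS = {"tongxin": 0.9, "credit": 0.6, "hangzhou": 0.5}
DEFAULT_WEIGHT = 0.2  # assumed weight for words missing from the table


def preprocess(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", query.lower()).strip()


def segment(query: str) -> dict:
    """Toy segmenter: split on word boundaries, drop stop words,
    and return the {keyword: weight} mapping described above."""
    tokens = [t for t in re.findall(r"\w+", preprocess(query))
              if t not in STOP_WORDS]
    return {t: TERM_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in tokens}


if __name__ == "__main__":
    print(segment("  Tongxin  Credit of Hangzhou "))
```

The returned dictionary keeps the keywords in query order, which later steps can rely on when they compare word order.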
It should be noted that, after the keyword weights are determined, the keyword attribute of each keyword may also be determined, the keyword level of each keyword may be determined from its attribute and weight, and the level weight corresponding to that level may be determined. In this way the preprocessed keywords are graded, with higher-weight keywords assigned to higher levels, and the level weights are used in the weighted calculation of the recall score. Keywords may be graded by factors such as importance, frequency, and length. It may also be required that keywords of the highest level must be hit by the enterprise information, which improves the efficiency of the subsequent recall score calculation.
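One possible way to grade keywords is sketched below. The thresholds, the level weights, the attribute label `proper_noun`, and its bonus are all assumptions chosen for illustration; the source does not give concrete values.

```python
# (minimum score, level, level weight): hypothetical values, highest level first.
LEVELS = [(0.8, 1, 3.0), (0.5, 2, 2.0), (0.0, 3, 1.0)]
PROPER_NOUN_BONUS = 0.1  # assumed boost for keywords tagged as proper nouns


def grade_keywords(keywords: dict, attributes: dict) -> dict:
    """Assign each keyword a level and level weight from its weight and attribute.

    Level 1 keywords are marked must_hit, mirroring the rule that the
    highest-level keywords must be hit by the enterprise information.
    Weights are assumed to be non-negative.
    """
    graded = {}
    for word, weight in keywords.items():
        bonus = PROPER_NOUN_BONUS if attributes.get(word) == "proper_noun" else 0.0
        score = weight + bonus
        for min_score, level, level_weight in LEVELS:
            if score >= min_score:
                graded[word] = {
                    "level": level,
                    "level_weight": level_weight,
                    "must_hit": level == 1,
                }
                break
    return graded


if __name__ == "__main__":
    kws = {"tongxin": 0.75, "credit": 0.6, "service": 0.2}
    print(grade_keywords(kws, {"tongxin": "proper_noun"}))
```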
Step S12, corresponding matching fields in a preset search engine are determined according to the keywords, the matching weights of the matching fields are determined, and corresponding search sentences are generated based on the matching fields and the matching weights.
In this embodiment, the corresponding matching fields in the preset search engine are determined from the keywords, and a matching weight is determined for each matching field. Once the enterprise search business requirement behind the user's query words has been determined, it is necessary to decide which fields participate in recall matching. For example, a search for an enterprise name may also need to match fields such as the enterprise's industry, region, and brand. A suitable matching mode must also be selected, including but not limited to full-text matching, fuzzy matching, and exact matching, and each matching field is assigned a matching weight for the weighted calculation of the recall score. A corresponding search statement is then generated from the matching fields and matching weights. The search statement needs to specify the index and type to be searched, together with the fields, matching modes, and matching weights that participate in the search.
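As a sketch of what such a search statement might look like, the function below builds an Elasticsearch query DSL body as a plain Python dictionary, using `bool`/`should` with `match` and `term` clauses and a per-clause `boost`. The field names are hypothetical, and using the product of keyword weight and matching weight as the boost is an assumption rather than the method stated in the source.

```python
def build_search_statement(keywords: dict, field_weights: dict,
                           exact_fields=frozenset()) -> dict:
    """Build a bool/should query: one clause per (keyword, matching field).

    exact_fields use a term clause (exact matching); all other fields use a
    match clause (full-text matching). The boost is keyword weight times
    matching weight, which is an illustrative assumption.
    """
    should = []
    for word, kw_weight in keywords.items():
        for field, match_weight in field_weights.items():
            boost = round(kw_weight * match_weight, 4)
            if field in exact_fields:
                clause = {"term": {field: {"value": word, "boost": boost}}}
            else:
                clause = {"match": {field: {"query": word, "boost": boost}}}
            should.append(clause)
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


if __name__ == "__main__":
    import json
    statement = build_search_statement(
        {"tongxin": 0.9},
        {"company_name": 2.0, "legal_person": 1.0},  # hypothetical fields
        exact_fields={"legal_person"},
    )
    print(json.dumps(statement, indent=2))
```

The index to search is not part of the body; it would be supplied when the request is sent to the search engine.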
It should be noted that, regarding the choice of the preset search engine, enterprise information covers many dimensions: typically the enterprise name, former names, registered capital, registered address, legal person, senior executives, official website, products, and business scope; listed companies also have stock abbreviations and stock codes, and enterprises carry many derived enterprise labels under various services. Some fields of this structured data can be associated with search terms, such as the company name, while others are company attributes, such as the registered capital. In addition, an enterprise search scenario involves tens of millions of enterprises, so both the effectiveness and the efficiency of text retrieval must be considered. This embodiment therefore adopts the Elasticsearch search engine, which stores and retrieves data but differs from a conventional relational database. Compared with a traditional database, Elasticsearch focuses on application scenarios such as full-text search, log analysis, and data analysis, and offers more flexible queries and higher performance. Elasticsearch provides multiple query modes, including but not limited to full-text search, fuzzy search, and exact matching; it also supports aggregation, filtering, sorting, and other operations, so it can flexibly meet varied query requirements. It uses inverted indexes to query large amounts of data quickly, and it supports distributed deployment, so processing capacity can be expanded easily.
Specifically, in this embodiment, among the fields that can be matched against the keywords, the field type is chosen according to the field itself and whether it requires exact matching. Fields that require exact matching, such as the legal person and abbreviations, are stored as the keyword type; fields that do not, such as the company name and address, are not stored as the keyword type. In enterprise search, the core search words are proper nouns related to dimensions such as enterprise names and brands, and the output of Elasticsearch's built-in word segmentation processors is not sufficient for the business requirements of this scenario. The word segmentation processor of the preset word segmentation system must therefore be a segmenter suited to enterprise search, and the segmenter used to build the index in Elasticsearch must be the same as the segmenter used to process the search words.
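The field-type choice can be illustrated with an Elasticsearch index mapping, written here as a Python dictionary. The field names and the analyzer name `enterprise_segmenter` are hypothetical; storing the non-exact fields as the analyzed `text` type is an assumption based on how Elasticsearch normally handles full-text fields. Setting `analyzer` and `search_analyzer` to the same value reflects the requirement that indexing and query processing share one segmenter.

```python
def build_index_mapping(analyzer_name: str) -> dict:
    """Return an index mapping: keyword type for exact-match fields,
    analyzed text type (same analyzer at index and search time) otherwise."""
    analyzed = {
        "type": "text",
        "analyzer": analyzer_name,         # segmenter used when building the index
        "search_analyzer": analyzer_name,  # same segmenter for the search words
    }
    return {
        "mappings": {
            "properties": {
                # Exact-match fields (hypothetical names).
                "legal_person": {"type": "keyword"},
                "short_name": {"type": "keyword"},
                # Fields matched through word segmentation.
                "company_name": dict(analyzed),
                "address": dict(analyzed),
            }
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_index_mapping("enterprise_segmenter"), indent=2))
```

The custom analyzer itself would have to be registered in the index settings or supplied by a plugin; that part is outside this sketch.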
And S13, inquiring by utilizing the preset search engine according to the inquiry grammar corresponding to the search statement, determining recall scores of enterprises corresponding to the grammar inquiry result, sequencing the enterprises by utilizing a preset precise sequencing rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sequencing result, and generating enterprise search results corresponding to the enterprise inquiry word according to the target enterprises.
In this embodiment, the Elasticsearch search engine determines the query syntax corresponding to the search statement and executes the query; the query syntax provided by Elasticsearch can be used for advanced query functions such as fuzzy queries, range queries, and Boolean queries. The recall score of each enterprise in the query results is then determined: first, a matching degree score between the search statement and the enterprise information in the preset search engine is computed according to a preset matching degree calculation rule; then the recall score is determined from the matching degree score, the level weight, and the matching weight. In other words, the recall score combines a matching degree calculation, which scores how well the enterprise information matches the search statement, with a weighting calculation based on the keyword level and the matching weight: recall score = matching degree score × keyword level weight × matching weight. Finally, the recall results are ranked, typically from high to low by recall score; enterprises whose recall score exceeds a set threshold are recalled, and the top n ranked enterprises enter the next module. The recall score threshold can be set according to the actual situation; the higher the threshold, the more relevant the recalled enterprises are to the user's query words.
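The recall score formula and the threshold/top-n selection can be sketched as below. The candidate record layout (`name`, `match_score`, `level_weight`, `match_weight`) is a hypothetical structure used only for this illustration.

```python
def recall_score(match_score: float, level_weight: float, match_weight: float) -> float:
    """recall score = matching degree score x keyword level weight x matching weight."""
    return match_score * level_weight * match_weight


def recall(candidates: list, threshold: float, top_n: int) -> list:
    """Score candidates, keep those above the threshold, sort from high to low,
    and return the top n as (name, score) pairs."""
    scored = [
        (c["name"], recall_score(c["match_score"], c["level_weight"], c["match_weight"]))
        for c in candidates
    ]
    kept = [item for item in scored if item[1] > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


if __name__ == "__main__":
    candidates = [
        {"name": "A", "match_score": 0.5, "level_weight": 2.0, "match_weight": 1.5},
        {"name": "B", "match_score": 0.5, "level_weight": 3.0, "match_weight": 2.0},
        {"name": "C", "match_score": 0.25, "level_weight": 1.0, "match_weight": 1.0},
    ]
    print(recall(candidates, threshold=1.0, top_n=2))  # [('B', 3.0), ('A', 1.5)]
```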
With the above technical solution, as shown in FIG. 2, this embodiment acquires the enterprise query words input by the user, preprocesses them, generates the corresponding keywords with the preset word segmentation system, and determines and grades the keyword weights; determines the corresponding matching fields in the preset search engine from the keywords, determines the matching weight of each matching field, and generates the corresponding Elasticsearch statement from the matching fields and matching weights; and queries with the preset search engine according to the query syntax corresponding to the search statement, determines the recall scores of the enterprises in the query results, ranks the enterprises with the preset precise ranking rule based on the recall scores, determines the target enterprises that satisfy the preset score condition, and generates the enterprise search results for the query words from the target enterprises. The query is thus analyzed by its keywords, recall and ranking are optimized using the analysis results, and each dimension of enterprise data is weighted according to business requirements, giving a more extensible enterprise search.
Building on the above embodiment, in which recall and ranking of enterprises are optimized by analyzing the query keywords and setting weights, this embodiment describes in detail how the recall results are precisely ranked. Referring to FIG. 3, an embodiment of the application discloses a specific method for precisely ranking enterprise search results, comprising:
Step S21, acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by using a preset word segmentation system, and determining keyword weights of the keywords.
In this embodiment, when determining the keyword weight of the keyword, if the keyword includes an enterprise name, a first weight corresponding to the enterprise name may be determined by a preset enterprise name matching algorithm, then the enterprise name is processed according to a preset universal suffix word dictionary and a preset loss coefficient, a second weight corresponding to the enterprise name is determined, and the keyword weight of the enterprise name is determined according to the first weight and the second weight.
It should be noted that in enterprise search scenarios the keywords are very often an enterprise name, so the relevance between the user query and the enterprise name must be calculated more carefully on top of Elasticsearch's own matching. An enterprise name matching algorithm can calculate a more appropriate score for enterprise names. In addition, to refine the keywords, a universal suffix word dictionary is designed for the business scenario; it records words with low matching value in enterprise search, such as "limited company" and "responsibility", and is used to process the name fields. Because removing these words shortens the matched value and would otherwise unbalance the score calculation, the resulting score is multiplied by a loss coefficient. A more accurate keyword score is thus obtained when the keyword is an enterprise name, which supports accurate calculation of the recall score.
The preset enterprise name matching algorithm is specifically as follows:
1. Name value matching algorithm:
Input parameters:
- v: the company name string to be matched;
- totalWe: the sum of the keyword weights.
Output:
- hit: the matching score, which takes into account keyword weight, compactness, length ratio, etc.;
- sameWe: the sum of the matched keyword weights;
- equalFlag: the exact-match flag, True if the name matches completely and False otherwise.
The specific implementation is as follows:
1. Convert the input company name string v to lowercase and replace Chinese (full-width) brackets with English brackets, so that all brackets are English.
2. Split the company name into words with the word segmenter, and perform reverse matching and forward matching.
3. Calculate the hit weight hit from the matched keyword weights, together with the sum of the matched keyword weights sameWe and the length of the matched portion sameLen.
4. If the name string contains all search keywords, set the containment flag containFlag to True; otherwise set it to False.
5. Calculate the matching degree score from the hit weight, the ratio of matched keyword weight to the total keyword weight, the ratio of the matched length, and other factors.
6. If containFlag is True, add an extra score. If the match is also exact, that is, the query words and the name value have the same length, add a further score and set the exact-match flag equalFlag to True. If the order is also consistent, that is, the query words are identical to the name value, add a consistency score.
7. Finally, output the matching degree score hit, the sum of matched keyword weights sameWe, and the exact-match flag equalFlag.
The algorithm calculates the matching degree score between a company name and the query, taking into account keyword weight matching, compactness matching, full containment, and exact matching, which improves the accuracy of the matching degree score.
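A simplified sketch of the name value matching steps is given below. Several details are assumptions, because the source does not give exact formulas: substring containment replaces the segmenter-based forward and reverse matching, the score is taken as the product of the matched weight, the weight ratio, and the length ratio, and the three bonus values are invented for illustration. Keywords are expected in lowercase and in query order.

```python
# Assumed bonus values; the source only says that extra scores are added.
CONTAIN_BONUS = 0.5
EQUAL_BONUS = 0.5
ORDER_BONUS = 0.5


def match_name_value(v: str, keyword_weights: dict):
    """Return (hit, sameWe, equalFlag) for company name v against the keywords."""
    # Step 1: lowercase and normalize full-width brackets.
    name = v.lower().replace("（", "(").replace("）", ")")
    total_we = sum(keyword_weights.values())
    if not name or total_we <= 0:
        return 0.0, 0.0, False

    # Steps 2-3 (simplified): substring matching instead of segmentation.
    matched = [w for w in keyword_weights if w in name]
    same_we = sum(keyword_weights[w] for w in matched)
    same_len = sum(len(w) for w in matched)

    # Step 5: combine hit weight, weight ratio and length ratio (assumed product).
    hit = same_we * (same_we / total_we) * min(1.0, same_len / len(name))

    # Steps 4 and 6: containment, exact-length, and same-order bonuses.
    contain_flag = len(matched) == len(keyword_weights)
    equal_flag = False
    if contain_flag:
        hit += CONTAIN_BONUS
        if same_len == len(name):
            equal_flag = True
            hit += EQUAL_BONUS
            if "".join(keyword_weights) == name:
                hit += ORDER_BONUS
    return hit, same_we, equal_flag


if __name__ == "__main__":
    print(match_name_value("TongxinCredit", {"tongxin": 1.0, "credit": 1.0}))
    print(match_name_value("creditabcd", {"tongxin": 1.0, "credit": 1.0}))
```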
2. Business name field processing algorithm:
Inputting parameters:
weight, attribute weight.
Outputting a result:
maxHit final maximum match score;
equalFlag whether the matching is complete;
maxWe maximum match name weight.
The specific implementation flow is as follows:
1. Acquire the attribute value value for the corresponding attribute name from the self.info dictionary; if there is none, value is empty.
2. If value is a string, split it on commas into a list.
3. Match each value with the name value matching algorithm to obtain the matching score hit, the matched name weight sameWe, and the complete match flag equalFlag.
4. Depending on the attribute name, apply special handling to cases such as individual businesses with very short names, Hong Kong-listed companies, and company short names, multiplying the score by a different coefficient in each case.
5. Record the maximum matching score maxHit over all values and the name weight maxWe of that maximum match.
6. Process each value with the universal suffix dictionary, and match the value again with its suffix removed to obtain a matching score hit and a matched name weight sameWe.
7. Multiply the suffix-removed matching score by a loss coefficient, record the maximum suffix-removed matching score maxHit and the name weight maxWe of that match, and compare it with the maximum matching score before suffix removal to obtain the final result.
8. Multiply maxHit by the attribute weight, and multiply maxWe by 0.8 if maxWe is less than 1.
9. Finally, output the maximum matching score maxHit, the complete match flag equalFlag, and the maximum match name weight maxWe.
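The field processing flow above can be sketched as a Python function. This is an illustrative sketch only: the function name `process_name_field`, the `match` callback (standing in for the name value matching algorithm), the default loss coefficient of 0.9, and the per-attribute coefficient table `coeffs` are assumptions. The source reads from `self.info` inside a class; here the dictionary is passed in as `info` to keep the example self-contained, and all values are assumed to be strings.

```python
def process_name_field(info, attr, weight, match, suffixes,
                       loss=0.9, coeffs=None):
    """Return (maxHit, equalFlag, maxWe) for one enterprise name attribute.

    match(text) must return (hit, same_we, equal_flag).
    loss and coeffs are hypothetical placeholders.
    """
    value = info.get(attr) or ""
    # A string value is split on commas; other iterables are used as-is.
    values = value.split(",") if isinstance(value, str) else list(value)
    coeff = (coeffs or {}).get(attr, 1.0)   # special handling per attribute

    max_hit, max_we, equal_flag = 0.0, 0.0, False
    for v in values:
        v = v.strip()
        if not v:
            continue
        # Match the full value.
        hit, same_we, eq = match(v)
        hit *= coeff
        equal_flag = equal_flag or eq
        if hit > max_hit:
            max_hit, max_we = hit, same_we
        # Match again with the first applicable generic suffix removed,
        # discounted by the loss coefficient.
        for suffix in suffixes:
            if v.endswith(suffix) and len(v) > len(suffix):
                s_hit, s_we, _ = match(v[:-len(suffix)])
                s_hit *= coeff * loss
                if s_hit > max_hit:
                    max_hit, max_we = s_hit, s_we
                break

    max_hit *= weight          # apply the attribute weight
    if max_we < 1:
        max_we *= 0.8          # factor taken from step 8
    return max_hit, equal_flag, max_we
```

In this sketch the complete match flag is taken only from the full-value match, because the suffix-removed match in the flow above yields only a score and a name weight.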
Step S22, corresponding matching fields in a preset search engine are determined according to the keywords, matching weights of the matching fields are determined, then data distribution conditions of the matching fields and search log data are determined, corresponding target weights are distributed for preset non-matching fields according to preset weight distribution rules, and the matching weights are adjusted according to the data distribution conditions, the search log data and the target weights so as to generate corresponding search sentences based on the matching fields and the adjusted matching weights.
In this embodiment, after the corresponding matching field in the preset search engine is determined according to the keywords and the matching weight of the matching field is determined, the data distribution and the search log data of the matching field may be determined, and a corresponding target weight is assigned to each preset non-matching field according to a preset weight allocation rule, so that the matching weight is adjusted according to the data distribution, the search log data and the target weight. In this way, the matching field weights can be fine-tuned and deviations introduced in the data retrieval stage can be corrected. For example, the matching weights of different fields can be adjusted according to the data distribution and the search log data of the different fields in the enterprise information, so as to achieve a more accurate ranking result.
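One way such an adjustment could look is sketched below. The source does not give the adjustment rule, so everything here is an assumption: the function name `adjust_matching_weights`, the use of a field fill rate as the "data distribution" signal, the use of per-field click counts as the "search log" signal, and the renormalization step. The target weights of non-matching fields are left out of this sketch.

```python
def adjust_matching_weights(base_weights, fill_rate, click_counts):
    """Rescale per-field matching weights (hypothetical rule).

    base_weights: field -> initial matching weight
    fill_rate:    field -> fraction of enterprises with the field populated
    click_counts: field -> clicks in the search log attributed to the field
    """
    total_clicks = sum(click_counts.get(f, 0) for f in base_weights)
    raw = {}
    for field, weight in base_weights.items():
        coverage = fill_rate.get(field, 1.0)
        share = click_counts.get(field, 0) / total_clicks if total_clicks else 0.0
        # Sparse fields are damped; fields users click on are boosted.
        raw[field] = weight * coverage * (1.0 + share)

    raw_total = sum(raw.values())
    if raw_total == 0:
        return dict(base_weights)
    # Keep the overall weight budget unchanged.
    scale = sum(base_weights.values()) / raw_total
    return {field: value * scale for field, value in raw.items()}
```

Renormalizing keeps the scores comparable before and after the adjustment, so only the relative importance of the fields changes.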
It should be noted that, in addition to the matching fields, scores may also be designed for non-matching fields, such as attributes of the enterprise, so that more factors are taken into account in the ranking. For example, fields such as registered capital, listing status, nature of the enterprise, and Global 500 membership may be assigned weights and added to the calculation of the recall score, so that high-quality enterprises obtain higher ranking scores, which improves the accuracy and reliability of the search results.
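A minimal sketch of adding non-matching field scores to the recall score follows. The field keys (`registered_capital`, `is_listed`, `is_global_500`), the capital threshold, and the additive combination are assumptions made for illustration; the source only states that such fields may be assigned weights and added to the recall score calculation.

```python
def non_matching_bonus(enterprise, target_weights,
                       capital_threshold=10_000_000):
    """Sum the target weights of the quality signals an enterprise satisfies."""
    bonus = 0.0
    # Hypothetical rule: capital at or above the threshold earns the weight.
    if (enterprise.get("registered_capital") or 0) >= capital_threshold:
        bonus += target_weights.get("registered_capital", 0.0)
    if enterprise.get("is_listed"):
        bonus += target_weights.get("is_listed", 0.0)
    if enterprise.get("is_global_500"):
        bonus += target_weights.get("is_global_500", 0.0)
    return bonus


def recall_score_with_bonus(match_score, enterprise, target_weights):
    """Add the non-matching field bonus to the text matching score."""
    return match_score + non_matching_bonus(enterprise, target_weights)
```

With two enterprises that match the query text equally well, the one that satisfies more of the weighted signals ranks higher.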
Step S23, inquiring by utilizing the preset search engine according to the inquiry grammar corresponding to the search statement, determining recall scores of enterprises corresponding to grammar inquiry results, sorting the enterprises by utilizing a preset precise sorting rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sorting results, and generating enterprise search results corresponding to the enterprise inquiry words according to the target enterprises.
In this embodiment, if the keywords include an address word, it is judged whether the address word is on an address link of a pre-constructed address tree, so as to determine the field score of the field corresponding to the address word; the recall score of the enterprise corresponding to the grammar query result may then be determined based on the field score.
It should be noted that the query words input by the user may include a place name. When a user inputs a place name, the intention may be to match the enterprise name directly, to restrict the region in which the enterprise is located, or to search for an address. When the intention is to restrict the region, directly matching the place name part of the query word against the enterprise name cannot meet the business requirement. For example, the user wants to find "Hangzhou XX Network Co., Ltd." but inputs "Zhejiang XX Network Co., Ltd.". To return the correct result in this scenario, this embodiment may construct an address tree: the names of each province, city, and county/district are obtained according to the administrative divisions to determine child nodes and parent nodes, and the tree is stored using arrays and dictionaries for lookup and judgment. When the address field information of an enterprise is acquired, the address may be formatted to obtain the corresponding province, city, and county/district names; because the address field information may be incomplete, the parent node, i.e., the information of the higher-level region, can be supplemented through the address tree. Finally, the address word in the query word is determined, and it is judged whether the address word is on the address link of the enterprise address; if so, a hit score is given to the address field. In this way, corresponding query results are still provided when the enterprise query word input by the user is not accurate enough, which improves the user experience.
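The address link check can be sketched with a dictionary that maps each region to its parent. The region names, the `PARENT` table, the function names, and the hit score of 1.0 are illustrative assumptions; a real tree would be built from the full administrative division data.

```python
# Hypothetical fragment of an address tree: child region -> parent region.
PARENT = {
    "Zhejiang": None,
    "Hangzhou": "Zhejiang",
    "Ningbo": "Zhejiang",
    "Xihu": "Hangzhou",
}


def address_chain(region, parent=PARENT):
    """Return the region and all of its ancestors, lowest level first."""
    chain, seen = [], set()
    while region is not None and region in parent and region not in seen:
        chain.append(region)
        seen.add(region)          # guards against accidental cycles
        region = parent[region]
    return chain


def address_field_score(query_address_word, enterprise_region, hit_score=1.0):
    """Give the address field a hit score if the query word is on the chain."""
    if query_address_word in address_chain(enterprise_region):
        return hit_score
    return 0.0
```

With this table, an enterprise whose formatted address resolves to "Hangzhou" receives the hit score for a query containing "Zhejiang", because the province is supplied by the parent lookup, while a query containing "Ningbo" receives none.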
It will be appreciated that, since the number of nodes in the address tree is small, the address words are extracted by loop traversal; when the user searches for an address directly, the query is handled in the manner of the previous embodiment, using text matching multiplied by the corresponding weight. It will also be appreciated that ranking by the recall score computed with the enterprise name matching algorithm and keyword processing in the above steps has high time complexity; to meet performance requirements, this embodiment may apply precise ranking only to the top 100 recall results.
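Restricting the precise ranking to the top 100 recalled results can be sketched as below. The function name `rerank_top_k`, the `recall_score` key on each result, and the `precise_score` callback are assumptions for illustration; `heapq.nlargest` is from the Python standard library.

```python
import heapq


def rerank_top_k(recalled, precise_score, k=100):
    """Apply the expensive precise score only to the k best recalled results.

    recalled:      list of dicts, each with a cheap "recall_score"
    precise_score: callable that computes the expensive ranking score
    """
    # Select the k results with the highest engine recall score.
    head = heapq.nlargest(k, recalled, key=lambda item: item["recall_score"])
    # Only these k results pay the cost of the precise scoring function.
    return sorted(head, key=precise_score, reverse=True)
```

The cost of the precise scoring is then bounded by k regardless of how many enterprises the search engine recalls.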
In this embodiment, an enterprise query word input by a user is obtained, the enterprise query word is preprocessed, a keyword corresponding to the preprocessed enterprise query word is generated by using a preset word segmentation system, the keyword weight of the keyword is determined, if the keyword comprises an enterprise name, a first weight corresponding to the enterprise name is determined through a preset enterprise name matching algorithm, then the enterprise name is processed according to a preset universal suffix word dictionary and a preset loss coefficient, a second weight corresponding to the enterprise name is determined, and the keyword weight of the enterprise name is determined according to the first weight and the second weight; if the keyword comprises an address word, judging whether the address word is on an address link of a pre-constructed address tree so as to determine a field score corresponding to a field corresponding to the address word. And determining corresponding matching fields in a preset search engine according to the keywords, determining the data distribution condition and the search log data of the matching fields after determining the matching weights of the matching fields, and distributing corresponding target weights for preset non-matching fields according to preset weight distribution rules so as to adjust the matching weights according to the data distribution condition, the search log data and the target weights. And finally, determining the corresponding recall score, sorting the enterprises by using a preset precise sorting rule based on the recall score, determining a target enterprise meeting the preset score condition based on the sorting result, and generating enterprise search results corresponding to the enterprise query word according to the target enterprise. 
In this way, when the keywords include enterprise names and address words, the corresponding scores are calculated and weights are assigned to the non-matching fields, so that precise ranking is achieved, and more complete business processing logic is built on the word segmentation results and weights obtained from query word analysis.
Referring to fig. 4, the embodiment of the application also discloses an enterprise search device, which comprises:
the keyword determining module 11 is configured to obtain an enterprise query word input by a user, pre-process the enterprise query word, generate a keyword corresponding to the pre-processed enterprise query word by using a preset word segmentation system, and determine a keyword weight of the keyword;
The search sentence generation module 12 is configured to determine a corresponding matching field in a preset search engine according to the keyword, determine a matching weight of the matching field, and generate a corresponding search sentence based on the matching field and the matching weight;
The search result generation module 13 is configured to query according to a query grammar corresponding to the search sentence by using the preset search engine, determine recall scores of enterprises corresponding to the grammar query result, sort the enterprises by using a preset precise sorting rule based on the recall scores, determine target enterprises meeting a preset score condition based on the sorting result, and generate enterprise search results corresponding to the enterprise query word according to the target enterprises.
In this embodiment, an enterprise query word input by a user may be obtained, the enterprise query word is preprocessed, a keyword corresponding to the preprocessed enterprise query word is generated by using a preset word segmentation system, and a keyword weight of the keyword is determined; determining a corresponding matching field in a preset search engine according to the keywords, determining the matching weight of the matching field, and generating a corresponding search statement based on the matching field and the matching weight; and inquiring by using a preset search engine according to the inquiry grammar corresponding to the search sentence, determining recall scores of enterprises corresponding to the grammar inquiry result, sorting the enterprises by using a preset precise sorting rule based on the recall scores, determining target enterprises meeting the preset score condition based on the sorting result, and generating enterprise search results corresponding to the enterprise inquiry word according to the target enterprises. In this way, the query can be analyzed according to the query keywords, related results are recalled from the database, the results are accurately ordered, the recall and the ordering of enterprises are optimized through the analysis results of the query keywords, the weight setting is carried out on the dimension data of the enterprises according to the service requirements, the accurate ordering is realized, and the enterprise search effect with better expansibility is obtained.
In some specific embodiments, the search sentence generation module 12 specifically includes:
And the matching field determining unit is used for determining enterprise search business requirements corresponding to the enterprise query words according to the keywords, determining corresponding matching modes according to the enterprise search business requirements, and determining matching fields corresponding to the enterprise search business requirements in the preset search engine based on the matching modes.
In some specific embodiments, the keyword determining module 11 further includes:
The level weight determining unit is used for determining the keyword attribute of the keyword, determining the keyword level corresponding to the keyword according to the keyword attribute and the keyword weight, and determining the level weight corresponding to the keyword level.
In some specific embodiments, the search result generation module 13 specifically includes:
The matching degree score determining unit is used for determining matching degree scores of the search sentences and enterprise information in the preset search engine according to preset matching degree calculation rules;
And the first recall score determining unit is used for determining the recall score of the enterprise corresponding to the grammar query result according to the matching degree score, the level weight and the matching weight.
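How the matching degree score, the level weight and the matching weight might be combined is sketched below. The source does not state the formula, so the multiplicative combination, the level names, the level weight values and the threshold in `keyword_level` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical level weights for keyword levels.
LEVEL_WEIGHTS = {"core": 1.0, "normal": 0.6, "weak": 0.2}


def keyword_level(attribute, weight):
    """Map a keyword attribute and keyword weight to a level (assumed rule)."""
    if attribute == "suffix":
        return "weak"
    return "core" if weight >= 0.5 else "normal"


@dataclass
class FieldMatch:
    match_score: float    # matching degree score for one keyword and field
    level: str            # keyword level, a key of LEVEL_WEIGHTS
    field_weight: float   # matching weight of the matched field


def recall_score(matches):
    """Sum score * level weight * field weight over all field matches."""
    return sum(m.match_score * LEVEL_WEIGHTS[m.level] * m.field_weight
               for m in matches)
```

Under this assumed rule, a strong match of a core keyword on a heavily weighted field dominates the recall score, while matches on generic suffix words contribute little.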
In some specific embodiments, the search sentence generation module 12 further includes:
And the matching weight adjustment unit is used for determining the data distribution condition of the matching field and the search log data, and distributing corresponding target weights for preset non-matching fields according to preset weight distribution rules so as to adjust the matching weights according to the data distribution condition, the search log data and the target weights, so that corresponding search sentences are generated based on the matching fields and the adjusted matching weights.
In some specific embodiments, the keyword determining module 11 specifically includes:
The first weight determining unit is used for determining a first weight corresponding to the enterprise name through a preset enterprise name matching algorithm if the keyword comprises the enterprise name;
the second weight determining unit is used for processing the enterprise name according to a preset universal suffix word dictionary and a preset loss coefficient and determining a second weight corresponding to the enterprise name;
and the keyword weight determining unit is used for determining the keyword weight of the enterprise name according to the first weight and the second weight.
In some specific embodiments, the search result generation module 13 further includes:
A field score determining unit, configured to determine whether the address word is on an address link of a pre-constructed address tree if the address word is included in the keyword, and if so, determine a field score corresponding to a field corresponding to the address word;
And the second recall score determining unit is used for determining the recall score of the enterprise corresponding to the grammar query result based on the field score.
Further, the embodiment of the present application further discloses an electronic device, and fig. 5 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 5 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement relevant steps in the enterprise search method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, NetWare, Unix, Linux, or the like. In addition to the computer program that performs the enterprise search method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the enterprise search method disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes the present application in detail so that its principles and embodiments may be better understood. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make variations in the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present application.

Claims (8)

1. An enterprise search method, comprising:
acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by using a preset word segmentation system, and determining keyword weights of the keywords;
determining a corresponding matching field in a preset search engine according to the keywords, determining the matching weight of the matching field, and generating a corresponding search statement based on the matching field and the matching weight;
inquiring according to the inquiry grammar corresponding to the search statement by using the preset search engine, determining recall scores of enterprises corresponding to grammar inquiry results, sorting the enterprises by using a preset precise sorting rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sorting results, and generating enterprise search results corresponding to the enterprise inquiry words according to the target enterprises;
And, the determining the keyword weight of the keyword includes:
if the keyword comprises an enterprise name, determining a first weight corresponding to the enterprise name through a preset enterprise name matching algorithm;
Processing the enterprise name according to a preset universal suffix word dictionary and a preset loss coefficient, and determining a second weight corresponding to the enterprise name;
determining keyword weights of the enterprise names according to the first weights and the second weights;
and after determining the matching weight of the matching field, further including:
Determining the data distribution condition of the matching field and the search log data, and distributing corresponding target weights for preset non-matching fields according to preset weight distribution rules so as to adjust the matching weights according to the data distribution condition, the search log data and the target weights, so that corresponding search sentences are generated based on the matching field and the adjusted matching weights.
2. The enterprise search method of claim 1, wherein the determining the corresponding matching field in the preset search engine according to the keyword comprises:
And determining enterprise search business requirements corresponding to the enterprise query words according to the keywords, determining corresponding matching modes according to the enterprise search business requirements, and determining matching fields corresponding to the enterprise search business requirements in the preset search engine based on the matching modes.
3. The enterprise search method of claim 1, wherein after determining the keyword weights for the keywords, further comprising:
determining the keyword attribute of the keyword, determining the keyword level corresponding to the keyword according to the keyword attribute and the keyword weight, and determining the level weight corresponding to the keyword level.
4. The enterprise search method of claim 3, wherein determining recall scores of enterprises corresponding to the grammar query results comprises:
determining a matching degree score of the search statement and enterprise information in the preset search engine according to a preset matching degree calculation rule;
and determining recall scores of enterprises corresponding to the grammar query results according to the matching degree scores, the level weights and the matching weights.
5. The enterprise search method of claim 4, further comprising:
If the keyword comprises an address word, judging whether the address word is on an address link of a pre-constructed address tree, and if so, determining a field score corresponding to a field corresponding to the address word;
correspondingly, determining the recall score of the enterprise corresponding to the grammar query result further comprises:
and determining recall scores of enterprises corresponding to the grammar query results based on the field scores.
6. An enterprise search apparatus, comprising:
The keyword determining module is used for acquiring enterprise query words input by a user, preprocessing the enterprise query words, generating keywords corresponding to the preprocessed enterprise query words by utilizing a preset word segmentation system, and determining keyword weights of the keywords;
The search statement generation module is used for determining a corresponding matching field in a preset search engine according to the keyword, determining the matching weight of the matching field and generating a corresponding search statement based on the matching field and the matching weight;
the search result generation module is used for inquiring according to the inquiry grammar corresponding to the search statement by using the preset search engine, determining recall scores of enterprises corresponding to the grammar inquiry result, sorting the enterprises by using a preset precise sorting rule based on the recall scores, determining target enterprises meeting preset score conditions based on the sorting result, and generating enterprise search results corresponding to the enterprise inquiry words according to the target enterprises;
and, the keyword determining module specifically includes:
The first weight determining unit is used for determining a first weight corresponding to the enterprise name through a preset enterprise name matching algorithm if the keyword comprises the enterprise name;
the second weight determining unit is used for processing the enterprise name according to a preset universal suffix word dictionary and a preset loss coefficient and determining a second weight corresponding to the enterprise name;
a keyword weight determining unit, configured to determine a keyword weight of the business name according to the first weight and the second weight;
the search statement generation module further includes:
And the matching weight adjustment unit is used for determining the data distribution condition of the matching field and the search log data, and distributing corresponding target weights for preset non-matching fields according to preset weight distribution rules so as to adjust the matching weights according to the data distribution condition, the search log data and the target weights, so that corresponding search sentences are generated based on the matching fields and the adjusted matching weights.
7. An electronic device comprising a processor and a memory; wherein the memory is for storing a computer program that is loaded and executed by the processor to implement the enterprise search method of any one of claims 1 to 5.
8. A computer readable storage medium storing a computer program which when executed by a processor implements the enterprise search method of any one of claims 1 to 5.
CN202311021653.3A 2023-08-15 2023-08-15 Enterprise searching method, device, equipment and storage medium Active CN116738065B (en)

Publications (2)

Publication Number Publication Date
CN116738065A CN116738065A (en) 2023-09-12
CN116738065B true CN116738065B (en) 2024-04-19

Family

ID=87919008

Country Status (1)

Country Link
CN (1) CN116738065B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant