CN106354871A - Similarity search method of enterprise names - Google Patents

Info

Publication number
CN106354871A
Authority
CN
China
Prior art keywords
retrieval
search key
phrase
enterprise name
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610829356.5A
Other languages
Chinese (zh)
Inventor
仲晓琦
刘丰
刘镇华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Great Wall Computer Software & Systems Inc
Original Assignee
Great Wall Computer Software & Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Great Wall Computer Software & Systems Inc
Priority to CN201610829356.5A
Publication of CN106354871A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Abstract

The invention relates to a similarity search method for enterprise names. The method comprises the following steps: the input search keyword, which is the enterprise name to be searched, is decomposed to obtain a processed search keyword; search phrases are determined from the processed search keyword; a similarity search is performed on the determined search phrases to obtain search results; and the enterprise names ranked in the top N of the search results are displayed for the user to review, where N is an integer greater than 1. The method greatly improves search efficiency, lets users review the similarity search results, and meets the requirements of similarity search business.

Description

Similarity retrieval method for enterprise names
Technical field
The present invention relates to the technical field of similarity retrieval, and in particular to a similarity retrieval method for enterprise names.
Background
Exact-match queries against a database are quite fast, but once the data volume exceeds the million-record level, fuzzy queries slow down sharply. This is especially true of queries on a "contains" relation, which typically take more than 10 seconds. For example, entering the keyword "computer" in order to hit "Great Wall Computer Software" is very slow. Judging names against approximation rules additionally requires fuzzy queries in nested loops, whose performance is entirely unacceptable.
This problem is typically addressed with full-text search technology, as used by Baidu, Sogou and others, which quickly retrieves results from massive data based on a small number of keywords. However, in the course of making the present invention, the inventors found that general full-text search technology still falls short of the requirements of approximate name retrieval. The main problems are as follows.
First, general full-text search technology is based on word-segmentation indexes: the input search string is first segmented into words, the resulting words are looked up in the segmented-word index, and the combined results are sorted by degree of match from high to low. Approximate name retrieval, however, does not operate entirely on whole words. For example, if the two strings "Alipay" and "Ou Fubao" are segmented into words before retrieval, they may well be judged dissimilar.
Second, general full-text search engines can also search character by character, but the performance advantage is then lost. For example, retrieval over the full set of character permutations of a 15-character name string leads to the conclusion that full-text search alone cannot fully satisfy the business requirements of approximate enterprise-name retrieval: retrieval time can exceed 30 seconds, and the ranking of the results is inaccurate, differs considerably from what people generally perceive as name similarity, and falls far short of business needs.
Summary of the invention
The technical problem to be solved by the present invention is to provide, in view of the deficiencies of the prior art, a similarity retrieval method for enterprise names.
The technical solution of the present invention is a similarity retrieval method for enterprise names, comprising:
decomposing the input search keyword to obtain a processed search keyword, wherein the search keyword is the enterprise name to be retrieved;
determining search phrases according to the processed search keyword;
performing similarity retrieval on the determined search phrases to obtain retrieval results; and
displaying the enterprise names ranked in the top n of the retrieval results for the user to review, where n is an integer greater than 1.
The beneficial effects of the invention are as follows. The input search keyword is decomposed to obtain a processed search keyword, search phrases are determined from the processed search keyword, similarity retrieval is performed on the determined search phrases to obtain retrieval results, and finally the enterprise names ranked in the top n of the retrieval results are displayed for the user to review. This greatly improves retrieval efficiency, makes it convenient for the user to review the results of the similarity retrieval, and meets the needs of similarity retrieval business.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, decomposing the input search keyword to obtain the processed search keyword comprises: judging whether the search keyword contains an administrative division component and an organizational form (enterprise type) component;
if not, decomposing the search keyword into a trade name (字号) and/or an industry descriptor to serve as the processed search keyword; otherwise, decomposing the search keyword into the administrative division, the organizational form, the trade name and the industry descriptor, and using the trade name and/or the industry descriptor as the processed search keyword.
Further, determining the search phrases according to the processed search keyword comprises:
arranging the processed search keyword in the order it had before decomposition; and
determining phrases that differ from the search keyword in m characters as the search phrases, where m is 0, 1 or 2; or,
determining phrases that are synonyms and/or homophones of the search keyword as the search phrases; or,
determining phrases composed of k characters that are identical to, and adjacent in, the search keyword as the search phrases; or,
determining phrases composed of q characters that are identical to, but dispersed in, the search keyword as the search phrases.
The beneficial effect of this further scheme is as follows: decomposing the input search keyword removes generic information such as the administrative division and the organizational form, and the search phrases are determined from the trade name and/or the industry descriptor alone. This greatly reduces the number of search expressions and thereby effectively improves retrieval efficiency.
Further, performing similarity retrieval on the determined search phrases to obtain retrieval results comprises: judging whether the number of search phrases exceeds a preset value;
if it does, using distributed multi-node retrieval and merging the results obtained by each node to obtain the retrieval results; otherwise, using single-node retrieval to obtain the retrieval results.
The beneficial effect of this further scheme is as follows: choosing between distributed multi-node retrieval and single-node retrieval according to the number of search phrases can greatly speed up retrieval and thereby effectively improve retrieval efficiency.
Further, after the retrieval results are obtained, the method further comprises: calculating a similarity value between the search keyword and each of the retrieval results; and sorting the retrieval results according to the similarity values.
Further, sorting the retrieval results according to the similarity values comprises: sorting the retrieval results by similarity value from high to low.
Further, before decomposing the input search keyword to obtain the processed search keyword, the method further comprises: building an enterprise name retrieval library.
Further, building the enterprise name retrieval library comprises: adding enterprise name data to a full-text search database and keeping it synchronized in real time; and building an index between the enterprise name data and the full-text search database.
Further, building the index between the enterprise name data and the full-text search database comprises: using the trade name and the industry descriptor contained in each enterprise name of the enterprise name data as index columns, and building the index between the enterprise name data and the full-text search database.
Further, displaying the enterprise names ranked in the top n of the retrieval results for the user to review comprises: displaying the enterprise names ranked in the top 10 of the retrieval results for the user to review.
The beneficial effect of this further scheme is as follows: displaying the enterprise names ranked in the top 10 of the retrieval results makes it convenient for the user to review the results of the similarity retrieval, compare them, and make a choice, which greatly reduces the workload of the staff concerned.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a similarity retrieval method for enterprise names provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a similarity retrieval method for enterprise names provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of a similarity retrieval method for enterprise names provided by another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of a similarity retrieval method 100 for enterprise names provided by an embodiment of the present invention. The method 100 shown in Fig. 1 includes:
110. Decompose the input search keyword to obtain a processed search keyword, wherein the search keyword is the enterprise name to be retrieved.
120. Determine search phrases according to the processed search keyword.
130. Perform similarity retrieval on the determined search phrases to obtain retrieval results.
140. Display the enterprise names ranked in the top n of the retrieval results for the user to review, where n is an integer greater than 1.
In the similarity retrieval method for enterprise names provided by the present invention, the input search keyword is decomposed to obtain a processed search keyword, search phrases are determined from the processed search keyword, similarity retrieval is performed on the determined search phrases to obtain retrieval results, and finally the enterprise names ranked in the top n of the retrieval results are displayed for the user to review. This greatly improves retrieval efficiency, makes it convenient for the user to review the results of the similarity retrieval, and meets the needs of similarity retrieval business.
Specifically, in this embodiment, step 110 may include: judging whether the input search keyword contains an administrative division component and an organizational form (enterprise type) component. If not, the search keyword is decomposed into a trade name (字号) and/or an industry descriptor, which serve as the processed search keyword. Otherwise, the search keyword is decomposed into the administrative division, the organizational form, the trade name and the industry descriptor, and the trade name and/or the industry descriptor serve as the processed search keyword.
For example, suppose the input search keyword is "Beijing Lian Xin Yong Chang Company Limited". This keyword contains the administrative division "Beijing" and the organizational form "Company Limited", which must be removed, so the trade name "Lian Xin Yong Chang" is used as the processed search keyword.
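The decomposition in step 110 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the lookup tables `ADMIN_DIVISIONS` and `ORG_FORMS` and the function `decompose` are hypothetical, and the example name is written in an assumed romanized form, "Beijing Lian Xin Yong Chang Company Limited".

```python
# Hypothetical lookup tables; a real system would load complete lists of
# administrative divisions and organizational forms.
ADMIN_DIVISIONS = ("Beijing", "Shanghai", "Shenzhen")
ORG_FORMS = ("Co., Ltd.", "Company Limited", "Limited Company")


def decompose(name: str) -> dict:
    """Split a name into division, core part and organizational form.

    The core part is what remains after stripping the generic components,
    i.e. the trade name and/or industry descriptor.
    """
    rest = name.strip()
    division = next((d for d in ADMIN_DIVISIONS if rest.startswith(d)), "")
    rest = rest[len(division):].strip()
    org_form = next((o for o in ORG_FORMS if rest.endswith(o)), "")
    if org_form:
        rest = rest[: -len(org_form)].strip()
    return {"division": division, "core": rest, "org_form": org_form}


parts = decompose("Beijing Lian Xin Yong Chang Company Limited")
print(parts["core"])  # Lian Xin Yong Chang
```

A name that contains neither generic component passes through unchanged, which corresponds to the "if not" branch of step 110.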
Step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases that differ from the search keyword in m characters as the search phrases, where m can be 0, 1 or 2.
For example, for the trade name "Lian Xin Yong Chang" from the example above, the search phrases may include:
1) phrases that differ from "Lian Xin Yong Chang" in 0 characters: * Lian Xin Yong Chang, Lian * Xin Yong Chang, Lian Xin * Yong Chang, Lian Xin Yong * Chang, Lian Xin Yong Chang *, and so on;
2) phrases that differ from "Lian Xin Yong Chang" in 1 character: * Xin Yong Chang, Lian * Yong Chang, Lian Xin * Chang, Lian Xin Yong *, and so on;
3) phrases that differ from "Lian Xin Yong Chang" in 2 characters: * * Yong Chang, Lian Xin * *, Lian * * Chang, Lian * Yong *, * Xin * Chang, * Xin Yong *, and so on.
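The three groups of phrases listed above can be generated mechanically. The sketch below assumes that `*` acts as a wildcard and represents the four-character trade name as a list of romanized syllables, one per character; the function names are illustrative only.

```python
from itertools import combinations


def insertion_patterns(chars):
    """Keep every character and allow one wildcard at any position (m = 0)."""
    return [" ".join(chars[:i] + ["*"] + chars[i:]) for i in range(len(chars) + 1)]


def differing_patterns(chars, m):
    """Replace exactly m characters with a wildcard, keeping the order."""
    patterns = []
    for positions in combinations(range(len(chars)), m):
        masked = ["*" if i in positions else c for i, c in enumerate(chars)]
        patterns.append(" ".join(masked))
    return patterns


name = ["Lian", "Xin", "Yong", "Chang"]
print(insertion_patterns(name))     # 5 patterns
print(differing_patterns(name, 1))  # 4 patterns
print(differing_patterns(name, 2))  # 6 patterns
```

For a name of length L there are C(L, m) patterns with m replaced characters, so the number of search phrases grows quickly with name length. This is one reason removing the generic components before generating phrases matters.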
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases that are synonyms and/or homophones of the search keyword as the search phrases.
For example, the search phrases may include phrases pronounced the same as the trade name "Lian Xin Yong Chang" but written with different characters, such as combinations that substitute the characters meaning "connect", "heart" or "new", "chant" or "swim", and "prosperous" or "long". It should be understood that each of the four characters of "Lian Xin Yong Chang" has its own homophones, and every result of fully combining the homophones of the four characters is a phrase homophonous with "Lian Xin Yong Chang". Only a few are listed here as examples to illustrate the technical solution of the embodiment of the present invention; they do not limit it in any way.
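The full combination of per-character homophones described above is a Cartesian product. The following sketch uses a small, hypothetical homophone table keyed by toneless pinyin; the characters in it are illustrative assumptions and are not taken from the patent text.

```python
from itertools import product

# Hypothetical homophone table keyed by toneless pinyin (illustrative only).
HOMOPHONES = {
    "lian": ["联", "连"],
    "xin": ["信", "心", "新"],
    "yong": ["永", "咏", "泳"],
    "chang": ["昶", "昌", "长"],
}


def homophone_phrases(syllables):
    """Combine every homophone choice for each syllable, in order."""
    choices = [HOMOPHONES[s] for s in syllables]
    return ["".join(combo) for combo in product(*choices)]


phrases = homophone_phrases(["lian", "xin", "yong", "chang"])
print(len(phrases))  # 54
```

With 2, 3, 3 and 3 candidates per syllable the product yields 54 phrases; a realistic homophone table would produce far more, so the number of generated phrases may need to be capped.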
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases composed of k characters that are identical to, and adjacent in, the search keyword as the search phrases. It should be understood that the value of k depends on the number of characters in the search keyword: its maximum equals that number of characters and its minimum is 2.
For example, for the trade name "Lian Xin Yong Chang", the search phrases may include:
1) phrases composed of 4 identical and adjacent characters of "Lian Xin Yong Chang": * Lian Xin Yong Chang, Lian Xin Yong Chang *, and so on;
2) phrases composed of 3 identical and adjacent characters of "Lian Xin Yong Chang": Lian * Xin Yong Chang, Lian Xin * Yong Chang, * Xin Yong Chang, Lian Xin Yong *, and so on;
3) phrases composed of 2 identical and adjacent characters of "Lian Xin Yong Chang": Lian Xin * Yong Chang, Lian * Xin * Yong Chang, * * Yong Chang, Lian Xin * *, * Xin Yong *, and so on.
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases composed of q characters that are identical to, but dispersed in, the search keyword as the search phrases. It should be understood that the value of q depends on the number of characters in the search keyword: its maximum equals that number of characters and its minimum is 2.
For example, for the trade name "Lian Xin Yong Chang", the search phrases may include:
1) phrases composed of 4 identical but dispersed characters of "Lian Xin Yong Chang": Lian * Xin * Yong * Chang, * Lian * Xin * Yong * Chang, Lian * Xin * Yong * Chang *, and so on;
2) phrases composed of 3 identical but dispersed characters of "Lian Xin Yong Chang": Lian * Xin * Yong *, * Xin * Yong * Chang, * Lian * Xin * Yong *, * Xin * Yong * Chang *, and so on;
3) phrases composed of 2 identical but dispersed characters of "Lian Xin Yong Chang": Lian * Xin *, * Xin * Yong *, Lian * Yong *, Lian * * Chang, * Xin * Chang, and so on.
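The "identical but dispersed" case amounts to choosing q characters in their original order and allowing arbitrary text between them. The sketch below is an assumption-laden illustration: it normalizes every pattern to have wildcards on both sides, and `is_subsequence` is a hypothetical helper that shows how such a pattern would match a candidate name.

```python
from itertools import combinations


def dispersed_patterns(chars, q):
    """Keep q characters in their original order, with wildcards around them."""
    return [
        "* " + " * ".join(chars[i] for i in kept) + " *"
        for kept in combinations(range(len(chars)), q)
    ]


def is_subsequence(kept_chars, candidate):
    """True if kept_chars appear in candidate in the same order, gaps allowed."""
    remaining = iter(candidate)
    return all(c in remaining for c in kept_chars)


name = ["Lian", "Xin", "Yong", "Chang"]
print(dispersed_patterns(name, 2))
print(is_subsequence(["Lian", "Yong"], ["Lian", "Da", "Yong", "Sheng"]))  # True
print(is_subsequence(["Yong", "Lian"], ["Lian", "Da", "Yong", "Sheng"]))  # False
```

Because order is preserved, a candidate that contains the same characters in a different order does not match, which is consistent with arranging the keyword in its pre-decomposition order.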
In this embodiment, decomposing the input search keyword removes generic information such as the administrative division and the organizational form, and the search phrases are determined from the trade name and/or the industry descriptor alone. This greatly reduces the number of search expressions and thereby effectively improves retrieval efficiency.
Step 130 may include: judging whether the number of search phrases exceeds a preset value. If it does, distributed multi-node retrieval is used, and the results obtained by each node are merged to give the retrieval results. Otherwise, single-node retrieval is used to obtain the retrieval results.
For example, when the number of search phrases is greater than 10, distributed retrieval can be used: multiple nodes perform the retrieval, and the results of each node are then merged into the final retrieval results. When the number of search phrases is less than 10, single-node retrieval is used: a single search engine performs the retrieval and yields the retrieval results.
In this embodiment, choosing between distributed multi-node retrieval and single-node retrieval according to the number of search phrases can greatly speed up retrieval and thereby effectively improve retrieval efficiency.
It should be understood that the preset value of 10 is used in this embodiment only as an example to illustrate the technical solution; it does not limit the technical solution of the embodiment of the present invention in any way. The preset value can also be 8, 12, 15, 16, 18, 20 and so on, and the embodiment of the present invention places no restriction on it.
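The threshold-based choice in step 130 can be sketched as below. Several things here are assumptions rather than details from the text: worker threads stand in for search nodes, the phrases (not the data) are partitioned across workers, a substring test stands in for the search engine, and a phrase count exactly equal to the preset value is handled by the single-node path.

```python
from concurrent.futures import ThreadPoolExecutor

PRESET_LIMIT = 10  # the example threshold from the text; adjustable


def search(names, phrases):
    """Stand-in for one search engine: names containing any of the phrases."""
    return [n for n in names if any(p in n for p in phrases)]


def retrieve(names, phrases, workers=4):
    if len(phrases) <= PRESET_LIMIT:
        return search(names, phrases)  # single-node retrieval
    # Distributed case: give each worker an interleaved slice of the phrases.
    chunks = [phrases[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: search(names, chunk), chunks))
    merged = []
    for part in partials:  # merge per-node results and drop duplicates
        for name in part:
            if name not in merged:
                merged.append(name)
    return merged


names = ["Lian Xin Trading", "Yong Chang Foods", "Great Wall Software"]
print(retrieve(names, ["Lian Xin"]))
print(retrieve(names, [f"phrase-{i}" for i in range(11)] + ["Yong Chang"]))
```

In a real deployment the workers would be separate search nodes reached over the network, and the merge step would also need to reconcile scores before ranking.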
Step 140 may specifically be: displaying the enterprise names ranked in the top 10 of the retrieval results for the user to review.
It should be understood that, in the above embodiment, the retrieval results finally displayed can be the enterprise names ranked in the top 10 of the retrieval results obtained in step 130. Displaying the top 10 is used only as an example to illustrate the technical solution of the embodiment of the present invention and does not limit it in any way. It is equally possible to display the top 8, top 12, top 15 or top 20 enterprise names, and so on; the embodiment of the present invention places no restriction on this.
In this embodiment, displaying the enterprise names ranked in the top 10 of the retrieval results makes it convenient for the user to review the results of the similarity retrieval, compare them, and make a choice, which greatly reduces the workload of the staff concerned.
Optionally, as one embodiment of the present invention, as shown in Fig. 2, before step 110 the similarity retrieval method 100 may further include:
150. Build an enterprise name retrieval library.
Specifically, in this embodiment, enterprise name data is added to a full-text search database and kept synchronized in real time, and an index between the enterprise name data and the full-text search database is built.
For example, the trade name and the industry descriptor contained in each enterprise name of the enterprise name data are used as index columns to build the index between the enterprise name data and the full-text search database. The index itself can be built with existing techniques, which are not described here for brevity.
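As a rough illustration of using the trade name and the industry descriptor as index columns, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the full-text search database. The table name, column names and sample rows are hypothetical, and a production system would use a dedicated full-text engine with real-time synchronization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enterprise ("
    "id INTEGER PRIMARY KEY, full_name TEXT, trade_name TEXT, industry TEXT)"
)
# Index the two columns that the retrieval step actually queries.
conn.execute("CREATE INDEX idx_trade_industry ON enterprise (trade_name, industry)")

# Hypothetical sample rows, already decomposed into their components.
rows = [
    ("Beijing Lian Xin Yong Chang Company Limited", "Lian Xin Yong Chang", ""),
    ("Beijing Keycom Intellectual Property Agency Co., Ltd.", "Keycom",
     "Intellectual Property Agency"),
]
conn.executemany(
    "INSERT INTO enterprise (full_name, trade_name, industry) VALUES (?, ?, ?)", rows
)

hits = conn.execute(
    "SELECT full_name FROM enterprise WHERE trade_name = ?", ("Lian Xin Yong Chang",)
).fetchall()
print(hits)
```

Indexing only the distinguishing components keeps the index small, because generic parts such as the administrative division and organizational form are shared by very many names.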
Optionally, as another embodiment of the present invention, as shown in Fig. 2, after step 130 the similarity retrieval method 100 may further include:
160. Calculate a similarity value between the search keyword and each of the retrieval results.
170. Sort the retrieval results according to the similarity values.
Specifically, in this embodiment, the retrieval results can be sorted by similarity value from high to low. The similarity can be calculated with existing similarity calculation methods. For example, the similarity between each retrieval result and the input search keyword (that is, the enterprise name) can be calculated from the weights assigned to the following factors:
1) the number of characters that differ from the input enterprise name;
2) the degree of association between the industry of the matched enterprise and the industry of the input enterprise name;
3) the proportion of homophones in the total number of characters.
For brevity, the specific calculation process is not described here.
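Since the text leaves the calculation open, the following is only one possible weighted scheme built on the three listed factors. The weights, the position-by-position character comparison, and the convention that the industry association and homophone proportion are supplied as numbers between 0 and 1 are all assumptions made for illustration.

```python
def similarity(query, candidate, industry_relatedness, homophone_ratio,
               weights=(0.5, 0.3, 0.2)):
    """Weighted similarity in [0, 1] from the three factors named in the text."""
    longest = max(len(query), len(candidate)) or 1
    # Factor 1: number of differing characters, compared position by position.
    differing = sum(a != b for a, b in zip(query, candidate))
    differing += abs(len(query) - len(candidate))
    char_score = 1 - differing / longest
    w_char, w_industry, w_homophone = weights
    return (w_char * char_score
            + w_industry * industry_relatedness   # factor 2
            + w_homophone * homophone_ratio)      # factor 3


query = ["Lian", "Xin", "Yong", "Chang"]
candidates = [
    (["Lian", "Xin", "Yong", "Sheng"], 0.5, 0.0),
    (["Lian", "Xin", "Yong", "Chang"], 1.0, 0.0),
    (["Da", "Xin", "Hua", "Chang"], 0.2, 0.0),
]
# Sort from the highest similarity value to the lowest, as in step 170.
ranked = sorted(candidates, key=lambda c: similarity(query, *c), reverse=True)
for chars, _, _ in ranked:
    print(" ".join(chars))
```

The top n entries of `ranked` would then be the names shown to the user; a position-insensitive measure such as edit distance could replace the first factor if insertions and deletions need to be handled more gracefully.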
The technical solution of the similarity retrieval method for enterprise names provided by the present invention is described in detail below with reference to Fig. 3. The similarity retrieval method 200 shown in Fig. 3 includes:
211. Add enterprise name data to a full-text search database and keep it synchronized in real time.
Specifically, in this embodiment, the data is synchronized with an existing data synchronization method, which is not described here for brevity.
212. Build an index between the enterprise name data and the full-text search database.
Specifically, in this embodiment, the trade name and the industry descriptor contained in each enterprise name of the enterprise name data can be used as index columns to build the index between the enterprise name data and the full-text search database.
220. Judge whether the search keyword contains an administrative division component and an organizational form (enterprise type) component. If so, perform step 231; otherwise, perform step 232.
231. Decompose the search keyword into the administrative division, the organizational form, the trade name and the industry descriptor.
232. Use the trade name and/or the industry descriptor obtained by the decomposition as the processed search keyword.
241. Arrange the processed search keyword in the order it had before decomposition.
For example, suppose the input search keyword is "Beijing Keycom Intellectual Property Agency Co., Ltd.". This keyword contains the administrative division "Beijing" and the organizational form "Co., Ltd.", so the trade name "Keycom" and the industry descriptor "Intellectual Property Agency" are used as the processed search keyword and arranged in the order "Keycom Intellectual Property Agency" rather than combined arbitrarily.
242. Determine phrases that differ from the search keyword in 0, 1 or 2 characters as the search phrases. That is, phrases that differ from "Keycom Intellectual Property Agency" in 0 characters, in 1 character, or in 2 characters can be determined as the search phrases; the specific forms are not enumerated here.
Or, 243. Determine phrases that are synonyms and/or homophones of the search keyword as the search phrases. That is, phrases that are synonyms and/or homophones of "Keycom Intellectual Property Agency" can be determined as the search phrases; the specific forms are not enumerated here.
Or, 244. Determine phrases composed of 2 characters that are identical to, and adjacent in, the search keyword as the search phrases. That is, phrases composed of 2 identical and adjacent characters of "Keycom Intellectual Property Agency" can be determined as the search phrases; the specific forms are not enumerated here.
Or, 245. Determine phrases composed of 2 characters that are identical to, but dispersed in, the search keyword as the search phrases. That is, phrases composed of 2 identical but dispersed characters of "Keycom Intellectual Property Agency" can be determined as the search phrases; the specific forms are not enumerated here.
251. Judge whether the number of search phrases obtained in step 242, step 243, step 244 or step 245 exceeds the preset value. If it does, perform step 253; otherwise, perform step 252.
252. Use single-node retrieval to obtain the retrieval results.
253. Use distributed multi-node retrieval and merge the results obtained by each node to obtain the retrieval results.
261. Calculate a similarity value between the search keyword and each of the retrieval results obtained in step 252 or step 253.
262. Sort the retrieval results by similarity value from high to low.
270. Display the enterprise names ranked in the top 10 of the retrieval results for the user to review.
It should be understood that the numerical values appearing in this embodiment, such as "0, 1 or 2 characters", "2 identical and adjacent", "2 identical but dispersed" and "top 10", serve only to illustrate the technical solution of the embodiment of the present invention and do not limit it in any way.
It should also be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
The similarity retrieval method for enterprise names provided by the present invention greatly improves retrieval efficiency: under normal circumstances, the similarity retrieval of an enterprise name can be completed within 10 seconds. The number of search phrases generated depends on the number of characters in the search keyword; the more characters the search keyword has, the more search phrases are obtained. The retrieval duration varies with the number of characters in the search keyword as follows:
The similarity retrieval method for enterprise names provided by the present invention can therefore be applied to the approximate-name retrieval step of industrial and commercial name registration systems. It can greatly improve the efficiency of that step, provides a strong technical means for fully realizing fast enterprise name verification in industrial and commercial name registration, and supports the reform of the commercial system.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "a and/or b" can mean: a alone, both a and b, or b alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any person skilled in the art can readily conceive of equivalent modifications or replacements within the technical scope disclosed by the present invention, and all such modifications or replacements shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A similarity retrieval method for enterprise names, characterized by comprising:
decomposing an input search keyword to obtain a processed search keyword, wherein the search keyword is the enterprise name to be retrieved;
determining retrieval phrases according to the processed search keyword;
performing similarity retrieval on the determined retrieval phrases to obtain a retrieval result;
displaying the enterprise names ranked in the top n positions of the retrieval result for the user to view, where n is an integer greater than 1.
2. The similarity retrieval method for enterprise names according to claim 1, characterized in that decomposing the input search keyword to obtain the processed search keyword comprises:
determining whether the search keyword contains the administrative division and organizational form components;
if not, decomposing the search keyword into a trade name and/or an industry characteristic, which serve as the processed search keyword;
otherwise, decomposing the search keyword into an administrative division, an organizational form, a trade name and an industry characteristic, and taking the trade name and/or the industry characteristic as the processed search keyword.
3. The similarity retrieval method for enterprise names according to claim 1, characterized in that determining the retrieval phrases according to the processed search keyword comprises:
arranging the processed search keyword in the order it had before decomposition;
determining, as the retrieval phrases, phrases that differ from the search keyword by m characters, where m is 0, 1 or 2; or,
determining, as the retrieval phrases, phrases that are synonyms and/or homophones of the search keyword; or,
determining, as the retrieval phrases, phrases composed of k characters that are identical to, and adjacent in, the search keyword; or,
determining, as the retrieval phrases, phrases composed of q characters that are identical to characters in the search keyword but scattered.
4. The similarity retrieval method for enterprise names according to claim 1, characterized in that performing similarity retrieval on the determined retrieval phrases to obtain the retrieval result comprises:
determining whether the number of retrieval phrases exceeds a preset value;
if so, performing distributed multi-node retrieval and aggregating the retrieval results obtained by each node to obtain the retrieval result;
otherwise, performing single-node retrieval to obtain the retrieval result.
5. The similarity retrieval method for enterprise names according to any one of claims 1-4, characterized in that, after the retrieval result is obtained, the method further comprises:
calculating a similarity value between the search keyword and each retrieval result in the retrieval result;
sorting the retrieval result according to the similarity values.
6. The similarity retrieval method for enterprise names according to claim 5, characterized in that sorting the retrieval result according to the similarity values comprises:
sorting the retrieval result by similarity value from high to low.
7. The similarity retrieval method for enterprise names according to claim 5, characterized in that, before decomposing the input search keyword to obtain the processed search keyword, the method further comprises: building an enterprise name search library.
8. The similarity retrieval method for enterprise names according to claim 7, characterized in that building the enterprise name search library comprises:
adding enterprise name data to a full-text database and keeping it synchronized in real time;
building an index between the enterprise name data and the full-text database.
9. The similarity retrieval method for enterprise names according to claim 8, characterized in that building the index between the enterprise name data and the full-text database comprises:
using the trade name and the industry characteristic contained in each enterprise name in the enterprise name data as index columns to build the index between the enterprise name data and the full-text database.
10. The similarity retrieval method for enterprise names according to claim 1, characterized in that displaying the enterprise names ranked in the top n positions of the retrieval result for the user to view comprises:
displaying the enterprise names ranked in the top 10 of the retrieval result for the user to view.
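The decomposition, similarity scoring and top-n display steps recited in the claims can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the lookup tables are tiny hypothetical samples, the enterprise names are invented, and `difflib.SequenceMatcher` from the Python standard library is swapped in as a stand-in similarity measure because the claims do not specify one. Here "trade name" means the enterprise's 字号, the distinctive part of its name.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny lookup tables (longer entries listed first).
ADMIN_DIVISIONS = ("北京市", "北京", "上海市", "上海")
ORG_FORMS = ("股份有限公司", "有限责任公司", "有限公司")
INDUSTRY_TERMS = ("计算机软件", "软件", "科技", "贸易")


def decompose(name: str) -> dict:
    """Split a name into division, trade name, industry term and organizational form."""
    division = next((d for d in ADMIN_DIVISIONS if name.startswith(d)), "")
    rest = name[len(division):]
    form = next((f for f in ORG_FORMS if rest.endswith(f)), "")
    if form:
        rest = rest[:-len(form)]
    industry = next((t for t in INDUSTRY_TERMS if rest.endswith(t)), "")
    trade = rest[:-len(industry)] if industry else rest
    return {"division": division, "trade_name": trade,
            "industry": industry, "org_form": form}


def search_key(name: str) -> str:
    """Processed keyword: trade name plus industry characteristic, in original order."""
    parts = decompose(name)
    return parts["trade_name"] + parts["industry"]


def top_n(query: str, candidates: list[str], n: int = 10) -> list[tuple[float, str]]:
    """Score each candidate against the query, sort from high to low, keep n."""
    key = search_key(query)
    scored = [(SequenceMatcher(None, key, search_key(c)).ratio(), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    names = ["上海长城科技有限公司", "北京长城软件股份有限公司", "北京东方贸易有限公司"]
    for score, name in top_n("北京长城软件有限公司", names, n=10):
        print(f"{score:.2f}  {name}")
```

Comparing only the trade name and industry characteristic keeps the administrative division and organizational form from inflating the score of otherwise unrelated names. A production system would replace the in-memory list with the indexed full-text search library described in claims 7 to 9.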
CN201610829356.5A 2016-09-18 2016-09-18 Similarity search method of enterprise names Pending CN106354871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610829356.5A CN106354871A (en) 2016-09-18 2016-09-18 Similarity search method of enterprise names

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610829356.5A CN106354871A (en) 2016-09-18 2016-09-18 Similarity search method of enterprise names

Publications (1)

Publication Number Publication Date
CN106354871A true CN106354871A (en) 2017-01-25

Family

ID=57859975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610829356.5A Pending CN106354871A (en) 2016-09-18 2016-09-18 Similarity search method of enterprise names

Country Status (1)

Country Link
CN (1) CN106354871A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951415A (en) * 2017-04-01 2017-07-14 银联智策顾问(上海)有限公司 A kind of name of firm searching method and device
CN107153991A (en) * 2017-04-28 2017-09-12 国网冀北电力有限公司物资分公司 The inconsistent integrated conduct method of title in a kind of financial system
CN107357916A (en) * 2017-07-19 2017-11-17 北京金堤科技有限公司 Data processing method and system
CN110928915A (en) * 2018-08-31 2020-03-27 北京京东金融科技控股有限公司 Method, device and equipment for fuzzy matching of Chinese names and readable storage medium
CN111783467A (en) * 2020-07-21 2020-10-16 致诚阿福技术发展(北京)有限公司 Enterprise name identification method and device
CN111881183A (en) * 2020-07-28 2020-11-03 北京金堤科技有限公司 Enterprise name matching method and device, storage medium and electronic equipment
CN112364635A (en) * 2020-11-30 2021-02-12 中国银行股份有限公司 Enterprise name duplication checking method and device
CN112650951A (en) * 2020-12-21 2021-04-13 撼地数智(重庆)科技有限公司 Enterprise similarity matching method, system and computing device
CN115329039A (en) * 2022-08-08 2022-11-11 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080104276A1 (en) * 2006-10-25 2008-05-01 Arcsight, Inc. Real-Time Identification of an Asset Model and Categorization of an Asset to Assist in Computer Network Security
CN102279843A (en) * 2010-06-13 2011-12-14 北京四维图新科技股份有限公司 Method and device for processing phrase data
CN103885937A (en) * 2014-04-14 2014-06-25 焦点科技股份有限公司 Method for judging repetition of enterprise Chinese names on basis of core word similarity
CN104252507A (en) * 2013-06-28 2014-12-31 北京华傲达数据技术有限公司 Enterprise data matching method and device
CN104424613A (en) * 2013-09-04 2015-03-18 航天信息股份有限公司 Value added tax invoice monitoring method and system thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨艳等: "《网络营销理论与实务》", 30 June 2015, 知识产权出版社 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951415A (en) * 2017-04-01 2017-07-14 银联智策顾问(上海)有限公司 A kind of name of firm searching method and device
CN107153991A (en) * 2017-04-28 2017-09-12 国网冀北电力有限公司物资分公司 The inconsistent integrated conduct method of title in a kind of financial system
CN107357916A (en) * 2017-07-19 2017-11-17 北京金堤科技有限公司 Data processing method and system
CN110928915A (en) * 2018-08-31 2020-03-27 北京京东金融科技控股有限公司 Method, device and equipment for fuzzy matching of Chinese names and readable storage medium
CN111783467A (en) * 2020-07-21 2020-10-16 致诚阿福技术发展(北京)有限公司 Enterprise name identification method and device
CN111881183A (en) * 2020-07-28 2020-11-03 北京金堤科技有限公司 Enterprise name matching method and device, storage medium and electronic equipment
CN112364635A (en) * 2020-11-30 2021-02-12 中国银行股份有限公司 Enterprise name duplication checking method and device
CN112364635B (en) * 2020-11-30 2023-11-21 中国银行股份有限公司 Enterprise name duplicate checking method and device
CN112650951A (en) * 2020-12-21 2021-04-13 撼地数智(重庆)科技有限公司 Enterprise similarity matching method, system and computing device
CN115329039A (en) * 2022-08-08 2022-11-11 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method, system, electronic equipment and storage medium
CN115329039B (en) * 2022-08-08 2023-08-04 前锦网络信息技术(上海)有限公司 Recruitment enterprise searching method and system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN106354871A (en) Similarity search method of enterprise names
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
CN109800284B (en) Task-oriented unstructured information intelligent question-answering system construction method
CN104615767B (en) Training method, search processing method and the device of searching order model
CN105824959B (en) Public opinion monitoring method and system
CN103207860B (en) The entity relation extraction method and apparatus of public sentiment event
CN110334178B (en) Data retrieval method, device, equipment and readable storage medium
CN103678576B (en) The text retrieval system analyzed based on dynamic semantics
CN107463658B (en) Text classification method and device
CN103150333B (en) Opinion leader identification method in microblog media
CN103593474B (en) Image retrieval sort method based on deep learning
CN106446070B (en) A kind of information processing unit and method based on patent group
US20090094223A1 (en) System and method for classifying search queries
CN110609902A (en) Text processing method and device based on fusion knowledge graph
CN106682172A (en) Keyword-based document research hotspot recommending method
CN104978314B (en) Media content recommendations method and device
CN106204156A (en) A kind of advertisement placement method for network forum and device
CN102054029A (en) Figure information disambiguation treatment method based on social network and name context
US20070288442A1 (en) System and a program for searching documents
US20140317001A1 (en) Methods for evaluating term support in patent-related documents
CN107315738A (en) A kind of innovation degree appraisal procedure of text message
CN111221968B (en) Author disambiguation method and device based on subject tree clustering
CN107247743A (en) A kind of judicial class case search method and system
CN110569273A (en) Patent retrieval system and method based on relevance sorting
CN112784049B (en) Text data-oriented online social platform multi-element knowledge acquisition method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170125