CN106354871A - Similarity search method of enterprise names - Google Patents
Similarity search method of enterprise names
- Publication number
- CN106354871A (application CN201610829356.5A)
- Authority
- CN
- China
- Prior art keywords
- retrieval
- search key
- phrase
- enterprise name
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
Abstract
The invention relates to a similarity search method for enterprise names. The method comprises the following steps: the input search keyword, which is the enterprise name to be searched, is decomposed to obtain a processed search keyword; search phrases are determined from the processed search keyword; a similarity search is performed on the determined search phrases to obtain search results; and the enterprise names ranked in the top N of the search results are displayed for the user to review, where N is an integer greater than 1. The method greatly improves search efficiency, lets users review the similarity search results, and meets the requirements of similarity search services.
Description
Technical field
The present invention relates to the technical field of similarity retrieval, and in particular to a similarity retrieval method for enterprise names.
Background technology
Exact-match queries against a database are quite fast, but once the data volume exceeds the million-record level, fuzzy queries slow down sharply, especially queries on a "contains" relation, which generally take more than 10 seconds. For example, entering the keyword "computer" in order to hit "Great Wall computer software" is very slow. Moreover, judging name similarity according to the approximation rules requires nested loops of fuzzy queries, whose performance is entirely unacceptable.
This problem is usually addressed with full-text search technology of the kind used by Baidu, Sogou and similar engines, which quickly retrieves results from massive data using a small number of keywords. However, in implementing the present invention, the inventor found that general full-text search technology still falls short of the requirements of approximate name retrieval. The main problems are as follows:
First, general full-text search technology is based on a word-segmentation index: the input search string is first segmented into words, the resulting words are then looked up in the segmented-word index, and the combined results are sorted by hit score from high to low. Approximate name retrieval, however, does not operate purely on whole words. For the two strings "Alipay" and "Ou Fubao", for example, a search based on word segmentation would probably judge them dissimilar.
Second, general full-text search engines also support character-by-character search, but the performance advantage is then lost. For example, retrieval over the full permutation of characters of a 15-character name string leads to the conclusion that full-text search alone cannot fully meet the business requirements of approximate enterprise-name retrieval: the retrieval time can exceed 30 seconds, and the ranking of the results is inaccurate, differing considerably from people's intuitive sense of name similarity and falling far short of business needs.
Summary of the invention
The technical problem to be solved by the present invention is to provide a similarity retrieval method for enterprise names that addresses the deficiencies of the prior art.
The technical solution of the present invention is a similarity retrieval method for enterprise names, comprising:
decomposing the input search keyword to obtain a processed search keyword, wherein the search keyword is the enterprise name to be retrieved;
determining retrieval phrases according to the processed search keyword;
performing similarity retrieval on the determined retrieval phrases to obtain retrieval results;
displaying the enterprise names ranked in the top n of the retrieval results for the user to review, where n is an integer greater than 1.
The beneficial effects of the present invention are as follows: the input search keyword is decomposed to obtain a processed search keyword, retrieval phrases are determined from the processed search keyword, similarity retrieval is performed on the determined retrieval phrases to obtain retrieval results, and finally the enterprise names ranked in the top n of the retrieval results are displayed for the user to review. This greatly improves retrieval efficiency, makes it easy for the user to review the similarity retrieval results, and meets the requirements of similarity retrieval services.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, decomposing the input search keyword to obtain the processed search keyword comprises: judging whether the search keyword contains an administrative division component and an organizational form (enterprise type) component;
if not, decomposing the search keyword into a trade name and/or an industry feature, which serve as the processed search keyword; otherwise, decomposing the search keyword into the administrative division, the organizational form (enterprise type), the trade name and the industry feature, and taking the trade name and/or the industry feature as the processed search keyword.
Further, determining the retrieval phrases according to the processed search keyword comprises:
arranging the processed search keyword in the order it had before decomposition;
determining phrases that differ from the search keyword by m characters as the retrieval phrases, where m is 0, 1 or 2; or,
determining phrases that are synonyms and/or homophones of the search keyword as the retrieval phrases; or,
determining phrases composed of k characters that are identical to and adjacent in the search keyword as the retrieval phrases; or,
determining phrases composed of q characters that are identical to but scattered in the search keyword as the retrieval phrases.
The beneficial effect of this further solution is that decomposing the input search keyword removes generic information such as the administrative division and the organizational form (enterprise type), and the retrieval phrases are determined from the trade name and/or the industry feature alone, which greatly reduces the number of retrieval expressions and thus effectively improves retrieval efficiency.
Further, performing similarity retrieval on the determined retrieval phrases to obtain the retrieval results comprises: judging whether the number of retrieval phrases exceeds a preset value;
if so, using distributed multi-node retrieval and aggregating the retrieval results obtained by each node to obtain the retrieval results; otherwise, using single-node retrieval to obtain the retrieval results.
The beneficial effect of this further solution is that choosing between distributed multi-node retrieval and single-node retrieval according to the number of retrieval phrases greatly speeds up retrieval and thus effectively improves retrieval efficiency.
Further, after the retrieval results are obtained, the method also includes: calculating a similarity value between the search keyword and each retrieval result in the retrieval results; and sorting the retrieval results according to the similarity values.
Further, sorting the retrieval results according to the similarity values comprises: sorting the retrieval results by similarity value from high to low.
Further, before decomposing the input search keyword to obtain the processed search keyword, the method also includes: building an enterprise name retrieval library.
Further, building the enterprise name retrieval library comprises: adding the enterprise name data to a full-text retrieval database and keeping it synchronized in real time; and building an index between the enterprise name data and the full-text retrieval database.
Further, building the index between the enterprise name data and the full-text retrieval database comprises: using the trade name and the industry feature contained in each enterprise name of the enterprise name data as index columns to build the index between the enterprise name data and the full-text retrieval database.
Further, displaying the enterprise names ranked in the top n of the retrieval results for the user to review comprises: displaying the enterprise names ranked in the top 10 of the retrieval results for the user to review.
The beneficial effect of this further solution is that displaying the top 10 enterprise names in the retrieval results makes it easy for the user to review the similarity retrieval results, compare them, make a judgment and choose, which greatly reduces the workload of the staff concerned.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a similarity retrieval method for enterprise names provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a similarity retrieval method for enterprise names provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of a similarity retrieval method for enterprise names provided by another embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows a schematic flowchart of a similarity retrieval method 100 for enterprise names provided by an embodiment of the present invention. The similarity retrieval method 100 shown in Fig. 1 comprises:
110. Decompose the input search keyword to obtain a processed search keyword, wherein the search keyword is the enterprise name to be retrieved.
120. Determine retrieval phrases according to the processed search keyword.
130. Perform similarity retrieval on the determined retrieval phrases to obtain retrieval results.
140. Display the enterprise names ranked in the top n of the retrieval results for the user to review, where n is an integer greater than 1.
In the similarity retrieval method for enterprise names provided by the present invention, the input search keyword is decomposed to obtain a processed search keyword, retrieval phrases are determined from the processed search keyword, similarity retrieval is performed on the determined retrieval phrases to obtain retrieval results, and finally the enterprise names ranked in the top n of the retrieval results are displayed for the user to review. This greatly improves retrieval efficiency, makes it easy for the user to review the similarity retrieval results, and meets the requirements of similarity retrieval services.
Specifically, in this embodiment, step 110 may include: judging whether the input search keyword contains an administrative division component and an organizational form (enterprise type) component. If not, the search keyword is decomposed into a trade name and/or an industry feature, which serve as the processed search keyword. Otherwise, the search keyword is decomposed into the administrative division, the organizational form (enterprise type), the trade name and the industry feature, and the trade name and/or the industry feature are taken as the processed search keyword.
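The decomposition in step 110 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described by the patent: the lookup tables `ADMIN_DIVISIONS` and `ORG_FORMS` and the function `decompose` are hypothetical names, the tables are tiny stand-ins for real lists, and the sketch does not further split the remaining core into trade name and industry feature.

```python
# Hypothetical lookup tables; a real system would use complete lists of
# administrative divisions and organizational forms.
ADMIN_DIVISIONS = ("Beijing", "Shanghai", "Shenzhen")
ORG_FORMS = ("Co., Ltd.", "Limited Liability Company", "Group")


def decompose(name):
    """Split an enterprise name into division, organizational form and core."""
    rest = name.strip()
    # Strip a leading administrative division, if one is present.
    division = next((d for d in ADMIN_DIVISIONS if rest.startswith(d)), None)
    if division:
        rest = rest[len(division):].strip()
    # Strip a trailing organizational form, if one is present.
    org_form = next((o for o in ORG_FORMS if rest.endswith(o)), None)
    if org_form:
        rest = rest[: -len(org_form)].strip()
    # What remains (trade name and/or industry feature) is the processed keyword.
    return {"division": division, "org_form": org_form, "core": rest}


print(decompose("Beijing Keycom Intellectual Property Agency Co., Ltd.")["core"])
```

A name that contains neither component passes through unchanged, which corresponds to the "if not" branch of the step.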
For example, if the input search keyword is "Beijing Lian Xin Yong Chang Co., Ltd.", it contains the administrative division "Beijing" and the organizational form (enterprise type) "Co., Ltd."; these are removed, and the trade name "Lian Xin Yong Chang" is used as the processed search keyword.
Step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases that differ from the search keyword by m characters as retrieval phrases, where m may be 0, 1 or 2.
For example, the retrieval phrases may include:
1) phrases that differ from "Lian Xin Yong Chang" by 0 characters: *LianXinYongChang, Lian*XinYongChang, LianXin*YongChang, LianXinYong*Chang, LianXinYongChang*, etc.;
2) phrases that differ from "Lian Xin Yong Chang" by 1 character: *XinYongChang, Lian*YongChang, LianXin*Chang, LianXinYong*, etc.;
3) phrases that differ from "Lian Xin Yong Chang" by 2 characters: **YongChang, LianXin**, Lian**Chang, Lian*Yong*, *Xin*Chang, *XinYong*, etc.
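The three lists above follow a regular pattern that can be generated mechanically. The sketch below is an illustration under assumptions, not the patent's implementation: the function name `differing_patterns` is hypothetical, the letters A, B, C, D stand in for the four characters of the trade name, and `*` is treated simply as a placeholder symbol, since the text does not say how many characters it matches.

```python
from itertools import combinations


def differing_patterns(chars, m, wildcard="*"):
    """Build patterns that differ from the keyword by m characters."""
    chars = list(chars)
    if m == 0:
        # All characters are kept; the wildcard is inserted at each gap.
        return ["".join(chars[:i] + [wildcard] + chars[i:])
                for i in range(len(chars) + 1)]
    patterns = []
    # Replace every choice of m positions with the wildcard.
    for positions in combinations(range(len(chars)), m):
        patterns.append("".join(wildcard if i in positions else c
                                for i, c in enumerate(chars)))
    return patterns


for m in (0, 1, 2):
    print(m, differing_patterns("ABCD", m))
```

For a four-character keyword this yields 5, 4 and 6 patterns for m = 0, 1 and 2 respectively, so the number of retrieval phrases grows quickly with keyword length.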
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases that are synonyms and/or homophones of the search keyword as retrieval phrases.
For example, the retrieval phrases may include phrases homophonous with "Lian Xin Yong Chang", that is, names pronounced the same but written with other characters, such as the characters for "connect", "heart", "chant" or "prosperous". It should be understood that each of the four characters of "Lian Xin Yong Chang" has its own homophones, and every result of fully combining the homophones of the four characters is a phrase homophonous with "Lian Xin Yong Chang". Only a few are mentioned here as examples to illustrate the technical solution of the embodiments of the present invention; they do not limit that technical solution in any way.
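The full combination of per-character homophones described above is a Cartesian product. The following sketch shows the idea under assumptions: the table `HOMOPHONES` and the function `homophone_phrases` are hypothetical, and letters with invented variants stand in for Chinese characters; a real system would need a pronunciation dictionary, which the text does not specify.

```python
from itertools import product

# Hypothetical homophone table: each symbol maps to itself plus its
# same-sounding variants. Letters stand in for Chinese characters.
HOMOPHONES = {
    "A": ["A", "a1", "a2"],
    "B": ["B", "b1"],
}


def homophone_phrases(keyword_chars, table=HOMOPHONES):
    """Return every phrase formed by swapping characters for homophones."""
    # Characters without an entry in the table only map to themselves.
    choices = [table.get(ch, [ch]) for ch in keyword_chars]
    return ["".join(combo) for combo in product(*choices)]


print(homophone_phrases(["A", "B", "C"]))
```

The number of phrases is the product of the homophone counts of the individual characters, which is one reason the number of retrieval phrases can become large.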
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases composed of k characters that are identical to and adjacent in the search keyword as retrieval phrases. It should be understood that the value of k depends on the number of characters in the search keyword: its maximum equals the number of characters in the search keyword, and its minimum is 2.
For example, the retrieval phrases may include:
1) phrases composed of 4 identical and adjacent characters of "Lian Xin Yong Chang": *LianXinYongChang, LianXinYongChang*, etc.;
2) phrases composed of 3 identical and adjacent characters of "Lian Xin Yong Chang": Lian*XinYongChang, LianXin*YongChang, *XinYongChang, LianXinYong*, etc.;
3) phrases composed of 2 identical and adjacent characters of "Lian Xin Yong Chang": LianXin*YongChang, Lian*Xin*YongChang, **YongChang, LianXinYong**, *XinYong*, etc.
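One possible reading of the "k identical and adjacent characters" rule is a sliding window over the keyword. The sketch below follows that reading only; it is an assumption rather than the patent's method, the function name `adjacent_patterns` is hypothetical, and it collapses everything outside the window into a single wildcard, so it does not reproduce every pattern in the lists above.

```python
def adjacent_patterns(chars, k, wildcard="*"):
    """Keep each run of k adjacent characters and wildcard the rest."""
    chars = list(chars)
    patterns = []
    for start in range(len(chars) - k + 1):
        run = "".join(chars[start:start + k])
        # Add a wildcard only on the sides where characters were dropped.
        prefix = wildcard if start > 0 else ""
        suffix = wildcard if start + k < len(chars) else ""
        patterns.append(prefix + run + suffix)
    return patterns


for k in (4, 3, 2):
    print(k, adjacent_patterns("ABCD", k))
```

Smaller values of k give more windows and therefore more retrieval phrases.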
Alternatively, in another embodiment, step 120 may include: arranging the processed search keyword in the order it had before decomposition, and determining phrases composed of q characters that are identical to but scattered in the search keyword as retrieval phrases. It should be understood that the value of q depends on the number of characters in the search keyword: its maximum equals the number of characters in the search keyword, and its minimum is 2.
For example, the retrieval phrases may include:
1) phrases composed of 4 identical but scattered characters of "Lian Xin Yong Chang": Lian*Xin*Yong*Chang, *Lian*Xin*Yong*Chang, Lian*Xin*Yong*Chang*, etc.;
2) phrases composed of 3 identical but scattered characters of "Lian Xin Yong Chang": Lian*Xin*Yong*, *Xin*Yong*Chang, *Lian*Xin*Yong*, *Xin*Yong*Chang*, etc.;
3) phrases composed of 2 identical but scattered characters of "Lian Xin Yong Chang": Lian*Xin*, *Xin*Yong*, Lian*Yong*, Lian**Chang, *Xin*Chang, etc.
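The "identical but scattered" patterns can be sketched as order-preserving subsets of the keyword's characters joined by wildcards. This is an illustrative assumption, not the patent's implementation: the function name `scattered_patterns` is hypothetical, and the optional leading and trailing wildcards that appear in some of the listed examples are omitted.

```python
from itertools import combinations


def scattered_patterns(chars, q, wildcard="*"):
    """Pick q characters in their original order and join them with wildcards."""
    chars = list(chars)
    # combinations() preserves the input order, so the characters stay in sequence.
    return [wildcard.join(subset) for subset in combinations(chars, q)]


for q in (4, 3, 2):
    print(q, scattered_patterns("ABCD", q))
```

For a keyword of length L this produces C(L, q) patterns for each q, before any leading or trailing wildcard variants are added.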
In this embodiment, decomposing the input search keyword removes generic information such as the administrative division and the organizational form (enterprise type), and the retrieval phrases are determined from the trade name and/or the industry feature, which greatly reduces the number of retrieval expressions and thus effectively improves retrieval efficiency.
Step 130 may include: judging whether the number of retrieval phrases exceeds a preset value. If so, distributed multi-node retrieval is used, and the retrieval results obtained by each node are aggregated to obtain the retrieval results. Otherwise, single-node retrieval is used to obtain the retrieval results.
For example, when the number of retrieval phrases is greater than 10, distributed retrieval may be used, that is, multiple nodes perform the retrieval and the results from each node are then aggregated to obtain the final retrieval results. When the number of retrieval phrases is less than 10, single-node retrieval is used, that is, a single search engine performs the retrieval and yields the retrieval results.
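The threshold-based dispatch in step 130 can be sketched as below. This is a simplified illustration under assumptions: `retrieve`, `search_node`, `node_count` and `preset` are hypothetical names, threads stand in for real search nodes, and the callable `search_node(node_id, phrases)` is a placeholder for whatever search engine a deployment actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve(phrases, search_node, node_count=4, preset=10):
    """Use one node for small phrase sets, several nodes for large ones."""
    if len(phrases) <= preset:
        # Single-node retrieval: one engine handles every phrase.
        return search_node(0, phrases)
    # Distributed retrieval: deal the phrases out to the nodes round-robin.
    chunks = [phrases[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(search_node, range(node_count), chunks))
    # Aggregate the per-node results, dropping duplicates but keeping order.
    merged, seen = [], set()
    for part in partials:
        for name in part:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


def demo_node(node_id, phrases):
    """Stand-in search node that just echoes its phrases in upper case."""
    return [p.upper() for p in phrases]


print(retrieve([f"p{i}" for i in range(12)], demo_node))
```

The preset value of 10 mirrors the example in the text; as the description notes, other values are equally possible.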
In this embodiment, choosing between distributed multi-node retrieval and single-node retrieval according to the number of retrieval phrases greatly speeds up retrieval and thus effectively improves retrieval efficiency.
It should be understood that, in this embodiment, a preset value of 10 is used only as an example to illustrate the technical solution of the embodiments of the present invention and does not limit that technical solution in any way. That is, the preset value may also be 8, 12, 15, 16, 18, 20, etc.; the embodiments of the present invention place no limitation on this.
Step 140 may specifically be: displaying the enterprise names ranked in the top 10 of the retrieval results for the user to review.
It should be understood that, in the above embodiment, the retrieval results finally displayed may be the enterprise names ranked in the top 10 of the retrieval results obtained in step 130. Displaying the top 10 is used only as an example to illustrate the technical solution of the embodiments of the present invention and does not limit that technical solution in any way. That is, the top 8, top 12, top 15 or top 20 enterprise names, etc., may also be displayed; the embodiments of the present invention place no limitation on this.
In this embodiment, displaying the top 10 enterprise names in the retrieval results makes it easy for the user to review the similarity retrieval results, compare them, make a judgment and choose, which greatly reduces the workload of the staff concerned.
Optionally, as an embodiment of the present invention, as shown in Fig. 2, before step 110 the similarity retrieval method 100 for enterprise names may also include:
150. Build an enterprise name retrieval library.
Specifically, in this embodiment, the enterprise name data is added to a full-text retrieval database and kept synchronized in real time, and an index between the enterprise name data and the full-text retrieval database is built.
For example, the trade name and the industry feature contained in each enterprise name of the enterprise name data are used as index columns to build the index between the enterprise name data and the full-text retrieval database. The index itself can be built using existing technology, which for brevity is not described here.
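The text leaves index construction to existing technology. Purely to illustrate what indexing on the trade name and industry feature columns could look like, the sketch below builds a toy character-level inverted index. All names (`build_index`, `lookup`, the `trade_name` and `industry_feature` keys) are hypothetical, and a production full-text engine would be considerably more elaborate.

```python
from collections import defaultdict


def build_index(records):
    """Map each character of the indexed columns to the ids of names containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        # Only the trade name and the industry feature serve as index columns.
        for column in ("trade_name", "industry_feature"):
            for ch in record[column]:
                if not ch.isspace():
                    index[ch].add(record_id)
    return index


def lookup(index, chars):
    """Return the ids of records that contain every given character."""
    sets = [index.get(ch, set()) for ch in chars]
    return set.intersection(*sets) if sets else set()


records = {
    1: {"trade_name": "ABCD", "industry_feature": "XY"},
    2: {"trade_name": "ABZZ", "industry_feature": "XY"},
}
index = build_index(records)
print(lookup(index, "AB"), lookup(index, "CD"))
```

Because the administrative division and organizational form are not indexed, lookups touch only the parts of a name that distinguish one enterprise from another.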
Optionally, as another embodiment of the present invention, as shown in Fig. 2, after step 130 the similarity retrieval method 100 for enterprise names may also include:
160. Calculate a similarity value between the search keyword and each retrieval result in the retrieval results.
170. Sort the retrieval results according to the similarity values.
Specifically, in this embodiment, the retrieval results may be sorted by similarity value from high to low. The similarity may be calculated with an existing similarity calculation method. For example, the similarity between each retrieval result and the input search keyword, that is, the enterprise name, may be calculated from the weights assigned to the following factors:
1) the number of characters that differ from the input enterprise name;
2) the degree of association between the industry of the matched enterprise and the industry of the input enterprise name;
3) the proportion of homophones in the total number of characters.
For brevity, the specific calculation process is not described here.
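Since the calculation itself is not described, the sketch below shows only one way the three weighted factors could be combined and the results ranked. Everything specific in it is an assumption: the function names, the linear weighting, the default weights (0.6, 0.2, 0.2), and the use of Python's `difflib.SequenceMatcher` to count differing characters; the industry relatedness and homophone ratio are taken as precomputed inputs in the range 0 to 1.

```python
from difflib import SequenceMatcher


def differing_chars(query, candidate):
    """Count characters not covered by matching blocks of the two names."""
    blocks = SequenceMatcher(None, query, candidate).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return max(len(query), len(candidate)) - matched


def similarity(query, candidate, industry_relatedness, homophone_ratio,
               weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three factors; the weights are illustrative only."""
    longest = max(len(query), len(candidate)) or 1
    char_score = 1 - differing_chars(query, candidate) / longest
    w_char, w_industry, w_homophone = weights
    return (w_char * char_score
            + w_industry * industry_relatedness
            + w_homophone * homophone_ratio)


def top_n(query, candidates, n=10):
    """Sort candidates by similarity, high to low, and keep the first n."""
    scored = [(similarity(query, name, industry, homophone), name)
              for name, industry, homophone in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:n]]


candidates = [("ABXD", 0.5, 0.0), ("ABCD", 1.0, 0.0), ("WXYZ", 0.0, 0.0)]
print(top_n("ABCD", candidates, n=2))
```

Sorting from high to low and truncating to n corresponds to steps 170 and 140 of the method.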
The technical solution of the similarity retrieval method for enterprise names provided by the present invention is described in detail below with reference to Fig. 3. The similarity retrieval method 200 for enterprise names shown in Fig. 3 comprises:
211. Add the enterprise name data to the full-text retrieval database and keep it synchronized in real time.
Specifically, in this embodiment, the data synchronization uses an existing data synchronization method, which for brevity is not described here.
212. Build the index between the enterprise name data and the full-text retrieval database.
Specifically, in this embodiment, the trade name and the industry feature contained in each enterprise name of the enterprise name data may be used as index columns to build the index between the enterprise name data and the full-text retrieval database.
220. Judge whether the search keyword contains an administrative division component and an organizational form (enterprise type) component. If so, execute step 231; otherwise, execute step 232.
231. Decompose the search keyword into the administrative division, the organizational form (enterprise type), the trade name and the industry feature.
232. Take the trade name and/or the industry feature obtained by the decomposition as the processed search keyword.
241. Arrange the processed search keyword in the order it had before decomposition.
For example, if the input search keyword is "Beijing Keycom Intellectual Property Agency Co., Ltd.", it contains the administrative division "Beijing" and the organizational form (enterprise type) "Co., Ltd."; the trade name "Keycom" and the industry feature "Intellectual Property Agency" are then used as the processed search keyword and arranged in the order "Keycom Intellectual Property Agency", rather than being combined arbitrarily.
242. Determine phrases that differ from the search keyword by 0, 1 or 2 characters as retrieval phrases. That is, phrases that differ from "Keycom Intellectual Property Agency" by 0 characters may be determined as retrieval phrases; or phrases that differ from it by 1 character may be determined as retrieval phrases; or phrases that differ from it by 2 characters may be determined as retrieval phrases. The specific forms are not enumerated here.
Or, 243. Determine phrases that are synonyms and/or homophones of the search keyword as retrieval phrases. That is, phrases that are synonyms and/or homophones of "Keycom Intellectual Property Agency" may be determined as retrieval phrases. The specific forms are not enumerated here.
Or, 244. Determine phrases composed of 2 characters that are identical to and adjacent in the search keyword as retrieval phrases. That is, phrases composed of 2 identical and adjacent characters of "Keycom Intellectual Property Agency" may be determined as retrieval phrases. The specific forms are not enumerated here.
Or, 245. Determine phrases composed of 2 characters that are identical to but scattered in the search keyword as retrieval phrases. That is, phrases composed of 2 identical but scattered characters of "Keycom Intellectual Property Agency" may be determined as retrieval phrases. The specific forms are not enumerated here.
251. Judge whether the number of retrieval phrases obtained in step 242, step 243, step 244 or step 245 exceeds the preset value. If so, execute step 253; otherwise, execute step 252.
252. Use single-node retrieval to obtain the retrieval results.
253. Use distributed multi-node retrieval, and aggregate the retrieval results obtained by each node to obtain the retrieval results.
261. Calculate a similarity value between the search keyword and each retrieval result in the retrieval results obtained in step 252 or step 253.
262. Sort the retrieval results by similarity value from high to low.
270. Display the enterprise names ranked in the top 10 of the retrieval results for the user to review.
It should be understood that the numerical values appearing in this embodiment, such as "0, 1 or 2 characters", "2 identical and adjacent", "2 identical but scattered" and "top 10", serve only to illustrate the technical solution of the embodiments of the present invention and do not limit the embodiments of the present invention in any way.
It should also be understood that, in the embodiments of the present invention, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
The similarity retrieval method for enterprise names provided by the present invention greatly improves retrieval efficiency and, under normal circumstances, ensures that a similarity retrieval of an enterprise name completes within 10 seconds. The number of retrieval phrases generated varies with the number of characters in the search keyword: the more characters the search keyword has, the more retrieval phrases are obtained. The distribution of retrieval durations by number of characters in the search keyword is as follows:
Therefore, the similarity retrieval method for enterprise names provided by the present invention can be applied to the approximate-retrieval stage of business name registration in a business name registration system. It can greatly improve the efficiency of that work, provides a strong technical means for fully realizing rapid enterprise name verification in business name registration, and supports business system reform.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "a and/or b" may mean: a alone, both a and b, or b alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.
Claims (10)
1. a kind of similarity retrieval method of enterprise name is it is characterised in that include:
Resolution process is carried out to the search key of input, the search key after being processed, wherein, described search key
For enterprise name to be retrieved;
According to the described search key after processing, determine retrieval phrase;
Similarity retrieval is carried out to the described retrieval phrase determining, obtains retrieval result;
Show the enterprise name coming front n position in described retrieval result, so that user checks, n takes more than 1 integer.
2. The similarity retrieval method for enterprise names according to claim 1, characterized in that performing segmentation processing on the input search keyword to obtain the processed search keyword comprises:
determining whether the search keyword contains the components of administrative division and organizational form;
if not, decomposing the search keyword into an enterprise trade name and/or an industry characteristic to serve as the processed search keyword;
otherwise, decomposing the search keyword into an administrative division, an organizational form, an enterprise trade name and an industry characteristic, and taking the enterprise trade name and/or the industry characteristic as the processed search keyword.
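The decomposition in claim 2 can be sketched roughly as follows. This is a minimal illustration only: the lookup lists `ADMIN_DIVISIONS`, `ORG_FORMS` and `INDUSTRY_TERMS`, the sample names, and the functions `decompose` and `processed_keyword` are assumptions made for the example, since the claim does not state how each component is recognized. The enterprise trade name (字号) is taken to be whatever remains once the other components are stripped.

```python
from typing import NamedTuple

# Hypothetical lookup lists; a real system would need full dictionaries.
# Longer entries come first so that they win over their own prefixes/suffixes.
ADMIN_DIVISIONS = ["北京市", "上海市", "北京", "上海"]
ORG_FORMS = ["有限责任公司", "股份有限公司", "有限公司"]
INDUSTRY_TERMS = ["科技", "贸易", "餐饮"]


class NameParts(NamedTuple):
    admin_division: str
    trade_name: str
    industry: str
    org_form: str


def decompose(name: str) -> NameParts:
    """Split a name into four components; absent components are ''."""
    rest = name
    admin = next((a for a in ADMIN_DIVISIONS if rest.startswith(a)), "")
    rest = rest[len(admin):]
    org = next((o for o in ORG_FORMS if rest.endswith(o)), "")
    rest = rest[: len(rest) - len(org)]
    industry = next((i for i in INDUSTRY_TERMS if rest.endswith(i)), "")
    trade = rest[: len(rest) - len(industry)]
    return NameParts(admin, trade, industry, org)


def processed_keyword(name: str) -> str:
    """Keep only trade name and industry characteristic, in original order."""
    parts = decompose(name)
    return parts.trade_name + parts.industry


print(decompose("北京市华夏科技有限公司"))
print(processed_keyword("华夏科技"))
```

Both branches of the claim go through the same function here: a keyword without administrative division or organizational form simply yields empty strings for those two fields.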
3. The similarity retrieval method for enterprise names according to claim 1, characterized in that determining retrieval phrases according to the processed search keyword comprises:
arranging the processed search keyword in the order it had before decomposition;
determining, as the retrieval phrases, phrases that differ from the search keyword by m characters, where m is 0, 1 or 2; or,
determining, as the retrieval phrases, phrases that are synonymous and/or homophonic with the search keyword; or,
determining, as the retrieval phrases, phrases composed of k characters that are identical to and adjacent in the search keyword; or,
determining, as the retrieval phrases, phrases composed of q characters that are identical to characters in the search keyword but scattered.
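Three of the phrase-selection rules in claim 3 can be illustrated with the short sketch below. It rests on assumptions the claim does not spell out: "differs by m characters" is read as a position-by-position comparison of equal-length phrases, "adjacent" as a contiguous run of k characters, and "scattered" as q characters kept in order but not forming one contiguous run. The function names are hypothetical, and the synonym and homophone rule is omitted because it would need an external dictionary.

```python
from itertools import combinations


def char_difference(a: str, b: str) -> int:
    """Count differing positions; equal length is an assumption made here."""
    if len(a) != len(b):
        raise ValueError("phrases must have equal length")
    return sum(x != y for x, y in zip(a, b))


def adjacent_phrases(keyword: str, k: int) -> list[str]:
    """All runs of k adjacent characters taken from the keyword."""
    return [keyword[i:i + k] for i in range(len(keyword) - k + 1)]


def scattered_phrases(keyword: str, q: int) -> list[str]:
    """All in-order picks of q characters that are not one adjacent run."""
    if q < 1:
        return []
    result = []
    for idx in combinations(range(len(keyword)), q):
        if idx[-1] - idx[0] != q - 1:  # skip picks that are fully adjacent
            result.append("".join(keyword[i] for i in idx))
    return result


print(char_difference("华夏科技", "华厦科技"))
print(adjacent_phrases("华夏科技", 2))
print(scattered_phrases("华夏科技", 2))
```

A candidate name would then qualify under the first rule when `char_difference` returns a value no greater than the chosen m.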
4. The similarity retrieval method for enterprise names according to claim 1, characterized in that performing similarity retrieval on the determined retrieval phrases to obtain retrieval results comprises:
determining whether the number of retrieval phrases exceeds a preset value;
if so, performing distributed multi-node retrieval and aggregating the retrieval results obtained by each node to obtain the retrieval results;
otherwise, performing single-node retrieval to obtain the retrieval results.
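The branch in claim 4 between single-node and distributed multi-node retrieval might look like the sketch below. Worker threads stand in for retrieval nodes, and substring matching stands in for the actual search; both are simplifications. `PHRASE_LIMIT`, `NODE_COUNT`, `search_node` and `retrieve` are hypothetical names and values, not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

PHRASE_LIMIT = 4   # hypothetical preset value
NODE_COUNT = 2     # hypothetical number of retrieval nodes


def search_node(phrases: list[str], library: list[str]) -> list[str]:
    """Stand-in for one node: names that contain any of the phrases."""
    return [name for name in library if any(p in name for p in phrases)]


def retrieve(phrases: list[str], library: list[str]) -> list[str]:
    if len(phrases) <= PHRASE_LIMIT:
        return search_node(phrases, library)  # single-node retrieval
    # Distributed case: split the phrases across nodes, then aggregate.
    chunks = [phrases[i::NODE_COUNT] for i in range(NODE_COUNT)]
    merged: list[str] = []
    with ThreadPoolExecutor(max_workers=NODE_COUNT) as pool:
        for hits in pool.map(lambda c: search_node(c, library), chunks):
            for name in hits:
                if name not in merged:  # drop duplicates across nodes
                    merged.append(name)
    return merged


library = ["华夏科技", "华厦贸易", "东方餐饮"]
print(retrieve(["华夏"], library))
print(retrieve(["华夏", "华厦", "东方", "科技", "贸易"], library))
```

Splitting by phrase rather than by library shard is just one possible design; the claim only requires that per-node results be aggregated.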
5. The similarity retrieval method for enterprise names according to any one of claims 1-4, characterized in that, after the retrieval results are obtained, the method further comprises:
calculating a similarity value between the search keyword and each retrieval result in the retrieval results;
sorting the retrieval results according to the similarity values.
6. The similarity retrieval method for enterprise names according to claim 5, characterized in that sorting the retrieval results according to the similarity values comprises:
sorting the retrieval results in descending order of similarity value.
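The scoring and descending sort of claims 5 and 6 can be sketched as below. The claims do not fix a similarity formula, so `difflib.SequenceMatcher.ratio()` from the Python standard library is used purely as a stand-in measure; `similarity` and `rank` are hypothetical names, and the default of 10 mirrors the top-10 display in claim 10.

```python
from difflib import SequenceMatcher


def similarity(keyword: str, name: str) -> float:
    """Stand-in similarity in [0, 1]; not the patent's own measure."""
    return SequenceMatcher(None, keyword, name).ratio()


def rank(keyword: str, results: list[str], n: int = 10) -> list[tuple[str, float]]:
    """Score every result, sort from high to low, and keep the top n."""
    scored = [(name, similarity(keyword, name)) for name in results]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


for name, score in rank("华夏科技", ["东方餐饮", "华夏科技", "华厦科技"]):
    print(name, round(score, 2))
```

An exact match scores 1.0 and therefore always appears first, which is the behaviour a name-verification clerk would expect.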
7. The similarity retrieval method for enterprise names according to claim 5, characterized in that, before performing segmentation processing on the input search keyword to obtain the processed search keyword, the method further comprises: building an enterprise name search library.
8. The similarity retrieval method for enterprise names according to claim 7, characterized in that building the enterprise name search library comprises:
adding enterprise name data to a full-text database and keeping it synchronized in real time;
establishing an index of the enterprise name data and the full-text database.
9. The similarity retrieval method for enterprise names according to claim 8, characterized in that establishing the index of the enterprise name data and the full-text database comprises:
using the enterprise trade name and the industry characteristic contained in each enterprise name of the enterprise name data as index columns, establishing the index of the enterprise name data and the full-text database.
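The search library of claims 8 and 9 can be pictured with a small in-memory inverted index. This is only a sketch under stated assumptions: a plain dictionary stands in for the full-text database, the records are assumed to be already split into components, and `RECORDS`, `add_record` and `build_index` are hypothetical names. Calling `add_record` whenever a new name is registered is one simple way to keep the index in step with the data.

```python
from collections import defaultdict

INDEX_COLUMNS = ("trade_name", "industry")  # the two index columns

# Hypothetical records, already split into their components.
RECORDS = [
    {"name": "北京市华夏科技有限公司", "trade_name": "华夏", "industry": "科技"},
    {"name": "上海市华夏贸易有限公司", "trade_name": "华夏", "industry": "贸易"},
]


def add_record(index: dict, rec: dict) -> None:
    """Index one enterprise name under each of its index-column values."""
    for column in INDEX_COLUMNS:
        index[(column, rec[column])].add(rec["name"])


def build_index(records: list[dict]) -> dict:
    """Map (column, value) pairs to the set of full names carrying them."""
    index: dict = defaultdict(set)
    for rec in records:
        add_record(index, rec)
    return index


index = build_index(RECORDS)
print(sorted(index[("trade_name", "华夏")]))
print(sorted(index[("industry", "科技")]))
```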
10. The similarity retrieval method for enterprise names according to claim 1, characterized in that displaying the enterprise names ranked in the top n of the retrieval results for the user to view comprises:
displaying the enterprise names ranked in the top 10 of the retrieval results for the user to view.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610829356.5A CN106354871A (en) | 2016-09-18 | 2016-09-18 | Similarity search method of enterprise names |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106354871A (en) | 2017-01-25 |
Family
ID=57859975
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610829356.5A Pending CN106354871A (en) | 2016-09-18 | 2016-09-18 | Similarity search method of enterprise names |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106354871A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080104276A1 (en) * | 2006-10-25 | 2008-05-01 | Arcsight, Inc. | Real-Time Identification of an Asset Model and Categorization of an Asset to Assist in Computer Network Security |
CN102279843A (en) * | 2010-06-13 | 2011-12-14 | 北京四维图新科技股份有限公司 | Method and device for processing phrase data |
CN103885937A (en) * | 2014-04-14 | 2014-06-25 | 焦点科技股份有限公司 | Method for judging repetition of enterprise Chinese names on basis of core word similarity |
CN104252507A (en) * | 2013-06-28 | 2014-12-31 | 北京华傲达数据技术有限公司 | Enterprise data matching method and device |
CN104424613A (en) * | 2013-09-04 | 2015-03-18 | 航天信息股份有限公司 | Value added tax invoice monitoring method and system thereof |
Non-Patent Citations (1)
Title |
---|
杨艳等: "《网络营销理论与实务》", 30 June 2015, 知识产权出版社 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951415A (en) * | 2017-04-01 | 2017-07-14 | 银联智策顾问(上海)有限公司 | A kind of name of firm searching method and device |
CN107153991A (en) * | 2017-04-28 | 2017-09-12 | 国网冀北电力有限公司物资分公司 | The inconsistent integrated conduct method of title in a kind of financial system |
CN107357916A (en) * | 2017-07-19 | 2017-11-17 | 北京金堤科技有限公司 | Data processing method and system |
CN110928915A (en) * | 2018-08-31 | 2020-03-27 | 北京京东金融科技控股有限公司 | Method, device and equipment for fuzzy matching of Chinese names and readable storage medium |
CN111783467A (en) * | 2020-07-21 | 2020-10-16 | 致诚阿福技术发展(北京)有限公司 | Enterprise name identification method and device |
CN111881183A (en) * | 2020-07-28 | 2020-11-03 | 北京金堤科技有限公司 | Enterprise name matching method and device, storage medium and electronic equipment |
CN112364635A (en) * | 2020-11-30 | 2021-02-12 | 中国银行股份有限公司 | Enterprise name duplication checking method and device |
CN112364635B (en) * | 2020-11-30 | 2023-11-21 | 中国银行股份有限公司 | Enterprise name duplicate checking method and device |
CN112650951A (en) * | 2020-12-21 | 2021-04-13 | 撼地数智(重庆)科技有限公司 | Enterprise similarity matching method, system and computing device |
CN115329039A (en) * | 2022-08-08 | 2022-11-11 | 前锦网络信息技术(上海)有限公司 | Recruitment enterprise searching method, system, electronic equipment and storage medium |
CN115329039B (en) * | 2022-08-08 | 2023-08-04 | 前锦网络信息技术(上海)有限公司 | Recruitment enterprise searching method and system, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170125 |