CN110083704A - Company information processing method, storage medium and device based on main business - Google Patents
Company information processing method, storage medium and device based on main business
- Publication number
- CN110083704A (application number CN201910370624.5A)
- Authority
- CN
- China
- Prior art keywords
- company
- word
- label
- business
- classification
- Legal status (as listed by Google Patents; not a legal conclusion)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of information security technology, and in particular to a company information processing method based on main business. By exploiting the relationship between company-registration naming rules and a company's main business, the invention extracts the main business from the company name with relatively high accuracy and uses it as the basis for classifying companies. Because companies are classified by main business, each company's main business can be located more precisely, companies can be divided more accurately, and comparative analysis between companies across multiple dimensions is better supported.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a company information processing method based on main business.
Background art
With the arrival of the big data era, more and more companies pay attention to big-data-based comparative analysis of companies. Comparing and analyzing companies first requires clustering companies with similar business scopes, for example by classifying them by industry. However, as corporate business scopes diversify, the main businesses of companies in the same industry may vary widely, while companies in different industries may have similar main businesses. Processing company information by industry therefore positions companies inaccurately, so the resulting multi-dimensional comparative analysis is not representative enough and cannot give companies an accurate development orientation.
In short, company information processing based on industry is not accurate enough: the resulting multi-dimensional comparative analysis of companies is not representative, and it cannot give companies an accurate development orientation.
Therefore, after long-term research and development, the inventors propose a company information processing method based on main business to solve at least one of the above technical problems.
Summary of the invention
The purpose of the present invention is to provide a company information processing method, device, medium and electronic device based on main business that can solve at least one of the technical problems mentioned above. The specific scheme is as follows:
A company information processing method based on main business, comprising the following steps:
S1, acquiring and identifying company names, and removing the address information and company registration type information from each company name;
S2, analyzing a stratified sample of the company names after removal to determine an upper limit on the number of characters to intercept;
S3, performing even-position splitting and ik segmentation on the company names after removal, based on the determined upper limit, to form initial classification label words;
S4, filtering out trade names from the initial classification label words using an existing trade-name library, and de-duplicating the remaining initial classification label words;
S5, performing part-of-speech analysis on the remaining initial classification label words, and screening out, according to the analysis result, label words that cannot represent a company's business characteristics;
S6, manually screening the label words remaining after S5, deleting wrongly split label words, and then performing front-and-back matching and sorting to form a company classification label word dictionary;
S7, counting the net number of companies covered when the company classification label word dictionary is applied to the company names, to assess the comprehensiveness of the company classification label dictionary;
S8, counting the companies not covered, to assess the company coverage rate of the classification label word dictionary.
Further, step S1 is processed as follows: using a region dictionary containing every province, municipality directly under the central government, autonomous region, city and county, traverse each company name and delete its location part, building a company name database without address information; then, using an existing company registration type dictionary, traverse this database again to build a company name database without address information or company registration type.
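Step S1 can be sketched in Python as below. This is a minimal, illustrative sketch only: the region and registration-type dictionaries are tiny hypothetical stand-ins for the full dictionaries the text describes, and stripping regions anywhere in the name but registration types only as a suffix is an assumption.

```python
# Hypothetical stand-ins for the full region and registration-type dictionaries.
REGION_DICT = ["黑龙江省", "北京市", "巴彦县", "广西"]
REG_TYPE_DICT = ["有限公司", "股份有限公司", "专业合作社"]


def remove_regions(name: str, regions: list[str]) -> str:
    """Delete every region name found in the company name, longest first,
    so a longer region is not broken up by a shorter one."""
    for region in sorted(regions, key=len, reverse=True):
        name = name.replace(region, "")
    return name


def remove_reg_type(name: str, reg_types: list[str]) -> str:
    """Delete the registration type, assumed to appear at the end of the name."""
    for reg_type in sorted(reg_types, key=len, reverse=True):
        if name.endswith(reg_type):
            return name[: -len(reg_type)]
    return name


def clean_company_name(name: str) -> str:
    """S1: first remove the location part, then the registration type."""
    return remove_reg_type(remove_regions(name, REGION_DICT), REG_TYPE_DICT)
```

Sorting by length matters for the registration types: "股份有限公司" must be tried before "有限公司", or a company limited by shares would keep a stray "股份".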
Further, step S2 is processed as follows: group the companies in the database by company registration type and draw a stratified sample in proportion to the share of companies of each registration type, extracting 0.1‰ of the companies of each type; from this sample, determine the upper limit on the number of characters to intercept from company names after address and registration type information has been removed.
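The sampling part of S2 might look like the sketch below, assuming the cleaned names are available in memory as (name, registration type) pairs. Keeping at least one company per registration type and the fixed random seed are assumptions added for illustration; the text leaves the actual choice of the character limit to analysis of the sample, which the sketch does not automate.

```python
import random
from collections import defaultdict


def stratified_sample(companies, rate=0.0001, seed=0):
    """Draw the same fraction of companies from every registration type.

    companies: iterable of (cleaned_name, registration_type) pairs.
    rate: sampling fraction; 0.1 per mille is 0.0001.
    Returns {registration_type: [sampled names]}.
    """
    groups = defaultdict(list)
    for name, reg_type in companies:
        groups[reg_type].append(name)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for reg_type, names in groups.items():
        # Assumption: keep at least one company so small types are not lost.
        k = max(1, round(len(names) * rate))
        sample[reg_type] = rng.sample(names, k)
    return sample
```

Because each type contributes the same fraction, types with many companies dominate the sample in proportion to their share, as the text requires.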
Further, step S3 is processed as follows: after address information and company registration type information are removed from a company name, intercept 6 characters counting from the end of the name, then perform even-position splitting and ik segmentation on them.
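Even-position splitting is not defined formally in the text. Judging from the worked examples in Embodiment 2, where six characters yield three two-character words, two four-character words and the whole string, it appears to mean taking every even-length substring that starts at an even offset. A minimal Python sketch under that assumption:

```python
def truncate_tail(name: str, limit: int = 6) -> str:
    """Keep at most `limit` characters, counted from the end of the name."""
    return name[-limit:] if limit > 0 else ""


def even_position_split(text: str) -> list[str]:
    """Return all even-length substrings that start at an even offset,
    shortest first. For a 6-character string that is three 2-character
    words, two 4-character words and the whole string."""
    words = []
    for length in range(2, len(text) + 1, 2):
        for start in range(0, len(text) - length + 1, 2):
            words.append(text[start:start + length])
    return words
```

With an odd-length input the last character is never part of a word; the text's fixed limit of six characters avoids that case for names that are long enough.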
The ik segmentation works as follows: use the ik segmenter (IK Analyzer) and compile a custom dictionary that is added to the ik segmenter to optimize it; collect stop words; then use Hive with the ik segmenter integrated to segment all company names, and filter out single-character words from the segmentation result.
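IK Analyzer is a Java library that the text runs inside Hive, so it is not reproduced here. The sketch below only shows the post-processing the text describes, applied to whatever tokens a segmenter returns: stop-word removal and dropping single-character words. The stop-word set used in any call is hypothetical.

```python
from typing import Iterable


def filter_tokens(tokens: Iterable[str], stop_words: set[str]) -> list[str]:
    """Drop stop words and single-character tokens from segmenter output,
    keeping the original order."""
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


def combine(even_words: Iterable[str], ik_words: Iterable[str], stop_words: set[str]) -> list[str]:
    """Form the initial label words from both splitting methods."""
    return filter_tokens(list(even_words) + list(ik_words), stop_words)
```

Duplicates between the two word sources are left in place here; the text removes them later, in step S4.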
Further, for each industry, traverse the classification label words split out under that industry using the industry's trade-name library, and filter out trade names.
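A minimal sketch of S4 follows, assuming candidate words and trade-name libraries are held in memory as Python lists and sets keyed by a hypothetical industry identifier. As in the worked example of Embodiment 2, any word that contains a trade name is dropped, not just the trade name itself.

```python
def remove_trade_names(words: list[str], trade_names: set[str]) -> list[str]:
    """S4: drop every candidate word that contains a known trade name,
    then de-duplicate while keeping first-seen order."""
    kept = [w for w in words if not any(t in w for t in trade_names)]
    return list(dict.fromkeys(kept))  # dict keys preserve insertion order


def filter_by_industry(words_by_industry: dict[str, list[str]],
                       trade_names_by_industry: dict[str, set[str]]) -> dict[str, list[str]]:
    """Apply each industry's own trade-name library to that industry's words."""
    return {
        industry: remove_trade_names(words, trade_names_by_industry.get(industry, set()))
        for industry, words in words_by_industry.items()
    }
```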
Further, in steps S5 and S6, the remaining words are screened further, aggregated without distinguishing industries, and matched and sorted front-to-back to form the company classification label word dictionary. After wrongly split words are deleted, front-and-back matching and sorting organizes the labels into first-level classification word labels, matching second-level classification word labels, and matching third-level classification word labels.
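The text does not spell out how front-and-back matching assigns levels. One plausible reading, sketched below purely as an assumption, is that a label becomes the child of the longest shorter label it begins or ends with, and labels with no such parent are first-level labels. The Chinese strings are illustrative forms of the "numerical control" example in Embodiment 2, not taken from the source.

```python
from typing import Optional


def build_hierarchy(labels: list[str]) -> dict[str, Optional[str]]:
    """Map each label to its parent label (None for first-level labels)."""
    parent: dict[str, Optional[str]] = {}
    for label in labels:
        # Candidate parents: shorter labels matching at the front or the back.
        candidates = [
            p for p in labels
            if len(p) < len(label) and (label.startswith(p) or label.endswith(p))
        ]
        parent[label] = max(candidates, key=len) if candidates else None
    return parent


def level(label: str, parent: dict[str, Optional[str]]) -> int:
    """Depth of a label: 1 for first-level, 2 for second-level, and so on."""
    depth = 1
    current = parent[label]
    while current is not None:
        depth += 1
        current = parent[current]
    return depth


labels = [
    "数控",          # numerical control (first level)
    "数控设备",      # numerical control equipment
    "数控机床",      # numerical control machine tool
    "精密数控",      # precision numerical control
    "数控设备制造",  # numerical control equipment manufacturing
    "机床数控设备",  # machine-tool numerical control equipment
]
```

Choosing the longest matching parent is what places "数控设备制造" under "数控设备" rather than directly under "数控".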
Further, in step S7, counting uses a funnel method: starting from the classification label words of the lowest level, count the net number of companies covered by each word, and use these counts to estimate how comprehensive the classification label words in the dictionary are.
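A sketch of the funnel count for one parent label follows. It assumes company names are held in a Python list and that a company is covered when its name contains the label, which is the counting rule stated at the end of Embodiment 2. The child labels' coverage is merged as a set union, so a company matched by several children is subtracted only once.

```python
def covered_ids(label: str, names: list[str]) -> set[int]:
    """Indices of companies whose name contains the label (each counts as 1)."""
    return {i for i, name in enumerate(names) if label in name}


def net_coverage(label: str, children: list[str], names: list[str]) -> int:
    """Companies covered by `label` but by none of its child labels."""
    own = covered_ids(label, names)
    by_children: set[int] = set()
    for child in children:
        by_children |= covered_ids(child, names)
    return len(own - by_children)
```

Applied level by level from the deepest labels upward, a net count of zero suggests a parent label adds nothing beyond its children, while a large net count suggests that children may be missing.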
Further, in step S8, let coverage (%) be the coverage rate, company_num1 the number of companies covered by the company classification word labels, and company_num2 the total number of companies. The company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 × 100%.
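The coverage formula can be computed as in the sketch below, again assuming a company is covered when its name contains at least one dictionary label. The helper that lists uncovered companies is an illustrative addition for S8, not something the text defines.

```python
def coverage_rate(names: list[str], labels: list[str]) -> float:
    """coverage (%) = company_num1 / company_num2 * 100."""
    if not names:
        raise ValueError("no companies to evaluate")
    # company_num1: companies whose name contains at least one label.
    company_num1 = sum(1 for name in names if any(label in name for label in labels))
    company_num2 = len(names)  # total number of companies
    return company_num1 / company_num2 * 100


def uncovered(names: list[str], labels: list[str]) -> list[str]:
    """S8: the companies the dictionary misses."""
    return [name for name in names if not any(label in name for label in labels)]
```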
According to a specific embodiment of the present invention, the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the method for editing content in a document according to any one of the above.
According to a specific embodiment of the present invention, the present invention provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for editing content in a document according to any one of the above.
Compared with the prior art, the above scheme of the embodiments of the present invention has at least the following beneficial effects:
By exploiting the relationship between company-registration naming rules and a company's main business, the present invention extracts the main business from the company name with relatively high accuracy and uses it as the basis for classifying companies by main business. The main business of each company can therefore be located more precisely, companies can be divided more accurately, and comparative analysis between companies across multiple dimensions is better supported.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the present invention and, together with the specification, serve to explain its principles. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a flowchart of the company information processing method based on main business according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the connection structure of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are only for the purpose of describing particular embodiments and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise; "multiple" generally means at least two.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe ..., these ... should not be limited by these terms. These terms are only used to distinguish ... from one another. For example, without departing from the scope of the embodiments of the present invention, a first ... may also be called a second ..., and similarly, a second ... may also be called a first ....
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the commodity or device that includes that element.
Optional embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
An embodiment of the present invention provides a company information processing method based on main business, as shown in Fig. 1, comprising the following steps:
S1, obtaining company names and removing the address information and company registration type information from each company name;
S2, performing stratified sampling analysis on the remaining part of the company names by company registration type, to determine the maximum number of characters that should be intercepted from the remaining name to best represent the company's main business;
S3, intercepting the remaining part of each company name from the end according to the determined number of characters, and then performing even-position splitting and ik segmentation on the intercepted name field to form initial classification label words;
S4, filtering out trade names from the classification label words using an existing trade-name library, and de-duplicating the remaining classification label words;
S5, performing part-of-speech analysis on the classification label words and, according to the result, screening out words such as personal names and place names that cannot represent a company's business characteristics, which reduces the manual workload;
S6, manually screening the classification label words remaining after the above steps, deleting wrongly split label words, and then performing front-and-back matching and sorting to form a company classification label word dictionary;
S7, counting the net number of companies covered when the resulting classification label word dictionary is applied to the company names, to assess the comprehensiveness of the company classification label words;
S8, counting the companies not covered, to assess the company coverage rate of the classification label word dictionary.
Embodiment 2
First, company names are obtained, information such as the company location and company registration type contained in each name is identified, and the address information and registration type information are removed from the name. A specific processing method may be: using a region dictionary containing every province, municipality directly under the central government, autonomous region, city and county, traverse each company name and delete its location part to build a company name database without address information; then, using an existing company registration type dictionary, traverse this database to build a company name database without address information or company registration type.
Next, stratified sampling analysis by company registration type is performed on the new company name database to determine how many characters to intercept from the end of each company name. A specific processing method may be: group the companies in the database by company registration type, draw a stratified sample in proportion to the share of companies of each registration type, extracting 0.1‰ of the companies of each type, and determine how many characters should be intercepted from company names after the address and registration type information has been removed. Following domestic company naming habits, an even number is generally chosen, for example six. For example, removing the location and company type from "Guangxi Jiuchongtian Ecotourism Development Co., Ltd." leaves "Jiuchongtian Ecotourism Development"; intercepting at most six characters from the end then gives "ecotourism development", which best represents the company's main business.
Next, even-position splitting and ik segmentation are performed on the intercepted company name field. A specific processing method may be: remove the address information and registration type information from a company name and intercept 6 characters from the end. For example, "Guangxi Jiuchongtian Ecotourism Development Co., Ltd." becomes "Jiuchongtian Ecotourism Development" after the address and company type are removed; intercepting at most six characters from the end gives "ecotourism development", on which even-position splitting and ik segmentation are then performed.
The ik segmentation works as follows: use the ik segmenter and compile a custom dictionary that is added to the ik segmenter to optimize it; collect stop words such as "and"; then use Hive with the ik segmenter integrated to segment all company names, and filter out single-character words from the segmentation result.
The splitting result for this company could be "ecology", "tourism", "development", "ecotourism", "tourism development" and "ecotourism development", which serve as the company's classification label words.
Next, trade-name words are filtered out of the words split from all company names, using the trade-name library of the industry each company belongs to. A specific processing method may be: traverse the classification label words split out under each industry against that industry's trade-name library and filter out trade names. For example, for "Bayan County Runji Farmers' Planting Specialized Cooperative", removing the place name and registration type and intercepting six characters from the end gives "Runji farmers planting"; even-position splitting and ik segmentation yield "Runji", "farmers", "planting", "Runji farmers", "farmers planting" and "Runji farmers planting". Since "Runji" is a trade name, every word containing "Runji" is filtered out, leaving the classification label words "farmers", "planting" and "farmers planting". Traversing after de-duplication reduces the number of traversals.
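The worked example above can be traced end to end with the following self-contained sketch. The Chinese name 巴彦县润吉农民种植专业合作社 is an assumed reconstruction of the example company, and the one-entry dictionaries are hypothetical. ik segmentation is omitted; only even-position splitting is shown, which by itself produces the six words listed.

```python
def trace_example(name: str) -> list[str]:
    regions = ["巴彦县"]          # hypothetical one-entry region dictionary
    reg_types = ["专业合作社"]    # hypothetical one-entry registration-type dictionary
    trade_names = {"润吉"}        # hypothetical trade-name library ("Runji")

    for region in regions:        # S1: remove the place name
        name = name.replace(region, "")
    for reg_type in reg_types:    # S1: remove the registration type
        if name.endswith(reg_type):
            name = name[: -len(reg_type)]

    tail = name[-6:]              # S3: keep at most six characters from the end

    words = [                     # S3: even-position splitting
        tail[start:start + length]
        for length in range(2, len(tail) + 1, 2)
        for start in range(0, len(tail) - length + 1, 2)
    ]

    # S4: drop words containing a trade name, then de-duplicate in order.
    kept = [w for w in words if not any(t in w for t in trade_names)]
    return list(dict.fromkeys(kept))
```

The three words returned, 农民, 种植 and 农民种植, correspond to "farmers", "planting" and "farmers planting" in the text.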
Next, the remaining words are screened further, aggregated without distinguishing industries, and matched and sorted front-to-back to form the company classification label word dictionary. After wrongly split words are deleted, front-and-back matching and sorting may proceed as follows: if "numerical control" is a first-level classification word label, its matching second-level classification word labels can be "numerical control equipment", "numerical control machine tool", "precision numerical control", "intelligent numerical control" and so on; under the second-level label "numerical control equipment", the matching third-level labels can be "numerical control equipment manufacturing", "machine-tool numerical control equipment" and so on.
Next, the net number of companies covered by each label is counted by applying the resulting dictionary to the company names, to assess the comprehensiveness of the dictionary's classification word labels. A specific processing method may be: count with a funnel method, starting from the classification label words of the lowest level. For example, take the first-level label "numerical control", its second-level label "numerical control equipment", and the third-level labels under it, "numerical control equipment manufacturing", "machine-tool numerical control equipment" and so on. Let the number of companies covered by "numerical control equipment manufacturing" be N3_1, the number covered by "machine-tool numerical control equipment" be N3_2, and so on. The total number of companies covered by the third-level labels under the second-level label "numerical control equipment" is then:
N3_total = N3_1 + N3_2 + N3_3 + N3_4 - ∩(N3_1, N3_2, N3_3, N3_4)
where the subtracted term accounts for companies covered by more than one third-level label, so that each company is counted once. If the total number of companies covered by the second-level label "numerical control equipment" is N2_total, its net number of covered companies is N2_total - N3_total. The net number of companies covered by each word is then used to estimate the comprehensiveness of the classification label words in the dictionary.
Finally, the number of companies covered by the classification labels is counted to assess the company coverage rate of the dictionary. A specific processing method may be: let coverage (%) be the coverage rate, company_num1 the number of companies covered by the company classification word labels, and company_num2 the total number of companies; the company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 × 100%.
In the embodiments of the present invention, the number of companies covered by a classification label word is counted as follows: if a company name contains the classification label word, the company is counted as 1.
Embodiment 3
As shown in Fig. 2, this embodiment provides an electronic device for performing the company information processing method based on main business. The electronic device comprises at least one processor and a memory communicatively connected to the at least one processor.
Referring to Fig. 2, it shows a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and in-vehicle terminals (such as in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 2 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 2, the electronic device 400 may include a processing unit 401 (such as a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing unit 401, the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices can be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output devices 407 including, for example, a liquid crystal display (LCD), speaker and vibrator; storage devices 408 including, for example, magnetic tape and hard disks; and a communication device 409. The communication device 409 allows the electronic device 400 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 2 shows the electronic device 400 with various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowchart may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: obtain at least two Internet Protocol addresses; send to a node evaluation device a node evaluation request including the at least two Internet Protocol addresses, where the node evaluation device selects an Internet Protocol address from the at least two Internet Protocol addresses and returns it; and receive the Internet Protocol address returned by the node evaluation device, where the obtained Internet Protocol address indicates an edge node in a content delivery network.
Alternatively, the computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: receive a node evaluation request including at least two Internet Protocol addresses; select an Internet Protocol address from the at least two Internet Protocol addresses; and return the selected Internet Protocol address, where the received Internet Protocol address indicates an edge node in a content delivery network.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, limit the unit itself; for example, the first acquisition unit may also be described as "a unit that obtains at least two Internet Protocol addresses".
Claims (10)
1. A company information processing method based on main business, characterized in that it comprises the following steps:
S1, acquire and identify company names, and identify and remove the address information and company registration-type information contained in each company name;
S2, perform stratified sampling analysis on the trimmed company names to determine an upper limit on the number of characters to retain;
S3, based on the determined upper limit, perform consecutive-character word breaking and IK word breaking on the trimmed company names to form initial classification label words;
S4, filter out trade names from the initial classification label words using an existing company trade-name library, and deduplicate the remaining initial classification label words;
S5, perform part-of-speech analysis on the remaining initial classification label words and, according to the part-of-speech analysis results, screen out classification label words that cannot represent a company's business characteristics;
S6, manually screen the classification label words that remain after S5, delete incorrectly segmented classification label words, and then perform front-to-back matching and sorting to form a company classification label word dictionary;
S7, count the number of companies whose names are net-covered by the company classification label word dictionary, and assess the comprehensiveness of the classification label dictionary;
S8, count the number of uncovered companies and assess the company coverage rate of the classification label word dictionary.
2. The company information processing method based on main business according to claim 1, characterized in that step S1 is specifically processed as follows: using a region dictionary containing every province, autonomous region, municipality directly under the central government, city, and county, traverse each company name and delete its location part to build a company-name database free of address information; then, using an existing company registration-type dictionary, traverse that database to build a company-name database free of both address information and company registration type.
3. The company information processing method based on main business according to claim 1, characterized in that step S2 is specifically processed as follows: group the companies in the database by company registration type and perform stratified sampling according to the proportion of companies of each registration type, drawing 0.1‰ of the companies of each registration type; from the sample, determine the upper limit on the number of characters to retain from company names after the address and company registration-type information have been removed.
4. The company information processing method based on main business according to claim 1, characterized in that step S3 is specifically processed as follows: remove the address information and company registration-type information from a company name, take the last 6 characters (counting from the end toward the beginning), and perform consecutive-character word breaking and IK word breaking on them;
The IK word breaking specifically operates as follows: use the IK segmenter, compile a custom dictionary and add it to the IK segmenter to optimize it; collect stop words; then use Hive with the IK segmenter integrated to segment all company names, and filter out single-character words from the segmentation results.
5. The company information processing method based on main business according to claim 1, characterized in that step S4 is specifically processed as follows: using the trade-name library of each industry, traverse the classification label words split out under that industry and filter out the trade names.
6. The company information processing method based on main business according to claim 1, characterized in that in steps S5 and S6 the remaining words are further optimized and screened, then aggregated without distinction by industry, and matched and sorted front to back to form the company classification label word dictionary; after incorrectly segmented words are deleted, front-to-back matching and sorting is performed, dividing the words into first-level classification word labels, matched second-level classification word labels, and matched third-level classification word labels.
7. The company information processing method based on main business according to claim 1, characterized in that in step S7 statistics are computed with a funnel method, starting from the classification label words of the lowest level, and the comprehensiveness of the classification label words in the dictionary is estimated from the number of companies net-covered by each word.
8. The company information processing method based on main business according to claim 1, characterized in that in step S8 the coverage rate is denoted coverage (%), the number of companies covered by the company classification word labels is company_num1, and the total (population) number of companies is company_num2; the company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 × 100%.
9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when executed by a processor, the program implements the method according to any one of claims 1 to 8.
10. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 8.
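Steps S1 to S8 in the claims above can be illustrated with a minimal Python sketch. It is not the patentee's implementation and rests on several stated assumptions: the region, registration-type, and trade-name dictionaries (`REGIONS`, `REG_TYPES`, `TRADE_NAMES`) are hypothetical toy data; regions are assumed to appear only as a prefix of the name; each sampling stratum keeps at least one company; IK/Hive segmentation (S3), part-of-speech screening (S5), and manual review (S6) are omitted; and "net coverage" in S7 is read as the number of companies a word covers that no earlier word in the funnel already covered. All function names are illustrative.

```python
import random

# Hypothetical toy dictionaries, not taken from the patent. A real system would
# load a full region dictionary and registration-type dictionary (claim 2) and
# per-industry trade-name libraries (claim 5).
REGIONS = ["北京市", "北京", "海淀区", "上海市", "上海"]
REG_TYPES = ["股份有限公司", "有限责任公司", "有限公司"]
TRADE_NAMES = {"星辰"}
MAX_CHARS = 6  # upper limit on retained characters, as stated in claim 4


def stratified_sample(companies, per=10_000, seed=0):
    """S2 (claim 3): group (name, reg_type) pairs by registration type and draw
    one in `per` companies (0.1 per mille) from each group, at least one each."""
    groups = {}
    for name, reg_type in companies:
        groups.setdefault(reg_type, []).append(name)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        k = max(1, (len(members) + per - 1) // per)  # ceiling division
        sample.extend(rng.sample(members, k))
    return sample


def strip_name(name):
    """S1 (claim 2): repeatedly drop a leading region word, then one
    registration-type suffix. Assumes regions appear only as a prefix."""
    regions = sorted(REGIONS, key=len, reverse=True)  # try longest match first
    changed = True
    while changed:
        changed = False
        for region in regions:
            if name.startswith(region):
                name = name[len(region):]
                changed = True
                break
    for reg_type in sorted(REG_TYPES, key=len, reverse=True):
        if name.endswith(reg_type):
            return name[: -len(reg_type)]
    return name


def candidate_words(name, max_chars=MAX_CHARS):
    """S3 (claim 4), consecutive-character part only: keep the last `max_chars`
    characters and emit every contiguous substring of two or more characters,
    so single-character words never appear."""
    tail = strip_name(name)[-max_chars:]
    return [tail[i:j] for i in range(len(tail)) for j in range(i + 2, len(tail) + 1)]


def initial_label_words(names):
    """S4 (claims 1 and 5): drop words that exactly match a trade name, then
    deduplicate while keeping first-seen order."""
    seen, words = set(), []
    for name in names:
        for word in candidate_words(name):
            if word not in TRADE_NAMES and word not in seen:
                seen.add(word)
                words.append(word)
    return words


def net_coverage(names, dictionary):
    """S7 (claim 7), under an assumed reading of "net coverage": walk the
    dictionary in the given order (lowest level first) and credit each word only
    with companies no earlier word covered. Returns (counts, uncovered_count)."""
    stripped = [strip_name(n) for n in names]
    uncovered = set(range(len(names)))
    counts = {}
    for word in dictionary:
        hit = {i for i in uncovered if word in stripped[i]}
        counts[word] = len(hit)
        uncovered -= hit
    return counts, len(uncovered)


def coverage_percent(company_num1, company_num2):
    """S8 (claim 8): coverage (%) = company_num1 / company_num2 * 100."""
    return company_num1 / company_num2 * 100


if __name__ == "__main__":
    names = ["北京市海淀区星辰软件开发有限公司", "上海星辰软件有限责任公司"]
    counts, uncovered = net_coverage(names, ["软件开发", "软件"])
    print(counts, coverage_percent(len(names) - uncovered, len(names)))
    # prints: {'软件开发': 1, '软件': 1} 100.0
```

Passing the dictionary ordered from the most specific (lowest-level) labels upward mirrors the funnel described in claim 7; a different order changes the per-word net counts but not the total number of covered companies, which feeds company_num1 in S8.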
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910370624.5A CN110083704B (en) | 2019-05-06 | 2019-05-06 | Method, storage medium and device for processing company information based on main business |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110083704A (en) | 2019-08-02 |
CN110083704B CN110083704B (en) | 2020-06-09 |
Family
ID=67418691
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910370624.5A Active CN110083704B (en) | 2019-05-06 | 2019-05-06 | Method, storage medium and device for processing company information based on main business |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083704B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140163964A1 (en) * | 2012-12-12 | 2014-06-12 | International Business Machines Corporation | Approximate named-entity extraction |
CN104252507A (en) * | 2013-06-28 | 2014-12-31 | 北京华傲达数据技术有限公司 | Enterprise data matching method and device |
US9002725B1 (en) * | 2005-04-20 | 2015-04-07 | Google Inc. | System and method for targeting information based on message content |
CN107193959A (en) * | 2017-05-24 | 2017-09-22 | 南京大学 | A kind of business entity's sorting technique towards plain text |
Also Published As
Publication number | Publication date |
---|---|
CN110083704B (en) | 2020-06-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||