CN110083704A - Company information processing method, storage medium and device based on main business

Publication number
CN110083704A
Authority
CN
China
Legal status
Granted
Application number
CN201910370624.5A
Other languages
Chinese (zh)
Other versions
CN110083704B (en)
Inventor
张艳华
郭瑞兵
Current Assignee
Chongqing Tianpeng Network Co Ltd
Original Assignee
Chongqing Tianpeng Network Co Ltd
Application filed by Chongqing Tianpeng Network Co Ltd
Priority to CN201910370624.5A
Publication of CN110083704A
Application granted
Publication of CN110083704B
Active (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of information security technology, and in particular to a company information processing method based on main business. The invention exploits the relationship between company registration naming rules and a company's main business: it extracts the main business from the company name with relatively high accuracy and uses it as the basis for classifying companies. The main business of each company can therefore be located relatively accurately, companies can be divided more precisely, and comparative analysis between companies across dimensions is facilitated.

Description

Company information processing method, storage medium and device based on main business
Technical field
The present invention relates to the field of information security technology, and in particular to a company information processing method based on main business.
Background art
With the arrival of the big data era, more and more companies attach importance to big-data-based comparative analysis of companies. The first step of such analysis is to cluster companies with similar business scopes, for example by classifying them by industry. However, as business scopes diversify, the main businesses of companies within the same industry may differ widely, while the main businesses of companies in different industries may be similar. Processing company information by industry therefore positions companies inaccurately, so the resulting comparative analysis across dimensions is not sufficiently representative and cannot give a company relatively accurate guidance on its development positioning.
Therefore, after long-term research and development, the inventors propose a company information processing method based on main business to solve at least one of the above technical problems.
Summary of the invention
The purpose of the present invention is to provide a company information processing method, apparatus, medium and electronic device based on main business that can solve at least one of the technical problems mentioned above. The specific scheme is as follows:
A company information processing method based on main business, characterized by comprising the following steps:
S1, obtain and parse company names, and remove the address information and company registration type information from each company name;
S2, perform stratified sampling analysis on the company names after removal, and determine the upper limit on the number of characters to intercept;
S3, based on the determined upper limit, perform even-position word splitting and ik word segmentation on the company names after removal to form initial classification label words;
S4, filter the trade names out of the initial classification label words using an existing company trade name library, and deduplicate the remaining initial classification label words;
S5, perform part-of-speech analysis on the remaining initial classification label words, and screen out, according to the part-of-speech analysis result, the classification label words that cannot represent company business characteristics;
S6, manually screen the classification label words that remain after the above screening, delete wrongly split classification label words, and then perform front-and-back matching sorting to form a company classification label word dictionary;
S7, count the net number of covered companies over the company names according to the company classification label word dictionary, and assess the comprehensiveness of the company classification label dictionary;
S8, count the number of uncovered companies, and assess the company coverage rate of the classification label word dictionary.
Further, step S1 is specifically performed as follows: using a region dictionary that contains every province, autonomous region, municipality directly under the central government, city and county, traverse every company name and delete its location part to build a company name database without address information; then, using an existing company registration type dictionary, traverse that database to build a company name database without address information or company registration type.
Further, step S2 is specifically performed as follows: group the companies in the database by company registration type, perform stratified sampling in proportion to each registration type's share of companies, drawing 0.1‰ of the companies of each registration type, and determine the upper limit on the number of characters that should be intercepted from a company name after its address and registration type information have been removed.
Further, step S3 is specifically performed as follows: remove the address information and company registration type information from a company name, intercept 6 characters from back to front, and then perform even-position word splitting and ik word segmentation.
The ik word segmentation works as follows: the ik segmenter is used and optimized by supplementing it with a self-compiled dictionary; stop words are collected; hive is then used, with the ik segmenter integrated, to segment all company names; and single-character words are filtered out of the segmentation result.
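The post-processing of segmenter output described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tokens are taken as given (the ik segmenter and hive integration are not reproduced here), and the stop-word set and the function name `filter_tokens` are hypothetical.

```python
# Hypothetical stop-word set; the patent only says that stop words are collected.
STOP_WORDS = {"的", "和", "以及"}


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Keep tokens that are longer than one character and are not stop words."""
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


# Tokens as some segmenter might return them (assumed input, not real ik output).
print(filter_tokens(["生态", "的", "旅游", "以及", "发展"]))
```

Filtering after segmentation keeps the segmenter itself unchanged, so the same filter could be applied to the output of any tokenizer.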
Further, in step S4, the classification label words split out for each industry are traversed against that industry's trade name library, and the trade names are filtered out.
Further, in steps S5 and S6, the remaining words are further screened and optimized, then aggregated without distinguishing industries and sorted by front-and-back matching to form the company classification label word dictionary. After wrongly split words are deleted, front-and-back matching sorting divides the words into first-level classification word labels, matching second-level classification word labels and matching third-level classification word labels.
Further, in step S7, counting uses a funnel method: it starts from the lowest-level classification label words, and the comprehensiveness of the classification label words in the dictionary is estimated from the net number of companies covered by each word.
Further, in step S8, the coverage rate is coverage (%), the number of companies covered by the company information processing word labels is company_num1, and the total number of companies is company_num2; the company coverage rate of the dictionary is coverage (%) = company_num1 / company_num2 * 100%.
According to a specific embodiment, the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the method for editing content in a document as described in any one of the above.
According to a specific embodiment, the present invention provides an electronic device comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for editing content in a document as described in any one of the above.
Compared with the prior art, the above scheme of the embodiments of the present invention has at least the following beneficial effects:
The present invention exploits the relationship between company registration naming rules and a company's main business: it extracts the main business from the company name with relatively high accuracy and uses it as the basis for classifying companies. The main business of each company can therefore be located relatively accurately, companies can be divided more precisely, and comparative analysis between companies across dimensions is facilitated.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the specification, serve to explain its principles. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a flowchart of the company information processing method based on main business according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the connection structure of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise, and "a plurality of" generally includes at least two.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe ..., these ... should not be limited by these terms. These terms are only used to distinguish .... For example, without departing from the scope of the embodiments of the present invention, a first ... may also be referred to as a second ..., and similarly a second ... may also be referred to as a first ....
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a commodity or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the commodity or device that includes the element.
Optional embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 1, an embodiment of the present invention provides a company information processing method based on main business, comprising the following steps:
S1, obtain company names, and remove the address information and company registration type information from each company name;
S2, perform stratified sampling analysis by company registration type on the remaining part of the company names, and determine the maximum number of characters that should be intercepted from the remaining name to best represent the company's main business;
S3, intercept the remaining part of each company name from back to front according to the determined number of characters, then perform even-position word splitting and ik word segmentation on the intercepted company name field to form initial classification label words;
S4, filter the trade names out of the classification label words using an existing company trade name library, and deduplicate the remaining classification label words;
S5, perform part-of-speech analysis on the classification label words, and screen out, according to the part-of-speech analysis result, classification label words such as personal names and place names that cannot represent company business characteristics, so as to reduce manual effort;
S6, manually screen the classification label words remaining after the above steps, delete wrongly split classification label words, and then perform front-and-back matching sorting to form a company classification label word dictionary;
S7, count the net number of covered companies over the company names according to the classification label word dictionary thus formed, and assess the comprehensiveness of the company classification label words;
S8, count the number of uncovered companies, and assess the company coverage rate of the classification label word dictionary.
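Step S5 in the list above can be illustrated with a small part-of-speech filter. This is a sketch under assumptions: the patent does not name a tagger, so the tagger is passed in as a function; the tags "nr" (personal name) and "ns" (place name) follow a convention used by several Chinese taggers and are assumed here; `screen_by_pos` is a hypothetical name.

```python
# Tags treated as unable to represent business characteristics (assumed tag set).
EXCLUDED_TAGS = {"nr", "ns"}  # nr: personal name, ns: place name


def screen_by_pos(labels, tag_of):
    """Drop labels whose part-of-speech tag is in EXCLUDED_TAGS.

    tag_of is any callable mapping a word to its tag; a real tagger would be
    plugged in here.
    """
    return [w for w in labels if tag_of(w) not in EXCLUDED_TAGS]


# A toy lookup table standing in for a real tagger (illustrative data only).
toy_tags = {"张伟": "nr", "重庆": "ns", "旅游": "n"}
print(screen_by_pos(["张伟", "重庆", "旅游"], toy_tags.get))
```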
Embodiment 2
First, the company names are obtained, the information they contain such as company location and company registration type is identified, and the address information and company registration type information are removed from each name. Specifically: using a region dictionary that contains every province, autonomous region, municipality directly under the central government, city and county, traverse every company name and delete its location part to build a company name database without address information; then, using an existing company registration type dictionary, traverse that database to build a company name database without address information or company registration type.
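A minimal Python sketch of this removal step is given below. It rests on several assumptions: the two dictionaries are tiny hypothetical stand-ins for the full region and registration type dictionaries, the location is only stripped when it is a prefix and the registration type only when it is a suffix, and the Chinese names are assumed back-translations of the examples quoted in this embodiment.

```python
# Hypothetical miniature dictionaries; real ones would be far larger.
REGIONS = ["广西", "重庆市", "重庆", "巴彦县"]
REGISTRATION_TYPES = ["有限责任公司", "股份有限公司", "有限公司", "专业合作社"]


def strip_region(name, regions=REGIONS):
    """Delete a leading location part, trying longer region names first."""
    for region in sorted(regions, key=len, reverse=True):
        if name.startswith(region):
            return name[len(region):]
    return name


def strip_registration_type(name, types=REGISTRATION_TYPES):
    """Delete a trailing registration type, trying longer types first."""
    for reg_type in sorted(types, key=len, reverse=True):
        if name.endswith(reg_type):
            return name[:-len(reg_type)]
    return name


def core_name(name):
    """Company name without address and registration type information."""
    return strip_registration_type(strip_region(name))


print(core_name("广西九重天生态旅游发展有限公司"))
print(core_name("巴彦县润吉农民种植专业合作社"))
```

Trying longer entries first avoids stripping "重庆" from a name that actually starts with "重庆市".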
Next, stratified sampling analysis by company registration type is performed on the new company name database to determine how many characters to intercept from the end of a company name. Specifically: group the companies in the database by company registration type and perform stratified sampling in proportion to each registration type's share of companies, drawing 0.1‰ of the companies of each registration type, to determine how many characters should be intercepted from a company name after its address and registration type information have been removed. Given the naming habits of domestic companies, an even number is generally chosen, for example six. Take "Guangxi Jiuchongtian Eco-tourism Development Co., Ltd.": after the address and company type are removed, "Jiuchongtian Eco-tourism Development" remains, and the last six Chinese characters, meaning "eco-tourism development", best represent the company's main business.
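The stratified draw described above could look like the following sketch. The function name, the fixed random seed and the rule of drawing at least one company per registration type are assumptions made for illustration; the rate is expressed as an integer number per ten thousand (1 per 10,000 equals 0.1‰) to avoid floating-point rounding.

```python
import random
from collections import defaultdict


def stratified_sample(companies, rate_per_10k=1, seed=0):
    """Sample each registration type at the same rate (default 0.1 per mille).

    companies: iterable of (name, registration_type) pairs.
    Returns a dict mapping registration type to the sampled names.
    """
    groups = defaultdict(list)
    for name, reg_type in companies:
        groups[reg_type].append(name)

    rng = random.Random(seed)
    sample = {}
    for reg_type, names in groups.items():
        # Ceiling division, with at least one company per stratum (assumption).
        k = max(1, (len(names) * rate_per_10k + 9999) // 10000)
        sample[reg_type] = rng.sample(names, k)
    return sample


# Synthetic data: 20,000 companies of one type and 5,000 of another.
data = [(f"company{i}", "有限公司") for i in range(20000)]
data += [(f"coop{i}", "专业合作社") for i in range(5000)]
picked = stratified_sample(data)
print({reg_type: len(names) for reg_type, names in picked.items()})
```

The sampled names would then be inspected to decide the interception limit; that judgement is not automated in this sketch.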
Next, even-position word splitting and ik word segmentation are performed on the intercepted company name field. Specifically: after the address information and company registration type information are removed from a company name, 6 characters are intercepted from back to front. For example, "Guangxi Jiuchongtian Eco-tourism Development Co., Ltd." becomes "Jiuchongtian Eco-tourism Development" after the address and company type are removed; the last six Chinese characters, meaning "eco-tourism development", are intercepted, and even-position word splitting and ik word segmentation are then performed on them.
The ik word segmentation works as follows: the ik segmenter is used and optimized by supplementing it with a self-compiled dictionary; stop words such as "and" are collected; hive is then used, with the ik segmenter integrated, to segment all company names; and single-character words are filtered out of the segmentation result.
The splitting result for this company may be: "ecology", "tourism", "development", "eco-tourism", "tourism development", "eco-tourism development", which serve as the company's classification label words.
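The interception and even-position splitting can be sketched as below. The patent does not define even-position splitting formally; this sketch infers it from the worked examples as "all contiguous substrings of even length that start at an even offset", which is an assumption. The Chinese string is an assumed back-translation of the example company name, and `tail` and `even_splits` are hypothetical names.

```python
def tail(name, limit=6):
    """Intercept at most `limit` characters from the end of the name."""
    return name[-limit:]


def even_splits(text):
    """Even-length substrings starting at even offsets, shortest first."""
    pieces = []
    for size in range(2, len(text) + 1, 2):
        for start in range(0, len(text) - size + 1, 2):
            pieces.append(text[start:start + size])
    return pieces


field = tail("九重天生态旅游发展")  # name with address and type already removed
print(field)
print(even_splits(field))
```

For a six-character field this yields three two-character words, two four-character words and the whole field, matching the six label words listed in the example.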
Next, the trade name words are filtered out of the words split from all companies, using the trade name library of the industry to which each company belongs. Specifically, the classification label words split out for each industry are traversed against that industry's trade name library and the trade names are filtered out. For example, for "Bayan County Runji Farmers' Planting Specialized Cooperative", after the place name and registration type are removed, the six characters intercepted from back to front mean "Runji farmers planting". Even-position word splitting and ik word segmentation give: "Runji", "farmers", "planting", "Runji farmers", "farmers planting", "Runji farmers planting". Because "Runji" is a trade name, every word containing "Runji" is filtered out, and the remaining usable classification label words are "farmers", "planting" and "farmers planting". Deduplicating before traversal reduces the number of traversals.
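The trade name filter and deduplication in this step can be sketched as follows, assuming the trade name library is available as a simple set of strings and using assumed Chinese back-translations of the example words; `drop_trade_names` is a hypothetical name.

```python
def drop_trade_names(labels, trade_names):
    """Remove labels containing any known trade name, then deduplicate.

    Order of first appearance is preserved.
    """
    kept = [w for w in labels if not any(t in w for t in trade_names)]
    return list(dict.fromkeys(kept))


labels = ["润吉", "农民", "种植", "润吉农民", "农民种植", "润吉农民种植"]
print(drop_trade_names(labels, {"润吉"}))
```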
Next, the remaining words are further screened and optimized, then aggregated without distinguishing industries and sorted by front-and-back matching to form the company classification label word dictionary. After wrongly split words are deleted, front-and-back matching sorting may proceed as in the following example: if the word "numerical control" is a first-level classification word label, the second-level classification word labels that match it may be "numerical control equipment", "numerical control machine tool", "precision numerical control", "intelligent numerical control" and so on; under the second-level classification word label "numerical control equipment", the matching third-level labels may be "numerical control equipment manufacturing", "machine tool numerical control equipment" and so on.
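One way to read front-and-back matching is that a deeper label matches a shallower one when it contains it as a prefix or as a suffix. The sketch below implements that reading with a substring test and assigns levels by length (two characters per level); both choices are assumptions, as are the Chinese back-translations and the name `build_hierarchy`.

```python
def level_of(word):
    """Assumed level: 2 characters = level 1, 4 = level 2, 6 = level 3."""
    return len(word) // 2


def build_hierarchy(labels):
    """Map each label to the labels one level deeper that contain it."""
    children = {w: [] for w in labels}
    for parent in labels:
        for child in labels:
            if level_of(child) == level_of(parent) + 1 and parent in child:
                children[parent].append(child)
    return children


labels = ["数控", "数控设备", "数控机床", "精密数控", "智能数控",
          "数控设备制造", "机床数控设备"]
tree = build_hierarchy(labels)
print(tree["数控"])
print(tree["数控设备"])
```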
Next, the net number of companies covered by each classification word is counted over the company names according to the cluster dictionary formed above, and the comprehensiveness of the dictionary's classification word labels is assessed. Specifically, counting may use a funnel method that starts from the lowest-level classification label words. For example, the first-level classification label word "numerical control" may have the second-level classification label word "numerical control equipment" below it, which in turn may have third-level classification label words such as "numerical control equipment manufacturing" and "machine tool numerical control equipment". Let the number of companies covered by "numerical control equipment manufacturing" be N3_1, the number covered by "machine tool numerical control equipment" be N3_2, and so on. The total number of companies covered by the third-level classification labels under the second-level label "numerical control equipment" is then:
N3_total = N3_1 + N3_2 + N3_3 + N3_4 - ∩(N3_1, N3_2, N3_3, N3_4)
Here the ∩ term corrects for companies covered by more than one third-level label. If the total number of companies covered by the second-level classification label "numerical control equipment" is N2_total, then its net number of covered companies is N2_total - N3_total. The comprehensiveness of the classification label words in the dictionary is then estimated from the net number of companies covered by each word.
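The funnel count can be sketched with sets, which handle the overlap between sibling labels automatically. Note the assumption: the union of the child sets is used in place of the sum-minus-intersection expression above, on the reading that the ∩ term is meant to avoid double counting. The company names are made-up placeholders, and `covered` and `net_coverage` are hypothetical names.

```python
def covered(label, names):
    """Set of company names that contain the label."""
    return {n for n in names if label in n}


def net_coverage(parent, children, names):
    """Companies covered by the parent label but by none of its child labels."""
    parent_set = covered(parent, names)
    child_union = set().union(*(covered(c, names) for c in children))
    return len(parent_set - child_union)


# Placeholder company names for illustration only.
names = ["甲数控设备制造", "乙机床数控设备", "丙数控设备", "丁数控设备制造"]
print(net_coverage("数控设备", ["数控设备制造", "机床数控设备"], names))
```

A large net count for a parent label suggests that its child labels miss many companies, which is one way the comprehensiveness of the lower levels could be judged.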
Finally, the number of companies covered by the classification labels is counted and the company coverage rate of the dictionary is assessed. Specifically: let the coverage rate be coverage (%), the number of companies covered by the company classification word labels be company_num1, and the total number of companies be company_num2; the company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 * 100%.
In the embodiments of the present invention, the number of companies covered by a classification label word is counted as follows: a company is counted as 1 if its company name contains the classification label word.
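The coverage formula and the counting rule above translate directly into a few lines of Python. The sketch assumes plain substring matching against the company name; the sample names and the function name `coverage_rate` are illustrative only.

```python
def coverage_rate(names, dictionary):
    """Percentage of company names containing at least one label word."""
    names = list(names)
    if not names:
        return 0.0
    company_num1 = sum(
        1 for n in names if any(label in n for label in dictionary)
    )
    company_num2 = len(names)
    return company_num1 / company_num2 * 100


# Two of these four placeholder names contain a label word.
print(coverage_rate(["甲生态旅游", "乙数控", "丙其他", "丁未知"], ["旅游", "数控"]))
```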
Embodiment 3
As shown in Fig. 2, this embodiment provides an electronic device for performing the method of company information processing based on main business. The electronic device comprises at least one processor and a memory communicatively connected to the at least one processor.
Referring now to Fig. 2, it shows a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (for example vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 2 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 2, the electronic device 400 may include a processing unit (such as a central processing unit or a graphics processor) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores the various programs and data required for the operation of the electronic device 400. The processing unit 401, the ROM 402 and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
In general, the following devices can be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer or gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), loudspeaker or vibrator; a storage device 408 including, for example, a magnetic tape or hard disk; and a communication device 409. The communication device 409 allows the electronic device 400 to communicate with other devices, wirelessly or by wire, to exchange data. Although Fig. 2 shows an electronic device 400 with various devices, it should be understood that not all of the devices shown need to be implemented or present; more or fewer devices may be implemented or present instead.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 409, installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing unit 401, the above functions defined in the method of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of a computer-readable storage medium include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to electric wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain at least two Internet protocol addresses; send to a node evaluation device a node evaluation request that includes the at least two Internet protocol addresses, where the node evaluation device selects an Internet protocol address from the at least two Internet protocol addresses and returns it; and receive the Internet protocol address returned by the node evaluation device; where the obtained Internet protocol address indicates an edge node in a content distribution network.
Alternatively, the above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a node evaluation request that includes at least two Internet protocol addresses; select an Internet protocol address from the at least two Internet protocol addresses; and return the selected Internet protocol address; where the received Internet protocol address indicates an edge node in a content distribution network.
The calculating of the operation for executing the disclosure can be write with one or more programming languages or combinations thereof Machine program code, above procedure design language include object oriented program language-such as Java, Smalltalk, C+ +, it further include conventional procedural programming language-such as " C " language or similar programming language.Program code can Fully to execute, partly execute on the user computer on the user computer, be held as an independent software package Row, part on the user computer part on the remote computer execute or completely on a remote computer or server It executes.In situations involving remote computers, remote computer can pass through the network of any kind --- including local area network (LAN) or wide area network (WAN)-is connected to subscriber computer, or, it may be connected to outer computer (such as using because of spy Service provider is netted to connect by internet).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a unit does not limit the unit itself; for example, the first acquisition unit may also be described as "a unit that obtains at least two Internet Protocol addresses".

Claims (10)

1. A company information processing method based on main business, characterized by comprising the following steps:
S1: collect and identify company names, and extract the address information and the company registration type information from each company name;
S2: perform stratified sampling and analysis on the company names after this information has been removed, and determine an upper limit on the number of characters to intercept;
S3: based on the determined upper limit, perform even-position word splitting and ik word segmentation on the stripped company names to form initial classification label words;
S4: filter the trade names out of the initial classification label words using existing trade name and company libraries, and deduplicate the remaining initial classification label words;
S5: perform part-of-speech analysis on the remaining initial classification label words and, according to the result, screen out the classification label words that cannot represent a company's business characteristics;
S6: manually screen the classification label words that remain after the above screening, delete the incorrectly split classification label words, and then perform front-and-back matching and sorting to form a company classification label word dictionary;
S7: according to the company classification label word dictionary, count the net number of companies covered among the company names, and assess the comprehensiveness of the company classification label word dictionary;
S8: count the number of uncovered companies, and assess the company coverage rate of the classification label word dictionary.
2. The company information processing method based on main business according to claim 1, characterized in that step S1 is carried out as follows: using a region dictionary that contains every province, city, autonomous region, municipality directly under the central government and county, traverse each company name and delete its location part to build a database of company names without address information; then, using an existing company registration type dictionary, traverse that database to build a database of company names without address information or company registration type.
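The two-pass stripping in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries `REGIONS` and `COMPANY_TYPES`, the sample company name and the function names are hypothetical, and removing terms wherever they occur in the name is a simplification that the claim does not prescribe.

```python
# Hypothetical dictionaries; the real region and registration-type
# dictionaries are not reproduced in the source text.
REGIONS = ["重庆市", "重庆", "北京市", "北京", "渝北区"]
COMPANY_TYPES = ["有限责任公司", "股份有限公司", "有限公司"]


def strip_terms(name: str, terms: list[str]) -> str:
    """Remove every dictionary term from name, trying longer terms first."""
    for term in sorted(terms, key=len, reverse=True):
        name = name.replace(term, "")
    return name


def clean_company_name(name: str) -> str:
    """First pass drops region words, second pass drops registration types."""
    without_region = strip_terms(name, REGIONS)
    return strip_terms(without_region, COMPANY_TYPES)


# Made-up company name used only for illustration.
print(clean_company_name("重庆市渝北区星河网络科技有限公司"))
```

Trying longer terms first keeps "有限公司" from being removed out of "有限责任公司" and leaving a fragment behind.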
3. The company information processing method based on main business according to claim 1, characterized in that step S2 is carried out as follows: group the companies in the database by company registration type and perform stratified sampling according to each registration type's share of companies, drawing 0.1‰ of the companies of each registration type, in order to determine the upper limit on the number of characters to intercept from a company name after its address and company registration type information have been removed.
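A minimal sketch of the stratified sampling in claim 3 might look like the following. The function names, the fixed random seed and the rule of drawing at least one company per registration type are assumptions added for illustration. The claim does not say how the upper limit is derived from the sample, so the sketch only reports the distribution of name lengths and leaves the choice of limit to the analyst.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(companies, rate=0.0001, seed=0):
    """companies: iterable of (cleaned_name, registration_type) pairs.

    Returns a dict mapping each registration type to its sampled names,
    drawing `rate` (0.1 per mille by default) of every group.
    """
    groups = defaultdict(list)
    for name, reg_type in companies:
        groups[reg_type].append(name)
    rng = random.Random(seed)
    sample = {}
    for reg_type, names in groups.items():
        # Assumption: draw at least one name even from very small groups.
        k = max(1, round(len(names) * rate))
        sample[reg_type] = rng.sample(names, k)
    return sample


def length_distribution(sample):
    """Count how many sampled names have each character length."""
    return Counter(len(n) for names in sample.values() for n in names)
```

With 20,000 companies of one type and 10,000 of another, this draws 2 and 1 names respectively.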
4. The company information processing method based on main business according to claim 1, characterized in that step S3 is carried out as follows: remove the address information and company registration type information from a company name, intercept 6 characters from back to front, and then perform even-position word splitting and ik word segmentation;
The ik word segmentation works as follows: the ik segmenter is used, and a self-compiled dictionary is added to it to optimize the segmenter; stop words are collected; Hive, integrated with the ik segmenter, is then used to segment all company names; and single-character words are filtered out of the segmentation result.
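The interception and splitting in claim 4 can be illustrated roughly as below. The claim does not define the even-position split ("even bit word-breaking"), so this sketch assumes it means taking every substring that starts at an even offset and has an even length; that reading, the function names and the sample strings are all assumptions. The ik segmenter and Hive are not called here: `drop_short_tokens` only shows the post-filtering of tokens that a segmenter is assumed to have already produced.

```python
def tail(name: str, limit: int = 6) -> str:
    """Keep at most the last `limit` characters of a cleaned company name."""
    return name[-limit:]


def even_position_split(text: str) -> list[str]:
    """Assumed reading of the even-position split: every substring that
    starts at an even offset and has an even length."""
    pieces = []
    for start in range(0, len(text), 2):
        for end in range(start + 2, len(text) + 1, 2):
            pieces.append(text[start:end])
    return pieces


def drop_short_tokens(tokens, stop_words=frozenset()):
    """Filter segmenter output: drop single-character tokens and stop words."""
    return [t for t in tokens if len(t) > 1 and t not in stop_words]


candidates = even_position_split(tail("大星河网络科技"))
print(candidates)
```

Under this reading a six-character tail yields six candidates, which later steps would narrow down by trade-name filtering and part-of-speech screening.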
5. The company information processing method based on main business according to claim 1, characterized in that step S4 is carried out as follows: traverse the classification label words split out for each industry against that industry's trade name library and filter out the trade names.
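The trade-name filtering of claim 5, together with the deduplication named in step S4, could be sketched like this. The function name and the sample words are hypothetical; the trade name library is assumed to be a plain collection of strings for one industry.

```python
def filter_and_dedupe(candidates, trade_names):
    """Drop candidates found in the trade name library, then remove
    duplicates while keeping the first occurrence of each word."""
    trade = set(trade_names)
    seen = set()
    kept = []
    for word in candidates:
        if word in trade or word in seen:
            continue
        seen.add(word)
        kept.append(word)
    return kept


print(filter_and_dedupe(["星河", "网络", "科技", "网络"], ["星河"]))
```

Keeping first-occurrence order is a design choice of the sketch, not something the claim requires.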
6. The company information processing method based on main business according to claim 1, characterized in that, in steps S5 and S6, the remaining words are further screened and optimized, then summarized without distinguishing industries and put through front-and-back matching and sorting to form the company classification label word dictionary; after the incorrectly split words are deleted, front-and-back matching and sorting is performed, which yields first-level classification word labels, matched second-level classification word labels and matched third-level classification word labels.
7. The company information processing method based on main business according to claim 1, characterized in that, in step S7, counting is performed by a funnel method, starting from the classification label words of the lowest level, and the comprehensiveness of the classification label words in the dictionary is estimated from the net number of companies covered by each word.
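One way to read the funnel counting of claim 7 is that each company is credited to the first label word that matches it and is then removed from the pool, so later words only count companies not yet covered. The sketch below follows that reading, which is an assumption; the function name, the substring-matching rule and the sample data are hypothetical, and the caller is assumed to pass the label words already ordered from the lowest level upward.

```python
def funnel_net_coverage(company_names, tag_words):
    """Return (net_counts, remaining).

    net_counts maps each label word to the number of companies it covers
    that no earlier word covered; remaining lists the uncovered companies.
    """
    remaining = list(company_names)
    net_counts = {}
    for word in tag_words:  # assumed order: lowest-level words first
        net_counts[word] = sum(1 for name in remaining if word in name)
        # Covered companies leave the funnel and are not counted again.
        remaining = [name for name in remaining if word not in name]
    return net_counts, remaining


names = ["星河网络科技", "远山网络", "青木餐饮", "白云贸易"]
counts, uncovered = funnel_net_coverage(names, ["网络科技", "网络", "餐饮"])
print(counts, uncovered)
```

The uncovered list returned here is the input that step S8 needs for its coverage figure.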
8. The company information processing method based on main business according to claim 1, characterized in that, in step S8, the coverage rate is coverage (%), the number of companies covered by the company classification word labels is company_num1, the total number of companies is company_num2, and the company coverage rate of the dictionary is coverage (%) = company_num1 / company_num2 × 100%.
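The coverage formula of claim 8 translates directly into a small function. The names `company_num1` and `company_num2` come from the claim; the function name and the guard against a non-positive denominator are additions for illustration.

```python
def coverage_percent(company_num1: int, company_num2: int) -> float:
    """coverage (%) = company_num1 / company_num2 * 100."""
    if company_num2 <= 0:
        raise ValueError("company_num2 must be positive")
    return company_num1 / company_num2 * 100


# Illustrative numbers only: 3 covered companies out of 4.
print(coverage_percent(3, 4))
```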
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 8.
10. An electronic device, characterized by comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 8.
CN201910370624.5A 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business Active CN110083704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910370624.5A CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910370624.5A CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Publications (2)

Publication Number Publication Date
CN110083704A true CN110083704A (en) 2019-08-02
CN110083704B CN110083704B (en) 2020-06-09

Family

ID=67418691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910370624.5A Active CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Country Status (1)

Country Link
CN (1) CN110083704B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140163964A1 (en) * 2012-12-12 2014-06-12 International Business Machines Corporation Approximate named-entity extraction
CN104252507A (en) * 2013-06-28 2014-12-31 北京华傲达数据技术有限公司 Enterprise data matching method and device
US9002725B1 (en) * 2005-04-20 2015-04-07 Google Inc. System and method for targeting information based on message content
CN107193959A (en) * 2017-05-24 2017-09-22 南京大学 A kind of business entity's sorting technique towards plain text

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002725B1 (en) * 2005-04-20 2015-04-07 Google Inc. System and method for targeting information based on message content
US20140163964A1 (en) * 2012-12-12 2014-06-12 International Business Machines Corporation Approximate named-entity extraction
CN104252507A (en) * 2013-06-28 2014-12-31 北京华傲达数据技术有限公司 Enterprise data matching method and device
CN107193959A (en) * 2017-05-24 2017-09-22 南京大学 A kind of business entity's sorting technique towards plain text

Also Published As

Publication number Publication date
CN110083704B (en) 2020-06-09

Similar Documents

Publication Publication Date Title
US10446028B2 (en) Parking identification and availability prediction
EP4137961A1 (en) Method and apparatus for executing automatic machine learning process, and device
CN107341220A (en) A kind of multi-source data fusion method and device
CN107590654A (en) A kind of method of on-line payment, terminal and computer-readable medium
CN110134759A (en) A method of obtaining the trade information of enterprise
CN110309469A (en) A kind of user clicks behavior visual analysis method, system, medium and electronic equipment
CN112118551A (en) Equipment risk identification method and related equipment
CN111522838A (en) Address similarity calculation method and related device
CN109145050B (en) Computing device
CN114511353A (en) Data analysis method and device
Manley et al. New forms of data for understanding urban activity in developing countries
KR102124935B1 (en) Disaster Monitoring System, Method Using Crowd Sourcing, and Computer Program therefor
CN105786810B (en) The method for building up and device of classification mapping relations
CN110443265A (en) A kind of behavioral value method and apparatus based on corporations
CN106886517A (en) Business site selecting method, device and system
CN110516062A (en) A kind of search processing method and device of document
CN110490349A (en) A kind of information recommendation method based on calendar, device, medium and electronic equipment
CN109783381A (en) A kind of test data generating method, apparatus and system
McKenzie et al. A user-generated data based approach to enhancing location prediction of financial services in sub-Saharan Africa
CN109526027A (en) A kind of cell capacity optimization method, device, equipment and computer storage medium
CN113269355A (en) User loan prediction method, device and storage medium
CN112699955A (en) User classification method, device, equipment and storage medium
CN110083704A (en) A kind of company's information processing method, storage medium and equipment based on main business
US10595178B2 (en) Listing service registrations through a mobile number
Markou et al. Is travel demand actually deep? An application in event areas using semantic information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant