CN110083704B - Method, storage medium and device for processing company information based on main business - Google Patents

Publication number
CN110083704B
CN110083704B CN201910370624.5A CN201910370624A CN110083704B CN 110083704 B CN110083704 B CN 110083704B CN 201910370624 A CN201910370624 A CN 201910370624A CN 110083704 B CN110083704 B CN 110083704B
Authority
CN
China
Prior art keywords
company
word
label
words
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910370624.5A
Other languages
Chinese (zh)
Other versions
CN110083704A (en
Inventor
张艳华
郭瑞兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Tianpeng Network Co ltd
Original Assignee
Chongqing Tianpeng Network Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Tianpeng Network Co ltd
Priority to CN201910370624.5A
Publication of CN110083704A
Application granted
Publication of CN110083704B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention relates to the technical field of information security, and in particular to a company information processing method based on a company's main business. Drawing on the relationship between company registration naming rules and a company's main business, the invention extracts the main business relatively accurately from the company name and uses it as the basis for classifying the company. The main business of each company can thus be located relatively accurately, companies can be divided more precisely, and comparative analysis across companies along each dimension is made easier.

Description

Method, storage medium and device for processing company information based on main business
Technical Field
The invention relates to the technical field of information security, and in particular to a company information processing method based on a company's main business.
Background
With the arrival of the big data era, more and more companies pay attention to comparative analysis of companies based on big data. Such analysis first requires clustering companies with similar business scopes, for example by classifying them by industry. However, as business scopes diversify, the main businesses of companies within the same industry may be similar or may differ. Processing company information by industry therefore positions companies inaccurately, so the resulting comparative analysis of each dimension is not representative and cannot give a company relatively accurate development positioning.
In short, company information processing based on industry is not accurate enough, so the resulting comparative analysis of each dimension is not sufficiently representative and cannot provide a company with relatively accurate development positioning.
Therefore, through long-term research and development, the invention provides a company information processing method based on the main business to solve at least one of the above technical problems.
Disclosure of Invention
The invention aims to provide a company information processing method, device, medium and electronic equipment based on a main business, which can solve at least one of the technical problems mentioned above. The specific scheme is as follows:
A company information processing method based on a main business, characterized by comprising the following steps:
S1, acquiring and identifying the company name, and removing the address information and the company registration type information from the company name;
S2, performing stratified sampling analysis on the stripped company names, and determining the upper limit on the number of characters to intercept;
S3, performing even-length word splitting and IK word segmentation on the stripped company name, based on the determined upper limit, to form initial classification label words;
S4, filtering the trade names out of the initial classification label words by using an existing company trade name library, and de-duplicating the remaining initial classification label words;
S5, performing part-of-speech analysis on the remaining initial classification label words, and screening out, according to the result, the classification label words that cannot represent the business characteristics of a company;
S6, manually reviewing the screened classification label words, deleting incorrectly split classification label words, and then performing front-back matching and sorting to form a company classification label word dictionary;
S7, counting the net number of companies covered by matching company names against the company classification label word dictionary, and evaluating the comprehensiveness of the dictionary;
S8, counting the number of uncovered companies, and evaluating the company coverage rate of the classification label word dictionary.
Further, the specific processing procedure of step S1 is as follows: each company name is traversed against a region dictionary covering provinces, cities, autonomous regions, municipalities directly under the central government, and counties, and the location part of each name is deleted, yielding a company name database without address information; that database is then traversed against an existing company registration type dictionary, yielding a company name database without address information or company registration type.
Further, the specific processing procedure of step S2 is as follows: the companies in the database are grouped by company registration type, and stratified sampling is carried out according to each registration type's share of companies. The number of companies drawn from each registration type is given by a formula that appears only as an image in the original (Figure GDA0002447834600000021). The sample is used to determine the upper limit on the number of characters to intercept from a company name whose address and company registration type information have been removed.
Further, the specific processing procedure of step S3 is as follows: after the address information and the company registration type information are removed from a company name, 6 characters are intercepted from back to front, and even-length word splitting and IK word segmentation are then performed.
The specific operating principle of IK word segmentation is as follows: the IK word segmenter is used, and a self-compiled dictionary is added to it to optimize it; stop words are collected; Hive, with the IK word segmenter integrated, is then used to segment all company names, and single-character words are filtered out of the segmentation result.
Further, according to the trade name library of each industry, the classification label words split out within that industry are traversed and the trade names are filtered out.
Further, in steps S5 and S6, the remaining words are further optimized and screened, then pooled across industries and subjected to front-back matching and sorting to form the company classification label word dictionary. After the incorrectly split words are deleted, front-back matching and sorting divides the labels into first-level classification label words, second-level classification label words matched with the first-level ones, and third-level classification label words matched with the second-level ones.
Further, in step S7, the statistics are computed by a funnel method, starting from the lowest-level classification label words, and the comprehensiveness of the classification label words in the dictionary is evaluated according to the net number of companies covered by each word.
Further, in step S8, let coverage (%) be the coverage rate, company_num1 the number of companies covered by the classification label words, and company_num2 the total number of companies; the company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 × 100%.
According to a specific embodiment of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the company information processing method described in any one of the above.
According to an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the company information processing method described in any one of the above.
Compared with the prior art, the scheme of the embodiment of the invention at least has the following beneficial effects:
Drawing on the relationship between company registration naming rules and a company's main business, the invention extracts the main business relatively accurately from the company name and uses it as the basis for classifying the company. The main business of each company can thus be located relatively accurately, companies can be divided more precisely, and comparative analysis across companies along each dimension is made easier.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of a method for processing company information based on a main business provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of an electronic device connection structure according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used to describe … … in embodiments of the present invention, these … … should not be limited to these terms. These terms are used only to distinguish … …. For example, the first … … can also be referred to as the second … … and similarly the second … … can also be referred to as the first … … without departing from the scope of embodiments of the present invention.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in the article or device in which the element is included.
Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, the embodiment of the present invention provides a method for processing company information based on a main business, including the following steps:
S1, acquiring the company name, and removing the address information and the company registration type information from the company name;
S2, performing stratified sampling analysis on the remaining part of the company name by company registration type, and determining the maximum number of characters of the remaining name that should be intercepted to represent the company's business;
S3, intercepting the remaining part of the company name from back to front according to the determined number of characters, and then performing even-length word splitting and IK word segmentation on the intercepted company name field to form initial classification label words;
S4, filtering the trade names out of the classification label words by using an existing company trade name library, and de-duplicating the remaining classification label words;
S5, performing part-of-speech analysis on the classification label words, and screening out, according to the result, classification label words such as personal names and place names that cannot represent the business characteristics of a company, thereby reducing labor cost;
S6, manually reviewing the classification label words remaining after the above steps, deleting incorrectly split classification label words, and then performing front-back matching and sorting to form a company classification label word dictionary;
S7, counting the net number of companies covered by matching company names against the resulting classification label word dictionary, and evaluating the comprehensiveness of the company classification label words;
S8, counting the number of uncovered companies, and evaluating the company coverage rate of the classification label word dictionary.
Example 2
First, the company name is acquired, information contained in the name such as the company's location and registration type is identified, and the address information and company registration type information are then removed from the name. The specific processing method can be as follows: each company name is traversed against a region dictionary covering provinces, cities, autonomous regions, municipalities directly under the central government, and counties, and the location part of each name is deleted, yielding a company name database without address information; that database is then traversed against an existing company registration type dictionary, yielding a company name database without address information or company registration type.
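The stripping step can be sketched in Python as below. This is a minimal illustration rather than the patented implementation: the two tiny dictionaries, the function names, and the Chinese company names (back-translated from the English examples in this text) are all assumptions.

```python
# Hypothetical miniature dictionaries. A real region dictionary would cover
# every province, city, autonomous region, municipality and county.
REGIONS = ["广西", "巴彦县"]
REGISTRATION_TYPES = ["有限公司", "专业合作社"]


def strip_region(name, regions):
    """Delete every region term found in the name (longest terms first)."""
    for region in sorted(regions, key=len, reverse=True):
        name = name.replace(region, "")
    return name


def strip_registration_type(name, types):
    """Delete a trailing registration type such as a company-form suffix."""
    for reg_type in sorted(types, key=len, reverse=True):
        if name.endswith(reg_type):
            return name[: -len(reg_type)]
    return name


def clean_company_name(name):
    """Apply both stripping passes in the order described in the text."""
    without_region = strip_region(name, REGIONS)
    return strip_registration_type(without_region, REGISTRATION_TYPES)


print(clean_company_name("广西九重天生态旅游开发有限公司"))
print(clean_company_name("巴彦县润吉农民种植专业合作社"))
```

Matching longer dictionary entries first is a design choice made here so that a short entry does not consume part of a longer one.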
Next, stratified sampling analysis is carried out on the new company name database by company registration type to determine the number of characters to intercept, from back to front, from each company name. The specific processing method can be as follows: a) the companies in the database are grouped by company registration type, and stratified sampling is carried out according to each registration type's share of companies; the number of companies drawn from each registration type is given by a formula that appears only as an image in the original (Figure GDA0002447834600000071).
The sample is used to determine how many characters should be intercepted from a company name whose address and company registration type information have been removed. In line with the naming habits of domestic companies, an even number is generally selected, such as six. For example, "Guangxi Jiuchongtian Ecological Tourism Development Co., Ltd." becomes "Jiuchongtian Ecological Tourism Development" after the address and company type are removed, and at most six characters are then intercepted from back to front to represent the company's main business.
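A minimal Python sketch of stratified sampling by registration type follows. The per-type sample size formula is not recoverable from this text, so the sketch substitutes a plain proportional allocation controlled by a hypothetical `fraction` parameter; the function name and the input layout are likewise assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(companies, fraction, seed=0):
    """Sample the same fraction of names from every registration type.

    companies: list of (name, registration_type) pairs.
    fraction:  assumed stand-in for the original sample-size formula.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    groups = defaultdict(list)
    for name, reg_type in companies:
        groups[reg_type].append(name)

    sample = {}
    for reg_type, names in groups.items():
        # At least one company per type, never more than the type contains.
        k = min(len(names), max(1, round(len(names) * fraction)))
        sample[reg_type] = rng.sample(names, k)
    return sample


companies = [(f"limited-{i}", "limited company") for i in range(10)]
companies += [(f"coop-{i}", "cooperative") for i in range(5)]
picked = stratified_sample(companies, fraction=0.2)
print({reg_type: len(names) for reg_type, names in picked.items()})
```

The sampled names would then be inspected to decide the interception limit, which the text sets to six characters.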
Then, even-length word splitting and IK word segmentation are performed on the intercepted company name field. The specific processing method can be as follows: after the address information and company registration type information are removed from a company name, 6 characters are intercepted from back to front. For example, "Guangxi Jiuchongtian Ecological Tourism Development Co., Ltd." becomes "Jiuchongtian Ecological Tourism Development" after the address and company type are removed; at most six characters are intercepted from back to front, giving "ecological tourism development" (six characters in the original Chinese), on which even-length word splitting and IK word segmentation are then performed.
The specific operating principle of IK word segmentation is as follows: the IK word segmenter is used, and a self-compiled dictionary is added to it to optimize it; stop words such as "of" and "ground" are collected; Hive, with the IK word segmenter integrated, is then used to segment all company names, and single-character words are filtered out of the segmentation result.
The word-splitting result for a given company may be: "ecological", "tourism", "development", "ecological tourism", "tourism development", "ecological tourism development"; these serve as the company's classification label words.
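The interception and even-length splitting can be sketched in Python as follows. The splitting rule is inferred from the worked example (two-character units and every run of adjacent units) and is an assumption, as are the function names, the handling of odd-length fields, and the back-translated Chinese string. IK segmentation runs in a separate Java tool and is not reproduced here.

```python
def tail(name, limit=6):
    """Keep at most `limit` characters, counted from the end of the name."""
    return name[-limit:]


def even_split(text):
    """Return every run of adjacent two-character units, shortest runs first."""
    if len(text) % 2:
        # Assumption: an odd-length field drops its leading character.
        text = text[1:]
    units = [text[i:i + 2] for i in range(0, len(text), 2)]
    labels = []
    for size in range(1, len(units) + 1):
        for start in range(len(units) - size + 1):
            labels.append("".join(units[start:start + size]))
    return labels


field = tail("九重天生态旅游开发")  # stripped name from the example
print(field)
print(even_split(field))
```

For the six-character field this yields three two-character words, two four-character words and the full six-character word, matching the six label words listed in the example.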
Then, according to the trade name library corresponding to the industry to which each company belongs, trade name words are filtered out of the words split from all company names. The specific processing method can be as follows: according to the trade name library of each industry, the classification label words split out within that industry are traversed and the trade names are filtered out. For example, for "Bayan County Runji Farmer Planting Professional Cooperative", removing the place name and registration type and intercepting six characters from back to front gives "Runji farmer planting"; even-length word splitting and IK word segmentation then give "Runji", "farmer", "planting", "Runji farmer", "farmer planting" and "Runji farmer planting". Since "Runji" is the trade name, the words containing "Runji" are filtered out, and the remaining usable classification label words are "farmer", "planting" and "farmer planting". De-duplicating the words before traversal reduces the number of traversals.
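The trade name filter can be sketched in Python as below, using the example from the paragraph above. The function name, the in-memory set standing in for the trade name library, and the back-translated Chinese strings are assumptions.

```python
def filter_trade_names(labels, trade_names):
    """Drop every label that contains a known trade name.

    Labels are de-duplicated first (order preserved) so that each distinct
    word is checked against the trade name library only once.
    """
    unique_labels = list(dict.fromkeys(labels))
    return [
        label
        for label in unique_labels
        if not any(trade_name in label for trade_name in trade_names)
    ]


labels = ["润吉", "农民", "种植", "润吉农民", "农民种植", "润吉农民种植", "农民"]
trade_names = {"润吉"}  # hypothetical one-entry trade name library
print(filter_trade_names(labels, trade_names))
```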
Then, the remaining words are further optimized and screened, then pooled across industries and subjected to front-back matching and sorting to form the company classification label word dictionary. After the incorrectly split words are deleted, front-back matching and sorting may proceed as follows. For example, the word "numerical control" serves as a first-level classification label word, and the second-level classification label words matched with it can be "numerical control equipment", "numerical control machine", "precision numerical control", "intelligent numerical control" and the like. The second-level classification label word "numerical control equipment" can in turn be matched with third-level classification label words such as "numerical control equipment manufacturing" and "machine tool numerical control equipment".
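One possible reading of front-back matching is sketched below in Python: a label is placed under the longest other label that it begins or ends with. This reading is an assumption drawn from the "numerical control" example, and the function names are hypothetical; the English label strings are taken from the example.

```python
def build_parents(labels):
    """Map each label to the longest other label it starts or ends with."""
    parents = {}
    for label in labels:
        candidates = [
            other
            for other in labels
            if other != label
            and (label.startswith(other) or label.endswith(other))
        ]
        parents[label] = max(candidates, key=len) if candidates else None
    return parents


def level(label, parents):
    """Return 1 for a first-level label, 2 for a second-level label, etc."""
    depth = 1
    while parents[label] is not None:
        label = parents[label]
        depth += 1
    return depth


labels = [
    "numerical control",
    "numerical control equipment",
    "precision numerical control",
    "numerical control equipment manufacturing",
    "machine tool numerical control equipment",
]
parents = build_parents(labels)
for label in labels:
    print(level(label, parents), label)
```

In the method as described, manual review precedes this step, so a sketch like this would only organize labels that have already been vetted.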
Then, the net number of companies covered is counted by matching company names against the resulting classification dictionary, and the comprehensiveness of the dictionary's classification label words is evaluated. The specific processing method can be as follows: the statistics are computed by a funnel method, starting from the lowest-level classification label words. For example, the first-level classification label word "numerical control" can have second-level classification label words such as "numerical control equipment", and the second-level classification label word "numerical control equipment" can have third-level classification label words such as "numerical control equipment manufacturing" and "machine tool numerical control equipment". Let the number of companies covered by "numerical control equipment manufacturing" be N3_1, the number covered by "machine tool numerical control equipment" be N3_2, and so on. The total number of companies covered by the third-level classification labels under the second-level label "numerical control equipment" is then:
N3_total = N3_1 + N3_2 + N3_3 + N3_4 - ∩(N3_1, N3_2, N3_3, N3_4)
If the total number of companies covered by the second-level label "numerical control equipment" is N2_total, its net number of companies covered is N2_total - N3_total. The comprehensiveness of the classification label words in the dictionary is evaluated according to the net number of companies covered by each word.
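The funnel count can be sketched in Python with sets of company names. The sketch treats N3_total as the size of the union of the third-level coverage sets, so a company matched by several third-level labels is counted once; that reading of the overlap term, the function names and the sample company names are all assumptions.

```python
def covered(label, company_names):
    """Companies whose name contains the label word."""
    return {name for name in company_names if label in name}


def net_coverage(label, child_labels, company_names):
    """Companies covered by `label` but by none of its child labels."""
    own = covered(label, company_names)
    under_children = set()
    for child in child_labels:
        under_children |= covered(child, company_names)
    return len(own - under_children)  # N2_total - N3_total


names = [
    "Alpha numerical control equipment manufacturing",
    "Beta machine tool numerical control equipment",
    "Gamma numerical control equipment",
    "Delta numerical control equipment manufacturing",
]
children = [
    "numerical control equipment manufacturing",
    "machine tool numerical control equipment",
]
print(net_coverage("numerical control equipment", children, names))
```

A large net count for a label suggests that many companies match it without matching any more specific label beneath it.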
Finally, the number of companies covered by the classification labels is counted, and the company coverage rate of the dictionary is evaluated. The specific processing method is as follows: let coverage (%) be the coverage rate, company_num1 the number of companies covered by the classification label words, and company_num2 the total number of companies; the company coverage rate of the dictionary is then coverage (%) = company_num1 / company_num2 × 100%.
In the embodiment of the invention, the number of companies covered by a classification label word is counted as follows: a company whose name contains the classification label word is counted as 1.
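The coverage rate can be computed as in the Python sketch below, which applies the counting rule just stated (a company counts once if its name contains any label word). The function name and the sample data are assumptions; `company_num1` and `company_num2` follow the symbols in the text.

```python
def coverage_rate(company_names, dictionary):
    """Percentage of companies whose name contains at least one label word."""
    company_num2 = len(company_names)
    if company_num2 == 0:
        return 0.0
    company_num1 = sum(
        1
        for name in company_names
        if any(label in name for label in dictionary)
    )
    return company_num1 / company_num2 * 100


names = [
    "Jiuchongtian ecological tourism development",
    "Runji farmer planting",
    "Gamma numerical control equipment",
    "Unlabelled holdings",
]
dictionary = {"tourism", "planting", "numerical control"}
print(coverage_rate(names, dictionary))
```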
Example 3
As shown in FIG. 2, the present embodiment provides an electronic device for the method of processing company information based on a main business, the electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor.
Referring now to FIG. 2, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 2, the electronic device may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage means 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 2 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing device 401, performs the above-described functions defined in the method of the present embodiment.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and including conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a unit does not limit the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".

Claims (9)

1. A company information processing method based on main business, characterized by comprising the following steps:
S1, acquiring and identifying company names, and removing the address information and company registration type information from each company name;
S2, performing stratified sampling and analysis on the company names after removal, and determining the upper limit on the number of characters to intercept;
S3, performing even-number word splitting and IK word segmentation on the company names after removal, based on the determined upper limit, to form initial classification label words;
S4, filtering the trade names out of the initial classification label words using an existing company trade name library, and re-ranking the remaining initial classification label words;
S5, performing part-of-speech analysis on the remaining initial classification label words, and screening out, according to the result of the part-of-speech analysis, the classification label words that cannot represent the business characteristics of a company;
S6, manually reviewing the screened classification label words, deleting the wrongly split classification label words, and then performing front-to-back matching sorting to form a company classification label word dictionary;
S7, counting the net number of companies covered, over the company names, according to the company classification label word dictionary, and evaluating the comprehensiveness of the company classification label word dictionary;
S8, counting the number of uncovered companies, and evaluating the company coverage rate of the classification label word dictionary;
wherein the specific processing procedure of step S3 is as follows: after the address information and company registration type information have been removed from a company name, 6 characters are intercepted from back to front, and even-number word splitting and IK word segmentation are then performed;
and wherein the IK word segmentation operates as follows: an IK word segmenter is adopted, and a self-compiled dictionary is added to the IK word segmenter to optimize it; stop words are collected, and Hive, integrated with the IK word segmenter, is then used to segment all company names and to filter single-character words out of the segmentation results.
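The truncation and even-number word splitting recited for step S3 can be illustrated with a minimal Python sketch. The claim does not define "even-number word splitting" precisely, so the reading used here (enumerating contiguous substrings of even length) is an assumption, and the function names `tail_chars` and `even_length_candidates` are hypothetical. The IK word segmentation and Hive integration are not shown.

```python
def tail_chars(name: str, limit: int = 6) -> str:
    """Keep at most `limit` characters, counted from the end of the name."""
    return name[-limit:] if limit > 0 else ""


def even_length_candidates(text: str) -> list[str]:
    """Enumerate contiguous substrings of even length (2, 4, 6, ...).

    ASSUMPTION: this is only one possible reading of the claim's
    'even-number word splitting'; the patent text gives no exact rule.
    """
    candidates: list[str] = []
    for size in range(2, len(text) + 1, 2):
        for start in range(len(text) - size + 1):
            piece = text[start:start + size]
            if piece not in candidates:  # keep first occurrence only
                candidates.append(piece)
    return candidates


if __name__ == "__main__":
    # Hypothetical company name with address and registration type removed.
    cleaned = "天蓬网络科技服务中心"
    tail = tail_chars(cleaned)  # last 6 characters
    print(tail, even_length_candidates(tail))
```

Every candidate produced this way has at least two characters, which is consistent with the claim's filtering of single-character words.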
2. The method for processing company information based on a main business as claimed in claim 1, wherein the specific processing procedure of step S1 is as follows: each company name is traversed against a region dictionary comprising provinces, cities, autonomous regions, municipalities directly under the central government, and counties, and the location part of each company name is deleted to construct a company name database without address information; the company name database is then traversed against an existing company registration type dictionary to construct a company name database without address information or company registration type.
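A minimal Python sketch of the two dictionary passes in step S1 follows. The dictionaries `REGIONS` and `REG_TYPES` are tiny hypothetical samples, not the patent's actual dictionaries, and the simple substring deletion is an assumption about how "deleting the location part" might be carried out.

```python
# Hypothetical sample dictionaries; the patent's actual region and
# registration-type dictionaries are not given in the text.
REGIONS = ["重庆市", "重庆", "北京市", "北京", "渝北区"]
REG_TYPES = ["有限责任公司", "股份有限公司", "有限公司"]


def strip_terms(name: str, terms: list[str]) -> str:
    """Delete every dictionary term from name, trying longer terms first."""
    for term in sorted(terms, key=len, reverse=True):
        name = name.replace(term, "")
    return name


def clean_company_name(name: str) -> str:
    """Remove location parts first, then registration-type parts."""
    without_address = strip_terms(name, REGIONS)
    return strip_terms(without_address, REG_TYPES)


if __name__ == "__main__":
    for raw in ["重庆天蓬网络有限公司", "北京市某某科技股份有限公司"]:
        print(raw, "->", clean_company_name(raw))
```

Longer terms are tried first so that "重庆市" is removed as a whole rather than leaving a stray "市". Plain substring deletion would also strip a region name that happens to appear inside a trade name, so a production version would likely need positional rules.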
3. The method for processing company information based on a main business as claimed in claim 1, wherein the specific processing procedure of step S2 is as follows: the companies in the database are grouped by company registration type, stratified sampling is carried out in proportion to each registration type's share of companies, with the amount extracted from each registration type given by a formula (image FDA0002447834590000021, not reproduced here), and the upper limit on the number of characters to intercept from a company name from which the address and company registration type information have been removed is thereby determined.
4. The method for processing company information based on a main business as claimed in claim 1, wherein the specific processing procedure of step S4 is as follows: the classification label words split out for each industry are traversed against the trade name library of that industry, and the trade names are filtered out.
5. The method as claimed in claim 1, wherein in steps S5 and S6 the remaining words are further optimized and filtered, then pooled without division into industries and sorted by front-to-back matching to form the company classification label word dictionary; the front-to-back matching sorting is performed after the wrongly split words are deleted, and yields first-level classification label words, second-level classification label words matched to the first-level classification label words, and third-level classification label words matched to the second-level classification label words.
6. The method as claimed in claim 1, wherein step S7 uses a funnel method: statistics are computed starting from the lowest-level classification label words, and the comprehensiveness of the classification label words in the dictionary is estimated from the net number of companies covered by each word.
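One way to picture the funnel statistics is the Python sketch below. It assumes that "net" coverage means each company is counted only for the first label word that matches it, and that the caller supplies the words already ordered from the lowest level upward; both points are interpretations, and the function name `net_coverage` is hypothetical.

```python
def net_coverage(names: list[str], words_by_priority: list[str]) -> dict[str, int]:
    """Count, per label word, the companies it covers that no earlier word covered.

    ASSUMPTION: `words_by_priority` is ordered from the lowest-level
    (most specific) label words to the highest-level ones.
    """
    remaining = list(names)
    counts: dict[str, int] = {}
    for word in words_by_priority:
        counts[word] = sum(1 for name in remaining if word in name)
        # Companies matched by this word drop out of the funnel.
        remaining = [name for name in remaining if word not in name]
    return counts


if __name__ == "__main__":
    # Hypothetical company names with address and registration type removed.
    names = ["天蓬网络科技", "云帆科技", "盛和餐饮管理", "远大建筑"]
    words = ["网络科技", "科技", "餐饮管理"]
    print(net_coverage(names, words))
```

A word with a net count of zero adds no coverage beyond the more specific words processed before it, which is one signal that could inform the comprehensiveness evaluation.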
7. The method as claimed in claim 1, wherein in step S8 the coverage rate is denoted coverage(%), the number of companies covered by the company classification label words is company_num1, the total number of companies is company_num2, and the coverage rate of the dictionary is coverage(%) = company_num1 / company_num2 × 100%.
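The coverage formula of step S8 can be written as a short Python sketch. Treating a company as "covered" when its name contains at least one label word is an assumption, and the function name `coverage_rate` is hypothetical; the variable names `company_num1` and `company_num2` follow the claim.

```python
def coverage_rate(names: list[str], label_words: list[str]) -> float:
    """Return coverage(%) = company_num1 / company_num2 * 100.

    ASSUMPTION: a company counts as covered when its name contains
    at least one label word from the dictionary.
    """
    company_num2 = len(names)  # total number of companies
    if company_num2 == 0:
        return 0.0
    company_num1 = sum(
        1 for name in names if any(word in name for word in label_words)
    )  # number of covered companies
    return company_num1 / company_num2 * 100.0


if __name__ == "__main__":
    # Hypothetical company names with address and registration type removed.
    names = ["天蓬网络科技", "云帆科技", "盛和餐饮管理", "远大建筑"]
    print(coverage_rate(names, ["网络科技", "科技", "餐饮管理"]))
```

The number of uncovered companies mentioned in step S8 is simply `company_num2 - company_num1`.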
8. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910370624.5A CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910370624.5A CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Publications (2)

Publication Number Publication Date
CN110083704A CN110083704A (en) 2019-08-02
CN110083704B true CN110083704B (en) 2020-06-09

Family

ID=67418691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910370624.5A Active CN110083704B (en) 2019-05-06 2019-05-06 Method, storage medium and device for processing company information based on main business

Country Status (1)

Country Link
CN (1) CN110083704B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9002725B1 (en) * 2005-04-20 2015-04-07 Google Inc. System and method for targeting information based on message content
US9251133B2 (en) * 2012-12-12 2016-02-02 International Business Machines Corporation Approximate named-entity extraction
CN104252507B (en) * 2013-06-28 2017-06-27 北京华傲达数据技术有限公司 A kind of business data matching process and device
CN107193959B (en) * 2017-05-24 2020-11-27 南京大学 Pure text-oriented enterprise entity classification method

Also Published As

Publication number Publication date
CN110083704A (en) 2019-08-02

Similar Documents

Publication Publication Date Title
CN110008300B (en) Method and device for determining alias of POI (Point of interest), computer equipment and storage medium
US10115306B2 (en) Parking identification and availability prediction
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
CN110619076B (en) Search term recommendation method and device, computer and storage medium
CN111291071B (en) Data processing method and device and electronic equipment
CN111078818B (en) Address analysis method and device, electronic equipment and storage medium
CN110674360B (en) Tracing method and system for data
CN111522838A (en) Address similarity calculation method and related device
CN106503108A (en) Geographical position search method and device
Lansley et al. Big data and geospatial analysis
CN111597279B (en) Information prediction method based on deep learning and related equipment
CN111179055B (en) Credit line adjusting method and device and electronic equipment
CN112506981A (en) Online training service pushing method and device
CN110012426B (en) Method and device for determining casualty POI, computer equipment and storage medium
CN111738316A (en) Image classification method and device for zero sample learning and electronic equipment
CN111339409A (en) Map display method and system
CN113837799A (en) Intelligent business site selection method, system, equipment and readable storage medium
CN110619061A (en) Video classification method and device, electronic equipment and readable medium
CN110083704B (en) Method, storage medium and device for processing company information based on main business
US20180329926A1 (en) Image-based semantic accommodation search
CN111401934A (en) Distributed advertisement statistical method and device
CN115563522A (en) Traffic data clustering method, device, equipment and medium
CN111126120B (en) Urban area classification method, device, equipment and medium
CN114860879A (en) Data association method, device, equipment and computer storage medium
CN112307073A (en) Information query method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant