CN112989166A - Method for calculating actual business territory of enterprise - Google Patents

Publication number
CN112989166A
Authority
CN
China
Prior art keywords
address
enterprise
administrative division
target
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110330113.8A
Other languages
Chinese (zh)
Inventor
唐杰
徐超
陈雨馨
梁协君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Youshu Finance Information Services Co ltd
Original Assignee
Hangzhou Youshu Finance Information Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Youshu Finance Information Services Co ltd
Priority to CN202110330113.8A
Publication of CN112989166A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Abstract

The application provides a method for calculating the actual business address of an enterprise. A trained address information extraction model extracts detailed structured data (province, city, district or county, and so on) from each target enterprise address; a trained administrative division mapping model maps the extracted data to standardized administrative division descriptions; an initial score is calculated for each address from the standardized administrative division descriptions extracted and mapped from it; and an actual score is then calculated from the initial score, the release date of the address, and a decay function. The address with the highest actual score is output as the actual business address of the enterprise; if more than one address shares the highest score, the one with the latest release date is output.

Description

Method for calculating actual business territory of enterprise
Technical Field
The method relates to the technical field of text processing, in particular to a method for extracting an actual business area of an enterprise according to a plurality of source addresses of the enterprise.
Background
Addresses that an enterprise publishes on public data sources often have the following problems:
1. the address published in the enterprise's basic information is usually its registered address, which often differs from its business address;
2. the address published in the annual report is disclosed by the enterprise itself and is updated infrequently, typically once a year;
3. addresses published on recruitment websites are updated frequently, but because many addresses are published by many different staff members, the same address is described inconsistently, with abbreviations, miswritten characters, omissions and the like.
When a user wants to obtain the actual business address of an enterprise from the data sources mentioned above, the common practice is to first eliminate, through manual inspection and verification, addresses from which the actual location is hard to determine, and then select the actual business address from the remaining addresses according to how detailed each address description is and its release date. However, this approach requires a great deal of manpower and material resources, the verification process is time-consuming, and the analysis is inefficient.
Disclosure of Invention
Therefore, in view of the technical problem above, there is a need for a method for extracting the actual business address of an enterprise that improves analysis efficiency: when a user queries the actual business address of an enterprise, the method calculates it by analyzing addresses acquired from a plurality of public data source websites.
To achieve this purpose, the method adopts the following technical scheme: a method for calculating the actual business area of an enterprise analyzes addresses acquired from a plurality of public data source websites. It first filters out, based on address length and on whether keywords such as province, city and county are present, addresses that cannot be located to the enterprise's actual business position and that might distort the final result or carry little meaning. It then extracts key address information from each address with a trained model and maps it to a standardized administrative division description. Finally, it calculates a score for each enterprise address using a formula whose weights were tuned through a large number of experiments, and outputs the address with the highest score and the most recent release date as the actual business address of the enterprise.
The specific implementation steps of the whole scheme are as follows:
1. acquire the enterprise's addresses and their release dates from a plurality of sources, and clean them by filtering out addresses that are too short, addresses that lack important keywords such as xx province, xx city or xx county, and meaningless addresses that consist only of such keywords;
2. extract key address information from the addresses cleaned in step 1 through a trained address information extraction model;
3. map the address information extracted in step 2 to standardized administrative division descriptions through the trained administrative division mapping model;
4. calculate the initial score of each enterprise address from the standardized administrative division description obtained in step 3, using the enterprise address initial score formula whose weights were tuned through a large number of experiments;
5. calculate the final score of each enterprise address from the enterprise address final score formula, the initial score obtained in step 4, and the release date of the address;
6. output the address with the highest score and the latest release date as the actual business address of the enterprise.
Drawings
FIG. 1 is a flow chart of a specific embodiment of the process.
Detailed Description
To make the technical solutions and advantages of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It should be understood that the drawings are for illustration and description only and do not limit the scope of the present application.
The method for calculating the actual business place of an enterprise can be applied in the application environment shown in FIG. 1, and comprises the following steps:
Step 101: when a user queries the actual business place of a target enterprise, a plurality of addresses of the enterprise are first acquired from websites that publish enterprise addresses. Most directly, the registered address can be queried from the National Enterprise Credit Information Publicity System website; the date on which the registered address was changed, taken from the enterprise's change records, serves as its release date, and if the registered address has never been changed, the establishment date from the enterprise's basic registration information is used instead. In addition, the annual report address disclosed by the enterprise itself can be obtained from the annual report the enterprise published in the last year, with the report's release date taken as the release date of that address. Further addresses can be obtained from recruitment websites such as BOSS Zhipin, Zhaopin, 51job, 58.com recruitment, Lagou and Dajie: the recruitment notices the enterprise published in the last year are selected, the recruitment address is taken from each notice, and the notice's publication date is used as the release date of the recruitment address.
Step 102: before the enterprise addresses acquired in step 101 are analyzed, they are first filtered and cleaned. The inventor found during implementation that recruitment notices on recruitment websites are often written by different people, each with their own writing habits, and that careless miswriting, omissions, abbreviations and the like can make the real location hard for others to identify. For example, in a recruitment notice published on 51job in October 2020 by a company in Suzhou, the address field read "the lake of kou dao 39". An applicant can accept this, since the applicant can first narrow the search to Suzhou and then look up the location. For the present method, however, the address is meaningless: it lacks the important keywords xx province, xx city or xx county, so its exact position cannot be located, and it might even distort the final output; addresses of this kind are filtered out. Likewise, in a recruitment notice published on 58.com recruitment in October 2020 by a company in Changchun, the address field contained only "Changchun", which is meaningless both for the method and for an applicant. Regular expressions are therefore used to filter out addresses that are too short, that contain no keyword such as Zhejiang (province) or Suzhou (city), or that consist only of such keywords, so as to avoid unnecessary influence on the result.
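The cleaning rule of step 102 can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the source states only that regular expressions are used, so the minimum length, the two patterns and the function name `is_usable` are all assumptions chosen for the example.

```python
import re

# Hypothetical threshold: the source does not state a minimum length.
MIN_LENGTH = 6

# Assumed keyword pattern: administrative-division suffixes in Chinese addresses
# (province, autonomous region, city, county, district).
DIVISION_SUFFIX = re.compile(r"省|自治区|市|县|区")

# Assumed pattern for an address that consists only of division names,
# e.g. "浙江省杭州市" (Zhejiang Province, Hangzhou City).
ONLY_DIVISIONS = re.compile(r"(?:[^省市县区]{1,8}(?:省|自治区|市|县|区))+")


def is_usable(address: str) -> bool:
    """Return True if the address passes all three cleaning checks."""
    text = address.strip()
    if len(text) < MIN_LENGTH:            # too short
        return False
    if not DIVISION_SUFFIX.search(text):  # no province/city/county keyword
        return False
    if ONLY_DIVISIONS.fullmatch(text):    # nothing but division names
        return False
    return True


if __name__ == "__main__":
    for candidate in ["浙江省杭州市江干区九环路9号4号楼436室", "长春", "浙江省杭州市"]:
        print(candidate, is_usable(candidate))
```

A real deployment would need a more careful keyword list, since characters such as 区 also occur inside ordinary place names.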
Step 103: it should be noted that even after the acquired addresses are cleaned as described in step 102, the following problems are unavoidable. First, if the queried enterprise is newly established, the staff for each essential post is relatively fixed in the start-up phase and no new recruitment is needed in the short term, so no recruitment address can be obtained from the major recruitment websites. Second, the National Enterprise Credit Information Publicity System website is updated and maintained daily, and if a newly established enterprise is queried during such a window, its registered address may not be obtainable. Finally, annual reports are published once a year, so no annual report address is available for a recently established enterprise. When these problems occur together, no address can be obtained from the above-mentioned websites, and the actual business address of the enterprise is returned as empty. In addition, if only one address remains after cleaning, executing the following steps is pointless, and that address is returned directly as the actual business address of the enterprise.
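The two shortcut cases of step 103 can be sketched as below. The tuple layout and the function name `shortcut` are hypothetical; the source describes only the behaviour (return empty when nothing survives cleaning, return the single survivor directly).

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical record: (address text, release date).
Address = Tuple[str, date]


def shortcut(cleaned: List[Address]) -> Tuple[bool, Optional[str]]:
    """Return (done, result).

    done is True when no scoring is needed: either no address survived
    cleaning (result is None) or exactly one did (result is that address).
    """
    if not cleaned:
        return True, None
    if len(cleaned) == 1:
        return True, cleaned[0][0]
    return False, None
```

Only when `done` is False does the pipeline continue with extraction, mapping and scoring.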
Step 104: when several enterprise addresses have been acquired and several remain after the cleaning in step 102, further analysis is needed to determine the actual business address of the enterprise. To ensure the accuracy of the final result, an address information extraction model is trained. A batch of addresses is randomly collected from the National Enterprise Credit Information Publicity System website and from recruitment websites as training samples, and each sample address is labeled in the following form. Sample address: Room 436, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou City, Zhejiang Province. Labeling result: province: Zhejiang Province; city: Hangzhou City; county (district): Jianggan District; road: Jiuhuan Road; road-number: No. 9; park: (empty); park-building: Building 4; park-building-floor: Floor 4; park-building-floor-number: Room 436. With the sample address as the input feature and the corresponding labeled address information (target province, city, county and so on) as the expected output feature, a long short-term memory (LSTM) neural network is trained to obtain the trained address information extraction model.
Step 105: key information is extracted from all of the enterprise's cleaned addresses using the address information extraction model trained in step 104.
Step 106: the address information extracted in step 105 is checked. For example, the province key may hold only one of the 34 provincial-level administrative divisions, such as Zhejiang Province or Shanghai City, which may also appear in short form, such as Zhejiang, Xinjiang or Macau; these comprise 23 provinces, 5 autonomous regions, 4 municipalities directly under the central government and 2 special administrative regions. If the value contains none of these administrative division keywords, the address is discarded. Similarly, the city and district (county) keys are checked for values that are not valid keywords for that level, and if any appear, the address is discarded as well.
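The province check of step 106 can be illustrated with a small lookup. The set below lists the short names of the 34 provincial-level divisions; the function name `province_is_valid` and the substring test are assumptions, since the source says only that an address is discarded when the province value contains none of these keywords.

```python
# Short names of the 34 provincial-level divisions:
# 23 provinces, 5 autonomous regions, 4 municipalities, 2 special administrative regions.
PROVINCES = {
    # provinces
    "河北", "山西", "辽宁", "吉林", "黑龙江", "江苏", "浙江", "安徽", "福建", "江西",
    "山东", "河南", "湖北", "湖南", "广东", "海南", "四川", "贵州", "云南", "陕西",
    "甘肃", "青海", "台湾",
    # autonomous regions
    "内蒙古", "广西", "西藏", "宁夏", "新疆",
    # municipalities directly under the central government
    "北京", "天津", "上海", "重庆",
    # special administrative regions
    "香港", "澳门",
}


def province_is_valid(value: str) -> bool:
    """Return True if the extracted province value contains a known short name."""
    return any(name in value for name in PROVINCES)
```

For example, `province_is_valid("浙江省")` is true, while a city name such as `"杭州市"` stored under the province key fails the check and the address would be discarded. The same pattern would apply to the city and district (county) keys with their own keyword lists.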
Step 107: an administrative division mapping model is trained. Less common descriptions, such as Guangxi Zhuang Autonomous Region, Dachang Hui Autonomous County and Ili Kazakh Autonomous Prefecture, are used as training samples, and the corresponding common standardized descriptions, Guangxi, Dachang County and Ili, are used as labels. With the description of the target administrative division as the input feature and the corresponding standardized description as the expected output feature, a long short-term memory (LSTM) neural network is trained to obtain the trained administrative division mapping model.
Step 108: the extracted address information is mapped to standardized administrative division descriptions using the administrative division mapping model trained in step 107.
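To make the input and output of the mapping step concrete, the sketch below uses a plain dictionary lookup as a stand-in for the trained model. This is a deliberate simplification, not the method of the application, which trains a neural network; the table `STANDARD_NAMES` and the function `to_standard` are hypothetical, and the three entries are the examples named in step 107.

```python
# Hypothetical lookup table standing in for the trained mapping model.
STANDARD_NAMES = {
    "广西壮族自治区": "广西",    # Guangxi Zhuang Autonomous Region -> Guangxi
    "大厂回族自治县": "大厂县",  # Dachang Hui Autonomous County -> Dachang County
    "伊犁哈萨克自治州": "伊犁",  # Ili Kazakh Autonomous Prefecture -> Ili
}


def to_standard(description: str) -> str:
    """Map a division description to its standardized form.

    Descriptions that are not in the table are returned unchanged
    (after trimming whitespace).
    """
    key = description.strip()
    return STANDARD_NAMES.get(key, key)
```

A learned model has the advantage of generalizing to variants that a fixed table does not list, which is presumably why the application trains one.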
Step 109: the initial score of the enterprise address is calculated according to the following formula:
$$s_0 = 30p + 25c + 20x + 15j + \max(5w + 5\,\mathit{wh},\; 7y + 2\,\mathit{yd} + 0.5\,\mathit{ydl} + 0.5\,\mathit{ydlh})$$
As the formula shows, the score is computed from the administrative division information extracted from the enterprise address, and the full score is 100. If a province is extracted from the address, 30 points are awarded (a weight of 30%); a city, 25 points (25%); a district (county), 20 points (20%); a street, 15 points (15%). The remaining 10 points (10%) are determined by whichever of the following two cases scores higher. First case: if a road is extracted, 5 points are awarded (5%), and if a road number is extracted, another 5 points (5%). Second case: if a park is extracted, 7 points are awarded (7%); a building within the park, 2 points (2%); a floor, 0.5 points (0.5%); and a room number, 0.5 points (0.5%).
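The initial score formula of step 109 translates directly into code. In the sketch below the weights and variable names (p, c, x, j, w, wh, y, yd, ydl, ydlh) come from the source; the `Indicators` dataclass and the function name `initial_score` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    """Each field is 1 if that component was extracted from the address, else 0."""
    p: int = 0     # province
    c: int = 0     # city
    x: int = 0     # county (district)
    j: int = 0     # street
    w: int = 0     # road
    wh: int = 0    # road number
    y: int = 0     # park
    yd: int = 0    # building within the park
    ydl: int = 0   # floor
    ydlh: int = 0  # room number


def initial_score(a: Indicators) -> float:
    """Initial score s0: fixed weights plus the better of the road and park cases."""
    road_case = 5 * a.w + 5 * a.wh
    park_case = 7 * a.y + 2 * a.yd + 0.5 * a.ydl + 0.5 * a.ydlh
    return 30 * a.p + 25 * a.c + 20 * a.x + 15 * a.j + max(road_case, park_case)


if __name__ == "__main__":
    # Province, city, district, road and road number extracted; no street or park.
    print(initial_score(Indicators(p=1, c=1, x=1, w=1, wh=1)))  # 85
```

Because the road case and the park case each sum to at most 10, an address with every component present scores exactly 100.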
Step 110: the actual score of the enterprise address is calculated according to the following formula:
[Formula image BDA0002994155900000061]
Once the initial score $s_0$ of an enterprise address has been calculated, the final score of the address is clearly not simply $s_0$. When an address was released long ago, its credibility is low no matter how detailed its description. The inventor therefore wants the formula to make the score decline as the number of days since the release date increases, with a steep decline in the first days and an increasingly gentle decline afterwards. For example, when the initial score of an address is 100 points and 15 days have passed since its release date, the final score of the address is:
[Formula image BDA0002994155900000071]
Step 111: the enterprise addresses are sorted in descending order of final score; if several addresses share the same score, they are further sorted in descending order of release date. The address with the highest score and the most recent release date is output as the final actual business address of the enterprise.
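The ranking of step 111 amounts to a descending sort on a two-part key. The sketch below assumes the final scores have already been computed in step 110 and takes them as input; the `Scored` record and the function name `pick_business_address` are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Scored(NamedTuple):
    address: str
    final_score: float  # output of step 110, taken as given here
    released: date      # release date of the address


def pick_business_address(candidates: List[Scored]) -> Optional[str]:
    """Sort by final score, then by release date, both descending; return the top address."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda s: (s.final_score, s.released), reverse=True)
    return ranked[0].address


if __name__ == "__main__":
    sample = [
        Scored("A", 60.0, date(2020, 10, 1)),
        Scored("B", 72.5, date(2020, 6, 1)),
        Scored("C", 72.5, date(2020, 9, 1)),
    ]
    print(pick_business_address(sample))  # C: ties with B on score, released later
```

Sorting on the tuple `(final_score, released)` handles the tie-break in one pass, so no second sort is needed.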

Claims (3)

1. A method for calculating an actual business segment of an enterprise, comprising the steps of:
step 1: when a user queries the actual business address of an enterprise, the addresses of the enterprise and the release date corresponding to each address are obtained from public data websites;
step 2: extracting province, city, county (district), street, road, road-number, park, park-building, park-building-floor and park-building-floor-number data, namely the structured enterprise address information, from the addresses in step 1 through a trained address information extraction model;
step 3: mapping the structured enterprise address information extracted in step 2 to standardized general administrative division description data through a trained administrative division mapping model;
step 4: calculating the initial score of the enterprise address from the standardized general administrative division description data of step 3, according to the following formula:
$$s_0 = v_1 \times p + v_2 \times c + v_3 \times x + v_4 \times j + \max(v_5 \times w + v_6 \times \mathit{wh},\; v_7 \times y + v_8 \times \mathit{yd} + v_9 \times \mathit{ydl} + v_{10} \times \mathit{ydlh})$$
wherein p, c, x, j, w, wh, y, yd, ydl and ydlh respectively represent the values for province, city, county (district), street, road, road-number, park, park-building, park-building-floor and park-building-floor-number; if the standardized general administrative division description data of the enterprise contains a value for a given division, the corresponding variable takes the value 1, otherwise 0; $v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8, v_9, v_{10}$ are the weights of the corresponding indicators, and the most appropriate values, obtained by adjusting the weights according to experience and a large number of experiments, are respectively 30, 25, 20, 15, 5, 5, 7, 2, 0.5 and 0.5;
step 5: calculating the actual score of the enterprise address from the initial score $s_0$ obtained in step 4 and the release date of the address, according to the following formula:
[Formula image FDA0002994155890000011]
where $s$ represents the actual score of the enterprise address, $t_m$ the current date, $t_n$ the release date of the enterprise address, $t_m - t_n$ the number of days between the release date and today (rounded down), and $s_0$ the initial score of the enterprise address;
step 6: sorting the enterprise addresses in descending order of the actual scores from step 5; if several addresses share the same score, further sorting them in descending order of release date; and outputting the first-ranked address as the actual business address of the enterprise queried by the user.
2. The method of claim 1, wherein the step of training the address information extraction model comprises:
acquiring a plurality of target enterprise addresses;
respectively labeling the target province, city, county (district), street, road, road-number, park, park-building, park-building-floor and park-building-floor-number in each target address;
obtaining the information extraction questions for the target addresses;
and training a long short-term memory (LSTM) neural network, with the target addresses and their information extraction questions as input features and the corresponding labeled address information (target province, city, county and so on) as expected output features, to obtain the trained address information extraction model.
3. The method of claim 1, wherein the step of training the administrative division mapping model comprises:
acquiring description data of a plurality of target administrative divisions, distinguished by level (province, city, county, street and so on);
respectively labeling, for each target administrative division, the corresponding standardized description of its province, city, county, street and so on;
and training a long short-term memory (LSTM) neural network, with the description of the target administrative division as the input feature and the corresponding standardized description as the expected output feature, to obtain the trained administrative division mapping model.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110330113.8A CN112989166A (en) 2021-03-26 2021-03-26 Method for calculating actual business territory of enterprise

Publications (1)

Publication Number Publication Date
CN112989166A true CN112989166A (en) 2021-06-18

Family

ID=76334019

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310000 room 808, floor 8, building 1, No. 8 and 10, Jiuhe Road, Jiubao street, Shangcheng District, Hangzhou, Zhejiang Province

Applicant after: Zhejiang youshuzhi Technology Co.,Ltd.

Address before: 310000 room 808, 8 / F, building 4, No. 9, Jiuhuan Road, Jianggan District, Hangzhou City, Zhejiang Province

Applicant before: HANGZHOU YOUSHU FINANCE INFORMATION SERVICES CO.,LTD.