CN116384948A - Method, device, equipment and medium for extracting location of mark information item - Google Patents

Info

Publication number
CN116384948A
Authority
CN
China
Prior art keywords
bidding
bidding information
level administrative
city
information parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310645158.3A
Other languages
Chinese (zh)
Other versions
CN116384948B (en)
Inventor
贾新
田小亮
张金坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tuopu Fenglian Information Technology Co ltd
Original Assignee
Beijing Tuopu Fenglian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tuopu Fenglian Information Technology Co ltd
Priority to CN202310645158.3A
Publication of CN116384948A
Application granted
Publication of CN116384948B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G06Q10/103: Workflow collaboration or project management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method, a device, equipment and a medium for extracting the location of a bidding information project, and relates to the technical field of data processing. A plurality of data to be identified for the location of the bidding information project are collected; the data to be identified comprise a site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters. Three-level administrative division extraction is carried out on each bidding information parameter, and the plurality of three-level administrative divisions extracted from each bidding information parameter are merged to obtain the city to which each bidding information parameter belongs. The site region value, the place name library and the city of each bidding information parameter are then merged successively according to a preset first priority to obtain the location of the bidding information project, so that local users can be served accurately.

Description

Method, device, equipment and medium for extracting location of mark information item
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method, an apparatus, a device, and a medium for extracting the location of a bidding information project.
Background
To help users grasp valuable bidding data in real time and improve their market competitiveness, data are crawled from the major Internet bidding websites, most of the crawled unstructured web text is converted into structured data, and statistics and analysis are then carried out.
At present, however, there is no function for mining the location of a bidding information project, so the requirement of accurately serving local users cannot be met.
Disclosure of Invention
In view of this, an object of the present application is to provide a method, apparatus, device and medium for extracting the location of a bidding information project, which can extract that location from collected web text and structured data and thus serve local users more accurately.
In a first aspect, an embodiment of the present application provides a method for extracting the location of a bidding information project, the method comprising the following steps:
collecting a plurality of data to be identified for the location of the bidding information project; the plurality of data to be identified comprise a site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
carrying out three-level administrative division extraction on each bidding information parameter, and merging the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city to which each bidding information parameter belongs; wherein the bidding information parameters comprise one or more of a project address, a project name and a purchasing unit;
setting a first priority for the site region value, the place name library and the city of each bidding information parameter, and merging the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information project.
In some embodiments, the three-level administrative division extraction for each of the bid information parameters includes the steps of:
preprocessing each bidding information parameter, wherein the preprocessing comprises collecting auxiliary addresses of the project addresses, cleaning redundant fields of the project names and extracting characteristic characters in the purchasing units;
comparing each preprocessed bid information parameter with the place name library to obtain three-level administrative division of each bid information parameter; the three-level administrative division is three levels of province, city and county.
In some embodiments, the merging the three-level administrative divisions extracted from each of the bid information parameters to obtain the city of each of the bid information parameters includes the following steps:
retaining or discarding the plurality of three-level administrative divisions extracted from each bidding information parameter according to their structure;
if the extracted three-level administrative divisions contain more than two provinces, discarding the extracted three-level administrative divisions;
if the extracted three-level administrative divisions have a one-province, multi-city structure, retaining the province part of the extracted three-level administrative divisions;
and if the extracted three-level administrative divisions have a province-city structure, retaining the province and the city of the extracted three-level administrative divisions.
In some embodiments, the setting the first priority to the site territory value, the place name repository, and the city of each of the bidding information parameters includes the steps of:
determining a confidence ranking of the site region value, the place name library and each bidding information parameter according to how the different types of data to be identified, and the different pieces of information within each type, affect the confidence of the location of the bidding information project;
setting the first priority according to the confidence ranking of the site region value, the place name library and each bidding information parameter; wherein the higher the confidence, the higher the first priority.
In some embodiments, the confidence ranking of the site region value, the place name library and each bidding information parameter differs with the type of the purchasing unit; the bidding information parameters further comprise one or more of an approval department/release department, a purchasing unit address, a title and a postal code/landline number.
In some embodiments, the site region value, the place name library and the city of each bidding information parameter are merged successively according to the first priority in the following manner:
if the previous three-level administrative division has a structure containing no more than two provinces and the next three-level administrative division has a structure containing a new province, discarding the new province when merging;
if, of two three-level administrative divisions to be merged, one has a province-city two-level structure and the other has a one-level structure of the same province, retaining the three-level administrative division with the province-city two-level structure.
In some embodiments, the extraction method further comprises the steps of:
judging whether the obtained location of the bidding information project is empty;
and if the obtained location is empty, setting a second priority for the site region value and the city of each bidding information parameter, and merging the site region value and the city of each bidding information parameter successively according to the second priority to obtain the location of the bidding information project.
In a second aspect, an embodiment of the present application provides an apparatus for extracting the location of a bidding information project, the apparatus comprising:
an acquisition module, configured to collect a plurality of data to be identified for the location of the bidding information project; the plurality of data to be identified comprise a site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
an extraction module, configured to carry out three-level administrative division extraction on each bidding information parameter and to merge the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city to which each bidding information parameter belongs; wherein the bidding information parameters comprise one or more of a project address, a project name and a purchasing unit;
and a merging module, configured to set a first priority for the site region value, the place name library and the city of each bidding information parameter, and to merge the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information project.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate with each other through the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the method for extracting the location of a bidding information project according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method for extracting the location of a bidding information project according to any one of the first aspect.
The method, device, electronic equipment and storage medium for extracting the location of a bidding information project provided by the embodiments of the present application collect a plurality of data to be identified for the location of the bidding information project; the plurality of data to be identified comprise a site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters. Three-level administrative division extraction is carried out on each bidding information parameter, and the plurality of three-level administrative divisions extracted from each bidding information parameter are merged to obtain the city to which each bidding information parameter belongs. The site region value, the place name library and the city of each bidding information parameter are then merged successively according to a preset first priority to obtain the location of the bidding information project, so that local users can be served accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for extracting the location of a bidding information project according to an embodiment of the present application;
FIG. 2 illustrates a flow chart of three-level administrative division extraction for each of the bid information parameters according to an embodiment of the present application;
FIG. 3 illustrates a flow chart of obtaining a city of the purchasing unit in accordance with an embodiment of the present application;
FIG. 4 is a flow chart of setting a first priority for the site region value, the place name library and the city of each of the bidding information parameters according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for extracting the location of a bidding information project according to an embodiment of the present application;
FIG. 6 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.
In view of the technical problems set forth in the background, the application provides a method, a device, electronic equipment and a storage medium for extracting the location of a bidding information project, which can extract that location from collected web text and structured data, so that local users can be served more accurately.
Referring to FIG. 1 of the specification, the method for extracting the location of a bidding information project provided in the embodiment of the application includes the following steps:
S1, collecting a plurality of data to be identified for the location of the bidding information project; the plurality of data to be identified comprise a site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
S2, carrying out three-level administrative division extraction on each bidding information parameter, and merging the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city to which each bidding information parameter belongs; wherein the bidding information parameters comprise one or more of a project address, a project name and a purchasing unit;
S3, setting a first priority for the site region value, the place name library and the city of each bidding information parameter, and merging the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information project.
It should be noted that, in the embodiments of the present application, the method for extracting the location of a bidding information project may run on a terminal device or on a server. The terminal device may be a server terminal device; when the method runs on a server, it may be implemented and executed based on a cloud interaction system, where the cloud interaction system includes at least the server and a client device (i.e., the terminal device).
In step S1, part of the plurality of data to be identified is derived from the web text and structured data acquired from the bidding release platform, and the other part is derived from the preset region value of the bidding release platform and the preset place name library.
The place name library comprises an administrative division tree built on the province-city-county levels. By logically comparing a given region value with the administrative division tree in the place name library, the corresponding region node can be obtained; if the region node has a parent node, the region of the parent node can be derived as well. For example, if the obtained region node is a city, the derived parent node region is the province containing that city. Based on this principle, the method identifies, as far as possible, the three-level administrative divisions of province, city and county in which the bidding information project is located.
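The parent-node derivation described above can be illustrated with a minimal sketch. The table layout, the placeholder place names and the function name `resolve` are assumptions made for illustration only; the application does not specify how the administrative division tree is stored.

```python
from typing import Dict, Optional, Tuple

# Hypothetical miniature place name library: name -> (level, parent name).
# A real library would also need to handle duplicate names and aliases.
PLACE_NAMES: Dict[str, Tuple[str, Optional[str]]] = {
    "Province A": ("province", None),
    "City A-1": ("city", "Province A"),
    "County A-1-x": ("county", "City A-1"),
}


def resolve(region_value: str,
            place_names: Dict[str, Tuple[str, Optional[str]]] = PLACE_NAMES
            ) -> Dict[str, Optional[str]]:
    """Walk from the matched node up to the root, filling in each level."""
    result: Dict[str, Optional[str]] = {"province": None, "city": None, "county": None}
    name: Optional[str] = region_value
    while name is not None and name in place_names:
        level, parent = place_names[name]
        result[level] = name
        name = parent  # move to the parent node, if any
    return result


print(resolve("City A-1"))
```

Given a city, the sketch fills in the province from the parent link and leaves the county unknown, which mirrors the example in the text.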
The preset region value of a bidding release platform may be national, provincial or municipal. For example, the region value of the electronic bidding public service platform of a given province is provincial, and the region value of the electronic bidding public service platform of a given city is municipal. This is because bidding information is regional: bidding information published by the electronic bidding public service platform of a province is generally applicable only to that province, and bidding information published by the electronic bidding public service platform of a city is generally applicable only to that city. The preset region value of the bidding release platform therefore carries a certain confidence and can be used to judge the location of the bidding information project.
The web text and structured data obtained from the bidding release platform must likewise be data relevant to identifying the location of the bidding information project. For example, the bidding data comprising various bidding information parameters selected in the present application may be the project address, the project name and the purchasing unit, since in a standard announcement these all contain characters related to the project location, whereas other bidding information parameters, such as bid qualification, financial requirements and investment amount, generally do not.
Therefore, to improve the probability and accuracy of identifying the location of the bidding information project, the present application uses the site region value of the preset bidding information release platform, the preset place name library comprising the administrative division tree, and the project address, project name and purchasing unit information as the plurality of data to be identified for the location of the bidding information project.
In step S2, referring to FIG. 2 of the specification, three-level administrative division extraction is performed for each of the bidding information parameters through the following steps:
S201, preprocessing each bidding information parameter, wherein the preprocessing comprises collecting the auxiliary addresses of the project address, cleaning redundant fields of the project name and extracting characteristic characters from the purchasing unit;
S202, comparing each preprocessed bidding information parameter with the place name library to obtain the three-level administrative divisions of each bidding information parameter; the three levels are province, city and county.
In other words, before three-level administrative divisions are extracted from the three types of bidding information parameters (project address, project name and purchasing unit), the data of each must be preprocessed. In this embodiment, the project address covers various auxiliary addresses, such as the delivery address, the receiving address and the location/area/goods location, so all of the related auxiliary address data must be collected together in order to extract the three-level administrative divisions of the project address more comprehensively.
The project name contains, in addition to place-name text, text such as "company" and text identical to the purchaser's name. Moreover, because the location finally obtained in the present application is a three-level administrative division of province, city and county, the three characters preceding "road"/"street" also need to be removed. That is, rules for cleaning redundant fields are set for the project name in order to improve the efficiency of the subsequent comparison with the place name library.
For the purchasing unit, conversely, extraction rules are set to pull out the characteristic characters that may indicate a three-level administrative division, again to improve the efficiency of the subsequent comparison with the place name library. Examples include the three characters preceding "branch", the characters in brackets, the characters in the middle of a group company name, the characters in the middle of a company name, and the three characters following "company".
Comparing each preprocessed bidding information parameter with the place name library to find the corresponding city field is a technique known to those skilled in the art and is not described further here. Note, however, that in the extracted city field the province and city levels use short names, while the county level uses the full name.
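Since the comparison step is left to the skilled reader, the following is only one possible sketch of S201 and S202 for the project address: the auxiliary addresses are concatenated and every place name found in the text is reported. The field names in `AUX_ADDRESS_KEYS`, the placeholder place names and both function names are hypothetical, and plain substring matching is an assumption rather than the method of the application.

```python
from typing import Dict, List, Optional, Tuple

# (province, city, county); unknown levels are None.
Division = Tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical field names for the auxiliary addresses of one announcement.
AUX_ADDRESS_KEYS = ("delivery_address", "receiving_address", "item_location")

# Hypothetical place name library: searchable name -> three-level division.
PLACE_NAMES: Dict[str, Division] = {
    "Aland": ("Aland", None, None),
    "Bexton": ("Aland", "Bexton", None),
    "Carrow County": ("Aland", "Bexton", "Carrow County"),
}


def collect_addresses(record: Dict[str, str]) -> str:
    """S201 for the project address: gather every non-empty auxiliary address."""
    return " ".join(record[k] for k in AUX_ADDRESS_KEYS if record.get(k))


def extract_divisions(text: str) -> List[Division]:
    """S202: return every library entry whose name occurs in the text."""
    hits: List[Division] = []
    for name, division in PLACE_NAMES.items():
        if name in text and division not in hits:
            hits.append(division)
    return hits


record = {"delivery_address": "5 East Road, Bexton", "receiving_address": "",
          "item_location": "Carrow County"}
print(extract_divisions(collect_addresses(record)))
```

Several divisions can come back for a single parameter, which is why a merging step is needed afterwards.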
Several city fields may be extracted from the project address, project name or purchasing unit data, and these fields may even conflict with one another; unless they are merged effectively, the project location identified later will be inaccurate. For example, the city fields extracted from different auxiliary addresses may be province A for some, B-1 (city 1 of province B) for others, and B-2 (city 2 of province B) for yet others. The city fields extracted from the project address, project name or purchasing unit data therefore need to be merged to obtain the city corresponding to each bidding information parameter.
In the present application, the merging is done by retaining or discarding provinces, cities and counties according to the structure of the three-level administrative divisions extracted from each bidding information parameter. Specifically:
if the extracted three-level administrative divisions contain more than two provinces, they are discarded. For example, if the three-level administrative divisions extracted from the project address are province A, province B and province C, the confidence is low, so no data is retained; if they are A-1 (city 1 of province A), B-1 (city 1 of province B) and C-1 (city 1 of province C), they likewise contain more than two provinces, the confidence is low, and no data is retained; if they are A-1 (city 1 of province A), A-2 (city 2 of province A) and A-3 (city 3 of province A), only one province is involved but it is uncertain which city is correct, so A-1, A-2 and A-3 are all retained for subsequent determination;
if the extracted three-level administrative divisions contain more than four cities, the cities are discarded and only the provinces are retained. For example, if the three-level administrative divisions extracted from the project name are A-1 (city 1 of province A), A-2 (city 2 of province A), A-3 (city 3 of province A), B-1 (city 1 of province B) and B-2 (city 2 of province B), province A and province B are retained for subsequent determination.
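The two retain-or-discard rules above can be sketched as follows. The thresholds are read from the examples in this description (more than two provinces, more than four cities); the `(province, city)` tuple representation and the function name are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Division = Tuple[str, Optional[str]]  # (province, city or None)


def merge_within_parameter(divisions: List[Division],
                           max_provinces: int = 2,
                           max_cities: int = 4) -> List[Division]:
    """Retain or discard the divisions extracted from ONE bidding parameter."""
    # Distinct provinces, in order of first appearance.
    provinces = list(dict.fromkeys(p for p, _ in divisions))
    if len(provinces) > max_provinces:
        return []  # too many provinces: low confidence, keep nothing
    cities = {(p, c) for p, c in divisions if c is not None}
    if len(cities) > max_cities:
        return [(p, None) for p in provinces]  # too many cities: keep provinces only
    return list(dict.fromkeys(divisions))  # otherwise keep everything, de-duplicated


print(merge_within_parameter([("A", "1"), ("A", "2"), ("A", "3"), ("B", "1"), ("B", "2")]))
```

The call at the end reproduces the project-name example: five cities across two provinces collapse to province A and province B.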
In step S3, referring to FIG. 4 of the specification, the first priority is set for the site region value, the place name library and the city of each of the bidding information parameters in the following manner:
S301, determining a confidence ranking of the site region value, the place name library and each bidding information parameter according to how the different types of data to be identified, and the different pieces of information within each type, affect the confidence of the location of the bidding information project;
S302, setting the first priority according to the confidence ranking of the site region value, the place name library and each bidding information parameter; wherein the higher the confidence, the higher the first priority.
In other words, when the location of the bidding information project is identified, the different categories of data to be identified, and the different pieces of information within each category, carry different confidence. For example, the site region value of the preset bidding information release platform is known in advance, so its confidence is relatively high; the confidence of the city field extracted from the project address is higher than that of the city field extracted from the project name; and the confidence of the city field extracted from the project name is higher than that of the city field extracted from the purchasing unit. The first priority is therefore set on the principle that the higher the confidence, the higher the first priority, and the extracted city fields are then merged successively from high priority to low priority to obtain the location of the bidding information project.
In the present application, the bidding information parameters include, in addition to the project address, the project name and the purchasing unit, other parameters such as the approval department/release department, the purchasing unit address, the title and the postal code/landline number. When the location of a bidding information project is actually extracted, the confidence ranking of the site region value, the place name library and each bidding information parameter differs with the type of the purchasing unit. For example, purchasing units are generally divided into two types: the first type covers schools, public resource platforms and online approval platforms; the second type covers websites of medical institutions, banks, enterprise portals, insurance, securities, social procurement, agencies and engineering construction. For the first type, the confidence ranking is: province in the site region value, project address, project name, purchasing unit, approval department/release department, place name library, purchasing unit address, city in the site region value, title, postal code/landline number. For the second type, the confidence ranking is: province in the site region value, project address, project name, purchasing unit, approval department/release department, purchasing unit address, place name library, city in the site region value, title, postal code/landline number.
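As a rough illustration, the two confidence rankings can be held as ordered lists and used to sort whatever sources actually produced a result. All identifiers below are hypothetical; in particular, `site_city` reflects a reading of the eighth entry as the city part of the site region value, which is an interpretation of the translated text rather than something the application states unambiguously.

```python
from typing import Dict, List, Tuple

# Hypothetical source identifiers, listed from highest to lowest first priority.
RANKINGS: Dict[int, List[str]] = {
    1: ["site_province", "project_address", "project_name", "purchasing_unit",
        "approval_or_release_department", "place_name_library",
        "purchasing_unit_address", "site_city", "title", "postcode_or_landline"],
    2: ["site_province", "project_address", "project_name", "purchasing_unit",
        "approval_or_release_department", "purchasing_unit_address",
        "place_name_library", "site_city", "title", "postcode_or_landline"],
}


def order_sources(extracted: Dict[str, List[str]],
                  unit_type: int) -> List[Tuple[str, List[str]]]:
    """Return the non-empty sources sorted by the ranking for this unit type."""
    if unit_type not in RANKINGS:
        raise ValueError(f"unknown purchasing unit type: {unit_type}")
    return [(name, extracted[name]) for name in RANKINGS[unit_type]
            if extracted.get(name)]


extracted = {"title": ["A-1"], "place_name_library": ["A"],
             "purchasing_unit_address": ["A-2"], "project_name": []}
print([name for name, _ in order_sources(extracted, 1)])
print([name for name, _ in order_sources(extracted, 2)])
```

The only difference between the two rankings is the relative position of the place name library and the purchasing unit address, which the two printed lists make visible.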
In the present application, the site region value, the place name library and the city of each bidding information parameter are merged successively according to the first priority in the following manner:
if the preceding three-level administrative division has a structure containing no more than two provinces or cities, and the following three-level administrative division has a structure containing a new province or city, the new province or city is discarded during merging; for example, if the first group extracts Province A and Province B and the second group extracts Province C, Province C is discarded and Province A and Province B are retained; or, if the first group extracts A-1 (Province A, City 1) and A-2 (Province A, City 2) and the second group extracts A-3 (Province A, City 3), A-3 is discarded and A-1 and A-2 are retained; this is because the first group has a higher first priority than the second group, so a newly extracted province or city has relatively low confidence;
if, of the two three-level administrative divisions to be merged, one has a two-level province-city structure and the other has a one-level structure of the same province, the more complete province-city division is retained; for example, if the first group extracts A-1 (Province A, City 1) and B-1 (Province B, City 1) and the second group extracts Province A, A-1 (Province A, City 1) is retained; or, if the first group extracts Province A and B-1 (Province B, City 1) and the second group extracts A-1 (Province A, City 1), A-1 (Province A, City 1) is retained; or, if the first group extracts A-1 (Province A, City 1) and A-2 (Province A, City 2) and the second group extracts Province A, A-1 (Province A, City 1) and A-2 (Province A, City 2) are retained.
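The two merging rules above can be sketched in Python as a pairwise merge that is folded over the groups in first-priority order. This is only one possible reading: the `(province, city)` tuple representation and the names `merge_pair` and `merge_by_priority` are assumptions of this sketch, and the behaviour of dropping a higher-priority province that the lower-priority group does not confirm is inferred from the worked examples rather than stated as a rule. The sketch also assumes the higher-priority group already satisfies the precondition on the number of provinces.

```python
from functools import reduce
from typing import List, Optional, Tuple

# (province, city); city is None when only the province is known.
Division = Tuple[str, Optional[str]]


def merge_pair(prev: List[Division], nxt: List[Division]) -> List[Division]:
    """Merge a lower-priority group (nxt) into a higher-priority group (prev)."""
    if not prev:
        return list(nxt)
    next_provinces = {province for province, _ in nxt}
    # Provinces of the higher-priority group that the next group also mentions.
    shared = [p for p in dict.fromkeys(province for province, _ in prev)
              if p in next_provinces]
    if not shared:
        # Rule 1: the next group only brings new provinces, so it is discarded.
        return list(prev)
    merged: List[Division] = []
    for province in shared:
        prev_cities = [c for p, c in prev if p == province and c is not None]
        next_cities = [c for p, c in nxt if p == province and c is not None]
        # Cities already known take precedence (new cities are discarded);
        # otherwise the more complete province-city entry is adopted (rule 2).
        cities = list(dict.fromkeys(prev_cities or next_cities))
        if cities:
            merged.extend((province, city) for city in cities)
        else:
            merged.append((province, None))
    return merged


def merge_by_priority(groups: List[List[Division]]) -> List[Division]:
    """Fold the groups, which must already be sorted from high to low priority."""
    return reduce(merge_pair, groups, [])
```

Each of the five worked examples in the two rules can be passed through `merge_pair` to check that the sketch reproduces the stated outcome.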
Merging the site region value, the place name library and the city of each bidding information parameter successively according to the first priority in the above manner yields the location of the bidding information item. In individual cases, however, the result may be null: for example, if the three groups of data to be merged are Province A, Province B and Province C respectively, there are more than two provinces and the confidence is low, so no data is retained. Therefore, in the present application, when the location obtained by merging according to the first priority is null, a second priority is set for the site region value and the city of each bidding information parameter, and, on the principle of retaining one city, the location of the bidding information item is either taken directly from the site region value and the city of each bidding information parameter according to the second priority, or obtained by merging them successively.
For example, in one embodiment, the order of the second priority is: project name (four-level full name), project name (three-level short name), purchasing unit (three-level short name), text recognition, and site region value. That is, the city of the project name (four-level full name) is judged first. If it has a one-province, one-city structure, it is used directly as the location of the bidding information item and no further judgment is made. If it has a one-province, multiple-city structure, the province is retained, and the provinces and cities extracted from the subsequent project name (three-level short name), purchasing unit (three-level short name), text recognition and site region value are merged with it. If it has a multi-province structure, all of its provinces and cities are discarded, and provinces and cities are extracted from the subsequent project name (three-level short name), purchasing unit (three-level short name), text recognition and site region value and merged.
The merging method used to obtain the location of the bidding information item is the same as that used with the first priority and is not described again here. However, when the city of the project name and the city of the purchasing unit are extracted based on the place name library, no preprocessing (cleaning of redundant data) is required, so relatively more city fields can be obtained; the site region value is determined only to the province level; and the finally obtained location of the bidding information item contains only the province and the city, with no need to determine the county.
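The second-priority fallback can be sketched in Python as a loop that stops at the first one-province, one-city result. This is a simplified sketch under stated assumptions: the function name `resolve_second_priority` and the `(province, city)` tuple representation are hypothetical, and restricting later sources to the retained province is a simplification standing in for the full merging rules, which the text says are the same as for the first priority.

```python
from typing import List, Optional, Tuple

# (province, city); city is None when only the province is known.
Division = Tuple[str, Optional[str]]


def resolve_second_priority(sources: List[List[Division]]) -> Optional[Division]:
    """Walk the sources in second-priority order and keep one province and city."""
    kept_province: Optional[str] = None
    for divisions in sources:
        if kept_province is not None:
            # Simplification: only entries in the retained province are considered.
            divisions = [d for d in divisions if d[0] == kept_province]
        provinces = list(dict.fromkeys(p for p, _ in divisions))
        cities = list(dict.fromkeys(c for _, c in divisions if c is not None))
        if len(provinces) != 1:
            # Multi-province (or empty) result: discard it and try the next source.
            continue
        if len(cities) == 1:
            # One province, one city: use it directly and stop.
            return (provinces[0], cities[0])
        # One province with several (or no) cities: retain the province only.
        kept_province = provinces[0]
    return (kept_province, None) if kept_province is not None else None
```

With the order given in the embodiment, `sources` would hold the results for the project name (four-level full name) first and the site region value last.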
Therefore, the method for extracting the location of a bidding information item provided by the present application uses the site region value of the preset bidding information release platform, the preset place name library comprising the administrative division tree, and the collected web text or structured data of the project address, the project name and the purchasing unit, and extracts the location of the bidding information item according to defined extraction and merging rules, so that local users can be served accurately.
Based on the same inventive concept, an embodiment of the present application further provides a device for extracting the location of a bidding information item. Because the device solves the problem on a principle similar to that of the method for extracting the location of a bidding information item in the embodiments of the present application, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
The acquisition module 501 is configured to collect a plurality of data to be identified for the location of the bidding information item; the plurality of data to be identified comprise the site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
the extraction module 502 is configured to extract three-level administrative divisions from each bidding information parameter, and to merge the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter; wherein the bidding information parameters comprise one or more of the project address, the project name and the purchasing unit;
and the merging module 503 is configured to set a first priority for the site region value, the place name library and the city of each bidding information parameter, and to merge the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information item.
In some embodiments, the extraction module 502 extracts three-level administrative divisions from each bidding information parameter by:
preprocessing each bidding information parameter, wherein the preprocessing comprises collecting auxiliary addresses of the project address, cleaning redundant fields of the project name, and extracting characteristic characters from the purchasing unit;
comparing each preprocessed bidding information parameter with the place name library to obtain the three-level administrative divisions of each bidding information parameter; the three levels of the three-level administrative division are province, city and county.
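A minimal Python sketch of comparing a preprocessed parameter with the place name library is given below, assuming the administrative division tree is held as a nested dictionary and that matching is plain substring search. The tree contents are a tiny toy sample, and `PLACE_TREE`, `Match` and `extract_divisions` are hypothetical names; the application does not specify the matching technique, so this is an illustration rather than the method itself.

```python
from typing import Dict, List, Optional, Tuple

# (province, city, county); lower levels are None when they are not matched.
Match = Tuple[str, Optional[str], Optional[str]]

# Toy administrative division tree: province -> city -> counties/districts.
PLACE_TREE: Dict[str, Dict[str, List[str]]] = {
    "Hebei": {"Shijiazhuang": ["Chang'an"], "Baoding": ["Lianchi"]},
    "Shandong": {"Jinan": ["Lixia"]},
}


def extract_divisions(
    text: str, tree: Dict[str, Dict[str, List[str]]] = PLACE_TREE
) -> List[Match]:
    """Return every province/city/county path whose name occurs in the text."""
    matches: List[Match] = []
    for province, cities in tree.items():
        city_matched = False
        for city, counties in cities.items():
            hit_counties = [county for county in counties if county in text]
            if hit_counties:
                # A county match fixes its city and province through the tree.
                matches.extend((province, city, county) for county in hit_counties)
                city_matched = True
            elif city in text:
                matches.append((province, city, None))
                city_matched = True
        if province in text and not city_matched:
            matches.append((province, None, None))
    return matches
```

Walking the tree top-down means a match at a lower level always carries its parent levels with it, which is what lets a bare county or city name be completed into a full three-level division.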
In some embodiments, the extraction module 502 merges the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter by:
retaining or discarding the plurality of three-level administrative divisions according to the structure of the plurality of three-level administrative divisions extracted from each bidding information parameter;
if the extracted three-level administrative divisions have a structure with more than two provinces, the extracted three-level administrative divisions are discarded; if they have a one-province, multiple-city structure, the province is retained; and if they have a one-province, one-city structure, both the province and the city are retained.
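The three retain-or-discard cases can be sketched in Python as follows. The function name `collapse_divisions` and the `(province, city)` tuple representation are assumptions of this sketch. Inputs that match none of the three stated cases (for example, exactly two provinces, or a province with no city) are returned unchanged apart from de-duplication; that fallback is also an assumption, since the text does not cover those cases.

```python
from typing import List, Optional, Tuple

# (province, city); city is None when only the province is known.
Division = Tuple[str, Optional[str]]


def collapse_divisions(divisions: List[Division]) -> List[Division]:
    """Apply the three retain-or-discard cases to one parameter's extractions."""
    unique = list(dict.fromkeys(divisions))
    provinces = list(dict.fromkeys(p for p, _ in unique))
    cities = list(dict.fromkeys(c for _, c in unique if c is not None))
    if len(provinces) > 2:
        # More than two provinces: discard everything.
        return []
    if len(provinces) == 1 and len(cities) > 1:
        # One province, multiple cities: keep only the province.
        return [(provinces[0], None)]
    if len(provinces) == 1 and len(cities) == 1:
        # One province, one city: keep both.
        return [(provinces[0], cities[0])]
    # Not covered by the stated cases: pass through unchanged (assumption).
    return unique
```

For instance, `collapse_divisions([("A", "1"), ("A", "2")])` keeps only Province A, matching the one-province, multiple-city case.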
In some embodiments, the merging module 503 sets the first priority for the site region value, the place name library and the city of each bidding information parameter by:
determining the confidence ranking of the site region value, the place name library and each bidding information parameter according to the degree of confidence with which the different categories of data to be identified, and the different information within each category, indicate the location of the bidding information item;
setting the first priority according to the confidence ranking of the site region value, the place name library and each bidding information parameter; wherein the higher the confidence, the higher the first priority;
wherein the confidence ranking of the site region value, the place name library and each bidding information parameter differs with the type of purchasing unit; and the bidding information parameters further comprise one or more of the approval department/release department, the purchasing unit address, the title and the postal code/fixed-line telephone number.
In some embodiments, the merging module 503 merges the site region value, the place name library and the city of each bidding information parameter successively according to the first priority by:
if the preceding three-level administrative division has a structure containing no more than two provinces and the following three-level administrative division has a structure containing a new province, discarding the new province during merging;
if, of the two three-level administrative divisions to be merged, one has a two-level province-city structure and the other has a one-level structure of the same province, retaining the three-level administrative division with the two-level province-city structure.
In some embodiments, the device further includes a judging module, configured to judge whether the obtained location of the bidding information item is null; and, if it is null, to set a second priority for the site region value and the city of each bidding information parameter, and to merge the site region value and the city of each bidding information parameter successively according to the second priority to obtain the location of the bidding information item.
The device for extracting the location of a bidding information item collects, through the acquisition module, a plurality of data to be identified for the location of the bidding information item, the plurality of data to be identified comprising the site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters; extracts, through the extraction module, three-level administrative divisions from each bidding information parameter and merges the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter; and merges, through the merging module, the site region value, the place name library and the city of each bidding information parameter successively according to a preset first priority to obtain the location of the bidding information item, so that local users can be served accurately.
Based on the same inventive concept, Fig. 6 shows the structure of an electronic device 600 according to an embodiment of the present application. The electronic device 600 includes at least one processor 601, at least one network interface 604 or other user interface 603, a memory 605, and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between these components. The electronic device 600 optionally includes the user interface 603, which includes a display (e.g., a touch screen, LCD, CRT, holographic display or projector) and a keyboard or pointing device (e.g., a mouse, trackball, touch pad or touch screen).
Memory 605 may include read-only memory and random access memory and provide instructions and data to processor 601. A portion of the memory 605 may also include non-volatile random access memory (NVRAM).
In some implementations, the memory 605 stores the following elements, executable modules or data structures, or a subset or an extended set thereof:
an operating system 6051 containing various system programs for implementing various basic services and handling hardware-based tasks;
an application program module 6052 containing various application programs, such as a desktop, a media player and a browser, for implementing various application services.
In the embodiment of the present application, the processor 601 executes the steps of the method for extracting the location of a bidding information item by calling a program or instructions stored in the memory 605, so that local users can be served accurately.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for extracting the location of a bidding information item.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk, and when the computer program on the storage medium is executed, the method for extracting the location of a bidding information item can be performed.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application and are not intended to limit its scope. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art will understand that anyone familiar with the technical field may, within the technical scope disclosed by the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

1. A method for extracting the location of a bidding information item, the method comprising the following steps:
collecting a plurality of data to be identified for the location of the bidding information item; the plurality of data to be identified comprise the site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
extracting three-level administrative divisions from each bidding information parameter, and merging the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter; wherein the bidding information parameters comprise one or more of the project address, the project name and the purchasing unit;
setting a first priority for the site region value, the place name library and the city of each bidding information parameter, and merging the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information item.
2. The method for extracting the location of a bidding information item according to claim 1, wherein the extracting three-level administrative divisions from each bidding information parameter comprises the following steps:
preprocessing each bidding information parameter, wherein the preprocessing comprises collecting auxiliary addresses of the project address, cleaning redundant fields of the project name, and extracting characteristic characters from the purchasing unit;
comparing each preprocessed bidding information parameter with the place name library to obtain the three-level administrative divisions of each bidding information parameter; the three levels of the three-level administrative division are province, city and county.
3. The method for extracting the location of a bidding information item according to claim 2, wherein the merging the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter comprises the following steps:
retaining or discarding the plurality of three-level administrative divisions according to the structure of the plurality of three-level administrative divisions extracted from each bidding information parameter;
if the extracted three-level administrative divisions have a structure with more than two provinces, discarding the extracted three-level administrative divisions;
if the extracted three-level administrative divisions have a one-province, multiple-city structure, retaining the province of the extracted three-level administrative divisions;
and if the extracted three-level administrative divisions have a one-province, one-city structure, retaining the province and the city of the extracted three-level administrative divisions.
4. The method for extracting the location of a bidding information item according to claim 3, wherein the setting a first priority for the site region value, the place name library and the city of each bidding information parameter comprises the following steps:
determining the confidence ranking of the site region value, the place name library and each bidding information parameter according to the degree of confidence with which the different categories of data to be identified, and the different information within each category, indicate the location of the bidding information item;
setting the first priority according to the confidence ranking of the site region value, the place name library and each bidding information parameter; wherein the higher the confidence, the higher the first priority.
5. The method according to claim 4, wherein the confidence ranking of the site region value, the place name library and each bidding information parameter differs with the type of purchasing unit; and the bidding information parameters further comprise one or more of the approval department/release department, the purchasing unit address, the title and the postal code/fixed-line telephone number.
6. The method according to claim 5, wherein the site region value, the place name library and the city of each bidding information parameter are merged successively according to the first priority in the following manner:
if the preceding three-level administrative division has a structure containing no more than two provinces and the following three-level administrative division has a structure containing a new province, the new province is discarded during merging;
if, of the two three-level administrative divisions to be merged, one has a two-level province-city structure and the other has a one-level structure of the same province, the three-level administrative division with the two-level province-city structure is retained.
7. The method according to claim 6, further comprising the following steps:
judging whether the obtained location of the bidding information item is null;
and if the obtained location of the bidding information item is null, setting a second priority for the site region value and the city of each bidding information parameter, and merging the site region value and the city of each bidding information parameter successively according to the second priority to obtain the location of the bidding information item.
8. A device for extracting the location of a bidding information item, the device comprising:
an acquisition module, configured to collect a plurality of data to be identified for the location of the bidding information item; the plurality of data to be identified comprise the site region value of a preset bidding information release platform, a preset place name library comprising an administrative division tree, and bidding data comprising various bidding information parameters;
an extraction module, configured to extract three-level administrative divisions from each bidding information parameter, and to merge the plurality of three-level administrative divisions extracted from each bidding information parameter to obtain the city of each bidding information parameter; wherein the bidding information parameters comprise one or more of the project address, the project name and the purchasing unit;
and a merging module, configured to set a first priority for the site region value, the place name library and the city of each bidding information parameter, and to merge the site region value, the place name library and the city of each bidding information parameter successively according to the first priority to obtain the location of the bidding information item.
9. An electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the method for extracting the location of a bidding information item according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, performs the steps of the method for extracting the location of a bidding information item according to any one of claims 1 to 7.
CN202310645158.3A 2023-06-02 2023-06-02 Method, device, equipment and medium for extracting location of mark information item Active CN116384948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310645158.3A CN116384948B (en) 2023-06-02 2023-06-02 Method, device, equipment and medium for extracting location of mark information item

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310645158.3A CN116384948B (en) 2023-06-02 2023-06-02 Method, device, equipment and medium for extracting location of mark information item

Publications (2)

Publication Number Publication Date
CN116384948A true CN116384948A (en) 2023-07-04
CN116384948B CN116384948B (en) 2023-08-25

Family

ID=86971391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310645158.3A Active CN116384948B (en) 2023-06-02 2023-06-02 Method, device, equipment and medium for extracting location of mark information item

Country Status (1)

Country Link
CN (1) CN116384948B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001337952A (en) * 2000-05-29 2001-12-07 Fukuchiyama Kotani Sangyo Kk Local site retrieval system on internet
CN101651634A (en) * 2008-08-13 2010-02-17 阿里巴巴集团控股有限公司 Method and system for providing regional information
CN105224622A (en) * 2015-09-22 2016-01-06 中国搜索信息科技股份有限公司 The place name address extraction of Internet and standardized method
KR20170029135A (en) * 2015-09-07 2017-03-15 민부근 Integrated information search method based on administrative district map
CN110148043A (en) * 2019-03-01 2019-08-20 安徽省优质采科技发展有限责任公司 The bid and purchase information recommendation system and recommended method of knowledge based map
CN113128218A (en) * 2021-04-27 2021-07-16 华世界数字科技(深圳)有限公司 Key field extraction method and device for bidding information
CN113947337A (en) * 2021-12-20 2022-01-18 中通服建设有限公司 Method for cooperatively sharing engineering construction resources
CN115422884A (en) * 2022-08-15 2022-12-02 广州众成大数据科技有限公司 Method, system, equipment and storage medium for processing beacon data

Also Published As

Publication number Publication date
CN116384948B (en) 2023-08-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant