CN110019797A - Data classification method and device - Google Patents


Info

Publication number
CN110019797A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711131428.XA
Other languages
Chinese (zh)
Inventor
黄双全
唐玉建
范英
康凯
郝瑞朝
邹继文
王鑫
顾智海
袁新武
�田�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Research Center Of Ministry Of Public Security Huzhengguanli
Aisino Corp
Original Assignee
Research Center Of Ministry Of Public Security Huzhengguanli
Aisino Corp
Application filed by Research Center Of Ministry Of Public Security Huzhengguanli, Aisino Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data classification method and device. The data classification method includes: obtaining information to be classified; determining the address information record corresponding to the address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to that address information record; or, determining the keyword information record corresponding to a keyword in the address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to that keyword information record. The data classification method makes it possible to classify the urban and rural attribute of population information data.

Description

Data classification method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data classification method and apparatus.
Background
According to the working opinions on population information data analysis in the medium-term evaluation of the reform of the cooperative household registration system, the demand analysis report on population management big data application-level modeling technology research, and the Ministry of Public Security's review of that demand analysis, the change in the urbanization rate of the registered household population nationwide since 2016 is analyzed.
The urban and rural classification information of household registrations is manually marked and reported by the population management departments of public security organs at each level. Because provincial units use different population management information systems, basic population information maintenance and service change filings are not cross-verified, and the urban and rural classification in household registration service information is identified manually, the population data has serious quality problems. The urban and rural classification of household registration migrations is identified with low completeness and accuracy, the urban and rural classification of other change information is not identified at all, and updates to household registration addresses lag far behind changes to actual addresses. As a result, statistics on the rate of change of the urban population are difficult to compile accurately and reliably.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data classification method and apparatus to solve the problem in the prior art that the urban and rural classification attributes of population information data are incomplete.
The embodiment of the invention provides a data classification method, which comprises the following steps: acquiring information to be classified; determining an address information record corresponding to address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
Optionally, determining an address information record corresponding to address data in the information to be classified, and determining a household registration classification attribute of the information to be classified according to the address information record corresponding to the address data, includes: determining a standard address information record matched with the address data, wherein the standard address information record comprises standard address information data and a corresponding household registration classification attribute; and determining the household registration classification attribute in the matched standard address information record as the household registration classification attribute of the information to be classified.
Optionally, determining an address information record corresponding to address data in the information to be classified, and determining a household registration classification attribute of the information to be classified according to the address information record corresponding to the address data, further includes: if no standard address information record matches the address data, determining the province, city and county data in the administrative division urban and rural classification information record that matches the province, city and county data of the address data, and if the household registration classification attribute corresponding to the matched province, city and county data is a town, determining the household registration classification attribute of the information to be classified as the town; or, if the household registration classification attribute corresponding to the matched province, city and county data is not a town, determining the village and town name data that matches the village and town name data of the address data among the village and town name data contained in the zone detailed address corresponding to the matched province, city and county data, and if the household registration classification attribute of the matched village and town name data is definite, determining the household registration classification attribute of the information to be classified as the household registration classification attribute of the matched village and town name data; or, if the household registration classification attribute of the matched village and town name data is ambiguous, determining the village committee or resident committee name data that matches the village committee or resident committee name data of the address data among the village committee or resident committee name data contained in the zone detailed address corresponding to the matched village and town name data, and determining the household registration classification attribute of the information to be classified as the household registration classification attribute corresponding to the matched village committee or resident committee name data.
Optionally, determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data includes: extracting a keyword in the address data if village name data contained in the zone detailed address corresponding to the matched province, city and county data does not match with village name data of the address data, or if village committee or resident committee name data contained in the zone detailed address corresponding to the matched village name data does not match with village committee or resident committee name data of the address data; and determining keyword data matched with the keywords in the address data in the keyword information record, wherein the keyword information record comprises the keyword data and the corresponding household registration classification attribute, and determining the household registration classification attribute of the information to be classified as the household registration classification attribute corresponding to the matched keyword data.
Optionally, determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data, further includes: if no corresponding keyword information record exists or the address data content is empty, determining the record with the earliest updating time in the household registration population basic information urban and rural classification information table and the household registration population historical information urban and rural classification table according to the identity number data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the record.
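The identity-number fallback described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`id_number`, `attribute`, `updated`) and the use of in-memory lists in place of the two urban and rural classification tables are hypothetical and are not specified by the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClassificationRecord:
    """Hypothetical row of an urban and rural classification table."""
    id_number: str   # identity number of the person
    attribute: str   # household registration classification attribute
    updated: date    # update time of the record


def attribute_from_earliest_record(
    id_number: str,
    basic: Iterable[ClassificationRecord],
    history: Iterable[ClassificationRecord],
) -> Optional[str]:
    """Pick the record with the earliest update time across both tables."""
    candidates = [r for r in (*basic, *history) if r.id_number == id_number]
    if not candidates:
        return None  # no record for this identity number in either table
    return min(candidates, key=lambda r: r.updated).attribute
```

Returning `None` when neither table holds a record is a design choice of this sketch; the text does not say what happens in that case.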
Optionally, the method further comprises: if the information to be classified is determined not to contain address data, determining the type of the information to be classified; determining corresponding type information records matched in the household registration population basic information according to the types of the information to be classified, and determining household registration classification attributes of the information to be classified according to the household registration classification attributes of the corresponding type information records matched in the household registration population basic information; or if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population basic information, determining the corresponding type information record matched in the household registration population historical information according to the type of the information to be classified, and determining the household registration classification attribute of the information to be classified according to the household registration classification attribute of the corresponding type information record matched in the household registration population historical information; or if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population history information, determining that the information to be classified is empty.
According to another aspect of the present invention, there is provided a data classification apparatus comprising: an acquisition module, configured to acquire information to be classified; and a first determining module, configured to determine an address information record corresponding to address data in the information to be classified and to determine the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or a second determining module, configured to determine a keyword information record corresponding to a keyword in address data in the information to be classified, and to determine the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
According to the data classification scheme provided by the embodiment of the invention, the address data in the information to be classified is matched with the address information record and/or the keyword information record, the corresponding address information record or the corresponding keyword information record is determined, and the household registration classification attribute of the information to be classified is determined according to the corresponding address information record or keyword information record.
When the data classification scheme is used for population data analysis or for analyzing changes in the urbanization rate of the registered household population, it avoids the low data quality and poor statistical reliability caused by the inconsistent management information systems currently used by population management departments and by data entry errors, omissions and late updates. This makes the analysis of changes in the household registration urbanization rate easier to carry out and improves its reliability and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a data classification method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data classification method according to a second embodiment of the present invention;
Fig. 3 is a block diagram of a data classification apparatus according to a third embodiment of the present invention;
Fig. 4 is a block diagram of a data classification apparatus according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Fig. 1 is a flowchart of a data classification method according to a first embodiment of the present invention. As shown in Fig. 1, the data classification method of the first embodiment includes the following steps:
S101: Acquire information to be classified.
When data classification is carried out, the information to be classified is obtained first. The information to be classified may be any information, such as sales data or attendance data, depending on the application scenario of the data classification method. This embodiment takes a population big data analysis scenario as an example, in which the information to be classified may be basic population information, household registration management business information, or any other population information existing or generated in the population management process.
S102a: Determine the address information record corresponding to the address data in the information to be classified, and determine the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data.
The content contained in different information to be classified may be different. Taking basic population information as an example, the basic population information may include content parameters such as name, identification number, native place, household address, and household classification attribute. Taking the household management service information as an example, the content parameters contained in the household management service information may be different according to different types of household management services. For example, the birth registration service information may include content parameters such as name, identification number, household address, birth date, etc.
Information to be classified that contains address data can be classified according to that address data. For example, the household registration classification attribute (e.g., the urban and rural classification attribute) of the information to be classified is determined from the address data it contains. Because a person's urban and rural classification attribute is associated with the household address, such a classification is reliable.
In this embodiment, information whose household registration classification attribute is blank or unknown is classified by first determining the address information record corresponding to its address data and then determining its household registration classification attribute from that record.
The address information record may associate the address with the household registration classification attribute, so as to generate a mapping relationship between the address and the household registration classification attribute, where the mapping relationship may be a one-to-one mapping, a one-to-many mapping, or a many-to-one mapping.
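A minimal sketch of such a mapping, assuming a plain in-memory dictionary and invented placeholder addresses (neither is prescribed by the text), could look like this. It shows the many-to-one case, where several addresses share the same attribute.

```python
from typing import Optional

# Hypothetical address information records: each address maps to one
# household registration classification attribute (many-to-one mapping).
# "rural" stands for what the text calls "country".
ADDRESS_RECORDS = {
    "Example County, No. 1 Example Road": "town",
    "Example County, No. 2 Example Road": "town",
    "Example County, Example Village": "rural",
}


def classify_by_address(address: str) -> Optional[str]:
    """Return the attribute of the matching address record, or None."""
    return ADDRESS_RECORDS.get(address.strip())
```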
Alternatively, the household classification attribute of the information to be classified may be determined through step S102 b.
S102b: Determine a keyword information record corresponding to a keyword in the address data in the information to be classified, and determine the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
Address data and the household registration classification attribute of a person are correlated: for example, when the household address lies in the countryside, the household registration classification attribute is very likely to be country (i.e., the person belongs to the rural population). The household registration classification attribute can therefore be determined from the keywords in the address data of the information to be classified.
In this embodiment, the keyword information record associates a keyword with the corresponding household registration classification attribute. The keyword may be, without limitation, "residential community" ("cell"), "unit", "building", "village", and the like. The household registration classification attribute may be an urban and rural classification attribute, such as town or country (rural), or any other attribute that a given data classification requirement calls for. The keyword information record may record the probability of each keyword's household registration classification attribute, for example a 98% probability that the attribute for the residential community keyword is a town. The keyword information record may also associate a keyword with the attribute in other ways, such as directly assigning town as the attribute of the residential community keyword.
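The probability-based keyword record can be illustrated with a short sketch. Only the 98% figure for the residential-community ("cell") keyword comes from the text; the other keywords' probabilities, the substring matching, and the rule of taking the most probable hit are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical keyword information records: keyword -> (attribute, probability).
KEYWORD_RECORDS = {
    "residential community": ("town", 0.98),  # 98% figure taken from the text
    "building": ("town", 0.90),               # assumed value
    "village": ("rural", 0.95),               # assumed value; "rural" = "country"
}


def classify_by_keyword(address: str) -> Optional[Tuple[str, float]]:
    """Return (attribute, probability) of the most probable matching keyword."""
    text = address.lower()
    hits = [record for keyword, record in KEYWORD_RECORDS.items() if keyword in text]
    if not hits:
        return None  # no keyword information record applies
    return max(hits, key=lambda record: record[1])
```

Real address data would be Chinese text and would likely need proper keyword extraction rather than substring tests; the sketch only shows the shape of the lookup.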
By determining the keyword information record corresponding to a keyword in the address data of the information to be classified, and determining the household registration classification attribute from that keyword information record, the household registration classification attribute of the information to be classified can be determined quickly and reliably.
Of course, the step S102b may be executed when the household classification attribute of the information to be classified cannot be determined through the step S102a, or may be executed separately.
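The case where step S102b runs only after step S102a fails can be sketched as a simple fallback chain. The two lookup tables and their contents are hypothetical placeholders, and exact-match address lookup is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical records; "rural" stands for what the text calls "country".
ADDRESS_RECORDS = {"Example County, No. 1 Example Road": "town"}
KEYWORD_RECORDS = {"village": "rural", "building": "town"}


def classify(address: str) -> Optional[str]:
    """Try the address record first (S102a), then fall back to keywords (S102b)."""
    # S102a: look for an address information record matching the address data.
    attribute = ADDRESS_RECORDS.get(address)
    if attribute is not None:
        return attribute
    # S102b: fall back to the keyword information records.
    text = address.lower()
    for keyword, keyword_attribute in KEYWORD_RECORDS.items():
        if keyword in text:
            return keyword_attribute
    return None  # neither step could determine the attribute
```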
In the data classification method of this embodiment, address data in the information to be classified is matched with the address information record and/or the keyword information record, so as to determine the corresponding address information record or the corresponding keyword information record, and then the household registration classification attribute of the information to be classified is determined according to the corresponding address information record or the corresponding keyword information record.
When the data classification method is used for population data analysis or for analyzing changes in the urbanization rate of the registered household population, it avoids the low data quality and poor statistical reliability caused by the inconsistent management information systems currently used by population management departments and by data entry errors, omissions and late updates. This makes the analysis of changes in the household registration urbanization rate easier to carry out and improves its reliability and accuracy.
Example two
Fig. 2 is a flowchart of a data classification method according to a second embodiment of the present invention. As shown in fig. 2, the data classification method includes the steps of:
S201: Acquire information to be classified.
The manner of obtaining the information to be classified may be to receive the information to be classified from the outside, or to read the information to be classified from a local storage location, or to download the information to be classified from a cloud, another server, or the like.
The information to be classified may be, but is not limited to, basic household population information, history household population information, household management service information, and the like.
The content parameters contained in different information to be classified differ. The basic household population information and the history household population information both include address data, such as household addresses and residential addresses. The household management service information may or may not include address data, depending on the specific management service. Household registration management services that include address data include, but are not limited to, birth registration, migration-in household addresses, migration-out addresses, missed registration, refund transfer, homeland returning and living, criminal education, missing logout and the like. Household registration management services that do not include address data include, but are not limited to, death logout information, outbound information, re-login information, military information, and the like.
The household registration classification attribute can be determined in different ways according to different information to be classified.
The method for classifying different types of data will be described in detail below by taking the basic information of household population, the information of household management service containing address data, and the information of household management service not containing address data as examples.
If the information to be classified includes address data, such as the basic information of the household population and the household management service information including the address data, step S202 may be executed.
S202: Determine the address information record corresponding to the address data in the information to be classified, and determine the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data.
Taking the information to be classified as the basic information of the household population as an example, the step S202 includes:
S202a: Determine the standard address information record matching the address data, where the standard address information record includes standard address information data and the corresponding household registration classification attribute, and determine the household registration classification attribute in the matched standard address information record as the household registration classification attribute of the information to be classified.
When determining the standard address information record matching the address data, the province, city and county code and the area detailed address in the household address (i.e., the address data) of the household population basic information (i.e., the information to be classified) are matched against the standard address information records. If a matching record exists, its urban and rural classification code is assigned to the household registration classification attribute of the information to be classified.
The standard address information records can be obtained by big data learning or data analysis. For example, in this embodiment they can be obtained by analyzing the household registration urban and rural classification information table, the address household registration urban and rural classification statistics table, and the urban and rural classification identifiers of household addresses. The records may be stored as a table, in a database, or in any other feasible manner.
Taking a table as an example, the table includes at least one standard address information record, and each record includes at least the standard address information data and the corresponding household registration classification attribute, both of which may be fields in the table. The standard address information data may store the province, city and county code and the area detailed address, or any other form capable of identifying an address. The household registration classification attribute may store an urban and rural classification code or another identifier of the urban or rural category, such as "town" or "country", or a code such as "100" for a town and "200" for the country.
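To make the table form concrete, the following sketch stores standard address information records in an in-memory SQLite table and looks up the urban and rural classification code by exact match. The codes "100" and "200" come from the text; the table name, column names, region code and addresses are hypothetical.

```python
import sqlite3
from typing import Optional


def build_standard_address_table() -> sqlite3.Connection:
    """Create an in-memory table holding hypothetical standard address records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE standard_address ("
        " region_code TEXT,"        # province, city and county code
        " detail_address TEXT,"     # area detailed address
        " urban_rural_code TEXT,"   # '100' = town, '200' = country (rural)
        " PRIMARY KEY (region_code, detail_address))"
    )
    conn.executemany(
        "INSERT INTO standard_address VALUES (?, ?, ?)",
        [
            ("000001", "No. 1 Example Road", "100"),
            ("000001", "Example Village Group 2", "200"),
        ],
    )
    return conn


def lookup_standard_address(
    conn: sqlite3.Connection, region_code: str, detail_address: str
) -> Optional[str]:
    """Return the urban and rural classification code of the matching record."""
    row = conn.execute(
        "SELECT urban_rural_code FROM standard_address"
        " WHERE region_code = ? AND detail_address = ?",
        (region_code, detail_address),
    ).fetchone()
    return row[0] if row else None
```

A `None` result corresponds to the case where no standard address information record matches and further matching is needed.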
In step S202, the address data may match no standard address information record. In that case the address data can be matched against the administrative division urban and rural classification information table, and the household registration classification attribute of the information to be classified is determined according to that table.
Step S202 further includes:
S202b: If no standard address information record matches the address data, determine the province, city and county data in the administrative division urban and rural classification information record that matches the province, city and county data of the address data; if the household registration classification attribute corresponding to the matched province, city and county data is a town, determine the household registration classification attribute of the information to be classified as a town.
When the address data matches no standard address information record, it must be matched against the address information in the administrative division urban and rural classification information table. The table contains at least one information record, and each record includes at least an administrative division address and the household registration classification attribute (e.g., the urban and rural classification attribute) corresponding to that address. If the urban and rural classification attribute corresponding to the matched province, city and county is a town, the household registration classification attribute of the information to be classified that contains the address data is assigned the town attribute.
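Step S202b can be sketched as a county-level lookup that yields a result only when the matched division is classified as a town. The region codes and the "mixed" marker for divisions that are not wholly a town are hypothetical.

```python
from typing import Optional

# Hypothetical administrative division urban and rural classification records:
# province, city and county code -> classification of the whole division.
ADMIN_DIVISION_RECORDS = {
    "000001": "town",
    "000002": "mixed",  # assumed marker for a division that is not wholly a town
}


def classify_by_county(region_code: str) -> Optional[str]:
    """Return "town" if the matched division is a town; otherwise None."""
    if ADMIN_DIVISION_RECORDS.get(region_code) == "town":
        return "town"
    return None  # further matching on village and town names is required
```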
Or, there may be a case where the urban and rural classification data corresponding to the matched province, city and county is not a town.
S202c: if the household registration classification attribute corresponding to the matched province, city and county data is not a town, determining, among the village and town name data contained in the area detailed address corresponding to the matched province, city and county data, the village and town name data that matches the village and town name data of the address data; if the household registration classification attribute of the matched village and town name data is definite, the household registration classification attribute of the information to be classified is that of the matched village and town name data.
If the household registration classification attribute corresponding to the matched province, city and county data is not a town, the attribute cannot be determined directly from that data, and the address data must be matched further. For example, the area detailed address in the address data is searched for the name of a village or town under the jurisdiction of the matched province, city and county. If no matching village and town name data is found, matching proceeds directly by the keywords in the address data (this step is described in detail later). If a match is found, the area detailed address in the address data contains the corresponding village or town name, and the household registration classification attribute of the information to be classified can be determined from the household registration classification attribute corresponding to the matched village and town name data.
For example, if the household registration classification attribute corresponding to the matched village and town name data is definitely town, rural, or another type, that attribute is taken as the household registration classification attribute of the information to be classified.
Alternatively, the household registration classification attribute corresponding to the matched village and town name data may be ambiguous or unfilled. In this case, step S202d is executed.
S202d: if the household registration classification attribute of the matched village and town name data is not definite, determining, among the village committee or residents' committee name data contained in the area detailed address corresponding to the matched village and town name data, the name data that matches the village committee or residents' committee name data of the address data; the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched village committee or residents' committee name data.
If the household registration classification attribute corresponding to the village and town name data is uncertain, that is, the urban-rural classification of the village or town is uncertain, the area detailed address of the address data is further searched for the name of a village committee or residents' committee belonging to that village or town. If none is found, matching proceeds by keywords (described in detail later). If a matching village committee or residents' committee name is found, the household registration classification attribute corresponding to the matched name data is assigned as the household registration classification attribute of the information to be classified. For example, if the urban-rural classification attribute corresponding to the matched village committee or residents' committee name data is town, the household registration classification attribute of the information to be classified is determined to be town.
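The cascade of steps S202b to S202d can be illustrated with a minimal Python sketch. The table layout, the place names and the function name `classify_by_division` are hypothetical and chosen only for illustration; the source does not specify how the administrative division urban and rural classification information table is stored, and plain substring matching stands in for whatever address parsing a real system would use.

```python
from typing import Optional

TOWN = "town"

# Hypothetical in-memory stand-in for the administrative division urban and
# rural classification information table (all names are invented examples).
DIVISION_TABLE = {
    "Province A City B County C": {
        "attribute": "rural",
        "townships": {
            "Township D": {
                "attribute": None,  # ambiguous or unfilled attribute
                "committees": {
                    "Village E Committee": "rural",
                    "Community F Residents Committee": "town",
                },
            },
            "Township G": {"attribute": "town", "committees": {}},
        },
    },
    "Province A City B District H": {"attribute": "town", "townships": {}},
}


def classify_by_division(address: str) -> Optional[str]:
    """Return the attribute, or None when keyword matching (S203) is needed."""
    for division, entry in DIVISION_TABLE.items():
        if not address.startswith(division):
            continue
        # S202b: the province/city/county level is itself classified as town.
        if entry["attribute"] == TOWN:
            return TOWN
        detail = address[len(division):]
        # S202c: look for a village and town name in the area detailed address.
        for name, township in entry["townships"].items():
            if name not in detail:
                continue
            if township["attribute"] is not None:
                return township["attribute"]
            # S202d: attribute not definite, fall through to committee names.
            for committee, attribute in township["committees"].items():
                if committee in detail:
                    return attribute
            return None  # no committee matched
        return None  # no village or town matched
    return None  # no province/city/county matched


print(classify_by_division("Province A City B County C Township D Village E Committee Group 2"))
```

A return value of `None` corresponds to the cases in which the method falls back to keyword matching in step S203.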
If no matching village and town name data, or no matching village committee or residents' committee name data, is found in the area detailed address of the address data, step S203 is executed.
S203: determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
The keywords in the address data of the information to be classified may be determined by keyword search: for example, by checking the keywords in the keyword information records one by one until one of them is found in the address data, or by traversing all keywords in the keyword information records.
Specifically, step S203 includes:
S203a: if the village and town name data contained in the area detailed address corresponding to the matched province, city and county data does not match the village and town name data of the address data, or the village committee or residents' committee name data contained in the area detailed address corresponding to the matched village and town name data does not match the village committee or residents' committee name data of the address data, extracting the keyword in the address data.
S203b: determining keyword data in the keyword information record that matches the keyword in the address data, wherein the keyword information record comprises the keyword data and the corresponding household registration classification attribute, and determining the household registration classification attribute of the information to be classified as the household registration classification attribute corresponding to the matched keyword data.
The address data is searched for the keywords in the keyword information records. Once a keyword information record is matched, the household registration classification attribute with the highest probability in that record is taken as the household registration classification attribute of the information to be classified. For example, if the keyword "unit" contained in the area detailed address of the address data has a town probability of 98.87%, the household registration classification attribute of information to be classified whose area detailed address contains the keyword "unit" is town.
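A minimal sketch of this keyword step follows, assuming each keyword information record stores one probability per household registration classification attribute. Only the 98.87% town probability for "unit" comes from the example above; the rural probability, the second keyword and the name `classify_by_keyword` are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical keyword information records: keyword -> probability per attribute.
# Only the 0.9887 town probability for "unit" is taken from the text.
KEYWORD_RECORDS = {
    "unit": {"town": 0.9887, "rural": 0.0113},
    "village group": {"town": 0.05, "rural": 0.95},
}


def classify_by_keyword(detail_address: str) -> Optional[str]:
    """Return the most probable attribute of the first keyword found, else None."""
    for keyword, probabilities in KEYWORD_RECORDS.items():
        if keyword in detail_address:
            # Pick the attribute with the highest probability for this keyword.
            return max(probabilities, key=probabilities.get)
    return None


print(classify_by_keyword("Building 3 unit 2 Room 501"))
```

This sketch stops at the first matching keyword; traversing all keywords and combining their probabilities would be an equally plausible reading of the search described in step S203.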
In step S203, the information to be classified may include an address data field whose content is empty (not filled in), or the address data may fail to match both the standard address information records and the administrative division urban and rural classification information table. In this case, step S203 further includes:
S203c: if no corresponding keyword information record exists or the address data content is empty, determining, according to the identity card number data in the information to be classified, the record with the earliest update time in the household registration population basic information urban and rural classification information table and the household registration population historical information urban and rural classification table, and determining the household registration classification attribute of the information to be classified according to that record.
For information to be classified whose address data is empty or unmatched (matching none of the standard address information records, the administrative division urban and rural classification information table and the keyword information records), the citizen identity number and name are used to associate the "household registration population basic information - urban and rural classification information table" and the "household registration population historical information - urban and rural classification table", and the record with the earliest update time (that is, the record farthest from the present) is selected. The urban-rural classification information of that record is assigned to the household registration classification attribute of the information to be classified, and the address information of that record may also be assigned to the address data of the information to be classified.
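The fallback of step S203c can be sketched as follows. The record layout `ClassifiedRecord` and the function `earliest_record` are hypothetical; the sketch assumes both tables are available as lists of records keyed by identity number and name.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ClassifiedRecord:
    """Hypothetical row shared by the basic and historical classification tables."""
    id_number: str
    name: str
    address: str
    attribute: str
    updated: date


def earliest_record(
    id_number: str,
    name: str,
    basic: List[ClassifiedRecord],
    history: List[ClassifiedRecord],
) -> Optional[ClassifiedRecord]:
    """Return the matching record with the earliest update time, or None."""
    candidates = [
        record
        for record in [*basic, *history]
        if record.id_number == id_number and record.name == name
    ]
    return min(candidates, key=lambda record: record.updated, default=None)


basic = [ClassifiedRecord("ID001", "Zhang San", "Addr A", "town", date(2015, 3, 1))]
history = [ClassifiedRecord("ID001", "Zhang San", "Addr B", "rural", date(2009, 6, 30))]
oldest = earliest_record("ID001", "Zhang San", basic, history)
print(oldest.attribute, oldest.address)
```

The attribute and address of the returned record would then be copied onto the information to be classified.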
For the information to be classified without address data, the method further comprises:
S204: if the information to be classified does not contain address data, determining the type of the information to be classified.
For the household registration management service information which does not contain address data, the type of the management service of the information to be classified is determined. Taking the death cancellation information as an example, the type of the information to be classified is death cancellation.
S205: determining the corresponding type information record matched in the household registration population basic information according to the type of the information to be classified, and determining the household registration classification attribute of the information to be classified according to the household registration classification attribute of the corresponding type information record matched in the household registration population basic information.
The death cancellation information is first matched against the death cancellation information in the household registration population basic information. If a matching record exists, the address data and urban-rural classification information of that record in the household registration population basic information are assigned to the address data and household registration classification attribute of the information to be classified.
The death cancellation information can be matched against the death cancellation information in the household registration population basic information according to the identity card number, the name and the like.
Alternatively, the death cancellation information may not match any death cancellation information in the household registration population basic information. In this case, step S206 is executed.
S206: if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population basic information, determining the corresponding type information record matched in the household registration population historical information according to the type of the information to be classified, and determining the household registration classification attribute of the information to be classified according to the household registration classification attribute of the corresponding type information record matched in the household registration population historical information.
Information to be classified that finds no match among the corresponding type information records in the household registration population basic information is then matched against the corresponding type information records in the household registration population historical information. Taking death cancellation information as an example, the death cancellation information is matched against the death cancellation information in the household registration population historical information. If a match is found, the address data and urban-rural classification information of the matched record in the household registration population historical information are assigned to the information to be classified.
Alternatively, there may be cases where there is still no match. In this case, step S207 is executed.
S207: if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population historical information, determining that the household registration classification attribute of the information to be classified is empty.
For information to be classified that matches neither the household registration population basic information nor the household registration population historical information, the household registration classification attribute is determined to be empty. For example, the household registration classification attribute may be set to 300, or to "null", to indicate that it is empty.
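Steps S204 to S207 amount to a two-stage lookup with an empty-value fallback, which can be sketched as below. The dictionary field names and the function `classify_without_address` are hypothetical; the marker value 300 is the example given in the text, and matching on type, identity number and name is one possible reading of the matching described above.

```python
from typing import List, Optional, Tuple

EMPTY_ATTRIBUTE = "300"  # example marker from the text for an empty attribute


def classify_without_address(
    item: dict, basic: List[dict], history: List[dict]
) -> Tuple[str, Optional[str]]:
    """Return (attribute, address) for information that carries no address data."""

    def find(table: List[dict]) -> Optional[dict]:
        # Match on service type plus identity number and name.
        for record in table:
            if (
                record["type"] == item["type"]
                and record["id_number"] == item["id_number"]
                and record["name"] == item["name"]
            ):
                return record
        return None

    match = find(basic)          # S205: basic information first
    if match is None:
        match = find(history)    # S206: then historical information
    if match is None:
        return EMPTY_ATTRIBUTE, None  # S207: attribute is empty
    return match["attribute"], match["address"]


basic = [{"type": "death cancellation", "id_number": "ID001",
          "name": "Zhang San", "address": "Addr A", "attribute": "town"}]
item = {"type": "death cancellation", "id_number": "ID001", "name": "Zhang San"}
print(classify_without_address(item, basic, []))
```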
Afterwards, the administrative division (city level) of the information to be classified can be determined from the division field of the corresponding type (for example, the division field of the death cancellation record) for subsequent statistics.
In the data classification method of this embodiment, the household registration classification attribute of the information to be classified is determined by comparing the address data in the information to be classified with the standard address information records, the administrative division urban and rural classification information table, and the keyword information records. Information to be classified that has no address data is associated by identifiers such as the identity card number and name and compared with the household registration population basic information, the household registration population historical information and the like, so that its household registration classification attribute is determined. Different types of information to be classified can therefore be classified, which gives better adaptability and reliable, accurate classification.
When the data classification method is used for population data analysis or for analysis of changes in the household registration population urbanization rate, it can avoid the low population data quality and poor statistical reliability that arise because population management departments currently use inconsistent management information systems and because data is filled in incorrectly, omitted, or not updated in time. This facilitates analysis of changes in the household registration urbanization rate and improves the reliability and accuracy of the analysis.
Through address matching and classification attribute identification over population big data, a model is established using the household registration address information, the region to which the household registration population belongs, and the urban-rural classification information in the household registration population basic information. The model yields the urban-rural classification information of the household registration addresses of the national household registration population and of address element sets such as county, township, village and committee. In addition, the urban-rural classification information of common address words is obtained through machine learning and analysis. This provides a foundation for the urban-rural classification of the household registration population basic information, historical information and household registration management service information, and supports the accuracy of classification.
For information to be classified that contains address data, a standard address matching mode is adopted. The standard address matching mode is mainly applied to household registration population basic information, historical information, and household registration management service information containing address information, and mainly comprises standard address information record matching, administrative division urban and rural classification information matching, and keyword matching.
For information to be classified that contains an address data field whose content is empty, the household registration address information and the urban-rural classification information can be obtained, on the basis of the standard address matching mode, by associating the household registration population basic information and historical information. This is mainly applied to records with empty addresses in tables such as birth registration, migration and transfer, refuge and transfer, homeland return and the like.
For information to be classified that does not contain address data, a non-address information matching mode is adopted. The non-address information matching mode is mainly applied to business change information that does not contain address information, such as death, foreign settlement, military service and the like.
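The choice among the three matching modes depends only on whether an address field is present and whether it is filled in, as the following minimal sketch shows. The dictionary representation, the field name `address` and the function `choose_matching_mode` are assumptions made for illustration.

```python
def choose_matching_mode(item: dict) -> str:
    """Pick the matching mode for one piece of information to be classified."""
    if "address" not in item:
        # No address field at all, e.g. death or military service records.
        return "non-address information matching"
    if not (item["address"] or "").strip():
        # Address field present but empty: associate basic and historical records.
        return "standard address matching with record association"
    return "standard address matching"


print(choose_matching_mode({"type": "birth registration", "address": ""}))
```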
The data classification method uses the standard address information records, the keyword urban and rural classification information records and the administrative division urban and rural classification information to correct, completely and comprehensively, the urban-rural classification of the information to be classified and of its addresses, and also fills in the address information and the address urban-rural classification information for data in the service change information. For provinces where the calculated monthly change curve of the household registration population urbanization rate deviates substantially from the annual report data, population data and household registration policy analysis can be carried out: the settlement of the rural transfer population in towns, nationally and in typical provinces, is analyzed; the range and flow direction of the areas into which the rural transfer population moves its household registration are analyzed; and the change in the household registration population urbanization rate and the implementation effect of household registration policies are evaluated. This lays a solid foundation for evaluating policy implementation and provides reliable data support.
Example three
Fig. 3 is a block diagram of a data classification apparatus according to a third embodiment of the present invention. As shown in Fig. 3, the data classification apparatus includes: an obtaining module 301, configured to obtain information to be classified; a first determining module 302, configured to determine an address information record corresponding to address data in the information to be classified, and determine a household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or, a second determining module 303, configured to determine a keyword information record corresponding to a keyword in the address data in the information to be classified, and determine the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
The data classification apparatus determines the corresponding address information record or keyword information record by matching the address data in the information to be classified against the address information records and/or the keyword information records, and then determines the household registration classification attribute of the information to be classified according to the corresponding address information record or keyword information record.
When the data classification apparatus is used for population data analysis or for analysis of changes in the household registration population urbanization rate, it can avoid the low population data quality and poor statistical reliability that arise because population management departments currently use inconsistent management information systems and because data is filled in incorrectly, omitted, or not updated in time. This facilitates analysis of changes in the household registration urbanization rate and improves the reliability and accuracy of the analysis.
Example four
Fig. 4 is a block diagram of a data classification apparatus according to a fourth embodiment of the present invention. As shown in Fig. 4, the data classification apparatus includes: an obtaining module 401, configured to obtain information to be classified; a first determining module 402, configured to determine an address information record corresponding to address data in the information to be classified, and determine a household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or, a second determining module 403, configured to determine a keyword information record corresponding to a keyword in the address data in the information to be classified, and determine the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
Optionally, the first determining module 402 includes: the first determining sub-module 4021 is configured to determine a standard address information record matched with the address data, where the standard address information record includes the standard address information data and a corresponding household classification attribute, and determine the household classification attribute in the matched standard address information record as the household classification attribute of the information to be classified.
Optionally, the first determining module 402 further includes: a second determining sub-module 4022, configured to determine, if the standard address information record does not match the address data, province, city and county data in the administrative division urban and rural classification information record that matches the province, city and county data of the address data, and if the household registration classification attribute corresponding to the matched province, city and county data is a town, determine the household registration classification attribute of the information to be classified as a town; or, a third determining sub-module 4023, configured to determine, if the household registration classification attribute corresponding to the matched province, city and county data is not a town, the village and town name data that matches the village and town name data of the address data among the village and town name data contained in the area detailed address corresponding to the matched province, city and county data, and if the household registration classification attribute of the matched village and town name data is definite, the household registration classification attribute of the information to be classified is the household registration classification attribute of the matched village and town name data; or, a fourth determining sub-module 4024, configured to determine, if the household registration classification attribute of the matched village and town name data is not definite, the village committee or residents' committee name data that matches the village committee or residents' committee name data of the address data among the village committee or residents' committee name data contained in the area detailed address corresponding to the matched village and town name data, and the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched village committee or residents' committee name data.
Optionally, the second determining module 403 includes: a fifth determining submodule 4031 configured to extract a keyword from the address data if the village name data included in the area detailed address corresponding to the matched province-city-county data does not match the village name data of the address data, or if the village committee or resident committee name data included in the area detailed address corresponding to the matched village name data does not match the village committee or resident committee name data of the address data; a sixth determining sub-module 4032, configured to determine keyword data in the keyword information record, which is matched with the keywords in the address data, where the keyword information record includes the keyword data and a corresponding household registration classification attribute, and then determine that the household registration classification attribute of the information to be classified is a household registration classification attribute corresponding to the matched keyword data.
Optionally, the second determining module 403 further includes: a seventh determining submodule 4033, configured to determine, if there is no corresponding keyword information record or the address data content is empty, a record with the earliest update time in the household registration population basic information urban and rural classification information table and the household registration population historical information urban and rural classification table according to the identity number data in the information to be classified, and determine a household registration classification attribute of the information to be classified according to the record.
Optionally, the apparatus further comprises: a type determining module 404, configured to determine the type of the information to be classified if it is determined that the information to be classified does not include address data; a third determining module 405, configured to determine, according to the type of the information to be classified, a corresponding type information record matched in the household registration population basic information, and determine, according to the household registration classification attribute of the corresponding type information record matched in the household registration population basic information, the household registration classification attribute of the information to be classified; or, a fourth determining module 406, configured to determine, if it is determined that no corresponding type information record matching the type of the information to be classified exists in the household registration population basic information, the corresponding type information record matched in the household registration population historical information according to the type of the information to be classified, and determine the household registration classification attribute of the information to be classified according to the household registration classification attribute of the corresponding type information record matched in the household registration population historical information; or, a fifth determining module 407, configured to determine that the household registration classification attribute of the information to be classified is empty if it is determined that no corresponding type information record matching the type of the information to be classified exists in the household registration population historical information.
The data classification apparatus of this embodiment determines the household registration classification attribute of the information to be classified by comparing the address data in the information to be classified with the standard address information records, the administrative division urban and rural classification information table, and the keyword information records. Information to be classified that has no address data is associated by identifiers such as the identity card number and name and compared with the household registration population basic information, the household registration population historical information and the like, so that its household registration classification attribute is determined. Different types of information to be classified can therefore be classified, which gives better adaptability and reliable, accurate classification.
When the data classification apparatus is used for population data analysis or for analysis of changes in the household registration population urbanization rate, it can avoid the low population data quality and poor statistical reliability that arise because population management departments currently use inconsistent management information systems and because data is filled in incorrectly, omitted, or not updated in time. This facilitates analysis of changes in the household registration urbanization rate and improves the reliability and accuracy of the analysis.
The above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The data classification method of the embodiment may be implemented by any suitable device or apparatus with a data processing function, including but not limited to various terminals and servers. Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of data classification, comprising:
acquiring information to be classified;
determining an address information record corresponding to address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or,
determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
2. The method of claim 1, wherein determining an address information record corresponding to address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data comprises:
determining a standard address information record matched with the address data, wherein the standard address information record comprises standard address information data and a corresponding household registration classification attribute, and determining the household registration classification attribute in the matched standard address information record as the household registration classification attribute of the information to be classified.
3. The method of claim 2, wherein determining an address information record corresponding to address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data, further comprises:
if the standard address information record is not matched with the address data, determining, in the administrative division urban and rural classification information record, province, city and county data matched with the province, city and county data of the address data, and if the household registration classification attribute corresponding to the matched province, city and county data is a town, determining the household registration classification attribute of the information to be classified as the town; or,
if the household registration classification attribute corresponding to the matched province, city and county data is not a town, determining village and town name data matched with the village and town name data of the address data among the village and town name data contained in the area detailed address corresponding to the matched province, city and county data, and if the household registration classification attribute of the matched village and town name data is definite, determining the household registration classification attribute of the information to be classified as the household registration classification attribute of the matched village and town name data; or,
if the household registration classification attribute of the matched village and town name data is not definite, determining village committee or residents' committee name data matched with the village committee or residents' committee name data of the address data among the village committee or residents' committee name data contained in the area detailed address corresponding to the matched village and town name data, wherein the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched village committee or residents' committee name data.
4. The method of claim 3, wherein determining a keyword information record corresponding to a keyword in address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data comprises:
extracting a keyword from the address data if the village and town name data contained in the district detailed address corresponding to the matched province, city and county data does not match the village and town name data of the address data, or if the village committee or resident committee name data contained in the district detailed address corresponding to the matched village and town name data does not match the village committee or resident committee name data of the address data;
and determining, in the keyword information record, keyword data that matches the keyword in the address data, wherein the keyword information record comprises the keyword data and the corresponding household registration classification attribute, and determining that the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched keyword data.
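As a rough illustration of the keyword step in this claim, the sketch below scans an address string for entries of a keyword information record. The claim does not say how keywords are extracted, so plain substring search is used here as a stand-in; the keywords, the attribute labels and the name `classify_by_keyword` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword information records: (keyword data, classification attribute).
KEYWORD_RECORDS: List[Tuple[str, str]] = [
    ("resident committee", "town"),
    ("village committee", "rural"),
]


def classify_by_keyword(address: str) -> Optional[str]:
    """Return the attribute of the first keyword found in the address, else None."""
    lowered = address.lower()
    for keyword, attribute in KEYWORD_RECORDS:
        if keyword in lowered:
            return attribute
    return None  # no keyword information record corresponds to this address
```

Records are tried in list order, so when several keywords could occur in one address the ordering of the record list decides the result; that ordering is a design choice of this sketch rather than something the claim prescribes.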
5. The method of claim 4, wherein determining a keyword information record corresponding to a keyword in address data of the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data, further comprises:
if no corresponding keyword information record exists or the address data content is empty, determining the record with the earliest updating time in the household registration population basic information urban and rural classification information table and the household registration population historical information urban and rural classification table according to the identity number data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the record.
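The fallback in this claim can be sketched as a search over the two urban and rural classification tables by identity number, keeping the record with the earliest updating time as the claim recites. The record fields, the class name `ClassRecord` and the function name `classify_by_id` are assumptions for illustration; the claim also does not say what happens when neither table has a record, so returning `None` is a choice made here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClassRecord:
    """Hypothetical row of an urban and rural classification table."""
    id_number: str
    attribute: str
    updated: date


def classify_by_id(id_number: str,
                   basic_table: Iterable[ClassRecord],
                   history_table: Iterable[ClassRecord]) -> Optional[str]:
    """Pick the attribute of the record with the earliest update time."""
    candidates = [
        record
        for table in (basic_table, history_table)
        for record in table
        if record.id_number == id_number
    ]
    if not candidates:
        return None  # assumption: no record means no attribute can be derived
    return min(candidates, key=lambda record: record.updated).attribute
```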
6. The method of claim 1, further comprising:
if the information to be classified is determined not to contain address data, determining the type of the information to be classified;
determining corresponding type information records matched in the household registration population basic information according to the types of the information to be classified, and determining household registration classification attributes of the information to be classified according to the household registration classification attributes of the corresponding type information records matched in the household registration population basic information; or,
if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population basic information, determining a corresponding type information record matched in the household registration population historical information according to the type of the information to be classified, and determining a household registration classification attribute of the information to be classified according to the household registration classification attribute of the corresponding type information record matched in the household registration population historical information; or,
and if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population history information, determining that the household registration classification attribute of the information to be classified is empty.
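The three branches of this claim form an ordered fallback: the basic information first, the historical information second, and an empty attribute last. A minimal sketch follows, assuming that each source can be viewed as a mapping from an information type to a classification attribute; that mapping shape, the type names and the function name `classify_without_address` are assumptions and are not stated in the claim.

```python
from typing import Mapping, Optional


def classify_without_address(info_type: str,
                             basic_info: Mapping[str, str],
                             history_info: Mapping[str, str]) -> Optional[str]:
    """Look up the type in basic info, then history info; None means empty."""
    for table in (basic_info, history_info):
        if info_type in table:
            return table[info_type]
    return None  # neither source has a matching record: attribute is empty
```

For example, with `basic_info = {"type-1": "town"}` and `history_info = {"type-2": "rural"}`, a lookup of `"type-2"` is answered from the historical information, and a lookup of `"type-3"` yields `None`.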
7. A data classification apparatus, comprising:
the acquisition module is used for acquiring information to be classified;
the first determining module is used for determining an address information record corresponding to address data in the information to be classified and determining the household registration classification attribute of the information to be classified according to the address information record corresponding to the address data; or,
and the second determining module is used for determining a keyword information record corresponding to a keyword in the address data in the information to be classified, and determining the household registration classification attribute of the information to be classified according to the keyword information record corresponding to the keyword in the address data.
8. The apparatus of claim 7, wherein the first determining module comprises:
a first determining submodule, configured to determine a standard address information record matched with the address data, wherein the standard address information record comprises standard address information data and a corresponding household registration classification attribute, and to determine the household registration classification attribute in the matched standard address information record as the household registration classification attribute of the information to be classified.
9. The apparatus of claim 8, wherein the first determining module further comprises:
a second determining submodule, configured to determine, if the standard address information record does not match the address data, province, city and county data in the administrative division city and county classification information record that matches the province, city and county data of the address data, and if the household registration classification attribute corresponding to the matched province, city and county data is a town, determine that the household registration classification attribute of the information to be classified is a town; or,
a third determining submodule, configured to determine, if the household registration classification attribute corresponding to the matched province, city and county data is not a town, among the village and town name data contained in the district detailed address corresponding to the matched province, city and county data, the village and town name data that matches the village and town name data of the address data, and if the household registration classification attribute of the matched village and town name data is definite, determine that the household registration classification attribute of the information to be classified is the household registration classification attribute of the matched village and town name data; or,
a fourth determining submodule, configured to determine, if the household registration classification attribute of the matched village and town name data is not definite, among the village committee or resident committee name data contained in the district detailed address corresponding to the matched village and town name data, the village committee or resident committee name data that matches the village committee or resident committee name data of the address data, and determine that the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched village committee or resident committee name data.
10. The apparatus of claim 9, wherein the second determining module comprises:
a fifth determining submodule, configured to extract a keyword from the address data if the village and town name data contained in the district detailed address corresponding to the matched province, city and county data does not match the village and town name data of the address data, or if the village committee or resident committee name data contained in the district detailed address corresponding to the matched village and town name data does not match the village committee or resident committee name data of the address data;
and a sixth determining submodule, configured to determine, in the keyword information record, keyword data that matches the keyword in the address data, wherein the keyword information record comprises the keyword data and a corresponding household registration classification attribute, and to determine that the household registration classification attribute of the information to be classified is the household registration classification attribute corresponding to the matched keyword data.
11. The apparatus of claim 10, wherein the second determining module further comprises:
and the seventh determining submodule is used for determining the record with the earliest updating time in the household registration population basic information urban and rural classification information table and the household registration population historical information urban and rural classification table according to the identity number data in the information to be classified if no corresponding keyword information record exists or the address data content is empty, and determining the household registration classification attribute of the information to be classified according to the record.
12. The apparatus of claim 7, further comprising:
the type determining module is used for determining the type of the information to be classified if the information to be classified does not contain address data;
a third determining module, configured to determine, according to the type of the information to be classified, a corresponding type information record matched in the household registration population basic information, and determine, according to a household registration classification attribute of the corresponding type information record matched in the household registration population basic information, a household registration classification attribute of the information to be classified; or,
a fourth determining module, configured to determine, if it is determined that there is no corresponding type information record matching the type of the information to be classified in the household registration population basic information, a corresponding type information record matching in the household registration population history information according to the type of the information to be classified, and determine a household registration classification attribute of the information to be classified according to a household registration classification attribute of the corresponding type information record matching in the household registration population history information; or,
and a fifth determining module, configured to determine that the household registration classification attribute of the information to be classified is empty if it is determined that the corresponding type information record matched with the type of the information to be classified does not exist in the household registration population historical information.
CN201711131428.XA 2017-11-15 2017-11-15 Data classification method and device Pending CN110019797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711131428.XA CN110019797A (en) 2017-11-15 2017-11-15 Data classification method and device

Publications (1)

Publication Number Publication Date
CN110019797A true CN110019797A (en) 2019-07-16

Family

ID=67185928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711131428.XA Pending CN110019797A (en) 2017-11-15 2017-11-15 Data classification method and device

Country Status (1)

Country Link
CN (1) CN110019797A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577423A (en) * 2012-07-23 2014-02-12 阿里巴巴集团控股有限公司 Keyword classification method and system
CN104915453A (en) * 2015-07-01 2015-09-16 北京奇虎科技有限公司 Method, device and system for classifying POI information
CN105005792A (en) * 2015-07-13 2015-10-28 河南科技大学 KNN algorithm based article translation method
CN105069056A (en) * 2015-07-24 2015-11-18 湖北文理学院 Character string matching based method and system for analyzing address information of identification card
CN106650783A (en) * 2015-10-30 2017-05-10 李静涛 Method, device and system for mobile terminal data classifying, generating and matching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
侯亚杰: ""户口迁移与户籍人口城镇化"", 《人口研究》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436767A (en) * 2023-12-15 2024-01-23 云南师范大学 Assessment method, system and storage medium based on near-remote coupling coordination model
CN117436767B (en) * 2023-12-15 2024-04-09 云南师范大学 Assessment method, system and storage medium based on near-remote coupling coordination model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190716
