CN116737705A - Method, device, equipment and storage medium for normalizing deposit information - Google Patents

Publication number
CN116737705A
Authority
CN
China
Prior art keywords
deposit
information
address information
address
split
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310690606.1A
Other languages
Chinese (zh)
Inventor
胡艺
俞中宏
王威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Fangxuntong Information Technology Co ltd
Original Assignee
Shenzhen Fangxuntong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Fangxuntong Information Technology Co ltd
Priority to CN202310690606.1A
Publication of CN116737705A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0278 Product appraisal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0283 Price estimation or determination

Abstract

The application provides a method, a device, equipment and a storage medium for normalizing deposit information. Deposit information, which includes deposit address information, is acquired; the deposit address information is matched against a standard database and split according to the hierarchical address; and the split deposit address information is arranged step by step according to the hierarchical address. By implementing the scheme of the application, the deposit information is split step by step into the area, the floor address, the floor, the building and the room number where the deposit is located and is arranged in the corresponding table, so that the deposit information is standardized and the efficiency of splitting and entering the deposit information is improved.

Description

Method, device, equipment and storage medium for normalizing deposit information
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for normalizing deposit information.
Background
Excel is spreadsheet software written by Microsoft for computers running the Windows and Mac operating systems. With its visual interface, excellent calculation functions and charting tools, together with successful marketing, Excel has become the most popular data processing software for personal computers.
Although a conventional Excel table has a splitting function that can split certain types of data, other kinds of splitting can only be done manually, and when the volume of data to be processed is large, a great amount of time is required to split the target data.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for normalizing deposit information, which at least can solve the problem that a great deal of time is required to be spent for manually splitting target data in related technologies.
A first aspect of an embodiment of the present application provides a method for normalizing deposit information, including:
acquiring deposit information; the deposit information includes deposit address information;
splitting the deposit address information according to the hierarchical address by matching the deposit address information through a standard database;
and arranging the split deposit address information step by step according to the hierarchical address.
According to the scheme, the standard database is used for matching the deposit address information, the deposit address information is split according to the hierarchical address, the split deposit address information is arranged step by step according to the hierarchical address, the standardization of the deposit information is realized, and the deposit information processing efficiency is improved.
Optionally, the step of splitting the deposit address information according to the hierarchical address by matching the deposit address information by the standard database includes:
step-by-step matching is carried out on the deposit address information in a standard database according to the hierarchical address, and the deposit address information is split if the matching is successful;
if the deposit address information has no matched level address information in the standard database, matching is carried out after deleting the corresponding level address key words in the standard database, and the deposit address information is split;
if the deposit address information cannot be split, splitting the deposit address information according to the address keywords of each level in the deposit address information.
Optionally, the step of splitting the deposit address information by matching the standard database with the names of the addresses of all levels in the deposit address information includes:
matching the first deposit address information based on a standard database, and splitting the area where the deposit is located and the second deposit address information; the first deposit address information is initial deposit address information;
matching the second deposit address information based on the standard database, and splitting the floor address where the deposit is located and the third deposit address information;
matching the third deposit address information based on the standard database, and splitting the name of the building where the deposit is and the fourth deposit address information;
matching the fourth deposit address information based on the standard database, and splitting the address information of the building where the deposit is located and the fifth deposit address information;
and matching the fifth deposit address information based on the standard database, and splitting out the room number where the deposit is located.
According to the scheme, the deposit information is split step by step according to the location area, the location floor address, the location floor, the location building and the location room number, and the split deposit information is arranged in the corresponding table, so that the standardization of the deposit information is realized, and the efficiency of splitting and inputting the deposit information is improved.
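The step-by-step splitting described above, where each level consumes part of the address and hands the remainder to the next level, can be sketched as follows. This is only an illustrative sketch: the level names, the `prefix_splitter` and `take_rest` helpers, and the toy name lists standing in for the standard database are all assumptions, not part of the application.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A splitter takes the remaining address text and returns
# (extracted part or None, text left over for the next level).
Splitter = Callable[[str], Tuple[Optional[str], str]]

# Hypothetical level names, in the order the levels are split.
LEVELS = ["area", "floor_address", "floor_name", "building", "room"]


def prefix_splitter(known_names: List[str]) -> Splitter:
    """Build a splitter that peels the longest known name off the front."""
    ordered = sorted(known_names, key=len, reverse=True)

    def split(text: str) -> Tuple[Optional[str], str]:
        for name in ordered:
            if text.startswith(name):
                return name, text[len(name):]
        return None, text  # nothing matched: pass the text on unchanged

    return split


def take_rest(text: str) -> Tuple[Optional[str], str]:
    """Treat whatever is left as the final level (the room number)."""
    return (text or None), ""


def split_address(address: str, splitters: Dict[str, Splitter]) -> Dict[str, Optional[str]]:
    """Run the level splitters in order, feeding each one the previous remainder."""
    remaining = address
    row: Dict[str, Optional[str]] = {}
    for level in LEVELS:
        row[level], remaining = splitters[level](remaining)
    return row


# Toy stand-in for the standard database (illustrative values only).
SPLITTERS: Dict[str, Splitter] = {
    "area": prefix_splitter(["广东省深圳市南山区", "广东省深圳市"]),
    "floor_address": prefix_splitter(["科技路1号"]),
    "floor_name": prefix_splitter(["示例大厦"]),
    "building": prefix_splitter(["2栋", "12栋"]),
    "room": take_rest,
}

row = split_address("广东省深圳市南山区科技路1号示例大厦2栋301", SPLITTERS)
print(row)
```

Each key of the returned dictionary corresponds to one column of the table into which the split result would be written; a level that cannot be matched is left as `None` and the text is passed on to the next level.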
Optionally, the step of splitting the name of the building where the deposit is located includes:
if the administrative area of the floor where the deposit is located is inconsistent with the split administrative area, replacing the split administrative area with the administrative area of the floor;
if a plurality of floor names are matched, selecting the floor name whose administrative area is consistent with the split administrative area;
if none of the plurality of floor names lies in an administrative area consistent with the split administrative area, selecting a first floor name according to the matching similarity; the first floor name is the floor name with the highest matching similarity.
According to the scheme, the number of floor names returned by accurate matching is checked, and it is judged whether the administrative area of the matched floor is consistent with the administrative area split from the deposit information, which improves the accuracy of the split area and floor where the deposit is located.
Optionally, the deposit information further includes a deposit house type, and the step of splitting the building where the deposit is located includes:
if the fourth deposit address information is incomplete, determining a building where the deposit with the missing deposit address information is located based on the deposit house type, and splitting the building where the deposit is located.
Optionally, the deposit information further includes deposit value information, and the deposit information standardization method further includes:
after the complete deposit address information has been split according to the addresses of all levels, if no matched deposit address information exists in the standard database, re-determining the deposit address information based on the deposit value information and the standard database, and splitting the deposit address information.
Optionally, the step of re-determining the deposit address information and splitting the deposit address information based on the deposit value information and the standard database includes:
invoking preset estimated value price data in the deposit value information;
determining the address names of all levels in the deposit address information based on the address names of all levels of deposit information matched with the estimated price data;
and if the address names of all levels of the deposit information matched with the estimated price data are not unique, screening according to different parameters of the deposit value information, and screening all levels of the address names of the deposit information step by step until the unique level address names are matched.
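One possible reading of this step-by-step screening is sketched below: candidates from the standard database are filtered by one valuation parameter at a time until at most one remains. The field names (`unit_price`, `house_type`), the record layout and the rule of skipping a filter that would eliminate every candidate are assumptions made for illustration; the application does not specify them.

```python
from typing import Dict, List, Sequence


def narrow_by_valuation(
    candidates: List[Dict],
    deposit_value: Dict,
    keys: Sequence[str],
) -> List[Dict]:
    """Filter candidates by one parameter at a time until one (or none) is left."""
    remaining = list(candidates)
    for key in keys:
        if len(remaining) <= 1:
            break  # already unique, no further screening needed
        filtered = [c for c in remaining if c.get(key) == deposit_value.get(key)]
        if filtered:
            # Assumption: a parameter that matches nothing is skipped
            # rather than discarding every candidate.
            remaining = filtered
    return remaining


# Hypothetical standard-database records and deposit value information.
candidates = [
    {"floor": "示例花园", "unit_price": 50000, "house_type": "住宅"},
    {"floor": "示例大厦", "unit_price": 50000, "house_type": "商业"},
    {"floor": "示例公寓", "unit_price": 62000, "house_type": "住宅"},
]
deposit_value = {"unit_price": 50000, "house_type": "商业"}

result = narrow_by_valuation(candidates, deposit_value, ["unit_price", "house_type"])
print(result)
```

If more than one candidate is still left after all parameters have been tried, the result would presumably go to manual processing, as with the other unresolved cases.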
A second aspect of an embodiment of the present application provides a deposit information normalizing device, including:
the acquisition module is used for acquiring the deposit information; the deposit information includes deposit address information;
the splitting module is used for splitting the deposit address information through matching the standard database with the deposit address information;
and the arrangement module is used for arranging the split deposit address information step by step.
A third aspect of the embodiment of the present application provides an electronic device comprising a memory and a processor, wherein the processor is configured to execute a computer program stored on the memory, and when the processor executes the computer program, the processor performs each step in the method for normalizing deposit information provided in the first aspect of the embodiment of the present application.
A fourth aspect of the embodiment of the present application provides a computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the deposit information normalization method provided in the first aspect of the embodiment of the present application.
Drawings
Fig. 1 is a basic flow diagram of a first embodiment of a method for normalizing deposit information according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a deposit tool applied to an Excel tool table according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a second embodiment of a method for normalizing deposit information according to an embodiment of the present application;
Fig. 4 is a schematic program module diagram of a deposit information normalization apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application will be clearly described in conjunction with the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to solve the problem that a lot of time is required for manually splitting target data in the related art, an embodiment of the present application provides a method for normalizing deposit information, as shown in fig. 1, which is a basic flowchart of an embodiment of the method for normalizing deposit information provided in the present embodiment, where the method for normalizing deposit information includes the following steps:
step 110, obtaining deposit information; the deposit information includes deposit address information;
step 120, matching the deposit address information through a standard database, and splitting the deposit address information according to the hierarchical address;
step 130, arranging the split deposit address information step by step according to the hierarchical address.
Specifically, the Excel tool table acquires all the deposit information from the database and splits the deposit address information by matching it against the standard database; the split deposit address information is then arranged step by step according to the hierarchical address. Fig. 2 is a schematic diagram of the deposit tool applied to an Excel tool table according to an embodiment of the present application. When the deposit information is a deposit address, the deposit address is split and then filled step by step into the corresponding cells for the area where the deposit is located (province/directly administered municipality, county/district), the floor address, the floor, the building and the room number, wherein the floor address and the floor (i.e. the floor name) may be filled into the same cell (floor).
Optionally, the deposit address information is matched step by step in the standard database according to the hierarchical address, and is split if the matching succeeds; if no matching level address information exists in the standard database, matching is performed again after the corresponding level address keywords are deleted from the names in the standard database, and the deposit address information is split; if the deposit address information still cannot be split, it is split according to the address keywords of each level that appear in the deposit address information itself. It can be understood that after the splitting is completed, the final splitting result is passed to a manual review process to confirm that each level address is reasonable. For example, when the deposit address information is split by level address keywords and a clearly erroneous level address name appears (for example, a place name under Guangdong Province that does not exist, such as "Shenchuan"), the level address name is corrected manually.
Optionally, the deposit tool further includes a building re-estimation module, and after splitting the building, house number, etc. in the address, if the building can be correlated with the building in the standard database, the corresponding estimated value price data can be called through the correlated estimated value interface, so as to obtain a re-estimation result.
Optionally, because AI splitting cannot be 100% correct and depends on associating buildings with the standard database, some buildings cannot be split or cannot be associated with the standard database, so manual processing is required. During manual processing, the province and area are associated by clicking, the building is then searched for by keyword, and the corresponding record is selected for association or a new record is added.
Based on the scheme of this embodiment of the application, deposit information, which includes deposit address information, is acquired; the deposit address information is matched against a standard database and split according to the hierarchical address; and the split deposit address information is arranged step by step according to the hierarchical address. By implementing the scheme of the application, the deposit information is split step by step into the area, the floor address, the floor, the building and the room number where the deposit is located and is arranged in the corresponding table, so that the deposit information is standardized and the efficiency of splitting and entering the deposit information is improved.
Fig. 3 is a schematic flow chart of a second embodiment of a method for normalizing deposit information according to the first embodiment of the present application, wherein the step of step-by-step matching the deposit address information in a standard database according to a hierarchical address, and splitting the deposit address information if the matching is successful includes:
Step 310, matching the first deposit address information based on the standard database, and splitting the area where the deposit is located and the second deposit address information.
Specifically, the deposit information is the name or address of the deposit to be split. If the deposit to be split has both a deposit name and a deposit address, only one of them (the deposit name or the deposit address) is split, and it is split into: the area (province/directly administered municipality, county/district), the floor address (street/town, road/village, house number), the floor name (including the phase), the building and the room number. In this embodiment, the area in the first deposit address information is split according to the administrative area names in the nationwide standard database collected by Fangxuntong. Taking the deposit name as an example, the first deposit address information is the original target deposit information provided by the customer, and the area where the deposit is located includes, but is not limited to, the province/directly administered municipality, the city and the county/district. After the target deposit information provided by the customer is obtained, the Excel tool table splits it according to the administrative area names in the standard database; whichever of the province/directly administered municipality, city and county/district levels the target deposit information contains, each of them is split out and arranged in order in the table of the Excel tool table.
Optionally, when the city in the first deposit address information is split, the abbreviation of each city is fuzzy-matched against the deposit name, for example the short forms of Beijing and Shanghai. Fuzzy matching is a data matching technique that compares two or more records and calculates the probability that they refer to the same entity. After the city where the deposit is located has been matched, the province can be looked up from the city, and the IDs corresponding to the province and the city are queried in the database and filled into the table of the Excel tool table. Fuzzy-matching the deposit name against city abbreviations allows the area where the deposit is located to be split accurately even when the name or address provided by the customer is non-standard.
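The city lookup can be illustrated with a small sketch. It assumes that the "abbreviation" is the city name without its suffix, uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the tool actually uses, and relies on a hypothetical `CITY_TABLE` in place of the standard database; the 0.6 threshold is likewise an arbitrary illustrative value.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: city short name -> (province, full city name).
CITY_TABLE: Dict[str, Tuple[str, str]] = {
    "深圳": ("广东省", "深圳市"),
    "广州": ("广东省", "广州市"),
    "北京": ("北京市", "北京市"),
}


def match_city(
    text: str,
    table: Dict[str, Tuple[str, str]] = CITY_TABLE,
    threshold: float = 0.6,
) -> Optional[Tuple[str, str, float]]:
    """Return (province, city, score) for the best-scoring short name, or None."""
    best: Optional[Tuple[str, str, float]] = None
    for short, (province, city) in table.items():
        # Slide a window of the short name's length across the text
        # and keep the highest similarity ratio seen.
        windows = range(max(1, len(text) - len(short) + 1))
        score = max(
            SequenceMatcher(None, short, text[i:i + len(short)]).ratio()
            for i in windows
        )
        if score >= threshold and (best is None or score > best[2]):
            best = (province, city, score)
    return best


print(match_city("深圳南山区科技园"))
print(match_city("某某小区"))
```

A score below 1.0 means the city was found only approximately, which is the situation in which the result would be flagged for manual verification.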
In an optional implementation manner of this embodiment, the step of splitting the region where the first deposit address information is located according to the administrative area name of the standard database includes: matching the administrative area names through the standard database, and splitting the administrative area names of the deposit; if the administrative area names cannot be split, matching is carried out after the administrative area keywords of the administrative area names of the standard database are deleted, and the deposit administrative area names are split; if the administrative area names cannot be split, the deposit administrative area names are split according to the administrative area keywords.
Specifically, exact matching means matching that must satisfy the given conditions or requirements exactly. After the province and city of the deposit information are split, a more precise administrative area, such as the county, district or banner, must also be split, because in larger first-tier cities or directly administered municipalities it is very difficult to locate a house accurately if the deposit name is split only down to the city. In this embodiment, the administrative area names that belong to the city in the standard database are first used for exact matching, in an attempt to match the administrative area in the deposit information directly and split out the name of the administrative area where the deposit is located. If the administrative area cannot be determined by exact matching, it is assumed that the customer filled in the deposit name or area in a non-standard way, for example by omitting the administrative area keyword. Matching is then performed again after the administrative area keyword (district, city, county, banner and the like) is deleted from the administrative area names of the standard database. For example, if the deposit name is "Baoshan Street", the administrative area name "Baoshan District" cannot be matched directly, but "Baoshan", with the keyword deleted, can be matched, so the administrative area can be split. If the administrative area name still cannot be split, the deposit information is split again directly by administrative area keyword: if an administrative area keyword appears in the deposit information, the part before the keyword is split out as the administrative area where the deposit is located.
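The three fallback tiers (exact match, match with the keyword removed, split by keyword) can be sketched as below. The keyword list, the function name and the method labels are illustrative assumptions; in the last tier the sketch keeps the keyword as part of the returned name, which the text does not specify.

```python
from typing import List, Optional, Tuple

# Assumed administrative area keywords: district, city, county, banner.
AREA_KEYWORDS = ("区", "市", "县", "旗")


def split_admin_area(
    text: str,
    standard_names: List[str],
    keywords: Tuple[str, ...] = AREA_KEYWORDS,
) -> Tuple[Optional[str], str, str]:
    """Return (area name, remaining text, method used)."""
    ordered = sorted(standard_names, key=len, reverse=True)

    # Tier 1: exact match against the standard names.
    for name in ordered:
        pos = text.find(name)
        if pos >= 0:
            return name, text[pos + len(name):], "exact"

    # Tier 2: match again with the trailing keyword removed from each name.
    for name in ordered:
        stem = next((name[:-len(k)] for k in keywords if name.endswith(k)), "")
        pos = text.find(stem) if stem else -1
        if pos >= 0:
            return name, text[pos + len(stem):], "keyword_removed"

    # Tier 3: cut the text at the first keyword that appears in it.
    hits = [(text.find(k), k) for k in keywords if text.find(k) > 0]
    if hits:
        pos, k = min(hits)
        return text[:pos + len(k)], text[pos + len(k):], "keyword"

    return None, text, "unmatched"


names = ["宝山区", "嘉定区"]
print(split_admin_area("宝山区友谊路1号", names))
print(split_admin_area("宝山街道1号", names))
print(split_admin_area("新城区人民路", names))
print(split_admin_area("人民路8号", names))
```

Returning the method alongside the result lets the caller distinguish a confident exact match from the two weaker tiers, which matter for the later manual review.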
It will be appreciated that the above-described splitting approach is applicable to splitting all hierarchical addresses.
Optionally, if a value is obtained by fuzzy matching or keyword matching, the data is imported into the Excel cell with a brown-yellow background; if no value can be matched, the Excel cell is given a red background. For example, when the administrative area name is matched after the administrative area keyword is deleted, or is split by keyword, the background color of the cell is set to brown-yellow when the data is filled into the county/district column of the Excel form, which indicates that the name may be wrong and needs manual verification. If the administrative area name still cannot be split by administrative area keyword, the background color of the county/district cell is set to red, which indicates that the county/district value is empty because no administrative area name could be matched, and manual processing is needed.
Step 320, matching the second deposit address information based on the standard database, and splitting the floor address where the deposit is located and the third deposit address information.
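The color convention amounts to a small mapping from the way a value was obtained to a cell background. The sketch below uses hypothetical method labels and plain color names; how the color is actually applied to the Excel cell is left out.

```python
from typing import Optional

BROWN_YELLOW = "brown-yellow"  # value may be wrong, needs manual verification
RED = "red"                    # no value could be matched, needs manual processing

# Hypothetical labels for the weaker matching methods.
UNCERTAIN_METHODS = {"fuzzy", "keyword_removed", "keyword"}


def cell_background(method: str) -> Optional[str]:
    """Map the matching method to a cell background (None means no highlight)."""
    if method == "exact":
        return None
    if method in UNCERTAIN_METHODS:
        return BROWN_YELLOW
    return RED


for m in ("exact", "fuzzy", "keyword", "unmatched"):
    print(m, cell_background(m))
```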
Specifically, in this embodiment, after the area where the deposit is located is split, the remaining second deposit address information is split according to the floor address where the deposit is located, where the floor address where the deposit is located includes a road, a number, and the like.
In an optional implementation manner of this embodiment, the step of splitting the floor address where the deposit is located includes: splitting the road of the floor address where the deposit is located according to the road keywords; and, based on the split road, splitting the road number according to the road number keywords.
Specifically, in this embodiment, when the road within the administrative area where the deposit is located is split, road-related keywords (e.g., road, street, town) are used. If a deposit name contains several road keywords, the text is cut at the position of the last keyword, and everything up to that position is split out as the road. If the road can be split from the second deposit address information, number-related keywords are then used to split the road number; however, if the number keyword appears in the form "… building", if "number" is the last character, or if "number" follows a building keyword, the number is not split. If the road cannot be split from the second deposit address information, the road number is not split either.
Step 330, matching the third deposit address information based on the standard database, and splitting the floor name where the deposit is located and the fourth deposit address information.
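A minimal sketch of the road and road-number rules follows. The concrete keywords (`路`, `街`, `镇` for roads, `号` for the number, `栋`/`幢`/`座`/`楼` for buildings) and the reading of the "… building" form as `号楼` are assumptions made for illustration.

```python
from typing import Optional, Tuple

ROAD_KEYWORDS = ("路", "街", "镇")           # assumed: road, street, town
BUILDING_KEYWORDS = ("栋", "幢", "座", "楼")  # assumed building keywords
NUMBER_KEYWORD = "号"                        # assumed road number keyword


def split_road(text: str) -> Tuple[Optional[str], str]:
    """Cut at the last road keyword; everything up to it is the road."""
    end = max((text.rfind(k) + len(k) for k in ROAD_KEYWORDS if k in text), default=0)
    if end == 0:
        return None, text
    return text[:end], text[end:]


def split_road_number(rest: str) -> Tuple[Optional[str], str]:
    """Split 'N号' off the front unless one of the exclusion rules applies."""
    pos = rest.find(NUMBER_KEYWORD)
    if pos < 0:
        return None, rest
    if pos == len(rest) - 1:
        return None, rest  # '号' is the last character
    if rest.startswith("号楼", pos):
        return None, rest  # 'N号楼' names a building, not a road number
    if any(k in rest[:pos] for k in BUILDING_KEYWORDS):
        return None, rest  # '号' follows a building keyword
    return rest[:pos + 1], rest[pos + 1:]


road, rest = split_road("新安街道建安一路8号示例小区")
number, rest = split_road_number(rest)
print(road, number, rest)
```

In the sample address both `街` and `路` occur, so the cut is made after the last one and the whole prefix becomes the road.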
Specifically, in this embodiment, the building name of the building where the deposit is located in the third deposit address information is split.
In an alternative implementation manner of this embodiment, the step of splitting the floor where the deposit is located includes: performing accurate matching according to the floor names in the first standard database, and splitting the floor where the deposit is located; if the floor cannot be split according to the floor names in the first standard database, splitting the floor where the deposit is located by floor keywords.
Specifically, the first standard database is the standard database of the administrative area where the deposit is located. In this embodiment, after the road and road number information is excluded, the standard database corresponding to the city is used to accurately match the floor name of the deposit provided by the customer. If the floor cannot be split by accurate matching, it is split by floor keywords (for example, district, building, apartment, garden, square, new village, home and the like); if a deposit name contains several floor keywords, the text is cut at the position of the last keyword. If no floor name is found by either accurate matching or floor keyword matching, the deposit information provided by the customer is considered incomplete or inaccurate, the splitting flow may be ended directly, and the "floor name" cell of the Excel form and the related cells after it are set to a red background to wait for manual processing.
It can be understood that if the road and the number can be split but the floor cannot, the road is taken as the floor and the number is taken as the building. If the floor name is followed by an Arabic numeral plus "phase", the phase is split as well. Chinese numerals are treated as equal to Arabic numerals; for example, "Wanke Caiyuan Phase One" written with a Chinese numeral and "Wanke Caiyuan Phase 1" written with an Arabic numeral denote the same floor. The same applies to buildings and room numbers.
Optionally, after the step of splitting the floor where the deposit is located, the method further includes: if the administrative area of the floor where the deposit is located is inconsistent with the split administrative area, replacing the split administrative area with the administrative area of the floor; if a plurality of floor names are matched, selecting the floor name whose administrative area is consistent with the split administrative area; if none of the plurality of floor names lies in an administrative area consistent with the split administrative area, selecting a first floor name according to the matching similarity.
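Treating Chinese numerals as equal to Arabic numerals can be done by normalizing the phase number before names are compared. The sketch below handles numerals below 100 only, assumes `期` is the phase keyword, and uses a made-up floor name.

```python
import re

_DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
           "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

# A Chinese numeral below 100 directly followed by the phase keyword.
_PHASE = re.compile(
    r"([一二三四五六七八九]?十[一二三四五六七八九]?|[一二两三四五六七八九])期"
)


def chinese_to_int(numeral: str) -> int:
    """Convert a Chinese numeral below 100 (e.g. 三, 十二, 二十一) to an int."""
    if "十" in numeral:
        tens, _, ones = numeral.partition("十")
        return (_DIGITS[tens] if tens else 1) * 10 + (_DIGITS[ones] if ones else 0)
    return _DIGITS[numeral]


def normalize_phase(name: str) -> str:
    """Rewrite a Chinese-numeral phase such as 一期 as 1期."""
    return _PHASE.sub(lambda m: f"{chinese_to_int(m.group(1))}期", name)


# The two spellings of the same (hypothetical) floor compare equal after normalizing.
print(normalize_phase("示例花园一期") == normalize_phase("示例花园1期"))
print(normalize_phase("示例花园十二期"))
```

The same normalization could be applied to building and room numbers by swapping the trailing keyword.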
Specifically, if only one floor name is matched through the accurate matching of the standard database, it can be determined that only the one floor name is most likely to be the customer's deposit floor name in the city, and when the administrative area where the matched floor is located is inconsistent with the administrative area previously split, the administrative area previously split is replaced by the floor administrative area, because the Excel tool table automatically determines that the administrative area is a filling error or a previous splitting error when the customer provides deposit information; if a plurality of floor names are matched through accurate matching of the standard database, selecting the floor names of the administrative areas where the floors are located, which are consistent with the split administrative areas; if a plurality of floor names are matched through accurate matching of the standard database, but administrative areas where the plurality of floor names are located are inconsistent with administrative areas split before, the floor name with the highest matching similarity is selected according to the matching similarity when the plurality of floors are matched, and the cell is set to be brown yellow, and the cell waits for manual confirmation.
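The selection rules can be summarized in a short sketch. The `Candidate` record and its fields are hypothetical, picking the most similar candidate when several lie in the same administrative area is an added assumption, and the text does not say whether the administrative area is replaced in the last case, so the sketch leaves it unchanged.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    name: str          # floor name found in the standard database
    district: str      # administrative area recorded for that floor
    similarity: float  # matching similarity against the customer's text


def choose_floor(
    candidates: List[Candidate], split_district: str
) -> Tuple[Optional[Candidate], str, bool]:
    """Return (chosen floor, administrative area to use, needs manual review)."""
    if not candidates:
        return None, split_district, True
    if len(candidates) == 1:
        # A single match wins; its administrative area overrides the earlier split.
        return candidates[0], candidates[0].district, False
    same_area = [c for c in candidates if c.district == split_district]
    if same_area:
        # Assumption: ties inside the same area are broken by similarity.
        return max(same_area, key=lambda c: c.similarity), split_district, False
    # No candidate agrees with the split area: take the most similar one
    # and flag it (brown-yellow cell) for manual confirmation.
    return max(candidates, key=lambda c: c.similarity), split_district, True


a = Candidate("示例花园", "宝山区", 0.9)
b = Candidate("示例花园", "嘉定区", 0.8)
print(choose_floor([a], "嘉定区"))
print(choose_floor([a, b], "嘉定区"))
print(choose_floor([a, b], "浦东新区"))
```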
Step 340, matching the fourth deposit address information based on the standard database, and splitting out the building where the deposit is located and the fifth deposit address information.
Specifically, in an optional implementation manner of this embodiment, the step of splitting out the building where the deposit is located includes: if the floor where the deposit is located was split out through exact matching, performing exact matching against the building names in the second standard database and splitting out the building where the deposit is located; if the building cannot be split out by exact matching of building names, or if the floor was split out by floor keywords, splitting out the building where the deposit is located by building keywords.
Specifically, the second standard database is the standard database of the floor where the deposit is located, in which all building information under that floor is recorded. The building where the deposit is located is matched exactly against this building information. If the customer recorded accurate deposit information, the building can be split out directly; otherwise, matching is performed again using building keywords (such as building, number building, seat and the like). If one deposit name contains a plurality of building keywords, the characters are cut at the position of the last keyword, and the resulting characters are taken as the building name.
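A minimal sketch of the keyword fallback described above. It rests on two assumptions: the keyword tuple (栋, 号楼, 座) is a guess at the Chinese originals behind "building, number building, seat", and "cut at the position of the last keyword" is read as "everything up to and including the last keyword occurrence is the building name". The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Assumed building keywords; the real list is not given in the text.
BUILDING_KEYWORDS = ("号楼", "栋", "座")


def split_building_by_keyword(
    text: str, keywords: Sequence[str] = BUILDING_KEYWORDS
) -> Tuple[Optional[str], str]:
    """Return (building name, remaining text); building is None if no keyword occurs."""
    end = -1
    for kw in keywords:
        pos = text.rfind(kw)  # last occurrence of this keyword
        if pos != -1:
            end = max(end, pos + len(kw))
    if end == -1:
        return None, text
    # Cut right after the last keyword found anywhere in the text.
    return text[:end], text[end:]
```

For an input such as `"A座3栋1201"`, which contains two keywords, the cut falls after `栋`, so the remainder `"1201"` is left for the room-number step.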
Optionally, if the keyword splitting method was used when splitting out the floor, then after the floor is split out, the building can be split out directly by the building keyword splitting method.
Optionally, if the building name consists of Arabic numerals followed by a building keyword (such as building, number building or seat), larger numbers are matched first; for example, given 1 and 11, 11 is matched first and then 1. After a building is matched, it is also verified whether Arabic numerals immediately precede the matched building; if so, the correct building is those numerals plus the matched building. For example, suppose the deposit name refers to building 12 of X building and the standard database contains only building 2. The split building number is 12: a deposit in building 12 of X building must not be split into building 2 of X building merely because only building 2 exists. This is a case where the building does not exist in the standard database, and building 12 needs to be added.
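The two rules above (try the larger number first, then check for digits in front of the match) can be sketched as follows. The function name and return shape are hypothetical, sorting candidates by name length is an assumed stand-in for "larger numbers first", and the sample building names use 栋 as an assumed keyword.

```python
import re
from typing import List, Optional, Tuple


def match_building(text: str, db_buildings: List[str]) -> Tuple[Optional[str], bool]:
    """Return (building, exists_in_db) for the first database name found in text."""
    # Longer names first, so "11栋" is tried before "1栋".
    for name in sorted(db_buildings, key=len, reverse=True):
        pos = text.find(name)
        if pos == -1:
            continue
        # Digits directly in front of the match belong to the building number.
        prefix = re.search(r"[0-9]+$", text[:pos])
        if prefix:
            # The real building is not in the database and has to be added.
            return prefix.group() + name, False
        return name, True
    return None, False
```

A `False` flag alongside a non-empty building corresponds to the "building 12 needs to be added" case in the text.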
Optionally, when the building information in the deposit address information is split, the building where the deposit is located may fail to be split out because the building address is missing or incomplete. In this embodiment, the deposit information further includes a deposit house type. Within one floor, house types may differ between buildings, but house types within a single building are almost identical; therefore, after the name of the floor where the deposit is located is determined, the building where the deposit is located can be determined according to the deposit house type.
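As a rough illustration of the house-type fallback, the lookup below assumes a hypothetical mapping from each building of the already-determined floor to its typical house type. Returning `None` when the house type is shared by several buildings is an assumption; the text does not say how that case is handled.

```python
from typing import Dict, Optional


def building_from_house_type(
    house_type: str, building_types: Dict[str, str]
) -> Optional[str]:
    """Return the only building whose typical house type matches, else None."""
    matches = [b for b, t in building_types.items() if t == house_type]
    # Only a unique match identifies the building without further information.
    return matches[0] if len(matches) == 1 else None
```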
Step 350, matching the fifth deposit address information based on the standard database, and splitting out the room number of the deposit.
Specifically, in this embodiment, after the building is determined through the previous four splitting processes, only the room number remains unsplit. Therefore, after the first four levels of the deposit address information are split out, the room number is split out from the remaining deposit address information.
In an optional implementation manner of this embodiment, the step of splitting the room number where the deposit is located includes: according to the room number name of the third standard database, carrying out accurate matching, and splitting the room number of the deposit; if the room number of the deposit cannot be split through accurate matching, the fifth deposit address information is determined to be the room number of the deposit.
Specifically, the third standard database is the standard database of the building where the deposit is located. In this embodiment, the room number is split out by exactly matching the fifth deposit address information against the room numbers recorded in the standard database of the building that was split out. If the room number cannot be split out through exact matching, all remaining characters of the fifth deposit address information are treated as the room number by default. If the Arabic numerals of the room number have 3 digits, the floor is the first digit; if they have 4 digits, the floor is the first two digits.
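The digit rule for deriving the floor from a room number is simple enough to state as code. The function name is hypothetical, and returning `None` for any other digit count is an assumption, since the text only covers 3-digit and 4-digit room numbers.

```python
import re
from typing import Optional


def floor_from_room_number(room: str) -> Optional[int]:
    """Derive the floor from the leading Arabic numerals of a room number."""
    m = re.match(r"[0-9]+", room)
    if m is None:
        return None
    digits = m.group()
    if len(digits) == 3:
        return int(digits[0])   # e.g. 302 is on floor 3
    if len(digits) == 4:
        return int(digits[:2])  # e.g. 1201 is on floor 12
    return None  # other lengths are not covered by the rule
```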
Optionally, if the room number consists of Arabic numerals, larger numbers are matched first; for example, 1201 is matched first and then 201, wherein the room number is the ID in the database. Note the following when looking up the floor of the room number: after the room number is matched, it is verified whether Arabic numerals immediately precede the matched room number; if so, the correct room number is those numerals plus the matched room number. For example, if the deposit name refers to room 1102 and the room number found in the standard database is 102, the matched room number is 1102.
Optionally, incomplete deposit address information may prevent the deposit information from being split completely: the splitting process ends at whichever hierarchical address level is missing. Therefore, in this embodiment, the deposit value information of the deposit information may also be acquired; when the deposit address information is split level by level and no matching deposit address information exists in the standard database, the deposit address information is re-determined based on the deposit value information and then split. For example, if the building and the room number are present but the floor name is missing, splitting cannot continue. In this case the value information of the deposit can be acquired, including but not limited to preset estimated price data; that is, the name of the floor where the deposit is located is determined according to the estimated price of the house at the time the user mortgaged it, and the deposit address information is split again. It will be appreciated that the estimated value of the same kind of house differs between regions; for example, the mortgage price of the same house in a large city is generally higher than in a small city. Therefore, after the province, city and district where the deposit is located are determined, the floor names can be screened according to the mortgage price of the deposit and then determined in combination with the comprehensive house price data of that province, city and district.
It can be understood that the matching result obtained from the preset estimated price data may not be unique. For example, if screening by the preset estimated price data yields ten similar candidates for the missing hierarchical address name, the ten candidates can be screened a second time by the housing year in the deposit value information. After the second screening, only one hierarchical address name may remain, or two or three may remain, in which case a third and fourth screening continue with other information in the deposit value information until only one hierarchical address name is left. If a unique hierarchical address name still cannot be determined after screening by the deposit value information, the remaining hierarchical address names are sent for manual processing, and the hierarchical address name is determined in combination with the deposit information. It can be understood that when the deposit information is incomplete, matching the missing hierarchical address names manually from the start may involve hundreds of hierarchical address names, and that is the workload for just one piece of deposit information.
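The repeated screening described above amounts to applying a sequence of filters until one candidate remains. The sketch below is illustrative only: the record fields and the function name are hypothetical, and skipping a filter that would eliminate every candidate is an assumption made so that something is always left for manual processing.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def screen_candidates(
    candidates: List[Record], filters: Sequence[Callable[[Record], bool]]
) -> List[Record]:
    """Apply filters in order, stopping as soon as one candidate remains."""
    remaining = list(candidates)
    for keep in filters:
        if len(remaining) <= 1:
            break
        narrowed = [c for c in remaining if keep(c)]
        if narrowed:  # assumption: ignore a filter that removes everything
            remaining = narrowed
    return remaining  # more than one entry means manual processing is needed
```

A caller might pass a price filter first and a housing-year filter second, for example `lambda c: abs(c["price"] - 51000) <= 5000` followed by `lambda c: c["year"] == 2015`, and forward the result to manual confirmation whenever more than one record is returned.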
Fig. 3 is a deposit information normalization device according to an embodiment of the present application, which may be used to implement the deposit information normalization method in the foregoing embodiment. As shown in fig. 3, the deposit information normalizing device mainly includes:
an acquisition module 10 for acquiring deposit information; the deposit information includes deposit address information;
the splitting module 20 is used for splitting the deposit address information through matching the standard database with the deposit address information;
the arrangement module 30 is configured to arrange the split deposit address information step by step.
In an optional implementation manner of this embodiment, the splitting module is specifically configured to: step-by-step matching is carried out on the deposit address information in a standard database according to the hierarchical address, and the deposit address information is split if the matching is successful; if the deposit address information has no matched level address information in the standard database, matching is carried out after the corresponding level address key words in the standard database are deleted, and the deposit address information is split; if the deposit address information cannot be split, the deposit address information is split according to the address keywords of each level in the deposit address information.
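The three-stage fallback of the splitting module (exact match, match after dropping the level keyword, split by keyword) can be sketched for a single address level as follows. The function name is hypothetical, and matching at the start of the remaining text and preferring longer standard names are both assumptions; the sample keyword 市 (city) is also assumed.

```python
from typing import List, Optional, Sequence, Tuple


def split_level(
    text: str, db_names: List[str], level_keywords: Sequence[str]
) -> Tuple[Optional[str], str]:
    """Split one address level off the front of text; return (level name, rest)."""
    ordered = sorted(db_names, key=len, reverse=True)
    # 1. Exact match against the standard names.
    for name in ordered:
        if text.startswith(name):
            return name, text[len(name):]
    # 2. Match again after deleting the level keyword from the standard name.
    for name in ordered:
        for kw in level_keywords:
            stem = name[: -len(kw)] if name.endswith(kw) else ""
            if stem and text.startswith(stem):
                return name, text[len(stem):]
    # 3. Fall back to the level keyword found in the address text itself.
    for kw in level_keywords:
        pos = text.find(kw)
        if pos != -1:
            return text[: pos + len(kw)], text[pos + len(kw):]
    return None, text
```

Calling such a function once per level, each time on the remainder returned by the previous call, gives the step-by-step splitting the module performs.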
Further, in an optional implementation manner of this embodiment, the splitting module is further configured to: matching the first deposit address information based on a standard database, and splitting the area where the deposit is located and the second deposit address information; the first deposit address information is initial deposit address information; matching the second deposit address information based on the standard database, and splitting the floor address where the deposit is located and the third deposit address information; matching the third deposit address information based on the standard database, and splitting the name of the building where the deposit is and the fourth deposit address information; matching the fourth deposit address information based on the standard database, and splitting the address information of the building where the deposit is located and the fifth deposit address information; and matching the fifth deposit address information based on the standard database, and splitting out the room number where the deposit is located.
Still further, in an alternative implementation manner of this embodiment, the splitting module is further configured to: if the administrative district of the floor where the name of the floor where the deposit is located is inconsistent with the split administrative district, replacing the split administrative district with the administrative district of the floor where the floor is located; if a plurality of floor names are matched, selecting the floor names of the administrative areas where the floors are located and the administrative areas which are split to be consistent; if the floor names in which the administrative areas where the floors are located are consistent with the split administrative areas do not exist in the plurality of floor names, selecting a first floor name according to the matching similarity; the first building name is the building name with the highest matching similarity.
In an alternative implementation of this embodiment, the splitting module is further configured to: if the fourth deposit address information is incomplete, determining the building where the deposit address information is missing based on the deposit house type, and splitting the building where the deposit is located.
In an optional implementation manner of this embodiment, the deposit information normalizing device further includes a determining module. The determining module is specifically configured to: when the deposit address information is split level by level and no matching deposit address information exists in the standard database, re-determine the deposit address information based on the deposit value information and the standard database, and split the deposit address information.
Further, in an optional implementation manner of this embodiment, the determining module is further configured to: invoke preset estimated price data in the deposit value information; determine the address names of all levels in the deposit address information based on the address names of all levels matched with the estimated price data; and if the address names matched with the estimated price data are not unique, screen them step by step according to different parameters of the deposit value information until a unique hierarchical address name is matched.
According to the deposit information normalizing device provided by the scheme of the application, deposit information is acquired, the deposit information including deposit address information; the deposit address information is split according to the hierarchical address by matching it against a standard database; and the split deposit address information is arranged step by step according to the hierarchical address. By implementing the scheme of the application, the deposit information is split step by step into the area, the floor address, the floor, the building and the room number where the deposit is located and is arranged in the corresponding table, so that the deposit information is normalized and the efficiency of splitting and entering deposit information is improved.
Fig. 5 is an electronic device provided in an embodiment of the present application. The electronic device may be used to implement the deposit information normalization method in the foregoing embodiment, and mainly includes:
memory 501, processor 502, and computer program 503 stored on memory 501 and executable on processor 502, memory 501 and processor 502 being communicatively connected. The processor 502, when executing the computer program 503, implements the deposit information normalization method in the foregoing embodiment. The number of processors may be one or more.
The memory 501 may be a high-speed random access memory (RAM, Random Access Memory) or a non-volatile memory, such as a disk memory. The memory 501 is used for storing executable program codes, and the processor 502 is coupled to the memory 501.
Further, an embodiment of the present application further provides a computer readable storage medium, which may be provided in the electronic device in each of the foregoing embodiments, and the computer readable storage medium may be a memory in the foregoing embodiment shown in fig. 5.
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the deposit information normalizing method of the foregoing embodiments. Further, the computer readable storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a RAM, a magnetic disk, or an optical disk.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of modules is merely a division by logical function, and other divisions are possible in actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or modules, and may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in whole or in part, may be embodied in the form of a software product stored in a readable storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned readable storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, etc.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing describes the method, apparatus, device and storage medium for normalizing deposit information provided by the present application, and those skilled in the art will recognize that there are variations in terms of specific embodiments and application ranges of the concepts of the embodiments of the present application.

Claims (10)

1. A method for normalizing deposit information, comprising:
acquiring deposit information; the deposit information includes deposit address information;
matching the deposit address information through a standard database, and gradually splitting the deposit address information according to the hierarchical address;
and arranging the split deposit address information step by step according to the hierarchical address.
2. The method of claim 1, wherein said step of splitting said deposit address information step by hierarchical addresses by matching said deposit address information by a standard database comprises:
step-by-step matching is carried out on the deposit address information in a standard database according to the hierarchical address, and the deposit address information is split if the matching is successful;
if the deposit address information has no matched level address information in the standard database, matching is carried out after deleting the corresponding level address key words in the standard database, and the deposit address information is split;
if the deposit address information cannot be split, splitting the deposit address information according to the address keywords of each level in the deposit address information.
3. The method for normalizing deposit information according to claim 2, wherein said step of stepwise matching said deposit address information in a standard database according to hierarchical addresses, and splitting said deposit address information if the matching is successful comprises:
matching the first deposit address information based on a standard database, and splitting the area where the deposit is located and the second deposit address information; the first deposit address information is initial deposit address information;
matching the second deposit address information based on the standard database, and splitting the floor address where the deposit is located and the third deposit address information;
matching the third deposit address information based on the standard database, and splitting the name of the building where the deposit is and the fourth deposit address information;
matching the fourth deposit address information based on the standard database, and splitting the address information of the building where the deposit is located and the fifth deposit address information;
and matching the fifth deposit address information based on the standard database, and splitting out the room number where the deposit is located.
4. A method of normalizing deposit information as defined in claim 3, wherein said step of splitting the name of the building where the deposit is located comprises:
if the administrative area of the floor with the name of the floor where the deposit is located is inconsistent with the split administrative area, replacing the split administrative area with the administrative area of the floor;
if a plurality of floor names are matched, selecting the floor names of administrative areas where the floors are located and the administrative areas which are split are consistent;
if the floor names in which the administrative areas where the floors are located are consistent with the split administrative areas do not exist in the plurality of floor names, selecting a first floor name according to the matching similarity; the first building name is the building name with the highest matching similarity.
5. The method for normalizing deposit information according to claim 3, wherein said deposit information further comprises a deposit house type, and wherein the step of splitting the building on which the deposit is located comprises:
and if the fourth deposit address information is incomplete, determining a building where the deposit with the missing deposit address information is located based on the deposit house type, and splitting the building where the deposit is located.
6. The deposit information standardization method of claim 1, the deposit information further comprising deposit value information, characterized in that the deposit information standardization method further comprises:
if the complete deposit address information is split according to the addresses of all levels, if no matched deposit address information exists in the standard database, the deposit address information is redetermined based on the deposit value information and the standard database, and the deposit address information is split.
7. The method of claim 6, wherein the steps of redefining the deposit address information based on the deposit value information and a standard database, and splitting the deposit address information, comprise:
invoking preset estimated value price data in the deposit value information;
determining the address names of all levels in the deposit address information based on the address names of all levels of deposit information matched with the estimated price data;
and if the address names of all levels of the deposit information matched with the estimated price data are not unique, screening according to different parameters of the deposit value information, and screening all levels of the address names of the deposit information step by step until the unique level address names are matched.
8. A deposit information normalizing device, comprising:
the acquisition module is used for acquiring the deposit information; the deposit information includes deposit address information;
the splitting module is used for splitting the deposit address information according to the hierarchical address through matching the deposit address information by the standard database;
and the arrangement module is used for arranging the split deposit address information step by step according to the hierarchical address.
9. An electronic device comprising a memory and a processor, wherein:
the processor is used for executing the computer program stored on the memory;
the processor, when executing the computer program, implements the steps of the deposit information normalizing method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the deposit information normalizing method of any one of claims 1 to 7.
CN202310690606.1A 2023-06-12 2023-06-12 Method, device, equipment and storage medium for normalizing deposit information Pending CN116737705A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310690606.1A CN116737705A (en) 2023-06-12 2023-06-12 Method, device, equipment and storage medium for normalizing deposit information

Publications (1)

Publication Number Publication Date
CN116737705A true CN116737705A (en) 2023-09-12

Family

ID=87905690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310690606.1A Pending CN116737705A (en) 2023-06-12 2023-06-12 Method, device, equipment and storage medium for normalizing deposit information

Country Status (1)

Country Link
CN (1) CN116737705A (en)

Similar Documents

Publication Publication Date Title
CN103279542B (en) Data import processing method and data processing equipment
US10114922B2 (en) Identifying ancestral relationships using a continuous stream of input
CN112199366A (en) Data table processing method, device and equipment
CN112115152B (en) Data increment updating and inquiring method and device, electronic equipment and storage medium
CN109933645B (en) Information query method, device, computer equipment and storage medium
CN111371858A (en) Group control equipment identification method, device, medium and electronic equipment
Ali et al. A framework to implement data cleaning in enterprise data warehouse for robust data quality
CN111753075A (en) Method and device for creating question and answer data of customer service robot and computer equipment
Scofield Energy star building benchmarking scores: good idea, bad science
CN113434482A (en) Data migration method and device, computer equipment and storage medium
CN110737432B (en) Script aided design method and device based on root list
CN116737705A (en) Method, device, equipment and storage medium for normalizing deposit information
WO2019071907A1 (en) Method for identifying help information based on operation page, and application server
CN109460318B (en) Import method of rollback archive collected data, computer device and computer readable storage medium
CN109241167B (en) Table data importing method based on BS framework
CN111460042B (en) Method for synchronizing and matching power grid user mark information among heterogeneous multiple systems
CN114022188A (en) Target crowd circling method, device, equipment and storage medium
CN112632247A (en) Method and device for detecting man-hour report, computer equipment and storage medium
CN114138739A (en) Database table content rapid comparison system
CN111767222A (en) Data model verification method and device, electronic equipment and storage medium
CN111639490A (en) Building data processing method and device, electronic equipment and storage medium
CN112416993A (en) Trademark change judgment method, system, equipment and readable storage medium
CN111949845A (en) Method, apparatus, computer device and storage medium for processing mapping information
CN110928947A (en) Data synchronization method and device based on button and related equipment
CN114115825B (en) Front-end and back-end data verification method compatible with software

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination