CN116977024A - Receipt address-based bill recognition method, device and equipment - Google Patents


Info

Publication number
CN116977024A
Authority
CN
China
Prior art keywords
address
address information
receiving address
receiving
order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310947935.XA
Other languages
Chinese (zh)
Inventor
李鑫海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Pinwei Software Co Ltd
Original Assignee
Guangzhou Pinwei Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Pinwei Software Co Ltd
Priority to CN202310947935.XA
Publication of CN116977024A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an order-brushing (fake order) identification method, apparatus and device based on receiving addresses. The method comprises: obtaining the receiving address information of all orders placed within the same time period; analyzing the association relationships among the receiving address information; grouping the orders according to these association relationships to obtain a plurality of order sets; and determining, according to the number of orders in each order set, whether the user who placed the orders in that set is engaged in order brushing. The application can therefore use the receiving address information of the orders in the same time period to determine how the orders are associated with one another, and thereby identify order-brushing behavior. In addition, because data in the order information such as the recipient's mobile phone number and the user identifier are easy to forge, whereas receiving address information is difficult to forge, using the receiving address information for order-brushing identification further improves accuracy.

Description

Receipt address-based bill recognition method, device and equipment
Technical Field
The present application relates to the field of information identification technologies, and in particular, to a method, an apparatus and a device for identifying order brushing based on receiving addresses.
Background
Online shopping has become a common way for people to shop. It brings together goods from all over the country and lets consumers choose among them, so supply often exceeds demand in the online shopping market and competition among merchants is fierce. Products that appear earlier in a shopping website's listings sell more successfully, and the order in which a website ranks products is largely determined by their sales volume and positive-review rate. Many merchants therefore hire people or buy bot accounts to place fake orders, a practice known as order brushing, in order to inflate their sales volume and ratings. This unfair form of competition damages both the reputation of the shopping website and the interests of other merchants. A method is therefore needed that identifies order-brushing behavior so that it can be dealt with in time.
Disclosure of Invention
In view of the above, the application provides an order-brushing identification method, apparatus and device based on receiving addresses, for identifying order-brushing behavior and dealing with it in time.
In order to achieve the above object, the following solutions have been proposed:
A receiving-address-based order-brushing identification method comprises the following steps:
acquiring the receiving address information of all orders placed within the same time period;
analyzing the association relationships among the receiving address information;
grouping the orders according to the association relationships among the receiving address information to obtain a plurality of order sets;
and determining, according to the number of orders in each order set, whether the user who placed the orders in that set is engaged in order brushing.
Optionally, acquiring the receiving address information of all orders placed within the same time period includes:
acquiring the receiving-address-related data of each order placed within the time period;
judging whether the receiving-address-related data contain two or more administrative identifiers of the same administrative level;
and if so, correcting the receiving-address-related data by using an administrative-region library to obtain the receiving address information of the order.
Optionally, correcting the receiving-address-related data by using the administrative-region library to obtain the receiving address information of the order includes:
searching the administrative-region library and selecting each address that matches the receiving-address-related data as a target address, so as to obtain a plurality of target addresses;
scoring each target address by using the receiving-address-related data;
and revising the receiving-address-related data to the highest-scoring target address, the revised data being the receiving address information of the order.
Optionally, scoring each target address by using the receiving-address-related data includes:
determining an overlap score of each target address according to how far the administrative identifiers in the target address coincide with the administrative identifiers in the receiving-address-related data;
determining an accuracy score of each target address according to the character gaps, within the receiving-address-related data, between the different administrative identifiers of the target address;
and taking the sum of the overlap score and the accuracy score of the target address as its score.
Optionally, analyzing the association relationships among the receiving address information includes:
calculating the similarity between every two pieces of receiving address information;
and performing hierarchical clustering on the receiving address information according to the similarities to obtain a clustering result, wherein the clustering result comprises a plurality of cluster sets, each cluster set contains one or more pieces of receiving address information, and the receiving address information in the same cluster set is receiving address information having an association relationship.
Optionally, calculating the similarity between every two pieces of receiving address information includes:
calculating the similarity by a common-character matching technique;
or,
calculating the similarity by an edit-distance technique;
or,
calculating the similarity by a Hamming-distance measurement technique.
Optionally, analyzing the association relationships among the receiving address information includes:
combining the receiving address information according to order-placing time to obtain a plurality of address combinations;
counting the number of address combinations;
performing character segmentation on the receiving address information of each address combination to obtain a plurality of segmented words for each address combination;
counting the number of occurrences of each segmented word of an address combination within that address combination;
calculating the number of address combinations containing the segmented word;
calculating a TF-IDF value of the segmented word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations;
and taking the segmented word with the highest TF-IDF value in each address combination as the code word of that address combination, wherein the receiving address information in the address combination that contains the code word is receiving address information having an association relationship.
Optionally, calculating the TF-IDF value of the segmented word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations includes:
calculating an inverse document frequency (IDF) coefficient of the word, the coefficient being the natural logarithm of a target quotient, where the target quotient is the total number of address combinations divided by the number of address combinations containing the word;
and taking the product of the IDF coefficient and the number of occurrences of the word as the TF-IDF value of the word in the address combination.
A receiving-address-based order-brushing identification apparatus, comprising:
an acquisition unit, configured to acquire the receiving address information of all orders placed within the same time period;
an analysis unit, configured to analyze the association relationships among the receiving address information;
a grouping unit, configured to group the orders according to the association relationships among the receiving address information to obtain a plurality of order sets;
and a determining unit, configured to determine, according to the number of orders in each order set, whether the user who placed the orders in that set is engaged in order brushing.
A receiving-address-based order-brushing identification device, comprising a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement each step of the above receiving-address-based order-brushing identification method.
As can be seen from the above technical solutions, the receiving-address-based order-brushing identification method provided by the application acquires the receiving address information of all orders placed within the same time period; analyzes the association relationships among the receiving address information of the orders, and thereby obtains the association relationships among the orders themselves; groups the orders according to these relationships to obtain a plurality of order sets; and determines, according to the number of orders in each order set, whether the user who placed the orders in that set is engaged in order brushing. Order brushing is thus identified from the association relationships among orders. In addition, because data in the order information such as the recipient's mobile phone number and the user identifier are easy to forge, whereas receiving address information is difficult to forge, using the receiving address information for identification further improves accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a receiving-address-based order-brushing identification method according to an embodiment of the present application;
FIG. 2 is a block diagram of a receiving-address-based order-brushing identification apparatus according to an embodiment of the present application;
FIG. 3 is a hardware block diagram of a receiving-address-based order-brushing identification device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The receiving-address-based order-brushing identification method provided by the application can be applied in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, and distributed computing environments that include any of the above devices.
The receiving-address-based order-brushing identification method of the present application is described in detail below with reference to FIG. 1, and includes the following steps:
S1, acquiring the receiving address information of all orders placed within the same time period.
Specifically, the receiving address information of each order for the same product within the same time period can be extracted in order of order-placing time.
The time period can be set according to actual requirements; for example, it can be 1 hour or 30 minutes.
With a 1-hour period, for instance, the receiving address information of all orders placed within the same hour is obtained.
S2, analyzing the association relationships among the receiving address information.
Specifically, based on the degree of similarity between pieces of receiving address information, the association between each piece and every other piece can be analyzed in turn to determine whether any other receiving address information is associated with it.
S3, grouping the orders according to the association relationships among the receiving address information to obtain a plurality of order sets.
Specifically, the orders may be clustered according to the association relationships among their receiving addresses to form a plurality of order sets.
The orders in the same order set may be regarded as having been placed by the same user.
S4, determining, according to the number of orders in each order set, whether the user who placed the orders in that set is engaged in order brushing.
Specifically, since the orders in one set are attributed to the same user, the number of orders in the set indicates how many orders that user placed within the time period, and this count is used to decide whether the user is brushing orders.
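Steps S1 to S4 can be tied together in a short Python sketch. It illustrates the flow under stated assumptions and is not the patented implementation: the `Order` record, the one-hour window, the `related` predicate (a stand-in for whichever association analysis of step S2 is used) and the `min_orders` threshold are all hypothetical, since the description fixes no numeric decision rule for S4. Orders are grouped by connected components, so two orders fall into the same set whenever a chain of related addresses links them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

EPOCH = datetime(1970, 1, 1)


@dataclass
class Order:
    order_id: str
    placed_at: datetime
    address: str


def group_orders(orders: List[Order], related: Callable[[str, str], bool]) -> List[List[Order]]:
    """S2 + S3: put orders whose addresses are (transitively) related into one set."""
    parent = list(range(len(orders)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(orders)):
        for j in range(i + 1, len(orders)):
            if related(orders[i].address, orders[j].address):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Order]] = {}
    for i, order in enumerate(orders):
        groups.setdefault(find(i), []).append(order)
    return list(groups.values())


def flag_brushing(
    orders: List[Order],
    related: Callable[[str, str], bool],
    window: timedelta = timedelta(hours=1),
    min_orders: int = 5,  # hypothetical threshold for S4
) -> List[List[Order]]:
    """S1: bucket orders by fixed time window; S4: flag sets with many orders."""
    buckets: Dict[int, List[Order]] = {}
    for order in orders:
        buckets.setdefault((order.placed_at - EPOCH) // window, []).append(order)

    flagged = []
    for bucket in buckets.values():
        for group in group_orders(bucket, related):
            if len(group) >= min_orders:
                flagged.append(group)
    return flagged
```

With `related=lambda a, b: a == b`, five orders to the same address within one hour form one flagged set, while a single order to another address does not.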
As can be seen from the above technical solutions, the method provided by this embodiment acquires the receiving address information of all orders placed within the same time period, derives the association relationships among the orders from the association relationships among their receiving addresses, groups the orders accordingly into order sets, and determines from the number of orders in each set whether the corresponding user is brushing orders. Because the recipient's mobile phone number, the user identifier and similar fields in the order information are easy to forge, whereas the receiving address is difficult to forge, basing the identification on the receiving address information further improves accuracy.
In some embodiments of the present application, the process in step S1 of acquiring the receiving address information of all orders placed within the same time period is described in detail, with the following steps:
S10, acquiring the receiving-address-related data of each order placed within the time period.
Specifically, the receiving-address-related data of each order can be extracted by a character extraction technique.
S11, judging whether the receiving-address-related data contain two or more administrative identifiers of the same administrative level, and if so, executing step S12.
Specifically, each administrative identifier in the receiving-address-related data may be extracted, and it is determined whether two or more of them belong to the same administrative level.
When two or more administrative identifiers of the same administrative level exist, step S12 is executed;
when they do not exist, the receiving-address-related data are used directly as the receiving address information.
S12, correcting the receiving-address-related data by using the administrative-region library to obtain the receiving address information of the order.
Specifically, the addresses in the administrative-region library may be used to revise receiving-address-related data that contain two or more identifiers of the same administrative level; the revised data are the receiving address information.
As can be seen from the above technical solution, this embodiment provides an optional way of identifying and revising erroneous receiving-address-related data, which further improves the reliability and accuracy of the receiving address information and, in turn, the accuracy of the analysis of association relationships among the receiving address information.
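Step S11 amounts to extracting known administrative identifiers and checking whether any level occurs more than once. The Python sketch below assumes the administrative-region library can be reduced to a mapping from identifier to level; `ADMIN_LEVELS`, its entries and the plain substring matching are hypothetical stand-ins for whatever library and character extraction technique an implementation actually uses.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical excerpt of an administrative-region library: identifier -> level.
ADMIN_LEVELS: Dict[str, str] = {
    "广东省": "province",
    "湖南省": "province",
    "广州市": "city",
    "深圳市": "city",
    "天河区": "district",
    "南山区": "district",
}


def identifiers_by_level(raw: str, library: Dict[str, str] = ADMIN_LEVELS) -> Dict[str, List[str]]:
    """S11 helper: collect the library identifiers found in the raw address, grouped by level."""
    found: Dict[str, List[str]] = defaultdict(list)
    for name, level in library.items():
        if name in raw:
            found[level].append(name)
    return dict(found)


def needs_correction(raw: str, library: Dict[str, str] = ADMIN_LEVELS) -> bool:
    """S11: True when two or more identifiers share one administrative level."""
    return any(len(names) >= 2 for names in identifiers_by_level(raw, library).values())
```

An address such as 广东省广州市深圳市天河区体育西路1号 names two cities and would be routed to step S12; 广东省广州市天河区体育西路1号 would be used as-is.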
In some embodiments of the present application, the process in step S12 of correcting the receiving-address-related data by using the administrative-region library to obtain the receiving address information of the order is described in detail, with the following steps:
S120, searching the administrative-region library and selecting each address that matches the receiving-address-related data as a target address, so as to obtain a plurality of target addresses.
Specifically, the administrative-region library may be searched using the administrative identifiers in the receiving-address-related data; a target address is an address in the library whose administrative identifiers coincide with those of the receiving-address-related data.
S121, scoring each target address by using the receiving-address-related data.
Specifically, the receiving-address-related data may be compared with each target address for similarity, and each target address scored accordingly.
S122, revising the receiving-address-related data to the highest-scoring target address, the revised data being the receiving address information of the order.
Specifically, the target addresses can be ranked by score and the highest-scoring target address selected.
The receiving-address-related data are then replaced with the corresponding highest-scoring target address, and the resulting data are used as the receiving address information.
As can be seen from the above technical solution, this embodiment provides an optional way of identifying and revising erroneous receiving-address-related data, which further improves the reliability of the receiving address information.
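Steps S120 to S122 can be sketched as retrieve, score, replace. In the Python sketch below, the library entries, the `(province, city, district)` tuple layout and the placeholder `count_overlap` score are hypothetical; the full S121 score (overlap plus accuracy) can be passed in through the `score` parameter. How the revised address keeps its street-level remainder is also an assumption, since the description only says that the data are replaced with the target address.

```python
from typing import Callable, List, Sequence, Tuple

Region = Tuple[str, str, str]  # (province, city, district)

# Hypothetical entries of an administrative-region library.
REGIONS: List[Region] = [
    ("广东省", "广州市", "天河区"),
    ("广东省", "广州市", "海珠区"),
    ("广东省", "深圳市", "南山区"),
]


def count_overlap(raw: str, region: Sequence[str]) -> float:
    """Placeholder score: how many of the region's identifiers appear in raw."""
    return float(sum(1 for name in region if name in raw))


def correct_address(
    raw: str,
    regions: List[Region] = REGIONS,
    score: Callable[[str, Sequence[str]], float] = count_overlap,
) -> str:
    # S120: every library entry sharing at least one identifier is a target address.
    targets = [r for r in regions if any(name in raw for name in r)]
    if not targets:
        return raw  # nothing to correct against
    # S121 + S122: score the targets and keep the highest-scoring one.
    best = max(targets, key=lambda r: score(raw, r))
    # Assumption: strip every known identifier and keep the rest as street-level detail.
    detail = raw
    for region in regions:
        for name in region:
            detail = detail.replace(name, "")
    return "".join(best) + detail
```

For 广东省广州市深圳市天河区体育西路1号, the first entry matches three identifiers and wins, giving 广东省广州市天河区体育西路1号.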
In some embodiments of the present application, the process in step S121 of scoring each target address by using the receiving-address-related data is described in detail as follows:
S1210, determining the overlap score of each target address according to how far the administrative identifiers in the target address coincide with those in the receiving-address-related data.
Specifically, the overlap score of a target address may be set by comparing each administrative identifier in the target address with the administrative identifiers of the receiving-address-related data.
The higher the degree of coincidence, the higher the overlap score of the target address.
S1211, determining the accuracy score of each target address according to the character gaps, within the receiving-address-related data, between the different administrative identifiers of the target address.
Specifically, the accuracy score of a target address may be set according to the character gaps between its administrative identifiers as they appear in the receiving-address-related data.
The smaller these character gaps, the higher the accuracy score.
S1212, taking the sum of the overlap score and the accuracy score of the target address as its score.
Specifically, the overlap score and the accuracy score of the target address are added, and the resulting value is the score of the target address.
As can be seen from the above technical solution, this embodiment provides an optional way of scoring target addresses, by which the receiving-address-related data can be revised more reliably, improving the accuracy of the receiving address information.
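A Python sketch of the S1210 to S1212 scoring follows. The description fixes only the direction of each score (more coincidence gives a higher overlap score; smaller character gaps give a higher accuracy score), so the point-per-identifier overlap score and the 1/(1 + gap) accuracy term are assumptions chosen for illustration.

```python
from typing import Sequence


def overlap_score(raw: str, target: Sequence[str]) -> float:
    """S1210: one point per administrative identifier of the target found in raw."""
    return float(sum(1 for name in target if name in raw))


def accuracy_score(raw: str, target: Sequence[str]) -> float:
    """S1211: adjacent identifiers that sit close together in raw score higher.

    Each adjacent pair contributes 1 / (1 + gap), where gap is the number of
    characters between the end of one identifier and the start of the next
    (an assumed formula; the description only says smaller gaps score higher).
    """
    spans = [(raw.find(name), len(name)) for name in target if name in raw]
    score = 0.0
    for (start_a, len_a), (start_b, _) in zip(spans, spans[1:]):
        gap = abs(start_b - (start_a + len_a))
        score += 1.0 / (1 + gap)
    return score


def target_score(raw: str, target: Sequence[str]) -> float:
    """S1212: the score of a target address is the sum of both parts."""
    return overlap_score(raw, target) + accuracy_score(raw, target)
```

For the raw data 广东省广州市深圳市天河区体育西路1号, the target (广东省, 广州市, 天河区) scores 4.25 and (广东省, 深圳市, 南山区) scores 2.25 under these assumed formulas.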
In some embodiments of the present application, two observations are used. First, because a user who brushes orders still needs the goods to be delivered, the receiving addresses used vary only slightly, so the association relationships among receiving address information can be determined from their similarity. Second, to tell their brushing orders apart, such users are considered likely to add special characters to the receiving addresses, so the association relationships can also be judged by analyzing whether special characters exist in the receiving address information. On this basis, this embodiment describes two optional implementations of step S2, analyzing the association relationships among the receiving address information, as follows:
First implementation:
This implementation determines the association relationships from the similarity between pieces of receiving address information, as follows:
S20, calculating the similarity between every two pieces of receiving address information.
Specifically, every two pieces of receiving address information can be compared to obtain their similarity.
S21, performing hierarchical clustering on the receiving address information according to the similarities to obtain a clustering result, wherein the clustering result comprises a plurality of cluster sets, each cluster set contains one or more pieces of receiving address information, and the receiving address information in the same cluster set is receiving address information having an association relationship.
Specifically, the highest similarity can be selected repeatedly from the remaining similarities; whenever it is greater than a similarity threshold, the two pieces of receiving address information with that similarity are merged into one cluster. This yields the cluster sets, and the receiving address information within the same cluster set is receiving address information having an association relationship.
The similarity threshold may be set according to actual requirements, for example, 0.7.
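The merge procedure described for step S21 (repeatedly join the two most similar addresses while the best similarity exceeds the threshold) is single-linkage agglomerative clustering. The Python sketch below implements it naively for clarity; the similarity function is passed in, and 0.7 is the example threshold from the text.

```python
from itertools import combinations
from typing import Callable, List


def single_linkage_clusters(
    addresses: List[str],
    sim: Callable[[str, str], float],
    threshold: float = 0.7,
) -> List[List[str]]:
    """Agglomerative clustering: merge the most similar pair of clusters while
    their similarity (max over member pairs) is greater than the threshold."""
    clusters = [[a] for a in addresses]
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            s = max(sim(a, b) for a in clusters[i] for b in clusters[j])
            if s > best_sim:
                best_pair, best_sim = (i, j), s
        if best_pair is None:  # no remaining pair is similar enough
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

Because single linkage is used, the final clusters equal the connected components of the graph that joins address pairs whose similarity exceeds the threshold, so a union-find pass gives the same result faster on large batches.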
Second implementation:
This implementation determines the association relationships through special characters, with the following steps:
S20, combining the receiving address information according to order-placing time to obtain a plurality of address combinations.
Specifically, the receiving address information can be combined in order of order-placing time to obtain a plurality of address combinations; together, the address combinations contain the receiving address information of all orders.
S21, counting the number of address combinations.
Specifically, the total number of address combinations is counted.
S22, performing character segmentation on the receiving address information of each address combination to obtain a plurality of segmented words for each address combination.
Specifically, the receiving address information in each address combination can be segmented, and the segmented words of each address combination extracted.
A plurality of regular expressions may be set in advance, and each piece of receiving address information is segmented according to the regular expression it matches, so that different regular expressions are applied to differently formatted receiving address information.
A segmented word may be an administrative identifier that appears frequently in basic order information, or a special character that normally appears with low frequency.
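The regular-expression segmentation of step S22 might look like the Python sketch below. The two patterns (province/city/district and municipality/district) and the token rule that separates runs of Chinese characters from runs of ASCII letters, digits and symbols are hypothetical examples of pre-set expressions; the rule keeps an unusual marker such as `A7#` together as one segmented word.

```python
import re
from typing import List

# Hypothetical pre-set patterns, tried in order; the first full match is used.
PATTERNS = [
    re.compile(r"(?P<province>[^省]+省)(?P<city>[^市]+市)(?P<district>[^区县]+[区县])(?P<rest>.*)"),
    re.compile(r"(?P<city>[^市]+市)(?P<district>[^区县]+[区县])(?P<rest>.*)"),
]
# Remaining text: runs of ASCII letters/digits/symbols or runs of CJK characters.
TOKEN = re.compile(r"[A-Za-z0-9#*@\-]+|[\u4e00-\u9fff]+")
ADMIN_GROUPS = ("province", "city", "district")


def segment(address: str) -> List[str]:
    for pattern in PATTERNS:
        match = pattern.fullmatch(address)
        if match:
            # Administrative identifiers first, in hierarchy order.
            head = [match.group(g) for g in ADMIN_GROUPS if g in pattern.groupindex]
            return head + TOKEN.findall(match.group("rest"))
    return TOKEN.findall(address)  # no pattern matched: tokens only
```

For example, 广东省广州市天河区体育西路1号A7# is split into 广东省, 广州市, 天河区, 体育西路, 1, 号 and A7#.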
S23, counting the number of occurrences of each segmented word of an address combination within that address combination.
Specifically, the number of occurrences of each segmented word in each address combination may be counted.
S24, calculating the number of address combinations containing the segmented word.
Specifically, the number of address combinations that contain the segmented word may be counted.
S25, calculating the TF-IDF value of the segmented word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations.
Specifically, these three quantities are used to calculate the TF-IDF value of each segmented word in each address combination.
S26, taking the segmented word with the highest TF-IDF value in each address combination as the code word of that address combination, wherein the receiving address information in the address combination that contains the code word is receiving address information having an association relationship.
Specifically, the TF-IDF values in each address combination can be sorted, and the segmented word whose TF-IDF value is the highest and exceeds a risk threshold is selected as the code word of the address combination; the code word is the special marker used in the brushing orders.
The receiving address information in the address combination that contains the code word may then be taken as receiving address information having an association relationship.
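Steps S21 to S26 can be sketched in Python as follows. The input is assumed to be already divided into address combinations (step S20) and segmented into words (step S22); term frequency is the raw occurrence count and IDF is the natural logarithm of the total number of combinations over the number containing the word, as in steps S250 and S251. The `risk_threshold` default of 1.5 is hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional

Address = List[str]          # one receiving address, already segmented
Combination = List[Address]  # one address combination


def code_words(combinations: List[Combination], risk_threshold: float = 1.5) -> List[Optional[str]]:
    """Return the code word of each combination, or None if no word clears the threshold."""
    n = len(combinations)  # S21: total number of address combinations
    # S23: occurrences of each word within each combination.
    tf = [Counter(word for address in combo for word in address) for combo in combinations]
    # S24: number of combinations containing each word.
    df = Counter(word for counts in tf for word in counts)
    result: List[Optional[str]] = []
    for counts in tf:
        # S25: TF-IDF = ln(n / df) * occurrences.
        scores = {word: math.log(n / df[word]) * c for word, c in counts.items()}
        best = max(scores, key=scores.get, default=None)
        # S26: keep the top word only if it exceeds the risk threshold.
        result.append(best if best is not None and scores[best] > risk_threshold else None)
    return result


def associated(combo: Combination, code_word: str) -> Combination:
    """Addresses in the combination that carry the code word are associated."""
    return [address for address in combo if code_word in address]
```

In the test data below, the marker A7# appears in every address of the first combination and nowhere else, so it scores 3 ln 3 ≈ 3.30 and becomes that combination's code word, while no word in the other combinations clears the threshold.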
As can be seen from the above technical solution, this embodiment provides an optional way of determining associated receiving address information, in which association relationships are further determined from the special characters present in the receiving address information, improving the accuracy of the association analysis.
In some embodiments of the present application, three optional implementations of step S20, calculating the similarity between every two pieces of receiving address information, are provided, as detailed below:
First implementation:
S200, calculating the similarity between every two pieces of receiving address information by a common-character matching technique.
Specifically, the similarity is the quotient of the size of the intersection of the two pieces of receiving address information (the characters they share) and the size of their union (the distinct characters in either).
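Read at the character level, this common-character similarity is the Jaccard index of the two addresses' character sets. A minimal Python sketch, assuming each address is treated as a set of distinct characters (the description does not say whether repeated characters count):

```python
def common_char_similarity(a: str, b: str) -> float:
    """|chars(a) & chars(b)| / |chars(a) | chars(b)|; 1.0 for two empty strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)
```

For 天河区体育西路1号 and 天河区体育西路2号, 8 of the 10 distinct characters are shared, giving 0.8, which is above the example threshold of 0.7.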
A second kind of,
S200, calculating the similarity between every two receiving address information by adopting an edit distance technology.
Specifically, the edit distance quantifies the degree of difference between two strings as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into the other; this number indicates how similar the two pieces of receiving address information are.
The fewer the edit operations, the higher the similarity.
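A minimal sketch of the edit distance (the standard Levenshtein dynamic program, assumed here as the edit-distance variant), plus a hypothetical normalisation to a similarity in [0, 1]; the text only states that fewer edits mean higher similarity, so dividing by the longer length is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] is the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; normalising by the longer length is an assumption."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


print(edit_distance("广州市天河区", "广州市海珠区"))
```

Keeping only two rows of the table bounds memory at O(len(b)), which matters when every pair of addresses in a period is compared.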
Third implementation:
S200, calculate the similarity between every two pieces of receiving address information using a Hamming distance measurement technique.
Specifically, the Hamming distance measurement technique here relies on a text-compression (hashing) algorithm: the receiving address information undergoes word segmentation, word encoding, weighting, merging, and dimension reduction, so that each piece of receiving address information is compressed into a fixed-length binary (0/1) code, and the Hamming distance is the number of positions at which the codes of the two pieces of receiving address information differ.
The smaller the Hamming distance, the higher the similarity.
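The five steps listed above (segmentation, encoding, weighting, merging, dimension reduction) match the well-known SimHash scheme. The following is a minimal sketch under stated assumptions: overlapping character bigrams stand in for word segmentation, MD5 truncated to 64 bits serves as the word encoding, and each token's weight is its frequency. The function names and the 64-bit length are hypothetical choices not given in the text.

```python
import hashlib
from collections import Counter
from typing import List

BITS = 64  # fingerprint length (assumption)


def char_bigrams(text: str) -> List[str]:
    """Stand-in for word segmentation: overlapping two-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def token_hash(token: str) -> int:
    """Encode a token as a 64-bit integer (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    weights = Counter(char_bigrams(text))  # weight = token frequency (assumption)
    acc = [0] * BITS
    for token, weight in weights.items():
        h = token_hash(token)
        for bit in range(BITS):
            # Merge: add the weight where the hash bit is 1, subtract it where it is 0.
            acc[bit] += weight if (h >> bit) & 1 else -weight
    # Dimension reduction: keep only the sign of each accumulated position.
    return sum(1 << bit for bit in range(BITS) if acc[bit] > 0)


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(x ^ y).count("1")


a = simhash("广州市天河区体育西路100号")
b = simhash("广州市天河区体育西路101号")
print(hamming_distance(a, b))
```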
As can be seen from the above technical solution, three optional ways of calculating the similarity between two pieces of receiving address information are provided, and the appropriate one can be chosen according to actual requirements so that the similarity is calculated more effectively.
In some embodiments of the present application, the process in step S25 of calculating the Tfidf value of a word in an address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations is described in detail as follows:
S250, calculate the inverse document coefficient of the word, where the inverse document coefficient is the natural logarithm of a target quotient, and the target quotient is the total number of address combinations divided by the number of address combinations containing the word.
Specifically, the quotient of the total number of address combinations and the number of address combinations containing the word is calculated, and the natural logarithm of the quotient is calculated, so as to obtain the inverse document coefficient of the word.
S251, take the product of the inverse document coefficient and the number of occurrences of the word in the address combination as the Tfidf value of the word in the address combination.
Specifically, the product of the inverse document coefficient and the number of occurrences of the term in the address combination may be calculated, and the resulting product is the Tfidf value of the term in the address combination.
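Written as a formula, with N the total number of address combinations, N_w the number of combinations containing word w, and n_{w,k} the occurrences of w in combination k, steps S250 and S251 give Tfidf(w, k) = n_{w,k} · ln(N / N_w). A minimal sketch follows, assuming each address combination has already been segmented into a list of words; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_per_combination(combinations: List[List[str]]) -> List[Dict[str, float]]:
    """combinations[k] holds the words segmented from address combination k."""
    total = len(combinations)  # total number of address combinations
    containing = Counter()     # number of combinations that contain each word
    for words in combinations:
        containing.update(set(words))
    scores = []
    for words in combinations:
        occurrences = Counter(words)  # occurrences of each word in this combination
        scores.append({
            # S250: inverse document coefficient = ln(total / combinations containing word)
            # S251: Tfidf = occurrences * inverse document coefficient
            word: count * math.log(total / containing[word])
            for word, count in occurrences.items()
        })
    return scores


combos = [["tower", "a", "x7", "x7"], ["tower", "b"], ["tower", "c"]]
print(tfidf_per_combination(combos)[0])
```

A word such as "tower" that appears in every combination scores ln(1) = 0, so only strings concentrated in a few combinations can become secret-number candidates.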
As can be seen from the above technical solution, this embodiment provides an optional way of calculating the Tfidf value of each word, from which the Tfidf value of each word in each address combination, and in turn the secret number of each address combination, can be determined.
The receipt address-based bill recognition device provided by the embodiments of the present application is described below; the device described below and the method described above may be cross-referenced.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a bill identifying device based on a receiving address according to an embodiment of the present application.
As shown in fig. 2, the receipt address-based bill recognition apparatus may include:
an acquiring unit 1, configured to acquire receiving address information of all orders in the same time period;
an analysis unit 2, configured to analyze the association relationships between the pieces of receiving address information;
a grouping unit 3, configured to group the orders according to the association relationships between the pieces of receiving address information to obtain a plurality of order sets;
and a determining unit 4, configured to determine, according to the number of orders in each order set, whether the ordering users corresponding to that order set have engaged in order brushing.
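The four units can be sketched end to end as below, assuming the orders are already restricted to one time period and that some association rule (clustering or secret-number matching) is supplied as a function mapping an address to a group key. The `Order` fields, `group_of`, and the `min_orders` threshold are all hypothetical; the text does not state how many orders in a set indicate order brushing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Order:
    order_id: str
    user_id: str
    address: str


def flag_brushing_groups(orders: List[Order], group_of: Callable[[str], str],
                         min_orders: int) -> Dict[str, List[str]]:
    """Group orders by associated address and flag groups with many orders.

    Returns {group key: sorted user ids} for every flagged order set.
    """
    order_sets: Dict[str, List[Order]] = defaultdict(list)
    for order in orders:
        order_sets[group_of(order.address)].append(order)
    return {key: sorted({o.user_id for o in members})
            for key, members in order_sets.items()
            if len(members) >= min_orders}


orders = [Order("o1", "u1", "Tower A #1"), Order("o2", "u2", "Tower A #2"),
          Order("o3", "u3", "Tower A #3"), Order("o4", "u4", "Tower B #1")]
# Hypothetical association: addresses are related if they share the part before '#'.
print(flag_brushing_groups(orders, lambda a: a.split("#")[0].strip(), min_orders=3))
```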
Optionally, the acquiring unit may include:
a receiving address related data acquisition unit, configured to acquire the receiving address related data of each order in the same time period;
an administrative identifier judging unit, configured to judge whether two or more administrative identifiers of the same administrative level exist in the receiving address related data;
and a receiving address information acquisition unit, configured to, if two or more administrative identifiers of the same administrative level exist, correct the receiving address related data using an administrative region library to obtain the receiving address information of the order.
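A naive sketch of the administrative identifier check, reading the condition as "some administrative level appears at least twice". It assumes identifiers can be recognised by their Chinese suffixes (省 province, 市 city, 区/县 district); this suffix table and the regular expression are hypothetical simplifications, since real names (for example county-level cities) need a proper parser backed by the administrative region library.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical suffix-to-level table; real administrative names need a proper parser.
LEVEL_BY_SUFFIX = {"省": "province", "市": "city", "区": "district", "县": "district"}
ADMIN_PATTERN = re.compile(r"[^省市区县]+?[省市区县]")


def extract_admin_identifiers(address: str) -> List[Tuple[str, str]]:
    """Return (identifier, level) pairs found in the raw address text."""
    return [(name, LEVEL_BY_SUFFIX[name[-1]]) for name in ADMIN_PATTERN.findall(address)]


def needs_correction(address: str) -> bool:
    """True if two or more identifiers share the same administrative level."""
    levels = Counter(level for _, level in extract_admin_identifiers(address))
    return any(count >= 2 for count in levels.values())


print(needs_correction("广州市深圳市南山区科技园"))  # two city-level identifiers
print(needs_correction("广东省广州市天河区体育西路"))
```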
Optionally, the receiving address information acquisition unit may include:
an administrative region library searching unit, configured to search the administrative region library and select addresses matching the receiving address related data as target addresses, obtaining a plurality of target addresses;
a target address scoring unit, configured to score each of the target addresses using the receiving address related data;
and a target address selecting unit, configured to revise the receiving address related data to the target address with the highest score, the revised receiving address related data being the receiving address information of the order.
Optionally, the target address scoring unit may include:
a first target address scoring unit, configured to determine the coincidence score of each target address according to the overlap between the administrative identifiers in the target address and those in the receiving address related data;
a second target address scoring unit, configured to determine the accuracy score of the target address according to the character intervals, within the receiving address related data, between the different administrative identifiers of the target address;
and a third target address scoring unit, configured to take the sum of the coincidence score and the accuracy score of the target address as its score.
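A sketch of the search, scoring, and selection units under loudly stated assumptions: the library is a list of (province, city, district) tuples; a library entry is a target address if any of its identifiers occurs in the raw data; the coincidence score counts identifiers that occur; and the accuracy score rewards consecutive identifiers that appear in order with few characters between them, using 1/(1 + gap). The text does not give these formulas, so every function here is hypothetical.

```python
from typing import List, Optional, Sequence


def coincidence_score(candidate: Sequence[str], raw: str) -> int:
    """Count the candidate's administrative identifiers that occur in the raw text."""
    return sum(1 for name in candidate if name in raw)


def accuracy_score(candidate: Sequence[str], raw: str) -> float:
    """Reward consecutive identifiers that appear in order with few characters between them."""
    score = 0.0
    for upper, lower in zip(candidate, candidate[1:]):
        i, j = raw.find(upper), raw.find(lower)
        if i < 0 or j < 0:
            continue
        gap = j - (i + len(upper))  # characters separating the two identifiers
        if gap >= 0:
            score += 1.0 / (1 + gap)
    return score


def correct_address(raw: str, library: List[Sequence[str]]) -> Optional[Sequence[str]]:
    """Pick the highest-scoring target address from the administrative region library."""
    targets = [c for c in library if coincidence_score(c, raw) > 0]
    if not targets:
        return None
    return max(targets, key=lambda c: coincidence_score(c, raw) + accuracy_score(c, raw))


library = [("广东省", "广州市", "天河区"), ("广东省", "深圳市", "南山区")]
print(correct_address("广州市深圳市南山区科技园", library))
```

Here the Shenzhen entry wins: two of its identifiers occur (coincidence 2) and 深圳市 is immediately followed by 南山区 (accuracy 1.0), against a single match for the Guangzhou entry.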
Optionally, the analysis unit may include:
a similarity calculation unit, configured to calculate the similarity between every two pieces of receiving address information;
and a receiving address information clustering unit, configured to perform hierarchical clustering on the pieces of receiving address information according to the magnitude of the similarities to obtain a clustering result, wherein the clustering result comprises a plurality of cluster sets, each cluster set contains more than one piece of receiving address information, and the receiving address information in the same cluster set is receiving address information having an association relationship.
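A sketch of the clustering unit, assuming single-linkage hierarchical clustering cut at a similarity threshold; the text names neither the linkage criterion nor the threshold, so both are assumptions. Cutting a single-linkage dendrogram at a threshold gives the connected components of the "similarity at or above threshold" graph, which a union-find computes directly. `jaccard` is a hypothetical stand-in for any of the three similarity measures.

```python
from typing import Callable, Dict, List


def cluster_addresses(addresses: List[str], similarity: Callable[[str, str], float],
                      threshold: float) -> List[List[str]]:
    """Single-linkage clustering cut at `threshold`, via union-find."""
    parent = list(range(len(addresses)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merging every pair whose similarity reaches the threshold yields the same
    # clusters as single-linkage agglomerative clustering stopped at that level.
    for i in range(len(addresses)):
        for j in range(i + 1, len(addresses)):
            if similarity(addresses[i], addresses[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[str]] = {}
    for i, address in enumerate(addresses):
        clusters.setdefault(find(i), []).append(address)
    return list(clusters.values())


def jaccard(a: str, b: str) -> float:
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 1.0


print(cluster_addresses(["广州市天河区", "广州市海珠区", "北京市朝阳区"], jaccard, 0.5))
```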
Optionally, the similarity calculation unit may include:
a first similarity calculation unit, configured to calculate the similarity between every two pieces of receiving address information using a common-character matching technique;
or,
calculating the similarity between every two pieces of receiving address information using an edit distance technique;
or,
calculating the similarity between every two pieces of receiving address information using a Hamming distance measurement technique.
Optionally, the analysis unit may include:
a receiving address information combination unit, configured to combine the receiving address information according to the ordering time to obtain a plurality of address combinations;
an address combination statistics unit, configured to count the total number of address combinations;
a character segmentation unit, configured to segment the receiving address information corresponding to each address combination into words, obtaining a plurality of words for each address combination;
an occurrence count unit, configured to count the number of occurrences of each word of an address combination within that address combination;
a number statistics unit, configured to count the number of address combinations containing the word;
a Tfidf value calculation unit, configured to calculate the Tfidf value of the word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations;
and a secret number determining unit, configured to take the word with the highest Tfidf value in each address combination as the secret number of that address combination, wherein the receiving address information in the address combination that contains the secret number is receiving address information having an association relationship.
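A sketch of the combination and segmentation units that feed the Tfidf calculation. The grouping rule is a hypothetical assumption (orders placed within the same clock hour form one address combination), as the text only says addresses are combined according to ordering time; overlapping character bigrams again stand in for a real Chinese word segmenter.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def combine_by_order_time(orders: List[Tuple[datetime, str]]) -> Dict[str, List[str]]:
    """Group receiving addresses into address combinations by ordering time.

    Hypothetical rule: orders placed within the same clock hour form one combination.
    """
    combinations: Dict[str, List[str]] = defaultdict(list)
    for placed_at, address in orders:
        combinations[placed_at.strftime("%Y-%m-%d %H:00")].append(address)
    return dict(combinations)


def segment(address: str) -> List[str]:
    """Stand-in for word segmentation: overlapping character bigrams."""
    return [address[i:i + 2] for i in range(len(address) - 1)] or [address]


orders = [
    (datetime(2023, 7, 31, 10, 5), "天河区体育西路X7"),
    (datetime(2023, 7, 31, 10, 40), "海珠区新港路X7"),
    (datetime(2023, 7, 31, 11, 10), "越秀区北京路"),
]
combos = combine_by_order_time(orders)
# Words of each address combination, ready for occurrence counting and Tfidf scoring.
words = {key: [w for a in addrs for w in segment(a)] for key, addrs in combos.items()}
print(words)
```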
Optionally, the Tfidf value calculation unit may include:
a first Tfidf value calculation unit, configured to calculate the inverse document coefficient of the word, wherein the inverse document coefficient is the natural logarithm of a target quotient, and the target quotient is the total number of address combinations divided by the number of address combinations containing the word;
and a second Tfidf value calculation unit, configured to take the product of the inverse document coefficient and the number of occurrences of the word as the Tfidf value of the word in the address combination.
The receipt address-based bill recognition device provided by the embodiments of the present application can be applied to receipt address-based bill recognition equipment, such as a PC terminal, a cloud platform, or a server cluster. Optionally, fig. 3 shows a block diagram of the hardware structure of the receipt address-based bill recognition equipment; referring to fig. 3, the hardware structure may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
in the embodiment of the present application, there is at least one each of the processor 1, the communication interface 2, the memory 3, and the communication bus 4, and the processor 1, the communication interface 2, and the memory 3 communicate with one another through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application;
the memory 3 may include high-speed RAM and may further include non-volatile memory, such as at least one disk memory;
the memory stores a program, and the processor can invoke the program stored in the memory; the program is configured to:
acquiring the receiving address information of all orders in the same time period;
analyzing the association relation among the receiving address information;
grouping the orders according to the association relation among the receiving address information to obtain a plurality of groups of order sets;
and determining, according to the number of orders in each order set, whether the ordering users corresponding to that order set have engaged in order brushing.
Optionally, for the refined and extended functions of the program, refer to the description above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
acquiring the receiving address information of all orders in the same time period;
analyzing the association relation among the receiving address information;
grouping the orders according to the association relation among the receiving address information to obtain a plurality of groups of order sets;
and determining, according to the number of orders in each order set, whether the ordering users corresponding to that order set have engaged in order brushing.
Optionally, for the refined and extended functions of the program, refer to the description above.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described progressively, each focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Various embodiments of the present application may be combined with each other. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A receipt address-based bill recognition method, characterized by comprising the following steps:
acquiring the receiving address information of all orders in the same time period;
analyzing the association relation among the receiving address information;
grouping the orders according to the association relation among the receiving address information to obtain a plurality of groups of order sets;
and determining, according to the number of orders in each order set, whether the ordering users corresponding to that order set have engaged in order brushing.
2. The receipt address-based bill recognition method according to claim 1, wherein acquiring the receiving address information of all orders in the same time period comprises:
acquiring the related data of the receiving address of each order in the same time period;
judging whether two or more administrative identifiers of the same administrative level exist in the receiving address related data;
and if so, correcting the receiving address related data using an administrative region library to obtain the receiving address information of the order.
3. The receipt address-based bill recognition method according to claim 2, wherein correcting the receiving address related data using an administrative region library to obtain the receiving address information of the order comprises:
searching the administrative region library, and selecting an address matched with the data related to the receiving address as a target address to obtain a plurality of target addresses;
scoring each of the target addresses using the receiving address related data;
and revising the receiving address related data to the target address with the highest score, wherein the revised receiving address related data is the receiving address information of the order.
4. The receipt address-based bill recognition method according to claim 3, wherein scoring each of the target addresses using the receiving address related data comprises:
determining the coincidence score of each target address according to the overlap between the administrative identifiers in the target address and those in the receiving address related data;
determining the accuracy score of the target address according to the character intervals, within the receiving address related data, between the different administrative identifiers of the target address;
and taking the sum of the coincidence score and the accuracy score of the target address as the score of the target address.
5. The receipt address-based bill recognition method according to claim 1, wherein analyzing the association relationships among the pieces of receiving address information comprises:
calculating the similarity between every two receiving address information;
and carrying out hierarchical clustering on the receiving address information according to the size of each similarity to obtain a clustering result, wherein the clustering result comprises a plurality of clustering sets, each clustering set comprises more than one receiving address information, and the receiving address information in the same clustering set is the receiving address information with an association relation.
6. The receipt address-based bill recognition method according to claim 5, wherein calculating the similarity between every two pieces of receiving address information comprises:
calculating the similarity between every two pieces of receiving address information using a common-character matching technique;
or,
calculating the similarity between every two pieces of receiving address information using an edit distance technique;
or,
calculating the similarity between every two pieces of receiving address information using a Hamming distance measurement technique.
7. The receipt address-based bill recognition method according to claim 1, wherein analyzing the association relationships among the pieces of receiving address information comprises:
combining the receiving address information according to the ordering time to obtain a plurality of address combinations;
counting the total number of address combinations;
segmenting the receiving address information corresponding to each address combination into words, so that a plurality of words corresponding to each address combination are obtained;
counting the number of occurrences of each word of the address combination within that address combination;
calculating the number of address combinations containing the word;
calculating a Tfidf value of the word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations;
and taking the word with the highest Tfidf value in each address combination as the secret number of the address combination, wherein the receiving address information in the address combination that contains the secret number is receiving address information having an association relationship.
8. The receipt address-based bill recognition method according to claim 7, wherein calculating the Tfidf value of the word in the address combination according to the number of address combinations containing the word, the number of occurrences of the word in the address combination, and the total number of address combinations comprises:
calculating an inverse document coefficient of the word, wherein the inverse document coefficient is the natural logarithm of a target quotient, and the target quotient is the total number of address combinations divided by the number of address combinations containing the word;
and taking the product of the inverse document coefficient and the number of occurrences of the word as the Tfidf value of the word in the address combination.
9. A receipt address-based bill recognition device, comprising:
the acquisition unit is used for acquiring the receiving address information of all orders in the same time period;
the analysis unit is used for analyzing the association relation among the receiving address information;
the grouping unit is used for grouping the orders according to the association relation among the receiving address information to obtain a plurality of groups of order sets;
and the determining unit is used for determining, according to the number of orders in each order set, whether the ordering users corresponding to that order set have engaged in order brushing.
10. Receipt address-based bill recognition equipment, comprising a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the receipt address-based bill recognition method according to any one of claims 1 to 8.
CN202310947935.XA 2023-07-31 2023-07-31 Receipt address-based bill recognition method, device and equipment Pending CN116977024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310947935.XA CN116977024A (en) 2023-07-31 2023-07-31 Receipt address-based bill recognition method, device and equipment

Publications (1)

Publication Number Publication Date
CN116977024A true CN116977024A (en) 2023-10-31

Family

ID=88470906

Country Status (1)

Country Link
CN (1) CN116977024A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination