CN115659953A - Address similarity real-time calculation method, system, storage medium and computer equipment - Google Patents


Info

Publication number
CN115659953A
Authority
CN
China
Prior art keywords
address
similarity
real
time
platform
Legal status
Pending
Application number
CN202211275023.4A
Other languages
Chinese (zh)
Inventor
崔一凡
Current Assignee
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Application filed by Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202211275023.4A
Publication of CN115659953A


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system, a storage medium and computer equipment for calculating address similarity in real time, relating to the technical field of artificial intelligence. The method comprises the following steps: acquiring the recipient address of a real-time shopping order of an online shopping platform; performing word segmentation on the recipient address to obtain an address word vector; and calculating the similarity between recipient addresses based on the address word vectors, so as to judge from the similarity whether to intercept the real-time shopping order. The method, system, storage medium and computer equipment can perform similarity calculation on the address data of shopping orders acquired in real time, effectively reducing the risk to the online shopping platform.

Description

Address similarity real-time calculation method, system, storage medium and computer equipment
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method, a system, a storage medium and computer equipment for calculating address similarity in real time.
Background
With the rapid spread of the internet and the vigorous growth of e-commerce, countless online shopping platforms compete for consumers' attention. To expand their market share, large online shopping platforms commonly run coupon and subsidy campaigns. As organized coupon-abuse groups (the so-called "wool party") continue to grow, coupons that a platform issues to ordinary consumers can be claimed and redeemed by these groups, causing financial losses to the platform.
Because orders placed by wool-party members tend to use highly similar shipping addresses, such members can be identified by calculating the similarity of users' order addresses. The prior art generally calculates address similarity on T+1 offline data, which has the following disadvantages:
(1) Low timeliness: a day's campaign data is only available on the next day, which cannot meet the practical needs of campaign-effect monitoring and risk-control early warning;
(2) High risk: risks that have already materialized cannot be detected in time, and after-the-fact interception not only causes a large number of customer complaints but also damages the platform's reputation and increases the burden on customer service.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present invention is to provide a method, a system, a storage medium and a computer device for calculating address similarity in real time, which can perform similarity calculation on the address data of shopping orders collected in real time and effectively reduce the risks faced by an online shopping platform.
To achieve the above and other related objects, the present invention provides a method for calculating address similarity in real time, applied to a third-party service platform and comprising the following steps: acquiring the recipient address of a real-time shopping order of an online shopping platform, the online shopping platform comprising an application and data integration platform, a stream processing platform and a database; performing word segmentation on the recipient address to obtain an address word vector; and calculating the similarity between recipient addresses based on the address word vectors, so as to judge from the similarity whether to intercept the real-time shopping order.
The invention provides an address similarity real-time computing system, which is applied to a third-party service platform and comprises:
an address acquisition module, configured to acquire the recipient address of a real-time shopping order of the online shopping platform, the online shopping platform comprising an application and data integration platform, a stream processing platform and a database;
a vector acquisition module, configured to perform word segmentation on the recipient address to obtain an address word vector;
and a computing module, configured to calculate the similarity between recipient addresses based on the address word vectors, so as to judge from the similarity whether to intercept the real-time shopping order.
The invention provides computer equipment comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above address similarity real-time calculation method when executing the computer program.
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described address similarity real-time calculation method.
As described above, the address similarity real-time calculation method, system, storage medium and computer device of the present invention have the following advantages:
(1) The similarity of online-shopping recipient addresses is calculated from address data collected in real time, so that orders with many similar recipient addresses in the same area are intercepted in real time, effectively reducing the risk to the online shopping platform;
(2) The address concentration that occurs when the wool party places orders in bulk can be effectively intercepted, preventing subsequent losses of platform resources caused by the wool party's order brushing and coupon farming;
(3) By calculating the similarity of the recipient addresses of real-time shopping orders, judging the validity of those orders from the similarity and intercepting malicious orders in time, the customer complaint rate is effectively reduced, pressure on customer service is relieved, and the operating costs of the online shopping platform are saved.
Drawings
FIG. 1 is a flowchart illustrating a method for real-time address similarity calculation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a real-time address similarity calculation method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for address similarity calculation according to another embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an embodiment of a real-time address similarity computing system according to the present invention;
FIG. 5 is a schematic diagram of a computer apparatus according to an embodiment of the present invention.
Description of the element reference numerals
41. Address acquisition module
42. Vector acquisition module
43. Computing module
51. Processor/processing unit
52. Memory device
521. Random access memory
522. Cache memory
523. Storage system
524. Program/utility
5241. Program module
53. Bus
54. Input/output interface
55. Network adapter
Detailed Description
The following embodiments of the present invention are provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
It should be noted that the drawings provided in this embodiment only illustrate the basic idea of the present invention schematically. They show only the components related to the invention rather than being drawn according to the number, shape and size of the components in an actual implementation; in an actual implementation, the type, quantity and proportion of the components may vary freely, and their layout may be more complex.
In the method, system, storage medium and computer equipment for calculating address similarity in real time, the Flink framework is adopted and similarity is calculated from the address data of shopping orders collected in real time, so that orders with many similar recipient addresses in the same area are intercepted in real time, preventing subsequent losses of platform resources caused by the wool party's order brushing and coupon farming.
As shown in fig. 1 and fig. 2, in an embodiment, the address similarity real-time calculation method of the present invention is applied to a third-party service platform, and includes the following steps:
S1. Acquire the recipient address of a real-time shopping order of an online shopping platform, wherein the online shopping platform comprises an application and data integration platform, a stream processing platform and a database.
Specifically, in the invention, the recipient address of each real-time shopping order generated by the online shopping platform is stored in a data processing platform in real time, so the recipient address can be obtained in real time by monitoring that platform.
In an embodiment of the present invention, the recipient address of the real-time shopping order is acquired in one or more of the following ways:
11) The first recipient address is stored in the application and data integration platform in real time, and the messages of that platform are monitored to acquire the first recipient address from it (the ROMA platform).
Specifically, the application and data integration platform is the ROMA platform. ROMA is an application and data integration platform focused on connecting applications and data. It provides data, API (application programming interface), message and device integration capabilities, can quickly connect on-cloud and off-cloud systems, bridges digital gaps and supports digital transformation. It mainly comprises four components: data integration (FDI), service integration (APIC), message integration (MQS) and device integration (LINK). Data integration supports flexible, fast and non-intrusive integration among diverse data sources (text, messages, APIs, relational data, non-relational data and so on), enables cross-machine-room, cross-data-center and cross-cloud integration schemes, and lets users implement, operate, maintain and monitor the integrated data themselves. In the invention, the recipient address is stored in the ROMA platform in real time and can be acquired from it by monitoring the ROMA platform's messages.
12) The second recipient address is stored in the stream processing platform in real time, and the messages of the stream processing platform are monitored to acquire the second recipient address from it.
Specifically, the stream processing platform is the Kafka stream processing platform. Kafka is a distributed, partitioned, multi-replica messaging system coordinated by ZooKeeper. Its main strength is processing large volumes of data in real time to serve a variety of scenarios, such as Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark stream processing engines, web/nginx logs, access logs and message services. In the invention, the recipient address is stored in the Kafka platform in real time and can be acquired from the Kafka platform by monitoring its messages.
13) The third recipient address is stored in the database in real time, and the database is monitored to acquire the third recipient address from it.
Specifically, the database is a MySQL database. MySQL is a relational database management system that uses the standard SQL language, runs on many operating systems and supports many programming languages, such as C, C++, Python, Java, Perl, PHP, Eiffel, Ruby and Tcl. MySQL stores data in separate tables rather than in one large repository, which increases speed and flexibility. In the invention, the recipient address is stored in the MySQL database in real time and can be acquired from the MySQL database by monitoring it.
The recipient addresses thus include the first recipient address, the second recipient address and the third recipient address.
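As a minimal sketch, assuming each source delivers records that carry an order identifier and the address text, the first, second and third recipient addresses could be merged into a single list as below. The `AddressRecord` type, its field names and the de-duplication by order ID (in case the same order reaches more than one source) are hypothetical; the patent does not describe the record format or the connector APIs of ROMA, Kafka or MySQL.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AddressRecord:
    order_id: str  # hypothetical field: identifier of the shopping order
    address: str   # recipient (shipping) address text
    source: str    # hypothetical label such as "roma", "kafka" or "mysql"


def merge_sources(*sources: Iterable[AddressRecord]) -> List[AddressRecord]:
    """Combine the first, second and third recipient addresses into one list.

    Sources are read in the order given; if the same order_id appears in
    several sources (an assumption), only the first record seen is kept.
    """
    seen = set()
    merged: List[AddressRecord] = []
    for source in sources:
        for record in source:
            if record.order_id not in seen:
                seen.add(record.order_id)
                merged.append(record)
    return merged
```

The order in which sources are passed decides which copy of a duplicated order wins; any other precedence rule would work equally well for the sketch.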
S2. Segment the recipient address into words to obtain an address word vector.
Specifically, to calculate the similarity of recipient addresses, the address word vector of each recipient address must first be extracted. In the invention, the recipient address is segmented with a word segmentation algorithm, and each word obtained from the segmentation is then mapped to a dimension of a vector, generating the corresponding address word vector.
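A bag-of-words encoding is one plausible reading of mapping each word to a dimension of a vector. The sketch below assumes each dimension holds the count of one vocabulary word, with dimensions assigned in first-seen order; the function names and the count weighting are illustrative assumptions, since the patent does not specify a weighting scheme.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocabulary(segmented: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign each distinct word a vector dimension, in first-seen order."""
    vocab: Dict[str, int] = {}
    for words in segmented:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_vector(words: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    """Count how often each vocabulary word occurs in one segmented address.

    Words missing from the vocabulary are ignored.
    """
    vector = [0.0] * len(vocab)
    for word, count in Counter(words).items():
        if word in vocab:
            vector[vocab[word]] = float(count)
    return vector
```

For two addresses segmented as `["广东省", "深圳市", "南山区"]` and `["广东省", "深圳市", "福田区"]`, the shared vocabulary has four dimensions and the vectors become `[1, 1, 1, 0]` and `[1, 1, 0, 1]`.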
As shown in fig. 3, in an embodiment of the present invention, when performing word segmentation on the recipient address, one or more of the following word segmentation algorithms are used:
21) String-matching-based word segmentation algorithm
Specifically, a string-matching-based algorithm matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to a certain strategy; if a substring is found in the dictionary, the match succeeds (a word is recognized). By scanning direction, string-matching segmentation can use forward or reverse matching; by which lengths are matched first, it can use maximum (longest) or minimum (shortest) matching; and depending on whether it is combined with part-of-speech tagging, it can be a pure segmentation method or an integrated method that combines segmentation and tagging. Preferably, the string-matching-based algorithm adopts one or a combination of the forward maximum matching method, the reverse maximum matching method and the minimum segmentation (fewest-words) method. In the invention, substrings of the recipient address are matched against the entries of the machine dictionary according to a preset strategy, and each substring found in the dictionary is taken as a word of the segmented recipient address.
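Forward and reverse maximum matching, two of the strategies named above, can be sketched as follows. The dictionary contents, the `max_len` window of 4 characters and the fallback of emitting an unmatched character as a single-character token are assumptions for illustration, not parameters taken from the patent.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Scan left to right, taking the longest dictionary word at each position.

    A character that starts no dictionary word becomes a one-character token.
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.insert(0, piece)
                j -= size
                break
    return words
```

With a dictionary containing 广东省, 深圳市, 南山区 and 科技园, both functions split 广东省深圳市南山区科技园 into those four words; the two directions only diverge on ambiguous strings, which is why the text allows combining them.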
22) Understanding-based word segmentation algorithm
Specifically, an understanding-based algorithm recognizes words by having the computer simulate human understanding of sentences; the basic idea is to perform syntactic and semantic analysis while segmenting and to resolve ambiguity using syntactic and semantic information. In the invention, the understanding-based algorithm performs syntactic and semantic analysis on the recipient address and segments it using the resulting syntactic and semantic information.
23) Statistics-based word segmentation algorithm
Specifically, a word is a combination of characters, and the more often adjacent characters co-occur in context, the more likely they are to form a word. The frequency or probability with which a character co-occurs with its neighbors therefore reflects how credible the combination is as a word. The statistics-based algorithm counts how often each combination of adjacent characters co-occurs in the corpus and computes their mutual information. The mutual information of two Chinese characters X and Y is defined from their adjacent co-occurrence probability and reflects how closely the two characters are associated; when this closeness exceeds a certain threshold, the character group is considered likely to form a word. In the invention, the statistics-based algorithm counts the combination frequency of adjacent co-occurring characters in the recipient address, and when that frequency exceeds a preset threshold, the adjacent characters are judged to be a word of the segmented recipient address.
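The mutual information mentioned above is commonly defined as $M(X,Y) = \log_2 \frac{P(X,Y)}{P(X)\,P(Y)}$, where $P(X,Y)$ is the probability that characters X and Y appear adjacently and $P(X)$, $P(Y)$ are their individual probabilities. The sketch below estimates it from a small corpus of addresses and keeps the pairs above a threshold; the base-2 logarithm, the maximum-likelihood estimates and the function names are assumptions rather than details taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def adjacent_mutual_information(corpus: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Estimate M(X, Y) = log2(P(XY) / (P(X) * P(Y))) for each adjacent character pair."""
    char_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for text in corpus:
        char_counts.update(text)                 # single-character frequencies
        pair_counts.update(zip(text, text[1:]))  # adjacent co-occurrence counts
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    scores: Dict[Tuple[str, str], float] = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def likely_words(corpus: Iterable[str], threshold: float) -> Set[str]:
    """Return adjacent character pairs whose mutual information exceeds the threshold."""
    scores = adjacent_mutual_information(corpus)
    return {x + y for (x, y), m in scores.items() if m > threshold}
```

On a corpus this small the estimates are noisy; in practice the statistics would be gathered over a large volume of historical addresses.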
S3. Calculate the similarity between two recipient addresses based on the address word vectors, and judge from the similarity whether to intercept the real-time shopping order.
Specifically, for two recipient addresses, the similarity is the cosine of the angle between their word vectors, $\cos(\alpha, \beta) = \frac{\alpha \cdot \beta}{\|\alpha\| \, \|\beta\|}$, where $\alpha$ and $\beta$ denote the word vectors of the two recipient addresses, $\alpha \cdot \beta$ is their dot product and $\|\cdot\|$ denotes the modulus of a vector. This cosine value is taken as the similarity of the two recipient addresses. Note that each recipient address corresponds to one address word vector, formed by concatenating, in order, the sub-vectors corresponding to each segmented word of the address.
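A direct implementation of the cosine similarity described above might look like this. It assumes both address word vectors are dense lists of equal length laid out over a shared set of dimensions, and it returns 0.0 when either vector is all zeros; that convention is an assumption, since the cosine is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """cos(alpha, beta) = (alpha . beta) / (|alpha| * |beta|).

    Returns 0.0 if either vector has zero length (an assumed convention).
    """
    if len(alpha) != len(beta):
        raise ValueError("address word vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(alpha, beta))
    norm_alpha = math.sqrt(sum(a * a for a in alpha))
    norm_beta = math.sqrt(sum(b * b for b in beta))
    if norm_alpha == 0.0 or norm_beta == 0.0:
        return 0.0
    return dot / (norm_alpha * norm_beta)
```

For count vectors `[1, 1, 1, 0]` and `[1, 1, 0, 1]` (two addresses sharing two of three words), the dot product is 2 and each modulus is $\sqrt{3}$, giving a similarity of 2/3.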
Preferably, to ensure the validity of the calculated similarity, two recipient addresses of the same area are selected first, and then the similarity of the two recipient addresses of the same area is calculated.
In an embodiment of the present invention, the method further includes storing the similarity in a target database, such as a Redis database, to facilitate subsequent queries and the analysis of whether the shopping orders corresponding to the similarity are abnormal. For example, when the similarity between two shopping orders is greater than a preset threshold, both orders are judged abnormal, and ordering restrictions can be applied to the shipping addresses and the customers of those orders.
In an embodiment of the present invention, judging from the similarity whether to intercept the real-time shopping order includes the following: if high similarity is calculated among multiple recipient addresses in the same area, the corresponding shopping orders may be bad orders placed by the wool party through order brushing. The corresponding risk-control system or transaction system can then promptly intercept the orders for these recipient addresses, avoiding losses to the online shopping platform. For example, when the pairwise similarity of the recipient addresses in the same area is smaller than a first threshold, the corresponding real-time shopping orders are processed normally; when the pairwise similarity of the recipient addresses in the same area is greater than a second threshold, the corresponding real-time shopping orders are intercepted so that they cannot be submitted.
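One way to turn the two thresholds into per-order decisions is sketched below: orders are grouped by area, every pair within an area is compared, and each order is judged by the highest similarity it reaches against another order in its area. This per-order reading, the tuple layout and the `"unspecified"` label for similarities between the two thresholds are assumptions; the patent does not say how that middle band is handled. Any similarity function, such as cosine similarity, can be passed in.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical layout: (order_id, area, address word vector); order IDs assumed unique.
Order = Tuple[str, str, List[float]]


def decide_orders(orders: Sequence[Order],
                  similarity: Callable[[List[float], List[float]], float],
                  first_threshold: float,
                  second_threshold: float) -> Dict[str, str]:
    """Label each order "process", "intercept" or "unspecified"."""
    by_area: Dict[str, List[Order]] = defaultdict(list)
    for order in orders:
        by_area[order[1]].append(order)

    # Highest similarity each order reaches against any other order in its area;
    # an order that is alone in its area keeps 0.0.
    max_sim: Dict[str, float] = {order_id: 0.0 for order_id, _, _ in orders}
    for group in by_area.values():
        for (id_a, _, vec_a), (id_b, _, vec_b) in combinations(group, 2):
            s = similarity(vec_a, vec_b)
            max_sim[id_a] = max(max_sim[id_a], s)
            max_sim[id_b] = max(max_sim[id_b], s)

    decisions: Dict[str, str] = {}
    for order_id, s in max_sim.items():
        if s > second_threshold:
            decisions[order_id] = "intercept"
        elif s < first_threshold:
            decisions[order_id] = "process"
        else:
            decisions[order_id] = "unspecified"  # band not defined by the source
    return decisions
```

Comparing only within an area keeps the number of pairs manageable and matches the text's preference for comparing addresses of the same area.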
It should be noted that, in an embodiment of the present invention, the address similarity real-time calculation method is implemented with the Flink framework. When executed, a Flink program is mapped to streaming dataflows; each dataflow starts with one or more sources (data input, e.g. a message queue or file system) and ends with one or more sinks (data output, e.g. a message queue, file system or database). Flink can apply any number of transformations to a stream; these transformations form a directed acyclic dataflow graph, allowing applications to branch and merge dataflows. In the invention, the recipient addresses of the real-time shopping orders of the online shopping platform are fed into Flink to form a Flink data stream, and the word vectors of the recipient addresses in that stream are calculated to obtain the similarity between recipient addresses. By adopting the Flink framework, the method can calculate the address similarity of shopping orders in real time, providing a basis for intercepting bad orders promptly.
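The keyed, stateful processing that such a Flink job would perform can be imitated in plain Python, as below. This is not the Flink or PyFlink API: the generator plays the role of a stream keyed by area, the per-area deque stands in for keyed state, and the `history_size` bound on retained addresses is a hypothetical parameter added to keep the state finite.

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, List, Tuple

# Hypothetical event layout: (order_id, area, address word vector).
Event = Tuple[str, str, List[float]]


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyed_similarity_stream(events: Iterable[Event],
                            history_size: int = 1000) -> Iterator[Tuple[str, str, float]]:
    """For each incoming order, emit (order_id, area, best similarity) against
    the most recent earlier orders of the same area, processing events one by one."""
    state: Dict[str, Deque[List[float]]] = defaultdict(lambda: deque(maxlen=history_size))
    for order_id, area, vector in events:
        history = state[area]  # per-key state, analogous to Flink keyed state
        best = max((_cosine(vector, old) for old in history), default=0.0)
        history.append(vector)  # oldest vector is evicted once maxlen is reached
        yield order_id, area, best
```

Because each result depends only on orders already seen, a decision can be made as soon as an order arrives, which is what distinguishes this from the T+1 offline approach.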
The address similarity real-time calculation method of the present invention is further described below by using specific embodiments.
In this embodiment, the online shopping platform is a medical supplies shopping platform, which issues a series of coupons to users to promote purchases. When a user places an order, the platform stores the recipient address of the real-time shopping order in one or more of the ROMA platform, the Kafka stream processing platform and the MySQL database. The Flink platform that calculates address similarity in real time monitors the ROMA platform, the Kafka stream processing platform and the MySQL database and acquires the recipient addresses in real time. The Flink platform also partitions the recipient addresses by region and calculates the pairwise similarity between recipient addresses in the same region.
When the pairwise similarity of the recipient addresses in the same region is smaller than the first threshold, the addresses are not strongly similar; the corresponding real-time shopping orders are valid and can be processed normally.
When the pairwise similarity of the recipient addresses in the same region is greater than the second threshold, the addresses are strongly similar; the corresponding real-time shopping orders may have been placed by the wool party through order brushing and must be intercepted promptly.
As shown in fig. 4, in an embodiment, the address similarity real-time computing system of the present invention is applied to a third-party service platform, and includes:
the address acquisition module 41, configured to acquire the recipient address of a real-time shopping order of an online shopping platform, where the online shopping platform includes an application and data integration platform, a stream processing platform and a database.
Specifically, in the invention, the recipient addresses of the real-time shopping orders generated by the online shopping platform are stored in a data processing platform in real time, so the recipient addresses can be obtained in real time by monitoring that platform.
In an embodiment of the present invention, the recipient address of the real-time shopping order is acquired in one or more of the following ways:
11) The first recipient address is stored in the application and data integration platform in real time, and the messages of that platform are monitored to acquire the first recipient address from it (the ROMA platform).
Specifically, the application and data integration platform is the ROMA platform. ROMA is an application and data integration platform focused on connecting applications and data. It provides data, API (application programming interface), message and device integration capabilities, can quickly connect on-cloud and off-cloud systems, bridges digital gaps and supports digital transformation. It mainly comprises four components: data integration (FDI), service integration (APIC), message integration (MQS) and device integration (LINK). Data integration supports flexible, fast and non-intrusive integration among diverse data sources (text, messages, APIs, relational data, non-relational data and so on), enables cross-machine-room, cross-data-center and cross-cloud integration schemes, and lets users implement, operate, maintain and monitor the integrated data themselves. In the invention, the recipient address is stored in the ROMA platform in real time and can be acquired from it by monitoring the ROMA platform's messages.
12) The second recipient address is stored in the stream processing platform in real time, and the messages of the stream processing platform are monitored to acquire the second recipient address from it.
Specifically, the stream processing platform is the Kafka stream processing platform. Kafka is a distributed, partitioned, multi-replica messaging system coordinated by ZooKeeper. Its main strength is processing large volumes of data in real time to serve a variety of scenarios, such as Hadoop-based batch processing systems, low-latency real-time systems, Storm/Spark stream processing engines, web/nginx logs, access logs and message services. In the invention, the recipient address is stored in the Kafka platform in real time and can be acquired from the Kafka platform by monitoring its messages.
13) The third recipient address is stored in the database in real time, and the database is monitored to acquire the third recipient address from it.
Specifically, the database is a MySQL database. MySQL is a relational database management system that uses the standard SQL language, runs on many operating systems and supports many programming languages, such as C, C++, Python, Java, Perl, PHP, Eiffel, Ruby and Tcl. MySQL stores data in separate tables rather than in one large repository, which increases speed and flexibility. In the invention, the recipient address is stored in the MySQL database in real time and can be acquired from the MySQL database by monitoring it.
The recipient address includes the first recipient address, the second recipient address, and the third recipient address.
The vector acquisition module 42 is connected to the address acquisition module 41 and is configured to segment the recipient address into words to obtain address word vectors.
Specifically, to calculate the similarity of recipient addresses, the address word vector of each recipient address must first be extracted. In the invention, the recipient address is segmented with a word segmentation algorithm, and each word obtained from the segmentation is then mapped to a dimension of a vector, generating the corresponding address word vector.
As shown in fig. 3, in an embodiment of the present invention, when performing word segmentation on the recipient address, one or more of the following word segmentation algorithms are used:
21) String-matching-based word segmentation algorithm
Specifically, a string-matching-based algorithm matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to a certain strategy; if a substring is found in the dictionary, the match succeeds (a word is recognized). By scanning direction, string-matching segmentation can use forward or reverse matching; by which lengths are matched first, it can use maximum (longest) or minimum (shortest) matching; and depending on whether it is combined with part-of-speech tagging, it can be a pure segmentation method or an integrated method that combines segmentation and tagging. Preferably, the string-matching-based algorithm adopts one or a combination of the forward maximum matching method, the reverse maximum matching method and the minimum segmentation (fewest-words) method. In the invention, substrings of the recipient address are matched against the entries of the machine dictionary according to a preset strategy, and each substring found in the dictionary is taken as a word of the segmented recipient address.
22) Understanding-based word segmentation algorithm
Specifically, an understanding-based algorithm recognizes words by having the computer simulate human understanding of sentences; the basic idea is to perform syntactic and semantic analysis while segmenting and to resolve ambiguity using syntactic and semantic information. In the invention, the understanding-based algorithm performs syntactic and semantic analysis on the recipient address and segments it using the resulting syntactic and semantic information.
23) Statistics-based word segmentation algorithm
Specifically, a word is a combination of characters, and the more often adjacent characters co-occur in context, the more likely they are to form a word. The frequency or probability with which a character co-occurs with its neighbors therefore reflects how credible the combination is as a word. The statistics-based algorithm counts how often each combination of adjacent characters co-occurs in the corpus and computes their mutual information. The mutual information of two Chinese characters X and Y is defined from their adjacent co-occurrence probability and reflects how closely the two characters are associated; when this closeness exceeds a certain threshold, the character group is considered likely to form a word. In the invention, the statistics-based algorithm counts the combination frequency of adjacent co-occurring characters in the recipient address, and when that frequency exceeds a preset threshold, the adjacent characters are judged to be a word of the segmented recipient address.
The calculating module 43 is connected to the vector acquiring module 42 and is configured to calculate the similarity between two recipient addresses based on the address word vectors, so as to determine according to the similarity whether to intercept the real-time shopping order.
Specifically, for two recipient addresses, the normalized dot product (cosine similarity) of their word vectors is calculated as $\cos(\alpha, \beta) = \frac{\alpha \cdot \beta}{\|\alpha\|\,\|\beta\|}$, where $\alpha$ and $\beta$ are the word vectors of the two recipient addresses and $\|\cdot\|$ denotes the modulus of a vector. This value is then taken as the similarity of the two recipient addresses. Note that each recipient address corresponds to one address word vector, formed by combining, in order, the sub-vectors corresponding to each word segmented from the recipient address.
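A minimal Python sketch of the cosine formula follows. To guarantee that both vectors have the same dimension, it assumes a bag-of-words representation (one count per distinct segmented word over the union of both addresses' vocabularies) instead of the concatenated sub-vectors described above, whose length may differ between addresses with different word counts. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def bag_of_words(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Count vector of the segmented words, aligned to a shared vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """cos(alpha, beta) = (alpha . beta) / (|alpha| * |beta|); 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(alpha, beta))
    norm = math.sqrt(sum(a * a for a in alpha)) * math.sqrt(sum(b * b for b in beta))
    return dot / norm if norm else 0.0


def address_similarity(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """Similarity of two already-segmented recipient addresses."""
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    return cosine_similarity(bag_of_words(tokens_a, vocabulary),
                             bag_of_words(tokens_b, vocabulary))


if __name__ == "__main__":
    a = ["广东省", "深圳市", "南山区", "科技园"]
    b = ["广东省", "深圳市", "南山区", "粤海街道"]
    print(address_similarity(a, b))  # prints 0.75 (3 shared words / (2 * 2))
```

Dividing by the two norms makes the score independent of address length, so a long and a short address that share most of their words can still score close to 1.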
Preferably, to ensure that the calculated similarity is meaningful, two recipient addresses in the same area are selected first, and the similarity between these two same-area recipient addresses is then calculated.
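The same-area restriction can be sketched as follows: orders are bucketed by an area key before any pair is compared, which also confines the quadratic pairwise comparison to each bucket. The order tuple layout and the use of a single `area` string (for instance a district name) are assumptions, since the text does not say how the area is derived; any pairwise similarity function, such as a cosine measure, can be passed in as `similarity`.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Order = Tuple[str, str, Sequence[str]]  # (order_id, area, segmented address), hypothetical
Similarity = Callable[[Sequence[str], Sequence[str]], float]


def similarities_within_areas(orders: Iterable[Order],
                              similarity: Similarity) -> Dict[Tuple[str, str], float]:
    """Compare recipient addresses pairwise, but only within the same area."""
    by_area: Dict[str, List[Tuple[str, Sequence[str]]]] = defaultdict(list)
    for order_id, area, tokens in orders:
        by_area[area].append((order_id, tokens))
    results: Dict[Tuple[str, str], float] = {}
    for members in by_area.values():
        # combinations() yields each unordered pair exactly once
        for (id_a, tok_a), (id_b, tok_b) in combinations(members, 2):
            results[(id_a, id_b)] = similarity(tok_a, tok_b)
    return results
```

Orders from different areas are never compared, so two similar addresses in different districts do not influence each other's score.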
In an embodiment of the present invention, the address similarity real-time computing system further includes a storage module configured to store the similarity in a target database, such as a Redis database, to facilitate subsequent queries and analysis of whether the shopping orders corresponding to the similarity are abnormal. For example, when the similarity of two shopping orders is greater than a preset threshold, the two shopping orders are determined to be abnormal, and order placement can be restricted for the delivery addresses and the buyers of the abnormal shopping orders.
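One way to lay out the storage step is sketched below, assuming the redis-py client, whose `Redis.set(name, value, ex=...)` and `Redis.get(name)` methods are used. The key prefix `addr_sim`, the one-day expiry, and the helper names are hypothetical choices rather than part of the patent; an in-memory stand-in class is included so the sketch runs without a Redis server.

```python
from typing import Dict, Optional, Union

KEY_PREFIX = "addr_sim"       # hypothetical key namespace
DEFAULT_TTL_SECONDS = 86_400  # hypothetical: keep pair scores for one day


def similarity_key(order_a: str, order_b: str) -> str:
    """Order-independent key, so (a, b) and (b, a) map to the same entry."""
    first, second = sorted((order_a, order_b))
    return f"{KEY_PREFIX}:{first}:{second}"


def store_similarity(client, order_a: str, order_b: str, similarity: float,
                     ttl_seconds: int = DEFAULT_TTL_SECONDS) -> None:
    """Write the pair's similarity with an expiry (redis-py: set(name, value, ex=...))."""
    client.set(similarity_key(order_a, order_b), f"{similarity:.6f}", ex=ttl_seconds)


def is_abnormal_pair(client, order_a: str, order_b: str, threshold: float) -> bool:
    """Flag the pair as abnormal when its stored similarity exceeds the threshold."""
    raw: Optional[Union[bytes, str]] = client.get(similarity_key(order_a, order_b))
    if raw is None:
        return False  # never compared, or the entry has expired
    if isinstance(raw, bytes):
        raw = raw.decode()  # redis-py returns bytes unless decode_responses=True
    return float(raw) > threshold


class InMemoryClient:
    """Minimal stand-in for redis.Redis in tests; ignores expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        self._data[name] = str(value).encode()
        return True

    def get(self, name: str) -> Optional[bytes]:
        return self._data.get(name)
```

With a real server, a `redis.Redis(host="localhost", port=6379)` instance would be passed as `client` instead of `InMemoryClient()`; the expiry keeps stale pair scores from accumulating.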
In an embodiment of the present invention, determining whether to intercept the real-time shopping order according to the similarity works as follows. If multiple recipient addresses in the same area are found to be highly similar, the corresponding shopping orders may be bad orders generated by the "wool party" (users who exploit platform promotions) through order brushing. The corresponding control module can then intercept the orders for these recipient addresses in time, avoiding losses to the online shopping platform. For example, when the pairwise similarity of the recipient addresses in the same area is smaller than a first threshold, the corresponding real-time shopping orders are processed normally; when the pairwise similarity of the recipient addresses in the same area is greater than a second threshold, the corresponding real-time shopping orders are intercepted so that they cannot be submitted.
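The two-threshold rule can be expressed as a small decision function, sketched below under assumptions. It reads "pairwise similarity" as a condition that must hold for every pair in the area, and because the text leaves the band between the two thresholds unspecified, it returns a hypothetical `REVIEW` outcome there. The names and that third outcome are illustrative only.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    PROCESS = "process normally"
    INTERCEPT = "intercept"
    REVIEW = "review"  # hypothetical: the text does not say what happens between thresholds


def decide(pairwise_similarities: Iterable[float],
           first_threshold: float, second_threshold: float) -> Decision:
    """Two-threshold rule for the orders of one area."""
    if first_threshold > second_threshold:
        raise ValueError("first_threshold must not exceed second_threshold")
    scores = list(pairwise_similarities)
    # An area with a single order has no pairs and is processed normally.
    if all(s < first_threshold for s in scores):
        return Decision.PROCESS      # every pair is dissimilar
    if all(s > second_threshold for s in scores):
        return Decision.INTERCEPT    # every pair is highly similar: block submission
    return Decision.REVIEW
```

Keeping two separate thresholds leaves room for a gray zone, so the platform does not have to choose between blocking and passing orders whose similarity is ambiguous.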
It should be noted that, in an embodiment of the present invention, the address similarity real-time computing system is implemented with the Flink framework. At execution time, a Flink program is mapped to streaming dataflows; each dataflow starts with one or more sources (data inputs, e.g. a message queue or file system) and ends with one or more sinks (data outputs, e.g. a message queue, file system, or database). Flink can apply any number of transformations to the streams; these are arranged into a directed acyclic dataflow graph, which allows an application to branch and merge dataflows. In the invention, the recipient addresses of real-time shopping orders on the online shopping platform are fed into the Flink framework to form a Flink data stream, and the normalized dot product of the recipient addresses' word vectors is calculated on that stream to obtain the similarity between recipient addresses. Using the Flink framework thus enables real-time calculation of the address similarity of shopping orders and provides a basis for intercepting bad orders in time.
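The streaming computation can be illustrated without a Flink cluster by a plain-Python stand-in for a keyed operator. It mimics what a Flink job would typically do with `keyBy` on the area followed by a `KeyedProcessFunction` holding per-key state: each incoming order is compared only with recent orders from the same area, and a pair is emitted as an alert when its similarity exceeds a threshold. The bounded per-area window, the threshold, and all names are assumptions for illustration; this is neither the Flink API nor the patent's implementation.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List, Sequence, Tuple

Order = Tuple[str, str, Sequence[str]]  # (order_id, area, segmented address), hypothetical
Alert = Tuple[str, str, float]          # (new order, earlier order, similarity)


class AreaSimilarityOperator:
    """Keeps a bounded window of recent orders per area (the 'keyed state')."""

    def __init__(self, similarity: Callable[[Sequence[str], Sequence[str]], float],
                 threshold: float, window_size: int = 100) -> None:
        self.similarity = similarity
        self.threshold = threshold
        self.state: Dict[str, Deque[Tuple[str, Sequence[str]]]] = defaultdict(
            lambda: deque(maxlen=window_size))

    def process(self, order: Order) -> List[Alert]:
        order_id, area, tokens = order
        alerts: List[Alert] = []
        for prev_id, prev_tokens in self.state[area]:
            score = self.similarity(tokens, prev_tokens)
            if score > self.threshold:
                alerts.append((order_id, prev_id, score))
        self.state[area].append((order_id, tokens))  # oldest entry drops out when full
        return alerts


def run(stream: Iterable[Order], operator: AreaSimilarityOperator) -> Iterator[Alert]:
    """Feed orders one at a time, as a source would, and yield alerts as a sink would."""
    for order in stream:
        yield from operator.process(order)
```

Bounding the per-area state keeps memory and per-order work constant on an unbounded stream; the trade-off is that an order is only compared against the most recent `window_size` orders from its area.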
It should be noted that the division of the above apparatus into modules is only a logical division; in an actual implementation, the modules may be wholly or partly integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the x module may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus as program code that a processing element of the apparatus calls to execute the function of the x module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be carried out by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when one of the above modules is implemented as a processing element scheduling program code, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As a further example, these modules may be integrated together and implemented as a system-on-a-chip (SOC).
The computer-readable storage medium of the present invention stores a computer program which, when executed by a processor, implements the steps of the above address similarity real-time calculation method. Preferably, the storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.
Any combination of one or more storage media may be employed. The storage medium may be a computer-readable signal medium or a computer-readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the computer program instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
In an embodiment, the computer device of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the real-time address similarity calculation method when executing the computer program.
The memory includes various media that can store program code, such as ROM, RAM, a magnetic disk, a USB flash drive, a memory card, or an optical disk.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
As shown in FIG. 5, the computer device of the present invention takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors or processing units 51, a memory 52, and a bus 53 that couples the various system components, including the memory 52 and the processors or processing units 51.
Bus 53 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device typically includes a variety of computer system readable media. Such media may be any available media that is accessible by a computing device and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 52 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 521 and/or cache memory 522. The computer device may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 523 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 53 by one or more data media interfaces. Memory 52 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 524 having a set (at least one) of program modules 5241 may be stored, for example, in the memory 52. Such program modules 5241 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 5241 generally perform the functions and/or methods of the embodiments described in the invention.
The computing device may also communicate with one or more external devices (e.g., keyboard, pointing device, display, etc.), one or more devices that enable a user to interact with the computing device, and/or any devices (e.g., network card, modem, etc.) that enable the computing device to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 54. Also, the computer device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) through the network adapter 55. As shown in FIG. 5, the network adapter 55 communicates with the other modules of the computer device via the bus 53. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
In summary, the address similarity real-time calculation method, system, storage medium and computer device of the present invention calculate the similarity of online shopping recipient addresses from address data acquired in real time, so that orders whose recipient addresses within the same area are highly similar can be intercepted in real time, effectively reducing the risk to the online shopping platform. They effectively intercept the address concentration that occurs when the "wool party" places orders, preventing the platform's subsequent material losses from order brushing and promotion abuse. By calculating the similarity of the recipient addresses of real-time shopping orders, judging the validity of those orders according to the similarity, and intercepting malicious orders in time, the invention effectively reduces the customer complaint rate, relieves pressure on customer service, and saves operating costs for the online shopping platform. The invention therefore effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Those skilled in the art can modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical idea disclosed by the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A real-time address similarity calculation method is applied to a third-party service platform and is characterized by comprising the following steps:
acquiring the recipient address of a real-time shopping order of an online shopping platform, the online shopping platform comprising an application and data integration platform, a stream processing platform and a database;
segmenting the recipient address into words to obtain an address word vector;
and calculating the similarity between recipient addresses based on the address word vectors, and judging according to the similarity whether to intercept the real-time shopping order.
2. The method for calculating the address similarity in real time according to claim 1, wherein the recipient address of the online shopping platform is obtained in one or more of the following ways:
monitoring messages of the application and data integration platform to acquire a first recipient address from the application and data integration platform, wherein the first recipient address is stored to the application and data integration platform in real time;
monitoring messages of the stream processing platform to acquire a second recipient address from the stream processing platform, wherein the second recipient address is stored to the stream processing platform in real time;
monitoring messages of the database to acquire a third recipient address from the database, wherein the third recipient address is stored to the database in real time;
the recipient address includes the first recipient address, the second recipient address, and the third recipient address.
3. The method for calculating the address similarity in real time according to claim 1, wherein the recipient address is segmented by one or more of the following algorithms:
matching the recipient address against words in a machine dictionary according to a preset strategy, wherein if a word matching any string of the recipient address is found in the machine dictionary, that word is a word obtained by segmenting the recipient address;
performing syntactic and semantic analysis on the recipient address, and segmenting the recipient address using the syntactic information and semantic information;
and counting the combination frequency of adjacent co-occurring characters in the recipient address, and when the combination frequency of adjacent co-occurring characters is higher than a preset threshold, judging those adjacent co-occurring characters to be a word obtained by segmenting the recipient address.
4. The method for calculating the similarity of the addresses in real time according to claim 1, wherein the calculating the similarity of the two recipient addresses based on the address word vector comprises the following steps:
calculating the normalized dot product of the word vectors of the two recipient addresses, $\cos(\alpha, \beta) = \frac{\alpha \cdot \beta}{\|\alpha\|\,\|\beta\|}$, wherein $\alpha$ and $\beta$ represent the word vectors of the two recipient addresses and $\|\cdot\|$ represents the modulus of a vector;
and taking this value as the similarity.
5. The method according to claim 1, further comprising storing the similarity in a Redis database, querying the similarity in the Redis database, and analyzing whether the shopping order corresponding to the similarity is an abnormal order.
6. The method for calculating the address similarity in real time according to claim 1, wherein judging whether to intercept the real-time shopping order according to the similarity comprises: when the similarity of every pair of recipient addresses in the same area is smaller than a first threshold, processing the corresponding real-time shopping orders normally; and when the similarity of every pair of recipient addresses in the same area is greater than a second threshold, intercepting the corresponding real-time shopping orders so that they cannot be submitted.
7. The real-time address similarity calculation method according to claim 1, wherein the method is implemented with the Flink framework: the recipient addresses of real-time shopping orders of the online shopping platform are input into the Flink framework to form a Flink data stream, and, for the Flink data stream, the normalized dot product of the word vectors of the recipient addresses is calculated to obtain the similarity between the recipient addresses.
8. An address similarity real-time computing system applied to a third-party service platform, characterized by comprising:
an address acquisition module, configured to acquire the recipient address of a real-time shopping order of an online shopping platform, the online shopping platform comprising an application and data integration platform, a stream processing platform and a database;
a vector acquisition module, configured to segment the recipient address into words to obtain an address word vector;
and a calculating module, configured to calculate the similarity between recipient addresses based on the address word vectors, so as to judge according to the similarity whether to intercept the real-time shopping order.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the real-time address similarity calculation method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the address similarity real-time calculation method according to any one of claims 1 to 7.
CN202211275023.4A 2022-10-18 2022-10-18 Address similarity real-time calculation method, system, storage medium and computer equipment Pending CN115659953A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211275023.4A CN115659953A (en) 2022-10-18 2022-10-18 Address similarity real-time calculation method, system, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211275023.4A CN115659953A (en) 2022-10-18 2022-10-18 Address similarity real-time calculation method, system, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN115659953A true CN115659953A (en) 2023-01-31

Family

ID=84989597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211275023.4A Pending CN115659953A (en) 2022-10-18 2022-10-18 Address similarity real-time calculation method, system, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN115659953A (en)

Similar Documents

Publication Publication Date Title
US12039447B2 (en) Information processing method and terminal, and computer storage medium
CN109582861B (en) Data privacy information detection system
US11037356B2 (en) System and method for executing non-graphical algorithms on a GPU (graphics processing unit)
WO2021139343A1 (en) Data analysis method and apparatus based on natural language processing, and computer device
CN115795000A (en) Joint similarity algorithm comparison-based enclosure identification method and device
CN111651552A (en) Structured information determination method and device and electronic equipment
CN115099239A (en) Resource identification method, device, equipment and storage medium
CN110309293A (en) Text recommended method and device
CN117974152A (en) Customer complaint data analysis method and device, storage medium and electronic equipment
US11803796B2 (en) System, method, electronic device, and storage medium for identifying risk event based on social information
CN115248890A (en) User interest portrait generation method and device, electronic equipment and storage medium
CN115659953A (en) Address similarity real-time calculation method, system, storage medium and computer equipment
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN113706207B (en) Order success rate analysis method, device, equipment and medium based on semantic analysis
CN111209391A (en) Information identification model establishing method and system and interception method and system
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN114254650A (en) Information processing method, device, equipment and medium
CN114118937A (en) Information recommendation method and device based on task, electronic equipment and storage medium
CN116244740B (en) Log desensitization method and device, electronic equipment and storage medium
CN113190506B (en) Object attribute preservation method and device
CN113704405B (en) Quality inspection scoring method, device, equipment and storage medium based on recorded content
CN114612104B (en) Risk identification method and device and electronic equipment
US20240311568A1 (en) Entity relation mining method and apparatus, electronic device, and storage medium
CN107870679B (en) Polyphone processing method and system
CN114048056A (en) Root cause positioning method, apparatus, device, medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination