CN112199573B - Illegal transaction active detection method and system - Google Patents
- Publication number
- CN112199573B CN202010776643.0A
- Authority
- CN
- China
- Prior art keywords
- illegal
- website
- transaction
- template
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The application provides an illegal transaction active detection method and system. The method comprises the following steps: screening out illegal websites; matching templates by calculating the similarity between the text information of each illegal website and pre-marked templates; selecting the program script of the matched template to perform simulated registration, login and transaction channel detection on the illegal website; and extracting the relevant information of the transaction order returned by the illegal website through text analysis and mining and/or image recognition analysis, to serve as a basis for judging whether the transaction behavior is legal or illegal. The application can effectively identify and give early warning of transaction risks in daily monitoring, enabling rapid early warning, early discovery and timely disposal of illegal criminal platforms, preventing losses from growing further, and improving the overall level of financial risk prevention and control.
Description
Technical Field
The application relates to the field of financial risk control, in particular to an illegal transaction active detection method and system.
Background
At present, among common payment risks, payment channels are often exploited by illegal websites and applications to lure users into illegal transactions and reap illicit profits, causing economic losses to users.
Therefore, how to actively detect illegal transaction platforms, effectively identify and give early warning of transaction risks in daily monitoring, achieve early identification, early warning and early disposal of risks, and improve the overall level of financial risk prevention and control is a problem that urgently needs to be solved.
Disclosure of Invention
The application aims to provide an illegal transaction active detection method and system that solve the problems described in the Background.
In order to achieve the above purpose, the present application adopts the following technical scheme:
The first aspect of the application provides an illegal transaction active detection method, which comprises the following steps:
screening out illegal websites through manually entered website URLs and keywords or through the results of search engine queries, and storing the website information of the illegal websites in a database;
calculating the similarity between the text information of the illegal website and pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code;
if the calculated similarity is greater than a predetermined threshold, classifying the illegal website as a new class, marking it with a new template code, and storing it in the database as a newly added template;
if the calculated similarity is less than or equal to the predetermined threshold, performing simulated registration, login and transaction channel detection on the illegal website using the active detection software developed for the pre-marked template, extracting the relevant information of the transaction order returned by the illegal website through text analysis and mining and/or image recognition analysis, and storing it in the database as a basis for judging whether the transaction behavior is legal or illegal.
Preferably, screening out illegal websites through the results of search engine queries includes:
searching by keyword through a search engine to obtain the URLs of corresponding suspected websites;
performing a keyword audit on the source code content of each web page;
treating each website that passes the audit as an illegal website and recording it in the database.
Preferably, the illegal transaction active detection method further comprises the following step: after simulated registration on the illegal website, storing the corresponding website URL and registered user information (virtual data) in the database for backup.
Preferably, the website information of the illegal websites screened out through manually entered website URLs and keywords or through the results of search engine queries comprises at least one of the following: website name, website URL, website validity (whether or not it can be opened), text information of the website, website snapshot picture URL, website mark template code, website creation time, website update time.
Preferably, the similarity is the Hamming distance between the SimHash value of the web page source code of the illegal website and the SimHash value of a pre-marked template (a smaller distance means the two are more alike).
Preferably, the predetermined threshold is an empirical value, preferably 15.
Preferably, the related information of the transaction order includes at least one of the following: order number, order screenshot, bank of the transaction, transaction time, website URL, transaction amount, payee account information.
Preferably, the illegal transaction active detection method further comprises the following steps: supporting distributed task distribution processing by adopting a browser Docker cluster and using Selenium Grid to implement page rendering and simulated operation, wherein the Selenium Hub is called uniformly to distribute tasks to at least one Node proxy node registered on the Selenium Hub, and the Node proxy nodes request the illegal websites, complete the simulated registration, login and transaction actions, and receive the transaction orders returned by the illegal websites.
Preferably, the illegal transaction active detection method further comprises the following step: dynamically configuring the IP address of the simulated user that performs simulated registration on the illegal website.
Preferably, when simulated registration, login and transaction channel detection are performed on the illegal website, the illegal transaction active detection method further comprises the following step: generating monitoring log information and storing it in the database.
A second aspect of the present application provides an active detection system for illegal transactions, comprising:
a database storing a website base data table, an order monitoring result table, and a simulated registration login information table; wherein,
the website base data table is used for storing the website information of illegal websites screened out through manually entered website URLs and keywords or through the results of search engine queries;
the order monitoring result table is used for storing the related information of transaction orders obtained by actively detecting the illegal websites;
the simulated registration login information table is used for storing the website URLs and registered user information (virtual data) corresponding to simulated registration and login on the illegal websites;
the illegal transaction platform positioning module is used for screening out illegal websites through manually entered website URLs and keywords or through the results of search engine queries;
the marking template module is used for calculating the similarity between the text information of the illegal website and pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code; if the calculated similarity is greater than a predetermined threshold, the illegal website is classified as a new class, marked with a new template code, and stored in the website base data table as a newly added template;
the distributed task distribution module is used for performing simulated registration, login and transaction channel detection, using the active detection software developed for the pre-marked template, on illegal websites whose similarity calculated by the marking template module is less than or equal to the predetermined threshold, and for receiving the transaction orders returned by the illegal websites;
the text analysis module is used for performing text analysis and mining on the text information in the transaction orders received by the distributed task distribution module and storing the results in the order monitoring result table;
the image recognition analysis module is used for performing image recognition analysis on the image information in the transaction orders received by the distributed task distribution module and storing the results in the order monitoring result table.
Preferably, the website information stored in the website base data table includes at least one of the following: website name, website URL, website validity (whether or not it can be opened), text information of website, website snapshot picture URL, website mark template code, website creation time, website update time.
Preferably, the related information of the transaction order stored in the order monitoring result table includes at least one of the following: order number, order screenshot, bank of the transaction, transaction time, website URL, transaction amount, payee account information.
Preferably, the marking template module includes:
an extraction unit for extracting the style fingerprint of the illegal website;
a calculation unit for calculating the similarity between the style fingerprint of the illegal website extracted by the extraction unit and the pre-marked template;
and a determining unit for determining, when the similarity calculated by the calculation unit is less than or equal to the predetermined threshold, that the active detection software developed for the pre-marked template can be used to perform simulated registration, login and transaction channel detection on the illegal website.
More preferably, the similarity calculated by the calculation unit is the Hamming distance between the SimHash value of the web page source code of the illegal website and the SimHash value of a pre-marked template; that is, the style fingerprint extracted by the extraction unit is the SimHash value of the web page source code of the illegal website.
More preferably, the predetermined threshold is an empirical value, preferably 15.
Preferably, the distributed task distribution module supports distributed task distribution processing by adopting a browser Docker cluster and using Selenium Grid to implement page rendering and simulated operation, wherein the Selenium Hub is called uniformly to distribute tasks to at least one Node proxy node registered on the Selenium Hub, and the Node proxy nodes request the illegal websites, complete the simulated registration, login and transaction actions, and receive the transaction orders returned by the illegal websites.
Preferably, the database further comprises a monitoring log information table for storing the monitoring log information generated when simulated registration, login and transaction channel detection are performed on the illegal websites.
Compared with the prior art, the technical scheme of the application has the following beneficial effects:
The application provides an illegal transaction active detection method and system. The system takes the website address of an illegal transaction platform as input and outputs the characteristic information of the illegal transaction platform and its illegal transaction order information. By actively detecting illegal websites, illegal transaction risks can be effectively identified and flagged in daily monitoring, so that illegal criminal platforms are discovered early and disposed of in time, losses are kept from growing further, and the overall level of financial risk prevention and control is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a block diagram of an active detection system for illegal transactions of the preferred embodiment;
FIG. 2 is a schematic diagram of a website information list of illegal websites screened out by the illegal transaction platform positioning module according to the preferred embodiment;
FIG. 3 is a schematic flow chart of the SimHash algorithm;
FIG. 4 is an architectural diagram of a distributed task distribution process;
FIG. 5 is a schematic diagram of a Selenium Grid distributed task node;
FIG. 6 is a schematic diagram of the page shown when an illegal website blocks an IP address, according to the preferred embodiment;
FIG. 7 is a flow chart of the illegal transaction active detection system.
Detailed Description
In order to make the objects, technical solutions and effects of the present application clearer and more obvious, the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It is noted that the terms "first," "second," and the like in the description and claims of the present application and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order, and it is to be understood that the data so used may be interchanged where appropriate. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a block diagram of an active detection system for illegal transactions. As shown in fig. 1, the illegal transaction active detection system comprises a database 1, an illegal transaction platform positioning module 2, a marking template module 3, a distributed task distribution module 4, a text analysis module 5 and an image recognition analysis module 6.
1. Database
The database 1 stores a website base data table 101, an order monitoring result table 102, and a simulated registration login information table 103.
The website basic data table 101 is used for storing website information of illegal websites screened out by manually inputting URLs and keywords of websites or by searching results queried by a search engine. The website information stored in the website base data table 101 includes at least one of the following: website name, website URL, website availability (whether or not openable), text information of the website, website snapshot picture URL, website markup template code, website creation time, website update time, as shown in fig. 2.
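As a rough illustration of how the fields listed above might be stored, the following sketch lays out a website base data table in SQLite. The table name, column names and the helper functions `open_database` and `record_website` are assumptions made for this example; the application does not specify a schema or a database product.

```python
import sqlite3

# Hypothetical schema: column names mirror the fields listed for table 101,
# but none of these identifiers come from the application itself.
SCHEMA = """
CREATE TABLE IF NOT EXISTS website_base_data (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    name          TEXT,
    url           TEXT NOT NULL UNIQUE,
    is_available  INTEGER NOT NULL DEFAULT 1,  -- whether the site can be opened
    page_text     TEXT,
    snapshot_url  TEXT,
    template_code TEXT,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_website(conn, name, url, page_text, template_code=None):
    """Insert a screened website, or refresh its row if the URL is already known."""
    conn.execute(
        """
        INSERT INTO website_base_data (name, url, page_text, template_code)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            name = excluded.name,
            page_text = excluded.page_text,
            template_code = excluded.template_code,
            updated_at = CURRENT_TIMESTAMP
        """,
        (name, url, page_text, template_code),
    )
    conn.commit()
```

Keying the table on the URL is a design choice of this sketch: re-screening a known website then updates its row and update time instead of creating a duplicate.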
The order monitoring result table 102 is configured to store information about a transaction order obtained by actively detecting an illegal website, for example, an order number, an order screenshot, a bank of the transaction, a transaction time, a website URL, a transaction amount, account information of a payee, and the like.
The simulated registration login information table 103 is used for storing the website URLs and registered user information (virtual data) corresponding to simulated registration and login on illegal websites, so that this information can be read and reused on the next visit.
Preferably, the database 1 further stores a monitoring log information table 104 for storing the monitoring log information generated when simulated registration, login and transaction channel detection are performed on illegal websites.
2. Illegal transaction platform positioning module
The illegal transaction platform positioning module 2 is used for screening out illegal websites through manually entered website URLs and keywords or through the results of search engine queries.
Taking the search for and location of a certain type of illegal website as an example, there are two main ways: manual entry and search engine query.
Manual entry: the URL and keywords of the illegal website are entered manually. The keywords are preset manually.
Search engine query: illegal websites have certain characteristics, such as frequently changing URLs, multiple domain names resolving to the same website, and unstable accessibility of the website itself. The human resources available for supplying suspected websites are generally limited, so the illegal transaction platform positioning module is needed to discover suspected websites automatically and monitor them in real time, that is, to query through a search engine. Search engine queries require keywords, so an administrator of the system configures the keywords and their variant forms. The search engine is then queried with these keywords to obtain the URLs of suspected illegal websites, a keyword audit is performed on the source code content of each web page, and each website that passes the audit is treated as an illegal website and entered into the website base data table 101 of the database 1.
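The keyword audit step can be pictured with a minimal sketch such as the one below. It assumes the audit is a plain case-insensitive substring match of the administrator-configured keywords and variants against the page source; the function name `audit_page_source` and the `min_hits` parameter are hypothetical, and the application does not say how the audit is scored.

```python
def audit_page_source(page_source, keywords, min_hits=1):
    """Return (matched keywords, passed) for one page.

    Assumption: a page passes the audit when at least `min_hits` of the
    configured keywords (or their variants) occur in its source code.
    """
    text = page_source.lower()
    hits = [keyword for keyword in keywords if keyword.lower() in text]
    return hits, len(hits) >= min_hits
```

A page that passes would then be recorded in the website base data table; raising `min_hits` trades recall for fewer false positives.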
3. Marking template module
The number of illegal transaction platforms is huge, possibly hundreds or thousands, and the speed at which the system can be developed and maintained lags far behind the speed at which illegal websites appear and change, which increases the burden on the system's developers. To resolve this conflict, a distributed system based on marked templates is needed, which automatically selects an action instance according to the web page information of the accessed illegal transaction platform to complete the monitoring task.
The marking template module 3 is configured to calculate the similarity between the text information of the illegal website and pre-marked templates, where the pre-marked templates are different preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code. If the calculated similarity is greater than a predetermined threshold, the illegal website is classified as a new class, marked with a new template code, and stored in the website base data table 101 as a newly added template.
Common algorithms for calculating text similarity include the SimHash algorithm, machine learning clustering algorithms, reverse construction of XPath from the DOM tree, and an improved K-means clustering method based on SimHash. Of course, the method for classifying websites by calculating similarity is not limited to these; any algorithm that can classify websites by calculating similarity falls within the protection scope of the present application.
The similarity calculation is described below taking the SimHash algorithm as an example.
SimHash is a commonly used hashing method for web page deduplication. It is fast, and the similarity between documents is compared by the Hamming distance between their SimHash values. The flow of the SimHash algorithm is shown in FIG. 3 and proceeds as follows:
First, keywords are extracted from the document Doc (including word segmentation and weight calculation), yielding the n (keyword, weight) pairs shown in the figure. These are denoted feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n) and n is a natural number greater than 1.
Next, each keyword is hashed: hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], which produces the (hash, weight) pairs in the figure. The hash is assumed here to produce bits_count = 6 bits (see FIG. 3).
Then hash_weight_pairs is accumulated bit by bit: for each bit position, the weight is added if the bit is 1 and subtracted if the bit is 0. This produces bits_count numbers, for example [13, 108, -22, -5, -32, 55]; the actual values depend on the algorithm used by the hash function.
Finally, each positive number is mapped to 1 and each negative number to 0, so [13, 108, -22, -5, -32, 55] is converted into the binary string 110001, which is the SimHash value of the document Doc.
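The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the application: features are taken to be lowercase word tokens weighted by their frequency, the per-feature hash is the low bits of SHA-256, and a column total of exactly zero is mapped to 0. The function names are hypothetical.

```python
import hashlib
import re
from collections import Counter


def feature_hash(feature, bits_count):
    """Hash a feature to a bits_count-bit integer (SHA-256 is an assumption)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits_count)


def accumulate(hash_weight_pairs, bits_count):
    """Add the weight where a hash bit is 1, subtract it where the bit is 0."""
    totals = [0] * bits_count
    for value, weight in hash_weight_pairs:
        for position in range(bits_count):
            # position 0 is the most significant bit of the hash
            bit = (value >> (bits_count - 1 - position)) & 1
            totals[position] += weight if bit else -weight
    return totals


def totals_to_bits(totals):
    """Map positive totals to '1' and all other totals to '0'."""
    return "".join("1" if total > 0 else "0" for total in totals)


def simhash(text, bits_count=64):
    """Compute a SimHash bit string for a text document."""
    # Assumed feature extraction: word tokens weighted by frequency.
    feature_weight_pairs = Counter(re.findall(r"\w+", text.lower())).items()
    hash_weight_pairs = [
        (feature_hash(feature, bits_count), weight)
        for feature, weight in feature_weight_pairs
    ]
    return totals_to_bits(accumulate(hash_weight_pairs, bits_count))
```

With the accumulated values from the text, `totals_to_bits([13, 108, -22, -5, -32, 55])` returns `"110001"`.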
To calculate the similarity between two documents, the SimHash value of each document is calculated, and then the Hamming distance between the two SimHash values is calculated.
For example, the SimHash value of document A is A = 100111;
the SimHash value of document B is B = 101010;
the Hamming distance between the two SimHash values is the number of 1 bits in A XOR B: hamming_distance(A, B) = count_1(A XOR B) = count_1(001101) = 3.
After the SimHash values of all documents have been calculated, document A and document B are judged to be similar if the Hamming distance between A and B is less than or equal to n, where n is chosen empirically.
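The worked example can be checked with a short sketch. It assumes SimHash values are held as equal-length bit strings; the function names `hamming_distance` and `is_similar` are illustrative.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("SimHash values must have the same number of bits")
    # XOR leaves a 1 wherever the two values differ.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def is_similar(a, b, n):
    """Two documents count as similar when the distance is at most n."""
    return hamming_distance(a, b) <= n
```

For the values in the text, `hamming_distance("100111", "101010")` returns 3, so the two documents would be judged similar for any n of 3 or more.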
Specifically, in a preferred embodiment, multiple illegal websites use the same set of H5 front-end interfaces, so the websites can be classified by the SimHash algorithm and the corresponding action instance selected for monitoring.
Suppose the active detection code for a certain illegal website has been developed and the website's template code is marked as template A. After the SimHash value of template A has been calculated, the SimHash value of the page source code docA of the current illegal website is calculated. The condition for judging whether docA and template A are similar is whether the Hamming distance between their SimHash values is less than or equal to n, where n is typically set to 15 based on experience.
If the Hamming distance is less than or equal to 15, the current illegal website is judged to be one that can be monitored using the active detection code developed for template A. By marking templates and classifying the different illegal websites, the development workload is reduced, achieving twice the result with half the effort.
In a preferred embodiment, the procedure for classifying illegal websites using the SimHash algorithm is as follows:
1) The SimHash values of common websites are marked: the websites are accessed automatically through a search engine or through manually entered URLs, their front-end characteristics are observed manually, several websites that share frequently occurring front-end characteristics are grouped into one class, an automation script is developed for that class, and the SimHash values of the websites for which scripts were developed are recorded for later comparison;
2) A SimHash processing program calculates the SimHash value of each website and compares it with those of the marked illegal websites. If a similar website is found, it is marked with the corresponding classification label and stored in the database, and the corresponding script program is selected to detect the payment channel.
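The classification procedure can be sketched as follows, assuming fingerprints are stored as integers and keyed by template code. Choosing the closest template when several fall within the threshold, and the `T<number>` naming of new templates, are assumptions of this example; the application only states the threshold test and the empirical value 15.

```python
def match_template(site_fingerprint, templates, threshold=15):
    """Return the code of the closest template within the threshold, else None.

    `templates` maps template code -> SimHash fingerprint (int).
    """
    best_code, best_distance = None, None
    for code, fingerprint in templates.items():
        distance = bin(site_fingerprint ^ fingerprint).count("1")
        if best_distance is None or distance < best_distance:
            best_code, best_distance = code, distance
    if best_distance is not None and best_distance <= threshold:
        return best_code
    return None


def classify_site(site_fingerprint, templates, threshold=15):
    """Return (template_code, is_new); register a new template when none matches."""
    code = match_template(site_fingerprint, templates, threshold)
    if code is not None:
        return code, False
    # Hypothetical naming scheme for newly added templates.
    index = len(templates) + 1
    while f"T{index}" in templates:
        index += 1
    new_code = f"T{index}"
    templates[new_code] = site_fingerprint
    return new_code, True
```

A website that matches an existing template can be handed to that template's detection script; a website returned with `is_new` set still needs a script to be developed for it.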
4. Distributed task distribution module
The illegal transaction active detection system is deployed on a plurality of servers in a distributed environment, a load balancing implementation scheme is needed to be obtained, the task distribution balance in the distributed environment is ensured, the processing efficiency is improved, and single-point faults are avoided, as shown in fig. 4.
Distributed task queues: a distributed system is a system in which multiple programs of multiple machines process multiple URLs simultaneously. The distributed mode can greatly improve the efficiency of the program. Composition of distributed task queues: the Broker, the container in which the message queues are stored, is typically provided by a third party message queuing mechanism, such as RabbitMQ, redis. Tasks, typically written in a script, function as producers, are used to generate messages. The Worker, the consumer, obtains the message from the Broker and processes it.
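The Broker, Task and Worker roles can be illustrated on a single machine with the standard library. In this sketch an in-process `queue.Queue` stands in for the Broker that RabbitMQ or Redis would provide in a real deployment, threads stand in for Workers on separate servers, and the `handler` callable stands in for whatever detection script processes a URL; all names here are assumptions of the example.

```python
import queue
import threading


def run_workers(urls, handler, worker_count=3):
    """Distribute URLs to workers through a queue and collect the results."""
    broker = queue.Queue()  # stand-in for a RabbitMQ or Redis broker
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            url = broker.get()
            if url is None:  # sentinel: no more tasks for this worker
                broker.task_done()
                break
            try:
                outcome = handler(url)
            except Exception as exc:  # keep the worker alive on task failure
                outcome = exc
            with lock:
                results.append((url, outcome))
            broker.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for url in urls:  # the producer side: one message per URL
        broker.put(url)
    for _ in threads:  # one sentinel per worker
        broker.put(None)
    broker.join()
    for thread in threads:
        thread.join()
    return results
```

Because every worker pulls from the same queue, a slow task on one worker does not hold up the others, which is the load-balancing property the distributed design is after.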
Distributed task nodes: taking a certain type of illegal website as an example, because monitoring illegal websites involves many tasks and the tasks are classified by marked template, the distributed task distribution module preferably adopts a browser Docker cluster and uses Selenium Grid to implement web page rendering and simulated operation. The monitoring system calls the Selenium Hub in a unified way; a plurality of Node proxy nodes are registered on the Selenium Hub, and a task issuing mechanism is established between the Selenium Hub and the Node proxy nodes. The Selenium Hub distributes the tasks, and the Node proxy nodes request the websites to complete the simulated registration, login and transaction actions and return the corresponding web page source code, which is passed to the text analysis module and the image recognition analysis module for processing, as shown in fig. 5.
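A minimal sketch of how a monitoring task might obtain rendered page source through a Selenium Hub is shown below. It assumes the third-party `selenium` Python package (version 4 style API) and a Grid hub reachable at a hypothetical host name `selenium-hub`. The `/wd/hub` path is the conventional Grid endpoint, but whether it is required depends on the Grid version in use. The helper names are illustrative only.

```python
def grid_endpoint(host: str, port: int = 4444) -> str:
    """Build the hub URL; the host name and the /wd/hub path are assumptions."""
    return f"http://{host}:{port}/wd/hub"


def fetch_page_source(url: str, hub_url: str) -> str:
    """Ask the hub for a browser session on any registered node,
    load `url` in it, and return the rendered page source."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    # The hub chooses a registered node that can satisfy these options.
    driver = webdriver.Remote(command_executor=hub_url, options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the node's browser session
```

The caller only ever talks to the hub; which node actually renders the page is decided by the hub, which is what allows the task load to be spread across the browser containers.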
In addition, in actual operation the system's IP address is frequently blocked. The illegal transaction active detection system therefore needs a large pool of IP addresses so that it can keep switching its own IP address and continue monitoring normally.
5. Text analysis module and image recognition analysis module
The text analysis module 5 is configured to identify and extract text information in the transaction order received by the distributed task distribution module 4, and store the text information in the order monitoring result table 101.
The image recognition analysis module 6 is configured to recognize and extract image information in the transaction order received by the distributed task distribution module 4, and store the image information in the order monitoring result table 101.
The related information of the transaction orders extracted by the text analysis module 5 and the image recognition analysis module 6 may include the order number, order screenshot, bank of the transaction, transaction time, website URL, transaction amount, payee account information, and so on. This information can serve as the basis for judging whether a transaction is legal or illegal, so that illegal transaction orders on illegal websites are discovered in a timely and effective manner, displayed and controlled in time on the background interactive interface, and reported to the risk control business department for subsequent processing. If necessary, early warning measures can be applied automatically according to early warning rules in order to intervene, where an early warning rule includes data such as the time, the place, the website where the early warning occurs, the frequency with which it occurs, and the amount of money.
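As an illustration of how the extracted order fields and an early warning rule might fit together, consider the sketch below. The text only names the kinds of data a rule contains; the specific semantics used here (a time window per website, a maximum order count and a maximum total amount) are assumptions, the place field is omitted for brevity, and all class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class OrderRecord:
    order_number: str
    bank: str
    transaction_time: datetime
    website_url: str
    amount: float
    payee_account: str


@dataclass
class WarningRule:
    website_url: str       # website the rule applies to
    window: timedelta      # how far back to look
    max_frequency: int     # allowed number of orders inside the window
    max_amount: float      # allowed total amount inside the window


def triggers_warning(orders: List[OrderRecord], rule: WarningRule,
                     now: datetime) -> bool:
    """Return True if the orders for the rule's website inside the window
    exceed either the frequency limit or the total amount limit."""
    recent = [
        order for order in orders
        if order.website_url == rule.website_url
        and now - rule.window <= order.transaction_time <= now
    ]
    total = sum(order.amount for order in recent)
    return len(recent) > rule.max_frequency or total > rule.max_amount
```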
Fig. 7 is a flow chart of an illegal transaction active detection system.
As shown in fig. 7, the main flow of the illegal transaction active detection system of the present application is as follows:
the illegal transaction platform positioning module screens out illegal websites through manually entered website URLs and keywords or through search engine query results, and stores the website information of the illegal websites in the website basic data table of the database;
the marking template module calculates the similarity between the text information of the illegal websites stored in the website basic data table and the pre-marked templates, wherein the pre-marked templates are distinct preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code;
if the calculated similarity is greater than a preset threshold, the illegal website is classified, marked with a new template code, and stored in the website basic data table of the database as a newly added template;
if the calculated similarity is less than or equal to the preset threshold, the active detection software developed for the pre-marked template is run to perform simulated registration, login and transaction channel detection on the illegal website; the monitoring log information generated during active detection is stored in the monitoring log information table of the database; during active detection, the website URL used for simulated registration and login and the registered user information (virtual data) are stored in the simulated registration login information table of the database for backup;
the transaction orders returned by the illegal websites during active detection are processed by the text analysis module and/or the image recognition analysis module, and the extracted information is stored in the order monitoring result table of the database as the basis for judging whether the transaction behavior is legal or illegal. For abnormal transactions and high-risk merchants found during monitoring, treatment measures such as accountability improvement, risk level adjustment, limiting transactions, closing settlement, and reporting to regulatory authorities can be taken.
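The four database tables referred to in this flow could be laid out roughly as follows. This is a sketch using SQLite. The table names mirror the tables named in the description and the fields follow the website and order information listed in the claims, but every column name, type and constraint here is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS website_basic_data (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    url           TEXT NOT NULL UNIQUE,
    is_valid      INTEGER,
    text_info     TEXT,
    snapshot_url  TEXT,
    template_code TEXT,
    created_at    TEXT,
    updated_at    TEXT
);
CREATE TABLE IF NOT EXISTS simulated_registration_login (
    id          INTEGER PRIMARY KEY,
    website_url TEXT NOT NULL,
    user_info   TEXT            -- virtual registration data only
);
CREATE TABLE IF NOT EXISTS monitoring_log (
    id          INTEGER PRIMARY KEY,
    website_url TEXT NOT NULL,
    logged_at   TEXT,
    message     TEXT
);
CREATE TABLE IF NOT EXISTS order_monitoring_result (
    id               INTEGER PRIMARY KEY,
    order_number     TEXT,
    screenshot_path  TEXT,
    bank             TEXT,
    transaction_time TEXT,
    website_url      TEXT,
    amount           REAL,
    payee_account    TEXT
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the four tables if they do not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```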
In summary, the application provides an illegal transaction active detection method and system. The system takes the website address of an illegal transaction platform as input and outputs the characteristic information of the platform and its illegal transaction order information. Through active detection of illegal websites, transaction risks can be effectively identified and flagged in daily monitoring, so that illegal and criminal platforms are warned of rapidly, discovered early and handled in time, further losses are avoided, and the overall level of financial risk prevention and control is improved.
The above description of the specific embodiments of the present application has been given by way of example only, and the present application is not limited to the above described specific embodiments. Any equivalent modifications and substitutions for the present application will occur to those skilled in the art, and are also within the scope of the present application. Accordingly, equivalent changes and modifications are intended to be included within the scope of the present application without departing from the spirit and scope thereof.
Claims (9)
1. An active detection method for illegal transactions, characterized by comprising the following steps:
screening out illegal websites through manually entered website URLs and keywords or through search engine query results, and storing the website information of the illegal websites in a database;
calculating the similarity between the text information of an illegal website and pre-marked templates, wherein the pre-marked templates are distinct preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code;
if the calculated similarity is greater than a preset threshold, classifying the illegal website, marking it with a new template code, and storing it in the database as a newly added template;
if the calculated similarity is less than or equal to the preset threshold, using active detection software developed for the pre-marked template to perform simulated registration, login and transaction channel detection on the illegal website, extracting the relevant information of the transaction order returned by the illegal website through text analysis and mining and/or image recognition analysis, and storing it in the database as the basis for judging whether the transaction behavior is legal or illegal;
wherein using the active detection software developed for the pre-marked template to detect the illegal website comprises:
supporting distributed task distribution processing, adopting a browser Docker cluster, using Selenium Grid to implement page rendering and simulated operation, uniformly calling the Selenium Hub to distribute tasks to at least one Node proxy node registered on the Selenium Hub, the plurality of Node proxy nodes requesting the illegal website, completing the simulated registration, login and transaction actions, and receiving the transaction order returned by the illegal website.
2. The method for actively detecting illegal transactions according to claim 1, wherein: after simulated registration on the illegal website, the corresponding website URL and registered user information are stored in the database for backup.
3. The method for actively detecting illegal transactions according to claim 1, wherein: the website information of the illegal websites screened out through manually entered website URLs and keywords or through search engine query results comprises at least one of the following: website name, website URL, website validity, text information of the website, website snapshot picture URL, website marking template code, website creation time, website update time.
4. The method for actively detecting illegal transactions according to claim 1, wherein: the similarity is the Hamming distance between the SimHash value of the web page source code of the illegal website and the SimHash value of the pre-marked template.
5. The method for actively detecting illegal transactions according to claim 1, wherein the relevant information of the transaction order comprises at least one of the following: order number, order screenshot, bank of the transaction, transaction time, website URL, transaction amount, payee account information.
6. The method for actively detecting illegal transactions according to claim 1, wherein: monitoring log information is generated when simulated registration, login and transaction channel detection are performed on the illegal website, and the monitoring log information is stored in the database.
7. An active detection system for illegal transactions, comprising:
a database storing a website basic data table, an order monitoring result table, and a simulated registration login information table; wherein,
the website basic data table is used for storing the website information of illegal websites screened out through manually entered website URLs and keywords or through search engine query results;
the order monitoring result table is used for storing the relevant information of transaction orders obtained by actively detecting the illegal websites;
the simulated registration login information table is used for storing the website URLs and registered user information corresponding to simulated registration and login on the illegal websites;
the illegal transaction platform positioning module is used for screening out illegal websites through manually entered website URLs and keywords or through search engine query results;
the marking template module is used for calculating the similarity between the text information of an illegal website and pre-marked templates, wherein the pre-marked templates are distinct preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code; if the calculated similarity is greater than a preset threshold, the illegal website is classified, marked with a new template code, and stored in the website basic data table as a newly added template;
the distributed task distribution module is used for using active detection software developed for the pre-marked template to perform simulated registration, login and transaction channel detection on illegal websites whose similarity, as calculated by the marking template module, is less than or equal to the preset threshold, and for receiving the transaction orders returned by the illegal websites;
the text analysis module is used for performing text analysis and mining on the text information in the transaction orders received by the distributed task distribution module and storing the text information in the order monitoring result table;
the image recognition analysis module is used for performing image recognition analysis on the image information in the transaction orders received by the distributed task distribution module and storing the image information in the order monitoring result table.
8. The illegal transaction active detection system according to claim 7, wherein the marking template module comprises:
an extraction unit for extracting the pattern fingerprint of the illegal website;
a calculation unit for calculating the similarity between the pattern fingerprint of the illegal website extracted by the extraction unit and the pre-marked template;
and a determining unit for determining, when the similarity calculated by the calculation unit is less than or equal to the preset threshold, that active detection software developed for the pre-marked template can be used to perform simulated registration, login and transaction channel detection on the illegal website.
9. The illegal transaction active detection system according to claim 7, characterized in that: the database further comprises a monitoring log information table for storing the monitoring log information generated when simulated registration, login and transaction channel detection are performed on the illegal website.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010776643.0A CN112199573B (en) | 2020-08-05 | 2020-08-05 | Illegal transaction active detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112199573A (en) | 2021-01-08 |
CN112199573B (en) | 2023-12-08 |
Family
ID=74006145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010776643.0A Active CN112199573B (en) | 2020-08-05 | 2020-08-05 | Illegal transaction active detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199573B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966263A (en) * | 2021-02-25 | 2021-06-15 | 中国银联股份有限公司 | Target information acquisition method and device and computer readable storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996041488A1 (en) * | 1995-06-07 | 1996-12-19 | The Dice Company | Fraud detection system for electronic networks using geographical location coordinates |
CN101383820A (en) * | 2008-07-07 | 2009-03-11 | 上海安融信息系统有限公司 | Design and implementing method for SSL connection and data monitoring |
KR20090090641A (en) * | 2008-02-21 | 2009-08-26 | 주식회사 조은시큐리티 | System for active security surveillance |
CN103685575A (en) * | 2014-01-06 | 2014-03-26 | 洪高颖 | Website security monitoring method based on cloud architecture |
CN106302438A (en) * | 2016-08-11 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means |
CN107733969A (en) * | 2017-07-25 | 2018-02-23 | 上海壹账通金融科技有限公司 | Website simulation login method, device, service end and readable storage medium storing program for executing |
CN107786537A (en) * | 2017-09-19 | 2018-03-09 | 杭州安恒信息技术有限公司 | A kind of lonely page implantation attack detection method based on internet intersection search |
US10108968B1 (en) * | 2014-03-05 | 2018-10-23 | Plentyoffish Media Ulc | Apparatus, method and article to facilitate automatic detection and removal of fraudulent advertising accounts in a network environment |
CN110020075A (en) * | 2017-10-20 | 2019-07-16 | 南京烽火软件科技有限公司 | Device is excavated in illegal website automatically |
CN110119469A (en) * | 2019-05-22 | 2019-08-13 | 北京计算机技术及应用研究所 | A kind of data collection and transmission and method towards darknet |
CN110413908A (en) * | 2018-04-26 | 2019-11-05 | 维布络有限公司 | The method and apparatus classified based on web site contents to uniform resource locator |
Non-Patent Citations (2)
Title |
---|
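Illegal website identification method based on URL feature detection; 凡友荣; 杨涛; 王永剑; 姜国庆; Computer Engineering (No. 3); 176-182 * |
Design and implementation of a counterfeit website detection system based on active probing; 魏玉良; China Master's Theses Full-text Database, Information Science and Technology (No. 2); I138-1244 * |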
Also Published As
Publication number | Publication date |
---|---|
CN112199573A (en) | 2021-01-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||