CN112199573A - Active detection method and system for illegal transaction - Google Patents
- Publication number
- CN112199573A
- Authority
- CN
- China
- Prior art keywords
- illegal
- website
- transaction
- template
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/547—Remote procedure calls [RPC]; Web services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The application provides an active detection method and system for illegal transactions. The method comprises the following steps: screening illegal websites; matching templates by calculating the similarity between the text information of each illegal website and templates marked in advance; selecting the program script of the matched template to perform simulated registration, login and transaction channel detection on the illegal website; and extracting relevant information of the transaction orders returned by the illegal website through text analysis mining and/or image recognition analysis, to serve as a basis for judging whether the transaction behavior is legal or illegal. The application can effectively identify and give early warning of transaction risks in daily monitoring, so that illegal and criminal platforms are discovered early and dealt with promptly, further losses are avoided, and the overall level of financial risk prevention and control is improved.
Description
Technical Field
The invention relates to the field of financial risk control, in particular to an illegal transaction active detection method and system.
Background
At present, among common payment risks, payment channels are often exploited by illegal websites and applications: users are tricked into carrying out illegal transactions, the operators reap exorbitant profits, and the users suffer economic losses.
Therefore, how to actively detect illegal transaction platforms, effectively identify and give early warning of transaction risks in daily monitoring, achieve early identification, early warning and early disposal of risks, and improve the overall level of financial risk prevention and control is a problem to be solved urgently.
Disclosure of Invention
The present invention is directed to provide an active illegal transaction detection method and system, so as to solve the problems set forth in the above background.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present application provides an active detection method for illegal transactions, including:
screening out illegal websites from manually entered website URLs (uniform resource locators) and keywords, or from search engine query results, and storing website information of the illegal websites into a database;
calculating the similarity between the text information of the illegal websites and pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historical illegal websites, and each template is marked with a template code;
if the calculated similarity is greater than a preset threshold value, classifying the illegal website, marking it with a new template code, and storing it in the database as a newly added template;
if the calculated similarity is less than or equal to the preset threshold value, using active detection software developed for the matching pre-marked template to perform simulated registration, login and transaction channel detection on the illegal website, extracting relevant information of the transaction order returned by the illegal website through text analysis mining and/or image recognition analysis, and storing the relevant information in the database as a basis for judging whether the transaction behavior is legal or illegal.
Preferably, screening out illegitimate websites through the results of search engine queries includes:
querying a search engine with the keywords to obtain the URLs of corresponding suspected websites;
performing keyword examination on the source code content of each webpage;
regarding a website that passes the examination as an illegal website and recording it into the database.
Preferably, the illegal transaction active detection method further comprises: after simulated registration on the illegal website, storing the corresponding website URL and the registered user information (virtual data) into a database for backup.
Preferably, the website information of the illegal website screened out by manually entering the URL, the keyword of the website, or the result of the search engine query includes at least one of the following: website name, website URL, website validity (whether the website can be opened or not), text information of the website, website snapshot picture URL, website mark template code number, website creation time and website update time.
Preferably, the similarity is a hamming distance between a SimHash value of a web page source code of the illegal website and a SimHash value of a pre-marked template.
Preferably, the predetermined threshold is an empirical value, preferably 15.
Preferably, the information relating to the trade order comprises at least one of: order number, order screenshot, bank of transaction, transaction time, website URL, transaction amount, payee account information.
Preferably, the illegal transaction active detection method further comprises: the distributed task distribution processing is supported, a browser Docker cluster is adopted, a Selenium Grid is used for realizing page rendering and simulation operation, a Selenium Hub is called uniformly to distribute tasks to at least one Node proxy Node registered on the Selenium Hub, a plurality of Node proxy nodes request an illegal website to complete simulation registration, login and transaction actions, and a transaction order returned by the illegal website is received.
Preferably, the illegal transaction active detection method further comprises: dynamically configuring the IP address of the simulated user that performs simulated registration on the illegal website.
Preferably, when simulated registration, login and transaction channel detection are performed on the illegal website, the illegal transaction active detection method further comprises: generating monitoring log information and storing the monitoring log information into a database.
A second aspect of the present application provides an active illegal transaction detection system, comprising:
a database storing a website basic data table, an order monitoring result table, and a simulation registration login information table; wherein,
the website basic data table is used for storing website information of illegal websites screened out by manually inputting URLs and key words of the websites or by results of search engine query;
the order monitoring result table is used for storing the related information of the transaction order obtained by actively detecting the illegal website;
the simulation registration login information table is used for storing simulation registration of illegal websites, corresponding website URLs and registered user information (virtual data) during login;
the illegal trading platform positioning module is used for screening out illegal websites from manually entered website URLs and keywords, or from search engine query results;
the template marking module is used for calculating the similarity between the text information of the illegal website and a template marked in advance, wherein the template marked in advance is different preset models generated by classifying the illegal websites recorded in history, and each template is marked with a template code; if the calculated similarity is larger than a preset threshold value, classifying the illegal website, marking a new template code number, and storing the template code number as a newly added template into a website basic data table;
the distributed task distribution module is used for performing simulated registration, login and transaction channel detection on the illegal website with the similarity calculated in the marking template module being less than or equal to a preset threshold value by using active detection software developed by a pre-marked template, and receiving a transaction order returned by the illegal website;
the text analysis module is used for performing text analysis mining on the text information in the transaction order received by the distributed task distribution module and storing the text information into the order monitoring result table;
the image recognition analysis module is used for carrying out image recognition analysis on the image information in the transaction order received by the distributed task distribution module and storing the image information in the order monitoring result table.
Preferably, the website information stored in the website basic data table includes at least one of: website name, website URL, website validity (whether the website can be opened or not), text information of the website, website snapshot picture URL, website mark template code number, website creation time and website update time.
Preferably, the information related to the trade orders stored in the order monitoring result table includes at least one of the following: order number, order screenshot, bank of transaction, transaction time, website URL, transaction amount, payee account information.
Preferably, the marking template module includes:
an extraction unit for extracting pattern fingerprints in an illegal website;
the calculating unit is used for calculating the similarity between the pattern fingerprint in the illegal website extracted by the extracting unit and the template marked in advance;
and the determining unit is used for determining that the illegal website can use active detection software developed by a pre-marked template to perform simulated registration, login and detection of a transaction channel on the illegal website when the similarity calculated by the calculating unit is less than or equal to a preset threshold.
More preferably, the similarity calculated by the calculating unit is a hamming distance between a SimHash value of the web page source code of the illegal website and a SimHash value of a pre-marked template, that is, the extracting unit extracts the pattern fingerprint in the illegal website as the SimHash value of the web page source code of the illegal website.
More preferably, the predetermined threshold is an empirical value, preferably 15.
Preferably, the distributed task distribution module supports distributed task distribution processing, a browser Docker cluster is adopted, a Selenium Grid is used for realizing page rendering and simulation operations, the Selenium Hub is called uniformly to distribute tasks to at least one Node proxy Node registered on the Selenium Hub, a plurality of Node proxy nodes request an illegal website to complete simulated registration, login and transaction actions, and a transaction order returned by the illegal website is received.
Preferably, the database further includes a monitoring log information table for storing monitoring log information generated when the illegal website is subjected to simulated registration, login, and detection of a transaction channel.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
The system takes the website address of an illegal transaction platform as input and outputs the characteristic information and illegal transaction order information of that platform. By actively detecting illegal websites, it can effectively identify and give early warning of transaction risks in daily monitoring, so that illegal and criminal platforms are discovered early and dealt with promptly, further losses are avoided, and the overall level of financial risk prevention and control is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a block diagram of an illegal transaction active detection system of the preferred embodiment;
FIG. 2 is a schematic diagram of a website information list of illegal websites screened by the illegal transaction platform positioning module according to the preferred embodiment;
FIG. 3 is a schematic diagram of the SimHash algorithm flow;
FIG. 4 is an architectural diagram of a distributed task distribution process;
FIG. 5 is a schematic diagram of a Selenium Grid distributed task node;
FIG. 6 is a schematic diagram of a page shown when an illegal website blocks an IP address, according to the preferred embodiment;
FIG. 7 is a schematic flow diagram of the illegal transaction active detection system.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order, it being understood that the data so used may be interchanged under appropriate circumstances. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a block diagram of an illegal transaction active detection system. As shown in fig. 1, the illegal transaction active detection system includes a database 1, an illegal transaction platform positioning module 2, a marking template module 3, a distributed task distribution module 4, a text analysis module 5, and an image recognition analysis module 6.
1. Database
The database 1 stores a website basic data table 101, an order monitoring result table 102, and a simulation registration login information table 103.
The website basic data table 101 is used for storing website information of illegal websites screened out by manually inputting URLs and keywords of the websites or by results of search engine queries. The website information stored in the website basic data table 101 includes at least one of the following: website name, website URL, website validity (whether it can be opened), text information of website, website snapshot picture URL, website markup template code number, website creation time, and website update time, as shown in fig. 2.
The order monitoring result table 102 is used for storing relevant information of the transaction order obtained by actively detecting the illegal website, such as an order number, an order screenshot, a transaction bank, transaction time, a website URL, a transaction amount, payee account information, and the like.
The simulated registration login information table 103 is used to store the website URLs and registered user information (virtual data) used for simulated registration and login on illegal websites, so that they can be read and reused next time.
Preferably, the database 1 further stores a monitoring log information table 104, which is used for storing monitoring log information generated when simulated registration, login and transaction channel detection are performed on an illegal website.
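The four tables described above could be laid out roughly as follows. This is only a minimal sketch using SQLite; the table names, column names and column types are illustrative assumptions derived from the field lists in this section and are not specified by the application.

```python
import sqlite3

# Table and column names are assumptions chosen to mirror the field lists above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS website_basic_data (      -- table 101
    id INTEGER PRIMARY KEY,
    name TEXT,
    url TEXT NOT NULL UNIQUE,
    is_valid INTEGER,            -- 1 if the website can be opened, else 0
    text_info TEXT,
    snapshot_url TEXT,
    template_code TEXT,
    created_at TEXT,
    updated_at TEXT
);
CREATE TABLE IF NOT EXISTS order_monitoring_result ( -- table 102
    id INTEGER PRIMARY KEY,
    order_number TEXT,
    screenshot_path TEXT,
    bank TEXT,
    transaction_time TEXT,
    website_url TEXT,
    amount REAL,
    payee_account TEXT
);
CREATE TABLE IF NOT EXISTS simulated_registration_login ( -- table 103
    id INTEGER PRIMARY KEY,
    website_url TEXT NOT NULL,
    username TEXT,               -- virtual data only
    password TEXT
);
CREATE TABLE IF NOT EXISTS monitoring_log (          -- table 104
    id INTEGER PRIMARY KEY,
    website_url TEXT,
    stage TEXT,                  -- e.g. registration, login, channel detection
    message TEXT,
    logged_at TEXT
);
"""

def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def table_names(conn):
    """List the user tables in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

An in-memory database is used by default so the sketch can be tried without touching the file system; a production deployment would presumably use a server-based database.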
2. Illegal trading platform positioning module
The illegal trading platform positioning module 2 is used for screening out illegal websites through manually inputting URL and key words of the websites or through the result of search engine query.
Taking the search for and location of a certain kind of illegal website as an example, there are mainly two ways: manual entry and search engine query.
Manual entry: illegal website URLs and keywords are entered manually. The keywords are preset manually.
Search engine query: illegal websites have some typical characteristics, such as frequently changing URLs, multiple domain names resolving to the same website, and unstable accessibility of the website itself. The human resources available for supplying suspected websites are generally limited, so the illegal transaction platform positioning module is required to find suspected websites automatically and monitor them in real time, namely by querying a search engine. A search engine query requires keywords, so a system administrator configures keywords and variant forms of the keywords. The module then queries the search engine with these keywords to obtain the URLs of suspected illegal websites, performs keyword review on the source code content of each webpage, regards a website that passes the review as an illegal website, and records it into the website basic data table 101 of the database 1.
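The keyword review step described above might be sketched as follows. The function names, the `min_hits` parameter and the sample keywords are hypothetical; the application does not state how many keyword hits are required, so that rule is an assumption, and fetching the page source is left out.

```python
def passes_keyword_review(page_source, keywords, min_hits=1):
    """Return True if at least min_hits of the configured keywords
    (including any variant forms) occur in the page source.
    The min_hits rule is an assumption, not taken from the application."""
    text = page_source.lower()
    hits = {kw for kw in keywords if kw.lower() in text}
    return len(hits) >= min_hits

def screen_candidates(candidates, keywords, min_hits=1):
    """candidates maps a suspected URL to its already-fetched page source.
    Returns the URLs that pass the keyword review, in input order."""
    return [
        url
        for url, source in candidates.items()
        if passes_keyword_review(source, keywords, min_hits)
    ]
```

URLs returned by `screen_candidates` would then be the ones recorded into the website basic data table 101.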
3. Marking template module
Because the number of illegal trading platforms is huge (hundreds may exist), and new illegal websites appear far faster than the system can be developed and maintained for them, the burden on the system's developers increases. To resolve this contradiction, a distributed network system based on marked templates needs to be designed, which can automatically select an action instance according to the webpage information of the accessed illegal transaction platform to complete the monitoring task.
The marking template module 3 is used for calculating the similarity between the text information of the illegal website and a pre-marked template, wherein the pre-marked templates are different preset models generated by classifying historical illegal websites, and each template is marked with a template code; if the calculated similarity is greater than a predetermined threshold, the illegal website is classified, marked with a new template code, and stored as a newly added template in the website basic data table 101.
Common algorithms for calculating the similarity of texts include a SimHash algorithm, a machine learning clustering algorithm, a method for reversely constructing XPath according to a Dom tree, a SimHash-based improved Kmeans clustering method and the like. Of course, the method for classifying websites through similarity calculation mentioned in the present application is not limited thereto, and all algorithms capable of implementing website classification through similarity calculation should be covered in the protection scope of the present application.
The similarity calculation is performed by taking the SimHash algorithm as an example.
SimHash is a widely used hashing method for web page deduplication; it is fast, and it compares the similarity between documents by Hamming distance. The SimHash algorithm flow is shown in FIG. 3, and the algorithm process is as follows:
Keywords are extracted from the document Doc (including word segmentation and weight calculation), giving n (keyword, weight) pairs, namely the (feature, weight) pairs in the figure. Denote them feature_weight_pairs = [fw1, fw2, ..., fwn], where fwn = (feature_n, weight_n) and n is a natural number greater than 1.
Each feature is hashed: hash_weight_pairs = [(hash(feature), weight) for feature, weight in feature_weight_pairs], which gives the (hash, weight) pairs in the figure. Here the number of bits generated by the hash, bits_count, is assumed to be 6 (see FIG. 3).
The hash_weight_pairs are then accumulated bit by bit (column-wise): for each bit position, +weight is added if the bit is 1 and -weight if the bit is 0. This finally yields bits_count numbers, [13, 108, -22, -5, -32, 55] in the figure; the resulting values depend on the algorithm used for the hash function.
Each positive number is represented by 1 and each negative number by 0, so [13, 108, -22, -5, -32, 55] is converted into the binary string 110001, which is the SimHash value of the document Doc.
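The steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: MD5 is used as the feature hash purely for convenience (the application does not name a hash function), a column sum of exactly zero is mapped to 0, and keyword extraction and weighting are assumed to have been done already.

```python
import hashlib

def hash_feature(feature, bits_count):
    """Hash a feature string to an integer of bits_count bits.
    MD5 is an assumed choice; it limits bits_count to at most 128."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits_count) - 1)

def weighted_bit_sums(feature_weight_pairs, bits_count):
    """Accumulate +weight for every 1 bit and -weight for every 0 bit."""
    sums = [0] * bits_count
    for feature, weight in feature_weight_pairs:
        h = hash_feature(feature, bits_count)
        for i in range(bits_count):
            bit = (h >> (bits_count - 1 - i)) & 1  # i = 0 is the leftmost bit
            sums[i] += weight if bit else -weight
    return sums

def sums_to_bits(sums):
    """Positive sums become 1; negative sums (and, by assumption, zero) become 0."""
    return "".join("1" if s > 0 else "0" for s in sums)

def simhash(feature_weight_pairs, bits_count=64):
    """Return the SimHash fingerprint of a document as an integer."""
    return int(sums_to_bits(weighted_bit_sums(feature_weight_pairs, bits_count)), 2)
```

For the accumulated values in the figure, `sums_to_bits([13, 108, -22, -5, -32, 55])` gives the string `110001`, matching the SimHash value derived in the text.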
To calculate the similarity between two documents, the SimHash values of the two documents are calculated respectively, and then the Hamming distance between the two SimHash values is calculated.
For example, the SimHash value of document A is: A = 100111;
the SimHash value of document B is: B = 101010;
the Hamming distance of the two SimHash values is the number of 1 bits in the binary result of A XOR B: hamming_distance(A, B) = count_1(A XOR B) = count_1(001101) = 3;
after the SimHash values of all documents have been calculated, the condition for judging whether document A and document B are similar is whether the Hamming distance between A and B is less than or equal to n, where the value of n can be determined empirically.
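The Hamming distance calculation in the example above can be checked with a short sketch. The function names are illustrative, and the SimHash values are assumed to be held as integers.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two SimHash values differ."""
    return bin(a ^ b).count("1")

def is_similar(a, b, n):
    """Two documents count as similar when their distance is at most n."""
    return hamming_distance(a, b) <= n
```

With A = 0b100111 and B = 0b101010, `hamming_distance(A, B)` evaluates to 3, in agreement with the worked example.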
Specifically, in a preferred embodiment, multiple illegal websites that use the same H5 front-end interface can be grouped into one category by the SimHash algorithm, and the corresponding instance actions are selected for monitoring.
Assume that the active detection code for a certain illegal website has been developed and that the template code marked for that website is template A. After the SimHash value of the webpage source code doc A of the current illegal website is calculated, the condition for judging whether doc A and template A are similar is whether the Hamming distance between the SimHash values of doc A and template A is less than or equal to n, where n is generally taken as 15 based on experience.
If the Hamming distance is less than or equal to 15, the current illegal website is judged to be one that can be monitored using the active detection code developed for template A. By marking templates, different illegal websites are classified, which reduces the development workload and achieves twice the result with half the effort.
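The template decision described above might look like the following sketch. Picking the nearest template when several fall within the threshold, and having the caller supply the code for a new template, are assumptions made for illustration; the application only states the threshold comparison.

```python
def match_template(doc_hash, templates, threshold=15):
    """templates maps a template code to its SimHash value (an integer).
    Returns the code of the nearest template whose Hamming distance is
    at most threshold, or None if no template is close enough."""
    best_code, best_dist = None, None
    for code, template_hash in templates.items():
        dist = bin(doc_hash ^ template_hash).count("1")
        if dist <= threshold and (best_dist is None or dist < best_dist):
            best_code, best_dist = code, dist
    return best_code

def classify_site(doc_hash, templates, new_code, threshold=15):
    """Reuse an existing template if one matches; otherwise register the
    site as a new template under new_code (a caller-supplied, hypothetical code).
    Returns (template_code, is_new_template)."""
    code = match_template(doc_hash, templates, threshold)
    if code is not None:
        return code, False
    templates[new_code] = doc_hash
    return new_code, True
```

A site that comes back with `is_new_template` set would still need a detection script to be developed for it before it can be monitored.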
In a preferred embodiment, the process of classifying illegitimate web sites using the SimHash algorithm is as follows:
1) marking the SimHash values of common websites: the websites are accessed automatically through a search engine or through manually entered URLs, their front-end characteristics are observed manually, several websites with the same front-end characteristics and a high frequency of occurrence are selected as one class, an automation script is developed for the class, and the SimHash values of the websites covered by the script are recorded for later comparison;
2) calculating the SimHash value of each website with the SimHash processing program and comparing it with those of the marked illegal websites; if a similar website is found, the corresponding classification label is marked and stored in the database, and the corresponding script program is selected to detect the payment channel.
4. Distributed task distribution module
The illegal transaction active detection system is deployed on a plurality of servers in a distributed environment, so a load balancing scheme is needed to keep task distribution balanced across the distributed environment, improve processing efficiency and avoid single points of failure, as shown in FIG. 4.
Distributed task queue: in a distributed system, a plurality of machines and a plurality of programs process a plurality of URLs at the same time, which can greatly improve the efficiency of the program. The distributed task queue is composed of the following parts. Broker: a container that holds the message queues, typically provided by a third-party message queue mechanism such as RabbitMQ or Redis. Tasks: generally written as scripts, acting as the producer that generates messages. Worker: the consumer, which obtains messages from the Broker and processes them.
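The producer, broker and worker roles can be illustrated with a single-process sketch. Here Python's in-memory `queue.Queue` merely stands in for an external broker such as RabbitMQ or Redis, threads stand in for worker machines, and `handler` is a hypothetical callback that would perform the actual detection for one URL.

```python
import queue
import threading

def run_workers(urls, handler, worker_count=3):
    """Distribute URLs to worker threads and collect (url, outcome) pairs."""
    broker = queue.Queue()   # stand-in for a RabbitMQ/Redis message queue
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            url = broker.get()
            if url is None:              # sentinel: no more tasks for this worker
                broker.task_done()
                break
            try:
                outcome = handler(url)
            except Exception as exc:     # keep the worker alive on a failed task
                outcome = f"error: {exc}"
            with lock:
                results.append((url, outcome))
            broker.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for url in urls:                     # the producer side: enqueue tasks
        broker.put(url)
    for _ in threads:                    # one sentinel per worker
        broker.put(None)
    broker.join()
    for t in threads:
        t.join()
    return sorted(results)
```

In a real deployment the workers would run on separate servers and pull from a shared broker, which is what provides the load balancing and avoids a single point of failure.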
Distributed task nodes: taking a certain illegal website as an example, because the monitoring tasks for illegal websites are numerous and the marked templates are used to classify the tasks, a browser Docker cluster is preferably adopted in the distributed task distribution module, with Selenium Grid used to realize page rendering and simulated operation. The monitoring system uniformly calls the Selenium Hub, on which a plurality of Node agent nodes are registered. A task issuing mechanism is established between the Selenium Hub and the Node agent nodes: the Selenium Hub distributes the tasks, and the Node agent nodes request the websites to complete the simulated registration, login and transaction actions and return the corresponding webpage source code, which enters the text analysis module and the image recognition analysis module for processing, as shown in FIG. 5.
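From the caller's side, sending a page-rendering task through the Selenium Hub might look like the sketch below. It assumes the third-party `selenium` package (version 4) and a running Grid; the host name is hypothetical, 4444 is the Grid's usual default port, and `/wd/hub` is the classic Hub endpoint path. The template-specific registration, login and transaction steps are omitted.

```python
def hub_endpoint(host, port=4444):
    """Build the URL of the Selenium Hub that browser sessions are requested from."""
    return f"http://{host}:{port}/wd/hub"

def fetch_rendered_source(url, hub_url):
    """Ask the Hub for a browser session on any registered Node, load the page,
    and return the rendered page source for the analysis modules."""
    from selenium import webdriver  # third-party dependency, imported lazily

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Remote(command_executor=hub_url, options=options)
    try:
        driver.get(url)
        # Template-specific simulated actions would be performed here.
        return driver.page_source
    finally:
        driver.quit()  # always release the Node's browser slot
```

The caller never addresses a Node directly; the Hub chooses a free Node for each session, which is how the tasks end up distributed.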
In addition, in actual operation the system's IP address is often blocked. The illegal transaction active detection system therefore needs a large pool of IP addresses so that it can keep switching its own IP address and continue monitoring normally.
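One simple way to keep switching addresses is to rotate through a pool of proxies and skip any that have been blocked. The class below is a hypothetical sketch; the application does not describe how the IP pool is managed, and the proxy strings are placeholders.

```python
import itertools

class ProxyRotator:
    """Round-robin over a proxy pool, skipping proxies marked as banned."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        if not self._proxies:
            raise ValueError("at least one proxy is required")
        self._banned = set()
        self._cycle = itertools.cycle(self._proxies)

    def ban(self, proxy):
        """Record that a target website has blocked this proxy."""
        self._banned.add(proxy)

    def next_proxy(self):
        """Return the next usable proxy, or raise if every proxy is banned."""
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._banned:
                return proxy
        raise RuntimeError("all proxies are banned")
```

A worker would call `next_proxy()` before each simulated session and `ban()` whenever a blocked-IP page is detected.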
5. Text analysis module and image recognition analysis module
The text analysis module 5 is configured to identify and extract text information in the transaction order received by the distributed task distribution module 4, and store the text information in the order monitoring result table 101.
The image recognition and analysis module 6 is configured to recognize and extract image information in the transaction order received by the distributed task distribution module 4, and store the image information in the order monitoring result table 101.
The transaction order information extracted by the text analysis module 5 and the image recognition analysis module 6 may include the order number, order screenshot, bank of the transaction, transaction time, website URL, transaction amount, payee account information, and the like. This information can serve as a basis for judging whether the transaction behavior is legal or illegal, so that illegal transaction orders on illegal websites are found promptly and effectively, displayed promptly in the background control interface, and reported to the risk control business department for subsequent processing.
FIG. 7 is a schematic flow diagram of the illegal transaction active detection system.
As shown in FIG. 7, the main process of the illegal transaction active detection system of the present application is as follows:
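One row of the order monitoring result table could be represented as below. The field list follows the text; the class name `OrderRecord`, the field names and types, the helper `missing_fields`, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, asdict
from decimal import Decimal

@dataclass(frozen=True)
class OrderRecord:
    """Hypothetical shape of one row in the order monitoring result table."""
    order_number: str
    order_screenshot: str        # path or URL of the screenshot (assumed)
    bank: str
    transaction_time: str        # ISO 8601 timestamp (assumed format)
    website_url: str
    transaction_amount: Decimal
    payee_account: str

def missing_fields(record):
    """List the fields the extraction step left empty."""
    return [name for name, value in asdict(record).items() if value in ("", None)]

record = OrderRecord(
    order_number="A1001",
    order_screenshot="shots/A1001.png",
    bank="",                     # not recognized in this example
    transaction_time="2020-08-05T10:30:00",
    website_url="http://illegal-site.example",
    transaction_amount=Decimal("100.00"),
    payee_account="6222-0000",
)
gaps = missing_fields(record)
```

Checking for empty fields before storage makes it easy to route an order back to the image recognition path when text extraction alone was incomplete.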
the illegal trading platform positioning module screens out illegal websites from manually entered website URLs and keywords or from search engine query results, and stores the website information of the illegal websites in the website basic data table of the database;
the template marking module calculates the similarity between the text information of each illegal website stored in the website basic data table and the pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historical illegal websites, and each template is marked with a template code;
if the calculated similarity is greater than a preset threshold, the illegal website is classified, marked with a new template code, and stored as a newly added template in the website basic data table of the database;
if the calculated similarity is less than or equal to the preset threshold, the active detection software developed based on the pre-marked template can be run to perform simulated registration, login, and transaction channel detection on the illegal website; monitoring log information generated during active detection is stored in the monitoring log information table of the database; during active detection, the website URLs used for simulated registration and login and the registered user information (virtual data) are stored in the simulated registration login information table of the database for backup;
the transaction orders returned by the illegal website during active detection are extracted by the text analysis module and/or the image recognition analysis module and stored in the order monitoring result table of the database as a basis for judging whether the transaction behavior is legal or illegal. For abnormal transactions and high-risk merchants discovered during monitoring, measures such as order adjustment, risk level adjustment, transaction limits, closing settlement, and reporting to the supervisory authority can be taken.
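The template-matching branch of this flow can be sketched as follows. Claim 4 defines the similarity as the Hamming distance between SimHash values, so a larger value means the website differs more from the existing templates, which is why exceeding the threshold leads to a new template. The MD5-based token hashing, whitespace tokenization, 64-bit width, threshold of 3, nearest-template comparison, and code format `T001` are all assumptions, not details from the source.

```python
import hashlib

def simhash(text, bits=64):
    """Compute a SimHash fingerprint over whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    value = 0
    for i, w in enumerate(weights):
        if w > 0:
            value |= 1 << i
    return value

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def match_template(page_text, templates, threshold=3):
    """Return (template_code, is_new) for a page.

    templates maps template code -> fingerprint and is updated in place
    when the page is too far from every existing template.
    """
    fp = simhash(page_text)
    if templates:
        code, dist = min(
            ((c, hamming(fp, t)) for c, t in templates.items()),
            key=lambda pair: pair[1],
        )
        if dist <= threshold:
            return code, False       # reuse detection software for this template
    new_code = f"T{len(templates) + 1:03d}"
    templates[new_code] = fp         # store as a newly added template
    return new_code, True

templates = {}
page = "login register deposit withdraw bank card"
first = match_template(page, templates)
second = match_template(page, templates)
```

The first call finds no templates and registers the page as a new one; the second call sees a distance of zero and reuses the existing template code.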
In summary, the present application provides an illegal transaction active detection method and system. The system takes the website address of an illegal transaction platform as input and outputs the characteristic information of the platform and its illegal transaction order information. By actively detecting illegal websites, transaction risks can be effectively identified and flagged during daily monitoring, so that illegal and criminal platforms are detected and handled as early as possible, further losses are avoided, and the overall level of financial risk prevention and control is improved.
The embodiments of the present invention have been described in detail, but they are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions made by those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.
Claims (10)
1. An active detection method for illegal transactions, comprising:
screening out illegal websites by manually inputting URLs (uniform resource locators) and keywords of websites or through search engine query results, and storing website information of the illegal websites in a database;
calculating the similarity between the text information of the illegal websites and pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historical illegal websites, and each template is marked with a template code;
if the calculated similarity is larger than a preset threshold value, classifying the illegal website, marking a new template code number, and storing the template code number as a newly added template in a database;
if the calculated similarity is smaller than or equal to the preset threshold value, using active detection software developed based on a pre-marked template to perform simulated registration, login, and transaction channel detection on the illegal website, extracting relevant information of a transaction order returned by the illegal website through text analysis mining and/or image recognition analysis, and storing the relevant information in a database as a basis for judging whether the transaction behavior is legal or illegal.
2. The active detection method of illegal transactions according to claim 1, characterized in that: after simulated registration on the illegal website, the corresponding website URL and the registered user information are stored in a database for backup.
3. The active detection method of illegal transactions according to claim 1, characterized in that: the website information of the illegal websites screened out by manually inputting URLs and keywords of websites or through search engine query results comprises at least one of the following: website name, website URL, website validity, website text information, website snapshot picture URL, website marked template code number, website creation time, and website update time.
4. The active detection method of illegal transactions according to claim 1, characterized in that: the similarity is the Hamming distance between the SimHash value of the webpage source code of the illegal website and the SimHash value of the pre-marked template.
5. The active illegal transaction detection method according to claim 1, wherein the information related to the transaction order comprises at least one of the following: order number, order screenshot, bank of transaction, transaction time, website URL, transaction amount, payee account information.
6. The active illegal transaction detection method according to claim 1, wherein distributed task distribution processing is supported, a browser Docker cluster is adopted, Selenium Grid is used to realize page rendering and simulated operations, a Selenium Hub is called uniformly to distribute tasks to at least one Node agent node registered on the Selenium Hub, a plurality of Node agent nodes request an illegal website to complete simulated registration, login, and transaction actions, and a transaction order returned by the illegal website is received.
7. The active detection method of illegal transactions according to claim 1, characterized in that: when simulated registration, login, and transaction channel detection are performed on the illegal website, monitoring log information is generated and stored in a database.
8. An active illegal transaction detection system comprising:
a database storing a website basic data table, an order monitoring result table, and a simulated registration login information table; wherein,
the website basic data table is used for storing website information of illegal websites screened out by manually inputting URLs and keywords of websites or through search engine query results;
the order monitoring result table is used for storing the related information of the transaction order obtained by actively detecting the illegal website;
the simulated registration login information table is used for storing the corresponding website URLs and registered user information from simulated registration and login on illegal websites;
the illegal trading platform positioning module is used for screening out illegal websites by manually inputting URLs and keywords of websites or through search engine query results;
the template marking module is used for calculating the similarity between the text information of the illegal website and the pre-marked templates, wherein the pre-marked templates are different preset models generated by classifying historically recorded illegal websites, and each template is marked with a template code; if the calculated similarity is larger than a preset threshold value, classifying the illegal website, marking a new template code number, and storing the template code number as a newly added template into the website basic data table;
the distributed task distribution module is used for performing simulated registration, login, and transaction channel detection on the illegal website whose similarity calculated in the template marking module is less than or equal to the preset threshold value, by using active detection software developed based on a pre-marked template, and receiving a transaction order returned by the illegal website;
the text analysis module is used for performing text analysis mining on the text information in the transaction order received by the distributed task distribution module and storing the text information into the order monitoring result table;
the image recognition analysis module is used for carrying out image recognition analysis on the image information in the transaction order received by the distributed task distribution module and storing the image information in the order monitoring result table.
9. The active illegal transaction detection system of claim 8 wherein said template marking module comprises:
an extraction unit for extracting pattern fingerprints in an illegal website;
a calculating unit for calculating the similarity between the pattern fingerprint in the illegal website extracted by the extraction unit and the pre-marked template;
and a determining unit for determining, when the similarity calculated by the calculating unit is less than or equal to a preset threshold, that active detection software developed based on a pre-marked template can be used to perform simulated registration, login, and transaction channel detection on the illegal website.
10. The active illegal transaction detection system of claim 8 wherein: the database also comprises a monitoring log information table which is used for storing monitoring log information generated when simulated registration, login, and transaction channel detection are performed on the illegal website.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010776643.0A CN112199573B (en) | 2020-08-05 | 2020-08-05 | Illegal transaction active detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112199573A (en) | 2021-01-08 |
CN112199573B (en) | 2023-12-08 |
Family ID=74006145
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010776643.0A Active CN112199573B (en) | 2020-08-05 | 2020-08-05 | Illegal transaction active detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112199573B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966263A (en) * | 2021-02-25 | 2021-06-15 | 中国银联股份有限公司 | Target information acquisition method and device and computer readable storage medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996041488A1 (en) * | 1995-06-07 | 1996-12-19 | The Dice Company | Fraud detection system for electronic networks using geographical location coordinates |
CN101383820A (en) * | 2008-07-07 | 2009-03-11 | 上海安融信息系统有限公司 | Design and implementing method for SSL connection and data monitoring |
KR20090090641A (en) * | 2008-02-21 | 2009-08-26 | 주식회사 조은시큐리티 | System for active security surveillance |
CN103685575A (en) * | 2014-01-06 | 2014-03-26 | 洪高颖 | Website security monitoring method based on cloud architecture |
CN106302438A (en) * | 2016-08-11 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means |
CN107733969A (en) * | 2017-07-25 | 2018-02-23 | 上海壹账通金融科技有限公司 | Website simulation login method, device, service end and readable storage medium storing program for executing |
CN107786537A (en) * | 2017-09-19 | 2018-03-09 | 杭州安恒信息技术有限公司 | A kind of lonely page implantation attack detection method based on internet intersection search |
US10108968B1 (en) * | 2014-03-05 | 2018-10-23 | Plentyoffish Media Ulc | Apparatus, method and article to facilitate automatic detection and removal of fraudulent advertising accounts in a network environment |
CN110020075A (en) * | 2017-10-20 | 2019-07-16 | 南京烽火软件科技有限公司 | Device is excavated in illegal website automatically |
CN110119469A (en) * | 2019-05-22 | 2019-08-13 | 北京计算机技术及应用研究所 | A kind of data collection and transmission and method towards darknet |
CN110413908A (en) * | 2018-04-26 | 2019-11-05 | 维布络有限公司 | The method and apparatus classified based on web site contents to uniform resource locator |
Non-Patent Citations (2)
Title |
---|
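Fan Yourong; Yang Tao; Wang Yongjian; Jiang Guoqing: "Illegal website identification method based on URL feature detection", Computer Engineering, no. 3, pages 176 - 182 * |
Wei Yuliang: "Design and implementation of a counterfeit website detection system based on active detection", China Master's Theses Full-text Database, Information Science and Technology, no. 2, pages 138 - 1244 * |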
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||