CN109995834A - Massive data flow processing method, apparatus, computing device and storage medium - Google Patents

Publication number
CN109995834A
Authority
CN
China
Prior art keywords
user
resource
phone number
access
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711498056.4A
Other languages
Chinese (zh)
Inventor
刘晓斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Guizhou Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Guizhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Guizhou Co Ltd
Priority to CN201711498056.4A
Publication of CN109995834A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/53 Network services using third party service providers

Abstract

An embodiment of the invention discloses a massive data flow processing method, apparatus, computing device and storage medium. The method includes: based on users' historical access logs, counting, for each IP, the number of mobile phone numbers that accessed the IP in each sub-period of a predetermined historical period; calculating, for each IP, the variance of the phone number counts across the sub-periods, and treating each IP whose variance exceeds a first predetermined threshold as an IP to be observed; and identifying cheating users based on the correlation coefficient between phone number and access time under each IP to be observed and/or the login success rate. With this scheme, cheating users can be found in real time, the processing efficiency of user orders is improved, and user access speed is increased.

Description

Massive data flow processing method, apparatus, computing device and storage medium
Technical field
The present invention relates to the technical field of Internet service support, and in particular to a massive data flow processing method, apparatus, computing device and storage medium.
Background
Current Internet marketing campaigns, whether group buying, flash sales, lucky draws or other promotional activities, share one trait: resources are limited while demand is unlimited, so they all face highly concurrent, high-volume user access requests. Such activities are usually short-lived, and user traffic and concurrency can reach thousands of times the usual level; no production system can keep that many servers and network resources in reserve for a single short-term activity. This is a huge challenge for the network, the middleware and the database.
Existing schemes for handling high concurrency and heavy traffic include the following. In the immediate processing mode, all user requests go straight to the web server, and flow control and concurrency control are left to the web middleware. This mode is prone to network congestion, momentary spikes in system resource usage, system congestion and system hangs. In the asynchronous processing mode, the system is only responsible for collecting user requests; once the requests have been collected, a separate queue process handles the ordering of the commodity resources. This mode reduces peak resource consumption to some extent and lowers the probability of slow page responses and errors. However, it is inefficient, users wait too long, the process can leave users with the bad impression of a black-box operation, and network congestion is still not resolved.
Neither of the two schemes can handle cheating requests.
Summary of the invention
Because the prior art cannot handle cheating requests when processing massive data flows, embodiments of the invention provide a massive data flow processing method, apparatus, computing device and storage medium that can find cheating users in real time, improve the processing efficiency of user orders and increase user access speed.
In a first aspect, an embodiment of the invention provides a massive data flow processing method, the method including:
based on users' historical access logs, counting, for each IP, the number of phone numbers that accessed the IP in each sub-period of a predetermined historical period;
calculating, for each IP, the variance of the phone number counts across the sub-periods, and treating each IP whose variance exceeds a first predetermined threshold as an IP to be observed;
identifying cheating users based on the correlation coefficient between phone number and access time under each IP to be observed and/or the login success rate.
In a second aspect, an embodiment of the invention provides a massive data flow processing apparatus, the apparatus including a statistics module, a computing module and an identification module.
The statistics module can, based on users' historical access logs, count for each IP the number of phone numbers that accessed the IP in each sub-period of a predetermined historical period.
The computing module can calculate, for each IP, the variance of the phone number counts across the sub-periods, and treat each IP whose variance exceeds a first predetermined threshold as an IP to be observed.
The identification module can identify cheating users based on the correlation coefficient between phone number and access time under each IP to be observed and/or the login success rate.
In a third aspect, an embodiment of the invention provides a computing device including at least one processor, at least one memory, and computer program instructions stored in the memory; when the computer program instructions are executed by the processor, the method of the first aspect is implemented.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the method of the first aspect is implemented.
The massive data flow processing method, apparatus, computing device and storage medium provided by embodiments of the invention can find cheating users in real time through algorithmic measurement, improve the processing efficiency of user orders through a distributed message queue processing mechanism, and increase user access speed by deploying dynamic and static resources separately.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the embodiments are briefly introduced below. A person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 shows a schematic flow chart of a massive data flow processing method according to an embodiment of the invention;
Fig. 2 shows a schematic flow chart of cheating user identification according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of distributed cache queue allocation according to an embodiment of the invention;
Fig. 4 shows a schematic block diagram of a massive data flow processing apparatus according to an embodiment of the invention;
Fig. 5 shows a schematic diagram of the hardware structure of a computing device provided by an embodiment of the invention.
Detailed description
The features and exemplary embodiments of various aspects of the invention are described in detail below. To make the purpose, technical solutions and advantages of the invention clearer, the invention is described in further detail with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it. To a person skilled in the art, the invention can be practised without some of these specific details. The following description of the embodiments is given only to provide a better understanding of the invention by showing examples of it.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The existing handling of highly concurrent, high-volume user access cannot identify cheating requests, which both occupy system resources and undermine fairness. Mature e-commerce solutions exist, but their system architectures are very large and their core models and processing mechanisms are not public. The invention provides a massive data flow processing scheme. Fig. 1 shows a schematic flow chart of a massive data flow processing method according to an embodiment of the invention.
As shown in Fig. 1, in step S100, based on users' historical access logs, the number of phone numbers that accessed each IP in each sub-period of a predetermined historical period can be counted for each IP.
In the various activities of telecom operators, those who take part maliciously are usually agents. The user's phone number and IP address can be taken as key indicators, and an algorithm can be used to identify whether an agent is maliciously taking part in a flash sale.
For example, the number of phone numbers accessing each IP can be counted, and the variance formula applied to the number of phone numbers logged in from the same IP over a period of history. If the resulting variance is greater than a predetermined value, the IP is put on the observation list. The length of the predetermined historical period can be adjusted according to the number of phone numbers accessing the IP: when few phone numbers access the IP the statistical period can be lengthened, and when many do it can be shortened. The statistical period can be divided into multiple sub-periods, and the number of phone numbers accessing the IP counted separately for each sub-period.
In step S200, the variance of the phone number counts across the sub-periods can be calculated for each IP, and each IP whose variance exceeds the first predetermined threshold is treated as an IP to be observed.
Variance reflects the dispersion of the data: the larger the variance, the less stable the number of phone numbers accessing the IP. The mean number of phone numbers accessing each IP is calculated first:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where n is the number of data points in the group and x_1, x_2, x_3, ..., x_n are the phone number counts. The variance (written here in the population form, dividing by n) is then:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

The variance of the number of accessing phone numbers is calculated for each IP, and each IP whose variance is greater than the predetermined threshold is treated as an IP to be observed.
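Steps S100 and S200 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout `(ip, phone, period_index)`, the function name `observation_list` and the threshold value are assumptions, and the population variance (`statistics.pvariance`) is assumed because the text does not say which form is used.

```python
from collections import defaultdict
from statistics import pvariance


def observation_list(log, num_periods, threshold):
    """Return {ip: variance} for IPs whose per-sub-period phone counts
    have a variance above `threshold`.

    log: iterable of (ip, phone, period_index) records (hypothetical layout).
    """
    # One set of distinct phone numbers per IP and per sub-period.
    phones = defaultdict(lambda: [set() for _ in range(num_periods)])
    for ip, phone, period in log:
        phones[ip][period].add(phone)

    flagged = {}
    for ip, per_period in phones.items():
        counts = [len(s) for s in per_period]
        var = pvariance(counts)  # population variance, divides by n
        if var > threshold:
            flagged[ip] = var
    return flagged


# Made-up sample: one IP is steady, the other spikes in the last sub-period.
sample_log = [
    ("10.0.0.1", "13800000001", 0),
    ("10.0.0.1", "13800000001", 1),
    ("10.0.0.1", "13800000001", 2),
    ("10.0.0.2", "13800000002", 0),
] + [("10.0.0.2", f"139{n:08d}", 2) for n in range(9)]

flagged = observation_list(sample_log, num_periods=3, threshold=4.0)
```

With this sample the steady IP has counts 1, 1, 1 and is not flagged, while the other has counts 1, 0, 9 and lands on the observation list.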
In step S300, cheating users can be identified based on the correlation coefficient between phone number and access time under each IP to be observed and/or the login success rate.
According to an embodiment of the invention, the Pearson correlation coefficient between the phone numbers accessing an IP to be observed in the predetermined historical period and their access times can be calculated and used as the correlation coefficient between phone number and access time under that IP. The Pearson correlation coefficient is not the only correlation coefficient, but it is the most common one.
The Pearson correlation coefficient measures the linear correlation between two variables X and Y, and its value lies between -1 and 1. It assumes that the relationship between the two variables is linear and that both are continuous data; that the two variables are normally distributed overall, or follow a unimodal distribution close to normal; and that the observations of the two variables come in pairs, with the pairs independent of one another. For example, phone number and time occur in pairs: each phone number corresponds to the access time of that phone number.
The correlation between phone number and time, (phone, time), under the same IP can be computed and denoted X. The Pearson correlation coefficient amounts to centering the data (subtracting the corresponding mean from each value) and then computing the cosine similarity, so the correlation coefficient can also be regarded as the cosine of the angle between the two random-variable vectors. The Pearson correlation coefficient can be calculated by the following formula:

$$\mathrm{Corr}(x, y) = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$$

where Corr(x, y) is the correlation coefficient, x_i is a phone number that accessed the IP at some time, y_i is the time at which that phone number accessed the IP, and \bar{x} and \bar{y} are the sample means of x_i and y_i respectively. In general, after taking the absolute value, a correlation coefficient of 0-0.09 indicates no correlation, 0.1-0.3 weak correlation, 0.3-0.5 moderate correlation and 0.5-1.0 strong correlation. Users of an IP to be observed whose correlation coefficient is greater than a second threshold can be regarded as cheating users. The second threshold can be set according to circumstances.
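The correlation check can be sketched in a few lines. This is an illustration under stated assumptions: phone numbers are treated as integers and access times as seconds since the start of the period, the names `pearson` and `is_suspect` are hypothetical, and the 0.5 default simply reuses the "strong correlation" boundary quoted above as a stand-in for the second threshold.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return numerator / denominator


def is_suspect(phones, times, second_threshold=0.5):
    """Flag an observed IP when |Corr(phone, time)| exceeds the threshold."""
    return abs(pearson(phones, times)) > second_threshold


# Made-up sample: consecutive phone numbers hitting the IP every 2 seconds.
scripted_phones = [13800000001, 13800000002, 13800000003, 13800000004]
scripted_times = [0, 2, 4, 6]
```

Consecutive numbers arriving at a fixed interval give a coefficient close to 1, which is the scripted pattern the passage is looking for.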
According to an embodiment of the invention, the method may further include: clustering, with the k-means clustering algorithm, the success rates of IPs with a large number of logins in the predetermined historical period, and determining a third threshold.
Cluster analysis, also called clustering, is a statistical analysis method for studying the classification of samples or indicators, and is also an important algorithm in data mining. It is based on similarity: patterns within one cluster are more similar to each other than to patterns in other clusters. The k-means algorithm generally proceeds as follows:
First, k objects are arbitrarily selected from the n data objects as initial cluster centres. Each remaining object is assigned to the cluster it is most similar to, according to its similarity (distance) to the cluster centres. The centre of each resulting cluster is then recalculated, and the process is repeated until the criterion function converges. Each cluster should be as compact as possible, and the clusters as far apart from one another as possible.
For example, the login success rate of the IPs on the observation list can be computed and superimposed on the correlation coefficient above, with the login success rate (denoted Y) forming a two-dimensional decision value (X, Y). The k-means clustering algorithm is applied to the login success rate: clustering the success rates of IPs with many logins in a given period shows that the vast majority of login success rates are above 90%. A third threshold can be set accordingly, and (X, Y) is judged against the thresholds to decide whether a user is a cheating user.
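The k-means loop described above can be sketched for the one-dimensional case of login success rates. Several details are assumptions made only for illustration: the initial centres are spread evenly over the value range instead of being picked arbitrarily (so the run is repeatable), k is 2, and the third threshold is taken as the midpoint between the two final centres. The text does not state how the threshold is derived from the clusters.

```python
def kmeans_1d(values, k=2, max_iters=100):
    """Plain k-means on scalar values; returns the sorted cluster centres."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    lo, hi = min(values), max(values)
    # Assumption: deterministic, evenly spaced initial centres.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Recompute each centre as the mean of its cluster.
        updated = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
        if updated == centres:  # converged: assignments no longer change
            break
        centres = updated
    return sorted(centres)


# Made-up login success rates of IPs with many logins.
rates = [0.35, 0.40, 0.45, 0.92, 0.94, 0.96, 0.98]
low_centre, high_centre = kmeans_1d(rates)
third_threshold = (low_centre + high_centre) / 2  # assumed derivation
```

Any rule that separates the high-success cluster from the rest would serve the same purpose; the midpoint is just the simplest choice.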
According to an embodiment of the invention, the method may further include: putting cheating users on a blacklist, and, when a user requests access, judging whether the user is on the blacklist.
The cheating user may be the phone number of a user identified by the algorithms above, or the IP corresponding to that phone number. Fig. 2 shows a schematic flow chart of cheating user identification according to an embodiment of the invention. As shown in Fig. 2, a stream processing engine performs cheating identification on the user historical access data stream and writes cheating users to the blacklist. When a user accesses the system, it is judged whether the user is on the blacklist: if so, the user's request is rejected; if not, the access request is allowed through.
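The blacklist check in Fig. 2 reduces to a set lookup on the request path. The sketch below is an assumption-laden illustration: the class name `BlacklistGate` and its methods are hypothetical, and an in-memory set stands in for whatever shared store a real deployment would use.

```python
class BlacklistGate:
    """Rejects requests whose phone number or IP has been blacklisted."""

    def __init__(self):
        self._phones = set()
        self._ips = set()

    def block(self, phone=None, ip=None):
        # Called by the identification step when a cheating user is found.
        if phone is not None:
            self._phones.add(phone)
        if ip is not None:
            self._ips.add(ip)

    def allow(self, phone, ip):
        # Called on every incoming request before any further processing.
        return phone not in self._phones and ip not in self._ips


gate = BlacklistGate()
gate.block(phone="13800000001")
gate.block(ip="10.0.0.9")
```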
The above scheme uses several algorithms to measure whether a user is cheating. By collecting data over a period of time and combining it with a stream processing engine, cheating users can be found in real time, without waiting for after-the-fact analysis.
For flash-sale page access, user traffic in Internet marketing has two main parts: users browsing the activity page and users placing activity orders, in a ratio of about 3:1. User access traffic can be tackled from both the dynamic-resource and the static-resource side by deploying dynamic and static resources separately.
According to an embodiment of the invention, the method may further include:
Based on a preset rule library, dividing the deployment package uploaded by the developer into dynamic resources and static resources. When deploying, the developer can upload the entire deployment package; the program automatically decompresses it, determines according to the predetermined rules which files are static resources and which are dynamic resources, and generates a mark table.
The preset rule library can define the static file rules (file suffixes) and the processing to be applied, such as cache expiry dates, compression rules and file header business rules. Modules that interact with back-end services can be treated as dynamic resources; the rest, such as the activity description page, intermediate activity pages and activity end content, are treated as static resources.
The static and dynamic resources can be separated at the resource deployment interface and repackaged to form new deployment resources. During separation, the program separates the static and dynamic files according to the mark table and repackages them into new deployment packages.
Based on the new deployment resources, the static resources can be processed according to the predetermined rule library and compressed into a new deployment package. Static resources may be processed further: according to the rule library, the file header can be given a validity period, compression rules, static-resource business rules and so on, and the result is compressed again to form a new deployment package.
According to the type of deployment package, the dynamic resources and static resources can be deployed to dynamic servers and static servers respectively.
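The suffix-based split and the mark table can be sketched as follows. The suffix set, the function name `split_package` and the sample file names are assumptions for illustration; the patent only says that the rule library defines static files by suffix.

```python
from pathlib import PurePosixPath

# Hypothetical rule library entry: suffixes that count as static resources.
STATIC_SUFFIXES = {".html", ".css", ".js", ".png", ".jpg", ".gif"}


def split_package(paths, static_suffixes=STATIC_SUFFIXES):
    """Return (mark_table, static_files, dynamic_files) for a file listing."""
    mark_table = {}
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        mark_table[path] = "static" if suffix in static_suffixes else "dynamic"
    static_files = [p for p, kind in mark_table.items() if kind == "static"]
    dynamic_files = [p for p, kind in mark_table.items() if kind == "dynamic"]
    return mark_table, static_files, dynamic_files


package = ["index.html", "css/site.css", "api/order.jsp", "img/banner.PNG"]
mark_table, static_files, dynamic_files = split_package(package)
```

The two lists would then be repackaged and sent to the static and dynamic servers respectively.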
According to an embodiment of the invention, the static resources can be deployed through CDN servers, where the domain name service system of the CDN assigns a server IP address to the user according to the user's ISP (Internet Service Provider) and geographic location.
A CDN server is a content delivery network server. Static content on the origin server can be synchronized periodically to the CDN servers, and the CDN server IP address that is optimal for the user is resolved, so that the user reaches the nearest static server site through the multi-point high-speed content delivery network. Together with techniques such as protocol optimization and data compression, this enables high-speed transfer of static resources. Most user traffic is handled within the CDN network and user requests need not all reach the back-end servers, which reduces the pressure on network bandwidth and servers. Distributing the vast majority of traffic to CDN servers on the Internet in this way can reduce origin server traffic by 90%.
After the dynamic/static separation and traffic filtering above, user access traffic is still dozens of times the usual level or more, whereas the QPS (queries per second, the number of requests processed per second) of a single web service can reach about 1,000.
To further improve the processing speed and efficiency of the system, resource data, session data, user data, page caches, system configuration information, rule validation data and the like are cached in a distributed manner through redis (a key-value database).
According to an embodiment of the invention, the method may further include: using a consistent hashing algorithm to allocate user access requests to distributed cache queues.
Consistent hashing generally works as follows. First, the hash value of each server (node) is computed and placed on a circle spanning 0 to 2^32. The hash value of the key of the data to be stored is then computed in the same way and mapped onto the same circle. Searching clockwise from the position the data maps to, the data is stored on the first server found. If no server is found before passing 2^32, the data is stored on the first server. When a cache is removed or added, consistent hashing changes the existing key mapping as little as possible and so satisfies the monotonicity requirement as far as possible.
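The ring lookup described above can be sketched as follows. This is a minimal illustration without virtual nodes; the hash function (the first four bytes of MD5), the class name `HashRing` and the node names are assumptions, since the text does not name a hash function.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string onto the ring [0, 2**32)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Sorted (position, node) pairs form the circle.
        self._points = sorted((ring_hash(node), node) for node in nodes)

    def lookup(self, key):
        """Return the first node clockwise from the key's position."""
        position = ring_hash(key)
        index = bisect.bisect_left(self._points, (position, ""))
        if index == len(self._points):
            index = 0  # passed the end of the ring: wrap to the first node
        return self._points[index][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"138{n:08d}" for n in range(200)]
before = {key: ring.lookup(key) for key in keys}
smaller_ring = HashRing(["cache-a", "cache-b"])
```

Removing `cache-c` only remaps the keys that lived on it; keys on the other two nodes keep their assignment, which is the monotonicity property mentioned above.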
According to an embodiment of the invention, a hash value can be obtained from the user's phone number using the consistent hashing algorithm, and the compute node corresponding to the phone number can be obtained from the hash value and the number of compute nodes of the server.
When user requests come in from clients, they first enter a cache pool, and the allocation-queue thread takes data from the cache pool in order and assigns it to queues. A hash value can be computed from the phone number with the consistent hashing algorithm, and then, according to the number of compute nodes of the server, the remainder rem (hash (phonenamber) %nodes (service)) gives the cache node, and the requesting number is put into the corresponding cache queue.
The user's phone number is put into the cache queue corresponding to the compute node, where the number of cache queues is the same as the number of compute nodes.
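The remainder-based assignment can be sketched as follows. The names `node_index` and `distribute` are hypothetical, and SHA-1 via `hashlib` is an assumed stand-in for the unspecified hash; it is used because Python's built-in `hash()` is salted per process for strings and would not give a stable mapping.

```python
import hashlib
from collections import deque


def node_index(phone, node_count):
    """Stable hash of the phone number, reduced modulo the node count."""
    digest = hashlib.sha1(phone.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(cache_pool, node_count):
    """Drain the cache pool in arrival order into one queue per compute node."""
    queues = [deque() for _ in range(node_count)]
    while cache_pool:
        phone = cache_pool.popleft()
        queues[node_index(phone, node_count)].append(phone)
    return queues


pool = deque(f"138{n:08d}" for n in range(50))
expected = {phone: node_index(phone, 4) for phone in pool}
queues = distribute(pool, node_count=4)
```

Because the mapping depends only on the phone number, repeated requests from the same number always land in the same queue and are handled by the same compute node.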
This processing mode is simpler, with the number of compute nodes equal to the number of cache queues. Fig. 3 shows a schematic diagram of distributed cache queue allocation according to an embodiment of the invention.
As shown in Fig. 3, each cache queue is kept in three copies, one master queue and two slave queues, and each compute node is likewise kept in three copies, one master node and two slave nodes. Because the compute nodes on each cache queue are fixed, these business compute nodes are responsible only for the data on that cache queue, which greatly simplifies the system design and reduces the amount of code. With two backups for every cache queue and compute node, the high availability of the pipeline is ensured. When a service node finishes processing, it marks the data and the next compute node then processes it; once all marks are complete, the data is removed from the queue.
According to an embodiment of the invention, the method may further include: monitoring the data backlog in each cache queue, and performing a health check on the cache queue and compute nodes when the backlog in a cache queue reaches a predetermined amount.
Queue monitoring can be divided into a queue monitoring subroutine, a compute node monitoring subroutine and a queue data monitoring subroutine. When a large queue data backlog is detected, the queue monitoring subroutine and the compute node monitoring subroutine are started to check the health of the queue and the compute nodes, and problematic processes are replaced and restarted, ensuring high availability of the application.
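The backlog-triggered health check can be sketched as a single pass over the queue sizes. Everything named here is hypothetical: `check_backlogs`, the `is_healthy` and `restart` callables, and the sample queue names stand in for the monitoring subroutines, whose interfaces the text does not describe.

```python
def check_backlogs(backlogs, limit, is_healthy, restart):
    """Health-check queues whose backlog reached `limit`; restart unhealthy ones.

    backlogs:   {queue_name: number of waiting items}
    is_healthy: callable(queue_name) -> bool (hypothetical health probe)
    restart:    callable(queue_name) -> None (hypothetical replace-and-restart)
    Returns the list of queue names that were restarted.
    """
    restarted = []
    for name, size in backlogs.items():
        if size < limit:
            continue  # backlog is below the predetermined amount
        if not is_healthy(name):
            restart(name)
            restarted.append(name)
    return restarted


# Made-up state: queue-2 is backed up and unhealthy, queue-3 is backed up but fine.
sizes = {"queue-1": 10, "queue-2": 5000, "queue-3": 7000}
unhealthy = {"queue-2"}
restart_log = []
restarted = check_backlogs(
    sizes,
    limit=1000,
    is_healthy=lambda name: name not in unhealthy,
    restart=restart_log.append,
)
```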
The distributed message queue processing scheme can provide a distributed solution for multi-threaded processing and message queues across multiple locations. For example, in a multi-threaded queuing mechanism all user requests can be split into multiple queues and processed in parallel. With the multi-threaded concurrent queue mechanism, multiple services are deployed and the same service runs multiple thread queues, so that user requests are processed with high concurrency in first-in, first-out order.
In addition, in concurrent access control, multi-threaded parallel processing increases speed but brings the problem of contention for resources. A reliable approach is to control resources with a pessimistic lock: a lock is taken on every modification, and when several requests modify a resource concurrently they must wait for the resource lock to be released. This is very safe and cannot lead to overselling of commodities, but it is very inefficient. This proposal uses optimistic locking instead: all requests for the same data are allowed to attempt the modification, but each obtains the version number of the data, and only a request whose version number matches can update successfully; the others must re-read the version and try again. This both keeps the resource safe and improves the efficiency of data updates.
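The version-number check can be sketched with an in-memory record. This is an illustration only: the class `Stock`, the function `order_one` and the retry limit are assumptions, and the small internal lock merely stands in for the atomicity that a database would give a single conditional update.

```python
import threading


class Stock:
    """A commodity record guarded by a version number (optimistic locking)."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.version = 0
        # Stand-in for the atomicity of one conditional UPDATE statement.
        self._atomic = threading.Lock()

    def read(self):
        with self._atomic:
            return self.quantity, self.version

    def compare_and_set(self, expected_version, new_quantity):
        """Apply the update only if nobody changed the record in between."""
        with self._atomic:
            if self.version != expected_version:
                return False  # stale version: caller must re-read and retry
            self.quantity = new_quantity
            self.version += 1
            return True


def order_one(stock, max_retries=10):
    """Try to take one unit; never drives the quantity below zero."""
    for _ in range(max_retries):
        quantity, version = stock.read()
        if quantity <= 0:
            return False  # sold out
        if stock.compare_and_set(version, quantity - 1):
            return True
    return False


stock = Stock(quantity=2)
results = [order_one(stock) for _ in range(3)]
```

No request holds a lock while it decides what to write; a conflicting writer is detected only at update time, which is what keeps the common, uncontended path cheap.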
Fig. 4 shows the schematic block diagram of massive dataflow processing unit according to an embodiment of the invention.Such as Fig. 4 Shown, which may include: statistical module 410, computing module 420 and identification module 430.
Statistical module 420 can be based on user's history access log, for each IP, count in the order history period respectively Each sub- period accesses the cell-phone number quantity of the IP.
Computing module 420 can calculate the variance for each sub- period accessing each IP mobile telephone number amount, and variance is greater than first The conduct of predetermined threshold IP to be seen.
Identification module 430 can based on cell-phone number and the related coefficient of access time under each IP to be seen and/or Log in success rate, identification cheating user.
An embodiment according to the present invention, the device 400 can also include: to be included in module and judgment module.
Cheating user can be piped off by being included in module.
Whether judgment module can judge user in blacklist when user requests access to.
An embodiment according to the present invention, the device 400 can also include: computing module.
Computing module, which can calculate, accesses the cell-phone number of IP to be seen and its between access time in the order history period Pearson correlation coefficient related coefficient is greater than second as the related coefficient of cell-phone number and access time under IP to be seen The IP user to be seen of threshold value is considered as cheating user.
An embodiment according to the present invention, the device 400 can also include: cluster module.
Cluster module can by k- means clustering algorithm, in the order history period the biggish IP of login times at Power is clustered, and determines third threshold value, will be logged in be seen IP user of the success rate greater than third threshold value and is considered as cheating user.
An embodiment according to the present invention, the device 400 can also include: distribution module.
Distribution module can be used consistency hash algorithm and carry out Distributed cache queries distribution to user access request.
An embodiment according to the present invention, distribution module may include: acquiring unit, obtain unit and be put into unit.
Acquiring unit can be used consistency hash algorithm and obtain cryptographic Hash based on the phone number of user.
Obtain unit can the calculate node number based on cryptographic Hash and server, obtain the calculating corresponding to phone number Node.
The phone number of user can be put into buffer queue corresponding with calculate node by being put into unit, wherein buffer queue Number it is identical as calculate node number.
An embodiment according to the present invention, distribution module can also include: monitoring unit and inspection unit.
Monitoring unit can monitor the queuing data backlog in each buffer queue.
Inspection unit can be when the queuing data backlog in buffer queue reaches predetermined quantity, to buffer queue and meter Operator node carries out Health Check.
An embodiment according to the present invention, the device 400 can also include: discriminating module, separation packetization module, processing module And deployment module.
Discriminating module can be based on preset rules library, and the deployment package that developer uploads is divided into dynamic resource and static state Resource.
Resource deployment interface can be separated to static resource and dynamic resource by separating packetization module, repacked, formed new Deployment resource.
Processing module can be processed static resource, be compressed into new portion based on new deployment resource according to pre-defined rule library Administration's packet.
Dynamic resource and static resource can be deployed to dynamic state server according to the type of deployment package by deployment module respectively And static server.
The above scheme uses multiple algorithms to measure whether a user is a cheating user. By collecting data over a period of time and combining it with a stream processing engine, cheating users can be identified in real time rather than through after-the-fact analysis. The parallel processing mechanism of the distributed message queue greatly improves the efficiency of processing user orders and improves system availability.
Automated, separate deployment of dynamic and static resources reduces programmers' development and deployment workload and speeds up business updates. Using a CDN greatly improves users' access speed, greatly reduces server load, and reduces network bandwidth and system resource consumption.
In addition, the massive data stream processing method of the embodiment of the present invention described with reference to Fig. 1 may be implemented by a computing device. Fig. 5 shows a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present invention.
The computing device may include a processor 501 and a memory 502 storing computer program instructions.
Specifically, the processor 501 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present invention.
The memory 502 may include mass storage for data or instructions. By way of example and not limitation, the memory 502 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 502 may include removable or non-removable (fixed) media. Where appropriate, the memory 502 may be internal or external to the data processing apparatus. In particular embodiments, the memory 502 is non-volatile solid-state memory. In particular embodiments, the memory 502 includes read-only memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement any one of the massive data stream processing methods in the above embodiments.
In one example, the computing device may further include a communication interface 503 and a bus 510. As shown in Fig. 5, the processor 501, the memory 502 and the communication interface 503 are connected through the bus 510 and communicate with one another over it.
The communication interface 503 is mainly used to implement communication among the modules, apparatuses, units and/or devices in the embodiments of the present invention.
The bus 510 includes hardware, software, or both, and couples the components of the computing device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Where appropriate, the bus 510 may include one or more buses. Although the embodiments of the present invention describe and illustrate a particular bus, the present invention contemplates any suitable bus or interconnect.
In addition, in combination with the massive data stream processing method in the above embodiments, an embodiment of the present invention may provide a computer-readable storage medium for its implementation. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, the computer program instructions implement any one of the massive data stream processing methods in the above embodiments.
In summary, the scheme relieves server load through a traffic filtering scheme based on the separation of dynamic and static resources. A distributed stream processing engine distributes the user access request data stream into multi-threaded queues, which are processed concurrently, improving the efficiency of handling user requests. A cheating recognition mechanism detects users' cheating behavior so that such users can be removed from the queues. Cheating users can be found in real time, the efficiency of processing user orders is improved, and users' access speed is increased.
It should be clear that the present invention is not limited to the specific configurations and processes described above and shown in the figures. For brevity, detailed descriptions of known methods are omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method of the present invention is not limited to the specific steps described and shown; those skilled in the art, after understanding the spirit of the present invention, may make various changes, modifications and additions, or change the order of the steps.
The functional blocks shown in the structural block diagrams described above may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, they may be, for example, electronic circuits, application-specific integrated circuits (ASICs), suitable firmware, plug-ins, function cards, and so on. When implemented in software, the elements of the present invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium, or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium capable of storing or transmitting information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical discs, hard disks, fiber-optic media, radio-frequency (RF) links, and so on. The code segments may be downloaded via a computer network such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in the present invention describe certain methods or systems on the basis of a series of steps or devices. However, the present invention is not limited to the order of the steps above; that is, the steps may be performed in the order mentioned in the embodiments, in a different order, or several steps may be performed simultaneously.
The above is merely a specific embodiment of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. It should be understood that the protection scope of the present invention is not limited thereto; any equivalent modification or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A massive data stream processing method, characterized in that the method comprises:
based on user historical access logs, counting, for each IP, the number of mobile phone numbers accessing the IP in each sub-period of a predetermined historical period;
calculating the variance, across the sub-periods, of the number of mobile phone numbers accessing each IP, and taking each IP whose variance is greater than a first predetermined threshold as an IP to be observed;
identifying cheating users based on the correlation coefficient between mobile phone numbers and access times under each IP to be observed and/or the login success rate.
2. The method according to claim 1, characterized in that the method further comprises:
adding the cheating users to a blacklist;
when a user requests access, judging whether the user is in the blacklist.
3. The method according to claim 1, characterized in that the method further comprises:
calculating the Pearson correlation coefficient between the mobile phone numbers accessing the IP to be observed and their access times within the predetermined historical period, as the correlation coefficient between mobile phone numbers and access times under the IP to be observed,
wherein users of an IP to be observed whose correlation coefficient is greater than a second threshold are regarded as cheating users.
4. The method according to claim 1, characterized in that the method further comprises:
clustering, by a k-means clustering algorithm, the success rates of IPs with a large number of logins within the predetermined historical period, to determine a third threshold,
wherein users of an IP to be observed whose login success rate is greater than the third threshold are regarded as cheating users.
5. The method according to claim 1, characterized in that the method further comprises:
using a consistent hashing algorithm to distribute user access requests across distributed cache queues.
6. The method according to claim 5, characterized in that using the consistent hashing algorithm to distribute user access requests across distributed cache queues comprises:
using the consistent hashing algorithm to obtain a hash value from the user's mobile phone number;
determining the compute node corresponding to the mobile phone number based on the hash value and the number of compute nodes of the server;
putting the user's mobile phone number into the cache queue corresponding to the compute node, wherein the number of cache queues is the same as the number of compute nodes.
7. The method according to claim 6, characterized in that, after using the consistent hashing algorithm to distribute user access requests across distributed cache queues, the method further comprises:
monitoring the data backlog in each cache queue;
when the data backlog in a cache queue reaches a predetermined quantity, performing a health check on the cache queue and the compute node.
8. The method according to claim 1, characterized in that the method further comprises:
based on a preset rule base, dividing the deployment package uploaded by a developer into dynamic resources and static resources;
separating the static resources from the dynamic resources through a resource deployment interface and repackaging them to form new deployment resources;
based on the new deployment resources, processing the static resources according to a predetermined rule base and compressing them into a new deployment package;
deploying the dynamic resources and the static resources to a dynamic server and a static server, respectively, according to the type of the deployment package.
9. The method according to claim 8, characterized in that dividing the deployment package uploaded by the developer into dynamic resources and static resources comprises:
treating modules that interact with background services as dynamic resources and the remainder as static resources.
10. The method according to claim 8, characterized in that deploying to the dynamic server and the static server, respectively, according to the type of the deployment package comprises:
deploying the static resources through a CDN server;
the domain name service system of the CDN server assigning a server IP address to the user according to the ISP of the domain name where the user is located and the geographic location.
11. A massive data stream processing apparatus, characterized in that the apparatus comprises:
a statistics module, configured to count, based on user historical access logs and for each IP, the number of mobile phone numbers accessing the IP in each sub-period of a predetermined historical period;
a calculation module, configured to calculate the variance, across the sub-periods, of the number of mobile phone numbers accessing each IP, and to take each IP whose variance is greater than a first predetermined threshold as an IP to be observed;
an identification module, configured to identify cheating users based on the correlation coefficient between mobile phone numbers and access times under each IP to be observed and/or the login success rate.
12. A computing device, characterized by comprising: at least one processor, at least one memory, and computer program instructions stored in the memory, wherein the computer program instructions, when executed by the processor, implement the method according to any one of claims 1-10.
13. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-10.
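
By way of illustration only, the variance screening of claim 1 and the Pearson correlation of claim 3 can be sketched as follows. This is a sketch under stated assumptions and does not limit the claims: the use of population variance, the treatment of mobile phone numbers and access times as plain numbers, and the names `ips_to_observe` and `pearson` are all hypothetical and are not specified in the original text.

```python
from math import sqrt
from statistics import pvariance


def ips_to_observe(counts_by_ip, first_threshold):
    """Return IPs whose per-sub-period phone-number counts vary strongly.

    counts_by_ip maps an IP to the list of distinct phone-number counts,
    one entry per sub-period. Population variance is an assumed choice.
    """
    return [
        ip
        for ip, counts in counts_by_ip.items()
        if pvariance(counts) > first_threshold
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (spread_x * spread_y)
```

For an IP to be observed, `pearson` would be applied to the numeric mobile phone numbers and their access timestamps; a coefficient close to 1 indicates consecutive numbers being used in time order, which is the pattern the second threshold is meant to catch.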
CN201711498056.4A 2017-12-30 2017-12-30 Massive dataflow processing method, calculates equipment and storage medium at device Pending CN109995834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711498056.4A CN109995834A (en) 2017-12-30 2017-12-30 Massive dataflow processing method, calculates equipment and storage medium at device

Publications (1)

Publication Number Publication Date
CN109995834A true CN109995834A (en) 2019-07-09

Family

ID=67111090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711498056.4A Pending CN109995834A (en) 2017-12-30 2017-12-30 Massive dataflow processing method, calculates equipment and storage medium at device

Country Status (1)

Country Link
CN (1) CN109995834A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060285665A1 (en) * 2005-05-27 2006-12-21 Nice Systems Ltd. Method and apparatus for fraud detection
CN103077172A (en) * 2011-10-26 2013-05-01 腾讯科技(深圳)有限公司 Method and device for mining cheating user
CN104754000A (en) * 2013-12-30 2015-07-01 国家电网公司 Load equalizing method and system
CN105282045A (en) * 2015-11-17 2016-01-27 高新兴科技集团股份有限公司 Distributed calculating and storage method based on consistent Hash algorithm
CN105516261A (en) * 2015-11-26 2016-04-20 深圳市深信服电子科技有限公司 Web page loading control method and load balancer
CN106022834A (en) * 2016-05-24 2016-10-12 腾讯科技(深圳)有限公司 Advertisement against cheating method and device
CN106485559A (en) * 2015-08-19 2017-03-08 阿里巴巴集团控股有限公司 Cheating recognition methods and device for on-line shop
CN106506451A (en) * 2016-09-30 2017-03-15 百度在线网络技术(北京)有限公司 The processing method and processing device of malicious access
CN106598823A (en) * 2016-10-19 2017-04-26 同盾科技有限公司 Difference calculation method and system for network behavior characteristics
CN106603554A (en) * 2016-12-29 2017-04-26 北京奇艺世纪科技有限公司 Adaptive real-time video data anti-cheating method and apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KARGER D, LEHMAN E, LEIGHTON T: "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web", 《TWENTY-NINTH ACM SYMPOSIUM ON THEORY OF COMPUTING》 *
孙乔等: "基于一致性哈希的分布式数据库性能拓展", 《计算机应用》 *
裴沛等: "一种改进的分布式存储系统节点动态扩展策略", 《广西民族大学学报(自然科学版)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866198A (en) * 2019-09-27 2020-03-06 上海硬通网络科技有限公司 Static resource caching method, system, device, computer equipment and storage medium
CN110866198B (en) * 2019-09-27 2022-10-28 上海硬通网络科技有限公司 Static resource caching method, system, device, computer equipment and storage medium
CN111931047A (en) * 2020-07-31 2020-11-13 中国平安人寿保险股份有限公司 Artificial intelligence-based black product account detection method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709