CN107402980A - Method and system for processing big data in a network environment - Google Patents

Method and system for processing big data in a network environment

Info

Publication number
CN107402980A
Authority
CN
China
Prior art keywords
session
data
session data
identification
merging
Prior art date
Legal status
Pending
Application number
CN201710546811.5A
Other languages
Chinese (zh)
Inventor
徐振超
Current Assignee
Beijing Esafenet Science & Technology Co Ltd
Original Assignee
Beijing Esafenet Science & Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Esafenet Science & Technology Co Ltd
Priority to CN201710546811.5A
Publication of CN107402980A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/146 Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding

Abstract

Embodiments of the present invention provide a method and system for processing big data in a network environment. The method includes: obtaining a plurality of pieces of first session data that meet a predetermined condition, each carrying a session identifier; merging first session data that share the same session identifier to obtain second session data corresponding to each session identifier; if a cache merge cycle is currently reached, merging the second session data with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier; and/or, if a cache flush cycle is currently reached, writing at least one of the second session data, the cached third session data, and the fourth session data to an output file for display. The embodiments reduce the resources consumed by merging big data in a network environment, relieve merging pressure, shorten merging time, and allow the merging to be adjusted dynamically.

Description

Method and system for processing big data in a network environment
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a method and system for processing big data in a network environment.
Background
In recent years, the rapid development and spread of smart devices, and in particular of the Internet and communication technologies, have produced huge volumes of complex and varied data in network environments. This data is growing linearly and will continue to do so, forming what is called big data in a network environment.
At present, the conventional way to merge a given amount of data is to merge all of it at once, on the principle that data with identical or similar attributes (or other information) belong together. When the data volume is large, such merging takes a long time and is inefficient.
Moreover, big data by definition cannot be captured, managed, and processed with conventional tools, so conventional data-merging schemes cannot be applied to big data in a network environment. The problem of merging big data in a network environment therefore remains to be solved.
Summary of the invention
Embodiments of the present invention provide a method and system for processing big data in a network environment, to solve the problem that existing data-merging schemes cannot be applied to big data in a network environment.
According to one aspect of the embodiments of the present invention, a method for processing big data in a network environment is provided, including:
obtaining a plurality of pieces of first session data that meet a predetermined condition, each carrying a session identifier;
merging first session data having the same session identifier, to obtain second session data corresponding to each session identifier;
if a cache merge cycle is currently reached, merging the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and/or
if a cache flush cycle is currently reached, writing at least one of the second session data, the cached third session data, and the fourth session data to an output file for display.
According to another aspect of the embodiments of the present invention, a system for processing big data in a network environment is provided, including:
an acquisition module, configured to obtain a plurality of pieces of first session data that meet a predetermined condition, each carrying a session identifier;
a merging module, configured to merge first session data having the same session identifier, to obtain second session data corresponding to each session identifier;
a cache merging module, configured to, if a cache merge cycle is currently reached, merge the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and
a cache flushing module, configured to, if a cache flush cycle is currently reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for display.
With the method and system for processing big data in a network environment provided by the embodiments of the present invention, first, a plurality of pieces of first session data that meet a predetermined condition and carry session identifiers are obtained. Next, first session data with the same session identifier are merged to obtain second session data corresponding to each session identifier. If a cache merge cycle is currently reached, the second session data is merged with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier; and/or, if a cache flush cycle is currently reached, the second session data, the cached third session data, and/or the fourth session data are written to an output file for display.
The embodiments merge the acquired first session data on the principle of identical session identifiers, yielding second session data that keep the session identifier they had before merging. The second session data is then merged again with stored third session data having the same session identifier, yielding fourth session data that still carry that identifier; and/or at least one of the second session data, the cached third session data, and the fourth session data is written to an output file for display. The merging of first session data is thus split into two parts: one part merges the first session data acquired within a period of time into second session data; the other part merges the second session data again with the cached third session data, where the cached third session data may itself be earlier second session data. By merging in several stages, the embodiments reduce the resources consumed by merging big data in a network environment, relieve merging pressure, and save merging time. Furthermore, changing the length of the cache merge cycle changes how much second session data is produced by merging, and changing the length of the cache flush cycle changes how much second and third session data is produced and how much third session data is cached, so the merging can be adjusted dynamically.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a method for processing big data in a network environment according to Embodiment one of the present invention;
Fig. 2 is a flowchart of the steps of a method for processing big data in a network environment according to Embodiment two of the present invention;
Fig. 3 is a structural block diagram of a system for processing big data in a network environment according to Embodiment three of the present invention;
Fig. 4 is a structural block diagram of a system for processing big data in a network environment according to Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of a system for processing big data in a network environment according to Embodiment five of the present invention;
Fig. 6 is a detailed schematic structural diagram of the data analysis and extraction cluster 54 according to Embodiment five of the present invention.
Detailed description of the embodiments
Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings (in which identical reference numerals denote identical elements) and embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present invention are only used to distinguish different steps, devices, modules, and so on; they carry no particular technical meaning and do not imply any necessary logical order between them.
Embodiment one
Referring to Fig. 1, a flowchart of the steps of a method for processing big data in a network environment according to Embodiment one of the present invention is shown.
The method for processing big data in a network environment of this embodiment includes the following steps:
Step S100: obtain a plurality of pieces of first session data that meet a predetermined condition, each carrying a session identifier.
In this embodiment, the first session data can be obtained from big data in a network environment. Such big data may include data in any format, such as Excel, Word, or PDF, and data of any protocol type, such as HTTP, POP3, or SMTP; in other words, big data in a network environment refers to the data generated by the various communication protocols. This embodiment places no particular restriction on the big data.
In an optional implementation, big data in the network environment can be acquired in real time and filtered according to preset filtering protocol rules to obtain valid data; the valid data is then integrated according to preset integration protocol rules to obtain a plurality of pieces of first session data belonging to the same rule. The preset filtering and integration protocol rules can be customized as needed, for example to extract the model of a mobile terminal from HTTP big data; this embodiment does not restrict them. The first session data obtained after processing the big data with these rules can be regarded as first session data that meet the predetermined condition.
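The filter-then-integrate acquisition described above could look roughly like the following Python sketch. The raw-record layout (`protocol`, `sid`, `fields` keys) and the two rule functions `make_protocol_filter` and `integrate` are hypothetical placeholders for the customizable preset filtering and integration protocol rules; the patent does not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, Set


@dataclass
class FirstSessionData:
    session_id: str                                   # session identifier carried by the record
    protocol: str                                     # e.g. "http", "pop3", "smtp"
    payload: Dict[str, str] = field(default_factory=dict)


def make_protocol_filter(allowed: Set[str]) -> Callable[[dict], bool]:
    """Hypothetical preset filtering rule: keep only records of allowed protocols."""
    return lambda raw: raw.get("protocol") in allowed


def integrate(raw: dict) -> FirstSessionData:
    """Hypothetical preset integration rule: map a valid raw record to first session data."""
    return FirstSessionData(raw["sid"], raw["protocol"], dict(raw.get("fields", {})))


def acquire(raw_stream: Iterable[dict],
            keep: Callable[[dict], bool],
            to_session: Callable[[dict], FirstSessionData]) -> Iterator[FirstSessionData]:
    """Yield first session data that meet the predetermined condition."""
    for raw in raw_stream:
        if keep(raw):               # filtering step -> valid data
            yield to_session(raw)   # integration step -> first session data
```

Because `acquire` is a generator, it can consume a live, unbounded stream of raw records without materializing it, which suits real-time acquisition.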
The first session data can be data generated by communication between users in the network environment. The session identifier in the first session data represents its identity; different pieces of first session data may have the same or different session identifiers. This embodiment places no restriction on the first session data.
After the plurality of pieces of first session data carrying session identifiers are obtained, they can be stored in the form of a queue. To improve storage safety, each piece of first session data can also be backed up, specifically as multiple copies.
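One way to read the queue-plus-backup storage is sketched below. `BackedUpQueue`, the `replicas` parameter, and the policy of releasing backup copies when an item is dequeued for merging are assumptions for illustration, not details taken from the patent.

```python
import copy
from collections import deque
from typing import Any, Deque, List


class BackedUpQueue:
    """FIFO queue for first session data that keeps `replicas` backup copies.

    Hypothetical sketch: backups are deep copies held in parallel queues and
    released when the primary item is dequeued for merging.
    """

    def __init__(self, replicas: int = 2) -> None:
        self.primary: Deque[Any] = deque()
        self.backups: List[Deque[Any]] = [deque() for _ in range(replicas)]

    def put(self, item: Any) -> None:
        self.primary.append(item)
        for backup in self.backups:
            backup.append(copy.deepcopy(item))   # independent copy, not a reference

    def get(self) -> Any:
        for backup in self.backups:
            backup.popleft()                     # release the backups of this item
        return self.primary.popleft()

    def restore(self) -> None:
        """Rebuild the primary queue from the first backup, e.g. after data loss."""
        if self.backups:
            self.primary = copy.deepcopy(self.backups[0])

    def __len__(self) -> int:
        return len(self.primary)
```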
Step S102: merge first session data having the same session identifier, to obtain second session data corresponding to each session identifier.
In this embodiment, the merging rule is that two pieces of first session data with the same session identifier are merged into one new piece of second session data, whose session identifier is still the session identifier of the first session data before merging. For example, when one email is sent to several recipients, each delivery to a recipient produces one piece of first session data, and all of these pieces share the same session identifier.
In this embodiment, a time period can be preset. Each time a piece of first session data is obtained within the preset period, it is merged (provided that first session data with the same session identifier has already been obtained), until the preset period ends, so that the first session data is merged one piece at a time. Merging one piece at a time means: after a piece of first session data is obtained, the next piece that follows the same rule and has the same session identifier is merged with it; the resulting second session data is then merged with the next piece that follows the same rule and has the same session identifier, and so on.
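The one-piece-at-a-time merge within a preset period can be sketched as follows. The record shape (`sid`, `records`) and the `merge` rule, which simply concatenates per-delivery records as in the mail example, are hypothetical; the patent does not define what merging does to the payload.

```python
import time
from typing import Callable, Dict, Iterable


def merge(a: dict, b: dict) -> dict:
    """Hypothetical merge rule: keep the shared session identifier and
    concatenate the per-delivery records (e.g. one record per mail recipient)."""
    if a["sid"] != b["sid"]:
        raise ValueError("only data with the same session identifier can be merged")
    return {"sid": a["sid"], "records": a["records"] + b["records"]}


def merge_within_period(stream: Iterable[dict], period_s: float,
                        clock: Callable[[], float] = time.monotonic) -> Dict[str, dict]:
    """Merge first session data one piece at a time until the preset period ends.

    Returns second session data keyed by session identifier.
    """
    deadline = clock() + period_s
    second: Dict[str, dict] = {}
    for item in stream:
        sid = item["sid"]
        # The first piece for a sid is kept as-is; later pieces are merged into it.
        second[sid] = merge(second[sid], item) if sid in second else item
        if clock() >= deadline:
            break
    return second
```

Passing a fake `clock` makes the period boundary testable; by default `time.monotonic` is used, since it is not affected by system clock updates.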
Step S104: if a cache merge cycle is currently reached, merge the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier.
In this embodiment, the cache merge cycle can be a preset period of time during which steps S100 and S102 are performed. When the cache merge cycle is reached, the at least one piece of second session data obtained in step S102 is merged with cached third session data having the same session identifier as that second session data, yielding fourth session data with the same session identifier as the second or third session data before merging.
For example, suppose step S102 yields two pieces of second session data: D1 with session identifier b1 and D2 with session identifier b2, where b1 differs from b2. The cached third session data are E1 with session identifier b1, E2 with session identifier b2, and E3 with session identifier b3. In step S104, D1 and E1 are merged into fourth session data F1 with session identifier b1, and D2 and E2 are merged into fourth session data F2 with session identifier b2.
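The worked example with D1/D2, E1 to E3, and F1/F2 can be reproduced with the minimal sketch below. Session data are modelled as lists of fragment labels and `merge` as list concatenation (cached data first); both choices are assumptions made only to keep the example readable.

```python
from typing import Dict, List, Tuple


def merge(old: List[str], new: List[str]) -> List[str]:
    """Hypothetical merge rule: append the new fragments to the cached ones."""
    return old + new


def cache_merge(second: Dict[str, List[str]],
                cache: Dict[str, List[str]]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Merge second session data with cached third session data sharing a session id.

    Returns (fourth, unmatched): fourth session data per session id, plus the
    second session data that had no cached counterpart.
    """
    fourth: Dict[str, List[str]] = {}
    unmatched: Dict[str, List[str]] = {}
    for sid, data in second.items():
        if sid in cache:
            fourth[sid] = merge(cache[sid], data)
        else:
            unmatched[sid] = data
    return fourth, unmatched
```

With `second = {"b1": ["D1"], "b2": ["D2"]}` and `cache = {"b1": ["E1"], "b2": ["E2"], "b3": ["E3"]}`, `cache_merge` returns F1 as `["E1", "D1"]` and F2 as `["E2", "D2"]`; E3 has no counterpart and is left untouched in the cache.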
Step S106: if a cache flush cycle is currently reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for display.
In this embodiment, the cache flush cycle can be a preset period of time. When the cache flush cycle is reached, at least one of the second session data, the cached third session data, and the fourth session data is written to the output file for display.
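A possible shape for the flush step is sketched below. The JSON Lines output format, the `flush` name, and emptying the groups after writing (suggested by the term "flush" but not spelled out in the text) are assumptions; the caller decides which of the second, third, and fourth session data groups to pass, matching "at least one of".

```python
import json
from typing import Dict


def flush(path: str, *groups: Dict[str, object]) -> int:
    """Write every entry of the given session-data groups to `path` as JSON lines,
    then empty the groups. Returns the number of lines written."""
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for group in groups:
            for sid, data in group.items():
                out.write(json.dumps({"sid": sid, "data": data}, ensure_ascii=False) + "\n")
                written += 1
    for group in groups:
        group.clear()  # assumed "flush" semantics: written data leaves the cache
    return written
```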
Embodiment two
Referring to Fig. 2, a flowchart of the steps of a method for processing big data in a network environment according to Embodiment two of the present invention is shown.
It should be noted that each embodiment of the present invention emphasizes different aspects; for anything not described in detail in one embodiment, refer to the descriptions in the other embodiments, which are not repeated here.
The method for processing big data in a network environment of this embodiment includes the following steps:
Step S200: obtain a plurality of pieces of first session data that meet a predetermined condition, each carrying a session identifier.
In an optional implementation, big data in the network environment can be acquired in real time and filtered according to preset filtering protocol rules to obtain valid data; the valid data is then integrated according to preset integration protocol rules to obtain a plurality of pieces of first session data belonging to the same rule. The preset filtering and integration protocol rules can be customized as needed, for example to extract the model of a mobile terminal from HTTP big data; this embodiment does not restrict them.
After the first session data that meet the predetermined condition are obtained, they can be stored in the form of a queue; to improve storage safety, each piece of first session data can also be backed up.
Step S202: write the first piece of first session data obtained to the output file as display data, and store it in a cache structure.
In this embodiment, when only one piece of first session data has been obtained, namely the first piece, no merging can be performed yet. The first piece is therefore written to the output file as display data and shown to the user or a third-party application through the output file. It is also stored in the cache structure for use in the subsequent step S206.
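Step S202's fast path might be sketched like this. `FirstRecordDisplay` and its `accept` method are hypothetical names; the output is assumed to be a JSON Lines file and the cache structure a dictionary keyed by session identifier.

```python
import json
from typing import Dict


class FirstRecordDisplay:
    """Hypothetical sketch of step S202: the very first piece of first session
    data is shown immediately and also seeded into the cache structure."""

    def __init__(self, output_path: str) -> None:
        self.output_path = output_path
        self.cache: Dict[str, dict] = {}   # cache structure keyed by session id
        self.shown_first = False

    def accept(self, item: dict) -> bool:
        """Return True if `item` was written out as the initial display data.

        Later items return False and are left to the normal merge (step S204).
        """
        if self.shown_first:
            return False
        with open(self.output_path, "w", encoding="utf-8") as out:
            out.write(json.dumps(item, ensure_ascii=False) + "\n")
        self.cache[item["sid"]] = item
        self.shown_first = True
        return True
```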
Step S204: merge first session data having the same session identifier, to obtain second session data corresponding to each session identifier.
In this embodiment, each piece of first session data has its own session identifier, and the identifiers of different pieces may be the same or different. The merging rule is that two pieces of first session data with the same session identifier are merged into one new piece of second session data, whose session identifier remains that of the first session data before merging. For example, when one email is sent to several recipients, each delivery to a recipient produces one piece of first session data, and all of these pieces share the same session identifier.
In this embodiment, a time period can be preset. Each time a piece of first session data is obtained within the preset period, it is merged (provided that first session data with the same session identifier has already been obtained), until the preset period ends, so that the first session data is merged one piece at a time. Merging one piece at a time means: after a piece of first session data is obtained, the next piece that follows the same rule and has the same session identifier is merged with it; the resulting second session data is then merged with the next piece that follows the same rule and has the same session identifier, and so on.
Step S206: query the cache structure for third session data corresponding to the second session data. If it exists, perform step S208; if not, perform step S212 and store the second session data in the cache structure.
In this embodiment, based on the session identifiers of the third session data in the cache structure, the cache structure is queried for third session data whose session identifier matches that of the second session data. Third session data corresponding to the second session data means third session data in the cache structure whose session identifier is the same as that of the second session data. Third session data is stored in the cache structure keyed by its session identifier.
Step S208: merge the second session data again with the corresponding third session data, and store the fourth session data obtained from this merge in the cache structure.
For example, if the cache structure contains third session data H corresponding to second session data G, G and H are merged again to obtain fourth session data Q, which is then stored in the cache structure, overwriting H. The cache structure now holds exactly one entry, Q, that follows the same rule as G and has the same session identifier.
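Because the cache structure is keyed by session identifier, steps S206, S208, and S212 reduce to a lookup followed by either a merge or an insert. The sketch below assumes a dictionary as the cache structure and a hypothetical concatenating `merge` rule; merging H with G yields Q, which replaces H under the same key.

```python
from typing import Dict


def merge(third: dict, second: dict) -> dict:
    """Hypothetical merge rule: concatenate records under the shared session id."""
    return {"sid": third["sid"], "records": third["records"] + second["records"]}


def store_in_cache(cache: Dict[str, dict], second: dict) -> dict:
    """Look up the cache by session identifier (the key), merge if third session
    data exists, otherwise insert. Returns the entry now cached for this id."""
    sid = second["sid"]
    third = cache.get(sid)                 # S206: keyed lookup
    if third is None:
        cache[sid] = second                # S212: no match, store the second session data
    else:
        cache[sid] = merge(third, second)  # S208: fourth session data overwrites the third
    return cache[sid]
```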
Step S210: set a timer. If the timer reaches the cache merge cycle and all of the first session data has been merged, export all session data in the cache structure to the output file, overwriting the display data.
In this embodiment, the timer need not be set in step S210; it can be set at any point in the execution before step S210 and is used to determine whether the cache merge cycle has been reached.
When the timer reaches the cache merge cycle and the first session data obtained in step S200 has all been merged, all session data in the cache structure is exported to the output file, overwriting the first piece of first session data written in step S202.
It should be noted that the session data in the cache structure can include second, third, and fourth session data: if the cache structure holds no third session data with the same session identifier as a piece of second session data, that second session data is itself stored in the cache structure.
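The timer check of step S210 could be implemented with a monotonic clock as sketched below. `CacheExporter`, the JSON Lines format, and restarting the timer after each export are assumptions; the text only requires that the cache be exported once the merge cycle has elapsed and all first session data has been merged. Opening the file in mode `"w"` truncates it, which is what overwrites the display data written in step S202.

```python
import json
import time
from typing import Callable, Dict


class CacheExporter:
    """Hypothetical sketch of step S210 using a monotonic-clock timer."""

    def __init__(self, output_path: str, merge_cycle_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.output_path = output_path
        self.merge_cycle_s = merge_cycle_s
        self.clock = clock
        self.started = clock()              # timer set before step S210

    def maybe_export(self, cache: Dict[str, object], all_merged: bool) -> bool:
        """Export the whole cache once the merge cycle has elapsed and all
        first session data has been merged; otherwise do nothing."""
        if self.clock() - self.started < self.merge_cycle_s or not all_merged:
            return False
        with open(self.output_path, "w", encoding="utf-8") as out:  # truncates old display data
            for sid, data in cache.items():
                out.write(json.dumps({"sid": sid, "data": data}, ensure_ascii=False) + "\n")
        self.started = self.clock()         # assumed: restart the timer for the next cycle
        return True
```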
After each piece of session data has been merged and output as described above, all session data in the output file (at least one of the second, third, and fourth session data) can be stored in a session database, where each piece can then be analyzed and used. Two examples of such analysis and use follow.
Example one: user data count
Fifth session data is read from the cache structure or the session database; it can be any of the second, third, or fourth session data described above. The fifth session data read is counted by the whole-hour time slot it belongs to. The user database is then queried to count user data that belongs to the same whole-hour time slot and the same protocol as the fifth session data read, and the two counts are added together to give the user data count. The whole-hour time slot of a piece of fifth session data can be determined from its timestamp field. For example, if the timestamp field of a piece of fifth session data contains "1392515067621", this value is the number of milliseconds elapsed since 1970 (the Unix epoch).
For example, fifth session data P is read from the session database; P belongs to protocol X1 and to whole-hour time slot T10. Within a period K1, the number L of pieces of fifth session data read from the session database that belong to time slot T10 is recorded. The user database is queried for the number S of user data records that belong to the same protocol X1 and the same time slot T10 as P. Adding L and S gives the final user data count.
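The hourly user data count can be illustrated as follows, using the example timestamp 1392515067621 ms, which falls in the slot beginning at 1392512400000 ms (2014-02-16 01:00 UTC). The session record fields (`protocol`, `ts`) and the representation of the user database as a dictionary of per-(protocol, slot) counts are hypothetical. Following the text, L counts the session data in the slot and S counts user data in the same slot and protocol.

```python
from typing import Dict, Iterable, Tuple

HOUR_MS = 3_600_000  # one hour in milliseconds


def hour_slot(timestamp_ms: int) -> int:
    """Start (ms since the Unix epoch, UTC) of the whole-hour slot containing timestamp_ms."""
    return timestamp_ms - timestamp_ms % HOUR_MS


def user_data_count(sessions: Iterable[dict],
                    user_db_counts: Dict[Tuple[str, int], int],
                    protocol: str, slot: int) -> int:
    """Return L + S: session data falling in `slot`, plus user-database
    records for the same protocol and slot (hypothetical dict lookup)."""
    session_count = sum(1 for s in sessions if hour_slot(s["ts"]) == slot)  # L
    db_count = user_db_counts.get((protocol, slot), 0)                       # S
    return session_count + db_count
```

Slot boundaries are computed in UTC; for a fixed whole-hour offset such as UTC+8, the whole-hour boundaries are the same instants.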
Example two: user information extraction
Sixth session data is read from the cache structure or the session database; it can be any of the second, third, or fourth session data described above. The sixth session data read is parsed with preset regular expressions to obtain user-related information, which includes at least one of: hardware and software information of the mobile terminal, virtual identity information, contact information, activity records, and so on.
For example, preset regular expressions can extract the mobile terminal's manufacturer, language, browser, and operating system version from the protocol header of HTTP sixth session data. The value of a particular field in the sixth session data can be extracted and parsed to obtain software installed on the mobile terminal, application account information, application nicknames, and so on. The contact fields in sixth session data of telephony and SMS protocols can be extracted to obtain contact information.
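The regular-expression extraction from an HTTP header might look like the sketch below. The four patterns and the field names are illustrative assumptions tuned to a typical Android `User-Agent`; real traffic needs a broader rule set, and mapping a device model such as `SM-G9300` to its manufacturer would require a lookup table that is not shown.

```python
import re
from typing import Dict

# Hypothetical preset regular expressions; each has exactly one capturing group.
PATTERNS = {
    "os_version": re.compile(r"Android ([\d.]+)"),
    "device_model": re.compile(r"Android [\d.]+; ([^;)]+?) Build/"),
    "browser": re.compile(r"((?:Chrome|Firefox|Safari)/[\d.]+)"),
    "language": re.compile(r"^Accept-Language:\s*([A-Za-z-]+)", re.M | re.I),
}


def extract_user_info(header: str) -> Dict[str, str]:
    """Apply every pattern to the raw HTTP header; keep the first match of each."""
    info: Dict[str, str] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(header)
        if match:
            info[name] = match.group(1)
    return info
```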
The embodiments of the present invention write the first piece of first session data obtained to the output file immediately, instead of waiting until the first session data have been merged before displaying them. This speeds up the display of first session data and improves the user experience.
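A minimal sketch of the early-display step follows. It assumes "the first piece of first session data" is interpreted per session identifier; the patent does not say whether it means per session or per batch. The function name and the list/dict stand-ins for the output file and buffer structure are hypothetical. The first record of each session is written out immediately and buffered, and later records go into the normal merge path, so nothing is duplicated.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # hypothetical (session_id, payload) shape


def ingest_with_early_display(
    records: List[Record],
    output: List[Record],
    buffer: Dict[str, List[str]],
    seen: Set[str],
) -> Dict[str, List[str]]:
    """Emit each session's first record at once; merge the remaining records.

    `output` stands in for the output file and `buffer` for the buffer
    structure; `seen` remembers which session IDs were already displayed.
    """
    merged: Dict[str, List[str]] = {}
    for sid, payload in records:
        if sid not in seen:
            seen.add(sid)
            output.append((sid, payload))  # displayed without waiting for merging
            buffer.setdefault(sid, []).append(payload)  # kept for later merges
        else:
            merged.setdefault(sid, []).append(payload)  # normal second-session path
    return merged
```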
Embodiment three
Referring to Fig. 3, a structural block diagram of a system for processing big data in a network environment according to Embodiment three of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: an acquisition module 30, configured to obtain a plurality of pieces of first session data that carry session identifiers and meet a predetermined condition; a merging module 32, configured to merge first session data with the same session identifier to obtain second session data for each session identifier; a cache merging module 34, configured to, if the cache merge cycle is currently reached, merge the second session data with the cached third session data that have the same session identifier, to obtain fourth session data for each session identifier; and a cache flushing module 36, configured to, if the cache flush cycle is currently reached, write at least one of the second session data, the cached third session data and the fourth session data to an output file for output and display.
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment in the foregoing method embodiments, and has the beneficial effects of those embodiments, which are not repeated here.
Embodiment four
Referring to Fig. 4, a structural block diagram of a system for processing big data in a network environment according to Embodiment four of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: an acquisition module 40, configured to obtain a plurality of pieces of first session data that carry session identifiers and meet a predetermined condition; a merging module 41, configured to merge first session data with the same session identifier to obtain second session data for each session identifier; a cache merging module 42, configured to, if the cache merge cycle is currently reached, merge the second session data with the cached third session data that have the same session identifier, to obtain fourth session data for each session identifier; and a cache flushing module 43, configured to, if the cache flush cycle is currently reached, write at least one of the second session data, the cached third session data and the fourth session data to an output file for output and display.
Optionally, the system for processing big data in a network environment provided by this embodiment further includes a database storage module 44, configured to store at least one of the second session data, the third session data and the fourth session data in a session database.
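As an illustration of the database storage module, the sketch below persists merged session data with Python's built-in `sqlite3` module. The table name `session_data`, its schema, and JSON serialization of payloads are assumptions; the patent does not name a database engine.

```python
import json
import sqlite3
from typing import Dict, List


def store_sessions(conn: sqlite3.Connection, sessions: Dict[str, List[str]]) -> None:
    """Persist merged session data in a session database (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_data ("
        "session_id TEXT PRIMARY KEY, payloads TEXT NOT NULL)"
    )
    # INSERT OR REPLACE overwrites an older merge result for the same session ID.
    conn.executemany(
        "INSERT OR REPLACE INTO session_data (session_id, payloads) VALUES (?, ?)",
        [(sid, json.dumps(parts)) for sid, parts in sessions.items()],
    )
    conn.commit()
```

Keying the table on the session identifier means a later, larger merge result (fourth session data) simply supersedes the earlier row for that session.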
Optionally, the system provided by this embodiment further includes an output module 45, configured to, before the merging module 41 merges first session data with the same session identifier into second session data for each session identifier, write the first piece of first session data obtained to the output file as display data, and store that first piece of first session data in a buffer structure.
Optionally, the system provided by this embodiment further includes a buffer structure storage module 46, configured to store the fourth session data in the buffer structure after the cache merging module 42 obtains fourth session data for each session identifier.
Optionally, the system provided by this embodiment further includes a user data quantity statistics module, configured to: read fifth session data from the buffer structure; count the fifth session data according to the hourly time slot to which they belong; query the user database for the quantity of user data that belong to the same hourly time slot and the same protocol as the fifth session data; and add the count result to that quantity to obtain the user data quantity statistical result.
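The user data quantity statistics can be sketched as follows. The sketch assumes buffered records are hypothetical `(session_id, protocol, timestamp)` tuples and that the user database exposes a `user_data(hour_slot, protocol)` table, which is an invented schema. Buffered records are bucketed by hour and protocol, and each bucket's count is added to the matching stored count.

```python
import sqlite3
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical buffered record: (session_id, protocol, timestamp).
Session = Tuple[str, str, datetime]


def hour_slot(ts: datetime) -> str:
    # Hourly time slot: truncate the timestamp to the start of its hour.
    return ts.strftime("%Y-%m-%d %H:00")


def count_user_data(
    buffered: List[Session], conn: sqlite3.Connection
) -> Dict[Tuple[str, str], int]:
    """Buffered count plus stored count, keyed by (hour slot, protocol)."""
    counts = Counter((hour_slot(ts), proto) for _, proto, ts in buffered)
    result: Dict[Tuple[str, str], int] = {}
    for (slot, proto), n in counts.items():
        # Assumed schema: user_data(hour_slot TEXT, protocol TEXT, ...).
        (stored,) = conn.execute(
            "SELECT COUNT(*) FROM user_data WHERE hour_slot = ? AND protocol = ?",
            (slot, proto),
        ).fetchone()
        result[(slot, proto)] = n + stored
    return result
```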
Optionally, the system provided by this embodiment further includes a user information extraction module, configured to: read sixth session data from the buffer structure; and parse the sixth session data according to preset regular expressions to obtain user-related information, where the user-related information includes at least one of the following: hardware and software information of a mobile terminal, virtual identity information, contact information, and activity record information.
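A sketch of regex-based user information extraction is shown below. The four patterns are hypothetical examples chosen only to map onto the listed categories: a `User-Agent` header for terminal software and hardware, an e-mail address as a virtual identity, an 11-digit mainland mobile number as contact information, and request paths as activity records. The patent does not disclose its actual expressions.

```python
import re
from typing import Dict, List

# Hypothetical preset patterns, one per category of user-related information.
PATTERNS: Dict[str, re.Pattern] = {
    "device_info": re.compile(r"User-Agent:\s*([^\r\n]+)"),
    "virtual_identity": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "contact": re.compile(r"\b1[3-9]\d{9}\b"),
    "activity": re.compile(r"(?:GET|POST)\s+(\S+)"),
}


def extract_user_info(session_text: str) -> Dict[str, List[str]]:
    """Apply each preset pattern to one session; keep categories that matched."""
    info: Dict[str, List[str]] = {}
    for category, pattern in PATTERNS.items():
        # findall returns the captured group when the pattern has one,
        # otherwise the whole match.
        matches = pattern.findall(session_text)
        if matches:
            info[category] = matches
    return info
```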
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment in the foregoing method embodiments, and has the beneficial effects of those embodiments, which are not repeated here.
Embodiment five
Referring to Fig. 5, a schematic structural diagram of a system for processing big data in a network environment according to Embodiment five of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: a data acquisition cluster 50, a data integration and staging cluster 52, a data analysis and extraction cluster 54, a cache database cluster 56, and a shared resource manager cluster 58. The shared resource manager cluster 58 collects the health information and status information of every server in the other clusters, so that it can allocate resources effectively to the servers in each cluster, keep the system running normally, and improve system efficiency. The data acquisition cluster 50 reads big data in the network environment, filters it according to preset filtering protocol rules to obtain valid data, and pushes the valid data to the data integration and staging cluster 52. The data integration and staging cluster 52 integrates the valid data according to preset integration protocol rules to obtain a plurality of pieces of session data that belong to the same rule, and stores the integrated session data in the cache database cluster 56. The data analysis and extraction cluster 54 actively grabs the session data stored in the cache database cluster 56, merges the session data, and performs processing such as user data quantity statistics and user information extraction on the merged session data.
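The data flow through the clusters can be modeled in a single process for illustration. In this sketch the `Packet` type, the allowed-protocol set used as the filtering rule, and the `(protocol, src, dst)` key used as the integration rule are all assumptions; a dict stands in for the cache database cluster 56, and the shared resource manager cluster 58 is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    # Hypothetical raw record; real captures would carry many more fields.
    src: str
    dst: str
    protocol: str
    payload: str


ALLOWED_PROTOCOLS = {"HTTP", "SMTP", "POP3"}  # assumed filtering protocol rule


def acquire(raw: Iterable[Packet]) -> List[Packet]:
    """Data acquisition cluster: keep only packets matching the filtering rule."""
    return [p for p in raw if p.protocol in ALLOWED_PROTOCOLS]


def integrate(valid: Iterable[Packet]) -> Dict[str, List[str]]:
    """Data integration cluster: group packets under one session key per rule.

    The assumed integration rule keys sessions on (protocol, src, dst); the
    returned dict stands in for the cache database cluster.
    """
    cache_db: Dict[str, List[str]] = defaultdict(list)
    for p in valid:
        cache_db[f"{p.protocol}:{p.src}->{p.dst}"].append(p.payload)
    return dict(cache_db)


def analyze(cache_db: Dict[str, List[str]]) -> Dict[str, str]:
    """Analysis and extraction cluster: grab cached sessions and merge each one."""
    return {sid: "".join(parts) for sid, parts in cache_db.items()}
```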
Fig. 6 is a detailed structural diagram of the data analysis and extraction cluster 54. The grabbing component 541 grabs the session data stored in the cache database cluster 56 and passes it to the application component 542 for further processing. The application component 542 performs processing such as session data merging, user data quantity statistics and user information extraction according to actual requirements, and also stores the merged session data in the buffer structure. The grabbing component 541 consists of several grabbing units (grabbing unit 1, grabbing unit 2, ..., grabbing unit n), and the application component 542 consists of several application units (application unit 1, application unit 2, ..., application unit n). In practice, the session data grabbed by a grabbing unit can be handed directly to the corresponding application unit.
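The one-to-one hand-off from grabbing units to application units can be sketched as below. Sharding sessions by a CRC-32 of the session identifier is an assumed distribution scheme, since the patent does not say how work is divided, and the units run sequentially here, whereas in the cluster each pair would run on its own server.

```python
import zlib
from typing import Callable, Dict, List

Sessions = Dict[str, List[str]]
ApplyUnit = Callable[[Sessions], Dict[str, str]]


def partition(cache_db: Sessions, n: int) -> List[Sessions]:
    """Split cached sessions across n grabbing units by a stable hash of the ID."""
    shards: List[Sessions] = [{} for _ in range(n)]
    for sid, parts in cache_db.items():
        # crc32 is stable across processes, unlike Python's salted str hash.
        shards[zlib.crc32(sid.encode("utf-8")) % n][sid] = parts
    return shards


def run_units(cache_db: Sessions, apply_units: List[ApplyUnit]) -> Dict[str, str]:
    """Grabbing unit i hands its shard directly to application unit i."""
    results: Dict[str, str] = {}
    for shard, apply_unit in zip(partition(cache_db, len(apply_units)), apply_units):
        results.update(apply_unit(shard))
    return results


def merge_unit(shard: Sessions) -> Dict[str, str]:
    # Example application unit: merge each session's parts into one string.
    return {sid: "".join(parts) for sid, parts in shard.items()}
```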
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment in the foregoing method embodiments, and has the beneficial effects of those embodiments, which are not repeated here.
It should be noted that, depending on implementation needs, each component or step described in the embodiments of the present invention may be split into more components or steps, and two or more components or steps, or partial operations of components or steps, may be combined into new components or steps to achieve the purpose of the embodiments of the present invention.
The above methods according to the embodiments of the present invention may be implemented in hardware or firmware; as software or computer code storable in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk or magneto-optical disk); or as computer code that is originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored in a local recording medium. The methods described here can thus be carried out by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It can be understood that a computer, processor, microprocessor controller or programmable hardware includes a storage component (for example, RAM, ROM or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor or hardware, the processing method described here is implemented. In addition, when a general-purpose computer accesses code for implementing the processing method shown here, executing that code turns the general-purpose computer into a special-purpose computer for performing the processing method shown here.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the embodiments of the present invention.
The above implementations are merely intended to illustrate the embodiments of the present invention, not to limit them. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention; therefore, all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention shall be defined by the claims.

Claims (10)

  1. A method for processing big data in a network environment, characterized by comprising:
    obtaining a plurality of pieces of first session data that carry session identifiers and meet a predetermined condition;
    merging first session data with the same session identifier to obtain second session data for each session identifier;
    if a cache merge cycle is currently reached, merging the second session data with cached third session data that have the same session identifier, to obtain fourth session data for each session identifier; and/or
    if a cache flush cycle is currently reached, writing at least one of the second session data, the cached third session data and the fourth session data to an output file for output and display.
  2. The method according to claim 1, characterized in that the method further comprises:
    storing at least one of the second session data, the third session data and the fourth session data in a session database.
  3. The method according to claim 1, characterized in that, before the merging of first session data with the same session identifier to obtain second session data for each session identifier, the method further comprises:
    writing the first piece of first session data obtained to the output file as display data, and storing the first piece of first session data in a buffer structure.
  4. The method according to claim 3, characterized in that, after the obtaining of fourth session data for each session identifier, the method further comprises:
    storing the fourth session data in the buffer structure.
  5. The method according to claim 4, characterized by further comprising:
    reading fifth session data from the buffer structure;
    counting the fifth session data according to the hourly time slot to which the fifth session data belong;
    querying a user database for the quantity of user data that belong to the same hourly time slot as the fifth session data and to the same protocol as the fifth session data;
    adding the count result to the quantity to obtain a user data quantity statistical result.
  6. The method according to claim 4 or 5, characterized by further comprising:
    reading sixth session data from the buffer structure;
    parsing the sixth session data according to preset regular expressions to obtain user-related information;
    wherein the user-related information includes at least one of the following: hardware and software information of a mobile terminal, virtual identity information, contact information, and activity record information.
  7. A system for processing big data in a network environment, characterized by comprising:
    an acquisition module, configured to obtain a plurality of pieces of first session data that carry session identifiers and meet a predetermined condition;
    a merging module, configured to merge first session data with the same session identifier to obtain second session data for each session identifier;
    a cache merging module, configured to, if a cache merge cycle is currently reached, merge the second session data with cached third session data that have the same session identifier, to obtain fourth session data for each session identifier;
    a cache flushing module, configured to, if a cache flush cycle is currently reached, write at least one of the second session data, the cached third session data and the fourth session data to an output file for output and display.
  8. The system according to claim 7, characterized in that the system further comprises a database storage module, configured to store at least one of the second session data, the third session data and the fourth session data in a session database.
  9. The system according to claim 7, characterized in that the system further comprises:
    an output module, configured to, before the merging module merges first session data with the same session identifier to obtain second session data for each session identifier, write the first piece of first session data obtained to the output file as display data, and store the first piece of first session data in a buffer structure.
  10. The system according to claim 9, characterized in that the system further comprises a buffer structure storage module, configured to store the fourth session data in the buffer structure after the cache merging module obtains fourth session data for each session identifier.
CN201710546811.5A 2017-07-06 2017-07-06 A kind of processing method and system of big data under Network Environment Pending CN107402980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710546811.5A CN107402980A (en) 2017-07-06 2017-07-06 A kind of processing method and system of big data under Network Environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710546811.5A CN107402980A (en) 2017-07-06 2017-07-06 A kind of processing method and system of big data under Network Environment

Publications (1)

Publication Number Publication Date
CN107402980A true CN107402980A (en) 2017-11-28

Family

ID=60405450

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710546811.5A Pending CN107402980A (en) 2017-07-06 2017-07-06 A kind of processing method and system of big data under Network Environment

Country Status (1)

Country Link
CN (1) CN107402980A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103970843A (en) * 2014-04-28 2014-08-06 东华大学 Conversation combining method based on UUID in Web log preprocessing
CN104144069A (en) * 2013-05-10 2014-11-12 中国电信股份有限公司 Method and device for correlating wireless side call data records and user service behaviors
CN104424219A (en) * 2013-08-23 2015-03-18 华为技术有限公司 Method and equipment of managing data documents
CN104426713A (en) * 2013-08-28 2015-03-18 腾讯科技(北京)有限公司 Method and device for monitoring network site access effect data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241176A (en) * 2018-07-10 2019-01-18 北京亿赛通科技发展有限责任公司 The correlation analysis system and method for big data under a kind of Network Environment
CN109241007A (en) * 2018-07-19 2019-01-18 北京亿赛通网络安全技术有限公司 The pretreatment system and method for email big data under a kind of network environment
CN109241007B (en) * 2018-07-19 2021-08-13 北京亿赛通网络安全技术有限公司 System and method for preprocessing email big data in network environment
CN111080448A (en) * 2019-12-02 2020-04-28 深圳索信达数据技术有限公司 Intention analysis method based on conversation
CN111080448B (en) * 2019-12-02 2024-03-26 深圳索信达数据技术有限公司 Intent analysis method based on session

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171128