CN107402980A - Method and system for processing big data in a network environment - Google Patents
- Publication number
- CN107402980A (application CN201710546811.5A)
- Authority
- CN
- China
- Prior art keywords
- session
- data
- session data
- identification
- merging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/146—Markers for unambiguous identification of a particular session, e.g. session cookie or URL-encoding
Abstract
Embodiments of the invention provide a method and system for processing big data in a network environment. The method includes: obtaining multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition; merging the first session data that share the same session identifier to obtain second session data corresponding to each session identifier; if the cache merge period has currently been reached, merging the second session data with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier; and/or, if the cache flush period has currently been reached, writing at least one of the second session data, the cached third session data, and the fourth session data to an output file for output and display. The embodiments reduce the resources consumed by merging big data in a network environment, relieve the merging load, shorten the merging time, and allow the merging to be adjusted dynamically.
Description
Technical field
Embodiments of the invention relate to the technical field of data processing, and in particular to a method and system for processing big data in a network environment.
Background
In recent years, with the spread of smart living and especially the rapid development of the Internet and communication technology, network environments have been generating huge, complex, and varied data. This data is growing linearly now and will continue to do so, forming the big data of the network environment.
At present, the usual approach to merging a given amount of data is to merge all of it in a single pass according to whether information such as the data attributes is identical or similar. If the data volume is large, the merge takes a long time and is inefficient.
Moreover, because big data cannot be captured, managed, and processed with conventional tools, the conventional merging approach cannot be applied to big data in a network environment. How to merge big data in a network environment is therefore a problem that remains to be solved.
Summary of the invention
Embodiments of the invention provide a method and system for processing big data in a network environment, to solve the problem that existing data merging schemes cannot be applied to big data in a network environment.
According to one aspect of the embodiments of the invention, a method for processing big data in a network environment is provided, including:
obtaining multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition;
merging the first session data that share the same session identifier to obtain second session data corresponding to each session identifier;
if the cache merge period has currently been reached, merging the second session data with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier; and/or
if the cache flush period has currently been reached, writing at least one of the second session data, the cached third session data, and the fourth session data to an output file for output and display.
According to another aspect of the embodiments of the invention, a system for processing big data in a network environment is provided, including:
an acquisition module, configured to obtain multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition;
a merging module, configured to merge the first session data that share the same session identifier to obtain second session data corresponding to each session identifier;
a cache merging module, configured to, if the cache merge period has currently been reached, merge the second session data with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier;
a cache flushing module, configured to, if the cache flush period has currently been reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for output and display.
With the method and system for processing big data in a network environment provided by the embodiments of the invention, multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition are first obtained. The first session data that share the same session identifier are then merged to obtain second session data corresponding to each session identifier. If the cache merge period has currently been reached, the second session data is merged with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier; and/or, if the cache flush period has currently been reached, the second session data, the cached third session data, and/or the fourth session data are written to an output file for output and display.
The embodiments merge the obtained first session data on the principle of identical session identifiers, producing second session data that keeps the pre-merge session identifier. The second session data is merged again with stored third session data having the same session identifier, producing fourth session data that keeps the pre-merge session identifier; and/or at least one of the second session data, the cached third session data, and the fourth session data is written to an output file for output and display. The merging of first session data is thus divided into two main parts. In one part, the first session data obtained within a period of time is merged into second session data. In the other part, the second session data is merged again with the cached third session data, where the cached third session data may itself be second session data. By merging in several passes, the embodiments reduce the resources consumed by merging big data in a network environment, relieve the merging load, and shorten the merging time. Furthermore, changing the length of the cache merge period changes the amount of second session data produced by merging, and changing the length of the cache flush period changes the amount of second and third session data produced by merging as well as the amount of cached third session data, so the merging can be adjusted dynamically.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a method for processing big data in a network environment according to embodiment one of the invention;
Fig. 2 is a flow chart of the steps of a method for processing big data in a network environment according to embodiment two of the invention;
Fig. 3 is a structural block diagram of a system for processing big data in a network environment according to embodiment three of the invention;
Fig. 4 is a structural block diagram of a system for processing big data in a network environment according to embodiment four of the invention;
Fig. 5 is a structural schematic diagram of a system for processing big data in a network environment according to embodiment five of the invention;
Fig. 6 is a detailed structural schematic diagram of the data analysis and extraction cluster 54 according to embodiment five of the invention.
Detailed description
Specific implementations of the embodiments of the invention are described in further detail below with reference to the drawings (in which identical reference numerals denote identical elements) and the embodiments. The following embodiments illustrate the invention but do not limit its scope.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the invention are used only to distinguish different steps, devices, modules, and the like. They carry no particular technical meaning and do not indicate any necessary logical order among them.
Embodiment one
Referring to Fig. 1, a flow chart of the steps of a method for processing big data in a network environment according to embodiment one of the invention is shown.
The method for processing big data in a network environment of this embodiment includes the following steps:
Step S100: obtain multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition.
In this embodiment, the first session data may be obtained from the big data in the network environment. That big data may include data in any format, such as Excel, Word, or PDF, and data of any protocol type, such as HTTP, POP3, or SMTP. Big data in a network environment refers to the data generated by various communication protocols; this embodiment places no particular limit on it.
In an optional implementation, the big data in the network environment may be obtained in real time and filtered according to a preset filtering protocol rule to obtain valid data. The valid data is then integrated according to a preset integration protocol rule to obtain multiple pieces of first session data that belong to the same rule. The preset filtering protocol rule and the preset integration protocol rule may be customized as needed, for example to extract the model of a mobile terminal from HTTP big data; this embodiment does not limit either rule. The first session data obtained by processing the big data with the preset filtering protocol rule and the preset integration protocol rule can be regarded as first session data that satisfies the predetermined condition.
The first session data may be data generated by communication between users in the network environment. The session identifier in the first session data represents the identity information of that first session data, and the pieces of first session data may have the same or different session identifiers. This embodiment does not limit the first session data.
After the multiple pieces of first session data with session identifiers are obtained, they may be stored in the form of a queue. To make the storage of the first session data more secure, each piece of first session data may also be backed up; specifically, each piece may be backed up in multiple copies.
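As a non-authoritative illustration of step S100, the filter, integrate, queue, and back-up sequence might be sketched in Python as follows. The record fields (`protocol`, `session_id`, `payload`), the helper names, and the concrete filtering and integration rules are all assumptions made for this sketch; the embodiment leaves both rules user-defined.

```python
import copy
from collections import deque

def filter_valid(records, allowed_protocols):
    """Assumed filtering rule: keep records of allowed protocols that carry a session id."""
    return [r for r in records
            if r.get("protocol") in allowed_protocols and r.get("session_id")]

def integrate(records, fields):
    """Assumed integration rule: project every record onto one common set of fields."""
    return [{f: r.get(f) for f in fields} for r in records]

# Hypothetical raw big data captured from the network.
raw = [
    {"protocol": "http", "session_id": "b1", "payload": "GET /", "noise": 1},
    {"protocol": "ftp", "session_id": "b9", "payload": "LIST"},
    {"protocol": "smtp", "session_id": "b2", "payload": "MAIL FROM"},
    {"protocol": "http", "session_id": None, "payload": "broken"},
]

first_sessions = integrate(filter_valid(raw, {"http", "smtp"}),
                           ["protocol", "session_id", "payload"])
queue = deque(first_sessions)                                 # stored in queue (FIFO) form
backups = [copy.deepcopy(first_sessions) for _ in range(2)]   # two backup copies (assumed count)
```

Only the HTTP record `b1` and the SMTP record `b2` survive the filter in this sketch, and both are reduced to the same three fields before being queued.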
Step S102: merge the first session data that share the same session identifier to obtain second session data corresponding to each session identifier.
In this embodiment, the merging principle is that two pieces of first session data with the same session identifier are merged into one new piece of second session data, whose session identifier is still that of the first session data before merging. For example, when one e-mail is sent to several people, each send to one recipient produces one piece of first session data, and all of these pieces have the same session identifier.
In this embodiment, a time period may be preset. Each time a piece of first session data is obtained within the preset period, it is merged (provided that first session data with the same session identifier has already been obtained), until the preset period ends, so that the first session data is merged one by one. Merging one by one means the following: after one piece of first session data is obtained, the next piece obtained that belongs to the same rule and has the same session identifier as the previous piece is merged with it; the resulting second session data is then merged with the next piece of first session data obtained that belongs to the same rule and has the same session identifier, and so on.
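The one-by-one merge of step S102 can be sketched as follows, using the e-mail example. The embodiment does not define what merging two records means, so the rule used here (concatenating recipient lists) and the names `merge_pair`, `merge_window`, and `recipients` are assumptions for illustration only; the input list stands for the first session data that arrived within one preset period.

```python
def merge_pair(a, b):
    """Assumed merge rule: keep the shared session id, concatenate the recipients."""
    if a["session_id"] != b["session_id"]:
        raise ValueError("only records with the same session id can be merged")
    return {"session_id": a["session_id"],
            "recipients": a["recipients"] + b["recipients"]}

def merge_window(first_sessions):
    """Merge first session data one by one; returns second session data keyed by session id."""
    merged = {}
    for rec in first_sessions:
        sid = rec["session_id"]
        # Merge with the running result if this id was already seen, else start a new entry.
        merged[sid] = merge_pair(merged[sid], rec) if sid in merged else rec
    return merged

# One mail (session m1) sent to three people, plus an unrelated session m2.
first_sessions = [
    {"session_id": "m1", "recipients": ["a@example.com"]},
    {"session_id": "m2", "recipients": ["x@example.com"]},
    {"session_id": "m1", "recipients": ["b@example.com"]},
    {"session_id": "m1", "recipients": ["c@example.com"]},
]
second_sessions = merge_window(first_sessions)
```

Each merged record keeps the session identifier of its inputs, as the step requires.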
Step S104: if the cache merge period has currently been reached, merge the second session data with cached third session data having the same session identifier to obtain fourth session data corresponding to each session identifier.
In this embodiment, the cache merge period may be a preset length of time, during which steps S100 and S102 are performed. When the cache merge period is reached, at least one piece of second session data obtained in step S102 is merged with cached third session data that has the same session identifier as that second session data. The result is fourth session data with the same session identifier as the second or third session data before merging.
For example, step S102 yields two pieces of second session data: D1 with session identifier b1 and D2 with session identifier b2, where b1 differs from b2. The cached third session data are E1 with session identifier b1, E2 with session identifier b2, and E3 with session identifier b3. In step S104, D1 is merged with E1 to obtain fourth session data F1 with session identifier b1, and D2 is merged with E2 to obtain fourth session data F2 with session identifier b2.
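The D1/D2 and E1/E2/E3 example of step S104 might look like this in Python. Session data is modeled as a plain list and merging as list concatenation; both are assumptions of this sketch, as is the function name `cache_merge`.

```python
def cache_merge(second, cache, merge):
    """Merge each piece of second session data with the cached third session
    data that has the same session id; returns the fourth session data."""
    fourth = {}
    for sid, d in second.items():
        if sid in cache:                 # only ids present in both are merged
            fourth[sid] = merge(d, cache[sid])
    return fourth

second = {"b1": ["D1"], "b2": ["D2"]}                 # output of step S102
cache = {"b1": ["E1"], "b2": ["E2"], "b3": ["E3"]}    # cached third session data
fourth = cache_merge(second, cache, lambda d, e: e + d)
```

Here `fourth["b1"]` plays the role of F1 and `fourth["b2"]` the role of F2; E3 has no counterpart among the second session data and is left untouched in the cache.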
Step S106: if the cache flush period has currently been reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for output and display.
In this embodiment, the cache flush period may be a preset length of time. When the cache flush period is reached, at least one of the second session data, the cached third session data, and the fourth session data is written to the output file for output and display.
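A minimal sketch of the write performed in step S106 is given below. The JSON Lines output format, the function name `flush_cache`, and the decision to empty the in-memory cache after writing are assumptions; the embodiment only states that the session data is written to an output file when the cache flush period is reached.

```python
import json
import os
import tempfile

def flush_cache(cache, path):
    """Write every cached piece of session data to `path`, one JSON object per line,
    then empty the cache (assumed behaviour). Returns the number of records written."""
    with open(path, "w", encoding="utf-8") as f:
        for sid, data in cache.items():
            f.write(json.dumps({"session_id": sid, "data": data}) + "\n")
    written = len(cache)
    cache.clear()
    return written

cache = {"b1": ["E1", "D1"], "b3": ["E3"]}
out_path = os.path.join(tempfile.mkdtemp(), "sessions.jsonl")   # hypothetical output file
written = flush_cache(cache, out_path)
```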
Embodiment two
Referring to Fig. 2, a flow chart of the steps of a method for processing big data in a network environment according to embodiment two of the invention is shown.
Note that each embodiment of the invention is described with its own emphasis. For parts of an embodiment that are not described in detail, refer to the introduction and explanation in the other embodiments of the invention; they are not repeated.
The method for processing big data in a network environment of this embodiment includes the following steps:
Step S200: obtain multiple pieces of first session data that carry a session identifier and satisfy a predetermined condition.
In an optional implementation, the big data in the network environment may be obtained in real time and filtered according to a preset filtering protocol rule to obtain valid data. The valid data is then integrated according to a preset integration protocol rule to obtain multiple pieces of first session data that belong to the same rule. The preset filtering protocol rule and the preset integration protocol rule may be customized as needed, for example to extract the model of a mobile terminal from HTTP big data; this embodiment does not limit either rule.
After the first session data that satisfies the predetermined condition is obtained, it may be stored in the form of a queue. To make the storage of the first session data more secure, each piece of first session data may also be backed up.
Step S202: write the first piece of first session data obtained to the output file as display data, and store that first piece in the cache structure.
In this embodiment, when only one piece of first session data has been obtained, that is, when the first piece is obtained, no merge of first session data can be performed. The first piece is therefore written to the output file as display data, and the output file presents it to the user or to a third-party application. The first piece is also stored in the cache structure for use in the subsequent step S206.
Step S204: merge the first session data that share the same session identifier to obtain second session data corresponding to each session identifier.
In this embodiment, each piece of first session data has its own session identifier, and the session identifiers of different pieces may be the same or different. The merging principle is that two pieces of first session data with the same session identifier are merged into one new piece of second session data, whose session identifier is still that of the first session data before merging. For example, when one e-mail is sent to several people, each send to one recipient produces one piece of first session data, and all of these pieces have the same session identifier.
In this embodiment, a time period may be preset. Each time a piece of first session data is obtained within the preset period, it is merged (provided that first session data with the same session identifier has already been obtained), until the preset period ends, so that the first session data is merged one by one. Merging one by one means the following: after one piece of first session data is obtained, the next piece obtained that belongs to the same rule and has the same session identifier as the previous piece is merged with it; the resulting second session data is then merged with the next piece of first session data obtained that belongs to the same rule and has the same session identifier, and so on.
Step S206: query the cache structure for third session data corresponding to the second session data. If it exists, perform step S208; if it does not, perform step S212 and store the second session data in the cache structure.
In this embodiment, the session identifiers of the third session data in the cache structure are queried to determine whether the cache structure holds third session data whose session identifier matches that of the second session data. Third session data corresponding to the second session data means that the session identifier of the second session data is the same as the session identifier of the third session data in the cache structure. The third session data stored in the cache structure uses its session identifier as the key.
Step S208: merge the second session data again with the corresponding third session data, and store the fourth session data obtained by this merge in the cache structure.
For example, if the cache structure holds third session data H corresponding to second session data G, G and H are merged again to obtain fourth session data Q. Q is then stored in the cache structure, overwriting H, so that the cache structure now holds only one piece of session data, Q, that belongs to the same rule and has the same session identifier as G.
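Steps S206, S208, and S212 together amount to a keyed look-up followed by either an overwrite or a plain store. One way to sketch this, assuming the cache structure is a Python dict keyed by session identifier and that merging is list concatenation (both assumptions, as is the name `merge_into_cache`):

```python
def merge_into_cache(cache, session_id, second, merge):
    """Look up third session data by session id; merge and overwrite it if found,
    otherwise store the second session data unchanged."""
    third = cache.get(session_id)                   # S206: query by session identifier
    if third is None:
        cache[session_id] = second                  # S212: nothing cached yet
    else:
        cache[session_id] = merge(third, second)    # S208: result Q overwrites H
    return cache[session_id]

concat = lambda h, g: h + g
cache = {"s1": ["H"]}
q = merge_into_cache(cache, "s1", ["G"], concat)          # H exists, so G is merged into it
stored = merge_into_cache(cache, "s2", ["G2"], concat)    # no match, so G2 is stored as-is
```

Because the merged result replaces the cached entry under the same key, the cache never holds more than one record per session identifier.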
Step S210: set a timer; if the timer reaches the cache merge period and the multiple pieces of first session data have all been merged, export all session data in the cache structure to the output file, overwriting the display data.
In this embodiment, the timer need not be set in step S210 itself; it may be set at any point in the execution before step S210. The timer is used to determine whether the cache merge period has been reached.
When the timer reaches the cache merge period and the first session data obtained in step S200 has all been merged, all session data in the cache structure is exported to the output file, overwriting the first piece of first session data written in step S202.
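The timer check of step S210 could be sketched as below. The class `PeriodTimer`, the function `maybe_export`, and the use of a dict to stand in for the output file are hypothetical; the clock is injectable only so that the behaviour can be exercised without waiting.

```python
import time

class PeriodTimer:
    """Reports once each time `period_s` seconds have elapsed since the last report."""
    def __init__(self, period_s, clock=time.monotonic):
        self.period_s = period_s
        self.clock = clock
        self.started = clock()

    def due(self):
        now = self.clock()
        if now - self.started >= self.period_s:
            self.started = now          # restart the period
            return True
        return False

def maybe_export(timer, merging_done, cache, output):
    """Overwrite `output` with all cached session data once merging has finished
    and the period has elapsed; otherwise leave `output` untouched."""
    if merging_done and timer.due():
        output.clear()                  # drop the earlier display data
        output.update(cache)
        return True
    return False
```

A caller would invoke `maybe_export` after each merge pass; the earlier display data is replaced only when both conditions of the step hold.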
Note that the session data in the cache structure may include second session data, third session data, and fourth session data. If the cache structure holds no third session data with the same session identifier as the second session data, the second session data is stored in the cache structure.
After the session data has been merged and output as described above, all session data in the output file (at least one of the second session data, the third session data, and the fourth session data) may be stored in a session database, and the session data in the session database may then be analyzed, applied, and so on. Two examples of analyzing and applying the session data follow.
Example one: user data volume statistics
Fifth session data is read from the cache structure or from the session database; it may be any of the second session data, third session data, and fourth session data described above. The fifth session data read is counted according to the hour-aligned time period to which it belongs. The user database is queried for the number of user data items that belong to the same hour-aligned time period and the same protocol as the fifth session data read, and the count is added to that number to give the user data volume statistic. The timestamp field in the fifth session data determines which hour-aligned time period the data belongs to. For example, if the timestamp field of a piece of fifth session data holds "1392515067621", that timestamp is the number of milliseconds elapsed since 1970.
For example, fifth session data P is read from the session database; P belongs to protocol X1 and to hour-aligned time period T10. The number L of pieces of fifth session data belonging to T10 that are read from the session database within a period K1 is recorded. The user database is queried for the number S of user data items that belong to the same protocol X1 as P and to T10. L and S are added to give the final user data volume statistic.
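The hour alignment and the L + S count of example one can be illustrated with the timestamp quoted above. The field names `ts_ms` and `protocol`, the in-memory lists standing in for the session and user databases, and the choice of UTC for the label are assumptions of this sketch.

```python
from datetime import datetime, timezone

HOUR_MS = 3_600_000

def hour_bucket(ts_ms):
    """Start of the hour-aligned period (ms since 1970) that contains ts_ms."""
    return ts_ms - ts_ms % HOUR_MS

def bucket_label(ts_ms):
    """Human-readable label of the hour-aligned period, in UTC (assumed time zone)."""
    start = datetime.fromtimestamp(hour_bucket(ts_ms) / 1000, tz=timezone.utc)
    return start.strftime("%Y-%m-%d %H:00 UTC")

def count_user_data(p, session_db, user_db):
    """L = session records in P's period; S = user records with P's protocol and period."""
    bucket = hour_bucket(p["ts_ms"])
    l_count = sum(1 for s in session_db if hour_bucket(s["ts_ms"]) == bucket)
    s_count = sum(1 for u in user_db
                  if u["protocol"] == p["protocol"] and hour_bucket(u["ts_ms"]) == bucket)
    return l_count + s_count

p = {"protocol": "X1", "ts_ms": 1392515067621}
session_db = [p,
              {"protocol": "X1", "ts_ms": 1392512400000},   # same hour as P
              {"protocol": "X1", "ts_ms": 1392516000000}]   # next hour
user_db = [{"protocol": "X1", "ts_ms": 1392513000000},      # same protocol and hour
           {"protocol": "X2", "ts_ms": 1392513000000},      # other protocol
           {"protocol": "X1", "ts_ms": 1392508800000}]      # previous hour
total = count_user_data(p, session_db, user_db)
```

With this data, L is 2 and S is 1, so `total` is 3; the timestamp 1392515067621 falls in the period starting 2014-02-16 01:00 UTC.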
Example two: user information extraction
Sixth session data is read from the cache structure or from the session database; it may be any of the second session data, third session data, and fourth session data described above. The sixth session data read is parsed with a preset regular expression to obtain user-related information, which includes at least one of: hardware and software information of a mobile terminal, virtual identity information, contact information, activity record information, and the like.
For example, a preset regular expression is applied to the protocol header of sixth session data of the HTTP protocol to extract the mobile terminal's manufacturer information, language information, browser information, operating system version information, and so on. The value of a given field in the sixth session data is extracted and parsed to obtain information on the software in the mobile terminal, application account information, application nickname information, and so on. The contact field in sixth session data of the telephone and SMS protocols is extracted to obtain contact information and the like.
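As an illustration of the HTTP case in example two, the sketch below applies regular expressions to a request header to pull out language and operating system version. The patterns, the sample request, the device name in it, and the function name `extract_user_info` are all hypothetical; the embodiment does not specify its preset regular expressions.

```python
import re

# "Name: value" header lines; the request line has no colon after its first word, so it is skipped.
HEADER_RE = re.compile(r"^(?P<name>[A-Za-z-]+):\s*(?P<value>.*)$", re.MULTILINE)
# Assumed pattern for an Android version inside a User-Agent string.
OS_RE = re.compile(r"Android\s+(?P<version>[\d.]+)")

def extract_user_info(raw_headers):
    """Parse header lines with a regular expression and pick out user-related fields."""
    headers = {m["name"].lower(): m["value"].strip()
               for m in HEADER_RE.finditer(raw_headers)}
    user_agent = headers.get("user-agent")
    os_match = OS_RE.search(user_agent or "")
    return {
        "language": headers.get("accept-language"),
        "user_agent": user_agent,
        "os_version": os_match["version"] if os_match else None,
    }

sample = (
    "GET /index HTTP/1.1\n"
    "Host: example.com\n"
    "User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; HypotheticalPhone X1) AppleWebKit/537.36\n"
    "Accept-Language: zh-CN,zh;q=0.8\n"
)
info = extract_user_info(sample)
```

Manufacturer or browser fields would need further patterns of the same kind, chosen to match the traffic actually observed.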
This embodiment also writes the first piece of first session data obtained to the output file, so the first session data can be displayed without waiting for it to be merged. This speeds up the display of first session data and improves the user experience.
Embodiment three
Referring to Fig. 3, a structural block diagram of a system for processing big data in a network environment according to Embodiment three of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: an acquisition module 30, configured to obtain multiple pieces of first session data that meet a predetermined condition and carry session identifiers; a merging module 32, configured to merge the first session data having the same session identifier, to obtain second session data corresponding to each session identifier; a cache merging module 34, configured to, if the cache merge cycle is currently reached, merge the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and a cache flush module 36, configured to, if the cache flush cycle is currently reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for output display.
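How the two cycles might drive the cache merging and cache flush modules can be sketched as below. This is an illustrative assumption, not the embodiment itself: the class name `SessionCache`, the explicit `tick(now, output)` call in place of real timers, and the decision to empty the cache after each flush are all hypothetical.

```python
class SessionCache:
    """Holds second session data until the merge cycle and cached data until the flush cycle."""

    def __init__(self, merge_cycle, flush_cycle):
        self.merge_cycle = merge_cycle
        self.flush_cycle = flush_cycle
        self.pending = {}  # second session data awaiting the next cache merge
        self.cache = {}    # cached (third, then fourth) session data
        self.last_merge = 0.0
        self.last_flush = 0.0

    def add(self, session_id, payload):
        """Accept second session data produced by the merging module."""
        self.pending.setdefault(session_id, []).extend(payload)

    def tick(self, now, output):
        # Cache merge cycle reached: fold pending data into the cached data.
        if now - self.last_merge >= self.merge_cycle:
            for session_id, payload in self.pending.items():
                self.cache.setdefault(session_id, []).extend(payload)
            self.pending.clear()
            self.last_merge = now
        # Cache flush cycle reached: write cached data out, then empty the cache.
        if now - self.last_flush >= self.flush_cycle:
            for session_id, payload in sorted(self.cache.items()):
                output.append((session_id, list(payload)))
            self.cache.clear()
            self.last_flush = now


out = []  # stands in for the output file
session_cache = SessionCache(merge_cycle=5, flush_cycle=10)
session_cache.add("s1", ["a"])
session_cache.tick(5, out)   # merge cycle reached, flush cycle not yet reached
session_cache.add("s1", ["b"])
session_cache.tick(10, out)  # both cycles reached
```

Shortening `merge_cycle` produces more, smaller merges; lengthening `flush_cycle` lets more data accumulate in the cache before it is written out.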
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Embodiment four
Referring to Fig. 4, a structural block diagram of a system for processing big data in a network environment according to Embodiment four of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: an acquisition module 40, configured to obtain multiple pieces of first session data that meet a predetermined condition and carry session identifiers; a merging module 41, configured to merge the first session data having the same session identifier, to obtain second session data corresponding to each session identifier; a cache merging module 42, configured to, if the cache merge cycle is currently reached, merge the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and a cache flush module 43, configured to, if the cache flush cycle is currently reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for output display.
Optionally, the system provided in this embodiment further includes: a database storage module 44, configured to store at least one of the second session data, the third session data, and the fourth session data in a session database.
Optionally, the system provided in this embodiment further includes: an output module 45, configured to, before the merging module 41 merges the first session data having the same session identifier to obtain second session data corresponding to each session identifier, write the first-acquired piece of first session data to the output file as display data and store that piece of first session data in a cache structure.
Optionally, the system provided in this embodiment further includes: a cache structure storage module 46, configured to store the fourth session data in the cache structure after the cache merging module 42 obtains the fourth session data corresponding to each session identifier.
Optionally, the system provided in this embodiment further includes: a user data quantity statistics module, configured to read fifth session data from the cache structure; count the fifth session data according to the hourly time segment to which the fifth session data belongs; query a user database for, and count, the quantity of user data that belongs to the same hourly time segment and the same protocol as the fifth session data; and add the count result to that quantity to obtain the user data quantity statistical result.
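The statistics step can be illustrated with a short Python sketch. It is only an assumption-laden example: the field names `timestamp` and `protocol`, the function names, and the use of a plain dictionary (`stored_counts`) in place of the user database query are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hour_segment(timestamp):
    """Truncate a timestamp to the start of its hourly time segment."""
    return timestamp.replace(minute=0, second=0, microsecond=0)


def count_user_data(cached_sessions, stored_counts):
    """Count cached sessions per (hour, protocol) and add the stored quantity."""
    fresh = Counter(
        (hour_segment(s["timestamp"]), s["protocol"]) for s in cached_sessions
    )
    # stored_counts stands in for the quantity queried from the user database.
    return {key: n + stored_counts.get(key, 0) for key, n in fresh.items()}


sessions = [
    {"timestamp": datetime(2017, 7, 6, 9, 15), "protocol": "http"},
    {"timestamp": datetime(2017, 7, 6, 9, 50), "protocol": "http"},
    {"timestamp": datetime(2017, 7, 6, 10, 5), "protocol": "smtp"},
]
stored = {(datetime(2017, 7, 6, 9), "http"): 40}
totals = count_user_data(sessions, stored)
```

Here two cached HTTP sessions in the 09:00 segment are added to the 40 already recorded for that segment and protocol.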
Optionally, the system provided in this embodiment further includes: a user information extraction module, configured to read sixth session data from the cache structure and parse the sixth session data according to a preset regular expression to obtain user-related information, where the user-related information includes at least one of: hardware and software information of a mobile terminal, virtual identity information, contact information, and activity record information.
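Regular-expression parsing of session data might look like the following sketch. The two patterns, the field names, and the sample session text are invented for illustration; the text does not disclose the actual preset regular expressions.

```python
import re

# Hypothetical preset patterns; the real expressions are not given in the text.
PATTERNS = {
    "terminal_info": re.compile(r"User-Agent:\s*(?P<value>[^\r\n]+)"),
    "virtual_identity": re.compile(
        r"account=(?P<value>[\w.-]+@[\w-]+(?:\.[\w-]+)+)"
    ),
}


def extract_user_info(session_text):
    """Apply each preset pattern and collect the fields that match."""
    info = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(session_text)
        if match:
            info[field] = match.group("value").strip()
    return info


sample = (
    "GET /login?account=alice@example.com HTTP/1.1\r\n"
    "User-Agent: ExampleBrowser/1.0 (Android 7.0)\r\n"
)
info = extract_user_info(sample)
```

Keeping the patterns in a table makes it straightforward to add expressions for contact information or activity records without changing the parsing loop.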
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Embodiment five
Referring to Fig. 5, a schematic structural diagram of a system for processing big data in a network environment according to Embodiment five of the present invention is shown.
The system for processing big data in a network environment of this embodiment includes: a data acquisition cluster 50, a data integration and temporary storage cluster 52, a data analysis and extraction cluster 54, a cache database cluster 56, and a shared resource manager cluster 58. The shared resource manager cluster 58 is configured to collect the health information and server status information of every server in each of the other clusters, so as to allocate resources effectively to the servers in each cluster, keep the system running normally, and improve system efficiency. The data acquisition cluster 50 is configured to read the big data in the network environment, filter the big data according to preset filtering protocol rules to obtain valid data, and push the valid data to the data integration and temporary storage cluster 52. The data integration and temporary storage cluster 52 is configured to integrate the valid data according to preset integration protocol rules to obtain multiple pieces of session data belonging to the same rule, and to store the integrated session data in the cache database cluster 56. The data analysis and extraction cluster 54 is configured to actively fetch the session data stored in the cache database cluster 56, merge the session data, and perform processing such as user data quantity statistics and user information extraction on the merged session data.
Fig. 6 is a detailed structural schematic diagram of the data analysis and extraction cluster 54. The grabbing component 541 is configured to fetch the session data stored in the cache database cluster 56 and hand it to the application component 542 for further processing. The application component 542 is configured to perform, according to actual needs, processing such as session data merging, user data quantity statistics, and user information extraction, and to store the merged session data in the cache structure. The grabbing component 541 consists of several grabbing units (grabbing unit 1, grabbing unit 2, ..., grabbing unit n), and the application component 542 consists of several application units (application unit 1, application unit 2, ..., application unit n). In practice, the session data fetched by a grabbing unit can be handed directly to the corresponding application unit.
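The pairing of grabbing units with application units can be sketched as below. This is an assumption-based illustration: the text does not say how stored session data is divided among grabbing units, so the sketch partitions by a CRC32 hash of the session identifier so that all data for one session reaches the same pair. Threads stand in for separate units, and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def grab(stored_sessions, unit, n_units):
    """Grabbing unit: fetch the stored session data assigned to this unit."""
    return [
        (sid, payload)
        for sid, payload in stored_sessions
        if zlib.crc32(sid.encode("utf-8")) % n_units == unit
    ]


def apply_unit(grabbed):
    """Application unit: merge the grabbed session data by session identifier."""
    merged = {}
    for sid, payload in grabbed:
        merged.setdefault(sid, []).extend(payload)
    return merged


def run_pairs(stored_sessions, n_units):
    """Run each grabbing unit with its own application unit in parallel."""
    def pair(unit):
        # The grabbed data goes directly to the corresponding application unit.
        return apply_unit(grab(stored_sessions, unit, n_units))

    with ThreadPoolExecutor(max_workers=n_units) as pool:
        return list(pool.map(pair, range(n_units)))


stored = [("s1", ["a"]), ("s2", ["x"]), ("s1", ["b"]), ("s3", ["q"])]
results = run_pairs(stored, n_units=2)
```

Because each session identifier maps to exactly one pair, no cross-unit coordination is needed when merging.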
The system for processing big data in a network environment of this embodiment is used to implement the corresponding method for processing big data in a network environment of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
It should be noted that, depending on implementation needs, each component/step described in the embodiments of the present invention may be split into more components/steps, and two or more components/steps, or partial operations of components/steps, may be combined into new components/steps to achieve the purpose of the embodiments of the present invention.
The above method according to the embodiments of the present invention may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that is originally stored on a remote recording medium or a non-transitory machine-readable medium, downloaded over a network, and stored on a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be understood that a computer, processor, microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, or flash memory) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, processor, or hardware, the processing method described here is implemented. In addition, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown here.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.
The above implementations are intended only to illustrate the embodiments of the present invention and not to limit them. Those of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention; all equivalent technical solutions therefore also fall within the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention shall be defined by the claims.
Claims (10)
- 1. A method for processing big data in a network environment, characterized by comprising: obtaining multiple pieces of first session data that meet a predetermined condition and carry session identifiers; merging the first session data having the same session identifier, to obtain second session data corresponding to each session identifier; if a cache merge cycle is currently reached, merging the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and/or, if a cache flush cycle is currently reached, writing at least one of the second session data, the cached third session data, and the fourth session data to an output file for output display.
- 2. The method according to claim 1, characterized in that the method further comprises: storing at least one of the second session data, the third session data, and the fourth session data in a session database.
- 3. The method according to claim 1, characterized in that, before the merging of the first session data having the same session identifier to obtain second session data corresponding to each session identifier, the method further comprises: writing the first-acquired piece of first session data to the output file as display data, and storing that piece of first session data in a cache structure.
- 4. The method according to claim 3, characterized in that, after the obtaining of fourth session data corresponding to each session identifier, the method further comprises: storing the fourth session data in the cache structure.
- 5. The method according to claim 4, characterized by further comprising: reading fifth session data from the cache structure; counting the fifth session data according to the hourly time segment to which the fifth session data belongs; querying a user database for, and counting, the quantity of user data that belongs to the same hourly time segment as the fifth session data and to the same protocol as the fifth session data; and adding the count result to the quantity, as a user data quantity statistical result.
- 6. The method according to claim 4 or 5, characterized by further comprising: reading sixth session data from the cache structure; and parsing the sixth session data according to a preset regular expression, to obtain user-related information; wherein the user-related information includes at least one of: hardware and software information of a mobile terminal, virtual identity information, contact information, and activity record information.
- 7. A system for processing big data in a network environment, characterized by comprising: an acquisition module, configured to obtain multiple pieces of first session data that meet a predetermined condition and carry session identifiers; a merging module, configured to merge the first session data having the same session identifier, to obtain second session data corresponding to each session identifier; a cache merging module, configured to, if a cache merge cycle is currently reached, merge the second session data with cached third session data having the same session identifier, to obtain fourth session data corresponding to each session identifier; and a cache flush module, configured to, if a cache flush cycle is currently reached, write at least one of the second session data, the cached third session data, and the fourth session data to an output file for output display.
- 8. The system according to claim 7, characterized in that the system further comprises: a database storage module, configured to store at least one of the second session data, the third session data, and the fourth session data in a session database.
- 9. The system according to claim 7, characterized in that the system further comprises: an output module, configured to, before the merging module merges the first session data having the same session identifier to obtain second session data corresponding to each session identifier, write the first-acquired piece of first session data to the output file as display data and store that piece of first session data in a cache structure.
- 10. The system according to claim 9, characterized in that the system further comprises: a cache structure storage module, configured to store the fourth session data in the cache structure after the cache merging module obtains the fourth session data corresponding to each session identifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710546811.5A CN107402980A (en) | 2017-07-06 | 2017-07-06 | A kind of processing method and system of big data under Network Environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107402980A true CN107402980A (en) | 2017-11-28 |
Family
ID=60405450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710546811.5A Pending CN107402980A (en) | 2017-07-06 | 2017-07-06 | A kind of processing method and system of big data under Network Environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402980A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104144069A (en) * | 2013-05-10 | 2014-11-12 | 中国电信股份有限公司 | Method and device for correlating wireless side call data records and user service behaviors |
CN104424219A (en) * | 2013-08-23 | 2015-03-18 | 华为技术有限公司 | Method and equipment of managing data documents |
CN104426713A (en) * | 2013-08-28 | 2015-03-18 | 腾讯科技(北京)有限公司 | Method and device for monitoring network site access effect data |
CN103970843A (en) * | 2014-04-28 | 2014-08-06 | 东华大学 | Conversation combining method based on UUID in Web log preprocessing |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241176A (en) * | 2018-07-10 | 2019-01-18 | 北京亿赛通科技发展有限责任公司 | The correlation analysis system and method for big data under a kind of Network Environment |
CN109241007A (en) * | 2018-07-19 | 2019-01-18 | 北京亿赛通网络安全技术有限公司 | The pretreatment system and method for email big data under a kind of network environment |
CN109241007B (en) * | 2018-07-19 | 2021-08-13 | 北京亿赛通网络安全技术有限公司 | System and method for preprocessing email big data in network environment |
CN111080448A (en) * | 2019-12-02 | 2020-04-28 | 深圳索信达数据技术有限公司 | Intention analysis method based on conversation |
CN111080448B (en) * | 2019-12-02 | 2024-03-26 | 深圳索信达数据技术有限公司 | Intent analysis method based on session |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106022708A (en) | Method for predicting employee resignation | |
CN103778148B (en) | Life cycle management method and equipment for data file of Hadoop distributed file system | |
CN107402980A (en) | A kind of processing method and system of big data under Network Environment | |
CN102662988B (en) | Method for filtering redundant data of RFID middleware | |
CN106651416A (en) | Analyzing method and analyzing device of application popularization information | |
US20060224682A1 (en) | System and method of screening unstructured messages and communications | |
CN107979477A (en) | A kind of method and system of business monitoring | |
CN102148805A (en) | Feature matching method and device | |
CN116737482A (en) | Method and device for collecting chip test data in real time and electronic equipment | |
CN115062087A (en) | User portrait construction method, device, equipment and medium | |
CN102801548A (en) | Intelligent early warning method, device and information system | |
CN111666308B (en) | Behavior analysis-based intelligent big data recommendation query method and system | |
CN103297419A (en) | Method and system for fusing off-line data and on-line data | |
CN101431760A (en) | Method and system for implementing business report | |
AU2019101198A4 (en) | A statistical analysis method of mobile telecom data driven user loss prediction | |
CN110677269B (en) | Method and device for determining communication user relationship and computer readable storage medium | |
CN109299132A (en) | SQL data processing method, system and electronic equipment | |
CN105786945B (en) | A kind of power information data efficient processing method based on data channel | |
CN109241176A (en) | The correlation analysis system and method for big data under a kind of Network Environment | |
CN107835190A (en) | A kind of malice SP orders check method | |
CN109429296A (en) | For terminal and the associated method, apparatus of internet information and storage medium | |
CN112256734A (en) | Big data processing method, device, system, equipment and storage medium | |
CN101827175A (en) | Method and system for storing sorted call bills by catalog | |
CN109241388A (en) | A kind of application programming interfaces behavior analysis method and system | |
CN105868197B (en) | A kind of statistical method and statistic device of call bill data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171128 |