WO2017000592A1 - Data processing method, apparatus, and system - Google Patents

Data processing method, apparatus, and system

Info

Publication number
WO2017000592A1
Authority
WO
WIPO (PCT)
Prior art keywords
signaling
interface
data
data storage
storage server
Prior art date
Application number
PCT/CN2016/076648
Other languages
English (en)
French (fr)
Inventor
陈世雄
李超
王佳
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017000592A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation

Definitions

  • The present invention relates to the field of communications, and in particular to a data processing method, apparatus, and system.
  • The mobile Internet brings operators both opportunities and challenges.
  • Signaling is the most basic and most important component of a communication network, reflecting all aspects of network quality and service provision. Operators therefore invest heavily in building signaling monitoring platforms.
  • A signaling monitoring platform serves traffic-oriented functional domains such as traffic tracking, network planning and optimization, and fault diagnosis. Providing a highly available signaling tracking platform is therefore a top priority.
  • A relational database can be used to store big data. For example, multiple pieces of associated data are stored in different data tables of different databases, and the relationships between the data stored in the different databases are recorded so that the data can be associated.
  • Actual test data shows that the common way for an application to insert data into, for example, a SQL Server database is to insert it directly (or indirectly) using Structured Query Language (SQL) statements. This method is slow: the fastest rate measured (when the original table is empty) is only 1000 records per second.
  • Storing data in different data tables of different databases and linking it through association relationships is a loose storage mode, and the relationships must be expressed through the relational database.
  • Storing data loosely and using associations to record data across different data tables greatly reduces the efficiency of data storage and, in turn, the efficiency of subsequent search and maintenance.
  • Embodiments of the invention provide a data processing method, apparatus, and system to solve at least the problem of low signaling storage efficiency in the related art.
  • A data processing method is provided, including: collecting signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), wherein the signaling is signaling of a user; acquiring a unique keyword of the user; and storing the signaling in a multi-level directory of a data storage server according to the unique keyword.
  • Collecting the signaling of the GGSN or the PGW includes: connecting to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling.
  • The interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication, authorization, and accounting (AAA) interface.
  • Acquiring the unique keyword of the user includes: acquiring an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN); and hashing the identifier to obtain the unique keyword.
  • The method further includes: generating the multi-level directory in the data storage server according to time.
  • The method further includes: detecting whether the multi-level directory contains a directory older than a preset time; and, when such a directory is detected, deleting it from the data storage server.
  • Storing the signaling in the multi-level directory of the data storage server according to the unique keyword includes: searching for the data storage server corresponding to the user according to the unique keyword; and storing the signaling in the multi-level directory of the data storage server corresponding to the user.
  • Storing the signaling in the multi-level directory of the data storage server corresponding to the user includes: acquiring a timestamp of the service message; generating a first identifier according to the timestamp and the unique keyword; acquiring the writer corresponding to the first identifier, wherein the writers are in one-to-one correspondence with the directories of the multi-level directory; and writing the signaling into the corresponding directory by means of the writer.
  • The data storage server includes a memory library and a file server, wherein the memory library is configured to store summary information of the signaling, the file server is configured to store file information of the signaling, and there is a mapping relationship between the summary information and the file information.
  • The method further includes: receiving a query instruction, where the query instruction includes a filter condition and the unique keyword; searching for the data storage server corresponding to the unique keyword; and querying data from the data storage server corresponding to the unique keyword according to the filter condition.
  • Querying data from the data storage server corresponding to the unique keyword according to the filter condition includes: traversing the multi-level directory of the data storage server corresponding to the unique keyword according to the filter condition; obtaining, from that multi-level directory, the data that satisfies the filter condition to produce a query result; determining whether the number of data rows in the query result exceeds a preset value; and, when it does, displaying the query result in batches.
  • A data processing apparatus is provided, including: a collection module configured to collect signaling of a GGSN or a PGW, wherein the signaling is signaling of a user; an acquiring module configured to acquire the unique keyword of the user; and a storage module configured to store the signaling in the multi-level directory of the data storage server according to the unique keyword.
  • The collection module includes: a signaling collector configured to connect to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication, authorization, and accounting (AAA) interface.
  • The acquiring module includes: an acquiring unit configured to obtain an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN); and an operation unit configured to hash the identifier to obtain the unique keyword.
  • the foregoing apparatus further includes: a generating module, configured to generate a multi-level directory in the data storage server according to time.
  • The storage module includes: a searching unit configured to search for the data storage server corresponding to the user according to the unique keyword; and a storage unit configured to store the signaling in the multi-level directory of the data storage server corresponding to the user.
  • A data processing system is provided, including: a data collection server configured to collect signaling of a GGSN or a PGW, where the signaling is signaling of a user; and a data storage server connected to the data collection server, wherein the data storage server comprises a multi-level directory, and the multi-level directory is used to store the signaling.
  • The data storage server includes a memory library and a file server, wherein the memory library is configured to store summary information of the signaling, the file server is configured to store file information of the signaling, and there is a mapping relationship between the summary information and the file information.
  • The data collection server includes a probe signaling collector, and the probe signaling collector is connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling.
  • The interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication, authorization, and accounting (AAA) interface.
  • The data collection server further includes a processing module connected to the probe signaling collector and configured to parse the signaling collected by the probe signaling collector to obtain the summary information and the file information, and to send the summary information and the file information to the memory library and the file server, respectively.
  • the data processing system further includes: a query server, connected to the data storage server, configured to query the signaling from the data storage server.
  • A computer storage medium is also provided, which may store executable instructions.
  • By collecting the signaling of the GGSN or the PGW, wherein the signaling is signaling of a user, acquiring the unique keyword of the user, and storing the signaling in the multi-level directory of the data storage server according to the unique keyword, the embodiments solve the problem of low signaling storage efficiency in the related art and thereby improve signaling storage efficiency.
  • FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a multi-level directory according to an embodiment of the present invention.
  • FIG. 3 is a flow chart of writing data to a memory library according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of retrieving data from a memory library according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the hierarchy of information retrieved from a memory library according to an embodiment of the present invention.
  • FIG. 6 is a block diagram showing the structure of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a block diagram showing the structure of a data processing system according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a memory library retrieval data system deployment according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: collect signaling of the Gateway General Packet Radio Service Support Node (GGSN) or the Public Data Network Gateway (PGW), wherein the signaling is signaling of a user.
  • User signaling can be collected through each interface of the GGSN or the PGW, and there may be one or more users.
  • Collecting the signaling of the GGSN or the PGW includes: connecting to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, wherein the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication, authorization, and accounting (AAA) interface.
  • The probe signaling collector can be connected to each interface of the GGSN or the PGW by optical port mirroring, so that the signaling of each interface of the GGSN or the PGW can be collected in real time.
  • Collecting the signaling of the GGSN or PGW interfaces by optical port mirroring avoids affecting the normal operation of those interfaces while the signaling is being collected.
  • Step S104: acquire a unique keyword of the user.
  • Because a large number of users exist in the network element, each user corresponds to one unique keyword in the embodiment of the present invention.
  • Acquiring the unique keyword of the user includes: obtaining an identifier of the user, wherein the identifier comprises an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber International Integrated Services Digital/Public Switched Telephone Network Number (MSISDN); and hashing the identifier to obtain the unique keyword.
  • Each user in the network element has a corresponding IMSI or MSISDN. The IMSI or MSISDN corresponding to the user is hashed to obtain a hash value, and the hash value is used as the unique keyword to facilitate fast storage and fast search of each user's signaling.
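  • The hashing step can be sketched as follows. The text does not name a hash function or a keyword length, so the use of SHA-1, the 16-character truncation, and the function name `unique_keyword` are all assumptions made for illustration.

```python
import hashlib


def unique_keyword(identifier: str) -> str:
    """Hash an IMSI or MSISDN string into a fixed-length hexadecimal keyword.

    SHA-1 and the 16-character truncation are illustrative assumptions;
    the source only states that the identifier is hashed.
    """
    digest = hashlib.sha1(identifier.encode("ascii")).hexdigest()
    return digest[:16]


# The same identifier always yields the same keyword, so the keyword can
# serve as a stable file name and lookup key for that user's signaling.
keyword = unique_keyword("8613800000000")
```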
  • Step S106 The foregoing signaling is stored in the multi-level directory of the data storage server according to the unique keyword.
  • the embodiment of the present invention may create a multi-level directory in the data storage server in advance, or may dynamically generate a multi-level directory in the data storage server during the process of storing the signaling to the data storage server.
  • In the embodiment of the present invention, the user's signaling is stored in a file in a multi-level directory of the data storage server, for example in a file named after the unique keyword.
  • the method further includes: generating a multi-level directory in the data storage server according to time.
  • FIG. 2 is a schematic diagram of the multi-level directory according to the embodiment of the present invention. As shown in FIG. 2, the multi-level directory is generated according to the year, month, day, hour, and minute, and user signaling is stored in the corresponding directory according to time.
  • For example, if signaling 1 is collected at 12:20 on December 30, 2014, signaling 1 can be stored in the file named after the unique keyword in the 20-minute directory shown in FIG. 2.
  • The number of levels of the multi-level directory may be determined according to the amount of data. For example, when the amount of data is small, the hour can be used as the leaf directory, giving a 4-level directory; when the amount of data is large, the minute can be used as the leaf directory, giving a 5-level directory.
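  • A minimal sketch of how a timestamp could map to a leaf directory at either depth. The root path `/data`, the zero-padded directory names, and the function name `leaf_directory` are assumptions; the source specifies only the year/month/day/hour/minute hierarchy.

```python
from datetime import datetime
from pathlib import PurePosixPath


def leaf_directory(root: str, ts: datetime, levels: int = 5) -> PurePosixPath:
    """Return the time-based leaf directory for a timestamp.

    levels=4 stops at the hour, levels=5 goes down to the minute.
    """
    if levels not in (4, 5):
        raise ValueError("levels must be 4 (hour leaf) or 5 (minute leaf)")
    parts = [
        f"{ts.year:04d}",
        f"{ts.month:02d}",
        f"{ts.day:02d}",
        f"{ts.hour:02d}",
        f"{ts.minute:02d}",
    ]
    return PurePosixPath(root, *parts[:levels])


# Signaling collected at 12:20 on December 30, 2014:
minute_leaf = leaf_directory("/data", datetime(2014, 12, 30, 12, 20))
hour_leaf = leaf_directory("/data", datetime(2014, 12, 30, 12, 20), levels=4)
```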
  • In the embodiment of the present invention, the user's signaling is stored in the multi-level directory of the data storage server according to the unique keyword. Compared with storing the user's signaling in a database, the storage speed is faster, which solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.
  • The method further includes: detecting whether the multi-level directory contains a directory older than the preset time; and, when such a directory is detected, deleting it from the data storage server.
  • Because the signaling of users in the network element is highly time-sensitive, monitoring network element users usually only requires analyzing user signaling from the most recent period.
  • User signaling that has been stored for longer than the preset time can therefore be deleted, which saves memory on the one hand and facilitates fast retrieval of user signaling on the other.
  • The preset time can be set according to the actual situation. For example, if the preset time is set to 7 days, directories older than 7 days can be deleted directly from the data storage server: a check can run once a day, and any directory older than 7 days is deleted based on its time alone, without inspecting the contents of its files.
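  • The daily cleanup can be sketched as below, assuming the directory layout is `root/year/month/day/...` with numeric directory names. The function name `delete_expired_days` and the choice to delete at day granularity are illustrative assumptions, not details from the source.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def delete_expired_days(root: Path, now: datetime, keep_days: int = 7) -> list:
    """Delete day directories older than keep_days, judging only by their names."""
    cutoff = (now - timedelta(days=keep_days)).date()
    removed = []
    for day_dir in sorted(root.glob("[0-9]*/[0-9]*/[0-9]*")):
        if not day_dir.is_dir():
            continue
        try:
            year, month, day = (int(p) for p in day_dir.relative_to(root).parts)
            day_date = datetime(year, month, day).date()
        except ValueError:
            continue  # not a year/month/day directory, leave it alone
        if day_date < cutoff:
            shutil.rmtree(day_dir)  # file contents are never inspected
            removed.append(day_dir)
    return removed
```

  • Because the age is read from the path itself, the cost of the check depends on the number of day directories rather than on the volume of stored signaling.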
  • Storing the signaling in the multi-level directory of the data storage server according to the unique keyword comprises: searching for the data storage server corresponding to the user according to the unique keyword; and storing the signaling in the multi-level directory of the data storage server corresponding to the user.
  • The user's unique keyword may be associated in advance with its corresponding data storage server. The data storage server corresponding to the user can then be found through the user's unique keyword, and the user's signaling is stored in the multi-level directory of that data storage server, which facilitates subsequent fast retrieval of the user's signaling.
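  • The text says only that the unique keyword is associated with a data storage server in advance. One simple way to realize such an association, shown here purely as an assumption, is to take the hexadecimal keyword modulo the number of servers; the server names and the function name `server_for` are hypothetical.

```python
from typing import Sequence


def server_for(keyword_hex: str, servers: Sequence[str]) -> str:
    """Pick the data storage server for a hexadecimal unique keyword.

    Modulo placement is an illustrative assumption; any fixed
    keyword-to-server mapping would satisfy the description.
    """
    if not servers:
        raise ValueError("at least one data storage server is required")
    return servers[int(keyword_hex, 16) % len(servers)]


# Collection and query sides use the same rule, so both reach the same server.
target = server_for("0a", ["ds0", "ds1", "ds2"])
```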
  • Storing the signaling in the multi-level directory of the data storage server corresponding to the user includes: acquiring a timestamp of the service message; generating a first identifier according to the timestamp and the unique keyword; acquiring the writer corresponding to the first identifier, wherein the writers are in one-to-one correspondence with the directories of the multi-level directory; and writing the signaling into the corresponding directory by means of the writer.
  • The service message is the user's signaling, and the first identifier is generated according to the timestamp and the unique keyword.
  • The first identifier is used to look up the writer. Once the writer corresponding to the first identifier is found, it is used to write to the corresponding memory file (that is, the file stored in the multi-level directory). Because the first identifier includes a timestamp, periodic writing (for example, once per second) can be achieved without a timer: when the timestamp changes, the first identifier necessarily changes as well, and a new writer is created. When the real-time requirement is high, for example 1 second, the file is forcibly written once per second whether or not the cache is full, which achieves the purpose of timed writing.
  • The data storage server includes a memory library and a file server, wherein the memory library is configured to store summary information of the signaling, the file server is configured to store file information of the signaling, and there is a mapping relationship between the summary information and the file information.
  • the embodiment of the present invention adopts a distributed storage method, and stores summary information of signaling and file information of signaling in a memory library and a file server, respectively.
  • The summary information and the file information of the signaling can be obtained by parsing the signaling. The summary information includes Uniform Resource Locator (URL) information of the signaling file and URL information of the media file, and the file information includes the detailed signaling file and media file. In the embodiment of the present invention, the corresponding signaling file can be obtained through the URL information of the signaling file, and the corresponding media file can be obtained through the URL information of the media file. The retrieval process therefore only needs to search the summary information in the memory library to obtain the corresponding file information.
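  • The mapping between summary information and file information can be pictured with the sketch below. The record fields, the class names `SummaryRecord` and `SummaryIndex`, and the example URLs are hypothetical; the source states only that the summary carries the URLs of the signaling file and the media file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SummaryRecord:
    """Summary information kept in the memory library (illustrative fields)."""
    key1: str           # unique keyword of the user
    signaling_url: str  # where the detailed signaling file lives
    media_url: str      # where the media file lives


class SummaryIndex:
    """In-memory index from unique keyword to summary records."""

    def __init__(self) -> None:
        self._by_key: Dict[str, List[SummaryRecord]] = {}

    def add(self, record: SummaryRecord) -> None:
        self._by_key.setdefault(record.key1, []).append(record)

    def file_urls(self, key1: str) -> List[Tuple[str, str]]:
        # Only the summary is searched; the files are fetched later by URL.
        return [(r.signaling_url, r.media_url) for r in self._by_key.get(key1, [])]


index = SummaryIndex()
index.add(SummaryRecord("k1", "file:///sig/k1.il", "file:///media/k1.bin"))
```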
  • The method further includes: receiving a query instruction, wherein the query instruction includes a filter condition and the unique keyword; searching for the data storage server corresponding to the unique keyword; and querying data from the data storage server corresponding to the unique keyword according to the filter condition.
  • The user signaling stored in the data storage server may be queried. The query instruction may include the unique keyword, so that the user's signaling can be retrieved quickly from the data storage server through the unique keyword.
  • Querying data from the data storage server corresponding to the unique keyword according to the filter condition comprises: traversing the multi-level directory of the data storage server corresponding to the unique keyword according to the filter condition; obtaining, from that multi-level directory, the data that satisfies the filter condition to produce a query result; determining whether the number of data rows in the query result exceeds a preset value; and, when it does, displaying the query result in batches.
  • Embodiments of the present invention may reduce the retrieval depth on the server according to user query habits (for example, the maximum number of data rows that the user views at a time). Specifically, the number of query result rows displayed each time may be set. When the query result contains more rows than this preset number (that is, the preset value), the query result is displayed in batches.
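  • Batched display of a query result can be sketched with a small generator. The function name `batches` and the batch size used in the example are assumptions for illustration.

```python
from typing import Iterator, List, Sequence


def batches(rows: Sequence[str], preset_rows: int) -> Iterator[List[str]]:
    """Yield the query result in chunks of at most preset_rows rows."""
    if preset_rows <= 0:
        raise ValueError("preset_rows must be positive")
    for start in range(0, len(rows), preset_rows):
        yield list(rows[start:start + preset_rows])


# A 5-row result with a preset value of 2 is shown as three batches.
pages = list(batches(["r1", "r2", "r3", "r4", "r5"], 2))
```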
  • The embodiment of the present invention does not use any commercial database to achieve fast storage and query of massive data. Instead, it uses a tree-type storage structure to store user signaling in a memory library. The data file format is configurable; for example, it can be described in TLV (a data format consisting of three fields: type, length, and value), and the related data dictionary can be defined in an Extensible Markup Language (XML) file for use during storage and query.
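  • A minimal sketch of a TLV record codec. The field widths (2-byte type, 2-byte length, big-endian) and the type codes in the example are assumptions; the source says only that a record consists of type, length, and value fields.

```python
import struct
from typing import List, Tuple

HEADER = ">HH"  # assumed layout: 2-byte type, 2-byte length, big-endian
HEADER_SIZE = struct.calcsize(HEADER)


def encode_tlv(fields: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type, value) pairs into a TLV byte string."""
    out = bytearray()
    for field_type, value in fields:
        out += struct.pack(HEADER, field_type, len(value))
        out += value
    return bytes(out)


def decode_tlv(data: bytes) -> List[Tuple[int, bytes]]:
    """Parse a TLV byte string back into (type, value) pairs."""
    fields = []
    offset = 0
    while offset < len(data):
        field_type, length = struct.unpack_from(HEADER, data, offset)
        offset += HEADER_SIZE
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        fields.append((field_type, value))
        offset += length
    return fields


# Hypothetical type codes: 1 = MSISDN, 2 = message type.
record = encode_tlv([(1, b"8613800000000"), (2, b"\x00\x01")])
```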
  • A unique keyword KEY1 is configured for each user's signaling. KEY1 is used as the file name when the file is generated and is used to match the corresponding memory library DS SERVER during a query.
  • The embodiment of the present invention adopts a distributed networking architecture, that is, multiple signaling collection modules AGENT and multiple memory libraries DS SERVER are deployed in the network.
  • The signaling collection modules AGENT and the memory libraries DS SERVER are associated with each other through the MSISDN hash value used as the unique keyword KEY1. Query requests from the query server WEB SERVER are likewise forwarded to a memory library DS SERVER according to the hash value of the unique keyword KEY1 in the query condition. The parallel processing nodes share the protocol packets produced by the GGSN or PGW network element.
  • Writing data to the memory library includes the following steps:
  • Step S301: the signaling collection module constructs a TLV record, uses the hash of the MSISDN as the unique keyword KEY1 to determine the corresponding memory library, adds KEY1 to the TLV record, and sends the record.
  • The signaling collection module AGENT collects signaling and parses it, for example by constructing a TLV record, where TLV refers to a data format consisting of three fields: type, length, and value. It takes the hash of the MSISDN as the unique keyword KEY1, uses KEY1 to determine the memory library to send to, and adds KEY1 to the TLV record.
  • Step S302: the memory library receives the TLV record and constructs the first identifier KEY2, which is composed of KEY1 and the timestamp of the service message in second format or hour format.
  • Step S303: the writer corresponding to KEY2 is looked up. If the lookup succeeds, step S306 is performed; if it fails, step S304 is performed.
  • Step S304: a failed lookup means that the refresh time has arrived or a new MSISDN has joined. The current writers are closed in batches (256 writers per batch), and when a writer is closed its cache is forcibly written to the memory disk.
  • In other words, when the writer corresponding to KEY2 is not found, either the refresh time has been reached or a new MSISDN has joined, and the current writer needs to be closed.
  • Step S305: a writer corresponding to KEY2 is created, and the writer creates a new file in the leaf directory corresponding to the minute value or hour value of the current system time.
  • Creating a writer creates the corresponding time leaf directory and file, as well as a cache.
  • Data written by the writer first enters the cache.
  • The data is written to the file only when the cache is full, and the file is stored on the memory virtual disk.
  • Data files of the same MSISDN have the same name, so data files with the same file name appear in different time directories.
  • Step S306: the record is written to the cache of the corresponding writer.
  • Step S307: it is determined whether the cache of the writer is full. If it is full, step S308 is performed; if it is not full, the process returns to step S301 to process the next record.
  • Step S308: the data in the writer's cache is written to the file, and the process returns to step S301.
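  • The write path of steps S302 to S308 can be sketched as below. This is a simplified illustration under several assumptions: KEY2 uses the second format, the leaf directory is the minute directory, stale writers are closed one by one rather than in batches of 256, an ordinary directory stands in for the memory disk, and the class names `Writer` and `MemoryLibrary` are hypothetical. The `.il` extension follows the file name used in the query steps.

```python
import os
from datetime import datetime


class Writer:
    """Buffers records in a cache and appends them to one file."""

    def __init__(self, path, cache_limit):
        os.makedirs(os.path.dirname(path), exist_ok=True)  # time leaf directory
        self.path = path
        self.cache = []
        self.cache_limit = cache_limit

    def write(self, record):
        self.cache.append(record)                # S306: write to the cache
        if len(self.cache) >= self.cache_limit:  # S307: is the cache full?
            self.flush()                         # S308: write cache to file

    def flush(self):
        if self.cache:
            with open(self.path, "ab") as fh:
                fh.write(b"".join(self.cache))
            self.cache.clear()


class MemoryLibrary:
    def __init__(self, root, cache_limit=1024):
        self.root = root
        self.cache_limit = cache_limit
        self.writers = {}

    def store(self, key1, timestamp, record):
        bucket = timestamp.strftime("%Y%m%d%H%M%S")
        key2 = (key1, bucket)                    # S302: KEY1 plus timestamp
        writer = self.writers.get(key2)          # S303: look up the writer
        if writer is None:
            self.close_stale(bucket)             # S304: close current writers
            leaf = timestamp.strftime("%Y/%m/%d/%H/%M").split("/")
            path = os.path.join(self.root, *leaf, key1 + ".il")
            writer = Writer(path, self.cache_limit)  # S305: create the writer
            self.writers[key2] = writer
        writer.write(record)

    def close_stale(self, current_bucket):
        # Closing a writer forces its cache out to the file.
        for key2 in [k for k in self.writers if k[1] != current_bucket]:
            self.writers.pop(key2).flush()

    def close_all(self):
        for writer in self.writers.values():
            writer.flush()
        self.writers.clear()
```

  • Because the second-level timestamp is part of KEY2, a record arriving in a new second misses the lookup and triggers the flush of older writers, which gives the timed write described above without a separate timer.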
  • retrieving data from a memory library includes the following steps:
  • step S401 the query server WEB SERVER accepts the query request of the user, and finds the corresponding memory bank DS SERVER according to KEY1.
  • the TLV data defines a data dictionary by CHRMAP; the PATCHMAP defines key information of the TLV data, for example, the index of KEY1; FILTERMAP defines all the filtering conditions.
  • Step S402 the memory library DS SERVER receives the query request of the query server, and finds the start time according to KEY1. STARTTIME, end time ENDTIME, and other business field filter values, construct filter FILTERMAP to initiate a query request.
  • step S403 it is determined whether the time type is hour or minute. If it is determined that the time type is hour, step S404 is performed, and if it is determined that the time type is minute, step S405 is performed.
  • Step S404 traversing the minute directory in the time range according to STARTTIME and ENDTIME, the search depth is 5 levels: year/month/day/hour/minute/, obtaining the URL list of the level 5 directory, and executing step S406.
  • Step S405 traversing the hour directory in the time range according to STARTTIME and ENDTIME, the search depth is 4 levels: year/month/day/hour/, obtaining the URL list of the level 4 directory, and executing step S406.
  • Step S406 traversing the same resource locator URL list of the time directory, determining whether the KEY1.il file exists in the directory, if not, executing step S406 to continue traversing, and if yes, executing step S407.
  • step S407 the file is processed line by line, and each row of data is filtered according to the set filter FILTERMAP, and only valid result data is cached.
  • step S408 it is determined whether the query result queue exceeds the preset number of result rows. If not, step S409 is performed, and if yes, step S411 is performed, and the query ends.
  • step S409 it is determined whether the file is at the end of the file. If the file is not at the end of the file, step S407 is performed, and if the file is at the end, step S410 is performed.
  • step S410 it is determined whether the end of the directory list is reached. If the end of the list is not reached, step S406 is performed to take a time directory processing. To the end of the directory list, step S411 is directly executed, and the query ends.
  • step S411 the results are sorted according to the start time, and the query result is sent to the query server WEB SERVER.
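  • The retrieval steps S402 to S411 can be sketched roughly as follows. This is a minimal illustration rather than the implementation of the embodiment: the function names, the plain-text record layout (each line assumed to begin with an ISO 8601 start time, so that a lexicographic sort orders records by start time), and the use of a Python callable in place of FILTERMAP are all assumptions. The sketch stops as soon as the preset number of rows is reached.

```python
import os
from datetime import datetime, timedelta


def time_directories(root, start, end, time_type="minute"):
    """Yield leaf directory paths covering [start, end] (steps S404/S405)."""
    if time_type == "minute":
        step, fmt = timedelta(minutes=1), "%Y/%m/%d/%H/%M"   # 5 levels
        current = start.replace(second=0, microsecond=0)
    else:
        step, fmt = timedelta(hours=1), "%Y/%m/%d/%H"        # 4 levels
        current = start.replace(minute=0, second=0, microsecond=0)
    while current <= end:
        yield os.path.join(root, *current.strftime(fmt).split("/"))
        current += step


def query(root, key1, start, end, row_filter, max_rows, time_type="minute"):
    """Collect at most max_rows matching records, sorted by start time."""
    results = []
    for directory in time_directories(root, start, end, time_type):
        path = os.path.join(directory, f"{key1}.il")         # S406
        if not os.path.isfile(path):
            continue                                         # next directory
        with open(path, encoding="utf-8") as handle:
            for line in handle:                              # S407
                record = line.rstrip("\n")
                if row_filter(record):                       # stand-in for FILTERMAP
                    results.append(record)
                if len(results) >= max_rows:                 # S408
                    return sorted(results)                   # S411
    return sorted(results)                                   # S410 -> S411
```

  • Because only the directories inside the requested time range are visited and only the file named after KEY1 is opened in each of them, the amount of data read depends on the time range and the row limit, not on the total volume stored.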
  • FIG. 5 is a hierarchical diagram of memory library retrieval information according to an embodiment of the present invention.
  • the embodiment of the present invention provides a tree-type storage structure.
  • the signaling tracking involves a plurality of media files, signaling files, and the like.
  • the memory library in the embodiment of the present invention stores the summary of the information, which is the uppermost layer of data and also the data that is fastest to store and query. The summary information contains the URL information of the signaling and media files involved in a service process.
  • the client's presentation of the signaling process only needs to associate the information stored in the memory library with the file content of the corresponding URL.
  • the large number of media files and signaling files are likewise stored in a directory structure whose leaf nodes are minute directories, in the same way as in the memory library, and the records in the memory library are used to manage these files and signaling procedures.
  • the distributed big data fast storage strategy of the embodiment can provide different response speeds according to user configuration, share network traffic evenly, and improve system processing capability and reliability. For example, the Intel DPDK stream processing framework is used for data collection, and memory disk technology together with the distributed big data storage and query system resolves the conflict between generating a large number of data files and querying them in a timely manner, providing the ability to insert 100,000 records per second in real time and to query quickly in real time.
  • it can adapt to the business demand of large data volume, and the network element shares the entire network service load in parallel and improves the service processing performance of the network.
  • when the communication link of a network element is interrupted or faulty, other network elements in the distributed network take over the service of that network element, so the operation of the entire network is not interrupted, thereby ensuring network stability and reliability.
  • the method according to the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
  • a data processing device is also provided, which is used to implement the above-mentioned embodiments and preferred embodiments, and will not be described again.
  • as used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 6 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes a collection module 62, an obtaining module 64, and a storage module 66.
  • the collecting module 62 is configured to collect signaling of the gateway general packet radio service support node GGSN or the public data network gateway PGW, wherein the signaling is user signaling;
  • the embodiment of the present invention may collect user signaling by monitoring each interface of the GGSN or the PGW, where the user may be one or multiple.
  • the foregoing collection module 62 includes a signaling collector, which is connected by optical port mirroring to an interface of the gateway general packet radio service support node or the public data network gateway to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication, authorization, and accounting (AAA) interface.
  • the obtaining module 64 is configured to obtain the unique keyword of the user.
  • because a large number of users exist in the network element, each user corresponds to a unique keyword in the embodiment of the present invention.
  • the obtaining module 64 includes: an obtaining unit, configured to acquire an identifier of the user, where the identifier comprises an international mobile subscriber identity (IMSI) or a mobile subscriber integrated services digital network number (MSISDN); and an operation unit, configured to hash the identifier to obtain the unique keyword.
  • Each user in the network element has a corresponding IMSI or MSISDN. The IMSI or MSISDN of the user is hashed to obtain a hash value, and this hash value is used as the unique keyword to facilitate fast storage and fast search of the user's subsequent signaling.
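  • As an illustration of deriving the unique keyword from the user identifier, the following minimal sketch hashes an IMSI or MSISDN string. The source does not name the hash function, so the use of SHA-256 and the truncation to 16 hexadecimal characters are assumptions made only for this example, as is the function name.

```python
import hashlib


def unique_keyword(identifier: str) -> str:
    """Return a fixed-length keyword derived from an IMSI or MSISDN.

    SHA-256 and the 16-character truncation are illustrative choices;
    the patent text only states that the identifier is hashed.
    """
    digest = hashlib.sha256(identifier.encode("ascii")).hexdigest()
    return digest[:16]
```

  • The same identifier always produces the same keyword, which is the property that storage and later retrieval both rely on.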
  • the storage module 66 is configured to store the signaling to the multi-level directory of the data storage server according to the unique keyword.
  • the embodiment of the present invention may create a multi-level directory in the data storage server in advance, or may dynamically generate a multi-level directory in the data storage server during the process of storing the signaling to the data storage server.
  • the embodiment of the present invention collects the signaling of the gateway general packet radio service support node GGSN or the public data network gateway PGW through the collection module 62, wherein the signaling is the signaling of the user; the obtaining module 64 acquires the unique keyword of the user; The storage module 66 is configured to store the signaling to the multi-level directory of the data storage server according to the unique keyword.
  • the user's signaling is stored in the database, the storage speed is faster, and the problem of low signaling storage efficiency in the related art is solved, thereby achieving the effect of improving signaling storage efficiency.
  • the foregoing apparatus before storing the foregoing signaling in the multi-level directory of the data storage server according to the above-mentioned unique keyword, the foregoing apparatus further includes: a generating module, configured to generate a multi-level directory in the data storage server according to time.
  • a tree-type multi-level directory is generated according to year, month, day, hour, and minute, where year is the root directory and minutes is the leaf directory.
  • the number of levels of the multi-level directory can be determined according to the amount of data. For example, when the amount of data is small, the hour can be used as the leaf directory, giving a 4-level directory; when the amount of data is large, the minute can be used as the leaf directory, giving a 5-level directory.
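  • The time-based directory tree can be sketched as below. This is only an illustration: the function names and the zero-padded directory names are assumptions, and `os.makedirs` is used here to show the "dynamically generated" variant in which the directory is created on first use.

```python
import os
from datetime import datetime


def leaf_directory(root, when, levels=5):
    """Build the leaf path: year/month/day/hour[/minute] under root."""
    if levels not in (4, 5):
        raise ValueError("levels must be 4 (hour leaf) or 5 (minute leaf)")
    parts = when.strftime("%Y/%m/%d/%H/%M").split("/")
    return os.path.join(root, *parts[:levels])


def ensure_leaf_directory(root, when, levels=5):
    """Create the leaf directory on demand and return its path."""
    path = leaf_directory(root, when, levels)
    os.makedirs(path, exist_ok=True)
    return path
```

  • Choosing `levels=4` keeps fewer, larger leaf directories for small data volumes, while `levels=5` spreads a large volume over minute directories.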
  • the storage module 66 includes: a searching unit configured to search the data storage server corresponding to the user according to the unique keyword; and a storage unit configured to store the signaling to multiple levels of the data storage server corresponding to the user In the directory.
  • the user's unique keyword may be associated in advance with the corresponding data storage server. The data storage server corresponding to the user can then be found through the user's unique keyword, and the user's signaling is stored in the multi-level directory of that data storage server, which facilitates subsequent fast retrieval of the user's signaling.
  • a data processing system is also provided in this embodiment.
  • FIG. 7 is a block diagram showing the structure of a data processing system in accordance with an embodiment of the present invention. As shown in FIG. 7, the data processing system includes a data collection server 72 and a data storage server 74.
  • the data collection server 72 is configured to collect signaling of the gateway general packet radio service support node GGSN or the public data network gateway PGW, wherein the signaling is user signaling.
  • the data collection server includes a probe signaling collector, and the probe signaling collector is connected to the interface of the general packet radio service support node or the public data network gateway by optical port mirroring to collect the signaling.
  • the foregoing interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an authentication and authorization charging AAA interface.
  • collecting the signaling of the GGSN or PGW interfaces by optical port mirroring avoids affecting the normal operation of those interfaces while the signaling is being collected.
  • the data storage server 74 is connected to the data collection module, wherein the data storage server includes a multi-level directory, and the multi-level directory is used to store the signaling.
  • the embodiment of the present invention collects the signaling of the gateway general packet radio service support node GGSN or the public data network gateway PGW through the data collection server 72.
  • the data storage server includes a memory library and a file server, wherein the memory library is configured to store summary information of the signaling, and the file server is configured to store file information of the signaling, the summary information and the file information. There is a mapping relationship between them.
  • the summary information of the signaling includes the uniform resource locator (URL) information of the signaling file and the URL information of the media file, and the file information of the signaling includes the detailed signaling file and the media file. In the embodiment of the present invention, the corresponding signaling file can be obtained through the URL information of the signaling file, and the corresponding media file can be obtained through the URL information of the media file. Therefore, during retrieval, only the summary information of the signaling needs to be retrieved from the memory library in order to obtain the corresponding file information.
  • the data collection server further includes a processing module, connected to the probe signaling collector and configured to parse the signaling collected by the probe signaling collector to obtain the summary information and the file information, and to send the summary information and the file information to the memory library and the file server, respectively.
  • the embodiment of the present invention adopts a distributed storage method, and stores summary information of signaling and file information of signaling in a memory library and a file server, respectively.
  • the processor of the data collection server parses the signaling to obtain the summary information of the signaling and the file information of the signaling, and sends the summary information and the file information to the memory library and the file server, respectively.
  • the data processing system further includes: a query server, connected to the data storage server, configured to query the signaling from the data storage server.
  • the query server is configured to query the data storage server for signaling of the network element user to implement monitoring of the network element user.
  • FIG. 8 is a schematic diagram of a memory library retrieval data system deployment according to an embodiment of the present invention.
  • the memory library retrieval data system includes multiple signaling collection modules (signaling collection module 1 to signaling collection module m), which are connected to each interface of the GGSN or the PGW to collect user signaling; multiple memory libraries (memory library 1 to memory library n); a query server; and a client query module. In the reporting and storage process, a signaling collection module reports each message to the memory library matched by using the hash of the MSISDN as the unique keyword.
  • a query request from the query server is matched in the same way, based on a mandatory condition: for example, the hash of the MSISDN is used as the unique keyword to match the corresponding memory library.
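  • The reason reporting and querying reach the same memory library can be illustrated with the sketch below, in which both sides apply the same function to the MSISDN. The modulo mapping, the SHA-256 hash, and the server names are assumptions for illustration; the source only states that the hash of the MSISDN is used as the unique keyword to match the memory library.

```python
import hashlib


def select_memory_library(msisdn, libraries):
    """Map an MSISDN onto one of the configured memory libraries.

    Both the signaling collection module and the query server are assumed
    to call this with the same library list, so they agree on the target.
    """
    if not libraries:
        raise ValueError("at least one memory library is required")
    digest = hashlib.sha256(msisdn.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(libraries)
    return libraries[index]
```

  • For example, with `libraries = ["ds-1", "ds-2", "ds-3"]` (hypothetical names), a record reported for a given MSISDN and a later query for that MSISDN resolve to the same entry, so a query never has to fan out to several hosts. A plain modulo mapping does reshuffle users when the number of libraries changes, which a real deployment would need to handle.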
  • in a scenario where the usage rights of each server are strictly restricted, the probe signaling collector connects by optical port mirroring to each interface of the GGSN or the PGW and monitors the signaling in real time, including the S5/S8 interface, the Gn/Gp interface, the Gx interface, the Gy interface, and the authentication, authorization, and accounting (AAA) interface.
  • the system is implemented by adding a network element in the mobile data network of the existing operator.
  • the signaling acquisition module AGENT is connected to the Gn/Gp interface, the Gx interface, the Gy interface, and the authentication, authorization, and accounting (AAA) interface of the GGSN or the PGW. The signaling acquisition module AGENT obtains the data packets of each interface by probe acquisition, extracts real-time network data, and extracts the user-related signaling flow according to the user number MSISDN.
  • the memory library DS SERVER receives the TLV record of the signaling summary information constructed by the signaling acquisition module AGENT, and stores it in real time.
  • the query server WEB SERVER implements the client's customizable query function: it receives the user's query request, finds the corresponding memory library DS SERVER according to the unique keyword KEY1, and sends the query request to that memory library DS SERVER in JavaScript Object Notation (JSON) format. The query request includes the unique keyword KEY1.
  • the query server WEB SERVER will receive the query result, and at the same time provide the network management parameter configuration control center, which can provide the parameter configuration interface for the network administrator.
  • the query module contains an efficient query algorithm.
  • the query condition (that is, the query instruction) includes three pieces of information: (1) start time; (2) end time; (3) MSISDN. The start time and end time are accurate to the minute.
  • the query conditions are converted into the corresponding date, hour, and MSISDN, and matching is performed level by level in the three-level file directory date/hour/minute/.
  • the query result is a signaling flowchart. When a row is clicked, the detailed protocol code stream and protocol decoding detailed information of the signaling appear.
  • the steps for querying the network element signaling backtracking system data are as follows:
  • Step 1 The user inputs the query condition (that is, the query instruction) in the network query client interface of the client query module, including: start time, end time, MSISDN, maximum return line number, and is assembled into a JSON format.
  • Step 2 The query server WEB SERVER obtains the unique key KEY1 according to the MSISDN hash, and adds KEY1 to the query parameter combination, finds the matching memory bank DS SERVER according to KEY1, and sends the query request data packet to it in JSON format.
  • Step 3 The query listener of the memory library DS SERVER detects the arrival of the query request data packet, extracts the query conditions from the JSON-format packet, and converts them into a start date, an end date, and KEY1. It then searches the memory library for log records that meet the conditions, up to the maximum number of returned rows.
  • Step 4 The memory library DS SERVER quickly sends all the data set packets that meet the conditions to the query server WEB SERVER in the form of a UDP-based Data Transfer Protocol (UDT) message.
  • Step 5 The query server WEB SERVER receives the query result data packet returned by the corresponding memory library DS SERVER, sorts it according to time, and the final result is sent to the client in JSON format, and the client is transformed and presented on the query interface.
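  • Steps 1 and 2 above (assembling the query conditions into JSON and adding KEY1) might look like the following sketch. The JSON field names, the time string format, and the SHA-256-based keyword are hypothetical; the source specifies only that the request is in JSON format and carries the start time, end time, MSISDN, maximum number of returned rows, and KEY1.

```python
import hashlib
import json


def build_query_request(msisdn, start_time, end_time, max_rows):
    """Assemble a JSON query request that carries KEY1 (field names assumed)."""
    key1 = hashlib.sha256(msisdn.encode("ascii")).hexdigest()[:16]
    request = {
        "start_time": start_time,   # accurate to the minute
        "end_time": end_time,
        "msisdn": msisdn,
        "max_rows": max_rows,
        "key1": key1,               # added by the query server in Step 2
    }
    return json.dumps(request)
```

  • The memory library can then parse the packet with `json.loads` and use `key1` together with the time range to locate the files to scan.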
  • Patent No. CN104636199A, "a large-data real-time processing system and method based on distributed memory computing", has the following disadvantages: duplication is not considered before a file is written; instead, the file metadata of the new and old versions is compared on the server side, and identical data is deduplicated at the file-block level in the storage layer, which incurs a large system overhead. In the present invention, by contrast, data is first distributed to different files according to the hash code of the IMSI, so that records with the same keyword are guaranteed to be in the same file, and the corresponding file can be located directly by the IMSI hash value when querying. In addition, the files are stored in directories refined down to the minute.
  • the query can be locked to a few directories according to the time range.
  • the embodiment of the present invention supports customizable queries: the user specifies how many rows to read, and the server processes only the corresponding limited number of lines of text in the file. In a big data environment the complete file does not need to be read, which greatly improves response speed.
  • the invention ensures fast positioning and fast query through the planning of the system.
  • Patent No. CN104679893A, "A Big Data-Based Information Retrieval Method", has the following shortcomings: its big-data retrieval method involves multiple backups and consistency maintenance across multiple hosts, which is complicated and reduces the system's capacity for processing massive data.
  • in the embodiment of the invention, after the MSISDN is hashed to obtain the unique keyword KEY1, records are delivered precisely to a single host, which avoids data duplication across hosts.
  • Distributed storage and distributed queries apply the same hash algorithm to the same field, so both are directed to the same memory library DS SERVER, and a single query never involves multiple hosts.
  • the information model in the present invention is a typical tree structure: the top level consists of the tables in the distributed memory library, and the lower level consists of the signaling files and media files corresponding to each table. A memory table is itself represented as data files, so accessing a memory table likewise amounts to filtering file directories and filtering file contents.
  • the embodiment of the invention provides a distributed big data fast storage query system, which provides real-time monitoring and corresponding reports on the service signaling and data service types of the GGSN/PGW. These include network real-time monitoring and network element signaling backtracking.
  • the signaling of each interface of the GGSN/PGW can be monitored in real time, including S5/S8, Gn/Gp, Gx, Gy, and the authentication and authorization charging AAA interface.
  • the operator can query the signaling generated by the user on the GGSN/PGW within a certain period of time through the user IMSI/MSISDN number on the system, and can decode the signaling.
  • the signaling of all users of the entire network element can be maintained for at least 7 days for backtracking queries.
  • the present invention also provides a distributed big data fast storage strategy, which can provide different response speeds according to user configurations, and aims to evenly share network traffic and improve system processing capability and reliability.
  • the Intel DPDK stream processing framework is used for data collection, and memory disk technology together with the distributed big data storage and query system resolves the conflict between generating a large number of data files and querying them in a timely manner, providing the ability to insert 100,000 records per second in real time.
  • the present invention provides two distributed online log backtracking systems for different scenarios in an actual network environment.
  • the probe signaling collector connects to the signaling of each interface of the GGSN/PGW through optical port mirroring for real-time monitoring, including S5/S8, Gn/Gp, Gx, Gy, authentication and authorization accounting AAA interface;
  • the MSISDN, hashed into the unique keyword KEY1 of the system, is used to associate network queries with the memory library DS SERVER, to associate the signaling acquisition module AGENT with the destination memory library DS SERVER of its report messages, and to name the memory library files uniquely.
  • the system uses a combination of distributed memory library and distributed file system to provide a hierarchical information structure from summary to detail.
  • the summary information is stored in the memory library, and the detailed information (that is, signaling files, media files, and so on) is stored on the distributed file server. The summary information includes, for example, the uniform resource locator (URL) of the signaling file and the URL of the media file.
  • when the client needs detailed information, it can download that information through the URL and render it in a local client tool, without affecting the performance of the server.
  • the system uses data timestamps to reduce the use of a large number of timers; uses user query habits (the maximum number of data rows viewed at a time) to reduce the search depth on the server; and uses in-memory processing instead of file processing to improve system processing capability.
  • the system device is provided with a signaling acquisition module AGENT, a memory library DS Server, a query server WEB SERVER, a file server, and a total of four components.
  • the signaling acquisition module AGENT and the memory library DS SERVER are respectively deployed in different network environments. The specific functions of each component are as follows:
  • the signaling acquisition module AGENT captures the signaling of each interface of the GGSN/PGW by using a probe module (for example, a probe signaling collector) and parses each protocol state machine to obtain the relevant summary information together with the signaling files and media files. The files are saved to the distributed file server; the summary information is sent to the memory library DS SERVER selected by using the hash of the MSISDN as the unique keyword KEY1.
  • the memory library DS SERVER receives the TLV record constructed by the signaling acquisition module AGENT, parses out the unique keyword KEY1 according to the data dictionary, and uses KEY1 to construct the first identifier KEY2.
  • the first identifier KEY2 is the combination of the unique keyword KEY1 and the timestamp of the service message in second format or hour format.
  • the first identifier KEY2 is used for the search of the writer, and after the writer corresponding to the first identifier KEY2 is found, it is written into the corresponding memory file by using the writer. Since KEY2 uses a timestamp, it is possible to implement a timed 1 second write function without using a timer.
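  • To make the TLV record and the data dictionary more concrete, the sketch below encodes and decodes type-length-value fields. The wire layout (a 2-byte type and a 2-byte length in network byte order), the tag numbers, and the field names in `DATA_DICTIONARY` are assumptions for illustration only; the source does not specify them.

```python
import struct

# Hypothetical data dictionary: TLV type number -> field name.
DATA_DICTIONARY = {1: "KEY1", 2: "START_TIME", 3: "SIGNALING_URL"}


def encode_tlv(fields):
    """Encode (type, text) pairs as type(2 bytes) | length(2 bytes) | value."""
    out = bytearray()
    for tag, text in fields:
        value = text.encode("utf-8")
        out += struct.pack("!HH", tag, len(value)) + value
    return bytes(out)


def decode_tlv(data):
    """Decode a TLV byte string into a dict keyed by dictionary field name."""
    record = {}
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        value = data[offset:offset + length]
        offset += length
        record[DATA_DICTIONARY.get(tag, f"UNKNOWN_{tag}")] = value.decode("utf-8")
    return record
```

  • After decoding, the receiver can read `record["KEY1"]` and combine it with the record's timestamp to form KEY2 for the writer lookup.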
  • the memory library DS SERVER receives the query request from the query server WEB SERVER and constructs a filter from the unique keyword KEY1, the start time STARTTIME, the end time ENDTIME, and the filter values of other service fields to initiate the query.
  • depending on the time type, the time directories within the time range are traversed, for example minute directories at a search depth of 5 levels (year/month/day/hour/minute/) or hour directories at a search depth of 4 levels (year/month/day/hour/), and only the URL list of the leaf-level directories is obtained. The time directory URL list is then traversed to check whether the KEY1.il file exists in each directory. If it exists, the file is processed line by line: each line is filtered with the configured filter FILTERMAP, and only valid result data is cached. When the result queue exceeds the configured number of result rows or the end of the directory list is reached, the results are sorted by start time and sent in packets to the query server WEB SERVER, which completes the query.
  • the query server WEB SERVER accepts the user's query request, finds the corresponding memory library DS SERVER according to the unique keyword KEY1, and sends the JSON-format query request to that memory library DS SERVER, where the unique keyword KEY1 is included in the query request.
  • the query server WEB SERVER will receive the query result, and at the same time provide the network management parameter configuration control center, which can provide the parameter configuration interface for the network administrator.
  • the file server is provided to the information collection module AGENT to store the signaling file and the media file, and is provided to the client for high-speed download.
  • the present invention also provides a distributed big data fast storage strategy that can provide different response speeds according to user configuration. To give the system the processing capability and reliability required for big-data-level services, the strategy aims to share network traffic evenly and to improve system processing capability and reliability.
  • the Intel DPDK stream processing framework is used for data acquisition, and memory disk technology together with the distributed big data storage and query system resolves the conflict between generating a large number of data files and querying them in a timely manner, providing the ability to insert 100,000 records per second in real time.
  • writing data to the memory bank includes the following steps:
  • Step S301: the signaling collection module constructs a TLV record, uses the hash of the MSISDN as the unique keyword KEY1 to determine the corresponding memory library, adds KEY1 to the TLV record, and sends the record to that memory library.
  • specifically, the signaling collection module AGENT collects and parses the signaling and constructs the TLV record, where TLV refers to a data format comprising three fields: type, length, and value.
  • Step S302: the memory library receives the TLV record and constructs the first identifier KEY2, which is the combination of KEY1 and the timestamp of the service message in second format or hour format.
  • Step S303: it is determined whether a writer corresponding to KEY2 can be found. If so, step S306 is performed; if not, step S304 is performed.
  • Step S304: when the writer corresponding to KEY2 is not found, either the refresh time has been reached or a new MSISDN has joined. The current writers are closed in batches (256 writers per batch), and closing a writer forces its cache to be written to the memory disk.
  • step S305 a writer corresponding to KEY2 is created, and the writer creates a new file in a leaf directory of a minute value or an hour value corresponding to the current system.
  • Step S306 writing to the cache of the corresponding writer.
  • Step S307: it is determined whether the writer's cache is full. If it is full, step S308 is performed; otherwise, the process returns to step S301 to handle the next record.
  • Step S308: the writer's cached data is written into the file, and the process returns to step S301.
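  • The write path of steps S302 to S308 can be sketched as a pool of buffered writers keyed by KEY2. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, KEY2 is modelled as a (KEY1, time bucket) pair, the leaf directory is a minute directory, and on a lookup miss only writers belonging to an older time bucket are closed (the 256-writer batching of step S304 is not modelled).

```python
import os
from datetime import datetime


class WriterPool:
    """Buffered per-user writers, looked up by KEY2 = (KEY1, time bucket)."""

    def __init__(self, root, buffer_limit=256, bucket_format="%Y/%m/%d/%H/%M"):
        self.root = root
        self.buffer_limit = buffer_limit
        self.bucket_format = bucket_format
        self.writers = {}  # KEY2 -> (file path, list of buffered lines)

    def write(self, key1, timestamp, line):
        bucket = timestamp.strftime(self.bucket_format)
        key2 = (key1, bucket)                                  # S302
        if key2 not in self.writers:                           # S303
            self._close_stale(bucket)                          # S304
            directory = os.path.join(self.root, *bucket.split("/"))
            os.makedirs(directory, exist_ok=True)
            path = os.path.join(directory, key1 + ".il")
            self.writers[key2] = (path, [])                    # S305
        path, buffered = self.writers[key2]
        buffered.append(line)                                  # S306
        if len(buffered) >= self.buffer_limit:                 # S307
            self._flush(path, buffered)                        # S308

    def _flush(self, path, buffered):
        if buffered:
            with open(path, "a", encoding="utf-8") as handle:
                handle.write("\n".join(buffered) + "\n")
            buffered.clear()

    def _close_stale(self, bucket):
        # Close writers whose time bucket has passed, flushing their caches.
        for key2 in [k for k in self.writers if k[1] != bucket]:
            path, buffered = self.writers.pop(key2)
            self._flush(path, buffered)

    def close_all(self):
        for path, buffered in self.writers.values():
            self._flush(path, buffered)
        self.writers.clear()
```

  • Because the time bucket is part of the lookup key, the first record of a new minute automatically misses the lookup and triggers the flush of the previous minute's writers, which is how the timestamp replaces an explicit timer in this sketch.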
  • retrieving data from the memory library includes the following steps:
  • step S401 the query server WEB SERVER accepts the query request of the user, and finds the corresponding memory bank DS SERVER according to KEY1.
  • Step S402: the memory library DS SERVER receives the query request from the query server and, based on KEY1, the start time STARTTIME, the end time ENDTIME, and the filter values of other service fields, constructs the filter FILTERMAP to initiate the query.
  • Step S403: it is determined whether the time type is hour or minute. If the time type is minute, step S404 (minute directories) is performed; if the time type is hour, step S405 (hour directories) is performed.
  • Step S404 traversing the minute directory in the time range according to STARTTIME and ENDTIME, the search depth is 5 levels: year/month/day/hour/minute/, obtaining the URL list of the level 5 directory, and executing step S406.
  • Step S405 traversing the hour directory in the time range according to STARTTIME and ENDTIME, the search depth is 4 levels: year/month/day/hour/, obtaining the URL list of the level 4 directory, and executing step S406.
  • Step S406: the uniform resource locator (URL) list of the time directories is traversed, and it is determined whether the KEY1.il file exists in the current directory. If not, step S406 is repeated for the next directory; if so, step S407 is performed.
  • step S407 the file is processed line by line, and each row of data is filtered according to the set filter FILTERMAP, and only valid result data is cached.
  • step S408 it is determined whether the query result queue exceeds the preset number of result rows. If not, step S409 is performed, and if yes, step S411 is performed, and the query ends.
  • Step S409: it is determined whether the end of the file has been reached. If not, step S407 is performed; if so, step S410 is performed.
  • Step S410: it is determined whether the end of the directory list has been reached. If not, step S406 is performed to process the next time directory; if so, step S411 is performed directly and the query ends.
  • step S411 the results are sorted according to the start time, and the query result is sent to the query server WEB SERVER.
  • the technical problem to be solved by the embodiments of the present invention is to provide a GGSN/PGW real-time signaling tracking platform capable of supporting 5 million users across the whole network with 280 Gbps throughput (the AIS bidding requirements in 2014), and capable of supporting 1.5 million users on a single GGSN/PGW with 50 Gbps throughput.
  • the present invention can provide a real-time monitoring and corresponding report for the service signaling and data service types of the GGSN/PGW. These include network real-time monitoring and network element signaling backtracking.
  • the signaling of each interface of the GGSN/PGW can be monitored in real time, including S5/S8, Gn/Gp, Gx, Gy, and the authentication and authorization charging AAA interface.
  • the operator can query the signaling generated by the user on the GGSN/PGW within a certain period of time through the user IMSI/MSISDN number on the system, and can decode the signaling.
  • the signaling of all users of the entire network element can be maintained for at least 7 days for backtracking queries.
  • In addition, the present invention provides a distributed big data fast storage strategy that offers different response speeds according to user configuration, with the aim of sharing network traffic evenly and improving the processing capability and reliability of the system.
  • For example, the Intel DPDK stream processing framework is used for data collection, and RAM disk technology together with a distributed big data storage and query system is used to reconcile the generation of a large number of data files with timely querying. This provides real-time insertion of 100,000 records per second and fast real-time querying.
  • The system also adapts to service demands with large data volumes: the network elements share the service load of the whole network in parallel, which improves the service processing performance of the network.
  • When the communication link of a network element is interrupted or faulty, other network elements in the distributed network take over its services, so that the operation of the whole network is not interrupted and network stability and reliability are ensured.
  • It should be noted that each of the above modules may be implemented in software or hardware.
  • In the latter case, this may be done in the following ways, without being limited to them: the modules are all located in the same processor, or the modules are located in multiple processors.
  • Embodiments of the present invention also provide a storage medium.
  • Optionally, in this embodiment, the storage medium may be configured to store program code for performing the method steps of the above embodiments.
  • Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
  • Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be made into individual integrated circuit modules, or several of them may be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
  • According to the embodiments of the present invention, the signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW) is collected, where the signaling is the signaling of a user; the unique keyword of the user is obtained; and the signaling is stored in the multi-level directory of a data storage server according to the unique keyword. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

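A data processing method, apparatus and system. The method comprises: collecting signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW) (S102), where the signaling is the signaling of a user; obtaining a unique keyword of the user (S104); and storing the signaling in a multi-level directory of a data storage server according to the unique keyword (S106). The method, apparatus and system solve the problem of low signaling storage efficiency in the related art and thereby improve signaling storage efficiency.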

Description

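Data Processing Method, Apparatus and System
Technical Field
The present invention relates to the field of communications, and in particular to a data processing method, apparatus and system.
Background
The mobile Internet brings operators both opportunities and challenges. Signaling is the most basic and most critical component of a communication network and reflects every aspect of network quality and service provision. Operators therefore invest heavily in signaling monitoring platforms that serve production-facing functions such as traffic tracing, network planning and optimization, and fault diagnosis. Providing a highly available signaling tracking platform is an urgent task.
As the means of data collection continue to improve, more and more industry data is being accumulated. Data volumes have grown to big data scale (for example, 100 GB, TB or PB) that traditional software cannot handle. In such scenarios, storing the data has become a pressing problem.
At present, big data can be stored in relational databases. For example, related data items are stored in different tables of different databases, and the relationships among the data stored in those databases are recorded so that the items can be associated with each other. Actual test data shows the limits of this approach. To insert data into a SQL Server database, for instance, the usual method is for the application to issue Structured Query Language (SQL) INSERT statements directly or indirectly. This is too slow: in testing, the highest speed (with an initially empty table) was only about 1,000 records per second. Another method saves the data to a file first and then imports it into the database in bulk for retrieval, for example with Bulk Insert in SQL Server, which copies a data file in a user-specified format into a database table or view. In testing this was faster than INSERT statements, at about 60,000 records per second, a 60-fold improvement. However, generating the data files in the required import format also takes time, which halves the effective record loading speed.
In addition, when related data items are stored in different tables of different databases, the data is stored loosely and the relationships can only be expressed through the relational database. For big data, storing data loosely and recording the relationships among data in different tables greatly reduces storage efficiency and further reduces the efficiency of subsequent lookup and maintenance.
No effective solution has yet been proposed for the problem of low signaling storage efficiency in the related art.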
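Summary of the Invention
Embodiments of the present invention provide a data processing method, apparatus and system to at least solve the problem of low signaling storage efficiency in the related art.
According to one aspect of the embodiments of the present invention, a data processing method is provided, comprising: collecting signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), where the signaling is the signaling of a user; obtaining a unique keyword of the user; and storing the signaling in a multi-level directory of a data storage server according to the unique keyword.
Optionally, collecting the signaling of the GGSN or PGW comprises: connecting to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
Optionally, obtaining the unique keyword of the user comprises: obtaining an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN); and performing a hash operation on the identifier to obtain the unique keyword.
Optionally, before the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises: generating the multi-level directory in the data storage server according to time.
Optionally, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method comprises: detecting whether the multi-level directory contains a directory older than a preset time; and, when such a directory is detected, deleting it from the data storage server.
Optionally, storing the signaling in the multi-level directory of the data storage server according to the unique keyword comprises: looking up the data storage server corresponding to the user according to the unique keyword; and storing the signaling in the multi-level directory of that data storage server.
Optionally, storing the signaling in the multi-level directory of the data storage server corresponding to the user comprises: obtaining the timestamp of a service message; generating a first identifier from the timestamp and the unique keyword; obtaining the writer corresponding to the first identifier, where the writers and the directories of the multi-level directory are in one-to-one correspondence; and writing the signaling into the corresponding directory through the writer.
Optionally, the data storage server comprises an in-memory store and a file server, where the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
Optionally, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises: receiving a query instruction that includes a filter condition and the unique keyword; looking up the data storage server corresponding to the unique keyword; and querying data from that data storage server according to the filter condition.
Optionally, querying data from the data storage server corresponding to the unique keyword according to the filter condition comprises: traversing the multi-level directory of that data storage server according to the filter condition; obtaining from the multi-level directory the data that satisfies the filter condition, as the query result; determining whether the number of data rows in the query result exceeds a preset value; and, if it does, displaying the query result in batches.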
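According to another aspect of the embodiments of the present invention, a data processing apparatus is provided, comprising: a collection module configured to collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), where the signaling is the signaling of a user; an obtaining module configured to obtain a unique keyword of the user; and a storage module configured to store the signaling in a multi-level directory of a data storage server according to the unique keyword.
Optionally, the collection module comprises a signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
Optionally, the obtaining module comprises: an obtaining unit configured to obtain an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN); and a computing unit configured to perform a hash operation on the identifier to obtain the unique keyword.
Optionally, the apparatus further comprises a generation module configured to generate the multi-level directory in the data storage server according to time.
Optionally, the storage module comprises: a lookup unit configured to look up the data storage server corresponding to the user according to the unique keyword; and a storage unit configured to store the signaling in the multi-level directory of that data storage server.
According to yet another aspect of the embodiments of the present invention, a data processing system is provided, comprising: a data collection server configured to collect signaling of a GGSN or PGW, where the signaling is the signaling of a user; and a data storage server connected to the data collection server, where the data storage server contains a multi-level directory used to store the signaling.
Optionally, the data storage server comprises an in-memory store and a file server, where the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
Optionally, the data collection server comprises a probe signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an AAA interface.
Optionally, the data collection server further comprises a processing module connected to the probe signaling collector and configured to parse the signaling collected by the probe signaling collector into the summary information and the file information, and to send them to the in-memory store and the file server respectively.
Optionally, the data processing system further comprises a query server connected to the data storage server and configured to query the signaling from the data storage server.
An embodiment of the present invention further provides a computer storage medium that may store execution instructions for performing the data processing method of the above embodiments.
Through the embodiments of the present invention, the signaling of a GGSN or PGW is collected, where the signaling is the signaling of a user; the unique keyword of the user is obtained; and the signaling is stored in the multi-level directory of a data storage server according to the unique keyword. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.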
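Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a multi-level directory according to an embodiment of the present invention;
Fig. 3 is a flowchart of writing data to the in-memory store according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of retrieving data from the in-memory store according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the information hierarchy for in-memory store retrieval according to an embodiment of the present invention;
Fig. 6 is a structural block diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of a data processing system according to an embodiment of the present invention; and
Fig. 8 is a schematic deployment diagram of the in-memory store data retrieval system according to an embodiment of the present invention.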
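Detailed Description
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that the embodiments of this application and the features of the embodiments may be combined with one another where they do not conflict.
It should also be noted that the terms "first", "second" and so on in the description, the claims and the drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
This embodiment provides a data processing method. Fig. 1 is a flowchart of the data processing method according to an embodiment of the present invention. As shown in Fig. 1, the flow comprises the following steps:
Step S102: collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), where the signaling is the signaling of a user.
The embodiment of the present invention can collect users' signaling by monitoring the interfaces of the GGSN or the PGW; there may be one user or several. Preferably, to keep the interfaces of the GGSN or PGW working normally, collecting the signaling of the GGSN or PGW comprises: connecting to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
For example, a probe signaling collector can be connected to each interface of the GGSN or PGW by optical port mirroring, so that the signaling on each interface can be collected in real time. Because the signaling is collected through optical port mirroring, the collection does not affect the normal operation of the GGSN or PGW interfaces.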
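Step S104: obtain the unique keyword of the user.
A network element serves a large number of users. To distinguish the signaling of each user during collection, every user in the embodiment of the present invention corresponds to one unique keyword that uniquely identifies the user. Preferably, obtaining the unique keyword of the user comprises: obtaining an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber International Integrated Services Digital Network/Public Switched Telephone Network Number (MSISDN); and performing a hash operation on the identifier to obtain the unique keyword.
Every user in the network element has a corresponding IMSI or MSISDN. A hash operation on the user's IMSI or MSISDN yields a hash value, and this hash value is used as the unique keyword, which makes it possible to store and look up each user's signaling quickly.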
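The hashing step can be sketched as follows. The patent does not name a hash function, so SHA-256 and the 32-bit key space below are assumptions chosen only for illustration, and `unique_key` is a hypothetical helper name.

```python
import hashlib


def unique_key(subscriber_id: str, space: int = 2**32) -> int:
    """Map an IMSI or MSISDN string to a stable integer key (KEY1).

    Assumption: SHA-256 stands in for the unspecified hash operation.
    """
    digest = hashlib.sha256(subscriber_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then fold it
    # into the chosen key space.
    return int.from_bytes(digest[:8], "big") % space


if __name__ == "__main__":
    # Hypothetical MSISDN, used only to show the call.
    print(unique_key("8613800000000"))
```

A cryptographic digest is used here rather than Python's built-in `hash()`, because string hashing in CPython is randomized per process by default, while the collector and the query side must compute the same key independently.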
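Step S106: store the signaling in the multi-level directory of the data storage server according to the unique keyword.
The multi-level directory may be created in the data storage server in advance, or it may be generated dynamically while the signaling is being stored. Specifically, the embodiment of the present invention stores a user's signaling in a file within the multi-level directory of the data storage server, for example a file named after the unique keyword. Preferably, before the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises: generating the multi-level directory in the data storage server according to time.
For example, a tree-shaped multi-level directory is generated by year, month, day, hour and minute, with the year as the root directory and the minute as the leaf directory. Fig. 2 is a schematic diagram of the multi-level directory according to an embodiment of the present invention. As shown in Fig. 2, the directory levels are generated in the order year, month, day, hour, minute, and user signaling is stored in the directory that matches its time. For example, signaling 1, collected at 12:20 on 30 December 2014, can be stored in the file named after the unique keyword in the minute-20 directory shown in Fig. 2; signaling 2, collected at 12:22 on 30 December 2014, can be stored in the file named after the unique keyword in the minute-22 directory (not shown in Fig. 2). The number of directory levels can be chosen according to the data volume: with a small data volume the hour can serve as the leaf directory, giving 4 levels; with a large data volume the minute can serve as the leaf directory, giving 5 levels.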
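The directory layout described above can be illustrated with a short sketch. The `.il` file extension appears in the query steps of this document; the root path, zero-padded directory names and function names are assumptions made for the example.

```python
from datetime import datetime
from pathlib import PurePosixPath


def leaf_directory(root: str, ts: datetime, minute_leaf: bool = True) -> PurePosixPath:
    """Return the year/month/day/hour[/minute] directory for a timestamp."""
    parts = [f"{ts.year:04d}", f"{ts.month:02d}", f"{ts.day:02d}", f"{ts.hour:02d}"]
    if minute_leaf:  # 5 levels for large data volumes, 4 levels otherwise
        parts.append(f"{ts.minute:02d}")
    return PurePosixPath(root, *parts)


def signaling_file(root: str, ts: datetime, key1: int, minute_leaf: bool = True) -> PurePosixPath:
    """Return the file named after the unique keyword inside the leaf directory."""
    return leaf_directory(root, ts, minute_leaf) / f"{key1}.il"


if __name__ == "__main__":
    # Signaling collected at 12:20 on 30 December 2014 for a hypothetical key.
    print(signaling_file("/data", datetime(2014, 12, 30, 12, 20), 12345))
```

Because the file name depends only on the key and the directory depends only on the time, the same file name recurs under every time directory in which that user produced signaling.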
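Through the above steps, a user's signaling is stored in the multi-level directory of the data storage server according to the unique keyword. Compared with the prior-art approach of storing user signaling in a database, storage is faster. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.
Preferably, to reduce memory usage, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method comprises: detecting whether the multi-level directory contains a directory older than a preset time; and, when such a directory is detected, deleting it from the data storage server.
User signaling in a network element is strongly time-sensitive, and monitoring network element users usually requires analysing only the most recent signaling. After storing the signaling in the multi-level directory, the embodiment of the present invention can therefore delete user signaling that has been stored for a long time, which saves memory and also speeds up retrieval. The preset time can be set as required, for example 7 days, and directories older than the preset time can be deleted from the data storage server directly. For example, the system can check once a day for directories older than 7 days and, if any exist, delete them by time without inspecting file contents.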
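A minimal sketch of the retention check follows, assuming the year/month/day/... layout with zero-padded numeric directory names and deletion at day granularity. The function names are hypothetical; the 7-day default matches the example in the text.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def expired_day_dirs(root: Path, now: datetime, keep_days: int = 7) -> list:
    """List day directories (root/YYYY/MM/DD) older than keep_days."""
    cutoff = (now - timedelta(days=keep_days)).date()
    expired = []
    for day_dir in root.glob("[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]"):
        if not day_dir.is_dir():
            continue
        year, month, day = day_dir.parts[-3:]
        try:
            when = datetime(int(year), int(month), int(day)).date()
        except ValueError:
            continue  # not a valid date, leave it alone
        if when < cutoff:
            expired.append(day_dir)
    return sorted(expired)


def purge(root: Path, now: datetime, keep_days: int = 7) -> list:
    """Delete expired day directories by name only; file contents are never read."""
    removed = expired_day_dirs(root, now, keep_days)
    for day_dir in removed:
        shutil.rmtree(day_dir)
    return removed
```

Running `purge` once a day, for instance from a scheduler, corresponds to the daily check described above. The age of the data is read from the directory name, which is why no file needs to be opened.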
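Preferably, storing the signaling in the multi-level directory of the data storage server according to the unique keyword comprises: looking up the data storage server corresponding to the user according to the unique keyword; and storing the signaling in the multi-level directory of that data storage server.
Because a network element serves a large number of users, each user's unique keyword can be associated in advance with a data storage server so that the user's signaling can be stored quickly. The unique keyword is enough to find the data storage server for the user, and all of the user's signaling is stored in the multi-level directory of that server, which makes later retrieval of the user's signaling fast.
Preferably, storing the signaling in the multi-level directory of the data storage server corresponding to the user comprises: obtaining the timestamp of a service message; generating a first identifier from the timestamp and the unique keyword; obtaining the writer corresponding to the first identifier, where the writers and the directories of the multi-level directory are in one-to-one correspondence; and writing the signaling into the corresponding directory through the writer.
A service message is the user's signaling. The first identifier is generated from the timestamp and the unique keyword and is used to look up a writer. Once the writer for the first identifier is found, it writes the data into the corresponding in-memory file, that is, a file stored in the multi-level directory. Because the first identifier contains the timestamp, a write can be triggered every second without a timer: after a full second the first identifier is necessarily different, so a new writer is created. When real-time requirements are high, this guarantees that the file is force-written once per second whether or not the cache is full, achieving timed writes without using a timer.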
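Preferably, the data storage server comprises an in-memory store and a file server, where the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
The embodiment of the present invention uses distributed storage: the summary information and the file information of the signaling are stored in the in-memory store and the file server respectively. Specifically, parsing the signaling yields its summary information and its file information. The summary information includes the Uniform Resource Locator (URL) of each signaling file and the URL of each media file, while the file information comprises the detailed signaling files and media files themselves. The URL of a signaling file leads to the corresponding signaling file, and the URL of a media file leads to the corresponding media file. During retrieval it is therefore enough to retrieve the summary information from the in-memory store in order to reach the corresponding file information.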
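Preferably, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises: receiving a query instruction that includes a filter condition and the unique keyword; looking up the data storage server corresponding to the unique keyword; and querying data from that data storage server according to the filter condition.
Once the signaling has been stored in the multi-level directory of the data storage server, the user signaling held there can be queried. Because the query instruction contains the unique keyword, the user's signaling can be retrieved quickly from the data storage server.
Preferably, querying data from the data storage server corresponding to the unique keyword according to the filter condition comprises: traversing the multi-level directory of that data storage server according to the filter condition; obtaining from the multi-level directory the data that satisfies the filter condition, as the query result; determining whether the number of data rows in the query result exceeds a preset value; and, if it does, displaying the query result in batches.
To improve retrieval efficiency, the embodiment of the present invention can use the user's query habits (for example, the maximum number of rows the user wants to see at a time) to reduce how deeply the server has to search. Specifically, the number of result rows displayed at a time can be configured; when the query result contains more rows than this preset value, the result is displayed in batches.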
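As a small illustration of displaying a result in batches, the sketch below splits a list of result rows into chunks no larger than a preset row count. The function name and the use of a plain list are assumptions.

```python
from typing import Iterator, List


def batches(rows: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


if __name__ == "__main__":
    for page in batches(["row1", "row2", "row3", "row4", "row5"], 2):
        print(page)
```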
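The embodiment of the present invention achieves fast storage and querying of massive data without any commercial database. Instead it uses a tree-shaped storage structure and stores user signaling in the in-memory store. The data file format is configurable: for example, records can be described in TLV form (a data format with three fields: type, length and value), and the related data dictionary can be defined in an Extensible Markup Language (XML) file that guides data processing during storage and querying. A unique keyword KEY1 is configured for each user's signaling. KEY1 serves as the file name when files are generated and is used to match the corresponding in-memory store DS SERVER during queries. When files are generated, the user can choose the hour or the minute as the leaf directory according to the data volume; with big data the minute must be configured as the leaf directory. Specifically, the embodiment uses a distributed network architecture in which several signaling collection modules (AGENT) and several in-memory stores (DS SERVER) are deployed in the network. The AGENTs and the DS SERVERs are associated through the unique keyword KEY1, which is the hash of the MSISDN. The forwarding of query requests from the query server WEB SERVER to a DS SERVER is likewise determined by the hash value KEY1 in the query condition. The parallel processing nodes together share the work of processing the protocol packets captured from the GGSN or PGW network element.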
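To make the TLV record format concrete, here is a minimal encoder and decoder. The patent only states that a record has type, length and value fields; the 2-byte type, 2-byte length, big-endian byte order and the example type codes are assumptions for this sketch.

```python
import struct
from typing import Dict

HEADER = "!HH"  # assumed layout: 2-byte type, 2-byte length, network byte order
HEADER_SIZE = struct.calcsize(HEADER)


def encode_tlv(fields: Dict[int, bytes]) -> bytes:
    """Serialize {type: value} pairs as consecutive type-length-value triples."""
    out = bytearray()
    for tag, value in fields.items():
        out += struct.pack(HEADER, tag, len(value)) + value
    return bytes(out)


def decode_tlv(buf: bytes) -> Dict[int, bytes]:
    """Parse consecutive TLV triples back into a {type: value} mapping."""
    fields = {}
    offset = 0
    while offset < len(buf):
        tag, length = struct.unpack_from(HEADER, buf, offset)
        offset += HEADER_SIZE
        if offset + length > len(buf):
            raise ValueError("truncated TLV value")
        fields[tag] = buf[offset:offset + length]
        offset += length
    return fields


if __name__ == "__main__":
    # Hypothetical type codes: 1 = KEY1, 2 = message timestamp.
    record = encode_tlv({1: b"12345", 2: b"20141230122001"})
    print(decode_tlv(record))
```

In the described system the meaning of each type code would come from the XML data dictionary; the numeric codes used here are placeholders.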
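Fig. 3 is a flowchart of writing data to the in-memory store according to an embodiment of the present invention. As shown in Fig. 3, writing data to the in-memory store (equivalent to storing the signaling in the multi-level directory of the data storage server) comprises the following steps:
Step S301: the signaling collection module builds a TLV record, takes the hash of the MSISDN as the unique keyword KEY1, sends the record to the in-memory store that KEY1 maps to, and adds KEY1 to the TLV record.
The signaling collection module AGENT collects the signaling and parses it, for example by building a TLV record, where TLV denotes a data format with three fields: type, length and value. The hash of the MSISDN is used as the unique keyword KEY1 to select the in-memory store to which the record is sent, and KEY1 is added to the TLV record.
Step S302: the in-memory store receives the TLV record and builds the first identifier KEY2, which combines KEY1 with the timestamp of the service message in second format or hour format.
In this way no timer is needed. Once a full second or a full hour has elapsed, KEY2 is necessarily different and a new writer is created. When real-time requirements are high, this guarantees that the file is force-written once per second whether or not the cache is full.
Step S303: look up the writer corresponding to KEY2. If the lookup succeeds, go to step S306; if it fails, go to step S304.
Step S304: a failed lookup means that the refresh time has arrived or that a new MSISDN has appeared. The current writers are closed in batches (256 writers per batch); closing a writer forces its cache to be written to the RAM disk.
Specifically, when no writer is found for KEY2, either the refresh time has arrived or a new MSISDN has appeared, and the current writers need to be closed.
Step S305: create the writer for KEY2. The writer creates a new file in the leaf directory for the current minute or hour value of the system.
Creating a writer creates the corresponding time leaf directory, the file and the cache. Data goes into the cache first and is normally written to the file only when the cache is full. The file resides on a virtual RAM disk. Note that the data files of one MSISDN always have the same file name, so data files with the same name exist under different time directories.
Step S306: write the record into the cache of the corresponding writer.
Step S307: determine whether the writer's cache is full. If it is, go to step S308; if not, return to step S301 to process the next record.
Step S308: write the writer's cached data to the file, then return to step S301.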
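The write flow of steps S302 to S308 can be sketched as below. This is one possible reading under stated assumptions, not the patented implementation: KEY2 is modelled as the pair (KEY1, timestamp truncated to the second), stale writers are all closed on a lookup miss instead of in batches of 256, and an ordinary directory stands in for the RAM disk. Class and method names are hypothetical.

```python
import os
from datetime import datetime


class Writer:
    """Buffers lines for one file and appends them when flushed."""

    def __init__(self, path: str, cache_limit: int):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        self.path, self.cache_limit, self.cache = path, cache_limit, []

    def write(self, line: str) -> None:
        self.cache.append(line)                   # S306: write into the cache
        if len(self.cache) >= self.cache_limit:   # S307: cache full?
            self.flush()                          # S308: cache -> file

    def flush(self) -> None:
        if self.cache:
            with open(self.path, "a", encoding="utf-8") as handle:
                handle.writelines(line + "\n" for line in self.cache)
            self.cache.clear()


class WriterPool:
    """Maps KEY2 = (KEY1, second bucket) to a Writer."""

    def __init__(self, root: str, cache_limit: int = 64):
        self.root, self.cache_limit, self.writers = root, cache_limit, {}

    def store(self, key1: int, ts: datetime, line: str) -> None:
        bucket = ts.strftime("%Y%m%d%H%M%S")
        key2 = (key1, bucket)                     # S302: build KEY2
        writer = self.writers.get(key2)           # S303: look up the writer
        if writer is None:
            self.close_stale(bucket)              # S304: close and flush old writers
            leaf = ts.strftime("%Y/%m/%d/%H/%M").split("/")
            path = os.path.join(self.root, *leaf, f"{key1}.il")
            writer = self.writers[key2] = Writer(path, self.cache_limit)  # S305
        writer.write(line)

    def close_stale(self, bucket: str) -> None:
        for key2 in [k for k in self.writers if k[1] != bucket]:
            self.writers.pop(key2).flush()

    def close_all(self) -> None:
        for writer in self.writers.values():
            writer.flush()
        self.writers.clear()
```

The point of the design is visible in `store`: a record arriving in a new second misses the lookup, which flushes whatever was buffered for earlier seconds, so data reaches the file about once per second without any timer thread.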
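Fig. 4 is a schematic flowchart of retrieving data from the in-memory store according to an embodiment of the present invention. As shown in Fig. 4, retrieving data from the in-memory store (equivalent to querying data from the data storage server in the above embodiment) comprises the following steps:
Step S401: the query server WEB SERVER accepts the user's query request and finds the corresponding in-memory store DS SERVER according to KEY1.
Note that CHRMAP defines the data dictionary of the TLV data; PATCHMAP defines the key information of the TLV data, for example the index of KEY1; and FILTERMAP defines all filter conditions.
Step S402: the in-memory store DS SERVER receives the query request from the query server, builds the filter FILTERMAP from KEY1, the start time STARTTIME, the end time ENDTIME and the filter values of other service fields, and starts the query.
Step S403: determine whether the time type is hour or minute. If the time type is minute, go to step S404; if it is hour, go to step S405.
Step S404: traverse the minute directories within the time range given by STARTTIME and ENDTIME. The search depth is 5 levels (year/month/day/hour/minute/). Obtain the URL list of the level-5 directories and go to step S406.
Step S405: traverse the hour directories within the time range given by STARTTIME and ENDTIME. The search depth is 4 levels (year/month/day/hour/). Obtain the URL list of the level-4 directories and go to step S406.
Step S406: traverse the Uniform Resource Locator (URL) list of time directories and determine whether the file KEY1.il exists in the current directory. If it does not, repeat step S406 for the next directory; if it does, go to step S407.
Specifically, a directory contains many files, so only the list of directories that match the conditions is kept. Because KEY1 is specified in the query, the file name is fixed. There is no need to fetch a file listing; it is enough to check whether KEY1.il exists in each directory.
Step S407: process the file line by line, filter each row with the configured filter FILTERMAP, and cache only the valid result rows.
Step S408: determine whether the query result queue exceeds the preset number of result rows. If not, go to step S409; if so, go to step S411 and end the query.
Step S409: determine whether the end of the file has been reached. If not, return to step S407; if so, go to step S410.
Step S410: determine whether the end of the directory list has been reached. If not, return to step S406 to process the next time directory; if so, go directly to step S411 and end the query.
Step S411: sort the results by start time and send the query result in packets to the query server WEB SERVER.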
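The retrieval flow of steps S403 to S411 can be sketched as follows. Several details are assumptions: FILTERMAP is modelled as a plain predicate function, each stored row is assumed to begin with a sortable start-time string so that sorting rows sorts by start time, and the function names are hypothetical.

```python
import os
from datetime import datetime, timedelta
from typing import Callable, Iterator, List


def leaf_dirs(root: str, start: datetime, end: datetime, minute_leaf: bool = True) -> Iterator[str]:
    """Enumerate candidate leaf directories in the time range (S404 / S405)."""
    if minute_leaf:
        step, fmt = timedelta(minutes=1), "%Y/%m/%d/%H/%M"
        current = start.replace(second=0, microsecond=0)
    else:
        step, fmt = timedelta(hours=1), "%Y/%m/%d/%H"
        current = start.replace(minute=0, second=0, microsecond=0)
    while current <= end:
        yield os.path.join(root, *current.strftime(fmt).split("/"))
        current += step


def query(root: str, key1: int, start: datetime, end: datetime,
          predicate: Callable[[str], bool], max_rows: int,
          minute_leaf: bool = True) -> List[str]:
    """Scan KEY1.il in each candidate directory until max_rows rows match."""
    results: List[str] = []
    for directory in leaf_dirs(root, start, end, minute_leaf):
        path = os.path.join(directory, f"{key1}.il")
        if not os.path.isfile(path):              # S406: only test for the one file
            continue
        with open(path, encoding="utf-8") as handle:
            for line in handle:                   # S407: filter line by line
                row = line.rstrip("\n")
                if predicate(row):
                    results.append(row)
                    if len(results) >= max_rows:  # S408: enough rows, stop early
                        return sorted(results)    # S411
    return sorted(results)                        # S410 -> S411
```

Two properties keep the scan cheap: the file name is known in advance, so no directory listing is needed, and the loop stops as soon as the requested number of rows has been collected instead of reading every file to the end.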
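Fig. 5 is a schematic diagram of the information hierarchy for in-memory store retrieval according to an embodiment of the present invention. The embodiment provides a tree-shaped storage structure. Signaling tracking involves many media files, signaling files and so on. The in-memory store holds a summary of this information, which is the top layer of data and also the fastest to store and query. The summary information shows the URLs of the signaling and media files involved in a service flow, so to present a signaling flow the client only has to associate the information held in the in-memory store with the contents of the files at the corresponding URLs. The large numbers of media files and signaling files are likewise kept in a directory structure split with the minute as the leaf node, handled in the same way as the in-memory store, and the in-memory store records manage these files and signaling flows.
The distributed big data fast storage strategy of the embodiment of the present invention offers different response speeds according to user configuration, shares network traffic evenly, and improves the processing capability and reliability of the system. For example, the Intel DPDK stream processing framework is used for data collection, and RAM disk technology together with a distributed big data storage and query system reconciles the generation of large numbers of data files with timely querying, providing real-time insertion of 100,000 records per second and fast real-time querying. The system also adapts to service demands with large data volumes: the network elements share the service load of the whole network in parallel, which improves the service processing performance of the network. In addition, when the communication link of a network element is interrupted or faulty, other network elements in the distributed network take over its services, so that the operation of the whole network is not interrupted and network stability and reliability are ensured.
From the description of the above implementations, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.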
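This embodiment also provides a data processing apparatus, which implements the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 6 is a structural block diagram of the data processing apparatus according to an embodiment of the present invention. As shown in Fig. 6, the apparatus comprises a collection module 62, an obtaining module 64 and a storage module 66.
The collection module 62 is configured to collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), where the signaling is the signaling of a user.
The embodiment of the present invention can collect users' signaling by monitoring the interfaces of the GGSN or PGW; there may be one user or several. Preferably, the collection module 62 comprises a signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
The obtaining module 64 is configured to obtain the unique keyword of the user.
A network element serves a large number of users. To distinguish the signaling of each user during collection, every user in the embodiment of the present invention corresponds to one unique keyword that uniquely identifies the user. Preferably, the obtaining module 64 comprises: an obtaining unit configured to obtain an identifier of the user, where the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN); and a computing unit configured to perform a hash operation on the identifier to obtain the unique keyword.
Every user in the network element has a corresponding IMSI or MSISDN. A hash operation on the user's IMSI or MSISDN yields a hash value, and this hash value is used as the unique keyword, which makes it possible to store and look up each user's signaling quickly.
The storage module 66 is configured to store the signaling in the multi-level directory of the data storage server according to the unique keyword.
The multi-level directory may be created in the data storage server in advance, or it may be generated dynamically while the signaling is being stored.
In the embodiment of the present invention, the collection module 62 collects the signaling of the GGSN or PGW, where the signaling is the signaling of a user; the obtaining module 64 obtains the unique keyword of the user; and the storage module 66 stores the signaling in the multi-level directory of the data storage server according to the unique keyword. Compared with the prior-art approach of storing user signaling in a database, storage is faster. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.
Preferably, the apparatus further comprises a generation module configured to generate the multi-level directory in the data storage server according to time, before the signaling is stored in the multi-level directory according to the unique keyword.
For example, a tree-shaped multi-level directory is generated by year, month, day, hour and minute, with the year as the root directory and the minute as the leaf directory. The number of directory levels can be chosen according to the data volume: with a small data volume the hour can serve as the leaf directory, giving 4 levels; with a large data volume the minute can serve as the leaf directory, giving 5 levels.
Preferably, the storage module 66 comprises: a lookup unit configured to look up the data storage server corresponding to the user according to the unique keyword; and a storage unit configured to store the signaling in the multi-level directory of that data storage server.
Because a network element serves a large number of users, each user's unique keyword can be associated in advance with a data storage server so that the user's signaling can be stored quickly. The unique keyword is enough to find the data storage server for the user, and all of the user's signaling is stored in the multi-level directory of that server, which makes later retrieval of the user's signaling fast.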
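This embodiment also provides a data processing system. Fig. 7 is a structural block diagram of the data processing system according to an embodiment of the present invention. As shown in Fig. 7, the data processing system comprises a data collection server 72 and a data storage server 74.
The data collection server 72 is configured to collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), where the signaling is the signaling of a user.
Preferably, the data collection server comprises a probe signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, where the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
Because the signaling is collected through optical port mirroring, the collection does not affect the normal operation of the GGSN or PGW interfaces.
The data storage server 74 is connected to the data collection server. The data storage server contains a multi-level directory used to store the signaling.
In the embodiment of the present invention, the data collection server 72 collects the signaling of the GGSN or PGW, where the signaling is the signaling of a user, and the data storage server 74 stores the signaling in a multi-level directory format. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.
Preferably, the data storage server comprises an in-memory store and a file server, where the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
The summary information of the signaling includes the Uniform Resource Locator (URL) of each signaling file and the URL of each media file, while the file information comprises the detailed signaling files and media files themselves. The URL of a signaling file leads to the corresponding signaling file, and the URL of a media file leads to the corresponding media file. During retrieval it is therefore enough to retrieve the summary information from the in-memory store in order to reach the corresponding file information.
Preferably, the data collection server further comprises a processing module connected to the probe signaling collector and configured to parse the signaling collected by the probe signaling collector into summary information and file information, and to send them to the in-memory store and the file server respectively.
The embodiment of the present invention uses distributed storage: the summary information and the file information of the signaling are stored in the in-memory store and the file server respectively. Specifically, the processing module of the data collection server parses the signaling into summary information and file information and sends them to the in-memory store and the file server respectively.
Preferably, the data processing system further comprises a query server connected to the data storage server and configured to query the signaling from the data storage server.
The query server queries the signaling of network element users from the data storage server in order to monitor those users.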
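Fig. 8 is a schematic deployment diagram of the in-memory store data retrieval system according to an embodiment of the present invention. As shown in Fig. 8, the system comprises several signaling collection modules (signaling collection module 1 to signaling collection module m) connected to the interfaces of the GGSN or PGW to collect user signaling, several in-memory stores (in-memory store 1 to in-memory store n), a query server and a client query module. In the reporting and storage flow, a signaling collection module reports a message and matches the corresponding in-memory store using the hash of the MSISDN as the unique keyword. In the query flow, the query request of the query server is matched to the corresponding in-memory store in the same way, using a mandatory condition, for example the hash of the MSISDN, as the unique keyword.
When access rights on the servers are strictly limited, the embodiment of the present invention uses a probe signaling collector connected by optical port mirroring to monitor in real time the signaling on each interface of the GGSN or PGW, including the S5/S8 interfaces, the Gn/Gp interfaces, the Gx interface, the Gy interface and the Authentication, Authorization and Accounting (AAA) interface.
The system is implemented by adding network elements to an operator's existing mobile data network. Within the mobile data network topology, the signaling collection module AGENT taps the Gn/Gp, Gx, Gy and AAA interfaces of the GGSN or PGW. AGENT obtains the packets on each interface by probe capture, extracts real-time network data, and extracts the user-related signaling flows by user number (MSISDN). The in-memory store DS SERVER receives the TLV records of signaling summary information built by AGENT and stores them in real time. The query server WEB SERVER implements the customizable query function for clients: it receives the user's query request, finds the corresponding DS SERVER according to the unique keyword KEY1, and sends the query request, which contains KEY1, to that DS SERVER in JavaScript Object Notation (JSON) format. When the DS SERVER has finished processing the query, the WEB SERVER receives the query result. The WEB SERVER also provides a network management parameter configuration and control centre that offers network administrators a parameter configuration interface. The query module contains an efficient query algorithm. The query condition (the query instruction) includes three items: (1) the start time, (2) the end time and (3) the MSISDN, where the start and end times are accurate to the minute. The query condition is converted into the corresponding date, hour and MSISDN, and the lookup proceeds level by level through a three-level file directory of the form date/hour/minute/. The query result is a signaling flow diagram; clicking a row shows the detailed protocol byte stream and protocol decoding details of that signaling message. The data query steps of the network element signaling backtracking system are as follows: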
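Step 1: in the network query interface of the client query module, the user enters the query condition (the query instruction), comprising the start time, the end time, the MSISDN and the maximum number of rows to return, which is assembled in JSON format.
Step 2: the query server WEB SERVER hashes the MSISDN to obtain the unique keyword KEY1, adds KEY1 to the query parameters, finds the matching in-memory store DS SERVER according to KEY1, and sends the query request packet to it in JSON format.
Step 3: the query listener of the in-memory store DS SERVER detects the incoming query request packet, reads the query condition from the JSON packet and converts it into a start date, an end date and KEY1. It then searches the in-memory store for log records that satisfy the condition, up to the maximum number of rows to return.
Step 4: the in-memory store DS SERVER packs all matching data and sends it quickly to the query server WEB SERVER as UDP-based Data Transfer Protocol (UDT) messages.
Step 5: the query server WEB SERVER receives the query result packets returned by the corresponding in-memory store DS SERVER, sorts them by time, and sends the final result to the client in JSON format. The client converts the result and presents it in the query interface.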
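Steps 1 and 2 can be illustrated with a sketch of how a query server might build the JSON request and pick the target store. The field names KEY1, STARTTIME and ENDTIME come from this document; MAXROWS, the SHA-256 hash, the modulo placement rule and the server names are assumptions for illustration.

```python
import hashlib
import json
from datetime import datetime
from typing import List, Tuple


def route_query(msisdn: str, start: datetime, end: datetime,
                max_rows: int, servers: List[str]) -> Tuple[str, str]:
    """Return (target server, JSON request body) for a subscriber query."""
    digest = hashlib.sha256(msisdn.encode("utf-8")).digest()
    key1 = int.from_bytes(digest[:8], "big")      # same hash the collector would use
    request = {
        "KEY1": key1,
        "STARTTIME": start.strftime("%Y-%m-%d %H:%M"),  # minute precision
        "ENDTIME": end.strftime("%Y-%m-%d %H:%M"),
        "MAXROWS": max_rows,
    }
    # Assumed placement rule: key modulo the number of stores.
    return servers[key1 % len(servers)], json.dumps(request)


if __name__ == "__main__":
    target, body = route_query(
        "8613800000000",                           # hypothetical MSISDN
        datetime(2014, 12, 30, 12, 20),
        datetime(2014, 12, 30, 12, 30),
        50,
        ["ds-1", "ds-2", "ds-3"],                  # hypothetical DS SERVER names
    )
    print(target, body)
```

Because storage and querying apply the same hash to the same field, a query for one subscriber lands on the single store that holds that subscriber's files and never has to fan out across hosts.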
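In the prior art, patent CN104636199A, "A big data real-time processing system and method based on distributed in-memory computing", has the following shortcomings. Duplication is not considered before files are written; the metadata of the new and old versions of a file is compared on the server side, and identical data is deduplicated through the file blocks in the storage layer, which incurs considerable system overhead. In the present invention, data is first distributed to different files by the hash code of the IMSI, which guarantees that the same keyword always lands in the same file, and a query can hash the IMSI to locate the file directly. Files are also stored in directories refined down to the minute, so a query can narrow the search to a small number of directories from the time range. Furthermore, the embodiment of the present invention uses customizable queries: the server processes only as many lines of the file as the user wants to see and returns them, so in a big data environment the whole file need not be read, which greatly improves response speed. Through this systematic planning the present invention ensures fast location and fast querying. Patent CN104679893A, "An information retrieval method based on big data", has the following shortcomings: the data involves multiple backups on several different hosts and consistency maintenance, which is complex and limits the system's capacity to process massive data. The embodiment of the present invention hashes the MSISDN to obtain the unique keyword KEY1 and then sends each record to exactly one destination, which avoids duplicated data on different hosts. Distributed storage and distributed querying apply the same hash algorithm to the same field and therefore resolve to the same in-memory store DS SERVER, so a query never involves multiple hosts. The information model of the present invention is also a typical tree structure: the top level consists of the tables in the distributed in-memory store, and the lower level consists of the signaling files and media files corresponding to each table. An in-memory table is itself represented as data files, and accessing an in-memory table amounts to filtering file directories and filtering file contents.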
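The embodiment of the present invention provides a distributed big data fast storage and query system that offers real-time monitoring and corresponding reports for the service signaling and data service types of the GGSN/PGW, including real-time network monitoring and network element signaling backtracking. The signaling on each GGSN/PGW interface can be monitored in real time, including the S5/S8, Gn/Gp, Gx and Gy interfaces and the Authentication, Authorization and Accounting (AAA) interface. On the system, the operator can use a user's IMSI/MSISDN number to query the signaling that the user generated on the GGSN/PGW within a given period, and can decode that signaling. The signaling of all users of the entire network element can be retained for at least 7 days for backtracking queries.
The present invention also provides a distributed big data fast storage strategy that offers different response speeds according to user configuration, with the aim of sharing network traffic evenly and improving the processing capability and reliability of the system. For example, the Intel DPDK stream processing framework is used for data collection, and RAM disk technology together with a distributed big data storage and query system reconciles two conflicting needs: generating large numbers of data files and querying them promptly. This provides real-time insertion of 100,000 records per second.
To address the requirements of different scenarios in real network environments, the present invention provides distributed Internet access log backtracking with the following features. First, when access rights on the servers are strictly limited, a probe signaling collector connected by optical port mirroring monitors in real time the signaling on each GGSN/PGW interface, including the S5/S8, Gn/Gp, Gx and Gy interfaces and the AAA interface. Second, the hash of, for example, the MSISDN serves as the system's unique keyword KEY1. It associates network queries with an in-memory store DS SERVER, associates the messages reported by the signaling collection module AGENT with their destination DS SERVER, and gives the in-memory store files unique names. Third, the system combines a distributed in-memory store with a distributed file system to provide a layered information structure from summary to detail. Summary information is kept in the in-memory store, and detailed information (signaling files, media files and so on) is spread across distributed file servers. The summary information includes, for example, the URLs of signaling files and media files; when the client needs the details, it can download them locally through the URLs and present them in local client tools without affecting server performance. Fourth, using the timestamps of the system data removes the need for a large number of timers; using the user's query habits (the maximum number of rows viewed at a time) reduces how deeply the server has to search; and replacing file processing with in-memory processing improves the processing capability of the system.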
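The system therefore has four components: the signaling collection module AGENT, the in-memory store DS SERVER, the query server WEB SERVER and the file server. AGENT and DS SERVER are deployed in different network environments. The functions of the components are as follows:
(1) The signaling collection module AGENT uses a probe module (for example, a probe signaling collector) to capture the signaling on each GGSN/PGW interface and parses it through the state machine of each protocol to obtain the relevant summary information and the individual signaling files and media files. The files are saved to the distributed file server. The summary information is sent to the in-memory store DS SERVER selected by the unique keyword KEY1, which is the hash of the MSISDN.
(2) The in-memory store DS SERVER receives the TLV records built by AGENT, extracts the unique keyword KEY1 using the data dictionary, and builds the first identifier KEY2 from it. KEY2 combines KEY1 with the timestamp of the service message in second format or hour format. KEY2 is used to look up a writer; once the writer for KEY2 is found, it writes the data into the corresponding in-memory file. Because KEY2 contains the timestamp, a write can be triggered every second without a timer: after a full second KEY2 is necessarily different, so a new writer is created. When real-time requirements are high, this guarantees that the file is force-written once per second whether or not the cache is full, achieving timed writes without using a timer. The DS SERVER also handles query requests. It receives a query request from the query server WEB SERVER and builds a filter from the unique keyword KEY1, the start time STARTTIME, the end time ENDTIME and the filter values of other service fields to start the query. When the time type is minute, it traverses the minute directories within the range given by STARTTIME and ENDTIME with a search depth of 5 levels (year/month/day/hour/minute/) and obtains only the URL list of the level-5 directories. It then traverses this list of time directories and checks whether the file KEY1.il exists in each directory. If the file exists, it is processed line by line, each row is filtered with the configured filter FILTERMAP, and only valid result rows are cached. When the result queue exceeds the configured number of result rows or the end of the directory list is reached, the results are sorted by start time and sent in packets to the query server WEB SERVER, which completes the query.
(3) The query server WEB SERVER implements the customizable query function for clients. It accepts the user's query request, finds the corresponding in-memory store DS SERVER according to the unique keyword KEY1, and sends the query request, which contains KEY1, to that DS SERVER in JSON format. When the DS SERVER has finished processing the query, the WEB SERVER receives the query result. The WEB SERVER also provides a network management parameter configuration and control centre that offers network administrators a parameter configuration interface.
(4) The file server stores the signaling files and media files for the signaling collection module AGENT and offers them to clients for high-speed download.
To give the system the capacity to process big-data-scale services and to ensure reliability, the present invention also provides a distributed big data fast storage strategy that offers different response speeds according to user configuration, with the aim of sharing network traffic evenly and improving the processing capability and reliability of the system. For example, the Intel DPDK stream processing framework is used for data collection, and RAM disk technology together with a distributed big data storage and query system reconciles the generation of large numbers of data files with timely querying, providing real-time insertion of 100,000 records per second and fast real-time querying.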
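As shown in Fig. 3, writing data to the in-memory store comprises the following steps:
Step S301: the signaling collection module builds a TLV record, takes the hash of the MSISDN as the unique keyword KEY1, sends the record to the in-memory store that KEY1 maps to, and adds KEY1 to the TLV record.
The signaling collection module AGENT collects the signaling and parses it, for example by building a TLV record, where TLV denotes a data format with three fields: type, length and value. The hash of the MSISDN is used as the unique keyword KEY1 to select the in-memory store to which the record is sent, and KEY1 is added to the TLV record.
Step S302: the in-memory store receives the TLV record and builds the first identifier KEY2, which combines KEY1 with the timestamp of the service message in second format or hour format.
Step S303: look up the writer corresponding to KEY2. If the lookup succeeds, go to step S306; if it fails, go to step S304.
Step S304: a failed lookup means that the refresh time has arrived or that a new MSISDN has appeared. The current writers are closed in batches (256 writers per batch); closing a writer forces its cache to be written to the RAM disk.
Specifically, when no writer is found for KEY2, either the refresh time has arrived or a new MSISDN has appeared, and the current writers need to be closed.
Step S305: create the writer for KEY2. The writer creates a new file in the leaf directory for the current minute or hour value of the system.
Step S306: write the record into the cache of the corresponding writer.
Step S307: determine whether the writer's cache is full. If it is, go to step S308; if not, return to step S301 to process the next record.
Step S308: write the writer's cached data to the file, then return to step S301.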
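As shown in Fig. 4, retrieving data from the in-memory store comprises the following steps:
Step S401: the query server WEB SERVER accepts the user's query request and finds the corresponding in-memory store DS SERVER according to KEY1.
Step S402: the in-memory store DS SERVER receives the query request from the query server, builds the filter FILTERMAP from KEY1, the start time STARTTIME, the end time ENDTIME and the filter values of other service fields, and starts the query.
Step S403: determine whether the time type is hour or minute. If the time type is minute, go to step S404; if it is hour, go to step S405.
Step S404: traverse the minute directories within the time range given by STARTTIME and ENDTIME. The search depth is 5 levels (year/month/day/hour/minute/). Obtain the URL list of the level-5 directories and go to step S406.
Step S405: traverse the hour directories within the time range given by STARTTIME and ENDTIME. The search depth is 4 levels (year/month/day/hour/). Obtain the URL list of the level-4 directories and go to step S406.
Step S406: traverse the Uniform Resource Locator (URL) list of time directories and determine whether the file KEY1.il exists in the current directory. If it does not, repeat step S406 for the next directory; if it does, go to step S407.
Step S407: process the file line by line, filter each row with the configured filter FILTERMAP, and cache only the valid result rows.
Step S408: determine whether the query result queue exceeds the preset number of result rows. If not, go to step S409; if so, go to step S411 and end the query.
Step S409: determine whether the end of the file has been reached. If not, return to step S407; if so, go to step S410.
Step S410: determine whether the end of the directory list has been reached. If not, return to step S406 to process the next time directory; if so, go directly to step S411 and end the query.
Step S411: sort the results by start time and send the query result in packets to the query server WEB SERVER.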
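Compared with the prior art, the technical problem addressed by the embodiments of the present invention is to provide a real-time GGSN/PGW signaling tracking platform that supports 5 million users and 280 Gbps of throughput across the whole network (the 2014 AIS tender requirement), and 1.5 million users and 50 Gbps of throughput on a single GGSN/PGW. The present invention provides real-time monitoring and corresponding reports for the service signaling and data service types of the GGSN/PGW, including real-time network monitoring and network element signaling backtracking. The signaling on each GGSN/PGW interface can be monitored in real time, including the S5/S8, Gn/Gp, Gx and Gy interfaces and the Authentication, Authorization and Accounting (AAA) interface. On the system, the operator can use a user's IMSI/MSISDN number to query the signaling that the user generated on the GGSN/PGW within a given period, and can decode that signaling. The signaling of all users of the entire network element can be retained for at least 7 days for backtracking queries.
In addition, the present invention provides a distributed big data fast storage strategy that offers different response speeds according to user configuration, with the aim of sharing network traffic evenly and improving the processing capability and reliability of the system. For example, the Intel DPDK stream processing framework is used for data collection, and RAM disk technology together with a distributed big data storage and query system reconciles two conflicting needs: generating large numbers of data files and querying them promptly. This provides real-time insertion of 100,000 records per second and fast real-time querying. The system also adapts to service demands with large data volumes: the network elements share the service load of the whole network in parallel, which improves the service processing performance of the network. When the communication link of a network element is interrupted or faulty, other network elements in the distributed network take over its services, so that the operation of the whole network is not interrupted and network stability and reliability are ensured.
It should be noted that each of the above modules may be implemented in software or hardware. In the latter case, this may be done in the following ways, without being limited to them: the modules are all located in the same processor, or the modules are located in multiple processors.
Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for performing the method steps of the above embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, for specific examples in this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be made into individual integrated circuit modules, or several of them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and do not limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
Through the embodiments of the present invention, the signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW) is collected, where the signaling is the signaling of a user; the unique keyword of the user is obtained; and the signaling is stored in the multi-level directory of a data storage server according to the unique keyword. This solves the problem of low signaling storage efficiency in the related art and thereby improves signaling storage efficiency.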

Claims (20)

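  1. A data processing method, comprising:
    collecting signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), wherein the signaling is the signaling of a user;
    obtaining a unique keyword of the user; and
    storing the signaling in a multi-level directory of a data storage server according to the unique keyword.
  2. The method according to claim 1, wherein collecting the signaling of the GGSN or PGW comprises:
    connecting to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, wherein the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
  3. The method according to claim 1, wherein obtaining the unique keyword of the user comprises:
    obtaining an identifier of the user, wherein the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN);
    performing a hash operation on the identifier to obtain the unique keyword.
  4. The method according to claim 1, wherein, before the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises: generating the multi-level directory in the data storage server according to time.
  5. The method according to claim 4, wherein, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method comprises:
    detecting whether the multi-level directory contains a directory older than a preset time; and
    when it is detected that the multi-level directory contains a directory older than the preset time, deleting that directory from the data storage server.
  6. The method according to claim 1, wherein storing the signaling in the multi-level directory of the data storage server according to the unique keyword comprises:
    looking up the data storage server corresponding to the user according to the unique keyword; and
    storing the signaling in the multi-level directory of the data storage server corresponding to the user.
  7. The method according to claim 6, wherein storing the signaling in the multi-level directory of the data storage server corresponding to the user comprises:
    obtaining the timestamp of a service message;
    generating a first identifier from the timestamp and the unique keyword;
    obtaining the writer corresponding to the first identifier, wherein the writers and the directories of the multi-level directory are in one-to-one correspondence; and
    writing the signaling into the corresponding directory through the writer.
  8. The method according to claim 1 or 7, wherein the data storage server comprises an in-memory store and a file server, wherein the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
  9. The method according to claim 1, wherein, after the signaling is stored in the multi-level directory of the data storage server according to the unique keyword, the method further comprises:
    receiving a query instruction, wherein the query instruction includes a filter condition and the unique keyword;
    looking up the data storage server corresponding to the unique keyword; and
    querying data from the data storage server corresponding to the unique keyword according to the filter condition.
  10. The method according to claim 9, wherein querying data from the data storage server corresponding to the unique keyword according to the filter condition comprises:
    traversing the multi-level directory of the data storage server corresponding to the unique keyword according to the filter condition;
    obtaining, from the multi-level directory of the data storage server corresponding to the unique keyword, the data that satisfies the filter condition, to obtain a query result;
    determining whether the number of data rows in the query result exceeds a preset value; and
    when it is determined that the number of data rows in the query result exceeds the preset value, displaying the query result in batches.
  11. A data processing apparatus, comprising:
    a collection module configured to collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), wherein the signaling is the signaling of a user;
    an obtaining module configured to obtain a unique keyword of the user; and
    a storage module configured to store the signaling in a multi-level directory of a data storage server according to the unique keyword.
  12. The apparatus according to claim 11, wherein the collection module comprises:
    a signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, wherein the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
  13. The apparatus according to claim 11, wherein the obtaining module comprises:
    an obtaining unit configured to obtain an identifier of the user, wherein the identifier includes an International Mobile Subscriber Identity (IMSI) or a Mobile Subscriber Integrated Services Digital Network Number (MSISDN);
    a computing unit configured to perform a hash operation on the identifier to obtain the unique keyword.
  14. The apparatus according to claim 11, wherein the apparatus further comprises: a generation module configured to generate the multi-level directory in the data storage server according to time.
  15. The apparatus according to claim 11, wherein the storage module comprises:
    a lookup unit configured to look up the data storage server corresponding to the user according to the unique keyword; and
    a storage unit configured to store the signaling in the multi-level directory of the data storage server corresponding to the user.
  16. A data processing system, comprising:
    a data collection server configured to collect signaling of a Gateway General Packet Radio Service Support Node (GGSN) or a Public Data Network Gateway (PGW), wherein the signaling is the signaling of a user; and
    a data storage server connected to the data collection server, wherein the data storage server contains a multi-level directory used to store the signaling.
  17. The system according to claim 16, wherein the data storage server comprises an in-memory store and a file server, wherein the in-memory store stores summary information of the signaling, the file server stores file information of the signaling, and there is a mapping between the summary information and the file information.
  18. The system according to claim 17, wherein the data collection server comprises a probe signaling collector connected to an interface of the GGSN or the PGW by optical port mirroring to collect the signaling, wherein the interface includes at least one of the following: an S5 interface, an S8 interface, a Gn interface, a Gp interface, a Gx interface, a Gy interface, and an Authentication, Authorization and Accounting (AAA) interface.
  19. The system according to claim 18, wherein the data collection server further comprises a processing module connected to the probe signaling collector and configured to parse the signaling collected by the probe signaling collector into the summary information and the file information, and to send the summary information and the file information to the in-memory store and the file server respectively.
  20. The system according to any one of claims 16 to 19, wherein the data processing system further comprises: a query server connected to the data storage server and configured to query the signaling from the data storage server.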
PCT/CN2016/076648 2015-06-30 2016-03-17 数据处理方法、装置及系统 WO2017000592A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510374386.7A CN106326280B (zh) 2015-06-30 2015-06-30 数据处理方法、装置及系统
CN201510374386.7 2015-06-30

Publications (1)

Publication Number Publication Date
WO2017000592A1 true WO2017000592A1 (zh) 2017-01-05

Family

ID=57607563

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/076648 WO2017000592A1 (zh) 2015-06-30 2016-03-17 数据处理方法、装置及系统

Country Status (2)

Country Link
CN (1) CN106326280B (zh)
WO (1) WO2017000592A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309109A (zh) * 2019-05-23 2019-10-08 中国平安财产保险股份有限公司 数据监控方法、装置、计算机设备及存储介质
CN112306528A (zh) * 2020-11-04 2021-02-02 北京焦点新干线信息技术有限公司 一种数据更新方法及装置
CN116210253A (zh) * 2020-08-06 2023-06-02 华为技术有限公司 一种通信方法、设备及系统

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255611B (zh) * 2018-01-18 2019-03-26 北京卓越智软科技有限公司 基于树形存储结构的请求处理方法
CN112037394A (zh) * 2020-08-07 2020-12-04 武汉旷视金智科技有限公司 身份识别记录处理方法、装置、门禁系统、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551826A (zh) * 2009-05-19 2009-10-07 成都市华为赛门铁克科技有限公司 数据检索方法、装置及其系统
CN101795211A (zh) * 2010-01-13 2010-08-04 北京中创信测科技股份有限公司 一种数据存储方法及系统
CN101859316A (zh) * 2010-04-29 2010-10-13 北京无限立通通讯技术有限责任公司 一种对海量文件进行存取的方法及装置
US8185751B2 (en) * 2006-06-27 2012-05-22 Emc Corporation Achieving strong cryptographic correlation between higher level semantic units and lower level components in a secure data storage system
CN103347008A (zh) * 2013-06-20 2013-10-09 中国联合网络通信集团有限公司 信息推送方法及其装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100504303C (zh) * 2006-03-28 2009-06-24 北京瑞图万方科技有限公司 分布式数据处理系统及方法
WO2009157565A1 (ja) * 2008-06-27 2009-12-30 京セラ株式会社 携帯端末装置、携帯端末装置の課金処理方法および課金システム
CN101459557B (zh) * 2008-11-29 2011-02-02 成都市华为赛门铁克科技有限公司 一种安全日志集中存储方法及装置
CN103067934A (zh) * 2011-10-21 2013-04-24 上海湾流仪器技术有限公司 核心网多接口信令流程关联方法
US9378234B2 (en) * 2013-03-11 2016-06-28 International Business Machines Corporation Management of updates in a database system
CN103346905B (zh) * 2013-06-14 2016-12-28 吴建进 一种信令分析的方法及系统

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309109A (zh) * 2019-05-23 2019-10-08 中国平安财产保险股份有限公司 数据监控方法、装置、计算机设备及存储介质
CN110309109B (zh) * 2019-05-23 2024-02-02 中国平安财产保险股份有限公司 数据监控方法、装置、计算机设备及存储介质
CN116210253A (zh) * 2020-08-06 2023-06-02 华为技术有限公司 一种通信方法、设备及系统
CN112306528A (zh) * 2020-11-04 2021-02-02 北京焦点新干线信息技术有限公司 一种数据更新方法及装置
CN112306528B (zh) * 2020-11-04 2023-12-08 北京博点智合科技有限公司 一种数据更新方法及装置

Also Published As

Publication number Publication date
CN106326280B (zh) 2021-06-29
CN106326280A (zh) 2017-01-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16816957

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16816957

Country of ref document: EP

Kind code of ref document: A1