CN105653531B - Data extraction method and device - Google Patents

Info

Publication number
CN105653531B
Authority
CN
China
Prior art keywords
extraction
data
message
matching
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410638204.8A
Other languages
Chinese (zh)
Other versions
CN105653531A (en)
Inventor
陈娟
吴明
陈伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201410638204.8A
Priority to PCT/CN2015/076587 (WO2016074434A1)
Publication of CN105653531A
Application granted
Publication of CN105653531B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Abstract

The invention discloses a data extraction method and a data extraction device, wherein the method comprises the following steps: determining the target data to be extracted according to a data message; matching the content in the message data according to a predetermined regular expression; and, when at least two target data exist in the message data, extracting the at least two target data. The method and the device solve the problem in the related art that target data is extracted inaccurately, so that target data can be extracted accurately.

Description

Data extraction method and device
Technical Field
The invention relates to the field of communication, in particular to a data extraction method and device.
Background
With the development of mobile communication technology, the exchange and transmission of information over the Internet have become more convenient. Continuous optimization, higher speeds, bandwidth upgrades and lower costs in operator networks all follow the trend of the times. To promote products and improve user experience, operators urgently need to understand the requirements and preferences of users. Metadata extraction helps in understanding the interactions with the websites, business applications, and servers that users access frequently. Based on the metadata extraction results, an operator can track and analyze user behavior and user experience, and compile statistics such as which websites are popular and the latency and traffic that users experience on those websites. The wireless network can then be better optimized and the operator is assisted in improving network quality, so that the product obtains higher value.
A user requests resources from a server through an Internet terminal device, and the server returns a response message after receiving and interpreting the request message. The problem is how to accurately extract the required data from a massive volume of message content. Existing methods generally extract by matching directly against a regular expression. Because the metadata transmitted over the network is complex, plaintext features sometimes cannot be found and a regular expression cannot be configured well; sometimes the message data contains multiple extraction targets but not all of them are extracted, or only one extraction is needed but much unwanted, erroneous content is extracted.
For the problem of inaccurate extraction of target data in the related art, no effective solution has yet been proposed.
Disclosure of Invention
The invention provides a data extraction method and a data extraction device, which are used for at least solving the problem of inaccurate extraction of target data in the related technology.
According to an aspect of the present invention, there is provided a data extraction method including: determining extracted target data according to the data message; matching the content in the message data according to a predetermined regular expression; and under the condition that at least two target data exist in the message data, extracting the at least two target data.
Further, matching the content in the message data according to a predetermined regular expression includes: when the message data has character string features, matching the content in the message data according to a predetermined character regular expression.
Further, matching the content in the message data according to a predetermined regular expression includes: when the message data does not have character features, parsing the message data by a predetermined function parsing method and decoding it to obtain the target data.
Further, extracting the at least two target data comprises: when the at least two target data are extracted from different message data, extracting the target data according to a pre-configured number of extractions, which counts successful extractions, and/or a pre-configured number of extraction attempts, which counts failed extractions.
Further, extracting the at least two target data comprises: when one message data has two extraction targets, extracting the two target data by matching the content in the message data multiple times; and/or, when different message data have two extraction targets, extracting the two target data according to the pre-configured number of extractions, which counts successful extractions, and/or the pre-configured number of extraction attempts, which counts failed extractions.
Further, before the target data is extracted according to the pre-configured number of extractions and/or number of extraction attempts, the method further comprises: configuring a dynamic setting interface, wherein the dynamic setting interface is used for receiving the different numbers of extractions and numbers of extraction attempts set for different extraction types.
According to another aspect of the present invention, there is provided a data extracting apparatus including: the determining module is used for determining the extracted target data according to the data message; the matching module is used for matching the content in the message data according to a preset regular expression; and the extraction module is used for extracting at least two target data under the condition that the message data contains the at least two target data.
Further, the matching module comprises: a matching unit, used for matching the content in the message data according to a predetermined character regular expression when the message data has character string features.
Further, the matching module comprises: a parsing unit, used for parsing the message data by a predetermined function parsing method and decoding it to obtain the target data when the message data does not have character features.
Further, the extraction module comprises: an extraction unit, used for extracting the target data according to a pre-configured number of extractions, which counts successful extractions, and/or a pre-configured number of extraction attempts, which counts failed extractions, when the at least two target data are extracted from different message data.
Further, the extraction module comprises: a second extraction unit, used for extracting two target data by matching the content in the message data multiple times when one message data has two extraction targets; and/or a third extraction unit, used for extracting the two target data according to the pre-configured number of extractions, which counts successful extractions, and/or the pre-configured number of extraction attempts, which counts failed extractions, when different message data have two extraction targets.
Further, the apparatus further comprises: a configuration unit, used for configuring a dynamic setting interface, wherein the dynamic setting interface is used for receiving the different numbers of extractions and numbers of extraction attempts set for different extraction types.
According to the invention, extracted target data is determined according to the data message; matching the content in the message data according to a predetermined regular expression; under the condition that at least two target data exist in the message data, the at least two target data are extracted, the problem that the target data are not accurately extracted in the related technology is solved, and the effect of accurately extracting the target data can be achieved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram of a data extraction method according to an embodiment of the invention;
FIG. 2 is a block diagram of a data extraction device according to an embodiment of the present invention;
FIG. 3 is a first block diagram of a data extraction device according to a preferred embodiment of the present invention;
FIG. 4 is a second block diagram of a data extraction device according to a preferred embodiment of the present invention;
FIG. 5 is a third block diagram of a data extraction device according to a preferred embodiment of the present invention;
FIG. 6 is a fourth block diagram of a data extraction device according to a preferred embodiment of the present invention;
FIG. 7 is a first flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 8 is a second flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 9 is a third flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 10 is a fourth flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 11 is a fifth flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 12 is a sixth flowchart of a data extraction method according to a preferred embodiment of the present invention;
FIG. 13 is a seventh flowchart of a data extraction method according to a preferred embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
In the present embodiment, a data extraction method is provided, and fig. 1 is a flowchart of a data extraction method according to an embodiment of the present invention, where as shown in fig. 1, the flowchart includes the following steps:
step S102, determining extracted target data according to the data message;
step S104, matching the content in the message data according to a predetermined regular expression;
step S106, under the condition that at least two target data exist in the message data, extracting the at least two target data.
Through the steps, the extracted target data is determined according to the data message, the content in the message data is matched according to the preset regular expression, and the at least two target data are extracted under the condition that at least two target data exist in the message data, so that the problem of inaccurate extraction of the target data in the related technology is solved, and the effect of accurately extracting the target data can be achieved.
In this embodiment, matching the content in the message data according to the predetermined regular expression may include: when the message data has character string features, matching the content in the message data according to a predetermined character regular expression; and/or, when the message data does not have character features, parsing the message data by a predetermined function parsing method and decoding it to obtain the target data.
In an optional embodiment, extracting the at least two target data may include: when the at least two target data are extracted from different message data, extracting the target data according to a pre-configured number of extractions, which counts successful extractions, and/or a pre-configured number of extraction attempts, which counts failed extractions.
Further, extracting the at least two target data includes: when one message data has two extraction targets, extracting the two target data by matching the content in the message data multiple times; and/or, when different message data have two extraction targets, extracting the two target data according to the pre-configured number of extractions, which counts successful extractions, and/or the pre-configured number of extraction attempts, which counts failed extractions.
As a preferred embodiment, before the target data is extracted according to the pre-configured number of extractions and/or number of extraction attempts, a dynamic setting interface is configured, wherein the dynamic setting interface is used for receiving the different numbers of extractions and numbers of extraction attempts set for different extraction types.
The embodiment of the present invention further provides a data extraction device, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a block diagram of a data extraction apparatus according to an embodiment of the present invention, as shown in fig. 2, including: a determination module 22, a matching module 24, and an extraction module 26, each of which is briefly described below.
A determining module 22, configured to determine extracted target data according to the data packet;
a matching module 24, configured to match the content in the message data according to a predetermined regular expression;
the extracting module 26 is configured to extract at least two target data when at least two target data exist in the message data.
Fig. 3 is a block diagram of a data extraction device according to a preferred embodiment of the present invention, and as shown in fig. 3, the matching module 24 includes:
the matching unit 32 is configured to match the content in the message data according to a predetermined character regular expression under the condition that the message data has the character string feature.
Fig. 4 is a block diagram ii of the data extracting apparatus according to the preferred embodiment of the present invention, and as shown in fig. 4, the matching module 24 includes:
a parsing unit 42, configured to parse the message data by a predetermined function parsing method and decode it to obtain the target data when the message data does not have character features.
Fig. 5 is a block diagram three of a data extraction device according to a preferred embodiment of the present invention, and as shown in fig. 5, the extraction module 26 includes:
an extracting unit 52, configured to, when the at least two target data are extracted from different message data, extract the target data according to a pre-configured number of extractions, which counts successful extractions, and/or a pre-configured number of extraction attempts, which counts failed extractions.
Further, the extracting module 26 may further include: a second extraction unit, used for extracting two target data by matching the content in the message data multiple times when one message data has two extraction targets; and/or a third extraction unit, used for extracting the two target data according to the pre-configured number of extractions, which counts successful extractions, and/or the pre-configured number of extraction attempts, which counts failed extractions, when different message data have two extraction targets.
Fig. 6 is a block diagram four of a data extraction apparatus according to a preferred embodiment of the present invention, as shown in fig. 6, the apparatus further comprising:
a configuration unit 62 configured to configure a dynamic setting interface, wherein the dynamic setting interface is configured to receive different extraction times and attempted extraction times set for different extraction types.
Examples of the present invention are further described below in conjunction with the alternative embodiments.
To better promote network services, embodiments of the present invention provide a metadata extraction method in which the content of a message is analyzed to find the required target data. When the content of the message data has character string features, the content of the message is matched against a predefined regular expression, and the target data is extracted after a successful match. When multiple extraction targets carried in one message need to be extracted, a regular expression rule generally matches only one result, so the invention adopts a multiple-matching extension function: all targets can be extracted by configuring multiple-matching extended attributes. For example, if a message contains several occurrences of hello and only the basic extraction configuration is set, only the first occurrence in the message can be extracted. To ensure that all occurrences are extracted, a multiple-extraction extension configuration is added, and each subsequent match starts from the end position of the previous match (subsequent matching is performed only if the first match succeeds). Fig. 7 is a first flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 7, including the following steps:
step S702, determining by analysis that one message has a plurality of extraction targets;
step S704, writing a regular expression;
step S706, matching the message against the regular expression;
step S708, the match succeeds and the first target is extracted;
step S710, configuring multiple matching extended attributes;
in step S712, the matching is continued from the end position of the previous matching until the extraction is finished.
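The loop of steps S706 to S712 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function name `extract_all` and the `multi_match` flag are assumptions standing in for the "multiple matching extended attribute". Note that Python's `match.end()` is an exclusive offset, so resuming the search there corresponds to continuing right after the previous match.

```python
import re
from typing import List


def extract_all(payload: str, pattern: str, multi_match: bool = True) -> List[str]:
    """Return the first match, or every match when multi_match is enabled.

    multi_match is a hypothetical stand-in for the patent's
    "multiple matching extended attribute".
    """
    regex = re.compile(pattern)
    results: List[str] = []
    pos = 0
    while pos <= len(payload):
        match = regex.search(payload, pos)
        if match is None:
            break  # no further target in this message
        results.append(match.group(0))
        if not multi_match:
            break  # basic configuration: only the first occurrence
        # Continue from where the previous match ended; step one character
        # past an empty match so the loop always makes progress.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return results
```

With the basic configuration only the first hello is returned; with the extended attribute enabled, every occurrence in the message is returned.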
When multiple extraction targets carried in different message data need to be extracted, the invention provides a configurable number of extractions and a configurable number of extraction attempts. The user can specify any number of extractions; the count is incremented by 1 each time an extraction is performed, and no further extraction is performed once the configured number is reached. In some cases an extraction rule is configured but the information to be extracted cannot be extracted from later messages, for example because a message is encrypted or the next target appears only much later; a number of extraction attempts can then be specified to avoid the performance loss. The extraction-attempt count accumulates as follows: it is incremented by 1 each time an extraction fails, and reset to zero when an extraction succeeds.
Different extraction types have different configuration requirements for the number of extractions and the number of extraction attempts, so metadata extraction provides a dynamic setting interface for receiving parameters modified by the user. The user can set different numbers of extractions and numbers of extraction attempts for different extraction types, and modify them dynamically in real time. Fig. 8 is a second flowchart of the data extraction method according to the preferred embodiment of the present invention, as shown in fig. 8, including the following steps:
step S802, the number of times of extraction and the number of times of attempted extraction of a certain extraction type adopt default values;
step S804, the user (product) calls the parameter configuration interface to dynamically modify;
step S806, metadata extraction is performed according to the new parameters.
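One possible shape for such a dynamic setting interface is sketched below in Python. The names `ExtractionLimits` and `LimitRegistry`, and the default values 1 and 3, are assumptions made for illustration; the patent does not specify an API or default values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExtractionLimits:
    # Default values are hypothetical; the patent only says defaults exist.
    max_extractions: int = 1
    max_attempts: int = 3


class LimitRegistry:
    """Per-extraction-type limits that can be changed while the system runs."""

    def __init__(self) -> None:
        self._limits: Dict[str, ExtractionLimits] = {}

    def get(self, extraction_type: str) -> ExtractionLimits:
        # Step S802: an unconfigured type falls back to the defaults.
        return self._limits.setdefault(extraction_type, ExtractionLimits())

    def set(self, extraction_type: str,
            max_extractions: Optional[int] = None,
            max_attempts: Optional[int] = None) -> None:
        # Step S804: the caller modifies the parameters dynamically.
        limits = self.get(extraction_type)
        if max_extractions is not None:
            if max_extractions < 1:
                raise ValueError("max_extractions must be at least 1")
            limits.max_extractions = max_extractions
        if max_attempts is not None:
            if max_attempts < 1:
                raise ValueError("max_attempts must be at least 1")
            limits.max_attempts = max_attempts
```

Subsequent extraction (step S806) would read the limits through `get`, so a change made through `set` takes effect for the next message without restarting anything.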
When no characteristic character string can be found in a message, the application layer data is parsed by a function parsing method and decoded directly to obtain the extraction target. Fig. 9 is a third flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 9, including the following steps:
step S902, analyzing the application layer data by a function;
in step S904, the extraction target is obtained by decoding.
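The patent does not say what the parsing function looks like, since it depends on the application protocol. As a purely hypothetical illustration, the Python sketch below assumes a binary payload made of type-length-value records (1-byte type, 2-byte big-endian length, then the value) and decodes the record of a wanted type; the layout and the name `parse_field` are assumptions, not part of the source.

```python
import struct
from typing import Optional


def parse_field(payload: bytes, wanted_type: int) -> Optional[bytes]:
    """Walk hypothetical TLV records and return the value of wanted_type."""
    offset = 0
    while offset + 3 <= len(payload):
        # "!BH": network byte order, 1-byte type followed by 2-byte length.
        field_type, length = struct.unpack_from("!BH", payload, offset)
        offset += 3
        value = payload[offset:offset + length]
        if len(value) < length:
            return None  # truncated record: nothing reliable to decode
        if field_type == wanted_type:
            return value  # step S904: the extraction target is decoded
        offset += length
    return None
```

No regular expression is involved here: the target is located purely by decoding the structure of the application layer data.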
In some cases, extraction should be performed only when a message satisfies a certain feature indicating that it is a specific kind of message data (an entry rule is defined); in other cases, a weak regular-expression feature degrades performance or causes much unneeded content to be extracted (an exclusion rule is defined). In such cases, expression-assisted information extraction may be employed. Variables are defined, and data in the message content is assigned to the variables for expression evaluation. An expression has a form similar to (a + 6) > && (c || e >> 2 < 8)); logical expressions, mathematical expressions, and combinations of the two are supported. The extraction action is performed only if the expression is true. Fig. 10 is a fourth flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 10, including the following steps:
step S1002, defining variables;
step S1004, extracting data in the message and assigning values to variables;
step S1006, evaluating the expression using the variables;
step S1008, determining whether the expression holds; if the determination result is yes, executing step S1010, and if the determination result is no, executing step S1012;
step S1010, extracting;
in step S1012, no extraction is performed, and the process returns.
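Steps S1002 to S1012 can be sketched in Python as below. This is a minimal sketch under assumptions: variables are described as (offset, size) pairs into the payload, values are read as big-endian unsigned integers, and the expression is passed as an ordinary Python callable instead of being parsed from a configured string. The names `read_variables` and `guarded_extract` are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Variable name -> (byte offset in the payload, size in bytes); assumed layout.
VariableSpec = Dict[str, Tuple[int, int]]


def read_variables(payload: bytes, specs: VariableSpec) -> Optional[Dict[str, int]]:
    """Steps S1002/S1004: assign message data to the defined variables."""
    values: Dict[str, int] = {}
    for name, (offset, size) in specs.items():
        chunk = payload[offset:offset + size]
        if len(chunk) != size:
            return None  # message too short to fill every variable
        values[name] = int.from_bytes(chunk, "big")
    return values


def guarded_extract(payload: bytes,
                    specs: VariableSpec,
                    condition: Callable[[Dict[str, int]], bool],
                    extract: Callable[[bytes], bytes]) -> Optional[bytes]:
    """Steps S1006 to S1012: extract only when the expression is true."""
    values = read_variables(payload, specs)
    if values is None or not condition(values):
        return None  # step S1012: no extraction
    return extract(payload)  # step S1010: extraction
```

For instance, `guarded_extract(msg, {"a": (0, 1), "c": (1, 1)}, lambda v: v["a"] + 6 > 10 and v["c"] != 0, lambda p: p[2:])` extracts the bytes after the second byte only when both conditions hold.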
The following embodiments describe, in turn: extraction with multiple-matching extended attributes, for the case where one message has multiple extraction targets; the use of the number of extractions and the number of extraction attempts, for the case where multiple messages have multiple extraction targets; and expression-assisted metadata extraction, using QQ login and exit events as an example. The metadata extraction mechanism and method are not limited to these cases.
Functional description of extraction with multiple-matching extended attributes. Using the invention, def needs to be extracted from the message payload content abcdefghijkdeflmn. The message contains two extraction targets def, so multiple-matching extended attributes are configured for extraction. The regular expressions R1 = abc and R2 = ghi are configured. After R1 is matched, the start position of the extraction target is the end position of the match plus 1, or equivalently the start position of the match plus 3; after R2 is matched, the end position of the extraction target is the start position of the match minus 1, or equivalently the end position of the match minus 3. The second matching then continues from the end position i of the first matching, with the regular expressions R3 = jk and R4 = lmn configured. After R3 is matched, the start position of the extraction target is the end position of the match plus 1, or equivalently the start position of the match plus 2; after R4 is matched, the end position of the extraction target is the start position of the match minus 1, or equivalently the end position of the match minus 3. With the multiple-matching extended attributes, two results are extracted and the extraction ends. Fig. 11 is a fifth flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 11, including the following steps:
step S1102, def is to be extracted from abcdefghijkdeflmn; the message contains two extraction targets def, and multiple-matching extended attributes are configured for extraction;
step S1104, configuring the regular expressions R1 = abc and R2 = ghi; the message is matched successfully;
step S1106, calculating the starting position: the end position of R1 plus 1 or the start position of R1 plus 3;
step S1108, calculating an end position: the starting position of R2 minus 1 or the ending position of R2 minus 3;
step S1110, continuing with the second matching, starting from the end position i of the first matching, and configuring the regular expressions R3 = jk and R4 = lmn; the message is matched successfully;
step S1112, calculating a start position: the end position of R3 plus 1 or the start position of R3 plus 2;
step S1114, calculate the end position: the starting position of R4 minus 1 or the ending position of R4 minus 3;
in step S1116, two results are extracted by matching the extended attributes for a plurality of times, and the extraction is completed.
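The position arithmetic of steps S1104 to S1116 can be checked with a short Python sketch. The function name `extract_between` is an assumption for illustration. The patent's start and end positions are treated here as inclusive, zero-based character positions, whereas Python's `match.end()` is exclusive, so the patent's "end position" of a match is `match.end() - 1`.

```python
import re
from typing import List, Tuple


def extract_between(payload: str, rule_pairs: List[Tuple[str, str]]) -> List[str]:
    """For each (head, tail) rule pair, extract the text lying between them."""
    results: List[str] = []
    search_from = 0
    for head, tail in rule_pairs:
        head_match = re.compile(head).search(payload, search_from)
        if head_match is None:
            break
        tail_match = re.compile(tail).search(payload, head_match.end())
        if tail_match is None:
            break
        # Target start: inclusive end position of the head match, plus 1.
        start = (head_match.end() - 1) + 1
        # Target end: start position of the tail match, minus 1.
        end = tail_match.start() - 1
        results.append(payload[start:end + 1])
        # The next round continues from the end position of this round's
        # last match (the character 'i' in the patent's example).
        search_from = tail_match.end() - 1
    return results


print(extract_between("abcdefghijkdeflmn", [("abc", "ghi"), ("jk", "lmn")]))
# prints ['def', 'def']
```

In the first round the target spans positions 3 to 5, and in the second round positions 11 to 13, which matches the calculations in steps S1106 to S1114.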
Functional description of the configured number of extractions and number of extraction attempts. The user configures the number of extractions and the number of extraction attempts for the target extraction type, and the parameters are written into the metadata extraction module in real time. When the user requests server data for an Internet service, the data messages enter the extraction module and are matched against the regular expression. If the match succeeds, the extraction count is incremented by 1, the extraction-attempt count is reset to zero, and the start position and end position are calculated; it is then determined whether the extraction count has reached the configured value, and if not, matching continues in the module, otherwise the extraction process ends. If the match fails, the extraction-attempt count is incremented by 1, and it is determined whether the extraction-attempt count has reached the configured value; if not, matching continues in the module, otherwise the extraction process ends. Fig. 12 is a sixth flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 12, including the following steps:
step S1202, the product configures the extraction times and the attempted extraction times of the target extraction type;
step S1204, the number of times of extraction and the number of times of attempted extraction are written into the extraction module in real time;
step S1206, the user requests server data for the Internet service;
step S1208, performing regular expression matching on the data message;
step S1210, the match succeeds; the extraction count is incremented by 1 and the extraction-attempt count is reset to zero;
step S1212, calculating a start position and an end position;
step S1214, determining whether the configured number of extractions has been reached; if not, returning to step S1208; if so, executing step S1220 to finish the extraction;
step S1216, the match fails; the extraction-attempt count is incremented by 1;
step S1218, determining whether the number of attempted extractions is reached, if not, continuing to step S1208, and if so, executing step S1220 to finish the extraction;
in step S1220, the extraction is ended.
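The counter logic of steps S1208 to S1220 can be sketched in Python as follows. This is a simplified sketch: the name `extract_from_stream` and its parameters are assumptions, each message is matched once, and the matched text itself is returned instead of computed start and end positions.

```python
import re
from typing import Iterable, List


def extract_from_stream(messages: Iterable[str], pattern: str,
                        max_extractions: int, max_attempts: int) -> List[str]:
    """Extract across messages until either configured limit is reached."""
    regex = re.compile(pattern)
    extracted: List[str] = []
    extraction_count = 0  # counts successful extractions
    attempt_count = 0     # counts failures since the last success
    for message in messages:
        match = regex.search(message)  # step S1208
        if match:
            extraction_count += 1      # step S1210
            attempt_count = 0
            extracted.append(match.group(0))
            if extraction_count >= max_extractions:  # step S1214
                break
        else:
            attempt_count += 1         # step S1216
            if attempt_count >= max_attempts:        # step S1218
                break
    return extracted
```

Because the attempt counter is reset on every success, it bounds only the run of failures since the last successful extraction, which is what stops wasted matching on, for example, a long run of encrypted messages.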
Functional description of expression-assisted extraction of QQ login and exit events. Analysis of the QQ login message shows that the one-byte Flag field at the start of the message payload is 0x02, the two-byte Command field at payload start +3, which indicates login, is 0x62, and the one-byte Data field at payload start +11 is 0x02. Analysis of the QQ exit message shows that the Flag field is 0x02, the Command field, which indicates exit, is 0x01, and the Data field is 0x02. The two-byte Command field at payload start +3 needs to be extracted, but a rule based on this field alone is too short and simple. An expression is therefore used to check that Flag equals 2, Data equals 2, and the QQ command word Command is 0x62 or 0x01; that is, the expression holds, and extraction is performed, only when the message is determined to be a login message or an exit message. The two-byte value at payload start +3 is then extracted, and the extraction ends. Fig. 13 is a seventh flowchart of a data extraction method according to a preferred embodiment of the present invention, as shown in fig. 13, including the following steps:
step S1302, defining three variables, Flag, Command and Data;
step S1304, assigning Flag the 1-byte value at the start of the message payload; assigning Command the 2-byte value at payload start +3; assigning Data the 1-byte value at payload start +11;
step S1306, calculating whether the expression ((Flag == 2) && (Data == 2) && ((Command == 0x62) || (Command == 0x01))) is true;
step S1308, extracting the numerical values of two bytes at the initial +3 position of the message load;
in step S1310, the extraction is ended.
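The check in steps S1302 to S1310 can be sketched in Python as below. The field offsets and values come from the description above; the function name `extract_qq_command` and the assumption that the two-byte Command field is big-endian (network byte order) are not stated in the source.

```python
import struct
from typing import Optional

LOGIN_COMMAND = 0x62
EXIT_COMMAND = 0x01


def extract_qq_command(payload: bytes) -> Optional[int]:
    """Return the Command value for login/exit messages, else None."""
    if len(payload) < 12:
        return None  # too short to contain the Data field at offset 11
    flag = payload[0]                                  # 1 byte at payload start
    (command,) = struct.unpack_from("!H", payload, 3)  # 2 bytes at start +3 (byte order assumed)
    data = payload[11]                                 # 1 byte at start +11
    # Step S1306: (Flag == 2) && (Data == 2) && (Command == 0x62 || Command == 0x01)
    if flag == 2 and data == 2 and command in (LOGIN_COMMAND, EXIT_COMMAND):
        return command  # step S1308: extract the two-byte value
    return None
```

Checking Flag and Data alongside Command is what keeps the rule from firing on unrelated traffic that merely happens to contain 0x62 or 0x01 at offset 3.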
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A data extraction method, comprising:
determining the target data to be extracted according to the data message;
matching the content in the message data according to a predetermined regular expression; and
in the case that at least two target data exist in the message data, extracting the at least two target data, which comprises:
in the case that the at least two target data are extracted from different message data, extracting the target data by means of a pre-configured number of extractions for recording extraction success and/or a pre-configured number of attempted extractions for recording extraction failure.
2. The method of claim 1, wherein matching the content in the message data according to a predetermined regular expression comprises:
in the case that the message data has character string characteristics, matching the content in the message data according to a preset character regular expression.
3. The method of claim 2, wherein matching the content in the message data according to a predetermined regular expression comprises:
in the case that the message data does not have character characteristics, analyzing the message data in a preset function analysis mode and decoding it to obtain the target data.
4. The method of claim 1, wherein extracting the at least two target data comprises:
in the case that one message data has two extraction targets, extracting the two target data after matching the content in the message data multiple times; and/or
in the case that different message data have two extraction targets, extracting the two target data by means of the pre-configured number of extractions for recording extraction success and/or the pre-configured number of attempted extractions for recording extraction failure.
5. The method according to claim 1 or 4, wherein before extracting the target data by means of the pre-configured number of extractions for recording extraction success and/or the pre-configured number of attempted extractions for recording extraction failure, the method further comprises:
configuring a dynamic setting interface, wherein the dynamic setting interface is used for receiving the different numbers of extractions and numbers of attempted extractions set for different extraction types.
6. A data extraction apparatus, comprising:
a determining module, used for determining the target data to be extracted according to the data message;
a matching module, used for matching the content in the message data according to a preset regular expression; and
an extraction module, used for extracting at least two target data in the case that the message data contains the at least two target data;
wherein the extraction module comprises:
a first extraction unit, used for extracting the target data by means of a pre-configured number of extractions for recording extraction success and/or a pre-configured number of attempted extractions for recording extraction failure.
7. The apparatus of claim 6, wherein the matching module comprises:
a matching unit, used for matching the content in the message data according to a preset character regular expression in the case that the message data has character string characteristics.
8. The apparatus of claim 7, wherein the matching module comprises:
an analysis unit, used for analyzing the message data in a preset function analysis mode in the case that the message data does not have character characteristics, and decoding it to obtain the target data.
9. The apparatus of claim 6, wherein the extraction module comprises:
a second extraction unit, used for extracting two target data after matching the content in the message data multiple times in the case that one message data has two extraction targets; and/or
a third extraction unit, used for extracting the two target data by means of the pre-configured number of extractions for recording extraction success and/or the pre-configured number of attempted extractions for recording extraction failure in the case that different message data have two extraction targets.
10. The apparatus of claim 7 or 9, further comprising:
a configuration unit, used for configuring a dynamic setting interface, wherein the dynamic setting interface is used for receiving the different numbers of extractions and numbers of attempted extractions set for different extraction types.
CN201410638204.8A 2014-11-12 2014-11-12 Data extraction method and device Active CN105653531B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410638204.8A CN105653531B (en) 2014-11-12 2014-11-12 Data extraction method and device
PCT/CN2015/076587 WO2016074434A1 (en) 2014-11-12 2015-04-14 Data extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410638204.8A CN105653531B (en) 2014-11-12 2014-11-12 Data extraction method and device

Publications (2)

Publication Number Publication Date
CN105653531A CN105653531A (en) 2016-06-08
CN105653531B true CN105653531B (en) 2020-02-07

Family

ID=55953676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410638204.8A Active CN105653531B (en) 2014-11-12 2014-11-12 Data extraction method and device

Country Status (2)

Country Link
CN (1) CN105653531B (en)
WO (1) WO2016074434A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117440B (en) * 2017-06-23 2021-06-22 中移动信息技术有限公司 Metadata information acquisition method, system and computer readable storage medium
CN107766466A (en) * 2017-09-29 2018-03-06 上海望友信息科技有限公司 Recognition methods, system, computer-readable recording medium and the equipment of data type
CN109933712A (en) * 2019-03-06 2019-06-25 北京思特奇信息技术股份有限公司 A kind of extracting method and system of message data
CN111507615A (en) * 2020-04-15 2020-08-07 江苏鹏为软件有限公司 Evaluation system for smart city detection
CN112511643A (en) * 2020-12-07 2021-03-16 北京天融信网络安全技术有限公司 Message data extraction method and device
CN113965408B (en) * 2021-11-09 2023-01-20 北京锐安科技有限公司 Method, device, medium and equipment for extracting HTTP (hyper text transport protocol) message

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101068209A (en) * 2007-06-20 2007-11-07 中兴通讯股份有限公司 Deep message detection system and method
CN101101600A (en) * 2007-07-10 2008-01-09 北京大学 Metadata automatic extraction method based on multiple rule in network search
CN101438272A (en) * 2006-04-21 2009-05-20 微软公司 System for processing formatted data
CN101957816A (en) * 2009-07-13 2011-01-26 上海谐宇网络科技有限公司 Webpage metadata automatic extraction method and system based on multi-page comparison
CN102576362A (en) * 2009-09-30 2012-07-11 株式会社日立解决方案 Method for setting metadata, system for setting metadata, and program
CN102611565A (en) * 2011-10-18 2012-07-25 国网电力科学研究院 Regular-expression-based alarm correlation analysis method for monitoring system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043862B (en) * 2010-12-29 2012-10-17 重庆新媒农信科技有限公司 Directional web data extraction method
CN104133830A (en) * 2013-05-02 2014-11-05 乐视网信息技术(北京)股份有限公司 Data obtaining method

Also Published As

Publication number Publication date
WO2016074434A1 (en) 2016-05-19
CN105653531A (en) 2016-06-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant