CN106844553B - Data detection and expansion method and device based on sample data - Google Patents


Info

Publication number
CN106844553B
Authority
CN
China
Prior art keywords
data
matching
database
sample
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611264829.8A
Other languages
Chinese (zh)
Other versions
CN106844553A (en)
Inventor
汤奇峰
李炳辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zamplus Advertising Shanghai Co ltd
Original Assignee
Zamplus Advertising Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zamplus Advertising Shanghai Co ltd
Priority to CN201611264829.8A
Publication of CN106844553A
Application granted
Publication of CN106844553B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data detection and expansion method and device based on sample data, the method includes the following steps: determining the sample data based on at least one piece of data in a database, wherein the database stores a plurality of pieces of data acquired by detecting mass data; searching in the mass data based on the sample data to obtain matched data matched with the sample data in the mass data; processing the matching data to obtain a matching rule, and updating a fingerprint database, wherein the matching rule obtained historically is stored in the fingerprint database; and performing matching extraction in the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database. The technical scheme provided by the invention can more accurately and efficiently carry out global and systematic analysis and processing on the mass data.

Description

Data detection and expansion method and device based on sample data
Technical Field
The invention relates to the technical field of internet, in particular to a data detection and expansion method and device based on sample data.
Background
With the rapid development of internet technology, the number of internet websites and of internet users in China has grown rapidly. As the number of users grows and internet resources become richer, the access log data generated on the internet expands rapidly into mass data. How to detect, discover and expand the required data from this mass data has therefore become an important task for information processors.
At present, there are two main methods for discovering and expanding required data from mass data. The first is manual data checking: the Uniform Resource Locators (URLs) accessed by users on each website or application (APP, for example, application software installed on a mobile phone) are manually analyzed and summarized to obtain a series of matching rules, and these matching rules are then matched against the mass data resources on the internet to extract and expand the required data. The second is Application Programming Interface (API) querying: following the documentation of an API provider, the provider's interface is called as needed to obtain the required data.
Although these two methods can satisfy the user's desire to find and expand a specific type of data from a large amount of data to some extent, they have respective unavoidable drawbacks. For the manual data checking mode, a large amount of manpower is needed to manually perform related analysis and statistics in actual operation, and the detection and expansion efficiency is low; the API query mode depends on the document description provided by the API provider and has uncertainty.
On the other hand, the existing data discovery and expansion methods including the two methods finally obtain data on certain specific websites. However, due to the rapid expansion of the website scale in the internet and the fact that the construction modes of many websites and APPs for URLs do not establish uniform standards and rules, data acquired by the existing method is only a small part of mass data, which is not beneficial for users to perform global and systematic analysis and processing on the mass data, and affects the accuracy of data acquired by the users through detection and expansion.
Disclosure of Invention
The invention solves the technical problem that the prior art can not carry out global and systematic analysis and processing on mass data in a more accurate and efficient mode.
To solve the above technical problem, an embodiment of the present invention provides a data detection and expansion method based on sample data, including the following steps: determining the sample data based on at least one piece of data in a database, wherein the database stores a plurality of pieces of data acquired by detecting mass data; searching in the mass data based on the sample data to obtain matched data matched with the sample data in the mass data; processing the matching data to obtain a matching rule, and updating a fingerprint database, wherein the matching rule obtained historically is stored in the fingerprint database; and performing matching extraction in the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database.
Optionally, determining the sample data based on at least one piece of data in the database includes the following step: selecting a preset amount of data from the database, and taking the characteristic information of the preset amount of data as the sample data.
Optionally, the feature information includes: the feature identification codes of the preset amount of data; or a regular expression determined according to the preset amount of data.
Optionally, searching in the mass data based on the sample data to obtain matching data in the mass data that matches the sample data includes the following step: searching the mass data for data having the same characteristic information as the sample data, and taking the data with the same characteristic information as the matching data.
Optionally, when searching in the mass data based on the sample data, if a preset limiting condition exists, searching in a part of data defined by the preset limiting condition in the mass data to obtain the matching data.
Optionally, processing the matching data to obtain the matching rule and updating the fingerprint database includes the following steps: structuring the matching data to obtain standard data arranged according to a preset format; generating the matching rule based on the standard data and deduplicating it; and updating the fingerprint database based on the deduplicated matching rule.
Optionally, generating the matching rule based on the standard data and deduplicating it includes the following steps: converting the standard data into the matching rule according to the preset format; and removing repeated items from the converted matching rules to obtain the deduplicated matching rules.
Optionally, updating the fingerprint database based on the deduplicated matching rule includes the following steps: comparing the deduplicated matching rules with the matching rules in the fingerprint database to remove repeated items a second time; and updating the fingerprint database with the matching rules that remain after the second removal.
Optionally, the data is an internet access record.
An embodiment of the present invention further provides a data detection and expansion device based on sample data, including: the determining module is used for determining the sample data based on at least one piece of data in a database, and the database stores a plurality of pieces of data acquired by detecting mass data; the searching module is used for searching in the mass data based on the sample data so as to obtain matched data matched with the sample data in the mass data; the updating module is used for processing the matching data to obtain a matching rule and updating a fingerprint database, and the matching rule obtained historically is stored in the fingerprint database; and the extraction module is used for performing matching extraction on the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database.
Optionally, the determining module includes: a selection submodule, configured to select a preset amount of data from the database and take the characteristic information of the preset amount of data as the sample data.
Optionally, the feature information includes: the feature identification codes of the preset amount of data; or a regular expression determined according to the preset amount of data.
Optionally, the searching module includes: a first searching submodule, configured to search the mass data for data having the same characteristic information as the sample data and take the data with the same characteristic information as the matching data.
Optionally, the searching module further includes a second searching submodule, configured to, when searching in the mass data based on the sample data, search in the part of the mass data defined by a preset limiting condition if such a condition exists, so as to obtain the matching data.
Optionally, the update module includes: the processing submodule is used for carrying out structural processing on the matched data so as to obtain standard data arranged according to a preset format; the generation submodule is used for generating the matching rule based on the standard data and removing duplication; and the updating submodule is used for updating the fingerprint database based on the matching rule after the duplication is removed.
Optionally, the generating sub-module includes: a conversion unit, configured to convert the standard data into the matching rule according to the preset format; and a deduplication unit, configured to remove repeated items from the converted matching rules to obtain the deduplicated matching rules.
Optionally, the update sub-module includes: the comparison unit is used for comparing the matching rule after the duplication removal with the matching rule in the fingerprint database so as to remove repeated items for the second time; and the updating unit is used for updating the matching rule after the repeated items are removed twice to the fingerprint database.
Optionally, the data is an internet access record.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
First, sample data is determined according to at least one piece of data in a database, and the mass data is searched based on the sample data to detect matching data that matches the sample data. The matching data is then processed to obtain a matching rule with which the fingerprint database is updated. Finally, matching extraction is performed on the mass data based on the updated fingerprint database to obtain the data in the mass data that matches the matching rules in the updated fingerprint database, and the database is expanded with the data obtained by matching, thereby realizing data detection and expansion based on the sample data. Compared with existing data discovery and expansion schemes that rely mainly on manual checking or API queries, the technical scheme of the embodiment of the invention generates the matching rule based on the sample data, performs matching extraction on the original data source (namely the mass data) according to the matching rule to expand the database, then determines sample data from the expanded database and repeats these steps, finally forming a closed-loop flow. With the technical scheme provided by the invention, global and systematic analysis and processing of mass data can be carried out more accurately and efficiently.
Further, a preset amount of data is selected from the database, the characteristic information of the preset amount of data is used as the sample data, the sample data is used as a template to be detected in the mass data, so that the data matched with the sample data is obtained to expand the database, the data stored in the database are ensured to be the data with the same characteristic information, and the use requirement of a user for finding and collecting specific types of data from the mass data is met.
Drawings
FIG. 1 is a flow chart of a data detection and expansion method based on sample data according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a data detection and expansion method based on sample data according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a data detection and expansion method based on sample data according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a character matching tree constructed by the data detection and expansion method based on sample data according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data detection and expansion apparatus based on sample data according to a fourth embodiment of the present invention.
Detailed Description
As mentioned in the background, the existing methods for discovering and expanding data required by users from mass data are still limited to two ways of manual retrieval or API query. However, the former method needs a lot of manpower to manually analyze and count the data; the latter is not adaptable to global analysis and processing of data.
To solve this technical problem, the technical scheme of the invention determines sample data according to at least one piece of data in a database and searches the mass data based on the sample data to detect matching data that matches the sample data. The matching data is then processed to obtain a matching rule with which a fingerprint database is updated. Finally, matching extraction is performed on the mass data based on the updated fingerprint database to obtain the data in the mass data that matches the matching rules in the updated fingerprint database, and the database is expanded with the data obtained by matching, thereby realizing data detection and expansion based on the sample data.
Those skilled in the art understand that with the growth in the number of internet users, the proliferation of internet sites and the rapid increase in internet bandwidth, more and more users generate more and more internet user behavior (i.e., internet access records) on more and more sites. These behaviors are recorded in log form by various data collectors and stored as data (namely the mass data). The technical scheme of the embodiment of the invention generates the matching rule based on the sample data, performs matching extraction on the original data source (namely the mass data) according to the matching rule to expand the database, then determines sample data from the expanded database and repeats these steps, finally forming a closed-loop flow. With the technical scheme provided by the invention, global and systematic analysis and processing of mass data can be carried out more accurately and efficiently.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart of a data detection and expansion method based on sample data according to a first embodiment of the present invention. Wherein the data may be an internet access record.
Specifically, in this embodiment, step S101 is first executed to determine the sample data based on at least one piece of data in a database, where a plurality of pieces of data obtained by probing mass data are stored in the database. More specifically, the mass data may be historical data obtained from the internet, such as the historical internet access records of all users, or the internet access records of selected users during selected periods. In a preferred embodiment, the amount of sample data may be set according to the data processing capability of the hardware or software implementing the embodiment of the present invention; for example, the amount of sample data may be between 10,000 and 100,000. Preferably, the data may be represented as a Uniform Resource Locator (URL), or in the form of one or more of a referrer URL (referer URL), a user agent, a cookie, and the like. Those skilled in the art may also adopt further variations according to actual needs, which are not described herein again.
Next, step S102 is executed to search the mass data based on the sample data to obtain matching data in the mass data that matches the sample data. Specifically, matching may mean that the matching data and the sample data follow the same rule. Preferably, this step may be performed simultaneously or sequentially on at least one device cluster, where a device cluster may be formed by coupling one or more computers. In a preferred embodiment, the mass data may be distributed to the computers of a plurality of clusters for processing, and the matching data found by the computers in each cluster is then aggregated; for example, the distribution and aggregation of the mass data may be implemented as a MapReduce task based on the Hadoop distributed system infrastructure (Hadoop Distributed File System).
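The distribute-then-aggregate pattern of step S102 can be illustrated without a Hadoop cluster. The following is a minimal single-process sketch, not the patented implementation: it assumes each record is a URL string, that the sample data is a set of feature identification codes, and that "matching" means simple substring containment. The function names (`split_into_partitions`, `map_partition`, `reduce_matches`, `distributed_search`) are hypothetical.

```python
from itertools import chain

def split_into_partitions(records, n_partitions):
    """Distribute records round-robin into n_partitions lists (stand-ins for cluster nodes)."""
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        partitions[i % n_partitions].append(record)
    return partitions

def map_partition(partition, sample_ids):
    """'Map' phase: each node keeps the records that contain any sample feature ID."""
    return [r for r in partition if any(s in r for s in sample_ids)]

def reduce_matches(per_partition_matches):
    """'Reduce' phase: aggregate per-node results into one deduplicated, sorted list."""
    return sorted(set(chain.from_iterable(per_partition_matches)))

def distributed_search(records, sample_ids, n_partitions=3):
    """Run the map phase on every partition, then aggregate the matches."""
    partitions = split_into_partitions(records, n_partitions)
    return reduce_matches(map_partition(p, sample_ids) for p in partitions)
```

In a real deployment the map phase would run on separate machines and the framework would handle shuffling; the sketch only shows how the per-node matching and the final aggregation relate to each other.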
Step S103 is executed next, the matching data is processed to obtain a matching rule, and a fingerprint library is updated, and the matching rule obtained in history is stored in the fingerprint library. Specifically, the matching rule is used to describe a rule that the sample data and the matching data have in common. More specifically, the fingerprint database is used for storing the matching rules extracted from the matching data after the technical scheme of the embodiment of the invention is executed historically. Those skilled in the art understand that the subsequent iteration operation can be better promoted by continuously enriching the fingerprint database, so that the technical scheme of the embodiment of the invention can obtain more data in mass data based on the updated fingerprint database in a matching manner.
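Step S103 is described in the summary as structuring the matching data, converting it into matching rules, deduplicating once within the new rules and a second time against the fingerprint library. A minimal sketch of that flow follows. The rule shape (host, path, sorted query keys) is an assumption made for illustration only, since the text does not fix the preset format, and the names `structure`, `to_rule` and `update_fingerprint_library` are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def structure(url):
    """Arrange one matched URL into a fixed (assumed) format: host, path, query keys."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc.lower(),
        "path": parts.path,
        "query_keys": tuple(sorted(parse_qs(parts.query, keep_blank_values=True))),
    }

def to_rule(standard):
    """Convert a standard record into a hashable rule tuple (illustrative rule shape)."""
    return (standard["host"], standard["path"], standard["query_keys"])

def update_fingerprint_library(matched_urls, fingerprint_library):
    """Generate rules, deduplicate twice, and add only new rules to the library (a set)."""
    rules = [to_rule(structure(u)) for u in matched_urls]
    # First pass: remove duplicates within this batch, keeping first-seen order.
    deduped = list(dict.fromkeys(rules))
    # Second pass: remove rules already stored historically in the library.
    new_rules = [r for r in deduped if r not in fingerprint_library]
    fingerprint_library.update(new_rules)
    return new_rules
```

Keeping the library as a set makes the second deduplication pass a constant-time membership test per rule, which matters when the library grows over many iterations.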
And finally, executing step S104, performing matching extraction in the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database. In a preferred embodiment, the mass data is processed item by item based on the updated fingerprint database, and the matching result item by item is sorted and recorded, so as to update the data obtained by matching to the database, thereby realizing effective expansion of the volume of the database.
In a variation of this embodiment, after the step S104 is executed, the step S101 may be executed again based on the expanded database, so as to generate more sample data based on the expanded database, further detect and obtain more matching data in the mass data, and finally further expand the database.
Thus, the scheme of the first embodiment is adopted, the matching rule is generated based on the sample data, the matching extraction is performed on the original data source (namely mass data) according to the matching rule so as to expand the database, then the sample data is determined from the expanded database, and the steps are repeated. By the technical scheme of the embodiment of the invention, a closed-loop iterative processing mechanism can be formed, which is beneficial to more accurately and efficiently analyzing and processing the mass data globally and systematically by a user.
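The closed-loop mechanism of steps S101 to S104 can be sketched on toy data. This is a simplified illustration under stated assumptions, not the method as claimed: records are URL strings without a scheme, the database stores feature identification codes, a record matches a sample when its last path segment equals the sample, and a matching rule is simply the URL prefix that precedes the identification code. The names `derive_rule` and `run_closed_loop` are hypothetical.

```python
def derive_rule(url, feature_id):
    """Rule = URL prefix preceding the feature ID (an assumed, simplified rule shape)."""
    return url[: url.rindex(feature_id)]

def run_closed_loop(mass_data, database, max_rounds=10):
    """Repeat S101-S104 until the database stops growing; returns (database, fingerprints)."""
    fingerprints = set()
    for _ in range(max_rounds):
        # S101: determine the sample data from the current database.
        samples = set(database)
        # S102: search the mass data for records matching a sample.
        matched = [(u, s) for u in mass_data for s in samples if u.endswith("/" + s)]
        # S103: derive rules from the matches and update the fingerprint library.
        fingerprints |= {derive_rule(u, s) for u, s in matched}
        # S104: match all mass data against the library and extract the matched part.
        extracted = {u[len(r):] for u in mass_data for r in fingerprints if u.startswith(r)}
        if extracted <= database:
            break  # nothing new was found, so the loop has converged
        database |= extracted
    return database, fingerprints
```

Starting from a database holding only the code `1001` and the mass data `shop.example.com/item/1001`, `shop.example.com/item/1002`, `mall.example.org/goods/1002` and `mall.example.org/goods/1003`, the first round learns the `shop.example.com/item/` rule and discovers `1002`; the second round uses `1002` as a new sample, learns the `mall.example.org/goods/` rule and discovers `1003`. This is the kind of expansion that a single pass would miss.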
Fig. 2 is a flowchart of a data detection and expansion method based on sample data according to a second embodiment of the present invention. Specifically, in this embodiment, step S201 is first executed to select a preset number of data from the database, and use the feature information of the preset number of data as the sample data. More specifically, the preset number is determined by a user according to a data processing capability of hardware or software that performs the embodiment of the present invention. Preferably, the feature information may be a feature identification code of the preset amount of data. For example, when the data is URL information of a commodity, the feature identification code may be an identification code (ID) of the commodity, and the identification code may be extracted from the URL information corresponding to the commodity.
And then, step S202 is performed to search data having the same characteristic information as the sample data in the mass data, and use the data having the same characteristic information as the matching data. Preferably, for the mass data also represented by the URL, the URL of each mass data may be divided into three matching locations (host, path, and query) according to a structure, and the selected matching location is compared with the sample data in a manner of selecting one or two or all matching locations, so as to search for data having the same characteristic information as the sample data from the mass data. Preferably, for sample data using the feature identification code as the feature information, data having the same feature information as the sample data can be searched for in the massive data according to different matching rules.
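Splitting a URL into the three matching positions (host, path, query) and comparing only the selected positions can be sketched with Python's standard `urllib.parse` module. The helper names `matching_positions` and `has_feature` are hypothetical, and substring containment is used here only as a placeholder for whichever matching rule is chosen.

```python
from urllib.parse import urlsplit

def matching_positions(url):
    """Split a URL into the three matching positions named in the text.

    Note: urlsplit only recognizes the host when the URL has a scheme or starts with '//'.
    """
    parts = urlsplit(url)
    return {"host": parts.netloc, "path": parts.path, "query": parts.query}

def has_feature(url, feature_id, positions=("host", "path", "query")):
    """Return True if feature_id occurs in any of the selected matching positions."""
    located = matching_positions(url)
    return any(feature_id in located[p] for p in positions)
```

For instance, `has_feature("http://shop.example.com/item/44123?ref=home", "44123", ("path",))` checks only the path position, so an identification code that appears only in the host or query would not count as a match.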
In a preferred example, the host position in the URLs of the mass data may be matched, and left-side-contained matching may be used to search for mass data whose URL host position carries the same feature identification code as the sample data. Preferably, left-side-contained matching means that the left side of the character string at the position to be matched (i.e., the host position in the foregoing preferred example) completely matches the feature identification code of the sample data. For example, if the host position of the URL of a piece of data in the mass data includes the character string item_44123_abcde, the character string may be considered to match the sample data represented by the feature identification code item_44123, so that the piece of data and the sample data are determined to have the same feature information.
Step S203 is executed next, the matching data is processed to obtain matching rules, and a fingerprint library is updated, the fingerprint library storing the matching rules obtained in history. Specifically, a person skilled in the art may refer to step S103 in the embodiment shown in fig. 1, which is not described herein again. Preferably, the matching rule is used for filtering and extracting common features of a plurality of matching data.
Finally, step S204 is executed to perform matching extraction in the mass data based on the updated fingerprint database, so as to obtain the data in the mass data that matches the matching rules in the updated fingerprint database, and to expand the database with the data obtained by matching. Specifically, a person skilled in the art may refer to step S104 in the embodiment shown in fig. 1, which is not described herein again. In a preferred embodiment, all data in the mass data is matched item by item according to the matching positions, for example, in the order host, path, query. Specifically, it is first determined whether the host portion of the URL of the data matches the host portion of a matching rule in the fingerprint library. If the host portions do not match, the data is skipped and the next piece of data in the mass data is processed. If the host portions match, the path portion of the data is then compared with the path portion of the matching rule, and when the path portions also match, the query portion of the data is compared with the query portion of the matching rule, so as to finally determine whether the data matches the matching rule in the updated fingerprint library.
Further, if the data are determined to meet the matching condition of the matching rule, extracting a part matched with the matching rule from the data and updating the part to the database.
Further, the mass data are matched one by one based on the updated fingerprint database to determine the data matched with the matching rule in the updated fingerprint database in the mass data, and the data content of the matched part is extracted and arranged to the database, so that the volume of the database is greatly expanded.
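The host, path, query matching order and the extraction of the matched part can be sketched as a short cascade that exits as soon as one position fails. The rule representation below is an assumption for illustration: a dict with an exact `host` string, a `path` regular expression and an optional `query` regular expression, where the first capture group of the last pattern checked is the part to extract. The names `match_rule` and `extract_all` are hypothetical.

```python
import re
from urllib.parse import urlsplit

def match_rule(url, rule):
    """Return the extracted part if url satisfies rule, else None.

    rule is an assumed dict: {"host": str, "path": regex, "query": regex or None}.
    """
    parts = urlsplit(url)
    if parts.netloc != rule["host"]:          # host differs: skip this record early
        return None
    m = re.search(rule["path"], parts.path)   # host matched: check the path next
    if m is None:
        return None
    if rule.get("query"):                     # path matched: check the query last
        m = re.search(rule["query"], parts.query)
        if m is None:
            return None
    return m.group(1) if m.groups() else m.group(0)

def extract_all(mass_data, fingerprint_library):
    """Match records one by one against the library and collect the extracted parts."""
    found = []
    for url in mass_data:
        for rule in fingerprint_library:
            part = match_rule(url, rule)
            if part is not None:
                found.append(part)
                break  # one matching rule is enough for this record
    return found
```

Checking the host first is the cheapest test and rejects most records before any regular expression is evaluated, which is presumably why the text orders the positions this way.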
Further, in the implementation of the embodiment of the present invention, the dirty data that may be obtained during the detection and expansion may be screened in combination with manual and/or automatic computer identification to ensure the validity and accuracy of the data that is finally updated into the database.
In a variation of step S201, the feature information may also be a regular expression determined according to the preset amount of data. Those skilled in the art will appreciate that the regular expression may be used to match the characteristic information of all data randomly selected from the database, or the regular expression may be used to match the characteristic information of all data that a user wishes to detect and augment from the mass data.
For example, suppose it is desired to detect and expand the mass data using device feature identification codes as sample data, and the sample data randomly selected from the database includes device feature identification codes of telecommunication devices and device feature identification codes of mobile devices, where the former are represented based on the International Mobile Equipment Identity (IMEI) and the latter are represented based on the Mobile Equipment Identifier (MEID). The two kinds of device feature identification codes have in common that both are 11-digit numbers beginning with 1, so the regular expression can be determined from this common point.
Also for example, if all data randomly selected from the database is Media Access Control (MAC) addresses, the regular expression can be expressed as "/^([a-zA-Z0-9]{8}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{4}\-[a-zA-Z0-9]{12})$/".
For another example, if the user wishes to detect and expand the mass data to obtain data in a specific geographic area, the regular expression may also be used to represent the specific geographic area by defining longitude and latitude.
Further, searching the mass data for data having the same characteristic information as the sample data according to different matching rules includes directly performing regular-expression matching between the portion to be matched of the data (i.e., the selected matching position) and the regular expression of the sample data; if the portion to be matched satisfies the regular expression, the data and the sample data are determined to have the same characteristic information. For example, the regular expression of the sample data may be shop-(\d+). For a piece of data whose portion to be matched in the URL is shop-33415-23-test, it may be determined that the data and the sample data have the same characteristic information, because the portion to be matched conforms to the regular expression.
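The regular-expression variant can be checked directly with Python's `re` module. The sketch below assumes the sample pattern is `shop-(\d+)` and that a search anywhere in the portion to be matched is intended; the text does not say whether the whole portion must match. The helper name `regex_match` is hypothetical.

```python
import re

# Regular expression acting as the sample data's feature information.
SAMPLE_PATTERN = re.compile(r"shop-(\d+)")

def regex_match(portion, pattern=SAMPLE_PATTERN):
    """Return the captured identifier if the portion satisfies the pattern, else None."""
    m = pattern.search(portion)
    return m.group(1) if m else None
```

Calling `regex_match("shop-33415-23-test")` returns `"33415"`, while a portion such as `"store-33415"` returns `None` because the literal prefix `shop-` is absent.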
In a variation of the step S202, the matching rule further includes matching on the right side, and if a character string of a position to be matched of a certain data in the mass data is completely matched with the feature identification code of the sample data, it is determined that the data and the sample data have the same feature information. For example, if the path position of the URL of a certain data in the mass data includes the character string car _ shanghai _ ser33456 and the characteristic identifier of the sample data is ser3356, it may be determined that the mass data and the sample data have the same characteristic information.
In another variation of step S202, the matching rule further includes a matching rule of a complete equality, and if a character string of a position to be matched of a certain data in the mass data is completely equal to the feature identification code of the sample data, it is determined that the data and the sample data have the same feature information. For example, the character string "shop" 33415& category "23 & item" test "may be considered to be identical to the feature identification code 33415.
In another variation of step S202, the matching rule further includes matching, and if a character string of a to-be-matched position of a certain data in the mass data includes the feature identification code of the sample data, it is determined that the data and the sample data have the same feature information. For example, the string shop-33415-23-test may be considered to contain the feature identification code 33415.
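The four string-based matching rules described for step S202 (left-side, right-side, exact equality, and containment) map onto basic string operations. The following is a minimal sketch; the mode names and the function `matches` are hypothetical labels chosen for this illustration.

```python
def matches(candidate, feature_id, mode):
    """Compare the string at the position to be matched with a feature identification code.

    mode is one of "left", "right", "equal", "contains" (labels assumed for this sketch).
    """
    if mode == "left":
        return candidate.startswith(feature_id)   # left side equals the code
    if mode == "right":
        return candidate.endswith(feature_id)     # right side equals the code
    if mode == "equal":
        return candidate == feature_id            # whole string equals the code
    if mode == "contains":
        return feature_id in candidate            # code occurs anywhere in the string
    raise ValueError(f"unknown matching mode: {mode}")
```

The modes form a hierarchy: any string that matches under "equal" also matches under "left", "right" and "contains", so choosing a mode is a trade-off between recall (containment) and precision (equality).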
In a variation of step S204, when the feature information is a regular expression determined according to the preset amount of data and the currently scanned data in the mass data has the same feature information as the sample data, the regular expression may be extracted directly from the currently scanned data and updated to the fingerprint library.
In a variation of this embodiment, when searching the mass data based on the sample data, if a preset limiting condition exists, the step S202 searches only the part of the mass data defined by the preset limiting condition to obtain the matching data. Preferably, for data and sample data represented by URLs, the preset limiting condition may be a top-level domain name tld in the URL. For example, the user may choose to define the top-level domain name tld for some or all of the sample data selected and determined in the step S201; then, when the step S202 to the step S204 are executed for the sample data defined by the top-level domain name tld, the technical solution of the embodiment of the present invention preferably detects only the data on the website where the top-level domain name tld is located and expands that data to the database.
Further, the preset limiting condition may be set according to a user requirement or a data processing capability of a device that executes the technical solution of the embodiment of the present invention.
Further, the top-level domain names tld of the sample data may be the same or different. For example, among all the sample data selected and determined from the database, the top-level domain name tld of one half of the sample data and that of the other half may be set to different websites, so that data detection and retrieval are performed on two websites simultaneously based on the technical solution of the embodiment of the present invention.
In a typical application scenario, when a computer executes the technical solution of the embodiment of the present invention, first, the sample data is loaded into a local memory of the computer, and when part or all of the data in the sample data has the preset top-level domain name tld, a mapping table may be constructed in the local memory, where the mapping table is used to classify and store feature information or regular expressions of one or more sample data having the same top-level domain name tld in the sample data.
Preferably, for an application scenario in which the feature information is the feature identification code, a character matching tree may be constructed for one or more sample data with the same top-level domain name tld, so as to improve matching efficiency in subsequent data detection and expansion.
Preferably, for the application scenario in which the characteristic information is the regular expression, the respective regular expressions of one or more sample data with the same top-level domain name tld may also be stored as a list, so as to perform the subsequent detecting and expanding steps. As a variation, the regular expression may also be determined for multiple sample data having the same top-level domain name tld.
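The mapping table built in local memory can be sketched as a dictionary keyed by top-level domain name tld. This is a minimal sketch: the function name build_mapping_table and the second domain other.com are hypothetical and serve only to show the grouping.

```python
from collections import defaultdict


def build_mapping_table(samples):
    """Group feature information (codes or regexes) by top-level domain name tld.

    samples is an iterable of (tld, feature_information) pairs.
    """
    table = defaultdict(list)
    for tld, feature in samples:
        table[tld].append(feature)
    return dict(table)


table = build_mapping_table([
    ("host.com", "item_1234"),
    ("host.com", "item_1368"),
    ("other.com", r"shop-(\d+)"),   # hypothetical second website
])
print(table)
```

Each list could then be turned into a character matching tree (for feature identification codes) or kept as a list of regular expressions, as the text describes.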
Further, when the sample data and the mass data are both represented based on URLs, in step S202 the URLs of the sample data are preferably processed first to obtain the top-level domain name tld corresponding to the sample data. Then, when the mass data is scanned one by one, it is determined whether the URL of the currently scanned data includes the top-level domain name tld. If the URL of the currently scanned data does not include the top-level domain name tld, the data is skipped directly; otherwise, the step S202 continues, and the URL of the data is compared with the feature information of the sample data at the selected matching position, so as to search the mass data for data having the same feature information as the sample data.
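The pre-filtering by top-level domain name tld can be sketched as below. The helper names under_tld and candidates are assumptions, and treating "includes the top-level domain name tld" as "the host equals the tld or is a subdomain of it" is an interpretation made for this sketch, not a requirement stated in the embodiment.

```python
from urllib.parse import urlsplit


def under_tld(url: str, tld: str) -> bool:
    """Return True if the URL's host is the tld itself or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == tld or host.endswith("." + tld)


def candidates(urls, tld):
    """Keep only the scanned URLs that are worth matching further."""
    return [u for u in urls if under_tld(u, tld)]


scanned = [
    "http://c.host.com:1234/test?id=item_1234",
    "http://nothost.com/test?id=item_1234",   # hypothetical unrelated site
]
print(candidates(scanned, "host.com"))
```

Comparing on the parsed host rather than on the raw URL string avoids accepting URLs that merely mention the domain in their path or query.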
Therefore, by adopting the scheme of the second embodiment, data having the same characteristic information as the sample data in the mass data can be detected according to the sample data, so that the data finally expanded into the database has the same characteristic information, and the actual use requirement of a user for finding and expanding the data of a specific type in the mass data is met.
Those skilled in the art understand that, in this embodiment, the step S201, the step S202, and corresponding variations can be understood as a specific implementation manner of the step S101 and the step S102 in the embodiment shown in fig. 1, and the matching workload when matching in the massive data is reduced by the preset limiting condition, and at the same time, the user is allowed to perform data detection and expansion for a specific website. Further, a user can select whether the preset limiting condition needs to be set according to actual requirements, wherein when the user does not set the preset limiting condition, all access records on the internet are used as the mass data to perform data detection and expansion (namely, whole-network search); when the user sets the preset limiting condition, the embodiment of the present invention uses the access records on one or more websites defined by the preset limiting condition as the mass data to obtain the data required by the user (i.e. specific website search).
As a variation, when the user selects to perform the global search, the embodiment of the present invention may first perform the technical solution of the embodiment of the present invention on a plurality of websites once to obtain the matching rules from each website, and after integrating the matching rules of each of the plurality of websites into a universal match symbol, perform the global search using the universal match symbol as the feature information of the sample data.
Fig. 3 is a flowchart of a data detection and expansion method based on sample data according to a third embodiment of the present invention. Specifically, in this embodiment, step S301 is first executed to select a preset number of data from the database, and use the feature information of the preset number of data as the sample data. More specifically, a person skilled in the art may refer to step S201 in the embodiment shown in fig. 2, which is not described herein again.
And then, step S302 is executed to search data having the same characteristic information as the sample data in the mass data, and use the data having the same characteristic information as the matching data. Specifically, a person skilled in the art may refer to step S202 in the embodiment shown in fig. 2, which is not described herein again.
Step S303 is executed next, and the matching data is structured to obtain standard data arranged according to a preset format. Specifically, the result of the structuring process may be represented in a table form, wherein the table records all or part of the content of the matching data by category. More specifically, the standard data may be a result obtained by arranging the contents in the table according to the preset format. In a preferred embodiment, the matching data is also represented in the form of a URL, the categories recorded in the table include a top-level domain name tld, a port (port), a matching parameter (querykey), a matching location, matching content, and a matching manner, and this step may be performed by splitting the URL of the matching data according to the categories recorded in the table, and then reordering and integrating the split results according to the preset format, where the result of reordering and integrating is the standard data.
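The structuring of a matching URL into standard data can be sketched as follows. The function name structure, the tuple layout, and in particular the matching manner assigned to each location ("equal" for query values, "contains" for path and host) are assumptions of this sketch; the embodiment only names the six categories.

```python
from urllib.parse import urlsplit, parse_qsl


def structure(url: str, tld: str, code: str):
    """Split a matching URL into
    (tld, port, querykey, matching location, matching content, matching manner).

    None stands for a null entry, e.g. a port omitted from the URL.
    """
    parts = urlsplit(url)
    port = parts.port  # None when the URL does not state a port
    for key, value in parse_qsl(parts.query):
        if value == code:
            return (tld, port, key, "query", code, "equal")
    if code in parts.path:
        return (tld, port, None, "path", code, "contains")
    if code in (parts.hostname or ""):
        return (tld, port, None, "host", code, "contains")
    return None  # the code does not occur in this URL


print(structure("http://c.host.com:1234/test?id=item_1234", "host.com", "item_1234"))
print(structure("http://a.host.com:3567/item/item_1234/detail.html", "host.com", "item_1234"))
```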
Then, the step S304 is performed, and the matching rule is generated and deduplicated based on the standard data. In a preferred embodiment, the standard data may be first converted into the matching rule according to the preset format, and then the repeated items in the converted matching rule are removed to obtain the duplicate-removed matching rule. Those skilled in the art understand that, through the processing in step S303, the standard data may include only key information required for performing subsequent matching work, and cannot be directly applied to the subsequent step, so that the standard data needs to be processed in this step, and is converted into the matching rule according to the preset format, so as to be used in the subsequent step; on the other hand, since the design of the URL of the same website generally has similarity, after all the matching rules are obtained by the conversion in this step, the duplicate removal processing may be performed on all the matching rules to remove the duplicate items in the matching rules obtained by the conversion in this step.
Step S305 is performed next, and the fingerprint database is updated based on the matching rule after the duplication removal. In particular, the updating comprises storing the de-duplicated matching rules to the fingerprint repository. More specifically, the updating further includes removing matching rules that are repeated with existing matching rules in the fingerprint database from the matching rules after the duplication removal. In a preferred embodiment, the match rule after the duplication removal is compared with the match rule in the fingerprint database to remove duplicate items twice, and then the match rule after the duplicate items are removed twice is updated to the fingerprint database.
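Steps S304 and S305 can be sketched as a conversion followed by two rounds of deduplication. This is a sketch under stated assumptions: the rule layout (tld, querykey, location, pattern, manner), the choice to drop the port from the rule, and the generalization of trailing digits into \d+ are all illustrative choices, and the names to_rule, dedupe, and update_fingerprint_library are hypothetical.

```python
import re


def to_rule(record):
    """Convert one standard-data record into a matching rule."""
    tld, _port, key, location, content, manner = record
    m = re.fullmatch(r"(.*?)(\d+)", content)
    if m:
        # Generalize the trailing number, e.g. item_1234 -> (item_\d+).
        pattern = "(" + re.escape(m.group(1)) + r"\d+)"
    else:
        pattern = "(" + re.escape(content) + ")"
    return (tld, key, location, pattern, manner)


def dedupe(rules):
    """First deduplication: drop repeated rules, keeping first occurrences."""
    seen, unique = set(), []
    for rule in rules:
        if rule not in seen:
            seen.add(rule)
            unique.append(rule)
    return unique


def update_fingerprint_library(library, rules):
    """Second deduplication against the library, then store the new rules."""
    fresh = [r for r in dedupe(rules) if r not in library]
    library.extend(fresh)
    return fresh


records = [
    ("host.com", None, "item_id", "query", "item_1234", "equal"),
    ("host.com", None, "item_id", "query", "item_1368", "equal"),
    ("host.com", 1234, "id", "query", "item_1234", "equal"),
]
rules = [to_rule(r) for r in records]
library = [rules[2]]                      # a rule obtained historically
print(update_fingerprint_library(library, rules))
```

In this sketch the first two records collapse into one rule, and the third is dropped in the second round because the library already holds it.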
And finally, executing step S306, performing matching extraction in the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database. Specifically, a person skilled in the art may refer to step S104 in the embodiment shown in fig. 1, which is not described herein again.
Further, the matching rule may be understood as a combination of filtering and extracting data.
In a preferred application scenario, the top-level domain name tld and the matching parameters in the matching rule may be used to filter data. For example, when the step S305 is executed, whether the currently scanned data in the mass data is worth further matching may first be preliminarily determined based on the top-level domain name tld and the matching parameter; if the top-level domain name tld of the currently scanned data does not match the top-level domain name tld recorded in the matching rule, the currently scanned data may be rejected directly, so as to reduce the matching workload of the embodiment of the present invention and improve the matching efficiency.
In another preferred application scenario, the matching manner, the matching position, and the matching content or regular expression in the matching rule may be used to extract data to finally determine whether the currently scanned data has the same characteristic information as the sample data.
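The view of a matching rule as a combination of filtering and extraction can be sketched as below. The rule layout (tld, querykey, location, pattern, manner) and the function name apply_rule are assumptions of this sketch.

```python
import re
from urllib.parse import urlsplit, parse_qsl


def apply_rule(url: str, rule):
    """Return the extracted content if the URL satisfies the rule, else None."""
    tld, key, location, pattern, manner = rule
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Filtering: reject early on the tld and the matching parameter.
    if not (host == tld or host.endswith("." + tld)):
        return None
    if location == "query":
        query = dict(parse_qsl(parts.query))
        if key not in query:
            return None
        target = query[key]
    elif location == "path":
        target = parts.path
    else:
        target = host

    # Extraction: apply the matching manner and the regular expression.
    regex = re.compile(pattern)
    found = regex.fullmatch(target) if manner == "equal" else regex.search(target)
    return found.group(1) if found else None


rule = ("host.com", "item_id", "query", r"(item_\d+)", "equal")
print(apply_rule("http://b.host.com/test?item_id=item_1368&a=c", rule))
```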
Further, the fingerprint database and the database may be stored in a computer executing the embodiment of the present invention, may also be stored in other storage devices coupled to the computer, or may also be stored in a cloud.
As can be seen from the above, in this embodiment the step S303, the step S304, and the step S305 can be understood as a specific implementation of the step S103 in the embodiment shown in fig. 1 or of the step S203 in the embodiment shown in fig. 2. On the one hand, through the structuring process, matching data obtained through different matching manners can have a highly uniform format, which facilitates subsequent processing; on the other hand, the deduplication in the step S304 and the secondary deduplication in the step S305 ensure that no duplicate items occur among the matching rules in the fingerprint library, thereby avoiding a meaningless waste of storage resources.
In a typical application scenario, the data is an item sold on a website and is represented in the form of a URL. The database stores some of the items sold on the website, and the user wants to obtain information on the other items sold on the website. The user can adopt the technical solution of the embodiment of the present invention to randomly select a preset number of items from the items in the database and use the number of each selected item on the website as the feature identification code of that item. Suppose the website is host.com (i.e., the preset limiting condition is set by the top-level domain name tld) and the user selects 2 items in the database as the sample data. If the number of item A on the website is item_1234 and the number of item B on the website is item_1368, the sample data is item_1234 and item_1368.
When the technical solution of the embodiment of the present invention is executed based on the sample data to search the mass data, the sample data may first be loaded into the local memory of the computer executing the embodiment of the present invention, and a dictionary is constructed. The key of the dictionary is the top-level domain name tld of the sample data (in the present application scenario, host.com), and the value of the dictionary is the character matching tree under that top-level domain name tld. Preferably, the character matching tree is constructed by splitting the character strings of all the sample data into individual characters. Preferably, in this application scenario, the character matching tree shown in fig. 4 can be constructed based on the sample data item_1234 and item_1368.
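The dictionary of character matching trees can be sketched as nested Python dictionaries, one level per character. This is an illustrative sketch rather than the structure of fig. 4 itself: the name build_tree and the end marker "$end" are assumptions, and the sample codes are written with the underscore spelling item_1234 and item_1368 that appears in the URLs of Table 1.

```python
END = "$end"  # hypothetical marker; longer than one character, so it cannot clash with a node


def build_tree(samples):
    """Build a character matching tree by splitting each sample into characters."""
    root = {}
    for sample in samples:
        node = root
        for ch in sample:
            node = node.setdefault(ch, {})
        node[END] = True  # a complete sample ends at this node
    return root


# Key: top-level domain name tld; value: the tree for that tld.
tree_by_tld = {"host.com": build_tree(["item_1234", "item_1368"])}

# Walk down the shared prefix "item_1" and list the children of character 1.
node = tree_by_tld["host.com"]
for ch in "item_1":
    node = node[ch]
print(sorted(node))
```

The two samples share the prefix item_1, so the tree branches only at the node for character 1, whose children are 2 and 3.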
The mass data is then scanned one by one and searched based on the character matching tree. For each piece of currently scanned data, it is first determined whether its top-level domain name tld is equal to host.com; if not, the currently scanned data is skipped; if so, the subsequent matching work is performed on the currently scanned data.
For currently scanned data whose top-level domain name tld is equal to host.com, equal matching needs to be performed on the query portion of the URL of the currently scanned data (i.e., the matching position is the query, and the matching rule is equal matching). Taking as an example the case where the URL http://a.host.com/path/test.html?qk1=i234&qk2=item_1246&item_id=item_1234 represents the currently scanned data, the URL may first be split to obtain its query portion; the query portion may then be further split by the separators "&" and "=" to obtain a dictionary {"qk1": "i234", "qk2": "item_1246", "item_id": "item_1234"} represented in the form of key-value pairs; the dictionary is then traversed, and each value in the dictionary is searched character by character on the character matching tree shown in fig. 4.
For example, when the value i234 is matched, the first character i is matched successfully; the second character 2 of the value i234 is then matched downwards, and since the child node list of the character i in the character matching tree shown in fig. 4 contains only the character t and does not contain 2, the matching of the value i234 is unsuccessful.
As another example, when the value item_1246 is matched, the first character i is matched successfully; the second character t is also included in the child node list of the character i in the character matching tree shown in fig. 4; the third character e is also in the child node list of the character t; the character m, the character _ and the character 1 are matched against the character matching tree shown in fig. 4 in the same way. Next, the character 2 is matched: the character 1 in the character matching tree shown in fig. 4 has two child nodes [2,3], which contain the character 2 to be matched, so matching can continue with the character 4. Because the previous character 2 placed the value item_1246 on the branch of the character 2 among the child nodes [2,3] below the character 1, the character 4 is matched on that branch; however, the child node below the character 2 on that branch is the character 3 and does not contain the character 4 to be matched, so the matching of the value item_1246 is also unsuccessful.
For another example, when the value item_1234 is matched, the foregoing matching steps against the character matching tree shown in fig. 4 show that the value item_1234 can be matched completely, so it is determined that the URL of the currently scanned data contains the sample data, and the matching parameter is the item ID (item_id).
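The query splitting and the character-by-character search walked through above can be sketched as follows. The names build_tree, in_tree, and match_query and the end marker "$end" are assumptions of this sketch; the tree construction is repeated so that the block is self-contained.

```python
from urllib.parse import urlsplit, parse_qsl

END = "$end"  # hypothetical end-of-sample marker


def build_tree(samples):
    """Build a character matching tree from the sample codes."""
    root = {}
    for sample in samples:
        node = root
        for ch in sample:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def in_tree(tree, value: str) -> bool:
    """Follow the value character by character; succeed only on a complete sample."""
    node = tree
    for ch in value:
        if ch not in node:
            return False  # e.g. the second character of i234, or the 4 in item_1246
        node = node[ch]
    return END in node


def match_query(url: str, tree):
    """Return {querykey: value} for query values that match the tree completely."""
    query = dict(parse_qsl(urlsplit(url).query))
    return {key: value for key, value in query.items() if in_tree(tree, value)}


tree = build_tree(["item_1234", "item_1368"])
url = "http://a.host.com/path/test.html?qk1=i234&qk2=item_1246&item_id=item_1234"
print(match_query(url, tree))
```

Only the value of item_id survives, which is how the matching parameter is identified in this scenario.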
Table 1 matching data list based on URL representation
http://a.host.com/path/test.html?qk1=i234&qk2=item_1246&item_id=item_1234
http://b.host.com/test?item_id=item_1368&a=c
http://c.host.com:1234/test?id=item_1234
http://item_1368.host.com/detai_info.html
http://a.host.com:3345/category-1234-item_1234-t12
http://a.host.com:3567/item/item_1234/detail.html
Continuing to scan the mass data, further matching data represented by URLs may be obtained. The matching data may include the URLs shown in table 1 above.
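Table 2 Standard data obtained by structuring the matching data of table 1
[Table 2 appears as an image in the original publication; its columns are top-level domain name tld, port, matching parameter (querykey), matching location, matching content, and matching manner.]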
As shown in table 2, after the one-by-one scanning of the mass data based on the sample data is completed, the matching data obtained by the search may be structured to obtain the standard data represented in the preset format. Preferably, the standard data is arranged in the order of top-level domain name tld, port, matching parameter (querykey), matching location, matching content, and matching manner, wherein absent content is indicated by null. For example, when the port is the default value (i.e., 80), it may be omitted from the URL, and the port in the standard data is then also indicated by null. For another example, for matching data obtained by searching the mass data with the path as the matching position, the matching parameter of the resulting standard data is null.
Table 3 Matching rules obtained by converting the standard data of table 2
[Table 3 appears as an image in the original publication.]
The standard data listed in table 2 is converted into the matching rules according to the preset format, as shown in table 3, where (item_\d+) is a regular expression that represents a character string beginning with item_ and followed by one or more digits.
Further, according to the matching rule and the matching parameter, the second row of the matching rules listed in table 3 may be removed as a duplicate; the remaining matching rules are then compared with the existing matching rules in the fingerprint database, any matching rule in table 3 that repeats an existing matching rule in the fingerprint database is removed, and finally the matching rules that have been deduplicated twice are updated to the fingerprint database.
Further, the updated fingerprint database is reapplied to the mass data, and rescanning is performed based on the sequence of the host, the path and the query, so that the newly added matching rule can be matched with the URLs of more commodities, and the URL of the commodity obtained through matching (or the part of the commodity URL, which meets the matching rule) is updated to the database, so that the database can be expanded finally.
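The rescanning and expansion step can be sketched as below. This is only a sketch under several assumptions: the rule layout (tld, querykey, location, pattern), the capture-all pattern (.+), the subdomain x in the first URL, and the function name expand are all hypothetical and chosen for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qsl

SCAN_ORDER = ("host", "path", "query")  # order named in the text


def expand(database: set, library, urls):
    """Rescan the URLs with every rule in the library and add new items to the database."""
    added = []
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        query = dict(parse_qsl(parts.query))
        for location in SCAN_ORDER:
            for tld, key, rule_location, pattern in library:
                if rule_location != location:
                    continue
                if not (host == tld or host.endswith("." + tld)):
                    continue  # filtered out by the tld
                targets = {"host": host, "path": parts.path, "query": query.get(key, "")}
                found = re.search(pattern, targets[location])
                if found and found.group(1) not in database:
                    database.add(found.group(1))
                    added.append(found.group(1))
    return added


database = {"item_1234"}
library = [("host.com", "item_id", "query", r"(.+)")]  # hypothetical newly added rule
urls = [
    "http://x.host.com/?item_id=test1&b=c",
    "http://test1.host.com/path/subpath/subpath/a.html?q1=v1&q2=v2&item_id=11111",
]
print(expand(database, library, urls))
```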
For example, the new matching rule http://.host.com/?item_id may match a URL whose query portion is item_id=test1&b=c, or the URL http://test1.host.com/path/subpath/subpath/a.html?q1=v1&q2=v2&item_id=11111; the two newly matched items are test1 and 11111.
Those skilled in the art understand that, through the technical solution of the embodiment of the present invention, two new pieces of data, test1 and 11111, are finally added to the database based on the sample data item_1234 in the database. In practical applications, the technical solution of the embodiment of the present invention can find a large amount of potential data in long-tail URLs, thereby greatly expanding the database and realizing deep mining of the mass data.
Fig. 5 is a schematic structural diagram of a data detection and expansion apparatus based on sample data according to a fourth embodiment of the present invention. Those skilled in the art understand that the data detection and expansion device 4 of the present embodiment is used to implement the method solutions in the embodiments shown in fig. 1 to fig. 4. Specifically, in this embodiment, the data detecting and expanding device 4 includes a determining module 41, configured to determine the sample data based on at least one piece of data in a database, where a plurality of pieces of data detected from mass data are stored; a searching module 42, configured to search the mass data based on the sample data to obtain matching data in the mass data, where the matching data matches the sample data; an updating module 43, configured to process the matching data to obtain a matching rule, and update a fingerprint database, where the matching rule obtained in history is stored in the fingerprint database; and an extraction module 44, configured to perform matching extraction on the mass data based on the updated fingerprint database to obtain data in the mass data that matches the matching rule in the updated fingerprint database, and expand the data obtained through matching to the database.
Further, the determining module 41 includes a selecting sub-module 411, configured to select a preset amount of data from the database, and use characteristic information of the preset amount of data as the sample data. Preferably, the feature information includes a feature identification code of the preset amount of data; or a regular expression determined according to the preset amount of data.
Further, the searching module 42 includes a first searching submodule 421, configured to search for data having the same characteristic information as the sample data in the massive data, and use the data having the same characteristic information as the matching data.
Further, the search module 42 further includes a second search submodule 422, where the second search submodule 422 is configured to, when searching for the mass data based on the sample data, if a preset limiting condition exists, search for a part of data defined by the preset limiting condition in the mass data to obtain the matching data.
Further, the updating module 43 includes a processing submodule 431, configured to perform structural processing on the matching data to obtain standard data arranged according to a preset format; a generating submodule 432, configured to generate the matching rule based on the standard data and perform deduplication; and an update submodule 433 for updating the fingerprint repository based on the de-duplicated matching rules.
Further, the generating sub-module 432 includes a converting unit 4321, configured to convert the standard data into the matching rule according to the preset format; and a duplicate removal unit 4322, configured to remove duplicate entries in the matching rule obtained through the conversion, to obtain the duplicate-removed matching rule.
Further, the update sub-module 433 includes a comparing unit 4331, configured to compare the de-duplicated matching rule with the matching rule in the fingerprint database to remove duplicate items twice; and an updating unit 4332, configured to update the matching rule with the duplicate entry removed twice to the fingerprint database.
Preferably, the data is an internet access record.
More contents of the working principle and the working mode of the data detection and expansion device 4 can refer to the related descriptions in fig. 1 to 4, and are not described again here.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A data detection and expansion method based on sample data is characterized by comprising the following steps:
determining the sample data based on at least one piece of data in a database, wherein the database stores a plurality of pieces of data acquired by detecting mass data;
searching in the mass data based on the sample data to obtain matched data matched with the sample data in the mass data;
processing the matching data to obtain a matching rule, and updating a fingerprint database, wherein the matching rule obtained historically is stored in the fingerprint database;
performing matching extraction in the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data, and expanding the data obtained by matching to the database;
and repeating the steps of determining sample data from the database to expanding the data obtained by matching to the database.
2. The method for data detection and expansion based on sample data according to claim 1, wherein the step of determining the sample data based on at least one piece of data in the database comprises the following steps:
and selecting a preset amount of data from the database, and taking the characteristic information of the preset amount of data as the sample data.
3. The method of claim 2, wherein the characteristic information comprises:
the feature identification codes of the preset amount of data; or
And the regular expression is determined according to the preset amount of data.
4. The method for data detection and expansion based on sample data according to claim 2, wherein the step of searching in the mass data based on the sample data to obtain the matching data matching with the sample data in the mass data comprises the following steps:
and searching data with the same characteristic information as the sample data in the mass data, and taking the data with the same characteristic information as the matching data.
5. The method according to claim 4, wherein when searching in the mass data based on the sample data, if a preset constraint exists, searching in a part of the mass data defined by the preset constraint to obtain the matching data.
6. The method of claim 1, wherein the matching data is processed to obtain matching rules and update the fingerprint database, comprising the steps of:
carrying out structuralization processing on the matched data to obtain standard data arranged according to a preset format;
generating the matching rule based on the standard data and removing duplication;
and updating the fingerprint database based on the matching rule after the duplication removal.
7. The method of claim 6, wherein generating the matching rules based on the standard data and de-duplicating the matching rules comprises:
converting the standard data into the matching rule according to the preset format;
and removing repeated items in the matching rule obtained by conversion to obtain the matching rule after duplication removal.
8. The method of claim 6, wherein the step of updating the fingerprint database based on the de-duplicated fingerprints comprises the steps of:
comparing the de-duplicated matching rules with the matching rules in the fingerprint database to remove duplicate items for the second time;
and updating the matching rule after the repeated items are removed twice to the fingerprint database.
9. The method of any one of claims 1 to 8, wherein the data is an internet access record.
10. A data probing and expansion device based on sample data, comprising:
the determining module is used for determining the sample data based on at least one piece of data in a database, and the database stores a plurality of pieces of data acquired by detecting mass data;
the searching module is used for searching in the mass data based on the sample data so as to obtain matched data matched with the sample data in the mass data;
the updating module is used for processing the matching data to obtain a matching rule and updating a fingerprint database, and the matching rule obtained historically is stored in the fingerprint database;
the extraction module is used for performing matching extraction on the mass data based on the updated fingerprint database to obtain data matched with the matching rule in the updated fingerprint database in the mass data and expanding the data obtained by matching to the database;
and repeating the steps of determining sample data from the database to expanding the data obtained by matching to the database.
11. The sample data-based data detection and expansion device according to claim 10, wherein the determination module comprises:
and the selection submodule is used for selecting data with preset quantity from the database and taking the characteristic information of the data with the preset quantity as the sample data.
12. The sample data-based data detection and expansion device of claim 11, wherein the characteristic information comprises:
the feature identification codes of the preset amount of data; or
And the regular expression is determined according to the preset amount of data.
13. The sample data-based data detection and expansion device of claim 11, wherein the lookup module comprises:
and the first searching submodule is used for searching the data with the same characteristic information as the sample data in the mass data and taking the data with the same characteristic information as the matching data.
14. The apparatus according to claim 13, wherein the search module further comprises a second search submodule, and the second search submodule is configured to search, when searching in the mass data based on the sample data, if a preset constraint condition exists, a part of data defined by the preset constraint condition in the mass data, so as to obtain the matching data.
15. The sample data-based data probing and expansion device according to claim 10, wherein said update module comprises:
the processing submodule is used for carrying out structural processing on the matched data so as to obtain standard data arranged according to a preset format;
the generation submodule is used for generating the matching rule based on the standard data and removing duplication;
and the updating submodule is used for updating the fingerprint database based on the matching rule after the duplication is removed.
16. The sample data-based data probing and expansion device according to claim 15, wherein said generating sub-module comprises:
the conversion unit is used for converting the standard data into the matching rule according to the preset format;
and the duplication removing unit is used for removing repeated items in the converted matching rule to obtain the duplicated matching rule.
17. The sample data-based data detection and expansion device according to claim 16, wherein the update submodule comprises:
a comparison unit, configured to compare the deduplicated matching rule with the matching rules in the fingerprint database, so as to remove repeated items a second time;
and an update unit, configured to write the matching rule remaining after the second removal of repeated items into the fingerprint database.
18. The sample data-based data detection and expansion device according to any one of claims 10 to 17, wherein the data is an internet access record.
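
The flow recited in claims 13 to 17 can be illustrated with a minimal Python sketch. It is not the patented implementation. The record fields (`host`, `path`, `date`), the rule format (host plus first path segment), and all function names are hypothetical choices made only for illustration. The sketch searches the mass data for records sharing the sample's characteristic information, optionally limited by a preset constraint, structures the matches, converts them into matching rules, removes duplicates once among the new rules and a second time against the fingerprint database, and then updates the database.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, str]
Rule = Tuple[str, str]  # hypothetical rule format: (host, first path segment)


def find_matching(
    mass_data: Iterable[Record],
    sample: Record,
    feature_key: str = "host",
    constraint: Optional[Callable[[Record], bool]] = None,
) -> List[Record]:
    """Return records whose feature value equals the sample's (claims 13, 14)."""
    # If a preset constraint exists, only the part of the data it defines is searched.
    candidates = (r for r in mass_data if constraint is None or constraint(r))
    return [r for r in candidates if r.get(feature_key) == sample.get(feature_key)]


def structure(record: Record) -> Record:
    """Arrange a raw record in an assumed preset format (claim 15)."""
    return {
        "host": record.get("host", "").strip().lower(),
        "path": "/" + record.get("path", "").strip().lstrip("/"),
    }


def to_rule(standard: Record) -> Rule:
    """Convert one standard record into a matching rule (claim 16)."""
    first_segment = standard["path"].split("/")[1]
    return (standard["host"], "/" + first_segment)


def expand_fingerprints(fingerprint_db: Set[Rule], matching: Iterable[Record]) -> List[Rule]:
    """Deduplicate twice, then update the fingerprint database (claims 16, 17)."""
    rules = [to_rule(structure(r)) for r in matching]
    deduped = list(dict.fromkeys(rules))  # first pass: repeats among the new rules
    new_rules = [r for r in deduped if r not in fingerprint_db]  # second pass: against the database
    fingerprint_db.update(new_rules)
    return new_rules


# Invented example data: four access records and a sample sharing one host.
mass = [
    {"host": "example.com", "path": "/news/1", "date": "2016-12-01"},
    {"host": "example.com", "path": "news/2", "date": "2016-12-02"},
    {"host": "example.com", "path": "/shop/9", "date": "2016-11-30"},
    {"host": "other.org", "path": "/news/1", "date": "2016-12-01"},
]
sample = {"host": "example.com"}


def recent(record: Record) -> bool:
    """Hypothetical preset constraint: only records dated 2016-12-01 or later."""
    return record["date"] >= "2016-12-01"


db: Set[Rule] = {("example.com", "/shop")}
added = expand_fingerprints(db, find_matching(mass, sample))
print(added)
```

In this sketch the second pass is a plain set-membership test. A real fingerprint database would more likely use an indexed lookup, which the claims do not specify.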
CN201611264829.8A 2016-12-30 2016-12-30 Data detection and expansion method and device based on sample data Active CN106844553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611264829.8A CN106844553B (en) 2016-12-30 2016-12-30 Data detection and expansion method and device based on sample data

Publications (2)

Publication Number Publication Date
CN106844553A (en) 2017-06-13
CN106844553B (en) 2020-05-01

Family

ID=59117193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611264829.8A Active CN106844553B (en) 2016-12-30 2016-12-30 Data detection and expansion method and device based on sample data

Country Status (1)

Country Link
CN (1) CN106844553B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647272A (en) * 2018-04-28 2018-10-12 江南大学 A kind of small sample extending method based on data distribution

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815488A (en) * 2018-12-26 2019-05-28 出门问问信息科技有限公司 Natural language understanding training data generation method, device, equipment and storage medium
CN111680286B (en) * 2020-02-27 2022-06-10 中国科学院信息工程研究所 Refinement method of Internet of things equipment fingerprint library

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1952929A (en) * 2005-10-20 2007-04-25 关涛 Extraction method and system of structured data of internet based on sample & faced to regime
CN103942282A (en) * 2014-04-02 2014-07-23 新浪网技术(中国)有限公司 Sample data obtaining method, device and system
CN104063474A (en) * 2014-06-30 2014-09-24 五八同城信息技术有限公司 Sample data collection system
CN105095240A (en) * 2014-05-04 2015-11-25 中国银联股份有限公司 Database data sample acquisition

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647272A (en) * 2018-04-28 2018-10-12 江南大学 A kind of small sample extending method based on data distribution
CN108647272B (en) * 2018-04-28 2020-12-29 江南大学 Method for predicting concentration of butane at bottom of debutanizer by expanding small samples based on data distribution

Also Published As

Publication number Publication date
CN106844553A (en) 2017-06-13

Similar Documents

Publication Publication Date Title
US7818303B2 (en) Web graph compression through scalable pattern mining
Gao et al. Navigating the data lake with datamaran: Automatically extracting structure from log datasets
US10404731B2 (en) Method and device for detecting website attack
CN111868710A (en) Random extraction forest index structure for searching large-scale unstructured data
CN106844553B (en) Data detection and expansion method and device based on sample data
CN111752955A (en) Data processing method, device, equipment and computer readable storage medium
RU2568276C2 (en) Method of extracting useful content from mobile application setup files for further computer data processing, particularly search
CN111159334A (en) Method and system for house source follow-up information processing
CN115830649A (en) Network asset fingerprint feature identification method and device and electronic equipment
CN110333990B (en) Data processing method and device
CN112364014A (en) Data query method, device, server and storage medium
CN110263021B (en) Theme library generation method based on personalized label system
CN114817243A (en) Method, device and equipment for establishing database joint index and storage medium
CN114490923A (en) Training method, device and equipment for similar text matching model and storage medium
CN116032741A (en) Equipment identification method and device, electronic equipment and computer storage medium
CN109376138B (en) Abnormal combination detection method and device for multi-dimensional data
CN109710860B (en) URL (Uniform resource locator) classification matching method and device
US10229105B1 (en) Mobile log data parsing
CN110825947A (en) URL duplicate removal method, device, equipment and computer readable storage medium
CN111723122A (en) Method, device and equipment for determining association rule between data and readable storage medium
CN114257565B (en) Method, system and server for mining potential threat domain names
CN112579839B (en) Multi-mode matching method and device for large-scale features and storage medium
CN111488263A (en) Method and device for analyzing logs in MySQL database
KR20100080345A (en) System and method for prompting an end user with a preferred sequence of commands which performs an activity in a least number of inputs
CN116502009B (en) Webpage filtering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant