CN106682044A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN106682044A
Authority
CN
China
Prior art keywords
data
target data
screening
target
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510767682.3A
Other languages
Chinese (zh)
Other versions
CN106682044B (en
Inventor
刘嘉
钦滨杰
陈晓敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510767682.3A
Publication of CN106682044A
Application granted
Publication of CN106682044B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device, relates to the technical field of the Internet, and mainly aims to reduce data screening time and increase data screening accuracy. According to the technical scheme, the method includes: extracting target data from to-be-processed data, wherein the target data comprises a data attribute value; caching the target data in a preset favorites folder; in response to a data screening instruction, screening the target data in the preset favorites folder according to the data attribute value to obtain screened target data; and displaying the screened target data. The method is mainly applicable to data screening.

Description

Method and device for data processing
Technical Field
The present invention relates to the field of Internet technology, and in particular to a method and device for data processing.
Background
With the rapid development of networks, the World Wide Web has become a carrier of massive amounts of data, and extracting and using that data efficiently has become a major challenge. Screening valid data out of massive data is one way of making effective use of Internet data.
Generally, when data is screened, a data source is identified according to the actual data requirements; the data source is usually the web pages of a website. A crawler program then crawls the data in the data source, and the crawled data is stored in a database in some manner for later use. When data needs to be screened, the data in the database is retrieved and screened, and the screened data is organized into data reports, so that the massive data is used effectively.
In screening data in the above manner, the inventors found the following problems. When the data in the database is screened, all of the data in the database has to be screened in turn; if the database holds a large amount of data, screening takes a long time and the screening accuracy is low. Moreover, if the database-based screening process is interrupted, all of the data in the database has to be screened again; the data screened before the interruption cannot be retained, so screening consumes too much time.
Summary of the Invention
In view of this, the present invention provides a method and device for data processing, whose main purpose is to reduce the time consumed by data screening and to improve the accuracy of data screening.
To solve the above problems, the present invention mainly provides the following technical solutions:
In one aspect, the present invention provides a method of data processing, the method including:
extracting target data from to-be-processed data, wherein the target data includes a data attribute value;
caching the target data in a preset favorites folder;
in response to a data screening instruction, screening the target data in the preset favorites folder according to the data attribute value, to obtain screened target data;
displaying the screened target data.
In another aspect, the present invention further provides a device for data processing, the device including:
an extraction unit, configured to extract target data from to-be-processed data, wherein the target data includes a data attribute value;
a cache unit, configured to cache the target data extracted by the extraction unit in a preset favorites folder;
a screening unit, configured to, in response to a data screening instruction, screen the target data cached in the preset favorites folder by the cache unit according to the data attribute value, to obtain screened target data;
a display unit, configured to display the target data screened by the screening unit.
Through the above technical solutions, the technical solutions provided by the present invention have at least the following advantages:
With the method and device for data processing provided by the present invention, target data is first extracted from to-be-processed data, the target data including a data attribute value; the extracted target data is cached in a preset favorites folder; in response to a data screening instruction, the target data in the preset favorites folder is screened according to its data attribute value; and the screened target data is then displayed. Compared with the prior art, in which the data to be screened is screened directly from the preset database, the present invention caches the target data extracted from the to-be-processed data in the preset favorites folder so as to reduce the volume of data to be screened, thereby reducing the time consumed by screening target data. Meanwhile, the data volume of the target data is inversely proportional to the accuracy of screening target data, that is, the smaller the data volume of the target data, the higher the screening accuracy; because the data volume of the target data in the preset favorites folder is small, the accuracy of screening target data is improved.
The above is only an overview of the technical solutions of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method of data processing provided by an embodiment of the present invention;
Fig. 2 shows a block diagram of a device for data processing provided by an embodiment of the present invention;
Fig. 3 shows a block diagram of another device for data processing provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method of data processing. As shown in Fig. 1, the method includes:
101. Extract target data from to-be-processed data.
In the embodiment of the present invention, before target data is screened, the data of the corresponding web pages in the Internet target website is first obtained, and the obtained to-be-processed data is stored in a preset database so that target data can be extracted from the preset database. When the data of the corresponding web pages in the target website is obtained, which website's web page content needs to be obtained is determined according to the type of the to-be-processed data; the data type of the to-be-processed data may be economic data, video data, science-and-technology data, and so on. The embodiment of the present invention does not limit the data type to be screened, the specific target website, or similar details.
Generally, the to-be-processed data is stored in the preset database. When the to-be-processed data in the preset database needs to be screened, target data is first extracted from the to-be-processed data, wherein the target data includes a data attribute value. The data attribute value is the data category of the target data; for example, whether the target data is automotive data, military data, or science-and-technology data can be distinguished by the data attribute information.
As another implementation of the embodiment of the present invention, the target data further includes a data status identifier. The data status identifier is added at the corresponding interruption node of the to-be-processed data when an interruption occurs while target data is being extracted from the to-be-processed data, so that extraction can continue from the data status identifier. This saves time in extracting target data from the to-be-processed data and thereby reduces the time consumed by screening target data.
In a specific implementation of the embodiment of the present invention, target data is extracted from the to-be-processed data based on preset screening conditions. The preset screening conditions are set manually and need to correspond to the screening conditions used to obtain the to-be-processed data from the target website: the preset screening conditions may be set to be identical to those conditions, or their screening scope may be set narrower than the screening scope of the conditions used to obtain the to-be-processed data from the target website.
For example, if the screening condition for obtaining to-be-processed data from the target website is economic data, the preset screening conditions may be set to stocks, securities, finance, and so on. The embodiment of the present invention does not limit how the preset screening conditions are set; they should be set according to the actual requirements for extracting target data.
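The extraction step can be sketched as follows. This is a minimal illustration, assuming each record of to-be-processed data carries its data attribute value as a plain string; the `Record` type, its field names, and the sample URLs are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Record:
    source_url: str  # hypothetical: page the record was crawled from
    attribute: str   # data attribute value, i.e. the data category
    text: str


def extract_target_data(pending: Iterable[Record], preset_conditions: Set[str]) -> List[Record]:
    """Keep only the records whose attribute value matches a preset screening condition."""
    return [record for record in pending if record.attribute in preset_conditions]


# To-be-processed data crawled under the broad condition "economic data" (illustrative values).
pending_data = [
    Record("http://site-a.example/1", "stocks", "index closes higher"),
    Record("http://site-a.example/2", "real estate", "housing starts"),
    Record("http://site-b.example/3", "securities", "bond issue"),
    Record("http://site-b.example/4", "stocks", "earnings season"),
]

# Preset screening conditions that are narrower than the crawl condition.
target_data = extract_target_data(pending_data, {"stocks", "securities", "finance"})
print([record.attribute for record in target_data])
```

Because the preset conditions are narrower than the crawl condition, the extracted target data is a subset of the to-be-processed data.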
102. Cache the target data in the preset favorites folder.
The preset database in step 101 stores the to-be-processed data, but the to-be-processed data stored in the preset database is of many types and covers a wide range. Therefore, to narrow the coverage of the to-be-processed data in the preset database and improve the accuracy of screening target data, the extracted target data is cached in the preset favorites folder. The preset favorites folder is used to store target data, and the data volume of the target data in the preset favorites folder is smaller than the data volume of the to-be-processed data in the preset database.
103. In response to a data screening instruction, screen the target data in the preset favorites folder according to the data attribute value.
The data screening instruction is used to screen target data from the preset favorites folder; the target data is screened according to the data attribute value. Screening the target data in the preset favorites folder according to the data attribute value has two benefits. First, the data volume of target data in the preset favorites folder is smaller than the data volume of to-be-processed data in the preset database, which reduces the time consumed by screening target data. Second, when the result of screening is unsatisfactory, the target data can be screened again from the preset favorites folder; because the data volume in the preset favorites folder is small, the accuracy of screening target data can be improved.
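Steps 102 and 103 can be pictured with a small in-memory sketch. It assumes the preset favorites folder is simply a list of cached items and that a screening instruction names one attribute value; the `PresetFavorites` class and the sample items are hypothetical stand-ins, not the patent's implementation.

```python
from typing import Dict, List

Item = Dict[str, str]


class PresetFavorites:
    """Hypothetical in-memory stand-in for the preset favorites folder."""

    def __init__(self) -> None:
        self._items: List[Item] = []

    def cache(self, target_data: List[Item]) -> None:
        # Step 102: cache the extracted target data.
        self._items.extend(target_data)

    def screen(self, attribute_value: str) -> List[Item]:
        # Step 103: scan only the cached target data, not the whole database.
        return [item for item in self._items if item["attribute"] == attribute_value]

    def __len__(self) -> int:
        return len(self._items)


preset_database = [
    {"id": "1", "attribute": "stocks"},
    {"id": "2", "attribute": "video"},
    {"id": "3", "attribute": "securities"},
    {"id": "4", "attribute": "military"},
    {"id": "5", "attribute": "stocks"},
]

favorites = PresetFavorites()
favorites.cache([item for item in preset_database if item["attribute"] in {"stocks", "securities"}])

screened = favorites.screen("stocks")
print(len(preset_database), len(favorites), [item["id"] for item in screened])
```

Screening can be repeated with a different attribute value without touching the preset database again, which is the re-screening case described above.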
104. Display the screened target data.
The screened target data is displayed so that it can be viewed and used.
As one implementation of the embodiment of the present invention, when the screened target data is displayed, it is classified and output in classified form. As another implementation, the screened target data is summarized, and the summarized target data is output and displayed. The embodiment of the present invention does not limit the specific form in which the screened target data is displayed.
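One way to picture the classified display is to group the screened items by their attribute value before printing. The sketch below assumes each item has hypothetical `attribute` and `title` fields; the output layout is an arbitrary choice for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def format_by_category(screened: List[Dict[str, str]]) -> List[str]:
    """Group screened target data by attribute value and return printable lines."""
    groups = defaultdict(list)
    for item in screened:
        groups[item["attribute"]].append(item["title"])
    lines = []
    for category in sorted(groups):
        lines.append(f"[{category}] {len(groups[category])} item(s)")
        lines.extend(f"  - {title}" for title in groups[category])
    return lines


screened_data = [
    {"attribute": "stocks", "title": "Index closes higher"},
    {"attribute": "securities", "title": "Bond issue announced"},
    {"attribute": "stocks", "title": "Earnings season begins"},
]

for line in format_by_category(screened_data):
    print(line)
```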
With the method of data processing provided by the embodiment of the present invention, target data is first extracted from to-be-processed data, the target data including a data attribute value; the extracted target data is cached in a preset favorites folder; in response to a data screening instruction, the target data in the preset favorites folder is screened according to its data attribute value; and the screened target data is then displayed.
Specifically, compared with the prior art, in which the data to be screened is screened directly from the preset database, the embodiment of the present invention caches the target data extracted from the to-be-processed data in the preset favorites folder so as to reduce the volume of data to be screened, thereby reducing the time consumed by screening target data. Meanwhile, the data volume of the target data is inversely proportional to the accuracy of screening target data, that is, the smaller the data volume of the target data, the higher the screening accuracy; because the data volume of the target data in the preset favorites folder is small, the accuracy of screening target data is improved.
It should be noted that step 103 may directly use the data attribute value in the target data to perform a first screening. After the first screening, the attribute value of the target data may further be used to determine the influence of the target website, and a second screening of the target data may then be performed according to that influence. Alternatively, the influence of the target website may first be determined from the attribute value of the target data and then used to screen the target data. The present invention places no restriction on this.
Further, as a refinement and extension of the above embodiment, when step 103 screens the target data in the preset favorites folder according to the data attribute value, the following approach may be used:
First, the website influence of the target website is obtained according to the data attribute value of the target data. Then, the website influence is used to apply classification identifiers to the target data in the preset favorites folder. Finally, the target data in the preset favorites folder is screened according to the classification identifiers. Here, the target website is the data source of the target data described in the embodiment of the present invention; that is, the target data is obtained from the target website. The website influence is composed of the target website's home-location identifier, the target website's ranking, and the degree of attention paid to the target website; the preset degree of attention of the mainstream media is determined by a preset website visit volume and a preset website visit ranking.
As one implementation of the embodiment of the present invention, the target data in the preset favorites folder is stored according to the classification identifiers. As another implementation, the target data in the preset favorites folder is only given classification identifiers according to website influence and is not stored by classification identifier; instead, when the screened data is output and displayed, it is displayed according to the classification identifiers.
To explain more clearly how classification identifiers are applied to the target data in the preset favorites folder according to the website influence of the target website, an example is given below.
For example, Table 1 is a schematic of target data stored in the preset favorites folder provided by the embodiment of the present invention. The data sources shown in Table 1 are the URLs of target websites, whose corresponding website influence decreases in order; therefore, when the target data is output and displayed, it can be displayed in order of website influence. Table 1 is merely an example; the embodiment of the present invention does not limit the specific form in which the preset favorites folder stores target data.
Table 1
It should be noted that, when classification identifiers are applied to the target data in the preset favorites folder according to website influence, the greater a website's influence, the higher its authority, which indicates that the target data obtained from that website is more representative and more valuable; the smaller a website's influence, the lower its authority, which indicates that the target data obtained from that website is less valuable.
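The text names the three factors behind website influence but gives no formula for combining them. The sketch below is therefore purely illustrative: the weights, the 0 to 1 scales, the thresholds, and the labels `high`, `medium`, and `low` are all assumptions introduced here, as are the `Site` type and the sample URLs.

```python
from dataclasses import dataclass


@dataclass
class Site:
    url: str
    home_score: float  # hypothetical 0..1 score derived from the home-location identifier
    rank: int          # website ranking, 1 is the best
    attention: float   # hypothetical 0..1 degree of attention


def influence(site: Site) -> float:
    """Combine the three factors with illustrative weights (not taken from the patent)."""
    return 0.2 * site.home_score + 0.4 * (1.0 / site.rank) + 0.4 * site.attention


def classify(score: float) -> str:
    """Map an influence score to a hypothetical classification identifier."""
    if score >= 0.7:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"


sites = [
    Site("http://small.example", home_score=0.0, rank=50, attention=0.1),
    Site("http://major.example", home_score=1.0, rank=1, attention=0.9),
    Site("http://regional.example", home_score=0.5, rank=4, attention=0.6),
]

# Display order: strongest influence first.
for site in sorted(sites, key=influence, reverse=True):
    print(site.url, classify(influence(site)))
```

Sorting by the score gives the decreasing-influence display order described for Table 1, and the label can serve as the classification identifier used for screening.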
In the embodiment of the present invention, the purpose of applying classification identifiers to the target data in the preset favorites folder is to screen the target data more accurately. The target data is marked based on the classification identifiers, for example to mark its importance or its data category, so that the screened target data can be displayed according to the classification identifiers when it is output.
In practice, classification identifiers may also be applied to the target data in the preset favorites folder based on the user's experience. The classification identifiers may include, but are not limited to, labels such as important, fairly important, and deletable. However, applying classification identifiers in this way depends on the user's experience; because users' experience differs, the classification identifiers applied to the target data in the favorites folder also differ. The embodiment of the present invention does not limit this.
Further, if an interruption occurs while target data is being screened in the preset favorites folder, a data status identifier is added at the interruption node corresponding to the interruption, so that screening of the target data in the preset favorites folder can continue according to the data status identifier.
For example, target data is usually stored sequentially in the preset favorites folder, so a data status identifier can be added at the interruption node. Before the target data in the preset favorites folder is screened according to the data attribute value, it is first detected whether a data status identifier exists in the preset favorites folder. If one exists, screening continues from the data status identifier instead of restarting from the beginning of the preset favorites folder, which saves screening time. If no data status identifier exists, the target data is screened from the beginning of the preset favorites folder.
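The resume logic can be sketched as a checkpointed scan. This minimal example assumes sequential storage, models the data status identifier as a saved list index under the hypothetical key `marker`, and simulates an interruption with a `max_items` limit; none of these names come from the patent.

```python
from typing import Dict, List, Optional


def screen_resumable(
    favorites: List[Dict[str, str]],
    attribute_value: str,
    state: dict,
    max_items: Optional[int] = None,
) -> bool:
    """Screen from the saved marker; return False if the pass was interrupted."""
    start = state.get("marker", 0)       # no marker: start from the beginning
    kept = state.setdefault("kept", [])  # results kept across interruptions
    processed = 0
    for index in range(start, len(favorites)):
        if max_items is not None and processed >= max_items:
            state["marker"] = index      # data status identifier at the interruption node
            return False
        if favorites[index]["attribute"] == attribute_value:
            kept.append(favorites[index])
        processed += 1
    state.pop("marker", None)            # finished: clear the marker
    return True


favorites = [
    {"id": "1", "attribute": "stocks"},
    {"id": "2", "attribute": "military"},
    {"id": "3", "attribute": "stocks"},
    {"id": "4", "attribute": "stocks"},
    {"id": "5", "attribute": "technology"},
]

state: dict = {}
first_pass_done = screen_resumable(favorites, "stocks", state, max_items=2)  # simulated interruption
marker_after_interrupt = state.get("marker")
second_pass_done = screen_resumable(favorites, "stocks", state)              # resumes at the marker
print(first_pass_done, marker_after_interrupt, second_pass_done, [item["id"] for item in state["kept"]])
```

The second call does not re-examine the first two items, and the results found before the interruption are retained in `state["kept"]`.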
Further, when the screened target data is displayed, it is displayed according to the classification identifiers, so that the user can make effective use of the screened target data according to the classification identifiers.
Further, before target data is extracted from the to-be-processed data, the to-be-processed data is obtained from the target website by a crawler program and stored in the preset database, ready for target data to be extracted from it. In the embodiment of the present invention, obtaining the to-be-processed data from the target website by a crawler program may be implemented in, but is not limited to, the following ways: the crawler program obtains the to-be-processed data from the target website in a depth-first manner; or the crawler program obtains it in a breadth-first or best-first manner. The embodiment of the present invention does not limit the specific implementation by which the crawler program obtains to-be-processed data from the target website.
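The difference between depth-first and breadth-first crawling is the order in which discovered links are visited. The sketch below shows only that ordering, assuming a hypothetical in-memory link graph in place of real page fetches; best-first crawling, which needs a scoring function, is not shown.

```python
from collections import deque
from typing import Dict, List


def crawl(start: str, links: Dict[str, List[str]], strategy: str = "breadth") -> List[str]:
    """Return the visit order over a link graph for the chosen strategy."""
    if strategy not in ("breadth", "depth"):
        raise ValueError("strategy must be 'breadth' or 'depth'")
    frontier = deque([start])
    seen = set()
    order = []
    while frontier:
        # Queue (oldest first) for breadth-first, stack (newest first) for depth-first.
        page = frontier.popleft() if strategy == "breadth" else frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        neighbours = links.get(page, [])
        if strategy == "depth":
            # Push in reverse so the first link on the page is visited first.
            neighbours = list(reversed(neighbours))
        frontier.extend(n for n in neighbours if n not in seen)
    return order


# Hypothetical site structure: page name -> pages it links to.
site_links = {
    "home": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": ["home"],  # a link back to the start page is skipped once visited
}

print(crawl("home", site_links, "breadth"))
print(crawl("home", site_links, "depth"))
```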
Further, as an implementation of the method shown in Fig. 1, another embodiment of the present invention provides a device for data processing. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the device embodiment does not repeat the details of the method embodiment one by one, but it should be clear that the device in this embodiment can implement everything in the method embodiment. An embodiment of the present invention provides a device for data processing; as shown in Fig. 2, the device includes:
an extraction unit 21, configured to extract target data from to-be-processed data, wherein the target data includes a data attribute value;
a cache unit 22, configured to cache the target data extracted by the extraction unit 21 in a preset favorites folder;
a screening unit 23, configured to, in response to a data screening instruction, screen the target data cached in the preset favorites folder by the cache unit 22 according to the data attribute value, to obtain screened target data;
a display unit 24, configured to display the target data screened by the screening unit 23.
Further, as shown in Fig. 3, the screening unit 23 includes:
an acquisition module 231, configured to obtain the website influence of the target website according to the data attribute value, wherein the target website is the source website of the target data, and the website influence is determined according to the target website's home-location identifier, the target website's ranking, and the degree of attention paid to the target website;
a classification module 232, configured to apply classification identifiers to the target data in the preset favorites folder using the website influence obtained by the acquisition module 231;
a screening module 233, configured to screen the target data in the preset favorites folder according to the classification identifiers applied by the classification module 232.
Further, as shown in Fig. 3, the screening unit 23 also includes:
an adding module 234, configured to, when an interruption occurs while the target data in the preset favorites folder is being screened according to the data attribute value, add a data status identifier at the interruption node corresponding to the interruption, so that screening of the target data in the preset favorites folder can continue according to the data status identifier.
Further, as shown in Fig. 3, the display unit 24 is also configured to display the screened target data according to the classification identifiers of the target data in the screening unit 23.
Further, as shown in Fig. 3, the device also includes:
an acquisition unit 25, configured to obtain the to-be-processed data by a crawler program before the extraction unit 21 extracts target data from the to-be-processed data;
a storage unit 26, configured to store the to-be-processed data in the preset database after the acquisition unit 25 obtains the to-be-processed data.
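The way the units of Figs. 2 and 3 hand data to one another can be sketched as a single class whose methods stand in for the units. This is only an illustration of the data flow: the class, the method names, the `fetch` callable that replaces the crawler, and the item fields are all hypothetical, and the influence-based modules 231 to 234 are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Item = Dict[str, str]


@dataclass
class DataProcessingDevice:
    preset_conditions: Set[str]
    preset_database: List[Item] = field(default_factory=list)  # filled by storage unit 26
    favorites: List[Item] = field(default_factory=list)        # filled by cache unit 22

    def acquire_and_store(self, fetch: Callable[[], List[Item]]) -> None:
        # Acquisition unit 25 obtains the data; storage unit 26 stores it.
        self.preset_database.extend(fetch())

    def extract_and_cache(self) -> None:
        # Extraction unit 21 selects target data; cache unit 22 caches it.
        self.favorites = [
            item for item in self.preset_database
            if item["attribute"] in self.preset_conditions
        ]

    def screen(self, attribute_value: str) -> List[Item]:
        # Screening unit 23 works only on the cached target data.
        return [item for item in self.favorites if item["attribute"] == attribute_value]

    def display(self, items: List[Item]) -> List[str]:
        # Display unit 24 renders the screened target data.
        return [f'{item["attribute"]}: {item["title"]}' for item in items]


def fake_crawler() -> List[Item]:
    # Stand-in for a crawler program; returns fixed illustrative data.
    return [
        {"attribute": "stocks", "title": "Index closes higher"},
        {"attribute": "video", "title": "Trailer released"},
        {"attribute": "finance", "title": "Rate decision"},
    ]


device = DataProcessingDevice(preset_conditions={"stocks", "finance"})
device.acquire_and_store(fake_crawler)
device.extract_and_cache()
shown = device.display(device.screen("finance"))
print(shown)
```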
With the device for data processing provided by the embodiment of the present invention, target data is first extracted from to-be-processed data, the target data including a data attribute value; the extracted target data is cached in a preset favorites folder; in response to a data screening instruction, the target data in the preset favorites folder is screened according to its data attribute value; and the screened target data is then displayed. Compared with the prior art, in which the data to be screened is screened directly from the preset database, the embodiment of the present invention caches the target data extracted from the to-be-processed data in the preset favorites folder so as to reduce the volume of data to be screened, thereby reducing the time consumed by screening target data. Meanwhile, the data volume of the target data is inversely proportional to the accuracy of screening target data, that is, the smaller the data volume of the target data, the higher the screening accuracy; because the data volume of the target data in the preset favorites folder is small, the accuracy of screening target data is improved.
The device for data processing includes a processor and a memory. The extraction unit, cache unit, screening unit, display unit, and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided; the time consumed by data screening is reduced and the accuracy of data screening is improved by adjusting kernel parameters.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code initialized with the following method steps: extracting target data from to-be-processed data, wherein the target data includes a data attribute value; caching the target data in a preset favorites folder; in response to a data screening instruction, screening the target data in the preset favorites folder according to the data attribute value, to obtain screened target data; and displaying the screened target data.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, computing device include one or more processors (CPU), input/ Output interface, network interface and internal memory.
Memory potentially includes the volatile memory in computer-readable medium, random access memory The form such as device (RAM) and/or Nonvolatile memory, such as read-only storage (ROM) or flash memory (flash RAM).Memory is the example of computer-readable medium.
Computer-readable medium includes that permanent and non-permanent, removable and non-removable media can be with Information Store is realized by any method or technique.Information can be computer-readable instruction, data knot Structure, the module of program or other data.The example of the storage medium of computer includes, but are not limited to phase Become internal memory (PRAM), static RAM (SRAM), dynamic random access memory (DRAM), other kinds of random access memory (RAM), read-only storage (ROM), electricity can Erasable programmable read-only memory (EPROM) (EEPROM), fast flash memory bank or other memory techniques, read-only light Disk read-only storage (CD-ROM), digital versatile disc (DVD) or other optical storages, magnetic Cassette tape, the storage of tape magnetic rigid disk or other magnetic storage apparatus or any other non-transmission medium, Can be used to store the information that can be accessed by a computing device.Define according to herein, computer-readable Medium does not include temporary computer readable media (transitory media), the such as data-signal and load of modulation Ripple.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A data processing method, characterized by comprising:
extracting target data from pending data, wherein the target data includes a data attribute value;
caching the target data in a default collection;
in response to a data screening instruction, screening the target data in the default collection according to the data attribute value, to obtain screened target data; and
displaying the screened target data.
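The four steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `TargetData` class, the `source` attribute, the rule that only records with a title are extracted, and the use of an in-memory list as the default collection are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TargetData:
    """One extracted item; `attributes` holds its data attribute values."""
    title: str
    attributes: dict = field(default_factory=dict)


def extract(pending):
    """Extract target data from pending data (assumed rule: records with a title)."""
    return [
        TargetData(title=rec["title"], attributes={"source": rec.get("source", "")})
        for rec in pending
        if rec.get("title")
    ]


def screen(collection, attribute, wanted):
    """Screen the cached items by one data attribute value."""
    return [item for item in collection if item.attributes.get(attribute) == wanted]


def display(items):
    """Render screened items as lines of text (stand-in for a real UI)."""
    return [f"{item.title} ({item.attributes.get('source')})" for item in items]


pending_data = [
    {"title": "Report A", "source": "example.com"},
    {"title": "", "source": "example.org"},       # no title: not extracted
    {"title": "Report B", "source": "example.org"},
]

default_collection = []                            # hypothetical in-memory cache
default_collection.extend(extract(pending_data))   # extract, then cache

# A screening instruction arrives: keep items whose source is example.org.
screened = screen(default_collection, "source", "example.org")
lines = display(screened)
print("\n".join(lines))
```

Screening runs against the cached collection rather than the raw pending data, which matches the order of steps in the claim.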
2. The method according to claim 1, characterized in that screening the target data in the default collection according to the data attribute value comprises:
obtaining a website influence of a target website according to the data attribute value, wherein the target website is the source website of the target data, and the website influence is determined according to a home-location identifier of the target website, a ranking of the target website, and a degree of attention paid to the target website;
marking the target data in the default collection with a classification identifier by using the website influence; and
screening the target data in the default collection according to the classification identifier.
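As a rough illustration of claim 2, the sketch below derives a website influence score from the three named inputs, maps it to a classification identifier, and screens by that identifier. The claim does not specify a formula, so the weights, the thresholds, the `preferred_regions` parameter, and the labels `high`, `medium`, and `low` are all hypothetical.

```python
def website_influence(home_region, ranking, attention, *, preferred_regions=("CN",)):
    """Combine the three inputs into a score in [0, 1].

    The weights and the formula are illustrative assumptions only.
    `ranking` is expected to be >= 1, with 1 being the best rank.
    """
    region_score = 1.0 if home_region in preferred_regions else 0.5
    rank_score = 1.0 / ranking
    attention_score = min(attention / 1000.0, 1.0)
    return 0.2 * region_score + 0.4 * rank_score + 0.4 * attention_score


def classify(influence):
    """Map an influence score to a classification identifier (assumed thresholds)."""
    if influence >= 0.7:
        return "high"
    if influence >= 0.4:
        return "medium"
    return "low"


# Hypothetical per-website inputs, keyed by source website.
sites = {
    "a.example": {"home_region": "CN", "ranking": 1, "attention": 900},
    "b.example": {"home_region": "US", "ranking": 50, "attention": 100},
}

collection = [
    {"title": "Item 1", "source": "a.example"},
    {"title": "Item 2", "source": "b.example"},
]

# Mark every cached item with a classification identifier.
for item in collection:
    item["class"] = classify(website_influence(**sites[item["source"]]))

# Screen by the classification identifier.
high_only = [item for item in collection if item["class"] == "high"]
```

Because each item carries its classification identifier after this pass, the same identifier can later be used to group items for display.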
3. method according to claim 1 and 2, it is characterised in that according to the data attribute Value carries out screening to the target data in the default collection to be included:
If carrying out screening to the target data in the default collection according to the data attribute value Occur interrupting in journey, then interrupt interpolation data status indicator at corresponding interruption node described, so as to Continue to screen the target data in the default collection according to data mode mark.
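One way to picture the resumable screening of claim 3 is a loop that records its position when it is interrupted. In this sketch the data status identifier is modeled as an index stored in a `state` dictionary, and the interruption is simulated with Python's built-in `InterruptedError`; both choices are assumptions, since the claim does not say how the identifier is represented.

```python
def screen_with_resume(collection, predicate, state):
    """Screen `collection`, recording progress in `state` so work can resume.

    `state` holds "next_index" (the data status identifier in this sketch)
    and "kept" (the items accepted so far).
    """
    for index in range(state["next_index"], len(collection)):
        try:
            if predicate(collection[index]):
                state["kept"].append(collection[index])
        except InterruptedError:
            state["next_index"] = index   # mark the interruption node
            return False                  # screening did not finish
        state["next_index"] = index + 1
    return True


calls = {"count": 0}


def flaky_is_even(value):
    """Predicate that simulates one interruption on its third call."""
    calls["count"] += 1
    if calls["count"] == 3:
        raise InterruptedError("simulated interruption")
    return value % 2 == 0


data = [2, 4, 6, 7, 8]
state = {"next_index": 0, "kept": []}

finished_first = screen_with_resume(data, flaky_is_even, state)   # interrupted
interrupted_at = state["next_index"]
finished_second = screen_with_resume(data, flaky_is_even, state)  # resumes
```

The second call starts at the recorded index, so items screened before the interruption are not processed again.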
4. The method according to claim 3, characterized in that displaying the screened target data comprises:
displaying the screened target data according to the classification identifier of the target data.
5. The method according to claim 4, characterized in that, before the target data is extracted from the pending data, the method further comprises:
obtaining the pending data based on a crawler program, and storing the pending data in a preset database.
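The acquisition step of claim 5 might look like the following sketch, which stores crawled pages in a database before any extraction takes place. The claim names neither a crawler nor a database, so the in-memory SQLite database, the `pending_data` table, and the `fake_fetch` stand-in for a real HTTP request are all hypothetical.

```python
import sqlite3


def crawl(urls, fetch):
    """Yield (url, content) pairs; `fetch` is injected so no network is needed."""
    for url in urls:
        yield url, fetch(url)


def store(conn, pages):
    """Store crawled pages as pending data in the preset database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_data (url TEXT PRIMARY KEY, content TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pending_data (url, content) VALUES (?, ?)", pages
    )
    conn.commit()


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request."""
    return f"<html>content of {url}</html>"


conn = sqlite3.connect(":memory:")   # assumed preset database
store(conn, crawl(["https://a.example/1", "https://a.example/2"], fake_fetch))
rows = conn.execute("SELECT url FROM pending_data ORDER BY url").fetchall()
```

Passing `fetch` as a parameter keeps the crawl loop testable without network access; a real crawler would substitute an HTTP client here.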
6. A data processing device, characterized by comprising:
an extraction unit, configured to extract target data from pending data, wherein the target data includes a data attribute value;
a caching unit, configured to cache the target data extracted by the extraction unit in a default collection folder;
a screening unit, configured to, in response to a data screening instruction, screen the target data cached in the default collection by the caching unit according to the data attribute value, to obtain screened target data; and
a display unit, configured to display the target data screened by the screening unit.
7. The device according to claim 6, characterized in that the screening unit comprises:
an acquisition module, configured to obtain a website influence of a target website according to the data attribute value, wherein the target website is the source website of the target data, and the website influence is determined according to a home-location identifier of the target website, a ranking of the target website, and a degree of attention paid to the target website;
a classification module, configured to mark the target data in the default collection with a classification identifier by using the website influence obtained by the acquisition module; and
a screening module, configured to screen the target data in the default collection according to the classification identifier assigned by the classification module.
8. The device according to claim 6 or 7, characterized in that the screening unit comprises:
an adding module, configured to, when an interruption occurs while the target data in the default collection is being screened according to the data attribute value, add a data status identifier at the interruption node corresponding to the interruption, so that screening of the target data in the default collection can be continued according to the data status identifier.
9. The device according to claim 8, characterized in that the display unit is configured to display the screened target data according to the classification identifier of the target data in the screening unit.
10. The device according to claim 9, characterized in that the device further comprises:
an acquiring unit, configured to obtain the pending data based on a crawler program before the extraction unit extracts the target data from the pending data; and
a storage unit, configured to store the pending data in a preset database after the acquiring unit obtains the pending data.
CN201510767682.3A 2015-11-11 2015-11-11 Data processing method and device Active CN106682044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510767682.3A CN106682044B (en) 2015-11-11 2015-11-11 Data processing method and device

Publications (2)

Publication Number Publication Date
CN106682044A true CN106682044A (en) 2017-05-17
CN106682044B CN106682044B (en) 2021-01-15

Family

ID=58864867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510767682.3A Active CN106682044B (en) 2015-11-11 2015-11-11 Data processing method and device

Country Status (1)

Country Link
CN (1) CN106682044B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929985A (en) * 2012-10-18 2013-02-13 北京奇虎科技有限公司 Method and system for displaying collected webpage
CN103389984A (en) * 2012-05-08 2013-11-13 百度在线网络技术(北京)有限公司 Method and device for providing collection association information in search results
CN104965884A (en) * 2015-06-15 2015-10-07 广东欧珀移动通信有限公司 File collection method and related terminal
US20150287092A1 (en) * 2014-04-07 2015-10-08 Favored.By Social networking consumer product organization and presentation application

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665234A (en) * 2017-07-25 2018-02-06 平安科技(深圳)有限公司 Method for processing business, device, server and storage medium
CN107909483A (en) * 2017-07-25 2018-04-13 平安科技(深圳)有限公司 Flow of settling a claim recognition methods, device, server and storage medium
WO2019019621A1 (en) * 2017-07-25 2019-01-31 平安科技(深圳)有限公司 Service processing method, device, server and storage medium
CN107665234B (en) * 2017-07-25 2020-07-28 平安科技(深圳)有限公司 Service processing method, device, server and storage medium
CN107909483B (en) * 2017-07-25 2021-05-04 平安科技(深圳)有限公司 Claims settlement flow identification method, device, server and storage medium
CN107590641A (en) * 2017-08-18 2018-01-16 北京北信源软件股份有限公司 A kind of localization method of organization node, system, computer-readable recording medium and storage control
CN111796513A (en) * 2019-04-08 2020-10-20 阿里巴巴集团控股有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN106682044B (en) 2021-01-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant