CN104965929B - Data processing method and device


Publication number
CN104965929B
Authority
CN
China
Prior art keywords
data content
web page
content
page files
specific data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510441030.0A
Other languages
Chinese (zh)
Other versions
CN104965929A (en)
Inventor
张琦
刘锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Media Technology Beijing Co Ltd
Original Assignee
Netease Media Technology Beijing Co Ltd
Application filed by Netease Media Technology Beijing Co Ltd
Priority to CN201510441030.0A
Publication of CN104965929A
Application granted
Publication of CN104965929B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems

Abstract

Embodiments of the present invention provide a data processing method. The method comprises: reading web page files from a data source; searching within the web page files to generate specific data content; and exporting the specific data content. Because the web page files are searched first to generate the required specific data content, and only that content is then exported, the exported content is already-processed data content. The method therefore does not require exporting all of the data content in the data source and then processing it manually, which significantly improves the speed and efficiency of data processing and gives users a better experience. Embodiments of the present invention also provide a data processing device.

Description

Data processing method and device
Technical field
Embodiments of the present invention relate to the technical field of data processing, and more specifically to a data processing method and device.
Background
This section is intended to provide background or context for the embodiments of the present invention set forth in the claims. The description here is not admitted to be prior art merely because it is included in this section.
With the spread of Internet technology, many users have become accustomed to using network services or online spaces to record their lives, work, and so on; for example, a user may record daily life in a blog.
At the same time, users need to export, edit, and typeset the data content (such as text and images) that they have uploaded to the network, for example to compile blog content into a book. Some schemes for exporting network data content already exist in the prior art: for example, the address of a data source is read, all of the data content stored in that data source is exported, and the exported data content is then processed manually as required.
Summary of the invention
However, the prior-art data processing method requires all of the data content in the data source to be exported first, after which the user manually edits, filters, and typesets the exported content according to actual needs. When the user needs only a specific part of the data content and the amount of data content is very large, completing this editing work takes a great deal of time and labor. For example, if only the text in the data content is needed, the large non-text portion of the exported data content has to be deleted, so the speed and efficiency of the data processing are very low.
Processing network data after it has been exported is therefore a very laborious process in the prior art.
An improved data processing scheme is thus highly desirable, so as to improve the speed and efficiency of data processing.
In this context, embodiments of the present invention are intended to provide a data processing method and device.
In a first aspect of the embodiments of the present invention, a data processing method is provided, comprising: reading web page files from a data source; searching within the web page files to generate specific data content; and exporting the specific data content.
In a second aspect of the embodiments of the present invention, a data processing device is provided, comprising: a reading unit, configured to read web page files from a data source; a generation unit, configured to search within the web page files to generate specific data content; and an export unit, configured to export the specific data content.
The data processing method and device according to embodiments of the present invention read web page files from a data source, first search within the web page files to generate the required specific data content, and then export only the specific data content obtained. The exported content is already-processed data content, so there is no need to export all of the data content in the data source and then process it manually. This significantly improves the speed and efficiency of data processing and gives users a better experience.
Brief description of the drawings
The above and other objects, features, and advantages of the exemplary embodiments of the present invention will become easier to understand by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of example and not limitation, in which:
Fig. 1 schematically shows an application scenario in which embodiments of the present invention can be implemented;
Fig. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 3 schematically shows a structural diagram of a data processing device according to an embodiment of the present invention.
In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.
Detailed description
The principles and spirit of the present invention are described below with reference to several exemplary embodiments. It should be understood that these embodiments are provided only so that those skilled in the art can better understand and then implement the present invention, and not to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, device, apparatus, method, or computer program product. Accordingly, the present disclosure may take the form of entirely hardware, entirely software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
According to embodiments of the present invention, a data processing method and device are proposed.
In addition, any number of elements in the drawings is given by way of example and not limitation, and any naming is used only for distinction and carries no limiting meaning.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments.
Overview of the invention
The inventors found that, in the prior art, network data content can be obtained from a single data source in the network and exported, and the exported content can then be edited and typeset with text-editing software. However, when only a specific part of the data content is needed and the amount of data content is very large, completing the editing work takes a great deal of time and labor.
In view of the above problem, the basic idea of the present invention is as follows. Web page files are read from a data source, which may be a single data source or multiple data sources. The required specific data content is generated by searching within the web page files that were read, for example by searching for the text portion and/or the picture portion of the web page files. Only the specific data content thus obtained is then exported. The exported content is therefore already the processed data content the user needs, and there is no need to export all of the data content in the data source and then process it manually, which significantly improves the speed and efficiency of data processing and gives users a better experience.
Having introduced the basic principle of the present invention, various non-limiting embodiments of the invention are described in detail below.
Overview of application scenarios
Referring first to Fig. 1, Fig. 1 is a schematic framework diagram of an exemplary application scenario of embodiments of the present invention. A user interacts, through a client 102 on user equipment, with a server 101 that provides a data recording service. Those skilled in the art will understand that the framework shown in Fig. 1 is only one example in which embodiments of the present invention can be implemented. The scope of application of the embodiments is not limited by any aspect of this framework. For example, in another exemplary application scenario, the data recording service may be provided by the client 102 itself, and the user may interact only with the client 102 on the user equipment.
It should be noted that the user equipment here may be any user equipment, existing, under development, or developed in the future, on which the client 102 can interact with the server 101 through any form of wired and/or wireless connection (for example, Wi-Fi, LAN, cellular, coaxial cable, etc.), including but not limited to: smartphones, non-smart mobile phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, mainframe computers, and so on.
It should also be noted that the server 101 here is only one example of a device, existing, under development, or developed in the future, that can provide a data recording service to users. Embodiments of the present invention are not limited in this respect.
Based on the framework shown in Fig. 1, the client 102 can read web page files from a data source; the client 102 then searches within the web page files to generate specific data content; finally, the client 102 can export the specific data content.
It will be understood that, in the application scenarios of the present invention, although the actions of the embodiments are described here and below as being performed by the client 102, these actions may also be performed partly by the client 102 and partly by the server 101. The present invention is not limited with respect to the executing entity, as long as the actions disclosed in the embodiments are performed.
Exemplary method
A data processing method according to an exemplary embodiment of the present invention is described below with reference to Fig. 2, in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is shown only to facilitate understanding of the spirit and principles of the present invention, and embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention can be applied to any applicable scenario.
Referring to Fig. 2, a flowchart of one embodiment of the data processing method of the present invention is shown, which may for example include the following steps:
Step 201: read web page files from a data source.
The data source may be a data source in the network; for example, the blog address of a user is one data source. The data source may comprise a single data source or multiple different data sources; that is, in this embodiment web page files may be read from a single data source or simultaneously from multiple different data sources. Reading web page files from multiple different data sources at the same time and then performing the subsequent steps allows the specific data content in those data sources to be exported together, further improving the efficiency of data processing.
One data source may correspond to one or more web page files, and a web page file may be an HTML (HyperText Markup Language) file. For example, each article in a user's blog may be displayed in one web page, and each web page may correspond to one HTML file; reading the user's blog address therefore yields the multiple HTML files under that blog address.
A web page file may include information such as data content type tags, the data content itself, or the file address of the data content. A data content type tag identifies the type of the data content: for example, a text tag identifies the data content as text, and a picture tag identifies the data content as a picture. The same web page file may include multiple tags of the same or different types together with their corresponding data content, so reading a web page file yields all of the data content corresponding to it, including but not limited to text and pictures.
In some possible embodiments, before the web page files are read from the data source, setting information that includes a data source path may also be received.
Because the data source may be a single data source or multiple different data sources, setting information including the data source path can be received in advance, the data source can be located through the data source path, and the web page files can then be read from the data source.
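As a rough illustration of step 201, the sketch below reads every HTML file found under a list of data source paths. It assumes, purely for the example, that each data source is a local directory of saved pages; the function name `read_web_page_files` is hypothetical, and fetching pages from a live blog address is not shown.

```python
from pathlib import Path


def read_web_page_files(source_paths):
    """Return {file path: HTML text} for every .html file under each data source path.

    Assumption for this sketch: a data source is a local directory of saved pages.
    """
    pages = {}
    for source in source_paths:
        # Sort so the read order is stable across runs.
        for html_file in sorted(Path(source).glob("*.html")):
            pages[str(html_file)] = html_file.read_text(encoding="utf-8")
    return pages
```

Passing one path reads a single data source; passing several reads multiple data sources in one call, so their content can be processed and exported together.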
Step 202: search within the web page files to generate specific data content.
Reading the web page files yields all of the data content corresponding to them, but depending on user needs it may not be necessary to export all of it. The web page files are therefore searched to generate the required specific data content; for example, the specific data content may be text, pictures, a text summary, or a text summary together with a picture summary.
In some possible embodiments, depending on the specific data content, step 202 can be implemented in at least the following four ways:
In a first possible implementation of searching within the web page files to generate specific data content, the text tags included in the web page files are searched for, and the data content corresponding to the text tags is determined to be the specific data content.
For example, when plain-text data content is to be exported, that is, when the specific data content is text, text tags can be searched for in the web page files. In one possible implementation, the web page file directly includes the data content corresponding to a text tag; in another, the web page file includes only the file address of that data content, and the data content is obtained through the file address. Either way, the data content obtained by searching for the text tags in the web page files is text data content, and it is determined to be the specific data content. In addition, text tags may be searched for directly, or non-text tags may be searched for and filtered out, in which case the tags remaining in the web page file are the text tags.
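The first implementation can be sketched with Python's standard `html.parser` module. This minimal version follows the "filter out non-text tags" variant: it keeps every text node except those inside tags treated as non-text. The class name, the function name `extract_text`, and the choice of `script` and `style` as the non-text tags are assumptions made for the example, not details taken from the source.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping content inside non-text tags."""

    SKIPPED_TAGS = {"script", "style"}  # assumed set of non-text tags

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The case where the page holds only a file address for the text, rather than the text itself, would need an extra fetch step that is not shown here.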
In a second possible implementation of searching within the web page files to generate specific data content, the picture tags included in the web page files are searched for, and the data content corresponding to the picture tags is determined to be the specific data content.
For example, when all of the pictures are to be exported, that is, when the specific data content is pictures, picture tags can be searched for in the web page files. In one possible implementation, the web page file directly includes the data content corresponding to a picture tag; in another, the web page file includes only the file address of that data content, and the data content is obtained through the file address. Either way, the data content obtained by searching for the picture tags in the web page files is picture data content, and it is determined to be the specific data content. In addition, picture tags may be searched for directly, or non-picture tags may be searched for and filtered out, in which case the tags remaining in the web page file are the picture tags.
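For the second implementation, a comparable sketch collects the file address carried by each picture tag. It assumes that picture tags are HTML `img` elements and that the file address is the `src` attribute; `PictureExtractor` and `extract_picture_addresses` are hypothetical names, and downloading the picture data from those addresses is left out.

```python
from html.parser import HTMLParser


class PictureExtractor(HTMLParser):
    """Collect the src attribute of every <img> tag in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # ignore picture tags that carry no file address
                self.sources.append(src)


def extract_picture_addresses(html):
    """Return the list of picture file addresses found in the page."""
    parser = PictureExtractor()
    parser.feed(html)
    parser.close()
    return parser.sources
```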
In a third possible implementation of searching within the web page files to generate specific data content, the text tags included in the web page files are searched for, and the data content corresponding to the text tags is determined to be intermediate text content; summary content of a preset number of words is then generated from the intermediate text content, and the summary content is determined to be the specific data content.
In some cases the text needs to be exported in the form of a text summary. The web page files can first be searched to generate text as intermediate text content, and the intermediate text content is then used to generate summary content as the specific data content.
Searching within the web page files to generate text as intermediate text content is similar to the first possible implementation described above: text tags are searched for in the web page files; the web page file may directly include the data content corresponding to a text tag, or may include only its file address, through which the data content is obtained. The data content obtained in this way is text data content, and it is determined to be the intermediate text content.
After the intermediate text content is generated, a summary generation algorithm can be used to generate summary content of the preset number of words, and the summary content is determined to be the specific data content. For example, the intermediate text content generated by searching a web page file is condensed to within 100 words and used as the summary content corresponding to that web page file.
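The source does not name a particular summary generation algorithm, so the sketch below uses plain truncation as a stand-in: it normalizes whitespace and keeps the first `max_chars` characters. Counting the limit in characters rather than whitespace-separated words is an assumption, as is the function name `make_summary`.

```python
def make_summary(text, max_chars=100):
    """Shorten text to at most max_chars characters.

    This is naive truncation used as a placeholder, not a real summarizer.
    """
    collapsed = " ".join(text.split())  # normalize runs of whitespace
    if len(collapsed) <= max_chars:
        return collapsed
    return collapsed[:max_chars].rstrip()
```

A real system would more likely pick key sentences or use a dedicated summarization model; the only point here is the interface, intermediate text in and length-bounded summary out.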
In a fourth possible implementation of searching within the web page files to generate specific data content, the picture tags included in the web page files are searched for, and one picture tag is selected or designated among them as a specific picture tag; the data content corresponding to the specific picture tag is determined to be intermediate picture content; and the summary content and the intermediate picture content are determined to be the specific data content.
In some cases, text in the form of a text summary may be exported together with pictures in the form of a picture summary. The web page files are first searched to generate one picture as intermediate picture content, which is combined with the summary content of the text to form the specific data content. For generating the summary content, see the third possible implementation described above; the details are not repeated here. For the picture summary, the picture tags included in the web page file are searched for, and one of them is selected (for example, at random) or designated as the specific picture tag; the data content corresponding to the specific picture tag is determined to be the intermediate picture content, that is, the picture summary. The summary content and the intermediate picture content together then form the specific data content.
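A minimal sketch of the fourth implementation, assuming the text summary and the list of picture addresses have already been produced by earlier steps. The function names and the dictionary layout of the result are illustrative choices, not part of the source.

```python
import random


def choose_picture(picture_addresses, index=None, rng=random):
    """Return one picture address: the designated one if index is given, else a random one."""
    if not picture_addresses:
        return None  # the page has no pictures to summarize
    if index is not None:
        return picture_addresses[index]
    return rng.choice(picture_addresses)


def build_digest(summary_text, picture_addresses, index=None):
    """Combine a text summary and one chosen picture into the specific data content."""
    return {
        "summary": summary_text,
        "picture": choose_picture(picture_addresses, index),
    }
```

Passing `index` corresponds to designating a picture; omitting it corresponds to selecting one at random.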
Step 203: export the specific data content.
Because the specific data content is the result of filtering, editing, and otherwise processing the data content included in the web page files, it can be exported directly, and the exported content is exactly the required data content. No further editing is needed after export, which greatly improves the speed and efficiency of processing network data after export.
In some possible embodiments, exporting the specific data content may include:
exporting the specific data content directly to a local destination; or exporting the specific data content to a third-party data platform.
That is, the specific data content may be exported directly for local use, for example as a local Word document, or exported to a third-party data platform; for example, a user's different blogs may be aggregated and filtered and then exported to another data platform for storage.
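Local export can be sketched as writing the generated content to a file. The example below writes a UTF-8 plain-text file rather than a Word document, which would need a third-party library; `export_to_local` is a hypothetical name, and export to a third-party data platform is not shown because the source does not describe any platform's interface.

```python
from pathlib import Path


def export_to_local(contents, output_path):
    """Write each piece of specific data content to one UTF-8 text file.

    Pieces are separated by a blank line; missing parent folders are created.
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n\n".join(contents) + "\n", encoding="utf-8")
    return path
```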
In addition, some possible embodiments may further include:
reading the classification tags set for the web page files; and placing the specific data content exported from web page files that have the same classification tag in the same category.
Each web page file may also be given a classification tag. For example, if the data content of a web page file is a travel note, its classification tag may be set to "travel". During export, the specific data content exported from web page files with the same classification tag can be placed together.
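Grouping by classification tag amounts to bucketing exported items by a key. The sketch below assumes each exported item arrives as a `(tag, content)` pair; how the tag is stored in the web page file is not specified in the source, and `group_by_tag` is a hypothetical name.

```python
from collections import defaultdict


def group_by_tag(exported_items):
    """Group (classification tag, content) pairs into {tag: [content, ...]}.

    Content keeps its original order within each category.
    """
    groups = defaultdict(list)
    for tag, content in exported_items:
        groups[tag].append(content)
    return dict(groups)
```

For example, items tagged "travel" and "film review" end up in two separate lists that can each be exported as one chapter.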
In addition, some possible embodiments may further include:
searching for a start mark and an end mark in the web page files; and
inserting a separator after exporting the specific data content between the start mark and the end mark.
The content between the start mark and the end mark in a web page file is the required content. Inserting a separator after the exported specific data content between the start mark and the end mark allows the exported content to be divided automatically into paragraphs or pages. The start mark and the end mark can be set by the user.
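The start mark, end mark, and separator step can be sketched with plain string searches. The mark strings and the separator are supplied by the caller, in line with the text's statement that the user sets the marks; the form-feed default for the separator (a page break in plain text) and both function names are assumptions.

```python
def slice_between_marks(page_text, start_mark, end_mark):
    """Return the text between the first start mark and the next end mark, or None."""
    start = page_text.find(start_mark)
    if start == -1:
        return None
    start += len(start_mark)
    end = page_text.find(end_mark, start)
    if end == -1:
        return None
    return page_text[start:end].strip()


def join_with_separator(pages, start_mark, end_mark, separator="\f"):
    """Export the marked content of each page, inserting a separator after each one."""
    parts = []
    for page_text in pages:
        body = slice_between_marks(page_text, start_mark, end_mark)
        if body is not None:  # pages without both marks are skipped
            parts.append(body + separator)
    return "".join(parts)
```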
Some possible embodiments may further include: typesetting the exported specific data content according to a typesetting settings file.
For example, the typesetting settings file may specify the font size and typeface of the text, the size of pictures, and so on. In this embodiment the exported specific data content can then be typeset according to the typesetting settings file, further reducing the user's manual editing.
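One way to picture a typesetting settings file is a small JSON document that overrides default layout values, applied when each exported item is rendered. The JSON format, the key names (`font`, `font_size_pt`, `picture_width_px`), and the HTML output are all assumptions made for this sketch; the source only says that font, font size, and picture size can be set.

```python
import json
from html import escape

# Hypothetical defaults, used when the settings file omits a key.
DEFAULT_LAYOUT = {"font": "serif", "font_size_pt": 12, "picture_width_px": 400}


def load_layout(settings_json):
    """Merge user settings (a JSON object as text) over the defaults."""
    layout = dict(DEFAULT_LAYOUT)
    layout.update(json.loads(settings_json))
    return layout


def typeset(text, picture_address, layout):
    """Render one exported item as an HTML paragraph followed by a picture."""
    paragraph = (
        f'<p style="font-family: {layout["font"]}; '
        f'font-size: {layout["font_size_pt"]}pt">{escape(text)}</p>'
    )
    picture = f'<img src="{escape(picture_address)}" width="{layout["picture_width_px"]}">'
    return paragraph + "\n" + picture
```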
In this way, the data processing method according to embodiments of the present invention reads web page files from a data source, first searches within the web page files to generate the required specific data content, and then exports only the specific data content obtained. The exported content is already-processed data content, so there is no need to export all of the data content in the data source and then process it manually. This significantly improves the speed and efficiency of data processing and gives users a better experience.
The data processing method embodiment provided by the present invention is described further below in connection with a practical application.
For example, a user has two blog addresses, aaaa.blog.163.com and aaaa.blog.sina.com, with 100 articles saved in each blog, so the two blogs hold 200 articles in total. The user wishes to publish these 200 articles in a specified format, and can use the data processing method provided in the embodiments of the present invention to process the saved network data content.
First, the data sources are the two blog addresses: aaaa.blog.163.com and aaaa.blog.sina.com. Each article in a blog corresponds to one web page, and one web page is one web page file, so a total of 200 web page files can be read from the two data sources. It will be understood that each article in a blog may include text and/or pictures, so the data content pointed to by each web page file may include text and/or pictures.
Then, according to the user's specific needs, any of the following can be done: (1) the data content corresponding to the text tags in each web page file is exported as the specific data content, that is, the 200 articles are exported as plain text; (2) the data content corresponding to the picture tags in each web page file is exported as the specific data content, that is, the 200 articles are exported as pictures; (3) a summary is extracted from the data content corresponding to the text tags in each web page file, so that each article is condensed to summary content that is exported as the specific data content, that is, the 200 articles are each exported as a summary; (4) combining (2) and (3), a summary is extracted from the data content corresponding to the text tags in each web page file to serve as the text summary, the data content corresponding to one picture tag chosen in each web page file serves as the picture summary, and the text summary and picture summary are exported together, that is, each of the 200 articles is exported as a text summary plus a picture summary.
In addition, each web page file may have a classification tag, so the 200 articles may have different classification tags, and articles with the same classification tag can be placed together during export. For example, if the 200 articles comprise 50 film reviews and 150 travel notes, the 50 film reviews can be put together to form a film review chapter and the 150 travel notes can be put together to form a travel notes chapter. A separator can also be inserted after the specific data content exported from each article, so that during export each article starts a new paragraph or a new page for easier reading or publication. The typesetting format of the exported specific data content can also be set, for example the font size and typeface of the text and the size of pictures.
In this way, a number of web page files can be read from the blog address data sources, each web page file is searched to generate the required specific data content, and the specific data content generated from each web page file is then exported. The exported content is already-processed data content, and the exported specific data content can also be typeset automatically, which significantly improves the speed and efficiency of data processing.
Exemplary device
Having described the method of the exemplary embodiments of the present invention, a data processing device of an exemplary embodiment is described next with reference to Fig. 3.
Referring to Fig. 3, a structural diagram of one embodiment of the data processing device of the present invention is shown, which may for example include:
a reading unit 301, configured to read web page files from a data source;
a generation unit 302, configured to search within the web page files to generate specific data content;
an export unit 303, configured to export the specific data content.
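The three units map naturally onto three small objects composed into a pipeline. The sketch below is only one possible software arrangement, since the text allows hardware, software, or a combination. All class and method names are hypothetical; the data source is modeled as an in-memory mapping of page names to HTML text, and the export unit simply collects content in a list.

```python
class ReadingUnit:
    """Reads web page files from a data source (here: {page name: HTML text})."""

    def __init__(self, data_source):
        self.data_source = data_source

    def read(self):
        return list(self.data_source.values())


class GenerationUnit:
    """Searches each web page file to generate specific data content."""

    def __init__(self, search):
        self.search = search  # callable: HTML text -> specific data content

    def generate(self, pages):
        return [self.search(page) for page in pages]


class ExportUnit:
    """Exports specific data content (here: collects it in a list)."""

    def __init__(self):
        self.exported = []

    def export(self, contents):
        self.exported.extend(contents)
        return self.exported


class DataProcessingDevice:
    """Chains the reading, generation, and export units in that order."""

    def __init__(self, reading_unit, generation_unit, export_unit):
        self.reading_unit = reading_unit
        self.generation_unit = generation_unit
        self.export_unit = export_unit

    def run(self):
        pages = self.reading_unit.read()
        contents = self.generation_unit.generate(pages)
        return self.export_unit.export(contents)
```

Swapping the `search` callable changes which of the four generation modes the device applies without touching the reading or export units.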
In some possible embodiments, the generation unit 302 may include:
First searches subelement, for searching the text label for including in the web page files;
First determines subelement, for the corresponding data content of the text label to be determined as specific data content.
In some possible embodiments, the generation unit 302 may include:
Second searches subelement, for searching the picture tag for including in the web page files;
Second determines subelement, for the corresponding data content of the picture tag to be determined as specific data content.
In some possible embodiments, the generation unit 302 may include:
First searches subelement, for searching the text label for including in the web page files;
Third determines subelement, for the corresponding data content of the text label to be determined as midamble content;
4th determines subelement, will be described for generating the clip Text of default number of words according to the midamble content Clip Text is determined as specific data content.
In some possible embodiments, the generation unit 302 may include:
Second searches subelement, for searching the picture tag for including in the web page files;
Selected/specified subelement is used as particular picture for selecting one or specified one in the picture tag Label;
5th determines subelement, for the corresponding data content of the particular picture label to be determined as in intermediate picture Hold;
6th determines subelement, for clip Text and the intermediate picture content to be determined as specific data content.
In some possible embodiments, the data processing equipment provided in the embodiment of the present invention can also include:
Tag reader unit, for reading the tag along sort to web page files setting;
Taxon, for that will have the derived specific data content in the web page files of the identical tag along sort It is divided into same category.
In some possible embodiments, the data processing equipment provided in the embodiment of the present invention can also include:
Receiving unit, for receiving the setting information including data source path, the data source include individual data source or The multiple and different data source of person.
In some possible embodiments, the lead-out unit 303 can be specifically used for:
The specific data content is exported directly to local;Alternatively, the specific data content is exported to third party Data platform.
In some possible embodiments, the data processing device provided in the embodiments of the present invention may further include:
A searching unit, configured to search the web page file for a start identifier and an end identifier;
An inserting unit, configured to insert a separator after the specific data content between the start identifier and the end identifier has been exported.
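The behaviour of the searching and inserting units might look like the sketch below. It assumes that the identifiers are plain strings found by substring search and that the export target is a single output string; the function name `export_marked_regions`, the HTML-comment identifiers, and the default separator are hypothetical.

```python
def export_marked_regions(page_text, start_marker, end_marker, separator="\n-----\n"):
    """Export each region between the markers, followed by a separator."""
    exported = []
    position = 0
    while True:
        start = page_text.find(start_marker, position)
        if start == -1:
            break  # no further start identifier
        content_start = start + len(start_marker)
        end = page_text.find(end_marker, content_start)
        if end == -1:
            break  # unmatched start identifier: nothing more to export
        exported.append(page_text[content_start:end].strip())
        exported.append(separator)  # separator inserted after each exported region
        position = end + len(end_marker)
    return "".join(exported)


page = "header<!--BEGIN-->first item<!--END-->ads<!--BEGIN-->second item<!--END-->"
print(export_marked_regions(page, "<!--BEGIN-->", "<!--END-->"))
```

Inserting the separator right after each exported region keeps consecutive items distinguishable in the exported output.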
In some possible embodiments, the data processing device provided in the embodiments of the present invention may further include:
A typesetting unit, configured to typeset the exported specific data content according to a typesetting configuration file.
In this way, the data processing device according to the embodiments of the present invention can read web page files from a data source, first search within the web page files to generate the required specific data content, and then export only the specific data content so obtained. The exported content is already-processed data content, so there is no need to export all of the data content in the data source and then process it manually. This significantly improves the speed and efficiency of data processing and gives the user a better experience.
It should be noted that although several units or subunits of the data processing device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to the embodiments of the present invention, the features and functions of two or more of the units described above may be embodied in a single unit. Conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
In addition, although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the invention is not limited to the specific embodiments disclosed, and that the division into aspects does not mean that features in those aspects cannot be combined to advantage; that division is merely for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (8)

1. A data processing method, comprising:
Reading a web page file from a data source;
Searching in the web page file to generate specific data content, the specific data content being determined according to user requirements;
Exporting the specific data content;
Wherein the method further comprises:
Reading a classification tag set for the web page file;
Placing the specific data content exported from web page files having the same classification tag into the same category;
The method further comprises:
Searching the web page file for a start identifier and an end identifier, the start identifier and the end identifier being set by the user;
Inserting a separator after the specific data content between the start identifier and the end identifier has been exported;
Wherein searching in the web page file to generate specific data content comprises:
Searching for the text tags included in the web page file, and determining the data content corresponding to the text tags as intermediate text content;
Generating summary content of a preset number of words from the intermediate text content;
Searching for the picture tags included in the web page file, and selecting or designating one of the picture tags as a specific picture tag;
Determining the data content corresponding to the specific picture tag as intermediate picture content;
Determining the summary content and the intermediate picture content as the specific data content.
2. The method according to claim 1, further comprising, before reading the web page file from the data source:
Receiving setting information that includes a data source path, the data source comprising a single data source or multiple different data sources.
3. The method according to claim 1, wherein exporting the specific data content comprises:
Exporting the specific data content directly to local storage;
Or,
Exporting the specific data content to a third-party data platform.
4. The method according to claim 1, further comprising:
Typesetting the exported specific data content according to a typesetting configuration file.
5. A data processing device, comprising:
A reading unit, configured to read a web page file from a data source;
A generation unit, configured to search in the web page file to generate specific data content, the specific data content being determined according to user requirements;
An export unit, configured to export the specific data content;
Wherein the data processing device further comprises:
A tag reading unit, configured to read a classification tag set for the web page file;
A classification unit, configured to place the specific data content exported from web page files having the same classification tag into the same category;
A searching unit, configured to search the web page file for a start identifier and an end identifier, the start identifier and the end identifier being set by the user;
An inserting unit, configured to insert a separator after the specific data content between the start identifier and the end identifier has been exported;
The generation unit comprises:
A first searching subunit, configured to search for the text tags included in the web page file;
A third determining subunit, configured to determine the data content corresponding to the text tags as intermediate text content;
A fourth determining subunit, configured to generate summary content of a preset number of words from the intermediate text content;
A second searching subunit, configured to search for the picture tags included in the web page file;
A selecting/designating subunit, configured to select or designate one of the picture tags as a specific picture tag;
A fifth determining subunit, configured to determine the data content corresponding to the specific picture tag as intermediate picture content;
A sixth determining subunit, configured to determine the summary content and the intermediate picture content as the specific data content.
6. The device according to claim 5, further comprising:
A receiving unit, configured to receive setting information that includes a data source path, the data source comprising a single data source or multiple different data sources.
7. The device according to claim 5, wherein the export unit is specifically configured to:
Export the specific data content directly to local storage; or export the specific data content to a third-party data platform.
8. The device according to claim 5, further comprising:
A typesetting unit, configured to typeset the exported specific data content according to a typesetting configuration file.
CN201510441030.0A 2015-07-24 2015-07-24 A kind of data processing method and device Active CN104965929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510441030.0A CN104965929B (en) 2015-07-24 2015-07-24 A kind of data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510441030.0A CN104965929B (en) 2015-07-24 2015-07-24 A kind of data processing method and device

Publications (2)

Publication Number Publication Date
CN104965929A CN104965929A (en) 2015-10-07
CN104965929B true CN104965929B (en) 2019-07-02

Family

ID=54219968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510441030.0A Active CN104965929B (en) 2015-07-24 2015-07-24 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN104965929B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489543B (en) * 2019-08-14 2020-09-15 北京金堤科技有限公司 News abstract extraction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093468A (en) * 2006-06-19 2007-12-26 上海新纳广告传媒有限公司 Method for automatic timed updating data of weather forecast at terminal
CN101957866A (en) * 2010-10-25 2011-01-26 中国农业大学 Network text information integration method and device
CN102779169A (en) * 2012-06-27 2012-11-14 江苏新瑞峰信息科技有限公司 Extracting method and device for webpage content based on HTML (Hypertext Markup Language) label
CN102982144A (en) * 2012-11-22 2013-03-20 东莞宇龙通信科技有限公司 Method and system for sharing webpage information
CN103473285A (en) * 2013-08-29 2013-12-25 北京奇虎科技有限公司 Web information extraction method and device based on location markers

Also Published As

Publication number Publication date
CN104965929A (en) 2015-10-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant