CN107577683A - Information processing apparatus, information processing method and information processing device - Google Patents


Info

Publication number: CN107577683A
Authority: CN (China)
Legal status: Pending
Application number: CN201610523111.XA
Other languages: Chinese (zh)
Inventors: 张波, 孟遥, 孙俊
Current assignee: Fujitsu Ltd
Original assignee: Fujitsu Ltd
Application filed by Fujitsu Ltd, with priority to CN201610523111.XA

Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an information processing apparatus, an information processing method and an information processing device. The information processing apparatus includes: an initial extraction unit that, based on a first input from a user, extracts a portion of interest related to a particular attribute of an object of interest from a web page serving as an initial search result for the object of interest; an initial template generation unit that, based on a second input from the user, annotates the portion of interest, extracts attributes of the object of interest from the portion of interest, and performs training using the annotated portion of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and an extended template generation unit that uses the initial attribute context templates to extract, from the network, web page information related to the attributes of the object of interest so as to obtain extended search results, and generates, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.

Description

Information processing apparatus, information processing method and information processing device
Technical field
The present disclosure relates generally to the field of information processing, and in particular to an information processing apparatus, an information processing method and an information processing device capable of providing a customized information acquisition tool.
Background
As informatization deepens worldwide, all industries are gradually moving onto the Internet. Organizations and individuals alike need to obtain information and extract knowledge from information sources such as the Internet. A common way of acquiring information is to search with a search engine. However, directly browsing the large number of results returned by a search engine can be overly complex and time-consuming for the user. Moreover, even existing tools such as search engine crawlers, focused crawlers and visual crawlers only crawl information, and do not necessarily meet the user's actual needs.
It is therefore desirable to improve existing ways of acquiring information so as to meet users' actual needs.
Summary of the invention
A brief summary of the invention is given below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description given later.
In view of the shortcomings of the prior art, an object of the present invention is to provide an information processing apparatus, method and device capable of providing a customized information acquisition tool, so as to solve at least the existing problems.
According to one aspect of the disclosure, an information processing apparatus is provided, including: an initial extraction unit that, based on a first input from a user, extracts a portion of interest related to a particular attribute of an object of interest from a web page serving as an initial search result for the object of interest; an initial template generation unit that, based on a second input from the user, annotates the portion of interest, extracts attributes of the object of interest from the portion of interest, and performs training using the annotated portion of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and an extended template generation unit that uses the initial attribute context templates to extract, from the network, web page information related to the attributes of the object of interest so as to obtain extended search results, and generates, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.
According to another aspect of the disclosure, an information processing method is provided, including: extracting, based on a first input from a user, a portion of interest related to a particular attribute of an object of interest from a web page serving as an initial search result for the object of interest; annotating the portion of interest based on a second input from the user, extracting attributes of the object of interest from the portion of interest, and performing training using the annotated portion of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and using the initial attribute context templates to extract, from the network, web page information related to the attributes of the object of interest so as to obtain extended search results, and generating, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.
According to yet another aspect of the disclosure, an information processing device is provided, the device including a controller configured to: based on a first input from a user, extract a portion of interest related to a particular attribute of an object of interest from a web page serving as an initial search result for the object of interest; based on a second input from the user, annotate the portion of interest, extract attributes of the object of interest from the portion of interest, and perform training using the annotated portion of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and use the initial attribute context templates to extract, from the network, web page information related to the attributes of the object of interest so as to obtain extended search results, and generate, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.
According to a further aspect of the disclosure, a program that causes a computer to function as the information processing apparatus described above is also provided.
According to yet another aspect of the disclosure, a corresponding computer-readable storage medium is also provided, storing a computer program executable by a computing device; when executed, the computer program causes the computing device to perform the information processing method described above.
The aspects of the embodiments of the disclosure described above provide at least the following benefit: a customized information acquisition tool can be provided for different user needs. This is particularly useful for small-scale users such as small businesses and individuals.
These and other advantages of the disclosure will become more apparent from the following detailed description of preferred embodiments of the disclosure in conjunction with the accompanying drawings.
Brief description of the drawings
The disclosure can be better understood by reference to the description given below in conjunction with the accompanying drawings, in which the same or similar reference signs denote the same or similar parts throughout. The drawings, together with the detailed description below, are incorporated in and form part of this specification, and serve to further illustrate preferred embodiments of the disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a block diagram schematically showing an example structure of an information processing apparatus according to an embodiment of the disclosure.
Figs. 2A to 2C are explanatory diagrams illustrating examples of obtaining initial search results for an object of interest from a network.
Figs. 3 and 4 are explanatory diagrams illustrating example processing performed by the initial extraction unit of the information processing apparatus according to an embodiment of the disclosure.
Fig. 5 is an explanatory diagram illustrating example processing performed by the initial template generation unit of the information processing apparatus according to an embodiment of the disclosure.
Fig. 6 is an explanatory diagram illustrating example processing performed by the extended template generation unit of the information processing apparatus according to an embodiment of the disclosure.
Fig. 7 is a block diagram schematically showing another example structure of the information processing apparatus according to an embodiment of the disclosure.
Fig. 8 is a flowchart schematically showing an example flow of the information processing method according to an embodiment of the disclosure.
Fig. 9 is a flowchart schematically showing another example flow of the information processing method according to an embodiment of the disclosure.
Fig. 10 is a block diagram schematically showing an example structure of the information processing device according to an embodiment of the disclosure.
Fig. 11 is a structural diagram showing a possible hardware configuration that can be used to implement the information processing apparatus, method and device according to embodiments of the disclosure.
Detailed description of embodiments
Exemplary embodiments of the invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in the specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, and these constraints may vary from one implementation to another. Moreover, although such development work may be very complex and time-consuming, it would nevertheless be a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the invention with unnecessary detail, the drawings show only the apparatus structures and/or processing steps closely related to the solution of the invention, and omit other details of little relevance to it.
In the field of information processing, it is desirable to improve existing ways of acquiring information so as to meet the differing needs of different users. On this basis, the present disclosure proposes an information processing apparatus, method and device that can provide customized information acquisition tools for different user needs.
According to one aspect of the disclosure, an information processing apparatus is provided. Fig. 1 is a block diagram schematically showing an example structure of the information processing apparatus according to an embodiment of the disclosure.
As shown in Fig. 1, the information processing apparatus 10 includes: an initial extraction unit 101 that, based on a first input from a user, extracts a portion of interest related to a particular attribute of an object of interest from a web page serving as an initial search result for the object of interest; an initial template generation unit 102 that, based on a second input from the user, annotates the portion of interest, extracts attributes of the object of interest from the portion of interest, and performs training using the annotated portion of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and an extended template generation unit 103 that uses the initial attribute context templates to extract, from the network, web page information related to the attributes of the object of interest so as to obtain extended search results, and generates, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.
Initial search results related to the object of interest (hereinafter the "object of interest" may also be called an "entity") can be obtained from a network such as the Internet using various existing techniques.
Figs. 2A to 2C are explanatory diagrams illustrating examples of obtaining initial search results for an object of interest from a network in different ways.
Fig. 2A shows an example of obtaining information about the object of interest from a specific region of a particular web page designated by the user. This approach suits Deep Web pages, that is, pages whose links do not change but whose data does. For example, a web crawler can be used to select, as the initial search result, all of the original list regions in the page of the "Division of Mathematics and Physics, Chinese Academy of Sciences" shown in Fig. 2A, which contains the name "XX" of the object of interest.
Fig. 2 B show the perpetual object keyword provided using specific META Search Engine according to user to scan for Example.In Fig. 2 B example, entity key " Camry " is inputted in " family of automobile " META Search Engine, to obtain just Walk search result.
Fig. 2 C show the example by way of carrying out the whole network traversal to scan for after sub-pages are obtained. In Fig. 2 C example, after the sub-pages of " family of my automobile " are obtained, arranged according to specified seed URL distances Sequence, by way of breadth first traversal, obtain all theme models of " family of my automobile " website, as with concern pair As " family of my automobile " relevant initial search result.
As Figs. 2A to 2C show, the initial search results obtained from the network in these existing ways can be enormous in number and cluttered in content, and the user may need to spend considerable time and effort to find the content that is actually of interest.
The inventors recognized the above problems of the prior art and propose the information processing apparatus of the embodiments of the disclosure. With the information processing apparatus 10 of this embodiment, starting from initial search results obtained in existing ways, customized extended search results and, in turn, customized extended attribute context templates can be obtained based on the user's input, thereby yielding a customized information acquisition tool.
Example processing performed by each component unit of the information processing apparatus 10 of this embodiment is further described below with reference to Figs. 3 to 5.
Consider the following example: a user wishes to obtain from the network the personal profile of XX, an academician of the Chinese Academy of Sciences, as the object of interest, including attributes such as the academician's date of birth, birthplace and occupation. First, an initial search result containing the personal profile of academician XX is obtained from the Chinese Academy of Sciences web page shown in Fig. 2A, as shown in Fig. 3.
With the information processing apparatus 10 of this embodiment, given an initial search result such as that shown in Fig. 3, once the user provides a first input, the initial extraction unit 101 extracts from the web page the portion of interest related to a particular attribute of the object of interest. In this example, the particular attributes may include, for example, the date of birth, birthplace and occupation of XX; the portion of interest may include XX's name, photo, and a personal profile containing the above attributes.
In a preferred embodiment, the user's first input to the initial extraction unit 101 described here and/or the user's second input to the initial template generation unit 102 described later may be provided through a visual interface. Input of this kind lets users customize their information acquisition tool in a user-friendly way. For ease of explanation, visual user input is used as the example below. However, those skilled in the art will understand that the information processing apparatus of the embodiments of the disclosure is suitable for various types of user input, such as audio input, which are not described further here.
For ease of explanation, Fig. 4 shows an example processing interface that can be provided by the initial extraction unit 101 of the information processing apparatus according to an embodiment of the invention, in which the left side shows the web page serving as the initial search result and the right side optionally shows information about the extracted portion of interest.
As shown on the left side of the interface in Fig. 4, based on the user's first input the initial extraction unit 101 has selected three items from the web page, the name, the profile and the photo, as the portion of interest for XX. For example, the user can provide the first input by selecting the corresponding parts of the web page on the left side of the interface in Fig. 4 with the cursor.
In a preferred example, the user can provide labels for the extracted portion of interest, namely the words "name", "profile" and "photo" superimposed on the web page on the left side of the interface in Fig. 4. Accordingly, in this preferred example, the right side of the interface in Fig. 4 shows information about the extracted portion of interest, including the user's label for each item (for example "name") and the type of its content (for example "text"). Those skilled in the art will understand that the labels displayed on the left side and the related information displayed on the right side of the interface in Fig. 4 are optional rather than necessary; they are merely a preferred example intended to help the user identify the extracted portion of interest.
Fig. 5 shows an example processing interface for annotation using the initial template generation unit 102. For an extracted portion of interest such as the personal profile, the initial template generation unit 102 performs annotation and attribute extraction based on the user's second input. As shown on the left side of the interface in Fig. 5, with the second input the user may select "astronomer" from the personal profile and label it "occupation"; select "November 23, 1923" and label it "birthday"; and select "Fuzhou, Fujian" and label it "birthplace". For example, the user can provide the second input by selecting the corresponding parts of the web page in the interface of Fig. 5 with the cursor and typing the corresponding labels on the keyboard.
In a preferred example, in the processing interface of Fig. 5 used by the initial template generation unit 102 for annotation, the upper right part shows the attributes of the object of interest extracted from the portion of interest based on the user's second input, together with the corresponding labels, and the lower right part shows the corresponding information items. Those skilled in the art will understand that the display on the right side of Fig. 5 is optional rather than necessary; it is merely a preferred example intended to help the user identify the extracted attributes and labels of the object of interest.
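One possible way to record the user's second input is sketched below. It assumes that each selection is stored as a labelled character span within the portion of interest. The `Annotation` type and the `annotate` function are hypothetical names used only for illustration, and the sample profile is an English paraphrase of the Fig. 5 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    label: str  # attribute name typed by the user, e.g. "occupation"
    value: str  # text span selected by the user
    start: int  # character offset of the span in the portion of interest
    end: int    # exclusive end offset


def annotate(text, selections):
    """Turn (selected_text, label) pairs into offset-based annotations.

    Uses the first occurrence of each selection; raises ValueError if a
    selection does not occur in the text.
    """
    result = []
    for value, label in selections:
        start = text.find(value)
        if start < 0:
            raise ValueError(f"selection not found: {value!r}")
        result.append(Annotation(label, value, start, start + len(value)))
    return result


profile = "XX, astronomer, born on 23 November 1923 in Fuzhou, Fujian."
for a in annotate(profile, [("astronomer", "occupation"),
                            ("23 November 1923", "birthday"),
                            ("Fuzhou, Fujian", "birthplace")]):
    print(a.label, repr(a.value), a.start, a.end)
```

A real interface would more likely receive the offsets directly from the cursor selection. Searching for the first occurrence is only a convenience for this sketch.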
Using the annotated portion of interest, for example as shown in Fig. 5, as a training corpus, the initial template generation unit 102 can perform training to generate initial attribute context templates related to the context of the attributes of the object of interest. For example, one way to generate the initial attribute context templates is to extract the words surrounding an attribute of the object of interest directly from the portion of interest, and to use the most frequent of the extracted words as the keyword of the initial attribute context template. Alternatively, the initial template generation unit 102 can be trained on the annotated portion of interest using any of various suitable machine learning methods, which are not described in detail here.
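The frequency-based approach just described can be sketched as follows: collect the words around each annotated attribute value, then keep the most frequent one as the template keyword. This is an illustrative assumption, not the disclosed implementation. The window size, the stopword list and the name `learn_context_keyword` are hypothetical. The regular-expression tokenizer only suits space-delimited text, so Chinese pages would first need word segmentation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-cased word tokens (space-delimited languages only)."""
    return re.findall(r"\w+", text.lower())


def learn_context_keyword(examples, window=2, stopwords=frozenset()):
    """Pick the most frequent word found near annotated attribute values.

    `examples` is a list of (text, attribute_value) pairs for one attribute.
    Up to `window` words on each side of every occurrence are counted.
    Returns None if no value occurs in its text.
    """
    counts = Counter()
    for text, value in examples:
        words, target = tokenize(text), tokenize(value)
        n = len(target)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target:
                context = words[max(0, i - window):i] + words[i + n:i + n + window]
                counts.update(w for w in context if w not in stopwords)
    return counts.most_common(1)[0][0] if counts else None


examples = [("XX was born in 1923 in Fuzhou", "1923"),
            ("YY was born in 1931 in Beijing", "1931")]
print(learn_context_keyword(examples, stopwords={"in"}))  # born
```

Without the stopword list, the function word "in" would win the count, which is why some filtering of very common words is usually needed.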
By way of example and not limitation, the generated initial attribute context templates may include: a template {XX <place name> native} related to the birthplace attribute, containing the object of interest "XX" (for example "Deng Jiaxian") and the attribute context word "native"; a template {XX <time> born} related to the birthday attribute, containing the object of interest "XX" and the attribute context word "born"; and so on.
Using the one or more initial attribute context templates generated by the initial template generation unit 102, the extended template generation unit 103 can extract web page information related to the attributes of the object of interest from the network to obtain extended search results, and generate, based on the extended search results, extended attribute context templates related to the context of the attributes of the object of interest.
It will be appreciated that, for an attribute covered by the initial attribute context templates, the extended search results obtained by the extended template generation unit 103 may contain contexts of that attribute that differ from those in the portion of interest. Based on such extended search results, the extended template generation unit 103 can therefore obtain extended attribute context templates that differ from the initial attribute context templates, for example by using various suitable machine learning methods or by directly extracting the frequent words around the attribute; this is not described in detail here.
Moreover, the extended search results obtained using the initial attribute context templates may also contain other attributes of the object of interest that are not covered by the initial attribute context templates.
As an example, Fig. 6 shows web page information retrieved from the network using the keywords "XX" + "native" of the initial attribute context template {XX <place name> native}, which relates to the birthplace attribute of XX. For simplicity, only part of the search results is shown, and the specific page contents are merely illustrative. As shown in the example of Fig. 6, among the web pages obtained with the birthplace template, the 1st, 4th, 5th and 7th items additionally contain a graduation school attribute: "Party School of the Provincial Party Committee" in the 1st item, "Party School of the Guangdong Provincial Party Committee" in the 4th, "Party School of the CPC Central Committee" in the 5th, and "National Central University" in the 7th.
Accordingly, the extended template generation unit 103 can, based on further input from the user (for example a third input), extract from the above web page information the graduation school attribute that the initial attribute context templates do not cover (for example "National Central University" in the 7th item), label it "graduation school", and then obtain through training an extended attribute context template related to the context of the graduation school, such as {XX graduated from <noun>}.
That is, for attributes not covered by the initial attribute context templates, the extended template generation unit 103 can generate extended attribute context templates in the same way that the initial template generation unit 102 generates the initial ones: it treats the extended search results as a new "portion of interest" and performs annotation and attribute extraction based on user input (for example the user's third input), thereby obtaining extended attribute context templates.
Thus, with the information processing apparatus of this embodiment, starting from initial search results obtained with existing techniques, customized extended search results and, in turn, customized extended attribute context templates can be obtained based on the user's input.
In a preferred embodiment, the extended template generation unit 103 can extract from a web page on the network the text node containing a keyword of the initial attribute context template, together with the sibling nodes of that text node, as the extracted web page information.
With this preferred configuration, the extended template generation unit 103 extracts the parts of a web page that are closely related to the keywords of the initial attribute context templates and filters out the irrelevant parts.
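The node-plus-siblings extraction can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes the page is well-formed XHTML; real-world HTML would need a tolerant parser. It approximates "the text node and its siblings" as the first element whose own text contains the keyword, together with the other children of that element's parent. The name `keyword_node_with_siblings` is hypothetical.

```python
import xml.etree.ElementTree as ET


def keyword_node_with_siblings(xhtml, keyword):
    """Return the text of the element containing `keyword` and of its siblings.

    ElementTree has no parent pointers, so a child -> parent map is built
    first. Returns [] if no element's own text contains the keyword.
    """
    root = ET.fromstring(xhtml)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for elem in root.iter():
        if elem.text and keyword in elem.text:
            parent = parent_of.get(elem)
            group = list(parent) if parent is not None else [elem]
            return ["".join(e.itertext()).strip() for e in group]
    return []


page = ("<div><ul><li>XX, a native of Fuzhou</li>"
        "<li>XX graduated from National Central University</li></ul>"
        "<p>Unrelated footer</p></div>")
print(keyword_node_with_siblings(page, "native"))
```

In this example the footer paragraph is not a sibling of the matching list item, so it is filtered out.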
In a preferred embodiment, the extended template generation unit 103 can take as the extended search results those items of the extracted web page information whose similarity to the portion of interest extracted by the initial extraction unit 101 is greater than a predetermined threshold.
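A minimal sketch of the threshold filter follows. Jaccard similarity over word sets is used here only as a stand-in similarity function. The similarity measures of this embodiment are the content similarity and structural similarity described in the following paragraphs. The names `jaccard` and `filter_by_similarity` are hypothetical.

```python
import re


def jaccard(a, b):
    """Jaccard similarity of the lower-cased word sets of two strings."""
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_by_similarity(pages, reference, threshold, similarity=jaccard):
    """Keep pages whose similarity to the reference exceeds the threshold."""
    return [p for p in pages if similarity(p, reference) > threshold]


reference = "born astronomer graduated university"
pages = ["astronomer born 1923 graduated university",  # similarity 0.8
         "homepage of a web user"]                     # similarity 0.0
print(filter_by_similarity(pages, reference, 0.5))
```

Passing the similarity function as a parameter lets a content-based or structure-based measure replace Jaccard without changing the filter.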
For example, refer again to the example of Fig. 6. Among the web page information retrieved from the network using the keywords "XX" + "native" of the initial attribute context template {XX <place name> native}, the 1st, 4th and 5th items are personal profiles of other people who share the scientist's name, while the 2nd and 3rd items are home pages of network users. None of these is a personal profile of the scientist like the one shown in Figs. 3 and 4.
In such a case, when the user wants to obtain from the network extended search results of the same kind as the portion of interest, and to generate extended attribute context templates from them, the preferred configuration in which the extended template generation unit 103 takes similarity into account filters out the unwanted parts of the web page information. For example, the 1st to 5th items in the example of Fig. 6 can be filtered out, keeping the 6th and 7th items, which, like the portion of interest, concern the personal profile of the scientist XX.
Accordingly, the extended search results supplied to the extended template generation unit 103 contain more information sources of the same kind as the portion of interest, and extended attribute context templates can be generated from these sources, which helps produce more accurate and/or more comprehensive extended attribute context templates.
In a preferred embodiment, the similarity considered by the extended template generation unit 103 may include content similarity and structural similarity.
In a preferred embodiment, the content similarity may include the similarity between the field contents of the portion of interest and the field contents in the web page information.
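One possible reading of field-content similarity is sketched below. For each field of the portion of interest, it computes the share of that field's keywords that also occur in the same field of the candidate web page information. The resulting per-node co-occurrence ratio could then be compared against a threshold such as the 80% example mentioned in this description. Representing fields as a name-to-text dictionary, and the function names, are assumptions for illustration.

```python
import re


def keywords(text):
    """Lower-cased word set of a field's text."""
    return set(re.findall(r"\w+", text.lower()))


def field_cooccurrence(ref_fields, cand_fields):
    """Per-field share of reference keywords that also occur in the candidate.

    Both arguments map field names to text. A field missing from the
    candidate counts as empty; an empty reference field scores 1.0.
    """
    ratios = {}
    for name, ref_text in ref_fields.items():
        ref_kw = keywords(ref_text)
        cand_kw = keywords(cand_fields.get(name, ""))
        ratios[name] = len(ref_kw & cand_kw) / len(ref_kw) if ref_kw else 1.0
    return ratios


ref = {"title": "Academician profile",
       "intro": "astronomer born graduated university academician"}
cand = {"title": "Academician profile",
        "intro": "physicist born graduated university academician"}
print(field_cooccurrence(ref, cand))  # {'title': 1.0, 'intro': 0.8}
```

The measure is asymmetric by design: it asks how much of the reference field is covered by the candidate, not the reverse.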
In a preferred embodiment, the structural similarity may include at least one of the following: CSS class-name similarity, element-name similarity and field-length similarity. In other words, the extended template generation unit 103 can compare the CSS class names, element names and field lengths in the web page information with those of the portion of interest extracted by the initial extraction unit 101, and determine whether the similarity of each of these items is greater than a predetermined threshold.
The predetermined similarity thresholds used by the extended template generation unit 103 can be set appropriately according to design factors such as performance requirements, computational cost and system configuration. Purely by way of example and not limitation, the threshold for field-content similarity can be set as a keyword co-occurrence ratio of 80% for each node; the threshold for CSS class-name or element-name similarity can be set as an overlap of 50% between the CSS class names or element names; the threshold for field-length similarity can be set as a field-length ratio of 80%; and so on.
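The structural check with the example thresholds above can be sketched as follows: 50% overlap of CSS class names and of element names, and an 80% field-length ratio. Several choices here are assumptions made for illustration. "Overlap" is measured as Jaccard similarity, length similarity is measured as the shorter length over the longer length, and all three measures must pass. The dictionary keys and function names are hypothetical.

```python
def overlap(a, b):
    """Jaccard overlap between two collections of names (1.0 if both empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def length_ratio(x, y):
    """Shorter length divided by longer length (1.0 if both are zero)."""
    longest = max(x, y)
    return min(x, y) / longest if longest else 1.0


def structurally_similar(ref, cand, css_min=0.5, tag_min=0.5, len_min=0.8):
    """Return (passes, scores) comparing CSS classes, tags and field length."""
    scores = {
        "css": overlap(ref["css"], cand["css"]),
        "tags": overlap(ref["tags"], cand["tags"]),
        "length": length_ratio(ref["length"], cand["length"]),
    }
    passes = (scores["css"] >= css_min and scores["tags"] >= tag_min
              and scores["length"] >= len_min)
    return passes, scores


ref = {"css": ["profile", "intro", "photo"], "tags": ["div", "p", "img"], "length": 200}
cand = {"css": ["profile", "intro", "footer"], "tags": ["div", "p", "img"], "length": 180}
print(structurally_similar(ref, cand))
```

Returning the individual scores alongside the decision makes it easier to tune each threshold against the performance and cost trade-offs mentioned above.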
Another example structure of the information processing apparatus according to an embodiment of the disclosure is described below with reference to Fig. 7, which is a block diagram schematically showing that structure.
As shown in Fig. 7, in addition to an initial extraction unit 701, an initial template generation unit 702 and an extended template generation unit 703, which are similar to the units 101 to 103 of Fig. 1 respectively, the information processing apparatus 70 further includes an extended extraction unit 704 for extracting extended attributes of the object of interest from the network using the extended attribute context templates.
As described above with reference to Fig. 6, the extended template generation unit 703 can obtain an extended attribute context template such as {XX graduated from <noun>}, and the extended extraction unit 704 can use such a template to extract an extended attribute of XX from the network, namely the graduation school.
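Applying an extended attribute context template such as {XX graduated from <noun>} can be approximated with a regular expression, as in the sketch below. It assumes English text, keeps matches within a single sentence, and approximates the <noun> slot as a run of capitalized words that ends before the next lowercase word or punctuation mark. `apply_template` is a hypothetical helper, not part of the disclosure.

```python
import re


def apply_template(entity, trigger, text):
    """Extract slot values for a template like {<entity> <trigger> <noun>}.

    The entity and trigger must occur in the same sentence (no '.' between
    them); the slot is a run of capitalized words after the trigger.
    """
    pattern = re.compile(
        re.escape(entity) + r"\b[^.]*?\b" + re.escape(trigger)
        + r"\s+((?:[A-Z][\w-]*\s?)+)"
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]


text = "XX graduated from National Central University in 1926. YY studied abroad."
print(apply_template("XX", "graduated from", text))  # ['National Central University']
```

A production system would more likely fill the <noun> slot with a part-of-speech tagger or named-entity recognizer. The capitalization heuristic only keeps the sketch self-contained.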
With the information processing apparatus 70 of this embodiment, in addition to the attributes extracted from the part of interest by the initial template generation unit 102, the user can also use the extended attributes extracted by the extension extraction unit 704 to build a customized knowledge base about the object of interest.
In a preferred embodiment, when an extended attribute extracted by the extension extraction unit 704 is inconsistent with the attributes extracted by the initial template generation unit 102, the extension extraction unit 704 may present the extended attribute to the user.
For example, suppose that the expansion template generation unit 703 uses the extended attribute context template {XX graduated from <noun>} to learn from the network that XX graduated from "A University". This may contradict the graduated institution "B University" that the initial template generation unit 102 extracted from the part of interest, or the attributes extracted from the part of interest may not include a graduated institution at all. (In both cases, the extended attribute can be regarded as inconsistent with the attributes extracted by the initial template generation unit 102.) In that case, the extension extraction unit 704 may present the extended attribute to the user.
The user can then decide whether, in the knowledge base about the object of interest that the user builds, the extended attribute should replace the initial attribute extracted from the part of interest (the former case) or should be added as a supplement to the initial attributes (the latter case), and update the knowledge base accordingly.
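The comparison and the user's replace-or-supplement decision might look roughly as follows. The dictionary-based knowledge base and the function names are hypothetical; the sketch only classifies each extended attribute as consistent, conflicting or missing, and leaves the actual decision to the user through the `accept` flag.

```python
from typing import Dict, List, Tuple

KnowledgeBase = Dict[str, str]  # attribute name -> value (hypothetical layout)


def classify(kb: KnowledgeBase, attribute: str, value: str) -> str:
    """Compare one extended attribute with the initially extracted ones."""
    if attribute not in kb:
        return "missing"      # initial attributes lack it: candidate supplement
    if kb[attribute] != value:
        return "conflict"     # contradicts the initial value: candidate replacement
    return "consistent"


def to_present(kb: KnowledgeBase,
               extended: KnowledgeBase) -> List[Tuple[str, str, str]]:
    """List (attribute, status, extended value) triples the user should review."""
    review = []
    for attribute, value in extended.items():
        status = classify(kb, attribute, value)
        if status != "consistent":
            review.append((attribute, status, value))
    return review


def apply_decision(kb: KnowledgeBase, attribute: str, value: str,
                   accept: bool) -> KnowledgeBase:
    """Return an updated copy: replaces on conflict, supplements when missing."""
    updated = dict(kb)
    if accept:
        updated[attribute] = value
    return updated
```

Returning a copy rather than mutating the knowledge base keeps the user's rejected suggestions from leaking into it.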
The information processing apparatus according to embodiments of the present disclosure, its constituent units and the related processing have been described above with reference to Figs. 1 to 7. With the information processing apparatus according to embodiments of the present disclosure, customized information acquisition tools can be provided for different user requirements.
According to another aspect of the present disclosure, an information processing method is provided. Fig. 8 is a flowchart schematically showing an example flow of the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 8, the information processing method 80 may include: an initial extraction step S801 of extracting, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest; an initial template generation step S803 of annotating the part of interest and extracting attributes of the object of interest from the part of interest based on a second input from the user, and performing training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and an expansion template generation step S805 of extracting, using the initial attribute context templates, web page information related to the attributes of the object of interest from the network to obtain expanded search results, and generating, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
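The description does not specify how the annotated part of interest is used for training. As one simple possibility, and explicitly as a substitute technique, initial attribute context templates could be induced by counting the words that precede each annotated attribute value. In this sketch the subject mention is replaced by a `{subject}` placeholder and the value by `{value}`, and templates seen fewer than `min_support` times are dropped; the window size and placeholder syntax are assumptions.

```python
from collections import Counter
from typing import List, Tuple

# (sentence, subject mention, annotated attribute value), as a user might label them.
Example = Tuple[str, str, str]


def induce_templates(examples: List[Example], window: int = 3,
                     min_support: int = 1) -> List[str]:
    """Count left-context patterns around annotated values (a frequency-based
    stand-in for the unspecified training step)."""
    counts: Counter = Counter()
    for sentence, subject, value in examples:
        start = sentence.find(value)
        if start < 0:
            continue  # annotated value not found in the sentence; skip it
        left = sentence[:start].replace(subject, "{subject}")
        context = left.split()[-window:]  # last `window` tokens before the value
        counts[" ".join(context + ["{value}"])] += 1
    # Most frequent templates first; rare ones are filtered out.
    return [t for t, n in counts.most_common() if n >= min_support]
```

A learned sequence labeller could replace the counting step without changing the inputs (annotated sentences) or the output (a ranked list of templates).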
In a preferred embodiment, the first input and/or the second input are inputs made by the user in a visual manner.
In a preferred embodiment, the text nodes in which keywords of the initial attribute context templates occur, together with the sibling nodes of those text nodes, are extracted from web pages on the network as the extracted web page information.
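A sketch of the node-level extraction, assuming well-formed markup that Python's `xml.etree.ElementTree` can parse (real web pages usually need a lenient HTML parser instead). It returns each element whose own text contains the template keyword, together with the text of its sibling elements; the function name and return shape are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def extract_with_siblings(markup: str,
                          keyword: str) -> List[Tuple[str, List[str]]]:
    """Return (node text, sibling texts) for each element whose own text
    contains the keyword. The root element has no siblings and is skipped."""
    root = ET.fromstring(markup)
    results = []
    for parent in root.iter():
        children = list(parent)
        for child in children:
            own_text = (child.text or "").strip()
            if keyword in own_text:
                # Full text of every other child of the same parent.
                siblings = ["".join(s.itertext()).strip()
                            for s in children if s is not child]
                results.append((own_text, siblings))
    return results
```

On a row such as `<tr><td>Graduated from</td><td>B University</td></tr>`, the keyword "Graduated from" selects the first cell and its sibling yields the candidate value "B University".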
In a preferred embodiment, among the extracted web page information, the web page information whose similarity to the extracted part of interest exceeds a predetermined threshold is used as the expanded search results.
In a preferred embodiment, the similarity includes content similarity and structural similarity.
In a preferred embodiment, the structural similarity includes at least one of the following: CSS class name similarity, element name similarity, and field length similarity.
The above information processing method 80 and its steps can implement the processing performed by the information processing apparatus 10 and its constituent units described above with reference to Figs. 1 to 6, and achieve similar effects, which are not repeated here.
Another example flow of the information processing method according to an embodiment of the present disclosure is described below with reference to Fig. 9. Fig. 9 is a flowchart schematically showing another example flow of the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 9, in addition to an initial extraction step S901, an initial template generation step S903 and an expansion template generation step S905, which are similar to the steps S801 to S805 in Fig. 8 respectively, the information processing method 90 further includes an extension extraction step S907 of extracting extended attributes of the object of interest from the network using the extended attribute context templates.
In a preferred embodiment, when an extended attribute extracted from the network is inconsistent with the attributes extracted from the part of interest, the extracted extended attribute is presented to the user.
The above information processing method 90 and its steps can implement the processing performed by the information processing apparatus 70 and its constituent units described above with reference to Fig. 7, and achieve similar effects, which are not repeated here.
According to yet another aspect of the present disclosure, an information processing device is provided. Fig. 10 is a block diagram schematically showing an exemplary configuration of the information processing device according to an embodiment of the present disclosure.
As shown in Fig. 10, the information processing device 100 may include a controller 1001. The controller 1001 may be configured to: extract, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest; annotate the part of interest and extract attributes of the object of interest from the part of interest based on a second input from the user, and perform training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and extract, using the initial attribute context templates, web page information related to the attributes of the object of interest from the network to obtain expanded search results, and generate, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
The information processing device 100 may be implemented by any dedicated hardware, a dedicated computer or a general-purpose personal computer, and the controller 1001 may be implemented by any suitable device such as a central processing unit (CPU), a processor or an application-specific integrated circuit.
The information processing device 100 can implement the processing performed by the information processing apparatuses 10 and 70 and their constituent units described above with reference to Figs. 1 to 7, and achieve corresponding effects, which are not repeated here.
Fig. 11 is a block diagram showing a possible hardware configuration that can be used to implement the information processing apparatus, method and device according to embodiments of the present disclosure.
In Fig. 11, a central processing unit (CPU) 1101 performs various kinds of processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores, as needed, data required when the CPU 1101 performs the various kinds of processing. The CPU 1101, the ROM 1102 and the RAM 1103 are connected to one another via a bus 1104. An input/output interface 1105 is also connected to the bus 1104.
The following components are also connected to the input/output interface 1105: an input section 1106 (including a keyboard, a mouse, etc.), an output section 1107 (including a display such as a cathode-ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, etc.), the storage section 1108 (including a hard disk, etc.), and a communication section 1109 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 may also be connected to the input/output interface 1105 as needed. A removable medium 1111 such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory may be mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.
In addition, the present disclosure also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above information processing method according to embodiments of the present disclosure can be performed. Accordingly, various storage media carrying such a program product, such as magnetic disks, optical discs, magneto-optical disks and semiconductor memories, are also included in the present disclosure.
In the above description of specific embodiments of the present disclosure, features described and/or shown for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
In addition, the methods of the embodiments of the present disclosure are not limited to being performed in the chronological order described in the specification or shown in the drawings; they may also be performed in another chronological order, in parallel, or independently. Therefore, the execution order of the methods described in this specification does not limit the technical scope of the present disclosure.
It should further be understood that each operation of the above methods according to the present disclosure may also be implemented as computer-executable programs stored in various machine-readable storage media.
Moreover, the objects of the present disclosure may also be achieved as follows: a storage medium storing the above executable program code is supplied directly or indirectly to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code.
In this case, as long as the system or device has the function of executing programs, the embodiments of the present disclosure are not limited to the program, and the program may take any form, for example, an object program, a program executed by an interpreter, or a script supplied to an operating system.
The above machine-readable storage media include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.
In addition, the embodiments of the present disclosure can also be implemented by a client information processing terminal that connects to a corresponding website on the Internet, downloads and installs the computer program code according to the present disclosure, and then executes the program.
In summary, according to embodiments of the present disclosure, the present disclosure provides the following schemes, but is not limited thereto:
Scheme 1. An information processing apparatus, comprising:
An initial extraction unit configured to extract, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
An initial template generation unit configured to annotate the part of interest and extract attributes of the object of interest from the part of interest based on a second input from the user, and to perform training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
An expansion template generation unit configured to extract, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and to generate, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
Scheme 2. The information processing apparatus according to scheme 1, wherein,
The first input and/or the second input are inputs made by the user in a visual manner.
Scheme 3. The information processing apparatus according to scheme 1, wherein,
The expansion template generation unit extracts, from web pages on the network, the text nodes in which keywords of the initial attribute context templates occur and the sibling nodes of those text nodes, as the extracted web page information.
Scheme 4. The information processing apparatus according to any one of schemes 1 to 3, wherein,
From the extracted web page information, the expansion template generation unit uses, as the expanded search results, the web page information whose similarity to the part of interest extracted by the initial extraction unit exceeds a predetermined threshold.
Scheme 5. The information processing apparatus according to scheme 4, wherein,
The similarity includes content similarity and structural similarity.
Scheme 6. The information processing apparatus according to scheme 5, wherein,
The structural similarity includes at least one of the following: CSS class name similarity, element name similarity, and field length similarity.
Scheme 7. The information processing apparatus according to scheme 1, further comprising:
An extension extraction unit configured to extract extended attributes of the object of interest from the network using the extended attribute context templates.
Scheme 8. The information processing apparatus according to scheme 7, wherein,
When an extended attribute extracted by the extension extraction unit is inconsistent with the attributes extracted by the initial template generation unit, the extended attribute extracted by the extension extraction unit is presented to the user.
Scheme 9. An information processing method, comprising:
Extracting, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
Annotating the part of interest and extracting attributes of the object of interest from the part of interest based on a second input from the user, and performing training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
Extracting, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and generating, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
Scheme 10. The information processing method according to scheme 9, wherein,
The first input and/or the second input are inputs made by the user in a visual manner.
Scheme 11. The information processing method according to scheme 9, wherein,
The text nodes in which keywords of the initial attribute context templates occur and the sibling nodes of those text nodes are extracted from web pages on the network as the extracted web page information.
Scheme 12. The information processing method according to any one of schemes 9 to 11, wherein,
From the extracted web page information, the web page information whose similarity to the extracted part of interest exceeds a predetermined threshold is used as the expanded search results.
Scheme 13. The information processing method according to scheme 12, wherein,
The similarity includes content similarity and structural similarity.
Scheme 14. The information processing method according to scheme 13, wherein,
The structural similarity includes at least one of the following: CSS class name similarity, element name similarity, and field length similarity.
Scheme 15. The information processing method according to scheme 9, further comprising:
Extracting extended attributes of the object of interest from the network using the extended attribute context templates.
Scheme 16. The information processing method according to scheme 9, wherein,
When an extended attribute extracted from the network is inconsistent with the attributes extracted from the part of interest, the extracted extended attribute is presented to the user.
Scheme 17. An information processing device, comprising:
A controller configured to:
Extract, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
Annotate the part of interest and extract attributes of the object of interest from the part of interest based on a second input from the user, and perform training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
Extract, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and generate, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
Finally, it should be noted that in the present disclosure, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Although the present disclosure has been disclosed above through the description of its specific embodiments, it should be understood that those skilled in the art may devise various modifications, improvements or equivalents of the present disclosure within the spirit and scope of the appended claims. Such modifications, improvements or equivalents should also be considered to fall within the scope of protection claimed by the present disclosure.

Claims (10)

1. An information processing apparatus, comprising:
An initial extraction unit configured to extract, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
An initial template generation unit configured to annotate the part of interest and extract attributes of the object of interest from the part of interest based on a second input from the user, and to perform training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
An expansion template generation unit configured to extract, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and to generate, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
2. The information processing apparatus according to claim 1, wherein,
The first input and/or the second input are inputs made by the user in a visual manner.
3. The information processing apparatus according to claim 1, wherein,
The expansion template generation unit extracts, from web pages on the network, the text nodes in which keywords of the initial attribute context templates occur and the sibling nodes of those text nodes, as the extracted web page information.
4. The information processing apparatus according to any one of claims 1 to 3, wherein,
From the extracted web page information, the expansion template generation unit uses, as the expanded search results, the web page information whose similarity to the part of interest extracted by the initial extraction unit exceeds a predetermined threshold.
5. The information processing apparatus according to claim 4, wherein,
The similarity includes content similarity and structural similarity.
6. The information processing apparatus according to claim 5, wherein,
The structural similarity includes at least one of the following: CSS class name similarity, element name similarity, and field length similarity.
7. The information processing apparatus according to claim 1, further comprising:
An extension extraction unit configured to extract extended attributes of the object of interest from the network using the extended attribute context templates.
8. The information processing apparatus according to claim 7, wherein,
When an extended attribute extracted by the extension extraction unit is inconsistent with the attributes extracted by the initial template generation unit, the extended attribute extracted by the extension extraction unit is presented to the user.
9. An information processing method, comprising:
Extracting, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
Annotating the part of interest and extracting attributes of the object of interest from the part of interest based on a second input from the user, and performing training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
Extracting, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and generating, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
10. An information processing device, comprising:
A controller configured to:
Extract, based on a first input from a user, a part of interest related to a particular attribute of an object of interest from web pages that are initial search results for the object of interest;
Annotate the part of interest and extract attributes of the object of interest from the part of interest based on a second input from the user, and perform training using the annotated part of interest to generate initial attribute context templates related to the context of the attributes of the object of interest; and
Extract, using the initial attribute context templates, web page information related to the attributes of the object of interest from a network to obtain expanded search results, and generate, based on the expanded search results, extended attribute context templates related to the context of the attributes of the object of interest.
CN201610523111.XA 2016-07-05 2016-07-05 Information processor, information processing method and message processing device Pending CN107577683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610523111.XA CN107577683A (en) 2016-07-05 2016-07-05 Information processor, information processing method and message processing device

Publications (1)

Publication Number Publication Date
CN107577683A true CN107577683A (en) 2018-01-12

Family

ID=61050064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610523111.XA Pending CN107577683A (en) 2016-07-05 2016-07-05 Information processor, information processing method and message processing device

Country Status (1)

Country Link
CN (1) CN107577683A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101551800A (en) * 2008-03-31 2009-10-07 富士通株式会社 Marked information generation device, inquiry unit and sharing system
CN103390128A (en) * 2013-08-01 2013-11-13 贝壳网际(北京)安全技术有限公司 Page labeling method and device and terminal equipment
US8856125B1 (en) * 2010-02-26 2014-10-07 Google Inc. Non-text content item search
CN105117498A (en) * 2015-09-28 2015-12-02 北京奇虎科技有限公司 Webpage data processing method and device
CN105138705A (en) * 2015-09-30 2015-12-09 北京奇虎科技有限公司 Webpage marking method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180112
