CN104484374A - Method and device for creating Internet encyclopedia entry - Google Patents

Method and device for creating Internet encyclopedia entry

Info

Publication number
CN104484374A
Authority
CN
China
Prior art keywords
retrieval
result
entry
visual angle
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410742411.8A
Other languages
Chinese (zh)
Other versions
CN104484374B (en)
Inventor
吴先超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410742411.8A
Publication of CN104484374A
Application granted
Publication of CN104484374B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method and device for creating an Internet encyclopedia entry. The method includes: receiving the to-be-created entry, analyzing the field category of the to-be-created entry, and inquiring a view angle partition corresponding to the field category; using the to-be-created entry and the view angle partition to build a retrieval statement, and retrieving according to the retrieval statement; integrating retrieval results according to preset integrating rules, and displaying the integrated retrieval results. By the method and device, automatic entry creating can be achieved, and entry creating accuracy is increased.

Description

A method and device for creating a network encyclopedia entry
Technical field
Embodiments of the present invention relate to the technical field of natural language processing, and in particular to a method and device for creating a network encyclopedia entry.
Background art
An encyclopedia is a reference work that summarizes all branches of human knowledge or one particular branch. It covers the content of nearly every kind of reference book and catalogs knowledge from many fields. With the development of network technology, network encyclopedias are gradually replacing paper encyclopedias.
Network encyclopedias are open and free, and they emphasize user participation and a spirit of contribution. They therefore allow any user to create an entry and edit the content corresponding to it, drawing on the efforts of users and pooling the knowledge of hundreds of millions of people. Network encyclopedias are also combined with search engines (for example, Baidu and Google) and with question-answering services, meeting users' information needs at different levels. The main Chinese-language network encyclopedias at present are Wikipedia, Baidupedia, Sogou Baike, and Hudong Baike.
When a user searches for an entry in a network encyclopedia and the page returned by the search engine indicates that the entry is not included, as shown in Fig. 1, the entry must be treated as an entry to be created, and the corresponding network encyclopedia content must be created manually. At present, to create an entry about a person, institution, brand, product, or similar subject, the creator first uses an existing search engine to retrieve web page information related to the entry from different dimensions. The creator then manually filters, condenses, and integrates that information, and finally produces the content corresponding to the entry to be created. Creating a brand-new entry by hand in this way is laborious and tedious. Moreover, during the manual filtering, condensing, and integration of the web page information, the creator may make mistakes, which reduces the accuracy of the network encyclopedia.
Summary of the invention
Embodiments of the present invention provide a method and device for creating a network encyclopedia entry, so as to improve the efficiency and accuracy of creating network encyclopedia entries.
In one aspect, an embodiment of the present invention provides a method for creating a network encyclopedia entry, comprising:
receiving an entry to be created, analyzing the domain classification of the entry to be created, and querying the visual angle subregions corresponding to the domain classification;
using the entry to be created and the visual angle subregions to build retrieval statements, and retrieving according to the retrieval statements;
integrating the retrieval results according to preset integration rules, and displaying the integrated retrieval results.
In another aspect, an embodiment of the present invention further provides a device for creating a network encyclopedia entry, comprising:
a visual angle subregion query module, configured to receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification;
a retrieval statement building module, configured to use the entry to be created and the visual angle subregions to build retrieval statements, and to retrieve according to the retrieval statements;
a retrieval result integration module, configured to integrate the retrieval results according to preset integration rules and to display the integrated retrieval results.
With the method and device for creating a network encyclopedia entry provided by embodiments of the present invention, when an entry searched for in a network encyclopedia is not included, that entry is taken as the entry to be created. Its domain classification is analyzed and the visual angle subregions corresponding to the domain classification are queried; retrieval statements are built from the entry to be created and the visual angle subregions and used for retrieval; and the retrieval results are integrated according to preset integration rules and then displayed. Entries can thus be created automatically, and the accuracy of entry creation is improved.
Brief description of the drawings
Fig. 1 shows a network encyclopedia page in the prior art for an entry that is not included;
Fig. 2 is a schematic flowchart of a method for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 3a is a schematic diagram, provided by an embodiment of the present invention, of training the entry and domain classification model and of using that model to analyze the domain classification of an entry to be created;
Fig. 3b shows the domain classification corresponding to an existing entry in a current network encyclopedia, provided by an embodiment of the present invention;
Fig. 3c shows the classification features corresponding to an existing entry in a current network encyclopedia, provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the semantic role labeling process provided by an embodiment of the present invention;
Fig. 7 is a schematic flowchart of a method for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a dependency tree provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of a method for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a device for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 11 is a schematic diagram of the structure of a device for creating a network encyclopedia entry provided by an embodiment of the present invention;
Fig. 12 shows a page for creating a network encyclopedia entry provided by an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 2 is a schematic flowchart of a method for creating a network encyclopedia entry provided by Embodiment one of the present invention.
The method applies when a network encyclopedia does not include a new entry input by a user and the user creates that entry. The method may be performed by a device for creating network encyclopedia entries, and the device may be configured in a server capable of processing network encyclopedia information. The method specifically comprises the following operations S201-S203:
Operation S201: receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification.
In operation S201, an entry is searched for in a network encyclopedia. If the network encyclopedia search engine returns no web page information corresponding to the entry and informs the user that the entry is not included, the entry can be taken as the entry to be created, and the corresponding network encyclopedia content needs to be created. To do so, the domain classification of the entry to be created must be analyzed, as the basis for determining the field to which the entry belongs. An entry to be created may correspond to one domain classification or to several. For example, "Microsoft Research Asia" is the name of an organization, so its domain classification is institution. "Benz" may refer to the person "Karl Benz" or to the automobile brand "Mercedes-Benz", so "Benz" may correspond to two domain classifications, person and automobile brand. After the domain classification of the entry to be created has been analyzed, at least one visual angle subregion corresponding to the domain classification can be queried. The visual angle subregions represent the various attributes of a domain classification and so describe it in detail. For example, "Zhang Yaqin" belongs to the person domain classification, and its corresponding visual angle subregions may include early experiences, curriculum vitae, main honors, and Microsoft Asia-Pacific.
Operation S202: use the entry to be created and the visual angle subregions to build retrieval statements, and retrieve according to the retrieval statements to obtain retrieval results.
In operation S202, a retrieval statement may be built from the entry to be created and each visual angle subregion separately, for example "Zhang Yaqin early experiences" and "Zhang Yaqin curriculum vitae", so that the entry to be created is searched comprehensively from every visual angle subregion. A retrieval statement may also be built from the entry to be created and several visual angle subregions together, for example "Zhang Yaqin curriculum vitae main honors", so that the search engine returns web page content that matches the entry to be created more precisely. This helps find high-quality web pages to use as source material for building the network encyclopedia content corresponding to the entry to be created. The retrieval results may include all of the web page content that the search engine returns for each retrieval statement.
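The statement-building step of operation S202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_retrieval_statements`, the `max_combined` parameter, and the choice to join terms with a space are all assumptions, and the subregion names are English glosses of the examples above.

```python
from itertools import combinations


def build_retrieval_statements(entry, subregions, max_combined=1):
    """Join the entry with every combination of 1..max_combined subregions.

    max_combined=1 yields one statement per subregion; larger values also
    yield statements that combine several subregions for a narrower search.
    """
    statements = []
    for size in range(1, max_combined + 1):
        for combo in combinations(subregions, size):
            statements.append(" ".join((entry, *combo)))
    return statements


statements = build_retrieval_statements(
    "Zhang Yaqin",
    ["early experiences", "curriculum vitae", "main honors"],
    max_combined=2,
)
for statement in statements:
    print(statement)
```

Each resulting string would then be passed to whatever search interface the device uses; that interface is outside the scope of this sketch.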
Operation S203: integrate the retrieval results according to preset integration rules, and display the integrated retrieval results.
In operation S203, the retrieval results contain a large amount of web page content, much of which describes the same visual angle subregion repeatedly, so readability is poor. The retrieval results therefore need to be integrated according to preset integration rules, and the integrated retrieval results are displayed as the network encyclopedia content corresponding to the entry to be created. To further improve accuracy, after the integrated retrieval results are displayed, the user may be prompted to confirm whether they are correct. If they are correct, the integrated retrieval results can be used as the network encyclopedia content corresponding to the entry to be created. If they are not, the user may revise them, and the revised retrieval results are used as the network encyclopedia content corresponding to the entry to be created.
With the method for creating a network encyclopedia entry provided by this embodiment, when an entry searched for in a network encyclopedia is not included, that entry is taken as the entry to be created and the corresponding network encyclopedia content is created automatically, which improves the accuracy of creating network encyclopedia entries.
Embodiment two
This embodiment builds on the above embodiment and further refines operation S201, "receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification", as follows: the domain classification of the entry to be created is analyzed according to an entry and domain classification model, yielding at least one domain classification.
The training of the entry and domain classification model, and the use of that model to analyze the domain classification of an entry to be created, are illustrated in Fig. 3a. The stage of training the entry and domain classification model specifically comprises:
First, training data are obtained. Entries whose domain classification has been manually annotated may serve as training data, for example "apple - fruit". Existing entries in a current network encyclopedia, together with their corresponding domain classifications, may also serve as training data, for example "Microsoft Research Asia - institution".
Second, classification features are constructed from the training data for use by a classifier. The classifier may be an SVM (Support Vector Machine) or a Bayes classifier.
A classification feature may be a feature of the entry itself. For example, if the entry is "fruit", then "color" and "shape", which are features of the fruit itself, can serve as classification features.
A classification feature may also come from the top-N (highest-ranked) web page content obtained by automatically calling a search engine with the entry. For example, the web pages obtained by searching for the entry "Zhang Yaqin" contain information such as "he", "born in 1966", and "University of Washington doctor". All of this information supports assigning the entry "Zhang Yaqin" to the "person" field and can serve as classification features.
Finally, the entry and domain classification model is created according to the classification features.
For example, feature functions are constructed from the classification features "color" and "shape" of the entry "apple". f1(x) is the feature function for "color" and represents the color information in the fruit picture x; for example, red may take the value 1 and yellow the value 2. f2(x) is the feature function for "shape" and represents the shape in the fruit picture x; for example, "round" may take the value 1 and "slender" the value 2. The resulting entry and domain classification model is the linear model y(x) = w1*f1(x) + w2*f2(x), where x is the fruit picture, y(x) is the domain classification (fruit), and w1 and w2 are two parameters to be determined.
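The linear model above can be written out directly. The sketch below is only an illustration of the formula y(x) = w1*f1(x) + w2*f2(x): the dictionary representation of a fruit picture, the code tables, and the weight values 0.5 and 0.25 are assumptions chosen for the example, since the text leaves w1 and w2 undetermined.

```python
# Value codes taken from the example in the text: red = 1, yellow = 2,
# round = 1, slender = 2.
COLOR_CODES = {"red": 1, "yellow": 2}
SHAPE_CODES = {"round": 1, "slender": 2}


def f1(x):
    """Feature function for color."""
    return COLOR_CODES[x["color"]]


def f2(x):
    """Feature function for shape."""
    return SHAPE_CODES[x["shape"]]


def y(x, w1, w2):
    """Linear model y(x) = w1*f1(x) + w2*f2(x)."""
    return w1 * f1(x) + w2 * f2(x)


# A fruit picture is represented here as a dict of already-extracted
# attributes (an assumption; the text does not say how x is encoded).
apple = {"color": "red", "shape": "round"}
banana = {"color": "yellow", "shape": "slender"}

# Hypothetical weights; in practice w1 and w2 would be learned from the
# training data by the chosen classifier.
apple_score = y(apple, w1=0.5, w2=0.25)
banana_score = y(banana, w1=0.5, w2=0.25)
print(apple_score, banana_score)
```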
By analogy, the feature types can be extended: a classification model of entries and field labels can be constructed from the page content of existing encyclopedia entries and the field labels of those entries.
For example, the existing encyclopedia entry "Zhang Yaqin" carries the field label "person" in the encyclopedia, as shown in Fig. 3b.
A training example can then be constructed from this encyclopedia entry page as follows:
y: person;
x: Zhang Yaqin;
f1(x) to fn(x) may be phrases from Fig. 3c. For example, f1(x) may be "whether the encyclopedia page for the entry Zhang Yaqin contains the section 'early experiences'". The reasoning is that, for a new entry to be created, if text related to "early experiences" can be found in existing web pages, the entry is more likely to carry the field label "person", which matches common-sense expectations.
As shown in Fig. 3a, the operation of using the entry and domain classification model to analyze the domain classification of an entry to be created comprises:
First, the entry to be created is received.
Second, the classification features of the entry to be created are constructed, and the entry and domain classification model is called.
A classification feature of the entry to be created may be a feature of the entry itself. For example, if the entry is "fruit", then "color" and "shape", which are features of the fruit itself, can serve as classification features.
A classification feature of the entry to be created may also come from the top-N web page content obtained by automatically calling a search engine with the entry. For example, the web pages obtained by searching for the entry "Zhang Yaqin" contain information such as "he", "born in 1966", and "University of Washington doctor", all of which supports assigning the entry "Zhang Yaqin" to the "person" field. The classification features of the entry to be created are input into the entry and domain classification model, and the model outputs at least one domain classification for the entry to be created. Finally, the set of domain classifications is output. The set contains at least one domain classification: for example, "Zhang Yaqin: person" contains one domain classification, while "Benz: person, brand" contains two.
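The idea of mapping top-N web content to a set of one or more domain classifications can be illustrated with a deliberately simple stand-in. The text describes a trained SVM or Bayes classifier; the sketch below instead counts cue phrases, purely to show the input and output shape. The cue lists, the `min_hits` threshold, and the sample snippets are all hypothetical.

```python
# Hypothetical cue phrases per domain; a trained classifier would learn
# feature weights rather than rely on a hand-written list like this.
DOMAIN_CUES = {
    "person": ["born in", "doctorate", "early experiences"],
    "brand": ["automobile", "brand", "models"],
}


def classify_entry(snippets, cues=DOMAIN_CUES, min_hits=2):
    """Return the set of domains whose cue phrases occur at least
    min_hits times (counted per distinct phrase) in the snippets."""
    text = " ".join(snippets).lower()
    domains = set()
    for domain, phrases in cues.items():
        hits = sum(1 for phrase in phrases if phrase in text)
        if hits >= min_hits:
            domains.add(domain)
    return domains


# Made-up snippets standing in for top-N search results.
zhang_snippets = [
    "Zhang Yaqin was born in 1966.",
    "He later received a doctorate.",
]
benz_snippets = [
    "Karl Benz was born in Germany; his early experiences shaped his work.",
    "Mercedes-Benz is an automobile brand with many models.",
]

print(classify_entry(zhang_snippets))
print(classify_entry(benz_snippets))
```

As in the text, an ambiguous entry such as "Benz" yields more than one domain classification, while "Zhang Yaqin" yields only one.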
With the technical solution provided by this embodiment, the domain classification of the entry to be created is analyzed according to the entry and domain classification model, which improves the accuracy of that analysis.
Embodiment three
This embodiment builds on the above embodiments and further refines the operation "receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification" as follows: the visual angle subregions corresponding to the domain classification are queried according to the network encyclopedia entry template corresponding to that domain classification.
A network encyclopedia entry template contains a domain classification and the visual angle subregions corresponding to it. By way of example, the domain classification may include at least one of person, institution, medicine, and brand. The visual angle subregions for a person may include a spatio-temporal map formed from time, space, and life events; those for an institution may include time, space, and related persons; those for a medicine may include time, inventor, inventing institution, effects, and side effects; and those for a brand may include time, founder, scale, and products.
For example, the visual angle subregions under the person entry "Zhang Yaqin" include early experiences, curriculum vitae, main contributions, main honors, and Microsoft Asia-Pacific, while those under the person entry "Li Kaifu" include personal experience, achievements and honors, personal works, and social evaluation.
Summarizing the visual angle subregions of such person-related Baidupedia entries, the visual angle subregions for a person can be generalized as curriculum vitae (equivalent to personal experience), achievements and honors (main contributions, main honors), social evaluation, and so on. A template for "person" encyclopedia entries can be built in this way, and the template is then used to guide the automatic creation of the entry to be created.
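A template lookup of this kind can be sketched as a simple mapping. The dictionary below restates the example domain classifications and visual angle subregions from the text in English; the names `ENTRY_TEMPLATES` and `query_subregions`, and the decision to skip domains without a template, are assumptions made for illustration.

```python
# Domain classification -> visual angle subregions, following the examples
# given in the text (person uses the generalized template).
ENTRY_TEMPLATES = {
    "person": ["curriculum vitae", "achievements and honors", "social evaluation"],
    "institution": ["time", "space", "related persons"],
    "medicine": ["time", "inventor", "inventing institution", "effects", "side effects"],
    "brand": ["time", "founder", "scale", "products"],
}


def query_subregions(domains, templates=ENTRY_TEMPLATES):
    """Return the subregions for each domain that has a template.

    Domains without a template are skipped rather than raising an error.
    """
    return {d: list(templates[d]) for d in domains if d in templates}


# An ambiguous entry such as "Benz" may carry two domain classifications.
benz_subregions = query_subregions(["person", "brand"])
for domain, subregions in benz_subregions.items():
    print(domain, "->", ", ".join(subregions))
```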
With the technical solution provided by this embodiment, the visual angle subregions corresponding to a domain classification can be queried from the network encyclopedia entry template for that domain classification, which improves the accuracy of determining the visual angle subregions.
Embodiment four
The technical solution provided by this embodiment further refines, on the basis of the above embodiments, the process "receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification". As shown in Fig. 4, it specifically comprises operations S401-S405:
Operation S401: receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification.
Operation S402: use the entry to be created and the visual angle subregions to build retrieval statements, and retrieve according to the retrieval statements to obtain retrieval results.
Operation S403: for each visual angle subregion queried, count how many times retrieval results corresponding to that visual angle subregion occur in the retrieval results.
For example, the visual angle subregions obtained by querying "Zhang Yaqin: person" include early experiences (51 occurrences), curriculum vitae (20 occurrences), teenage period (49 occurrences), and main contributions (10 occurrences).
Operation S404: merge the queried visual angle subregions that are semantically similar into one visual angle subregion, and add up the occurrence counts of their corresponding retrieval results to give the occurrence count for the merged visual angle subregion.
For example, the semantically similar visual angle subregions "early experiences" and "teenage period" are merged into one visual angle subregion, say "early experiences", and their occurrence counts (51 and 49) are added to give the count for the merged subregion: early experiences, 100 occurrences.
Operation S405: display the retrieval results corresponding to each visual angle subregion in descending order of occurrence count.
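Operations S403 to S405 amount to counting, merging, and sorting. The sketch below reproduces the worked numbers from the text; the `SYNONYMS` table stands in for whatever semantic-similarity test the device actually uses and is a hypothetical, hand-written mapping.

```python
from collections import Counter

# Hypothetical mapping from a subregion to the semantically similar
# subregion it should be merged into.
SYNONYMS = {"teenage period": "early experiences"}


def merge_and_rank(frequencies, synonyms=SYNONYMS):
    """Merge similar subregions, sum their counts, and sort descending."""
    merged = Counter()
    for subregion, count in frequencies.items():
        merged[synonyms.get(subregion, subregion)] += count
    return merged.most_common()


ranked = merge_and_rank({
    "early experiences": 51,
    "curriculum vitae": 20,
    "teenage period": 49,
    "main contributions": 10,
})
for subregion, count in ranked:
    print(f"{subregion}: {count}")
```

The merged "early experiences" subregion ends up with 51 + 49 = 100 occurrences and is displayed first.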
With the technical solution provided by this embodiment, semantically similar visual angle subregions can be merged into one, which reduces the subsequent work of building retrieval statements from the entry to be created and the visual angle subregions, and reduces the amount of data processed when creating network encyclopedia content.
Embodiment five
On the basis of the above embodiments, this embodiment provides a method for creating a network encyclopedia entry, applicable when network encyclopedia content is created for a new entry. As shown in Fig. 5, the method performs operations S501-S504:
Operation S501: receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregions corresponding to the domain classification.
Operation S502: use the entry to be created and the visual angle subregions to build retrieval statements, and retrieve according to the retrieval statements to obtain retrieval results.
Operation S503: use a semantic role labeler to perform semantic role labeling (SRL) on each statement in the retrieval results, obtaining the trunk structure of each statement.
In operation S503, the semantic role labeler may use an analysis algorithm of O(n) complexity, where n is the number of words in the input statement. The trunk structure of each statement may include the predicate, the semantic class of the predicate, the arguments corresponding to each predicate (for example, subject, object, time adverbial, place adverbial), and the semantic relations between the predicate and its arguments.
For clarity, the semantic role labeling process is illustrated with the statement "I love Baidu", as shown in Fig. 6:
First, predicate recognition (PRG) is performed on the statement "I love Baidu"; the predicate identified is "love".
Second, the semantic class of the predicate "love" is analyzed. For example, if the semantic class of "love" is the first semantic class defined in the frame set of CPB2 (Chinese PropBank 2.0), the semantic class of "love" is assigned the value "love.01".
Finally, the arguments corresponding to the predicate "love" are found. An argument may be a subject, object, time adverbial, place adverbial, and so on. For example, in "I love Baidu" the subject A0 of "love" is "I" and the object A1 is "Baidu", so the arguments corresponding to the predicate "love" are "I" and "Baidu".
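The output of the three labeling steps can be held in a small record. The sketch below only models the result for the "I love Baidu" example; it is not a semantic role labeler. The class name `TrunkStructure`, its fields, and the `<unknown>` placeholder for a missing argument are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrunkStructure:
    """Result of semantic role labeling for one predicate."""
    predicate: str                                   # step 1: recognized predicate
    sense: str                                       # step 2: semantic class, e.g. "love.01"
    arguments: dict = field(default_factory=dict)    # step 3: role label -> phrase

    def summary(self):
        """Render the agent/predicate/patient trunk as a single string."""
        agent = self.arguments.get("A0", "<unknown>")
        patient = self.arguments.get("A1", "<unknown>")
        return f"{agent}/agent - {self.predicate}/predicate - {patient}/patient"


# Labels for "I love Baidu", following the worked example in the text.
trunk = TrunkStructure(
    predicate="love",
    sense="love.01",
    arguments={"A0": "I", "A1": "Baidu"},
)
print(trunk.summary())

# A statement whose object is omitted keeps a placeholder in the A1 slot.
partial = TrunkStructure(predicate="eat", sense="eat.01", arguments={"A0": "I"})
print(partial.summary())
```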
It should be noted that the semantic role labeler's labeling of a statement's trunk structure is independent of word order. For example, statement one, "the police are investigating the culprit in detail", is labeled "police"/subject-agent, "investigate in detail"/verb-predicate, "culprit"/object-patient. Statement two, "the police carry out a detailed investigation of the culprit", places the verb later and nominalizes it, but the trunk of the statement is unchanged. Statement three is "the police's detailed investigation of the culprit has ended". All three statements correspond to the same semantic trunk: police investigate culprit. The influence of word order on semantic role labeling is thus avoided, which improves labeling accuracy.
It should also be noted that the trunk structure of a statement is a syntactic structure formed by nesting and combining elements such as subject, predicate, object, attributive, adverbial, and complement. The subject (or object, etc.) of a statement may be omitted. To overcome this, the semantic role labeler can also automatically complete the semantic roles of a statement. For example, the predicate "eat" in the two statements "I have eaten a meal" and "I have eaten" requires both a subject and an object; the object is simply omitted in the second statement. The trunk structures extracted from these two statements are, respectively, I/agent - eat/predicate - meal/patient and I/agent - eat/predicate - <something>/patient. As a further example, from the two statements "you have had supper" and "I have eaten", the trunk structure "I/agent - eat/predicate - supper/patient" can be extracted. The trunk structures extracted by the semantic role labeler thus provide good clues for information extraction across multiple statements.
Operation S504: integrate the retrieval results according to preset integration rules, and display the integrated retrieval results.
With the technical solution provided by this embodiment, a semantic role labeler can be used to perform semantic role labeling on each statement in the retrieval results and obtain its trunk structure, which improves the accuracy of creating network encyclopedia content.
Words that exist in the corpus are known words. Some words in a statement, such as newly coined Internet vocabulary, do not exist in the corpus; these are unknown words. To improve the precision with which the semantic role labeler labels trunk structures, unknown words also need to be identified. A word-clustering dictionary can be used to map an unknown word onto a similar known word, so that the semantic role relations involving the unknown word are parsed correctly.
For example: statement 1, "I have not yet studied this course"; statement 2, "I have not yet done advanced study of this course". Suppose "study" is a known word whose subject is "I" and whose object is "course", while "advanced study" is an unknown word whose semantic roles are not yet known. In this case a word-clustering method, that is, similarity based on context, is used to estimate the semantic similarity between the two words "study" and "advanced study". When training on large-scale data, if the contexts of the two words are similar to within a certain range, the semantic similarity between the known word and the unknown word is raised. Because the words to the left and right of "study" and "advanced study" are the same in the two statements, the two words can be judged to be highly similar semantically. The semantic frame of the known word "study" can therefore be transferred to the unknown word "advanced study", and "study" is determined to be the known word onto which "advanced study" is mapped.
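The context-based mapping described above can be sketched with a toy similarity measure. The text does not specify the clustering method, so the sketch substitutes Jaccard overlap of one-word left and right contexts; the function names, the `threshold` value, the token `advanced_study` standing in for the unknown word, and the three-sentence corpus are all assumptions.

```python
def contexts(word, sentences, window=1):
    """Collect the (left, right) neighbor tuples around each use of word."""
    found = set()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token == word:
                left = tuple(tokens[max(0, i - window):i])
                right = tuple(tokens[i + 1:i + 1 + window])
                found.add((left, right))
    return found


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_unknown_word(unknown, known_words, sentences, threshold=0.5):
    """Return the known word whose contexts best match the unknown word,
    or None if no candidate reaches the threshold."""
    unknown_ctx = contexts(unknown, sentences)
    best_word, best_sim = None, 0.0
    for known in known_words:
        sim = jaccard(unknown_ctx, contexts(known, sentences))
        if sim > best_sim:
            best_word, best_sim = known, sim
    return best_word if best_sim >= threshold else None


# Toy, pre-tokenized corpus; real training would use large-scale data.
corpus = [
    ["I", "did", "not", "study", "this", "course"],
    ["I", "did", "not", "advanced_study", "this", "course"],
    ["I", "did", "not", "like", "that", "movie"],
]

mapped = map_unknown_word("advanced_study", ["study", "like"], corpus)
print(mapped)
```

Once the unknown word is mapped, the semantic frame of the known word can be reused for it, as the text describes.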
Because classifiers such as support vector machines and Bayes classifiers label higher-frequency known words with higher accuracy, the above mapping method can also be applied to known words. For example, of the two known words "study" and "study intensively", "study" occurs with high frequency and "study intensively" with low frequency, so "study intensively" can be mapped to "study", thereby improving the precision of semantic role labeling.
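The context-based mapping described above can be illustrated with a minimal sketch. It assumes bag-of-words context vectors compared by cosine similarity; the function names, the window size, the threshold, and the toy token lists are all hypothetical, and the patent does not specify the actual clustering method.

```python
from collections import Counter
from math import sqrt

def context_vector(word, sentences, window=2):
    """Count the words that occur within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def map_unknown_word(unknown, known_words, sentences, threshold=0.8):
    """Return the known word with the most similar contexts, or None."""
    target = context_vector(unknown, sentences)
    best, best_score = None, 0.0
    for known in known_words:
        score = cosine(target, context_vector(known, sentences))
        if score > best_score:
            best, best_score = known, score
    return best if best_score >= threshold else None

# Toy, pre-segmented corpus (hypothetical data).
sentences = [
    ["I", "not", "yet", "study", "this", "course"],
    ["I", "not", "yet", "advanced_training", "this", "course"],
    ["Dawn", "buy", "school_bag"],
]
mapped = map_unknown_word("advanced_training", ["study", "buy"], sentences)
```

Here the unknown token shares its left and right neighbours with "study", so it is mapped to "study" and could inherit that word's semantic frame.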
Embodiment six
In the technical solution provided by this embodiment of the present invention, the semantic role labeler preferably adopts a pipeline structure. A semantic role labeler with a pipeline structure takes the dependency analysis tree of a statement as input and outputs the trunk structure of the statement. Therefore, before the pipeline-structured semantic role labeler is called to perform semantic role labeling on a statement, the dependency analysis tree of the statement needs to be obtained.
To this end, this embodiment of the present invention provides a preferred technical solution which, as shown in Fig. 7, comprises the following operations:
Operation S701: receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregion corresponding to the domain classification.
Operation S702: build a retrieval statement from the entry to be created and the visual angle subregion, and perform retrieval according to the retrieval statement.
Operation S703: perform dependency analysis on each statement in the retrieval results, and revise the analysis results to obtain the dependency analysis tree corresponding to each statement.
In operation S703, a dependency analyzer can be used to analyze the dependency relations of each statement. The dependency relations can be existing grammatical relations, such as subject, predicate, object, attributive, adverbial, and complement. The analysis result can include at least one of the subject, predicate, object, and adverbial of the statement. The dependency tree can be built on a statement that has undergone word segmentation and part-of-speech tagging. For example, the dependency tree of the statement "Christina beat Scott with a baseball yesterday" is shown in Fig. 8, in which "Christina" is the subject, "yesterday" is the time adverbial, and "Scott" is the object.
Operation S704: input the dependency analysis tree of each statement in the retrieval results into the semantic role labeler one at a time; each time the semantic role labeler receives the dependency analysis tree of a statement, it performs syntactic parsing according to the currently received dependency analysis tree and outputs the trunk structure of the corresponding statement.
Operation S705: integrate the retrieval results according to the preset integration rules, and display the integrated retrieval results; specifically, integrate the trunk structures of the statements in the retrieval results, and display the integrated content.
With the technical solution provided by this embodiment of the present invention, the dependency tree of a statement can be supplied as input to the pipeline-structured semantic role labeler, so that the semantic role labeler can be used to output the trunk structure of the statement.
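The input and output of operation S704 can be sketched as follows. This is a rule-based stand-in, not the classifier-based semantic role labeler the text describes; the `Node` type, the dependency labels, and the label-to-role mapping are hypothetical and chosen only to mirror the Fig. 8 example.

```python
from typing import NamedTuple

class Node(NamedTuple):
    word: str
    head: int      # index of the governing word, -1 for the root
    relation: str  # dependency label (hypothetical label set)

# Assumed mapping from dependency labels to trunk roles.
ROLE_OF = {
    "subj": "subject",
    "obj": "object",
    "time": "time_adverbial",
    "loc": "place_adverbial",
}

def trunk_structure(tree):
    """Collect the predicate (root) and its directly attached core roles."""
    root = next(i for i, node in enumerate(tree) if node.head == -1)
    trunk = {"predicate": tree[root].word}
    for node in tree:
        if node.head == root and node.relation in ROLE_OF:
            trunk[ROLE_OF[node.relation]] = node.word
    return trunk

# Dependency tree for "Christina beat Scott with a baseball yesterday".
tree = [
    Node("Christina", 1, "subj"),
    Node("beat", -1, "root"),
    Node("Scott", 1, "obj"),
    Node("with", 1, "prep"),
    Node("baseball", 3, "pobj"),
    Node("yesterday", 1, "time"),
]
trunk = trunk_structure(tree)
```

The resulting dictionary holds the predicate, subject, object, and time adverbial; the prepositional phrase is ignored because the assumed mapping does not assign it a trunk role.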
Embodiment seven
In the technical solution provided by this embodiment of the present invention, the preset integration rules include at least one of the following:
A preset integration rule can be to disambiguate the entity words in the retrieval results, so that ambiguous entity words are unified into the same entity word. For example, within the same document, "Julius Caesar", "Caesar", and "he" all refer to the same person, Julius Caesar, and can be unified as "Julius Caesar".
A preset integration rule can be to normalize abbreviations that denote the same entity. For example, "U.S." and "United States of America" both refer to the United States, and both can be normalized to "U.S.".
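A minimal sketch of the two normalization rules above, assuming a hand-written alias table; the table contents and the `build_normalizer` helper are hypothetical. Pronoun resolution (for example "he") is not handled here, since it requires coreference analysis rather than string matching.

```python
import re

def build_normalizer(alias_table):
    """Return a function that rewrites every alias to its canonical form."""
    surface_to_canonical = {}
    for canonical, aliases in alias_table.items():
        surface_to_canonical[canonical] = canonical
        for alias in aliases:
            surface_to_canonical[alias] = canonical
    # Try longer surface forms first so "Julius Caesar" wins over "Caesar".
    forms = sorted(surface_to_canonical, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(f) for f in forms) + r")(?!\w)"
    )
    return lambda text: pattern.sub(
        lambda m: surface_to_canonical[m.group(1)], text
    )

# Hypothetical alias table: canonical form -> alternative surface forms.
normalize = build_normalizer({
    "Julius Caesar": ["Caesar"],
    "U.S.": ["United States of America"],
})
```

Matching the longest surface form first keeps an already canonical mention such as "Julius Caesar" from being expanded a second time.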
A preset integration rule can be to normalize and automatically calculate times. For example, given "Zhang Yaqin was born in 1966. At the age of 12, he was admitted to the class for exceptionally gifted youth at the University of Science and Technology of China", the "age of 12" can be resolved by the calculation 1966 + 12 = 1978. Through time calculation and pronoun reference resolution, the latter sentence can be rewritten as "In 1978, Zhang Yaqin was admitted to the class for exceptionally gifted youth at the University of Science and Technology of China". Through this step, the knowledge about the same person that is spread across different statements can be integrated in a unified way, ordered by time and place.
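The time calculation above can be sketched as follows, assuming English sentences that follow two fixed patterns ("born in YYYY" and "At age N, he ..."); the patterns and the `resolve_age_expressions` helper are hypothetical simplifications of the time normalization and pronoun resolution the text describes.

```python
import re

def resolve_age_expressions(sentences, subject):
    """Rewrite "At age N, he ..." as "In YEAR, <subject> ..." using the birth year."""
    birth_year = None
    resolved = []
    for sentence in sentences:
        born = re.search(r"born in (\d{4})", sentence)
        if born:
            birth_year = int(born.group(1))
        age = re.match(r"At age (\d+), he (.+)", sentence)
        if age and birth_year is not None:
            year = birth_year + int(age.group(1))  # e.g. 1966 + 12 = 1978
            sentence = f"In {year}, {subject} {age.group(2)}"
        resolved.append(sentence)
    return resolved

rewritten = resolve_age_expressions(
    ["Zhang Yaqin was born in 1966.", "At age 12, he was admitted to university."],
    subject="Zhang Yaqin",
)
```

Once every statement carries an absolute year, the statements about one person can be sorted chronologically.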
A preset integration rule can be to identify NER (Named Entity Recognition) features in the retrieval results, that is, to identify entity words with a specific meaning in a statement, such as person names, place names, and times. A person name is generally a subject or an object, a place name indicates a place adverbial, and a time indicates a time adverbial.
For example: "Dawn bought a school bag in Shangdi yesterday". Here "Dawn" is a person name, "yesterday" is a time, and "Shangdi" is a place, so the trunk of the predicate "bought" is: "Dawn" is the subject, "yesterday" is the time adverbial, "Shangdi" is the place adverbial, "bought" is the predicate, and "school bag" is the object. As can be seen, named entity recognition features provide useful cues that improve the precision of the semantic role labeler.
A preset integration rule can be to identify Chunk features in the retrieval results; a Chunk feature allows a phrase in a statement to serve as a subject or an object. For example, in "I am going to the Shangdi sub-branch of Bank of China", "Shangdi sub-branch of Bank of China" is a Chunk feature that denotes a unique place and can serve as the place object. Identifying Chunk features in the retrieval results therefore enables the semantic role labeler to better classify the argument constituents of phrases.
A preset integration rule can be to use a word generalization dictionary to represent unknown words in the retrieval results as near-synonymous known words, which compensates for the loss of accuracy of the semantic role labeling system caused by words missing from the corpus.
Embodiment eight
Based on the above embodiments and taking "Zhang Yaqin" as the entry to be created, this embodiment provides a method for creating a network encyclopedia which, referring to Fig. 9, mainly comprises the following three stages:
First stage: perform semantic labeling on the retrieved statements. The retrieval statement can be "Zhang Yaqin early-life experience", and the statements it retrieves include "Zhang Yaqin was born in Taiyuan, Shanxi in 1966", "Zhang Yaqin was admitted to university at the age of 12", and "Zhang Yaqin graduated from the University of Science and Technology of China".
Semantic labeling is performed on the above statements. In "Zhang Yaqin was born in Taiyuan, Shanxi in 1966", "Zhang Yaqin" is the subject, "born" is the predicate, "Taiyuan, Shanxi" is the place adverbial, and "1966" is the time adverbial. In "Zhang Yaqin was admitted to university at the age of 12", "Zhang Yaqin" is the subject, "admitted" is the predicate, "age of 12" is the time adverbial, and "university" is the object. In "Zhang Yaqin graduated from the University of Science and Technology of China", "Zhang Yaqin" is the subject, "graduated" is the predicate, and "University of Science and Technology of China" is the place adverbial.
Second stage: integrate the retrieval results. The "age of 12" is added to "1966" to generate "1978", yielding the retrieval results "In 1978, Zhang Yaqin entered university at the University of Science and Technology of China", "In 1966, Zhang Yaqin was born in Taiyuan, Shanxi", and "In 1982, Zhang Yaqin graduated from the University of Science and Technology of China".
Third stage: generate the network encyclopedia page according to the integrated retrieval results.
Embodiment nine
This embodiment of the present invention provides a device for creating a network encyclopedia entry, which is applicable when a term input by a user is not included in the network encyclopedia and the user creates the new entry. As shown in Fig. 10, the device mainly comprises: a visual angle subregion query module 1001, a retrieval statement building module 1002, and a retrieval result integration module 1003.
The visual angle subregion query module 1001 is configured to receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregion corresponding to the domain classification. The retrieval statement building module 1002 is configured to build a retrieval statement from the entry to be created and the visual angle subregion, and to perform retrieval according to the retrieval statement. The retrieval result integration module 1003 is configured to integrate the retrieval results according to the preset integration rules, and to display the integrated retrieval results.
In the visual angle subregion query module 1001, an entry is retrieved in the network encyclopedia. If the network encyclopedia search engine returns no web page information corresponding to the entry and prompts the user that the entry is not included in the network encyclopedia, the entry can serve as an entry to be created, and a network encyclopedia corresponding to the entry to be created needs to be created. To create the network encyclopedia corresponding to the entry to be created, the domain classification of the entry to be created needs to be analyzed as the basis for determining the field to which the entry belongs; an entry to be created can correspond to one domain classification or to multiple domain classifications. After the domain classification of the entry to be created has been analyzed, at least one visual angle subregion corresponding to the domain classification can be queried. The visual angle subregions can represent the various attributes of the domain classification, so that the domain classification is described in detail.
The visual angle subregion query module 1001 is specifically configured to analyze the domain classification of the entry to be created according to a model of entries and domain classifications, obtaining at least one domain classification. The domain classification can include at least one of person, institution, medicine, and brand. The visual angle subregion corresponding to a person includes a space-time map composed of time, space, and life events; the visual angle subregion corresponding to an institution includes time, space, and related persons; the visual angle subregion corresponding to a medicine includes time, inventor, inventing institution, efficacy, and side effects; the visual angle subregion corresponding to a brand includes time, founder, scale, and product.
In the visual angle subregion query module 1001, a retrieval statement can also be built from the entry to be created and each visual angle subregion, so that the entry to be created is searched comprehensively from every visual angle. A retrieval statement can also be built from the entry to be created together with multiple visual angle subregions, so that the search engine returns web page content that matches the entry to be created more precisely, which helps find high-quality web pages to serve as the raw corpus for building the network encyclopedia corresponding to the entry to be created. The retrieval results can include all of the web page content that the search engine returns for each query statement.
The visual angle subregion query module 1001 can also query the visual angle subregion corresponding to the domain classification according to the network encyclopedia entry template corresponding to the domain classification, where the network encyclopedia entry template contains the domain classification and the visual angle subregions corresponding to the domain classification.
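The template lookup and retrieval statement construction described above can be sketched as follows. The dictionary layout, the English labels, and the `build_queries` helper are hypothetical; the text only states which visual angle subregions belong to which domain classification, not how templates are stored or how query strings are formed.

```python
# Assumed in-memory entry templates: domain classification -> visual angle subregions.
ENTRY_TEMPLATES = {
    "person": ["time", "space", "life events"],
    "institution": ["time", "space", "related persons"],
    "medicine": ["time", "inventor", "inventing institution", "efficacy", "side effects"],
    "brand": ["time", "founder", "scale", "product"],
}

def build_queries(entry, domains, combine=False):
    """Build one query per subregion, or one combined query per domain."""
    queries = []
    for domain in domains:
        subregions = ENTRY_TEMPLATES[domain]
        if combine:
            # One statement containing the entry and several subregions.
            queries.append(" ".join([entry, *subregions]))
        else:
            # One statement per subregion, for a search from every visual angle.
            queries.extend(f"{entry} {subregion}" for subregion in subregions)
    return queries

per_subregion = build_queries("Zhang Yaqin", ["person"])
combined = build_queries("Zhang Yaqin", ["person"], combine=True)
```

Per-subregion queries favour coverage of every visual angle, while a combined query narrows the results toward pages that discuss the entry from several angles at once.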
In the retrieval result integration module 1003, the retrieval results contain a large amount of web page content, and this content describes the same visual angle subregion repeatedly, so it is not very readable. The retrieval results therefore need to be integrated according to the preset integration rules, and the integrated retrieval results are displayed as the network encyclopedia corresponding to the entry to be created. To further improve the accuracy of the created network encyclopedia, after the integrated retrieval results are displayed to the user, the user can be prompted to confirm whether the integrated retrieval results are correct. If they are correct, the integrated retrieval results can be used as the network encyclopedia corresponding to the entry to be created; if they are incorrect, the user can revise the integrated retrieval results, and the revised retrieval results are used as the network encyclopedia corresponding to the entry to be created.
The retrieval result integration module 1003 specifically executes at least one of the following preset integration rules: performing named entity recognition (NER) on the retrieval results to identify entity words with a specific meaning; disambiguating the entity words in the retrieval results, so that ambiguous entity words are unified into the same entity word; and normalizing abbreviations that denote the same entity.
The retrieval result integration module 1003 comprises: a frequency statistics submodule, a frequency superposition submodule, and a result output submodule.
The frequency statistics submodule is configured to count, for each queried visual angle subregion, the frequency of occurrence in the retrieval results of the retrieval results corresponding to that visual angle subregion;
The frequency superposition submodule is configured to merge multiple queried visual angle subregions that are semantically similar into one visual angle subregion, and to superpose the frequencies of occurrence of the retrieval results corresponding to those visual angle subregions as the frequency of occurrence of the retrieval results corresponding to the merged visual angle subregion;
The result output submodule is configured to display the retrieval results corresponding to each visual angle subregion in descending order of the frequency of occurrence of the retrieval results corresponding to each visual angle subregion.
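The three submodules above can be sketched in a few lines, assuming the retrieval results are already grouped by visual angle subregion and that the groups of semantically similar subregions are given as input; how similarity is decided is not specified in the text, and the names `rank_subregions` and `similar_groups` are hypothetical.

```python
from collections import defaultdict

def rank_subregions(results_by_subregion, similar_groups):
    """Merge similar subregions, sum their result counts, sort by count."""
    # Map every subregion in a similarity group to the group's first member.
    representative = {}
    for group in similar_groups:
        for name in group:
            representative[name] = group[0]

    merged = defaultdict(list)
    for subregion, results in results_by_subregion.items():
        merged[representative.get(subregion, subregion)].extend(results)

    # The frequency of a merged subregion is the sum of its members' counts,
    # which equals the length of the concatenated result list.
    return sorted(merged.items(), key=lambda item: len(item[1]), reverse=True)

# Hypothetical grouped retrieval results.
results = {
    "birth": ["r1"],
    "education": ["r2", "r3"],
    "schooling": ["r4", "r5"],
    "career": ["r6", "r7", "r8"],
}
ranked = rank_subregions(results, similar_groups=[["education", "schooling"]])
```

In this toy input, "education" and "schooling" are merged and their counts add up to 4, so the merged subregion is displayed first, followed by "career" and then "birth".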
With the device for creating a network encyclopedia entry provided by this embodiment of the present invention, when an entry is retrieved in the network encyclopedia and is found not to be included, the entry can be taken as the entry to be created and the network encyclopedia for the entry to be created can be created automatically, which improves the accuracy of creating network encyclopedia entries.
Embodiment ten
On the basis of the above embodiments, this embodiment of the present invention provides a device for creating a network encyclopedia entry which, as shown in Fig. 11, comprises: a visual angle subregion query module 1101, a retrieval statement building module 1102, a dependency tree acquisition module 1103, a semantic role labeling module 1104, and a retrieval result integration module 1105.
The dependency tree acquisition module 1103 is configured to perform dependency analysis on each statement in the retrieval results, and to revise the analysis results to obtain the dependency analysis tree corresponding to each statement;
The semantic role labeling module 1104 is specifically configured to input the dependency analysis tree of each statement in the retrieval results into the semantic role labeler one at a time; each time the semantic role labeler receives the dependency analysis tree of a statement, it performs syntactic parsing according to the currently received dependency analysis tree and outputs the trunk structure of the corresponding statement.
With the technical solution provided by this embodiment of the present invention, a semantic role labeler can be used to perform semantic role labeling on each statement in the retrieval results and obtain the trunk structure of each statement, which improves the accuracy of the created network encyclopedia.
The technical solutions provided by the above embodiments can be started when a user triggers the "quick create" button on the page shown in Fig. 12.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments; it can also include other equivalent embodiments without departing from the concept of the present invention, and the scope of the present invention is determined by the appended claims.

Claims (16)

1. A method for creating a network encyclopedia entry, characterized by comprising:
receiving an entry to be created, analyzing the domain classification of the entry to be created, and querying the visual angle subregion corresponding to the domain classification;
building a retrieval statement from the entry to be created and the visual angle subregion, and performing retrieval according to the retrieval statement;
integrating the retrieval results according to preset integration rules, and displaying the integrated retrieval results.
2. The method according to claim 1, characterized in that analyzing the domain classification of the entry to be created comprises:
analyzing the domain classification of the entry to be created according to a model of entries and domain classifications, to obtain at least one domain classification.
3. The method according to claim 1, characterized in that querying the visual angle subregion corresponding to the domain classification comprises:
querying the visual angle subregion corresponding to the domain classification according to the network encyclopedia entry template corresponding to the domain classification, wherein the network encyclopedia entry template contains the domain classification and the visual angle subregions corresponding to the domain classification.
4. The method according to claim 3, characterized in that the domain classification comprises at least one of person, institution, medicine, and brand;
the visual angle subregion corresponding to the person comprises a space-time map composed of time, space, and life events;
the visual angle subregion corresponding to the institution comprises time, space, and related persons;
the visual angle subregion corresponding to the medicine comprises time, inventor, inventing institution, efficacy, and side effects;
the visual angle subregion corresponding to the brand comprises time, founder, scale, and product.
5. The method according to claim 1, characterized in that, after building the retrieval statement from the entry to be created and the visual angle subregion and performing retrieval according to the retrieval statement, and before integrating the retrieval results according to the preset integration rules, the method further comprises:
using a semantic role labeler to perform semantic role labeling on each statement in the retrieval results, to obtain the trunk structure of each statement.
6. The method according to claim 5, characterized in that, before using the semantic role labeler to perform semantic role labeling on each statement in the retrieval results, the method further comprises:
performing dependency analysis on each statement in the retrieval results, and revising the analysis results to obtain the dependency analysis tree corresponding to each statement;
and in that using the semantic role labeler to perform semantic role labeling on each statement in the retrieval results, to obtain the trunk structure of each statement, comprises:
inputting the dependency analysis tree of each statement in the retrieval results into the semantic role labeler one at a time, wherein each time the semantic role labeler receives the dependency analysis tree of a statement, it performs syntactic parsing according to the currently received dependency analysis tree and outputs the trunk structure of the corresponding statement.
7. The method according to any one of claims 1-6, characterized in that the preset integration rules comprise at least one of the following:
performing named entity recognition (NER) on the retrieval results, to identify entity words with a specific meaning;
disambiguating the entity words in the retrieval results, so that ambiguous entity words are unified into the same entity word;
normalizing abbreviations that denote the same entity.
8. The method according to any one of claims 1-6, characterized in that, after integrating the retrieval results according to the preset integration rules and before displaying the integrated retrieval results, the method further comprises:
for each queried visual angle subregion, counting the frequency of occurrence in the retrieval results of the retrieval results corresponding to that visual angle subregion;
merging multiple queried visual angle subregions that are semantically similar into one visual angle subregion, and superposing the frequencies of occurrence of the retrieval results corresponding to those visual angle subregions as the frequency of occurrence of the retrieval results corresponding to the merged visual angle subregion;
and in that displaying the integrated retrieval results comprises:
displaying the retrieval results corresponding to each visual angle subregion in descending order of the frequency of occurrence of the retrieval results corresponding to each visual angle subregion.
9. A device for creating a network encyclopedia entry, characterized by comprising:
a visual angle subregion query module, configured to receive an entry to be created, analyze the domain classification of the entry to be created, and query the visual angle subregion corresponding to the domain classification;
a retrieval statement building module, configured to build a retrieval statement from the entry to be created and the visual angle subregion, and to perform retrieval according to the retrieval statement;
a retrieval result integration module, configured to integrate the retrieval results according to preset integration rules, and to display the integrated retrieval results.
10. The device according to claim 9, characterized in that the visual angle subregion query module is specifically configured to analyze the domain classification of the entry to be created according to a model of entries and domain classifications, to obtain at least one domain classification.
11. The device according to claim 9, characterized in that the visual angle subregion query module is specifically configured to query the visual angle subregion corresponding to the domain classification according to the network encyclopedia entry template corresponding to the domain classification, wherein the network encyclopedia entry template contains the domain classification and the visual angle subregions corresponding to the domain classification.
12. The device according to claim 11, characterized in that the domain classification comprises at least one of person, institution, medicine, and brand;
the visual angle subregion corresponding to the person comprises a space-time map composed of time, space, and life events;
the visual angle subregion corresponding to the institution comprises time, space, and related persons;
the visual angle subregion corresponding to the medicine comprises time, inventor, inventing institution, efficacy, and side effects;
the visual angle subregion corresponding to the brand comprises time, founder, scale, and product.
13. The device according to claim 9, characterized by further comprising:
a semantic role labeling module, configured to use a semantic role labeler to perform semantic role labeling on each statement in the retrieval results, to obtain the trunk structure of each statement.
14. The device according to claim 13, characterized by further comprising:
a dependency tree acquisition module, configured to perform dependency analysis on each statement in the retrieval results, and to revise the analysis results to obtain the dependency analysis tree corresponding to each statement;
wherein the semantic role labeling module is specifically configured to input the dependency analysis tree of each statement in the retrieval results into the semantic role labeler one at a time, and each time the semantic role labeler receives the dependency analysis tree of a statement, it performs syntactic parsing according to the currently received dependency analysis tree and outputs the trunk structure of the corresponding statement.
15. The device according to any one of claims 9-14, characterized in that the retrieval result integration module specifically executes at least one of the following preset integration rules:
performing named entity recognition (NER) on the retrieval results, to identify entity words with a specific meaning;
disambiguating the entity words in the retrieval results, so that ambiguous entity words are unified into the same entity word;
normalizing abbreviations that denote the same entity.
16. The device according to any one of claims 9-14, characterized in that the retrieval result integration module comprises:
a frequency statistics submodule, configured to count, for each queried visual angle subregion, the frequency of occurrence in the retrieval results of the retrieval results corresponding to that visual angle subregion;
a frequency superposition submodule, configured to merge multiple queried visual angle subregions that are semantically similar into one visual angle subregion, and to superpose the frequencies of occurrence of the retrieval results corresponding to those visual angle subregions as the frequency of occurrence of the retrieval results corresponding to the merged visual angle subregion;
a result output submodule, configured to display the retrieval results corresponding to each visual angle subregion in descending order of the frequency of occurrence of the retrieval results corresponding to each visual angle subregion.
CN201410742411.8A 2014-12-08 2014-12-08 A kind of method and device creating network encyclopaedia entry Active CN104484374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410742411.8A CN104484374B (en) 2014-12-08 2014-12-08 A kind of method and device creating network encyclopaedia entry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410742411.8A CN104484374B (en) 2014-12-08 2014-12-08 A kind of method and device creating network encyclopaedia entry

Publications (2)

Publication Number Publication Date
CN104484374A true CN104484374A (en) 2015-04-01
CN104484374B CN104484374B (en) 2018-11-16

Family

ID=52758915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410742411.8A Active CN104484374B (en) 2014-12-08 2014-12-08 A kind of method and device creating network encyclopaedia entry

Country Status (1)

Country Link
CN (1) CN104484374B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179472A (en) * 2007-05-31 2008-05-14 腾讯科技(深圳)有限公司 Network resource searching method and searching system
US20100057568A1 (en) * 2007-08-11 2010-03-04 Tencent Technology (Shenzhen) Company Ltd. Method and Apparatus for Searching for Online Advertisement Resource
CN102033955A (en) * 2010-12-24 2011-04-27 常华 Method for expanding user search results and server
CN102314458A (en) * 2010-06-30 2012-01-11 百度在线网络技术(北京)有限公司 Method and system for acquiring network encyclopedia data
CN102737029A (en) * 2011-04-02 2012-10-17 腾讯科技(深圳)有限公司 Searching method and system
CN104133916A (en) * 2014-08-14 2014-11-05 百度在线网络技术(北京)有限公司 Search result information organizational method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
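Wang Bukang et al.: "Chinese Semantic Role Labeling Based on Dependency Parsing" (基于依存句法分析的中文语义角色标注), 《中文信息学报》 (Journal of Chinese Information Processing) *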

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866614A (en) * 2015-06-05 2015-08-26 深圳市爱学堂教育科技有限公司 Entry creating method and entry creating device
CN107148624A (en) * 2015-06-22 2017-09-08 电子部品研究院 Method for preprocessing text and preprocessing system for performing the method
CN105243111A (en) * 2015-09-25 2016-01-13 常熟商数信息技术有限公司 Hierarchical relationship organization based multilingual thesaurus management method
CN108572954A (en) * 2017-03-07 2018-09-25 上海颐为网络科技有限公司 Method and system for recommending approximate entry structure
CN108572953A (en) * 2017-03-07 2018-09-25 上海颐为网络科技有限公司 Method for merging entry structures
CN108572954B (en) * 2017-03-07 2023-04-28 上海颐为网络科技有限公司 Method and system for recommending approximate entry structure
CN110019656A (en) * 2017-07-26 2019-07-16 上海颐为网络科技有限公司 Method and system for intelligently pushing content related to a newly created entry
CN108959255A (en) * 2018-06-28 2018-12-07 北京百度网讯科技有限公司 Entity-labeled dataset construction method, device and equipment
CN108959228A (en) * 2018-07-13 2018-12-07 众安信息技术服务有限公司 Blockchain-based method for creating, retrieving and editing data, and readable storage medium
CN113157996B (en) * 2020-01-23 2022-09-16 久瓴(上海)智能科技有限公司 Document information processing method and device, computer equipment and readable storage medium
CN113157996A (en) * 2020-01-23 2021-07-23 久瓴(上海)智能科技有限公司 Document information processing method and device, computer equipment and readable storage medium
CN113282745A (en) * 2020-02-20 2021-08-20 清华大学 Automatic generation method and device for event encyclopedia document
CN111681769A (en) * 2020-08-17 2020-09-18 耀方信息技术(上海)有限公司 Medicine word segmentation searching method and system
WO2022111249A1 (en) * 2020-11-24 2022-06-02 北京字节跳动网络技术有限公司 Information presentation method, apparatus, and computer storage medium
CN112464115A (en) * 2020-11-24 2021-03-09 北京字节跳动网络技术有限公司 Information display method and device and computer storage medium
CN116991969A (en) * 2023-05-23 2023-11-03 暨南大学 Method, system, electronic device and storage medium for retrieving configurable grammar relationship
CN116991969B (en) * 2023-05-23 2024-03-19 暨南大学 Method, system, electronic device and storage medium for retrieving configurable grammar relationship

Also Published As

Publication number Publication date
CN104484374B (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN104484374A (en) Method and device for creating Internet encyclopedia entry
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
CN101539907B (en) Part-of-speech tagging model training device and part-of-speech tagging system and method thereof
US9740736B2 (en) Linking ontologies to expand supported language
CN111475623A (en) Case information semantic retrieval method and device based on knowledge graph
CN110287494A (en) Short text similarity matching method based on the deep learning BERT algorithm
Berardi et al. Word Embeddings Go to Italy: A Comparison of Models and Training Datasets.
CN104794169B (en) Subject terminology extraction method and system based on a sequence labelling model
CN108038725A (en) Machine learning-based method for analyzing e-commerce customer satisfaction with products
CN105843897A (en) Vertical domain-oriented intelligent question and answer system
CN105824933A (en) Automatic question-answering system based on theme-rheme positions and realization method of automatic question answering system
CN105930452A (en) Smart answering method capable of identifying natural language
CN104050302B (en) Topic detecting system based on atlas model
CN106970910A (en) Keyword extraction method and device based on a graph model
CN111414763A (en) Semantic disambiguation method, device, equipment and storage device for sign language calculation
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN103440314A (en) Semantic retrieval method based on Ontology
CN111553160B (en) Method and system for obtaining question answers in legal field
JP2022532451A (en) Method for disambiguating Chinese place names based on an encyclopedia knowledge base and word embeddings
CN112328800A (en) System and method for automatically generating programming specification question answers
CN107133212A (en) Textual entailment recognition method based on ensemble learning and combined word and sentence information
CN114841353A (en) Quantum language model modeling system fusing syntactic information and application thereof
Yaman et al. Address entities extraction using named entity recognition
CN103177089A (en) Sentence meaning composition relationship lamination identification method based on central blocks
Jain et al. TexEmo: Conveying emotion from text-the study

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant