CN109783651A - Method, apparatus, electronic device, and storage medium for extracting entity-related information

Info

Publication number
CN109783651A
Authority
CN
China
Prior art keywords
entity
attribute
predetermined
text
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910087401.8A
Other languages
Chinese (zh)
Other versions
CN109783651B (en
Inventor
贺薇
李双婕
史亚冰
梁海金
张扬
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910087401.8A
Publication of CN109783651A
Application granted
Publication of CN109783651B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a computer-readable storage medium for extracting entity-related information. In the method, a computing device obtains multiple candidate texts associated with a predetermined entity and a predetermined attribute. The computing device then determines at least one target text from the candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute. Finally, the computing device determines the attribute value of the predetermined attribute of the predetermined entity based on the at least one target text. Embodiments of the present disclosure can improve timeliness and reduce labor cost when extracting entity-related information.

Description

Method, apparatus, electronic device, and storage medium for extracting entity-related information
Technical field
Embodiments of the present disclosure relate generally to the technical field of information processing and, more specifically, to a method, an apparatus, an electronic device, and a computer-readable storage medium for extracting entity-related information.
Background
Traditionally, there are two ways to extract entity-related information. One is purely open extraction, which mainly includes open extraction from free text and from semi-structured web pages. That is, semantic relations between entities are mined in an open-ended manner from the free text and semi-structured web pages of the Internet, where a semi-structured web page is a web page with a certain structure, and that structure is based on the Hypertext Markup Language (HTML). For example, from the text "Yao Ming was born on September 12, 1980 in Xuhui District, Shanghai", triples such as (Yao Ming, date of birth, September 12, 1980) and (Yao Ming, birthplace, Xuhui District, Shanghai) are mined directly. The other is structured extraction, which mainly refers to extracting entity-related information through manually configured mapping relations. That is, for fixed websites of a fixed vertical category, multiple mapping-relation templates, such as manually defined web-page regular-expression templates and XML Path Language (XPath) expressions, are manually configured for each website so that data with a fixed structure in the web pages can be extracted in a targeted manner.
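By way of a non-limiting illustration, the template-based structured extraction described above can be sketched as follows. The regular expression, the English sentence pattern, and the function name `extract_triples` are assumptions made for this sketch and do not come from any particular website template; the sketch only shows why each fixed structure needs its own hand-written template.

```python
import re

# Hypothetical hand-written template for ONE fixed sentence pattern.
# Every new page layout or phrasing would need another template like this.
BIRTH_TEMPLATE = re.compile(
    r"(?P<entity>[A-Z][\w ]+?), born on "
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}) in (?P<place>[^.]+)\."
)


def extract_triples(text):
    """Return (entity, attribute, value) triples matched by the template."""
    triples = []
    for match in BIRTH_TEMPLATE.finditer(text):
        triples.append((match["entity"], "date of birth", match["date"]))
        triples.append((match["entity"], "birthplace", match["place"]))
    return triples


sentence = "Yao Ming, born on September 12, 1980 in Xuhui District, Shanghai."
triples = extract_triples(sentence)
```

A sentence that phrases the same facts differently would not match, which reflects the manual maintenance cost attributed to this approach.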
However, these traditional schemes for extracting entity-related information still have various problems and shortcomings. In many cases they cannot meet the performance requirements for extracting entity-related information, which results in a poor user experience in applications such as entity recommendation.
Summary
Embodiments of the present disclosure relate to a method, an apparatus, an electronic device, and a computer-readable storage medium for extracting entity-related information.
In a first aspect of the present disclosure, a method for extracting entity-related information is provided. The method includes obtaining multiple candidate texts associated with a predetermined entity and a predetermined attribute. The method also includes determining at least one target text from the multiple candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute. The method further includes determining, based on the at least one target text, the attribute value of the predetermined attribute of the predetermined entity.
In a second aspect of the present disclosure, an apparatus for extracting entity-related information is provided. The apparatus includes a candidate text obtaining module configured to obtain multiple candidate texts associated with a predetermined entity and a predetermined attribute. The apparatus also includes a target text determining module configured to determine at least one target text from the multiple candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute. The apparatus further includes an attribute value determining module configured to determine, based on the at least one target text, the attribute value of the predetermined attribute of the predetermined entity.
In a third aspect of the present disclosure, an electronic device is provided. The electronic device includes one or more processors and a storage device for storing one or more programs. When executed by the one or more processors, the one or more programs cause the one or more processors to implement the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. The computer program, when executed by a processor, implements the method of the first aspect.
It should be understood that the content described in this Summary section is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the description below.
Brief description of the drawings
The above and other objects, features, and advantages of the embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, in which:
Fig. 1 shows a schematic diagram of an example environment in which some embodiments of the present disclosure can be implemented;
Fig. 2 shows a schematic flowchart of a method for extracting entity-related information according to an embodiment of the present disclosure;
Fig. 3 shows a schematic block diagram of an apparatus for extracting entity-related information according to an embodiment of the present disclosure;
Fig. 4 shows a schematic block diagram of a general technical framework for extracting the attribute value of an entity attribute according to an embodiment of the present disclosure; and
Fig. 5 shows a schematic block diagram of a device that can be used to implement embodiments of the present disclosure.
Throughout the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description
The principle and spirit of the present disclosure are described below with reference to several exemplary embodiments shown in the drawings. It should be understood that these specific embodiments are described only to enable those skilled in the art to better understand and implement the present disclosure, and not to limit the scope of the present disclosure in any way.
As mentioned above, traditional ways of extracting entity relations mainly include purely open extraction and structured extraction. However, both have problems and shortcomings. For example, purely open extraction is mainly used for batch extraction of knowledge, but it has high latency and a long update cycle when extracting new entities and newly added knowledge, so it cannot solve the problem of timely knowledge updates. Structured extraction, on the other hand, has the main drawback of high labor cost: extraction templates must be configured manually according to the web page structure, and only a limited degree of targeted extraction can be achieved. By configuring templates for a target category, targeting at category granularity can be achieved, but targeting at "entity + attribute" granularity cannot.
In view of the above problems and other potential problems in traditional schemes, embodiments of the present disclosure propose a method, an apparatus, an electronic device, and a computer-readable storage medium for extracting entity-related information, so as to improve timeliness and reduce labor cost when extracting entity-related information. Specifically, embodiments of the present disclosure propose a targeted knowledge extraction technique, which is mainly used to extract, in a targeted manner, the attribute value corresponding to a given "entity-attribute" pair. The proposed technique aims to extract high-confidence entity relation data in a targeted manner from a text library (for example, the massive text of the Internet) by means of information extraction techniques.
From the perspective of knowledge graph construction, the proposed targeted extraction technique can extract relation attribute values that are missing for an entity, and can be used to improve the connectivity of a knowledge graph and efficiently improve its informativeness and completeness. From a product perspective, the supplemented entity relation data can directly meet users' needs for entity associations and can also effectively improve the efficiency with which people retrieve and browse entities, improving the user experience. Typical applications may include entity question answering and entity recommendation.
Compared with traditional entity information extraction schemes, embodiments of the present disclosure on the one hand solve the timeliness problem. If a new entity appears, or an entity becomes highly popular within a short time, embodiments can, because of the short update cycle, quickly extract the missing attribute values of the new or popular entity, supplement the entity's attributes, and improve the knowledge graph's coverage of timely "entity-attribute-attribute value" triples. On the other hand, embodiments of the present disclosure reduce labor cost. For example, a deep learning model is used to model all "entity-attribute-attribute value" relations in a unified way, so neither a deep understanding of domain knowledge nor the design of complex high-level features is required, which makes the scheme easy to maintain and extend. Several embodiments of the present disclosure are described below with reference to the drawings.
Fig. 1 shows a schematic diagram of an example environment (or system) 100 in which some embodiments of the present disclosure can be implemented. As shown in Fig. 1, in the example environment 100, a predetermined entity 105 and a predetermined attribute 110 may be input into a computing device 120 so that the computing device 120 obtains, for example from the texts of a text library (not shown), an attribute value 160 of the predetermined attribute 110 of the predetermined entity 105. In some embodiments, the text library may include a collection of texts obtained from the Internet. In other embodiments, the text library may include any appropriate collection of texts describing any attribute of any entity, including but not limited to text collections of various uses and sources.
In the context of the present disclosure, the term "entity" refers to a distinguishable, independently existing thing, such as a particular person, city, plant, or commodity. Everything in the world is made up of concrete things, which are referred to as entities, for example "China", "the United States", and "Japan". The term "attribute" refers to a property of an entity or a relation between the entity and another entity. For example, an attribute may be a person's height, gender, or birthplace. An attribute may also be a relation between one entity and another, such as husband, father, or friend. The term "attribute value" refers to the specific content of an entity's attribute, or to another entity that has a certain relation with the entity. For example, the attribute value of the attribute "gender" of a person may be "male". As another example, the attribute value of a relational attribute (for example, wife) of an entity (for example, Yao Ming) may be another entity (for example, Ye Li). It should be understood that the above definitions are merely exemplary and are given to aid understanding of the present disclosure, and are not intended to limit its scope in any way. In other embodiments, the terms used herein have the technical meanings generally understood by those skilled in the art.
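As a minimal sketch of how the three terms relate, an entity-attribute pair can be treated as a query whose answer is the attribute value. The class names `EntityAttributePair` and `Triple` and the method `with_value` are illustrative assumptions, not structures named in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A complete "entity-attribute-attribute value" record."""
    entity: str
    attribute: str
    value: str


@dataclass(frozen=True)
class EntityAttributePair:
    """The input to targeted extraction: the value is still unknown."""
    entity: str
    attribute: str

    def with_value(self, value: str) -> Triple:
        # Filling in the extracted value completes the triple.
        return Triple(self.entity, self.attribute, value)


# A relational attribute: the value is itself another entity.
query = EntityAttributePair("Yao Ming", "wife")
wife = query.with_value("Ye Li")
```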
With continued reference to Fig. 1, the computing device 120 may, based on the input predetermined entity 105 and predetermined attribute 110, obtain from the text library multiple candidate texts 140-1 to 140-N associated with the predetermined entity 105 and the predetermined attribute 110 (hereinafter collectively referred to as the multiple candidate texts 140). Because the multiple candidate texts 140 are related to the predetermined entity 105 and the predetermined attribute 110, the computing device 120 may be able to extract the attribute value 160 from them. In addition, to improve the performance and robustness of the system 100, the computing device 120 may filter the multiple candidate texts 140. To this end, the computing device 120 may, based on the semantics of the entity-attribute pair formed by the predetermined entity 105 and the predetermined attribute 110, determine from the multiple candidate texts 140 at least one target text 150-1 to 150-M (hereinafter collectively referred to as the target texts 150) to be used for extracting the attribute value 160, where M and N are positive integers and M is less than or equal to N. The computing device 120 may then determine the attribute value 160 of the predetermined attribute 110 of the predetermined entity 105 based on the determined at least one target text 150.
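The three-stage flow described above (obtain N candidate texts, keep M target texts, extract the value) can be sketched as follows. This is only an outline under stated assumptions: the function `extract_attribute_value` and the three toy stand-ins (`toy_retrieve`, `toy_is_relevant`, `toy_extract`) are hypothetical placeholders for the retrieval, semantic filtering, and extraction components, which the disclosure leaves open.

```python
from typing import Callable, List, Optional


def extract_attribute_value(
    entity: str,
    attribute: str,
    retrieve: Callable[[str, str], List[str]],
    is_relevant: Callable[[str, str, str], bool],
    extract: Callable[[str, str, List[str]], Optional[str]],
) -> Optional[str]:
    """Run retrieval, semantic filtering, and value extraction in order."""
    candidates = retrieve(entity, attribute)                    # N candidate texts
    targets = [t for t in candidates
               if is_relevant(entity, attribute, t)]            # M <= N target texts
    if not targets:
        return None
    return extract(entity, attribute, targets)


# Toy stand-ins, for illustration only.
library = [
    "Yao Ming was born in Shanghai.",
    "Yao Ming visited a friend whose birthplace is Beijing.",
    "Shanghai is a large city.",
]


def toy_retrieve(entity, attribute):
    return [t for t in library if entity in t]


def toy_is_relevant(entity, attribute, text):
    return "was born in" in text


def toy_extract(entity, attribute, targets):
    # Take the last word of the first target text, without the period.
    return targets[0].rsplit(" ", 1)[-1].rstrip(".")


value = extract_attribute_value(
    "Yao Ming", "birthplace", toy_retrieve, toy_is_relevant, toy_extract
)
```

Passing the three stages in as callables keeps the outline independent of any particular retrieval engine or model.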
It will be understood that the computing device 120 may be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, station, unit, device, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/video camera, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination thereof, including accessories and peripherals of these devices or any combination thereof. It is also contemplated that the computing device 120 can support any type of user interface (such as "wearable" circuitry). Example operations for extracting entity-related information according to embodiments of the present disclosure are described below with reference to Fig. 2.
Fig. 2 shows a schematic flowchart of a method 200 for extracting entity-related information according to an embodiment of the present disclosure. In some embodiments, the method 200 may be implemented by the computing device 120 of Fig. 1, for example by a processor or processing unit of the computing device 120. In other embodiments, all or part of the method 200 may be implemented by a computing device independent of the computing device 120, or by other units in the example environment 100. For ease of discussion, the method 200 is described in conjunction with Fig. 1.
At 210, the computing device 120 obtains multiple candidate texts 140 associated with the predetermined entity 105 and the predetermined attribute 110. It should be understood that the computing device 120 may obtain the multiple candidate texts 140 in any appropriate manner, as long as they are associated with the predetermined entity 105 and the predetermined attribute 110; embodiments of the present disclosure are not limited in this respect. For example, for a particular attribute of certain specific entities, a collection of texts introducing or explaining that attribute of the entity may already exist. In this case, the computing device 120 may obtain the multiple candidate texts 140 by importing that text collection.
More generally, in some embodiments, the computing device 120 may obtain the multiple candidate texts 140 by searching a text library. For example, the computing device 120 may determine entity search terms corresponding to the predetermined entity 105 and attribute search terms corresponding to the predetermined attribute 110. The computing device 120 may then use the determined entity search terms and attribute search terms to retrieve the multiple candidate texts 140 from the text library. In this way, the computing device 120 can find texts in the text library that are related to the predetermined entity 105 and the predetermined attribute 110. As noted above, the text library used for retrieval may include a collection of texts obtained from the Internet. Additionally or alternatively, it may include any appropriate collection of texts describing any attribute of any entity, including but not limited to text collections of various uses and sources.
In some embodiments, the entity search terms used by the computing device 120 may include the name of the predetermined entity 105, its aliases, other keywords that may refer to the predetermined entity 105, and any combination thereof. Similarly, the attribute search terms used by the computing device 120 may include the name of the predetermined attribute 110, its aliases, trigger words, other keywords related to the predetermined attribute 110, and any combination thereof. As used herein, a trigger word of an attribute is a word that can lead to a certain attribute of an entity. For example, the trigger word "marriage" can lead to the entity attribute "spouse". In this way, the computing device 120 can avoid missing texts related to the predetermined entity 105 and the predetermined attribute 110 during retrieval.
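The use of expanded search terms can be sketched as follows, assuming a text library small enough to scan in memory. The function name `retrieve_candidates`, the alias "Big Yao", and the sample texts are illustrative assumptions; a real deployment would more likely query a search index.

```python
def retrieve_candidates(text_library, entity_terms, attribute_terms):
    """Keep texts that mention at least one entity term and one attribute term."""
    return [
        text
        for text in text_library
        if any(term in text for term in entity_terms)
        and any(term in text for term in attribute_terms)
    ]


# Name plus a hypothetical alias for the entity.
entity_terms = ["Yao Ming", "Big Yao"]
# Attribute name, an alias, and trigger words for the attribute "spouse".
attribute_terms = ["spouse", "wife", "marriage", "married"]

text_library = [
    "Yao Ming married Ye Li.",
    "Yao Ming played basketball.",
    "The marriage law was revised.",
]

candidates = retrieve_candidates(text_library, entity_terms, attribute_terms)
```

Without the trigger word "married", the first text would be missed even though it states the spouse directly, which is the omission the expanded terms are meant to avoid.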
In some embodiments, in order to extract the related information of popular entities or new entities in a targeted manner, the computing device 120 may determine a newly emerging entity, or an entity whose search frequency is higher than a threshold, as the predetermined entity 105. As an example of a popular entity, suppose there is currently a person who attracts a high degree of public attention (such as a certain celebrity) and who has a high search frequency on a search platform, which indicates that the person is an entity with high popularity within a short time. In this case, the computing device 120 may take that person as the predetermined entity 105. To this end, the computing device 120 may compare the search frequency of an entity with a predetermined threshold to determine whether the entity has a high search frequency. The threshold here can be selected reasonably according to the specific system environment and design requirements. As an example of a new entity, if a recently built amusement park is about to open to the public, the amusement park is a newly emerged entity. In this case, the computing device 120 may take the amusement park as the predetermined entity 105.
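A minimal sketch of this selection rule follows. The function `select_predetermined_entities`, the search counts, and the threshold of 10000 are made-up illustrative values; the disclosure leaves the threshold to the system environment and design requirements.

```python
def select_predetermined_entities(search_counts, known_entities, frequency_threshold):
    """Select entities that are new, or whose search frequency exceeds the threshold."""
    selected = []
    for entity, count in search_counts.items():
        is_new = entity not in known_entities        # not yet in the knowledge graph
        is_popular = count > frequency_threshold     # high search frequency
        if is_new or is_popular:
            selected.append(entity)
    return selected


# Hypothetical search statistics over some recent time window.
search_counts = {
    "Star A": 12000,
    "New Amusement Park": 300,
    "Old Museum": 150,
}
known_entities = {"Star A", "Old Museum"}

targets = select_predetermined_entities(search_counts, known_entities, 10000)
```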
After the predetermined entity 105 has been determined, the computing device 120 may determine the predetermined attribute 110 based on the predetermined entity 105. For example, when a certain celebrity is determined as the predetermined entity 105, the computing device 120 may correspondingly determine the predetermined attribute 110 to be an attribute related to that celebrity, such as height, weight, birthplace, alma mater, or boyfriend or girlfriend. As another example, when a new amusement park is determined as the predetermined entity 105, the computing device 120 may correspondingly determine the predetermined attribute 110 to be an attribute related to that amusement park, such as address, floor area, business hours, or amusement facilities.
At 220, the computing device 120 determines at least one target text 150 from the multiple candidate texts 140 based on the semantics of the entity-attribute pair formed by the predetermined entity 105 and the predetermined attribute 110. It will be understood that although the multiple candidate texts 140 are associated with the predetermined entity 105 and the predetermined attribute 110, this does not mean that they are necessarily related in meaning to the semantics of the entity-attribute pair. For example, a text may contain the entity "Yao Ming" and the attribute "height", yet its meaning is not necessarily related to "the height of Yao Ming"; it may merely mention Yao Ming while describing the height of another person. Therefore, by selecting the at least one target text 150 based on the semantics of the entity-attribute pair of the predetermined entity 105 and the predetermined attribute 110, the computing device 120 can filter all the obtained candidate texts 140. This reduces the amount of text used for extracting the attribute value 160 and retains only texts that are semantically related to the entity-attribute pair and from which the attribute value 160 can be extracted, thereby improving the performance and robustness of the system 100.
In some embodiments, for a given candidate text 140-1 among the multiple candidate texts 140, the computing device 120 may process the candidate text 140-1 to determine its semantics. For example, the computing device 120 may obtain the word segmentation and part-of-speech recognition results of the candidate text 140-1 through a part-of-speech recognition tool, obtain the sentence dependency recognition result of the candidate text 140-1 through a dependency analysis tool, and obtain the entity recognition and hypernym recognition results of the candidate text 140-1 through a subgraph association tool. It should be understood that the computing device 120 may also determine the semantics of the candidate text 140-1 through any other semantic analysis method.
The computing device 120 may then determine the similarity between the semantics of the candidate text 140-1 and the semantics of the entity-attribute pair of the predetermined entity 105 and the predetermined attribute 110. For example, the computing device 120 may call a semantic-relevance text validity classification model (also called an operator) to compute the semantic relevance, and call a classification algorithm to judge whether the semantics of the candidate text 140-1 are related to the semantics of the entity-attribute pair formed by the predetermined entity 105 and the predetermined attribute 110, and thereby filter out of the candidate texts 140 the texts unrelated to those semantics. It should be understood that the computing device 120 may also determine the above semantic relevance through any other method of determining semantic similarity. Then, if the determined semantic similarity is higher than a threshold, the computing device 120 may select the candidate text 140-1 as one of the at least one target text 150. The threshold here can be selected reasonably according to the specific system environment and design requirements.
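The thresholding step can be sketched as follows. Note the loud assumption: `proximity_score` is a crude word-distance heuristic swapped in purely so the sketch runs; it is not the semantic-relevance classification model mentioned above, and a trained classifier would take its place in practice. The function names, sample texts, and the threshold of 0.2 are likewise illustrative.

```python
def proximity_score(entity, attribute, text):
    """Toy relevance score in [0, 1]: closer mentions score higher.

    A stand-in for a trained semantic-relevance classifier (assumption).
    """
    e_pos = text.find(entity)
    a_pos = text.find(attribute)
    if e_pos < 0 or a_pos < 0:
        return 0.0
    between = text[min(e_pos, a_pos):max(e_pos, a_pos)]
    gap = len(between.split())          # number of words separating the mentions
    return 1.0 / (1 + gap)


def select_target_texts(entity, attribute, candidates, score_fn, threshold):
    """Keep candidate texts whose relevance score is higher than the threshold."""
    return [t for t in candidates if score_fn(entity, attribute, t) > threshold]


candidates = [
    "Yao Ming's height made him a dominant center.",
    "Yao Ming met a young player, and the player's height surprised everyone.",
]
targets = select_target_texts(
    "Yao Ming", "height", candidates, proximity_score, threshold=0.2
)
```

Both texts contain the entity and the attribute, but only the first is kept, mirroring the case where a text mentions Yao Ming while describing another person's height.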
In addition, in some embodiments, before determining the semantics of the multiple candidate texts 140, the computing device 120 may pre-filter the multiple candidate texts 140 to filter out candidate texts 140 that are unrelated to the semantics of the entity-attribute pair of the predetermined entity 105 and the predetermined attribute 110. For example, the computing device 120 may perform preliminary filtering of the multiple candidate texts 140 based on features such as whether a candidate text 140 contains a name of the predetermined entity 105 (including the entity's name and aliases), whether it contains a name of the predetermined attribute 110 (including the attribute's name, aliases, and trigger words), whether the text length falls within a predefined length interval, and the proportion of Chinese characters in the text, so as to exclude candidate texts 140 that are obviously unrelated to the semantics of the entity-attribute pair.
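The pre-filtering features listed above can be sketched as a single predicate. The function names, the length interval of 5 to 200 characters, and the minimum Chinese-character ratio of 0.5 are assumed values for illustration; the Chinese-character test uses the basic CJK Unified Ideographs range only.

```python
def chinese_ratio(text):
    """Fraction of characters in the basic CJK Unified Ideographs block."""
    if not text:
        return 0.0
    han = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return han / len(text)


def passes_prefilter(text, entity_terms, attribute_terms,
                     min_len=5, max_len=200, min_chinese_ratio=0.5):
    """Cheap checks applied before any semantic analysis (thresholds assumed)."""
    if not any(term in text for term in entity_terms):
        return False                      # no entity name or alias
    if not any(term in text for term in attribute_terms):
        return False                      # no attribute name, alias, or trigger word
    if not (min_len <= len(text) <= max_len):
        return False                      # outside the predefined length interval
    return chinese_ratio(text) >= min_chinese_ratio


entity_terms = ["姚明"]
attribute_terms = ["出生地", "出生于"]

kept = passes_prefilter("姚明的出生地是上海市徐汇区。", entity_terms, attribute_terms)
no_attribute = passes_prefilter("姚明是一名篮球运动员。", entity_terms, attribute_terms)
mostly_latin = passes_prefilter(
    "姚明出生地 click here to download the full profile now",
    entity_terms, attribute_terms,
)
```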
At 230, the computing device 120 determines the attribute value 160 of the predetermined attribute 110 of the predetermined entity 105 based on the at least one target text 150. It should be understood that the computing device 120 may use any existing extraction method, or any extraction method developed in the future, to extract the attribute value 160 from the at least one target text 150; embodiments of the present disclosure are not limited in this respect. For example, the computing device 120 may use an extraction model based on deep learning to extract the attribute value 160 from the at least one target text 150. Additionally or alternatively, the computing device 120 may use other types of extraction models to extract the attribute value 160 from the at least one target text 150.
In some embodiments, in order to improve the accuracy of extracting the attribute value 160, the computing device 120 may use multiple different extraction models with different model structures to extract, based on the predetermined entity 105 and the predetermined attribute 110, multiple candidate attribute values from the at least one target text 150. It will be understood that the multiple different extraction models may include any model that can extract an attribute value from a given text according to a predetermined entity and a predetermined attribute, for example multiple neural-network-based extraction models with different neural network structures.
By way of example, the computing device 120 may use three different extraction models. The first extraction model may be a slot filling model, which is a deep learning model based on a deep learning computing framework (for example, the PaddlePaddle platform) and is an attribute value extraction model built on the slot filling task (given a known entity and attribute, extract the attribute value). The other two extraction models may be reading comprehension models with two different structures; they are attribute value extraction models built on the reading comprehension task. Both reading comprehension models may convert the entity and attribute into a query and take the query and the text as model input, so as to mark the start position and end position of the attribute value in the text. It will be understood that the specific models and the number of models given here are merely exemplary and are not intended to limit the scope of the present disclosure in any way. In other embodiments, the computing device 120 may use any number of any different models to extract the attribute value 160.
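The reading comprehension formulation can be sketched as follows: the entity and attribute are turned into a query, and the answer is the token span between a predicted start and end position. The query wording, the function names, and the per-token scores below are assumptions for illustration; the scores stand in for the outputs of a trained model, and the span search shown is a common decoding strategy rather than one specified by the disclosure.

```python
def build_query(entity, attribute):
    """Turn an entity-attribute pair into a natural-language query (assumed wording)."""
    return f"What is the {attribute} of {entity}?"


def decode_span(tokens, start_scores, end_scores, max_span_len=10):
    """Return the token span maximizing start score + end score, with start <= end."""
    best_span, best_score = None, float("-inf")
    for start in range(len(tokens)):
        last = min(start + max_span_len, len(tokens))
        for end in range(start, last):
            score = start_scores[start] + end_scores[end]
            if score > best_score:
                best_span, best_score = (start, end), score
    start, end = best_span
    return " ".join(tokens[start:end + 1]), best_score


query = build_query("Yao Ming", "birthplace")
tokens = ["Yao", "Ming", "was", "born", "in", "Shanghai", "."]

# Hypothetical per-token scores that a reading comprehension model might output
# given (query, text) as input.
start_scores = [0.1, 0.0, 0.0, 0.0, 0.2, 2.5, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.0, 0.0, 2.7, 0.3]

answer, answer_score = decode_span(tokens, start_scores, end_scores)
```

The returned score can serve as the confidence score that the model assigns to the extracted candidate attribute value for this text.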
After extracting multiple candidate attribute values using extraction models with different model structures, the computing device 120 may determine the respective confidence of each of the candidate attribute values. As an example, suppose the predetermined entity 105 is "Yao Ming" and the predetermined attribute 110 is "birthplace"; the candidate attribute values extracted by the different models from the at least one target text 150 might then be China, the United States, Beijing, and Shanghai. In this case, the computing device 120 may determine the respective confidence of these four candidate attribute values, that is, the probability that each is the correct birthplace of Yao Ming.
It will be understood that the computing device 120 may determine the confidence of a candidate attribute value in any appropriate manner, including but not limited to obtaining it from the attribute value extraction model, verifying it against other databases, or determining its degree of correlation with other attributes of the predetermined entity. For example, in the above example of Yao Ming's birthplace, the computing device 120 may determine that the respective confidences of China, the United States, Beijing, and Shanghai are 0.7, 0.3, 0.5, and 0.8.
The computing device 120 may then select, from the multiple candidate attribute values, an attribute value whose confidence is higher than a threshold. As an example, the threshold here may be set to 0.7, so the computing device 120 may select "Shanghai" as the attribute value of the predetermined attribute "birthplace" of the predetermined entity "Yao Ming". It should be understood that the specific numerical values and place names given here are merely examples and are not intended to limit the scope of the present disclosure in any way. In addition, the threshold here can be selected reasonably according to the specific system environment and design requirements. As an alternative way of selecting the attribute value from the multiple candidate attribute values, the computing device 120 may also select the attribute value with the highest confidence.
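Both selection strategies can be sketched with the example confidences given above. The function names are illustrative; the values 0.7, 0.3, 0.5, 0.8 and the threshold 0.7 are the ones from the example.

```python
def select_above_threshold(confidences, threshold):
    """Return candidate values whose confidence is strictly higher than the threshold."""
    return [value for value, conf in confidences.items() if conf > threshold]


def select_most_confident(confidences):
    """Alternative: return the single candidate value with the highest confidence."""
    return max(confidences, key=confidences.get)


confidences = {
    "China": 0.7,
    "United States": 0.3,
    "Beijing": 0.5,
    "Shanghai": 0.8,
}

above = select_above_threshold(confidences, threshold=0.7)
best = select_most_confident(confidences)
```

Because the comparison is strict, "China" at exactly 0.7 is not selected, which is consistent with "Shanghai" being the only value chosen in the example.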
In some embodiments, at least one target text 150 may include multiple target text 150-1 to 150-M.? In this case, different extraction models may extract identical candidate value from different target texts.Therefore, it is Determine the respective confidence of each candidate value in multiple candidate values, multiple times can be directed to by calculating equipment 120 The given candidate value in attribute value is selected, determines the multiple of the extraction model and target text for extracting given candidate value Pairing.
Continue the example being used above, without loss of generality, it is assumed that candidate value " Shanghai " extracts model from the by first It extracts, is also extracted from the second target text 150-2 by the first extraction model, also by second in one target text 150-1 It extracts model to extract from the second target text 150-2, also be extracted from the 4th target text 150-4 by the second extraction model Out, also model is extracted by third to extract from third target text 150-3.In this case, for candidate value " on Sea ", the following multiple pairings for extracting attribute value " Shanghai " can be determined by calculating equipment 120: first extracts model and first Target text 150-1, first extract model and the second target text 150-2, the second extraction model and the second target text 150- 2, second model and the 4th target text 150-4 and third extraction model and third target text 150-3 are extracted.
Then, the computing device 120 may obtain multiple confidence scores of the candidate attribute value, each confidence score being associated with one of the multiple pairings. Continuing the example above, for the candidate attribute value "Shanghai", the first extraction model gives a confidence score of 0.6 with respect to the first target text 150-1, the first extraction model gives 0.5 with respect to the second target text 150-2, the second extraction model gives 0.8 with respect to the second target text 150-2, the second extraction model gives 0.7 with respect to the fourth target text 150-4, and the third extraction model gives 0.6 with respect to the third target text 150-3. In this case, the computing device 120 obtains the confidence scores 0.6, 0.5, 0.8, 0.7 and 0.6 for the candidate attribute value "Shanghai".
Then, the computing device 120 may add up the multiple confidence scores of the candidate attribute value to obtain its confidence. In the above example, the computing device 120 may add the confidence scores 0.6, 0.5, 0.8, 0.7 and 0.6 of the attribute value "Shanghai", thereby determining that the confidence of the candidate attribute value "Shanghai" is 3.2. In this way, the computing device 120 can evaluate the confidence of a candidate attribute value comprehensively and quantitatively. Similarly, the computing device 120 may calculate the confidence of the other candidate attribute values (such as China, the United States and Beijing), and finally select the attribute value whose confidence is higher than the threshold.
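The summation over pairings described above can be sketched as follows. This is only an illustrative sketch: the record layout, the identifiers such as `model_1` and `text_150_1`, and the function names are assumptions made for the example and do not come from the disclosure; only the scores for "Shanghai" are taken from the text.

```python
from collections import defaultdict

# Hypothetical records: (extraction model, target text, candidate value, score).
# The five scores for "Shanghai" are the ones given in the example above.
extractions = [
    ("model_1", "text_150_1", "Shanghai", 0.6),
    ("model_1", "text_150_2", "Shanghai", 0.5),
    ("model_2", "text_150_2", "Shanghai", 0.8),
    ("model_2", "text_150_4", "Shanghai", 0.7),
    ("model_3", "text_150_3", "Shanghai", 0.6),
]


def aggregate_confidence(records):
    """Sum the per-pairing confidence scores of each candidate value."""
    totals = defaultdict(float)
    for _model, _text, value, score in records:
        totals[value] += score
    return dict(totals)


def select_above_threshold(confidences, threshold):
    """Return the candidate values whose summed confidence exceeds the threshold."""
    return [value for value, conf in confidences.items() if conf > threshold]


def select_most_confident(confidences):
    """Alternative selection: the candidate value with the highest confidence."""
    return max(confidences, key=confidences.get)


confidences = aggregate_confidence(extractions)
```

Because a value extracted by more models and from more texts accumulates more score, the sum rewards agreement across sources; the threshold passed to `select_above_threshold` would need to be chosen with that scale in mind.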
Fig. 3 shows a schematic block diagram of an apparatus 300 for extracting entity-related information according to an embodiment of the present disclosure. In some embodiments, the apparatus 300 may be included in, or implemented as, the computing device 120 of Fig. 1.
As shown in Fig. 3, the apparatus 300 includes a candidate text obtaining module 310, a target text determining module 320 and an attribute value determining module 330. The candidate text obtaining module 310 is configured to obtain multiple candidate texts associated with a predetermined entity and a predetermined attribute. The target text determining module 320 is configured to determine at least one target text from the multiple candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute. The attribute value determining module 330 is configured to determine the attribute value of the predetermined attribute of the predetermined entity based on the at least one target text.
In some embodiments, the candidate text obtaining module 310 includes: a search term determining module configured to determine an entity search term corresponding to the predetermined entity and an attribute search term corresponding to the predetermined attribute; and a retrieval module configured to retrieve the multiple candidate texts from a text library using the entity search term and the attribute search term.
In some embodiments, the entity search term includes at least one of a name and an alias of the predetermined entity, and the attribute search term includes at least one of a name, an alias and an introducer of the predetermined attribute, the introducer being a word that introduces the predetermined attribute of the predetermined entity.
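As a rough illustration of retrieval with entity and attribute search terms, the sketch below keeps every text that mentions at least one entity term and at least one attribute term. The in-memory list standing in for the text library, the substring matching, and the function name `retrieve_candidates` are assumptions made for this example; the disclosure does not specify how the text library is queried.

```python
def retrieve_candidates(text_library, entity_terms, attribute_terms):
    """Keep texts mentioning at least one entity term and one attribute term."""
    return [
        text
        for text in text_library
        if any(term in text for term in entity_terms)
        and any(term in text for term in attribute_terms)
    ]


# Hypothetical toy text library.
text_library = [
    "Yao Ming was born in Shanghai.",
    "Yao Ming played for the Houston Rockets.",
    "Shanghai is a city in China.",
]

# Entity name, plus the attribute name and an introducer for "birthplace".
candidates = retrieve_candidates(
    text_library,
    entity_terms=["Yao Ming"],
    attribute_terms=["birthplace", "born in"],
)
```

Including the introducer "born in" alongside the attribute name matters here: the only relevant sentence never uses the word "birthplace".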
In some embodiments, the apparatus 300 further includes: a predetermined entity determining module configured to determine a newly appearing entity, or an entity whose search frequency is higher than a threshold, as the predetermined entity; and a predetermined attribute determining module configured to determine the predetermined attribute based on the predetermined entity.
In some embodiments, for a given candidate text among the multiple candidate texts, the target text determining module 320 includes: a processing module configured to process the given candidate text to determine the semantics of the given candidate text; a similarity determining module configured to determine the similarity between the semantics of the given candidate text and the semantics of the entity-attribute pair; and a target text selecting module configured to select, in response to the similarity being higher than a threshold, the given candidate text as one of the at least one target text.
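The similarity-based selection can be sketched as below, under the assumption that the semantics of each candidate text and of the entity-attribute pair have already been encoded as numeric vectors by some model. Cosine similarity, the vector values, and the names `cosine_similarity` and `select_target_texts` are illustrative choices, not details given in the disclosure.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_target_texts(candidate_vectors, pair_vector, threshold):
    """Return ids of candidate texts whose similarity to the pair exceeds the threshold."""
    return [
        text_id
        for text_id, vector in candidate_vectors.items()
        if cosine_similarity(vector, pair_vector) > threshold
    ]


# Hypothetical semantic vectors for the entity-attribute pair and two candidates.
pair_vector = [1.0, 0.0]
candidate_vectors = {"text_a": [0.9, 0.1], "text_b": [0.0, 1.0]}
targets = select_target_texts(candidate_vectors, pair_vector, threshold=0.5)
```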
In some embodiments, the attribute value determining module 330 includes: an attribute value extracting module configured to extract multiple candidate attribute values from the at least one target text, based on the predetermined entity and the predetermined attribute, using multiple different extraction models having different model structures; a confidence determining module configured to determine the confidence of the multiple candidate attribute values; and an attribute value selecting module configured to select, from the multiple candidate attribute values, an attribute value whose confidence is higher than a threshold.
In some embodiments, the at least one target text includes multiple target texts, and, for a given candidate attribute value among the multiple candidate attribute values, the confidence determining module includes: a pairing determining module configured to determine the multiple pairings of extraction model and target text from which the given candidate attribute value was extracted; a score obtaining module configured to obtain multiple confidence scores of the given candidate attribute value, each associated with one of the multiple pairings; and a summing module configured to add up the multiple confidence scores to obtain the confidence of the given candidate attribute value.
Fig. 4 shows a schematic block diagram of a general technical framework 400 for extracting attribute values of entity attributes according to an embodiment of the present disclosure. As shown in Fig. 4, the general technical framework 400 may include an attribute value extraction tool 401 and external tools 403. In some embodiments, the attribute value extraction tool 401 may use the external tools 403 to implement embodiments of the present disclosure, such as the method 200 described with respect to Fig. 2. For example, after a predetermined entity-attribute pair 405 is input, the attribute value extraction tool 401 may perform targeted extraction, from a text library, of the attribute value 407 corresponding to the predetermined entity and the predetermined attribute.
The attribute value extraction tool 401 includes a text retrieval module 410, a text validity classification module 420, an attribute value extraction model 430 and a multi-source fusion module 440. The modules of the attribute value extraction tool 401 may use the retrieval interface 450, the library scanning tool 460, the dependency analysis and part-of-speech recognition module 470, the subgraph association module 480 and the deep learning framework 490 of the external tools 403 to extract the attribute value 407, as described in detail below.
The main function of the text retrieval module 410 may include obtaining, according to the input predetermined entity-attribute pair 405, the corpus texts used for attribute value extraction, for example through the retrieval interface 450 and the library scanning tool 460 (such as the seeksign library scanning tool). The text retrieval module 410 supports obtaining text information related to the predetermined entity-attribute pair from multiple text retrieval model sources, and is easy to extend with additional models.
Furthermore, since different entities often share the same name, the text retrieval module 410 may combine two ways of obtaining text, at entity granularity and at text granularity. Entity granularity means that other entities of the same name are not considered and only the text information corresponding to the input entity is extracted; text granularity means that all text information corresponding to all entities of the same name is considered at once. In some embodiments, the text retrieval models of the text retrieval module 410 may cover four classes of sources for obtaining relevant results: encyclopedia texts, entity pages, question-and-answer text libraries and general web search results. The first two may be at entity granularity, and the last two may be at text granularity.
The main function of the text validity classification module 420 may include filtering and classifying all the texts obtained by the text retrieval module 410, so as to reduce the amount of text sent to subsequent modules and retain only the texts that are semantically related to the predetermined entity-attribute pair and from which an attribute value can be extracted, thereby improving the performance and robustness of the system. In some embodiments, the text validity classification module 420 may implement, for example, a semantics-independent pre-filtering function, a semantic information obtaining function and a semantic relevance classification function.
The semantics-independent pre-filtering function may perform preliminary filtering using features such as whether the text contains the name of the entity (including its name and aliases), whether it contains the name of the attribute (including its name, aliases and introducers), the text length, and the proportion of Chinese characters in the text. The semantic information obtaining function may, for example, obtain word segmentation and part-of-speech tagging results through a part-of-speech recognition tool, obtain the dependency parse of a sentence through a dependency analysis tool, and obtain entity recognition and hypernym concept recognition results through a subgraph association tool. The semantic relevance classification function may, for example, call a semantics-aware text validity classification model to compute semantic features, and call a classification algorithm to decide whether a text is semantically related to the predetermined entity and the predetermined attribute, thereby filtering out semantically unrelated texts.
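The semantics-independent pre-filter lends itself to a short sketch using the four features named above. The length bounds, the minimum Chinese character ratio, the restriction to the basic CJK Unified Ideographs block, and the function names are all assumptions chosen for illustration; the disclosure does not give concrete values.

```python
def chinese_char_ratio(text):
    """Fraction of characters in the basic CJK Unified Ideographs block."""
    if not text:
        return 0.0
    chinese = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return chinese / len(text)


def passes_prefilter(text, entity_terms, attribute_terms,
                     min_len=5, max_len=500, min_chinese_ratio=0.5):
    """Cheap surface checks applied before any semantic model is called.

    All thresholds are hypothetical defaults, not values from the disclosure.
    """
    if not (min_len <= len(text) <= max_len):
        return False
    if chinese_char_ratio(text) < min_chinese_ratio:
        return False
    has_entity = any(term in text for term in entity_terms)
    has_attribute = any(term in text for term in attribute_terms)
    return has_entity and has_attribute


# "姚明" is Yao Ming; "出生地" (birthplace) is the attribute name and
# "出生于" (was born in) is an introducer for it.
entity_terms = ["姚明"]
attribute_terms = ["出生地", "出生于"]
keep = passes_prefilter("姚明出生于上海。", entity_terms, attribute_terms)
```

Running such checks first keeps the more expensive part-of-speech, dependency and classification steps from being applied to texts that cannot contain the attribute value.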
The main function of the attribute value extraction model 430 may include, given a predetermined entity, a predetermined attribute and a text from which an attribute value is to be extracted, extracting from the text the attribute value corresponding to the entity-attribute pair. The attribute value extraction model 430 supports adding multiple extraction models, so that results are obtained from each of the extraction models separately, and it is easy to extend with further models.
The input of the multi-source fusion module 440 may be entity-attribute-text-attribute value tuples, and its output may be entity-attribute-attribute value triples. Its main function may include, for each entity-attribute pair, calling a knowledge fusion model to perform multi-source fusion and selection over the attribute values that the multiple attribute value extraction models output from the multiple target texts, and finally outputting the attribute value 407. In the multi-source fusion module 440, the extraction results of additional extraction models in the attribute value extraction model 430 can easily be included among the candidate attribute values taking part in the selection.
Fig. 5 schematically shows a block diagram of a device 500 that can be used to implement embodiments of the present disclosure. As shown in Fig. 5, the device 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 502 or computer program instructions loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Multiple components in the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and loudspeakers; a storage unit 508, such as a magnetic disk or an optical disc; and a communication unit 509, such as a network card, a modem or a wireless communication transceiver. The communication unit 509 allows the device 500 to exchange information and data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processes and processing described above, such as the method 200, can be executed by the CPU 501. For example, in some embodiments, the method 200 can be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the CPU 501, one or more steps of the method 200 described above can be performed.
As used herein, the term "include" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second" and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included herein.
As used herein, the term "determine" covers a wide variety of actions. For example, "determining" may include computing, calculating, processing, deriving, investigating, looking up (for example, looking up in a table, a database or another data structure), ascertaining and the like. In addition, "determining" may include receiving (for example, receiving information), accessing (for example, accessing data in a memory) and the like. In addition, "determining" may include resolving, selecting, choosing, establishing and the like.
It should be noted that embodiments of the present disclosure can be implemented in hardware, in software, or in a combination of software and hardware. The hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the devices and methods described above can be implemented using computer-executable instructions and/or processor control code, such code being provided, for example, in a programmable memory or on a data carrier such as an optical or electronic signal carrier.
In addition, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Rather, the steps depicted in the flowcharts may be executed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps. It should also be noted that the features and functions of two or more apparatuses according to the present disclosure may be embodied in one apparatus. Conversely, the features and functions of one apparatus described above may be further divided and embodied in multiple apparatuses.
Although the present disclosure has been described with reference to several specific embodiments, it should be understood that the present disclosure is not limited to the specific embodiments disclosed. The present disclosure is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (16)

1. A method for extracting entity-related information, comprising:
obtaining multiple candidate texts associated with a predetermined entity and a predetermined attribute;
determining at least one target text from the multiple candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute; and
determining the attribute value of the predetermined attribute of the predetermined entity based on the at least one target text.
2. The method according to claim 1, wherein obtaining the multiple candidate texts comprises:
determining an entity search term corresponding to the predetermined entity and an attribute search term corresponding to the predetermined attribute; and
retrieving the multiple candidate texts from a text library using the entity search term and the attribute search term.
3. The method according to claim 2, wherein the entity search term includes at least one of a name and an alias of the predetermined entity, and the attribute search term includes at least one of a name, an alias and an introducer of the predetermined attribute, the introducer being used to introduce the predetermined attribute of the predetermined entity.
4. The method according to claim 1, further comprising:
determining a newly appearing entity, or an entity whose search frequency is higher than a threshold, as the predetermined entity; and
determining the predetermined attribute based on the predetermined entity.
5. The method according to claim 1, wherein determining the at least one target text comprises, for a given candidate text among the multiple candidate texts:
processing the given candidate text to determine the semantics of the given candidate text;
determining the similarity between the semantics of the given candidate text and the semantics of the entity-attribute pair; and
in response to the similarity being higher than a threshold, selecting the given candidate text as one of the at least one target text.
6. The method according to claim 1, wherein determining the attribute value comprises:
extracting multiple candidate attribute values from the at least one target text, based on the predetermined entity and the predetermined attribute, using multiple different extraction models having different model structures;
determining the confidence of the multiple candidate attribute values; and
selecting, from the multiple candidate attribute values, an attribute value whose confidence is higher than a threshold.
7. The method according to claim 6, wherein the at least one target text includes multiple target texts, and wherein determining the confidence of the multiple candidate attribute values comprises, for a given candidate attribute value among the multiple candidate attribute values:
determining the multiple pairings of extraction model and target text from which the given candidate attribute value was extracted;
obtaining multiple confidence scores of the given candidate attribute value, each associated with one of the multiple pairings; and
adding up the multiple confidence scores to obtain the confidence of the given candidate attribute value.
8. An apparatus for extracting entity-related information, comprising:
a candidate text obtaining module configured to obtain multiple candidate texts associated with a predetermined entity and a predetermined attribute;
a target text determining module configured to determine at least one target text from the multiple candidate texts based on the semantics of the entity-attribute pair formed by the predetermined entity and the predetermined attribute; and
an attribute value determining module configured to determine the attribute value of the predetermined attribute of the predetermined entity based on the at least one target text.
9. The apparatus according to claim 8, wherein the candidate text obtaining module comprises:
a search term determining module configured to determine an entity search term corresponding to the predetermined entity and an attribute search term corresponding to the predetermined attribute; and
a retrieval module configured to retrieve the multiple candidate texts from a text library using the entity search term and the attribute search term.
10. The apparatus according to claim 9, wherein the entity search term includes at least one of a name and an alias of the predetermined entity, and the attribute search term includes at least one of a name, an alias and an introducer of the predetermined attribute, the introducer being used to introduce the predetermined attribute of the predetermined entity.
11. The apparatus according to claim 8, further comprising:
a predetermined entity determining module configured to determine a newly appearing entity, or an entity whose search frequency is higher than a threshold, as the predetermined entity; and
a predetermined attribute determining module configured to determine the predetermined attribute based on the predetermined entity.
12. The apparatus according to claim 8, wherein, for a given candidate text among the multiple candidate texts, the target text determining module comprises:
a processing module configured to process the given candidate text to determine the semantics of the given candidate text;
a similarity determining module configured to determine the similarity between the semantics of the given candidate text and the semantics of the entity-attribute pair; and
a target text selecting module configured to select, in response to the similarity being higher than a threshold, the given candidate text as one of the at least one target text.
13. The apparatus according to claim 8, wherein the attribute value determining module comprises:
an attribute value extracting module configured to extract multiple candidate attribute values from the at least one target text, based on the predetermined entity and the predetermined attribute, using multiple different extraction models having different model structures;
a confidence determining module configured to determine the confidence of the multiple candidate attribute values; and
an attribute value selecting module configured to select, from the multiple candidate attribute values, an attribute value whose confidence is higher than a threshold.
14. The apparatus according to claim 13, wherein the at least one target text includes multiple target texts, and wherein, for a given candidate attribute value among the multiple candidate attribute values, the confidence determining module comprises:
a pairing determining module configured to determine the multiple pairings of extraction model and target text from which the given candidate attribute value was extracted;
a score obtaining module configured to obtain multiple confidence scores of the given candidate attribute value, each associated with one of the multiple pairings; and
a summing module configured to add up the multiple confidence scores to obtain the confidence of the given candidate attribute value.
15. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201910087401.8A 2019-01-29 2019-01-29 Method and device for extracting entity related information, electronic equipment and storage medium Active CN109783651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910087401.8A CN109783651B (en) 2019-01-29 2019-01-29 Method and device for extracting entity related information, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910087401.8A CN109783651B (en) 2019-01-29 2019-01-29 Method and device for extracting entity related information, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109783651A true CN109783651A (en) 2019-05-21
CN109783651B CN109783651B (en) 2022-03-04

Family

ID=66503607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910087401.8A Active CN109783651B (en) 2019-01-29 2019-01-29 Method and device for extracting entity related information, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109783651B (en)

Also Published As

Publication number Publication date
CN109783651B (en) 2022-03-04

Similar Documents

Publication Title
CN109783651A (en) Extract method, apparatus, electronic equipment and the storage medium of entity relevant information
CN108287858B (en) Semantic extraction method and device for natural language
CN110659366A (en) Semantic analysis method and device, electronic equipment and storage medium
CN110597963B (en) Expression question-answering library construction method, expression search device and storage medium
CN107220386A (en) Information-pushing method and device
CN107491534A (en) Information processing method and device
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN111813905B (en) Corpus generation method, corpus generation device, computer equipment and storage medium
CN110619050B (en) Intention recognition method and device
CN116797684B (en) Image generation method, device, electronic equipment and storage medium
CN109446328A (en) A kind of text recognition method, device and its storage medium
CN111597788B (en) Attribute fusion method, device, equipment and storage medium based on entity alignment
CN108304424B (en) Text keyword extraction method and text keyword extraction device
US20230386238A1 (en) Data processing method and apparatus, computer device, and storage medium
CN109902187A (en) A kind of construction method and device, terminal device of feature knowledge map
WO2021082086A1 (en) Machine reading method, system, device, and storage medium
CN110275963A (en) Method and apparatus for output information
CN110232126A (en) Hot spot method for digging and server and computer readable storage medium
CN112650842A (en) Human-computer interaction based customer service robot intention recognition method and related equipment
CN112084342A (en) Test question generation method and device, computer equipment and storage medium
CN113342948A (en) Intelligent question and answer method and device
CN107766498A (en) Method and apparatus for generating information
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium
CN112598039A (en) Method for acquiring positive sample in NLP classification field and related equipment
CN116910218A (en) Automatic excavation method and device for extended questions in knowledge base

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant