CN110287466A - Entity template generation method and device - Google Patents

Info

Publication number
CN110287466A
Authority
CN
China
Prior art keywords
entity
template
text
library
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910550477.XA
Other languages
Chinese (zh)
Inventor
徐程程
郑孙聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910550477.XA
Publication of CN110287466A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this application disclose an entity template generation method and device. After a search text is obtained, it can be matched against the entity templates in the current entity template library, where each entity template includes an entity placeholder and the corresponding adjacent text. If the search text matches a first entity template in the library, a first target entity can be determined from the search text according to the entity placeholder and adjacent text of that first entity template. The first target entity can then be labeled back into the search text, a new target entity template can be generated from the search text and the first target entity it contains, and the newly generated target entity template can be added to the entity template library. In this way, subsequent entity recall can rely on a template set that is as comprehensive as possible, so that entities of various types in the search text are recalled more accurately and the generalization ability of entity recall is improved.

Description

Entity template generation method and device
Technical Field
This application relates to the field of data processing, and in particular to an entity template generation method and device.
Background
In a search system, the search engine usually retains users' search logs. A user's search log includes information such as the query keyword (query), the query time, the query location, and the Uniform Resource Locator (URL) of the web page the user clicked for that query keyword.
Text mining of the browser's search logs can yield a large amount of entity information. Collecting this entity information to expand the knowledge graph of the search system makes it easier to provide users with more accurate search results. Here, an entity can be a noun or noun phrase with a concrete meaning; for example, an entity can be the film title "Dying to Survive" or the software name "Duoshan software".
At present, entities in search logs are mainly recalled with deep learning methods: a deep learning model is trained in advance to extract features from the search text and convert the search text into one or more vectors, and the entities in the search text are then extracted based on the generated vectors.
Because the entities in search logs are highly varied, and a trained deep learning model achieves high accuracy only when recalling certain types of entities, this approach generalizes poorly, which makes it difficult to accurately recall the various types of entities in search logs.
Summary of the Invention
To solve the above technical problem, this application provides an entity template generation method and device that ensure entities of various types in the search text are recalled more accurately and improve the generalization ability of entity recall.
The embodiments of this application disclose the following technical solutions:
In a first aspect, an embodiment of this application provides an entity template generation method, the method comprising:
Obtaining a search text for entity recall;
Matching the search text against the entity templates in an entity template library, where an entity template includes an entity placeholder and corresponding adjacent text;
If the search text and a first entity template in the entity template library satisfy a matching relationship, determining a first target entity in the search text according to the entity placeholder and corresponding adjacent text included in the first entity template;
Generating a target entity template according to the search text and the first target entity, and adding the target entity template to the entity template library.
In a second aspect, an embodiment of this application provides an entity template generation apparatus, the apparatus comprising an obtaining unit, a matching unit, a determining unit, and a generating unit:
The obtaining unit is configured to obtain a search text for entity recall;
The matching unit is configured to match the search text against the entity templates in an entity template library, where an entity template includes an entity placeholder and corresponding adjacent text;
The determining unit is configured to, if the search text and a first entity template in the entity template library satisfy a matching relationship, determine a first target entity in the search text according to the entity placeholder and corresponding adjacent text included in the first entity template;
The generating unit is configured to generate a target entity template according to the search text and the first target entity, and to add the target entity template to the entity template library.
In a third aspect, an embodiment of this application provides a device for entity template generation, the device comprising a processor and a memory:
The memory is configured to store program code and transmit the program code to the processor;
The processor is configured to execute the entity template generation method of the first aspect according to the instructions in the program code.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium configured to store program code, the program code being used to execute the entity template generation method of the first aspect.
From the above technical solutions it can be seen that, after a search text for entity recall is obtained, the search text can be matched against the entity templates in the current entity template library, where an entity template includes an entity placeholder and corresponding adjacent text. If the search text and a first entity template in the library satisfy a matching relationship, a first target entity can be determined from the search text according to the entity placeholder and corresponding adjacent text included in the first entity template. The first target entity can then be labeled back into the search text, a new target entity template can be generated from the search text and the first target entity in it, and the newly generated target entity template can be added to the entity template library. Thus, new target entity templates are generated continuously while entity recall is performed on the search text, so the entity templates are expanded and updated. Subsequent entity recall can therefore rely on a template set that is as comprehensive as possible, which ensures that entities of various types in the search text are recalled more accurately and improves the generalization ability of entity recall.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of an entity template generation method provided by an embodiment of this application;
Fig. 2 is a flowchart of an entity template generation method provided by an embodiment of this application;
Fig. 3 is a flowchart of a new-entity recall method provided by an embodiment of this application;
Fig. 4 is a schematic diagram of an entity template generation method provided by an embodiment of this application;
Fig. 5 is a flowchart of a method for filtering and merging second target entities in a candidate entity library provided by an embodiment of this application;
Fig. 6 is a structural diagram of an entity template generation apparatus provided by an embodiment of this application;
Fig. 7 is a structural diagram of a device for entity template generation provided by an embodiment of this application;
Fig. 8 is a structural diagram of a server provided by an embodiment of this application.
Detailed Description
Embodiments of this application are described below with reference to the accompanying drawings.
At present, entities in search logs are mainly recalled with deep learning methods. Because the entities in search logs are highly varied, and a model obtained with deep learning achieves high accuracy only when recalling certain types of entities, this approach generalizes poorly and has difficulty recalling the various types of entities in search logs.
For this reason, the embodiments of this application provide an entity template generation method whose core idea is as follows: by means of text matching, after an entity in a search sample is recalled based on the entity templates in the current entity template library, the recalled entity is labeled back into the search sample, and a new target entity template is generated from the search sample and the recalled entity, thereby updating the entity template library. New target entity templates are generated continuously while entity recall is performed on the search text, so the entity template library keeps being expanded and updated. As a result, subsequent entity recall can rely on a template set that is as comprehensive as possible, which helps ensure that the various types of entities in the search text are recalled and improves the generalization ability of entity recall.
First, the application scenarios of the embodiments of this application are introduced. The method can be applied to a terminal device, which may be, for example, a smart terminal, a computer, a personal digital assistant (PDA), or a tablet computer.
The entity template generation method can also be applied to a server. The server may be a dedicated server used only for generating entity templates, or a shared server that also provides other data processing functions; the embodiments of this application impose no limitation on this.
To facilitate understanding of the technical solution of this application, the entity template generation method provided by the embodiments is introduced below in a practical application scenario, taking a server as an example.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the entity template generation method provided by an embodiment of this application. The application scenario includes a server 101, in which an entity template library can be stored. The entity template library contains entity templates, and each entity template includes an entity placeholder and the corresponding adjacent text. For example, an entity template can be "download XX", where "XX" is the entity placeholder and "download" is the adjacent text corresponding to the placeholder "XX" (that is, the prefix of the placeholder "XX").
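As a minimal sketch of how such a template could be represented, assuming Python and the placeholder token "XX" from the example above, a template can be split into the adjacent text before and after the placeholder. The class name `EntityTemplate` and its fields are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass

PLACEHOLDER = "XX"  # assumed placeholder token, as in the "download XX" example


@dataclass(frozen=True)
class EntityTemplate:
    """Hypothetical representation: adjacent text around one entity placeholder."""
    prefix: str  # adjacent text before the placeholder ("" if none)
    suffix: str  # adjacent text after the placeholder ("" if none)

    @classmethod
    def parse(cls, pattern: str) -> "EntityTemplate":
        # Split the pattern at the first occurrence of the placeholder.
        prefix, sep, suffix = pattern.partition(PLACEHOLDER)
        if not sep:
            raise ValueError(f"template {pattern!r} has no placeholder")
        return cls(prefix, suffix)

    def __str__(self) -> str:
        return f"{self.prefix}{PLACEHOLDER}{self.suffix}"


template = EntityTemplate.parse("download XX")
```

Here `template.prefix` holds the prefix text "download " (including the trailing space of the English rendering) and `template.suffix` is empty.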
In the embodiments of this application, the server 101 can obtain a search text for entity recall. The search text can be, for example, user search logs retained by a search engine. After obtaining the search text, the server 101 can match it against the entity templates in the current entity template library. If the search text is determined to match a certain entity template in the library, that entity template can be taken as the first entity template, the entity in the search text can be determined according to the entity placeholder and adjacent text included in the first entity template, and that entity is recorded as the first target entity.
For example, suppose the search text includes the search term "download Duoshan" and the current entity template library contains the entity template "download XX". After the search text is matched against the entity templates in the current library, "download Duoshan" in the search text is determined to match the entity template "download XX". Then, for the entity template "download XX", the entity in the search text can be determined to be "Duoshan" according to the entity placeholder "XX" and the corresponding adjacent text "download". "Duoshan" is thus the first target entity determined from the search text.
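The matching step in this example can be sketched as follows. This is only an illustration under stated assumptions: a search term matches a template when it starts with the template's prefix text and ends with its suffix text, and whatever lies in between is taken as the entity. The function name `extract_entity` is hypothetical; the application does not specify the matching procedure at this level of detail.

```python
from typing import Optional

PLACEHOLDER = "XX"  # assumed placeholder token


def extract_entity(query: str, template: str) -> Optional[str]:
    """Return the text that fills the placeholder, or None if there is no match."""
    prefix, sep, suffix = template.partition(PLACEHOLDER)
    if not sep:
        return None  # not a valid template
    # The entity must be non-empty, so prefix and suffix may not cover the whole query.
    if len(prefix) + len(suffix) >= len(query):
        return None
    if query.startswith(prefix) and query.endswith(suffix):
        return query[len(prefix):len(query) - len(suffix)]
    return None


entity = extract_entity("download Duoshan", "download XX")
```

With the example data, `entity` is "Duoshan", while a search term such as "Duoshan software" does not match "download XX" and yields `None`.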
After the first target entity in the search text is determined, a target entity template can be generated from the search text and the determined first target entity, and the newly generated target entity template can be added to the entity template library.
For example, suppose the search text also includes the search term "what is Duoshan". Continuing the example above, after "Duoshan" is determined to be the first target entity in the search text, a target entity template such as "what is XX" can be generated from the search term "what is Duoshan" and the first target entity "Duoshan". This target entity template is added to the entity template library, thereby updating and expanding the library.
In this way, when entity recall is later performed on search text, matching can also be based on the newly generated target entity template. For example, when the search text includes the search term "what is Dying to Survive", the first target entity "Dying to Survive" can be determined from that search term based on the newly generated target entity template "what is XX".
It can be seen that the method continuously generates new target entity templates while performing entity recall on search text, thereby expanding and updating the entity template library. Subsequent entity recall can therefore rely on a template set that is as comprehensive as possible, which helps ensure that the various types of entities in the search text are recalled and improves the generalization ability of entity recall.
Next, the entity template generation method provided by the embodiments of this application is described with reference to the accompanying drawings. Referring to Fig. 2, which shows a flowchart of an entity template generation method provided by an embodiment of this application, the method comprises:
S201: Obtain a search text for entity recall.
S202: Match the search text against the entity templates in the entity template library.
Here, an entity template is a template used to recall entities from search text, and it can include an entity placeholder and the corresponding adjacent text. The entity placeholder in an entity template can be an anonymized entity, that is, a word that stands in for the entity.
In the embodiments of this application, after the search text for entity recall is obtained, the search text can be matched against the entity templates in the entity template library.
It should be noted that when the entity template generation method is executed for the first time, the entity template library used can be established in advance. For example, a small number of entity templates can be preset to form an entity template library. The first round of entity recall is performed based on the templates in this initial library, and new target entity templates can then be generated during subsequent entity recall, so that the library is expanded and updated.
The process of establishing the entity template library is illustrated as follows: entities with a high frequency of occurrence and their corresponding adjacent text can be determined from the search logs, for example high-frequency entities with adjacent text such as "what is Duoshan" and "download Duoshan". Then, based on these high-frequency entities and their adjacent text, entity templates such as "download XX" and "what is XX" are set, and these templates form the entity template library.
S203: If the search text and a first entity template in the entity template library satisfy a matching relationship, determine a first target entity in the search text according to the entity placeholder and corresponding adjacent text included in the first entity template.
When the search text and an entity template in the entity template library satisfy a matching relationship, that entity template can be recorded as the first entity template, and the first target entity can be determined from the search text according to the entity placeholder and corresponding adjacent text included in the first entity template.
S204: Generate a target entity template according to the search text and the first target entity, and add the target entity template to the entity template library.
After the first target entity is determined in the search text, a new target entity template can be generated from the search text and the first target entity, and the newly generated target entity template can then be added to the entity template library.
By repeating S201-S204, a richer and more accurate entity template library can be obtained, so that entity recall based on these entity templates can ensure the accuracy with which the various types of entities are recalled.
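The repetition of S201-S204 can be pictured as a small bootstrapping loop. The sketch below is an illustration under assumptions, not the application's implementation: templates are plain strings with the placeholder "XX", a search term matches by prefix and suffix, and a new template is formed by replacing a recalled entity inside any search term that contains it. The names `extract_entity`, `expand_library`, and `max_rounds` are hypothetical.

```python
PLACEHOLDER = "XX"  # assumed placeholder token


def extract_entity(query, template):
    """Return the text filling the placeholder, or None if the query does not match."""
    prefix, sep, suffix = template.partition(PLACEHOLDER)
    if not sep or len(prefix) + len(suffix) >= len(query):
        return None
    if query.startswith(prefix) and query.endswith(suffix):
        return query[len(prefix):len(query) - len(suffix)]
    return None


def expand_library(queries, library, max_rounds=5):
    """Alternate entity recall and template generation until nothing new appears."""
    library = set(library)
    entities = set()
    for _ in range(max_rounds):
        # S202/S203: match every search term against every template.
        for query in queries:
            for template in library:
                entity = extract_entity(query, template)
                if entity:
                    entities.add(entity)
        # S204: label recalled entities back into the search terms
        # and turn each labeled term into a candidate template.
        new_templates = {
            query.replace(entity, PLACEHOLDER, 1)
            for query in queries
            for entity in entities
            if entity in query and query != entity
        } - library
        if not new_templates:
            break
        library |= new_templates
    return library, entities


log = ["download Duoshan", "what is Duoshan", "what is Dying to Survive"]
library, entities = expand_library(log, {"download XX"})
```

Starting from the single seed template "download XX", the first round recalls "Duoshan" and adds "what is XX"; the second round uses the new template to recall "Dying to Survive". A practical system would also need the frequency and new-entity filters described later to keep noisy templates out of the library.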
From the above technical solutions it can be seen that, after a search text for entity recall is obtained, the search text can be matched against the entity templates in the current entity template library, where an entity template includes an entity placeholder and corresponding adjacent text. If the search text and a first entity template in the library satisfy a matching relationship, a first target entity can be determined from the search text according to the entity placeholder and corresponding adjacent text included in the first entity template. The first target entity can then be labeled back into the search text, a new target entity template can be generated from the search text and the first target entity in it, and the newly generated target entity template can be added to the entity template library. Thus, new target entity templates are generated continuously while entity recall is performed on the search text, so the entity templates are expanded and updated. Subsequent entity recall can therefore rely on a template set that is as comprehensive as possible, which ensures that entities of various types in the search text are recalled more accurately and improves the generalization ability of entity recall. In addition, the method reduces the steps that require manual participation, thereby lowering labor costs.
It should be noted that the embodiments of this application do not limit how S204 is performed. To improve the applicability of the target entity template generated in S204, in one possible implementation, generating the target entity template from the search text and the first target entity in S204 may include:
S301: Extract a first combined text from the search text, the first combined text including the first target entity and the corresponding adjacent text.
In the embodiments of this application, the first combined text can be extracted from the search text based on the position of the first target entity in the search text, where the first combined text can include the first target entity and the adjacent text corresponding to it.
For example, suppose that after the target entity "Duoshan" is determined from the search term "what is Duoshan" in the search text, the first target entity "Duoshan" is labeled back into the search term "Duoshan software" in the search text. That search term ("Duoshan software") can then be taken as the first combined text and extracted from the search text. The extracted first combined text "Duoshan software" thus includes the first target entity "Duoshan" and the corresponding adjacent text "software".
S302: Replace the first target entity in the first combined text with the entity placeholder to obtain a second combined text.
S303: Generate the target entity template according to the second combined text.
After the first combined text is extracted, the first target entity in it can be replaced with the entity placeholder to obtain the second combined text, and the second combined text can then be used as the newly generated target entity template.
For example, continuing the example in S301, the first target entity "Duoshan" in the first combined text "Duoshan software" is replaced with the entity placeholder (such as "XX") to obtain the second combined text "XX software". The second combined text "XX software" can then be used as the newly generated target entity template.
Because the first target entity in the first combined text is replaced with the entity placeholder, later matching against the resulting target entity template only needs to consider the positional relationship between the entity placeholder and the corresponding adjacent text in the template, which improves the applicability of the generated target entity template.
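Steps S301-S303 can be sketched as a simple string substitution. The sketch assumes that each search term containing the entity, together with some adjacent text, serves as a first combined text; the function name `templates_for_entity` and the placeholder token "XX" are illustrative choices.

```python
PLACEHOLDER = "XX"  # assumed placeholder token


def templates_for_entity(search_terms, entity):
    """Generate candidate templates for one recalled entity."""
    # S301: first combined texts = search terms that contain the entity
    # together with some adjacent text (the bare entity alone is skipped).
    first_combined = [t for t in search_terms if entity in t and t != entity]
    # S302: replace the entity with the placeholder to get second combined texts.
    second_combined = [t.replace(entity, PLACEHOLDER, 1) for t in first_combined]
    # S303: each second combined text is used as a target entity template.
    return second_combined


generated = templates_for_entity(
    ["what is Duoshan", "Duoshan software", "Duoshan"], "Duoshan"
)
```

For the example data this produces "what is XX" and "XX software"; the bare search term "Duoshan" contributes nothing because it has no adjacent text.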
It can be understood that the target entity templates generated according to S301-S303 may include both entity templates that occur infrequently in the search text and entity templates that occur frequently. Based on this, in one possible implementation, before S303 is executed, that is, before the target entity template is generated from the second combined text, the method further includes:
S401: Determine whether the second combined text satisfies a frequency condition; if so, execute S402.
S402: Execute the step of generating the target entity template from the second combined text.
In the embodiments of this application, a frequency condition can be set in advance. The frequency condition can be a condition for determining that the frequency of occurrence of the second combined text in the search text is high.
After the second combined text is obtained, whether it satisfies the frequency condition can be determined; if so, S303 can be executed, that is, the target entity template is generated from the second combined text.
As a result, the newly generated target entity templates are high-frequency entity templates, and no low-frequency entity templates need to be generated, which avoids wasting system resources.
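One way to picture the frequency condition of S401 is a count threshold over the second combined texts. The sketch below assumes the condition is "occurs at least `min_count` times in the search text"; the threshold value, the function name, and the extra example entity "WeChat" are assumptions made for illustration, since the application does not fix a specific condition.

```python
from collections import Counter

PLACEHOLDER = "XX"  # assumed placeholder token


def frequent_templates(search_terms, entities, min_count=2):
    """Keep only second combined texts that occur at least min_count times."""
    counts = Counter(
        term.replace(entity, PLACEHOLDER, 1)
        for term in search_terms
        for entity in entities
        if entity in term and term != entity
    )
    # S401/S402: only combined texts meeting the frequency condition become templates.
    return {text for text, n in counts.items() if n >= min_count}


kept = frequent_templates(
    ["Duoshan software", "WeChat software", "Duoshan old version"],
    {"Duoshan", "WeChat"},
)
```

In this toy log "XX software" occurs twice and is kept, while "XX old version" occurs once and is dropped.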
It can be understood that recalling recently emerged new entities is usually of greater significance; for example, new entities can help the search system understand users' recent search terms. Here, a new entity is an entity that has appeared recently, and the search queries used to search for a new entity are usually rich and varied. Obtaining new entities is therefore an important task. To this end, in one possible implementation, after S203 is performed, that is, after the first target entity in the search text is determined, the method further includes:
S501: Determine, according to the entity features of a second target entity in a candidate entity library, whether the second target entity is a new entity. The candidate entity library includes the first target entities determined from the search text, and the second target entity is any first target entity in the candidate entity library. The entity features include one or more of the following: the number of matched entity templates; the number of other first target entities in the candidate entity library that the second target entity contains; the number of first target entities in the candidate entity library that contain the second target entity; the word frequency within a first preset time; and the word frequency distribution within a second preset time. If the second target entity is not a new entity, execute S502.
S502: Delete the second target entity.
In the embodiments of this application, a first target entity determined in S203 may be an entity that has existed for a long time or a new entity that has emerged recently. Therefore, after S203 is performed, all the first target entities that have been generated can be collected into a candidate entity library, so that the candidate entity library contains the first target entities. Any first target entity in the candidate entity library can be recorded as a second target entity.
Whether the second target entity is a new entity can be determined according to its entity features. The entity features here are features used to determine whether the second target entity has the characteristics of a new entity. The entity features may include one or more of the following: the number of matched entity templates; the number of other first target entities in the candidate entity library that the second target entity contains; the number of first target entities in the candidate entity library that contain the second target entity; the word frequency within a first preset time; and the word frequency distribution within a second preset time.
Next, the five entity features listed in the preceding paragraph are introduced in turn.
The number of matched entity templates is the number of entity templates in the entity template library that the second target entity matches. For example, if this feature is 3 for a second target entity A, then A matches 3 entity templates in the entity template library. It can be understood that if the second target entity is a new entity, the number of matched entity templates should be as high as possible: a high number indicates that the entity is searched for in many different ways, which makes it more likely to be a new entity.
The number of other first target entities in the candidate entity library that the second target entity contains can be understood as follows: the second target entity may contain some of the first target entities in the candidate entity library. For example, if a first target entity is "Duoshan" and the second target entity is "Duoshan APP", the first target entity is contained in the second target entity. For a given second target entity, the number of such first target entities in the candidate entity library (that is, the first target entities contained in the second target entity) is the value of this feature. The more first target entities the second target entity contains, the more likely it is to be a specific entity (such as the full name of an entity).
The number of first target entities in the candidate entity library that contain the second target entity can be understood as follows: some first target entities in the candidate entity library may contain the second target entity. For a given second target entity, the number of such first target entities in the candidate entity library is the value of this feature. The more first target entities contain the second target entity, the broader its semantics are and the more likely it is to be a non-entity.
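The first three features can be computed with plain string containment, as in the sketch below. It rests on assumptions: an entity is counted as matching a template when the template with the entity filled in appears among the search terms, and containment is substring containment. The function name `entity_features` and the feature keys are hypothetical.

```python
PLACEHOLDER = "XX"  # assumed placeholder token


def entity_features(entity, candidates, search_terms, library):
    """Compute three containment-style features for one candidate entity."""
    terms = set(search_terms)
    others = [c for c in candidates if c != entity]
    return {
        # Templates that, with the entity filled in, occur as a search term.
        "matched_templates": sum(
            template.replace(PLACEHOLDER, entity, 1) in terms for template in library
        ),
        # Other candidates that this entity contains.
        "contains": sum(other in entity for other in others),
        # Other candidates that contain this entity.
        "contained_by": sum(entity in other for other in others),
    }


candidates = {"Duoshan", "Duoshan APP", "APP"}
terms = ["download Duoshan", "what is Duoshan", "download Duoshan APP"]
library = {"download XX", "what is XX", "XX software"}
short_name = entity_features("Duoshan", candidates, terms, library)
full_name = entity_features("Duoshan APP", candidates, terms, library)
```

In this toy data "Duoshan" matches two templates and is contained by one other candidate, whereas "Duoshan APP" contains two other candidates, which is the pattern the text associates with a more specific entity name.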
As for the word frequency within the first preset time, the first preset time is a preset period, for example a certain period of today that has already elapsed. The word frequency within the first preset time is the number of times the second target entity occurs within that period. It can be understood that, when the first preset time is a recent period (such as the elapsed part of today), a higher word frequency within it means that the second target entity is more likely to be an emerging entity, i.e. a new entity.
As for the word frequency distribution within the second preset time, the second preset time is also a preset period, for example the most recent month. The word frequency distribution within the second preset time is the distribution of the number of times the second target entity occurs over that period. When the second preset time is the most recent month, the distribution can be, for example, the number of times the second target entity occurs on each day of that month. It can be understood that, when the second preset time is a recent period (such as the most recent month), a second target entity that occurs more often in the most recent part of it (such as the last few days) is more likely to be an emerging entity, i.e. a new entity.
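A minimal sketch of the two frequency features, assuming the search log is available as (date, query) pairs and that an occurrence means the entity appears as a substring of the query. The function names, the log format, and the "recent share" summary are assumptions for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def daily_frequency(entity, dated_queries, today, window_days):
    """Per-day occurrence counts of `entity`, oldest day first."""
    start = today - timedelta(days=window_days - 1)
    counts = Counter()
    for day, query in dated_queries:
        if start <= day <= today and entity in query:
            counts[day] += 1
    # Dense list so that days without any occurrence appear as zeros.
    return [counts[start + timedelta(days=i)] for i in range(window_days)]


def recent_share(distribution, recent_days):
    """Fraction of all occurrences that fall in the last `recent_days` days."""
    total = sum(distribution)
    return sum(distribution[-recent_days:]) / total if total else 0.0


# Hypothetical log entries.
today = date(2019, 6, 25)
log = [
    (date(2019, 6, 25), "what is Duoshan"),
    (date(2019, 6, 25), "Duoshan download"),
    (date(2019, 6, 24), "Duoshan"),
    (date(2019, 6, 20), "weather"),
]
distribution = daily_frequency("Duoshan", log, today, window_days=7)
```

An entity whose occurrences are concentrated in the last few days of the window yields a recent share close to 1, which matches the intuition described above for emerging entities.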
In addition, suppose the second target entity has a high word frequency within the first preset time, and the entity feature "number of first target entities in the candidate entity library that include the second target entity" is also high. Although the second target entity occurs very frequently within the first preset time, it is also contained in many other first target entities, so it is more likely to be a non-entity. In this case, the second target entity can be determined to be a non-new entity.
In a specific implementation, a machine learning model can be trained in advance. The model can use logistic regression (Logistic Regression) and, based on the above entity features, has the function of determining whether an input entity is a new entity. A batch of manually labeled data is used as training data to train the machine learning model and obtain the model parameters. When the machine learning model is applied, it predicts whether the input second target entity is a new entity; for example, when the prediction label output by the model is 1, the second target entity is a new entity.
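The classifier described above can be sketched with a small, dependency-free logistic regression. The feature order (matched template count, number of candidates containing the entity, recent word frequency), the scaling to the range 0 to 1, the training rows, and the hyperparameters are all assumptions for illustration; a real system would train on manually labeled data as the text describes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Probability that feature vector `x` describes a new entity."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def train(samples, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression parameters by batch gradient descent."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = score(weights, bias, x) - y
            grad_b += error
            for i, v in enumerate(x):
                grad_w[i] += error * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x):
    """Return label 1 (new entity) or 0 (not a new entity)."""
    return 1 if score(weights, bias, x) >= 0.5 else 0


# Hypothetical labeled rows: [matched templates, container count, recent frequency].
X = [[0.9, 0.1, 0.8], [0.8, 0.0, 0.9], [0.2, 0.9, 0.7], [0.1, 0.8, 0.1]]
y = [1, 1, 0, 0]
weights, bias = train(X, y)
```

Note that the third training row has a high recent frequency but is labeled 0 because many other candidates contain it, mirroring the non-entity case discussed above.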
In this way, if a second target entity is determined not to be a new entity, it can be deleted from the candidate entity library. Once every first target entity in the candidate entity library has been taken as the second target entity and S501-S502 have been executed on it, only new entities remain in the candidate entity library, which realizes the recall of new entities.
It can be understood that the second target entity in the candidate entity library in S501 may be a non-entity word, a vulgar word, or the like that was determined in S203. Based on this, in one possible implementation, before S501 is carried out, i.e. before determining whether the second target entity belongs to a new entity according to the entity features of the second target entity in the candidate entity library, the method can further include:
S601: Determine whether a dictionary includes the second target entity, the dictionary including one or more of a non-entity dictionary, a vulgar dictionary, or a base dictionary; if so, execute S602.
S602: Delete the second target entity.
In the embodiment of the present application, a dictionary can be provided in advance, and the dictionary may include one or more of a non-entity dictionary, a vulgar dictionary, or a base dictionary. The non-entity dictionary may include non-entity words; in an actual scenario, non-entity words can be words without practical meaning, such as "then" and "generally". The vulgar dictionary may include vulgar words. The base (Base) dictionary may include common entities, for example entities that occur with a high frequency every day over a long period of time.
In a specific implementation, the dictionary can be constructed from historical search logs or publicly available word lists. For example, the Base dictionary can be constructed as follows. First, historical search logs are used as the search text, and entity recall is performed on them by executing S201-S204 above to obtain a candidate entity library. Then, for each second target entity in the candidate entity library, the entity feature of its word frequency distribution within the second preset time is determined. By setting a threshold on the number of days on which an entity occurs and a threshold on its frequency, the entities that occur on many days and also have a high daily frequency are filtered out of the candidate entity library, and these entities form the Base dictionary. The Base dictionary can be updated regularly.
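The Base dictionary construction can be sketched as a two-threshold filter over per-day frequencies. The input format (entity mapped to a list of daily counts), the parameter names, and the threshold values are assumptions for illustration.

```python
def build_base_dictionary(daily_counts, min_days, min_daily_count):
    """Select entities that are frequent on enough days.

    daily_counts maps an entity to its per-day frequencies within the window.
    An entity is kept when at least `min_days` days reach `min_daily_count`.
    """
    base = set()
    for entity, counts in daily_counts.items():
        frequent_days = sum(1 for c in counts if c >= min_daily_count)
        if frequent_days >= min_days:
            base.add(entity)
    return base


# Hypothetical per-day counts over four days.
history = {
    "weather": [50, 60, 55, 70],   # steadily frequent: a common entity
    "Duoshan": [0, 0, 3, 90],      # frequent only on the last day
}
base_dictionary = build_base_dictionary(history, min_days=3, min_daily_count=20)
```

Rebuilding the set on a schedule from a sliding window of logs would be one way to realize the regular update mentioned above.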
The dictionary may include entities that do not belong to new entities, and the candidate entity library in S501 may, for example, include a large number of common entities. Therefore, for a second target entity in the candidate entity library in S501, it can be determined whether the dictionary includes the second target entity. If it does, the second target entity can be directly determined not to be a new entity and can be deleted directly, without executing S501-S502.
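Steps S601-S602 amount to a set-membership filter. The sketch below assumes exact string matching against the dictionaries; the function name and the sample dictionaries are hypothetical.

```python
def filter_by_dictionaries(candidates, *dictionaries):
    """Drop every candidate that appears in any of the given dictionaries."""
    blocked = set().union(*dictionaries)
    return [c for c in candidates if c not in blocked]


# Hypothetical dictionaries and candidate entity library.
non_entity_words = {"then", "generally"}
base_words = {"weather"}
kept = filter_by_dictionaries(
    ["Duoshan", "then", "weather", "Duoshan APP"],
    non_entity_words,
    base_words,
)
```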
In addition, the second target entity in the candidate entity library in S501 may also not be an entity at all; for example, it may be a number or a special character (such as a comma or a full stop). Based on this, in one possible implementation, before S501 is carried out, i.e. before determining whether the second target entity belongs to a new entity according to the entity features of the second target entity in the candidate entity library, the method can further include:
S701: Determine whether the second target entity meets a target rule condition, the target rule condition including one or more of the following conditions: character length, character type, or the expression pattern after a regular expression is compiled; if not, execute S702.
S702: Delete the second target entity.
In the embodiment of the present application, a target rule condition can also be provided in advance. The target rule condition is a condition for determining whether the second target entity is an entity, and may include one or more of the following conditions: character length, character type, or the expression pattern after a regular expression is compiled. The expression pattern after a regular expression is compiled can be understood as the pattern obtained after the regular expression corresponding to the second target entity is compiled.
If the second target entity is an entity, the target rule condition it meets may be as follows: the character length falls within a fixed range, for example between 2 and 10; the character type is such that the entity is not composed entirely of digits or special symbols (such as semicolons, commas, and dashes); and the expression pattern after the regular expression is compiled does not belong to special patterns such as a URL, a time, an Internet Protocol (IP) address, or an e-mail address.
In this way, for a second target entity, it is determined whether it meets the target rule condition. If it does not, it can be deleted from the candidate entity library without executing S501-S502.
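The rule check of S701 can be sketched as follows. The length range 2 to 10 comes from the example above; the concrete regular expressions and the "contains at least one letter" test are simplified assumptions and are not patterns given in the application.

```python
import re

# Simplified, assumed patterns for the special forms named in the text.
SPECIAL_PATTERNS = [
    re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),  # URL
    re.compile(r"\d{1,2}:\d{2}(:\d{2})?"),               # time of day
    re.compile(r"\d{1,3}(\.\d{1,3}){3}"),                # IPv4 address
    re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),           # e-mail address
]


def is_plausible_entity(text, min_len=2, max_len=10):
    """Return True when `text` passes the length, type, and pattern rules."""
    if not (min_len <= len(text) <= max_len):
        return False
    # Reject strings made up only of digits, punctuation, or whitespace.
    if not any(ch.isalpha() for ch in text):
        return False
    # Reject strings that are entirely a URL, time, IP address, or e-mail.
    return not any(p.fullmatch(text) for p in SPECIAL_PATTERNS)
```

`str.isalpha` also returns True for Chinese characters, so the same check works for non-Latin queries.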
As it can be seen that the second target entity of novel entities can will be not belonging in candidate entity library by the method for S601-S602 It deletes.By the method for S701-S702, the second target entity that entity is not belonging in candidate entity library can be deleted, this two Kind method can reduce the quantity that novel entities identify in S501-S502, that is, reduce the pressure of novel entities identification, and reduce The chance that such second target entity is judged by accident in S501.
After executing S501-S502, the second target entity retained in candidate entity library is usually novel entities.It can be with Understand, in the entity in candidate entity library, wherein it is similar for having may include part entity, and such as: in candidate entity library Second target entity " mostly sudden strain of a muscle APP " and the second target entity " mostly sudden strain of a muscle software " are similar.Therefore, after executing S502, i.e., The substance feature according to the second target entity in candidate entity library is executed, determines whether second target entity belongs to After novel entities, the method can also include:
S801: the similarity degree in the candidate entity library between the second target entity of any two is determined.
S802: the second target entity that similarity degree meets condition of similarity is merged.
In the embodiment of the present application, a similarity condition can be preset; the similarity condition is a condition for determining that second target entities are similar. The similarity between any two second target entities in the candidate entity library can thus be determined, and the second target entities whose similarity meets the preset condition can then be merged.
The similarity between any two second target entities in the candidate entity library can be determined, for example, by short text clustering: a vector is computed for each second target entity in the candidate entity library, the similarity between the vectors of any two second target entities is calculated, and the resulting value is taken as the similarity between the two second target entities.
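The vector-based similarity can be sketched with cosine similarity. The application does not specify how the entity vectors are computed, so the character-bigram count vectors used here are only a stand-in assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Sparse vector of character-bigram counts (a stand-in representation)."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


similar = cosine(bigram_vector("Duoshan APP"), bigram_vector("Duoshan software"))
unrelated = cosine(bigram_vector("Duoshan APP"), bigram_vector("weather"))
```

Pairs whose score exceeds a chosen threshold would then be treated as meeting the similarity condition; the threshold itself is a tuning choice that the text leaves open.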
It should be noted that the embodiment of the present application does not limit the manner of merging; a suitable manner can be selected according to the actual situation. One manner of merging is to extract the main component from the second target entities whose similarity meets the preset condition. For example, the main component "Duoshan" (多闪) is extracted from the second target entity "Duoshan APP" and the second target entity "Duoshan software". Alternatively, the manner of merging can be to extract the identical component from the second target entities whose similarity meets the preset condition. For example, the identical component "Duoshan" is extracted from the second target entity "Duoshan APP" and the second target entity "Duoshan introduction".
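One way to extract the identical component of two similar entities is the longest common substring. The application does not name a specific algorithm, so the dynamic-programming approach below is an assumed technique shown for illustration.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by `a` and `b`."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    # Strip the separator that may trail the shared part.
    return a[best_end - best_len:best_end].strip()


merged = longest_common_substring("Duoshan APP", "Duoshan software")
```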
In addition, as a specific implementation of S801-S802, the second target entities in the candidate entity library whose similarity meets the preset similarity condition can be determined according to the URLs clicked for the second target entities. For example, when the URLs clicked for different second target entities are identical, it can be determined that the similarity between these second target entities meets the similarity condition.
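The click-based variant can be sketched as grouping entities by their clicked URL. The one-URL-per-entity mapping and the example.com addresses are hypothetical simplifications; real click logs would likely associate several URLs with each entity.

```python
from collections import defaultdict


def group_by_clicked_url(entity_to_url):
    """Return groups of entities whose clicked URL is identical."""
    groups = defaultdict(list)
    for entity, url in entity_to_url.items():
        groups[url].append(entity)
    # Only groups with more than one entity are merge candidates.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical click data.
clicks = {
    "Duoshan APP": "https://example.com/duoshan",
    "Duoshan software": "https://example.com/duoshan",
    "weather": "https://example.com/weather",
}
merge_groups = group_by_clicked_url(clicks)
```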
By merging second target entities that are highly similar, the number of second target entities in the candidate entity library can be reduced while all recalled new entities are retained, which avoids wasting system resources.
Based on the entity template generation method described above, the embodiment of the present application also provides a new entity recall method. Referring to Fig. 3, the figure shows a flow chart of a new entity recall method provided by the embodiments of the present application. As shown in Fig. 3, the first target entities in the search text can be recalled by a recall module based on named entity recognition and a recall module based on entity templates, and the recalled first target entities form the candidate entity library. Then, the second target entities in the candidate entity library are filtered and merged by a filtering and merging module, so that the entities retained in the candidate entity library are the recalled new entities. The named entity recognition (Named Entity Recognition, NER) method, also referred to as "proper name recognition", can identify entities with specific meaning in text, including person names, place names, organization names, proper nouns, and the like.
The new entity recall method provided by the embodiments of the present application is described in detail below.
The NER recall module can recall entities of common categories, such as person names, place names, and organization names, very accurately. Based on this, entities of these categories in the search text can be recalled by the NER recall module. In a specific implementation, the NER recall module can use publicly known sequence labeling methods, such as hidden Markov models, conditional random fields, and neural-network-based methods.
The method by which the entity-template-based recall module recalls the entities in the search text is introduced below. Referring to Fig. 4, the figure shows a schematic diagram of an entity template generation method provided by the embodiments of the present application. As shown in Fig. 4, the method mainly uses the Bootstrapping method. The Bootstrapping method can be a weakly supervised, self-expanding method, which repeatedly resamples limited sample data to re-establish new samples that can represent the distribution of the parent sample.
First, the queries in the search log can be statistically analyzed to obtain the adjacent text (such as a prefix or suffix of an entity) of frequently occurring entities, for example prefixes or suffixes such as "what is" and "download". The adjacent text of these entities can then be written as regular expressions to obtain entity templates, and these entity templates form the entity template library. It can be understood that, because more target entity templates can subsequently be generated automatically in the Bootstrapping manner to expand the entity template library, the number of preset entity templates does not need to be large.
After a small number of entity templates have been created to form the entity template library, these entity templates can be matched against the search text to obtain first target entities. For example, after the entity template "what is (.*?)" is matched against the search text "what is Duoshan" (多闪), the first target entity "Duoshan" is obtained. In the entity template "what is (.*?)", "(.*?)" is the entity substitution word and "what is" is the corresponding adjacent text.
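Template matching as described above can be sketched with Python's `re` module. The seed templates are English renderings of the examples in the text, and the function name and return format are assumptions for illustration.

```python
import re

# Seed entity template library: "(.*?)" is the entity substitution word,
# and the surrounding literal text is the adjacent text.
TEMPLATE_LIBRARY = [r"what is (.*?)", r"(.*?) download"]


def recall_entities(query, templates):
    """Return (entity, template) pairs for every template the query matches."""
    found = []
    for pattern in templates:
        match = re.fullmatch(pattern, query)
        if match and match.group(1):
            found.append((match.group(1), pattern))
    return found


hits = recall_entities("what is Duoshan", TEMPLATE_LIBRARY)
```

`re.fullmatch` is used so that the adjacent text must account for the rest of the query; with it, the non-greedy group expands to cover the whole remaining span.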
After a first target entity is obtained, it can be labeled back into the search text to extract a first combined text, which includes the first target entity and its corresponding adjacent text. The first target entity in the first combined text can then be anonymized, i.e. replaced with the entity substitution word, to obtain a second combined text, and a target entity template is obtained based on the second combined text. For example, by labeling the first target entity "Duoshan" (多闪) back into the search text, a variety of target entity templates such as "ENTITY download" and "ENTITY software" can be generated. The obtained target entity templates can then also be reviewed manually to ensure that they are accurate entity templates. Finally, the target entity templates that pass manual review can be added to the entity template library. Here, "ENTITY" is the anonymized entity (i.e. the entity substitution word), which makes it convenient to count high-frequency templates.
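Back-labeling and anonymization can be sketched as follows. The sketch treats each whole query as the combined text and uses a minimum count to stand in for the frequency condition; both choices, and the function name, are assumptions made for illustration.

```python
from collections import Counter


def generate_templates(entities, queries, min_count=2, placeholder="ENTITY"):
    """Anonymize recalled entities in queries and keep frequent templates."""
    counts = Counter()
    for query in queries:
        for entity in entities:
            # Skip queries that consist of the bare entity: no adjacent text.
            if entity in query and query.strip() != entity:
                counts[query.replace(entity, placeholder)] += 1
    return [t for t, n in counts.most_common() if n >= min_count]


# Hypothetical recalled entities and search queries.
queries = ["Duoshan download", "WeChat download", "Duoshan software", "Duoshan"]
new_templates = generate_templates(["Duoshan", "WeChat"], queries)
```

Replacing every entity with the same placeholder is what lets "Duoshan download" and "WeChat download" count toward one template; the surviving templates would still go through manual review before being added to the library.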
By repeating the above steps, a rich and accurate entity template library can be obtained. Applying these entity templates to entity recall ensures a high precision and a high recall rate.
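The repeated match-and-generate cycle can be sketched as a loop. The `approve` callback standing in for manual review, the round limit, the use of whole queries as combined text, and the conversion of "ENTITY" templates into regular expressions are all assumptions for illustration, not details given in the application.

```python
import re
from collections import Counter

PLACEHOLDER = "ENTITY"


def to_regex(template):
    """Turn a template such as 'ENTITY download' into a regular expression."""
    return "(.+?)".join(re.escape(part) for part in template.split(PLACEHOLDER))


def bootstrap(queries, seed_templates, rounds=3, min_count=1, approve=lambda t: True):
    templates, entities = set(seed_templates), set()
    for _ in range(rounds):
        # Step 1: recall entities with the current template library.
        for query in queries:
            for template in templates:
                match = re.fullmatch(to_regex(template), query)
                if match:
                    entities.add(match.group(1))
        # Step 2: label entities back into the queries and anonymize them.
        counts = Counter()
        for query in queries:
            for entity in entities:
                if entity in query and query != entity:
                    counts[query.replace(entity, PLACEHOLDER)] += 1
        new = {t for t, n in counts.items() if n >= min_count and approve(t)}
        new -= templates
        if not new:
            break  # the library stopped growing
        templates |= new
    return templates, entities


queries = ["what is Duoshan", "Duoshan download", "WeChat download",
           "WeChat software", "Duoshan software", "what is QQ"]
templates, entities = bootstrap(queries, ["what is ENTITY"])
```

In this toy run the seed template recalls "Duoshan" and "QQ", the templates generated from "Duoshan" are added to the library, and those in turn recall "WeChat" in the next round, which illustrates how the library and the recalled entities expand together.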
After entity recall has been performed on the search text, the candidate entity library is obtained. The second target entities in the candidate entity library are then filtered and merged as follows.
Referring to Fig. 5, the figure shows a flow chart of a method for filtering and merging the second target entities in the candidate entity library provided by the embodiments of the present application. As shown in Fig. 5, for a second target entity in the candidate entity library, it can be determined whether the dictionary includes the second target entity; if it does, the second target entity is deleted from the candidate entity library, so that non-new entities are filtered out of the candidate entity library. In addition, for a second target entity in the candidate entity library, it can also be determined whether it meets the target rule condition; if it does not, the second target entity is deleted from the candidate entity library, so that non-entities are filtered out of the candidate entity library. A machine learning model can also be trained as a filtering model, so that the model can determine, based on the aforementioned entity features, whether a second target entity is a new entity; when the second target entity is determined not to be a new entity, it is deleted, so that as far as possible only new entities remain in the candidate entity library. At this point the candidate entity library mainly contains new entities, and the similar second target entities in the current candidate entity library can next be merged by the merging module. After the similar second target entities in the candidate entity library have been merged, the current candidate entity library is the finally obtained new entity library.
Based on the entity template generation method provided by the foregoing embodiments, the embodiment of the present application also provides an entity template generating device. Referring to Fig. 6, the figure shows a schematic structural diagram of an entity template generating device provided by the embodiments of the present application. The device 600 includes an acquiring unit 601, a matching unit 602, a determination unit 603, and a generation unit 604:
The acquiring unit 601 is configured to obtain search text used for entity recall;
The matching unit 602 is configured to match the search text with the entity templates in an entity template library, each entity template including an entity substitution word and corresponding adjacent text;
The determination unit 603 is configured to, if the search text and a first entity template in the entity template library satisfy a matching relationship, determine a first target entity in the search text according to the entity substitution word and the corresponding adjacent text included in the first entity template;
The generation unit 604 is configured to generate a target entity template according to the search text and the first target entity, and to add the target entity template to the entity template library.
Optionally, the generation unit 604 is specifically configured to:
Extract a first combined text from the search text, the first combined text including the first target entity and corresponding adjacent text;
Replace the first target entity in the first combined text with the entity substitution word to obtain a second combined text;
Generate the target entity template according to the second combined text.
Optionally, the generation unit 604 is further specifically configured to:
Before the target entity template is generated according to the second combined text, determine whether the second combined text meets a frequency condition;
If so, execute the step of generating the target entity template according to the second combined text.
Optionally, the determination unit 603 is specifically configured to:
After the first target entity in the search text is determined, determine whether a second target entity belongs to a new entity according to the entity features of the second target entity in a candidate entity library; the candidate entity library includes the first target entities determined from the search text; the second target entity is any first target entity in the candidate entity library;
The entity features include one or more of the following features: the number of matched entity templates, the number of other first target entities in the candidate entity library that the second target entity includes, the number of first target entities in the candidate entity library that include the second target entity, the word frequency within a first preset time, and the word frequency distribution within a second preset time;
If not, delete the second target entity.
Optionally, the determination unit 603 is specifically configured to:
Before determining whether the second target entity belongs to a new entity according to the entity features of the second target entity in the candidate entity library, determine whether a dictionary includes the second target entity, the dictionary including one or more of a non-entity dictionary, a vulgar dictionary, or a base dictionary;
If so, delete the second target entity.
Optionally, the determination unit 603 is specifically configured to:
Before determining whether the second target entity belongs to a new entity according to the entity features of the second target entity in the candidate entity library, determine whether the second target entity meets a target rule condition, the target rule condition including one or more of the following conditions: character length, character type, or the expression pattern after a regular expression is compiled;
If not, delete the second target entity.
Optionally, the determination unit 603 is specifically configured to:
After determining whether the second target entity belongs to a new entity according to the entity features of the second target entity in the candidate entity library, determine the similarity between any two second target entities in the candidate entity library;
Merge the second target entities whose similarity meets a similarity condition.
It can be seen from the above technical solution that, after the search text used for entity recall is obtained, the search text can be matched with the entity templates in the current entity template library, where each entity template includes an entity substitution word and corresponding adjacent text. If the search text and a first entity template in the entity template library satisfy a matching relationship, a first target entity can be determined from the search text according to the entity substitution word and the corresponding adjacent text included in the first entity template. The first target entity can then be labeled back into the search text, a new target entity template can be generated according to the search text and the first target entity in it, and the newly generated target entity template can be added to the entity template library. As can be seen, new target entity templates are generated continuously while entity recall is performed on the search text, so that the entity templates are expanded and updated. The next entity recall can therefore be based on templates that are as comprehensive as possible, which ensures that the various types of entities in the search text are recalled more accurately and improves the generalization ability of entity recall.
The embodiment of the present application also provides a device for entity template generation, which is introduced below with reference to the accompanying drawings. Referring to Fig. 7, the embodiment of the present application provides a device 700 for entity template generation. The device 700 can be a terminal device, and the terminal device can be any intelligent terminal, including a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA for short), a point-of-sale terminal (Point of Sales, POS for short), a vehicle-mounted computer, and the like. The following takes the case in which the terminal device is a mobile phone as an example:
Fig. 7 shows a block diagram of part of the structure of a mobile phone related to the terminal device provided by the embodiments of the present application. Referring to Fig. 7, the mobile phone includes components such as a radio frequency (Radio Frequency, RF for short) circuit 710, a memory 720, an input unit 730, a display unit 740, a sensor 750, an audio circuit 760, a wireless fidelity (wireless fidelity, WiFi for short) module 770, a processor 780, and a power supply 790. Those skilled in the art will understand that the mobile phone structure shown in Fig. 7 does not constitute a limitation on the mobile phone, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
Each component of the mobile phone is introduced in detail below with reference to Fig. 7:
The RF circuit 710 can be used to receive and send signals during the sending and receiving of information or during a call. In particular, after receiving downlink information from a base station, it passes the information to the processor 780 for processing; in addition, it sends uplink data to the base station. In general, the RF circuit 710 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA for short), a duplexer, and the like. In addition, the RF circuit 710 can also communicate with networks and other devices by wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM for short), General Packet Radio Service (GPRS for short), Code Division Multiple Access (CDMA for short), Wideband Code Division Multiple Access (WCDMA for short), Long Term Evolution (LTE for short), e-mail, Short Messaging Service (SMS for short), and the like.
The memory 720 can be used to store software programs and modules. The processor 780 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 720. The memory 720 can mainly include a program storage area and a data storage area, where the program storage area can store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area can store data created according to the use of the mobile phone (such as audio data or a phone book) and the like. In addition, the memory 720 may include a high-speed random access memory and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 730 can be used to receive input digit or character information and to generate key signal inputs related to the user settings and function control of the mobile phone. Specifically, the input unit 730 may include a touch panel 731 and other input devices 732. The touch panel 731, also referred to as a touch screen, collects touch operations by the user on or near it (such as operations performed by the user on or near the touch panel 731 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch panel 731 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 780, and can receive and execute commands sent by the processor 780. Furthermore, the touch panel 731 can be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 731, the input unit 730 can also include other input devices 732. Specifically, the other input devices 732 can include but are not limited to one or more of a physical keyboard, function keys (such as a volume control key or a power key), a trackball, a mouse, a joystick, and the like.
The display unit 740 can be used to display information input by the user or information provided to the user, as well as the various menus of the mobile phone. The display unit 740 may include a display panel 741, which can optionally be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD for short), an organic light-emitting diode (Organic Light-Emitting Diode, OLED for short), or the like. Further, the touch panel 731 can cover the display panel 741. When the touch panel 731 detects a touch operation on or near it, it transmits the operation to the processor 780 to determine the type of the touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 according to the type of the touch event. Although in Fig. 7 the touch panel 731 and the display panel 741 are two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 731 and the display panel 741 can be integrated to implement the input and output functions of the mobile phone.
Mobile phone may also include at least one sensor 750, such as optical sensor, motion sensor and other sensors. Specifically, optical sensor may include ambient light sensor and proximity sensor, wherein ambient light sensor can be according to ambient light Light and shade adjust the brightness of display panel 741, proximity sensor can close display panel 741 when mobile phone is moved in one's ear And/or backlight.As a kind of motion sensor, accelerometer sensor can detect (generally three axis) acceleration in all directions Size, can detect that size and the direction of gravity when static, can be used to identify the application of mobile phone posture, (for example horizontal/vertical screen is cut Change, dependent game, magnetometer pose calibrating), Vibration identification correlation function (such as pedometer, tap) etc.;May be used also as mobile phone The other sensors such as gyroscope, barometer, hygrometer, thermometer, the infrared sensor of configuration, details are not described herein.
The audio circuit 760, a loudspeaker 761, and a microphone 762 can provide an audio interface between the user and the mobile phone. The audio circuit 760 can convert received audio data into an electrical signal and transmit it to the loudspeaker 761, which converts it into a sound signal for output. Conversely, the microphone 762 converts a collected sound signal into an electrical signal, which the audio circuit 760 receives and converts into audio data; the audio data is then output to the processor 780 for processing and, for example, sent to another mobile phone through the RF circuit 710, or output to the memory 720 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 770, the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 7 shows the WiFi module 770, it is understood that the module is not an essential component of the mobile phone and may be omitted entirely as needed without changing the essence of the invention.
The processor 780 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 720 and calling the data stored in the memory 720, thereby monitoring the mobile phone as a whole. Optionally, the processor 780 may include one or more processing units. Preferably, the processor 780 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 780.
The mobile phone further includes a power supply 790 (such as a battery) that powers all the components. Preferably, the power supply may be logically connected to the processor 780 through a power management system, so that functions such as charge management, discharge management, and power-consumption management are implemented through the power management system.
Although not shown, the mobile phone may also include a camera, a Bluetooth module, and the like, which are not described here.
In this embodiment, the processor 780 included in the terminal device also has the following functions:
obtaining a search text used for entity recall;
matching the search text against the entity templates in an entity template library, where each entity template includes an entity substitute word and corresponding adjacent text;
if the search text and a first entity template in the entity template library satisfy a matching relationship, determining a first target entity in the search text according to the entity substitute word and the corresponding adjacent text included in the first entity template;
generating a target entity template according to the search text and the first target entity, and adding the target entity template to the entity template library.
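For illustration only, the matching and determination functions listed above could be sketched in Python as follows. The placeholder string `[ENT]` standing in for the entity substitute word, the representation of a template as a plain string, the use of regular expressions, and the function names are all assumptions of this sketch; the application does not prescribe any of them.

```python
import re
from typing import Iterable, Optional, Tuple

PLACEHOLDER = "[ENT]"  # hypothetical entity substitute word


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn a template such as '[ENT] lyrics' into a regex that captures the entity."""
    before, sep, after = template.partition(PLACEHOLDER)
    if not sep:
        raise ValueError("template has no entity substitute word")
    # If no adjacent text exists on one side, anchor that side to the edge of
    # the search text so the lazy capture group cannot stop after one character.
    left = re.escape(before) if before else "^"
    right = re.escape(after) if after else "$"
    return re.compile(left + r"(?P<entity>.+?)" + right)


def extract_entity(search_text: str,
                   templates: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (first matching template, entity found in the search text), or None."""
    for template in templates:
        match = template_to_regex(template).search(search_text)
        if match:
            return template, match.group("entity")
    return None
```

Under these assumptions, calling `extract_entity("blue moon lyrics download", ["[ENT] lyrics"])` treats `" lyrics"` as the adjacent text and yields `blue moon` as the first target entity.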
The entity template generation device provided by the embodiments of this application may also be a server. Referring to Fig. 8, which is a structural diagram of a server 800 provided by an embodiment of this application, the server 800 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 822 (for example, one or more processors), a memory 832, and one or more storage media 830 (for example, one or more mass storage devices) that store application programs 842 or data 844. The memory 832 and the storage medium 830 may provide transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 822 may be configured to communicate with the storage medium 830 and to execute, on the server 800, the series of instruction operations in the storage medium 830.
The server 800 may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, and/or one or more operating systems 841, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps performed by the server in the above embodiments may be based on the server structure shown in Fig. 8.
The CPU 822 is configured to perform the following steps:
obtaining a search text used for entity recall;
matching the search text against the entity templates in an entity template library, where each entity template includes an entity substitute word and corresponding adjacent text;
if the search text and a first entity template in the entity template library satisfy a matching relationship, determining a first target entity in the search text according to the entity substitute word and the corresponding adjacent text included in the first entity template;
generating a target entity template according to the search text and the first target entity, and adding the target entity template to the entity template library.
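The last step, generating a target entity template and adding it to the library, could be sketched as below. This is a minimal sketch under stated assumptions: the whole search text is taken as the text that combines the entity with its adjacent text, the frequency condition mentioned in the claims is modelled as a simple count threshold, and `PLACEHOLDER`, `MIN_COUNT`, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

PLACEHOLDER = "[ENT]"  # hypothetical entity substitute word
MIN_COUNT = 2          # hypothetical frequency threshold


def make_candidate_template(search_text: str, entity: str) -> str:
    """Replace the first occurrence of the entity with the substitute word."""
    if entity not in search_text:
        raise ValueError("entity does not occur in the search text")
    return search_text.replace(entity, PLACEHOLDER, 1)


def update_library(observations: Iterable[Tuple[str, str]],
                   template_library: Set[str],
                   min_count: int = MIN_COUNT) -> Set[str]:
    """Add sufficiently frequent new templates to the library and return them.

    `observations` holds (search text, first target entity) pairs that an
    earlier matching step is assumed to have produced.
    """
    counts = Counter(make_candidate_template(text, entity)
                     for text, entity in observations)
    new_templates = {template for template, count in counts.items()
                     if count >= min_count and template not in template_library}
    template_library.update(new_templates)  # mutates the caller's library
    return new_templates
```

Counting candidate templates before admitting them is one way to keep one-off search texts from polluting the library; other frequency conditions would fit the same structure.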
The terms "first", "second", "third", "fourth", and so on (if present) in the specification of this application and in the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described here can, for example, be implemented in an order other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units that are explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
It should be understood that in this application "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items. For example, at least one (item) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
In the several embodiments provided by this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (10)

1. An entity template generation method, characterized in that the method comprises:
obtaining a search text used for entity recall;
matching the search text against the entity templates in an entity template library, where each entity template includes an entity substitute word and corresponding adjacent text;
if the search text and a first entity template in the entity template library satisfy a matching relationship, determining a first target entity in the search text according to the entity substitute word and the corresponding adjacent text included in the first entity template;
generating a target entity template according to the search text and the first target entity, and adding the target entity template to the entity template library.
2. The method according to claim 1, characterized in that generating the target entity template according to the search text and the first target entity comprises:
extracting a first combined text from the search text, the first combined text including the first target entity and the corresponding adjacent text;
replacing the first target entity in the first combined text with the entity substitute word to obtain a second combined text;
generating the target entity template according to the second combined text.
3. The method according to claim 1, characterized in that, before generating the target entity template according to the second combined text, the method further comprises:
determining whether the second combined text satisfies a frequency condition;
if so, performing the step of generating the target entity template according to the second combined text.
4. The method according to claim 1, characterized in that, after determining the first target entity in the search text, the method further comprises:
determining, according to entity features of a second target entity in a candidate entity library, whether the second target entity is a new entity, where the candidate entity library includes first target entities determined from search texts, and the second target entity is any one of the first target entities in the candidate entity library;
the entity features include one or more of the following: the number of matching entity templates, the number of other first target entities in the candidate entity library that the second target entity contains, the number of first target entities in the candidate entity library that contain the second target entity, the word frequency within a first preset time, and the word frequency distribution within a second preset time;
if not, deleting the second target entity.
5. The method according to claim 4, characterized in that, before determining, according to the entity features of the second target entity in the candidate entity library, whether the second target entity is a new entity, the method further comprises:
determining whether a dictionary includes the second target entity, the dictionary including one or more of a non-entity dictionary, a vulgar-word dictionary, or a basic dictionary;
if so, deleting the second target entity.
6. The method according to claim 4, characterized in that, before determining, according to the entity features of the second target entity in the candidate entity library, whether the second target entity is a new entity, the method further comprises:
determining whether the second target entity satisfies a target rule condition, the target rule condition including one or more of the following: character length, character type, or the expression pattern obtained by compiling a regular expression;
if not, deleting the second target entity.
7. The method according to any one of claims 4 to 6, characterized in that, after determining, according to the entity features of the second target entity in the candidate entity library, whether the second target entity is a new entity, the method further comprises:
determining the degree of similarity between any two second target entities in the candidate entity library;
merging the second target entities whose degree of similarity satisfies a similarity condition.
8. An entity template generation apparatus, characterized in that the apparatus comprises an acquiring unit, a matching unit, a determination unit, and a generation unit:
the acquiring unit is configured to obtain a search text used for entity recall;
the matching unit is configured to match the search text against the entity templates in an entity template library, where each entity template includes an entity substitute word and corresponding adjacent text;
the determination unit is configured to, if the search text and a first entity template in the entity template library satisfy a matching relationship, determine a first target entity in the search text according to the entity substitute word and the corresponding adjacent text included in the first entity template;
the generation unit is configured to generate a target entity template according to the search text and the first target entity, and to add the target entity template to the entity template library.
9. The apparatus according to claim 8, characterized in that the generation unit is specifically configured to:
extract a first combined text from the search text, the first combined text including the first target entity and the corresponding adjacent text;
replace the first target entity in the first combined text with the entity substitute word to obtain a second combined text;
generate the target entity template according to the second combined text.
10. The apparatus according to claim 8, characterized in that the generation unit is further specifically configured to:
before generating the target entity template according to the second combined text, determine whether the second combined text satisfies a frequency condition;
if so, perform the step of generating the target entity template according to the second combined text.
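As a reading aid rather than part of the claims, the candidate filtering and merging recited in claims 5 to 7 might look like the following Python sketch. The dictionary contents, the rule pattern, the similarity measure (the standard library's `difflib.SequenceMatcher`), the threshold, and the function names are all assumptions chosen for the example; the feature-based new-entity determination of claim 4 is not modelled here.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List

STOP_DICTIONARY = {"download", "free", "online"}  # hypothetical non-entity / basic words
ENTITY_RULE = re.compile(r"[A-Za-z0-9 ]{2,30}")   # hypothetical character-type and length rule
SIMILARITY_THRESHOLD = 0.85                       # hypothetical similarity condition


def filter_candidates(candidates: Iterable[str]) -> List[str]:
    """Drop candidates that are in the dictionary or fail the rule condition."""
    kept = []
    for candidate in candidates:
        if candidate.lower() in STOP_DICTIONARY:
            continue  # dictionary check: listed words are not entities
        if not ENTITY_RULE.fullmatch(candidate):
            continue  # rule check: wrong length or disallowed characters
        kept.append(candidate)
    return kept


def merge_similar(candidates: Iterable[str],
                  threshold: float = SIMILARITY_THRESHOLD) -> List[List[str]]:
    """Greedily group candidates that are similar to a group's first member."""
    groups: List[List[str]] = []
    for candidate in candidates:
        for group in groups:
            ratio = SequenceMatcher(None, group[0].lower(), candidate.lower()).ratio()
            if ratio >= threshold:
                group.append(candidate)
                break
        else:
            groups.append([candidate])  # no similar group found: start a new one
    return groups
```

Each returned group stands for one merged entity; which surface form to keep as the representative is left open in this sketch.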
CN201910550477.XA 2019-06-24 2019-06-24 A kind of physical template generation method and device Pending CN110287466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910550477.XA CN110287466A (en) 2019-06-24 2019-06-24 A kind of physical template generation method and device

Publications (1)

Publication Number Publication Date
CN110287466A true CN110287466A (en) 2019-09-27

Family

ID=68005462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910550477.XA Pending CN110287466A (en) 2019-06-24 2019-06-24 A kind of physical template generation method and device

Country Status (1)

Country Link
CN (1) CN110287466A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110856186A (en) * 2019-11-19 2020-02-28 北京联合大学 Method and system for constructing wireless network knowledge graph
CN110856186B (en) * 2019-11-19 2023-04-07 北京联合大学 Method and system for constructing wireless network knowledge graph
CN111488450A (en) * 2020-04-08 2020-08-04 北京字节跳动网络技术有限公司 Method and device for generating keyword library and electronic equipment
CN112231554A (en) * 2020-10-10 2021-01-15 腾讯科技(深圳)有限公司 Search recommendation word generation method and device, storage medium and computer equipment
CN112231554B (en) * 2020-10-10 2023-10-31 腾讯科技(深圳)有限公司 Search recommended word generation method and device, storage medium and computer equipment
CN112579707A (en) * 2020-12-08 2021-03-30 西安邮电大学 Log data knowledge graph construction method
CN112579707B (en) * 2020-12-08 2023-04-18 西安邮电大学 Log data knowledge graph construction method

Similar Documents

Publication Publication Date Title
CN111368934B (en) Image recognition model training method, image recognition method and related device
CN110287466A (en) A kind of physical template generation method and device
CN109241431A (en) A kind of resource recommendation method and device
CN104239535A (en) Method and system for matching pictures with characters, server and terminal
CN104217717A (en) Language model constructing method and device
CN108427761B (en) News event processing method, terminal, server and storage medium
CN111222563B (en) Model training method, data acquisition method and related device
CN109165292A (en) Data processing method, device and mobile terminal
CN110163045A (en) A kind of recognition methods of gesture motion, device and equipment
CN110276010A (en) A kind of weight model training method and relevant apparatus
CN111125523A (en) Searching method, searching device, terminal equipment and storage medium
CN110633438B (en) News event processing method, terminal, server and storage medium
CN109032491A (en) Data processing method, device and mobile terminal
CN109656510A (en) The method and terminal of voice input in a kind of webpage
CN104281610B (en) The method and apparatus for filtering microblogging
CN114117056B (en) Training data processing method and device and storage medium
CN111241815A (en) Text increment method and device and terminal equipment
CN110347858A (en) A kind of generation method and relevant apparatus of picture
CN108491502B (en) News tracking method, terminal, server and storage medium
CN106294087B (en) Statistical method and device for operation frequency of business execution operation
CN106020945A (en) Shortcut item adding method and device
CN110287398B (en) Information updating method and related device
CN113220848A (en) Automatic question answering method and device for man-machine interaction and intelligent equipment
CN103401910A (en) Recommendation method, server, terminals and system
CN107436896A (en) Method, apparatus and electronic equipment are recommended in one kind input

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination