CN110472232A - Information processing method and device based on name entity - Google Patents

Information processing method and device based on name entity

Info

Publication number
CN110472232A
Authority
CN
China
Prior art keywords
information
name
book
retrieval
incidence relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910636844.8A
Other languages
Chinese (zh)
Inventor
雷文涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wanweidao Information Technology Co Ltd
Original Assignee
Beijing Wanweidao Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wanweidao Information Technology Co Ltd
Priority to CN201910636844.8A
Publication of CN110472232A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses an information processing method and device based on named entities. The method includes: extracting named entities and historical event names from a book text; establishing association relationships between the book text and the named entities and between the book text and the historical event names; and storing the association relationships in a book retrieval database. The application addresses the problem in the related art that a book retrieval function limited to a single mode cannot meet users' diverse retrieval needs. With the application, users' diverse book retrieval needs can be met. The application can also be used in any scenario that requires book retrieval.

Description

Information processing method and device based on name entity
Technical field
The present application relates to the field of computer technology, and in particular to an information processing method and device based on named entities.
Background
A named entity is a person name, an organization name, a place name, or any other entity identified by a name. In a broader sense, named entities also include numbers, dates, currencies, addresses, and the like.
With the book retrieval methods provided in the related art, when a user searches with a keyword, the results contain only books whose titles include that keyword. For example, if a user searches for "term A", the results are books whose titles contain "term A"; books whose text mentions "term A" are not retrieved, which greatly restricts what the user can search for.
No effective solution has yet been proposed for the problem that the book retrieval function in the related art cannot meet users' diverse retrieval needs.
Summary of the invention
The main purpose of the present application is to provide an information processing method and device based on named entities, so as to solve the problem that the book retrieval function in the related art cannot meet users' diverse retrieval needs.
To achieve the above purpose, according to one aspect of the present application, an information processing method based on named entities is provided.
The information processing method based on named entities according to the present application includes:
extracting named entities and historical event names from a book text; establishing association relationships between the book text and the named entities and between the book text and the historical event names; and storing the association relationships in a book retrieval database, wherein the book retrieval database is the database in which users perform book retrieval.
Further, the named entities include person name information and place name information. Extracting the named entities from the book text includes: extracting the person name information and the place name information from the book text. Establishing the association relationships between the book text and the named entities and the historical event names includes: establishing association relationships between the book text and the person name information and between the book text and the place name information.
Further, the place name information includes modern place name information and historical place name information. After the person name information and the place name information are extracted from the book text, the method further includes: determining a place name association relationship between the modern place name information and the historical place name information; and storing the place name association relationship in the book retrieval database.
Further, the book retrieval database also contains date of birth information. After the person name information and the place name information are extracted from the book text, the method further includes: determining a birth association relationship between the person name information and the date of birth information; and storing the birth association relationship in the book retrieval database.
Further, after the association relationships are stored in the book retrieval database, the method further includes: receiving a first retrieval request from a user terminal; retrieving, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal; and feeding the first retrieval result back to the user terminal.
To achieve the above purpose, according to another aspect of the present application, an information processing device based on named entities is provided.
The information processing device based on named entities according to the present application includes:
an extraction module, configured to extract named entities and historical event names from a book text; an establishing module, configured to establish association relationships between the book text and the named entities and between the book text and the historical event names; and a first storage module, configured to store the association relationships in a book retrieval database, wherein the book retrieval database is the database in which users perform book retrieval.
Further, the named entities include person name information and place name information. The extraction module includes a first extraction unit, configured to extract the person name information and the place name information from the book text. The establishing module includes a first establishing unit, configured to establish association relationships between the book text and the person name information and between the book text and the place name information.
Further, the place name information includes modern place name information and historical place name information, and the device further includes: a first determining module, configured to determine a place name association relationship between the modern place name information and the historical place name information; and a second storage module, configured to store the place name association relationship in the book retrieval database.
Further, the book retrieval database also contains date of birth information, and the device further includes: a second determining module, configured to determine a birth association relationship between the person name information and the date of birth information; and a third storage module, configured to store the birth association relationship in the book retrieval database.
Further, the device further includes: a receiving module, configured to receive a first retrieval request from a user terminal; a retrieval module, configured to retrieve, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal; and a feedback module, configured to feed the first retrieval result back to the user terminal.
In the embodiments of the present application, named entities and historical event names are extracted from a book text, association relationships are established between the book text and the named entities and between the book text and the historical event names, and the association relationships are stored in a book retrieval database. This enriches the book retrieval function and achieves the technical effect of meeting users' diverse book retrieval needs, thereby solving the technical problem in the related art that a book retrieval function limited to a single mode cannot meet those needs.
Brief description of the drawings
The accompanying drawings, which form a part of the present application, are provided for a further understanding of the application, so that its other features, objects, and advantages become more apparent. The drawings of the illustrative embodiments and their descriptions are used to explain the application and do not constitute an improper limitation on it. In the drawings:
Fig. 1 is a schematic flowchart of the information processing method based on named entities according to the first embodiment of the present application;
Fig. 2 is a schematic flowchart of the information processing method based on named entities according to the second embodiment of the present application;
Fig. 3 is a schematic flowchart of the information processing method based on named entities according to the third embodiment of the present application;
Fig. 4 is a schematic flowchart of the information processing method based on named entities according to the fourth embodiment of the present application;
Fig. 5 is a schematic flowchart of the information processing method based on named entities according to the fifth embodiment of the present application;
Fig. 6 is a schematic structural diagram of the information processing device based on named entities according to the first embodiment of the present application;
Fig. 7 is a schematic structural diagram of the information processing device based on named entities according to the second embodiment of the present application;
Fig. 8 is a schematic structural diagram of the information processing device based on named entities according to the third embodiment of the present application;
Fig. 9 is a schematic structural diagram of the information processing device based on named entities according to the fourth embodiment of the present application; and
Fig. 10 is a schematic structural diagram of the information processing device based on named entities according to the fifth embodiment of the present application.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
According to an embodiment of the present application, an information processing method based on named entities is provided. As shown in Fig. 1, the method includes the following steps S101 to S103:
S101: Extract the named entities and historical event names from the book text.
A named entity is a person name, an organization name, a place name, or any other entity identified by a name; in a broader sense, named entities also include numbers, dates, currencies, addresses, and the like. Take the current major online bookstores as an example: when a user searches for "Zhuge Liang" on a site such as JD.com, the results are books whose titles contain "Zhuge Liang", and books whose text mentions Zhuge Liang, such as Romance of the Three Kingdoms, are not found. To address this problem in the prior art, the information processing method based on named entities provided by the present application is applied to book retrieval. In a specific implementation, a word segmentation tool is first used to segment book texts of various types into words. Many segmentation tools exist; commonly used ones include HanLP (a Chinese language processing package), SnowNLP (a Chinese class library), FoolNLTK (a Chinese language processing toolkit), Jiagu (Jiagu NLP), Pyltp (Harbin Institute of Technology's language cloud), THULAC (Tsinghua University's Chinese lexical analysis toolkit), and NLPIR (a Chinese word segmentation system). Those skilled in the art may choose any existing segmentation tool according to the actual book retrieval requirements. Named entities of various types, such as person names (Cao Cao, Liu Bei) and place names (Xuzhou, Luoyang), are then identified in the segmented text. Named entity recognition generally consists of two parts: (1) identifying entity boundaries; and (2) determining the entity category (person name, place name, organization name, or other). Common recognition methods include NLTK-based named entity recognition and Stanford-based named entity recognition. Those skilled in the art may choose flexibly among the named entity recognition methods in the prior art according to actual requirements, and the details are not repeated here.
For historical events that occurred in each period, such as the wars of the Three Kingdoms period (the Battle of Chibi, the Battle of Guandu, and so on), the event names may be extracted manually or from UGC (user-generated content).
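The extraction step can be illustrated with a minimal sketch. It assumes a hand-built gazetteer and longest-match lookup in place of a real segmenter and named entity recognizer (none of the tools listed above is called), and it treats historical event names as a manually curated list. The names `GAZETTEER` and `extract_entities` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a trained segmenter + NER model
# and for a manually curated list of historical event names.
GAZETTEER: Dict[str, str] = {
    "Cao Cao": "person",
    "Liu Bei": "person",
    "Xuzhou": "place",
    "Luoyang": "place",
    "Battle of Chibi": "event",
}


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Return (surface form, category) pairs found by longest-match lookup."""
    # Try longer names first so a long name wins over a shorter overlapping one.
    names = sorted(GAZETTEER, key=len, reverse=True)
    found: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for name in names:
            if text.startswith(name, pos):
                found.append((name, GAZETTEER[name]))
                pos += len(name)  # skip past the matched name
                break
        else:
            pos += 1  # no name starts here; move one character forward
    return found
```

A production system would replace the gazetteer lookup with one of the segmentation and recognition tools chosen for the deployment; the output shape (name plus category) is the only part the later steps rely on.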
S102: Establish association relationships between the book text and the named entities and between the book text and the historical event names.
In a specific implementation, the association relationships between the book text and each type of named entity extracted from it, and between the book text and the historical event names, are established according to a preset rule. The preset rule may be to associate each named entity and historical event name with the book text that is its source. Specifically, the association may be set up at the same time as the named entities and historical event names are extracted from the book text. For example, if the named entity "Cao Cao" and the historical event name "Battle of Chibi" are extracted from the text of Romance of the Three Kingdoms, the book Romance of the Three Kingdoms is associated with the named entity "Cao Cao" and with the historical event name "Battle of Chibi".
It should be noted that the embodiments of the present application do not specifically limit the form of the association relationships between the book text and the named entities and the historical event names; those skilled in the art may select or configure them according to the actual usage scenario.
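One possible form for these association relationships is a flat list of records keyed by the source book. The sketch below is only an illustration under that assumption; the `Association` tuple layout and the function name `build_associations` are hypothetical and not taken from the application.

```python
from typing import Iterable, List, Set, Tuple

# Assumed record layout: (book title, category, entity or event name).
Association = Tuple[str, str, str]


def build_associations(book_title: str,
                       extracted: Iterable[Tuple[str, str]]) -> List[Association]:
    """Associate every extracted (name, category) pair with its source book."""
    seen: Set[Association] = set()
    result: List[Association] = []
    for name, category in extracted:
        record = (book_title, category, name)
        if record not in seen:  # a name mentioned many times yields one record
            seen.add(record)
            result.append(record)
    return result
```

Deduplicating at this point keeps the stored relationship a simple "book mentions name" fact; counting mentions would be a different design choice.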
S103: Store the association relationships in the book retrieval database, wherein the book retrieval database is the database in which users perform book retrieval.
Structured data generally refers to data stored in a database that has a certain logical or physical structure; most commonly it is data stored in a relational database. In a specific implementation of some embodiments of the present application, the established association relationships between the book text and the named entities and the historical event names are converted into structured relational data and stored in the book retrieval database, where they serve as the basis for users' subsequent book retrieval.
Preferably, the database also stores book text information, which specifically includes the following fields: book ID and book title; and historical event name information, which specifically includes the following fields: historical event ID and historical event name.
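The fields named above can be sketched as a small relational schema. The example uses Python's built-in `sqlite3` module with an in-memory database; the table names, the link table `book_event`, and the helper functions are assumptions made for illustration, since the application names only the ID and name fields.

```python
import sqlite3

SCHEMA = """
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL UNIQUE
);
CREATE TABLE historical_event (
    event_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
-- Assumed link table: one row per (book, event) association.
CREATE TABLE book_event (
    book_id  INTEGER NOT NULL REFERENCES book(book_id),
    event_id INTEGER NOT NULL REFERENCES historical_event(event_id),
    PRIMARY KEY (book_id, event_id)
);
"""


def get_or_create(conn, table, id_col, name_col, value):
    """Return the ID of the row whose name column equals value, inserting it if absent.

    Table and column names come from trusted constants in this sketch, never from user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} ({name_col}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {name_col} = ?", (value,)
    ).fetchone()
    return row[0]


def store_book_event(conn, title, event_name):
    """Store the association between a book and a historical event name."""
    book_id = get_or_create(conn, "book", "book_id", "title", title)
    event_id = get_or_create(conn, "historical_event", "event_id", "name", event_name)
    conn.execute(
        "INSERT OR IGNORE INTO book_event (book_id, event_id) VALUES (?, ?)",
        (book_id, event_id),
    )


def books_for_event(conn, event_name):
    """Return the titles of all books associated with the given event name."""
    rows = conn.execute(
        "SELECT b.title FROM book b "
        "JOIN book_event be ON be.book_id = b.book_id "
        "JOIN historical_event e ON e.event_id = be.event_id "
        "WHERE e.name = ? ORDER BY b.title",
        (event_name,),
    )
    return [row[0] for row in rows]
```

The same pattern (an entity table plus a link table to `book`) would extend to person names and place names.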
As can be seen from the above description, the embodiments of the present application extract the named entity information and historical event names from the book text, and establish and store the association relationships between the book text and the named entity information and between the book text and the historical event names. As a result, when a user performs book retrieval with a keyword such as a book title, person name, place name, or historical event name, the user obtains not only books whose titles contain the keyword but also books whose text contains it. This gives the user richer and more comprehensive results and meets the user's diverse book retrieval needs.
As shown in Fig. 2, in some embodiments of the present application, the named entities include person name information and place name information, and the method includes:
S201: Extract the person name information and the place name information from the book text.
As mentioned in step S101 above, named entities take the form of person names, organization names, place names, and the like, and in a broader sense also include numbers, dates, currencies, addresses, and the like. In book retrieval, the keywords users commonly search with include, besides book titles, the person names and place names mentioned in the book text. Therefore, in some embodiments of the present application, extracting the named entities from the book text specifically includes extracting the person name information and place name information from the book text. In a specific implementation, a word segmentation tool is first used to segment book texts of various types into words; those skilled in the art may choose any existing segmentation tool according to the actual book retrieval requirements. Person name information (such as Cao Cao and Liu Bei) and place name information (such as Xuzhou and Luoyang) are then identified in the segmented text. As noted above, common recognition methods include NLTK-based named entity recognition and Stanford-based named entity recognition; those skilled in the art may choose flexibly among the named entity recognition methods in the prior art according to actual requirements, and the details are not repeated here.
It should of course be noted that the above step is only one preferred implementation of the embodiments of the present application. The extraction of other types of named entity information also falls within the scope of protection of the present application, and those skilled in the art may configure it flexibly according to actual needs.
Preferably, establishing the association relationships between the book text and the named entities and the historical event names specifically includes the following step:
S202: Establish association relationships between the book text and the person name information and between the book text and the place name information.
In a specific implementation, the association relationships between the book text and the person name information and the place name information are established according to a preset rule. The preset rule may be to associate each item of person name information and place name information with the book text that is its source. Specifically, the association may be set up at the same time as the person name information and place name information are extracted from the book text. For example, if the person name "Cao Cao" and the place name "Zhuozhou" are extracted from the text of Romance of the Three Kingdoms using a word segmentation tool and a named entity recognition tool, the book Romance of the Three Kingdoms is associated with the person name "Cao Cao" and with the place name "Zhuozhou".
Preferably, after the association relationships between the book text and the person name information and the place name information are established, the method further includes the following step:
S203: Store the association relationships between the book text and the person name information and the place name information in the book retrieval database.
In some embodiments of the present application, the book retrieval database also stores person name information, which specifically includes the following fields: person ID and person name; and place name information, which specifically includes the following fields: place name ID and place name.
As shown in Fig. 3, in some embodiments of the present application, the place name information includes modern place name information and historical place name information, and extracting the place name information from the book text in step S201 specifically includes:
S301: Extract the modern place name information and the historical place name information from the book text.
After the modern place name information and the historical place name information are extracted from the book text in step S301, the method further includes:
S302: Determine the place name association relationship between the modern place name information and the historical place name information.
The same place may have had different names in different historical periods. For example, the names that Nanjing has used in history include Nanjing, Tianjing, and Jiankang. It is therefore necessary to determine the association between the different names that the same place has had in different historical periods, that is, the place name association relationship between the modern place name and the historical place names of the same place. A user who searches with a modern or historical place name can then also obtain the book retrieval results for the historical or modern place names associated with it. For example, a search for the historical place name "Jiankang" can retrieve Romance of the Three Kingdoms, and can also retrieve the related book Those Things of the Ming Dynasty, which contains the modern place name "Nanjing".
In a specific implementation, the place name association relationship between the modern place name information and the historical place name information may be determined according to a preset rule. The preset rule may be to obtain in advance the geographic coordinates corresponding to each place name. For example, if the geographic coordinates of place name A are {x, y}, those of place name B are {y, z}, and those of place name C are also {x, y}, then place name A and place name C are different names for the same geographic location, and a place name association relationship is established between place name A and place name C.
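The coordinate rule just described can be sketched as a grouping of place names by location. This is a minimal illustration that assumes coordinates are compared for exact equality; the function name `link_place_names` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]


def link_place_names(coords: Dict[str, Coordinate]) -> Dict[str, List[str]]:
    """Map each place name to the other names that share its coordinates."""
    by_location: Dict[Coordinate, List[str]] = defaultdict(list)
    for name, location in coords.items():
        by_location[location].append(name)
    # For every name, list the names at the same location, excluding itself.
    return {
        name: [other for other in by_location[location] if other != name]
        for name, location in coords.items()
    }
```

With real gazetteer data, exact equality is unlikely to hold, so a practical version would probably compare coordinates within a distance tolerance or key on a curated location ID instead.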
S303: Store the place name association relationship in the book retrieval database.
In a specific implementation, the determined place name association relationship between the modern place name information and the historical place name information is converted into structured relational data and stored in the book retrieval database, where it serves as the basis for users' subsequent place name retrieval. In some embodiments of the present application, the database also stores the modern place name information and historical place name information, which specifically include the following fields: modern place name ID and modern place name, and historical place name ID and historical place name.
By extracting the modern place name information and historical place name information from the book text, and determining and storing the place name association relationship between them, a user who performs book retrieval with a modern place name obtains not only books whose titles or text contain that modern place name but also books whose titles or text contain the historical place names associated with it. This gives the user richer and more comprehensive results and meets the user's diverse book retrieval needs.
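The retrieval effect described above amounts to expanding the query with associated place names before looking up books. The following sketch assumes two in-memory mappings, `aliases` (place name to associated names) and `place_index` (place name to book titles); both structures and the function name `search_by_place` are hypothetical stand-ins for the book retrieval database.

```python
from typing import Dict, List, Set


def search_by_place(query: str,
                    aliases: Dict[str, Set[str]],
                    place_index: Dict[str, Set[str]]) -> List[str]:
    """Return books that mention the queried place under any associated name."""
    # Expand the query with every modern or historical name linked to it.
    names = {query} | aliases.get(query, set())
    books: Set[str] = set()
    for name in names:
        books |= place_index.get(name, set())
    return sorted(books)
```

For instance, with "Jiankang" linked to "Nanjing", a query for either name returns the books indexed under both.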
As shown in Fig. 4, in some embodiments of the present application, the book retrieval database also contains date of birth information, and after the person name information and place name information are extracted from the book text in step S201, the method further includes:
S401: Determine the birth association relationship between the person name information and the date of birth information.
The same name may correspond to several different people in different periods. For example, a person of the Ming dynasty may be named "ABC", and a person of the Qing dynasty may also be named "ABC". If a user wants to find books about the Ming dynasty person named "ABC" and searches with the keyword "ABC", the results will also include books about people named "ABC" from the Qing dynasty or other periods. Results that do not match the user's need reduce retrieval efficiency and degrade the retrieval experience. To solve this problem, some embodiments of the present application determine the birth association relationship between the person name information and the date of birth information. With date of birth information available, the user can accurately obtain the books related to the person being searched for.
S402: Store the birth association relationship in the book retrieval database.
In a specific implementation of some embodiments of the present application, the determined birth association relationship between the person name information and the date of birth information is converted into structured relational data and stored in the book retrieval database, where it serves as the basis for users' subsequent person name retrieval.
By extracting the person name information from the book text, and determining and storing the birth association relationship between the person name information and the date of birth information, a user who performs book retrieval with a person name can use the date of birth as a constraint to accurately obtain the books related to the intended person. This gives the user more accurate results and improves retrieval efficiency.
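Disambiguation by date of birth can be sketched as an optional filter on a name lookup. The example assumes the date of birth is stored as a year and that the user supplies a year range; the `Person` record and the function `find_persons` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str
    birth_year: int  # assumed granularity; the application says only "date of birth"


def find_persons(people: List[Person],
                 name: str,
                 born_between: Optional[Tuple[int, int]] = None) -> List[Person]:
    """Return people with the given name, optionally limited to a birth-year range."""
    matches = [p for p in people if p.name == name]
    if born_between is not None:
        start, end = born_between
        matches = [p for p in matches if start <= p.birth_year <= end]
    return matches
```

Without the range, both people named "ABC" match; supplying a range that covers only the Ming dynasty (1368 to 1644) leaves the single intended person, whose ID can then be used to look up associated books.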
As shown in Fig. 5, in some embodiments of the present application, after the association relationships are stored in the book retrieval database in step S103, the method further includes:
S501: Receive a first retrieval request from a user terminal.
In some embodiments of the present application, after the association relationships between the book text and the named entities and historical event names have been established and stored, the method further includes receiving a first retrieval request sent by a user terminal. The first retrieval request may specifically be one of the following: a book title retrieval request, with which the user searches for books whose titles contain the given title information; a person name retrieval request, with which the user searches for books whose titles or text contain the given person name; a place name retrieval request, with which the user searches for books whose titles or text contain the given place name; or a historical event name retrieval request, with which the user searches for books whose titles or text contain the given historical event name.
S502: Retrieve, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal.
After the first retrieval request sent by the user terminal is received, it is further identified as a book title retrieval request, a person name retrieval request, a place name retrieval request, or a historical event name retrieval request. If it is a book title retrieval request, the book retrieval results that match the title information are retrieved from the book retrieval database. If it is a person name retrieval request, the book retrieval results that match the person name information are retrieved from the book retrieval database. If it is a place name retrieval request, the book retrieval results that match the place name information are retrieved from the book retrieval database. If it is a historical event name retrieval request, the book retrieval results that match the historical event name are retrieved from the book retrieval database.
As a kind of preferred embodiment of the application, above-mentioned first search result is specifically included in title or book text Search result comprising above-mentioned title information, name information, information of place names or historical events title.
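The request-type identification in step S502 can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the `RequestType` enum, the in-memory `BOOKS` dictionary standing in for the book retrieval database, and the second book title are hypothetical and do not come from the application.

```python
from enum import Enum


class RequestType(Enum):
    """The four kinds of first retrieval request named in the description."""
    TITLE = "title"
    PERSON = "person"
    PLACE = "place"
    EVENT = "event"


# Hypothetical in-memory stand-in for the book retrieval database.
# Each book keeps its title plus the entities associated with its text.
BOOKS = {
    1: {"title": "Romance of the Three Kingdoms",
        "person": {"Cao Cao", "Zhuge Liang"},
        "place": {"Xuzhou", "Luoyang"},
        "event": {"Battle of Chibi"}},
    2: {"title": "Biography of Zhuge Liang",  # hypothetical title
        "person": {"Zhuge Liang"},
        "place": set(),
        "event": set()},
}


def retrieve(request_type, keyword):
    """Return the IDs of books that match the keyword for this request type."""
    hits = []
    for book_id, book in sorted(BOOKS.items()):
        in_title = keyword in book["title"]
        if request_type is RequestType.TITLE:
            matched = in_title
        else:
            # Person, place and event requests match either the title or
            # the entities associated with the book text.
            matched = in_title or keyword in book[request_type.value]
        if matched:
            hits.append(book_id)
    return hits
```

With this sketch, a personal name request for "Zhuge Liang" reaches both books, while a title request for the same keyword reaches only the book whose title contains it.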
S503: feed the first retrieval result back to the user terminal.
In a specific implementation, the first retrieval result is fed back to the client according to preset dimensions. The preset dimensions may be a first dimension, in which the book title contains the information in the first retrieval request, and a second dimension, in which the book text contains that information. Alternatively, the retrieval results may be fed back to the client along three dimensions: personal name information, place name information and historical event name.
In some embodiments of the present application, recommendation weights are further preset for the different dimensions. For example, the weight of the first dimension is preset to be higher than that of the second dimension, and results with a higher weight are recommended to the client first. When the first retrieval request is a personal name retrieval request, the books whose title contains the personal name information are recommended to the user with first priority, and the books whose text contains the personal name information are recommended with second priority.
It should be noted that the dimensions of the retrieval results and the recommendation weights described above are not fixed; those skilled in the art can configure them flexibly according to actual needs.
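The weighted recommendation described above can be sketched as a stable sort by dimension weight. The weight values, the `Hit` record and the second book title are assumptions chosen for illustration; the application only requires that the first dimension outweigh the second.

```python
from dataclasses import dataclass

# Assumed weights: a title hit (first dimension) outranks a text hit
# (second dimension). The concrete numbers are illustrative only.
DIMENSION_WEIGHTS = {"title": 2.0, "text": 1.0}


@dataclass(frozen=True)
class Hit:
    book_title: str
    dimension: str  # "title" or "text"


def rank_hits(hits, weights=DIMENSION_WEIGHTS):
    """Order hits so that higher-weighted dimensions are recommended first."""
    # sorted() is stable, so hits in the same dimension keep their order.
    return sorted(hits, key=lambda hit: weights[hit.dimension], reverse=True)


hits = [
    Hit("Romance of the Three Kingdoms", "text"),
    Hit("Biography of Zhuge Liang", "title"),  # hypothetical title
]
ranked = rank_hits(hits)
```

Because the weights are passed in as a plain mapping, reconfiguring the priorities only means supplying a different dictionary.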
It can be seen from the above description that the present invention achieves the following technical effect: by extracting the named entity information and historical event names from the book text, and by establishing and storing the associations between the book text and the named entity information and between the book text and the historical event names, a user who searches for books by a keyword such as a book title, personal name, place name or historical event name obtains not only the books whose title contains the keyword but also the books whose text contains it. This gives the user more comprehensive and more accurate retrieval results, improves the user's retrieval efficiency, and meets the user's diverse book retrieval needs.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
According to an embodiment of the present invention, an information processing device based on named entities is further provided for implementing the above method. As shown in Fig. 6, the device includes:
Extraction module 1, configured to extract named entities and historical event names from book text.
As mentioned above, named entities are personal names, organization names, place names and all other entities identified by a name; in a broader sense they also include numbers, dates, currencies, addresses and so on. Take today's major online bookstores, such as Dangdang and JD.com, as an example: a search for "Zhuge Liang" returns only books whose title contains "Zhuge Liang" and does not find related books, such as "Romance of the Three Kingdoms", whose text mentions "Zhuge Liang". To address this problem in the prior art, the named-entity-based information processing device provided by the present application is applied to book retrieval. In a specific implementation, extraction module 1 first uses a word segmentation tool to split book texts of various types into words. Many word segmentation tools exist, and those skilled in the art can choose any of them according to the actual book retrieval requirements. The various types of named entities, such as personal names (Cao Cao, Liu Bei) and place names (Xuzhou, Luoyang, etc.), are then identified in the segmented text. Common named entity recognition methods include NLTK-based and Stanford-based named entity recognition. Those skilled in the art can configure the recognition flexibly on the basis of existing named entity recognition methods according to actual needs, and the details are not repeated here.
Establishing module 2, configured to establish associations between the book text and the named entities and between the book text and the historical event names, respectively.
In a specific implementation, establishing module 2 follows a preset rule to establish the associations between each type of named entity extracted from the book text and that book text, and between each historical event name and that book text. The preset rule may be to associate each named entity and each historical event name with the book text it came from. Specifically, the associations may be set up at the same time as the named entities and historical event names are extracted from the book text. For example, if a word segmentation tool and a named entity recognition tool extract the named entity "Cao Cao" and the historical event name "Battle of Chibi" from the text of "Romance of the Three Kingdoms", then the book "Romance of the Three Kingdoms" is associated with the named entity "Cao Cao" and with the historical event name "Battle of Chibi".
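The source-based association rule can be sketched as follows. This is a hedged illustration, not the applicant's implementation: a naive substring match against two hypothetical gazetteers (`KNOWN_ENTITIES`, `KNOWN_EVENTS`) stands in for a real word segmentation and named entity recognition tool, and the sample sentence is invented.

```python
from collections import defaultdict

# Hypothetical gazetteers standing in for a real segmentation and NER tool.
KNOWN_ENTITIES = {"Cao Cao", "Liu Bei", "Xuzhou"}
KNOWN_EVENTS = {"Battle of Chibi"}


def extract_mentions(text, vocabulary):
    """Return the vocabulary items that occur in the text (substring match)."""
    return {term for term in vocabulary if term in text}


def build_associations(books):
    """Map each entity or event name to the set of books that mention it.

    The association is created at extraction time, so every name is linked
    to the book text it was extracted from.
    """
    entity_to_books = defaultdict(set)
    event_to_books = defaultdict(set)
    for title, text in books.items():
        for entity in extract_mentions(text, KNOWN_ENTITIES):
            entity_to_books[entity].add(title)
        for event in extract_mentions(text, KNOWN_EVENTS):
            event_to_books[event].add(title)
    return dict(entity_to_books), dict(event_to_books)


books = {
    "Romance of the Three Kingdoms":
        "Cao Cao led his army south and was defeated at the Battle of Chibi.",
}
entities, events = build_associations(books)
```

The two returned mappings are the associations that a storage step would then persist.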
First storage module 3, configured to store the associations in the book retrieval database, where the book retrieval database is the database in which users perform book retrieval.
In a specific implementation, first storage module 3 converts the established associations between the book text and the named entities and between the book text and the historical event names into structured relational data and stores them in the book retrieval database, where they serve as the basis for the user's subsequent book retrieval. In some embodiments of the present application, the database also stores book text information, which specifically includes the following fields: book ID and book title; and historical event name information, which specifically includes the following fields: historical event ID and historical event name.
By extracting the named entity information and historical event names from the book text, and by establishing and storing the associations between the book text and the named entity information and between the book text and the historical event names, a user who searches for books by a keyword such as a book title, personal name, place name or historical event name obtains not only the books whose title contains the keyword but also the books whose text contains it. This gives the user more comprehensive and richer retrieval results and meets the user's diverse book retrieval needs.
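One possible shape for the structured relational data is sketched below with SQLite. The `book` and `event` tables follow the fields listed in the description (book ID and title, historical event ID and name); the `book_event` link table, all table and column names, and the helper function are assumptions made for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE book  (book_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE event (event_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- Assumed link table: one row per (book, event) association.
CREATE TABLE book_event (
    book_id  INTEGER NOT NULL REFERENCES book(book_id),
    event_id INTEGER NOT NULL REFERENCES event(event_id),
    PRIMARY KEY (book_id, event_id)
);
"""


def books_mentioning_event(conn, event_name):
    """Return the titles of books associated with the given historical event."""
    rows = conn.execute(
        """SELECT b.title FROM book b
           JOIN book_event be ON be.book_id = b.book_id
           JOIN event e ON e.event_id = be.event_id
           WHERE e.name = ? ORDER BY b.title""",
        (event_name,),
    )
    return [title for (title,) in rows]


# In-memory database used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO book VALUES (1, 'Romance of the Three Kingdoms')")
conn.execute("INSERT INTO event VALUES (1, 'Battle of Chibi')")
conn.execute("INSERT INTO book_event VALUES (1, 1)")
```

Analogous entity and link tables could hold person and place associations; a link table is used here because one book can mention many events and one event can appear in many books.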
As shown in Fig. 7, in some embodiments of the present application, the named entities include personal name information and place name information,
and extraction module 1 includes a first extraction unit 11, configured to extract the personal name information and the place name information from the book text.
Besides book titles, the keywords that users commonly search with also involve the personal names, place names and similar information mentioned in the book text. Therefore, in some embodiments of the present application, extraction module 1 specifically includes a first extraction unit 11. In a specific implementation, first extraction unit 11 first uses a word segmentation tool to split book texts of various types into words; those skilled in the art can choose any existing word segmentation tool according to the actual book retrieval requirements. Personal name information (such as Cao Cao and Liu Bei) and place name information (such as Xuzhou and Luoyang) are then identified in the segmented text. Those skilled in the art can configure the recognition flexibly on the basis of existing named entity recognition methods according to actual needs, and the details are not repeated here.
It should of course be noted that the above steps are only a preferred implementation of the embodiments of the present application. The extraction of other types of named entity information is equally covered by the scope of protection of the present application, and those skilled in the art can adjust it flexibly according to the actual situation.
Establishing module 2 includes a first establishing unit 21, configured to establish associations between the book text and the personal name information and between the book text and the place name information, respectively.
In a specific implementation, first establishing unit 21 follows a preset rule to establish the associations between the book text and the personal name information and between the book text and the place name information. The preset rule may be to associate each item of personal name information and place name information with the book text it came from. Specifically, the associations may be set up at the same time as the personal name information and place name information are extracted from the book text. For example, if a word segmentation tool extracts the personal name "Cao Cao" and the place name "Zhuozhou" from the text of "Romance of the Three Kingdoms", then the book "Romance of the Three Kingdoms" is associated with the personal name "Cao Cao" and with the place name "Zhuozhou".
In some embodiments of the present application, the book retrieval database also stores personal name information, which specifically includes the following fields: person ID and person name; and place name information, which specifically includes the following fields: place name ID and place name.
As shown in Fig. 8, in some embodiments of the present application, the place name information includes modern place name information and historical place name information, and the device further includes:
First determining module 4, configured to determine the place name association between the modern place name information and the historical place name information.
The same place may have had different names in different historical periods. For example, the names that Nanjing has used in history include Nanjing, Tianjing and Jiankang. It is therefore necessary to determine the association between the different names of the same place in different historical periods, that is, the place name association between the modern place name and the historical place names of the same place, so that a user who searches by a modern or historical place name also obtains the book retrieval results for the associated historical or modern place names. For example, a search for the historical place name "Jiankang" can return "Romance of the Three Kingdoms" and can equally return the related book "Those Things of the Ming Dynasty", which contains the modern place name "Nanjing". In a specific implementation, first determining module 4 determines the place name association between the modern place name information and the historical place name information according to a preset rule. The preset rule may be to obtain in advance the geographic coordinates corresponding to each place name. For example, if the coordinates of place name A are {x, y}, those of place name B are {y, z} and those of place name C are also {x, y}, then place names A and C are different names for the same geographic location, and a place name association is established between place name A and place name C.
Second storage module 5, configured to store the place name association in the book retrieval database.
In a specific implementation, second storage module 5 converts the determined place name association between the modern place name information and the historical place name information into structured relational data and stores it in the book retrieval database, where it serves as the basis for the user's subsequent place name retrieval. In some embodiments of the present application, the database also stores modern place name information and historical place name information, which specifically include the following fields: modern place name ID and modern place name, and historical place name ID and historical place name.
By extracting the modern and historical place name information from the book text and determining and storing the place name association between them, a user who searches for books by a modern place name obtains not only the books whose title or text contains that modern place name but also the books whose title or text contains the historical place names associated with it. This gives the user more comprehensive and richer retrieval results and meets the user's diverse book retrieval needs.
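The coordinate-based rule for associating modern and historical place names can be sketched as a grouping by location. The coordinate values below are rough illustrative figures, and matching on exactly equal coordinates is a simplifying assumption; real gazetteer data would likely need a distance tolerance.

```python
from collections import defaultdict


def build_place_aliases(coordinates):
    """Associate place names that share the same geographic coordinates."""
    by_location = defaultdict(set)
    for name, coord in coordinates.items():
        by_location[coord].add(name)
    aliases = {}
    for names in by_location.values():
        for name in names:
            aliases[name] = names  # each name maps to its whole group
    return aliases


def expand_query(place, aliases):
    """Return the queried name plus every associated name, sorted."""
    return sorted(aliases.get(place, {place}))


# Illustrative (latitude, longitude) pairs, not authoritative data.
coordinates = {
    "Nanjing": (32.06, 118.80),
    "Jiankang": (32.06, 118.80),
    "Luoyang": (34.62, 112.45),
}
aliases = build_place_aliases(coordinates)
```

A place name query is then run once per expanded name, so a search for "Jiankang" also reaches books indexed under "Nanjing".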
As shown in Fig. 9, in some embodiments of the present application, the book retrieval database further includes date-of-birth information, and the device further includes:
Second determining module 6, configured to determine the birth association between the personal name information and the date-of-birth information.
The same name may correspond to several different people in different periods. For example, a person of the Ming dynasty is named "ABC" and a person of the Qing dynasty is also named "ABC". If the user wants books about the Ming dynasty person named "ABC" and searches with the keyword "ABC", the retrieval results will also contain books about people named "ABC" from the Qing dynasty or other periods. Results that do not meet the user's needs lower retrieval efficiency and degrade the retrieval experience. To solve this problem, in some embodiments of the present application, second determining module 6 determines the birth association between the personal name information and the date-of-birth information. By specifying date-of-birth information when searching, the user can obtain exactly the books related to the person being searched for.
Third storage module 7, configured to store the birth association in the book retrieval database.
In a specific implementation, third storage module 7 converts the determined birth association between the personal name information and the date-of-birth information into structured relational data and stores it in the book retrieval database, where it serves as the basis for the user's subsequent personal name retrieval.
By extracting the personal name information from the book text and determining and storing the birth association between the personal name information and the date-of-birth information, a user who searches for books by personal name can use the date-of-birth information as a constraint and obtain exactly the books related to the person being searched for. This gives the user more accurate retrieval results and improves the user's retrieval efficiency.
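The date-of-birth constraint can be sketched as an optional filter on a personal name search. The `Person` record, the birth years and the book titles are hypothetical; only the "ABC" naming scenario comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    name: str
    birth_year: int  # hypothetical values, for illustration only


# Two different people who share the name "ABC".
PEOPLE = [Person(1, "ABC", 1500), Person(2, "ABC", 1700)]

# Hypothetical person-to-book associations.
PERSON_BOOKS = {
    1: ["Book about the Ming-era ABC"],
    2: ["Book about the Qing-era ABC"],
}


def search_by_name(name, born_between=None):
    """Return books about people with this name, optionally limited by birth year."""
    results = []
    for person in PEOPLE:
        if person.name != name:
            continue
        if born_between is not None:
            earliest, latest = born_between
            if not earliest <= person.birth_year <= latest:
                continue
        results.extend(PERSON_BOOKS.get(person.person_id, []))
    return results
```

Calling `search_by_name("ABC")` returns books for both people, while `search_by_name("ABC", (1368, 1644))` keeps only the person born within the Ming dynasty years.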
As shown in Fig. 10, in some embodiments of the present application, the device further includes:
Receiving module 8, configured to receive a first retrieval request from a user terminal.
In some embodiments of the present application, after the associations between the book text and the named entities and historical event names have been established and stored, the device further includes receiving module 8, configured to receive the first retrieval request sent by the user terminal. The first retrieval request may specifically be: a book title retrieval request, with which the user searches for books whose title contains the given title information; a personal name retrieval request, with which the user searches for books whose title or text contains the given personal name information; a place name retrieval request, with which the user searches for books whose title or text contains the given place name information; or a historical event name retrieval request, with which the user searches for books whose title or text contains the given historical event name.
Retrieval module 9, configured to retrieve, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal.
After receiving module 8 receives the first retrieval request sent by the user terminal, retrieval module 9 further identifies whether the request is a book title retrieval request, a personal name retrieval request, a place name retrieval request or a historical event name retrieval request. If it is a book title retrieval request, the book retrieval results that match the title information are retrieved from the book retrieval database. If it is a personal name retrieval request, the book retrieval results that match the personal name information are retrieved from the book retrieval database. If it is a place name retrieval request, the book retrieval results that match the place name information are retrieved from the book retrieval database. If it is a historical event name retrieval request, the book retrieval results that match the historical event name are retrieved from the book retrieval database.
As a preferred implementation of the present application, the first retrieval result specifically includes retrieval results whose book title or book text contains the above title information, personal name information, place name information or historical event name.
Feedback module 10, configured to feed the first retrieval result back to the user terminal.
In a specific implementation, feedback module 10 feeds the first retrieval result back to the client according to preset dimensions. The preset dimensions may be a first dimension, in which the book title contains the information in the first retrieval request, and a second dimension, in which the book text contains that information. Alternatively, the retrieval results may be fed back to the client along three dimensions: personal name information, place name information and historical event name.
In some embodiments of the present application, recommendation weights are further preset for the different dimensions. For example, the weight of the first dimension is preset to be higher than that of the second dimension, and results with a higher weight are recommended to the client first. When the first retrieval request is a personal name retrieval request, the books whose title contains the personal name information are recommended to the user with first priority, and the books whose text contains the personal name information are recommended with second priority.
It should be noted that the dimensions of the retrieval results and the recommendation weights described above are not fixed; those skilled in the art can adjust them flexibly according to actual needs.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. Alternatively, they can each be fabricated as an individual integrated circuit module, or several of the modules or steps can be fabricated as a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. An information processing method based on named entities, characterized by comprising:
Extracting named entities and historical event names from book text;
Establishing associations between the book text and the named entities and between the book text and the historical event names, respectively;
Storing the associations in a book retrieval database, wherein the book retrieval database is the database in which users perform book retrieval.
2. The information processing method based on named entities according to claim 1, characterized in that the named entities include personal name information and place name information, and extracting the named entities from the book text comprises:
Extracting the personal name information and the place name information from the book text;
Establishing the associations between the book text and the named entities and the historical event names comprises:
Establishing associations between the book text and the personal name information and between the book text and the place name information, respectively.
3. The information processing method based on named entities according to claim 2, characterized in that the place name information includes modern place name information and historical place name information, and after extracting the personal name information and the place name information from the book text, the method further comprises:
Determining the place name association between the modern place name information and the historical place name information;
Storing the place name association in the book retrieval database.
4. The information processing method based on named entities according to claim 2, characterized in that the book retrieval database further includes date-of-birth information, and after extracting the personal name information and the place name information from the book text, the method further comprises:
Determining the birth association between the personal name information and the date-of-birth information;
Storing the birth association in the book retrieval database.
5. The information processing method based on named entities according to claim 1, characterized in that
after storing the associations in the book retrieval database, the method further comprises:
Receiving a first retrieval request from a user terminal;
Retrieving, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal;
Feeding the first retrieval result back to the user terminal.
6. An information processing device based on named entities, characterized by comprising:
An extraction module, configured to extract named entities and historical event names from book text;
An establishing module, configured to establish associations between the book text and the named entities and between the book text and the historical event names, respectively;
A first storage module, configured to store the associations in a book retrieval database, wherein the book retrieval database is the database in which users perform book retrieval.
7. The information processing device based on named entities according to claim 6, characterized in that the named entities include personal name information and place name information,
The extraction module includes:
A first extraction unit, configured to extract the personal name information and the place name information from the book text;
The establishing module includes:
A first establishing unit, configured to establish associations between the book text and the personal name information and between the book text and the place name information, respectively.
8. The information processing device based on named entities according to claim 7, characterized in that the place name information includes modern place name information and historical place name information, and the device further includes:
A first determining module, configured to determine the place name association between the modern place name information and the historical place name information;
A second storage module, configured to store the place name association in the book retrieval database.
9. The information processing device based on named entities according to claim 7, characterized in that the book retrieval database further includes date-of-birth information, and the device further includes:
A second determining module, configured to determine the birth association between the personal name information and the date-of-birth information;
A third storage module, configured to store the birth association in the book retrieval database.
10. The information processing device based on named entities according to claim 6, characterized in that the device further includes:
A receiving module, configured to receive a first retrieval request from a user terminal;
A retrieval module, configured to retrieve, in the book retrieval database, a first retrieval result that matches the first retrieval request of the user terminal;
A feedback module, configured to feed the first retrieval result back to the user terminal.
CN201910636844.8A 2019-07-15 2019-07-15 Information processing method and device based on name entity Pending CN110472232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910636844.8A CN110472232A (en) 2019-07-15 2019-07-15 Information processing method and device based on name entity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910636844.8A CN110472232A (en) 2019-07-15 2019-07-15 Information processing method and device based on name entity

Publications (1)

Publication Number Publication Date
CN110472232A true CN110472232A (en) 2019-11-19

Family

ID=68508649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910636844.8A Pending CN110472232A (en) 2019-07-15 2019-07-15 Information processing method and device based on name entity

Country Status (1)

Country Link
CN (1) CN110472232A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523013A (en) * 2020-04-22 2020-08-11 咪咕文化科技有限公司 Book searching method and device, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100281034A1 (en) * 2006-12-13 2010-11-04 Google Inc. Query-Independent Entity Importance in Books
CN102955807A (en) * 2011-08-26 2013-03-06 华为软件技术有限公司 Retrieval method and retrieval device for associated information
CN105468605A (en) * 2014-08-25 2016-04-06 济南中林信息科技有限公司 Entity information map generation method and device
US20170148276A1 (en) * 2015-11-19 2017-05-25 SBC Nevada, LLC System for placing wagers on sporting events and method of operating same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191119