CN107066474A - Literature search method and apparatus - Google Patents

Literature search method and apparatus

Info

Publication number
CN107066474A
Authority
CN
China
Prior art keywords
doi
document
query statement
structural data
special type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611130331.2A
Other languages
Chinese (zh)
Inventor
张显
卢家广
李玉鹏
徐学睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201611130331.2A
Publication of CN107066474A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a literature search method and apparatus. The method comprises the following steps: receiving a query statement input by a user; judging whether the query statement contains a digital object unique identifier (DOI); if a DOI is contained, extracting the DOI and obtaining, according to the DOI, the document that contains the DOI and carries a single-document mark; and displaying the document in a special display style. The method obtains the corresponding document carrying a single-document mark through the DOI of the document and displays it in a special display style, thereby accurately locating the target document and displaying its information in detail.

Description

Literature search method and apparatus
Technical field
The present invention relates to the field of computer application technology, and in particular to a literature search method and apparatus.
Background
When carrying out scientific research, researchers usually need to look up scientific documents in the relevant field for reference. At present, researchers look up scientific documents mainly by entering the title or the DOI (Digital Object Unique Identifier) of a document as the query. However, because the number of documents is very large, it is difficult to accurately locate the required scientific document.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, a first object of the present invention is to propose a literature search method that obtains, through the DOI of a document, the corresponding document carrying a single-document mark and displays the document in a special display style, thereby accurately locating the target document and displaying its information in detail.
A second object of the present invention is to propose a literature search apparatus.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a literature search method, including: receiving a query statement input by a user; judging whether the query statement contains a digital object unique identifier (DOI); if a DOI is contained, extracting the DOI and obtaining, according to the DOI, the document that contains the DOI and carries a single-document mark; and displaying the document in a special display style.
The literature search method of the embodiment of the present invention obtains the corresponding document carrying a single-document mark through the DOI of the document and displays it in a special display style, thereby accurately locating the target document and displaying its information in detail.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a literature search apparatus, including: a receiving module, configured to receive a query statement input by a user; a judging module, configured to judge whether the query statement contains a digital object unique identifier (DOI); an extraction module, configured to, if a DOI is contained, extract the DOI and obtain, according to the DOI, the document that contains the DOI and carries a single-document mark; and a display module, configured to display the document in a special display style.
The literature search apparatus of the embodiment of the present invention obtains the corresponding document carrying a single-document mark through the DOI of the document and displays it in a special display style, thereby accurately locating the target document and displaying its information in detail.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
Fig. 1 is a schematic diagram of the result of entering a DOI to search for a document in an existing search engine;
Fig. 2 is a flowchart of a literature search method according to an embodiment of the present invention;
Fig. 3 is a flowchart of establishing a DOI inverted index database according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a document displayed in a special display style according to an embodiment of the present invention;
Fig. 5 is a flowchart of a literature search method according to a specific embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a literature search apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a literature search apparatus according to a specific embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a literature search apparatus according to another specific embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
When looking up scientific documents, researchers usually need to find one specific document precisely. At present, a document is mainly looked up by entering its title in a search engine. However, because there are many researchers, many documents share the same title, so it is difficult to accurately locate the desired document by title. A document can therefore be looked up by its DOI, which uniquely identifies it.
However, entering a query statement containing the DOI of a document in an existing search engine does not accurately find the corresponding document; that is, existing search engines do not support DOI retrieval. As shown in Fig. 1, after the DOI "10.1016/0735-1097(96)82380-1" is entered in the 360 Academic search engine and "Search" is clicked, two documents appear in the search result list. The document whose DOI is "10.1016/0735-1097(96)82380-1" cannot be accurately located, and detailed information about the document, such as its abstract and network source, cannot be obtained from the search result list.
The literature search method and apparatus of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a literature search method according to an embodiment of the present invention.
As shown in Fig. 2, the literature search method includes:
S201: receive the query statement input by the user.
For example, a researcher enters a query statement in a search engine to look up a document, and the search engine receives the query statement input by the user.
S202: judge whether the query statement contains a DOI.
Specifically, after the query statement input by the user is obtained, it is judged whether the query statement contains a DOI. The DOI is an identifier that uniquely identifies a document.
For example, after the query statement "DOI:10.1056/NEJMoa062462 paper" input by the user is received, it is judged whether this query statement contains a DOI.
S203: if a DOI is contained, extract the DOI and, according to the DOI, obtain the document that contains the DOI and carries a single-document mark.
Specifically, if the query statement contains a DOI, the characters other than the DOI are removed from the query statement to extract the DOI, and the document that contains this DOI and carries a single-document mark is obtained according to the extracted DOI.
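The judging and extraction steps (S202 and the first half of S203) can be sketched in Python as follows. The patent does not say how a DOI is recognized, so the regular expression here is an assumption: "10.", a registrant code of 4 to 9 digits, a slash, and a run of printable ASCII characters. The function names `extract_doi` and `contains_doi` are hypothetical.

```python
import re
from typing import Optional

# Assumed DOI shape (not taken from the patent): "10.", a 4-9 digit
# registrant code, "/", then printable ASCII characters without whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[!-~]+")


def extract_doi(query: str) -> Optional[str]:
    """Return the first DOI-like substring of the query, or None.

    Everything outside the match (for example the "DOI:" prefix and the
    trailing word "paper") is discarded, which mirrors the removal of the
    other characters described in S203.
    """
    match = DOI_PATTERN.search(query)
    return match.group(0) if match else None


def contains_doi(query: str) -> bool:
    """S202: judge whether the query statement contains a DOI."""
    return extract_doi(query) is not None


extracted = extract_doi("DOI:10.1056/NEJMoa062462 paper")
```

Restricting the suffix to printable ASCII is a design choice of this sketch: it lets the match stop at a following space or at non-ASCII text such as a Chinese word appended directly to the DOI.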
More specifically, after the DOI contained in the query statement is extracted, a relevance calculation is performed between the extracted DOI and the data in the DOI inverted index database, so that all candidate documents containing the extracted DOI are obtained from the DOI inverted index database. A candidate document may be a document whose reference list contains the DOI, or a document whose bibliographic record contains the DOI.
Because the candidate documents may include documents that merely cite the DOI in their references, the DOI in the bibliographic record of each candidate document is extracted after the candidates are obtained and is matched against the DOI in the query statement. The candidate document that matches the DOI contained in the query statement is given a single-document mark; that is, the candidate whose own DOI is identical to the DOI in the query statement is marked, and the document carrying the single-document mark can then be obtained through the DOI inverted index database.
A specific example is as follows. The user inputs the query statement "DOI:10.1056/NEJMoa062462 paper". After receiving the query statement, the search engine judges that it contains a DOI.
Once the query statement "DOI:10.1056/NEJMoa062462 paper" is known to contain a DOI, the characters other than the DOI, namely "DOI:" and "paper", are removed, and the extracted DOI is 10.1056/NEJMoa062462. A relevance calculation is then performed between the extracted DOI and the data in the DOI inverted index database, so that the candidate documents containing the DOI are obtained from the database. Next, the DOI in the bibliographic record of each candidate document is extracted and matched against "10.1056/NEJMoa062462". If the DOI in the bibliographic record of a candidate document matches "10.1056/NEJMoa062462", that candidate is given a single-document mark, and the document carrying the single-document mark is obtained from the DOI inverted index database.
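The two-stage lookup described above, candidates first and exact match second, can be sketched as follows. This is a simplified illustration rather than the patent's implementation: the relevance calculation is replaced by an exact dictionary lookup, DOIs are compared in lowercase, and the `Document` class, the `find_target` function, the titles, and the DOI "10.1000/example.1" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    title: str
    doi: str  # DOI from the document's own bibliographic record
    reference_dois: List[str] = field(default_factory=list)
    single_marked: bool = False


def find_target(index: Dict[str, List[Document]], query_doi: str) -> Optional[Document]:
    """Return the candidate whose own DOI equals the query DOI, or None."""
    key = query_doi.lower()
    # Stage 1: every document indexed under the DOI is a candidate,
    # whether the DOI is its own or only appears in its references.
    candidates = index.get(key, [])
    # Stage 2: keep the candidate whose bibliographic DOI matches exactly
    # and give it the single-document mark.
    for doc in candidates:
        if doc.doi.lower() == key:
            doc.single_marked = True
            return doc
    return None


target = Document("Target paper", "10.1056/NEJMoa062462")
citing = Document("Citing paper", "10.1000/example.1", ["10.1056/NEJMoa062462"])
index = {"10.1056/nejmoa062462": [citing, target]}
found = find_target(index, "10.1056/NEJMoa062462")
```

Here `found` is the target paper itself; the citing paper is a candidate because it references the DOI, but it is filtered out by the second stage and stays unmarked.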
In addition, before the DOI inverted index database is queried according to the DOI in the query statement to obtain candidate documents containing the DOI, the DOI inverted index database may be established in advance. As shown in Fig. 3, the steps of establishing the DOI inverted index database may include:
S301: obtain document samples from the network.
Specifically, document samples are obtained from the network or from bibliographic databases, such as CNKI and the Wanfang database.
S302: extract the structured data from the document samples.
After the document samples are obtained, structured data such as title, author, journal, time, issue number, volume number, network source, DOI, and references are extracted from them using a machine learning model, OCR (Optical Character Recognition) technology, a maximum entropy model, or the like.
S303: establish the DOI inverted index database according to the structured data.
According to the extracted structured data, inverted index technology is used to establish the correspondence between each document DOI and the corresponding documents, thereby obtaining the DOI inverted index database.
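Step S303 can be illustrated with a minimal in-memory inverted index. The sketch assumes that S302 has already produced one dictionary per document with a "doi" key and an optional "references" list of cited DOIs; these key names, the `build_doi_index` function, and the sample records are hypothetical, and a production system would use a persistent index rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Dict, List


def build_doi_index(records: List[dict]) -> Dict[str, List[int]]:
    """Map each lowercase DOI to the positions of the records containing it.

    A record "contains" a DOI if the DOI is its own or appears in its
    reference list, so one DOI can point to several records.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for position, record in enumerate(records):
        for doi in [record["doi"], *record.get("references", [])]:
            key = doi.lower()
            if position not in index[key]:
                index[key].append(position)
    return dict(index)


records = [
    {"title": "Target paper", "doi": "10.1056/NEJMoa062462"},
    {"title": "Citing paper", "doi": "10.1000/example.1",
     "references": ["10.1056/NEJMoa062462"]},
]
doi_index = build_doi_index(records)
```

Indexing the cited DOIs as well as each document's own DOI is what makes the later matching step necessary: a lookup for one DOI can return both the document itself and the documents that cite it.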
S204: display the document in a special display style.
Specifically, after the document carrying the single-document mark, i.e. the target document, is obtained according to the DOI in the query statement, the structured data of that document, such as title, author, journal, time, issue number, volume number, network source, DOI, and references, can be extracted; a special style template is then called and the structured data is filled into the template to display the document. Alternatively, the structured data of the marked document can be processed to generate structured information in a certain format, as shown in Fig. 4.
Fig. 4 shows the page displayed after the query statement "10.3778/j.issn.1002-8331.2012.01.001" is entered, namely the document corresponding to the DOI "10.3778/j.issn.1002-8331.2012.01.001". As can be seen from Fig. 4, the page shows the title, authors, abstract, journal, time, volume number, keywords, citation count, network source, free download link, and other information of that document.
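Filling structured data into a style template (S204) can be sketched with the standard-library `string.Template`. The template layout, the field names, and the `render_card` function are assumptions for illustration; the patent does not specify the template format, and the page in Fig. 4 shows more fields than this sketch.

```python
from string import Template

# Hypothetical plain-text "special style" template.
CARD_TEMPLATE = Template(
    "Title: $title\n"
    "Authors: $authors\n"
    "Journal: $journal ($year)\n"
    "DOI: $doi\n"
    "Source: $source"
)


def render_card(data: dict) -> str:
    """Fill the template with the structured data of the marked document.

    Fields missing from the data are rendered as empty strings so that an
    incomplete record still produces a card.
    """
    fields = dict.fromkeys(
        ["title", "authors", "journal", "year", "doi", "source"], ""
    )
    fields.update({k: v for k, v in data.items() if k in fields})
    return CARD_TEMPLATE.substitute(fields)


card = render_card({
    "title": "Example title",
    "doi": "10.3778/j.issn.1002-8331.2012.01.001",
})
```

Defaulting missing fields to empty strings avoids the `KeyError` that `Template.substitute` raises for an unfilled placeholder.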
Comparing Fig. 4 with Fig. 1 shows that, compared with existing search methods, the present invention accurately finds a document by its DOI and displays the document information in detail in the page, which makes it convenient for the user to obtain the information; the user can also download the document through the download link in the page.
In summary, the literature search method of the embodiment of the present invention obtains the corresponding document carrying a single-document mark through the DOI of the document and displays it in a special display style, thereby accurately locating the target document and displaying its information in detail.
Fig. 5 is a flowchart of a literature search method according to a specific embodiment of the present invention.
As shown in Fig. 5, the literature search method includes:
S501: obtain document samples from the network.
Specifically, document samples are obtained from the network or from bibliographic databases, such as CNKI and the Wanfang database.
S502: extract the structured data from the document samples.
After the document samples are obtained, structured data such as title, author, journal, time, issue number, volume number, network source, DOI, and references are extracted from them using a machine learning model, OCR technology, a maximum entropy model, or the like.
S503: establish the DOI inverted index database according to the structured data.
According to the extracted structured data, inverted index technology is used to establish the correspondence between each document DOI and the corresponding documents, thereby obtaining the DOI inverted index database.
S504: the user inputs the query statement "DOI:10.1056/NEJMoa062462 paper".
The user enters the query statement "DOI:10.1056/NEJMoa062462 paper" in a search engine, and the search engine receives the query statement input by the user.
S505: does the query statement contain a DOI?
After the query statement "DOI:10.1056/NEJMoa062462 paper" input by the user is received, it is judged whether the query statement contains a DOI.
S506: extract the DOI from the query statement.
If the query statement contains a DOI, the characters other than the DOI are removed from the query statement and the DOI is extracted. Here the query statement "DOI:10.1056/NEJMoa062462 paper" is judged to contain a DOI, and the extracted DOI is "10.1056/NEJMoa062462".
S507: normal retrieval.
If the query statement does not contain a DOI, normal retrieval is performed according to the query statement.
S508: calculate the relevance between the extracted DOI and the data in the DOI inverted index database to obtain candidate documents.
Specifically, after the DOI "10.1056/NEJMoa062462" is extracted from the query statement, the relevance between the extracted DOI "10.1056/NEJMoa062462" and the data in the DOI inverted index database is calculated, so that candidate documents are obtained from the DOI inverted index database.
S509: match the DOI in the bibliographic record of each candidate document against "10.1056/NEJMoa062462".
Because the candidate documents may include documents that merely cite the DOI in their references, the DOI in the bibliographic record of each candidate document is extracted and matched against "10.1056/NEJMoa062462".
S510: is the match consistent?
It is judged whether the DOI in the bibliographic record of the candidate document matches "10.1056/NEJMoa062462".
S511: give a single-document mark to the candidate document that matches "10.1056/NEJMoa062462", and obtain the marked document.
If the DOI in the bibliographic record of a candidate document matches "10.1056/NEJMoa062462", the candidate document corresponding to that DOI is given a single-document mark, and the document carrying the single-document mark is obtained.
S512: no processing.
If none of the DOIs in the bibliographic records of the candidate documents matches "10.1056/NEJMoa062462", no processing is performed.
S513: display the obtained document carrying the single-document mark.
After the marked document whose DOI matches "10.1056/NEJMoa062462" is obtained, the structured data of that document, such as title, author, journal, time, issue number, volume number, network source, DOI, and references, can be extracted; a special style template is then called and the structured data is filled into the template to display the document. Alternatively, the structured data of the marked document can be processed to generate structured information in a certain format.
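The query-time branch of Fig. 5 (S505 to S513) can be condensed into a single function, shown here as a sketch under several assumptions: the DOI regular expression is a guess at the DOI shape, the relevance calculation of S508 is reduced to an exact index lookup, and `search`, `normal_search`, the record layout, and the DOI "10.1000/example.1" are hypothetical. The index maps lowercase DOIs to record positions, as a pre-built DOI inverted index might.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

DOI_PATTERN = re.compile(r"10\.\d{4,9}/[!-~]+")  # assumed DOI shape


def search(
    query: str,
    records: List[dict],
    index: Dict[str, List[int]],
    normal_search: Callable[[str], list],
) -> Tuple[str, Optional[object]]:
    """Return a (kind, payload) pair following the branches of Fig. 5."""
    match = DOI_PATTERN.search(query)                       # S505
    if match is None:
        return "normal", normal_search(query)               # S507
    doi = match.group(0).lower()                            # S506
    candidates = [records[i] for i in index.get(doi, [])]   # S508
    for record in candidates:                               # S509, S510
        if record["doi"].lower() == doi:
            return "single", record                         # S511, S513
    return "none", None                                     # S512


records = [
    {"title": "Citing paper", "doi": "10.1000/example.1"},
    {"title": "Target paper", "doi": "10.1056/NEJMoa062462"},
]
index = {"10.1056/nejmoa062462": [0, 1], "10.1000/example.1": [0]}
kind, payload = search(
    "DOI:10.1056/NEJMoa062462 paper", records, index, lambda q: []
)
```

Returning a tagged pair keeps the three outcomes (normal retrieval, a single marked document, nothing to show) distinguishable for whatever code renders the result page.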
The literature search apparatus proposed by the embodiments of the present invention is described in detail below with reference to Fig. 6. Fig. 6 is a schematic structural diagram of a literature search apparatus according to an embodiment of the present invention.
As shown in Fig. 6, the literature search apparatus may include: a receiving module 610, a judging module 620, an extraction module 630, and a display module 640.
The receiving module 610 is configured to receive the query statement input by the user.
For example, a researcher enters a query statement in a search engine to look up a document, and the receiving module 610 receives the query statement input by the user.
The judging module 620 is configured to judge whether the query statement contains a DOI.
Specifically, after the receiving module 610 obtains the query statement input by the user, the judging module 620 judges whether the query statement contains a DOI. The DOI is an identifier that uniquely identifies a document.
For example, after the receiving module 610 receives the query statement "DOI:10.1056/NEJMoa062462 paper" input by the user, the judging module 620 judges whether this query statement contains a DOI.
The extraction module 630 is configured to, if a DOI is contained, extract the DOI and obtain, according to the DOI, the document that contains the DOI and carries a single-document mark.
As shown in Fig. 7, on the basis of Fig. 6, the extraction module 630 includes an acquiring unit 631, a matching unit 632, and a marking unit 633. The acquiring unit 631 is configured to query the DOI inverted index database according to the DOI to obtain candidate documents containing the DOI; the matching unit 632 is configured to match the DOI in the bibliographic record of each candidate document against the DOI; and the marking unit 633 is configured to give a single-document mark to the candidate document that matches the DOI and to obtain the document carrying the single-document mark.
Specifically, if the query statement contains a DOI, the extraction module 630 removes the characters other than the DOI from the query statement, extracts the DOI, and obtains, according to the extracted DOI, the document that contains this DOI and carries a single-document mark.
More specifically, after the DOI contained in the query statement is extracted, the acquiring unit 631 performs a relevance calculation between the extracted DOI and the data in the DOI inverted index database, so that all candidate documents containing the extracted DOI are obtained from the DOI inverted index database. A candidate document may be a document whose reference list contains the DOI, or a document whose bibliographic record contains the DOI.
Because the candidate documents may include documents that merely cite the DOI in their references, after the candidates are obtained the matching unit 632 extracts the DOI in the bibliographic record of each candidate document and matches it against the DOI in the query statement. The marking unit 633 gives a single-document mark to the candidate document that matches the DOI contained in the query statement; that is, the candidate whose own DOI is identical to the DOI in the query statement is selected and marked, and the document carrying the single-document mark can then be obtained through the DOI inverted index database.
A specific example is as follows. The user inputs the query statement "DOI:10.1056/NEJMoa062462 paper". After the receiving module 610 receives the query statement, the judging module 620 judges that it contains a DOI.
Once the judging module 620 knows that the query statement "DOI:10.1056/NEJMoa062462 paper" contains a DOI, the extraction module 630 removes the characters other than the DOI, namely "DOI:" and "paper", and the extracted DOI is 10.1056/NEJMoa062462. The acquiring unit 631 then performs a relevance calculation between the extracted DOI and the data in the DOI inverted index database, so that the candidate documents containing the DOI are obtained from the database. Next, the matching unit 632 extracts the DOI in the bibliographic record of each candidate document and matches it against "10.1056/NEJMoa062462". If the DOI in the bibliographic record of a candidate document matches "10.1056/NEJMoa062462", the marking unit 633 gives that candidate a single-document mark, and the document carrying the single-document mark is obtained from the DOI inverted index database.
In addition, as shown in Fig. 8, on the basis of Fig. 7, the extraction module 630 further includes an establishing unit 634.
The establishing unit 634 is configured to establish the DOI inverted index database in advance, before the DOI inverted index database is queried according to the DOI to obtain candidate documents containing the DOI.
Before the DOI inverted index database is queried according to the DOI in the query statement to obtain candidate documents containing the DOI, the DOI inverted index database may be established in advance. Once it is established, the document carrying the single-document mark can be obtained according to the pre-established DOI inverted index database and the DOI in the query statement. A specific example is as follows:
First, the establishing unit 634 obtains document samples from the network or from bibliographic databases, such as CNKI and the Wanfang database. After the document samples are obtained, structured data such as title, author, journal, time, issue number, volume number, network source, DOI, and references are extracted from them using a machine learning model, OCR technology, a maximum entropy model, or the like.
Then, according to the extracted structured data, inverted index technology is used to establish the correspondence between each document DOI and the corresponding documents, thereby obtaining the DOI inverted index database.
Next, after a user enters the query statement "DOI:10.1056/NEJMoa062462 paper" in a search engine, the acquiring unit 631 performs a relevance calculation between the DOI "10.1056/NEJMoa062462" extracted from the query statement and the data in the pre-established DOI inverted index database, so that all documents in the DOI inverted index database that contain the DOI "10.1056/NEJMoa062462" are obtained as candidate documents. After the acquiring unit 631 obtains the candidate documents, the matching unit 632 extracts the DOI in the bibliographic record of each candidate document and matches the extracted DOI against the DOI in the query statement. If the DOI in the bibliographic record of a candidate document matches the DOI in the query statement, the marking unit 633 gives that candidate a single-document mark, so that the document carrying the single-document mark is obtained.
The display module 640 is configured to display the document in a special display style.
Specifically, after the document carrying the single-document mark, i.e. the target document, is obtained according to the DOI in the query statement, the display module 640 can extract the structured data of that document, such as title, author, journal, time, issue number, volume number, network source, DOI, and references, call a special style template, and fill the structured data into the template to display the document. Alternatively, the structured data of the marked document can be processed to generate structured information in a certain format, as shown in Fig. 4.
Fig. 4 shows the page displayed after the query statement "10.3778/j.issn.1002-8331.2012.01.001" is entered, namely the document corresponding to the DOI "10.3778/j.issn.1002-8331.2012.01.001". As can be seen from Fig. 4, the page shows the title, authors, abstract, journal, time, volume number, keywords, citation count, network source, free download link, and other information of that document.
Comparing Fig. 4 with Fig. 1 shows that, compared with existing search methods, the present invention accurately finds a document by its DOI and displays the structured information of the document in detail in the page, which makes it convenient for the user to obtain the information; the user can also download the document through the download link in the page.
In summary, the literature search apparatus of the embodiment of the present invention obtains the corresponding document carrying a single-document mark through the DOI of the document and displays it in a special display style, thereby accurately locating the target document and displaying its information in detail.
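The module structure of Fig. 6 can be sketched as a small class that wires four callables together. This is an illustration only: the class name `LiteratureSearchApparatus`, the `handle` method, the one-entry `STORE`, the sample title, and the DOI regular expression are all hypothetical, and the extraction module is reduced to a dictionary lookup instead of the acquiring, matching, and marking units of Fig. 7.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional

DOI_PATTERN = re.compile(r"10\.\d{4,9}/[!-~]+")  # assumed DOI shape


@dataclass
class LiteratureSearchApparatus:
    receive: Callable[[str], str]             # receiving module 610
    judge: Callable[[str], bool]              # judging module 620
    extract: Callable[[str], Optional[dict]]  # extraction module 630
    display: Callable[[dict], str]            # display module 640

    def handle(self, raw_query: str) -> Optional[str]:
        query = self.receive(raw_query)
        if not self.judge(query):
            return None  # a full system would fall back to normal retrieval
        document = self.extract(query)
        return self.display(document) if document is not None else None


# Hypothetical one-document store keyed by lowercase DOI.
STORE: Dict[str, dict] = {
    "10.1056/nejmoa062462": {"title": "Example title", "doi": "10.1056/NEJMoa062462"},
}


def extract(query: str) -> Optional[dict]:
    match = DOI_PATTERN.search(query)
    return STORE.get(match.group(0).lower()) if match else None


apparatus = LiteratureSearchApparatus(
    receive=str.strip,
    judge=lambda q: DOI_PATTERN.search(q) is not None,
    extract=extract,
    display=lambda doc: f"{doc['title']} (DOI: {doc['doi']})",
)
shown = apparatus.handle("  DOI:10.1056/NEJMoa062462 paper ")
```

Passing the modules in as callables keeps each one replaceable, for example swapping the dictionary lookup for a real index query, without touching the control flow in `handle`.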
In the description of this specification, a description referring to the terms "one embodiment", "a specific embodiment", "some embodiments", "an example", "a specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (14)

1. A literature search method, characterized by comprising:
receiving a query statement input by a user;
judging whether the query statement contains a digital object unique identifier (DOI);
if the DOI is contained, extracting the DOI, and obtaining, according to the DOI, a document that contains the DOI and carries a single-document mark; and
displaying the document in a special display style.
2. The method of claim 1, characterized in that obtaining, according to the DOI, the document that contains the DOI and carries a single-document mark comprises:
querying a DOI inverted index database according to the DOI to obtain candidate documents containing the DOI;
matching the DOI in the bibliographic record of each candidate document against the DOI; and
giving a single-document mark to the candidate document that matches the DOI, and obtaining the document carrying the single-document mark.
3. The method of claim 2, characterized in that, before querying the DOI inverted index database according to the DOI to obtain the candidate documents containing the DOI, the method further comprises:
establishing the DOI inverted index database in advance.
4. The method of claim 3, characterized in that establishing the DOI inverted index database in advance comprises:
obtaining document samples from the network;
extracting structured data from the document samples; and
establishing the DOI inverted index database according to the structured data.
5. The method of claim 4, characterized in that extracting the structured data from the document samples comprises:
extracting the structured data from the document samples using at least one of a machine learning model, OCR technology, and a maximum entropy model.
6. The method of claim 4, characterized in that the structured data comprises one or more of title, author, journal, time, issue number, volume number, network source, DOI, and references.
7. The method of claim 1, characterized in that displaying the document in a special display style comprises:
calling a special style template to display the document; or
extracting structured information of the document, and inserting the structured information at a preset position of a display page for display.
8. A literature search device, characterized by comprising:
a receiving module, configured to receive a query statement input by a user;
a judging module, configured to judge whether the query statement contains a Digital Object Identifier (DOI);
an extraction module, configured to, if the query statement contains the DOI, extract the DOI and obtain, according to the DOI, a document that contains the DOI and has a single mark; and
a display module, configured to display the document in a special type pattern.
9. The device according to claim 8, characterized in that the extraction module comprises:
an acquiring unit, configured to query a DOI inverted index database according to the DOI to obtain candidate documents containing the DOI;
a matching unit, configured to match the DOI number in the bibliographic record information of each candidate document against the DOI; and
a marking unit, configured to apply a single mark to the candidate document whose DOI number matches the DOI, and to obtain the document with the single mark.
10. The device according to claim 9, characterized in that the extraction module further comprises:
an establishing unit, configured to pre-establish the DOI inverted index database before the DOI inverted index database is queried according to the DOI to obtain the candidate documents containing the DOI.
11. The device according to claim 10, characterized in that the establishing unit is configured to:
obtain document samples from the network;
extract structural data from the document samples; and
establish the DOI inverted index database according to the structural data.
12. The device according to claim 11, characterized in that the establishing unit is configured to:
extract the structural data from the document samples using at least one of a machine learning model, OCR technology, and a maximum entropy model.
13. The device according to claim 11, characterized in that the structural data comprises one or more of: title, author, periodical, year, issue number, volume number, network source, DOI number, and references.
14. The device according to claim 8, characterized in that the display module is configured to:
call a special type pattern template to display the document; or
extract structured information of the document, and insert the structured information at preset positions of a display page for display.
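The judging and display steps carried out by the modules of claims 8 and 14 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed implementation. The regular expression is a common heuristic for DOI-like strings and is not taken from the patent. `CARD_TEMPLATE`, `extract_doi`, and `render_card` are hypothetical names, and the template layout merely stands in for a special type pattern with preset positions.

```python
import re
from string import Template

# Assumed heuristic: "10.", a 4 to 9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Hypothetical special type pattern with preset slots for structured information.
CARD_TEMPLATE = Template("[$doi] $title\n$authors. $periodical ($year)")


def extract_doi(query: str):
    """Judge whether the query statement contains a DOI; return it, or None."""
    match = DOI_PATTERN.search(query)
    if match is None:
        return None
    # Simplification: trailing punctuation is treated as sentence punctuation,
    # although a real DOI suffix may legitimately end with such a character.
    return match.group(0).rstrip(".,;)")


def render_card(info: dict) -> str:
    """Insert structured information into the preset slots; missing fields stay empty."""
    fields = {"doi": "", "title": "", "authors": "", "periodical": "", "year": ""}
    fields.update(info)
    return CARD_TEMPLATE.substitute(fields)
```

A query without a DOI-like substring yields `None`, in which case a search service would presumably fall back to ordinary keyword retrieval. That fallback is outside the scope of this sketch.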
CN201611130331.2A 2016-12-09 2016-12-09 Literature search method and apparatus Pending CN107066474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611130331.2A CN107066474A (en) 2016-12-09 2016-12-09 Literature search method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611130331.2A CN107066474A (en) 2016-12-09 2016-12-09 Literature search method and apparatus

Publications (1)

Publication Number Publication Date
CN107066474A true CN107066474A (en) 2017-08-18

Family

ID=59618663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611130331.2A Pending CN107066474A (en) 2016-12-09 2016-12-09 Literature search method and apparatus

Country Status (1)

Country Link
CN (1) CN107066474A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050240579A1 (en) * 2004-04-27 2005-10-27 Konica Minolta Holdings, Inc. Information retrieval system
CN104239570A (en) * 2014-09-30 2014-12-24 百度在线网络技术(北京)有限公司 Method and device for searching for paper

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
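田杰等 (Tian Jie et al.): "DOI在科技信息资源搜索与利用中的应用" [Application of DOI in the Search and Utilization of Scientific and Technical Information Resources], 《中国科技资源导刊》 [China Science & Technology Resources Review] *
韩春晓 (Han Chunxiao): "中文期刊个性化搜索引擎的设计与实现" [Design and Implementation of a Personalized Search Engine for Chinese Journals], 《中国优秀硕士学位论文全文数据库信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *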

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086255A (en) * 2018-07-09 2018-12-25 北京大学 A kind of bibliography automatic marking method and system based on deep learning
CN109189948A (en) * 2018-08-06 2019-01-11 南京快文信息科技有限公司 A kind of data processing method and device for content index
CN109189948B (en) * 2018-08-06 2021-08-20 南京快文信息科技有限公司 Data processing method and device for content indexing
CN111259168A (en) * 2019-01-31 2020-06-09 中粮营养健康研究院有限公司 Document processing method, document processing apparatus, storage medium, and device
CN111259168B (en) * 2019-01-31 2023-08-01 中粮营养健康研究院有限公司 Document processing method, device, storage medium and apparatus

Similar Documents

Publication Publication Date Title
CN106202382B (en) Link instance method and system
US9645979B2 (en) Device, method and program for generating accurate corpus data for presentation target for searching
CN102053991B (en) Method and system for multi-language document retrieval
WO2008152805A1 (en) Image recognizing apparatus and image recognizing method
JP2016508264A5 (en)
CN110929125A (en) Search recall method, apparatus, device and storage medium thereof
CN103412852B (en) A kind of method for automatically extracting key information of English literature
CN101673266A (en) Method for searching audio and video contents
CN103235821B (en) Original content searching method and searching server
JP2015153013A5 (en)
CN107066474A (en) Literature search method and apparatus
CN102193940A (en) Method of carrying out characteristic analysis and data extraction on two-dimensional table
CN108536676A (en) Data processing method, device, electronic equipment and storage medium
CN101673263B (en) Method for searching video content
Ohta et al. CRF-based bibliography extraction from reference strings focusing on various token granularities
WO2009066392A1 (en) Map-searching device, map-searching method, map-searching program, and recording medium
Matsuoka et al. Examination of effective features for CRF-based bibliography extraction from reference strings
CN101833584A (en) System and method for searching teaching video contents in embedded equipment
US10606875B2 (en) Search support apparatus and method
CN107577667A (en) A kind of entity word treating method and apparatus
CN106777191A (en) A kind of search modes generation method and device based on search engine
JP3825829B2 (en) Registration information retrieval apparatus and method
WO2019119030A1 (en) Image analysis
JP2007011892A (en) Vocabulary acquisition method and device, program, and storage medium storing program
KR20150134645A (en) Author clearly confirm device and method.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170818