CN109189916B - English abstract key information extraction method and device and electronic equipment - Google Patents

Info

Publication number
CN109189916B
Authority
CN
China
Prior art keywords
information
english
abstract
conclusion
english abstract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810945529.9A
Other languages
Chinese (zh)
Other versions
CN109189916A (en)
Inventor
杜林蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/258Heading extraction; Automatic titling; Numbering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an English abstract key information extraction method, an English abstract key information extraction device and electronic equipment, and relates to the technical field of information processing. English abstract information of an English document is first obtained; conclusion information is then extracted from the English abstract information and combined with the published information corresponding to the English literature to obtain abstract condensed information corresponding to the English literature. By further refining and processing the English abstract information, the method forms short abstract condensed information, so that when a reader searches documents, the key information of each document is presented directly to the reader and the reader does not need to click through to view the complete abstract information. The method is simple and convenient, improves retrieval efficiency, and effectively relieves reading fatigue.

Description

English abstract key information extraction method and device and electronic equipment
Technical Field
The invention relates to the technical field of information processing, in particular to an English abstract key information extraction method and device and electronic equipment.
Background
With the development of science and technology, users' demand for English documents and the frequency with which they query them rise year by year. In a body of English literature as vast as the sea, it is important to understand and locate the required information quickly. The key information of the English literature is valuable on the following three occasions:
1. While looking up English documents, readers note key information such as titles, keywords and abstracts in order to judge whether the full text of a document is worth further reading.
2. After a reader reads an English document, if the document is considered to be worth keeping, the key information in the document needs to be stored for later viewing.
3. When searching for English literature that has been read before, the reader can find the document more quickly using concise and organized key information.
Furthermore, in the literature, the abstract aims to provide an outline of the content of the document: a brief and definite short text describing the important content of the document, without comments or supplementary explanations. To quickly acquire the key information of a document, a reader can read its abstract: the reader clicks the corresponding link, opens the corresponding document, and then reads the abstract to find the key information. When a large number of documents need to be searched, this operation is repeated many times, which is tedious, time-consuming, inefficient, and prone to cause reading fatigue.
Disclosure of Invention
In view of the above, the present invention provides an english abstract key information extraction method, an apparatus and an electronic device, which visually present key information of a document to a reader, do not need the reader to click to view complete abstract information, are simple and convenient, improve retrieval efficiency, and effectively alleviate the problem of reading fatigue.
In a first aspect, an embodiment of the present invention provides an english abstract key information extraction method, including:
obtaining English abstract information of English documents;
extracting conclusion information in the English abstract information;
and generating abstract condensed information according to the conclusion information and published information corresponding to the English literature.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where before acquiring the english abstract information of the english literature, the method further includes:
screening out English documents according to the publication order of the English documents, the relevance between article titles and input search words, and the ranking of impact factors; or
screening the documents according to the field types of the English documents to obtain the English documents.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the extracting conclusion information in the english abstract information includes:
performing text analysis on the English abstract information, and searching whether preset signal words exist in the English abstract information or not;
if so, extracting statement information behind the preset signal word, and taking the statement information as conclusion information in the English abstract information.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the extracting conclusion information in the english abstract information includes:
performing text analysis on the English abstract information, and searching whether preset keywords exist in the English abstract information or not;
if yes, extracting the preset keyword and statement information behind the preset keyword, and taking the statement information as conclusion information in the English abstract information.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the extracting conclusion information in the english abstract information includes:
and extracting sentence information of a preset part from the English abstract information, and taking the sentence information as conclusion information in the English abstract information.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the generating, according to the conclusion information and the published information corresponding to the english literature, summary and condensed information includes:
selecting an abstract sentence pattern structure template according to the extraction mode of the conclusion information; the extraction mode is determined according to whether preset signal words or preset keywords exist in the English abstract information or not;
and adding the conclusion information and the published information to corresponding positions of the abstract sentence pattern structure template to generate abstract condensed information.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where after extracting conclusion information in the english abstract information, the method further includes:
and detecting a first person vocabulary in the conclusion information, and converting the first person vocabulary into an objective vocabulary.
With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where after the generating the summary condensed information, the method further includes:
and generating a literature summary document according to the abstract condensed information and a preset export template.
In a second aspect, an embodiment of the present invention further provides an apparatus for extracting key information of an english abstract, including:
the acquisition module is used for acquiring English abstract information of the English literature;
the extraction module is used for extracting conclusion information in the English abstract information;
and the generating module is used for generating abstract condensed information according to the conclusion information and the published information corresponding to the English literature.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor executes the computer program to implement the method described in the first aspect and any possible implementation manner thereof.
The embodiment of the invention has the following beneficial effects:
In the embodiment provided by the invention, English abstract information of an English document is first obtained; conclusion information is then extracted from the English abstract information and combined with the published information corresponding to the English literature to obtain abstract condensed information corresponding to the English literature. By further refining and processing the English abstract information, the method forms short abstract condensed information, so that when a reader searches documents, the key information of each document is presented directly to the reader and the reader does not need to click through to view the complete abstract information. The method is simple and convenient, improves retrieval efficiency, and effectively relieves reading fatigue.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flow chart of a method for extracting key information of an english abstract according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of another method for extracting key information of an english abstract according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for extracting key information of an english abstract according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another apparatus for extracting key information of an english abstract according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, in order to quickly acquire key information of a document, a reading abstract mode can be adopted, a reader clicks a corresponding link, opens a corresponding document, and then reads the abstract to find the key information. When a large amount of documents need to be searched, repeated operation is carried out for many times, which is tedious, time-consuming, low in efficiency and easy to cause reading fatigue.
Based on this, the method, the device and the electronic device for extracting key information of the English abstract provided by the embodiments of the invention form short abstract condensed information by further refining and processing the English abstract information, so that when a reader searches documents, the key information of the documents is presented directly to the reader and the reader does not need to click through to view the complete abstract information. This is simple and convenient, improves retrieval efficiency, and effectively relieves reading fatigue.
In order to facilitate understanding of the embodiment, a detailed description is first given to an english abstract key information extraction method disclosed in the embodiment of the present invention.
Example one:
fig. 1 is a flowchart illustrating a method for extracting key information of an english abstract according to an embodiment of the present invention. As shown in fig. 1, the method for extracting key information of an english abstract includes:
Step S101, obtaining English abstract information of the English literature.
The English literature can be obtained from a corresponding database or website.
Step S102, extracting conclusion information in the English abstract information.
The English abstract information generally comprises four parts: Objective, Methods, Results and Conclusion. The conclusion information contains the most important key information of the English literature.
Step S103, generating abstract condensed information according to the conclusion information and the published information corresponding to the English literature.
The summary condensed information includes publication information and conclusion information, and the publication information may include, but is not limited to, a publication journal name, publication time, and author information of the english literature. After the abstract condensed information is generated, the abstract condensed information is directly displayed, so that a reader can quickly and intuitively know the key information in the English literature.
The above processing of the english literature may be performed for one english literature or for a plurality of searched english literatures.
Therefore, in the embodiment of the invention, short abstract condensed information is formed by further refining and processing the English abstract information, so that when a reader searches documents, the key information of the documents is presented directly to the reader and the reader does not need to click through to view the complete abstract information. This is simple and convenient, improves retrieval efficiency, and effectively relieves reading fatigue.
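The three steps S101 to S103 can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the Publication class, the function names, the "Conclusion:" label handling and the output layout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Published information for one document (hypothetical structure)."""
    journal: str
    year: int
    authors: str


def extract_conclusion(abstract: str) -> str:
    """Placeholder for step S102: text after a 'Conclusion:' label,
    otherwise the last sentence of the abstract."""
    marker = "Conclusion:"
    idx = abstract.rfind(marker)
    if idx != -1:
        return abstract[idx + len(marker):].strip()
    sentences = [s.strip() for s in abstract.split(".") if s.strip()]
    return sentences[-1] + "." if sentences else ""


def condense(abstract: str, pub: Publication) -> str:
    """Step S103: combine conclusion information with published information."""
    conclusion = extract_conclusion(abstract)
    return f"{pub.journal} ({pub.year}), {pub.authors}: {conclusion}"
```

The same condense call can be applied to a single document or mapped over a list of retrieved documents.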
Example two:
fig. 2 is a flowchart illustrating another method for extracting key information in an english abstract according to an embodiment of the present invention. As shown in fig. 2, the method for extracting key information of an english abstract includes:
step S201, screening and acquiring english documents.
In a possible embodiment, the English literature may be screened from existing databases such as PubMed, Embase and Medline, or from an English literature database established in advance according to individual needs (for example, requirements on publication time and literature field). In a specific application, the API (Application Programming Interface) of the database may be called directly to search the English documents in the database.
Specifically, the English documents can be screened and acquired according to the publication order of the English documents, the relevance between the article titles and the input search terms, and the ranking of impact factors.
For example, the reader may first select documents satisfying the requirement using the corresponding document retrieval system, and then further select one or more English documents according to the publication order of the English documents, the relevance between the article title and the input search term, or the SCI (Science Citation Index) impact factor. Ranking by SCI impact factor orders the journals in which the documents are published by their IF (impact factor) score.
Of course, in a possible embodiment, in order to meet the personalized requirements of readers, documents may also be screened according to the field type to which the English documents belong, so as to obtain the English documents. For example, in the medical field an English document may belong to a field type such as clinical prediction, etiology, diagnosis, therapy or prognosis. The literature belonging to each type is defined in advance.
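The screening criteria of step S201 can be sketched as a sort. The text lists the criteria as alternatives; combining them into one sort key (relevance first, then impact factor, then publication date) is an assumption of this example, as are the Document class and the toy title_relevance measure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Minimal record for one retrieved document (hypothetical structure)."""
    title: str
    published: date
    impact_factor: float


def title_relevance(title: str, query: str) -> float:
    """Toy relevance: fraction of query terms that appear in the title."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(title.lower().split())
    return sum(t in words for t in terms) / len(terms)


def screen(docs, query, top_n=2):
    """Rank by relevance, then impact factor, then recency; keep the top_n."""
    ranked = sorted(
        docs,
        key=lambda d: (title_relevance(d.title, query), d.impact_factor, d.published),
        reverse=True,
    )
    return ranked[:top_n]
```

Screening by field type would be an additional filter on a per-document field label, applied before or after this ranking.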
Step S202, obtaining English abstract information of the English literature.
After the english document is acquired through step S201, extraction of the english abstract information is performed. For example, the english abstract information can be directly extracted and obtained based on the Pubmed database.
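As one possible way to call a database API for these steps, the sketch below builds request URLs for NCBI's public E-utilities service (esearch to find PubMed IDs, efetch to retrieve abstracts as text). The patent does not name a specific API; the choice of E-utilities, the helper names and the parameter values are assumptions for illustration, and no request is actually sent.

```python
from typing import Sequence
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_search_url(term: str, max_results: int = 20) -> str:
    """URL that searches PubMed for `term` and returns matching IDs as JSON."""
    query = urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": max_results}
    )
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def build_abstract_url(pmids: Sequence[str]) -> str:
    """URL that fetches the plain-text abstracts for the given PubMed IDs."""
    query = urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"
```

The resulting URLs could be fetched with any HTTP client; rate limits and API keys are outside the scope of this sketch.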
Step S203, performing text analysis on the english abstract information, and retrieving whether a preset signal word exists in the english abstract information.
If the preset signal word exists in the english abstract information, after step S204 is executed, step S208 is executed; if no preset signal word exists in the english abstract information, step S205 is executed.
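The branching between the extraction steps can be sketched as a fallback chain: try signal words, then keywords, then the tail of the abstract. The function and parameter names are hypothetical; the three extractors are passed in as callables that return None when they find nothing, which is an assumption of this example.

```python
from typing import Callable, Optional, Tuple

Extractor = Callable[[str], Optional[str]]


def extract_conclusion(
    abstract: str,
    by_signal_word: Extractor,
    by_keyword: Extractor,
    by_tail: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (extraction mode, conclusion text).

    The mode label is returned alongside the text so that a later step can
    pick a matching sentence template.
    """
    for mode, extractor in (("first", by_signal_word), ("second", by_keyword)):
        result = extractor(abstract)
        if result is not None:
            return mode, result
    # Neither a signal word nor a keyword was found: fall back to the tail.
    return "third", by_tail(abstract)
```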
Step S204, extracting the sentence information after the preset signal word, and using the sentence information as the conclusion information in the english abstract information.
For example, text analysis is performed on the English abstract information to determine whether its vocabulary contains a preset signal word. Considering that an English abstract generally includes a conclusion part introduced by "Conclusion", in the present embodiment the preset signal words include "Conclusion". If the English abstract information follows a segmented abstract structure (i.e. it comprises the four segmented parts Objective, Methods, Results and Conclusion), or is not segmented but contains an explicit "Conclusion" signal word, the statement information after "Conclusion" is simply taken as the conclusion information in the English abstract information.
In addition, in English abstract information where the result part and the conclusion part are merged, or where the paragraphs of the abstract are merged together, there is no explicit preset signal word "Conclusion", and the conclusion part needs to be distinguished. In some English documents, the conclusion part has obvious signal words, such as "in a word", "in short", "in brief", "in conclusion", "in general" and "to summarize"; thus in a possible embodiment, the preset signal words further include "in a word", "in short", "in brief", "in conclusion", "in general" and "to summarize".
In practical application, the preset signal words are stored first; during text analysis, retrieval is performed against the stored preset signal words, and as long as the English abstract information contains a preset signal word, the sentence information after the preset signal word is used as the conclusion information in the English abstract information. For example, if the English abstract information contains the statement "In conclusion, CMTM4 plays an essential role in the turnover of membrane-bound VE-cadherin at AJs, mediating endothelial barrier function and controlling vascular sprouting", the conclusion information is "CMTM4 plays an essential role in the turnover of membrane-bound VE-cadherin at AJs, mediating endothelial barrier function and controlling vascular sprouting".
For convenience of the following description, the above manner of extracting conclusion information according to the preset signal words may be referred to as a first extraction manner.
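The first extraction manner can be sketched with a regular expression over a stored word list. The list below is a subset chosen for illustration and the function name is hypothetical; taking the last match follows the rule the text gives for abstracts containing more than one signal word. Phrases such as "in general" can also occur mid-sentence, so a practical system would likely need stricter matching than this sketch.

```python
import re

# Illustrative subset of stored signal words (an assumption of this sketch).
SIGNAL_WORDS = [
    "conclusion", "conclusions", "in a word", "in short",
    "in brief", "in conclusion", "in general",
]

# Longest phrases first, so "in conclusion" wins over a bare "conclusion".
_ALTERNATIVES = "|".join(
    re.escape(w) for w in sorted(SIGNAL_WORDS, key=len, reverse=True)
)
_SIGNAL_RE = re.compile(r"\b(?:" + _ALTERNATIVES + r")\b[\s:,.]*", re.IGNORECASE)


def extract_after_signal(abstract: str):
    """Return the text after the last signal word, or None if none is found."""
    matches = list(_SIGNAL_RE.finditer(abstract))
    if not matches:
        return None
    return abstract[matches[-1].end():].strip()
```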
Step S205, performing text analysis on the english abstract information, and retrieving whether a preset keyword exists in the english abstract information.
If the preset keyword exists in the english abstract information, after step S206 is executed, step S208 is executed; if there is no preset keyword in the english abstract information, step S207 is executed.
Step S206, extracting the preset keyword and the sentence information following the preset keyword, and using the sentence information as the conclusion information in the english abstract information.
A considerable number of English abstracts merge the result part and the conclusion part, or merge the paragraphs of the abstract together. In these abstracts the conclusion part signals the conclusion through a fixed sentence pattern or through common vocabulary within that pattern, and the sentence pattern itself varies widely.
For example, "The results suggested that … …" sometimes appears in the form "The data suggested that … …", "Data suggested that … … or … …", or "suggested that … …", so keywords that can indicate a conclusion need to be extracted from such patterns. Based on this, the preset keywords include, but are not limited to, "conclude", "illustrate", "gather", "delete", "found", "reported", "support", "provide", "show", "recent", "indicate", "highlight", "recommend", "in this case", etc., and also include the various tense and morphological forms of these common words, such as "conclude", "concluded" and "concluding".
In practical application, the preset keywords are stored first; during text analysis, retrieval is performed against the stored preset keywords, and as long as the English abstract information contains a preset keyword, the preset keyword and the subsequent statement information are used as the conclusion information in the English abstract information.
For example, if the English abstract information contains the statement "the same data detected that is associated with a different spectral associated with C2CD3-a different and not all partial presentations with the same spectral features of OFD14", then the conclusion information is "detected that is associated with a different spectral associated with C2CD3-a different and not all partial presentations with the same spectral features of OFD14".
For convenience of description later, the above manner of extracting the conclusion information according to the preset keyword may be referred to as a second extraction manner.
It should be noted that, in the first extraction manner and the second extraction manner, if two or more preset keywords or preset signal words appear in the same english abstract information, the last preset keyword or preset signal word is taken as the reference to extract the conclusion information.
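The second extraction manner, including the last-occurrence rule just described, can be sketched as follows. The stem list is an illustrative subset and the function name is hypothetical; matching a stem plus any trailing letters is a crude stand-in for handling tense and morphological forms, and it will also accept unrelated words that share a stem.

```python
import re

# Illustrative stems; "conclud" covers conclude / concluded / concluding.
KEYWORD_STEMS = ["conclud", "suggest", "show", "support", "highlight", "recommend"]

_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(KEYWORD_STEMS) + r")\w*", re.IGNORECASE
)


def extract_from_keyword(abstract: str):
    """Return the last keyword and everything after it, or None if absent."""
    matches = list(_KEYWORD_RE.finditer(abstract))
    if not matches:
        return None
    # Unlike the signal-word case, the keyword itself is kept in the output.
    return abstract[matches[-1].start():].strip()
```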
Step S207, extracting statement information of a preset portion from the english abstract information, and using the statement information as conclusion information in the english abstract information.
In many abstracts of English documents, the conclusion part contains no clear signal word or keyword prompt, and in some abstracts there is no conclusion part at all, so the conclusion can only be identified through semantic understanding. When condensing the abstract content of such English documents, since the summarizing content is generally placed at the end of the whole abstract, the summarizing content may be obtained from the latter part of the English abstract information. Specifically, a preset number of sentences is obtained from the English abstract information counting from back to front, and the sentence information of those sentences (for example, 3 sentences, that is, the last, second-to-last and third-to-last sentences) is used as the conclusion information in the English abstract information.
For convenience of the following description, the manner of extracting the conclusion information described in step S207 above may be referred to as a third extraction manner.
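The third extraction manner reduces to taking the last few sentences. In the sketch below the sentence splitter is deliberately naive (it splits after ".", "!" or "?" followed by whitespace) and would break on abbreviations such as "e.g."; the function name and the default of 3 sentences mirror the example in the text but are otherwise assumptions.

```python
import re


def last_sentences(abstract: str, n: int = 3) -> str:
    """Return the last n sentences of the abstract, joined by single spaces."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    return " ".join(sentences[-n:])
```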
Step S208, detecting the first-person vocabulary in the conclusion information, and converting the first-person vocabulary into an objective vocabulary.
To ensure that the extracted content links smoothly with the pre-designed abstract sentence pattern structure templates, to allow a richer set of condensed abstract templates, and to reduce the possibility of semantic conflict or repeated narration, first-person terms in the extracted conclusion information such as "We" and "our" need to be replaced with objective terms such as "research" and "the", respectively.
For example, the statement "We aimed to assess the knowledge of diagnosis for OSA and predicting outcome" in the conclusion information is converted into "Research aimed to assess the knowledge of diagnosis for OSA and predicting outcome".
Similarly, after conclusion extraction and conversion, the statement "In conclusion, in our study, CMTM4 plays an essential role in the turnover of membrane-bound VE-cadherin at AJs, mediating endothelial barrier function and controlling vascular sprouting" in the English abstract information becomes "In the study, CMTM4 plays an essential role in the turnover of membrane-bound VE-cadherin at AJs, mediating endothelial barrier function and controlling vascular sprouting".
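The first-person conversion of step S208 can be sketched as a whole-word substitution. Only the two mappings named in the text ("We" to "research", "our" to "the") are included; preserving the capitalization of the replaced word and the function name are assumptions of this example.

```python
import re

# Mappings named in the description; further pronouns could be added.
REPLACEMENTS = {"we": "research", "our": "the"}

_FIRST_PERSON_RE = re.compile(r"\b(?:we|our)\b", re.IGNORECASE)


def to_objective(text: str) -> str:
    """Replace first-person words with objective ones, keeping capitalization."""
    def swap(match):
        word = match.group(0)
        replacement = REPLACEMENTS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _FIRST_PERSON_RE.sub(swap, text)
```

The word-boundary anchors keep words such as "power" or "flour" untouched.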
Step S209, detecting the abbreviation in the conclusion information, obtaining the complete vocabulary corresponding to the abbreviation, and adding the complete vocabulary to the conclusion information.
Considering that a large number of English abbreviations may appear in the conclusion information, the full names of the abbreviations need to be supplemented to ensure that their meanings are clear and easy to understand. In the embodiment of the present invention, the complete vocabulary corresponding to an abbreviation can be acquired in, but not limited to, the following two manners.
The first manner: the abbreviation is searched for in the English literature; if the initial of the abbreviation differs from its other letters and the abbreviation is contained in brackets, the words before the abbreviation and the words after it are extracted as the complete vocabulary of the abbreviation. For example, the expression "whole exome sequencing (WES)" appears in the literature, so "whole exome sequencing" is taken as the complete vocabulary of the abbreviation "WES".
The second manner: the complete vocabularies corresponding to the abbreviations in the field to which each English document belongs are stored in advance. Most abbreviations are well known, so the complete vocabulary corresponding to an abbreviation can be determined according to the field type of the English literature. For example, in the medical field: the complete vocabulary corresponding to the abbreviation "WES" is "whole exome sequencing"; the complete vocabulary corresponding to "AJs" is "adherens junctions"; the complete vocabulary corresponding to "VE" is "vascular endothelial"; the complete vocabulary corresponding to "PVB" is "infundibular muscles"; and the complete vocabulary corresponding to "TEER" is "transendothelial electrical resistance".
In addition, to avoid repeating the full name of the same abbreviation, the complete vocabulary corresponding to an abbreviation is added to the conclusion information only once. In this embodiment, the complete vocabulary is added at the first occurrence of the abbreviation, specifically in front of it.
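Step S209 can be sketched by combining both lookup manners: first search the document for "full form (ABBR)", then fall back to a stored dictionary, and expand only the first occurrence. The abbreviation detector (two or more capitals, optional plural "s"), the initials check, the dictionary contents and the "full form (ABBR)" output layout are assumptions of this example rather than details given in the text.

```python
import re

# Second manner: pre-stored full forms for the field (illustrative entries).
KNOWN_FULL_FORMS = {"WES": "whole exome sequencing", "AJs": "adherens junctions"}


def find_full_form(abbr: str, document: str):
    """First manner: look for '<words> (ABBR)' whose initials spell ABBR."""
    initials = [c for c in abbr if c.isupper()]
    pattern = r"((?:\w+\s+){%d})\(%s\)" % (len(initials), re.escape(abbr))
    match = re.search(pattern, document)
    if match:
        words = match.group(1).split()
        if [w[0].upper() for w in words] == initials:
            return " ".join(words)
    return KNOWN_FULL_FORMS.get(abbr)


def expand_abbreviations(conclusion: str, document: str) -> str:
    """Put the full form in front of the first occurrence of each abbreviation."""
    found = dict.fromkeys(re.findall(r"\b[A-Z]{2,}s?\b", conclusion))
    for abbr in found:
        full = find_full_form(abbr, document)
        if full:
            conclusion = re.sub(
                r"\b%s\b" % re.escape(abbr),
                lambda m, f=full: f"{f} ({m.group(0)})",
                conclusion,
                count=1,  # expand only the first occurrence
            )
    return conclusion
```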
Step S210, selecting a summary sentence pattern structure template according to the extraction method of the conclusion information.
The extraction mode is determined according to whether preset signal words or preset keywords exist in the English abstract information, as described in steps S203 to S207 above. When a preset signal word exists in the English abstract information, the extraction mode of the conclusion information is the first extraction mode, and the corresponding abstract sentence pattern structure template is the first template, for example: A research published in … … (the journal name in the publication information is filled in here) found … … (the conclusion information is filled in here).
When no preset signal word exists in the English abstract information but a preset keyword does, the extraction mode of the conclusion information is the second extraction mode, and the corresponding abstract sentence pattern structure template is the second template, for example: A research published in … … (the journal name in the publication information is filled in here) … … (the conclusion information is filled in here).
When the English abstract information contains neither a preset signal word nor a preset keyword, the extraction mode of the conclusion information is the third extraction mode, and the corresponding abstract sentence pattern structure template is the third template. The third template may be: … … (the publication information is filled in here): … … (the conclusion information is filled in here).
It should be noted that the form of the abstract sentence pattern structure template may be defined according to the actual situation; its only purpose is to make the expression smoother once the conclusion has been extracted from the English abstract information.
Step S211, adding the conclusion information and the publication information to the corresponding positions of the abstract sentence pattern structure template to generate the abstract condensed information.
The abstract condensed information includes the publication information and the conclusion information. The publication information may include, but is not limited to, the name of the journal in which the English document was published, the publication time, and the author information. In a possible embodiment, the conclusion information and the publication information are each filled into their corresponding slots in the abstract sentence pattern structure template.
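Steps S210 and S211 can be illustrated with a short sketch. The template wording, the integer mode codes 1 to 3, and the function name `build_condensed` are assumptions for the example; the description above allows the templates to take other forms.

```python
# Hypothetical templates keyed by extraction mode (1: signal word found,
# 2: keyword found, 3: neither found). The wording is illustrative.
TEMPLATES = {
    1: "A study published in {journal} found {conclusion}",
    2: "A study published in {journal} {conclusion}",
    3: "{publication}: {conclusion}",
}


def build_condensed(mode: int, conclusion: str, journal: str, publication: str) -> str:
    """Select the template for the extraction mode and fill in its slots."""
    template = TEMPLATES[mode]
    # str.format ignores keyword arguments that a template does not use,
    # so one call covers all three templates.
    return template.format(
        journal=journal, conclusion=conclusion, publication=publication
    )


if __name__ == "__main__":
    print(build_condensed(1, "drug A reduced risk.", "Example Journal",
                          "Example Journal, 2018"))
    print(build_condensed(3, "Drug A reduced risk.", "Example Journal",
                          "Example Journal, 2018"))
```

In the second mode the extracted conclusion already begins with the preset keyword, which is why that template inserts no verb of its own.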
After the abstract condensed information is generated, the abstract condensed information is directly displayed, so that a reader can quickly and intuitively know the key information in the English literature.
It should be noted that the above step S208 and step S209 may be executed after step S207, and the sequence of execution between step S208 and step S209 is not limited.
To let readers read without a language barrier, in a possible embodiment the method further includes, after step S211, calling a preset translation interface to translate the abstract condensed information into a preset language.
For example, readers can have the abstract condensed information translated into a language they are familiar with, such as Chinese, Korean or Japanese, according to their own needs. The preset translation interface may be a call interface to existing translation software.
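One way to keep the translation step independent of any particular translation software is to pass the interface in as a callable. The sketch below assumes a hypothetical `Translator` signature and uses a stand-in `fake_translator`; no real translation service or API is implied.

```python
from typing import Callable

# Hypothetical interface: (text, target_language) -> translated text.
Translator = Callable[[str, str], str]


def translate_condensed(condensed: str, target_language: str,
                        translator: Translator) -> str:
    """Translate the abstract condensed information into the preset language."""
    if target_language == "en":
        # The condensed information is already English; nothing to do.
        return condensed
    return translator(condensed, target_language)


def fake_translator(text: str, target_language: str) -> str:
    """Stand-in for a call to existing translation software."""
    return f"[{target_language}] {text}"


if __name__ == "__main__":
    print(translate_condensed("A study found X.", "zh", fake_translator))
```

In a real deployment the stand-in would be replaced by a wrapper around whichever translation software is available.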
In a possible embodiment, after the abstract condensed information of the retrieved English document or documents is obtained, it can be exported according to actual needs so that the reader can view it conveniently. On this basis, the method further includes, after step S211: generating a document abstract document according to the abstract condensed information and a preset export template.
The preset export template may include the field type to which the English documents belong, the publication time of each English document, the search time, the number of retrieved English documents, the number of exported English documents (which can be set according to individual needs), and the title of each English document. The abstract condensed information is filled in at the position corresponding to the title of its English document.
The document summary document can be used as a report file or a newsletter file.
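The export step can be sketched as a plain-text report builder. The `Entry` class, the function `export_digest`, and the report layout are assumptions for illustration; the description only lists which fields the export template may contain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    title: str             # title of the English document
    publication_time: str  # publication time of the document
    condensed: str         # abstract condensed information


def export_digest(field_type: str, retrieved_count: int, entries: List[Entry],
                  search_time: str, max_exported: int = 10) -> str:
    """Fill a simple export template and return the document abstract document."""
    exported = entries[:max_exported]  # number of exported documents is configurable
    lines = [
        f"Field: {field_type}",
        f"Search time: {search_time}",
        f"Retrieved: {retrieved_count}",
        f"Exported: {len(exported)}",
        "",
    ]
    for index, entry in enumerate(exported, start=1):
        # Each condensed abstract is placed under the title it belongs to.
        lines.append(f"{index}. {entry.title} ({entry.publication_time})")
        lines.append(f"   {entry.condensed}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Entry("Title A", "2018-01", "Example Journal: Drug A reduced risk.")]
    print(export_digest("medical", 1, demo, "2018-08-17"))
```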
It should be noted that the method can be applied on a terminal such as a mobile phone or a computer in the form of an official account, a mini program, an APP (Application), or a web page, but is not limited thereto.
In summary, the solution in this embodiment uses advanced artificial intelligence technologies such as natural language processing and machine learning to further refine and translate document abstracts through heuristic, extractive and generative methods, helping people in various fields, such as medical staff, marketers, salespersons and doctors, to closely track the latest progress in the treatment field and quickly obtain the latest information.
By further refining and processing the English abstract information, the embodiment of the invention forms short abstract condensed information, so that the key information of a document is presented directly to the reader at search time, without the reader having to click through to the complete abstract. This is simple and convenient, improves search efficiency, and effectively relieves reading fatigue.
Example three:
Corresponding to the method for extracting key information from an English abstract in the first embodiment or the second embodiment, an embodiment of the present invention provides an apparatus for extracting key information from an English abstract. As shown in fig. 3, the apparatus includes:
the acquisition module 11 is used for acquiring English abstract information of an English document;
the extraction module 12 is used for extracting conclusion information in the English abstract information;
and a generating module 13, configured to generate abstract condensed information according to the conclusion information and the publication information corresponding to the English document.
Further, referring to fig. 4, the apparatus further includes a screening module 14, where the screening module 14 is configured to:
screen out the English documents according to a ranking by publication order, by the relevance between the article title and the input search terms, and by impact factor; or
screen the documents according to the field type to which they belong to obtain the English documents.
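The two screening options can be sketched as below. The `Document` fields, the word-overlap relevance score, and the priority order of the sort key (relevance, then publication date, then impact factor) are all assumptions; the text names the three criteria but does not say how they are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    title: str
    published: str        # ISO date string, so lexicographic order is date order
    impact_factor: float
    field_type: str


def relevance(title: str, query: str) -> float:
    """Fraction of search terms that appear as words in the title (illustrative)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    title_words = set(title.lower().split())
    return sum(term in title_words for term in terms) / len(terms)


def screen(docs: List[Document], query: str, top_n: int) -> List[Document]:
    """Rank by relevance, then recency, then impact factor; keep the top N."""
    ranked = sorted(
        docs,
        key=lambda d: (relevance(d.title, query), d.published, d.impact_factor),
        reverse=True,
    )
    return ranked[:top_n]


def screen_by_field(docs: List[Document], field_type: str) -> List[Document]:
    """Alternative option: keep only documents of the requested field type."""
    return [d for d in docs if d.field_type == field_type]
```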
Further, the extracting module 12 is further configured to:
performing text analysis on the English abstract information, and searching whether preset signal words exist in the English abstract information or not; if yes, extracting statement information behind the preset signal word, and taking the statement information as conclusion information in the English abstract information.
Further, the extracting module 12 is further configured to:
performing text analysis on the English abstract information, and searching whether preset keywords exist in the English abstract information or not; if yes, extracting the preset keyword and sentence information behind the preset keyword, and taking the sentence information as conclusion information in the English abstract information.
Further, the extracting module 12 is further configured to:
and extracting preset part of sentence information from the English abstract information, and taking the sentence information as conclusion information in the English abstract information.
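The three extraction modes of the extraction module form a fallback chain, which can be sketched as follows. The signal words, the keywords, and the choice of the last sentence as the "preset part" are placeholders invented for this example; the actual preset lists are defined elsewhere in the specification.

```python
import re
from typing import Tuple

# Hypothetical preset lists, for illustration only.
SIGNAL_WORDS = ("conclusions:", "conclusion:")
KEYWORDS = ("these results suggest", "our findings indicate")


def extract_conclusion(abstract: str) -> Tuple[str, int]:
    """Return (conclusion information, extraction mode 1, 2 or 3)."""
    # Mode 1: take the text after a preset signal word.
    for word in SIGNAL_WORDS:
        match = re.search(re.escape(word), abstract, re.IGNORECASE)
        if match:
            return abstract[match.end():].strip(), 1
    # Mode 2: take the preset keyword together with the text after it.
    for keyword in KEYWORDS:
        match = re.search(re.escape(keyword), abstract, re.IGNORECASE)
        if match:
            return abstract[match.start():].strip(), 2
    # Mode 3: fall back to a preset part, assumed here to be the last sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    return (sentences[-1] if sentences else ""), 3


if __name__ == "__main__":
    print(extract_conclusion("Background: X. Conclusions: Drug A reduced risk."))
```

Returning the mode alongside the text lets a later step pick the matching sentence pattern template.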
Further, the generating module 13 is further configured to:
selecting an abstract sentence pattern structure template according to the extraction mode of the conclusion information; the extraction mode is determined according to whether preset signal words or preset keywords exist in the English abstract information or not;
and adding the conclusion information and the published information to corresponding positions of the abstract sentence pattern structure template to generate abstract condensed information.
Further, the apparatus further includes a conversion module 15, where the conversion module 15 is configured to:
and detecting the first-person vocabulary in the conclusion information, and converting the first-person vocabulary into objective vocabulary.
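A minimal sketch of the conversion performed by the conversion module is shown below. The word map `FIRST_PERSON_MAP` and the replacement phrases are assumptions; the text does not list which objective vocabulary is substituted.

```python
import re

# Hypothetical mapping from first-person words to objective wording.
FIRST_PERSON_MAP = {
    "we": "the researchers",
    "our": "the researchers'",
}

_FIRST_PERSON = re.compile(
    r"\b(" + "|".join(FIRST_PERSON_MAP) + r")\b", re.IGNORECASE
)


def to_objective(conclusion: str) -> str:
    """Replace first-person words with objective wording, keeping capitalisation."""
    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        replacement = FIRST_PERSON_MAP[word.lower()]
        if word[0].isupper():
            # Preserve a sentence-initial capital letter.
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return _FIRST_PERSON.sub(replace, conclusion)


if __name__ == "__main__":
    print(to_objective("We found that our model works."))
```

The word-boundary anchors keep words such as "power" or "however" from being altered.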
Further, the apparatus further includes an adding module 16, where the adding module 16 is configured to:
and detecting the abbreviation in the conclusion information, acquiring a complete vocabulary corresponding to the abbreviation, and adding the complete vocabulary to the conclusion information.
Further, the apparatus further includes an export module 17, where the export module 17 is configured to:
and generating a document abstract document according to the abstract condensed information and a preset export template.
By further refining and processing the English abstract information, the embodiment of the invention forms short abstract condensed information, so that the key information of a document is presented directly to the reader at search time, without the reader having to click through to the complete abstract. This is simple and convenient, improves search efficiency, and effectively relieves reading fatigue.
Example four:
referring to fig. 5, an embodiment of the present invention further provides an electronic device 100, including: a processor 40, a memory 41, a bus 42 and a communication interface 43, wherein the processor 40, the communication interface 43 and the memory 41 are connected through the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.
The Memory 41 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, etc. may be used.
The bus 42 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory 41 is used to store a program, and the processor 40 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 40.
The processor 40 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, or a register. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.
The device for extracting key information of an english abstract and the electronic equipment provided by the embodiment of the invention have the same technical characteristics as the method for extracting key information of an english abstract provided by the embodiment, so that the same technical problems can be solved, and the same technical effect can be achieved.
The computer program product for performing the method for extracting key information of an english abstract according to the embodiment of the present invention includes a computer readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and will not be described herein again.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the electronic device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all fall within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. An English abstract key information extraction method is characterized by comprising the following steps:
obtaining English abstract information of English documents;
performing text analysis on the English abstract information, and searching whether preset signal words exist in the English abstract information or not;
if a preset signal word exists in the English abstract information, extracting statement information behind the preset signal word, and taking the statement information as conclusion information in the English abstract information;
if no preset signal word exists in the English abstract information, performing text analysis on the English abstract information, and searching whether a preset keyword exists in the English abstract information or not;
if the English abstract information contains a preset keyword, extracting the preset keyword and sentence information behind the preset keyword, and taking the sentence information as conclusion information in the English abstract information;
if the English abstract information does not have preset keywords, extracting sentence information of a preset part from the English abstract information, and taking the sentence information as conclusion information in the English abstract information;
and generating abstract condensed information according to the conclusion information and published information corresponding to the English literature.
2. The method of claim 1, wherein before obtaining the english abstract information of the english literature, the method further comprises:
screening out English documents according to the publication sequence of the English documents, the relevance between article titles and input search words and the sequencing of influence factors; or
And screening the documents according to the field types of the English documents to obtain the English documents.
3. The method of claim 1, wherein the generating summary condensed information according to the conclusion information and the published information corresponding to the english literature comprises:
selecting an abstract sentence pattern structure template according to the extraction mode of the conclusion information; the extraction mode is determined according to whether preset signal words or preset keywords exist in the English abstract information or not;
and adding the conclusion information and the published information to corresponding positions of the abstract sentence pattern structure template to generate abstract condensed information.
4. The method according to claim 1, wherein after extracting the conclusion information from the english abstract information, the method further comprises:
and detecting a first person vocabulary in the conclusion information, and converting the first person vocabulary into an objective vocabulary.
5. The method of claim 1, wherein after generating the summary condensed information, further comprising:
and generating a document abstract document according to the abstract condensed information and a preset export template.
6. An English abstract key information extraction device is characterized by comprising:
the acquisition module is used for acquiring English abstract information of the English literature;
the extraction module is used for performing text analysis on the English abstract information and searching whether preset signal words exist in the English abstract information or not; if a preset signal word exists in the English abstract information, extracting statement information behind the preset signal word, and taking the statement information as conclusion information in the English abstract information; if no preset signal word exists in the English abstract information, performing text analysis on the English abstract information, and searching whether a preset keyword exists in the English abstract information or not; if the English abstract information contains a preset keyword, extracting the preset keyword and sentence information behind the preset keyword, and taking the sentence information as conclusion information in the English abstract information; if the English abstract information does not have preset keywords, extracting sentence information of a preset part from the English abstract information, and taking the sentence information as conclusion information in the English abstract information;
and the generating module is used for generating abstract condensed information according to the conclusion information and the published information corresponding to the English literature.
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 5 when executing the computer program.
CN201810945529.9A 2018-08-17 2018-08-17 English abstract key information extraction method and device and electronic equipment Active CN109189916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810945529.9A CN109189916B (en) 2018-08-17 2018-08-17 English abstract key information extraction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109189916A CN109189916A (en) 2019-01-11
CN109189916B true CN109189916B (en) 2022-04-22

Family

ID=64918759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810945529.9A Active CN109189916B (en) 2018-08-17 2018-08-17 English abstract key information extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109189916B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458718A (en) * 2009-01-05 2009-06-17 北京大学 Search engine dynamic summarization extracting method
CN103412852A (en) * 2013-08-21 2013-11-27 广东电子工业研究院有限公司 Method for automatically extracting key information of English literature
CN104731959A (en) * 2015-04-03 2015-06-24 北京威扬科技有限公司 Video abstraction generating method, device and system based on text webpage content
CN105574185A (en) * 2015-12-22 2016-05-11 北京奇虎科技有限公司 Method and device for providing clustering type intelligent summaries
CN108364677A (en) * 2018-03-13 2018-08-03 汤臣倍健股份有限公司 A kind of evaluating method and its device based on various dimensions health control model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10025774B2 (en) * 2011-05-27 2018-07-17 The Board Of Trustees Of The Leland Stanford Junior University Method and system for extraction and normalization of relationships via ontology induction
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers
CN102779143B (en) * 2012-01-31 2014-08-27 中国科学院自动化研究所 Visualizing method for knowledge genealogy
US10452698B2 (en) * 2015-05-11 2019-10-22 Stratifyd, Inc. Unstructured data analytics systems and methods
CN107391690B (en) * 2017-07-25 2020-03-31 李小明 Method for processing document information
CN108038096A (en) * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing
CN107845422A (en) * 2017-11-23 2018-03-27 郑州大学第附属医院 A kind of remote medical consultation with specialists session understanding and method of abstracting based on the fusion of multi-modal clue

Also Published As

Publication number Publication date
CN109189916A (en) 2019-01-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant