WO2017028407A1 - Method and device for extracting a text summary - Google Patents
- Publication number
- WO2017028407A1 (PCT/CN2015/096931)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- reading
- user
- original text
- target original
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
Definitions
- the present invention relates to the field of automatic text summarization technology, and more particularly to a technique for extracting text abstracts.
- a method for extracting a text digest comprising:
- extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- a digest extracting device for extracting a text digest, wherein the digest extracting device comprises:
- an embodiment of the present invention extracts a text summary of the target original text according to the user's attention text with respect to the target original text, in combination with the content information of the target original text, so that the extracted text summary highlights the focus of the original document, improves the accuracy and validity of the text summary, and, in turn, enhances the user's reading and browsing experience.
- FIG. 1 shows a schematic diagram of a digest extraction device for extracting text digests according to an aspect of the present invention.
- FIG. 2 shows a flow chart of a method for extracting a text digest in accordance with another aspect of the present invention.
- according to an aspect of the present invention, the digest extraction device 1 comprises means for acquiring reading operation information of a user with respect to a target original text (hereinafter referred to as "operation obtaining means 11"), an attention text determining means 12, and a digest extracting means 13.
- the operation obtaining means 11 acquires the reading operation information of the user regarding the target original text; the attention text determining means 12 determines the attention text of the user regarding the target original text based on the reading operation information; the digest extracting means 13 extracts a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- the digest extraction device 1 includes, but is not limited to, a network device, a user device, or a device in which a network device is integrated with a user device through a network.
- the network device includes, but is not limited to, an implementation such as a network host, a single network server, a plurality of network server sets, or a cloud computing-based computer collection; or is implemented by a user equipment.
- the cloud is composed of a large number of host or network servers based on Cloud Computing, which is a kind of distributed computing, a super virtual computer composed of a group of loosely coupled computers.
- the user equipment may be any electronic product that can interact with a user through a keyboard, a mouse, a touch pad, a touch screen, or a handwriting device, such as a computer, a mobile phone, a smart phone, a PDA, a wearable device, a Pocket PC (PPC), or a tablet.
- the network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network (Ad Hoc network), and the like.
- both the network device and the user equipment include an electronic device capable of automatically performing numerical calculation and information processing according to instructions that are set or stored in advance; its hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, etc.
- the operation obtaining means 11 acquires the user's reading operation information about the target original text by calling an application program interface (API) provided by the user device itself, or an API provided by a reading application such as a document library app.
- the target original text may be an article of any genre whose main means of expression is text, such as an essay, a document, a news item, a novel, or the like.
- the reading operation information refers to reading-related operations that the user performs or exhibits while reading the target original text, such as setting a reading mode, changing the reading mode, pausing on a page, collecting (bookmarking) paragraph content, and the like.
- the reading operation information includes but is not limited to at least one of the following:
- the collection related operations include, but are not limited to, operations such as copying, collecting, sharing, and the like.
- if the user performs a collection related operation while reading the document, the user is paying close attention to the collected content, and to a certain extent the collected content is an important part of the document.
- the reading mode information includes but is not limited to: 1) a browsing mode, in which the user pages through the text faster than the normal reading speed; 2) a reading mode, in which the user pages through the text at the normal reading speed; 3) a keyword search mode, in which the user selects content through a lasso touch operation and searches with the selected content as a keyword; the search may be submitted to a search engine or performed within the article the user is reading.
- the "lasso" of the lasso touch operation means that the user, with a finger in contact with the touch input device, draws a circle (or any other predefined shape) around one or more words on the page, so that operations can be performed on the circled content; the lasso touch operation includes, but is not limited to, a circle operation and a bracket operation. Those skilled in the art should understand that the lasso touch operation described here is only an example; other existing or future lasso touch operations, as applicable to the present invention, are also included in the scope of the present invention and are incorporated herein by reference.
- for example, while reading the commentary in "Renjian Cihua" on the line "how can I bear the lone inn locked in the spring chill, the sun setting amid the cuckoo's cries", the user can select that line with a lasso operation and search for interpretations of Qin Guan's ci poem "Ta Suo Xing".
- the above target original text and reading operation information are only examples; other existing or future target original texts or reading operation information, as applicable to the present invention, should also be included in the scope of the present invention and are hereby incorporated by reference.
- the attention text determining means 12 determines the attention text of the user regarding the target original text based on the reading operation information.
- the attention text refers to the paragraphs, sentences, and words in the target original text that the user is interested in; it also reflects the important content of the target original text.
- when a user reads a portion of the document that interests them, their behavior typically differs from when they read other portions: for example, a longer dwell time, a slower reading speed, or a collection operation.
- the manner in which the attention text determining means 12 determines the attention text includes but is not limited to at least one of the following:
- the attention text determining device 12 may, according to the dwell time of each paragraph on which the user performs a pause operation, use the paragraphs whose dwell time is greater than a predetermined time threshold as the attention text.
- the attention text determining means 12 may use the paragraph [5-7] in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process" as the attention text.
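The dwell-time rule described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DwellEvent` record, the `attention_by_dwell` function, and the 300-second threshold are hypothetical names and values, not part of the patent text.

```python
from dataclasses import dataclass


@dataclass
class DwellEvent:
    """One pause reported by the reading app (hypothetical record format)."""
    paragraph: int   # paragraph number in the target original text
    seconds: float   # how long the user stayed on that paragraph


def attention_by_dwell(events, threshold_seconds):
    """Return the paragraphs whose total dwell time exceeds the threshold."""
    totals = {}
    for event in events:
        totals[event.paragraph] = totals.get(event.paragraph, 0.0) + event.seconds
    return sorted(p for p, total in totals.items() if total > threshold_seconds)


# Illustrative data: the user lingers on paragraphs 5, 6 and 7 only.
events = [
    DwellEvent(4, 30), DwellEvent(5, 400), DwellEvent(6, 350),
    DwellEvent(7, 200), DwellEvent(7, 150), DwellEvent(8, 20),
]
print(attention_by_dwell(events, threshold_seconds=300))  # [5, 6, 7]
```

Summing repeated pauses per paragraph is a design choice of this sketch; the text only requires that the stay time exceed a predetermined threshold.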
- the attention text determining device 12 may use the paragraphs of the target original text that are viewed in the keyword search mode as the attention text.
- for example, the current reading mode is changed from the browsing mode to the keyword search mode, and the paragraphs corresponding to the "search space" portion read in the keyword search mode are paragraphs [5-10] of the article; the attention text determining device 12 can use paragraphs [5-10] of the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process" as the attention text.
- the attention text determining device 12 may use the paragraphs on which the user performs a collection related operation as the attention text.
- the attention text determining means 12 can use the paragraph [5-7] in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process” as the attention text.
- the attention text determining device 12 may use the paragraphs of the target original text that the user browses at a reading speed lower than a predetermined reading speed threshold as the attention text.
- for example, when reading the description of the "search space" part in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process", user A slides to display paragraph [5] of the article, which belongs to the "search space" part, and stays for 20 minutes before sliding on to paragraph [6]. Assuming that paragraph [5] has a total of 400 words, user A's reading speed for paragraph [5] is 20 words/min, which is lower than the predetermined reading speed threshold, for example 500 words/min, so the attention text determining device 12 can use paragraph [5] of the article as the attention text.
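The arithmetic in this example (400 words read in 20 minutes against a 500 words/min threshold) can be written out as a short Python sketch. The function names are hypothetical; the figures are the ones given in the text.

```python
def reading_speed(word_count, minutes):
    """Reading speed in words per minute for one paragraph."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return word_count / minutes


def is_attention_by_speed(word_count, minutes, threshold_wpm=500.0):
    """True when the paragraph was read more slowly than the threshold."""
    return reading_speed(word_count, minutes) < threshold_wpm


# Paragraph [5]: 400 words, 20 minutes on screen.
print(reading_speed(400, 20))          # 20.0
print(is_attention_by_speed(400, 20))  # True
```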
- the present invention may determine the attention text according to any one of the foregoing kinds of reading operation information, or according to a combination of several of them.
- for example, the attention text determining device 12 can take paragraphs [5] and [7] of the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process" as the attention text; as another example, when user A reads the description of the "search space" section of that article, user A performs a collection operation while reading paragraph [7], and the paragraphs of the "search space" section read in the keyword search mode are paragraphs [5-10] of the article, so the attention text determining means 12 can use paragraphs [5-10] of the article as the attention text.
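One simple way to combine several kinds of reading operation information is to take the union of the paragraphs each signal flags. The source does not state the combination rule, so the union used below is an assumption, chosen because it reproduces the [5-10] result of the example; `combine_attention` is a hypothetical name.

```python
def combine_attention(*signals):
    """Union of the paragraph sets flagged by each reading-operation signal."""
    combined = set()
    for signal in signals:
        combined |= set(signal)
    return sorted(combined)


collected = {7}                 # paragraph on which a collection operation occurred
searched = set(range(5, 11))    # paragraphs [5-10] read in keyword search mode
print(combine_attention(collected, searched))  # [5, 6, 7, 8, 9, 10]
```

An intersection or a weighted vote would be equally valid readings of "a combination"; the union is simply the least restrictive choice.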
- the present invention can also obtain the reading operation information of a plurality of users about the target original text, thereby obtaining each user's attention text about the target original text; based on each user's attention text, the present invention can then determine the public attention text of the plurality of users about the target original text and use it as the final attention text, which further improves the accuracy and validity of the text summary and the user's reading and browsing experience.
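A minimal sketch of deriving a public attention text from several users follows. The source does not say how the per-user results are aggregated; the rule used here (keep paragraphs flagged by at least a given share of users) and the names `public_attention` and `min_share` are assumptions made for illustration.

```python
from collections import Counter


def public_attention(per_user, min_share=0.5):
    """Paragraphs that at least `min_share` of the users paid attention to."""
    if not per_user:
        return []
    # Count each paragraph once per user, even if a user flags it repeatedly.
    counts = Counter(p for paragraphs in per_user.values() for p in set(paragraphs))
    needed = min_share * len(per_user)
    return sorted(p for p, count in counts.items() if count >= needed)


per_user = {"A": [5, 6, 7], "B": [5, 6], "C": [6, 10]}
print(public_attention(per_user))  # [5, 6]
```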
- the digest extraction device 13 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text, by a method such as treating the text as a linear sequence of sentences and each sentence as a linear sequence of words.
- for example, suppose an existing abstract of the article reads: "The LTE physical downlink control channel (PDCCH) plays a key scheduling role in allocating the resources of the entire system. Based on the PDCCH transmitting and receiving process and the PDCCH channel structure, the scheduling process of the channel is analyzed in detail, and a detailed blind detection method for terminal PDCCH reception is developed, providing a theoretical basis for the actual implementation of the LTE system."; the digest extraction device 13 extracts the text summary according to the attention text determined by the attention text determining device 12, such as paragraphs [5-10] corresponding to the "search space" section, in combination with the content information of the target original text.
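The following Python sketch shows one possible way to treat the text as a linear sequence of sentences and each sentence as a linear sequence of words while favoring the attention text. It is not the patent's extraction method: the frequency-based scoring, the `boost` factor, and all function names are assumptions, and paragraph indices are 0-based here.

```python
import re
from collections import Counter


def split_sentences(text):
    """Treat a paragraph as a linear sequence of sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Treat a sentence as a linear sequence of lowercase words."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extract_summary(paragraphs, attention, max_sentences=2, boost=2.0):
    """Pick the top-scoring sentences, weighting words found in attention paragraphs."""
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    freq = Counter(w for s in sentences for w in tokenize(s))
    attention_words = {
        w for i in attention for s in split_sentences(paragraphs[i]) for w in tokenize(s)
    }
    scored = []
    for order, sentence in enumerate(sentences):
        words = tokenize(sentence)
        if not words:
            continue
        total = sum(freq[w] * (boost if w in attention_words else 1.0) for w in words)
        scored.append((total / len(words), order, sentence))
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore document order so the summary reads naturally.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


paragraphs = [
    "The channel schedules resources.",
    "The search space limits blind detection. The common search space is shared.",
    "Results are shown.",
]
print(extract_summary(paragraphs, attention={1}))
```

With the sample data, both selected sentences come from the attention paragraph, which is the effect the text describes: the summary leans toward the content the reader focused on.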
- the text abstract extracted by the present invention is different from the text abstract obtained in the prior art.
- the abstract of the text extracted by the present invention can highlight the focus of the original document and improve the accuracy and validity of the text abstract. Accordingly, the user's reading browsing experience is also improved.
- the means of the digest extraction device 1 operate continuously. Specifically, the operation obtaining means 11 continuously acquires the reading operation information of the user regarding the target original text; the attention text determining means 12 continuously determines the user's attention text regarding the target original text according to the reading operation information; the digest extracting means 13 continuously extracts the text summary of the target original text according to the attention text and the content information of the target original text.
- "continuously" means that the respective means of the digest extraction device 1 keep acquiring the reading operation information, determining the attention text, and extracting the text summary until the digest extraction device 1 stops acquiring the reading operation information for a relatively long time.
- where the attention text comprises a plurality of paragraphs, the digest extraction device 1 further includes means for determining a preferred attention text from the plurality of paragraphs according to the degree of content association between the plurality of paragraphs (hereinafter referred to as "preferred determining means", not shown), wherein the digest extraction means 13 extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
- for example, for the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process", the paragraphs on which user A stays longer than the predetermined threshold while reading are paragraphs [5-7] of the article. Assuming that paragraphs [5-6] cover the dedicated search space and the common search space, and that paragraph [7] covers the blind detection process, the degree of association between the content of paragraph [7] and the content of paragraphs [5-6] is relatively low, so the preferred determining means can determine paragraphs [5-6] as the preferred attention text.
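As a rough illustration of selecting a preferred attention text by content association, the sketch below keeps the attention paragraphs that share enough vocabulary with at least one other attention paragraph. The source does not define how the degree of association is measured; the Jaccard word overlap, the 0.2 cutoff, and the sample paragraph texts are all assumptions.

```python
import re


def word_set(text):
    """Lowercase word set of a paragraph."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of words two paragraphs have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def preferred_attention(paragraphs, min_association=0.2):
    """Keep paragraphs closely associated with at least one other attention paragraph."""
    bags = {index: word_set(text) for index, text in paragraphs.items()}
    preferred = []
    for i, bag in bags.items():
        scores = [jaccard(bag, other) for j, other in bags.items() if j != i]
        if scores and max(scores) >= min_association:
            preferred.append(i)
    return sorted(preferred)


attention_paragraphs = {
    5: "dedicated search space for one terminal",
    6: "common search space for all terminals",
    7: "blind detection decodes candidate messages",
}
print(preferred_attention(attention_paragraphs))  # [5, 6]
```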
- the digest extraction device 13 extracts a text digest of the target original text in accordance with the preferred attention text and the content information of the target original text.
- the manner in which the digest extraction device 13 extracts the text summary according to the preferred attention text, in combination with the content information of the target original text, is the same or substantially the same as the manner in which it extracts the text summary according to the attention text; for the sake of brevity, it is not repeated here and is incorporated herein by reference.
- the present invention extracts the text summary according to the preferred attention text, in combination with the content information of the target original text, so that the extracted text summary further highlights the focus of the original document, improves the accuracy and validity of the text summary, and enhances the user's reading and browsing experience.
- the reading operation information further includes a history reading record of the user with respect to the target original text, wherein the attention text determining means 12 determines the attention text according to the historical reading record.
- the historical reading record comprises at least one of the following:
- for example, the attention text determining device 12 determines, based on the historical reading record, that the attention text is paragraphs [5-6] of the article.
- as another example, the attention text determining device 12 determines, based on the historical reading record, that the attention text is paragraph [6] of the article.
- FIG. 2 shows a flow chart of a method for extracting a text digest in accordance with another aspect of the present invention.
- the method comprises step S1, step S2 and step S3.
- in step S1, the digest extraction device 1 acquires the user's reading operation information about the target original text; in step S2, the digest extraction device 1 determines, according to the reading operation information, the user's attention text regarding the target original text; in step S3, the digest extraction device 1 extracts a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- the digest extraction device 1 includes, but is not limited to, a network device, a user device, or a device in which a network device is integrated with a user device through a network.
- the network device includes, but is not limited to, an implementation such as a network host, a single network server, a plurality of network server sets, or a cloud computing-based computer collection; or is implemented by a user equipment.
- the cloud is composed of a large number of host or network servers based on Cloud Computing, which is a kind of distributed computing, a super virtual computer composed of a group of loosely coupled computers.
- the user equipment may be any electronic product that can interact with a user through a keyboard, a mouse, a touch pad, a touch screen, or a handwriting device, such as a computer, a mobile phone, a smart phone, a PDA, a wearable device, a Pocket PC (PPC), or a tablet.
- the network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network (Ad Hoc network), and the like.
- both the network device and the user equipment include an electronic device capable of automatically performing numerical calculation and information processing according to instructions that are set or stored in advance; its hardware includes but is not limited to a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, etc.
- in step S1, the digest extraction device 1 acquires the user's reading operation information about the target original text by calling an application program interface (API) provided by the user device itself, or an API provided by a reading application such as a document library app.
- the target original text may be an article of any genre whose main means of expression is text, such as an essay, a document, a news item, a novel, or the like.
- the reading operation information refers to reading-related operations that the user performs or exhibits while reading the target original text, such as setting a reading mode, changing the reading mode, pausing on a page, collecting (bookmarking) paragraph content, and the like.
- the reading operation information includes but is not limited to at least one of the following:
- the collection related operations include, but are not limited to, operations such as copying, collecting, sharing, and the like.
- if the user performs a collection related operation while reading the document, the user is paying close attention to the collected content, and to a certain extent the collected content is an important part of the document.
- the reading mode information includes but is not limited to: 1) a browsing mode, in which the user pages through the text faster than the normal reading speed; 2) a reading mode, in which the user pages through the text at the normal reading speed; 3) a keyword search mode, in which the user selects content through a lasso touch operation and searches with the selected content as a keyword; the search may be submitted to a search engine or performed within the article the user is reading.
- the "lasso" of the lasso touch operation means that the user, with a finger in contact with the touch input device, draws a circle (or any other predefined shape) around one or more words on the page, so that operations can be performed on the circled content; the lasso touch operation includes, but is not limited to, a circle operation and a bracket operation. Those skilled in the art should understand that the lasso touch operation described here is only an example; other existing or future lasso touch operations, as applicable to the present invention, are also included in the scope of the present invention and are incorporated herein by reference.
- for example, while reading the commentary in "Renjian Cihua" on the line "how can I bear the lone inn locked in the spring chill, the sun setting amid the cuckoo's cries", the user can select that line with a lasso operation and search for interpretations of Qin Guan's ci poem "Ta Suo Xing".
- target original text and reading operation information are only examples, and other existing or future possible target original text or reading operation information, as applicable to the present invention, should also be included in the scope of protection of the present invention. It is hereby incorporated by reference.
- for example, in step S1, the digest extraction device 1 can obtain the user's reading operation information through the application program interface (API) provided by a reading app such as a document library app.
- in step S2, the digest extraction device 1 determines the user's attention text regarding the target original text based on the reading operation information.
- the attention text refers to the paragraphs, sentences, and words in the target original text that the user is interested in; it also reflects the important content of the target original text.
- when a user reads a portion of the document that interests them, their behavior typically differs from when they read other portions: for example, a longer dwell time, a slower reading speed, or a collection operation.
- in step S2, the manner in which the digest extraction device 1 determines the attention text includes, but is not limited to, at least one of the following:
- in step S2, the digest extraction device 1 may, according to the dwell time of each paragraph on which the user performs a pause operation, use the paragraphs whose dwell time is greater than the predetermined time threshold as the attention text.
- the digest extraction device 1 may use the paragraph [5-7] in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process" as the attention text.
- in step S2, the digest extraction device 1 may use the paragraphs of the target original text that are viewed in the keyword search mode as the attention text.
- the digest extraction device 1 can use the paragraph [5-10] in the article "Research on the LTE physical downlink control channel blind detection process" as the attention text.
- in step S2, the digest extraction device 1 may use the paragraphs on which the user performs a collection related operation as the attention text.
- the digest extraction device 1 can use the paragraph [5-7] in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process” as the attention text.
- in step S2, the digest extraction device 1 may use the paragraphs of the target original text that the user browses at a reading speed lower than a predetermined reading speed threshold as the attention text.
- for example, when reading the description of the "search space" part in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process", user A slides to display paragraph [5] of the article, which belongs to the "search space" part, and stays for 20 minutes before sliding on to paragraph [6]. Assuming that paragraph [5] has a total of 400 words, user A's reading speed for paragraph [5] is 20 words/min, which is lower than the predetermined reading speed threshold, for example 500 words/min, so the digest extraction device 1 can use paragraph [5] of the article as the attention text.
- the present invention may determine the attention text according to any one of the foregoing kinds of reading operation information, or according to a combination of several of them.
- for example, when user A reads the description of the "search space" section in the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process", assume that the entire content of the "search space" section is paragraphs [5-12] of the article; based on the combined reading operation information, the digest extraction device 1 can use paragraphs [5] and [7] of the article as the attention text. As another example, when user A reads the "search space" section of the same article, user A performs a collection operation while reading paragraph [7], and the paragraphs of the "search space" section read in the keyword search mode are paragraphs [5-10] of the article, so the digest extraction device 1 can use paragraphs [5-10] of the article as the attention text.
- the present invention can also obtain the reading operation information of a plurality of users about the target original text, thereby obtaining each user's attention text about the target original text; based on each user's attention text, the present invention can then determine the public attention text of the plurality of users about the target original text and use it as the final attention text, which further improves the accuracy and validity of the text summary and the user's reading and browsing experience.
- in step S3, the digest extraction device 1 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text, by a method such as treating the text as a linear sequence of sentences and each sentence as a linear sequence of words.
- for example, suppose an existing abstract of the article reads: "The LTE physical downlink control channel (PDCCH) plays a key scheduling role in allocating the resources of the entire system. Based on the PDCCH transmitting and receiving process and the PDCCH channel structure, the scheduling process of the channel is analyzed in detail, and a detailed blind detection method for terminal PDCCH reception is developed, providing a theoretical basis for the actual implementation of the LTE system."; in step S3, the digest extraction device 1, according to the attention text determined in step S2, such as paragraphs [5-10] corresponding to the "search space" section, in combination with the content information of the target original text, and by a method such as treating the text as a linear sequence of sentences and each sentence as a linear sequence of words, extracts the following text summary: "The LTE physical downlink control channel (PDCCH) plays a key scheduling role in allocating the resources of the entire system. Based on the PDCCH transmitting and receiving process and the PDCCH channel structure, the dedicated search space and the common search space of the channel scheduling process are analyzed in detail, and a detailed blind detection method for terminal PDCCH reception is developed, providing a theoretical basis for the actual implementation of the LTE system."
- the text abstract extracted by the present invention is different from the text abstract obtained in the prior art.
- the abstract of the text extracted by the present invention can highlight the focus of the original document and improve the accuracy and validity of the text abstract. Accordingly, the user's reading browsing experience is also improved.
- in step S1, the digest extraction device 1 continuously acquires the user's reading operation information about the target original text; in step S2, the digest extraction device 1 continuously determines the user's attention text regarding the target original text according to the reading operation information; in step S3, the digest extraction device 1 continuously extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- "continuously" means that the steps of acquiring the reading operation information, determining the attention text, and extracting the text summary are performed repeatedly until the digest extraction device 1 stops acquiring the reading operation information for a relatively long time.
- the method further includes a step S4 (not shown). Specifically, in step S4, the digest extraction device 1 determines a preferred attention text from the plurality of paragraphs according to the degree of content association between the plurality of paragraphs; in step S3, the digest extraction device 1 then extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
- for example, for the article "Research on the LTE Physical Downlink Control Channel Blind Detection Process", the paragraphs on which user A stays longer than the predetermined threshold while reading are paragraphs [5-7] of the article. Assuming that paragraphs [5-6] cover the dedicated search space and the common search space, and that paragraph [7] covers the blind detection process, the degree of association between the content of paragraph [7] and the content of paragraphs [5-6] is relatively low, so in step S4 the digest extraction device 1 can determine paragraphs [5-6] as the preferred attention text.
- in step S3, the digest extraction device 1 extracts a text summary of the target original text according to the preferred attention text and the content information of the target original text.
- the manner in which the digest extraction device 1 extracts the text summary according to the preferred attention text, in combination with the content information of the target original text, is the same or substantially the same as the manner in which it extracts the text summary according to the attention text in the foregoing step S3; for the sake of brevity, it is not repeated here and is incorporated herein by reference.
- the present invention extracts the text summary according to the preferred attention text, in combination with the content information of the target original text, so that the extracted text summary further highlights the focus of the original document, improves the accuracy and validity of the text summary, and enhances the user's reading and browsing experience.
- the reading operation information further includes a history reading record of the user with respect to the target original text, wherein, in step S2, the digest extraction device 1 determines the attention text according to the history reading record.
- the historical reading record comprises at least one of the following:
- In step S2, the digest extraction device 1 determines, according to the historical reading record, that the attention text is paragraphs [5-6] of the article.
- In step S2, the digest extraction device 1 determines, according to the historical reading record, that the attention text is paragraph [6] of the article.
- the present invention can be implemented in software and/or a combination of software and hardware, for example, using an application specific integrated circuit (ASIC), a general purpose computer, or any other similar hardware device.
- the software program of the present invention may be executed by a processor to implement the steps or functions described above.
- the software program (including related data structures) of the present invention can be stored in a computer readable recording medium such as a RAM memory, a magnetic or optical drive or a floppy disk and the like.
- some of the steps or functions of the present invention may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.
- a portion of the invention can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide a method and/or solution in accordance with the present invention.
- the program instructions for invoking the method of the present invention may be stored in a fixed or removable recording medium and/or transmitted by a data stream in a broadcast or other signal bearing medium, and/or stored in a The working memory of the computer device in which the program instructions are run.
- an embodiment in accordance with the present invention includes a device including a memory for storing computer program instructions and a processor for executing program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to operate based on the aforementioned methods and/or technical solutions in accordance with various embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
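A method and device for extracting a text summary. Specifically: acquiring a user's reading operation information about a target original text (S1); determining, according to the reading operation information, the user's attention text with respect to the target original text (S2); and extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text (S3). Because the summary is extracted from the user's attention text together with the content information of the target original text, the extracted text summary better highlights the key points of the original document. This improves the accuracy and effectiveness of the text summary and, correspondingly, the user's reading and browsing experience.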
Description
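Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 201510515872.6, filed on August 20, 2015 and entitled "Method and Device for Extracting a Text Summary", the content of which is incorporated herein by reference.
The present invention relates to the technical field of automatic text summarization, and in particular to a technique for extracting a text summary.
An abstract records the central content of an original document precisely and accurately, in concise and semantically coherent language, and greatly saves readers' time. In the prior art, abstracts are commonly extracted from the original document automatically by computer, using methods such as treating the text as a linear sequence of sentences and treating each sentence as a linear sequence of words. However, the grammatical, semantic, and syntactic analysis and the automatic summarization in these methods are usually performed on the full text or on the paragraphs of a given chapter, and therefore do not highlight the key points of the original document well.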
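Summary of the Invention
An object of the present invention is to provide a method and device for extracting a text summary.
According to one aspect of the present invention, a method for extracting a text summary is provided, wherein the method comprises:
acquiring a user's reading operation information about a target original text;
determining, according to the reading operation information, the user's attention text with respect to the target original text;
extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
According to another aspect of the present invention, a digest extraction device for extracting a text summary is also provided, wherein the digest extraction device comprises:
means for acquiring a user's reading operation information about a target original text;
means for determining, according to the reading operation information, the user's attention text with respect to the target original text;
means for extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
Compared with the prior art, an embodiment of the present invention extracts the text summary of the target original text according to the user's attention text with respect to the target original text, in combination with the content information of the target original text. The extracted text summary therefore better highlights the key points of the original document, which improves the accuracy and effectiveness of the text summary and, correspondingly, the user's reading and browsing experience.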
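Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of a digest extraction device for extracting a text summary according to one aspect of the present invention;
Fig. 2 shows a flowchart of a method for extracting a text summary according to another aspect of the present invention.
The same or similar reference numerals in the drawings denote the same or similar components.
The present invention is described in further detail below with reference to the accompanying drawings.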
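Fig. 1 shows a schematic diagram of a digest extraction device 1 for extracting a text summary according to one aspect of the present invention. The digest extraction device 1 comprises: means for acquiring a user's reading operation information about a target original text (hereinafter "operation acquisition means 11"); means for determining, according to the reading operation information, the user's attention text with respect to the target original text (hereinafter "attention text determination means 12"); and means for extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text (hereinafter "digest extraction means 13").
Specifically, the operation acquisition means 11 acquires the user's reading operation information about the target original text; the attention text determination means 12 determines, according to the reading operation information, the user's attention text with respect to the target original text; and the digest extraction means 13 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text.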
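Here, the digest extraction device 1 includes, but is not limited to, a network device, a user device, or a device formed by integrating a network device and a user device over a network. The network device may be implemented as, without limitation, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers; it may also be implemented by a user device. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The user device may be any electronic product that can interact with a user through a keyboard, mouse, touchpad, touch screen, handwriting device, or the like, such as a computer, mobile phone, smartphone, PDA, wearable device, Pocket PC (PPC), or tablet. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and a wireless ad hoc network. Those skilled in the art will understand that the above digest extraction device 1 is only an example; other existing or future network devices or user devices, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference. Both the network device and the user device include an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions, whose hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and an embedded device.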
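Specifically, the operation acquisition means 11 acquires the user's reading operation information about the target original text by calling an application programming interface (API) provided by the user device itself, or an API provided by a reading app such as a Wenku (document library) app.
Here, the target original text may be any self-contained piece of content of any genre whose main means of expression is text, such as an article, a document, a news item, or a novel.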
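Here, the reading operation information refers to reading-related operation information that the user exhibits or performs while reading the target original text, such as setting the reading mode, changing the reading mode, staying on a page, or adding paragraph content to favorites. Preferably, the reading operation information includes, but is not limited to, at least one of the following:
- a stay operation performed by the user while reading the target original text;
- a favorite-related operation performed by the user while reading the target original text;
- reading mode information of the user while reading the target original text;
- the user's reading speed while reading the target original text.
Here, favorite-related operations include, but are not limited to, operations such as copying, adding to favorites, and sharing. In a specific embodiment, if the user performs a favorite-related operation while reading a document, this indicates that the user pays close attention to the content concerned and, to some extent, that this content is an important part of the document.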
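Here, the reading mode information includes, but is not limited to: 1) browsing mode, in which the user reads faster than normal reading speed and turns several pages at a time; 2) reading mode, in which the user reads at normal reading speed and turns one page at a time; 3) keyword search mode, in which the user can use a lasso touch operation to search with the selected content as the keyword, either by jumping to a search engine or by searching within the article being read. The "lasso" function of the lasso touch operation means that the user, with a finger in contact with the touch input device, draws a circle around any one or more words on the web page, or performs any other predefined enclosing gesture, and thereby searches for the selected content. Lasso touch operations include, but are not limited to, circling and bracketing gestures. Those skilled in the art will understand that these lasso touch operations are only examples; other existing or future lasso touch operations, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference. For example, a user reading the commentary in Renjian Cihua (《人间词话》), in the part on the realm of ci poetry, on the lines "可堪孤馆闭春寒，杜鹃声里斜阳暮" can search for a line-by-line analysis of Qin Guan's Ta Suo Xing (《踏莎行》).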
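The kinds of reading operation information listed above can be held in a small record type. The following is a minimal sketch only: the names `ReadingMode`, `OperationKind`, and `ReadingOperation` and all of their fields are hypothetical illustrations and do not come from the patent or from any real reading-app API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReadingMode(Enum):
    """The three reading modes named in the description."""
    BROWSE = "browse"                  # faster than normal, several pages at a time
    READ = "read"                      # normal speed, one page at a time
    KEYWORD_SEARCH = "keyword_search"  # lasso gesture selects a search keyword


class OperationKind(Enum):
    """Hypothetical labels for the reading operations the device may observe."""
    DWELL = "dwell"              # stay operation
    FAVORITE = "favorite"        # copy, add to favorites, share
    MODE_CHANGE = "mode_change"  # reading mode was set or changed
    SEARCH = "search"            # lasso search on selected words


@dataclass(frozen=True)
class ReadingOperation:
    kind: OperationKind
    paragraph: int                         # 1-based paragraph number
    timestamp: float                       # seconds since the reading session began
    dwell_seconds: Optional[float] = None  # only for DWELL
    mode: Optional[ReadingMode] = None     # only for MODE_CHANGE
    query: Optional[str] = None            # only for SEARCH


# A user switches to keyword search mode in paragraph 5 and circles the term "CCE".
operations = [
    ReadingOperation(OperationKind.MODE_CHANGE, 5, 0.0, mode=ReadingMode.KEYWORD_SEARCH),
    ReadingOperation(OperationKind.SEARCH, 5, 3.0, query="CCE"),
]
print(operations[1].query)
```

Keeping every operation tied to a paragraph number is a design assumption; it lets the later attention-text rules work on paragraphs without re-deriving positions from raw touch events.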
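Those skilled in the art will understand that the above target original text and reading operation information are only examples; other existing or future target original texts or reading operation information, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
For example, suppose user A is reading the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" in the Wenku app on an iPad 2. On reaching the description of the "search space" part, user A scrolls to display the corresponding paragraphs and stays there longer than a predetermined time threshold before scrolling on. The operation acquisition means 11 can then obtain, through an API provided by a reading app such as Wenku, the stay operation that user A performed while reading the target original text.
As another example, suppose that on reaching the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A changes the current reading mode from browsing mode to keyword search mode and, on the touch screen of the iPad 2, draws a circle around the term "CCE" with one finger. The operation acquisition means 11 can then obtain, through an API provided by a reading app such as Wenku, the reading mode change operation and the search operation that user A performed while reading the target original text.
Those skilled in the art will understand that the above ways of acquiring the user's reading operation information about the target original text are only examples; other existing or future ways of acquiring it, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.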
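Next, the attention text determination means 12 determines, according to the reading operation information, the user's attention text with respect to the target original text.
Here, the attention text refers to the paragraphs and words in the target original text that the user pays attention to or is interested in; it also reflects the important content of the target original text. In a specific embodiment, when reading the parts of a document that interest them, users typically behave differently than when reading other parts: they stay longer, read more slowly, add content to favorites, and so on.
Here, the ways in which the attention text determination means 12 determines the attention text include, but are not limited to, at least one of the following: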
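1) If the reading operation information includes a stay operation performed by the user while reading the target original text, the attention text determination means 12 may, according to the stay time on the paragraphs corresponding to the stay operation, take the paragraphs whose stay time is greater than a predetermined time threshold as the attention text.
For example, when user A reaches the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", the time spent on the corresponding paragraphs, say paragraphs [5-7] of the article, is greater than the predetermined time threshold. The attention text determination means 12 may then take paragraphs [5-7] of the article as the attention text.
2) If, while reading the target original text, the user switches from browsing mode to keyword search mode, the attention text determination means 12 may take the paragraphs of the target original text that are viewed in keyword search mode as the attention text.
For example, when user A reaches the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A changes the current reading mode from browsing mode to keyword search mode, and the paragraphs of the "search space" part read in keyword search mode are paragraphs [5-10] of the article. The attention text determination means 12 may then take paragraphs [5-10] of the article as the attention text.
3) If the reading operation information includes a favorite-related operation performed by the user while reading the target original text, the attention text determination means 12 may take the paragraphs on which the user performed the favorite-related operation as the attention text.
For example, while reading the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A performs a favorite operation on each of the corresponding paragraphs, say paragraphs [5-7] of the article. The attention text determination means 12 may then take paragraphs [5-7] of the article as the attention text.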
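4) If the reading operation information includes the user's reading speed while reading the target original text, the attention text determination means 12 may take the paragraphs of the target original text that the user views at a reading speed below a predetermined reading speed threshold as the attention text.
For example, while reading the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A scrolls to display the corresponding content, say paragraph [5] of the article, and stays for 20 min before scrolling on to paragraph [6]. Assuming paragraph [5] contains 400 characters, user A's reading speed for paragraph [5] is 400 / 20 = 20 characters/min, which is below the predetermined reading speed threshold of, say, 500 characters/min. The attention text determination means 12 may then take paragraph [5] of the article as the attention text.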
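The four rules above can each be written as a simple filter over per-paragraph statistics. The following is a hedged sketch: the `ParagraphStats` record, its field names, and the sample numbers for paragraphs 4 and 6 are assumptions made for illustration, while the 400-character, 20-minute, 500 characters/min figures for paragraph 5 follow the worked example in the text.

```python
from dataclasses import dataclass


@dataclass
class ParagraphStats:
    index: int                       # 1-based paragraph number
    chars: int                       # number of characters in the paragraph
    dwell_minutes: float             # time on screen before the user scrolled on
    favorited: bool = False          # a copy / favorite / share operation occurred
    in_keyword_search: bool = False  # viewed after switching browse -> keyword search


def by_dwell(stats, min_minutes):
    """Rule 1: paragraphs whose stay time exceeds the time threshold."""
    return [s.index for s in stats if s.dwell_minutes > min_minutes]


def by_mode(stats):
    """Rule 2: paragraphs viewed in keyword search mode."""
    return [s.index for s in stats if s.in_keyword_search]


def by_favorite(stats):
    """Rule 3: paragraphs on which a favorite-related operation was performed."""
    return [s.index for s in stats if s.favorited]


def by_speed(stats, max_chars_per_minute):
    """Rule 4: paragraphs read more slowly than the speed threshold."""
    return [
        s.index
        for s in stats
        if s.dwell_minutes > 0 and s.chars / s.dwell_minutes < max_chars_per_minute
    ]


stats = [
    ParagraphStats(4, 450, 0.5),
    ParagraphStats(5, 400, 20, in_keyword_search=True),   # 400 / 20 = 20 chars/min
    ParagraphStats(6, 380, 0.6, favorited=True, in_keyword_search=True),
]
print(by_dwell(stats, min_minutes=5))             # [5]
print(by_mode(stats))                             # [5, 6]
print(by_favorite(stats))                         # [6]
print(by_speed(stats, max_chars_per_minute=500))  # [5]
```

Deriving reading speed from character count and stay time mirrors the arithmetic in the example; a real reader app might measure speed differently.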
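Those skilled in the art will understand that the above ways of determining the attention text are only examples; other existing or future ways of determining the attention text, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.
Here, those skilled in the art should understand that, when determining the attention text according to the reading operation information, the present invention may use any one of the foregoing kinds of reading operation information or a combination of several. For example, suppose that the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" consists of paragraphs [5-12], that user A reads paragraph [5] at a speed below the predetermined reading speed threshold of, say, 500 characters/min, and that user A performs a favorite operation while reading paragraph [7]. The attention text determination means 12 may then take both paragraph [5] and paragraph [7] of the article as the attention text. As another example, suppose that while reading the "search space" part of the article user A performs a favorite operation on paragraph [7], and that the paragraphs of the "search space" part read in keyword search mode are paragraphs [5-10]. The attention text determination means 12 may then take paragraphs [5-10] of the article as the attention text.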
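Both combination examples are consistent with taking the union of the paragraphs selected by each signal. The sketch below assumes exactly that; the patent does not prescribe a particular combination rule, and `combine_attention` is a hypothetical name.

```python
def combine_attention(*signals):
    """Merge the paragraph numbers selected by several signals into one sorted list."""
    merged = set()
    for paragraphs in signals:
        merged.update(paragraphs)
    return sorted(merged)


# First example: slow reading selects [5], a favorite operation selects [7].
print(combine_attention([5], [7]))               # [5, 7]

# Second example: favorite on [7], keyword search mode covers [5-10].
print(combine_attention([7], list(range(5, 11))))  # [5, 6, 7, 8, 9, 10]
```

A union never discards a paragraph that any single signal flagged, which matches the second example, where paragraph [7] is already contained in [5-10].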
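It should be noted here that the present invention may also acquire the reading operation information of multiple users about the target original text and thereby obtain each user's attention text with respect to the target original text. From each user's attention text, the present invention may then determine the common attention text of the multiple users with respect to the target original text and use it as the final attention text, which further improves the accuracy and effectiveness of the text summary and further enhances the user's reading and browsing experience.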
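The text does not say how the common attention text is derived from the individual ones. One plausible reading, sketched below purely as an assumption, keeps the paragraphs that at least a given share of users attended to; a share of 1.0 reduces to a plain intersection. The function name, the `min_share` parameter, and the sample users are all hypothetical.

```python
from collections import Counter


def common_attention(per_user, min_share=1.0):
    """Return paragraphs that appear in the attention text of at least min_share of users.

    per_user maps a user id to the list of paragraph numbers in that user's attention text.
    """
    if not per_user:
        return []
    counts = Counter(p for paragraphs in per_user.values() for p in set(paragraphs))
    needed = min_share * len(per_user)
    return sorted(p for p, c in counts.items() if c >= needed)


per_user = {"A": [5, 6, 7], "B": [5, 6], "C": [6, 7, 9]}
print(common_attention(per_user))                 # [6]
print(common_attention(per_user, min_share=0.6))  # [5, 6, 7]
```

A threshold below 1.0 is usually more forgiving when many users are pooled, since a strict intersection shrinks quickly as the number of users grows.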
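Next, the digest extraction means 13 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text, using methods such as treating the text as a linear sequence of sentences and treating each sentence as a linear sequence of words.
For example, suppose that a prior-art automatic summarization method, that is, one based on the full text or on the paragraphs of a given chapter, produces the following summary of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel": "The LTE physical downlink control channel allocates various resources for the uplink and downlink transmission of the whole system and plays a key scheduling role. Based on the PDCCH transmit and receive procedures and on the PDCCH structure, the channel's scheduling process is analyzed in detail, a detailed blind detection method is devised for PDCCH reception at the terminal, and a theoretical basis is provided for the practical implementation of LTE systems." By contrast, the digest extraction means 13 works from the attention text determined by the attention text determination means 12, for example paragraphs [5-10] of the "search space" part, in combination with the content information of the target original text, and with methods such as treating the text as a linear sequence of sentences and each sentence as a linear sequence of words. It extracts the following summary of the same article: "The LTE physical downlink control channel allocates various resources for the uplink and downlink transmission of the whole system and plays a key scheduling role. Based on the PDCCH transmit and receive procedures and on the PDCCH structure, the channel's scheduling process, the dedicated search space, and the common search space are analyzed in detail, a detailed blind detection method is devised for PDCCH reception at the terminal, and a theoretical basis is provided for the practical implementation of LTE systems."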
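The text leaves the extraction algorithm open. As one possible illustration, the sketch below uses a simple frequency-based extractive summarizer (an assumption, not the patented method): each sentence is scored by the average document-wide frequency of its words, and the score is multiplied by a boost when the sentence belongs to an attention paragraph. The `boost` and `k` parameters and the sample text are hypothetical.

```python
import re
from collections import Counter


def summarize(paragraphs, attention, k=2, boost=2.0):
    """Pick the k highest-scoring sentences and return them in document order.

    paragraphs: list of strings (list position 0 is paragraph 1).
    attention:  set of 1-based paragraph numbers in the attention text.
    """
    sentences = []  # (position in document, paragraph number, sentence text)
    for p_idx, text in enumerate(paragraphs, start=1):
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                sentences.append((len(sentences), p_idx, sentence))

    # Word frequencies over the whole document (the "content information").
    freq = Counter(w for _, _, s in sentences for w in re.findall(r"\w+", s.lower()))

    def score(item):
        _, p_idx, sentence = item
        words = re.findall(r"\w+", sentence.lower())
        base = sum(freq[w] for w in words) / len(words) if words else 0.0
        return base * (boost if p_idx in attention else 1.0)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return " ".join(s for _, _, s in sorted(top))


doc = [
    "The control channel schedules resources. The control channel is critical.",
    "The search space has a dedicated part and a common part.",
]
print(summarize(doc, attention=set(), k=1))  # The control channel schedules resources.
print(summarize(doc, attention={2}, k=1))    # The search space has a dedicated part and a common part.
```

With no attention text the summary is driven by the full text alone; marking paragraph 2 as attention text pulls its sentence into the summary, which is the effect the example in the description illustrates.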
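Here, the text summary extracted by the present invention differs from the one obtained in the prior art: it better highlights the key points of the original document, which improves the accuracy and effectiveness of the text summary and, correspondingly, the user's reading and browsing experience.
The means of the digest extraction device 1 operate continuously. Specifically, the operation acquisition means 11 continuously acquires the user's reading operation information about the target original text; the attention text determination means 12 continuously determines, according to the reading operation information, the user's attention text with respect to the target original text; and the digest extraction means 13 continuously extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text. Here, those skilled in the art should understand that "continuously" means that the means of the digest extraction device 1 keep acquiring reading operation information, determining the attention text, and extracting the text summary, until the digest extraction device 1 has stopped acquiring reading operation information for a relatively long time.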
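Preferably, if there are multiple paragraphs on which the user's stay time while reading the target original text is greater than the predetermined threshold, or multiple paragraphs of the target original text viewed in keyword search mode, the digest extraction device 1 further comprises means for determining a preferred attention text from the multiple paragraphs according to the content relevance between them (hereinafter "preferred determination means", not shown). The digest extraction means 13 then extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
For example, for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", the paragraphs on which user A stays longer than the predetermined threshold while reading are paragraphs [5-7]. Assume that paragraphs [5-6] cover the dedicated search space and the common search space respectively, and that paragraph [7] covers the blind detection process. The relevance between paragraph [7] and paragraphs [5-6] is then comparatively low, so the preferred determination means may determine paragraphs [5-6] as the preferred attention text.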
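How content relevance is measured is not specified. The sketch below assumes bag-of-words cosine similarity and keeps a candidate paragraph only if it is sufficiently similar to at least one other candidate. The threshold value, the function names, and the English stand-in paragraph texts are all hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def preferred_attention(candidates, min_similarity=0.3):
    """Keep the candidate paragraphs that are related to at least one other candidate.

    candidates maps a paragraph number to its text.
    """
    keep = []
    for i, text in candidates.items():
        related = any(
            cosine(text, other) >= min_similarity
            for j, other in candidates.items()
            if j != i
        )
        if related:
            keep.append(i)
    return sorted(keep)


candidates = {
    5: "dedicated search space candidates for one terminal",
    6: "common search space candidates for all terminals",
    7: "blind detection decodes each candidate and checks the CRC",
}
print(preferred_attention(candidates))  # [5, 6]
```

Here paragraphs 5 and 6 share four of their seven words (similarity 4/7), while paragraph 7 shares none with either, so only [5, 6] survive, matching the example in the text.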
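Next, the digest extraction means 13 extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text. The way it does so is the same or substantially the same as the way, described above, in which the digest extraction means 13 extracts the text summary according to the attention text in combination with the content information of the target original text. For brevity, it is not described again here and is incorporated by reference.
Here, by extracting the text summary according to the preferred attention text in combination with the content information of the target original text, the present invention makes the extracted text summary highlight the key points of the original document still further, improving the accuracy and effectiveness of the text summary and enhancing the user's reading and browsing experience.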
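Preferably, the reading operation information further includes the user's historical reading record for the target original text, and the attention text determination means 12 determines the attention text according to the historical reading record.
Preferably, the historical reading record includes at least one of the following:
- the user's historical reading frequency information for paragraphs of the target original text;
- the user's historical average reading duration for paragraphs of the target original text.
For example, for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", suppose user A's historical reading record for the article shows that user A frequently reads paragraphs [5-6]. The attention text determination means 12 then determines, according to the historical reading record, that the attention text is paragraphs [5-6] of the article.
As another example, suppose user A's historical reading record for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" shows that the historical average reading duration for paragraph [6] exceeds a predetermined threshold. The attention text determination means 12 then determines, according to the historical reading record, that the attention text is paragraph [6] of the article.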
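Both history-based criteria can be computed from per-session paragraph timings. The sketch below is an assumption about how such a record might be stored (one mapping of paragraph number to seconds per past session); the function name, the thresholds, and the sample sessions are hypothetical.

```python
def attention_from_history(sessions, min_reads=None, min_avg_seconds=None):
    """Select paragraphs by historical reading frequency and/or average reading duration.

    sessions: list of dicts, one per past reading session, mapping paragraph -> seconds read.
    """
    reads, total = {}, {}
    for session in sessions:
        for paragraph, seconds in session.items():
            reads[paragraph] = reads.get(paragraph, 0) + 1
            total[paragraph] = total.get(paragraph, 0.0) + seconds

    picked = set()
    for paragraph in reads:
        if min_reads is not None and reads[paragraph] >= min_reads:
            picked.add(paragraph)  # read often enough
        if min_avg_seconds is not None and total[paragraph] / reads[paragraph] > min_avg_seconds:
            picked.add(paragraph)  # average duration above the threshold
    return sorted(picked)


sessions = [
    {5: 30, 6: 200},
    {5: 40, 6: 180},
    {5: 35, 6: 220, 7: 10},
]
print(attention_from_history(sessions, min_reads=3))          # [5, 6]
print(attention_from_history(sessions, min_avg_seconds=120))  # [6]
```

The two calls reproduce the two examples: the frequency criterion yields paragraphs [5-6], and the average-duration criterion yields paragraph [6].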
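Fig. 2 shows a flowchart of a method for extracting a text summary according to another aspect of the present invention.
The method includes step S1, step S2, and step S3. Specifically, in step S1, the digest extraction device 1 acquires the user's reading operation information about the target original text; in step S2, the digest extraction device 1 determines, according to the reading operation information, the user's attention text with respect to the target original text; in step S3, the digest extraction device 1 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text.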
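Here, the digest extraction device 1 includes, but is not limited to, a network device, a user device, or a device formed by integrating a network device and a user device over a network. The network device may be implemented as, without limitation, a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers; it may also be implemented by a user device. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The user device may be any electronic product that can interact with a user through a keyboard, mouse, touchpad, touch screen, handwriting device, or the like, such as a computer, mobile phone, smartphone, PDA, wearable device, Pocket PC (PPC), or tablet. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and a wireless ad hoc network. Those skilled in the art will understand that the above digest extraction device 1 is only an example; other existing or future network devices or user devices, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference. Both the network device and the user device include an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions, whose hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and an embedded device.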
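Specifically, in step S1, the digest extraction device 1 acquires the user's reading operation information about the target original text by calling an application programming interface (API) provided by the user device itself, or an API provided by a reading app such as a Wenku (document library) app.
Here, the target original text may be any self-contained piece of content of any genre whose main means of expression is text, such as an article, a document, a news item, or a novel.
Here, the reading operation information refers to reading-related operation information that the user exhibits or performs while reading the target original text, such as setting the reading mode, changing the reading mode, staying on a page, or adding paragraph content to favorites. Preferably, the reading operation information includes, but is not limited to, at least one of the following:
- a stay operation performed by the user while reading the target original text;
- a favorite-related operation performed by the user while reading the target original text;
- reading mode information of the user while reading the target original text;
- the user's reading speed while reading the target original text.
Here, favorite-related operations include, but are not limited to, operations such as copying, adding to favorites, and sharing. In a specific embodiment, if the user performs a favorite-related operation while reading a document, this indicates that the user pays close attention to the content concerned and, to some extent, that this content is an important part of the document.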
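Here, the reading mode information includes, but is not limited to: 1) browsing mode, in which the user reads faster than normal reading speed and turns several pages at a time; 2) reading mode, in which the user reads at normal reading speed and turns one page at a time; 3) keyword search mode, in which the user can use a lasso touch operation to search with the selected content as the keyword, either by jumping to a search engine or by searching within the article being read. The "lasso" function of the lasso touch operation means that the user, with a finger in contact with the touch input device, draws a circle around any one or more words on the web page, or performs any other predefined enclosing gesture, and thereby searches for the selected content. Lasso touch operations include, but are not limited to, circling and bracketing gestures. Those skilled in the art will understand that these lasso touch operations are only examples; other existing or future lasso touch operations, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference. For example, a user reading the commentary in Renjian Cihua (《人间词话》), in the part on the realm of ci poetry, on the lines "可堪孤馆闭春寒，杜鹃声里斜阳暮" can search for a line-by-line analysis of Qin Guan's Ta Suo Xing (《踏莎行》).
Those skilled in the art will understand that the above target original text and reading operation information are only examples; other existing or future target original texts or reading operation information, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.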
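For example, suppose user A is reading the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" in the Wenku app on an iPad 2. On reaching the description of the "search space" part, user A scrolls to display the corresponding paragraphs and stays there longer than a predetermined time threshold before scrolling on. In step S1, the digest extraction device 1 can then obtain, through an API provided by a reading app such as Wenku, the stay operation that user A performed while reading the target original text.
As another example, suppose that on reaching the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A changes the current reading mode from browsing mode to keyword search mode and, on the touch screen of the iPad 2, draws a circle around the term "CCE" with one finger. In step S1, the digest extraction device 1 can then obtain, through an API provided by a reading app such as Wenku, the reading mode change operation and the search operation that user A performed while reading the target original text.
Those skilled in the art will understand that the above ways of acquiring the user's reading operation information about the target original text are only examples; other existing or future ways of acquiring it, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.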
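Next, in step S2, the digest extraction device 1 determines, according to the reading operation information, the user's attention text with respect to the target original text.
Here, the attention text refers to the paragraphs and words in the target original text that the user pays attention to or is interested in; it also reflects the important content of the target original text. In a specific embodiment, when reading the parts of a document that interest them, users typically behave differently than when reading other parts: they stay longer, read more slowly, add content to favorites, and so on.
Here, the ways in which the digest extraction device 1 determines the attention text in step S2 include, but are not limited to, at least one of the following: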
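1) If the reading operation information includes a stay operation performed by the user while reading the target original text, then in step S2 the digest extraction device 1 may, according to the stay time on the paragraphs corresponding to the stay operation, take the paragraphs whose stay time is greater than a predetermined time threshold as the attention text.
For example, when user A reaches the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", the time spent on the corresponding paragraphs, say paragraphs [5-7] of the article, is greater than the predetermined time threshold. In step S2, the digest extraction device 1 may then take paragraphs [5-7] of the article as the attention text.
2) If, while reading the target original text, the user switches from browsing mode to keyword search mode, then in step S2 the digest extraction device 1 may take the paragraphs of the target original text that are viewed in keyword search mode as the attention text.
For example, when user A reaches the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A changes the current reading mode from browsing mode to keyword search mode, and the paragraphs of the "search space" part read in keyword search mode are paragraphs [5-10] of the article. In step S2, the digest extraction device 1 may then take paragraphs [5-10] of the article as the attention text.
3) If the reading operation information includes a favorite-related operation performed by the user while reading the target original text, then in step S2 the digest extraction device 1 may take the paragraphs on which the user performed the favorite-related operation as the attention text.
For example, while reading the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A performs a favorite operation on each of the corresponding paragraphs, say paragraphs [5-7] of the article. In step S2, the digest extraction device 1 may then take paragraphs [5-7] of the article as the attention text.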
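4) If the reading operation information includes the user's reading speed while reading the target original text, then in step S2 the digest extraction device 1 may take the paragraphs of the target original text that the user views at a reading speed below a predetermined reading speed threshold as the attention text.
For example, while reading the description of the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", user A scrolls to display the corresponding content, say paragraph [5] of the article, and stays for 20 min before scrolling on to paragraph [6]. Assuming paragraph [5] contains 400 characters, user A's reading speed for paragraph [5] is 400 / 20 = 20 characters/min, which is below the predetermined reading speed threshold of, say, 500 characters/min. In step S2, the digest extraction device 1 may then take paragraph [5] of the article as the attention text.
Those skilled in the art will understand that the above ways of determining the attention text are only examples; other existing or future ways of determining the attention text, if applicable to the present invention, also fall within its scope of protection and are incorporated herein by reference.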
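Here, those skilled in the art should understand that, when determining the attention text according to the reading operation information, the present invention may use any one of the foregoing kinds of reading operation information or a combination of several. For example, suppose that the "search space" part of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" consists of paragraphs [5-12], that user A reads paragraph [5] at a speed below the predetermined reading speed threshold of, say, 500 characters/min, and that user A performs a favorite operation while reading paragraph [7]. In step S2, the digest extraction device 1 may then take both paragraph [5] and paragraph [7] of the article as the attention text. As another example, suppose that while reading the "search space" part of the article user A performs a favorite operation on paragraph [7], and that the paragraphs of the "search space" part read in keyword search mode are paragraphs [5-10]. In step S2, the digest extraction device 1 may then take paragraphs [5-10] of the article as the attention text.
It should be noted here that the present invention may also acquire the reading operation information of multiple users about the target original text and thereby obtain each user's attention text with respect to the target original text. From each user's attention text, the present invention may then determine the common attention text of the multiple users with respect to the target original text and use it as the final attention text, which further improves the accuracy and effectiveness of the text summary and further enhances the user's reading and browsing experience.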
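Next, in step S3, the digest extraction device 1 extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text, using methods such as treating the text as a linear sequence of sentences and treating each sentence as a linear sequence of words.
For example, suppose that a prior-art automatic summarization method, that is, one based on the full text or on the paragraphs of a given chapter, produces the following summary of the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel": "The LTE physical downlink control channel allocates various resources for the uplink and downlink transmission of the whole system and plays a key scheduling role. Based on the PDCCH transmit and receive procedures and on the PDCCH structure, the channel's scheduling process is analyzed in detail, a detailed blind detection method is devised for PDCCH reception at the terminal, and a theoretical basis is provided for the practical implementation of LTE systems." By contrast, in step S3 the digest extraction device 1 works from the attention text it determined in step S2, for example paragraphs [5-10] of the "search space" part, in combination with the content information of the target original text, and with methods such as treating the text as a linear sequence of sentences and each sentence as a linear sequence of words. It extracts the following summary of the same article: "The LTE physical downlink control channel allocates various resources for the uplink and downlink transmission of the whole system and plays a key scheduling role. Based on the PDCCH transmit and receive procedures and on the PDCCH structure, the channel's scheduling process, the dedicated search space, and the common search space are analyzed in detail, a detailed blind detection method is devised for PDCCH reception at the terminal, and a theoretical basis is provided for the practical implementation of LTE systems."
Here, the text summary extracted by the present invention differs from the one obtained in the prior art: it better highlights the key points of the original document, which improves the accuracy and effectiveness of the text summary and, correspondingly, the user's reading and browsing experience.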
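The steps performed by the digest extraction device 1 run continuously. Specifically, in step S1, the digest extraction device 1 continuously acquires the user's reading operation information about the target original text; in step S2, the digest extraction device 1 continuously determines, according to the reading operation information, the user's attention text with respect to the target original text; in step S3, the digest extraction device 1 continuously extracts the text summary of the target original text according to the attention text, in combination with the content information of the target original text. Here, those skilled in the art should understand that "continuously" means that the digest extraction device 1 keeps acquiring reading operation information, determining the attention text, and extracting the text summary across the respective steps, until the digest extraction device 1 has stopped acquiring reading operation information for a relatively long time.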
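The continuous operation described above can be pictured as a loop that re-runs S2 and S3 after every newly acquired operation and stops once operations cease for too long. The sketch below replays a log of timestamped operations rather than using a live timer, which is a simplification; `run_until_idle`, `idle_limit_seconds`, and the `refresh` callback are hypothetical names.

```python
def run_until_idle(timed_operations, idle_limit_seconds, refresh):
    """Process operations in time order; stop at the first gap longer than the idle limit.

    timed_operations: iterable of (timestamp_seconds, operation) pairs in time order.
    refresh: callback that receives all operations so far (stands in for S2 and S3).
    """
    collected = []
    last_time = None
    for timestamp, operation in timed_operations:
        if last_time is not None and timestamp - last_time > idle_limit_seconds:
            break  # no reading operation for "a relatively long time"
        collected.append(operation)  # S1: acquire the reading operation
        refresh(list(collected))     # S2 + S3: update attention text and summary
        last_time = timestamp
    return collected


calls = []
ops = [(0, "dwell p5"), (40, "favorite p7"), (1000, "dwell p9")]
result = run_until_idle(ops, idle_limit_seconds=300, refresh=calls.append)
print(result)      # ['dwell p5', 'favorite p7']
print(len(calls))  # 2
```

Re-running S2 and S3 on the full list after each operation is the simplest choice; an incremental update would be cheaper but needs state that the description does not define.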
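Preferably, if there are multiple paragraphs on which the user's stay time while reading the target original text is greater than the predetermined threshold, or multiple paragraphs of the target original text viewed in keyword search mode, the method performed by the digest extraction device 1 further includes step S4 (not shown). Specifically, in step S4, the digest extraction device 1 determines a preferred attention text from the multiple paragraphs according to the content relevance between them; in step S3, the digest extraction device 1 then extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
For example, for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", the paragraphs on which user A stays longer than the predetermined threshold while reading are paragraphs [5-7]. Assume that paragraphs [5-6] cover the dedicated search space and the common search space respectively, and that paragraph [7] covers the blind detection process. The relevance between paragraph [7] and paragraphs [5-6] is then comparatively low, so in step S4 the digest extraction device 1 may determine paragraphs [5-6] as the preferred attention text.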
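Next, in step S3, the digest extraction device 1 extracts the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text. The way it does so is the same or substantially the same as the way, described above for step S3, in which the digest extraction device 1 extracts the text summary according to the attention text in combination with the content information of the target original text. For brevity, it is not described again here and is incorporated by reference.
Here, by extracting the text summary according to the preferred attention text in combination with the content information of the target original text, the present invention makes the extracted text summary highlight the key points of the original document still further, improving the accuracy and effectiveness of the text summary and enhancing the user's reading and browsing experience.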
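Preferably, the reading operation information further includes the user's historical reading record for the target original text, and in step S2 the digest extraction device 1 determines the attention text according to the historical reading record.
Preferably, the historical reading record includes at least one of the following:
- the user's historical reading frequency information for paragraphs of the target original text;
- the user's historical average reading duration for paragraphs of the target original text.
For example, for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel", suppose user A's historical reading record for the article shows that user A frequently reads paragraphs [5-6]. In step S2, the digest extraction device 1 then determines, according to the historical reading record, that the attention text is paragraphs [5-6] of the article.
As another example, suppose user A's historical reading record for the article "Research on the Blind Detection Process of the LTE Physical Downlink Control Channel" shows that the historical average reading duration for paragraph [6] exceeds a predetermined threshold. In step S2, the digest extraction device 1 then determines, according to the historical reading record, that the attention text is paragraph [6] of the article.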
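It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform the respective steps or functions.
In addition, part of the present invention can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted as a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, an embodiment according to the present invention includes an apparatus comprising a memory for storing computer program instructions and a processor for executing program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
To those skilled in the art, it is evident that the present invention is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present invention. No reference numeral in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or means recited in a device claim may also be implemented by a single unit or means through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.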
Claims (17)
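- A method for extracting a text summary, wherein the method comprises: acquiring a user's reading operation information about a target original text; determining, according to the reading operation information, the user's attention text with respect to the target original text; and extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- The method according to claim 1, wherein the reading operation information includes at least one of the following: a stay operation performed by the user while reading the target original text; a favorite-related operation performed by the user while reading the target original text; reading mode information of the user while reading the target original text; the user's reading speed while reading the target original text.
- The method according to claim 2, wherein the reading operation information includes a stay operation performed by the user while reading the target original text; and wherein determining the user's attention text with respect to the target original text comprises: according to the stay time on the paragraphs corresponding to the stay operation performed by the user, taking the paragraphs whose stay time is greater than a predetermined time threshold as the attention text.
- The method according to claim 2, wherein the reading mode information includes a browsing mode and a keyword search mode; and wherein determining the user's attention text with respect to the target original text comprises: if the user switches from the browsing mode to the keyword search mode while reading the target original text, taking the paragraphs of the target original text viewed in the keyword search mode as the attention text.
- The method according to claim 3 or 4, wherein, if there are multiple paragraphs whose stay time is greater than the predetermined threshold, or multiple paragraphs of the target original text viewed in the keyword search mode, the method further comprises: determining a preferred attention text from the multiple paragraphs according to the content relevance between them; and wherein extracting the text summary of the target original text comprises: extracting the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
- The method according to claim 1 or 2, wherein the reading operation information further includes the user's historical reading record for the target original text; and wherein determining the user's attention text with respect to the target original text comprises: determining the attention text according to the historical reading record.
- The method according to claim 6, wherein the historical reading record includes at least one of the following: the user's historical reading frequency information for paragraphs of the target original text; the user's historical average reading duration for paragraphs of the target original text.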
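- A digest extraction device for extracting a text summary, wherein the digest extraction device comprises: means for acquiring a user's reading operation information about a target original text; means for determining, according to the reading operation information, the user's attention text with respect to the target original text; and means for extracting a text summary of the target original text according to the attention text, in combination with the content information of the target original text.
- The digest extraction device according to claim 8, wherein the reading operation information includes at least one of the following: a stay operation performed by the user while reading the target original text; a favorite-related operation performed by the user while reading the target original text; reading mode information of the user while reading the target original text; the user's reading speed while reading the target original text.
- The digest extraction device according to claim 9, wherein the reading operation information includes a stay operation performed by the user while reading the target original text; and wherein the means for determining the user's attention text with respect to the target original text is configured to: according to the stay time on the paragraphs corresponding to the stay operation performed by the user, take the paragraphs whose stay time is greater than a predetermined time threshold as the attention text.
- The digest extraction device according to claim 9, wherein the reading mode information includes a browsing mode and a keyword search mode; and wherein the means for determining the user's attention text with respect to the target original text is configured to: if the user switches from the browsing mode to the keyword search mode while reading the target original text, take the paragraphs of the target original text viewed in the keyword search mode as the attention text.
- The digest extraction device according to claim 10 or 11, wherein, if there are multiple paragraphs whose stay time is greater than the predetermined threshold, or multiple paragraphs of the target original text viewed in the keyword search mode, the digest extraction device further comprises: means for determining a preferred attention text from the multiple paragraphs according to the content relevance between them; and wherein the means for extracting the text summary of the target original text is configured to: extract the text summary of the target original text according to the preferred attention text, in combination with the content information of the target original text.
- The digest extraction device according to claim 8 or 9, wherein the reading operation information further includes the user's historical reading record for the target original text; and wherein the means for determining the user's attention text with respect to the target original text is configured to: determine the attention text according to the historical reading record.
- The digest extraction device according to claim 13, wherein the historical reading record includes at least one of the following: the user's historical reading frequency information for paragraphs of the target original text; the user's historical average reading duration for paragraphs of the target original text.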
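- A computer-readable storage medium comprising computer instructions, wherein, when the computer instructions are executed, the method according to any one of claims 1 to 7 is performed.
- A computer program product, wherein, when the computer program product is executed, the method according to any one of claims 1 to 7 is performed.
- A computer device comprising a memory and a processor, wherein the memory stores computer code and the processor is configured to perform the method according to any one of claims 1 to 7 by executing the computer code.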
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510515872.6A CN106469176B (zh) | 2015-08-20 | 2015-08-20 | 一种用于提取文本摘要的方法与设备 |
CN201510515872.6 | 2015-08-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017028407A1 true WO2017028407A1 (zh) | 2017-02-23 |
Family
ID=58051555
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/096931 WO2017028407A1 (zh) | 2015-08-20 | 2015-12-10 | 一种用于提取文本摘要的方法与设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106469176B (zh) |
WO (1) | WO2017028407A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520014A (zh) * | 2018-03-21 | 2018-09-11 | 广东欧珀移动通信有限公司 | 信息分享方法、装置、移动终端和计算机可读介质 |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190109B (zh) * | 2018-07-26 | 2020-09-29 | 中国科学院自动化研究所 | 融合用户信息生成评论摘要的方法及装置 |
CN110085066B (zh) * | 2019-04-17 | 2021-12-21 | 北京小米移动软件有限公司 | 展示阅读信息的方法、装置及电子设备 |
CN114115670A (zh) * | 2021-07-30 | 2022-03-01 | 荣耀终端有限公司 | 提醒生成文本摘要的方法、生成文本摘要的方法及装置 |
CN114722194B (zh) * | 2022-03-15 | 2023-05-09 | 电子科技大学 | 一种基于摘要生成算法的突发事件时间序列自动构建方法 |
CN115248803B (zh) * | 2022-09-22 | 2023-02-17 | 天津联想协同科技有限公司 | 适用于网盘文件的收藏方法、装置、网盘及存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1341899A (zh) * | 2000-09-07 | 2002-03-27 | 国际商业机器公司 | 为文字文档自动生成摘要的方法 |
CN101004737A (zh) * | 2007-01-24 | 2007-07-25 | 贵阳易特软件有限公司 | 基于关键词的个性化文档处理系统 |
CN101567004A (zh) * | 2009-02-06 | 2009-10-28 | 浙江大学 | 基于眼球跟踪的英文文本自动摘要方法 |
US8650483B2 (en) * | 2003-10-22 | 2014-02-11 | Shi Xia Liu | Method and apparatus for improving the readability of an automatically machine-generated summary |
CN104503958A (zh) * | 2014-11-19 | 2015-04-08 | 百度在线网络技术(北京)有限公司 | 文档摘要的生成方法及装置 |
CN104636465A (zh) * | 2015-02-10 | 2015-05-20 | 百度在线网络技术(北京)有限公司 | 网页摘要生成方法、展示方法及相应装置 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254014B (zh) * | 2011-07-21 | 2013-06-05 | 华中科技大学 | 一种网页特征自适应的信息抽取方法 |
CN103838792A (zh) * | 2012-11-27 | 2014-06-04 | 大连灵动科技发展有限公司 | 一种网页主题确定的方法 |
CN103793481B (zh) * | 2014-01-16 | 2017-02-15 | 中国科学院软件研究所 | 基于用户兴趣挖掘的微博词云生成方法及访问支持系统 |
CN103885935B (zh) * | 2014-03-12 | 2016-06-29 | 浙江大学 | 基于图书阅读行为的图书章节摘要生成方法 |
CN104090929A (zh) * | 2014-06-23 | 2014-10-08 | 吕志雪 | 一种个性化图片推荐方法及装置 |
2015
- 2015-08-20: CN application CN201510515872.6A, patent CN106469176B (status: Active)
- 2015-12-10: WO application PCT/CN2015/096931, publication WO2017028407A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN106469176B (zh) | 2019-08-16 |
CN106469176A (zh) | 2017-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017028407A1 (zh) | 一种用于提取文本摘要的方法与设备 | |
US10122839B1 (en) | Techniques for enhancing content on a mobile device | |
US9524714B2 (en) | Speech recognition apparatus and method thereof | |
US9886430B2 (en) | Entity based content selection | |
CN102024064B (zh) | 快速搜索方法和移动通信终端 | |
US10380120B2 (en) | Automatic discovery and presentation of topic summaries related to a selection of text | |
US10169467B2 (en) | Query formulation via task continuum | |
CN108846091B (zh) | 资讯推荐方法、装置及设备 | |
CN104899220B (zh) | 应用程序推荐方法和系统 | |
US10585923B2 (en) | Generating search keyword suggestions from recently used application | |
JP2018504727A (ja) | 参考文書の推薦方法及び装置 | |
JP6956119B2 (ja) | 文脈情報を提供するためのシステムおよび方法 | |
US9690757B2 (en) | Method of and system for processing content of a web resource in a browser application | |
US20140359413A1 (en) | Apparatuses and methods for webpage content processing | |
CN105094603B (zh) | 一种关联输入的方法与装置 | |
CN104281644A (zh) | 文件名信息的显示方法和装置 | |
CN105095253B (zh) | 网页显示方法及装置 | |
CN107291772B (zh) | 一种搜索访问方法、装置及电子设备 | |
WO2016078480A1 (zh) | 一种用于提供时效性图片搜索结果的方法与设备 | |
WO2018018882A1 (zh) | 一种语音播报方法及装置 | |
RU2654789C2 (ru) | Способ (варианты) и электронное устройство (варианты) обработки речевого запроса пользователя | |
RU2631975C2 (ru) | Способ и система для обработки входных команд пользователя | |
US20150261857A1 (en) | Method And Device For Accessing Websites Via Keywords | |
CN113033163B (zh) | 一种数据处理方法、装置和电子设备 | |
RU2632126C1 (ru) | Способ и система предоставления контекстуальной информации |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15901610 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15901610 Country of ref document: EP Kind code of ref document: A1 |