CN110162778A - Method and device for generating a text summary - Google Patents

Method and device for generating a text summary

Info

Publication number
CN110162778A
Authority
CN
China
Prior art keywords
sentence
closeness
text
designated topic
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910263357.1A
Other languages
Chinese (zh)
Other versions
CN110162778B (en)
Inventor
赵智源
周书恒
郭亚
黄同同
祝慧佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910263357.1A priority Critical patent/CN110162778B/en
Publication of CN110162778A publication Critical patent/CN110162778A/en
Application granted granted Critical
Publication of CN110162778B publication Critical patent/CN110162778B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

One or more embodiments of this specification disclose a method and device for generating a text summary, so as to extract, simply and quickly, a text summary related to a personalized topic. The method includes: obtaining multiple sentences in a target text; predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences; correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic; and processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.

Description

Method and device for generating a text summary
Technical field
This specification relates to the technical field of text processing, and in particular to a method and device for generating a text summary.
Background technique
In the field of summary generation for long articles, there are two mainstream ways of automatically generating a text summary. One is extraction based on TextRank, in which the computer reads through the original text and finds one or several sentences that are closest to its central idea. The other is a generative approach based on deep neural networks, which usually requires a large amount of corpus annotation and is therefore costly.
In summary generation scenarios, besides conventional summaries (which can usually be generated with TextRank), some scenarios also require summaries related to a given topic, for example personalized summary generation for users, or risk-oriented summary generation for content auditing. In such cases, because TextRank mainly extracts sentences that represent the content as a whole, it cannot extract a summary related to a target topic. Although the deep-neural-network-based approach can solve this problem through personalized annotation, the annotation cost for summarization is high and such annotation often cannot be obtained conveniently and quickly. It can be seen that existing summary generation methods cannot be applied well to the generation of topic-related summaries.
Summary of the invention
The purpose of one or more embodiments of this specification is to provide a method and device for generating a text summary, so as to extract, simply and quickly, a text summary related to a personalized topic.
In order to solve the above technical problem, one or more embodiments of this specification are implemented as follows:
In one aspect, one or more embodiments of this specification provide a method for generating a text summary, comprising:
obtaining multiple sentences in a target text;
predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences;
correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
In one embodiment, predicting the first closeness between each sentence and the designated topic according to the preset topic classification method comprises:
determining a topic word corresponding to the designated topic;
analyzing the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
predicting the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
In one embodiment, the specified algorithm is the PageRank algorithm;
correspondingly, correcting the first closeness at least once using the specified algorithm according to the similarity between the sentences comprises:
taking each sentence as a node, and creating a relationship network graph between the nodes according to the similarity between the sentences;
determining each first closeness as the initial weight corresponding to the respective sentence;
iterating the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
In one embodiment, creating the relationship network graph between the nodes according to the similarity between the sentences comprises:
when the similarity between two sentences reaches a preset threshold, determining that there is an edge between the two nodes corresponding to the two sentences;
creating the relationship network graph between the nodes according to the edges between the nodes.
In one embodiment, obtaining the multiple sentences in the target text comprises:
splitting the target text according to specified punctuation marks, to obtain the multiple sentences.
In one embodiment, processing the sentences according to the second closeness to obtain the text summary of the target text that is related to the designated topic comprises:
sorting and concatenating the sentences in descending order of the second closeness, to obtain an ordered text;
determining the ordered text as the text summary of the target text that is related to the designated topic.
In one embodiment, sorting and concatenating the sentences in descending order of the second closeness comprises:
selecting first sentences whose corresponding second closeness reaches a second preset threshold;
sorting and concatenating the first sentences in descending order of their corresponding second closeness.
In another aspect, one or more embodiments of this specification provide a device for generating a text summary, comprising:
an obtaining module, configured to obtain multiple sentences in a target text;
a prediction and determination module, configured to predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and to determine the similarity between the sentences;
a correction module, configured to correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
a processing module, configured to process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
In one embodiment, the prediction and determination module comprises:
a first determination unit, configured to determine a topic word corresponding to the designated topic;
an analysis unit, configured to analyze the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
a prediction unit, configured to predict the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
In one embodiment, the specified algorithm is the PageRank algorithm;
correspondingly, the correction module comprises:
a creation unit, configured to take each sentence as a node and create a relationship network graph between the nodes according to the similarity between the sentences;
a second determination unit, configured to determine each first closeness as the initial weight corresponding to the respective sentence;
an iteration unit, configured to iterate the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
In one embodiment, the creation unit is further configured to:
when the similarity between two sentences reaches a preset threshold, determine that there is an edge between the two nodes corresponding to the two sentences;
create the relationship network graph between the nodes according to the edges between the nodes.
In one embodiment, the obtaining module comprises:
a splitting unit, configured to split the target text according to specified punctuation marks, to obtain the multiple sentences.
In one embodiment, the processing module comprises:
a sorting and concatenation unit, configured to sort and concatenate the sentences in descending order of the second closeness, to obtain an ordered text;
a third determination unit, configured to determine the ordered text as the text summary of the target text that is related to the designated topic.
In one embodiment, the sorting and concatenation unit is further configured to:
select first sentences whose corresponding second closeness reaches a second preset threshold;
sort and concatenate the first sentences in descending order of their corresponding second closeness.
In yet another aspect, one or more embodiments of this specification provide an apparatus for generating a text summary, comprising:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to:
obtain multiple sentences in a target text;
predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determine the similarity between the sentences;
correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
In yet another aspect, an embodiment of the present application provides a storage medium for storing computer-executable instructions which, when executed, implement the following process:
obtaining multiple sentences in a target text;
predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences;
correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
With the technical solution of one or more embodiments of this specification, multiple sentences in a target text are obtained, and the first closeness between each sentence and a designated topic is predicted according to a preset topic classification method, that is, an initial closeness (which may also be called a rough closeness) between each sentence and the designated topic is obtained. The first closeness is then corrected using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic, which is a more accurate closeness. The sentences are then processed according to the second closeness to obtain a text summary related to the designated topic. Because the text summary related to the designated topic is obtained based on the closeness between each sentence of the target text and the designated topic, the resulting summary is necessarily highly consistent with the designated topic, which achieves the goal of generating personalized text summaries: summaries with different topic preferences can be extracted from the same text. In addition, the technical solution does not require additional annotation of the text and belongs to unsupervised algorithms, thereby saving a large annotation cost.
Brief description of the drawings
In order to explain more clearly the technical solutions in one or more embodiments of this specification or in the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in one or more embodiments of this specification, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a method for generating a text summary according to an embodiment of this specification;
Fig. 2 is a relationship network graph between sentence nodes according to an embodiment of this specification;
Fig. 3 is a schematic block diagram of a device for generating a text summary according to an embodiment of this specification;
Fig. 4 is a schematic block diagram of an apparatus for generating a text summary according to an embodiment of this specification.
Detailed description of the embodiments
One or more embodiments of this specification provide a method and device for generating a text summary, so as to extract, simply and quickly, a text summary related to a personalized topic.
In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, the technical solutions in one or more embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this specification. Based on one or more embodiments of this specification, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of protection of one or more embodiments of this specification.
Fig. 1 is a schematic flowchart of a method for generating a text summary according to an embodiment of this specification. As shown in Fig. 1, the method comprises:
S102: obtain multiple sentences in a target text.
S104: predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determine the similarity between the sentences.
S106: correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic.
S108: process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
With the technical solution of one or more embodiments of this specification, multiple sentences in a target text are obtained, and the first closeness between each sentence and a designated topic is predicted according to a preset topic classification method, that is, an initial closeness (which may also be called a rough closeness) between each sentence and the designated topic is obtained. The first closeness is then corrected using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic, which is a more accurate closeness. The sentences are then processed according to the second closeness to obtain a text summary related to the designated topic. Because the text summary related to the designated topic is obtained based on the closeness between each sentence of the target text and the designated topic, the resulting summary is necessarily highly consistent with the designated topic, which achieves the goal of generating personalized text summaries: summaries with different topic preferences can be extracted from the same text. In addition, the technical solution does not require additional annotation of the text and belongs to unsupervised algorithms, thereby saving a large annotation cost.
The method for generating a text summary provided by the above embodiment is described in detail below.
First, the multiple sentences in the target text are obtained. In this step, the target text can be split according to specified punctuation marks to obtain the multiple sentences. For example, if the specified punctuation marks include the full stop "." and the semicolon ";", then splitting the target text at every "." and ";" in the text yields the multiple sentences.
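As an illustration only (not part of the original specification), the splitting step might look like the following Python sketch; the set of punctuation marks, covering both ASCII and Chinese full stops and semicolons, is an assumed default:

```python
import re

def split_sentences(text, marks=".;。；"):
    """Split a target text into sentences at the specified punctuation marks.

    `marks` is an illustrative default; the method only requires splitting at
    whatever punctuation marks are specified.
    """
    pattern = "[" + re.escape(marks) + "]"
    # Discard empty fragments left by trailing punctuation or blank segments.
    return [s.strip() for s in re.split(pattern, text) if s.strip()]
```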
After the multiple sentences in the target text are obtained, the first closeness between each sentence and the designated topic can be predicted according to the preset topic classification method. The topic classification method may include a topic model or topic classification rules. The designated topic can be any topic involved in the text content of the target text, for example sports, food, or other categories of topics.
A topic model is a statistical model that clusters the implicit semantic structure of a corpus in an unsupervised manner. It can perform semantic analysis and text mining on each sentence in the text, for example collecting, classifying and reducing the dimensionality of the text by topic.
In one embodiment, any existing topic model can be used to predict the first closeness between each sentence and the designated topic. For example, if the designated topic is sports, then when the target text is analyzed with a topic model, sports-related information in each sentence, such as basketball, swimming, running and fitness, can be identified, and the first closeness between each sentence and the designated topic is then predicted from the identified information. The higher the frequency of sports-related words in a sentence, the higher the first closeness between that sentence and the designated topic.
In another embodiment, the first closeness between a sentence and the designated topic is predicted using preset topic classification rules. The topic classification rules may work as follows:
First, the topic words corresponding to the designated topic are determined.
The topic words may include keywords related to the designated topic. For example, if the designated topic is sports, the topic words may include the keywords "sports", "exercise", and so on. The topic words can be set according to the actual needs of the user: if a text summary related to the broad category "sports" is desired, the topic words can be set to include "sports" and "exercise"; if a text summary covering all sub-topics related to "sports" is desired, more refined topic words can be set, for example "sports", "basketball", "badminton", "swimming", and so on.
Second, the sentence is analyzed to determine the number of occurrences of the topic words and/or related words of the topic words in the sentence.
The related words of a topic word may include its near-synonyms. For example, if the topic word is "cuisine", then when a sentence is analyzed, words with similar meanings contained in the sentence, such as "cuisine" and "food", can be mined.
Third, the first closeness between the sentence and the designated topic is predicted according to the number of occurrences of the topic words and/or their related words in the sentence; this number is positively correlated with the first closeness. That is, the more occurrences of the topic words and/or their related words a sentence contains, the higher the first closeness between that sentence and the designated topic.
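A minimal sketch of this rule is shown below, assuming the topic words and related words are given as plain strings and that simple substring counting stands in for proper tokenization; normalizing the count by sentence length is also an illustrative choice rather than part of the method:

```python
def first_closeness(sentence, topic_words, related_words=()):
    """Predict the first closeness between a sentence and the designated topic.

    The count of topic words and/or their related words in the sentence is
    positively correlated with the first closeness.
    """
    vocabulary = list(topic_words) + list(related_words)
    count = sum(sentence.count(word) for word in vocabulary)
    # Length normalization keeps long sentences from dominating (assumption).
    return count / max(len(sentence), 1)
```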
Then the similarity between the sentences is determined. In one embodiment, any existing similarity algorithm can be used to calculate the similarity between the sentences. For example, the Levenshtein distance (i.e., the string edit distance) between two sentences can be used to measure the similarity between them.
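For illustration, the sketch below computes the Levenshtein distance with the classic dynamic-programming recurrence and converts it into a similarity in [0, 1] by normalizing with the longer sentence's length; the normalization is an assumption, since the specification only names the Levenshtein distance as one possible similarity measure:

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert/delete/substitute, cost 1)."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]

def sentence_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the two sentences are identical."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest
```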
After the similarity between the sentences is determined, the predicted first closeness is corrected using the specified algorithm.
In one embodiment, the specified algorithm is the PageRank algorithm. Based on this, the first closeness can be corrected at least once in the following way:
First, taking each sentence as a node, a relationship network graph between the nodes is created according to the similarity between the sentences.
In this step, when creating the relationship network graph, whether an edge needs to be established between sentence nodes can be decided according to the similarity between the sentences. Specifically, when the similarity between two sentences reaches a preset threshold, it is determined that there is an edge between the two nodes corresponding to those two sentences; the relationship network graph between the sentence nodes is then created from the edges between the sentence nodes.
Fig. 2 shows the relationship network graph between sentence nodes in a specific embodiment. As shown in Fig. 2, the relationship network graph contains three sentence nodes A, B and C. Because the similarity between sentence A and sentence B, and between sentence A and sentence C, reaches the preset threshold, there is an edge between sentence node A and sentence node B and an edge between sentence node A and sentence node C; because the similarity between sentence B and sentence C does not reach the preset threshold, there is no edge between sentence node B and sentence node C.
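Following Fig. 2, such a graph can be represented as an adjacency matrix with an edge recorded only where the pairwise similarity reaches the preset threshold. The sketch below reuses the `sentence_similarity` helper from the previous example; the threshold value is purely illustrative:

```python
def build_relationship_graph(sentences, threshold=0.3):
    """Adjacency matrix of the sentence graph: 1.0 where similarity >= threshold."""
    n = len(sentences)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sentence_similarity(sentences[i], sentences[j]) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1.0  # undirected edge
    return adjacency
```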
Second, each first closeness is determined as the initial weight corresponding to the respective sentence.
Third, according to the relationship network graph, the initial weights are iterated at least once using the PageRank algorithm, to obtain the final weight corresponding to each sentence.
In this step, the iteration can be performed by multiplying the weight vector by an iteration matrix, as in equation (1) below, where A is the iteration matrix and P_n is the weight vector obtained at the n-th iteration. The iteration can be stopped when the change in the weight of each sentence between consecutive iterations is less than a preset threshold; the weights obtained at that point are the final weights corresponding to the sentences.
P_{n+1} = A · P_n    (1)
The final weight corresponding to each sentence is the second closeness between that sentence and the designated topic.
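The sketch below illustrates one way the correction could be carried out: the adjacency matrix is column-normalized to serve as the iteration matrix A, the first closenesses are used as the initial weight vector, and equation (1) is applied until the per-sentence change falls below a tolerance. The normalization, the stopping tolerance and the iteration cap are assumptions; a damping factor, as in standard PageRank/TextRank, could equally be added.

```python
def correct_closeness(adjacency, first_closeness, tol=1e-6, max_iter=100):
    """Iterate P_{n+1} = A * P_n (equation (1)) starting from the first closenesses.

    Returns the final weight of each sentence, i.e. its second closeness
    to the designated topic.
    """
    n = len(first_closeness)
    # Column-normalize the adjacency matrix; a zero column sum is replaced by
    # 1.0 to avoid division by zero for isolated nodes.
    column_sums = [sum(adjacency[i][j] for i in range(n)) or 1.0 for j in range(n)]
    a = [[adjacency[i][j] / column_sums[j] for j in range(n)] for i in range(n)]
    weights = list(first_closeness)
    for _ in range(max_iter):
        new_weights = [sum(a[i][j] * weights[j] for j in range(n)) for i in range(n)]
        converged = max(abs(new_weights[i] - weights[i]) for i in range(n)) < tol
        weights = new_weights
        if converged:
            break
    return weights
```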
After the second closeness between each sentence and the designated topic is obtained, the sentences can be processed according to the second closeness to obtain the text summary of the target text that is related to the designated topic.
In one embodiment, the sentences can be sorted and concatenated in descending order of the second closeness to obtain an ordered text, and the ordered text is then determined as the text summary of the target text that is related to the designated topic.
In one embodiment, the first sentences whose second closeness reaches a second preset threshold can first be selected, and these first sentences are then sorted and concatenated in descending order of their corresponding second closeness, to obtain the text summary of the target text that is related to the designated topic.
In this embodiment, by selecting only the sentences whose second closeness reaches a certain threshold and then sorting and concatenating them, the closeness between the resulting text summary and the designated topic can be made even higher.
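Putting the steps together, the sketch below selects the sentences whose second closeness reaches the second threshold, sorts them in descending order of that closeness and concatenates them into the summary; the threshold value and the joining punctuation are assumptions made for illustration, as is the end-to-end usage shown in the comments.

```python
def build_summary(sentences, second_closeness, second_threshold=0.1, sep="。"):
    """Select, sort and concatenate sentences into the topic-related summary."""
    selected = [(score, s) for s, score in zip(sentences, second_closeness)
                if score >= second_threshold]
    selected.sort(key=lambda pair: pair[0], reverse=True)  # descending second closeness
    return sep.join(s for _, s in selected)

# Illustrative end-to-end use (topic words chosen as an example):
# sentences = split_sentences(target_text)
# first = [first_closeness(s, ["sports", "exercise"]) for s in sentences]
# adjacency = build_relationship_graph(sentences)
# second = correct_closeness(adjacency, first)
# summary = build_summary(sentences, second)
```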
In summary, specific embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve desired results. In some embodiments, multitasking and parallel processing may be advantageous.
The above is the method for generating a text summary provided by one or more embodiments of this specification. Based on the same idea, one or more embodiments of this specification also provide a device for generating a text summary.
Fig. 3 is a schematic block diagram of a device for generating a text summary according to an embodiment of this specification. As shown in Fig. 3, the device 300 for generating a text summary comprises:
an obtaining module 310, configured to obtain multiple sentences in a target text;
a prediction and determination module 320, configured to predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and to determine the similarity between the sentences;
a correction module 330, configured to correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
a processing module 340, configured to process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
In one embodiment, the prediction and determination module 320 comprises:
a first determination unit, configured to determine a topic word corresponding to the designated topic;
an analysis unit, configured to analyze the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
a prediction unit, configured to predict the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
In one embodiment, the specified algorithm is the PageRank algorithm;
correspondingly, the correction module 330 comprises:
a creation unit, configured to take each sentence as a node and create a relationship network graph between the nodes according to the similarity between the sentences;
a second determination unit, configured to determine each first closeness as the initial weight corresponding to the respective sentence;
an iteration unit, configured to iterate the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
In one embodiment, the creation unit is further configured to:
when the similarity between two sentences reaches a preset threshold, determine that there is an edge between the two nodes corresponding to the two sentences;
create the relationship network graph between the nodes according to the edges between the nodes.
In one embodiment, the obtaining module 310 comprises:
a splitting unit, configured to split the target text according to specified punctuation marks, to obtain the multiple sentences.
In one embodiment, the processing module 340 comprises:
a sorting and concatenation unit, configured to sort and concatenate the sentences in descending order of the second closeness, to obtain an ordered text;
a third determination unit, configured to determine the ordered text as the text summary of the target text that is related to the designated topic.
In one embodiment, the sorting and concatenation unit is further configured to:
select first sentences whose corresponding second closeness reaches a second preset threshold;
sort and concatenate the first sentences in descending order of their corresponding second closeness.
With the device of one or more embodiments of this specification, multiple sentences in a target text are obtained, and the first closeness between each sentence and a designated topic is predicted according to a preset topic classification method, that is, an initial closeness (which may also be called a rough closeness) between each sentence and the designated topic is obtained. The first closeness is then corrected using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic, which is a more accurate closeness. The sentences are then processed according to the second closeness to obtain a text summary related to the designated topic. Because the text summary related to the designated topic is obtained based on the closeness between each sentence of the target text and the designated topic, the resulting summary is necessarily highly consistent with the designated topic, which achieves the goal of generating personalized text summaries: summaries with different topic preferences can be extracted from the same text. In addition, the technical solution does not require additional annotation of the text and belongs to unsupervised algorithms, thereby saving a large annotation cost.
It should be understood that the above device for generating a text summary can implement the method for generating a text summary described above; the detailed description thereof is similar to that of the method above and, to avoid repetition, is not repeated here.
Based on the same idea, one or more embodiments of this specification also provide an apparatus for generating a text summary, as shown in Fig. 4. The apparatus for generating a text summary may vary considerably depending on its configuration or performance, and may include one or more processors 401 and a memory 402, in which one or more application programs or data may be stored. The memory 402 may be transient storage or persistent storage. An application program stored in the memory 402 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the apparatus for generating a text summary. Further, the processor 401 may be configured to communicate with the memory 402 and to execute, on the apparatus for generating a text summary, the series of computer-executable instructions in the memory 402. The apparatus for generating a text summary may also include one or more power supplies 403, one or more wired or wireless network interfaces 404, one or more input/output interfaces 405, and one or more keyboards 406.
Specifically, in this embodiment, the apparatus for generating a text summary includes a memory and one or more programs, wherein the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions for the apparatus for generating a text summary, and the one or more processors are configured to execute the one or more programs, which include computer-executable instructions for:
obtaining multiple sentences in a target text;
predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences;
correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
Optionally, the computer-executable instructions, when executed, may further cause the processor to:
determine a topic word corresponding to the designated topic;
analyze the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
predict the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
Optionally, the specified algorithm is the PageRank algorithm;
correspondingly, the computer-executable instructions, when executed, may further cause the processor to:
take each sentence as a node and create a relationship network graph between the nodes according to the similarity between the sentences;
determine each first closeness as the initial weight corresponding to the respective sentence;
iterate the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
Optionally, the computer-executable instructions, when executed, may further cause the processor to:
when the similarity between two sentences reaches a preset threshold, determine that there is an edge between the two nodes corresponding to the two sentences;
create the relationship network graph between the nodes according to the edges between the nodes.
Optionally, the computer-executable instructions, when executed, may further cause the processor to:
split the target text according to specified punctuation marks, to obtain the multiple sentences.
Optionally, the computer-executable instructions, when executed, may further cause the processor to:
sort and concatenate the sentences in descending order of the second closeness, to obtain an ordered text;
determine the ordered text as the text summary of the target text that is related to the designated topic.
Optionally, the computer-executable instructions, when executed, may further cause the processor to:
select first sentences whose corresponding second closeness reaches a second preset threshold;
sort and concatenate the first sentences in descending order of their corresponding second closeness.
One or more embodiments of this specification also provide a computer-readable storage medium storing one or more programs that include instructions. When the instructions are executed by an electronic device that includes multiple application programs, the electronic device can perform the above method for generating a text summary, and is specifically configured to:
obtain multiple sentences in a target text;
predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determine the similarity between the sentences;
correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
The system, device, module or unit described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an electronic mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described by dividing its functions into various units. Of course, when implementing one or more embodiments of this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, a system or a computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
One or more embodiments of this specification are described with reference to flowcharts and/or block diagrams of the method, apparatus (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPU), an input/output interface, a network interface and a memory.
The memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements not only includes those elements, but also includes other elements not explicitly listed, or further includes elements inherent to such a process, method, commodity or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity or device that includes the element.
One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described relatively simply because it is substantially similar to the method embodiment, and for relevant parts, reference may be made to the description of the method embodiment.
The above descriptions are merely one or more embodiments of this specification and are not intended to limit this specification. For those skilled in the art, one or more embodiments of this specification may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of one or more embodiments of this specification shall be included within the scope of the claims of one or more embodiments of this specification.

Claims (16)

1. A method for generating a text summary, comprising:
obtaining multiple sentences in a target text;
predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences;
correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
2. The method according to claim 1, wherein predicting the first closeness between each sentence and the designated topic according to the preset topic classification method comprises:
determining a topic word corresponding to the designated topic;
analyzing the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
predicting the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
3. The method according to claim 1, wherein the specified algorithm is the PageRank algorithm;
correspondingly, correcting the first closeness at least once using the specified algorithm according to the similarity between the sentences comprises:
taking each sentence as a node, and creating a relationship network graph between the nodes according to the similarity between the sentences;
determining each first closeness as the initial weight corresponding to the respective sentence;
iterating the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
4. The method according to claim 3, wherein creating the relationship network graph between the nodes according to the similarity between the sentences comprises:
when the similarity between two sentences reaches a preset threshold, determining that there is an edge between the two nodes corresponding to the two sentences;
creating the relationship network graph between the nodes according to the edges between the nodes.
5. The method according to claim 1, wherein obtaining the multiple sentences in the target text comprises:
splitting the target text according to specified punctuation marks, to obtain the multiple sentences.
6. The method according to claim 1, wherein processing the sentences according to the second closeness to obtain the text summary of the target text that is related to the designated topic comprises:
sorting and concatenating the sentences in descending order of the second closeness, to obtain an ordered text;
determining the ordered text as the text summary of the target text that is related to the designated topic.
7. The method according to claim 6, wherein sorting and concatenating the sentences in descending order of the second closeness comprises:
selecting first sentences whose corresponding second closeness reaches a second preset threshold;
sorting and concatenating the first sentences in descending order of their corresponding second closeness.
8. A device for generating a text summary, comprising:
an obtaining module, configured to obtain multiple sentences in a target text;
a prediction and determination module, configured to predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and to determine the similarity between the sentences;
a correction module, configured to correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
a processing module, configured to process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
9. The device according to claim 8, wherein the prediction and determination module comprises:
a first determination unit, configured to determine a topic word corresponding to the designated topic;
an analysis unit, configured to analyze the sentence to determine the number of occurrences of the topic word and/or related words of the topic word in the sentence;
a prediction unit, configured to predict the first closeness between the sentence and the designated topic according to the number; wherein the number is positively correlated with the first closeness.
10. The device according to claim 8, wherein the specified algorithm is the PageRank algorithm;
correspondingly, the correction module comprises:
a creation unit, configured to take each sentence as a node and create a relationship network graph between the nodes according to the similarity between the sentences;
a second determination unit, configured to determine each first closeness as the initial weight corresponding to the respective sentence;
an iteration unit, configured to iterate the initial weights at least once using the PageRank algorithm according to the relationship network graph, to obtain a final weight corresponding to each sentence.
11. The device according to claim 10, wherein the creation unit is further configured to:
when the similarity between two sentences reaches a preset threshold, determine that there is an edge between the two nodes corresponding to the two sentences;
create the relationship network graph between the nodes according to the edges between the nodes.
12. The device according to claim 8, wherein the obtaining module comprises:
a splitting unit, configured to split the target text according to specified punctuation marks, to obtain the multiple sentences.
13. The device according to claim 8, wherein the processing module comprises:
a sorting and concatenation unit, configured to sort and concatenate the sentences in descending order of the second closeness, to obtain an ordered text;
a third determination unit, configured to determine the ordered text as the text summary of the target text that is related to the designated topic.
14. The device according to claim 13, wherein the sorting and concatenation unit is further configured to:
select first sentences whose corresponding second closeness reaches a second preset threshold;
sort and concatenate the first sentences in descending order of their corresponding second closeness.
15. An apparatus for generating a text summary, comprising:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to:
obtain multiple sentences in a target text;
predict, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determine the similarity between the sentences;
correct the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
process the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
16. A storage medium for storing computer-executable instructions which, when executed, implement the following process:
obtaining multiple sentences in a target text;
predicting, according to a preset topic classification method, a first closeness between each sentence and a designated topic, and determining the similarity between the sentences;
correcting the first closeness at least once using a specified algorithm according to the similarity between the sentences, to obtain a second closeness between each sentence and the designated topic;
processing the sentences according to the second closeness, to obtain a text summary of the target text that is related to the designated topic.
CN201910263357.1A 2019-04-02 2019-04-02 Text abstract generation method and device Active CN110162778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910263357.1A CN110162778B (en) 2019-04-02 2019-04-02 Text abstract generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910263357.1A CN110162778B (en) 2019-04-02 2019-04-02 Text abstract generation method and device

Publications (2)

Publication Number Publication Date
CN110162778A true CN110162778A (en) 2019-08-23
CN110162778B CN110162778B (en) 2023-05-26

Family

ID=67638967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910263357.1A Active CN110162778B (en) 2019-04-02 2019-04-02 Text abstract generation method and device

Country Status (1)

Country Link
CN (1) CN110162778B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704607A (en) * 2019-08-26 2020-01-17 北京三快在线科技有限公司 Abstract generation method and device, electronic equipment and computer readable storage medium
CN111046672A (en) * 2019-12-11 2020-04-21 山东众阳健康科技集团有限公司 Multi-scene text abstract generation method
CN111723196A (en) * 2020-05-21 2020-09-29 西北工业大学 Single document abstract generation model construction method and device based on multi-task learning
CN112364155A (en) * 2020-11-20 2021-02-12 北京五八信息技术有限公司 Information processing method and device
CN113836296A (en) * 2021-09-28 2021-12-24 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating Buddhist question-answer abstract
CN114627581A (en) * 2022-05-16 2022-06-14 深圳零匙科技有限公司 Coerced fingerprint linkage alarm method and system for intelligent door lock

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998041930A1 (en) * 1997-03-18 1998-09-24 Siemens Aktiengesellschaft Method for automatically generating a summarized text by a computer
US20070282597A1 (en) * 2006-06-02 2007-12-06 Samsung Electronics Co., Ltd. Data summarization method and apparatus
CN104156452A (en) * 2014-08-18 2014-11-19 中国人民解放军国防科学技术大学 Method and device for generating webpage text summarization
CN106294863A (en) * 2016-08-23 2017-01-04 电子科技大学 A kind of abstract method for mass text fast understanding
CN106599148A (en) * 2016-12-02 2017-04-26 东软集团股份有限公司 Method and device for generating abstract
US20170147691A1 (en) * 2015-11-20 2017-05-25 Guangzhou Shenma Mobile Information Technology Co. Ltd. Method and apparatus for extracting topic sentences of webpages
CN107133213A (en) * 2017-05-06 2017-09-05 广东药科大学 A kind of text snippet extraction method and system based on algorithm
CN108062351A (en) * 2017-11-14 2018-05-22 厦门市美亚柏科信息股份有限公司 Text snippet extracting method, readable storage medium storing program for executing on particular topic classification
CN108090049A (en) * 2018-01-17 2018-05-29 山东工商学院 Multi-document summary extraction method and system based on sentence vector
US20180349360A1 (en) * 2017-01-05 2018-12-06 Social Networking Technology, Inc. Systems and methods for automatically generating news article
CN109101489A (en) * 2018-07-18 2018-12-28 武汉数博科技有限责任公司 A kind of text automatic abstracting method, device and a kind of electronic equipment

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998041930A1 (en) * 1997-03-18 1998-09-24 Siemens Aktiengesellschaft Method for automatically generating a summarized text by a computer
US20070282597A1 (en) * 2006-06-02 2007-12-06 Samsung Electronics Co., Ltd. Data summarization method and apparatus
CN104156452A (en) * 2014-08-18 2014-11-19 中国人民解放军国防科学技术大学 Method and device for generating webpage text summarization
US20170147691A1 (en) * 2015-11-20 2017-05-25 Guangzhou Shenma Mobile Information Technology Co. Ltd. Method and apparatus for extracting topic sentences of webpages
CN106294863A (en) * 2016-08-23 2017-01-04 电子科技大学 A kind of abstract method for mass text fast understanding
CN106599148A (en) * 2016-12-02 2017-04-26 东软集团股份有限公司 Method and device for generating abstract
US20180349360A1 (en) * 2017-01-05 2018-12-06 Social Networking Technology, Inc. Systems and methods for automatically generating news article
CN107133213A (en) * 2017-05-06 2017-09-05 广东药科大学 A kind of text snippet extraction method and system based on algorithm
CN108062351A (en) * 2017-11-14 2018-05-22 厦门市美亚柏科信息股份有限公司 Text snippet extracting method, readable storage medium storing program for executing on particular topic classification
CN108090049A (en) * 2018-01-17 2018-05-29 山东工商学院 Multi-document summary extraction method and system based on sentence vector
CN109101489A (en) * 2018-07-18 2018-12-28 武汉数博科技有限责任公司 A kind of text automatic abstracting method, device and a kind of electronic equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704607A (en) * 2019-08-26 2020-01-17 北京三快在线科技有限公司 Abstract generation method and device, electronic equipment and computer readable storage medium
CN111046672A (en) * 2019-12-11 2020-04-21 山东众阳健康科技集团有限公司 Multi-scene text abstract generation method
CN111723196A (en) * 2020-05-21 2020-09-29 西北工业大学 Single document abstract generation model construction method and device based on multi-task learning
CN112364155A (en) * 2020-11-20 2021-02-12 北京五八信息技术有限公司 Information processing method and device
CN112364155B (en) * 2020-11-20 2024-05-31 北京五八信息技术有限公司 Information processing method and device
CN113836296A (en) * 2021-09-28 2021-12-24 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating Buddhist question-answer abstract
CN114627581A (en) * 2022-05-16 2022-06-14 深圳零匙科技有限公司 Coerced fingerprint linkage alarm method and system for intelligent door lock

Also Published As

Publication number Publication date
CN110162778B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN110162778A (en) The generation method and device of text snippet
US11423082B2 (en) Methods and apparatus for subgraph matching in big data analysis
CN104778158B (en) A kind of document representation method and device
US9542477B2 (en) Method of automated discovery of topics relatedness
US10409828B2 (en) Methods and apparatus for incremental frequent subgraph mining on dynamic graphs
US20180018392A1 (en) Topic identification based on functional summarization
CN108875743A (en) A kind of text recognition method and device
CN108733694B (en) Retrieval recommendation method and device
CN109271624A (en) A kind of target word determines method, apparatus and storage medium
CN109597982A (en) Summary texts recognition methods and device
CN109726386B (en) Word vector model generation method, device and computer readable storage medium
CN113821657A (en) Artificial intelligence-based image processing model training method and image processing method
CN107861950A (en) The detection method and device of abnormal text
CN105786929B (en) A kind of information monitoring method and device
CN111259975A (en) Method and device for generating classifier and method and device for classifying text
CN113127636B (en) Text clustering cluster center point selection method and device
CN111625615B (en) Method and system for processing text data
Bause et al. Metric indexing for graph similarity search
Lindawati et al. Automated parameter tuning framework for heterogeneous and large instances: Case study in quadratic assignment problem
CN107766373A (en) The determination method and its system of the affiliated classification of picture
CN113010642A (en) Semantic relation recognition method and device, electronic equipment and readable storage medium
CN113407714B (en) Aging-based data processing method and device, electronic equipment and storage medium
CN106776529B (en) Business emotion analysis method and device
CN112632981A (en) New word discovery method and device
CN110245265A (en) A kind of object classification method, device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Applicant before: Advanced innovation technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant