CN109408826A

CN109408826A - Text information extraction method, device, server and storage medium

Info

Publication number: CN109408826A
Application number: CN201811317522.9A
Authority: CN
Inventors: 谢永恒; 段小文; 万月亮
Original assignee: Beijing Ruian Technology Co Ltd
Current assignee: Beijing Ruian Technology Co Ltd
Priority date: 2018-11-07
Filing date: 2018-11-07
Publication date: 2019-03-01

Abstract

Embodiments of the present invention provide a text information extraction method, device, server and storage medium. The method comprises: determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors; taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph; determining candidate word weights from the candidate word graph by means of the TextRank algorithm; and determining the keywords of the text according to the candidate word weights. Converting candidate words into word vectors with the Word2Vec model allows each candidate word to be represented by a low-dimensional vector, which improves processing efficiency. Computing similarity values and building a graph makes the associations between candidate words explicit, and the TextRank algorithm then computes the weight of each candidate word, so that the keywords of the text are determined more accurately and comprehensively.

Description

Text information extraction method, device, server and storage medium

Technical field

Embodiments of the present invention relate to the technical field of text extraction, and in particular to a text information extraction method, device, server and storage medium.

Background

With the rapid development of the Internet, networks offer ever more functions and the volume of web documents is growing quickly. Many web documents are long, however, and people usually have to spend a great deal of time reading an entire article to obtain the key news information. Editors who need to extract article information, and personnel who monitor the network, must spend considerable time reading long articles to obtain the key information, which greatly reduces working efficiency. Automatic extraction of text keywords and text summaries therefore greatly shortens the time needed to obtain key information from long web documents, and also saves labor costs for companies and enterprises.

The keyword and summary extraction methods in common use today are ranking methods based on the TextRank algorithm, whose basic idea derives from Google's PageRank algorithm. The general TextRank model can be expressed as a directed weighted graph G=(V, E) consisting of a point set V and an edge set E, where E is a subset of V × V. In(Vi) is the set of points that point to Vi, and Out(Vi) is the set of points that Vi points to. The score of point Vi is defined as follows:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$

where w_ji is the weight of the edge from Vj to Vi, and d is the damping factor, with a value between 0 and 1, representing the probability of jumping from a given point in the graph to any other point.

Computing weights with the above algorithm requires a graph built from co-occurrence relations. This approach must first establish edges between all points and then select among them with a configured window to obtain the associated edges and the candidate word nodes. The construction is cumbersome and inefficient, and it does not yield the relative magnitude of each edge weight, so the keywords or summaries obtained by the TextRank algorithm are neither comprehensive nor accurate. In addition, traditional methods of converting text to numeric form are simplistic and produce high-dimensional vectors, which hinders computation and processing.

Summary of the invention

Embodiments of the present invention provide a text information extraction method, device, server and storage medium, which address two problems of current TextRank-based information extraction: the keywords or summaries obtained are neither comprehensive nor accurate, and processing efficiency is low.

In a first aspect, an embodiment of the present invention provides a text information extraction method, comprising:

determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors;

taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph;

determining candidate word weights from the candidate word graph by means of the TextRank algorithm;

determining the keywords of the text according to the candidate word weights.

In a second aspect, an embodiment of the present invention provides a text information extraction device, the device comprising:

a first determining module, configured to determine word vectors of candidate words in a text by means of a Word2Vec model, and to determine similarity values between different word vectors;

a first building module, configured to take the word vectors as nodes and construct edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph;

a first weight determining module, configured to determine candidate word weights from the candidate word graph by means of the TextRank algorithm;

a keyword determining module, configured to determine the keywords of the text according to the candidate word weights.

In a third aspect, an embodiment of the present invention provides a server, comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the text information extraction method of any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the program implementing the text information extraction method of any embodiment of the present invention when executed by a processor.

In the text information extraction method, device, server and storage medium provided by embodiments of the present invention, converting candidate words into word vectors with the Word2Vec model allows each candidate word to be represented by a low-dimensional vector, which improves processing efficiency. Computing similarity values and building a graph makes the associations between candidate words explicit, and the TextRank algorithm computes the weight of each candidate word, so that the keywords of the text are determined more accurately and comprehensively.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a text information extraction method provided by embodiment one of the present invention;

Fig. 2 is a flowchart of a text information extraction method provided by embodiment two of the present invention;

Fig. 3 is a structural schematic diagram of a text information extraction device provided by embodiment three of the present invention;

Fig. 4 is a structural schematic diagram of a server provided by embodiment four of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below through embodiments, with reference to the drawings in the embodiments of the present invention. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment one

Fig. 1 is a flowchart of a text information extraction method provided by embodiment one of the present invention. The technical solution of this embodiment is applicable to extracting key information, such as keywords, from text. The method can be executed by a text information extraction device, which can be implemented in software and/or hardware and integrated in a server. The method specifically includes the following operations:

S110, determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors.

Specifically, a web crawler fetches the text to be processed, which may be news text from different fields. The text is cleaned to remove non-text information such as punctuation marks, yielding plain text, which is split into complete sentences. A word segmentation tool segments the plain text and tags parts of speech; stop words are removed, and common words such as nouns, adjectives and verbs are kept as candidate words. Optionally, the segmentation tool may be Ansj or jieba.
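The preprocessing just described can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the sentence-ending punctuation set and the rule of keeping part-of-speech tags that begin with `n`, `a` or `v` are assumptions, and the tagged input stands in for the output of a segmentation tool such as Ansj or jieba.

```python
import re

# Sentence-ending punctuation (Chinese and Western); an assumption of this sketch.
_SENTENCE_END = re.compile(r"[。！？!?.]+")
# Tag prefixes kept as candidates: noun, adjective, verb (assumed tag scheme).
_KEEP_PREFIXES = ("n", "a", "v")


def split_sentences(text):
    """Split cleaned text into non-empty sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def select_candidates(tagged_words, stop_words):
    """Keep nouns, adjectives and verbs that are not stop words.

    tagged_words: iterable of (word, pos_tag) pairs, as a segmentation
    tool with part-of-speech tagging would produce.
    """
    return [
        word
        for word, tag in tagged_words
        if word not in stop_words and tag.startswith(_KEEP_PREFIXES)
    ]
```

The segmentation and tagging step itself is left to the chosen tool; only the filtering that follows it is shown here.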

The word vectors of the candidate words are determined by the Word2Vec model, which is trained before it is used for this purpose. For example, a large volume of text of different types, such as news on society, people's livelihood, sports and music, is selected, cleaned and segmented to obtain candidate words, and the candidate words are written to a file in which each line holds the candidate words of one sentence. The model is then trained with one of the Word2Vec architectures; optionally, either the CBOW model or the Skip-gram model, each based on hierarchical softmax, is used to obtain the Word2Vec model.
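The training file layout described above, one sentence per line, can be produced with a few lines of code. The sketch below is illustrative only; the function name and the choice of a space as the word separator are assumptions.

```python
from pathlib import Path


def write_training_corpus(sentences, path):
    """Write one sentence per line, candidate words separated by spaces.

    sentences: iterable of lists of candidate words; empty lists are skipped.
    Returns the number of lines written.
    """
    lines = [" ".join(words) for words in sentences if words]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A line-per-sentence file of this kind is the layout that, for instance, gensim's `Word2Vec` accepts through its `corpus_file` argument, with `sg` selecting CBOW or Skip-gram and `hs=1` enabling hierarchical softmax. The source does not name a specific training library, so that pairing is only one possibility.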

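After the candidate words have been mapped to word vectors, the similarity values between different word vectors are determined. The similarity value may be expressed as the cosine of the angle between two word vectors, namely:

$$\mathrm{similarity}(\text{"a"}, \text{"b"}) = \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$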

where a and b denote two different candidate words, similarity("a", "b") denotes the similarity value between them, A and B denote the corresponding word vectors, A·B denotes their dot product, ‖A‖ and ‖B‖ denote their vector lengths, and n denotes the dimension of the word vectors. The formula gives the similarity value of two candidate words, for example:

similarity("Shandong", "Jiangsu") = 0.41542658

similarity("Shandong", "Beijing") = 0.19865009

similarity("Shandong", "men's basketball") = 0.16770135

Here the similarity value of "Shandong" and "Jiangsu" is comparatively large, indicating a higher degree of association between these two candidate words, whereas the similarity value of "Shandong" and "men's basketball" is smaller, indicating a lower degree of association.
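The cosine similarity described above can be computed directly from two vectors. The following is a small sketch in plain Python; the function name and the convention of returning 0.0 for a zero vector are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

Parallel vectors give a value of 1 and orthogonal vectors give 0, so a larger value corresponds to a closer association between the two words.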

S120, taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph.

Specifically, the word vectors are taken as nodes and form a point set, denoted V. Optionally, constructing the edges between nodes according to the similarity values between word vectors comprises: if the similarity value between two word vectors is greater than a preset first similarity threshold, constructing an edge between the two word vectors. The edges form an edge set, denoted E, where E is a subset of V × V. The preset first similarity threshold can be set by a technician as needed. For example, with the threshold set to 0.450, a similarity value greater than 0.450 between two candidate words indicates a high degree of association, so an edge is constructed between the two candidate word nodes. The candidate word graph G=(V, E) is obtained from the point set and the edge set, and the associations between keywords can be read from G.
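The thresholded graph construction can be sketched as below. The adjacency-dictionary representation, the function name and the decision to store each edge in both directions (since cosine similarity is symmetric) are assumptions of this sketch; the similarity function is passed in by the caller.

```python
from itertools import combinations


def build_graph(vectors, similarity, threshold=0.45):
    """Return {word: {neighbor: weight}} with an edge where similarity > threshold.

    vectors: dict mapping each candidate word to its vector.
    similarity: callable taking two vectors and returning a number.
    """
    graph = {word: {} for word in vectors}
    for u, v in combinations(vectors, 2):
        sim = similarity(vectors[u], vectors[v])
        if sim > threshold:
            # The similarity value doubles as the edge weight.
            graph[u][v] = sim
            graph[v][u] = sim
    return graph
```

Words whose similarity to every other word falls at or below the threshold remain in the graph as isolated nodes.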

S130, determining candidate word weights from the candidate word graph by means of the TextRank algorithm.

Specifically, the weight of each candidate word is computed with the TextRank formula:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)$$

where d is the damping factor, with a value between 0 and 1, representing the probability of jumping from a given point in the graph to any other point, and typically set to 0.85; V_i and V_j denote two different candidate word nodes; WS(V_i) and WS(V_j) denote the weights of the two candidate words; In(V_i) denotes the set of nodes pointing to V_i; Out(V_j) denotes the set of nodes that V_j points to; and w_ji and w_jk denote the weights of the edges between two nodes, that is, the similarity values between the candidate words those nodes represent. Using this formula and the associations between candidate words in the candidate word graph, the weight of each candidate word is determined, and the node weights are propagated iteratively until convergence; optionally, the convergence threshold may be set to 0.0001.
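The iterative weight propagation can be sketched as follows. This is an illustrative version under stated assumptions: the graph is a symmetric adjacency dictionary `{node: {neighbor: weight}}`, all scores start at 1.0, convergence is judged by the largest per-node change, and `max_iter` is a safety cap that the source does not mention.

```python
def textrank(graph, d=0.85, tol=1e-4, max_iter=200):
    """Weighted TextRank scores for graph = {node: {neighbor: weight}}."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    # Total weight of the edges leaving each node (denominator of the formula).
    out_sum = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for i in graph:
            rank = 0.0
            for j, w_ji in graph[i].items():  # neighbors j with an edge to i
                if out_sum[j] > 0:
                    rank += w_ji / out_sum[j] * scores[j]
            new_scores[i] = (1 - d) + d * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:  # convergence threshold, 0.0001 in the text
            break
    return scores
```

In a three-node chain the middle node, which is linked to both others, ends up with the highest score, which matches the intuition that well-connected candidate words carry more weight.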

S140, determining the keywords of the text according to the candidate word weights.

Optionally, determining the keywords of the text according to the candidate word weights comprises: sorting the candidate words in descending order of weight, and selecting the top-ranked candidate words as the keywords of the text. Specifically, the weights of the candidate words are obtained by the TextRank algorithm and the candidate words are sorted by weight, optionally in descending order. The candidate words with the highest weights are selected as the text keywords, and the number of keywords selected can be set by a technician as needed.
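Selecting the keywords from the computed weights amounts to a descending sort followed by a cut-off. A minimal sketch, in which the function name and the default of five keywords are assumptions:

```python
def top_keywords(weights, k=5):
    """Return the k candidate words with the largest weights, best first."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:k]]
```

Because Python's sort is stable, candidate words with equal weights keep the order in which they appear in the input dictionary.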

Illustratively, after the keywords of the text have been determined according to the candidate word weights, the method may further include: if at least two keywords are adjacent in the text, combining the at least two keywords. Specifically, the positions of the obtained keywords are marked in the original text, and if at least two keywords occupy adjacent positions, the adjacent keywords are combined into a multi-word keyword.
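The merging of adjacent keywords can be sketched as a single pass over the segmented text. The function below is an assumption-laden illustration: it takes the token sequence of the original text, treats consecutive tokens that are both keywords as adjacent, and returns only the combined multi-word keywords.

```python
def merge_adjacent_keywords(tokens, keywords, sep=""):
    """Join runs of neighboring keyword tokens into multi-word keywords.

    tokens: the segmented original text, in order.
    keywords: the selected single-word keywords.
    sep: separator placed between the joined words.
    """
    keyword_set = set(keywords)
    merged, run = [], []
    for token in list(tokens) + [None]:  # sentinel flushes the final run
        if token in keyword_set:
            run.append(token)
            continue
        if len(run) >= 2:
            phrase = sep.join(run)
            if phrase not in merged:
                merged.append(phrase)
        run = []
    return merged
```

The default empty separator suits Chinese text, where words are written without spaces; a space can be passed for space-delimited languages.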

In the text information extraction method provided by this embodiment of the present invention, the word vectors of the candidate words in the text are first determined by the Word2Vec model, and the similarity values between different word vectors are determined; the word vectors are then taken as nodes, and edges between the nodes are constructed according to the similarity values between the word vectors, to obtain a candidate word graph; the candidate word weights are determined from the candidate word graph by the TextRank algorithm; finally, the keywords of the text are determined according to the candidate word weights. Converting candidate words into word vectors with the Word2Vec model allows each candidate word to be represented by a low-dimensional vector, which improves processing efficiency. Computing similarity values captures the connections between candidate words accurately, and building the graph reflects the associations between candidate words accurately and explicitly. The TextRank algorithm then computes the weight of each candidate word, so that the keywords of the text are determined more accurately and comprehensively.

Embodiment two

Fig. 2 is a flowchart of a text information extraction method provided by embodiment two of the present invention. This embodiment is a further optimization of the above embodiment; for details not described here, see embodiment one. As shown in Fig. 2, the text information extraction method provided by embodiment two of the present invention specifically includes the following steps:

S210, determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors.

S220, taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph.

S230, determining candidate word weights from the candidate word graph by means of the TextRank algorithm.

S240, determining the keywords of the text according to the candidate word weights.

S250, determining the vector representation of each sentence according to the word vectors of the candidate words the sentence contains, and determining the similarity values between the vector representations of different sentences.

When reading news text, readers often want not only the keywords of the text but also a summary that conveys its content more fully and specifically. Illustratively, the word vectors of the candidate words contained in a sentence are summed to obtain the sentence vector, whose dimension is the same as that of the candidate word vectors. The similarity values between different sentences are determined from their vector representations. Illustratively, this may give:

similarity("Shandong apple harvest is good", "Farmers grow rice in Jiangsu") = 0.48500857

similarity("Shandong apple harvest is good", "Shandong football team loses") = 0.31601506

Here the similarity value between "Shandong apple harvest is good" and "Farmers grow rice in Jiangsu" is comparatively large, indicating a higher degree of association between the two sentences.
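Summing the word vectors of a sentence can be sketched in a few lines. The function name, the dictionary lookup and the choice to skip words without a vector are assumptions made for this illustration.

```python
def sentence_vector(words, word_vectors):
    """Element-wise sum of the vectors of the candidate words in a sentence.

    words: candidate words of one sentence.
    word_vectors: dict mapping words to equal-length vectors.
    Returns an empty list if none of the words has a vector.
    """
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return []
    return [sum(components) for components in zip(*vectors)]
```

The result has the same dimension as the individual word vectors, so the same cosine similarity used for words can be applied to pairs of sentences.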

S260, taking the vector representations of the sentences as nodes and constructing edges between the nodes according to the similarity values between the vector representations of the sentences, to obtain a sentence graph.

The vector representations of the sentences are taken as nodes and form a point set, denoted V′. Optionally, constructing the edges between nodes according to the similarity values between the vector representations of the sentences comprises: if the similarity value between the vector representations of two sentences is greater than a preset second similarity threshold, constructing an edge between the vector representations of the two sentences. The edges form an edge set, denoted E′, where E′ is a subset of V′ × V′. The preset second similarity threshold can be set by a technician as needed. For example, with the threshold set to 0.550, a similarity value greater than 0.550 between two sentences indicates a high degree of association, so an edge is constructed between the two sentence nodes. The sentence graph G′=(V′, E′) is obtained from the point set and the edge set, and the associations between sentences can be read from G′.

S270, determining sentence weights from the sentence graph by means of the TextRank algorithm.

Specifically, the weight of each sentence is determined with the TextRank formula according to the associations between the sentences in the sentence graph, and the node weights are propagated iteratively until convergence; optionally, the convergence threshold may be set to 0.0001.

S280, determining the summary of the text according to the sentence weights.

Optionally, determining the summary of the text according to the sentence weights comprises: sorting the sentences in descending order of weight, and selecting the top-ranked sentences to form the summary of the text. Specifically, the weights of the sentences are obtained by the TextRank algorithm and the sentences are sorted by weight, optionally in descending order. The sentences with the highest weights are selected to form the text summary, and the number of sentences selected can be set by a technician as needed.

The text information extraction method provided by this embodiment of the present invention adds the following steps: determining the vector representation of each sentence according to the word vectors of the candidate words the sentence contains, and determining the similarity values between the vector representations of different sentences; taking the vector representations of the sentences as nodes and constructing edges between the nodes according to the similarity values between the vector representations, to obtain a sentence graph; determining sentence weights from the sentence graph by the TextRank algorithm; and determining the summary of the text according to the sentence weights. Forming the sentence vector by summing the candidate word vectors reduces the dimension of the sentence representation and improves processing efficiency. Computing similarity values and building the graph reflects the associations between sentences more accurately and explicitly, and the TextRank algorithm yields the sentence weights from which the text summary is obtained. This addresses the low processing efficiency caused by high-dimensional sentence vectors, and the incomplete summaries caused by errors in computing the similarity between sentences, so that the resulting summary reflects the content of the text more comprehensively.

Embodiment three

Fig. 3 is a structural schematic diagram of a text information extraction device provided by embodiment three of the present invention. As shown in Fig. 3, the device comprises:

a first determining module 310, configured to determine word vectors of candidate words in a text by means of a Word2Vec model, and to determine similarity values between different word vectors;

a first building module 320, configured to take the word vectors as nodes and construct edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph;

a first weight determining module 330, configured to determine candidate word weights from the candidate word graph by means of the TextRank algorithm;

a keyword determining module 340, configured to determine the keywords of the text according to the candidate word weights.

Optionally, the first building module 320 is specifically configured to:

if the similarity value between two word vectors is greater than a preset first similarity threshold, construct an edge between the two word vectors.

Optionally, the keyword determining module 340 is specifically configured to:

sort the candidate words in descending order of candidate word weight;

select the top-ranked candidate words as the keywords of the text.

Optionally, the keyword determining module 340 is further configured to:

if at least two keywords are adjacent in the text, combine the at least two keywords.

Optionally, the device further comprises:

a second determining module, configured to determine the vector representation of each sentence according to the word vectors of the candidate words the sentence contains, and to determine the similarity values between the vector representations of different sentences;

a second building module, configured to take the vector representations of the sentences as nodes and construct edges between the nodes according to the similarity values between the vector representations of the sentences, to obtain a sentence graph;

a second weight determining module, configured to determine sentence weights from the sentence graph by means of the TextRank algorithm;

a summary determining module, configured to determine the summary of the text according to the sentence weights.

Optionally, the device further comprises:

a combining module, configured to combine at least two keywords if the at least two keywords are adjacent in the text.

The text information extraction device provided by this embodiment of the present invention belongs to the same inventive concept as the text information extraction method of the above embodiments. For technical details not described in this embodiment, see the above embodiments; this embodiment has the same beneficial effects as the above embodiments.

Embodiment four

Fig. 4 is a structural diagram of a server provided by embodiment four of the present invention. Fig. 4 shows a block diagram of an exemplary processing device 412 suitable for implementing embodiments of the present invention. The processing device 412 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 4, processing device 412 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 416, a system memory 428, and a bus 418 that connects the different system components (including system memory 428 and processing unit 416).

Bus 418 represents one or more of several types of bus structure, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

Processing device 412 typically includes a variety of computer-system-readable media. These media can be any available media that processing device 412 can access, including volatile and non-volatile media, and removable and non-removable media.

System memory 428 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. Processing device 412 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly called a "hard disk drive"). Although not shown in Fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive can be connected to bus 418 through one or more data media interfaces. Memory 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 440 having a set of (at least one) program modules 442 may be stored, for example, in memory 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 442 generally perform the functions and/or methods of the embodiments described in the present invention.

Processing device 412 may also communicate with one or more external devices 414 (such as a keyboard, a pointing device or a display 424), with one or more devices that enable a user to interact with processing device 412, and/or with any device (such as a network card or modem) that enables processing device 412 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interface 422. Processing device 412 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 420. As shown, network adapter 420 communicates with the other modules of processing device 412 through bus 418. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with processing device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.

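By running at least one of the programs stored in system memory 428, processing unit 416 executes various functional applications and data processing, for example implementing the text information extraction method provided by the embodiments of the present invention, comprising:

determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors;

taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph;

determining candidate word weights from the candidate word graph by means of the TextRank algorithm;

determining the keywords of the text according to the candidate word weights.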

Embodiment five

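Embodiment five of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a text information extraction method comprising:

determining word vectors of candidate words in a text by means of a Word2Vec model, and determining similarity values between different word vectors;

taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word graph;

determining candidate word weights from the candidate word graph by means of the TextRank algorithm;

determining the keywords of the text according to the candidate word weights.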

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A text information extraction method, characterized in that the method comprises:

determining the term vectors of candidate words in a text by a Word2Vec model, and determining the similarity values between different term vectors;

taking the term vectors as nodes, and constructing edges between the nodes according to the similarity values between the term vectors, to obtain a candidate word atlas;

determining candidate word weights according to the candidate word atlas by the TextRank algorithm;

determining the keywords of the text according to the candidate word weights.
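
As a minimal illustrative sketch, and not the applicant's implementation, the steps of claim 1 together with the TextRank weighting described in the abstract might be arranged as follows. The three-dimensional toy vectors stand in for real Word2Vec output, and the names `cosine`, `build_graph` and `textrank`, the damping factor 0.85, the iteration count and the choice of cosine similarity are all assumptions introduced only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(vectors):
    """Weighted undirected graph: nodes are candidate words, edge weights are similarities."""
    words = list(vectors)
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim > 0:  # assumption: keep only positively similar pairs
                graph[a][b] = sim
                graph[b][a] = sim
    return graph

def textrank(graph, damping=0.85, iterations=50):
    """Weighted TextRank scores computed by fixed-count power iteration."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            rank = 0.0
            for neighbour, weight in graph[node].items():
                out_sum = sum(graph[neighbour].values())
                if out_sum:
                    rank += weight / out_sum * scores[neighbour]
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores

# Hypothetical vectors standing in for Word2Vec output.
toy_vectors = {
    "network": [0.9, 0.1, 0.0],
    "internet": [0.8, 0.2, 0.1],
    "document": [0.7, 0.3, 0.2],
    "banana": [0.0, 0.1, 0.9],
}
weights = textrank(build_graph(toy_vectors))
keywords = sorted(weights, key=weights.get, reverse=True)[:2]
```

In this toy input, "banana" is only weakly similar to the other three words, so it receives the smallest weight and is not selected as a keyword.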

2. The method according to claim 1, characterized in that constructing edges between the nodes according to the similarity values between the term vectors comprises:

if the similarity value between two term vectors is greater than a preset first similarity threshold, constructing an edge between the two term vectors.
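
The thresholding rule of claim 2 can be sketched as below. This is a hedged example only: the function name `build_edges`, the two-dimensional toy vectors, the use of cosine similarity and the threshold value 0.5 are assumptions, since the claim does not fix any of them.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_edges(vectors, threshold):
    """Map each word pair whose similarity exceeds the threshold to that similarity."""
    edges = {}
    for a, b in combinations(vectors, 2):
        sim = cosine(vectors[a], vectors[b])
        if sim > threshold:  # strictly greater, as in the claim wording
            edges[(a, b)] = sim
    return edges

# Hypothetical vectors standing in for Word2Vec output.
toy_vectors = {
    "police": [1.0, 0.0],
    "officer": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
edges = build_edges(toy_vectors, threshold=0.5)
```

Raising the threshold yields a sparser atlas and lowering it yields a denser one, so the threshold trades off noise against connectivity.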

3. The method according to claim 1, characterized in that determining the keywords of the text according to the candidate word weights comprises:

selecting the top-ranked candidate words as the keywords of the text.
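
Selecting the top-ranked candidate words might look like the following sketch. The function name `top_keywords`, the parameter `k`, the alphabetical tie-breaking and the example weights are hypothetical; the claim states only that the top-ranked candidates are chosen.

```python
def top_keywords(weights, k):
    """Return the k candidate words with the largest weights, highest first."""
    # Sort by descending weight; break ties alphabetically (an assumption).
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

# Hypothetical candidate word weights, e.g. as produced by TextRank.
example_weights = {"network": 1.21, "document": 1.05, "article": 0.48, "time": 0.97}
selected = top_keywords(example_weights, 2)
```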

4. The method according to claim 1, characterized in that, after determining the term vectors of the candidate words in the text by the Word2Vec model, the method further comprises:

determining a vector representation of each sentence according to the term vectors of the candidate words included in the sentence in the text, and determining the similarity values between the vector representations of different sentences;

taking the vector representations of the sentences as nodes, and constructing edges between the nodes according to the similarity between the vector representations of the sentences, to obtain a sentence atlas;

determining sentence weights according to the sentence atlas by the TextRank algorithm;

determining the abstract of the text according to the sentence weights.
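
The sentence-level steps of claim 4 can be sketched in the same way. The claim does not say how a sentence vector is derived from its term vectors; averaging them, as done here, is one common choice and is an assumption, as are the names `mean_vector` and `rank_sentences`, the toy vectors, the damping factor and the two-sentence abstract length.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def rank_sentences(sentence_vectors, damping=0.85, iterations=50):
    """Weighted TextRank over a sentence graph whose edge weights are similarities."""
    n = len(sentence_vectors)
    sim = [[max(0.0, cosine(sentence_vectors[i], sentence_vectors[j])) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j]
            )
            for i in range(n)
        ]
    return scores

# Hypothetical term vectors and sentences (text, candidate words).
word_vectors = {
    "network": [0.9, 0.1], "document": [0.8, 0.2],
    "reading": [0.7, 0.3], "lunch": [0.1, 0.9],
}
sentences = [
    ("The network hosts many documents.", ["network", "document"]),
    ("Reading a long document takes time.", ["document", "reading"]),
    ("Lunch was late.", ["lunch"]),
]
sentence_vectors = [mean_vector([word_vectors[w] for w in words]) for _, words in sentences]
scores = rank_sentences(sentence_vectors)
# Keep the two highest-weighted sentences, restored to their original order.
top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:2]
summary = [sentences[i][0] for i in sorted(top)]
```

Restoring the selected sentences to document order is a design choice that tends to keep the abstract readable; the claim itself only requires that the abstract be determined from the sentence weights.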

5. The method according to claim 1, characterized in that, after determining the keywords of the text according to the candidate word weights, the method further comprises:

6. A text information extraction device, characterized by comprising:

a first determining module, configured to determine the term vectors of candidate words in a text by a Word2Vec model, and to determine the similarity values between different term vectors;

a first building module, configured to take the term vectors as nodes and to construct edges between the nodes according to the similarity values between the term vectors, to obtain a candidate word atlas;

a first weight determining module, configured to determine candidate word weights according to the candidate word atlas by the TextRank algorithm;

a keyword determining module, configured to determine the keywords of the text according to the candidate word weights.

7. The device according to claim 6, characterized in that the first building module is specifically configured to:

8. The device according to claim 6, characterized in that the keyword determining module is specifically configured to:

select the top-ranked candidate words as the keywords of the text.

9. A server, characterized by comprising:

one or more processors;

a memory, configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the text information extraction method according to any one of claims 1 to 5.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the text information extraction method according to any one of claims 1 to 5 is implemented.