CN110069623A - Summary texts generation method, device, storage medium and computer equipment - Google Patents


Info

Publication number
CN110069623A
Authority
CN
China
Prior art keywords
text
normal form
crucial
summary texts
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711278814.1A
Other languages
Chinese (zh)
Other versions
CN110069623B (en
Inventor
刘康
赵占平
窦晓妍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201711278814.1A (CN110069623B)
Priority to PCT/CN2018/119214 (WO2019109918A1)
Publication of CN110069623A
Application granted
Publication of CN110069623B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F16/35 Clustering; Classification
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions

Abstract

This application relates to a summary text generation method, apparatus, computer-readable storage medium and computer device. The method includes: obtaining normal-form text and a corresponding class label; querying the preset normal-form feature corresponding to the class label; extracting key text from the normal-form text according to the normal-form feature; identifying the text category to which the normal-form text belongs; and assembling the extracted key text according to the template corresponding to the text category to obtain summary text. The scheme provided by this application can improve the efficiency of rewriting text.

Description

Summary texts generation method, device, storage medium and computer equipment
Technical field
This application relates to the field of computer technology, and in particular to a summary text generation method, apparatus, computer-readable storage medium and computer device.
Background
With the rapid development of the Internet, more and more information is published over the network. As terminals receive ever more information from the Internet, quickly finding the key information within lengthy content has become particularly important.
In the traditional approach, highly specialized staff usually refine the published information, rewrite it using a simple template, and then send the rewritten text to terminals. As the volume of information published on the network keeps growing, this manual rewriting is clearly too inefficient.
Summary of the invention
Based on this, in view of the technical problem that existing rewriting methods are inefficient, it is necessary to provide a summary text generation method, apparatus, computer-readable storage medium and computer device.
A summary text generation method, comprising:
obtaining normal-form text and a corresponding class label;
querying the preset normal-form feature corresponding to the class label;
extracting key text from the normal-form text according to the normal-form feature;
identifying the text category to which the normal-form text belongs; and
assembling the extracted key text according to the template corresponding to the text category to obtain summary text.
A summary text generation apparatus, comprising:
an obtaining module, configured to obtain normal-form text and a corresponding class label;
a query module, configured to query the preset normal-form feature corresponding to the class label;
an extraction module, configured to extract key text from the normal-form text according to the normal-form feature;
an identification module, configured to identify the text category to which the normal-form text belongs; and
an assembly module, configured to assemble the extracted key text according to the template corresponding to the text category to obtain summary text.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above summary text generation method.
A computer device, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above summary text generation method.
With the above summary text generation method, apparatus, computer-readable storage medium and computer device, the key text can be extracted from the normal-form text using the queried normal-form feature corresponding to that text, and once the text category of the normal-form text has been identified, the extracted key text can be assembled using the template corresponding to that text category to obtain summary text. Because the whole process of generating summary text requires no manual participation, the efficiency of rewriting text can be greatly improved.
Description of the drawings
Fig. 1 is a diagram of the application environment of the summary text generation method in one embodiment;
Fig. 2 is a flow diagram of the summary text generation method in one embodiment;
Fig. 3 is a schematic interface diagram of the template corresponding to normal-form text of the equity transfer category in one embodiment;
Fig. 4 is a flow diagram of the step of obtaining normal-form text and the corresponding class label in one embodiment;
Fig. 5 is a flow diagram of the step of assembling the extracted key text according to the template corresponding to the text category to obtain summary text in one embodiment;
Fig. 6 is a flow diagram of the summary text generation method in a specific embodiment;
Fig. 7 is a schematic diagram of generating the corresponding summary text from an announcement file published by an exchange, according to an embodiment of this application;
Fig. 8 is a schematic diagram of generating the corresponding summary text from an announcement file published by an exchange, according to an embodiment of this application;
Fig. 9 is a schematic diagram of the key text extracted from the text content of an announcement file published by an exchange, according to an embodiment of this application;
Fig. 10 is a schematic diagram of extracting key text from an announcement file, according to an embodiment of this application;
Fig. 11 is a schematic diagram of assembling the key text extracted from an announcement file, according to an embodiment of this application;
Fig. 12 is a schematic diagram of identifying the text category of an announcement file, according to an embodiment of this application;
Fig. 13 is a schematic diagram of assembling the extracted key text, according to an embodiment of this application;
Fig. 14 is a schematic diagram of matching modal particles for summary text, according to an embodiment of this application;
Fig. 15 is a schematic diagram of matching modal particles for summary text with a parallel structure, according to an embodiment of this application;
Fig. 16 is a structural block diagram of the summary text generation apparatus in one embodiment;
Fig. 17 is a structural block diagram of the summary text generation apparatus in one embodiment;
Fig. 18 is a structural block diagram of the summary text generation apparatus in one embodiment;
Fig. 19 is a structural block diagram of the summary text generation apparatus in one embodiment;
Fig. 20 is a structural block diagram of the computer device in one embodiment.
Detailed description
To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it.
Fig. 1 is a diagram of the application environment of the summary text generation method in one embodiment. Referring to Fig. 1, the summary text generation method is applied to a summary text generation system. The system includes a terminal 110 and a server 120, which are connected via a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a laptop, and the like. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
As shown in Fig. 2, in one embodiment a summary text generation method is provided. This embodiment is described mainly with the method applied to the server 120 in Fig. 1. Referring to Fig. 2, the method specifically includes the following steps:
S202: Obtain normal-form text and the corresponding class label.
Here, normal-form text is text content that follows a fixed normal form, that is, text content whose component parts are defined. For example, normal-form text may be the text content of an announcement file about a listed company published by an exchange, or the text content of a normative legal document. Announcement files may include trading prompt files, exchange announcement files, regulatory information files, listed company information files, margin trading and securities lending information files, fund information files, trading information disclosure files, bond information files, and so on. Normative legal documents include civil and commercial judgment documents, administrative judgment documents, criminal judgment documents, notarial legal documents, arbitration legal documents, litigation legal documents, non-litigation legal documents, company management documents, company liquidation documents, patent application publication texts, grant announcement texts, and so on.
A class label is a label that divides files with different fixed normal forms into classes. Files with different class labels have different normal forms, while files with the same class label share the same normal form. The server may obtain the class label carried by the normal-form text when it obtains the normal-form text.
For example, among the announcement files published by an exchange, listed company information files correspond to class label A and margin trading and securities lending information files correspond to class label B. The listed company information file and the margin trading and securities lending information file that the exchange publishes about Company A therefore have different normal forms and correspond to class label A and class label B respectively, while the listed company information files that the exchange publishes about Company A and about Company B have the same normal form and both correspond to class label A.
In one embodiment, the server may crawl normal-form files over the network, determine the class label corresponding to each crawled normal-form file, and extract the normal-form text from it. The class label of the normal-form file is also the class label of the normal-form text.
In one embodiment, the server may monitor the web page that publishes normal-form files, crawl the HTML content of the page in real time, and parse the tags in the crawled HTML. After finding the download link of a normal-form file published on the page, the server downloads the file via that link and extracts the normal-form text from it. The normal-form file may be in PDF (Portable Document Format); converting the PDF file into a TXT (plain text) file yields the normal-form text.
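The tag-parsing step can be illustrated with a minimal sketch. It assumes, purely for illustration, that download links are `<a>` elements whose `href` ends in `.pdf`; the class and function names are hypothetical and not taken from the application, and downloading and PDF-to-text conversion are left out.

```python
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Collects href values of <a> tags that point at PDF files (assumed convention)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.links.append(href)


def find_pdf_links(html):
    """Return the PDF download links found in a crawled HTML page."""
    parser = PdfLinkParser()
    parser.feed(html)
    return parser.links
```

A real announcement page may mark its download links differently, so the filter on the `.pdf` suffix would need to be adapted to the page being monitored.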
S204: Query the preset normal-form feature corresponding to the class label.
Here, a normal-form feature is a feature that characterizes the normal form of the normal-form text and can be used to locate the key text within it. A preset normal-form feature is a normal-form feature stored in advance for a class label. Different class labels usually correspond to different normal-form features. Specifically, a normal-form feature may include the paragraph position of a key paragraph in the normal-form text, a sequence text prompt word, and a keyword.
For example, among the announcement files published by an exchange, listed company information files correspond to class label A. After extracting the normal-form text from a listed company information file, the server may find that the normal-form feature corresponding to class label A includes: the paragraph position of the key paragraph, namely the overview section of the listed company information text; the sequence text prompt word "important content prompt"; and the keywords "transfer", "equity", and so on.
In one embodiment, after obtaining the normal-form text and its class label, the server may look up the pre-stored mapping between class labels and normal-form features and query the normal-form feature corresponding to the class label according to that mapping.
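A minimal sketch of such a mapping follows, using the class label A example above. The `NormalFormFeature` structure, its field names and the in-memory dictionary are assumptions made for illustration; the application does not specify how the mapping is stored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalFormFeature:
    """Hypothetical container for the three kinds of feature named in the text."""
    paragraph_position: Optional[str] = None   # where the key paragraph lies
    prompt_word: Optional[str] = None          # sequence text prompt word
    keywords: Tuple[str, ...] = ()             # keywords for key half-sentences


# Pre-stored mapping from class label to normal-form feature (illustrative values).
FEATURES_BY_LABEL = {
    "A": NormalFormFeature(
        paragraph_position="overview",
        prompt_word="important content prompt",
        keywords=("transfer", "equity"),
    ),
}


def query_feature(class_label):
    """Return the preset feature for a class label, or None if none is stored."""
    return FEATURES_BY_LABEL.get(class_label)
```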
In one embodiment, after obtaining a normal-form file, the server may read the file identifier of the normal-form file and determine from it the class label of the corresponding normal-form text.
S206: Extract key text from the normal-form text according to the normal-form feature.
Here, key text is the text in the normal-form text that carries the key information. Key text may consist of multiple pieces of text. Specifically, the server may extract the key text from the normal-form text using a summary extraction algorithm.
In one embodiment, after extracting the key text from the normal-form text, the server may delete redundant characters from it. Redundant characters may be at least one of brackets, annotations and annexes. Specifically, the server may match the characters in the extracted key text one by one against a preset redundant character set, for example with regular expressions, and then delete every matched character that belongs to the set from the extracted key text.
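The regular-expression matching mentioned above might look like the following sketch. It covers only bracketed asides, in ASCII and full-width brackets; the pattern list is a hypothetical stand-in for the preset redundant character set, whose actual contents the application does not give.

```python
import re

# Hypothetical redundancy patterns: bracketed asides in ASCII and full-width brackets.
REDUNDANT_PATTERNS = [
    re.compile(r"\([^()]*\)"),
    re.compile(r"（[^（）]*）"),
]


def strip_redundant(text):
    """Delete bracketed asides, then collapse the whitespace left behind."""
    for pattern in REDUNDANT_PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

Annotations and annexes would need their own patterns, which depend on how they are marked in the source files.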
In one embodiment, after extracting the key text from the normal-form text, the server may correct missing or out-of-order serial numbers in it. Specifically, the server may compare the values of successive serial numbers to determine whether they are out of order. When the serial numbers in the key text are found to be discontinuous, the server corrects them.
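One possible reading of this correction is sketched below: serial numbers at the start of each line are checked for continuity and, if needed, renumbered from 1. The line-based input and the `1.` / `1、` / `1)` numbering styles are assumptions for illustration.

```python
import re

# Assumed numbering styles: a leading integer followed by ".", "、" or ")".
ITEM = re.compile(r"^(\d+)([.、)])\s*(.*)$")


def numbers_in_order(lines):
    """True when the leading serial numbers run 1, 2, 3, ... without gaps."""
    numbers = [int(m.group(1)) for m in map(ITEM.match, lines) if m]
    return numbers == list(range(1, len(numbers) + 1))


def renumber(lines):
    """Rewrite leading serial numbers so that they are continuous from 1."""
    fixed, expected = [], 1
    for line in lines:
        match = ITEM.match(line)
        if match:
            fixed.append(f"{expected}{match.group(2)} {match.group(3)}")
            expected += 1
        else:
            fixed.append(line)  # unnumbered lines pass through unchanged
    return fixed
```

Note that this simple pattern would also treat a line beginning with a decimal such as "3.5%" as a numbered item, so a production rule set would need to be stricter.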
In one embodiment, the server may simplify how numerical values in the extracted key text are expressed. For example, decimals in the key text may be rounded, such as converting "25.3831%" to "25.4%". The server may also express numbers that occupy many characters using Chinese numeral characters, for example simplifying "100000 yuan" in the key text to "100 thousand yuan". The server may further round large values in the key text, for example replacing "1000004567 yuan" with "about 1 billion yuan".
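Two of these simplifications, rounding percentages and approximating large amounts, can be sketched as follows. The English unit names and the "N yuan" pattern are assumptions standing in for the Chinese units of the original text.

```python
import re

# Hypothetical English unit names; the original text uses Chinese numeral units.
UNITS = [(10**9, "billion"), (10**6, "million")]


def round_percentages(text, digits=1):
    """Round decimal percentages, e.g. 25.3831% becomes 25.4%."""
    def repl(match):
        return f"{float(match.group(1)):.{digits}f}%"
    return re.sub(r"(\d+\.\d+)%", repl, text)


def approximate_amounts(text):
    """Replace long integer amounts in yuan with an approximate, shorter form."""
    def repl(match):
        value = int(match.group(1))
        for size, name in UNITS:
            if value >= size:
                return f"about {round(value / size, 1):g} {name} yuan"
        return match.group(0)
    return re.sub(r"\b(\d{7,}) yuan\b", repl, text)
```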
In one embodiment, the server may deduplicate the extracted key text. Specifically, the server may inspect the extracted key text, and when it detects pieces of key information that are identical or whose similarity reaches a preset value, it deduplicates those pieces.
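A minimal sketch of similarity-based deduplication is given below. The application does not name a similarity measure; this sketch swaps in the character-level ratio from Python's `difflib`, and the 0.9 threshold is an assumed preset value.

```python
from difflib import SequenceMatcher


def deduplicate(pieces, threshold=0.9):
    """Keep a piece only if it is less similar than `threshold` to every kept piece."""
    kept = []
    for piece in pieces:
        if all(SequenceMatcher(None, piece, other).ratio() < threshold for other in kept):
            kept.append(piece)
    return kept
```

The first occurrence of each group of similar pieces is retained, which preserves the original order of the key text.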
S208: Identify the text category to which the normal-form text belongs.
Here, the text category is the category obtained by classifying the text content of the normal-form text. Text categories are divided according to the text content of the normal-form text, and are different from class labels.
For example, among the announcement files published by an exchange, listed company information files correspond to class label A, yet a listed company information file may concern a corporate restructuring, a corporate merger, a corporate equity transfer, and so on. The normal-form texts extracted from a restructuring file, a merger file and an equity transfer file then correspond to three different text categories, namely the restructuring category, the merger category and the transfer category.
In one embodiment, after identifying the text category of the normal-form text, the server obtains multiple text categories corresponding to the normal-form text, for example three text categories.
In one embodiment, the server may classify the input normal-form text with a classification algorithm to obtain its text category. The classification algorithm may be the Rocchio algorithm, the naive Bayes classification algorithm, the k-nearest neighbors algorithm, a decision tree algorithm, a neural network algorithm, a support vector machine algorithm, or the like.
In one embodiment, the server may identify the text category of the normal-form text with a machine learning model. The server builds the vector corresponding to the normal-form text and inputs it into the machine learning model for prediction, obtaining the text category of the normal-form text. The machine learning model may specifically be an SVM (Support Vector Machine) model.
In one embodiment, the server determines text features from the normal-form text, converts the text features into category features to classify the text, assigns each category feature a corresponding score, and takes the category with the highest score as the text category of the normal-form text.
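The score-based variant can be illustrated with a deliberately simple sketch in which keyword counts stand in for the text features. This is not the SVM or any other classifier named above; the category names follow the restructuring, merger and transfer example, and the keyword lists are hypothetical.

```python
# Hypothetical keyword lists per text category (illustrative only).
CATEGORY_KEYWORDS = {
    "restructuring": ("restructuring", "reorganization"),
    "merger": ("merger", "acquisition"),
    "equity transfer": ("transfer", "equity"),
}


def score_categories(text):
    """Give each category a score equal to its keyword occurrences in the text."""
    lowered = text.lower()
    return {
        category: sum(lowered.count(word) for word in words)
        for category, words in CATEGORY_KEYWORDS.items()
    }


def identify_category(text):
    """Return the category with the highest score."""
    scores = score_categories(text)
    return max(scores, key=scores.get)
```

Returning the full score dictionary also makes it straightforward to keep the top few categories rather than a single one.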
S210: Assemble the extracted key text according to the template corresponding to the text category to obtain summary text.
Here, a template is a layout with a preset fixed format. The template is used to assemble the extracted key text in that fixed format to obtain the summary text. The assembled summary text presents the key text in the template's preset fixed format.
Fig. 3 schematically shows the template 300 corresponding to the equity transfer text category. In the template 300, a predetermined symbol 302 such as "[]" presents the title of the summary text; a preset first position 304 presents the position of the title within the summary text; a preset second position 306 presents the reported object corresponding to the summary text; and a preset third position 308 presents the report date of the summary text.
In one embodiment, the server may establish a mapping table between text categories and templates in a database in advance. After identifying the text category of the normal-form text, the server looks up the mapping table to retrieve the template corresponding to that text category and uses the template to assemble the extracted key text into summary text.
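The lookup-and-assemble step can be sketched with Python format strings. The template layout below (title in square brackets, then the reported object, the body and the date) is only loosely modeled on the description of Fig. 3; the field names and the dictionary standing in for the database table are assumptions.

```python
# Hypothetical mapping table from text category to template.
TEMPLATES = {
    "equity transfer": "[{title}] {company}: {body} ({date})",
}


def assemble_summary(category, fields):
    """Fill the template of the given text category with the extracted key text."""
    template = TEMPLATES[category]
    return template.format(**fields)
```

For example, filling the equity transfer template with a title, a company name, a body and a date produces a single summary line in the fixed format.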
With the above summary text generation method, the key text can be extracted from the normal-form text using the queried normal-form feature corresponding to that text, and once the text category of the normal-form text has been identified, the extracted key text can be assembled using the template corresponding to that text category to obtain summary text. Because the whole process of generating summary text requires no manual participation, the efficiency of rewriting text can be greatly improved.
As shown in Fig. 4, in one embodiment, step S202 specifically includes:
S402: Monitor the announcement file source.
Here, the announcement file source is where announcement files come from. It may be a web page that publishes announcement files, or the database or databases behind a website that publishes announcement files. Specifically, after obtaining access rights to the website's database, the server monitors the database in real time.
In one embodiment, the server may create a listening thread that monitors the database of the website publishing announcement files. At a preset time interval, the listening thread queries whether the number of announcement files in the database has changed, thereby monitoring the announcement file source in real time.
S404: When a new announcement file is detected in the announcement file source, obtain the new announcement file.
In one embodiment, when the listening thread detects that the number of announcement files in the announcement file source has increased, the source is known to contain a new announcement file. The listening thread returns the identifier of the announcement file, an associated query on that identifier yields the download address of the announcement file, and the new announcement file is downloaded from that address.
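The polling behaviour of the listening thread can be sketched as a plain loop. Instead of comparing file counts, this sketch compares sets of identifiers, which also tells the caller which files are new; `list_ids` and `handle` are hypothetical callables supplied by the caller, and running the loop in a thread is left out.

```python
import time


def poll_new_announcements(list_ids, handle, interval=60.0, max_polls=None, sleep=time.sleep):
    """Poll a source at a fixed interval and call `handle` for each new identifier.

    `list_ids` returns the identifiers currently in the source. `max_polls`
    limits the number of polling rounds; None means poll indefinitely.
    """
    seen = set(list_ids())
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = set(list_ids())
        for announcement_id in sorted(current - seen):
            handle(announcement_id)  # e.g. look up the download address and fetch the file
        seen |= current
        polls += 1
```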
S406: Extract normal-form text from the announcement file.
In one embodiment, the announcement file exists in the form of an image. To read the normal-form text in the announcement file, the server may process the announcement file with an image recognition algorithm to obtain the normal-form text.
S408: Read the class label associated with the announcement file.
In one embodiment, the database of the website publishing announcement files stores the announcement files in table form, and the fields associated with the identifier of an announcement file include the class label, the download address, and so on. Having found the identifier of a new announcement file, the server may read the class label of that file through an associated query.
In the above embodiment, the server monitors the announcement file source in real time. When a new announcement file is published, the server obtains it, extracts normal-form text from it, and uses the extracted normal-form text as the material for generating summary text. The announcement file source is thus monitored comprehensively, and updated announcement files are rewritten in real time.
In one embodiment, the key text includes at least one of a key paragraph, a key whole sentence and a key half-sentence. Step S206 specifically includes: when the normal-form feature includes the paragraph position of a key paragraph in the normal-form text, extracting the key paragraph from the normal-form text according to the paragraph position; when the normal-form feature includes a sequence text prompt word, extracting the key whole sentence from the position in the normal-form text corresponding to the sequence text prompt word; and when the normal-form feature includes a keyword, extracting from the normal-form text the key half-sentence that contains the keyword.
Here, a half-sentence is a fragment obtained by dividing the normal-form text at any punctuation mark, such as at least one of the comma, the enumeration comma, the full stop, or the line break. A whole sentence is a fragment obtained by dividing the normal-form text at a full stop, exclamation mark, question mark or similar punctuation mark. A paragraph is a fragment obtained by dividing the normal-form text at line breaks. The key paragraph, key whole sentence and key half-sentence are respectively a paragraph, a whole sentence and a half-sentence extracted from the normal-form text.
The paragraph position is a preset position used to index the key paragraph in the normal-form text. The sequence text prompt word is a preset prompt word used to index the key whole sentence in the normal-form text. The keyword is a preset word used to index the key half-sentence.
In one embodiment, when the normal-form feature includes the paragraph position of a key paragraph in the normal-form text, the server may first determine from the paragraph position where the key paragraph lies in the normal-form text, and then extract the key paragraph from that position.
In one embodiment, when the normal-form feature includes a sequence text prompt word, the server matches the prompt word against the normal-form text to determine where the key whole sentence lies, and then extracts the key whole sentence from the position corresponding to the prompt word.
In one embodiment, when the normal-form feature includes a keyword, the server matches the keyword against the normal-form text, and when the normal-form text contains a half-sentence that includes the keyword, extracts that half-sentence as a key half-sentence.
In the above embodiment, the server queries the normal-form feature corresponding to the class label of the normal-form text, so that the normal-form text can be refined in a more targeted way according to the normal-form feature, and the extracted key text is both comprehensive and accurate.
In one embodiment, the step of extracting the key paragraph from the normal-form text according to the paragraph position specifically includes: screening the first half-sentences split out from the paragraph position in the normal-form text; obtaining the weight values corresponding to the screened first half-sentences; determining, among the screened first half-sentences, those whose weight values meet a first preset condition; and forming the key paragraph from the first half-sentences that meet the first preset condition and are consecutive.
Here, the first half-sentences are the half-sentences obtained by splitting the text at the paragraph position in the normal-form text into half-sentences. A weight value quantifies the importance of a half-sentence to the normal-form text it belongs to. The first preset condition is a preset condition that the weight values of some of the first half-sentences meet. Specifically, the first preset condition may be that the half-sentence's weight value ranks in the top 10 among the first half-sentences.
Specifically, the server may first split the normal-form text into half-sentences and obtain the weight value of each half-sentence. After obtaining the paragraph position of the key paragraph, the server screens the split half-sentences for those at the paragraph position and takes them as the first half-sentences, obtains their weight values, and forms the key paragraph from the first half-sentences whose weight values meet the first preset condition and which are consecutive.
In one embodiment, the server may traverse the punctuation marks in the normal-form text in order and, on finding any punctuation mark, take the continuous text before that mark that contains no punctuation mark as a half-sentence.
For example, on matching the first punctuation mark in the normal-form text, the server takes the text before it as a half-sentence; on matching the second punctuation mark, it takes the text before the second mark and after the first mark as a half-sentence; and so on, until the normal-form text has been split into half-sentences.
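The splitting described above amounts to cutting the text at every punctuation mark. A minimal sketch follows; the delimiter set is an assumption covering common ASCII and Chinese marks, and a period followed by a digit is left alone so that decimals such as 25.4 are not cut in two.

```python
import re

# Assumed delimiter set: common ASCII and Chinese punctuation plus line breaks.
# A period directly followed by a digit is treated as a decimal point, not a delimiter.
HALF_DELIMITERS = re.compile(r"[,，、。;；!！?？\n]|\.(?!\d)")


def split_half_sentences(text):
    """Split text into half-sentences at any punctuation mark, in original order."""
    parts = HALF_DELIMITERS.split(text)
    return [part.strip() for part in parts if part.strip()]
```

The position of each half-sentence in the returned list can serve as its order identifier in later steps.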
In one embodiment, the server may add an identifier to each split half-sentence in order; the identifier value indicates the position and order of the half-sentence in the normal-form text. When the weight values of screened first half-sentences meet the first preset condition, the server may determine whether those first half-sentences are consecutive by checking whether their identifier values are consecutive. Consecutive half-sentences form consecutive whole sentences, from which the corresponding key paragraph is obtained.
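Selecting the top-ranked half-sentences and grouping the consecutive ones can be sketched as below. List indexes play the role of the identifier values, the weights are assumed to be given, and `top_n` stands for a ranking-based preset condition such as the top 10 mentioned earlier.

```python
def select_key_runs(halves, weights, top_n=10):
    """Pick the `top_n` heaviest half-sentences and group consecutive ones into runs."""
    ranked = sorted(range(len(halves)), key=lambda i: weights[i], reverse=True)
    chosen = sorted(ranked[:top_n])  # restore document order

    runs, current = [], []
    for index in chosen:
        if current and index != current[-1] + 1:
            runs.append(current)     # gap in the identifiers: close the run
            current = []
        current.append(index)
    if current:
        runs.append(current)
    return [[halves[i] for i in run] for run in runs]
```

Each returned run is a list of adjacent half-sentences that can be joined into one continuous passage.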
In one embodiment, the weight value of each half-sentence with respect to the normal-form text it belongs to may be obtained with a summary extraction algorithm.
In the above embodiment, the first half-sentences are extracted from the position of the key paragraph, and those with higher weight values are screened out to form the key paragraph. This yields the key paragraph of the normal-form text, and the extracted key paragraph is more accurate.
In one embodiment, the step of extracting the key whole sentence from the position in the normal-form text corresponding to the sequence text prompt word specifically includes: screening the second half-sentences in the normal-form text that correspond to the sequence text prompt word; obtaining the weight values corresponding to the screened second half-sentences; determining, among the screened second half-sentences, those whose weight values meet a second preset condition; and forming the key whole sentence from the second half-sentences that meet the second preset condition and are consecutive.
Here, the second half-sentences are the half-sentences obtained by splitting the text at the position in the normal-form text corresponding to the sequence text prompt word into half-sentences. The second preset condition is a condition that the weight values of some of the second half-sentences meet. Specifically, the second preset condition may be that the half-sentence's weight value ranks in the top 5 among the second half-sentences.
Specifically, the server may first split the normal-form text into half-sentences and obtain the weight value of each half-sentence. After obtaining the position corresponding to the sequence text prompt word, the server screens the split half-sentences for those at that position and takes them as the second half-sentences, obtains their weight values, and forms the key whole sentence from the second half-sentences whose weight values meet the second preset condition and which are consecutive.
In one embodiment, after obtaining the second half-sentences whose weight values meet the second preset condition, the server may traverse them to search for any of the full stop, exclamation mark or question mark. On finding any of these marks, the server takes the continuous text before that mark that contains no other full stop, exclamation mark or question mark as a key whole sentence.
For example, on matching the first full stop in the screened second half-sentences, the server takes the text before it as a key whole sentence; on matching the second full stop, it takes the text before the second full stop and after the first full stop as a key whole sentence; and so on, obtaining the key whole sentences corresponding to the screened second half-sentences.
In the above-described embodiments, by extracting the second half from the corresponding position of sequence text prompt word, and from Weighted value higher the second half are filtered out in 2 half to form crucial whole sentence, so that the whole sentence of key and normal formization text that extract This is more matched.
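The traversal for sentence-ending symbols can be illustrated with a short sketch. It assumes that the screened second half-sentences are already continuous and in document order, and the function name is hypothetical. Text after the last terminator is dropped because it does not yet form a whole sentence; a decimal point inside a number would be mistaken for a period by this simple pattern.

```python
import re

# A run of non-terminator characters followed by one terminator
# (period, exclamation mark or question mark, ASCII or full-width).
_WHOLE = re.compile(r"[^.!?。！？]+[.!?。！？]")


def key_whole_sentences(second_half_sentences):
    """Join the screened half-sentences and cut them at each terminator."""
    text = "".join(second_half_sentences)
    return [m.group().strip() for m in _WHOLE.finditer(text)]
```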
In one embodiment, the step of extracting from the normal form text the key half-sentences that include a keyword specifically includes: screening, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword; obtaining the weight values corresponding to the screened third half-sentences; and taking the third half-sentences whose weight values satisfy the third preset condition as the key half-sentences.
The third half-sentences are those half-sentences, among the half-sentences obtained by splitting the normal form text in units of half-sentences, that include the keyword. The third preset condition is a condition satisfied by the weight values of some of the third half-sentences. Specifically, the third preset condition may be that the weight value of the half-sentence ranks in the top 10 among the third half-sentences. The keyword is a preset keyword corresponding to the text category of the normal form text.
Specifically, the server splits the normal form text in units of half-sentences, calculates the weight value corresponding to each half-sentence, screens the third half-sentences containing the keyword from the half-sentences obtained by splitting, obtains the weight values corresponding to the screened third half-sentences, and takes the third half-sentences whose weight values satisfy the third preset condition as the key half-sentences.
In one embodiment, after obtaining the keyword corresponding to the normal form text, the server may split the normal form text in units of half-sentences and traverse the resulting half-sentences according to the keyword. When a half-sentence contains the keyword, that half-sentence is taken as a third half-sentence.
In one embodiment, there may be multiple keywords. The server matches each half-sentence obtained by splitting the normal form text against each keyword, to obtain the half-sentences of the normal form text that correspond to the keywords.
In the above embodiment, by using the keyword in the normal form feature corresponding to the class label, the key half-sentences used to generate the summary text can be found in the normal form text in a targeted manner.
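A minimal sketch of the keyword screening follows, assuming the half-sentences and their weight values are already available. The function name and the plain substring match are illustrative assumptions; the top-10 default mirrors the example given for the third preset condition.

```python
def key_half_sentences(half_sentences, weights, keywords, top_n=10):
    """Screen half-sentences containing any keyword, then keep the top_n
    by weight, returned in their original document order."""
    hits = [i for i, s in enumerate(half_sentences)
            if any(k in s for k in keywords)]
    top = sorted(hits, key=lambda i: weights[i], reverse=True)[:top_n]
    return [half_sentences[i] for i in sorted(top)]
```

Returning the survivors in document order rather than weight order is a design choice made here so that later splicing keeps the reading order of the source text.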
In one embodiment, step S208 specifically includes: screening, from the normal form text, the words that belong to a preset word set; and identifying the text category to which the normal form text belongs according to the screened words.
The preset word set is a set of words obtained by statistics on the text content of the normal form texts of each text category.
Specifically, the server may first segment the normal form text into words and match the resulting words against the preset word set, so as to screen out the words belonging to the preset word set, and then identify the text category to which the normal form text belongs according to the screened words.
In one embodiment, before matching the resulting words against the preset word set, the server may also remove the stop words from the normal form text, calculate the TF (term frequency) value of each word with respect to the normal form text, screen out the words whose term frequency satisfies a preset condition, and then match those words against the preset word set.
In one embodiment, the server may divide the preset word set by text category to obtain a sub preset word set for each text category. When a preset proportion of the screened words belong to a certain sub preset word set, the text category corresponding to that sub preset word set is taken as the text category of the normal form text.
In the above embodiment, the server generates in advance the preset word sets corresponding to the text categories. Through the correspondence between preset word sets and text categories, the text category corresponding to the normal form text can be obtained from the words in the normal form text, thereby classifying the normal form text.
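The proportion test against the sub preset word sets might look like the following sketch. The threshold value, the category names and the tie-breaking rule (the highest proportion wins) are assumptions made for illustration; the description only states that a preset proportion of the screened words must belong to a sub preset word set.

```python
def classify_by_word_sets(words, sub_word_sets, min_ratio=0.5):
    """Return the category whose sub preset word set covers at least
    min_ratio of the screened words, or None if no category qualifies."""
    if not words:
        return None
    best, best_ratio = None, 0.0
    for category, word_set in sub_word_sets.items():
        ratio = sum(w in word_set for w in words) / len(words)
        if ratio >= min_ratio and ratio > best_ratio:
            best, best_ratio = category, ratio
    return best
```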
In one embodiment, the step of identifying the text category to which the normal form text belongs according to the screened words specifically includes: obtaining the importance of each screened word to the normal form text; constructing, according to the importance, a text vector representing the normal form text; and inputting the text vector into a trained machine learning model to obtain the text category.
The importance of a word to the normal form text is the degree of association between the information expressed by the word and the information expressed by the normal form text as a whole. The greater the degree of association, the better the word represents the information the normal form text is intended to express, and the greater the importance of the word. The text vector is a vector representing the text content of the normal form text. The trained machine learning model is a machine learning model used to predict the text category of the normal form text: the text vector corresponding to the normal form text is input to the model, and the model outputs the corresponding text category. Specifically, the trained machine learning model may be an SVM model.
In one embodiment, the server may obtain the TF-IDF (term frequency-inverse document frequency) value corresponding to each screened word and take it as the importance of the word to the normal form text. The TF-IDF value is equal to the product of the TF value and the IDF value. The TF value indicates the frequency with which the word occurs in the normal form text, and the IDF value indicates the ability of the word to represent the text category to which the normal form text belongs. Here, $n_{i,j}$ denotes the number of times word $i$ occurs in normal form text $j$, and $m_j$ is the number of all words remaining in normal form text $j$ after stop words are removed; $P$ is the total number of documents in a document library that the server forms from a number of preset normal form texts, and $P_i$ is the number of documents in that library that contain word $i$.
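As a sketch of how the TF and IDF values could be written with the symbols defined above, following the conventional definitions: the ratio form of the TF value follows directly from the meanings of $n_{i,j}$ and $m_j$, whereas the logarithmic form of the IDF value and the absence of any smoothing term are assumptions rather than something the description confirms.

```latex
\mathrm{TF}_{i,j} = \frac{n_{i,j}}{m_j}, \qquad
\mathrm{IDF}_i = \log\frac{P}{P_i}, \qquad
\mathrm{TFIDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_i
```

Under this form, a word that occurs in every document of the library has $P_i = P$ and therefore an IDF value of zero, which matches the stated role of the IDF value as a measure of how well the word distinguishes a text category.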
In one embodiment, the server may collect statistics on the preset normal form texts of each text category, segment the normal form texts corresponding to each text category into words, and form the corresponding preset word set from the resulting words. The server generates an n-dimensional vector according to the preset word set and, for the words screened from the normal form text whose text vector is to be generated, generates the text vector corresponding to that normal form text according to the n-dimensional vector and the TF-IDF value corresponding to each word i. The text vector is then input to the trained machine learning model, which outputs the text category corresponding to the normal form text.
For example, a preset word set W = {w1, w2, w3, ..., wn} is generated from the normal form texts in the document library, and an n-dimensional vector V = (v1, v2, v3, ..., vn) is generated according to the preset word set W. If the words screened from normal form text M that appear in the preset word set W are w1, w3, ..., wk, ..., wn, with corresponding TF-IDF values TI1, TI3, ..., TIk, ..., TIn, then the vector generated for normal form text M is Vm = (TI1, 0, TI3, ..., TIk, ..., TIn).
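The construction of Vm can be sketched in a few lines. The sketch assumes the conventional TF-IDF form (term count divided by the number of words after stop-word removal, multiplied by the logarithm of P over P_i), which the description does not spell out; the function name and the representation of the document library as a list of word sets are also illustrative. The resulting vector is what would then be passed to the trained classifier, which is not shown here.

```python
import math
from collections import Counter


def tf_idf_vector(doc_words, vocabulary, doc_library):
    """Build an n-dimensional vector over the preset word set.

    doc_words   -- words of the text after stop-word removal
    vocabulary  -- ordered preset word set (w1 ... wn)
    doc_library -- list of word sets, one per library document
    """
    counts = Counter(doc_words)
    m_j = len(doc_words)          # words remaining in this text
    p_total = len(doc_library)    # P: documents in the library
    vector = []
    for word in vocabulary:
        n_ij = counts[word]
        p_i = sum(1 for doc in doc_library if word in doc)
        if n_ij == 0 or p_i == 0:
            vector.append(0.0)    # word absent: component stays 0
        else:
            vector.append((n_ij / m_j) * math.log(p_total / p_i))
    return vector
```

Words of the preset word set that do not occur in the text keep a zero component, which corresponds to the 0 in the second position of Vm in the example above.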
In the above embodiment, by inputting the text vector representing the normal form text into the trained machine learning model, the text category corresponding to the normal form text can be obtained accurately even for normal form texts of the same text category that are published by editors with different formatting habits.
In one embodiment, step S208 specifically includes: classifying the normal form text to obtain a preliminary classification result; obtaining historical data corresponding to the preliminary classification result; comparing the normal form text with the historical data to obtain a comparison result; and, when the comparison result satisfies a fourth preset condition, taking the preliminary classification result as the text category to which the normal form text belongs.
The preliminary classification result is the classification result obtained by classifying the normal form text. The server may further verify the preliminary classification result against the historical data. The historical data is data corresponding to the category obtained by classification. The fourth preset condition is a preset value of the degree of matching between the normal form text and the historical data.
Specifically, after classifying the normal form text, the server may compare the normal form text with the historical data corresponding to the resulting category. If the comparison result satisfies the fourth preset condition, the preliminary classification result may be taken as the text category corresponding to the normal form text.
In one embodiment, the server may establish in advance, according to the historical data, a mapping between historical normal form text data and preliminary classification results. After obtaining the preliminary classification result corresponding to the normal form text, the server pulls the historical normal form text data corresponding to that classification result according to the mapping and compares the normal form text with it, for example by a duplicate-checking method. If the repetition rate reaches a preset value, the preliminary classification result is taken as the text category corresponding to the normal form text.
In one embodiment, the preliminary classification result may be obtained by inputting the text vector corresponding to the normal form text into the trained machine learning model and taking the classification result that the model outputs.
In the above embodiment, the preliminary classification result corresponding to the normal form text is compared with the historical data to further verify its accuracy, so that the text category obtained for the normal form text is more accurate.
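The verification step can be sketched as below. The description does not name a particular duplicate-checking method, so this sketch substitutes the similarity ratio of `difflib.SequenceMatcher` from the Python standard library as a stand-in for the repetition rate; the threshold of 0.6, the function name and the mapping structure are likewise assumptions.

```python
from difflib import SequenceMatcher


def verify_classification(text, preliminary, history_by_category, threshold=0.6):
    """Accept the preliminary category only if the text is similar enough
    to some historical text filed under that category."""
    history = history_by_category.get(preliminary, [])
    best = max(
        (SequenceMatcher(None, text, past).ratio() for past in history),
        default=0.0,
    )
    return preliminary if best >= threshold else None
```

Returning `None` leaves it to the caller to decide what happens when verification fails, since the description does not say how a rejected preliminary result is handled.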
As shown in Fig. 5, in one embodiment, step S210 specifically includes:
S502, allocate a template corresponding to the text category to each extracted key text.
Specifically, the server may store in advance the correspondence between text categories and templates. After the text category corresponding to the normal form text is obtained, the corresponding template is matched to the extracted key texts according to this correspondence.
S504, match corresponding connectives to the extracted key texts through the allocated template.
The connectives are words used to connect the extracted key texts. Specifically, one template may correspond to multiple connectives. For example, the connectives corresponding to the template for normal form texts whose text category is financial announcement may include "according to a preliminary estimate by the company's finance department". Since words of this kind usually do not appear in the normal form text, the extracted key texts do not contain them either. Matching connectives to the extracted key texts therefore makes the generated summary text more fluent.
S506, splice the key texts through the corresponding connectives to obtain the summary text.
Specifically, the key paragraph, key whole sentences and key half-sentences extracted from the normal form text may be spliced using the connectives corresponding to the template, to obtain the summary text corresponding to the normal form text.
In one embodiment, the server may also allocate the extracted key texts to templates corresponding to multiple text categories, so as to generate summary texts that express different key texts.
In the above embodiment, connectives are matched to the extracted key texts through the allocated template, and the key texts are spliced using the template and the connectives, yielding a summary text with more fluent sentences.
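Steps S502 to S506 can be illustrated with a small sketch. The template structure shown here (an ordered list of slots plus one connective per slot) is a hypothetical representation chosen for the example; the description only states that a template corresponds to a text category and to one or more connectives.

```python
# Hypothetical template store keyed by text category.
TEMPLATES = {
    "financial announcement": {
        "order": ["key_paragraph", "key_whole_sentence", "key_half_sentence"],
        "connectives": {
            "key_whole_sentence":
                "According to a preliminary estimate by the company's "
                "finance department, ",
        },
    },
}


def splice_summary(text_category, key_texts, templates=TEMPLATES):
    """Prefix each key text with its connective and join in template order."""
    template = templates[text_category]
    pieces = []
    for slot in template["order"]:
        if slot in key_texts:
            connective = template["connectives"].get(slot, "")
            pieces.append(connective + key_texts[slot])
    return " ".join(pieces)
```

Slots for which no key text was extracted are simply skipped, so the same template can serve announcements that yield only some of the three kinds of key text.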
In one embodiment, the summary text generation method further includes: determining the logical structure type of the summary text; separating logic unit texts from the summary text; and recombining the logic unit texts according to the text recombination mode corresponding to the logical structure type, to obtain a recombined summary text.
The logical structure type is the type of text logic of the generated summary text. Specifically, it may include at least one of a parallel structure type, a progressive structure type, an adversative (turning) structure type and a summary structure type. A logic unit text is a unit of text in the summary text that corresponds to the logical structure type. For example, if a summary text whose logical structure type is the parallel structure type includes two pieces of text A and B, and A and B are in a parallel relationship in terms of text logic, then A and B are both logic unit texts of that summary text.
The text recombination mode is the way in which the logic unit texts in summary texts of different logical structure types are recombined.
In one embodiment, the server may extract the logical words that occur in the summary text to determine the logical structure type corresponding to the summary text. Logical words reflect the logical structure type of the summary text. Examples of logical words include "wherein", "further", "although" and "but"; a logical word may also be a word that is repeated across the logic unit texts. Specifically, the server may generate in advance the correspondence between logical words and logical structure types and match the words in the summary text against the preset logical words. When a word matches a preset logical word, the logical structure type corresponding to the summary text can be determined according to the correspondence.
In one embodiment, after determining the logical structure type corresponding to the summary text, the server may split the summary text into the text before the logical word and the text after the logical word, thereby separating the logic unit texts from the summary text.
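The matching of logical words and the split around the matched word could be sketched as follows. The particular mapping from logical words to structure types is an assumption made for the example, since the description lists sample logical words but not which type each one indicates; the function name is also hypothetical, and only the first matching logical word is used.

```python
import re

# Assumed correspondence between logical words and structure types.
LOGIC_WORDS = {
    "furthermore": "progressive",
    "although": "adversative",
    "but": "adversative",
}


def split_at_logic_word(summary, logic_words=LOGIC_WORDS):
    """Return (structure type, logic unit texts) for the first preset
    logical word found, or (None, [summary]) if none matches."""
    lowered = summary.lower()
    for word, structure in logic_words.items():
        match = re.search(rf"\b{re.escape(word)}\b", lowered)
        if match:
            before = summary[:match.start()].strip(" ,;")
            after = summary[match.end():].strip(" ,;")
            return structure, [unit for unit in (before, after) if unit]
    return None, [summary]
```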
In one embodiment, after determining the logical structure type corresponding to the summary text, the server may match corresponding transitional words to the summary text and recombine the logic unit texts by means of the transitional words.
In one embodiment, when the logical structure type is the parallel structure type, the step of recombining the logic unit texts according to the text recombination mode corresponding to the logical structure type to obtain the recombined summary text includes: determining the head text and the tail text in each separated logic unit text; merging the head texts in a parallel expression to obtain a merged head text; merging the tail texts in a parallel expression to obtain a merged tail text; and joining the merged head text and the merged tail text by the parallel transitional word corresponding to the parallel structure type, to obtain the recombined summary text.
In a summary text whose logical structure type is the parallel structure type, the head texts of the logic unit texts are in a parallel relationship, and so are the tail texts. The parallel transitional word is a preset transitional word corresponding to the parallel structure type, for example "are respectively" or "are in turn". For example, suppose the obtained summary text is: "Main reason for loss reduction: compared with the same period last year, the company's sales of cement and commercial clinker increased in the current period, the gross margin of the main business improved, and the three expense items fell considerably in total. Main reason for loss: some subsidiaries that the company put into operation in recent years and some of the company's production lines have relatively high costs, production and sales volume is low in the current market environment, and production capacity cannot be brought into full play." From the repeated "main reason" in the summary text, the server can determine that the logical structure type of the summary text is the parallel structure type. The server separates the logic unit texts from the summary text, obtaining logic unit text A: "Main reason for loss reduction: compared with the same period last year, the company's sales of cement and commercial clinker increased in the current period, the gross margin of the main business improved, and the three expense items fell considerably in total" and logic unit text B: "Main reason for loss: some subsidiaries that the company put into operation in recent years and some of the company's production lines have relatively high costs, production and sales volume is low in the current market environment, and production capacity cannot be brought into full play". The server determines that the head text of logic unit text A is "main reason for loss reduction" and its tail text is "compared with the same period last year, the company's sales of cement and commercial clinker increased in the current period, the gross margin of the main business improved, and the three expense items fell considerably in total", and that the head text of logic unit text B is "main reason for loss" and its tail text is "some subsidiaries that the company put into operation in recent years and some of the company's production lines have relatively high costs, production and sales volume is low in the current market environment, and production capacity cannot be brought into full play". Merging the head texts of the logic unit texts gives the merged head text "the main reasons for loss reduction and for loss", and merging the tail texts gives the merged tail text "compared with the same period last year, the company's sales of cement and commercial clinker increased in the current period, the gross margin of the main business improved, and the three expense items fell considerably in total; some subsidiaries that the company put into operation in recent years and some of the company's production lines have relatively high costs, production and sales volume is low in the current market environment, and production capacity cannot be brought into full play". Joining the merged head text and the merged tail text by the parallel transitional word "are respectively" gives the recombined summary text: "The main reasons for loss reduction and for loss are respectively: compared with the same period last year, the company's sales of cement and commercial clinker increased in the current period, the gross margin of the main business improved, and the three expense items fell considerably in total. Some subsidiaries that the company put into operation in recent years and some of the company's production lines have relatively high costs, production and sales volume is low in the current market environment, and production capacity cannot be brought into full play."
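A minimal sketch of the parallel recombination follows. It assumes that each logic unit text has the form "head: tail", which holds for the example above but is not stated as a general rule, and the function name is hypothetical. Unlike the worked example, the sketch does not factor the shared wording ("main reason") out of the heads; it simply joins the heads with "and" and the tails with a semicolon.

```python
def recombine_parallel(units, transition="are respectively"):
    """Merge heads and tails of 'head: tail' units and join them
    with the parallel transitional word."""
    heads, tails = [], []
    for unit in units:
        head, _, tail = unit.partition(":")
        heads.append(head.strip())
        tails.append(tail.strip().rstrip("."))
    merged_head = " and ".join(heads)
    merged_tail = "; ".join(tails)
    return f"{merged_head} {transition}: {merged_tail}."
```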
In one embodiment, when the logical structure type is the progressive structure type, the step of recombining the logic unit texts according to the text recombination mode corresponding to the logical structure type to obtain the recombined summary text includes: determining the progressive order of the logic unit texts; obtaining the progressive transitional words that correspond to the progressive structure type and to the progressive order; and joining the logic unit texts according to the progressive order and the corresponding progressive transitional words, to obtain the recombined summary text.
The progressive order is the hierarchical order among the logic unit texts in a summary text whose logical structure type is the progressive structure type. For example, on the basis of logic unit text C, logic unit text D is .... The progressive transitional word is a preset transitional word corresponding to the progressive structure type, for example "such as", "and" or "moreover".
In one embodiment, when the logical structure type is the adversative structure type, the step of recombining the logic unit texts according to the text recombination mode corresponding to the logical structure type to obtain the recombined summary text includes: identifying, from the separated logic unit texts, the logic unit text with base semantics and the logic unit text with adversative semantics; determining the adversative conjunction in the summary text; and deleting the logic unit text with base semantics and the adversative conjunction from the summary text, to obtain the recombined summary text.
The logic unit text with base semantics is the subordinate clause in a summary text whose logical structure type is the adversative structure type, and the logic unit text with adversative semantics is the main clause. In a sentence with an adversative relationship, the meaning of the subordinate clause is opposite or contrary to that of the main clause.
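The adversative recombination can be sketched as below, assuming that the subordinate clause precedes the adversative conjunction and the main clause follows it. The list of conjunctions and the function name are illustrative assumptions.

```python
import re


def recombine_adversative(summary, adversatives=("but", "however", "yet")):
    """Delete the subordinate clause and the adversative conjunction,
    keeping only the main clause that follows the conjunction."""
    pattern = r"\b(?:" + "|".join(map(re.escape, adversatives)) + r")\b[,\s]*"
    match = re.search(pattern, summary, flags=re.IGNORECASE)
    if match is None:
        return summary  # no adversative conjunction: leave the text as is
    kept = summary[match.end():].strip()
    return kept[:1].upper() + kept[1:]
```

For "Although revenue grew 10%, but net profit fell sharply.", the clause up to and including "but" is removed and the result is "Net profit fell sharply."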
In one embodiment, when the logical structure type is the summary structure type, the step of recombining the logic unit texts according to the text recombination mode corresponding to the logical structure type to obtain the recombined summary text includes: determining the parent logical structure type of the summary text and the child logical structure type of each logic unit text; separating the corresponding sub logic unit texts from each logic unit text; recombining the sub logic unit texts separated from each logic unit text according to the text recombination mode corresponding to the respective child logical structure type, to obtain recombined logic unit texts; and recombining the recombined logic unit texts according to the text recombination mode corresponding to the parent logical structure type, to obtain the recombined summary text.
The summary structure type is the logical structure type of a summary text in which two or more logical structure types are nested. The parent logical structure type of the summary text and the child logical structure types of the logic unit texts in the summary text are the nested logical structure types. For example, suppose the parent logical structure type of the summary text is the parallel structure type, and logic unit texts A, B and C in a parallel relationship are separated from the summary text. The sub logic unit texts separated from A, B and C are A1, A2 and A3; B1, B2 and B3; and C1, C2 and C3, respectively. The child logical structure type corresponding to sub logic unit texts A1, A2 and A3 may be at least one of the parallel structure type, the adversative structure type or the progressive structure type, and likewise for the others. Once the child logical structure type of each logic unit text is obtained, the sub logic unit texts can be recombined according to the text recombination mode corresponding to the child logical structure type to obtain recombined logic unit texts, which are finally recombined according to the text recombination mode corresponding to the parent logical structure type to obtain the recombined summary text.
In the above embodiment, the logic unit texts in the summary text are recombined according to the logical structure type of the summary text, which avoids mechanically piecing the key texts together through the template, so that the recombined summary text can meet the requirements of a formal report.
In one embodiment, the summary text generation method further includes: obtaining user data; determining the push priority of the summary text according to the user data and the summary text; and pushing the summary text to the terminal corresponding to the user data according to the push priority.
The user data is data that reflects the patterns with which a user reads normal form files on the network. Specifically, user data may include download volume, frequency of use, number of visits, visit rate, dwell time and the like. The push priority is the value level of a given summary text with respect to different user data. It can be understood that, since each set of user data reflects the characteristics with which the corresponding user reads normal form files on the network, the same summary text usually has different push priorities for different user data.
In one embodiment, the server may record the registration information of users on the website where the normal form files are located, mine the user data corresponding to the registration information, and match the user data against the generated summary text. According to the matching result, the server determines the push priority of the summary text with respect to each set of user data, and pushes the summary text to the terminals corresponding to the user data according to the obtained push priorities.
In the above embodiment, by determining the push priority between the generated summary text and the user data, the generated summary text can be pushed more accurately, according to that priority, to the terminal where the user data is located.
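How the user data is matched against the summary text is not specified, so the following is only one possible sketch. It assumes, hypothetically, that mining the user data yields a list of interest words and a visit frequency per user, and it scores each user by the share of summary words that overlap with those interests, scaled by the visit frequency. All field and function names are illustrative.

```python
def push_priority(user, summary_words):
    """Score one user: overlap between summary words and the user's
    interest words, scaled by how often the user visits."""
    summary_set = set(summary_words)
    overlap = len(summary_set & set(user["interest_words"]))
    share = overlap / max(len(summary_set), 1)
    return share * user.get("visit_frequency", 1.0)


def rank_users(users, summary_words):
    """Order users from highest to lowest push priority."""
    return sorted(users, key=lambda u: push_priority(u, summary_words),
                  reverse=True)
```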
In one embodiment, as shown in Fig. 6, the summary text generation method specifically includes:
S601, monitor the announcement file source;
S602, when it is detected that a new announcement file has been added to the announcement file source, obtain the newly added announcement file;
S603, extract the normal form text from the announcement file;
S604, read the class label associated with the announcement file;
S605, query the preset normal form feature corresponding to the class label;
S606-1, when the normal form feature includes the paragraph position of the key paragraph in the normal form text, screen the first half-sentences split out of the normal form text at that paragraph position; obtain the weight values corresponding to the screened first half-sentences; determine, among the screened first half-sentences, those whose weight values satisfy the first preset condition; and form the key paragraph from the first half-sentences that satisfy the first preset condition and are continuous;
S606-2, when the normal form feature includes a sequence text prompt word, screen the second half-sentences in the normal form text that correspond to the sequence text prompt word; obtain the weight values corresponding to the screened second half-sentences; determine, among the screened second half-sentences, those whose weight values satisfy the second preset condition; and form the key whole sentence from the second half-sentences that satisfy the second preset condition and are continuous;
S606-3, when the normal form feature includes a keyword, extract from the normal form text the key half-sentences that include the keyword: screen, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword; obtain the weight values corresponding to the screened third half-sentences; and take the third half-sentences whose weight values satisfy the third preset condition as the key half-sentences;
S607, screen from the normal form text the words that belong to the preset word set;
S608, obtain the importance of each screened word to the normal form text;
S609, construct a text vector representing the normal form text according to the importance;
S610, input the text vector into the trained machine learning model to obtain the text category;
S611, allocate a template corresponding to the text category to each extracted key text;
S612, match corresponding connectives to the extracted key texts through the allocated template, and splice the key texts through the corresponding connectives to obtain the summary text;
S613, determine the logical structure type of the summary text, and separate the logic unit texts from the summary text;
S614, recombine the logic unit texts according to the text recombination mode corresponding to the logical structure type, to obtain the recombined summary text;
S615, obtain user data, and determine the push priority of the summary text according to the user data and the summary text;
S616, push the summary text to the terminal corresponding to the user data according to the push priority.
In the above summary text generation method, the key texts can be extracted from the normal form text through the queried normal form feature corresponding to the normal form text. After the text category corresponding to the normal form text is identified, the extracted key texts can be spliced by means of the template corresponding to that text category, to obtain the summary text. Since the entire process of generating the summary text requires no manual participation, the efficiency of rewriting text can be greatly improved.
Fig. 6 is a schematic flowchart of the summary text generation method in one embodiment. It should be understood that although the steps in the flowchart of Fig. 6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated herein, there is no strict order limitation on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 7 and Fig. 8 are schematic diagrams of generating the corresponding summary text from an announcement file published by an exchange, according to an embodiment of the present application. With reference to Fig. 7 and Fig. 8, sources of fixed normal form announcement files, such as an exchange, are monitored in real time to obtain published announcement files. Key sentences and paragraphs are then refined from the text content of the announcement file, a template corresponding to the file category of the announcement file is matched, and the text structure is reconstructed using the template to obtain the summary text. Finally, the news value of the resulting summary text is judged and the summary text is pushed to terminals. In this way, summaries of fixed normal form official documents, represented by the announcement files of listed companies, are written by machine; news points are mined from announcement files without manual effort; and thousands of listed companies can be monitored around the clock, 24 hours a day and 7 days a week.
For example, the following is the text content of an announcement file: "
XX Holdings: Announcement on the Transfer of XX Property and Strategic Cooperation with YY Property
Announcement date: 2017-07-01 00:00:00
Stock code: 6006XX  Stock abbreviation: XX Holdings  Number: Interim 2017-048  XX Holdings Group Co., Ltd. Announcement on the Transfer of XX Property and Strategic Cooperation with YY Property
The board of directors of the Company and all of its directors guarantee that the content of this announcement contains no false records, misleading statements or material omissions, and bear joint and several liability for the truthfulness, accuracy and completeness of its content.
Important notice
Transaction content
The Company plans to transfer 100% of the equity of its wholly-owned subsidiary Shanghai XX Property Service Co., Ltd. to YY Property Management Service Co., Ltd. for a consideration of RMB 1,000,000,000.00. Meanwhile, the Company intends to form a strategic partnership with YY Property to support the rapid future development of YY Property's property management and various community value-added services. To this end, the Company and YY Property have reached a consensus that, during the five years from 2018 to 2022, real estate projects developed by the Company will use YY Property as the priority property partner.
This transaction does not constitute a related-party transaction.
This transaction does not constitute a major asset restructuring.
There are no major legal obstacles to the implementation of this transaction.
I. Overview of this transaction
1. Basic information
Recently, the Company signed a "Cooperation Framework Agreement" with YY Group Co., Ltd. and YY Property Management Service Co., Ltd. The Company and its wholly-owned subsidiary XX Holding Group Co., Ltd. (hereinafter "XX Group") signed an "Equity Transfer Agreement" concerning XX Property with YY Property Management Service Co., Ltd. (hereinafter "YY Property") and Shanghai XX Property Service Co., Ltd. (hereinafter "XX Property" or the "target company"). The Company plans to transfer 100% of the equity of its wholly-owned subsidiary Shanghai XX Property Service Co., Ltd. (hereinafter the "target equity") to YY Property Management Service Co., Ltd. for a consideration of RMB 1,000,000,000.00. Meanwhile, the Company intends to form a strategic partnership with YY Property to support the rapid future development of YY Property's property management and various community value-added services. To this end, the Company and YY Property have reached a consensus that, during the five years from 2018 to 2022, real estate projects developed by the Company will use YY Property as the priority property partner.
2. Approvals and other procedures still to be fulfilled for the transaction to take effect
This transaction has taken effect.
……
IV. Main content of the agreement
1. Equity transfer price
The transfer price for 100% of the equity of XX Property is RMB 1,000,000,000. At this price, YY Property (the "transferee") will acquire 90.9091% and 9.0909% of the equity of XX Property from XX Group and the Company (collectively the "transferors") for RMB 909.091 million and RMB 90.909 million, respectively.
2. Equity closing
The parties agree that the day on which all of the following conditions are satisfied is the closing date of the target equity:
(1) this equity transfer has been approved by resolution of the Company's board of directors;
(2) 51% of the equity transfer price has been paid;
(3) the parties have signed the "Equity Closing Confirmation".
The parties shall use their best efforts to ensure that the closing date is no later than June 30, 2017.
During the period from the closing date to the date on which the industrial and commercial registration change is completed, all rights and obligations relating to the target equity, and the profits and losses of the target company, belong to the transferee.
3. Payment of the equity transfer price
YY Property shall pay 51% of the equity transfer price to the transferors before June 30, 2017;
within 60 days of the signing of the Equity Transfer Agreement, it shall pay the remaining 49% of the equity transfer price to the transferors.
4. Strategic cooperation
After this equity transfer is completed, YY Property will operate under a dual-brand strategy of "YY Property" and "XX Property", building XX Property into a benchmark brand for residential and commercial real estate management projects, expanding XX Property's business scale and capacity, and raising its service management level, so that YY Property and XX Property jointly preserve and increase the value of the real estate projects developed by XX Group. The Company undertakes that, to the extent permitted by applicable laws, regulations and rules, it will license the trademarks (including but not limited to the "XX" word marks and figurative marks) and brands needed for XX Property's service operations to XX Property free of charge for use within its property management business, for a free license period of no less than 5 years; when the license expires, YY Property will negotiate separately with the Company.
The Company will support the rapid development of YY Property's property management and various community value-added services and explore cooperation at the equity level; YY Property will support the development of businesses such as XX community finance, smart home, community elderly care and telemedicine. Real estate projects subsequently developed by the Company will use YY Property as the priority property partner. During the five years from January 1, 2018 to December 31, 2022, provided that all applicable laws, regulations and rules are complied with, the Company will ensure that YY Property obtains, through lawful and compliant procedures, 7 million square metres of property service area each year from properties developed by the Company; on this basis, it will help YY Property obtain on a priority basis, through lawful and compliant procedures, a further 3 million square metres of property service area each year from properties developed by the Company. Provided that the relevant government guidance price (if any) is complied with, the average unit property management fee for the above property service area will in principle be no lower than the fair market price.
Given that YY Property is positioned as a mid-to-high-end property service provider, for real estate projects that the Company develops and lawfully delivers to YY Property for property services, the price shall be reasonably determined together with YY Property so as to match its level of property service, provided that the relevant government guidance price (if any) and other legal requirements on pricing are complied with.
Meanwhile, in accordance with market-oriented principles, the Company will provide policy support for the property management undertaken by YY Property in respect of early-intervention fees, rework fees, house inspection fees and the like.
……”
The summary text obtained by the summary text generation method in the above embodiment is as follows:
"[XX Holdings: company intends to transfer XX Property for 1 billion yuan] XX Holdings announced on the evening of June 30 that the company intends to transfer 100% of the equity of Shanghai XX Property Service Co., Ltd. to YY Property Management Service Co., Ltd. for a consideration of 1 billion yuan. The two parties will also form a strategic partnership, and real estate projects developed by the company in the five years from 2018 to 2022 will use YY Property as the priority property partner. XX Property is a first-tier supplier of XX Group; in 2016 it achieved revenue of 122 million yuan and a net profit of 2.8633 million yuan."
Fig. 9 is a schematic diagram of extracting crucial text from the text content of an announcement file published by an exchange, according to an embodiment of the present application. As shown in Fig. 9, the crucial paragraph extracted from the original announcement file is: "The business result for the first half of 2017 is expected to be a loss, with net profit attributable to shareholders of the listed company of about -43.06 million yuan, a reduction in loss of 66.39 million yuan compared with the same period last year. The business result for the first half of 2017 is expected to be a loss, with net profit attributable to shareholders of the listed company of about -43.06 million yuan, a reduction in loss of 66.39 million yuan compared with the same period last year."; the extracted crucial whole sentences are: "Main reasons for the reduced loss: compared with the same period last year, the company's sales volumes of cement and commercial clinker increased in the current period, the gross margin of the main business rose, and the total of the three expense categories fell considerably.", "Main reasons for the loss: some subsidiaries and some production lines that the company put into operation in recent years have relatively high costs; in the current market environment production and sales volumes are low and production capacity cannot be used effectively."; the extracted crucial half-sentences are: "Fujian Cement (stock code 6008XX)", "net profit attributable to shareholders of the listed company in the same period last year was -109 million yuan.".
Figure 10 is a schematic diagram of extracting crucial text from an announcement file according to an embodiment of the present application. As shown in Figure 10, an announcement file in PDF format is downloaded from the announcement source and converted to a TXT file to obtain the text content of the announcement file. Then, based on the preset crucial text positions, chapter sequence prompt words and keyword information for that announcement file, the crucial paragraphs, crucial whole sentences and crucial half-sentences are extracted from the text content using the TextRank algorithm.
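By way of illustration only, the selection of candidates by the three kinds of normal form feature described for Figure 10 might be sketched as follows. The function names, the punctuation used to split half-sentences and the sample data are assumptions made for this sketch and do not come from the application; the TextRank weighting step is deliberately left out.

```python
import re

def split_half_sentences(text):
    """Split text into half-sentences at commas, semicolons and sentence ends (assumed rule)."""
    parts = re.split(r"[,;，；。！？!?]|\.(?:\s|$)", text)
    return [p.strip() for p in parts if p.strip()]

def extract_candidates(paragraphs, paragraph_positions, prompt_words, keywords):
    """Collect candidate crucial text for each kind of normal form feature."""
    # Crucial paragraphs: taken directly from the preset paragraph positions.
    crucial_paragraphs = [paragraphs[i] for i in paragraph_positions
                          if 0 <= i < len(paragraphs)]
    # Crucial whole sentences: located by a sequence text prompt word.
    crucial_sentences = [p for p in paragraphs
                         if any(p.startswith(w) for w in prompt_words)]
    # Crucial half-sentences: any half-sentence containing a keyword.
    crucial_halves = [h for p in paragraphs for h in split_half_sentences(p)
                      if any(k in h for k in keywords)]
    return crucial_paragraphs, crucial_sentences, crucial_halves

paragraphs = [
    "Fujian Cement (stock code 6008XX)",
    "The half-year result is expected to be a loss, a smaller loss than last year.",
    "Main reasons for the loss: some production lines have high costs.",
]
paras, sentences, halves = extract_candidates(
    paragraphs,
    paragraph_positions=[1],
    prompt_words=["Main reasons"],
    keywords=["stock code"],
)
```

In a fuller version, each candidate would then be weighted (for example by TextRank, as the embodiment states) before being accepted as crucial text.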
Figure 11 shows a schematic diagram of splicing the crucial text extracted from an announcement file. As shown in Figure 11, the extracted crucial text is first simplified: redundant information is removed, missing or disordered sequence numbers are checked and corrected, and errors in units and data are corrected. Each coarsely extracted crucial text is then assigned a matching template, and the extracted crucial text is spliced using the template to obtain the summary text. Finally, the text logic of the summary text is identified and modal particles are matched to the summary text, further refining the generated summary text.
Figure 12 shows a schematic diagram of identifying the text category of an announcement file. As shown in Figure 12, the announcement source is monitored to obtain an announcement file and its text content; the text content is segmented into a word sequence; words whose TF value reaches a preset threshold are selected from the word sequence; the selected words are input to an SVM model to obtain a preliminary classification result; and the text content of the announcement file is compared with the historical data corresponding to the preliminary classification result. If the matching degree reaches a preset value, the preliminary classification result is taken as the text category of the announcement file.
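The word selection step of Figure 12 can be illustrated with a minimal sketch. It assumes that the TF value of a word is its count divided by the total number of words in the sequence; the function name, the threshold and the sample words are hypothetical, and the SVM classification that follows in the embodiment is not shown.

```python
from collections import Counter

def select_by_tf(words, threshold):
    """Keep words whose term frequency (count / total words) reaches the threshold."""
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return sorted(w for w, c in counts.items() if c / total >= threshold)

# A toy word sequence standing in for a segmented announcement.
words = ["equity", "transfer", "equity", "company",
         "transfer", "equity", "the", "price"]
selected = select_by_tf(words, threshold=0.25)
```

With this toy input, "equity" (3 of 8 words) and "transfer" (2 of 8 words) reach the threshold and the remaining words are discarded.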
Figure 13 shows a schematic diagram of splicing the extracted crucial text. As shown in Figure 13, each extracted crucial text is matched to a template of the corresponding text category, conjunctions are matched to the extracted crucial texts by means of the template, and the extracted crucial texts are then spliced via the matched conjunctions to obtain the summary text.
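As a rough sketch of the splicing shown in Figure 13, a template can be modelled as an ordered list of slots, each paired with the conjunction to place before the crucial text that fills it. This representation, the slot names and the sample texts are assumptions for illustration; the application does not specify how a template is stored.

```python
def splice_with_template(template, crucial_texts):
    """Join crucial texts in template order, prefixing each with its conjunction."""
    pieces = []
    for slot, conjunction in template:
        text = crucial_texts.get(slot)
        if text:  # slots with no extracted crucial text are skipped
            pieces.append(f"{conjunction}{text}")
    return " ".join(pieces)

# Hypothetical template for a "performance forecast" text category.
template = [("company", ""), ("forecast", "expects "), ("reason", "mainly because ")]
crucial_texts = {
    "company": "Fujian Cement",
    "forecast": "a half-year loss",
    "reason": "production costs are high",
}
summary = splice_with_template(template, crucial_texts)
```

Keeping the conjunctions in the template rather than in the extracted text means the same crucial texts can be spliced differently for different text categories.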
Figure 14 and Figure 15 give schematic diagrams of matching modal particles for summary text whose logical structure type is a parallel structure. With reference to Figure 14 and Figure 15, when the sentence pattern of the generated summary text is judged to be a parallel structure, the parallel summary text is separated into logic unit texts, i.e., parallel clause bundles. The head content and the tail content of the parallel clause bundles are then merged respectively, and the modal particle "respectively" is matched, giving the finished summary text.
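The merging of parallel clause bundles in Figures 14 and 15 might look like the following sketch, which assumes that the clauses are whitespace-separated and share a common head and tail. The function name, the joining word "and" and the placement of the particle at the end of the sentence are choices made for this English illustration only.

```python
def merge_parallel(clauses, particle="respectively"):
    """Merge the shared head and tail of parallel clauses and append a particle."""
    tokens = [c.split() for c in clauses]
    # Length of the common head (leading tokens identical in every clause).
    head = 0
    while all(len(t) > head for t in tokens) and len({t[head] for t in tokens}) == 1:
        head += 1
    # Length of the common tail, not overlapping the head.
    tail = 0
    while (all(len(t) - head > tail for t in tokens)
           and len({t[-1 - tail] for t in tokens}) == 1):
        tail += 1
    middles = [" ".join(t[head:len(t) - tail]) for t in tokens]
    first = tokens[0]
    parts = first[:head] + [" and ".join(middles)] + first[len(first) - tail:] + [particle]
    return " ".join(p for p in parts if p)

merged = merge_parallel([
    "YY Property will acquire 90.9091% of the equity",
    "YY Property will acquire 9.0909% of the equity",
])
```

Only the parts that differ between the clauses are repeated in the result; the shared head and tail appear once.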
In one embodiment, as shown in Figure 16, a summary text generating device 1600 is provided. Referring to Figure 16, the summary text generating device 1600 includes: an acquisition module 1602, a query module 1604, an extraction module 1606, an identification module 1608 and a splicing module 1610.
The acquisition module 1602 is configured to obtain the normal form text and the corresponding category label;
The query module 1604 is configured to query the preset normal form feature corresponding to the category label;
The extraction module 1606 is configured to extract the crucial text from the normal form text according to the normal form feature;
The identification module 1608 is configured to identify the text category to which the normal form text belongs;
The splicing module 1610 is configured to splice the extracted crucial text according to the template corresponding to the text category, obtaining the summary text.
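The way the five modules listed above hand data to one another can be sketched as a single function whose parameters stand in for the modules. All names, the callable-based interface and the toy data are hypothetical and serve only to show the order of the data flow.

```python
def generate_summary(acquire, feature_table, extract, identify, templates, splice):
    """Run the five stages in order; each argument stands in for one module."""
    text, label = acquire()                  # acquisition module 1602
    feature = feature_table[label]           # query module 1604
    crucial = extract(text, feature)         # extraction module 1606
    category = identify(text)                # identification module 1608
    return splice(templates[category], crucial)  # splicing module 1610

summary = generate_summary(
    acquire=lambda: ("Company A plans to transfer Company B. "
                     "The price is 1 billion yuan.", "announcement"),
    feature_table={"announcement": ["transfer"]},
    extract=lambda text, keywords: [s.strip() for s in text.split(".")
                                    if any(k in s for k in keywords)],
    identify=lambda text: "equity_transfer",
    templates={"equity_transfer": "[Equity transfer] {}"},
    splice=lambda template, crucial: template.format("; ".join(crucial)),
)
```

Note that the category label used to look up the normal form feature and the text category used to look up the template are two separate keys, as in the module descriptions.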
With the above summary text generating device, the crucial text can be extracted from the normal form text by means of the queried normal form feature corresponding to that text. Once the text category of the normal form text has been identified, the extracted crucial text can be spliced together using the template corresponding to that text category, yielding the summary text. Because the entire process of generating the summary text requires no manual involvement, the efficiency of rewriting text can be greatly improved.
In one embodiment, as shown in Figure 17, the acquisition module 1602 in the summary text generating device 1600 includes: an announcement file source monitoring module 1702, an announcement file acquisition module 1704, a normal form text extraction module 1706 and a category label reading module 1708.
The announcement file source monitoring module 1702 is configured to monitor the announcement file source;
The announcement file acquisition module 1704 is configured to obtain the newly added announcement file when it is detected that an announcement file has been newly added to the announcement file source;
The normal form text extraction module 1706 is configured to extract the normal form text from the announcement file;
The category label reading module 1708 is configured to read the category label associated with the announcement file.
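One simple way to detect newly added announcement files, shown here purely as an assumption since the application does not say how the source is monitored, is to compare each listing of the source against the set of files already seen. The function name and file names are hypothetical.

```python
def find_new_files(listing, seen):
    """Return files not seen before, in listing order, and record them as seen."""
    new = [name for name in listing if name not in seen]
    seen.update(new)
    return new

seen = set()
# Two successive polls of a hypothetical announcement source.
first_poll = find_new_files(["2017-047.pdf"], seen)
second_poll = find_new_files(["2017-047.pdf", "2017-048.pdf"], seen)
```

Each file returned by a poll would then be passed on for normal form text extraction and category label reading.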
In one embodiment, the crucial text includes at least one of a crucial paragraph, a crucial whole sentence and a crucial half-sentence. As shown in Figure 18, the extraction module 1606 in the summary text generating device 1600 includes: a crucial paragraph extraction module 1802, a crucial whole sentence extraction module 1804 and a crucial half-sentence extraction module 1806.
The crucial paragraph extraction module 1802 is configured to, when the normal form feature includes the paragraph position of the crucial paragraph in the normal form text, extract the crucial paragraph from the normal form text according to the paragraph position;
The crucial whole sentence extraction module 1804 is configured to, when the normal form feature includes a sequence text prompt word, extract the crucial whole sentence from the position in the normal form text corresponding to the sequence text prompt word;
The crucial half-sentence extraction module 1806 is configured to, when the normal form feature includes a keyword, extract from the normal form text the crucial half-sentence that includes the keyword.
In one embodiment, the crucial paragraph extraction module 1802 includes: a first screening module, a first half-sentence weight value acquisition module, a first half-sentence screening module and a crucial paragraph forming module. The first screening module is configured to screen the first half-sentences split out at the paragraph position in the normal form text; the first half-sentence weight value acquisition module is configured to obtain the weight values corresponding to the screened first half-sentences; the first half-sentence screening module is configured to determine, among the screened first half-sentences, the first half-sentences whose weight values meet a first preset condition; the crucial paragraph forming module is configured to form the crucial paragraph from the first half-sentences that meet the first preset condition and are consecutive.
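A minimal sketch of the last two steps, under the assumption that the first preset condition is "weight value at or above a threshold" and that each run of consecutive qualifying half-sentences forms one crucial paragraph. The function name, the weights and the threshold are invented for the example.

```python
from itertools import groupby

def form_crucial_paragraphs(halves, weights, threshold):
    """Join each run of consecutive half-sentences whose weight meets the threshold."""
    flagged = [(h, w >= threshold) for h, w in zip(halves, weights)]
    return [
        ", ".join(h for h, _ in run)
        for keep, run in groupby(flagged, key=lambda item: item[1])
        if keep
    ]

halves = [
    "the half-year result is expected to be a loss",
    "net profit is about -43.06 million yuan",
    "see the attachment for details",
    "the loss narrows by 66.39 million yuan",
]
weights = [0.9, 0.8, 0.1, 0.7]  # hypothetical weight values
result = form_crucial_paragraphs(halves, weights, threshold=0.5)
```

The low-weight third half-sentence breaks the run, so the example yields two separate pieces rather than one.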
In one embodiment, the crucial whole sentence extraction module 1804 includes: a second screening module, a second half-sentence weight value acquisition module, a second half-sentence screening module and a crucial whole sentence forming module. The second screening module is configured to screen the second half-sentences in the normal form text that correspond to the sequence text prompt word; the second half-sentence weight value acquisition module is configured to obtain the weight values corresponding to the screened second half-sentences; the second half-sentence screening module is configured to determine, among the screened second half-sentences, the second half-sentences whose weight values meet a second preset condition; the crucial whole sentence forming module is configured to form the crucial whole sentence from the second half-sentences that meet the second preset condition and are consecutive.
In one embodiment, the crucial half-sentence extraction module 1806 includes: a third screening module, a third half-sentence weight value acquisition module and a crucial half-sentence forming module. The third screening module is configured to screen, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword; the third half-sentence weight value acquisition module is configured to obtain the weight values corresponding to the screened third half-sentences; the crucial half-sentence forming module is configured to take the third half-sentences whose weight values meet a third preset condition as crucial half-sentences.
In one embodiment, the identification module 1608 further includes a screening module, which is configured to screen from the normal form text the words that belong to a preset word set; the identification module is further configured to identify the text category to which the normal form text belongs according to the screened words.
In one embodiment, the identification module 1608 includes a significance level acquisition module, a text vector construction module and a text category identification module. The significance level acquisition module is configured to obtain the significance level of each screened word for the normal form text; the text vector construction module is configured to construct, according to the significance levels, a text vector representing the normal form text; the text category identification module is configured to input the text vector into a trained machine learning model to obtain the text category.
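As an illustration of building a text vector from significance levels, the sketch below uses a TF-IDF style weight over a fixed vocabulary. TF-IDF, the smoothing term, the vocabulary and the document frequencies are all assumptions for this example; the embodiment only requires some measure of each word's significance for the normal form text.

```python
import math
from collections import Counter

def text_vector(words, vocabulary, doc_freq, n_docs):
    """Build one TF-IDF style component per vocabulary term (assumed significance measure)."""
    counts = Counter(words)
    total = len(words) or 1
    return [
        (counts[term] / total) * math.log(n_docs / (1 + doc_freq.get(term, 0)))
        for term in vocabulary
    ]

vocabulary = ["equity", "loss"]          # hypothetical preset word set
doc_freq = {"equity": 9, "loss": 4}      # hypothetical document frequencies
vector = text_vector(["equity", "transfer", "equity", "price"],
                     vocabulary, doc_freq, n_docs=100)
```

The resulting vector has a fixed length equal to the vocabulary size, which is what a trained classifier such as an SVM would expect as input.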
In one embodiment, the identification module 1608 includes a classification module, a historical data acquisition module, a comparison module and a text category determination module. The classification module is configured to classify the normal form text to obtain a preliminary classification result; the historical data acquisition module is configured to obtain the historical data corresponding to the preliminary classification result; the comparison module is configured to compare the normal form text with the historical data to obtain a comparison result; the text category determination module is configured to take the preliminary classification result as the text category to which the normal form text belongs when the comparison result meets a fourth preset condition.
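The confirmation of a preliminary classification result against historical data could be sketched as follows. Jaccard similarity between word sets and a minimum-similarity threshold are stand-ins chosen for this example; the application does not state how the comparison is computed or what the fourth preset condition is.

```python
def confirm_category(words, preliminary, history, min_similarity):
    """Accept the preliminary category if the text resembles past texts of that category."""
    current = set(words)
    best = 0.0
    for past_words in history.get(preliminary, []):
        past = set(past_words)
        union = current | past
        if union:
            # Jaccard similarity: shared words divided by all distinct words.
            best = max(best, len(current & past) / len(union))
    return preliminary if best >= min_similarity else None

history = {"equity_transfer": [["equity", "transfer", "price", "agreement"]]}
words = ["equity", "transfer", "price", "company"]
accepted = confirm_category(words, "equity_transfer", history, min_similarity=0.5)
rejected = confirm_category(words, "equity_transfer", history, min_similarity=0.7)
```

Here the texts share 3 of 5 distinct words, a similarity of 0.6, so the preliminary result is kept at a threshold of 0.5 and discarded at 0.7.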
In one embodiment, the splicing module 1610 includes: a template distribution module, a conjunction matching module and a crucial text splicing module. The template distribution module is configured to assign a template corresponding to the text category to each extracted crucial text; the conjunction matching module is configured to match a corresponding conjunction to each extracted crucial text by means of the assigned template; the crucial text splicing module is configured to splice the crucial texts via the corresponding conjunctions to obtain the summary text.
In one embodiment, as shown in Figure 19, the summary text generating device 1600 further includes: a summary text logic determining module 1902, a summary text separation module 1904 and a recombination module 1906.
The summary text logic determining module 1902 is configured to determine the logical structure type of the summary text;
The summary text separation module 1904 is configured to separate logic unit texts from the summary text;
The recombination module 1906 is configured to recombine the logic unit texts according to the text recombination form corresponding to the logical structure type, obtaining the recombined summary text.
In one embodiment, the summary text generating device further includes: a user data acquisition module, a push priority acquisition module and a pushing module. The user data acquisition module is configured to obtain user data; the push priority acquisition module is configured to determine the push priority of the summary text according to the user data and the summary text; the pushing module is configured to push the summary text to the terminal corresponding to the user data according to the push priority.
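As one possible reading of the push priority, the sketch below scores each summary text by how many of a user's interest terms it mentions and ranks the summaries accordingly. Treating user data as a list of interest terms, and dropping summaries that match none of them, are assumptions made for illustration.

```python
def push_priority(user_interests, summary):
    """Count how many of the user's interest terms appear in the summary text."""
    text = summary.lower()
    return sum(1 for term in user_interests if term.lower() in text)

def rank_summaries(user_interests, summaries):
    """Order summaries by descending priority, dropping those with no match."""
    scored = [(push_priority(user_interests, s), i, s)
              for i, s in enumerate(summaries)]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [s for score, _, s in ranked if score > 0]

summaries = [
    "Fujian Cement expects a half-year loss",
    "XX Holdings intends to transfer XX Property",
    "ZZ Bank appoints a new director",
]
to_push = rank_summaries(["XX Holdings", "property", "cement"], summaries)
```

The second summary matches two interest terms and is pushed first; the third matches none and is not pushed to this user.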
With the above summary text generating device, the crucial text can be extracted from the normal form text by means of the queried normal form feature corresponding to that text. Once the text category of the normal form text has been identified, the extracted crucial text can be spliced together using the template corresponding to that text category, yielding the summary text. Because the entire process of generating the summary text requires no manual involvement, the efficiency of rewriting text can be greatly improved.
Figure 20 shows the internal structure of the computer equipment in one embodiment. The computer equipment may specifically be the server 120 in Fig. 1. As shown in Figure 20, the computer equipment includes a processor, a memory and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer equipment stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the summary text generation method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the summary text generation method.
Those skilled in the art will understand that the structure shown in Figure 20 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer equipment to which the solution is applied. A specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the summary text generating device provided by the present application may be implemented in the form of a computer program, and the computer program can run on the computer equipment shown in Figure 20. The memory of the computer equipment may store the program modules that make up the summary text generating device, for example the acquisition module 1602, the query module 1604, the extraction module 1606, the identification module 1608 and the splicing module 1610 shown in Figure 16. The computer program composed of these program modules causes the processor to execute the steps of the summary text generation method of the embodiments of the present application described in this specification.
For example, the computer equipment shown in Figure 20 may execute step S202 through the acquisition module 1602 of the summary text generating device shown in Figure 16, step S204 through the query module 1604, step S206 through the extraction module 1606, step S208 through the identification module 1608, and step S210 through the splicing module 1610.
In one embodiment, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the following steps: obtaining a normal form text and the corresponding category label; querying the preset normal form feature corresponding to the category label; extracting the crucial text from the normal form text according to the normal form feature; identifying the text category to which the normal form text belongs; and splicing the extracted crucial text according to the template corresponding to the text category to obtain the summary text.
In one embodiment, when causing the processor to execute the step of obtaining the normal form text and the corresponding category label, the computer program specifically causes the processor to execute the following steps: monitoring the announcement file source; when it is detected that an announcement file has been newly added to the announcement file source, obtaining the newly added announcement file; extracting the normal form text from the announcement file; and reading the category label associated with the announcement file.
In one embodiment, the crucial text includes at least one of a crucial paragraph, a crucial whole sentence and a crucial half-sentence. When causing the processor to execute the step of extracting the crucial text from the normal form text according to the normal form feature, the computer program specifically causes the processor to execute the following steps: when the normal form feature includes the paragraph position of the crucial paragraph in the normal form text, extracting the crucial paragraph from the normal form text according to the paragraph position; when the normal form feature includes a sequence text prompt word, extracting the crucial whole sentence from the position in the normal form text corresponding to the sequence text prompt word; and when the normal form feature includes a keyword, extracting from the normal form text the crucial half-sentence that includes the keyword.
In one embodiment, when causing the processor to execute the step of extracting the crucial paragraph from the normal form text according to the paragraph position, the computer program specifically causes the processor to execute the following steps: screening the first half-sentences split out at the paragraph position in the normal form text; obtaining the weight values corresponding to the screened first half-sentences; determining, among the screened first half-sentences, the first half-sentences whose weight values meet a first preset condition; and forming the crucial paragraph from the first half-sentences that meet the first preset condition and are consecutive.
In one embodiment, when causing the processor to execute the step of extracting the crucial whole sentence from the position in the normal form text corresponding to the sequence text prompt word, the computer program specifically causes the processor to execute the following steps: screening the second half-sentences in the normal form text that correspond to the sequence text prompt word; obtaining the weight values corresponding to the screened second half-sentences; determining, among the screened second half-sentences, the second half-sentences whose weight values meet a second preset condition; and forming the crucial whole sentence from the second half-sentences that meet the second preset condition and are consecutive.
In one embodiment, when causing the processor to execute the step of extracting from the normal form text the crucial half-sentence that includes the keyword, the computer program specifically causes the processor to execute the following steps: screening, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword; obtaining the weight values corresponding to the screened third half-sentences; and taking the third half-sentences whose weight values meet a third preset condition as crucial half-sentences.
In one embodiment, when causing the processor to execute the step of identifying the text category to which the normal form text belongs, the computer program specifically causes the processor to execute the following steps: screening from the normal form text the words that belong to a preset word set; and identifying the text category to which the normal form text belongs according to the screened words.
In one embodiment, when causing the processor to execute the step of identifying the text category to which the normal form text belongs, the computer program specifically causes the processor to execute the following steps: obtaining the significance level of each screened word for the normal form text; constructing, according to the significance levels, a text vector representing the normal form text; and inputting the text vector into a trained machine learning model to obtain the text category.
In one embodiment, when causing the processor to execute the step of identifying the text category to which the normal form text belongs, the computer program specifically causes the processor to execute the following steps: classifying the normal form text to obtain a preliminary classification result; obtaining the historical data corresponding to the preliminary classification result; comparing the normal form text with the historical data to obtain a comparison result; and, when the comparison result meets a fourth preset condition, taking the preliminary classification result as the text category to which the normal form text belongs.
In one embodiment, when causing the processor to execute the step of splicing the extracted crucial text according to the template corresponding to the text category to obtain the summary text, the computer program specifically causes the processor to execute the following steps: assigning a template corresponding to the text category to each extracted crucial text; matching a corresponding conjunction to each extracted crucial text by means of the assigned template; and splicing the crucial texts via the corresponding conjunctions to obtain the summary text.
In one embodiment, when executed by the processor, the computer program further causes the processor to execute the following steps: determining the logical structure type of the summary text; separating logic unit texts from the summary text; and recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain the recombined summary text.
In one embodiment, when executed by the processor, the computer program further causes the processor to execute the following steps: obtaining user data; determining the push priority of the summary text according to the user data and the summary text; and pushing the summary text to the terminal corresponding to the user data according to the push priority.
With the above computer-readable storage medium, the crucial text can be extracted from the normal form text by means of the queried normal form feature corresponding to that text. Once the text category of the normal form text has been identified, the extracted crucial text can be spliced together using the template corresponding to that text category, yielding the summary text. Because the entire process of generating the summary text requires no manual involvement, the efficiency of rewriting text can be greatly improved.
In one embodiment, a kind of computer equipment, including memory and processor are provided, is stored in memory Computer program, when computer program is executed by processor, so that processor executes following steps: obtaining normal form text and phase The class label answered;Preset normal form feature corresponding to query categories label;It is mentioned from normal form text according to normal form feature Take crucial text;Identify text categories belonging to normal form text;According to template corresponding to text categories by the key of extraction Text split, obtains summary texts.
In one embodiment, computer program makes processor execute acquisition normal form text and corresponding classification mark Following steps are specifically also executed when the step of label: monitoring bullet in file source;When monitoring that bullet in file source increases bullet in file newly, Then obtain newly-increased bullet in file;Normal form text is extracted from bullet in file;It reads and the associated class label of bullet in file.
In one embodiment, crucial text includes at least one of crucial paragraph, crucial whole sentence and key half;Meter Calculation machine program goes back processor specifically when executing the step for extracting crucial text from normal form text according to normal form feature Execute following steps: when normal form feature includes the paragraph position that critical section is fallen in normal form text, according to paragraph position from Crucial paragraph is extracted in normal form text;When normal form feature includes sequence text prompt word, from normal form text with sequence Crucial whole sentence is extracted at the corresponding position of text prompt word;When normal form feature includes keyword, extracted from normal form text Key including keyword half.
In one embodiment, when performing the step of extracting the key paragraph from the normal form text according to the paragraph position, the computer program causes the processor to specifically perform the following steps: screening the first half-sentences split out at the paragraph position in the normal form text; obtaining the weight values corresponding to the screened first half-sentences; determining, among the screened first half-sentences, those whose weight values satisfy a first preset condition; and forming the key paragraph from the first half-sentences that satisfy the first preset condition and are contiguous.
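The weight-based selection described above might look like the following sketch. It assumes that the first preset condition is a simple weight threshold and that each contiguous run of qualifying half-sentences yields one candidate key paragraph; the application leaves both choices open.

```python
from itertools import groupby


def key_paragraphs(half_sentences, weights, threshold=0.5):
    """Join consecutive half-sentences whose weight meets the threshold.

    Assumption: the "first preset condition" is weight >= threshold.
    """
    flagged = [(h, w >= threshold) for h, w in zip(half_sentences, weights)]
    runs = []
    # groupby only merges neighbours, so each group is a contiguous run.
    for keep, group in groupby(flagged, key=lambda item: item[1]):
        if keep:
            runs.append(", ".join(h for h, _ in group))
    return runs
```

The same grouping applies when second half-sentences located by a sequence text prompt word are assembled into a key whole sentence under a second preset condition.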
In one embodiment, when performing the step of extracting the key whole sentence from the position in the normal form text corresponding to the sequence text prompt word, the computer program causes the processor to specifically perform the following steps: screening, in the normal form text, the second half-sentences corresponding to the sequence text prompt word; obtaining the weight values corresponding to the screened second half-sentences; determining, among the screened second half-sentences, those whose weight values satisfy a second preset condition; and forming the key whole sentence from the second half-sentences that satisfy the second preset condition and are contiguous.
In one embodiment, when performing the step of extracting from the normal form text a key half-sentence that includes the keyword, the computer program causes the processor to specifically perform the following steps: screening, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword; obtaining the weight values corresponding to the screened third half-sentences; and taking the third half-sentences whose weight values satisfy a third preset condition as key half-sentences.
In one embodiment, when performing the step of identifying the text category to which the normal form text belongs, the computer program causes the processor to specifically perform the following steps: screening, from the normal form text, the words that belong to a preset word set; and identifying the text category to which the normal form text belongs according to the screened words.
In one embodiment, when performing the step of identifying, according to the screened words, the text category to which the normal form text belongs, the computer program causes the processor to specifically perform the following steps: obtaining the importance of each screened word to the normal form text; constructing, according to the importance, a text vector representing the normal form text; and inputting the text vector into a trained machine learning model to obtain the text category.
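For illustration only, the vector construction and classification could be sketched as below. Relative word frequency stands in for the importance measure and a nearest-centroid rule stands in for the trained machine learning model; both substitutions, together with the word set and centroid values, are assumptions of this sketch.

```python
import math

VOCAB = ["dividend", "cash", "merger", "acquire"]  # hypothetical preset word set


def text_vector(text, vocab=VOCAB):
    """Relative frequency of each preset word, used here as its importance."""
    words = text.lower().split()
    total = len(words) or 1
    return [words.count(w) / total for w in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(vector, centroids):
    """Stand-in for the trained model: pick the most similar class centroid."""
    return max(centroids, key=lambda name: cosine(vector, centroids[name]))
```

A real deployment would replace `classify` with whatever model was actually trained; only the shape of the data flow (words, importance, vector, category) is taken from the embodiment.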
In one embodiment, when performing the step of identifying the text category to which the normal form text belongs, the computer program causes the processor to specifically perform the following steps: classifying the normal form text to obtain a preliminary classification result; obtaining the historical data corresponding to the preliminary classification result; comparing the normal form text with the historical data to obtain a comparison result; and, when the comparison result satisfies a fourth preset condition, taking the preliminary classification result as the text category to which the normal form text belongs.
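The confirmation against historical data might be sketched as follows. Using character-level similarity from Python's `difflib` as the comparison, and a similarity threshold as the fourth preset condition, are assumptions made for this example.

```python
from difflib import SequenceMatcher


def confirm_category(text, preliminary, history, threshold=0.6):
    """Accept the preliminary category only if the text resembles past texts.

    Assumptions: `history` maps a category to earlier texts of that category,
    and the "fourth preset condition" is best similarity >= threshold.
    """
    past = history.get(preliminary, [])
    best = max((SequenceMatcher(None, text, p).ratio() for p in past), default=0.0)
    return preliminary if best >= threshold else None
```

Returning `None` when the condition fails leaves the caller free to reclassify or to route the text for other handling, which the embodiment does not specify.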
In one embodiment, when performing the step of splicing the extracted key text according to the template corresponding to the text category to obtain the summary text, the computer program causes the processor to specifically perform the following steps: assigning to each extracted key text a template corresponding to the text category; matching, by means of the assigned template, a corresponding conjunction to each extracted key text; and splicing the key texts by means of the corresponding conjunctions to obtain the summary text.
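A minimal sketch of the splicing step is given below. It assumes that a template is simply an ordered list of connectives, one per key-text position; the template contents and the category name are hypothetical.

```python
# Hypothetical templates: for each text category, the connective placed
# before the key text at each position.
TEMPLATES = {
    "cash_dividend": ["", "; in addition, ", "; finally, "],
}


def splice(key_texts, category, templates=TEMPLATES):
    """Join key texts using the connectives the category's template assigns."""
    connectives = templates[category]
    parts = []
    for index, text in enumerate(key_texts):
        # Reuse the last connective if there are more texts than slots.
        connective = connectives[min(index, len(connectives) - 1)]
        parts.append(connective + text)
    return "".join(parts) + "."
```

Keeping the connectives in per-category data rather than in code means a new text category only needs a new template entry.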
In one embodiment, the computer program, when executed by the processor, further causes the processor to perform the following steps: determining the logical structure type of the summary text; separating logic unit texts from the summary text; and recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain a recombined summary text.
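Two of the recombination forms set out in the claims, the parallel form and the adversative form, might be sketched as follows. Representing a logic unit as a (head, tail) pair, the English connectives, and the function names are assumptions of this sketch.

```python
def recombine_parallel(units, link=" respectively "):
    """Merge the heads and the tails of parallel units, then join them."""
    heads = " and ".join(head for head, _ in units)
    tails = " and ".join(tail for _, tail in units)
    return f"{heads}{link}{tails}"


def recombine_adversative(summary, conjunction="but"):
    """Drop the base-semantics unit and the conjunction; keep the turn."""
    _base, found, turn = summary.partition(f" {conjunction} ")
    return turn.strip() if found else summary
```

For example, the units ("Company A", "rose 3%") and ("Company B", "fell 2%") recombine into a single sentence with one merged head and one merged tail, which is shorter than repeating the sentence frame for each unit.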
In one embodiment, the computer program, when executed by the processor, further causes the processor to perform the following steps: obtaining user data; determining a push priority for the summary text according to the user data and the summary text; and pushing the summary text, according to the push priority, to the terminal corresponding to the user data.
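One way to picture the push-priority step is the sketch below, which assumes the user data is a list of interest terms and scores each summary text by how many of those terms it mentions. The scoring rule is hypothetical; the embodiment does not define how the priority is computed.

```python
def push_priority(user_interests, summary):
    """Score a summary by how many of the user's interest terms it mentions."""
    text = summary.lower()
    return sum(1 for term in user_interests if term.lower() in text)


def order_for_push(user_interests, summaries):
    """Return summaries sorted from highest to lowest priority."""
    return sorted(summaries,
                  key=lambda s: push_priority(user_interests, s),
                  reverse=True)
```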
With the above computer device, key text can be extracted from the normal form text by means of the normal form feature queried for that text. Once the text category of the normal form text has been identified, the extracted key text can be spliced using the template corresponding to that text category to obtain the summary text. Because the whole process of generating the summary text requires no manual participation, the efficiency of rewriting text can be greatly improved.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (19)

1. A summary text generation method, comprising:
obtaining a normal form text and a corresponding class label;
querying the preset normal form feature corresponding to the class label;
extracting key text from the normal form text according to the normal form feature;
identifying the text category to which the normal form text belongs; and
splicing the extracted key text according to the template corresponding to the text category to obtain a summary text.
2. The method according to claim 1, wherein obtaining the normal form text and the corresponding class label comprises:
monitoring a bulletin file source;
when it is detected that a new bulletin file has been added to the bulletin file source, obtaining the newly added bulletin file;
extracting the normal form text from the bulletin file; and
reading the class label associated with the bulletin file.
3. The method according to claim 1, wherein the key text includes at least one of a key paragraph, a key whole sentence, and a key half-sentence, and extracting key text from the normal form text according to the normal form feature comprises:
when the normal form feature includes the paragraph position of a key paragraph in the normal form text, extracting the key paragraph from the normal form text according to the paragraph position;
when the normal form feature includes a sequence text prompt word, extracting a key whole sentence from the position in the normal form text corresponding to the sequence text prompt word; and
when the normal form feature includes a keyword, extracting from the normal form text a key half-sentence that includes the keyword.
4. The method according to claim 3, wherein extracting the key paragraph from the normal form text according to the paragraph position comprises:
screening the first half-sentences split out at the paragraph position in the normal form text;
obtaining the weight values corresponding to the screened first half-sentences;
determining, among the screened first half-sentences, the first half-sentences whose weight values satisfy a first preset condition; and
forming the key paragraph from the first half-sentences that satisfy the first preset condition and are contiguous.
5. The method according to claim 3, wherein extracting the key whole sentence from the position in the normal form text corresponding to the sequence text prompt word comprises:
screening, in the normal form text, the second half-sentences corresponding to the sequence text prompt word;
obtaining the weight values corresponding to the screened second half-sentences;
determining, among the screened second half-sentences, the second half-sentences whose weight values satisfy a second preset condition; and
forming the key whole sentence from the second half-sentences that satisfy the second preset condition and are contiguous.
6. The method according to claim 3, wherein extracting from the normal form text a key half-sentence that includes the keyword comprises:
screening, from the half-sentences split out of the normal form text, the third half-sentences that include the keyword;
obtaining the weight values corresponding to the screened third half-sentences; and
taking the third half-sentences whose weight values satisfy a third preset condition as key half-sentences.
7. The method according to any one of claims 1 to 6, wherein identifying the text category to which the normal form text belongs comprises:
screening, from the normal form text, the words that belong to a preset word set; and
identifying the text category to which the normal form text belongs according to the screened words.
8. The method according to claim 7, wherein identifying the text category to which the normal form text belongs according to the screened words comprises:
obtaining the importance of each screened word to the normal form text;
constructing, according to the importance, a text vector representing the normal form text; and
inputting the text vector into a trained machine learning model to obtain the text category.
9. The method according to any one of claims 1 to 6, wherein identifying the text category to which the normal form text belongs comprises:
classifying the normal form text to obtain a preliminary classification result;
obtaining the historical data corresponding to the preliminary classification result;
comparing the normal form text with the historical data to obtain a comparison result; and
when the comparison result satisfies a fourth preset condition, taking the preliminary classification result as the text category to which the normal form text belongs.
10. The method according to any one of claims 1 to 6, wherein splicing the extracted key text according to the template corresponding to the text category to obtain the summary text comprises:
assigning to each extracted key text a template corresponding to the text category;
matching, by means of the assigned template, a corresponding conjunction to each extracted key text; and
splicing the key texts by means of the corresponding conjunctions to obtain the summary text.
11. The method according to any one of claims 1 to 6, wherein the method further comprises:
obtaining user data;
determining a push priority for the summary text according to the user data and the summary text; and
pushing the summary text, according to the push priority, to the terminal corresponding to the user data.
12. The method according to any one of claims 1 to 6, wherein the method further comprises:
determining the logical structure type of the summary text;
separating logic unit texts from the summary text; and
recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain a recombined summary text.
13. The method according to claim 12, wherein, when the logical structure type is a parallel structure type, recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain the recombined summary text comprises:
determining the head text and the tail text in each separated logic unit text;
merging the head texts using a parallel expression to obtain a merged head text;
merging the tail texts using a parallel expression to obtain a merged tail text; and
joining the merged head text and the merged tail text with the parallel connective corresponding to the parallel structure type to obtain the recombined summary text.
14. The method according to claim 12, wherein, when the logical structure type is a progressive structure type, recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain the recombined summary text comprises:
determining the progressive order of the logic unit texts;
obtaining progressive connectives that correspond to the progressive structure type and to the progressive order; and
joining the logic unit texts according to the progressive order and the corresponding progressive connectives to obtain the recombined summary text.
15. The method according to claim 12, wherein, when the logical structure type is an adversative structure type, recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain the recombined summary text comprises:
identifying, from the separated logic unit texts, the logic unit text with base semantics and the logic unit text with adversative semantics;
determining the adversative conjunction in the summary text; and
deleting the logic unit text with base semantics and the adversative conjunction from the summary text to obtain the recombined summary text.
16. The method according to claim 12, wherein, when the logical structure type is a summary structure type, recombining the logic unit texts according to the text recombination form corresponding to the logical structure type to obtain the recombined summary text comprises:
determining the parent-level logical structure type of the summary text and the child-level logical structure type of each logic unit text;
separating the corresponding sub-logic unit texts from each logic unit text;
recombining the sub-logic unit texts separated from each logic unit text according to the text recombination form corresponding to the respective child-level logical structure type to obtain recombined logic unit texts; and
recombining the recombined logic unit texts according to the text recombination form corresponding to the parent-level logical structure type to obtain the recombined summary text.
17. A summary text generation apparatus, wherein the apparatus comprises:
an obtaining module, configured to obtain a normal form text and a corresponding class label;
a query module, configured to query the preset normal form feature corresponding to the class label;
an extraction module, configured to extract key text from the normal form text according to the normal form feature;
an identification module, configured to identify the text category to which the normal form text belongs; and
a splicing module, configured to splice the extracted key text according to the template corresponding to the text category to obtain a summary text.
18. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 16.
19. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 16.
CN201711278814.1A 2017-12-06 2017-12-06 Abstract text generation method and device, storage medium and computer equipment Active CN110069623B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711278814.1A CN110069623B (en) 2017-12-06 2017-12-06 Abstract text generation method and device, storage medium and computer equipment
PCT/CN2018/119214 WO2019109918A1 (en) 2017-12-06 2018-12-04 Abstract text generation method, computer readable storage medium and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711278814.1A CN110069623B (en) 2017-12-06 2017-12-06 Abstract text generation method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN110069623A true CN110069623A (en) 2019-07-30
CN110069623B CN110069623B (en) 2022-09-23

Family

ID=66750771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711278814.1A Active CN110069623B (en) 2017-12-06 2017-12-06 Abstract text generation method and device, storage medium and computer equipment

Country Status (2)

Country Link
CN (1) CN110069623B (en)
WO (1) WO2019109918A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706774A (en) * 2019-09-29 2020-01-17 广州达美智能科技有限公司 Medical record generation method, terminal device and computer readable storage medium
CN110956041A (en) * 2019-11-27 2020-04-03 重庆邮电大学 Depth learning-based co-purchase recombination bulletin summarization method
CN111160019A (en) * 2019-12-30 2020-05-15 中国联合网络通信集团有限公司 Public opinion monitoring method, device and system
CN111539012A (en) * 2020-03-19 2020-08-14 重庆特斯联智慧科技股份有限公司 Privacy data distribution storage system and method of edge framework
CN111737989A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Intention identification method, device, equipment and storage medium
CN111859885A (en) * 2020-06-19 2020-10-30 广州大学 Automatic generation method and system for legal decision book
CN112183077A (en) * 2020-10-13 2021-01-05 京华信息科技股份有限公司 Mode recognition-based official document abstract extraction method and system
CN112395885A (en) * 2020-11-27 2021-02-23 安徽迪科数金科技有限公司 Short text semantic understanding template generation method, semantic understanding processing method and device
CN112668316A (en) * 2020-11-17 2021-04-16 国家计算机网络与信息安全管理中心 word document key information extraction method
CN112784585A (en) * 2021-02-07 2021-05-11 新华智云科技有限公司 Abstract extraction method and terminal for financial bulletin
CN113435212A (en) * 2021-08-26 2021-09-24 山东大学 Text inference method and device based on rule embedding
CN113658652A (en) * 2021-08-18 2021-11-16 四川大学华西医院 Binary relation extraction method based on electronic medical record data text
CN113806522A (en) * 2021-09-18 2021-12-17 北京百度网讯科技有限公司 Abstract generation method, device, equipment and storage medium
WO2022078308A1 (en) * 2020-10-12 2022-04-21 深圳壹账通智能科技有限公司 Method and apparatus for generating judgment document abstract, and electronic device and readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750974B (en) * 2019-09-20 2023-04-25 成都星云律例科技有限责任公司 Method and system for structured processing of referee document
CN113742478B (en) * 2020-05-29 2023-09-05 国家计算机网络与信息安全管理中心 Directional screening device and method for massive text data
CN112541073B (en) * 2020-12-15 2022-12-06 科大讯飞股份有限公司 Text abstract generation method and device, electronic equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030130837A1 (en) * 2001-07-31 2003-07-10 Leonid Batchilo Computer based summarization of natural language documents
CN101620608A (en) * 2008-07-04 2010-01-06 全国组织机构代码管理中心 Information collection method and system
CN101692240A (en) * 2009-08-14 2010-04-07 北京中献电子技术开发中心 Rule-based method for patent abstract automatic extraction and keyword indexing
US20100228693A1 (en) * 2009-03-06 2010-09-09 phiScape AG Method and system for generating a document representation
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
US20150193440A1 (en) * 2014-01-03 2015-07-09 Yahoo! Inc. Systems and methods for content processing
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN106156204A (en) * 2015-04-23 2016-11-23 深圳市腾讯计算机系统有限公司 The extracting method of text label and device
CN106599041A (en) * 2016-11-07 2017-04-26 中国电子科技集团公司第三十二研究所 Text processing and retrieval system based on big data platform
CN106897439A (en) * 2017-02-28 2017-06-27 百度在线网络技术(北京)有限公司 The emotion identification method of text, device, server and storage medium
CN107403375A (en) * 2017-04-19 2017-11-28 北京文因互联科技有限公司 A kind of listed company's bulletin classification and abstraction generating method based on deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604312A (en) * 2007-12-07 2009-12-16 宗刚 The method and system of the searching, managing and communicating of information
CN103699525B (en) * 2014-01-03 2016-08-31 江苏金智教育信息股份有限公司 A kind of method and apparatus automatically generating summary based on text various dimensions feature
CN105159886B (en) * 2015-10-10 2016-10-12 广东卓维网络有限公司 A kind of Outlier Detection method and system based on voucher summary texts

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030130837A1 (en) * 2001-07-31 2003-07-10 Leonid Batchilo Computer based summarization of natural language documents
CN101620608A (en) * 2008-07-04 2010-01-06 全国组织机构代码管理中心 Information collection method and system
US20100228693A1 (en) * 2009-03-06 2010-09-09 phiScape AG Method and system for generating a document representation
CN101692240A (en) * 2009-08-14 2010-04-07 北京中献电子技术开发中心 Rule-based method for patent abstract automatic extraction and keyword indexing
US20150193440A1 (en) * 2014-01-03 2015-07-09 Yahoo! Inc. Systems and methods for content processing
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
CN106156204A (en) * 2015-04-23 2016-11-23 深圳市腾讯计算机系统有限公司 The extracting method of text label and device
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN106599041A (en) * 2016-11-07 2017-04-26 中国电子科技集团公司第三十二研究所 Text processing and retrieval system based on big data platform
CN106897439A (en) * 2017-02-28 2017-06-27 百度在线网络技术(北京)有限公司 The emotion identification method of text, device, server and storage medium
CN107403375A (en) * 2017-04-19 2017-11-28 北京文因互联科技有限公司 A kind of listed company's bulletin classification and abstraction generating method based on deep learning

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ARUNAV MISHRA 等: "Event Digest: A Holistic View on Past Events", 《ACM》 *
LUIS HERRANZ 等: "Combining MPEG Tools to Generate Video Summaries Adapted to the Terminal and Network", 《IEEE》 *
张其文等: "文本主题的自动提取方法研究与实现", 《计算机工程与设计》 *
张晗等: "基于语义图的医学多文档摘要提取模型构建", 《图书情报工作》 *
王叶: "事件的画报式摘要生成技术研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
程园等: "基于综合的句子特征的文本自动摘要", 《计算机科学》 *
罗明等: "一种基于语义标注特征的金融文本分类方法", 《计算机应用研究》 *
陶余会等: "一种基于文本单元关联网络的自动文摘方法", 《模式识别与人工智能》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110706774A (en) * 2019-09-29 2020-01-17 广州达美智能科技有限公司 Medical record generation method, terminal device and computer readable storage medium
CN110956041A (en) * 2019-11-27 2020-04-03 重庆邮电大学 Depth learning-based co-purchase recombination bulletin summarization method
CN111160019A (en) * 2019-12-30 2020-05-15 中国联合网络通信集团有限公司 Public opinion monitoring method, device and system
CN111160019B (en) * 2019-12-30 2023-08-15 中国联合网络通信集团有限公司 Public opinion monitoring method, device and system
CN111539012A (en) * 2020-03-19 2020-08-14 重庆特斯联智慧科技股份有限公司 Privacy data distribution storage system and method of edge framework
CN111859885A (en) * 2020-06-19 2020-10-30 广州大学 Automatic generation method and system for legal decision book
CN111737989A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Intention identification method, device, equipment and storage medium
WO2022078308A1 (en) * 2020-10-12 2022-04-21 深圳壹账通智能科技有限公司 Method and apparatus for generating judgment document abstract, and electronic device and readable storage medium
CN112183077A (en) * 2020-10-13 2021-01-05 京华信息科技股份有限公司 Mode recognition-based official document abstract extraction method and system
CN112668316A (en) * 2020-11-17 2021-04-16 国家计算机网络与信息安全管理中心 word document key information extraction method
CN112395885A (en) * 2020-11-27 2021-02-23 安徽迪科数金科技有限公司 Short text semantic understanding template generation method, semantic understanding processing method and device
CN112395885B (en) * 2020-11-27 2024-01-26 安徽迪科数金科技有限公司 Short text semantic understanding template generation method, semantic understanding processing method and device
CN112784585A (en) * 2021-02-07 2021-05-11 新华智云科技有限公司 Abstract extraction method and terminal for financial bulletin
CN113658652A (en) * 2021-08-18 2021-11-16 四川大学华西医院 Binary relation extraction method based on electronic medical record data text
CN113658652B (en) * 2021-08-18 2023-07-28 四川大学华西医院 Binary relation extraction method based on electronic medical record data text
CN113435212A (en) * 2021-08-26 2021-09-24 山东大学 Text inference method and device based on rule embedding
CN113435212B (en) * 2021-08-26 2021-11-16 山东大学 Text inference method and device based on rule embedding
CN113806522A (en) * 2021-09-18 2021-12-17 北京百度网讯科技有限公司 Abstract generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110069623B (en) 2022-09-23
WO2019109918A1 (en) 2019-06-13

Similar Documents

Publication Publication Date Title
CN110069623A (en) Summary texts generation method, device, storage medium and computer equipment
Darko et al. Artificial intelligence in the AEC industry: Scientometric analysis and visualization of research activities
Wątróbski et al. Generalised framework for multi-criteria method selection
Munappy et al. Data management challenges for deep learning
Tsai et al. Sustainable supply chain management trends in world regions: A data-driven analysis
Tsui et al. Knowledge-based extraction of intellectual capital-related information from unstructured data
CN108009299A (en) Law tries method and device for business processing
US20150032645A1 (en) Computer-implemented systems and methods of performing contract review
CN103778548A (en) Goods information and keyword matching method, and goods information releasing method and device
CN110796470A (en) Market subject supervision and service oriented data analysis system
CN108108744B (en) Method and system for radiation image auxiliary analysis
CN110990529B (en) Industry detail dividing method and system for enterprises
Mohd Selamat et al. Big data analytics—A review of data‐mining models for small and medium enterprises in the transportation sector
CN110287292B (en) Judgment criminal measuring deviation degree prediction method and device
CN114880486A (en) Industry chain identification method and system based on NLP and knowledge graph
Matthies et al. Computer-aided text analysis of corporate disclosures-demonstration and evaluation of two approaches
CN111737421A (en) Intellectual property big data information retrieval system and storage medium
Linton et al. An extension to a DEA support system used for assessing R&D projects
CN110310012A (en) Data analysing method, device, equipment and computer readable storage medium
Janková A Bibliometric Analysis of Artificial Intelligence Technique in Financial Market.
CN110597796B (en) Big data real-time modeling method and system based on full life cycle
CN116595191A (en) Construction method and device of interactive low-code knowledge graph
Nwankwo et al. Knowledge discovery and analytics in process reengineering: a study of port clearance processes
Modrušan et al. Intelligent Public Procurement Monitoring System Powered by Text Mining and Balanced Indicators
CN112506930B (en) Data insight system based on machine learning technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant