CN107797990A - Method and apparatus for determining text core sentence - Google Patents

Method and apparatus for determining text core sentence

Info

Publication number
CN107797990A
Authority
CN
China
Prior art keywords
sentence
word
text
frequency
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710978320.8A
Other languages
Chinese (zh)
Inventor
张翔
刘辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Science And Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Science And Technology (beijing) Co Ltd filed Critical Science And Technology (beijing) Co Ltd
Priority to CN201710978320.8A priority Critical patent/CN107797990A/en
Publication of CN107797990A publication Critical patent/CN107797990A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for determining a core sentence of a text. One embodiment of the method includes: obtaining a target text from a preset text set, where the text set includes multiple texts and each text includes multiple sentences divided by a preset symbol; calculating basic features of a first sentence in the target text, where the basic features include term frequency-inverse document frequency, information entropy, repetition rate, and similarity to the title of the target text, and the first sentence is any sentence in the target text; and determining, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text. This embodiment improves the accuracy of determining core sentences from a text.

Description

Method and apparatus for determining text core sentence
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and apparatus for determining a core sentence of a text.
Background
With the progress of science and technology, more and more intelligent terminals and mobile terminals, such as smartphones, smart home devices, and computers, have become an indispensable part of people's lives. Users install different applications on a terminal to meet needs such as playing audio, taking photos, and searching.
Generally, a user can find an audio item or a text by searching for its title in an application installed on the terminal. In many cases, however, the user cannot remember the title and remembers only a few familiar, well-known sentences from it. Users therefore often want to use such familiar sentences to find the corresponding audio or text. Such a familiar sentence can generally be regarded as a core sentence of the audio or text. It is therefore necessary to provide a method that can accurately determine core sentences from audio, text, and the like, so as to improve the accuracy of searching for them.
Summary of the invention
The purpose of the embodiments of the present application is to propose an improved method and apparatus for determining a core sentence of a text, so as to solve the technical problem mentioned in the Background section above.
In a first aspect, an embodiment of the present application provides a method for determining a core sentence of a text, the method including: obtaining a target text from a preset text set, where the text set includes multiple texts and each text includes multiple sentences divided by a preset symbol; calculating basic features of a first sentence in the target text, where the basic features include term frequency-inverse document frequency, information entropy, repetition rate, and similarity to the title of the target text, and the first sentence is any sentence in the target text; and determining, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
In some embodiments, calculating the basic features of the first sentence in the target text includes: segmenting the sentences of each text in the text set to obtain the words after segmentation, where the words obtained by segmenting the first sentence are first words; calculating the term frequency-inverse document frequency of each first word, and determining the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word; calculating the term frequency of each first word in the target text, and determining the information entropy of each first word according to its term frequency in the target text; calculating the repetition rate of the first sentence in the target text; and calculating the similarity between the first sentence and the title of the target text.
In some embodiments, the method further includes: performing data cleaning on each text in the text set to obtain the title and body of each text.
In some embodiments, calculating the term frequency-inverse document frequency of each first word and determining the term frequency-inverse document frequency of the first sentence includes: obtaining the term frequency of each first word in the target text; obtaining the inverse document frequency of each first word in the text set; calculating the term frequency-inverse document frequency of each first word from its term frequency and inverse document frequency; and summing the term frequency-inverse document frequencies of the first words to determine the term frequency-inverse document frequency of the first sentence.
In some embodiments, calculating the term frequency of each first word in the target text and determining the information entropy of each first word according to that term frequency includes: obtaining the term frequency of each first word in the target text and calculating the information entropy of each first word; and summing the information entropies of the first words to determine the information entropy of the first sentence.
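The entropy step above can be sketched as follows. This is a minimal illustration, not the patent's definitive formula: it assumes that the information entropy of a word is the term -p * log2(p), where p is the word's relative frequency in the target text, and the function name sentence_entropy is hypothetical.

```python
import math
from collections import Counter
from typing import List


def sentence_entropy(sentence_words: List[str], text_words: List[str]) -> float:
    """Sum an assumed per-word entropy term over the words of one sentence.

    Assumption: the entropy of a word is -p * log2(p), where p is the
    relative frequency of the word in the segmented target text.
    """
    total = len(text_words)
    if total == 0:
        return 0.0
    counts = Counter(text_words)
    entropy = 0.0
    for word in sentence_words:
        p = counts[word] / total
        if p > 0:  # words absent from the text contribute nothing
            entropy -= p * math.log2(p)
    return entropy


# Toy data: the target text has four words; the sentence has two of them.
text_words = ["a", "a", "b", "c"]
value = sentence_entropy(["a", "b"], text_words)
```

The per-word terms are summed, matching the statement that the information entropy of the first sentence is the sum over its first words.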
In some embodiments, calculating the similarity between the first sentence and the title of the target text includes: calculating the edit distance between the first sentence and the title of the target text; comparing the string length of the first sentence with the string length of the title, and taking the longer of the two as a first string length; and determining the similarity between the first sentence and the title of the target text according to the ratio of the edit distance to the first string length.
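A minimal sketch of this similarity follows. The text only states that the similarity is determined from the ratio of the edit distance to the longer string length; the mapping used here, similarity = 1 - distance / longer length, is an assumption, as are the function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def title_similarity(sentence: str, title: str) -> float:
    """Assumed mapping: 1 - edit_distance / length of the longer string."""
    longer = max(len(sentence), len(title))
    if longer == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(sentence, title) / longer
```

Dividing by the longer length keeps the value in the range 0 to 1, because the edit distance can never exceed the length of the longer string.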
In some embodiments, determining, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text includes: computing a weighted sum of the term frequency-inverse document frequency, information entropy, and repetition rate of the first sentence and its similarity to the title of the target text to determine a score of the first sentence; and determining that the first sentence is a core sentence of the target text when its score is greater than a first preset threshold.
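The weighted scoring can be sketched as below. The weights and the threshold are hypothetical values chosen only for illustration; the text does not specify them, and in practice the features would probably need to be scaled to comparable ranges first.

```python
from typing import Dict

# Hypothetical weights and threshold; the source does not give concrete values.
WEIGHTS: Dict[str, float] = {
    "tfidf": 0.4,
    "entropy": 0.2,
    "repetition": 0.2,
    "title_similarity": 0.2,
}
FIRST_THRESHOLD = 0.5


def sentence_score(features: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the four basic features of one sentence."""
    return sum(weights[name] * features[name] for name in weights)


def is_core_sentence(features: Dict[str, float],
                     threshold: float = FIRST_THRESHOLD) -> bool:
    """A sentence is a core sentence when its score exceeds the threshold."""
    return sentence_score(features) > threshold


features = {"tfidf": 0.9, "entropy": 0.5, "repetition": 0.4, "title_similarity": 0.8}
decision = is_core_sentence(features)
```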
In some embodiments, the preset symbol is a line break.
In a second aspect, the present application provides an apparatus for determining a core sentence of a text, the apparatus including: an acquiring unit configured to obtain a target text from a preset text set, where the text set includes multiple texts and each text includes multiple sentences divided by a preset symbol; a computing unit configured to calculate basic features of a first sentence in the target text, where the basic features include term frequency-inverse document frequency, information entropy, repetition rate, and similarity to the title of the target text, and the first sentence is any sentence in the target text; and a determining unit configured to determine, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
In some embodiments, the computing unit includes: a word segmentation module configured to segment the sentences of each text in the text set to obtain the words after segmentation, where the words obtained by segmenting the first sentence are first words; a term frequency-inverse document frequency computing module configured to calculate the term frequency-inverse document frequency of each first word and to determine the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word; an information entropy computing module configured to calculate the term frequency of each first word in the target text and to determine the information entropy of each first word according to its term frequency in the target text; a repetition rate computing module configured to calculate the repetition rate of the first sentence in the target text; and a similarity calculation module configured to calculate the similarity between the first sentence and the title of the target text.
In some embodiments, the apparatus further includes: a cleaning unit configured to perform data cleaning on each text in the text set to obtain the title and body of each text.
In some embodiments, the term frequency-inverse document frequency computing module is further configured to: obtain the term frequency of each first word in the target text; obtain the inverse document frequency of each first word in the text set; calculate the term frequency-inverse document frequency of each first word from its term frequency and inverse document frequency; and sum the term frequency-inverse document frequencies of the first words to determine the term frequency-inverse document frequency of the first sentence.
In some embodiments, the information entropy computing module is further configured to: obtain the term frequency of each first word in the target text and calculate the information entropy of each first word; and sum the information entropies of the first words to determine the information entropy of the first sentence.
In some embodiments, the similarity calculation module is further configured to: calculate the edit distance between the first sentence and the title of the target text; compare the string length of the first sentence with the string length of the title, and take the longer of the two as a first string length; and determine the similarity between the first sentence and the title of the target text according to the ratio of the edit distance to the first string length.
In some embodiments, the determining unit is further configured to: compute a weighted sum of the term frequency-inverse document frequency, information entropy, and repetition rate of the first sentence and its similarity to the title of the target text to determine a score of the first sentence; and determine that the first sentence is a core sentence of the target text when its score is greater than a first preset threshold.
In some embodiments, the preset symbol is a line break.
With the method and apparatus for determining a core sentence of a text provided by the embodiments of the present application, a target text is first obtained from a preset text set; the basic features of each first sentence in the target text, including term frequency-inverse document frequency, information entropy, repetition rate, and similarity to the title of the target text, are then calculated; finally, whether the first sentence is a core sentence of the target text is determined based on the values of its basic features. Using the basic features of each first sentence in this way improves the accuracy of determining core sentences of a text.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for determining a core sentence of a text according to the present application;
Fig. 3 is a flowchart of another embodiment of the method for determining a core sentence of a text according to the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the apparatus for determining a core sentence of a text according to the present application;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, provided they do not conflict, the embodiments in the present application and the features in those embodiments can be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or apparatus for determining a core sentence of a text of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send texts and the like. For example, the user may use the terminal devices 101, 102, 103 to upload texts to the server 105 through the network 104, and may also receive texts sent by the server 105. Various client applications may be installed on the terminal devices 101, 102, 103, such as audio playback software and search applications.
The terminal devices 101, 102, 103 may be various electronic devices that support web page display or audio playback, including but not limited to smartphones, tablet computers, smart home devices, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers. The server 105 may be a server that provides various services, for example a background server that analyzes and processes the acquired texts.
It should be noted that the method for determining a core sentence of a text provided by the embodiments of the present application is generally executed by the server 105; accordingly, the apparatus for determining a core sentence of a text is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for determining a core sentence of a text according to the present application is shown. The method for determining a core sentence of a text includes the following steps:
Step 201, obtain a target text from a preset text set.
In this embodiment, the electronic device on which the method for determining a core sentence of a text runs (for example, the server in Fig. 1) may first obtain a preset text set. The electronic device may receive, through a wired or wireless connection, texts uploaded or input by users from their terminal devices, combine the acquired texts into a text set, and store the text set in local memory or in an external storage device so that it can later be retrieved. Alternatively, the text set may be stored directly on the terminal device of the user, and the electronic device may obtain it from that terminal device through a wired or wireless connection. Here, the text set may include multiple texts, the target text may be a text in the text set for which core sentences need to be determined, and each text can be divided into multiple sentences by a preset symbol. For example, each text may be a lyrics text and the preset symbol may be a line break, so that each lyrics text can be divided into multiple lyric lines by line breaks. Of course, the preset symbol is not limited to a line break; it may also be a space, "/", and so on.
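Dividing a text into sentences on the preset symbol can be sketched as follows. The function name split_sentences is hypothetical, and dropping empty pieces is an added assumption that the text does not state.

```python
from typing import List


def split_sentences(text: str, preset_symbol: str = "\n") -> List[str]:
    """Split a text into sentences on a preset symbol.

    Empty pieces (for example, from blank lines) are dropped; this is an
    assumption, since the source does not say how they are handled.
    """
    return [piece.strip() for piece in text.split(preset_symbol) if piece.strip()]


lyrics = "first line\nsecond line\n\nfirst line\n"
sentences = split_sentences(lyrics)             # line break as preset symbol
slashed = split_sentences("one/two/three", "/")  # "/" as preset symbol
```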
Step 202, calculate the basic features of the first sentence in the target text.
In this embodiment, the electronic device may divide the target text into multiple sentences using the preset symbol and take any one of them as the first sentence. The electronic device may then calculate the basic features of the first sentence in order to determine whether it is a core sentence of the target text. Here, the basic features of the first sentence may include its term frequency-inverse document frequency, its information entropy, its repetition rate, and its similarity to the title of the target text. It should be understood that the electronic device may also select only one or several of these four features as the basic features of the first sentence; such a configuration is equally applicable to the method provided in this embodiment.
The main idea of the term frequency-inverse document frequency (TF-IDF) method is that if a word or phrase appears with high frequency (term frequency, TF) in one article and rarely appears in other articles, the word or phrase is considered to have good discriminative power and to be suitable for classification. Inverse document frequency (IDF) mainly means that the fewer documents contain a word or phrase, the larger its IDF, which indicates that the word or phrase has good discriminative power. The TF-IDF method can therefore be used to calculate the importance of the first sentence within the target text. Information entropy can be understood as the probability of occurrence of certain specific information. Here, the information entropy of the first sentence can be understood as using the probability of occurrence of the first sentence to characterize its importance in the target text. The repetition rate of the first sentence can be understood as its repetition rate in the target text, or alternatively as its repetition rate in the text set. In general, the repetition rate of the first sentence can also characterize, to a certain degree, its importance in the target text. The similarity between the first sentence and the title of the target text characterizes how similar the two are. The title usually occupies an important position in a text, so the level of similarity between the first sentence and the title can also characterize the importance of the first sentence in the target text. The electronic device may calculate the similarity between the first sentence and the title of the target text by various means, such as the cosine similarity formula.
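The repetition rate within the target text can be illustrated with a short sketch. The text does not give a formula, so the definition used here, the number of occurrences of the sentence divided by the total number of sentences in the target text, is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import List


def repetition_rate(sentence: str, sentences: List[str]) -> float:
    """Assumed definition: occurrences of the sentence / total sentence count."""
    if not sentences:
        return 0.0
    return Counter(sentences)[sentence] / len(sentences)


# Toy lyrics in which the chorus line appears twice among four lines.
lines = ["chorus line", "verse one", "chorus line", "verse two"]
rate = repetition_rate("chorus line", lines)
```

Computing the same ratio over all sentences of the text set instead of the target text would give the alternative, set-level reading mentioned above.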
Step 203, determine, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
In this embodiment, based on the basic features of the first sentence calculated in step 202, the electronic device may process those features by various means in order to determine whether the first sentence is a core sentence of the target text in which it appears. The electronic device can calculate the basic features of every sentence in the target text, and can therefore use the basic features of each sentence to determine at least one core sentence from the target text.
For example, the electronic device may set a threshold in advance, then calculate the sum and/or the average of the basic features of the first sentence, and compare that sum and/or average with the preset threshold. Finally, a first sentence whose sum and/or average of basic features is greater than the threshold may be taken as a core sentence of the target text. It should be understood that the electronic device may also process the basic features of the first sentence by other means; the example above is only illustrative.
As an example, the method for determining a core sentence of a text provided by this embodiment can be used to determine the core sentences of a lyrics text, so that an audio playback application can use the core sentences of a song to search for the corresponding song. The specific steps of this example are as follows. First, the electronic device obtains, from a preset lyrics text set, the lyrics text whose core sentences need to be determined as the target text. Next, the lyrics text serving as the target text is divided into sentences to form multiple lyric lines. Then, each lyric line is taken in turn as the first sentence and its basic features are calculated. Finally, the core sentences of the lyrics text serving as the target text are determined according to the basic features of each lyric line. It should be understood that, in response to a user searching for a song by entering a core sentence of a lyrics text, the server may send the user the song containing that core sentence so that the user can play the song on the terminal device.
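The search use case in this example can be sketched as a simple lookup from core sentences to song titles. The exact-match index, the lower-casing normalization, and all names below are assumptions made for illustration; the text does not describe how the search itself is implemented.

```python
from typing import Dict, List, Optional


def build_core_index(core_sentences_by_song: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each normalized core sentence to the title of its song."""
    index: Dict[str, str] = {}
    for song_title, core_sentences in core_sentences_by_song.items():
        for sentence in core_sentences:
            index[sentence.strip().lower()] = song_title
    return index


def find_song(query: str, index: Dict[str, str]) -> Optional[str]:
    """Return the song whose core sentence matches the query, if any."""
    return index.get(query.strip().lower())


# Hypothetical data: song titles and their previously determined core sentences.
index = build_core_index({"Song A": ["a line people remember"],
                          "Song B": ["another famous line"]})
match = find_song("  A line people remember ", index)
```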
With the method for determining a core sentence of a text provided by the above embodiment of the present application, a target text is first obtained from a preset text set; the basic features of each first sentence in the target text, including term frequency-inverse document frequency, information entropy, repetition rate, and similarity to the title of the target text, are then calculated; finally, whether the first sentence is a core sentence of the target text is determined based on the values of its basic features. Using the basic features of each first sentence in this way improves the accuracy of determining core sentences of a text.
Referring next to Fig. 3, a flow 300 of another embodiment of the method for determining a core sentence of a text is shown. The specific steps of the method for determining a core sentence of a text in this embodiment may include:
Step 301, obtain a target text from a preset text set.
In this embodiment, the electronic device on which the method for determining a core sentence of a text runs (for example, the server in Fig. 1) may obtain a preset text set, which may consist of multiple texts. The method for determining a core sentence of a text in this embodiment can be used to determine the core sentences of each text in the text set. The electronic device may obtain a target text from the text set; the target text may be a text in the text set whose core sentences need to be determined. It should be noted that each text in the text set may include multiple sentences, with a preset symbol between different sentences. The electronic device can therefore divide each text into multiple sentences using the preset symbol.
In some optional implementations of this embodiment, the preset symbol between different sentences in each text may be a line break. Each text in the text set may thus be a text with a special format, such as a lyrics text or the Book of Songs. Generally, sentences in texts such as lyrics and the Book of Songs are not delimited by punctuation marks; instead, when one sentence ends, a line break is used and the next sentence is shown on a new line. Line breaks can therefore be used to divide texts such as lyrics and the Book of Songs in the text set into sentences. It should be understood that lyrics texts and Book of Songs texts are only examples of the form the texts may take, and are not the only forms permitted for the texts in the text set.
In some optional implementations of this embodiment, after obtaining the text set, the electronic device may also perform data cleaning on each text in the text set to delete the dirty data in each text. Dirty data here can be regarded as content in a text that cannot be a core sentence of that text. For example, if each text in the text set is a lyrics text, content such as the lyricist and composer credits, other music credits, and personal names cannot be core sentences of the lyrics text, so it can be deleted as dirty data, thereby cleaning the lyrics text. Optionally, some versions of a text in the text set may omit repeated sentences; in that case, the sentences of the text should be completed according to the order of the sentences in the text.
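The deletion of dirty data from a lyrics text can be sketched as a prefix filter. The list of credit prefixes below is hypothetical; a real cleaning step would depend on how the credits actually appear in the text set.

```python
from typing import List

# Hypothetical markers of credit lines that cannot be core sentences.
CREDIT_PREFIXES = ("lyrics by", "music by", "lyricist:", "composer:", "singer:")


def clean_lyrics(lines: List[str]) -> List[str]:
    """Drop blank lines and lines that start with a known credit prefix."""
    cleaned = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower().startswith(CREDIT_PREFIXES):
            continue  # treated as dirty data
        cleaned.append(stripped)
    return cleaned


raw = ["Lyricist: Someone", "Composer: Someone Else", "", "first lyric line", "second lyric line"]
body = clean_lyrics(raw)
```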
Step 302, segment the sentences of each text in the text set to obtain the words after segmentation.
In this embodiment, the electronic device may divide each text in the text set into multiple sentences using the preset symbol. Then, for each text in the text set, it may segment the sentences of that text by various means and obtain the words of each segmented sentence. As an example, the electronic device may segment the sentences of each text using the full segmentation method. It should be understood that the target text obtained by the electronic device can likewise be divided into multiple words. It should be noted that, once the first sentence in the target text has been determined, the words obtained by segmenting the first sentence can be obtained, and these words are the first words.
In some optional implementations of this embodiment, the full segmentation method first cuts out all possible words that match a language dictionary and then uses a statistical language model to determine the optimal segmentation. Take the sentence "no matter the ends of the earth and cape" from the lyrics text of the song "Unforgettable Tonight" as an example. Dictionary matching is performed first to find all matching words: "no matter", "the ends of the earth", "and", "cape", "by day", and "the ends of the earth and cape". These words are represented in the form of a word lattice, a path search is then performed over the word lattice, and the optimal path is found based on a statistical language model (such as an N-gram model). If the result shows that the segmentation "no matter / the ends of the earth / and / cape" has the highest language model score, then that segmentation is the optimal segmentation of "no matter the ends of the earth and cape".
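The idea of full segmentation can be illustrated with a simplified sketch. It enumerates every dictionary-consistent segmentation recursively instead of building a word lattice, and it scores candidates with a unigram model rather than a higher-order N-gram model; both are simplifications. The toy dictionary, its probabilities, and the Latin-letter input (standing in for a run of Chinese characters) are hypothetical.

```python
import math
from typing import Dict, List


def all_segmentations(sentence: str, dictionary: Dict[str, float]) -> List[List[str]]:
    """Enumerate every way to cut the sentence into dictionary words."""
    if not sentence:
        return [[]]
    results = []
    for end in range(1, len(sentence) + 1):
        word = sentence[:end]
        if word in dictionary:
            for rest in all_segmentations(sentence[end:], dictionary):
                results.append([word] + rest)
    return results


def best_segmentation(sentence: str, unigram_prob: Dict[str, float]) -> List[str]:
    """Pick the candidate with the highest unigram log-probability."""
    candidates = all_segmentations(sentence, unigram_prob)
    if not candidates:
        return []
    return max(candidates,
               key=lambda words: sum(math.log(unigram_prob[w]) for w in words))


# Hypothetical dictionary with unigram probabilities.
toy_model = {"no": 0.04, "now": 0.03, "here": 0.02, "where": 0.01, "w": 0.0001}
candidates = all_segmentations("nowhere", toy_model)
best = best_segmentation("nowhere", toy_model)
```

For long sentences the number of candidates grows quickly, which is why a lattice with dynamic-programming path search, as described above, is the more practical structure.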
Step 303, the term frequency-inverse document frequency of each first word is calculated, and according to word frequency-inverse text of each first sentence Shelves frequency determines the term frequency-inverse document frequency of the first sentence.
In the present embodiment, included by the first word and text set included by the first sentence obtained based on step 302 Each word, above-mentioned electronic equipment can calculate the term frequency-inverse document frequency of each first word, and then, it can be according to Word frequency-inverse document frequency of each first word in one sentence determines the term frequency-inverse document frequency of first sentence.
In some optional implementations of the present embodiment, above-mentioned electronic equipment can calculate any first word first Word frequency TF in target text.Afterwards, it can calculate inverse document frequency IDF of first word in text set.Then, The term frequency-inverse document frequency TF-IDF of first word is calculated according to the word frequency TF of first word and inverse document frequency IDF. In this way, above-mentioned electronic equipment can calculate the term frequency-inverse document frequency TF-IDF of each first word in the first sentence.Most Afterwards, above-mentioned electronic equipment can sum to the term frequency-inverse document frequency TF-IDF of each first word in the first sentence, so as to To determine the word frequency of above-mentioned first sentence-inverse document frequency TF-IDF.
As an example, the target text obtained by the electronic device from the text set may be the lyric text of 《Unforgettable Tonight》, and the text set may include m1 texts, where the number of words in the target text after word segmentation is m2. To judge whether "no matter the ends of the earth and cape" in the lyric text is a core sentence of the lyric text, "no matter the ends of the earth and cape" is taken as a first sentence of the target text, and the first words included in this first sentence are "no matter", "ends of the earth", "and" and "cape". Suppose that "no matter" occurs n1 times in the lyric text and that n2 texts in the text set contain this first word. According to the TF-IDF formula, the TF-IDF of the first word "no matter" is the product of its term frequency TF = n1/m2 and its inverse document frequency IDF, which is determined from m1 and n2. The TF-IDF of "ends of the earth", "and" and "cape" can be calculated in the same way. The sum of the TF-IDF values of "no matter", "ends of the earth", "and" and "cape" is then the term frequency-inverse document frequency TF-IDF' of the first sentence "no matter the ends of the earth and cape". In this way, the electronic device can calculate the TF-IDF' of each sentence in the target text.
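The sentence-level TF-IDF' described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `sentence_tfidf` is hypothetical, texts are assumed to be already segmented into word lists, and the inverse document frequency is assumed to take the common form log(m1/n2), which may differ from the exact formula used in this application.

```python
import math
from collections import Counter

def sentence_tfidf(sentence_words, target_words, corpus):
    """Sum the TF-IDF of every word in a sentence (TF-IDF' in the text).

    sentence_words: words of the first sentence.
    target_words:   all words of the target text (m2 = len(target_words)).
    corpus:         list of tokenized texts in the text set (m1 = len(corpus)).
    """
    counts = Counter(target_words)
    m2 = len(target_words)
    m1 = len(corpus)
    total = 0.0
    for word in sentence_words:
        tf = counts[word] / m2                        # n1 / m2
        n2 = sum(1 for doc in corpus if word in doc)  # texts containing the word
        idf = math.log(m1 / n2) if n2 else 0.0        # assumed IDF form
        total += tf * idf
    return total
```

With this form, a word that appears in every text of the text set has an IDF of zero and contributes nothing to the sentence value.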
Step 304: calculate the term frequency of each first word in the target text, and determine the information entropy of each first word according to its term frequency in the target text.
In this embodiment, based on the first words included in each first sentence and the words included in the target text, both obtained in step 302, the electronic device may calculate the term frequency of each first word in the target text, and may then determine the information entropy of the first sentence according to the term frequency of each first word in the first sentence.
In some optional implementations of this embodiment, the electronic device may first calculate the term frequency TF of any first word in the target text, and then calculate the information entropy H of that first word. In this way, the electronic device can calculate the information entropy H of each first word in the first sentence. Finally, the electronic device may sum the information entropy H of the first words in the first sentence to determine the information entropy H' of the first sentence.
Continuing the example of step 303, the electronic device may calculate the term frequency of the first word "no matter" as TF = n1/m2, and then calculate the information entropy H of this first word from that term frequency according to the formula of information entropy. The information entropy H of "ends of the earth", "and" and "cape" can be calculated in the same way. The sum of the information entropy H of "no matter", "ends of the earth", "and" and "cape" is then the information entropy H' of the first sentence "no matter the ends of the earth and cape". In this way, the electronic device can calculate the information entropy H' of each sentence in the target text.
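The entropy calculation above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the per-word information entropy is assumed to take the usual form H = -TF × log2(TF); the exact formula and logarithm base used in this application may differ.

```python
import math
from collections import Counter

def word_entropy(tf):
    """Assumed per-word entropy term: -TF * log2(TF), with 0 for TF = 0."""
    return -tf * math.log2(tf) if tf > 0 else 0.0

def sentence_entropy(sentence_words, target_words):
    """Sum the entropy of every word in the sentence (H' in the text)."""
    counts = Counter(target_words)
    m2 = len(target_words)
    return sum(word_entropy(counts[w] / m2) for w in sentence_words)
```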
Step 305: calculate the repetition rate of the first sentence in the target text.
In this embodiment, the electronic device may first count the total number a of sentences included in the target text, then determine the number b of times the first sentence occurs in the target text, and finally calculate the repetition rate q of the first sentence in the target text. The repetition rate q may be the ratio of the number b of occurrences of the first sentence to the total number a of sentences in the target text, i.e. q = b/a.
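The repetition rate q = b/a can be sketched as follows. This is a minimal sketch: the function name `repetition_rate` is hypothetical, sentences are assumed to be separated by newlines (one of the preset symbols named in the claims), and ignoring blank lines and surrounding whitespace is an added assumption.

```python
def repetition_rate(sentence, text, separator="\n"):
    """Return q = b / a for `sentence` within `text`.

    a: total number of non-empty sentences in the text.
    b: number of sentences identical to `sentence` (after stripping whitespace).
    """
    sentences = [s.strip() for s in text.split(separator) if s.strip()]
    a = len(sentences)
    b = sentences.count(sentence.strip())
    return b / a if a else 0.0
```

For lyric texts, a chorus line repeated across verses receives a higher q than a line that appears only once.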
Step 306: calculate the similarity between the first sentence and the title of the target text.
In this embodiment, the electronic device may first extract the body and the title of the target text. It may then calculate the similarity between a first sentence of the target text and the title of the target text by various means.
In some optional implementations of this embodiment, the electronic device may first calculate the edit distance between the first sentence and the title of the target text. The edit distance, also known as the Levenshtein distance, refers to the minimum number of edit operations required to change one of two strings into the other. To calculate the edit distance between the first sentence and the title, the character string of the first sentence and the character string of the title are obtained first, and the edit distance D between the two character strings is then calculated. Afterwards, the electronic device may obtain the length K1 of the character string of the first sentence and the length K2 of the character string of the title, compare K1 with K2, and take the length of the longer character string as the first string length K, i.e. K = max(K1, K2). Finally, the electronic device may determine the similarity p between the first sentence and the title of the target lyric text according to the ratio of the edit distance D to the first string length K. Optionally, the similarity p = 1 - D/K. In this way, the electronic device can calculate the similarity p between each sentence in the target text and the title.
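The similarity p = 1 - D/K can be sketched as follows. This is an illustrative sketch: the function names `edit_distance` and `title_similarity` are hypothetical, the edit distance is the standard dynamic-programming Levenshtein distance with unit costs for insertion, deletion and substitution, and returning 1.0 for two empty strings is an added assumption.

```python
def edit_distance(s, t):
    """Levenshtein distance D between strings s and t (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def title_similarity(sentence, title):
    """Return p = 1 - D / K, where K = max(len(sentence), len(title))."""
    k = max(len(sentence), len(title))
    if k == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    return 1 - edit_distance(sentence, title) / k
```

Because D never exceeds K, p always lies between 0 and 1, with 1 meaning that the sentence and the title are identical.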
Step 307: compute a weighted sum of the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text of the first sentence, to determine the score of the first sentence.
In this embodiment, based on the term frequency-inverse document frequency of the first sentence, the information entropy of the first sentence, the repetition rate of the first sentence, and the similarity between the first sentence and the title of the target text, determined in step 303, step 304, step 305 and step 306 respectively, the electronic device may assign a weight to each of these four features and then sum the weighted features; this sum can be taken as the score of the first sentence. It can be understood that the electronic device assigns the weights according to the importance of each of these features to the target text.
In some optional implementations of this embodiment, the weights that the electronic device assigns to the term frequency-inverse document frequency TF-IDF' of the first sentence, the information entropy H' of the first sentence, the repetition rate q of the first sentence, and the similarity p between the first sentence and the title of the target text are x1, x2, x3 and x4 respectively, so the score of the first sentence is S = x1 × TF-IDF' + x2 × H' + x3 × q + x4 × p. The electronic device can use this scoring formula to score each sentence in the target text.
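The scoring formula S = x1 × TF-IDF' + x2 × H' + x3 × q + x4 × p can be sketched as follows. The function name `sentence_score` and the default equal weights are assumptions for illustration only; this application does not specify weight values.

```python
def sentence_score(tfidf, entropy, repetition, similarity,
                   weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four basic features of a sentence.

    weights: (x1, x2, x3, x4); the equal defaults are placeholders only.
    """
    x1, x2, x3, x4 = weights
    return x1 * tfidf + x2 * entropy + x3 * repetition + x4 * similarity
```

The four features are on different scales (q and p lie in [0, 1], while TF-IDF' and H' grow with sentence length), so in practice the weights also serve to balance those scales.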
Step 308: in response to the score of the first sentence being greater than a first preset threshold, determine that the first sentence is a core sentence of the target text.
In this embodiment, based on the score of the first sentence determined in step 307, the electronic device may compare the score of the first sentence with the first preset threshold. If the score of the first sentence is greater than the first preset threshold, it can be determined that the first sentence is a core sentence of the target text in which it is located. If the score of the first sentence is less than or equal to the first preset threshold, it can be determined that the first sentence is not a core sentence of the target text.
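The threshold comparison above can be sketched as follows. The function name `core_sentences` and the (sentence, score) pair representation are assumptions for illustration; the comparison is strict, so a score equal to the threshold does not qualify, as in the text.

```python
def core_sentences(scored_sentences, threshold):
    """Keep the sentences whose score is strictly greater than the threshold.

    scored_sentences: iterable of (sentence, score) pairs.
    """
    return [sentence for sentence, score in scored_sentences
            if score > threshold]
```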
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the method for determining a text core sentence in this embodiment highlights the steps of calculating the basic features of the first sentence and the step of scoring the first sentence. Thus, the scheme described in this embodiment can determine the score of the first sentence more accurately, thereby improving the accuracy of determining the core sentence of the target text.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for determining a text core sentence. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied to various electronic devices.
As shown in Fig. 4, the apparatus 400 for determining a text core sentence of this embodiment includes an acquiring unit 401, a computing unit 402 and a determining unit 403. The acquiring unit 401 is configured to obtain a target text from a preset text set, where the text set includes a plurality of texts and each text includes a plurality of sentences divided by a preset symbol. The computing unit 402 is configured to calculate the basic features of a first sentence in the target text, where the basic features include the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text, and the first sentence is any sentence in the target text. The determining unit 403 is configured to determine, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
In this embodiment, the computing unit 402 of the apparatus 400 for determining a text core sentence may include: a word segmentation module (not shown), configured to segment the sentences of each text in the text set to obtain the words after segmentation, where the words obtained by segmenting the first sentence are the first words; a term frequency-inverse document frequency computing module (not shown), configured to calculate the term frequency-inverse document frequency of each first word, and determine the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word; an information entropy computing module (not shown), configured to calculate the term frequency of each first word in the target text, and determine the information entropy of each first word according to its term frequency in the target text; a repetition rate computing module (not shown), configured to calculate the repetition rate of the first sentence in the target text; and a similarity computing module (not shown), configured to calculate the similarity between the first sentence and the title of the target text.
In this embodiment, the apparatus 400 for determining a text core sentence may further include a cleaning unit (not shown), configured to perform data cleaning on each text in the text set to obtain the title and body of each text.
In this embodiment, the term frequency-inverse document frequency computing module is further configured to: obtain the term frequency of each first word in the target text; obtain the inverse document frequency of each first word in the text set; calculate the term frequency-inverse document frequency of each first word using its term frequency and inverse document frequency; and sum the term frequency-inverse document frequencies of the first words to determine the term frequency-inverse document frequency of the first sentence.
In this embodiment, the information entropy computing module is further configured to: obtain the term frequency of each first word in the target text, and calculate the information entropy of each first word; and sum the information entropy of the first words to determine the information entropy of the first sentence.
In this embodiment, the similarity computing module is further configured to: calculate the edit distance between the first sentence and the title of the target text; compare the string length of the first sentence with the string length of the title, and take the longer string length as the first string length; and determine the similarity between the first sentence and the title of the target lyric text according to the ratio of the edit distance to the first string length.
In this embodiment, the determining unit 403 is further configured to: compute a weighted sum of the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text of the first sentence, to determine the score of the first sentence; and in response to the score of the first sentence being greater than the first preset threshold, determine that the first sentence is a core sentence of the target text.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing the terminal device/server of the embodiments of the present application. The terminal device/server shown in Fig. 5 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a loudspeaker and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A driver 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the driver 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus or device. In the present application, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that computer-readable medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF and the like, or any suitable combination of the above.
The flow charts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations of possible implementations of the systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flow chart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flow charts, and combinations of blocks in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor including an acquiring unit, a computing unit and a determining unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit that obtains a target text from a preset text set".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtain a target text from a preset text set, where the text set includes a plurality of texts and each text includes a plurality of sentences divided by a preset symbol; calculate the basic features of a first sentence in the target text, where the basic features include the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text, and the first sentence is any sentence in the target text; and determine, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (18)

  1. A method for determining a text core sentence, characterized by comprising:
    obtaining a target text from a preset text set, wherein the text set includes a plurality of texts, and each text includes a plurality of sentences divided by a preset symbol;
    calculating basic features of a first sentence in the target text, wherein the basic features include a term frequency-inverse document frequency, an information entropy, a repetition rate, and a similarity with the title of the target text, and the first sentence is any sentence in the target text;
    determining, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
  2. The method according to claim 1, characterized in that the calculating basic features of a first sentence in the target text comprises:
    segmenting the sentences of each text in the text set to obtain the words after segmentation, wherein the words obtained by segmenting the first sentence are first words;
    calculating the term frequency-inverse document frequency of each first word, and determining the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word;
    calculating the term frequency of each first word in the target text, and determining the information entropy of each first word according to the term frequency of each first word in the target text;
    calculating the repetition rate of the first sentence in the target text;
    calculating the similarity between the first sentence and the title of the target text.
  3. The method according to claim 1, characterized by further comprising:
    performing data cleaning on each text in the text set to obtain the title and body of each text.
  4. The method according to claim 2, characterized in that the calculating the term frequency-inverse document frequency of each first word, and determining the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word, comprises:
    obtaining the term frequency of each first word in the target text;
    obtaining the inverse document frequency of each first word in the text set;
    calculating the term frequency-inverse document frequency of each first word using the term frequency and inverse document frequency of each first word;
    summing the term frequency-inverse document frequencies of the first words to determine the term frequency-inverse document frequency of the first sentence.
  5. The method according to claim 2, characterized in that the calculating the term frequency of each first word in the target text, and determining the information entropy of each first word according to the term frequency of each first word in the target text, comprises:
    obtaining the term frequency of each first word in the target text, and calculating the information entropy of each first word;
    summing the information entropy of the first words to determine the information entropy of the first sentence.
  6. The method according to claim 2, characterized in that the calculating the similarity between the first sentence and the title of the target text comprises:
    calculating the edit distance between the first sentence and the title of the target text;
    comparing the string length of the first sentence with the string length of the title, and taking the longer string length as a first string length;
    determining the similarity between the first sentence and the title of the target lyric text according to the ratio of the edit distance to the first string length.
  7. The method according to claim 1, characterized in that the determining, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text comprises:
    computing a weighted sum of the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text of the first sentence, to determine the score of the first sentence;
    in response to the score of the first sentence being greater than a first preset threshold, determining that the first sentence is a core sentence of the target text.
  8. The method according to claim 1, characterized in that the preset symbol is a newline character.
  9. An apparatus for determining a text core sentence, characterized by comprising:
    an acquiring unit, configured to obtain a target text from a preset text set, wherein the text set includes a plurality of texts, and each text includes a plurality of sentences divided by a preset symbol;
    a computing unit, configured to calculate basic features of a first sentence in the target text, wherein the basic features include a term frequency-inverse document frequency, an information entropy, a repetition rate, and a similarity with the title of the target text, and the first sentence is any sentence in the target text;
    a determining unit, configured to determine, based on the basic features of the first sentence, whether the first sentence is a core sentence of the target text.
  10. The apparatus according to claim 9, characterized in that the computing unit comprises:
    a word segmentation module, configured to segment the sentences of each text in the text set to obtain the words after segmentation, wherein the words obtained by segmenting the first sentence are first words;
    a term frequency-inverse document frequency computing module, configured to calculate the term frequency-inverse document frequency of each first word, and determine the term frequency-inverse document frequency of the first sentence according to the term frequency-inverse document frequency of each first word;
    an information entropy computing module, configured to calculate the term frequency of each first word in the target text, and determine the information entropy of each first word according to the term frequency of each first word in the target text;
    a repetition rate computing module, configured to calculate the repetition rate of the first sentence in the target text;
    a similarity computing module, configured to calculate the similarity between the first sentence and the title of the target text.
  11. The apparatus according to claim 9, characterized in that the apparatus further comprises:
    a cleaning unit, configured to perform data cleaning on each text in the text set to obtain the title and body of each text.
  12. The apparatus according to claim 10, characterized in that the term frequency-inverse document frequency computing module is further configured to:
    obtain the term frequency of each first word in the target text;
    obtain the inverse document frequency of each first word in the text set;
    calculate the term frequency-inverse document frequency of each first word using the term frequency and inverse document frequency of each first word;
    sum the term frequency-inverse document frequencies of the first words to determine the term frequency-inverse document frequency of the first sentence.
  13. The apparatus according to claim 10, characterized in that the information entropy computing module is further configured to:
    obtain the term frequency of each first word in the target text, and calculate the information entropy of each first word;
    sum the information entropy of the first words to determine the information entropy of the first sentence.
  14. The apparatus according to claim 10, characterized in that the similarity computing module is further configured to:
    calculate the edit distance between the first sentence and the title of the target text;
    compare the string length of the first sentence with the string length of the title, and take the longer string length as a first string length;
    determine the similarity between the first sentence and the title of the target lyric text according to the ratio of the edit distance to the first string length.
  15. The apparatus according to claim 9, characterized in that the determining unit is further configured to:
    compute a weighted sum of the term frequency-inverse document frequency, the information entropy, the repetition rate, and the similarity with the title of the target text of the first sentence, to determine the score of the first sentence;
    in response to the score of the first sentence being greater than a first preset threshold, determine that the first sentence is a core sentence of the target text.
  16. The apparatus according to claim 9, characterized in that the preset symbol is a newline character.
  17. A server, comprising:
    one or more processors;
    a storage device, for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
  18. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201710978320.8A 2017-10-18 2017-10-18 Method and apparatus for determining text core sentence Pending CN107797990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710978320.8A CN107797990A (en) 2017-10-18 2017-10-18 Method and apparatus for determining text core sentence

Publications (1)

Publication Number Publication Date
CN107797990A true CN107797990A (en) 2018-03-13

Family

ID=61533470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710978320.8A Pending CN107797990A (en) 2017-10-18 2017-10-18 Method and apparatus for determining text core sentence

Country Status (1)

Country Link
CN (1) CN107797990A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664470A (en) * 2018-05-04 2018-10-16 武汉斗鱼网络科技有限公司 Method for measuring video title information amount, readable storage medium and electronic equipment
CN109388804A (en) * 2018-10-22 2019-02-26 平安科技(深圳)有限公司 Method and device for extracting core viewpoints of securities research reports using a deep learning model
CN109726282A (en) * 2018-12-26 2019-05-07 东软集团股份有限公司 Method, apparatus, device and storage medium for generating an article abstract
CN111125424A (en) * 2019-12-26 2020-05-08 腾讯音乐娱乐科技(深圳)有限公司 Method, device, equipment and storage medium for extracting core lyrics of song
CN111476021A (en) * 2020-04-07 2020-07-31 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for outputting information
CN111523310A (en) * 2020-04-01 2020-08-11 北京大米未来科技有限公司 Data processing method, data processing apparatus, storage medium, and electronic device
CN111930890A (en) * 2020-07-28 2020-11-13 深圳市梦网科技发展有限公司 Information sending method and device, terminal equipment and storage medium
CN113987174A (en) * 2021-10-22 2022-01-28 上海携旅信息技术有限公司 Core statement extraction method, system, equipment and storage medium for classification label
CN117811851A (en) * 2024-03-01 2024-04-02 深圳市聚亚科技有限公司 Data transmission method for 4G communication module

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609472A (en) * 2009-08-13 2009-12-23 腾讯科技(深圳)有限公司 Keyword evaluation method and device based on a question-and-answer platform
CN105159927A (en) * 2015-08-04 2015-12-16 北京金山安全软件有限公司 Method and device for selecting subject term of target text and terminal
CN105224695A (en) * 2015-11-12 2016-01-06 中南大学 Text feature quantization method and device based on information entropy, and file classification method and device
CN105488024A (en) * 2015-11-20 2016-04-13 广州神马移动信息科技有限公司 Webpage topic sentence extraction method and apparatus
CN105843795A (en) * 2016-03-21 2016-08-10 华南理工大学 Topic model based document keyword extraction method and system
US20170109786A1 (en) * 2015-10-20 2017-04-20 Korea Electronics Technology Institute System for producing promotional media content and method thereof
CN107066555A (en) * 2017-03-26 2017-08-18 天津大学 Online topic detection method for professional domains

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609472A (en) * 2009-08-13 2009-12-23 腾讯科技(深圳)有限公司 Keyword evaluation method and device based on a question-and-answer platform
CN105159927A (en) * 2015-08-04 2015-12-16 北京金山安全软件有限公司 Method and device for selecting subject term of target text and terminal
US20170109786A1 (en) * 2015-10-20 2017-04-20 Korea Electronics Technology Institute System for producing promotional media content and method thereof
CN105224695A (en) * 2015-11-12 2016-01-06 中南大学 Text feature quantization method and device based on information entropy, and file classification method and device
CN105488024A (en) * 2015-11-20 2016-04-13 广州神马移动信息科技有限公司 Webpage topic sentence extraction method and apparatus
CN105843795A (en) * 2016-03-21 2016-08-10 华南理工大学 Topic model based document keyword extraction method and system
CN107066555A (en) * 2017-03-26 2017-08-18 天津大学 Online topic detection method for professional domains

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MAKBULE GULCIN OZSOY et al.: "Text Summarization of Turkish Texts using Latent Semantic Analysis", Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010) *
Liu Jinling et al.: "Topic sentence extraction method for Chinese short messages based on lexical chains", Computer Engineering and Applications *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664470B (en) * 2018-05-04 2022-06-17 武汉斗鱼网络科技有限公司 Method for measuring video title information amount, readable storage medium and electronic equipment
CN108664470A (en) * 2018-05-04 2018-10-16 武汉斗鱼网络科技有限公司 Method for measuring video title information amount, readable storage medium and electronic equipment
CN109388804A (en) * 2018-10-22 2019-02-26 平安科技(深圳)有限公司 Method and device for extracting core viewpoints of securities research reports using a deep learning model
CN109726282A (en) * 2018-12-26 2019-05-07 东软集团股份有限公司 Method, apparatus, device and storage medium for generating an article abstract
CN111125424A (en) * 2019-12-26 2020-05-08 腾讯音乐娱乐科技(深圳)有限公司 Method, device, equipment and storage medium for extracting core lyrics of song
CN111125424B (en) * 2019-12-26 2024-01-09 腾讯音乐娱乐科技(深圳)有限公司 Method, device, equipment and storage medium for extracting core lyrics of song
CN111523310B (en) * 2020-04-01 2023-06-13 北京大米未来科技有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN111523310A (en) * 2020-04-01 2020-08-11 北京大米未来科技有限公司 Data processing method, data processing apparatus, storage medium, and electronic device
CN111476021B (en) * 2020-04-07 2023-08-15 抖音视界有限公司 Method, apparatus, electronic device, and computer-readable medium for outputting information
CN111476021A (en) * 2020-04-07 2020-07-31 北京字节跳动网络技术有限公司 Method, device, electronic equipment and computer readable medium for outputting information
CN111930890A (en) * 2020-07-28 2020-11-13 深圳市梦网科技发展有限公司 Information sending method and device, terminal equipment and storage medium
CN113987174A (en) * 2021-10-22 2022-01-28 上海携旅信息技术有限公司 Core statement extraction method, system, equipment and storage medium for classification label
CN117811851A (en) * 2024-03-01 2024-04-02 深圳市聚亚科技有限公司 Data transmission method for 4G communication module
CN117811851B (en) * 2024-03-01 2024-05-17 深圳市聚亚科技有限公司 Data transmission method for 4G communication module

Similar Documents

Publication Publication Date Title
CN107797990A (en) Method and apparatus for determining text core sentence
US11907277B2 (en) Method, apparatus, and computer program product for classification and tagging of textual data
US20200019609A1 (en) Suggesting a response to a message by selecting a template using a neural network
US10114809B2 (en) Method and apparatus for phonetically annotating text
CN109408826A (en) Text information extraction method, device, server and storage medium
CN110347908B (en) Voice shopping method, device, medium and electronic equipment
US10685012B2 (en) Generating feature embeddings from a co-occurrence matrix
CN105956053A (en) Network information-based search method and apparatus
CN111414561B (en) Method and device for presenting information
US20220147835A1 (en) Knowledge graph construction system and knowledge graph construction method
US11238050B2 (en) Method and apparatus for determining response for user input data, and medium
CN108628830A (en) Method and apparatus for semantic recognition
CN106919711A (en) Method and apparatus for labeling information based on artificial intelligence
US20150046462A1 (en) Identifying actions in documents using options in menus
CN102930048A (en) Data abundance automatically found by semanteme and using reference and visual data
CN108304381B (en) Entity edge establishing method, device and equipment based on artificial intelligence and storage medium
CN106663123B (en) Comment-centric news reader
CN107766498A (en) Method and apparatus for generating information
CN112686035A (en) Method and device for vectorizing unknown words
CN111859970B (en) Method, apparatus, device and medium for processing information
US11929100B2 (en) Video generation method, apparatus, electronic device, storage medium and program product
CN116402166A (en) Training method and device of prediction model, electronic equipment and storage medium
KR20200082240A (en) Apparatus for determining title of user, system including the same, terminal and method for the same
WO2006106740A1 (en) Information processing device and method, and program recording medium
CN107483595A (en) Information-pushing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190123

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 100020 Block 508, Overseas Chinese Fufang Grassland, No. 9 Dongdaqiao Road, Chaoyang District, Beijing

Applicant before: Raven Technology (Beijing) Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20210507

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

Applicant after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20180313