CN110597978A - Article abstract generation method and system, electronic equipment and readable storage medium - Google Patents

Publication number
CN110597978A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810603797.2A
Other languages
Chinese (zh)
Other versions
CN110597978B (en)
Inventor
简晓容
佘志东
张震涛
江丹丹
饶正锋
谢蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810603797.2A
Publication of CN110597978A
Application granted
Publication of CN110597978B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G06Q30/0627 Directed, with specific intent or strategy using item specifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an article abstract generation method and system, an electronic device and a readable storage medium. The article abstract generation method comprises the following steps: recognizing the text description picture of the target object to obtain a plurality of text description sentences; extracting a plurality of text description keywords of the target object from the text description sentences; calculating a TF-IDF value of each text description keyword based on a TF-IDF algorithm; extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF values, where N is a natural number; selecting matching sentences from the text description sentences according to the objective selling point keywords; and generating the abstract of the target object according to the matching sentences. The invention can automatically write the abstract of an article from the article's text description, so that the writing quality can be controlled, the writing time is shortened, the writing efficiency is improved, and the writing cost is reduced.

Description

Article abstract generation method and system, electronic equipment and readable storage medium
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a method and a system for generating an article abstract, an electronic device and a readable storage medium.
Background
Typically, an internet website sets up columns or channels to show or recommend items to users, such as finding good items, buying albums, etc. To attract users, titles, pictures and textual descriptions are added to the items; these textual descriptions are referred to as abstracts in the present application. In the prior art, an abstract is either taken directly from the merchant's description of the commodity or written by a specific group of people (such as a dawn). The merchant's description is easy to obtain, but its language is rigid and emphasizes technical details. An abstract written by a specific group of people can draw on picture information, and its language is vivid, varied and more likely to attract users; however, writing by dedicated staff is time-consuming, labor-intensive and costly, and because writers differ in skill, the writing quality is difficult to control.
Disclosure of Invention
The invention aims to overcome the defects that the writing quality is difficult to control, the time consumption is long and the cost is high when product information is written manually in the prior art, and provides an article abstract generation method, a system, an electronic device and a readable storage medium.
The invention solves the technical problems through the following technical scheme:
an article abstract generation method comprises the following steps:
identifying the text description picture of the target object to obtain a plurality of text description sentences;
extracting a plurality of text description keywords of the target object from the text description sentences;
calculating a TF-IDF value of each text description keyword based on a TF-IDF (term frequency-inverse document frequency) algorithm;
extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF values; N is a natural number;
selecting matching sentences from the text description sentences according to the objective selling point keywords;
and generating the abstract of the target object according to the matching sentences.
Preferably, the step of extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF values specifically includes:
sorting the text description keywords by TF-IDF value in descending order and extracting the N top-ranked text description keywords as the objective selling point keywords.
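The TF-IDF ranking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say which corpus the inverse document frequency is computed over or which TF-IDF variant is used, so the `corpus` argument (assumed here to be tokenized text descriptions of other articles), the smoothed IDF formula, and the function names `tf_idf_scores` and `top_n_keywords` are all assumptions.

```python
import math
from collections import Counter


def tf_idf_scores(doc_tokens, corpus):
    """Return {keyword: TF-IDF value} for one tokenized document.

    `corpus` is a list of tokenized documents (an assumption: the source
    does not specify the reference corpus).
    """
    total = len(doc_tokens)
    if total == 0:
        return {}
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (an assumption) so unseen words do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = tf * idf
    return scores


def top_n_keywords(scores, n):
    """Sort by TF-IDF value in descending order and keep the first n keywords."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


if __name__ == "__main__":
    corpus = [["battery", "screen", "light"], ["screen", "camera"], ["screen", "light"]]
    target = ["battery", "battery", "screen", "light"]
    print(top_n_keywords(tf_idf_scores(target, corpus), 2))
```

Ties are broken alphabetically only to make the ordering deterministic; the source does not address ties.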
Preferably, before the step of selecting matching sentences from the text description sentences according to the objective selling point keywords, the article abstract generation method further includes:
obtaining a plurality of comment sentences from the comment data of the target object;
extracting a plurality of comment keywords from the comment sentence;
calculating the frequency of each comment keyword;
extracting M subjective selling point keywords from the plurality of comment keywords according to the frequency; M is a natural number;
generating abstract keywords of the target object, wherein the abstract keywords comprise the subjective selling point keywords and the objective selling point keywords;
the step of selecting matched sentences from the text description sentences according to the objective selling point keywords specifically comprises the following steps:
generating a candidate sentence, wherein the candidate sentence comprises the text description sentence and the comment sentence;
and selecting the matching sentences from the candidate sentences according to the abstract keywords.
Preferably, the step of extracting M subjective selling point keywords from the plurality of comment keywords according to the frequency specifically includes:
and sorting the comment keywords by frequency in descending order and extracting the M top-ranked comment keywords as the subjective selling point keywords.
Preferably, the step of generating the abstract keyword of the target item specifically includes:
and deduplicating all subjective selling point keywords and all objective selling point keywords to generate the abstract keywords.
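The comment-side steps (counting keyword frequency, keeping the M most frequent keywords, and merging them with the objective keywords without duplicates) can be sketched as below. The function names `top_m_by_frequency` and `merge_keywords` are hypothetical, and keeping the first-seen order when deduplicating is an assumption; the source only requires that duplicates be removed.

```python
from collections import Counter


def top_m_by_frequency(comment_keywords, m):
    """Return the m most frequent comment keywords as (keyword, frequency) pairs."""
    counts = Counter(comment_keywords)
    # Descending frequency; alphabetical tie-break is an assumption for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:m]


def merge_keywords(objective, subjective):
    """Combine both keyword lists and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(list(objective) + list(subjective)))


if __name__ == "__main__":
    comments = ["soft", "soft", "warm", "cheap", "warm", "soft"]
    subjective = [kw for kw, _ in top_m_by_frequency(comments, 2)]
    print(merge_keywords(["cotton", "warm"], subjective))
```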
Preferably, before the step of selecting the matching sentence from the candidate sentences according to the abstract keyword, the method for generating the article abstract further includes:
normalizing the TF-IDF values of all objective selling point keywords, and taking the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
normalizing the frequency of all the subjective selling point keywords, and taking the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
generating the weight of the abstract key words; if the abstract keywords are objective selling point keywords, the weight of the abstract keywords is the first weight, if the abstract keywords are subjective selling point keywords, the weight of the abstract keywords is the second weight, and if the abstract keywords are both objective selling point keywords and subjective selling point keywords, the weight of the abstract keywords is the sum of the first weight and the second weight;
the step of selecting the matching sentence from the candidate sentences according to the abstract keyword specifically comprises:
according to the weight of the abstract key words, the abstract key words are arranged in a descending order;
selecting matching sentences matched with the abstract keywords from the candidate sentences in sequence according to the descending order of the weight;
the step of generating the abstract of the target item according to the matching statement specifically includes:
and sequentially selecting the matched sentences of each abstract keyword according to the weight descending order to form the abstract until the word number of the abstract reaches the preset word number.
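A minimal sketch of the weighting and assembly steps above follows. Several details are assumptions because the source leaves them open: normalization is done by dividing by the sum, the word count is taken by splitting on whitespace (for Chinese text a character count would be more natural), and `matches` is a hypothetical mapping from each abstract keyword to its already selected matching sentence.

```python
def normalize(values):
    """Scale values so they sum to 1 (sum normalization is an assumption)."""
    total = sum(values.values())
    if not total:
        return {k: 0.0 for k in values}
    return {k: v / total for k, v in values.items()}


def keyword_weights(tfidf, freq):
    """First weight from TF-IDF, second from frequency; keywords in both get the sum."""
    first = normalize(tfidf)
    second = normalize(freq)
    return {k: first.get(k, 0.0) + second.get(k, 0.0) for k in set(first) | set(second)}


def build_abstract(weights, matches, max_words):
    """Append matching sentences in descending keyword weight until max_words is reached."""
    parts, used = [], 0
    for kw in sorted(weights, key=lambda k: (-weights[k], k)):
        sentence = matches.get(kw)
        if sentence is None or sentence in parts:
            continue  # no match for this keyword, or sentence already used
        parts.append(sentence)
        used += len(sentence.split())
        if used >= max_words:
            break
    return " ".join(parts)


if __name__ == "__main__":
    w = keyword_weights({"cotton": 3.0, "warm": 1.0}, {"warm": 2, "soft": 2})
    m = {"cotton": "Pure cotton fabric.", "warm": "Keeps you warm.", "soft": "Feels soft."}
    print(build_abstract(w, m, max_words=6))
```

A keyword that is both an objective and a subjective selling point ends up with a larger weight, so its sentence tends to appear earlier in the abstract.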
Preferably, the step of sequentially selecting matching sentences matched with the abstract keywords from the candidate sentences according to the descending order of the weights specifically comprises:
selecting a first abstract key word according to the weight descending order;
extracting a first class of sentences containing the first abstract key words from the candidate sentences;
scoring each candidate sentence to obtain the score of each candidate sentence;
arranging the sentences in the first type of sentences in an ascending order according to the score size;
selecting the first sentence in the first class of sentences as a first matching sentence matched with the first abstract keyword according to the ascending order of scores; the matching statement comprises the first matching statement;
selecting the next abstract key word according to the weight descending order;
extracting a second type of sentence containing the next abstract key word from the candidate sentences;
arranging the sentences in the second type of sentences in an ascending order according to the score;
selecting the first-ranked sentences in the second-class sentences as second matching sentences matched with the next abstract keywords according to the ascending order of scores, and then executing the step of selecting the next abstract keywords; the matching statement comprises the second matching statement.
Preferably, the step of selecting the first sentence in the second category of sentences as the second matching sentence matched with the next abstract keyword according to the ascending order of scores specifically includes:
calculating, in ascending order of score, the similarity between each sentence in the second class of sentences and the first matching sentence, until a sentence whose similarity is smaller than a preset similarity is found;
removing from the second class of sentences the sentences whose similarity is not less than the preset similarity;
taking the first sentence in the second class of sentences whose similarity to the first matching sentence is less than the preset similarity as the updated first-ranked sentence in the second class of sentences;
and taking the first-ranked sentence in the updated second class of sentences as the second matching sentence matched with the next abstract keyword, wherein the matching sentences comprise the first matching sentence and the second matching sentence.
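The keyword-by-keyword selection with similarity filtering can be sketched as below. The source does not name a similarity measure, so token-level Jaccard similarity is swapped in as an assumption; comparing each candidate against all previously selected sentences (rather than only the first matching sentence) is also a generalization made here. The ascending sort by score follows the text as written. The names `jaccard` and `select_matches` are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for the unspecified measure)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_matches(keywords, candidates, scores, max_similarity):
    """Pick one sentence per keyword, skipping sentences too similar to earlier picks.

    `keywords` must already be sorted by descending weight; `scores` maps each
    candidate sentence to its score.
    """
    selected = []
    for kw in keywords:
        group = [s for s in candidates if kw in s]
        group.sort(key=lambda s: scores[s])  # ascending order, as stated in the text
        for sentence in group:
            if all(jaccard(sentence, prev) < max_similarity for prev in selected):
                selected.append(sentence)
                break
    return selected


if __name__ == "__main__":
    a = "pure cotton fabric feels soft"
    b = "soft touch and gentle on skin"
    c = "pure cotton fabric feels very soft"
    print(select_matches(["cotton", "soft"], [a, b, c], {a: 0.1, b: 0.5, c: 0.2}, 0.5))
```

In this toy run the second keyword skips both near-duplicates of the first pick and falls through to the first sufficiently different sentence.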
Preferably, the step of scoring each candidate sentence to obtain the score of each candidate sentence specifically includes:
respectively calculating the lexical, syntactic and sentiment values of the candidate sentences based on an NLP (natural language processing) algorithm to obtain a first score, a second score and a third score;
calculating the language perplexity of the candidate sentences based on a PPL (perplexity) algorithm to obtain a fourth score;
respectively giving corresponding weights to the first score, the second score, the third score and the fourth score;
and weighting and summing the first score, the second score, the third score and the fourth score of each candidate sentence to obtain the score of each candidate sentence.
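The weighted sum of the four component scores can be illustrated with a short sketch. How each component is computed and what the weights are is not given in the source, so the numbers below are placeholders and the function name `sentence_score` is hypothetical.

```python
def sentence_score(component_scores, weights):
    """Weighted sum of the per-sentence component scores.

    Both arguments are sequences ordered as (lexical, syntactic, sentiment,
    perplexity); the weight values themselves are left to the implementer.
    """
    if len(component_scores) != len(weights):
        raise ValueError("one weight is required per component score")
    return sum(w * s for w, s in zip(weights, component_scores))


if __name__ == "__main__":
    # Placeholder component scores and equal weights, for illustration only.
    print(sentence_score((0.8, 0.6, 0.9, 0.4), (0.25, 0.25, 0.25, 0.25)))
```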
Preferably, the step of identifying the text-describing picture of the target article to obtain a plurality of text-describing sentences specifically includes:
recognizing the text-describing picture based on OCR (optical character recognition) to obtain a plurality of single-line sentences on the text-describing picture;
calculating, based on a PPL algorithm, whether the pixel height difference between any two adjacent single-line sentences is within a preset range, and if so, calculating the language perplexity between the two adjacent single-line sentences;
judging whether the language perplexity is smaller than a preset threshold value, and if so, confirming that the two adjacent single-line sentences belong to the same sentence;
and combining all the single-line sentences belonging to the same sentence to generate the text description sentence.
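The line-merging logic above can be sketched as follows, under stated assumptions. The OCR output is modeled by a hypothetical `OcrLine` record carrying the text and a pixel height; the "preset range" is taken to mean an absolute height difference no greater than `max_height_diff`; and the perplexity model is passed in as a callable because the source does not identify one. Joining lines with a space suits English test data, while Chinese text would be joined without one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OcrLine:
    text: str
    height: int  # pixel height of the recognized line (assumed OCR output field)


def merge_lines(
    lines: List[OcrLine],
    max_height_diff: int,
    max_perplexity: float,
    perplexity: Callable[[str], float],
) -> List[str]:
    """Merge adjacent single-line sentences that appear to form one sentence."""
    merged = []  # list of (text, height of the most recently merged line)
    for line in lines:
        if merged:
            prev_text, prev_height = merged[-1]
            joined = prev_text + " " + line.text
            # Height check first; perplexity is computed only when it passes.
            if (abs(line.height - prev_height) <= max_height_diff
                    and perplexity(joined) < max_perplexity):
                merged[-1] = (joined, line.height)
                continue
        merged.append((line.text, line.height))
    return [text for text, _ in merged]


if __name__ == "__main__":
    # Toy perplexity: joining across a sentence-ending period is "surprising".
    toy = lambda text: 100.0 if ". " in text else 5.0
    lines = [OcrLine("Made of pure", 20), OcrLine("cotton fabric.", 20),
             OcrLine("BIG SALE", 40), OcrLine("Soft and warm.", 21)]
    print(merge_lines(lines, max_height_diff=3, max_perplexity=50.0, perplexity=toy))
```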
Preferably, after the step of identifying the text-describing picture of the target article to obtain a plurality of text-describing sentences, the article abstract generating method further includes:
filtering out text description sentences with errors in character recognition based on a preset dirty word bank;
and in the step of extracting a plurality of text description keywords of the target object from the text description sentences, extracting the plurality of text description keywords from the filtered text description sentences.
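A minimal sketch of the filtering step, assuming the preset dirty word bank is simply a collection of strings that typically signal a character recognition error. The sample entries and the function name `filter_sentences` are hypothetical; the source does not describe the contents of the bank.

```python
def filter_sentences(sentences, dirty_words):
    """Drop every sentence that contains any entry of the dirty word bank."""
    return [s for s in sentences if not any(word in s for word in dirty_words)]


if __name__ == "__main__":
    bank = {"#", "@"}  # hypothetical entries that indicate garbled OCR output
    print(filter_sentences(["Pure cotton fabric.", "Pu#e c0tt@n", "Soft and warm."], bank))
```

Keyword extraction would then run on the returned list rather than on the raw OCR sentences.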
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for generating a summary of an item as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for generating a summary of an item as described above.
An article abstract generating system comprises a text description sentence recognition module, a text description keyword extraction module, a TF-IDF value calculation module, an objective selling point keyword extraction module, a sentence matching module and an abstract generating module;
the text-describing sentence recognition module is used for recognizing a text-describing picture of a target article to obtain a plurality of text-describing sentences;
the text description keyword extraction module is used for extracting a plurality of text description keywords of the target article from the text description sentences;
the TF-IDF value calculation module is used for calculating to obtain a TF-IDF value of each text description keyword based on a TF-IDF algorithm;
the objective selling point keyword extraction module is used for extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF value; N is a natural number;
the sentence matching module is used for selecting matched sentences from the text description sentences according to the objective selling point keywords;
the abstract generating module is used for generating an abstract of the target object according to the matching statement.
Preferably, the objective selling point keyword extraction module is configured to sort the text description keywords by TF-IDF value in descending order and extract the N top-ranked text description keywords as the objective selling point keywords.
Preferably, the article abstract generating system further comprises a comment sentence acquiring module, a comment keyword extracting module, a frequency calculating module, a subjective selling point keyword extracting module and an abstract keyword generating module, wherein the sentence matching module comprises a candidate sentence generating unit;
the comment sentence acquisition module is used for acquiring a plurality of comment sentences from comment data of the target object;
the comment keyword extraction module is used for extracting a plurality of comment keywords from the comment sentences;
the frequency calculating module is used for calculating the frequency of each comment keyword;
the subjective selling point keyword extraction module is used for extracting M subjective selling point keywords from the comment keywords according to the frequency; M is a natural number;
the abstract keyword generation module is used for generating abstract keywords of the target object, wherein the abstract keywords comprise the subjective selling point keywords and the objective selling point keywords;
the candidate sentence generating unit is used for generating candidate sentences, and the candidate sentences comprise the text description sentences and the comment sentences;
the sentence matching module is used for selecting the matching sentences from the candidate sentences according to the abstract keywords.
Preferably, the subjective selling point keyword extraction module is configured to sort the comment keywords by frequency in descending order and extract the M top-ranked comment keywords as the subjective selling point keywords.
Preferably, the abstract keyword generation module is configured to generate the abstract keyword after deduplicating all subjective selling point keywords and all objective selling point keywords.
Preferably, the article abstract generating system further comprises a weight calculating module, the weight calculating module comprises a normalizing unit and a weight generating unit, and the sentence matching module comprises a sorting unit and a matching sentence selecting unit;
the normalization unit is used for normalizing TF-IDF values of all objective selling point keywords;
the weight generation unit is used for taking the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
the normalization unit is also used for normalizing the frequency of all the subjective selling point keywords;
the weight generation unit is also used for taking the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
the weight generating unit is also used for generating the weight of the abstract key words; if the abstract keywords are objective selling point keywords, the weight of the abstract keywords is the first weight, if the abstract keywords are subjective selling point keywords, the weight of the abstract keywords is the second weight, and if the abstract keywords are both objective selling point keywords and subjective selling point keywords, the weight of the abstract keywords is the sum of the first weight and the second weight;
the sorting unit is used for sorting the abstract keywords in a descending order according to the weights of the abstract keywords;
the matching sentence selecting unit is used for sequentially selecting matching sentences matched with the abstract keywords from the candidate sentences according to a weight descending order;
the abstract generating module is used for sequentially selecting the matching sentences of each abstract keyword according to the weight descending order to form the abstract until the word number of the abstract reaches the preset word number.
Preferably, the article abstract generating system further comprises a scoring module, and the sentence matching module further comprises a keyword selecting unit;
the scoring module is used for scoring each candidate statement to obtain the score of each candidate statement;
the keyword selection unit is used for selecting a first abstract keyword according to a weight descending order;
the matching statement selecting unit is used for extracting a first class of statements containing the first abstract key words from the candidate statements;
the sorting unit is used for sorting the sentences in the first type of sentences in an ascending order according to the score size;
the matching sentence selecting unit is used for selecting the first sentence in the first class of sentences as a first matching sentence matched with the first abstract keyword according to the ascending order of scores, and then calling the keyword selecting unit to execute the action of selecting the next abstract keyword according to the descending order of weights; the matching statement comprises the first matching statement;
the matching statement selecting unit is also used for extracting a second type of statement containing the next abstract keyword from the candidate statements;
the sorting unit is further used for sorting the sentences in the second category of sentences in an ascending order according to the score;
the matching sentence selecting unit is also used for selecting the first-ranked sentences in the second-class sentences as second matching sentences matched with the next abstract keywords according to the ascending order of scores, and then calling the keyword selecting unit to execute the action of selecting the next abstract keywords; the matching statement comprises the second matching statement.
Preferably, the sentence matching module further comprises a similarity calculation unit, a rejection unit and an update unit;
the similarity calculation unit is used for sequentially selecting sentences in the second type of sentences and the first matching sentences according to the ascending order of scores to perform similarity calculation until the similarity is smaller than a preset similarity, and calling the rejection unit;
the eliminating unit is used for eliminating sentences of which the similarity is not less than the preset similarity in the second type of sentences;
the updating unit is used for taking the first sentence in the second class of sentences whose similarity to the first matching sentence is less than the preset similarity as the updated first-ranked sentence in the second class of sentences;
and the matching statement selecting unit is used for taking the first-ranked statement in the updated second-class statements as a second matching statement matched with the next abstract keyword.
Preferably, the scoring module comprises a score calculating unit and a weight giving unit;
the score calculation unit is used for calculating the lexical, syntactic and sentiment values of the candidate sentences respectively based on an NLP algorithm to obtain a first score, a second score and a third score, and is also used for calculating the language perplexity of the candidate sentences based on a PPL algorithm to obtain a fourth score;
the weight giving unit is used for giving corresponding weights to the first score, the second score, the third score and the fourth score respectively;
the scoring module is configured to sum the first score, the second score, the third score, and the fourth score of each candidate sentence in a weighted manner to obtain a score of each candidate sentence.
Preferably, the text description sentence recognition module comprises a single-line sentence recognition unit, a first calculation unit, a second calculation unit, a first judgment unit, a second judgment unit, a sentence confirmation unit and a text description sentence generation unit;
the single-line sentence recognition unit is used for recognizing the text-describing picture based on OCR to obtain a plurality of single-line sentences on the text-describing picture;
the first calculating unit is used for calculating the pixel height difference between any two adjacent single-line sentences based on a PPL algorithm;
the first judging unit is used for judging whether the pixel height difference is within a preset range, and if so, the second calculating unit is called;
the second calculation unit is used for calculating the language perplexity between the two adjacent single-line sentences;
the second judging unit is used for judging whether the language perplexity is smaller than a preset threshold value, and if so, the sentence confirming unit is called;
the sentence confirming unit is used for confirming that any two adjacent single-line sentences belong to the same sentence;
the text description sentence generating unit is used for combining all the single-line sentences belonging to the same sentence to generate the text description sentence.
Preferably, the article abstract generating system further comprises a filtering module;
the filtering module is used for filtering text description sentences with wrong character recognition based on a preset dirty word bank;
the text description keyword extraction module is used for extracting the plurality of text description keywords from the filtered text description sentences.
The positive progress effects of the invention are as follows: the invention can automatically write the abstract of an article from the article's text description, so that the writing quality can be controlled, the writing time is shortened, the writing efficiency is improved, and the writing cost is reduced.
Drawings
Fig. 1 is a flowchart of the article abstract generation method according to embodiment 1 of the present invention.
Fig. 2 is a detailed flowchart of step 10 in the article abstract generation method according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of another implementation of the article abstract generation method according to embodiment 1 of the present invention.
Fig. 4 is a flowchart of the article abstract generation method according to embodiment 2 of the present invention.
Fig. 5 is a flowchart of step 50 in the article abstract generation method according to embodiment 3 of the present invention.
Fig. 6 is a flowchart of step 5202 in the article abstract generation method according to embodiment 4 of the present invention.
Fig. 7 is a flowchart of step 52023 in the article abstract generation method according to embodiment 4 of the present invention.
Fig. 8 is a flowchart of step 52029 in the article abstract generation method according to embodiment 5 of the present invention.
Fig. 9 is a schematic structural diagram of an electronic device according to embodiment 6 of the present invention.
Fig. 10 is a block diagram of the article abstract generating system according to embodiment 8 of the present invention.
Fig. 11 is a block diagram of the text description sentence recognition module in the article abstract generating system according to embodiment 8 of the present invention.
Fig. 12 is a block diagram of the article abstract generating system according to embodiment 9 of the present invention.
Fig. 13 is a block diagram of the sentence matching module in the article abstract generating system according to embodiment 9 of the present invention.
Fig. 14 is a block diagram of the article abstract generating system according to embodiment 10 of the present invention.
Fig. 15 is a block diagram of the weight calculation module in the article abstract generating system according to embodiment 10 of the present invention.
Fig. 16 is a block diagram of the sentence matching module in the article abstract generating system according to embodiment 10 of the present invention.
Fig. 17 is a block diagram of the article abstract generating system according to embodiment 11 of the present invention.
Fig. 18 is a block diagram of the sentence matching module in the article abstract generating system according to embodiment 11 of the present invention.
Fig. 19 is a block diagram of the scoring module in the article abstract generating system according to embodiment 11 of the present invention.
Fig. 20 is a block diagram of the sentence matching module in the article abstract generating system according to embodiment 12 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
A method for generating an article abstract, as shown in fig. 1, includes:
step 10, identifying the text-describing picture of the target object to obtain a plurality of text-describing sentences;
step 20, extracting a plurality of text description keywords of the target object from the text description sentences;
step 30, calculating to obtain a TF-IDF value of each text description keyword based on a TF-IDF algorithm;
step 40, extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF value; N is a natural number;
step 50, selecting matching sentences from the text description sentences according to the objective selling point keywords; it should be noted that the matching sentences are sentences containing objective selling point keywords;
and step 60, generating the abstract of the target object according to the matching sentences.
In this embodiment, as shown in fig. 2, step 10 specifically includes:
step 101, recognizing the text-describing picture based on OCR to obtain a plurality of single-line sentences on the text-describing picture;
step 102, calculating the pixel height difference between any two adjacent single-line sentences based on a PPL algorithm;
step 103, judging whether the pixel height difference is within a preset range; if so, executing step 104; if not, determining that the two adjacent single-line sentences do not belong to the same sentence;
step 104, calculating the language confusion degree between any two adjacent single-line sentences;
step 105, judging whether the language confusion degree is smaller than a preset threshold value, if so, executing step 106;
step 106, confirming that any two adjacent single-line sentences belong to the same sentence;
and step 107, combining all the single-line sentences belonging to the same sentence to generate the text description sentence.
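Steps 101 to 107 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: `OcrLine`, `merge_lines`, the two default thresholds and the `joiner` parameter are hypothetical names, the "pixel height difference" is assumed to be the difference between the pixel heights of two adjacent lines, and the language confusion degree (perplexity) is assumed to be supplied by a caller-provided function applied to the concatenation of the two lines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OcrLine:
    text: str    # text of one single-line sentence returned by OCR
    height: int  # pixel height of that line (assumed to be reported by OCR)


def merge_lines(
    lines: List[OcrLine],
    perplexity: Callable[[str], float],  # hypothetical language-model scorer
    max_height_diff: int = 4,            # assumed preset range, in pixels
    max_perplexity: float = 50.0,        # assumed preset threshold
    joiner: str = "",                    # "" suits Chinese text
) -> List[str]:
    """Merge adjacent OCR lines that appear to belong to the same sentence."""
    if not lines:
        return []
    sentences = [lines[0].text]
    for prev, cur in zip(lines, lines[1:]):
        # Step 103: the height difference must fall within the preset range.
        same_height = abs(prev.height - cur.height) <= max_height_diff
        # Steps 104-105: perplexity is evaluated only if the height test passed.
        if same_height and perplexity(prev.text + joiner + cur.text) < max_perplexity:
            sentences[-1] += joiner + cur.text  # steps 106-107: same sentence
        else:
            sentences.append(cur.text)          # start a new sentence
    return sentences
```

Because the perplexity function is injected, any language model can be plugged in without changing the merging logic.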
In addition, fig. 3 shows another implementation of the article abstract generation method, in which, after step 10, the method further includes:
step 11, filtering text description sentences with wrong character recognition based on a preset dirty word stock;
further, step 201 is used to replace step 20, and specifically includes:
step 201, extracting a plurality of text description keywords of the target object from the filtered text description sentences.
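The filtering of step 11 might look like the sketch below. It assumes that the preset dirty word stock is a collection of strings that typically result from character misrecognition, and that a sentence is discarded when it contains any of them; the name `filter_sentences` and the substring test are assumptions made for illustration.

```python
from typing import Iterable, List


def filter_sentences(sentences: Iterable[str], dirty_words: Iterable[str]) -> List[str]:
    """Drop every sentence that contains an entry of the dirty word stock."""
    dirty = [w for w in dirty_words if w]  # ignore empty entries
    return [s for s in sentences if not any(w in s for w in dirty)]
```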
In this embodiment, step 40 specifically includes:
and sequentially extracting, in descending order of TF-IDF value, the N top-ranked text description keywords as the objective selling point keywords.
In the embodiment, the abstract of the article can be automatically written according to the text description sentences recognized in the text description pictures of the article, so that the writing quality can be controlled, the writing time is shortened, the writing efficiency is improved, and the writing cost is reduced.
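Steps 30 and 40 can be illustrated with a small, dependency-free sketch. The embodiment does not state which TF-IDF variant or which reference corpus is used, so the smoothed inverse document frequency below, the corpus of keyword lists from other articles, and the names `tf_idf` and `top_n` are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def tf_idf(target: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """TF-IDF of each keyword of the target article.

    `corpus` is assumed to hold one keyword list per article, target included.
    """
    counts = Counter(target)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)      # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0     # smoothed IDF (one common variant)
        scores[word] = (count / len(target)) * idf        # term frequency times IDF
    return scores


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Step 40: the N keywords with the largest TF-IDF values, in descending order."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]
```

A keyword that is frequent in the target article but rare elsewhere receives the largest value, which is why such keywords are treated as objective selling points.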
Example 2
The method for generating the article abstract in the embodiment is further improved on the basis of the embodiment 1, as shown in fig. 4, before step 50, the method for generating the article abstract further includes:
step 41, obtaining a plurality of comment sentences from comment data of the target object;
step 42, extracting a plurality of comment keywords from the comment sentences;
step 43, calculating the frequency of each comment keyword;
step 44, extracting M subjective selling point keywords from the plurality of comment keywords according to frequency; m is a natural number;
step 45, generating abstract keywords of the target object; the abstract keywords comprise the subjective selling point keywords and the objective selling point keywords;
further, step 50 specifically includes:
step 510, generating candidate sentences; the candidate sentences comprise the text description sentences and the comment sentences;
and step 520, selecting a matched statement from the candidate statements according to the abstract keyword.
In this embodiment, step 45 specifically includes:
and removing the duplicates of all subjective selling point keywords and all objective selling point keywords to generate the abstract keywords.
Wherein, step 44 specifically includes:
and sequentially extracting M comment keywords with the top frequency ranking from large to small as subjective selling point keywords.
In this embodiment, in addition to the text description sentences, user comments on the article are also considered. Information extracted from the comments better reflects user preferences, which improves the user experience. Because user comments are numerous and noisy, colloquial sentences and strongly emotional sentences can be filtered out using existing sentence analysis algorithms. Moreover, since the candidate sentences include both the comment sentences and the text description sentences, the similarity between the comment sentences and the text description sentences can be calculated, and comment sentences whose similarity value is larger than the set value can be filtered out, which improves the efficiency of subsequent matching.
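Steps 43 to 45 can be sketched as follows, assuming that "frequency" means the raw number of occurrences of a comment keyword and that de-duplication keeps the first occurrence of each keyword. The names `top_m_by_frequency` and `merge_keywords` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_m_by_frequency(comment_keywords: Iterable[str], m: int) -> List[Tuple[str, int]]:
    """Steps 43-44: the M most frequent comment keywords with their counts."""
    counts = Counter(comment_keywords)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:m]


def merge_keywords(objective: Iterable[str], subjective: Iterable[str]) -> List[str]:
    """Step 45: union of both keyword lists with duplicates removed, order preserved."""
    return list(dict.fromkeys(list(objective) + list(subjective)))
```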
Example 3
The method for generating the article abstract in this embodiment is further improved on the basis of embodiment 2, as shown in fig. 5, before step 520, step 50 further includes:
step 511, normalizing the TF-IDF values of all objective selling point keywords;
step 512, taking the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
step 513, normalizing the frequency of all the subjective selling point keywords;
step 514, taking the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
step 515, generating the weight of the abstract key words; if the abstract keywords are objective selling point keywords, the weight of the abstract keywords is the first weight, if the abstract keywords are subjective selling point keywords, the weight of the abstract keywords is the second weight, and if the abstract keywords are both objective selling point keywords and subjective selling point keywords, the weight of the abstract keywords is the sum of the first weight and the second weight;
further, step 520 specifically includes:
step 5201, arranging the abstract keywords in a descending order according to the weights of the abstract keywords;
step 5202, selecting matching sentences matched with the abstract keywords from the candidate sentences in sequence according to the descending order of the weight;
further, step 60 specifically includes:
and sequentially selecting the matched sentences of each abstract keyword according to the weight descending order to form the abstract until the word number of the abstract reaches the preset word number.
In this embodiment, after the final article abstract keywords and the corresponding weights are obtained, corresponding sentences are matched for each abstract keyword in sequence from large to small according to the weights, and then the sentences are combined to form the abstract.
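The weighting of steps 511 to 515 and the assembly of step 60 can be sketched as below. The embodiment does not specify the normalization, so dividing each value by the sum of all values is an assumption, as are the whitespace-based word count and the names `normalize`, `keyword_weights` and `build_abstract`.

```python
from typing import Dict, Mapping, Optional


def normalize(values: Mapping[str, float]) -> Dict[str, float]:
    """Scale the values so that they sum to 1 (assumed normalization)."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()} if total else dict(values)


def keyword_weights(tfidf: Mapping[str, float], freq: Mapping[str, float]) -> Dict[str, float]:
    """Step 515: first weight, second weight, or their sum for shared keywords."""
    first, second = normalize(tfidf), normalize(freq)
    return {k: first.get(k, 0.0) + second.get(k, 0.0) for k in set(first) | set(second)}


def build_abstract(weights: Mapping[str, float],
                   match: Mapping[str, Optional[str]],
                   max_words: int) -> str:
    """Step 60: append matching sentences in descending weight order
    until the abstract reaches the preset number of words."""
    parts, words = [], 0
    for kw in sorted(weights, key=lambda k: (-weights[k], k)):
        sentence = match.get(kw)
        if sentence is None or sentence in parts:
            continue  # no match for this keyword, or sentence already used
        parts.append(sentence)
        words += len(sentence.split())
        if words >= max_words:
            break
    return " ".join(parts)
```

For Chinese text the word count would more plausibly be a character count; only the counting line would need to change.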
Example 4
The method for generating the abstract of the article in this embodiment is further improved on the basis of embodiment 3, as shown in fig. 6, step 5202 specifically includes:
step 52021, selecting a first abstract keyword according to the weight descending order;
step 52022, extracting a first class of sentences containing first abstract keywords from the candidate sentences;
step 52023, scoring each candidate sentence to obtain a score of each candidate sentence;
step 52024, arranging the sentences in the first type of sentences in an ascending order according to the score;
step 52025, selecting the first sentence in the first class of sentences as a first matching sentence matched with the first abstract keyword according to the ascending order of scores; the matching statement comprises the first matching statement;
step 52026, selecting the next abstract key word according to the weight descending order;
step 52027, extracting a second type of sentences containing the next abstract keywords from the candidate sentences;
step 52028, arranging the sentences in the second type of sentences in an ascending order according to the score;
step 52029, selecting the first-ranked sentences in the second-class sentences as second matching sentences matched with the next abstract keywords according to the ascending order of scores, and then returning to step 52026; the matching statement comprises the second matching statement.
In this embodiment, as shown in fig. 7, step 52023 specifically includes:
step 52023-1, respectively calculating a lexical structure, a syntactic structure and an emotion value of the candidate sentence based on an NLP algorithm to obtain a first score, a second score and a third score; it should be noted that, the above calculation is performed by the part of speech analysis, dependency syntax structure and emotion classifier;
step 52023-2, calculating the language confusion degree of the candidate sentences based on a PPL algorithm to obtain a fourth score;
step 52023-3, respectively giving corresponding weights to the first score, the second score, the third score and the fourth score;
step 52023-4, weighting and summing the first score, the second score, the third score and the fourth score of each candidate sentence to obtain a score of each candidate sentence.
In this embodiment, the score of the candidate sentence is obtained on the basis of scoring the candidate sentence, and when matching sentences for the abstract keyword, the corresponding matching sentence is selected depending on the score.
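The weighted summation of step 52023-4 and the score-based selection can be sketched as follows. The four sub-scores are taken as given inputs, since the embodiment does not detail how they are computed. The step text sorts in ascending order, whereas the description of Example 5 refers to the highest-scoring sentence; this sketch assumes that a higher score is better and simply takes the maximum. The names `sentence_score` and `best_sentence` are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def sentence_score(sub_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the first to fourth scores of one candidate sentence."""
    if len(sub_scores) != len(weights):
        raise ValueError("one weight is required per sub-score")
    return sum(s * w for s, w in zip(sub_scores, weights))


def best_sentence(keyword: str,
                  candidates: Sequence[str],
                  scores: Mapping[str, float]) -> Optional[str]:
    """Among the candidates containing the keyword, return the best-scored one."""
    containing = [s for s in candidates if keyword in s]
    return max(containing, key=lambda s: scores[s]) if containing else None
```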
Example 5
The method for generating the abstract of the article in this embodiment is further improved on the basis of embodiment 4, as shown in fig. 8, step 52029 includes:
step 52029-1, sequentially selecting sentences in the second type of sentences and the first matching sentences according to the ascending order of scores to perform similarity calculation until the similarity is smaller than a preset similarity;
step 52029-2, eliminating sentences of which the similarity is not less than the preset similarity in the second type of sentences;
step 52029-3, updating the sentence in the second type of sentences whose similarity with the first matching sentence is smaller than the preset similarity to be the first-ranked sentence in the second type of sentences;
step 52029-4, taking the first-ranked statement in the updated second-class statements as a second matching statement matched with the next abstract keyword; the matching statement includes the first matching statement and the second matching statement.
In this embodiment, when the second abstract keyword is matched, the highest-scoring sentence is first selected from the sentences matching that keyword, and its similarity to the sentence already matched to the previous keyword is calculated. If the similarity is not less than the preset similarity, that sentence is removed and the next-highest-scoring sentence takes its place; the similarity calculation against the sentence matched to the previous keyword is then repeated until a sentence whose similarity is smaller than the preset similarity is found. That sentence is used as the matching sentence of the abstract keyword currently being matched.
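The selection loop of steps 52029-1 to 52029-4 can be sketched as below. The embodiment does not name a similarity measure, so the Jaccard similarity over lower-cased word sets is a stand-in chosen for illustration, and `jaccard`, `pick_dissimilar` and the default threshold are hypothetical.

```python
from typing import Callable, Optional, Sequence


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of the two word sets divided by their union."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pick_dissimilar(ranked: Sequence[str],
                    previous: str,
                    threshold: float = 0.5,
                    similarity: Callable[[str, str], float] = jaccard) -> Optional[str]:
    """Walk the ranked sentences (best first) and return the first one whose
    similarity to the previously matched sentence is below the threshold."""
    for sentence in ranked:
        if similarity(sentence, previous) < threshold:
            return sentence  # becomes the second matching sentence
        # otherwise the sentence is rejected and the next one is examined
    return None
```

Returning `None` when every sentence is too similar is a design choice of this sketch; the embodiment does not say what happens in that case.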
Example 6
An electronic device, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, wherein the processor implements the method for generating a summary of an item as described in any of embodiments 1 to 5 when executing the computer program.
Fig. 9 is a schematic structural diagram of an electronic device according to embodiment 6 of the present invention. FIG. 9 illustrates a block diagram of an exemplary electronic device 90 suitable for use in implementing embodiments of the present invention. The electronic device 90 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the electronic device 90 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 90 may include, but are not limited to: at least one processor 91, at least one memory 92, and a bus 93 that connects the various system components (including the memory 92 and the processor 91).
The bus 93 includes a data bus, an address bus, and a control bus.
Memory 92 may include volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 may also include a program tool 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 91 executes various functional applications and data processing by running a computer program stored in the memory 92.
The electronic device 90 may also communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 90 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via a network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 90 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 90, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Example 7
A computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the item digest generation method according to any one of embodiments 1 to 5.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present invention can also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps of the article abstract generation method described in any one of embodiments 1 to 5 when the program product is run on the terminal device.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Example 8
An article abstract generating system is shown in fig. 10, and comprises a text description sentence recognition module 11, a text description keyword extraction module 13, a TF-IDF value calculation module 14, an objective selling point keyword extraction module 15, a sentence matching module 3 and an abstract generating module 4;
the text description sentence recognition module 11 is configured to recognize a text description picture of a target article to obtain a plurality of text description sentences;
the text description keyword extraction module 13 is configured to extract a plurality of text description keywords of the target article from the text description sentence;
the TF-IDF value calculation module 14 is used for calculating a TF-IDF value of each text description keyword based on a TF-IDF algorithm;
the objective selling point keyword extraction module 15 is configured to extract N objective selling point keywords from the plurality of text description keywords according to the TF-IDF value; n is a natural number; specifically, the objective selling point keyword extraction module 15 is configured to sequentially extract, in descending order of TF-IDF value, the N top-ranked text description keywords as the objective selling point keywords;
the sentence matching module 3 is used for selecting matched sentences from the text description sentences according to the objective selling point keywords; it should be noted that the matched sentences are sentences containing objective selling point keywords;
the abstract generating module 4 is used for generating an abstract of the target article according to the matching statement.
In this embodiment, as shown in fig. 11, the text description sentence recognition module 11 includes a single-line sentence recognition unit 111, a first calculation unit 112, a second calculation unit 113, a first judgment unit 114, a second judgment unit 115, a sentence confirmation unit 116, and a text description sentence generation unit 117;
the single-line sentence recognition unit 111 is configured to recognize the text-describing picture based on OCR to obtain a plurality of single-line sentences on the text-describing picture;
the first calculating unit 112 is configured to calculate a pixel height difference between any two adjacent single-line sentences based on a PPL algorithm;
the first determining unit 114 is configured to determine whether the pixel height difference is within a preset range, and if so, invoke the second calculating unit 113;
the second calculating unit 113 is configured to calculate a language confusion degree between the arbitrary two adjacent single-line sentences;
the second determining unit 115 is further configured to determine whether the language confusion is smaller than a preset threshold, and if so, invoke the sentence confirming unit 116;
the sentence confirmation unit 116 is configured to confirm that the two arbitrarily adjacent single-line sentences belong to the same sentence;
the text description sentence generation unit 117 is configured to combine all the single-line sentences belonging to the same sentence to generate the text description sentence.
In this embodiment, referring to fig. 10, the article summary generation system further includes a filtering module 12;
the filtering module 12 is configured to filter out text description sentences with incorrect character recognition based on a preset dirty word stock;
the text description keyword extraction module 13 is configured to extract the text description keywords from the filtered text description sentences.
In the embodiment, the abstract of the article can be automatically written according to the text description sentences recognized in the text description pictures of the article, so that the writing quality can be controlled, the writing time is shortened, the writing efficiency is improved, and the writing cost is reduced.
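The module structure of this embodiment can be pictured as a chain of callables. The sketch below is only one possible arrangement under stated assumptions: the class name `AbstractGenerationSystem`, its field names and the default `n` are hypothetical, each field stands in for the module whose reference number is given in the comment, and modules 15, 3 and 4 are reduced to their simplest form.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AbstractGenerationSystem:
    recognize: Callable[[bytes], List[str]]             # module 11
    filter_sentences: Callable[[List[str]], List[str]]  # module 12
    extract_keywords: Callable[[List[str]], List[str]]  # module 13
    tf_idf: Callable[[List[str]], Dict[str, float]]     # module 14
    n: int = 3                                          # number N of selling points

    def run(self, picture: bytes) -> str:
        sentences = self.filter_sentences(self.recognize(picture))
        scores = self.tf_idf(self.extract_keywords(sentences))
        # Module 15: the N keywords with the largest TF-IDF values.
        selling_points = sorted(scores, key=lambda k: (-scores[k], k))[: self.n]
        # Module 3: keep sentences that contain at least one selling point keyword.
        matched = [s for s in sentences if any(k in s for k in selling_points)]
        # Module 4: join the matching sentences into the abstract.
        return " ".join(matched)
```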
Example 9
The system for generating an abstract of an article according to this embodiment is further improved on the basis of embodiment 8, as shown in fig. 12 to 13, the system for generating an abstract of an article further includes a comment sentence acquisition module 21, a comment keyword extraction module 22, a frequency calculation module 23, a subjective selling point keyword extraction module 24, and an abstract keyword generation module 25, and the sentence matching module 3 includes a candidate sentence generation unit 31;
the comment sentence acquisition module 21 is configured to acquire a plurality of comment sentences from the comment data of the target item;
the comment keyword extraction module 22 is configured to extract a plurality of comment keywords from the comment sentences;
the frequency calculating module 23 is further configured to calculate a frequency of each comment keyword;
the subjective selling point keyword extraction module 24 is configured to extract M subjective selling point keywords from the plurality of comment keywords according to the frequency; m is a natural number; specifically, the subjective selling point keyword extraction module 24 is configured to sequentially extract M comment keywords with top-ranked frequencies in a descending order as the subjective selling point keyword; in the present invention, the values of M and N may be the same or different;
the abstract keyword generation module 25 is configured to generate an abstract keyword of the target item, where the abstract keyword includes the subjective selling point keyword and the objective selling point keyword;
the candidate sentence generating unit 31 is configured to generate candidate sentences including the text description sentences and the comment sentences;
the sentence matching module 3 is configured to select the matching sentence from the candidate sentences according to the abstract keyword.
Further, the abstract keyword generation module 25 is configured to generate the abstract keyword after deduplicating all the subjective selling point keywords and all the objective selling point keywords.
In this embodiment, in addition to the text description sentences, user comments on the article are also considered. Information extracted from the comments better reflects user preferences, which improves the user experience. Because user comments are numerous and noisy, colloquial sentences and strongly emotional sentences can be filtered out using existing sentence analysis algorithms. Moreover, since the candidate sentences include both the comment sentences and the text description sentences, the similarity between the comment sentences and the text description sentences can be calculated, and comment sentences whose similarity value is larger than the set value can be filtered out, which improves the efficiency of subsequent matching.
Example 10
The system for generating the abstract of the article of this embodiment is further improved on the basis of the embodiment 9, as shown in fig. 14 to 16, the system for generating the abstract of the article further includes a weight calculation module 5, the weight calculation module 5 includes a normalization unit 51 and a weight generation unit 52, and the sentence matching module 3 further includes a sorting unit 32 and a matching sentence selection unit 33;
the normalization unit 51 is configured to perform normalization processing on TF-IDF values of all objective selling point keywords;
the weight generating unit 52 is configured to use the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
the normalization unit 51 is further configured to perform normalization processing on the frequency of all the subjective selling point keywords;
the weight generating unit 52 is further configured to use the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
the weight generating unit 52 is further configured to generate weights of the abstract keywords; if the abstract keywords are objective selling point keywords, the weight of the abstract keywords is the first weight, if the abstract keywords are subjective selling point keywords, the weight of the abstract keywords is the second weight, and if the abstract keywords are both objective selling point keywords and subjective selling point keywords, the weight of the abstract keywords is the sum of the first weight and the second weight;
the sorting unit 32 is configured to sort the abstract keywords in a descending order according to the weights of the abstract keywords;
the matching sentence selecting unit 33 is configured to sequentially select matching sentences matched with the abstract keywords from the candidate sentences according to a weight descending order;
further, the abstract generating module 4 is configured to sequentially select matching statements of each abstract keyword according to a descending order of weight to form the abstract until the word count of the abstract reaches a preset word count.
In this embodiment, after the final article abstract keywords and the corresponding weights are obtained, corresponding sentences are matched for each abstract keyword in sequence from large to small according to the weights, and then the sentences are combined to form the abstract.
Example 11
The system for generating the article abstract of the present embodiment is further improved on the basis of the embodiment 10, as shown in fig. 17 to 18, the system for generating the article abstract further includes a scoring module 6, and the sentence matching module 3 further includes a keyword selecting unit 34;
the scoring module 6 is configured to score each candidate sentence to obtain a score of each candidate sentence;
the keyword selection unit 34 is configured to select a first abstract keyword in a descending order of weight;
the matching sentence selecting unit 33 is configured to extract a first category of sentences including the first abstract keyword from the candidate sentences;
the sorting unit 32 is configured to sort the sentences in the first category of sentences in an ascending order according to the score;
the matching sentence selecting unit 33 is configured to select a first sentence in the first category of sentences as a first matching sentence matched with the first abstract keyword according to an ascending score order, and then call the keyword selecting unit 34 to perform an action of selecting a next abstract keyword according to a descending weight order; the matching statement comprises the first matching statement;
the matching sentence selecting unit 33 is further configured to extract a second category of sentences including the next abstract keyword from the candidate sentences;
the sorting unit 32 is further configured to sort the sentences in the second category of sentences in an ascending order according to the score;
the matching sentence selecting unit 33 is further configured to select, according to the ascending order of scores, a first sentence in the second category of sentences as a second matching sentence matched with the next abstract keyword, and then call the keyword selecting unit 34 to perform an action of selecting the next abstract keyword; the matching statement comprises the second matching statement.
In this embodiment, as shown in fig. 19, the scoring module 6 includes a score calculating unit 61 and a weight giving unit 62;
the score calculation unit 61 is configured to calculate a lexical structure, a syntactic structure, and an emotion value of the candidate sentence respectively based on an NLP algorithm to obtain a first score, a second score, and a third score, and is further configured to calculate a language confusion degree of the candidate sentence based on a PPL algorithm to obtain a fourth score;
the weight assigning unit 62 is configured to assign corresponding weights to the first score, the second score, the third score and the fourth score, respectively;
the scoring module 6 is configured to perform weighted summation on the first score, the second score, the third score, and the fourth score of each candidate sentence to obtain a score of each candidate sentence.
In this embodiment, the score of the candidate sentence is obtained on the basis of scoring the candidate sentence, and when matching sentences for the abstract keyword, the corresponding matching sentence is selected depending on the score.
Example 12
The system for generating an article abstract in this embodiment is further improved on the basis of embodiment 11, as shown in fig. 20, the sentence matching module 3 further includes a similarity calculation unit 35, a rejection unit 36, and an update unit 37;
the similarity calculation unit 35 is configured to sequentially select sentences in the second category of sentences according to an ascending order of scores and perform similarity calculation with the first matching sentence until the similarity is smaller than a preset similarity, and then to call the rejection unit 36 and the matching sentence selecting unit 33;
the rejection unit 36 is configured to eliminate sentences with similarity not less than the preset similarity in the second category of sentences;
the updating unit 37 is configured to update a statement in the second category of statements, whose similarity with the first matching statement is smaller than a preset similarity, to a first-ranked statement in the second category of statements;
the matching statement selecting unit 33 is configured to use the first-ranked statement in the updated second-class statements as the second matching statement matched with the next abstract keyword.
In this embodiment, when the second abstract keyword is matched, the highest-scoring sentence is first selected from the sentences matching that keyword, and its similarity to the sentence already matched to the previous keyword is calculated. If the similarity is not less than the preset similarity, that sentence is removed and the next-highest-scoring sentence takes its place; the similarity calculation against the sentence matched to the previous keyword is then repeated until a sentence whose similarity is smaller than the preset similarity is found. That sentence is used as the matching sentence of the abstract keyword currently being matched.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (24)

1. An article abstract generating method is characterized by comprising the following steps:
identifying the text-describing picture of the target object to obtain a plurality of text-describing sentences;
extracting a plurality of text description keywords of the target object from the text description sentence;
calculating to obtain a TF-IDF value of each text description keyword based on a TF-IDF algorithm;
extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF value; n is a natural number;
selecting matched sentences from the text description sentences according to the objective selling point keywords;
and generating the abstract of the target object according to the matching statement.
2. The method for generating an abstract of an article according to claim 1, wherein the step of extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF value comprises:
and sequentially extracting, in descending order of TF-IDF value, the N top-ranked text description keywords as the objective selling point keywords.
3. The method for generating an article abstract according to claim 1, wherein, before the step of selecting matching sentences from the text description sentences according to the objective selling point keywords, the method further comprises:
obtaining a plurality of comment sentences from the comment data of the target object;
extracting a plurality of comment keywords from the comment sentence;
calculating the frequency of each comment keyword;
extracting M subjective selling point keywords from the plurality of comment keywords according to the frequency, M being a natural number;
generating abstract keywords of the target object, wherein the abstract keywords comprise the subjective selling point keywords and the objective selling point keywords;
the step of selecting matched sentences from the text description sentences according to the objective selling point keywords specifically comprises the following steps:
generating a candidate sentence, wherein the candidate sentence comprises the text description sentence and the comment sentence;
and selecting the matching sentences from the candidate sentences according to the abstract keywords.
4. The method for generating an abstract of an item according to claim 3, wherein the step of extracting M subjective selling point keywords from the plurality of comment keywords according to the frequency specifically comprises:
extracting, in descending order of frequency, the M top-ranked comment keywords as the subjective selling point keywords.
5. The method for generating an abstract of an article according to claim 3, wherein the step of generating the abstract keywords of the target object specifically comprises:
de-duplicating all subjective selling point keywords and all objective selling point keywords to generate the abstract keywords.
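As a rough sketch of claims 3 to 5, the comment keywords can be counted, the M most frequent kept, and the result merged with the objective keywords without duplicates. The function names and the order-preserving merge are assumptions of this illustration, not details stated in the claims.

```python
from collections import Counter


def top_m_by_frequency(comment_keywords, m):
    """Return the m most frequent comment keywords as (keyword, count) pairs."""
    counts = Counter(comment_keywords)
    # Descending frequency; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:m]


def merge_keywords(objective, subjective):
    """De-duplicate the two keyword lists, keeping first-seen order."""
    return list(dict.fromkeys(list(objective) + list(subjective)))


subjective = top_m_by_frequency(
    ["light", "fast", "light", "cheap", "fast", "light"], 2
)
abstract_keywords = merge_keywords(
    ["battery", "light"], [word for word, _ in subjective]
)
```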
6. The method for generating a summary of an item according to claim 3, wherein, before the step of selecting the matching sentence from the candidate sentences according to the abstract keywords, the method further comprises:
normalizing the TF-IDF values of all objective selling point keywords, and taking the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
normalizing the frequency of all the subjective selling point keywords, and taking the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
generating the weight of each abstract keyword: if the abstract keyword is an objective selling point keyword only, its weight is the first weight; if the abstract keyword is a subjective selling point keyword only, its weight is the second weight; and if the abstract keyword is both an objective selling point keyword and a subjective selling point keyword, its weight is the sum of the first weight and the second weight;
the step of selecting the matching sentence from the candidate sentences according to the abstract keyword specifically comprises:
arranging the abstract keywords in descending order of weight;
selecting matching sentences matched with the abstract keywords from the candidate sentences in sequence according to the descending order of the weight;
the step of generating the abstract of the target item according to the matching sentences specifically includes:
and sequentially selecting the matching sentence of each abstract keyword in descending order of weight to form the abstract, until the word count of the abstract reaches the preset word count.
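The weighting and assembly steps of claim 6 might look like the sketch below. The claim does not say how values are normalized or how length is measured, so sum-to-one normalization, character-based length, and the names `normalize`, `keyword_weights` and `assemble_abstract` are assumptions made here for illustration.

```python
def normalize(values):
    """Scale a {keyword: value} mapping so the values sum to 1 (assumed scheme)."""
    total = sum(values.values())
    if not total:
        return {k: 0.0 for k in values}
    return {k: v / total for k, v in values.items()}


def keyword_weights(objective_tfidf, subjective_freq):
    """Combine first and second weights; keywords in both sets get the sum."""
    first = normalize(objective_tfidf)
    second = normalize(subjective_freq)
    weights = {
        kw: first.get(kw, 0.0) + second.get(kw, 0.0)
        for kw in set(first) | set(second)
    }
    # Descending weight; ties broken alphabetically for determinism.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


def assemble_abstract(matching_sentences, preset_length):
    """Append sentences (already in weight order) until the length is reached."""
    chosen, length = [], 0
    for sentence in matching_sentences:
        if length >= preset_length:
            break
        chosen.append(sentence)
        length += len(sentence)  # length counted in characters (assumption)
    return chosen


ranked = keyword_weights({"battery": 3.0, "light": 1.0}, {"light": 3, "fast": 1})
```

In this example "light" appears in both keyword sets, so its combined weight (0.25 + 0.75) places it ahead of "battery" (0.75) and "fast" (0.25).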
7. The method for generating the abstract of the item according to claim 6, wherein the step of sequentially selecting the matching sentences matched with the abstract keywords from the candidate sentences according to the descending order of the weights specifically comprises:
selecting a first abstract keyword according to the descending order of weight;
extracting a first class of sentences containing the first abstract keyword from the candidate sentences;
scoring each candidate sentence to obtain the score of each candidate sentence;
arranging the sentences in the first class of sentences in ascending order of score;
selecting, according to the ascending order of score, the first-ranked sentence in the first class of sentences as a first matching sentence matched with the first abstract keyword, the matching sentences comprising the first matching sentence;
selecting the next abstract keyword according to the descending order of weight;
extracting a second class of sentences containing the next abstract keyword from the candidate sentences;
arranging the sentences in the second class of sentences in ascending order of score;
selecting, according to the ascending order of score, the first-ranked sentence in the second class of sentences as a second matching sentence matched with the next abstract keyword, and then returning to the step of selecting the next abstract keyword, the matching sentences comprising the second matching sentence.
8. The method for generating the article abstract according to claim 7, wherein the step of selecting the first-ranked sentence in the second class of sentences as the second matching sentence matched with the next abstract keyword in ascending order of score specifically comprises:
selecting sentences in the second class of sentences one by one in ascending order of score and calculating the similarity of each with the first matching sentence, until a similarity smaller than a preset similarity is found;
eliminating from the second class of sentences the sentences whose similarity is not less than the preset similarity;
taking the first sentence in the second class of sentences whose similarity with the first matching sentence is less than the preset similarity as the updated first-ranked sentence in the second class of sentences;
and taking the first-ranked sentence in the updated second class of sentences as the second matching sentence matched with the next abstract keyword, wherein the matching sentences comprise the first matching sentence and the second matching sentence.
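The selection loop of claims 7 and 8 can be illustrated with the following minimal sketch. It follows the claim wording literally, ranking each class of sentences in ascending order of score and taking the first one. Several details are assumptions of the sketch: Jaccard overlap of whitespace tokens stands in for the unspecified similarity measure, substring matching decides whether a sentence contains a keyword, and each new sentence is compared against every sentence already selected rather than only the first matching sentence.

```python
def jaccard(a, b):
    """Token-set overlap between two sentences (stand-in similarity measure)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_matching_sentences(keywords, candidates, scores, preset_similarity):
    """Pick at most one sentence per keyword, skipping near-duplicates.

    keywords:   abstract keywords, already in descending order of weight.
    candidates: candidate sentences (description and comment sentences).
    scores:     {sentence: score} mapping, ranked ascending as in the claims.
    """
    selected = []
    for kw in keywords:
        # Class of sentences containing this keyword, ascending by score.
        group = sorted((s for s in candidates if kw in s), key=lambda s: scores[s])
        for sentence in group:
            # Keep the first sentence that is dissimilar to all selected ones.
            if all(jaccard(sentence, prev) < preset_similarity for prev in selected):
                selected.append(sentence)
                break
    return selected


candidates = [
    "long battery life",
    "battery life is long",
    "fast charging battery",
    "bright screen",
]
scores = {
    "long battery life": 1.0,
    "battery life is long": 2.0,
    "fast charging battery": 3.0,
    "bright screen": 1.5,
}
picked = select_matching_sentences(["battery", "life", "screen"], candidates, scores, 0.5)
```

Here the keyword "life" contributes nothing: both sentences containing it overlap too strongly with the sentence already chosen for "battery", so they are eliminated.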
9. The method for generating an abstract of an item according to claim 7, wherein the step of scoring each candidate sentence to obtain the score of each candidate sentence specifically comprises:
respectively calculating the lexical, syntactic and emotional values of the candidate sentences based on an NLP algorithm to obtain a first score, a second score and a third score;
calculating the language perplexity (language confusion degree) of the candidate sentences based on a PPL algorithm to obtain a fourth score;
respectively giving corresponding weights to the first score, the second score, the third score and the fourth score;
and weighting and summing the first score, the second score, the third score and the fourth score of each candidate sentence to obtain the score of each candidate sentence.
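The weighted sum of claim 9 reduces to a short function. The four component scores would come from lexical, syntactic, emotional and perplexity models that the claim does not specify, so this sketch simply takes them as inputs; the function name and the example weights are hypothetical.

```python
def sentence_score(component_scores, component_weights):
    """Weighted sum of the first to fourth scores of one candidate sentence."""
    if len(component_scores) != len(component_weights):
        raise ValueError("each score needs exactly one weight")
    return sum(s * w for s, w in zip(component_scores, component_weights))


# Hypothetical values: (lexical, syntactic, emotional, perplexity) scores.
example = sentence_score((0.5, 1.0, 0.0, 2.0), (0.2, 0.2, 0.2, 0.4))
```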
10. The method for generating the abstract of the object according to claim 1, wherein the step of identifying the text-describing picture of the object to obtain a plurality of text-describing sentences specifically comprises:
identifying the text-describing picture based on OCR to obtain a plurality of single-line sentences on the text-describing picture;
calculating whether the pixel height difference between any two adjacent single-line sentences is within a preset range, and if so, calculating the language confusion degree between the two adjacent single-line sentences based on a PPL algorithm;
judging whether the language confusion degree is smaller than a preset threshold value, and if so, confirming that the two adjacent single-line sentences belong to the same sentence;
and combining all the single-line sentences belonging to the same sentence to generate the text description sentence.
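The line-merging logic of claim 10 can be sketched as below. The perplexity model is passed in as a callable because the claim does not specify one. The `(text, pixel_height)` input shape, reading the preset range as an upper bound on the absolute height difference, and joining merged lines without a separator are all assumptions of this illustration.

```python
def merge_single_lines(lines, perplexity, max_height_diff, max_perplexity):
    """Merge adjacent OCR lines that appear to belong to the same sentence.

    lines:      list of (text, pixel_height) tuples in reading order.
    perplexity: callable (prev_text, text) -> float, a hypothetical
                language-model score for joining the two lines.
    """
    if not lines:
        return []
    merged = [lines[0][0]]
    for (prev_text, prev_h), (text, h) in zip(lines, lines[1:]):
        # Similar glyph height first; only then consult the language model.
        same_sentence = (
            abs(h - prev_h) <= max_height_diff
            and perplexity(prev_text, text) < max_perplexity
        )
        if same_sentence:
            merged[-1] += text
        else:
            merged.append(text)
    return merged


def toy_perplexity(prev_text, text):
    """Toy stand-in: a trailing comma suggests the sentence continues."""
    return 1.0 if prev_text.endswith(",") else 100.0


ocr_lines = [("Light and thin,", 30), ("easy to carry", 31), ("BIG SALE", 60)]
sentences = merge_single_lines(ocr_lines, toy_perplexity, 2, 50.0)
```

Checking the height difference before the perplexity keeps the more expensive language-model call off line pairs that differ visibly in font size.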
11. The method for generating an article abstract according to claim 1, wherein after the step of identifying the text-describing picture of the target article to obtain a plurality of text-describing sentences, the method for generating an article abstract further comprises:
filtering out text description sentences with errors in character recognition based on a preset dirty word bank;
and in the step of extracting a plurality of text description keywords of the target object from the text description sentences, extracting the plurality of text description keywords from the filtered text description sentences.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of generating a summary of an item as claimed in any one of claims 1 to 11 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the item digest generation method according to any one of claims 1 to 11.
14. An article abstract generating system is characterized by comprising a text-description sentence recognition module, a text-description keyword extraction module, a TF-IDF value calculation module, an objective selling point keyword extraction module, a sentence matching module and an abstract generating module;
the text-describing sentence recognition module is used for recognizing a text-describing picture of a target article to obtain a plurality of text-describing sentences;
the text description keyword extraction module is used for extracting a plurality of text description keywords of the target article from the text description sentences;
the TF-IDF value calculation module is used for calculating to obtain a TF-IDF value of each text description keyword based on a TF-IDF algorithm;
the objective selling point keyword extraction module is used for extracting N objective selling point keywords from the plurality of text description keywords according to the TF-IDF values, N being a natural number;
the sentence matching module is used for selecting matching sentences from the text description sentences according to the objective selling point keywords;
the abstract generating module is used for generating an abstract of the target object according to the matching sentences.
15. The article summary generation system of claim 14, wherein the objective selling point keyword extraction module is configured to extract, in descending order of TF-IDF value, the N top-ranked text description keywords as the objective selling point keywords.
16. The system for generating an abstract of an article according to claim 14, wherein the system for generating an abstract of an article further comprises a comment sentence acquisition module, a comment keyword extraction module, a frequency calculation module, a subjective selling point keyword extraction module and an abstract keyword generation module, and the sentence matching module comprises a candidate sentence generation unit;
the comment sentence acquisition module is used for acquiring a plurality of comment sentences from comment data of the target object;
the comment keyword extraction module is used for extracting a plurality of comment keywords from the comment sentences;
the frequency calculation module is used for calculating the frequency of each comment keyword;
the subjective selling point keyword extraction module is used for extracting M subjective selling point keywords from the comment keywords according to the frequency; m is a natural number;
the abstract keyword generation module is used for generating abstract keywords of the target object, wherein the abstract keywords comprise the subjective selling point keywords and the objective selling point keywords;
the candidate sentence generation unit is used for generating candidate sentences, and the candidate sentences comprise the text description sentences and the comment sentences;
the sentence matching module is used for selecting the matching sentences from the candidate sentences according to the abstract keywords.
17. The system for summarizing an item according to claim 16, wherein said subjective selling point keyword extraction module is configured to extract, in descending order of frequency, the M top-ranked comment keywords as said subjective selling point keywords.
18. The system for summarizing an item according to claim 16, wherein said abstract keyword generation module is configured to generate said abstract keywords by de-duplicating all subjective selling point keywords and all objective selling point keywords.
19. The system for generating an article abstract according to claim 16, wherein the system for generating an article abstract further comprises a weight calculation module, the weight calculation module comprises a normalization unit and a weight generation unit, the sentence matching module comprises a sorting unit and a matching sentence selection unit;
the normalization unit is used for normalizing TF-IDF values of all objective selling point keywords;
the weight generation unit is used for taking the normalized TF-IDF value of each objective selling point keyword as a first weight of the objective selling point keyword;
the normalization unit is also used for normalizing the frequency of all the subjective selling point keywords;
the weight generation unit is also used for taking the normalized frequency of each subjective selling point keyword as a second weight of the subjective selling point keyword;
the weight generation unit is also used for generating the weight of each abstract keyword: if the abstract keyword is an objective selling point keyword only, its weight is the first weight; if the abstract keyword is a subjective selling point keyword only, its weight is the second weight; and if the abstract keyword is both an objective selling point keyword and a subjective selling point keyword, its weight is the sum of the first weight and the second weight;
the sorting unit is used for sorting the abstract keywords in descending order of weight;
the matching sentence selection unit is used for sequentially selecting matching sentences matched with the abstract keywords from the candidate sentences in descending order of weight;
the abstract generating module is used for sequentially selecting the matching sentence of each abstract keyword in descending order of weight to form the abstract, until the word count of the abstract reaches the preset word count.
20. The system for generating a summary of an item according to claim 19, wherein the system for generating a summary of an item further comprises a scoring module, the sentence matching module further comprises a keyword selection unit;
the scoring module is used for scoring each candidate sentence to obtain the score of each candidate sentence;
the keyword selection unit is used for selecting a first abstract keyword according to the descending order of weight;
the matching sentence selection unit is used for extracting a first class of sentences containing the first abstract keyword from the candidate sentences;
the sorting unit is used for sorting the sentences in the first class of sentences in ascending order of score;
the matching sentence selection unit is used for selecting, according to the ascending order of score, the first-ranked sentence in the first class of sentences as a first matching sentence matched with the first abstract keyword, and then calling the keyword selection unit to select the next abstract keyword according to the descending order of weight, the matching sentences comprising the first matching sentence;
the matching sentence selection unit is also used for extracting a second class of sentences containing the next abstract keyword from the candidate sentences;
the sorting unit is further used for sorting the sentences in the second class of sentences in ascending order of score;
the matching sentence selection unit is also used for selecting, according to the ascending order of score, the first-ranked sentence in the second class of sentences as a second matching sentence matched with the next abstract keyword, and then calling the keyword selection unit to select the next abstract keyword, the matching sentences comprising the second matching sentence.
21. The item summary generation system of claim 20, wherein the sentence matching module further comprises a similarity calculation unit, a culling unit and an updating unit;
the similarity calculation unit is used for selecting sentences in the second class of sentences one by one in ascending order of score and calculating the similarity of each with the first matching sentence until a similarity smaller than a preset similarity is found, and then calling the culling unit;
the culling unit is used for eliminating from the second class of sentences the sentences whose similarity is not less than the preset similarity;
the updating unit is used for taking the first sentence in the second class of sentences whose similarity with the first matching sentence is less than the preset similarity as the updated first-ranked sentence in the second class of sentences;
and the matching sentence selection unit is used for taking the first-ranked sentence in the updated second class of sentences as the second matching sentence matched with the next abstract keyword.
22. The system for generating a summary of an item according to claim 20, wherein the scoring module includes a score calculating unit and a weight giving unit;
the score calculation unit is used for calculating the lexical, syntactic and emotional values of the candidate sentences respectively based on an NLP algorithm to obtain a first score, a second score and a third score, and is also used for calculating the language confusion degree of the candidate sentences based on a PPL algorithm to obtain a fourth score;
the weight giving unit is used for giving corresponding weights to the first score, the second score, the third score and the fourth score respectively;
the scoring module is configured to sum the first score, the second score, the third score, and the fourth score of each candidate sentence in a weighted manner to obtain a score of each candidate sentence.
23. The article abstract generation system of claim 14, wherein the text-description sentence recognition module comprises a single-line sentence recognition unit, a first calculating unit, a second calculating unit, a first judging unit, a second judging unit, a sentence confirming unit and a text description sentence generating unit;
the single-line sentence recognition unit is used for recognizing the text-describing picture based on OCR to obtain a plurality of single-line sentences on the text-describing picture;
the first calculating unit is used for calculating the pixel height difference between any two adjacent single-line sentences;
the first judging unit is used for judging whether the pixel height difference is within a preset range, and if so, calling the second calculating unit;
the second calculating unit is used for calculating the language confusion degree between the two adjacent single-line sentences based on a PPL algorithm;
the second judging unit is used for judging whether the language confusion degree is smaller than a preset threshold value, and if so, calling the sentence confirming unit;
the sentence confirming unit is used for confirming that the two adjacent single-line sentences belong to the same sentence;
the text description sentence generating unit is used for combining all the single-line sentences belonging to the same sentence to generate the text description sentence.
24. The system for generating a summary of an item of claim 14, wherein the system for generating a summary of an item further comprises a filtering module;
the filtering module is used for filtering text description sentences with wrong character recognition based on a preset dirty word bank;
the text description keyword extraction module is used for extracting the plurality of text description keywords from the filtered text description sentences.
CN201810603797.2A 2018-06-12 2018-06-12 Article abstract generation method, system, electronic equipment and readable storage medium Active CN110597978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810603797.2A CN110597978B (en) 2018-06-12 2018-06-12 Article abstract generation method, system, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110597978A true CN110597978A (en) 2019-12-20
CN110597978B CN110597978B (en) 2023-12-08

Family

ID=68848918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810603797.2A Active CN110597978B (en) 2018-06-12 2018-06-12 Article abstract generation method, system, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110597978B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100169317A1 (en) * 2008-12-31 2010-07-01 Microsoft Corporation Product or Service Review Summarization Using Attributes
CN101667194A (en) * 2009-09-29 2010-03-10 北京大学 Automatic abstracting method and system based on user comment text feature
CN106294425A (en) * 2015-05-26 2017-01-04 富泰华工业(深圳)有限公司 The automatic image-text method of abstracting of commodity network of relation article and system
CN105824915A (en) * 2016-03-16 2016-08-03 上海珍岛信息技术有限公司 Method and system for generating commenting digest of online shopped product
CN106599148A (en) * 2016-12-02 2017-04-26 东软集团股份有限公司 Method and device for generating abstract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张红斌 等: "基于梯度核特征及N-gram模型的商品图像句子标注", 《计算机科学》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178953A (en) * 2019-12-20 2020-05-19 贝壳技术有限公司 Information generation method and device, electronic equipment and storage medium
CN111178953B (en) * 2019-12-20 2023-10-31 贝壳技术有限公司 Information generation method and device, electronic equipment and storage medium
CN111192082A (en) * 2019-12-26 2020-05-22 广东美的白色家电技术创新中心有限公司 Product selling point analysis method, terminal equipment and computer readable storage medium
CN111192111A (en) * 2019-12-26 2020-05-22 广东美的白色家电技术创新中心有限公司 Product sales data analysis method and terminal equipment
CN111192082B (en) * 2019-12-26 2024-03-26 广东美的白色家电技术创新中心有限公司 Product selling point analysis method, terminal equipment and computer readable storage medium
CN111738791A (en) * 2020-01-20 2020-10-02 北京沃东天骏信息技术有限公司 Text processing method, device, equipment and storage medium
CN111738791B (en) * 2020-01-20 2024-05-24 北京沃东天骏信息技术有限公司 Text processing method, device, equipment and storage medium
CN112148988A (en) * 2020-10-16 2020-12-29 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating information
CN112148988B (en) * 2020-10-16 2023-07-28 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for generating information
CN112288548A (en) * 2020-11-13 2021-01-29 北京沃东天骏信息技术有限公司 Method, device, medium and electronic equipment for extracting key information of target object

Also Published As

Publication number Publication date
CN110597978B (en) 2023-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant