WO2021042529A1 - Method for automatically generating an article abstract, device, and computer-readable storage medium - Google Patents

Method for automatically generating an article abstract, device, and computer-readable storage medium

Info

Publication number
WO2021042529A1
Authority
WO
WIPO (PCT)
Prior art keywords
article
data set
abstract
word
probability model
Prior art date
Application number
PCT/CN2019/117289
Other languages
English (en)
Chinese (zh)
Inventor
刘媛源
汪伟
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021042529A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, a device, and a computer-readable storage medium for performing deep learning on an original article data set to generate an article abstract.
  • Existing abstract extraction methods are mainly extractive: sentences are scored and ranked, and the highest-scoring sentences are selected. Sentence scoring is error-prone, and the generated abstracts lack connectives, so the abstract sentences read poorly and lack flexibility.
  • This application provides an article abstract automatic generation method, device, and computer-readable storage medium, the main purpose of which is to perform deep learning on the original article data set to obtain an article abstract.
  • To achieve the above purpose, an article abstract automatic generation method provided by this application includes: receiving an original article data set and an original abstract data set, and preprocessing both by word cutting and stop word removal to obtain a primary article data set and a primary abstract data set; performing word vectorization and word vector encoding on the primary article data set and the primary abstract data set to obtain a training set and a label set, respectively; inputting the training set and the label set into a pre-built abstract automatic generation model for training to obtain a training value, where the model exits training if the training value is less than a preset threshold; and receiving an article input by the user, applying the same preprocessing, word vectorization, and word vector encoding to it, and feeding it into the abstract automatic generation model to generate and output an abstract.
  • In addition, the present application also provides an article abstract automatic generation device, which includes a memory and a processor; the memory stores an article abstract automatic generation program that can be run on the processor.
  • When the article abstract automatic generation program is executed by the processor, the following steps are implemented: receiving the original article data set and the original abstract data set, and preprocessing both by word cutting and stop word removal to obtain a primary article data set and a primary abstract data set; performing word vectorization and word vector encoding on the primary article data set and the primary abstract data set to obtain a training set and a label set, respectively; inputting the training set and the label set into the pre-built abstract automatic generation model for training to obtain a training value, where the model exits training if the training value is less than a preset threshold; and receiving the article input by the user, applying the above preprocessing, word vectorization, and word vector encoding to it, and feeding it into the abstract automatic generation model to generate and output an abstract.
  • In addition, the present application also provides a computer-readable storage medium storing an article abstract automatic generation program, which can be executed by one or more processors to implement the steps of the article abstract automatic generation method described above.
  • This application preprocesses the original article data set and the original abstract data set by word segmentation and stop word removal, which effectively extracts the words likely to belong to the article abstract. Further, word vectorization and word vector encoding make the data efficient for a computer to analyze without losing feature accuracy, and training the pre-built abstract automatic generation model then yields the abstract of the current article. The method, device, and computer-readable storage medium proposed in this application can therefore produce accurate, efficient, and coherent article abstracts.
  • FIG. 1 is a schematic flowchart of a method for automatically generating article abstracts according to an embodiment of the application
  • FIG. 2 is a schematic diagram of the internal structure of an article abstract automatic generation device provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of modules of an article abstract automatic generation program in an article abstract automatic generation device provided by an embodiment of the application.
  • This application provides a method for automatically generating article abstracts.
  • Referring to FIG. 1, it is a schematic flowchart of a method for automatically generating article abstracts according to an embodiment of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the method for automatically generating article abstracts includes:
  • the original article data set includes investment research reports, academic papers, government planning summaries, etc.
  • the original article data set does not include an abstract part
  • The original abstract data set consists of the abstracts corresponding to the articles in the original article data set. For example, if investment research report A discusses, in several thousand or even tens of thousands of words, the future investment directions the company could pursue around the Internet education industry, then the corresponding entry in the original abstract data set is a condensed summary of report A, generally a few hundred or even only a few dozen words.
  • The word segmentation splits each sentence in the original article data set and the original abstract data set into individual words. In Chinese text there is no explicit separator between words, so this segmentation step is necessary.
  • The word segmentation described in this application uses the jieba ("stutter") word segmentation library, which is available for programming languages such as Python and JAVA; a code sketch follows the example below.
  • The jieba library is built on Chinese part-of-speech features: it converts the number of occurrences of each word in the original article data set and the original abstract data set into frequencies, then uses dynamic programming to find the maximum-probability path, i.e., the maximum segmentation combination based on word frequency.
  • For example, suppose the text of investment research report A in the original article data set reads: "In the commodity economy environment, companies must formulate qualified sales models based on market conditions, strive to expand market share, stabilize sales prices, and improve product competitiveness. Therefore, in the feasibility analysis, the marketing model must be studied."
  • After processing by the jieba library, the same text is returned with a space inserted between adjacent words; the spaces represent the segmentation result.
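  • As a minimal code sketch of this step, assuming the open-source jieba library referred to above and a sample sentence standing in for one sentence of report A:

```python
# pip install jieba
import jieba

sentence = "在商品经济环境下，企业必须根据市场情况制定合格的销售模式"
# jieba builds a graph of candidate words from word frequencies and uses
# dynamic programming to pick the maximum-probability segmentation path.
print(" ".join(jieba.cut(sentence)))
```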
  • Stop words are words that carry no practical meaning in the original article data set and the original abstract data set: they have no effect on the classification of the text yet occur with high frequency, and include common pronouns, prepositions, and the like. Studies have shown that stop words degrade the effect of text classification, so removing them is one of the most critical steps in text preprocessing.
  • The method selected here for removing stop words is stop word list filtering: each word in the text data is matched one by one against a pre-built stop word list, and a successful match marks the word as a stop word to be deleted, as sketched after the example below.
  • the result is: the commodity economy environment, the enterprise formulates a qualified sales model according to the market situation, strives to expand the market share, stabilize the sales price, and improve the competitiveness of the product. Therefore, feasibility analysis, marketing model research.
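  • A minimal sketch of the stop word list filtering described above; the file name stopwords.txt and the helper name are assumptions for illustration:

```python
def remove_stop_words(words, stop_word_path="stopwords.txt"):
    # Match each segmented word one by one against the pre-built stop
    # word list; a successful match marks it as a stop word to delete.
    with open(stop_word_path, encoding="utf-8") as f:
        stop_words = {line.strip() for line in f if line.strip()}
    return [w for w in words if w not in stop_words]
```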
  • The word vectorization represents each word in the primary article data set and the primary abstract data set as an N-dimensional matrix vector, where N is the total number of words contained in the primary article data set or the primary abstract data set.
  • In detail, the following formula is used to initially vectorize the words:

  $$v_i = (v_1, v_2, \ldots, v_N), \qquad v_j = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$$

  where $i$ represents the number of the word (there are $s$ words in total), $v_i$ represents the N-dimensional matrix vector of word $i$, and $v_j$ is the $j$-th element of the N-dimensional matrix vector.
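  • A minimal sketch of this initial vectorization in its one-hot form; the helper name one_hot_vectorize is an assumption for illustration:

```python
import numpy as np

def one_hot_vectorize(words):
    # Word i receives an N-dimensional vector whose i-th element is 1
    # and whose other elements are 0, N being the vocabulary size.
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    return {w: np.eye(len(vocab))[index[w]] for w in vocab}

vectors = one_hot_vectorize(["market", "share", "market", "price"])
print(vectors["market"])  # [1. 0. 0.] for this 3-word vocabulary
```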
  • The word vector encoding shortens the generated N-dimensional matrix vectors into lower-dimensional data that is easier to compute with for the subsequent training of the automatic generation model; that is, the primary article data set is finally converted into the training set, and the primary abstract data set is finally converted into the label set.
  • In detail, the word vector encoding first establishes a forward probability model and a backward probability model, then optimizes both models to obtain an optimal solution, which is the encoding result.
  • The training set and the label set are obtained by the same encoding procedure.
  • The forward probability model and the backward probability model are respectively:

  $$\max \sum_{i=1}^{s} \log p(v_i \mid v_1, v_2, \ldots, v_{i-1})$$

  $$\max \sum_{i=1}^{s} \log p(v_i \mid v_{i+1}, v_{i+2}, \ldots, v_s)$$

  where $\max$ represents the optimization, $v_i$ represents the N-dimensional matrix vector of word $i$, and the primary article data set and the primary abstract data set contain $s$ words in total.
  • Solving the two models reduces the N-dimensional matrix vectors to a smaller dimension, completing the word vector encoding process and yielding the training set and the label set.
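  • A minimal numerical sketch of the two objectives; the mean-of-history encoder and the random initialization are simplifying assumptions (the excerpt does not fix the internal form of the probability models), and the rows of E play the role of the shortened word vectors:

```python
import numpy as np

def forward_objective(token_ids, E, W):
    # sum over i of log p(v_i | v_1..v_{i-1}); the history is summarized
    # here as the mean of the previous dense vectors (an assumption).
    total = 0.0
    for i in range(1, len(token_ids)):
        h = E[token_ids[:i]].mean(axis=0)   # encode the history
        logits = h @ W - (h @ W).max()      # shift for numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        total += np.log(probs[token_ids[i]])
    return total

def backward_objective(token_ids, E, W):
    # The backward model is symmetric: condition on v_{i+1}..v_s instead.
    return forward_objective(token_ids[::-1], E, W)

rng = np.random.default_rng(0)
N, d = 500, 32                      # vocabulary size, reduced dimension
E = 0.1 * rng.normal(size=(N, d))   # N-dim one-hot -> d-dim dense lookup
W = 0.1 * rng.normal(size=(d, N))
tokens = rng.integers(0, N, size=20)
print(forward_objective(tokens, E, W) + backward_objective(tokens, E, W))
```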
  • In detail, the automatic abstract generation model includes a language prediction model, which predicts $x_{l+1}$ by calculating the prediction probability from the given words $x_1, \ldots, x_l$.
  • The predicted probability is:

  $$p(x_{l+1} \mid x_1, x_2, \ldots, x_l)$$
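  • A small sketch of how the prediction probability is used, assuming a predict_proba(history, word) interface for the trained language prediction model (the interface is illustrative, not fixed by the excerpt):

```python
import math

def sequence_log_prob(predict_proba, tokens):
    # Chain rule: log p(x_1..x_L) = sum over l of log p(x_l | x_1..x_{l-1}).
    return sum(math.log(predict_proba(tokens[:l], tokens[l]))
               for l in range(len(tokens)))

def predict_next(predict_proba, history, vocab):
    # x_{l+1} is the candidate word with the highest prediction probability.
    return max(vocab, key=lambda w: predict_proba(history, w))
```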
  • the automatic abstract generation model further includes an input layer, a hidden layer and an output layer.
  • the input layer has n input units
  • the output layer has m output units, corresponding to m feature selection results
  • the number of units in the hidden layer is q.
  • $Z$ represents the weight matrix from the hidden layer to the output layer.
  • The output $O_q$ of the hidden layer is:

  $$O_q = f\Big(\sum_{i=1}^{n} w_{iq}\, x_i + b_q\Big)$$

  where $w_{iq}$ is the weight from input unit $i$ to hidden unit $q$, $b_q$ is the bias of hidden unit $q$, and $f$ is the activation function.
  • The output value $y_j$ of the $j$-th unit of the output layer is:

  $$y_j = f\Big(\sum_{k=1}^{q} Z_{kj}\, O_k + b_j\Big)$$
  • s is the number of features in the tag set.
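  • A minimal sketch of one pass through this n-q-m network; the sigmoid activation is an assumed choice, since the excerpt names the layer sizes and the hidden-to-output weights Z but not the activation function:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, W, b_h, Z, b_o):
    # x: n inputs; W: n x q input-to-hidden weights;
    # Z: q x m hidden-to-output weights, as named in the text.
    O = sigmoid(x @ W + b_h)   # hidden-layer outputs O_q
    y = sigmoid(O @ Z + b_o)   # output values y_j of the m output units
    return y

n, q, m = 8, 4, 3
rng = np.random.default_rng(1)
y = forward_pass(rng.normal(size=n),
                 0.1 * rng.normal(size=(n, q)), np.zeros(q),
                 0.1 * rng.normal(size=(q, m)), np.zeros(m))
print(y)   # m feature selection results
```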
  • For example, when a user inputs an academic paper, the paper is preprocessed, word vectorized, and word vector encoded, then input into the automatic abstract generation model, which generates and outputs the abstract, i.e., the summary of the academic paper.
  • the invention also provides a device for automatically generating article abstracts.
  • Referring to FIG. 2, it is a schematic diagram of the internal structure of a device for automatically generating article abstracts according to an embodiment of the present application.
  • The article abstract automatic generation device 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet computer, or a portable computer, or a server.
  • the article abstract automatic generation device 1 at least includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 11 may be an internal storage unit of the article abstract automatic generating device 1, for example, the hard disk of the article abstract automatic generating device 1.
  • The memory 11 may also be an external storage device of the article abstract automatic generation device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the device 1.
  • the memory 11 may also include both an internal storage unit of the article abstract automatic generation device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the article abstract automatic generating device 1, such as the code of the article abstract automatic generating program 01, etc., but also to temporarily store data that has been output or will be output.
  • The processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the article abstract automatic generation program 01.
  • the communication bus 13 is used to realize the connection and communication between these components.
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the apparatus 1 and other electronic devices.
  • the device 1 may also include a user interface.
  • the user interface may include a display (Display) and an input unit such as a keyboard (Keyboard).
  • the optional user interface may also include a standard wired interface and a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the article summary automatic generating device 1 and to display a visualized user interface.
  • Figure 2 only shows the article abstract automatic generation device 1 with components 11-14 and the article abstract automatic generation program 01. Those skilled in the art can understand that the structure shown in Figure 2 does not constitute a limitation on the device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
  • the article abstract automatic generation program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the article abstract automatic generation program 01 stored in the memory 11:
  • Step 1: Receive the original article data set and the original abstract data set, and preprocess each of them by word cutting and stop word removal to obtain the primary article data set and the primary abstract data set.
  • the original article data set includes investment research reports, academic papers, government planning summaries, etc.
  • the original article data set does not include an abstract part
  • The original abstract data set consists of the abstracts corresponding to the articles in the original article data set. For example, if investment research report A discusses, in several thousand or even tens of thousands of words, the future investment directions the company could pursue around the Internet education industry, then the corresponding entry in the original abstract data set is a condensed summary of report A, generally a few hundred or even only a few dozen words.
  • The word segmentation splits each sentence in the original article data set and the original abstract data set into individual words. In Chinese text there is no explicit separator between words, so this segmentation step is necessary.
  • The word segmentation described in this application uses the jieba ("stutter") word segmentation library, which is available for programming languages such as Python and JAVA.
  • The jieba library is built on Chinese part-of-speech features: it converts the number of occurrences of each word in the original article data set and the original abstract data set into frequencies, then uses dynamic programming to find the maximum-probability path, i.e., the maximum segmentation combination based on word frequency.
  • For example, suppose the text of investment research report A in the original article data set reads: "In the commodity economy environment, companies must formulate qualified sales models based on market conditions, strive to expand market share, stabilize sales prices, and improve product competitiveness. Therefore, in the feasibility analysis, the marketing model must be studied."
  • After processing by the jieba library, the same text is returned with a space inserted between adjacent words; the spaces represent the segmentation result.
  • Stop words are words that carry no practical meaning in the original article data set and the original abstract data set: they have no effect on the classification of the text yet occur with high frequency, and include common pronouns, prepositions, and the like. Studies have shown that stop words degrade the effect of text classification, so removing them is one of the most critical steps in text preprocessing.
  • The method selected here for removing stop words is stop word list filtering: each word in the text data is matched one by one against a pre-built stop word list, and a successful match marks the word as a stop word to be deleted.
  • the result is: the commodity economy environment, the enterprise formulates a qualified sales model according to the market situation, strives to expand the market share, stabilize the sales price, and improve the competitiveness of the product. Therefore, feasibility analysis, marketing model research.
  • Step 2: Perform word vectorization and word vector encoding on the primary article data set and the primary abstract data set to obtain a training set and a label set, respectively.
  • The word vectorization represents each word in the primary article data set and the primary abstract data set as an N-dimensional matrix vector, where N is the total number of words contained in the primary article data set or the primary abstract data set.
  • In detail, the following formula is used to initially vectorize the words:

  $$v_i = (v_1, v_2, \ldots, v_N), \qquad v_j = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$$

  where $i$ represents the number of the word (there are $s$ words in total), $v_i$ represents the N-dimensional matrix vector of word $i$, and $v_j$ is the $j$-th element of the N-dimensional matrix vector.
  • The word vector encoding shortens the generated N-dimensional matrix vectors into lower-dimensional data that is easier to compute with for the subsequent training of the automatic generation model; that is, the primary article data set is finally converted into the training set, and the primary abstract data set is finally converted into the label set.
  • In detail, the word vector encoding first establishes a forward probability model and a backward probability model, then optimizes both models to obtain an optimal solution, which is the encoding result.
  • The training set and the label set are obtained by the same encoding procedure.
  • The forward probability model and the backward probability model are respectively:

  $$\max \sum_{i=1}^{s} \log p(v_i \mid v_1, v_2, \ldots, v_{i-1})$$

  $$\max \sum_{i=1}^{s} \log p(v_i \mid v_{i+1}, v_{i+2}, \ldots, v_s)$$

  where $\max$ represents the optimization, $v_i$ represents the N-dimensional matrix vector of word $i$, and the primary article data set and the primary abstract data set contain $s$ words in total.
  • Solving the two models reduces the N-dimensional matrix vectors to a smaller dimension, completing the word vector encoding process and yielding the training set and the label set.
  • Step 3 Input the training set and the label set into the pre-built automatic abstract generation model for training and obtain the training value. If the training value is less than the preset threshold, the automatic abstract generation model exits the training.
  • In detail, the automatic abstract generation model includes a language prediction model, which predicts $x_{l+1}$ by calculating the prediction probability from the given words $x_1, \ldots, x_l$.
  • The predicted probability is:

  $$p(x_{l+1} \mid x_1, x_2, \ldots, x_l)$$
  • the automatic abstract generation model further includes an input layer, a hidden layer and an output layer.
  • the input layer has n input units
  • the output layer has m output units, corresponding to m feature selection results
  • the number of units in the hidden layer is q.
  • $Z$ represents the weight matrix from the hidden layer to the output layer.
  • The output $O_q$ of the hidden layer is:

  $$O_q = f\Big(\sum_{i=1}^{n} w_{iq}\, x_i + b_q\Big)$$

  where $w_{iq}$ is the weight from input unit $i$ to hidden unit $q$, $b_q$ is the bias of hidden unit $q$, and $f$ is the activation function.
  • The output value $y_j$ of the $j$-th unit of the output layer is:

  $$y_j = f\Big(\sum_{k=1}^{q} Z_{kj}\, O_k + b_j\Big)$$
  • s is the number of features in the tag set.
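  • A minimal sketch of the Step 3 control flow; train_epoch is an assumed stand-in for one training pass of the abstract automatic generation model that returns the training value (its loss):

```python
def train_until_threshold(train_epoch, training_set, label_set,
                          threshold=0.01, max_epochs=1000):
    # Keep training while the training value stays at or above the preset
    # threshold; the model exits training as soon as it drops below it.
    training_value = float("inf")
    for _ in range(max_epochs):
        training_value = train_epoch(training_set, label_set)
        if training_value < threshold:
            break
    return training_value
```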
  • Step 4 Receive the article input by the user, and input the article into the abstract automatic generation model after the above-mentioned preprocessing, word vectorization and word vector encoding to generate an abstract and output it.
  • For example, when a user inputs an academic paper, the paper is preprocessed, word vectorized, and word vector encoded, then input into the automatic abstract generation model, which generates and outputs the abstract, i.e., the summary of the academic paper.
  • In addition, the article abstract automatic generation program can also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, by the processor 12) to complete this application.
  • A module referred to in this application is a series of computer program instruction segments capable of completing a specific function, used to describe the execution process of the article abstract automatic generation program in the article abstract automatic generation device.
  • Referring to FIG. 3, a schematic diagram of the program modules of the article abstract automatic generation program in an embodiment of the article abstract automatic generation device of this application, the article abstract automatic generation program can be divided into a data receiving and processing module 10, a word vector conversion module 20, a model training module 30, and an article abstract output module 40. Illustratively:
  • The data receiving and processing module 10 is used to: receive an original article data set and an original abstract data set, and perform preprocessing including word cutting and stop word removal on both to obtain a primary article data set and a primary abstract data set.
  • the word vector conversion module 20 is used for: performing word vectorization and word vector encoding on the primary article data set and the primary abstract data set to obtain a training set and a label set, respectively.
  • The model training module 30 is configured to: input the training set and the label set into a pre-built abstract automatic generation model for training and obtain a training value; if the training value is less than a preset threshold, the abstract automatic generation model exits training.
  • the article abstract output module 40 is configured to receive an article input by a user, and input the article into the abstract automatic generation model after the above-mentioned preprocessing, word vectorization, and word vector encoding to generate an abstract and output it.
  • an embodiment of the present application also proposes a computer-readable storage medium.
  • The computer-readable storage medium stores an article abstract automatic generation program, which can be executed by one or more processors to achieve the following operations:
  • the original article data set and the original abstract data set are received, and the original article data set and the original abstract data set are preprocessed including word cutting and stop word removal to obtain the primary article data set and the primary abstract data set.
  • Word vectorization and word vector encoding are performed on the primary article data set and the primary abstract data set to obtain a training set and a label set, respectively.
  • the training set and the label set are input into a pre-built summary automatic generation model for training and training values are obtained. If the training value is less than a preset threshold, the summary automatic generation model exits the training.
  • An article input by a user is received, and the article is preprocessed, word vectorized, and word vector encoded, then input to the abstract automatic generation model to generate and output an abstract.
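  • Taken together, the four operations compose into one pipeline. A minimal sketch, assuming jieba for word cutting, the remove_stop_words and one_hot_vectorize helpers sketched earlier, and a trained model object whose encode and generate_abstract methods stand in for the word vector encoding and abstract generation steps (all of these names are assumptions):

```python
import jieba

def summarize(article_text, model):
    words = remove_stop_words(list(jieba.cut(article_text)))  # preprocessing
    vectors = one_hot_vectorize(words)                        # word vectorization
    encoded = [model.encode(vectors[w]) for w in words]       # word vector encoding
    return model.generate_abstract(encoded)                   # generate and output
```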

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for automatically generating an article abstract, comprising: receiving an original article data set and an original abstract data set and performing preprocessing comprising word cutting and stop word removal so as to obtain a primary article data set and a primary abstract data set (S1); performing word vectorization and word vector encoding on the primary article data set and the primary abstract data set to obtain a training set and a label set (S2); inputting the training set and the label set into a pre-built automatic abstract generation model for training so as to obtain a training value, and if the training value is less than a preset threshold, the automatic abstract generation model exits training (S3); receiving an article input by a user and, after the article has undergone preprocessing, word vectorization, and word vector encoding, inputting it into the automatic abstract generation model to generate and output an abstract (S4). The invention also relates to a device for automatically generating an article abstract and a computer-readable storage medium. The method achieves accurate and efficient automatic generation of article abstracts.
PCT/CN2019/117289 2019-09-02 2019-11-12 Method for automatically generating an article abstract, device, and computer-readable storage medium WO2021042529A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910840724.XA CN110717333B (zh) 2019-09-02 2019-09-02 Article abstract automatic generation method, device, and computer-readable storage medium
CN201910840724.X 2019-09-02

Publications (1)

Publication Number Publication Date
WO2021042529A1 true WO2021042529A1 (fr) 2021-03-11

Family

ID=69210312

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117289 WO2021042529A1 (fr) 2019-09-02 2019-11-12 Procédé de génération automatique d'un résumé d'article, dispositif, et support d'informations lisible par ordinateur

Country Status (2)

Country Link
CN (1) CN110717333B (fr)
WO (1) WO2021042529A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428449B (zh) * 2020-03-02 2024-10-25 中国平安人寿保险股份有限公司 Intelligent email editing method, device, and computer-readable storage medium
CN111708878B (zh) * 2020-08-20 2020-11-24 科大讯飞(苏州)科技有限公司 Sports text abstract extraction method and device, storage medium, and equipment
CN112434157B (zh) * 2020-11-05 2024-05-17 平安直通咨询有限公司上海分公司 Multi-label document classification method and device, electronic device, and storage medium
CN112634863B (zh) * 2020-12-09 2024-02-09 深圳市优必选科技股份有限公司 Training method and device for a speech synthesis model, electronic device, and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943783A (zh) * 2017-10-12 2018-04-20 北京知道未来信息技术有限公司 Word segmentation method based on LSTM-CNN
CN108090049A (zh) * 2018-01-17 2018-05-29 山东工商学院 Method and system for automatic multi-document abstract extraction based on sentence vectors
CN108319630A (zh) * 2017-07-05 2018-07-24 腾讯科技(深圳)有限公司 Information processing method and device, storage medium, and computer equipment
CN109766432A (zh) * 2018-07-12 2019-05-17 中国科学院信息工程研究所 Chinese abstract generation method and device based on generative adversarial networks

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4343213B2 (ja) * 2006-12-25 2009-10-14 株式会社東芝 Document processing device and document processing method
CN105930314B (zh) * 2016-04-14 2019-02-05 清华大学 Text abstract generation system and method based on an encoder-decoder deep neural network
CN107908635B (zh) * 2017-09-26 2021-04-16 百度在线网络技术(北京)有限公司 Method and device for building a text classification model and for text classification
CN108304445B (zh) * 2017-12-07 2021-08-03 新华网股份有限公司 Text abstract generation method and device
US10437936B2 (en) * 2018-02-01 2019-10-08 Jungle Disk, L.L.C. Generative text using a personality model
CN109241272B (zh) * 2018-07-25 2021-07-06 华南师范大学 Chinese text abstract generation method, computer-readable storage medium, and computer equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319630A (zh) * 2017-07-05 2018-07-24 腾讯科技(深圳)有限公司 Information processing method and device, storage medium, and computer equipment
CN107943783A (zh) * 2017-10-12 2018-04-20 北京知道未来信息技术有限公司 Word segmentation method based on LSTM-CNN
CN108090049A (zh) * 2018-01-17 2018-05-29 山东工商学院 Method and system for automatic multi-document abstract extraction based on sentence vectors
CN109766432A (zh) * 2018-07-12 2019-05-17 中国科学院信息工程研究所 Chinese abstract generation method and device based on generative adversarial networks

Also Published As

Publication number Publication date
CN110717333A (zh) 2020-01-21
CN110717333B (zh) 2024-01-16

Similar Documents

Publication Publication Date Title
WO2021068339A1 (fr) Text classification method and device, and computer-readable storage medium
US11030199B2 (en) Systems and methods for contextual retrieval and contextual display of records
WO2021042529A1 (fr) Method for automatically generating an article abstract, device, and computer-readable storage medium
US10796084B2 (en) Methods, systems, and articles of manufacture for automatic fill or completion for application software and software services
WO2020253042A1 (fr) Intelligent sentiment assessment method and device, and computer-readable storage medium
US20160239739A1 (en) Semantic frame identification with distributed word representations
US20180068221A1 (en) System and Method of Advising Human Verification of Machine-Annotated Ground Truth - High Entropy Focus
US8676566B2 (en) Method of extracting experience sentence and classifying verb in blog
CN110765765B (zh) Contract key clause extraction method and device based on artificial intelligence, and storage medium
US11651015B2 (en) Method and apparatus for presenting information
WO2022048363A1 (fr) Website classification method and apparatus, computing device, and storage medium
CN111783471B (zh) Semantic recognition method for natural language, device, equipment, and storage medium
CN112860919B (zh) Data labeling method based on a generative model, device, equipment, and storage medium
WO2020258481A1 (fr) Personalized text intelligent recommendation method and apparatus, and computer-readable recording medium
CN111753082A (zh) Text classification method and device based on review data, equipment, and medium
US11972625B2 (en) Character-based representation learning for table data extraction using artificial intelligence techniques
CN113360654B (zh) Text classification method and device, electronic equipment, and readable storage medium
US11347944B2 (en) Systems and methods for short text identification
US20230004819A1 (en) Method and apparatus for training semantic retrieval network, electronic device and storage medium
CN113051380A (zh) Information generation method and device, electronic equipment, and storage medium
CN114970553A (zh) Intelligence analysis method and device based on large-scale unlabeled corpora, and electronic equipment
CN112906368B (zh) Industry text augmentation method, related device, and computer program product
US20210224307A1 (en) Information processing device, information processing system, and computer program product
CN117390173A (zh) Massive resume screening method using semantic similarity matching
CN112560427A (zh) Question expansion method and device, electronic equipment, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19944274

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19944274

Country of ref document: EP

Kind code of ref document: A1