WO2018028164A1 - Text information extracting method, device and mobile terminal - Google Patents

Text information extracting method, device and mobile terminal

Info

Publication number
WO2018028164A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
symbol
extracted
vector
text
Prior art date
Application number
PCT/CN2017/073944
Other languages
French (fr)
Chinese (zh)
Inventor
陈军
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2018028164A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the embodiments of the present invention relate to the field of information processing technologies, and in particular, to a method, an apparatus, and a mobile terminal for extracting text information.
  • SMS and notification messages have become an essential function of mobile terminals.
  • In daily life, the terminal receives various short messages and notification messages, such as billing information, booking information and schedules. As such messages accumulate, retrieving them becomes inconvenient for the user. If the key content of these messages could be extracted and combined with other applications on the phone, for example stored in accounting software or a calendar, querying the information and being reminded of it would be far more convenient for the user.
  • For example, for a bank SMS bill, the user generally picks out the repayment date and the repayment amount by hand and enters them in the calendar. If the terminal could intelligently extract such useful information and output it to the calendar, then even when the terminal stores a large number of short messages and notification messages the user would not have to spend much effort searching, and would be less likely to forget important schedule items.
  • The technical problem to be solved by the embodiments of the present invention is to provide a method, an apparatus, and a mobile terminal for extracting text information, which solve the problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template.
  • an embodiment of the present invention provides a method for extracting text information, including:
  • the step of determining, according to the context information of the first symbol, whether the first symbol meets the semantics of the information to be extracted includes:
  • the performing the weighting operation according to the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol meets the semantics of the information to be extracted includes:
  • the step of performing a weighting operation by using the weight coefficients corresponding to the preset multiple information types according to the first vector information and the second vector information includes:
  • the step of identifying information corresponding to the preset one or more symbols in the text information includes:
  • the information corresponding to the preset one or more symbols in the text information is identified by means of regular expressions and/or keyword matching.
  • the step of acquiring the first symbol corresponding to the information to be extracted and the context information of the first symbol includes:
  • the extraction method further includes:
  • the obtained characters before the first symbol and the preset useless characters included in the characters after the first symbol are excluded, and the preset useless characters include punctuation marks, modal characters and blank symbols.
  • the step of acquiring the first symbol corresponding to the information to be extracted and the context information of the first symbol includes:
  • the first symbol corresponding to the information to be extracted and the context information of the first symbol are acquired.
  • an embodiment of the present invention further provides a text information extracting apparatus, including:
  • the replacement module is configured to identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol;
  • the obtaining module is configured to acquire, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;
  • the extracting module is configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, to extract from the text information the information replaced by the first symbol and output it.
  • the extraction module includes:
  • a first acquiring sub-module configured to acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;
  • the first determining sub-module is configured to perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
  • an embodiment of the present invention further provides a mobile terminal, comprising: the text information extracting apparatus according to any one of the preceding claims.
  • Another embodiment of the present invention provides a computer storage medium, where the computer storage medium stores execution instructions for performing one or a combination of the steps in the foregoing method embodiments.
  • The method for extracting text information in the embodiment of the present invention first identifies information in the text information corresponding to one or more preset symbols, and replaces the identified information with the corresponding symbol; then, in the replaced text information, it acquires the first symbol corresponding to the information to be extracted and the context information of the first symbol; finally, it determines, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, extracts from the text information the information replaced by the first symbol and outputs it.
  • the semantic feature of the context of the text information is used to extract the information, and the content of interest to the user can be intelligently extracted; without specifying a keyword, the method has greater flexibility than the traditional template matching method, and can adapt to different writing modes.
  • This enables the terminal to support various applications based on intelligent understanding of the text, enhancing the user experience. It solves the problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template.
  • FIG. 1 is a flowchart of a method for extracting text information according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an apparatus for extracting text information according to an embodiment of the present invention.
  • a method for extracting text information includes:
  • Step 101 Identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol.
  • the text information includes short messages and notification messages received by the terminal, and the like.
  • some special types of words and/or symbols corresponding to words may be preset.
  • the e-mail address, URL, date, time, percentage, quantifier, currency, phone number, number, foreign words, etc. contained in the text message string can be replaced with special symbols.
  • custom vocabulary can also be replaced with special symbols, such as vocabulary, idioms, food, place, equipment, person name, place name, organization name, etc. in professional application fields.
  • the preset symbols include “DATE” corresponding to the date, “CURRENCY” corresponding to the currency, “BANK” corresponding to the bank, and “TIME” corresponding to the time.
  • For example, the received text message "Your personal credit card November bill RMB 481.93, expired repayment date November 23rd. [China Merchants Bank]" becomes, after identification and replacement, "Your personal credit card DATE bill CURRENCY, expired repayment date DATE. [BANK]".
  • Similarly, a message ending "the repayment amount is RMB 940.18. [ICBC]" becomes, after identification and replacement, "Respected customer, your personal loan in BANK must be repaid before TIME DATE, the repayment amount of principal and interest total CURREN
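The replacement in Step 101 can be sketched in a few lines. This is only an illustration under stated assumptions: the patterns below are hypothetical, written for the English rendering of the credit card example, and the source does not disclose the actual regular expressions or keyword lists. The function name `replace_with_symbols` is likewise invented for the sketch.

```python
import re

# Hypothetical patterns for illustration; the source only names the symbols
# (DATE, CURRENCY, BANK, TIME), not the expressions used to detect them.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
PATTERNS = [
    ("CURRENCY", r"RMB\s?\d+(?:\.\d+)?"),
    ("DATE", rf"\b(?:{MONTHS})(?:\s\d{{1,2}}(?:st|nd|rd|th))?"),
    ("BANK", r"China Merchants Bank|ICBC"),  # keyword-style matching
]
# One named group per symbol, so match.lastgroup tells us which symbol matched.
COMBINED = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS))


def replace_with_symbols(text):
    """Replace recognised spans with their symbol.

    Returns the replaced text plus a list of (symbol, original text) pairs in
    order of appearance, so the original can be recovered for output later.
    """
    replaced = []

    def _swap(match):
        symbol = match.lastgroup
        replaced.append((symbol, match.group(0)))
        return symbol

    return COMBINED.sub(_swap, text), replaced


message = ("Your personal credit card November bill RMB 481.93, "
           "expired repayment date November 23rd. [China Merchants Bank]")
replaced_text, originals = replace_with_symbols(message)
```

Keeping the list of replaced originals alongside the text is a design choice of this sketch: Step 103 needs the information that a symbol stood in for, and positional order is enough to find it again.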
  • Step 102 Acquire, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol.
  • the first symbol and the context information of the first symbol need to be acquired in the text information to determine, by subsequent steps, whether the semantics of the first symbol in the text information conform to the semantics of the information to be extracted.
  • the symbol "DATE" corresponding to the repayment date, together with the context information of "DATE", needs to be acquired in the replaced text information.
  • Step 103 Determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, extract from the text information the information replaced by the first symbol and output it.
  • a plurality of first symbols may be acquired in the text information, and the semantics of each of the first symbols may be different in the text information. Therefore, it is required to combine the context information of the first symbol to determine whether the first symbol conforms to the semantics of the information to be extracted. If it is met, the information replaced by the first symbol is the information to be extracted, and the information replaced by the first symbol is extracted from the text information and output.
  • the outputted information can be output to some applications of the terminal, such as outputting the repayment date to the calendar application, so as to implement functions such as date reminding.
  • the method for extracting text information combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and repayment amount; without requiring keywords to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles; it enables the terminal to carry out various applications on the basis of intelligent understanding of the text, thereby improving the user experience.
  • the problem that the fixed template is difficult to extract key information flexibly and accurately in the related art is solved.
  • the step of determining, according to the context information of the first symbol, whether the first symbol meets the semantics of the information to be extracted may include:
  • Step 1031 Acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol.
  • the first vector information corresponding to the first symbol and the second vector information corresponding to the context information of the first symbol may be acquired in the pre-trained vector database to perform weighting calculation by the subsequent steps.
  • the vector database may include a vector value corresponding to each symbol and a vector value corresponding to each character and/or word that may be used in the context.
  • the vector value corresponding to each character and/or word included in the context information may be looked up to obtain a vector sequence.
  • the vector in the vector sequence should be consistent with the contextual order of the text information.
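A minimal sketch of the lookup in Step 1031, assuming the "vector database" can be modelled as a plain dictionary. The vector values, the fallback `UNKNOWN` vector and the function name `lookup_vectors` are all made up for illustration; the source only says the database is pre-trained and that the sequence keeps the order of the context.

```python
# Toy vector database: every value here is invented for illustration.
VECTOR_DB = {
    "DATE": [0.9, 0.1],
    "repayment": [0.2, 0.8],
    "date": [0.7, 0.3],
    "BANK": [0.1, 0.5],
}
# Assumed fallback for tokens missing from the database (not from the source).
UNKNOWN = [0.0, 0.0]


def lookup_vectors(first_symbol, context_tokens, db=VECTOR_DB):
    """Return the first vector and the context vector sequence.

    The sequence is built by iterating over the context in order, so it stays
    consistent with the order of the text.
    """
    first_vector = db[first_symbol]
    context_vectors = [db.get(token, UNKNOWN) for token in context_tokens]
    return first_vector, context_vectors


first_vector, context_vectors = lookup_vectors("DATE", ["repayment", "date", "BANK"])
```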
  • Step 1032 Perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
  • the weighting operation is performed according to the acquired vector information, and according to the operation result, it is determined whether the first symbol conforms to the semantics of the information to be extracted (such as the repayment date).
  • the weighting operation based on the vector information can accurately determine the semantics of the first symbol, thereby achieving the purpose of accurately extracting key information.
  • the foregoing step 1032 may include:
  • Step 10321 Perform weighting operations on the first vector information and the second vector information by using weight coefficients corresponding to preset multiple information types to obtain an operation result.
  • Step 10322 Determine, according to the operation result, an information type of the first symbol.
  • the information type of the first symbol is determined by the calculated probability value of each type of information.
  • the information type with the largest probability value can be selected as the information type of the first symbol.
  • Step 10323 Determine whether the information type of the first symbol is consistent with the information type of the information to be extracted; if so, determine that the first symbol conforms to the semantics of the information to be extracted; otherwise, determine that the first symbol does not conform to the semantics of the information to be extracted.
  • If the information type of the first symbol is consistent with the information type of the information to be extracted, it may be determined that the first symbol conforms to the semantics of the information to be extracted; otherwise, it may be determined that it does not.
  • the information type of the information to be extracted may be a repayment date and a repayment amount, that is, it is possible to simultaneously extract a plurality of information to be extracted.
  • the weighting operation is performed by the weight coefficient corresponding to the preset information type, and the semantics of the first symbol can be accurately determined, thereby achieving the purpose of accurately extracting key information.
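Steps 10321 to 10323 can be illustrated with a small sketch. Several things here are assumptions rather than the patented method: the `combine` function (concatenating the first vector with the mean of the context vectors) is a simple stand-in for whatever pre-processing produces the input to the weighting, the use of a softmax to turn weighted scores into probabilities is one common choice that the source does not name, and the weight values and type names are invented.

```python
import math


def combine(first_vector, context_vectors):
    """Stand-in combination: first vector followed by the mean context vector."""
    if context_vectors:
        count = len(context_vectors)
        mean = [sum(column) / count for column in zip(*context_vectors)]
    else:
        mean = [0.0] * len(first_vector)
    return list(first_vector) + mean


def type_probabilities(combined, weights):
    """Weight the combined vector once per information type, then normalise.

    `weights` maps each information type to one coefficient per element of
    the combined vector. Softmax normalisation is an assumption of this sketch.
    """
    types = list(weights)
    scores = [sum(w * x for w, x in zip(weights[t], combined)) for t in types]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return {t: e / total for t, e in zip(types, exps)}


def conforms(first_vector, context_vectors, weights, wanted_types):
    """Pick the most probable type and compare it with the wanted types."""
    probabilities = type_probabilities(combine(first_vector, context_vectors), weights)
    predicted = max(probabilities, key=probabilities.get)
    return predicted in wanted_types, predicted


# Invented weights for three hypothetical information types.
WEIGHTS = {
    "repayment_date": [1.0, 0.0, 0.0, 1.0],
    "bill_month": [0.0, 1.0, 1.0, 0.0],
    "other": [0.0, 0.0, 0.0, 0.0],
}
```

Passing a set of wanted types mirrors the remark that several kinds of information (for example repayment date and repayment amount) can be extracted at the same time.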
  • the step of the above step 10321 may include:
  • Step 103211 Pre-process the first vector information and the second vector information with a model pre-trained using a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, to obtain a combined vector.
  • Step 103212 Perform the weighting operation on the combined vector using the weight coefficients corresponding to each of the multiple information types.
  • the model pre-trained with a bidirectional long short-term memory neural network or a convolutional neural network first pre-processes the first vector information and the second vector information to obtain a combined vector of the first symbol and its context; the combined vector is then weighted with the weight coefficients corresponding to each of the multiple information types. In this way the semantics of the first symbol can be determined accurately, so that key information is extracted accurately.
  • the step of identifying the information corresponding to the preset one or more symbols in the text information may include:
  • step 1011 the information corresponding to the preset one or more symbols in the text information is identified by using a regular expression and/or a keyword matching manner.
  • the regular expression and/or keyword matching method can accurately identify the information corresponding to the preset symbol in the text information.
  • the foregoing step 102 may include:
  • Step 1021 In the replaced text information, acquire the first symbol corresponding to the information to be extracted, and acquire a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.
  • If the first preset number and the second preset number are both set to 5, it is necessary to acquire 5 characters before and 5 characters after the first symbol.
  • Because Chinese sentence structure is very free, and the preceding text is generally more important than the following text for identifying the current symbol, an asymmetric context can also be used. If the first preset number is set to 7 and the second preset number is set to 5, it is necessary to acquire 7 characters before the first symbol and 5 characters after the first symbol.
  • the number of characters of the context can be defined as needed to better distinguish the semantics of the first symbol in combination with the context.
  • Determining the number of context characters is equivalent to determining the size of the context window of the current symbol; the semantics of the current symbol are subsequently determined from the characters in the context window. Assume that the first preset number and the second preset number are both set to 5. For DATE in "expiration repayment date DATE. [BANK]", if DATE is the current symbol whose semantics are to be determined, the context window contains the five characters preceding it (the characters of "expiration repayment date", rendered one by one in this translation as "to", "period", "return", "model", "day") and the four characters following it: ".", "[", "BANK", "]".
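The context window described above amounts to two list slices. The sketch below assumes the replaced text has already been split into a token list; the function name `context_window` and its parameters are invented, and the tokens reuse the translation's character-by-character rendering of the example.

```python
def context_window(tokens, index, before=5, after=5):
    """Return up to `before` tokens preceding and `after` tokens following tokens[index].

    Fewer tokens are returned when the symbol sits near either end of the text.
    """
    start = max(0, index - before)
    return tokens[start:index], tokens[index + 1:index + 1 + after]


# Tokens as rendered in the translated example; DATE is the current symbol.
tokens = ["to", "period", "return", "model", "day", "DATE", ".", "[", "BANK", "]"]
preceding, following = context_window(tokens, tokens.index("DATE"))

# Asymmetric window: more weight on what comes before the symbol.
preceding_7, following_5 = context_window(tokens, tokens.index("DATE"), before=7, after=5)
```

Note that `tokens.index("DATE")` only finds the first occurrence; a message with several DATE symbols would need each position examined in turn.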
  • the extracting method may further include:
  • Step 1022 Cull the preset useless characters contained in the acquired characters before the first symbol and after the first symbol; the preset useless characters include punctuation marks, modal particles, and blank symbols.
  • the preset useless characters may also include some special symbols and the like.
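Culling in Step 1022 is a simple filter over the context tokens. In this sketch the set of useless characters is an assumption: ASCII punctuation from the standard library, a handful of full-width punctuation marks, and a few Chinese modal particles chosen as examples. The source does not list the actual preset characters.

```python
import string

# Hypothetical preset list: ASCII punctuation, some full-width punctuation,
# and a few example modal particles.
USELESS = set(string.punctuation) | set("。，、；：！？【】（）") | {"啊", "呢", "吧", "吗"}


def cull(tokens, useless=USELESS):
    """Drop blank tokens and tokens found in the preset useless set."""
    return [token for token in tokens if token.strip() and token not in useless]


cleaned = cull(["day", ".", "[", "BANK", "]", " ", "吧"])
```

Multi-character tokens such as "BANK" are compared as whole tokens, so a symbol is never removed just because it contains a listed character.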
  • step 102 may include:
  • Step 1023 Perform word segmentation on the replaced text information.
  • Step 1024 Acquire, in the text information after the word segmentation, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
  • the word segmentation technique can be used to first perform word segmentation on the content of the text information, that is, the common words are separated, thereby facilitating the semantic judgment.
  • after word segmentation, the word vector corresponding to a word can be read directly, without having to read the vector of each of its characters.
  • the above-mentioned word segmentation process can be omitted, because the model of the weighting operation can express the semantics of various combinations of different words when the sample is sufficient.
  • the method for extracting text information combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and repayment amount; without requiring keywords to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles; it enables the terminal to carry out various applications on the basis of intelligent understanding of the text, facilitating smart reminders and other functions; and it facilitates subsequent storage, retrieval and other applications, improving the user experience.
  • the problem that the fixed template is difficult to extract key information flexibly and accurately in the related art is solved.
  • an embodiment of the present invention further provides an apparatus for extracting text information, including:
  • the replacement module 201 is configured to identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol;
  • the obtaining module 202 is configured to: obtain, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;
  • the extracting module 203 is configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, to extract from the text information the information replaced by the first symbol and output it.
  • the text information extracting apparatus of the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and repayment amount; without requiring keywords to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles; it enables the terminal to carry out various applications on the basis of intelligent understanding of the text, thereby improving the user experience.
  • the problem that the fixed template is difficult to extract key information flexibly and accurately in the related art is solved.
  • the extraction module 203 includes:
  • a first acquiring sub-module configured to acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;
  • the first determining sub-module is configured to perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
  • the first determining submodule comprises:
  • the first weighting operation unit is configured to perform weighting operations on the first vector information and the second vector information by using weight coefficients corresponding to the preset plurality of information types to obtain an operation result;
  • a first determining unit configured to determine, according to the operation result, an information type of the first symbol
  • a second determining unit configured to determine whether the information type of the first symbol is consistent with the information type of the information to be extracted; if so, to determine that the first symbol conforms to the semantics of the information to be extracted; otherwise, to determine that the first symbol does not conform to the semantics of the information to be extracted.
  • the first weighting operation unit includes:
  • a pre-processing sub-unit configured to pre-process the first vector information and the second vector information with a model pre-trained using a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, to obtain a combined vector;
  • the first weighting operation subunit is configured to perform the weighting operation on the combined vector using the weight coefficients corresponding to each of the plurality of information types.
  • the replacement module 201 includes:
  • the identification sub-module is configured to identify information corresponding to the preset one or more symbols in the text information by using a regular expression and/or a keyword matching manner.
  • the obtaining module 202 includes:
  • a second obtaining sub-module configured to acquire, in the replaced text information, a first symbol corresponding to the information to be extracted, and to acquire a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.
  • the extracting device further includes:
  • the culling module is configured to cull the preset useless characters contained in the acquired characters before the first symbol and after the first symbol, the preset useless characters including punctuation marks, modal particles and blank symbols.
  • the obtaining module 202 includes:
  • a word segmentation sub-module configured to perform word segmentation on the replaced text information
  • the third obtaining sub-module is configured to acquire, in the text information after the word segmentation processing, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
  • the text information extracting apparatus of the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and repayment amount; without requiring keywords to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles; it enables the terminal to carry out various applications on the basis of intelligent understanding of the text, facilitating smart reminders and other functions; and it facilitates subsequent storage, retrieval and other applications, improving the user experience. The problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template is solved.
  • the apparatus for extracting text information corresponds to the method for extracting text information described above; all implementations in the foregoing method embodiments are applicable to the apparatus embodiment and achieve the same technical effect.
  • the text information extracting apparatus of the embodiment of the present invention is applied to a mobile terminal. Therefore, the embodiment of the present invention further provides a mobile terminal, including: the text information extracting apparatus as described in the foregoing embodiment.
  • the implementation examples of the foregoing text information extracting apparatus are applicable to the embodiment of the mobile terminal, and the same technical effects can be achieved.
  • the mobile terminal of the present invention may be a mobile electronic device such as a mobile phone or a tablet computer.
  • Embodiments of the present invention also provide a storage medium.
  • the foregoing storage medium stores an execution instruction, where the execution instruction is used to perform one or a combination of the steps in the foregoing method embodiments.
  • the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), and a Random Access Memory (RAM).
  • the method, apparatus, and mobile terminal for extracting text information provided by the embodiments of the present invention have the following beneficial effects: information is extracted by combining the semantic features of the context of the text information, so that content of interest to the user can be extracted intelligently; keywords do not need to be specified, giving greater flexibility than the traditional template matching method and allowing adaptation to different writing styles; and the terminal can carry out a variety of applications on the basis of intelligent understanding of the text, enhancing the user experience. The problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template is solved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method and a device for extracting text information, and a mobile terminal, relating to the field of information processing technology, solve the problem that it is difficult to extract key information flexibly and accurately using a fixed template. The method comprises: identifying information in the text information corresponding to one or more preset symbols, and replacing the identified information with the corresponding symbol (101); in the replaced text information, obtaining a first symbol corresponding to information to be extracted and context information of the first symbol (102); and according to the context information of the first symbol, determining whether or not the first symbol conforms to the semantics of the information to be extracted, and if it does, extracting the information replaced by the first symbol from the text information and outputting it (103). In the above method, information is extracted by combining semantic features of the context of the text information, which adapts flexibly to different manners of writing and can accurately extract the content of interest to the user.

Description

一种文本信息的提取方法、装置和移动终端Method, device and mobile terminal for extracting text information 技术领域Technical field
本发明实施例涉及信息处理技术领域,特别涉及一种文本信息的提取方法、装置和移动终端。The embodiments of the present invention relate to the field of information processing technologies, and in particular, to a method, an apparatus, and a mobile terminal for extracting text information.
背景技术Background technique
At present, short messages (SMS) and notification messages are an essential function of mobile terminals. In daily life a terminal receives many kinds of short messages and notification messages, such as bills, ticket bookings and schedules. As these messages accumulate, it becomes inconvenient for the user to search through them. If the key content of these messages could be extracted and combined with other applications on the phone, for example by storing it in accounting software or a calendar, querying information and setting reminders would become much more convenient for the user.
For example, with a bank bill sent by SMS, the user generally picks out the repayment date and the repayment amount by hand and enters them into a calendar. If the terminal could extract this useful information intelligently and output it to the calendar, then even when the terminal stores a large number of short messages and notification messages, the user would not have to spend much effort searching and would be less likely to forget an important appointment.
Traditionally, key information is mostly extracted by keyword template matching. However, the wording of text messages is very flexible, and a keyword often takes on different meanings depending on its context, so it is difficult to extract key information flexibly and accurately with a fixed template.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a method, a device and a mobile terminal for extracting text information that solve the problem in the related art that it is difficult to extract key information flexibly and accurately with a fixed template.
To solve the above technical problem, an embodiment of the present invention provides a method for extracting text information, including:
identifying information in the text information that corresponds to one or more preset symbols, and replacing the identified information with the corresponding symbol;
obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;
determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, extracting the information replaced by the first symbol from the text information and outputting it.
Optionally, the step of determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted includes:
obtaining, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol;
performing a weighted operation on the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
Optionally, the step of performing a weighted operation on the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted includes:
performing weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of a plurality of preset information types, to obtain an operation result;
determining the information type of the first symbol according to the operation result;
determining whether the information type of the first symbol is consistent with the information type of the information to be extracted; if so, determining that the first symbol conforms to the semantics of the information to be extracted; otherwise, determining that the first symbol does not conform to the semantics of the information to be extracted.
Optionally, the step of performing weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of the plurality of preset information types, includes:
preprocessing the first vector information and the second vector information with a model pre-trained using a bidirectional long short-term memory (LSTM) neural network or a convolutional neural network, to obtain a combined vector;
performing weighted operations on the combined vector with the weight coefficients corresponding to each of the plurality of information types.
Optionally, the step of identifying information in the text information that corresponds to one or more preset symbols includes:
identifying the information in the text information that corresponds to the one or more preset symbols by means of regular expressions and/or keyword matching.
Optionally, the step of obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol includes:
obtaining, in the replaced text information, the first symbol corresponding to the information to be extracted, and obtaining a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, where the characters include single characters and/or words.
Optionally, after obtaining the first symbol together with the first preset number of single characters and/or words before it and the second preset number of single characters and/or words after it, the extraction method further includes:
removing preset useless characters from the obtained characters before and after the first symbol, the preset useless characters including punctuation marks, modal particles and whitespace.
Optionally, the step of obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol includes:
performing word segmentation on the replaced text information;
obtaining, in the segmented text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
To solve the above technical problem, an embodiment of the present invention further provides a device for extracting text information, including:
a replacement module, configured to identify information in the text information that corresponds to one or more preset symbols, and to replace the identified information with the corresponding symbol;
an obtaining module, configured to obtain, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;
an extraction module, configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, to extract the information replaced by the first symbol from the text information and output it.
Optionally, the extraction module includes:
a first obtaining submodule, configured to obtain, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol;
a first determining submodule, configured to perform a weighted operation on the first vector information and the second vector information, and to determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
To solve the above technical problem, an embodiment of the present invention further provides a mobile terminal that includes the device for extracting text information described in any of the above.
Another embodiment of the present invention provides a computer storage medium storing executable instructions, the instructions being used to perform one of the steps of the above method embodiments or a combination of them.
The beneficial effects of the above technical solutions in the embodiments of the present invention are as follows:
The method for extracting text information according to the embodiments first identifies information in the text information that corresponds to one or more preset symbols and replaces the identified information with the corresponding symbol. It then obtains, in the replaced text information, a first symbol corresponding to the information to be extracted and the context information of the first symbol. Finally, it determines from that context information whether the first symbol conforms to the semantics of the information to be extracted, and if so, extracts the information replaced by the first symbol from the text information and outputs it. Information is thus extracted using the semantic features of the context of the text information, so content of interest to the user can be extracted intelligently. No keywords need to be specified, so the method is more flexible than traditional template matching and adapts to different ways of wording a message. The terminal can build various applications on an intelligent understanding of the text, which improves the user experience. This solves the problem in the related art that it is difficult to extract key information flexibly and accurately with a fixed template.
Brief description of the drawings
FIG. 1 is a flowchart of a method for extracting text information according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for extracting text information according to an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the method for extracting text information according to an embodiment of the present invention includes:
Step 101: identify information in the text information that corresponds to one or more preset symbols, and replace the identified information with the corresponding symbol.
Here, the information in the text information that corresponds to a preset symbol is identified and then replaced with that symbol, so that the whole class of information represented by the symbol can be processed uniformly. The text information includes short messages, notification messages and the like received by the terminal.
Symbols can be preset for certain special types of characters and/or words. For example, email addresses, URLs, dates, times, percentages, measure words, currency amounts, telephone numbers, numbers and foreign words contained in the text string can each be replaced with a special symbol.
Optionally, user-defined vocabulary can also be replaced with special symbols, for example domain-specific terms, idioms, foods, places, devices, personal names, place names and organisation names.
For example, suppose the preset symbols include "DATE" for dates, "CURRENCY" for currency amounts, "BANK" for banks and "TIME" for times. A received message "您个人信用卡11月账单人民币4818.93,到期还款日11月23日。[招商银行]" ("Your personal credit card bill for November is RMB 4818.93; repayment due date November 23. [China Merchants Bank]") becomes, after identification and replacement, "您个人信用卡DATE账单CURRENCY,到期还款日DATE。[BANK]" ("Your personal credit card DATE bill CURRENCY; repayment due date DATE. [BANK]"). Another received message "尊敬的客户,您在工商银行办理的个人贷款需于2014年5月14日17:00前还款,还款金额本息合计9402.18元。[工商银行]" ("Dear customer, the personal loan you took out with ICBC must be repaid before 17:00 on May 14, 2014; the total repayment of principal and interest is 9402.18 yuan. [ICBC]") becomes, after identification and replacement, "尊敬的客户,您在BANK办理的个人贷款需于DATE TIME前还款,还款金额本息合计CURRENCY。[BANK]" ("Dear customer, the personal loan you took out with BANK must be repaid before DATE TIME; the total repayment of principal and interest is CURRENCY. [BANK]").
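The replacement in step 101 can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the regular expressions, the two-bank keyword list and the function name `replace_with_symbols` are assumptions chosen only to cover the sample messages above. The sketch also records each replaced original so that it can be output later.

```python
import re

# Hypothetical recognisers, one named group per preset symbol.
# Inner groups are non-capturing so that Match.lastgroup names the symbol.
PATTERN = re.compile(
    r"(?P<DATE>(?:\d{4}年)?\d{1,2}月(?:\d{1,2}日)?)"
    r"|(?P<TIME>\d{1,2}:\d{2})"
    r"|(?P<CURRENCY>人民币\d+(?:\.\d+)?|\d+(?:\.\d+)?元)"
    r"|(?P<BANK>招商银行|工商银行)"  # keyword matching for bank names
)


def replace_with_symbols(text):
    """Replace recognised spans with their symbol.

    Returns the replaced text and a list of (symbol, original) pairs
    in order of appearance.
    """
    replaced = []

    def _substitute(match):
        symbol = match.lastgroup  # name of the alternative that matched
        replaced.append((symbol, match.group()))
        return symbol

    return PATTERN.sub(_substitute, text), replaced
```

Applied to the first sample message, this sketch produces the replaced text quoted above and remembers that the two "DATE" symbols stand for "11月" and "11月23日".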
Step 102: obtain, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol.
Here, the first symbol and its context information are obtained from the text information so that the subsequent steps can determine whether the meaning of the first symbol in the text conforms to the semantics of the information to be extracted.
Suppose the information to be extracted is the repayment date. Then the symbol "DATE" corresponding to the repayment date, together with the context information of "DATE", must be obtained from the replaced text information.
Step 103: determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, extract the information replaced by the first symbol from the text information and output it.
Here, several first symbols may be found in the text information, and each may have a different meaning in the text. The context information of each first symbol is therefore used to determine whether that symbol conforms to the semantics of the information to be extracted. If it does, the information replaced by the first symbol is the information to be extracted, so it is extracted from the text information and output.
Take again the message mentioned above, "您个人信用卡11月账单人民币4818.93,到期还款日11月23日。[招商银行]", which becomes "您个人信用卡DATE账单CURRENCY,到期还款日DATE。[BANK]" after identification and replacement. Suppose the information to be extracted is the repayment date, whose symbol is "DATE". Two "DATE" symbols are found in the replaced message; one stands for the billing date and the other for the repayment date, so the context information of each "DATE" is needed to determine whether it conforms to the semantics of a repayment date. The determination shows that the second "DATE" does, so the information it replaced (11月23日, November 23) is extracted and output, and the repayment date has thus been extracted from the message.
The extracted information can be output to certain applications on the terminal. For example, the repayment date can be output to a calendar application to enable functions such as date reminders.
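The control flow of step 103 can be sketched as follows. The semantic judgment itself is passed in as a callable named `classify`, which is a placeholder for the vector-based determination described in the embodiments; the function name, the argument layout and the information-type labels are assumptions made for illustration. The sketch also assumes that no ordinary token in the message happens to equal a symbol name.

```python
def extract_matching(tokens, replaced, target_symbol, target_type, classify):
    """Return the originals whose symbol is judged to carry the target semantics.

    tokens        -- the replaced message as a list of tokens, symbols included
    replaced      -- (symbol, original) pairs in order of appearance (step 101)
    target_symbol -- symbol of the information to extract, e.g. "DATE"
    target_type   -- information type to extract, e.g. "repayment_date"
    classify      -- callable(tokens, index) -> information type; a stand-in
                     for the context-based semantic judgment
    """
    symbols = {symbol for symbol, _ in replaced}
    originals = iter(replaced)
    results = []
    for index, token in enumerate(tokens):
        if token not in symbols:
            continue
        # The n-th symbol token corresponds to the n-th recorded original.
        _, original = next(originals)
        if token == target_symbol and classify(tokens, index) == target_type:
            results.append(original)
    return results
```

Keeping the originals in a list parallel to the symbol occurrences is one simple way to recover "the information replaced by the first symbol"; storing character offsets would work equally well.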
The method for extracting text information according to the embodiments extracts information using the semantic features of the context of the text information, so it can intelligently extract content of interest to the user such as the repayment date and the repayment amount. No keywords need to be specified, so the method is more flexible than traditional template matching and adapts to different ways of wording a message. The terminal can build various applications on an intelligent understanding of the text, which improves the user experience. This solves the problem in the related art that it is difficult to extract key information flexibly and accurately with a fixed template.
Preferably, in step 103 above, the step of determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted may include:
Step 1031: obtain, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol.
Here, the first vector information corresponding to the first symbol and the second vector information corresponding to its context information can be obtained from a pre-trained vector database, for use in the weighted calculation of the subsequent steps.
The vector database may contain a vector value for each symbol and a vector value for each character and/or word that may occur in a context. To obtain the second vector information, the vector value of each character and/or word in the context information is looked up, giving a sequence of vectors. To keep the calculation accurate, the vectors in this sequence should be in the same order as the context in the text information.
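The lookup in step 1031 can be sketched as below. The toy database contents, the three-dimensional vectors and the zero-vector fallback for tokens missing from the database are all assumptions for illustration; a real vector database would be trained in advance, as described above.

```python
# Hypothetical toy "vector database": symbol or word -> vector value.
VECTOR_DB = {
    "DATE": [0.9, 0.1, 0.0],
    "还款": [0.2, 0.8, 0.1],
    "日": [0.1, 0.3, 0.7],
    "账单": [0.5, 0.5, 0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed fallback for tokens not in the database


def lookup_vectors(symbol, context_tokens, db=VECTOR_DB):
    """Return (first vector, second vector sequence) for a symbol and its context."""
    first = db[symbol]
    # The list comprehension preserves the order of the context tokens.
    second = [db.get(token, UNKNOWN) for token in context_tokens]
    return first, second
```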
Step 1032: perform a weighted operation on the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
Here, a weighted operation is performed on the obtained vector information, and the result is used to determine whether the first symbol conforms to the semantics of the information to be extracted (for example, the repayment date).
Performing the weighted operation on vector information allows the semantics of the first symbol to be determined accurately, so that key information is extracted accurately.
Optionally, step 1032 above may include:
Step 10321: perform weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of a plurality of preset information types, to obtain an operation result.
Here, suppose three information types are preset: repayment date, repayment amount, and other. The vector information obtained from the first symbol and its context is then weighted with the weight coefficients of each of the three information types in turn, yielding three probability values.
Step 10322: determine the information type of the first symbol according to the operation result.
Here, the information type of the first symbol is determined from the probability value calculated for each information type. The information type with the largest probability value can be selected as the information type of the first symbol.
Step 10323: determine whether the information type of the first symbol is consistent with the information type of the information to be extracted; if so, determine that the first symbol conforms to the semantics of the information to be extracted; otherwise, determine that it does not.
Here, if the information type of the first symbol is consistent with the information type of the information to be extracted, the first symbol conforms to the semantics of the information to be extracted; otherwise it does not.
If three information types are preset (repayment date, repayment amount, and other), the information types to be extracted may be both the repayment date and the repayment amount; that is, several pieces of information can be extracted at the same time.
Performing the weighted operation with the weight coefficients of the preset information types allows the semantics of the first symbol to be determined accurately, so that key information is extracted accurately.
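Steps 10321 to 10323 can be sketched as follows. The embodiment states only that a weighted operation yields one probability value per information type; the linear score with a bias term and the softmax normalisation used here are assumptions, as are the function names and the type labels. The weights would come from prior training.

```python
import math


def type_probabilities(combined, weights, biases):
    """Weighted operation per information type, normalised to probabilities."""
    # One weighted sum per information type (weights and biases assumed pre-trained).
    scores = {
        info_type: sum(w * x for w, x in zip(weight_vec, combined)) + biases[info_type]
        for info_type, weight_vec in weights.items()
    }
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def conforms(combined, weights, biases, target_types):
    """True if the most probable information type is one of the types to extract."""
    probs = type_probabilities(combined, weights, biases)
    predicted = max(probs, key=probs.get)  # type with the largest probability
    return predicted in target_types
```

Passing a set such as `{"repayment_date", "repayment_amount"}` as `target_types` corresponds to extracting several kinds of information at the same time.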
Preferably, step 10321 above may include:
Step 103211: preprocess the first vector information and the second vector information with a model pre-trained using a bidirectional long short-term memory (LSTM) neural network or a convolutional neural network, to obtain a combined vector;
Step 103212: perform weighted operations on the combined vector with the weight coefficients corresponding to each of the plurality of information types.
Here, the model pre-trained using a bidirectional LSTM neural network or a convolutional neural network first preprocesses the first vector information and the second vector information to obtain a combined vector for the first symbol and its context. Weighted operations are then performed on this combined vector with the weight coefficients of each information type, which allows the semantics of the first symbol to be determined accurately and key information to be extracted accurately.
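The embodiment does not describe the internals of the pre-trained model. As a rough illustration of the convolutional option only, the sketch below applies one-dimensional convolution filters over the vector sequence and keeps the largest activation of each filter (max-over-time pooling). The ReLU activation, the zero padding, the hand-written filter values a caller would pass in, and the choice to place the symbol's vector between the vectors of the preceding and following context are all assumptions; a real model would learn its filters from training samples.

```python
def combine_cnn(first_vector, before_vectors, after_vectors, filters, width=3):
    """Combine symbol and context vectors into one feature per filter."""
    # Assumed layout: context before, then the symbol itself, then context after.
    sequence = list(before_vectors) + [first_vector] + list(after_vectors)
    dim = len(first_vector)
    # Zero-pad short sequences so that at least one full window exists.
    while len(sequence) < width:
        sequence.append([0.0] * dim)
    combined = []
    for filt in filters:  # each filter holds `width` weight vectors of length `dim`
        best = 0.0  # starting at zero makes the running maximum act as ReLU
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            activation = sum(
                w * x
                for weight_vec, token_vec in zip(filt, window)
                for w, x in zip(weight_vec, token_vec)
            )
            best = max(best, activation)  # max-over-time pooling
        combined.append(best)
    return combined
```

The returned list has one entry per filter regardless of the context length, so it can be fed directly into a per-type weighted operation.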
Preferably, in step 101 above, the step of identifying information in the text information that corresponds to one or more preset symbols may include:
Step 1011: identify the information in the text information that corresponds to the one or more preset symbols by means of regular expressions and/or keyword matching.
Either regular expressions or keyword matching, or both together, can accurately identify the information in the text information that corresponds to a preset symbol.
Preferably, step 102 above may include:
Step 1021: obtain, in the replaced text information, the first symbol corresponding to the information to be extracted, and obtain a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, where the characters include single characters and/or words.
Here, a symmetric context can be used to keep the computation simple. For example, if the first preset number and the second preset number are both set to 5, five characters are obtained on each side of the first symbol.
Alternatively, because Chinese sentences are worded very freely and the preceding text is generally more important than the following text for identifying the current symbol, an asymmetric context can be used. For example, if the first preset number is set to 7 and the second preset number to 5, the 7 characters before the first symbol and the 5 characters after it are obtained.
The number of context characters can thus be limited as needed, so that the context better discriminates the semantics of the first symbol.
Fixing the number of context characters amounts to fixing the size of the context window of the current symbol; the characters inside this window are then used to determine the semantics of the current symbol. Suppose the first preset number and the second preset number are both 5, and the DATE in "到期还款日DATE。[BANK]" ("repayment due date DATE. [BANK]") is the current symbol whose semantics are to be determined. The context window then contains "到", "期", "还", "款", "日", "。", "[", "BANK", "]"; only four items follow the symbol because the message ends there.
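The context window described above can be sketched as follows. The tokeniser that treats each preset symbol as one token and every other non-blank character as its own token, and the function names, are assumptions for illustration.

```python
import re

SYMBOLS = ("DATE", "TIME", "CURRENCY", "BANK")
# A symbol counts as one token; any other non-whitespace character is its own token.
TOKEN_RE = re.compile("|".join(SYMBOLS) + r"|\S")


def tokenize(text):
    return TOKEN_RE.findall(text)


def context_window(tokens, index, before=5, after=5):
    """Return up to `before` tokens preceding and `after` tokens following tokens[index]."""
    start = max(0, index - before)
    return tokens[start:index], tokens[index + 1:index + 1 + after]
```

Calling `context_window(tokens, index, before=7, after=5)` gives the asymmetric form. Near the start or end of a message the window is simply shorter, as in the example above.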
Optionally, after step 1021 above, the extraction method may further include:
Step 1022: remove preset useless characters from the obtained characters before and after the first symbol, the preset useless characters including punctuation marks, modal particles and whitespace.
Removing characters that contribute little to semantic discrimination avoids unnecessary computation and improves processing efficiency. The preset useless characters may further include certain special symbols and the like.
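Step 1022 can be sketched as a simple filter over the context tokens. The list of modal particles is a small illustrative sample rather than a list given in the embodiment, and using the Unicode general category to recognise punctuation is likewise an assumption.

```python
import unicodedata

MODAL_PARTICLES = {"啊", "吧", "呢", "吗", "呀"}  # illustrative sample, an assumption


def is_useless(token):
    # Empty or whitespace-only tokens and modal particles are dropped.
    if token.strip() == "" or token in MODAL_PARTICLES:
        return True
    # A token made up entirely of punctuation (Unicode categories P*) is dropped.
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def strip_useless(tokens):
    return [token for token in tokens if not is_useless(token)]
```

For the window in the example above, the five characters before DATE are kept unchanged, while the tokens after it reduce to "BANK".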
A single character often cannot express a specific meaning accurately, whereas a word made up of several characters can. For example, the individual characters "公" and "司" mean something entirely different from the word "公司" (company). To make the semantic judgment easier, step 102 above may preferably include:
Step 1023: perform word segmentation on the replaced text information;
Step 1024: obtain, in the segmented text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
Here, a word segmentation technique is first applied to the content of the text information to separate out common words, which makes the semantic judgment easier.
After segmentation, the word vector of each word is read directly, and the vectors of its individual characters need not be read. In addition, when the training sample is large enough, the segmentation step can be omitted, because with sufficient samples the model used for the weighted operation can itself express the meanings conveyed by different combinations of characters.
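The embodiment does not name a particular segmentation technique. One simple possibility, shown here purely as an assumption, is forward maximum matching against a vocabulary: at each position the longest vocabulary entry is taken, and a single character is used when nothing longer matches. The preset symbols are included in the vocabulary so that they stay intact.

```python
def segment(text, vocabulary):
    """Forward maximum matching; unmatched characters become single-character tokens."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    position = 0
    while position < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - position), 0, -1):
            piece = text[position:position + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                position += size
                break
    return tokens
```

With a hypothetical vocabulary containing "到期", "还款日", "DATE" and "BANK", the fragment "到期还款日DATE。[BANK]" is split into words and symbols instead of single characters, so the word vector of "还款日" can be read directly.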
In summary, the method for extracting text information according to the embodiments extracts information using the semantic features of the context of the text information, so it can intelligently extract content of interest to the user such as the repayment date and the repayment amount. No keywords need to be specified, so the method is more flexible than traditional template matching and adapts to different ways of wording a message. The terminal can build various applications on an intelligent understanding of the text, which makes functions such as smart reminders easy to implement. The user experience is improved in content extraction and in subsequent applications such as storage and retrieval. This solves the problem in the related art that it is difficult to extract key information flexibly and accurately with a fixed template.
As shown in FIG. 2, an embodiment of the present invention further provides a device for extracting text information, including:
a replacement module 201, configured to identify information in the text information that corresponds to one or more preset symbols, and to replace the identified information with the corresponding symbol;
an obtaining module 202, configured to obtain, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;
an extraction module 203, configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, to extract the information replaced by the first symbol from the text information and output it.
The device for extracting text information according to the embodiments extracts information using the semantic features of the context of the text information, so it can intelligently extract content of interest to the user such as the repayment date and the repayment amount. No keywords need to be specified, so the device is more flexible than traditional template matching and adapts to different ways of wording a message. The terminal can build various applications on an intelligent understanding of the text, which improves the user experience. This solves the problem in the related art that it is difficult to extract key information flexibly and accurately with a fixed template.
Preferably, the extraction module 203 includes:
a first obtaining sub-module, configured to obtain, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol;
a first determining sub-module, configured to perform a weighted operation on the first vector information and the second vector information, and to determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
Preferably, the first determining sub-module includes:
a first weighting operation unit, configured to perform weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of a plurality of preset information types, to obtain operation results;
a first determining unit, configured to determine the information type of the first symbol according to the operation results;
a second determining unit, configured to determine whether the information type of the first symbol is consistent with the information type of the information to be extracted; if consistent, to determine that the first symbol conforms to the semantics of the information to be extracted; otherwise, to determine that the first symbol does not conform to the semantics of the information to be extracted.
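The weighting and type-matching units described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy two-dimensional vectors, the `VECTORS` table standing in for the preset vector database, the hand-picked `TYPE_WEIGHTS`, and every function name are hypothetical. The combined vector here is a plain concatenation of the symbol vector with the mean of the context vectors, swapped in for whatever a trained model would produce.

```python
# Hypothetical "vector database": token -> embedding (toy 2-d values).
VECTORS = {
    "<DATE>": [1.0, 0.0],
    "repay": [0.0, 1.0],
    "before": [0.2, 0.8],
    "meeting": [0.0, -1.0],
    "on": [0.2, -0.8],
}
DIM = 2

# Hypothetical weight coefficients, one row per preset information type.
# In practice these would be learned; here they are hand-picked.
TYPE_WEIGHTS = {
    "repayment_date": [1.0, 0.0, 0.0, 1.0],
    "meeting_date": [1.0, 0.0, 0.0, -1.0],
}


def lookup(token):
    """Return the stored vector for a token, or a zero vector if unknown."""
    return VECTORS.get(token, [0.0] * DIM)


def combined_vector(symbol, context):
    """Concatenate the symbol vector with the mean of the context vectors."""
    first = lookup(symbol)
    if context:
        rows = [lookup(token) for token in context]
        second = [sum(column) / len(rows) for column in zip(*rows)]
    else:
        second = [0.0] * DIM
    return first + second


def classify(symbol, context):
    """Score the combined vector against each type's weights; keep the best."""
    vec = combined_vector(symbol, context)
    scores = {
        name: sum(w * x for w, x in zip(weights, vec))
        for name, weights in TYPE_WEIGHTS.items()
    }
    return max(scores, key=scores.get)


def conforms(symbol, context, target_type):
    """True if the symbol's inferred type equals the type to be extracted."""
    return classify(symbol, context) == target_type
```

With these toy values, the same `<DATE>` symbol is typed differently depending on its neighbours, which is the point of using context rather than a fixed template: `conforms("<DATE>", ["repay", "before"], "repayment_date")` holds, while the same call with the context `["meeting", "on"]` does not.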
Preferably, the first weighting operation unit includes:
a pre-processing sub-unit, configured to pre-process the first vector information and the second vector information, using a model pre-trained with a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, to obtain a combined vector;
a first weighting operation sub-unit, configured to perform weighted operations on the combined vector using the weight coefficients corresponding to each of the plurality of information types.
Preferably, the replacement module 201 includes:
an identification sub-module, configured to identify, by means of regular expressions and/or keyword matching, the information in the text information that corresponds to the one or more preset symbols.
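The identification and replacement step can be sketched with regular expressions as follows. This is an illustrative sketch only: the two symbols `<DATE>` and `<MONEY>`, their patterns, and the function name `replace_with_symbols` are assumptions chosen for the example, not values taken from the patent.

```python
import re

# Hypothetical preset symbols and the patterns that identify their content.
PATTERNS = {
    "DATE": r"\d{4}-\d{2}-\d{2}",
    "MONEY": r"\d+(?:\.\d+)? yuan",
}

# One alternation of named groups, so a single pass handles every symbol.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items())
)


def replace_with_symbols(text):
    """Replace each match with its symbol and remember the original value.

    Returns the rewritten text plus a list of (symbol, original) pairs in
    order of appearance, so the original can be output later if the symbol
    is judged to match the information to be extracted.
    """
    replaced = []

    def substitute(match):
        symbol = f"<{match.lastgroup}>"
        replaced.append((symbol, match.group()))
        return symbol

    return COMBINED.sub(substitute, text), replaced
```

Keeping the `(symbol, original)` pairs is a design choice of this sketch: the later extraction step outputs the information that the first symbol replaced, so the mapping has to survive the substitution.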
Preferably, the obtaining module 202 includes:
a second obtaining sub-module, configured to obtain, from the text information after replacement, the first symbol corresponding to the information to be extracted, and to obtain a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, where the characters include single characters and/or words.
Preferably, the extraction apparatus further includes:
a culling module, configured to remove preset useless characters from the obtained characters before the first symbol and the obtained characters after the first symbol, where the preset useless characters include punctuation marks, modal particles, and whitespace.
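The context window and the culling step can be sketched together as follows. This is a minimal sketch assuming the text has already been split into a token list; the `USELESS` set, the default window sizes, and the function names are hypothetical.

```python
# Hypothetical preset useless tokens: punctuation and a few modal particles.
USELESS = {",", ".", "!", "?", ";", ":", "oh", "ah"}


def context_window(tokens, symbol, before=3, after=3):
    """Return up to `before` tokens preceding and `after` tokens following
    the first occurrence of `symbol` in `tokens`."""
    index = tokens.index(symbol)
    left = tokens[max(0, index - before):index]
    right = tokens[index + 1:index + 1 + after]
    return left, right


def cull(tokens):
    """Drop whitespace-only tokens and tokens listed as useless."""
    return [token for token in tokens if token.strip() and token not in USELESS]
```

For example, with `tokens = ["please", "repay", ",", "<MONEY>", "by", " ", "<DATE>", "."]`, a window of two tokens on each side of `<MONEY>` yields `["repay", ","]` and `["by", " "]`, which culling reduces to `["repay"]` and `["by"]`. Culling after the window is taken follows the order given in the text; culling first would widen the effective context, which is a different design.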
Preferably, the obtaining module 202 includes:
a word segmentation sub-module, configured to perform word segmentation on the text information after replacement;
a third obtaining sub-module, configured to obtain, from the segmented text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
In summary, the text information extraction apparatus of the embodiments of the present invention extracts information using the semantic features of the context of the text information, and can intelligently extract content of interest to the user, such as a repayment date or a repayment amount. No keywords need to be specified, so the apparatus is more flexible than traditional template matching and adapts to different writing styles. Because the terminal intelligently understands the text, it can support various applications, such as smart reminders, and the user experience improves for content extraction as well as for subsequent storage and retrieval. This solves the problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template.
It should be noted that this text information extraction apparatus corresponds to the text information extraction method described above; all implementations in the method embodiments apply to the apparatus embodiments and achieve the same technical effects.
Because the text information extraction apparatus of the embodiments of the present invention is applied to a mobile terminal, an embodiment of the present invention further provides a mobile terminal that includes the text information extraction apparatus described in the above embodiments. All implementations of the text information extraction apparatus apply to the mobile terminal embodiment and achieve the same technical effects. The mobile terminal of the present invention may be, for example, a mobile electronic device such as a mobile phone or a tablet computer.
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the storage medium stores execution instructions for performing one or a combination of the steps in the above method embodiments.
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the above processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.
Industrial Applicability
As described above, the text information extraction method, apparatus, and mobile terminal provided by the embodiments of the present invention have the following beneficial effects: information is extracted using the semantic features of the context of the text information, so content of interest to the user can be extracted intelligently; no keywords need to be specified, so the approach is more flexible than traditional template matching and adapts to different writing styles; and because the terminal intelligently understands the text, it can support various applications, which improves the user experience. This solves the problem in the related art that key information is difficult to extract flexibly and accurately with a fixed template.

Claims (11)

  1. A method for extracting text information, comprising:
    identifying, in the text information, information corresponding to one or more preset symbols, and replacing the identified information with the corresponding symbols;
    obtaining, from the text information after replacement, a first symbol corresponding to the information to be extracted and context information of the first symbol;
    determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, extracting from the text information the information that was replaced by the first symbol and outputting it.
  2. The extraction method according to claim 1, wherein the step of determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted comprises:
    obtaining, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol;
    performing a weighted operation on the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
  3. The extraction method according to claim 2, wherein the step of performing a weighted operation on the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted comprises:
    performing weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of a plurality of preset information types, to obtain operation results;
    determining the information type of the first symbol according to the operation results;
    determining whether the information type of the first symbol is consistent with the information type of the information to be extracted; if consistent, determining that the first symbol conforms to the semantics of the information to be extracted; otherwise, determining that the first symbol does not conform to the semantics of the information to be extracted.
  4. The extraction method according to claim 3, wherein the step of performing weighted operations on the first vector information and the second vector information, using the weight coefficients corresponding to each of a plurality of preset information types, comprises:
    pre-processing the first vector information and the second vector information, using a model pre-trained with a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, to obtain a combined vector;
    performing weighted operations on the combined vector using the weight coefficients corresponding to each of the plurality of information types.
  5. The extraction method according to claim 1, wherein the step of identifying, in the text information, information corresponding to one or more preset symbols comprises:
    identifying, by means of regular expressions and/or keyword matching, the information in the text information that corresponds to the one or more preset symbols.
  6. The extraction method according to claim 1, wherein the step of obtaining, from the text information after replacement, a first symbol corresponding to the information to be extracted and context information of the first symbol comprises:
    obtaining, from the text information after replacement, the first symbol corresponding to the information to be extracted, and obtaining a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, wherein the characters include single characters and/or words.
  7. The extraction method according to claim 6, wherein, after obtaining, from the text information after replacement, the first symbol corresponding to the information to be extracted, and obtaining the first preset number of characters and/or words before the first symbol and the second preset number of characters and/or words after the first symbol, the extraction method further comprises:
    removing preset useless characters from the obtained characters before the first symbol and the obtained characters after the first symbol, wherein the preset useless characters include punctuation marks, modal particles, and whitespace.
  8. The extraction method according to claim 1, wherein the step of obtaining, from the text information after replacement, a first symbol corresponding to the information to be extracted and context information of the first symbol comprises:
    performing word segmentation on the text information after replacement;
    obtaining, from the segmented text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol.
  9. An apparatus for extracting text information, comprising:
    a replacement module, configured to identify, in the text information, information corresponding to one or more preset symbols, and to replace the identified information with the corresponding symbols;
    an obtaining module, configured to obtain, from the text information after replacement, a first symbol corresponding to the information to be extracted and context information of the first symbol;
    an extraction module, configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if so, to extract from the text information the information that was replaced by the first symbol and output it.
  10. The extraction apparatus according to claim 9, wherein the extraction module comprises:
    a first obtaining sub-module, configured to obtain, from a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to the context information of the first symbol;
    a first determining sub-module, configured to perform a weighted operation on the first vector information and the second vector information, and to determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
  11. A mobile terminal, comprising the apparatus for extracting text information according to claim 9 or 10.
PCT/CN2017/073944 2016-08-11 2017-02-17 Text information extracting method, device and mobile terminal WO2018028164A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610658626.0A CN107729310A (en) 2016-08-11 2016-08-11 A kind of extracting method of text message, device and mobile terminal
CN201610658626.0 2016-08-11

Publications (1)

Publication Number Publication Date
WO2018028164A1 true WO2018028164A1 (en) 2018-02-15

Family

ID=61162602

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073944 WO2018028164A1 (en) 2016-08-11 2017-02-17 Text information extracting method, device and mobile terminal

Country Status (2)

Country Link
CN (1) CN107729310A (en)
WO (1) WO2018028164A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782888A (en) * 2018-07-27 2020-02-11 国际商业机器公司 Voice tone control system for changing perceptual-cognitive state
CN113609853A (en) * 2021-07-30 2021-11-05 支付宝(杭州)信息技术有限公司 Enterprise subject attribute identification method, device and equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837547A (en) * 2019-10-16 2020-02-25 云知声智能科技股份有限公司 Method and device for understanding multi-intention text in man-machine interaction
CN113345409B (en) * 2021-08-05 2021-11-26 北京世纪好未来教育科技有限公司 Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130113902A1 (en) * 2011-11-04 2013-05-09 Inventec Corporation Reminding method for daily life management
CN103984687A (en) * 2013-02-07 2014-08-13 北京搜狗科技发展有限公司 Reminding creating method and device
CN104378441A (en) * 2014-11-25 2015-02-25 小米科技有限责任公司 Schedule creating method and device
CN105183704A (en) * 2014-06-17 2015-12-23 中兴通讯股份有限公司 Method and device for extracting lunar calendar time from text
CN105447750A (en) * 2015-11-17 2016-03-30 小米科技有限责任公司 Information identification method, apparatus, terminal and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542528B2 (en) * 2012-03-30 2017-01-10 The Florida State University Research Foundation, Inc. Automated extraction of bio-entity relationships from literature
CN104699763B (en) * 2015-02-11 2017-10-17 中国科学院新疆理化技术研究所 The text similarity gauging system of multiple features fusion

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130113902A1 (en) * 2011-11-04 2013-05-09 Inventec Corporation Reminding method for daily life management
CN103984687A (en) * 2013-02-07 2014-08-13 北京搜狗科技发展有限公司 Reminding creating method and device
CN105183704A (en) * 2014-06-17 2015-12-23 中兴通讯股份有限公司 Method and device for extracting lunar calendar time from text
CN104378441A (en) * 2014-11-25 2015-02-25 小米科技有限责任公司 Schedule creating method and device
CN105447750A (en) * 2015-11-17 2016-03-30 小米科技有限责任公司 Information identification method, apparatus, terminal and server

Also Published As

Publication number Publication date
CN107729310A (en) 2018-02-23

Similar Documents

Publication Publication Date Title
US11663411B2 (en) Ontology expansion using entity-association rules and abstract relations
WO2018028077A1 (en) Deep learning based method and device for chinese semantics analysis
WO2020244073A1 (en) Speech-based user classification method and device, computer apparatus, and storage medium
CN110597952A (en) Information processing method, server, and computer storage medium
US8688690B2 (en) Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction
CN110633577B (en) Text desensitization method and device
WO2018028164A1 (en) Text information extracting method, device and mobile terminal
CN106547875B (en) Microblog online emergency detection method based on emotion analysis and label
CN111125354A (en) Text classification method and device
CN108550065B (en) Comment data processing method, device and equipment
CN113837531A (en) Product quality problem finding and risk assessment method based on network comments
CN109918556B (en) Method for identifying depressed mood by integrating social relationship and text features of microblog users
WO2018028065A1 (en) Method and device for classifying short message and computer storage medium
CN110008473B (en) Medical text named entity identification and labeling method based on iteration method
CN112199588A (en) Public opinion text screening method and device
CN109086340A (en) Evaluation object recognition methods based on semantic feature
CN110765889A (en) Legal document feature extraction method, related device and storage medium
CN112132238A (en) Method, device, equipment and readable medium for identifying private data
Biba et al. Sentiment analysis through machine learning: an experimental evaluation for Albanian
CN110781428A (en) Comment display method and device, computer equipment and storage medium
CN116151233A (en) Data labeling and generating method, model training method, device and medium
CN112581297B (en) Information pushing method and device based on artificial intelligence and computer equipment
CN117278675A (en) Outbound method, device, equipment and medium based on intention classification
CN112527963A (en) Multi-label emotion classification method and device based on dictionary, equipment and storage medium
CN115687754A (en) Active network information mining method based on intelligent conversation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17838301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17838301

Country of ref document: EP

Kind code of ref document: A1