WO2018028164A1

WO2018028164A1 - Text information extracting method, device and mobile terminal

Info

Publication number: WO2018028164A1
Application number: PCT/CN2017/073944
Authority: WO
Inventors: 陈军
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-08-11
Filing date: 2017-02-17
Publication date: 2018-02-15
Also published as: CN107729310A

Abstract

A method and a device for extracting text information, and a mobile terminal, relating to the field of information processing technology, which solve the problem that it is difficult to extract key information flexibly and accurately using a fixed template. The method comprises: identifying information in the text information corresponding to one or more preset symbols, and replacing the identified information with the corresponding symbol (101); in the replaced text information, obtaining a first symbol corresponding to information to be extracted and context information of the first symbol (102); and according to the context information of the first symbol, determining whether the first symbol conforms to the semantics of the information to be extracted, and if so, extracting the information replaced by the first symbol from the text information and outputting it (103). Because the method extracts information by combining semantic features of the context of the text information, it can adapt flexibly to different manners of writing and can accurately extract the content of interest to the user.

Description

Method, device and mobile terminal for extracting text information

Technical field

The embodiments of the present invention relate to the field of information processing technologies, and in particular, to a method, an apparatus, and a mobile terminal for extracting text information.

Background technique

At present, SMS and notification messages have become an essential function of mobile terminals. In daily life, a terminal receives various types of short messages and notification messages, such as billing information, booking information, and schedules. As such information accumulates, it becomes inconvenient for the user to retrieve. If the key content of this information can be extracted and combined with other applications on the mobile phone, for example by storing it in accounting software, a calendar, or other applications, querying the information and being reminded of it becomes much more convenient for the user.

For example, for a bank SMS bill, the user generally extracts the repayment date and the repayment amount manually and stores them in a schedule. If the terminal can intelligently extract such useful information and output it to the calendar, the user does not have to spend effort searching through the large number of short messages and notification messages stored on the terminal, and is less likely to forget important schedule items.

Traditionally, key information is mostly extracted by keyword template matching. However, the wording of text messages is very flexible, and the same keyword often has different meanings depending on its context. Therefore, it is difficult to extract key information flexibly and accurately using fixed templates.

Summary of the invention

The technical problem to be solved by the embodiments of the present invention is to provide a method, a device, and a mobile terminal for extracting text information, which solve the problem in the related art that it is difficult to extract key information flexibly and accurately using a fixed template.

To solve the above technical problem, an embodiment of the present invention provides a method for extracting text information, including:

Identifying information in the text information corresponding to one or more preset symbols, and replacing the identified information with the corresponding symbol;

Obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;

Determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extracting information replaced by the first symbol from the text information and outputting the information.

Optionally, the step of determining, according to the context information of the first symbol, whether the first symbol meets the semantics of the information to be extracted includes:

Obtaining, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;

Performing a weighting operation according to the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.

Optionally, the performing the weighting operation according to the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol meets the semantics of the information to be extracted includes:

Performing weighting operations on the first vector information and the second vector information by using weight coefficients corresponding to a plurality of preset information types, to obtain an operation result;

Determining an information type of the first symbol according to the operation result;

Determining whether the information type of the first symbol is consistent with the information type of the information to be extracted, and if yes, determining that the first symbol conforms to the semantics of the information to be extracted, otherwise determining that the first symbol does not conform to the semantics of the information to be extracted.

Optionally, the step of performing a weighting operation by using the weight coefficients corresponding to the preset multiple information types according to the first vector information and the second vector information includes:

Using a model pre-trained with a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, pre-processing the first vector information and the second vector information to obtain a combined vector;

Performing a weighting operation on the combined vector by using the weight coefficients corresponding to the plurality of information types.

Optionally, the step of identifying information corresponding to the preset one or more symbols in the text information includes:

The information corresponding to the preset one or more symbols in the text information is identified by means of regular expressions and/or keyword matching.

Optionally, in the text information that is replaced, the step of acquiring the first symbol corresponding to the information to be extracted and the context information of the first symbol includes:

Obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted, and acquiring a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.

Optionally, after acquiring, in the replaced text information, the first symbol corresponding to the information to be extracted, and acquiring the first preset number of characters before the first symbol and/or the second preset number of characters after the first symbol, the extraction method further includes:

Culling the preset useless characters included in the acquired characters before the first symbol and after the first symbol, the preset useless characters including punctuation marks, modal particles and blank symbols.

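Optionally, the step of acquiring, in the replaced text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol includes:

Performing word segmentation on the replaced text information;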

In the text information after the word segmentation process, the first symbol corresponding to the information to be extracted and the context information of the first symbol are acquired.

In order to solve the above technical problem, an embodiment of the present invention further provides a text information extracting apparatus, including:

The replacement module is configured to identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol;

The obtaining module is configured to acquire, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;

The extracting module is configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extract the information replaced by the first symbol from the text information and output it.

Optionally, the extraction module includes:

a first acquiring sub-module, configured to acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;

The first determining sub-module is configured to perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.

In order to solve the above technical problem, an embodiment of the present invention further provides a mobile terminal, comprising: the text information extracting apparatus according to any one of the preceding claims.

Another embodiment of the present invention provides a computer storage medium, where the computer storage medium stores execution instructions for performing one or a combination of the steps in the foregoing method embodiments.

The beneficial effects of the above technical solutions in the embodiments of the present invention are as follows:

The method for extracting text information in the embodiment of the present invention first identifies information corresponding to the preset one or more symbols in the text information, and replaces the identified information with the corresponding symbol; then, in the replaced text information, it obtains the first symbol corresponding to the information to be extracted and the context information of the first symbol; finally, it determines, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extracts the information replaced by the first symbol from the text information and outputs it. In this way, the semantic features of the context of the text information are used to extract the information, and the content of interest to the user can be intelligently extracted. Because no keyword needs to be specified, the method has greater flexibility than the traditional template matching method and can adapt to different writing styles. It enables the terminal to carry out various applications based on intelligent understanding of the text language and enhances the user experience. The problem in the related art that it is difficult to extract key information flexibly and accurately using a fixed template is solved.

Description of the drawings

FIG. 1 is a flowchart of a method for extracting text information according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an apparatus for extracting text information according to an embodiment of the present invention.

Detailed description

The technical problems, the technical solutions, and the advantages of the present invention will be more clearly described in the following description.

As shown in FIG. 1, a method for extracting text information according to an embodiment of the present invention includes:

Step 101: Identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol.

Here, the information corresponding to a preset symbol in the text information is identified and then replaced with that symbol, so that all information of the type represented by the symbol can be processed uniformly. The text information includes short messages and notification messages received by the terminal, and the like.

Symbols corresponding to certain special types of characters and/or words may be preset. For example, e-mail addresses, URLs, dates, times, percentages, quantifiers, currency amounts, phone numbers, numbers, foreign words, etc. contained in the text message string can be replaced with special symbols.

Optionally, custom vocabulary can also be replaced with special symbols, such as the terminology of a professional application field, idioms, food, places, equipment, person names, place names, organization names, etc.

For example, assume that the preset symbols include "DATE" corresponding to a date, "CURRENCY" corresponding to a currency amount, "BANK" corresponding to a bank, and "TIME" corresponding to a time. A received text message "Your personal credit card November bill RMB 481.93, due repayment date November 23rd. [China Merchants Bank]" becomes, after identification and replacement, "Your personal credit card DATE bill CURRENCY, due repayment date DATE. [BANK]". Another received SMS, "Respected customer, your personal loan at ICBC must be repaid before 17:00 on May 14, 2014; the repayment amount of principal and interest totals RMB 940.18. [ICBC]", becomes, after identification and replacement, "Respected customer, your personal loan at BANK must be repaid before TIME on DATE; the repayment amount of principal and interest totals CURRENCY. [BANK]".
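
The replacement in step 101 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patterns, the `BANK_KEYWORDS` list, and the function name `replace_with_symbols` are hypothetical and simplified for English-style messages, not the symbol table used by the embodiment.

```python
import re

# Hypothetical, simplified patterns; a real symbol table would be larger
# and locale-specific.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
BANK_KEYWORDS = ["China Merchants Bank", "ICBC"]  # assumed keyword list

PATTERNS = {
    "CURRENCY": r"RMB\s?\d+(?:\.\d+)?",
    "TIME": r"\b\d{1,2}:\d{2}\b",
    "DATE": rf"\b(?:{MONTHS})\b(?:\s\d{{1,2}}(?:st|nd|rd|th)?)?(?:,\s\d{{4}})?",
    # Longest keyword first so a longer name wins over a shorter prefix.
    "BANK": "|".join(re.escape(k)
                     for k in sorted(BANK_KEYWORDS, key=len, reverse=True)),
}
# One named group per symbol, so a single pass reports which symbol matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS.items()))


def replace_with_symbols(text):
    """Return the replaced text and the (symbol, original) pairs in order."""
    replaced = []

    def _sub(match):
        symbol = match.lastgroup          # name of the group that matched
        replaced.append((symbol, match.group()))
        return symbol

    return COMBINED.sub(_sub, text), replaced


message = ("Your personal credit card November bill RMB 481.93, "
           "due repayment date November 23rd. [China Merchants Bank]")
replaced_text, replaced_items = replace_with_symbols(message)
print(replaced_text)
# Your personal credit card DATE bill CURRENCY, due repayment date DATE. [BANK]
```

Keeping the `(symbol, original)` pairs alongside the replaced text is a design choice of this sketch: it lets a later step recover exactly what each symbol stood for.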

Step 102: Acquire, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol.

Here, the first symbol and the context information of the first symbol need to be acquired in the text information to determine, by subsequent steps, whether the semantics of the first symbol in the text information conform to the semantics of the information to be extracted.

Assuming that the information to be extracted is the repayment date, the symbol "DATE" corresponding to the repayment date and the context information of "DATE" need to be obtained in the replaced text information.

Step 103: Determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extract the information replaced by the first symbol from the text information and output it.

Here, a plurality of first symbols may be acquired in the text information, and the semantics of each of the first symbols may be different in the text information. Therefore, it is required to combine the context information of the first symbol to determine whether the first symbol conforms to the semantics of the information to be extracted. If it is met, the information replaced by the first symbol is the information to be extracted, and the information replaced by the first symbol is extracted from the text information and output.

Still taking the above-mentioned text message "Your personal credit card November bill RMB 4818.93, due repayment date November 23rd. [China Merchants Bank]" as an example, after identification and replacement this text message becomes "Your personal credit card DATE bill CURRENCY, due repayment date DATE. [BANK]". Assume that the information to be extracted is the repayment date, and that the symbol corresponding to the repayment date is "DATE". Two "DATE" symbols can then be obtained from the replaced text message, representing the billing date and the repayment date respectively. Therefore, it is necessary to combine the context information of each "DATE" to determine whether it conforms to the semantics of the repayment date. If the second "DATE" is judged to conform to the semantics of the repayment date, the information replaced by the second "DATE" (November 23rd) is extracted and output, thereby extracting the repayment date from the short message.
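
The control flow of steps 102 and 103 for this example might look like the sketch below. The function names and the tokenizer are assumptions made for illustration, and `toy_conforms` is only a placeholder rule so that the sketch runs; in the described method the decision comes from the weighting operation on vector information, not from a keyword test. The sketch also assumes the original message does not itself contain a literal symbol name such as "DATE".

```python
import re


def extract(replaced_text, replaced_items, target_symbol, conforms):
    """Return the originals behind every target symbol that `conforms` accepts.

    replaced_items: (symbol, original) pairs in order of appearance.
    conforms(tokens, index) -> bool stands in for the semantic judgment.
    """
    tokens = re.findall(r"\w+|[^\w\s]", replaced_text)
    symbols = {symbol for symbol, _ in replaced_items}
    originals = iter(replaced_items)
    results = []
    for i, token in enumerate(tokens):
        if token not in symbols:
            continue
        _, original = next(originals)     # pair belonging to this occurrence
        if token == target_symbol and conforms(tokens, i):
            results.append(original)
    return results


def toy_conforms(tokens, index):
    # Placeholder only: looks for "repayment" in the three preceding tokens.
    return "repayment" in tokens[max(0, index - 3):index]


replaced_text = ("Your personal credit card DATE bill CURRENCY, "
                 "due repayment date DATE. [BANK]")
replaced_items = [("DATE", "November"), ("CURRENCY", "RMB 4818.93"),
                  ("DATE", "November 23rd"), ("BANK", "China Merchants Bank")]
repayment_dates = extract(replaced_text, replaced_items, "DATE", toy_conforms)
print(repayment_dates)  # ['November 23rd']
```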

The outputted information can be output to some applications of the terminal, such as outputting the repayment date to the calendar application, so as to implement functions such as date reminding.

The method for extracting text information according to the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and the repayment amount. Because no keyword needs to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles. It enables the terminal to carry out various applications on the basis of intelligent understanding of the text language, thereby improving the user experience. The problem in the related art that it is difficult to extract key information flexibly and accurately using a fixed template is solved.

Preferably, in the foregoing step 103, the step of determining, according to the context information of the first symbol, whether the first symbol meets the semantics of the information to be extracted may include:

Step 1031: Acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol.

Here, the first vector information corresponding to the first symbol and the second vector information corresponding to the context information of the first symbol may be acquired in the pre-trained vector database to perform weighting calculation by the subsequent steps.

The vector database may include a vector value corresponding to each symbol and a vector value corresponding to each character and/or word that may be used in the context. When the second vector information corresponding to the context information of the first symbol is obtained, the vector value corresponding to each character and/or word included in the context information may be looked up to obtain a vector sequence. To ensure the accuracy of the calculation, the order of the vectors in the vector sequence should be consistent with the order of the context in the text information.
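
A minimal sketch of the lookup in step 1031 follows, assuming the vector database can be modeled as an in-memory dictionary. The random vectors, the `<UNK>` fallback entry, and the function names are illustrative assumptions; a real database would hold pre-trained values.

```python
import random


def build_vector_db(vocabulary, dim=4, seed=0):
    """Stand-in for a pre-trained vector database (random values here)."""
    rng = random.Random(seed)
    return {token: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for token in vocabulary}


def lookup(db, first_symbol, context_tokens, unknown="<UNK>"):
    """Return the first vector and the context vector sequence.

    The sequence keeps the order in which the context appears in the text.
    """
    first_vector = db[first_symbol]
    context_sequence = [db.get(token, db[unknown]) for token in context_tokens]
    return first_vector, context_sequence


db = build_vector_db(["DATE", "BANK", "due", "repayment", "date", "<UNK>"])
context = ["due", "repayment", "date", "tomorrow", "BANK"]
first_vector, context_sequence = lookup(db, "DATE", context)
```

Tokens missing from the database ("tomorrow" above) fall back to a shared `<UNK>` vector; that fallback is a choice of this sketch, since the text does not say how unseen tokens are handled.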

Step 1032: Perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.

Here, the weighting operation is performed according to the acquired vector information, and according to the operation result, it is determined whether the first symbol conforms to the semantics of the information to be extracted (such as the repayment date).

At this time, the weighting operation based on the vector information can accurately determine the semantics of the first symbol, thereby achieving the purpose of accurately extracting key information.

Optionally, the foregoing step 1032 may include:

Step 10321: Perform weighting operations on the first vector information and the second vector information by using weight coefficients corresponding to preset multiple information types to obtain an operation result.

Here, it is assumed that three information types are set in advance: the repayment date, the repayment amount, and others. The vector information obtained from the first symbol and its context is weighted separately with the weight coefficients corresponding to each of the three information types, and three probability values are calculated.

Step 10322: Determine, according to the operation result, an information type of the first symbol.

Here, the information type of the first symbol is determined by the calculated probability value of each type of information. The information type with the largest probability value can be selected as the information type of the first symbol.

Step 10323: Determine whether the information type of the first symbol is consistent with the information type of the information to be extracted, and if yes, determine that the first symbol conforms to the semantics of the information to be extracted, otherwise, determine that the first symbol does not conform to the semantics of the information to be extracted.

Here, if the information type of the first symbol is consistent with the information type of the information to be extracted, it may be determined that the first symbol conforms to the semantics of the information to be extracted, otherwise, it may be determined that the first symbol does not conform to the semantics of the information to be extracted.

Wherein, if three types of information are set in advance: repayment date, repayment amount, and others, the information type of the information to be extracted may be a repayment date and a repayment amount, that is, it is possible to simultaneously extract a plurality of information to be extracted.

At this time, the weighting operation is performed by the weight coefficient corresponding to the preset information type, and the semantics of the first symbol can be accurately determined, thereby achieving the purpose of accurately extracting key information.
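
Steps 10321 to 10323 can be illustrated with the sketch below. It assumes that the weighting operation is a linear score per information type followed by a softmax to obtain probability values; the text only says that probability values are calculated, so the softmax, the hand-set weights, and all names are assumptions.

```python
import math

INFO_TYPES = ["repayment_date", "repayment_amount", "other"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Step 10321/10322: weight the features per type, pick the most probable."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return INFO_TYPES[best], probabilities


def conforms(features, weights, biases, wanted_types):
    """Step 10323: compare the predicted type with the types to be extracted."""
    info_type, _ = classify(features, weights, biases)
    return info_type in wanted_types


# Hand-set illustrative weights: one row per information type.
WEIGHTS = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
BIASES = [0.0, 0.0, 0.0]
info_type, probabilities = classify([1.0, 0.0], WEIGHTS, BIASES)
print(info_type)  # repayment_date
```

Passing a set such as `{"repayment_date", "repayment_amount"}` as `wanted_types` corresponds to extracting several kinds of information at once.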

Preferably, the above step 10321 may include:

Step 103211: Using a model pre-trained with a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, pre-process the first vector information and the second vector information to obtain a combined vector.

Step 103212: Perform weighting operations on the combined vector by using the weight coefficients corresponding to the multiple information types.

At this time, the model pre-trained with the BiLSTM neural network or the convolutional neural network first pre-processes the first vector information and the second vector information to obtain a combined vector of the first symbol and its context; the combined vector is then weighted separately with the weight coefficients corresponding to the multiple information types. In this way the semantics of the first symbol can be accurately determined, so that key information is accurately extracted.
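
As a rough illustration of how a sequence of vectors can be reduced to one combined vector, the sketch below takes the convolutional alternative and applies a 1-D convolution, a ReLU, and max-over-time pooling. This particular layer arrangement, the hand-set filters (standing in for trained ones), and the function name are assumptions; the BiLSTM alternative is not shown.

```python
def conv_max_pool(sequence, filters, width=3):
    """Reduce a vector sequence to one combined vector.

    sequence: vectors in text order, e.g. the context-before vectors, the
              first-symbol vector, then the context-after vectors.
    filters:  each filter is a flat list of width * dim weights.
    Returns one value per filter (ReLU followed by max over positions).
    """
    dim = len(sequence[0])
    # Zero-pad both ends so even a very short sequence yields a window.
    pad = [[0.0] * dim for _ in range(width - 1)]
    padded = pad + list(sequence) + pad
    combined = []
    for weights in filters:
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window))
            best = max(best, activation)
        combined.append(best)
    return combined


# Tiny worked example: 1-dimensional vectors, two filters of width 2.
combined = conv_max_pool([[1.0], [2.0], [3.0]],
                         [[1.0, 1.0], [-1.0, -1.0]], width=2)
print(combined)  # [5.0, 0.0]
```

The combined vector has a fixed length (one entry per filter) regardless of the context size, which is what allows a single set of per-type weight coefficients to be applied to it afterwards.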

Preferably, in the step 101, the step of identifying the information corresponding to the preset one or more symbols in the text information may include:

Step 1011: Identify the information corresponding to the preset one or more symbols in the text information by using regular expressions and/or keyword matching.

At this time, the regular expression and/or keyword matching method can accurately identify the information corresponding to the preset symbol in the text information.

Preferably, the foregoing step 102 may include:

Step 1021: In the replaced text information, acquire a first symbol corresponding to the information to be extracted, and acquire a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.

Here, for the simplicity of the operation, a symmetrical context form can be employed. If the first preset number and the second preset number are both set to 5, it is necessary to acquire 5 characters before and after the first symbol.

In addition, because Chinese sentence structure is very flexible, the preceding text is generally more important than the following text for identifying the current symbol. Therefore, an asymmetric context can also be used. If the first preset number is set to 7 and the second preset number is set to 5, it is necessary to acquire 7 characters before the first symbol and 5 characters after the first symbol.

At this time, the number of characters of the context can be defined as needed to better distinguish the semantics of the first symbol in combination with the context.

Determining the number of characters in the context is equivalent to determining the size of the context window of the current symbol; the semantics of the current symbol are subsequently determined from the characters in the context window. Assume that the first preset number and the second preset number are both set to 5. For the DATE in "到期还款日DATE。[BANK]" ("due repayment date DATE. [BANK]"), if DATE is the current symbol whose semantics are to be determined, the context window contains the characters "到", "期", "还", "款", "日", "。", "[", "BANK" and "]".
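
The context window can be sketched as a pair of list slices. The function name is hypothetical, and the token list assumes that the Chinese phrase behind the example is "到期还款日" and that each preset symbol counts as a single token.

```python
def context_window(tokens, index, n_before=5, n_after=5):
    """Return up to n_before tokens before and n_after tokens after index."""
    before = tokens[max(0, index - n_before):index]
    after = tokens[index + 1:index + 1 + n_after]
    return before, after


tokens = ["到", "期", "还", "款", "日", "DATE", "。", "[", "BANK", "]"]
position = tokens.index("DATE")

# Symmetric window: 5 before, 5 after (only 4 tokens remain after DATE).
before, after = context_window(tokens, position)

# Asymmetric window: 7 before, 5 after; fewer are returned near the edges.
before_asym, after_asym = context_window(tokens, position, 7, 5)
```

Near the start or end of a message the window is simply shorter than requested; whether to pad it to a fixed length is left open here.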

Optionally, after the step 1021, the extracting method may further include:

Step 1022: Cull the preset useless characters included in the acquired characters before the first symbol and after the first symbol, the preset useless characters including punctuation marks, modal particles, and blank symbols.

At this time, by eliminating the characters with little semantic discrimination, some unnecessary calculations are avoided, and the processing efficiency is improved. Further, the preset useless characters may also include some special symbols and the like.
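
One possible way to cull such characters is sketched below. The list of modal particles is a small illustrative sample, and treating every Unicode punctuation category as useless is an assumption of this sketch.

```python
import unicodedata

# Illustrative sample of Chinese modal particles (assumed, not exhaustive).
MODAL_PARTICLES = {"啊", "吧", "呢", "吗", "嘛"}


def is_useless(token):
    """True for modal particles, blank tokens, and pure-punctuation tokens."""
    if token in MODAL_PARTICLES or token.isspace():
        return True
    # Unicode general categories starting with "P" are punctuation.
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def cull(tokens):
    """Drop useless tokens while keeping the remaining order."""
    return [token for token in tokens if not is_useless(token)]


window = ["到", "期", "还", "款", "日", "。", "[", "BANK", "]"]
kept = cull(window)
print(kept)  # ['到', '期', '还', '款', '日', 'BANK']
```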

A single character often cannot accurately express a specific meaning, whereas a word composed of several characters can. For example, the meanings of the characters "公" ("public") and "司" ("division") are completely different from that of the word "公司" ("company"). To make the semantic judgment more convenient, the above step 102 may include:

Step 1023: Perform word segmentation on the replaced text information.

Step 1024: Acquire, in the text information after the word segmentation, the first symbol corresponding to the information to be extracted and the context information of the first symbol.

At this time, the word segmentation technique can be used to first perform word segmentation on the content of the text information, that is, the common words are separated, thereby facilitating the semantic judgment.

After word segmentation, the word vector corresponding to each word can be read directly, without reading the corresponding character vectors. In addition, when the training sample is large enough, the above word segmentation process can be omitted, because with sufficient samples the model of the weighting operation can express the semantics of the various combinations of different characters.
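
The text does not say which word segmentation technique is used. Purely as an illustration, the sketch below uses forward maximum matching against a small lexicon, a simple dictionary-based technique swapped in here; the lexicon contents and the function name are assumptions.

```python
def segment(tokens, lexicon, max_len=4):
    """Greedy forward maximum matching over single characters and symbols.

    At each position the longest run of tokens found in the lexicon is
    merged into one word; otherwise the token is kept as it is.
    """
    words = []
    i = 0
    while i < len(tokens):
        for size in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + size])
            if candidate in lexicon:
                words.append(candidate)
                i += size
                break
        else:
            # No multi-token word starts here: keep the single token.
            words.append(tokens[i])
            i += 1
    return words


lexicon = {"到期", "还款", "公司"}  # assumed sample lexicon
words = segment(list("到期还款日") + ["DATE"], lexicon)
print(words)  # ['到期', '还款', '日', 'DATE']
```

After segmentation, the context window and the vector lookup operate on words such as "还款" instead of on the separate characters "还" and "款".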

In summary, the method for extracting text information according to the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and the repayment amount. Compared with the traditional template matching method, it has greater flexibility and can adapt to different writing styles. It enables the terminal to carry out various applications on the basis of intelligent understanding of the text language, facilitating smart reminders and other functions as well as subsequent storage, retrieval and other applications, thereby improving the user experience. The problem in the related art that it is difficult to extract key information flexibly and accurately using a fixed template is solved.

As shown in FIG. 2, an embodiment of the present invention further provides an apparatus for extracting text information, including:

The replacement module 201 is configured to identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol;

The obtaining module 202 is configured to: obtain, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;

The extracting module 203 is configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extract the information replaced by the first symbol from the text information and output it.

The text information extracting apparatus of the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as the repayment date and the repayment amount. Because no keyword needs to be specified, it has greater flexibility than the traditional template matching method and can adapt to different writing styles. It enables the terminal to carry out various applications on the basis of intelligent understanding of the text language, thereby improving the user experience. The problem in the related art that it is difficult to extract key information flexibly and accurately using a fixed template is solved.

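Preferably, the extraction module 203 includes:

a first acquiring sub-module, configured to acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;

a first determining sub-module, configured to perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.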

Preferably, the first determining submodule comprises:

The first weighting operation unit is configured to perform weighting operations on the first vector information and the second vector information by using weight coefficients corresponding to the preset plurality of information types to obtain an operation result;

a first determining unit, configured to determine, according to the operation result, an information type of the first symbol;

a second determining unit, configured to determine whether the information type of the first symbol is consistent with the information type of the information to be extracted, and if yes, determine that the first symbol conforms to the semantics of the information to be extracted, otherwise, determine that the first symbol does not conform to the semantics of the information to be extracted.

Preferably, the first weighting operation unit includes:

a pre-processing sub-unit, configured to pre-process the first vector information and the second vector information, using a model pre-trained with a bidirectional long short-term memory (BiLSTM) neural network or a convolutional neural network, to obtain a combined vector;

The first weighting operation subunit is configured to perform weighting operations on the combined vector by using the weight coefficients corresponding to the plurality of information types.

Preferably, the replacement module 201 includes:

The identification sub-module is configured to identify information corresponding to the preset one or more symbols in the text information by using a regular expression and/or a keyword matching manner.

Preferably, the obtaining module 202 includes:

a second obtaining sub-module, configured to acquire, in the replaced text information, a first symbol corresponding to the information to be extracted, and acquire a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.

Preferably, the extracting device further includes:

The culling module is configured to cull the preset useless characters included in the acquired characters before the first symbol and after the first symbol, the preset useless characters including punctuation marks, modal particles and blank symbols.

Preferably, the obtaining module 202 includes:

a word segmentation sub-module, configured to perform word segmentation on the replaced text information;

The third obtaining sub-module is configured to acquire, in the text information after the word segmentation processing, the first symbol corresponding to the information to be extracted and the context information of the first symbol.

In summary, the text information extracting apparatus of the embodiment of the present invention combines the semantic features of the context of the text information to extract information, and can intelligently extract content of interest such as repayment date and repayment amount; Compared with the traditional template matching method, it has greater flexibility and can adapt to different writing styles; enables the terminal to carry out various applications on the basis of intelligent understanding of the text language, facilitating the realization of smart reminders and other functions; Subsequent storage, retrieval The application experience has improved the user experience. The problem that the fixed template is difficult to extract key information flexibly and accurately in the related art is solved.

It should be noted that the apparatus for extracting the text information corresponds to the foregoing method for extracting the text information; all the implementation manners in the foregoing method embodiments are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.

The text information extracting apparatus of the embodiment of the present invention is applied to a mobile terminal. Therefore, the embodiment of the present invention further provides a mobile terminal, including: the text information extracting apparatus as described in the foregoing embodiment. The implementation examples of the foregoing text information extracting apparatus are applicable to the embodiment of the mobile terminal, and the same technical effects can be achieved. The mobile terminal of the present invention may be a mobile electronic device such as a mobile phone or a tablet computer.

Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the foregoing storage medium stores an execution instruction, where the execution instruction is used to perform one or a combination of the steps in the foregoing method embodiments.

Optionally, in this embodiment, the foregoing storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a hard disk, a magnetic disk, or an optical disc.

In the various embodiments of the present invention, it should be understood that the sequence numbers of the above processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements should also be considered to fall within the scope of protection of the present invention.

Industrial applicability

As described above, the method, apparatus, and mobile terminal for extracting text information provided by the embodiments of the present invention have the following beneficial effects: information is extracted by combining the semantic features of the context of the text information, so that content of interest to the user can be extracted intelligently; keywords need not be specified, and compared with the traditional template matching method the approach is more flexible and can adapt to different styles of writing; the terminal can develop a variety of applications on the basis of intelligent understanding of the text language, enhancing the user experience. This solves the problem in the related art that a fixed template makes it difficult to extract key information flexibly and accurately.

Claims

A method for extracting text information, comprising:

Identifying information corresponding to the preset one or more symbols in the text information, and replacing the identified information with the corresponding symbol;

Obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;

Determining, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extracting information replaced by the first symbol from the text information and outputting the information.
The extraction method according to claim 1, wherein the step of determining whether the first symbol conforms to the semantics of the information to be extracted according to the context information of the first symbol comprises:

Obtaining, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;

Performing a weighting operation according to the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
The extraction method according to claim 2, wherein the step of performing a weighting operation according to the first vector information and the second vector information, and determining, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted comprises:

Performing weighting operations on the first vector information and the second vector information by using weight coefficients respectively corresponding to a plurality of preset information types, to obtain an operation result;

Determining an information type of the first symbol according to the operation result;

Determining whether the information type of the first symbol is consistent with the information type of the information to be extracted, and if yes, determining that the first symbol conforms to the semantics of the information to be extracted, otherwise determining that the first symbol does not conform to the semantics of the information to be extracted.
The extraction method according to claim 3, wherein the step of performing weighting operations on the first vector information and the second vector information by using weight coefficients respectively corresponding to a plurality of preset information types comprises:

Using a pre-trained model of a bidirectional long short-term memory (LSTM) neural network or of a convolutional neural network, preprocessing the first vector information and the second vector information to obtain a combined vector;

Performing weighting operations on the combined vector by using the weight coefficients respectively corresponding to the plurality of information types.
The extraction method according to claim 1, wherein the step of identifying information corresponding to the preset one or more symbols in the text information comprises:

The information corresponding to the preset one or more symbols in the text information is identified by means of regular expressions and/or keyword matching.
The extraction method according to claim 1, wherein the step of acquiring the first symbol corresponding to the information to be extracted and the context information of the first symbol in the replaced text information comprises:

Obtaining, in the replaced text information, a first symbol corresponding to the information to be extracted, and acquiring a first preset number of characters before the first symbol and/or a second preset number of characters after the first symbol, the characters including single characters and/or words.
The extraction method according to claim 6, wherein after the step of acquiring, in the replaced text information, the first symbol corresponding to the information to be extracted, and acquiring the first preset number of characters before the first symbol and/or the second preset number of characters after the first symbol, the extraction method further comprises:

Removing preset useless characters from the acquired characters before the first symbol and the acquired characters after the first symbol, the preset useless characters including punctuation marks, modal particles, and blank symbols.
The extraction method according to claim 1, wherein the step of acquiring, in the replaced text information, the first symbol corresponding to the information to be extracted and the context information of the first symbol comprises:

Performing word segmentation on the replaced text information;

In the text information after the word segmentation process, the first symbol corresponding to the information to be extracted and the context information of the first symbol are acquired.
A text information extraction device, comprising:

a replacement module, configured to identify information corresponding to the preset one or more symbols in the text information, and replace the identified information with the corresponding symbol;

an obtaining module, configured to acquire, in the replaced text information, a first symbol corresponding to the information to be extracted and context information of the first symbol;

an extracting module, configured to determine, according to the context information of the first symbol, whether the first symbol conforms to the semantics of the information to be extracted, and if yes, extract the information replaced by the first symbol from the text information and output the information.
The extraction device according to claim 9, wherein the extraction module comprises:

a first acquiring sub-module, configured to acquire, in a preset vector database, first vector information corresponding to the first symbol and second vector information corresponding to context information of the first symbol;

a first determining sub-module, configured to perform a weighting operation according to the first vector information and the second vector information, and determine, according to the operation result, whether the first symbol conforms to the semantics of the information to be extracted.
A mobile terminal comprising: the text information extracting apparatus according to any one of claims 9-10.