CN113111168A - Alarm receiving and processing text household registration information extraction method and device based on deep learning model - Google Patents
Alarm receiving and processing text household registration information extraction method and device based on deep learning model
- Publication number
- CN113111168A (application CN202010306789.9A)
- Authority
- CN
- China
- Prior art keywords
- household
- information
- word segmentation
- text
- sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/387—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
The embodiments of the present disclosure disclose a method and a device for extracting household registration information from alarm receiving and processing text based on a deep learning model. One embodiment of the method comprises: acquiring an alarm receiving and processing text from which household registration information is to be extracted; performing word segmentation on the text to obtain a corresponding word segmentation sequence; for each segmented word in the resulting sequence, performing the following household registration information classification operation: inputting the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the classification model is obtained by pre-training based on a deep learning model; and determining the segmented words in the sequence whose classification results indicate household registration information as the household registration information set corresponding to the text. The embodiment realizes automatic extraction of household registration information from alarm receiving and processing text.
Description
Technical Field
The embodiment of the disclosure relates to the technical field of computers, in particular to an alarm receiving and processing text household registration information extraction method and device based on a deep learning model.
Background
Currently, a call-taker handling 110 emergency calls in a public security organization enters an alarm receiving text when receiving an alarm, and the officer who handles the alarm enters an alarm processing text after handling is finished. The alarm receiving and processing text comprises the alarm receiving text and the alarm processing text. In practice, a large number of alarm receiving and processing texts contain descriptions of the household registration information of the persons involved. Household registration information may include a household registration identifier and a corresponding household registration address. For example, "current household registration Unit 1, No. 202, District C, City B, Province A" is a piece of household registration information, wherein "current household registration" is the household registration identifier, indicating that the content that follows is the corresponding address, and "Unit 1, No. 202, District C, City B, Province A" is the household registration address corresponding to that identifier. For another example, "ancestral home Unit 2, No. 301, Community C, City B, Province A" is also a piece of household registration information, wherein "ancestral home" is the household registration identifier, indicating that the content that follows is the corresponding address, and "Unit 2, No. 301, Community C, City B, Province A" is the household registration address corresponding to that identifier.
For case analysts, extracting the household registration information from alarm receiving and processing texts is very important. For example, case analysts can statistically analyze the household registration information extracted from a large number of historical alarm receiving and processing texts to obtain information about offenders of a specified type whose household registration is in a certain province or city, which in turn provides a data basis for future management of persons registered in that province or city. At present, this information is extracted manually, which is costly in labor and depends on personal experience.
Disclosure of Invention
The embodiment of the disclosure provides an alarm receiving and processing text household registration information extraction method and device based on a deep learning model.
In a first aspect, an embodiment of the present disclosure provides a method for extracting household registration information from alarm receiving and processing text based on a deep learning model, where the method includes: acquiring an alarm receiving and processing text from which household registration information is to be extracted; performing word segmentation on the text to obtain a corresponding word segmentation sequence; for each segmented word in the resulting sequence, performing the following household registration information classification operation: inputting the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the classification model is obtained by pre-training based on a deep learning model; and determining the segmented words in the sequence whose classification results indicate household registration information as the household registration information set corresponding to the text.
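The flow of the first aspect can be sketched in a few lines of Python. This is only an illustrative outline: the function names, the whitespace segmenter, the tiny dictionary, and the rule-based stand-in for the classification model are all assumptions made for the example and are not part of the disclosure.

```python
from typing import Callable, List, Sequence


def extract_household_info(
    text: str,
    segment: Callable[[str], List[str]],
    vectorize: Callable[[str], Sequence[float]],
    classify: Callable[[Sequence[float]], bool],
) -> List[str]:
    """Segment the text, classify each segmented word, keep the positive ones."""
    tokens = segment(text)
    return [tok for tok in tokens if classify(vectorize(tok))]


# Toy stand-ins (hypothetical), used only to make the sketch runnable.
VOCAB = ["ProvinceA", "CityB", "Zhang", "San"]
PLACE_INDEXES = {0, 1}  # pretend the trained model fires on these entries


def toy_segment(text: str) -> List[str]:
    return text.split()


def toy_vectorize(token: str) -> List[float]:
    return [1.0 if word == token else 0.0 for word in VOCAB]


def toy_classify(vec: Sequence[float]) -> bool:
    return any(vec[i] > 0 for i in PLACE_INDEXES)


result = extract_household_info(
    "Zhang San ProvinceA CityB", toy_segment, toy_vectorize, toy_classify
)
```

Passing the three stages in as callables keeps the sketch independent of any particular segmenter or model; in a real system the classifier would be the pre-trained model described in the text.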
In some embodiments, the household registration information classification model is obtained by training in advance through the following training steps: acquiring a training sample set, wherein each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and each piece of labeling information indicates whether the corresponding segmented word in the word segmentation sequence is household registration information; determining, as a positive sample set, the training samples in the training sample set whose word segmentation sequences include household registration information words, wherein a household registration information word is a segmented word whose corresponding labeling information indicates that it is household registration information; determining a text feature vector of each positive sample according to the household registration information words included in the word segmentation sequence of that positive sample; and training an initial deep learning model by taking the text feature vector of each positive sample in the positive sample set as input and taking the classification result indicating household registration information as the corresponding expected output, to obtain the household registration information classification model.
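The selection of positive samples described above might look as follows. The sample layout (a pair of a token list and a 0/1 label list) and the helper names are assumptions chosen for the illustration.

```python
from typing import List, Tuple

# A training sample: (word segmentation sequence, labeling information sequence).
# Label 1 marks a household registration information word, 0 marks any other word.
Sample = Tuple[List[str], List[int]]


def select_positive_samples(samples: List[Sample]) -> List[Sample]:
    """Keep the samples whose sequence contains at least one labeled word."""
    return [(tokens, labels) for tokens, labels in samples if any(labels)]


def household_words(sample: Sample) -> List[str]:
    """Return the words of one sample that are labeled as household information."""
    tokens, labels = sample
    return [tok for tok, lab in zip(tokens, labels) if lab == 1]


# Hypothetical toy data.
samples = [
    (["Zhang", "ancestral", "ProvinceA"], [0, 0, 1]),
    (["lost", "wallet"], [0, 0]),
]
positives = select_positive_samples(samples)
words = household_words(positives[0])
```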
In some embodiments, the training steps further comprise: inputting preset negative sample feature vectors into the household registration information classification model to obtain corresponding actual output results; and adjusting the model parameters of the household registration information classification model according to the difference between each obtained actual output result and the classification result indicating that the input is not household registration information.
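As a rough illustration of adjusting parameters from the difference between the actual output and the "not household registration information" result, the sketch below applies one gradient-descent step to a single logistic unit. The logistic unit, the cross-entropy loss, and the learning rate are assumptions standing in for the deep learning model, whose architecture and optimizer are not fixed by this passage.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that the input is household registration information."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def negative_update(
    weights: List[float], bias: float, x: List[float], lr: float = 0.5
) -> Tuple[List[float], float]:
    """One gradient-descent step on cross-entropy loss with expected output 0."""
    error = predict(weights, bias, x) - 0.0  # actual output minus expected output
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * error
    return new_weights, new_bias


# Hypothetical negative sample feature vector and starting parameters.
weights, bias = [0.5, -0.2], 0.0
negative_x = [1.0, 0.0]
before = predict(weights, bias, negative_x)
weights, bias = negative_update(weights, bias, negative_x)
after = predict(weights, bias, negative_x)
```

After the step, the model's output for the negative sample moves toward 0, which is the intended effect of the adjustment.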
In some embodiments, determining the text feature vector of each positive sample according to the household registration information words included in the word segmentation sequence of that positive sample includes: for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each household registration information word in the word segmentation sequence of the positive sample, setting the component corresponding to that word in the generated text feature vector to the term frequency-inverse document frequency (TF-IDF) of that word; and setting each unassigned component in the generated text feature vector to a preset value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not one of the household registration information words in the word segmentation sequence of the positive sample.
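A minimal sketch of this vector generation and assignment follows. The text does not give a TF-IDF formula, so the smoothed variant used here, the corpus, the dictionary, and the default value 0.0 are all assumptions for the example.

```python
import math
from typing import List


def tf_idf(token: str, doc_tokens: List[str], corpus: List[List[str]]) -> float:
    """Term frequency in the document times a smoothed inverse document frequency."""
    tf = doc_tokens.count(token) / len(doc_tokens)
    df = sum(1 for doc in corpus if token in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # assumed smoothing
    return tf * idf


def text_feature_vector(
    tokens: List[str],
    labels: List[int],
    dictionary: List[str],
    corpus: List[List[str]],
    default: float = 0.0,
) -> List[float]:
    """One component per dictionary word; labeled words get TF-IDF, the rest `default`."""
    index = {word: i for i, word in enumerate(dictionary)}
    vec = [default] * len(dictionary)
    for tok, lab in zip(tokens, labels):
        if lab == 1 and tok in index:
            vec[index[tok]] = tf_idf(tok, tokens, corpus)
    return vec


# Hypothetical toy data.
corpus = [["Zhang", "ProvinceA"], ["Li", "CityB"]]
dictionary = ["ProvinceA", "CityB", "Zhang"]
vec = text_feature_vector(["Zhang", "ProvinceA"], [0, 1], dictionary, corpus)
```

Note that "Zhang" appears in the sample and in the dictionary but keeps the default value, because it is not labeled as a household registration information word.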
In a second aspect, an embodiment of the present disclosure provides an apparatus for extracting household registration information from alarm receiving and processing text based on a deep learning model, where the apparatus includes: an obtaining unit configured to obtain an alarm receiving and processing text from which household registration information is to be extracted; a word segmentation unit configured to perform word segmentation on the text to obtain a corresponding word segmentation sequence; a classification unit configured to perform, for each segmented word in the obtained sequence, the following household registration information classification operation: inputting the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the classification model is obtained by pre-training based on a deep learning model; and a determining unit configured to determine the segmented words in the sequence whose classification results indicate household registration information as the household registration information set corresponding to the text.
In some embodiments, the household registration information classification model is obtained by training in advance through the following training steps: acquiring a training sample set, wherein each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and each piece of labeling information indicates whether the corresponding segmented word in the word segmentation sequence is household registration information; determining, as a positive sample set, the training samples in the training sample set whose word segmentation sequences include household registration information words, wherein a household registration information word is a segmented word whose corresponding labeling information indicates that it is household registration information; determining a text feature vector of each positive sample according to the household registration information words included in the word segmentation sequence of that positive sample; and training an initial deep learning model by taking the text feature vector of each positive sample in the positive sample set as input and taking the classification result indicating household registration information as the corresponding expected output, to obtain the household registration information classification model.
In some embodiments, the training steps further comprise: inputting preset negative sample feature vectors into the household registration information classification model to obtain corresponding actual output results; and adjusting the model parameters of the household registration information classification model according to the difference between each obtained actual output result and the classification result indicating that the input is not household registration information.
In some embodiments, determining the text feature vector of each positive sample according to the household registration information words included in the word segmentation sequence of that positive sample includes: for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each household registration information word in the word segmentation sequence of the positive sample, setting the component corresponding to that word in the generated text feature vector to the term frequency-inverse document frequency (TF-IDF) of that word; and setting each unassigned component in the generated text feature vector to a preset value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not one of the household registration information words in the word segmentation sequence of the positive sample.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method as described in any implementation manner of the first aspect.
In the prior art, the household registration information in alarm receiving and processing texts is generally extracted manually, which may cause the following problems: (1) a large backlog of historical alarm receiving and processing texts has never had household registration information extracted, and alarm receiving and processing staff enter a large number of new texts every day, so the volume of texts awaiting extraction is too large and manual extraction costs too much labor and time; (2) alarm receiving and processing texts are mostly written in natural language, are highly colloquial, and use irregular expressions, which makes manual extraction difficult; (3) household registration information involves many types of identifiers and addresses, and the extraction approach for each type depends on manual experience, so the learning cost of manual extraction is high.
According to the method and device provided by the embodiments of the present disclosure, the alarm receiving and processing text from which household registration information is to be extracted is segmented to obtain a corresponding word segmentation sequence, and then, for each segmented word in the sequence, the corresponding word vector is input into a pre-trained household registration information classification model, so that the household registration information in the text is extracted. The classification model is thereby used to extract household registration information from alarm receiving and processing texts automatically, without manual operation, which reduces the cost and increases the speed of extraction.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a deep learning model-based alarm-receiving text household information extraction method according to the present disclosure;
FIG. 3 is a flow chart of one embodiment of training steps according to the present disclosure;
FIG. 4 is a schematic diagram illustrating an embodiment of an apparatus for extracting alarm-receiving text household information based on a deep learning model according to the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which an embodiment of a deep learning model-based alarm-receiving text domicile information extraction method or a deep learning model-based alarm-receiving text domicile information extraction apparatus of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as an alarm receiving and processing record application, an alarm receiving and processing text household information extraction application, a web browser application, etc., may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices having a display screen and supporting text input, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When the terminal device 101 is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide a household registration information extraction service for alarm receiving and processing text) or as a single piece of software or software module. No specific limitation is made here.
The server 103 may be a server providing various services, such as a background server providing extraction of the household information for the alarm receiving text sent by the terminal device 101. The background server can analyze and process the received alarm receiving and processing text, and feed back the processing result (such as the household information) to the terminal equipment.
In some cases, the method for extracting alarm receiving text household information based on the deep learning model provided by the embodiment of the present disclosure may be performed by the terminal device 101 and the server 103 together, for example, the step of "obtaining the alarm receiving text of household information to be extracted" may be performed by the terminal device 101, and the rest of the steps may be performed by the server 103. The present disclosure is not limited thereto. Accordingly, the alarm receiving text household information extracting device based on the deep learning model can also be respectively arranged in the terminal equipment 101 and the server 103.
In some cases, the method for extracting alarm receiving and processing text household information based on the deep learning model provided by the embodiment of the present disclosure may be executed by the server 103, and accordingly, the device for extracting alarm receiving and processing text household information based on the deep learning model may also be disposed in the server 103, in which case, the system architecture 100 may also not include the terminal device 101.
In some cases, the method for extracting alarm receiving and processing text household information based on the deep learning model provided by the embodiment of the present disclosure may be executed by the terminal device 101, and accordingly, the device for extracting alarm receiving and processing text household information based on the deep learning model may also be disposed in the terminal device 101, in which case, the system architecture 100 may also not include the server 103.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide a household registration information extraction service for alarm receiving and processing text), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a deep learning model-based alarm-receiving text household information extraction method according to the present disclosure is shown. The method for extracting the alarm receiving and processing text household registration information based on the deep learning model comprises the following steps:
In this embodiment, an executing entity (for example, a server shown in fig. 1) of the deep learning model based alarm receiving and processing text household information extracting method may obtain the locally stored alarm receiving and processing text of the household information to be extracted, or the executing entity may also remotely obtain the alarm receiving and processing text of the household information to be extracted from other electronic devices (for example, terminal devices shown in fig. 1) connected to the executing entity through a network.
Here, the alarm receiving and processing text from which household registration information is to be extracted may be text data compiled by a call-taker from the content of an alarm call, or text data compiled by an alarm handler from the alarm handling process. It may also be an alarm text received from a terminal device, entered by a user in an alarm application installed on the terminal device or in a web page with an alarm function.
In this embodiment, the executing entity may perform word segmentation on the alarm receiving and processing text in various ways to obtain a corresponding word segmentation sequence. It should be noted that word segmentation of text is an existing technique that has been widely studied and applied in this field and is not described in detail here. For example, a word segmentation method based on string matching, on understanding, or on statistics may be employed. For example, segmenting the alarm receiving and processing text "Zhang San, ancestral home in Province A, City B, now runs an environmental protection enterprise in City D, Province C" yields the word segmentation sequence "Zhang San/ancestral home/A/Province/B/City/now in/C/Province/D/City/runs/an/environmental protection/enterprise".
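To make the string-matching option concrete, the sketch below shows forward maximum matching, one common dictionary-based segmentation technique. It is offered as an example of the general approach rather than the segmenter used in this disclosure; the dictionary contents, the maximum word length, and the sample string are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionary and input (Chinese text has no spaces between words).
tokens = forward_max_match("张三祖籍甲省", {"张三", "祖籍"})
```

Characters that match no dictionary entry are emitted one at a time, so the function always terminates and never drops input.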
In this embodiment, the executing agent may execute the household information classification operation on each participle in the participle sequence obtained in step 202. Here, the household information classification operation is to input the word vector corresponding to the participle into the household information classification model, and obtain the classification result of whether the participle is household information.
Here, the household information classification model is trained in advance based on the deep learning model.
In this embodiment, the execution subject may first determine a word vector corresponding to the word segmentation in various implementations.
In some optional implementations, the word vector corresponding to the segmented word may include N-dimensional components, where N is a positive integer, and each dimensional component in the N-dimensional components corresponds to each word in the preset dictionary one to one. In the process of determining the word vector corresponding to the word segmentation, a component corresponding to the word segmentation in each component of the word vector of the word segmentation may be set to a first preset value (e.g., 1); the other component of the word vector corresponding to the participle (i.e., the component corresponding to a word in the preset dictionary other than the participle) is set to a second preset numerical value (e.g., 0).
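The one-hot style assignment described above can be sketched as follows. The function name `one_hot_word_vector`, the five-word dictionary and the choice to raise an error for a participle missing from the dictionary are assumptions made for illustration; the disclosure does not say how out-of-dictionary participles are handled.

```python
def one_hot_word_vector(participle, dictionary, first_value=1, second_value=0):
    """Build an N-dimensional word vector over a preset dictionary.

    The component matching the participle gets `first_value`; every other
    component gets `second_value`.
    """
    if participle not in dictionary:
        # Assumption: out-of-dictionary participles are rejected.
        raise KeyError(f"{participle!r} is not in the preset dictionary")
    return [first_value if word == participle else second_value
            for word in dictionary]


# Hypothetical preset dictionary with N = 5 words.
preset_dictionary = ["ancestral home", "A", "province", "B", "city"]
vector = one_hot_word_vector("province", preset_dictionary)
print(vector)  # [0, 0, 1, 0, 0]
```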
In some optional implementations, the word vector corresponding to the segmented word may include N-dimensional components, where N is a positive integer, and each dimensional component in the N-dimensional components corresponds to each word in the preset dictionary one to one. In the process of determining the word vector corresponding to the participle, the executing main body may also first calculate a word Frequency-Inverse text Frequency index (TF-IDF, Term Frequency-Inverse Document Frequency) of the participle in the registered alarm text of the household information to be extracted, then set a component corresponding to the participle in the word vector corresponding to the participle as the calculated word Frequency-Inverse text Frequency index of the participle, and finally set other components of the word vector corresponding to the participle (i.e., components corresponding to words in a preset dictionary different from the participle) as a third preset value (e.g., 0).
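A minimal sketch of the TF-IDF based assignment follows. The disclosure does not give a TF-IDF formula, so the smoothed inverse document frequency used here (natural logarithm of (1 + number of documents) / (1 + document frequency), plus 1) is only one common variant chosen as an assumption; the reference corpus, the dictionary and the function names are likewise hypothetical.

```python
import math


def tf_idf(participle, segmented_text, corpus):
    """TF-IDF of one participle; the smoothed IDF variant is an assumption."""
    tf = segmented_text.count(participle) / len(segmented_text)
    df = sum(1 for document in corpus if participle in document)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def tf_idf_word_vector(participle, segmented_text, corpus, dictionary,
                       third_value=0.0):
    """Put the participle's TF-IDF at its dictionary position, else `third_value`."""
    weight = tf_idf(participle, segmented_text, corpus)
    return [weight if word == participle else third_value
            for word in dictionary]


# Hypothetical segmented text, reference corpus and preset dictionary.
text = ["ancestral home", "A", "province", "B", "city", "of", "Zhang San"]
corpus = [text, ["Li Si", "owes", "wages"], ["C", "province", "D", "city"]]
dictionary = ["A", "province", "city", "wages"]
vector = tf_idf_word_vector("A", text, corpus, dictionary)
print(vector)
```

Only the first component is non-zero here, because "A" occupies the first position of the assumed dictionary.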
Then, the executing agent may input the word vector corresponding to the participle into the household information classification model to obtain a classification result of whether the participle is household information. For example, suppose the alarm receiving and processing text of the household location information to be extracted is "Zhang San, whose ancestral home is City B, Province A, claims that Li Si, whose registered domicile is City D, Province C, owes him wages of ten thousand yuan". For each participle in the corresponding participle sequence "ancestral home/A/province/B/city/of/Zhang San/claims/registered domicile/C/province/D/city/of/Li Si/owes/his/wages/one/ten thousand/yuan", the word vector corresponding to the participle is input into the pre-trained household information classification model. Table 1 shows the classification result obtained by inputting each participle in the participle sequence into the household location information classification model.
Table 1
Participle | Classification result (is household information) |
ancestral home | Yes |
A | Yes |
province | Yes |
B | Yes |
city | Yes |
of | No |
Zhang San | No |
claims | No |
registered domicile | Yes |
C | Yes |
province | Yes |
D | Yes |
city | Yes |
of | No |
Li Si | No |
owes | No |
his | No |
wages | No |
one | No |
ten thousand | No |
yuan | No |
Step 204, determining a household location information set corresponding to the alarm receiving and processing text of the household location information to be extracted according to the participles in the word segmentation sequence whose corresponding classification results indicate that the participle is household location information.
Here, in step 203, each participle in the word segmentation sequence is input into the household information classification model to obtain a classification result indicating whether the participle is household information. A participle whose classification result indicates household information is referred to as a household information participle. In step 204, the executing entity may use various implementations to determine the household information set corresponding to the alarm receiving and processing text of the household information to be extracted according to the household information participles in the participle sequence.
In some alternative implementations, the executing entity may determine each household information participle in the participle sequence as a piece of household information in the household information set. This implementation is more suitable when each participle obtained by word segmentation is itself relatively complete household information.
In some alternative implementations, the executing entity may also combine household information participles that are directly adjacent to each other in the participle sequence into one piece of household information, and use the resulting household information as the household information in the household information set. This implementation is more suitable when each participle obtained by word segmentation is relatively short and cannot by itself form complete household information. Continuing with the above example, in which the alarm receiving and processing text of the household location information to be extracted is "Zhang San, whose ancestral home is City B, Province A, claims that Li Si, whose registered domicile is City D, Province C, owes him wages of ten thousand yuan", the classification results in Table 1 show that the household information participles in the corresponding word segmentation sequence are: "ancestral home", "A", "province", "B", "city", "registered domicile", "C", "province", "D" and "city". To form household information with practical meaning, household information participles that are directly adjacent to each other can be combined according to their positions in the participle sequence, and the combined results are used as the household information in the household information set. For example, the following household information set may be obtained here: {"ancestral home A province B city", "registered domicile C province D city"}.
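The merging of directly adjacent household information participles can be sketched as below. The function name `merge_adjacent`, the English renderings of the participles and the space used as a joiner are assumptions for illustration (for Chinese text an empty joiner would be the natural choice); the sketch returns an ordered list rather than a set so that the order of appearance is preserved.

```python
from itertools import groupby


def merge_adjacent(participles, is_household, joiner=" "):
    """Join each run of consecutive household information participles."""
    merged = []
    pairs = zip(participles, is_household)
    # groupby splits the sequence into maximal runs with the same flag.
    for flag, run in groupby(pairs, key=lambda pair: pair[1]):
        if flag:
            merged.append(joiner.join(token for token, _ in run))
    return merged


participles = ["ancestral home", "A", "province", "B", "city", "of",
               "Zhang San", "claims", "registered domicile", "C", "province",
               "D", "city", "of", "Li Si", "owes", "his", "wages", "one",
               "ten thousand", "yuan"]
flags = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

household_info = merge_adjacent(participles, flags)
print(household_info)
# ['ancestral home A province B city', 'registered domicile C province D city']
```

If no participle is classified as household information, the function returns an empty list, which corresponds to an empty household information set.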
It should be noted that the alarm receiving and processing text of the household information to be extracted may not include any household information, in which case the corresponding household information set may be empty. The alarm receiving and processing text of the household information to be extracted may also include at least one piece of household information, in which case the corresponding household information set may include at least one piece of household information.
In some alternative implementations, the household information classification model based on the deep learning model may be obtained by pre-training through a training step as shown in fig. 3. Referring to fig. 3, fig. 3 illustrates a flow 300 of one embodiment of training steps according to the present disclosure. The training step comprises the following steps:
Here, the execution subject of the training step may be the same as that of the above-described alarm receiving and processing text household information extraction method based on the deep learning model. In this case, after the household information classification model is obtained by training, the execution subject of the training step may store the model parameters of the household information classification model locally, and read those model parameters while executing the alarm receiving and processing text household information extraction method based on the deep learning model.
Here, the execution subject of the training step may also be different from the execution subject of the above-described alarm receiving and processing text household information extraction method based on the deep learning model. In this case, after the household information classification model is obtained through training, the execution subject of the training step may send the model parameters of the household information classification model to the execution subject of the extraction method. The execution subject of the extraction method may then read the model parameters received from the execution subject of the training step while executing the alarm receiving and processing text household information extraction method based on the deep learning model.
Here, the execution subject of the training step may first obtain a training sample set. Each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, where each item of labeling information indicates whether the corresponding participle in the word segmentation sequence is household information.
As an example, a training sample may include the word segmentation sequence "ancestral home/A/province/B/city/of/Zhang San/claims/registered domicile/C/province/D/city/of/Li Si/owes/his/wages/one/ten thousand/yuan" and the labeling information sequence "1/1/1/1/1/0/0/0/1/1/1/1/1/0/0/0/0/0/0/0/0", where "0" indicates that the corresponding participle is not household information and "1" indicates that the corresponding participle is household information.
In practice, a word segmentation method can be used to segment the historical alarm receiving and processing text into a word segmentation sequence, and each participle in the word segmentation sequence can be labeled manually to obtain the corresponding labeling information sequence.
Here, a household information participle is a participle in the word segmentation sequence whose corresponding labeling information indicates that the participle is household information.
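Selecting the positive sample set from the training sample set, that is, keeping the training samples whose word segmentation sequence contains at least one household information participle, can be sketched as follows. The representation of a training sample as a `(participles, labels)` tuple and the function name `split_samples` are assumptions for illustration.

```python
def split_samples(training_samples):
    """Split samples into positives (some label is 1) and negatives (all 0)."""
    positives, negatives = [], []
    for participles, labels in training_samples:
        if len(participles) != len(labels):
            raise ValueError("each participle needs exactly one label")
        target = positives if any(labels) else negatives
        target.append((participles, labels))
    return positives, negatives


# Two hypothetical training samples.
training_samples = [
    (["ancestral home", "A", "province", "of", "Zhang San"], [1, 1, 1, 0, 0]),
    (["Li Si", "owes", "wages"], [0, 0, 0]),
]
positive_set, negative_set = split_samples(training_samples)
print(len(positive_set), len(negative_set))  # 1 1
```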
In this embodiment, the executing agent of the training step may determine, for each positive sample in the positive sample set determined in step 302, a text feature vector of the positive sample according to each domicile information participle included in the participle sequence of the positive sample.
In some alternative implementations, step 303 may proceed as follows: if the preset dictionary includes N words, where N is a positive integer, the text feature vector of the positive sample may include N-dimensional components, and each of the N-dimensional components corresponds to each of the words of the preset dictionary one by one. Determining the text feature vector for the positive sample may be performed as follows: for each household information participle in the word segmentation sequence of the positive sample, setting a component corresponding to the household information participle in the text feature vector of the positive sample as a fourth preset value (e.g., 1), and setting each unassigned component in the text feature vector of the positive sample as a fifth preset value (e.g., 0), where the unassigned component is a component corresponding to a word belonging to a preset dictionary but not belonging to each household information participle in the word segmentation sequence of the positive sample.
For ease of understanding, an example follows. Assume that the preset dictionary includes 20 words, and that the positive sample includes the word segmentation sequence "ancestral home/A/province/B/city/of/Zhang San/claims/registered domicile/C/province/D/city/of/Li Si/owes/his/wages/one/ten thousand/yuan" and the labeling information sequence "1/1/1/1/1/0/0/0/1/1/1/1/1/0/0/0/0/0/0/0/0", where "0" indicates that the corresponding participle is not household information and "1" indicates that the corresponding participle is household information. Here, for each participle in the word segmentation sequence of the positive sample, if the participle is a household information participle, the component corresponding to that participle in the 20-dimensional text feature vector of the positive sample may be set to 1. Specifically, whether a participle is a household information participle can be determined from the labeling information sequence corresponding to the word segmentation sequence. From the labeling information sequence above, "ancestral home", "A", "province", "B", "city", "registered domicile", "C" and "D" are the household information participles. If the components corresponding to these household information participles in the preset dictionary are the 1st, 3rd, 7th, 8th, 12th, 16th, 18th and 20th dimensions respectively, those dimensions of the 20-dimensional text feature vector of the positive sample can each be set to 1.
Then, each unassigned component in the text feature vector of the positive sample may be set to 0, that is, other components except for the 1 st, 3 rd, 7 th, 8 th, 12 th, 16 th, 18 th, and 20 th dimensions may be set to 0, so as to obtain the following text feature vector: (1,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,1,0,1).
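The example above can be reproduced with the short sketch below. The 20-word dictionary is hypothetical: only the positions of the eight household information participles (1st, 3rd, 7th, 8th, 12th, 16th, 18th and 20th) follow the example, and the remaining entries are filler words assumed for illustration. The function name `text_feature_vector` is likewise an assumption.

```python
def text_feature_vector(participles, labels, dictionary,
                        fourth_value=1, fifth_value=0):
    """Mark every dictionary word that occurs as a household information participle."""
    household = {token for token, label in zip(participles, labels) if label == 1}
    return [fourth_value if word in household else fifth_value
            for word in dictionary]


# Hypothetical 20-word preset dictionary (1-based positions in comments).
preset_dictionary = [
    "ancestral home", "of", "A", "Zhang San", "claims",        # 1-5
    "Li Si", "province", "B", "owes", "his",                   # 6-10
    "wages", "city", "one", "ten thousand", "yuan",            # 11-15
    "registered domicile", "police", "C", "report", "D",       # 16-20
]
participles = ["ancestral home", "A", "province", "B", "city", "of",
               "Zhang San", "claims", "registered domicile", "C", "province",
               "D", "city", "of", "Li Si", "owes", "his", "wages", "one",
               "ten thousand", "yuan"]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

vector = text_feature_vector(participles, labels, preset_dictionary)
print(vector)
# [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1]
```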
In some alternative implementations, step 303 may also proceed as follows:
for each positive sample in the set of positive samples, the following vector generation and assignment operations are performed:
first, a text feature vector corresponding to the positive sample is generated. Here, each component in the generated text feature vector corresponds to each word in the preset dictionary one-to-one.
Secondly, for each household information participle in the participle sequence of the positive sample, setting a component corresponding to the household information participle in the generated text feature vector as a word frequency-inverse text frequency index of the household information participle.
And finally, setting each unassigned component in the generated text feature vector as a preset numerical value. Here, the unassigned component is a component corresponding to a word belonging to a preset dictionary but not to each domicile information participle in the word segmentation sequence of the positive sample.
For ease of understanding, the above example is continued. Unlike in the above example, in the text feature vector generated here, the 1st, 3rd, 7th, 8th, 12th, 16th, 18th and 20th dimensional components corresponding to "ancestral home", "A", "province", "B", "city", "registered domicile", "C" and "D" are not set to 1, but are set to the word frequency-inverse text frequency indexes of "ancestral home", "A", "province", "B", "city", "registered domicile", "C" and "D", namely 0.72, 0.65, 0.52, 0.71, 0.66, 0.75, 0.4 and 0.19, respectively. Then, each unassigned component in the text feature vector of the positive sample may be set to 0, that is, all components other than the 1st, 3rd, 7th, 8th, 12th, 16th, 18th and 20th dimensions may be set to 0, so as to obtain the following text feature vector: (0.72,0,0.65,0,0,0,0.52,0.71,0,0,0,0.66,0,0,0,0.75,0,0.4,0,0.19).
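The TF-IDF weighted variant of the example can be sketched in the same way. The eight weights are taken as given from the example rather than computed, and the 20-word dictionary is again hypothetical apart from the positions of the eight household information participles; the function name `weighted_feature_vector` is an assumption.

```python
def weighted_feature_vector(household_weights, dictionary, preset_value=0.0):
    """Place each household participle's TF-IDF weight at its dictionary position."""
    return [household_weights.get(word, preset_value) for word in dictionary]


# Hypothetical 20-word preset dictionary; only the positions of the eight
# household information participles follow the example in the text.
preset_dictionary = [
    "ancestral home", "of", "A", "Zhang San", "claims",
    "Li Si", "province", "B", "owes", "his",
    "wages", "city", "one", "ten thousand", "yuan",
    "registered domicile", "police", "C", "report", "D",
]
# TF-IDF values quoted in the example (not computed here).
household_weights = {
    "ancestral home": 0.72, "A": 0.65, "province": 0.52, "B": 0.71,
    "city": 0.66, "registered domicile": 0.75, "C": 0.4, "D": 0.19,
}

vector = weighted_feature_vector(household_weights, preset_dictionary)
print(vector)
```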
Step 304, taking the text feature vector of each positive sample in the positive sample set as input, taking the classification result indicating household information as the corresponding expected output, and training an initial deep learning model to obtain the household information classification model.
Here, using the positive sample set, the execution subject of the training step may train the initial deep learning model with the text feature vector of each positive sample in the positive sample set as input and the classification result indicating household information as the corresponding expected output, to obtain the household information classification model. Specifically, this can be performed as follows:
first, the model structure of the initial deep learning model may be determined.
Here, the initial deep learning model may include various deep learning models. For example, the initial deep learning model may include at least one of: convolutional neural networks, recurrent neural networks, long short-term memory networks, conditional random fields.
By way of example, if the initial deep learning model is determined to be a convolutional neural network, it can be determined which layers the convolutional neural network specifically includes, for example which convolutional layers, pooling layers and fully-connected layers, and the order in which the layers are connected. If convolutional layers are included, the convolution kernel size and the convolution stride of each convolutional layer can be determined. If a pooling layer is included, the pooling method can be determined.
Second, initial values of model parameters included in the initial deep learning model may be determined.
For example, if the initial deep learning model is determined to be a convolutional neural network, here, convolutional kernel parameters of convolutional layers that may be included in the convolutional neural network may be initialized, connection parameters for fully-connected layers may be initialized, and so on.
Finally, a parameter adjustment operation may be performed on the positive samples in the positive sample set until a preset training end condition is satisfied, where the parameter adjustment operation may include: inputting the text feature vector of the positive sample into an initial deep learning model to obtain a corresponding actual output result, calculating the difference between the obtained actual output result and a classification result used for indicating that the text feature vector is the household information, and adjusting the model parameters of the initial deep learning model based on the obtained difference. Here, the training end condition may include, for example, at least one of: the number of times of executing parameter adjustment operation reaches the preset maximum training number, and the calculated difference is smaller than the preset difference threshold value.
Through the parameter adjustment operation, the model parameters of the initial deep learning model are optimized, and the initial deep learning model after the parameter optimization can be determined as the household information classification model. It should be noted that how to adjust and optimize the model parameters of the initial deep learning model based on the calculated differences is a prior art widely studied and applied in the field, and is not described herein again. For example, a gradient descent method may be employed.
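The parameter adjustment loop and its two end conditions can be illustrated with the minimal sketch below. To stay short and dependency-free it swaps in a single sigmoid unit trained by gradient descent in place of an actual deep learning model such as a convolutional neural network; the learning rate, thresholds, sample vectors and function names are all assumptions.

```python
import math


def predict(weights, bias, features):
    """Output of a single sigmoid unit (a stand-in for the deep model)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, expected, dim, lr=0.5, max_rounds=1000, diff_threshold=0.05):
    """Adjust parameters until the difference is small or max_rounds is reached."""
    weights, bias = [0.0] * dim, 0.0
    rounds = 0
    while rounds < max_rounds:
        largest_diff = 0.0
        for features, target in zip(samples, expected):
            actual = predict(weights, bias, features)
            diff = actual - target  # difference from the expected output
            largest_diff = max(largest_diff, abs(diff))
            # Gradient descent step for cross-entropy loss on a sigmoid unit.
            weights = [w - lr * diff * x for w, x in zip(weights, features)]
            bias -= lr * diff
        rounds += 1
        if largest_diff < diff_threshold:
            break
    return weights, bias, rounds


# Two hypothetical 4-dimensional positive samples; expected output 1.0 means
# "is household information".
positive_vectors = [[1, 0, 1, 0], [0, 1, 1, 0]]
weights, bias, rounds = train(positive_vectors, [1.0, 1.0], dim=4)
print(rounds, predict(weights, bias, positive_vectors[0]))
```

Training stops as soon as either condition holds, mirroring the "at least one of" wording of the training end condition.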
In some optional implementations, the flow 300 may further include the following steps 305 and 306:
Step 305, inputting the preset negative sample feature vector into the household information classification model to obtain a corresponding actual output result.
Here, the negative example feature vector refers to a feature vector for characterizing a negative example, and the negative example is a training example in which the corresponding word segmentation sequence in the training example set does not include the household information word segmentation. Since the corresponding word segmentation sequence of the negative examples does not include the household registration information word segmentation, all the negative examples can be represented by using the preset negative example feature vector.
For example, when the text feature vector of the positive sample is determined using the first optional implementation described in step 303, that is, the fourth preset value and the fifth preset value are respectively used to represent household information participles and non-household information participles, the preset negative sample feature vector here may be a feature vector in which every component is the fifth preset value. For instance, if the text feature vector of the positive sample has 20 dimensions and the fifth preset value is 0, the preset negative sample feature vector may be: (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0).
As another example, when the text feature vector of the positive sample is determined using the second optional implementation described in step 303, that is, the word frequency-inverse text frequency index and the preset value are respectively used to represent household information participles and non-household information participles, the preset negative sample feature vector here may be a feature vector in which every component is the preset value.
Step 306, adjusting the model parameters of the household information classification model according to the difference between the obtained actual output result and the classification result indicating that the input is not household information.
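Steps 305 and 306 can be sketched as a single adjustment on the preset negative sample feature vector. As a simplification, the model here is a single sigmoid unit standing in for the trained household information classification model; the starting parameters, learning rate and function name are assumptions.

```python
import math


def negative_sample_step(weights, bias, negative_vector, lr=0.5):
    """One adjustment toward the 'not household information' output (0.0)."""
    z = sum(w * x for w, x in zip(weights, negative_vector)) + bias
    actual = 1.0 / (1.0 + math.exp(-z))   # step 305: actual output result
    diff = actual - 0.0                   # difference from the expected output
    new_weights = [w - lr * diff * x for w, x in zip(weights, negative_vector)]
    new_bias = bias - lr * diff           # step 306: adjust the parameters
    return new_weights, new_bias, actual


# Hypothetical parameters of an already trained 20-dimensional model.
weights, bias = [1.0] * 20, 2.0
negative_vector = [0] * 20                # preset negative sample feature vector

new_weights, new_bias, actual = negative_sample_step(weights, bias, negative_vector)
print(actual, new_bias)
```

With an all-zero negative vector, only the bias term moves in this single-unit stand-in, since every weight gradient is multiplied by a zero input.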
By using the training step shown in the above flow 300, the household information classification model can be generated automatically, which reduces the labor cost of generating the household information classification model. The way people express themselves changes over time, and this change is reflected in alarm receiving and processing texts; in addition, new kinds of household information may appear as society develops. In that case, a new training sample set can be obtained and the training step can be performed again to obtain an updated household information classification model, so as to accommodate changes in how current alarm receiving and processing texts are expressed and to extract new kinds of household information.
According to the method provided by the embodiment of the disclosure, household information is automatically extracted from the alarm receiving and processing text by using the household information classification model without manual operation, which reduces the cost of extracting household information from the alarm receiving and processing text and increases the extraction speed.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of an apparatus for extracting alarm receiving text household information based on a deep learning model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 4, the deep learning model-based alarm receiving and processing text household information extraction apparatus 400 of the present embodiment includes: an acquisition unit 401, a word segmentation unit 402, a classification unit 403 and a determination unit 404. The acquisition unit 401 is configured to acquire the alarm receiving and processing text of the household information to be extracted; the word segmentation unit 402 is configured to segment the alarm receiving and processing text of the household information to be extracted to obtain a corresponding word segmentation sequence; the classification unit 403 is configured to perform, for each participle in the resulting word segmentation sequence, the following household information classification operation: inputting the word vector corresponding to the participle into a household information classification model to obtain a classification result of whether the participle is household information, where the household information classification model is obtained by pre-training based on a deep learning model; the determination unit 404 is configured to determine the household information set corresponding to the alarm receiving and processing text of the household information to be extracted according to the participles in the word segmentation sequence whose corresponding classification results indicate that the participle is household information.
In this embodiment, specific processes of the obtaining unit 401, the word segmentation unit 402, the classification unit 403, and the determination unit 404 of the deep learning model-based alarm receiving text household information extraction apparatus 400 and technical effects thereof may respectively refer to the related descriptions of step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2, and are not repeated herein.
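How the four units might be composed can be sketched as below. This is only an illustrative arrangement: the class name `HouseholdInfoExtractor`, the callable-based design and the stub implementations are assumptions, and the classifier stub takes the participle directly instead of a word vector to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HouseholdInfoExtractor:
    """Chains stand-ins for units 401 to 404 in the order of steps 201 to 204."""
    acquire: Callable[[], str]                               # acquisition unit
    segment: Callable[[str], List[str]]                      # word segmentation unit
    classify: Callable[[str], bool]                          # classification unit
    determine: Callable[[List[str], List[bool]], List[str]]  # determination unit

    def run(self) -> List[str]:
        text = self.acquire()
        participles = self.segment(text)
        flags = [self.classify(p) for p in participles]
        return self.determine(participles, flags)


# Stub units, assumed purely for illustration.
extractor = HouseholdInfoExtractor(
    acquire=lambda: "A province of Zhang San",
    segment=str.split,
    classify=lambda participle: participle in {"A", "province"},
    determine=lambda ps, fs: [p for p, f in zip(ps, fs) if f],
)
result = extractor.run()
print(result)  # ['A', 'province']
```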
In some optional implementations of this embodiment, the household information classification model based on the deep learning model may be obtained by training in advance through the following training steps: acquiring a training sample set, where each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and the labeling information indicates whether the corresponding participle in the word segmentation sequence is household information; determining the training samples in the training sample set whose word segmentation sequences include household information participles as a positive sample set, where a household information participle is a participle whose corresponding labeling information in the word segmentation sequence indicates that the participle is household information; for each positive sample in the positive sample set, determining a text feature vector of the positive sample according to the household information participles included in the word segmentation sequence of the positive sample; and training an initial deep learning model by taking the text feature vector of each positive sample in the positive sample set as input and taking the classification result indicating household information as the corresponding expected output, to obtain the household information classification model.
In some optional implementations of this embodiment, the training step may further include: inputting a preset negative sample feature vector into the household information classification model to obtain a corresponding actual output result; and adjusting the model parameters of the household information classification model according to the difference between the obtained actual output result and the classification result indicating that the input is not household information.
In some optional implementations of this embodiment, determining the text feature vector of each positive sample according to the household information participles included in the word segmentation sequence of the positive sample may include: for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, where each component in the generated text feature vector corresponds one-to-one to a word in a preset dictionary; for each household information participle in the word segmentation sequence of the positive sample, setting the component corresponding to the household information participle in the generated text feature vector to the word frequency-inverse text frequency index (TF-IDF) of the household information participle; and setting each unassigned component in the generated text feature vector to a preset value, where an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not a household information participle in the word segmentation sequence of the positive sample.
It should be noted that, for details and technical effects of implementation of each unit in the device for extracting household registration information of alarm receiving and processing text based on a deep learning model provided in the embodiment of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described here again.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the electronic devices of embodiments of the present disclosure. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An Input/Output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a touch screen, a tablet, a keyboard, a mouse, or the like; an output section 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by a Central Processing Unit (CPU) 501. It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a word segmentation unit, a classification unit, and a determination unit. In some cases, the names of the units do not limit the units themselves; for example, the acquisition unit may also be described as a unit for acquiring the alarm receiving and processing text from which household registration information is to be extracted.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire an alarm receiving and processing text from which household registration information is to be extracted; perform word segmentation on the alarm receiving and processing text to obtain a corresponding word segmentation sequence; for each segmented word in the obtained word segmentation sequence, perform the following household registration information classification operation: input the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the household registration information classification model is obtained by pre-training based on a deep learning model; and determine, from the segmented words in the word segmentation sequence whose classification results indicate household registration information, the household registration information set corresponding to the alarm receiving and processing text.
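The four operations listed above (acquire, segment, classify each word, collect the set) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the disclosure: the function name `extract_household_registration_info`, the whitespace segmenter, the two-dimensional toy word vectors, and the threshold classifier are all hypothetical stand-ins for the real word segmentation tool, word vectors, and pre-trained deep learning classification model. Skipping words that have no vector is likewise an assumption.

```python
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def extract_household_registration_info(
    text: str,
    segment: Callable[[str], List[str]],
    word_vectors: Dict[str, Vector],
    classify: Callable[[Vector], bool],
) -> Set[str]:
    """Collect the words that the classifier marks as household registration information."""
    result: Set[str] = set()
    for word in segment(text):              # word segmentation sequence
        vector = word_vectors.get(word)     # word vector for this word
        if vector is None:
            continue                        # assumption: skip words without a vector
        if classify(vector):                # per-word classification result
            result.add(word)
    return result


# Hypothetical stand-ins, used only to make the sketch runnable.
def whitespace_segment(text: str) -> List[str]:
    return text.split()


toy_vectors: Dict[str, Vector] = {
    "Hebei": (1.0, 0.0),
    "reported": (0.0, 1.0),
    "theft": (0.0, 1.0),
}


def toy_classifier(vector: Vector) -> bool:
    # Stand-in for the pre-trained classification model.
    return vector[0] > 0.5


example = extract_household_registration_info(
    "caller from Hebei reported theft",
    whitespace_segment,
    toy_vectors,
    toy_classifier,
)
```

Passing the segmenter and classifier as parameters keeps the sketch independent of any particular segmentation library or model framework, neither of which is named in this section.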
The foregoing description is only of the preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept described above, for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.
Claims (10)
1. A method for extracting household registration information from an alarm receiving and processing text based on a deep learning model, comprising:
acquiring an alarm receiving and processing text from which household registration information is to be extracted;
performing word segmentation on the alarm receiving and processing text to obtain a corresponding word segmentation sequence;
for each segmented word in the obtained word segmentation sequence, performing the following household registration information classification operation: inputting the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the household registration information classification model is obtained by pre-training based on a deep learning model; and
determining, from the segmented words in the word segmentation sequence whose classification results indicate household registration information, the household registration information set corresponding to the alarm receiving and processing text.
2. The method of claim 1, wherein the household registration information classification model based on the deep learning model is obtained by pre-training through the following training steps:
acquiring a training sample set, wherein each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and the labeling information indicates whether the corresponding segmented word in the word segmentation sequence is household registration information;
determining, as a positive sample set, the training samples in the training sample set whose word segmentation sequences include a household registration information segmented word, wherein a household registration information segmented word is a segmented word whose corresponding labeling information in the word segmentation sequence indicates that it is household registration information;
determining a text feature vector of each positive sample in the positive sample set according to the household registration information segmented words included in the word segmentation sequence of that positive sample; and
training an initial deep learning model by taking the text feature vector of each positive sample in the positive sample set as an input and taking a classification result indicating household registration information as the corresponding expected output, so as to obtain the household registration information classification model.
3. The method of claim 2, wherein the training steps further comprise:
inputting a preset negative sample feature vector into the household registration information classification model to obtain a corresponding actual output result; and
adjusting model parameters of the household registration information classification model according to the difference between the obtained actual output result and a classification result indicating non-household-registration information.
4. The method according to claim 2 or 3, wherein determining the text feature vector of each positive sample in the positive sample set according to the household registration information segmented words included in the word segmentation sequence of that positive sample comprises:
for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each household registration information segmented word in the word segmentation sequence of the positive sample, setting the component of the generated text feature vector corresponding to that segmented word to the term frequency-inverse document frequency (TF-IDF) of that segmented word; and setting each unassigned component of the generated text feature vector to a preset value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not a household registration information segmented word in the word segmentation sequence of the positive sample.
5. An apparatus for extracting household registration information from an alarm receiving and processing text based on a deep learning model, comprising:
an acquisition unit configured to acquire an alarm receiving and processing text from which household registration information is to be extracted;
a word segmentation unit configured to perform word segmentation on the alarm receiving and processing text to obtain a corresponding word segmentation sequence;
a classification unit configured to perform, for each segmented word in the obtained word segmentation sequence, the following household registration information classification operation: inputting the word vector corresponding to the segmented word into a household registration information classification model to obtain a classification result indicating whether the segmented word is household registration information, wherein the household registration information classification model is obtained by pre-training based on a deep learning model; and
a determination unit configured to determine, from the segmented words in the word segmentation sequence whose classification results indicate household registration information, the household registration information set corresponding to the alarm receiving and processing text.
6. The apparatus of claim 5, wherein the household registration information classification model based on the deep learning model is obtained by pre-training through the following training steps:
acquiring a training sample set, wherein each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and the labeling information indicates whether the corresponding segmented word in the word segmentation sequence is household registration information;
determining, as a positive sample set, the training samples in the training sample set whose word segmentation sequences include a household registration information segmented word, wherein a household registration information segmented word is a segmented word whose corresponding labeling information in the word segmentation sequence indicates that it is household registration information;
determining a text feature vector of each positive sample in the positive sample set according to the household registration information segmented words included in the word segmentation sequence of that positive sample; and
training an initial deep learning model by taking the text feature vector of each positive sample in the positive sample set as an input and taking a classification result indicating household registration information as the corresponding expected output, so as to obtain the household registration information classification model.
7. The apparatus of claim 6, wherein the training steps further comprise:
inputting a preset negative sample feature vector into the household registration information classification model to obtain a corresponding actual output result; and
adjusting model parameters of the household registration information classification model according to the difference between the obtained actual output result and a classification result indicating non-household-registration information.
8. The apparatus according to claim 6 or 7, wherein determining the text feature vector of each positive sample in the positive sample set according to the household registration information segmented words included in the word segmentation sequence of that positive sample comprises:
for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each household registration information segmented word in the word segmentation sequence of the positive sample, setting the component of the generated text feature vector corresponding to that segmented word to the term frequency-inverse document frequency (TF-IDF) of that segmented word; and setting each unassigned component of the generated text feature vector to a preset value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not a household registration information segmented word in the word segmentation sequence of the positive sample.
9. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-4.
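The vector generation and assignment operations recited in claims 4 and 8 can be illustrated with a short Python sketch. The claims do not state which TF-IDF formula is used, so the smoothed variant below (`idf = ln((1 + N) / (1 + df)) + 1`) is an assumption, as are the function names `tf_idf` and `build_feature_vector`, the toy dictionary and corpus, and the choice of `0.0` as the preset value.

```python
import math
from typing import List, Sequence


def tf_idf(word: str, sample_words: Sequence[str], corpus: Sequence[Sequence[str]]) -> float:
    """TF-IDF of `word` in one sample; the smoothed IDF form is an assumption."""
    tf = sample_words.count(word) / len(sample_words)
    doc_freq = sum(1 for doc in corpus if word in doc)
    idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1.0
    return tf * idf


def build_feature_vector(
    sample_words: Sequence[str],
    labels: Sequence[bool],
    dictionary: Sequence[str],
    corpus: Sequence[Sequence[str]],
    preset_value: float = 0.0,  # hypothetical choice of the preset value
) -> List[float]:
    """One component per dictionary word, in dictionary order."""
    # Words of this positive sample that are labeled as household registration information.
    labeled_words = {w for w, is_hr in zip(sample_words, labels) if is_hr}
    vector: List[float] = []
    for entry in dictionary:
        if entry in labeled_words:
            vector.append(tf_idf(entry, sample_words, corpus))
        else:
            # Dictionary word that is not a labeled word of this sample.
            vector.append(preset_value)
    return vector


# Toy data, for illustration only.
dictionary = ["Hebei", "Shandong", "theft"]
corpus = [
    ["caller", "from", "Hebei", "reported", "theft"],
    ["resident", "of", "Shandong"],
]
sample = corpus[0]
labels = [False, False, True, False, False]  # only "Hebei" is labeled
vector = build_feature_vector(sample, labels, dictionary, corpus)
```

In this toy run only the component for "Hebei" receives a TF-IDF value. The component for "theft" stays at the preset value even though the word occurs in the sample, because it is not labeled as household registration information.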
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010091294 | 2020-02-13 | | |
CN2020100912949 | 2020-02-13 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113111168A (en) | 2021-07-13 |
CN113111168B (en) | 2024-09-06 |
Family
ID=76708902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010306789.9A Active CN113111168B (en) | 2020-02-13 | 2020-04-17 | Method and device for extracting information of text household registration based on deep learning model receiving and processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113111168B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101930561A (en) * | 2010-05-21 | 2010-12-29 | 电子科技大学 | N-Gram participle model-based reverse neural network junk mail filter device |
CN107066449A (en) * | 2017-05-09 | 2017-08-18 | 北京京东尚科信息技术有限公司 | Information-pushing method and device |
CN108345585A (en) * | 2018-01-11 | 2018-07-31 | 浙江大学 | A kind of automatic question-answering method based on deep learning |
CN108363716A (en) * | 2017-12-28 | 2018-08-03 | 广州索答信息科技有限公司 | Realm information method of generating classification model, sorting technique, equipment and storage medium |
CN109684476A (en) * | 2018-12-07 | 2019-04-26 | 中科恒运股份有限公司 | A kind of file classification method, document sorting apparatus and terminal device |
Non-Patent Citations (1)
Title |
---|
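Hu Keqi: "Research on Short Text Classification Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 October 2018 (2018-10-15), pages 138-1015 * |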
Also Published As
Publication number | Publication date |
---|---|
CN113111168B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113326764B (en) | Method and device for training image recognition model and image recognition | |
CN111177319B (en) | Method and device for determining risk event, electronic equipment and storage medium | |
CN110580308B (en) | Information auditing method and device, electronic equipment and storage medium | |
CN113657113B (en) | Text processing method and device and electronic equipment | |
CN112863683A (en) | Medical record quality control method and device based on artificial intelligence, computer equipment and storage medium | |
CN108228567B (en) | Method and device for extracting short names of organizations | |
CN113393306A (en) | Product recommendation method and device, electronic equipment and computer readable medium | |
CN113051911A (en) | Method, apparatus, device, medium, and program product for extracting sensitive word | |
CN113435859A (en) | Letter processing method and device, electronic equipment and computer readable medium | |
CN111723180A (en) | Interviewing method and device | |
US20210349920A1 (en) | Method and apparatus for outputting information | |
CN113111165A (en) | Deep learning model-based alarm receiving warning condition category determination method and device | |
CN113111167B (en) | Method and device for extracting warning text received vehicle model based on deep learning model | |
CN116578925B (en) | Behavior prediction method, device and storage medium based on feature images | |
CN113111233A (en) | Regular expression-based method and device for extracting residential address of alarm receiving and processing text | |
CN113111169A (en) | Deep learning model-based alarm receiving and processing text address information extraction method and device | |
CN113111230B (en) | Regular expression-based alarm receiving text home address extraction method and device | |
CN113111897A (en) | Alarm receiving and warning condition type determining method and device based on support vector machine | |
CN113111174A (en) | Group identification method, device, equipment and medium based on deep learning model | |
CN115470790A (en) | Method and device for identifying named entities in file | |
CN113111168A (en) | Alarm receiving and processing text household registration information extraction method and device based on deep learning model | |
CN114066603A (en) | Post-loan risk early warning method and device, electronic equipment and computer readable medium | |
CN113111164A (en) | Method and device for extracting information of alarm receiving and processing text residence based on deep learning model | |
CN113111229A (en) | Regular expression-based method and device for extracting track-to-ground address of alarm receiving and processing text | |
CN113111170A (en) | Method and device for extracting alarm receiving and processing text track ground information based on deep learning model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||