CN112016327A - Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment

Info

Publication number
CN112016327A
Authority
CN
China
Prior art keywords
information
text
extraction
matching
user voice
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011193595.9A
Other languages
Chinese (zh)
Inventor
唐雨晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qiyu Information Technology Co Ltd
Original Assignee
Beijing Qiyu Information Technology Co Ltd
Application filed by Beijing Qiyu Information Technology Co Ltd
Priority to CN202011193595.9A
Publication of CN112016327A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an intelligent structured text extraction method and device based on multiple rounds of conversations, and an electronic device. The method comprises the following steps: configuring information extraction anchors and their corresponding matching rules; acquiring user voice from historical conversations and recognizing it to obtain the corresponding voice text; performing word segmentation on the user voice text and extracting text from it according to the information extraction anchors and the matching rules; and matching and screening the extracted text feature information to establish an information base comprising a structured text information table associated with each information extraction anchor. The invention enables intelligent text extraction from multi-turn conversations and produces structured text information tables, yielding information data that is more visual and more structured and making it easier for business personnel to query or manage that data.

Description

Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment
Technical Field
The invention relates to the field of computer information processing, in particular to an intelligent structured text extraction method and device based on multi-turn conversation and electronic equipment.
Background
Information extraction is the process of automatically extracting unstructured data from a document and converting it into structured data, for example, extracting the names of both parties, the contract term, and the contract address from a rental contract. By content, information extraction mainly comprises entity extraction, relation extraction, and event extraction; by extraction length, it mainly comprises vocabulary (word-level) extraction and field/paragraph extraction. It can also be divided into open-domain and closed-domain information extraction.
With the development of deep neural networks and the growth of computing power, existing information extraction methods mainly train an end-to-end deep learning model with a very large number of parameters on large-scale labeled data, and then use the trained model to extract text information in different business scenarios. Such methods do not classify extraction by extraction length, so the final results are poorly targeted and inaccurate, and extraction efficiency is low.
In addition, in service industries such as finance and e-commerce, practitioners need to mine and organize voice text data (particularly voice text data in dialogue systems) to improve service levels and work efficiency. However, the current practice of organizing the relevant text information in dialogue systems manually is inefficient and labor-intensive.
Therefore, there is a need to provide a more efficient intelligent structured text extraction method.
Disclosure of Invention
The invention aims to solve the problems of low efficiency and heavy workload in existing methods that organize the relevant text information in dialogue systems manually. To solve these problems, the invention provides an intelligent structured text extraction method based on multiple rounds of dialogue, comprising: configuring information extraction anchors and their corresponding matching rules; acquiring user voice in historical conversations, recognizing it, and obtaining the corresponding voice text; performing word segmentation on the user voice text and extracting text from it according to the information extraction anchors and the matching rules; and matching and screening the extracted text feature information and establishing an information base comprising a structured text information table associated with each information extraction anchor.
Preferably, the method further comprises: setting, based on a service scenario, a plurality of information extraction anchors and a preset matching word set corresponding to each anchor, and updating the information extraction anchors periodically.
Preferably, the information extraction anchor point includes an identity, a time, an event, a reason, a resource quota, and a resource return intention.
Preferably, the method further comprises: setting matching rules comprising a first matching rule and a second matching rule, wherein the first matching rule is used to extract text feature information corresponding to time and resource quota, and the second matching rule is used to extract text feature information corresponding to identity, event, reason, and resource return intention.
Preferably, matching and screening the extracted text feature information comprises: extracting keywords and matching them against a preset matching word set to determine whether they are valid text feature information of the information extraction anchor; and storing the valid text feature information in the structured text information table of the corresponding information extraction anchor.
Preferably, matching and screening the extracted text feature information comprises: performing semantic vector conversion on the extracted text feature information and calculating its semantic vector similarity to each matching word in a preset matching word set; and comparing the calculated semantic vector similarity with a set threshold to determine whether the extracted text feature information is valid text feature information of the information extraction anchor.
Preferably, the method further comprises: forming, based on the extracted valid text feature information and the information extraction anchors, an information base indexed by the information extraction anchors, wherein the information base comprises a plurality of structured text information tables in table form, and one structured text information table corresponds to a plurality of information extraction anchors.
Preferably, the semantic vector conversion is performed using a BERT model and a RoBERTa model.
In addition, the invention also provides an intelligent structured text extraction device based on multi-turn dialog, which comprises: the configuration module is used for configuring the information extraction anchor point and the corresponding matching rule; the acquisition module is used for acquiring the user voice in the historical conversation, identifying the user voice and obtaining a voice text corresponding to the user voice; the processing module is used for performing word segmentation processing on the user voice text and performing text extraction on the user voice text according to the information extraction anchor point and the matching rule; and the screening and establishing module is used for matching and screening the extracted text characteristic information and establishing an information base, wherein the information base comprises a structured text information table associated with each information extraction anchor point.
Preferably, the device is further configured to set, based on a service scenario, a plurality of information extraction anchors and a preset matching word set corresponding to each anchor, and to update the information extraction anchors periodically.
Preferably, the information extraction anchor point includes an identity, a time, an event, a reason, a resource quota, and a resource return intention.
Preferably, the device further comprises a setting module configured to set matching rules comprising a first matching rule and a second matching rule, wherein the first matching rule is used to extract text feature information corresponding to time and resource quota, and the second matching rule is used to extract text feature information corresponding to identity, event, reason, and resource return intention.
Preferably, the device further comprises an extraction module configured to extract keywords and match them against a preset matching word set to determine whether they are valid text feature information of the information extraction anchor, and to store the valid text feature information in the structured text information table of the corresponding information extraction anchor.
Preferably, the device further comprises a calculation module configured to perform semantic vector conversion on the extracted text feature information and calculate its semantic vector similarity to each matching word in a preset matching word set, and to compare the calculated semantic vector similarity with a set threshold to determine whether the extracted text feature information is valid text feature information of the information extraction anchor.
Preferably, the device is further configured to form, based on the extracted valid text feature information and the information extraction anchors, an information base indexed by the information extraction anchors, wherein the information base comprises a plurality of structured text information tables in table form, and one structured text information table corresponds to a plurality of information extraction anchors.
Preferably, the semantic vector conversion is performed using a BERT model and a RoBERTa model.
In addition, the present invention also provides an electronic device, wherein the electronic device includes: a processor; and a memory storing computer executable instructions that, when executed, cause the processor to perform the intelligent structured text extraction method based on multiple rounds of dialog of the present invention.
In addition, the present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the intelligent structured text extraction method based on multiple rounds of dialog according to the present invention.
Advantageous effects
Compared with the prior art, the intelligent structured text extraction method of the invention enables intelligent text extraction from multi-turn conversations and forms structured text information tables. This further optimizes text extraction, making it more effective and improving its efficiency and accuracy, and produces information data that is more visual and more structured, making it easier for business personnel to query or manage that data.
Drawings
To make the technical problems solved by the invention, the technical means adopted, and the technical effects obtained clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below only illustrate exemplary embodiments of the invention, and those skilled in the art can derive other embodiments from them without inventive effort.
FIG. 1 is a flow chart of an example of the intelligent structured text extraction method based on multi-turn dialog of the present invention.
FIG. 2 is a flow chart of another example of the intelligent structured text extraction method based on multi-turn dialog of the present invention.
Fig. 3 is a flowchart of another example of the intelligent structured text extraction method based on multi-turn dialog of the present invention.
Fig. 4 is a schematic structural block diagram of an example of the intelligent structured text extraction device based on multi-turn dialog of the present invention.
Fig. 5 is a schematic structural block diagram of another example of the intelligent structured text extraction device based on multi-turn dialog of the present invention.
Fig. 6 is a schematic structural block diagram of still another example of the intelligent structured text extraction device based on multi-turn dialog of the present invention.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention.
Fig. 8 is a block diagram of an exemplary embodiment of a computer-readable medium according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or parts in the drawings, and thus their repetitive description will be omitted.
Features, structures, characteristics or other details described in a particular embodiment do not preclude the fact that the features, structures, characteristics or other details may be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, the present invention has been described with reference to features, structures, characteristics or other details that are within the purview of one skilled in the art to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these terms should not be construed as limiting. They are used only to distinguish one element from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention.
The term "and/or" includes any and all combinations of one or more of the associated listed items.
To further optimize text extraction, the invention provides an intelligent structured text extraction method based on multiple rounds of conversations. The method enables intelligent text extraction from multi-turn conversations and forms structured text information tables, producing information data that is more visual and more structured while making it easier for business personnel to query or manage that data.
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
Example 1
An embodiment of the intelligent structured text extraction method based on multiple rounds of dialog according to the present invention will be described with reference to fig. 1 to 3.
FIG. 1 is a flow chart of an example of the intelligent structured text extraction method based on multi-turn dialog of the present invention.
As shown in fig. 1, an intelligent structured text extraction method includes the following steps.
Step S101, configuring information extraction anchor points and corresponding matching rules.
Step S102, obtaining the user voice in the history dialogue, recognizing the user voice and obtaining the voice text corresponding to the user voice.
Step S103, performing word segmentation on the user voice text, and performing text extraction on the user voice text according to the information extraction anchor and the matching rule.
Step S104, matching and screening the extracted text feature information, and establishing an information base, wherein the information base comprises a structured text information table associated with each information extraction anchor.
First, in step S101, the information extraction anchors and their corresponding matching rules are configured.
In this example, based on a service scenario, a plurality of information extraction anchors and a preset matching word set corresponding to the anchors are set, and the information extraction anchors are periodically updated.
Specifically, the service scenarios include an audit scenario, a post-loan management scenario, and the like; for example, an audit scenario for qualification review of financial products or financial service products, or a post-loan management scenario after resources have been allocated.
Further, the information extraction anchor point comprises an identity, a time, an event, a reason, a resource quota and a resource return intention.
In this example, the identity is used to represent information of the user identity, including, for example, a mobile phone number, an account name, a name, an account associate, and the like.
Specifically, events include events related to a resource request, resource return, post-loan management, or the user, and more specifically include risk events that affect the return of the resource.
Further, the reason and the time correspond to an event: the reason is the cause of the event, and the time is the specific point or period of time associated with the resource request or return.
Furthermore, the resource return intention comprises three levels, strong, medium, and weak, which represent the risk condition; a corresponding risk strategy is formulated according to the determined risk condition.
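The anchor configuration of step S101 can be sketched as a small registry. This is a minimal sketch under assumptions: the class names, the rule labels, the example preset words, and the idea of replacing the whole anchor set on each periodic update are illustrative choices, not details given in the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Set

# Hypothetical rule labels: the text only says that time and resource quota use
# the first matching rule and the other four anchors use the second.
FIRST_RULE = "first"
SECOND_RULE = "second"


@dataclass
class AnchorConfig:
    name: str
    rule: str
    # Preset matching word set for this anchor (example words are assumptions).
    preset_words: Set[str] = field(default_factory=set)


@dataclass
class AnchorRegistry:
    scenario: str  # e.g. "audit" or "post_loan" (hypothetical keys)
    anchors: Dict[str, AnchorConfig] = field(default_factory=dict)
    updated_on: Optional[date] = None

    def update(self, configs: List[AnchorConfig], today: date) -> None:
        """Replace the anchor set wholesale, as a periodic refresh might."""
        self.anchors = {c.name: c for c in configs}
        self.updated_on = today


def default_anchors() -> List[AnchorConfig]:
    # The six anchors named in the text; the word sets are illustrative only.
    return [
        AnchorConfig("identity", SECOND_RULE, {"yes", "i am"}),
        AnchorConfig("time", FIRST_RULE),
        AnchorConfig("event", SECOND_RULE, {"fell ill"}),
        AnchorConfig("reason", SECOND_RULE, {"no income"}),
        AnchorConfig("resource_quota", FIRST_RULE),
        AnchorConfig("return_intention", SECOND_RULE, {"will return", "cannot return"}),
    ]
```

A periodic job could call `registry.update(default_anchors(), date.today())` with a freshly loaded list; mapping a matched return-intention phrase to the strong, medium, or weak level would be a separate, scenario-specific step.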
Next, in step S102, the user voice in the historical dialogue is acquired and recognized to obtain the corresponding voice text.
In this example, user information of historical users, user voice input in the historical dialogue, historical business information corresponding to the historical dialogue, and the like are acquired.
Further, speech recognition is performed on the acquired user voice input to obtain the user voice text.
For example, in a post-loan management scenario, the voice text converted from a multi-turn dialog consists of two rounds. In the first round, the voice robot asks user 1 to confirm their identity, and user 1 replies, "Yes, I am." In the second round, the voice robot says that the return date for financial resource A is approaching and asks whether it can be returned on time, and user 1 replies that they have fallen ill, have no income, and will return it in three days.
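Because the later extraction steps rely on the robot prompt that precedes each user reply, it can help to keep the recognized turns paired with their context. The sketch below assumes speech recognition has already produced text with speaker labels; the `Turn` type, the speaker labels, and the English wording of the example turns are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    speaker: str  # hypothetical labels: "robot" or "user"
    text: str     # recognized text of this turn


def user_turns_with_context(dialog: List[Turn]) -> List[Tuple[str, str]]:
    """Pair each user utterance with the robot prompt that immediately precedes it."""
    pairs: List[Tuple[str, str]] = []
    last_prompt = ""
    for turn in dialog:
        if turn.speaker == "robot":
            last_prompt = turn.text
        elif turn.speaker == "user":
            pairs.append((last_prompt, turn.text))
    return pairs


# The two-round post-loan dialog from the example, paraphrased in English.
dialog = [
    Turn("robot", "Please confirm your identity."),
    Turn("user", "Yes, I am."),
    Turn("robot", "The return date for financial resource A is near. Can you return it on time?"),
    Turn("user", "I fell ill and have no income; I will return it in three days."),
]
```

Calling `user_turns_with_context(dialog)` yields one (prompt, reply) pair per round, which is the unit the segmentation and extraction steps can then work on.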
It should be noted that the above description is only for illustrative purposes, and the present invention is not limited thereto.
Next, in step S103, word segmentation processing is performed on the user voice text, and text extraction is performed on the user voice text according to the information extraction anchor and the matching rule.
In this example, each piece of user voice text is segmented into words in conjunction with the surrounding context of the dialog text.
Preferably, text data cleaning is performed before word segmentation, and processing such as punctuation removal and removal of repeated words is performed after word segmentation.
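The cleaning, segmentation, and post-segmentation steps can be sketched as follows. The regular-expression segmenter is only a stand-in that works for English examples; Chinese text would need a real segmenter (for example, jieba's `lcut` function) passed in as `segment`. Interpreting the removal of repeated words as collapsing immediately repeated tokens is an assumption.

```python
import re
from typing import Callable, List


def clean(text: str) -> str:
    """Basic data cleaning: collapse whitespace and lower-case the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def regex_segmenter(text: str) -> List[str]:
    """Stand-in segmenter: word runs and single punctuation marks become tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(text: str, segment: Callable[[str], List[str]] = regex_segmenter) -> List[str]:
    tokens = segment(clean(text))
    # Punctuation removal: drop tokens made only of non-word characters.
    tokens = [t for t in tokens if re.fullmatch(r"[^\w\s]+", t) is None]
    # Repeated-word removal (assumed meaning): collapse immediate repeats.
    deduped: List[str] = []
    for t in tokens:
        if not deduped or deduped[-1] != t:
            deduped.append(t)
    return deduped
```

For example, `preprocess("Yes,  yes I am.")` returns `["yes", "i", "am"]`. With jieba installed, `preprocess(text, segment=jieba.lcut)` would apply the same cleanup around Chinese segmentation.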
As shown in FIG. 2, the method further includes step S201 of setting the matching rules.
In step S201, a matching rule is set for text extraction.
Specifically, the matching rules include a first matching rule and a second matching rule, where the first matching rule is used to extract text feature information corresponding to time and resource quota, and the second matching rule is used to extract text feature information corresponding to identity, event, reason, and resource return intention.
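The text does not define the internals of the two matching rules. One plausible reading, used here purely as an assumption, is that the first rule matches pattern-like expressions (time phrases, amounts) with regular expressions, while the second rule looks up segmented words and two-word phrases in each anchor's preset word set. All patterns, word sets, and anchor keys below are hypothetical.

```python
import re
from typing import Dict, List, Set

# First matching rule (assumed form): regex patterns for time and resource quota.
FIRST_RULE_PATTERNS = {
    "time": [
        re.compile(r"\b(?:in|after) \w+ days?\b"),
        re.compile(r"\b(?:today|tomorrow|next week)\b"),
    ],
    "resource_quota": [re.compile(r"\b\d+(?:\.\d+)? ?(?:yuan|rmb)\b")],
}

# Second matching rule (assumed form): preset word sets for the other anchors.
SECOND_RULE_WORDS: Dict[str, Set[str]] = {
    "identity": {"yes", "i am"},
    "event": {"fell ill"},
    "reason": {"no income"},
    "return_intention": {"will return", "cannot return"},
}


def ngrams(tokens: List[str], max_n: int = 2) -> List[str]:
    """All contiguous 1..max_n token phrases, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def extract(anchor: str, text: str, tokens: List[str]) -> List[str]:
    """Dispatch to the first or second rule depending on the anchor."""
    if anchor in FIRST_RULE_PATTERNS:
        return [m.group(0) for p in FIRST_RULE_PATTERNS[anchor] for m in p.finditer(text)]
    if anchor in SECOND_RULE_WORDS:
        words = SECOND_RULE_WORDS[anchor]
        return [g for g in ngrams(tokens) if g in words]
    return []  # unknown anchor: nothing extracted
```

The split keeps open-ended expressions such as dates and amounts, which a fixed word list cannot enumerate, on the pattern side, and closed vocabularies on the lookup side.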
Further, text extraction is performed on the segmented user voice text according to the information extraction anchor and the matching rules.
For example, in the first round of dialog, the voice robot asks user 1 to confirm their identity and user 1 replies, "Yes, I am." In this round, the information extraction anchor is determined from the context to be "identity", and word segmentation splits the reply into three words. Because the anchor is "identity", the second matching rule is used for text extraction, and according to it the identity-confirming word is extracted from these three words.
As another example, in the second round of dialog, the voice robot says that the return date for financial resource A is approaching and asks whether it can be returned on time, and user 1 replies that they have fallen ill, have no income, and will return it in three days. In this round, the information extraction anchor is "time", so the first matching rule is used. Word segmentation splits the reply into eight words, and text extraction is then performed on the segmented text according to the first matching rule, which extracts "in three days" from these eight words.
Preferably, when the information extraction anchor is "time", the associated "event" and/or "reason" anchors are also applied, and a secondary or tertiary text extraction is performed on the same utterance for the corresponding event. This makes text extraction more effective and improves extraction efficiency and accuracy.
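The secondary or tertiary extraction described above can be sketched as a follow-up table: once the "time" anchor yields a result, the "event" and "reason" anchors are run over the same segmented reply. Triggering the follow-ups only when the primary pass finds something, the follow-up table itself, and the simple stand-in rules are assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[List[str]], List[str]]


def phrase_rule(words: set) -> Rule:
    """Second-rule stand-in: match 1- and 2-word phrases against a word set."""
    def run(tokens: List[str]) -> List[str]:
        grams = tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]
        return [g for g in grams if g in words]
    return run


def time_rule(tokens: List[str]) -> List[str]:
    """First-rule stand-in: a single regex over the re-joined tokens."""
    return re.findall(r"\b(?:in|after) \w+ days?\b", " ".join(tokens))


RULES: Dict[str, Rule] = {
    "time": time_rule,
    "event": phrase_rule({"fell ill"}),
    "reason": phrase_rule({"no income"}),
}
# Hypothetical follow-up table: a "time" hit triggers "event" and "reason" passes.
FOLLOW_UPS: Dict[str, List[str]] = {"time": ["event", "reason"]}


def extract_with_follow_ups(anchor: str, tokens: List[str]) -> Dict[str, List[str]]:
    results = {anchor: RULES[anchor](tokens)}
    if results[anchor]:  # chain further passes only when the primary pass succeeds
        for extra in FOLLOW_UPS.get(anchor, []):
            results[extra] = RULES[extra](tokens)
    return results
```

On the second-round reply, a "time" pass that finds "in three days" also records "fell ill" as the event and "no income" as the reason, so one utterance fills three anchors.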
It should be noted that, in the present example, the multi-turn dialog is a two-turn dialog, but the present invention is not limited thereto, and in other examples, the multi-turn dialog may also be a three-turn dialog, a four-turn dialog, or more. The foregoing is illustrative only and is not to be construed as limiting the invention.
Next, in step S104, the extracted text feature information is subjected to matching screening, and an information base is established, which includes a structured text information table associated with each information extraction anchor point.
In this example, keywords are extracted from the user voice text information, and the extracted keywords are matched with a preset matching word set to determine whether the extracted keywords are effective text feature information of the information extraction anchor point.
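The keyword check against the preset matching word set amounts to a membership test per anchor. A minimal sketch, assuming case-insensitive exact matching and a plain dictionary of word sets (the function name and example words are hypothetical):

```python
from typing import Dict, List, Set, Tuple


def screen_keywords(anchor: str, keywords: List[str],
                    preset: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Split keywords into (valid, rejected) for one anchor by exact match."""
    words = preset.get(anchor, set())
    valid: List[str] = []
    rejected: List[str] = []
    for kw in keywords:
        # Normalize before comparing; keep the original spelling in the output.
        (valid if kw.strip().lower() in words else rejected).append(kw)
    return valid, rejected
```

For example, `screen_keywords("reason", ["No income", "tomorrow"], {"reason": {"no income"}})` returns `(["No income"], ["tomorrow"])`. Exact matching misses paraphrases, which is where a semantic-similarity check can help.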
Further, semantic vector conversion is performed on the extracted text feature information, and its semantic vector similarity to each matching word in the preset matching word set is calculated.
Preferably, the semantic vector conversion is performed using a BERT model and a RoBERTa model.
It should be noted that, for semantic vector conversion, in other examples, a DistilBERT model, an XLNet model, or the like may also be used. The foregoing is illustrative only and is not to be construed as limiting the invention.
In this example, the calculated semantic vector similarity is compared with a set threshold to determine whether the extracted text feature information is valid text feature information of the information extraction anchor.
On one hand, when the calculated semantic vector similarity is greater than the set threshold, the extracted text feature information is judged to be valid text feature information of the information extraction anchor.
On the other hand, when the calculated semantic vector similarity is less than or equal to the set threshold, the extracted text feature information is judged not to be valid text feature information of the information extraction anchor.
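The threshold test can be sketched with cosine similarity. The text names BERT and RoBERTa for producing the vectors; here the encoder is a hypothetical toy lookup table so the example runs without a model. Using cosine similarity, taking the maximum over the preset word set, and the threshold value 0.8 are all assumptions about details the text leaves open.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def is_valid(candidate: str, preset_words: Iterable[str],
             encode: Callable[[str], Vector], threshold: float) -> Tuple[bool, float]:
    """Valid only if the best similarity to a preset word is strictly above threshold."""
    cv = encode(candidate)
    best = max((cosine(cv, encode(w)) for w in preset_words), default=0.0)
    return best > threshold, best


# Toy 3-d "embeddings" standing in for BERT/RoBERTa sentence vectors.
TOY_VECTORS = {
    "fell ill": (0.9, 0.1, 0.0),
    "got sick": (0.85, 0.15, 0.05),
    "no income": (0.1, 0.9, 0.0),
    "tomorrow": (0.0, 0.1, 0.95),
}
encode = TOY_VECTORS.__getitem__
```

With these toy vectors, `is_valid("got sick", {"fell ill", "no income"}, encode, 0.8)` accepts the paraphrase (similarity about 0.996 to "fell ill"), while "tomorrow" is rejected. In practice `encode` would wrap a pretrained encoder, and the threshold would need tuning on labeled examples.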
Further, the judged effective text characteristic information is stored in a structured text information table of the corresponding information extraction anchor point.
Further, based on the extracted effective text feature information and the information extraction anchor, an information base indexed by the information extraction anchor is formed.
In this example, the information base includes a structured text information table associated with each information extraction anchor.
Preferably, the structured text information table is in a table form, and one structured text information table corresponds to a plurality of information extraction anchors, which is specifically shown in the following table.
[Table: example structured text information table (image not reproduced)]
The table above is a structured text information table formed by applying the intelligent structured text extraction method of the invention. It contains a plurality of information extraction anchors, namely identity, time, event, reason, resource quota, and resource return intention, together with the information extracted for each. Forming such a table makes the information data more visual and more structured, and makes it easier for business personnel to query or manage that data.
It should be noted that this structured text information table is described as a preferred example and is not to be construed as limiting the invention. In other examples, it may also include risk information, income information, whether there is overdue or default information within a predetermined time period, and the like.
The steps of the method described above merely illustrate the invention; their order and number are not particularly limited. A step may be split into two (for example, S104 may be split into S104 and S301, as shown in FIG. 3) or three steps, or several steps may be combined into one, adjusted according to the actual application.
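An information base indexed by anchor can be sketched as rows of anchor values plus, per anchor, an inverted index from each value to the rows containing it. The row layout, the `dialog_id` column, and the example values (drawn from the two-round dialog example) are illustrative assumptions, not the layout of the original table.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional

ANCHORS = ("identity", "time", "event", "reason", "resource_quota", "return_intention")


class InformationBase:
    """Structured rows plus a per-anchor index: anchor -> value -> row numbers."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Optional[str]]] = []
        self.index: Dict[str, DefaultDict[str, List[int]]] = {
            a: defaultdict(list) for a in ANCHORS
        }

    def add_row(self, dialog_id: str, values: Dict[str, str]) -> int:
        unknown = set(values) - set(ANCHORS)
        if unknown:
            raise ValueError(f"unknown anchors: {sorted(unknown)}")
        row: Dict[str, Optional[str]] = {"dialog_id": dialog_id}
        row.update({a: values.get(a) for a in ANCHORS})  # missing anchors stay None
        self.rows.append(row)
        row_no = len(self.rows) - 1
        for anchor, value in values.items():
            self.index[anchor][value].append(row_no)
        return row_no

    def lookup(self, anchor: str, value: str) -> List[Dict[str, Optional[str]]]:
        # .get avoids inserting empty entries into the defaultdict
        return [self.rows[i] for i in self.index[anchor].get(value, [])]
```

For example, after `add_row("d1", {"time": "in three days", "event": "fell ill"})`, a call to `lookup("event", "fell ill")` returns that row, which is the kind of query business personnel would run against the table.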
Those skilled in the art will appreciate that all or some of the steps of the above embodiments may be implemented as programs (computer programs) executed by a computer data processing apparatus; when the computer program is executed, the method provided by the invention is carried out. The computer program may be stored in a computer-readable storage medium, such as a magnetic disk, an optical disk, a ROM, a RAM, or a storage array composed of multiple storage media (for example, a magnetic disk or magnetic tape storage array). The storage medium is not limited to centralized storage; it may also be distributed storage, such as cloud storage based on cloud computing.
Embodiments of the apparatus of the present invention are described below, which may be used to perform method embodiments of the present invention. The details described in the device embodiments of the invention should be regarded as complementary to the above-described method embodiments; reference is made to the above-described method embodiments for details not disclosed in the apparatus embodiments of the invention.
Example 2
Referring to figs. 4, 5 and 6, the present invention further provides an intelligent structured text extraction apparatus 400 based on multiple rounds of conversations. The apparatus 400 comprises: a configuration module 401, configured to configure information extraction anchors and the matching rules corresponding to them; an obtaining module 402, configured to obtain user voice from a historical conversation, recognize it, and obtain the corresponding voice text; a processing module 403, configured to perform word segmentation on the user voice text and to extract text from it according to the information extraction anchors and matching rules; and a screening and establishing module 404, configured to match and screen the extracted text feature information and to establish an information base that includes a structured text information table associated with each information extraction anchor.
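The four modules of apparatus 400 can be pictured as a small pipeline. The sketch below is a minimal illustration under stated assumptions, not the patented implementation: speech recognition is replaced by a stub that decodes UTF-8 bytes, word segmentation is a regex tokenizer (real Chinese transcripts would need a proper word segmenter), and the class `ExtractionPipeline`, its methods, and the example rules are hypothetical names.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rule type: takes segmented tokens and the raw text, returns candidate values.
MatchingRule = Callable[[List[str], str], List[str]]


class ExtractionPipeline:
    """Illustrative stand-in for modules 401-404; not the patented implementation."""

    def __init__(self, rules: Dict[str, MatchingRule]):
        # Configuration module 401: information extraction anchors -> matching rules.
        self.rules = rules
        # Screening and establishing module 404 appends one row per processed utterance.
        self.info_base: List[Dict[str, List[str]]] = []

    def recognize(self, audio: bytes) -> str:
        # Obtaining module 402: ASR stub (assumption) that treats the audio as UTF-8 text.
        return audio.decode("utf-8")

    def segment(self, text: str) -> List[str]:
        # Processing module 403, step 1: placeholder tokenizer for illustration only.
        return re.findall(r"\w+", text)

    def process_turn(self, audio: bytes) -> Dict[str, List[str]]:
        text = self.recognize(audio)
        tokens = self.segment(text)
        # Processing module 403, step 2: run every anchor's matching rule.
        extracted = {anchor: rule(tokens, text) for anchor, rule in self.rules.items()}
        # Module 404 (minimal): keep only anchors that produced a value.
        row = {anchor: values for anchor, values in extracted.items() if values}
        self.info_base.append(row)
        return row


if __name__ == "__main__":
    rules = {
        "time": lambda tokens, text: re.findall(r"\d{4}-\d{2}-\d{2}", text),
        "reason": lambda tokens, text: [t for t in tokens if t in {"sick", "unemployed"}],
    }
    pipeline = ExtractionPipeline(rules)
    print(pipeline.process_turn("I was sick until 2020-10-30".encode("utf-8")))
    # prints {'time': ['2020-10-30'], 'reason': ['sick']}
```

Keeping the rules as plain callables keyed by anchor mirrors the idea that anchors and rules are configured up front and can be updated without touching the rest of the pipeline.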
Preferably, the apparatus is further configured to set, based on the business scenario, a plurality of information extraction anchors and the preset matching word sets corresponding to them, and to update the information extraction anchors periodically.
Preferably, the information extraction anchors include identity, time, event, reason, resource quota, and resource return intention.
As shown in fig. 5, the apparatus further includes a setting module 501 configured to set the matching rules. The matching rules include a first matching rule, used to extract the text feature information corresponding to time and resource quota, and a second matching rule, used to extract the text feature information corresponding to identity, event, reason, and resource return intention.
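One way to read the split between the two matching rules is that time and resource-quota values follow regular surface patterns, while identity, event, reason, and return intention are better caught by keyword lists. The sketch below assumes exactly that; the regular expressions, the English keyword sets, and the function names `apply_first_rule` and `apply_second_rule` are all hypothetical and would have to be replaced by patterns and vocabularies for the actual (Chinese) business scenario.

```python
import re
from typing import Dict, Iterable, List

# First matching rule (assumed regex-based): time and resource-quota expressions.
FIRST_RULE_PATTERNS = {
    "time": re.compile(r"\b(?:today|tomorrow|next (?:week|month))\b", re.IGNORECASE),
    "resource_quota": re.compile(r"\b\d+(?:,\d{3})*(?:\.\d+)?\s*(?:yuan|dollars?)\b", re.IGNORECASE),
}

# Second matching rule (assumed keyword-based): hypothetical vocabularies per anchor.
SECOND_RULE_KEYWORDS = {
    "identity": {"myself", "wife", "husband", "friend"},
    "event": {"hospital", "accident", "layoff"},
    "reason": {"sick", "unemployed", "busy"},
    "resource_return_intention": {"repay", "refuse", "delay"},
}


def apply_first_rule(text: str) -> Dict[str, List[str]]:
    # findall returns whole matches because the patterns only use non-capturing groups.
    return {anchor: pattern.findall(text) for anchor, pattern in FIRST_RULE_PATTERNS.items()}


def apply_second_rule(tokens: Iterable[str]) -> Dict[str, List[str]]:
    # Case-insensitive keyword lookup over the segmented tokens.
    lowered = [t.lower() for t in tokens]
    return {
        anchor: [t for t in lowered if t in words]
        for anchor, words in SECOND_RULE_KEYWORDS.items()
    }


if __name__ == "__main__":
    text = "My wife was sick, I will repay 3,000 yuan next week"
    tokens = re.findall(r"\w+", text)
    print(apply_first_rule(text))
    # prints {'time': ['next week'], 'resource_quota': ['3,000 yuan']}
    print(apply_second_rule(tokens))
```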
As shown in fig. 6, the apparatus further includes an extraction module 601 configured to extract keywords and match them against the preset matching word set to determine whether each extracted keyword is effective text feature information for an information extraction anchor, and to store the effective text feature information in the structured text information table of the corresponding information extraction anchor.
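The check performed by the extraction module can be sketched as a set-membership test: a keyword counts as effective text feature information for an anchor only if it appears in that anchor's preset matching word set, and only then is it written to that anchor's table. In the sketch below, keyword extraction is reduced to stopword removal with de-duplication; the stopword list, word sets, and function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set

# Hypothetical stopword list used to reduce tokens to candidate keywords.
STOPWORDS = {"i", "my", "was", "is", "the", "a", "will", "to"}

# Hypothetical preset matching word sets, one per information extraction anchor.
PRESET_WORDS: Dict[str, Set[str]] = {
    "identity": {"myself", "wife", "husband"},
    "reason": {"sick", "unemployed"},
}


def extract_keywords(tokens: Iterable[str]) -> List[str]:
    """Lower-case, drop stopwords, and de-duplicate while keeping first-seen order."""
    keywords: List[str] = []
    for token in tokens:
        word = token.lower()
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


def screen_and_store(
    keywords: Iterable[str],
    preset: Dict[str, Set[str]],
    table: DefaultDict[str, List[str]],
) -> DefaultDict[str, List[str]]:
    """Store a keyword under every anchor whose preset word set contains it; drop the rest."""
    for word in keywords:
        for anchor, words in preset.items():
            if word in words:
                table[anchor].append(word)
    return table


if __name__ == "__main__":
    keywords = extract_keywords(["My", "wife", "was", "sick", "sick", "yesterday"])
    print(keywords)  # ['wife', 'sick', 'yesterday']
    table = screen_and_store(keywords, PRESET_WORDS, defaultdict(list))
    print(dict(table))  # {'identity': ['wife'], 'reason': ['sick']}
```

Exact set lookup is cheap but misses synonyms and ASR variants, which is the gap the semantic-similarity screening of the calculating module is meant to cover.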
Preferably, the apparatus further includes a calculating module 502 configured to convert the extracted text feature information into semantic vectors, calculate the semantic vector similarity between it and each matching word in the preset matching word set, and compare the calculated similarity with a set threshold to determine whether the extracted text feature information is effective text feature information for the information extraction anchor.
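The threshold test of calculating module 502 can be illustrated with cosine similarity, a common but here assumed choice of similarity measure. In practice the vectors would come from an encoder such as the BERT or RoBERTa models mentioned in the text. The sketch below uses tiny hand-made vectors instead so that it runs without a model, treats the best similarity over the matching word set as the deciding score (also an assumption), and uses a hypothetical default threshold of 0.8.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def screen_by_similarity(
    candidate_vec: Sequence[float],
    matching_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical value; the text only says "a set threshold"
) -> Tuple[bool, Optional[str], float]:
    """Compare the candidate with every matching word; accept if the best score reaches the threshold."""
    best_word: Optional[str] = None
    best_score = 0.0
    for word, vec in matching_vecs.items():
        score = cosine_similarity(candidate_vec, vec)
        if score > best_score:
            best_word, best_score = word, score
    return best_score >= threshold, best_word, best_score


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for BERT/RoBERTa output.
    matching = {"sick": [1.0, 0.0, 0.0], "unemployed": [0.0, 1.0, 0.0]}
    print(screen_by_similarity([0.9, 0.1, 0.0], matching))  # accepted, closest to "sick"
    print(screen_by_similarity([0.0, 0.0, 1.0], matching))  # (False, None, 0.0)
```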
Preferably, the apparatus is further configured to form, from the extracted effective text feature information and the information extraction anchors, an information base indexed by information extraction anchor. The information base includes a plurality of structured text information tables, and one structured text information table corresponds to a plurality of information extraction anchors.
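An information base indexed by information extraction anchor, in which each table covers several anchors, can be sketched as a dictionary of tables plus an inverted index from anchor to (table, row) positions. The class `InformationBase`, its methods, and the table name used below are hypothetical; a production system would more likely sit on a relational or document database.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class InformationBase:
    """Hypothetical anchor-indexed store: each table has one column per anchor it covers."""

    def __init__(self) -> None:
        self.columns: Dict[str, List[str]] = {}
        self.tables: Dict[str, List[Dict[str, str]]] = {}
        # Inverted index: anchor -> positions of rows holding a non-empty value for it.
        self.index: DefaultDict[str, List[Tuple[str, int]]] = defaultdict(list)

    def create_table(self, name: str, anchors: List[str]) -> None:
        self.columns[name] = list(anchors)
        self.tables[name] = []

    def add_row(self, table: str, values: Dict[str, str]) -> int:
        # Anchors the table does not cover are ignored; missing ones become "".
        row = {anchor: values.get(anchor, "") for anchor in self.columns[table]}
        self.tables[table].append(row)
        row_id = len(self.tables[table]) - 1
        for anchor, value in row.items():
            if value:
                self.index[anchor].append((table, row_id))
        return row_id

    def lookup(self, anchor: str) -> List[Tuple[str, int, str]]:
        """Return (table, row id, value) for every stored value of the given anchor."""
        return [(t, i, self.tables[t][i][anchor]) for t, i in self.index.get(anchor, [])]


if __name__ == "__main__":
    base = InformationBase()
    base.create_table("conversation_summary", ["identity", "time", "reason", "resource_quota"])
    base.add_row("conversation_summary", {"identity": "wife", "reason": "sick", "resource_quota": "3,000 yuan"})
    print(base.lookup("reason"))  # [('conversation_summary', 0, 'sick')]
    print(base.lookup("time"))    # []
```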
Preferably, semantic vector conversion is performed using a BERT model and a RoBERTa model.
In embodiment 2, parts that are the same as in embodiment 1 are not described again.
Compared with the prior art, the intelligent structured text extraction apparatus extracts text intelligently from multi-round conversations and forms structured text information tables. This makes text extraction more effective, improves extraction efficiency and accuracy, yields more visual and more structured information data, and makes it easier for business personnel to query or manage that data.
Those skilled in the art will appreciate that the modules in the above apparatus embodiments may be distributed in the apparatus as described, or may be correspondingly modified and distributed in one or more apparatuses different from those of the above embodiments. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
Example 3
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. An electronic device 200 according to the invention is described below with reference to fig. 7. The electronic device 200 shown in fig. 7 is only an example and should not be construed as limiting the functions or scope of use of the embodiments of the invention.
As shown in fig. 7, the electronic device 200 is embodied in the form of a general purpose computing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one memory unit 220, a bus 230 connecting different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.
The storage unit stores program code that can be executed by the processing unit 210, so that the processing unit 210 performs the steps of the various exemplary embodiments of the invention described in the method section of this specification above. For example, the processing unit 210 may perform the steps shown in fig. 1.
The memory unit 220 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 2201 and/or a cache memory unit 2202, and may further include a read only memory unit (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 230 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 200 may also communicate with one or more external devices 300 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the invention described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the invention can therefore be embodied as a software product, which may be stored on a computer-readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above method according to the invention. When executed by a data processing apparatus, the computer program stored on the computer-readable medium carries out the above methods of the invention.
As shown in fig. 8, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal carrying readable program code, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
While the foregoing embodiments describe the objects, aspects, and advantages of the invention in further detail, it should be understood that the invention is not inherently tied to any particular computer, virtual machine, or electronic device, and various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, changes, and equivalents within its spirit and scope are intended to be covered.

Claims (10)

1. An intelligent structured text extraction method based on multiple rounds of conversations is characterized by comprising the following steps:
configuring information extraction anchor points and corresponding matching rules thereof;
acquiring user voice in a historical conversation, recognizing the user voice, and obtaining a voice text corresponding to the user voice;
performing word segmentation processing on a user voice text, and performing text extraction on the user voice text according to an information extraction anchor point and a matching rule;
and matching and screening the extracted text characteristic information, and establishing an information base which comprises a structured text information table associated with each information extraction anchor point.
2. The intelligent structured text extraction method according to claim 1, further comprising:
setting a plurality of information extraction anchors and preset matching word sets corresponding to the anchors based on a service scene, and regularly updating the information extraction anchors.
3. The intelligent structured text extraction method according to claim 1 or 2, wherein the information extraction anchor points comprise identity, time, event, reason, resource quota, and resource return intention.
4. The intelligent structured text extraction method according to claim 3, further comprising:
setting matching rules, wherein the matching rules comprise a first matching rule and a second matching rule, the first matching rule is used for extracting text characteristic information corresponding to time and resource quota, and the second matching rule is used for extracting text characteristic information corresponding to identity, event, reason, and resource return intention.
5. The intelligent structured text extraction method according to claim 4, wherein the matching and screening the extracted text feature information comprises:
extracting keywords, and matching the extracted keywords with a preset matching word set to judge whether the extracted keywords are effective text characteristic information of the information extraction anchor points;
and storing the effective text characteristic information to a structured text information table of the corresponding information extraction anchor point.
6. The intelligent structured text extraction method according to claim 4, wherein the matching and screening the extracted text feature information comprises:
performing semantic vector conversion on the extracted text characteristic information, and calculating its semantic vector similarity to each matching word in a preset matching word set;
and comparing the calculated semantic vector similarity with a set threshold value to judge whether the extracted text characteristic information is the effective text characteristic information of the information extraction anchor point.
7. The intelligent structured text extraction method according to claim 5 or 6, further comprising:
forming, based on the extracted effective text characteristic information and the information extraction anchor points, an information base indexed by the information extraction anchor points, wherein the information base comprises a plurality of structured text information tables, and one structured text information table corresponds to a plurality of information extraction anchor points.
8. An intelligent structured text extraction device based on multi-turn dialogue, which is characterized by comprising:
the configuration module is used for configuring the information extraction anchor point and the corresponding matching rule;
the acquisition module is used for acquiring the user voice in the historical conversation, identifying the user voice and obtaining a voice text corresponding to the user voice;
the processing module is used for performing word segmentation processing on the user voice text and performing text extraction on the user voice text according to the information extraction anchor point and the matching rule;
and the screening and establishing module is used for matching and screening the extracted text characteristic information and establishing an information base, wherein the information base comprises a structured text information table associated with each information extraction anchor point.
9. An electronic device, wherein the electronic device comprises:
a processor; and
memory storing computer-executable instructions that, when executed, cause the processor to perform the intelligent structured text extraction method based on multi-turn dialogs of any one of claims 1 to 7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the intelligent structured text extraction method based on multiple rounds of dialog of any one of claims 1 to 7.
CN202011193595.9A 2020-10-30 2020-10-30 Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment Pending CN112016327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011193595.9A CN112016327A (en) 2020-10-30 2020-10-30 Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011193595.9A CN112016327A (en) 2020-10-30 2020-10-30 Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment

Publications (1)

Publication Number Publication Date
CN112016327A true CN112016327A (en) 2020-12-01

Family

ID=73527731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011193595.9A Pending CN112016327A (en) 2020-10-30 2020-10-30 Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment

Country Status (1)

Country Link
CN (1) CN112016327A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378579A (en) * 2021-05-31 2021-09-10 五八到家有限公司 Method, system and electronic equipment for voice input of structured data
CN113535900A (en) * 2021-07-08 2021-10-22 李刚 Target information extraction method, electronic device, and computer-readable storage medium
CN113779228A (en) * 2021-11-15 2021-12-10 北京明略昭辉科技有限公司 Information processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109408526B (en) SQL sentence generation method, device, computer equipment and storage medium
CN112016327A (en) Intelligent structured text extraction method and device based on multiple rounds of conversations and electronic equipment
CN109086026B (en) Broadcast voice determination method, device and equipment
CN110019742B (en) Method and device for processing information
KR20220000046A (en) System and method for manufacturing conversational intelligence service providing chatbot
CN112016275A (en) Intelligent error correction method and system for voice recognition text and electronic equipment
CN110689880A (en) Voice recognition method and device applied to power dispatching field
CN112035626A (en) Rapid identification method and device for large-scale intentions and electronic equipment
CN113205814B (en) Voice data labeling method and device, electronic equipment and storage medium
CN112015402A (en) Method and device for quickly establishing service scene and electronic equipment
CN112100339A (en) User intention recognition method and device for intelligent voice robot and electronic equipment
CN110738055A (en) Text entity identification method, text entity identification equipment and storage medium
CN113836925A (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN113140219A (en) Regulation and control instruction generation method and device, electronic equipment and storage medium
CN113407610A (en) Information extraction method and device, electronic equipment and readable storage medium
CN114841128B (en) Business interaction method, device, equipment, medium and product based on artificial intelligence
CN112417875B (en) Configuration information updating method and device, computer equipment and medium
CN111949777A (en) Intelligent voice conversation method and device based on crowd classification and electronic equipment
CN108920715B (en) Intelligent auxiliary method, device, server and storage medium for customer service
CN113569578B (en) User intention recognition method and device and computer equipment
CN115658903A (en) Text classification method, model training method, related device and electronic equipment
CN114118937A (en) Information recommendation method and device based on task, electronic equipment and storage medium
CN112925889A (en) Natural language processing method, device, electronic equipment and storage medium
CN112287078A (en) Multi-sentence matching method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination