CN114117017A - Session information extraction method, system, device and storage medium - Google Patents

Info

Publication number
CN114117017A
Authority
CN
China
Prior art keywords
time
time information
text
session
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111481562.9A
Other languages
Chinese (zh)
Inventor
吴盈娇
江小林
罗超
邹宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Travel Information Technology Shanghai Co Ltd
Original Assignee
Ctrip Travel Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Travel Information Technology Shanghai Co Ltd
Priority to CN202111481562.9A
Publication of CN114117017A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a session information extraction method, system, device and storage medium. The method comprises the following steps: collecting a session text to be processed, and judging, based on a text classification model, whether a time entity exists in the session text; if a time entity exists in the session text, extracting at least one piece of candidate time information from the time entity based on a preset time extraction rule; sorting the candidate time information based on a preset candidate time information sorting rule; and selecting at least one piece of candidate time information as the recorded time information according to the sorting result. The invention enables time information to be extracted from a session quickly and accurately.

Description

Session information extraction method, system, device and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, a system, a device, and a storage medium for extracting session information.
Background
Fine-grained entity recognition mainly works by recognizing how an entity is expressed in context. Most previous methods use a sequence labeling model: after the text is segmented, the model judges in turn whether each word is part of an entity, so that each word is assigned a classification category. For example, in "I/0 half/1 hour/1 within/0 reply to you/0", 1 marks a word that is part of an entity and 0 marks a non-entity word; consecutive words labeled 1 are spliced together to obtain the entity expression fragment "half hour". Commonly used sequence labeling models include LSTM (Long Short-Term Memory network) and LSTM-CRF (LSTM with a Conditional Random Field). However, this approach cannot extract a specific time value, so a rule-based method is still required for that purpose. In addition, sequence labeling suffers from time-consuming and inconsistent annotation.
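The splicing step described above can be illustrated with a small sketch. It assumes the 0/1 labels have already been produced by some sequence labeling model; the function name `splice_entities` is hypothetical and not taken from the patent.

```python
from typing import List, Tuple


def splice_entities(tagged: List[Tuple[str, int]]) -> List[str]:
    """Join each run of consecutive tokens labeled 1 into one entity fragment."""
    fragments: List[str] = []
    current: List[str] = []
    for token, label in tagged:
        if label == 1:
            current.append(token)
        elif current:
            # A 0 label closes the run collected so far.
            fragments.append(" ".join(current))
            current = []
    if current:
        fragments.append(" ".join(current))
    return fragments


tagged = [("I", 0), ("half", 1), ("hour", 1), ("within", 0), ("reply to you", 0)]
print(splice_entities(tagged))  # ['half hour']
```

The output is only the text fragment "half hour"; turning it into a concrete timestamp is the part that sequence labeling alone does not provide.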
Disclosure of Invention
In view of the problems in the prior art, an object of the present invention is to provide a method, a system, a device and a storage medium for extracting session information, so as to quickly and accurately extract time information from a session.
The embodiment of the invention provides a session information extraction method, which comprises the following steps:
collecting a session text to be processed, and judging, based on a text classification model, whether a time entity exists in the session text;
if a time entity exists in the session text, extracting at least one piece of candidate time information from the time entity based on a preset time extraction rule;
sorting the candidate time information based on a preset candidate time information sorting rule;
and selecting at least one piece of candidate time information as the recorded time information according to the sorting result of the candidate time information.
In some embodiments, the text classification model is a pre-trained binary classification model;
after judging whether the time entity exists in the conversation text based on the text classification model, the method further comprises the following steps:
if the time entity does not exist in the session text, ending the time information extraction process of the current session text to be processed;
and if the time entity exists in the session text, acquiring the position of the time entity identified by the text classification model in the session text.
In some embodiments, if a time entity exists in the session text, before extracting at least one candidate time information from the time entity based on a preset time extraction rule, the method further includes the following steps:
and preprocessing the conversation text based on a preset data preprocessing rule to obtain a preprocessed conversation text.
In some embodiments, the pre-processing the conversation text based on the preset data pre-processing rule includes the following steps:
acquiring the position of the time entity identified by the text classification model in a conversation text;
judging whether a plurality of continuous time entities exist in the session text;
if a plurality of continuous time entities exist, identifying the time expression boundaries among the plurality of continuous time entities to obtain a plurality of separated time entities;
determining whether a relationship between a plurality of separate time entities is a continuous expression relationship or a repetitive expression relationship;
if the relation is a continuous expression relation, adding preset connection words among a plurality of separated time entities of the continuous expression relation;
if the relation is a repeated expression relation, selecting one time entity from a plurality of separated time entities of the repeated expression relation, and deleting the unselected time entities.
In some embodiments, the pre-processing the conversation text based on the preset data pre-processing rule includes the following steps:
and carrying out normalized expression on the expression of the time entity in the conversation text.
In some embodiments, the pre-processing the conversation text based on the preset data pre-processing rule includes the following steps:
obtaining a context text of a time entity;
judging whether the time entity belongs to an interference time entity or not according to the context text of the time entity by adopting a preset interference time judgment rule;
if so, deleting the time entity.
In some embodiments, the extracting at least one candidate time information from the time entity based on the preset time extraction rule includes the following steps:
acquiring a preset date and time identification template, and determining the attribute to be filled in the template;
extracting attribute values corresponding to attributes needing to be filled in the template from the time entity of the session text;
and generating candidate time information based on the attribute values and the date and time identification template.
In some embodiments, ranking the candidate time information comprises:
acquiring a first score expressing the accuracy of each candidate time information and/or acquiring a second score of the position of the candidate time information in the text;
obtaining a confidence degree of the candidate time information according to the first score and/or the second score;
and sorting the candidate time information from high to low according to the confidence degree of the candidate time information.
In some embodiments, selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information includes the following steps:
and selecting the candidate time information with the highest confidence as the recorded time information according to the sorting result of the candidate time information.
In some embodiments, obtaining a first score expressing an accuracy of each of the candidate time information comprises:
obtaining a context text of the candidate time information;
judging the priority corresponding to the context text according to a preset priority hit rule;
determining a first score of the candidate time information according to the priority corresponding to the context text;
acquiring a second score for the position at which the candidate time information appears in the text comprises the following steps:
acquiring the position of the candidate time information in a text;
and determining a second score of the candidate time information according to the appearing position, wherein the closer the appearing position is to the end of the session, the higher the corresponding second score is.
In some embodiments, after selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information, the method further comprises the following steps:
creating a reservation event according to the session text, and recording the relationship between the reservation event and customer service;
using the recorded time information as the reserved time information of the reserved event;
when the current time reaches the time point corresponding to the recorded time information, reminding the customer service;
and carrying out statistics on the relationship between the completion time and the reservation time of all reservation events of the customer service in a preset time period, and generating a customer service assessment score according to a statistical result.
The embodiment of the invention also provides a session information extraction system, which is used for realizing the session information extraction method, and the system comprises:
the time entity identification module is used for acquiring a session text to be processed and judging whether a time entity exists in the session text based on a text classification model;
the candidate time extraction module is used for extracting at least one candidate time information from the time entity based on a preset time extraction rule if the time entity exists in the session text;
the candidate time sorting module is used for sorting the candidate time information based on a preset candidate time information sorting rule;
and the time information recording module is used for selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information.
An embodiment of the present invention further provides a session information extraction device, including:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the session information extraction method via execution of the executable instructions.
The embodiment of the present invention further provides a computer-readable storage medium for storing a program, where the program is executed by a processor to implement the steps of the session information extraction method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The session information extraction method, the system, the equipment and the storage medium have the following beneficial effects:
the invention first analyzes the session text to determine whether a time entity exists in it, extracts candidate time information if a time entity exists, and then sorts the candidate time information and selects from it, thereby obtaining accurate time information for recording.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
Fig. 1 is a flowchart of a session information extraction method according to an embodiment of the present invention;
FIGS. 2-4 are flow diagrams illustrating the pre-processing of session information according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating a manner in which the session information extraction method is deployed to an online application according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a session information extraction system according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a session information extraction device according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in fig. 1, an embodiment of the present invention provides a session information extraction method, including the following steps:
S100: collecting a session text to be processed, and judging, based on a text classification model, whether a time entity exists in the session text;
S200: if a time entity exists in the session text, extracting at least one piece of candidate time information from the time entity based on a preset time extraction rule;
S300: sorting the candidate time information based on a preset candidate time information sorting rule;
S400: selecting at least one piece of candidate time information as the recorded time information according to the sorting result of the candidate time information.
In this session information extraction method, step S100 first analyzes the session text to determine whether a time entity exists in it; if so, step S200 extracts candidate time information; steps S300 and S400 then sort the candidate time information and select from it, so that accurate time information is obtained for recording.
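The control flow of steps S100 to S400 can be sketched as follows. This is only an outline under stated assumptions: the three callables are placeholders for the text classification model, the rule-based extractor and the confidence scorer, and the toy stand-ins in the usage lines (keyword check, regular expression, position-based score) are hypothetical and far simpler than what the patent describes.

```python
import re
from typing import Callable, List, Optional


def extract_recorded_time(
    session_text: str,
    has_time_entity: Callable[[str], bool],
    extract_candidates: Callable[[str], List[str]],
    confidence: Callable[[str, str], float],
) -> Optional[str]:
    # S100: stop early when the classifier finds no time entity.
    if not has_time_entity(session_text):
        return None
    # S200: rule-based extraction of candidate time information.
    candidates = extract_candidates(session_text)
    if not candidates:
        return None
    # S300: sort candidates from high to low confidence.
    ranked = sorted(
        candidates, key=lambda c: confidence(c, session_text), reverse=True
    )
    # S400: record the top-ranked candidate.
    return ranked[0]


# Toy stand-ins, for illustration only.
text = "I can call back in 10 minutes. Actually, make it 30 minutes."
result = extract_recorded_time(
    text,
    has_time_entity=lambda t: "minutes" in t,
    extract_candidates=lambda t: re.findall(r"\d+ minutes", t),
    confidence=lambda c, t: float(t.rfind(c)),  # later mention scores higher
)
print(result)  # 30 minutes
```

Keeping the classifier as a cheap gate in front of the rules means most sessions without any time expression never reach the extraction rules.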
The session information extraction method can be applied to a customer service scenario. The conversation between a customer service agent and a user or merchant is monitored, the conversation text (that is, the session text) is collected, and the time point that the agent has promised to the user or merchant is identified automatically. This saves the agent the time of recording the appointment manually; once recorded, the agent can be reminded when the appointed time arrives, and the agent can be assessed according to how the appointments are fulfilled. With the invention, appointment times can be extracted with accuracy and recall above 90%, which effectively improves customer service efficiency.
Because most session texts consist of spoken expressions, the session text may be preprocessed before the text classification model performs time entity recognition; for example, non-entity content in the session text is detected with preset non-entity hit rules and deleted. As shown in Fig. 2, the session text contains very long conversation content before processing; after preprocessing, a simplified session text is obtained, which helps the text classification model identify time entities accurately and quickly.
In this embodiment, the text classification model is a pre-trained binary classification model. For example, a TextCNN text classification model may be used to determine whether a time entity exists in the session text. In step S100, determining whether a time entity exists in the session text is defined as a classification task in NLP (Natural Language Processing), and a binary classification model is built on the TextCNN network structure; its two output classes correspond to "a time entity exists" and "no time entity exists". To train the text classification model, 10,000 pieces of labeled text data are first collected as samples, so that the model learns to distinguish expressions that contain a time entity from those that do not. The trained model is loaded directly online and can classify fine-grained time entities.
In step S100, after determining whether a time entity exists in the session text based on the text classification model, the method further includes the following steps:
if no time entity exists in the session text, the current session text contains no appointment time and no time information needs to be extracted, so the time information extraction process for the current session text is ended;
if a time entity exists in the session text, the position of the time entity identified by the text classification model in the session text is acquired; that is, besides identifying whether a time entity exists, the text classification model can further identify where the time entity is located in the session text, for use in the subsequent extraction of candidate time information.
In practice, session texts contain many spoken expressions, and extracting time information directly from the session text tends to pick up a lot of redundant or invalid information. Therefore, in this embodiment, if a time entity exists in the session text, then before step S200 (extracting at least one piece of candidate time information from the time entity based on a preset time extraction rule) is executed, the method further comprises the following step:
preprocessing the session text based on a preset data preprocessing rule to obtain a preprocessed session text. One example of preprocessing the session text is shown in Fig. 3.
In this embodiment, the preprocessing the conversation text based on the preset data preprocessing rule includes the following steps:
acquiring the position of the time entity identified by the text classification model in a conversation text;
judging whether a plurality of continuous time entities exist in the session text;
if a plurality of continuous time entities exist, identifying the time expression boundaries among them to obtain a plurality of separated time entities; for example, as shown in Fig. 3, when the continuous time entities "thirteen o'clock twenty-seven one hour" are detected, the boundary is identified to obtain "thirteen o'clock twenty-seven" and "one hour"; when "two three hours" is detected, the boundary is identified to obtain "two" and "three hours"; when "seven o'clock zero eight minutes thirty-eight" is detected, the boundary is identified to obtain "seven o'clock zero eight minutes" and "thirty-eight"; the time expression boundaries may be identified by a machine learning model or by preset time boundary hit rules;
judging whether the relationship between the separated time entities is a continuous expression relationship or a repeated expression relationship; the relationship may be determined with a preset text relationship judgment rule, for example, a rule under which two consecutive time entities are in a continuous expression relationship when their semantics meet a specific requirement, and in a repeated expression relationship when their semantics repeat each other;
if the relationship is a continuous expression relationship, adding a preset connecting word between the separated time entities of the continuous expression relationship; for example, "thirteen o'clock twenty-seven" and "one hour" form a continuous expression, so a connecting word is inserted between them and the expression becomes "one hour of thirteen o'clock twenty-seven";
if the relationship is a repeated expression relationship, selecting one time entity from the separated time entities of the repeated expression relationship and deleting the unselected time entities; for example, "two" and "three hours" are reduced to "three hours".
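A minimal sketch of the last two steps, assuming boundary detection has already produced the separated time entities. The relationship rule used here (a clock time next to a duration is "continuous", two entities of the same kind are "repeated"), the connecting word "and", and the choice to keep the later entity of a repeated pair are all illustrative assumptions; the patent only states that preset rules and preset connecting words are used.

```python
from typing import List


def kind(entity: str) -> str:
    """Hypothetical rule: entities written with ':' are clock times, others durations."""
    return "clock" if ":" in entity else "duration"


def relation(first: str, second: str) -> str:
    """Same kind twice is treated as a repetition, different kinds as one continuous expression."""
    return "repeated" if kind(first) == kind(second) else "continuous"


def merge_time_entities(entities: List[str], connector: str = "and") -> str:
    """Fold a non-empty run of separated time entities into one expression."""
    merged = entities[0]
    for nxt in entities[1:]:
        if relation(merged, nxt) == "continuous":
            # Continuous expression: join with the preset connecting word.
            merged = f"{merged} {connector} {nxt}"
        else:
            # Repeated expression: keep one entity (here the later one), drop the other.
            merged = nxt
    return merged


print(merge_time_entities(["13:27", "one hour"]))   # 13:27 and one hour
print(merge_time_entities(["two", "three hours"]))  # three hours
```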
In this embodiment, the preprocessing the conversation text based on the preset data preprocessing rule includes the following steps:
normalizing the expression of the time entities in the session text. For example, as shown in Fig. 3, "Sunday" and "day of worship" are both expressed as "week 7", and "fifteen minutes past three", "a quarter past three" and "three fifteen" are all expressed as "3:15", and so on.
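One simple way to realize such normalization is a lookup table from surface variants to a canonical form. The sketch below is an assumption-laden illustration: the table entries are English stand-ins for the Chinese expressions in Fig. 3, and the names `NORMALIZATION_TABLE` and `normalize_time_expressions` are hypothetical.

```python
# Surface variant -> canonical form (illustrative English stand-ins).
NORMALIZATION_TABLE = {
    "sunday": "week 7",
    "day of worship": "week 7",
    "fifteen minutes past three": "3:15",
    "a quarter past three": "3:15",
    "three fifteen": "3:15",
}


def normalize_time_expressions(text: str) -> str:
    """Lowercase the text and rewrite every known variant to its canonical form."""
    result = text.lower()
    # Replace longer phrases first so a short variant never splits a longer one.
    for phrase in sorted(NORMALIZATION_TABLE, key=len, reverse=True):
        result = result.replace(phrase, NORMALIZATION_TABLE[phrase])
    return result


print(normalize_time_expressions("Call me Sunday at a quarter past three"))
# call me week 7 at 3:15
```

After this step the extraction rules only need to handle one spelling per concept instead of every spoken variant.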
In this embodiment, the preprocessing the conversation text based on the preset data preprocessing rule includes the following steps:
obtaining a context text of a time entity;
judging, with a preset interference time judgment rule, whether the time entity is an interference time entity according to its context text; for example, when the preceding text contains "cannot be present", the time entity is determined to be a negated time, and when the following text contains "not okay, I am on the airplane", the time entity is likewise determined to be a negated time;
if so, deleting the time entity, so that an excess of interference times does not reduce the accuracy of the final time information extraction.
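The interference check can be sketched as a context lookup around each time entity. The cue phrases, the clause-based notion of "context", and the function names below are assumptions chosen for illustration; the patent only says that a preset interference time judgment rule is applied to the context text.

```python
import re
from typing import List

# Hypothetical negation cues for the preceding and following context.
CUES_BEFORE = ("cannot be present",)
CUES_AFTER = ("not okay",)
CLAUSE_BREAK = re.compile(r"[,.;!?]")


def is_interference(text: str, start: int, end: int) -> bool:
    """Return True when the entity at text[start:end] is negated by its context."""
    # Preceding context: from the last clause break up to the entity.
    before = CLAUSE_BREAK.split(text[:start])[-1].lower()
    # Following context: the rest of this clause plus the next clause.
    after = " ".join(CLAUSE_BREAK.split(text[end:])[:2]).lower()
    return any(c in before for c in CUES_BEFORE) or any(c in after for c in CUES_AFTER)


def filter_time_entities(text: str, entities: List[str]) -> List[str]:
    """Keep only the time entities that are not negated in context."""
    kept = []
    for entity in entities:
        start = text.find(entity)
        if start != -1 and not is_interference(text, start, start + len(entity)):
            kept.append(entity)
    return kept


text = "I cannot be present at ten am, call me at seven twenty"
print(filter_time_entities(text, ["ten am", "seven twenty"]))  # ['seven twenty']
```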
The data preprocessing described above yields a further normalized session text. For example, as shown in Fig. 4, the text before preprocessing reads "i see that you are replying to you within half an hour before you reply to seven cents late within ten am booking's time", and the text after preprocessing reads "you reply to you within 20 minutes before you reply to 7 cents late within 30 minutes after you wait". The interference time entity "ten am" is deleted because it is not the appointment time, and the expressions "half an hour" and "seven twenty" are normalized. This handles the diversity and abbreviation of spoken expressions, trims redundant expressions, and reduces interference with the subsequent time extraction.
In this embodiment, step S200, extracting at least one piece of candidate time information from the time entity based on a preset time extraction rule, comprises the following steps:
acquiring a preset date and time identification template, and determining the attribute to be filled in the template;
extracting, from the time entities of the session text, the attribute values corresponding to the attributes to be filled in the template; specifically, time expression fragments are identified in the long sentences of the session, and the specific values for month/day/hour/minute are extracted respectively; for holidays, a holiday-to-date mapping needs to be built in advance, for example, National Day corresponds to October 1. In addition, mapping templates from time expressions to time addition and subtraction operations are designed; for example, expressions such as "tomorrow", "the day after tomorrow", "next week / weekday X", "X minutes later", "within X minutes" and "wait X minutes" are mapped to time addition operations;
generating candidate time information based on the attribute values and the date and time identification template; finally, the specific time information is uniformly constructed in the format '2021-09-17 16:32:00', and the time is checked for plausibility, for example, the extracted time is required to be later than the current time.
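The extraction rules above can be sketched with the standard `datetime` module. This is a minimal illustration covering only a few expression types; the English patterns, the `HOLIDAY_DATES` table and the function name `resolve_time` are assumptions, and the real rule set in the patent is presumably much larger and written for Chinese text.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

HOLIDAY_DATES = {"national day": (10, 1)}  # holiday -> (month, day)
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_time(expression: str, now: datetime) -> Optional[str]:
    """Map one normalized time expression to a concrete timestamp string."""
    expr = expression.lower().strip()
    base = now.replace(second=0, microsecond=0)
    candidate = None
    relative = re.fullmatch(r"(?:within |wait )?(\d+) minutes(?: later)?", expr)
    clock = re.fullmatch(r"(\d{1,2}):(\d{2})", expr)
    try:
        if relative:
            # Relative expressions map to a time addition.
            candidate = base + timedelta(minutes=int(relative.group(1)))
        elif expr == "tomorrow":
            candidate = base + timedelta(days=1)
        elif expr == "the day after tomorrow":
            candidate = base + timedelta(days=2)
        elif expr in HOLIDAY_DATES:
            month, day = HOLIDAY_DATES[expr]
            candidate = base.replace(month=month, day=day, hour=0, minute=0)
        elif clock:
            # Fill the hour/minute slots of the template; the date defaults to today.
            candidate = base.replace(hour=int(clock.group(1)), minute=int(clock.group(2)))
    except ValueError:
        return None  # e.g. an impossible clock time such as 25:00
    # Plausibility check: the extracted time must be later than the current time.
    if candidate is None or candidate <= now:
        return None
    return candidate.strftime(OUTPUT_FORMAT)


now = datetime(2021, 9, 17, 16, 2, 0)
print(resolve_time("within 30 minutes", now))  # 2021-09-17 16:32:00
print(resolve_time("15:00", now))              # None (already in the past)
```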
In this embodiment, the step S300 of sorting the candidate time information includes the following steps:
acquiring a first score expressing the accuracy of each candidate time information and/or acquiring a second score of the position of the candidate time information in the text;
obtaining a confidence degree of the candidate time information according to the first score and/or the second score;
and sorting the candidate time information from high to low according to the confidence degree of the candidate time information.
Thus, in this embodiment, the confidence of the candidate time information may be scored along the dimensions of the first score and/or the second score. The first score measures how explicit the appointment expression is: the more explicit the expression, the better. Different rules can be used to model expressions of different explicitness, each rule is assigned a priority, and the higher the priority of the rule hit by the context of the candidate time information, the higher the score. The second score reflects the order in which the time expression fragments (that is, the time entities) appear in the text. In spoken scenarios the appointment time is often negotiated, so prior knowledge is introduced: a time mentioned later is more likely to become the final appointed time. When both dimensions are used, different weights can be set for the first score and the second score, and their weighted sum is taken as the confidence score.
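The weighted combination can be written in a few lines. The weights 0.6 and 0.4 and the example scores are placeholders chosen for illustration; the patent does not give concrete values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    time: str
    first_score: float   # explicitness of the appointment expression
    second_score: float  # position of the mention in the session


def rank_candidates(
    candidates: List[Candidate], w_first: float = 0.6, w_second: float = 0.4
) -> List[Candidate]:
    """Sort candidates from high to low confidence (weighted sum of both scores)."""
    return sorted(
        candidates,
        key=lambda c: w_first * c.first_score + w_second * c.second_score,
        reverse=True,
    )


candidates = [
    Candidate("2021-09-17 19:20:00", first_score=0.4, second_score=1.0),
    Candidate("2021-09-17 16:32:00", first_score=0.9, second_score=0.5),
]
best = rank_candidates(candidates)[0]
print(best.time)  # 2021-09-17 16:32:00
```

With these placeholder weights an explicit promise made early can still outrank a vaguer time mentioned later; shifting weight toward the second score would favor the most recent mention instead.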
In this embodiment, in the step S400, selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information includes the following steps:
and selecting the candidate time information with the highest confidence as the recorded time information according to the sorting result of the candidate time information.
In this embodiment, obtaining a first score expressing the accuracy of each of the candidate time information includes the steps of:
obtaining a context text of the candidate time information;
judging the priority corresponding to the context text according to a preset priority hit rule;
and determining a first score of the candidate time information according to the priority corresponding to the context text.
Acquiring a second score of the appearance position of the candidate time information in the text includes the following steps:
acquiring the position at which the candidate time information appears in the text;
and determining a second score of the candidate time information according to the appearing position, wherein the closer the appearing position is to the end of the session, the higher the corresponding second score; that is, the later the candidate time information appears, the higher its second score.
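The two scoring steps above can be sketched, purely as an example, in the following way. The English priority rules, the normalization of the priority to [0, 1], and the use of a turn index as the "appearing position" are all assumptions of this sketch; the embodiment only requires that higher-priority rule hits and later positions yield higher scores.

```python
import re

# Hypothetical priority hit rules: (priority, pattern matched against the context text).
# A higher priority stands for a more explicit reservation expression.
PRIORITY_RULES = [
    (3, re.compile(r"\b(call|contact) you (back )?(at|by|before)\b")),
    (2, re.compile(r"\b(at|by|before|around)\b")),
    (1, re.compile(r"\b(maybe|perhaps)\b")),
]
MAX_PRIORITY = max(priority for priority, _ in PRIORITY_RULES)


def first_score(context: str) -> float:
    """Highest priority among the rules hit by the context, scaled to [0, 1]."""
    hits = [priority for priority, pattern in PRIORITY_RULES
            if pattern.search(context.lower())]
    return max(hits) / MAX_PRIORITY if hits else 0.0


def second_score(turn_index: int, turn_count: int) -> float:
    """Position score: the closer to the end of the session, the higher the score."""
    if turn_count <= 0:
        raise ValueError("turn_count must be positive")
    return (turn_index + 1) / turn_count


print(first_score("I will call you back at three tomorrow"))
print(second_score(turn_index=3, turn_count=4))
```

Scaling the priority keeps both scores on a comparable range, which makes a later weighted combination easier to tune.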
In this embodiment, after step S400 of selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information, the method further comprises the following steps:
creating a reservation event according to the session text, recording the relationship between the reservation event and customer service, and if the reservation event is a reservation event with a user, further recording the relationship between the reservation event and the user; the reservation event may include, for example, a call return, a message reply, etc.;
using the recorded time information as the reserved time information of the reserved event;
when the current time reaches the time point corresponding to the recorded time information, reminding the customer service;
and compiling statistics on the relationship between the completion time and the reserved time of all reservation events of the customer service within a preset time period, and generating a customer service assessment score according to the statistical result. Specifically, the timeliness with which the customer service handles events is calculated from the order of, and the time difference between, the completion time and the reserved time of each reservation event, and a customer service assessment index is established based on this timeliness, thereby improving customer service quality.
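As one possible illustration of the assessment step, timeliness could be computed per event and averaged over the preset time period. The `ReservationEvent` fields, the one-hour tolerance window, the linear decay for late completions and the use of a plain average are assumptions of this sketch; the embodiment does not fix a particular formula.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ReservationEvent:
    agent_id: str                     # customer service agent handling the event
    reserved_at: datetime             # reserved time taken from the recorded time information
    completed_at: Optional[datetime]  # None means the event was never completed


def timeliness(event: ReservationEvent,
               tolerance: timedelta = timedelta(hours=1)) -> float:
    """1.0 if completed on time, decaying linearly to 0.0 over the assumed tolerance."""
    if event.completed_at is None:
        return 0.0
    delay = event.completed_at - event.reserved_at
    if delay <= timedelta(0):
        return 1.0
    return max(0.0, 1.0 - delay / tolerance)


def assessment_score(events: List[ReservationEvent], agent_id: str,
                     start: datetime, end: datetime) -> Optional[float]:
    """Average timeliness of one agent's events reserved within [start, end)."""
    scores = [timeliness(e) for e in events
              if e.agent_id == agent_id and start <= e.reserved_at < end]
    return sum(scores) / len(scores) if scores else None


t0 = datetime(2021, 12, 6, 15, 0)
events = [
    ReservationEvent("agent-1", t0, t0 - timedelta(minutes=5)),   # early: 1.0
    ReservationEvent("agent-1", t0, t0 + timedelta(minutes=30)),  # 30 min late: 0.5
    ReservationEvent("agent-1", t0, None),                        # not completed: 0.0
    ReservationEvent("agent-2", t0, t0),                          # other agent, ignored
]
score = assessment_score(events, "agent-1", t0 - timedelta(days=1), t0 + timedelta(days=1))
print(score)
```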
As shown in fig. 5, a way of deploying the session information extraction method to an online application is provided. A model inference interface is built on the Python Flask framework: the input session text to be predicted is first preprocessed, then model 1 and model 2 are loaded in sequence for prediction, and the prediction result is returned. Here, model 1 may be the session classification model, and model 2 may be the model that extracts candidate time information, sorts it, and selects the time information with the highest confidence score. A real-time session monitoring function is implemented in a Java service: the real-time session is stored in a Redis cache, and a model triggering mechanism judges whether the Python model needs to be called at the current moment. Once the trigger condition is met, the complete session stored in Redis and the speaking time of each sentence are passed to the Python service, and the result returned by the Python service is forwarded to the downstream service.
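The inference flow described above might look roughly like the following framework-agnostic sketch. It deliberately uses only the standard library: the web framework route and the Redis cache are left out, the two models are passed in as plain callables, and the idle-time trigger condition is a hypothetical example, since the embodiment does not state what the trigger condition is. In the deployment of fig. 5 the trigger resides in the Java service; it is written in Python here only to keep the example in one language.

```python
from typing import Callable, Dict, List, Optional

# One utterance of the session: speaker, text and speaking time (illustrative keys).
Utterance = Dict[str, str]


def preprocess(session: List[Utterance]) -> str:
    """Join the non-empty utterance texts into one session text."""
    return "\n".join(u["text"].strip() for u in session if u["text"].strip())


def predict(session: List[Utterance],
            has_time_entity: Callable[[str], bool],
            extract_best_time: Callable[[str], Optional[str]]) -> Dict[str, object]:
    """Run model 1 (classification), then model 2 (extraction and ranking)."""
    text = preprocess(session)
    if not text or not has_time_entity(text):
        return {"has_time": False, "time": None}
    return {"has_time": True, "time": extract_best_time(text)}


def should_trigger(session: List[Utterance], idle_seconds: float,
                   max_idle: float = 30.0) -> bool:
    """Hypothetical trigger: call the model once a non-empty session has gone idle."""
    return bool(session) and idle_seconds >= max_idle


# Stub models standing in for the trained model 1 and model 2.
session = [
    {"speaker": "user", "text": "Please call me back.", "time": "14:01:10"},
    {"speaker": "agent", "text": "Sure, I will call you at 15:00.", "time": "14:01:25"},
]
result = predict(session,
                 has_time_entity=lambda text: "15:00" in text,
                 extract_best_time=lambda text: "15:00")
print(result)
```

Passing the models in as callables keeps the handler testable without loading any trained model.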
As shown in fig. 6, an embodiment of the present invention further provides a session information extraction system, which is used to implement the session information extraction method, and the system includes:
the time entity identification module M100 is used for acquiring a session text to be processed and judging whether a time entity exists in the session text based on a text classification model;
a candidate time extraction module M200, configured to, if a time entity exists in the session text, extract at least one candidate time information from the time entity based on a preset time extraction rule;
a candidate time sorting module M300, configured to sort the candidate time information based on a preset candidate time information sorting rule;
a time information recording module M400, configured to select at least one candidate time information as the recorded time information according to the sorting result of the candidate time information.
The session information extraction system first examines the session text through the time entity identification module M100 to determine whether a time entity exists in the session text. If a time entity exists, candidate time information is extracted through the candidate time extraction module M200, and the candidate time information is sorted and selected through the candidate time sorting module M300 and the time information recording module M400, so that accurate time information is obtained for recording.
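For illustration, the cooperation of the four modules could be sketched as a chain of functions. This is a toy stand-in and not the system of the embodiment: a regular expression replaces the text classification model of module M100, the extraction rule of module M200 only recognizes `HH:MM` clock times, and module M300 ranks by position alone. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumed extraction rule: 24-hour clock times such as "9:30" or "14:00".
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def m100_has_time_entity(text: str) -> bool:
    """Stand-in for the text classification model: a simple pattern check."""
    return TIME_PATTERN.search(text) is not None


def m200_extract_candidates(text: str) -> List[Tuple[int, str]]:
    """Return (position, normalized HH:MM) for every time expression found."""
    return [(m.start(), f"{int(m.group(1)):02d}:{m.group(2)}")
            for m in TIME_PATTERN.finditer(text)]


def m300_rank(candidates: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Rank by position only: candidates mentioned later come first."""
    return sorted(candidates, key=lambda c: c[0], reverse=True)


def m400_select(ranked: List[Tuple[int, str]]) -> Optional[str]:
    """Record the top-ranked candidate, if any."""
    return ranked[0][1] if ranked else None


def extract_session_time(text: str) -> Optional[str]:
    if not m100_has_time_entity(text):
        return None  # no time entity: end processing for this session text
    return m400_select(m300_rank(m200_extract_candidates(text)))


print(extract_session_time("Can you call at 9:30? Sorry, let's make it 14:00 instead."))
# prints "14:00"
```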
In the session information extraction system of the present invention, the functions of each module may be implemented by using the specific implementation manner of the corresponding steps of the session information extraction method described above, which is not described herein again.
The embodiment of the invention also provides a session information extraction device, which comprises a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the session information extraction method via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the above-mentioned session information extraction method section of the present specification. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In the session information extraction device, the program in the memory is executed by the processor to realize the steps of the session information extraction method, so the device can also obtain the technical effects of the session information extraction method.
The embodiment of the present invention further provides a computer-readable storage medium for storing a program, where the program is executed by a processor to implement the steps of the session information extraction method. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present invention described in the above-mentioned session information extraction method section of this specification, when the program product is executed on the terminal device.
Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The program in the computer storage medium implements the steps of the session information extraction method when executed by a processor, and therefore, the computer storage medium can also obtain the technical effects of the session information extraction method.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (14)

1. A session information extraction method is characterized by comprising the following steps:
collecting a conversation text to be processed, and judging whether a time entity exists in the conversation text or not based on a text classification model;
if a time entity exists in the session text, extracting at least one candidate time information from the time entity based on a preset time extraction rule;
sorting the candidate time information based on a preset candidate time information sorting rule;
and selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information.
2. The method according to claim 1, wherein the text classification model is a pre-trained binary classification model;
after judging whether the time entity exists in the conversation text based on the text classification model, the method further comprises the following steps:
if the time entity does not exist in the session text, ending the time information extraction process of the current session text to be processed;
and if the time entity exists in the session text, acquiring the position of the time entity identified by the text classification model in the session text.
3. The method as claimed in claim 1, wherein if a time entity exists in the session text, before extracting at least one candidate time information from the time entity based on a preset time extraction rule, the method further comprises the following steps:
and preprocessing the conversation text based on a preset data preprocessing rule to obtain a preprocessed conversation text.
4. The method according to claim 3, wherein the preprocessing the session text based on the preset data preprocessing rule includes the following steps:
acquiring the position of the time entity identified by the text classification model in a conversation text;
judging whether a plurality of continuous time entities exist in the session text;
if a plurality of continuous time entities exist, identifying time expression boundaries in a plurality of continuous time information to obtain a plurality of separated time entities;
determining whether a relationship between a plurality of separate time entities is a continuous expression relationship or a repetitive expression relationship;
if the relation is a continuous expression relation, adding preset connection words among a plurality of separated time entities of the continuous expression relation;
if the relation is a repeated expression relation, selecting one time entity from a plurality of separated time entities of the repeated expression relation, and deleting the unselected time entities.
5. The method according to claim 3, wherein the preprocessing the session text based on the preset data preprocessing rule includes the following steps:
and carrying out normalized expression on the expression of the time entity in the conversation text.
6. The method according to claim 3, wherein the preprocessing the session text based on the preset data preprocessing rule includes the following steps:
obtaining a context text of a time entity;
judging whether the time entity belongs to an interference time entity or not according to the context text of the time entity by adopting a preset interference time judgment rule;
if so, deleting the time entity.
7. The method according to claim 1, wherein the extracting at least one candidate time information from the time entity based on a preset time extraction rule comprises the following steps:
acquiring a preset date and time identification template, and determining the attribute to be filled in the template;
extracting attribute values corresponding to attributes needing to be filled in the template from the time entity of the session text;
and generating candidate time information based on the attribute values and the date and time identification template.
8. The method according to claim 1, wherein the step of ranking the candidate time information comprises the steps of:
acquiring a first score expressing the accuracy of each candidate time information and/or acquiring a second score of the position of the candidate time information in the text;
obtaining a confidence degree of the candidate time information according to the first score and/or the second score;
and sorting the candidate time information from high to low according to the confidence degree of the candidate time information.
9. The method according to claim 8, wherein selecting at least one candidate time information as the recorded time information according to the ranking result of the candidate time information comprises:
and selecting the candidate time information with the highest confidence as the recorded time information according to the sorting result of the candidate time information.
10. The session information extraction method according to claim 8, wherein obtaining a first score expressing the degree of accuracy of each of the candidate time information includes:
obtaining a context text of the candidate time information;
judging the priority corresponding to the context text according to a preset priority hit rule;
determining a first score of the candidate time information according to the priority corresponding to the context text;
and acquiring a second score of the appearance position of the candidate time information in the text includes:
acquiring the position of the candidate time information in a text;
and determining a second score of the candidate time information according to the appearing position, wherein the closer the appearing position is to the end of the session, the higher the corresponding second score is.
11. The method according to claim 1, wherein after selecting at least one candidate time information as the recorded time information according to the result of sorting the candidate time information, the method further comprises the steps of:
creating a reservation event according to the session text, and recording the relationship between the reservation event and customer service;
using the recorded time information as the reserved time information of the reserved event;
when the current time reaches the time point corresponding to the recorded time information, reminding the customer service;
and carrying out statistics on the relationship between the completion time and the reservation time of all reservation events of the customer service in a preset time period, and generating a customer service assessment score according to a statistical result.
12. A session information extraction system for implementing the session information extraction method according to any one of claims 1 to 11, the system comprising:
the time entity identification module is used for acquiring a session text to be processed and judging whether a time entity exists in the session text based on a text classification model;
the candidate time extraction module is used for extracting at least one candidate time information from the time entity based on a preset time extraction rule if the time entity exists in the session text;
the candidate time sorting module is used for sorting the candidate time information based on a preset candidate time information sorting rule;
and the time information recording module is used for selecting at least one candidate time information as the recorded time information according to the sorting result of the candidate time information.
13. A session information extraction device characterized by comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the session information extraction method of any one of claims 1 to 11 via execution of the executable instructions.
14. A computer-readable storage medium storing a program, wherein the program when executed by a processor implements the steps of the session information extraction method of any one of claims 1 to 11.
CN202111481562.9A 2021-12-06 2021-12-06 Session information extraction method, system, device and storage medium Pending CN114117017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111481562.9A CN114117017A (en) 2021-12-06 2021-12-06 Session information extraction method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111481562.9A CN114117017A (en) 2021-12-06 2021-12-06 Session information extraction method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN114117017A true CN114117017A (en) 2022-03-01

Family

ID=80367180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111481562.9A Pending CN114117017A (en) 2021-12-06 2021-12-06 Session information extraction method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN114117017A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881582A (en) * 2023-07-18 2023-10-13 北京粉笔蓝天科技有限公司 Entry time extraction method based on pattern matching and part-of-speech tagging
CN116881582B (en) * 2023-07-18 2024-02-13 北京粉笔蓝天科技有限公司 Entry time extraction method based on pattern matching and part-of-speech tagging

Similar Documents

Publication Publication Date Title
US20200294489A1 (en) Methods, computing devices, and storage media for generating training corpus
CN112270379A (en) Training method of classification model, sample classification method, device and equipment
CN110910257A (en) Information prediction method, information prediction device, electronic equipment and computer readable medium
CN111651996A (en) Abstract generation method and device, electronic equipment and storage medium
CN114416943B (en) Training method and device for dialogue model, electronic equipment and storage medium
CN114625855A (en) Method, apparatus, device and medium for generating dialogue information
CN111783450A (en) Phrase extraction method and device in corpus text, storage medium and electronic equipment
CN112926308A (en) Method, apparatus, device, storage medium and program product for matching text
CN113239204A (en) Text classification method and device, electronic equipment and computer-readable storage medium
CN112182220A (en) Customer service early warning analysis method, system, equipment and medium based on deep learning
CN114117017A (en) Session information extraction method, system, device and storage medium
US11587567B2 (en) User utterance generation for counterfactual analysis and improved conversation flow
CN113051911A (en) Method, apparatus, device, medium, and program product for extracting sensitive word
CN110263135B (en) Data exchange matching method, device, medium and electronic equipment
CN109933926B (en) Method and apparatus for predicting flight reliability
CN116737927A (en) Gravitational field constraint model distillation method, system, electronic equipment and storage medium for sequence annotation
CN115759100A (en) Data processing method, device, equipment and medium
CN115620726A (en) Voice text generation method, and training method and device of voice text generation model
CN115618264A (en) Method, apparatus, device and medium for topic classification of data assets
CN114417974A (en) Model training method, information processing method, device, electronic device and medium
CN113806541A (en) Emotion classification method and emotion classification model training method and device
CN112465149A (en) Same-city part identification method and device, electronic equipment and storage medium
CN113705206B (en) Emotion prediction model training method, device, equipment and storage medium
CN111353087A (en) Hot word statistical method and device, storage medium and electronic terminal
CN112364149B (en) User question obtaining method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination