CN113792140A - Text processing method and device and computer readable storage medium - Google Patents

Publication number
CN113792140A
Authority
CN
China
Prior art keywords
target
conversation
text
field
session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110923172.6A
Other languages
Chinese (zh)
Inventor
庄傲然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingyun Digital Technology Co Ltd
Original Assignee
Nanjing Xingyun Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingyun Digital Technology Co Ltd
Priority to CN202110923172.6A
Publication of CN113792140A
Priority to CA3170100A (CA3170100A1)
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The invention discloses a text processing method, a text processing device and a computer readable storage medium. The text processing method comprises: acquiring a session content text; classifying the session content text to obtain a target session content text corresponding to a target object; and identifying the target session content text based on a pre-constructed classification model and the session content text so as to mark the target object. The method separates the speech of the collection target from the rest of the conversation, then identifies, judges and marks that speech using the pre-constructed classification model combined with the specific content of the conversation, so that whether the collection target is a suspected illegal-loan object can be judged accurately and efficiently.

Description

Text processing method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of computer information processing, in particular to a text processing method and device and a computer readable storage medium.
Background
At present, in online loan services, it is common for bad actors to profit illegally through an entire industrial chain that bypasses a loan platform's monitoring system and risk-control engine by fabricating information, tampering with devices, running scripts, exploiting technical loopholes and the like.
After collection records from the collection system were introduced at an earlier stage, it was found that abnormal overdue customers may mention signs of suspected aggregation (group activity) during collection calls, such as 'the loan was handled through an intermediary' or 'a friend introduced the intermediary'. Therefore, in the post-loan stage, data on users under collection is currently explored to some degree to judge whether they show aggregation characteristics, and the specific means is to identify suspicious words in the conversation.
However, when collection conversations are identified and screened by manual checking, not all keywords can be covered and the scenario behind massive text data cannot be reconstructed, so the hit rate is low, the keywords are narrow, and the search scope cannot be expanded based on the corpus.
Disclosure of Invention
The invention aims to provide a text processing method, a text processing device and a computer readable storage medium that can accurately identify, from call records, whether a collection target is a suspected illegal-loan object.
In order to achieve the purpose, the technical scheme of the invention is as follows: in a first aspect, the present invention provides a text processing method, including:
acquiring a session content text;
classifying the conversation content texts to obtain target conversation content texts corresponding to target objects;
and identifying the target conversation content text based on a pre-constructed classification model and the conversation content text so as to mark the target object.
In a preferred embodiment, the obtaining the session content text includes:
and acquiring a conversation content text generated based on conversion of the call record, wherein the conversation content text comprises a conversation object number and a conversation statement field corresponding to the object number.
In a preferred embodiment, the classifying the conversation content text to obtain a target conversation content text corresponding to a target object includes:
and identifying a target conversation object number based on the conversation statement field and acquiring a target conversation statement field corresponding to the target conversation object number, wherein the target conversation content text comprises the target conversation statement field.
In a preferred embodiment, identifying a target session object number based on the session sentence field and acquiring a target session sentence field corresponding to the target session object number includes:
identifying a first preset field in the conversation statement field;
recording the session object number corresponding to the session statement field containing the first preset field as a reference session object number, wherein the session statement field corresponding to the reference session object number forms a reference session content text;
and the part of the conversation statement field except the reference conversation content text is a target conversation statement field, and the conversation object number corresponding to the target conversation statement field is a target conversation object number.
In a preferred embodiment, the identifying the target conversation content text based on the pre-constructed classification model and the conversation content text to mark the target object includes:
acquiring a target identity label and a target field corresponding to the target identity label based on a pre-constructed classification model and the target conversation statement field;
judging whether the target identity label is correct or not based on the reference session content text and the target session content text,
if yes, marking the target session object number by the target identity label;
and if not, updating the target identity label and marking the target session object number with the updated target identity label.
In a preferred embodiment, the determining whether the target identity tag is correct based on the reference session content text and the target session content text includes:
judging whether a conversation sentence field adjacent to the target field in the reference conversation content text contains a second preset field or not,
if so, the target identity label is correct;
if not, judging whether the target identity label is correct or not based on the target session content text.
In a preferred embodiment, the determining whether the target identity tag is correct based on the target session content text includes:
acquiring the probability distribution of each preset identity label based on the classification model and the target conversation statement field;
judging whether the probability distribution standard deviation is larger than a preset threshold value or not;
if so, updating the target identity label by the preset identity label with the maximum probability value;
if not, calculating a probability value corresponding to each preset identity label based on a target conversation statement field and a pre-counted probability value for converting the current conversation intention type to the next conversation type, and selecting the preset identity label corresponding to the probability value with the maximum value to update the target identity label.
In a preferred embodiment, before the identifying the target conversation content text based on the pre-constructed classification model and the conversation content text to mark the target object, the method further includes:
and performing error correction processing on the target session content text based on a pre-constructed error correction database.
In a second aspect, the present invention provides a text processing apparatus, comprising:
the acquisition module is used for acquiring a session content text;
the classification module is used for classifying the conversation content texts to obtain target conversation content texts corresponding to target objects;
and the identification marking module is used for identifying the target conversation content text based on a pre-constructed classification model and the conversation content text so as to mark the target object.
In a third aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any one of the text processing methods provided in the first aspect.
The invention has the following advantages: a text processing method, a text processing device and a computer-readable storage medium are provided, wherein the text processing method comprises: acquiring a session content text; classifying the session content text to obtain a target session content text corresponding to a target object; and identifying the target session content text based on a pre-constructed classification model and the session content text so as to mark the target object. The method separates the speech of the collection target from the rest of the conversation, then identifies, judges and marks that speech using the pre-constructed classification model combined with the specific content of the conversation, so that whether the collection target is a suspected illegal-loan object can be judged accurately and efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a text processing method according to embodiment 1 of the present invention;
Fig. 2 is a text content diagram of session content in embodiment 1 of the present invention;
Fig. 3 is a flowchart of determining whether a target identity tag is correct based on a target session content text in the text processing method according to embodiment 1 of the present invention;
Fig. 4 is a storage table display diagram generated in the text processing method according to embodiment 1 of the present invention;
Fig. 5 is a structural diagram of a text processing apparatus according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived from the embodiments given herein by a person of ordinary skill in the art are intended to be within the scope of the present disclosure.
As described in the background, when a collector contacts an abnormal overdue loan user, the dialog often contains key sentences. Currently, collection conversation records are screened manually to identify whether the overdue loan user is abnormal. This is limited by the speed and precision of manual checking, so the abnormal identity of an overdue loan user cannot be determined quickly and accurately.
In order to solve these problems, the application introduces NLP (natural language processing) technology to process the content of collection conversations. It constructs a text classification model, extracts the target conversation sentences that carry specific labels, accurately identifies the user's identity within the conversation, and collects key sentence fields as additional evidence. This supports feedback on intermediary involvement in the post-loan stage, further mining of suspected intermediary aggregation among the hit user group, and follow-up tracking of how overdue loan users in different classes change, for early warning.
Example 1: the present embodiment provides a text processing method, which is described with reference to fig. 1, and includes:
and S1, acquiring the session content text.
In a preferred embodiment, the method comprises the steps of:
and acquiring a conversation content text generated based on conversion of the call record, wherein the conversation content text comprises a conversation object number and a conversation statement field corresponding to the object number.
Preferably, the session content text acquired in this embodiment is the session content text generated by converting the quality-checked call records.
Since there are two parties in the call record, the conversation content text includes, in addition to the text generated by voice conversion, the number of each conversation object, so that the parties can be distinguished in later steps. More specifically, the conversation object number of each sentence is placed before the text of that sentence. The conversation content text may also include a duration or a time node identifier for each sentence, which is not limited in this embodiment.
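As a minimal sketch of the input described above, the following Python snippet parses a transcript in which each sentence is preceded by its conversation object number. The exact layout (one sentence per line, number then whitespace then text) and the names `Utterance` and `parse_transcript` are assumptions for illustration; the source only states that the number precedes the sentence.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str  # conversation object number, e.g. "1" or "2"
    text: str     # conversation sentence field


def parse_transcript(raw: str) -> List[Utterance]:
    """Split a transcript into (speaker number, sentence) pairs.

    Assumed layout (hypothetical): one sentence per line, with the
    speaker number first, then whitespace, then the sentence text.
    """
    utterances = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        speaker, _, text = line.partition(" ")
        utterances.append(Utterance(speaker, text.strip()))
    return utterances


sample = """1 Hello, is this Mr. Wang?
2 Yes, speaking.
1 I am calling about your overdue loan."""
turns = parse_transcript(sample)
```

Optional fields such as a duration or timestamp could be added to `Utterance` without changing the later steps.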
And S2, classifying the conversation content texts to obtain target conversation content texts corresponding to the target objects.
In a preferred embodiment, the method comprises the steps of:
and identifying a target conversation object number based on the conversation statement field and acquiring a target conversation statement field corresponding to the target conversation object number, wherein the target conversation content text comprises the target conversation statement field.
More preferably, the method specifically comprises the following steps:
and S21, identifying a first preset field in the conversation statement field.
The first preset field is an identification field capable of identity judgment.
S22, recording the conversation object number corresponding to the conversation statement field containing the first preset field as a reference conversation object number, and forming a reference conversation content text by the conversation statement field corresponding to the reference conversation object number;
and the part except the reference conversation content text in the conversation statement field is a target conversation statement field, and the conversation object number corresponding to the target conversation statement field is a target conversation object number.
Specifically, in a collection call the collector addresses the other party once the call connects, for example as 'Mr. X' or 'Ms. X'. Therefore 'Mr. X' and 'Ms. X' are set as the first preset field. By identifying a 'Mr. X' or 'Ms. X' field in the conversation sentence fields, the conversation object number of the sentence field containing it is determined to be the collector's number, i.e. the reference conversation object number, and all conversation sentence fields corresponding to the reference conversation object number constitute the reference conversation content text. More preferably, the first preset field is searched for only in the first preset number of conversation sentence fields, and the conversation object number of the sentence field among them that contains the first preset field is taken as the reference conversation object number. The number of the other party in the conversation is the lending user's number, i.e. the target object number, and the conversation sentence fields corresponding to the target object number form the lending user's speech, i.e. the target conversation content text.
Illustratively, of the numbers '1' and '2', one represents the collector and the other the lending user, and the collector's role is identified by the presence of 'Mr.' or 'Ms.' in the first 10 sentences. As shown in Fig. 2, the number '1' represents the collector and the number '2' represents the lending user. By identifying the 'Mr.' field, the conversation object number '1' of the sentence field containing it is determined to be the collector's number, i.e. the reference conversation object number, and all conversation sentence fields corresponding to number '1' constitute the reference conversation content text. The remaining conversation sentence fields are the target conversation sentence fields; that is, all conversation sentence fields corresponding to number '2' constitute the conversation content text of the target object, the lending user. Sentences numbered '1' are treated as the collector's speech and sentences numbered '2' as the user's speech.
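The role-splitting step can be sketched as follows. This is an illustrative reading of steps S21 and S22, not the patented implementation: the English honorifics in `FIRST_PRESET_FIELDS`, the function names, and the fallback when no honorific is found are all assumptions.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (conversation object number, sentence)

# Honorifics standing in for the first preset field (assumed values).
FIRST_PRESET_FIELDS = ("Mr.", "Ms.")


def find_reference_speaker(turns: List[Turn], window: int = 10) -> Optional[str]:
    """Return the number of the speaker who uses an honorific early on.

    Only the first `window` sentences are inspected, mirroring the
    'first 10 sentences' example in the text.
    """
    for speaker, text in turns[:window]:
        if any(field in text for field in FIRST_PRESET_FIELDS):
            return speaker
    return None


def split_by_role(turns: List[Turn], window: int = 10):
    """Split turns into reference (collector) and target (user) texts."""
    ref = find_reference_speaker(turns, window)
    reference = [t for t in turns if t[0] == ref]
    target = [t for t in turns if t[0] != ref]
    return ref, reference, target


turns = [
    ("1", "Hello, is this Mr. Wang?"),
    ("2", "Yes, speaking."),
    ("1", "I am calling about your overdue loan."),
    ("2", "I handled it through an intermediary."),
]
ref, reference_text, target_text = split_by_role(turns)
```

If no honorific is found, `ref` is `None` and every sentence falls into the target list; a real system would need an explicit policy for that case, which the source does not describe.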
In a preferred embodiment, after S2 and before S3, the method further comprises:
and SA, performing error correction processing on the target session content text based on a pre-constructed error correction database.
Specifically, ASR (automatic speech recognition) performs poorly on local dialects, so the converted conversation content text contains erroneous fields, and introducing text error correction greatly improves the classification results. The pre-constructed error correction database is a black-industry error correction knowledge base built from hundreds of thousands of pre-collected collection records and a financial knowledge base. The knowledge base contains 2-gram, 3-gram and 4-gram entries, in the following format: a 2-gram entry such as 'Mr. Shih, win, saint, and claim', a 3-gram entry such as 'Mi relationship, Mei relationship', and a 4-gram entry such as 'financing, namely fast setting financing, Suningy and Suningy'. Each entry lists the correct term first, followed by its erroneous variants; if a sentence matches an erroneous variant, the variant is replaced by the preceding correct term.
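A dictionary-based correction pass of this kind might look like the sketch below. The knowledge-base entries are hypothetical English stand-ins (the original n-gram examples do not survive translation), and replacing longer variants first is a design choice made here, not something the source specifies.

```python
from typing import Dict, List, Tuple

# Hypothetical entries: correct term -> misrecognized variants.
ERROR_KB: Dict[str, List[str]] = {
    "Suning": ["Sunning", "Su Ning"],
    "intermediary": ["inter mediary"],
}


def build_replacements(kb: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten the knowledge base into (wrong, correct) pairs.

    Longer variants come first so that a long match is not broken up
    by a shorter overlapping one.
    """
    pairs = [(wrong, correct) for correct, wrongs in kb.items() for wrong in wrongs]
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def correct_text(text: str, kb: Dict[str, List[str]] = ERROR_KB) -> str:
    """Replace every known erroneous variant with its correct term."""
    for wrong, correct in build_replacements(kb):
        text = text.replace(wrong, correct)
    return text


fixed = correct_text("I borrowed from Sunning via an inter mediary")
```

For Chinese text the same idea would apply to character n-grams rather than space-separated words.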
And S3, identifying the target conversation content text based on the pre-constructed classification model and the conversation content text to mark the target object.
In a preferred embodiment, the method comprises the steps of:
s31, acquiring the target identity label and the target field corresponding to the target identity label based on the pre-constructed classification model and the target conversation sentence field.
Specifically, the pre-constructed classification model is used to identify a target field, i.e. an utterance of the lending user, and to classify it into the corresponding preset identity tag category.
The pre-constructed classification model is obtained by the following method:
constructing a machine learning model;
training the machine learning model on a corpus training set to obtain the trained model, wherein the corpus training set comprises 1090 corpus samples pre-labeled with identity labels, and the pre-labeled identity labels comprise identity confirmation, identity acquaintance, identity denial, identity challenge and other;
constructing a rule-based classification model for identifying the 'intermediary agent' label, because data for this label is too scarce to train a machine learning model. The pre-constructed classification model comprises the trained machine learning model and the rule-based classification model. The machine learning classification model can be a MultinomialNB, LogisticRegression, RandomForestClassifier, SVM or Fasttext model, and the classification accuracy of each model is shown in the following table. In this embodiment, the Fasttext classification model is preferred.
Model                     Accuracy
LogisticRegression        0.7996
MultinomialNB             0.7990
RandomForestClassifier    0.5368
SVM                       0.8083
Fasttext                  0.8152
And after the target conversation sentence is input into the pre-constructed classification model, the pre-constructed classification model outputs a corresponding target identity label and a target field corresponding to the identity label.
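One way to combine the two models is sketched below. The source does not say in which order the rule model and the machine learning model are consulted; checking the rule first is an assumption, as are the keyword list and the `toy_ml_model` stand-in (a trained Fasttext or scikit-learn classifier would take its place).

```python
from typing import Callable, Optional, Sequence

# Hypothetical trigger words for the rule-based 'intermediary agent' label.
INTERMEDIARY_KEYWORDS: Sequence[str] = ("intermediary", "agent")


def rule_label(sentence: str) -> Optional[str]:
    """Return 'intermediary agent' if a trigger word appears, else None."""
    lowered = sentence.lower()
    if any(keyword in lowered for keyword in INTERMEDIARY_KEYWORDS):
        return "intermediary agent"
    return None


def classify(sentence: str, ml_model: Callable[[str], str]) -> str:
    """Apply the rule model first (assumed order), then the ML model."""
    return rule_label(sentence) or ml_model(sentence)


def toy_ml_model(sentence: str) -> str:
    """Stand-in for the trained classifier; not a real model."""
    return "identity confirmation" if "speaking" in sentence.lower() else "other"


labels = [
    classify("Yes, speaking.", toy_ml_model),
    classify("A friend introduced the intermediary.", toy_ml_model),
    classify("It is raining.", toy_ml_model),
]
```

Keeping the rule model separate means the scarce label never has to appear in the training set.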
Illustratively, the five target conversation sentences corresponding to the target conversation object number obtained in the previous step, i.e. number '2', are fed into the text classifier one by one. Sentences corresponding to 'identity confirmation', 'identity acquaintance', 'identity denial', 'identity challenge' and 'intermediary agent' are identified, and sentences of the 'other' category are filtered out. The resulting target identity label is 'identity confirmation', and the corresponding target fields are ['You say.', 'Of course.', 'I told you there is no money.', 'Okay.'].
S32, judging whether the target identity label is correct or not based on the reference session content text and the target session content text,
if so, the process proceeds to step S33, and if not, the process proceeds to step S34.
The target fields returned by the classification model for the target identity tag contain many errors, because the classification model classifies each sentence in isolation and can easily misclassify it. It is therefore necessary to use context information, namely the surrounding dialog between the lending user and the collector, to verify whether the target identity tag obtained by the classification model is correct.
In a preferred embodiment, the method comprises the steps of:
s321, judging whether a conversation statement field adjacent to a target field in the reference conversation content text contains a second preset field;
if yes, the target identity tag is correct, otherwise, the process goes to step S322.
Specifically, the second preset field is an identity query keyword, such as 'Mr.', 'Ms.' or 'hello'. Whether the target identity tag is correct is judged by checking whether a conversation sentence field adjacent to the target field in the reference conversation content text contains the second preset field, i.e. whether the collector's sentence adjacent to the target field contains 'Mr.', 'Ms.' or 'hello'. More preferably, the check covers the conversation sentence fields in the two rounds before and after the target field in the reference conversation content text, i.e. the collector's sentences in the two rounds before and after the target field; if any of them contains 'Mr.', 'Ms.' or 'hello', the target identity tag is correct.
For example, identity confirmation mostly happens at the beginning of a call. Suppose the 'identity confirmation' tag corresponds to the target fields ['You say.', 'Of course.', 'I told you there is no money.', 'Okay.']. For the target field 'You say.', the preceding conversation sentence field in the reference conversation content text contains the second preset field 'Mr.' and the following one contains 'Suning', so both adjacent sentence fields contain an identity query keyword, i.e. the second preset field. The target conversation object number is recorded as 2, the target identity tag is confirmed to be correct, and the target field 'You say.' is stored in the identity confirmation list. The target field 'Of course.' has a confidence value of 1, so its target identity tag is also correct and it is stored in the identity confirmation list as well. For the target field 'I told you there is no money.', however, the adjacent sentence fields in the reference conversation content text are 'Why has this still not been handled?' and 'When did I say that?'. Neither contains an identity query keyword, so the confidence value of this target field is 0 and it is not stored in the identity confirmation list. 'Okay.' should likewise be removed.
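The context check in step S321 can be sketched as below. Several points are assumptions made for illustration: a 'round' is read as one collector sentence, the confidence value is treated as binary (1 if any nearby collector sentence contains a keyword, else 0), and the English keywords stand in for the original second preset field.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (conversation object number, sentence)

# Identity query keywords standing in for the second preset field (assumed).
SECOND_PRESET_FIELDS = ("Mr.", "Ms.", "hello")


def collector_context(turns: List[Turn], idx: int, collector: str,
                      rounds: int = 2) -> List[str]:
    """Collector sentences within `rounds` rounds before and after turn idx."""
    before = [text for speaker, text in turns[:idx] if speaker == collector][-rounds:]
    after = [text for speaker, text in turns[idx + 1:] if speaker == collector][:rounds]
    return before + after


def confidence(turns: List[Turn], idx: int, collector: str, rounds: int = 2) -> int:
    """Return 1 if any nearby collector sentence holds a keyword, else 0."""
    context = collector_context(turns, idx, collector, rounds)
    hit = any(keyword.lower() in sentence.lower()
              for sentence in context for keyword in SECOND_PRESET_FIELDS)
    return 1 if hit else 0


turns = [
    ("1", "Hello, is this Mr. Wang?"),
    ("2", "You say."),
    ("1", "This is Suning calling about your loan."),
    ("2", "Of course."),
    ("1", "Why has this still not been handled?"),
    ("2", "I told you there is no money."),
    ("1", "When did I say that?"),
    ("2", "Okay."),
]
early = confidence(turns, 1, collector="1")  # near the greeting
late = confidence(turns, 5, collector="1")   # far from any identity query
```

Target fields with confidence 0 are the ones passed on to the probability-based check in step S322.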
And S322, judging whether the target identity label is correct or not based on the target conversation content text.
If the second preset field is not identified in the reference conversation content text in step S321, the collector's surrounding conversation sentence fields do not contain the key information. Sometimes, however, the user's own speech does carry the classified intention, so the user's surrounding dialog must be used to confirm the target identity tag.
Specifically, this comprises the following steps:
s3221, obtaining probability distribution of each preset identity tag based on the classification model and the target conversation statement field.
Specifically, after the target conversational sentence field enters the classification model, a probability value of each preset category, that is, a probability value of each preset identity tag, may be output.
S3222, whether the probability distribution standard deviation is larger than a preset threshold value is judged.
If yes, go to step S3223; if not, the process proceeds to step S3224.
Specifically, if the probability of the highest-scoring identity tag output by the classification model is much larger than the probabilities of the other identity tags, i.e. the standard deviation of the probability values is large, that tag can be considered reliable. For example, set the standard deviation threshold to 0.2. If the target conversation sentence field is 'I am' and the classification model outputs 80% 'identity confirmation', 5% 'identity acquaintance', 5% 'identity denial', 5% 'identity challenge' and 5% 'other', the standard deviation of the probabilities is 0.3, which exceeds the threshold, so the identity tag 'identity confirmation' is reliable. If the predicted probabilities are close to each other, i.e. the standard deviation is less than 0.2, the classification model cannot decide among several identity tags with similar probabilities for this sentence, usually because the sentence lacks the key information needed for a decisive judgment. In that case the user's preceding conversation sentence fields are needed to help judge the current intention, and the process proceeds to step S3224.
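The worked example above can be reproduced with the population standard deviation, which is the variant that yields 0.3 for the 80/5/5/5/5 distribution. The function name `is_confident` and the second, flatter distribution are illustrative assumptions.

```python
from statistics import pstdev
from typing import Dict


def is_confident(probs: Dict[str, float], threshold: float = 0.2) -> bool:
    """True when the label probabilities are spread widely enough.

    Uses the population standard deviation over all preset labels.
    """
    return pstdev(list(probs.values())) > threshold


peaked = {
    "identity confirmation": 0.80,
    "identity acquaintance": 0.05,
    "identity denial": 0.05,
    "identity challenge": 0.05,
    "other": 0.05,
}
flat = {
    "identity confirmation": 0.30,
    "identity acquaintance": 0.25,
    "identity denial": 0.20,
    "identity challenge": 0.15,
    "other": 0.10,
}

peaked_std = pstdev(list(peaked.values()))  # about 0.3, as in the text
best_label = max(peaked, key=peaked.get) if is_confident(peaked) else None
needs_context = not is_confident(flat)      # would go on to step S3224
```

Note that the sample standard deviation (`statistics.stdev`) would give a different value for the same distribution, so the choice of estimator affects where the 0.2 threshold bites.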
And S3223, updating the target identity tag with the preset identity tag with the maximum probability value.
S3224, calculating a probability value corresponding to each preset identity tag based on the target conversation statement field and a pre-counted probability value for converting the current conversation intention category to the next conversation category, and selecting the preset identity tag corresponding to the maximum probability value to update the target identity tag.
Specifically, the pre-counted probability value for converting the current conversation intention category to the next conversation category is obtained as follows: conversation transition probabilities are counted from a large number of collection conversations annotated with identity labels, where a conversation transition probability is the probability of moving from the current conversation intention category to the next one. For example, if the current user utterance has the intention category 'identity denial', the probability that the next utterance is 'identity acquaintance' is 0.6, 'identity confirmation' 0.2, 'identity denial' 0.15 and 'identity challenge' 0.05. The intention transition probability values are shown in the following table:
Current \ Next           Identity confirmation   Identity denial   Identity acquaintance   Identity challenge   Others
Identity confirmation    0.2                     0.1               0.1                     0.15                 0.45
Identity denial          0.1                     0.3               0.2                     0.2                  0.2
Identity acquaintance    0.2                     0.2               0.3                     0.1                  0.2
Identity challenge       0.1                     0.25              0.25                    0.1                  0.3
Others                   0.15                    0.1               0.15                    0.1                  0.5
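The table can be held as a nested mapping for lookup. The sketch below assumes rows are the current intention category and columns the next one; this orientation is not stated outright in the text, though each row summing to 1 is consistent with it. The label spellings and the name `transition_prob` are illustrative.

```python
from typing import Dict, List

LABELS: List[str] = [
    "identity confirmation",
    "identity denial",
    "identity acquaintance",
    "identity challenge",
    "other",
]

# Values copied from the table; row = current category, column = next (assumed).
MATRIX: List[List[float]] = [
    [0.20, 0.10, 0.10, 0.15, 0.45],
    [0.10, 0.30, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.30, 0.10, 0.20],
    [0.10, 0.25, 0.25, 0.10, 0.30],
    [0.15, 0.10, 0.15, 0.10, 0.50],
]

TRANSITION: Dict[str, Dict[str, float]] = {
    current: dict(zip(LABELS, row)) for current, row in zip(LABELS, MATRIX)
}


def transition_prob(current: str, nxt: str) -> float:
    """Probability of moving from the current category to the next one."""
    return TRANSITION[current][nxt]


row_sums = {current: sum(row.values()) for current, row in TRANSITION.items()}
denial_to_acquaintance = transition_prob("identity denial", "identity acquaintance")
```

In practice the matrix would be re-estimated from annotated conversations rather than hard-coded.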
The black-market debt-collection corpus contains many multi-turn conversations, and the useful information carried by a single preceding turn may be insufficient. The identity label corresponding to the target session statement field of each turn is therefore added to the classification model, so that the output takes the user's multi-turn conversation information into account. Suppose the probability that the identity label of the i-th preceding turn transfers to the current identity label is p_i, the probability of the current identity label is q, and α_i is the weight of the i-th-round transfer probability, where α_i is smaller the farther the turn is from the current statement. The current identity label probability value p_final is then given by:
(formula image: Figure BDA0003208201370000101)
The final probability is typically calculated from the intention values of three rounds of the user's dialog, with α_1 = 0.5, α_2 = 0.33 and α_3 (value missing in the source text). The identity label with the highest probability value is then found, and the target identity label is updated with that identity label.
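The formula for the final probability is available only as an image, so its exact form cannot be read from the text. Purely as a sketch, the code below assumes one plausible form, p_final = q × Σ α_i · p_i, computed per identity label. Both this form and the third weight used in the usage note (0.17, chosen only so that the weights sum to 1) are assumptions, not values taken from the application, and the function name is hypothetical.

```python
def combine_multi_turn(q, transfer_probs, weights):
    """Combine the current-turn probability with weighted history.

    q:              {label: probability of the label for the current turn}
    transfer_probs: list of dicts; transfer_probs[i - 1][label] is p_i, the
                    probability that the label of the i-th preceding turn
                    transfers to `label`
    weights:        list of alpha_i, one per preceding turn

    ASSUMPTION: p_final = q * sum(alpha_i * p_i). The source formula is an
    image and may differ from this form.
    """
    if len(transfer_probs) != len(weights):
        raise ValueError("one weight is required per preceding turn")
    scores = {}
    for label, q_label in q.items():
        history = sum(a * p.get(label, 0.0)
                      for a, p in zip(weights, transfer_probs))
        scores[label] = q_label * history
    best = max(scores, key=scores.get)
    return best, scores
```

A call such as `combine_multi_turn(q, [p1, p2, p3], [0.5, 0.33, 0.17])` returns the label with the highest combined score together with all scores; the nearest turn carries the largest weight, matching the statement that α_i decreases with distance from the current statement.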
S33: marking the target session object number with the target identity label.
Specifically, the category tag ['identity confirmation'] is written into the category_type field to mark the target session object number. More preferably, the two sentences 'say to you.' and 'of course.' and their corresponding tag 'identity confirmation' are written into the category field.
S34: updating the target identity label and marking the target session object number with the updated target identity label.
More preferably, the method further comprises:
S4: generating a storage table based on the session content text, the target identity label and the target field.
Preferably, the storage table further includes the call record ID and the confidence value from step S321, and the storage table is a Hive table.
The text processing method provided by this embodiment comprises: acquiring a session content text; classifying the session content text to obtain a target session content text corresponding to a target object; and identifying the target session content text based on a pre-constructed classification model and the session content text so as to mark the target object. The method distinguishes the acquisition object and the acquisition object's session content according to the session content, and then identifies, judges and marks that session content according to the pre-established classification model combined with the specific content of the conversation, so that whether the acquisition object is a suspected illegal-loan object can be judged accurately and efficiently.
Example 2: This embodiment provides a text processing apparatus. As shown in Fig. 5, the apparatus includes:
an obtaining module 51, configured to obtain a session content text;
a classification module 52, configured to classify the session content text to obtain a target session content text corresponding to the target object;
an identification marking module 53, configured to identify the target session content text based on the pre-constructed classification model and the session content text so as to mark the target object.
In a preferred embodiment, the obtaining module 51 is configured to obtain a session content text generated by converting the call record, where the session content text includes a session object number and a session sentence field corresponding to the object number.
More preferably, the classification module 52 is configured to:
identify a target session object number based on the session statement field and obtain a target session statement field corresponding to the target session object number, wherein the target session content text comprises the target session statement field.
More preferably, the classification module 52 includes:
an identification submodule 521, configured to identify a first preset field in the session statement field;
a classification submodule 522, configured to record the session object number corresponding to a session statement field containing the first preset field as a reference session object number, the session statement fields corresponding to the reference session object number forming a reference session content text;
wherein the part of the session statement field other than the reference session content text is the target session statement field, and the session object number corresponding to the target session statement field is the target session object number.
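The split into reference and target session content can be sketched as follows. This is a minimal illustration that assumes the session content text is available as an ordered list of (session object number, sentence) pairs and that the first preset field is matched by simple substring search; the function name and the example phrases are hypothetical.

```python
def split_speakers(turns, first_preset_fields):
    """Separate reference and target session content.

    turns:               ordered list of (session_object_number, sentence)
    first_preset_fields: phrases whose presence marks the reference speaker
                         (hypothetical example values, not from the source)

    Returns (reference_text, target_text), each a list of
    (session_object_number, sentence) pairs in the original order.
    """
    # Any speaker who utters a first preset field is a reference speaker.
    reference_numbers = {
        number
        for number, sentence in turns
        if any(field in sentence for field in first_preset_fields)
    }
    reference_text = [(n, s) for n, s in turns if n in reference_numbers]
    # Everything that is not reference content is target content.
    target_text = [(n, s) for n, s in turns if n not in reference_numbers]
    return reference_text, target_text
```

With `turns = [(1, "Hello, this is customer service"), (2, "Who is this?"), (1, "Are you Mr. Wang?"), (2, "Of course.")]` and `first_preset_fields = ["this is customer service"]`, speaker 1 becomes the reference session object and both sentences of speaker 2 form the target session statement field.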
More preferably, the identification marking module 53 includes:
an obtaining submodule 531, configured to obtain a target identity label and a target field corresponding to the target identity label based on the pre-constructed classification model and the target session statement field;
a judging submodule 532, configured to judge whether the target identity label is correct based on the reference session content text and the target session content text;
a marking submodule 533, configured to mark the target session object number with the target identity label when the judging submodule 532 judges, based on the reference session content text and the target session content text, that the target identity label is correct;
an update marking submodule 534, configured to update the target identity label and mark the target session object number with the updated target identity label when the judging submodule 532 judges, based on the reference session content text and the target session content text, that the target identity label is incorrect.
More preferably, the judging submodule 532 includes:
a first judging unit 5331, configured to judge whether a session statement field adjacent to the target field in the reference session content text contains a second preset field;
a second judging unit 5332, configured to judge whether the target identity label is correct based on the target session content text.
More preferably, the second judging unit 5332 includes:
an obtaining subunit 53321, configured to obtain the probability distribution over the preset identity labels based on the classification model and the target session statement field;
a judging subunit 53322, configured to judge whether the standard deviation of the probability distribution is greater than a preset threshold; if so, the update marking submodule 534 updates the target identity label with the preset identity label having the maximum probability value;
a calculation and selection subunit 53323, configured to, when the standard deviation is not greater than the preset threshold, calculate a probability value corresponding to each preset identity label based on the target session statement field and the pre-counted probability values for switching from the current session intention category to the next session intention category, and select the preset identity label with the largest probability value, in which case the update marking submodule 534 updates the target identity label with that preset identity label.
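The decision made by the second judging unit can be sketched as follows. The sketch assumes a population standard deviation over the classifier's probabilities and, for the low-confidence branch, assumes that each label's classifier probability is multiplied by its pre-counted transition probability; the text does not specify how the two are combined, so that product is an illustrative choice. The function name and the threshold values in the usage note are hypothetical.

```python
from statistics import pstdev


def choose_label(distribution, transition_row, threshold):
    """Pick an identity label from a classifier distribution.

    distribution:   {label: classifier probability for the target sentence}
    transition_row: {label: pre-counted probability of moving from the
                    previous intention category to `label`}
    threshold:      preset standard-deviation threshold (hypothetical value)
    """
    spread = pstdev(list(distribution.values()))
    if spread > threshold:
        # Peaked distribution: trust the classifier's top label.
        return max(distribution, key=distribution.get)
    # Flat distribution: re-score each label with the transition statistics.
    # ASSUMPTION: the scores are combined by multiplication.
    scores = {
        label: prob * transition_row.get(label, 0.0)
        for label, prob in distribution.items()
    }
    return max(scores, key=scores.get)
```

A peaked distribution such as `{"a": 0.9, "b": 0.05, "c": 0.05}` exceeds a threshold of 0.2 and yields the classifier's own top label, whereas a nearly flat distribution falls through to the transition-based re-scoring.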
In a preferred embodiment, the apparatus further comprises:
an error correction module 54, configured to perform error correction processing on the target session content text based on a pre-constructed error correction database before the identification marking module 53 identifies the target session content text based on the pre-constructed classification model and the session content text to mark the target object.
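As a minimal sketch of such error correction, the pre-constructed error correction database can be modelled as a mapping from mis-recognized strings to their corrections, applied by plain substring replacement. The mapping format, the function name and the example entries are assumptions; the application does not describe how the database is structured.

```python
def correct_text(sentence, corrections):
    """Apply dictionary-based corrections to one transcribed sentence.

    corrections: {wrong_string: corrected_string}, standing in for the
                 pre-constructed error correction database (assumed format)
    """
    # Replace longer wrong strings first so that a short entry cannot
    # rewrite part of a longer one before the longer one is matched.
    for wrong in sorted(corrections, key=len, reverse=True):
        sentence = sentence.replace(wrong, corrections[wrong])
    return sentence
```

For example, with `corrections = {"re pay": "repay", "lone": "loan"}`, the sentence "I will re pay the lone" becomes "I will repay the loan". Plain substring replacement can also alter longer words that contain a wrong string, so a practical database would need entries chosen with that in mind.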
The beneficial effects of the text processing apparatus provided in this embodiment, which implements the text processing method provided in embodiment 1, are the same as those of the text processing method provided in embodiment 1 and are not described herein again.
It should be noted that: in the text processing apparatus provided in the above embodiment, when executing a text processing method, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions. In addition, the text processing apparatus and the text processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Example 3: This embodiment provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:
acquiring a session content text;
classifying the conversation content texts to obtain target conversation content texts corresponding to target objects;
and identifying the target conversation content text based on a pre-constructed classification model and the conversation content text so as to mark the target object.
The beneficial effects of a computer-readable storage medium provided in this embodiment for processing and executing the steps of the text processing method provided in embodiment 1 are the same as those of the text processing method provided in embodiment 1, and are not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be, but is not limited to, a read-only memory, a magnetic or optical disk, and the like.
It should be understood that the above-mentioned embodiments are only illustrative of the technical concepts and features of the present invention, and are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention. All modifications made according to the spirit of the main technical scheme of the invention are covered in the protection scope of the invention.

Claims (10)

1. A method of text processing, the method comprising:
acquiring a session content text;
classifying the conversation content texts to obtain target conversation content texts corresponding to target objects;
and identifying the target conversation content text based on a pre-constructed classification model and the conversation content text so as to mark the target object.
2. The text processing method of claim 1, wherein the obtaining the session content text comprises:
and acquiring a conversation content text generated based on conversion of the call record, wherein the conversation content text comprises a conversation object number and a conversation statement field corresponding to the object number.
3. The text processing method according to claim 2, wherein the classifying the conversation content text to obtain a target conversation content text corresponding to a target object comprises:
and identifying a target conversation object number based on the conversation statement field and acquiring a target conversation statement field corresponding to the target conversation object number, wherein the target conversation content text comprises the target conversation statement field.
4. The text processing method of claim 3, wherein identifying a target conversation object number based on the conversation statement field and acquiring a target conversation statement field corresponding to the target conversation object number comprises:
identifying a first preset field in the conversation statement field;
recording the session object number corresponding to the session statement field containing the first preset field as a reference session object number, wherein the session statement field corresponding to the reference session object number forms a reference session content text;
and the part of the conversation statement field except the reference conversation content text is a target conversation statement field, and the conversation object number corresponding to the target conversation statement field is a target conversation object number.
5. The text processing method of claim 4, wherein the identifying the target conversational content text to mark the target object based on the pre-constructed classification model and the conversational content text comprises:
acquiring a target identity label and a target field corresponding to the target identity label based on a pre-constructed classification model and the target conversation statement field;
judging whether the target identity label is correct or not based on the reference session content text and the target session content text,
if yes, marking the target session object number by the target identity label;
and if not, updating the target identity label and marking the target session object number with the updated target identity label.
6. The text processing method of claim 5, wherein the determining whether the target identity tag is correct based on the reference session content text and the target session content text comprises:
judging whether a conversation sentence field adjacent to the target field in the reference conversation content text contains a second preset field or not,
if so, the target identity label is correct;
if not, judging whether the target identity label is correct or not based on the target session content text.
7. The text processing method of claim 6, wherein the determining whether the target identity tag is correct based on the target session content text comprises:
acquiring the probability distribution of each preset identity label based on the classification model and the target conversation statement field;
judging whether the probability distribution standard deviation is larger than a preset threshold value or not;
if so, updating the target identity label by the preset identity label with the maximum probability value;
if not, calculating a probability value corresponding to each preset identity label based on a target conversation statement field and a pre-counted probability value for converting the current conversation intention type to the next conversation type, and selecting the preset identity label corresponding to the probability value with the maximum value to update the target identity label.
8. The text processing method of claim 4, wherein before the identifying the target conversational content text to mark the target object based on the pre-constructed classification model and the conversational content text, the method further comprises:
and performing error correction processing on the target session content text based on a pre-constructed error correction database.
9. A text processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a session content text;
the classification module is used for classifying the conversation content texts to obtain target conversation content texts corresponding to target objects;
and the identification marking module is used for identifying the target conversation content text based on a pre-constructed classification model and the conversation content text so as to mark the target object.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202110923172.6A 2021-08-12 2021-08-12 Text processing method and device and computer readable storage medium Withdrawn CN113792140A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110923172.6A CN113792140A (en) 2021-08-12 2021-08-12 Text processing method and device and computer readable storage medium
CA3170100A CA3170100A1 (en) 2021-08-12 2022-08-10 Text processing method and device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113792140A true CN113792140A (en) 2021-12-14

Family

ID=78875896

Country Status (2)

Country Link
CN (1) CN113792140A (en)
CA (1) CA3170100A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114285929A (en) * 2021-12-27 2022-04-05 中国联合网络通信集团有限公司 Method, device and storage medium for identifying malicious anti-hasten users

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107343077A (en) * 2016-04-28 2017-11-10 腾讯科技(深圳)有限公司 Identify malicious call and establish the method, apparatus of identification model, equipment
CN107886955A (en) * 2016-09-29 2018-04-06 百度在线网络技术(北京)有限公司 A kind of personal identification method, device and the equipment of voice conversation sample
CN109241256A (en) * 2018-08-20 2019-01-18 百度在线网络技术(北京)有限公司 Dialog process method, apparatus, computer equipment and readable storage medium storing program for executing
CN110136727A (en) * 2019-04-16 2019-08-16 平安科技(深圳)有限公司 Speaker's personal identification method, device and storage medium based on speech content
WO2020073530A1 (en) * 2018-10-12 2020-04-16 平安科技(深圳)有限公司 Customer service robot session text classification method and apparatus, and electronic device and computer-readable storage medium
CN111382270A (en) * 2020-03-05 2020-07-07 中国平安人寿保险股份有限公司 Intention recognition method, device and equipment based on text classifier and storage medium
CN111508501A (en) * 2020-07-02 2020-08-07 成都晓多科技有限公司 Voice recognition method and system with accent for telephone robot
CN111695352A (en) * 2020-05-28 2020-09-22 平安科技(深圳)有限公司 Grading method and device based on semantic analysis, terminal equipment and storage medium
CN112100349A (en) * 2020-09-03 2020-12-18 深圳数联天下智能科技有限公司 Multi-turn dialogue method and device, electronic equipment and storage medium
CN112307168A (en) * 2020-10-30 2021-02-02 康键信息技术(深圳)有限公司 Artificial intelligence-based inquiry session processing method and device and computer equipment
CN112836025A (en) * 2019-11-22 2021-05-25 航天信息股份有限公司 Intention identification method and device
CN113066499A (en) * 2021-03-12 2021-07-02 四川大学 Method and device for identifying identity of land-air conversation speaker

Also Published As

Publication number Publication date
CA3170100A1 (en) 2023-02-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211214