WO2024027552A1

WO2024027552A1 - Text classification method and apparatus, text recognition method and apparatus, electronic device and storage medium

Info

Publication number: WO2024027552A1
Application number: PCT/CN2023/109568
Authority: WO
Inventors: 李长林; 肖冰; 曹磊; 罗奇帅
Original assignee: 马上消费金融股份有限公司
Priority date: 2022-08-03
Filing date: 2023-07-27
Publication date: 2024-02-08
Also published as: CN117556040A

Abstract

The present disclosure provides a text classification method and apparatus, a text recognition method and apparatus, an electronic device, and a storage medium. The text classification method comprises: acquiring a text to be classified; on the basis of a preset text class feature and the text to be classified, generating a feature value of the text class feature of the text to be classified; performing text classification processing on the text to be classified according to the feature value of the text class feature, to obtain a text classification result, the text classification result being used to indicate whether a specified type of noise is present.

Description

Text classification method and device, text recognition method and device, electronic equipment, storage medium

Cross-references to related applications

This patent application claims priority from Chinese patent application 202210928633.3, which was filed with the State Intellectual Property Office of China on August 3, 2022. The disclosure of this Chinese patent application is incorporated herein by reference in its entirety.

Technical field

The present disclosure relates to the field of computer technology, and in particular to a text classification method and device, a text recognition method and device, electronic equipment, and a storage medium.

Background

In the field of natural language processing, a large number of text processing tasks can be solved by text classification. Text classification refers to the automatic classification of text according to certain standards. For example, text processing tasks such as sentiment analysis, intent recognition, and question and answer matching can be processed through text classification, which can improve text processing capabilities.

When text recognition is performed, the text to be recognized may contain interference information introduced by noise. This makes the text content semantically incoherent or confused, so that an objective text recognition result cannot be obtained. Therefore, text needs to be classified according to whether it contains noise data, so that during text recognition the interference caused by the noise data can be reduced based on the classification result.

Summary of the invention

The present disclosure provides a text classification method and device, a text recognition method and device, electronic equipment, and storage media.

In a first aspect, the present disclosure provides a text classification method, which includes: obtaining a text to be classified; generating, based on preset text-class features and the text to be classified, feature values of the text-class features of the text to be classified; and performing text classification processing on the text to be classified according to the feature values of the text-class features to obtain a text classification result, the text classification result being used to indicate whether a specified type of noise is present.

In a second aspect, the present disclosure provides a text recognition method, which includes: performing sensitive word recognition on an acquired text to be recognized to obtain a sensitive word recognition result; performing text classification processing on the text to be recognized according to the feature values of the text-class features of the text to be recognized to generate a text classification result, the text classification result being used to indicate whether a specified type of noise is present; and generating a text recognition result of the text to be recognized based on the sensitive word recognition result and the text classification result.
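The last step of the second aspect combines two intermediate results. The source does not specify how they are combined, so the following is only a minimal sketch under a stated assumption: the hypothetical function `generate_recognition_result` keeps every sensitive-word hit but marks hits as confirmed only when the classifier reports no specified-type noise.

```python
from typing import NamedTuple, Sequence, Tuple


class TextRecognitionResult(NamedTuple):
    sensitive_hits: Tuple[str, ...]   # sensitive word recognition result
    noise_present: bool               # text classification result
    confirmed_hits: Tuple[str, ...]   # hits kept under the assumed policy below


def generate_recognition_result(
    sensitive_hits: Sequence[str], noise_present: bool
) -> TextRecognitionResult:
    """Combine the two intermediate results of the text recognition method.

    Assumed policy (not specified in the source): when the classifier reports
    the specified type of noise, sensitive-word hits are kept for reference
    but not confirmed.
    """
    hits = tuple(sensitive_hits)
    confirmed = () if noise_present else hits
    return TextRecognitionResult(hits, noise_present, confirmed)
```

For example, a hit such as "complain" in an agent text flagged as containing return noise would be reported but not confirmed.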

In a third aspect, the present disclosure provides a text classification device, which includes: an acquisition module configured to acquire a text to be classified; a feature value generation module configured to generate, based on preset text-class features and the text to be classified, feature values of the text-class features of the text to be classified; and a classification determination module configured to perform text classification processing on the text to be classified according to the feature values of the text-class features to obtain a text classification result, the text classification result being used to indicate whether a specified type of noise is present.

In a fourth aspect, the present disclosure provides a text recognition device, which includes: a word recognition module configured to perform sensitive word recognition on an acquired text to be recognized to obtain a sensitive word recognition result; a classification module configured to perform text classification processing on the text to be recognized according to the feature values of the text-class features of the text to be recognized and to generate a text classification result, the text classification result being used to indicate whether a specified type of noise is present; and a result generation module configured to generate a text recognition result of the text to be recognized based on the sensitive word recognition result and the text classification result.

In a fifth aspect, the present disclosure provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores one or more computer programs executable by the at least one processor, so that the at least one processor can execute the above-mentioned text classification method or text recognition method.

In a sixth aspect, the present disclosure provides a computer-readable storage medium on which a computer program is stored. The computer program implements the above-mentioned text classification method or text recognition method when executed by a processor/processing core.

In the embodiments provided by the present disclosure, feature values of the text-class features of the text to be classified are generated based on the preset text-class features and the text to be classified, and text classification processing is performed according to the generated feature values to obtain a text classification result, from which it can be determined whether the specified type of noise is present in the text to be classified. Because this determination is made from text features, the interference caused by noise data can be reduced based on the classification result during text recognition, which helps to obtain an objective text recognition result.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of the drawings

The accompanying drawings are used to provide a further understanding of the present disclosure and constitute a part of the specification. They are used to explain the present disclosure together with the embodiments of the present disclosure and do not constitute a limitation of the present disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing detailed example embodiments with reference to the accompanying drawings, in which:

Figure 1 is a scene diagram of a voice call service provided by an embodiment of the present disclosure;

Figure 2 is a flow chart of a text classification method provided by an embodiment of the present disclosure;

Figure 3 is a schematic flowchart of model training and model use provided by an embodiment of the present disclosure;

Figure 4 is a flow chart of a text recognition method provided by an embodiment of the present disclosure;

Figure 5 is a flow chart of another text recognition method provided by an embodiment of the present disclosure;

Figure 6 is a block diagram of a text classification device provided by an embodiment of the present disclosure;

Figure 7 is a block diagram of a text recognition device provided by an embodiment of the present disclosure;

Figure 8 is a block diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

To enable those skilled in the art to better understand the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to facilitate understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

The embodiments of the present disclosure and the features in the embodiments may be combined with each other, provided that they do not conflict.

As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for describing particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprising" and/or "made of," when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Words such as "connected" or "coupled" are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be construed as having meanings consistent with their meanings in the context of the relevant art and the present disclosure, and will not be construed in an idealized or overly formal sense unless expressly so defined herein.

In practical application scenarios, speech recognition technology converts speech data into text information. It involves many disciplines and technical fields, such as acoustics, phonetics, linguistics, digital signal processing theory, information theory, and computer science. Because speech signals are diverse and complex, the processing performance of speech processing equipment is easily affected by factors such as the size of the recognition vocabulary, the complexity of the speech, the quality of the speech signal, the number of speakers (single speaker or multiple speakers), the quality of the call hardware, and the processing power. Under the influence of these factors, the recognition accuracy of speech processing is limited to a certain extent.

Figure 1 is a scene diagram of a voice call service provided by an exemplary embodiment of the present disclosure. As shown in Figure 1, this scenario includes: a user 10, a user call device 11, a customer service agent 20, an agent call device 21, a communication network 30, and a voice processing device 40. The user call device 11 establishes a call with the agent call device 21 through the communication network 30, and during the call the customer service agent 20 provides voice call services to the user 10, such as answering inquiries and handling business.

To inspect the service quality of the voice call, the voice processing device 40 can obtain, from the communication network 30, the voice data requiring quality inspection among the voice data of both parties to the call, convert that voice data into the corresponding dialogue text through automatic speech recognition, and perform service quality inspection based on the dialogue text to obtain a service quality inspection result.

It should be noted that, in the technical solution of the present disclosure, obtaining voice data requiring quality inspection from the communication network 30 requires authorization from the users involved in the call. For example, authorization and consent must be obtained from the user 10 before the above-mentioned voice data is collected. The acquisition, storage, use, and processing of data in the technical solution of the present disclosure all comply with the relevant provisions of national laws and regulations.

In some embodiments, the main content of voice service quality inspection (voice quality inspection for short) is sensitive word recognition. Specifically, sensitive words include words that do not comply with norms such as industry norms, management norms, and/or disciplinary norms. In some scenarios, a sensitive word database can be established in advance according to the usage scenario; the database can include dirty words, abusive words, uncivil words, threatening words, words related to major events, and other sensitive words defined according to specific norms. Since the same word may or may not be sensitive in different language environments, the sensitive word database needs to be updated in a timely manner according to the language environment of the usage scenario.
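Sensitive word recognition against such a database can be sketched as plain substring matching. This is a minimal sketch, assuming the database is an in-memory collection of strings (so it can be updated at any time); `find_sensitive_words` is a hypothetical name, and a production system matching many words at once would more likely use a multi-pattern matcher such as an Aho-Corasick automaton.

```python
from typing import Iterable, List, Tuple


def find_sensitive_words(text: str, lexicon: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (word, start_index) for every occurrence of a lexicon entry in text.

    Plain substring matching; results are ordered by position, then by word.
    """
    hits = []
    for word in set(lexicon):
        if not word:
            continue  # an empty entry would match at every position
        start = text.find(word)
        while start != -1:
            hits.append((word, start))
            start = text.find(word, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

Because the lexicon is an ordinary set, updating it for a new language environment is just adding or removing entries before the next call.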

In some embodiments, sensitive words may include words of interest. Specifically, the words of interest may include at least one of service evaluation terms, business evaluation terms, and business keywords. In some scenarios, recognizing words of interest helps to obtain the business information that the user is interested in, as well as the user's evaluation of the related services and service providers.

In the voice quality inspection scenario, the call between the user 10 and the customer service agent 20 may be disturbed by various types of noise, such as environmental noise, interfering human voices, reverberation, and echo. Environmental noise may come from a machine capable of playing meaningful audio signals (such as a radio or an audio player). Reverberation can be understood as an acoustic phenomenon in which a sound signal is repeatedly reflected and absorbed by obstacles during propagation, so that the original and reflected sound waves are superimposed. Echo, also called acoustic echo, can be understood as a repeated sound signal formed when the sound played by the loudspeaker of the speech processing device itself propagates and is reflected in the surrounding space; this repeated sound signal is picked up again by the microphone and causes noise interference.

Noise interference caused by call return noise (also simply called return noise) is one type of echo interference. It is usually caused by the hardware characteristics of the communication equipment itself. For example, the customer service agent's call equipment may have poor isolation between its transmit and receive loops, or its loudspeaker volume and microphone sensitivity may be high, so that the sound played by the equipment's own loudspeaker is picked up by the microphone. The sound data transmitted back to the microphone is then mixed with the customer service agent's voice data, forming return noise in the agent's dialogue text.
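The mixing described above is often approximated with a simple linear echo model, in which the microphone receives the local speech plus a delayed, attenuated copy of the loudspeaker output. The sketch below uses that textbook simplification, not a model taken from the source; `gain` and `delay` are hypothetical parameters standing in for poor loop isolation, loud playback, and a sensitive microphone.

```python
from typing import List, Sequence


def microphone_signal(
    agent_speech: Sequence[float],
    loudspeaker_output: Sequence[float],
    gain: float,
    delay: int,
) -> List[float]:
    """Toy linear echo model: mic[n] = speech[n] + gain * loudspeaker[n - delay]."""
    mixed = []
    for n, s in enumerate(agent_speech):
        k = n - delay
        # Only samples the loudspeaker has already played can leak back in.
        echo = gain * loudspeaker_output[k] if 0 <= k < len(loudspeaker_output) else 0.0
        mixed.append(s + echo)
    return mixed
```

Even when the agent is silent, the microphone signal is non-zero, which is how the customer's words can end up in the agent's transcribed text.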

As a specific example of return noise, the following schematic dialogue texts show parts of conversations between a user (hereinafter the customer) and a customer service agent (hereinafter the agent). In each line of dialogue, the speaker's identity and the corresponding call content are separated by a separator symbol (such as a colon): the left side of the colon indicates the speaker's identity, and the right side indicates the speaker's call content in text form.
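The "speaker: content" layout can be parsed into ordered turns before any feature is computed. A minimal sketch, assuming one turn per line and accepting either an ASCII colon or a full-width colon (common in Chinese dialogue text); `parse_dialogue` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Speaker label, then an ASCII or full-width colon, then the call content.
_TURN = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.*?)\s*$")


def parse_dialogue(raw: str) -> List[Tuple[str, str]]:
    """Split dialogue text into (speaker, content) turns, skipping blank lines."""
    turns = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        match = _TURN.match(line)
        if match is None:
            raise ValueError(f"no speaker separator in line: {line!r}")
        # Only the first colon separates; later colons stay in the content.
        turns.append((match.group(1), match.group(2)))
    return turns
```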

Example 1: The call content may include the following text information.

Customer: Keep calling me during my working hours. Can you be a little more qualified?

Agent: Somewhat qualified. We will leave you a note here to reduce the number of calls to you during working hours. Goodbye.

Example 2: The call content may include the following text information.

Customer: If you push me again, I will file a complaint against you.

Agent: Go complain to you. Sir, we are here to remind you to pay attention to your credit report and hope that you will deal with it as soon as possible.

Example 3: The call content may include the following text information.

Customer: Hello, yes, I am XXX (pronounced "shenjingjing").

Agent: Crazy. We are here to inform you that your information is about to expire and we hope you can handle it as soon as possible.

As the call content above shows, "Somewhat qualified" in the agent's call content in Example 1 and "Go complain to you" in the agent's call content in Example 2 are return noise. In Example 3, "Crazy" in the agent's call content results from return noise combined with a speech recognition error introduced when automatic speech recognition converted the audio into text. In other words, when automatic speech recognition is used to convert voice call data into text (also called speech transcription), the return noise caused by the voice backhaul phenomenon is also transcribed, and whether or not the transcription is correct, the transcription result suffers from noise interference.

In the related art, voiceprint recognition technology, also called speaker recognition technology, is a core intelligent voice technology that uses computer systems to automatically recognize the identity of a speaker. Based on the unique personal characteristics of the speaker contained in the voice data, it uses computers and information recognition technology to automatically identify the speaker corresponding to the current voice. This technology is used to identify noise data in speech data and remove it, so as to improve the accuracy of automatic speech recognition and thereby the accuracy of voice quality inspection.

In some scenarios, voiceprint recognition technology can be used to identify the speakers in voice call data. Based on the identified identity of the current speaker, voice data not belonging to the current speaker is eliminated and the current speaker's voice data is retained, thereby denoising the voice call data.

For example, denoising voice data through voiceprint recognition technology may include: receiving voice data; automatically identifying each speaker through voiceprint recognition technology based on the unique personal characteristics of each speaker in the voice data; treating audio data that does not belong to a designated speaker as noise data, removing the noise data from the current voice data, and retaining the designated speaker's voice data; and transcribing the designated speaker's voice data through automatic speech recognition to obtain the dialogue text of the designated speaker.

In the above denoising process, because the audio of the noise data is usually short, voiceprint recognition usually cannot correctly identify the speaker corresponding to the current voice. Moreover, the noise data may be superimposed on the voice data of both parties to the call; for example, when both parties speak at the same time, the superimposed noise data further increases the difficulty of voiceprint recognition. In addition, denoising through voiceprint recognition and then transcribing through speech recognition involves two technologies, so the process is complicated and cumbersome and its efficiency is low. Therefore, in the related art, noise data in call voice data is difficult to identify correctly and may even cause the call voice data to be misrecognized, resulting in low recognition accuracy.

The text classification method and text recognition method provided by the embodiments of the present disclosure are described below with reference to the accompanying drawings and specific embodiments. The text classification method can determine whether a specified type of noise is present in speech data; the text recognition method can perform further recognition processing based on that determination, to obtain a recognition result that takes the presence of the noise into account.

The text classification method and text recognition method according to the embodiments of the present disclosure can be executed by electronic devices such as terminal devices or servers. The terminal devices can be devices with data processing capabilities, such as user equipment (UE), mobile devices, user terminals, terminals, personal digital assistants (PDA), handheld devices, computing devices, and vehicle-mounted devices. The methods can be implemented by a processor in the terminal device calling computer-readable program instructions stored in a memory, or they can be executed by a server.

Figure 2 is a flow chart of a text classification method provided by an embodiment of the present disclosure. Referring to Figure 2, the text classification method includes the following steps S210 to S230.

In step S210, the text to be classified is obtained.

In this step, the processing device may obtain the text to be classified in various ways. For example, it may directly use one text in the dialogue text as the text to be classified; or multiple texts may be stored in the text processing device in advance, and the processing device obtains them one by one as the current text to be classified; or, when the processing device is performing text recognition processing and the current text to be recognized needs to be classified, it may directly use the text to be recognized as the text to be classified.

In step S220, based on the preset text class features and the text to be classified, a feature value of the text class feature of the text to be classified is generated.

In this step, the text-class features are text-related feature items preset with respect to the text to be classified. These feature items can be used to characterize the probability that sensitive words are present in the dialogue text.

In step S230, text classification processing is performed on the text to be classified according to the feature values of the text-class features, and a text classification result indicating whether the specified type of noise is present is generated.

In this step, the specified type of noise may be a sound signal that carries some actual meaning and degrades the quality of the collected voice data. Illustratively, the specified type of noise includes, but is not limited to, the noise mentioned in the above embodiments, caused by interference sources such as environmental noise, interfering human voices, reverberation, and echo.
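Steps S210 to S230 form a simple per-text pipeline. The sketch below only shows that control flow; `generate_feature_values` and `decide_noise` are hypothetical callables standing in for S220 and S230, and encoding the result as a boolean (True meaning the specified noise is present) is an assumption.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def classify_texts(
    texts: Iterable[str],
    generate_feature_values: Callable[[str], Sequence[float]],
    decide_noise: Callable[[Sequence[float]], bool],
) -> List[Tuple[str, bool]]:
    """Run steps S210-S230 for each text; True means the specified noise is present."""
    results = []
    for text in texts:                                # S210: obtain the text to be classified
        values = generate_feature_values(text)        # S220: feature values of the text-class features
        results.append((text, decide_noise(values)))  # S230: text classification result
    return results
```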

In the text classification method of the embodiments of the present disclosure, feature values of the text-class features of the text to be classified are generated based on the preset text-class features and the text to be classified, and text classification processing is performed on the generated feature values to obtain a text classification result, from which it can be determined whether the specified type of noise is present in the text to be classified. Because this determination is made from text features, the interference caused by noise data can be reduced based on the classification result in subsequent text recognition, which helps to obtain an objective text recognition result.

Compared with denoising voice call data through voiceprint recognition technology, the text classification method of the embodiments of the present disclosure determines whether a specified type of noise is present in the text to be classified from the feature values of its text-class features, independently of any voiceprint recognition processing. Therefore, determining whether noise is present in the dialogue text with this method is not affected by the various factors that degrade the accuracy of voiceprint recognition in the related art, and the accuracy is higher. Moreover, whereas the related art must first denoise the acquired voice data through voiceprint recognition and then perform speech recognition on the denoised voice data, the text classification method of the embodiments of the present disclosure works directly on the acquired text to be classified and determines from text features whether the specified type of noise is present, which improves the accuracy of the classification result, simplifies processing, and improves processing efficiency.

The text classification method according to the embodiment of the present disclosure will be described below.

In some embodiments, the preset text-class features include at least one text-class feature. In step S220, generating the feature values of the text-class features of the text to be classified based on the preset text-class features and the text to be classified may specifically include the following steps S11 and S12.

In step S11, the value rule of each text-type feature in the at least one text-type feature is determined based on the at least one text-type feature.

In this step, each text feature is represented as a feature operator, and each feature operator is used to describe the value rule of a text feature, that is, the correspondence between the text feature and different feature values.

In step S12, based on the value rules of each text class feature, a feature value of each text class feature of the text to be classified is generated.
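Steps S11 and S12 can be pictured as a registry of feature operators, one value rule per text-class feature, applied to facts about the text. This is a minimal sketch: the fact keys and the registry `VALUE_RULES` are hypothetical, and the two rules simply encode the value rules of T₁ and T₂ described later in this section.

```python
from typing import Any, Callable, Dict, Mapping

# A value rule maps whatever facts it needs about the text to a feature value.
ValueRule = Callable[[Mapping[str, Any]], float]


def generate_feature_values(
    value_rules: Mapping[str, ValueRule], facts: Mapping[str, Any]
) -> Dict[str, float]:
    """S12: apply each text-class feature's value rule (its feature operator)."""
    return {name: rule(facts) for name, rule in value_rules.items()}


# S11: hypothetical registry of value rules, one per text-class feature.
VALUE_RULES: Dict[str, ValueRule] = {
    "T1": lambda f: 0 if f["only_this_text_has_sensitive_words"] else 1,
    "T2": lambda f: 0 if f["sensitive_words_in_conversation_object_text"] else 1,
}
```

Adding a new text-class feature then means registering one more value rule, without touching the generation step.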

In this embodiment, the feature value of a text-class feature is a numerical representation of that feature. Performing text classification processing based on the feature values of the text-class features of the text to be classified can objectively reflect whether a specified type of noise is present in the dialogue text; and the richer the types of preset text-class features, the more accurate the corresponding text classification result.
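The source does not specify which classifier consumes the feature values (Figure 3 refers to model training and model use). Purely as an illustration, the sketch below swaps in a logistic model with hand-set weights; `noise_present`, the weights, the bias, and the 0.5 threshold are all assumptions. It relies on the convention of T₁ and T₂ that a higher feature value means the sensitive words are more likely genuine.

```python
import math
from typing import Sequence


def noise_present(
    feature_values: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Hypothetical logistic classifier over text-class feature values.

    Higher feature values (e.g. T1 = 1, T2 = 1) mean the sensitive words are
    more likely genuine, so a low probability is read as "noise present".
    """
    if len(feature_values) != len(weights):
        raise ValueError("one weight per feature value is required")
    score = bias + sum(w * v for w, v in zip(weights, feature_values))
    p_genuine = 1.0 / (1.0 + math.exp(-score))
    return p_genuine < threshold
```

In practice the weights would be learned from labeled dialogue texts rather than set by hand.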

In some embodiments, the text to be classified is a text selected from a pre-obtained dialogue text. The at least one text-class feature includes at least one of the following: a sensitive word distribution feature, used to characterize the distribution of sensitive words in the dialogue text; a predetermined feature of the text itself, used to characterize predetermined characteristics of the text to be classified; and a predetermined dialogue-related feature, used to characterize predetermined characteristics relating the text to be classified to the dialogue text.

Setting multiple different types of text-class features provides an objective basis for subsequently and accurately determining whether a specified type of noise is present in the text to be classified.

In some embodiments, the dialogue text includes the dialogue text of a target object and the dialogue text of a conversation object, generated during a call between the target object and the conversation object with which it talks; the text to be classified is one of the dialogue texts of the target object.

In some embodiments, when the dialogue text of the target object is the agent call text, the dialogue text of the conversation object is the customer call text. The agent call text comprises multiple individual agent call texts, and the customer call text comprises multiple individual customer call texts.
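Separating an ordered dialogue into the target object's texts and the conversation object's texts is straightforward once turns are available. A minimal sketch, assuming turns are `(speaker, content)` pairs in call order and that the agent is labeled `"Agent"` (a hypothetical default); each text keeps its turn index so that adjacency can still be checked afterwards.

```python
from typing import List, Sequence, Tuple

Turn = Tuple[str, str]  # (speaker, content), in call order


def split_by_role(
    turns: Sequence[Turn], target_role: str = "Agent"
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Separate the target object's texts from the conversation object's texts."""
    target, conversation = [], []
    for index, (speaker, content) in enumerate(turns):
        (target if speaker == target_role else conversation).append((index, content))
    return target, conversation
```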

In some embodiments, at least one text-like feature includes a sensitive word distribution feature.

Step S11 may specifically include determining the value rule of at least one of the following text-class features included in the sensitive word distribution feature: the value rule of a first text-class feature, which characterizes whether, among the dialogue texts of the target object, only the text to be classified contains sensitive words; the value rule of a second text-class feature, which characterizes whether the sensitive words in the text to be classified appear in the dialogue text of the conversation object; the value rule of a third text-class feature, which characterizes whether sensitive words are present in the dialogue text of the conversation object; and the value rule of a fourth text-class feature, which characterizes whether sensitive words are present in a predetermined dialogue text and whether those sensitive words are consistent with the sensitive words in the text to be classified, where the predetermined dialogue text is one of the dialogue texts of the conversation object and is adjacent to the text to be classified.

Step S12 may specifically include: generating, based on at least one of the value rules of the first, second, third, and fourth text-class features, the feature value of at least one of the first, second, third, and fourth text-class features of the text to be classified.
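The four conditions of the sensitive word distribution feature can be evaluated directly from the ordered turns. This is a minimal sketch under stated assumptions: sensitive words are found by substring matching against a lexicon; α₂ counts as "yes" if any sensitive word of the text appears in the conversation object's texts; the "predetermined dialogue text" is taken to be the nearest preceding conversation-object turn; and "consistent" is read as "the same set of sensitive words". α₄ is returned as its two component answers because the source does not say how they are combined.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Turn = Tuple[str, str]  # (speaker, content), in call order


def sensitive_words_in(text: str, lexicon: Iterable[str]) -> Set[str]:
    return {word for word in lexicon if word and word in text}


def distribution_features(
    turns: Sequence[Turn], index: int, lexicon: Iterable[str], target_role: str = "Agent"
) -> Dict[str, object]:
    """Evaluate alpha1-alpha4 for the target-role turn at position `index`."""
    lexicon = set(lexicon)
    own = sensitive_words_in(turns[index][1], lexicon)
    other_target = [c for i, (s, c) in enumerate(turns) if s == target_role and i != index]
    conversation = [c for s, c in turns if s != target_role]
    conversation_words = set().union(*(sensitive_words_in(c, lexicon) for c in conversation))
    # Assumption: the predetermined dialogue text is the nearest preceding
    # conversation-object turn.
    previous = next((c for s, c in reversed(turns[:index]) if s != target_role), "")
    previous_words = sensitive_words_in(previous, lexicon)
    return {
        "alpha1": bool(own) and not any(sensitive_words_in(c, lexicon) for c in other_target),
        "alpha2": bool(own & conversation_words),
        "alpha3": bool(conversation_words),
        # (adjacent text has sensitive words, its sensitive words equal the text's own)
        "alpha4": (bool(previous_words), bool(previous_words) and previous_words == own),
    }
```

Applied to Example 2 with a lexicon containing only "complain", all four conditions come out "yes", which matches the intuition that the agent's "Go complain to you" echoes the customer's preceding turn.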

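As a specific example, the feature operator T₁ can be constructed through the following expression (1) to determine the value rule of the first text class feature:

$$T_1(\alpha_1)=\begin{cases}0, & \alpha_1=\text{yes}\\ 1, & \alpha_1=\text{no}\end{cases}\qquad(1)$$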

In expression (1), T₁(α₁) is referred to as operator T₁, α₁ is the first text class feature, and the value of T₁(α₁) is the feature value of the first text class feature. The first text class feature α₁ characterizes whether, among the dialogue texts of the target object, only the text to be classified contains sensitive words. If so, T₁(α₁) takes the value 0; if not, T₁(α₁) takes the value 1.

In the embodiments of the present disclosure, if only the text to be classified among the dialogue texts of the target object contains sensitive words (α₁ = yes), the probability that the sensitive words in the text to be classified are real dialogue content is small (T₁(α₁) = 0); if the text to be classified is not the only dialogue text of the target object that contains sensitive words (α₁ = no), the probability that the sensitive words in the text to be classified are real dialogue content is relatively high (T₁(α₁) = 1).

As an example, in actual application scenarios, if the agent's speech contains sensitive words, usually several agent call texts (not just one) contain them, and the customer call texts may contain sensitive words as well; it is therefore unlikely that only the text to be classified contains sensitive words.

It should be noted that the above values of T₁(α₁) are only illustrative; it is only required that the value of T₁(α₁) when α₁ is “yes” be smaller than its value when α₁ is “no”. The specific values can be customized according to actual needs.
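The value rule of operator T₁ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the target object's dialogue texts are assumed to be a list of strings, sensitive words are assumed to be detected by plain substring matching against a hypothetical lexicon, and the illustrative values 0 and 1 of expression (1) are used.

```python
from typing import List, Sequence


def contains_sensitive(text: str, lexicon: Sequence[str]) -> bool:
    """Hypothetical detector: True if any lexicon entry occurs in the text."""
    return any(word in text for word in lexicon)


def t1(target_texts: List[str], idx: int, lexicon: Sequence[str]) -> int:
    """Operator T1: 0 if only target_texts[idx] contains sensitive words, else 1.

    target_texts: all dialogue texts of the target object (e.g. agent call texts)
    idx: index of the text to be classified within target_texts
    """
    this_hit = contains_sensitive(target_texts[idx], lexicon)
    other_hit = any(
        contains_sensitive(text, lexicon)
        for i, text in enumerate(target_texts)
        if i != idx
    )
    alpha1 = this_hit and not other_hit  # alpha1 = "yes"
    return 0 if alpha1 else 1
```

For example, `t1(["Hello", "You idiot", "Goodbye"], 1, ["idiot"])` returns 0, because only the second agent text contains a sensitive word.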

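As a specific example, the feature operator T₂ can be constructed through the following expression (2) to determine the value rule of the second text class feature:

$$T_2(\alpha_2)=\begin{cases}0, & \alpha_2=\text{yes}\\ 1, & \alpha_2=\text{no}\end{cases}\qquad(2)$$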

In expression (2), T₂(α₂) is referred to as operator T₂, α₂ is the second text class feature, and the value of T₂(α₂) is the feature value of the second text class feature. The second text class feature α₂ characterizes whether the sensitive words in the text to be classified appear in the dialogue text of the conversation object. Specifically, if the sensitive words in the text to be classified appear in the dialogue text of the conversation object (α₂ = yes), T₂(α₂) takes the value 0; if they do not (α₂ = no), T₂(α₂) takes the value 1.

As an example, when the dialogue text of the target object is the agent call text, the text to be classified is any one of the agent call texts, and the dialogue text of the conversation object is the customer call text, the second text class feature means: whether the sensitive words in the text to be classified appear in the customer call text adjacent to the text to be classified.

In the embodiments of the present disclosure, if a sensitive word in the text to be classified appears in the customer call text adjacent to the text to be classified, the probability that the sensitive word actually belongs to the text to be classified is small; if it does not appear in the adjacent customer call text, the sensitive word in the text to be classified is more likely to be real dialogue content.

For example, consider the return noise example described in the previous embodiment. Example 1 is: “Customer: You keep calling me during my working hours. Can you have some decency? Agent: Some decency... We will make a note here and reduce the number of calls to you during working hours. Goodbye.” Assume that the text to be classified is the agent call text in Example 1. Because “some decency” appears not only in the agent call text but also in the customer call text adjacent to it, the probability that “some decency” actually belongs to the text to be classified (the agent call text) is small, and the probability that it is return noise is high.

It should be noted that the above values of T₂(α₂) are only illustrative; it is only required that the value of T₂(α₂) when α₂ is “yes” be smaller than its value when α₂ is “no”. The specific values can be customized according to actual needs.
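Operator T₂ can be sketched as below, a minimal illustration assuming the dialogue is a list of `(speaker, text)` turns and that a hypothetical substring-matching lexicon detects sensitive words. Following the agent/customer example, only the conversation-object turns immediately before and after the text to be classified are checked; the helper names are illustrative, not from the source.

```python
from typing import List, Sequence, Set, Tuple

Turn = Tuple[str, str]  # (speaker, text), e.g. ("agent", "...")


def sensitive_words(text: str, lexicon: Sequence[str]) -> Set[str]:
    """Hypothetical detector: the lexicon entries that occur in the text."""
    return {word for word in lexicon if word in text}


def adjacent_other_texts(dialogue: List[Turn], idx: int) -> List[str]:
    """Texts of the other party immediately before and after turn idx."""
    speaker = dialogue[idx][0]
    return [
        dialogue[j][1]
        for j in (idx - 1, idx + 1)
        if 0 <= j < len(dialogue) and dialogue[j][0] != speaker
    ]


def t2(dialogue: List[Turn], idx: int, lexicon: Sequence[str]) -> int:
    """Operator T2: 0 if a sensitive word of turn idx also occurs in an adjacent
    turn of the conversation object (alpha2 = yes), else 1 (alpha2 = no)."""
    own = sensitive_words(dialogue[idx][1], lexicon)
    echoed = any(own & sensitive_words(text, lexicon)
                 for text in adjacent_other_texts(dialogue, idx))
    return 0 if echoed else 1
```

Checking the whole customer call text instead of only the adjacent turns would mean replacing `adjacent_other_texts` with a filter over all conversation-object turns.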

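As a specific example, the feature operator T₃ can be constructed through the following expression (3) to determine the value rule of the third text class feature:

$$T_3(\alpha_3)=\begin{cases}1, & \alpha_3=\text{yes}\\ 0, & \alpha_3=\text{no}\end{cases}\qquad(3)$$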

In expression (3), T₃(α₃) denotes the value of operator T₃, α₃ is the third text class feature, and T₃(α₃) is the feature value of the third text class feature. The third text class feature α₃ characterizes whether the dialogue text of the conversation object contains sensitive words. Specifically, if it does (α₃ = yes), T₃(α₃) takes the value 1; if it does not (α₃ = no), T₃(α₃) takes the value 0.

As an example, when the dialogue text of the target object is the agent call text, the text to be classified is one of the agent call texts, and the dialogue text of the conversation object is the customer call text, the third text class feature means: whether the customer call text contains sensitive words.

In the embodiments of the present disclosure, if the dialogue text of the conversation object also contains sensitive words (any sensitive words suffice; they may be the same as or different from those in the text to be classified), the probability that the sensitive words really exist in the text to be classified is relatively high; if the dialogue text of the conversation object contains no sensitive words, the probability that the sensitive words in the text to be classified are real dialogue content is small.

It should be noted that the above values of T₃(α₃) are only illustrative; it is only required that the value of T₃(α₃) when α₃ is “no” be smaller than its value when α₃ is “yes”. The specific values can be customized according to actual needs.
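A minimal sketch of operator T₃, assuming `(speaker, text)` turns and a hypothetical substring-matching lexicon: every speaker other than the target object is treated as the conversation object, and any sensitive word in any of their texts sets α₃ = yes.

```python
from typing import List, Sequence, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def t3(dialogue: List[Turn], target_speaker: str, lexicon: Sequence[str]) -> int:
    """Operator T3: 1 if any text of the conversation object (every speaker
    other than target_speaker) contains a sensitive word (alpha3 = yes), else 0."""
    for speaker, text in dialogue:
        if speaker != target_speaker and any(word in text for word in lexicon):
            return 1
    return 0
```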

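As a specific example, the feature operator T₄ can be constructed through the following expression (4) to determine the value rule of the fourth text class feature:

$$T_4(\alpha_4)=\begin{cases}0, & \alpha_4=\text{no}\\ 1, & \alpha_4=\text{yes, consistent}\\ 2, & \alpha_4=\text{yes, inconsistent}\end{cases}\qquad(4)$$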

In expression (4), T₄(α₄) is referred to as operator T₄, α₄ is the fourth text class feature, and T₄(α₄) is the feature value of the fourth text class feature. The fourth text class feature α₄ characterizes whether the predetermined dialogue text contains sensitive words and, if so, whether the sensitive words in the predetermined dialogue text are consistent with those in the text to be classified; the predetermined dialogue text is one of the dialogue texts of the conversation object and is adjacent to the text to be classified.

Specifically, if the predetermined dialogue text contains no sensitive words (α₄ = no), T₄(α₄) takes the value 0; if it contains sensitive words that are consistent with those in the text to be classified (α₄ = yes, consistent), T₄(α₄) takes the value 1; if it contains sensitive words that are inconsistent with those in the text to be classified (α₄ = yes, inconsistent), T₄(α₄) takes the value 2.

As an example, when the dialogue text of the target object is the agent call text, the text to be classified is one of the agent call texts, and the dialogue text of the conversation object is the customer call text, the fourth text class feature means: whether the customer call text adjacent to the text to be classified contains sensitive words and, if so, whether those sensitive words are consistent with the sensitive words in the text to be classified.

In the embodiments of the present disclosure, if the predetermined dialogue text contains no sensitive words (for example, the customer call text adjacent to the text to be classified contains none), the probability that the sensitive words in the text to be classified are real dialogue content is lowest (T₄(α₄) = 0); if the predetermined dialogue text contains sensitive words that are consistent with those in the text to be classified, there is some probability that the sensitive words actually come from the target object, but it is low (T₄(α₄) = 1); if the predetermined dialogue text contains sensitive words that are inconsistent with those in the text to be classified, the probability that the sensitive words actually come from the target object is relatively high (T₄(α₄) = 2).

For example, in a call between an agent and a customer, the two parties speak in turns. If a dispute arises and the agent call text for one of the agent's utterances contains sensitive words such as uncivil language, the customer call text adjacent to that agent call text (the customer utterance immediately before or after it in the dialogue text) is also highly likely to contain sensitive words. Return noise must also be considered. When the sensitive words in the agent call text differ from those in the adjacent customer call text, the possibility of return noise is excluded; when they are the same, the agent's sensitive words may be return noise. Therefore, in the former case (different sensitive words), the sensitive words in the text to be classified are more likely to be real dialogue content than in the latter case (same sensitive words). Finally, the case in which only the agent call text contains sensitive words such as uncivil language while the adjacent customer call text contains none rarely occurs in real scenarios.

It should be noted that the above values of T₄(α₄) are only illustrative; it is only required that the value of T₄(α₄) for α₄ = “no” be smaller than its value for α₄ = “yes, consistent”, which in turn must be smaller than its value for α₄ = “yes, inconsistent”. The specific values can be customized according to actual needs.
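Operator T₄ can be sketched as follows, under several assumptions not fixed by the text: the dialogue is a list of `(speaker, text)` turns, sensitive words come from a hypothetical substring-matching lexicon, the sensitive words of both adjacent conversation-object turns are pooled, and “consistent” is read as “sharing at least one sensitive word”. The values 0, 1, 2 follow expression (4).

```python
from typing import List, Sequence, Set, Tuple

Turn = Tuple[str, str]  # (speaker, text)


def sensitive_words(text: str, lexicon: Sequence[str]) -> Set[str]:
    """Hypothetical detector: the lexicon entries that occur in the text."""
    return {word for word in lexicon if word in text}


def t4(dialogue: List[Turn], idx: int, lexicon: Sequence[str]) -> int:
    """Operator T4 on the conversation-object turns adjacent to turn idx.

    0: adjacent texts contain no sensitive words      (alpha4 = no)
    1: they do and share a word with turn idx         (yes, consistent)
    2: they do but share no word with turn idx        (yes, inconsistent)
    """
    speaker, text = dialogue[idx]
    neighbour_words: Set[str] = set()
    for j in (idx - 1, idx + 1):
        if 0 <= j < len(dialogue) and dialogue[j][0] != speaker:
            neighbour_words |= sensitive_words(dialogue[j][1], lexicon)
    if not neighbour_words:
        return 0
    if sensitive_words(text, lexicon) & neighbour_words:
        return 1
    return 2
```

A stricter reading of “consistent” (identical sets of sensitive words) would replace the intersection test with an equality test.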

In some embodiments, the at least one text class feature includes a predetermined feature of the text itself.

Step S11 above may specifically include determining the value rule of at least one of the following text class features included in the predetermined feature of the text itself: a fifth text class feature, which characterizes the sentence integrity information of the text to be classified; and a sixth text class feature, which characterizes the total number of times specific words appear at specified positions in the dialogue text of the target object.

Step S12 above may specifically include: based on at least one of the value rules of the fifth and sixth text class features, generating the feature value of the text to be classified for at least one of the fifth and sixth text class features.

In this embodiment, the predetermined feature of the text itself can represent at least one of the following information items: the sentence integrity information of the text to be classified, and the total number of times specific words appear at specified positions in the dialogue text of the target object.

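As a specific example, the feature operator T₅ can be constructed through the following expression (5) to determine the value rule of the fifth text class feature:

$$T_5(\alpha_5)=\begin{cases}0, & \alpha_5\le 0.5\\ 1, & 0.5<\alpha_5\le 0.8\\ 2, & \alpha_5>0.8\end{cases}\qquad(5)$$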

In expression (5), T₅(α₅) is referred to as operator T₅, α₅ is the fifth text class feature, and T₅(α₅) is the feature value of the fifth text class feature. The fifth text class feature α₅ characterizes the sentence integrity of the text to be classified, which covers the reasonableness of the sentence structure and the coherence of the sentence semantics. The value of α₅ can be obtained by scoring the text to be classified with a preset semantic model; the higher the score, the more complete the sentence.

Specifically, if the sentence integrity score of the text to be classified is less than or equal to 0.5, T₅(α₅) takes the value 0; if it is greater than 0.5 and less than or equal to 0.8, T₅(α₅) takes the value 1; if it is greater than 0.8, T₅(α₅) takes the value 2.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content increases with the sentence integrity of the text to be classified: the higher the sentence integrity, the more likely the sensitive words are real dialogue content; the lower the sentence integrity, the less likely the sensitive words actually come from the text to be classified.

It should be noted that the above values of T₅(α₅) are only illustrative; it is only required that the larger α₅ is, the larger T₅(α₅) is. The specific values can be customized according to actual needs.
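A minimal sketch of operator T₅ with the illustrative thresholds 0.5 and 0.8. The preset semantic model that produces the integrity score is not specified here, so the score is simply taken as an input (assumed to lie in [0, 1]).

```python
def t5(integrity_score: float) -> int:
    """Operator T5: map the sentence integrity score alpha5 to 0, 1 or 2.

    integrity_score: score from a preset semantic model (not modelled here).
    """
    if integrity_score <= 0.5:
        return 0
    if integrity_score <= 0.8:
        return 1
    return 2
```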

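As a specific example, the feature operator T₆ can be constructed through the following expression (6) to determine the value rule of the sixth text class feature:

$$T_6(\alpha_6)=\begin{cases}2, & \alpha_6=0\\ 1, & \alpha_6=n_1\\ 0, & \alpha_6=n_2\end{cases}\qquad(6)$$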

In expression (6), T₆(α₆) is referred to as operator T₆, α₆ is the sixth text class feature, and T₆(α₆) is the feature value of the sixth text class feature. The sixth text class feature α₆ characterizes the total number of times specific words appear at the specified positions in the dialogue text of the target object. α₆ = 0 means that the specific words do not appear at any of the specified positions; α₆ = n1 means that the specific words appear n1 times in total at the specified positions, where n1 is greater than or equal to 1 and less than a predetermined count threshold; α₆ = n2 means that the specific words appear n2 times in total at the specified positions, where n2 is greater than or equal to the predetermined count threshold. The predetermined count threshold is greater than or equal to 1 and less than or equal to the total number of times the specific words can appear across all the specified positions.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content decreases as the number of times the specific words appear at the specified positions in the dialogue text of the target object increases: the more often they appear, the lower the probability; the less often they appear, the higher the probability.

For example, if the specific words are polite phrases, the more times polite phrases appear in total at the specified positions, the lower the probability that the sensitive words in the text to be classified are real dialogue content; the fewer times they appear, the higher that probability.

It should be noted that the above values of T₆(α₆) are only illustrative; it is only required that the larger α₆ is, the smaller T₆(α₆) is. The specific values can be customized according to actual needs.

As an example, in a voice service quality inspection (voice quality inspection for short) scenario, the specific words are polite phrases, and the specified positions include at least the beginning position (the first text of the dialogue text) and the end position (the last text of the dialogue text). For example, the specific words include at least a polite greeting at the beginning and a polite closing at the end, and α₆ can represent the number of times polite phrases appear at these positions in the dialogue text of the target object. α₆ = 0 means that neither the greeting nor the closing appears; in this case the probability that the sensitive words in the text to be classified are real dialogue content is highest (T₆(α₆) = 2). When n1 = 1, i.e., α₆ = 1, either only the greeting appears at the beginning or only the closing appears at the end of the dialogue text; in this case the probability is second highest (T₆(α₆) = 1). When n2 = 2, i.e., α₆ = 2, a greeting appears at the beginning of the call and a closing appears at the end; in this case the probability is lowest (T₆(α₆) = 0).
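The voice quality inspection example can be sketched as two small functions: one computes α₆ by checking the first and last texts of the target object, the other applies the value rule with a count threshold of 2 (so n1 = 1 and n2 = 2). The greeting and closing phrase lists are hypothetical inputs, and substring matching is an assumption.

```python
from typing import List, Sequence


def polite_count(target_texts: List[str], greetings: Sequence[str],
                 closings: Sequence[str]) -> int:
    """alpha6: a greeting in the first text plus a closing in the last text."""
    if not target_texts:
        return 0
    count = 0
    if any(g in target_texts[0] for g in greetings):
        count += 1
    if any(c in target_texts[-1] for c in closings):
        count += 1
    return count


def t6(alpha6: int, threshold: int = 2) -> int:
    """Operator T6: 2 if alpha6 == 0, 1 if 1 <= alpha6 < threshold, else 0."""
    if alpha6 == 0:
        return 2
    if alpha6 < threshold:
        return 1
    return 0
```

For a call whose agent texts open with “Hello, how can I help?” and end with “Goodbye.”, `polite_count` returns 2 and `t6` returns 0.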

In some embodiments, the at least one text class feature includes a predetermined feature related to the dialogue text.

Step S11 above may specifically include determining the value rule of at least one of the following text class features included in the predetermined feature related to the dialogue text: a seventh text class feature, which characterizes the number of texts in the dialogue text to which the text to be classified belongs; and an eighth text class feature, which characterizes the position at which the text to be classified appears in the dialogue text.

Step S12 above may specifically include: based on at least one of the value rules of the seventh and eighth text class features, generating the feature value of the text to be classified for at least one of the seventh and eighth text class features.

In this embodiment, the predetermined feature related to the dialogue text can represent at least one of the following information items: the number of texts in the dialogue text, and the position of the text to be classified in the dialogue text.

As an example, the feature operator T₇ can be constructed through the following expression (7) to determine the value rule of the seventh text class feature.

In expression (7), T₇(α₇) is referred to as operator T₇, α₇ is the seventh text class feature, and the value of T₇(α₇) is the feature value of the seventh text class feature. The seventh text class feature α₇ characterizes the total number of texts (conversation turns) contained in the dialogue text to which the text to be classified belongs, i.e., how many texts the two parties produce in total during one call. In expression (7), K1 is smaller than K2, and K1 and K2 are both integers greater than or equal to 1.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content increases with the total number of texts in the dialogue text to which the text to be classified belongs: the more texts (or call turns) the dialogue text contains, the higher the probability.

It should be noted that the above values of T₇(α₇) are only illustrative; it is only required that the larger α₇ is, the larger T₇(α₇) is. The specific values can be customized according to actual needs, as can K1 and K2, for example K1 = 10 and K2 = 50.
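Expression (7) itself is not reproduced in this text; only K1 < K2 and “larger α₇, larger T₇(α₇)” are stated. The sketch below therefore assumes a three-tier form with values 0, 1, 2 and the boundaries α₇ ≤ K1, K1 < α₇ ≤ K2, α₇ > K2, using the example values K1 = 10 and K2 = 50. Both the tier values and the boundary inclusion are assumptions.

```python
def t7(num_texts: int, k1: int = 10, k2: int = 50) -> int:
    """Operator T7 (assumed three-tier form).

    num_texts: alpha7, total number of texts (turns) in the dialogue text
    k1, k2: thresholds with 1 <= k1 < k2
    """
    if not 1 <= k1 < k2:
        raise ValueError("require 1 <= k1 < k2")
    if num_texts <= k1:
        return 0
    if num_texts <= k2:
        return 1
    return 2
```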

As an example, the feature operator T₈ can be constructed through the following expression (8) to determine the value rule of the eighth text class feature.

In expression (8), T₈(α₈) is referred to as operator T₈, x denotes the ordinal position (sentence index) of the text to be classified in the dialogue text, and L is the total number of texts contained in the dialogue text. The eighth text class feature α₈ characterizes the position at which the text to be classified appears in the dialogue text. Specifically, the three cases of expression (8) indicate that the text to be classified appears in the front part, the middle part, or the rear part of the dialogue text, respectively.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content is related to the position of the text to be classified in the dialogue text: the later the text to be classified appears in the dialogue text, the higher the probability.

It should be noted that the above values of T₈(α₈) are only illustrative; it is only required that the larger α₈ is (the later the position), the larger T₈(α₈) is. The specific values can be customized according to actual needs.
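The exact conditions of expression (8) are not reproduced in this text. As a hypothetical instance, the sketch below splits the dialogue text into thirds (front: x ≤ L/3, middle: L/3 < x ≤ 2L/3, rear: x > 2L/3) and assigns 0, 1, 2 respectively; integer arithmetic avoids rounding issues. The split points and values are assumptions.

```python
def t8(x: int, total: int) -> int:
    """Operator T8 (assumed split into thirds).

    x: 1-based position of the text to be classified in the dialogue text
    total: L, total number of texts in the dialogue text
    Returns 0 (front part), 1 (middle part) or 2 (rear part).
    """
    if not 1 <= x <= total:
        raise ValueError("require 1 <= x <= total")
    if 3 * x <= total:          # x <= L/3
        return 0
    if 3 * x <= 2 * total:      # L/3 < x <= 2L/3
        return 1
    return 2                    # x > 2L/3
```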

In expressions (1) to (8) above, a larger value of a feature operator (that is, a larger feature value of the corresponding text class feature of the text to be classified) indicates a higher probability that the sensitive words actually exist in the text to be classified.

In practical applications, additional features can be defined as needed for the sensitive word distribution feature, the predetermined feature of the text itself, and the predetermined feature related to the dialogue text; the embodiments of the present disclosure do not specifically limit this.

In some embodiments, the text to be classified belongs to the dialogue text of the target object.

In step S230 above, performing text classification processing on the text to be classified according to the feature values of the text class features to obtain the text classification result may specifically include the following steps S21 and S22.

In step S21, based on preset portrait features, the feature values of the text to be classified corresponding to the portrait features of the target object are obtained; the portrait features characterize the individual characteristics of the target object. In step S22, text classification processing is performed on the text to be classified according to the feature values of the text class features and the feature values of the portrait features, to obtain the text classification result.

In this embodiment, the portrait features assist the text class features in determining whether the specified type of noise is present. Classifying the text to be classified by combining the feature values of the text class features with those of the portrait features improves the accuracy of the text classification result.
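This section does not specify the classifier used in step S22. Purely as an illustrative stand-in (not the disclosed method), the sketch below merges the text class feature values and the portrait feature values into one feature mapping and applies a hypothetical weighted score with a threshold. It assumes, as stated for expressions (1) to (8), that larger values indicate the sensitive words are more likely real, so a low score is classified as noise; a trained classifier could replace the scoring rule.

```python
from typing import Dict, Mapping


def has_specified_noise(text_features: Mapping[str, float],
                        portrait_features: Mapping[str, float],
                        weights: Mapping[str, float],
                        threshold: float) -> bool:
    """Hypothetical S22 stand-in: True means 'specified type of noise present'.

    Larger feature values are read as 'sensitive words more likely real',
    so a weighted score below the threshold is classified as noise.
    Features missing from `weights` get weight 1.0.
    """
    features: Dict[str, float] = {**text_features, **portrait_features}
    score = sum(weights.get(name, 1.0) * value for name, value in features.items())
    return score < threshold
```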

In some embodiments, the preset portrait features include at least one portrait feature, and obtaining, in step S21, the feature values of the text to be classified corresponding to the portrait features of the target object based on the preset portrait features may specifically include the following steps S31 and S32.

In step S31, a value rule for each portrait feature is determined based on at least one portrait feature.

In this step, each portrait feature is represented as a feature operator, and each feature operator is used to describe the value rule of a portrait feature, that is, the correspondence between the portrait feature and different feature values.

In step S32, based on the value rule of each portrait feature, the feature value of the text to be classified for each portrait feature of the target object is obtained.

In this embodiment, the feature value of a portrait feature is a numerical representation of that portrait feature. Performing text classification based on both the text class feature values and the portrait feature values of the text to be classified reflects more accurately whether the specified type of noise is objectively present in the text to be classified. The richer the set of preset portrait features, the more accurate the classification result obtained by combining the feature values of these two different types of features (text and portrait), which further improves the accuracy of the text classification result.

In some embodiments, the target object is a customer service agent, and the individual characteristics represent at least one of the following information items: the agent level; the agent's length of service; the number of times the agent's speech violated the predetermined speech rules within a predetermined statistical period; and whether there is a historical record of the agent being penalized because their speech contained sensitive words and therefore violated the predetermined speech rules.

In this embodiment, setting multiple types of portrait features provides auxiliary evidence for subsequently judging whether the specified type of noise is present in the text to be classified, thereby improving the accuracy of the final classification result.

As a specific example, the feature operator S₁ can be constructed through the following expression (9) to determine the value rule of the individual characteristic “agent length of service”.

In expression (9), S₁(β₁) is referred to as operator S₁, β₁ represents the length of service of the customer service agent on the call (for example, in years), and S₁(β₁) represents the value of operator S₁. A1 and A2 are both integers greater than or equal to 1, with A2 greater than A1; for example, A1 = 1 and A2 = 3.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content decreases as the agent's length of service increases: the longer the length of service (for example, β₁ = 10 years, which exceeds A2), the lower the probability; the shorter the length of service (for example, β₁ less than A1 = 1 year), the higher the probability.

It should be noted that the above values of A1, A2, and S₁(β₁) are only illustrative; it is only required that the larger β₁ is, the smaller S₁(β₁) is. The specific values can be customized according to actual needs.
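Expression (9) is not reproduced in this text, so the sketch below assumes a three-tier form consistent with the stated rule (longer service, smaller value): 2 for β₁ < A1, 1 for A1 ≤ β₁ ≤ A2, and 0 for β₁ > A2, with the example values A1 = 1 and A2 = 3. The tier values and boundaries are assumptions.

```python
def s1(years_of_service: float, a1: int = 1, a2: int = 3) -> int:
    """Operator S1 (assumed three-tier form): longer service -> smaller value."""
    if years_of_service < a1:
        return 2
    if years_of_service <= a2:
        return 1
    return 0
```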

As a specific example, the feature operator S₂ can be constructed through the following expression (10) to determine the value rule of the individual characteristic “agent level”.

In expression (10), S₂(β₂) is referred to as operator S₂, β₂ represents the agent level, and I, II, and III denote three levels from high to low: level I is the highest, level II is the middle, and level III is the lowest.

In the embodiments of the present disclosure, the probability that the sensitive words in the text to be classified are real dialogue content decreases as the agent level increases: the higher the agent level, the lower the probability; the lower the agent level, the higher the probability.

It should be noted that the above number of levels and their notation are only illustrative; it is only required that the higher the agent level represented by β₂, the smaller S₂(β₂). The specific values can be customized according to actual needs.
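A minimal sketch of operator S₂ as a lookup table. The values 0, 1, 2 for levels I, II, III are assumptions chosen only to satisfy the stated rule that a higher agent level gives a smaller value; expression (10) itself is not reproduced in this text.

```python
S2_VALUES = {"I": 0, "II": 1, "III": 2}  # assumed illustrative values


def s2(agent_level: str) -> int:
    """Operator S2: the higher the agent level (I highest, III lowest), the smaller the value."""
    try:
        return S2_VALUES[agent_level]
    except KeyError:
        raise ValueError(f"unknown agent level: {agent_level!r}") from None
```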

As a specific example, the feature operator S₃ can be constructed through the following expression (11) to determine the value rule of the individual characteristic “number of times the agent's speech violated the predetermined speech rules within a predetermined statistical period”.

In expression (11), S₃(β₃) is referred to as operator S₃, and β₃ represents the number of times the agent was penalized (subjected to loss-of-benefit handling) because their speech did not comply with the predetermined speech rules during the predetermined statistical period (for example, the last month); S₃(β₃) represents the value of operator S₃. C1 and C2 are both integers greater than or equal to 1, with C2 greater than C1; for example, C1 = 5 and C2 = 10.

In this embodiment of the present disclosure, the probability that the sensitive words in the text to be classified are real conversation content is positively related to the number of times, within the predetermined statistical period, that the agent was penalized because the agent's speech did not comply with the predetermined speech rules. For example, the more times the agent was internally penalized for speech violations in the past month, the higher the probability that the sensitive words in the text to be classified are real conversation content.

It should be noted that the length of the predetermined statistical period above is only illustrative. The requirement is that the larger the number of times represented by β₃, the larger the value of S₃(β₃); the specific values of C1 and C2 can be customized according to actual needs.
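Expression (11) is not reproduced in this text, but the surrounding description (a count β₃ and two integer thresholds C1 < C2) suggests a step function. The sketch below assumes three bands split at C1 and C2 and uses the example thresholds C1 = 5 and C2 = 10 from the text; the band boundaries and the output values `LOW`, `MID`, and `HIGH` are hypothetical. A step function is non-decreasing, so a larger β₃ never yields a smaller S₃.

```python
# Example thresholds given in the text (C2 > C1 >= 1).
C1, C2 = 5, 10

# HYPOTHETICAL output levels: expression (11) is not reproduced here.
LOW, MID, HIGH = 0.0, 0.5, 1.0


def s3(beta3: int, c1: int = C1, c2: int = C2) -> float:
    """Piecewise S3 value for beta3, the number of penalties in the period.

    Assumed (hypothetical) bands:
      beta3 < c1        -> LOW
      c1 <= beta3 < c2  -> MID
      beta3 >= c2       -> HIGH
    """
    if not 1 <= c1 < c2:
        raise ValueError("expected 1 <= C1 < C2")
    if beta3 < 0:
        raise ValueError("beta3 is a count and cannot be negative")
    if beta3 < c1:
        return LOW
    if beta3 < c2:
        return MID
    return HIGH
```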

As a specific example, the feature operator S₄ can be constructed through the following expression (12) to define the value rule for the individual feature indicating whether there is a historical record of the agent being penalized because sensitive words contained in the text to be classified made the agent's speech non-compliant with the predetermined speech rules.

In expression (12), S₄(β₄) denotes operator S₄ and also represents its value; β₄ indicates whether, in the historical call data, the agent was penalized because the sensitive words contained in the text to be classified made the agent's speech non-compliant with the predetermined speech rules. Specifically, if the agent has been penalized (the penalty including, but not limited to, at least one of demotion, disqualification, or economic loss), that is, β₄ = yes, then S₄(β₄) takes the value 1; if the agent has not been penalized, that is, β₄ = no, then S₄(β₄) takes the value 0.

In this embodiment of the present disclosure, the probability that the sensitive words in the text to be classified are real conversation content is related to whether, in the historical call data, the agent was penalized because those sensitive words made the agent's speech non-compliant with the predetermined speech rules.

For example, if the historical call data show that the agent has been penalized for speech violations caused by the sensitive words in the text to be classified, the probability that these sensitive words are real conversation content is higher; if the agent has never been penalized for speech violations caused by these sensitive words, the probability is lower.

It should be noted that the above values of S₄(β₄) are only illustrative. The only requirement is that the value of S₄(β₄) when β₄ is "no" is smaller than its value when β₄ is "yes"; the specific values can be customized according to actual needs.
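The S₄ rule itself is fully specified above (1 for "yes", 0 for "no"). This section does not say how β₄ is derived from the historical call data, so the sketch below assumes a hypothetical per-call record type `CallRecord` with two boolean fields; β₄ is "yes" when any record shows a penalty for a violation caused by the sensitive words.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CallRecord:
    """HYPOTHETICAL summary of one historical call for an agent."""
    violation_from_sensitive_word: bool  # speech non-compliant because of the sensitive words
    penalized: bool                      # e.g. demotion, disqualification or economic loss


def beta4(history: Iterable[CallRecord]) -> bool:
    """beta4 is "yes" if any historical call led to a penalty for such a violation."""
    return any(r.violation_from_sensitive_word and r.penalized for r in history)


def s4(history: Iterable[CallRecord]) -> int:
    """Operator S4: 1 when beta4 is "yes", 0 when beta4 is "no"."""
    return 1 if beta4(history) else 0
```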

In some embodiments, the portrait features may also characterize at least one of the following information items: the number of evaluations, the number of complaints received within a predetermined statistical period because the agent's speech did not comply with the predetermined speech rules, and so on. In practical applications, further types of portrait features can be defined as needed; the embodiments of this disclosure impose no specific limitation.

In some embodiments, if the target object is a customer, corresponding portrait features can be defined to characterize at least one of the following information items: the customer's service level, the customer's credit score, the customer's points balance, and whether the customer has a corresponding adverse history.

In this embodiment of the present disclosure, the feature values of the text-type features of the text to be classified are correlated with whether the specified type of noise exists in that text. When text classification processing is performed on the text to be classified, a text classification result indicating whether the specified type of noise exists can be generated from the feature values of the text-type features based on this association. The association can be represented by a function or a model and can be obtained through model training.

In some embodiments, step S230 above, in which text classification processing is performed on the text to be classified according to the feature values of the text-type features to generate a text classification result indicating whether the specified type of noise exists, may specifically include the following steps S41 and S42.

In step S41, the feature values of the text-type features are processed by a first classification model to obtain a first text category of the text to be classified, the first classification model being a model pre-trained using sample texts.

In step S42, the text classification result is generated based on a predetermined correspondence between the value of the first text category and whether the specified type of noise exists.

In this embodiment, the first classification model represents the association between the feature values of the text-type features and the text category of the text to be classified. Processing the feature values of the text-type features with the first classification model yields the first text category of the text to be classified. When the first text category takes a first value, for example 1, the corresponding text classification result is that the specified type of noise exists in the text to be classified; when it takes a second value, for example 0, the corresponding result is that the specified type of noise does not exist. Whether the specified type of noise exists in the text to be classified can thus be determined accurately from the model output, with few processing steps and high processing efficiency.
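Steps S41 and S42 can be sketched as a model call followed by a fixed lookup. In this minimal Python sketch the trained first classification model is abstracted as any callable that returns 1 or 0; `toy_model` is a hypothetical stand-in (a threshold on the feature sum) used only to exercise the flow, not a trained model.

```python
from typing import Callable, Dict, Sequence

# Predetermined correspondence from the text: first value 1 -> noise present,
# second value 0 -> noise absent.
CATEGORY_TO_NOISE: Dict[int, bool] = {1: True, 0: False}


def classify_text(
    text_feature_values: Sequence[float],
    first_model: Callable[[Sequence[float]], int],
) -> bool:
    """Return True if the specified type of noise is judged to exist."""
    category = first_model(text_feature_values)  # step S41
    try:
        return CATEGORY_TO_NOISE[category]       # step S42
    except KeyError:
        raise ValueError(f"unexpected text category: {category!r}") from None


def toy_model(features: Sequence[float]) -> int:
    """HYPOTHETICAL stand-in for the trained first classification model."""
    return 1 if sum(features) > 4.0 else 0
```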

In this embodiment of the present disclosure, the training data of the first classification model are sample texts obtained from the dialogue texts corresponding to historical speech data. During training of the first classification model, the text-type feature values of each sample text are obtained, and each sample text is annotated with whether the specified type of noise exists in it. Model training is then performed on the annotated text-type feature values to obtain the first classification model, which is used to perform text classification processing on the text to be classified and generate a text classification result indicating whether the specified type of noise exists in it. This improves both the processing efficiency and the accuracy of text classification based on text-type features.

In some embodiments, step S22 above, in which text classification processing is performed on the text to be classified according to the feature values of the text-type features and of the portrait features to obtain the text classification result, may specifically include the following steps S51 and S52.

In step S51, the feature values of the text-type features and of the portrait features are processed by a second classification model to obtain a second text category of the text to be classified, the second classification model being a model trained using sample texts.

In step S52, the text classification result is generated based on a predetermined correspondence between the value of the second text category and whether the specified type of noise exists.

In this embodiment, the second classification model represents the association between the feature values of the text-type and portrait features and the text category of the text to be classified. Processing these feature values with the second classification model yields the second text category of the text to be classified. When the second text category takes the first value, for example 1, the corresponding text classification result is that the specified type of noise exists in the text to be classified; when it takes the second value, for example 0, the corresponding result is that the specified type of noise does not exist. Whether the specified type of noise exists in the text to be classified can thus be determined accurately from the model output, with few processing steps and high processing efficiency.

In this embodiment of the present disclosure, the training data of the second classification model are sample texts obtained from the dialogue texts corresponding to historical speech data. During training of the second classification model, the text-type and portrait feature values of each sample text are obtained, and each sample text is annotated with whether the specified type of noise exists in it. Model training is then performed on the annotated text-type and portrait feature values to obtain the second classification model, which is used to classify the text to be classified and generate a result indicating whether the specified type of noise exists in it. This improves both the processing efficiency and the accuracy of text classification based on these features.

The following describes the model training process, taking the second classification model as an example.

The following describes the construction process of training data for the second classification model.

As an example, Table 1 below shows illustrative values of the training data of the second classification model.

Table 1 Training data of the second classification model

In Table 1, n represents the number of sample texts and is an integer greater than or equal to 1; T₁ to T₈ represent the text-type features described by expressions (1) to (8) above, and S₁ to S₄ represent the portrait features described by expressions (9) to (12) above. The annotation value labels whether the specified type of noise exists in each sample text used to train the second classification model: an annotation value of 0 means that the specified type of noise does not exist in the corresponding sample text, and an annotation value of 1 means that it does exist.

In this step, the feature values of the text-type features and of the portrait features are determined for each sample text, and each sample text is given its annotation (annotation "0" indicates that the specified type of noise does not exist in the text; annotation "1" indicates that it exists), which completes the construction of the training data. When the specified type of noise is speech return noise, the model trained on these data learns the association between the feature values of the text-type and portrait features and the presence of speech return noise.
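The row layout of Table 1 (eight text-type values, four portrait values, one annotation) can be assembled as follows. This is a minimal sketch: the feature computations of expressions (1) to (12) are outside this section, so the functions take already-computed values, and the names `build_row` and `build_training_data` are illustrative.

```python
from typing import List, Sequence, Tuple

N_TEXT = 8      # T1..T8, expressions (1)-(8)
N_PORTRAIT = 4  # S1..S4, expressions (9)-(12)


def build_row(text_values: Sequence[float], portrait_values: Sequence[float]) -> List[float]:
    """Concatenate one sample's feature values in Table 1 column order."""
    if len(text_values) != N_TEXT or len(portrait_values) != N_PORTRAIT:
        raise ValueError("expected 8 text-type and 4 portrait feature values")
    return [float(v) for v in text_values] + [float(v) for v in portrait_values]


def build_training_data(
    samples: Sequence[Tuple[Sequence[float], Sequence[float], int]],
) -> Tuple[List[List[float]], List[int]]:
    """Turn (text values, portrait values, annotation) triples into X and y."""
    X: List[List[float]] = []
    y: List[int] = []
    for text_values, portrait_values, label in samples:
        if label not in (0, 1):
            raise ValueError("annotation must be 0 (no noise) or 1 (noise)")
        X.append(build_row(text_values, portrait_values))
        y.append(label)
    return X, y
```

For the first classification model, which uses only text-type features, each row can simply be cut to its first `N_TEXT` columns (`row[:N_TEXT]`).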

In the above embodiments, constructing the text-type and portrait features and judging speech return from their feature values amounts to identifying and judging noise data at the text level and the portrait level. In the related art, voiceprint recognition technology segments and removes noise data at the speech (audio) level. Both the text classification method and the text recognition method of the embodiments of the present disclosure judge noise at the text level and portrait level, and therefore avoid the low denoising accuracy and complicated process of voiceprint recognition technology.

The training process of the second classification model is described below.

Figure 3 is a schematic flowchart of model training and model use provided by an embodiment of the present disclosure. As shown in Figure 3, the model training process may include the following steps S301 to S303.

In step S301, as shown by "Input training data" in Figure 3, the training data for the text classification task are input.

In step S302, as shown in "Machine Learning Training" in Figure 3, a predetermined type of machine learning network is used for model training to obtain a trained model.

In step S303, as shown in "Obtaining the trained text classification model" in Figure 3, the trained model is used as the second classification model.

In some embodiments, the machine learning network can adopt any of the following models: a logistic regression (LR) model, the text classification model TextCNN, the pre-trained language model BERT, or another machine learning model.

The LR model is the simplest and most commonly used classification model in traditional machine learning. The LR algorithm is simple, efficient, easy to parallelize, and supports online learning, so it is very widely used in industry.

The TextCNN model builds a two-dimensional sentence matrix from word vectors, applies convolution with several different filters to obtain multiple feature maps, applies max pooling to each feature map, concatenates the pooled results, and finally performs classification through a fully connected layer with a softmax classifier. TextCNN has a simple network structure, and introducing pre-trained word vectors gives a better training effect; it also has few parameters, a small amount of computation, and fast training.

The BERT model uses Bidirectional Encoder Representations from Transformers; the pre-trained BERT representation can be fine-tuned with an additional output layer and is suitable for building state-of-the-art models for a wide range of tasks.

In actual application scenarios, an appropriate model can be selected according to training needs; the embodiments of this disclosure impose no specific limitation.
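Of the three candidate models, LR is the easiest to show without a deep-learning framework. The following is a minimal, dependency-free sketch of logistic regression trained by full-batch gradient descent on log loss, as one possible form of step S302; it is not the disclosure's implementation, and the learning rate, epoch count, and 0.5 decision threshold are illustrative assumptions. TextCNN and BERT would normally be built with a framework such as PyTorch or TensorFlow and are not sketched here.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return text category 1 (noise) or 0 (no noise) at threshold 0.5."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= 0.5 else 0
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would usually replace this hand-written loop; the sketch only shows what the training step computes.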

The following describes the process of using the second classification model.

Continuing to refer to Figure 3, using the trained model to perform text classification processing may include the following steps S304 to S306.

In step S304, as shown by "Feature Value Calculation" in Figure 3, the feature values of the text-type features and of the portrait features of the text to be classified are calculated.

In step S305, as shown in "Model Processing" in Figure 3, the trained second classification model is used to process the feature values of the text feature and the feature value of the portrait feature of the text to be classified.

In step S306, as shown by "Output Text Category" in Figure 3, if the output text category is "1", it is determined that speech return exists in the text to be recognized; if the output text category is "0", it is determined that no speech return exists in the text to be recognized.

Steps S301 to S306 above describe the training and use of the second classification model. The training process of the first classification model is similar; the difference is that its training data consist only of the feature values of the text-type features of the sample texts. The sample texts used to train the first classification model may be the same as, or different from, those used to train the second classification model. For other details of training the first classification model, refer to the corresponding description of training the second classification model; they are not repeated here. In the model recognition stage, the feature values of the text-type features of the text to be classified are calculated and processed by the trained first classification model to obtain the corresponding text category, which is used to determine whether speech return noise exists in the text to be classified.

In the embodiments of this disclosure, the text-type and portrait features are constructed from information including, but not limited to, the position and completeness of the text, whether sensitive words appear in the customer's call text, how those compare with the agent's sensitive words, the agent's length of service and level, and the number of times the agent was penalized for speech that did not comply with the speech rules. For the speech return judgment, the constructed features are first used to compute the text-type and portrait feature values of the sample texts used for model training, which form the training data; model training is then performed with the training data and the annotations of the sample texts (indicating at least whether the specified type of noise exists) to obtain a classification model. The output of the trained model can then be used to determine whether speech return exists in the text to be recognized.

In some embodiments, the specified type of noise includes speech return noise generated by sound return, where sound return means that, during a call, sound played by the speaker of the calling device is picked up again by its microphone array.

With the text classification method of the embodiments of the present disclosure, the feature values of the text-type features of the text to be classified can be generated based on the preset text-type features and the text to be classified, and text classification processing can be performed on these feature values to obtain a text classification result used to determine whether speech return noise exists in the text to be classified. This improves the accuracy of the classification result, and the simple, practical processing steps improve processing efficiency.

It can be understood that the method embodiments mentioned in this disclosure can be combined with one another to form combined embodiments as long as the underlying principles and logic are not violated; due to space limitations, such combinations are not described in detail in this disclosure. Those skilled in the art will understand that, in the methods of the above specific embodiments, the specific execution order of the steps should be determined by their functions and possible internal logic.

The processing flow of the text recognition method according to an embodiment of the present disclosure is described below with reference to FIG. 4. FIG. 4 is a flowchart of a text recognition method provided by an embodiment of the present disclosure. As shown in FIG. 4, the method may include the following steps S410 to S430.

In step S410, sensitive word recognition is performed on the acquired text to be recognized, and a sensitive word recognition result is obtained.

Before step S410, the text to be recognized may be obtained in advance from the dialogue text converted from the acquired call voice. The obtained text is output as the text to be recognized, which can be one item of a specified dialogue text; for example, it may be any utterance of the customer service agent in the dialogue text between the customer service agent and the customer.

In step S410, named entity recognition (NER) is performed on the text to be recognized, and whether the text contains sensitive words is determined from the entity recognition results; that is, the sensitive words in the text to be recognized are identified by NER technology. The NER model can use a long short-term memory (LSTM) network or a BERT network. If no sensitive word is recognized, a null value or a corresponding prompt message can be output to indicate that the text to be recognized contains no sensitive words; if a sensitive word is recognized, it can be added to a sensitive word list for subsequent processing.

The LSTM network is a recurrent neural network over time sequences that alleviates the long-term dependency problem of ordinary recurrent neural networks. Both LSTM and BERT networks can achieve good results in named entity recognition. In practical applications, the network used for named entity recognition can be selected as needed; the embodiments of the present disclosure impose no specific limitation.
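A trained LSTM or BERT NER model cannot be shown in a few lines, so the sketch below swaps in a plain lexicon (substring) lookup to illustrate only the input and output of step S410: a text goes in, and a list of recognized sensitive words (empty when none are found) comes out. This is explicitly not NER, and the example lexicon words used with it are hypothetical.

```python
from typing import Iterable, List


def recognize_sensitive_words(text: str, lexicon: Iterable[str]) -> List[str]:
    """Return lexicon entries found in text, ordered by first occurrence.

    This substring lookup is a stand-in for the NER model (LSTM or BERT)
    of step S410; it does not perform named entity recognition.
    """
    hits = []
    for word in set(lexicon):   # de-duplicate the lexicon
        if not word:
            continue            # ignore empty entries
        pos = text.find(word)
        if pos != -1:
            hits.append((pos, word))
    hits.sort()                 # order by position, then alphabetically
    return [word for _, word in hits]
```

An empty result corresponds to the null value described above; a non-empty result is what would be appended to the sensitive word list.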

In step S420, text classification processing is performed on the text to be recognized according to the feature value of the text-type feature of the text to be recognized, and a text classification result is generated. The text classification result is used to indicate whether the specified type of noise exists.

In the embodiment of the present disclosure, when performing text classification processing on the text to be recognized, the text to be recognized is used as the text to be classified, and the text classification method in the above embodiment is executed to obtain the corresponding text classification result. For the specific process and details of text classification, please refer to the specific steps of the text classification method described in conjunction with Figures 2-3 in the previous embodiments, and will not be described again in the embodiments of this disclosure.

In step S430, a text recognition result of the text to be recognized is generated based on the sensitive word recognition result and the text classification result.

With this text recognition method, whether a specific type of noise data exists in the dialogue text can be determined effectively from the feature values of the text-type features of the text to be recognized, and this determination can be used to assist sensitive word recognition, improving its accuracy. Because the proposed method works at the text level, it reduces the adverse effect of speech return noise and speech-to-text conversion errors on the recognition results and effectively reduces misjudgments of sensitive word recognition results when the specified type of noise is present, thereby improving the accuracy of sensitive word recognition.

With the text recognition method of the embodiments of the present disclosure, the sensitive word recognition result and the speech return judgment result can be combined when identifying sensitive words in the text to be recognized. This completes sensitive word recognition in speech quality inspection scenarios, improves its accuracy, and reduces misjudgments that the agent's speech does not comply with the predetermined speech rules.

In some embodiments, step S420 may specifically include: using the text classification method of any of the above embodiments of the present disclosure to perform text classification processing on the text to be recognized, to obtain a text classification result.

In some embodiments, the text to be recognized is one item of the target object's dialogue text obtained from the dialogue text, and step S420 may specifically include: step S61, obtaining the feature values of the portrait features of the text to be recognized, the portrait features characterizing individual features of the target object; and step S62, performing text classification processing on the text to be recognized based on the feature values of the text-type features and of the portrait features, to obtain the text classification result.

The text recognition method of the embodiments of the present disclosure calculates the feature values of the text-type features and of the portrait features of the text to be recognized to determine whether the specified type of noise (such as speech return noise) exists in it, identifies the sensitive words in it, and combines the noise determination result with the sensitive word recognition result, thereby improving the accuracy of sensitive word recognition in speech quality inspection scenarios.

In some embodiments, step S430 may specifically include: if the number of sensitive words identified in the text to be recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise does not exist, outputting the identified sensitive words as the text recognition result.

In this step, if sensitive words are identified and it is determined that the specified type of noise does not exist, the sensitive word recognition result is judged not to be caused by the specified type of noise, and the recognized sensitive words are therefore output as the recognition result. Combining the noise determination result with the sensitive word recognition result in this way improves the accuracy of sensitive word recognition in speech quality inspection scenarios.

In some embodiments, step S430 may further include the following. If the number of sensitive words recognized in the text to be recognized is zero, it is determined that the text to be recognized contains no sensitive words, and first prompt information is output; the first prompt information is empty to indicate that there are no sensitive words. If the number of sensitive words identified in the text to be recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise exists, it is determined whether the identified sensitive words are caused by the specified type of noise: if they are, second prompt information is output, indicating that the sensitive words are caused by the specified type of noise; if they are not, the identified sensitive words are output as the text recognition result.

In this step, if no sensitive word is identified, it can be directly determined that there are no sensitive words. If sensitive words are identified and the specified type of noise is determined to exist, whether the identified sensitive words are caused by that noise is determined, and corresponding prompt information is output to indicate the processing result. The noise determination result and the sensitive word recognition result are thus combined to improve the accuracy of sensitive word recognition in speech quality inspection scenarios.
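The branching of step S430 can be written as a small pure function. This sketch borrows the example prompt symbols "[]" and "{}" from the FIG. 5 embodiment. The criterion for deciding whether the identified words are caused by the noise is not specified in this section, so it is modeled as an optional, hypothetical callback `caused_by_noise`; when it is omitted, every word is attributed to the noise, which matches the FIG. 5 flow.

```python
from typing import Callable, List, Optional, Sequence, Union

FIRST_PROMPT = "[]"   # example first prompt symbol: no sensitive words
SECOND_PROMPT = "{}"  # example second prompt symbol: words attributed to noise

Result = Union[str, List[str]]


def text_recognition_result(
    sensitive_words: Sequence[str],
    noise_present: bool,
    caused_by_noise: Optional[Callable[[Sequence[str]], bool]] = None,
) -> Result:
    """Combine the sensitive-word result and the classification result (step S430)."""
    if len(sensitive_words) == 0:
        return FIRST_PROMPT                # no sensitive words in the text
    if not noise_present:
        return list(sensitive_words)       # words treated as real conversation content
    # Noise is present: decide whether the words were caused by it.
    # HYPOTHETICAL default: attribute the words to the noise (FIG. 5 behaviour).
    if caused_by_noise is None or caused_by_noise(sensitive_words):
        return SECOND_PROMPT
    return list(sensitive_words)
```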

FIG. 5 shows a flowchart of a text recognition method according to an exemplary embodiment of the present disclosure. As shown in Figure 5, the text recognition method may include the following steps S501 to S509.

In step S501, text to be recognized is input.

In some embodiments, the text to be recognized is any dialogue text of the customer service agent among the dialogue texts between the customer service agent and the customer.

In step S502, determine whether the text to be recognized contains sensitive words.

In this step, sensitive words in the text to be recognized can be identified through NER technology. If the sensitive word is not recognized, step S503 is executed. If the sensitive word is recognized, step S504 is executed.

In step S503, the first prompt information is output.

The first prompt information may be, for example, a first prompt symbol, indicating that the text to be recognized does not contain sensitive words. The first prompt symbol may be "[]", for example.

In some embodiments, after step S503, the method may return to step S501 to recognize the next text to be recognized.

In step S504, the recognized sensitive words are obtained.

In this step, the acquired sensitive words can be added to the sensitive word list.

In step S505, text classification processing is performed to determine whether there is return noise.

Text classification processing determines whether speech return noise exists in the text to be recognized, that is, whether the text is affected by noise interference of the "speech return" type. If speech return exists in the text to be recognized, step S506 is executed; if not, step S508 is executed.

In step S506, it is determined that speech return exists, and step S507 is executed.

In step S507, second prompt information is output.

The second prompt information may be, for example, a second prompt symbol, which indicates that the sensitive words in the text to be recognized are caused by speech return, so the sensitive words are not output. The second prompt symbol may be a symbol different from the first prompt symbol; for example, the second prompt symbol may be "{}".

In some embodiments, after step S507, the method may return to step S501 to recognize the next text to be recognized.

In step S508, it is determined that no speech return exists, and step S509 is executed.

In step S509, the recognized sensitive words are output.

In this step, if there is a sensitive word in the text to be recognized and it is determined that there is no speech return, the sensitive word in the text to be recognized is output.

In some embodiments, the dialogue text includes the dialogue text of the target object and the dialogue text of the dialogue object conversing with the target object, and the text to be recognized is one item of the target object's dialogue text.

The text recognition method further includes: obtaining a new text to be recognized from the dialogue text converted from the acquired call voice information and generating a new text recognition result, until the number of acquisitions equals the number of text items in the target object's dialogue text, thereby obtaining text recognition results for the target object's dialogue text based on the presence or absence of noise.

In this embodiment, each text in the target object's dialogue text can in turn be used as the text to be recognized for the above text recognition processing, until the last text has been processed. Text recognition results based on the presence or absence of noise are thus obtained for all texts in the target object's dialogue text, completing sensitive word recognition for that dialogue text in the speech quality inspection scenario. This strategy improves the accuracy of sensitive word recognition.
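The per-utterance loop can be sketched by running the FIG. 5 flow (steps S501 to S509) once for each item of the target object's dialogue text. In this sketch, the `(speaker, text)` pair representation of the dialogue and the speaker label `"agent"` are hypothetical, and the recognizer and speech return classifier are passed in as callables so that any implementation can be plugged in.

```python
from typing import Callable, List, Sequence, Tuple, Union

Result = Union[str, List[str]]
Turn = Tuple[str, str]  # HYPOTHETICAL representation: (speaker, text)


def recognize_dialogue(
    turns: Sequence[Turn],
    recognize: Callable[[str], List[str]],
    has_speech_return: Callable[[str], bool],
    target: str = "agent",
) -> List[Tuple[str, Result]]:
    """Run the FIG. 5 flow on every utterance of the target speaker."""
    results: List[Tuple[str, Result]] = []
    for speaker, text in turns:
        if speaker != target:
            continue                        # only the target object's texts
        words = recognize(text)             # S502 / S504
        if not words:
            results.append((text, "[]"))    # S503: first prompt
        elif has_speech_return(text):       # S505 -> S506
            results.append((text, "{}"))    # S507: second prompt
        else:
            results.append((text, words))   # S508 -> S509
    return results
```

The loop ends after the last target utterance, so the number of results equals the number of text items in the target object's dialogue text.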

In some embodiments, the conversation text of the target object is the conversation text of the customer service agent, and the conversation text of the conversation object is the conversation text of the customer or user.

In this embodiment, the text recognition method completes sensitive word recognition in the speech quality inspection scenario. This strategy improves the accuracy of sensitive word recognition and reduces misjudgments of whether an agent's speech fails to comply with the predetermined speech rules.

According to the text recognition method of the embodiments of the present disclosure, taking speech return noise as an example of the specified type of noise, text-type features and portrait features are constructed in advance, and the feature values of the text to be recognized for the constructed features are obtained. Whether speech return noise is present is then determined from these feature values. Based on both the speech return noise determination and the sensitive word recognition result, it is determined whether sensitive words are present in the dialogue text of the target object, which improves the accuracy of sensitive word recognition in the speech quality inspection scenario.

In addition, the present disclosure further provides a text classification device, a text recognition device, an electronic device, and a computer-readable storage medium. The text classification device can be used to implement any text classification method provided by the present disclosure, the text recognition device can be used to implement any text recognition method provided by the present disclosure, and the electronic device and the computer-readable storage medium can be used to implement any text classification method or text recognition method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section; details are not repeated here.

Figure 6 is a block diagram of a text classification device provided by an embodiment of the present disclosure.

Referring to Figure 6, an embodiment of the present disclosure provides a text classification device. The text classification device 600 includes the following modules.

The obtaining module 610 is configured to obtain the text to be classified.

The feature value generation module 620 is configured to generate feature values of the text-type features of the text to be classified based on the preset text-type features and the text to be classified.

The classification determination module 630 is configured to perform text classification processing on the text to be classified according to the feature values of the text-type features to obtain a text classification result, the text classification result being used to indicate whether the specified type of noise is present.

In some embodiments, the preset text-type features include at least one text-type feature. The feature value generation module 620 may specifically include: a rule determination unit, configured to determine a value rule for each text-type feature based on the at least one preset text-type feature; and a value generation unit, configured to generate, based on the value rule of each text-type feature, the feature value corresponding to each text-type feature for the text to be classified.
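A minimal sketch of the rule determination unit and value generation unit, assuming each value rule can be expressed as a Python callable that takes the text to be classified and a context dictionary. The names `determine_value_rules`, `generate_feature_values`, and the two toy features are hypothetical and only illustrate the lookup-then-apply structure.

```python
from typing import Callable, Dict, List

# A value rule maps (text to be classified, context) to a numeric feature value.
ValueRule = Callable[[str, dict], float]


def determine_value_rules(feature_names: List[str],
                          rule_table: Dict[str, ValueRule]) -> Dict[str, ValueRule]:
    """Rule determination unit: look up the value rule of each preset feature."""
    missing = [name for name in feature_names if name not in rule_table]
    if missing:
        raise KeyError(f"no value rule registered for: {missing}")
    return {name: rule_table[name] for name in feature_names}


def generate_feature_values(text: str, context: dict,
                            rules: Dict[str, ValueRule]) -> Dict[str, float]:
    """Value generation unit: apply each feature's value rule to the text."""
    return {name: rule(text, context) for name, rule in rules.items()}


# Hypothetical rule table with two toy features.
RULE_TABLE: Dict[str, ValueRule] = {
    "text_length": lambda text, ctx: float(len(text)),
    "position": lambda text, ctx: float(ctx["index"]),
}

rules = determine_value_rules(["text_length", "position"], RULE_TABLE)
print(generate_feature_values("hello", {"index": 2}, rules))
# prints {'text_length': 5.0, 'position': 2.0}
```

Keeping rules in a table means new text-type features can be added without changing the generation step.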

In some embodiments, the text to be classified is a text selected from pre-obtained dialogue texts. The text-type features include at least one of the following: sensitive word distribution features, predetermined features of the text itself, and predetermined features related to the dialogue text. The sensitive word distribution features characterize the distribution of sensitive words in the dialogue text; the predetermined features of the text itself characterize predetermined properties of the text to be classified; and the predetermined features related to the dialogue text characterize predetermined properties of the text to be classified that relate to the dialogue text.

In some embodiments, the dialogue text includes the dialogue text of the target object and the dialogue text of the dialogue object, both generated during a call between the target object and the dialogue object conversing with it, and the text to be classified is one item of the dialogue text of the target object.

In some embodiments, the at least one text-type feature includes the sensitive word distribution features. The rule determination unit is specifically configured to determine the value rule of at least one of the following text-type features included in the sensitive word distribution features: a first text-type feature, which characterizes whether, within the dialogue text of the target object, sensitive words exist only in the text to be classified; a second text-type feature, which characterizes whether the sensitive words in the text to be classified appear in the dialogue text of the dialogue object; a third text-type feature, which characterizes whether sensitive words exist in the dialogue text of the dialogue object; and a fourth text-type feature, which characterizes whether sensitive words exist in a predetermined dialogue text and, if so, whether they are consistent with the sensitive words in the text to be classified, where the predetermined dialogue text is one item of the dialogue text of the dialogue object and is adjacent to the text to be classified.

The value generation unit is specifically configured to generate, based on at least one of the value rules of the first, second, third, and fourth text-type features, the feature value of the text to be classified corresponding to at least one of the first, second, third, and fourth text-type features.
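The four sensitive word distribution features could be computed roughly as below. This sketch rests on several assumptions not fixed by the disclosure: the dialogue is a list of `(role, text)` pairs, sensitive words are found by plain substring matching against a lexicon, the adjacent predetermined dialogue text is taken to be the nearest preceding text of the dialogue object, and the three-valued encoding of the fourth feature is invented for illustration.

```python
from typing import List, Optional, Set, Tuple

Utterance = Tuple[str, str]  # (speaker role, text); an assumed layout


def find_sensitive(text: str, lexicon: Set[str]) -> Set[str]:
    """Naive substring matching; a production system may use a trie or a model."""
    return {w for w in lexicon if w in text}


def sensitive_distribution_features(dialogue: List[Utterance], idx: int,
                                    lexicon: Set[str]) -> dict:
    target_role = dialogue[idx][0]
    words_here = find_sensitive(dialogue[idx][1], lexicon)
    own = [i for i, (role, _) in enumerate(dialogue) if role == target_role]
    other = [i for i, (role, _) in enumerate(dialogue) if role != target_role]

    # f1: within the target object's texts, do sensitive words occur only in this one?
    others_hit = any(find_sensitive(dialogue[i][1], lexicon) for i in own if i != idx)
    f1 = int(bool(words_here) and not others_hit)

    other_words: Set[str] = set()
    for i in other:
        other_words |= find_sensitive(dialogue[i][1], lexicon)
    # f2: do this text's sensitive words appear in the dialogue object's texts?
    f2 = int(bool(words_here & other_words))
    # f3: does the dialogue object's text contain any sensitive word at all?
    f3 = int(bool(other_words))

    # f4: adjacent dialogue-object text, assumed to be the nearest preceding one.
    prev: Optional[int] = max((i for i in other if i < idx), default=None)
    prev_words = find_sensitive(dialogue[prev][1], lexicon) if prev is not None else set()
    # Hypothetical encoding: 0 = none, 1 = present but different, 2 = present and identical.
    f4 = 0 if not prev_words else (2 if prev_words == words_here else 1)
    return {"f1": f1, "f2": f2, "f3": f3, "f4": f4}
```

For example, if the customer says "is this a scam" and the agent replies "no, this is not a scam", the agent text scores 1 on f1, f2, and f3 and 2 on f4, a pattern consistent with the agent merely repeating the customer's word.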

In some embodiments, the at least one text-type feature includes the predetermined features of the text itself. The rule determination unit is specifically configured to determine the value rule of at least one of the following text-type features included in the predetermined features of the text itself: a fifth text-type feature, which characterizes the sentence integrity information of the text to be classified; and a sixth text-type feature, which characterizes the total number of times a specific word appears at a specified position in the dialogue text of the target object.

The value generation unit is specifically configured to generate, based on at least one of the value rules of the fifth and sixth text-type features, the feature value of the text to be classified corresponding to at least one of the fifth and sixth text-type features.
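One hypothetical way to give the fifth and sixth features concrete value rules is sketched below. The disclosure does not define sentence integrity or the specific word and position; here integrity is approximated by sentence-final punctuation, and the sixth feature counts the target object's texts that start or end with a given word. Both rules are assumptions.

```python
from typing import List

# Assumed sentence-final punctuation (Chinese and ASCII forms).
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


def sentence_integrity(text: str) -> int:
    """f5 (hypothetical rule): 1 if the text ends like a complete sentence, else 0."""
    return int(text.rstrip().endswith(SENTENCE_END))


def word_at_position_count(target_texts: List[str], word: str,
                           position: str = "start") -> int:
    """f6 (hypothetical rule): number of the target object's texts that have
    `word` at the specified position ("start" or "end")."""
    if position == "start":
        return sum(t.strip().startswith(word) for t in target_texts)
    if position == "end":
        return sum(t.strip().endswith(word) for t in target_texts)
    raise ValueError("position must be 'start' or 'end'")
```

A truncated fragment such as "so the refund" would score 0 on f5, which may hint that the text was cut off by channel crosstalk rather than spoken as a full sentence.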

In some embodiments, the at least one text-type feature includes the predetermined features related to the dialogue text. The rule determination unit is specifically configured to determine the value rule of at least one of the following text-type features included in the predetermined features related to the dialogue text: a seventh text-type feature, which characterizes the number of text items in the dialogue text to which the text to be classified belongs; and an eighth text-type feature, which characterizes the position at which the text to be classified appears in the dialogue text.

The value generation unit is specifically configured to generate, based on at least one of the value rules of the seventh and eighth text-type features, the feature value of the text to be classified corresponding to at least one of the seventh and eighth text-type features.
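The seventh and eighth features reduce to simple counting. The sketch assumes the dialogue text to which the text belongs is the whole list of `(role, text)` pairs and encodes position as a relative value in (0, 1]; both choices are illustrative assumptions.

```python
from typing import List, Tuple


def dialogue_related_features(dialogue: List[Tuple[str, str]], idx: int) -> dict:
    """Hypothetical value rules for the seventh and eighth text-type features."""
    n = len(dialogue)
    if not 0 <= idx < n:
        raise IndexError("idx is outside the dialogue")
    return {
        "f7": n,               # number of text items in the dialogue
        "f8": (idx + 1) / n,   # relative position of the text, in (0, 1]
    }
```

A relative position keeps the eighth feature comparable across calls of different lengths; an absolute index would be an equally valid choice.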

In some embodiments, the text to be classified belongs to the dialogue text of the target object. The classification determination module 630 is specifically configured to: obtain, based on preset portrait features, the feature values of the portrait features of the target object corresponding to the text to be classified, where the portrait features characterize individual characteristics of the target object; and perform text classification processing on the text to be classified based on the feature values of the text-type features and the feature values of the portrait features to obtain the text classification result.

In some embodiments, the preset portrait features include at least one portrait feature. When obtaining the feature values of the portrait features based on the preset portrait features, the classification determination module 630 is specifically configured to: determine a value rule for each portrait feature based on the at least one preset portrait feature; and obtain, based on the value rule of each portrait feature, the feature value of each portrait feature of the target object corresponding to the text to be classified.

In some embodiments, the target object is a customer service agent, and the individual characteristics characterize at least one of the following information items: the agent's level; the agent's length of service; the number of times the agent's speech failed to comply with the predetermined speech rules within a predetermined statistical period; and a history of whether the agent has been penalized for speech that failed to comply with the predetermined speech rules because the text to be classified contained sensitive words.
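The portrait features listed above map naturally onto fields of an agent record. The `AgentProfile` layout, the 30-day statistical period, and measuring length of service in days are assumptions made for this sketch, not details from the disclosure.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class AgentProfile:  # hypothetical record layout
    level: int                                                  # agent level
    hire_date: date                                             # start of service
    violation_dates: List[date] = field(default_factory=list)   # non-compliant speech events
    penalized_for_sensitive_words: bool = False                 # penalty history flag


def portrait_features(agent: AgentProfile, today: date, window_days: int = 30) -> dict:
    """Hypothetical value rules turning an agent record into portrait feature values."""
    start = today - timedelta(days=window_days)
    return {
        "level": agent.level,
        "service_days": (today - agent.hire_date).days,
        "violations_in_window": sum(start <= d <= today for d in agent.violation_dates),
        "penalty_history": int(agent.penalized_for_sensitive_words),
    }
```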

In some embodiments, the classification determination module 630 is specifically configured to: process the feature values of the text-type features with a first classification model to obtain a first text category of the text to be classified, the first classification model being a model pre-trained on sample texts; and generate the text classification result according to a predetermined correspondence between values of the first text category and the presence or absence of the specified type of noise.

In some embodiments, when performing text classification processing on the text to be classified according to the feature values of the text-type features and the feature values of the portrait features to obtain the text classification result, the classification determination module 630 is specifically configured to: process the feature values of the text-type features and of the portrait features with a second classification model to obtain a second text category of the text to be classified, the second classification model being a model pre-trained on sample texts; and generate the text classification result according to a predetermined correspondence between values of the second text category and the presence or absence of the specified type of noise.
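The two classification embodiments differ only in which feature values are fed to the model. The sketch below uses a linear scorer with hand-set weights as a hypothetical stand-in for the pre-trained first or second classification model, and assumes a two-valued category where 1 corresponds to the specified type of noise being present; in practice the weights and the correspondence would come from training on labelled sample texts.

```python
from typing import Dict, Optional


class LinearClassifier:
    """Hypothetical stand-in for a pre-trained classification model."""

    def __init__(self, weights: Dict[str, float], bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, features: Dict[str, float]) -> int:
        # Linear score; z >= 0 is equivalent to sigmoid(z) >= 0.5 in logistic regression.
        z = self.bias + sum(w * features.get(name, 0.0) for name, w in self.weights.items())
        return int(z >= 0.0)


# Assumed predetermined correspondence: category value -> is the specified noise present?
CATEGORY_TO_NOISE = {0: False, 1: True}


def classify(text_feats: Dict[str, float], model: LinearClassifier,
             portrait_feats: Optional[Dict[str, float]] = None) -> dict:
    # First model: text-type features only; second model: text-type plus portrait features.
    features = dict(text_feats)
    if portrait_feats:
        features.update(portrait_feats)  # assumes feature names do not collide
    category = model.predict(features)
    return {"category": category, "noise_present": CATEGORY_TO_NOISE[category]}
```

Keeping the category-to-noise table separate from the model mirrors the description, where the model outputs a text category and a predetermined correspondence turns it into the classification result.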

With the text classification device according to the embodiments of the present disclosure, feature values of the text-type features of the text to be classified are generated based on the preset text-type features and the text to be classified, and text classification processing is performed on these feature values to obtain a text classification result indicating whether the specified type of noise is present in the text to be classified. Because the presence of such noise is determined from text-type features, the classification result can be used during text recognition to reduce interference from noisy data, which helps produce objective text recognition results.

Figure 7 is a block diagram of a text recognition device provided by an embodiment of the present disclosure.

Referring to FIG. 7, an embodiment of the present disclosure provides a text recognition device. The text recognition device 700 includes the following modules.

The word recognition module 710 is configured to perform sensitive word recognition on the acquired text to be recognized to obtain a sensitive word recognition result.

The classification module 720 is configured to perform text classification processing on the text to be recognized based on the feature values of its text-type features to generate a text classification result, the text classification result being used to indicate whether the specified type of noise is present.

The result generation module 730 is configured to generate a text recognition result of the text to be recognized based on the sensitive word recognition result and the text classification result.

In some embodiments, the classification module 720 is specifically configured to perform text classification processing on the text to be recognized according to the text classification method in any of the above embodiments of the present disclosure, and obtain a text classification result.

In some embodiments, the text to be recognized is one item of the dialogue text of the target object obtained from the dialogue text. The classification module 720 is specifically configured to: obtain the feature values of the portrait features of the text to be recognized, where the portrait features characterize individual characteristics of the target object; and perform text classification processing on the text to be recognized based on the feature values of the text-type features and the feature values of the portrait features to obtain the text classification result.

In some embodiments, the result generation module 730 is specifically configured to: if the number of sensitive words recognized in the text to be recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise is absent, output the recognized sensitive words as the text recognition result.

In some embodiments, the result generation module 730 is further configured to: if the number of sensitive words recognized in the text to be recognized is zero, determine that the text to be recognized contains no sensitive words and output first prompt information, which indicates that the set of sensitive words is empty; and if the number of sensitive words recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise is present, determine whether the recognized sensitive words are caused by the specified type of noise, output second prompt information indicating that the sensitive words are caused by the specified type of noise when they are, and output the recognized sensitive words as the text recognition result when they are not.
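The decision logic of the result generation module 730 can be sketched as follows. The first prompt symbol `{}` follows the example given for the first prompt symbol in the description; the second prompt symbol `[echo]` is a hypothetical placeholder (it only needs to differ from the first), and `caused_by_noise` is a hypothetical callback because the disclosure does not specify here how attribution to the noise is decided.

```python
from typing import Callable, List, Union

FIRST_PROMPT = "{}"       # example first prompt symbol from the description
SECOND_PROMPT = "[echo]"  # hypothetical second prompt symbol; must differ from "{}"


def generate_recognition_result(
        sensitive_words: List[str],
        noise_present: bool,
        caused_by_noise: Callable[[List[str]], bool]) -> Union[str, List[str]]:
    if not sensitive_words:
        return FIRST_PROMPT       # no sensitive words were recognized
    if not noise_present:
        return sensitive_words    # no specified noise: output the hits as-is
    if caused_by_noise(sensitive_words):
        return SECOND_PROMPT      # hits attributed to the specified type of noise
    return sensitive_words        # noise present, but these hits are genuine
```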

In some embodiments, the dialogue text includes the dialogue text of the target object and the dialogue text of the dialogue object, both generated during a call between the target object and the dialogue object conversing with it, and the text to be recognized is one item of the dialogue text of the target object. The text recognition device further includes an acquisition module configured to acquire a new text to be recognized from the dialogue text converted from the acquired call voice; the result generation module 730 is further configured to generate a text recognition result for each new text to be recognized until the number of acquisitions equals the number of text items in the dialogue text of the target object, thereby obtaining the text recognition result of the dialogue text.

In some embodiments, the conversation text of the target object is the conversation text of the customer service agent.

With this text recognition device, the presence or absence of the specified type of noise data in the dialogue text can be effectively determined from the feature values of the text-type features of the text to be recognized, and this determination can assist sensitive word recognition and improve its accuracy. The text recognition method proposed in the present disclosure operates at the text level: it reduces the adverse effects of speech return noise and transcription errors on text recognition results and effectively reduces misjudgments of sensitive word recognition results when the specified type of noise is present, thereby improving the accuracy of sensitive word recognition.

FIG. 8 is a block diagram of an electronic device provided by an embodiment of the present disclosure.

Referring to Figure 8, an embodiment of the present disclosure provides an electronic device, which includes: at least one processor 801; at least one memory 802; and one or more I/O interfaces 803 connected between the processor 801 and the memory 802. The memory 802 stores one or more computer programs executable by the at least one processor 801, and the one or more computer programs are executed by the at least one processor 801 to enable the at least one processor 801 to perform any of the above text classification methods or text recognition methods.

Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored. When executed by a processor/processing core, the computer program implements any of the above text classification methods or text recognition methods. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.

An embodiment of the present disclosure further provides a computer program which, when run in a processor of an electronic device, causes the processor of the electronic device to perform any of the above text classification methods or text recognition methods.

Those of ordinary skill in the art can understand that all or some of the steps of the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).

As is known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, it is known to those of ordinary skill in the art that communication media typically embody computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.

Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage on a computer-readable storage medium in the respective computing/processing device.

Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

The computer program described here may be implemented specifically through hardware, software, or a combination thereof. In an optional embodiment, the computer program is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and so on.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted only in a generic and illustrative sense and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims

1. A text classification method, comprising:

obtaining a text to be classified;

generating, based on preset text-type features and the text to be classified, feature values of the text-type features of the text to be classified; and

performing, according to the feature values of the text-type features, text classification processing on the text to be classified to obtain a text classification result, the text classification result being used to indicate whether a specified type of noise is present.

2. The text classification method according to claim 1, wherein the preset text-type features include at least one text-type feature, and

wherein generating the feature values of the text-type features of the text to be classified based on the preset text-type features and the text to be classified includes:

determining, based on the at least one text-type feature, a value rule for each text-type feature in the at least one text-type feature; and

generating, based on the value rule of each text-type feature, a feature value of each text-type feature of the text to be classified.

3. The text classification method according to claim 2, wherein the text to be classified is a text selected from pre-obtained dialogue texts, and

wherein the at least one text-type feature includes at least one of the following text-type features:

sensitive word distribution features, which characterize the distribution of sensitive words in the dialogue text;

predetermined features of the text itself, which characterize predetermined properties of the text to be classified; and

predetermined features related to the dialogue text, which characterize predetermined properties of the text to be classified that relate to the dialogue text.

4. The text classification method according to claim 3, wherein the dialogue text includes the dialogue text of a target object and the dialogue text of a dialogue object, both generated during a call between the target object and the dialogue object with which the target object talks, and the text to be classified is one item of the dialogue text of the target object.

5. The text classification method according to claim 4, wherein the at least one text-type feature includes the sensitive word distribution features,

wherein determining the value rule of each text-type feature in the at least one text-type feature based on the at least one text-type feature includes: determining the value rule of at least one of a first text-type feature, a second text-type feature, a third text-type feature, and a fourth text-type feature included in the sensitive word distribution features, wherein the first text-type feature characterizes whether, within the dialogue text of the target object, sensitive words exist only in the text to be classified; the second text-type feature characterizes whether the sensitive words in the text to be classified appear in the dialogue text of the dialogue object; the third text-type feature characterizes whether sensitive words exist in the dialogue text of the dialogue object; and the fourth text-type feature characterizes whether sensitive words exist in a predetermined dialogue text and whether the sensitive words present in the predetermined dialogue text are consistent with the sensitive words in the text to be classified, the predetermined dialogue text being one item of the dialogue text of the dialogue object and being adjacent to the text to be classified, and

wherein generating the feature value of each text-type feature of the text to be classified based on the value rule of each text-type feature includes: generating, based on at least one of the value rules of the first, second, third, and fourth text-type features, a feature value of the text to be classified corresponding to at least one of the first, second, third, and fourth text-type features.
The text classification method according to claim 4, wherein the at least one text-class feature includes a predetermined feature of the text itself,

Wherein, determining a value rule for each text-class feature in the at least one text-class feature based on the at least one text-class feature includes: determining the value rule of at least one of a fifth text-class feature and a sixth text-class feature included in the predetermined feature of the text itself, wherein the fifth text-class feature is used to represent sentence integrity information of the text to be classified, and the sixth text-class feature is used to represent the total number of times a specific term occurs at a specified position in the dialogue text of the target object; and

Wherein, generating the feature value of each text-class feature of the text to be classified based on the value rule of each text-class feature includes: generating, based on at least one of the value rule of the fifth text-class feature and the value rule of the sixth text-class feature, the feature value of the text to be classified corresponding to at least one of the fifth and sixth text-class features.
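The claim does not define how sentence integrity is measured or which term and position are counted. A minimal sketch follows, assuming a hypothetical rule that a text is "complete" when it ends with terminal punctuation, and that the sixth feature counts target-object turns that start (or end) with a given term.

```python
# Hypothetical set of sentence-final punctuation marks (ASCII and full-width)
TERMINAL_PUNCT = (".", "?", "!", "。", "？", "！")


def sentence_integrity(text: str) -> int:
    """Fifth feature (assumed rule): 1 if the text ends with terminal punctuation, else 0."""
    return int(text.rstrip().endswith(TERMINAL_PUNCT))


def term_position_count(target_turns: list[str], term: str, position: str = "start") -> int:
    """Sixth feature (assumed rule): number of target-object turns with `term` at `position`.

    `position` is "start" or "end"; trailing punctuation is ignored for "end".
    """
    if position not in ("start", "end"):
        raise ValueError("position must be 'start' or 'end'")
    count = 0
    for turn in target_turns:
        body = turn.strip()
        if position == "start" and body.startswith(term):
            count += 1
        elif position == "end" and body.rstrip("".join(TERMINAL_PUNCT)).endswith(term):
            count += 1
    return count
```

For example, `sentence_integrity("I will pay")` yields 0, flagging a fragment that might come from a cut-off or overheard utterance.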
The text classification method according to claim 4, wherein the at least one text-class feature includes the predetermined feature related to the dialogue text,

Wherein, determining a value rule for each text-class feature in the at least one text-class feature based on at least one preset text-class feature includes: determining the value rule of at least one of a seventh text-class feature and an eighth text-class feature included in the predetermined feature related to the dialogue text, wherein the seventh text-class feature is used to represent the total number of text items contained in the dialogue text to which the text to be classified belongs, and the eighth text-class feature is used to represent the position at which the text to be classified appears in the dialogue text; and

Wherein, generating the feature value of each text-class feature of the text to be classified based on the value rule of each text-class feature includes: generating, based on at least one of the value rule of the seventh text-class feature and the value rule of the eighth text-class feature, the feature value of the text to be classified corresponding to at least one of the seventh and eighth text-class features.
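A minimal sketch of the two dialogue-related features, assuming the dialogue is a list of text items and that position is encoded as a relative value in [0, 1]. The relative encoding is an illustrative choice; a raw index or begin/middle/end buckets would serve equally well.

```python
def dialogue_features(dialogue: list[str], idx: int) -> dict:
    """Seventh and eighth text-class features for dialogue[idx] (encodings are assumptions)."""
    n = len(dialogue)
    if not 0 <= idx < n:
        raise IndexError("idx must point at a text inside the dialogue")
    f7 = n  # seventh feature: total number of text items in the dialogue
    # Eighth feature (assumed encoding): relative position, 0.0 = first item, 1.0 = last item
    f8 = idx / (n - 1) if n > 1 else 0.0
    return {"f7": f7, "f8": f8}
```

With five items, the middle one (`idx=2`) gets `f8 == 0.5`.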
The text classification method according to claim 1, wherein the text to be classified is a dialogue text among the dialogue texts of the target object,

Wherein, the step of performing text classification processing on the text to be classified according to the feature value of the text feature to obtain a text classification result includes:

Obtaining, based on preset portrait features, the feature value of the text to be classified corresponding to the portrait features of the target object, the portrait features being used to characterize individual characteristics of the target object; and

Performing text classification processing on the text to be classified according to the feature values of the text-class features and the feature values of the portrait features, to obtain the text classification result.
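One common way to feed both kinds of feature values to a single classifier is to concatenate them into one fixed-order vector. The sketch below does this. The feature names `f1` to `f8` and the two portrait features are hypothetical placeholders, and treating missing values as 0 is an assumption.

```python
# Hypothetical, fixed feature order; the classifier must be trained with the same order
TEXT_FEATURES = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
PORTRAIT_FEATURES = ["age_bucket", "is_repeat_caller"]


def build_feature_vector(text_values: dict, portrait_values: dict) -> list[float]:
    """Concatenate text-class and portrait feature values; missing values default to 0."""
    text_part = [float(text_values.get(k, 0)) for k in TEXT_FEATURES]
    portrait_part = [float(portrait_values.get(k, 0)) for k in PORTRAIT_FEATURES]
    return text_part + portrait_part
```

Fixing the order in module-level lists keeps training-time and inference-time vectors aligned.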
The text classification method according to claim 8, wherein the preset portrait features include at least one portrait feature, and

Wherein, obtaining the feature value of the text to be classified corresponding to the portrait feature of the target object based on the preset portrait features includes:

Determining the value rule of each portrait feature based on the at least one portrait feature; and

Obtaining, based on the value rule of each portrait feature, the feature value of the text to be classified corresponding to each portrait feature of the target object.
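The claim leaves the portrait features and their value rules unspecified. A sketch with two entirely hypothetical rules shows the shape of the step: age is bucketed into three bands, and a repeat-caller flag is derived from a call count. The profile keys, thresholds and feature names are all illustrative assumptions.

```python
def portrait_values(profile: dict) -> dict:
    """Apply hypothetical value rules to a target object's profile record."""
    age = profile.get("age", 0)
    # Hypothetical rule 1: age band 0 (< 30), 1 (30-49), 2 (50 and over)
    age_bucket = 0 if age < 30 else (1 if age < 50 else 2)
    # Hypothetical rule 2: more than one call in the last 30 days marks a repeat caller
    is_repeat_caller = int(profile.get("calls_last_30d", 0) > 1)
    return {"age_bucket": age_bucket, "is_repeat_caller": is_repeat_caller}
```

Keys missing from the profile fall back to defaults, so incomplete records still yield a full set of portrait feature values.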
The text classification method according to claim 1, wherein performing text classification processing on the text to be classified according to the feature value of the text-class feature to obtain the text classification result includes:

Processing the feature values of the text-class features with a first classification model to obtain a first text category of the text to be classified, wherein the first classification model is a pre-trained model; and

Generating the text classification result according to a predetermined correspondence between the values of the first text category and whether the specified type of noise is present.
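The two steps (model output, then a category-to-noise lookup) can be sketched as follows. The claims do not name the model type, so `toy_model` is a stand-in: a linear score with made-up weights, thresholded into three categories. The correspondence table, in which categories 1 and 2 both mean "noise present", is likewise a hypothetical example.

```python
from typing import Callable, Sequence

# Hypothetical predetermined correspondence: first text category -> specified noise present?
CATEGORY_TO_NOISE = {0: False, 1: True, 2: True}


def classify(feature_values: Sequence[float], model: Callable[[Sequence[float]], int]) -> dict:
    """Run the pre-trained first classification model, then map its category to the result."""
    category = model(feature_values)
    return {"category": category, "noise_present": CATEGORY_TO_NOISE[category]}


def toy_model(x: Sequence[float]) -> int:
    """Stand-in for a pre-trained model; the weights and thresholds are made up."""
    weights = [2.0, -1.5, 0.5, 1.0]
    score = sum(w * v for w, v in zip(weights, x))
    if score < 1.0:
        return 0
    return 1 if score < 3.0 else 2
```

Passing the model as a callable keeps the lookup step independent of whichever trained classifier is plugged in.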
The text classification method according to claim 8, wherein performing text classification processing on the text to be classified according to the feature values of the text-class features and the feature values of the portrait features to obtain the text classification result includes:

Processing the feature values of the text-class features and the feature values of the portrait features with a second classification model to obtain a second text category of the text to be classified, wherein the second classification model is a pre-trained model; and

Generating the text classification result according to a predetermined correspondence between the values of the second text category and whether the specified type of noise is present.
A text recognition method including:

Performing sensitive word recognition on an acquired text to be recognized to obtain a sensitive word recognition result;

Performing text classification processing on the text to be recognized according to the feature value of the text-class feature of the text to be recognized to generate a text classification result, wherein the text classification result is used to indicate whether a specified type of noise is present; and

Generating a text recognition result of the text to be recognized according to the sensitive word recognition result and the text classification result.
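The claim leaves open how sensitive word recognition is performed. A minimal sketch, assuming a plain lexicon and exact matching, scans the text once with a regular-expression alternation. Longer entries are placed first, so an overlapping shorter entry does not pre-empt them at the same position.

```python
import re


def recognize_sensitive_words(text: str, lexicon: list[str]) -> list[tuple[str, int]]:
    """Return (word, start_offset) for each non-overlapping lexicon hit, left to right."""
    if not lexicon:
        return []  # an empty alternation would match the empty string everywhere
    # Longest entries first, so "sue them" wins over "sue" where both match
    alternation = "|".join(re.escape(w) for w in sorted(lexicon, key=len, reverse=True))
    pattern = re.compile(alternation)
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]
```

The number of hits returned here would serve as the sensitive word count that the result-generation step branches on.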
The text recognition method according to claim 12, wherein the text classification process is performed on the text to be recognized using the text classification method according to any one of claims 1 to 11 to obtain the text classification result.
The text recognition method according to claim 12 or 13, wherein generating the text recognition result of the text to be recognized according to the sensitive word recognition result and the text classification result includes:

If the number of sensitive words identified from the text to be recognized is zero, determining that there are no sensitive words in the text to be recognized and outputting first prompt information, the first prompt information being used to indicate that the sensitive word recognition result is empty;

If the number of sensitive words identified from the text to be recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise is not present, outputting the identified sensitive words as the text recognition result; and

If the number of sensitive words identified from the text to be recognized is greater than or equal to 1 and the text classification result indicates that the specified type of noise is present, determining that the identified sensitive words are caused by the specified type of noise and outputting second prompt information, the second prompt information being used to indicate that the sensitive words are caused by the specified type of noise.
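The three branches of this claim map directly onto a small decision function. In this sketch, the hypothetical `Outcome` enum stands in for the first prompt information, the plain sensitive-word output, and the second prompt information.

```python
from enum import Enum


class Outcome(Enum):
    NO_SENSITIVE_WORDS = "first prompt: sensitive word recognition result is empty"
    SENSITIVE_WORDS = "identified sensitive words output as the recognition result"
    CAUSED_BY_NOISE = "second prompt: sensitive words caused by the specified type of noise"


def generate_recognition_result(sensitive_words: list[str], noise_present: bool) -> tuple[Outcome, list[str]]:
    """Combine the sensitive word recognition result with the text classification result."""
    if len(sensitive_words) == 0:
        return Outcome.NO_SENSITIVE_WORDS, []  # branch 1: the classification result is not consulted
    if not noise_present:
        return Outcome.SENSITIVE_WORDS, list(sensitive_words)  # branch 2: genuine sensitive words
    return Outcome.CAUSED_BY_NOISE, []  # branch 3: hits attributed to the specified noise
```

Note that the classification result only matters once at least one sensitive word has been found.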
A text classification device including:

An acquisition module, configured to acquire the text to be classified;

A feature value generation module, configured to generate, based on preset text-class features and the text to be classified, feature values of the text-class features of the text to be classified; and

A classification determination module, configured to perform text classification processing on the text to be classified according to the feature values of the text-class features to obtain a text classification result, wherein the text classification result is used to indicate whether a specified type of noise is present.
A text recognition device including:

A word recognition module, configured to perform sensitive word recognition on the acquired text to be recognized to obtain a sensitive word recognition result;

A classification module, configured to perform text classification processing on the text to be recognized based on the feature values of the text-class features of the text to be recognized and to generate a text classification result, wherein the text classification result is used to indicate whether a specified type of noise is present; and

A result generation module, configured to generate a text recognition result of the text to be recognized based on the sensitive word recognition result and the text classification result.
An electronic device including:

at least one processor; and

a memory communicatively connected to said at least one processor,

Wherein, the memory stores one or more computer programs executable by the at least one processor, so that the at least one processor can perform the text classification method according to any one of claims 1 to 11.
An electronic device including:

at least one processor; and

a memory communicatively connected to said at least one processor,

Wherein, the memory stores one or more computer programs executable by the at least one processor, so that the at least one processor can perform the text recognition method according to any one of claims 12 to 14.
A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the text classification method according to any one of claims 1 to 11.
A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the text recognition method according to any one of claims 12 to 14.