CN117499528A - Method, device, equipment and storage medium for detecting session quality - Google Patents


Info

Publication number
CN117499528A
Authority
CN
China
Prior art keywords
text
session
quality
value
named entity
Prior art date
Legal status
Pending
Application number
CN202310565898.6A
Other languages
Chinese (zh)
Inventor
刘光华
赵国庆
蒋宁
郭江
Current Assignee
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202310565898.6A
Publication of CN117499528A
Legal status: Pending

Abstract

The application discloses a method, an apparatus, a device and a storage medium for detecting session quality, which are used to solve the problem of how to accurately detect the quality of a session. The method comprises the following steps: acquiring a recorded text and a session text of a session, where the recorded text is a text file storing the service information of the session entered by a recording party, and the session text is a text file generated based on the session content of the session; determining the value of the edit distance between the syllables of a named entity in the recorded text and the syllables of the named entity in the session text; and determining, according to the value of the edit distance, a detection result characterizing the quality of the session. With the scheme disclosed in the application, the detection result characterizing the quality of the session is determined based on the value of the edit distance between the named entities, so that the quality of the session can be determined accurately.

Description

Method, device, equipment and storage medium for detecting session quality
Technical Field
The present disclosure relates to the field of intelligent quality inspection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting session quality.
Background
Intelligent quality inspection is a technology for realizing automatic full-coverage quality inspection based on speech recognition, natural language understanding and big data processing technologies.
Intelligent quality inspection can be used to inspect whether risks exist in the voice data of sessions recorded in scenarios such as telemarketing, answering customer questions, collection reminders and follow-up visits, in the text data of online customer service instant sessions, and in the text data of enterprise official-account sessions. In addition, intelligent quality inspection can also be used to inspect the quality of a session (a voice session or a text session) between customer service and a customer, for example to detect whether the customer's voice/text is accurately understood by the customer service agent participating in the session.
At present, how to accurately detect the quality of a session is a problem to be solved.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device and a storage medium for detecting session quality, which are used to solve the problem of how to accurately detect the quality of a session.
In one aspect, an embodiment of the present application provides a method for detecting session quality, including:
acquiring a recording text and a conversation text of a conversation; the recorded text is a text file for storing the service information of the session recorded by the recording party; the session text is a text file generated based on session content of the session; determining a value of an edit distance between syllables of the named entity in the recorded text and syllables of the named entity in the conversational text; and determining a detection result representing the quality of the session according to the value of the editing distance.
In one aspect, an embodiment of the present application provides a device for detecting session quality, including:
the acquisition unit is used for acquiring the record text and the conversation text of the conversation; the recorded text is a text file for storing the service information of the session recorded by the recording party; the session text is a text file generated based on session content of the session; an edit distance determination unit configured to determine a value of an edit distance between syllables of a named entity in the recorded text and syllables of a named entity in the conversational text; and the session quality determining unit is used for determining a detection result representing the quality of the session according to the value of the editing distance.
In one aspect, embodiments of the present application provide a computing device comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is coupled to the memory and configured to execute the computer program stored in the memory, so as to perform the above method for detecting session quality.
In one aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a computer, is capable of implementing the above-mentioned method for detecting session quality.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
For the quality detection scenario of a session that includes a voice session, the quality requirement can be met as long as the same named entity appearing in both the recorded text and the conversation text of the session sounds similar in the two texts; whether the characters are completely identical is not a strict requirement in this scenario. With this consideration, in the embodiments of the present application, the detection result characterizing the quality of the session is determined according to the value of the edit distance between the syllables of a named entity in the recorded text and the syllables of the same named entity in the conversation text, so that the quality of the session can be determined accurately and the session quality detection requirement of this scenario can be met.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic diagram of entering work order information in a "telemarketing" scenario;
FIG. 2 is a schematic diagram of a voice conversation recorded in a "telemarketing" scenario to obtain a conversation text;
Fig. 3 is a flowchart of a specific implementation of a method for detecting session quality according to an embodiment of the present application;
FIG. 4 is a schematic diagram showing various selectable configuration items corresponding to different alignment strategies in an interface;
fig. 5 is a schematic diagram of an application process of a method for detecting session quality in an actual scenario according to an embodiment of the present application;
fig. 6 is a schematic diagram of a specific structure of a device for detecting session quality according to an embodiment of the present application;
fig. 7 is a schematic diagram of a specific structure of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the corresponding drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without undue burden shall fall within the scope of the present disclosure.
As one of ordinary skill in the art can appreciate, with the development of technology and the appearance of new scenes, the technical solutions provided in the embodiments of the present application are applicable to similar technical problems.
The terms "first", "second" and the like in the description, the claims and the above drawings of the present application are used for distinguishing between similar objects and not necessarily for describing a particular sequence or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely a way of distinguishing objects of the same nature when describing the embodiments of the application. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article or apparatus.
In order to solve the problem in the prior art of how to accurately detect the quality of a session, the embodiments of the present application provide a method for detecting session quality.
In some alternative embodiments, the session may include, but is not limited to, a voice session in scenarios such as telemarketing, answering customer questions, collection reminders and follow-up visits; an instant text session with an online customer service; a text session based on an enterprise official account; and so on.
Taking the "telemarketing" scenario as an example, as shown in fig. 1, a telemarketer may promote a service or product by establishing a voice call with a customer, and the established voice call may be referred to as a voice session. In the prior art, during the voice call with the customer or after the voice call has ended, the telemarketer takes the service information corresponding to the voice call as "work order information" according to the content of the call, enters the work order information into a text file called a "work order" using input devices such as a keyboard, a mouse or a microphone, and then stores the work order in a work order system that uniformly stores work orders. The service information may include, for example, information related to the service obtained by the telemarketer from the voice session, such as the customer's name, occupation, age, and the like.
In such a scenario, if the quality of the voice session needs to be detected, the inventors propose that voice data can be obtained by recording the voice session with a recording device. As shown in fig. 2, the voice session may be recorded to obtain voice data; based on the voice data, a corresponding conversation text can be generated by performing speech recognition on the voice data; further, by comparing whether the named entities in fields with the same name (such as the name field or the occupation field) in the conversation text and in the work order are consistent (for example, "Zhang San" in the name field or "teacher" in the occupation field are named entities), a detection result of the quality of the current voice session can be obtained. The quality detection result can, to a great extent, reflect whether the telemarketer fully communicated with the customer, asked about and recorded important information, and so on.
Furthermore, considering on the one hand that a session may take the form of a text session in addition to the form of a voice session, and on the other hand that the detection result of the quality has a certain requirement for accuracy, the inventors propose to determine the detection result characterizing the quality of the session according to the value of the edit distance between the syllables of a named entity in the recorded text of the session and the syllables of the named entity in the conversation text of the session.
Wherein the session may be any form of session including, but not limited to, a text session, a voice session, a video session, etc.; the recording text is a text file for storing the service information of the conversation recorded by the recording party, namely, a work order recorded by a telephone sales person in the example above; the conversation text is a text file generated based on the conversation content of the conversation, such as a conversation text generated by performing voice recognition on voice data in the above example, or a conversation text generated by directly storing the conversation content of a text conversation.
Based on the above consideration, a flowchart of a specific implementation of the method for detecting session quality according to the embodiment of the present application is shown in fig. 3.
The execution subject of the method may be any computing device capable of implementing the method, such as a server, a mobile phone, a personal computer, an intelligent wearable device or an intelligent robot, or a computing-device cluster formed by one or more of these computing devices; alternatively, the execution subject of the method may be software, such as a session quality detection tool.
The execution sequence of the different steps is not limited in the embodiment of the present application. When the method provided by the embodiment of the application is used, the execution sequence of different steps can be adjusted according to actual requirements.
The method flow shown in fig. 3 may include the following steps:
step 31: recording text and conversation text of the conversation are obtained.
The session may be any form of session including, but not limited to, a text session, a voice session, a video session, and so forth.
The record text is a text file storing the service information of the session entered by the recording party.
In one specific example, as shown in FIG. 2, the recording party may be a telemarketer, and the text file entered by the recording party corresponding to the session may be, for example, a work order. In this example, the service information of the session in the recorded text includes, but is not limited to: the customer's name, occupation, age, etc.
In another specific example, the logger may be, for example, an online customer service; the text file corresponding to the instant text conversation recorded by the recording party can be, for example, a work order recorded by an online customer service according to the communication content with a client in the instant text conversation process. In this example, the business information of the session in the text is recorded, such as including but not limited to: customer name, occupation, age, etc. In addition, the service information may include, but is not limited to: keywords in questions to be answered, such as property insurance, etc., presented by the customer.
The session text is a text file generated based on session content of a session.
In a specific example, as shown in fig. 2, the conversation text may be generated by performing speech recognition on voice data obtained by recording a voice session. For example, if the telemarketer says in the voice session that "insurance product 1 is offered at a 50% discount", then after speech recognition is performed on the corresponding voice data, the generated conversation text at least contains the text "insurance product 1 is offered at a 50% discount".
In an alternative embodiment, the speech data may be speech recognized, for example, using automatic speech recognition (Automatic Speech Recognition, ASR) techniques, to effect conversion of the speech data into conversational text.
In another specific example, the conversation text may be conversation text generated by storing conversation content during an instant text conversation as shown in FIG. 3. Taking the session content shown in fig. 3 as an example, the generated session text may include text of the session content between the online customer service and the client.
One of the differences between the recorded text and the conversation text is: the recorded text is produced by the recording party and can reflect the recording party's subjective understanding of the session content, whereas the conversation text is an objective record of the actual session content.
Based on the above characteristics of the recorded text and the conversation text, by performing the subsequent step 32, the difference between the recording party's subjective understanding of the session content and the objective description of the actual session content can be obtained. The larger the difference, the more likely it is that the recording party's subjective understanding of the session content deviates from the objective facts, and the quality of the present session is likely to be poor; conversely, the smaller the difference, the less likely the recording party's subjective understanding deviates from the objective facts, and the quality of the present session is likely to be higher.
Step 32: a value of an edit distance between syllables of the named entity in the recorded text and syllables of the named entity in the conversational text is determined.
A named entity is a person name, organization name, place name, or any other entity identified by a name. In a broader sense, named entities may also include numbers, dates, currencies, addresses, and so on.
When the conversation is based on Chinese (such as Mandarin or dialect), the syllables of the named entity may be the pinyin of the named entity. When the conversation is based on English, the syllables of the named entity may be English syllables of the named entity.
It will be appreciated by those skilled in the art that when the conversation is based on other languages than chinese, english, the syllables of the named entity may be syllables of the other languages.
It should be noted that a syllable is the smallest phonetic unit formed by combining vowel phonemes and consonant phonemes in a language; a single vowel phoneme may also form a syllable by itself. A Chinese syllable is a phonetic unit formed by spelling an initial and a final, and a single final can also form a syllable. For non-tonal languages such as English and Russian, pronunciation consists mainly of syllables. For tonal languages such as Chinese and Thai, the pronunciation carries a tone in addition to the syllable, and the syllable plus the tone constitutes the pronunciation. The syllables described in the embodiments of the present application may be either without tones or with tones.
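For illustration only, the following is a minimal sketch of converting a Chinese named entity into its syllable (pinyin) string; it assumes the third-party pypinyin package is available and drops tones, which, as noted above, is only one of the possible choices.

# Minimal sketch: convert a Chinese named entity to a toneless pinyin string.
# Assumption: the third-party "pypinyin" package is installed; tones are dropped here,
# although syllables with tones could be used instead, as described above.
from pypinyin import lazy_pinyin

def entity_to_syllables(entity: str) -> str:
    """Return the toneless pinyin of a named entity as one concatenated string."""
    return "".join(lazy_pinyin(entity))

print(entity_to_syllables("湖南"))  # -> "hunan"
print(entity_to_syllables("张三"))  # -> "zhangsan"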
It will be appreciated by those skilled in the art that in the embodiment of the present application, when determining the value of the edit distance between the syllable of the named entity in the recorded text and the syllable of the named entity in the conversation text, the value of the edit distance between the syllables of the named entity of the same type in the recorded text and the conversation text may be determined, or the value of the edit distance between the syllables of the named entity in the field of the same name in the recorded text and the conversation text may be determined.
In a specific example, suppose a named entity of the type "person name" exists in both the recorded text and the conversation text, the syllables of the named entity of this type in the recorded text are specifically "ZhangSan" (corresponding to the name Zhang San), and the syllables of the named entity of this type in the conversation text are specifically "ZhangSa" (corresponding to the name Zhang Sa). Then, in step 32, the value of the edit distance between "ZhangSan" and "ZhangSa" may be determined.
In another specific example, for example, a field of the name "place name" exists in both the recorded text and the conversation text, and syllables of the named entity in the field of the name in the recorded text are specifically "Hu Nan", and syllables of the named entity of the type in the conversation text are specifically "Fu Nan", then in step 32, the value of the edit distance between "Hu Nan" and "Fu Nan" may be determined.
The edit distance is an index for measuring the degree of difference between two character strings, and specifically means the minimum number of editing operations required to change from one to the other between two character strings. The larger the value of the edit distance of the two strings, the more different they are.
Step 33: and determining a detection result representing the quality of the session according to the determined value of the editing distance.
As described above, the larger the value of the edit distance, the more different the respective two character strings are.
Thus, in an alternative embodiment, an edit distance threshold may be preset. In step 33, if the determined edit distance is not greater than the edit distance threshold, it may be determined that the quality of the session is high, so that a detection result characterizing the session quality as high is obtained; if the determined edit distance is greater than the edit distance threshold, it may be determined that the quality of the session is low, so that a detection result characterizing the session quality as low is obtained. Determining the detection result characterizing the quality of the session may include, for example, outputting such a detection result.
In an alternative embodiment, when the quality of the session is determined to be high, a detection result indicating that the quality of the session is high may be output, for example, a prompt text of "high session quality" that can be displayed on the display screen may be output. When it is determined that the quality of the session is low, a detection result indicating that the quality of the session is low may be output, for example, a prompt text of "low session quality" that can be displayed on the display screen may be output. In particular, the prompt text of "low session quality" may be output to the recorder so that the recorder knows that the previous session quality is to be improved and takes corresponding quality improvement measures.
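As a hedged illustration only, the following sketch applies the threshold comparison of steps 32 and 33 to the pinyin of two named entities; the python-Levenshtein package and the threshold value 2 are assumptions made for this example, not requirements of the embodiments.

# Sketch of steps 32-33 under assumptions: the third-party "python-Levenshtein"
# package provides Levenshtein.distance(), and the threshold value 2 is illustrative.
import Levenshtein

EDIT_DISTANCE_THRESHOLD = 2  # assumed value, to be set according to actual requirements

def detect_session_quality(record_syllables: str, session_syllables: str) -> str:
    distance = Levenshtein.distance(record_syllables, session_syllables)
    # A smaller edit distance means the two syllable strings are more similar.
    return "high" if distance <= EDIT_DISTANCE_THRESHOLD else "low"

print(detect_session_quality("hunan", "funan"))  # distance 1 -> "high"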
With the method provided in the embodiments of the present application, the quality of the session is determined according to the value of the edit distance between the syllables of a named entity in the recorded text of the session and the syllables of the named entity in the conversation text of the session; the value of the edit distance between syllables is a quantized value that can objectively and accurately measure the degree of difference between syllables (for example, between the pinyin of two named entities, or between the English syllables corresponding to two named entities).
For the quality detection scenario of a session that includes a voice session, the quality requirement can be met as long as the same named entity appearing in both the recorded text and the conversation text of the session sounds similar in the two texts; whether the characters are completely identical is not a strict requirement in this scenario. With this consideration, in the embodiments of the present application, the detection result characterizing the quality of the session is determined according to the value of the edit distance between the syllables of a named entity in the recorded text and the syllables of the same named entity in the conversation text, so that the quality of the session can be determined accurately and the session quality detection requirement of this scenario can be met.
Some alternative embodiments relating to this method are further described below.
In an alternative embodiment, to further improve the accuracy of the determined session quality, an implementation of step 33 may include:
determining a value of text similarity between the named entity in the recorded text and the named entity in the conversation text; and determining a detection result representing the quality of the session according to the value of the editing distance and the value of the text similarity.
The text similarity here may be calculated using, but is not limited to, a short-text similarity algorithm. The text similarity value calculated with a short-text similarity algorithm is typically a value falling within [0, 1].
When the text similarity value falls within [0, 1], the value of the edit distance may be normalized into [0, 1] as well, for example by converting it into the proportion of characters that do not require editing, so that the normalized value is positively correlated with the degree of similarity.
Since the value of the text similarity and the normalized value of the edit distance obtained in the above manner are both positively correlated with the degree of similarity between the named entities, the text similarity value and the normalized edit distance value can be added directly, and the sum compared with a preset threshold to determine the quality of the session. Specifically, if the sum is greater than the preset threshold, the quality of the session is determined to be high and a detection result characterizing the session quality as high is obtained; otherwise, the quality of the session can be determined to be low and a detection result characterizing the session quality as low is obtained.
The approach of determining the detection result by combining the text similarity value and the edit distance value achieves the following:
On the basis of using the edit distance value to measure how similar the pronunciations of the same named entity are in the recorded text and the conversation text, the text similarity value is further introduced to measure how similar the textual content of the same named entity is. The degree of similarity of the same named entity can thus be examined comprehensively from both the "similar sound" and "similar form" aspects. This takes into account both the entry strategy based on the principle of similar pronunciation and the entry strategy based on the principle of identical characters, either of which may be adopted when the service information of a session is entered in an actual scenario, so the quality of the session can be judged fairly and the accuracy of the detection result characterizing the session quality is ensured.
In an alternative embodiment, determining the detection result characterizing the quality of the session according to the value of the editing distance and the value of the text similarity may specifically include:
determining a quality inspection score value of a session according to the value of the editing distance, the value of the text similarity, the number of syllables of a named entity in a recorded text, a preset editing distance weight value and a preset text similarity weight value; and determining a detection result representing the quality of the session according to the quality detection score.
In one specific example, the quality score value for a session may be calculated, for example, using the following formula:
quality inspection score = [(number of syllables of the named entity in the recorded text - value of the edit distance) / number of syllables of the named entity in the recorded text] × preset edit distance weight value + value of the text similarity × preset text similarity weight value
The term "(number of syllables of the named entity in the recorded text - value of the edit distance) / number of syllables of the named entity in the recorded text" in the above formula is equivalent to calculating, when converting the syllable characters of the named entity in the recorded text into the syllable characters of the named entity in the conversation text, the proportion of characters that do not require an editing operation relative to the number of syllable characters of the named entity in the recorded text. This calculation is equivalent to normalizing the value of the edit distance to the interval [0, 1].
The preset editing distance weight value and the preset text similarity weight value in the formula can be set according to actual requirements. Typically, the sum of the preset edit distance weight value and the preset text similarity weight value is equal to 1. For example, the preset edit distance weight may be 0.6, and the preset text similarity weight may be 0.4.
By calculating the quality inspection score with the above formula, on the one hand the value of the edit distance is normalized to the interval to which the text similarity value belongs, which makes it convenient to compute the sum of the two; on the other hand, the "degree of contribution" of the edit distance value and the text similarity value in the calculation of the quality inspection score can be controlled through the preset edit distance weight value and the preset text similarity weight value.
By summing the two terms, the calculated quality inspection score can comprehensively examine the degree of similarity of the same named entity from both the "sound" and "form" aspects, taking into account the entry strategy based on the principle of similar pronunciation and the entry strategy based on the principle of similar characters that may be adopted when the service information of a session is entered in an actual scenario, so the quality of the session can be judged fairly and the accuracy of the detection result is ensured. Moreover, introducing the degree of contribution when computing the sum makes it convenient to adjust how well the calculated quality inspection score matches the session type.
For example, for a voice session, the pronunciation of the service information entered by the recording party is likely to be the same as that of the service information actually given by the customer while the written characters differ greatly; for example, the service information entered by the recording party and the corresponding text in the conversation text may both be pronounced "Wu Yi" yet be written with different Chinese characters. Therefore, when the above formula is applied to quality detection of a voice session, the edit distance weight value may be set greater than the text similarity weight value, so that when the quality inspection score is calculated, the "degree of contribution" of the edit distance term is higher, and the calculated quality inspection score is better adapted to the voice session scenario.
Similarly, when the above formula is applied to quality detection of a text session, the edit distance weight value may be set smaller than the text similarity weight value, so that when the quality inspection score is calculated, the "degree of contribution" of the text similarity value is higher, and the calculated quality inspection score is better adapted to the text session scenario.
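The following is a minimal sketch of the quality inspection score described above, under the assumptions that the third-party python-Levenshtein package provides the edit distance, that the text similarity value is supplied by some short-text similarity algorithm, and that the weights are 0.6 and 0.4 as in the example above.

# Sketch of the quality inspection score: normalized edit-distance term plus
# weighted text similarity. The weights 0.6/0.4 and the Levenshtein package are
# assumptions; in a real system the text similarity value would come from a
# short-text similarity algorithm.
import Levenshtein

def quality_score(record_syllables: str, session_syllables: str,
                  text_similarity: float,
                  edit_weight: float = 0.6, text_weight: float = 0.4) -> float:
    n = len(record_syllables)  # number of syllable characters in the recorded text
    d = Levenshtein.distance(record_syllables, session_syllables)
    return (n - d) / n * edit_weight + text_similarity * text_weight

# Example with an assumed text similarity of 0.5:
print(quality_score("hunan", "funan", text_similarity=0.5))  # (5-1)/5*0.6 + 0.5*0.4 = 0.68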
In an alternative embodiment, when the magnitude of the quality inspection score is positively correlated with the quality of the session, determining the detection result characterizing the quality of the session according to the quality inspection score may specifically include the following sub-steps a to c:
Sub-step a: if the quality inspection score is greater than or equal to a preset first score threshold, it is determined that the quality of the session is high, and a detection result characterizing the session quality as high is obtained; if the quality inspection score is less than the preset first score threshold, i.e. the quality of the session is likely to be low, the number of characters with the same syllable in the named entity in the recorded text and the named entity in the conversation text may be further determined.
In a specific example, suppose the named entity in the recorded text is "world international film city", whose syllables (pinyin) are "huanqiuguojiyingcheng", and the named entity in the conversation text is "sail-ball international silver chen", whose syllables are "fanqiuguojiyinchen". If the quality inspection score for the named entity "world international film city" in the recorded text is smaller than the preset first score threshold, the number of characters having the same syllable in "huanqiuguojiyingcheng" and "fanqiuguojiyinchen" may be further determined; by comparison, the syllables "qiu", "guo" and "ji" are the same, so the number of such characters is determined to be 3.
Sub-step b: judging whether the named entity in the recorded text and the named entity in the conversation text are consistent according to the determined number of characters with the same syllable in the two named entities (hereinafter referred to as the number of characters) and a target comparison policy preset based on a number threshold.
In an alternative embodiment, the target comparison policy preset based on a number threshold may be, for example, a first policy.
The first policy includes: when the number of characters is greater than or equal to a first number threshold, it is judged that the named entity in the recorded text is consistent with the named entity in the conversation text; otherwise, they may be judged to be inconsistent.
In an alternative embodiment, the target comparison policy preset based on a number threshold may be, for example, a second policy.
The second policy includes: when the number of characters is smaller than the first number threshold but larger than a second number threshold, the quality inspection score is further combined to judge whether the named entity in the recorded text is consistent with the named entity in the conversation text. The first number threshold is greater than the second number threshold.
In a specific example, if the quality inspection score is greater than a second score threshold, it is judged that the named entity in the recorded text is consistent with the named entity in the conversation text; otherwise, they are judged to be inconsistent. The first score threshold is greater than the second score threshold.
Those skilled in the art will appreciate that the target comparison policy may also be another policy, and is not limited to the first policy or the second policy.
In an alternative embodiment, before judging whether the named entity in the recorded text and the named entity in the conversation text are consistent according to the number of characters and the target comparison policy preset based on a number threshold, the target comparison policy may be selected from different comparison policies preset based on different number thresholds.
In one specific example, the currently enabled comparison policy may be selected as the target comparison policy, where "enabled" means active.
In an alternative implementation, the execution subject of the method provided in the embodiments of the present application may display, in an interface as shown in fig. 4, selectable configuration items respectively corresponding to different comparison policies. Each selectable configuration item may be a selectable interface element, such as a control.
If a selectable configuration item displayed in the interface is selected, that is, the execution subject receives a selection instruction for the selectable configuration item, the comparison policy corresponding to the selected configuration item may be enabled; other configuration items that are not selected are not enabled.
In fig. 4, policy one is the first policy and policy two is the second policy. By selecting the circular control displayed in front of policy one (or policy two) in the interface, a selection instruction for the corresponding configuration item is triggered, and in response to the selection instruction the first policy (or second policy) corresponding to the selected circular control is enabled.
Sub-step c: determining a detection result characterizing the quality of the session according to the judgment result obtained by performing sub-step b.
In an alternative embodiment, whether the target comparison policy is the first policy or the second policy, when it is judged that the named entity in the recorded text is consistent with the named entity in the conversation text, the quality of the session may be determined to be high and a detection result indicating that the session quality is high is obtained; conversely, when they are judged to be inconsistent, the quality of the session may be determined to be low and a detection result indicating that the session quality is low is obtained.
In the embodiments of the present application, by adopting sub-steps a to c, when the quality inspection score is smaller than the preset first score threshold, the quality of the session is not directly determined to be low; instead, the detection result characterizing the quality of the session is further determined according to the number of characters with the same syllable in the named entity in the recorded text and the named entity in the conversation text, together with the target comparison policy preset based on a number threshold. This is equivalent to relaxing the criterion for determining the session quality, so the "one size fits all" problem caused by determining the session quality only according to the relationship between the quality inspection score and the first score threshold can be avoided, thereby effectively avoiding misjudgment of the session quality.
In an alternative implementation manner, the method for detecting session quality provided in the embodiments of the present application may also be used in combination with some related technologies.
For example, in a specific example, before performing step 32, the method for detecting session quality may further include:
performing an exact comparison between the named entity in the recorded text and the named entity in the conversation text, and obtaining a comparison result indicating inconsistency.
That is, obtaining an inconsistent result from the exact comparison may be the condition that triggers the execution of step 32.
The exact comparison may include, but is not limited to, at least one of the following:
1. Comparing whether the characters of the named entity in the recorded text are completely identical to the characters of the named entity in the conversation text.
For example, in a specific example, assuming that the named entity in the record text is "Wang Fujian" and the named entity in the session text is also "Wang Fujian", it is known by comparison that the two are identical.
In a specific example, assuming that the named entity in the recorded text is "Wang Fujian" and the named entity in the session text is "Wang Hujian", it is known by comparison that the two are not completely identical.
2. Comparing whether each syllable corresponding to the named entity in the recorded text is completely identical to each syllable corresponding to the named entity in the conversation text.
For instance, in a specific example, assuming that the named entity in the record text is "wangfujian" and the named entity in the session text is also "wangfujian", it is known that the two are completely identical by comparison.
In a specific example, assuming that the syllables of the named entity in the recorded text are "wangfujian" and the syllables of the named entity in the conversation text are "wanghujian", it is known by comparison that the two are not completely identical.
In an alternative embodiment, the characters of the named entity in the recorded text and the characters of the named entity in the conversation text may be compared first; if they do not match, it is further compared whether each syllable corresponding to the named entity in the recorded text is completely identical to each syllable corresponding to the named entity in the conversation text; if they are still inconsistent, step 32 may be performed.
In the embodiments of the present application, the exact comparison is performed first, and step 32 is executed when the result of the exact comparison is inconsistent. In this way, when the named entity in the recorded text and the named entity in the conversation text cannot be exactly matched (including when the characters are not completely identical, or when the corresponding syllables are not completely identical), step 32 is triggered; and when they can be exactly matched, including when the characters are completely identical and/or the corresponding syllables are completely identical, a detection result characterizing the session quality as high can be obtained directly, so that the detection result is obtained more efficiently. Meanwhile, even if exact matching fails, steps 32 and 33 can be used to realize an "inexact comparison", thereby avoiding misjudgment of the detection result caused by the overly strict exact comparison.
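A minimal sketch of the "exact comparison first, inexact comparison only when needed" flow described above is given below; pypinyin and python-Levenshtein are assumed, and the edit distance threshold is an illustrative value.

# Sketch of the flow: exact character comparison, then exact pinyin comparison,
# then the edit-distance-based check of steps 32-33 only if both exact checks fail.
from pypinyin import lazy_pinyin
import Levenshtein

def compare_entities(record_entity: str, session_entity: str,
                     edit_distance_threshold: int = 2) -> str:
    # 1. Exact character comparison.
    if record_entity == session_entity:
        return "high quality (characters identical)"
    # 2. Exact syllable (pinyin) comparison.
    record_py = "".join(lazy_pinyin(record_entity))
    session_py = "".join(lazy_pinyin(session_entity))
    if record_py == session_py:
        return "high quality (syllables identical)"
    # 3. Inexact comparison (steps 32-33): edit distance between the syllables.
    distance = Levenshtein.distance(record_py, session_py)
    return ("high quality (syllables similar)"
            if distance <= edit_distance_threshold else "low quality")

print(compare_entities("王福建", "王湖建"))  # pinyin differ by one edit -> high quality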
The following describes how the method provided in the embodiments of the present application is applied in an actual scenario. In this actual scenario, it is assumed that the purpose of performing session quality detection is to determine the quality of a voice session completed by a human customer service agent.
Specifically, assuming that the execution subject of the method in this actual scenario is a session quality detection tool, the detection tool may execute the method provided in the embodiments of the present application, which specifically includes the following steps as shown in fig. 5:
step 51: acquiring conversation corpus of voice conversation to be subjected to quality inspection;
the speech corpus of the speech session to be inspected refers to speech data obtained by recording the interactive speech between two parties in the process of carrying out the speech session between the manual customer service and the client.
Step 52: converting the acquired voice data into a conversation text by adopting an ASR technology;
step 53: extracting a named entity of the target type from the session text by using a named entity recognition (Named Entity Recognition, NER) technology;
in the embodiment of the present application, it is assumed that the target type includes a "name" type.
Step 54: exactly comparing the named entity of the "name" type extracted from the conversation text with the value of the "name" field in the work order (equivalent to the recorded text) entered by the human customer service agent, to judge whether they match exactly; if not, execute step 55; if yes, execute step 514;
For example, suppose the session quality detection tool performs quality detection on 5 voice sessions to be inspected, numbered 001-005. The values of the "name" field in the work orders corresponding to these 5 voice sessions and the named entities of the "name" type extracted from their conversation texts are exactly compared, giving the comparison results shown in Table 1.
Table 1:
as can be seen from step 54, for "Liu Fan" and "Zheng Xingzhong" with "consistent" comparison results, it can be determined that the quality of the voice session to be inspected is high with the corresponding number.
For "Tan Jian", "Wu Yi" and "old Gong Jie" where the comparison is "inconsistent", step 55 may be further performed.
Of course, the 5 named entities in Table 1 may also come from the conversation text of the same voice session to be inspected. In this case, even if some named entities have consistent comparison results, the quality of the voice session to be inspected may be left undetermined for the time being, and step 55 and its subsequent steps may be continued.
Step 55: converting the named entities whose comparison results are inconsistent, and the values of the "name" field in the corresponding work orders, into pinyin, and ordering the converted pinyin according to the order of the characters before conversion, so that the order of the pinyin is consistent with the order of the characters before conversion.
For example, the text in table 1 is converted into pinyin, and pinyin as shown in table 2 can be obtained.
Table 2:
step 56: accurately comparing whether the pinyin obtained by conversion and sequencing is completely matched and consistent;
specifically, the pinyin obtained by converting the named entity and the pinyin obtained by converting the value of the name field are accurately compared to judge whether the complete matching is consistent; if the determination result is negative, go to step 57; if yes, go to step 514;
using the example shown in table 2, the following comparison results shown in table 3 can be obtained by performing step 56.
Table 3:
step 57: calculating the value of the editing distance between the pinyin obtained by converting the pinyin accurate comparison inconsistent named entity and the pinyin obtained by converting the value of the corresponding name field;
in step 57, a matrix may be created for storing the calculated edit distance values.
The specific process of establishing the matrix and calculating the value of the edit distance based on the matrix may include the following sub-steps 1 to 4:
sub-step 1: and establishing a two-dimensional matrix according to the lengths of the character string a and the character string b. The size of the matrix is (n+1, m+1), n is the length of a, and m is the length of b.
In a specific example, assuming that the string a=love, b=lolpe, then n is 4 and m is 5. As shown in table 4 below, such a matrix may be established for storing the calculated values of the edit distance.
Table 4:
        0   l   o   l   p   e
    0
    l
    o
    v
    e
sub-step 2: initializing the values in the first row and the values in the first column of table 4;
specifically, sub-step 2 includes: filling the first column with values sequentially increasing from 0 to n; the first line is filled with values that increment from 0 to m in sequence. After filling these values in table 4, table 5 below can be obtained.
Table 5:
        0   l   o   l   p   e
    0   0   1   2   3   4   5
    l   1
    o   2
    v   3
    e   4
Sub-step 3: traverse the matrix starting from the second row and second column of Table 5 (the cells left empty in Table 5), calculate the value of the current position, and fill it into Table 5.
The matrix may be traversed row by row until the whole matrix has been traversed, or column by column until the whole matrix has been traversed.
Specifically, in sub-step 3, each value d[i][j] to be filled into Table 5 may be calculated as follows:
when the i-th character in character string a is the same as the j-th character in character string b, d[i][j] = d[i-1][j-1]; when the i-th character in character string a is different from the j-th character in character string b, d[i][j] = min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1.
Based on this calculation rule, for example, the value in the second row and second column of Table 5 is calculated as follows: since the 1st character of character string a and the 1st character of character string b are both "l", for i = 1, j = 1 the value is d[1][1] = d[i-1][j-1] = d[0][0].
As can be seen from Table 5, d[0][0] = 0, so the cell in the second row and second column of Table 5 is filled with the value 0.
As another example, the value in the second row and third column of Table 5 is calculated as follows: since the 1st character "l" of character string a is different from the 2nd character "o" of character string b, for i = 1, j = 2 the value is d[1][2] = min(d[i-1][j], d[i][j-1], d[i-1][j-1]) + 1 = min(d[0][2], d[1][1], d[0][1]) + 1.
From Table 5 after the second row and second column have been filled with 0, d[0][2] = 2, d[1][1] = 0 and d[0][1] = 1, so d[1][2] = min(2, 0, 1) + 1 = 0 + 1 = 1. The cell in the second row and third column of Table 5 is therefore filled with the value 1.
By analogy, each position to be filled in table 5 can be filled in completely by executing sub-step 3.
For example, the completely filled matrix is shown in Table 6:
Table 6:
        0   l   o   l   p   e
    0   0   1   2   3   4   5
    l   1   0   1   2   3   4
    o   2   1   0   1   2   3
    v   3   2   1   1   2   3
    e   4   3   2   2   2   2
sub-step 4: and determining a final calculation result.
The value in the lower-right corner of the matrix is taken as the finally calculated value of the edit distance between character string a and character string b.
Based on Table 6, it can be determined that the value of the edit distance between character strings a = "love" and b = "lolpe" is 2 (see the value in the lower-right corner of Table 6).
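The following is a minimal sketch of sub-steps 1 to 4; it builds the (n+1, m+1) matrix described above, fills it with the stated recurrence, and reproduces the result for a = "love", b = "lolpe".

# Sketch of sub-steps 1-4: build the (n+1) x (m+1) matrix, initialize the first
# row and column, fill it row by row with the recurrence given above, and read
# the edit distance from the lower-right corner.
def edit_distance(a: str, b: str) -> int:
    n, m = len(a), len(b)
    # Sub-step 1: create the matrix of size (n+1, m+1).
    d = [[0] * (m + 1) for _ in range(n + 1)]
    # Sub-step 2: initialize the first column (0..n) and the first row (0..m).
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Sub-step 3: traverse the matrix and fill each position.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]) + 1
    # Sub-step 4: the lower-right value is the edit distance.
    return d[n][m]

print(edit_distance("love", "lolpe"))  # -> 2, as in Table 6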
Step 58: calculating the value of the short-text similarity between each named entity whose exact pinyin comparison is inconsistent and the value of the corresponding "name" field;
step 59: calculating a quality inspection score according to the calculated value of the editing distance, the short text similarity value and a preset quality inspection score calculation formula;
the preset quality inspection score calculation formula may be, for example, as follows:
quality inspection score = [(number of syllable characters of the named entity in the recorded text - value of the edit distance) / number of syllable characters of the named entity in the recorded text] × 65% + value of the text similarity × 35%
In the formula, 65% is a preset editing distance weight value, and 35% is a preset text similarity weight value.
Generally, the preset edit distance weight value may be in the range [45%, 75%], and the preset text similarity weight value may be in the range [25%, 55%]. The preset edit distance weight value and the preset text similarity weight value satisfy: preset edit distance weight value + preset text similarity weight value = 1.
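As a purely illustrative calculation with the 65%/35% weights above, assume the recorded-text pinyin is "hunan", the conversation-text pinyin is "funan" (edit distance 1), and the short-text similarity is an assumed value of 0.5:

# Illustrative only: 5 syllable characters, edit distance 1, assumed similarity 0.5.
n, d, text_sim = 5, 1, 0.5
score = (n - d) / n * 0.65 + text_sim * 0.35
print(score)  # 0.695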
Step 510: judging whether the quality inspection score value is smaller than a preset first score threshold value or not; if the determination result is no, go to step 514; if yes, go to step 511;
step 511: judging the opening of the spam strategy;
specifically, it is determined whether the spam policy is opened, and whether the spam policy 1 or the spam policy 2 is specifically selected to be opened.
In step 511, there may be at least three cases of the obtained determination result:
1. the spam strategy is started, and the selective starting is the spam strategy 1;
2. the spam strategy is started, and the selective starting is the spam strategy 2;
3. the bottom-of-pocket strategy is not turned on.
For the above three determination results, when the determination result is "no spam policy is opened", step 513 may be executed, i.e. a conclusion that the quality inspection comparison is inconsistent is output; when the determination result is two other, step 512 may be performed.
Step 512: executing the spam strategy 1 when the spam strategy 1 is selected to be opened; when the open is the spam policy 2, the spam policy 2 is executed.
Specifically, executing fallback policy 1 may include:
determining the number of identical characters between the named entity in the conversation text and the value of the corresponding field in the work order; if the number is 2 or more, step 514 may be performed;
if the number is less than 2, determining the number of characters with the same pinyin between the named entity in the conversation text and the value of the corresponding field in the work order; if that number is 2 or more, step 514 may be performed; otherwise, step 513 may be performed.
Specifically, executing fallback policy 2 may include:
determining the number of identical characters between the named entity in the conversation text and the value of the corresponding field in the work order;
if the number of identical characters is greater than or equal to 1, further judging whether the quality inspection score is greater than a preset second score threshold (e.g. 0.6); if the quality inspection score is greater than the preset second score threshold, step 514 may be performed; if not, step 513 may be performed;
if the number of identical characters is 0, determining the number of characters with the same pinyin between the named entity in the conversation text and the value of the corresponding field in the work order;
if the number of characters with the same pinyin is greater than or equal to 1, further judging whether the quality inspection score is greater than the preset second score threshold (e.g. 0.6); if the quality inspection score is greater than the preset second score threshold, step 514 may be performed; if not, step 513 may be performed.
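A minimal sketch of fallback policies 1 and 2 as described above is given below; pypinyin is assumed, and counting "identical characters" / "characters with the same pinyin" as the size of the multiset intersection is one possible reading of the policies, not the only one.

# Sketch of fallback policies 1 and 2. A return value of True means the quality
# inspection comparison is treated as consistent (proceed to step 514).
from collections import Counter
from pypinyin import lazy_pinyin

def _common(xs, ys):
    # Size of the multiset intersection of the two sequences.
    return sum((Counter(xs) & Counter(ys)).values())

def fallback_policy_1(session_entity: str, record_entity: str) -> bool:
    if _common(session_entity, record_entity) >= 2:
        return True
    return _common(lazy_pinyin(session_entity), lazy_pinyin(record_entity)) >= 2

def fallback_policy_2(session_entity: str, record_entity: str,
                      quality_score: float, second_threshold: float = 0.6) -> bool:
    if _common(session_entity, record_entity) >= 1:
        return quality_score > second_threshold
    same_pinyin = _common(lazy_pinyin(session_entity), lazy_pinyin(record_entity))
    return same_pinyin >= 1 and quality_score > second_threshold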
By adopting either fallback policy, the problem that the session quality detection result is not accurate enough because extra characters or missing characters are produced when NER is used to obtain the named entity can be solved. The specific analysis is as follows:
according to the research of the inventor, in many scenes, the names of people uttered by clients are often connected with the context semantics to form other words, which is the main reason for the occurrence of the above situation.
Some examples of situations where this occurs are:
the conversation text obtained by conversion of the conversation corpus comprises: "I am the German of the Wood Li Pinde in, when NER is used to obtain named entity, it will extract" Li Pinde "as the name of the person-more than one" Ping "word is extracted relative to the correct name" Maid "in the worksheet.
For another example, the conversation text obtained by converting the conversation corpus contains a sentence in which the name is immediately followed by other words; when NER is used to obtain named entities from it, "Zhao Gang" is extracted as the person name, i.e. one "Hao" character is missing relative to the correctly pronounced name "Zhao Ganghao" in the work order.
For the above situation, the above-mentioned spam strategy 1 or spam strategy 2 may generate a certain tolerance to the case of "exact alignment inconsistency", that is, even if the exact alignment cannot be consistent, it may be required that only 2 words (or pinyin) are aligned consistently, or 1 word (or pinyin) is aligned consistently and the quality inspection score is greater than a certain threshold. The tolerance can well avoid the problem of inaccurate conversation quality detection results caused by the multi-extraction or the missing extraction.
In the embodiments of the present application, the interface shown in FIG. 4 may be presented to the user of the session quality detection tool, so that the user can flexibly customize the strictness of quality inspection according to their own business situation.
For example, for sessions where individual quality inspection items need not be overly strict, the user may choose to enable a fallback policy, so that problems caused by extra extraction or missed extraction are tolerated to a large extent, and the limited resources for manual quality inspection and review are devoted to reviewing sessions of genuinely poor quality.
In FIG. 4, the user may enable or disable the fallback policies by clicking the button following "quality inspection fallback".
In FIG. 4, policy one is fallback policy 1 and policy two is fallback policy 2. Selecting the circular control displayed in front of policy one (or policy two) in the interface triggers a selection instruction for that selectable configuration item, and in response to the selection instruction, fallback policy 1 (or fallback policy 2) corresponding to the selected circular control can be enabled.
In FIG. 4, the specific contents of policy one and policy two may also be displayed, so that the user can refer to them and select a fallback policy according to their own needs.
For example, in FIG. 4, the specific content of policy one reads: "the pinyin comparison of two or more characters or words between the conversation text and the work order is consistent, and the quality inspection is considered consistent"; the specific content of policy two reads: "the pinyin comparison of one or more characters or words between the conversation text and the work order is consistent, and if the quality inspection score value is above the threshold, the quality inspection is considered consistent".
Step 513: outputting a result indicating that the quality inspection comparison is inconsistent, and displaying the result in an interface of the computing device.
The computing device may be the computing device on which the session quality detection tool is deployed.
Step 514: outputting a result indicating that the quality inspection comparison is consistent, and displaying the result in an interface of the computing device.
As can be seen from the above application process of the session quality detection method in an actual scenario, the method may combine multiple similarity determination modes, including character comparison, pinyin comparison and edit distance comparison, to detect session quality. When these modes are combined, the strengths of each can compensate for the weaknesses of the others.
For example, for a certain voice session, suppose the text in the work order is "Zhao Li" and the corresponding text in the conversation text obtained by speech recognition is a homophone written with a different character. In such a case, if only character comparison is used to detect session quality, the detection result may be inaccurate: when a customer service agent conducts a voice session, the work order may be filled in according to the correct pronunciation rather than the correct characters (in a voice session the agent often cannot know which characters are correct), and since the same pronunciation may correspond to different characters, the session quality detection result would be too coarse and would fail to accurately reflect the quality of the voice session conducted by the agent.
In this case, if pinyin comparison is combined with character comparison, the result that the pinyin "zhaoli" of the two names is consistent can be obtained, thereby overcoming the problem that may arise when session quality is detected by character comparison alone.
Further, the edit distance comparison can alleviate the problem that the text obtained by speech recognition is inconsistent with the text in the work order because of the customer's accent, which would otherwise make the session quality detection result inaccurate. For example, for a voice session, the text in the work order is "Wang Fujian" while the text obtained by speech recognition is "Wang Hujian". For such an inconsistency it is difficult to obtain an accurate quality result by either character comparison or pinyin comparison. However, with the edit-distance-based comparison provided in the embodiments of the present application, it can be determined precisely that the difference between "wangfujian" and "wanghujian" is that converting "fu" into "hu" requires only 1 edit operation; on this basis, the small difference between the two can be presented accurately, a relatively accurate quality detection result is obtained, and the shortcomings of using character comparison and/or pinyin comparison alone are overcome.
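As a brief illustration of how pinyin comparison and edit distance comparison supplement character comparison, the following sketch computes a plain Levenshtein edit distance over the pinyin strings from the examples above; this dynamic-programming implementation is only one common way to compute an edit distance and is not asserted to be the exact procedure used in this application.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via single-row dynamic programming."""
    dp = list(range(len(b) + 1))           # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete ca
                        dp[j - 1] + 1,      # insert cb
                        prev + (ca != cb))  # substitute ca with cb (free if equal)
            prev = cur
    return dp[-1]

# Homophones: character comparison fails, but the pinyin strings are identical.
assert edit_distance("zhaoli", "zhaoli") == 0

# Accent-induced mismatch: "wangfujian" and "wanghujian" differ by one substitution,
# so the edit-distance comparison still reports a very small difference.
assert edit_distance("wangfujian", "wanghujian") == 1
```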
Based on the same inventive concept as the method embodiments, and in order to solve the problem of how to accurately inspect the quality of a session, the embodiments of the present application further provide a device for detecting session quality.
A schematic diagram of the structure of the device is shown in FIG. 6; the device comprises the following functional units:
an acquisition unit 61, configured to acquire the recorded text and the session text of a session; the recorded text is a text file for storing the service information of the session recorded by the recording party; the session text is a text file generated based on the session content of the session;
an edit distance determination unit 62, configured to determine the value of the edit distance between the syllables of the named entity in the recorded text and the syllables of the named entity in the session text acquired by the acquisition unit 61;
a session quality determination unit 63, configured to determine a detection result characterizing the quality of the session according to the value of the edit distance determined by the edit distance determination unit 62.
In an alternative embodiment, the session quality determination unit 63 may specifically be configured to:
determining a value of text similarity between a named entity in the recorded text and a named entity in the session text; and determining a detection result representing the quality of the conversation according to the value of the editing distance and the value of the text similarity.
In an alternative embodiment, the session quality determination unit 63 may specifically be configured to:
calculating the quality inspection score value of the session according to the value of the editing distance, the value of the text similarity, the number of syllables of the named entity in the recorded text, the preset editing distance weight value and the preset text similarity weight value; and determining a detection result representing the quality of the session according to the quality inspection score value.
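The exact scoring formula is given earlier in the description; purely to illustrate how the inputs listed above could be combined, a hedged sketch follows. The normalization and the default weights here are assumptions for illustration, not the formula claimed in this application.

```python
def quality_inspection_score(edit_distance_value: int,
                             text_similarity_value: float,
                             num_syllables_recorded: int,
                             edit_distance_weight: float = 0.5,
                             text_similarity_weight: float = 0.5) -> float:
    # Assumed normalization: turn the edit distance into a similarity in [0, 1]
    # by dividing by the syllable count of the named entity in the recorded text.
    syllable_similarity = 1.0 - min(edit_distance_value / max(num_syllables_recorded, 1), 1.0)
    # Assumed combination: a weighted sum of the two similarities.
    return (edit_distance_weight * syllable_similarity
            + text_similarity_weight * text_similarity_value)
```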
In an alternative embodiment, the session quality determination unit 63 may specifically be configured to:
if the quality inspection score value is smaller than a preset first score threshold, determining the number of characters with the same syllable between the named entity in the recorded text and the named entity in the session text;
judging whether the named entity in the recorded text is consistent with the named entity in the session text according to that number and a target comparison policy preset based on a number threshold; and determining a detection result characterizing the quality of the session according to the judgment result.
In an alternative implementation, the device provided in the embodiments of the present application may further include: a policy selection unit.
The policy selection unit may be configured to select the currently enabled comparison policy as the target comparison policy before the session quality determination unit 63 judges, according to the number and a target comparison policy preset based on a number threshold, whether the named entity in the recorded text is consistent with the named entity in the session text.
In an alternative implementation, the device provided in the embodiments of the present application may further include: a policy selection and enabling unit.
The policy selection and enabling unit may be configured to, before the policy selection unit selects the currently enabled comparison policy as the target comparison policy from the different comparison policies preset based on different number thresholds, display in an interface the selectable configuration items respectively corresponding to the different comparison policies, and to enable, based on a selection instruction for a selectable configuration item, the comparison policy corresponding to the selected configuration item.
In an alternative embodiment, the different comparison policies include a first policy and a second policy.
The first policy includes: if the number is greater than or equal to a first number threshold, the named entity in the recorded text is consistent with the named entity in the session text.
The second policy includes: if the number is smaller than the first number threshold but greater than a second number threshold, and the quality inspection score value is greater than a second score threshold, the named entity in the recorded text is consistent with the named entity in the session text.
Here, the first number threshold is greater than the second number threshold, and the first score threshold is greater than the second score threshold.
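Restated in terms of these configurable thresholds, the two comparison policies could be sketched as follows; the function names are illustrative, and with the example values used earlier the first number threshold would be 2, the second number threshold 0 and the second score threshold 0.6.

```python
def first_policy_consistent(num_same_syllable_chars: int, first_num_threshold: int) -> bool:
    # Consistent when enough characters of the two named entities share the same syllable.
    return num_same_syllable_chars >= first_num_threshold

def second_policy_consistent(num_same_syllable_chars: int, quality_score: float,
                             first_num_threshold: int, second_num_threshold: int,
                             second_score_threshold: float) -> bool:
    # Consistent when the match count lies between the two number thresholds
    # and the quality inspection score clears the second (lower) score threshold.
    return (second_num_threshold < num_same_syllable_chars < first_num_threshold
            and quality_score > second_score_threshold)
```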
In an alternative implementation, the device provided in the embodiments of the present application may further include: an exact comparison unit.
The exact comparison unit is configured to, before the edit distance determination unit 62 determines the value of the edit distance, exactly compare the named entity in the recorded text with the named entity in the session text and obtain a comparison result indicating that the comparison is inconsistent.
The exact comparison includes at least one of the following:
comparing whether each character of the named entity in the recorded text is completely consistent with each character of the named entity in the session text;
comparing whether each syllable corresponding to the named entity in the recorded text is completely consistent with each syllable corresponding to the named entity in the session text.
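A minimal sketch of this exact comparison, assuming the pinyin syllables of both entities are precomputed; the function name is an illustrative assumption.

```python
def exact_comparison(record_entity: str, session_entity: str,
                     record_syllables: list[str], session_syllables: list[str]) -> bool:
    # Character-level exact comparison: every character must be identical.
    chars_identical = record_entity == session_entity
    # Syllable-level exact comparison: every pinyin syllable must be identical.
    syllables_identical = record_syllables == session_syllables
    # Only if neither comparison is consistent does the flow fall through to the
    # edit-distance based comparison performed by the edit distance determination unit.
    return chars_identical or syllables_identical
```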
With the device provided in the embodiments of the present application, in quality detection scenarios for sessions that include voice sessions, the same named entity appearing both in the recorded text and in the session text can satisfy the quality requirement as long as the pronunciations are similar; exact character-level identity is not strictly required in such scenarios. With this consideration, in the embodiments of the present application the detection result characterizing the quality of the session is determined according to the value of the edit distance between the syllables of the named entity in the recorded text and the syllables of the same named entity in the session text, so that the quality of the session can be determined accurately and the session quality detection requirements of such scenarios can be met.
Based on the same inventive concept as the method embodiments, and in order to solve the problem of how to accurately inspect the quality of a session, the embodiments of the present application further provide a computing device.
As shown in fig. 7, the computing device includes: a memory 71 and a processor 72. The memory 71 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on an electronic device. The memory 71 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 72 is coupled to the memory 71 and executes the computer program stored in the memory 71 to perform the method of detecting session quality described in the embodiments of the present application.
The processor 72, when executing the computer program in the memory 71, may perform other functions in addition to the above functions, see in particular the description of the embodiments above.
Further, as shown in fig. 7, the computing device further includes: a display 74, a communication component 73, a power supply component 75, an audio component 76, and other components. Only some of the components are schematically shown in fig. 7, which does not mean that the computing device only includes the components shown in fig. 7.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program is capable of implementing the method provided in each of the above embodiments when executed by a computer.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (11)

1. A method for detecting session quality, comprising:
acquiring a recording text and a conversation text of a conversation; the recorded text is a text file for storing the service information of the session recorded by the recording party; the session text is a text file generated based on session content of the session;
determining a value of an edit distance between syllables of the named entity in the recorded text and syllables of the named entity in the session text;
and determining a detection result representing the quality of the session according to the value of the editing distance.
2. The method of claim 1, wherein the determining a detection result characterizing the quality of the session based on the value of the edit distance comprises:
Determining a value of text similarity between a named entity in the recorded text and a named entity in the session text;
and determining a detection result representing the quality of the session according to the value of the editing distance and the value of the text similarity.
3. The method of claim 2, wherein the determining a detection result characterizing the quality of the session based on the value of the edit distance and the value of the text similarity comprises:
determining a quality inspection score value of the session according to the value of the editing distance, the value of the text similarity, the number of syllables of the named entity in the recorded text, the preset editing distance weight value and the preset text similarity weight value;
and determining a detection result representing the quality of the session according to the quality inspection score value.
4. The method according to claim 3, wherein the determining a detection result characterizing the quality of the session according to the quality inspection score value comprises:
if the quality inspection score value is smaller than a preset first score threshold, determining the number of characters with the same syllable between the named entity in the recorded text and the named entity in the session text;
judging whether the named entity in the recorded text is consistent with the named entity in the session text according to the number and a target comparison policy preset based on a number threshold;
and determining a detection result representing the quality of the session according to the judgment result.
5. The method of claim 4, wherein before judging whether the named entity in the recorded text is consistent with the named entity in the session text according to the number and a target comparison policy preset based on a number threshold, the method further comprises:
selecting the currently enabled comparison policy as the target comparison policy.
6. The method of claim 5, wherein before selecting the currently enabled comparison policy as the target comparison policy, the method further comprises:
displaying, in an interface, selectable configuration items respectively corresponding to different comparison policies;
and enabling, based on a selection instruction for a selectable configuration item, the comparison policy corresponding to the selected configuration item.
7. The method of claim 6, wherein the different comparison policies comprise a first policy and a second policy:
the first policy includes: if the number is greater than or equal to a first number threshold, the named entity in the recorded text is consistent with the named entity in the session text;
the second policy includes: if the number is smaller than the first number threshold but greater than a second number threshold, and the quality inspection score value is greater than a second score threshold, the named entity in the recorded text is consistent with the named entity in the session text;
wherein the first number threshold is greater than the second number threshold; the first score threshold is greater than the second score threshold.
8. The method of claim 1, wherein prior to determining the value of the edit distance, the method further comprises:
exactly comparing the named entity in the recorded text with the named entity in the session text, and obtaining a comparison result indicating that the comparison is inconsistent;
wherein the exact comparison includes at least one of the following:
comparing whether each character of the named entity in the recorded text is completely consistent with each character of the named entity in the session text;
and comparing whether each syllable corresponding to the named entity in the recorded text is completely consistent with each syllable corresponding to the named entity in the session text.
9. A device for detecting session quality, comprising:
an acquisition unit, configured to acquire a recorded text and a session text of a session; the recorded text is a text file for storing the service information of the session recorded by the recording party; the session text is a text file generated based on session content of the session;
an edit distance determination unit, configured to determine a value of an edit distance between syllables of the named entity in the recorded text and syllables of the named entity in the session text;
and a session quality determination unit, configured to determine a detection result representing the quality of the session according to the value of the edit distance.
10. A computing device, comprising: a memory and a processor, wherein,
the memory is used for storing a computer program;
the processor, coupled to the memory, for executing the computer program stored in the memory for performing the method of detecting session quality of any of claims 1-8.
11. A computer-readable storage medium storing a computer program which, when executed by a computer, implements the method for detecting session quality according to any one of claims 1 to 8.