CN113434670A - Method and device for generating sales-script text, computer equipment and storage medium - Google Patents

Method and device for generating sales-script text, computer equipment and storage medium

Info

Publication number
CN113434670A
CN113434670A (application CN202110692177.2A)
Authority
CN
China
Prior art keywords
text data
text
target
preset
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110692177.2A
Other languages
Chinese (zh)
Inventor
Zhang Ying (张映)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weikun Shanghai Technology Service Co Ltd filed Critical Weikun Shanghai Technology Service Co Ltd
Priority to CN202110692177.2A priority Critical patent/CN113434670A/en
Priority to PCT/CN2021/109281 priority patent/WO2022267174A1/en
Publication of CN113434670A publication Critical patent/CN113434670A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and provides a method and a device for generating a sales-script text, computer equipment and a storage medium. The method comprises the following steps: obtaining a recording file and performance data of a user, performing speech-to-text transcription on the recording file to obtain text data corresponding to the recording file, performing keyword recognition on the text data and extracting target text data, obtaining evaluation information of the user according to the target text data and the performance data, classifying the target text data according to the evaluation information, and obtaining a target script text according to the classification result and a preset basic script text. By adopting the method, more comprehensive and accurate evaluation information can be obtained, the target text data can be objectively classified according to that information, and excellent scripts that are more reasonable and closer to actual practice can be compiled from the classified text data, so that the method better meets the requirements of practical application scenarios and brings convenience to users.

Description

Method and device for generating sales-script text, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating a sales-script text, a computer device, and a storage medium.
Background
As market competition intensifies, the service-quality requirements on call-center customer service staff keep rising. In current telemarketing systems, each communication between a customer service agent and a customer produces a corresponding recording file, and to improve the agents' service capability, their service quality is usually evaluated by analyzing these recording files.
At present, methods that improve service capability by analyzing an agent's recording files generally perform speech recognition on the recording file to obtain corresponding text data, process the text data with natural language processing techniques and a scoring mechanism to obtain a call-quality score for the agent, and then screen out excellent script texts according to the score.
Such methods take only the recording file as the analysis object and evaluate agents from that single dimension; the evaluation dimension is too narrow, and there is no guarantee that the resulting script text meets the requirements of practical application scenarios.
Disclosure of Invention
In view of the foregoing, there is a need to provide a method, an apparatus, a computer device, and a storage medium for generating a sales-script text that can better meet the requirements of practical application scenarios.
A method of generating a sales-script text, the method comprising:
acquiring a recording file and performance data of a user;
performing speech-to-text transcription on the recording file to obtain text data corresponding to the recording file;
performing keyword recognition on the text data, and extracting target text data;
obtaining evaluation information of the user according to the target text data and the performance data;
and classifying the target text data according to the evaluation information, and obtaining a target script text according to a classification result and a preset basic script text.
In one embodiment, performing keyword recognition on the text data and extracting the target text data includes:
performing keyword recognition on the text data according to preset invalid keywords, and deleting the text data corresponding to the preset invalid keywords to obtain initial text data;
and performing keyword recognition on the initial text data according to preset valid keywords, and extracting target text data corresponding to the preset valid keywords.
In one embodiment, obtaining evaluation information of the user based on the target text data and the performance data comprises:
analyzing the target text data through a preset natural language processing technique to obtain an analysis result;
and obtaining the evaluation information of the user based on the analysis result and the performance data in combination with a preset evaluation rule.
In one embodiment, analyzing the target text data through a preset natural language processing technique to obtain an analysis result includes:
performing word segmentation on the target text data through a preset word-segmentation tool to obtain a keyword sequence;
matching the keywords in the keyword sequence against preset keywords to obtain a matching result;
and performing user-intention recognition and expression-norm recognition on the target text data according to the matching result to obtain the analysis result.
In one embodiment, classifying the target text data according to the evaluation information and obtaining the target script text according to the classification result and a preset basic script includes:
sorting the target text data according to the evaluation information to obtain a ranking result;
classifying target text data within a first preset ranking range as first-type text data, and classifying target text data within a second preset ranking range as second-type text data;
and obtaining the target script text according to the first-type text data, the second-type text data and the preset basic script.
In one embodiment, obtaining the target script text according to the first-type text data, the second-type text data and the preset basic script text includes:
comparing the first-type text data and the second-type text data respectively with the preset basic script text, and correspondingly extracting a first script text and a second script text;
and performing content addition and deletion operations on the preset basic script text according to the first script text and the second script text to obtain the target script text.
In one embodiment, obtaining the recording file of the user comprises:
acquiring initial recording files;
and screening out, from the initial recording files according to a preset duration threshold, the recording files that match the preset duration threshold.
A sales-script text generating apparatus, the apparatus comprising:
a data acquisition module, configured to acquire a recording file and performance data of a user;
a speech transcription module, configured to perform speech-to-text transcription on the recording file to obtain text data corresponding to the recording file;
a text screening module, configured to perform keyword recognition on the text data and extract target text data;
an evaluation information determining module, configured to obtain evaluation information of the user according to the target text data and the performance data;
and a script compiling module, configured to classify the target text data according to the evaluation information and obtain a target script text according to the classification result and a preset basic script text.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
acquiring a recording file and performance data of a user;
performing speech-to-text transcription on the recording file to obtain text data corresponding to the recording file;
performing keyword recognition on the text data, and extracting target text data;
obtaining evaluation information of the user according to the target text data and the performance data;
and classifying the target text data according to the evaluation information, and obtaining a target script text according to a classification result and a preset basic script text.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of:
acquiring a recording file and performance data of a user;
performing speech-to-text transcription on the recording file to obtain text data corresponding to the recording file;
performing keyword recognition on the text data, and extracting target text data;
obtaining evaluation information of the user according to the target text data and the performance data;
and classifying the target text data according to the evaluation information, and obtaining a target script text according to a classification result and a preset basic script text.
According to the above method, apparatus, computer device and storage medium for generating a sales-script text, after the corresponding text data is obtained by transcribing the recording file, the text data is not processed directly; instead, target text data is extracted by performing keyword recognition on the text data, which simplifies the text data and reduces the data-processing workload. Moreover, from the perspective of actual requirements, evaluation is performed along the two dimensions of the recording file and the performance data to obtain more comprehensive and accurate evaluation information, so the target text data can be objectively classified according to that information, and excellent scripts that are more reasonable and closer to actual practice can then be compiled from the classified text data, better meeting the requirements of practical application scenarios and bringing convenience to users.
Drawings
FIG. 1 is a diagram of an application environment of a method for generating a sales-script text in one embodiment;
FIG. 2 is a flow diagram illustrating a method for generating a sales-script text in one embodiment;
FIG. 3 is a detailed flow diagram of a method for generating a sales-script text in another embodiment;
FIG. 4 is a block diagram of an apparatus for generating a sales-script text in one embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method for generating a sales-script text provided by the present application can be applied in the application environment shown in FIG. 1, where the terminal 102 communicates with the server 104 via a network. Specifically, a manager logs in to the service management system at the terminal 102 and sends a recording-file processing message to the server 104 by operating the system interface; the server 104 receives the recording-file processing message, acquires the stored recording files and performance data of the agents, performs speech-to-text transcription on the recording files to obtain the corresponding text data, performs keyword recognition on the text data on the basis of the transcription, extracts target text data, obtains evaluation information of the user according to the target text data and the performance data, classifies the target text data according to the evaluation information, and obtains a target script text according to the classification result and a preset basic script text. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.
In one embodiment, as shown in FIG. 2, a method for generating a sales-script text is provided, described by way of example as applied to the server in FIG. 1, and includes the following steps:
step 202, acquiring a recording file and performance data of a user.
In this embodiment, the user is an agent by way of example, and the recording files are valid recording files that have already been screened. Both the recording files and the performance data carry the agent's identity, which is used for subsequent classification; the identity may be identification information such as an agent identification code or an employee number. The performance data includes transaction volume, customer satisfaction, and the like. In a specific implementation, the manager may set a scheduled task so that the server 104 acquires the recording files retained from the previous day at a fixed time each day.
And step 204, performing speech-to-text transcription on the recording file to obtain text data corresponding to the recording file.
Speech-to-text transcription refers to quickly and accurately converting speech data into text, and may also be called speech recognition. Relatively mature transcription solutions already exist in the industry, with third-party providers including Baidu, Tencent and iFlytek. In this embodiment, a third-party service may be invoked to transcribe the recording file into the corresponding text data.
And step 206, performing keyword recognition on the text data, and extracting target text data.
The target text data is the valid text data. After the text data corresponding to a recording file is obtained, since not all of it is valid information, the text data can be further condensed by filtering out invalid text data to obtain the target text data. Specifically, keyword recognition may be performed on the text data to recognize and delete the invalid portions, after which the target text data is extracted.
And step 208, obtaining the evaluation information of the user according to the target text data and the performance data.
The evaluation information may be a score and/or a comment; in this embodiment it is described taking a composite score as an example. Specifically, the composite score is mainly evaluated from customer satisfaction and business capability. Customer satisfaction can be represented as a system score, obtained from a basic service score and a system deduction: the basic service score is the average of the customer ratings given after each call, and the system deduction is obtained by evaluating the agent's call quality, which may be determined through keyword recognition. Business capability can be assessed from transaction volume and script compliance, where script compliance can be obtained by matching keywords against the target text data.
And step 210, classifying the target text data according to the evaluation information, and obtaining a target script text according to the classification result and a preset basic script text.
In practice, the telemarketing industry works from a fixed set of scripts, and in this embodiment the preset basic script is such a fixed script. Specifically, the agents can be divided into excellent agents and under-performing agents according to their composite scores, and the corresponding target text data is classified into excellent text data and poor text data. The parts where agents improvise beyond the basic script are then analyzed based on the excellent text data, the poor text data and the preset basic script, so as to work out which scripts make customers more interested and which scripts lose their interest, thereby obtaining the target script.
In the above method for generating a sales-script text, after the corresponding text data is obtained by transcribing the recording file, the text data is not processed directly; instead, target text data is extracted through keyword recognition, which simplifies the text data and reduces the data-processing workload. In addition, from the perspective of actual requirements, evaluation is performed along the two dimensions of the recording file and the performance data to obtain comprehensive evaluation information, so the target text data can be objectively classified accordingly, and excellent scripts that are more reasonable and closer to actual practice can be compiled from the classified text data, better meeting the requirements of practical application scenarios and bringing convenience to users.
In one embodiment, obtaining the recording file of the user comprises: acquiring initial recording files, and screening out, from the initial recording files according to a preset duration threshold, the recording files that match the threshold.
In practical applications, each agent makes many calls a day and thus generates many recording files, so the number of recording files in the system is huge, and processing every one of them would require enormous effort. It has been found that a recording of about a minute or less generally contains only the agent's opening greeting and product introduction before the customer hangs up; on this basis, recordings with a short call duration can be treated as invalid and filtered out. Therefore, in this embodiment, invalid recording files are filtered out before the recordings are transcribed. Specifically, recordings shorter than one minute may be judged invalid, and only recordings of one minute or longer are extracted. In this way the mass of recording files is reduced, the time needed for subsequent transcription is saved, and processing speed is improved.
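The duration-based pre-filter above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (`agent_id`, `duration` in seconds) and helper name are assumptions; only the one-minute cutoff comes from the embodiment.

```python
# Keep only recordings whose duration meets the preset threshold
# (one minute in the embodiment); shorter calls are treated as invalid.
MIN_DURATION_SECONDS = 60

def filter_valid_recordings(recordings):
    """Return the recordings at or above the preset duration threshold."""
    return [r for r in recordings if r["duration"] >= MIN_DURATION_SECONDS]

recordings = [
    {"agent_id": "A001", "duration": 45},   # customer hung up early -> invalid
    {"agent_id": "A002", "duration": 312},  # full conversation -> valid
    {"agent_id": "A003", "duration": 60},   # exactly at the threshold -> valid
]
valid = filter_valid_recordings(recordings)
```

Filtering before transcription, as the embodiment notes, avoids paying transcription cost for calls that contain nothing but the opening greeting.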
As shown in FIG. 3, in one embodiment, step 206 includes: step 226, performing keyword recognition on the text data according to preset invalid keywords and deleting the text data corresponding to the preset invalid keywords to obtain initial text data, then performing keyword recognition on the initial text data according to preset valid keywords and extracting the target text data corresponding to the preset valid keywords.
Text data transcribed from a recording file includes both valid content and invalid content. The valid content mainly comprises the passages in which the customer becomes interested and interacts with the agent, while the invalid content includes various greetings (such as "hello", "goodbye", "wishing you a happy life"), fixed scripts and product introductions. Deleting this invalid content condenses the text data after transcription and before subsequent processing. Specifically, invalid keywords may be determined from the elements of the fixed scripts and of the product introduction, keyword recognition is performed, and the invalid text is recognized and deleted; stop-word techniques may be used to remove it, yielding the initial text data. Considering that customer interaction involves product descriptions and promotional content, valid keywords can be determined from product characteristics and elements relevant to product sales: for a financial product, for example, valid keywords such as "minimum investment", "yield" and "risk" can be set, keyword recognition is performed on the initial text data, and the passages where the agent departs from the basic script and improvises with the customer are extracted. For example, the segment starting from a keyword with which the customer asks for details and ending where the customer either shows no further interest or expresses willingness to buy is determined as a key segment, i.e., valid content, yielding the valid text data.
In this embodiment, valid text data is identified through keyword recognition, so that the subsequent script-compilation work can be carried out accurately and efficiently, saving time.
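The two passes of step 226 can be sketched as a pair of keyword filters. The keyword lists and the sentence-level granularity here are illustrative assumptions; the embodiment leaves both presets open.

```python
# Pass 1: delete text matching preset invalid keywords (greetings etc.)
# to obtain the initial text data.
# Pass 2: keep only text matching preset valid keywords (product terms)
# to obtain the target text data.
INVALID_KEYWORDS = ["hello", "goodbye", "have a nice day"]
VALID_KEYWORDS = ["investment", "yield", "risk"]

def extract_target_text(sentences):
    initial = [s for s in sentences
               if not any(k in s.lower() for k in INVALID_KEYWORDS)]
    return [s for s in initial
            if any(k in s.lower() for k in VALID_KEYWORDS)]

sentences = [
    "Hello, thank you for calling.",
    "The minimum investment for this fund is 1000 yuan.",
    "The expected yield is around 4 percent.",
    "Goodbye, have a nice day.",
]
target = extract_target_text(sentences)
```

A production system would likely use proper segmentation and stop-word lists rather than substring matching, but the two-pass structure is the same.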
As shown in FIG. 3, in one embodiment, step 208 includes: step 228, analyzing the target text data through a preset natural language processing technique to obtain an analysis result, and obtaining the evaluation information of the user based on the analysis result and the performance data in combination with a preset evaluation rule.
In a specific implementation, a preset word-segmentation tool can be used to segment the target text data into a keyword sequence consisting of a plurality of keywords; the keywords in the sequence are matched against preset keywords to obtain a matching result, and user-intention recognition, expression-norm recognition and the like are then performed according to the matching result so as to evaluate the call quality and assign the corresponding system deduction. Specifically, the preset keywords include user-intention keywords and expression-norm keywords: if keywords matching the user-intention keywords (such as "no need" or "not interested") are found, the user's intention can be determined to be unwilling; if the number of matched expression-norm keywords exceeds a preset threshold, for example five, the expression is judged to comply with the norm, and otherwise it does not. In another embodiment, the analysis of the recording based on natural language processing techniques further includes speech-segment recognition, dead-air recognition, tone recognition, emotion recognition, dialect-usage recognition, semantic-clarity recognition, sentence-coherence recognition, and the like.
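The matching step above can be sketched as follows. A real system would segment Chinese text with a dedicated tool such as jieba; naive whitespace tokenization stands in here, and the two keyword sets are assumptions — only the threshold of five matches comes from the embodiment.

```python
# Match segmented tokens against preset intent and norm keyword sets:
# any negative-intent hit marks the user as unwilling, and at least
# NORM_MATCH_THRESHOLD norm-keyword hits marks the expression compliant.
NEGATIVE_INTENT_KEYWORDS = {"uninterested", "unnecessary"}
NORM_KEYWORDS = {"welcome", "please", "sorry", "thanks", "assist", "glad"}
NORM_MATCH_THRESHOLD = 5

def analyze_transcript(transcript):
    tokens = set(transcript.lower().replace(",", " ").replace(".", " ").split())
    intent = "unwilling" if NEGATIVE_INTENT_KEYWORDS & tokens else "neutral"
    norm_hits = len(NORM_KEYWORDS & tokens)
    return {"intent": intent,
            "compliant": norm_hits >= NORM_MATCH_THRESHOLD,
            "norm_hits": norm_hits}

result = analyze_transcript(
    "Welcome, glad to assist you. Please hold, sorry for the wait, thanks."
)
```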
In this embodiment, the user's intention may be recognized through keywords such as "not interested" or "no need". Once a keyword expressing the customer's impatience is recognized, i.e., once the user's intention is recognized as unwilling, it is determined whether the agent responds politely and apologetically or instead turns a deaf ear and keeps introducing the product until the customer simply hangs up; in the latter case the system automatically deducts points according to a preset scoring rule. The system score is then obtained by subtracting the system deduction from the basic service score. Further, the performance score in the performance data is obtained, corresponding weights are set for the performance score and the system score according to the preset evaluation rule, and the two are summed to give the agent's composite score. In this embodiment, by analyzing the target text data in combination with the preset evaluation rule, a composite score can be assigned to the agent intelligently and objectively, which facilitates evaluating agents and improving their service quality.
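The scoring rule above reduces to a short formula. In this sketch the 0.5/0.5 weights and the sample figures are illustrative; the patent only says the weights are set by a preset evaluation rule.

```python
# Composite score per the embodiment: the system score is the basic
# service score (mean of per-call customer ratings) minus the system
# deduction, then a weighted sum with the performance score.
def composite_score(customer_ratings, deduction, performance_score,
                    w_perf=0.5, w_system=0.5):
    base_service = sum(customer_ratings) / len(customer_ratings)
    system_score = base_service - deduction
    return w_perf * performance_score + w_system * system_score

score = composite_score(
    customer_ratings=[90, 80, 100],  # post-call customer ratings
    deduction=10,                    # call-quality deduction
    performance_score=88,            # from the performance data
)
```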
As shown in FIG. 3, in one embodiment, step 210 includes: step 230, sorting the target text data according to the evaluation information to obtain a ranking result, classifying the target text data within a first preset ranking range as first-type text data, classifying the target text data within a second preset ranking range as second-type text data, and obtaining a target script text according to the first-type text data, the second-type text data and a preset basic script.
In a specific implementation, the performance data, the composite scores and the text data all carry the agents' identification information, and the script compilation can be performed accordingly. The agents are ranked by composite score; the top ten are classed as excellent agents and their text data as first-type text data, i.e., excellent texts, while the bottom ten are classed as under-performing agents and their text data as second-type text data, i.e., poor texts. The excellent and poor text data are then compared in combination with the preset basic script text, and a reasonable target script is compiled. In this embodiment, ranking the agents objectively by composite score into excellent and under-performing agents, and then analyzing their text data, comes closer to the actual situation and produces a more reasonable script.
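Step 230's ranking and classification can be sketched directly. The record fields are assumptions; the top-ten/bottom-ten ranges follow the embodiment (shrunk here for the example).

```python
# Rank agents by composite score, then take the first preset ranking
# range as first-type (excellent) texts and the second preset ranking
# range as second-type (poor) texts.
def classify_by_rank(agents, top_n=10, bottom_n=10):
    ranked = sorted(agents, key=lambda a: a["score"], reverse=True)
    first_type = [a["text"] for a in ranked[:top_n]]        # excellent texts
    second_type = [a["text"] for a in ranked[-bottom_n:]]   # poor texts
    return first_type, second_type

agents = [{"agent_id": i, "score": s, "text": f"transcript-{i}"}
          for i, s in enumerate([72, 95, 60, 88, 81])]
excellent, poor = classify_by_rank(agents, top_n=2, bottom_n=2)
```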
In one embodiment, obtaining the target script text according to the first-type text data, the second-type text data and the preset basic script text includes: comparing the first-type text data and the second-type text data respectively with the preset basic script text, correspondingly extracting a first script text and a second script text, and performing content addition and deletion operations on the preset basic script text according to the first script text and the second script text to obtain the target script text.
Since all telemarketing is based on a fixed basic script, the first-type and second-type text data can be compared against it to locate and extract the excellent and poor interaction content in which agents departed from the basic script to interact with customers. The excellent interaction segments of the top ten agents are then ranked with weights from 10 down to 1 to obtain an excellent-text ranking, and the poor interaction segments of the bottom ten agents are ranked likewise to obtain a poor-text ranking. By analyzing a large number of excellent and poor texts, the common traits of the scripts are summarized to obtain the excellent scripts and the poor scripts; the excellent scripts are then added to the basic script and the poor scripts are deleted from it, so that a more reasonable target script is compiled, effectively improving the agents' service capability.
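The final add/delete step can be sketched as a list merge. Treating the script as a list of snippet strings, and exact-match deletion, are illustrative simplifications of the embodiment's content addition and deletion operations.

```python
# Build the target script: delete basic-script content that recurs in
# the poor texts, then append excellent snippets not already present.
def build_target_script(base_script, excellent_snippets, poor_snippets):
    poor = set(poor_snippets)
    target = [line for line in base_script if line not in poor]
    target += [s for s in excellent_snippets if s not in target]
    return target

base = ["greeting", "product intro", "hard sell closing"]
target = build_target_script(
    base,
    excellent_snippets=["ask about needs", "risk explanation"],
    poor_snippets=["hard sell closing"],
)
```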
To describe the present application more clearly, a specific example follows:
The manager sets a scheduled task on the server so that the server acquires all agents' recording files from the previous day at a fixed time every night. On receiving a script-text generation message sent by the terminal, the server acquires the stored recording files and performance data of the agents. Because the recordings are numerous and many short ones are invalid, recordings shorter than one minute are filtered out as invalid, the valid recordings are extracted, and a third-party service such as iFlytek's speech transcription service is invoked to transcribe the recordings into the corresponding text data. Keyword recognition is then performed on the text data according to the preset invalid keywords, and the invalid text data corresponding to those keywords is deleted to obtain the initial text data; keyword recognition is performed on the initial text data according to the preset valid keywords, and the target text data, i.e., the key text data, corresponding to the valid keywords is extracted. Based on natural language processing techniques, user-intention recognition, dead-air recognition, expression-norm recognition and the like are performed on the target text data to obtain an analysis result, and the agents' composite scores are determined from the analysis result and the performance data in combination with the preset evaluation rule.
The agents are then ranked by their composite scores to obtain a ranking result. The text data of the top ten agents is classified as the first type of text data, i.e. excellent text data, and the text data of the bottom ten agents as the second type of text data, i.e. poor text data. The excellent and poor text data are compared with the preset base script, and the excellent and poor interaction segments from the free-play portions of the respective agents, i.e. the parts outside the base script, are extracted. The excellent interaction segments of the top ten agents are ranked with weights from 10 down to 1 to obtain an excellent-text ranking, and the poor interaction segments of the bottom ten agents are ranked to obtain a poor-text ranking. By analyzing a large number of excellent and poor texts, the common points of the scripts are summarized to obtain excellent and poor script passages; the excellent passages are added to the base script and the poor passages are deleted from it, so that a more reasonable target script text is obtained through collation.
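A compact sketch of this ranking-and-weighting step follows. The 10-to-1 weighting is from the example; the score values, segment strings, and function shape are assumptions made for illustration (and with fewer than 2×top_n agents the top and bottom groups can overlap, as here).

```python
# Sketch of the weight-10-to-1 ranking step: agents sorted by composite
# score; top ten contribute "excellent" segments, bottom ten contribute
# "poor" segments, each weighted from top_n down to 1.

def rank_segments(agents, top_n=10):
    """agents: list of (composite_score, segment_text) pairs.
    Returns (excellent_ranking, poor_ranking); each entry is
    (weight, segment_text) with weight running top_n..1."""
    ordered = sorted(agents, key=lambda a: a[0], reverse=True)
    top = ordered[:top_n]
    bottom = ordered[-top_n:]
    excellent = [(top_n - i, seg) for i, (_, seg) in enumerate(top)]
    # for the poor ranking, the worst agent gets the highest weight
    poor = [(top_n - i, seg) for i, (_, seg) in enumerate(reversed(bottom))]
    return excellent, poor

excellent, poor = rank_segments(
    [(90, "seg-a"), (50, "seg-b"), (70, "seg-c")], top_n=2)
```

With three agents and top_n=2, the excellent ranking is seg-a (weight 2) then seg-c (weight 1), and the poor ranking is seg-b (weight 2) then seg-c (weight 1).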
In this method, after the recording files are transcribed into corresponding text data, the text data is not processed directly; instead, keyword recognition is performed on it to extract the target text data, which simplifies the text data and reduces the data-processing workload. In addition, by combining the target text data with the performance data, evaluation is carried out along the two dimensions of call quality and service capability, yielding more comprehensive and accurate evaluation information. The target text data can then be classified objectively according to that information, and on the basis of the classified text data a more reasonable, reality-based excellent script is collated, better meeting the needs of practical application scenarios and bringing convenience to users.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in each flowchart may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a script text generating apparatus comprising: a data acquisition module 510, a speech transcription module 520, a text screening module 530, an evaluation information determination module 540, and a script collating module 550, wherein:
the data acquisition module 510 is configured to acquire a recording file and performance data of a user;
the speech transcription module 520 is configured to transcribe the recording file to obtain text data corresponding to the recording file;
the text screening module 530 is configured to perform keyword recognition on the text data and extract target text data;
the evaluation information determination module 540 is configured to obtain evaluation information of the user according to the target text data and the performance data;
and the script collating module 550 is configured to classify the target text data according to the evaluation information and obtain a target script text according to the classification result and a preset base script text.
In an embodiment, the text screening module 530 is further configured to perform keyword recognition on the text data according to a preset invalid keyword, delete the text data corresponding to the preset invalid keyword to obtain initial text data, perform keyword recognition on the initial text data according to a preset valid keyword, and extract target text data corresponding to the preset valid keyword.
In one embodiment, the evaluation information determining module 540 is further configured to analyze the target text data by using a preset natural language processing technique to obtain an analysis result, and obtain the evaluation information of the user based on the analysis result and the performance data by combining with a preset evaluation rule.
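One plausible form of such a "preset evaluation rule" is a weighted combination of the call-quality analysis and the performance data. The 0.6/0.4 weights, the sub-score names, and the 0-100 scales below are illustrative assumptions; the patent leaves the rule unspecified.

```python
# Sketch of a preset evaluation rule: the NLP analysis result (call
# quality) and the performance data (service capability) are merged into
# one composite score. Weights and sub-scores are assumed, not specified.

def composite_score(analysis, performance, w_quality=0.6, w_perf=0.4):
    """analysis: dict of 0-100 sub-scores produced by the NLP analysis
    (e.g. intent recognition hit rate, wording compliance);
    performance: 0-100 score derived from the performance data."""
    quality = sum(analysis.values()) / len(analysis)
    return w_quality * quality + w_perf * performance

score = composite_score({"intent": 80, "wording": 90}, performance=70)
# quality averages to 85, so the composite is 0.6*85 + 0.4*70 = 79.0
```

The composite score is what the later embodiment ranks agents by; any monotone rule combining the two dimensions would slot in the same way.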
In an embodiment, the evaluation information determining module 540 is further configured to perform word segmentation on the target text data through a preset word segmentation tool to obtain a keyword sequence, match keywords in the keyword sequence with preset keywords to obtain a matching result, and perform user intention recognition and wording-standard recognition on the target text data according to the matching result to obtain an analysis result.
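The segmentation-and-matching step can be sketched as below. A production system for Chinese text would use a real segmentation tool such as the open-source jieba tokenizer; here a simple regex tokenizer stands in so the sketch is self-contained, and the preset keyword set is an assumption.

```python
# Sketch of word segmentation + preset-keyword matching. The regex
# tokenizer is a stand-in for a proper word segmentation tool (e.g.
# jieba for Chinese); PRESET_KEYWORDS is a hypothetical example set.

import re

PRESET_KEYWORDS = {"refund", "interest", "guarantee"}

def analyze(text):
    tokens = re.findall(r"[a-z]+", text.lower())   # stand-in segmentation
    matched = [t for t in tokens if t in PRESET_KEYWORDS]
    # The matching result then drives user intention recognition and
    # wording-standard recognition; here we only report matched keywords.
    return {"tokens": tokens, "matched": matched}

result = analyze("Can I get a refund? Is there any guarantee?")
```

For the sample utterance, `result["matched"]` contains "refund" and "guarantee", which downstream rules could map to, say, a cancellation intent.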
In an embodiment, the script collating module 550 is further configured to sort the target text data according to the evaluation information to obtain a ranking result, classify the target text data within a first preset ranking range as a first type of text data, classify the target text data within a second preset ranking range as a second type of text data, and obtain the target script text according to the first type of text data, the second type of text data, and a preset base script text.
In an embodiment, the script collating module 550 is further configured to compare the first type of text data and the second type of text data respectively with a preset base script text, correspondingly extract a first script text and a second script text, and perform content increase and decrease operations on the preset base script text according to the first script text and the second script text to obtain the target script text.
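The "content increase and decrease" operation reduces to adding segments found in the excellent (first) script text and removing segments flagged by the poor (second) script text. The sketch below makes that concrete; the segment strings and function name are hypothetical.

```python
# Sketch of the content increase/decrease step on the base script:
# excellent segments are appended, segments matching the poor script
# text are removed. Segment lists are illustrative placeholders.

def revise_base_script(base, excellent, poor):
    """Return the target script: base minus poor segments, plus any
    excellent segments not already present."""
    kept = [seg for seg in base if seg not in set(poor)]
    added = [seg for seg in excellent if seg not in set(base)]
    return kept + added

target = revise_base_script(base=["greeting", "pitch A", "closing"],
                            excellent=["pitch B"],
                            poor=["pitch A"])
```

Here "pitch A" is dropped and "pitch B" appended, yielding the collated target script; a real system would match at the passage level with fuzzy comparison rather than exact string equality.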
In an embodiment, the data acquisition module 510 is further configured to acquire initial recording files and screen out, from the initial recording files according to a preset duration threshold, the recording files matching the preset duration threshold.
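For WAV recordings, the duration screening can be done with Python's standard wave module alone. The file name and the 60-second threshold below are assumptions; the embodiment only fixes "less than 1 minute" as invalid.

```python
# Sketch of duration-based screening using the stdlib wave module.
# A 2-second silent mono WAV is generated purely to demonstrate the
# measurement; real input would be the agents' call recordings.

import wave

def wav_duration_s(path):
    """Duration in seconds = frame count / frame rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def screen_recordings(paths, min_duration_s=60):
    """Keep only recordings at least min_duration_s long."""
    return [p for p in paths if wav_duration_s(p) >= min_duration_s]

# demo: write a 2-second silent WAV (8 kHz, 16-bit mono) and measure it
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 16000)   # 16000 frames / 8000 Hz = 2.0 s
```

Formats other than WAV (e.g. MP3 call recordings) would need a third-party decoder, but the screening logic is unchanged.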
For a specific embodiment of the script text generating apparatus, reference may be made to the above embodiment of the script text generating method, and details are not repeated here. Each module of the above script text generating apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as recording files and preset evaluation rules. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a script text generating method.
Those skilled in the art will appreciate that the structure shown in fig. 5 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, and the processor implementing the following steps when executing the computer program: acquiring a recording file and performance data of a user; transcribing the recording file to obtain text data corresponding to the recording file; performing keyword recognition on the text data and extracting target text data; obtaining evaluation information of the user according to the target text data and the performance data; and classifying the target text data according to the evaluation information and obtaining a target script text according to the classification result and a preset base script text.
In one embodiment, the processor, when executing the computer program, further performs the steps of: performing keyword recognition on the text data according to preset invalid keywords, deleting the text data corresponding to the preset invalid keywords to obtain initial text data, performing keyword recognition on the initial text data according to preset valid keywords, and extracting target text data corresponding to the preset valid keywords.
In one embodiment, the processor, when executing the computer program, further performs the steps of: analyzing the target text data through a preset natural language processing technique to obtain an analysis result, and obtaining the evaluation information of the user based on the analysis result and the performance data in combination with a preset evaluation rule.
In one embodiment, the processor, when executing the computer program, further performs the steps of: performing word segmentation on the target text data through a preset word segmentation tool to obtain a keyword sequence, matching keywords in the keyword sequence with preset keywords to obtain a matching result, and performing user intention recognition and wording-standard recognition on the target text data according to the matching result to obtain an analysis result.
In one embodiment, the processor, when executing the computer program, further performs the steps of: sorting the target text data according to the evaluation information to obtain a ranking result, classifying the target text data within a first preset ranking range as a first type of text data, classifying the target text data within a second preset ranking range as a second type of text data, and obtaining a target script text according to the first type of text data, the second type of text data, and a preset base script text.
In one embodiment, the processor, when executing the computer program, further performs the steps of: comparing the first type of text data and the second type of text data respectively with a preset base script text, correspondingly extracting a first script text and a second script text, and performing content increase and decrease operations on the preset base script text according to the first script text and the second script text to obtain a target script text.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring initial recording files, and screening out, from the initial recording files according to a preset duration threshold, the recording files matching the preset duration threshold.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the steps of: acquiring a recording file and performance data of a user; transcribing the recording file to obtain text data corresponding to the recording file; performing keyword recognition on the text data and extracting target text data; obtaining evaluation information of the user according to the target text data and the performance data; and classifying the target text data according to the evaluation information and obtaining a target script text according to the classification result and a preset base script text.
In one embodiment, the computer program when executed by the processor further performs the steps of: performing keyword recognition on the text data according to preset invalid keywords, deleting the text data corresponding to the preset invalid keywords to obtain initial text data, performing keyword recognition on the initial text data according to preset valid keywords, and extracting target text data corresponding to the preset valid keywords.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: analyzing the target text data through a preset natural language processing technique to obtain an analysis result, and obtaining the evaluation information of the user based on the analysis result and the performance data in combination with a preset evaluation rule.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: performing word segmentation on the target text data through a preset word segmentation tool to obtain a keyword sequence, matching keywords in the keyword sequence with preset keywords to obtain a matching result, and performing user intention recognition and wording-standard recognition on the target text data according to the matching result to obtain an analysis result.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: sorting the target text data according to the evaluation information to obtain a ranking result, classifying the target text data within a first preset ranking range as a first type of text data, classifying the target text data within a second preset ranking range as a second type of text data, and obtaining a target script text according to the first type of text data, the second type of text data, and a preset base script text.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: comparing the first type of text data and the second type of text data respectively with a preset base script text, correspondingly extracting a first script text and a second script text, and performing content increase and decrease operations on the preset base script text according to the first script text and the second script text to obtain a target script text.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: acquiring initial recording files, and screening out, from the initial recording files according to a preset duration threshold, the recording files matching the preset duration threshold.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as such combinations contain no contradiction, they should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A script text generating method, the method comprising:
acquiring a recording file and performance data of a user;
transcribing the recording file to obtain text data corresponding to the recording file;
performing keyword recognition on the text data, and extracting target text data;
obtaining the evaluation information of the user according to the target text data and the performance data;
and classifying the target text data according to the evaluation information, and obtaining a target script text according to a classification result and a preset base script text.
2. The method of claim 1, wherein the performing keyword recognition on the text data and extracting target text data comprises:
performing keyword recognition on the text data according to preset invalid keywords, and deleting the text data corresponding to the preset invalid keywords to obtain initial text data;
and according to preset effective keywords, carrying out keyword recognition on the initial text data, and extracting target text data corresponding to the preset effective keywords.
3. The method of claim 1, wherein the obtaining the evaluation information of the user according to the target text data and the performance data comprises:
analyzing the target text data through a preset natural language processing technology to obtain an analysis result;
and obtaining the evaluation information of the user by combining a preset evaluation rule based on the analysis result and the performance data.
4. The method according to claim 3, wherein the analyzing the target text data by a preset natural language processing technology to obtain an analysis result comprises:
performing word segmentation on the target text data through a preset word segmentation tool to obtain a keyword sequence;
matching the keywords in the keyword sequence with preset keywords to obtain a matching result;
and according to the matching result, performing user intention recognition and wording-standard recognition on the target text data to obtain an analysis result.
5. The method of claim 1, wherein the classifying the target text data according to the evaluation information and obtaining a target script text according to a classification result and a preset base script text comprises:
sequencing the target text data according to the evaluation information to obtain a ranking result;
classifying target text data in a first preset ranking range into first type text data, and classifying target text data in a second preset ranking range into second type text data;
and obtaining a target script text according to the first type of text data, the second type of text data, and a preset base script text.
6. The method of claim 5, wherein the obtaining a target script text according to the first type of text data, the second type of text data, and a preset base script text comprises:
comparing the first type of text data and the second type of text data respectively with the preset base script text, and correspondingly extracting a first script text and a second script text;
and performing content increase and decrease operations on the preset base script text according to the first script text and the second script text to obtain a target script text.
7. The method of any of claims 1 to 6, wherein the acquiring a recording file of a user comprises:
acquiring an initial sound recording file;
and screening out, from the initial recording files according to a preset duration threshold, the recording files matching the preset duration threshold.
8. A script text generating apparatus, the apparatus comprising:
the data acquisition module is used for acquiring a recording file and performance data of a user;
the speech transcription module is used for transcribing the recording file to obtain text data corresponding to the recording file;
the text screening module is used for carrying out keyword identification on the text data to obtain target text data;
the evaluation information determining module is used for obtaining the evaluation information of the user according to the target text data and the performance data;
and the script collating module is used for classifying the target text data according to the evaluation information and obtaining a target script text according to a classification result and a preset base script text.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110692177.2A 2021-06-22 2021-06-22 Method and device for generating dialogistic text, computer equipment and storage medium Pending CN113434670A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110692177.2A CN113434670A (en) 2021-06-22 2021-06-22 Method and device for generating dialogistic text, computer equipment and storage medium
PCT/CN2021/109281 WO2022267174A1 (en) 2021-06-22 2021-07-29 Script text generating method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110692177.2A CN113434670A (en) 2021-06-22 2021-06-22 Method and device for generating dialogistic text, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113434670A true CN113434670A (en) 2021-09-24

Family

ID=77756985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110692177.2A Pending CN113434670A (en) 2021-06-22 2021-06-22 Method and device for generating dialogistic text, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113434670A (en)
WO (1) WO2022267174A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757862A (en) * 2023-01-09 2023-03-07 百融至信(北京)科技有限公司 Method and device for matching voice texts in batch recording mode, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361429B (en) * 2023-01-19 2024-02-02 北京伽睿智能科技集团有限公司 Business exception employee management method, system, equipment and storage medium
CN116980522B (en) * 2023-09-22 2024-01-09 湖南三湘银行股份有限公司 System and method for notifying customer image based on intelligent quality inspection

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763499A (en) * 2018-05-30 2018-11-06 平安科技(深圳)有限公司 Calling quality detecting method, device, equipment and storage medium based on intelligent sound
CN110189751A (en) * 2019-04-24 2019-08-30 中国联合网络通信集团有限公司 Method of speech processing and equipment
CN110472017A (en) * 2019-08-21 2019-11-19 佰聆数据股份有限公司 A kind of analysis of words art and topic point identify matched method and system
CN111160017A (en) * 2019-12-12 2020-05-15 北京文思海辉金信软件有限公司 Keyword extraction method, phonetics scoring method and phonetics recommendation method
WO2021003930A1 (en) * 2019-07-10 2021-01-14 深圳前海微众银行股份有限公司 Quality inspection method, apparatus, and device for customer service audio, and computer readable storage medium
CN112804400A (en) * 2020-12-31 2021-05-14 中国工商银行股份有限公司 Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN112885332A (en) * 2021-01-08 2021-06-01 天讯瑞达通信技术有限公司 Voice quality inspection method, system and storage medium

Also Published As

Publication number Publication date
WO2022267174A1 (en) 2022-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination