CN113206997A - Method and device for simultaneously detecting quality of multi-service recorded audio data - Google Patents

Info

Publication number
CN113206997A
Authority
CN
China
Prior art keywords
service
audio data
segment
segments
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110484409.5A
Other languages
Chinese (zh)
Other versions
CN113206997B (en)
Inventor
沈超建
魏薇郦
刘金山
江文乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110484409.5A
Publication of CN113206997A
Application granted
Publication of CN113206997B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Finance (AREA)
  • Multimedia (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Strategic Management (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

A method and a device for simultaneous quality inspection of multi-service recorded audio data, usable in the technical field of cloud computing or in other fields. The method comprises: configuring a plurality of recorded audio data into a time-length unit of the same processing channel for processing, according to the segment each recording is currently in and the reserved duration corresponding to each segment, wherein the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in. Multiple services can thus be processed in one ASR processing channel at the same time, so that no recording monopolizes an ASR processing channel: each recording sends a segment's audio data to ASR only after that segment is complete, and the ASR resource is released once processing finishes. One ASR channel can therefore serve multiple recording services, which reduces the time ASR channels spend idle, improves ASR processing efficiency and lowers the number of concurrent ASR channels required.

Description

Method and device for simultaneously detecting quality of multi-service recorded audio data
Technical Field
The application relates to the technical field of cloud computing, in particular to a method and a device for simultaneously detecting quality of multi-service recorded audio data.
Background
To protect consumers, regulators require financial institutions to make a synchronous audio and video recording of the sales process (a "double recording", which this text also renders as "audio recording") when selling products such as wealth management products, funds and insurance, so as to prevent misleading sales, private off-book sales ("flyer bills") and similar conduct. Each financial institution defines the recording steps, and a standard speech template for each step, according to its own business process, and after a recording is completed checks one by one whether the recorded audio and video meet the requirements.
To reduce the labor required for quality inspection, unify inspection standards and improve timeliness, some institutions use artificial intelligence to inspect recordings in real time: an additional real-time audio and video stream is obtained during recording and inspected automatically, with the aim of finishing before the customer leaves the business premises. This avoids asking the customer to return for a supplementary recording because the original was unqualified, which improves customer experience and business efficiency.
Recording steps and standard speech templates differ between institutions, and both the voice and the video contain multiple quality inspection points, for example: (1) voice quality inspection: confirming the customer's identity, and the customer manager introducing the product issuer, the nature of the warranty, the income level, the risks, the handling fees and so on; (2) video quality inspection: the customer manager showing credentials, showing product information, and signatures at key steps.
Recording quality inspection must check, one by one, whether the relevant quality inspection points in the recorded audio and video are compliant. Current general-purpose real-time recording quality inspection devices have the following defect:
Speech recognition resource usage is high: speech-to-text conversion is the basic operation of recording quality inspection, and speech segmentation, determination of the start and stop time of each quality inspection point, and the subsequent voice and video quality inspection can only be performed once the speech has been converted to text. A real-time quality inspection device generally sends the recording's real-time audio stream to automatic speech recognition (ASR) as it is produced. A recording generally lasts 5-60 minutes, and throughout that time one recording service occupies one ASR processing channel. For financial institutions with many branches, recording concurrency is high, so many concurrent ASR processing channels are needed and the investment cost is high.
Disclosure of Invention
To address the problems in the prior art, the application provides a method and a device for simultaneous quality inspection of multi-service recorded audio data. A plurality of recorded audio data are configured into a time-length unit of the same processing channel for processing, according to the segment each recording is currently in and the reserved duration corresponding to each segment, and the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in. Multiple services can thus be processed in one ASR processing channel at the same time, so that no recording monopolizes an ASR processing channel: each recording sends a segment's audio data to ASR only after that segment is complete, and the ASR resource is released once processing finishes. One ASR channel can therefore serve multiple recording services, which reduces the time ASR channels spend idle, improves ASR processing efficiency and lowers the number of concurrent ASR channels required.
In order to solve the technical problem, the application provides the following technical scheme:
In a first aspect, the present invention provides a method for simultaneous quality inspection of multi-service recorded audio data, including:
acquiring recorded audio data of a plurality of ongoing banking services and the service segmentation information corresponding to each banking service;
determining, according to the service segmentation information, the segment each recorded audio data is currently in and the reserved duration corresponding to each segment;
and configuring the plurality of recorded audio data into a time-length unit of the same processing channel for processing according to those segments and their reserved durations, wherein the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in.
In a preferred embodiment, further comprising:
performing priority sequencing on each banking business to generate a priority sequence;
and configuring the time sequence of each banking business in the time length unit according to the priority sequence.
In a preferred embodiment, the service segment information includes identity information and tone information of each service segment audio producer, and the determining the segment of the currently recorded audio data according to the service segment information of the service includes:
comparing the tone information of the currently recorded audio data with the tone information in the service segmentation information, and determining the identity information of the current audio producer;
and determining the current service segment according to the identity information, the current service running time and the duration of the tone.
In a preferred embodiment, the service segmentation information includes: a client identity confirmation segment, a self-introduction segment, a product type description segment, a warranty and income description segment, an investment range description segment, a risk description segment, a product deadline description segment, a commission charge description segment and a risk prompt segment.
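As an illustration only, the nine segments listed above could be represented as an enumeration. The identifier names are hypothetical, and treating the listing order as the numeric order is an assumption: the source does not state that the segments occur in this sequence.

```python
from enum import Enum


class ServiceSegment(Enum):
    """Segments named in the service segmentation information (names are illustrative)."""
    CLIENT_IDENTITY_CONFIRMATION = 1
    SELF_INTRODUCTION = 2
    PRODUCT_TYPE_DESCRIPTION = 3
    WARRANTY_AND_INCOME_DESCRIPTION = 4
    INVESTMENT_RANGE_DESCRIPTION = 5
    RISK_DESCRIPTION = 6
    PRODUCT_DEADLINE_DESCRIPTION = 7
    COMMISSION_CHARGE_DESCRIPTION = 8
    RISK_PROMPT = 9


# Iterating an Enum yields members in definition order.
segment_names = [segment.name for segment in ServiceSegment]
```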
In a preferred embodiment, determining the current service segment according to the identity information, the current service running duration and the duration of the tone includes:
determining a service segment range according to the identity information;
eliminating, according to the duration of the tone and the service duration of each service segment in the service segment range, the service segments whose matching degree is lower than a set threshold, to obtain an updated service segment range;
and matching the service segments in the updated service segment range according to the current service running duration and the service durations of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
In a second aspect, the present invention provides a device for simultaneously quality testing of multi-service recorded audio data, including:
the acquisition module is used for acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses;
the segmentation module is used for determining the segment where each piece of recorded audio data is located and the reserved time length corresponding to each segment according to the service segmentation information;
and the processing module is used for configuring the plurality of recorded audio data into a time length unit of the same processing channel for processing according to the segments and the reserved time lengths corresponding to the segments, wherein the time length of the time length unit is equal to the longest reserved time length in all the segments currently located.
In a preferred embodiment, further comprising:
the sorting module is used for sorting the priority of each banking business to generate a priority sequence;
and the configuration module is used for configuring the time sequence of each banking business in the time length unit according to the priority sequence.
In a preferred embodiment, the service segment information includes identity information and tone information of each service segment audio producer, and the segmentation module includes:
the comparison unit is used for comparing the tone information of the currently recorded audio data with the tone information in the service segmentation information and determining the identity information of the current audio producer;
and the determining unit is used for determining the current service segment according to the identity information, the current service running time and the duration of the tone.
In a preferred embodiment, the service segmentation information includes: a client identity confirmation segment, a self-introduction segment, a product type description segment, a warranty and income description segment, an investment range description segment, a risk description segment, a product deadline description segment, a commission charge description segment and a risk prompt segment.
In a preferred embodiment, the determining unit includes:
the segmentation range determining unit is used for determining the service segmentation range according to the identity information;
the segmentation range screening unit is used for eliminating, according to the duration of the tone and the service duration of each service segment in the service segment range, the service segments whose matching degree is lower than a set threshold, to obtain an updated service segment range;
and the matching unit is used for matching the service segments in the updated service segment range according to the current service running duration and the service durations of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
In a third aspect, the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for simultaneously quality testing of multi-service recorded audio data.
In a fourth aspect, the present application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for simultaneous quality inspection of recorded audio data for multiple services.
According to the above technical scheme, the method and the device provided by the application configure a plurality of recorded audio data into a time-length unit of the same processing channel for processing, according to the segment each recording is currently in and the reserved duration corresponding to each segment, wherein the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in. Multiple services can thus be processed in one ASR processing channel at the same time, so that no recording monopolizes an ASR processing channel: each recording sends a segment's audio data to ASR only after that segment is complete, and the ASR resource is released once processing finishes. One ASR channel can therefore serve multiple recording services, which reduces the time ASR channels spend idle, improves ASR processing efficiency and lowers the number of concurrent ASR channels required.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating a method for simultaneously quality testing of multi-service recorded audio data in an embodiment of the present application.
Fig. 2 is a flowchart illustrating an embodiment of a simultaneous quality inspection apparatus for recording audio data of multiple services according to the present application.
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the method and the device for simultaneous quality inspection of multi-service recorded audio data disclosed in the present application can be used in the technical field of cloud computing, and can also be used in any other field.
In existing quality inspection methods for recorded audio data, speech-to-text conversion is the basic operation of double-recording quality inspection: only after the speech has been converted to text can speech segmentation be performed, the start and stop time of each quality inspection point be determined, and the subsequent voice and video quality inspection be carried out. A real-time quality inspection device generally sends the double-recording real-time audio stream to speech recognition (ASR) in real time for conversion to text. A double recording generally lasts 5-60 minutes, and throughout that time one double-recording service occupies one ASR processing channel. For financial institutions with many branches, double-recording concurrency is high, so many concurrent ASR processing channels are needed and the investment cost is high.
Based on the above, the present application further provides a device for simultaneous quality inspection of multi-service recorded audio data, which implements the method provided in one or more embodiments of the present application. The device may be communicatively connected to a service terminal; a service terminal may be provided with a plurality of such devices, which may access the service terminal through a dedicated network.
The device can acquire recorded audio data of a plurality of ongoing banking services and the service segmentation information corresponding to each banking service; determine, according to the service segmentation information, the segment each recorded audio data is currently in and the reserved duration corresponding to each segment; and configure the plurality of recorded audio data into a time-length unit of the same processing channel for processing according to those segments and their reserved durations, wherein the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in.
It is understood that the service terminal may include a smart phone, a tablet electronic device, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), and the like.
The service terminal may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. For example, the communication unit may transmit multi-service recorded audio data to the quality inspection apparatus. The communication unit can also receive a quality inspection result returned by the quality inspection device.
The quality inspection device and the service terminal can communicate with each other by using any suitable network protocol, including a network protocol that has not been developed at the filing date of the present application. The network protocol may include, for example, a TCP/IP protocol, a UDP/IP protocol, an HTTP protocol, an HTTPS protocol, or the like. Of course, the network Protocol may also include, for example, an RPC Protocol (Remote Procedure Call Protocol), a REST Protocol (Representational State Transfer Protocol), and the like used above the above Protocol.
The method, the device, the electronic equipment and the computer-readable storage medium provided by the application can process multiple services in one ASR processing channel at the same time, so that no recording monopolizes an ASR processing channel: each recording sends a segment's audio data to ASR only after that segment is complete, and the ASR resource is released once processing finishes. One ASR channel can therefore serve multiple recording services, which reduces the time ASR channels spend idle, improves ASR processing efficiency and lowers the number of concurrent ASR channels required.
Specific embodiments and application examples are described below.
Existing real-time quality inspection systems generally send the double-recording real-time audio stream to speech recognition (ASR) in real time for conversion to text. A double recording generally lasts 5-60 minutes, and throughout that time one double-recording service occupies one ASR processing channel. For financial institutions with many branches, double-recording concurrency is high, so many concurrent ASR processing channels are required and the investment cost is large. To solve this problem, the present application provides an embodiment of a method for simultaneous quality inspection of multi-service recorded audio data. Referring to Fig. 1, the method specifically includes the following:
Step S100: acquire recorded audio data of a plurality of ongoing banking services and the service segmentation information corresponding to each banking service.
Step S200: determine, according to the service segmentation information, the segment each recorded audio data is currently in and the reserved duration corresponding to each segment.
Step S300: configure the plurality of recorded audio data into a time-length unit of the same processing channel for processing according to those segments and their reserved durations, wherein the length of the time-length unit equals the longest reserved duration among the segments the recordings are currently in.
As can be seen from the above description, the method provided in the embodiments of the present application configures the plurality of recorded audio data into a time-length unit of the same processing channel according to the segments the recordings are currently in and the reserved durations of those segments, with the length of the time-length unit equal to the longest of those reserved durations. Multiple services can thus be processed in one ASR processing channel at the same time, so that no recording monopolizes an ASR processing channel: each recording sends a segment's audio data to ASR only after that segment is complete, and the ASR resource is released once processing finishes. One ASR channel can therefore serve multiple recording services, which reduces the time the ASR channel spends idle, improves ASR processing efficiency and lowers the number of concurrent ASR channels required.
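Steps S200 and S300 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Recording` class, its field names and the reading of "reserved duration" as a per-segment time budget in seconds are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Recording:
    service_id: str          # hypothetical identifier of one ongoing banking service
    current_segment: str     # segment the recording is currently in (step S200)
    reserved_seconds: float  # reserved duration of that segment (step S200)


def build_time_unit(recordings: List[Recording]) -> Tuple[float, List[Tuple[str, str, float]]]:
    """Place all recordings into one time-length unit of a shared channel (step S300).

    The unit length equals the longest reserved duration among the current segments.
    """
    if not recordings:
        return 0.0, []
    unit_length = max(r.reserved_seconds for r in recordings)
    slots = [(r.service_id, r.current_segment, r.reserved_seconds) for r in recordings]
    return unit_length, slots


recordings = [
    Recording("fund-001", "risk_description", 40.0),
    Recording("insurance-002", "self_introduction", 15.0),
    Recording("wealth-003", "commission_charge_description", 25.0),
]
unit_length, slots = build_time_unit(recordings)
```

With these sample values the unit length is 40.0 seconds, the reserved duration of the longest current segment.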
In the present invention, the service segmentation information may be generated in advance or online; the present invention is not limited in this respect. In one embodiment, the steps of the present invention include generating the service segmentation information, that is:
the quality inspection method for service recorded data further includes:
determining the service segmentation information of each service according to the service rules and the service characteristics.
To configure the processing channel more reasonably, an embodiment of the method for simultaneous quality inspection of multi-service recorded audio data further includes:
performing priority sorting on the banking services to generate a priority sequence;
and configuring the time order of each banking service within the time-length unit according to the priority sequence.
In this embodiment, each service has a processing priority, which avoids disorder in service processing.
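The priority ordering can be sketched as a stable sort. The convention that a smaller number means a higher priority, and the tuple layout, are assumptions made for illustration; the source does not say how priorities are assigned.

```python
from typing import List, Tuple


def order_by_priority(services: List[Tuple[str, int]]) -> List[str]:
    """Return service ids in processing order within the time-length unit.

    `services` holds (service_id, priority) pairs; a smaller number is treated as
    a higher priority. sorted() is stable, so services with equal priority keep
    their original (arrival) order.
    """
    return [service_id for service_id, _ in sorted(services, key=lambda s: s[1])]


order = order_by_priority([("fund-001", 2), ("insurance-002", 1), ("wealth-003", 2)])
```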
To provide a specific method for determining service segments, an embodiment of the quality inspection method provides a preferred manner of segmenting recorded data, in which the currently recorded data includes audio data and the service segmentation information includes identity information and tone information of the audio producer of each service segment. Determining the segment the currently recorded data is in according to the service segmentation information of a service includes:
comparing the tone information of the current audio data with the tone information in the service segmentation information to determine the identity information of the current audio producer;
and determining the current service segment according to the identity information, the current service running duration and the duration of the tone.
In this embodiment, the audio producer in each service segment is a client, a client manager, a system prompt or the like. The present invention therefore determines the producer's identity from differences in tone, and then determines the current segment from information such as the service running duration and the tone duration.
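The tone comparison might look like the sketch below. The source does not say how tone information is represented or compared; representing it as a numeric feature vector and using cosine similarity are assumptions made here, as are the role names.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_producer(sample: List[float], references: Dict[str, List[float]]) -> str:
    """Return the identity whose reference tone vector is most similar to the sample."""
    return max(references, key=lambda identity: cosine(sample, references[identity]))


# Hypothetical two-dimensional reference vectors, one per audio producer.
references = {"client": [1.0, 0.0], "client_manager": [0.0, 1.0]}
producer = identify_producer([0.9, 0.1], references)
```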
To further explain the specific scheme of how to determine the current service segment in the above steps, in an embodiment of the quality inspection method for service recorded data provided by the present invention, a preferred mode of data quality inspection is provided, where determining the current service segment according to the identity information, the current service running duration, and the duration of the tone includes:
determining the service segmentation range according to the identity information;
according to the duration of the tone and the service duration of each service segment in the service segment range, eliminating the service segments with the matching degree lower than a set threshold value to obtain an updated service segment range;
and matching the service segments in the range of the updated service segment according to the current service carrying time and the service duration of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
In this embodiment, a preliminary range is first determined from the identity information. The tone duration is then matched against the duration of each service segment in that range; a matching degree below the set threshold indicates that the difference between the tone duration and the segment duration is too large, so that segment is eliminated. Finally, the running time of the current service is matched against the durations in the service segment information, and the service segment with the highest matching degree is selected. The service segment obtained in this way is accurate, and no manual service segmentation is needed.
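The three steps above can be sketched as follows. The text does not define how the matching degree is computed, so both measures used here are assumptions: the duration match is taken as one minus the relative difference between tone duration and segment duration, and the final match picks the candidate whose expected end time (the cumulative duration of the segments up to and including it) is closest to the elapsed running time. The `Segment` class, the threshold value, and the sample segment list are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    producer: str    # identity of the expected audio producer
    duration: float  # expected service duration of the segment, in seconds


def duration_match(tone_duration, segment_duration):
    """Assumed matching degree in [0, 1]: 1 means identical durations."""
    longest = max(tone_duration, segment_duration)
    if longest <= 0:
        return 0.0
    return 1.0 - abs(tone_duration - segment_duration) / longest


def determine_segment(segments, identity, tone_duration, elapsed, threshold=0.5):
    # Expected end time of each segment if the service follows the script.
    ends, clock = {}, 0.0
    for seg in segments:
        clock += seg.duration
        ends[seg.name] = clock
    # Step 1: preliminary range from the identity of the audio producer.
    candidates = [s for s in segments if s.producer == identity]
    # Step 2: eliminate segments whose duration match is below the threshold.
    candidates = [s for s in candidates
                  if duration_match(tone_duration, s.duration) >= threshold]
    if not candidates:
        return None
    # Step 3: best match against the elapsed running time of the service.
    return min(candidates, key=lambda s: abs(elapsed - ends[s.name]))


SEGMENTS = [
    Segment("identity_confirmation", "client_manager", 20),
    Segment("self_introduction", "client_manager", 15),
    Segment("product_type", "client_manager", 40),
    Segment("risk_prompt", "system_prompt", 30),
    Segment("client_confirmation", "client", 10),
]

current = determine_segment(SEGMENTS, "client_manager", tone_duration=38, elapsed=74)
print(current.name if current else None)
```

With these sample values the self-introduction segment is eliminated in step 2 (15 s against a 38 s tone duration), and step 3 prefers the product type segment because its expected end time of 75 s is closest to the elapsed 74 s.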
A traditional real-time quality inspection system generally sends the double-recording audio stream to a server-side audio quality inspection unit in real time, converts the speech to text, and then performs quality inspection on the text. In this mode, one double-recording service occupies one ASR processing channel for the entire process. The ASR throughput is limited by the speaking rate of the customer manager and the customer, and the channel spends most of its time idle, waiting for the next utterance, so the processing capacity of the server-side ASR cannot be fully used. For financial institutions with many branches, double-recording concurrency is high, so the number of concurrent ASR channels required is high and the investment cost is large. The invention divides the complete double-recording audio stream into multiple audio clips along the script segment dimension and performs ASR on each clip separately, so that a double recording no longer monopolizes an ASR processing channel: each double recording sends the audio of a segment to ASR only after that segment has finished, and the ASR resource is released once processing completes. One ASR channel can therefore serve multiple double-recording services, which greatly reduces idle waiting on the ASR channel, improves ASR processing efficiency, and greatly reduces the required ASR concurrency. Compared with the real-time audio stream mode, the required speech recognition concurrency is reduced to 1/6 or less; if the performance of the speech recognition function is further improved later, or high-performance devices such as GPUs or TPUs are adopted, the reduction is greater.
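The channel saving can be illustrated with a rough steady-state estimate. The text does not say how the 1/6 figure was obtained; the sketch below simply assumes that ASR processes a finished clip in a fixed fraction of the clip's length (a hypothetical `real_time_factor`) and ignores bursts and queueing, so it is an illustration of the reasoning rather than a reproduction of the result.

```python
import math
from fractions import Fraction


def channels_streaming(concurrent_recordings):
    """Real-time streaming: each recording holds one ASR channel throughout."""
    return concurrent_recordings


def channels_segmented(concurrent_recordings, real_time_factor):
    """Segment-based mode: a channel is busy only while a finished clip is processed.

    real_time_factor is the assumed ratio of ASR processing time to clip length.
    """
    return math.ceil(concurrent_recordings * real_time_factor)


# Hypothetical load: 60 concurrent double recordings, ASR six times faster than speech.
print(channels_streaming(60))
print(channels_segmented(60, Fraction(1, 6)))
```

`Fraction` is used instead of a float so that the ceiling is not affected by rounding error.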
In terms of software, current real-time quality inspection systems generally send the double-recording audio stream to speech recognition (ASR) in real time for conversion to text. A double recording generally lasts 5-60 minutes, during which one double-recording service occupies one ASR processing channel for the entire process. For financial institutions with many branches, double-recording concurrency is high, so the number of concurrent ASR processing channels required is high and the investment cost is large. To solve this problem, the present application provides an embodiment of a device for simultaneous quality inspection of multi-service recorded audio data, which implements all or part of the content of the method for simultaneous quality inspection of multi-service recorded audio data. Referring to fig. 2, the device specifically includes the following:
the acquisition module 10 is used for acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses;
the segmentation module 20 is used for determining the segment where each recorded audio data is located and the reserved time length corresponding to each segment according to the service segmentation information;
and the processing module 30 is configured to configure the plurality of recorded audio data in a time length unit of the same processing channel for processing according to the segment where the recorded audio data is located and the reserved time length corresponding to each segment, where the time length of the time length unit is equal to the longest reserved time length in all the segments where the recorded audio data is currently located.
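The rule used by the processing module, that the time length unit equals the longest reserved time length among the segments the recordings are currently in, can be sketched as follows. The function name, the dictionary layout, and the sample values are hypothetical.

```python
def plan_duration_unit(current_segments, reserved_durations):
    """Compute the time length unit shared by one processing channel.

    current_segments:   service id -> name of the segment the recording is in
    reserved_durations: segment name -> reserved time length in seconds
    Returns (unit_length, reserved time length per service).
    """
    reserved = {service: reserved_durations[segment]
                for service, segment in current_segments.items()}
    unit_length = max(reserved.values(), default=0.0)
    return unit_length, reserved


# Hypothetical state of three ongoing banking services.
current = {"A": "risk_description", "B": "self_introduction", "C": "commission_charge"}
reserved_table = {"risk_description": 8.0, "self_introduction": 3.0, "commission_charge": 5.0}

unit, per_service = plan_duration_unit(current, reserved_table)
print(unit, per_service)
```

Sizing the unit by the longest reserved time length means any recording placed in the unit fits within it, whichever segment it is currently in.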
In a preferred embodiment, further comprising:
the sorting module is used for sorting the priority of each banking business to generate a priority sequence;
and the configuration module is used for configuring the time sequence of each banking business in the time length unit according to the priority sequence.
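A minimal sketch of the sorting and configuration modules is given below. The text does not state how priorities are assigned or how ties are handled, so the convention that a smaller number means higher priority and the tie-break by service id are assumptions.

```python
def order_in_duration_unit(priorities):
    """Return service ids in the order they are placed within the time length unit.

    priorities: service id -> priority value (assumed: smaller means more urgent).
    Ties are broken by service id so the result is deterministic.
    """
    ranked = sorted(priorities.items(), key=lambda item: (item[1], item[0]))
    return [service for service, _ in ranked]


print(order_in_duration_unit({"A": 2, "B": 1, "C": 2}))
```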
In a preferred embodiment, the service segment information includes identity information and tone information of each service segment audio producer, and the segmentation module includes:
the comparison unit is used for comparing the tone information of the currently recorded audio data with the tone information in the service segmentation information and determining the identity information of the current audio producer;
and the determining unit is used for determining the current service segment according to the identity information, the current service running time and the duration of the tone.
In a preferred embodiment, the service segment information includes: a client identity confirmation segment, a self-introduction segment, a product type description segment, a warranty and income description segment, an investment range description segment, a risk description segment, a product deadline description segment, a commission charge description segment, and a risk prompt segment.
In a preferred embodiment, the determining unit includes:
the segmentation range determining unit is used for determining the service segmentation range according to the identity information;
the segmentation range screening unit is used for eliminating service segments with the matching degree lower than a set threshold value according to the duration of the tone and the service duration of each service segment in the service segmentation range to obtain an updated service segmentation range;
and the matching unit is used for matching the service segments in the range of the updated service segments according to the current service carrying time length and the service duration time lengths of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
According to the above technical scheme, the device for simultaneous quality inspection of multi-service recorded audio data provided by the present invention configures the plurality of recorded audio data into a time length unit of the same processing channel for processing, according to the segment in which each recording is located and the reserved time length corresponding to each segment, where the length of the time length unit equals the longest reserved time length among all the segments in which the recordings are currently located. Multiple services can thus be processed in one ASR processing channel at the same time, so that a recording no longer monopolizes an ASR processing channel: each recording sends the audio of a segment to ASR only after that segment has finished, and the ASR resource is released once processing completes. One ASR channel can therefore serve multiple recording services, which greatly reduces idle waiting on the ASR channel, improves ASR processing efficiency, and greatly reduces the required ASR concurrency.
In terms of hardware, current real-time quality inspection systems generally send the double-recording audio stream to speech recognition (ASR) in real time for conversion to text. A double recording generally lasts 5-60 minutes, during which one double-recording service occupies one ASR processing channel for the entire process. For financial institutions with many branches, double-recording concurrency is high, so the number of concurrent ASR processing channels required is high and the investment cost is large. To solve this problem, the present application provides an embodiment of an electronic device for implementing all or part of the content of the method for simultaneous quality inspection of multi-service recorded audio data, where the electronic device specifically includes the following:
Fig. 3 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 3, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, fig. 3 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications or other functions.
In one embodiment, the function of simultaneous quality inspection of multi-service recorded audio data can be integrated into the central processor. The central processor may be configured to perform the following control:
Step S100: acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses.
Step S200: determining the segment where each recorded audio data is located and the reserved time length corresponding to each segment according to the service segment information.
Step S300: configuring the plurality of recorded audio data in a time length unit of the same processing channel for processing according to the segments and the reserved time lengths corresponding to the segments, wherein the time length of the time length unit is equal to the longest reserved time length among all the segments in which the recordings are currently located.
As can be seen from the above description, the electronic device provided in this embodiment of the present application configures the plurality of recorded audio data into a time length unit of the same processing channel for processing, according to the segment in which each recording is located and the reserved time length corresponding to each segment, where the length of the time length unit equals the longest reserved time length among all the segments in which the recordings are currently located. Multiple services can thus be processed in one ASR processing channel at the same time, so that a recording no longer monopolizes an ASR processing channel: each recording sends the audio of a segment to ASR only after that segment has finished, and the ASR resource is released once processing completes. One ASR channel can therefore serve multiple recording services, which greatly reduces idle waiting on the ASR channel, improves ASR processing efficiency, and greatly reduces the required ASR concurrency.
In another embodiment, the device for simultaneous quality inspection of multi-service recorded audio data may be configured separately from the central processor 9100; for example, it may be configured as a chip connected to the central processor 9100, with the simultaneous quality inspection function realized under the control of the central processor.
As shown in fig. 3, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in fig. 3; further, the electronic device 9600 may include components not shown in fig. 3, for which reference may be made to the prior art.
As shown in fig. 3, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the information described above as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid-state memory, e.g., read-only memory (ROM), random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information when powered off, can be selectively erased, and can be written with further data; an example of such a memory is sometimes called an EPROM or the like. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which stores application programs and function programs, or the flow by which the central processor 9100 executes operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all steps of the method for simultaneous quality inspection of multi-service recorded audio data in the foregoing embodiments, where the execution subject of the method is a server or a client. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of the method; for example, when the processor executes the computer program, the following steps are implemented:
Step S100: acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses.
Step S200: determining the segment where each recorded audio data is located and the reserved time length corresponding to each segment according to the service segment information.
Step S300: configuring the plurality of recorded audio data in a time length unit of the same processing channel for processing according to the segments and the reserved time lengths corresponding to the segments, wherein the time length of the time length unit is equal to the longest reserved time length among all the segments in which the recordings are currently located.
As can be seen from the above description, the computer-readable storage medium provided in this embodiment of the present application configures the plurality of recorded audio data into a time length unit of the same processing channel for processing, according to the segment in which each recording is located and the reserved time length corresponding to each segment, where the length of the time length unit equals the longest reserved time length among all the segments in which the recordings are currently located. Multiple services can thus be processed in one ASR processing channel at the same time, so that a recording no longer monopolizes an ASR processing channel: each recording sends the audio of a segment to ASR only after that segment has finished, and the ASR resource is released once processing completes. One ASR channel can therefore serve multiple recording services, which greatly reduces idle waiting on the ASR channel, improves ASR processing efficiency, and greatly reduces the required ASR concurrency. On the basis of segmenting the audio data to obtain an optimal segmented text, frames are extracted as pictures only from the video data in the time period corresponding to the optimal segmented text, and quality inspection is performed on those pictures; there is no need to extract frames from the entire recorded video and inspect them one by one, which reduces the time and resources consumed by quality inspection.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Specific embodiments have been used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. For a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (12)

1. A method for simultaneously detecting the quality of multi-service recorded audio data is characterized by comprising the following steps:
acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses;
determining the segment where each recorded audio data is located and the reserved time length corresponding to each segment according to the service segment information;
and configuring the plurality of recorded audio data in a time length unit of the same processing channel for processing according to the segments and the reserved time lengths corresponding to the segments, wherein the time length of the time length unit is equal to the longest reserved time length in all the segments currently located.
2. The method for simultaneously quality testing of multi-service recorded audio data according to claim 1, further comprising:
performing priority sequencing on each banking business to generate a priority sequence;
and configuring the time sequence of each banking business in the time length unit according to the priority sequence.
3. The method of claim 1, wherein the service segment information includes identity information and tone information of each service segment audio producer, and the determining the segment of the currently recorded audio data according to the service segment information of the service comprises:
comparing the tone information of the currently recorded audio data with the tone information in the service segmentation information, and determining the identity information of the current audio producer;
and determining the current service segment according to the identity information, the current service running time and the duration of the tone.
4. The method of claim 1, wherein the service segment information comprises: a client identity confirmation segment, a self-introduction segment, a product type description segment, a warranty and income description segment, an investment range description segment, a risk description segment, a product deadline description segment, a commission charge description segment, and a risk prompt segment.
5. The method of claim 3, wherein the determining the current service segment according to the identity information, the current service running time and the duration of the tone comprises:
determining the service segmentation range according to the identity information;
according to the duration of the tone and the service duration of each service segment in the service segment range, eliminating the service segments with the matching degree lower than a set threshold value to obtain an updated service segment range;
and matching the service segments in the range of the updated service segment according to the current service carrying time and the service duration of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
6. A device for simultaneously inspecting the quality of multi-service recorded audio data, comprising:
the acquisition module is used for acquiring recorded audio data of a plurality of ongoing banking businesses and business segmentation information corresponding to all the banking businesses;
the segmentation module is used for determining the segment where each piece of recorded audio data is located and the reserved time length corresponding to each segment according to the service segmentation information;
and the processing module is used for configuring the plurality of recorded audio data into a time length unit of the same processing channel for processing according to the segments and the reserved time lengths corresponding to the segments, wherein the time length of the time length unit is equal to the longest reserved time length in all the segments currently located.
7. The device for simultaneous quality control of multi-service recorded audio data according to claim 6, further comprising:
the sorting module is used for sorting the priority of each banking business to generate a priority sequence;
and the configuration module is used for configuring the time sequence of each banking business in the time length unit according to the priority sequence.
8. The apparatus for simultaneous quality control of recorded audio data of multiple services according to claim 6, wherein said service segment information includes identity information and tone information of each service segment audio producer, and said segmentation module comprises:
the comparison unit is used for comparing the tone information of the currently recorded audio data with the tone information in the service segmentation information and determining the identity information of the current audio producer;
and the determining unit is used for determining the current service segment according to the identity information, the current service running time and the duration of the tone.
9. The device for simultaneous quality inspection of multi-service recorded audio data according to claim 6, wherein said service segment information comprises: a client identity confirmation segment, a self-introduction segment, a product type description segment, a warranty and income description segment, an investment range description segment, a risk description segment, a product deadline description segment, a commission charge description segment, and a risk prompt segment.
10. The apparatus for simultaneously inspecting recorded audio data of multiple services according to claim 8, wherein said determining unit comprises:
the segmentation range determining unit is used for determining the service segmentation range according to the identity information;
the segmentation range screening unit is used for eliminating service segments with the matching degree lower than a set threshold value according to the duration of the tone and the service duration of each service segment in the service segmentation range to obtain an updated service segmentation range;
and the matching unit is used for matching the service segments in the range of the updated service segments according to the current service carrying time length and the service duration time lengths of all the service segments, and determining the service segment with the highest matching degree as the current service segment.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of simultaneous quality inspection of recorded multi-service audio data according to any one of claims 1 to 5 when executing the program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of simultaneous quality inspection of multi-service recorded audio data according to any one of claims 1 to 5.
CN202110484409.5A 2021-04-30 2021-04-30 Method and device for simultaneously detecting quality of multi-service recorded audio data Active CN113206997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110484409.5A CN113206997B (en) 2021-04-30 2021-04-30 Method and device for simultaneously detecting quality of multi-service recorded audio data

Publications (2)

Publication Number Publication Date
CN113206997A true CN113206997A (en) 2021-08-03
CN113206997B CN113206997B (en) 2022-10-28

Family

ID=77028217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110484409.5A Active CN113206997B (en) 2021-04-30 2021-04-30 Method and device for simultaneously detecting quality of multi-service recorded audio data

Country Status (1)

Country Link
CN (1) CN113206997B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105765650A (en) * 2013-09-27 2016-07-13 亚马逊技术公司 Speech recognizer with multi-directional decoding
US20190088260A1 (en) * 2017-09-21 2019-03-21 Tata Consultancy Services Limited System and method for improving call-centre audio transcription
CN109831665A (en) * 2019-01-16 2019-05-31 深圳壹账通智能科技有限公司 A kind of video quality detecting method, system and terminal device
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN111885375A (en) * 2020-07-15 2020-11-03 中国工商银行股份有限公司 Method, device, server and system for testing double-recorded video

Also Published As

Publication number Publication date
CN113206997B (en) 2022-10-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant