CN115438889A - Interactive service quality detection method, electronic device and computer storage medium

Info

Publication number: CN115438889A
Application number: CN202110619555.4A
Authority: CN
Inventors: 刘冲
Original assignee: Alibaba Singapore Holdings Pte Ltd
Current assignee: Alibaba Innovation Co
Priority date: 2021-06-03
Filing date: 2021-06-03
Publication date: 2022-12-06

Abstract

The embodiments of the present application provide an interactive service quality detection method, an electronic device and a computer storage medium. The interactive service quality detection method comprises: acquiring, in real time, multimedia data recorded for an interactive service, wherein the multimedia data comprises audio data and video data; performing quality inspection node detection in real time according to the acquired multimedia data, wherein the quality inspection node comprises at least one of the following types: a person object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node; performing, on the multimedia data corresponding to the quality inspection node, single-mode recognition or multi-mode recognition matched with the type of the quality inspection node; and determining a quality detection result of the multimedia data acquired in real time according to the modal recognition result. With the embodiments of the present application, quality detection of the interactive service can be performed effectively and in a targeted manner, and the overall detection quality of the interactive service is improved.

Description

Interactive service quality detection method, electronic device and computer storage medium

Technical Field

The embodiments of the present application relate to the technical field of computers, and in particular to an interactive service quality detection method, an electronic device and a computer storage medium.

Background

With the development of computer technology, more and more work can be done collaboratively through two-way or multi-way interaction between electronic devices. In these scenarios, the technology of simultaneously recording audio and video (also called dual recording) is widely used, for example in insurance, banking, and other industries that have strict standards for interactive services. Early dual recording was on-site dual recording, which generally requires the serving party, such as an insurance agency, and the served party, such as an applicant, to be present on site together to carry out the interactive service activity, with dual recording performed at the site of the interactive service. Since compliance with the standards is mandatory for such interactive service activities, a person performing quality inspection is usually also present on site.

However, with the development of the internet and the increasing diversity of customer needs, the high labor cost and lack of flexibility of on-site dual recording increasingly restrict the development of such industries. Under these circumstances, remote dual recording that supports remote audio and video becomes an increasingly urgent need. Quality inspection is an important means of checking the interactive service process in these industries, and it can effectively ensure that the services of the related industries meet the standardization requirements. Therefore, how to implement interactive service quality inspection in such a scenario has become a problem to be solved urgently.

Disclosure of Invention

In view of the above, embodiments of the present application provide an interactive service quality detection scheme to at least partially solve the above problem.

According to a first aspect of embodiments of the present application, a method for detecting quality of interaction service is provided, including: acquiring multimedia data recorded for an interactive service in real time, the multimedia data comprising: audio data and video data; performing quality inspection node detection in real time according to the acquired multimedia data, wherein the quality inspection node comprises at least one of the following types: the personnel object identity quality inspection node, the service flow quality inspection node and the service content quality inspection node; performing single-mode recognition or multi-mode recognition matched with the type of the quality inspection node on the multimedia data corresponding to the quality inspection node; and determining the quality detection result of the multimedia data acquired in real time according to the modal identification result.

According to a second aspect of the embodiments of the present application, there is provided an interactive service quality detection apparatus, including: an acquisition module, configured to acquire, in real time, multimedia data recorded for an interactive service, where the multimedia data includes audio data and video data; a detection module, configured to perform quality inspection node detection in real time according to the acquired multimedia data, where the quality inspection node includes at least one of the following types: a person object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node; a recognition module, configured to perform, on the multimedia data corresponding to the quality inspection node, single-mode recognition or multi-mode recognition matched with the type of the quality inspection node; and a determining module, configured to determine a quality detection result of the multimedia data acquired in real time according to the modal recognition result.

According to a third aspect of the embodiments of the present application, there is provided an electronic device, including: an audio and video capture device, a display, a processor, a memory, a communication interface and a communication bus, where the audio and video capture device, the display, the processor, the memory and the communication interface communicate with one another through the communication bus; the audio and video capture device is configured to record and capture real-time audio data and video data of an interactive service to form multimedia data; the processor is configured to: acquire, in real time, the multimedia data recorded and captured by the audio and video capture device; perform quality inspection node detection in real time according to the acquired multimedia data, where the quality inspection node includes at least one of the following types: a person object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node; perform, on the multimedia data corresponding to the quality inspection node, single-mode recognition or multi-mode recognition matched with the type of the quality inspection node; and determine a quality detection result of the multimedia data acquired in real time according to the modal recognition result; the display is configured to display the multimedia data recorded and captured in real time by the audio and video capture device and/or the quality detection result output by the processor; and the memory is configured to store the multimedia data, the quality detection result, and intermediate data output by the processor in the process of determining the quality detection result.

According to a fourth aspect of the embodiments of the present application, there is provided a computer storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the interactive service quality detection method according to the first aspect.

According to the interactive service quality detection scheme provided by the embodiments of the present application, multimedia data of an interactive service is recorded for an interactive service scenario; some or all of the person object identity quality inspection nodes, service flow quality inspection nodes and service content quality inspection nodes involved in the multimedia data are detected in real time; the corresponding single-mode recognition or multi-mode recognition is determined based on the type of each node; the content corresponding to the node, that is, a single-mode recognition result or a multi-mode recognition result, is recognized from the multimedia data corresponding to the quality inspection node; and the quality detection result of the multimedia data acquired in real time is determined based on the single-mode recognition result or the multi-mode recognition result. Because different types of quality inspection nodes have different detection targets, different modal recognition modes can be adopted according to the type of the detected node to obtain the detection result. This ensures effective and targeted quality detection of the interactive service and improves the overall detection quality of the interactive service.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below cover only some of the embodiments of the present application, and those skilled in the art can obtain other drawings based on them.

FIG. 1A is a flowchart illustrating the steps of an interactive service quality detection method according to a first embodiment of the present application;

FIG. 1B is a diagram illustrating an example scenario of the embodiment shown in FIG. 1A;

FIG. 2 is a flowchart illustrating the steps of an interactive service quality detection method according to a second embodiment of the present application;

FIG. 3 is a block diagram of an interactive service quality detection apparatus according to a third embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the scope of protection of the embodiments in the present application.

The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.

Embodiment one

Referring to FIG. 1A, a flowchart illustrating the steps of an interactive service quality detection method according to a first embodiment of the present application is shown.

The interactive service quality detection method of the embodiment comprises the following steps:

Step S102: acquire, in real time, multimedia data recorded for an interactive service.

Wherein the multimedia data includes: audio data and video data. That is, in the present embodiment, the audio and video recording is performed on the interactive service at the same time.

In the embodiments of the present application, the interactive service may be any activity that involves two or more parties and is subject to certain specifications and standards, including but not limited to: insurance application and policy signing activities in the insurance industry, money-related transaction activities in the banking industry, online education activities, notarization activities, conference activities, and the like. The embodiments of the present application are not limited in this respect.

Step S104: perform quality inspection node detection in real time according to the acquired multimedia data.

The quality inspection node comprises at least one of the following types: a person object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node.

Many industries that involve interactive services have corresponding specifications and requirements for the identities of the person objects, the service flow, and the service content involved in the interactive activity.

For example, in the insurance industry, the service person (also referred to as the service object) who leads an insurance application is required to be a practitioner of a legitimate insurance company and to hold the corresponding qualifications, and the served person (also referred to as the served object), i.e., the applicant, must be the actual applicant. As another example, in the banking industry, the service person (i.e., the lending entity) is required to be a worker of a legitimate bank and to hold the appropriate qualifications, the served person, i.e., the lender, must be a legitimate bank worker and authorized to carry out the lending activity, and so on.

Based on this, through the person object identity quality inspection node, quality inspection of the identification and authentication of person objects can be performed at different stages of the interactive service; through the service flow quality inspection node, quality inspection of the order, degree of completion and degree of standardization of the service flow in the interactive service can be performed; and through the service content quality inspection node, quality inspection of the related content displayed and/or played at different stages of the interactive service can be performed, ensuring that the service content is displayed, played and made known in a standard manner. It can also be seen that different quality inspection nodes have different inspection targets; accordingly, different quality inspection nodes can be regarded as corresponding to different types.

In practical applications, quality inspection node detection on the multimedia data acquired in real time can be performed through corresponding voice recognition; for example, if it is detected that "please show your identity document" is played, a person object identity quality inspection node can be considered detected. Alternatively, node detection may be performed through corresponding image recognition; for example, if a certificate is detected in a video image, a person object identity quality inspection node can be considered detected. Alternatively, node detection may be performed through a preset time node; for example, if an interactive service specifies that certificate display and identity authentication are required in the 2nd to 3rd minutes of the service process, then a quality inspection node can be considered detected when the multimedia data reaches the 2nd minute. Of course, other ways of performing quality inspection node detection are also applicable to the solutions of the embodiments of the present application.
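As an illustration of the three detection routes just described (speech keywords, image content and preset time nodes), the following Python sketch checks each route in turn. It is a minimal sketch under stated assumptions, not the method of the application: the `Segment` structure, the trigger tables and the node type label `identity` are hypothetical, and the transcript and image labels are assumed to come from upstream speech and image recognizers that are not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical trigger tables; phrases, labels and the time window are illustrative only.
SPEECH_TRIGGERS = {"please show your identity document": "identity"}
IMAGE_TRIGGERS = {"certificate": "identity"}
TIME_WINDOWS = [(120.0, 180.0, "identity")]  # (start_s, end_s, node type): 2nd to 3rd minute


@dataclass
class Segment:
    """One slice of the real-time recording, already processed by recognizers."""
    start_s: float                      # offset from the start of the recording
    transcript: str = ""                # text recognized from the audio
    image_labels: Tuple[str, ...] = ()  # labels of objects found in the video frames


def detect_node(segment: Segment) -> Optional[str]:
    """Return the type of quality inspection node detected in this segment, if any."""
    text = segment.transcript.lower()
    # Route 1: a trigger phrase is heard in the audio.
    for phrase, node_type in SPEECH_TRIGGERS.items():
        if phrase in text:
            return node_type
    # Route 2: a trigger object appears in the video image.
    for label in segment.image_labels:
        if label in IMAGE_TRIGGERS:
            return IMAGE_TRIGGERS[label]
    # Route 3: the recording has reached a preset time node.
    for start, end, node_type in TIME_WINDOWS:
        if start <= segment.start_s < end:
            return node_type
    return None
```

The order in which the routes are tried is a design choice of this sketch; a real deployment could use only one route or combine them differently.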

Step S106: perform, on the multimedia data corresponding to the quality inspection node, single-mode recognition or multi-mode recognition matched with the type of the quality inspection node.

In this embodiment, different modal recognition modes may be preset, and each type of quality inspection node may correspond to one or more modal recognition modes. In an optional manner, multiple modal recognition modes can be set in advance for each type of quality inspection node; in practical applications, a person skilled in the art can select, according to actual requirements and the type of the quality inspection node, at least one mode for a quality inspection node from the multiple modal recognition modes corresponding to it. In the embodiments of the present application, unless otherwise specified, quantity terms such as "a plurality of" and "multiple" mean two or more.

For example, for the person object identity quality inspection node, it may correspond to one or more of face recognition, voiceprint recognition, and certificate recognition; for another example, for a service flow quality inspection node, it may correspond to one or more of face recognition, voiceprint recognition, certificate recognition, lip motion recognition, voice recognition, and motion recognition; for another example, the service content quality inspection node may correspond to one or more of face recognition, voiceprint recognition, certificate recognition, voice recognition, image content recognition, and text recognition.

For all the multimedia data of one session of an interactive service, the same modal recognition mode can be set for all quality inspection nodes of a given type; alternatively, different modal recognition modes can be set for different quality inspection nodes of that type according to the requirements of each node.
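The relationship between node types and recognition modes can be pictured as a small configuration table. The sketch below is illustrative only: the candidate sets follow the lists given in this embodiment, while the names `NodeType`, `CANDIDATE_MODES`, `TYPE_DEFAULTS`, `select_modes` and `is_multi_mode`, as well as the particular defaults chosen, are assumptions made for the example.

```python
from enum import Enum
from typing import Dict, Iterable, Optional, Set


class NodeType(Enum):
    IDENTITY = "identity"
    SERVICE_FLOW = "service_flow"
    SERVICE_CONTENT = "service_content"


# Candidate recognition modes per node type, as listed in this embodiment.
CANDIDATE_MODES: Dict[NodeType, Set[str]] = {
    NodeType.IDENTITY: {"face", "voiceprint", "certificate"},
    NodeType.SERVICE_FLOW: {"face", "voiceprint", "certificate",
                            "lip_motion", "voice", "action"},
    NodeType.SERVICE_CONTENT: {"face", "voiceprint", "certificate",
                               "voice", "image_content", "text"},
}

# Hypothetical per-type defaults applied to every node of that type in a session.
TYPE_DEFAULTS: Dict[NodeType, Set[str]] = {
    NodeType.IDENTITY: {"face", "certificate"},
    NodeType.SERVICE_FLOW: {"face", "lip_motion", "voice"},
    NodeType.SERVICE_CONTENT: {"face", "image_content"},
}


def select_modes(node_type: NodeType,
                 node_override: Optional[Iterable[str]] = None) -> Set[str]:
    """Use the type-wide default unless this particular node overrides it."""
    if node_override is not None:
        chosen = set(node_override)
    else:
        chosen = set(TYPE_DEFAULTS[node_type])
    if not chosen:
        raise ValueError("at least one recognition mode is required")
    unsupported = chosen - CANDIDATE_MODES[node_type]
    if unsupported:
        raise ValueError(f"modes not available for {node_type.value}: {sorted(unsupported)}")
    return chosen


def is_multi_mode(modes: Set[str]) -> bool:
    # One mode means single-mode recognition; two or more means multi-mode recognition.
    return len(modes) > 1
```

For instance, `select_modes(NodeType.IDENTITY)` yields the type-wide default, while `select_modes(NodeType.IDENTITY, ["voiceprint"])` configures one specific node for single-mode recognition.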

Step S108: determine a quality detection result of the multimedia data acquired in real time according to the modal recognition result.

The modal recognition result can usually represent the quality detection result of the multimedia data corresponding to the quality inspection node. For a given quality inspection node, if the multimedia data corresponding to that node passes quality detection, the multimedia data preceding it can be considered to have passed quality detection as well.

In the following, the above process is schematically illustrated with a scenario example, as shown in fig. 1B.

In this example, a uniform modal recognition mode is set for each type of quality inspection node in a given interactive service scenario. Specifically, person object identity quality inspection nodes uniformly correspond to face recognition and certificate recognition, service flow quality inspection nodes uniformly correspond to face recognition, lip motion recognition and voice recognition, and service content quality inspection nodes uniformly correspond to face recognition, action recognition and image content recognition.

Based on this, FIG. 1B takes an insurance application interactive service as an example. The service involves an insurer (the service person) and an insured person (the served person), and real-time audio and video recording is performed on the interactive service. If the voice prompt "Insurer, please face the camera and show your professional qualification certificate" is detected in the recorded audio in the 2nd minute, it is determined that a person object identity quality inspection node has been detected, and the corresponding face recognition and certificate recognition are determined. The face and the certificate appearing in the subsequent video are then recognized to obtain a recognition result. The recognition result is compared with the corresponding information of the interactive service in a pre-stored database to judge whether they are consistent. If they are consistent, the node passes; otherwise, it does not pass. In this example, the professional qualification certificate of the insurer is associated with the insurer's service enterprise and identity; that is, the organization to which the insurer belongs and the insurer's identity can be determined from the information on the professional qualification certificate. If the recognition result indicates that the content inspected at this quality inspection node meets the standards and specifications, the subsequent interactive activity can proceed.

After this quality inspection node passes quality inspection, in the 5th minute, a voice prompt asking the insurer to explain the insurance content is detected in the recorded audio. It is then determined that a service flow quality inspection node has been detected, and the corresponding face recognition, lip motion recognition and voice recognition are determined. Face recognition and lip motion recognition are performed on the video data recorded next, and voice recognition is performed on the audio data to obtain the voice content. If the lip motion recognition result shows that the insurer exhibits lip movement, and the face recognition result and the voice content are each found to be consistent with the corresponding information of the interactive service in the pre-stored database, the node passes; otherwise, it does not pass. The voice content can be compared by first converting it into text and then comparing the text; further, keywords can be extracted from the converted text for comparison.
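The text-based comparison of voice content can be sketched as follows. The sketch assumes the speech has already been converted to English text by a recognizer that is not shown; the function names, the tokenization rule and the `min_coverage` parameter are hypothetical and not taken from the application.

```python
import re
from typing import Iterable, Set


def extract_keywords(text: str, vocabulary: Set[str]) -> Set[str]:
    """Keep only the words of the transcript that belong to the keyword vocabulary."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return tokens & vocabulary


def speech_matches_script(transcript: str,
                          required_keywords: Iterable[str],
                          min_coverage: float = 1.0) -> bool:
    """Pass if enough of the required keywords were actually spoken."""
    required = {k.lower() for k in required_keywords}
    if not required:
        return True  # nothing to check for this node
    found = extract_keywords(transcript, required)
    return len(found) / len(required) >= min_coverage


transcript = "This policy covers accident and illness, with an annual premium."
print(speech_matches_script(transcript, {"policy", "premium", "accident"}))
```

Requiring every keyword (`min_coverage=1.0`) is the strictest setting; lowering the threshold tolerates recognition errors at the cost of a weaker check.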

After the service flow quality inspection node passes quality inspection, in the 11th minute, a voice prompt asking for the coverage items to be shown to the applicant is detected in the recorded audio. It is then determined that a service content quality inspection node has been detected, and the corresponding face recognition, action recognition and image content recognition are determined. Corresponding detection and recognition are performed on the video frames of the video data recorded next to obtain a face recognition result, an action recognition result and an image content recognition result for the current speaker. The obtained face recognition result, action recognition result and image content recognition result are compared with the corresponding information of the interactive service in the pre-stored database to judge whether they are consistent. If they are consistent, the node passes; otherwise, it does not pass.

The foregoing process only briefly illustrates quality detection for some stages of an insurance application process. In practical applications, the quality inspection nodes and the quality inspection content involved may be more complex, but all of them can be implemented with reference to the corresponding description in this embodiment, which is not repeated here.
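The pass/fail logic of the scenario example can be summarized in a few lines of Python. This is a sketch under assumptions: recognition results and the pre-stored reference information are reduced to plain strings keyed by modality, and stopping at the first failed node is one possible policy chosen here because the example lets the interactive activity continue only after a node passes. The names `NodeCheck`, `node_passes` and `inspect` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodeCheck:
    """One detected quality inspection node with its recognition output."""
    name: str
    recognized: Dict[str, str]  # modality -> value produced by recognition
    expected: Dict[str, str]    # modality -> value pre-stored for this service


def node_passes(check: NodeCheck) -> bool:
    # A node passes only if every expected modality was recognized consistently.
    return all(check.recognized.get(mode) == value
               for mode, value in check.expected.items())


def inspect(checks: List[NodeCheck]) -> List[Tuple[str, bool]]:
    """Evaluate nodes in order and stop at the first failure (assumed policy)."""
    results = []
    for check in checks:
        ok = node_passes(check)
        results.append((check.name, ok))
        if not ok:
            break
    return results
```

Called with an identity node whose face and certificate match, followed by a service flow node whose recognized face differs from the stored one, `inspect` reports the first node as passed, the second as failed, and does not evaluate later nodes.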

According to this embodiment, multimedia data of an interactive service is recorded for an interactive service scenario; some or all of the person object identity quality inspection nodes, service flow quality inspection nodes and service content quality inspection nodes involved in the multimedia data are detected in real time; the corresponding single-mode recognition or multi-mode recognition is determined based on the type of each node; the content corresponding to the node, that is, a single-mode recognition result or a multi-mode recognition result, is recognized from the multimedia data corresponding to the quality inspection node; and the quality detection result of the multimedia data acquired in real time is determined based on the single-mode recognition result or the multi-mode recognition result. Because different types of quality inspection nodes have different detection targets, different modal recognition modes can be adopted according to the type of the detected node to obtain the detection result. This ensures effective and targeted quality detection of the interactive service and improves the overall detection quality of the interactive service.

The interactive service quality detection method of this embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: a server, a mobile terminal (such as a mobile phone or a tablet), a PC, and the like.

Embodiment two

Referring to FIG. 2, a flowchart illustrating the steps of an interactive service quality detection method according to a second embodiment of the present application is shown.

Step S202: acquire, in real time, multimedia data recorded for an interactive service.

Wherein the multimedia data comprises: audio data and video data, i.e. simultaneous audio and video recording.

The solution of this embodiment is applicable to interactive services in which all participants are at the same site, in which case the interactive service is recorded at that site. The solution is also applicable to interactive services in which the participants are in different places, for example where the service object is at the service enterprise and the served object is at home. In that case, this step can be implemented as: acquiring, from a first terminal and a second terminal respectively, real-time multimedia data recorded for the interactive service. This greatly facilitates carrying out the interactive service, improves its flexibility, and meets the requirements of different interactive service providers.
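When data arrives from two terminals, one simple way to feed it to a single detection pipeline is to interleave the two streams by timestamp. The application does not prescribe how the streams are combined, so the following is only a sketch under assumptions: each terminal delivers chunks already ordered by time, the two clocks are synchronized, and the `Chunk` structure and `merge_terminal_streams` name are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    timestamp_s: float  # capture time on a clock shared by both terminals (assumed)
    terminal: str       # "first" or "second"
    kind: str           # "audio" or "video"
    payload: bytes


def merge_terminal_streams(first: Iterable[Chunk],
                           second: Iterable[Chunk]) -> Iterator[Chunk]:
    """Lazily interleave two time-ordered streams into one time-ordered stream."""
    return heapq.merge(first, second, key=lambda chunk: chunk.timestamp_s)
```

Because `heapq.merge` is lazy, chunks can be consumed as they arrive instead of waiting for either recording to finish.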

In addition, in one feasible implementation, before this step, an interactive service guide script matched with the interactive service can be obtained; the interactive activity is then guided based on the interactive service guide script, and audio and video of the interactive service are recorded according to a recording trigger condition.

The interactive service guide script is used to guide the interactive activity corresponding to the interactive service. It guides the interaction between the service object and the served object and controls the flow and timing of the interactive activity. For example, a guide script for an insurance application interactive service can guide the insurer and the applicant to perform the corresponding operations and interactions, such as showing certificates, showing the insurance application content, and confirming whether the applicant knows the relevant information. A recording trigger condition may also be set for the multimedia data recording of the interactive service, including but not limited to a recording trigger button, a recording trigger voice, a recording trigger gesture, and the like; those skilled in the art should understand that other recording trigger manners are also applicable. When any of these is triggered, the recording trigger condition is met, and audio and video recording of the interactive service can start. When a guide script exists, it guides the interactive activity, and audio and video recording can be performed on the interactive activity guided by the guide script. The embodiment is not limited to this, however: as mentioned above, the interactive service may also be recorded directly, with the interactive activity guided by a participant in the interactive service or by other means.
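The recording trigger condition can be pictured as a small state holder that starts recording on the first matching event. The sketch below is illustrative only: the event kinds mirror the button, voice and gesture triggers named in this embodiment, while the class name, the trigger phrase and the gesture label are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecordingController:
    """Starts recording once any configured trigger condition is met."""
    trigger_phrases: frozenset = frozenset({"start recording"})  # hypothetical phrase
    trigger_gestures: frozenset = frozenset({"thumbs_up"})       # hypothetical gesture
    recording: bool = False

    def on_event(self, kind: str, value: str = "") -> bool:
        """Process one event and return whether recording is active afterwards."""
        if self.recording:
            return True  # already recording; later events do not stop it in this sketch
        if kind == "button":
            self.recording = True
        elif kind == "voice" and value.strip().lower() in self.trigger_phrases:
            self.recording = True
        elif kind == "gesture" and value in self.trigger_gestures:
            self.recording = True
        return self.recording
```

Stopping the recording and other trigger manners are deliberately left out; they would follow the same pattern.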

Step S204: perform quality inspection node detection in real time according to the acquired multimedia data.

The quality inspection node comprises at least one of the following types: a person object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node.

As described in the first embodiment, the quality inspection node may be detected through audio detection, video detection, time detection, or the like.

When the interactive service is matched with a guide script, this step can be implemented as: detecting, in real time, the audio data in the acquired multimedia data to judge whether playback of the interactive service guide script has reached a preset keyword, and detecting the quality inspection node according to the judgment result. Because the guide script is used to guide the interactive activity, it is usually clearly divided into multiple stages, each marked by keywords, so whether a quality inspection node exists can be judged from the preset keywords in the guide script. This can greatly improve the speed and efficiency of quality inspection node detection, makes the detection more convenient, and reduces its cost.
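Guide-script-driven detection can be sketched as a tracker that waits for the preset keyword of the next stage. This is a minimal sketch under assumptions: the audio is assumed to have been converted to text by a recognizer that is not shown, the stages are assumed to occur in script order, and `ScriptStep`, `GuideScriptTracker` and the keywords are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScriptStep:
    keyword: str    # preset keyword that marks this stage of the guide script
    node_type: str  # type of quality inspection node that starts at this stage


class GuideScriptTracker:
    """Reports a quality inspection node when the next preset keyword is heard."""

    def __init__(self, steps: Iterable[ScriptStep]) -> None:
        self._steps = list(steps)
        self._next = 0  # index of the stage whose keyword is awaited

    def feed(self, transcript: str) -> Optional[ScriptStep]:
        """Check a newly recognized utterance; return the step it triggers, if any."""
        if self._next >= len(self._steps):
            return None  # the script has been played to the end
        step = self._steps[self._next]
        if step.keyword.lower() in transcript.lower():
            self._next += 1
            return step
        return None
```

Only the keyword of the upcoming stage is checked for each utterance, which keeps the per-utterance cost constant regardless of script length.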

Step S206: perform, on the multimedia data corresponding to the quality inspection node, single-mode recognition or multi-mode recognition matched with the type of the quality inspection node.

In a feasible mode, if the quality inspection node is a person object identity quality inspection node, at least one of the following identifications is performed on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition; or if the quality inspection node is a service flow quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition; or if the quality inspection node is a service content quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, voice recognition, image content recognition and text recognition.

In this embodiment, different modal recognition modes may be matched to the types of the quality inspection nodes. The face and the voiceprint are important biometric features, and a certificate can represent a person's identity; recognizing any one of the three allows the corresponding person to be authenticated, and combining some or all of them further ensures the accuracy of authentication. Lip motion recognition and action recognition can effectively recognize the movements of a person in the video, providing an effective basis for judging whether a certain action was performed, or a certain sentence was spoken, by a certain person. Voice recognition, image content recognition and text recognition can recognize specific content, providing a basis for judging whether the content spoken by a person or the content displayed in the interactive service meets the specifications or standards.

Because the service flow quality inspection node and the service content quality inspection node involve not only persons but also content, they need to be implemented by combining identity verification with content detection. Once the identity of the current person has been determined, it can be determined whether that person is the person designated for the current service, or whether the content spoken or displayed by that person meets the requirements of the current flow. This effectively ensures that the service flow and the service content are standardized and strengthens the management and control of both.

Hereinafter, each specific quality inspection node is described in detail.

(I) The quality inspection node is a personnel object identity quality inspection node.

For the personnel object identity quality inspection node, at least one of the following recognitions can be performed on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition.

The first method is as follows: extracting video data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data; and comparing the face recognition result with face data of a standard person object of the interactive service in a prestored face database, and determining, according to the face comparison result, whether the person object in the video frame image is a normal participating object specified by the interactive service. Through face recognition, in combination with information prestored in the database, it can be accurately determined whether a person object in the video is a normal (legitimate) participating object of the interactive service: for example, whether the person object is a service object that should provide the service (e.g., a legitimate insurer), whether the person object is a served object that should receive the service (e.g., the actual applicant), or whether a person who does not meet the specification or requirements is present in the current interactive service scene, such as a person who is neither a service object nor a served object and is not required by the interactive service. In addition, in some cases multiple service objects and/or served objects may participate at the same time, and they may be identified and verified one by one. Face recognition thus allows the identity of the person object to be detected accurately and efficiently. Because only video data is processed in this method, the amount of data processing can be effectively controlled.

The second method is as follows: extracting video data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing face recognition and certificate recognition on a video frame image corresponding to the video data; comparing the face recognition result with the face data of the standard person object of the interactive service in a prestored face database, and comparing the certificate recognition result with the certificate data of the standard person object of the interactive service in a prestored certificate database; and determining, according to the face comparison result and the certificate comparison result, whether the person object in the video frame image is a normal participating object specified by the interactive service. The certificate in the video frame image may be any suitable certificate, including but not limited to an identity certificate, a professional qualification certificate, an employee certificate of a service enterprise, and the like, which is not limited in this embodiment of the present application. Combining face recognition with certificate recognition further improves the accuracy of identity verification and ensures the security of the interactive service.

The third method comprises the following steps: extracting video data and audio data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; carrying out face recognition on a video frame image corresponding to the video data, and carrying out voiceprint recognition on the audio data; comparing the face recognition result with face data of the standard personnel object of the interactive service in a prestored face database, and comparing the voiceprint recognition result with voiceprint data of the standard personnel object of the interactive service in a prestored voiceprint database; and determining whether the personnel object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result and the voiceprint comparison result. The human face and the voiceprint are important biological identity marks, the human face recognition and the voiceprint recognition are combined, and the accuracy of identity verification and the safety of interactive services can be effectively guaranteed.

The fourth method is as follows: extracting video data and audio data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing face recognition and certificate recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the face recognition result with face data of the standard person object of the interactive service in a prestored face database, comparing the certificate recognition result with certificate data of the standard person object of the interactive service in a prestored certificate database, and comparing the voiceprint recognition result with voiceprint data of the standard person object of the interactive service in a prestored voiceprint database; and determining, according to the face comparison result, the certificate comparison result and the voiceprint comparison result, whether the person object in the video frame image is a normal participating object specified by the interactive service. Combining face recognition, certificate recognition and voiceprint recognition for identity verification is suitable for interactive services with high security requirements: it ensures the legitimacy of the participants, so that the interactive activities can be carried out to a higher security standard.

The embodiments are not limited to the above examples. In practical applications, identity verification that combines voiceprint recognition with certificate recognition is also applicable, namely:

The fifth method is as follows: extracting video data and audio data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing certificate recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the certificate recognition result with the certificate data of the standard person object of the interactive service in a prestored certificate database, and comparing the voiceprint recognition result with the voiceprint data of the standard person object of the interactive service in a prestored voiceprint database; and determining, according to the certificate comparison result and the voiceprint comparison result, whether the person object in the video frame image is a normal participating object specified by the interactive service.

In the above embodiment, the person object may be a service object, a served object, or both a service object and a served object.
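The five identity verification methods above differ only in which comparison results are combined. A minimal sketch of that combination follows. The class `IdentityEvidence` and the function `is_normal_participant` are hypothetical names, and the rule that every modality actually used must match is an assumption for illustration, since the text only states that the determination is made according to the comparison results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentityEvidence:
    # Each field holds a comparison result against the prestored database:
    # True = match, False = mismatch, None = modality not used in this method.
    face_match: Optional[bool] = None
    certificate_match: Optional[bool] = None
    voiceprint_match: Optional[bool] = None


def is_normal_participant(evidence: IdentityEvidence) -> bool:
    """Assumed rule: every modality that was used must report a match."""
    used = [
        result
        for result in (
            evidence.face_match,
            evidence.certificate_match,
            evidence.voiceprint_match,
        )
        if result is not None
    ]
    if not used:
        raise ValueError("at least one recognition modality must be used")
    return all(used)
```

Under this assumed rule, the first method corresponds to setting only `face_match`, and the fifth method to setting `certificate_match` and `voiceprint_match`.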

(II) The quality inspection node is a service flow quality inspection node.

In this case, at least one of the following recognitions is performed on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition. As described above, quality detection of the service flow involves both detecting the identity of the person object and detecting whether the flow is normative. Therefore, one or more of face recognition, voiceprint recognition and certificate recognition may be selected for the identity detection of the person object, together with one or more of lip movement recognition, voice recognition and action recognition for the flow detection.

Based on this, the modality recognition corresponding to the service flow quality inspection node, and the recognition operations based on it, may adopt one or more of the following methods in combination.

The first method is as follows: extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data; and determining, according to the face recognition result, that the current person object is a normal person object providing the current service, and determining, according to the lip movement recognition result and the action recognition result, that the current service provided by the current person object is the service specified by the interactive service. For example, in an insurance interactive service scenario, some links, such as the insurance application flow, need to be shown and explained by the insurer. With this method, whether the person performing the current service operation is a normal person object meeting the interactive service specification or standard (the insurer in this example) can be determined from the face recognition result, and whether that person object provides the service required by the current link (in this example, the insurer displaying and explaining the insurance application flow chart) can be determined from the lip movement recognition result and the action recognition result. The safety and regularity of both the personnel providing the service and the flow links are thereby ensured.

The second method is as follows: extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; and determining, according to the face recognition result and the voiceprint recognition result, that the current person object is a normal person object providing the current service, and determining, according to the lip movement recognition result and the action recognition result, that the current service provided by the current person object is the service specified by the interactive service. Compared with the first method, this method combines face recognition with voiceprint recognition when determining the identity of the person object, which improves the accuracy of judging whether the person object providing the service is a normal person object meeting the specification or standard of the interactive service.

The third method comprises the following steps: extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; carrying out face recognition, certificate recognition and action recognition on a video frame image corresponding to the video data; and determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the action recognition result. Similar to the second mode, when the identity of the person object is determined, the face recognition mode and the certificate recognition mode are combined, so that the accuracy of judging whether the person object providing service is a normal person object meeting the specifications or standards of interactive service is improved.

The fourth method is as follows: extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and lip movement recognition on video frame images corresponding to the video data, and performing voice recognition on the audio data; and determining, according to the face recognition result, that the current person object is a normal person object providing the current service, and determining, according to the lip movement recognition result and the voice recognition result, that the current service provided by the current person object is the service specified by the interactive service. Lip movement recognition can identify the mouth shape; combined with voice recognition, it allows the specific content related to the service to be recognized simply, so that whether the service provided by the person object meets the specifications or standards of the interactive service can be judged more effectively and accurately.

The fifth method is as follows: extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and action recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; and determining, according to the face recognition result, that the current person object is a normal person object providing the current service, and determining, according to the action recognition result and the voice recognition result, that the current service provided by the current person object is the service specified by the interactive service. For example, when certain terms and rules are being presented and explained, combining action recognition with voice recognition makes it possible to judge effectively both whether the person object performs the operation specified for the current link and whether the content of that operation conforms to the specification of the current link. The accuracy of quality detection at the service flow quality inspection node is thereby effectively guaranteed.

In the above embodiment, the person object may be a service object, a served object, or both a service object and a served object. Moreover, the identity verification of the person object can be realized according to one or more combinations of face recognition, voiceprint recognition and certificate recognition besides the above exemplary modes; the judgment of whether the service flow meets the specifications and standards of the current link can be realized according to one or more combinations of action recognition, voice recognition, whiteboard content recognition and text recognition, and the method and the device are all applicable to the scheme of the embodiment of the application.

In addition, multi-modal recognition for the service flow quality inspection node can be further divided into multi-modal recognition for the service flow order, multi-modal recognition for the service flow interaction, and multi-modal recognition for the service flow completeness. Multi-modal recognition for the service flow order checks whether the service flow is executed in the prescribed order, e.g., whether the presentation of the terms of the application precedes the signing of the application agreement, whether the identity verification of the insurer and the applicant precedes the explanation of the application content, and so on. Multi-modal recognition for the service flow interaction checks whether the service object and the served object have carried out sufficient information interaction, in particular whether the service object has adequately explained or displayed the service content or information, and whether the served object has sufficient knowledge or understanding of that information: for example, whether the insurer has fully explained the insured content to the applicant, and whether the applicant has fully understood it based on the insurer's explanation. Multi-modal recognition for the service flow completeness checks the degree of completion of each link of the interactive service and of the interactive service as a whole: for example, whether the insurer has presented the terms of the application to the applicant, or which link the interactive service has currently reached. In this way, the progress of the interactive service can be effectively tracked and controlled, ensuring that the interactive service is carried out normally and completely.
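The order and completeness checks described above can be illustrated with a small sketch. The function `check_flow` and the step names are hypothetical; in practice the observed steps would be derived from the modality recognition results, which is outside the scope of this illustration.

```python
def check_flow(required_steps, observed_steps):
    """Return (in_order, completeness) for the steps observed so far.

    required_steps: step names in the prescribed order.
    observed_steps: step names detected so far, in time order.
    """
    if not required_steps:
        raise ValueError("required_steps must not be empty")
    # Keep the first occurrence of each required step, in observed order.
    first_seen = []
    for step in observed_steps:
        if step in required_steps and step not in first_seen:
            first_seen.append(step)
    # The same steps, arranged in the prescribed order.
    expected = [step for step in required_steps if step in first_seen]
    in_order = first_seen == expected
    completeness = len(first_seen) / len(required_steps)
    return in_order, completeness
```

For example, with the prescribed order identity verification, terms presentation, agreement signing, observing the signing before the terms presentation would be reported as out of order, while the completeness value indicates how many of the prescribed links have been reached.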

In addition, in the above various manners, if it is determined that the current person object is not a normal person object providing the current service, corresponding processing may be performed, including but not limited to performing an exception prompt and/or terminating the interactive service, so as to ensure compliance and validity of the interactive service.

In some scenarios, there are multiple person objects, for example multiple served objects such as multiple applicants, or multiple service objects such as multiple insurers. In this case, the identities of the persons need to be verified one by one, so as to prevent unrelated persons from participating in and interfering with the interactive service. Here, determining that the current person object is a normal person object providing the current service includes: if the plurality of current person objects are consistent with the standard person objects specified by the interactive service, and the plurality of current person objects are simultaneously present in a plurality of video frames of the video data, determining that the current person objects are normal person objects providing the current service. Further, the identities of the plurality of current person objects may also be marked in the plurality of video frames to clearly distinguish the respective persons. Optionally, when a person object is marked, information such as the role and name of the current person object may be displayed, for example: insurer Wang XX, insured person Zheng XX, insured person Zhang XX, etc.
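The multi-person condition above has two parts: the recognized persons are consistent with the standard persons, and they appear together in a plurality of video frames. A minimal sketch follows. The function name `all_participants_present`, the person identifiers, and the threshold `min_frames=2` are assumptions for illustration; the text does not specify how many frames are required.

```python
def all_participants_present(expected_ids, frames, min_frames=2):
    """Check the two conditions for multiple person objects.

    expected_ids: identifiers of the standard person objects.
    frames: one set of recognized person identifiers per video frame.
    min_frames: assumed minimum number of frames showing everyone together.
    """
    expected = set(expected_ids)
    recognized = set().union(*frames) if frames else set()
    # Condition 1: recognized persons are exactly the standard persons
    # (nobody missing, no unrelated person).
    if recognized != expected:
        return False
    # Condition 2: all of them appear simultaneously in enough frames.
    together = sum(1 for frame in frames if expected <= set(frame))
    return together >= min_frames
```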

Through the multiple modes, multi-modal recognition of the multimedia data of the service flow in the interactive service scene can be realized, and thus quality detection of the service flow is realized.

(III) The quality inspection node is a service content quality inspection node.

In this case, at least one of the following recognitions is performed on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, voice recognition, image content recognition and text recognition. Quality detection of the service content likewise involves both detecting the identity of the person object and detecting whether the service content is normative. Therefore, one or more of face recognition, voiceprint recognition and certificate recognition may be selected for the identity detection of the person object, together with one or more of voice recognition, image content recognition and text recognition for the content detection.

Based on the above, the modality identifications corresponding to the service content quality inspection nodes and the identification operation based on the modality identifications can adopt one or more of the following manners in combination.

The first method is as follows: extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; carrying out face recognition and image content recognition on a video frame image corresponding to the video data; and determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result. For example, in an insurance interaction service scene, the content related to an insurance process, the content related to insurance content and the like need to be displayed, and through face recognition and image content recognition, whether the service object displays the related service content to the served object can be quickly recognized, so that the service content is sufficiently transmitted and communicated.

The second method is as follows: extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; and determining, according to the face recognition result, that the current person object is a normal person object providing the current service, and determining, according to the voice recognition result, that the content currently presented by the current person object is the service content specified by the interactive service. This method suits interactive service scenes in which the relevant service content can be conveyed by voice: no image content recognition needs to be performed on the video frame images, which reduces the data processing burden while still effectively guaranteeing the transmission and communication of the service content.

The third method comprises the following steps: extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; carrying out face recognition, certificate recognition and image content recognition on a video frame image corresponding to the video data; and determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the content currently displayed by the current person object as service content specified by the interactive service according to the image content recognition result. In the method, the human face recognition and the certificate recognition are combined to ensure that the human object is the human object which accords with the actual interactive service specification and standard, and the image content recognition result is combined to ensure that the normal human object displays the service content which accords with the specification and standard, so that the safety and the accuracy of the service content quality detection are improved.

The fourth method is as follows: extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition and image content recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; and determining, according to the face recognition result and the voiceprint recognition result, that the current person object is a normal person object providing the current service, and determining, according to the image content recognition result, that the content currently displayed by the current person object is the service content specified by the interactive service. In this method, face recognition and voiceprint recognition are combined to ensure that the person object conforms to the actual interactive service specification and standard, and the image content recognition result is used to ensure that the normal person object displays service content conforming to the specification and standard, which improves the safety and accuracy of service content quality detection.

The fifth method is as follows: extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voiceprint recognition and voice recognition on the audio data; and determining, according to the face recognition result and the voiceprint recognition result, that the current person object is a normal person object providing the current service, and determining, according to the voice recognition result, that the content currently presented by the current person object is the service content specified by the interactive service. In this method, voice recognition is performed on the audio data, and the voice recognition result is used to determine that the normal person object presents service content meeting the specification and standard, which improves the safety and accuracy of service content quality detection.

In the above embodiment, the person object may be a service object, a served object, or both a service object and a served object. Moreover, the identity verification of the person object can be realized according to one or more of the combination of face recognition, voiceprint recognition and certificate recognition besides the above exemplary modes; the judgment of whether the service content meets the specification and standard of the current link can be realized according to one or more combinations of action recognition, voice recognition, image content recognition, whiteboard content recognition and text recognition, and the method and the device are all suitable for the scheme of the embodiment of the application.

In addition, the service content includes, but is not limited to, one or more of service document presentation and key information prompting. Based on this, multi-modal recognition for the service content quality inspection node can be further subdivided into multi-modal recognition for service document presentation, multi-modal recognition for key information prompting, and multi-modal recognition for service document signing. Multi-modal recognition for service document presentation checks whether the service document to be presented is fully presented or explained, e.g., whether the insurer has presented all or only a portion of the terms of the application to the applicant. Multi-modal recognition for key information prompting checks whether the key information in the service document to be presented has been emphasized, so as to remind the served object and/or the service object to pay attention to it: for example, whether the term or amount clauses in the insurance terms are repeated by voice several times or emphasized by voice, or whether key information displayed as text is emphasized by one or more operations such as font enlargement, bolding, or the use of an eye-catching color. Multi-modal recognition for service document signing checks whether the service document is signed correctly and in the prescribed manner by the signer, e.g., whether the applicant has signed at the specified location and whether the signature matches the handwriting of the reserved signature. In this way, the full presentation of and reminders about the content to be presented in the interactive service, as well as the effective signing of the document, can be ensured.
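As one illustration of the service document presentation check, the sketch below tests whether every required section title occurs in the text recognized from the presentation (for example by voice or text recognition). The function `presentation_coverage`, the section names, and the use of case-insensitive substring matching are all assumptions made for this example; a real system would likely need more robust matching.

```python
def presentation_coverage(required_sections, recognized_text):
    """Report which required sections are missing from the recognized text.

    Matching is a plain case-insensitive substring test (an assumption).
    """
    haystack = recognized_text.lower()
    missing = [s for s in required_sections if s.lower() not in haystack]
    return {"fully_presented": not missing, "missing": missing}
```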

Through the multiple modes, multi-modal recognition of the multimedia data of the service content in the interactive service scene can be realized, and thus quality detection of the service content can be realized.

Step S208: determining the quality detection result of the multimedia data acquired in real time according to the modality recognition result.

As described above, whether the quality detection of the corresponding quality inspection node passes can be determined according to the modality recognition result, and the quality detection result of the multimedia data acquired in real time is then determined on that basis, for example, that the current multimedia data passes the quality detection in full, or that the current multimedia data does not pass the quality detection.
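A minimal sketch of deriving the overall result from per-node results follows. The function `overall_result` is a hypothetical name, and the rule that the multimedia data passes only if every detected quality inspection node passes is an assumption for illustration.

```python
def overall_result(node_results):
    """Combine per-node pass/fail results into one quality detection result.

    node_results: mapping from quality inspection node name to True (passed)
    or False (failed). Assumed rule: all nodes must pass.
    """
    failed = [name for name, passed in node_results.items() if not passed]
    return {"passed": not failed, "failed_nodes": failed}
```

Returning the list of failed nodes alongside the overall flag makes it possible to show which part of the interactive service did not meet the specification.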

Step S210: displaying the quality detection result.

This step is optional; in practical applications, the quality detection result may also be used as intermediate data for other purposes. Displaying the quality detection result, however, lets the relevant personnel see the quality of the interactive service more intuitively and judge clearly whether the interactive service meets the specifications and standards.

In addition, in order to prevent sudden interference such as incoming calls, short message reminders and APP push notifications during audio and video recording of the interactive service, the flight mode of the mobile terminal needs to be turned on in some cases. After the audio and video recording is started, modality recognition and the quality inspection items based on it are carried out, such as the above-mentioned face recognition, certificate recognition, lip movement recognition, action recognition, voice recognition, image content recognition, whiteboard content recognition and text content recognition. Since the mobile terminal is in the flight mode, the conventional approach, in which the algorithm model is used through data transmission and interaction between the mobile terminal and the server, is not applicable. Thus, in a feasible manner, the quality detection of the interactive service can be implemented offline. For example, in the offline mode, a preset software development kit (SDK) is called through a calling interface to perform the interactive service quality detection method. For example, a mobile terminal SDK may be constructed and packaged as an App to provide quality detection capabilities for the various quality inspection nodes. In addition, a corresponding database can be provided on the mobile terminal for comparing and matching the corresponding information. In this way, quality detection of the interactive service can be performed in the flight mode without interrupting the detection process.

It should be noted that, the examples in this embodiment all use an insurance industry interaction scenario as an example, but are not limited thereto, and other scenarios such as a banking industry, a securities industry, a notary industry, and the like may be applicable. In addition, online education scenarios and meeting scenarios may also be applicable.

For example, in an online education scene, quality detection at the personnel object identity quality inspection node can judge whether the current teacher meets the regulations (for example, is qualified and is the teacher who should teach the current class); quality detection at the service flow quality inspection node can judge whether the current teacher interacts with the students as required, the degree of completion of the course teaching, and the like; and quality detection at the service content quality inspection node can judge whether the current teacher teaches the content that should be taught, whether the key content is repeated and emphasized, and the like.

For another example, in a conference scene, quality detection at the personnel object identity quality inspection node can judge whether all current conference participants are permitted participants, or determine which of the current participants are the presenter and the speakers; quality detection at the service flow quality inspection node can judge whether the conference proceeds according to a preset flow; and quality detection at the service content quality inspection node can judge whether the relevant conference topics or content are effectively displayed, whether the key parts of the conference content are highlighted, and the like.

In addition, in practical applications, voiceprint recognition and voice recognition can be performed on the audio data in the multimedia data acquired in real time, and text content corresponding to at least one person object can be generated from the voiceprint recognition result and the voice recognition result and then displayed. In this manner, the corresponding functions can be realized by processing only the audio data, so processing efficiency is high and the amount of data processed is small. For example, in a conference scenario, the speech content of each speaker can be generated in text form through voiceprint recognition and voice recognition. For example, Zhang San: xxxxxxxxxx; Li Si: yyyyyyyy; Wang Wu: zzzzzzzzzzzzzzzzzzzz. In this way, the voice content is effectively converted to text, and a corresponding record of what was said is generated.
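As a rough illustration of combining the two recognition results, the sketch below assumes that voiceprint recognition yields time spans labeled with a speaker and that voice recognition yields time spans labeled with text; the application does not specify these data shapes. The names `SpeakerTurn`, `Utterance` and `build_transcript` are hypothetical. Each recognized utterance is attributed to the speaker whose span overlaps it most.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeakerTurn:
    """Time span attributed to one speaker by voiceprint recognition."""
    start: float
    end: float
    speaker: str


@dataclass
class Utterance:
    """Time span of recognized text produced by voice recognition."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time spans."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def build_transcript(turns: List[SpeakerTurn], utterances: List[Utterance]) -> List[str]:
    """Label each utterance with the speaker whose turn overlaps it most."""
    lines = []
    for utt in utterances:
        best = max(
            turns,
            key=lambda t: overlap(t.start, t.end, utt.start, utt.end),
            default=None,
        )
        if best is None or overlap(best.start, best.end, utt.start, utt.end) == 0.0:
            speaker = "Unknown"  # no voiceprint span covers this utterance
        else:
            speaker = best.speaker
        lines.append(f"{speaker}: {utt.text}")
    return lines
```

Attributing by maximum overlap is one simple design choice; a real system might instead split an utterance at speaker boundaries.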

Further optionally, when the text content is displayed, whether the text content contains preset key information can be determined; if so, the key information is highlighted in the text content. The manner of highlighting can be set by those skilled in the art according to actual requirements, including but not limited to enlarging the font, bolding, italicizing, or marking with an eye-catching color. In this way, the participants in the interactive service are led to notice and understand the key information, which ensures that the key information is effectively conveyed and communicated.
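The key information check and highlighting can be sketched as follows, assuming the preset key information is a list of literal phrases and that highlighting is expressed by wrapping a phrase in marker strings. The function name `highlight` and the default `**` markers are illustrative only; an actual display layer would map the markers to bold text, color, or a larger font.

```python
import re
from typing import Iterable


def highlight(text: str, key_terms: Iterable[str],
              open_mark: str = "**", close_mark: str = "**") -> str:
    """Wrap every occurrence of a preset key term in highlight markers."""
    # Longer terms go first so that a long phrase is preferred over a
    # shorter term that begins at the same position.
    terms = sorted({t for t in key_terms if t}, key=len, reverse=True)
    if not terms:
        return text  # no key information configured: text is unchanged
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)
```

For instance, with the key terms `waiting period` and `90 days`, the sentence `The waiting period is 90 days.` becomes `The **waiting period** is **90 days**.`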

According to this embodiment, multimedia data of an interactive service is recorded for an interactive service scenario; some or all of the person object identity quality inspection nodes, service flow quality inspection nodes and service content quality inspection nodes involved in the multimedia data are detected in real time; the corresponding single-mode or multi-mode recognition is determined based on the node type; the content corresponding to each node, that is, the single-mode or multi-mode recognition result, is recognized from the multimedia data corresponding to the quality inspection node; and the quality detection result of the multimedia data acquired in real time is determined based on that recognition result. Because different types of quality inspection nodes involve different detection targets, different modal recognition modes can be adopted based on those targets to obtain the detection results, which ensures effective and targeted quality detection of the interactive service and improves its overall detection quality.

The interactive service quality detection method of this embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: a server, a mobile terminal (such as a mobile phone or a PAD), a PC, and the like.

Example three

Referring to fig. 3, a block diagram of an interactive service quality detection apparatus according to a third embodiment of the present application is shown.

The interactive service quality detection device of the embodiment comprises: an obtaining module 302, configured to obtain, in real time, multimedia data recorded for an interactive service, where the multimedia data includes: audio data and video data; a detection module 304, configured to perform quality inspection node detection in real time according to the obtained multimedia data, where the quality inspection node includes at least one of the following types: the personnel object identity quality inspection node, the service flow quality inspection node and the service content quality inspection node; the identification module 306 is configured to perform single-mode identification or multi-mode identification matched with the type of the quality inspection node on the multimedia data corresponding to the quality inspection node; the determining module 308 is configured to determine a quality detection result of the multimedia data acquired in real time according to the modality identification result.
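The cooperation of the four modules can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the apparatus itself: the `Chunk` type, its `node_hint` field (a stand-in for real quality inspection node detection), the `Recognizer` callables, and the rule that a node passes only when every modal result passes are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class NodeType(Enum):
    IDENTITY = "person object identity"
    SERVICE_FLOW = "service flow"
    SERVICE_CONTENT = "service content"


@dataclass
class Chunk:
    """One real-time slice of recorded multimedia data (hypothetical shape)."""
    audio: bytes
    video: bytes
    node_hint: Optional[NodeType] = None  # stand-in for real node detection


# A recognizer returns one pass/fail flag per recognition it performed.
Recognizer = Callable[[Chunk], Dict[str, bool]]


def detect_node(chunk: Chunk) -> Optional[NodeType]:
    """Role of detection module 304: decide whether the chunk hits a node."""
    return chunk.node_hint


def run_quality_inspection(
    chunks: Iterable[Chunk],
    recognizers: Dict[NodeType, Recognizer],
) -> List[Tuple[NodeType, bool]]:
    results = []
    for chunk in chunks:                       # role of obtaining module 302
        node = detect_node(chunk)              # role of detection module 304
        if node is None:
            continue                           # no quality inspection node here
        modal_results = recognizers[node](chunk)   # role of identification module 306
        # Role of determining module 308: assumed rule, all modalities must pass.
        passed = bool(modal_results) and all(modal_results.values())
        results.append((node, passed))
    return results
```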

Optionally, the identifying module 306 is configured to, if the quality inspection node is a person object identity quality inspection node, perform at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition; or, if the quality inspection node is a service flow quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition; or, if the quality inspection node is a service content quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, voice recognition and image content recognition.
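The correspondence between node types and recognitions listed above can be written down as a lookup table. In the sketch below the string keys and the function `recognition_mode` are hypothetical, and it is assumed that "single-mode" means exactly one recognition is selected while "multi-mode" means two or more.

```python
from typing import Dict, FrozenSet, Iterable

# Recognitions permitted for each type of quality inspection node,
# transcribed from the paragraph above.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "identity": frozenset({"face", "voiceprint", "certificate"}),
    "service_flow": frozenset(
        {"face", "voiceprint", "certificate", "lip_movement", "speech", "action"}
    ),
    "service_content": frozenset(
        {"face", "voiceprint", "certificate", "speech", "image_content"}
    ),
}


def recognition_mode(node_type: str, selected: Iterable[str]) -> str:
    """Validate a selection of recognitions and classify it."""
    chosen = set(selected)
    if not chosen:
        raise ValueError("at least one recognition must be selected")
    extra = chosen - ALLOWED[node_type]
    if extra:
        raise ValueError(f"{sorted(extra)} not permitted for node type {node_type!r}")
    return "single-mode" if len(chosen) == 1 else "multi-mode"
```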

Optionally, if the quality inspection node is a person object identity quality inspection node, the identifying module 306 performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition, includes:

extracting video data corresponding to the person object identity quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data; comparing the face recognition result with the face data of the standard person object of the interactive service in a prestored face database, and determining whether the person object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result;

or,

extracting video data corresponding to the person object identity quality inspection node from the multimedia data acquired in real time; performing face recognition and certificate recognition on a video frame image corresponding to the video data; comparing the face recognition result with the face data of the standard person object of the interactive service in a prestored face database, and comparing the certificate recognition result with the certificate data of the standard person object of the interactive service in a prestored certificate database; determining whether the person object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result and the certificate comparison result;

or,

extracting video data and audio data corresponding to the person object identity quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the face recognition result with the face data of the standard person object of the interactive service in a prestored face database, and comparing the voiceprint recognition result with the voiceprint data of the standard person object of the interactive service in a prestored voiceprint database; determining whether the person object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result and the voiceprint comparison result;

or,

extracting video data and audio data corresponding to the person object identity quality inspection node from the multimedia data acquired in real time; performing face recognition and certificate recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the face recognition result with the face data of the standard person object of the interactive service in a prestored face database, comparing the certificate recognition result with the certificate data of the standard person object of the interactive service in a prestored certificate database, and comparing the voiceprint recognition result with the voiceprint data of the standard person object of the interactive service in a prestored voiceprint database; and determining whether the person object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result, the certificate comparison result and the voiceprint comparison result.
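The comparison step shared by these alternatives can be sketched as follows. The application does not say how recognition results are compared with the prestored data; this sketch assumes that face and voiceprint results are feature vectors compared by cosine similarity against a threshold, that a certificate result is a string compared for equality, and that every modality supplied must match. The function names and the default threshold of 0.8 are hypothetical.

```python
import math
from typing import Dict, Sequence, Union

Feature = Union[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_normal_participant(
    observed: Dict[str, Feature],
    reference: Dict[str, Feature],
    threshold: float = 0.8,
) -> bool:
    """Return True only if every observed modality matches the stored reference.

    `observed` holds the recognition results for this node; `reference`
    holds the prestored data of the standard person object.
    """
    if not observed:
        return False  # nothing was recognized, so identity is not confirmed
    for modality, value in observed.items():
        expected = reference.get(modality)
        if expected is None:
            return False  # no prestored data to compare against
        if isinstance(value, str):
            if value != expected:  # e.g. certificate number mismatch
                return False
        elif cosine(value, expected) < threshold:
            return False  # face or voiceprint too dissimilar
    return True
```

Passing only `face`, `face` plus `certificate`, `face` plus `voiceprint`, or all three in `observed` corresponds to the four alternatives above.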

Optionally, if the quality inspection node is a service flow quality inspection node, the identifying module 306 performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition, includes:

extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the action recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the action recognition result;

or,

extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, certificate recognition and action recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the action recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and lip movement recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the voice recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and action recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; and determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the action recognition result and the voice recognition result.
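How the action, lip movement or voice recognition results are judged against the specified service is not detailed here. One possible reading, shown below purely as an assumption, is that recognition yields a sequence of event labels and that the service counts as the specified one when the required steps appear in the required order, with unrelated events allowed in between. The function name and the event labels are hypothetical.

```python
from typing import Iterable, Sequence


def follows_flow(observed_events: Iterable[str], required_steps: Sequence[str]) -> bool:
    """Check that the required steps occur in order within the observed events.

    The iterator is consumed as each step is searched for, so a later step
    can only be matched after the position where the earlier step was found.
    """
    remaining = iter(observed_events)
    return all(step in remaining for step in required_steps)
```

For example, the events `greet, show_certificate, cough, read_terms, sign` satisfy the required steps `show_certificate, read_terms, sign`, whereas the same steps observed in a different order do not.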

Optionally, the interactive service quality detection apparatus of this embodiment further includes: an exception module 310, configured to perform exception prompting and/or suspend the interactive service if it is determined that the current person object is not a normal person object providing the current service.
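The "and/or" choice between prompting and suspending can be sketched with a flag-style policy. The `Response` flags, the `Session` class and the function `handle_identity_result` are hypothetical names introduced for illustration; the prompt text is likewise an assumption.

```python
from enum import Flag, auto
from typing import List


class Response(Flag):
    NONE = 0
    PROMPT = auto()
    SUSPEND = auto()


class Session:
    """Hypothetical state of one interactive service session."""

    def __init__(self) -> None:
        self.suspended = False
        self.prompts: List[str] = []


def handle_identity_result(
    session: Session,
    is_normal: bool,
    policy: Response = Response.PROMPT | Response.SUSPEND,
) -> Response:
    """Apply the configured exception policy when the person check fails."""
    if is_normal:
        return Response.NONE  # nothing to do for a normal person object
    if Response.PROMPT in policy:
        session.prompts.append("Current person is not the expected service provider.")
    if Response.SUSPEND in policy:
        session.suspended = True
    return policy
```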

Optionally, the current person object includes a plurality of person objects, and the identifying module 306 determining that the current person object is a normal person object providing the current service includes: if the plurality of current person objects are consistent with the standard person objects specified by the interactive service and the plurality of current person objects are simultaneously present in a plurality of video frames of the video data, determining that the current person objects are normal person objects providing the current service.
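The simultaneous-presence condition can be sketched as a per-frame set check. The sketch assumes that recognition has already produced, for each video frame, the set of person identifiers found in it, and that "a plurality of video frames" means at least two frames; both the data shape and the `min_frames` parameter are assumptions.

```python
from typing import Iterable, List, Set


def all_present_together(
    frames: List[Set[str]],
    required: Iterable[str],
    min_frames: int = 2,
) -> bool:
    """True if every required person appears in the same frame, in enough frames."""
    needed = set(required)
    if not needed:
        return False  # no standard person objects were specified
    # Count frames in which all required persons are visible at once.
    together = sum(1 for ids in frames if needed <= set(ids))
    return together >= min_frames
```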

Optionally, the identifying module 306 is further configured to identify a plurality of current person objects in the plurality of video frames.

Optionally, if the quality inspection node is a service content quality inspection node, the identifying module 306 performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, voice recognition and image content recognition, includes:

extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition and image content recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the voice recognition result;

or,

extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition, certificate recognition and image content recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition and image content recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voiceprint recognition and voice recognition on the audio data; and determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the voice recognition result.

Optionally, the service content includes a service document presentation and/or a key information prompt.
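A service content check of this kind can be sketched as a search for required phrases in recognized text. The sketch assumes that image content recognition or voice recognition has already produced plain text (for example, the title of a presented document or a spoken key information prompt) and that the specified service content is given as a list of phrases; the function `missing_content` and the sample phrases are hypothetical.

```python
from typing import List, Sequence


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores layout."""
    return " ".join(text.lower().split())


def missing_content(recognized_text: str, required_phrases: Sequence[str]) -> List[str]:
    """Return the required phrases that do not occur in the recognized text.

    An empty result means all specified service content was found.
    """
    haystack = _normalize(recognized_text)
    return [p for p in required_phrases if _normalize(p) not in haystack]
```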

Optionally, the interactive service quality detection apparatus of this embodiment further includes: the offline module 312 is configured to call, in an offline mode, a preset software development kit through a call interface to perform the interactive service quality detection method.

Optionally, the identifying module 306 is further configured to perform voiceprint recognition and voice recognition on the audio data in the multimedia data acquired in real time; and generating text contents corresponding to at least one person object according to the voiceprint recognition result and the voice recognition result, and displaying the text contents.

Optionally, the identifying module 306 displaying the text content includes: determining whether the text content contains preset key information; and if so, highlighting the key information in the text content.

The interactive service quality detection apparatus of this embodiment is used to implement the corresponding interactive service quality detection method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the interactive service quality detection apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.

Example four

Referring to fig. 4, a schematic structural diagram of an electronic device according to a fourth embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.

As shown in fig. 4, the electronic device may include: a processor 402, a communication interface 404, a memory 406, a communication bus 408, an audio/video capture device 410, and a display 412.

Wherein:

the processor 402, the communication interface 404, the audio/video capture device 410, the display 412, and the memory 406 communicate with each other via the communication bus 408.

A communication interface 404 for communicating with other electronic devices or servers.

The audio/video capture device 410 is configured to record and capture audio data and video data of the interactive service in real time to form the multimedia data. For example, video data may be captured using an image capture device such as a camera, and audio data may be captured using an audio capture device such as a microphone.

The processor 402 is used for acquiring multimedia data recorded and acquired by the audio and video acquisition equipment in real time; performing quality inspection node detection in real time according to the acquired multimedia data, wherein the quality inspection node comprises at least one of the following components: the personnel object identity quality inspection node, the service flow quality inspection node and the service content quality inspection node; performing single-mode recognition or multi-mode recognition matched with the type of the quality inspection node on the multimedia data corresponding to the quality inspection node; and determining the quality detection result of the multimedia data acquired in real time according to the modal identification result.

And the display 412 is used for displaying the multimedia data recorded and collected by the audio/video collection device 410 in real time and/or the quality detection result output by the processor 402.

A memory 406 for storing the multimedia data, the quality detection result, and intermediate data output by the processor in determining the quality detection result.

The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 406 is also used for storing data and programs. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

For the specific implementation of each step executed by the processor 402, reference may be made to the corresponding steps and the descriptions of the corresponding units in the foregoing interactive service quality detection method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.

It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.

The above-described methods according to embodiments of the present application may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the interactive service quality detection methods described herein. Further, when a general-purpose computer accesses code for implementing the interactive service quality detection methods shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing those methods.

Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.

The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the relevant art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims

1. An interactive service quality detection method comprises the following steps:

acquiring multimedia data recorded for interactive services in real time, wherein the multimedia data comprises: audio data and video data;

performing quality inspection node detection in real time according to the acquired multimedia data, wherein the quality inspection node comprises at least one of the following types: the personnel object identity quality inspection node, the service flow quality inspection node and the service content quality inspection node;

performing single-mode recognition or multi-mode recognition matched with the type of the quality inspection node on the multimedia data corresponding to the quality inspection node;

and determining the quality detection result of the multimedia data acquired in real time according to the modal identification result.

2. The method of claim 1, wherein performing single-mode recognition or multi-mode recognition matching the type of the quality inspection node on the multimedia data corresponding to the quality inspection node comprises:

if the quality inspection node is a person object identity quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition;

or,

if the quality inspection node is a service flow quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition;

or,

if the quality inspection node is a service content quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, voice recognition and image content recognition.

3. The method of claim 2, wherein, if the quality inspection node is a person object identity quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition and certificate recognition, comprises:

extracting video data corresponding to the person object identity quality inspection node from the multimedia data acquired in real time; carrying out face recognition on a video frame image corresponding to the video data; comparing the face recognition result with face data of a standard person object of the interactive service in a prestored face database, and determining whether the person object in the video frame image is a normal participating object specified by the interactive service according to the face comparison result;

or,

extracting video data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; carrying out face recognition and certificate recognition on the video frame image corresponding to the video data; comparing the face recognition result with the face data of the standard personnel object of the interactive service in a prestored face database, and comparing the certificate recognition result with the certificate data of the standard personnel object of the interactive service in a prestored certificate database; determining whether the personnel object in the video frame image is a normal participating object specified by the interactive service or not according to the face comparison result and the certificate comparison result;

or,

extracting video data and audio data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the face recognition result with face data of the standard personnel object of the interactive service in a prestored face database, and comparing the voiceprint recognition result with voiceprint data of the standard personnel object of the interactive service in a prestored voiceprint database; determining whether the personnel object in the video frame image is a normal participating object specified by the interactive service or not according to the face comparison result and the voiceprint comparison result;

or,

extracting video data and audio data corresponding to the personnel object identity quality inspection node from the multimedia data acquired in real time; performing face recognition and certificate recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; comparing the face recognition result with the face data of the standard personnel object of the interactive service in a prestored face database, comparing the certificate recognition result with the certificate data of the standard personnel object of the interactive service in a prestored certificate database, and comparing the voiceprint recognition result with the voiceprint data of the standard personnel object of the interactive service in a prestored voiceprint database; and determining whether the personnel object in the video frame image is a normal participating object specified by the interactive service or not according to the face comparison result, the certificate comparison result and the voiceprint comparison result.

4. The method of claim 2, wherein, if the quality inspection node is a service flow quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face recognition, voiceprint recognition, certificate recognition, lip movement recognition, voice recognition and action recognition, comprises:

extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; carrying out face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the action recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition, lip movement recognition and action recognition on a video frame image corresponding to the video data, and performing voiceprint recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the action recognition result;

or,

extracting video data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; carrying out face recognition, certificate recognition and action recognition on the video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the action recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and lip movement recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; determining the current person object as a normal person object providing current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the lip movement recognition result and the voice recognition result;

or,

extracting video data and audio data corresponding to the service flow quality inspection node from the multimedia data acquired in real time; performing face recognition and action recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; and determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the current service provided by the current person object as the service specified by the interactive service according to the action recognition result and the voice recognition result.
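By way of non-limiting illustration, the five alternatives recited above can be written as a small table that records, for each alternative, which recognitions confirm the person object and which confirm the service. The names `FLOW_ALTERNATIVES`, `needs_audio` and `check_flow_node`, and the representation of each recognition result as a boolean, are assumptions made for this sketch only.

```python
# Each entry: (recognitions confirming the person, recognitions confirming the service).
# The order follows the five alternatives of the claim.
FLOW_ALTERNATIVES = [
    ({"face"}, {"lip", "action"}),
    ({"face", "voiceprint"}, {"lip", "action"}),
    ({"face", "certificate"}, {"action"}),
    ({"face"}, {"lip", "speech"}),
    ({"face"}, {"action", "speech"}),
]

# Recognitions that operate on audio data rather than on video frame images.
AUDIO_MODALITIES = {"voiceprint", "speech"}


def needs_audio(alternative):
    """Return True when the alternative must extract audio data as well as video."""
    identity, service = alternative
    return bool((identity | service) & AUDIO_MODALITIES)


def check_flow_node(alternative, results):
    """Evaluate one alternative.

    results maps a recognition name to True/False (hypothetical: whether that
    recognition result matched what the interactive service specifies).
    Returns (person_ok, service_ok); a missing result counts as a failure.
    """
    identity, service = alternative
    person_ok = all(results.get(m, False) for m in identity)
    service_ok = all(results.get(m, False) for m in service)
    return person_ok, service_ok
```

In this sketch, the first and third alternatives need only video data, which matches the extraction step recited for each of them.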

5. The method of claim 4, wherein the method further comprises:

if the current person object is determined not to be a normal person object providing the current service, issuing an abnormality prompt and/or stopping the interactive service.

6. The method of claim 4, wherein the current person object comprises a plurality of person objects, and the determining that the current person object is a normal person object providing current service comprises:

if the plurality of current person objects are consistent with the standard person objects specified by the interactive service, and the plurality of current person objects are simultaneously present in a plurality of video frames of the video data, determining that the current person objects are normal person objects providing the current service.
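By way of non-limiting illustration, the simultaneous-presence condition could be checked as in the following sketch. The function name `all_present_together` and the representation of each video frame as a set of recognized identities are assumptions made for the example.

```python
def all_present_together(frames, standard_persons):
    """Return True when every standard person appears in every given frame.

    frames: list of sets of identities recognized in each video frame
            (hypothetical output of a face recognizer).
    Assumption: "simultaneously present" is read as co-occurrence in each of
    the sampled frames; extra people in a frame do not cause failure here.
    """
    required = set(standard_persons)
    if not frames or not required:
        return False
    return all(required <= set(frame) for frame in frames)
```

Under this reading, a frame in which one of the required persons has left the picture is enough to fail the check for the sampled span.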

7. The method of claim 6, wherein the method further comprises: marking the identities of the plurality of current person objects in the plurality of video frames.

8. The method of claim 2, wherein, if the quality inspection node is a service content quality inspection node, performing at least one of the following identifications on the multimedia data corresponding to the quality inspection node: face identification, voiceprint identification, certificate identification, speech recognition, image content identification, comprises:

extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; carrying out face recognition and image content recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing current service according to the face recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voice recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the voice recognition result;

or,

extracting video data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition, certificate recognition and image content recognition on a video frame image corresponding to the video data; determining the current person object as a normal person object providing the current service according to the face recognition result and the certificate recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; carrying out face recognition and image content recognition on a video frame image corresponding to the video data, and carrying out voiceprint recognition on the audio data; determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the image content recognition result;

or,

extracting video data and audio data corresponding to the service content quality inspection node from the multimedia data acquired in real time; performing face recognition on a video frame image corresponding to the video data, and performing voiceprint recognition and voice recognition on the audio data; and determining the current person object as a normal person object providing the current service according to the face recognition result and the voiceprint recognition result, and determining the content currently displayed by the current person object as the service content specified by the interactive service according to the voice recognition result.

9. The method of claim 8, wherein the service content comprises a service document presentation and/or a key information cue.

10. The method of any of claims 1-9, wherein the method further comprises:

in an off-line mode, calling a preset software development kit through a calling interface to carry out the interactive service quality detection method.

11. The method of any of claims 1-9, wherein the method further comprises:

performing voiceprint recognition and voice recognition on audio data in the multimedia data acquired in real time;

and generating text content corresponding to at least one person object according to the voiceprint recognition result and the voice recognition result, and displaying the text content.

12. The method of claim 11, wherein said presenting said textual content comprises:

judging whether the text content contains preset key information or not;

if so, highlighting the key information in the text content.
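By way of non-limiting illustration, the generation of per-person text content and the highlighting of preset key information could be sketched as follows. The names `build_transcript` and `highlight`, the `(speaker, text)` segment format, and the use of a `**` marker for highlighting are assumptions made for the example; the speaker label stands in for a voiceprint recognition result and the text for a voice recognition result.

```python
import re


def build_transcript(segments):
    """Turn (speaker, text) pairs into display lines, one line per utterance.

    speaker: hypothetical label from voiceprint recognition.
    text:    hypothetical text from voice recognition.
    """
    return [f"{speaker}: {text}" for speaker, text in segments]


def highlight(text, keywords, marker="**"):
    """Wrap each occurrence of a preset keyword in `marker`.

    Text that contains none of the keywords is returned unchanged.
    """
    if not keywords:
        return text
    # Longer keywords first, so a short keyword cannot split a longer one.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)
```

For example, calling `highlight` on each line returned by `build_transcript` yields the text content to be displayed, with the key information marked wherever it occurs.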

13. An electronic device, comprising: audio and video acquisition equipment, a display, a processor, a memory, a communication interface and a communication bus, wherein the audio and video acquisition equipment, the display, the processor, the memory and the communication interface communicate with one another through the communication bus;

the audio and video acquisition equipment is used for recording and collecting, in real time, the audio data and the video data of the interactive service to form multimedia data;

the processor is used for acquiring, in real time, the multimedia data recorded and collected by the audio and video acquisition equipment; performing quality inspection node detection in real time according to the acquired multimedia data, wherein the quality inspection node comprises at least one of the following types: a personnel object identity quality inspection node, a service flow quality inspection node and a service content quality inspection node; performing single-mode recognition or multi-mode recognition matched with the type of the quality inspection node on the multimedia data corresponding to the quality inspection node; and determining a quality detection result of the multimedia data acquired in real time according to the modal recognition result;

the display is used for displaying the multimedia data recorded and collected by the audio and video collecting equipment in real time and/or the quality detection result output by the processor;

the memory is used for storing the multimedia data, the quality detection result and intermediate data output by the processor in the process of determining the quality detection result.
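By way of non-limiting illustration, the processing performed by the processor could be organized as a dispatch on the type of the detected quality inspection node, as in the following sketch. The names `NodeType`, `MODALITIES` and `inspect`, the choice of one example recognition set per node type, and the pass rule (every recognition must succeed) are assumptions made for the example.

```python
from enum import Enum


class NodeType(Enum):
    IDENTITY = "personnel object identity"
    FLOW = "service flow"
    CONTENT = "service content"


# Assumption: one example recognition set per node type, each taken from one
# of the alternatives recited in the method claims.
MODALITIES = {
    NodeType.IDENTITY: ("face", "certificate", "voiceprint"),
    NodeType.FLOW: ("face", "lip", "action"),
    NodeType.CONTENT: ("face", "image_content"),
}


def inspect(node_type, recognizers, data):
    """Run the recognitions matched to `node_type` and summarize the outcome.

    recognizers: dict mapping a recognition name to a callable that takes the
                 multimedia data and returns True/False (hypothetical).
    Assumption: the node passes only when every recognition returns True.
    """
    results = {m: recognizers[m](data) for m in MODALITIES[node_type]}
    return {
        "node": node_type.value,
        "passed": all(results.values()),
        "results": results,
    }
```

The returned dictionary corresponds to the kind of quality detection result and intermediate data that the display and the memory of the device could receive.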

14. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the interactive service quality detection method as claimed in any one of claims 1 to 12.