WO2021159745A1

WO2021159745A1 - Data processing method and apparatus, device, and medium

Info

Publication number: WO2021159745A1
Application number: PCT/CN2020/124730
Authority: WO
Inventors: 王锁平; 周登宇; 张伟坤
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-09-08
Filing date: 2020-10-29
Publication date: 2021-08-19
Also published as: CN112037796A

Abstract

Disclosed are a data processing method and apparatus, a device, and a medium, which relate to speech processing technology in artificial intelligence, and are applicable to a blockchain network. The method comprises: acquiring, from a terminal, first multimedia data of a first service; identifying the first multimedia data to obtain first service attribute information; determining, from a shared identification engine set, an identification engine matched with the first service attribute information to serve as a target identification engine; outputting prompt information of processing the first service; acquiring, from the terminal, second multimedia data sent with regard to the prompt information; and sending the second multimedia data to a first service platform, so that the first service platform uses the target identification engine to identify the second multimedia data, and processes the first service. By means of the embodiments of the present application, resource waste can be avoided, and the cost is reduced.

Description

Data processing method, device, equipment and medium

This application claims priority to Chinese patent application No. 202010918464.6, filed with the Chinese Patent Office on September 8, 2020 and entitled "a data processing method, device, equipment and medium", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to voice processing technology in artificial intelligence, and in particular to a data processing method, device, equipment, and medium.

Background art

At present, video robot calls are used in many industries, such as business consultation and business processing in the service industry. Video robot calls have gradually replaced manual labor, and can achieve business processing anytime, anywhere. The inventor realizes that when a user calls a video robot, different recognition engines are usually connected according to the different services that the user needs to handle, and the recognition engines are used to process the services. Since different services need to be processed by different servers, video robots need to carry more service attributes to connect with different recognition engines. Each service requires custom development of different recognition engines, which wastes a lot of resources and is costly.

Technical problem

The embodiments of the present application provide a data processing method, device, equipment, and medium, which can avoid waste of resources and reduce costs.

Technical solutions

In one aspect, the embodiments of the present application provide a data processing method, including: acquiring first multimedia data about a first service from a terminal; identifying the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; determining, from a shared recognition engine set, the recognition engine that matches the first service attribute information as the target recognition engine; outputting prompt information about processing the first service; acquiring, from the terminal, second multimedia data sent in response to the prompt information; and sending the second multimedia data to a first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

In one aspect, the embodiments of the present application provide a data processing device, including: a first acquisition module, configured to acquire first multimedia data about a first service from a terminal; a module configured to identify the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; an engine determination module, configured to determine, from a shared recognition engine set, the recognition engine that matches the first service attribute information as the target recognition engine; an information output module, configured to output prompt information about processing the first service; a second acquisition module, configured to acquire, from the terminal, second multimedia data sent in response to the prompt information; and a service processing module, configured to send the second multimedia data to a first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

In one aspect, the embodiments of the present application provide a computer device, including: a processor, a memory, and a network interface; the processor is connected to the memory and the network interface, where the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is configured to call the computer program to execute the following method: acquiring first multimedia data about a first service from a terminal; identifying the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; determining, from a shared recognition engine set, the recognition engine that matches the first service attribute information as the target recognition engine; outputting prompt information about processing the first service; acquiring, from the terminal, second multimedia data sent in response to the prompt information; and sending the second multimedia data to a first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

In one aspect, the embodiments of the present application provide a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to perform the following method: acquiring first multimedia data about a first service from a terminal; identifying the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; determining, from a shared recognition engine set, the recognition engine that matches the first service attribute information as the target recognition engine; outputting prompt information about processing the first service; acquiring, from the terminal, second multimedia data sent in response to the prompt information; and sending the second multimedia data to a first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

Beneficial effect

The embodiments of the present application can avoid waste of resources, save investment in hardware resources, and thereby save costs. Further, it is possible to realize the separation of the two processes of determining the recognition engine and processing the business, and realize the rapid connection to the business processing platform for business processing.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, without creative work, other drawings can be obtained from these drawings.

FIG. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present application.

FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of the composition structure of a data processing device provided by an embodiment of the present application.

FIG. 4 is a schematic diagram of the composition structure of a computer device provided by an embodiment of the present application.

Embodiments of the present invention

The following will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The technical solution of this application can be applied to the fields of artificial intelligence, blockchain and/or big data technology to realize business processing.

Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

Among them, the key technologies of speech processing technology (Speech Technology) include automatic speech recognition (ASR), speech synthesis (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future development direction of human-computer interaction, and voice has become one of the most promising human-computer interaction methods.

This application relates to the voice processing technology in artificial intelligence. Voice processing technology is used to recognize the first multimedia data about the first service to obtain the first service attribute information; the target recognition engine matching the first service attribute information is determined from the shared recognition engine set; and the second multimedia data is sent to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service. Since different services in this application can share the recognition engines in the set, there is no need to customize recognition engines for different services, which can avoid waste of resources, save investment in hardware resources, and thereby save costs. This application can be applied to fields such as smart government affairs and smart education, and is conducive to promoting the construction of smart cities.

The technical solution of the present application is applicable to scenarios in which multimedia data sent by a terminal is recognized so that the corresponding service can be processed according to the service attribute information in the multimedia data. For example, the technical solution is applicable to scenarios such as remote face-to-face audits, video return visits, and remote account opening. The first multimedia data about the first service is acquired from the terminal and identified to obtain the first service attribute information; the target recognition engine that matches the attribute information is determined; and prompt information about processing the first service is output, so that the terminal sends the second multimedia data according to the prompt information. The second multimedia data is sent to the service platform corresponding to the first service, so that the service platform uses the target recognition engine to recognize the second multimedia data and process the first service. By recognizing the multimedia data that describes the service, the service attribute information in the multimedia data can be determined, so that the corresponding service can be handled according to the service attribute information.

Please refer to FIG. 1, which is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method can be applied to a computer device. The computer device may be, for example, a mobile Internet device (MID), a POS (point of sale) machine, or a wearable device (such as a smart watch or smart bracelet); it may also be an independent server, a server cluster composed of several servers, or a cloud computing center. As shown in FIG. 1, the method includes the following steps.

S101: Acquire first multimedia data about a first service from a terminal.

Here, the terminal may refer to a terminal used by a user for service processing. Terminals can include mobile phones, tablets, laptops, handheld computers, smart speakers, mobile Internet devices (MID, mobile internet device), POS (Point Of Sales, point of sale) machines, wearable devices (such as smart watches, smart bracelets, etc.), etc. The first business may include the business that the user needs to handle, such as purchasing XX property insurance, bank loans, bank card processing, credit card processing, and so on. Alternatively, the first service may also include services required by the user, such as bank card balance inquiry, credit card limit inquiry, and so on. The first multimedia data may include voice data types, video data types, and so on.

In specific implementation, the user can send a call request through the terminal; the computer device obtains the call request, establishes a call connection with the terminal according to the call request, and obtains the first multimedia data about the first service from the terminal through the call connection. Here, the call connection may include a video connection, a voice connection, and so on. The video connection is used to obtain the video data sent by the terminal connected to the computer device, and the voice connection is used to obtain the voice data sent by the terminal connected to the computer device.

S102: Identify the first multimedia data to obtain first service attribute information.

Here, the first multimedia data includes keywords corresponding to the first service. The computer device can recognize the first multimedia data, identify the keywords corresponding to the first service that it contains, and use those keywords as the first service attribute information. For example, if the first multimedia data is "I want to apply for a credit card", the recognized keywords include "apply" and "credit card", and the first service attribute information includes "apply" and "credit card".

S103: Determine a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine.

Here, a recognition engine is used to recognize multimedia data. The shared recognition engine set includes at least one recognition engine, and may include recognition engines that recognize the multimedia data corresponding to multiple services. One service can correspond to multiple recognition engines, for example a voice data recognition engine, a text data recognition engine, a facial data recognition engine, and so on. One recognition engine can also recognize the data of multiple services. The recognition engine matching the first service attribute information is the recognition engine that can recognize the multimedia data corresponding to the first service. For example, if the first service attribute information is "credit card processing", the first service can be "credit card processing", and the recognition engine that matches the first service attribute information is the recognition engine that can recognize the multimedia data corresponding to "credit card processing". In other words, that recognition engine can recognize the text information, voice data, and so on that users provide when applying for a credit card. For example, when the user needs to handle the first service, the user sends the voice data and text data required for handling the first service to the computer device through the terminal, and the target recognition engine is a recognition engine that can recognize that voice data and text data.
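
As an illustrative sketch only, the matching in step S103 can be pictured as a lookup in a registry in which each engine declares the services and data types it can recognize. The names `RecognitionEngine`, `SHARED_ENGINES`, and `match_engines`, as well as the sample entries, are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionEngine:
    """Hypothetical description of one engine in the shared set."""
    name: str
    services: frozenset    # services whose multimedia data the engine can recognize
    data_types: frozenset  # e.g. "voice", "text", "face"


# Illustrative shared recognition engine set; one engine may serve several services.
SHARED_ENGINES = [
    RecognitionEngine("voice-engine",
                      frozenset({"credit card processing", "balance inquiry"}),
                      frozenset({"voice"})),
    RecognitionEngine("text-engine",
                      frozenset({"credit card processing"}),
                      frozenset({"text"})),
    RecognitionEngine("face-engine",
                      frozenset({"bank loan"}),
                      frozenset({"face"})),
]


def match_engines(service, required_types, engines=SHARED_ENGINES):
    """Return the engines that handle `service` and together cover `required_types`."""
    required = set(required_types)
    chosen = [e for e in engines if service in e.services and e.data_types & required]
    covered = set()
    for engine in chosen:
        covered |= engine.data_types
    if not required <= covered:
        missing = sorted(required - covered)
        raise LookupError(f"no shared engine covers {missing} for {service!r}")
    return chosen


targets = match_engines("credit card processing", {"voice", "text"})
print([e.name for e in targets])
```

Because the registry is keyed by what each engine can recognize rather than by a single service, adding a new service under this assumption only requires listing it on existing engines, which mirrors the sharing described above.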

Optionally, the first service attribute information may include at least one of the service level of the first service or the business income of the first service. The recognition engine corresponding to the first service attribute information can then be determined according to the first service attribute information. The service level of the first service refers to the level of the identification data that needs to be obtained to process the first service, and the identification data may include at least one of voice data, fingerprint data, and facial data. For example, the recognition level of facial data is greater than that of fingerprint data, the recognition level of fingerprint data is greater than that of voice data, and so on. The lower the recognition level of the identification data, the lower the recognition complexity; the higher the recognition level, the higher the recognition complexity. That is, if the identification data required by the first service includes only voice data, the service level of the first service is lower; if it includes facial data, the service level of the first service is higher. When the service level of the first service is low, a recognition engine with a lower cost can be used to recognize the multimedia data, and the recognition result still meets the recognition requirement of service processing, which saves the cost of service processing. When the service level of the first service is high, a recognition engine with higher recognition accuracy can be used to improve the accuracy of recognition.
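
The level-based choice described above can be sketched as follows. The ordering voice < fingerprint < face comes from the text; the numeric levels, the threshold for a "high" level, and the cost and accuracy figures of the two candidate engines are assumptions made purely for illustration.

```python
# Recognition levels follow the ordering given in the text: voice < fingerprint < face.
RECOGNITION_LEVEL = {"voice": 1, "fingerprint": 2, "face": 3}

# Hypothetical candidate engines; the cost and accuracy figures are invented.
CANDIDATES = [
    {"name": "basic-engine", "cost": 1.0, "accuracy": 0.90},
    {"name": "premium-engine", "cost": 5.0, "accuracy": 0.99},
]


def service_level(required_data):
    """The service level is the highest recognition level among the required data."""
    return max(RECOGNITION_LEVEL[d] for d in required_data)


def choose_engine(required_data, high_level_threshold=3, candidates=CANDIDATES):
    """Pick the most accurate engine for high-level services, the cheapest otherwise."""
    if service_level(required_data) >= high_level_threshold:
        return max(candidates, key=lambda e: e["accuracy"])
    return min(candidates, key=lambda e: e["cost"])


print(choose_engine({"voice"})["name"])          # low level: cheapest engine
print(choose_engine({"voice", "face"})["name"])  # high level: most accurate engine
```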

The business income of the first service can be the expected income of the first service. For example, the lower the cost of the recognition engine, the higher the business income of the first service; the higher the cost of the recognition engine, the lower the business income of the first service.

S104: Output prompt information about processing the first service.

Here, the prompt information of the first service refers to process information for processing the first service. For example, if the process information for processing the first service includes obtaining user identity information and obtaining user facial data, the prompt information for the first service may include "please fill in the currently displayed identity information", "please aim your face at the camera", "please blink", "please move your face left and right", and so on. By outputting the prompt information about processing the first service, the user can respond accordingly, for example by filling in identity information or aiming the face at the camera, so that the terminal can collect the user's response to the prompt information of the first service and obtain the second multimedia data. Here, the second multimedia data may be of a voice data type, a video data type, and so on. If the second multimedia data is of the voice data type, the terminal records the voice with which the user replies to the prompt information of the first service to obtain the voice data, that is, the second multimedia data; if the second multimedia data is of the video data type, the terminal records the video with which the user replies to the prompt information of the first service to obtain the video data, that is, the second multimedia data.

S105: Acquire second multimedia data sent for the prompt information from the terminal.

Here, since the terminal has collected, in the above steps, the second multimedia data with which the user replies to the prompt information of the first service, the terminal can send the second multimedia data to the computer device, and the computer device thereby obtains the second multimedia data sent in response to the prompt information.

S106: Send the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

Here, the first service platform may refer to a platform that processes the first service. For example, if the first business is credit card processing, the first business platform is a banking platform. After the computer device sends the second multimedia data to the first service platform, the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

Specifically, the first service platform may use the target recognition engine to recognize the second multimedia data and determine its authenticity. If the second multimedia data is authentic, the first service is processed; if the second multimedia data is not authentic, the processing of the first service is ended. For example, if the second multimedia data includes the user's facial information, using the target recognition engine to determine the authenticity of the second multimedia data may include: recognizing whether the user's facial information included in the second multimedia data is the user's facial information stored by the first service platform; if so, the second multimedia data is considered authentic; if not, the second multimedia data is considered not authentic. The facial information of the user stored by the first service platform may be the facial information stored when the user handled historical services on the first service platform. For example, if the user has applied for a bank card on the first service platform, the user's facial information stored on the first service platform may be the facial information reserved when the user applied for the bank card. If the user has not handled historical services on the first service platform, or did not store facial information when handling them, the user's facial information can be obtained from other platforms that store it, for example from the corresponding platforms of the Ministry of Public Security and the Ministry of Civil Affairs.
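
A minimal sketch of this authenticity check is given below. The function and parameter names are hypothetical, the face comparison is a toy stand-in for the target recognition engine, and treating "no reference facial information found anywhere" as a failed check is an assumption that the text does not state.

```python
from typing import Callable, Optional


def handle_second_data(
    user_id: str,
    submitted_face: str,
    platform_faces: dict,
    faces_match: Callable[[str, str], bool],
    external_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return "processed" if the submitted facial data is authentic, else "ended"."""
    # Prefer facial information reserved during historical services on the platform.
    reference = platform_faces.get(user_id)
    # Otherwise fall back to another platform that stores the user's facial information.
    if reference is None and external_lookup is not None:
        reference = external_lookup(user_id)
    # Assumption: without any reference the data cannot be verified, so processing ends.
    if reference is not None and faces_match(submitted_face, reference):
        return "processed"
    return "ended"


def same_template(a: str, b: str) -> bool:
    """Toy stand-in for the target recognition engine: exact template comparison."""
    return a == b


def other_platform(user_id: str) -> Optional[str]:
    """Hypothetical lookup on another platform that stores facial information."""
    return {"u2": "face-B"}.get(user_id)


print(handle_second_data("u1", "face-A", {"u1": "face-A"}, same_template))
print(handle_second_data("u2", "face-B", {}, same_template, other_platform))
print(handle_second_data("u3", "face-C", {"u3": "face-X"}, same_template))
```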

Optionally, after processing the first service, the computer device can also obtain further multimedia data sent by the terminal, determine second service attribute information by identifying that multimedia data, determine from the shared recognition engine set the recognition engine matching the second service attribute information as the second recognition engine, and output prompt information about processing the second service; it then acquires from the terminal the multimedia data sent in response to the prompt information for the second service, and sends that multimedia data to the second service platform to process the second service. In other words, since the shared recognition engine set includes at least one recognition engine, and different recognition engines correspond to different services, this method can concentrate multiple recognition engines in one set, so that different services share one recognition engine set and there is no need to customize a recognition engine for each service, which saves costs. This method also brings multiple services together to facilitate quick docking to the service platforms. Even if a user needs to handle multiple different services, identifying the service attribute information in the multimedia data allows each service to be docked to the corresponding recognition engine and processed, thereby improving the efficiency of service processing.

Optionally, the computer device in this application can be any node device in a blockchain. Blockchain is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission (P2P transmission), consensus mechanisms, and encryption algorithms, and is essentially a decentralized database. A blockchain can be composed of multiple serial transaction records (also called blocks) that are connected and protected by cryptography. The resulting distributed ledger allows multiple parties to effectively record transactions, and the transactions can be permanently checked (they cannot be tampered with). The consensus mechanism refers to the mathematical algorithm that establishes trust between different nodes and governs the acquisition of rights and interests in the blockchain network; that is to say, the consensus mechanism is a mathematical algorithm recognized by all network nodes of the blockchain. This application can use the consensus mechanism of the blockchain to let multiple services share the recognition engines in the shared recognition engine set, so as to avoid waste of resources and save costs.

In the embodiment of the present application, by identifying the first multimedia data, the first service attribute information corresponding to the first service can be obtained. By determining the target recognition engine corresponding to the first service, the target recognition engine can be used for recognition when the first service is subsequently processed. Since the shared recognition engine set includes multiple recognition engines, this method can concentrate multiple recognition engines in the shared recognition engine set. Different services can share the recognition engines in the set, and there is no need to customize recognition engines for different services, which can avoid waste of resources, save investment in hardware resources, and thereby save costs. Further, the prompt information about processing the first service is output, and the second multimedia data sent in response to the prompt information is obtained from the terminal. By outputting the prompt information, the terminal can collect the second multimedia data that the user provides according to the prompt information. The second multimedia data is sent to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service. When the user needs to handle a service, it is only necessary to obtain the service attribute information in the multimedia data, determine the corresponding service and the recognition engine corresponding to the service, and send the multimedia data corresponding to the first service to the first service platform; the corresponding recognition engine can then be used to recognize the data and process the first service. This separates the two processes of determining the recognition engine and processing the service, and enables quick docking to the service processing platform for service processing.
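
The separation between determining the recognition engine (on the computer device) and processing the service (on the service platform) can be sketched as a single session function that walks through steps S101 to S106. The classes `FakeTerminal` and `FakePlatform`, the function `run_session`, and the toy engine `str.upper` are all hypothetical stand-ins introduced for illustration.

```python
class FakeTerminal:
    """Hypothetical stand-in for the user's terminal."""

    def __init__(self, messages):
        self._incoming = list(messages)
        self.prompts = []

    def receive(self):
        return self._incoming.pop(0)

    def send(self, prompt):
        self.prompts.append(prompt)


class FakePlatform:
    """Hypothetical stand-in for a service platform."""

    def __init__(self, prompt_text):
        self.prompt_text = prompt_text

    def prompt(self):
        return self.prompt_text

    def process(self, data, engine):
        # The platform, not the computer device, runs the target recognition engine.
        return f"processed {engine(data)}"


def run_session(terminal, recognize_service, shared_engines, platforms):
    first_data = terminal.receive()               # S101: first multimedia data
    service = recognize_service(first_data)       # S102: service attribute information
    engine = shared_engines[service]              # S103: target recognition engine
    platform = platforms[service]
    terminal.send(platform.prompt())              # S104: output prompt information
    second_data = terminal.receive()              # S105: second multimedia data
    return platform.process(second_data, engine)  # S106: platform recognizes and processes


def recognize_service(data):
    """Toy recognizer: looks for a known service name in the text."""
    return "credit card" if "credit card" in data else "unknown"


terminal = FakeTerminal(["I want to apply for a credit card", "my name is Li"])
result = run_session(
    terminal,
    recognize_service,
    shared_engines={"credit card": str.upper},
    platforms={"credit card": FakePlatform("Please state your name")},
)
print(result)
print(terminal.prompts)
```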

In an embodiment, the first service attribute information includes the identifier of the first service, and the above step S104 may include the following steps s11 to s13.

s11: Determine, according to the identifier of the first service, the first service platform that processes the first service.

s12: Obtain prompt information about processing the first service from the first service platform.

s13: Output the prompt information.

In steps s11 to s13, the identifier of the first service is used to uniquely indicate the first service. For example, the identifier can be the name of the first service, the abbreviation of the name, the pinyin of the name, the pinyin abbreviation of the name, a number used to indicate the first service, and so on. The first service platform is a platform that can process the first service. For example, if the identifier of the first service is Ping An Bank card processing, the first service platform is the Ping An Bank platform. Having determined the first service platform, the computer device can obtain from it the process information for handling the first service, such as obtaining user identity information and obtaining user facial data in the above steps, derive the prompt information for the first service from that process information, and output the prompt information to the terminal. The user can view the prompt information through the terminal and reply accordingly to carry out service processing.
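
Steps s11 to s13 can be sketched as two lookups followed by a mapping from process steps to prompts. The identifier string, the registry dictionaries, and the function name `prompts_for` are hypothetical; the platform name and the prompt wording are taken from the examples in the text.

```python
# Hypothetical registry: service identifier -> service platform.
PLATFORM_BY_SERVICE_ID = {"ping-an-bank-card": "Ping An Bank platform"}

# Hypothetical process information held by each platform.
PROCESS_INFO = {
    "Ping An Bank platform": [
        "obtain user identity information",
        "obtain user facial data",
    ],
}

# Prompt shown to the user for each process step.
PROMPT_BY_STEP = {
    "obtain user identity information": "Please fill in the currently displayed identity information",
    "obtain user facial data": "Please aim your face at the camera",
}


def prompts_for(service_id):
    platform = PLATFORM_BY_SERVICE_ID[service_id]     # s11: determine the platform
    steps = PROCESS_INFO[platform]                    # s12: obtain process information
    return [PROMPT_BY_STEP[step] for step in steps]   # s13: prompts to output


for prompt in prompts_for("ping-an-bank-card"):
    print(prompt)
```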

In an embodiment, the first multimedia data includes first voice data, and the above step S102 may include the following steps s21 to s23.

s21: Perform voice recognition on the first voice data to obtain the first keyword associated with the service in the first voice data, and determine the first service attribute information according to the first keyword.

Here, the first voice data refers to data obtained by collecting the voice of the user. The first keyword associated with the business may be, for example, the name of the business, the abbreviation of the business name, and the number used to represent the business, and so on. The computer device obtains the first keyword associated with the service in the first voice data by performing voice recognition on the first voice data, such as the name of the service, and then determines the first service attribute information according to the name of the service. For example, the first voice data is "I want to apply for a bank card" and the first keyword is "bank card". The first business attribute information can be determined by obtaining the words before and after the first keyword. For example, it is determined that the first business attribute information includes "Apply for a bank card."

In specific implementation, the computer device may use ASR technology or other voice recognition technology to recognize voice data, obtain the first keyword associated with the service in the first voice data, and determine the first service attribute information according to the first keyword.

s22: Convert the first voice data to obtain first text data corresponding to the first voice data.

Here, since the first voice data is voice-type data, the voice-type data can be converted into text-type data to obtain the first text data.

s23: Perform keyword extraction on the first text data to obtain a second keyword associated with the business in the first text data, and determine the first business attribute information according to the second keyword.

Here, the first keyword and the second keyword may be the same or different. After the computer device converts the first voice data into the first text data, it performs keyword extraction on the first text data to obtain the second keyword associated with the service in the first text data, and determines the first service attribute information according to the second keyword.

In specific implementation, the computer device first performs word segmentation on the first text data, dividing it into at least one segment; obtains a stop word set, which includes at least one word unrelated to any service; searches the stop word set for target words that match any of the segments; deletes the target words from the segments; and performs keyword extraction on the remaining segments to obtain the second keyword. The first service attribute information is then determined according to the second keyword.

For example, the first text data is "I want to apply for a bank card". Word segmentation divides it into 4 segments, for example "I", "want", "apply for", and "bank card", and each of these 4 segments is then matched against the stop words in the stop word set. If the 2 segments "I" and "want" match, they are deleted, leaving "apply for a bank card". Keyword extraction on "apply for a bank card" yields the second keyword "bank card", and the first service attribute information is determined according to the second keyword.
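The segmentation, stop-word deletion, and keyword extraction steps can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the stop-word set, the service vocabulary `SERVICE_TERMS`, and the toy `segment` function (which expects segments already separated by "/") are all assumptions introduced here.

```python
from typing import List, Optional, Tuple

# Hypothetical stop-word set and service vocabulary; the source names neither.
STOP_WORDS = {"i", "want"}
SERVICE_TERMS = {"bank card", "loan", "insurance"}


def segment(text: str) -> List[str]:
    """Toy segmenter: expects segments already separated by '/'.
    A real system would call a word-segmentation library instead."""
    return [part.strip().lower() for part in text.split("/") if part.strip()]


def extract_keyword(segments: List[str]) -> Tuple[List[str], Optional[str]]:
    """Delete stop words, then pick the first remaining segment that
    appears in the service vocabulary (None if nothing matches)."""
    kept = [seg for seg in segments if seg not in STOP_WORDS]
    keyword = next((seg for seg in kept if seg in SERVICE_TERMS), None)
    return kept, keyword


segments = segment("I / want / apply for / bank card")
kept, keyword = extract_keyword(segments)
print(kept)     # segments left after stop-word removal
print(keyword)  # extracted second keyword
```

A vocabulary lookup is only one possible extraction rule; a frequency-based or model-based extractor could be substituted without changing the surrounding steps.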

In specific implementation, either approach may be chosen according to specific needs: performing voice recognition directly on the first voice data, or converting the first voice data into text data for keyword extraction. For example, voice recognition costs less, so it is adopted when cost savings are the priority; converting the voice data into text data for keyword extraction is more accurate, so it is adopted when recognition accuracy is the priority.

By performing voice recognition on the first voice data, or by converting the first voice data into text data and performing keyword extraction on the text data, the first service attribute information can be obtained, so that the biometric recognition engine and the first service platform can be determined according to the first service attribute information and the corresponding service processing can then be performed.

In an embodiment, the first service attribute information includes the service level of the first service, and the above step S103 may include the following steps s31 to s32.

s31: Acquire the recognition level of the recognition engine in the shared recognition engine set, and the recognition level of the recognition engine is used to reflect the accuracy of the recognition engine in recognizing the multimedia data.

s32: Determine the recognition engine whose recognition level matches the service level of the first service in the shared recognition engine set as the target recognition engine.

In steps s31 to s32, the higher the recognition level of a recognition engine, the higher its accuracy in recognizing the multimedia data; the lower the recognition level, the lower its accuracy. The higher the service level of the first service, the higher the recognition level required for the identification data that must be acquired to process the first service; the lower the service level of the first service, the lower the required recognition level. For example, if the identification data that must be acquired to process the first service is voice data, the service level of the first service is low, and the recognition level of the recognition engine matching that service level is low; if the identification data is facial data, the service level of the first service is higher, and the recognition level of the recognition engine matching that service level is higher.

Optionally, where the identification data that must be acquired to process the first service is at least two of voice data, fingerprint data, and facial data, the service level of the first service may be determined according to the types of the identification data. For example, when the identification data includes voice data, fingerprint data, and facial data, the service level of the first service is higher; when the identification data includes only voice data and fingerprint data, the service level of the first service is lower. As a further example, suppose the identification data required to process first service 1 to first service 4 is identification data 1 to identification data 4, where identification data 1 includes voice data and fingerprint data, identification data 2 includes voice data and facial data, identification data 3 includes fingerprint data and facial data, and identification data 4 includes voice data, fingerprint data, and facial data. Then the service level of first service 1 is lower than that of first service 2, the service level of first service 2 is lower than that of first service 3, and the service level of first service 3 is lower than that of first service 4.
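One way to reproduce this ordering is to give each data type a weight and sum the weights of the required types. The ordering voice < fingerprint < face follows from the example above; the numeric weights 1, 2, 3 and the summation rule are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, List

# Assumed weights: the example only implies voice < fingerprint < face.
TYPE_WEIGHT: Dict[str, int] = {"voice": 1, "fingerprint": 2, "face": 3}


def service_level(required_types: FrozenSet[str]) -> int:
    """Score a service by summing the weights of the data types it requires."""
    return sum(TYPE_WEIGHT[t] for t in required_types)


required = {
    "first service 1": frozenset({"voice", "fingerprint"}),
    "first service 2": frozenset({"voice", "face"}),
    "first service 3": frozenset({"fingerprint", "face"}),
    "first service 4": frozenset({"voice", "fingerprint", "face"}),
}

levels = {name: service_level(types) for name, types in required.items()}
# Sort service names from lowest to highest service level.
ranked: List[str] = sorted(levels, key=levels.__getitem__)
print(levels)
print(ranked)
```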

By obtaining the recognition levels of the recognition engines in the shared recognition engine set, the recognition engine whose recognition level matches the service level of the first service is determined as the target recognition engine. When the service level of the first service is low, a recognition engine with a lower recognition level can be used to save costs; when the service level of the first service is higher, a recognition engine with a higher recognition level can be used to improve the accuracy of recognizing the multimedia data.
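Steps s31 to s32 can be sketched as a lookup over the shared engine set. The source does not define what "matches" means; this sketch assumes it means the lowest recognition level that is at least the service level, so that a low-level service does not consume a costlier engine. The `Engine` type and the engine names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Engine:
    name: str
    recognition_level: int  # higher means more accurate


def select_by_level(engines: List[Engine], service_level: int) -> Optional[Engine]:
    """Assumed matching rule: the least capable engine that still meets
    the service level; None when no engine is accurate enough."""
    candidates = [e for e in engines if e.recognition_level >= service_level]
    return min(candidates, key=lambda e: e.recognition_level, default=None)


shared_engines = [
    Engine("voice-basic", 1),
    Engine("fingerprint-standard", 2),
    Engine("face-premium", 3),
]

print(select_by_level(shared_engines, 1))
print(select_by_level(shared_engines, 3))
```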

In an embodiment, the first business attribute information includes the business income of the first business, and the above step S103 may include the following steps s41 to s42.

s41: Obtain the recognition cost of the recognition engine in the shared recognition engine set.

s42: Determine the recognition engine whose recognition cost matches the business income of the first business in the shared recognition engine set as the target recognition engine.

In steps s41 to s42, the business income of the first business may be the expected income of the first business. The recognition cost of a recognition engine refers to the amount of money required to purchase or use the recognition engine. The lower the recognition cost of the recognition engine, the higher the business income of the first business; the higher the recognition cost of the recognition engine, the lower the business income of the first business. The computer device obtains the recognition costs of the recognition engines in the shared recognition engine set, and determines the recognition engine whose recognition cost matches the business income of the first business as the target recognition engine. Where the business income of the first business is high, using a recognition engine with a lower recognition cost to recognize the multimedia data reduces the recognition cost and thereby increases the business income of the first business.
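Steps s41 to s42 can be sketched in the same way. The matching rule is again not specified in the source; this sketch assumes an engine "matches" when its recognition cost stays within a fixed share of the expected income, and that the cheapest such engine is preferred. The `budget_share` parameter, the `Engine` type, and the sample costs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Engine:
    name: str
    recognition_cost: float  # money needed to purchase or use the engine


def select_by_cost(engines: List[Engine], expected_income: float,
                   budget_share: float = 0.1) -> Optional[Engine]:
    """Assumed rule: keep engines whose cost is within budget_share of the
    expected income, then take the cheapest; None if none is affordable."""
    budget = expected_income * budget_share
    affordable = [e for e in engines if e.recognition_cost <= budget]
    return min(affordable, key=lambda e: e.recognition_cost, default=None)


shared_engines = [Engine("engine-a", 5.0), Engine("engine-b", 20.0), Engine("engine-c", 80.0)]

print(select_by_cost(shared_engines, expected_income=100.0))
print(select_by_cost(shared_engines, expected_income=30.0))
```

Choosing the cheapest affordable engine maximizes expected income minus recognition cost; a deployment that also tracks accuracy might instead pick the most accurate engine within the same budget.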

Optionally, please refer to FIG. 2, which is a schematic flowchart of a data processing method provided in an embodiment of the present application. The method is applied to computer equipment; as shown in Figure 2, the method includes the following steps.

S201: Acquire first multimedia data about a first service from a terminal.

S202: Identify the first multimedia data to obtain first service attribute information.

S203: Determine a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine.

S204: Output prompt information about processing the first service.

S205: Acquire the second multimedia data sent for the prompt information from the terminal.

Here, the second multimedia data includes the first video data and the second voice data. For the specific implementation of steps S201 to S205, reference may be made to the description of steps S101 to S105 in the embodiment corresponding to FIG. 1, which will not be repeated here.

S206: Acquire a first image of a user corresponding to the terminal according to the first video data.

Here, the first video data is the video data collected by the terminal while the user responds to the prompt information for processing the first service. The first video data includes the user's facial image.

The computer device may capture a frame from the first video data at a preset interval to obtain a first image containing the user's face, that is, the first image of the user corresponding to the terminal. For example, a frame may be captured from the first video data every 0.5 seconds to obtain a first image; if the duration of the first video data is 2 seconds, 4 first images of the user are acquired.
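The sampling arithmetic can be illustrated with a short sketch that computes the capture timestamps. It assumes the first capture is taken at time 0 and that no capture is taken at the very end of the video, which gives the 4 images of the example; the source does not state where sampling starts. Times are kept in integer milliseconds to avoid floating-point rounding.

```python
from typing import List


def sample_timestamps_ms(duration_ms: int, interval_ms: int = 500) -> List[int]:
    """Return the capture times (in ms) for a video of the given duration,
    starting at 0 and stepping by interval_ms (end of video excluded)."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return list(range(0, duration_ms, interval_ms))


timestamps = sample_timestamps_ms(2000)  # 2-second video, 0.5-second interval
print(timestamps)
print(len(timestamps))
```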

S207: Send the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image and, when the terminal is legitimate, uses the target recognition engine to recognize the first video data and the second voice data and process the first service.

Here, the second voice data is the voice data collected by the terminal while the user responds to the prompt information for processing the first service. The computer device sends the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image. When the terminal is legitimate, the target recognition engine is used to recognize the first video data and the second voice data, and the first service is processed.

In specific implementation, after the first service platform obtains the first image, the first video data, and the second voice data, it can use the target recognition engine to recognize the first image and determine whether the user's facial image in the first image and the user image stored on the first service platform are facial images of the same user. If they are, the terminal is determined to be legitimate, and the target recognition engine is used to recognize the first video data and the second voice data and process the first service. If not, the terminal is determined not to be legitimate, and warning information indicating this is generated, so that the user can adjust their posture according to the warning information.

In a possible implementation, when the first service platform uses the target recognition engine to recognize the first video data and the second voice data, it can obtain from the first video data a third image of the user corresponding to the second voice data, that is, an image captured while the user answers a question according to the prompt information of the first service; the third image contains the facial image of the user. Micro-expression recognition is performed on the third image, and the authenticity of the user's answer is determined from the user's micro-expression while answering. If micro-expression recognition indicates that the authenticity of the answer is high, the first service is processed. If it indicates that the authenticity is low, instruction information for verifying the user's identity a second time is sent, or the question during which the user's micro-expression was abnormal is output again. If the second verification passes, or the user's facial expression when answering the question again indicates high authenticity, the first service is processed. If the second verification fails, or the user's facial expression when answering again indicates low authenticity, information is output instructing the user to handle the business at the manual service counter corresponding to the first service platform, and processing of the first service ends.
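The decision flow of this implementation can be sketched as below. The source speaks only of "high" and "low" authenticity; representing authenticity as a score in [0, 1] with a threshold of 0.7 is an assumption, as are the names `Outcome`, `decide`, and `second_check`. The `second_check` callable stands in for either the second identity verification or the repeated question.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PROCESS_SERVICE = "process the first service"
    REFER_TO_MANUAL = "refer the user to the manual service counter"


def decide(authenticity: float, second_check: Callable[[], bool],
           threshold: float = 0.7) -> Outcome:
    """Process the service when authenticity is high; otherwise run one
    second check and fall back to manual handling if it fails."""
    if authenticity >= threshold:
        return Outcome.PROCESS_SERVICE
    # Low authenticity: second identity verification or repeated question.
    if second_check():
        return Outcome.PROCESS_SERVICE
    return Outcome.REFER_TO_MANUAL


print(decide(0.9, second_check=lambda: False))
print(decide(0.4, second_check=lambda: True))
print(decide(0.4, second_check=lambda: False))
```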

By acquiring the first image from the first video data and sending it to the first service platform for verification, the reliability of user identity verification can be improved. In addition, the first service platform can perform micro-expression recognition on the third image in the first video data to assess the authenticity of the user's answers, thereby realizing a second verification of the user's identity information and improving the accuracy of business processing.

In an embodiment, the above method may further include the following steps s51 to s54.

s51: If the warning information sent by the first service platform for indicating that the terminal is not legal is obtained, output adjustment information for instructing the user to adjust the posture.

s52: Acquire third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data.

s53: Acquire a second image of the user according to the third video data.

s54: Send the second image to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the second image.

In steps s51 to s54, if the computer device obtains the warning information sent by the first service platform indicating that the terminal is not legitimate, it outputs adjustment information instructing the user to adjust their posture, so that the user can adjust their posture accordingly. For example, when the user's face is not aligned with the camera of the terminal, the user adjusts so that the face is aligned with the camera; or, when both user A and user B are in view of the terminal's camera and user A is the user who needs to handle the first service, the adjustment leaves only user A in view of the camera.

The computer device obtains the third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data; obtains the second image of the user according to the third video data; and sends the second image to the first service platform, so that the first service platform verifies the legitimacy of the terminal based on the second image. The second image includes the facial image of the user. If the second image and the facial image of the user stored in the first service platform are facial images of the same user, the terminal is legitimate and the first service is processed. If they are not facial images of the same user, the terminal is not legitimate; information is output instructing the user to handle the business at the manual service counter corresponding to the first service platform, and processing of the first service ends. When the terminal is found not to be legitimate, outputting adjustment information prompts the user to adjust their posture so that the legitimacy of the terminal can be verified again, which improves the reliability of user identity verification.
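The retry flow of steps s51 to s54 can be sketched as a single verification function with one posture-adjustment retry. Images are modeled as plain strings and the face comparison as a caller-supplied predicate; both are simplifications for illustration, and all names (`verify_terminal`, `capture_after_adjustment`, `matches_stored_user`) are hypothetical.

```python
from typing import Callable, List


def verify_terminal(first_image: str,
                    capture_after_adjustment: Callable[[], str],
                    matches_stored_user: Callable[[str], bool],
                    messages: List[str]) -> bool:
    """Return True if the terminal is legitimate, retrying once after
    asking the user to adjust their posture."""
    if matches_stored_user(first_image):
        return True
    # s51: warning received, so output adjustment information.
    messages.append("Please adjust your posture so that only your face is in view.")
    # s52 and s53: obtain third video data and take the second image from it.
    second_image = capture_after_adjustment()
    # s54: the platform verifies legitimacy again using the second image.
    if matches_stored_user(second_image):
        return True
    messages.append("Please handle this business at the manual service counter.")
    return False


def is_stored_user(image: str) -> bool:
    return image == "face:user-a"  # stand-in for a real face comparison


log: List[str] = []
ok = verify_terminal("face:user-a+user-b", lambda: "face:user-a", is_stored_user, log)
print(ok, log)
```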

The method of the embodiment of the present application is described above, and the device of the embodiment of the present application is described below.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the composition structure of a data processing device provided by an embodiment of the present application. The data processing device may be a computer program (including program code) running in a computer device; for example, the data processing device is application software. The device can be used to execute the corresponding steps in the method provided in the embodiments of this application. The device 30 includes: a first obtaining module 301, configured to obtain first multimedia data about a first service from a terminal; a data recognition module 302, configured to recognize the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; an engine determining module 303, configured to determine, from the shared recognition engine set, a recognition engine matching the first service attribute information as a target recognition engine; an information output module 304, configured to output prompt information about processing the first service; a second acquisition module 305, configured to obtain from the terminal the second multimedia data sent in response to the prompt information; and a service processing module 306, configured to send the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

Optionally, the information output module 304 is configured to: determine, according to the identifier of the first service, the first service platform that processes the first service; obtain prompt information about processing the first service from the first service platform; and output the prompt information.

Optionally, the first multimedia data includes first voice data, and the data recognition module 302 is specifically configured to: perform voice recognition on the first voice data to obtain the first keyword associated with the service in the first voice data, and determine the first service attribute information according to the first keyword; or, convert the first voice data to obtain the first text data corresponding to the first voice data, perform keyword extraction on the first text data to obtain the second keyword associated with the service in the first text data, and determine the first service attribute information according to the second keyword.

Optionally, the first service attribute information includes the service level of the first service; the engine determining module 303 is specifically configured to: obtain the recognition level of the recognition engine in the shared recognition engine set, where the recognition level of the recognition engine reflects the accuracy of the recognition engine in recognizing the multimedia data; and determine the recognition engine in the shared recognition engine set whose recognition level matches the service level of the first service as the target recognition engine.

Optionally, the first service attribute information includes the business income of the first service; the engine determining module 303 is specifically configured to: obtain the recognition cost of the recognition engines in the shared recognition engine set; and determine the recognition engine in the shared recognition engine set whose recognition cost matches the business income of the first service as the target recognition engine.

Optionally, the second multimedia data includes first video data and second voice data; the service processing module 306 is specifically configured to: obtain the first image of the user corresponding to the terminal according to the first video data; and send the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image and, when the terminal is legitimate, uses the target recognition engine to recognize the first video data and the second voice data and process the first service.

Optionally, the device further includes an adjustment module 307, configured to: if warning information sent by the first service platform indicating that the terminal is not legitimate is obtained, output adjustment information instructing the user to adjust their posture; obtain third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data; obtain a second image of the user according to the third video data; and send the second image to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the second image.

It should be noted that, for content not mentioned in the embodiment corresponding to FIG. 3, please refer to the description of the method embodiment, which will not be repeated here.

Referring to FIG. 4, FIG. 4 is a schematic diagram of the composition structure of a computer device provided by an embodiment of the present application. As shown in FIG. 4, the computer device 40 may include: a processor 401, a network interface 404, and a memory 405. In addition, the computer device 40 may also include a user interface 403 and at least one communication bus 402. The communication bus 402 is used to implement connection and communication between these components. The user interface 403 may include a display screen and a keyboard; optionally, the user interface 403 may also include a standard wired interface and a wireless interface. The network interface 404 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 405 may be a high-speed RAM memory or a non-volatile memory, for example, at least one magnetic disk memory. Optionally, the memory 405 may also be at least one storage device located remotely from the processor 401. As shown in FIG. 4, the memory 405, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the computer device 40 shown in FIG. 4, the network interface 404 can provide network communication functions; the user interface 403 is mainly used to provide an input interface for the user; and the processor 401 can be used to call the device control application program stored in the memory 405 to: obtain first multimedia data about the first service from the terminal; recognize the first multimedia data to obtain the first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service; determine, from the shared recognition engine set, the recognition engine matching the first service attribute information as the target recognition engine; output prompt information about processing the first service; obtain from the terminal the second multimedia data sent in response to the prompt information; and send the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.

It should be understood that the computer device 40 described in the embodiment of the present application can perform the data processing method described in the embodiments corresponding to FIG. 1 and FIG. 2, and can also perform the functions of the data processing device described in the embodiment corresponding to FIG. 3, which will not be repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.

The embodiment of the present application also provides a computer-readable storage medium that stores a computer program. The computer program includes program instructions that, when executed by a computer, cause the computer to execute the method described in the foregoing embodiments. The computer may be a part of the aforementioned computer device, for example, the aforementioned processor 401.

Optionally, the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.

As an example, the program instructions may be deployed and executed on one computer device, on multiple computer devices located in one location, or on multiple computer devices that are distributed across multiple locations and interconnected by a communication network. Multiple computer devices distributed across multiple locations and interconnected through a communication network can form a blockchain network.

A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the procedures of the above method embodiments. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The above-disclosed are only preferred embodiments of this application, and of course the scope of rights of this application cannot be limited by this. Therefore, equivalent changes made in accordance with the claims of this application still fall within the scope of this application.

Claims

A data processing method, which includes:

Acquiring first multimedia data about the first service from the terminal;

Identifying the first multimedia data to obtain the first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service;

Determining a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine;

Outputting prompt information about processing the first service;

Acquiring the second multimedia data sent for the prompt information from the terminal;

The second multimedia data is sent to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.
The method according to claim 1, wherein the first service attribute information further includes an identifier of the first service, and the outputting prompt information about processing the first service includes:

Determining, according to the identifier of the first service, the first service platform that processes the first service;

Acquiring prompt information about processing the first service from the first service platform;

Outputting the prompt information.
The method according to claim 1, wherein the first multimedia data includes first voice data, and the recognizing the first multimedia data to obtain the first service attribute information includes:

Perform voice recognition on the first voice data to obtain the first keyword associated with the service in the first voice data, and determine the first service attribute information according to the first keyword; or,

Converting the first voice data to obtain first text data corresponding to the first voice data;

Keyword extraction is performed on the first text data to obtain a second keyword associated with a business in the first text data; the first business attribute information is determined according to the second keyword.
The method according to claim 1, wherein the first service attribute information includes the service level of the first service;

The determining the recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine includes:

Acquiring a recognition level of a recognition engine in the shared recognition engine set, where the recognition level of the recognition engine is used to reflect the accuracy of the recognition engine in recognizing multimedia data;

The recognition engine whose recognition level matches the service level of the first service in the shared recognition engine set is determined as the target recognition engine.
The method according to claim 1, wherein the first service attribute information includes the business income of the first service, and the determining a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine includes:

Acquiring the recognition cost of the recognition engine in the shared recognition engine set;

The recognition engine whose recognition cost matches the business income of the first service in the shared recognition engine set is determined as the target recognition engine.
The method according to claim 1, wherein the second multimedia data includes first video data and second voice data; and the sending the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service, includes:

Acquiring a first image of a user corresponding to the terminal according to the first video data;

Sending the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image and, when the terminal is legitimate, uses the target recognition engine to recognize the first video data and the second voice data and process the first service.
The method according to claim 6, wherein the method further comprises:

If the warning information used to indicate that the terminal does not have legitimacy sent by the first service platform is obtained, output adjustment information used to instruct the user to adjust the posture;

Acquiring third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data;

Acquiring a second image of the user according to the third video data;

The second image is sent to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the second image.
A data processing device, which includes:

The first obtaining module is configured to obtain first multimedia data about the first service from the terminal;

The data identification module is configured to identify the first multimedia data to obtain the first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service;

An engine determination module, configured to determine a recognition engine matching the first service attribute information from the shared recognition engine set, as a target recognition engine;

An information output module, configured to output prompt information about processing the first service;

A second acquisition module, configured to acquire, from the terminal, the second multimedia data sent in response to the prompt information;

A service processing module, configured to send the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.
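As a rough illustration of how the modules of the device claim could be wired together, the following Python sketch models each module as a callable. The class name, field names, and signatures are assumptions made for this example and do not come from the application. A single `acquire` callable stands in for both acquisition modules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataProcessingDevice:
    # Each field stands in for one claimed module; all signatures are hypothetical.
    acquire: Callable[[], bytes]                    # first and second acquisition modules
    identify: Callable[[bytes], dict]               # data identification module
    select_engine: Callable[[dict], str]            # engine determination module
    output_prompt: Callable[[str], None]            # information output module
    send_to_platform: Callable[[bytes, str], str]   # service processing module

    def run(self, prompt: str) -> str:
        first_data = self.acquire()                 # first multimedia data
        attributes = self.identify(first_data)      # first service attribute information
        engine = self.select_engine(attributes)     # target recognition engine
        self.output_prompt(prompt)                  # prompt information
        second_data = self.acquire()                # second multimedia data
        return self.send_to_platform(second_data, engine)
```

Passing the modules in as callables keeps the sketch independent of any particular recognition engine or service platform.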
A computer device, which includes: a processor, a memory, and a network interface;

The processor is connected to the memory and the network interface, wherein the network interface is used to provide a data communication function, the memory is used to store program code, and the processor is used to call the program code to execute the following method:

Acquiring first multimedia data about the first service from the terminal;

Identifying the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service;

Determining a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine;

Outputting prompt information about processing the first service;

Acquiring the second multimedia data sent for the prompt information from the terminal;

Sending the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.
The computer device according to claim 9, wherein the first service attribute information further includes an identifier of the first service, and when outputting the prompt information about processing the first service, the processor specifically executes:

Determining, according to the identifier of the first service, the first service platform that processes the first service;

Acquiring the prompt information about processing the first service from the first service platform;

Outputting the prompt information.
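The lookup recited above (service identifier, then service platform, then prompt information) might be sketched as follows in Python. The registry dictionary, the `fetch_prompt` function, and the idea of representing a platform as a zero-argument callable are assumptions for illustration only.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a service platform: a callable returning its prompt text.
PromptProvider = Callable[[], str]


def fetch_prompt(service_id: str, platforms: Dict[str, PromptProvider]) -> str:
    """Resolve the platform that handles the service, then ask it for the prompt."""
    try:
        provider = platforms[service_id]
    except KeyError:
        raise LookupError(f"no platform registered for service {service_id!r}") from None
    return provider()
```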
The computer device according to claim 9, wherein the first multimedia data includes first voice data, and when identifying the first multimedia data to obtain the first service attribute information, the processor specifically executes:

Performing voice recognition on the first voice data to obtain a first keyword associated with a service in the first voice data, and determining the first service attribute information according to the first keyword; or,

Converting the first voice data to obtain first text data corresponding to the first voice data;

Performing keyword extraction on the first text data to obtain a second keyword associated with a service in the first text data; and determining the first service attribute information according to the second keyword.
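The second alternative above (text conversion followed by keyword extraction) can be sketched in Python, assuming the voice data has already been converted to text by some speech-to-text step that is not shown. The keyword table, its entries, and the `attributes_from_text` function are hypothetical. The application does not specify how keywords map to attribute values.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping from service keywords to service attribute information.
KEYWORD_TO_ATTRIBUTES: Dict[str, dict] = {
    "loan": {"service_id": "loan", "service_level": 3},
    "balance": {"service_id": "balance_inquiry", "service_level": 1},
}


def attributes_from_text(text: str) -> Optional[dict]:
    """Return attributes for the first service keyword found in the text, if any."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KEYWORD_TO_ATTRIBUTES:
            return KEYWORD_TO_ATTRIBUTES[word]
    return None
```

A simple table lookup is used here only to make the data flow visible. Any keyword extraction technique could take its place.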
The computer device according to claim 9, wherein the first service attribute information includes the service level of the first service, and when determining, from the shared recognition engine set, the recognition engine matching the first service attribute information as the target recognition engine, the processor specifically executes: obtaining the recognition level of the recognition engines in the shared recognition engine set, where the recognition level of a recognition engine reflects the accuracy with which that recognition engine recognizes multimedia data; and determining the recognition engine in the shared recognition engine set whose recognition level matches the service level of the first service as the target recognition engine; or,

The first service attribute information includes the business income of the first service, and when determining, from the shared recognition engine set, the recognition engine matching the first service attribute information as the target recognition engine, the processor specifically executes: obtaining the recognition cost of the recognition engines in the shared recognition engine set; and determining the recognition engine in the shared recognition engine set whose recognition cost matches the business income of the first service as the target recognition engine.
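The two matching alternatives can be illustrated with a short Python sketch. The claim does not define what "matches" means, so the rules below are assumptions: for the service level, the least capable engine whose recognition level is at least the service level; for business income, the most accurate engine whose per-call cost stays within a fixed fraction of the income. The `Engine` fields, function names, and the `max_cost_ratio` value are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Engine:
    name: str
    recognition_level: int   # higher means more accurate (assumed scale)
    cost_per_call: float     # recognition cost (assumed unit)


def select_by_level(engines: List[Engine], service_level: int) -> Engine:
    """Assumed rule: lowest adequate recognition level, ties broken by cost."""
    candidates = [e for e in engines if e.recognition_level >= service_level]
    if not candidates:
        raise LookupError("no engine meets the service level")
    return min(candidates, key=lambda e: (e.recognition_level, e.cost_per_call))


def select_by_income(
    engines: List[Engine], business_income: float, max_cost_ratio: float = 0.1
) -> Engine:
    """Assumed rule: most accurate engine costing at most a fraction of the income."""
    budget = business_income * max_cost_ratio
    affordable = [e for e in engines if e.cost_per_call <= budget]
    if not affordable:
        raise LookupError("no engine fits the budget")
    return max(affordable, key=lambda e: e.recognition_level)
```

Either rule keeps a low-value service from occupying the most expensive engine, which is the resource-saving effect described in the abstract.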
The computer device according to claim 9, wherein the second multimedia data includes first video data and second voice data; and when sending the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service, the processor specifically executes:

Acquiring a first image of a user corresponding to the terminal according to the first video data;

Sending the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image and, when the terminal is legitimate, uses the target recognition engine to recognize the first video data and the second voice data and process the first service.
The computer device according to claim 13, wherein the processor is further configured to execute:

If warning information sent by the first service platform and indicating that the terminal is not legitimate is obtained, outputting adjustment information instructing the user to adjust their posture;

Acquiring third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data;

Acquiring a second image of the user according to the third video data;

Sending the second image to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the second image.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to perform the following method:

Acquiring first multimedia data about the first service from the terminal;

Identifying the first multimedia data to obtain first service attribute information, where the first service attribute information includes at least one of the service level of the first service or the business income of the first service;

Determining a recognition engine matching the first service attribute information from the shared recognition engine set as the target recognition engine;

Outputting prompt information about processing the first service;

Acquiring the second multimedia data sent for the prompt information from the terminal;

Sending the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service.
The computer-readable storage medium according to claim 15, wherein the first service attribute information further includes an identifier of the first service, and when outputting the prompt information about processing the first service, the program instructions specifically cause the processor to execute:

Determining, according to the identifier of the first service, the first service platform that processes the first service;

Acquiring the prompt information about processing the first service from the first service platform;

Outputting the prompt information.
The computer-readable storage medium according to claim 15, wherein the first multimedia data includes first voice data, and when identifying the first multimedia data to obtain the first service attribute information, the program instructions specifically cause the processor to execute:

Performing voice recognition on the first voice data to obtain a first keyword associated with a service in the first voice data, and determining the first service attribute information according to the first keyword; or,

Converting the first voice data to obtain first text data corresponding to the first voice data;

Performing keyword extraction on the first text data to obtain a second keyword associated with a service in the first text data; and determining the first service attribute information according to the second keyword.
The computer-readable storage medium according to claim 15, wherein the first service attribute information includes the service level of the first service, and when determining, from the shared recognition engine set, the recognition engine matching the first service attribute information as the target recognition engine, the program instructions specifically cause the processor to execute: obtaining the recognition level of the recognition engines in the shared recognition engine set, where the recognition level of a recognition engine reflects the accuracy with which that recognition engine recognizes multimedia data; and determining the recognition engine in the shared recognition engine set whose recognition level matches the service level of the first service as the target recognition engine; or,

The first service attribute information includes the business income of the first service, and when determining, from the shared recognition engine set, the recognition engine matching the first service attribute information as the target recognition engine, the program instructions specifically cause the processor to execute: obtaining the recognition cost of the recognition engines in the shared recognition engine set; and determining the recognition engine in the shared recognition engine set whose recognition cost matches the business income of the first service as the target recognition engine.
The computer-readable storage medium according to claim 15, wherein the second multimedia data includes first video data and second voice data; and when sending the second multimedia data to the first service platform, so that the first service platform uses the target recognition engine to recognize the second multimedia data and process the first service, the program instructions specifically cause the processor to execute:

Acquiring a first image of a user corresponding to the terminal according to the first video data;

Sending the first image, the first video data, and the second voice data to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the first image and, when the terminal is legitimate, uses the target recognition engine to recognize the first video data and the second voice data and process the first service.
The computer-readable storage medium according to claim 19, wherein the program instructions when executed by the processor are also used to cause the processor to execute:

If warning information sent by the first service platform and indicating that the terminal is not legitimate is obtained, outputting adjustment information instructing the user to adjust their posture;

Acquiring third multimedia data sent by the terminal for the adjustment information, where the third multimedia data includes third video data;

Acquiring a second image of the user according to the third video data;

Sending the second image to the first service platform, so that the first service platform verifies the legitimacy of the terminal according to the second image.