CN109559744B

CN109559744B - Voice data processing method and device and readable storage medium

Info

Publication number: CN109559744B
Application number: CN201811517309.2A
Authority: CN
Inventors: 周笑涵
Original assignee: Taikang Health Industry Investment Holdings Co ltd; Taikang Insurance Group Co Ltd
Current assignee: Taikang Health Industry Investment Holdings Co ltd; Taikang Insurance Group Co Ltd
Priority date: 2018-12-12
Filing date: 2018-12-12
Publication date: 2022-07-08
Anticipated expiration: 2038-12-12
Also published as: CN109559744A

Abstract

A voice data processing method and device and a readable storage medium are provided. Voice data to be processed is received; voice recognition is performed on it to determine the corresponding voice keyword information; the type of language recognition support package corresponding to the voice data is determined from the voice keyword information; and the voice data is converted according to the determined language recognition support package to obtain processed voice conversion data. Because the voice keyword information is obtained by filtering, and the language recognition support package that best matches the voice data is selected from the different types of packages according to that information, the voice data is processed with the matching package automatically. The flow is simple and convenient to use.

Description

Voice data processing method and device and readable storage medium

Technical Field

The present invention relates to computer technologies, and in particular, to a method and an apparatus for processing voice data, and a readable storage medium.

Background

With the development of electronic technology, paperless offices are becoming a trend, and voice-driven office work is one way to achieve them.

Working by voice requires speech recognition, and because different services have different requirements, they use different types of language support packages for recognition. In the prior art, the user must first select the language support package that matches the speech to be input and only then speak, so that the service platform can process the speech with the selected package.

This voice data processing flow is cumbersome and inconvenient to use.

Disclosure of Invention

In the prior art, voice input can begin only after the user has selected the language support package that matches the content of the intended speech, which makes the processing flow complicated and inconvenient. To address this problem, the present invention provides a method and an apparatus for processing voice data, and a readable storage medium.

In one aspect, the present invention provides a method for processing voice data, including:

receiving voice data to be processed;

performing voice recognition on the voice data to be processed to determine voice keyword information corresponding to the voice data;

determining, according to the voice keyword information, the type of language recognition support package corresponding to the voice data to be processed; and

converting the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.

In an optional implementation, determining, according to the voice keyword information, the type of language recognition support package corresponding to the voice data to be processed includes:

selecting, by using a preset naive Bayes algorithm, the language recognition support package that best matches the voice keyword information from among the various types of language recognition support packages, as the language recognition support package for the voice data to be processed.

In an optional implementation, selecting, by using the preset naive Bayes algorithm, the language recognition support package that best matches the voice keyword information from among the various types of language recognition support packages, as the language recognition support package for the voice data to be processed, includes:

determining, according to a preset keyword vector table, the probability that each keyword in the voice keyword information belongs to each type of language recognition support package; and

determining, from these per-keyword probabilities, the probability that the voice data to be processed belongs to each type of language recognition support package, and selecting the package with the highest probability as the language recognition support package for the voice data to be processed.
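Written as a formula, the selection above corresponds to the standard multinomial naive Bayes decision rule. As an assumption not stated in the text, let C be the set of package types, P(c) a prior for package type c, K the set of distinct keywords found, and n_w the number of occurrences of keyword w. The second line rewrites the rule for a keyword vector table that stores P(c | w), the probability that keyword w belongs to package c; by Bayes' theorem, log P(w | c) = log P(c | w) + log P(w) - log P(c), and the log P(w) terms do not depend on c, so the maximiser is unchanged:

```latex
\begin{aligned}
\hat{c} &= \arg\max_{c \in \mathcal{C}} \Big[ \log P(c) + \sum_{w \in K} n_w \log P(w \mid c) \Big] \\
        &= \arg\max_{c \in \mathcal{C}} \Big[ \log P(c) + \sum_{w \in K} n_w \big( \log P(c \mid w) - \log P(c) \big) \Big]
\end{aligned}
```

With uniform priors, the rule reduces to picking the package that maximises the count-weighted sum of log P(c | w).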

In an optional implementation, performing voice recognition on the voice data to be processed to determine the corresponding voice keyword information includes:

filtering the voice data according to a pre-stored vocabulary package to determine the keywords in the voice data to be processed; and

determining at least one of: the number of distinct keywords appearing in the voice data to be processed, the professional type of each keyword, the number of times each keyword appears in the voice data, and the weight value of each keyword.

Correspondingly, determining the type of language recognition support package corresponding to the voice data to be processed according to the voice keyword information includes: determining that type according to at least one of the number of distinct keywords appearing in the voice data to be processed, the professional type of each keyword, the number of times each keyword appears in the voice data, and the weight value of each keyword.

In an optional implementation, the types of language recognition support package include: a medical language recognition support package, a customer service language recognition support package, and a conference language recognition support package.

In an optional implementation, after the processed voice conversion data is obtained, the method further includes:

storing the voice data to be processed in association with the corresponding voice conversion data.

In an optional implementation, receiving the voice data to be processed includes:

calling an interface of the service platform to receive the voice data to be processed that each client has uploaded to the service platform.

In another aspect, the present invention provides a voice data processing apparatus, including:

a receiving module, configured to receive voice data to be processed;

a recognition module, configured to perform voice recognition on the voice data to be processed to determine the corresponding voice keyword information; and

a language package conversion module, configured to determine, according to the voice keyword information, the type of language recognition support package corresponding to the voice data to be processed, and further configured to convert the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.

In yet another aspect, the present invention provides an electronic device, comprising: a memory, a processor coupled to the memory, and a computer program stored on the memory and executable on the processor,

wherein the processor, when executing the computer program, performs any of the methods described above.

In a final aspect, the invention provides a readable storage medium storing a computer program which, when executed by a processor, carries out any of the methods described above.

Drawings

Certain embodiments of the disclosure are shown in the drawings and described in more detail below. The drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate those concepts to those skilled in the art by reference to specific embodiments.

FIG. 1 is a schematic diagram of a network architecture on which the present invention is based;

FIG. 2 is a flowchart of a method for processing voice data according to a first embodiment of the present invention;

FIG. 3 is a flowchart of a method for processing voice data according to a second embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a voice data processing apparatus according to a third embodiment of the present invention;

FIG. 5 is a hardware schematic diagram of a voice data processing apparatus according to a fourth embodiment of the present invention.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.

In view of the above technical problem, the present invention provides a method and an apparatus for processing voice data, and a readable storage medium. They can be widely applied in scenarios that use voice for office work or for intelligent data entry and storage, including but not limited to: voice recording of meetings, medical case entry, and voice customer service data entry.

FIG. 1 is a schematic diagram of the network architecture on which the present invention is based. As shown in FIG. 1, the architecture includes at least a service platform 1, a voice data processing apparatus 2, and a terminal 3. The service platform 1 connects to and exchanges data with the terminal 3 and the voice data processing apparatus 2 over wireless communication. The user uploads input speech to the service platform 1 through the terminal 3; the voice data processing apparatus 2 obtains the voice data from the service platform, processes it, and stores the processed data on the service platform 1 for subsequent use. Because the voice data processing apparatus 2 can process many types of voice data, the user may upload any of these types to the service platform 1 through the terminal 3 for processing.

FIG. 2 is a flowchart of a method for processing voice data according to the first embodiment of the present invention.

As shown in FIG. 2, the method for processing voice data includes:

Step 101, receiving voice data to be processed.

Step 102, performing voice recognition on the voice data to be processed to determine voice keyword information corresponding to the voice data.

Step 103, determining, according to the voice keyword information, the type of language recognition support package corresponding to the voice data to be processed.

Step 104, converting the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.
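As a rough illustration of how steps 101 to 104 fit together, the sketch below wires them into one function. All names (`process_voice_data`, the callable type aliases, the package keys) are hypothetical, and the real recognition and conversion engines are replaced by injected callables, since the text does not specify them.

```python
from typing import Callable, Dict

# Hypothetical signatures: the text does not say how recognition, selection
# or conversion are implemented, so each step is injected as a callable.
KeywordRecognizer = Callable[[bytes], Dict[str, int]]  # audio -> {keyword: count}
PackageSelector = Callable[[Dict[str, int]], str]      # keyword info -> package type
PackageConverter = Callable[[bytes], str]              # audio -> converted text


def process_voice_data(
    audio: bytes,
    recognize_keywords: KeywordRecognizer,
    select_package: PackageSelector,
    converters: Dict[str, PackageConverter],
) -> str:
    """Run steps 101-104 on one piece of received voice data."""
    # Step 101 happens in the caller: `audio` is the received voice data.
    # Step 102: voice recognition yields the voice keyword information.
    keyword_info = recognize_keywords(audio)
    # Step 103: choose the language recognition support package type.
    package_type = select_package(keyword_info)
    # Step 104: convert the audio with the converter for that package.
    return converters[package_type](audio)
```

Injecting the steps keeps package selection independent of any particular speech engine; supporting a new package type only means adding its converter to `converters`.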

The method for processing voice data provided by the present invention may be executed by the voice data processing apparatus shown in FIG. 1.

Specifically, the user first uploads voice data, either captured in real time or pre-recorded, to the service platform through the terminal. The voice data processing apparatus then either fetches the voice data to be processed from the service platform or receives voice data that the service platform actively pushes to it.

Next, the voice data processing apparatus performs voice recognition on the voice data to be processed to determine the corresponding voice keyword information, and determines from that information the type of language recognition support package corresponding to the voice data.

Unlike the prior art, in this embodiment the voice data processing apparatus can process multiple types of voice data, so multiple types of language recognition support packages are deployed in advance, including at least a medical language recognition support package, a customer service language recognition support package, and a conference language recognition support package. Each type of package converts speech to text on the basis of a different professional vocabulary, which yields more accurate conversion data. Using the voice keyword information, the apparatus determines which of these packages best matches the voice data to be processed and uses it for recognition. The voice keyword information may include one or more of: the number of distinct keywords appearing in the voice data to be processed, the professional type to which each keyword belongs, and the number of times each keyword appears in the voice data.

Performing voice recognition on the voice data to be processed to determine the corresponding voice keyword information may specifically include: filtering the voice data according to a pre-stored vocabulary package to determine the keywords in the voice data to be processed, and determining at least one of the number of distinct keywords appearing in the voice data, the professional type of each keyword, the number of times each keyword appears in the voice data, and the weight value of each keyword. Correspondingly, the type of language recognition support package corresponding to the voice data to be processed is determined according to at least one of these items, for example by computing a composite score with a preset algorithm.
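The text leaves the "preset algorithm" for the composite score unspecified. One possible composite score is sketched below under stated assumptions: each keyword adds its occurrence count times its weight value, plus a fixed bonus per distinct keyword, to the score of its professional type. The vocabulary format, the `distinct_bonus` parameter and both function names are hypothetical, and this weighted vote is a swapped-in illustration, not the naive Bayes method of the claims.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical vocabulary format: keyword -> (professional type, weight value).
Vocabulary = Dict[str, Tuple[str, float]]


def composite_scores(
    keyword_counts: Dict[str, int],
    vocabulary: Vocabulary,
    distinct_bonus: float = 1.0,
) -> Dict[str, float]:
    """Score each professional type from the keyword statistics.

    Each keyword contributes (occurrences * weight) plus a fixed bonus for
    being a distinct keyword of its type. This formula is an illustrative
    assumption, not the patent's preset algorithm.
    """
    scores: Dict[str, float] = defaultdict(float)
    for word, count in keyword_counts.items():
        if word not in vocabulary:
            continue  # not a keyword of any professional type
        specialty, weight = vocabulary[word]
        scores[specialty] += count * weight + distinct_bonus
    return dict(scores)


def pick_package(
    keyword_counts: Dict[str, int],
    vocabulary: Vocabulary,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the type with the highest composite score, or `default`."""
    scores = composite_scores(keyword_counts, vocabulary)
    if not scores:
        return default  # no known keywords were found
    return max(scores, key=scores.__getitem__)
```

For example, with "diagnosis" weighted 2.0 as medical and "agenda" weighted 1.0 as conference, three occurrences of "diagnosis" and one of "agenda" score 7.0 for medical and 2.0 for conference.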

Finally, the voice data processing apparatus converts the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.

Take a medical scenario as an example. When a clinician needs to write documents such as medical records, the clinician can turn on the intelligent voice software or the microphone switch and speak within the microphone's effective range. The terminal uploads the speech to the service platform; the voice data processing apparatus recognizes the voice data, automatically invokes the medical language recognition support package, and converts the speech into text that is displayed in the document. The resulting text can be stored centrally or further analyzed through the service platform.

In the method for processing voice data provided by this embodiment, the voice data to be processed is received; voice recognition is performed on it to determine the corresponding voice keyword information; the type of language recognition support package corresponding to the voice data is determined from that information; and the voice data is converted according to the determined package to obtain processed voice conversion data. Because the voice keyword information is obtained by filtering and used to select, from the different types of language recognition support packages, the one that best matches the voice data, the voice data is processed with that package automatically, which simplifies the flow and makes the method convenient to use.

On the basis of the first embodiment, and to further improve the conversion accuracy of voice data, FIG. 3 is a flowchart of a method for processing voice data according to the second embodiment of the present invention. As shown in FIG. 3, the method includes:

Step 201, calling an interface of the service platform to receive the voice data to be processed that each client has uploaded to the service platform.

Step 202, performing voice recognition on the voice data to be processed to determine the corresponding voice keyword information.

Step 203, selecting, by using a preset naive Bayes algorithm, the language recognition support package that best matches the voice keyword information from among the various types of language recognition support packages, as the language recognition support package for the voice data to be processed.

Step 204, converting the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.

Step 205, storing the voice data to be processed in association with the corresponding voice conversion data, so that the service platform can manage the voice data.

As in the first embodiment, the voice data processing method of this embodiment may be executed by the voice data processing apparatus shown in FIG. 1.

In this embodiment, after the user uploads voice data, captured in real time or pre-recorded, to the service platform through the terminal, the voice data processing apparatus calls an interface of the service platform to receive the voice data to be processed that each client has uploaded. To make it easy for the apparatus to fetch and receive this data, the service platform reserves a uniform interface from which the apparatus can retrieve the voice data.
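The text states only that the service platform reserves a uniform interface for the processing apparatus. The sketch below models that idea with an in-process, thread-safe queue; `ServicePlatform`, `VoiceUpload`, `upload` and `fetch_pending` are hypothetical names, and a real deployment would presumably expose this interface over a network API instead.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceUpload:
    client_id: str  # which terminal/client uploaded the data
    audio: bytes    # raw voice data to be processed


class ServicePlatform:
    """Hypothetical platform with one uniform interface for pending uploads."""

    def __init__(self) -> None:
        # queue.Queue is thread-safe, so terminals and the processing
        # apparatus may call in from different threads.
        self._pending: "queue.Queue[VoiceUpload]" = queue.Queue()

    def upload(self, client_id: str, audio: bytes) -> None:
        """Called on behalf of a client terminal to submit voice data."""
        self._pending.put(VoiceUpload(client_id, audio))

    def fetch_pending(self) -> List[VoiceUpload]:
        """The uniform interface: drain every upload not yet processed."""
        items: List[VoiceUpload] = []
        while True:
            try:
                items.append(self._pending.get_nowait())
            except queue.Empty:
                return items
```

Because every client goes through `upload` and the apparatus only calls `fetch_pending`, the apparatus does not need to know how many terminals exist.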

In this embodiment, the voice data processing apparatus stores a vocabulary package in which each word has its own professional type (medical, customer service, or conference vocabulary) and a corresponding weight value. The apparatus filters the voice data to be processed with the vocabulary package to determine the keywords that appear in it, and then counts the number of distinct keywords, the professional type of each keyword, and the number of times each keyword appears in the voice data.
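A minimal sketch of this filtering and counting step follows. It assumes (the text does not say so) that a general-purpose recognizer has already turned the speech into a sequence of word tokens, which for Chinese speech would also require word segmentation. The vocabulary layout, `KeywordInfo` and `extract_keyword_info` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical vocabulary package: keyword -> (professional type, weight value).
Vocabulary = Dict[str, Tuple[str, float]]


@dataclass
class KeywordInfo:
    counts: Dict[str, int]      # occurrences of each keyword
    specialty: Dict[str, str]   # professional type of each keyword
    weight: Dict[str, float]    # weight value of each keyword

    @property
    def distinct(self) -> int:
        """Number of different keywords found in the voice data."""
        return len(self.counts)


def extract_keyword_info(tokens: Iterable[str], vocabulary: Vocabulary) -> KeywordInfo:
    """Filter recognized tokens through the vocabulary package and count them."""
    # Tokens absent from the vocabulary package are discarded here.
    counts = Counter(t for t in tokens if t in vocabulary)
    return KeywordInfo(
        counts=dict(counts),
        specialty={w: vocabulary[w][0] for w in counts},
        weight={w: vocabulary[w][1] for w in counts},
    )
```

The resulting `KeywordInfo` carries exactly the statistics listed above, so any of the package-selection rules described in this document can consume it.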

Next, the apparatus uses the voice keyword information and a preset algorithm to obtain the best-matching language recognition support package. Specifically, the voice data processing apparatus also pre-stores a keyword vector table that indicates the probability that each keyword belongs to each type of language recognition support package. Using this table, the apparatus determines the probability that each keyword in the voice keyword information belongs to each type of package. Combining these probabilities with the number of times each keyword occurs in the whole of the voice data to be processed and the number of keywords in that voice data, it then selects the language recognition support package with the highest probability for the voice data to be processed. In other words, the apparatus weights each language support package according to how frequently its words occur in the analyzed speech, matches the packages against these aggregated usage frequencies, and thereby optimizes the choice of package.
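The selection just described can be sketched as a multinomial naive Bayes decision. The sketch assumes the keyword vector table stores P(c | w), the probability that keyword w belongs to package c, and that package priors are uniform unless given; since log P(w | c) = log P(c | w) + log P(w) - log P(c), scoring log P(c) + sum of n_w * (log P(c | w) - log P(c)) picks the same package as the usual rule. The table layout, `select_package_nb` and the `floor` smoothing value are hypothetical.

```python
import math
from typing import Dict, Optional

# keyword_table[w][c] is read as P(c | w): the probability that keyword w
# belongs to language recognition support package c (assumed table layout).
KeywordTable = Dict[str, Dict[str, float]]


def select_package_nb(
    keyword_counts: Dict[str, int],
    keyword_table: KeywordTable,
    priors: Optional[Dict[str, float]] = None,
    floor: float = 1e-6,
) -> str:
    """Naive Bayes choice of package from per-keyword probabilities.

    Maximises log P(c) + sum_w n_w * (log P(c|w) - log P(c)), which equals
    the usual multinomial naive Bayes rule up to a term independent of c.
    """
    packages = sorted({c for row in keyword_table.values() for c in row})
    if not packages:
        raise ValueError("keyword table is empty")
    if priors is None:
        priors = {c: 1.0 / len(packages) for c in packages}  # uniform prior
    best, best_score = packages[0], -math.inf
    for c in packages:
        log_prior = math.log(priors[c])
        score = log_prior
        for word, n in keyword_counts.items():
            row = keyword_table.get(word)
            if row is None:
                continue  # unknown keyword: contributes no evidence
            # `floor` avoids log(0) when a keyword never maps to package c.
            p = max(row.get(c, 0.0), floor)
            score += n * (math.log(p) - log_prior)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause for long recordings, and the occurrence counts n_w implement the frequency weighting the text describes.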

Further, because the voice data processing apparatus can process multiple types of voice data, multiple types of language recognition support packages are deployed in it, including at least a medical language recognition support package, a customer service language recognition support package, and a conference language recognition support package.

Finally, the voice data processing apparatus converts the voice data to be processed according to the determined language recognition support package and, after obtaining the processed voice conversion data, stores the voice data to be processed in association with the corresponding voice conversion data for the service platform to manage.
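One simple way to realize this associated storage is to keep the original audio, the package used, and the converted text in the same database row. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `voice_records` schema and the helper names are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store keeping audio and its conversion together."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS voice_records (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               package_type TEXT NOT NULL,  -- package used for conversion
               audio BLOB NOT NULL,         -- original voice data
               transcript TEXT NOT NULL     -- voice conversion data
           )"""
    )
    return conn


def store_record(conn: sqlite3.Connection, audio: bytes,
                 package_type: str, transcript: str) -> int:
    """Store audio and its conversion in a single row; return the row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO voice_records (package_type, audio, transcript) "
            "VALUES (?, ?, ?)",
            (package_type, audio, transcript),
        )
    return cur.lastrowid


def load_record(conn: sqlite3.Connection,
                record_id: int) -> Optional[Tuple[bytes, str, str]]:
    """Fetch (audio, package_type, transcript), or None if absent."""
    return conn.execute(
        "SELECT audio, package_type, transcript FROM voice_records WHERE id = ?",
        (record_id,),
    ).fetchone()
```

Keeping both items in one row means the platform can never hold a transcript whose source audio has been lost, or vice versa.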

As in the first embodiment, the voice data processing method of this embodiment receives the voice data to be processed, performs voice recognition on it to determine the corresponding voice keyword information, determines from that information the type of language recognition support package corresponding to the voice data, and converts the voice data according to the determined package to obtain processed voice conversion data. Because the keyword information is obtained by filtering and used to select the best-matching package from the different types, the voice data is processed with that package automatically; the flow is simple and convenient to use.

FIG. 4 is a schematic structural diagram of a voice data processing apparatus according to the third embodiment of the present invention. As shown in FIG. 4, the voice data processing apparatus includes:

a receiving module 10, configured to receive voice data to be processed;

a recognition module 20, configured to perform voice recognition on the voice data to be processed to determine the corresponding voice keyword information; and

a language package conversion module 30, configured to determine, according to the voice keyword information, the type of language recognition support package corresponding to the voice data to be processed, and further configured to convert the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data.

In an optional implementation, the language package conversion module 30 is specifically configured to: determine, according to a preset keyword vector table, the probability that each keyword in the voice keyword information belongs to each type of language recognition support package; and determine, according to the probability of each keyword, the probability that the voice data to be processed belongs to each type of language recognition support package, and select the package with the highest probability as the language recognition support package for the voice data to be processed.

In an optional implementation, the recognition module 20 is specifically configured to: filter the voice data according to a pre-stored vocabulary package to determine the keywords in the voice data to be processed; and determine the number of distinct keywords appearing in the voice data to be processed, the professional type of each keyword, and the number of times each keyword appears in the voice data.

Correspondingly, determining the type of language recognition support package corresponding to the voice data to be processed according to the voice keyword information includes: determining that type according to the number of distinct keywords appearing in the voice data to be processed, the professional type of each keyword, and the number of times each keyword appears in the voice data.

In an optional implementation, the apparatus further includes a storage unit, configured to store, after the processed voice conversion data is obtained, the voice data to be processed in association with the corresponding voice conversion data, so that the service platform can manage them.

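In an optional implementation, the receiving module 10 is specifically configured to call an interface of the service platform to receive the voice data to be processed that each client has uploaded to the service platform.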

The voice data processing apparatus provided by this embodiment receives the voice data to be processed, performs voice recognition on it to determine the corresponding voice keyword information, determines from that information the type of language recognition support package corresponding to the voice data, and converts the voice data according to the determined package to obtain processed voice conversion data. Because the keyword information is obtained by filtering and used to select the best-matching package from the different types, the voice data is processed with that package automatically; the flow is simple and convenient to use.

FIG. 5 is a hardware schematic diagram of a voice data processing apparatus according to the fourth embodiment of the present invention. As shown in FIG. 5, the apparatus includes a memory 41, a processor 42, and a computer program stored on the memory 41 and executable on the processor 42; when executing the computer program, the processor 42 performs the method of the above embodiments.

The present invention also provides a readable storage medium comprising a program which, when run on a terminal, causes the terminal to perform the method of any of the above embodiments.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the instruction of a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for processing voice data, comprising:

receiving voice data to be processed;

performing voice recognition on the voice data to be processed to determine voice keyword information corresponding to the voice data, wherein the voice keyword information includes: the keywords in the voice data to be processed and the professional type to which each keyword belongs;

selecting, by using a preset naive Bayes algorithm, the language recognition support package that best matches the voice keyword information from among various types of language recognition support packages, as the language recognition support package for the voice data to be processed; and

converting the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data;

wherein selecting, by using the preset naive Bayes algorithm, the language recognition support package that best matches the voice keyword information from among the various types of language recognition support packages, as the language recognition support package for the voice data to be processed, comprises:

selecting the language recognition support package with the highest probability as the package for the voice data to be processed, according to the probability that each keyword in the voice keyword information belongs to each type of language recognition support package, the number of distinct keywords appearing in the voice data to be processed, and the number of times each keyword appears in the voice data to be processed;

and wherein performing voice recognition on the voice data to be processed to determine the voice keyword information corresponding to the voice data comprises:

determining the professional type of each keyword appearing in the voice data to be processed, and determining the number of distinct keywords appearing in the voice data to be processed, the number of times each keyword appears in the voice data, and the weight value of each keyword.

2. The method for processing voice data according to claim 1, wherein the types of language recognition support package comprise: a medical language recognition support package, a customer service language recognition support package, and a conference language recognition support package.

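3. The method for processing voice data according to claim 1 or 2, wherein after the processed voice conversion data is obtained, the method further comprises: storing the voice data to be processed in association with the corresponding voice conversion data.

4. The method for processing voice data according to claim 1 or 2, wherein receiving the voice data to be processed comprises: calling an interface of a service platform to receive the voice data to be processed that each client has uploaded to the service platform.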

5. An apparatus for processing voice data, comprising:

a receiving module, configured to receive voice data to be processed;

a recognition module, configured to perform voice recognition on the voice data to be processed to determine voice keyword information corresponding to the voice data, wherein the voice keyword information includes: the keywords in the voice data to be processed and the professional type to which each keyword belongs; and

a language package conversion module, configured to select, according to the voice keyword information, the language recognition support package that best matches the voice keyword information from among various types of language recognition support packages, as the language recognition support package for the voice data to be processed, and further configured to convert the voice data to be processed according to the determined language recognition support package to obtain processed voice conversion data;

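wherein the language package conversion module is specifically configured to: determine, according to a preset keyword vector table, the probability that each keyword in the voice keyword information belongs to each type of language recognition support package; and select the language recognition support package with the highest probability as the package for the voice data to be processed, according to that probability, the number of distinct keywords appearing in the voice data to be processed, and the number of times each keyword appears in the voice data to be processed;

and the recognition module is specifically configured to: determine the professional type of each keyword appearing in the voice data to be processed, and determine the number of distinct keywords appearing in the voice data to be processed, the number of times each keyword appears in the voice data, and the weight value of each keyword.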

6. An electronic device, comprising: a memory, a processor coupled to the memory, and a computer program stored on the memory and executable on the processor,

the processor, when executing the computer program, performs the method of any of claims 1-4.

7. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-4.