CN111865752A - Text processing device, method, electronic device and computer readable storage medium
- Publication number
- CN111865752A CN111865752A CN201910330853.4A CN201910330853A CN111865752A CN 111865752 A CN111865752 A CN 111865752A CN 201910330853 A CN201910330853 A CN 201910330853A CN 111865752 A CN111865752 A CN 111865752A
- Authority
- CN
- China
- Prior art keywords
- text
- style
- sample
- target language
- message text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/04—Real-time or near real-time messaging, e.g. instant messaging [IM]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
The embodiments of the present application provide a text processing apparatus, a text processing method, an electronic device, and a computer-readable storage medium, wherein the method includes: acquiring a first IM message text, generating a feature vector corresponding to the text style of the first IM message text, and generating a feature vector for each vocabulary in the first IM message text; inputting the feature vector corresponding to the text style and the feature vectors corresponding to the vocabularies into a language style conversion model corresponding to a target language style to obtain a second IM message text matching the target language style; and determining, based on the context of the vocabularies in the second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group, the probabilities being used by a speech recognition model for speech recognition. The embodiments of the present application can improve the recognition accuracy of the speech recognition model.
Description
Technical Field
The present application relates to the field of information technology, and in particular, to a text processing apparatus, a text processing method, an electronic device, and a computer-readable storage medium.
Background
With the continued rapid development of automotive electronics, travel modes such as taxi-hailing and booking private cars have developed greatly; they play an irreplaceable role in people's daily life and travel and bring great convenience to daily life and transportation.
At present, in the protection system of a travel service platform, the call recording between a driver and a passenger is generally recognized by a speech recognition model so as to determine whether the passenger or the driver faces a riding risk and to effectively help the driver or the passenger avoid the risk. The recognition effect of the speech recognition model directly affects the speech recognition result, and the recognition effect of the speech recognition model is related to the amount of training data available to the model.
When training data is obtained, the recording data is generally labeled with text manually. Manually labeling the text corresponding to the recording data is inefficient, and the amount of sample data that can be obtained is limited, so the recognition accuracy of the trained speech recognition model is low, and the accuracy of speech recognition based on that model is also low. Therefore, a method capable of improving the recognition accuracy of the speech recognition model is required.
Disclosure of Invention
In view of the above, an object of the present application is to provide a text processing apparatus, a text processing method, an electronic device, and a computer-readable storage medium, so as to improve the recognition accuracy when speech recognition is performed with a speech recognition model.
In a first aspect, an embodiment of the present application provides a text processing apparatus, where the apparatus includes:
the acquisition module is used for acquiring a first Instant Messaging (IM) message text between the service request end and the service providing end and transmitting the acquired first IM message text to the generation module;
the generating module is used for generating a characteristic vector corresponding to the text style of the first IM message text, respectively generating a characteristic vector for each vocabulary in the first IM message text, and transmitting the generated characteristic vectors to the converting module;
the conversion module is used for inputting the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary generated by the generation module into a language style conversion model corresponding to a target language style to obtain a second IM message text matched with the target language style, and transmitting the second IM message text to the probability determination module;
a probability determination module, configured to determine, based on the context of each vocabulary in the second IM message text obtained by the conversion module, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group; the probabilities are used by the speech recognition model for speech recognition.
Optionally, the conversion module is specifically configured to:
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into an encoder of the language style conversion model according to the context of the corresponding vocabularies in the first IM message text for semantic feature extraction to obtain the feature extraction vectors corresponding to the first IM message text;
and inputting the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style into a generator of the language style conversion model to obtain a second IM message text matched with the target language style.
Optionally, the apparatus further comprises: a training module to:
constructing a sample training library, wherein the sample training library comprises a first sample IM message text and a corresponding artificially labeled sample text style;
inputting the feature vectors corresponding to the sample text style and the feature vectors corresponding to each sample word in the first sample IM message text into an initial encoder of an initial language style conversion model for semantic feature extraction according to the context of the corresponding sample words in the sample IM message text to obtain sample feature extraction vectors corresponding to the first sample IM message text;
Inputting a sample feature extraction vector corresponding to the first sample IM message text and a feature vector corresponding to the target language style into an initial generator of the initial language style conversion model to obtain a second sample IM message text of the target language style;
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into a text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a first probability value corresponding to the second sample IM message text of the target language style;
and adjusting model parameters of the initial language style conversion model according to the principle that the difference between the obtained first probability value and a preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted language style conversion model.
Optionally, the training module is further configured to:
acquiring a sample target language text corresponding to the target language style;
inputting the feature vectors corresponding to the sample target language text vocabularies in the sample target language text into an initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text to obtain a second probability value corresponding to the sample target language text of the target language style;
And adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained second probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Optionally, the training module is further configured to:
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into an initial text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a third probability value corresponding to the sample IM message text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained third probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Optionally, the generating module is further configured to:
and removing the interference text in the first IM message text.
Optionally, the target language style comprises a recording-transcription text language style.
Optionally, the obtaining module is further configured to obtain a manually labeled target language text corresponding to the target language style;
the probability determination module is specifically configured to determine, according to the context of each vocabulary in the target language text and the context of each vocabulary in the second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group.
In a second aspect, an embodiment of the present application provides a text processing method, where the method includes:
acquiring a first Instant Messaging (IM) message text between a service request end and a service provider end;
generating a characteristic vector corresponding to the text style of the first IM message text, and respectively generating a characteristic vector for each vocabulary in the first IM message text;
inputting the feature vector corresponding to the text style of the first IM message text and the feature vectors generated for the vocabularies into a language style conversion model corresponding to a target language style to obtain a second IM message text matching the target language style;
determining, based on the context of each vocabulary in the second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group; the probabilities are used by the speech recognition model for speech recognition.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method as described above.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the above method.
In the text processing apparatus provided in the embodiments of the present application, the generating module generates a feature vector for the text style of the instant messaging (IM) message text between the service request end and the service provider end acquired by the acquiring module, and a feature vector for each vocabulary in that text. The converting module then inputs the feature vector corresponding to the text style of the IM message text and the feature vectors corresponding to the vocabularies into a language style conversion model corresponding to a target language style, obtaining a second IM message text matching the target language style. Based on the context of each vocabulary in the second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group is determined, and the determined probabilities are used by a speech recognition model for speech recognition. In this way, when the speech recognition model performs speech recognition, it can select the vocabulary with the highest probability at the corresponding position as the recognized text. In addition, because the probabilities are determined from the second IM message text matching the target language style, the accuracy of the probability results is improved, which further improves the recognition accuracy of the speech recognition model when it recognizes speech information using these probabilities.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic diagram illustrating a first structure of a text processing apparatus according to an embodiment of the present application;
FIG. 2 is a second schematic structural diagram of a text processing apparatus according to an embodiment of the present application;
FIG. 3 is a first flowchart illustrating a text processing method according to an embodiment of the present application;
FIG. 4 is a second flowchart illustrating a text processing method according to an embodiment of the present application;
FIG. 5 is a third flowchart illustrating a text processing method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are presented in conjunction with a specific application scenario, "travel scenario". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application primarily focuses on travel scenarios, it should be understood that this is only one exemplary embodiment.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
The terms "passenger," "requestor," "service requestor," and "customer" are used interchangeably in this application to refer to an individual, entity, or tool that can request or order a service. The terms "driver," "provider," "service provider," and "provider" are used interchangeably in this application to refer to an individual, entity, or tool that can provide a service. The term "user" in this application may refer to an individual, entity or tool that requests a service, subscribes to a service, provides a service, or facilitates the provision of a service. For example, the user may be a passenger, a driver, an operator, etc., or any combination thereof. In the present application, "passenger" and "passenger terminal" may be used interchangeably, and "driver" and "driver terminal" may be used interchangeably.
The terms "service request" and "order" are used interchangeably herein to refer to a request initiated by a passenger, a service requester, a driver, a service provider, or a supplier, the like, or any combination thereof. Accepting the "service request" or "order" may be a passenger, a service requester, a driver, a service provider, a supplier, or the like, or any combination thereof. The service request may be charged or free.
The embodiments of the present application can serve a travel service platform, which provides corresponding services to a service requester according to a received travel service request. The travel service platform may include a plurality of ride-hailing systems, such as a taxi-hailing system, an express-car system, a premium-car system, a carpool (hitch ride) system, and the like.
The text processing method of the embodiment of the application can be applied to a server of a travel service platform and can also be applied to any other computing equipment with a processing function. In some embodiments, the server or computing device may include a processor. The processor may process information and/or data related to the service request to perform one or more of the functions described herein.
Currently, in the protection system of a travel service platform, the call recording between a driver and a passenger is generally recognized by using the conditional probabilities of vocabularies so as to determine whether the passenger or the driver faces a riding risk. The conditional probability of a vocabulary (i.e., the probability that each vocabulary in an instant messaging (IM) message text appears in the position immediately following another vocabulary or vocabulary group) affects the speech recognition result, and the recognition effect obtained by using the vocabulary probabilities is related to the sample data from which the probabilities are statistically estimated.
When sample texts are obtained, the recording data is generally labeled with text manually. If the call recording contains noise or is unclear, it must be labeled repeatedly to determine the specific text content, so only a small number of recording texts can be obtained in a short time and the cost of obtaining recording texts manually is high. As a result, the accuracy of the statistically obtained vocabulary probabilities is low, which further reduces the recognition accuracy of the speech recognition model when it performs speech recognition using these probabilities.
For convenience of description, the embodiments of the present application take, as an example, the IM message texts between a service request end and a service provider end in the field of travel services, so as to obtain a large number of sample texts for the vocabulary frequency statistical model and improve the accuracy of the statistical conditional probabilities of vocabularies. To this end, for the field of travel services, the generating module generates a feature vector for the text style of the instant messaging (IM) message text between the service request end and the service provider end acquired by the acquiring module, and a feature vector for each vocabulary it contains. The converting module then inputs the feature vector corresponding to the text style of the IM message text and the feature vectors corresponding to the vocabularies into a language style conversion model corresponding to a target language style, obtaining a second IM message text matching the target language style. Based on the context of each vocabulary in the obtained second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group is determined, and finally the determined probabilities are used by a speech recognition model for speech recognition. In this way, when the speech recognition model performs speech recognition, the vocabulary with the highest probability at the corresponding position can be selected as the recognized text according to these probabilities. In addition, because the probabilities are determined from the second IM message text matching the target language style, the accuracy of the probability results is improved, which further improves the recognition accuracy of the speech recognition model when it recognizes speech information using these probabilities. The embodiments of the present application are described in detail based on this idea.
In view of the above situation, an embodiment of the present application provides a text processing apparatus applied to a backend server. As shown in fig. 1, the apparatus includes an acquisition module 11, a generation module 12, a conversion module 13, and a probability determination module 14.
The obtaining module 11 is configured to obtain a first instant messaging IM message text between the service request end and the service provider end, and transmit the obtained first IM message text to the generating module 12.
Here, the first instant messaging (IM) message text is the text of the written communication between the service requester and the service provider when the travel service platform is used for a trip. For example, when the service requester books a vehicle from location A to location B on the travel service platform, after the service provider arrives at location A, the service provider may send "Hello, I have arrived, and we can start at any time." The content sent by the service provider is the first IM message text.
The generating module 12 is configured to generate a feature vector corresponding to a text style of the first IM message text, generate a feature vector for each vocabulary in the first IM message text, and transmit the generated feature vector to the converting module 13.
The generation module 12, prior to generating the feature vectors for the words in the first IM message text, is further configured to: and performing vocabulary cutting on the first IM message text to obtain each vocabulary in the first IM message text.
Here, vocabulary segmentation may be performed on the first IM message text before the feature vectors are generated. A tokenizer may be used for the segmentation; candidate tokenizers include the ansj tokenizer, the HanLP tokenizer, a general-purpose Chinese word segmenter, the oracle_sdk tokenizer, and the like, and the choice can be determined according to the actual situation. Each segmented vocabulary may contain one, two, or three characters. The feature vector of a vocabulary is generally a feature vector representing the semantics of the vocabulary and is generally obtained by inputting the vocabulary into a preset feature vector generation model; the training process of the feature vector generation model is not described in detail here. The text style of the first IM message text generally means that the content of the first IM message is typed text, and the feature vector corresponding to the text style is likewise generally obtained with a feature vector generation model and can be determined according to the actual situation.
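For illustration only, the segmentation and vector-generation step can be sketched as follows. The snippet assumes the jieba segmenter as a stand-in for the tokenizers listed above and a hypothetical pre-trained 200-dimensional word-vector model; neither choice is prescribed by this application.

```python
import jieba                      # stand-in Chinese word segmenter (assumption)
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical pre-trained 200-dimensional word vectors; any preset
# feature vector generation model trained in advance could be used instead.
w2v = KeyedVectors.load("word_vectors_200d.kv")  # path is illustrative

def segment_and_embed(im_text: str):
    """Cut an IM message text into vocabularies and embed each one."""
    words = jieba.lcut(im_text)                       # vocabulary cutting
    vectors = []
    for w in words:
        if w in w2v:
            vectors.append(w2v[w])                    # semantic feature vector
        else:
            vectors.append(np.zeros(w2v.vector_size))  # out-of-vocabulary fallback
    return words, np.stack(vectors)

words, word_vecs = segment_and_embed("我已经到达目的地")
print(words, word_vecs.shape)   # e.g. ['我', '已经', '到达', '目的地'], (4, 200)
```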
In a specific application, before vocabulary segmentation is performed on the first IM message text, the method further includes: removing the interference text from the first IM message text. The interference text is generally non-Chinese characters, such as digits, English characters, punctuation marks, and special characters. Generally, if the first IM message text includes a plurality of conversation entries sent by the service request end, the conversation entries may be concatenated in order of their generation time before the interference text is removed.
Because the first IM message text may contain not only Chinese characters but also non-Chinese characters such as English characters and punctuation marks, leaving the non-Chinese characters in place would require feeding them into the model at every training step when the language style conversion model is subsequently trained, increasing the number of training iterations and reducing training efficiency. Therefore, to improve training efficiency and to reduce the influence of digits, punctuation marks, special symbols, and the like on the feature vectors of the determined vocabularies, the non-Chinese characters need to be removed from the message text.
For example, suppose the first IM message text consists of the conversation entries "I have reached the destination." and "Good.". Concatenating the entries in order of their generation time gives "I have reached the destination. Good.". Removing the non-Chinese characters from the concatenated entries gives "I have reached the destination good", and performing vocabulary segmentation on the conversation entries with the non-Chinese characters removed yields the segmented vocabularies "I", "already", "arrived", "destination", "good".
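A minimal sketch of this preprocessing (concatenating conversation entries, removing interference text, and cutting vocabularies) is given below; the regular expression that keeps only CJK characters and the entry format are assumptions made for the example.

```python
import re
import jieba

def preprocess_im_text(entries):
    """Concatenate conversation entries by generation time, strip
    non-Chinese characters, then cut the result into vocabularies."""
    # entries: list of (generation_time, text) tuples (illustrative format)
    ordered = sorted(entries, key=lambda e: e[0])
    joined = "".join(text for _, text in ordered)
    # Keep only CJK characters; digits, English letters, punctuation and
    # special symbols are treated as interference text and removed.
    cleaned = re.sub(r"[^\u4e00-\u9fff]", "", joined)
    return jieba.lcut(cleaned)

entries = [(1, "我已经到达目的地。"), (2, "好的。")]
print(preprocess_im_text(entries))  # e.g. ['我', '已经', '到达', '目的地', '好的']
```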
A conversion module 13, configured to input the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary generated by the generation module into a language style conversion model corresponding to a target language style, to obtain a second IM message text matching the target language style, and transmit the second IM message text to the probability determination module.
Here, the text style of the first IM message text is generally a relatively formal written style. The feature vector of the text style is generally generated by a preset vector generation model and has the same dimensionality as the feature vectors of the vocabularies, for example 200 dimensions. The language style conversion model may be, but is not limited to, a convolutional neural network model or a recurrent neural network model. The target language style includes a recording-transcription text language style, i.e., the language style of text transcribed from call recordings, which is more colloquial.
The language style conversion model includes at least an encoder and a generator, with the encoder connected to the generator. The encoder performs semantic feature extraction on the feature vectors of the vocabularies in the input first IM message text and strips the text style of the first IM message text, yielding the semantic features of the first IM message text. The generator performs feature extraction on the feature extraction vector output by the encoder to obtain a second IM message text matching the target language style. The conditional probabilities of the vocabularies are then counted on the obtained second IM message text matching the target language style, which improves the accuracy of the counted conditional probabilities and therefore the recognition accuracy of speech recognition performed with those probabilities.
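One possible realization of such an encoder-generator model is sketched below. The application does not fix the network types, so the GRU encoder, the GRU generator, and the greedy output head are assumptions; only the overall structure (an encoder that strips style, a generator conditioned on the target-style vector) follows the description above.

```python
import torch
import torch.nn as nn

EMB_DIM = 200   # matches the example vector dimensionality mentioned in the text

class StyleConversionModel(nn.Module):
    """Sketch of an encoder + generator language style conversion model."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(EMB_DIM, hidden, batch_first=True)
        # Generator input: encoder features + target-style feature vector.
        self.generator = nn.GRU(hidden + EMB_DIM, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, word_vecs, style_vec, target_style_vec, max_len=30):
        # word_vecs: (B, T, EMB_DIM); style vectors: (B, EMB_DIM)
        # Prepend the style vector so the encoder sees style + words in context.
        enc_in = torch.cat([style_vec.unsqueeze(1), word_vecs], dim=1)
        enc_out, _ = self.encoder(enc_in)            # semantic feature extraction
        feat = enc_out[:, -1, :]                     # feature extraction vector
        # Condition the generator on the target language style at every step.
        gen_in = torch.cat([feat, target_style_vec], dim=-1)
        gen_in = gen_in.unsqueeze(1).repeat(1, max_len, 1)
        gen_out, _ = self.generator(gen_in)
        return self.out(gen_out)                     # (B, max_len, vocab_size)
```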
To make travel convenient, the service requester generally communicates with the service provider by voice call before the trip, and the conversation in such call speech is colloquial. In order to determine whether a passenger or a driver faces a riding risk, the call recording between the service requester and the service provider is generally recognized, and texts in the recording-transcription text language style are generally used when the call speech is recognized. However, texts in the recording-transcription text language style are scarce, so the amount of such text available for training is small and the accuracy of the statistically obtained vocabulary probabilities is affected. Therefore, in order to improve the recognition accuracy of speech recognition that uses the vocabulary probabilities, the IM message text in the written text style is converted into text in the recording-transcription text language style.
Before the language style conversion model corresponding to the target language style is used to convert the text style of the IM message text, the language style conversion model needs to be trained. Referring to fig. 2, compared with the apparatus in fig. 1, the apparatus in fig. 2 further includes a training module 15, where the training module 15 is configured to:
Constructing a sample training library, wherein the sample training library comprises a first sample IM message text and a corresponding artificially labeled sample text style;
inputting the feature vectors corresponding to the sample text style and the feature vectors corresponding to each sample word in the first sample IM message text into an initial encoder of an initial language style conversion model for semantic feature extraction according to the context of the corresponding sample words in the sample IM message text to obtain sample feature extraction vectors corresponding to the first sample IM message text;
inputting a sample feature extraction vector corresponding to the first sample IM message text and a feature vector corresponding to the target language style into an initial generator of the initial language style conversion model to obtain a second sample IM message text of the target language style;
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into a text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a first probability value corresponding to the second sample IM message text of the target language style;
And adjusting model parameters of the initial language style conversion model according to the principle that the difference between the obtained first probability value and a preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted language style conversion model.
Here, the first sample IM message text is generally obtained from the travel service platform and generally consists of a plurality of conversation entries between the service request end and the service provider end, ordered by the time of the entries; it may include conversation entries sent by the service request end to the service provider end as well as entries with which the service provider end replies to the service request end. The sample vocabularies are obtained by performing vocabulary segmentation on the first sample IM message text, and the sample target language text vocabularies are obtained by performing vocabulary segmentation on the sample target language text; the segmentation process is the same as the vocabulary segmentation of the first IM message text. The initial language style conversion model is a model that has not yet been trained or adjusted. The text style recognition model may be, but is not limited to, a text style discriminator, which judges whether the input text is in the recording-transcription text language style. The first probability value is generally a number between 0 and 1; the larger the first probability value, the higher the probability that the text input to the text style recognition model is in the target language style. The preset threshold corresponding to the target language style is generally a positive integer, preferably 1.
In a specific implementation, after the sample training library is constructed, the feature vector corresponding to the manually labeled sample text style and the feature vectors corresponding to the sample vocabularies in the first sample IM message text are input, according to the context of the corresponding sample vocabularies in the first sample IM message text, into the initial encoder of the initial language style conversion model for semantic feature extraction, and the sample feature extraction vector corresponding to the first sample IM message text is obtained. The dimensionality of the feature vectors corresponding to the sample vocabularies is the same as that of the feature vectors corresponding to the vocabularies in the first IM message text, generally 200 dimensions; however, this example of the feature vector dimensionality is only illustrative and can be determined according to the actual situation.
After the sample feature extraction vector corresponding to the first sample IM message text is obtained, that feature extraction vector and the feature vector corresponding to the recording-transcription text language style are further input into the initial generator of the initial language style conversion model, so that the first sample IM message text is given the recording-transcription text language style. In this way, text in the recording-transcription text language style can be obtained without manually labeling call recordings, which saves cost.
Since the model parameters of the initial encoder and the initial generator of the initial language style conversion model are randomly set and have not been adjusted, sample texts obtained by a language style conversion model with randomly set parameters are less accurate in application than sample texts obtained after the model parameters have been adjusted many times. Therefore, when the language style conversion model is trained, the model parameters of the initial language style conversion model need to be adjusted so as to improve the application accuracy of the obtained sample texts.
After the second sample IM message text in the recording-transcription text language style is obtained, the feature vectors corresponding to the sample IM text vocabularies in the second sample IM message text are further input into the text style discriminator according to the context of the corresponding sample IM text vocabularies in the second sample IM message text, and the first probability value that the second sample IM message text is in the recording-transcription text language style is obtained.
The first probability value is compared with the preset threshold corresponding to the target language style (for example, 1), and the model parameters of the initial language style conversion model are adjusted according to the principle that the difference between the first probability value and the preset threshold corresponding to the target language style is minimized, that is, the first probability value is made to approach the preset threshold. When the difference between the first probability value and the preset threshold corresponding to the target language style reaches its minimum, training of the language style conversion model is finished, and the adjusted language style conversion model is finally obtained.
For example, suppose the first sample IM message text is "I have reached the destination, we can start at any time". Vocabulary segmentation of this text yields "I", "have", "reached", "destination", "we", "at any time", "can", "start". The feature vector corresponding to the text style and the feature vectors corresponding to "I", "have", "reached", "destination", "we", "at any time", "can", "start" are input, according to the context of the vocabularies in the sample IM message text, into the encoder of the initial language style conversion model for semantic feature extraction, and the feature extraction vector corresponding to "I have reached the destination, we can start at any time" is obtained. That feature extraction vector and the feature vector corresponding to the recording-transcription text language style are then input into the generator of the initial language style conversion model, yielding the text in the recording-transcription text language style, "I arrive at the destination, we can start at any time".
Then, for "I arrive at the destination, we can start at any time" to perform vocabulary cutting to obtain "I", "arrive", "destination", "we", "anytime", "can", "depart", "having" and "like", inputting the corresponding feature vectors of "I", "arrive", "destination", "we", "anytime", "can", "depart", "having" and "like" into the text style discriminator according to the context relation in the corresponding vocabulary, we can start at any time ", identifying the language style of" I arrive at the destination, we can start at any time "to obtain" I arrive at the destination, the first probability of "I arrive at the destination, so that the minimum difference between alpha and 1 is used for adjusting the model parameters of the initial language style conversion model, finally, the adjusted language style conversion model is obtained.
During the training of the initial language style conversion model, the parameters of the text style recognition model are generally kept unchanged, the purpose being to improve the training efficiency of the initial language style conversion model. To improve the application accuracy of the training samples obtained with the language style conversion model, the text style recognition model also needs to be trained.
The training module 15 is further configured to train the text style recognition model in any one of the following manners:
the first method is as follows: acquiring a sample target language text corresponding to the target language style;
inputting the feature vectors corresponding to the sample target language text vocabularies in the sample target language text into an initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text to obtain a second probability value corresponding to the sample target language text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained second probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Here, the sample target language text is generally text in the recording-transcription text language style, usually obtained by manually labeling call recordings. The larger the second probability value, the higher the probability that the sample target language text is in the target language style, and the higher the application accuracy of the parameters in the text style recognition model.
In a specific implementation, after the sample target language text is obtained, vocabulary segmentation is performed on it to obtain a plurality of sample target language text vocabularies. The feature vectors corresponding to the sample target language text vocabularies in the sample target language text are then input into the initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text, and the second probability value that the sample target language text is in the recording-transcription text language style is obtained.
Since the model parameters of the initial text style recognition model are randomly set and have not been adjusted, recognition with a text style recognition model whose parameters are randomly set is less accurate in application than recognition with a model whose parameters have been adjusted many times. Therefore, when the text style recognition model is trained, the model parameters of the initial text style recognition model need to be adjusted so as to improve the accuracy of the recognized text style.
After the second probability value is obtained, it is compared with the preset threshold corresponding to the target language style (for example, 1), and the model parameters of the initial text style recognition model are adjusted according to the principle that the difference between the second probability value and the preset threshold corresponding to the target language style is minimized, that is, the second probability value is made to approach the preset threshold. When the difference between the second probability value and the preset threshold corresponding to the target language style reaches its minimum, training of the initial text style recognition model is finished, and the adjusted text style recognition model is finally obtained.
For example, suppose the sample target language text is "I have reached the destination, we can start at any time". Vocabulary segmentation of this text yields "I", "have", "reached", "destination", "we", "at any time", "can", "start", "yawn" (the last being a filler in the spoken transcription). The feature vectors corresponding to these vocabularies are input into the initial text style recognition model according to the context of each vocabulary in the sample target language text, and the second probability value β corresponding to "I have reached the destination, we can start at any time" is obtained. The model parameters of the initial text style recognition model are adjusted according to the principle that the difference between β and 1 is minimized, and the adjusted text style recognition model is finally obtained.
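The discriminator and its training step in this first manner might be sketched as follows; the GRU-plus-sigmoid architecture and the MSE loss toward the preset threshold are assumptions, since this application only requires a model that maps the word feature vectors of a text to a probability between 0 and 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleDiscriminator(nn.Module):
    """Sketch of a text style recognition model (text style discriminator)."""
    def __init__(self, emb_dim=200, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, word_vecs):                 # (B, T, emb_dim)
        _, h = self.rnn(word_vecs)
        return torch.sigmoid(self.head(h[-1]))    # probability of the target style

def train_discriminator_step(discriminator, optimizer, word_vecs,
                             preset_threshold=1.0):
    """One update on a manually labelled recording-transcription sample:
    drive the second probability value toward the preset threshold."""
    optimizer.zero_grad()
    prob = discriminator(word_vecs)               # "second probability value"
    loss = F.mse_loss(prob, torch.full_like(prob, preset_threshold))
    loss.backward()
    optimizer.step()
    return loss.item()
```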
The second method comprises the following steps: inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into an initial text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a third probability value corresponding to the second sample IM message text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained third probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Here, the larger the third probability value is, the higher the probability that the second sample IM message text representing the target language style is in the target language style is, which indicates that the application accuracy of the parameters in the text style identification model is higher.
In a specific implementation, after the second sample IM message text in the target language style is obtained with the trained language style conversion model, the feature vectors corresponding to the sample IM text vocabularies in the second sample IM message text of the target language style are input into the initial text style recognition model according to the context of the corresponding sample IM text vocabularies in that second sample IM message text, and the third probability value that the second sample IM message text is in the target language style is obtained.
After the third probability value is obtained, it is compared with the preset threshold corresponding to the target language style (for example, 1), and the model parameters of the initial text style recognition model are adjusted according to the principle that the difference between the third probability value and the preset threshold corresponding to the target language style is minimized, that is, the third probability value is made to approach the preset threshold. When the difference between the third probability value and the preset threshold corresponding to the target language style reaches its minimum, training of the initial text style recognition model is finished, and the adjusted text style recognition model is finally obtained.
An example of adjusting the initial text style recognition model in the second manner may refer to the adjustment process of the initial text style recognition model in the first manner and is not illustrated here.
After the language style conversion model is obtained, the obtained language style conversion model is used for converting the IM message text, and the following detailed description is given:
The conversion module 13 is specifically configured to:
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into an encoder of the language style conversion model according to the context of the corresponding vocabularies in the first IM message text for semantic feature extraction to obtain the feature extraction vectors corresponding to the first IM message text;
And inputting the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style into a generator of the language style conversion model to obtain a second IM message text matched with the target language style.
Here, the context relationship represents the positional relationship of each vocabulary in the first IM message text, for example, the IM message text is "i have reached the destination", and the vocabularies of "i", "has", "reached", "destination" included in the IM message text are vocabularies having a context relationship.
In a specific implementation, after vocabulary segmentation is performed on the first IM message text and feature vectors are generated for the segmented vocabularies, the feature vector corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies are input, according to the context of the corresponding vocabularies in the first IM message text, into the encoder of the language style conversion model for semantic feature extraction, and the feature extraction vector corresponding to the first IM message text is obtained; this feature extraction vector carries only the semantics contained in the first IM message text.
After the feature extraction vector corresponding to the first IM message text is obtained, the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style are input into the generator of the language style conversion model, and the second IM message text matching the target language style is obtained.
For example, the first IM message text is "i can start at any time", the vocabulary cutting processing is performed on the first IM message text, the obtained words after cutting are "i", "at any time", "can", "start", the feature vectors corresponding to the text style of "i can start at any time" and the feature vectors corresponding to "i", "at any time", "can", "start" are input into the language style conversion model according to the context of the words in the first IM message text, and the second IM message text matching the target language style is obtained, i.e., "i can start at any time".
A probability determination module 14 is configured to determine, based on the context of each vocabulary in the second IM message text obtained by the conversion module 13, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group; the probabilities are used by the speech recognition model for speech recognition.
Before determining, based on the context of each vocabulary in the obtained second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group, the probability determination module 14 needs to perform vocabulary segmentation on the second IM message text, which specifically includes:
And performing vocabulary cutting on the second IM message text to obtain each vocabulary in the second IM message text.
In a specific implementation, a tokenizer is used to perform vocabulary segmentation on the second IM message text; candidate tokenizers include the ansj tokenizer, the HanLP tokenizer, a general-purpose Chinese word segmenter, the oracle_sdk tokenizer, and the like, and the choice can be determined according to the actual situation. The vocabulary segmentation process for the second IM message text is not described in detail here.
After the second IM message text is processed, when the probability of the vocabulary needs to be further determined, the probability may be specifically determined in the following manner:
The obtaining module 11 is further configured to obtain a manually labeled target language text corresponding to the target language style;
and the probability determination module 14 is configured to determine, according to the context of each vocabulary in the target language text and the context of each vocabulary in the second IM message text, the probability that each vocabulary appears in the position immediately following another vocabulary or vocabulary group.
Here, the other vocabulary may be a single vocabulary, and a vocabulary group may include two or more vocabularies, generally two. The target language text corresponding to the target language style is a recorded-and-transcribed text, that is, a text in the language style of recordings that have been transcribed, and it is generally obtained by manually labeling the transcription of a recording. The greater the probability, the more likely the current vocabulary is to appear at the adjacent position behind the other vocabulary or vocabulary group.

When determining the probability for each vocabulary, the probability may be determined from the second IM message text alone. However, because the number of obtainable second IM message texts is limited, the manually labeled recorded-and-transcribed texts may be added to the data from which the probabilities are determined, in order to improve the accuracy of the resulting probabilities. That is, when determining the probability of a vocabulary, the manually annotated target language text corresponding to the recorded-and-transcribed language style (i.e., the recorded-and-transcribed text) may be acquired, and the probability of each vocabulary may be determined according to the context relationship of each vocabulary in the recorded-and-transcribed text and the context relationship of each vocabulary in the second IM message text.

In the specific implementation process, taking the second IM message text as an example, the probability that each vocabulary having a context relationship in the second IM message text appears behind the other vocabularies or vocabulary groups adjacent to it is counted. A vocabulary group generally includes two vocabularies, and which probability is counted depends on the actual application: it is generally either the probability that the vocabulary appears after the single vocabulary (the other vocabulary) immediately before it, or the probability that it appears after the two vocabularies (the vocabulary group) immediately before it. The probabilities of the vocabularies in the target language text are determined in the same way as for the second IM message text and are not repeated here.
For example, suppose the second IM message text contains three vocabularies a, b and c, whose context relationship in the second IM message text is abc, that is, b follows a and c follows b. If ab occurs 300 times in the second IM message text and abc occurs 100 times, then P(c|ab) = 100/300 ≈ 0.33, that is, the probability of c occurring after ab is 0.33.
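The count-based estimate in this example can be sketched as follows; the helper function and the toy corpus are illustrative only, and the same routine also shows how the manually labeled target language texts can be pooled with the second IM message texts when more data is needed.

```python
# Count-based estimate of P(w | preceding vocabulary or vocabulary group),
# pooling converted second IM message texts with manually labeled
# recorded-and-transcribed texts. The corpus below is a toy that reproduces
# the 100/300 example from the description.
from collections import Counter

def ngram_probs(token_lists, n=3):
    """Return P(w_n | w_1..w_{n-1}) for every n-gram seen in the corpora."""
    ngrams, prefixes = Counter(), Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            ngrams[gram] += 1
            prefixes[gram[:-1]] += 1
    return {gram: count / prefixes[gram[:-1]] for gram, count in ngrams.items()}

second_im_texts = [["a", "b", "c"]] * 100 + [["a", "b", "d"]] * 200
labeled_transcripts = []                      # manually labeled texts could be added here
probs = ngram_probs(second_im_texts + labeled_transcripts, n=3)
print(round(probs[("a", "b", "c")], 2))       # 0.33, i.e. P(c | ab) = 100 / 300
```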
After the probabilities of the vocabularies in the second IM message text and the probabilities of the vocabularies in the target language text are obtained, the obtained probabilities are applied in the speech recognition model to recognize speech information.

The speech recognition model recognizes speech information in the following manner:

The trip platform server acquires the speech of the service request end or the service providing end and inputs the speech into the speech recognition model. After receiving the speech, the speech recognition model converts it into a vocabulary pinyin text. According to the probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups, the vocabulary with the maximum probability at the corresponding position can be selected as the recognized text. That is, for each vocabulary pinyin in the vocabulary pinyin text, the corresponding probabilities are looked up among the determined probabilities; because the same vocabulary pinyin corresponds to multiple vocabularies, multiple probabilities are obtained for each vocabulary pinyin, and the vocabulary with the maximum probability is determined as the vocabulary corresponding to that vocabulary pinyin.
When determining the probabilities for each vocabulary pinyin, the pinyin of the candidate vocabulary in each considered probability must be the same as the current vocabulary pinyin, and the pinyin of the other vocabulary or vocabulary group in that probability must be the same as the pinyin of the vocabularies preceding the current position.

For example, suppose the speech is "I want to go to Beijing" and the pinyin text obtained by the speech recognition model is "woyaoqubeijing". When "wo" is recognized, the probabilities that "wo" appears at the head of the sentence may be γ1 and γ2; if γ2 is the maximum and the vocabulary corresponding to γ2 is "I", the vocabulary corresponding to "wo" is determined to be "I". When "yao" is recognized, the probabilities that "yao" appears after "wo" are γ3 and γ4; if γ4 is the maximum and the vocabulary corresponding to γ4 is "want", the vocabulary corresponding to "yao" is determined to be "want". The recognition of "qu" and "beijing" ("go to Beijing") proceeds in the same way and is not repeated here.
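A minimal sketch of this selection rule is given below; the pinyin lexicon and the probability table are toy assumptions used only to reproduce the example.

```python
# Greedy selection rule: for each vocabulary pinyin, keep the candidate
# vocabulary whose probability after the already-recognized context is the
# largest. The lexicon and probability table are toy assumptions.
LEXICON = {"wo": ["我", "窝"], "yao": ["要", "药"], "qu": ["去", "区"], "beijing": ["北京"]}
PROBS = {                                   # P(vocabulary | previous vocabulary)
    ("<s>", "我"): 0.6, ("<s>", "窝"): 0.1,
    ("我", "要"): 0.5, ("我", "药"): 0.05,
    ("要", "去"): 0.4, ("要", "区"): 0.02,
    ("去", "北京"): 0.3,
}

def decode(pinyin_sequence):
    context, recognized = "<s>", []         # "<s>" marks the head of the sentence
    for pinyin in pinyin_sequence:
        candidates = LEXICON.get(pinyin, [])
        best = max(candidates, key=lambda w: PROBS.get((context, w), 0.0))
        recognized.append(best)
        context = best
    return "".join(recognized)

print(decode(["wo", "yao", "qu", "beijing"]))   # 我要去北京 ("I want to go to Beijing")
```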
In the text processing apparatus provided in the embodiment of the present application, the generating module generates a feature vector for the text style of the instant messaging (IM) message text between the service requesting end and the service providing end acquired by the acquiring module, and a feature vector for each vocabulary in that text. The converting module then inputs the feature vector corresponding to the text style of the IM message text and the feature vector corresponding to each vocabulary into the language style conversion model corresponding to the target language style, so as to obtain a second IM message text matching the target language style. The probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups is determined based on the context relationship of each vocabulary in the obtained second IM message text, and finally the determined probabilities are used by the speech recognition model for speech recognition. In this way, when performing speech recognition, the speech recognition model can select, according to these probabilities, the vocabulary with the maximum probability at the corresponding position as the recognized text. In addition, because the probabilities are determined using the second IM message text matching the target language style, the accuracy of the probability results is improved, which further improves the recognition accuracy of the speech recognition model that uses these probabilities to recognize speech information.
Referring to fig. 3, a schematic diagram of a text processing method provided in an embodiment of the present application is shown, where the method includes the following steps:
S301, acquiring a first Instant Messaging (IM) message text between a service request end and a service providing end;
S302, generating a feature vector corresponding to the text style of the first IM message text, and respectively generating a feature vector for each vocabulary in the first IM message text;
S303, inputting the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary into a language style conversion model corresponding to a target language style to obtain a second IM message text matched with the target language style;
S304, determining, based on the context relationship of each vocabulary in the second IM message text, the probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups; the probabilities are used for speech recognition by the speech recognition model.
When step S303 is executed, referring to fig. 4, inputting the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary into the language style conversion model corresponding to the target language style to obtain the second IM message text matching the target language style specifically includes the following steps:
S401, inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into an encoder of the language style conversion model according to the context relationship of the corresponding vocabularies in the first IM message text for semantic feature extraction, so as to obtain the feature extraction vectors corresponding to the first IM message text;
S402, inputting the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style into a generator of the language style conversion model to obtain a second IM message text matched with the target language style.
Optionally, referring to fig. 5, the language style conversion model corresponding to the target language style is trained according to the following steps:
S501, constructing a sample training library, wherein the sample training library comprises a first sample IM message text and a corresponding manually labeled sample text style;
S502, inputting the feature vector corresponding to the sample text style and the feature vector corresponding to each sample vocabulary in the first sample IM message text into an initial encoder of an initial language style conversion model for semantic feature extraction according to the context relationship of the corresponding sample vocabularies in the first sample IM message text, to obtain a sample feature extraction vector corresponding to the first sample IM message text;
S503, inputting the sample feature extraction vector corresponding to the first sample IM message text and the feature vector corresponding to the target language style into an initial generator of the initial language style conversion model to obtain a second sample IM message text of the target language style;
S504, inputting the feature vectors corresponding to the sample IM text vocabularies in the second sample IM message text of the target language style into a text style recognition model according to the context relationship of the corresponding sample IM text vocabularies in the second sample IM message text of the target language style, to obtain a first probability value corresponding to the second sample IM message text of the target language style;
S505, adjusting the model parameters of the initial language style conversion model according to the principle that the difference between the obtained first probability value and a preset threshold corresponding to the target language style is minimum, to obtain the adjusted language style conversion model.
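The parameter adjustment of steps S501 to S505 can be sketched as follows. The tiny GRU stand-ins, the Adam optimizer and the value of the preset threshold are assumptions of this sketch rather than details fixed by the description; the same probability-versus-threshold loss reappears in the training and adjustment of the text style recognition model described below.

```python
# One parameter-adjustment loop for S505, as a self-contained toy. The
# conversion model and the text style recognition model are tiny stand-ins;
# the point is the loss: push the first probability value produced by the
# recognition model for the generated text toward the preset threshold.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, emb, hid = 50, 16, 32

conversion_model = nn.Sequential(            # stand-in for initial encoder + generator
    nn.Embedding(vocab, emb), nn.GRU(emb, hid, batch_first=True))
style_recognition_head = nn.Sequential(      # stand-in for the text style recognition model
    nn.Linear(hid, 1), nn.Sigmoid())

optimizer = torch.optim.Adam(conversion_model.parameters(), lr=1e-3)
preset_threshold = 1.0                       # threshold preset for the target language style
sample_ids = torch.randint(0, vocab, (8, 12))   # a batch of first sample IM message texts

for step in range(3):
    optimizer.zero_grad()
    _, hidden = conversion_model(sample_ids)                      # S502/S503: generated representation
    first_prob = style_recognition_head(hidden[-1]).squeeze(-1)   # S504: first probability value
    loss = (first_prob - preset_threshold).abs().mean()           # S505: minimise the difference
    loss.backward()
    optimizer.step()                                              # only the conversion model is updated
    print(step, round(loss.item(), 4))
```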
Optionally, the text style recognition model is trained according to the following steps:
acquiring a sample target language text corresponding to the target language style;
inputting the feature vectors corresponding to the sample target language text vocabularies in the sample target language text into an initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text to obtain a second probability value corresponding to the sample target language text of the target language style;
And adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained second probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Optionally, the text style recognition model is adjusted according to the following steps:
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into an initial text style identification model according to the context relationship of the corresponding sample IM text words in the sample IM message text of the target language style, so as to obtain a third probability value corresponding to the sample IM message text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained third probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
Optionally, before the generating the feature vectors for the words in the first IM message text, the method further includes:
and removing the interference text in the first IM message text.
Optionally, the method further comprises:
obtaining each vocabulary in the first IM message text by performing vocabulary cutting on the first IM message text; and performing vocabulary cutting on the second IM message text to obtain each vocabulary in the second IM message text.
Optionally, the target language style comprises a language style of recorded and transcribed text.
Optionally, determining the probability of each vocabulary occurring in a neighboring position behind other vocabularies or groups of vocabularies comprises:
acquiring a manually annotated target language text corresponding to the target language style;
and determining the probability of the adjacent position of each vocabulary after other vocabularies or vocabulary groups according to the context relationship of each vocabulary in the target language text and the context relationship of each vocabulary in the second IM message text.
The description of each processing flow in the method may refer to the related description of the interaction flow between each module in the above device embodiments, and is not described in detail here.
An embodiment of the present application further provides a computer device 60. As shown in fig. 6, which is a schematic structural diagram of the computer device 60 provided in this embodiment, the computer device 60 includes a processor 61, a memory 62 and a bus 63. The memory 62 stores machine-readable instructions executable by the processor 61 (such as the execution instructions corresponding to the obtaining module 11, the generating module 12, the converting module 13 and the probability determining module 14 of the apparatus in fig. 1). When the computer device 60 is running, the processor 61 communicates with the memory 62 through the bus 63, and the processor 61 executes the machine-readable instructions to perform the following processes:
Acquiring a first Instant Messaging (IM) message text between a service request end and a service providing end;
generating a characteristic vector corresponding to the text style of the first IM message text, and respectively generating a characteristic vector for each vocabulary in the first IM message text;
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into a language style conversion model corresponding to a target language style to obtain a second IM message text matched with the target language style;
determining, based on the context relationship of each vocabulary in the second IM message text, the probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups; the probabilities are used for speech recognition by the speech recognition model.
In one possible implementation, in the instructions executed by the processor 61, the inputting of the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary into the language style conversion model corresponding to the target language style to obtain the second IM message text matching the target language style includes:
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into an encoder of the language style conversion model according to the context of the corresponding vocabularies in the first IM message text for semantic feature extraction to obtain the feature extraction vectors corresponding to the first IM message text;
And inputting the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style into a generator of the language style conversion model to obtain a second IM message text matched with the target language style.
In one possible embodiment, the processor 61 executes instructions to train a language style conversion model corresponding to the target language style according to the following steps:
constructing a sample training library, wherein the sample training library comprises a first sample IM message text and a corresponding artificially labeled sample text style;
inputting the feature vectors corresponding to the sample text style and the feature vectors corresponding to each sample word in the first sample IM message text into an initial encoder of an initial language style conversion model for semantic feature extraction according to the context of the corresponding sample words in the sample IM message text to obtain sample feature extraction vectors corresponding to the first sample IM message text;
inputting a sample feature extraction vector corresponding to the first sample IM message text and a feature vector corresponding to the target language style into an initial generator of the initial language style conversion model to obtain a second sample IM message text of the target language style;
Inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into a text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a first probability value corresponding to the second sample IM message text of the target language style;
and adjusting model parameters of the initial language style conversion model according to the principle that the difference between the obtained first probability value and a preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted language style conversion model.
In one possible embodiment, processor 61 executes instructions to train the text style recognition model according to the following steps:
acquiring a sample target language text corresponding to the target language style;
inputting the feature vectors corresponding to the sample target language text vocabularies in the sample target language text into an initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text to obtain a second probability value corresponding to the sample target language text of the target language style;
And adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained second probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
In one possible embodiment, processor 61 executes instructions for adapting the text style recognition model according to the following steps:
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into an initial text style identification model according to the context relationship of the corresponding sample IM text words in the sample IM message text of the target language style, so as to obtain a third probability value corresponding to the sample IM message text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained third probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
In one possible embodiment, the instructions executed by the processor 61 further include, before generating the feature vectors for the words in the first IM message text, respectively:
And removing the interference text in the first IM message text.
In a possible implementation, in the instructions executed by the processor 61, the method further includes:
obtaining each vocabulary in the first IM message text by performing vocabulary cutting on the first IM message text; and performing vocabulary cutting on the second IM message text to obtain each vocabulary in the second IM message text.
In one possible embodiment, in the instructions executed by the processor 61, the target language style comprises a language style of recorded and transcribed text.
In one possible embodiment, the instructions executed by the processor 61 to determine the probability of each word occurring in an adjacent position after the other words or groups of words comprise:
acquiring a manually annotated target language text corresponding to the target language style;
and determining the probability of the adjacent position of each vocabulary after other vocabularies or vocabulary groups according to the context relationship of each vocabulary in the target language text and the context relationship of each vocabulary in the second IM message text.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the steps of the above text processing method are performed.
Specifically, the storage medium may be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above text processing method can be executed, which addresses the problem in the prior art that recognition accuracy is low when speech recognition relies on the conditional probabilities of vocabularies output by a vocabulary probability statistical model. In the text processing apparatus provided in the embodiment of the present application, the generating module generates a feature vector for the text style of the instant messaging IM message text between the service requesting end and the service providing end acquired by the acquiring module, and a feature vector for each vocabulary it contains. The converting module then inputs the feature vector corresponding to the text style of the IM message text and the feature vector corresponding to each vocabulary into the language style conversion model corresponding to the target language style to obtain a second IM message text matching the target language style; the probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups is determined based on the context relationship of each vocabulary in the second IM message text; and finally, the determined probabilities are used by the speech recognition model for speech recognition. Therefore, when performing speech recognition, the speech recognition model can select the vocabulary with the maximum probability at the corresponding position as the recognized text according to these probabilities. In addition, because the probabilities are determined using the second IM message text matching the target language style, the accuracy of the probability results is improved, which further improves the recognition accuracy when the speech recognition model uses these probabilities to recognize speech information.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (11)
1. A text processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a first Instant Messaging (IM) message text between the service request end and the service providing end and transmitting the acquired first IM message text to the generation module;
the generating module is used for generating a characteristic vector corresponding to the text style of the first IM message text, respectively generating a characteristic vector for each vocabulary in the first IM message text, and transmitting the generated characteristic vectors to the converting module;
the conversion module is used for inputting the feature vector corresponding to the text style of the first IM message text and the feature vector corresponding to each vocabulary generated by the generation module into a language style conversion model corresponding to a target language style to obtain a second IM message text matched with the target language style, and transmitting the second IM message text to the probability determination module;
a probability determination module, configured to determine, based on the context of each vocabulary in the second IM message text obtained by the conversion module, a probability that each vocabulary appears in an adjacent position behind another vocabulary or a vocabulary group; the probabilities are used for speech recognition by the speech recognition model.
2. The text processing apparatus of claim 1, wherein the conversion module is specifically configured to:
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into an encoder of the language style conversion model according to the context of the corresponding vocabularies in the first IM message text for semantic feature extraction to obtain the feature extraction vectors corresponding to the first IM message text;
and inputting the feature extraction vector corresponding to the first IM message text and the feature vector corresponding to the target language style into a generator of the language style conversion model to obtain a second IM message text matched with the target language style.
3. The text processing apparatus of claim 2, further comprising: a training module to:
constructing a sample training library, wherein the sample training library comprises a first sample IM message text and a corresponding artificially labeled sample text style;
inputting the feature vectors corresponding to the sample text style and the feature vectors corresponding to each sample word in the first sample IM message text into an initial encoder of an initial language style conversion model for semantic feature extraction according to the context of the corresponding sample words in the sample IM message text to obtain sample feature extraction vectors corresponding to the first sample IM message text;
Inputting a sample feature extraction vector corresponding to the first sample IM message text and a feature vector corresponding to the target language style into an initial generator of the initial language style conversion model to obtain a second sample IM message text of the target language style;
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into a text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a first probability value corresponding to the second sample IM message text of the target language style;
and adjusting model parameters of the initial language style conversion model according to the principle that the difference between the obtained first probability value and a preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted language style conversion model.
4. The text processing apparatus of claim 3, wherein the training module is further to:
acquiring a sample target language text corresponding to the target language style;
inputting the feature vectors corresponding to the sample target language text vocabularies in the sample target language text into an initial text style recognition model according to the context of the corresponding sample target language text vocabularies in the sample target language text to obtain a second probability value corresponding to the sample target language text of the target language style;
And adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained second probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
5. The text processing apparatus of claim 3, wherein the training module is further to:
inputting the feature vectors corresponding to the sample IM text words in the second sample IM message text of the target language style into an initial text style recognition model according to the context relationship of the corresponding sample IM text words in the second sample IM message text of the target language style, so as to obtain a third probability value corresponding to the sample IM message text of the target language style;
and adjusting model parameters of the initial text style recognition model according to a principle that the difference between the obtained third probability value and the preset threshold corresponding to the target language style is minimum, so as to obtain an adjusted text style recognition model.
6. The text processing apparatus of claim 1, wherein the generation module is further to:
and removing the interference text in the first IM message text.
7. The text processing apparatus of claim 1, wherein the target language style comprises a language style of recorded and transcribed text.
8. The text processing apparatus according to claim 1, wherein the acquisition module is further configured to acquire a manually annotated target language text corresponding to the target language style;
the probability determination module is specifically configured to determine, according to the context of each vocabulary in the target language text and the context of each vocabulary in the second IM message text, a probability that each vocabulary appears in an adjacent position behind another vocabulary or a vocabulary group.
9. A method for processing text, the method comprising:
acquiring a first Instant Messaging (IM) message text between a service request end and a service providing end;
generating a characteristic vector corresponding to the text style of the first IM message text, and respectively generating a characteristic vector for each vocabulary in the first IM message text;
inputting the feature vectors corresponding to the text style of the first IM message text and the feature vectors corresponding to the vocabularies into a language style conversion model corresponding to a target language style to obtain a second IM message text matched with the target language style;
determining, based on the context relationship of each vocabulary in the second IM message text, the probability that each vocabulary appears at the adjacent position behind other vocabularies or vocabulary groups; the probabilities are used for speech recognition by the speech recognition model.
10. An electronic device, comprising: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, the processor and the storage medium communicate over the bus when the electronic device is operating, and the processor executes the machine-readable instructions to perform the steps of the method of claim 9.
11. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method as claimed in claim 9.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910330853.4A | 2019-04-23 | 2019-04-23 | Text processing device, method, electronic device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111865752A | 2020-10-30 |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180330729A1 (en) * | 2017-05-11 | 2018-11-15 | Apple Inc. | Text normalization based on a data-driven learning network |
CN107844480A (en) * | 2017-10-21 | 2018-03-27 | 科大讯飞股份有限公司 | Penman text is converted to the method and system of spoken language text |
CN108231062A (en) * | 2018-01-12 | 2018-06-29 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
CN109192202A (en) * | 2018-09-21 | 2019-01-11 | 平安科技(深圳)有限公司 | Voice safety recognizing method, device, computer equipment and storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094490A (en) * | 2021-05-13 | 2021-07-09 | 重庆度小满优扬科技有限公司 | Session interaction method and device, electronic equipment and storage medium |
CN113094490B (en) * | 2021-05-13 | 2022-11-22 | 度小满科技(北京)有限公司 | Session interaction method and device, electronic equipment and storage medium |
CN115442324A (en) * | 2021-06-04 | 2022-12-06 | 中国移动通信集团浙江有限公司 | Message generation method, message generation device, message management device, and storage medium |
CN115442324B (en) * | 2021-06-04 | 2023-08-18 | 中国移动通信集团浙江有限公司 | Message generation method, device, message management equipment and storage medium |
CN115374779A (en) * | 2022-10-25 | 2022-11-22 | 北京海天瑞声科技股份有限公司 | Text language identification method, device, equipment and medium |
CN115374779B (en) * | 2022-10-25 | 2023-01-10 | 北京海天瑞声科技股份有限公司 | Text language identification method, device, equipment and medium |
CN115879469A (en) * | 2022-12-30 | 2023-03-31 | 北京百度网讯科技有限公司 | Text data processing method, model training method, device and medium |
CN115879469B (en) * | 2022-12-30 | 2023-10-03 | 北京百度网讯科技有限公司 | Text data processing method, model training method, device and medium |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| AD01 | Patent right deemed abandoned | Effective date of abandoning: 20221111 |