WO2021047003A1

WO2021047003A1 - Text positioning method, apparatus, device, and storage medium

Info

Publication number: WO2021047003A1
Application number: PCT/CN2019/116470
Authority: WO
Inventors: 张超; 汤耀华
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2019-09-09
Filing date: 2019-11-08
Publication date: 2021-03-18
Also published as: CN110599028A; CN110599028B

Abstract

The present application relates to the field of fintech. Disclosed in the present application are a text positioning method, an apparatus, a device, and a storage medium. The text positioning method comprises: acquiring audio recording content, and performing speech recognition processing on the audio recording content so as to obtain text to be positioned; acquiring a standard script text, and constructing a distance context model according to the standard script text; and obtaining and selecting a text segment according to the distance context model and the text to be positioned. The present application solves the technical problem in the prior art in which text quality inspection content requiring evaluation cannot be quickly positioned.

Description

Text positioning method, device, equipment and storage medium

This application claims priority to Chinese patent application No. 201910851802.6, entitled "Text positioning method, device, equipment and storage medium" and filed with the Chinese Patent Office on September 9, 2019, the entire content of which is incorporated herein by reference.

Technical field

This application relates to the technical field of financial technology, and in particular to a text positioning method, device, equipment and storage medium.

Background technique

With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually transforming into fintech. However, the financial industry's security and real-time requirements also place higher demands on these technologies.

At present, quality inspection and evaluation in the customer service industry usually relies on manual spot checks of customer service recordings. Manual operation is subjective and limited, so it cannot evaluate service quality comprehensively and objectively. Manual spot checks may also repeatedly sample recordings of poor service quality, which skews the inspection and makes the spot-check results inaccurate. In addition, manual spot checks require quality inspectors to evaluate each word, and because a recording may contain a great deal of irrelevant information, inspectors cannot quickly locate the text content that needs to be evaluated. In other words, text positioning in the prior art has low accuracy and low efficiency, which indirectly reduces both the quality and the efficiency of quality inspection work.

Therefore, how to achieve high-precision text positioning and improve text positioning efficiency is a technical problem that needs to be solved urgently.

Summary of the invention

The main purpose of this application is to provide a text positioning method, device, equipment and storage medium, aiming to solve the technical problem that the text quality inspection content to be evaluated cannot be quickly located.

To achieve the foregoing objective, an embodiment of the present application provides a text locating method, the text locating method includes:

Acquiring the recording content, and performing voice recognition processing on the recording content to obtain the text to be located;

Obtaining the standard speech text, and constructing a distance context model based on the standard speech text;

According to the distance context model and the text to be located, candidate text segments are obtained.

Optionally, the step of constructing a distance context model based on the standard speech text includes:

Acquiring text characters of the standard speech text;

Performing step processing on the text characters to obtain a first step array of each text character;

Perform list sorting according to the first step array to construct a distance context model.

Optionally, the step of obtaining candidate text fragments according to the distance context model and the text to be located includes:

Acquiring text characters that meet a preset distance range in the text to be located according to the distance context model, and determining a target location range of the text characters;

The target character in the target positioning range is determined as a candidate text segment.

Optionally, the step of obtaining text characters that meet a preset distance range in the text to be located according to the distance context model includes:

Performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

Obtaining a target character sequence with the longest sequence length in the character sequence;

The to-be-positioned text corresponding to the target character sequence is confirmed as a text character.

Optionally, the step of performing character traversal in the text to be located according to the first step array to obtain a character sequence that meets a preset distance range from the text to be located further includes:

Obtaining the above text of the text to be located;

Perform step calculation according to the above text to obtain the second step array;

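Perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

Optionally, the step of performing character traversal in the text to be located according to the first step array to obtain a character sequence that meets a preset distance range from the text to be located further includes: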

Obtaining the following text of the text to be located;

Perform step calculation according to the following text to obtain the third step array;

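Perform character traversal in the text to be located according to the first step array and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

Optionally, the step of performing character traversal in the text to be located according to the first step array to obtain a character sequence that meets a preset distance range from the text to be located further includes: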

Acquiring the above text of the text to be located, and acquiring the following text of the text to be located;

Perform step calculation according to the above text to obtain a second step array, and perform step calculation according to the following text to obtain a third step array;

Perform character traversal in the text to be located according to the first step array, the second step array, and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

The present application also provides a text positioning device, the text positioning device includes:

The recognition module is used to obtain the recording content, and perform voice recognition processing on the recording content to obtain the text to be located;

A building module for obtaining standard speech texts, and constructing a distance context model based on the standard speech texts;

The obtaining module is used to obtain candidate text fragments according to the distance context model and the text to be located.

Optionally, the building module includes:

An obtaining sub-module for obtaining text characters of the standard speech text;

The step submodule is used to perform step processing on the text characters to obtain the first step array of each text character;

The construction sub-module is used to sort the list according to the first step array to construct a distance context model.

Optionally, the acquisition module includes:

The first determining sub-module is configured to obtain text characters that meet a preset distance range in the text to be located according to the distance context model, and determine the target location range of the text characters;

The second determining sub-module is used to determine the target character in the target positioning range as a candidate text segment.

Optionally, the first determining submodule includes:

A traversal unit, configured to perform character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

An obtaining unit for obtaining a target character sequence with the longest sequence length in the character sequence;

The confirming unit is used to confirm the to-be-located text corresponding to the target character sequence as a text character.

Optionally, the traversal unit further includes:

The first obtaining subunit is used to obtain the above text of the text to be located;

The first step subunit is used to perform step calculation according to the above text to obtain the second step array;

The first traversal subunit is configured to perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

Optionally, the traversal unit further includes:

The second obtaining subunit is used to obtain the following text of the text to be located;

The second step subunit is used to perform step calculation according to the following text to obtain the third step array;

The second traversal subunit is configured to perform character traversal in the text to be located according to the first step array and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

Optionally, the traversal unit further includes:

The third obtaining subunit is used to obtain the above text of the text to be located, and obtain the following text of the text to be located;

The third step subunit is used to perform step calculation according to the above text to obtain a second step array, and perform step calculation according to the following text to obtain a third step array;

The third traversal subunit is used to perform character traversal in the text to be located according to the first step array, the second step array, and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

In addition, in order to achieve the above object, the present application also provides a device, the device including: a memory, a processor, and computer-readable instructions stored on the memory and running on the processor, wherein:

When the computer-readable instructions are executed by the processor, the steps of the text positioning method as described above are realized.

In addition, in order to achieve the above purpose, this application also provides a computer storage medium;

The computer storage medium stores computer readable instructions, and when the computer readable instructions are executed by a processor, the steps of the text positioning method as described above are realized.

This application obtains the recording content and performs speech recognition processing on it to obtain the text to be located; obtains the standard speech text and constructs a distance context model according to the standard speech text; and obtains candidate text fragments according to the distance context model and the text to be located. This solution achieves high-precision positioning of speech-derived text and improves the efficiency of text positioning, solves the technical problem that the prior art cannot quickly locate the text content to be evaluated, and indirectly improves both the quality and the efficiency of quality inspection work.

Description of the drawings

FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application;

FIG. 2 is a schematic flowchart of an embodiment of a text positioning method of this application.

The realization of the purpose, functional characteristics and advantages of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described here are only used to explain the application, and not used to limit the application.

As shown in FIG. 1, FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.

The device in the embodiment of the present application may be a PC or a server device.

As shown in FIG. 1, the device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 can be a high-speed RAM memory or a non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art can understand that the structure of the device shown in FIG. 1 does not constitute a limitation on the device, and may include more or fewer components than those shown in the figure, or a combination of certain components, or different component arrangements.

As shown in FIG. 1, a memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and computer readable instructions.

In the device shown in Figure 1, the network interface 1004 is mainly used to connect to the back-end server and communicate with the back-end server; the user interface 1003 is mainly used to connect to the client (user side) to communicate with the client; and the processor 1001 can be used to call computer-readable instructions stored in the memory 1005, and perform operations in each embodiment of the text positioning method described below.

Based on the foregoing hardware structure, an embodiment of the text positioning method of the present application is proposed.

This application belongs to the field of financial technology (Fintech). This application provides a text positioning method, which is mainly applied to devices. In an embodiment of the text positioning method, referring to FIG. 2, the text positioning method includes:

Step S10: Obtain the recording content, and perform voice recognition processing on the recording content to obtain the text to be located;

Step S20: Obtain a standard speech text, and construct a distance context model based on the standard speech text;

Step S30: Obtain candidate text fragments according to the distance context model and the text to be located.

The specific content is as follows:

In this embodiment, the recording content includes customer service recordings. Generally, a recording is stored whenever a customer service agent communicates with a customer, and the device performs speech recognition on the recording to convert it into the text to be located. It should be noted that, since the subject of quality inspection is the customer service agent, the text to be located refers to the text converted from the agent's speech in the recording content, not the text converted from the customer's speech.

The standard speech text is the predetermined quality-inspection script that the text under inspection is required to contain. It therefore serves as the reference standard for checking whether the predetermined script appears in the text to be located. An example of a standard speech text is "Thank you for your application and cooperation." Once obtained, the standard speech text is used to locate the matching content in the text to be located.

Specifically, the step of constructing a distance context model based on the standard speech text includes:

Step A1, obtaining the text characters of the standard speech text;

In this embodiment, the standard speech text is used as the reference sample, so it must be split into individual characters to obtain its text characters. For example, the standard speech text "Thank you for your application and cooperation" consists of nine characters in the original Chinese; glossed one character at a time, as in the headers of Table 1, they read "sense", "thank", "you", "of", "apply for", "please", "with", "match", and "combine".

Step A2, performing step processing on the text characters to obtain a first step array of each text character;

Step A3: Perform list sorting according to the first step array to construct a distance context model.

Assume that the standard speech text is str1, i.e. str1 = "Thank you for your application and cooperation", and let len be the character length of str1. The model matrix matrix[len][len] is constructed through the following matrix formula:

matrix[i][j] = [min distance, max distance]

min distance = min(sgn(j-i)*1,j-i) if i <= j

max distance = max(sgn(j-i)*1, j-i) if i <= j

Here sgn(x) is a step (sign) function: it returns 1 if x is greater than 0, 0 if x equals 0, and -1 if x is less than 0. By traversing the character-interval characteristics of the standard speech text, the step coordinates of each character relative to every other character of the standard speech text are obtained, which gives the first step array of each text character. The first step array is the array formed by the minimum distance value and the maximum distance value between one character number and another character number. For example, for the character numbers i and j, the step array is [min distance, max distance], where min distance is obtained by min(sgn(j-i)*1, j-i) and max distance is obtained by max(sgn(j-i)*1, j-i), and i and j are the character array numbers. That is, the step function is evaluated on the difference of the character numbers, its return value is passed as an input parameter to the min and max functions, and the return values of the min and max functions form the first step array of the corresponding text character.

According to the above matrix formula, the first step array of each text character is obtained, and the step arrays are arranged in a list in correspondence with their text characters. The characters of "Thank you for your application and cooperation" along the horizontal row are numbered with j, giving the array character numbers a[j], namely a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], and a[8]; the characters of the same text along the vertical column are numbered with i, giving the corresponding array character numbers b[i]. From the character array numbers and the matrix formula, the matrix model shown in Table 1 below is obtained:

Table 1

	sense	thank	you	of	Apply for	please	with	Match	Combine
sense	[0, 0]	[1, 1]	[1, 2]	[1, 3]	[1, 4]	[1, 5]	[1, 6]	[1, 7]	[1, 8]
thank	[-1, -1]	[0, 0]	[1, 1]	[1, 2]	[1, 3]	[1, 4]	[1, 5]	[1, 6]	[1, 7]
you	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]	[1, 2]	[1, 3]	[1, 4]	[1, 5]	[1, 6]
of	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]	[1, 2]	[1, 3]	[1, 4]	[1, 5]
Apply for	[-4, -1]	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]	[1, 2]	[1, 3]	[1, 4]
please	[-5, -1]	[-4, -1]	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]	[1, 2]	[1, 3]
with	[-6, -1]	[-5, -1]	[-4, -1]	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]	[1, 2]
Match	[-7, -1]	[-6, -1]	[-5, -1]	[-4, -1]	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]	[1, 1]
Combine	[-8, -1]	[-7, -1]	[-6, -1]	[-5, -1]	[-4, -1]	[-3, -1]	[-2, -1]	[-1, -1]	[0, 0]

Therefore, the matrix model shown in Table 1 above is the distance context model.
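
The construction in steps A1 to A3 can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function names `sgn` and `build_distance_context_model` are hypothetical, the nine-letter string "ABCDEFGHI" is only a stand-in for the nine characters of the standard speech text, and the formula is applied to every pair (i, j), including i > j, because that is what the negative entries in Table 1 suggest.

```python
def sgn(x):
    """Step (sign) function: 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def build_distance_context_model(standard_text):
    """Return matrix[i][j] = [min distance, max distance] for every character pair."""
    n = len(standard_text)
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            d = j - i  # signed offset from character i to character j
            row.append([min(sgn(d), d), max(sgn(d), d)])
        matrix.append(row)
    return matrix


# Placeholder nine-character text (assumption), one letter per character of Table 1.
model = build_distance_context_model("ABCDEFGHI")
print(model[0][3])  # [1, 3], matching row 1, column 4 of Table 1
print(model[4][1])  # [-3, -1], matching row 5, column 2 of Table 1
```

Each cell is a permitted range of offsets, not a single value, which is what lets the model tolerate characters that are missing from the text to be located.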

The distance context model is a standard matrix of the standard speech text and serves as the reference model for the text to be located. In this embodiment, the text to be located may not correspond exactly to the standard speech text. Therefore, the distance context model built from the standard speech text is used for detection and matching, so as to filter out the candidate text fragments that satisfy the model rules.

Further, the step of obtaining candidate text fragments according to the distance context model and the text to be located includes:

Step B1: Acquire text characters that meet a preset distance range in the text to be located according to the distance context model, and determine the target location range of the text characters;

The text to be located is input into the distance context model, and text character retrieval is performed on it according to the model. Suppose the text to be located is "Uh, thank you for your cooperation, we will process it for you as soon as possible", and denote it as text. The device filters the characters in text and obtains the text characters that meet the preset distance range. These text characters are the characters in text that match the semantics of the standard speech text, that is, the character sample whose character order is closest to that of the standard speech text.

After obtaining the text character, the device will determine the target positioning range of the text character based on the start position and end position of the text character.

Specifically, the step of acquiring text characters that meet a preset distance range in the text to be located according to the distance context model includes:

Step a: Perform character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

The first step array in the distance context model supports character matching and retrieval on the text to be located. It is used to obtain the co-occurrence relationships between the characters of the text to be located, so that the characters that meet the preset distance range can be retrieved and assembled into a character sequence.

Traverse each character char i of the text to be located, whose index is i:

    If char i is in the first column of the matrix:

        Traverse each character char j of text, whose index is j:

            If char j is in the first row of the matrix:

                if dis(i,j)>=matrix[i][j][0] and dis(i,j)<=matrix[i][j][1]:

In that case, text[j] and text[i] in the text to be located have a co-occurrence relationship that conforms to the distance range. All characters that have such a co-occurrence relationship with the character text[i] are collected into the character sequence seq = text[j]...text[k], and the text segment within the range [j:k] of this character sequence is the text segment that determines the start and end positions.

Each character therefore yields its own segment: the character text[i] gives the text segment within the range [j:k] of its character sequence seq, and another character text[i'] gives the text segment within the range [j':k'] of its own character sequence.
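
The traversal above can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: dis(i, j) is taken to be j - i, the matrix is looked up by the position of each character's first occurrence in the standard speech text (the source does not say how repeated characters are handled), and the function names and the ASCII sample strings are hypothetical.

```python
def sgn(x):
    """Step (sign) function: 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def build_matrix(standard):
    """matrix[r][c] = [min distance, max distance] between characters r and c."""
    n = len(standard)
    return [[[min(sgn(c - r), c - r), max(sgn(c - r), c - r)]
             for c in range(n)] for r in range(n)]


def cooccurrence_span(standard, text, i):
    """Return (start, end) of the characters co-occurring with text[i], or None."""
    row = standard.find(text[i])  # assumption: use the first occurrence
    if row < 0:
        return None  # text[i] is not in the first column of the matrix
    matrix = build_matrix(standard)
    hits = []
    for j, ch in enumerate(text):
        col = standard.find(ch)
        if col < 0:
            continue  # text[j] is not in the first row of the matrix
        low, high = matrix[row][col]
        if low <= j - i <= high:  # assumption: dis(i, j) = j - i
            hits.append(j)
    return hits[0], hits[-1]


# "z" stands for a misrecognized character inside the script.
span = cooccurrence_span("abcdef", "xxabzdefyy", 2)
print(span)  # (2, 7)
```

The character at index i always matches itself (offset 0 falls in [0, 0]), so the returned range is never empty once text[i] belongs to the standard speech text.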

Further, the step of performing character traversal in the text to be located according to the first step array to obtain a character sequence within a preset distance range from the text to be located further includes:

Step a1, obtaining the above text of the text to be located;

It is understandable that in the simplest case the distance context model is constructed from the standard speech text itself: let M be the standard speech text; the distance context model of M with itself is matrix(M, M). Besides using M alone, the model can also take context into account: let M be the standard speech text and L be the above text of M (the text that precedes M); the distance context model of M with LM is then matrix(M, LM). It should be noted that the above text L appears in the model as the text to the left of the standard speech text.

For example, if the above text is "Thank you for calling", the standard speech text and the above text are used to construct matrix(M, LM).

Step a2, perform step calculation according to the above text to obtain a second step array;

In the same way, the device will calculate the step array of the text above to get the second step array.

Step a3: Perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.

Since the second step array is added as a reference, this embodiment can traverse the characters of the text to be located based on both the first step array and the second step array. With more reference samples available, the character sequence that meets the preset distance range is obtained from the text to be located.

Step b: Obtain the target character sequence with the longest sequence length in the character sequence;

A character sequence represents a character sample with a co-occurrence relationship. Because the character sequences are filtered by the preset distance range, their lengths may differ. The device selects the target character sequence with the longest sequence length from all character sequences. Specifically, the character sequences are sorted by sequence length in descending order, and the first one, which has the longest sequence length, is taken as the target character sequence.

For example, in the text "Uh, thank you for your cooperation, we will process it for you as soon as possible", the character sequence of the character "match" is "thank you for your cooperation", which is the longest and is therefore the target character sequence.

Step c, confirming the to-be-located text corresponding to the target character sequence as a text character.

The target character sequence corresponds to part of the characters in the text to be located, and the part of the text to be located can be determined as text characters.

Step B2: Determine the target character in the target location range as a candidate text segment.

After determining the target location range, the device will obtain the target character in the target location range from the text to be located, and determine the target character as a candidate text segment for later use.
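
Steps a, b, c and B2 can be combined into the following Python sketch. It is a hedged illustration only: the helper names are hypothetical, dis(i, j) is assumed to be j - i, each character is looked up by its first occurrence in the standard speech text, and the ASCII strings are placeholders for real script text.

```python
def sgn(x):
    """Step (sign) function: 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def span_for(standard, text, i):
    """Range (start, end) of the characters that co-occur with text[i], or None."""
    row = standard.find(text[i])
    if row < 0:
        return None
    hits = []
    for j, ch in enumerate(text):
        col = standard.find(ch)
        if col < 0:
            continue
        d = col - row  # offset of the two characters in the standard text
        low, high = min(sgn(d), d), max(sgn(d), d)
        if low <= j - i <= high:
            hits.append(j)
    return hits[0], hits[-1]


def locate_candidate(standard, text):
    """Return (start, end, segment) for the longest character sequence, or None."""
    spans = []
    for i in range(len(text)):
        span = span_for(standard, text, i)
        if span is not None:
            spans.append(span)
    if not spans:
        return None
    # Step b: sort by sequence length in descending order and take the first.
    spans.sort(key=lambda s: s[1] - s[0], reverse=True)
    start, end = spans[0]
    # Step B2: the characters inside the target range form the candidate segment.
    return start, end, text[start:end + 1]


result = locate_candidate("abcdef", "xxabzdefyy")
print(result)  # (2, 7, 'abzdef')
```

In this sketch a stray copy of a script character far from the real match, such as the leading "a" in "axxxxabc", only produces a short sequence and is discarded by the longest-sequence rule.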

Further, based on the first embodiment, a second embodiment of the text positioning method of the present application is proposed. In this embodiment, the step of performing character traversal in the text to be located according to the first step array to obtain a character sequence that meets the preset distance range from the text to be located further includes:

Obtaining the following text of the text to be located;

It is understandable that in the simplest case the distance context model is constructed from the standard speech text itself: let M be the standard speech text; the distance context model of M with itself is matrix(M, M). Besides using M alone, the model can also take context into account: let M be the standard speech text and R be the following text of M; the distance context model of M with MR is then matrix(M, MR). It should be noted that the following text R appears in the model as the text to the right of the standard speech text.

For example, if the following text is "We will handle it for you as soon as possible", the standard speech text and the following text are used to construct matrix(M, MR).

In the same way, the device will calculate the step array of the text below to get the third step array.

Since the third step array is added as a reference, this embodiment can traverse the characters of the text to be located based on both the first step array and the third step array. With more reference samples available, the character sequence that meets the preset distance range is obtained from the text to be located.

The distance context model in this embodiment adds new step calculation samples to the distance context model of the first embodiment. The model of the first embodiment is constructed from the standard speech text itself: M is the standard speech text, and the distance context model of M with itself is matrix(M, M). To further improve the text positioning efficiency of the model, this embodiment adds the following text into the model. The device performs step calculation on the above text to obtain the second step array, and performs step calculation on the following text to obtain the third step array, on the same principle as the first step array.

Suppose that M is the standard speech text, L is the above text of M, and R is the following text of M; the distance context model of M with LMR is matrix(M, LMR). It should be noted that the above text L appears in the model as the text to the left of the standard speech text, and the following text R appears as the text to its right.

For example, if the standard speech text M is "Thank you for your application and cooperation", the above text L is "Thank you for calling", and the following text R is "We will process it for you as soon as possible", then matrix(M, LMR) is constructed.

Similar to the square table matrix matrix[M, M] of the first embodiment, in this embodiment, matrix[M, LMR] is a rectangular table matrix encoding M and LMR.

It is understandable that in this embodiment, the basic principle for constructing matrix[M, LMR] is exactly the same as for matrix[M, M]. The second step array is obtained by step calculation on the above text, and the third step array is obtained by step calculation on the following text. Then, based on the first, second, and third step arrays, the characters in the text to be located are traversed and searched to determine the positional relationship between each character and the other characters, and thereby to determine the character sequence that meets the preset distance range.
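
A rectangular matrix of this kind can be sketched in Python as follows. This is a minimal illustration under assumptions that the source does not spell out: rows are taken to be the characters of M, columns the characters of L, M and R concatenated, and each cell applies the same min/max formula to the offset between the column position and the position of the row character inside LMR. The function name and the short ASCII strings are hypothetical.

```python
def sgn(x):
    """Step (sign) function: 1 if x > 0, 0 if x == 0, -1 if x < 0."""
    return (x > 0) - (x < 0)


def build_context_matrix(left, standard, right):
    """Rows: characters of M. Columns: characters of L + M + R."""
    columns = left + standard + right
    offset = len(left)  # position of M's first character inside LMR
    matrix = []
    for i in range(len(standard)):
        row_pos = offset + i
        row = []
        for j in range(len(columns)):
            d = j - row_pos  # negative for L, positive for R
            row.append([min(sgn(d), d), max(sgn(d), d)])
        matrix.append(row)
    return matrix


# Placeholder texts (assumption): L = "ab", M = "cde", R = "fg".
lmr = build_context_matrix("ab", "cde", "fg")
print(len(lmr), len(lmr[0]))  # 3 7
print(lmr[0][0])              # [-2, -1]: first character of L relative to first of M
print(lmr[2][6])              # [1, 2]: last character of R relative to last of M
```

Passing empty strings for `left` and `right` reduces this sketch to the square matrix of the first embodiment.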

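The following Table 2 is an example of matrix[M, LMR] for this embodiment. The table is indexed by the following texts (its cell values are not reproduced here):

Columns, from left to right:

L: Thank you for calling

M: Thank you for your application and cooperation

R: We will handle it for you as soon as possible

Rows:

M: Thank you for your application and cooperation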

Based on matrix(M, LMR), in the text to be located "Uh, thank you for your cooperation, we will process it for you as soon as possible", the located segments are ["", "thank you for your cooperation", "we will process it for you as soon as possible"].

The advantages of matrix(M, LMR) over matrix(M, M) are:

(1) Standard speech texts are generally short, and candidate segments of the text to be located often contain many errors introduced by speech recognition, so matching one short text against another is unreliable. Adding context expands the standard speech text and turns short-text matching into long-text matching, which improves the matching effect and the accuracy of text positioning.

(2) After the voice of the customer service is recognized, there will be various problems, and the text in question is often random. By increasing the context, the proportion of questionable text can be better reduced, thereby improving the matching effect. The proportion of errors in short text is higher than that in long text, so it is better to use long text for matching.
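The effect described in point (2) can be illustrated numerically. The sketch below uses the similarity ratio of `difflib.SequenceMatcher` as a stand-in score (it is not the matching score of the embodiments) and injects the same single misrecognized word into M alone and into the expanded text LMR. The helper names `similarity` and `with_one_error` and the wrong word "obligation" are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(reference, candidate):
    """Stand-in matching score: 2 * matched tokens / total tokens."""
    return SequenceMatcher(None, reference, candidate, autojunk=False).ratio()


def with_one_error(tokens, index, wrong="obligation"):
    """Copy the tokens and replace one of them with a misrecognized word."""
    damaged = list(tokens)
    damaged[index] = wrong
    return damaged


L = "thank you for calling".split()
M = "thank you for your application and cooperation".split()
R = "we will process it for you as soon as possible".split()
LMR = L + M + R

# The same recognition error ("application" -> "obligation") in both cases.
short_score = similarity(M, with_one_error(M, 4))
long_score = similarity(LMR, with_one_error(LMR, len(L) + 4))

print(round(short_score, 3), round(long_score, 3))  # 0.857 0.952
```

One wrong word out of 7 lowers the short-text score much more than one wrong word out of 21 lowers the long-text score, which is the reason given above for preferring long-text matching.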

In the above manner, on the basis of the first embodiment, this embodiment adds the step calculation of the following text, and increases the text positioning reference samples of the distance context model, thereby improving the efficiency of text positioning.

Further, based on the first embodiment, a third embodiment of the text positioning method of the present application is proposed. In this embodiment, the step of performing voice recognition processing on the recording content to obtain the text to be located includes:

Step e: Perform voice recognition processing on the recording content to obtain the first text;

In this embodiment, the recorded content may contain grammatical errors or semantic ambiguity. Therefore, voice recognition processing is performed on the recorded content in a standardized manner to obtain the first text.

Step f: Perform text word segmentation processing on the first text to obtain a second text;

English words are naturally separated by spaces, so they are easy to split on the spaces, although sometimes several words must be treated as one token; a noun such as "New York", for example, needs to be treated as a single word. Since Chinese has no spaces, the first text needs to be segmented. The first text consists of multiple phrases, and the device separates the phrases in the first text to obtain meaningful phrases.

Text vectorization can be performed with the bag-of-words model (Bag of Words, abbreviated as BoW), which groups the words of each text sample together with their word frequencies, based on the characteristics of the words, to form phrase clusters. Alternatively, the set-of-words model (Set of Words, abbreviated as SoW) can be used; it differs from the bag-of-words model in that it only considers whether a word appears in the text and does not consider the word frequency.
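One simple way to realize both cases is dictionary-based forward maximum matching, sketched below. This technique is chosen here for illustration and is not stated in the embodiment; the function name `max_match`, the tiny vocabularies and the `max_len` limit are all assumptions.

```python
def max_match(units, vocab, joiner="", max_len=4):
    """Greedy forward maximum matching over a sequence of units.

    For Chinese, units are characters and joiner is "".
    For English, units are space-separated words and joiner is " ".
    """
    tokens, i = [], 0
    while i < len(units):
        # Try the longest candidate first; fall back to a single unit.
        for size in range(min(max_len, len(units) - i), 0, -1):
            piece = joiner.join(units[i:i + size])
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Chinese: no spaces, so segment character by character.
chinese = max_match(list("感谢您的配合"), {"感谢", "您的", "配合"})
print(chinese)  # ['感谢', '您的', '配合']

# English: keep the multi-word noun "New York" as one token.
english = max_match("I flew to New York".split(), {"New York"}, joiner=" ")
print(english)  # ['I', 'flew', 'to', 'New York']
```

Production segmenters typically combine a dictionary with statistical models; the greedy rule above is only the simplest instance of the idea.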

Based on the above model, the text segmentation process is performed to obtain the second text.
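The difference between the two models can be shown with a short sketch. The vocabulary and the function names `bag_of_words` and `set_of_words` are chosen for illustration only.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Vector of word frequencies, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def set_of_words(tokens, vocabulary):
    """Vector of 0/1 flags: only presence is recorded, not frequency."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


vocabulary = ["thank", "you", "for", "your", "cooperation", "application"]
tokens = "thank you thank you for your cooperation".split()

bow = bag_of_words(tokens, vocabulary)
sow = set_of_words(tokens, vocabulary)
print(bow)  # [2, 2, 1, 1, 1, 0]
print(sow)  # [1, 1, 1, 1, 1, 0]
```

The repeated "thank you" is visible in the bag-of-words vector but not in the set-of-words vector.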

Step g: Perform text error correction processing on the second text to obtain a third text;

Common text errors mainly include miscellaneous characters, pure pinyin, fuzzy pronunciation, mixed pinyin and Chinese characters, and pinyin mixed with other symbols. The second text may contain one or more of these problems, so text error correction is required.

The text error correction process is divided into two steps: error detection, followed by error correction. (1) Error detection: the second text is first segmented by a Chinese word segmenter. Because the second text may contain typos, the segmentation result is often wrong, so errors are detected at both character granularity and word granularity, and the suspected errors from the two granularities are merged into a candidate set of suspected error positions. (2) Error correction: all suspected error positions are traversed, the word at each position is replaced with candidates from sound-alike and shape-alike dictionaries, the sentence perplexity of each candidate is calculated with a language model, and the results of all candidates are compared and sorted to obtain the best correction.
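The two steps can be sketched with a toy example. The sketch below makes several simplifying assumptions that are not in the embodiment: detection only flags words unseen in a small reference corpus (rather than combining two granularities), the language model is an add-one-smoothed bigram model, and the `confusion` dictionary of sound-alike words is hand-written. All names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over sentences prefixed with <s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def perplexity(tokens, unigrams, bigrams):
    """Sentence perplexity under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    sequence = ["<s>"] + tokens
    log_prob = 0.0
    for prev, cur in zip(sequence, sequence[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / len(tokens))


def correct(tokens, confusion, unigrams, bigrams):
    """Detect suspected positions, then keep the lowest-perplexity candidate."""
    best = list(tokens)
    # Step 1, detection: words never seen in the corpus are suspects.
    suspects = [i for i, token in enumerate(tokens) if unigrams[token] == 0]
    # Step 2, correction: try sound-alike replacements at each suspect.
    for i in suspects:
        candidates = [best] + [
            best[:i] + [alt] + best[i + 1:] for alt in confusion.get(best[i], [])
        ]
        best = min(candidates, key=lambda c: perplexity(c, unigrams, bigrams))
    return best


corpus = [
    "thank you for your cooperation".split(),
    "thank you for calling".split(),
]
unigrams, bigrams = train_bigram(corpus)
confusion = {"fore": ["for", "four"]}  # hand-written sound-alike dictionary

fixed = correct("thank you fore your cooperation".split(),
                confusion, unigrams, bigrams)
print(fixed)  # ['thank', 'you', 'for', 'your', 'cooperation']
```

The candidate "for" wins because the bigrams ("you", "for") and ("for", "your") occur in the corpus, which gives that sentence the lowest perplexity of the three candidates.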

Through the above text error correction processing, the third text can be obtained.

Step h: Perform text rewriting processing on the third text to obtain the text to be located.

The text rewriting process cleans up messy text by transforming the lexical attributes of the third text. For example, the word order, grammar, vocabulary and wording of the third text are amended so that its meaning is expressed clearly and fluently. The revised text obtained after the text rewriting process is the text to be located.

Through the above voice recognition processing of the customer service speech, the recognition accuracy of the text to be located is greatly improved, and effective data support is provided for subsequent applications of the text to be located. The ultimate goal of this embodiment is to serve the efficient positioning of text data.
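How steps f to h chain together can be sketched as a small pipeline. Step e is left to an external speech recognition engine, so the sketch starts from the first text. Every stage is a trivial stand-in: the `CORRECTIONS` and `FILLERS` tables, the choice of filler removal as the rewriting operation, and all function names are assumptions made for illustration.

```python
CORRECTIONS = {"fore": "for"}    # stand-in for the error-correction step
FILLERS = {"um", "uh", "well"}   # stand-in rewriting rule: drop filler words


def segment(first_text):
    """Step f: split the first text into tokens (the second text)."""
    return first_text.lower().split()


def correct(second_text):
    """Step g: replace known misrecognitions (the third text)."""
    return [CORRECTIONS.get(token, token) for token in second_text]


def rewrite(third_text):
    """Step h: tidy the wording and return the text to be located."""
    return " ".join(token for token in third_text if token not in FILLERS)


def to_text_to_be_located(first_text):
    """Apply steps f, g and h in order."""
    return rewrite(correct(segment(first_text)))


result = to_text_to_be_located("Um thank you fore your cooperation")
print(result)  # thank you for your cooperation
```

Keeping each step as a separate function lets any stage be replaced by a stronger implementation without changing the others.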

In addition, an embodiment of the present application also proposes a text positioning device, the text positioning device includes:

Optionally, the building module includes:

Optionally, the acquisition module includes:

Optionally, the first determining submodule includes:

Optionally, the traversal unit further includes:

In addition, an embodiment of the present application also proposes a device. The device includes a memory 109, a processor 110, and computer-readable instructions that are stored on the memory 109 and can run on the processor 110. When the computer-readable instructions are executed by the processor 110, the steps of each embodiment of the above-mentioned text positioning method are realized.

In addition, the present application also provides a computer storage medium, and the computer-readable storage medium may be a non-volatile readable storage medium.

The expanded content of the specific implementation of the device and storage medium of the application (ie, computer storage medium) is basically the same as the embodiments of the text positioning method described above, and will not be repeated here.

It should be noted that, in this document, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the superiority or inferiority of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform. They can of course also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions that cause a device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

The embodiments of the application are described above with reference to the accompanying drawings, but the application is not limited to the above-mentioned specific embodiments. The above-mentioned specific embodiments are only illustrative and not restrictive. Under the enlightenment of this application, those of ordinary skill in the art can devise many other forms without departing from the purpose of this application and the scope of protection of the claims, and these all fall within the protection of this application.

Claims

A text positioning method, characterized in that the text positioning method includes:

Acquiring the recording content, and performing voice recognition processing on the recording content to obtain the text to be located;

Obtaining the standard speech text, and constructing a distance context model based on the standard speech text;

According to the distance context model and the text to be located, candidate text segments are obtained.
The text positioning method according to claim 1, wherein the step of constructing a distance context model according to the standard speech text comprises:

Acquiring text characters of the standard speech text;

Performing step processing on the text characters to obtain a first step array of each text character;

Perform list sorting according to the first step array to construct a distance context model.
3. The text positioning method according to claim 2, wherein the step of obtaining candidate text fragments according to the distance context model and the text to be located comprises:

Acquiring text characters that meet a preset distance range in the text to be located according to the distance context model, and determining a target location range of the text characters;

The target character in the target positioning range is determined as a candidate text segment.
4. The text positioning method according to claim 3, wherein the step of obtaining text characters that meet a preset distance range in the text to be located according to the distance context model comprises:

Performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

Obtaining a target character sequence with the longest sequence length in the character sequence;

The to-be-positioned text corresponding to the target character sequence is confirmed as a text character.
The text positioning method according to claim 4, wherein the step of performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located, further comprises:

Obtaining the above text of the text to be located;

Perform step calculation according to the above text to obtain the second step array;

Perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.
The text positioning method according to claim 4, wherein the step of performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located, further comprises:

Obtaining the following text of the text to be located;

Perform step calculation according to the following text to obtain the third step array;

Perform character traversal in the text to be located according to the first step array and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.
The text positioning method according to claim 4, wherein the step of performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located, further comprises:

Acquiring the above text of the text to be located, and acquiring the following text of the text to be located;

Perform step calculation according to the above text to obtain a second step array, and perform step calculation according to the following text to obtain a third step array;

Perform character traversal in the text to be located according to the first step array, the second step array, and the third step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.
A text positioning device, characterized in that the text positioning device comprises:

The recognition module is used to obtain the recording content, and perform voice recognition processing on the recording content to obtain the text to be located;

A building module for obtaining standard speech texts, and constructing a distance context model based on the standard speech texts;

The obtaining module is used to obtain candidate text fragments according to the distance context model and the text to be located.
9. The text positioning device according to claim 8, wherein the building module comprises:

An obtaining sub-module for obtaining text characters of the standard speech text;

The step submodule is used to perform step processing on the text characters to obtain the first step array of each text character;

The construction sub-module is used to sort the list according to the first step array to construct a distance context model.
10. The text positioning device according to claim 9, wherein the acquiring module comprises:

The first determining sub-module is configured to obtain text characters that meet a preset distance range in the text to be located according to the distance context model, and determine the target location range of the text characters;

The second determining sub-module is used to determine the target character in the target positioning range as a candidate text segment.
11. The text positioning device according to claim 10, wherein the first determining submodule comprises:

A traversal unit, configured to perform character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

An obtaining unit for obtaining a target character sequence with the longest sequence length in the character sequence;

The confirming unit is used to confirm the to-be-located text corresponding to the target character sequence as a text character.
The text positioning device according to claim 11, wherein the traversal unit further comprises:

The first obtaining subunit is used to obtain the above text of the text to be located;

The first step subunit is used to perform step calculation according to the above text to obtain the second step array;

The first traversal subunit is configured to perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.
A device, characterized in that the device comprises: a memory, a processor, and computer-readable instructions stored on the memory and capable of running on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps:

Acquiring the recording content, and performing voice recognition processing on the recording content to obtain the text to be located;

Obtaining the standard speech text, and constructing a distance context model based on the standard speech text;

According to the distance context model and the text to be located, candidate text segments are obtained.
The device according to claim 13, wherein the step of constructing a distance context model based on the standard speech text comprises:

Acquiring text characters of the standard speech text;

Performing step processing on the text characters to obtain a first step array of each text character;

Perform list sorting according to the first step array to construct a distance context model.
The device according to claim 14, wherein the step of obtaining candidate text fragments according to the distance context model and the text to be located comprises:

Acquiring text characters that meet a preset distance range in the text to be located according to the distance context model, and determining a target location range of the text characters;

The target character in the target positioning range is determined as a candidate text segment.
The device according to claim 15, wherein the step of acquiring text characters that meet a preset distance range in the text to be located according to the distance context model comprises:

Performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located;

Obtaining a target character sequence with the longest sequence length in the character sequence;

The to-be-positioned text corresponding to the target character sequence is confirmed as a text character.
The device according to claim 16, wherein the step of performing character traversal in the text to be located according to the first step array, so as to obtain a character sequence that meets a preset distance range from the text to be located, further comprises:

Obtaining the above text of the text to be located;

Perform step calculation according to the above text to obtain the second step array;

Perform character traversal in the text to be located according to the first step array and the second step array, so as to obtain a character sequence that meets a preset distance range from the text to be located.
A storage medium, characterized in that computer readable instructions are stored on the storage medium, and when the computer readable instructions are executed by a processor, the following steps are implemented:

Acquiring the recording content, and performing voice recognition processing on the recording content to obtain the text to be located;

Obtaining the standard speech text, and constructing a distance context model based on the standard speech text;

According to the distance context model and the text to be located, candidate text segments are obtained.
The storage medium of claim 18, wherein the step of constructing a distance context model based on the standard speech text comprises:

Acquiring text characters of the standard speech text;

Performing step processing on the text characters to obtain a first step array of each text character;

Perform list sorting according to the first step array to construct a distance context model.
The storage medium according to claim 19, wherein the step of obtaining candidate text fragments according to the distance context model and the text to be located comprises:

Acquiring text characters that meet a preset distance range in the text to be located according to the distance context model, and determining a target location range of the text characters;

The target character in the target positioning range is determined as a candidate text segment.