CN117493523A

CN117493523A - Semantic information determining method and device of telephone traffic text, electronic equipment and medium

Info

Publication number: CN117493523A
Application number: CN202311566768.0A
Authority: CN
Inventors: 刘涛
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-11-22
Filing date: 2023-11-22
Publication date: 2024-02-02

Abstract

The disclosure provides a semantic information determining method, device, electronic equipment and medium for telephone traffic text, and relates to the technical field of artificial intelligence. The method comprises the following steps: collecting a real-time voice stream in a remote call service, and transcribing the real-time voice stream into n consecutive sentences of text content, wherein n is an integer greater than 1; for any i-th sentence of text content among the n sentences, i = 1, 2, …, n-1, detecting, by using a pre-trained sentence-break detection model, whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content; if not, splicing the i-th sentence with the (i+1)-th sentence, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the sentence-break detection operation for the updated (i+1)-th sentence; and after splicing has been completed for all n sentences, obtaining m sentences of text content, wherein m is a positive integer smaller than n, and determining the semantic information contained in the voice stream according to the m sentences of text content.

Description

Semantic information determining method and device of telephone traffic text, electronic equipment and medium

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to a semantic information determining method, a semantic information determining device, electronic equipment, media and program products of telephone traffic texts.

Background

Currently, commercial banks, operators and the like in China have established relatively complete remote telephone services (such as bank voice-of-the-customer services, remote banking hotlines and the like). During service, in order to ensure service quality, the call content may be recorded, and the recordings may be used in applications such as real-time transcription and offline transcription-based quality inspection.

In practice, when a voice stream is transcribed, the transcription algorithm usually treats a fixed period of "inactive" speech as a sentence boundary, thereby segmenting the continuously collected voice stream into multiple sentences. This fixed segmentation can split a single sentence spoken by one role into two or more sentences (generally because of pauses, hesitation, thinking, etc.). Compared with the complete sentence, the segmented text loses semantic information, so the semantic information contained in the text cannot be fully mined.

Disclosure of Invention

In view of the foregoing, the present disclosure provides a semantic information determining method, apparatus, electronic device, medium and program product for traffic text.

According to a first aspect of the present disclosure, there is provided a semantic information determining method of traffic text, including: collecting a real-time voice stream in a remote call service, and transcribing the real-time voice stream into n consecutive sentences of text content, wherein n is an integer greater than 1; for any i-th sentence of text content among the n sentences, i = 1, 2, …, n-1, detecting, by using a pre-trained sentence-break detection model, whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content; if not, splicing the i-th sentence with the (i+1)-th sentence, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the sentence-break detection operation for the updated (i+1)-th sentence; and after splicing has been completed for all n sentences, obtaining m sentences of text content, wherein m is a positive integer smaller than n, and determining the semantic information contained in the voice stream according to the m sentences of text content.

According to an embodiment of the present disclosure, collecting the real-time voice stream in the remote call service further includes: collecting, in interaction order, the real-time voice stream of either one of the two roles of the remote call service, wherein the two roles comprise a client side and a service side.

According to an embodiment of the present disclosure, the sentence-break detection model includes a lexical analysis model and a semantic analysis model, and is trained in advance in the following manner: acquiring a historical voice stream in the remote call service at a historical time, and transcribing the historical voice stream into continuous multi-sentence text content; marking the sentence-break attribute of each sentence of the multi-sentence text content, wherein the sentence-break attribute of a sentence represents whether that sentence is a sentence break relative to the next sentence; according to the sentence-break attribute and a predetermined screening proportion, screening out a part of sentence-break annotation data and a part of non-sentence-break annotation data from the multi-sentence text content, and taking them as a training data set; and training the lexical analysis model and the semantic analysis model respectively by using the training data set, to obtain a trained lexical analysis model and a trained semantic analysis model.

According to an embodiment of the present disclosure, after the historical voice stream is transcribed into the continuous multi-sentence text content, the method further includes: removing specific characters from the multi-sentence text content by a regular-expression matching algorithm, wherein the specific characters comprise stop words and sensitive characters; and limiting the length of each sentence of the multi-sentence text content according to the encoding requirement preset by the sentence-break detection model.

According to an embodiment of the present disclosure, training the lexical analysis model and the semantic analysis model respectively by using the training data set to obtain a trained lexical analysis model and a trained semantic analysis model includes: for any two adjacent sentences of text content in the training data set, performing sentence-break detection on the two sentences through the lexical analysis model and the semantic analysis model respectively; carrying out weighted summation of the sentence-break detection results of the lexical analysis model and the semantic analysis model to obtain a final sentence-break detection result for the two sentences; and comparing the final sentence-break detection result with the sentence-break attribute marked for the two sentences so as to adjust the parameters of the lexical analysis model and the semantic analysis model until the expected performance index is reached, thereby obtaining the trained lexical analysis model and the trained semantic analysis model.

According to an embodiment of the present disclosure, the weighted summation of the sentence-break detection results of the lexical analysis model and the semantic analysis model includes: assigning a first weight to the sentence-break detection result of the lexical analysis model and a second weight to that of the semantic analysis model, wherein the sum of the first weight and the second weight is 1, and the first weight is smaller than the second weight; and calculating the final sentence-break detection result for the two sentences of text content from the sentence-break detection results of the lexical analysis model and the semantic analysis model, the first weight and the second weight.

According to an embodiment of the present disclosure, detecting whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content by using the pre-trained sentence-break detection model includes: determining whether the i-th sentence is a sentence break relative to the (i+1)-th sentence according to the final sentence-break detection result for the i-th sentence and the (i+1)-th sentence.

According to the embodiment of the disclosure, the lexical analysis model adopts an NLTK model; the semantic analysis model adopts a Word2Vec model, a GloVe model or a BERT model.

A second aspect of the present disclosure provides a semantic information determining apparatus for traffic text, including: a voice stream writing module, used for collecting a real-time voice stream in a remote call service and transcribing the real-time voice stream into n consecutive sentences of text content, wherein n is an integer greater than 1; a sentence-break detection and splicing module, used for, for any i-th sentence of text content among the n sentences, i = 1, 2, …, n-1, detecting, by using a pre-trained sentence-break detection model, whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content, and if not, splicing the i-th sentence with the (i+1)-th sentence, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the sentence-break detection operation for the updated (i+1)-th sentence; and a semantic information determining module, used for obtaining m sentences of text content after splicing has been completed for all n sentences, wherein m is a positive integer smaller than n, and determining the semantic information contained in the voice stream according to the m sentences of text content.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of determining semantic information for traffic text described above.

A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method of determining semantic information of traffic text.

The fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described semantic information determination method of traffic text.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario of a semantic information determination method and apparatus for traffic text according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow chart of a method of semantic information determination of traffic text according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a pre-training process of a sentence-break detection model in accordance with an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart of a preprocessing process of multiple sentences of text content according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow diagram for training a lexical analysis model and a semantic analysis model according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow chart for weighted summation of sentence-break detection results in accordance with an embodiment of the present disclosure;

FIG. 7 schematically illustrates a block diagram of a semantic information determining apparatus for traffic text according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of an electronic device adapted to implement a semantic information determination method of traffic text according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where expressions like "at least one of A, B and C, etc." are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon, the computer program product being for use by or in connection with an instruction execution system.

In the technical solution of the present disclosure, the user information involved (including, but not limited to, user personal information, user image information, and user equipment information such as location information) and the data involved (including, but not limited to, data for analysis, stored data, displayed data, etc.) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of the related data all comply with the relevant laws, regulations and standards of the relevant countries and regions, necessary security measures are taken, public order is not prejudiced, and corresponding operation entries are provided for the user to choose to authorize or refuse.

The embodiment of the disclosure provides a semantic information determining method of a traffic text, comprising the following steps: collecting a real-time voice stream in a remote call service, and transcribing the real-time voice stream into n consecutive sentences of text content, wherein n is an integer greater than 1; for any i-th sentence of text content among the n sentences, i = 1, 2, …, n-1, detecting, by using a pre-trained sentence-break detection model, whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content; if not, splicing the i-th sentence with the (i+1)-th sentence, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the sentence-break detection operation for the updated (i+1)-th sentence; and after splicing has been completed for all n sentences, obtaining m sentences of text content, wherein m is a positive integer smaller than n, and determining the semantic information contained in the voice stream according to the m sentences of text content.

Fig. 1 schematically illustrates an application scenario of a semantic information determining method and apparatus for traffic text according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.

As shown in fig. 1, the application scenario 100 according to this embodiment may include … …. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that, the method for determining semantic information of traffic text provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the semantic information determining apparatus for traffic text provided by the embodiments of the present disclosure may be generally provided in the server 105. The semantic information determination method of traffic text provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the semantic information determining apparatus for traffic text provided by the embodiments of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The semantic information determining method of the traffic text according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 6 based on the system architecture described in fig. 1.

Fig. 2 schematically illustrates a flow chart of a semantic information determination method of traffic text according to an embodiment of the present disclosure.

As shown in fig. 2, the semantic information determining method of the traffic text of this embodiment may include operations S210 to S230, and the semantic information determining method of the traffic text may be performed by the server 105 described above.

In operation S210, a real-time voice stream is collected in the remote call service, and the real-time voice stream is transcribed into continuous n sentences of text content, where n is an integer greater than 1.

For example, the real-time voice stream may be the voice stream of a telephone banking call or of a remote online customer service interaction, and specifically may be the voice stream of the call content during a call between a remote bank agent and a customer.

For example, an ASR (speech transcription) algorithm built on a deep-learning RNN is used to intelligently transcribe the real-time voice stream collected during the remote call interaction, obtaining continuous multi-sentence text content.

In operation S220, for any i-th sentence of text content among the n sentences, i = 1, 2, …, n-1, a pre-trained sentence-break detection model is used to detect whether the i-th sentence of text content is a sentence break relative to the (i+1)-th sentence of text content; if not, the i-th sentence is spliced with the (i+1)-th sentence, the spliced text content is taken as the updated (i+1)-th sentence of text content, and the process returns to the sentence-break detection operation for the updated (i+1)-th sentence.

In operation S230, m text contents are obtained after all n text contents are spliced, m is a positive integer smaller than n, and semantic information contained in the voice stream is determined according to the m text contents.
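The splicing loop of operations S220 to S230 can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the function name `splice_sentences`, the `is_sentence_break` callback (standing in for the pre-trained sentence-break detection model) and the `joiner` parameter are assumptions introduced here.

```python
from typing import Callable, List


def splice_sentences(
    sentences: List[str],
    is_sentence_break: Callable[[str, str], bool],  # hypothetical stand-in for the trained model
    joiner: str = "",  # hypothetical: "" suits Chinese text, " " suits English text
) -> List[str]:
    """Merge adjacent transcribed sentences that the detector says belong together."""
    if not sentences:
        return []
    merged: List[str] = []
    current = sentences[0]
    for nxt in sentences[1:]:
        if is_sentence_break(current, nxt):
            # A real sentence boundary: keep `current` as a finished sentence.
            merged.append(current)
            current = nxt
        else:
            # Not a boundary: splice, and re-check the updated sentence against
            # the following one on the next iteration.
            current = current + joiner + nxt
    merged.append(current)
    return merged
```

In this sketch the result has m ≤ n sentences; the case m < n stated in the disclosure corresponds to at least one splice having taken place.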

According to the embodiment of the disclosure, to address the spurious sentence breaks that occur when the voice stream of a call between a remote bank agent and a client is transcribed, sentence-break detection is performed on the continuous multi-sentence text content to judge whether consecutive pieces of content belong to one sentence; if they do, they are spliced, which provides good data for fully mining the semantic information of the interactive text. Further, with the rich information contained in the interactive text fully mined, semantically complete sentences can be provided for emotion recognition, tone recognition, intention recognition, interactive text desensitization and the like.

In an embodiment of the present disclosure, collecting the real-time voice stream in the remote call service in operation S210 further includes: collecting, in interaction order, the real-time voice stream of either one of the two roles of the remote call service, wherein the two roles comprise a client side and a service side.

For example, sound collection starts after the remote call service is established. Because the remote call service involves a client side and a service side (the remote bank agent), to mine the semantic information contained in the client side's speech, the client side's real-time voice stream during the call can be collected sequentially, for the client-side role, according to the interaction order of the client side and the service side. Similarly, to mine the semantic information contained in the service side's speech, the service side's real-time voice stream during the call can be collected sequentially, for the service-side role, according to the same interaction order.
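As a rough illustration of per-role collection, the sketch below selects the utterances of one role and keeps them in interaction order. The `Utterance` record, its field names and the role labels `"client"` and `"service"` are assumptions made for this example; the disclosure does not specify a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    order: int  # hypothetical: position in the interaction sequence
    role: str   # hypothetical labels: "client" or "service"
    text: str   # transcribed text of this segment


def collect_by_role(utterances: List[Utterance], role: str) -> List[str]:
    """Return the texts of one role, sorted by interaction order."""
    selected = [u for u in utterances if u.role == role]
    selected.sort(key=lambda u: u.order)
    return [u.text for u in selected]
```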

Fig. 3 schematically illustrates a flow chart of a pre-training process of a sentence-break detection model according to an embodiment of the present disclosure.

As shown in fig. 3, in the embodiment of the present disclosure, the sentence-break detection model in operation S220 includes a lexical analysis model and a semantic analysis model, and the sentence-break detection model is pre-trained by the following operations S301 to S304.

In operation S301, a historical voice stream is acquired in the remote call service at a historical time, and the historical voice stream is transcribed into continuous multi-sentence text content.

For example, an ASR (speech transcription) algorithm is used to transcribe the voice data of manually served calls at a historical time, obtaining continuous multi-sentence text content as the basic data for the sentence-break detection model. To ensure data reliability and construct high-quality training data, texts in specific fields (such as remote banking, voice of the customer and the like) can be supplemented with the corresponding domain-specific vocabulary and knowledge bases to improve model accuracy, so that the interactive text is processed more effectively and the requirements of the specific field are met.

In operation S302, the sentence breaking attribute of any one of the multiple sentences of text contents is marked, wherein the sentence breaking attribute of any one of the multiple sentences of text contents characterizes whether the sentence of text contents is a sentence breaking or not compared with the next sentence of text contents.

For any sentence of the transcribed multi-sentence text content, the sentence-break attribute marked for that sentence (i.e., whether the sentence is a sentence break relative to the next sentence) falls into two classes, sentence break or non-sentence break. For example, the label may be assigned as follows:

label = 1 if the sentence is a sentence break; label = 0 if the sentence is a non-sentence break.

Then, remote banking staff (agents) can label the multi-sentence text content: the sentence breaks in the actual transcription are extracted, combined and labeled 1, and the non-sentence breaks are screened, combined and labeled 0, as in the following two labeled examples:

1. Not a sentence break:

current sentence text content: sequence = {"we X province Y city"};

next sentence text content: nextsequence = {"that business bank is on that house lending"};

sentence-break attribute: label = {"0"}.

2. A sentence break:

current sentence text content: sequence = {"we X province Y that commercial bank is on that house lending"};

next sentence text content: nextsequence = {"i want to check what the payoff amount is in me month"};

sentence-break attribute: label = {"1"}.

In operation S303, a part of sentence-breaking annotation data and another part of non-sentence-breaking annotation data are respectively screened out from the multiple sentence text contents according to the sentence-breaking attribute and the predetermined screening proportion, and are used as training data sets.

In the actual data extraction process, the two kinds of data should be combined in a certain proportion, which is defined by business personnel according to the actual proportion observed in the business. For example, if an actual call contains 60 clauses and 10 of them need to be spliced, the proportion is about 15%-20%. That is, sentence-break annotation data and non-sentence-break annotation data can be selected so that the proportion of sentence breaks in the data set is 15%-20%, and the two are combined in this way to form the training data set.
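A minimal sketch of the proportional screening in operation S303 follows. It assumes each annotated sample is a `(sequence, nextsequence, label)` tuple as in the labeled examples above; the function name, the `size`, `ratio`, `ratio_label` and `seed` parameters are hypothetical, and which label should make up the 15%-20% share is deliberately left to the caller.

```python
import random
from typing import List, Tuple

# (sequence, nextsequence, label), mirroring the labeled examples
LabelledPair = Tuple[str, str, str]


def build_training_set(
    pairs: List[LabelledPair],
    size: int,          # hypothetical: total number of samples wanted
    ratio: float,       # share of the data set taken by `ratio_label`, e.g. 0.15 to 0.20
    ratio_label: str,   # the label whose share is fixed by `ratio`
    seed: int = 0,
) -> List[LabelledPair]:
    """Sample a training set in which `ratio_label` has the requested share."""
    rng = random.Random(seed)
    chosen = [p for p in pairs if p[2] == ratio_label]
    others = [p for p in pairs if p[2] != ratio_label]
    n_chosen = round(size * ratio)
    n_others = size - n_chosen
    if n_chosen > len(chosen) or n_others > len(others):
        raise ValueError("not enough labeled samples for the requested proportion")
    data = rng.sample(chosen, n_chosen) + rng.sample(others, n_others)
    rng.shuffle(data)  # avoid all samples of one label being grouped together
    return data
```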

In operation S304, training the lexical analysis model and the semantic analysis model using the training data set, respectively, to obtain a trained lexical analysis model and semantic analysis model.

Through the embodiment of the disclosure, a sentence-break detection model composed of two parts, a lexical analysis model and a semantic analysis model, is designed and a continuous detection scheme is established: continuous multi-sentence text content is detected, and whether two adjacent sentences are a sentence break is determined with reference to the detection results of both the lexical analysis model and the semantic analysis model, thereby determining whether the two adjacent sentences can be spliced.

Fig. 4 schematically illustrates a flowchart of a preprocessing process of multiple sentences of text content according to an embodiment of the present disclosure.

As shown in fig. 4, in the embodiment of the present disclosure, after operation S301 transcribes the historical voice stream into continuous multi-sentence text content, the following operations S401 to S402 may be further included.

In operation S401, specific characters in the multi-sentence text content are removed by a regular-expression matching algorithm, the specific characters including stop words and sensitive characters.

This operation performs data cleaning. For example, the sensitive characters may be an identity card number, a mobile phone number, an address, etc. Identity card numbers (e.g., num len=18), mobile phone numbers (e.g., num len=11), addresses and the like can be removed by a regular-expression matching algorithm to protect data privacy.
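The cleaning step can be sketched with Python's `re` module as below. The patterns are assumptions chosen to match the lengths mentioned above (18 characters for an identity card number, 11 digits for a mobile phone number); the function name and the stop-word list are hypothetical, and address removal, which would need a domain-specific pattern or dictionary, is not covered.

```python
import re
from typing import Iterable

# Assumed patterns: 17 digits plus a digit or X check character (18 in total),
# and an 11-digit number starting with 1. Lookarounds avoid cutting longer digit runs.
ID_NUMBER = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)")
PHONE_NUMBER = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def clean_text(text: str, stop_words: Iterable[str] = ()) -> str:
    """Remove sensitive numbers and (hypothetical) stop words from one sentence."""
    text = ID_NUMBER.sub("", text)
    text = PHONE_NUMBER.sub("", text)
    words = sorted(set(stop_words), key=len, reverse=True)  # longest first
    if words:
        pattern = re.compile("|".join(re.escape(w) for w in words))
        text = pattern.sub("", text)
    return text
```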

In operation S402, length limitation is performed on each text content in the plurality of text contents according to a coding requirement preset by the sentence-breaking detection model.

To meet the length requirement of the model encoding, such as the 512 length limit of the BERT model, the length of each sentence of text content may be limited, removing overly long single sentences, overly short meaningless sentences ("aizhi", "feed, hello" …), and the like.
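A simple length filter along these lines might look as follows. The character count is used here as a rough proxy for the encoded length, and the `min_len` threshold is a hypothetical value; a real pipeline would count tokens with the tokenizer of the chosen model.

```python
from typing import List


def filter_by_length(sentences: List[str], min_len: int = 3, max_len: int = 512) -> List[str]:
    """Keep sentences whose character length lies within [min_len, max_len]."""
    # min_len is an assumed cut-off for "too short to be meaningful";
    # max_len follows the 512 limit mentioned for BERT.
    return [s for s in sentences if min_len <= len(s) <= max_len]
```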

FIG. 5 schematically illustrates a flow diagram for training a lexical analysis model and a semantic analysis model according to an embodiment of the present disclosure.

As shown in fig. 5, in the embodiment of the present disclosure, the above operation S304 trains the lexical analysis model and the semantic analysis model using the training data set, to obtain a trained lexical analysis model and a trained semantic analysis model, which may include the following operations S501 to S503.

In operation S501, for any two adjacent sentences of text content in the training data set, sentence-break detection is performed on the two sentences through the lexical analysis model and the semantic analysis model, respectively.

In operation S502, weighted summation is performed on the sentence-breaking detection results of the lexical analysis model and the semantic analysis model, so as to obtain a final sentence-breaking detection result of the two-sentence text content.

In operation S503, the final sentence-breaking detection result is compared with the sentence-breaking attribute marked by the text content of the two sentences to adjust the parameters of the lexical analysis model and the semantic analysis model until the expected performance index is reached, and the trained lexical analysis model and the semantic analysis model are obtained.

According to the embodiment of the disclosure, the sentence-breaking detection is performed by using two modes of the lexical analysis model and the semantic analysis model, and then the weighted calculation is performed on the two sentence-breaking detection results, so that the sentence-breaking detection results are more accurate.

In the embodiment of the disclosure, the lexical analysis model adopts an NLTK model; the semantic analysis model adopts a Word2Vec model, a GloVe model or a BERT model.

Specifically, the lexical analysis model first analyzes mainly the lexical characteristics of the two sentences. The detailed sentence-breaking detection process is described in steps A1) to A5).

A1) The input and output of the lexical analysis model are constructed. The input is the text of two sentences (after preprocessing such as word segmentation, stop-word removal, stemming/lemmatization, and word-diversity reduction), and the output is a Boolean value indicating whether the pair is a sentence break.

A2) Training uses the NLTK model. The training data set is the data set constructed in operation S303 and contains positive and negative samples in a suitable proportion, where a positive example is two continuous sentences (non-sentence-break) and a negative example is two discontinuous sentences (sentence break).

A3) The features used to train the model are determined. Features that effectively distinguish positive examples from negative examples are selected, for example parts of speech, word senses, grammatical structures, and associations between words.

A4) The model is trained on the training data set, then evaluated and optimized by adjusting the quality and proportion of the positive and negative examples, the model parameters, and the like, until the expected performance index is reached.

A5) The trained (stored) model is used for sentence-breaking judgment. The texts of two sentences are input, the model judges whether they are a sentence break according to the input texts and the trained features, and outputs a Boolean value (1: sentence break; 0: non-sentence-break), giving the sentence-breaking detection result result₁.

Secondly, aiming at the semantic analysis model, the sentence-breaking detection detailed process is as described in the steps B1) to B5).

B1) A semantic analysis model based on deep learning (BERT) is built, and the input and output of the model are defined. The input is the text of two sentences (after preprocessing such as word segmentation, stop-word removal, stemming/lemmatization, and word-diversity reduction), and the output is a Boolean value indicating whether the pair is a sentence break (1: sentence break; 0: non-sentence-break).

B2) A Word2Vec model, a GloVe model, or a BERT model is used. The training data set should contain both positive examples, which are two continuous sentences (non-sentence-break), and negative examples, which are two discontinuous sentences (sentence break).

B3) Sentence vectors and text similarity (Manhattan distance) are selected as the features for training the model, because these features distinguish positive examples from negative examples more effectively.

B4) The model is trained on the training data set, then evaluated and optimized until the expected performance index is reached.

B5) The trained model is used for sentence-breaking judgment. The texts of two sentences are input, the model judges whether they are a sentence break according to the input texts and the trained features, and outputs a Boolean value (1: sentence break; 0: non-sentence-break), giving the sentence-breaking detection result result₂.
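The text-similarity feature named in step B3) can be illustrated with a short sketch. The Manhattan (L1) distance itself is standard; mapping it to a similarity score with exp(-d) is an assumption borrowed from common practice and is not stated in the disclosure. The sentence vectors are assumed to come from whichever embedding model is chosen in step B2); the function names are hypothetical.

```python
import math


def manhattan_distance(u: list[float], v: list[float]) -> float:
    """Sum of absolute coordinate differences between two sentence vectors."""
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same dimension")
    return sum(abs(a - b) for a, b in zip(u, v))


def manhattan_similarity(u: list[float], v: list[float]) -> float:
    """Assumed mapping of distance to a score in (0, 1]; identical vectors give 1.0."""
    return math.exp(-manhattan_distance(u, v))
```

Under this mapping, a pair of sentences whose vectors are close scores near 1, and the score decays toward 0 as the vectors move apart.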

Fig. 6 schematically illustrates a flow chart for weighted summation of sentence-break detection results in accordance with an embodiment of the present disclosure.

As shown in fig. 6, in the embodiment of the present disclosure, the above-described operation S502 performs weighted summation on the sentence-break detection results of the lexical analysis model and the semantic analysis model, and may include the following operations S601 to S602.

In operation S601, the sentence breaking detection results of the lexical analysis model and the semantic analysis model are respectively given a first weight and a second weight, wherein the sum of the first weight and the second weight is 1, and the first weight is smaller than the second weight.

In operation S602, a final sentence-breaking detection result of the two-sentence text content is calculated according to the sentence-breaking detection results of the lexical analysis model and the semantic analysis model, the first weight and the second weight.

According to the embodiment of the disclosure, for any two adjacent sentences of text content in the training data set, sentence-breaking detection is performed on the two sentences through a lexical analysis model and a semantic analysis model. The sentence-breaking detection results of the two models are then analyzed together: because the lexical analysis model captures less semantic information, its result is given the lower weight, and the final result, namely whether the preceding and following sentences are a sentence break, is obtained by weighted summation.

For example, since lexical analysis contains less semantic information, the two results are combined by weighting, with a weight of 0.3 for the lexical analysis result and 0.7 for the semantic analysis result. The final sentence-breaking detection result is obtained by the following formula and is used to judge whether the two sentences of text content are a sentence break:

$$result = 0.3 \times result_1 + 0.7 \times result_2$$
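The weighted summation of operations S601 to S602 can be sketched in Python as follows. The weights 0.3 and 0.7 come from the example in the text; the decision threshold of 0.5 is an assumption, since the disclosure does not state how the weighted value is turned into a final yes/no judgment. The names `fuse` and `is_sentence_break` are hypothetical.

```python
import math

LEXICAL_WEIGHT = 0.3   # first weight, from the example in the text
SEMANTIC_WEIGHT = 0.7  # second weight, from the example in the text
THRESHOLD = 0.5        # assumed decision threshold; not given in the disclosure


def fuse(result_1: int, result_2: int,
         w1: float = LEXICAL_WEIGHT, w2: float = SEMANTIC_WEIGHT) -> float:
    """Weighted sum of the lexical (result_1) and semantic (result_2) outputs."""
    # Operation S601 requires the weights to sum to 1 with the first one smaller.
    if not math.isclose(w1 + w2, 1.0) or not w1 < w2:
        raise ValueError("weights must sum to 1 and w1 must be smaller than w2")
    return w1 * result_1 + w2 * result_2


def is_sentence_break(result_1: int, result_2: int,
                      threshold: float = THRESHOLD) -> bool:
    """Assumed rule: treat the pair as a sentence break when the sum reaches the threshold."""
    return fuse(result_1, result_2) >= threshold
```

With Boolean inputs and this assumed threshold, the fused judgment follows the semantic analysis model whenever the two models disagree; a different threshold, or probabilistic model outputs, would change that behaviour.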

Table 1 below shows the sentence-breaking detection results result₁ and result₂ of the lexical analysis model and the semantic analysis model in an actual calculation, together with the final sentence-breaking detection result.

In an embodiment of the present disclosure, the step in operation S220 of detecting whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content by using a pre-trained sentence-breaking detection model includes: determining whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content according to the final sentence-breaking detection result of the i-th and (i+1)-th sentences of text content.

Next, in the above operation S220, for the n sentences of text content obtained by transcription, sentence-breaking detection is performed one pair at a time from front to back, in order and according to role, starting with the first and second sentences. If the pair is not a sentence break, the two sentences are spliced, the spliced sentence is used as the first sentence of the next sentence-breaking detection, and the following sentence is taken as its second sentence.

For example, for any sentence of text content, if the pre-trained sentence-breaking detection model detects that it is not a sentence break compared with the next sentence of text content, the two sentences are spliced; the spliced sentence is then used as the first sentence of the next sentence-breaking detection, and the next sentence is taken according to role and interaction order. Consider the following example:

sentence₁ = {"can I hear?"}

sentence₂ = {"I want to transfer money, in mobile banking"}

sentence₃ = {"why is my login account not there?"}

sentence₄ = {"Is?"}

…

sentence _n

Among the n sentences of text content sentence₁ to sentence_n, sentence₁ and sentence₂ are first selected and sentence-breaking detection is performed according to operation S220; a sentence break is detected, so the two cannot be spliced. Sentence-breaking detection is then performed on sentence₂ and sentence₃; no sentence break is detected, so the two sentences can be spliced and are combined as the updated sentence₃. The spliced sentence₃ and sentence₄ are then detected. By analogy, sentence-breaking detection is performed on all n sentences of text content.

Finally, the splicing result of the n sentences of text contents, namely m sentences of text contents obtained after the n sentences of text contents are spliced, is as follows:

sentence₁ = {"can I hear?"}

sentence₃ = {"I want to transfer money, in mobile banking why is my login account not there?"}

sentence₄ = {"Is?"}

…

sentence _m

Finally, operations S210 to S230 are repeated until the interactive content of both roles in the call has been compared and spliced in interaction order, so as to obtain call interactive content with complete and rich semantics.
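The detect-and-splice loop described above can be sketched as a single front-to-back pass. This is a minimal illustration, not the disclosed implementation: the trained sentence-breaking detection model is replaced by a pluggable predicate `is_break`, and `toy_is_break` is a purely hypothetical stand-in that treats terminal punctuation as a sentence break. Speaker roles are ignored for brevity.

```python
from typing import Callable


def splice_sentences(sentences: list[str],
                     is_break: Callable[[str, str], bool],
                     joiner: str = " ") -> list[str]:
    """Merge adjacent sentences front to back until a sentence break is detected."""
    if not sentences:
        return []
    merged: list[str] = []
    current = sentences[0]
    for nxt in sentences[1:]:
        if is_break(current, nxt):
            # Sentence break: the current sentence is complete, start a new one.
            merged.append(current)
            current = nxt
        else:
            # No break: splice, and reuse the spliced text for the next check.
            current = joiner.join((current, nxt))
    merged.append(current)
    return merged


def toy_is_break(current: str, nxt: str) -> bool:
    """Hypothetical stand-in for the trained model: break after terminal punctuation."""
    return current.rstrip().endswith(("?", ".", "!"))
```

Calling `splice_sentences` with a paraphrase of the example dialogue and `toy_is_break` merges the second and third sentences while leaving the first one untouched, so the n input sentences become m spliced sentences with m not larger than n.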

After the semantic information contained in each role's content is mined, intention recognition can be performed to support application development in several areas: the intention recognition results of the telephone traffic text can be stored in the enterprise data lake and then used for data analysis and application development, such as call summarization, intention recognition, service display, and public opinion analysis.

In summary, according to the method for determining semantic information of a traffic text provided by the embodiment of the present disclosure, sound collection is first performed after a remote call is established, and the collected voice stream is transcribed into continuous multi-sentence text content by an ASR algorithm. Sentence-breaking detection is then performed by a sentence-breaking detection model combining lexical analysis and semantic analysis to determine whether splicing can be performed; if so, the sentences are spliced and the spliced transcribed text is used in further applications.

The present disclosure detects sentence breaks by combining lexical analysis and semantic analysis, constructs the data set from real dialogue data, and provides a relatively accurate method for detecting and splicing sentence breaks in multi-role interactive content. It can process transcribed remote interactive content, such as that of a remote bank; splicing the broken sentences makes the semantic information contained in the interactive content richer, so that further interactive content analysis (semantic analysis, intention recognition, and the like) achieves better results.

The disclosure also provides a semantic information determining device for the telephone traffic text. The device will be described in detail below in connection with fig. 7.

Fig. 7 schematically shows a block diagram of a semantic information determining apparatus for traffic text according to an embodiment of the present disclosure.

As shown in fig. 7, the semantic information determining apparatus 700 for traffic text of this embodiment includes a voice stream transcription module 710, a sentence break detection and concatenation module 720, and a semantic information determining module 730.

The voice stream transcription module 710 is configured to collect a real-time voice stream in the remote call service and transcribe the real-time voice stream into n continuous sentences of text content, where n is an integer greater than 1. In an embodiment, the voice stream transcription module 710 may be configured to perform the operation S210 described above, which is not repeated here.

The sentence break detection and concatenation module 720 is configured to, for any i-th sentence of text content in the n sentences of text content, i=1, 2, …, n-1, detect whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content by using a pre-trained sentence-breaking detection model; if not, splice the i-th sentence of text content with the (i+1)-th sentence of text content, take the spliced text content as the updated (i+1)-th sentence of text content, and return to the operation of performing sentence-breaking detection by using the sentence-breaking detection model for the updated (i+1)-th sentence of text content. In an embodiment, the sentence break detection and concatenation module 720 may be used to perform the operation S220 described above, which is not repeated here.

The semantic information determining module 730 is configured to obtain m sentences of text content after all the n sentences of text content are spliced, where m is a positive integer smaller than n, and determine semantic information contained in the voice stream according to the m sentences of text content. In an embodiment, the semantic information determining module 730 may be configured to perform the operation S230 described above, which is not repeated here.

According to an embodiment of the present disclosure, any of the voice stream transcription module 710, the sentence break detection and concatenation module 720, and the semantic information determination module 730 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the voice stream transcription module 710, the sentence break detection and concatenation module 720, and the semantic information determination module 730 may be implemented, at least in part, as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware. Alternatively, at least one of the voice stream transcription module 710, the sentence break detection and concatenation module 720, and the semantic information determination module 730 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.

As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.

In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is installed into the storage section 808 as needed.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs that, when executed, implement a semantic information determination method for traffic text according to an embodiment of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. When the computer program product runs in a computer system, the program code is used for enabling the computer system to realize the semantic information determining method of the telephone traffic text provided by the embodiment of the disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed, and downloaded and installed in the form of a signal on a network medium, and/or from a removable medium 811 via a communication portion 809. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

According to embodiments of the present disclosure, program code for performing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or incorporated in a variety of ways, even if such combinations or incorporations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or incorporated without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. A semantic information determining method of traffic text comprises the following steps:

collecting a real-time voice stream in a remote call service, and transcribing the real-time voice stream into n continuous sentences of text content, wherein n is an integer greater than 1;

for any i-th sentence of text content in the n sentences of text content, i=1, 2, …, n-1, detecting, by using a pre-trained sentence-breaking detection model, whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content; if not, splicing the i-th sentence of text content with the (i+1)-th sentence of text content, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the operation of performing sentence-breaking detection by using the sentence-breaking detection model for the updated (i+1)-th sentence of text content;

and after the n sentences of text contents are spliced, m sentences of text contents are obtained, wherein m is a positive integer smaller than n, and semantic information contained in the voice stream is determined according to the m sentences of text contents.

2. The method of claim 1, wherein the capturing real-time voice streams in a remote telephony service further comprises:

and collecting the real-time voice stream according to any one of two roles of the remote call service and the interaction sequence, wherein the two roles comprise a client side and a server side.

3. The method of claim 1, wherein the sentence-break detection model comprises a lexical analysis model and a semantic analysis model, the sentence-break detection model being pre-trained by:

acquiring a historical voice stream in a remote call service at a historical moment, and transcribing the historical voice stream into continuous multi-sentence text content;

marking the sentence breaking attribute of any one of the plurality of sentences of text contents, wherein the sentence breaking attribute of any one of the plurality of sentences of text contents represents whether the sentence of text content is a sentence breaking compared with the next sentence of text content;

according to the sentence-breaking attribute and a preset screening proportion, screening a part of sentence-breaking marking data and another part of non-sentence-breaking marking data from the multi-sentence text content respectively to serve as a training data set;

and training the lexical analysis model and the semantic analysis model by using the training data set to obtain a trained lexical analysis model and a trained semantic analysis model.

4. The method of claim 3, wherein said transcribing said historical voice stream into continuous multi-sentence text content further comprises:

removing specific characters in the text contents of the multiple sentences through a regular matching algorithm, wherein the specific characters comprise stop words and sensitive characters;

and limiting the length of each text content in the multiple text contents according to the preset coding requirement of the sentence breaking detection model.

5. The method of claim 3, wherein the training the lexical analysis model and the semantic analysis model using the training dataset to obtain a trained lexical analysis model and a trained semantic analysis model, respectively, comprises:

for any two adjacent sentences of text content in the training data set, performing sentence-breaking detection on the two sentences of text content through a lexical analysis model and a semantic analysis model, respectively;

carrying out weighted summation on the sentence breaking detection results of the lexical analysis model and the semantic analysis model to obtain a final sentence breaking detection result of the two-sentence text content;

and comparing the final sentence-breaking detection result with the sentence-breaking attribute marked by the two sentences of text content to adjust the parameters of the lexical analysis model and the semantic analysis model until the expected performance index is reached, so as to obtain the trained lexical analysis model and the trained semantic analysis model.

6. The method of claim 5, wherein said weighted summing of sentence-break detection results of the lexical and semantic analysis models comprises:

respectively endowing sentence breaking detection results of the lexical analysis model and the semantic analysis model with a first weight and a second weight, wherein the sum of the first weight and the second weight is 1, and the first weight is smaller than the second weight;

and calculating the final sentence-breaking detection result of the two sentences of text content according to the sentence-breaking detection results of the lexical analysis model and the semantic analysis model, the first weight and the second weight.

7. The method of claim 5, wherein the detecting whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content by using the pre-trained sentence-breaking detection model comprises:

and determining whether the ith text content is a sentence break compared with the (i+1) th text content according to the final sentence break detection result of the ith text content and the (i+1) th text content.

8. The method of claim 3, wherein the lexical analysis model employs an NLTK model;

the semantic analysis model adopts a Word2Vec model, a GloVe model or a BERT model.

9. A semantic information determining apparatus for traffic text, comprising:

the voice stream transcription module is used for collecting a real-time voice stream in the remote call service and transcribing the real-time voice stream into n continuous sentences of text content, wherein n is an integer greater than 1;

the sentence break detection and concatenation module is used for, for any i-th sentence of text content in the n sentences of text content, i=1, 2, …, n-1, detecting, by using a pre-trained sentence-breaking detection model, whether the i-th sentence of text content is a sentence break compared with the (i+1)-th sentence of text content; if not, splicing the i-th sentence of text content with the (i+1)-th sentence of text content, taking the spliced text content as the updated (i+1)-th sentence of text content, and returning to the operation of performing sentence-breaking detection by using the sentence-breaking detection model for the updated (i+1)-th sentence of text content;

the semantic information determining module is used for obtaining m sentences of text content after the n sentences of text content are spliced, wherein m is a positive integer smaller than n, and determining semantic information contained in the voice stream according to the m sentences of text content.

10. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.

11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.

12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.