CN113743120A - Statement processing method and device


Info

Publication number
CN113743120A
Authority
CN
China
Prior art keywords
word
sentence
processed
matrix
mask
Prior art date
Legal status
Granted
Application number
CN202111042496.5A
Other languages
Chinese (zh)
Other versions
CN113743120B (en)
Inventor
李林峰
黄海荣
Current Assignee
Ecarx Hubei Tech Co Ltd
Original Assignee
Hubei Ecarx Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hubei Ecarx Technology Co Ltd filed Critical Hubei Ecarx Technology Co Ltd
Priority to CN202111042496.5A priority Critical patent/CN113743120B/en
Publication of CN113743120A publication Critical patent/CN113743120A/en
Application granted granted Critical
Publication of CN113743120B publication Critical patent/CN113743120B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The present application provides a sentence processing method and apparatus. In the course of processing a sentence, the word combinations in the sentence to be processed are extracted and input into a single machine learning model, which simultaneously segments the sentence into sub-sentences, identifies the intent of each sub-sentence, and recognizes the named entities within the sub-sentences. This reduces the number of machine learning models deployed on the electronic device and hence the storage space they occupy, shortens the time the device spends processing a sentence, and reduces both the amount and the duration of memory occupation, thereby improving the speed and efficiency with which the electronic device processes sentences.

Description

Statement processing method and device
Technical Field
The present application relates to the field of Natural Language Processing (NLP), and in particular to a method and an apparatus for processing a sentence.
Background
With the continuous development of technology, everyday electronic devices such as vehicle-mounted terminals, mobile phones, and computers now provide voice interaction, allowing a user to control the device by speaking while driving, working, or otherwise occupied, and to instruct it to execute corresponding commands. For the electronic device, after a sentence spoken by the user has been captured, subsequent execution can only be accurate if the user's intent is correctly recognized from that sentence.
In the prior art, in order to recognize a captured user sentence, an electronic device is usually provided with at least three machine learning models that process the sentence in sequence. For example, after a sentence is captured, a sentence-segmentation model first splits it into several sub-sentences, an intent recognition model then identifies the intent of each sub-sentence, and an entity recognition model finally determines the word slots in each sub-sentence. Only then can the instruction in the user's sentence be executed according to the intents and word slots produced by the three machine learning models.
With this prior-art approach, the electronic device must host many machine learning models, so many computation steps are required during sentence processing, which reduces the speed and efficiency of sentence processing.
Disclosure of Invention
The present application provides a sentence processing method and apparatus to solve the technical problem that sentence processing by an electronic device is slow and inefficient because of the large number of machine learning models involved.
A first aspect of the present application provides a sentence processing method, including: obtaining a sentence to be processed; extracting all word combinations of the sentence to be processed, each consisting of a single character or of several consecutive characters; and inputting all the word combinations into a machine learning model. The machine learning model extracts feature information of all the word combinations in the sentence and determines a label for each combination according to that feature information. The labels include labels of a first type, indicating the intent of a word combination, and labels of a second type, indicating the named entities contained in the word combinations. The sub-sentences of the sentence to be processed, the intents of the sub-sentences, and their named entities are then determined according to the labels the machine learning model outputs for all the word combinations.
In an embodiment of the first aspect of the present application, the labels of the first type include non-intention labels and intention labels. Determining the sub-sentences in the sentence to be processed, their intents, and their named entities according to the labels output by the machine learning model includes: determining the word combination corresponding to an intention label as a sub-sentence of the sentence to be processed, and determining the intent corresponding to that intention label as the intent of the sub-sentence.
In an embodiment of the first aspect of the present application, the labels of the second type include non-entity labels and entity labels. Determining the sub-sentences in the sentence to be processed, their intents, and their named entities according to the labels output by the machine learning model includes: determining the word combination corresponding to an entity label as a named entity of a sub-sentence of the sentence to be processed.
In an embodiment of the first aspect of the present application, extracting the feature information of all the word combinations in the sentence to be processed includes, for each word combination: determining a feature matrix of the word combination from the word feature vector corresponding to each character in the combination, each character having a corresponding word feature vector and the vectors of all the characters in the combination together forming the combination's feature matrix; determining the mask information corresponding to the word combination, the mask information comprising mask length information and mask start position information; and splicing the feature matrix and the mask information of the word combination into the feature information of the word combination.
In an embodiment of the first aspect of the present application, extracting all word combinations of single characters or several consecutive characters from the sentence to be processed includes: presetting a mask information set according to the length of the sentence to be processed, the set containing all the preset mask information determined from the preset sentence length, each piece of mask information corresponding to one word combination in the sentence; and traversing all the mask information in the set to obtain the word combinations corresponding to each piece, which together constitute all the single-character and multi-character combinations of the sentence. For each word combination: the start position of the combination in the sentence to be processed is determined from the mask start position information; the end position is determined from the mask length information together with the mask start position information; and the characters between the start position and the end position of the sentence to be processed are taken as the word combination.
In an embodiment of the first aspect of the present application, forming all the preset mask information in the mask information set includes: obtaining the length of the sentence to be processed, that length being made up of its character positions; taking each character position as a mask start position and forming several pieces of mask information from the several mask lengths corresponding to that character position, the mask lengths corresponding to a character position being all lengths smaller than its splittable length, where the splittable length is the length from that character position to the last character position; and traversing all the character positions as mask start positions, each forming mask information with its several corresponding mask lengths, until all the mask information of the mask information set is formed.
In an embodiment of the first aspect of the present application, the structure of the machine learning model includes: an input layer, which converts each character of a word combination into its corresponding character index number; a word embedding layer, which determines the word feature vector of each character in the combination from its index number, the vectors of all the characters forming the combination's feature matrix; a first fusion layer, which fuses the feature matrix of the word combination with its corresponding mask information to obtain a first matrix serving as the combination's feature information; a convolution layer, which computes sentence vector features of the word combination from the first matrix to obtain a second matrix indicating the combination's sentence feature information; a pooling layer, which down-samples the second matrix to obtain a third matrix indicating the combination's sentence feature information; a fully connected layer, which maps the third matrix onto the dimensions of several preset labels to obtain a fifth matrix, the preset labels including labels of the first type and of the second type; a Softmax layer, which normalizes the values in the fifth matrix to obtain a sixth matrix indicating the probability of the word combination mapping to each preset label; and an output layer, which determines the label with the highest probability in the sixth matrix as the label of the word combination.

In an embodiment of the first aspect of the present application, after obtaining the sentence to be processed, the method further includes: inputting the sentence to be processed into the machine learning model when its number of characters satisfies a preset condition; or when the number of sentences currently being processed is greater than a preset threshold; or when the sentence to be processed was collected under a preset condition.
In an embodiment of the first aspect of the present application, the structure of the machine learning model further includes a second fusion layer, which fuses the third matrix produced by the pooling layer with the mask information to obtain a fourth matrix indicating the sentence feature information of the word combination, so that the fully connected layer maps the fourth matrix, rather than the third, onto the dimensions of the several preset labels to obtain the fifth matrix.
A second aspect of the present application provides a sentence processing apparatus, comprising: an acquisition module for obtaining the sentence to be processed; an extraction module for extracting all word combinations of single characters or several consecutive characters from the sentence; an input module for inputting all the word combinations into a machine learning model, the model extracting feature information of all the word combinations in the sentence and determining their labels from that information, the labels including labels of a first type indicating the intent of a word combination and labels of a second type indicating the named entities contained in the word combinations; and a determining module for determining the sub-sentences of the sentence to be processed, their intents, and their named entities according to the labels the machine learning model outputs for all the word combinations.
A third aspect of the present application provides an electronic device comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, cause the processor to perform the sentence processing method of any embodiment of the first aspect of the present application.
A fourth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the sentence processing method of any embodiment of the first aspect of the present application.
A fifth aspect of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the sentence processing method of any embodiment of the first aspect of the present application.
To sum up, with the sentence processing method and apparatus provided by the present application, once a sentence to be processed is obtained, its word combinations are extracted and input into a single machine learning model, which extracts the features of all the combinations and determines their labels. From the labels output by this one model, the sentence can be segmented while the intents of the sub-sentences and the named entities within them are obtained simultaneously. This reduces the number of sentence-processing machine learning models deployed on the electronic device and hence the storage space they occupy, shortens the device's sentence-processing time, and reduces both the amount and the duration of memory occupation, thereby improving the speed and efficiency with which the electronic device processes sentences.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are evidently only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart illustrating a sentence processing process performed by an electronic device;
FIG. 2 is a schematic diagram of an intent recognition model for use with an electronic device;
FIG. 3 is a schematic diagram of an entity recognition model for use with an electronic device;
FIG. 4 is a flowchart illustrating a sentence processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a machine learning model provided herein;
fig. 6 is a schematic structural diagram of another embodiment of the machine learning model provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Before the embodiments of the present application are formally described, the application scenario is explained with reference to the drawings. The method and apparatus are applied where an electronic device processes sentences. The electronic device may be a mobile phone, notebook computer, tablet computer, or similar device that captures sentences spoken by the user and recognizes and executes the instructions they contain, so that the user can command the device by voice. Alternatively, the electronic device may be a vehicle-mounted terminal or the main control system of an intelligent vehicle; such in-vehicle devices can capture the driver's sentences while driving and execute them after recognizing the instructions within, providing a more intelligent driving environment in which the driver can issue voice commands without leaving the current driving state. The methods of the following embodiments are described, without limitation, with an electronic device as the executing subject; it is understood that the sentence processing method provided by the present application can equally be applied in scenarios such as smart homes and smart industry, executed by any electronic device with the relevant data processing capability.
In some embodiments, fig. 1 is a schematic flow chart of an electronic device processing a sentence. After the electronic device in the above scenario receives a sentence spoken by the user, it must identify the instructions the sentence contains. In the example shown in fig. 1, the sentence S comprises two sub-sentences S1 and S2: S may be "open the skylight a little and set the air conditioner to 25 degrees", which reads literally as sub-sentence S1, "open the skylight a little", and sub-sentence S2, "air conditioner to 25 degrees". The electronic device therefore first splits S into the sub-sentences S1 and S2 with a sentence-segmentation model and then performs intent recognition and word-slot recognition on each. For example, processing S1 with the intent recognition model yields the intent A, "skylight operation", and processing S1 with the entity recognition model yields the word slot C, "a little"; processing S2 with the intent recognition model yields the intent B, "air-conditioner operation", and processing S2 with the entity recognition model yields the word slot D, "25 degrees".
More specifically, fig. 2 is a schematic structural diagram of the intent recognition model used by the electronic device. Taking the processing of sub-sentence S1 by the model of fig. 2 as an example, the sentence first passes through an input layer, which converts each character of the string "open the skylight a little" into a character index number and outputs an array of index numbers, each an integer value. The length of this array can be preset, for example to 70 characters, in which case the output is an array of 70 values, each value being the index of one character.

Next, the word embedding layer expresses the meaning of each character with multi-dimensional floating point data; for example, using a 128-element array per character yields a [70,128] matrix whose elements are floating point numbers. The floating point data corresponding to each character may be preset or trained in advance, and the word embedding layer can determine it by table lookup or similar means.

The convolution layer then performs the feature extraction used for NLP intent classification, extracting features over windows of 3, 4, and 5 consecutive characters for subsequent processing. Given the [70,128] matrix output by the word embedding layer, the size of the convolution layer's output matrix depends on the size of the convolution kernel: a kernel of size [3,128] produces an output of size [68,1], while 4-character and 5-character feature extraction correspond to kernels of [4,128] and [5,128] respectively.

The pooling layer down-samples the output of each convolution kernel of the convolution layer, outputting one value per kernel that represents its convolution result, namely the maximum value of the convolution result matrix; the maximum thus replaces the whole matrix as the output.

The fusion layer then combines the data output by the pooling layers into a single array. For example, the 128 down-sampled values from the 3-character kernels, the 128 from the 4-character kernels, and the 128 from the 5-character kernels are combined into a one-dimensional array of 384 floating point values.

The fully connected layer receives this one-dimensional array and transforms it into n floating point values, where n is the preset number of intent categories. For example, if the electronic device can handle 20 user intents, such as music, weather, skylight, and seats, then 20 floating point values are output, each indicating the probability of one intent: the larger the value, the more likely the sentence corresponds to that intent, and the smaller the value, the less likely.
Finally, the output layer outputs the category number corresponding to the largest of the n floating point values produced by the fully connected layer. For example, if the floating point value for the intent "skylight operation" is the largest among the n values, the output layer outputs the preset "skylight operation" identifier 1, indicating that the intent of the current sentence S1 is "skylight operation".
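By way of illustration only, the intent recognition model described above can be related to code with the following minimal PyTorch sketch. The sequence length (70), embedding size (128), kernel heights (3/4/5) with 128 kernels each, and the 20 intent classes follow the numbers in the text; the vocabulary size and all identifiers are assumptions for illustration, not part of the patent.

```python
import torch
import torch.nn as nn

class IntentTextCNN(nn.Module):
    """Sketch of the intent recognition model of FIG. 2 (assumed hyper-parameters)."""
    def __init__(self, vocab_size=5000, embed_dim=128,
                 n_kernels=128, kernel_heights=(3, 4, 5), n_intents=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # character index -> 128-dim vector
        self.convs = nn.ModuleList([                        # e.g. a [3,128] kernel maps [70,128] -> [68,1]
            nn.Conv2d(1, n_kernels, (h, embed_dim)) for h in kernel_heights])
        self.fc = nn.Linear(n_kernels * len(kernel_heights), n_intents)

    def forward(self, idx):                      # idx: [batch, 70] integer character indices
        x = self.embed(idx).unsqueeze(1)         # [batch, 1, 70, 128]
        pooled = [c(x).squeeze(3).max(dim=2).values for c in self.convs]  # max-pool each kernel
        fused = torch.cat(pooled, dim=1)         # fusion layer: 3 * 128 = 384 values per sentence
        return self.fc(fused)                    # fully connected layer: one score per intent

logits = IntentTextCNN()(torch.randint(0, 5000, (1, 70)))
print(int(logits.argmax(dim=1)))                 # index of the most probable intent
```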
Fig. 3 is a schematic structural diagram of the entity recognition model used by the electronic device. Again taking the processing of sub-sentence S1 by the model of fig. 3 as an example, the sentence first passes through an input layer, which converts each character of the string "open the skylight a little" into a character index number and outputs an array of index numbers. The word embedding layer then expresses the meaning of each character with multi-dimensional floating point data.

Next, a bidirectional LSTM layer extracts the features corresponding to each character's floating point data; LSTM is short for Long Short-Term Memory network, which combines context information when processing a sentence to obtain more accurate information. Illustratively, with an LSTM in only one direction, the ordering information of the characters and words in the sentence is lost and "I love you" cannot be distinguished from "you love me"; when the model uses a bidirectional LSTM, a forward LSTM processes the sentence from front to back while a backward LSTM processes it from back to front, and combining the results of the two LSTMs captures the ordering relations of the characters and words in the sentence. The bidirectional LSTM layer outputs a matrix of size [70, 2*hidden_units], where 70 corresponds to the 70 characters converted by the input layer and the forward and backward LSTMs together yield twice hidden_units dimensions; hidden_units is the preset width of the bidirectional LSTM, for example 128.

A fully connected layer then processes the matrix produced by the bidirectional LSTM into a new matrix of size [70, output_dim], where output_dim is the number of NER (named entity recognition) results the entity recognition model can produce; for example, the NER results may be temperature, humidity, person name, and so on, each result corresponding to one value of output_dim.

Since the matrix output by the fully connected layer has size [70, output_dim] but each character ultimately receives exactly one tag, 70 characters yield 70 tags and the output format is a one-dimensional array of 70 elements. The decoding layer therefore sums the values along each path by means of Viterbi decoding and adds a transition matrix to obtain the value of the whole path, then outputs floating point values corresponding to the NER results, each value indicating the probability of one NER result: the larger the value, the more likely the character in the sentence corresponds to that result. Finally, the output layer outputs the word slot corresponding to the largest floating point value together with its NER result. For example, the NER result "temperature" can be obtained through the output layer for the word slot "25 degrees" contained in sentence S2.
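Again for illustration only, a minimal PyTorch sketch of such an entity recognition model is given below. The sequence length (70) and hidden width (128) follow the text; the Viterbi/CRF decoding layer described above is replaced by a per-character argmax for brevity, and the vocabulary size and label count are assumptions.

```python
import torch
import torch.nn as nn

class EntityBiLSTM(nn.Module):
    """Sketch of the entity recognition model of FIG. 3 (Viterbi decoding omitted)."""
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_units=128, output_dim=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_units,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_units, output_dim)  # [70, 2*128] -> [70, output_dim]

    def forward(self, idx):                     # idx: [batch, 70] integer character indices
        h, _ = self.bilstm(self.embed(idx))     # [batch, 70, 256]: forward + backward features
        return self.fc(h)                       # per-character scores over the NER label set

scores = EntityBiLSTM()(torch.randint(0, 5000, (1, 70)))
print(scores.argmax(dim=2).shape)               # one NER label per character: torch.Size([1, 70])
```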
In summary, in the manner shown in figs. 1 to 3, when analyzing a sentence the electronic device may divide it into several sub-sentences with a sentence-segmentation model, obtain the intent of each sub-sentence with an intent recognition model, and obtain the word slots in each sub-sentence with an entity recognition model. The electronic device can then execute the instructions in the user's sentence according to the intents and word slots produced by the three machine learning models.
However, with the above approach, the electronic device must host several machine learning models, each with a large data size, and each model must additionally store related reference data to assist recognition, so the models occupy a large amount of the device's storage space. Moreover, the device can only run the models one after another: each must finish its lengthy data extraction and computation before the next begins, so the total time needed for sentence processing is long, sentence processing is slow, and the models occupy the device's memory for a long time, greatly reducing the speed and efficiency of sentence processing.
Therefore, the present application provides a sentence processing method and apparatus to solve the technical problem that sentence processing by an electronic device is slow and inefficient because of the large number of machine learning models involved. The electronic device no longer needs to deploy several machine learning models, which reduces the amount of computation; it also no longer needs to store the data of several models, and the amount and duration of memory occupied by model execution are reduced, thereby improving the speed and efficiency with which the device processes sentences.
The technical solution of the present application is described in detail below with specific embodiments. The following embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some of them. For example, fig. 4 is a schematic flowchart of an embodiment of the sentence processing method provided by the present application; the method shown in fig. 4 may be applied where an electronic device processes a sentence and specifically includes:
s101: and acquiring a statement to be processed.
In S101, the electronic device first obtains the sentence to be processed; it then performs sentence segmentation, intent recognition, and Named Entity Recognition (NER) on it, segmenting the sentence into several sub-sentences according to the intents and named entities, determining the intent of each sub-sentence, and recognizing the named entities within each sub-sentence.
In some embodiments, the electronic device may be a vehicle-mounted device such as a vehicle-mounted terminal. In S101 the device may capture a sentence spoken aloud by the user through a sound collection device such as a microphone and process it subsequently; alternatively, the device may obtain a sentence entered by the user.
Illustratively, assume the sentence obtained in S101 is "open the skylight a little and set the air conditioner to 25 degrees". The purpose of segmenting it is to obtain the two sub-sentences "open the skylight a little" and "air conditioner to 25 degrees", whose intents are "skylight" and "air conditioner" respectively; the purpose of performing named entity recognition on it is to obtain the entity type "temperature" for the named entity "25 degrees" in the sub-sentence "air conditioner to 25 degrees". The types of the different entities in a sentence may be preset or pre-trained; for example, correspondences such as "25 degrees"-"temperature", "Zhang San"-"person name", and "volume"-"voice" may be specified.
S102: and extracting word combinations of all the single words and the plurality of continuous words in the sentence to be processed acquired in the S101.
Specifically, the electronic device determines every single character and every run of several consecutive characters in the whole sentence according to preset mask information, and records each single character or run of consecutive characters as one word combination of the sentence to be processed.
In some embodiments, a mask information set is preset according to the length of the sentence to be processed. The set contains all the preset mask information determined from the preset sentence length; each piece of mask information corresponds to one word combination in the sentence and comprises that combination's mask start position information and mask length information. The length of the sentence is made up of sequentially arranged character positions, each with a serial number, and each character position can be filled with one character to form the sentence. The mask information set is formed as follows. A character position of the sentence is taken as the mask start position, the length from that character position to the last character position is taken as the splittable length, and every length value smaller than the splittable length is taken as one of the several mask lengths corresponding to that character position. For example, with the bit as the minimum unit of information length, suppose the length of the sentence consists of 70 character positions and the first character position is taken as the mask start position: the length from the first to the last character position is 70 bits, which, taken as the splittable length, can be split into the 70 lengths 0, 1, 2, 3, ..., 69, and each length smaller than the 70-bit splittable length is used as a mask length and forms a piece of mask information together with the mask start position. In this manner all the character positions are traversed as mask start positions, each forming pieces of mask information with its several corresponding mask lengths, until all the preset mask information of the mask information set is formed. In general, when the preset sentence length is m character positions, the number of pieces of mask information in the set is (m+1)*m/2; accordingly, when the character positions of the sentence to be processed are filled with M characters, the number of word combinations of single characters and several consecutive characters obtained according to the mask information is (M+1)*M/2.
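The enumeration of mask information can be illustrated with a short Python sketch; this is an assumption of one reasonable indexing convention (0-based start positions, lengths counted from 1), not the patent's own notation:

```python
def build_mask_set(sentence_length):
    """Enumerate all (mask start, mask length) pairs for a sentence of the given
    length, one pair per contiguous word combination; for a length-m sentence
    this yields m*(m+1)/2 masks, matching the count stated above."""
    masks = []
    for start in range(sentence_length):                      # each character position in turn
        for length in range(1, sentence_length - start + 1):  # every splittable length
            masks.append((start, length))
    return masks

print(len(build_mask_set(12)))   # 12*13/2 = 78 combinations for a 12-character sentence
```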
Illustratively, taking the sentence to be processed as "open the skylight a little and set the air conditioner to 25 degrees", the word combinations of single characters or several consecutive characters determined in S102 can be represented by Table 1 below:
TABLE 1
[Table 1 appears as an image in the original publication. It lists the word combinations numbered 1-M extracted from the example sentence; its last column gives the label output for each combination.]
As shown in Table 1, the electronic device splits the sentence to be processed into the word combinations numbered 1-M according to the preset mask information, each combination consisting of one character or of several consecutive characters. The mask information may be the start position and length of each character in the sentence, a word combination being formed by a character together with the n characters that follow it, where n runs from 1 to N and N is the total number of characters after that character in the sentence. For example, the single-character combinations, taken in order, are the successive characters of the sentence (in the Chinese example, the characters for "sky", "window", "open", and so on). For combinations of several consecutive characters, take the first character as an example, with N = 11 for the 11 characters that follow it: when n = 1, the combination with sequence number 2, containing 2 consecutive characters, is "skylight", and so on until n reaches 11 and the combination with sequence number 12, containing 12 consecutive characters, is the whole sentence "open the skylight a little and set the air conditioner to 25 degrees". The second character is then taken as the start of the combinations, with N = 10, giving the combinations numbered 13-23. By analogy, M = 12+11+10+9+8+7+6+5+4+3+2+1 = 78 word combinations are obtained, each containing one character or several consecutive characters of the sentence to be processed.
In some embodiments, the mask information determining the several word combinations may be the start position and length of each single character or run of consecutive characters in the sentence to be processed. When determining the word combinations of Table 1 in S102, the electronic device may, for each combination, first determine its start position in the sentence from the mask start position information: for example, if the mask start positions are the first, second, and third character positions of the sentence, then the first, second, and third characters of the sentence are the characters at the start positions of the word combinations corresponding to those pieces of mask information. Combining the mask start position information with the mask length information then gives the end position: with mask start positions at the first, second, and third character positions and mask lengths of 2, 3, and 4, the end positions of the corresponding word combinations are the second, fourth, and sixth character positions of the sentence, and the characters between each start and end position form the corresponding word combination (for the first mask, the combination "skylight"). Finally, the electronic device takes the characters between each determined start position and end position as one word combination: the character between start position 1 and end position 1 is word combination 1, the characters between start position 1 and end position 2 form word combination 2 ("skylight"), and so on until all the word combinations are determined.
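As a sketch (again with 0-based start positions, whereas the text counts character positions from 1), extracting the word combination selected by one piece of mask information reduces to a slice whose end position is start + length - 1, inclusive. The English sentence and the two masks below are illustrative stand-ins:

```python
def extract_combination(sentence, start, length):
    """Return the word combination selected by one (start, length) mask."""
    return sentence[start:start + length]   # characters from start to start+length-1

sentence = "open the skylight a little"     # stand-in for the Chinese example sentence
for start, length in [(0, 4), (9, 8)]:      # assumed masks
    print(extract_combination(sentence, start, length))   # 'open', 'skylight'
```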
S103: all word combinations determined in S102 are input into the machine learning model, so that the machine learning model outputs tags of all word combinations. The labels output by the machine learning model comprise a first type label and a second type label, the first type label is used for indicating the intention of the corresponding word combination, and the second type label is used for indicating the named entity corresponding to the word combination.
Specifically, the machine learning model provided in the embodiments of the present application has the functions of segmenting the sentence to be processed into sub-sentences according to the labels it obtains and of obtaining the intents of the sub-sentences and the named entities within them. When the several word combinations obtained from the sentence are input into the machine learning model, it extracts the feature information of all the combinations in turn and determines and outputs the label corresponding to each combination according to that information, so that the labels indicate the sub-sentences of the sentence, their intents, and the named entities within them as determined by the model.
In some embodiments, since the machine learning model of the present application extracts the features of the sentence to be processed, the model may, following the manner shown in Table 1, determine the word feature vector corresponding to each character in each word combination to obtain the combination's feature matrix, then determine the mask information corresponding to the combination, splice the feature matrix and the mask information into the combination's feature information, and finally determine the combination's label from that information. Illustratively, in the example of Table 1, the labels the model outputs for the combinations "open the skylight a little" and "air conditioner to 25 degrees" are "segment1" and "segment2". Both are labels of the first type: "segment1" corresponds to intent 1, "skylight", and "segment2" to intent 2, "air conditioner". A label of the first type indicates that a run of consecutive characters in the sentence corresponds to one sub-sentence, i.e. the combinations corresponding to segment1 and segment2 are each taken as sub-sentences of the sentence to be processed. The label the model outputs for the combination "25 degrees" is "temperature", a label of the second type, indicating that the combination's named entity category is temperature, i.e. the corresponding sub-sentence contains a named entity of type "temperature".
S104: the electronic equipment determines at least two sub-sentences in the sentence to be processed, the intention of each sub-sentence and named entities included in each sub-sentence according to the labels of all word combinations output by the machine learning model.
Specifically, the labels of the first type include intention labels and non-intention labels, and the labels of the second type include entity labels and non-entity labels. After the machine learning model produces the labels of all word combinations shown in the last column of Table 1, the electronic device can determine the sub-sentences of the sentence from them. When a combination's output label is the intention label "segment1" of the first type, the combination is determined to be one sub-sentence of the sentence to be processed, with the intent "skylight" corresponding to that intention label; when a combination's output label is the intention label "segment2", it is determined to be another sub-sentence, with the intent "air conditioner"; and when a combination's output label is determined to be the non-intention label "0" of the first type, the combination is not a sub-sentence of the sentence to be processed.
When a combination's output label is the entity label "temperature" of the second type, the combination is determined to be a named entity in the sub-sentence "air conditioner to 25 degrees" of the sentence to be processed; and when a combination's output label is determined to be the non-entity label "0" of the second type, the combination is not a named entity.
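A toy Python sketch of this decoding step (S104) follows; the label names come from Table 1, while the data structures and the intent mapping are illustrative assumptions:

```python
# Hypothetical (combination, label) pairs as the machine learning model might output them:
labelled = [
    ("open the skylight a little", "segment1"),     # first-type intention label
    ("air conditioner to 25 degrees", "segment2"),  # first-type intention label
    ("25 degrees", "temperature"),                  # second-type entity label
    ("the skylight", "0"),                          # "0": neither sub-sentence nor entity
]
INTENT_OF = {"segment1": "skylight", "segment2": "air conditioner"}  # assumed mapping

sub_sentences = [(c, INTENT_OF[lab]) for c, lab in labelled if lab in INTENT_OF]
entities = [(c, lab) for c, lab in labelled if lab not in INTENT_OF and lab != "0"]
print(sub_sentences)  # sub-sentences with their intents
print(entities)       # [('25 degrees', 'temperature')]
```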
It should be noted that the above sentences are merely examples; in practical applications the number of sub-sentences and named entities the machine learning model can determine is not limited, and the intent categories of the sub-sentences (skylight, air conditioner, seats, etc.) and the specific categories of the named entities (temperature, speed, height, etc.) may be customized, which the embodiments of the present application do not restrict.
In summary, with the sentence processing method provided by this embodiment, after the electronic device extracts all the word combinations of the sentence to be processed and inputs them into one machine learning model, that single model simultaneously segments the sentence into sub-sentences and obtains the intent of each sub-sentence and the named entities within it. Because one machine learning model performs segmentation, intent recognition, and named entity recognition together, the number of models the electronic device hosts is reduced, reducing the storage space the models occupy. And because the model performs segmentation and named entity recognition in parallel, rather than, as in the prior art, performing intent recognition only after a separate segmentation model has finished, the device's sentence-processing time is reduced, the amount and duration of memory occupied by sentence processing are reduced, and the speed and efficiency with which the electronic device processes sentences are improved.
In some embodiments, the machine learning model used in the embodiments of the present application may be trained in advance. During training, each training sentence may be split into several word combinations according to the mask information and input into the model, with the intents of the sub-sentences among those combinations and the categories of the named entities annotated in advance. For example, training may use sentences such as "close the skylight a little and set the air conditioner to 30 degrees", "speed up a little and then turn right", and "play singer A's song B a bit louder". Each training sentence is split into word combinations according to the mask information and annotated with labels: in "play singer A's song B a bit louder", the sub-sentence "play singer A's song B" is annotated with a first-type intention label whose intent is "song"; the sub-sentence "a bit louder" with a first-type intention label whose intent is "volume"; the character "A" with a second-type entity label whose named entity category is "singer"; the character "B" with a second-type entity label whose category is "song"; and the remaining characters with the label 0. The annotated combinations are then fed to the machine learning model for learning and training. Having learned the features of the training sentences, the model, on receiving the several word combinations of a sentence to be processed, extracts the same features, judges the labels of the sentence by comparing its features for similarity with those learned from the training sentences, and outputs the corresponding labels shown in Table 1.
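How such training samples might be assembled can be sketched as follows; the annotation format (a map from (start, length) masks to labels) is an assumption for illustration:

```python
def label_combinations(sentence, annotations):
    """Pair every contiguous word combination of a training sentence with its
    annotated label, using "0" where no annotation applies."""
    samples = []
    m = len(sentence)
    for start in range(m):
        for length in range(1, m - start + 1):
            combo = sentence[start:start + length]
            samples.append((combo, annotations.get((start, length), "0")))
    return samples

# Toy 4-character "sentence" with one intention label and one entity label:
samples = label_combinations("abcd", {(0, 2): "segment1", (2, 2): "temperature"})
print(samples[:3])   # [('a', '0'), ('ab', 'segment1'), ('abc', '0')]
```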
A specific structure of the machine learning model provided by the present application is described below with reference to the drawings. Fig. 5 is a schematic structural diagram of an embodiment of the machine learning model provided by the present application; the model shown in fig. 5 specifically comprises, in order:
and the input layer is used for converting each character in a plurality of character combinations in the sentence to be processed into a corresponding character index number, so that one character combination can be converted into an array comprising one or more index numbers. The length of the array may be predetermined, for example 70, and the output of the input layer is an array of 70 elements, each element being an index whose integer value represents a word. The embodiment is described by taking the length of 70 as an example, but not limited thereto.
A word embedding layer, which determines the word feature vector of each character in a word combination from the character index numbers produced by the input layer; the feature vectors of all the characters of the combination form the combination's feature matrix. After receiving the [70,1] array of index numbers, the word embedding layer determines each character's feature vector from its index number by table lookup or by more complex feature extraction such as BERT; each word feature vector represents the features of one character, so the layer outputs a word feature vector matrix of size [70,H], where H is the feature dimension of each character, for example H = 200.
A first fusion layer, which fuses the feature matrix of a word combination with its corresponding mask information to obtain a first matrix serving as the combination's feature information. The mask information comprises the start position and length of the combination within the sentence to be processed; their specific values can be chosen as in the example of Table 1. The preset mask start positions of the M rows of Table 1 can then be represented by a matrix of size [M,1], and the mask lengths by another matrix of size [M,1]; after fusion, a first matrix of size [M,70,H+2] is obtained.
A convolution layer, which computes sentence vector features of the word combinations from the first matrix to obtain a second matrix indicating the combinations' sentence features. The convolution layer may convolve the first matrix with several convolution kernels to obtain the sentence feature vector output by each kernel; the vectors may be extracted with natural language processing (NLP) techniques, and the sentence feature information obtained by the layer can be indicated by a second matrix of size [M,70,H+2]. The input of the convolution layer is the output matrix of the first fusion layer and its output is also a matrix whose size depends on the kernel size: for example, with a kernel of size [3,3], the matrix output after convolution is [M,70,H+2]. The present application does not limit the specific implementation of the convolution; it is only required that the sentence to be processed is convolved in the same way as the training sentences so that the features of the two can subsequently be compared. In some embodiments, the convolution layer serves to extract additional feature information, in particular context information between the preceding and following characters of a run of consecutive characters; when the word embedding layer already accounts for a character's context, the convolution layer may be omitted, and the matrix output by the mask fusion layer is sent directly to the subsequent pooling layer for processing.
A pooling layer, which down-samples the [M,70,H+2] second matrix output by the convolution layer to obtain a third matrix of size [M,1,H+2] indicating the sentence feature information of the word combinations. The purpose of pooling is to discard the unimportant features among those extracted by the convolution kernels; down-sampling may, for example, find the maximum value in the second matrix and replace the whole matrix by it, with each convolution kernel of the convolution layer followed by a pooling layer whose output is one value representing that kernel's convolution result. In this model, the input of the pooling layer is the [M,70,H+2] second matrix, the middle dimension of 70 is average-pooled, and the output becomes the [M,1,H+2] third matrix.
The full connection layer maps the word combinations of the third matrix onto the dimensions of a plurality of preset labels to obtain a fifth matrix with the size of [M, N]. Each of the N preset labels indicates a label annotated for a word combination obtained by dividing the training sentence according to the mask information, and the preset labels include labels of the first type and labels of the second type. For example, the preset labels include at least: the first-type label 'segment1', indicating that the text is a sub-sentence whose intention concerns the skylight; the first-type label 'segment2', indicating that the text is a sub-sentence whose intention concerns the air conditioner; the second-type label 'temperature', indicating that a named entity in the text is of the temperature type; and the second-type label 'name', indicating that a named entity in the text is of the name type. These labels are manually annotated content acquired by the machine learning network when it learns the training sentences; after the machine learning model has learned the features through training, it can analyze sentences to be processed in the same feature-processing manner as the annotated labels. In some embodiments (the Fig. 6 structure described below), the full connection layer obtains the fifth matrix Y from the fourth matrix X by the formula Y = X * W + B, where W is a weight matrix with the size [H+3, N] and B is a bias matrix given by a one-dimensional array of size [N]; W and B can be preset or obtained by training on the training sentences.
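The formula Y = X * W + B is a plain affine map; a sketch with the Fig. 6-variant sizes follows (M, N, and the random parameters are illustrative placeholders; W and B would come from training):

    import numpy as np

    M, H, N = 6, 200, 16                                # illustrative sizes
    X = np.random.rand(M, H + 3).astype(np.float32)     # fourth matrix (Fig. 6 variant)
    W = np.random.rand(H + 3, N).astype(np.float32)     # weight matrix, learned in practice
    B = np.random.rand(N).astype(np.float32)            # bias array, learned in practice

    Y = X @ W + B                                       # fifth matrix, size [M, N]
    assert Y.shape == (M, N)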
The Softmax layer normalizes the value corresponding to each preset label in the fifth matrix to obtain a sixth matrix indicating the probability that the word combination maps to each preset label. Each of the M word combinations in the fifth matrix corresponds to N floating-point values, each floating-point value indicating the score of that word combination for one label. For example, if the word combination '25 degrees' corresponds to the N floating-point values C0, C1, C2, ..., CN-1 in the fifth matrix, the Softmax layer normalizes these N values so that they sum to 1, yielding a sixth matrix of unchanged size [M, N].
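The normalization is the standard row-wise softmax; subtracting the row maximum first is the usual numerical-stability step and does not change the result:

    import numpy as np

    def softmax_rows(fifth):
        # Fifth matrix [M, N] of label scores  ->  sixth matrix [M, N] of probabilities
        e = np.exp(fifth - fifth.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)   # each row now sums to 1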
The output layer outputs the label with the highest probability as the label of the word combination; that is, according to the [M, N] sixth matrix, for each of the M word combinations it outputs the label whose floating-point value is the largest among the N labels, as shown in the last column of Table 1. For example, the sub-sentence 'open the skylight a little' outputs the 'segment1' label corresponding to the largest of its N floating-point values, the sub-sentence 'turn the air conditioner to 25 degrees' outputs the 'segment2' label, the word combination '25 degrees' outputs the 'temperature' label, and word combinations that do not correspond to any label may output 0.
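The output layer then reduces to a per-row argmax over the sixth matrix; the label names and probability values below simply mirror the running example:

    import numpy as np

    labels = ["segment1", "segment2", "temperature", "name"]   # illustrative subset of the N labels
    sixth = np.array([[0.7, 0.1, 0.1, 0.1],    # 'open the skylight a little'
                      [0.1, 0.6, 0.2, 0.1],    # 'turn the air conditioner to 25 degrees'
                      [0.1, 0.1, 0.7, 0.1]])   # '25 degrees'
    print([labels[i] for i in sixth.argmax(axis=1)])   # ['segment1', 'segment2', 'temperature']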
In some embodiments, fig. 6 is a schematic structural diagram of another embodiment of the machine learning model provided in the present application. On the basis of the embodiment shown in fig. 5, the machine learning model shown in fig. 6 further includes, between the pooling layer and the full connection layer, a second fusion layer. The second fusion layer fuses the [M,1, H+2] third matrix with the mask lengths in the mask information once more to obtain a fourth matrix: the mask lengths can be expressed as a matrix with the size of [M,1], which is fused with the third matrix to output a fourth matrix with the size of [M, H+3]. Accordingly, in fig. 6 the full connection layer maps the fourth matrix with the size of [M, H+3] onto the dimensions of the plurality of preset labels, yielding the fifth matrix with the size of [M, N]. The other layers in fig. 6 are the same as those shown in fig. 5 and are not described again.
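A sketch of this second fusion, again assuming the fusion is a concatenation: squeeze the third matrix to [M, H + 2], then append the [M, 1] mask lengths.

    import numpy as np

    def fuse_length(third, lengths):
        # third: [M, 1, H + 2]; lengths: [M, 1]  ->  fourth matrix [M, H + 3]
        flat = third.reshape(third.shape[0], -1)          # drop the singleton middle axis
        return np.concatenate([flat, lengths], axis=1)    # append mask length as one more feature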
In some embodiments, after obtaining the sentence to be processed, the electronic device may determine the number of words it contains. If the number of words meets a preset condition, for example is greater than 10, the sentence to be processed is likely to contain a plurality of sub-sentences, so the electronic device divides it into a plurality of word combinations and inputs them into the machine learning model in the manner shown in fig. 4, performing sentence breaking, intention recognition, and the like in a single pass. Otherwise, if the number of words is small, for example less than 5, the sentence likely corresponds to a single sub-sentence, and the intention and named entity can be recognized directly with the intention recognition model and the entity recognition model shown in fig. 2, without sentence breaking. Alternatively, the more words a sentence contains, the higher its priority: when the electronic device is processing a plurality of sentences at the same time, the received sentence with the largest number of words is divided into word combinations and input into the machine learning model of fig. 4 first, and the remaining sentences are processed in descending order of word count.
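A sketch of this dispatch heuristic, using the example thresholds 10 and 5 from the text; the threshold values and return tags are placeholders, and the borderline range of 5 to 10 words, which the text leaves unspecified, is routed to the joint model here:

    def dispatch(sentence, long_threshold=10, short_threshold=5):
        n = len(sentence)                 # number of words in the sentence to be processed
        if n > long_threshold:
            return "joint-model"          # fig. 4 path: split into word combinations, one pass
        if n < short_threshold:
            return "intent+entity"        # fig. 2 path: no sentence breaking needed
        return "joint-model"              # unspecified middle range; either path may apply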
The foregoing embodiments describe the sentence processing method provided by the embodiments of the present application. To implement each function of this method, the electronic device serving as the execution subject may include a hardware structure and/or a software module, and implement each function in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is implemented as a hardware structure, a software module, or a hardware structure plus a software module depends on the particular application and the design constraints imposed on the technical solution.
For example, the present application also provides a sentence processing apparatus comprising an acquisition module, an extraction module, an input module, and a determination module. The acquisition module is used for acquiring the sentence to be processed. The extraction module is used for extracting all word combinations of single words or of a plurality of consecutive words from the sentence to be processed. The input module is used for inputting all the word combinations into the machine learning model; the machine learning model is used for extracting the feature information of all the word combinations in the sentence to be processed and determining the labels of all the word combinations according to the feature information, where the labels comprise labels of a first type, used for indicating the intention of a word combination, and labels of a second type, used for indicating a named entity included in a word or in a plurality of consecutive words. The determination module is used for determining the sub-sentences in the sentence to be processed, the intentions of the sub-sentences, and the named entities of the sub-sentences according to the labels of all the word combinations output by the machine learning model.
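A structural sketch of the four modules, assuming the apparatus is realized in software; the model object, its output format, and the "none" placeholder tag are illustrative:

    class SentenceProcessor:
        def __init__(self, model):
            self.model = model                           # the trained machine learning model

        def acquire(self, raw):                          # acquisition module
            return raw.strip()

        def extract(self, sentence):                     # extraction module: every contiguous span
            n = len(sentence)
            return [sentence[i:j] for i in range(n) for j in range(i + 1, n + 1)]

        def run(self, raw):                              # input module + determination module
            sentence = self.acquire(raw)
            combos = self.extract(sentence)
            tags = self.model(combos)                    # one tag per word combination
            return [(c, t) for c, t in zip(combos, tags) if t != "none"]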
Specifically, for the specific principles and implementation of the above steps executed by each module of the sentence processing apparatus, and for the specific structure of the machine learning model, reference may be made to the description of the sentence processing method in the foregoing embodiments of the present application, which is not repeated here.
It should be noted that the division of the above apparatus into modules is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. These modules may all be implemented in the form of software called by a processing element, or all in hardware, or some in software called by a processing element and some in hardware. The processing element may be a separate processing element or may be integrated into a chip of the apparatus; alternatively, a module may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus may call and execute its functions, as with the determination module above; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each module above, may be completed by an integrated logic circuit in hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions; when the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another via a wired connection (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)), among others.
The present application further provides an electronic device comprising: a processor and a memory; wherein the memory has stored therein a computer program, and when the processor executes the computer program, the processor is operable to execute the sentence processing method as in any of the preceding embodiments of the present application.
The present application also provides a computer-readable storage medium storing a computer program which, when executed, is operable to perform a sentence processing method as in any of the previous embodiments of the present application.
The embodiments of the present application further provide a chip for executing instructions, where the chip is used to execute the sentence processing method executed by the electronic device in any of the embodiments described above in the present application.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A sentence processing method, comprising:
obtaining a statement to be processed;
extracting all single characters or word combinations of a plurality of continuous characters of the sentence to be processed;
inputting all the word combinations into a machine learning model; the machine learning model is used for extracting feature information of all word combinations in the sentence to be processed and determining labels of all the word combinations according to the feature information; the tags include tags of a first type for indicating the intention of a word combination and tags of a second type for indicating a named entity included in a single word or a plurality of consecutive words;
and determining the sub-sentences in the sentence to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels of all word combinations output by the machine learning model.
2. The method of claim 1, wherein the first type of tag comprises a non-intention tag, and an intention tag;
the determining, according to the label output by the machine learning model, a sub-sentence in the sentence to be processed, an intention of the sub-sentence, and a named entity of the sub-sentence includes:
determining the word combination corresponding to the intention label as a sub-sentence in the sentence to be processed, and determining the intention corresponding to the intention label as the intention of the sub-sentence.
3. The method of claim 2, wherein the second type of tag comprises a non-entity tag, and an entity tag;
the determining, according to the label output by the machine learning model, a sub-sentence in the sentence to be processed, an intention of the sub-sentence, and a named entity of the sub-sentence includes:
and determining the word combination corresponding to the entity label as a named entity of the sub-statement in the statement to be processed.
4. The method according to any one of claims 1 to 3, wherein the extracting feature information of all the word combinations in the sentence to be processed comprises:
for each word combination, determining a feature matrix of the word combination according to a word feature vector corresponding to each word in the word combination; each word has a corresponding word feature vector, and the word feature vector combination of each word in the word combination forms a feature matrix of the word combination;
determining mask information corresponding to the word combination; the mask information comprises mask length information and mask starting position information;
and splicing the feature matrix and the mask information of the word combination into the feature information of the word combination.
5. The method according to any one of claims 1 to 3, wherein extracting all single words or word combinations of a plurality of consecutive words of the sentence to be processed comprises:
presetting a mask information set according to the length of the sentence to be processed, wherein the mask information set comprises all preset mask information determined according to a preset sentence length, and each piece of mask information corresponds to a word combination in the sentence to be processed;
traversing all mask information in a mask information set, and obtaining word combinations corresponding to all the mask information as word combinations of all single words or a plurality of continuous words of the statement to be processed;
for each word combination, determining the starting position of the word combination from the statement to be processed according to mask starting position information in the mask information; determining the ending position of the word combination from the statement to be processed according to the mask length information and the mask starting position information of the mask information; and determining the characters between the starting position and the ending position in the sentence to be processed as the word combination.
6. The method of claim 5, wherein the forming of all preset mask information in the mask information set comprises:
acquiring the length of a statement to be processed, wherein the length of the statement to be processed is formed by each character bit;
taking each character bit as a mask initial position, and forming a plurality of mask information by a plurality of mask lengths respectively corresponding to the character bit; the mask lengths corresponding to the character positions are all lengths smaller than the detachable length, and the detachable length is the length from the character position to the tail character position;
traversing all the character bits as mask initial positions, and respectively forming all mask information in a mask information forming mask information set by a plurality of mask lengths corresponding to the character bits.
7. The method of any of claims 1-3, wherein the structure of the machine learning model comprises:
the input layer is used for converting each word in the word combination into a corresponding word index number;
the word embedding layer is used for determining a word characteristic vector of each word in the word combination according to the word index number, and the word characteristic vectors of all the words in the word combination form a characteristic matrix of the word combination;
a first fusion layer for fusing the feature matrix of the word combination with the mask information corresponding to the word combination to obtain a first matrix as the feature information of the word combination;
the convolution layer calculates the sentence vector characteristics of the word combination through the first matrix to obtain a second matrix used for indicating the sentence characteristic information of the word combination;
the pooling layer is used for carrying out downsampling processing on the second matrix to obtain a third matrix of sentence characteristic information used for indicating word combinations;
the full connection layer is used for mapping the third matrix to the dimensionality of a plurality of preset labels to obtain a fifth matrix, wherein the preset labels comprise a first type label and a second type label;
the Softmax layer is used for carrying out normalization processing on the numerical values in the fifth matrix to obtain a sixth matrix used for indicating the probability that the word combination is mapped to each preset label;
and the output layer determines the label with the highest probability in the sixth matrix as the label of the word combination.
8. The method of claim 7, wherein the structure of the machine learning model further comprises: and the second fusion layer fuses the third matrix obtained by the pooling layer and the mask information to obtain a fourth matrix used for indicating sentence characteristic information of word combinations, so that the fourth matrix is mapped to the dimensionality of a plurality of preset labels by the full-connection layer to obtain a fifth matrix.
9. A sentence processing apparatus, comprising:
the acquisition module is used for acquiring the statement to be processed;
the extraction module is used for extracting all single characters or word combinations of a plurality of continuous characters of the sentence to be processed;
the input module is used for inputting all the word combinations into a machine learning model; the machine learning model is used for extracting feature information of all word combinations in the sentence to be processed and determining labels of all the word combinations according to the feature information; the tags include tags of a first type for indicating the intention of a word combination and tags of a second type for indicating a named entity included in a single word or a plurality of consecutive words;
and the determining module is used for determining the sub-sentences in the sentences to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels of all word combinations output by the machine learning model.
10. An electronic device, comprising: a memory and a processor; the memory stores computer-executable instructions; the processor executing the computer-executable instructions stored by the memory causes the processor to perform the statement processing method of any of claims 1 to 8.
CN202111042496.5A 2021-09-07 2021-09-07 Statement processing method and device Active CN113743120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111042496.5A CN113743120B (en) 2021-09-07 2021-09-07 Statement processing method and device


Publications (2)

Publication Number Publication Date
CN113743120A true CN113743120A (en) 2021-12-03
CN113743120B CN113743120B (en) 2023-07-11

Family

ID=78736492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111042496.5A Active CN113743120B (en) 2021-09-07 2021-09-07 Statement processing method and device

Country Status (1)

Country Link
CN (1) CN113743120B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125331A (en) * 2019-12-20 2020-05-08 京东方科技集团股份有限公司 Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN111192692A (en) * 2020-01-02 2020-05-22 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
WO2020232882A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method and apparatus, device, and computer readable storage medium
CN112016300A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium
CN112257421A (en) * 2020-12-21 2021-01-22 完美世界(北京)软件科技发展有限公司 Nested entity data identification method and device and electronic equipment


Also Published As

Publication number Publication date
CN113743120B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN111583909B (en) Voice recognition method, device, equipment and storage medium
CN110363049B (en) Method and device for detecting, identifying and determining categories of graphic elements
CN113220839B (en) Intention identification method, electronic equipment and computer readable storage medium
CN112507704B (en) Multi-intention recognition method, device, equipment and storage medium
CN111695345A (en) Method and device for recognizing entity in text
CN109461438B (en) Voice recognition method, device, equipment and storage medium
CN111291566A (en) Event subject identification method and device and storage medium
CN113158687B (en) Semantic disambiguation method and device, storage medium and electronic device
CN111177186A (en) Question retrieval-based single sentence intention identification method, device and system
CN111967264A (en) Named entity identification method
CN111737990B (en) Word slot filling method, device, equipment and storage medium
CN116312480A (en) Voice recognition method, device, equipment and readable storage medium
CN114420102B (en) Method and device for speech sentence-breaking, electronic equipment and storage medium
CN113723138A (en) Method for generating structured information, information generating equipment and storage medium
CN112560506B (en) Text semantic analysis method, device, terminal equipment and storage medium
CN116186330B (en) Video deduplication method and device based on multi-mode learning
CN110390015B (en) Data information processing method, device and system
CN113743120B (en) Statement processing method and device
CN113553847A (en) Method, device, system and storage medium for parsing address text
CN111353295A (en) Sequence labeling method and device, storage medium and computer equipment
CN113724698B (en) Training method, device, equipment and storage medium of voice recognition model
CN115294974A (en) Voice recognition method, device, equipment and storage medium
CN114912513A (en) Model training method, information identification method and device
CN114548093A (en) Natural language processing method, device, equipment, medium and program product
CN111831823B (en) Corpus generation and model training method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220413

Address after: 430051 No. b1336, chuanggu startup area, taizihu cultural Digital Creative Industry Park, No. 18, Shenlong Avenue, Wuhan Economic and Technological Development Zone, Wuhan, Hubei Province

Applicant after: Yikatong (Hubei) Technology Co.,Ltd.

Address before: 430056 building B, building 7, Qidi Xiexin science and Innovation Park, South Taizi Lake innovation Valley, Wuhan Economic and Technological Development Zone, Wuhan City, Hubei Province (qdxx-f7b)

Applicant before: HUBEI ECARX TECHNOLOGY Co.,Ltd.

GR01 Patent grant