CN113743120B - Statement processing method and device - Google Patents

Statement processing method and device

Info

Publication number
CN113743120B
CN113743120B (application CN202111042496.5A)
Authority
CN
China
Prior art keywords
word
sentence
processed
mask
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111042496.5A
Other languages
Chinese (zh)
Other versions
CN113743120A (en)
Inventor
李林峰 (Li Linfeng)
黄海荣 (Huang Hairong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ecarx Hubei Tech Co Ltd
Original Assignee
Ecarx Hubei Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ecarx Hubei Tech Co Ltd filed Critical Ecarx Hubei Tech Co Ltd
Priority to CN202111042496.5A priority Critical patent/CN113743120B/en
Publication of CN113743120A publication Critical patent/CN113743120A/en
Application granted granted Critical
Publication of CN113743120B publication Critical patent/CN113743120B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

In the sentence processing method and apparatus provided by the application, the word combinations in a sentence to be processed are extracted and input into a single machine learning model. The machine learning model then simultaneously performs sentence breaking on the sentence to be processed, identifies the intention of each sub-sentence, and recognizes the named entities in the sub-sentences. This reduces the number of machine learning models deployed on the electronic device and the storage space they occupy, shortens the time the electronic device spends processing a sentence, and reduces the amount and duration of memory occupied by sentence processing, thereby improving the speed and efficiency with which the electronic device processes sentences.

Description

Statement processing method and device
Technical Field
The present invention relates to the field of natural language processing (NLP), and in particular to a sentence processing method and apparatus.
Background
With the continuous development of technology, electronic devices used in daily life, such as vehicle-mounted terminals, mobile phones and computers, provide voice interaction, so that a user who is driving, working or otherwise occupied can control the electronic device by speaking and instruct it to execute corresponding instructions. For the electronic device, accurately identifying the user's intention from the collected sentence is what ensures that subsequent execution is correct.
In the prior art, to identify a collected user sentence, an electronic device is generally provided with at least three machine learning models that process the sentence in turn. For example, after a sentence is collected, a sentence-breaking model first divides it into a plurality of sub-sentence portions, an intention recognition model then identifies the intention of each sub-sentence portion, and an entity recognition model determines the word slots in each sub-sentence; finally, the instruction corresponding to the user sentence can be executed according to the intentions and word slots of the sub-sentence portions obtained by the three machine learning models.
With the prior art, the electronic device must deploy a relatively large number of machine learning models, so many computation steps are required during sentence processing, which reduces the speed and efficiency of sentence processing.
Disclosure of Invention
The application provides a sentence processing method and apparatus to solve the technical problem that the large number of machine learning models used by an electronic device during sentence processing leads to slow processing and low efficiency.
The first aspect of the present application provides a sentence processing method, including: acquiring a sentence to be processed; extracting all word combinations of the sentence to be processed, each consisting of a single character or a plurality of continuous characters; and inputting all the word combinations into a machine learning model, where the machine learning model is used to extract the feature information of all the word combinations in the sentence to be processed and determine the label of each word combination according to the feature information; the labels include a first type of tag for indicating the intention of a word combination and a second type of tag for indicating the named entities included in the word combinations; and determining the sub-sentences in the sentence to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels of all the word combinations output by the machine learning model.
In an embodiment of the first aspect of the present application, the first type of tag includes a non-intention tag and an intention tag; determining the sub-sentences in the sentence to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels output by the machine learning model includes: determining the word combination corresponding to an intention label as a sub-sentence in the sentence to be processed, and determining the intention corresponding to that intention label as the intention of the sub-sentence.
In an embodiment of the first aspect of the present application, the second type of tag includes a non-entity tag and an entity tag; determining the sub-sentences in the sentence to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels output by the machine learning model includes: determining the word combinations corresponding to entity labels as the named entities of sub-sentences in the sentence to be processed.
In an embodiment of the first aspect of the present application, the extracting feature information of all the word combinations in the sentence to be processed includes: for each word combination, determining a feature matrix of the word combination according to a word feature vector corresponding to each word in the word combination; each word has a corresponding word feature vector, and the word feature vector of each word in the word combination is combined to form a feature matrix of the word combination; determining mask information corresponding to the word combinations; wherein the mask information includes mask length information and mask start position information; and splicing the feature matrix and the mask information of the word combinations into feature information of the word combinations.
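As a minimal sketch of the splicing step described above (the vector size, the toy character-feature lookup, and all names are illustrative assumptions, not from the patent), the feature information of one word combination could be assembled from its per-character feature vectors plus its mask length and mask start position:

```python
# Hypothetical sketch: build the feature information of one word combination.
# EMBED_DIM and char_vector are stand-ins for a trained embedding table.

EMBED_DIM = 4  # illustrative; the patent's worked example uses 128-dimensional vectors

def char_vector(ch):
    """Toy stand-in for the pretrained feature vector of one character."""
    return [float(ord(ch) + d) for d in range(EMBED_DIM)]

def feature_info(combination, mask_start, mask_length):
    # Feature matrix: one row of character features per character in the combination.
    feature_matrix = [char_vector(ch) for ch in combination]
    # Mask information: mask length and mask start position, spliced alongside.
    mask_info = [float(mask_length), float(mask_start)]
    return feature_matrix, mask_info

matrix, mask = feature_info("abc", mask_start=0, mask_length=3)
print(len(matrix), len(matrix[0]), mask)  # 3 rows of EMBED_DIM features, plus the mask fields
```

The sketch keeps the feature matrix and the mask information as two parts; how they are concatenated into one tensor is a design choice of the first fusion layer.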
In an embodiment of the first aspect of the present application, the extracting all word combinations of single text or a plurality of continuous text of the sentence to be processed includes: presetting a mask information set according to the length of the sentence to be processed, wherein the mask information set comprises all preset mask information determined according to the length of the preset sentence, and each mask information corresponds to a word combination in the sentence to be processed; traversing all mask information in the mask information set to obtain word combinations corresponding to all mask information as word combinations of all single words or a plurality of continuous words of the sentence to be processed; for each word combination, determining the starting position of the word combination from the sentence to be processed according to mask starting position information in the mask information; determining the ending position of the word combination from the sentence to be processed according to mask length information and mask starting position information of mask information; and determining the characters between the starting position and the ending position in the sentence to be processed as one word combination.
In an embodiment of the first aspect of the present application, forming all the preset mask information in the mask information set includes: acquiring the length of the sentence to be processed, the length being made up of character positions; taking each character position as a mask starting position and forming a plurality of pieces of mask information from the plurality of mask lengths corresponding to that character position, where each mask length corresponding to a character position is no greater than the remaining length from that character position to the last character position; and traversing all character positions as mask starting positions, forming mask information from the mask lengths corresponding to each, to obtain all the mask information in the mask information set.
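Under one reading of the scheme above (function and variable names are ours), the mask information set for a sentence of length n enumerates every (start, length) pair whose span stays inside the sentence, and each pair selects exactly one word combination:

```python
def build_mask_set(sentence_len):
    """All (mask_start, mask_length) pairs whose span fits in the sentence."""
    masks = []
    for start in range(sentence_len):                    # each character position
        for length in range(1, sentence_len - start + 1):  # up to the remaining length
            masks.append((start, length))
    return masks

def combination_for_mask(sentence, mask):
    """Select the word combination between the start and end positions."""
    start, length = mask
    return sentence[start:start + length]

sentence = "abcd"
masks = build_mask_set(len(sentence))
combos = [combination_for_mask(sentence, m) for m in masks]
print(len(masks))  # n*(n+1)/2 spans for a sentence of length n
```

For a 4-character sentence this yields 10 spans, covering every single character and every run of consecutive characters exactly once.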
In a first embodiment of the first aspect of the present application, the structure of the machine learning model includes: an input layer for converting each word in the word combination into a corresponding word index number; the word embedding layer is used for determining the word feature vector of each word in the word combination according to the word index number, and the word feature vectors of all words in the word combination form a feature matrix of the word combination; the first fusion layer fuses the feature matrix of the word combination with mask information corresponding to the word combination to obtain a first matrix serving as the feature information of the word combination; the convolution layer calculates sentence vector features of the word combinations through the first matrix to obtain a second matrix for indicating sentence feature information of the word combinations; the pooling layer is used for carrying out downsampling treatment on the second matrix to obtain a third matrix for indicating sentence characteristic information of word combinations; the full-connection layer maps the third matrix to the dimensions of a plurality of preset labels to obtain a fifth matrix, wherein the preset labels comprise a first type of label and a second type of label; the Softmax layer normalizes the numerical values in the fifth matrix to obtain a sixth matrix for indicating the probability that the word combinations are mapped to each preset label; and the output layer is used for determining the label with the highest probability in the sixth matrix as the label of the word combination. 
In an embodiment of the first aspect of the present application, after the obtaining the statement to be processed, the method further includes: when the number of characters of the sentence to be processed accords with a preset condition, inputting the sentence to be processed into the machine learning model; or, inputting the sentences to be processed into the machine learning model, wherein the number of the sentences to be processed is larger than a preset threshold value; or when the sentence to be processed is acquired under the preset condition, inputting the sentence to be processed into the machine learning model.
In an embodiment of the first aspect of the present application, the structure of the machine learning model further includes: and the second fusion layer is used for fusing the third matrix obtained by the pooling layer with the mask information to obtain a fourth matrix for indicating sentence characteristic information of word combinations, so that the full-connection layer maps the fourth matrix to the dimensions of a plurality of preset labels to obtain a fifth matrix.
A second aspect of the present application provides a sentence processing apparatus, including: the acquisition module is used for acquiring the statement to be processed; the extraction module is used for extracting word combinations of all single characters or a plurality of continuous characters of the sentence to be processed; the input module is used for inputting all the word combinations into the machine learning model; the machine learning model is used for extracting characteristic information of all word combinations in the sentence to be processed and determining labels of all the word combinations according to the characteristic information; the tags include a first type of tag for indicating an intention of the word combination and a second type of tag for indicating a named entity included in the plurality of word combinations; and the determining module is used for determining the sub-sentences in the sentences to be processed, the intention of the sub-sentences and the named entity of the sub-sentences according to the labels of all word combinations output by the machine learning model.
A third aspect of the present application provides an electronic device, comprising: a memory and a processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory, causing the processor to perform the sentence processing method according to any one of the first aspects of the present application.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, are adapted to carry out the sentence processing method according to any one of the first aspects of the present application.
A fifth aspect of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the sentence processing method as claimed in any one of the first aspects of the present application.
In summary, with the sentence processing method and apparatus provided by the application, after the sentence to be processed is obtained, the word combinations in it can be extracted and input into a machine learning model; the machine learning model extracts the features of all the word combinations and determines their labels, so that breaking the sentence to be processed, obtaining the intentions of the sub-sentences, and recognizing the named entities in the sub-sentences are achieved simultaneously through the labels output by the machine learning model. This reduces the number of machine learning models deployed in the electronic device for sentence processing and the storage space they occupy, shortens the time the electronic device spends processing a sentence, and reduces the amount and duration of memory occupied by sentence processing, thereby improving the speed and efficiency with which the electronic device processes sentences.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present application, and that a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of an electronic device processing a sentence;
FIG. 2 is a schematic diagram of an intent recognition model for use with an electronic device;
FIG. 3 is a schematic diagram of an entity recognition model used in an electronic device;
FIG. 4 is a flowchart illustrating an embodiment of a sentence processing method provided in the present application;
FIG. 5 is a schematic diagram illustrating the structure of an embodiment of a machine learning model provided in the present application;
fig. 6 is a schematic structural diagram of another embodiment of a machine learning model provided in the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings; it is evident that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of the present disclosure.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be capable of operation in sequences other than those illustrated or described herein, for example. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Before the embodiments of the present application are formally introduced, the scenario to which the application applies and the technical problems in the prior art are described with reference to the accompanying drawings. Specifically, the application is applied to scenarios in which an electronic device processes sentences. The electronic device may be a mobile phone, a notebook computer, a tablet computer or the like, which collects the sentences spoken by a user and identifies and executes the instructions corresponding to the sentences, so that the user can issue instructions to the electronic device by voice. Alternatively, the electronic device may be a vehicle-mounted terminal or the master control system of a smart car, which collects the driver's sentences while driving, identifies the instructions in them and executes them, providing a more intelligent driving environment in which the driver can issue voice instructions without leaving the current driving state. The methods provided by the embodiments are described with the electronic device as the execution subject, but are not limited to it; it is understood that the sentence processing method provided by the application can also be applied in scenarios such as smart homes and smart industry and executed by any electronic device with the relevant data processing capability.
In some embodiments, fig. 1 is a schematic flow chart of an electronic device processing a sentence; in the scenario above, after receiving a sentence uttered by a user, the electronic device needs to identify the instruction in the sentence. In the example shown in fig. 1, the sentence S includes two sub-sentence portions S1 and S2; for example, S may be "the skylight is opened a little and the air conditioner is adjusted to 25 degrees", which breaks into two sub-sentence portions, S1: "skylight open a bit" and S2: "air conditioner adjusted to 25 degrees". The electronic device therefore first divides the sentence S into the sub-sentence portions S1 and S2 with the sentence-breaking model, and then performs intention recognition and word slot recognition on S1 and S2 separately. For example, the sub-sentence portion S1 is processed by the intention recognition model to obtain the intention of S1, A-skylight operation, and then by the entity recognition model to obtain the word slot of S1, C-"a bit". Likewise, the sub-sentence portion S2 is processed by the intention recognition model to obtain the intention of S2, B-air-conditioner operation, and by the entity recognition model to obtain the word slot of S2, D-"25 degrees".
More specifically, fig. 2 is a schematic structural diagram of an intention recognition model used by an electronic device. Taking the processing of the sub-sentence portion S1 by the model shown in fig. 2 as an example, the input layer converts each character of the character-string sentence "skylight open a bit" into a word index number and outputs an array of index numbers, each an integer value. The length of the index array can be preset, for example to 70 characters, so the output is an array of 70 integers, each representing the index of one character. Then, in the word embedding layer, the meaning of each character is represented by multi-dimensional floating-point data; for example, an array of 128 elements may represent the meaning of one character, giving a [70,128] matrix in which every element is a floating-point number. The floating-point data corresponding to each character can be preset or trained in advance, and the word embedding layer can obtain it by table lookup. Subsequently, the convolution layer extracts features over lengths of 3, 4 and 5 consecutive characters, as is usual in NLP intent classification. Given the [70,128] matrix output by the word embedding layer, the size of the convolution output depends on the size of the convolution kernel: with a kernel of size [3,128] the output matrix of the convolution layer is [68,1]; correspondingly, 4-character feature extraction uses a [4,128] kernel and 5-character feature extraction uses a [5,128] kernel.
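The output sizes quoted in this example follow the usual valid-convolution rule, output rows = input rows - kernel rows + 1; a quick check:

```python
def conv_output_rows(seq_len, kernel_rows):
    """Rows of a valid 1-D convolution over a [seq_len, emb] matrix."""
    return seq_len - kernel_rows + 1

# With the 70-character input of the example, kernels of 3, 4 and 5 rows:
for k in (3, 4, 5):
    print(k, conv_output_rows(70, k))  # 68, 67 and 66 output rows respectively
```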
The pooling layer then downsamples the output of each convolution kernel in the convolution layer, outputting one value per kernel to represent its convolution result: the maximum value of the convolution result matrix replaces the entire matrix. The fusion layer then combines the data output by the pooling layers into one array; for example, the 128 3-character, 128 4-character and 128 5-character downsampled values obtained by the pooling layer are combined into a one-dimensional array of 384 floating-point values. After the fully connected layer receives this one-dimensional array, it transforms it into n floating-point values, where n is the preset number of intention categories; for example, if the electronic device can handle 20 user intentions, including music, weather, skylight, seat and so on, 20 floating-point values are output, each indicating the probability of one intention; a larger value means the sentence is more likely to correspond to that intention, and a smaller value means it is less likely. Finally, the output layer takes the intention whose floating-point value is largest among the n values output by the fully connected layer; for example, if the value corresponding to the skylight-operation intention is the largest, the output layer outputs the preset skylight-operation identifier 1 to indicate that the intention of the current sentence S1 is a skylight operation.
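A sketch of the pooling, fusion and normalization arithmetic above, assuming 128 kernels per width as in the example (the softmax helper is standard, not specified by the patent):

```python
import math

def fused_vector_length(kernels_per_width, widths):
    """Each kernel max-pools to a single value; fusion concatenates them all."""
    return kernels_per_width * len(widths)

def softmax(xs):
    """Normalize raw fully-connected outputs into per-intention probabilities."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

print(fused_vector_length(128, (3, 4, 5)))  # 384 values in the fused array
probs = softmax([0.1, 2.0, -1.0])           # toy logits for three intentions
print(max(range(len(probs)), key=probs.__getitem__))  # index of the winning intention
```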
Fig. 3 is a schematic structural diagram of an entity recognition model used by an electronic device. Taking the processing of the sub-sentence portion S1 by the model shown in fig. 3 as an example, the entity recognition model first converts each character of the character-string sentence "skylight open a bit" into a word index number through the input layer and outputs an array of index numbers. Then, in the word embedding layer, the meaning of each character is represented by multi-dimensional floating-point data. The floating-point data of each character is then passed through a bidirectional LSTM layer (LSTM is short for Long Short-Term Memory) to extract the corresponding features; by combining context information when processing a sentence, more accurate information can be obtained. Illustratively, if only one direction of LSTM were used, the order information of the characters in the sentence would be lost and the meanings of "I love you" and "you love me" could not be distinguished; when the model uses a bidirectional LSTM, a forward LSTM processes the sentence in one direction, a backward LSTM processes it in the other, and the results of the two LSTMs are combined, so that the features of the order relations between the characters in the sentence can be extracted. The bidirectional LSTM layer outputs a matrix of size [70, 2×hiddenUnit], where 70 corresponds to the 70 characters converted by the input layer, the forward and backward LSTM outputs are concatenated, and hiddenUnit is the preset hidden size of the bidirectional LSTM, for example 128.
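The output shape of the bidirectional layer follows directly from concatenating the forward and backward hidden states per character (the function name is ours):

```python
def bilstm_output_shape(seq_len, hidden_units):
    """Forward and backward LSTM outputs are concatenated for each character."""
    return (seq_len, 2 * hidden_units)

print(bilstm_output_shape(70, 128))  # (70, 256) for the example sizes
```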
The fully connected layer then processes the matrix obtained from the bidirectional LSTM into a new matrix of size [70, outputDim], where outputDim is the number of NER (named entity recognition) results the entity recognition model can produce; for example, the NER results may be temperature, humidity, person name and so on, with each NER result corresponding to one of the outputDim values. Since the matrix output by the fully connected layer has size [70, outputDim] but each character can ultimately have only one tag, the 70 characters yield 70 tags and the output format is a one-dimensional array of 70 elements. Therefore, in the decoding layer, the value of each link is accumulated by Viterbi decoding with an added transition matrix; after the value of the whole link is obtained, a floating-point value is output for each NER result, each value indicating the probability of one NER result, and the larger the value, the higher the probability that the character in the sentence corresponds to that NER result. Finally, the output layer outputs the word slot corresponding to the largest of these floating-point values together with its NER result. For example, the output layer may give the NER result of the word slot "25 degrees" contained in the sentence S2 as temperature.
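Viterbi decoding over per-character label scores plus a transition matrix can be sketched as follows; all scores, the label set and the toy input are invented for illustration, the patent gives no concrete numbers:

```python
def viterbi(emissions, transitions):
    """emissions[t][s]: score of label s at position t; transitions[a][b]: score of a -> b."""
    n_labels = len(emissions[0])
    score = list(emissions[0])   # best score ending in each label after the first position
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for s in range(n_labels):
            # Add the value of each link (previous label -> s) and keep the best one.
            best_prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][s])
            new_score.append(score[best_prev] + transitions[best_prev][s] + emissions[t][s])
            pointers.append(best_prev)
        score, back = new_score, back + [pointers]
    # Trace the best whole-link path back, yielding one tag per position.
    last = max(range(n_labels), key=score.__getitem__)
    path = [last]
    for pointers in reversed(back):
        last = pointers[last]
        path.append(last)
    return list(reversed(path))

emissions = [[2.0, 0.1], [0.1, 2.0], [2.0, 0.1]]  # three positions, two labels
transitions = [[0.0, -0.5], [-0.5, 0.0]]          # staying in a label is cheaper
print(viterbi(emissions, transitions))            # [0, 1, 0]
```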
In summary, in the manner shown in fig. 1 to fig. 3, when the electronic device analyzes a sentence, the sentence can be divided into a plurality of sub-sentence portions by the sentence breaking model, then the intention corresponding to the sub-sentence portions is obtained by the intention recognition model, and the word slots in the sub-sentence are obtained by the entity recognition model. The subsequent electronic equipment can execute the corresponding instruction in the user sentence according to the intentions and word slots of the multiple sub-sentence parts obtained by the three machine learning models.
However, when sentence analysis is performed in the above manner, the electronic device must deploy several machine learning models, each of which has a large data size and needs to store related comparison data for auxiliary recognition, so the machine learning models occupy a large amount of the electronic device's storage space. Meanwhile, the electronic device can only run the next machine learning model after the previous one has finished, and each model requires lengthy data extraction and computation, so sentence processing takes a long time, the processing speed of the electronic device is slow, and running the machine learning models also occupies the electronic device's memory for a long time, which greatly reduces the speed and efficiency of sentence processing.
Therefore, the application provides a sentence processing method and apparatus to solve the technical problem that the large number of machine learning models used during sentence processing makes processing slow and inefficient. The electronic device no longer needs to deploy many machine learning models; reducing their number reduces the amount of computation, the data of additional models no longer needs to be stored, and the amount and duration of memory occupied by running the models can be reduced, further improving the speed and efficiency with which the electronic device processes sentences.
The technical scheme of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and descriptions of the same or similar concepts or processes may not be repeated in some embodiments. Fig. 4 is a flowchart of an embodiment of a sentence processing method provided in the present application. The method shown in fig. 4 may be applied in a scenario where an electronic device processes a sentence, and specifically includes:
S101: acquiring a sentence to be processed.
In S101 the electronic device first obtains the sentence to be processed, and then performs sentence breaking, intention recognition and named entity recognition (NER) on it, so as to obtain a plurality of sub-sentences by breaking the sentence according to intention and named entity, determine the intention of each sub-sentence, and recognize the named entities in each sub-sentence.
In some embodiments, the electronic device may be a vehicle-mounted device such as a vehicle-mounted terminal. In S101 the electronic device may collect a sentence uttered by a user through a sound collecting device such as a microphone and perform the subsequent processing; alternatively, the electronic device may acquire a sentence input by the user in text form.
For example, assuming that the sentence to be processed acquired in S101 is "open the skylight a little and set the air conditioner to 25 degrees", the electronic device breaks it into two sub-sentences, "open the skylight a little" and "set the air conditioner to 25 degrees", whose intentions are "skylight" and "air conditioner" respectively. Meanwhile, named entity recognition on the sentence determines that the entity type of the named entity "25 degrees" in the sub-sentence "set the air conditioner to 25 degrees" is "temperature". The types of the different entities in a sentence can be preset or trained in advance; for example, correspondences between named entities and types such as "25 degrees"-"temperature", "Zhang San"-"name", and "volume"-"sound" can be specified.
S102: extracting all word combinations of single characters and of pluralities of consecutive characters from the sentence to be processed acquired in S101.
Specifically, the electronic device determines, according to preset mask information, every single character and every run of consecutive characters in the whole sentence to be processed, and records each of them as a word combination of the sentence.
In some embodiments, a mask information set is preset according to the length of the sentence to be processed. The mask information set includes all preset mask information determined according to the preset sentence length; each piece of mask information corresponds to one word combination in the sentence, and includes the mask start position and mask length of that combination. The sentence length consists of sequentially arranged character positions, each with a serial number, and each position can be filled with one character of the sentence. The mask information set is formed as follows: take a character position in the sentence as a mask start position, take the length from that position to the end position as the available length, and pair the start position with every length no greater than the available length to form pieces of mask information. For example, if the sentence length consists of 70 character positions, the first position is taken as a mask start position, the length from the first position to the end position is the available length, and the 70 lengths 1, 2, 3, ..., 70 are each paired with that start position. Traversing all character positions as mask start positions in this way, and pairing each with its corresponding mask lengths, yields all the preset mask information in the mask information set.
Then, when the preset sentence length is m character positions, the number of pieces of mask information in the mask information set is (m+1)·m/2; accordingly, when m characters fill the character positions of the sentence to be processed, the number M of word combinations of single characters and of consecutive characters obtainable from the mask information is M = (m+1)·m/2.
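The enumeration described above can be sketched in plain Python. This is an illustrative sketch, not the patent's implementation: each character position serves as a mask start position, and every length that still fits is paired with it, yielding (m+1)·m/2 combinations for an m-character sentence.

```python
def enumerate_word_combinations(sentence):
    """Enumerate every single character and every run of consecutive
    characters as (start, length, text) triples, mirroring the
    (mask start position, mask length) scheme described above."""
    m = len(sentence)
    combinations = []
    for start in range(m):                       # each position as a mask start
        for length in range(1, m - start + 1):   # every length that still fits
            combinations.append((start, length, sentence[start:start + length]))
    return combinations

# A 12-character placeholder sentence, matching the 78-combination example.
combos = enumerate_word_combinations("ABCDEFGHIJKL")
print(len(combos))  # 12 * 13 / 2 = 78
```

The first 12 entries all start at position 0 (lengths 1 through 12), the next 11 at position 1, and so on, which reproduces the 12 + 11 + ... + 1 = 78 count of the running example.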
Illustratively, taking the sentence to be processed "open the skylight a little and set the air conditioner to 25 degrees" as an example, the word combinations of single characters or of pluralities of consecutive characters determined in S102 may be represented by the following Table 1:
TABLE 1
(Table 1 is rendered as an image in the original publication; it lists the word combinations numbered 1 to M together with their mask information and output labels.)
As shown in Table 1, the electronic device divides the sentence to be processed into the word combinations numbered 1 to M in Table 1, each consisting of one character or of a plurality of consecutive characters, according to the preset mask information. A piece of mask information gives the start position and length of the combination formed by a character in the sentence and the n characters that follow it, where n ranges in turn over 1 to N and N is the total number of characters after that character. For example, the single-character combinations are the characters of the sentence taken one by one in order. For the multi-character combinations, take the first character (N = 11): n = 1 gives the 2-character combination numbered 2 ("skylight"), and so on until n = 11 gives the 12-character combination numbered 12, the whole sentence; then take the second character as the start (N = 10), giving the combinations numbered 13 to 23. In the same way, M = 12 + 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 78 word combinations are obtained, each comprising a single character or a plurality of consecutive characters of the sentence to be processed.
In some embodiments, the mask information used to determine the plurality of word combinations gives the start position and length of each single character or run of consecutive characters in the sentence to be processed. When determining the word combinations of Table 1 in S102, the electronic device may first determine the start position of a combination from the mask start position information. For example, if the mask start positions are the first, second and third character positions of the sentence, the characters at those positions are the starting characters of the corresponding word combinations. Then, combining the mask start position information with the mask length information yields the end position. For example, if the mask start positions are the first, second and third positions of the sentence and the mask lengths are 2, 3 and 4, the end positions of the corresponding word combinations are the second, fourth and sixth positions respectively, and the characters between each start position and its end position form the corresponding word combination.
Finally, according to the determined start and end positions, the electronic device takes the characters between each start position and end position of the sentence to be processed as one word combination. For example, the character between start position 1 and end position 1 is word combination 1, the characters between start position 1 and end position 2 form word combination 2 ("skylight"), and so on, until all the word combinations are determined.
S103: inputting all the word combinations determined in S102 into the machine learning model, so that the machine learning model outputs the labels of all the word combinations. The labels output by the machine learning model comprise a first type of labels and a second type of labels, wherein the first type of labels are used for indicating the intention of the corresponding word combinations, and the second type of labels are used for indicating the named entities corresponding to the word combinations.
Specifically, the machine learning model provided in the embodiments of the present application can, from the labels it produces, yield the sub-sentences obtained by breaking the sentence to be processed, together with the intention of each sub-sentence and the named entities it contains. When the word combinations obtained from the sentence are input into the model, the model sequentially extracts the feature information of every word combination, determines the label corresponding to each combination from that feature information, and outputs the labels, so that the sub-sentences of the sentence, the intention of each sub-sentence, and the named entities within it can all be determined from the labels.
In some embodiments, for each word combination in the sentence to be processed, the machine learning model may determine the character feature vector of each character in the combination in the manner of Table 1 to obtain the feature matrix of the combination, then determine the mask information corresponding to the combination, splice the feature matrix and the mask information into the feature information of the combination, and finally determine the label of the combination from that feature information. Illustratively, in the example of Table 1, the labels output for the combinations "open the skylight a little" and "set the air conditioner to 25 degrees" are "segment1" and "segment2". Both are labels of the first type: "segment1" corresponds to intention 1, "skylight", and "segment2" corresponds to intention 2, "air conditioner". A label of the first type indicates that the corresponding plurality of consecutive characters forms one sub-sentence, i.e. the combinations labeled segment1 and segment2 are each taken as a sub-sentence of the sentence to be processed. The label output for the combination "25 degrees" is "temperature", a label of the second type, indicating that the named entity category of the combination is temperature, i.e. it is the "temperature" named entity of the corresponding sub-sentence.
S104: the electronic device determines at least two sub-sentences in the sentence to be processed, the intention of each sub-sentence and the named entity included in each sub-sentence according to the labels of all word combinations output by the machine learning model.
Specifically, labels of the first type include intention labels and a non-intention label, and labels of the second type include entity labels and a non-entity label. When the machine learning model outputs the labels of all the word combinations, as shown in the last column of Table 1, the electronic device determines the sub-sentences of the sentence to be processed accordingly. When the label of a word combination is the intention label "segment1" of the first type, that combination is determined to be a sub-sentence of the sentence to be processed, with intention "skylight"; when the label is the intention label "segment2" of the first type, that combination is determined to be another sub-sentence, with intention "air conditioner"; when the label is the non-intention label "0" of the first type, the combination is determined not to be a sub-sentence of the sentence to be processed.
When the label of a word combination is the entity label "temperature" of the second type, that combination is determined to be a named entity in a sub-sentence of the sentence to be processed; when the label is the non-entity label "0" of the second type, the combination is determined not to be a named entity.
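The label decoding of S104 can be sketched as follows. This is a minimal illustration using the label names of the Table 1 example ("segment1", "segment2", "temperature", and the non-intention/non-entity label "0"); the data structures and span values are assumptions made for the sketch, not the patent's data format.

```python
def decode_labels(labeled_combinations, intent_of_segment):
    """Split per-combination label decisions into sub-sentences (first-type
    labels) and named entities (second-type labels); "0" means neither."""
    sub_sentences, entities = [], []
    for (text, start, length), label in labeled_combinations.items():
        if label.startswith("segment"):   # first-type label: a sub-sentence
            sub_sentences.append((text, intent_of_segment[label]))
        elif label != "0":                # second-type label: a named entity
            entities.append((text, label))
    return sub_sentences, entities

# Hypothetical model output for the running example: (text, start, length) -> label.
labeled = {
    ("open the skylight a little", 0, 5): "segment1",
    ("set the air conditioner to 25 degrees", 5, 7): "segment2",
    ("25 degrees", 10, 2): "temperature",
    ("skylight", 1, 1): "0",
}
subs, ents = decode_labels(labeled, {"segment1": "skylight", "segment2": "air conditioner"})
print(subs)  # [('open the skylight a little', 'skylight'), ('set the air conditioner to 25 degrees', 'air conditioner')]
print(ents)  # [('25 degrees', 'temperature')]
```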
It should be noted that the above sentences are merely examples. In practical application, the number of sub-sentences and named entities the machine learning model can determine is not limited, and the intention categories of the sub-sentences (skylight, air conditioner, seat, etc.) and the specific categories of the named entities (temperature, speed, height, etc.) can be customized; the embodiments of the present application do not limit this.
In summary, according to the sentence processing method provided by this embodiment, the electronic device extracts all the word combinations of the sentence to be processed and inputs them into a single machine learning model, which yields the sub-sentences together with their intentions and named entities. Sentence breaking, intention recognition and named entity recognition are thus achieved with one machine learning model, reducing the number of models set in the electronic device and the storage space they occupy. Moreover, because this one model performs sentence breaking, intention recognition and named entity recognition in parallel, rather than running an intention recognition model and an entity recognition model only after a sentence breaking model as in the prior art, the time needed to process a sentence is reduced, as are the amount and duration of memory occupation, further improving the speed and efficiency with which the electronic device processes sentences.
In some embodiments, the machine learning model used in the embodiments of the present application may be trained in advance. During training, each training sentence is divided into a plurality of word combinations according to the mask information and input into the model, with the intentions of the sub-sentences among those combinations and the categories of the named entities labeled in advance. For example, training sentences such as "open the skylight a little and set the air conditioner to 30 degrees", "speed up a bit and then change to the right lane", and "play singer A's song B and turn the volume up a bit" may be used. Each is divided into word combinations according to the mask information and labeled: in "play singer A's song B and turn the volume up a bit", the combination "play singer A's song B" is labeled with a first-type intention label whose intention is "song"; the combination "turn the volume up a bit" is labeled with a first-type intention label whose intention is "volume"; "singer A" is labeled with a second-type entity label whose category is "singer"; "song B" is labeled with a second-type entity label whose category is "song"; and the other combinations of one or more characters are labeled "0". The labeled combinations are then fed into the machine learning model for training.
After the machine learning model has learned the features of the training sentences, when it receives the word combinations of a sentence to be processed it extracts features in the same way according to the trained model, determines the corresponding labels by comparing their similarity with the features learned from the training sentences, and outputs the labels of the sentence as shown in Table 1.
A specific structure of the machine learning model provided in the present application is described below with reference to the accompanying drawings. Fig. 5 is a schematic structural diagram of an embodiment of a machine learning model provided in the present application. The machine learning model shown in fig. 5 comprises, in order from input layer to output layer:
An input layer, which converts each character of the word combinations of the sentence to be processed into its corresponding character index number, so that one word combination becomes an array of one or more index numbers. The length of the array may be preset, for example to 70, in which case the output of the input layer is an array of 70 elements, each an integer representing the index of a character. The following description takes a length of 70 as an example, which is not limiting.
A word embedding layer, which determines the character feature vector of each character in a word combination from the character index numbers produced by the input layer; the feature vectors of all the characters of a combination form the feature matrix of that combination. After receiving the array of index numbers of size [70, 1], the word embedding layer may determine the feature vector of each character by table lookup or by a more complex feature extractor such as BERT (Bidirectional Encoder Representations from Transformers). Each character feature vector represents the features of one character, so the word embedding layer outputs a character feature matrix of size [70, H], where H is the feature dimension of each character, for example H = 200.
A first fusion layer, which fuses the feature matrix of a word combination with the mask information corresponding to that combination to obtain a first matrix serving as the feature information of the combination. The mask information comprises the start position and length of the combination in the sentence to be processed; specific values may be chosen as in the example of Table 1. The start positions of the M preset masks of Table 1 may be represented by a matrix of size [M, 1], and the mask lengths likewise by a matrix of size [M, 1]; after fusion, a first matrix of size [M, 70, H+2] is obtained.
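The fusion step can be illustrated in pure Python with nested lists (a stand-in for the tensor operations a real layer would use; the shapes follow the [M, 70, H] and [M, 1] sizes above, and all values here are placeholders):

```python
def fuse_mask_info(word_features, mask_starts, mask_lengths):
    """First fusion layer sketch: append each combination's mask start and
    mask length to every character feature vector, turning an [M][70][H]
    structure into [M][70][H+2]."""
    fused = []
    for combo_feats, start, length in zip(word_features, mask_starts, mask_lengths):
        fused.append([vec + [float(start), float(length)] for vec in combo_feats])
    return fused

M, L, H = 3, 70, 200                                   # combinations, padded length, feature dim
feats = [[[0.0] * H for _ in range(L)] for _ in range(M)]  # placeholder embeddings
fused = fuse_mask_info(feats, [0, 0, 1], [1, 2, 1])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 70 202
```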
A convolution layer, which computes sentence-level feature vectors of the word combinations from the first matrix, obtaining a second matrix indicating the sentence features of the combinations. The convolution layer may convolve the first matrix with a number of preset convolution kernels, obtaining the sentence feature vectors output by each kernel; the resulting sentence feature information may be represented by a second matrix of size [M, 70, H+2]. The input of the convolution layer is the output matrix of the first fusion layer, and its output is also a matrix whose size depends on the kernel size; for example, with a kernel of size [3, 3], the convolved output is [M, 70, H+2]. The specific implementation of the convolution is not limited, provided the sentence to be processed is convolved in the same way as the training sentences were, so that their features remain comparable in the subsequent steps. In some embodiments, the convolution layer extracts additional feature information, especially the context between the preceding and following characters of a run of consecutive characters; when the word embedding layer already accounts for character context, the convolution layer may be omitted here, and the matrix output by the fusion layer sent directly to the subsequent pooling layer.
A pooling layer, which downsamples the second matrix of size [M, 70, H+2] output by the convolution layer to obtain a third matrix of size [M, 1, H+2] indicating the sentence feature information of the word combinations. The purpose of the pooling layer is to discard unimportant features among those extracted by the convolution kernels: the downsampling takes the maximum over the middle dimension of the second matrix, so that the third matrix is composed of those maxima. Each convolution kernel of the convolution layer is followed by pooling whose output represents the maximum of that kernel's convolution result. For example, with an input second matrix of [M, 70, H+2], pooling over the middle dimension of 70 (taking the maximum, or alternatively the average) yields a third matrix of [M, 1, H+2].
A fully connected layer, which maps the word combinations of the third matrix onto the dimensions of a plurality of preset labels to obtain a fifth matrix of size [M, N]. Each of the N preset labels indicates a label with which the word combinations obtained by dividing the training sentences according to the mask information were annotated, and the preset labels include labels of the first type and of the second type. For example, the preset labels include at least: the first-type label "segment1", indicating that the characters form a sub-sentence whose intention is "skylight"; the first-type label "segment2", indicating a sub-sentence whose intention is "air conditioner"; the second-type label "temperature", indicating that the named entity of the characters is of the temperature category; and the second-type label "name", indicating a named entity of the person-name category. These labels are the manually annotated content acquired by the machine learning network when learning the training sentences; once the model has learned the features through training, it can analyze the sentence to be processed with the same feature processing as was applied to the labeled data. In some embodiments (the variant of fig. 6 below), the fully connected layer obtains the fifth matrix Y from a fourth matrix X as Y = X × W + B, where W is a weight matrix of size [H+3, N] and B is a bias matrix given by a one-dimensional array of [N]; W and B may be preset or obtained by training on the training sentences.
A Softmax layer, which normalizes the value corresponding to each preset label in the fifth matrix to obtain a sixth matrix indicating the probability of mapping each word combination to each preset label. Among the M rows of the fifth matrix, each row holds N floating point values, each indicating the score of one combination for one label. For example, the row for "25 degrees" holds N floating point values C0, C1, C2, ..., CN-1; the Softmax layer normalizes the N values of each row into probabilities, obtaining a sixth matrix of unchanged size [M, N].
An output layer, which outputs, for each word combination, the label with the highest probability as the label of that combination; that is, from the sixth matrix of [M, N], the label whose floating point value is largest among the N values of each row is output, as in the last column of Table 1. For example, the combination "open the skylight a little" outputs the "segment1" label corresponding to the largest of its N floating point values, the combination "set the air conditioner to 25 degrees" outputs the "segment2" label, the combination "25 degrees" outputs the "temperature" label, and for the other combinations the largest value corresponds to the label "0".
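The tail of the forward pass, from pooling through label selection, can be sketched in pure Python. This assumes max pooling as described above, and an illustrative label set; a real model would run these steps with a tensor library.

```python
import math

def max_pool(second_matrix):
    """Pooling sketch: take the maximum over the character axis,
    reducing one combination's [70][H+2] block to a single [H+2] vector."""
    return [max(col) for col in zip(*second_matrix)]

def softmax(scores):
    """Softmax layer: normalise N label scores into probabilities."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_label(scores, labels):
    """Output layer: emit the label with the highest probability."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]

labels = ["0", "segment1", "segment2", "temperature"]  # illustrative label set
print(output_label([0.1, 2.3, 0.5, -1.0], labels))     # segment1
```

Since softmax is monotonic, the argmax over probabilities equals the argmax over the raw fully-connected scores; the normalization matters only when the probabilities themselves are needed.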
In some embodiments, fig. 6 is a schematic structural diagram of another embodiment of a machine learning model provided in the present application. On the basis of the embodiment of fig. 5, the model of fig. 6 further includes, between the pooling layer and the fully connected layer, a second fusion layer. The second fusion layer fuses the third matrix of [M, 1, H+2] with the mask lengths of the mask information once more to obtain a fourth matrix: the mask lengths, represented as a matrix of size [M, 1], are fused with the third matrix to output a fourth matrix of size [M, H+3]. Accordingly, in fig. 6 the fully connected layer maps the fourth matrix of size [M, H+3] onto the dimensions of the preset labels to obtain the fifth matrix of size [M, N]. The other layers of fig. 6 are the same as in fig. 5 and are not described again.
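The second fusion of fig. 6 can be sketched the same way as the first (a pure-Python illustration with placeholder values, assuming the pooled third matrix has been squeezed to [M][H+2]):

```python
def second_fusion(third_matrix, mask_lengths):
    """Second fusion layer sketch: append each combination's mask length to
    its pooled [H+2] vector, producing the [M][H+3] fourth matrix."""
    return [vec + [float(length)] for vec, length in zip(third_matrix, mask_lengths)]

pooled = [[0.0] * 202 for _ in range(3)]  # third matrix squeezed to [M][H+2], H = 200
fourth = second_fusion(pooled, [1, 2, 1])
print(len(fourth), len(fourth[0]))  # 3 203
```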
In some embodiments, after obtaining the sentence to be processed, the electronic device may count the characters it contains. If the count meets a preset condition, for example more than 10 characters, the sentence is likely to contain multiple sub-sentences, so it is divided into word combinations in the manner of fig. 4 and input into the machine learning model, performing sentence breaking, intention recognition and entity recognition simultaneously. Conversely, if the count is, for example, fewer than 5 characters, the sentence likely corresponds to a single sub-sentence, and the intention recognition model and entity recognition model of fig. 2 may be used directly to recognize the intention and named entities without sentence breaking. Alternatively, a larger character count may correspond to a higher priority: when the electronic device is processing multiple sentences simultaneously, the sentence with the most characters is divided into word combinations in the manner of fig. 4 and input into the machine learning model first, and the remaining sentences are processed in turn in descending order of character count.
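The priority ordering by character count might be sketched as follows (a hypothetical scheduler written for illustration; the function name and queue structure are not part of the application):

```python
def schedule_sentences(pending):
    """Order pending sentences so that those with more characters, which are
    more likely to contain multiple sub-sentences, enter the model first."""
    return sorted(pending, key=len, reverse=True)

queue = schedule_sentences([
    "open the window",
    "open the skylight a little and set the air conditioner to 25 degrees",
    "play song B",
])
print([len(s) for s in queue])  # character counts in descending order
```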
The foregoing embodiments describe the sentence processing method provided in the embodiments of the present application. To implement the functions of that method, the electronic device serving as the execution subject may include a hardware structure and/or a software module, implementing the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed by a hardware structure, a software module, or a combination of the two depends on the specific application and the design constraints of the technical solution.
For example, the present application further provides a sentence processing apparatus comprising an acquisition module, an extraction module, an input module and a determination module. The acquisition module acquires the sentence to be processed; the extraction module extracts all word combinations of single characters or of pluralities of consecutive characters from the sentence; the input module inputs all the word combinations into the machine learning model, which extracts the feature information of all the word combinations in the sentence and determines their labels from that feature information, the labels including labels of a first type indicating the intention of a word combination and labels of a second type indicating a named entity comprised by one character or a plurality of consecutive characters; and the determination module determines, from the labels of all the word combinations output by the machine learning model, the sub-sentences of the sentence to be processed, the intention of each sub-sentence, and the named entities of each sub-sentence.
Specifically, for the principles and implementation of the steps executed by each module of the sentence processing apparatus, and for the specific structure of the machine learning model, reference may be made to the description of the sentence processing method in the foregoing embodiments of the present application, which is not repeated here.
It should be noted that the division of the above apparatus into modules is merely a division by logical function; in an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented in the form of software invoked by a processing element, or all in hardware, or some modules as software invoked by a processing element and the remaining modules in hardware. For example, the determination module may be a separately arranged processing element, or may be integrated into a chip of the above apparatus, or may be stored in the memory of the above apparatus in the form of program code that is called and executed by a processing element of the apparatus; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each module above, may be completed by integrated logic circuits of hardware in a processor element or by instructions in the form of software.
For example, the modules above may be one or more integrated circuits configured to implement the above method, such as one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). For another example, when a module above is implemented in the form of a processing element scheduling program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of invoking the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The present application further provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, can perform the sentence processing method of any of the foregoing embodiments of the present application.
The present application further provides a computer-readable storage medium storing a computer program which, when executed, can perform the sentence processing method of any of the foregoing embodiments of the present application.
The embodiments of the present application further provide a chip for running instructions, the chip being configured to execute the sentence processing method performed by the electronic device in any of the foregoing embodiments of the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (7)

1. A sentence processing method, comprising:
acquiring a statement to be processed;
extracting all word combinations of a single character or a plurality of consecutive characters from the sentence to be processed;
inputting all the word combinations into a machine learning model; the machine learning model is used for extracting feature information of all word combinations in the sentence to be processed and determining labels of all the word combinations according to the feature information; the labels include a first type of label for indicating an intention of a word combination and a second type of label for indicating a named entity included in one character or a plurality of consecutive characters;
determining sub-sentences in the sentences to be processed, the intentions of the sub-sentences and the named entities of the sub-sentences according to the labels of all word combinations output by the machine learning model;
the extracting the characteristic information of all word combinations in the sentence to be processed comprises the following steps:
for each word combination, determining a feature matrix of the word combination according to the word feature vector corresponding to each character in the word combination; each character has a corresponding word feature vector, and the word feature vectors of all characters in the word combination are combined to form the feature matrix of the word combination;
Determining mask information corresponding to the word combinations; wherein the mask information includes mask length information and mask start position information;
splicing the feature matrix and the mask information of the word combinations into feature information of the word combinations;
the extracting word combinations of all single characters or a plurality of continuous characters of the sentence to be processed comprises the following steps:
presetting a mask information set according to the length of the sentence to be processed, wherein the mask information set comprises all preset mask information determined according to the preset sentence length, and each piece of mask information corresponds to one word combination in the sentence to be processed;
traversing all mask information in the mask information set to obtain the word combination corresponding to each piece of mask information, thereby obtaining all word combinations of a single character or a plurality of consecutive characters of the sentence to be processed;
for each word combination: determining the starting position of the word combination in the sentence to be processed according to the mask starting position information of the mask information; determining the ending position of the word combination in the sentence to be processed according to the mask length information and the mask starting position information; and taking the characters between the starting position and the ending position in the sentence to be processed as the word combination;
The forming of all preset mask information in the mask information set comprises the following steps:
acquiring the length of the sentence to be processed, the length of the sentence to be processed being made up of character positions;
taking each character position as a mask starting position, and forming a plurality of pieces of mask information from the plurality of mask lengths corresponding to that character position; each of the plurality of mask lengths corresponding to a character position is no greater than the splittable length, the splittable length being the length from that character position to the last character position;
and traversing all character positions as mask starting positions, forming a piece of mask information from each of the mask lengths corresponding to each character position, so as to form all the mask information in the mask information set.
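The mask-based span enumeration described in claim 1 can be illustrated with a short sketch (not part of the claims; all function names are hypothetical). Every character position serves as a mask start, and each admissible mask length runs from 1 up to the splittable length, so an n-character sentence yields n(n+1)/2 word combinations:

```python
def build_mask_info_set(sentence_len):
    """Enumerate (start, length) mask pairs: each character position is a
    mask start, and each mask length is at most the splittable length,
    i.e. the distance from that position to the last character."""
    mask_info_set = []
    for start in range(sentence_len):        # each character position
        splittable = sentence_len - start    # length from here to the end
        for length in range(1, splittable + 1):
            mask_info_set.append((start, length))
    return mask_info_set

def extract_word_combinations(sentence):
    """Traverse the mask information set; each mask selects the characters
    between its start position and start + length as one word combination."""
    combos = []
    for start, length in build_mask_info_set(len(sentence)):
        combos.append(sentence[start:start + length])
    return combos

print(extract_word_combinations("abc"))
# A 3-character sentence yields 6 combinations:
# ['a', 'ab', 'abc', 'b', 'bc', 'c']
```

Each (start, length) pair is exactly the mask information of the claim: the start position fixes where the word combination begins, and start + length fixes where it ends.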
2. The method of claim 1, wherein the first type of tag comprises a non-intention tag and an intention tag;
the determining the sub-sentence in the sentence to be processed, the intention of the sub-sentence and the named entity of the sub-sentence according to the label output by the machine learning model comprises the following steps:
determining the word combination corresponding to the intention label as a sub-sentence in the sentence to be processed, and determining the intention corresponding to the intention label as the intention of the sub-sentence.
3. The method of claim 2, wherein the second type of tag comprises a non-entity tag and an entity tag;
the determining the sub-sentence in the sentence to be processed, the intention of the sub-sentence and the named entity of the sub-sentence according to the label output by the machine learning model comprises the following steps:
and determining the word combinations corresponding to the entity labels as named entities of sub-sentences in the sentences to be processed.
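The decoding step of claims 2 and 3 amounts to partitioning the labelled word combinations by label type: combinations carrying an intention label become sub-sentences, and combinations carrying an entity label become named entities. A minimal sketch (label formats and names are invented for illustration and do not appear in the patent):

```python
def decode(labelled_combinations):
    """Split labelled word combinations into sub-sentences and entities.
    Intention-labelled combinations -> sub-sentences with that intention;
    entity-labelled combinations -> named entities; other tags ignored."""
    sub_sentences = []   # (text, intention)
    entities = []        # (text, entity_type)
    for text, label in labelled_combinations:
        if label.startswith("INTENT:"):        # first-type (intention) tag
            sub_sentences.append((text, label.split(":", 1)[1]))
        elif label.startswith("ENTITY:"):      # second-type (entity) tag
            entities.append((text, label.split(":", 1)[1]))
        # non-intention / non-entity tags (e.g. "O") carry no output
    return sub_sentences, entities

subs, ents = decode([
    ("turn on the radio", "INTENT:device_control"),
    ("radio", "ENTITY:device"),
    ("the ra", "O"),
])
print(subs)  # [('turn on the radio', 'device_control')]
print(ents)  # [('radio', 'device')]
```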
4. A method according to any one of claims 1-3, wherein the structure of the machine learning model comprises:
an input layer for converting each word in the word combination into a corresponding word index number;
the word embedding layer is used for determining the word feature vector of each word in the word combination according to the word index number, and the word feature vectors of all words in the word combination form a feature matrix of the word combination;
a first fusion layer, configured to fuse the feature matrix of the word combination with the mask information corresponding to the word combination to obtain a first matrix as the feature information of the word combination;
a convolution layer, configured to compute sentence-vector features of the word combination from the first matrix to obtain a second matrix indicating sentence feature information of the word combination;
a pooling layer, configured to perform downsampling processing on the second matrix to obtain a third matrix indicating sentence feature information of the word combination;
a fully-connected layer, configured to map the third matrix to the dimension of a plurality of preset labels to obtain a fifth matrix, wherein the preset labels comprise the first type of label and the second type of label;
a Softmax layer, configured to normalize the values in the fifth matrix to obtain a sixth matrix indicating the probability of the word combination being mapped to each preset label;
and an output layer, configured to determine the label with the highest probability in the sixth matrix as the label of the word combination.
5. The method of claim 4, wherein the structure of the machine learning model further comprises a second fusion layer, configured to fuse the third matrix obtained by the pooling layer with the mask information to obtain a fourth matrix indicating sentence feature information of the word combination, so that the fully-connected layer maps the fourth matrix to the dimension of the plurality of preset labels to obtain the fifth matrix.
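The layer stack of claims 4 and 5 can be sketched numerically as follows. This is an illustrative sketch only: all dimensions, the random initialisation, and the tanh nonlinearity are assumptions not fixed by the patent, and the embedding lookup stands in for the input and word-embedding layers:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, n_labels = 100, 8, 5

def forward(word_indices, mask_info):
    # Input + word-embedding layers: indices -> feature matrix (len x emb_dim)
    embedding = rng.normal(size=(vocab, emb_dim))
    feat = embedding[word_indices]                   # feature matrix
    # First fusion layer: splice mask info (start, length) onto each row
    mask = np.tile(mask_info, (feat.shape[0], 1))
    first = np.concatenate([feat, mask], axis=1)     # first matrix
    # Convolution layer: sentence-vector features from the first matrix
    kernel = rng.normal(size=(first.shape[1], 16))
    second = np.tanh(first @ kernel)                 # second matrix
    # Pooling layer: downsample by max over the sequence dimension
    third = second.max(axis=0)                       # third matrix
    # Second fusion layer (claim 5): fuse pooled features with mask info again
    fourth = np.concatenate([third, np.asarray(mask_info)])
    # Fully-connected layer: map to the preset-label dimension
    w = rng.normal(size=(fourth.shape[0], n_labels))
    fifth = fourth @ w                               # fifth matrix
    # Softmax layer: normalise to per-label probabilities
    sixth = np.exp(fifth) / np.exp(fifth).sum()      # sixth matrix
    # Output layer: the highest-probability preset label wins
    return int(sixth.argmax())

label = forward(np.array([3, 17, 42]), (0, 3))  # a 3-character combination
print(0 <= label < n_labels)  # True
```

Without the second fusion layer (claim 4 alone), the fully-connected layer would consume the third matrix directly; the sketch includes the claim-5 variant because it subsumes the claim-4 data flow.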
6. A sentence processing apparatus, comprising:
the acquisition module is used for acquiring the statement to be processed;
the extraction module is used for extracting word combinations of all single characters or a plurality of continuous characters of the sentence to be processed;
the input module is used for inputting all the word combinations into the machine learning model; the machine learning model is used for extracting feature information of all word combinations in the sentence to be processed and determining labels of all the word combinations according to the feature information; the labels include a first type of label for indicating an intention of a word combination and a second type of label for indicating a named entity included in one character or a plurality of consecutive characters;
the determining module is used for determining sub-sentences in the sentences to be processed, the intention of the sub-sentences and the named entity of the sub-sentences according to the labels of all word combinations output by the machine learning model;
the machine learning model is specifically configured to determine, for each word combination, a feature matrix of the word combination according to a word feature vector corresponding to each word in the word combination; each word has a corresponding word feature vector, and the word feature vector of each word in the word combination is combined to form a feature matrix of the word combination; determining mask information corresponding to the word combinations; wherein the mask information includes mask length information and mask start position information; splicing the feature matrix and the mask information of the word combinations into feature information of the word combinations;
The extraction module is specifically configured to preset a mask information set according to the length of the sentence to be processed, where the mask information set includes all preset mask information determined according to the length of the preset sentence, and each mask information corresponds to a word combination in the sentence to be processed; traversing all mask information in the mask information set to obtain word combinations corresponding to all mask information as word combinations of all single words or a plurality of continuous words of the sentence to be processed; for each word combination, determining the starting position of the word combination from the sentence to be processed according to mask starting position information in the mask information; determining the ending position of the word combination from the sentence to be processed according to mask length information and mask starting position information of mask information; determining the characters between the starting position and the ending position in the sentence to be processed as one word combination;
the acquiring module is further used for acquiring the length of the sentence to be processed, wherein the length of the sentence to be processed is formed by each word bit;
the mask information composition module is used for taking each character position as a mask starting position and forming a plurality of pieces of mask information from the plurality of mask lengths corresponding to that character position, wherein each of the plurality of mask lengths corresponding to a character position is no greater than the splittable length, the splittable length being the length from that character position to the last character position; and for traversing all character positions as mask starting positions, forming a piece of mask information from each of the mask lengths corresponding to each character position, so as to form all the mask information in the mask information set.
7. An electronic device, comprising: a memory and a processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory, causing the processor to perform the sentence processing method according to any one of claims 1 to 5.
CN202111042496.5A 2021-09-07 2021-09-07 Statement processing method and device Active CN113743120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111042496.5A CN113743120B (en) 2021-09-07 2021-09-07 Statement processing method and device

Publications (2)

Publication Number Publication Date
CN113743120A CN113743120A (en) 2021-12-03
CN113743120B true CN113743120B (en) 2023-07-11

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125331A (en) * 2019-12-20 2020-05-08 京东方科技集团股份有限公司 Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN111192692A (en) * 2020-01-02 2020-05-22 上海联影智能医疗科技有限公司 Entity relationship determination method and device, electronic equipment and storage medium
WO2020232882A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Named entity recognition method and apparatus, device, and computer readable storage medium
CN112016300A (en) * 2020-09-09 2020-12-01 平安科技(深圳)有限公司 Pre-training model processing method, pre-training model processing device, downstream task processing device and storage medium
CN112257421A (en) * 2020-12-21 2021-01-22 完美世界(北京)软件科技发展有限公司 Nested entity data identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220413

Address after: 430051 No. b1336, chuanggu startup area, taizihu cultural Digital Creative Industry Park, No. 18, Shenlong Avenue, Wuhan Economic and Technological Development Zone, Wuhan, Hubei Province

Applicant after: Yikatong (Hubei) Technology Co.,Ltd.

Address before: 430056 building B, building 7, Qidi Xiexin science and Innovation Park, South Taizi Lake innovation Valley, Wuhan Economic and Technological Development Zone, Wuhan City, Hubei Province (qdxx-f7b)

Applicant before: HUBEI ECARX TECHNOLOGY Co.,Ltd.

GR01 Patent grant