WO2023116572A1 - Method for generating words or sentences and related device - Google Patents

Method for generating words or sentences and related device

Info

Publication number
WO2023116572A1
WO2023116572A1 PCT/CN2022/139629 CN2022139629W
Authority
WO
WIPO (PCT)
Prior art keywords
target
character string
character
words
sentences
Prior art date
Application number
PCT/CN2022/139629
Other languages
English (en)
Chinese (zh)
Inventor
肖镜辉
刘群
吴海腾
张哲
熊元峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2023116572A1 publication Critical patent/WO2023116572A1/fr


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the field of artificial intelligence, in particular to a method for generating words and sentences and related equipment.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence is the branch of computer science that attempts to understand the nature of intelligence and to produce a new class of intelligent machines that respond in ways similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision-making.
  • the input method editor is an essential application on client devices and is widely used in desktop computers, notebooks, mobile phones, tablets, smart TVs, in-car computers and other devices. The user's daily activities, such as searching for places, finding restaurants, chatting and making friends, and travel planning, are largely transformed into user input behaviors, so the data of the input method editor can be used to describe users accurately. Input method editors therefore have great strategic significance in the Internet field.
  • the input method editor generates words and sentences (a word, several words, or a sentence) and prompts them for the user to choose from.
  • the accuracy of the generated words and sentences directly affects the accuracy rate and user experience of the input method editor; a method that can accurately generate words and sentences is therefore needed.
  • the present application provides a method for generating words and sentences, which reduces the impact of the superimposed errors of the error correction model and the word segmentation model on the accuracy of words and sentences, and improves the generation accuracy of words and sentences.
  • the present application provides a method for generating words and sentences, the method comprising:
  • obtaining a target character string sequence, where the target character string is input by the user in the input method tool;
  • a character string can be understood as a combination of characters; it is a carrier of language information and is used to generate words and sentences. The words and sentences can be one word or multiple words, and a single character can also constitute a word.
  • the user can input the target character string sequence through the input method tool, and then the terminal device can obtain the target character string sequence input by the user.
  • the target words and sentences corresponding to the target character string sequence are generated through the target neural network, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain an embedding vector from the target character string sequence, and the decoder is used to generate the target words and sentences from the embedding vector.
  • the target neural network is obtained through training samples, and the training samples include character string sequences and corresponding words and sentences (for example, the correct words and sentences corresponding to the character string sequences); the target words and sentences are presented in the interface of the input method tool.
  • the encoder can obtain the embedding vector from the target string sequence; specifically, the encoder can process each character in the target string sequence to obtain the embedding vector of each character (also called a hidden vector). It should be understood that the input and output sizes of the encoder can be kept the same.
  • the decoder can generate the target words and sentences according to the embedding vector.
  • the decoder can obtain at least one word unit and the probability of each word unit from the embedding vector, and combine them with a planning algorithm to obtain the target words and sentences.
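As a sketch of the planning step described above: suppose the decoder has produced, for each output position, a few candidate word units with probabilities; a beam search can then combine them into the highest-probability target words and sentences. The candidate words and probabilities below are invented for illustration and are not from the patent.

```python
import math

def beam_search(step_candidates, beam_width=2):
    """Combine per-position word-unit probabilities into the best sentence.

    step_candidates: one {word_unit: probability} dict per output position.
    """
    beams = [([], 0.0)]  # (word units chosen so far, cumulative log-probability)
    for candidates in step_candidates:
        expanded = [
            (seq + [unit], score + math.log(p))
            for seq, score in beams
            for unit, p in candidates.items()
        ]
        # keep only the beam_width highest-scoring partial sentences
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]

# Two output positions, each with competing (hypothetical) word units.
steps = [{"今天": 0.6, "金田": 0.4}, {"天气": 0.7, "添加": 0.3}]
print("".join(beam_search(steps)))  # → 今天天气
```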
  • the uncorrected and unsegmented character strings are directly input to the phonetic-to-character conversion model (such as the target neural network in the embodiment of the present application).
  • the target neural network has error-correction capability, and since the target string sequence is input through the input method tool, the character sequence will not be very long (that is, it is less than the threshold); without word segmentation, working directly on the original string, the target neural network can still produce accurate words and sentences.
  • the method removes the superimposed influence of the errors of the error correction model and the word segmentation model on the accuracy of the words and sentences in the prior art, and improves the generation accuracy of the words and sentences.
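To illustrate why a separate word segmentation step is a source of error that this design avoids: the same pinyin letter string can often be cut into different valid syllable sequences, and a wrong cut propagates to every later stage. A toy sketch follows; the syllable inventory is a small made-up subset.

```python
SYLLABLES = {"xi", "an", "xian"}  # tiny illustrative subset of pinyin syllables

def segmentations(s, prefix=()):
    """Enumerate every way to cut the letter string s into known syllables."""
    if not s:
        return [list(prefix)]
    results = []
    for i in range(1, len(s) + 1):
        if s[:i] in SYLLABLES:
            results += segmentations(s[i:], prefix + (s[:i],))
    return results

# 'xian' is ambiguous: 'xi'+'an' (e.g. 西安) or the single syllable 'xian' (e.g. 先)
print(segmentations("xian"))  # → [['xi', 'an'], ['xian']]
```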
  • training the target neural network through the above noise samples can enable the target neural network to have error correction capabilities (that is, the target neural network can still generate correct words and sentences for strings containing noise).
  • the number of characters in the target string sequence is less than a threshold, and the threshold is a value less than or equal to 128, for example, the threshold may be 64, 70, 80, 90, 100, 128, and so on.
  • the decoder can adopt a non-autoregressive parallel decoding method.
  • the input is a sequence of letters and the output is a sequence of Chinese characters.
  • a Chinese character is typically entered as multiple letters, so the length of the output Chinese-character sequence is usually much smaller than that of the input letter sequence; therefore, a 'generated sequence length prediction' module is added to the encoder to guide the length of the generated sequence.
  • the decoder side is changed from one-way attention (as in the GPT model) to the two-way attention of the BERT model to support parallel decoding.
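The difference between the two attention patterns can be sketched as masks (a simplified illustration; real models apply these masks inside attention layers): with the one-way (GPT-style) mask, position i can only see positions up to i, forcing left-to-right generation; with the two-way (BERT-style) mask, every position sees all others, so all output positions can be decoded in a single parallel pass.

```python
def causal_mask(n):
    # one-way attention: position i may attend only to positions j <= i
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    # two-way attention: every position may attend to every position
    return [[True] * n for _ in range(n)]

for row in causal_mask(3):
    print([int(v) for v in row])
# → [1, 0, 0]
#   [1, 1, 0]
#   [1, 1, 1]
```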
  • the target words and sentences may include a first word unit and a second word unit, where the position of the first word unit in the target words and sentences is earlier than that of the second word unit; the decoder is specifically configured to generate the second word unit according to the target string sequence without relying on the first word unit having been generated.
  • the decoder is specifically configured to: generate the first word unit and the second word unit in parallel according to the target character string sequence.
  • non-autoregressive decoding can greatly increase the inference speed of the model without greatly reducing the performance of the model.
  • the number of word units of the target words and sentences can be predicted through a word count prediction model; according to the target string sequence, the initial words and sentences corresponding to the target character string sequence can be generated through the target neural network; according to the number of word units, the initial words and sentences are truncated to obtain the target words and sentences.
  • after the target neural network receives the target character string sequence, it can encode the input sequence through the encoder; the length of the target words and sentences (the number of word units) can be predicted through the word count prediction model, and the decoder can use the encoding result of the encoder to generate, in parallel, the initial words and sentences corresponding to the target string sequence; finally, the initial words and sentences are adjusted according to the previously predicted number of word units (for example, the part exceeding that length is truncated).
  • the word count prediction model may be a classification model or a regression model.
  • the number of word units of the target words and sentences may also be predicted by a word count prediction model, and the initial words and sentences may be adjusted based on that number.
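A minimal sketch of this predict-then-adjust step. `predict_word_count` stands in for the learned classification/regression model (the heuristic of roughly one word unit per three pinyin letters is an illustrative assumption, not the patent's model), and the adjust step truncates the parallel-decoded output to the predicted length.

```python
def predict_word_count(string_sequence):
    # stand-in heuristic for the learned word count prediction model:
    # roughly one word unit per three pinyin letters (illustrative only)
    return max(1, len(string_sequence) // 3)

def adjust(initial_units, predicted_count):
    # truncate the part of the initial output exceeding the predicted length
    return initial_units[:predicted_count]

initial = ["今", "天", "好", "[PAD]", "[PAD]"]  # parallel-decoded, over-length
count = predict_word_count("jintianhao")       # 10 letters → 3 word units
print(adjust(initial, count))  # → ['今', '天', '好']
```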
  • the target character string sequence is a character string sequence containing noise, and the noise is caused by a user's wrong input in the input method tool;
  • the target words and sentences are correct words and sentences corresponding to the target character string sequence after denoising.
  • the target neural network includes an encoder and a decoder, wherein the encoder or decoder can be one of the following models: LSTM, GRU, SRU, BERT, RoBERTa, SpanBERT, XLNet, GPT, NEZHA, MASS, BART, mBART, ALBERT, StructBERT, ERNIE, KnowBERT, K-BERT, or TinyBERT.
  • the encoder can be understood as a deep learning network model, and the encoder can have various network structures, which are not specifically limited in this embodiment of the present application; specifically, the network structure of the encoder can be that of the encoder part of the Transformer network, or of a series of other networks derived from the encoder part of the Transformer network.
  • the present application provides a sample construction method, the method comprising:
  • obtaining a first character string sequence and corresponding words and sentences, where the first character string sequence includes a first character;
  • words and sentences can be converted into the first character string sequence through the phonetic conversion module.
  • the first character is a character in the first character string sequence.
  • the first character may be obtained by randomly sampling (or in other ways) the characters of the first character string sequence.
  • the first character can serve as the object for adding noise to the first string sequence (specifically, the noise can be replacing the first character with a character other than the first character, or inserting a character other than the first character before or after the first character).
  • the target character corresponding to the first character is determined from at least one second character through the target probability model, wherein the target probability model indicates the probability that, when the user inputs the first character on the virtual keyboard, the user touches by mistake the virtual key corresponding to each second character in the at least one second character; the probability is related to at least one of the following: the size information of the virtual keys, the layout information of the virtual keys, the user's operating habits, or the structural characteristics of the user's hand;
  • the second character string sequence and the words and sentences are used as training samples of the target neural network, and the target neural network is used to generate corresponding words and sentences according to the character string sequences.
  • the target probability model may be used to describe the probability that the user touches the virtual key corresponding to each second character in the at least one second character by mistake when inputting the first character on the virtual keyboard.
  • the probabilities of mistakenly touching different keys may not be equal; they may be related to the size information of the virtual keys, the layout information of the virtual keys, the operating habits of the user, or the structural features of the user's hands.
  • the larger the size of the virtual key, the greater the probability of it being accidentally touched.
  • for example, the vicinity of button A includes button B, button C, and button D; if the size of button B is larger than the sizes of button C and button D, the probability that the user accidentally touches button B when pressing button A is higher.
  • keyboards with different size information of virtual keys may correspond to different target probability models.
  • the layout information of the virtual keys may include information such as the arrangement of the keys on the keyboard, the distances between the keys, and the shapes of the keys themselves. For example, when the user presses button A, buttons B, C and D are near button A; if the distance between button B and button A is smaller than the distances between button C or button D and button A, the probability that the user accidentally touches button B when pressing button A is higher.
  • keyboards with different virtual key layout information may correspond to different target probability models.
  • the user's operating habits can be understood as the user's action habits when pressing a button. Different users may have different action habits.
  • the vicinity of button A includes button B, button C, and button D.
  • when user A presses button A, the user's operating habits make it easier to touch button B, so the probability of accidentally touching button B when pressing button A is higher; as another example, operating habits can be related to the user's proficiency with keyboard input.
  • users with different operating habits may correspond to different target probability models.
  • the structural features of the user's hand may be understood as the structural features of the user's operating finger when pressing the keyboard, for example, the size of the contact area between the finger and the keyboard surface.
  • the structural characteristics of the hand may be related to the age of the user; among users of the same age, the structural characteristics of the hand may differ based on gender and individual differences.
  • users with different hand structure characteristics may correspond to different target probability models.
  • the size information of the virtual key may include size information of at least one second character.
  • the layout information of the virtual key may include at least one layout feature between the second character and the first character.
  • determining the target character used to replace the first character through the target probability model describes the user's actual behavior more accurately, that is, it identifies the character that is more likely to be touched by mistake; the resulting noise-added training samples better reflect actual user operation, and the target neural network trained on these noise-added samples is more accurate, which enhances the robustness of the model in real user input scenarios.
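The noise-injection step can be sketched as follows. The per-key mistouch probability table is invented for illustration (a real target probability model would be derived from key sizes, layout, press-point data, and so on); one neighbouring key is sampled in proportion to its mistouch probability and substituted for the intended character.

```python
import random

# Hypothetical mistouch probabilities for keys adjacent on a QWERTY layout.
MISTOUCH = {
    "a": {"s": 0.5, "q": 0.3, "z": 0.2},
    "j": {"k": 0.4, "h": 0.3, "u": 0.2, "n": 0.1},
}

def mistouch(first_char, rng=random):
    """Sample the target character that replaces first_char, per the model."""
    neighbours = MISTOUCH.get(first_char)
    if not neighbours:
        return first_char  # no model for this key: leave the character as-is
    keys, probs = zip(*neighbours.items())
    return rng.choices(keys, weights=probs, k=1)[0]

random.seed(0)
clean = "jintian"
# add noise to the sampled first character ('j') to build a noisy training input
noisy = mistouch(clean[0]) + clean[1:]
print(noisy)  # e.g. 'kintian'; the pair (noisy, 今天) then becomes a training sample
```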
  • the method also includes:
  • the probability determined by the target probability model constructed based on the pressing point cloud can be related to the user's operating habits.
  • the target probability model is a Gaussian probability model.
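A sketch of such a Gaussian model: press points for an intended key are modelled as a two-dimensional Gaussian centred on that key, so the probability of mistakenly touching a neighbouring key decays with the distance between key centres. The key coordinates and the spread `sigma` are illustrative assumptions, not values from the patent.

```python
import math

# made-up key-centre coordinates (keyboard-layout units)
KEY_CENTRES = {"a": (0.0, 0.0), "s": (1.0, 0.0), "q": (0.3, 1.0), "z": (0.4, -1.0)}

def mistouch_probs(intended, sigma=0.8):
    """Normalised mistouch probabilities for keys other than the intended one."""
    cx, cy = KEY_CENTRES[intended]
    weights = {}
    for key, (x, y) in KEY_CENTRES.items():
        if key == intended:
            continue
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        weights[key] = math.exp(-d2 / (2 * sigma ** 2))  # unnormalised Gaussian
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

probs = mistouch_probs("a")
# 's' has the closest centre to 'a', so it gets the highest mistouch probability
print(max(probs, key=probs.get))  # → s
```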
  • the method also includes:
  • the target neural network is trained according to the second character string sequence and the correct words and sentences.
  • the present application provides a device for generating words and sentences, the device comprising:
  • An acquisition module configured to acquire a target character string sequence, the target character string being input by the user in the input method tool
  • a phrase generation module, configured to generate the target words and sentences corresponding to the target string sequence through the target neural network according to the target string sequence, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain the embedding vector according to the target string sequence, and the decoder is used to generate the target words and sentences according to the embedding vector.
  • the target neural network is obtained through training samples, and the training samples include character string sequences and corresponding words and sentences;
  • a presentation module configured to present the target words and sentences in the interface of the input method tool.
  • the number of characters in the target string sequence is less than a threshold, and the threshold is a value less than or equal to 128.
  • the target words and sentences include a first word unit and a second word unit, and the position of the first word unit in the target words and sentences is earlier than that of the second word unit; the decoder is specifically configured to generate the second word unit according to the target string sequence without relying on the first word unit having been generated.
  • the decoder is specifically configured to: generate the first word unit and the second word unit in parallel according to the target character string sequence.
  • the device also includes:
  • a word count prediction module, configured to predict the number of word units of the target words and sentences through a word count prediction model according to the target character string sequence;
  • the word and sentence generation module is specifically used for:
  • the initial words and sentences corresponding to the target character string sequence are generated through the target neural network;
  • the initial words and sentences are truncated according to the number of word units to obtain the target words and sentences.
  • the target character string sequence is a character string sequence containing noise, and the noise is caused by a user's wrong input in the input method tool;
  • the target words and sentences are correct words and sentences corresponding to the target character string sequence after denoising.
  • the encoder or decoder is one of the following models: LSTM, GRU, SRU, BERT, RoBERTa, SpanBERT, XLNet, GPT, NEZHA, MASS, BART, mBART, ALBERT, StructBERT, ERNIE, KnowBERT, K-BERT, or TinyBERT.
  • the present application provides a sample construction device, the device comprising:
  • An acquisition module configured to acquire a first character string sequence and corresponding words and sentences, where the first character string sequence includes a first character
  • a character replacement module, configured to determine a target character corresponding to the first character from at least one second character through a target probability model, wherein the target probability model indicates the probability that, when the user inputs the first character on the virtual keyboard, the user accidentally touches the virtual key corresponding to each second character in the at least one second character, and the probability is related to at least one of the following:
  • the size information of the virtual keys, the layout information of the virtual keys, the user's operating habits, or the structural characteristics of the user's hand;
  • the second character string sequence and the words and sentences are used as training samples of the target neural network, and the target neural network is used to generate corresponding words and sentences according to the character string sequences.
  • the target probability model is a Gaussian probability model.
  • the device also includes:
  • the training module is used to train the target neural network according to the second character string sequence and the correct words and sentences.
  • the embodiment of the present application provides a neural network search device, which may include a memory, a processor, and a bus system, wherein the memory is used to store programs, and the processor is used to execute the programs in the memory so as to perform the method of the above-mentioned first aspect and any optional method thereof, and the method of the above-mentioned second aspect and any optional method thereof.
  • the embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method of the above-mentioned first aspect and any optional method thereof.
  • the embodiment of the present application provides a computer program which, when running on a computer, enables the computer to execute the method of the above-mentioned first aspect and any optional method thereof, and the method of the above-mentioned second aspect and any optional method thereof.
  • the present application provides a chip system, which includes a processor configured to support an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data or information involved in the above methods.
  • the system-on-a-chip further includes a memory, and the memory is used for storing necessary program instructions and data of the execution device or the training device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the present application provides a method for generating words and sentences, the method comprising: obtaining a target character string sequence, where the target character string is input by a user in an input method tool; and generating, according to the target character string sequence and through a target neural network, the target words and sentences corresponding to the target character string sequence, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain an embedding vector according to the target character string sequence, and the decoder is used to generate the target words and sentences based on the embedding vector; the target neural network is obtained through training samples, and the training samples include character string sequences and corresponding words and sentences; the target words and sentences are presented in the interface of the input method tool.
  • the character strings, without error correction and word segmentation, are input directly into the phonetic-to-character conversion model (such as the target neural network in the embodiment of the application), which removes the superimposed influence of the errors of the error correction model and the word segmentation model on the accuracy of words and sentences in the prior art and improves the generation accuracy of words and sentences.
  • Fig. 1 is a kind of structural schematic diagram of main frame of artificial intelligence
  • Fig. 2 is a schematic diagram of an interface of an input method scene
  • Fig. 3 is a schematic diagram of an interface of an input method scene
  • FIG. 4 is a schematic diagram of the architecture of an application system
  • FIG. 5 is a schematic diagram of the architecture of an application system
  • Fig. 6 is a schematic representation of a method for generating words and sentences
  • Fig. 7 is a schematic representation of a method for generating words and sentences
  • FIG. 8 is a schematic flow diagram of a method for generating words and sentences provided in an embodiment of the present application.
  • Figure 9 is a schematic diagram of the construction of the embedding vector of the embodiment of the present application.
  • Figure 10 is a schematic representation of a non-autoregressive network
  • FIG. 11 is a schematic diagram of generating a word and sentence in the embodiment of the present application.
  • Figure 12 is a schematic flow chart of the sample construction method provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of a click distribution of a user clicking on a virtual keyboard
  • Fig. 14 is a schematic diagram of a sample construction method
  • Fig. 15 is a schematic diagram of an embodiment of a device for generating words and sentences provided by the embodiment of the present application.
  • Fig. 16 is a schematic diagram of an embodiment of a sample construction device provided by the embodiment of the present application.
  • Fig. 17 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • Fig. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 shows a schematic structural diagram of the main framework of artificial intelligence.
  • the above artificial intelligence framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has undergone a condensed process of "data-information-knowledge-wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provided and processed by technology) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • the basic platform includes distributed computing frameworks, networks and other related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data from the upper layer of the infrastructure is used to represent data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing and training on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and using formalized information to carry out machine thinking and solve problems according to reasoning control strategies.
  • the typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they package the overall artificial intelligence solution, commercializing intelligent information decision-making and realizing practical applications. The main application fields include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, smart cities, etc.
  • This application can be applied, but is not limited, to the field of natural language processing within the field of artificial intelligence.
  • the following will introduce a number of application scenarios that have been implemented in products.
  • This application can be applied to the scene of information input based on the input method.
  • the user can input a character string on the terminal device.
  • the input method editor (input method editor, IME) deployed inside the terminal device will receive the character string entered by the user, generate the corresponding words and sentences based on the character string, and then prompt them to the user.
  • the input method editor may be realized by a neural network, such as the target neural network in the embodiment of the present application.
  • the task of converting character strings into corresponding words and sentences is called a phonetic-to-character conversion task.
  • a character string (also called a character string sequence, such as the target character string sequence in the embodiment of the present application) can be understood as a combination of characters, which is a carrier of language information and is used to generate words and sentences; the words and sentences can be one word or multiple words, and a single character can also constitute a word.
  • the character string may be a character indicating the pronunciation of a word or sentence that the user wants to input.
  • the above-mentioned input scene may be an input scene of multiple languages such as Chinese, Japanese, Korean, etc.
  • corresponding to different types of languages, the form of the character string is different; taking Chinese as an example, the character string may include one pinyin or multiple pinyins.
  • the words and sentences prompted by the input method editor are homophonic candidates of the character string, for example several candidates pronounced like "Noah's Ark".
  • FIG. 2 shows a schematic diagram of an interface when inputting based on an input method on a mobile terminal.
  • the mobile terminal can receive the user's input operation under the input method.
  • the input operation may be an input operation for inputting a character sequence in a spelling area, and corresponding candidate words can be generated based on the character sequence in the spelling area through the input method.
  • the input operation may be one of keyboard input operation and handwriting input operation.
  • the input operation may also be another type of input operation, which is not limited in this embodiment of the present application.
  • FIG. 3 shows a schematic interface when inputting based on an input method on a PC terminal; unlike FIG. 2, in FIG. 3 the user can input character strings on a physical keyboard.
  • the terminal device may be a desktop computer, a notebook computer, a tablet computer, a smart phone, or a smart TV.
  • the terminal device may also be any other device capable of deploying an input method editor, such as a vehicle-mounted computer.
  • FIG. 4 shows a natural language processing system
  • the natural language processing system includes a user device (which in this embodiment of the present application may also be referred to as a terminal device or a smart device).
  • the user equipment includes terminal equipment such as a mobile phone and a personal computer.
  • the user equipment can receive user instructions; for example, the user equipment can receive a string input by the user, so that the user equipment can process the string (for example, perform a phonetic-to-character conversion task) to obtain a processing result corresponding to the string (for example, the words and sentences corresponding to the string).
  • the user equipment may store the target neural network, and after each operating system (operating system, OS) or application program (application, APP) invokes the model, perform an inference task according to the target neural network (such as the above-mentioned phonetic-to-character conversion task).
  • the neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes xs and an intercept of 1 as input, and the output of the operation unit can be: h = f(∑s Ws·xs + b), where:
  • Ws is the weight of xs
  • b is the bias of the neuron unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
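As an illustration of the neural unit described above, the following is a minimal Python sketch; the inputs, weights, bias, and the choice of a sigmoid activation here are example values for illustration, not parameters from this application:

```python
import math

def neural_unit(xs, ws, b):
    """Single neural unit: weighted sum of the inputs xs with weights ws,
    plus the bias b, passed through the activation function f (sigmoid here)."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid activation f

# With zero weights and zero bias the weighted sum is 0, so sigmoid gives 0.5.
out = neural_unit([1.0, 2.0], [0.0, 0.0], 0.0)
```

The output of such a unit can then serve as an input to units in the next layer, as described above.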
  • a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • Deep Neural Network (DNN) can be understood as a neural network with many hidden layers; there is no special metric for the "many" here, and the multi-layer neural network and the deep neural network we often speak of are essentially the same thing. According to the position of different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in the middle are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is actually not complicated.
  • the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}.
  • the superscript 3 represents the layer number of the coefficient W, and the subscript corresponds to the output third-layer index 2 and the input second-layer index 4.
  • in general, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as W^L_{jk}; note that the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
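The fully connected layer-by-layer computation described above can be sketched as follows; the weight matrices, biases, and the choice of a ReLU activation are illustrative assumptions:

```python
def dnn_forward(x, layers):
    """Fully connected forward pass through a DNN. `layers` is a list of
    (W, b) pairs, where W[j][k] is the coefficient from neuron k of the
    previous layer to neuron j of the current layer (the input layer
    itself has no W parameter), matching the W^L_{jk} notation above."""
    relu = lambda v: v if v > 0 else 0.0  # illustrative activation
    for W, b in layers:
        x = [relu(sum(wjk * xk for wjk, xk in zip(row, x)) + bj)
             for row, bj in zip(W, b)]
    return x

# Two stacked 2x2 identity layers with zero bias leave a positive input unchanged.
identity_layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
out = dnn_forward([1.0, 2.0], [identity_layer, identity_layer])
```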
  • Natural language is human language, and natural language processing (NLP) is the processing of human language. Natural language processing is the process of systematically analyzing, understanding and extracting information from text data in an intelligent and efficient manner.
  • NLP natural language processing
  • the convolutional neural network can use the back propagation (BP) algorithm to correct the size of the parameters in the initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, the input signal is passed forward until the output, which generates an error loss, and the parameters in the initial super-resolution model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation movement dominated by error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrix.
  • Input method preferred word: when the user enters a character string, the input method editor will provide the user with a candidate list, which is used to prompt the user with words and sentences, and the first word in the candidate list is called the input method's preferred word.
  • Input method error correction module: when the user is making an input, a "false touch phenomenon" can occur, that is, the intention is to press a certain key, but another key is actually pressed; the input method error correction module is a module that corrects wrong key information into correct key information.
  • Pinyin segmentation module: the original sequence input by the user of the input method is a sequence of letters without separation, and the pinyin segmentation module splits the input sequence into a sequence composed of complete pinyin, which is then sent to the phonetic-to-character conversion module.
  • Transformer network structure: a deep neural network structure, including substructures such as an input layer, self-attention layers, feed-forward layers, and normalization layers.
  • Bert model: a model with the Transformer network structure; on the basis of the Transformer network structure, it proposes the "pre-training + fine-tuning" learning paradigm and designs two pre-training tasks, Masked Language Model and Next Sentence Prediction.
  • Ngram model: a model widely used in Chinese input method tasks.
  • Bart: uses the Bert model as the encoder and the GPT model as the decoder, and designs a variety of pre-training tasks to train the model; Bart has achieved good results in both NLP understanding tasks and generation tasks.
  • Smoothing algorithm: an algorithm designed to solve the zero-probability problem of the Ngram model; when it is judged that there is a zero-probability risk, the smoothing algorithm usually uses the stable but less accurate low-order Ngram model probability, weighted in some way, to approximate the unstable but more accurate high-order Ngram model probability.
  • Viterbi algorithm: a dynamic programming algorithm for finding the Viterbi path that is most likely to generate the observed event sequence, that is, the hidden state sequence, especially in the context of Markov information sources and hidden Markov models; it is often used in speech recognition, keyword recognition, computational linguistics, and bioinformatics. The Viterbi algorithm can also be called the Finite State Transducers (FST) algorithm.
  • the Ngram model is introduced in detail below.
  • the Ngram model makes the Markov assumption that the probability of the current word is only related to a limited number of N words.
  • N takes different values, a series of specific Ngram models are obtained.
  • the smoothing algorithm can be simply understood as, when the probability of the Ngram model is 0, the product of a certain weight and the probability of the (N-1)gram model is used as the probability of the (N)gram model.
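The back-off idea just described can be sketched in Python; the count tables and the back-off weight below are illustrative assumptions, not values from this application:

```python
def smoothed_prob(ngram_counts, lower_counts, context, word, weight=0.4):
    """Back-off smoothing sketch: when the N-gram probability is zero,
    fall back to weight * (N-1)-gram probability, as described above.
    `ngram_counts` maps (context, word) to counts; `lower_counts` maps
    words to lower-order counts."""
    ctx_total = sum(c for (ctx, w), c in ngram_counts.items() if ctx == context)
    count = ngram_counts.get((context, word), 0)
    if ctx_total > 0 and count > 0:
        return count / ctx_total  # high-order estimate available
    total = sum(lower_counts.values())
    # zero-probability risk: back off to the weighted lower-order model
    return weight * lower_counts.get(word, 0) / total if total else 0.0
```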
  • the Ngram model is described below with a specific example.
  • under the Markov assumption, the probability of a sentence factorizes into a product of conditional probabilities; for example, with a bigram (N=2) model: P(w_1, w_2, …, w_n) = P(w_1)·P(w_2|w_1)·…·P(w_n|w_{n-1}).
  • the bottom line represents pinyin nodes
  • the upper four lines of nodes are Chinese characters corresponding to pinyin nodes. These Chinese characters constitute various possibilities for user input.
  • the probability of each Chinese character node can be calculated by using the Ngram model. Since the probability of the Chinese character node is actually the conditional probability of the occurrence of the previous N Chinese character nodes, this probability can also be regarded as the path transition probability between Chinese character nodes.
  • the Ngram model can be used to calculate the transition probabilities between Chinese character nodes, such as P(Ya|Nuo), and the path with the largest overall probability can then be selected as the conversion result.
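The path search over the lattice of Chinese character nodes can be sketched with a toy Viterbi implementation; the candidate characters and probabilities below are invented for illustration:

```python
def viterbi(candidates, start_p, trans_p):
    """Dynamic-programming search over a lattice of Chinese-character
    candidates (one candidate list per pinyin node), scoring paths with
    bigram-style transition probabilities between character nodes."""
    # best[c] = (probability of the best path ending in c, that path)
    best = {c: (start_p.get(c, 1e-6), [c]) for c in candidates[0]}
    for step in candidates[1:]:
        new_best = {}
        for cur in step:
            # pick the predecessor maximizing path prob * transition prob
            prev, (p, path) = max(
                ((pr, best[pr]) for pr in best),
                key=lambda item: item[1][0] * trans_p.get((item[0], cur), 1e-6))
            new_best[cur] = (p * trans_p.get((prev, cur), 1e-6), path + [cur])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

# Toy lattice: two candidates for 'nuo', one for 'ya'.
path = viterbi([['诺', '挪'], ['亚']],
               {'诺': 0.6, '挪': 0.4},
               {('诺', '亚'): 0.9, ('挪', '亚'): 0.1})
```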
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture 500 includes an execution device 510 , a training device 520 , a database 530 , a client device 540 , a data storage system 550 and a data acquisition system 560 .
  • the execution device 510 includes a calculation module 511 , an I/O interface 512 , a preprocessing module 513 and a preprocessing module 514 .
  • the calculation module 511 may include the target model/rule 501, and the preprocessing module 513 and the preprocessing module 514 are optional.
  • the data collection device 560 is used to collect training samples.
  • the training samples may be data (such as character strings and corresponding words and sentences) used when training the neural network. After collecting the training samples, the data collection device 560 stores these training samples in the database 530 .
  • the training device 520 can train the neural network based on the training samples to search for the target model/rule 501 .
  • the target model/rule 501 may be a target neural network.
  • the training samples maintained in the database 530 are not necessarily all collected by the data collection device 560, and may also be received from other devices.
  • the training device 520 does not necessarily perform the training of the target model/rule 501 based entirely on the training samples maintained by the database 530, and it is also possible to obtain training samples from the cloud or other places for model training; the above description should not be taken as a limitation on the embodiments of the present application.
  • the target model/rule 501 trained by the training device 520 can be applied to different systems or devices, such as the execution device 510 shown in FIG. 5, which can be a terminal, such as a mobile terminal, a tablet computer, a notebook computer, augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) equipment, a vehicle terminal, etc.
  • the training device 520 may transfer the target neural network to the execution device 510 .
  • the execution device 510 is configured with an input/output (input/output, I/O) interface 512 for data interaction with an external device, and a user can input data to the I/O interface 512 through a client device 540 (such as the target character string sequence in the embodiment of the present application).
  • the preprocessing module 513 and the preprocessing module 514 are configured to perform preprocessing according to the input data received by the I/O interface 512 . It should be understood that there may be no preprocessing module 513 and preprocessing module 514 or only one preprocessing module. When the preprocessing module 513 and the preprocessing module 514 do not exist, the calculation module 511 may be used directly to process the input data.
  • when the execution device 510 preprocesses the input data, or when the calculation module 511 of the execution device 510 performs calculation and other related processing, the execution device 510 can call the data, codes, etc. in the data storage system 550 for the corresponding processing, and the correspondingly processed data and instructions may also be stored in the data storage system 550.
  • the I/O interface 512 presents the processing result (for example, the target word and sentence in the embodiment of the present application) to the client device 540, thereby providing it to the user.
  • the user can manually specify input data, and the “manually specify input data” can be operated through the interface provided by the I/O interface 512 .
  • the client device 540 can automatically send the input data to the I/O interface 512 . If the client device 540 is required to automatically send the input data to obtain the user's authorization, the user can set the corresponding authority in the client device 540 .
  • the user can view the results output by the execution device 510 on the client device 540, and the specific presentation form may be specific ways such as display, sound, and action.
  • the client device 540 can also be used as a data collection terminal, collecting input data from the input I/O interface 512 and output results from the output I/O interface 512 as new sample data, and storing them in the database 530 .
  • FIG. 5 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationship between devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 550 is an external memory relative to the execution device 510, and in other cases, the data storage system 550 may also be placed in the execution device 510. It should be understood that the above execution device 510 may be deployed in the client device 540.
  • the calculation module 511 of the execution device 510 can obtain the code stored in the data storage system 550 to implement the method for generating words and sentences in the embodiment of the present application.
  • the calculation module 511 of the execution device 510 may include a hardware circuit (such as an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA), a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits; for example, the calculation module 511 can be a hardware system with the function of executing instructions, such as a CPU or DSP, or a hardware system that does not have the function of executing instructions, such as an ASIC or FPGA, or a combination of the above-mentioned hardware systems that do not have the function of executing instructions and hardware systems that have the function of executing instructions.
  • the calculation module 511 of the execution device 510 can be a hardware system with the function of executing instructions.
  • the method for generating words and sentences provided in the embodiment of the present application can be software code stored in a memory; the calculation module 511 of the execution device 510 can read the software code from the memory, and execute the obtained software code to implement the method for generating words and sentences provided by the embodiment of the present application.
  • alternatively, the calculation module 511 of the execution device 510 can be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions; in this case, part of the method can be implemented by the hardware system that does not have the function of executing instructions, which is not limited here.
  • the above-mentioned training device 520 can obtain the code stored in a memory (not shown in FIG. 5; it can be integrated into the training device 520 or deployed separately from the training device 520) to implement the methods related to model training in the embodiment of the present application.
  • the training device 520 may include a hardware circuit (such as an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA), a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or a microcontroller, etc.), or a combination of these hardware circuits; for example, the training device 520 can be a hardware system with the function of executing instructions, such as a CPU or DSP, or a hardware system that does not have the function of executing instructions, such as an ASIC or FPGA, or a combination of the above-mentioned hardware systems that do not have the function of executing instructions and hardware systems that have the function of executing instructions.
  • the training device 520 may be a hardware system capable of executing instructions; the method related to model training provided in the embodiment of the present application may be software code stored in a memory, and the training device 520 may obtain the software code from the memory and execute the obtained software code to implement the method related to model training provided by the embodiment of the present application.
  • the training device 520 can also be a combination of a hardware system that does not have the function of executing instructions and a hardware system that has the function of executing instructions; part of the method related to model training can be implemented by the hardware system capable of executing instructions, which is not limited here.
  • corresponding candidate words may be generated based on a user's input of a character string, and displayed on an interface of the input method tool for the user to select.
  • the input method software needs to go through three steps:
  • the input method software first corrects the user's input. Specifically, when the user is actually inputting, a "false touch phenomenon" easily occurs, that is, the intention is to press a certain key, but another key is actually pressed.
  • the keyboard error correction module converts the user's actual input sequence into a correct input key sequence that conforms to the user's input intention. As shown in FIG. 7, the user's actual input sequence is 'nuiyafangzou', where 'nuo' is mistyped as 'nui' (because 'o' and 'i' are very close on the keyboard), and the retroflex syllable 'zhou' is mistyped as 'zou'. After passing through the keyboard error correction module, the sequence is corrected to 'nuoyafangzhou'.
  • the keyboard error correction can use a rule-based method, that is, based on the preceding input, the currently input letter, and adjacent letters, it is judged whether the currently input letter needs to be corrected, and to which letter it should be corrected.
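A minimal sketch of such a rule-based correction, assuming a toy pinyin lexicon and a partial QWERTY adjacency map (both are illustrative, not the actual rules of this application):

```python
# Partial QWERTY adjacency map: keys physically close to each letter.
ADJACENT = {'i': 'uojk', 'o': 'ipkl', 'u': 'yihj'}

VALID_PINYIN = {'nuo', 'ya', 'fang', 'zhou', 'zou'}  # toy lexicon

def correct_syllable(syllable):
    """If a syllable is not valid pinyin, try replacing each letter with
    an adjacent key and return the first valid correction found."""
    if syllable in VALID_PINYIN:
        return syllable
    for i, ch in enumerate(syllable):
        for alt in ADJACENT.get(ch, ''):
            cand = syllable[:i] + alt + syllable[i + 1:]
            if cand in VALID_PINYIN:
                return cand
    return syllable  # no plausible correction found

# 'nui' (an 'o'/'i' false touch, as in FIG. 7) is corrected to 'nuo'.
fixed = correct_syllable('nui')
```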
  • the result after error correction can be segmented into pinyin, and the letter sequence input by the user can be converted into a pinyin sequence.
  • the pinyin segmentation module is to divide the user key input sequence into a pinyin sequence, which is convenient for the following phonetic-word conversion module to process.
  • Pinyin is the official phonetic system for Chinese characters stipulated by the state, and it is also the phonetic system for Chinese characters with the largest number of users. In the Pinyin input method, the user inputs Chinese characters through Pinyin.
  • the pinyin segmentation problem can be solved as a traditional word segmentation problem, using some traditional word segmentation algorithms, such as: maximum matching word segmentation algorithm, word segmentation algorithm based on hidden Markov model, and so on.
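For example, the forward maximum matching segmentation mentioned above can be sketched as follows (the pinyin lexicon is a toy subset, not the full syllable table):

```python
PINYIN_LEXICON = {'nuo', 'ya', 'fang', 'zhou'}  # toy subset of valid syllables

def segment_pinyin(seq, max_len=6):
    """Forward maximum matching: at each position, take the longest
    prefix that is a valid pinyin syllable."""
    result, i = [], 0
    while i < len(seq):
        for l in range(min(max_len, len(seq) - i), 0, -1):
            if seq[i:i + l] in PINYIN_LEXICON:
                result.append(seq[i:i + l])
                i += l
                break
        else:
            result.append(seq[i])  # unknown letter, emit as-is
            i += 1
    return result

# 'nuoyafangzhou' is split into complete pinyin syllables.
syllables = segment_pinyin('nuoyafangzhou')
```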
  • the pinyin sequence can be input into the phonetic-character conversion module to convert the phonetic sequence into words and sentences (ie, candidate words).
  • the phonetic-character conversion is to convert the pinyin sequence into a Chinese character sequence, and finally prompt the user.
  • the existing solution is to model the three problems of input error correction, pinyin segmentation, and phonetic-word conversion separately.
  • this serial modeling method easily causes cascading and amplification of errors, that is, the errors caused by the previous tasks are superimposed with the errors of the subsequent tasks themselves, resulting in greater errors.
  • an error in keyboard input error correction has a high probability of causing an error in the result of pinyin segmentation, which will further cause an error in the result of phonetic conversion.
  • an embodiment of the present application provides a method for generating words and sentences.
  • the embodiment of the present application provides an embodiment of a method for generating words and sentences, which can be applied to input method systems in multiple languages such as Chinese, Japanese, and Korean; the input method system can be deployed in terminal devices , can also be deployed in the cloud server; when the input method system is deployed in the cloud server, this embodiment is executed by the cloud server, and the cloud server sends the generated target words to the terminal device for display on the terminal device.
  • Fig. 8 shows an embodiment of a method for generating words and sentences provided by an embodiment of the present application.
  • the method for generating words and sentences provided by an embodiment of the present application can be applied to an execution device, and the execution device can be a terminal device such as a mobile phone, a tablet, a notebook computer, or a smart wearable device; as shown in FIG. 8, the method for generating words and sentences provided by the embodiment of the present application may include:
  • the number of characters in the target string sequence is less than a threshold, and the threshold is a value less than or equal to 128, for example, the threshold may be 64, 70, 80, 90, 100, 128, and so on.
  • a character string can be understood as a combination of characters, which is a carrier of language information and is used to generate words and sentences; the words and sentences can be one word or multiple words, and a single character can also constitute a word.
  • the above-mentioned input scene can be an input scene of multiple languages such as Chinese, Japanese, and Korean; corresponding to different types of languages, the form of the character string is different. Taking Chinese as an example, the character string can include one pinyin or multiple pinyins; in this case, the character string can also be called a pinyin string, for example, the string can be 'nuoyafangzhou'.
  • the user can input the target character string sequence through the input method tool, and then the terminal device can obtain the target character string sequence input by the user.
  • according to the target character string sequence, use the target neural network to generate target words and sentences corresponding to the target character string sequence, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain an embedding vector from the character string sequence, and the decoder is used to generate the target words and sentences according to the embedding vector; the target neural network is obtained through training with training samples, and the training samples include character string sequences and corresponding words and sentences.
  • the uncorrected and unsegmented character strings are directly input to the phonetic-to-character conversion model (such as the target neural network in the embodiment of the present application).
  • the phonetic-to-character conversion model such as the target neural network in the embodiment of the present application.
  • the target neural network includes an encoder and a decoder, wherein the encoder or decoder can be one of the following models: LSTM, GRU, SRU, bert, roberta, spanbert, xlnet , GPT, nezha, mass, bart, mbart, albert, structbert, ernie, knowbert, k-bert, tinybert.
  • the encoder can be understood as a deep learning network model, and there are various network structures for the encoder, which are not specifically limited in this embodiment of the present application; specifically, the network structure of the encoder can be the network structure of the encoder part of the Transformer network, or the network structure of a series of other networks derived from the encoder part of the Transformer network.
  • the input of the standard Bart contains three embedding embedding layers: position embedding, segment embedding and token embedding.
  • position embedding is used to distinguish the different positions of the current token in the sequence
  • segment embedding is used to distinguish whether the current token is in the first input sentence or in the second sentence, in preparation for the next-sentence prediction pre-training task between sentences
  • token embedding represents the semantics of the current token.
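The way these embedding layers combine can be sketched as follows; the 26-letter token vocabulary and maximum length of 64 follow the Pinyin Bart description in this application, while the embedding dimension and random initialization are illustrative assumptions:

```python
import random
random.seed(0)

DIM = 8  # illustrative embedding dimension

def make_table(n, dim):
    """Randomly initialized embedding lookup table (illustration only)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]

token_emb = make_table(26, DIM)     # 26 key letters for Pinyin Bart
position_emb = make_table(64, DIM)  # maximum sequence length 64
segment_emb = make_table(2, DIM)    # standard Bart only; Pinyin Bart omits it

def embed(tokens, use_segment=False, segment_id=0):
    """Input embedding = token + position (+ segment, for standard Bart)."""
    out = []
    for pos, t in enumerate(tokens):
        v = [token_emb[ord(t) - ord('a')][d] + position_emb[pos][d]
             for d in range(DIM)]
        if use_segment:
            v = [v[d] + segment_emb[segment_id][d] for d in range(DIM)]
        out.append(v)
    return out

vectors = embed('nuo')  # one vector per input key letter
```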
  • Figure 9 is a schematic diagram of the construction of an embedding vector for the Pinyin bart in this application.
  • the input token of the standard bart is composed of subwords, usually Chinese characters and common short words, and the number is about 30,000.
  • the pinyin Bart is oriented to the problem of key sound conversion.
  • the input token is the key letter on the keyboard, and there are only 26 keys.
  • Pinyin Bart does not have a segment token, because Pinyin Bart does not need to do pre-training tasks, but directly trains on the key-sound conversion task.
  • the maximum input length of the standard Bart is 512 tokens, which can accommodate an article of general length, so that it can handle chapter-level tasks; Pinyin Bart, by contrast, only handles the key-to-sound conversion task in the input method, and the sequence the user inputs into the input method software is generally short, so the applicable scenarios of Pinyin Bart are limited to short input sequences, with the maximum sequence length set at 64 or 32 letters. Combining the above three factors, the input layer parameters of Pinyin Bart are much smaller than those of the standard Bart model.
  • the encoder can obtain the embedding vector according to the target string sequence, where the encoder can process each character in the target string sequence to obtain the embedding vector of each character (also called a hidden vector); it should be understood that the sizes of the input and output of the encoder can be kept the same.
  • the decoder can generate the target words and sentences according to the embedding vector.
  • the decoder can obtain at least one word unit and the probability of each word unit according to the embedding vector, and combine the planning algorithm to obtain the target words and sentences .
  • the planning algorithm may be an Ngram model, a Viterbi algorithm, etc., which are not limited here.
  • the decoder can sequentially generate the word units of the target sentence according to the embedding vector, that is, the word units generated earlier are used (or described as being used as input) when the word units are subsequently generated.
  • the hidden vector can be given to the decoder (for example, the input 'A-E' is encoded and given to the decoder).
  • on the decoder side, the sequence is input token by token (for example, 'B' is input), and the expected results are generated one by one according to the input token and the hidden vector given by the encoder (for example, 'C' is generated).
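The token-by-token autoregressive generation just described can be sketched as a generic decoding loop; `next_token` below stands in for the real decoder network and is a hypothetical callable:

```python
def autoregressive_decode(hidden, next_token, bos='<s>', eos='</s>', max_len=10):
    """Generate tokens one by one; each step feeds the previously
    generated tokens back in, together with the encoder's hidden
    vectors, to produce the next token."""
    out = [bos]
    for _ in range(max_len):
        tok = next_token(hidden, out)  # depends on everything generated so far
        if tok == eos:
            break
        out.append(tok)
    return out[1:]  # drop the begin-of-sequence marker
```

Because each step consumes the previous step's output, the steps cannot run in parallel, which is what the non-autoregressive variant below the figure addresses.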
  • the decoder can adopt a non-autoregressive parallel decoding method.
  • the input is a sequence of letters and the output is a sequence of Chinese characters.
  • a Chinese character is usually composed of multiple letters, so the length of the output Chinese character sequence is usually much smaller than that of the input letter sequence; therefore, a 'generated sequence length prediction' module is added to the encoder to guide the length of the generated sequence.
  • the decoder side is changed from one-way Attention (such as the GPT model) to the two-way Attention of the Bert model to support parallel decoding.
  • the target word and sentence may include a first word unit and a second word unit, and the position of the first word unit in the target word and sentence is earlier than that of the second word unit.
  • the decoder is specifically configured to: generate the second word unit according to the target string sequence without relying on the first word unit being generated.
  • the decoder is specifically configured to: generate the first word unit and the second word unit in parallel according to the target character string sequence.
  • non-autoregressive decoding can greatly increase the inference speed of the model without greatly reducing the performance of the model.
  • the standard Bart model uses an autoregressive decoding method, while Pinyin Bart uses a non-autoregressive decoding method to improve the reasoning speed.
  • the Pinyin Bart constructed with the autoregressive decoding module is recorded as 'Pinyin Bart-AR', where 'AR' stands for 'autoregressive'.
  • the non-autoregressive decoding 'Pinyin Bart' is similar to the autoregressive decoding 'Pinyin Bart-AR', and the former only has a performance loss of 0.03%.
  • the former infers each token (Chinese character) at a speed of 1.60ms, while the latter is 15.66ms, and the former is 9.78 times faster than the latter.
  • when the noise is relatively large (5%), the accuracy of 'Pinyin Bart' decreases by 0.91%, but the loss still remains within 1%, and the reasoning speed can still be increased by more than 9 times (9.30 times).
• the number of word units of the target words and sentences can be predicted through a word count prediction model; according to the target character string sequence, the initial words and sentences corresponding to the target character string sequence can be generated through the target neural network; and the initial words and sentences are truncated according to the number of word units to obtain the target words and sentences.
• after the target neural network receives the target character string sequence, it can encode the input sequence through the encoder; the length of the target words and sentences (the number of word units) is predicted through the word count prediction model, and the decoder can generate the initial words and sentences corresponding to the target character string sequence in parallel based on the encoder's encoding result; finally, the initial words and sentences are adjusted according to the previously predicted number of word units (for example, by truncating the part exceeding that length).
  • the word count prediction model may be a classification model or a regression model.
  • the number of subunits of the target words and sentences may also be predicted by a word count prediction model, and the initial words and sentences may be adjusted based on the number of subunits.
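As an illustrative sketch only (not the patent's actual implementation), the flow above — encode, predict the output length, decode all positions in parallel, then truncate — might look as follows. The encoder, decoder, and word count model here are hypothetical stand-ins, and the letters-per-character ratio is an invented toy heuristic:

```python
# Sketch of non-autoregressive decoding guided by a predicted length.
# All three model functions are hypothetical stand-ins.

def encode(pinyin: str) -> list:
    # Stand-in encoder: one "embedding" per input letter.
    return [ord(c) for c in pinyin]

def predict_length(encoding: list) -> int:
    # Stand-in word count prediction model: assume roughly one
    # Chinese character per ~2.6 letters (toy heuristic).
    return max(1, round(len(encoding) / 2.6))

def decode_parallel(encoding: list, max_len: int = 64) -> list:
    # Stand-in decoder: emits max_len tokens in one shot. In the real
    # model all positions are generated in parallel via bidirectional
    # attention rather than one token at a time.
    return [f"tok{i}" for i in range(max_len)]

def generate(pinyin: str) -> list:
    encoding = encode(pinyin)
    n = predict_length(encoding)          # predicted number of word units
    initial = decode_parallel(encoding)   # initial words and sentences
    return initial[:n]                    # truncate the part beyond length n

out = generate("nuoyafangzhou")  # 13 letters -> 5 predicted word units
```

The key point is that the decoder's output length is fixed up front by the word count model, so truncation replaces left-to-right stopping.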
• the embodiment of the present application removes the separate error-correction module
  • the training samples with added noise can be used when training the target neural network.
• a so-called noise sample refers to a character string sequence obtained by modifying a correct character string sequence (for example, by adding, deleting, or replacing characters), paired with the words and sentences corresponding to the string sequence before the noise was added as the label, to form a noise training sample.
  • the correct character string is 'nuoyafangzhou'
  • the character string with noise added is 'nuiyafangzou'
  • the word and sentence corresponding to the character string sequence before adding noise is 'Noah's Ark'.
  • 'nuiyafangzou' and 'Noah's Ark' can constitute noise training samples.
  • Training the target neural network through the above noise samples can make the target neural network have error correction capabilities (that is, the target neural network can still generate correct words and sentences for strings containing noise).
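Using the example strings from the text, a noise training sample can be sketched as a simple input/label pair (a minimal illustration; the Chinese label 诺亚方舟, "Noah's Ark", is assumed here):

```python
# A noise training sample pairs the NOISY string with the words and
# sentences of the original, pre-noise string as the label, so the
# model learns to produce correct output from noisy input.
clean_pinyin = "nuoyafangzhou"   # correct character string
noisy_pinyin = "nuiyafangzou"    # character string after noise is added
label = "诺亚方舟"                # "Noah's Ark" (assumed Chinese label)

noise_sample = {"input": noisy_pinyin, "target": label}
```

Note that the label comes from the clean string, while the model only ever sees the noisy one.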
  • the target character string sequence is a character string sequence containing noise
  • the noise is caused by a user's incorrect input in the input method tool
• the target words and sentences are the correct words and sentences corresponding to the target character string sequence after the noise is removed.
• 'after denoising' here is not limited to explicit denoising behavior by the target neural network; it means that, judged by the effect of the finally generated target words and sentences, the target words and sentences correspond to the denoised target character string sequence.
• the target words and sentences can be displayed as candidate words in the interface of the input method tool; for example, they can be prompted as preferred words and sentences, where the preferred words and sentences are ranked first among the multiple words and sentences prompted by the input method.
  • the target words and sentences may be presented in the interface of the input method tool in the manner shown in FIG. 2 or FIG. 3 .
• for the 'existing engine', the commonly used Bigram language model is reproduced as the engine
  • 'Pinyin Bert' is the existing Pinyin input method engine using the Bert model architecture
  • 'Pinyin Bart' is the input method engine corresponding to the embodiment of this application.
  • 'input noise' refers to the noise generated by the user during the keyboard input process.
  • 'woainizhongguo' is incorrectly entered as 'woaonizongguo'.
  • Different proportions of noise are mixed in the test set, and the models also have different performances.
  • 'Segmentation noise' refers to the noise brought by the pinyin segmentation process.
  • the maximum matching segmentation method is used to segment the pinyin sequence.
  • the noise of the algorithm itself is the pinyin segmentation noise.
  • the "accuracy rate” in the table refers to the accuracy rate based on "characters”, that is, the number of correct Chinese characters given by the input method for every 100 Chinese characters entered by the user.
  • the Pinyin Bert engine achieves an accuracy rate of 95.59%, which is 11.03% higher than the 84.56% of the 'existing engine', indicating that the performance of the previously proposed Pinyin Bert engine is much better than the existing input method engine.
• when segmentation noise is added, the accuracy rate drops to 92.22%, a drop of 3.72%; when input noise is added, the accuracy rate further drops to 82.77%, a drop of 12.82%, which is slightly lower than the 'existing engine'; when the noise ratio is increased, the accuracy rate drops to 56%, a drop of 39.35%. This shows that although the descriptive ability of the 'Pinyin Bert' model is strong, its performance degrades greatly in a noisy environment.
• the present application provides a method for generating words and sentences, the method comprising: obtaining a target character string sequence, the target character string being input by a user in an input method tool; according to the target character string sequence, generating, through a target neural network, the target words and sentences corresponding to the target character string sequence, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain an embedding vector according to the target character string sequence, and the decoder is used to generate the target words and sentences according to the embedding vector; the target neural network is obtained through training samples, the training samples including character string sequences and corresponding words and sentences; and presenting the target words and sentences in the interface of the input method tool.
• character strings that have not undergone error correction or word segmentation are input directly into the phonetic-to-character conversion model (such as the target neural network in the embodiment of the application), which solves the prior-art problem that errors from the error-correction model and the word-segmentation model superimpose and degrade the accuracy of the generated words and sentences, thereby improving generation accuracy.
  • FIG. 12 provides a schematic flow chart of a sample construction method in the embodiment of the present application.
  • the sample construction method provided in the embodiment of the present application includes:
  • the first character string sequence may be a character string before adding noise, for example, the first character string sequence may be 'woainizhongguo', and its corresponding correct sentence is 'I love you China'
  • words and sentences can be converted into the first character string sequence through the phonetic conversion module.
  • the phonetic conversion module converts a sequence of Chinese characters, such as: 'I love you China', into a sequence of pinyin, such as: 'wo ai ni zhong guo', and then merges them into 'woainizhongguo'.
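The phonetic conversion step can be sketched with a toy character-to-pinyin table (a real system would use a full pinyin dictionary plus word segmentation to resolve polyphonic characters; the table below covers only this example):

```python
# Toy character-to-pinyin table; sufficient only for this example.
CHAR2PINYIN = {"我": "wo", "爱": "ai", "你": "ni", "中": "zhong", "国": "guo"}

def to_pinyin_string(chars: str) -> str:
    # Transcribe each Chinese character, then merge into one string,
    # e.g. "我爱你中国" -> "wo ai ni zhong guo" -> "woainizhongguo".
    return "".join(CHAR2PINYIN[c] for c in chars)

result = to_pinyin_string("我爱你中国")  # -> "woainizhongguo"
```

In practice, libraries such as pypinyin perform this hanzi-to-pinyin transcription over arbitrary text.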
  • the idea of the word-to-sound conversion algorithm is generally to segment the Chinese corpus first, and then transcribe the corpus according to the pinyin corresponding to the words.
  • the first character is a character in the first character string sequence.
  • the first character may be obtained by randomly sampling (or in other ways) the characters of the first character string sequence.
• the first character can serve as the object of noise addition in the first character string sequence (specifically, the first character may be replaced with another character, or another character may be added before or after the first character).
• the first character string sequence can be traversed, and at a certain ratio (for example, 1%) one of three operations is randomly performed at the current traversal position (the position of the first character): adding a letter, deleting a letter, or replacing a letter.
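The traversal with random add/delete/replace operations can be sketched as follows (a minimal illustration; the uniform choice of operation and replacement letter is a simplification — the patent's probability-model-based replacement described later is more realistic):

```python
import random

def add_noise(pinyin: str, ratio: float = 0.01, rng=None) -> str:
    # Traverse the string; at each position, with probability `ratio`,
    # randomly perform one of three operations: add a letter before the
    # current one, delete the current letter, or replace it.
    rng = rng or random.Random(0)  # seeded for reproducibility
    letters = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in pinyin:
        if rng.random() < ratio:
            op = rng.choice(("add", "delete", "replace"))
            if op == "add":
                out.append(rng.choice(letters))
                out.append(ch)
            elif op == "replace":
                out.append(rng.choice(letters))
            # "delete": emit nothing for this position
        else:
            out.append(ch)
    return "".join(out)

noisy = add_noise("woainizhongguo", ratio=0.1)
```

The noisy string, paired with the original sentence as label, then forms one noise training sample.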
• the probability of the virtual key corresponding to each second character in the at least one second character is related to at least one of the following: the size information of the virtual key, the layout information of the virtual key, the user's operating habits, or the structural characteristics of the user's hand.
  • the target probability model may be used to describe the probability that the user touches the virtual key corresponding to each second character in the at least one second character by mistake when inputting the first character on the virtual keyboard.
• the probabilities of different keys being touched by mistake may differ, and may be related to the size information of the virtual keys, the layout information of the virtual keys, the operating habits of the user, or the structural characteristics of the user's hands.
• the larger the size of a virtual key, the greater the probability of it being accidentally touched.
• the vicinity of key A includes key B, key C, and key D. If the size of key B is larger than that of key C and key D, the probability that the user accidentally touches key B when pressing key A is higher.
  • keyboards with different size information of virtual keys may correspond to different target probability models.
• the layout information of the virtual keys may include the arrangement of the keys on the keyboard, the distance between the keys, and the shape of the keys themselves. For example, when the user presses key A, keys B, C, and D are near key A; if the distance between key B and key A is smaller than the distance between key C or key D and key A, the probability that the user accidentally touches key B when pressing key A is higher.
  • keyboards with different virtual key layout information may correspond to different target probability models.
  • the user's operating habits can be understood as the user's action habits when pressing a button. Different users may have different action habits.
  • the vicinity of button A includes button B, button C, and button D.
• when user A presses key A, operating habits make it easier to press key B, so the probability that this user accidentally touches key B when pressing key A is higher; as another example, operating habits can be related to keyboard input proficiency.
  • users with different operating habits may correspond to different target probability models.
• the structural feature of the user's hand may be understood as the structural feature of the user's operating finger when pressing the keyboard, for example, the size of the contact area between the finger and the key surface.
  • the structural characteristics of the hand may be related to the age of the user. For users of the same age, different structural characteristics of the hand may be corresponding based on gender and individual differences.
  • users with different hand structure characteristics may correspond to different target probability models.
  • the size information of the virtual key may include size information of at least one second character.
  • the layout information of the virtual key may include at least one layout feature between the second character and the first character.
• the target probability model can be constructed by collecting the user's key-click behavior in advance.
• FIG. 13 is a distribution feature map of the press point cloud when the user clicks keys. It can be seen that, during actual input, the area and range in which the user clicks each key differ.
• the press point cloud (or click point cloud, click position point cloud, etc.) of a sample user actually inputting characters on the virtual keyboard can be obtained; the press point cloud can describe the user's operating habits,
• and the distribution of the press points can also be related to the size and layout of the keyboard itself and the characteristics of the user's hand.
• a target probability model can be constructed based on the above press point cloud; for example, a corresponding target probability model can be constructed for each character, and each target probability model can represent the probability of mistakenly touching other virtual keys when the user intends to input the corresponding character.
• based on the press point cloud, modeling may be performed to construct a target probability model (for example, Gaussian modeling may be performed to construct a Gaussian model).
• σ² can represent the variance; the smaller the variance, the more stable the set of data, and the larger the variance, the less stable the set of data.
• the variance is equal to the average of the squared deviations of each data point (for example, the distance from the press point to the center of the key) from its arithmetic mean.
• μ can represent the mean value.
• the mean value is equal to the average of the data points (for example, the distances from the press points to the center of the key).
  • the probability that the intention of any drop point on the keyboard is to input the current letter can be calculated.
• the probability that the user mistakenly touches other characters (the at least one second character) when inputting the first character can be obtained, and the at least one second character can then be sampled (or processed in other ways) based on the probability, so as to determine, from the at least one second character, a target character for replacing the first character.
• for example, the current user's input intention is the letter 's', but the user may actually touch the letters 'a', 'd', 'z', or other letters by mistake.
• the probability can be obtained through the following steps: first, obtain the coordinates of the center of the mistakenly touched letter on the keyboard, such as the coordinates of the center point of the letter 'a'; then, according to these coordinates and the Gaussian model of the letter 's', calculate the probability that the intention was to input 's' but 'a' was touched by mistake.
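The Gaussian calculation above can be sketched as follows. The key-center coordinates and the spread σ are invented illustrative values, and an isotropic 2-D Gaussian centered on the intended key is assumed (a real model would be fit per key from the press point cloud):

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    # 1-D Gaussian density with mean mu and standard deviation sigma.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical key-center coordinates (arbitrary units, loosely QWERTY).
KEY_CENTERS = {"s": (1.5, 1.0), "a": (0.5, 1.0), "d": (2.5, 1.0), "z": (1.0, 2.0)}

def mistouch_prob(intended: str, touched: str, sigma: float = 0.7) -> float:
    # Density that a press intended for `intended` lands at the center
    # of `touched`, using an isotropic Gaussian on the intended key.
    mx, my = KEY_CENTERS[intended]
    tx, ty = KEY_CENTERS[touched]
    return gaussian_pdf(tx, mx, sigma) * gaussian_pdf(ty, my, sigma)

# An adjacent key ('a') comes out more probable than a farther one ('z').
p_a = mistouch_prob("s", "a")
p_z = mistouch_prob("s", "z")
```

Normalizing these densities over all candidate keys yields the sampling distribution used to pick the replacement (target) character.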
  • the character string in this embodiment of the present application may include at least one character (for example, characters corresponding to virtual keys such as English letters and punctuation marks).
• the target character used to replace the first character is determined through the target probability model, which describes the user's actual behavior more accurately, that is, identifies the character more likely to be touched by mistake; the resulting noise-added training samples therefore better reflect actual user operation, and the target neural network trained on them is more accurate, which enhances the robustness of the model in real user input scenarios.
• the first character in the first character string sequence may be replaced with the target character, or the target character may be added before or after the first character, to obtain a second character string sequence; the second character string sequence and the words and sentences are used as training samples of the target neural network, and the target neural network is used to generate corresponding words and sentences according to a character string sequence.
  • the pinyin sequence of 'woainizhongguo' can be noised into 'woaonizongguo' (replace i with o), combined with the previous corresponding Chinese character sequence 'I love you China' to form the training corpus after noise addition.
• the pinyin sequence 'woainizhongguo' can be noised into 'woaoinizongguo' (adding an o before the i), combined with the corresponding Chinese character sequence 'I love you China' to form a noise-added training corpus.
• the pinyin sequence 'woainizhongguo' can be noised into 'woaionizongguo' (adding an o after the i), combined with the corresponding Chinese character sequence 'I love you China' to form a noise-added training corpus.
  • the target neural network may be trained by using the training samples, and the target neural network may be the network described in the foregoing embodiments.
  • the target neural network may be trained according to the second character string sequence and the correct words and sentences.
• the word count prediction module can also be used to predict the number of word units; therefore, in addition to the training loss of the target neural network, a training loss for the word count prediction module can also be constructed.
• the phonetic-to-character conversion task implemented by the target neural network is a standard sequence token classification task, so cross-entropy loss can be used as the loss function; the cross-entropy loss can take the following form:
  • the length prediction can also be transformed into a classification problem. For example, assuming that the maximum output length of the model is 64, then the output of the length prediction module is transformed into a classification problem of 1 to 64 categories. This can also be described by cross entropy loss.
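Treating length prediction as classification means the model outputs a distribution over the possible lengths and is trained with cross-entropy against the true length. A minimal sketch (the 4-class distribution below is an invented toy example, not model output):

```python
import math

def cross_entropy(probs: list, target_index: int) -> float:
    # Cross-entropy for a single example: -log p(true class).
    return -math.log(probs[target_index])

# Toy predicted distribution over lengths 1..4 (index i -> length i+1).
length_probs = [0.1, 0.2, 0.6, 0.1]
true_length = 3
loss = cross_entropy(length_probs, true_length - 1)  # -log(0.6)
```

With a maximum output length of 64, `length_probs` would simply have 64 entries instead of 4.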
• the length prediction problem can also be transformed into a regression problem, that is, the model predicts a real number representing the length; in this case, a mean squared error (MSE) loss can be used.
  • the overall loss of the model in the training process is composed of the above two losses, which can be calculated by means of weighted average, for example, it can be in the following form:
• Loss_total = w_1 * Loss_mse + w_2 * Loss_cross_entropy;
  • the weights w 1 and w 2 can be manually specified according to experience.
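The weighted combination of the two losses is straightforward; a minimal sketch (the weight values 0.5/0.5 are illustrative, not values specified by the patent):

```python
# Weighted sum of the regression (MSE) loss and the classification
# (cross-entropy) loss; w1 and w2 are manually chosen weights.
def total_loss(loss_mse: float, loss_cross_entropy: float,
               w1: float = 0.5, w2: float = 0.5) -> float:
    return w1 * loss_mse + w2 * loss_cross_entropy

combined = total_loss(0.8, 1.2)  # 0.5*0.8 + 0.5*1.2
```

More elaborate fusion schemes such as GradNorm learn these weights instead of fixing them by hand.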
• for loss fusion, other more complex publicly disclosed methods can also be used, such as the GradNorm method.
• the present application provides a sample construction method, the method comprising: obtaining a first character string sequence and corresponding words and sentences, the first character string sequence including a first character; determining, through a target probability model, a target character corresponding to the first character from at least one second character, wherein the target probability model indicates the probability that, when the user inputs the first character on the virtual keyboard, the user mistakenly touches the virtual key corresponding to each second character in the at least one second character, the probability being related to at least one of the following: the size information of the virtual key, the layout information of the virtual key, the user's operating habits, or the structural characteristics of the user's hand; and replacing the first character in the first character string sequence with the target character, or adding the target character before or after the first character, to obtain a second character string sequence, where the second character string sequence and the words and sentences are used as training samples of the target neural network, and the target neural network is used to generate corresponding words and sentences according to a character string sequence.
• the target character used to replace the first character is determined through the target probability model, which describes the user's actual behavior more accurately, that is, identifies the character more likely to be touched by mistake; the resulting noise-added training samples therefore better reflect actual user operation, and the target neural network trained on them is more accurate, which enhances the robustness of the model in real user input scenarios.
  • FIG. 15 is a schematic structural diagram of a word and sentence generation device provided in the embodiment of the present application.
  • the word and sentence generation device 1500 provided by the present application includes:
  • An acquisition module 1501 configured to acquire a sequence of target character strings, the target character strings being input by the user in the input method tool;
• a word and sentence generation module 1502, configured to generate, through the target neural network according to the target character string sequence, the target words and sentences corresponding to the target character string sequence, wherein the target neural network includes an encoder and a decoder, the encoder is used to obtain an embedding vector according to the target character string sequence, the decoder is used to generate the target words and sentences according to the embedding vector, the target neural network is obtained through training samples, and the training samples include character string sequences and corresponding words and sentences;
• for the description of the word and sentence generation module 1502, reference may be made to the description of step 802 in the above embodiment, which will not be repeated here.
  • a presentation module 1503 configured to present the target words and sentences in the interface of the input method tool.
  • the number of characters in the target string sequence is less than a threshold, and the threshold is a value less than or equal to 128.
• the target words and sentences include a first word unit and a second word unit, where the first word unit is positioned earlier in the target words and sentences than the second word unit, and
• the decoder is specifically configured to: generate the second word unit according to the target character string sequence without relying on the first word unit having been generated.
  • the decoder is specifically configured to: generate the first word unit and the second word unit in parallel according to the target character string sequence.
  • the device also includes:
• a word count prediction module, configured to predict, according to the target character string sequence, the number of word units of the target words and sentences through a word count prediction model;
  • the word and sentence generation module is specifically used for:
• generating, through the target neural network, the initial words and sentences corresponding to the target character string sequence;
• truncating the initial words and sentences according to the number of word units to obtain the target words and sentences.
  • the target character string sequence is a character string sequence containing noise, and the noise is caused by a user's wrong input in the input method tool;
  • the target words and sentences are correct words and sentences corresponding to the target character string sequence after denoising.
  • the encoder or decoder is one of the following models:
  • Fig. 16 is a schematic structural diagram of a sample construction device provided by the embodiment of the present application.
  • the sample construction device 1600 provided by the embodiment of the present application may include:
  • An acquisition module 1601 configured to acquire a first character string sequence and corresponding words and sentences, where the first character string sequence includes a first character;
• a character replacement module 1602, configured to determine, through a target probability model, a target character corresponding to the first character from at least one second character, wherein the target probability model represents the probability that, when the user inputs the first character on the virtual keyboard, the user accidentally touches the virtual key corresponding to each second character in the at least one second character, the probability being related to at least one of the following:
• the size information of the virtual keys, the layout information of the virtual keys, the user's operating habits, or the structural characteristics of the user's hands;
• the second character string sequence and the words and sentences are used as training samples of the target neural network, and the target neural network is used to generate corresponding words and sentences according to the character string sequence.
  • the target probability model is a Gaussian probability model.
  • the device also includes:
  • the training module is used to train the target neural network according to the second character string sequence and the correct words and sentences.
• FIG. 17 is a schematic structural diagram of the execution device provided by the embodiment of the present application. The execution device may be a tablet, a laptop, a smart wearable device, a server, or the like, which is not limited here.
  • the execution device 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more, and one processor is taken as an example in FIG. 17 ) , where the processor 1703 may include an application processor 17031 and a communication processor 17032 .
  • the receiver 1701 , the transmitter 1702 , the processor 1703 and the memory 1704 may be connected through a bus or in other ways.
  • the memory 1704 may include read-only memory and random-access memory, and provides instructions and data to the processor 1703 . Part of the memory 1704 may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
• the memory 1704 stores operating instructions executable by the processor, executable modules, or data structures, or a subset or extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1703 controls the operations of the execution device.
  • various components of the execution device are coupled together through a bus system, where the bus system may include not only a data bus, but also a power bus, a control bus, and a status signal bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to the processor 1703 or implemented by the processor 1703 .
  • the processor 1703 may be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 1703 or instructions in the form of software.
• the above-mentioned processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components.
  • the processor 1703 may implement or execute various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
• the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the above method in combination with its hardware.
• the receiver 1701 can be used to receive input numeric or character information, and to generate signal input related to the settings and function control of the execution device.
  • the transmitter 1702 can be used to output numeric or character information; the transmitter 1702 can also be used to send instructions to the disk group to modify the data in the disk group.
  • the processor 1703 is configured to execute the method for generating words and sentences and the method for constructing samples executed by the execution device in the above embodiments (for example, the step of model reasoning through the target neural network).
  • FIG. 18 is a schematic structural diagram of the training device provided in the embodiment of the present application.
• the training device 1800 is implemented by one or more servers, and may vary considerably depending on configuration or performance; it can include one or more central processing units (CPU) 1818 (for example, one or more processors), memory 1832, and one or more storage media 1830 (such as one or more mass storage devices) storing application programs 1842 or data 1844. The memory 1832 and the storage medium 1830 may be temporary storage or persistent storage.
  • the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device.
  • the central processing unit 1818 may be configured to communicate with the storage medium 1830 , and execute a series of instruction operations in the storage medium 1830 on the training device 1800 .
• the training device 1800 can also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input/output interfaces 1858, and one or more operating systems 1841, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • the central processing unit 1818 is configured to execute steps related to model training in the above embodiments.
  • the embodiment of the present application also provides a computer program product, which, when running on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or enables the computer to perform the steps performed by the aforementioned training device.
  • An embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a program for signal processing, and when it is run on a computer, the computer executes the steps performed by the aforementioned executing device , or, causing the computer to perform the steps performed by the aforementioned training device.
  • the execution device, training device or terminal device provided in the embodiment of the present application may specifically be a chip.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit may be, for example, a processor.
• the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chips in the execution device execute the data processing methods described in the above embodiments, or make the chips in the training device execute the data processing methods described in the above embodiments.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
• the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or other type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 19 is a schematic structural diagram of a chip provided by the embodiment of the present application.
  • the chip can be implemented as a neural network processor (NPU) 1900; the NPU 1900 is mounted on the host CPU as a coprocessor, and the host CPU assigns tasks to it.
  • the core part of the NPU is the operation circuit 1903, which is controlled by the controller 1904 to extract matrix data from memory and perform multiplication operations.
  • the operation circuit 1903 includes multiple processing units (Process Engine, PE).
  • in some implementations, the operation circuit 1903 is a two-dimensional systolic array.
  • the operation circuit 1903 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • in some implementations, the operation circuit 1903 is a general-purpose matrix processor.
  • the operation circuit fetches the data corresponding to a weight matrix B from the weight memory 1902 and caches it on each PE in the operation circuit.
  • the operation circuit takes the data of an input matrix A from the input memory 1901, performs a matrix operation with matrix B, and stores the partial or final results of the matrix in the accumulator 1908.
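The fetch-multiply-accumulate flow described above can be sketched in plain Python. This is an illustrative toy, not the patented circuit: each partial product of A x B is added into a running sum, mimicking how partial results would collect in an accumulator such as 1908.

```python
def matmul_accumulate(A, B):
    """Multiply matrices A (n x k) and B (k x m), accumulating partial products."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    # one running accumulator per output element, analogous to accumulator 1908
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for t in range(k):  # each step adds one partial product
                C[i][j] += A[i][t] * B[t][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_accumulate(A, B))  # [[19, 22], [43, 50]]
```

A real systolic array would stream these partial products through a grid of PEs in parallel; the triple loop here only reproduces the arithmetic, not the dataflow.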
  • the unified memory 1906 is used to store input data and output data.
  • weight data is transferred to the weight memory 1902 through the direct memory access controller (DMAC) 1905.
  • Input data is also transferred to unified memory 1906 by DMAC.
  • the bus interface unit (BIU) 1910 is used for interaction between the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1909.
  • the bus interface unit 1910 is used by the instruction fetch buffer 1909 to obtain instructions from external memory, and by the storage unit access controller 1905 to obtain the original data of input matrix A or weight matrix B from external memory.
  • the DMAC is mainly used to move the input data in the external memory DDR to the unified memory 1906 , move the weight data to the weight memory 1902 , or move the input data to the input memory 1901 .
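The DMAC's role can be pictured with a minimal sketch (all names are illustrative assumptions, not the patented interface): a DMA-style move copies a region of an "external" buffer into an on-chip region, analogous to moving input data from DDR into the unified memory 1906 or weights into the weight memory 1902.

```python
def dma_move(src, dst, length, src_off=0, dst_off=0):
    """Copy `length` elements from src to dst, mimicking a DMA block transfer."""
    dst[dst_off:dst_off + length] = src[src_off:src_off + length]

external_ddr = list(range(16))  # stands in for external DDR memory
unified_mem = [0] * 8           # stands in for unified memory 1906
dma_move(external_ddr, unified_mem, 8)
print(unified_mem)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

A hardware DMAC performs such transfers without occupying the compute units, which is the point of routing data movement through it.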
  • the vector calculation unit 1907 includes multiple calculation processing units and, if necessary, further processes the output of the operation circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, size comparisons, and so on. It is mainly used for non-convolutional/fully-connected layer computations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 1907 can store the vector of the processed output to unified memory 1906 .
  • the vector calculation unit 1907 can apply a linear or nonlinear function to the output of the operation circuit 1903, for example performing linear interpolation on a feature plane extracted by a convolutional layer, or applying a function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 1907 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the operation circuit 1903, e.g., for use in subsequent layers of the neural network.
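The kind of post-processing the vector calculation unit performs can be sketched as follows. This is a hedged illustration with assumed details (a single-vector normalization standing in for batch normalization, and a sigmoid as the nonlinear activation), not the unit's actual implementation:

```python
import math

def vector_postprocess(v, eps=1e-5):
    """Normalize a vector of accumulated values, then apply a nonlinear activation."""
    # normalization step, akin to batch normalization over one vector
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    normed = [(x - mean) / math.sqrt(var + eps) for x in v]
    # sigmoid activation: the result can feed the next layer as activation input
    return [1.0 / (1.0 + math.exp(-x)) for x in normed]

activations = vector_postprocess([1.0, 2.0, 3.0, 4.0])
print(activations)  # four values in (0, 1), preserving the input ordering
```

Both steps are elementwise or reduction operations over vectors, which is exactly the workload such a unit offloads from the matrix operation circuit.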
  • an instruction fetch buffer 1909 connected to the controller 1904 is used to store instructions used by the controller 1904.
  • the unified memory 1906, the input memory 1901, the weight memory 1902, and the instruction fetch buffer 1909 are all on-chip memories; the external memory is private to the NPU hardware architecture.
  • the processor mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs above.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the connection relationship between the modules indicates that they have communication connections, which can be specifically implemented as one or more communication buses or signal lines.
  • the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk, and includes several instructions that cause a computer device (which may be a personal computer, training device, network device, etc.) to execute the methods described in the various embodiments of the present application.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • when implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, integrated with one or more available media.
  • the available medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid state disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the field of artificial intelligence, and discloses a method for generating words or sentences. The method comprises the steps of: acquiring a target character string sequence, the target character string being input by a user into an input method tool; generating, from the target character string sequence and by means of a target neural network, a target word or sentence corresponding to the target character string sequence, the target neural network comprising an encoder and a decoder, the encoder being used to obtain an embedding vector from the target character string sequence, the decoder being used to generate the target word or sentence from the embedding vector, the target neural network being obtained by training on training samples, and the training samples comprising a character string sequence and corresponding words or sentences; and presenting the target word or sentence in an interface of the input method tool. In the present invention, a character string that has not undergone error correction or word segmentation is fed into a target neural network, thereby reducing the influence of superimposed errors from an error correction model and a word segmentation model on the accuracy of a word or sentence, and thus improving the accuracy of word or sentence generation.
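The encoder-decoder pipeline the abstract describes can be illustrated with a deliberately toy sketch. Everything here is an assumption for illustration (the bag-of-characters "embedding", the scoring "decoder", and the candidate vocabulary), not the patented network: the point is only that the raw character string, without prior error correction or word segmentation, goes straight to the encoder, and the decoder maps the resulting vector to a candidate word.

```python
def encode(char_seq):
    """Toy encoder: map a raw character string to a fixed-size 'embedding' vector."""
    vec = [0.0] * 8
    for i, ch in enumerate(char_seq):
        vec[i % 8] += ord(ch)
    return vec

def decode(embedding, vocabulary):
    """Toy decoder: pick the vocabulary word whose character mass best matches."""
    def score(word):
        return -abs(sum(embedding) - sum(ord(c) for c in word))
    return max(vocabulary, key=score)

# a mistyped string is fed in as-is, with no error-correction preprocessing
candidates = ["hello", "help", "world"]
print(decode(encode("helo"), candidates))  # help
```

A real implementation would use a trained neural encoder and an autoregressive decoder; the sketch only mirrors the data flow (character string, then embedding vector, then generated word) that the abstract attributes to the target neural network.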
PCT/CN2022/139629 2021-12-21 2022-12-16 Method for generating words or sentences and related device WO2023116572A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111576377.8 2021-12-21
CN202111576377.8A CN116306612A (zh) 2021-12-21 2021-12-21 Word and sentence generation method and related device

Publications (1)

Publication Number Publication Date
WO2023116572A1 (fr) 2023-06-29

Family

ID=86831040

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/139629 WO2023116572A1 (fr) 2021-12-21 2022-12-16 Method for generating words or sentences and related device

Country Status (2)

Country Link
CN (1) CN116306612A (fr)
WO (1) WO2023116572A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757254B (zh) * 2023-08-16 2023-11-14 阿里巴巴(中国)有限公司 Task processing method, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140136970A1 (en) * 2011-07-14 2014-05-15 Tencent Technology (Shenzhen) Company Limited Text inputting method, apparatus and system
CN112015279A (zh) * 2019-05-28 2020-12-01 北京搜狗科技发展有限公司 Method and device for correcting accidental key touches
CN112988962A (zh) * 2021-02-19 2021-06-18 平安科技(深圳)有限公司 Text error correction method and apparatus, electronic device, and storage medium
CN113468895A (zh) * 2021-05-28 2021-10-01 沈阳雅译网络技术有限公司 Non-autoregressive neural machine translation method based on decoder input enhancement
CN113553864A (zh) * 2021-06-30 2021-10-26 北京百度网讯科技有限公司 Translation model training method and apparatus, electronic device, and storage medium
CN113655893A (zh) * 2021-07-08 2021-11-16 华为技术有限公司 Word and sentence generation method, model training method, and related device


Also Published As

Publication number Publication date
CN116306612A (zh) 2023-06-23

Similar Documents

Publication Publication Date Title
WO2022007823A1 (fr) Procédé et dispositif de traitement de données de texte
WO2022022163A1 (fr) Procédé d'apprentissage de modèle de classification de texte, dispositif, appareil, et support de stockage
CN111291181B (zh) 经由主题稀疏自编码器和实体嵌入的用于输入分类的表示学习
CN107836000B (zh) 用于语言建模和预测的改进的人工神经网络方法、电子设备
CN111506714A (zh) 基于知识图嵌入的问题回答
CN109376222B (zh) 问答匹配度计算方法、问答自动匹配方法及装置
CN110704576B (zh) 一种基于文本的实体关系抽取方法及装置
WO2023160472A1 (fr) Procédé de formation de modèle et dispositif associé
CN109214006B (zh) 图像增强的层次化语义表示的自然语言推理方法
WO2021051513A1 (fr) Procédé de traduction de chinois en anglais basé sur un réseau neuronal et dispositifs associés
WO2021238333A1 (fr) Réseau de traitement de texte, procédé d'entraînement de réseau de neurones et dispositif associé
WO2022001724A1 (fr) Procédé et dispositif de traitement de données
CN111401084A (zh) 一种机器翻译的方法、设备以及计算机可读存储介质
EP4361843A1 (fr) Procédé de recherche de réseau neuronal et dispositif associé
WO2021129411A1 (fr) Procédé et dispositif de traitement de texte
CN113779225B (zh) 实体链接模型的训练方法、实体链接方法及装置
CN113707299A (zh) 基于问诊会话的辅助诊断方法、装置及计算机设备
CN111145914B (zh) 一种确定肺癌临床病种库文本实体的方法及装置
CN116432019A (zh) 一种数据处理方法及相关设备
CN114492661B (zh) 文本数据分类方法和装置、计算机设备、存储介质
WO2023116572A1 (fr) Procédé de génération de mots ou de phrases et dispositif associé
CN108875024B (zh) 文本分类方法、系统、可读存储介质及电子设备
CN112560440A (zh) 一种基于深度学习的面向方面级情感分析的句法依赖方法
CN116680401A (zh) 文档处理方法、文档处理装置、设备及存储介质
CN113362809B (zh) 语音识别方法、装置和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22909885

Country of ref document: EP

Kind code of ref document: A1