CN112633479A - Target data prediction method and device - Google Patents

Target data prediction method and device

Info

Publication number
CN112633479A
CN112633479A (application CN202011631559.6A)
Authority
CN
China
Prior art keywords
network model
target data
sample
batches
input
Prior art date
Legal status
Pending
Application number
CN202011631559.6A
Other languages
Chinese (zh)
Inventor
吴帅
李健
陈明
武卫东
Current Assignee
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN202011631559.6A
Publication of CN112633479A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a target data prediction method and device, relating to the technical field of neural networks. The method comprises: acquiring a plurality of first target data satisfying a long-term dependency condition; and inputting the plurality of first target data into a neural network model and outputting second target data, where the second target data has a sequence correlation with the plurality of first target data. The neural network model is obtained by training through the following steps: dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing the output items of the initial network model with a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function. With the target data prediction method and device, the trained neural network model quickly predicts second target data with sequence correlation for first target data having ultra-long dependency relationships.

Description

Target data prediction method and device
Technical Field
The invention relates to the technical field of neural networks, in particular to a target data prediction method and a target data prediction device.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. NLP is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e. the language people use every day, and is closely related to linguistics, though with important differences: natural language processing is not the general study of natural language, but the development of computer systems, particularly software systems, that can effectively process natural language communication. It is thus a part of computer science.
Natural language processing is applied in many technical fields. For example, in speech recognition, the preceding text is processed to predict the following text; in quantitative trading, historical prices are processed to predict future prices; in intelligent recommendation, a user's past browsing records are processed to predict content the user is interested in.
However, common natural language processing approaches at present cannot simultaneously achieve fast prediction and the capture of ultra-long dependencies.
Disclosure of Invention
In view of the above, the present invention provides a target data prediction method and apparatus that overcome, or at least partially solve, the above problems.
According to a first aspect of the present invention, there is provided a method of predicting target data, the method comprising: acquiring a plurality of first target data satisfying a preset long-term dependency condition; and inputting the plurality of first target data into a trained neural network model and outputting second target data; wherein the second target data has a sequence correlation with the plurality of first target data. The neural network model is obtained by training through the following steps: dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing the output items of the initial network model with a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function to obtain the neural network model.
Optionally, dividing the sample data set into a plurality of batches of input items comprises: converting each sample datum in the sample data set into a sample vector; and dividing all the sample vectors into a plurality of batches of input items according to a preset length; wherein the number of sample data divided by the length equals the number of batches.
Optionally, after dividing the sample data set into a plurality of batches of input items, the method further comprises: shifting each input item backwards by one sample-vector position and dividing according to the length, to obtain the labeling data corresponding to each input item.
Optionally, inputting the input items to the initial network model in batch order comprises: splicing the current input item with the output item of the previous network layer in the initial network model; and inputting the spliced sample vector to the current network layer in the initial network model.
Optionally, the network layers of the neural network model include a plurality of dilated convolutional layers, and the storage space of each dilated convolutional layer is a matrix space of the preset length multiplied by the dimension of the sample vector.
According to a second aspect of the present invention, there is provided an apparatus for predicting target data, the apparatus comprising: an acquisition module for acquiring a plurality of first target data satisfying a preset long-term dependency condition; and an input module for inputting the plurality of first target data into a trained neural network model and outputting second target data; wherein the second target data has a sequence correlation with the plurality of first target data. The apparatus further comprises a training module for dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing the output items of the initial network model with a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function to obtain the neural network model.
Optionally, the training module comprises: a sample conversion module for converting each sample data in the sample data set into a sample vector; the sample dividing module is used for dividing all the sample vectors into a plurality of batches of the input items according to a preset length; wherein the number of sample data divided by the length is equal to the number of batches.
Optionally, the training module further includes a sample labeling module for, after all the sample vectors are divided into a plurality of batches of input items by the sample dividing module, shifting each input item backwards by one sample-vector position and dividing according to the length, to obtain the labeling data corresponding to each input item.
Optionally, the training module comprises a sample splicing module for splicing the current input item with the output item of the previous network layer in the initial network model, and inputting the spliced sample vector to the current network layer in the initial network model.
Optionally, the network layers of the neural network model include a plurality of dilated convolutional layers, and the storage space of each dilated convolutional layer is a matrix space of the preset length multiplied by the dimension of the sample vector.
The embodiment of the invention has the following beneficial effects:
According to the target data prediction method and device provided by the embodiments of the present invention, a plurality of first target data satisfying a preset long-term dependency condition are acquired, the plurality of first target data are input into a trained neural network model, and second target data are output, where the second target data has a sequence correlation with the plurality of first target data. The plurality of first target data satisfying the long-term dependency condition can be understood as a plurality of first target data having ultra-long dependency relationships. That is, with the neural network model, second target data having sequence correlation can be predicted for a plurality of first target data having ultra-long dependency relationships.
The training of the neural network model may include: dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing the output items of the initial network model with a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function to obtain the neural network model. That is, during training the output items are computed from the input items with a dilated convolution algorithm. Dilated convolution is fast to compute, so the resulting neural network model also predicts quickly: the neural network model can rapidly predict the second target data from the first target data.
In summary, in the embodiments of the present invention, the trained neural network model can quickly predict second target data with sequence correlation for a plurality of first target data with ultra-long dependency relationships.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments of the invention are described below so that the technical means of the present invention may be more clearly understood and the above and other objects, features and advantages of the present invention may become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a method for predicting target data according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a training method of a neural network model according to a second embodiment of the present invention;
FIG. 3 illustrates a network architecture diagram of an initial network model of an embodiment of the present invention;
FIG. 4 is a schematic diagram of a network structure of a neural network model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a target data prediction apparatus according to a third embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for predicting target data according to a first embodiment of the present invention.
The method for predicting target data provided by the embodiment of the invention may specifically include the following steps.
In step S101, a plurality of first target data satisfying a preset long-term dependency condition is acquired.
In an embodiment of the present invention, the first target data may be specific data in an actual application scenario. For example, in application scenarios such as banking, security, courier, and airline flights, the first objective data may be a number. Moreover, the plurality of first target data satisfying the preset long-term dependency condition may be a plurality of numbers having an ultra-long dependency relationship, such as a number in a bank card number, a number in a stock exchange number, a number in an express delivery number, a number in a flight number, and the like. In general, a plurality of first target data satisfying the long-term dependency condition can be represented by an x (t) data sequence. Wherein t is a natural number, and x (t) represents the t-th first target data.
In step S102, a plurality of first target data are input into the trained neural network model, and second target data are output.
In the embodiment of the invention, the trained neural network model is obtained through the following steps: dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing the output items of the initial network model with a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function. The neural network model predicts second target data for a plurality of first target data, the second target data having a sequence correlation with the plurality of first target data. For example, if the plurality of first target data are the first words of a sentence, the second target data may be the last word or last few words of the sentence; likewise, if the first target data are the first digits of a bank card number, the second target data may be the last digit or last few digits of the card number.
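As a minimal illustration of this prediction step (a sketch only: the embedding table, the stand-in linear model and the digit example are assumptions for exposition, not the patent's implementation), the forward pass for predicting the next digit of a number sequence could look as follows:

```python
import torch
import torch.nn as nn

d = 8                                   # assumed sample-vector dimension
embed = nn.Embedding(10, d)             # maps the digits 0-9 to d-dimensional vectors
model = nn.Linear(d, 10)                # stand-in for the trained neural network model

# a prefix of first target data with a long dependency, e.g. card-number digits
first_target_data = torch.tensor([6, 2, 2, 2, 0, 2, 1, 9])
with torch.no_grad():
    logits = model(embed(first_target_data))          # shape (t, 10)
    second_target_data = logits[-1].argmax().item()   # predicted next digit
```

In a real deployment the stand-in `model` would be the dilated-convolution network trained by the steps described below.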
According to the target data prediction method provided by the embodiment of the invention, a plurality of first target data satisfying a preset long-term dependency condition are acquired, input into the trained neural network model, and second target data are output, where the second target data has a sequence correlation with the plurality of first target data. The plurality of first target data satisfying the long-term dependency condition can be understood as a plurality of first target data having ultra-long dependency relationships. That is, the neural network model can predict second target data having sequence correlation for a plurality of first target data having ultra-long dependency relationships.
During training, the output items are computed from the input items with the dilated convolution algorithm. Dilated convolution is fast to compute, so the neural network model also predicts quickly; that is, it can rapidly predict the second target data from the first target data.
In summary, in the embodiments of the present invention, the trained neural network model can quickly predict second target data with sequence correlation for a plurality of first target data with ultra-long dependency relationships.
Example two
Referring to fig. 2, fig. 2 is a flowchart illustrating a training method of a neural network model according to a second embodiment of the present invention.
The training method of the neural network model provided by the embodiment of the invention specifically comprises the following steps.
In step S201, a sample data set is acquired and divided.
In an embodiment of the present invention, the sample data set may include multiple segments of text, multiple bank card numbers, multiple order numbers, and the like. The embodiment is introduced with a natural language processing scenario as an example; the sample data sets of other application scenarios differ, but the division processes are analogous. The sample data set x in the embodiment of the present invention may be the following segment of text (a passage of Chinese prose, shown in rough translation):
"poetry: chaos is not all the day and the field, and the vast atto is absent. Since ancient times of hongmeng, the syndrome of clear and turbid was developed. Ancient times carried the crowd who went upward to the benevolence, invented all things good. To know the original function of the formation, you should look at the western journey to release it. The number of people who vegetate the heaven and earth is one yuan, which is twelve thousand, nine thousand, six hundred years old. Divide the unary into twelve parts, Naizi, Chongzi, Yin, Mao, Chen, Yi, Wu, Yu, Shen, you, Xu and Hai. Every meeting will be ten thousand and eight hundred years old. And as for one day: when the chicken is born, yang qi is obtained, and when the chicken is born, the chicken is singing; yin Bi Tong Guang'
Each character or punctuation mark of the passage can be understood as one sample datum in the sample data set x. After the sample data set x is acquired, each sample datum in x is converted into a sample vector x(i), where x(i) denotes the vector of the i-th sample datum and i is its sequence number in x. Each sample vector has dimension d, which can be understood as the dimensionality with which a sample datum is represented in the computer.
Then, all the sample vectors x(i) in the sample data set x are divided into a plurality of batches of input items according to a preset length L. Specifically, the number N of sample data (equivalently, of sample vectors) is divided by the preset length L to obtain the number of batches N/L. The preset length L is the number of sample vectors contained in one batch's input item.
Therefore, with a preset length of L = 32, the sample data set x is divided into five batches of input items, each covering 32 consecutive characters (including punctuation) of the passage:
First batch: characters 1 to 32 of the passage;
Second batch: characters 33 to 64;
Third batch: characters 65 to 96;
Fourth batch: characters 97 to 128;
Fifth batch: characters 129 to 160.
Taking the first batch of input items as an example (the input items of the other batches are analogous): in the first batch, L is the number of characters and punctuation marks, i.e. L = 32. Using sample vectors, the input items of the first batch can be written as x(1), x(2), x(3), ..., x(32). To make the example easier to follow, the index i can also be replaced by the sample datum itself, so that x(1) is the vector of the first character of the passage and x(32) is the vector of the 32nd character.
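A minimal sketch of this division step (the helper name `make_batches` and the toy sizes are hypothetical; in the embodiment the sample vectors would come from embedding the passage above):

```python
import torch

def make_batches(sample_vectors: torch.Tensor, L: int) -> list:
    """Split N sample vectors (an N x d tensor) into N/L consecutive
    batches of L vectors each, preserving their order in the corpus."""
    N, d = sample_vectors.shape
    assert N % L == 0, "assumes N is a multiple of the preset length L"
    return [sample_vectors[k * L:(k + 1) * L] for k in range(N // L)]

# e.g. 160 characters embedded as d = 8 dimensional vectors, L = 32 -> 5 batches
batches = make_batches(torch.rand(160, 8), L=32)
assert len(batches) == 160 // 32
```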
In step S202, the labeling data of the input items are acquired.
In the embodiment of the present invention, after the input items of each batch are obtained, each input item is shifted backwards by one sample-vector position, and the labeling data corresponding to each batch's input item is obtained by dividing according to the preset length.
Continuing with the first batch as an example: in the sample data set x, the input items x(1), x(2), ..., x(32) of the first batch are offset backwards by one sample vector and divided according to the preset length L, yielding the labeling data corresponding to the first batch. That is, the labeling data of the first batch are the 32 sample vectors x(2), x(3), ..., x(33): each input position is labeled with the sample vector one position later in the corpus. The labeling data corresponding to the input items of the other batches are obtained in the same way and are not repeated here.
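A matching sketch for this shift-by-one labeling (hypothetical helper; note that the corpus must supply one sample vector beyond the last batch, otherwise the final batch has no complete labeling data):

```python
import torch

def make_labels(sample_vectors: torch.Tensor, L: int) -> list:
    """Labeling data for batch k are the vectors at positions
    k*L+1 .. k*L+L: each input position is labeled with the sample
    vector one position later in the corpus."""
    N = sample_vectors.shape[0]
    num_batches = (N - 1) // L            # the one-position shift consumes a vector
    return [sample_vectors[k * L + 1:(k + 1) * L + 1] for k in range(num_batches)]

corpus = torch.rand(161, 8)               # 160 characters plus one for the shift
labels = make_labels(corpus, L=32)
assert len(labels) == 5 and labels[0].shape == (32, 8)
```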
The labeling data of each batch serve as the calibration data for the output items. That is, the input items of each batch are input to the neural network model, the resulting output items are compared with the corresponding labeling data, and the network parameters of each network layer of the neural network model are adjusted according to the comparison result, until the model with adjusted parameters satisfies a convergence condition, i.e. the output items obtained for each batch of input items are the same as, or sufficiently close to, the corresponding labeling data.
In step S203, the input items of each batch are input into the initial network model, and the output items are computed according to the dilated convolution algorithm.
In embodiments of the present invention, the initial network model may comprise a plurality of network layers, divided into an input layer, an output layer and intermediate layers. Each intermediate layer may be allocated a corresponding storage space in advance. The storage space of an intermediate layer is consistent with the storage space of an input item, i.e. an L × d matrix. If the initial network model has M intermediate layers (in practical applications, M may be 6), a total storage space of M × L × d needs to be allocated.
Furthermore, embodiments of the present invention may also provide a corresponding memory layer for each intermediate layer. The storage space of a memory layer may be greater than or equal to that of the intermediate layer. The input items of the intermediate layer may be copied to the memory layer, i.e. the memory layer acts as a backup for the intermediate layer. Initially, each memory layer is empty.
In an exemplary embodiment of the invention, the initial network model may contain two intermediate layers in addition to the input layer and the output layer. As shown in fig. 3, which shows the network structure of the initial network model, the input items of the first batch are first fed to the input layer and from there to the first intermediate layer, and the output of the first intermediate layer is copied to the memory layer corresponding to the first intermediate layer. The input items of the first batch are then spliced with the output items of the first intermediate layer stored in the memory layer, and the spliced sample vectors are input to the second intermediate layer as its input items. In general, the current input item is spliced with the output item of the previous network layer in the initial network model, and the spliced sample vector is input to the current network layer as the input item of the current network layer.
In an exemplary embodiment of the invention, each intermediate layer in the neural network model may be a dilated convolutional layer, and the storage space of each dilated convolutional layer is an L × d matrix.
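The two mechanisms just described can be sketched together as one intermediate layer: a causal dilated convolution whose L × d input from the previous batch is backed up in a memory layer and spliced in front of the current batch. The class name, the kernel size of 2 and the use of zeros for the initially empty memory are assumptions; the patent does not prescribe these details:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryDilatedLayer(nn.Module):
    """One intermediate layer: a causal dilated convolution plus an
    L x d memory buffer backing up this layer's input from the previous batch."""
    def __init__(self, d: int, L: int, dilation: int):
        super().__init__()
        self.L = L
        self.dilation = dilation
        self.conv = nn.Conv1d(d, d, kernel_size=2, dilation=dilation)
        self.memory = None                        # the memory layer starts empty

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, d) input items of the current batch
        history = self.memory if self.memory is not None else torch.zeros_like(x)
        self.memory = x.detach()                  # back up the current input items
        spliced = torch.cat([history, x], dim=0)  # (2L, d): historical then current
        h = spliced.t().unsqueeze(0)              # (1, d, 2L) layout for Conv1d
        h = F.pad(h, (self.dilation, 0))          # left-pad so the layer stays causal
        y = self.conv(h)                          # output t sees positions t and t - dilation
        return y[0, :, -self.L:].t()              # keep the L current-batch positions
```

The memory is detached before being stored, so gradients never flow back into the previous batch, which matches training the batches strictly in their original order.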
In step S204, the initial network model is trained according to the loss function.
In the embodiment of the invention, a loss function appropriate to the application scenario may be selected to train the initial network model, finally yielding the trained neural network model.
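Putting the pieces together, a compressed training sketch (reusing the hypothetical `MemoryDilatedLayer` above; the vocabulary size, the two dilation rates and cross-entropy as the loss function are assumptions chosen for a character-prediction scenario):

```python
import torch
import torch.nn as nn

d, L, vocab = 8, 32, 4000
embed = nn.Embedding(vocab, d)
layers = nn.ModuleList([MemoryDilatedLayer(d, L, dilation=2 ** j) for j in range(2)])
head = nn.Linear(d, vocab)
loss_fn = nn.CrossEntropyLoss()
params = list(embed.parameters()) + list(layers.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params)

text = torch.randint(0, vocab, (5 * L + 1,))        # toy corpus of character ids
pairs = [(text[k * L:(k + 1) * L], text[k * L + 1:(k + 1) * L + 1]) for k in range(5)]

for inputs, targets in pairs:                       # batches in their original order
    h = embed(inputs)                               # (L, d) input items
    for layer in layers:                            # dilated convolutions with memory
        h = layer(h)
    loss = loss_fn(head(h), targets)                # compare outputs with shifted labels
    opt.zero_grad()
    loss.backward()
    opt.step()
```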
Based on the above description of the training method of the neural network model, the network structure of the neural network model is described below. As shown in fig. 4, the intermediate layers of the neural network model may contain two dilated convolutional layers, and a memory layer is provided for each dilated convolutional layer. The memory layer backs up the input item of the previous batch that was fed into the dilated convolutional layer, from which the output item is computed. For example, let the input item of the current batch (current input) be x(i) and the input item of the previous batch (historical input) be x'(i). The two are spliced into the concatenation [x'(i); x(i)], which serves as the input of the current dilated convolutional layer. When the output item of each dilated convolutional layer is computed, the value y_j(i) at the i-th position of the j-th layer is computed from the values of the (j-1)-th layer over this spliced input, and the computed y_j(i) is copied to the corresponding memory layer; the values stored in the memory layer are denoted y'_j(i). After passing through the two dilated convolutional layers of the neural network model, the output item (current output) corresponding to the current batch's input item is obtained, and the memory layer corresponding to the last dilated convolutional layer stores the output item (historical output) of the previous batch.
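As a short usage note for the hypothetical layer sketched earlier, the current/historical distinction of fig. 4 plays out as follows: on the first batch the memory layer is empty (zero-filled here), and from the second batch onward the spliced input reaches back into the backed-up previous batch:

```python
import torch

layer = MemoryDilatedLayer(d=8, L=32, dilation=1)
first_batch = torch.rand(32, 8)
second_batch = torch.rand(32, 8)

out1 = layer(first_batch)    # history empty: only the current input contributes
out2 = layer(second_batch)   # spliced with the backed-up first batch (historical input)
assert out1.shape == out2.shape == (32, 8)
```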
Embodiments of the present invention train a neural network model that can both compute output items with low latency and capture input items with ultra-long dependency relationships.
By splicing the current input with the historical input as the input item of the current network layer, the embodiment of the invention realizes the capture of input items with ultra-long dependency relationships.
According to the embodiment of the invention, the initial network model can be trained with different loss functions in different application scenarios, yielding neural network models for different application scenarios and widening the application range of the neural network model.
EXAMPLE III
Referring to fig. 5, fig. 5 is a schematic structural diagram illustrating a target data prediction apparatus according to a third embodiment of the present invention. The device may specifically include the following modules:
an obtaining module 51, configured to obtain a plurality of first target data that satisfy a preset long-term dependency condition;
an input module 52, configured to input a plurality of first target data into the trained neural network model, and output second target data;
wherein the second target data has a sequence correlation with a plurality of the first target data;
The apparatus further comprises a training module 53, configured to divide the sample data set into a plurality of batches of input items, input the input items to the initial network model in batch order, compute the output items of the initial network model according to the dilated convolution algorithm, and train the initial network model according to the input items, the output items and a loss function to obtain the neural network model.
In an exemplary embodiment of the present invention, the training module 53 includes:
a sample conversion module for converting each sample data in the sample data set into a sample vector;
the sample dividing module is used for dividing all the sample vectors into a plurality of batches of the input items according to a preset length;
wherein the number of sample data divided by the length is equal to the number of batches.
In an exemplary embodiment of the present invention, the training module 53 further includes:
A sample labeling module for, after all the sample vectors are divided into a plurality of batches of input items by the sample dividing module, shifting each input item backwards by one sample-vector position and dividing according to the length to obtain the labeling data corresponding to each input item.
In an exemplary embodiment of the present invention, the training module 53 includes:
the sample splicing module is used for splicing the current input item with the output item of the last network layer in the initial network model; and inputting the spliced sample vector to a current network layer in the initial network model.
In an exemplary embodiment of the invention, the network layers of the neural network model comprise a plurality of dilated convolutional layers, and the storage space of each dilated convolutional layer is a matrix space of the preset length multiplied by the dimension of the sample vector.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As those skilled in the art will readily appreciate, the above embodiments may be combined in any manner, and any such combination is also an embodiment of the present invention; for reasons of space, these combinations are not detailed here individually.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A method of predicting target data, the method comprising:
acquiring a plurality of first target data satisfying a preset long-term dependency condition;
inputting a plurality of first target data into a trained neural network model, and outputting second target data;
wherein the second target data has a sequence correlation with a plurality of the first target data;
the neural network model is obtained by training through the following steps:
dividing a sample data set into a plurality of batches of input items, inputting the input items to an initial network model in batch order, computing output items of the initial network model according to a dilated convolution algorithm, and training the initial network model according to the input items, the output items and a loss function to obtain the neural network model.
2. The method of claim 1, wherein the dividing the sample data set into multiple batches of entries comprises:
converting each sample data in the sample data set into a sample vector;
dividing all the sample vectors into a plurality of batches of the input items according to a preset length;
wherein the number of sample data divided by the length is equal to the number of batches.
3. The method of claim 1, wherein after said dividing the sample data set into batches of entries, the method further comprises:
shifting each input item backwards by one sample-vector position, and dividing according to the length to obtain the labeling data corresponding to each input item.
4. The method of claim 1, wherein said inputting said entries into an initial network model in a batch order comprises:
splicing the current input item with the output item of the previous network layer in the initial network model;
and inputting the spliced sample vector to a current network layer in the initial network model.
5. The method of any one of claims 2 to 4, wherein the network layers of the neural network model comprise a plurality of dilated convolutional layers, and a storage space of each dilated convolutional layer is a matrix space of the length multiplied by a dimension of the sample vector.
6. An apparatus for predicting target data, the apparatus comprising:
an acquisition module, configured to acquire a plurality of first target data satisfying a preset long-term dependency condition;
the input module is used for inputting a plurality of first target data into the trained neural network model and outputting second target data;
wherein the second target data has a sequence correlation with a plurality of the first target data;
the apparatus further comprising: a training module, configured to divide the sample data set into a plurality of batches of input items, input the input items to the initial network model in batch order, compute the output items of the initial network model according to a dilated convolution algorithm, and train the initial network model according to the input items, the output items and a loss function to obtain the neural network model.
7. The apparatus of claim 6, wherein the training module comprises:
a sample conversion module for converting each sample data in the sample data set into a sample vector;
the sample dividing module is used for dividing all the sample vectors into a plurality of batches of the input items according to a preset length;
wherein the number of sample data divided by the length is equal to the number of batches.
8. The apparatus of claim 7, wherein the training module further comprises:
a sample labeling module, configured to, after all the sample vectors are divided into a plurality of batches of input items by the sample dividing module, shift each input item backwards by one sample-vector position and divide according to the length to obtain the labeling data corresponding to each input item.
9. The apparatus of claim 6, wherein the training module comprises:
the sample splicing module is used for splicing the current input item with the output item of the last network layer in the initial network model; and inputting the spliced sample vector to a current network layer in the initial network model.
10. The apparatus of any one of claims 7 to 9, wherein the network layers of the neural network model comprise a plurality of dilated convolutional layers, and a storage space of each dilated convolutional layer is a matrix space of the length multiplied by a dimension of the sample vector.
CN202011631559.6A 2020-12-30 2020-12-30 Target data prediction method and device Pending CN112633479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011631559.6A CN112633479A (en) 2020-12-30 2020-12-30 Target data prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011631559.6A CN112633479A (en) 2020-12-30 2020-12-30 Target data prediction method and device

Publications (1)

Publication Number Publication Date
CN112633479A true CN112633479A (en) 2021-04-09

Family

ID=75289793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011631559.6A Pending CN112633479A (en) 2020-12-30 2020-12-30 Target data prediction method and device

Country Status (1)

Country Link
CN (1) CN112633479A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358948A (en) * 2017-06-27 2017-11-17 上海交通大学 Language in-put relevance detection method based on attention model
CN110286778A (en) * 2019-06-27 2019-09-27 北京金山安全软件有限公司 Chinese deep learning input method and device and electronic equipment
CN111199727A (en) * 2020-01-09 2020-05-26 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111882031A (en) * 2020-06-30 2020-11-03 华为技术有限公司 Neural network distillation method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination