WO2021047473A1 - Training method and device for neural network, semantic classification method and device, and medium - Google Patents


Info

Publication number
WO2021047473A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
comment
network
vector
representation
Prior art date
Application number
PCT/CN2020/113740
Other languages
English (en)
French (fr)
Inventor
Zhang Zhenzhong (张振中)
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority to US 17/418,836 (granted as US11934790B2)
Publication of WO2021047473A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00 Pattern recognition
                    • G06F18/20 Analysing
                        • G06F18/24 Classification techniques
                            • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                                • G06F18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
                • G06F40/00 Handling natural language data
                    • G06F40/30 Semantic analysis
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                    • G06N3/02 Neural networks
                        • G06N3/04 Architecture, e.g. interconnection topology
                            • G06N3/044 Recurrent networks, e.g. Hopfield networks
                            • G06N3/045 Combinations of networks
                        • G06N3/08 Learning methods
                            • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
                            • G06N3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • the embodiments of the present disclosure relate to a training method of a neural network, a training device of a neural network, a semantic classification method, a semantic classification device, and a storage medium.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers or digital-computer-controlled machines to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence includes studying the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology can be applied to the field of Natural Language Processing (NLP). NLP is the intersection of computer science, artificial intelligence, and information engineering. It involves knowledge of statistics, linguistics, etc., and its goal is to allow computers to process or "understand" natural language to perform tasks such as text classification, language translation, and question answering.
  • At least one embodiment of the present disclosure provides a semantic classification method, including: inputting a first comment about a first object; processing the first comment with a common representation extractor to extract a first common representation vector used to characterize the common representation in the first comment; processing the first comment with a first representation extractor to extract a first single representation vector used to characterize the single representation in the first comment; splicing the first common representation vector and the first single representation vector to obtain a first representation vector; and processing the first representation vector with a first semantic classifier to obtain the semantic classification of the first comment.
  • Wherein the common representation includes meaning representations used to comment on both the first object and the second object, the second object being an associated comment object different from the first object, and the single representation of the first comment includes a meaning representation used only for commenting on the first object.
  • The semantic classification method provided by some embodiments of the present disclosure further includes: mapping the first comment to a first original vector; wherein using the common representation extractor to process the first comment includes using the common representation extractor to process the first original vector, and using the first representation extractor to process the first comment includes using the first representation extractor to process the first original vector.
  • For example, mapping the first comment to the first original vector includes: using a word vector algorithm to map each word in the first comment to a vector with a specified length to obtain the first original vector.
  • For example, the common representation extractor and the first representation extractor each include one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network.
  • the first semantic classifier includes a softmax classifier.
  • The semantic classification method provided by some embodiments of the present disclosure further includes: inputting a second comment about a second object; using the common representation extractor to process the second comment to extract a second common representation vector for characterizing the common representation in the second comment; using a second representation extractor to process the second comment to extract a second single representation vector for characterizing the single representation in the second comment; splicing the second common representation vector and the second single representation vector to obtain a second representation vector; and using a second semantic classifier to process the second representation vector to obtain the semantic classification of the second comment; wherein the single representation of the second comment includes a meaning representation used only for commenting on the second object.
  • The semantic classification method provided by some embodiments of the present disclosure further includes: mapping the second comment to a second original vector; wherein using the common representation extractor to process the second comment includes using the common representation extractor to process the second original vector, and using the second representation extractor to process the second comment includes using the second representation extractor to process the second original vector.
  • For example, mapping the second comment to the second original vector includes: using a word vector algorithm to map each word in the second comment to a vector with a specified length to obtain the second original vector.
  • For example, the second representation extractor includes one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network, and the second semantic classifier includes a softmax classifier.
  • For example, the corpus sources of the first comment and the second comment include at least one of text and speech.
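The claimed pipeline (common extractor, single extractor, splice, classify) can be sketched as follows. This is a minimal illustration, not the patented implementation: the two "extractors" are stand-in linear maps rather than the RNN/LSTM networks the disclosure describes, and all names and dimensions here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: each comment is already mapped to an
# "original vector" of length d (see the word-vector mapping step).
d, s, t, K = 16, 8, 8, 3  # input dim, common dim, single dim, classes

# Stand-in extractors: in the disclosure these are trained RNN/LSTM/Bi-LSTM
# networks; plain linear maps keep the sketch self-contained.
W_common = rng.normal(size=(s, d))
W_single = rng.normal(size=(t, d))
W_cls = rng.normal(size=(K, s + t))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(original_vector):
    common = W_common @ original_vector          # first common representation vector
    single = W_single @ original_vector          # first single representation vector
    spliced = np.concatenate([common, single])   # first representation vector
    probs = softmax(W_cls @ spliced)             # first semantic classifier (softmax)
    return int(np.argmax(probs)), probs

comment_vec = rng.normal(size=d)  # stand-in for a mapped first comment
label, probs = classify(comment_vec)
```

The category identifier with the largest predicted probability is taken as the semantic classification, mirroring the softmax selection described later in the text.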
  • At least one embodiment of the present disclosure further provides a neural network training method, the neural network including a generation network, a first branch network, a first classification network, a second branch network, and a second classification network. The training method includes a semantic classification training stage, wherein the semantic classification training stage includes: inputting a first training comment about a first object, using the generation network to process the first training comment to extract a first training common representation vector, using the first branch network to process the first training comment to extract a first training single representation vector, splicing the first training common representation vector with the first training single representation vector to obtain a first training representation vector, and using the first classification network to process the first training representation vector to obtain a prediction category identifier of the semantic classification of the first training comment; inputting a second training comment about a second object, using the generation network to process the second training comment to extract a second training common representation vector, using the second branch network to process the second training comment to extract a second training single representation vector, splicing the second training common representation vector with the second training single representation vector to obtain a second training representation vector, and using the second classification network to process the second training representation vector to obtain a prediction category identifier of the semantic classification of the second training comment; and calculating a system loss value through a system loss function based on the prediction category identifiers and correspondingly modifying the parameters of the neural network.
  • For example, the semantic classification training stage further includes: mapping the first training comment to a first training original vector, and mapping the second training comment to a second training original vector; wherein using the generation network to process the first training comment includes using the generation network to process the first training original vector; using the first branch network to process the first training comment includes using the first branch network to process the first training original vector; using the generation network to process the second training comment includes using the generation network to process the second training original vector; and using the second branch network to process the second training comment includes using the second branch network to process the second training original vector.
  • For example, mapping the first training comment to the first training original vector includes: using a word vector algorithm to map each word in the first training comment to a vector with a specified length to obtain the first training original vector; and mapping the second training comment to the second training original vector includes: using the word vector algorithm to map each word in the second training comment to a vector with the specified length to obtain the second training original vector.
  • For example, the generation network, the first branch network, and the second branch network each include one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network, and the first classification network and the second classification network both include a softmax classifier.
  • the system loss function is expressed as:
  • L_obj represents the system loss function
  • L(·, ·) represents the cross-entropy loss function
  • Y1 represents the prediction category identifier of the first training comment
  • T1 represents the true category identifier of the first training comment
  • L(Y1, T1) represents the cross-entropy loss function of the first training comment
  • λ1 represents the weight of the cross-entropy loss function L(Y1, T1) of the first training comment in the system loss function
  • Y2 represents the prediction category identifier of the second training comment
  • T2 represents the true category identifier of the second training comment
  • L(Y2, T2) represents the cross-entropy loss function of the second training comment
  • λ2 represents the weight of the cross-entropy loss function L(Y2, T2) of the second training comment in the system loss function
  • Y and T are both formal parameters
  • N represents the number of training comments
  • K represents the number of category identifiers for semantic classification
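The formula itself is not reproduced in this text (it appears as an image in the original publication). A reconstruction consistent with the symbol definitions above, namely a weighted sum of two cross-entropy losses, would be, as an assumption:

```latex
L_{obj} = \lambda_1\, L(Y1, T1) + \lambda_2\, L(Y2, T2),
\qquad
L(Y, T) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} T_{n,k}\,\log Y_{n,k}
```

Here Y and T are formal parameters, so each cross-entropy term averages over the N training comments and sums over the K category identifiers.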
  • For example, the neural network further includes a discriminant network, and the training method further includes a generative adversarial training phase, the generative adversarial training phase and the semantic classification training phase being performed alternately; wherein the generative adversarial training phase includes: training the discriminant network based on the generation network; training the generation network based on the discriminant network; and alternately performing the above training processes to complete the training of the generative adversarial training phase.
  • For example, training the discriminant network based on the generation network includes: inputting a third training comment about the first object, using the generation network to process the third training comment to extract a third training common representation vector, and using the discriminant network to process the third training common representation vector to obtain a third training output; inputting a fourth training comment about the second object, using the generation network to process the fourth training comment to extract a fourth training common representation vector, and using the discriminant network to process the fourth training common representation vector to obtain a fourth training output; calculating a discriminant network adversarial loss value through the discriminant network adversarial loss function based on the third training output and the fourth training output; and modifying the parameters of the discriminant network according to the discriminant network adversarial loss value.
  • the discriminant network includes a two-class softmax classifier.
  • For example, the discriminant network adversarial loss function is expressed as:
  • L_D represents the discriminant network adversarial loss function
  • z1 represents the third training comment
  • P_data(z1) represents the set of third training comments
  • G(z1) represents the third training common representation vector
  • D(G(z1)) represents the third training output
  • z2 represents the fourth training comment
  • P_data(z2) represents the set of fourth training comments
  • G(z2) represents the fourth training common representation vector
  • D(G(z2)) represents the fourth training output
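The loss formula is not reproduced in this text. Using the symbols above, a standard GAN-style discriminator loss, in which the discriminant network learns to output one label for common representations extracted from first-object comments and the other label for those from second-object comments, would take a form such as (an assumption, not the published formula):

```latex
L_D = -\,\mathbb{E}_{z1 \sim P_{data}(z1)}\big[\log D(G(z1))\big]
      \;-\; \mathbb{E}_{z2 \sim P_{data}(z2)}\big[\log\big(1 - D(G(z2))\big)\big]
```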
  • For example, training the generation network based on the discriminant network includes: inputting a fifth training comment about the first object, using the generation network to process the fifth training comment to extract a fifth training common representation vector, and using the discriminant network to process the fifth training common representation vector to obtain a fifth training output; inputting a sixth training comment about the second object, using the generation network to process the sixth training comment to extract a sixth training common representation vector, and using the discriminant network to process the sixth training common representation vector to obtain a sixth training output; calculating a generation network adversarial loss value through a generation network adversarial loss function based on the fifth training output and the sixth training output; and modifying the parameters of the generation network according to the generation network adversarial loss value.
  • For example, the generation network adversarial loss function can be expressed as:
  • L_G represents the generation network adversarial loss function
  • z3 represents the fifth training comment
  • P_data(z3) represents the set of fifth training comments
  • G(z3) represents the fifth training common representation vector
  • D(G(z3)) represents the fifth training output
  • z4 represents the sixth training comment
  • P_data(z4) represents the set of sixth training comments
  • G(z4) represents the sixth training common representation vector
  • D(G(z4)) represents the sixth training output
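The formula is likewise not reproduced in this text. One standard adversarial form consistent with the symbol definitions above, in which the generation network is trained to make the discriminant network mislabel the two sources (labels swapped relative to the discriminant loss, so that the extracted common representations become indistinguishable), would be, as an assumption:

```latex
L_G = -\,\mathbb{E}_{z3 \sim P_{data}(z3)}\big[\log\big(1 - D(G(z3))\big)\big]
      \;-\; \mathbb{E}_{z4 \sim P_{data}(z4)}\big[\log D(G(z4))\big]
```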
  • At least one embodiment of the present disclosure further provides a semantic classification device, including: a memory, configured to store non-transitory computer-readable instructions; and a processor, configured to run the computer-readable instructions, wherein when the computer-readable instructions are run by the processor, the semantic classification method provided in any embodiment of the present disclosure is executed.
  • At least one embodiment of the present disclosure further provides a neural network training device, including: a memory, configured to store non-transitory computer-readable instructions; and a processor, configured to run the computer-readable instructions, wherein when the computer-readable instructions are run by the processor, the training method provided in any embodiment of the present disclosure is executed.
  • At least one embodiment of the present disclosure further provides a storage medium for non-transitory storage of computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, the semantic classification method provided by any embodiment of the present disclosure can be executed.
  • At least one embodiment of the present disclosure further provides a storage medium for non-transitory storage of computer-readable instructions, wherein when the non-transitory computer-readable instructions are executed by a computer, the training method provided in any embodiment of the present disclosure can be executed.
  • FIG. 1 is a flowchart of a semantic classification method provided by at least one embodiment of the present disclosure;
  • FIG. 2 is an exemplary flowchart of the semantic classification method shown in FIG. 1;
  • FIG. 3 is a flowchart of another semantic classification method provided by at least one embodiment of the present disclosure;
  • FIG. 4 is an exemplary flowchart of the semantic classification method shown in FIG. 3;
  • FIG. 5 is a schematic structural block diagram of a neural network provided by at least one embodiment of the present disclosure;
  • FIG. 6 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure;
  • FIG. 7 is a schematic training architecture block diagram of the discriminant network in the generative adversarial training stage corresponding to the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
  • FIG. 8 is a schematic flowchart of a process of training a discriminant network provided by at least one embodiment of the present disclosure;
  • FIG. 9 is a schematic training architecture block diagram of the generation network in the generative adversarial training stage corresponding to the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
  • FIG. 10 is a schematic flowchart of a process of training a generation network provided by at least one embodiment of the present disclosure;
  • FIG. 11 is a schematic training architecture block diagram corresponding to the semantic classification training phase of the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
  • FIG. 12 is a schematic flowchart of a training process in the semantic classification training phase of a training method provided by at least one embodiment of the present disclosure;
  • FIG. 13 is a schematic block diagram of a semantic classification device provided by at least one embodiment of the present disclosure;
  • FIG. 14 is a schematic block diagram of a neural network training device provided by at least one embodiment of the present disclosure; and
  • FIG. 15 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
  • For example, comments about hospitals and doctors can be divided into: comments used only to evaluate hospitals, such as "complete departments"; comments used only to evaluate doctors, such as "excellent medical skills"; and comments that can be used to evaluate both hospitals and doctors, such as "good service".
  • Meaning expressions that can be used to evaluate different comment objects are called common representations; meaning expressions that are used to evaluate only a single comment object are called single representations.
  • Comments about hospitals and doctors can be semantically classified according to their content; for example, they can be divided into positive comments, neutral comments, and negative comments.
  • When performing semantic classification on comments about hospitals and doctors, if the common representation and the single representation in the comments can be extracted so that semantic classification is performed based on more effective information, it will help improve the objectivity and accuracy of comment analysis.
  • For example, the two comment objects of hospital and doctor are defined as associated comment objects, that is, the hospital is the associated comment object of the doctor, and the doctor is the associated comment object of the hospital; similar pairs of associated comment objects also include schools and teachers, takeaway platforms and takeaway merchants, etc.
  • For example, there may be a certain interdependent relationship between two associated comment objects, but it is not limited to this: one comment object may be a component of the other (such as its employees), its service provider, or its supplier (such as a takeaway service); for another example, the quality of comments about one of two associated comment objects may, to a certain extent, reflect the quality of comments about the other.
  • The semantic classification method includes: inputting a first comment about a first object; using a common representation extractor to process the first comment to extract a first common representation vector used to characterize the common representation in the first comment; using a first representation extractor to process the first comment to extract a first single representation vector used to characterize the single representation in the first comment; splicing the first common representation vector and the first single representation vector to obtain a first representation vector; and using a first semantic classifier to process the first representation vector to obtain the semantic classification of the first comment; wherein the common representation includes meaning representations used to comment on both the first object and a second object, the second object is an associated comment object different from the first object, and the single representation of the first comment includes a meaning representation used only for commenting on the first object.
  • Some embodiments of the present disclosure also provide a semantic classification device corresponding to the above-mentioned semantic classification method, a training method of a neural network, a device corresponding to a training method of a neural network, and a storage medium.
  • The semantic classification method provided by at least one embodiment of the present disclosure can extract a common representation and a single representation in a first comment about a first object and perform semantic classification on the first comment based on the common representation and the single representation, which helps to improve the objectivity and accuracy of comment analysis.
  • FIG. 1 is a flowchart of a semantic classification method provided by at least one embodiment of the present disclosure
  • FIG. 2 is an exemplary flowchart of the semantic classification method shown in FIG. 1.
  • the semantic classification method includes step S110 to step S150.
  • the semantic classification method shown in FIG. 1 will be described in detail below in conjunction with FIG. 2.
  • Step S110 Input the first comment about the first object.
  • the first object may be any type of review object, such as a hospital, a doctor, a school, a teacher, a takeaway platform, a takeaway merchant, etc., which is not limited in the embodiment of the present disclosure.
  • the first comment may come from a forum related to the first object, etc.
  • For example, the corpus source of the first comment may include text, speech, pictures (such as emoticons), etc.; for example, speech, pictures, etc. can be converted into text manually or by artificial intelligence.
  • the language of the first comment may include Chinese, English, Japanese, German, Korean, etc., which is not limited in the embodiment of the present disclosure.
  • For example, the semantic classification method can process one or more predetermined languages, and a first comment in another language (not belonging to the one or more predetermined languages) can be translated into a predetermined language before processing.
  • For example, step S110, inputting a first comment about the first object, may include: mapping the first comment to a first original vector P1. Therefore, processing the first comment in the subsequent steps means processing the first original vector P1.
  • For example, the word vector algorithm may include a deep neural network, the word2vec program, etc.
  • For example, the first original vector P1 includes all the vectors obtained after mapping all the words in the first comment.
  • the length of the vector corresponding to each word is the same. It should be noted that in the embodiments of the present disclosure, the length of a vector refers to the number of elements included in the vector.
  • For example, the word vector algorithm can be used to map the n words in the first comment to the vectors Vx1, Vx2, ..., Vxn, respectively.
  • For example, the vectors Vx1, Vx2, ..., Vxn have the same length.
  • the first original vector has a matrix form.
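The mapping step above can be sketched as follows, with a toy fixed-length embedding table standing in for the word vector algorithm (word2vec etc.); the vocabulary, dimension, and all names here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
DIM = 8  # assumed "specified length" of each word vector

# Toy embedding table standing in for a trained word-vector model.
vocab = ["service", "good", "departments", "complete"]
embedding = {w: rng.normal(size=DIM) for w in vocab}

def to_original_vector(comment_words):
    """Map each word to its fixed-length vector and stack the results,
    giving the original vector in matrix form (n words x DIM)."""
    return np.stack([embedding[w] for w in comment_words])

P1 = to_original_vector(["good", "service"])  # a two-word first comment
```

Because every word maps to a vector of the same length, stacking the n word vectors yields the matrix form of the first original vector described above.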
  • Step S120 Use the common representation extractor to process the first comment to extract a first common representation vector for characterizing the common representation in the first comment.
  • For example, the common representation extractor can adopt a model based on the relationship of samples in a time series, including but not limited to a recurrent neural network (RNN), a long short-term memory network (LSTM), a bidirectional long short-term memory network (Bi-LSTM), etc.
  • the common representation extractor EE0 is used to process the first original vector P1 to extract the first common representation vector P01.
  • For example, the LSTM includes multiple processing units (cells) connected in sequence; the n vectors Vx1, Vx2, ..., Vxn in the first original vector P1 are respectively used as the inputs of the first n processing units of the LSTM, and the output of the n-th processing unit of the LSTM is the first common representation vector P01.
  • For example, the number of processing units included in the LSTM here is greater than or equal to the number of words in the longest first comment processed by it.
  • the common representation includes a common representation of meaning used to comment on both a first object and a second object, where the second object is an associated comment object different from the first object.
  • the first subject is a hospital and the second subject is a doctor.
  • For example, the common representations include "good service", "clean", etc., which can be used either to evaluate hospitals or to evaluate doctors, or which cannot be distinguished, without referring to the context, as evaluating the hospital or the doctor.
  • the common representation extractor EE0 can be obtained through training methods that will be introduced later, so as to achieve the function of extracting the common representation in the first comment and the second comment. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • Step S130 Use a first representation extractor to process the first comment to extract a first single representation vector for representing a single representation in the first comment.
  • For example, the first representation extractor may also adopt a model based on the relationship of samples in a time series, such as a recurrent neural network (RNN), a long short-term memory network (LSTM), a bidirectional long short-term memory network (Bi-LSTM), etc.
  • For example, the first representation extractor may adopt the same type of model as the common representation extractor.
  • For example, the first original vector P1 is processed by the first representation extractor EE1 to extract the first single representation vector P11.
  • the process of processing the first original vector P1 by the first representation extractor EE1 can refer to the process of processing the first original vector P1 by the common representation extractor EE0, which will not be repeated here.
  • the single representation in the first comment includes a meaning representation used only for commenting on the first object, that is, the meaning representation is not used for commenting on a second object (that is, an associated comment object that is different from the first object).
  • the first object is a hospital and the second object is a doctor.
  • For example, the single representations in the first comment include "complete departments", "advanced equipment", etc., which are used to evaluate hospitals and cannot be used to evaluate doctors.
  • For example, the first single representation vector P11 includes the information of the single representation in the first comment; in addition, the first single representation vector P11 may also include (or, of course, may also exclude) the information of the common representation in the first comment; it should be noted that the embodiment of the present disclosure does not limit this.
  • the first representation extractor EE1 can be obtained through training methods that will be introduced later, so as to achieve the function of extracting a single representation in the first comment. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • Step S140 concatenate the first common representation vector and the first single representation vector to obtain the first representation vector.
  • the first common representation vector P01 and the first single representation vector P11 are spliced to obtain the first representation vector P10.
  • For example, if the first common representation vector P01 includes s elements (a1, a2, ..., as) and the first single representation vector P11 includes t elements (b1, b2, ..., bt), then splicing the first common representation vector P01 and the first single representation vector P11 means splicing the s+t elements in a predetermined order.
  • it can be spliced into (a1,..., as, b1,..., bt) or (b1,..., bt, a1,...,as) and other forms to obtain the first representation vector P10.
  • the embodiment of the present disclosure does not limit the arrangement order of the elements in the first representation vector P10, as long as the first representation vector P10 includes all the elements in the first common representation vector P01 and the first single representation vector P11 That's it.
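The splicing in step S140 can be sketched with NumPy; the dimensions s and t and the element values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical dimensions and values, for illustration only.
s, t = 4, 3
p01 = np.array([1.0, 2.0, 3.0, 4.0])   # first common representation vector (a1, ..., as)
p11 = np.array([10.0, 11.0, 12.0])     # first single representation vector (b1, ..., bt)

# Splice the s + t elements in one predetermined order: (a1, ..., as, b1, ..., bt).
p10 = np.concatenate([p01, p11])

# The alternative order (b1, ..., bt, a1, ..., as) is equally valid.
p10_alt = np.concatenate([p11, p01])

assert p10.shape == (s + t,) and p10_alt.shape == (s + t,)
```

Both orders contain exactly the s + t elements of P01 and P11, which is all the embodiment requires of the first representation vector P10.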
  • Step S150 Use the first semantic classifier to process the first representation vector to obtain the semantic classification of the first comment.
  • the first semantic classifier CC1 is used to process the first representation vector P10 to obtain the semantic classification of the first comment.
  • the first semantic classifier CC1 may include a softmax classifier, and the softmax classifier includes, for example, a fully connected layer.
  • a K-dimensional vector z (that is, a vector including K elements, corresponding to K category identifiers) is obtained.
  • the elements in the vector z can be any real numbers; the softmax classifier can compress the K-dimensional vector z into another K-dimensional vector σ(z).
  • the formula of the softmax classifier is as follows:

    σ(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k),  j = 1, 2, ..., K

  • where z_j represents the j-th element in the K-dimensional vector z, and σ(z)_j represents the predicted probability of the j-th category identifier (label);
  • each σ(z)_j is a real number whose range is (0, 1), and the sum of the K elements of the vector σ(z) is 1.
  • each category identifier in the K-dimensional vector z is assigned a certain prediction probability, and the category identifier with the largest prediction probability is selected as the category identifier for semantic classification.
  • the number of categories of the category identifiers of the semantic classification is K, where K is an integer greater than or equal to 2.
  • the embodiments of the present disclosure include but are not limited to this.
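The softmax computation described above can be sketched as follows; the scores in z are hypothetical, and subtracting the maximum element is a standard numerical-stability trick rather than part of the formula itself:

```python
import numpy as np

def softmax(z):
    # sigma(z)_j = exp(z_j) / sum_k exp(z_k); subtracting max(z) does not
    # change the result but avoids overflow for large elements.
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])           # K = 3 category scores, any real numbers
p = softmax(z)                          # each element in (0, 1), summing to 1
predicted_category = int(np.argmax(p))  # category identifier with the largest probability
```

As the text states, each of the K category identifiers receives a prediction probability, and the identifier with the largest probability is selected as the semantic classification result.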
  • the first semantic classifier CC1 can be obtained through training methods that will be introduced later, so as to realize the above-mentioned semantic classification function. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • FIG. 3 is a flowchart of another semantic classification method provided by at least one embodiment of the present disclosure
  • FIG. 4 is an exemplary flowchart of the semantic classification method shown in FIG. 3.
  • the semantic classification method shown in FIG. 3 further includes steps S160 to S200.
  • the operations in steps S160 to S200 of the semantic classification method shown in FIG. 3 are basically similar to the operations in steps S110 to S150; the main difference is that steps S110 to S150 perform semantic classification processing on the first comment on the first object,
  • while steps S160 to S200 perform semantic classification processing on the second comment on the second object, where the first object and the second object are related comment objects of each other. Therefore, the details of steps S160 to S200 may correspond to the relevant description of steps S110 to S150.
  • steps S160 to S200 of the semantic classification method shown in FIG. 3 will be described in detail with reference to FIG. 4.
  • Step S160 Input a second comment on the second object.
  • the second object is an associated comment object different from the first object.
  • for example, when the first object is a hospital, the second object can be a comment object associated with the hospital, such as a doctor or medicine; or, when the first object is a doctor, the second object can be a comment object associated with the doctor, such as a hospital or medicine.
  • the embodiments of the present disclosure include but are not limited to this.
  • one of the first object and the second object may also be a school, a takeaway platform, etc., and correspondingly the other of the first object and the second object may be a teacher, a takeaway merchant, etc.; in other words, it suffices that the first object and the second object are related comment objects of each other.
  • the second comment may originate from a forum related to the second object.
  • the first comment and the second comment may originate from the same forum or the like.
  • the corpus source of the second comment may also include text, voice, pictures, etc.; for example, voice, pictures, etc. can be converted into text manually or automatically before processing.
  • the language of the second comment may include Chinese, English, Japanese, German, Korean, etc., which is not limited in the embodiment of the present disclosure.
  • the semantic classification method can process one or more predetermined languages, and the second comment in other languages (not belonging to the one or more predetermined languages) can be translated (for example, , Translated into a predetermined language) before processing.
  • similar to step S110 of inputting the first comment on the first object, step S160 may include: mapping the second comment to the second original vector P2. Therefore, processing the second comment in the subsequent steps means processing the second original vector P2.
  • a word vector algorithm (for example, a deep neural network, the word2vec program, etc.) can be used to map the second comment to the second original vector P2.
  • the second original vector P2 includes all the vectors obtained by mapping all the words in the second comment.
  • the length of the vector corresponding to each word in the second comment is the same as the length of the vector corresponding to each word in the first comment.
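As a rough sketch of mapping a comment to an original vector, the toy lookup below stands in for a trained word-vector model such as word2vec; the dimension DIM and the random per-word vectors are illustrative assumptions, not the disclosed algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8   # illustrative word-vector length; a real system would use trained word2vec vectors

_vocab = {}

def word_vector(word):
    # Toy stand-in for a trained word-vector lookup: one fixed random
    # vector per distinct word, the same length for every word.
    if word not in _vocab:
        _vocab[word] = rng.normal(size=DIM)
    return _vocab[word]

def comment_to_original_vector(comment):
    # Map every word of the comment to its vector and stack the results;
    # the stack plays the role of the original vector P1 / P2.
    return np.stack([word_vector(w) for w in comment.split()])

p1 = comment_to_original_vector("complete departments advanced equipment")
p2 = comment_to_original_vector("excellent medical skills")
assert p1.shape == (4, DIM) and p2.shape == (3, DIM)
```

Note that, as required above, every word in either comment is mapped to a vector of the same length.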
  • Step S170 Use the common representation extractor to process the second comment to extract a second common representation vector used to characterize the common representation in the second comment.
  • the common representation extractor EE0 used in step S120 can also be used in step S170; that is, the common representation extractor EE0 can also process the second comment to extract the second common representation vector P02, which characterizes the common representation in the second comment.
  • the common representation extractor EE0 is used to process the second original vector P2 to extract the second common representation vector P02.
  • the process of processing the second original vector P2 by the common representation extractor EE0 can refer to the process of processing the first original vector P1 by the common representation extractor EE0, which will not be repeated here.
  • correspondingly, the number of processing units included in the LSTM is also greater than or equal to the number of words in the longest second comment it processes.
  • Step S180 Use a second representation extractor to process the second comment to extract a second single representation vector for representing a single representation in the second comment.
  • the second representation extractor may also adopt a model based on the relationship of samples in a time series, such as a recurrent neural network (RNN), a long short-term memory network (LSTM), a bi-directional long short-term memory network (Bi-LSTM), etc.
  • for example, the second representation extractor may adopt the same type of model as the common representation extractor.
  • the second original vector P2 is processed by the second representation extractor EE2 to extract the second single representation vector P22.
  • the process of processing the second original vector P2 by the second representation extractor EE2 can refer to the process of processing the first original vector P1 by the common representation extractor EE0, which will not be repeated here.
  • the single representation in the second comment includes a meaning representation used only to comment on the second object, that is, the meaning representation is not used to comment on the first object (that is, an associated comment object different from the second object).
  • the first object is a hospital and the second object is a doctor.
  • the single representation in the second comment includes expressions such as "excellent medical skills" and "kindness", which are used to evaluate doctors and cannot be used to evaluate hospitals.
  • the second single representation vector P22 includes the information of the single representation in the second comment; in addition, the second single representation vector P22 may also include (or, of course, may exclude) the information of the common representation in the second comment; it should be noted that the embodiments of the present disclosure do not limit this.
  • the second representation extractor EE2 can be obtained through training methods that will be introduced later, so as to achieve the function of extracting a single representation in the second comment. It should be noted that the embodiments of the present disclosure include but are not limited to this.
  • Step S190 concatenate the second common representation vector and the second single representation vector to obtain a second representation vector.
  • the second common representation vector P02 and the second single representation vector P22 are spliced to obtain the second representation vector P20.
  • the splicing process and details in step S190 can refer to the splicing process and details in step S140, which will not be repeated here.
  • Step S200 Use the second semantic classifier to process the second representation vector to obtain the semantic classification of the second comment.
  • the second semantic classifier CC2 is used to process the second representation vector P20 to obtain the semantic classification of the second comment.
  • the second semantic classifier CC2 may also include a softmax classifier, for example, the softmax classifier includes a fully connected layer; for example, the processing process and details of the second semantic classifier CC2 can refer to the processing process and details of the first semantic classifier CC1, which will not be repeated here.
  • the common representation extractor EE0, the first representation extractor EE1, and the second representation extractor EE2 perform similar functions, and the three can have the same or similar structures, but the parameters included in the three can be different.
  • the first semantic classifier CC1 and the second semantic classifier CC2 perform similar functions, and the two may have the same or similar structure, but the parameters included in the two may be different.
  • the common representation extractor EE0, the first representation extractor EE1, the second representation extractor EE2, the first semantic classifier CC1, and the second semantic classifier CC2 can all be It is implemented by software, hardware, firmware, or any combination thereof, so that the corresponding processing procedures can be executed respectively.
  • the flow of the above-mentioned semantic classification method may include more or fewer operations (for example, in the semantic classification method shown in FIG. 3, only steps S110 to S150 may be executed,
  • or only steps S160 to S200 may be executed), and these operations can be performed sequentially or in parallel (for example, step S120 and step S130 can be performed in parallel, or sequentially in any order).
  • although the flow of the semantic classification method described above includes multiple operations appearing in a specific order, it should be clearly understood that the order of the multiple operations is not limited.
  • the semantic classification method described above can be executed once or multiple times according to predetermined conditions.
  • when the first comment/second comment is mapped to the first original vector/second original vector, words irrelevant to semantic classification (for example, stop words, etc.) can first be filtered out of the first comment/second comment, and then the remaining words related to semantic classification in the first comment/second comment can be mapped to the first original vector/second original vector.
  • the common representation extractor EE0, the first representation extractor EE1, and the second representation extractor EE2 trained by a specific training method can filter out words that are not related to semantic classification when extracting meaning representations. It should be noted that the embodiments of the present disclosure do not limit this.
  • the semantic classification method provided by the embodiments of the present disclosure can extract the common representation and the single representation in the first comment about the first object, and perform semantic classification on the first comment based on the common representation and the single representation, which helps to improve the objectivity and accuracy of comment analysis.
  • FIG. 5 is a schematic structural block diagram of a neural network provided by at least one embodiment of the present disclosure
  • FIG. 6 is a flowchart of a training method of a neural network provided by at least one embodiment of the present disclosure.
  • the neural network includes a generation network G, a discrimination network D, a first branch network SN1, a first classification network CN1, a second branch network SN2, and a second classification network CN2.
  • the training method includes a generative adversarial training stage S300 and a semantic classification training stage S400; these two stages of training are performed alternately to obtain the trained neural network.
  • after training, the generation network G, the first branch network SN1, the first classification network CN1, the second branch network SN2, and the second classification network CN2 can be used to implement the functions of the common representation extractor EE0, the first representation extractor EE1, the first semantic classifier CC1, the second representation extractor EE2, and the second semantic classifier CC2 in the aforementioned semantic classification method; that is, they can be used to perform the aforementioned semantic classification method.
  • the generating confrontation training stage S300 includes:
  • Step S310 Training the discriminant network based on the generation network
  • Step S320 Training the generation network based on the discriminant network.
  • the above-mentioned training processes (i.e., step S310 and step S320) are performed alternately to complete the training of the generative adversarial training stage S300.
  • the construction of the generating network G may be the same as the construction of the aforementioned common representation extractor EE0, and the construction details and working principles of the generating network G can refer to the related description of the aforementioned common representation extractor EE0, which will not be repeated here.
  • the generation network G is used to process both comments on the first object and comments on the second object to extract the meaning representations in the comments, where the first object and the second object are related comment objects of each other.
  • the discriminant network D can adopt a two-class softmax classifier.
  • the discriminating network D is used to determine whether the meaning extracted by the generating network G is used to comment on the first object or the second object.
  • FIG. 7 is a schematic training architecture block diagram of a discriminant network in the generated adversarial training phase corresponding to the training method shown in FIG. 6 provided by at least one embodiment of the present disclosure.
  • FIG. 8 is a schematic flowchart of the process of training the discriminant network provided by at least one embodiment of the present disclosure.
  • step S310 includes step S311 to step S314, as follows:
  • Step S311 Input the third training comment about the first object, use the generation network to process the third training comment to extract the third training common representation vector, and use the discriminant network to process the third training common representation vector to obtain the third training output;
  • Step S312 Input the fourth training comment about the second object, use the generation network to process the fourth training comment to extract the fourth training common representation vector, and use the discriminant network to process the fourth training common representation vector to obtain the fourth training output;
  • Step S313 Based on the third training output and the fourth training output, calculate the discriminant network confrontation loss value through the discriminant network confrontation loss function;
  • Step S314 Correct the parameters of the discriminant network according to the discriminant network confrontation loss value.
  • training the discriminant network based on the generation network may also include: judging whether the training of the discriminant network meets a predetermined condition; if the predetermined condition is not met, the training process of the discriminant network is repeated; if the predetermined condition is met, the training process of the discriminant network at this stage is stopped, and the discriminant network trained in this stage is obtained.
  • for example, the aforementioned predetermined condition is that the discriminant network confrontation loss values corresponding to two consecutive pairs of comments (for example, in the process of training the discriminant network, each pair of comments includes a third training comment and a fourth training comment) are no longer significantly reduced.
  • the foregoing predetermined condition is that the number of training times or training periods of the discriminating network reaches a predetermined number. The embodiment of the present disclosure does not limit this.
  • it should be noted that the above example is only a schematic illustration of the training process of the discriminant network. In the training stage, a large number of sample comments (that is, comments on the first object and comments on the second object) need to be used, and the training can include multiple iterations to modify the parameters of the discriminant network.
  • the training process of the discriminant network also includes fine-tuning the parameters of the discriminant network to obtain more optimized parameters.
  • the initial parameter of the discriminant network D may be a random number, for example, the random number conforms to a Gaussian distribution.
  • the initial parameters of the discriminant network D can also be trained parameters in databases commonly used in this field. The embodiment of the present disclosure does not limit this.
  • the training process of the discriminant network D may also include an optimization function (not shown in FIG. 7); the optimization function can calculate the error values of the parameters of the discriminant network D according to the discriminant network confrontation loss value calculated by the discriminant network confrontation loss function, and correct the parameters of the discriminant network D according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc., to calculate the error values of the parameters of the discriminant network D.
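A minimal sketch of one such parameter correction, assuming plain SGD with a hypothetical learning rate and hypothetical gradients of the adversarial loss:

```python
import numpy as np

def sgd_correct(params, grads, lr=0.1):
    # One stochastic-gradient-descent correction: move each parameter
    # against its error gradient, scaled by the learning rate lr.
    return [p - lr * g for p, g in zip(params, grads)]

# Hypothetical parameter tensor and gradient, for illustration only.
params = [np.array([1.0, -2.0])]
grads = [np.array([0.5, -0.5])]
params = sgd_correct(params, grads)
```

Batch gradient descent follows the same update rule but averages the gradient over the whole sample set rather than a single sample or mini-batch.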
  • the third training comment comes from the comment sample set of the first object; for example, each comment in the comment sample set of the first object has been semantically classified in advance (for example, semantic classification is performed manually), with the category identification of the determined semantic classification; for example, the category identifications of the semantic classification in the comment sample set of the first object include good reviews, moderate reviews, and negative reviews.
  • the embodiments of the present disclosure include but are not limited to this.
  • the fourth training comment comes from the comment sample set of the second object; for example, each comment in the comment sample set of the second object has been semantically classified in advance (for example, semantic classification is performed manually), with the category identification of the determined semantic classification; for example, the category identifications of the semantic classification in the comment sample set of the second object include good reviews, moderate reviews, and negative reviews.
  • the embodiments of the present disclosure include but are not limited to these.
  • the word vector algorithm may be used to map the third training comment and the fourth training comment to the original vector, and the original vectors corresponding to the third training comment and the fourth training comment are processed by the generation network G.
  • the processing and details of generating the network G can be referred to the processing and details of the aforementioned common representation extractor EE0, which will not be repeated here.
  • the discriminant network adversarial loss function can be expressed as:

    L_D = -E_{z1~Pdata(z1)}[log D(G(z1))] - E_{z2~Pdata(z2)}[log(1 - D(G(z2)))]

  • where L_D represents the discriminant network adversarial loss function; z1 represents a third training comment, Pdata(z1) represents the set of third training comments, G(z1) represents the third training common representation vector, D(G(z1)) represents the third training output, and E_{z1~Pdata(z1)}[·] represents the expectation over the set of third training comments;
  • z2 represents a fourth training comment, Pdata(z2) represents the set of fourth training comments, G(z2) represents the fourth training common representation vector, D(G(z2)) represents the fourth training output, and E_{z2~Pdata(z2)}[·] represents the expectation over the set of fourth training comments.
  • a batch gradient descent algorithm can be used to optimize the parameters of the discriminant network D.
  • it should be noted that the discriminant network adversarial loss function expressed by the above formula is exemplary, and the embodiments of the present disclosure include but are not limited to this.
  • the training goal of discriminant network D is to minimize the value of discriminant network confrontation loss.
  • the object label of the third training comment is set to 1, that is, the discriminant network D needs to identify that the third training common representation vector comes from the comment about the first object; at the same time, the object label of the fourth training comment is set to 0, that is, the discriminant network D needs to identify that the fourth training common representation vector comes from the comment about the second object.
  • the training goal of the discriminant network D is to enable the discriminant network D to accurately determine the true source of the meaning representation extracted by the generation network G (that is, whether it comes from a comment on the first object or a comment on the second object); that is, the discriminant network D can accurately determine whether the meaning representation extracted by the generation network G is used to comment on the first object or the second object.
  • in the training process, the parameters of the discriminant network D are constantly revised, so that the discriminant network D after parameter correction can accurately identify the sources of the third training common representation vector and the fourth training common representation vector; that is, the output of the discriminant network D corresponding to the third training comment keeps approaching 1, and the output of the discriminant network D corresponding to the fourth training comment keeps approaching 0, thereby continuously reducing the discriminant network confrontation loss value.
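As a minimal sketch, a cross-entropy-style discriminant network adversarial loss consistent with the object labels described above (third training comments labeled 1, fourth labeled 0) can be written as follows; batch means stand in for the expectations, and the output values are hypothetical:

```python
import numpy as np

def discriminant_loss(d_third, d_fourth):
    # L_D = -E[log D(G(z1))] - E[log(1 - D(G(z2)))], with batch means
    # standing in for the expectations; d_third are discriminant outputs
    # for third training comments (object label 1), d_fourth for fourth
    # training comments (object label 0).
    d_third = np.asarray(d_third)
    d_fourth = np.asarray(d_fourth)
    return -(np.log(d_third).mean() + np.log(1.0 - d_fourth).mean())

# A discriminant network that identifies sources well has a lower loss
# than one that outputs 0.5 everywhere.
good = discriminant_loss([0.99, 0.98], [0.02, 0.01])
undecided = discriminant_loss([0.5, 0.5], [0.5, 0.5])
assert good < undecided
```

Minimizing this quantity pushes the outputs for third training comments toward 1 and those for fourth training comments toward 0, matching the training goal stated above.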
  • FIG. 9 is a schematic training architecture block diagram of a generative network in the generative confrontation training phase corresponding to the training method shown in FIG. 6 provided by at least one embodiment of the present disclosure
  • FIG. 10 is a schematic flowchart of the process of training the generation network provided by at least one embodiment of the present disclosure.
  • step S320 includes step S321 to step S324, as shown below:
  • Step S321 Input the fifth training comment about the first object, use the generation network to process the fifth training comment to extract the fifth training common representation vector, and use the discriminant network to process the fifth training common representation vector to obtain the fifth training output;
  • Step S322 Input the sixth training comment about the second object, use the generation network to process the sixth training comment to extract the sixth training common representation vector, and use the discriminant network to process the sixth training common representation vector to obtain the sixth training output;
  • Step S323 Based on the fifth training output and the sixth training output, calculate the generation network confrontation loss value through the generation network confrontation loss function;
  • Step S324 Correct the parameters of the generation network according to the generation network confrontation loss value.
  • training the generation network based on the discriminant network may also include: judging whether the training of the generation network satisfies a predetermined condition; if the predetermined condition is not met, the above-mentioned training process of the generation network is repeated; if the predetermined condition is met, the training process of the generation network at this stage is stopped, and the generation network trained in this stage is obtained.
  • for example, the foregoing predetermined condition is that the generation network confrontation loss values corresponding to two consecutive pairs of comments (for example, in the process of training the generation network, each pair of comments includes a fifth training comment and a sixth training comment) are no longer significantly reduced.
  • the foregoing predetermined condition is that the number of training times or training periods of the generated network reaches a predetermined number. The embodiment of the present disclosure does not limit this.
  • it should be noted that the above example is only a schematic illustration of the training process of the generation network. In the training stage, a large number of sample comments (that is, comments on the first object and comments on the second object) need to be used, and the training can include multiple iterations to modify the parameters of the generation network.
  • the training process of the generation network also includes fine-tuning the parameters of the generation network to obtain more optimized parameters.
  • the initial parameter of the generating network G may be a random number, for example, the random number conforms to a Gaussian distribution.
  • the initial parameters of the generating network G can also be trained parameters in databases commonly used in this field. The embodiment of the present disclosure does not limit this.
  • the training process of the generation network G may also include an optimization function (not shown in FIG. 9); the optimization function can calculate the error values of the parameters of the generation network G according to the generation network confrontation loss value calculated by the generation network confrontation loss function, and correct the parameters of the generation network G according to the error values.
  • the optimization function may use a stochastic gradient descent (SGD) algorithm, a batch gradient descent (BGD) algorithm, etc., to calculate the error values of the parameters of the generation network G.
  • the fifth training comment is also derived from the comment sample set of the first object, and the embodiments of the present disclosure include but are not limited to this.
  • the sixth training comment is also derived from the comment sample set of the second object, and the embodiments of the present disclosure include but are not limited to this.
  • the generation network adversarial loss function can be expressed as:

    L_G = -E_{z3~Pdata(z3)}[log(1 - D(G(z3)))] - E_{z4~Pdata(z4)}[log D(G(z4))]

  • where L_G represents the generation network adversarial loss function; z3 represents a fifth training comment, Pdata(z3) represents the set of fifth training comments, G(z3) represents the fifth training common representation vector, D(G(z3)) represents the fifth training output, and E_{z3~Pdata(z3)}[·] represents the expectation over the set of fifth training comments;
  • z4 represents a sixth training comment, Pdata(z4) represents the set of sixth training comments, G(z4) represents the sixth training common representation vector, D(G(z4)) represents the sixth training output, and E_{z4~Pdata(z4)}[·] represents the expectation over the set of sixth training comments; these can be used to optimize the parameters of the generation network G.
  • it should be noted that the generation network adversarial loss function expressed by the above formula is exemplary, and the embodiments of the present disclosure include but are not limited to this.
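A sketch of a generation network adversarial loss consistent with the flipped object labels in this stage (fifth training comments labeled 0, sixth labeled 1); again, batch means stand in for the expectations and the output values are illustrative assumptions:

```python
import numpy as np

def generation_loss(d_fifth, d_sixth):
    # L_G = -E[log(1 - D(G(z3)))] - E[log D(G(z4))], with batch means
    # standing in for the expectations; in this stage the object labels
    # are flipped relative to discriminant training: fifth training
    # comments target 0, sixth training comments target 1.
    d_fifth = np.asarray(d_fifth)
    d_sixth = np.asarray(d_sixth)
    return -(np.log(1.0 - d_fifth).mean() + np.log(d_sixth).mean())

# The loss falls as D(G(z3)) is pushed toward 0 and D(G(z4)) toward 1,
# i.e. as the generation network succeeds in confusing the discriminant.
fooled = generation_loss([0.05, 0.02], [0.97, 0.99])
not_fooled = generation_loss([0.95, 0.98], [0.03, 0.01])
assert fooled < not_fooled
```

Minimizing this quantity realizes the training goal below: the discriminant network D can no longer determine the true source of the meaning representation extracted by the generation network G.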
  • the training goal of the generation network G is to minimize the generation network confrontation loss value.
  • the object label of the fifth training comment is set to 0, that is, the discriminant network D needs to identify that the fifth training common representation vector comes from the comment about the second object; at the same time, the object label of the sixth training comment is set to 1, that is, the discriminant network D needs to identify that the sixth training common representation vector comes from the comment about the first object.
  • the training goal of the generation network G is to make the discriminant network D unable to accurately determine the true source of the meaning representation extracted by the generation network G (that is, whether it comes from a comment about the first object or a comment about the second object); that is, to make the discriminant network D unable to determine whether the meaning representation extracted by the generation network G is used to comment on the first object or the second object.
  • when this goal is reached, the discrimination network D cannot determine the true source of the meaning representation extracted by the generation network G.
  • in the training process, the parameters of the generation network G are constantly revised, so that the meaning representation extracted by the generation network G after parameter correction is a common representation of the comment on the first object and the comment on the second object,
  • so that the discriminant network D cannot accurately identify the sources of the fifth training common representation vector and the sixth training common representation vector; that is, the output of the discriminant network D corresponding to the fifth training comment keeps moving away from 1 (that is, keeps approaching 0), and the output of the discriminant network D corresponding to the sixth training comment keeps moving away from 0 (that is, keeps approaching 1), thereby continuously reducing the generation network confrontation loss value.
  • the training of the generating network G and the training of the discriminant network D are performed alternately and iteratively.
  • generally, the discriminant network D is first trained in the first stage to improve its discriminating ability (that is, the ability to identify the true source of the input to the discriminant network D), obtaining the discriminant network D trained in the first stage; then, based on the discriminant network D trained in the first stage, the generation network G is trained in the first stage to improve the ability of the generation network G to extract the common representation of comments on the first object and comments on the second object, obtaining the generation network G trained in the first stage.
  • next, based on the generation network G trained in the first stage, the discriminant network D trained in the first stage is trained in the second stage to improve its discriminating ability, obtaining the discriminant network D trained in the second stage; then, based on the discriminant network D trained in the second stage, the generation network G trained in the first stage is trained in the second stage to improve the ability of the generation network G to extract the common representation of comments on the first object and comments on the second object, obtaining the generation network G trained in the second stage; and so on, the discriminant network D and the generation network G are trained in the third stage, the fourth stage, ..., until the output of the generation network G is a common representation of the comment on the first object and the comment on the second object, thereby completing the training of one generative adversarial training stage S300.
  • the confrontation between the generation network G and the discrimination network D is embodied as follows: the output of the generation network G corresponding to comments on the first object (that is, the third training comment and the fifth training comment) has different object labels in the two individual training processes (in the training process of the discrimination network D, the object label of the third training comment is 1, while in the training process of the generation network G, the object label of the fifth training comment is 0).
  • likewise, the output of the generation network G corresponding to comments on the second object has different object labels (in the training process of the discrimination network D, the object label of the fourth training comment is 0, while in the training process of the generation network G, the object label of the sixth training comment is 1).
  • the confrontation between the generation network G and the discrimination network D is also reflected in the discrimination network confrontation loss function and the generation network confrontation loss function.
  • ideally, the meaning representation extracted by the trained generation network G is a common representation of comments on the first object and comments on the second object; regardless of whether the input of the generation network G is a comment about the first object or a comment about the second object, the output of the discrimination network D for that common representation is always 0.5, that is, the generation network G and the discrimination network D reach a Nash equilibrium through the adversarial game.
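The alternating schedule described above (train D first, then train G based on the just-updated D, stage after stage) can be sketched as follows; this is an illustrative skeleton only, with hypothetical placeholder functions standing in for the actual sub-training passes, not the patent's implementation:

```python
def train_discriminator(d_state, g_state):
    # Placeholder: one full sub-training pass that improves D's ability
    # to identify the true source of the common representation.
    return d_state + 1

def train_generator(g_state, d_state):
    # Placeholder: one full sub-training pass that improves G's ability
    # to extract a common representation that D cannot source-identify.
    return g_state + 1

def adversarial_training(num_stages):
    # Alternate D-training and G-training for the given number of stages.
    d_state, g_state = 0, 0
    schedule = []
    for _ in range(num_stages):
        d_state = train_discriminator(d_state, g_state)  # first: train D
        schedule.append("D")
        g_state = train_generator(g_state, d_state)      # then: train G
        schedule.append("G")
    return d_state, g_state, schedule
```

In practice each placeholder would itself loop over batches of third/fourth (or fifth/sixth) training comments and apply the corresponding adversarial loss.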
  • the semantic classification training stage S400 includes: training the generation network, the first branch network, the first classification network, the second branch network, and the second classification network.
  • the structure of the first branch network SN1 may be the same as the structure of the aforementioned first representation extractor EE1.
  • the first branch network SN1 is used to process comments on the first object to extract a single representation in the comment (whether to extract the common representation in the comment is not limited).
  • the structure of the second branch network SN2 may be the same as the structure of the aforementioned second representation extractor EE2.
  • the second branch network SN2 is used to process comments on the second object to extract a single representation in the comment (whether to extract the common representation in the comment is not limited).
  • the structures of the first classification network CN1 and the second classification network CN2 may be the same as those of the aforementioned first semantic classifier CC1 and the second semantic classifier CC2, respectively.
  • for the structural details and working principles of the first classification network CN1 and the second classification network CN2, please refer to the related descriptions of the first semantic classifier CC1 and the second semantic classifier CC2, which will not be repeated here.
  • FIG. 11 is a schematic block diagram of the training architecture corresponding to the semantic classification training phase of the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure.
  • FIG. 12 is a schematic flowchart of the training process of the semantic classification training phase in a training method provided by at least one embodiment of the present disclosure.
  • the training process of the semantic classification training phase will be described in detail below with reference to FIG. 11 and FIG. 12.
  • the semantic classification training stage S400 includes steps S401 to S405.
  • Step S401: Input the first training comment about the first object, use the generation network to process the first training comment to extract the first training common representation vector, use the first branch network to process the first training comment to extract the first training single representation vector, splice the first training common representation vector and the first training single representation vector to obtain the first training representation vector, and use the first classification network to process the first training representation vector to obtain the predicted category identifier of the semantic classification of the first training comment.
  • the first training comment is also derived from the comment sample set of the first object, and the embodiments of the present disclosure include but are not limited to this.
  • the first training comment has a category identifier T1 (that is, a true category identifier) of a certain semantic classification; for example, the true category identifier is represented in the form of a vector.
  • the true category identifier is a K-dimensional vector; when the k-th element of the K-dimensional vector is 1 and the other elements are 0, the K-dimensional vector represents the k-th true category identifier, where k is an integer and 1 ≤ k ≤ K.
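As a minimal sketch of the K-dimensional true category identifier described above (the function name is ours, not the patent's):

```python
def one_hot(k, K):
    # Build the K-dimensional true category identifier: the k-th element
    # (1-based, 1 <= k <= K) is 1 and all other elements are 0.
    assert 1 <= k <= K
    return [1 if i == k - 1 else 0 for i in range(K)]
```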
  • inputting the first training comment on the first object may include: mapping the first training comment to the first training original vector TP1. Therefore, processing the first training comment in the subsequent operation is processing the first training original vector TP1.
  • a word vector algorithm (for example, a deep neural network, the word2vec program, etc.) may be used to map each word in the first training comment to a vector of a specified length, so that the first training original vector TP1 includes all vectors obtained by mapping all the words in the first training comment. For example, the length of the vector corresponding to each word is the same.
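A toy sketch of this word-to-vector mapping: here a seeded random embedding table stands in for a trained word vector model such as word2vec, so all values are illustrative only:

```python
import random

def build_embedding(vocab, dim, seed=0):
    # Toy stand-in for a word vector algorithm (e.g. word2vec): map every
    # word in the vocabulary to a vector of the same specified length `dim`.
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}

def map_comment(comment_words, embedding):
    # The training original vector is the sequence of fixed-length vectors
    # obtained by mapping every word of the comment.
    return [embedding[w] for w in comment_words]
```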
  • step S401 can refer to the related descriptions of step S110 to step S150 of the aforementioned semantic classification method, which will not be repeated here.
  • the predicted category identifier of the first training review is a vector with the same dimension as its real category identifier.
  • the predicted category identifier of the first training review can be expressed in the form of the aforementioned vector, and each element in the vector represents the predicted probability of each category identifier.
  • the category identifier with the largest prediction probability is selected as the category identifier of the semantic classification.
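The splice-then-classify pipeline of step S401 can be sketched as follows, with a single linear layer plus softmax standing in for the first classification network CN1 (all weights and sizes are invented for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(common_vec, single_vec, weights, bias):
    # Splice the training common representation vector and the training
    # single representation vector into one training representation vector,
    # then apply a linear layer + softmax (toy classification network).
    rep = common_vec + single_vec  # concatenation
    logits = [sum(w * x for w, x in zip(row, rep)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    # The category identifier with the largest predicted probability is
    # selected as the semantic classification.
    return probs, max(range(len(probs)), key=probs.__getitem__)
```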
  • Step S402: Input the second training comment about the second object, use the generation network to process the second training comment to extract the second training common representation vector, use the second branch network to process the second training comment to extract the second training single representation vector, splice the second training common representation vector and the second training single representation vector to obtain the second training representation vector, and use the second classification network to process the second training representation vector to obtain the predicted category identifier of the semantic classification of the second training comment.
  • the second training comment is also derived from the comment sample set of the second object.
  • the embodiments of the present disclosure include but are not limited to this.
  • the second training review has a category identifier T2 (that is, a true category identifier) of a certain semantic classification.
  • the representation form of the true category identifier T2 of the second training review can refer to the representation form of the true category identifier T1 of the first training review, which will not be repeated here.
  • inputting the second training comment on the second object may include: mapping the second training comment to the second training original vector TP2. Therefore, processing the second training comment in the subsequent operation is processing the second training original vector TP2.
  • a word vector algorithm (for example, a deep neural network, the word2vec program, etc.) may be used, so that the second training original vector TP2 includes all vectors obtained by mapping all the words in the second training comment.
  • the length of the vector corresponding to each word in the second training comment is the same as the length of the vector corresponding to each word in the first training comment.
  • step S402 can refer to the related description of step S160 to step S200 of the aforementioned semantic classification method, which will not be repeated here.
  • the predicted category identifier of the second training review is a vector with the same dimension as its real category identifier.
  • the predicted category identifier of the second training review can also be expressed in the form of the aforementioned vector, and each element in the vector represents the predicted probability of each category identifier; for example, the category identifier with the largest predicted probability is selected as the category identifier of the semantic classification.
  • Step S403: Based on the predicted category identifier of the first training review and the predicted category identifier of the second training review, a system loss value is calculated through the system loss function.
  • the system loss function can be expressed as: L_obj = λ1·L(Y1, T1) + λ2·L(Y2, T2)
  • L_obj represents the system loss function
  • L( ⁇ , ⁇ ) represents the cross-entropy loss function
  • Y1 represents the prediction category identification of the first training review
  • T1 represents the true category identification of the first training review
  • L(Y1, T1) represents The cross-entropy loss function of the first training review
  • ⁇ 1 represents the weight of the cross-entropy loss function L(Y1, T1) of the first training review in the system loss function
  • Y2 represents the prediction category identifier of the second training review
  • T2 represents the true category identification of the second training review
  • L(Y2, T2) represents the cross entropy loss function of the second training review
  • ⁇ 2 represents the weight of the cross entropy loss function L(Y2, T2) of the second training review in the system loss function .
  • the cross-entropy loss function L(·,·) can be expressed as: L(Y, T) = -Σ_{i=1}^{N} Σ_{j=1}^{K} T_i^j·log(Y_i^j), where Y_i^j represents the probability value of the j-th category identifier in the predicted category identifier of the i-th training review, and T_i^j represents the probability value of the j-th category identifier in the true category identifier of the i-th training review.
  • Y and T are both formal parameters
  • N represents the number of training reviews (for example, the first training reviews or the second training reviews)
  • K represents the number of category identifiers for semantic classification.
  • the training goal of the semantic classification training stage S400 is to minimize the system loss value. For example, the smaller the value of the cross-entropy loss function L(Y1, T1) of the first training review, the closer the predicted category identifier of the first training review is to its true category identifier, that is, the more accurate the semantic classification of the first training review; similarly, the smaller the value of the cross-entropy loss function L(Y2, T2) of the second training review, the closer the predicted category identifier of the second training review is to its true category identifier, that is, the more accurate the semantic classification of the second training review.
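A numeric sketch of the system loss described above, with the cross-entropy summed over training reviews and category identifiers; the weight values λ1 = λ2 = 0.5 are our assumption, not from the patent:

```python
import math

def cross_entropy(Y, T):
    # L(Y, T) = -sum_i sum_j T_i^j * log(Y_i^j), where Y holds the
    # predicted probability vectors and T the true (one-hot) vectors
    # for N training reviews over K category identifiers.
    return -sum(t * math.log(y)
                for Yi, Ti in zip(Y, T)
                for y, t in zip(Yi, Ti))

def system_loss(Y1, T1, Y2, T2, lam1=0.5, lam2=0.5):
    # L_obj = lam1 * L(Y1, T1) + lam2 * L(Y2, T2); lam1 and lam2 are
    # hyperparameter weights (values here are illustrative assumptions).
    return lam1 * cross_entropy(Y1, T1) + lam2 * cross_entropy(Y2, T2)
```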
  • Step S404 Correct the parameters of the generating network, the first branch network, the first classification network, the second branch network, and the second classification network according to the system loss value.
  • the initial parameters of the first branch network SN1, the first classification network CN1, the second branch network SN2, and the second classification network CN2 may be random numbers, for example, the random numbers conform to a Gaussian distribution.
  • the initial parameters of the first branch network SN1, the first classification network CN1, the second branch network SN2, and the second classification network CN2 may also be trained parameters in databases commonly used in the art. The embodiment of the present disclosure does not limit this.
  • the training process of the semantic classification training stage S400 may also include an optimization function (not shown in FIG. 11).
  • the optimization function may calculate, according to the system loss value calculated by the system loss function, the error values of the parameters of the generation network G, the first branch network SN1, the first classification network CN1, the second branch network SN2, and the second classification network CN2.
  • the optimization function can use stochastic gradient descent (SGD) algorithm, batch gradient descent (BGD) algorithm, etc. to calculate the generation network G, the first branch network SN1, the first classification network CN1, and the second branch The error value of the parameters of the network SN2 and the second classification network CN2.
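The gradient-based parameter correction can be sketched as a plain SGD update; the quadratic example below is purely illustrative and not the patent's objective:

```python
def sgd_step(params, grads, lr=0.1):
    # One stochastic-gradient-descent correction: move each parameter
    # against its error gradient, scaled by the learning rate.
    return [p - lr * g for p, g in zip(params, grads)]

def sgd_minimize(param, grad_fn, lr=0.1, steps=50):
    # Repeatedly apply the SGD correction to a single scalar parameter.
    for _ in range(steps):
        param = param - lr * grad_fn(param)
    return param
```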
  • the semantic classification training stage S400 may also include: judging whether the training of the generation network, the first branch network, the first classification network, the second branch network, and the second classification network meets a predetermined condition; if the predetermined condition is not met, the training process of the semantic classification training stage S400 is repeated; if the predetermined condition is met, the current training process of the semantic classification training stage S400 is stopped, and the generation network, first branch network, first classification network, second branch network, and second classification network trained at the current stage are obtained.
  • for example, the foregoing predetermined condition is that the system loss values corresponding to two consecutive pairs of comments (for example, in the training process of the semantic classification training stage S400, each pair of comments includes a first training comment and a second training comment) no longer decrease significantly.
  • for another example, the foregoing predetermined condition is that the number of training iterations or training epochs of the semantic classification training stage S400 reaches a predetermined number; the embodiments of the present disclosure do not limit this.
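The predetermined stopping condition can be sketched as follows; the tolerance and the exact "no longer decreases significantly" test are our assumptions:

```python
def should_stop(loss_history, epoch, max_epochs, tol=1e-4):
    # Stop when the system loss for two consecutive comment pairs no
    # longer decreases significantly, or when a maximum number of
    # training epochs is reached (thresholds are illustrative).
    if epoch >= max_epochs:
        return True
    if len(loss_history) >= 3:
        d1 = loss_history[-2] - loss_history[-1]  # latest improvement
        d2 = loss_history[-3] - loss_history[-2]  # previous improvement
        return d1 < tol and d2 < tol
    return False
```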
  • it should be noted that the above description only schematically illustrates the training process of the semantic classification training stage S400.
  • generally, this training stage involves a large number of sample comments (that is, comments on the first object and comments on the second object).
  • the training can also include multiple iterations in which the parameters of the networks are modified.
  • the training process of the semantic classification training stage S400 also includes fine-tuning the parameters of the generation network, the first branch network, the first classification network, the second branch network, and the second classification network to obtain more optimized parameters.
  • the generating confrontation training stage S300 and the semantic classification stage S400 are alternately iteratively performed, wherein the generating network G participates in the training of these two training stages at the same time.
  • the generative adversarial training stage S300 can improve the ability of the generation network G to extract common representations; at the same time, however, the generation network G may also extract words in the first training comment and the second training comment that are not related to semantic classification. For example, the semantic classification training stage S400 can enable the generation network G to learn to filter out these words that are not related to semantic classification, thereby helping to improve the accuracy of semantic classification and the operating efficiency of the neural network.
  • the above neural network training method can train the neural network, where the trained generation network G, first branch network SN1, second branch network SN2, first classification network CN1, and second classification network CN2 can be used to implement the functions of the common representation extractor EE0, the first representation extractor EE1, the second representation extractor EE2, the first semantic classifier CC1, and the second semantic classifier CC2 in the aforementioned semantic classification method, and thus can be used to perform the aforementioned semantic classification method.
  • FIG. 13 is a schematic block diagram of a semantic classification device provided by at least one embodiment of the present disclosure.
  • the semantic classification device 500 includes a memory 510 and a processor 520.
  • the memory 510 is used for non-transitory storage of computer readable instructions
  • the processor 520 is used for running the computer readable instructions.
  • the semantic classification method provided by any embodiment of the present disclosure is executed.
  • the neural network training method provided in any embodiment of the present disclosure may also be executed.
  • the memory 510 and the processor 520 may directly or indirectly communicate with each other.
  • components such as the memory 510 and the processor 520 may communicate through a network connection.
  • the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the network may include a local area network, the Internet, a telecommunication network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunication network, and/or any combination of the above networks, and so on.
  • the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the present disclosure does not limit the types and functions of the network here.
  • the processor 520 may control other components in the semantic classification apparatus to perform desired functions.
  • the processor 520 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device with data processing capabilities and/or program execution capabilities.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
  • the GPU can also be built into the central processing unit (CPU).
  • the memory 510 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • one or more computer instructions may be stored in the memory 510, and the processor 520 may execute the computer instructions to implement various functions.
  • Various application programs and various data can also be stored in the computer-readable storage medium, such as the comment sample set of the first object, the comment sample set of the second object, the first original vector, the second original vector, as well as various data used and/or generated by the application programs, etc.
  • one or more steps in the semantic classification method described above may be executed.
  • one or more steps in the neural network training method described above may be executed.
  • it should be noted that the semantic classification device provided by the embodiments of the present disclosure is exemplary rather than restrictive; according to actual application requirements, the semantic classification device may also include other conventional components or structures. For example, to implement the necessary functions of the semantic classification device, those skilled in the art may set other conventional components or structures according to specific application scenarios, which are not limited in the embodiments of the present disclosure.
  • FIG. 14 is a schematic block diagram of a neural network training device provided by at least one embodiment of the present disclosure.
  • the neural network training device 500' includes a memory 510' and a processor 520'.
  • the memory 510' is used for non-transitory storage of computer-readable instructions
  • the processor 520' is used for running the computer-readable instructions
  • when the computer-readable instructions are run by the processor 520', the neural network training method provided by any embodiment of the present disclosure is executed.
  • the semantic classification method provided in any embodiment of the present disclosure can also be executed.
  • the memory 510' and the processor 520' respectively have functions and settings similar to the above-mentioned memory 510 and the processor 520, which have been described in detail above and will not be repeated here.
  • FIG. 15 is a schematic diagram of a storage medium provided by an embodiment of the present disclosure.
  • the storage medium 600 non-transitorily stores computer-readable instructions 601.
  • when the non-transitory computer-readable instructions are executed by a computer, the method provided by any embodiment of the present disclosure can be executed.
  • for example, the instructions of the semantic classification method or the instructions of the neural network training method provided by any embodiment of the present disclosure can be executed; it is also possible to execute the semantic classification method provided by any embodiment of the present disclosure after executing the instructions of the neural network training method provided by any embodiment of the present disclosure.
  • one or more computer instructions may be stored on the storage medium 600.
  • Some computer instructions stored on the storage medium 600 may be, for example, instructions for implementing one or more steps in the above semantic classification method.
  • the other computer instructions stored on the storage medium may be, for example, instructions for implementing one or more steps in the above-mentioned neural network training method.
  • the storage medium may include the storage component of a tablet computer, the hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, any combination of the above storage media, or other suitable storage media.


Abstract

A semantic classification method and apparatus, a neural network training method and apparatus, and a storage medium. The semantic classification method includes: inputting a first comment about a first object; processing the first comment with a common representation extractor to extract a first common representation vector characterizing the common representation in the first comment; processing the first comment with a first representation extractor to extract a first single representation vector characterizing the single representation in the first comment; splicing the first common representation vector and the first single representation vector to obtain a first representation vector; and processing the first representation vector with a first semantic classifier to obtain the semantic classification of the first comment; where the common representation includes meaning expressions used both to comment on the first object and to comment on a second object, the second object is an associated comment object different from the first object, and the single representation of the first comment includes meaning expressions used only to comment on the first object.

Description

Neural network training method and apparatus, semantic classification method and apparatus, and medium
This application claims priority to Chinese Patent Application No. 201910863457.8, filed on September 9, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to a neural network training method, a neural network training apparatus, a semantic classification method, a semantic classification apparatus, and a storage medium.
Background
Artificial intelligence is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence covers the design principles and implementation methods of various intelligent machines, endowing machines with the functions of perception, reasoning, and decision-making. Artificial intelligence technology can be applied to the field of natural language processing (NLP). NLP is an interdisciplinary field of computer science, artificial intelligence, and information engineering, involving knowledge of statistics, linguistics, and so on; its goal is to enable computers to process or "understand" natural language, so as to perform tasks such as text classification, language translation, and question answering.
Summary
At least one embodiment of the present disclosure provides a semantic classification method, including: inputting a first comment about a first object; processing the first comment with a common representation extractor to extract a first common representation vector characterizing the common representation in the first comment; processing the first comment with a first representation extractor to extract a first single representation vector characterizing the single representation in the first comment; splicing the first common representation vector and the first single representation vector to obtain a first representation vector; and processing the first representation vector with a first semantic classifier to obtain the semantic classification of the first comment; where the common representation includes meaning expressions used both to comment on the first object and to comment on a second object, the second object is an associated comment object different from the first object, and the single representation of the first comment includes meaning expressions used only to comment on the first object.
For example, the semantic classification method provided by some embodiments of the present disclosure further includes: mapping the first comment to a first original vector; where processing the first comment with the common representation extractor includes: processing the first original vector with the common representation extractor; and processing the first comment with the first representation extractor includes: processing the first original vector with the first representation extractor.
For example, in the semantic classification method provided by some embodiments of the present disclosure, mapping the first comment to the first original vector includes: using a word vector algorithm to map each word in the first comment to a vector of a specified length to obtain the first original vector.
For example, in the semantic classification method provided by some embodiments of the present disclosure, the common representation extractor and the first representation extractor each include one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network, and the first semantic classifier includes a softmax classifier.
For example, the semantic classification method provided by some embodiments of the present disclosure further includes: inputting a second comment about the second object; processing the second comment with the common representation extractor to extract a second common representation vector characterizing the common representation in the second comment; processing the second comment with a second representation extractor to extract a second single representation vector characterizing the single representation in the second comment; splicing the second common representation vector and the second single representation vector to obtain a second representation vector; and processing the second representation vector with a second semantic classifier to obtain the semantic classification of the second comment; where the single representation of the second comment includes meaning expressions used only to comment on the second object.
For example, the semantic classification method provided by some embodiments of the present disclosure further includes: mapping the second comment to a second original vector; where processing the second comment with the common representation extractor includes: processing the second original vector with the common representation extractor; and processing the second comment with the second representation extractor includes: processing the second original vector with the second representation extractor.
For example, in the semantic classification method provided by some embodiments of the present disclosure, mapping the second comment to the second original vector includes: using a word vector algorithm to map each word in the second comment to a vector of a specified length to obtain the second original vector.
For example, in the semantic classification method provided by some embodiments of the present disclosure, the second representation extractor includes one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network, and the second semantic classifier includes a softmax classifier.
For example, in the semantic classification method provided by some embodiments of the present disclosure, the corpus sources of the first comment and the second comment include at least one of text and speech.
At least one embodiment of the present disclosure further provides a neural network training method. The neural network includes: a generation network, a first branch network, a first classification network, a second branch network, and a second classification network; the training method includes a semantic classification training stage; where the semantic classification training stage includes: inputting a first training comment about a first object, processing the first training comment with the generation network to extract a first training common representation vector, processing the first training comment with the first branch network to extract a first training single representation vector, splicing the first training common representation vector and the first training single representation vector to obtain a first training representation vector, and processing the first training representation vector with the first classification network to obtain a predicted category identifier of the semantic classification of the first training comment; inputting a second training comment about a second object, processing the second training comment with the generation network to extract a second training common representation vector, processing the second training comment with the second branch network to extract a second training single representation vector, splicing the second training common representation vector and the second training single representation vector to obtain a second training representation vector, and processing the second training representation vector with the second classification network to obtain a predicted category identifier of the semantic classification of the second training comment; calculating a system loss value through a system loss function based on the predicted category identifier of the first training comment and the predicted category identifier of the second training comment; and correcting the parameters of the generation network, the first branch network, the first classification network, the second branch network, and the second classification network according to the system loss value; where the first object and the second object are associated comment objects.
For example, in the training method provided by some embodiments of the present disclosure, the semantic classification training stage further includes: mapping the first training comment to a first training original vector, and mapping the second training comment to a second training original vector; where processing the first training comment with the generation network includes: processing the first training original vector with the generation network; processing the first training comment with the first branch network includes: processing the first training original vector with the first branch network; processing the second training comment with the generation network includes: processing the second training original vector with the generation network; and processing the second training comment with the second branch network includes: processing the second training original vector with the second branch network.
For example, in the training method provided by some embodiments of the present disclosure, mapping the first training comment to the first training original vector includes: using a word vector algorithm to map each word in the first training comment to a vector of a specified length to obtain the first training original vector; and mapping the second training comment to the second training original vector includes: using the word vector algorithm to map each word in the second training comment to a vector of the specified length to obtain the second training original vector.
For example, in the training method provided by some embodiments of the present disclosure, the generation network, the first branch network, and the second branch network each include one of a recurrent neural network, a long short-term memory network, and a bidirectional long short-term memory network, and the first classification network and the second classification network each include a softmax classifier.
For example, in the training method provided by some embodiments of the present disclosure, the system loss function is expressed as:

L_obj = λ1·L(Y1, T1) + λ2·L(Y2, T2)

where L_obj denotes the system loss function, L(·,·) denotes the cross-entropy loss function, Y1 denotes the predicted category identifier of the first training comment, T1 denotes the true category identifier of the first training comment, L(Y1, T1) denotes the cross-entropy loss function of the first training comment, λ1 denotes the weight of the cross-entropy loss function L(Y1, T1) of the first training comment in the system loss function, Y2 denotes the predicted category identifier of the second training comment, T2 denotes the true category identifier of the second training comment, L(Y2, T2) denotes the cross-entropy loss function of the second training comment, and λ2 denotes the weight of the cross-entropy loss function L(Y2, T2) of the second training comment in the system loss function;
the cross-entropy loss function L(·,·) is expressed as:

L(Y, T) = -Σ_{i=1}^{N} Σ_{j=1}^{K} T_i^j·log(Y_i^j)

where Y and T are both formal parameters, N denotes the number of training comments, K denotes the number of category identifiers of the semantic classification, Y_i^j denotes the probability value of the j-th category identifier in the predicted category identifier of the i-th training comment, and T_i^j denotes the probability value of the j-th category identifier in the true category identifier of the i-th training comment.
For example, in the training method provided by some embodiments of the present disclosure, the neural network further includes a discrimination network; the training method further includes: a generative adversarial training stage; and alternately performing the generative adversarial training stage and the semantic classification training stage; where the generative adversarial training stage includes: training the discrimination network based on the generation network; training the generation network based on the discrimination network; and alternately performing the above training processes to complete the training of the generative adversarial training stage.
For example, in the training method provided by some embodiments of the present disclosure, training the discrimination network based on the generation network includes: inputting a third training comment about the first object, processing the third training comment with the generation network to extract a third training common representation vector, and processing the third training common representation vector with the discrimination network to obtain a third training output; inputting a fourth training comment about the second object, processing the fourth training comment with the generation network to extract a fourth training common representation vector, and processing the fourth training common representation vector with the discrimination network to obtain a fourth training output; calculating a discrimination network adversarial loss value through a discrimination network adversarial loss function based on the third training output and the fourth training output; and correcting the parameters of the discrimination network according to the discrimination network adversarial loss value.
For example, in the training method provided by some embodiments of the present disclosure, the discrimination network includes a two-class softmax classifier.
For example, in the training method provided by some embodiments of the present disclosure, the discrimination network adversarial loss function is expressed as:

L_D = -E_{z1~P_data(z1)}[log D(G(z1))] - E_{z2~P_data(z2)}[log(1 - D(G(z2)))]

where L_D denotes the discrimination network adversarial loss function, z1 denotes the third training comment, P_data(z1) denotes the set of third training comments, G(z1) denotes the third training common representation vector, D(G(z1)) denotes the third training output, E_{z1~P_data(z1)}[·] denotes the expectation over the set of third training comments, z2 denotes the fourth training comment, P_data(z2) denotes the set of fourth training comments, G(z2) denotes the fourth training common representation vector, D(G(z2)) denotes the fourth training output, and E_{z2~P_data(z2)}[·] denotes the expectation over the set of fourth training comments.
For example, in the training method provided by some embodiments of the present disclosure, training the generation network based on the discrimination network includes: inputting a fifth training comment about the first object, processing the fifth training comment with the generation network to extract a fifth training common representation vector, and processing the fifth training common representation vector with the discrimination network to obtain a fifth training output; inputting a sixth training comment about the second object, processing the sixth training comment with the generation network to extract a sixth training common representation vector, and processing the sixth training common representation vector with the discrimination network to obtain a sixth training output; calculating a generation network adversarial loss value through a generation network adversarial loss function based on the fifth training output and the sixth training output; and correcting the parameters of the generation network according to the generation network adversarial loss value.
For example, in the training method provided by some embodiments of the present disclosure, the generation network adversarial loss function can be expressed as:

L_G = -E_{z3~P_data(z3)}[log(1 - D(G(z3)))] - E_{z4~P_data(z4)}[log D(G(z4))]

where L_G denotes the generation network adversarial loss function, z3 denotes the fifth training comment, P_data(z3) denotes the set of fifth training comments, G(z3) denotes the fifth training common representation vector, D(G(z3)) denotes the fifth training output, E_{z3~P_data(z3)}[·] denotes the expectation over the set of fifth training comments, z4 denotes the sixth training comment, P_data(z4) denotes the set of sixth training comments, G(z4) denotes the sixth training common representation vector, D(G(z4)) denotes the sixth training output, and E_{z4~P_data(z4)}[·] denotes the expectation over the set of sixth training comments.
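A minimal numeric sketch of the two adversarial loss functions, approximating each expectation by a sample mean over invented discriminator outputs (all values are illustrative only):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def discriminator_loss(d_out_first, d_out_second):
    # L_D: the discrimination network is rewarded when its output for
    # common representations from comments on the first object (third
    # training comments) approaches 1 and for those from the second
    # object (fourth training comments) approaches 0.
    return (-mean([math.log(p) for p in d_out_first])
            - mean([math.log(1 - p) for p in d_out_second]))

def generator_loss(d_out_first, d_out_second):
    # L_G: with the object labels swapped, the generation network is
    # rewarded when D's output for fifth training comments approaches 0
    # and for sixth training comments approaches 1.
    return (-mean([math.log(1 - p) for p in d_out_first])
            - mean([math.log(p) for p in d_out_second]))
```

At the Nash equilibrium described in the text (D outputs 0.5 for every common representation), both losses take the same value, 2·log 2.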
At least one embodiment of the present disclosure further provides a semantic classification apparatus, including: a memory for storing non-transitory computer-readable instructions; and a processor for running the computer-readable instructions. When the computer-readable instructions are run by the processor, the semantic classification method provided by any embodiment of the present disclosure is executed.
At least one embodiment of the present disclosure further provides a neural network training apparatus, including: a memory for storing non-transitory computer-readable instructions; and a processor for running the computer-readable instructions. When the computer-readable instructions are run by the processor, the training method provided by any embodiment of the present disclosure is executed.
At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the non-transitory computer-readable instructions are executed by a computer, the instructions of the semantic classification method provided by any embodiment of the present disclosure can be executed.
At least one embodiment of the present disclosure further provides a storage medium that non-transitorily stores computer-readable instructions; when the non-transitory computer-readable instructions are executed by a computer, the instructions of the training method provided by any embodiment of the present disclosure can be executed.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below only relate to some embodiments of the present disclosure and are not a limitation of the present disclosure.
FIG. 1 is a flowchart of a semantic classification method provided by at least one embodiment of the present disclosure;
FIG. 2 is an exemplary flow block diagram of the semantic classification method shown in FIG. 1;
FIG. 3 is a flowchart of another semantic classification method provided by at least one embodiment of the present disclosure;
FIG. 4 is an exemplary flow block diagram of the semantic classification method shown in FIG. 3;
FIG. 5 is a schematic architecture block diagram of a neural network provided by at least one embodiment of the present disclosure;
FIG. 6 is a flowchart of a neural network training method provided by at least one embodiment of the present disclosure;
FIG. 7 is a schematic training architecture block diagram of the discrimination network in the generative adversarial training stage corresponding to the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
FIG. 8 is a schematic flowchart of a process of training the discrimination network provided by at least one embodiment of the present disclosure;
FIG. 9 is a schematic training architecture block diagram of the generation network in the generative adversarial training stage corresponding to the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
FIG. 10 is a schematic flowchart of a process of training the generation network provided by at least one embodiment of the present disclosure;
FIG. 11 is a schematic training architecture block diagram of the semantic classification training stage corresponding to the training method shown in FIG. 6, provided by at least one embodiment of the present disclosure;
FIG. 12 is a schematic flowchart of the training process of the semantic classification training stage in a training method provided by at least one embodiment of the present disclosure;
FIG. 13 is a schematic block diagram of a semantic classification apparatus provided by at least one embodiment of the present disclosure;
FIG. 14 is a schematic block diagram of a neural network training apparatus provided by at least one embodiment of the present disclosure; and
FIG. 15 is a schematic diagram of a storage medium provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the described embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by persons with ordinary skill in the field to which the present disclosure belongs. "First," "second," and similar words used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. "Include," "comprise," and similar words mean that the elements or items preceding the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connected," "linked," and similar words are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up," "down," "left," "right," and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
The present disclosure is described below through several specific embodiments. In order to keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components are omitted. When any component of an embodiment of the present disclosure appears in more than one drawing, the component is denoted by the same or similar reference numeral in each drawing.
For example, with the development of the Internet, people pay increasing attention to health, and medical forums have widely emerged. In medical forums, people participate in topics of interest and discuss and communicate with other forum members. Therefore, a large amount of user-generated subjective text exists on medical forums; for example, based on their own medical experiences, users post comments about hospitals, doctors, or drug treatments. Analyzing user comments in medical forums has a wide range of application scenarios. For users, when they have health care needs, they can look up relevant comment information on medical forums, for example, learn from other patients' comments about treatment experiences and attitudes toward the hospitals, doctors, and drugs involved in the treatment, and then make treatment decisions based on this information. Hospitals and doctors can also benefit from patient feedback; for example, hospitals can use these comments to improve service quality, ease tense doctor-patient relationships, and increase their reputation.
Taking comments about hospitals and doctors as an example, users' comments can be divided, according to the comment objects to which they apply, into: comments used only to evaluate hospitals, such as "well-equipped departments"; comments used only to evaluate doctors, such as "superb medical skills"; and comments that can be used to evaluate both hospitals and doctors, such as "attentive service". In the present disclosure, comments that can be used to evaluate different comment objects are called common representations; comments used only to evaluate a single comment object are called single representations.
Comments about hospitals and doctors can be semantically classified according to their content; for example, they can be divided into positive, neutral, and negative comments. When semantically classifying comments about hospitals and doctors, if the common representations and single representations in the comments can be extracted so that the semantic classification is based on more effective information, it will help improve the objectivity and accuracy of comment analysis.
It should be noted that, in the present disclosure, the two comment objects hospital and doctor are defined as associated comment objects of each other, that is, the hospital is the associated comment object of the doctor, and the doctor is the associated comment object of the hospital; similarly, other cases of mutually associated comment objects may include schools and teachers, food delivery platforms and food delivery merchants, and so on. For example, there may be some interdependent relationship between two associated comment objects, but this is not limited; for example, one comment object may be a component (for example, an employee), a service provider, or a supplier (for example, a food delivery service) of the other comment object; for another example, the quality of comments on one of the two associated comment objects may, to a certain extent, reflect the quality of comments on the other.
本公开至少一实施例提供一种语义分类方法。该语义分类方法包括:输入关于第一对象的第一评论;使用共同表示提取器对第一评论进行处理,以提取用于表征第一评论中的共同表示的第一共同表示向量;使用第一表示提取器对第一评论进行处理,以提取用于表征第一评论中的单一表示的第一单一表示向量;将第一共同表示向量和第一单一表示向量进行拼接,以得到第一表示向量;以及使用第一语义分类器对第一表示向量进行处理,以得到第一评论的语义分类;其中,共同表示包括既用于评论第一对象又用于评论第二对象的意思表示,第二对象为与第一对象不同的关联评论对象,第一评论的单一表示包括仅用于评论第一对象的意思表示。
本公开的一些实施例还提供对应于上述语义分类方法的语义分类装置、神经网络的训练方法、对应于神经网络的训练方法的装置、以及存储介质。
本公开至少一实施例提供的语义分类方法，可以提取关于第一对象的第一评论中的共同表示和单一表示，并基于共同表示和单一表示对该第一评论进行语义分类，有助于提高评论分析的客观性和准确率。
下面结合附图对本公开的一些实施例及其示例进行详细说明。
图1为本公开至少一实施例提供的一种语义分类方法的流程图,图2为图1所示的语义分类方法的示例性流程框图。
例如,如图1所示,该语义分类方法包括步骤S110至步骤S150。以下结合图2,对图1所示的语义分类方法进行详细说明。
步骤S110:输入关于第一对象的第一评论。
例如,在步骤S110中,第一对象可以为任意一种评论对象,例如医院、医生、学校、老师、外卖平台、外卖商家等,本公开的实施例对此不作限制。例如,第一评论可以来源于与第一对象有关的论坛等。
例如,第一评论的语料来源可以包括文本、语音、图片(例如表情图标)等,例如语音、图片等可以通过人工方式或者人工智能方式转换为文本。
例如,第一评论的语言可以包括汉语、英语、日语、德语、韩语等,本公开的实施例对此不作限制。例如,在一些示例中,该语义分类方法可以处理一种或多种预先确定的语言,对于其他语言(不属于该一种或多种预先确定的语言)的第一评论,可以经过翻译(例如,翻译成预先确定的语言)后再进行处理。
例如,在一些示例中,如图2所示,输入关于第一对象的第一评论,即步骤S110可以包括:将第一评论映射为第一原始向量P1。从而,在后续步骤中对第一评论进行处理就是对第一原始向量P1进行处理。例如,可以采用词向量算法(例如,深度神经网络、word2vec程序等)将第一评论中的每个字映射为指定长度的向量,从而第一原始向量P1包括第一评论中的全部字经过映射得到的全部向量。例如,每个字对应的向量的长度相同。需要说明的是,在本公开的实施例中,向量的长度是指该向量包括的元素的数目。
例如,以一条包括n个字(x1,x2,…,xn)的第一评论为例,可以采用词向量算法将该第一评论中的n个字分别映射为向量Vx1、Vx2、…、Vxn,由此得到第一原始向量P1(Vx1,Vx2,…,Vxn),其中,Vx1、Vx2、…、Vxn具有相同的长度。需要说明的是,从数学角度而言,第一原始向量具有矩阵形式。
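例如，“将评论中的每个字映射为指定长度的向量”这一步可以用如下纯Python草图示意（其中的字向量表由随机数构造，仅为示例性假设，并非真实的word2vec训练结果）：

```python
import random

def build_char_vector_table(chars, dim, seed=0):
    # 为每个字生成一个长度为 dim 的随机向量，仅作示例；
    # 实际中可由 word2vec 等词向量算法训练得到
    rng = random.Random(seed)
    return {c: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for c in sorted(chars)}

def map_comment_to_raw_vector(comment, table, dim):
    # 将评论中的每个字映射为定长向量；
    # 得到的“原始向量”从数学角度而言具有矩阵形式
    zero = [0.0] * dim  # 表外字以零向量占位（示例性处理方式）
    return [table.get(c, zero) for c in comment]

comment = "科室齐全"
table = build_char_vector_table(set(comment), dim=8)
raw_vector = map_comment_to_raw_vector(comment, table, dim=8)
```

这里仅演示“每个字对应一个等长向量、整条评论对应一个向量序列”的数据形态，向量维度与取值均为示例。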
步骤S120：使用共同表示提取器对第一评论进行处理，以提取用于表征第一评论中的共同表示的第一共同表示向量。
例如,在步骤S120中,共同表示提取器可以采用基于时间序列上样本关系的模型,例如,包括但不限于,循环神经网络(Recurrent Neural Network,RNN)、长短期记忆网络(Long Short Term Memory,LSTM)、双向长短期记忆网络(Bi-directional Long Short Term Memory,Bi-LSTM)等。
例如,在一些示例中,如图2所示,在将第一评论映射为第一原始向量P1后,使用共同表示提取器EE0对该第一原始向量P1进行处理,以提取第一共同表示向量P01。例如,以共同表示提取器EE0采用LSTM模型为例,LSTM包括依次连接的多个处理单元(cell),将第一原始向量P1(Vx1,Vx2,…,Vxn)中的n个向量Vx1、Vx2、…、Vxn分别作为LSTM的前n个处理单元的输入,LSTM的第n个处理单元的输出即为第一共同表示向量P01。需要说明的是,这里LSTM包括的处理单元的数目大于或等于其处理的最长的第一评论的字数。
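例如，“将n个字向量依次送入LSTM的处理单元、取最后一个处理单元的输出作为共同表示向量”的数据流向可以用如下极简LSTM草图示意（权重为随机示例值而非训练所得，维度亦经简化，仅用于说明原理）：

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_last_hidden(inputs, hidden_dim, seed=0):
    # inputs：字向量序列（n个定长向量）；返回第n个处理单元的隐状态，
    # 对应文中的“共同表示向量”。权重为随机示例值，并非训练所得。
    rng = random.Random(seed)
    in_dim = len(inputs[0])

    def weights():
        # 每个门对应 hidden_dim 组权重，每组作用于“输入+上一隐状态”的拼接
        return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim + hidden_dim)]
                for _ in range(hidden_dim)]

    Wi, Wf, Wo, Wg = weights(), weights(), weights(), weights()
    h = [0.0] * hidden_dim  # 隐状态
    c = [0.0] * hidden_dim  # 细胞状态
    for x in inputs:
        xc = list(x) + h  # 当前字向量与上一隐状态拼接
        for k in range(hidden_dim):
            dot = lambda w: sum(a * b for a, b in zip(w, xc))
            i = sigmoid(dot(Wi[k]))      # 输入门
            f = sigmoid(dot(Wf[k]))      # 遗忘门
            o = sigmoid(dot(Wo[k]))      # 输出门
            g = math.tanh(dot(Wg[k]))    # 候选状态
            c[k] = f * c[k] + i * g
            h[k] = o * math.tanh(c[k])
    return h

inputs = [[0.1] * 4, [0.2] * 4, [0.3] * 4]   # 3个字、每字4维的示例序列
common_repr = lstm_last_hidden(inputs, hidden_dim=3)
```

实际实现中通常直接使用深度学习框架提供的LSTM模块，此处的逐门计算仅为展示“序列输入、末位隐状态输出”的结构。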
例如，共同表示包括既用于评论第一对象又用于评论第二对象的意思表示，其中，第二对象为与第一对象不同的关联评论对象。例如，在一些示例中，以第一对象为医院、第二对象为医生为例，在此情况下，共同表示包括“服务周到”、“干净”之类的既可以用于评价医院又可以用于评价医生的评语，又或者是不参考上下文就不能用于区分是用于评价医院还是用于评价医生的评语。
例如,共同表示提取器EE0可以经过后续将要介绍的训练方法训练得到,从而可以实现提取第一评论以及第二评论中的共同表示的功能,需要说明的是,本公开的实施例包括但不限于此。
步骤S130:使用第一表示提取器对第一评论进行处理,以提取用于表征第一评论中的单一表示的第一单一表示向量。
例如，在步骤S130中，第一表示提取器也可以采用基于时间序列上样本关系的模型，例如，循环神经网络（RNN）、长短期记忆网络（LSTM）、双向长短期记忆网络（Bi-LSTM）等。例如，第一表示提取器可以采用与共同表示提取器同一种类的模型。
例如,在一些示例中,如图2所示,在将第一评论映射为第一原始向量P1后,使用第一表示提取器EE1对该第一原始向量P1进行处理,以提取第一单一表示向量P11。例如,第一表示提取器EE1对第一原始向量P1进行处理的过程可以参考共同表示提取器EE0对第一原始向量P1进行处理的过程,在此不再赘述。
例如,第一评论中的单一表示包括仅用于评论第一对象的意思表示,也就是说,该意思表示不用于评论第二对象(即与第一对象不同的关联评论对象)。例如,在一些示例中,以第一对象为医院、第二对象为医生为例,在此情况下,第一评论中的单一表示包括“科室齐全”、“设备先进”之类的仅能用于评价医院而不能用于评价医生的评语。
需要说明的是,在本公开的实施例中,第一单一表示向量P11包括第一评论中的单一表示的信息;除此之外,第一单一表示向量P11还可以包括(当然,或者也可以不包括)第一评论中的共同表示的信息;需要说明的是,本公开的实施例对此不作限制。
例如,第一表示提取器EE1可以经过后续将要介绍的训练方法训练得到,从而可以实现提取第一评论中的单一表示的功能,需要说明的是,本公开的实施例包括但不限于此。
步骤S140:将第一共同表示向量和第一单一表示向量进行拼接,以得到第一表示向量。
例如,如图2所示,将第一共同表示向量P01和第一单一表示向量P11进行拼接,以得到第一表示向量P10。假设第一共同表示向量P01包括s个元素(a1,a2,…,as),第一单一表示向量P11包括t个元素(b1,b2,…,bt),则将第一共同表示向量P01和第一单一表示向量P11进行拼接,就是将该s+t个元素按照预定顺序拼接。例如,可以拼接为(a1,…,as,b1,…,bt)或(b1,…,bt,a1,…,as)等形式,以得到第一表示向量P10。需要说明的是,本公开的实施例对第一表示向量P10中的各个元素的排列顺序不作限制,只要第一表示向量P10包括第一共同表示向量P01和第一单一表示向量P11中的全部元素即可。
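例如，上述拼接操作可以示意如下（示例数据为随意构造的小向量）：

```python
def concat_representation(common_vec, single_vec):
    # 将共同表示向量与单一表示向量按预定顺序拼接为一个表示向量
    return list(common_vec) + list(single_vec)

p01 = [0.1, 0.2, 0.3]   # 第一共同表示向量（s=3，示例数据）
p11 = [0.4, 0.5]        # 第一单一表示向量（t=2，示例数据）
p10 = concat_representation(p01, p11)   # 第一表示向量，长度为s+t=5
```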
步骤S150:使用第一语义分类器对第一表示向量进行处理,以得到第一评论的语义分类。
例如，如图2所示，使用第一语义分类器CC1对第一表示向量P10进行处理，以得到第一评论的语义分类。例如，第一语义分类器CC1可以包括softmax分类器，该softmax分类器例如包括全连接层。例如，第一表示向量经过全连接层处理后，得到一个K维（即包括K个元素，对应K个类别标识）向量z，向量z中的元素可以为任意实数；softmax分类器可以将K维向量z压缩成另一个K维向量σ(z)。softmax分类器的公式如下：
σ(z)_j = e^{z_j} / (Σ_{k=1}^{K} e^{z_k})，j=1,2,…,K
其中，z_j表示K维向量z中第j个元素，σ(z)_j表示第j个类别标识（label）的预测概率，σ(z)_j为实数，且其范围为(0,1)，K维向量σ(z)的全部元素之和为1。根据以上公式，K维向量z中的每个类别标识均被赋予一定的预测概率，而具有最大预测概率的类别标识被选择作为语义分类的类别标识。
应当理解的是,语义分类的类别标识的种类数量即为K,例如K为大于或等于2的整数。例如,在一些示例中,K=3,从而第一评论可以划分为例如好评、中评、差评,需要说明的是,本公开的实施例包括但不限于此。
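例如，softmax分类器“将K维向量z压缩为K维概率向量并取最大预测概率的类别标识”的计算可以示意如下（类别标识仅为示例，代码采用了先减去最大值的数值稳定写法，结果与定义式一致）：

```python
import math

def softmax(z):
    # 先减去最大值再取指数，避免溢出；归一化后各元素之和为1
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def classify(z, labels):
    # 具有最大预测概率的类别标识被选择作为语义分类的类别标识
    probs = softmax(z)
    return labels[probs.index(max(probs))], probs

label, probs = classify([2.0, 1.0, -1.0], ["好评", "中评", "差评"])
```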
例如,第一语义分类器CC1可以经过后续将要介绍的训练方法训练得到,从而可以实现上述语义分类的功能,需要说明的是,本公开的实施例包括但不限于此。
图3为本公开至少一实施例提供的另一种语义分类方法的流程图,图4为图3所示的语义分类方法的示例性流程框图。
例如,如图3所示,在图1所示的语义分类方法的基础上,图3所示的语义分类方法还包括步骤S160至步骤S200。需要说明的是,图3所示的语义分类方法中的步骤S160至步骤S200中的操作与步骤S110至步骤S150中的操作基本类似,其不同之处主要在于:步骤S110至步骤S150用于对关于第一对象的第一评论进行语义分类处理,而步骤S160至步骤S200用于对关于第二对象的第二评论进行语义分类处理,其中,第一对象和第二对象互为关联评论对象。因此,步骤S160至步骤S200的细节可以对应参考步骤S110至步骤S150的相关描述。
以下结合图4,对图3所示的语义分类方法的步骤S160至步骤S200进行详细说明。
步骤S160:输入关于第二对象的第二评论。
例如,在步骤S160中,第二对象为与第一对象不同的关联评论对象。例如,当第一对象为医院时,第二对象可以为医生或者药物等与医院关联的评论对象;或者,当第一对象为医生时,第二对象可以为医院或者药物等与医生关联的评论对象。需要说明的是,本公开的实施例包括但不限于此,例如,第一对象和第二对象之一还可以为学校、外卖平台等,相应地,第一对象和第二对象另一还可以为老师、外卖商家等;也就是说,只要第一对象和第二对象互为关联评论对象即可。例如,第二评论可以来源于与第二对象有关的论坛等。例如,在一些示例中,第一评论和第二评论可以来源于同一个论坛等。
例如，与第一评论相似，第二评论的语料来源也可以包括文本、语音、图片等，例如语音、图片等可以通过人工方式或者人工智能方式转换为文本。例如，第二评论的语言可以包括汉语、英语、日语、德语、韩语等，本公开的实施例对此不作限制。例如，在一些示例中，该语义分类方法可以处理一种或多种预先确定的语言，对于其他语言（不属于该一种或多种预先确定的语言）的第二评论，可以经过翻译（例如，翻译成预先确定的语言）后再进行处理。
例如，在一些示例中，如图4所示，输入关于第二对象的第二评论，即步骤S160可以包括：将第二评论映射为第二原始向量P2。从而，在后续步骤中对第二评论进行处理就是对第二原始向量P2进行处理。例如，可以采用词向量算法（例如，深度神经网络、word2vec程序等）将第二评论中的每个字映射为指定长度的向量，从而第二原始向量P2包括第二评论中的全部字经过映射得到的全部向量。例如，第二评论中的每个字对应的向量的长度与第一评论中的每个字对应的向量的长度相同。
步骤S170:使用共同表示提取器对第二评论进行处理,以提取用于表征第二评论中的共同表示的第二共同表示向量。
例如,如图4所示,步骤S120中采用的共同表示提取器EE0还可以用于步骤S170,即共同表示提取器EE0还可以对第二评论进行处理,以提取用于表征第二评论中的共同表示的第二共同表示向量P02。
例如,在一些示例中,如图4所示,在将第二评论映射为第二原始向量P2后,使用共同表示提取器EE0对该第二原始向量P2进行处理,以提取第二共同表示向量P02。例如,共同表示提取器EE0对第二原始向量P2进行处理的过程可以参考共同表示提取器EE0对第一原始向量P1进行处理的过程,在此不再赘述。需要说明的是,以共同表示提取器EE0采用LSTM模型为例,该LSTM包括的处理单元的数目还大于或等于其处理的最长的第二评论的字数。
步骤S180:使用第二表示提取器对第二评论进行处理,以提取用于表征第二评论中的单一表示的第二单一表示向量。
例如,在步骤S180中,第二表示提取器也可以采用基于时间序列上样本关系的模型,例如,循环神经网络(RNN)、长短期记忆网络(LSTM)、双向长短期记忆网络(Bi-LSTM)等。例如,第二表示提取器也可以采用与共同表示提取器同一种类的模型。
例如，在一些示例中，如图4所示，在将第二评论映射为第二原始向量P2后，使用第二表示提取器EE2对该第二原始向量P2进行处理，以提取第二单一表示向量P22。例如，第二表示提取器EE2对第二原始向量P2进行处理的过程可以参考共同表示提取器EE0对第一原始向量P1进行处理的过程，在此不再赘述。
例如,第二评论中的单一表示包括仅用于评论第二对象的意思表示,也就是说,该意思表示不用于评论第一对象(即与第二对象不同的关联评论对象)。例如,在一些示例中,以第一对象为医院、第二对象为医生为例,在此情况下,第二评论中的单一表示包括“医术精湛”、“语气和蔼”之类的仅能用于评价医生而不能用于评价医院的评语。
需要说明的是,在本公开的实施例中,第二单一表示向量P22包括第二评论中的单一表示的信息;除此之外,第二单一表示向量P22还可以包括(当然也可以不包括)第二评论中的共同表示的信息;需要说明的是,本公开的实施例对此不作限制。
例如,第二表示提取器EE2可以经过后续将要介绍的训练方法训练得到,从而可以实现提取第二评论中的单一表示的功能,需要说明的是,本公开的实施例包括但不限于此。
步骤S190:将第二共同表示向量和第二单一表示向量进行拼接,以得到第二表示向量。
例如,如图4所示,将第二共同表示向量P02和第二单一表示向量P22进行拼接,以得到第二表示向量P20。例如,步骤S190中的拼接过程和细节可以参考步骤S140中的拼接过程和细节,在此不再重复赘述。
步骤S200:使用第二语义分类器对第二表示向量进行处理,以得到第二评论的语义分类。
例如,如图4所示,使用第二语义分类器CC2对第二表示向量P20进行处理,以得到第二评论的语义分类。例如,第二语义分类器CC2也可以包括softmax分类器,该softmax分类器例如包括全连接层;例如,第二语义分类器CC2的处理过程和细节可以参考第一语义分类器CC1的处理过程和细节,在此不再重复赘述。
需要说明的是,在本公开的实施例中,共同表示提取器EE0、第一表示提取器EE1和第二表示提取器EE2执行相似的功能,三者可以具有相同或相似的构造,但是三者包括的参数可以不同。同样地,第一语义分类器CC1和第二语义分类器CC2执行相似的功能,二者可以具有相同或相似的构造,但是二者包括的参数可以不同。还需要说明的是,在本公开的实施例中,共同表示提取器EE0、第一表示提取器EE1、第二表示提取器EE2、第一语义分类器CC1和第二语义分类器CC2等均可以采用软件、硬件、固件或其任意组合等方式实现,从而可以分别执行相应的处理过程。
需要说明的是,本公开的实施例中,上述语义分类方法的流程可以包括更多或更少的操作(例如,在图3所示的语义分类方法中,可以仅执行步骤S110至步骤S150的操作,也可以仅执行步骤S160至步骤S200的操作),这些操作可以顺序执行或并行执行(例如,步骤S120和步骤S130可以并行执行,也可以按任意顺序依次执行)。虽然上文描述的图像显示处理方法的流程包括特定顺序出现的多个操作,但是应该清楚地了解,多个操作的顺序并不受限制。上文描述的语义分类方法可以执行一次,也可以按照预定条件执行多次。
需要说明的是,在本公开的一些示例中,将第一评论/第二评论映射为第一原始向量/第二原始向量时,可以先将第一评论/第二评论中的与语义分类无关的字词(例如,停用词(stop words)等)过滤掉,然后再将第一评论/第二评论中剩下的与语义分类相关的字词映射为第一原始向量/第二原始向量。在本公开的另一些示例中,通过特定的训练方法训练得到的共同表示提取器EE0、第一表示提取器EE1和第二表示提取器EE2在提取意思表示时可以过滤掉与语义分类无关的字词。需要说明的是,本公开的实施例对此不作限制。
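例如，“过滤与语义分类无关的字词”的操作可以示意如下（停用词表为示例性假设，并非本公开限定的词表）：

```python
def filter_stop_words(tokens, stop_words):
    # 过滤与语义分类无关的字词（例如停用词），保留其余字词的先后顺序
    return [t for t in tokens if t not in stop_words]

stop_words = {"的", "了", "很"}          # 示例性停用词表
tokens = ["服务", "很", "周到", "的"]    # 示例分词结果
kept = filter_stop_words(tokens, stop_words)
```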
本公开的实施例提供的语义分类方法,可以提取关于第一对象的第一评论中的共同表示和单一表示,并基于共同表示和单一表示对该第一评论进行语义分类,有助于提高评论分析的客观性和准确率。
本公开至少一实施例还提供一种神经网络的训练方法。图5为本公开至少一实施例提供的一种神经网络的示意性架构框图,图6为本公开至少一实施例提供的一种神经网络的训练方法的流程图。
例如，如图5所示，该神经网络包括生成网络G、判别网络D、第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2。例如，如图6所示，该训练方法包括：生成对抗训练阶段S300和语义分类训练阶段S400，以及交替地进行这两个阶段的训练，以得到训练好的神经网络。例如，该神经网络经过训练后，其中的生成网络G、第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2可以分别用于实现前述语义分类方法中的共同表示提取器EE0、第一表示提取器EE1、第一语义分类器CC1、第二表示提取器EE2、第二语义分类器CC2的功能，由此可以执行前述语义分类方法。
例如,如图6所示,生成对抗训练阶段S300包括:
步骤S310:基于生成网络,对判别网络进行训练;
步骤S320:基于判别网络,对生成网络进行训练;以及,
交替地执行上述训练过程(即步骤S310和步骤S320),以完成生成对抗训练阶段S300的训练。
例如,生成网络G的构造可以与前述共同表示提取器EE0的构造相同,生成网络G的构造细节及工作原理可以参考前述共同表示提取器EE0的相关描述,在此不再赘述。例如,如图5所示,生成网络G既用于处理关于第一对象的评论,又用于处理关于第二对象的评论,以提取评论中的意思表示,其中,第一对象和第二对象互为关联评论对象。
例如,判别网络D可以采用二分类的softmax分类器,例如,该二分类的softmax分类器可以参考前述关于softmax分类器(令K=2即可)的相关描述,在此不再重复赘述。例如,如图5所示,判别网络D用于判断生成网络G提取的意思表示是用于评论第一对象还是第二对象。
图7为本公开至少一实施例提供的一种对应于图6所示的训练方法的生成对抗训练阶段中判别网络的示意性训练架构框图,图8为本公开至少一实施例提供的一种训练判别网络的过程的示意性流程图。
例如,结合图7和图8所示,基于生成网络,对判别网络进行训练,即步骤S310,包括步骤S311至步骤S314,如下所示:
步骤S311:输入关于第一对象的第三训练评论,使用生成网络对第三训练评论进行处理,以提取第三训练共同表示向量,使用判别网络对第三训练共同表示向量进行处理,以得到第三训练输出;
步骤S312:输入关于第二对象的第四训练评论,使用生成网络对第四训练评论进行处理,以提取第四训练共同表示向量,使用判别网络对第四训练共同表示向量进行处理,以得到第四训练输出;
步骤S313:基于第三训练输出和第四训练输出,通过判别网络对抗损失函数计算判别网络对抗损失值;
步骤S314:根据判别网络对抗损失值对判别网络的参数进行修正。
例如，基于生成网络，对判别网络进行训练，即步骤S310还可以包括：判断判别网络的训练是否满足预定条件，若不满足预定条件，则重复执行上述判别网络的训练过程；若满足预定条件，则停止本阶段的判别网络的训练过程，得到本阶段训练好的判别网络。例如，在一些示例中，上述预定条件为连续两对评论（例如，在训练判别网络的过程中，每一对评论包括一个第三训练评论和一个第四训练评论）对应的判别网络对抗损失值不再显著减小。例如，在另一些示例中，上述预定条件为判别网络的训练次数或训练周期达到预定数目。本公开的实施例对此不作限制。
例如,如图7所示,在判别网络D的训练过程中,需要联合生成网络G进行训练。需要说明的是,在判别网络D的训练过程中,生成网络G的参数保持不变。
需要说明的是,上述示例仅是示意性说明判别网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本评论(即关于第一对象的评论和关于第二对象的评论)对神经网络进行训练;同时,在针对每一对样本评论的训练过程中,都可以包括多次反复迭代以对判别网络的参数进行修正。又例如,判别网络的训练过程还包括对判别网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,判别网络D的初始参数可以为随机数,例如随机数符合高斯分布。例如,判别网络D的初始参数也可以采用本领域常用的数据库中已训练好的参数。本公开的实施例对此不作限制。
例如,判别网络D的训练过程中还可以包括优化函数(图7中未示出),优化函数可以根据判别网络对抗损失函数计算得到的判别网络对抗损失值计算判别网络D的参数的误差值,并根据该误差值对判别网络D的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算判别网络D的参数的误差值。
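例如，随机梯度下降对参数进行一步修正的基本计算可以示意如下（学习率与梯度数值均为示例）：

```python
def sgd_step(params, grads, lr=0.1):
    # 随机梯度下降的一步参数修正：theta <- theta - lr * grad
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.5, -0.3]    # 示例参数
grads = [0.2, -0.1]     # 示例梯度（误差值）
updated = sgd_step(params, grads, lr=0.1)
```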
例如,第三训练评论来源于第一对象的评论样本集;例如,第一对象的评论样本集中的每一条评论都已经提前进行了语义分类(例如,通过人工等方式进行了语义分类),具有确定的语义分类的类别标识;例如,第一对象的评论样本集中的语义分类的类别标识包括好评、中评和差评,本公开的实施例包括但不限于此。例如,第四训练评论来源于第二对象的评论样本集;例如,第二对象的评论样本集中的每一条评论都已经提前进行了语义分类(例如,通过人工等方式进行了语义分类),具有确定的语义分类的类别标识;例如,第二对象的评论样本集中的语义分类的类别标识包括好评、中评和差评,本公开的实施例包括但不限于此。
例如，在一些示例中，可以采用词向量算法分别将第三训练评论和第四训练评论映射为原始向量，由生成网络G对第三训练评论和第四训练评论对应的原始向量分别进行处理，生成网络G的处理过程和细节可以参考前述共同表示提取器EE0的处理过程和细节，在此不再重复赘述。
例如，在一些示例中，判别网络对抗损失函数可以表示为：
L_D = -E_{z1~P_data(z1)}[log D(G(z1))] - E_{z2~P_data(z2)}[log(1-D(G(z2)))]
其中，L_D表示判别网络对抗损失函数，z1表示第三训练评论，P_data(z1)表示第三训练评论的集合，G(z1)表示第三训练共同表示向量，D(G(z1))表示第三训练输出，E_{z1~P_data(z1)}[·]表示针对第三训练评论的集合求期望；z2表示第四训练评论，P_data(z2)表示第四训练评论的集合，G(z2)表示第四训练共同表示向量，D(G(z2))表示第四训练输出，E_{z2~P_data(z2)}[·]表示针对第四训练评论的集合求期望。由此，例如可以采用批量梯度下降算法对判别网络D进行参数优化。
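例如，基于一批判别网络输出计算判别网络对抗损失值的过程可以示意如下（输入的判别网络输出为随意构造的示例数值，期望以批内均值近似）：

```python
import math

def discriminator_adversarial_loss(d_out_first, d_out_second):
    # d_out_first：判别网络对若干第三训练评论（对象标签为1）的输出
    # d_out_second：判别网络对若干第四训练评论（对象标签为0）的输出
    # 损失为 -E[log D(G(z1))] - E[log(1 - D(G(z2)))]
    term1 = -sum(math.log(d) for d in d_out_first) / len(d_out_first)
    term2 = -sum(math.log(1.0 - d) for d in d_out_second) / len(d_out_second)
    return term1 + term2

loss = discriminator_adversarial_loss([0.9, 0.8], [0.1, 0.2])
```

可以看到，第三训练评论对应的输出越接近1、第四训练评论对应的输出越接近0，损失值越小。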
需要说明的是,上述公式表示的判别网络对抗损失函数是示例性的,本公开的实施例包括但不限于此。
判别网络D的训练目标是最小化判别网络对抗损失值。例如,在判别网络D的训练过程中,第三训练评论的对象标签设置为1,即需要使判别网络D鉴别认定第三训练共同表示向量来源于关于第一对象的评论;同时,第四训练评论的对象标签设置为0,即需要使判别网络D鉴别认定第四训练共同表示向量来源于关于第二对象的评论。也就是说,判别网络D的训练目标是使判别网络D能够准确判断生成网络G提取的意思表示的真实来源(即来源于关于第一对象的评论还是关于第二对象的评论),也即,使判别网络D能够准确判断生成网络G提取的意思表示是用于评论第一对象还是第二对象。
例如，在判别网络D的训练过程中，判别网络D的参数被不断地修正，以使经过参数修正后的判别网络D能够准确鉴别第三训练共同表示向量和第四训练共同表示向量的来源，也就是，使第三训练评论对应的判别网络D的输出不断趋近于1，以及使第四训练评论对应的判别网络D的输出不断趋近于0，从而不断地减小判别网络对抗损失值。
图9为本公开至少一实施例提供的一种对应于图6所示的训练方法的生成对抗训练阶段中生成网络的示意性训练架构框图;图10为本公开至少一实施例提供的一种训练生成网络的过程的示意性流程图。
例如,结合图9和图10所示,基于判别网络,对生成网络进行训练,即步骤S320,包括步骤S321至步骤S324,如下所示:
步骤S321：输入关于第一对象的第五训练评论，使用生成网络对第五训练评论进行处理，以提取第五训练共同表示向量，使用判别网络对第五训练共同表示向量进行处理，以得到第五训练输出；
步骤S322:输入关于第二对象的第六训练评论,使用生成网络对第六训练评论进行处理,以提取第六训练共同表示向量,使用判别网络对第六训练共同表示向量进行处理,以得到第六训练输出;
步骤S323:基于第五训练输出和第六训练输出,通过生成网络对抗损失函数计算生成网络对抗损失值;
步骤S324:根据生成网络对抗损失值对生成网络的参数进行修正。
例如,基于判别网络,对生成网络进行训练,即步骤S320还可以包括:判断生成网络的训练是否满足预定条件,若不满足预定条件,则重复执行上述生成网络的训练过程;若满足预定条件,则停止本阶段的生成网络的训练过程,得到本阶段训练好的生成网络。例如,在一些示例中,上述预定条件为连续两对评论(例如,在训练生成网络的过程中,每一对评论包括一个第五训练评论和一个第六训练评论)对应的判别网络对抗损失值不再显著减小。例如,在另一些示例中,上述预定条件为生成网络的训练次数或训练周期达到预定数目。本公开的实施例对此不作限制。
例如,如图9所示,在生成网络G的训练过程中,需要联合判别网络D进行训练。需要说明的是,在生成网络G的训练过程中,判别网络D的参数保持不变。
需要说明的是,上述示例仅是示意性说明生成网络的训练过程。本领域技术人员应当知道,在训练阶段,需要利用大量样本评论(即关于第一对象的评论和关于第二对象的评论)对神经网络进行训练;同时,在针对每一对样本评论的训练过程中,都可以包括多次反复迭代以对生成网络的参数进行修正。又例如,生成网络的训练过程还包括对生成网络的参数进行微调(fine-tune),以获取更优化的参数。
例如,生成网络G的初始参数可以为随机数,例如随机数符合高斯分布。例如,生成网络G的初始参数也可以采用本领域常用的数据库中已训练好的参数。本公开的实施例对此不作限制。
例如，生成网络G的训练过程中还可以包括优化函数（图9中未示出），优化函数可以根据生成网络对抗损失函数计算得到的生成网络对抗损失值计算生成网络G的参数的误差值，并根据该误差值对生成网络G的参数进行修正。例如，优化函数可以采用随机梯度下降（stochastic gradient descent，SGD）算法、批量梯度下降（batch gradient descent，BGD）算法等计算生成网络G的参数的误差值。
例如，与第三训练评论类似，第五训练评论也来源于第一对象的评论样本集，本公开的实施例包括但不限于此。例如，与第四训练评论类似，第六训练评论也来源于第二对象的评论样本集，本公开的实施例包括但不限于此。
例如，在一些示例中，生成网络对抗损失函数可以表示为：
L_G = -E_{z3~P_data(z3)}[log(1-D(G(z3)))] - E_{z4~P_data(z4)}[log D(G(z4))]
其中，L_G表示生成网络对抗损失函数，z3表示第五训练评论，P_data(z3)表示第五训练评论的集合，G(z3)表示第五训练共同表示向量，D(G(z3))表示第五训练输出，E_{z3~P_data(z3)}[·]表示针对第五训练评论的集合求期望；z4表示第六训练评论，P_data(z4)表示第六训练评论的集合，G(z4)表示第六训练共同表示向量，D(G(z4))表示第六训练输出，E_{z4~P_data(z4)}[·]表示针对第六训练评论的集合求期望。由此，例如可以采用批量梯度下降算法对生成网络G进行参数优化。
需要说明的是，上述公式表示的生成网络对抗损失函数是示例性的，本公开的实施例包括但不限于此。
生成网络G的训练目标是最小化生成网络对抗损失值。例如,在生成网络G的训练过程中,第五训练评论的对象标签设置为0,即需要使判别网络D鉴别认定第五训练共同表示向量来源于关于第二对象的评论;同时,第六训练评论的对象标签设置为1,即需要使判别网络D鉴别认定第六训练共同表示向量来源于关于第一对象的评论。也就是说,生成网络G的训练目标是使判别网络D无法准确判断生成网络G提取的意思表示的真实来源(即来源于关于第一对象的评论还是关于第二对象的评论),也即使判别网络D无法判断生成网络G提取的意思表示是用于评论第一对象还是第二对象。例如,当生成网络G提取的意思表示为关于第一对象的评论和关于第二对象的评论的共同表示时,判别网络D无法判断生成网络G提取的意思表示的真实来源。
例如，在生成网络G的训练过程中，生成网络G的参数被不断地修正，以使经过参数修正后的生成网络G提取的意思表示为关于第一对象的评论和关于第二对象的评论的共同表示，从而判别网络D无法准确鉴别第五训练共同表示向量和第六训练共同表示向量的来源，也就是，使第五训练评论对应的判别网络D的输出不断远离于1（即不断靠近于0），以及使第六训练评论对应的判别网络D的输出不断远离于0（即不断靠近于1），从而不断地减小生成网络对抗损失值。
例如,在本公开的实施例中,生成网络G的训练和判别网络D的训练是交替迭代进行的。例如,对于未经训练的生成网络G和判别网络D,一般先对判别网络D进行第一阶段训练,提高判别网络D的鉴别能力(即,鉴别判别网络D的输入的真实来源),得到经过第一阶段训练的判别网络D;然后,基于经过第一阶段训练的判别网络D对生成网络G进行第一阶段训练,提高生成网络G的提取关于第一对象的评论和关于第二对象的评论的共同表示的能力,得到经过第一阶段训练的生成网络G。与第一阶段训练类似,在第二阶段训练中,基于经过第一阶段训练的生成网络G,对经过第一阶段训练的判别网络D进行第二阶段训练,提高判别网络D的鉴别能力,得到经过第二阶段训练的判别网络D;然后,基于经过第二阶段训练的判别网络D对经过第一阶段训练的生成网络G进行第二阶段训练,提高生成网络G的提取关于第一对象的评论和关于第二对象的评论的共同表示的能力,得到经过第二阶段训练的生成网络G,依次类推,接下来对判别网络D和生成网络G进行第三阶段训练、第四阶段训练、……,直到得到的生成网络G的输出为关于第一对象的评论和关于第二对象的评论的共同表示,从而完成一个生成对抗训练阶段S300的训练。
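例如，上述“判别网络与生成网络交替分阶段训练”的整体流程可以用如下骨架代码示意（其中 train_discriminator 与 train_generator 为示例性的占位函数，实际实现需分别按照步骤S311至步骤S314、步骤S321至步骤S324进行）：

```python
def adversarial_training(num_stages, train_discriminator, train_generator):
    # 每个阶段先训练判别网络（此时固定生成网络参数），
    # 再训练生成网络（此时固定判别网络参数），交替迭代
    history = []
    for stage in range(1, num_stages + 1):
        d_loss = train_discriminator(stage)
        g_loss = train_generator(stage)
        history.append((stage, d_loss, g_loss))
    return history

# 示例性的占位训练函数：仅返回随阶段递减的虚拟损失值，
# 用于演示流程本身，不代表真实训练结果
history = adversarial_training(
    num_stages=3,
    train_discriminator=lambda stage: 1.0 / stage,
    train_generator=lambda stage: 2.0 / stage,
)
```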
需要说明的是，在生成对抗训练阶段S300，即生成网络G和判别网络D的交替训练过程中，生成网络G和判别网络D的对抗体现在：关于第一对象的评论（即第三训练评论和第五训练评论）对应的生成网络G的输出在各自单独的训练过程中具有不同的对象标签（在判别网络D的训练过程中，第三训练评论的对象标签为1，在生成网络G的训练过程中，第五训练评论的对象标签为0），以及，关于第二对象的评论（即第四训练评论和第六训练评论）对应的生成网络G的输出在各自单独的训练过程中具有不同的对象标签（在判别网络D的训练过程中，第四训练评论的对象标签为0，在生成网络G的训练过程中，第六训练评论的对象标签为1）。另外，生成网络G和判别网络D的对抗还体现在判别网络对抗损失函数与生成网络对抗损失函数相反。还需要说明的是，理想情况下，经过训练得到的生成网络G提取的意思表示为关于第一对象的评论和关于第二对象的评论的共同表示（不论生成网络G的输入是关于第一对象的评论还是关于第二对象的评论），判别网络D针对该共同表示的输出均为0.5，即生成网络G和判别网络D经过对抗博弈达到纳什均衡。
例如,如图6所示,语义分类训练阶段S400包括:对生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络进行训练。
例如,第一分支网络SN1的构造可以与前述第一表示提取器EE1的构造相同,第一分支网络SN1的构造细节及工作原理可以参考前述第一表示提取器EE1的相关描述,在此不再赘述。例如,如图5所示,第一分支网络SN1用于处理关于第一对象的评论,以提取该评论中的单一表示(是否提取该评论中的共同表示不作限制)。
例如,第二分支网络SN2的构造可以与前述第二表示提取器EE2的构造相同,第二分支网络SN2的构造细节及工作原理可以参考前述第二表示提取器EE2的相关描述,在此不再赘述。例如,如图5所示,第二分支网络SN2用于处理关于第二对象的评论,以提取该评论中的单一表示(是否提取该评论中的共同表示不作限制)。
例如,第一分类网络CN1、第二分类网络CN2的构造可以分别与前述第一语义分类器CC1、第二语义分类器CC2的构造相同,第一分类网络CN1、第二分类网络CN2的构造细节及工作原理可以参考前述第一语义分类器CC1、第二语义分类器CC2的相关描述,在此不再赘述。
图11为本公开至少一实施例提供的一种对应于图6所示的训练方法的语义分类训练阶段的示意性训练架构框图,图12为本公开至少一实施例提供的一种训练方法中的语义分类训练阶段的训练过程的示意性流程图。以下,结合图11和图12,对图6所示的语义分类训练阶段S400的训练过程进行详细说明。
例如，结合图11和图12所示，语义分类训练阶段S400包括步骤S401至步骤S405。
步骤S401:输入关于第一对象的第一训练评论,使用生成网络对第一训练评论进行处理,以提取第一训练共同表示向量,使用第一分支网络对第一训练评论进行处理,以提取第一训练单一表示向量,将第一训练共同表示向量与第一训练单一表示向量进行拼接,以得到第一训练表示向量,使用第一分类网络对第一训练表示向量进行处理,以得到第一训练评论的语义分类的预测类别标识。
例如，与第三训练评论和第五训练评论类似，第一训练评论也来源于第一对象的评论样本集，本公开的实施例包括但不限于此。例如，第一训练评论具有确定的语义分类的类别标识T1（即真实类别标识），例如真实类别标识以向量的形式进行表示。例如，假设语义分类的类别标识的总数为K，则真实类别标识为一个K维向量；当该K维向量的第k个元素为1，其他元素为0时，该K维向量代表第k个真实类别标识，其中k为整数，且1≤k≤K。
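例如，真实类别标识的向量表示（第k个元素为1、其余元素为0）可以示意如下：

```python
def one_hot(k, num_classes):
    # 第k个元素为1、其余元素为0的K维向量（k从1开始计数）
    vec = [0] * num_classes
    vec[k - 1] = 1
    return vec

t1 = one_hot(2, 3)   # K=3时，第2个真实类别标识的向量表示
```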
例如，在一些示例中，如图11所示，输入关于第一对象的第一训练评论可以包括：将第一训练评论映射为第一训练原始向量TP1。从而，在后续操作中对第一训练评论进行处理就是对第一训练原始向量TP1进行处理。例如，可以采用词向量算法（例如，深度神经网络、word2vec程序等）将第一训练评论中的每个字映射为指定长度的向量，从而第一训练原始向量TP1包括第一训练评论中的全部字经过映射得到的全部向量。例如，每个字对应的向量的长度相同。
例如,步骤S401中的操作可以参考前述语义分类方法的步骤S110至步骤S150的相关描述,在此不再重复赘述。
例如,第一训练评论的预测类别标识为与其真实类别标识维度相同的向量,例如第一训练评论的预测类别标识可以被表示为前述向量的形式,向量中的各个元素代表各个类别标识的预测概率,例如具有最大预测概率的类别标识被选择作为语义分类的类别标识。
步骤S402:输入关于第二对象的第二训练评论,使用生成网络对第二训练评论进行处理,以提取第二训练共同表示向量,使用第二分支网络对第二训练评论进行处理,以提取第二训练单一表示向量,将第二训练共同表示向量与第二训练单一表示向量进行拼接,以得到第二训练表示向量,使用第二分类网络对第二训练表示向量进行处理,以得到第二训练评论的语义分类的预测类别标识。
例如，与第四训练评论和第六训练评论类似，第二训练评论也来源于第二对象的评论样本集，本公开的实施例包括但不限于此。例如，第二训练评论具有确定的语义分类的类别标识T2（即真实类别标识），例如第二训练评论的真实类别标识T2的表示形式可以参考第一训练评论的真实类别标识T1的表示形式，在此不再重复赘述。
例如，在一些示例中，如图11所示，输入关于第二对象的第二训练评论可以包括：将第二训练评论映射为第二训练原始向量TP2。从而，在后续操作中对第二训练评论进行处理就是对第二训练原始向量TP2进行处理。例如，可以采用词向量算法（例如，深度神经网络、word2vec程序等）将第二训练评论中的每个字映射为指定长度的向量，从而第二训练原始向量TP2包括第二训练评论中的全部字经过映射得到的全部向量。例如，第二训练评论中的每个字对应的向量的长度与第一训练评论中的每个字对应的向量的长度相同。
例如,步骤S402中的操作可以参考前述语义分类方法的步骤S160至步骤S200的相关描述,在此不再重复赘述。
例如,第二训练评论的预测类别标识为与其真实类别标识维度相同的向量,例如第二训练评论的预测类别标识也可以被表示为前述向量的形式,向量中的各个元素代表各个类别标识的预测概率,例如具有最大预测概率的类别标识被选择作为语义分类的类别标识。
步骤S403:基于第一训练评论的预测类别标识和第二训练评论的预测类别标识,通过系统损失函数计算系统损失值;
例如，在一些示例中，系统损失函数可以表示为：
L_obj = λ1·L(Y1,T1) + λ2·L(Y2,T2)
其中，L_obj表示系统损失函数，L(·,·)表示交叉熵损失函数，Y1表示第一训练评论的预测类别标识，T1表示第一训练评论的真实类别标识，L(Y1,T1)表示第一训练评论的交叉熵损失函数，λ1表示在系统损失函数中第一训练评论的交叉熵损失函数L(Y1,T1)的权重，Y2表示第二训练评论的预测类别标识，T2表示第二训练评论的真实类别标识，L(Y2,T2)表示第二训练评论的交叉熵损失函数，λ2表示在系统损失函数中第二训练评论的交叉熵损失函数L(Y2,T2)的权重。
例如，交叉熵损失函数L(·,·)可以表示为：
L(Y,T) = -(1/N)·Σ_{i=1}^{N} Σ_{j=1}^{K} T_i^j·log(Y_i^j)
其中，Y、T均为形式参数，N表示训练评论（例如，第一训练评论或第二训练评论）的数量，K表示语义分类的类别标识的数量，Y_i^j表示第i个训练评论的预测类别标识中第j个类别标识的概率值，T_i^j表示所述第i个训练评论的真实类别标识中第j个类别标识的概率值。
语义分类训练阶段S400的训练目标是最小化系统损失值。例如,第一训练评论的交叉熵损失函数L(Y1,T1)的值越小,则表明第一训练评论的预测类别标识越接近于第一训练评论的真实类别标识,即第一训练评论的语义分类越准确;同样地,第二训练评论的交叉熵损失函数L(Y2,T2)的值越小,则表明第二训练评论的预测类别标识越接近于第二训练评论的真实类别标识,即第二训练评论的语义分类越准确。
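例如，由交叉熵损失函数计算系统损失值的过程可以示意如下（λ1、λ2以及预测概率均为示例数值）：

```python
import math

def cross_entropy(Y, T):
    # L(Y,T) = -(1/N) * sum_i sum_j T_i^j * log(Y_i^j)
    n = len(Y)
    return -sum(t * math.log(y)
                for Yi, Ti in zip(Y, T)
                for y, t in zip(Yi, Ti)) / n

def system_loss(Y1, T1, Y2, T2, lam1=1.0, lam2=1.0):
    # 系统损失：lam1 * L(Y1,T1) + lam2 * L(Y2,T2)
    return lam1 * cross_entropy(Y1, T1) + lam2 * cross_entropy(Y2, T2)

# 示例：各一条训练评论，K=3个类别标识，真实类别均为第1类
Y1, T1 = [[0.7, 0.2, 0.1]], [[1, 0, 0]]
Y2, T2 = [[0.6, 0.3, 0.1]], [[1, 0, 0]]
loss = system_loss(Y1, T1, Y2, T2)
```

预测概率越接近真实类别标识（此例中第1类的概率越接近1），系统损失值越小。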
步骤S404:根据系统损失值对生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络的参数进行修正。
例如,第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2的初始参数可以为随机数,例如随机数符合高斯分布。例如,第一分支网络SN1、 第一分类网络CN1、第二分支网络SN2和第二分类网络CN2的初始参数也可以采用本领域常用的数据库中已训练好的参数。本公开的实施例对此不作限制。
例如,语义分类训练阶段S400的训练过程中还可以包括优化函数(图11中未示出),优化函数可以根据系统损失函数计算得到的系统损失值计算生成网络G、第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2的参数的误差值,并根据该误差值对生成网络G、第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2的参数进行修正。例如,优化函数可以采用随机梯度下降(stochastic gradient descent,SGD)算法、批量梯度下降(batch gradient descent,BGD)算法等计算生成网络G、第一分支网络SN1、第一分类网络CN1、第二分支网络SN2和第二分类网络CN2的参数的误差值。
例如,语义分类训练阶段S400还可以包括:判断生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络的训练是否满足预定条件,若不满足预定条件,则重复执行上述语义分类训练阶段S400的训练过程;若满足预定条件,则停止当前的语义分类训练阶段S400的训练过程,得到当前阶段训练好的生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络。例如,在一些示例中,上述预定条件为连续两对评论(例如,在语义分类训练阶段S400的训练过程中,每一对评论包括一个第一训练评论和一个第二训练评论)对应的系统损失值不再显著减小。例如,在另一些示例中,上述预定条件为语义分类训练阶段S400的训练次数或训练周期达到预定数目。本公开的实施例对此不作限制。
需要说明的是，上述示例仅是示意性说明语义分类训练阶段S400的训练过程。本领域技术人员应当知道，在训练阶段，需要利用大量样本评论（即关于第一对象的评论和关于第二对象的评论）对神经网络进行训练；同时，在针对每一对样本评论的训练过程中，都可以包括多次反复迭代以对生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络的参数进行修正。又例如，语义分类训练阶段S400的训练过程还包括对生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络的参数进行微调（fine-tune），以获取更优化的参数。
例如，在本公开的实施例中，生成对抗训练阶段S300和语义分类训练阶段S400是交替迭代进行的，其中，生成网络G同时参与这两个训练阶段的训练。例如，在一些示例中，生成对抗训练阶段S300可以提高生成网络G提取共同表示的能力，但是与此同时，生成网络G还可能会提取第一训练评论和第二训练评论中均会用到的与语义分类无关的字词；例如，语义分类训练阶段S400可以使生成网络G获得过滤这些与语义分类无关的字词的功能，从而有助于提高语义分类的准确率以及提高神经网络的运行效率。
本公开的实施例提供的神经网络的训练方法,可以对神经网络进行训练,其中训练好的生成网络G、第一分支网络SN1、第二分支网络SN2、第一分类网络CN1和第二分类网络CN2可以分别用于实现前述语义分类方法中的共同表示提取器EE0、第一表示提取器EE1、第二表示提取器EE2、第一语义分类器CC1和第二语义分类器CC2的功能,从而可以执行前述语义分类方法。
本公开的实施例提供的神经网络的训练方法的技术效果可以参考上述实施例中关于语义分类方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种语义分类装置。图13为本公开至少一实施例提供的一种语义分类装置的示意性框图。例如,如图13所示,该语义分类装置500包括存储器510和处理器520。例如,存储器510用于非暂时性存储计算机可读指令,处理器520用于运行该计算机可读指令,该计算机可读指令被处理器520运行时执行本公开任一实施例提供的语义分类方法。在其他实施例中,计算机可读指令被处理器520运行时还可以执行本公开任一实施例提供的神经网络的训练方法。
例如,存储器510和处理器520之间可以直接或间接地互相通信。例如,存储器510和处理器520等组件之间可以通过网络连接进行通信。网络可以包括无线网络、有线网络、和/或无线网络和有线网络的任意组合。网络可以包括局域网、互联网、电信网、基于互联网和/或电信网的物联网(Internet of Things)、和/或以上网络的任意组合等。有线网络例如可以采用双绞线、同轴电缆或光纤传输等方式进行通信,无线网络例如可以采用3G/4G/5G移动通信网络、蓝牙、Zigbee或者WiFi等通信方式。本公开对网络的类型和功能在此不作限制。
例如,处理器520可以控制语义分类装置中的其它组件以执行期望的功能。处理器520可以是中央处理单元(CPU)、张量处理器(TPU)或者图形处理器GPU等具有数据处理能力和/或程序执行能力的器件。中央处理器(CPU)可以为X86或ARM架构等。GPU可以单独地直接集成到主板上,或者内置于主板的北桥芯片中。GPU也可以内置于中央处理器(CPU)上。
例如,存储器510可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。 易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。
例如,在存储器510上可以存储一个或多个计算机指令,处理器520可以运行所述计算机指令,以实现各种功能。在计算机可读存储介质中还可以存储各种应用程序和各种数据,例如第一对象的评论样本集、第二对象的评论样本集、第一原始向量、第二原始向量以及应用程序使用和/或产生的各种数据等。
例如,存储器510存储的一些计算机指令被处理器520执行时可以执行根据上文所述的语义分类方法中的一个或多个步骤。又例如,存储器510存储的另一些计算机指令被处理器520执行时可以执行根据上文所述的神经网络的训练方法中的一个或多个步骤。
例如,关于语义分类方法的处理过程的详细说明可以参考上述语义分类方法的实施例中的相关描述,关于神经网络的训练方法的处理过程的详细说明可以参考上述神经网络的训练方法的实施例中的相关描述,重复之处不再赘述。
需要说明的是,本公开的实施例提供的语义分类装置是示例性的,而非限制性的,根据实际应用需要,该语义分类装置还可以包括其他常规部件或结构,例如,为实现语义分类装置的必要功能,本领域技术人员可以根据具体应用场景设置其他的常规部件或结构,本公开的实施例对此不作限制。
本公开的实施例提供的语义分类装置的技术效果可以参考上述实施例中关于语义分类方法以及神经网络的训练方法的相应描述,在此不再赘述。
本公开至少一实施例还提供一种神经网络的训练装置。图14为本公开至少一实施例提供的一种神经网络的训练装置的示意性框图。例如,如图14所示,该神经网络的训练装置500’包括存储器510’和处理器520’。例如,存储器510’用于非暂时性存储计算机可读指令,处理器520’用于运行该计算机可读指令,该计算机可读指令被处理器520’运行时执行本公开任一实施例提供的神经网络的训练方法。在其他实施例中,计算机可读指令被处理器520’运行时还可以执行本公开任一实施例提供的语义分类方法。
存储器510’和处理器520’分别具有与上述存储器510和处理器520相类似的功能和设置,上文已详细说明,在此不再赘述。
本公开至少一实施例还提供一种存储介质。图15为本公开一实施例提供的一种存储介质的示意图。例如，如图15所示，该存储介质600非暂时性地存储计算机可读指令601，当非暂时性计算机可读指令601由计算机（包括处理器）执行时可以执行本公开任一实施例提供的语义分类方法的指令或者可以执行本公开任一实施例提供的神经网络的训练方法的指令。也可以在执行本公开任一实施例提供的神经网络的训练方法的指令之后，执行本公开任一实施例提供的语义分类方法。
例如,在存储介质600上可以存储一个或多个计算机指令。存储介质600上存储的一些计算机指令可以是例如用于实现上述语义分类方法中的一个或多个步骤的指令。存储介质上存储的另一些计算机指令可以是例如用于实现上述神经网络的训练方法中的一个或多个步骤的指令。
例如,存储介质可以包括平板电脑的存储部件、个人计算机的硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、光盘只读存储器(CD-ROM)、闪存、或者上述存储介质的任意组合,也可以为其他适用的存储介质。
本公开的实施例提供的存储介质的技术效果可以参考上述实施例中关于语义分类方法以及神经网络的训练方法的相应描述,在此不再赘述。
对于本公开,有以下几点需要说明:
(1)本公开实施例附图中，只涉及到与本公开实施例相关的结构，其他结构可参考通常设计。
(2)在不冲突的情况下,本公开同一实施例及不同实施例中的特征可以相互组合。
以上,仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以权利要求的保护范围为准。

Claims (24)

  1. 一种语义分类方法，包括：
    输入关于第一对象的第一评论;
    使用共同表示提取器对所述第一评论进行处理,以提取用于表征所述第一评论中的共同表示的第一共同表示向量;
    使用第一表示提取器对所述第一评论进行处理,以提取用于表征所述第一评论中的单一表示的第一单一表示向量;
    将所述第一共同表示向量和所述第一单一表示向量进行拼接,以得到第一表示向量;以及
    使用第一语义分类器对所述第一表示向量进行处理,以得到所述第一评论的语义分类;
    其中,所述共同表示包括既用于评论所述第一对象又用于评论第二对象的意思表示,所述第二对象为与所述第一对象不同的关联评论对象,所述第一评论的单一表示包括仅用于评论所述第一对象的意思表示。
  2. 根据权利要求1所述的语义分类方法,还包括:将所述第一评论映射为第一原始向量;其中,
    使用所述共同表示提取器对所述第一评论进行处理,包括:使用所述共同表示提取器对所述第一原始向量进行处理;
    使用所述第一表示提取器对所述第一评论进行处理,包括:使用所述第一表示提取器对所述第一原始向量进行处理。
  3. 根据权利要求2所述的语义分类方法,其中,将所述第一评论映射为所述第一原始向量,包括:
    使用词向量算法将所述第一评论中的每个字映射为具有指定长度的向量,以得到所述第一原始向量。
  4. 根据权利要求1-3任一项所述的语义分类方法,其中,所述共同表示提取器和所述第一表示提取器各自分别包括循环神经网络、长短期记忆网络和双向长短期记忆网络之一,所述第一语义分类器包括softmax分类器。
  5. 根据权利要求1-3任一项所述的语义分类方法,还包括:
    输入关于第二对象的第二评论;
    使用所述共同表示提取器对所述第二评论进行处理,以提取用于表征所述第二评论中的所述共同表示的第二共同表示向量;
    使用第二表示提取器对所述第二评论进行处理,以提取用于表征所述第二评论中的单一表示的第二单一表示向量;
    将所述第二共同表示向量和所述第二单一表示向量进行拼接,以得到第二表示向量;以及
    使用第二语义分类器对所述第二表示向量进行处理,以得到所述第二评论的语义分类;
    其中,所述第二评论的单一表示包括仅用于评论所述第二对象的意思表示。
  6. 根据权利要求5所述的语义分类方法,还包括:将所述第二评论映射为第二原始向量;其中,
    使用所述共同表示提取器对所述第二评论进行处理,包括:使用所述共同表示提取器对所述第二原始向量进行处理;
    使用所述第二表示提取器对所述第二评论进行处理,包括:使用所述第二表示提取器对所述第二原始向量进行处理。
  7. 根据权利要求6所述的语义分类方法,其中,将所述第二评论映射为所述第二原始向量,包括:
    使用词向量算法将所述第二评论中的每个字映射为具有指定长度的向量,以得到所述第二原始向量。
  8. 根据权利要求5-7任一项所述的语义分类方法,其中,所述第二表示提取器包括循环神经网络、长短期记忆网络和双向长短期记忆网络之一,所述第二语义分类器包括softmax分类器。
  9. 根据权利要求5-8任一项所述的语义分类方法,其中,所述第一评论和所述第二评论的语料来源包括文本和语音至少之一。
  10. 一种神经网络的训练方法,所述神经网络包括:生成网络、第一分支网络、第一分类网络、第二分支网络和第二分类网络;所述训练方法包括:语义分类训练阶段;其中,
    所述语义分类训练阶段包括:
    输入关于第一对象的第一训练评论,使用所述生成网络对所述第一训练评论进行处理,以提取第一训练共同表示向量,使用所述第一分支网络对所述第一训练评论进行处理,以提取第一训练单一表示向量,将所述第一训练共同表示向量与所述第一训练单一表示向量进行拼接,以得到第一训练表示向量,使用所述第一分类网络对所述第一训练表示向量进行处理,以得到所述第一训练评论的语义分类的预测类别标识;
    输入关于第二对象的第二训练评论,使用所述生成网络对所述第二训练评论进行处理,以提取第二训练共同表示向量,使用所述第二分支网络对所述第二训练评论进行处理,以提取第二训练单一表示向量,将所述第二训练共同表示向量与所述第二训练单一表示向量进行拼接,以得到第二训练表示向量,使用所述第二分类网络对所述第二训练表示向量进行处理,以得到所述第二训练评论的语义分类的预测类别标识;
    基于所述第一训练评论的预测类别标识和所述第二训练评论的预测类别标识,通过系统损失函数计算系统损失值;以及
    根据所述系统损失值对所述生成网络、所述第一分支网络、所述第一分类网络、所述第二分支网络和所述第二分类网络的参数进行修正;
    其中,所述第一对象和所述第二对象为关联评论对象。
  11. 根据权利要求10所述的训练方法,其中,所述语义分类训练阶段还包括:
    将所述第一训练评论映射为第一训练原始向量,将所述第二训练评论映射为第二训练原始向量;
    其中,使用所述生成网络对所述第一训练评论进行处理,包括:使用所述生成网络对所述第一训练原始向量进行处理;
    使用所述第一分支网络对所述第一训练评论进行处理,包括:使用所述第一分支网络对所述第一训练原始向量进行处理;
    使用所述生成网络对所述第二训练评论进行处理,包括:使用所述生成网络对所述第二训练原始向量进行处理;
    使用所述第二分支网络对所述第二训练评论进行处理,包括:使用所述第二分支网络对所述第二训练原始向量进行处理。
  12. 根据权利要求11所述的训练方法,其中,将所述第一训练评论映射为所述第一训练原始向量,包括:
    使用词向量算法将所述第一训练评论中的每个字映射为具有指定长度的向量，以得到所述第一训练原始向量；
    将所述第二训练评论映射为所述第二训练原始向量,包括:
    使用所述词向量算法将所述第二训练评论中的每个字映射为具有所述指定长度的向量,以得到所述第二训练原始向量。
  13. 根据权利要求10-12任一项所述的训练方法,其中,所述生成网络、所述第一分支网络和所述第二分支网络各自分别包括循环神经网络、长短期记忆网络和双向长短期记忆网络之一,所述第一分类网络和所述第二分类网络均包括softmax分类器。
  14. 根据权利要求10-12任一项所述的训练方法,其中,所述系统损失函数表示为:
    L obj=λ 1·L(Y1,T1)+λ 2·L(Y2,T2)
    其中，L obj表示系统损失函数，L(·,·)表示交叉熵损失函数，Y1表示所述第一训练评论的预测类别标识，T1表示所述第一训练评论的真实类别标识，L(Y1,T1)表示第一训练评论的交叉熵损失函数，λ 1表示在所述系统损失函数中所述第一训练评论的交叉熵损失函数L(Y1,T1)的权重，Y2表示所述第二训练评论的预测类别标识，T2表示所述第二训练评论的真实类别标识，L(Y2,T2)表示第二训练评论的交叉熵损失函数，λ 2表示在所述系统损失函数中所述第二训练评论的交叉熵损失函数L(Y2,T2)的权重；
    所述交叉熵损失函数L(·,·)表示为：
    L(Y,T) = -(1/N)·Σ_{i=1}^{N} Σ_{j=1}^{K} T_i^j·log(Y_i^j)
    其中，Y、T均为形式参数，N表示训练评论的数量，K表示语义分类的类别标识的数量，Y_i^j表示第i个训练评论的预测类别标识中第j个类别标识的概率值，T_i^j表示所述第i个训练评论的真实类别标识中第j个类别标识的概率值。
  15. 根据权利要求10-12任一项所述的训练方法,其中,所述神经网络还包括判别网络;所述训练方法还包括:生成对抗训练阶段;以及交替地执行所述生成对抗训练阶段和所述语义分类训练阶段;
    其中,所述生成对抗训练阶段包括:
    基于所述生成网络,对所述判别网络进行训练;
    基于所述判别网络,对所述生成网络进行训练;以及
    交替地执行上述训练过程，以完成所述生成对抗训练阶段的训练。
  16. 根据权利要求15所述的训练方法,其中,基于所述生成网络,对所述判别网络进行训练,包括:
    输入关于所述第一对象的第三训练评论,使用所述生成网络对所述第三训练评论进行处理,以提取第三训练共同表示向量,使用所述判别网络对所述第三训练共同表示向量进行处理,以得到第三训练输出;
    输入关于所述第二对象的第四训练评论,使用所述生成网络对所述第四训练评论进行处理,以提取第四训练共同表示向量,使用所述判别网络对所述第四训练共同表示向量进行处理,以得到第四训练输出;
    基于所述第三训练输出和所述第四训练输出,通过判别网络对抗损失函数计算判别网络对抗损失值;以及
    根据所述判别网络对抗损失值对所述判别网络的参数进行修正。
  17. 根据权利要求16所述的训练方法,其中,所述判别网络包括二分类的softmax分类器。
  18. 根据权利要求16或17所述的训练方法，其中，所述判别网络对抗损失函数表示为：
    L_D = -E_{z1~P_data(z1)}[log D(G(z1))] - E_{z2~P_data(z2)}[log(1-D(G(z2)))]
    其中，L_D表示所述判别网络对抗损失函数，z1表示所述第三训练评论，P_data(z1)表示所述第三训练评论的集合，G(z1)表示所述第三训练共同表示向量，D(G(z1))表示所述第三训练输出，E_{z1~P_data(z1)}[·]表示针对所述第三训练评论的集合求期望；z2表示所述第四训练评论，P_data(z2)表示所述第四训练评论的集合，G(z2)表示所述第四训练共同表示向量，D(G(z2))表示所述第四训练输出，E_{z2~P_data(z2)}[·]表示针对所述第四训练评论的集合求期望。
  19. 根据权利要求15-18任一项所述的训练方法,其中,基于所述判别网络,对所述生成网络进行训练,包括:
    输入关于所述第一对象的第五训练评论,使用所述生成网络对所述第五训练评论进行处理,以提取第五训练共同表示向量,使用所述判别网络对所述第五训练共同表示向量进行处理,以得到第五训练输出;
    输入关于所述第二对象的第六训练评论,使用所述生成网络对所述第六训练评论进行处理,以提取第六训练共同表示向量,使用所述判别网络对所述第六训练共同表示向量进行处理,以得到第六训练输出;
    基于所述第五训练输出和所述第六训练输出,通过生成网络对抗损失函数计算生成 网络对抗损失值;以及
    根据所述生成网络对抗损失值对所述生成网络的参数进行修正。
  20. 根据权利要求19所述的训练方法,其中,所述生成网络对抗损失函数可以表示为:
    L_G = -E_{z3~P_data(z3)}[log(1-D(G(z3)))] - E_{z4~P_data(z4)}[log D(G(z4))]
    其中，L_G表示所述生成网络对抗损失函数，z3表示所述第五训练评论，P_data(z3)表示所述第五训练评论的集合，G(z3)表示所述第五训练共同表示向量，D(G(z3))表示所述第五训练输出，E_{z3~P_data(z3)}[·]表示针对所述第五训练评论的集合求期望；z4表示所述第六训练评论，P_data(z4)表示所述第六训练评论的集合，G(z4)表示所述第六训练共同表示向量，D(G(z4))表示所述第六训练输出，E_{z4~P_data(z4)}[·]表示针对所述第六训练评论的集合求期望。
  21. 一种语义分类装置,包括:
    存储器,用于存储非暂时性计算机可读指令;以及
    处理器,用于运行所述计算机可读指令,
    其中,所述计算机可读指令被所述处理器运行时执行根据权利要求1-9任一项所述的语义分类方法。
  22. 一种神经网络的训练装置,包括:
    存储器,用于存储非暂时性计算机可读指令;以及
    处理器,用于运行所述计算机可读指令,
    其中,所述计算机可读指令被所述处理器运行时执行根据权利要求10-20任一项所述的训练方法。
  23. 一种存储介质,非暂时性地存储计算机可读指令,其中,当所述非暂时性计算机可读指令由计算机执行时可以执行根据权利要求1-9任一项所述的语义分类方法的指令。
  24. 一种存储介质,非暂时性地存储计算机可读指令,其中,当所述非暂时性计算机可读指令由计算机执行时可以执行根据权利要求10-20任一项所述的训练方法的指令。
PCT/CN2020/113740 2019-09-09 2020-09-07 神经网络的训练方法及装置、语义分类方法及装置和介质 WO2021047473A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/418,836 US11934790B2 (en) 2019-09-09 2020-09-07 Neural network training method and apparatus, semantic classification method and apparatus and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910863457.8 2019-09-09
CN201910863457.8A CN110598786B (zh) 2019-09-09 2019-09-09 神经网络的训练方法、语义分类方法、语义分类装置

Publications (1)

Publication Number Publication Date
WO2021047473A1 true WO2021047473A1 (zh) 2021-03-18

Family

ID=68859161

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113740 WO2021047473A1 (zh) 2019-09-09 2020-09-07 神经网络的训练方法及装置、语义分类方法及装置和介质

Country Status (3)

Country Link
US (1) US11934790B2 (zh)
CN (1) CN110598786B (zh)
WO (1) WO2021047473A1 (zh)



Also Published As

Publication number Publication date
CN110598786B (zh) 2022-01-07
CN110598786A (zh) 2019-12-20
US20220075955A1 (en) 2022-03-10
US11934790B2 (en) 2024-03-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20863899

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20863899

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.02.2023)
