EP3830761A1

EP3830761A1 - Computer-implemented method and device for text analysis

Info

Publication number: EP3830761A1
Application number: EP19739587.4A
Authority: EP
Inventors: Michael Dorna; Anna Constanze HAETTY
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2018-08-03
Filing date: 2019-07-11
Publication date: 2021-06-09
Also published as: US11875265B2; WO2020025285A1; JP7271652B2; US20210279512A1; DE102018213021A1; JP2021533477A

Abstract

The invention relates to a computer-implemented method for training an artificial neural network with training data comprising features and identifiers, wherein the features characterize term candidates from a corpus (302). The corpus comprises a text from a domain, wherein the identifier characterizes a degree of an association to at least three classes of term candidates that are different from one another. Different classes indicate different degrees of association of the term candidate to the domain, wherein the training data comprise an allocation of features to identifiers. The invention further relates to an artificial neural network, to a method for classifying term candidates, and to a computer-implemented method for generating training data.

Description


title

Computer-implemented method and device for text analysis

State of the art

The disclosure is based on computer-implemented methods and devices for text analysis, in particular for predicting whether a composite from a text belongs to a subject area.

Machine-based systems for text analysis use rule-based or statistical procedures for terminology extraction and keywording. Hybrid processes and machine learning processes are also used for text analysis.

DE 20 2017 102 235 U1 generally discloses aspects of a machine learning method.

Binary decisions form the basis of such methods for assigning a composite to a specific field. It is desirable to enable an improved approach.

Disclosure of the invention

This is achieved by the methods and devices according to the independent claims.

In the following description, the term corpus denotes a text or a collection of texts. A subject-specific corpus contains only text that is specific to a domain. A general-language corpus denotes a text or a collection of texts without specific assignment to a domain. For example, all texts of a cooking forum on the Internet constitute a subject-specific corpus. All Wikipedia entries, for example, constitute a general-language corpus.

In the following description, parts of a corpus that are analyzed are referred to as term candidates. In addition to the term candidates, a text can also contain parts that cannot or should not be analyzed.

In the following description, the term compound, also referred to as a composite, means a word combination, i.e. a word that is composed by connecting at least two already existing words or word stems.

In the following description, the term component denotes part of a composite, i.e. part of the word composition.

A degree of belonging of a component or a compound to a specific domain is referred to below as a class. A certain compound is, for example, assigned to a certain class if its degree of belonging to this domain has a certain value or lies within a certain value range. Unique classes are defined, for example, by different values or by different, non-overlapping value ranges.

In a computer-implemented method for training an artificial neural network with training data that include features and identifiers, the features characterize term candidates from a corpus, the corpus comprising a text from a domain. An identifier characterizes a degree of belonging to at least three different classes for the term candidates. Different classes indicate different degrees of belonging of the term candidate to the domain. The training data include an assignment of features to identifiers. In the method, a feature is specified to an input layer of the artificial neural network, and the artificial neural network assigns an identifier to the feature from the input layer in a prediction in an output layer of the artificial neural network.

In a comparison, the identifier from the output layer is compared with the identifier that is assigned to the feature in the training data. Depending on the result of the comparison, at least one parameter of the artificial neural network is learned that characterizes a connection of the artificial neural network between the input layer and the output layer. As a result, terms can be classified into four classes NONTERM, SIMTERM, TERM, SPECTERM instead of into the two classes term or non-term.

The term candidates are advantageously taken from a corpus which is subject-specific with regard to the domain. Such a corpus is particularly suitable for training the classification.

The term candidates are advantageously assigned to at least one of the classes and the features for the term candidates are determined, in particular a word vector being determined and at least one parameter of the artificial neural network being trained with the features. Thereby

The term candidates are advantageously composites with at least two components. The training data assign composites to at least one of more than three classes. This is particularly useful for composites, since a composite can have a different specificity or centrality for a domain depending on its components. For example, a specific technical term is likely to contain one or more very specific components. A general term, for example, contains no component specific to this domain. The artificial neural network is thus trained for a very fine distinction.

Advantageously, composites from a corpus specific to the domain are divided into components as term candidates, the composites are assigned to at least one of the classes, the features for the composites and the components are determined, and at least one parameter of the artificial neural network is trained with the features. As a result, a composite and its components are taken into account in training. This further improves learning behavior. At least one word vector is advantageously determined as a feature. A word vector is a particularly suitable feature for the training process.

A productivity and a frequency of the components are advantageously determined as features based on the subject-specific corpus. Productivity and frequency are further features related to the frequency of occurrence of the components. This improves the training further.

In a computer-implemented method for generating training data for training an artificial neural network, in which the training data include features and identifiers, features are determined that characterize term candidates from a corpus, the corpus comprising a text from a domain, wherein an identifier is determined which characterizes a degree of belonging to at least three different classes for the term candidates, with different classes indicating different degrees of belonging of the term candidate to the domain, at least one feature being assigned to at least one of the identifiers. This training data is particularly suitable for training a classification with more than three classes.

The term candidates are advantageously taken from a corpus which is subject-specific with regard to the domain. The subject-specific corpus offers a high density of relevant term candidates of a domain.

The term candidates are advantageously assigned to at least one of the classes and the features for the term candidates are determined, in particular a word vector being determined. The assignment of features to classes is a particularly suitable representation of the classification of term candidates for machine learning.

The term candidates are advantageously composites with at least two components. This form of training data is especially suitable for a fine subdivision of the classes. With regard to the assignment, composites are not classified merely as either term or non-term: because their components may be classified into different classes, composites can be classified, according to their degree of belonging to a domain, into a class other than the non-term class.

Advantageously, composites from a corpus specific to the domain are divided into components as term candidates, the composites are assigned to at least one of the classes, and the features for the composites and the components are determined. The additional features enable better training of the artificial neural network even with limited availability of term candidates from a limited range of text, without the need for new texts with new composites.

At least one word vector is advantageously determined as a feature. If the word vectors are used in the training data, the artificial neural network itself does not need an embedding layer that determines word vectors as features from the term candidates.

A productivity and a frequency of the components are advantageously determined as features based on the subject-specific corpus. The additional features of productivity and frequency enable better training of the artificial neural network even with limited availability of term candidates from a limited amount of text, without the need to add new texts with new composites.

An artificial neural network comprises an input layer to which a feature can be specified, the artificial neural network being designed to assign an identifier to the feature from the input layer in a prediction in an output layer of the artificial neural network, wherein the features characterize term candidates from a corpus, the corpus comprising a text from a domain, the identifier characterizing at least three different classes for the term candidates, and different classes indicating different degrees of belonging of the term candidate to the domain. This artificial neural network is a particularly efficient implementation of a classification of composites into more than two classes.

The artificial neural network advantageously comprises at least one first input layer, to which a composite and its components can be specified for a first feature, and at least one second input layer, to which a productivity and a frequency of the components can be specified for a second feature, wherein the output layer is arranged downstream of the input layers and outputs the identifier in the prediction depending on the first feature and the second feature. The additional features further improve the efficiency and reliability of the prediction of the artificial neural network.

The artificial neural network preferably comprises a further output layer that is designed to output a degree of assignment of a composite to the at least three classes regardless of the productivity and frequency of its components. This further output layer is an auxiliary output that can be used in an error function for optimization.

The artificial neural network preferably comprises a further output layer which is designed to output a degree of assignment of one of the components to the at least three classes depending on the productivity and the frequency of this component. This further output layer is an auxiliary output that can be used in an error function for optimization.

In a method for classifying term candidates, a feature is specified to an input layer of an artificial neural network, wherein an identifier is assigned to the feature from the input layer in a prediction in an output layer of the artificial neural network, the features characterizing term candidates from a corpus, the corpus comprising a text from a domain, wherein the identifier characterizes at least three different classes for the term candidates, with different classes indicating different degrees of belonging of the term candidate to the domain. In addition to detecting whether a term candidate is a term or not a term with respect to the domain, the classification into more than two classes makes it possible to provide a finely classified data record with more than two classes.

Advantageously, a composite and its components are specified to a first input layer for a first feature, and a productivity and a frequency of the components are specified to a second input layer for a second feature, the output layer being arranged downstream of the input layers, and the identifier is output in the prediction depending on the first feature and the second feature. The prediction is further improved by these additional features.

Further advantageous embodiments result from the following description and the drawing. In the drawing:

Fig. 1 schematically shows parts of an artificial neural network,

Fig. 2 schematically shows parts of a model for text analysis,

Fig. 3 schematically shows steps in a training or classification method.

In the following description, the term domain denotes a specialist or subject area.

In an example described below, the classes NONTERM, SIMTERM, TERM, SPECTERM are used.

NONTERM is a class for components or composites that have no particular relation to the domain. For example, a general-language compound without special relation to the domain is classified as NONTERM.

SIMTERM is a class for components or composites that have a greater relationship to the domain than components or composites from the class NONTERM. For example, components or composites with a semantic reference to the domain are classified as SIMTERM.

TERM is a class for components or composites that have a greater relationship to the domain than components or composites from the SIMTERM class. For example, understandable components or composites related to the domain are classified as TERM.

SPECTERM is a class for components or composites that are more related to the domain than components or composites from the TERM class. For example, incomprehensible components or composites related to the domain are classified as SPECTERM.

These four classes represent different degrees of belonging to a domain. More precisely, the degree of belonging to the domain increases with the classification from NONTERM via SIMTERM and TERM to SPECTERM. For example, the four classes are assigned four identifiers as follows: the class NONTERM is assigned a first scalar $o_1$, the class SIMTERM a second scalar $o_2$, the class TERM a third scalar $o_3$ and the class SPECTERM a fourth scalar $o_4$. In the example, a vector $o = (o_1, o_2, o_3, o_4)^T$ is used as the identifier. In the example, each scalar has a value between 0 and 1, the degree of membership increasing with the value of the respective scalar from 0 to 1.
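The assignment of the four classes to the scalars of the identifier can be illustrated with a minimal Python sketch. The names `CLASSES`, `identifier_for` and `class_of` are hypothetical, and the hard assignment (1 for the assigned class, 0 otherwise) is an assumption; the description only requires each scalar to lie between 0 and 1.

```python
# Classes ordered by increasing degree of belonging to the domain.
CLASSES = ("NONTERM", "SIMTERM", "TERM", "SPECTERM")


def identifier_for(class_name):
    """Return a vector (o_1, o_2, o_3, o_4) with 1.0 at the class position.

    Assumption: a hard (one-hot) assignment; graded values in [0, 1]
    would be equally compatible with the description.
    """
    if class_name not in CLASSES:
        raise ValueError(f"unknown class: {class_name}")
    return [1.0 if name == class_name else 0.0 for name in CLASSES]


def class_of(identifier):
    """Return the class whose scalar has the largest value."""
    best = max(range(len(CLASSES)), key=lambda i: identifier[i])
    return CLASSES[best]
```

Reading the class back from the largest scalar is one plausible way to turn a graded identifier into a single class; other decision rules are conceivable.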

The degree of belonging of components or composites of a corpus is a measure of a degree of difficulty of the text from the corpus, i.e. its level or its specificity with regard to the domain. Texts with components or composites in the SPECTERM class are very likely to be written by experts or for experts. Texts without components or composites in the classes TERM or SPECTERM are very likely not specific to the domain.

In the following description, training data include features and identifiers. More specifically, at least one identifier is assigned to a feature in the training data. In one aspect, at least one identifier is assigned to each feature. Features characterize term candidates in the following description. In one aspect, a feature uniquely represents a term candidate. A feature is, for example, a word vector that represents the term candidate. In a further aspect, a feature represents a productivity or a frequency of a component of a composite in a subject-specific corpus with respect to a general-language corpus.

In the following description, identifiers characterize a class. In one aspect, an identifier uniquely represents a class. An identifier is, for example, a vector $s$ with scalars $s_1, s_2, s_3, s_4$, whose values between 0 and 1 each represent a degree of belonging to the respective class. For example, the value 1 represents a high degree of belonging. The value 0 represents, for example, a low degree of belonging.

An artificial neural network according to a first embodiment is described below with reference to FIG. 1 as an example of a model for classifying text depending on the degree of belonging of a component or a compound to a specific domain.

For example, an output O of the network is defined as:

$O = \sigma(\varphi(E(x) \cdot W_1) \cdot W_2)$

Here $x$ is a word, i.e. a compound or a component, and $z = E(x)$ is the output of an embedding layer in which the function $E$ maps a word $x$ to a vector $z$. In the example, the vector $z$ for a word $x$ is a 200-dimensional word vector. If a number $n$ of words is used in a batch of size $b$, the $n$ vectors $z$ are used in a matrix $Z$ with the dimension $[n \cdot 200, b]$. $W_1$ and $W_2$ are weight matrices. In the example, the weight matrix $W_1$ for $n$ words has a dimension $[64, n \cdot 200]$, matching the $n$ 200-dimensional vectors $z$. $\varphi$ is an activation function. In the example, the hyperbolic tangent is used as the activation function as follows: $\varphi(z \cdot W_1) = \tanh(z \cdot W_1)$.

In a dense layer, in the example at the output $d$ of the second hidden layer 106, $d = \varphi(W_1 \cdot z)$ with the dimension $[64, b]$ is used.
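The dimensions stated above can be checked with a short sketch in plain Python. It multiplies a weight matrix of dimension [64, n * 200] by a matrix of word vectors of dimension [n * 200, b] and applies the hyperbolic tangent element-wise. The helper names, the constant weights and the values n = 1 and b = 2 are assumptions chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply matrix a [r, k] by matrix b [k, c], both lists of rows."""
    return [[sum(a_row[k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for a_row in a]


def dense_tanh(w1, z):
    """Compute d = tanh(W_1 * Z) element-wise."""
    return [[math.tanh(v) for v in row] for row in matmul(w1, z)]


# Hypothetical values: n = 1 word, batch size b = 2, constant weights.
n, b = 1, 2
Z = [[0.01] * b for _ in range(n * 200)]      # dimension [n * 200, b]
W1 = [[0.05] * (n * 200) for _ in range(64)]  # dimension [64, n * 200]

d = dense_tanh(W1, Z)
print(len(d), len(d[0]))  # 64 2
```

The result has one row per neuron of the dense layer and one column per training example in the batch.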

In the example, the weight matrix $W_2$ has a dimension $[4, 64]$, matching the four classes NONTERM, SIMTERM, TERM, SPECTERM. In the example, 4 neurons are used as output $O$ in the output layer. In the example, $\sigma$ is a softmax activation function with which a probability of belonging to one of the classes is determined. The softmax activation function converts a raw value into a probability, which also serves as a measure of certainty regarding the correctness of the result. As the softmax activation function for a neuron $i$ of the output $O$, for example with $n = 4$ neurons $O = (o_1, o_2, o_3, o_4)$ in the output layer, the following function is used for each scalar output $o_i$:

$$o_i = \frac{e^{y_i}}{\sum_{k=1}^{n} e^{y_k}}$$

where $y_i$ is row $i$ and $y_k$ is row $k$ of a vector $y = \varphi(E(x) \cdot W_1) \cdot W_2$.
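A minimal sketch of the softmax activation for the four output neurons is given below. Subtracting the largest raw value before exponentiation is a common numerical safeguard that is not taken from the description; it leaves the result mathematically unchanged. The raw values are hypothetical.

```python
import math


def softmax(y):
    """Map raw values y_1..y_n to probabilities o_1..o_n that sum to 1."""
    shift = max(y)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw values for NONTERM, SIMTERM, TERM, SPECTERM.
o = softmax([0.5, 1.0, 3.0, 0.2])
```

In this example the third scalar, which corresponds to TERM, receives the largest probability because it has the largest raw value.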

An example assignment is given below:

NONTERM is assigned to $o_1$, SIMTERM is assigned to $o_2$, TERM is assigned to $o_3$, SPECTERM is assigned to $o_4$. The scalar value $o_i$ is a degree to which the term belongs to the respective class.

FIG. 1 shows schematically, as an example of a model, parts of an artificial neural network 100 with layers arranged one behind the other. The artificial neural network 100 comprises an input layer 102, a first hidden layer 104, a second hidden layer 106 and an output layer 108.

The input layer 102 is designed to transfer a term candidate T as the word x to the first hidden layer 104.

The first hidden layer 104 in the example is the function $E(x)$, i.e. the embedding layer in which the function $E$ maps the word $x$ to the vector $z$.

The mapping is carried out, for example, using a continuous bag-of-words (CBOW) model. For example, a Word2Vec CBOW model according to Tomas Mikolov et al., 2013, Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, pages 3111-3119, Curran Associates, Inc., is used to generate the 200-dimensional word vector.

For example, in one aspect, the CBOW model is trained using a lexicon to learn the weights of the first hidden layer 104 for words. For example, a previously trained CBOW model is used to initialize the embedding layer. The first hidden layer 104 is then initialized with appropriate weights for words.

Words that are not recognized as such are mapped to a word vector z with random elements, for example. Words that have been recognized are mapped onto the corresponding word vector z. The word vector represents the term candidates.
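The lookup described above can be sketched as follows. The table contents, the function name `embed` and the value range of the random elements are assumptions; the description only states that unrecognized words are mapped to a word vector with random elements.

```python
import random

DIM = 200  # dimension of the word vectors in the example


def embed(word, table, rng):
    """Return the word vector z for a word.

    Known words are looked up in the table; unknown words receive a
    vector with random elements, as described above.
    """
    if word in table:
        return table[word]
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


# Hypothetical pretrained table with a single entry.
table = {"maize": [0.1] * DIM}
rng = random.Random(0)

z_known = embed("maize", table, rng)
z_unknown = embed("xyzzy", table, rng)
```

Passing the random generator explicitly keeps the sketch reproducible; whether the random vector is stored for later reuse is left open here.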

The word vector $z$ is transferred from the first hidden layer 104 to the second hidden layer 106. The second hidden layer 106 uses the first weight matrix $W_1$ and the activation function $\varphi$. In the example, the hyperbolic tangent is used as the activation function in the second hidden layer 106 as follows: $d = \varphi(E(x) \cdot W_1) = \tanh(z \cdot W_1)$.

The output d is passed to the output layer 108. The example uses the Softmax activation function, which is used to determine the probability of the word x belonging to one of the classes.

The weight matrices and the activation functions are parameters of the artificial neural network 100. The parameters, in particular the weight matrices, can be changed during training.

A method for training this artificial neural network 100 is described below.

Training data for training this artificial neural network 100 include features and identifiers. More specifically, the training data include an assignment of features to identifiers.

The features characterize term candidates T from a subject-specific corpus. An identifier $s$ characterizes at least three different classes for the term candidates T. In the example, the identifier $s$ characterizes the four classes NONTERM, SIMTERM, TERM, SPECTERM. The classes indicate the degree of belonging of the term candidate T to the domain.

Annotators, i.e. people for example, look for words or word combinations as term candidates T from the subject-specific corpus and assign them to one of the four classes. For a term candidate T, the assignment in the training data includes, for example, as a feature a word vector $z$ which represents the term candidate T. In the identifier $s$, a first scalar $s_1$ is assigned to the class NONTERM, a second scalar $s_2$ to the class SIMTERM, a third scalar $s_3$ to the class TERM and a fourth scalar $s_4$ to the class SPECTERM. In the example, a vector $s = (s_1, s_2, s_3, s_4)^T$ is used as the identifier. In the example, each scalar has a value between 0 and 1, the degree of membership increasing with the value of the respective scalar from 0 to 1, for example. The identifier includes values that the annotator selects.

Once the artificial neural network 100 has been trained, composites can be found automatically using a splitter, and their classes can be predicted using the already trained artificial neural network 100.

A term candidate T is predefined for the input layer 102 of the artificial neural network 100. Provision can be made to initialize the parameters of the artificial neural network 100 with random values before the training. For training, a group of term candidates T can be specified simultaneously as a batch, for example with b = 32 training examples.

The artificial neural network 100 assigns an identifier $o$ to a feature that represents the term candidate T from the input layer 102 in a prediction in the output layer 108 of the artificial neural network 100. The prediction is made using the model described. The result of the prediction in the example with $b = 32$ training examples is a matrix $O$ with 32 vectors $o_1, \ldots, o_{32}$.

In a comparison, the identifier o from the output layer 108 is compared with the identifier s assigned to this feature in the training data. For example, an error function is evaluated in the comparison, for example a difference, in particular a Euclidean distance, between the vector s and the vector o.
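The comparison by Euclidean distance can be sketched in a few lines of Python. The function name and the example vectors are hypothetical and serve only to illustrate the error function named above.

```python
import math


def euclidean_distance(s, o):
    """Euclidean distance between annotated identifier s and prediction o."""
    if len(s) != len(o):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((si - oi) ** 2 for si, oi in zip(s, o)))


s = [0.0, 1.0, 0.0, 0.0]   # annotated as SIMTERM (hypothetical values)
o = [0.1, 0.7, 0.1, 0.1]   # hypothetical network output
error = euclidean_distance(s, o)
```

The error is zero only if the predicted identifier matches the annotated identifier exactly, and it grows as the two vectors diverge.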

Depending on the result of the comparison, at least one parameter of the artificial neural network 100 is learned. The parameter characterizes a connection of the artificial neural network 100 between the input layer 102 and the output layer 108. For example, the weight matrices $W_1$ and $W_2$ are learned depending on the error function until the error function is minimized. For example, the Stochastic Gradient Descent (SGD) method is used. A large number of assignments of features to identifiers is preferably provided in the training data. 50 epochs are used in the example. 32 training examples are processed in each of the 50 epochs.
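How the weight matrices can be adjusted depending on the error function is illustrated by the following simplified sketch. It rests on several assumptions that are not part of the description: the network is reduced to very small hypothetical dimensions, a single training example is used, the squared Euclidean distance serves as the error function, and the gradient is approximated by finite differences instead of backpropagation to keep the code short.

```python
import math


def forward(params, z):
    """Compute o = softmax(W2 * tanh(W1 * z)) for one word vector z."""
    w1, w2 = params
    d = [math.tanh(sum(w * v for w, v in zip(row, z))) for row in w1]
    y = [sum(w * v for w, v in zip(row, d)) for row in w2]
    top = max(y)
    exps = [math.exp(v - top) for v in y]
    total = sum(exps)
    return [e / total for e in exps]


def loss(params, z, s):
    """Squared Euclidean distance between prediction o and identifier s."""
    return sum((oi - si) ** 2 for oi, si in zip(forward(params, z), s))


def sgd_step(params, z, s, lr=0.5, eps=1e-6):
    """Update every weight in place using a finite-difference gradient."""
    grads = []
    for matrix in params:
        for row in matrix:
            for j in range(len(row)):
                original = row[j]
                row[j] = original + eps
                up = loss(params, z, s)
                row[j] = original - eps
                down = loss(params, z, s)
                row[j] = original
                grads.append((row, j, (up - down) / (2 * eps)))
    for row, j, g in grads:
        row[j] -= lr * g


# Hypothetical tiny example: 3-dimensional word vector, 2 hidden neurons.
z = [0.2, -0.1, 0.4]
s = [0.0, 1.0, 0.0, 0.0]  # annotated as SIMTERM
params = (
    [[0.1, -0.2, 0.3], [0.0, 0.2, -0.1]],                # W1, dimension [2, 3]
    [[0.1, 0.0], [-0.1, 0.2], [0.3, -0.2], [0.0, 0.1]],  # W2, dimension [4, 2]
)
before = loss(params, z, s)
for _ in range(50):
    sgd_step(params, z, s)
after = loss(params, z, s)
```

After the updates the error for this example is smaller than before, which is the behavior the training relies on. A practical implementation would compute exact gradients and average them over a batch.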

In this case, a training data record comprises 1600 assignments. Provision can be made to carry out the training with a different number of epochs or with a different size of a training data record.
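The arithmetic behind the 1600 assignments, namely 50 groups of 32 training examples each, can be reproduced with a short sketch. The function name `batches` and the placeholder data are hypothetical; the sketch reads the example as one group of 32 assignments per pass.

```python
def batches(assignments, batch_size=32):
    """Yield consecutive groups of batch_size (feature, identifier) pairs."""
    for start in range(0, len(assignments), batch_size):
        yield assignments[start:start + batch_size]


# Hypothetical placeholder data: 1600 (feature, identifier) assignments.
assignments = [(f"feature_{i}", "identifier") for i in range(1600)]
groups = list(batches(assignments))
print(len(groups), len(groups[0]))  # 50 32
```

With a different size of the training data record, the last group may contain fewer than 32 assignments.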

By using the at least three classes, instead of a binary decision as to whether a term candidate T is a term from the domain or not, it is possible to provide an artificial neural network that defines a degree of membership. This enables a finer classification.

It is particularly advantageous if only composites are used as term candidates T. An artificial neural network trained in this way enables a particularly efficient classification of texts based on the composites contained therein.

In this case, training data of a training data record include an assignment of features that represent composites to the indicators that represent the class into which the composites have been classified by annotators. The composites are taken from a corpus specific to a domain. The model is trained for a classification depending on the degree of belonging of a component to a specific domain.

The training data set is based on the following aspects.

Composites are word compositions that contain words or word stems as components. Depending on the combination of the components, composites are formed which belong to a domain to a greater or lesser degree. For example, a component maize can be assigned to a cooking domain or an agriculture domain. In this example, the composite maize cultivation can only be assigned to the agriculture domain, and the composite maize flour can only be assigned to the cooking domain. In this example, the composite can be classified by classifying the two other components, cultivation and flour. The composite maize cultivation can also be associated with the cooking domain. The composite maize cultivation can be classified, for example, into the SIMTERM class.

To create the training data record, a text or a text collection with a known reference to this domain is used as a subject-specific corpus. In the example, the subject-specific corpus is a text collection of cooking recipes. This contains possible technical terms from the domain "cooking" as term candidates.

Term candidates are identified from the subject-specific corpus. In the example, composites are identified as term candidates. Lexical compound definitions or examples are assigned to the term candidates, i.e. the composites. For example, a text is used as a lexical definition or example.

For training and classification, term candidates with a certain minimum length are taken into account in the example. Term candidates with only one letter are ignored in this case. Without a minimum length, term candidates with just one letter could alternatively be classified in the NONTERM class.
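The minimum-length rule can be sketched as a simple filter. The threshold of two characters is an assumption derived from the statement that candidates with only one letter are ignored; the function name is hypothetical.

```python
MIN_LENGTH = 2  # assumed threshold; the description only excludes one letter


def filter_candidates(candidates, min_length=MIN_LENGTH):
    """Keep term candidates that reach the minimum length."""
    return [c for c in candidates if len(c) >= min_length]


kept = filter_candidates(["a", "maize", "x", "maize flour"])
```

Under the alternative mentioned above, the discarded one-letter candidates would instead be kept and assigned to the NONTERM class.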

As a gold standard for the training, one or more annotators assign the term candidates a user-based assessment of specificity and centrality. In the example, a multidimensional scale using the four classes is used for a term candidate in order to assign the term candidate a classification into one of the classes NONTERM, SIMTERM, TERM or SPECTERM. The annotators are instructed to classify a term candidate into the SPECTERM class if it is very specific to the particular domain, in the example "cooking", and has a high degree of proximity to the particular domain. The annotators are instructed to classify a term candidate into the SIMTERM class if it is very specific and has a medium degree of proximity to the particular domain. The annotators are instructed to classify a term candidate into the TERM class if it is close to the particular domain, in the example "cooking", but is otherwise technically unspecific. The annotators are instructed to classify other term candidates into the NONTERM class.

As an example of a classification of a term candidate from the subject-specific corpus, the compound "maize cultivation" is considered. The composite maize cultivation and the definition for classification are presented to a large number of annotators. For example, some annotators classify the compound into the NONTERM class based on this definition. Other annotators classify the compound, for example, into the SIMTERM class.

In the example, the training data record is supplemented by the entry maize cultivation in the class into which the composite maize cultivation was classified by all or a majority of the annotators. For example, a training data record contains an assignment of a feature that represents the entry maize cultivation to one of the classes. For example, the word vector $z$ that characterizes the term candidate maize cultivation is assigned to the vector $s$ that characterizes the class SIMTERM.
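The majority decision of the annotators and the resulting identifier can be sketched as follows. The function names, the example votes and the handling of ties (no class is returned) are assumptions; the description only states that the class chosen by all or a majority of the annotators is used.

```python
from collections import Counter

CLASSES = ("NONTERM", "SIMTERM", "TERM", "SPECTERM")


def majority_class(votes):
    """Return the class chosen by more than half of the annotators, or None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def to_identifier(label):
    """Encode a class as the vector s = (s_1, s_2, s_3, s_4)."""
    return [1.0 if name == label else 0.0 for name in CLASSES]


# Hypothetical votes of five annotators for "maize cultivation".
votes = ["SIMTERM", "NONTERM", "SIMTERM", "SIMTERM", "NONTERM"]
label = majority_class(votes)
s = to_identifier(label)
```

Here three of five annotators chose SIMTERM, so the word vector of the term candidate would be paired with the identifier for SIMTERM.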

A training data record contains a large number of such assignments for a large number of different term candidates. The model is trained on the basis of this training data record. In training, the large number of such assignments from a training data record is used to learn the weight matrices.

In the first embodiment, the word vectors $z$ representing the composites are used as features. The weight matrices $W_1$ and $W_2$ are learned depending on these word vectors $z$, the vector $s$ and a corresponding error function.

A further improvement is possible if, in addition to the composites, their components are used. Additional features are used for this.

This is described below with reference to the artificial neural network 200 according to a second embodiment, which is shown schematically in FIG. 2. The artificial neural network 200 comprises a first input layer 202a, a second input layer 202b, a third input layer 202c, a fourth input layer 202d and a fifth input layer 202e. The artificial neural network 200 comprises a first hidden layer 204a, which is arranged after the second input layer 202b, a second hidden layer 204b, which is arranged after the third input layer 202c, and a third hidden layer 204c, which is arranged after the fourth input layer 202d.

The artificial neural network 200 comprises a fourth hidden layer 206a, which is arranged after the first input layer 202a. The artificial neural network 200 comprises a fifth hidden layer 206b, which is arranged downstream of the first hidden layer 204a. The artificial neural network 200 comprises a sixth hidden layer 206c, which is arranged downstream of the second hidden layer 204b. The artificial neural network 200 comprises a seventh hidden layer 206d, which is arranged after the third hidden layer 204c. The artificial neural network 200 comprises an eighth hidden layer 206e, which is arranged after the fifth input layer 202e.

The artificial neural network 200 comprises a ninth hidden layer 208a, which is arranged after the fourth hidden layer 206a and the fifth hidden layer 206b. The artificial neural network 200 comprises a tenth hidden layer 208b, which is arranged after the seventh hidden layer 206d and the eighth hidden layer 206e.

The artificial neural network 200 comprises an eleventh hidden layer 210, which is arranged after the ninth hidden layer 208a and the tenth hidden layer 208b. The artificial neural network 200 comprises a first output layer 212a, which is arranged after the ninth hidden layer 208a. The artificial neural network 200 comprises a second output layer 212b, which is arranged after the sixth hidden layer 206c. The artificial neural network 200 comprises a third output layer 212c, which is arranged after the tenth hidden layer 208b. The artificial neural network 200 comprises a fourth output layer 214, which is arranged downstream of the eleventh hidden layer 210.

The third input layer 202c is designed as an input for term candidates.
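The arrangement of the layers described above can be written down as a small graph to check that it is free of cycles and to list its inputs and outputs. The dictionary below only restates the stated arrangement; it assumes that the third hidden layer 204c follows the fourth input layer 202d. The variable names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each layer maps to the layers it is arranged after.
PREDECESSORS = {
    "204a": ["202b"], "204b": ["202c"], "204c": ["202d"],
    "206a": ["202a"], "206b": ["204a"], "206c": ["204b"],
    "206d": ["204c"], "206e": ["202e"],
    "208a": ["206a", "206b"], "208b": ["206d", "206e"],
    "210": ["208a", "208b"],
    "212a": ["208a"], "212b": ["206c"], "212c": ["208b"],
    "214": ["210"],
}

# A valid evaluation order exists only if the arrangement has no cycle.
order = list(TopologicalSorter(PREDECESSORS).static_order())
inputs = sorted(n for n in order if n not in PREDECESSORS)
outputs = sorted(n for n in PREDECESSORS
                 if not any(n in preds for preds in PREDECESSORS.values()))
```

The sketch yields the five input layers 202a to 202e and the four output layers 212a, 212b, 212c and 214, matching the layers named in the text.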

In the example, composites $c_2$, i.e. word compositions, are used as term candidates.

The second input layer 202b and the fourth input layer 202d are designed as input layers for components $c_1$, $c_3$ of the composite $c_2$. A first component $c_1$ and a second component $c_3$ are shown in FIG. 2, but more than two components can also be used if the composite contains more than two components.

Generally, an input to the artificial neural network 200 includes the composite $c_2$ and each of its components.

In a batch with b training data records, a vector of dimension [1, b] is specified individually at the input layers for each of the components and for the compound.

For example, a vector x in which the composite $c_2$ and its components are concatenated is used as the input for the hidden layers downstream of the second input layer 202b, the third input layer 202c and the fourth input layer 202d. For the example shown in FIG. 2 with two components $c_1$, $c_3$, the model for concatenated vectors uses, for example, the following vector:

$x = (c_1, c_2, c_3)$.

The function E maps x, for example, to a concatenated word vector $z = (z_1, z_2, z_3)$. The word vector z is a feature for the assignment.

It can also be provided that a single word is assigned to each input and that the inputs are only concatenated in the dense layer. In this case, the individual vectors

$x_1 = c_1$, $x_2 = c_2$, $x_3 = c_3$

and

$z_1 = E(x_1)$, $z_2 = E(x_2)$, $z_3 = E(x_3)$

are used.

In a batch of size b, the vectors $x_1$, $x_2$, $x_3$ each have the dimension [1, b], and $z_1$, $z_2$, $z_3$ each represent a matrix of the dimension [200, b].

A respective output of the fifth hidden layer 206b, the sixth hidden layer 206c and the seventh hidden layer 206d is given below for the individually calculated vectors:

$I_1 = \varphi(E(c_1) \cdot W_1)$ output of the fifth hidden layer 206b,

$I_2 = \varphi(E(c_2) \cdot W_2)$ output of the sixth hidden layer 206c,

$I_3 = \varphi(E(c_3) \cdot W_3)$ output of the seventh hidden layer 206d.

The function E represents the embedding layer, which for example uses the bag-of-words model to map the respective part of the vector x to a respective part of the word vector z.

The output $I_1$ of the fifth hidden layer 206b, the output $I_2$ of the sixth hidden layer 206c and the output $I_3$ of the seventh hidden layer 206d each have the dimension [64, b] in the example of the batch with b training data.
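The three word-vector branches (embedding followed by the fifth, sixth and seventh hidden layers 206b, 206c, 206d) can be sketched in NumPy as follows. This is only a minimal sketch under stated assumptions: the embedding table `E`, the toy vocabulary and the random weights are hypothetical stand-ins, and a weight matrix of dimension 200 × 64 is applied as `W.T @ z` so that the result has the dimension [64, b] stated above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and embedding table (one 200-dim row per entry).
vocab = {"tomato": 0, "tomato puree": 1, "puree": 2, "soup": 3, "tomato soup": 4}
E = rng.normal(size=(len(vocab), 200))

def embed(words):
    """Map a batch of b words to a matrix of dimension [200, b]."""
    return E[[vocab[w] for w in words]].T

def dense_tanh(z, W):
    """tanh dense layer; W has dimension 200 x 64, the output [64, b]."""
    return np.tanh(W.T @ z)

# Random stand-ins for the learned weight matrices of the three branches.
W1, W2, W3 = (rng.normal(scale=0.1, size=(200, 64)) for _ in range(3))

# A batch of b = 2 composites with their first and second components.
c1 = ["tomato", "tomato"]
c2 = ["tomato puree", "tomato soup"]
c3 = ["puree", "soup"]

I1 = dense_tanh(embed(c1), W1)  # fifth hidden layer 206b
I2 = dense_tanh(embed(c2), W2)  # sixth hidden layer 206c
I3 = dense_tanh(embed(c3), W3)  # seventh hidden layer 206d
```

With a batch of b = 2, each embedded input has the shape (200, 2) and each branch output the shape (64, 2).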

The first input layer 202a is an input for a first frequency $f(c_1)$ and a first productivity $P(c_1)$ of a first component $c_1$ of a composite $c_2$. The fifth input layer 202e is an input for a second frequency $f(c_3)$ and a second productivity $P(c_3)$ of a second component $c_3$ of the composite $c_2$. Frequency here denotes the frequency of occurrence of the respective component $c_1$, $c_3$ in other composites in the subject-specific corpus, relative to all components from the subject-specific corpus.

Productivity denotes the number of different composites, other than the composite $c_2$, in the subject-specific corpus in which the respective component $c_1$, $c_3$ is contained.

Productivity and frequency are two further features for the assignment.
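The two component features can be illustrated with a small counting sketch. It assumes a corpus that has already been split into (compound, components) occurrences. The function name `component_features`, the toy corpus and the exact normalization (occurrences in other composites divided by the total number of component occurrences) are assumptions made for illustration, not details taken from the description.

```python
def component_features(compound, component, split_corpus):
    """Return (frequency, productivity) of a component for a given compound.

    split_corpus is a list of (compound, [components]) occurrences.
    """
    total_components = sum(len(parts) for _, parts in split_corpus)
    # Occurrences of the component in composites other than the given one.
    occurrences = sum(
        parts.count(component)
        for other, parts in split_corpus
        if other != compound
    )
    frequency = occurrences / total_components if total_components else 0.0
    # Number of different other composites that contain the component.
    productivity = len(
        {other for other, parts in split_corpus
         if other != compound and component in parts}
    )
    return frequency, productivity

# Hypothetical toy corpus of already split composites.
corpus = [
    ("tomato soup", ["tomato", "soup"]),
    ("tomato soup", ["tomato", "soup"]),
    ("tomato salad", ["tomato", "salad"]),
    ("tomato puree", ["tomato", "puree"]),
    ("potato puree", ["potato", "puree"]),
]

f_tomato, p_tomato = component_features("tomato puree", "tomato", corpus)
f_puree, p_puree = component_features("tomato puree", "puree", corpus)
```

In this toy corpus, "tomato" occurs three times in two other composites, while "puree" occurs once in one other composite, so "tomato" has the higher frequency and productivity.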

In the example, the vector $v_1 = (f(c_1); P(c_1))$ is used for the first input layer 202a and the vector $v_2 = (f(c_3); P(c_3))$ is used for the fifth input layer 202e.

A multidimensional vector v with the dimensions frequency and productivity of the individual components is generally used as the input:

$v = (v_1, v_2)$.

An output $I_4$ of the fourth hidden layer 206a and an output $I_5$ of the eighth hidden layer 206e are

$I_4 = \varphi(W_4 \cdot v_1)$ output of the fourth hidden layer 206a,

$I_5 = \varphi(W_5 \cdot v_2)$ output of the eighth hidden layer 206e.

In the example of the batch with b training data, the output $I_4$ of the fourth hidden layer 206a and the output $I_1$ of the fifth hidden layer 206b each have the dimension [64, b]. The output $I_4$ of the fourth hidden layer 206a and the output $I_1$ of the fifth hidden layer 206b form an input of the ninth hidden layer 208a.

In the example of the batch with b training data, the output $I_5$ of the eighth hidden layer 206e and the output $I_3$ of the seventh hidden layer 206d each have the dimension [64, b]. The output $I_5$ of the eighth hidden layer 206e and the output $I_3$ of the seventh hidden layer 206d form an input of the tenth hidden layer 208b.

An output $I_6$ of the ninth hidden layer 208a and an output $I_7$ of the tenth hidden layer 208b are

$I_6 = [I_1; I_4]^T$ output of the ninth hidden layer 208a,

$I_7 = [I_3; I_5]^T$ output of the tenth hidden layer 208b.

The ninth hidden layer 208a and the tenth hidden layer 208b concatenate their respective inputs in the example.

In the example of the batch with b training data, the output $I_6$ of the ninth hidden layer 208a and the output $I_7$ of the tenth hidden layer 208b each have the dimension [128, b]. The output $I_6$ of the ninth hidden layer 208a and the output $I_7$ of the tenth hidden layer 208b form, together with the output $I_2$ of the sixth hidden layer 206c, the input of the eleventh hidden layer 210.

The output $I_8$ of the eleventh hidden layer 210 is

$I_8 = [I_6; I_2; I_7]^T$ output of the eleventh hidden layer 210.

In the example of the batch with b training data, the output $I_8$ of the eleventh hidden layer 210 has the dimension [320, b]. In one aspect, the output of the fourth output layer 214 forms the output of the artificial neural network 200:

$O = \sigma(W_6 \cdot I_8)$.

In the example of the batch with b training data, the output of the artificial neural network 200, in the example the output of the fourth output layer 214, has the dimension [4, b].
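The concatenation steps and the resulting dimensions can be checked with a short NumPy sketch. The five branch outputs are random stand-ins with the dimension [64, b]. The output function is assumed here to be a softmax over the four classes, which this passage does not spell out, and the weight matrix `W6` is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 3  # batch size

# Random stand-ins for the outputs of the hidden layers 206a to 206e, each [64, b].
I1, I2, I3, I4, I5 = (np.tanh(rng.normal(size=(64, b))) for _ in range(5))

I6 = np.concatenate([I1, I4], axis=0)      # ninth hidden layer 208a, [128, b]
I7 = np.concatenate([I3, I5], axis=0)      # tenth hidden layer 208b, [128, b]
I8 = np.concatenate([I6, I2, I7], axis=0)  # eleventh hidden layer 210, [320, b]

def softmax(y):
    """Column-wise softmax (assumed output function)."""
    e = np.exp(y - y.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

W6 = rng.normal(scale=0.1, size=(4, 320))  # placeholder output weights
O = softmax(W6 @ I8)                       # fourth output layer 214, [4, b]
```

Each column of `O` then holds four values between 0 and 1 that sum to 1, one per class.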

For an optimization of the artificial neural network 200 during training or thereafter, this output O is used in an error function, for example in a stochastic gradient descent method, with the vector s in order to adapt the weights of the weight matrices.

In an optional further aspect, the output O and auxiliary outputs $O_{aux}$ are provided:

$O = \sigma(W_6 \cdot I_8)$ output of the fourth output layer 214,

$O_{aux1} = \sigma(W_7 \cdot I_6)$ output of the first output layer 212a,

$O_{aux2} = \sigma(W_8 \cdot I_2)$ output of the second output layer 212b,

$O_{aux3} = \sigma(W_9 \cdot I_7)$ output of the third output layer 212c.

In the example of the batch with b training data, the auxiliary outputs $O_{aux1}$, $O_{aux2}$ and $O_{aux3}$ each have the dimension [4, b].

The information from the auxiliary outputs $O_{aux1}$ and $O_{aux3}$ for the components $c_1$, $c_3$ is used to optimize the artificial neural network 200 on the way to the output O. The layers leading to the auxiliary outputs $O_{aux1}$ and $O_{aux3}$ sharpen the knowledge in the artificial neural network 200 about the classes to which the components belong. For the output O, the artificial neural network 200 learns to what extent this information helps to classify the composite.

For example, in a compound "tomato | soup" both components are probably classified as TERM, and then the compound at the output O as well. In the case of a compound "Dosen | soup" (canned soup), the component "Dose" (can) is probably classified as NONTERM and the component "soup" as TERM. For the output O, the artificial neural network 200 in turn learns that TERM mostly prevails for this combination of component classes and determines the class of the compound. For example, for the compound "purslane | salad" the artificial neural network 200 learns from the combination of "purslane" as SPECTERM and "salad" as TERM that the class of the compound is SPECTERM.

The activation function $\varphi$ is defined, for example, for a respective input $y_i$ and a respective i-th weight matrix $W_i$ as

$\varphi(y_i) = \tanh(y_i \cdot W_i)$.

In the example, the output O characterizes the i-th of the four classes NONTERM, SIMTERM, TERM, SPECTERM in this order. For example, the output $O = (o_1, o_2, o_3, o_4)$ is computed in the output layer, for a respective input $y_i$ and the i-th of the n scalar outputs $o_i$, using the following function

The value of $o_i$ indicates in the example, starting from 0, an increasing degree of belonging to the class for which $o_i$ was determined.

The j optional additional outputs $O_{auxj} = (o_{auxj1}, o_{auxj2}, o_{auxj3}, o_{auxj4})$ each provide i values $o_{auxji}$, which in the example also indicate the degree of belonging to the i-th class, starting from 0 and increasing to the maximum value 1. More precisely, the output $O_{aux1}$ indicates the degree of belonging of the first component $c_1$ to the classes. The output $O_{aux2}$ indicates the degree of belonging of the composite $c_2$ to the classes. The output $O_{aux3}$ indicates the degree of belonging of the component $c_3$ to the classes. For a further optimization of the artificial neural network 200, the values $o_{auxj1}$, $o_{auxj2}$, $o_{auxj3}$, $o_{auxj4}$ are used in weighted form in an error function. In an exemplary error function, the output of the fourth output layer 214 is used with a factor of 1 and all optional outputs are weighted with a factor of 0.2. Another weighting can also be used. For training the neural network, a backpropagation algorithm is used, for example, which uses the outputs with different weightings to optimize the weights of the weight matrices.
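The weighting of the main output with a factor of 1 and of the optional outputs with a factor of 0.2 can be sketched as a combined error function. The use of cross-entropy as the per-output error and the function names `cross_entropy` and `total_loss` are assumptions for illustration. The description only states that the outputs enter the error function with these weights.

```python
import numpy as np

def cross_entropy(pred, target):
    """Mean cross-entropy over a batch; pred and target have dimension [4, b]."""
    return float(-(target * np.log(pred + 1e-12)).sum(axis=0).mean())

def total_loss(O, s, aux_outputs, aux_targets, aux_weight=0.2):
    """Main output weighted with 1, each auxiliary output with aux_weight."""
    loss = 1.0 * cross_entropy(O, s)
    for O_aux, s_aux in zip(aux_outputs, aux_targets):
        loss += aux_weight * cross_entropy(O_aux, s_aux)
    return loss

b = 2
uniform = np.full((4, b), 0.25)  # a network that rates all four classes equally
s = np.eye(4)[:, [2, 3]]         # one-hot targets for TERM and SPECTERM
loss = total_loss(uniform, s, [uniform] * 3, [s] * 3)
```

For uniform predictions every single error equals ln 4, so the combined value is (1 + 3 · 0.2) · ln 4.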

The dimensions of the weight matrices W are determined to match the dimensions of the respective input layer 202a, 202b, 202c, 202d, 202e and the respective output layer 212a, 212b, 212c, 214. The weight matrix $W_1$ of the fifth hidden layer 206b has, for example, the dimension 200 × 64 for a 200-dimensional word vector $z_1$. Correspondingly, the weight matrices $W_2$ and $W_3$ of the sixth hidden layer 206c and the seventh hidden layer 206d have the same dimensions for 200-dimensional word vectors $z_2$ and $z_3$.

The productivity and the frequency of a component are scalars in the example, so the associated vector $v_1$ or $v_2$ is two-dimensional. The weight matrices $W_4$ and $W_5$ have, for example, the dimension 2 × b in the batch of size b. The ninth hidden layer 208a combines the outputs $I_1$ and $I_4$. The tenth hidden layer 208b combines the outputs $I_3$ and $I_5$. The dimensions of the respective weight matrices are adapted to the dimensions of the respective outputs and to the size of the batch b.

More or fewer optional outputs and other suitable dimensions can also be used. The outputs and vectors are combined, for example, by concatenation.

In a prediction, the artificial neural network 200 generally assigns an identifier O in the output layer 214 of the artificial neural network 200 to a feature z, v, which represents the compound $c_2$, from the input layer 202. The prediction is made using the model described. In the example, the result of the prediction is the vector O.

In a comparison, the identifier O is compared with the identifier s assigned to this feature in the training data. For example, an error function, in particular a difference between the vector s and the vector O, is used in the comparison. Depending on the result of the comparison, at least one parameter of the artificial neural network is learned. The parameter characterizes a connection of the artificial neural network between the input layer 102 and the output layer 108. For example, the weight matrices $W_1$ and $W_2$ are determined depending on the difference. For this purpose, an error function is evaluated with which the difference is minimized. For example, the stochastic gradient descent (SGD) method is used.
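How a weight matrix is adapted from the comparison of O with s can be illustrated with a few gradient steps on a single output layer. This is only a sketch under several assumptions: a softmax output with cross-entropy replaces the plain difference as error function, only one weight matrix is trained, the features are random stand-ins, and the same mini-batch is reused in every step.

```python
import numpy as np

rng = np.random.default_rng(0)
b = 8
x = rng.normal(size=(320, b))                 # stand-in features, [320, b]
s = np.eye(4)[:, rng.integers(0, 4, size=b)]  # one-hot identifiers s, [4, b]
W = np.zeros((4, 320))                        # all weights start at the same value

def softmax(y):
    e = np.exp(y - y.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def loss(W):
    """Mean cross-entropy between the prediction O and the identifiers s."""
    return float(-(s * np.log(softmax(W @ x) + 1e-12)).sum(axis=0).mean())

loss_before = loss(W)
for _ in range(20):
    O = softmax(W @ x)        # prediction
    grad = (O - s) @ x.T / b  # gradient of the mean cross-entropy w.r.t. W
    W -= 0.01 * grad          # gradient step with learning rate 0.01
loss_after = loss(W)
```

The error starts at ln 4 for the uniform initial prediction and decreases with the updates.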

The second embodiment is based on the following additional aspects compared to the first embodiment.

Productivity and frequency form a measure of thematic assignment, i.e. of centrality, and a measure of difficulty, i.e. of specificity or level. Components of a composite that often occur in different composites are most likely central components for this domain. Components of a composite that occur in small numbers are most likely components that are specific to the domain.

Depending on the composition of the components, composites are formed that have a greater or lesser degree of belonging to a domain. For example, a component "corn" can be assigned to a cooking domain or to an agriculture domain. In this example, a composite "corn cultivation" can only be assigned to the agriculture domain, and a composite "corn flour" can only be assigned to the cooking domain. In this example, the composite can be classified by classifying the common component "corn" and/or the two further components "cultivation" and "flour".

For example, one of the words or word stems of a word composition can, as a component, be assigned to only one class.

For example, each of the components can be classified into at least one and the same class. The word composition, i.e. the compound that consists of these components or contains these components, is then, for example, automatically classified into this class. In another aspect of the classification, a word composition contains at least two components that can be classified into different classes, wherein the at least two components cannot be classified into a common class. In this case, the classification of the composite that consists of these components or contains these components is not unambiguous. In this case, an automated majority decision can be made, according to which the compound is classified into the class into which most of its components are classified. Even if this is ruled out for lack of a majority, certain classes are excluded, namely those into which none of the words or word stems of the word composition was classified as a component.
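The majority decision and the exclusion of classes can be sketched as follows. The function name `classify_by_components` and the return convention (a decision, or `None` without a majority, together with the set of classes that remain possible) are assumptions for illustration, and a non-empty list of component classes is assumed.

```python
from collections import Counter

def classify_by_components(component_classes):
    """Return (decision, candidate_classes) for a compound.

    decision is the majority class of the components, or None if there is
    no majority. candidate_classes contains only classes that occur among
    the components, so all other classes are excluded.
    """
    counts = Counter(component_classes)
    ranked = counts.most_common()
    tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
    decision = None if tie else ranked[0][0]
    return decision, set(counts)

same_class = classify_by_components(["TERM", "TERM"])
majority = classify_by_components(["TERM", "SPECTERM", "TERM"])
no_majority = classify_by_components(["NONTERM", "TERM"])
```

In the last case no decision is made, but SIMTERM and SPECTERM are excluded because no component was classified into them.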

Therefore, using the components in the classification in addition to using the composites themselves offers a significant improvement in the classification. This is particularly important for composites that do not occur very often or whose composition was unknown in the training data set with which the model was trained. Even if individual components of a composite are unknown, a classification using the other components of the composite for previously unknown components can be learned in training with this training data set.

In order to create a training data set, the word vectors are trained on a general-language corpus in order to obtain data material that is as extensive as possible. A fine adjustment is made by training the word vectors on a corpus that is subject-specific for the domain.

For example, a text or a text collection with a known reference to this domain is used as the subject-specific corpus. In the example, the subject-specific corpus is a text collection of cooking recipes. This contains possible technical terms from the domain "cooking" as term candidates. To determine productivity or frequency, for example, only the subject-specific corpus is used.

Lexical compound definitions or examples are assigned to the term candidates. For example, a text is used as a lexical definition or example.

As a gold standard for training, the term candidates are assigned a user-based assessment of specificity and centrality by one or more annotators. In this case, a multidimensional scale is used for a term candidate in order to assign the term candidate to one of the classes NONTERM, SIMTERM, TERM or SPECTERM. Frequency and productivity, for example, are added to the training data set as vector v in addition to the word vector z. The annotators are instructed to classify a term candidate into the SPECTERM class if it is very specific to the particular domain, in the example "cooking", and has a high degree of proximity to the particular domain. The annotators are instructed to classify a term candidate into the SIMTERM class if it has a medium degree of proximity to the particular domain. The annotators are instructed to classify a term candidate into the TERM class if it has a high degree of proximity to the particular domain, in the example "cooking", but is otherwise technically unspecific. The annotators are instructed to classify other term candidates into the NONTERM class.

As an example of a classification of a term candidate from the subject-specific corpus, the compound "tomato puree" is considered. According to one definition, tomato puree is a paste made from tomatoes, which is mainly used in the kitchen to make sauces. The compound "tomato puree" and the definition are submitted to a plurality of annotators for classification. For example, some annotators classify the compound into the TERM class based on this definition. Other annotators classify the compound, for example, into the SPECTERM class. In the example, the training data set is supplemented by the entry "tomato puree" in the class into which the compound "tomato puree" was classified by all or a majority of the annotators. For example, a training data record contains an assignment of a feature that represents the entry "tomato puree" to one of the classes. For example, the word vector z and the vector v, which characterize the term candidate "tomato puree", are assigned as features to the vector s that characterizes the class SPECTERM.

The training data record comprises a large number of such assignments.
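One such assignment can be sketched as a small record in which the annotator votes are reduced to a class vector s. The one-hot encoding of s, the record layout and the function name `label_vector` are assumptions for illustration. The class order NONTERM, SIMTERM, TERM, SPECTERM follows the description. Ties between annotators are resolved here simply in favour of the class that was named first.

```python
from collections import Counter

CLASSES = ["NONTERM", "SIMTERM", "TERM", "SPECTERM"]

def label_vector(annotations):
    """Turn the annotator votes for one term candidate into a one-hot vector s."""
    winner, _ = Counter(annotations).most_common(1)[0]
    return [1.0 if c == winner else 0.0 for c in CLASSES]

# Hypothetical votes of three annotators for the compound "tomato puree".
votes = ["SPECTERM", "TERM", "SPECTERM"]
record = {"term": "tomato puree", "s": label_vector(votes)}
```

In a full training data record, the word vector z and the vector v of the term candidate would be stored alongside s.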

In training, a plurality of such assignments from the training data set is used to learn the weight matrices.

In the second embodiment, the weight matrices are learned depending on the features that represent the composites. The additional features that depend on the components and on the productivity and/or frequency of the components are also used.

In addition to the compound "tomato puree", features that characterize its relevant components "tomato" and "puree" are used in the training. For example, a correspondingly concatenated word vector z and a concatenated vector v, which characterizes productivity and frequency, are used. The layers of the artificial neural network 200 and the vectors and matrices for the calculation are, for example, combined and re-sorted accordingly.

When generating the training data set for the "cooking" domain, the relevant composites are manually classified by annotators in the example into the classes SIMTERM, TERM or SPECTERM, since they are designations with differently central and differently specific references to the topic of cooking. The class of a component is estimated on the basis of the composites from the training data set in which it occurs.

For example, the component "tomato" is likely to be estimated as belonging to the TERM class because the component "tomato" occurs very frequently in composites such as "tomato soup", "tomato salad", "tomato bake", "tomato puree", ... that are classified as TERM and less often in other composites. This classification need not always be the same as the composite classes assigned by the annotators. Nevertheless, this information from the auxiliary outputs $O_{aux1}$ and $O_{aux3}$ improves the result.

Starting from this training set, a model according to the second embodiment is trained as described below.

All weights from the weight matrices of the artificial neural network 200 are set to the same value, for example at the beginning of the training. Random values can also be used.

Training of the model with the training data set is described using the example of the compound "tomato puree".

In a first step, the compound is broken down into components. The components of the word "tomato puree" include the word stem "tomato" and the word "puree". The remaining component "n" is regarded in the example as a linking element and is not considered. That is, in the example only components that are longer than a minimum length of two letters are used. The resulting relevant components "tomato" and "puree" as well as the compound "tomato puree" form the starting data for the classification by the model.
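The length filter on the components can be sketched in a few lines. The list `parts` stands for the output of a splitter and is a hypothetical example, and the function name `relevant_components` is likewise chosen for illustration.

```python
def relevant_components(parts, min_length=2):
    """Keep only components that are longer than min_length letters."""
    return [p for p in parts if len(p) > min_length]

# Hypothetical splitter output for the compound "tomato puree",
# including the linking element "n".
parts = ["tomato", "n", "puree"]
components = relevant_components(parts)
```

The linking element "n" is dropped because it is not longer than two letters, so only "tomato" and "puree" remain.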

In the example, the artificial neural network 200 is used, the weights of which can be adjusted by at least one optimization function. The weights are adjusted depending on the optimization function and the training data set so that the compound "tomato puree" is assigned to the TERM class with a high probability. It can be provided that the further outputs for the components are also optimized, so that the component "tomato" is assigned to the class TERM with a high probability and the component "puree" is assigned to the class SPECTERM with a high probability. For this purpose, an extended training data set is used, which includes assignments of features that represent known components to corresponding classes. This means that the compound, more precisely the compound vector, is used to support the decision.

In general, as shown in FIG. 3, in a first step S1 the composites are searched for as term candidates in the subject-specific corpus 302 and divided into components. In a second step S2, the composites are assigned to at least one of the classes automatically or by annotators. In a third step S3, the features for the composites and the components are determined. That is, word vectors, productivity and frequency are determined on the basis of the subject-specific corpus 302. In a fourth step S4, the various models of the artificial neural network are trained with the features in order to predict, in a fifth step S5, the classes of the composites.

An analysis of a text containing the compound "tomato puree" by means of the model according to the second embodiment, which was trained with the corresponding training data set, comprises the following aspects.

The compound "tomato puree" is first broken down into its components.

The resulting relevant components "tomato" and "puree" are evaluated in terms of their productivity and frequency in the subject-specific corpus. The features are passed to the corresponding input layers of the model depending on the compound "tomato puree", its relevant components "tomato" and "puree", and the productivity and frequency. The compound "tomato puree" is assigned to one of the classes.

The composites and their components are optionally generated by a splitter, which extracts composites c as term candidates T from a subject-specific corpus and divides them into a number i of components $c_i$.

For example, the splitter works as described in one of the following references:

CharSplit: Character ngram-based Splitting of sparse compound nouns, Appendix A.3, Don Tuggener, 2016, Incremental Coreference Resolution for German, Thesis presented to the Faculty of Arts and Social Sciences of the University of Zurich.

CompoST: Fabienne Cap, 2014, Morphological Processing of Compounds for Statistical Machine Translation, submitted to the Institute for Machine Language Processing at the University of Stuttgart.

SCS: Marion Weller-Di Marco, 2017, Simple Compound Splitting for German, Proceedings of the 13th Workshop on Multiword Expressions, MWE@EACL 2017, pages 161-166, Valencia, Spain.

Composites from the subject-specific corpus in German are preferably first split using the CompoST procedure. Then the SCS procedure is applied, and finally the CharSplit procedure is applied. This enables particularly good results to be achieved. For other languages, corresponding other splitters are used in the same way.

If an analysis of the text is performed using the model according to the first embodiment, the procedure is as described for the second embodiment. The step of breaking down into components and the determination and use of productivity and frequency are omitted in this case. Instead, the model according to the first embodiment is used directly with the term candidates.

Both methods of text analysis are a significant improvement over conventional methods for classifying text.

Instead of the exclusive use of an artificial neural network, other machine learning approaches can also be used. For example, another deep learning approach or another classifier that can predict more than two classes can be used. Instead of a computer-implemented method based on an artificial neural network, another statistical method can also be used for the classification. In one aspect, a device for classifying text comprises the artificial neural network. The artificial neural network can be implemented as a device, for example as specific hardware, for example as an application-specific integrated circuit, ASIC, or as a field-programmable gate array, FPGA. The system can also include a processor as a universal integrated circuit that implements the artificial neural network or interacts with the specific hardware. In particular for a computer with a universal integrated circuit, the artificial neural network represents a computer-implemented data structure that significantly improves the internal functioning of the computer itself.

Claims


1. Computer-implemented method for training an artificial neural network (100; 200) with training data which comprise features (z; z, v) and identifiers (s), characterized in that the features (z; z, v) characterize term candidates (T; $c_2$) from a corpus (302), the corpus (302) comprising a text from a domain, wherein the identifier (s) characterizes a degree of belonging to at least three mutually different classes for the term candidates (T; $c_2$), different classes indicating different degrees of belonging of the term candidates (T; $c_2$) to the domain, the training data comprising an assignment of features (z; z, v) to identifiers (s), a feature (z; z, v) being specified to an input layer (102; 202a, ..., 202e) of the artificial neural network (100; 200) (S3), wherein the artificial neural network (100; 200) assigns an identifier (O) to the feature (z; z, v) from the input layer (102; 202a, ..., 202e) in a prediction in an output layer (208; 214) of the artificial neural network (100; 200) (S5), the identifier (O) from the output layer (208; 214) being compared in a comparison with the identifier (s) which is assigned to the feature (z; z, v) in the training data (S4), wherein, depending on the result of the comparison, at least one parameter ($W_1$, $W_2$; $W_1$, $W_2$, $W_3$, $W_4$, $W_5$, $W_6$, $W_7$, $W_8$, $W_9$) of the artificial neural network (100; 200) is learned (S4), which characterizes a connection of the artificial neural network (100; 200) between the input layer (102; 202a, ..., 202e) and the output layer (208; 214).

2. The method according to claim 1, characterized in that the term candidates (T, $c_2$) are taken from a corpus (302) that is subject-specific to the domain (S1).

3. The method according to claim 2, characterized in that the term candidates (T) are assigned to at least one of the classes (S2) and the features (z) for the term candidates (T) are determined (S3), in particular a word vector (z) being determined, and at least one parameter ($W_1$, $W_2$) of the artificial neural network (100) is trained with the features (z) (S4).

4. The method according to claim 1 or 2, characterized in that the term candidates ($c_2$) are composites ($c_2$) with at least two components ($c_1$, $c_3$).

5. The method according to claim 4, characterized in that composites ($c_2$) from a corpus (302) that is subject-specific to the domain are divided as term candidates ($c_2$) into components ($c_1$, $c_3$) (S1), the composites ($c_2$) are assigned to at least one of the classes (S2), the features (z, v) for the composites ($c_2$) and the components ($c_1$, $c_3$) are determined (S3), and at least one parameter ($W_1$, $W_2$, $W_3$, $W_4$, $W_5$, $W_6$, $W_7$, $W_8$, $W_9$) of the artificial neural network (200) is trained with the features (z, v) (S4).

6. The method according to claim 5, characterized in that at least one word vector is determined as a feature (z).

7. The method according to claim 5 or 6, characterized in that a productivity ($P(c_1)$, $P(c_3)$) and a frequency ($f(c_1)$, $f(c_3)$) of the components ($c_1$, $c_3$) are determined as features (v) on the basis of the specific corpus (302).

8. Computer-implemented method for generating training data for training an artificial neural network (100; 200), the training data comprising features (z; z, v) and identifiers (s), characterized in that features (z; z, v) are determined which characterize term candidates (T, $c_2$) from a corpus (302), the corpus (302) comprising a text from a domain, an identifier (s) being determined which characterizes a degree of belonging to at least three mutually different classes for the term candidates (T; $c_2$), different classes indicating different degrees of belonging of the term candidate to the domain, at least one feature (z; z, v) being assigned to at least one of the identifiers (s).

9. The method according to claim 8, characterized in that the

10. The method according to claim 9, characterized in that the term candidates (T) are assigned to at least one of the classes (S2) and the features (z) for the term candidates (T) are determined (S3), in particular a word vector (z) being determined.

11. The method according to claim 8 or 9, characterized in that the

12. The method according to claim 11, characterized in that composites ($c_2$) from a corpus (302) that is subject-specific to the domain are divided as term candidates ($c_2$) into components ($c_1$, $c_3$) (S1), the composites ($c_2$) are assigned to at least one of the classes (S2), and the features (z, v) for the composites ($c_2$) and the components ($c_1$, $c_3$) are determined (S3).

13. The method according to claim 12, characterized in that at least one word vector is determined as a feature (z).

14. The method according to claim 12 or 13, characterized in that a productivity ($P(c_1)$, $P(c_3)$) and a frequency ($f(c_1)$, $f(c_3)$) of the components ($c_1$, $c_3$) are determined as features (v) on the basis of the specific corpus (302).

15. Artificial neural network (100; 200), characterized in that a feature (z; z, v) can be specified to an input layer (102; 202a, ..., 202e) of the artificial neural network (100; 200), wherein the artificial neural network (100; 200) is designed to assign an identifier (O) to the feature (z; z, v) from the input layer (102; 202a, ..., 202e) in a prediction in an output layer (208, 214) of the artificial neural network (100; 200), the features (z; z, v) characterizing term candidates (T, $c_2$) from a corpus (302), the corpus (302) comprising a text from a domain, the identifier (O) characterizing at least three mutually different classes for the term candidates (T, $c_2$), different classes indicating different degrees of belonging of the term candidate (T, $c_2$) to the domain.

16. Artificial neural network (100; 200) according to claim 15, characterized in that the artificial neural network (200) comprises at least one first input layer (202b, 202c, 202d), to which a compound ($c_2$) and its components ($c_1$, $c_3$) can be specified for a first feature (z), the artificial neural network (200) comprising at least one second input layer (202a, 202e), to which a productivity ($P(c_1)$, $P(c_3)$) and a frequency ($f(c_1)$, $f(c_3)$) of the components ($c_1$, $c_3$) can be specified for a second feature (v), the output layer (214) being arranged downstream of the input layers and outputting the identifier (O) in the prediction depending on the first feature (z) and the second feature (v).

17. Artificial neural network (100; 200) according to claim 15 or 16, characterized in that the artificial neural network (200) comprises a further output layer (212b), which is designed to output a degree of assignment of a composite ($c_2$) to the at least three classes independently of the productivity ($P(c_1)$, $P(c_3)$) and the frequency ($f(c_1)$, $f(c_3)$) of its components ($c_1$, $c_3$).

18. Artificial neural network (100; 200) according to one of claims 15 to 17, characterized in that the artificial neural network (200) comprises a further output layer (212a, 212c) which is designed to output a degree of assignment of one of the components ($c_1$, $c_3$) to the at least three classes depending on the productivity ($P(c_1)$, $P(c_3)$) and the frequency ($f(c_1)$, $f(c_3)$) of this component ($c_1$, $c_3$).

19. A method for classifying term candidates (T, $c_2$), characterized in that a feature (z; z, v) is specified to an input layer (102; 202a, ..., 202e) of an artificial neural network (100; 200), an identifier (O) being assigned to the feature (z; z, v) from the input layer (102; 202a, ..., 202e) in a prediction in an output layer (208, 214) of the artificial neural network (100; 200), the features (z; z, v) characterizing term candidates (T, $c_2$) from a corpus (302), the corpus (302) comprising a text from a domain, the identifier (O) characterizing at least three mutually different classes for the term candidates (T, $c_2$), different classes indicating different degrees of belonging of the term candidate (T, $c_2$) to the domain.

20. The method according to claim 19, characterized in that a composite ($c_2$) and its components ($c_1$, $c_3$) are specified to a first input layer (202b, 202c, 202d) for a first feature (z), a productivity ($P(c_1)$, $P(c_3)$) and a frequency ($f(c_1)$, $f(c_3)$) of the components ($c_1$, $c_3$) being specified to a second input layer (202a, 202e) for a second feature (v), the output layer (214) being arranged downstream of the input layers, and the identifier (O) being output in the prediction depending on the first feature (z) and the second feature (v).