WO2019016119A1 - Method and system for automatic discovery of topics and trends over time - Google Patents


Info

Publication number
WO2019016119A1
WO2019016119A1 (PCT/EP2018/069210)
Authority
WO
WIPO (PCT)
Prior art keywords
text document
topic
tdc
hidden
text
Prior art date
Application number
PCT/EP2018/069210
Other languages
French (fr)
Inventor
Pankaj Gupta
Subburam Rajaram
Bernt Andrassy
Original Assignee
Siemens Aktiengesellschaft
Priority date
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft filed Critical Siemens Aktiengesellschaft
Priority to EP18746623.0A priority Critical patent/EP3635504A1/en
Priority to US16/632,022 priority patent/US11520817B2/en
Publication of WO2019016119A1 publication Critical patent/WO2019016119A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the invention relates to a method and system for automatic discovery of topics within temporal ordered text document collections.
  • Topic detection and tracking systems perform natural language processing to find topics over time in a sequence of text collections which exhibit temporal relationships.
  • Probabilistic topic models have been used in the past to extract semantic topics from text documents.
  • Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been investigated to examine the emergence of topics from a sequence of text documents.
  • These conventional methods compute a topic distribution on all text collections to detect and track different topics.
  • these conventional methods do not take into account the dependencies between different text collections over time in an evolutional process.
  • conventional methods do not take into account temporal latent topic dependencies between topic collections evolving over time and forming a temporal sequence of text document collections. Consequently, conventional methods cannot capture topic trends and/or keyword trends accurately.
  • the invention provides according to the first aspect of the present invention a method for performing automatically a discovery of topics within temporal ordered text document collections,
  • topic trends are derived from the calculated hidden topic vectors of the text document collections.
  • each text document collection comprises one or more text documents published or output at the same time and comprising an associated time stamp indicating the time of publication or output.
  • a sequence of text document collections is sorted according to the associated time stamps to assign time steps to the text document collections.
  • a hidden topic vector h_t representing topics of a text document collection at a time step t is calculated from the bag of words vector v_t of this time step t and from the hidden state vector u_(t-1) of the preceding time step t-1, which is computed from the hidden topic vector h_(t-1) at that preceding time step t-1 and its preceding hidden state vector u_(t-2) at time step t-2.
  • the calculated hidden topic vector h of a text document collection comprises a hidden topic probability vector indicating occurrence probabilities of different topics within the respective text document collection.
  • the generated bag of words vector of a text document collection indicates occurrence numbers of words within the respective text document collection.
  • a two-layered recurrent neural network (RNN)-replicated softmax model (RSM), i.e. an RNN-RSM model, is used to calculate the hidden topic vectors for the text document collections of a sequence of text document collections.
  • the two-layered RNN-RSM model comprises
  • an RSM layer including time-ordered hidden topic vectors and associated bag of words vectors for the text document collections and
  • an RNN hidden layer including the hidden state vectors.
  • the discovered topics are automatically evaluated to control a process.
  • the text document collection comprises text files including words of a natural language consisting of characters.
  • the text document collection comprises source code files written in a programming language.
  • the text document of a text document collection describes features of a technical system.
  • the invention further provides according to a second aspect a topic discovery system comprising the features of claim 13.
  • the invention provides according to the second aspect of the present invention a topic discovery system for automatic discovery of topics and trends within temporal ordered text document collections,
  • said topic discovery system comprising:
  • a processor adapted to generate a bag of words vector for each text document collection using a predefined dictionary and adapted to calculate for each text document collection iteratively a hidden topic vector representing topics of the respective text document collection on the basis of the generated bag of words vectors using a calculated hidden state vector memorizing a hidden state of all previous text document collections.
  • the processor is further adapted to derive automatically topic trends from the discovered topics of the text document collections.
  • the topic discovery system comprises an interface to output the discovered topics and/or topic trends.
  • Fig. 1 shows a block diagram for illustrating a possible exemplary embodiment of a topic discovery system according to an aspect of the present invention
  • Fig. 2 shows a flowchart of a possible exemplary embodiment of a method for performing automatically a discovery of topics according to a further aspect of the present invention
  • Fig. 3 illustrates a two-layered RNN-RSM model used for calculating the hidden topic vectors in a possible embodiment of the method and system according to the present invention.
  • a topic discovery system 1 can comprise in a possible embodiment several main components.
  • the topic discovery system 1 as shown in Fig. 1 provides an automatic discovery of topics and trends within temporal ordered text document collections.
  • the topic discovery system 1 comprises a repository or database where the timestamped or temporal ordered text document collections are stored.
  • the repository 2 comprises a local memory or database which stores timestamped or already temporal ordered or sorted text document collections.
  • the repository or database 2 can be a remote database connected to the topic discovery system 1.
  • the topic discovery system 1 comprises a processing unit or a processor 3 having access to the database or repository 2 where the timestamped or temporal ordered text document collections TDCs are stored.
  • the processor 3 has further access in the illustrated embodiment to a memory 4 storing a two-layered RNN-RSM model which can be used for calculating hidden topic vectors h by the processor 3.
  • the processor 3 is adapted first to generate a bag of words vector v for each text document collection TDC using a predefined dictionary.
  • the predefined dictionary DIC can be stored in a further memory 5 as shown in Fig. 1.
  • the processor 3 is further adapted to calculate for each text document collection TDC stored in the database 2 iteratively a hidden topic vector h representing topics of the respective text document collection TDC on the basis of the generated bag of words vectors v using a calculated hidden state vector u memorizing a hidden state of all previous text document collections TDCs.
  • the processor 3 is further adapted to derive automatically topic trends from the discovered topics of the text document collections TDCs.
  • the topic discovery system 1 further comprises an interface 6 for outputting the discovered topics and/or topic trends.
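The iterative computation performed by the processor can be sketched as a single loop over the sorted TDC sequence. This is a simplified illustration: the function and parameter names are placeholders, and the logistic/tanh update rules follow common RNN-RSM practice rather than wording from the patent itself.

```python
import numpy as np

def discover_topics(bow_vectors, W, W_vu, W_uu, W_uh, b_h, b_u):
    """For each bag of words vector v_t of the sorted TDC sequence,
    compute a hidden topic vector h_t while carrying a hidden state
    u_t that memorizes all previous text document collections.
    A simplified single-pass sketch, not the trained model."""
    u = np.zeros(W_uu.shape[0])          # initial hidden state
    topic_vectors = []
    for v in bow_vectors:
        b_h_t = b_h + W_uh @ u           # topic biases conditioned on history
        # p(h_j = 1 | v): hidden bias scaled by the word count D
        h = 1.0 / (1.0 + np.exp(-(v.sum() * b_h_t + v @ W)))
        u = np.tanh(b_u + W_uu @ u + W_vu @ v)   # update the memory state
        topic_vectors.append(h)
    return topic_vectors
```

Each returned vector contains per-topic occurrence probabilities in (0, 1), one per hidden topic unit.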
  • Each text document collection TDC stored in the repository 2 can comprise one or more text documents or text files. Each text document can comprise a plurality of words wherein each word can consist of characters. Each text document collection TDC comprises one or more text documents which have been published or output by a system at the same time. Accordingly, each text document collection TDC comprises in a preferred embodiment an associated time stamp TS. This time stamp TS can be generated in a possible embodiment automatically when the text documents of the text document collection TDC are published or output by the system. In a further possible embodiment, the time stamps TS are assigned by a user when the text documents of the text document collection TDC are published or output.
  • the time stamps TS indicate a time of publication or a time of outputting the text documents.
  • Each text document collection TDC can comprise an individual plurality of different kinds of text documents.
  • the text documents comprise text files including words of a natural language wherein each word consists of one or several characters.
  • the text document collection TDC can comprise also source code files written in a programming language.
  • Each text document of a text document collection TDC describes one or several features of a technical system in a possible implementation. This technical system can for instance be a machine or a technical assembly of one or several components.
  • These components can include hardware components and/or software components.
  • the text documents of a text document collection TDC are generated automatically by a text document generation unit.
  • the text documents are generated by a user using a keyboard or by a voice recognition system.
  • the text documents are timestamped and temporally ordered. All text documents having the same time stamp TS belong to the same text document collection TDC.
  • the different text document collections TDCs each comprising one or several text documents are sorted in a temporal order according to their time stamp TS.
  • the text document collections TDCs are ordered accord ⁇ ing to the time when they have been published or output by a technical system.
  • the text doc ⁇ ument collections TDCs are ordered according to the time when they have been generated or produced.
  • the text documents belonging to the same text document collection TDC comprise the same time of production and/or the same time of publication.
  • a text document collection TDC may comprise a number N of text documents having been output or published by a system within the same day, week or year.
  • each text document collection TDC may also comprise all text documents which have been output by a technical system within the same periodic time interval, e.g. within the same minute.
  • Time stamps TS are automatically generated and assigned to the different text documents generated or output at the same time or within the same time period of a predefined time grid.
  • text document collections TDCs may belong to a series of text documents published by the same medium such as a newspaper or a technical journal.
  • the different text documents can comprise different articles or files of a periodically published technical journal.
  • the text documents can also belong to documentation documents documenting a technical system or to a group of text documents generated by a machine for monitoring purposes. Another example of a series of text documents are reports generated by a user.
  • a doctor may document a healing process of a patient by generating associated text documents.
  • the doctor may dictate health reports which are converted automatically into associated text documents with corresponding time stamps.
  • Text documents of the same day or week may be sampled to form part of a text document collection TDC.
  • the text document collection TDC can comprise several text documents describing a patient's progress over time.
  • each text document collection TDC can comprise one or more text documents generated or published at the same time or within the same time period and comprising an associated time stamp TS indicating the respective time of generation or publication.
  • a sequence of text document collections TDCs can be sorted automatically according to the associated time stamps TS to assign time steps to the different text document collections TDCs.
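The sorting step above admits a compact sketch. The helper below is hypothetical; the record format of the timestamped documents is an assumption for illustration only.

```python
from collections import defaultdict

def build_tdc_sequence(documents):
    """Group timestamped text documents into text document
    collections (TDCs) and assign time steps by temporal order.

    `documents` is an iterable of (timestamp, text) pairs; all
    documents sharing a time stamp form one TDC.
    """
    collections_by_ts = defaultdict(list)
    for timestamp, text in documents:
        collections_by_ts[timestamp].append(text)
    # Sort collections by time stamp; the index in the sorted
    # sequence becomes the time step t of each TDC.
    return [(t, ts, docs)
            for t, (ts, docs) in enumerate(sorted(collections_by_ts.items()))]

docs = [("2018-03", "pump failure report"),
        ("2018-01", "routine inspection"),
        ("2018-03", "valve maintenance"),
        ("2018-02", "sensor calibration")]
for t, ts, tdc in build_tdc_sequence(docs):
    print(t, ts, len(tdc))
```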
  • the processor or processing unit 3 of the topic discovery system 1 as shown in Fig. 1 generates in a first processing step a bag of words vector v for each text document collection TDC by using a predefined dictionary DIC stored in the memory 5 as shown in Fig. 1.
  • the topic discovery system 1 uses a single dictionary, for instance a dictionary of common English words comprising several thousand different common words used in the English language. This dictionary may for instance comprise 65,000 common words of the English natural language.
  • the generated bag of words vector v of a text document collection TDC indicates occurrence numbers of different words within the respective text document collection TDC.
  • a word may be used a hundred times in the text document collection TDC while another word is not used at all.
  • the corresponding entry within the bag of words vector v indicates how often the respective word has occurred within the text document collection TDC.
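The bag of words construction described above can be illustrated with a small sketch. Tokenization by lowercasing and whitespace splitting is a simplifying assumption, and the four-word dictionary is a toy example.

```python
def bag_of_words_vector(tdc, dictionary):
    """Count occurrences of each dictionary word over all text
    documents of one text document collection (TDC)."""
    index = {word: i for i, word in enumerate(dictionary)}
    v = [0] * len(dictionary)
    for document in tdc:
        for token in document.lower().split():
            if token in index:
                v[index[token]] += 1
    return v

dictionary = ["pump", "valve", "sensor", "failure"]
tdc = ["Pump failure detected", "valve near pump replaced"]
print(bag_of_words_vector(tdc, dictionary))  # → [2, 1, 0, 1]
```

Words outside the dictionary are simply ignored, so the vector length always equals the dictionary size regardless of the collection's content.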
  • several different dictionaries DIC can be stored in the memory 5.
  • the dictionary DIC can be selected from a set of dictionaries stored in the memory 5 according to a topic domain and/or depending on the language of the text documents. For instance, if the text document collection TDC comprises only text documents in English, an English word dictionary DIC is selected.
  • if the text document collection comprises further natural languages, other dictionaries can be selected as well.
  • the text document collections TDCs can comprise natural language documents but also text documents written in predefined technical languages, in particular programming languages.
  • the type of the text documents forming part of the text document collections TDCs is indicated and a matching dictionary DIC is automatically selected.
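The automatic dictionary selection could, for example, look as follows. The registry, its keys and its contents are purely hypothetical placeholders; the patent does not prescribe a data structure.

```python
# hypothetical registry of stored dictionaries DIC, keyed by
# (language, document type) of the text document collection
DICTIONARIES = {
    ("en", "general"): ["pump", "valve", "sensor"],
    ("de", "general"): ["pumpe", "ventil", "sensor"],
    ("en", "source_code"): ["def", "class", "return"],
}

def select_dictionary(language, doc_type="general"):
    """Pick a matching dictionary DIC from the stored set,
    falling back to the general English dictionary."""
    return DICTIONARIES.get((language, doc_type),
                            DICTIONARIES[("en", "general")])
```

The fallback choice is one possible policy; a real system might instead reject collections with no matching dictionary.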
  • the processor 3 can generate automatically a bag of words vector v for each text document collection TDC using the selected dictionaries DIC.
  • the processing unit 3 can, on the basis of the generated bag of words vectors v, calculate for each text document collection TDC read from the memory 2 iteratively a hidden topic vector h representing topics of the respective text document collection TDC using a calculated hidden state vector u which memorizes a hidden state representing previous text document collections TDCs.
  • the processing unit 3 can in a further processing step derive automatically topic trends from the calculated hidden topic vectors of the text document collections TDCs.
  • topic trends are output by the topic discovery system 1 via the interface 6 for further processing and/or for performing an automatic control of a system.
  • the discovered topics can be further automatically processed or evaluated to generate control signals output via the interface 6 to control actuators of a technical system.
  • a text document of a text document collection TDC can also be input into the topic discovery system 1 as shown in Fig. 1 via the interface 6. The input text documents are stored in the database 2 according to their assigned time stamp.
  • the text document collection TDC is not read by the processing unit 3 from a memory 2 as shown in Fig. 1 but received via an interface in a text document collection data stream.
  • timestamped text document collections TDCs can be received in a temporal order or sequence via a data network connected to the topic discovery system 1.
  • the processing unit 3 is adapted to calculate a hidden topic vector h representing topics of the different text document collections TDCs using a model stored in the local memory 4 as shown in Fig. 1.
  • a hidden topic vector h_t representing topics of a text document collection TDC at a time step t is calculated from the bag of words vector v_t of this time step t and from the hidden state vector u_(t-1) of the preceding time step t-1.
  • the hidden state vector u_(t-1) of the preceding time step t-1 is computed in a preferred embodiment from the hidden topic vector h_(t-1) at said preceding time step t-1 and its preceding hidden state vector u_(t-2) at a time step t-2.
  • the calculated hidden topic vector h of a text document collection TDC comprises in a possible embodiment a hidden topic probability vector indicating an occurrence probability of the different topics within the respective text document collection TDC.
  • the method uses a two-layered recurrent neural network-replicated softmax model (RNN-RSM) to calculate the hidden topic vectors h for the text document collections TDCs of a sequence of text document collections TDCs.
  • This two-layered RNN-RSM model comprises in a preferred embodiment an RSM layer including the time-ordered hidden topic vectors h and associated bag of words vectors v for all text document collections TDCs and also an RNN hidden layer including the hidden state vectors u.
  • Such a two-layered RNN-RSM model is illustrated in Fig. 3.
  • the model shown in Fig. 3 can be stored in a local memory 4 of the topic discovery system 1 and may comprise a two-layered model structure.
  • the model shown in Fig. 3 comprises an RSM layer and an RNN layer.
  • the RSM layer includes the time-ordered hidden topic vectors h and associated bag of words vectors v.
  • the two-layered model further comprises an RNN hidden layer including the hidden state vectors u as shown in Fig. 3.
  • In the illustrated embodiment of Fig. 3, the model used by the processing unit 3 comprises an RNN-RSM model, i.e. a sequence of conditional RSMs such that at any time step t the RSM's bias parameters b_v and b_h depend on the output of a deterministic RNN with hidden layer u_(t-1) in the previous time step t-1.
  • the bag of words vectors v form the visible portions of the model.
  • the v units are multinomial while the h units are stochastic binary.
  • the RNN hidden units u are constrained to convey temporal information while the RSM hidden units h model conditional distributions.
  • h ∈ {0,1} is a binary stochastic hidden topic unit.
  • the conditional distributions p(v|h) and p(h|v) are defined for a visible unit v_k and a hidden unit h_j at time step t.
  • v_k^(t) is sampled D times with identical weights connecting to the binary hidden units, resulting in multinomial visible units; hence the name Replicated Softmax.
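Under the standard Replicated Softmax formulation by Salakhutdinov and Hinton, which the RSM layer presumably follows, these conditionals can be sketched as follows. This is an assumption about the model, not text from the patent.

```python
import numpy as np

def hidden_given_visible(v_counts, W, b_h):
    """p(h_j = 1 | v) for each hidden topic unit h_j of an RSM.

    v_counts : (K,) bag of words counts over the dictionary
    W        : (K, F) visible-to-hidden weights
    b_h      : (F,) hidden biases
    """
    D = v_counts.sum()  # number of words in the collection
    # the hidden bias is scaled by D because the softmax visible
    # unit is replicated D times
    return 1.0 / (1.0 + np.exp(-(D * b_h + v_counts @ W)))

def visible_given_hidden(h, W, b_v):
    """p(v = k | h): softmax over the dictionary for one
    replicated visible unit."""
    scores = b_v + W @ h
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()
```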
  • the biases b of the RSM depend on the output of the RNN at previous time steps, which allows the estimated gradient at each RSM (with respect to the biases) to be propagated backward through time.
  • W_uv and W_uh are the weight parameters between the hidden unit u and the input v, and between u and h, respectively.
  • W_uu is the weight parameter between the RNN hidden units.
  • W_vu is the weight parameter between the visible unit v and the hidden unit u.
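With these weight parameters, one time step of the deterministic RNN layer and the bias conditioning described above can be sketched as follows. The tanh update and the additive bias terms follow common RNN-RSM/RNN-RBM practice and are an assumption about the patent's model.

```python
import numpy as np

def rnn_state_and_biases(v_t, u_prev, params):
    """One step of the deterministic RNN layer: the hidden state
    u_t memorizes all previous TDCs, and the RSM biases at step t
    depend on the state u_(t-1) of the previous step."""
    W_uu, W_vu, W_uv, W_uh = (params["W_uu"], params["W_vu"],
                              params["W_uv"], params["W_uh"])
    b_u, b_v, b_h = params["b_u"], params["b_v"], params["b_h"]
    # time-dependent RSM biases, conditioned on the previous state
    b_v_t = b_v + W_uv @ u_prev
    b_h_t = b_h + W_uh @ u_prev
    # new hidden state from the current bag of words and previous state
    u_t = np.tanh(b_u + W_uu @ u_prev + W_vu @ v_t)
    return u_t, b_v_t, b_h_t
```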
  • RSM is an energy-based model, where the energy of a state (v, h) determines the joint probability of the visible and hidden units.
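The energy function itself is not reproduced in this text. In the standard Replicated Softmax formulation (Salakhutdinov and Hinton), on which the RSM layer here is presumably based, the energy of a state (v, h) would read (stated as an assumption, not quoted from the patent):

```latex
E(\mathbf{v}, \mathbf{h}) =
  -\sum_{j=1}^{F}\sum_{k=1}^{K} W_{jk}\, h_j\, \hat{v}_k
  \;-\; \sum_{k=1}^{K} b^{v}_{k}\, \hat{v}_k
  \;-\; D \sum_{j=1}^{F} b^{h}_{j}\, h_j
```

where \hat{v}_k is the count of dictionary word k in the collection (the bag of words entry), D = \sum_k \hat{v}_k is the total number of words, F is the number of hidden topic units and K is the dictionary size; the hidden bias term is scaled by D because the softmax visible unit is replicated D times.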
  • the RNN-RSM model can be trained with backpropagation through time (BPTT).
  • the used RNN-RSM model captures longer spans for keywords or trends compared to conventional models.
  • the RNN-RSM model can be used for temporal topic modeling and/or for trend analysis.
  • the method and system 1 according to the present invention can be used for a variety of use cases.
  • the method and system 1 for providing automatic discovery of topics can be used for any kind of temporal ordered text document collections TDCs published or generated over time.
  • the method and system 1 can be used for detecting topics and for tracking different topics over time. Furthermore, the detected topic can trigger a control or monitoring routine for a technical system.
  • the detected topic or trend can trigger a process such as a repair or maintenance process for a machine.
  • the discovered topics can be evaluated or processed to calculate automatically trends within the text document collections TDC to make predictions for the future development within a technical domain or for an investigated technical system.
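One simple way to turn the sequence of hidden topic probability vectors into trends, sketched here as an illustrative choice rather than the patent's prescribed method, is to fit a least-squares line per topic and read off its slope:

```python
import numpy as np

def topic_trends(hidden_topic_vectors):
    """Derive per-topic trends from the sequence of hidden topic
    probability vectors h_1..h_T: the slope of a least-squares
    line fitted over the time steps."""
    H = np.asarray(hidden_topic_vectors)   # shape (T, num_topics)
    t = np.arange(H.shape[0])
    # np.polyfit handles 2-D y column-wise; row 0 holds the slopes
    return np.polyfit(t, H, deg=1)[0]

H = [[0.1, 0.8], [0.2, 0.7], [0.3, 0.6], [0.4, 0.5]]
slopes = topic_trends(H)
print(slopes)  # topic 0 rising, topic 1 declining
```

A positive slope marks an emerging topic, a negative slope a declining one; such values could feed the trend output or trigger the control processes mentioned above.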
  • the text documents sorted in the ordered sequence of text document collections TDCs can originate from the same or different text-generating sources.
  • the text documents may be produced by different users and/or generated by different technical hardware or software components of a technical system.
  • the method and system 1 according to the present invention is able to capture longer trends of discovered topics over time, allowing more accurate predictions to be made and more suitable measures or processes to be initiated depending on the discovered topics.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A method and system for performing automatically a discovery of topics within temporal ordered text document collections (TDCs), the method comprising the steps of generating (S1) a bag of words vector, v, for each text document collection (TDC) using a predefined dictionary (DIC); and calculating (S2) on the basis of the generated bag of words vectors, v, for each text document collection (TDC) iteratively a hidden topic vector, h, representing topics of the respective text document collection (TDC) using a calculated hidden state vector, u, memorizing a hidden state of all previous text document collections (TDCs).

Description

Method and system for automatic discovery of topics and trends over time
The invention relates to a method and system for automatic discovery of topics within temporal ordered text document collections. Topic detection and tracking systems perform natural language processing to find topics over time in a sequence of text collections which exhibit temporal relationships. Probabilistic topic models have been used in the past to extract semantic topics from text documents. Probabilistic topic models such as Latent Dirichlet Allocation (LDA) have been investigated to examine the emergence of topics from a sequence of text documents. These conventional methods compute a topic distribution on all text collections to detect and track different topics. However, these conventional methods do not take into account the dependencies between different text collections over time in an evolutional process. Accordingly, conventional methods do not take into account temporal latent topic dependencies between topic collections evolving over time and forming a temporal sequence of text document collections. Consequently, conventional methods cannot capture topic trends and/or keyword trends accurately.
It is an object of the present invention to provide a method and system for providing precise topic detection and tracking within temporal ordered text document collections.
This object is achieved according to a first aspect of the present invention by a method comprising the features of claim 1.
The invention provides according to the first aspect of the present invention a method for performing automatically a discovery of topics within temporal ordered text document collections,
the method comprising the steps of:
generating a bag of words vector for each text document collection using a predefined dictionary;
calculating on the basis of the generated bag of words vectors for each text document collection iteratively a hidden topic vector representing topics of the respective text document collection using a calculated hidden state vector memorizing a hidden state of all previous text document collections.
In a possible embodiment of the method according to the first aspect of the present invention, topic trends are derived from the calculated hidden topic vectors of the text document collections.
In a further possible embodiment of the method according to the first aspect of the present invention, each text document collection comprises one or more text documents published or output at the same time and comprising an associated time stamp indicating the time of publication or output.
In a further possible embodiment of the method according to the first aspect of the present invention, a sequence of text document collections is sorted according to the associated time stamps to assign time steps to the text document collections. In a further possible embodiment of the method according to the first aspect of the present invention, a hidden topic vector h_t representing topics of a text document collection at a time step t is calculated from the bag of words vector v_t of this time step t and from the hidden state vector u_(t-1) of the preceding time step t-1, which is computed from the hidden topic vector h_(t-1) at that preceding time step t-1 and its preceding hidden state vector u_(t-2) at time step t-2. In a further possible embodiment of the method according to the first aspect of the present invention, the calculated hidden topic vector h of a text document collection comprises a hidden topic probability vector indicating occurrence probabilities of different topics within the respective text document collection.
In a further possible embodiment of the method according to the first aspect of the present invention, the generated bag of words vector of a text document collection indicates occurrence numbers of words within the respective text document collection.
In a still further possible embodiment of the method according to the first aspect of the present invention, a two-layered recurrent neural network (RNN)-replicated softmax model (RSM) is used to calculate the hidden topic vectors for the text document collections of a sequence of text document collections.
In a still further possible embodiment of the method according to the first aspect of the present invention, the two-layered RNN-RSM model comprises
an RSM layer including time-ordered hidden topic vectors and associated bag of words vectors for the text document collections and
an RNN hidden layer including the hidden state vectors.
In a further possible embodiment of the method according to the first aspect of the present invention, the discovered topics are automatically evaluated to control a process.
In a further possible embodiment of the method according to the first aspect of the present invention, the text document collection comprises text files including words of a natural language consisting of characters. In a still further possible embodiment of the method according to the first aspect of the present invention, the text document collection comprises source code files written in a programming language.
In a still further possible embodiment of the method accord¬ ing to the first aspect of the present invention, the text document of a text document collection describes features of a technical system.
The invention further provides according to a second aspect a topic discovery system comprising the features of claim 13.
The invention provides according to the second aspect of the present invention a topic discovery system for automatic discovery of topics and trends within temporal ordered text document collections,
said topic discovery system comprising:
a repository which stores temporal ordered text document collections and
a processor adapted to generate a bag of words vector for each text document collection using a predefined dictionary and adapted to calculate for each text document collection iteratively a hidden topic vector representing topics of the respective text document collection on the basis of the generated bag of words vectors using a calculated hidden state vector memorizing a hidden state of all previous text document collections. In a possible embodiment of the topic discovery system according to the second aspect of the present invention, the processor is further adapted to derive automatically topic trends from the discovered topics of the text document collections.
In a further possible embodiment of the topic discovery system according to the second aspect of the present invention, the topic discovery system comprises an interface to output the discovered topics and/or topic trends.
In the following, possible embodiments of the different aspects of the present invention are described in more detail with reference to the enclosed figures.
Fig. 1 shows a block diagram for illustrating a possible exemplary embodiment of a topic discovery system according to an aspect of the present invention;
Fig. 2 shows a flowchart of a possible exemplary embodiment of a method for performing automatically a discovery of topics according to a further aspect of the present invention;
Fig. 3 illustrates a two-layered RNN-RSM model used for calculating the hidden topic vectors in a possible embodiment of the method and system according to the present invention.
As can be seen in the block diagram of Fig. 1, a topic discovery system 1 according to an aspect of the present invention can comprise in a possible embodiment several main components. The topic discovery system 1 as shown in Fig. 1 provides an automatic discovery of topics and trends within temporal ordered text document collections. In the illustrated embodiment of Fig. 1, the topic discovery system 1 comprises a repository or database where the timestamped or temporal ordered text document collections are stored. In the illustrated embodiment of Fig. 1, the repository 2 comprises a local memory or database which stores timestamped or already temporal ordered or sorted text document collections. In an alternative embodiment, the repository or database 2 can be a remote database connected to the topic discovery system 1. In the shown embodiment of Fig. 1, the topic discovery system 1 comprises a processing unit or a processor 3 having access to the database or repository 2 where the timestamped or temporal ordered text document collections TDCs are stored. The processor 3 has further access in the illustrated embodiment to a memory 4 storing a two-layered RNN-RSM model which can be used for calculating hidden topic vectors h by the processor 3. The processor 3 is adapted first to generate a bag of words vector v for each text document collection TDC using a predefined dictionary. In a possible implementation, the predefined dictionary DIC can be stored in a further memory 5 as shown in Fig. 1. The processor 3 is further adapted to calculate for each text document collection TDC stored in the database 2 iteratively a hidden topic vector h representing topics of the respective text document collection TDC on the basis of the generated bag of words vectors v using a calculated hidden state vector u memorizing a hidden state of all previous text document collections TDCs.
In a possible embodiment, the processor 3 is further adapted to derive automatically topic trends from the discovered topics of the text document collections TDCs. In the illustrated embodiment of Fig. 1, the topic discovery system 1 further comprises an interface 6 for outputting the discovered topics and/or topic trends.
Each text document collection TDC stored in the repository 2 can comprise one or more text documents or text files. Each text document can comprise a plurality of words wherein each word can consist of characters. Each text document collection TDC comprises one or more text documents which have been published or output by a system at the same time. Accordingly, each text document collection TDC comprises in a preferred embodiment an associated time stamp TS. This time stamp TS can be generated in a possible embodiment automatically when the text documents of the text document collection TDC are published or output by the system. In a further possible embodiment, the time stamps TS are assigned by a user when the text documents of the text document collection TDC are published or output. The time stamps TS indicate a time of publication or a time of outputting the text documents. Each text document collection TDC can comprise an individual plurality of different kinds of text documents. In a possible embodiment, the text documents comprise text files including words of a natural language wherein each word consists of one or several characters. In a possible implementation, the text document collection TDC can also comprise source code files written in a programming language. Each text document of a text document collection TDC describes one or several features of a technical system in a possible implementation. This technical system can for instance be a machine or a technical assembly of one or several components. These components can include hardware components and/or software components.
In a possible embodiment, the text documents of a text document collection TDC are generated automatically by a text document generation unit. In an alternative embodiment, the text documents are generated by a user using a keyboard or by a voice recognition system. In a possible embodiment, the text documents are timestamped and temporally ordered. All text documents having the same time stamp TS belong to the same text document collection TDC. In a possible embodiment, the different text document collections TDCs each comprising one or several text documents are sorted in a temporal order according to their time stamp TS. In a possible implementation, the text document collections TDCs are ordered according to the time when they have been published or output by a technical system. In an alternative embodiment, the text document collections TDCs are ordered according to the time when they have been generated or produced. The text documents belonging to the same text document collection TDC comprise the same time of production and/or the same time of publication. For example, a text document collection TDC may comprise a number N of text documents having been output or published by a system within the same day, week or year. Further, each text document collection TDC may also comprise all text documents which have been output by a technical system within the same periodic time interval, e.g. within the same minute. Time stamps TS are automatically generated and assigned to the different text documents generated or output at the same time or within the same time period of a predefined time grid. Different text documents having the same time stamps TS are collected within a text document collection TDC. Further, the different text document collections TDCs each having an associated time stamp TS can be stored in the database 2 and then be sorted or ordered according to the corresponding time stamps.
In a possible implementation, text document collections TDCs may belong to a series of text documents published by the same medium such as a newspaper or a technical journal. For instance, the different text documents can comprise different articles or files of a periodically published technical journal. However, the text documents can also belong to documentation documents documenting a technical system or to a group of text documents generated by a machine for monitoring purposes. Another example for a series of text documents are reports generated by a user. For example, a doctor may document a healing process of a patient by generating associated text documents. For instance, the doctor may dictate health reports which are converted automatically into associated text documents with corresponding time stamps. Text documents of the same day or week may be sampled to form part of a text document collection TDC. Accordingly, the text document collection TDC can comprise several text documents describing a patient's progress over time.
Accordingly, each text document collection TDC can comprise one or more text documents generated or published at the same time or within the same time period and comprising an associated time stamp TS indicating the respective time of generation or publication. A sequence of text document collections TDCs can be sorted automatically according to the associated time stamps TS to assign time steps to the different text document collections TDCs.
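The grouping and temporal sorting described above can be sketched as follows; the function name and the (timestamp, text) input format are illustrative assumptions, not part of the claimed system:

```python
from collections import defaultdict

def build_collections(documents):
    """Group timestamped documents into text document collections (TDCs).

    `documents` is an iterable of (timestamp, text) pairs; all documents
    sharing a time stamp form one collection, and the collections are
    returned sorted in temporal order so that each collection receives a
    time step.
    """
    groups = defaultdict(list)
    for timestamp, text in documents:
        groups[timestamp].append(text)
    return [(ts, groups[ts]) for ts in sorted(groups)]

docs = [(2001, "doc a"), (2000, "doc b"), (2001, "doc c")]
tdcs = build_collections(docs)
# tdcs[0] is the year-2000 collection, tdcs[1] the year-2001 collection
```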
The processor or processing unit 3 of the topic discovery system 1 as shown in Fig. 1 generates in a first processing step a bag of words vector v for each text document collection TDC by using a predefined dictionary DIC stored in the memory 5 as shown in Fig. 1. In a possible embodiment, the topic discovery system 1 uses a single dictionary, for instance a dictionary of common English words comprising several thousand different common words used in the English language. This dictionary may for instance comprise 65,000 common words used in the natural language English. The generated bag of words vector v of a text document collection TDC indicates occurrence numbers of different words within the respective text document collection TDC. For instance, one word may be used a hundred times in the text document collection TDC while another word has not been used at all. The corresponding entry within the bag of words vector v indicates how often the respective word has occurred within the text document collection TDC. In a further possible embodiment, several different dictionaries DIC can be stored in the memory 5. In a possible implementation, the dictionary DIC can be selected from a set of dictionaries stored in the memory 5 according to a topic domain and/or depending on the language of the text documents. For instance, if the text document collection TDC comprises only text documents in English, an English word dictionary DIC is selected. If the text document collection comprises further natural languages, other dictionaries can be selected as well. Further, the text document collections TDCs can comprise natural language documents but also text documents written in predefined technical languages, in particular programming languages. In a possible implementation, the type of the text documents forming part of the text document collections TDCs is indicated and a matching dictionary DIC is automatically selected.
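A bag of words vector v as described above can be computed, for instance, with a simple counting sketch; the whitespace tokenization and lower-casing used here are simplifying assumptions:

```python
from collections import Counter

def bag_of_words(tdc_texts, dictionary):
    """Count occurrences of each dictionary word over all documents of a TDC.

    `dictionary` is a list of K words; the returned vector has one entry per
    dictionary word (the occurrence number within the whole collection).
    Out-of-dictionary words are ignored.
    """
    index = {word: k for k, word in enumerate(dictionary)}
    v = [0] * len(dictionary)
    counts = Counter(word for text in tdc_texts
                     for word in text.lower().split())
    for word, n in counts.items():
        if word in index:
            v[index[word]] = n
    return v

dic = ["topic", "trend", "model"]
v = bag_of_words(["topic model", "topic trend over time"], dic)
# v == [2, 1, 1]
```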
After one or several dictionaries have been selected, the processor 3 can generate automatically a bag of words vector v for each text document collection TDC using the selected dictionaries DIC. In a further processing step, the processing unit 3 can, on the basis of the generated bag of words vectors v, calculate for each text document collection TDC read from the memory 2 iteratively a hidden topic vector h representing topics of the respective text document collection TDC using a calculated hidden state vector u which memorizes a hidden state representing previous text document collections TDCs.
After having calculated the hidden topic vector for each text document collection TDC, the processing unit 3 can in a further processing step derive automatically topic trends from the calculated hidden topic vectors of the text document collections TDCs. In a possible embodiment, topic trends are output by the topic discovery system 1 via the interface 6 for further processing and/or for performing an automatic control of a system. In a possible embodiment, the discovered topics can be further automatically processed or evaluated to generate control signals output via the interface 6 to control actuators of a technical system. In a further possible embodiment, a text document of a text document collection TDC can also be input into the topic discovery system 1 as shown in Fig. 1 via the interface 6. The input text documents are stored in the database 2 according to their assigned time stamp. In a further alternative embodiment, the text document collection TDC is not read by the processing unit 3 from a memory 2 as shown in Fig. 1 but received via an interface in a text document collection data stream. In this alternative embodiment, timestamped text document collections TDCs can be received in a temporal order or sequence via a data network connected to the topic discovery system 1.
The processing unit 3 is adapted to calculate a hidden topic vector h representing topics of the different text document collections TDCs using a model stored in the local memory 4 as shown in Fig. 1. In a possible embodiment, a hidden topic vector h_t representing topics of a text document collection TDC at a time step t is calculated from the bag of words vector v_t of this time step t and from the hidden state vector u_(t-1) of the preceding time step t-1. The hidden state vector u_(t-1) of the preceding time step t-1 is computed in a preferred embodiment from the hidden topic vector h_(t-1) at said preceding time step t-1 and its preceding hidden state vector u_(t-2) at a time step t-2. The calculated hidden topic vector h of a text document collection TDC comprises in a possible embodiment a hidden topic probability vector indicating an occurrence probability of the different topics within the respective text document collection TDC.
In a possible embodiment, the method uses a two-layered recurrent neural network-replicated softmax model, RNN-RSM, to calculate the hidden topic vectors h for the text document collections TDCs of a sequence of text document collections TDCs. This two-layered RNN-RSM model comprises in a preferred embodiment an RSM layer including time-ordered hidden state vectors and associated bag of words vectors v for all text document collections TDCs and also an RNN hidden layer including the hidden state vector u.
Such a two-layered RNN-RSM model is illustrated in Fig. 3. The model shown in Fig. 3 can be stored in the local memory 4 of the topic discovery system 1 and may comprise a two-layered model structure. The model shown in Fig. 3 comprises an RSM layer and an RNN layer. The RSM layer includes the time-ordered hidden state vectors h and the associated bag of words vectors v. The two-layered model further comprises an RNN hidden layer including the hidden state vectors u as shown in Fig. 3. In the illustrated embodiment of Fig. 3, the model used by the processing unit 3 comprises an RNN-RSM model, i.e. a sequence of conditional RSMs such that at any time step t the RSM's bias parameters b_v and b_h depend on the output of a deterministic RNN with hidden layer u_(t-1) in the previous time step t-1. The bag of words vectors v form the visible portions. The visible units v are multinomial while the hidden units h are stochastic binary. The RNN hidden units u are constrained to convey temporal information while the RSM hidden units h model conditional distributions. Consequently, the parameters b_v, b_h and W_vh are time-dependent on the sequence history at time t (via a series of conditional RSMs), noted by Θ^(t) = {v^(τ), u^(τ) | τ < t}, that captures temporal dependencies. The RNN-RSM model can be defined by its joint probability distribution:

P(V, H) = ∏_{t=1}^{T} P(v^(t), h^(t) | u^(t-1))    (1)

where V = [v^(1), ..., v^(T)] and H = [h^(1), ..., h^(T)], wherein h_j^(t) ∈ {0, 1} is a binary stochastic hidden topic unit and v^(t) ∈ {1, ..., K}^D is the discrete visible unit, wherein K is the dictionary size and D is the number of documents published at the same time. The conditional distributions in each RSM at time step t can be given by softmax and logistic functions as follows:

P(v_k^{i,(t)} = 1 | h^(t)) = exp(b_{v,k}^(t) + Σ_{j=1}^{F} h_j^(t) W_{jk}) / Σ_{k'=1}^{K} exp(b_{v,k'}^(t) + Σ_{j=1}^{F} h_j^(t) W_{jk'})    (2)

P(h_j^(t) = 1 | v^(t)) = σ(D b_{h,j}^(t) + Σ_{i=1}^{D} Σ_{k=1}^{K} v_k^{i,(t)} W_{jk})    (3)

which define the conditional distributions for a visible unit v_k^{i,(t)} and a hidden unit h_j^(t) at time step t. v_k^(t) is sampled D times with identical weights connecting to the binary hidden units, resulting in multinomial visibles; hence the name Replicated Softmax.
While the biases b of the RSM depend on the output of the RNN at previous time steps, which allows propagating the estimated gradient at each RSM (with respect to the biases) backward through time (BPTT), the RSM biases at each time step t are given by:

b_v^(t) = b_v + W_uv u^(t-1);  b_h^(t) = b_h + W_uh u^(t-1)    (4)

where W_uv and W_uh are the weight parameters between the hidden unit u and the input v, and between u and h, respectively. The RNN hidden state u^(t) can be computed by:

u^(t) = tanh(b_u + W_uu u^(t-1) + W_vu v^(t))    (5)

where b_u is the bias of u, W_uu is the weight parameter between the RNN hidden units, and W_vu is the weight parameter between the visible unit v and the hidden unit u.
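Equations (4) and (5) can be sketched in NumPy as follows; the dimensions, the random initialisation and the use of raw word counts for the visible input are illustrative assumptions rather than values prescribed by the method:

```python
import numpy as np

rng = np.random.default_rng(0)
K, F, U = 6, 4, 3          # dictionary size, RSM hidden units, RNN hidden units

# illustrative parameters (randomly initialised here)
b_v, b_h, b_u = np.zeros(K), np.zeros(F), np.zeros(U)
W_uv = rng.normal(0, 0.1, (K, U))   # RNN state -> visible bias
W_uh = rng.normal(0, 0.1, (F, U))   # RNN state -> hidden bias
W_uu = rng.normal(0, 0.1, (U, U))   # RNN state -> RNN state
W_vu = rng.normal(0, 0.1, (U, K))   # word counts -> RNN state

def rsm_biases(u_prev):
    """Equation (4): time-dependent RSM biases from the previous RNN state."""
    return b_v + W_uv @ u_prev, b_h + W_uh @ u_prev

def rnn_state(u_prev, v_counts):
    """Equation (5): deterministic RNN hidden state update."""
    return np.tanh(b_u + W_uu @ u_prev + W_vu @ v_counts)

u = np.zeros(U)                      # u^(0)
for v_counts in [np.array([2., 1, 1, 0, 0, 0]), np.array([0., 0, 1, 3, 1, 0])]:
    bv_t, bh_t = rsm_biases(u)       # conditions the RSM at step t
    u = rnn_state(u, v_counts)       # carries the history on to step t+1
```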
The RSM is an energy-based model where the energy of the state {v^(t), h^(t)} at time step t is given by:

E(v^(t), h^(t)) = − Σ_{j=1}^{F} Σ_{k=1}^{K} W_{jk} h_j^(t) v̂_k^(t) − Σ_{k=1}^{K} v̂_k^(t) b_{v,k}^(t) − D Σ_{j=1}^{F} b_{h,j}^(t) h_j^(t)    (6)

where v̂_k^(t) = Σ_{i=1}^{D} v_k^{i,(t)} denotes the count for the k-th word. The energy and the probability are related by:

P(v^(t)) = (1 / Z^(t)) Σ_{h^(t)} exp(−E(v^(t), h^(t)))    (7)

where Z^(t) = Σ_{v^(t)} Σ_{h^(t)} exp(−E(v^(t), h^(t))) is the normalization constant.
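The energy-probability relation above can be checked numerically for a tiny model by enumerating all hidden configurations and comparing against the closed-form free energy of equation (9) below; the small dimensions and random parameters are illustrative assumptions:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
K, F, D = 5, 3, 4                      # dictionary size, hidden units, documents
W = rng.normal(0, 0.1, (F, K))         # W_vh weights
b_v = rng.normal(0, 0.1, K)            # visible biases b_v^(t)
b_h = rng.normal(0, 0.1, F)            # hidden biases b_h^(t)
v_hat = np.array([3., 0, 1, 2, 1])     # word counts of one collection

def energy(v_hat, h):
    """Equation (6): energy of the joint state {v, h}."""
    return -(h @ W @ v_hat) - v_hat @ b_v - D * (b_h @ h)

def free_energy(v_hat):
    """Equation (9): free energy with the hidden units summed out."""
    return -(v_hat @ b_v) - np.sum(np.log1p(np.exp(D * b_h + W @ v_hat)))

# consistency check: exp(-F(v)) equals the sum of exp(-E(v, h)) over all h
total = sum(np.exp(-energy(v_hat, np.array(h)))
            for h in itertools.product([0, 1], repeat=F))
assert np.isclose(np.exp(-free_energy(v_hat)), total)
```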
In a further possible embodiment, the RNN-RSM model can be trained, for instance with BPTT.
The cost function in the RNN-RSM model is given by C = Σ_t C^(t) with C^(t) ≡ −log P(v^(t)). Training with respect to the nine parameters (W_uv, W_uh, W_vh, W_vu, W_uu, b_v, b_h, b_u, u^(0)) can follow in a possible embodiment the following steps at each iteration:

1. Propagate the deterministic hidden units in the RNN portion using equation (5). Compute the RSM parameters b_v^(t) and b_h^(t), dependent on the RNN hidden units, using equation (4).

2. Reconstruct the visibles, i.e. draw negative samples v^(t)*, using k-step Gibbs sampling.

3. Estimate the gradient of the cost C using the negative samples as:

∂C / ∂Θ^(t) = ∂F(v^(t)) / ∂Θ^(t) − ∂F(v^(t)*) / ∂Θ^(t)    (8)

where the free energy F(v^(t)) is related to the normalized probability of v^(t) by P(v^(t)) ≡ exp(−F(v^(t))) / Z^(t) and is given by:

F(v^(t)) = − Σ_{k=1}^{K} v̂_k^(t) b_{v,k}^(t) − Σ_{j=1}^{F} log(1 + exp(D b_{h,j}^(t) + Σ_{k=1}^{K} v̂_k^(t) W_{jk}))    (9)

4. The gradient with respect to the RSM parameters is approximated by Contrastive Divergence (CD) using equations (8) and (9):

∂C / ∂W_vh ≈ Σ_t ( σ(D b_h^(t) + W_vh v̂^(t)*) v̂^(t)*ᵀ − σ(D b_h^(t) + W_vh v̂^(t)) v̂^(t)ᵀ )
∂C / ∂b_v^(t) ≈ v̂^(t)* − v̂^(t)
∂C / ∂b_h^(t) ≈ D ( σ(D b_h^(t) + W_vh v̂^(t)*) − σ(D b_h^(t) + W_vh v̂^(t)) )    (10)

5. Back-propagate the estimated gradients with respect to the RSM biases via the hidden-to-bias parameters to compute the gradients with respect to the deterministic RNN connections (W_uv, W_uu, W_vu, u^(0)) and biases (b_v, b_h).
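Steps 2 to 4 above, i.e. the k-step Gibbs reconstruction and the CD approximation of the gradients for a single RSM, can be sketched as follows; using the total word count as the number of softmax replicas, the learning rate and the parameter shapes are simplifying assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
K, F = 5, 3                             # dictionary size, RSM hidden units
W = rng.normal(0, 0.1, (F, K))          # W_vh weights
b_v, b_h = np.zeros(K), np.zeros(F)     # time-dependent biases of one step

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cd_step(v_hat, D, k=1, lr=0.01):
    """One contrastive-divergence update for a single RSM (steps 2-4).

    `v_hat` holds the word counts of one TDC; `D` scales the hidden bias.
    Returns the negative sample v_hat*.
    """
    global W, b_v, b_h
    n_words = int(v_hat.sum())
    v_neg = v_hat.copy()
    for _ in range(k):                                   # k-step Gibbs chain
        h = (rng.random(F) < sigmoid(D * b_h + W @ v_neg)).astype(float)
        v_neg = rng.multinomial(n_words, softmax(b_v + W.T @ h)).astype(float)
    p_pos = sigmoid(D * b_h + W @ v_hat)                 # positive phase
    p_neg = sigmoid(D * b_h + W @ v_neg)                 # negative phase
    # gradient descent on the cost C with the CD approximations of eq. (10)
    W   -= lr * (np.outer(p_neg, v_neg) - np.outer(p_pos, v_hat))
    b_v -= lr * (v_neg - v_hat)
    b_h -= lr * D * (p_neg - p_pos)
    return v_neg

counts = np.array([2., 0, 1, 1, 0])      # word counts of one collection
v_star = cd_step(counts, D=3)            # negative sample, same total count
```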
In a possible implementation, an average span of selected keywords can be calculated as follows:

span_avg(Q) = (1 / |Q|) Σ_{k ∈ Q} span(k)    (11)

where Q is the set of words appearing in the topics over the years extracted by the topic model and span(k) is the number of time steps between the first and the last appearance of the keyword k. The used RNN-RSM model captures longer spans for keywords or trends compared to conventional models. The RNN-RSM model can be used for temporal topic modeling and/or for trend analysis.

The method and system 1 according to the present invention can be used for a variety of use cases. The method and system 1 for providing automatic discovery of topics can be used for any kind of temporal ordered text document collections TDCs published or generated over time. The method and system 1 can be used for detecting topics and for tracking different topics over time. Furthermore, a detected topic can trigger a control or monitoring routine for a technical system. Further, a detected topic or trend can trigger a process such as a repair or maintenance process for a machine. Further, the discovered topics can be evaluated or processed to calculate automatically trends within the text document collections TDCs to make predictions for the future development within a technical domain or for an investigated technical system. The text documents sorted in the ordered sequence of text document collections TDCs can originate from the same or different text-generating sources. The text documents may be produced by different users and/or generated by different technical hardware or software components of a technical system. The method and system 1 according to the present invention are able to capture longer trends of discovered topics over time, allowing to make more accurate predictions and to initiate more suitable measures or processes depending on the discovered topics.
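The average span measure of equation (11) can be sketched as follows; the exact definition of a single keyword's span (first to last appearance, inclusive) is an assumption made for illustration:

```python
def average_span(topics_per_step, keywords):
    """Sketch of equation (11): mean span of keywords over a topic sequence.

    `topics_per_step` maps each time step to the set of words appearing in
    the discovered topics at that step; the span of a keyword is the number
    of steps between its first and last appearance (inclusive).
    """
    steps = sorted(topics_per_step)
    spans = []
    for word in keywords:
        present = [t for t in steps if word in topics_per_step[t]]
        if present:
            spans.append(present[-1] - present[0] + 1)
    return sum(spans) / len(spans) if spans else 0.0

topics = {1996: {"neural", "network"}, 1997: {"network"}, 1998: {"neural"}}
avg = average_span(topics, {"neural", "network"})
# "neural" spans 1996-1998 (3 steps), "network" spans 1996-1997 (2 steps)
```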

Patent Claims
1. A method for performing automatically a discovery of topics and trends within temporal ordered text document collections (TDCs),
the method comprising the steps of:
(a) generating (S1) a bag of words vector, v, for each text document collection (TDC) using a predefined dictionary (DIC);
(b) calculating (S2) on the basis of the generated bag of words vectors, v, for each text document collection (TDC) iteratively a hidden topic vector, h, representing topics of the respective text document collection (TDC) using a calculated hidden state vector, u, memorizing a hidden state of all previous text document collections (TDCs).
2. The method according to claim 1 wherein topic trends are derived from the calculated hidden topic vectors, h, of the text document collections (TDCs).
3. The method according to claim 1 or 2 wherein each text document collection (TDC) comprises one or more text documents published or output at the same time and comprising an associated time stamp, TS, indicating the time of publication.
4. The method according to any of the preceding claims 1 to 3 wherein a sequence of text document collections (TDCs) is sorted according to the associated time stamps, TSs, to assign time steps to the text document collections (TDCs).
5. The method according to claim 4 wherein a hidden topic vector, h_t, representing topics of a text document collection (TDC) at a time step t is calculated from the bag of words vector, v_t, at this time step t and from the hidden state vector, u_(t-1), of the preceding time step, t-1, which is computed from the hidden topic vector, h_(t-1), at that preceding time step t-1 and its preceding hidden state vector, u_(t-2), at time step t-2.
6. The method according to any of the preceding claims 1 to 5 wherein the calculated hidden topic vector, h, of a text document collection (TDC) comprises a hidden topic probability vector indicating occurrence probabilities of different topics within the respective text document collection (TDC).
7. The method according to any of the preceding claims 1 to 6 wherein the generated bag of words vector, v, of a text document collection (TDC) indicates occurrence numbers of words within the respective text document collection (TDC).
8. The method according to any of the preceding claims 1 to 7 wherein a two-layered recurrent neural network-replicated softmax model, RNN-RSM, is used to calculate hidden topic vectors, h, for the text document collections (TDCs) of a sequence of text document collections.
9. The method according to claim 8 wherein the two-layered RNN-RSM model comprises
an RSM layer including time-ordered hidden state vectors, h, and associated bag of words vectors, v, for all text document collections (TDCs) and
an RNN hidden layer including the hidden state vectors, u.
10. The method according to any of the preceding claims 1 to 9 wherein the discovered topics are automatically evaluated to control a process.
11. The method according to any of the preceding claims 1 to 10 wherein the text document collection (TDC) comprises text files including words of a natural language consisting of characters and/or source code files written in a programming language.
12. The method according to any of the preceding claims 1 to 11 wherein the text document of a text document collection (TDC) describes features of a technical system.
13. A topic discovery system for automatic discovery of topics within temporal ordered text document collections (TDCs), said topic discovery system (1) comprising:
(a) a repository (2) which stores the temporal ordered text document collections (TDCs); and
(b) a processor (3) adapted to generate a bag of words vector, v, for each text document collection (TDC) using a predefined dictionary (DIC) and adapted to calculate for each text document collection (TDC) iteratively a hidden topic vector representing topics of the respective text document collection (TDC) using a calculated hidden state vector memorizing a hidden state of all previous text document collections.
14. The topic discovery system according to claim 13 wherein the processor (3) is further adapted to derive automatically topic trends from the discovered topics of the text document collections (TDCs).
15. The topic discovery system according to claim 13 or 14 wherein the topic discovery system (1) comprises an interface (6) to output the discovered topics and/or topic trends.
PCT/EP2018/069210 2017-07-17 2018-07-16 Method and system for automatic discovery of topics and trends over time WO2019016119A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18746623.0A EP3635504A1 (en) 2017-07-17 2018-07-16 Method and system for automatic discovery of topics and trends over time
US16/632,022 US11520817B2 (en) 2017-07-17 2018-07-16 Method and system for automatic discovery of topics and trends over time

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP17181687.9 2017-07-17
EP17181687.9A EP3432155A1 (en) 2017-07-17 2017-07-17 Method and system for automatic discovery of topics and trends over time

Publications (1)

Publication Number Publication Date
WO2019016119A1 true WO2019016119A1 (en) 2019-01-24

Family

ID=59366270

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/069210 WO2019016119A1 (en) 2017-07-17 2018-07-16 Method and system for automatic discovery of topics and trends over time

Country Status (3)

Country Link
US (1) US11520817B2 (en)
EP (2) EP3432155A1 (en)
WO (1) WO2019016119A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3770795A1 (en) 2019-07-24 2021-01-27 Gong I.O Ltd. Unsupervised automated extraction of conversation structure from recorded conversations

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN110851573A (en) * 2018-07-27 2020-02-28 北京京东尚科信息技术有限公司 Statement processing method and system and electronic equipment
CN111694949B (en) * 2019-03-14 2023-12-05 京东科技控股股份有限公司 Multi-text classification method and device
US11308285B2 (en) 2019-10-31 2022-04-19 International Business Machines Corporation Triangulated natural language decoding from forecasted deep semantic representations
CN112069394B (en) * 2020-08-14 2023-09-29 上海风秩科技有限公司 Text information mining method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
US20150019204A1 (en) * 2013-07-12 2015-01-15 Microsoft Corporation Feature completion in computer-human interactive learning
WO2017097231A1 (en) * 2015-12-11 2017-06-15 北京国双科技有限公司 Topic processing method and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20090070346A1 (en) * 2007-09-06 2009-03-12 Antonio Savona Systems and methods for clustering information
US8719302B2 (en) * 2009-06-09 2014-05-06 Ebh Enterprises Inc. Methods, apparatus and software for analyzing the content of micro-blog messages
US9047283B1 (en) * 2010-01-29 2015-06-02 Guangsheng Zhang Automated topic discovery in documents and content categorization
US20130311471A1 (en) * 2011-02-15 2013-11-21 Nec Corporation Time-series document summarization device, time-series document summarization method and computer-readable recording medium
TW201820172A (en) * 2016-11-24 2018-06-01 財團法人資訊工業策進會 System, method and non-transitory computer readable storage medium for conversation analysis
US10452702B2 (en) * 2017-05-18 2019-10-22 International Business Machines Corporation Data clustering


Non-Patent Citations (3)

Title
BOLELLI LEVENT ET AL: "Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation", 6 April 2009, NETWORK AND PARALLEL COMPUTING; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CHAM, PAGE(S) 776 - 780, ISBN: 978-3-642-24392-9, ISSN: 0302-9743, XP047401879 *
DAVID M BLEI ET AL: "Dynamic topic models", PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON MACHINE LEARNING , ICML '06, ACM PRESS, NEW YORK, NEW YORK, USA, 25 June 2006 (2006-06-25), pages 113 - 120, XP058119059, ISBN: 978-1-59593-383-6, DOI: 10.1145/1143844.1143859 *
XUERUI WANG ET AL: "Topics over time", PROCEEDINGS OF THE TWELFTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING : AUGUST 20 - 23, 2006, PHILADELPHIA, PA, USA, NEW YORK, NY : ACM PRESS, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 20 August 2006 (2006-08-20), pages 424 - 433, XP058107672, ISBN: 978-1-59593-339-3, DOI: 10.1145/1150402.1150450 *


Also Published As

Publication number Publication date
US11520817B2 (en) 2022-12-06
US20200151207A1 (en) 2020-05-14
EP3432155A1 (en) 2019-01-23
EP3635504A1 (en) 2020-04-15


Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018746623

Country of ref document: EP

Effective date: 20200107

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18746623

Country of ref document: EP

Kind code of ref document: A1