CN109388806B - Chinese word segmentation method based on deep learning and forgetting algorithm - Google Patents


Info

Publication number
CN109388806B
Authority
CN
China
Prior art keywords
word
words
deep learning
word segmentation
stock
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811258651.5A
Other languages
Chinese (zh)
Other versions
CN109388806A (en)
Inventor
卢学裕
王安
杨大海
杨利军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Botbrain Intelligent Technology Co ltd
Original Assignee
Beijing Botbrain Intelligent Technology Co ltd
Priority date: 2018-10-26 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2018-10-26
Publication date: 2023-06-27
Application filed by Beijing Botbrain Intelligent Technology Co ltd
Priority to CN201811258651.5A
Publication of CN109388806A
Application granted
Publication of CN109388806B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a Chinese word segmentation method based on deep learning and a forgetting algorithm. The method has three steps.

Step one: scan sentences character by character to obtain natural-language text. Segment the scanned text into word sequences with a deep-learning word segmentation method, and collect the word sequences into a first lexicon.

Step two: scan sentences character by character to obtain natural-language text. Segment the text into candidate words with a forgetting-algorithm word segmentation method, and collect the candidate words into a second lexicon.

Step three: fuse the word sequences in the first lexicon with the candidate words in the second lexicon to obtain the final segmentation result. The fusion rules are:
- consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep-learning result;
- a single character in the second lexicon that belongs to a word in the corresponding deep-learning result is merged forward or backward into that word.

Fusing the deep-learning and forgetting-algorithm segmentation methods lets the method automatically detect domain knowledge. It also performs unsupervised new-word discovery within a domain and improves segmentation quality.

Description

Chinese word segmentation method based on deep learning and forgetting algorithm
Technical Field
The invention relates to the technical field of word segmentation, in particular to a Chinese word segmentation method based on deep learning and forgetting algorithm.
Background
Chinese word segmentation means splitting a sequence of Chinese characters into individual words. It is the process of recombining a continuous character sequence into a word sequence according to a given specification.
1. Word segmentation method based on character string matching
This method is also called mechanical word segmentation. It matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to some strategy. If a substring is found in the dictionary, the match succeeds and a word is identified.

String-matching methods can be classified in three ways:
- by scanning direction, into forward matching and reverse matching;
- by which length is preferred, into maximum (longest) matching and minimum (shortest) matching;
- by whether part-of-speech tagging is integrated, into pure segmentation methods and integrated methods that combine segmentation with tagging.

Commonly used mechanical segmentation methods include:
1) Forward maximum matching (left to right direction);
2) Reverse maximum matching (right-to-left direction);
3) Minimum segmentation (minimizing the number of words cut in each sentence).
These methods can also be combined. For example, forward and reverse maximum matching can be combined into bidirectional matching. Because of how Chinese characters form words, forward and reverse minimum matching are rarely used. In general, reverse matching is slightly more accurate than forward matching and runs into fewer ambiguities. Statistics show an error rate of 1/169 for pure forward maximum matching and 1/245 for pure reverse maximum matching. This is still far from meeting practical requirements. Practical segmentation systems therefore use mechanical segmentation only as the initial step, and improve accuracy further with various other linguistic information.
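Forward and reverse maximum matching can be sketched in a few lines of Python. This is a minimal illustration of the classic technique, not the patent's method. The toy lexicon, the function names `forward_max_match` and `reverse_max_match`, and the rule that an unmatched single character is always emitted as a word are all assumptions.

```python
def forward_max_match(text, lexicon, max_len):
    """Left-to-right greedy longest match; unknown single characters pass through."""
    max_len = max(1, max_len)
    words, i = [], 0
    while i < len(text):
        # Try the longest window first, shrinking until a lexicon word (or one character) matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len):
    """Right-to-left greedy longest match; words are collected backwards, then reversed."""
    max_len = max(1, max_len)
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    return words[::-1]


LEXICON = {"研究", "研究生", "生命", "起源"}  # hypothetical toy lexicon

if __name__ == "__main__":
    print(forward_max_match("研究生命起源", LEXICON, 3))
    print(reverse_max_match("研究生命起源", LEXICON, 3))
```

Take "研究生命起源" ("study the origin of life") with this toy lexicon. Forward matching greedily takes 研究生 (graduate student) and ends with 研究生/命/起源. Reverse matching gives the intended 研究/生命/起源. This is a small instance of why reverse matching tends to be slightly more accurate.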
One approach improves the scanning method, called feature scanning or marker-based segmentation. Words with obvious features in the string are identified and segmented first. They then serve as breakpoints that divide the original string into shorter substrings, which are segmented mechanically. This reduces the matching error rate. Another approach combines word segmentation with part-of-speech tagging. Rich part-of-speech information supports segmentation decisions, and the segmentation results are checked and adjusted during tagging. This greatly improves accuracy.
A general model can be built for mechanical word segmentation. Specialized academic literature covers this, so it is not discussed in detail here.
2. Word segmentation method based on understanding
This method recognizes words by having the computer simulate how people understand sentences. The basic idea is to perform syntactic and semantic analysis during segmentation, and to use that information to resolve ambiguity. Such a system generally has three parts:
- a word segmentation subsystem;
- a syntactic-semantic subsystem;
- a general control part.

Under the coordination of the general control part, the segmentation subsystem obtains syntactic and semantic information about the relevant words and sentences. It uses this information to resolve segmentation ambiguity, thereby simulating how people understand sentences. This method requires a large amount of linguistic knowledge and information. Chinese linguistic knowledge is broad and complex, so it is difficult to organize into a machine-readable form. As a result, understanding-based segmentation systems are still at the experimental stage.
3. Word segmentation method based on statistics
Formally, words are stable combinations of characters. The more often adjacent characters co-occur in context, the more likely they are to form a word. The frequency or probability of adjacent co-occurrence therefore reflects how credible a character combination is as a word.

One can count how often each pair of adjacent characters co-occurs in a corpus and compute their mutual information. The mutual information of two Chinese characters X and Y is defined from their adjacent co-occurrence probability. It reflects how tightly the two characters are bound. When it exceeds a threshold, the combination is considered a possible word. This method only needs character-group frequencies counted from the corpus and requires no segmentation dictionary. It is therefore also called dictionary-free segmentation or statistical word extraction.

The method has limitations:
- it often extracts frequent character groups that are not words, such as "this", "one", "some", "my" and "many";
- its recognition accuracy for common words is poor;
- its time and space overhead is high.

Practical statistical segmentation systems therefore use a basic dictionary of common words for string-matching segmentation, and use statistical methods to recognize new words. Combining string-frequency statistics with string matching keeps the speed and efficiency of matching-based segmentation. It also exploits dictionary-free segmentation's ability to recognize new words from context and to disambiguate automatically.
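The mutual-information measure described above is commonly written as below. The text does not give the estimates, so the relative-frequency estimates with a single corpus total $N$ are an assumption. The toy counts are invented for illustration.

```latex
MI(X,Y) = \log_2 \frac{P(XY)}{P(X)\,P(Y)}, \qquad
P(X) = \frac{f(X)}{N}, \quad P(XY) = \frac{f(XY)}{N}

% Toy counts: f(X) = 3, f(Y) = 4, f(XY) = 3, N = 15
MI(X,Y) = \log_2 \frac{3/15}{(3/15)(4/15)} = \log_2 \frac{15}{4} \approx 1.91
```

A positive value means X and Y are adjacent more often than chance would predict. A negative value is the same as $P(XY) < P(X)\,P(Y)$.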
1. Ambiguity identification
Ambiguity means that the same sentence admits two or more segmentations. For example, in "表面的" (superficial), both "表面" (surface) and "面的" (minibus taxi) are words. The phrase can therefore be segmented as "表面/的" or "表/面的". This is called crossing ambiguity.

Crossing ambiguity is very common. The well-known "和服" (kimono) error is in fact caused by it: "化妆和服装" (makeup and clothing) can be segmented as "化妆/和/服装" or as "化妆/和服/装". Without human knowledge to understand the text, a computer can hardly tell which segmentation is correct.
Crossing ambiguity is relatively easy to handle compared with combinational ambiguity, which must be resolved from the whole sentence. Two examples:
- In "这个门把手坏了" (this door handle is broken), "把手" (handle) is a word. In "请把手拿开" (please take your hand away), "把手" is not a word.
- In "将军任命了一名中将" (the general appointed a lieutenant general), "中将" (lieutenant general) is a word. In "产量三年中将增长两倍" (output will grow twofold within three years), "中将" is no longer a word.

How can a computer recognize these cases?
Even if computers could resolve both crossing and combinational ambiguity, a harder problem remains: true ambiguity. True ambiguity means that, given a sentence, even a person cannot tell which strings should be words. For example, "乒乓球拍卖完了" can be segmented in two ways:
- "乒乓/球拍/卖/完/了" (the table-tennis paddles are sold out);
- "乒乓球/拍卖/完/了" (the table-tennis ball auction is over).

Without surrounding context, nobody can tell whether "拍卖" (auction) counts as a word here.
2. New word recognition
New words are technically called out-of-vocabulary (unregistered) words. They are words that are not yet in the dictionary but really are words. The most typical case is personal names. A person easily understands that in "王军虎去广州了" (Wang Junhu went to Guangzhou), "王军虎" (Wang Junhu) is a word because it is a name. A computer finds this hard to recognize.

Adding "王军虎" to the dictionary does not scale. There are enormous numbers of names worldwide and new ones appear all the time, so recording them all would be a huge undertaking. Even if it were possible, problems would remain. In the sentence "王军虎头虎脑的" (Wang Jun looks sturdy and lively), should "王军虎" still count as a word?
Besides personal names, new words include organization names, place names, product names, trademarks, abbreviations and elided forms. These are hard to handle, yet they are exactly the words people use often. New-word recognition in a segmentation system is therefore very important for search engines. The accuracy of new-word recognition is currently one of the key indicators of segmentation-system quality. Existing segmentation algorithms are lexicon-based and cannot segment words that do not appear in the lexicon.
Disclosure of Invention
To address these technical problems, the invention provides a Chinese word segmentation method based on deep learning and a forgetting algorithm. It fuses a deep-learning segmentation method with a forgetting-algorithm segmentation method. This lets it automatically detect domain knowledge, perform unsupervised new-word discovery within a domain, and improve segmentation quality.
In order to solve the technical problems, the invention adopts the following technical scheme: a Chinese word segmentation method based on deep learning and forgetting algorithm comprises the following steps:
Step one: scan sentences character by character to obtain natural-language text. Segment the scanned text into word sequences using a deep-learning word segmentation method, and collect the word sequences into a first lexicon.

Step two: scan sentences character by character to obtain natural-language text. Segment the text into candidate words using a forgetting-algorithm word segmentation method, and collect the candidate words into a second lexicon.

Step three: fuse the word sequences in the first lexicon with the candidate words in the second lexicon to obtain the final segmentation result, using the following fusion rules:
- where the first and second lexicons both give a word, it is kept as a word;
- where both give single characters, they are combined into a word;
- consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep-learning result;
- a single character in the second lexicon that belongs to a word in the corresponding deep-learning result is merged forward or backward into that word.
The deep-learning word segmentation method of step one uses an RNN. Specifically, it uses an LSTM model, a type of RNN.
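The forgetting-algorithm word segmentation method uses the following decision formula:

$$P(W_n W_{n+1}) < P(W_n)\,P(W_{n+1})$$

Here $W_n$ is the $n$-th character of the scanned sentence. The two probabilities are defined by formula images:
[Formula image GDA0004231208350000051: definition of $P(W_n)$]
[Formula image GDA0004231208350000052: definition of $P(W_n W_{n+1})$]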
The forgetting curve adopted by the forgetting algorithm in the second step is a Newton cooling curve.
The beneficial effects of the invention are as follows. The word segmentation method of the invention has these advantages:
(1) Unsupervised learning, so large amounts of corpus text can be used for training;
(2) O(N) time complexity, so large-scale segmentation can be completed in a relatively short time;
(3) A self-maintaining lexicon: without manual intervention, the program automatically discovers and adds new words, adjusts word frequencies, cleans out erroneous words and removes rare words, keeping the dictionary at a suitable size;
(4) Domain adaptation: when the domain changes, entries and word frequencies are adjusted adaptively;
(5) Support for segmentation with dedicated lexicons, such as names of lesser-known artists and program titles.
Drawings
FIG. 1 shows the forgetting curve used for the forgetting coefficient in the Chinese word segmentation method based on deep learning and a forgetting algorithm according to the invention;
FIG. 2 is a logic diagram of the LSTM model in the Chinese word segmentation method based on deep learning and a forgetting algorithm according to the invention.
Detailed Description
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof. The accompanying drawings illustrate, by way of example, specific embodiments in which the invention may be practiced. The illustrated embodiments are not intended to be exhaustive of all embodiments according to the invention. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
A Chinese word segmentation method based on deep learning and forgetting algorithm comprises the following steps:
Step one: scan sentences character by character to obtain natural-language text. Segment the scanned text into word sequences using a deep-learning word segmentation method, and collect the word sequences into a first lexicon.

Step two: scan sentences character by character to obtain natural-language text. Segment the text into candidate words using a forgetting-algorithm word segmentation method, and collect the candidate words into a second lexicon.

Step three: fuse the word sequences in the first lexicon with the candidate words in the second lexicon to obtain the final segmentation result, using the following fusion rules:
- where the first and second lexicons both give a word, it is kept as a word;
- where both give single characters, they are combined into a word;
- consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep-learning result;
- a single character in the second lexicon that belongs to a word in the corresponding deep-learning result is merged forward or backward into that word.
By combining deep learning with the forgetting algorithm, the invention can automatically detect domain knowledge. It performs unsupervised new-word discovery within a domain and improves segmentation quality.
The main steps of the forgetting algorithm are as follows. Segmentation is completed in a single pass, in O(N) time:
1) Scan the sentence character by character.
2) At each position, look up in the lexicon all words, up to a limited word length, that end with the current character.
3) For each such word, multiply its probability by the maximum probability product cached for the position just before it.
4) Take the word with the largest result, and cache both that maximum product and the corresponding segmentation for the current position.
5) Repeat until the whole sentence has been scanned. The segmentation cached at the final position is the segmentation of the whole sentence.
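The single-pass procedure above is essentially a dynamic program over character positions. The minimal sketch below makes several assumptions:
- the lexicon is a dict of word probabilities;
- `max_len` is a hypothetical word-length limit;
- `unk_prob` is a hypothetical small fallback probability for characters missing from the lexicon.

The sketch adds log-probabilities instead of multiplying probabilities. This selects the same segmentation while avoiding floating-point underflow on long sentences.

```python
import math


def segment(sentence, word_prob, max_len=4, unk_prob=1e-8):
    """Return the segmentation of `sentence` with the largest product of word probabilities."""
    max_len = max(1, max_len)
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability for sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word of that best split
    best[0] = 0.0
    for end in range(1, n + 1):
        # Every candidate word ending at `end`, at most max_len characters long.
        for start in range(max(0, end - max_len), end):
            word = sentence[start:end]
            p = word_prob.get(word)
            if p is None:
                if end - start > 1:
                    continue      # multi-character strings must be in the lexicon
                p = unk_prob      # unknown single characters are always allowed
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow the cached back-pointers from the final position to recover the words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(sentence[start:end])
        end = start
    return words[::-1]


# Hypothetical toy probabilities, for illustration only.
PROBS = {"北京": 0.1, "大学": 0.1, "北京大学": 0.05, "大学生": 0.04,
         "学生": 0.05, "大": 0.02, "学": 0.02, "生": 0.02}

if __name__ == "__main__":
    print(segment("北京大学生", PROBS))
```

With these toy values, 北京/大学生 (0.1 × 0.04 = 0.004) beats 北京大学/生 (0.05 × 0.02 = 0.001). The inner loop is bounded by `max_len`, so the pass runs in O(N · max_len) time, which is linear for a fixed length limit.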
If two adjacent characters are unrelated, the text can be cut between them. The sentence is scanned character by character. Wherever two adjacent characters satisfy the following formula, they are separated. This cuts the sentence into several substrings and yields a set of candidate words. The decision formula is:

$$P(W_n W_{n+1}) < P(W_n)\,P(W_{n+1})$$

Here $W_n$ is the $n$-th character of the scanned sentence. The two probabilities are defined by formula images:
[Formula image GDA0004231208350000061: definition of $P(W_n)$]
[Formula image GDA0004231208350000062: definition of $P(W_n W_{n+1})$]
The parameters in the formula can be obtained statistically. A single pass over the corpus yields:
- the frequency of each single character;
- the co-occurrence frequency of each pair of adjacent characters;
- the total frequency of all single characters.
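The cut rule and the single counting pass can be sketched as follows. The exact probability formulas appear only as images, so this sketch assumes both probabilities are relative frequencies over the same total character count, as the description of the parameters suggests. The function names are hypothetical. Both sides of the inequality are multiplied by the squared total, which leaves the decision unchanged but avoids floating-point comparisons.

```python
from collections import Counter


def count_chars(corpus):
    """One pass over the corpus: character counts, adjacent-pair counts, total characters."""
    uni, bi = Counter(), Counter()
    for line in corpus:
        uni.update(line)
        bi.update(line[i:i + 2] for i in range(len(line) - 1))
    return uni, bi, sum(uni.values())


def split_candidates(sentence, uni, bi, total):
    """Cut between W_n and W_n+1 whenever P(W_n W_n+1) < P(W_n) * P(W_n+1)."""
    if not sentence:
        return []
    pieces, start = [], 0
    for n in range(len(sentence) - 1):
        a, b = sentence[n], sentence[n + 1]
        # f(ab)/T < (f(a)/T) * (f(b)/T)  is equivalent to  f(ab) * T < f(a) * f(b)
        if bi[a + b] * total < uni[a] * uni[b]:
            pieces.append(sentence[start:n + 1])
            start = n + 1
    pieces.append(sentence[start:])
    return pieces


CORPUS = ["苹果", "苹果", "苹果", "好吃", "好吃", "好吃", "水果好"]  # hypothetical toy corpus

if __name__ == "__main__":
    uni, bi, total = count_chars(CORPUS)
    print(split_candidates("苹果好吃", uni, bi, total))
```

In the toy corpus, 果好 occurs once, while 果 and 好 each occur four times out of 15 characters. Since 1 × 15 < 4 × 4, "苹果好吃" is cut into the candidates 苹果 and 好吃. Under this estimate a character never seen in the corpus has count zero, so the sketch never cuts next to it.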
The forgetting curve used for the forgetting coefficient is shown in FIG. 1.
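The text names Newton's law of cooling as the forgetting curve but does not give its parameters. Under that law a quantity relaxes exponentially toward the ambient value, $T(t) = T_{env} + (T_0 - T_{env})e^{-kt}$. With an ambient value of zero this reduces to $w(t) = w_0 e^{-kt}$.

The sketch below applies that decay to lexicon weights. The following are all hypothetical choices for illustration, not the patent's design:
- the class name `ForgettingLexicon`;
- the cooling rate `cooling_rate`;
- the pruning threshold `min_weight`;
- adding a fixed amount per observation.

```python
import math


class ForgettingLexicon:
    """Hypothetical lexicon whose word weights cool over time (Newton's law of cooling)."""

    def __init__(self, cooling_rate=0.1, min_weight=0.5):
        self.k = cooling_rate          # assumed decay constant k
        self.min_weight = min_weight   # assumed pruning threshold
        self.weight = {}               # word -> weight at its last update
        self.updated = {}              # word -> time of its last update

    def current(self, word, now):
        """Weight at time `now`: w0 * exp(-k * elapsed), or 0 for unknown words."""
        if word not in self.weight:
            return 0.0
        elapsed = now - self.updated[word]
        return self.weight[word] * math.exp(-self.k * elapsed)

    def observe(self, word, now, amount=1.0):
        """Seeing a word again 'reheats' it: decayed weight plus a fixed increment."""
        self.weight[word] = self.current(word, now) + amount
        self.updated[word] = now

    def prune(self, now):
        """Drop words whose decayed weight has fallen below the threshold."""
        stale = [w for w in self.weight if self.current(w, now) < self.min_weight]
        for w in stale:
            del self.weight[w], self.updated[w]
        return stale


if __name__ == "__main__":
    lex = ForgettingLexicon(cooling_rate=0.1, min_weight=0.5)
    lex.observe("口袋妖怪", now=0)
    lex.observe("网络版", now=0)
    lex.observe("网络版", now=10)
    print(lex.prune(now=10))
```

Words that keep appearing are repeatedly reheated and stay in the lexicon. Words that stop appearing cool below the threshold and are pruned. This is one way to obtain the self-maintaining, domain-adaptive lexicon listed among the beneficial effects.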
The deep-learning method uses an RNN, specifically an LSTM model.

Chinese word segmentation divides natural-language text into a word sequence. Preferably this is done by sequence labeling: each character in the sentence receives one of four BMES tags.
- B: beginning of a word;
- M: middle of a word;
- E: end of a word;
- S: single-character word.
For example, the sentence glossed as {Beijing / east search / and / big data / platform / data mining / algorithm part}
is labeled {BE BE S BME BE BMME BME}, with one tag group per word of the Chinese original.
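Converting between a segmented sentence and its BMES tags is mechanical, as the minimal sketch below shows. The function names and example words are illustrative and not taken from the patent. `to_bmes` returns one tag group per word, in the same style as the labeled example above. `from_bmes` decodes a per-character tag string back into words.

```python
def to_bmes(words):
    """One BMES tag group per word: S for one character, otherwise B, M..., E."""
    return ["S" if len(w) == 1 else "B" + "M" * (len(w) - 2) + "E" for w in words]


def from_bmes(chars, tags):
    """Rebuild words from a character string and a same-length per-character tag string."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:   # a new word starts: flush an unfinished one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):               # the word ends at this character
            words.append(current)
            current = ""
    if current:                             # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    words = ["北京", "大数据", "平台", "和", "数据挖掘"]  # illustrative sentence
    groups = to_bmes(words)
    print(groups)
    print(from_bmes("".join(words), "".join(groups)))
```

Training data for the tagger is produced with `to_bmes`. At inference time, `from_bmes` turns the predicted tag string back into a segmentation.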
The model is trained on the input and output sequences of the annotated corpus and finally generates the segmentation sequence. FIG. 2 shows the logic diagram of the LSTM model: X is the input sequence and H is the output sequence. The basic idea is to treat segmentation as a sequence-labeling problem, in which every character of the sentence receives one of the four BMES tags. The input of the whole model is a character sequence and the output is a tag sequence, so this is a standard sequence-to-sequence problem.
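The minimal forward pass below makes the data flow concrete. It is an untrained, single-layer LSTM tagger in plain Python:
1) each character is embedded (X);
2) the LSTM cell updates its state and emits a hidden vector (H);
3) a softmax layer turns H into a probability for each of the four BMES tags.

The class name, dimensions and random initialization are assumptions. The patent does not specify the architecture beyond using an LSTM. Training on the labeled corpus is omitted, so the predicted tags are meaningless until the weights are learned. A practical system would use a deep-learning framework, often with a bidirectional LSTM.

```python
import math
import random

TAGS = "BMES"


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTMTagger:
    """Hypothetical untrained single-layer LSTM mapping characters to BMES tag probabilities."""

    def __init__(self, vocab, emb_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.index = {ch: i for i, ch in enumerate(sorted(set(vocab)))}
        self.emb = rand(len(self.index) + 1, emb_dim)  # last row: unknown character
        self.hidden = hidden
        # One weight matrix and bias per gate (forget, input, output, candidate) on [h_prev; x_t].
        self.W = {g: rand(hidden, hidden + emb_dim) for g in "fiog"}
        self.b = {g: [0.0] * hidden for g in "fiog"}
        self.V = rand(len(TAGS), hidden)               # hidden state -> tag scores

    def _gate(self, g, z, act):
        return [act(s + b) for s, b in zip(matvec(self.W[g], z), self.b[g])]

    def step(self, x, h, c):
        z = h + x                                      # list concatenation [h_prev; x_t]
        f = self._gate("f", z, sigmoid)
        i = self._gate("i", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c

    def tag_probs(self, sentence):
        """Per-character probability distribution over B, M, E, S."""
        h, c = [0.0] * self.hidden, [0.0] * self.hidden
        out = []
        for ch in sentence:
            x = self.emb[self.index.get(ch, len(self.index))]
            h, c = self.step(x, h, c)
            scores = matvec(self.V, h)
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
            total = sum(exps)
            out.append([e / total for e in exps])
        return out

    def predict(self, sentence):
        """Pick the most probable tag for each character."""
        return "".join(TAGS[max(range(len(TAGS)), key=p.__getitem__)]
                       for p in self.tag_probs(sentence))


if __name__ == "__main__":
    model = TinyLSTMTagger("北京大数据平台")
    print(model.predict("北京平台"))
```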
The combined segmentation method improves segmentation quality by fusing the results of the two methods, with the forgetting algorithm as the primary method, because:
1) domain terms such as artist names play an important role in recommendation, for example;
2) the forgetting algorithm is unsupervised, so corpus text is cheap to obtain;
3) training corpora for deep-learning algorithms are scarce and training takes a long time.
Merging scheme:
1) consecutive single characters in the forgetting-algorithm result are merged into one word if they form a word in the corresponding deep-learning result;
2) a single character in the forgetting-algorithm result that belongs to a word in the corresponding deep-learning result is merged forward or backward into that word;
3) parts of speech are also consulted when merging.
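One possible reading of the merging rules can be sketched as a boundary test. A cut produced by the forgetting algorithm is dropped when two conditions hold:
- at least one side of the cut is a single character;
- the deep-learning result has no cut at the same position, so the cut lies inside a deep-learning word.

Cuts between two multi-character words are kept, so the forgetting algorithm stays the primary result. This interpretation and the function names are assumptions, and the part-of-speech check mentioned above is not modeled.

```python
def cut_positions(tokens):
    """Character offsets at which a segmentation places a word boundary (ends excluded)."""
    cuts, pos = set(), 0
    for tok in tokens[:-1]:
        pos += len(tok)
        cuts.add(pos)
    return cuts


def fuse(forget_tokens, dl_tokens):
    """Merge forgetting-algorithm tokens using deep-learning word boundaries."""
    if "".join(forget_tokens) != "".join(dl_tokens):
        raise ValueError("both segmentations must cover the same sentence")
    if not forget_tokens:
        return []
    dl_cuts = cut_positions(dl_tokens)
    result = [forget_tokens[0]]
    pos = len(forget_tokens[0])        # offset of the cut between prev and tok
    for prev, tok in zip(forget_tokens, forget_tokens[1:]):
        touches_single = len(prev) == 1 or len(tok) == 1
        if touches_single and pos not in dl_cuts:
            result[-1] += tok          # cut lies inside a deep-learning word: merge
        else:
            result.append(tok)         # keep the forgetting-algorithm boundary
        pos += len(tok)
    return result


if __name__ == "__main__":
    print(fuse(["口袋", "妖", "怪", "网络版"], ["口袋", "妖怪", "网络版"]))
    print(fuse(["热", "心肠"], ["热心肠"]))
```

In the first call the two single characters 妖 and 怪 are merged because the deep-learning result treats 妖怪 as one word. In the second, the single character 热 is merged forward into 热心肠.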
Example 1:
Sentences are scanned to obtain natural-language text, which is segmented both by the forgetting algorithm and by deep learning. The two results are then fused.
The results of the two algorithms segmenting separately are shown below.
Forgetting-algorithm segmentation results:
< actual shooting > < man > < subway > < leowed > < woman > < passenger > < quilt > < heat > < passenger > < torsion > < acquisition >
< monster of pocket > < network > < version > < registration > < download > < teaching > < video >
< jol > < some > < snowy > < real > < loving > < romantic > < surface > < girl > < feeling > < crying > <161105> < very > < perfect >
< laugh very popular > < Zheng Mou > < Yang Mou > < kissing > < play > < game > < talk > < love >
< ginger something > < Tight > < mah-jong > < reaction > < olympic > < laugh > < can > < combination with > < gymnastics >
The sound of the ball is the sound of the ball, the sound of the horn is the ultra-large sound of the ball, the sound of the ball is the sound of the ball, and the sound of the ball is the sound of the ball.
Deep-learning segmentation results:
< practice > < clap > < man > < subway > < leowed woman > < passenger > < quilt > < heat-center > < passenger > < twist >
< pocket > < monster > < web version > < registration > < download > < teaching > < video >
< something joss > < snow > < night foraging > < genuine love > < romantic > < appearance > < girl > < feeling pain > < cry > <161105> < very > < perfect >
< pico > < laugh > < very > < pouring > < Zheng Mou > < Yang Mou > < kiss > < play > < game > < love >
< ginger > < certain > < Tibet > < mah-jong > < reaction > < forward Olympic > < laugh > < can > < combination with > < gymnastics > < combination >
< week sometime > < stock > < station > < left > < outer > < field > < horn > < loud sound > < self-contained > < won > < fight > < enemy >;
Results after fusion with the above scheme:
< actual shooting > < man > < subway > < lewy < passenger > < quilt > < heat center > < passenger > < twist >
< pocket monster > < web version > < registration > < download > < teaching > < video >
< something joker > < snowy > < foraging > < genuine > above > < romantic > < appearance > < girl > < feeling > < cry of pain > <161105> < very > < perfect >
< laugh very popular > < Zheng Mou > < Yang Mou > < kissing > < play > < game > < love >
< ginger something > < Tight > < mah-jong > < reaction > < olympic > < laugh > < can > < combination with > < gymnastics >
< week sometime > < stock > < station > < left > < outer > < field > < horn > < loud sound > < self-contained > < won > < protection > < enemy >.

Claims (3)

1. A Chinese word segmentation method based on deep learning and forgetting algorithm is characterized by comprising the following steps:
step one: scanning sentences character by character to obtain natural-language text, segmenting the scanned text into word sequences using a deep-learning word segmentation method, and collecting the word sequences into a first lexicon;
step two: scanning sentences character by character to obtain natural-language text, segmenting the text into candidate words using a forgetting-algorithm word segmentation method, and collecting the candidate words into a second lexicon;
step three: fusing the word sequences in the first lexicon with the candidate words in the second lexicon to obtain the final segmentation result, using the following fusion rules: where the first and second lexicons both give a word, it is kept as a word; where both give single characters, they are combined into a word; consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep-learning result; a single character in the second lexicon that belongs to a word in the corresponding deep-learning result is merged forward or backward into that word;
the forgetting-algorithm word segmentation method uses the following decision formula:

$$P(W_n W_{n+1}) < P(W_n)\,P(W_{n+1})$$

where $W_n$ is the $n$-th character of the scanned sentence;
$P(W_n)$: [formula image QLYQS_1]
$P(W_n W_{n+1})$: [formula image QLYQS_2]
And in the second step, the forgetting curve adopted by the forgetting algorithm is a Newton cooling curve.
2. The Chinese word segmentation method based on deep learning and a forgetting algorithm according to claim 1, wherein the deep-learning word segmentation method of step one uses an RNN.
3. The Chinese word segmentation method based on deep learning and a forgetting algorithm according to claim 1 or 2, wherein the deep-learning word segmentation method of step one uses an LSTM model, a type of RNN.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811258651.5A CN109388806B (en) 2018-10-26 2018-10-26 Chinese word segmentation method based on deep learning and forgetting algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811258651.5A CN109388806B (en) 2018-10-26 2018-10-26 Chinese word segmentation method based on deep learning and forgetting algorithm

Publications (2)

Publication Number Publication Date
CN109388806A (en) 2019-02-26
CN109388806B (en) 2023-06-27

Family

ID=65427965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811258651.5A Active CN109388806B (en) 2018-10-26 2018-10-26 Chinese word segmentation method based on deep learning and forgetting algorithm

Country Status (1)

Country Link
CN (1) CN109388806B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414002B (en) * 2019-07-19 2023-06-09 山东科技大学 Intelligent Chinese word segmentation method based on statistics and deep learning
CN110751234B (en) * 2019-10-09 2024-04-16 科大讯飞股份有限公司 OCR (optical character recognition) error correction method, device and equipment

Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199972A (en) * 2013-09-22 2014-12-10 中科嘉速(北京)并行软件有限公司 Named entity relation extraction and construction method based on deep learning
CN105740226A (en) * 2016-01-15 2016-07-06 南京大学 Method for implementing Chinese segmentation by using tree neural network and bilateral neural network
CN106528738A (en) * 2016-10-28 2017-03-22 华北理工大学 Method and device for intelligent interaction based on natural language analysis
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN107122479A (en) * 2017-05-03 2017-09-01 西安交通大学 User password guessing system based on deep learning
CN107145484A (en) * 2017-04-24 2017-09-08 北京邮电大学 Chinese word segmentation method based on hidden multi-granularity local features
CN107145483A (en) * 2017-04-24 2017-09-08 北京邮电大学 Adaptive Chinese word segmentation method based on embedded representation
CN107423288A (en) * 2017-07-05 2017-12-01 达而观信息科技(上海)有限公司 Chinese automatic word segmentation device and method based on unsupervised learning
CN107590196A (en) * 2017-08-15 2018-01-16 中国农业大学 Earthquake emergency information screening and evaluation system for social networks
CN107622050A (en) * 2017-09-14 2018-01-23 武汉烽火普天信息技术有限公司 Text sequence labeling system and method based on Bi-LSTM and CRF
CN107622049A (en) * 2017-09-06 2018-01-23 国家电网公司 Method for generating a specialized lexicon for electric power services
CN107665254A (en) * 2017-09-30 2018-02-06 济南浪潮高新科技投资发展有限公司 Menu recommendation method based on deep learning
CN107798140A (en) * 2017-11-23 2018-03-13 北京神州泰岳软件股份有限公司 Dialogue system construction method, semantically controlled response method and device
CN107807964A (en) * 2017-10-11 2018-03-16 咪咕互动娱乐有限公司 Digital content sorting method, device and computer-readable storage medium
CN107818130A (en) * 2017-09-15 2018-03-20 深圳市电陶思创科技有限公司 Search engine construction method and system
CN107885853A (en) * 2017-11-14 2018-04-06 同济大学 Combined text classification method based on deep learning
CN107894976A (en) * 2017-10-12 2018-04-10 北京知道未来信息技术有限公司 Mixed-corpus word segmentation method based on Bi-LSTM
CN107943937A (en) * 2017-11-23 2018-04-20 杭州源诚科技有限公司 Debtor asset monitoring method and system based on analysis of public trial information
CN107944014A (en) * 2017-12-11 2018-04-20 河海大学 Chinese text sentiment analysis method based on deep learning
CN107943783A (en) * 2017-10-12 2018-04-20 北京知道未来信息技术有限公司 Word segmentation method based on LSTM-CNN
CN107967318A (en) * 2017-11-23 2018-04-27 北京师范大学 Automatic scoring method and system for Chinese short-text subjective questions using LSTM neural networks
CN108038103A (en) * 2017-12-18 2018-05-15 北京百分点信息科技有限公司 Method and apparatus for segmenting a text sequence, and electronic device
CN108268444A (en) * 2018-01-10 2018-07-10 南京邮电大学 Chinese word segmentation method based on bidirectional LSTM, CNN and CRF
CN108304506A (en) * 2018-01-18 2018-07-20 腾讯科技(深圳)有限公司 Search method, device and equipment
CN108304364A (en) * 2017-02-23 2018-07-20 腾讯科技(深圳)有限公司 Keyword extraction method and device
CN108320740A (en) * 2017-12-29 2018-07-24 深圳和而泰数据资源与云技术有限公司 Speech recognition method and device, electronic equipment and storage medium
CN108415953A (en) * 2018-02-05 2018-08-17 华融融通(北京)科技有限公司 Non-performing asset management knowledge management method based on natural language processing technology
CN108536756A (en) * 2018-03-16 2018-09-14 苏州大学 Emotion classification method and system based on bilingual information
CN108536667A (en) * 2017-03-06 2018-09-14 中国移动通信集团广东有限公司 Chinese text recognition method and device
CN108563725A (en) * 2018-04-04 2018-09-21 华东理工大学 Chinese symptom and sign component recognition method

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
CN101004737A (en) * 2007-01-24 2007-07-25 贵阳易特软件有限公司 Individualized document processing system based on keywords
CN100504851C (en) * 2007-06-27 2009-06-24 腾讯科技(深圳)有限公司 Chinese character word distinguishing method and system
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers
CN106874292B (en) * 2015-12-11 2020-05-05 北京国双科技有限公司 Topic processing method and device
CN105426539B (en) * 2015-12-23 2018-12-18 成都云数未来信息科学有限公司 Dictionary-based Lucene Chinese word segmentation method
CN107291684B (en) * 2016-04-12 2021-02-09 华为技术有限公司 Word segmentation method and system for language text
CN106668985A (en) * 2016-12-22 2017-05-17 山东大学 Real-time monitoring system for transfusion
US10255269B2 (en) * 2016-12-30 2019-04-09 Microsoft Technology Licensing, Llc Graph long short term memory for syntactic relationship discovery
US10565492B2 (en) * 2016-12-31 2020-02-18 Via Alliance Semiconductor Co., Ltd. Neural network unit with segmentable array width rotator
CN107153640A (en) * 2017-05-08 2017-09-12 成都准星云学科技有限公司 Word segmentation method for the elementary mathematics domain
CN107391486B (en) * 2017-07-20 2020-10-27 南京云问网络技术有限公司 Method for identifying new words in field based on statistical information and sequence labels
CN107844475A (en) * 2017-10-12 2018-03-27 北京知道未来信息技术有限公司 Word segmentation method based on LSTM
CN107992467A (en) * 2017-10-12 2018-05-04 北京知道未来信息技术有限公司 Mixed-corpus word segmentation method based on LSTM


Also Published As

Publication number Publication date
CN109388806A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
Alzantot et al. Generating natural language adversarial examples
Kiros et al. Skip-thought vectors
CN105957518B (en) Method for Mongolian large-vocabulary continuous speech recognition
US20190129947A1 (en) Neural machine translation method and apparatus
CN110175246B (en) Method for extracting concept words from video subtitles
CN100536532C (en) Method and system for automatic subtitling
CN107608960B (en) Method and device for linking named entities
WO2018201600A1 (en) Information mining method and system, electronic device and readable storage medium
CN112818694A (en) Named entity recognition method based on rules and improved pre-training model
CN108052499A (en) Text error correction method, device and computer-readable medium based on artificial intelligence
CN113553429B (en) Normalized label system construction and text automatic labeling method
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN108038099B (en) Low-frequency keyword identification method based on word clustering
KR102010343B1 (en) Method and apparatus for providing segmented internet based lecture contents
CN109388806B (en) Chinese word segmentation method based on deep learning and forgetting algorithm
CN114756681B (en) Evaluation and education text fine granularity suggestion mining method based on multi-attention fusion
JP2018033048A (en) Metadata generation system
CN109684928A (en) Chinese document recognition method based on internal retrieval
Song et al. LSTM-in-LSTM for generating long descriptions of images
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
Wang et al. Combining self-training and self-supervised learning for unsupervised disfluency detection
CN111552801A (en) Neural network automatic abstract model based on semantic alignment
CN107590121B (en) Text normalization method and system
Zou et al. To be an artist: automatic generation on food image aesthetic captioning
Andra et al. Automatic lecture video content summarization with attention-based recurrent neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant