CN109388806B - Chinese word segmentation method based on deep learning and forgetting algorithm
- Publication number
- CN109388806B (application CN201811258651.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- words
- deep learning
- word segmentation
- stock
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a Chinese word segmentation method based on deep learning and a forgetting algorithm, comprising the following steps. Step one: a sentence is scanned character by character to obtain natural-language text, the text is divided into a word sequence by a deep learning word segmentation method, and the word sequence is collected into a first lexicon (word stock). Step two: the sentence is scanned character by character to obtain natural-language text, the text is divided into candidate words by a forgetting-algorithm word segmentation method, and the candidate words are collected into a second lexicon. Step three: the word sequence in the first lexicon is fused with the candidate words in the second lexicon to obtain the final word segmentation result. The fusion method is as follows: consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep learning result; a single character in the second lexicon is merged forward or backward into a word if it belongs to a word in the corresponding deep learning result. By fusing the deep learning word segmentation method with the forgetting-algorithm word segmentation method, the method can automatically detect domain knowledge, accomplish unsupervised new-word discovery within a domain, and improve the word segmentation result.
Description
Technical Field
The invention relates to the technical field of word segmentation, in particular to a Chinese word segmentation method based on deep learning and forgetting algorithm.
Background
Chinese word segmentation refers to segmenting a sequence of Chinese characters into individual words. Word segmentation is the process of recombining a continuous character sequence into a word sequence according to a given specification.
1. Word segmentation method based on character string matching
This method, also called mechanical word segmentation, matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary according to some strategy; if a string is found in the dictionary, the match succeeds (a word is recognized). By scanning direction, string-matching segmentation is divided into forward matching and reverse matching; by which length is preferred, into maximum (longest) matching and minimum (shortest) matching; and by whether it is combined with part-of-speech tagging, into pure segmentation and integrated segmentation-and-tagging. Commonly used mechanical word segmentation methods are:
1) Forward maximum matching (left to right direction);
2) Reverse maximum matching (right-to-left direction);
3) Minimum segmentation (minimizing the number of words cut in each sentence).
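As an illustration of methods 1) and 2), the following minimal Python sketch implements forward and reverse maximum matching over a toy dictionary. The function names, the `max_len` parameter, and the sample dictionary are assumptions made for this example and do not come from the patent; characters not covered by the dictionary are emitted as single-character tokens.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation, taking the longest dictionary word."""
    words, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                pos += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Greedy right-to-left segmentation, taking the longest dictionary word."""
    words, end = [], len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in lexicon:
                words.append(piece)
                end -= size
                break
    return words[::-1]  # words were collected from the end of the sentence


# Toy dictionary, hypothetical and far smaller than a real one.
LEXICON = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命的起源", LEXICON))
print(reverse_max_match("研究生命的起源", LEXICON))
```

On this sentence the two directions produce different segmentations, which is the kind of disagreement a bidirectional scheme can use to flag a possible ambiguity.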
These methods can also be combined; for example, forward maximum matching and reverse maximum matching can be combined into bidirectional matching. Because of the way Chinese words are formed, forward minimum matching and reverse minimum matching are rarely used. In general, reverse matching is slightly more accurate than forward matching and produces fewer ambiguities. Statistics show an error rate of 1/169 for pure forward maximum matching and 1/245 for pure reverse maximum matching, but this accuracy is far from meeting practical requirements. Practical word segmentation systems use mechanical segmentation as a first-pass step and then use various other kinds of linguistic information to further improve accuracy.
One approach is to improve the scanning method, known as feature scanning or marker segmentation: words with obvious features are first identified and cut out of the string to be analyzed, and then used as breakpoints to divide the original string into smaller strings that are segmented mechanically, which reduces the matching error rate. Another approach combines word segmentation with part-of-speech tagging: rich part-of-speech information helps the segmentation decisions, and the segmentation result is in turn checked and adjusted during tagging, which greatly improves segmentation accuracy.
A general model can be built for mechanical word segmentation; specialized academic papers cover this, so it is not discussed in detail here.
2. Word segmentation method based on understanding
This method achieves word recognition by having the computer simulate human understanding of sentences. The basic idea is to perform syntactic and semantic analysis while segmenting, and to use the syntactic and semantic information to resolve ambiguity. Such a system generally consists of three parts: a word segmentation subsystem, a syntactic-semantic subsystem, and a general control part. Under the coordination of the general control part, the word segmentation subsystem obtains syntactic and semantic information about the relevant words and sentences to judge segmentation ambiguity; that is, it simulates the human process of understanding a sentence. This method requires a large amount of linguistic knowledge and information. Because Chinese linguistic knowledge is broad and complex, it is difficult to organize it into a form a machine can read directly, so understanding-based word segmentation systems are still at the experimental stage.
3. Word segmentation method based on statistics
Formally, a word is a stable combination of characters, so the more often adjacent characters appear together in context, the more likely they are to form a word. The frequency or probability with which characters co-occur adjacently therefore reflects well how credible it is that they form a word. The frequency of each combination of adjacent co-occurring characters in the corpus can be counted and their mutual information calculated: the mutual information of two Chinese characters X and Y is defined from their adjacent co-occurrence probability. Mutual information reflects how tightly the characters are bound together; when it exceeds a certain threshold, the character group is considered a possible word. This method only needs to count character-group frequencies in the corpus and needs no segmentation dictionary, so it is also called dictionary-free word segmentation or statistical word extraction. However, it has limitations: it often extracts character groups that co-occur frequently but are not words, such as "this", "one", "some", "my", and "many"; its recognition accuracy for common words is poor; and its time and space overhead is high. Practical statistical word segmentation systems therefore use a basic segmentation dictionary (a dictionary of common words) for string-matching segmentation while using statistical methods to recognize new words. Combining string-frequency statistics with string matching preserves the speed and efficiency of matching-based segmentation while exploiting the ability of dictionary-free segmentation to recognize new words from context and to disambiguate automatically.
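The mutual information referred to above is not written out in the text. One commonly used form, given here as an assumption rather than as the patent's own definition, is the pointwise mutual information of two adjacent characters $X$ and $Y$:

$$I(X, Y) = \log \frac{P(XY)}{P(X)\,P(Y)}$$

Here $P(XY)$ is the adjacent co-occurrence probability of the two characters and $P(X)$, $P(Y)$ are their individual probabilities, all estimated from corpus frequencies. Under this form, $I(X, Y) > 0$ means the two characters co-occur more often than chance would predict, and a pair is treated as a candidate word when $I(X, Y)$ exceeds a chosen threshold.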
1. Ambiguity identification
Ambiguity means that the same sentence may be segmented in two or more ways. For example, because both "表面" (surface) and "面的" (van taxi) are words, the phrase "表面的" can be split into "表面 / 的" or "表 / 面的". This is known as cross ambiguity. Cross ambiguity is very common; the "和服" (kimono) example is in fact an error caused by cross ambiguity: "化妆和服装" (makeup and clothing) can be split into "化妆 / 和 / 服装" or "化妆 / 和服 / 装". Without human knowledge to understand the text, it is difficult for a computer to know which segmentation is correct.
Cross ambiguity is relatively easy to handle compared with combination ambiguity, which must be judged from the whole sentence. For example, in the sentence "这个门把手坏了" (this door handle is broken), "把手" (handle) is a word, but in "请把手拿开" (please take your hand away), "把手" is not a word; in "将军任命了一名中将" (the general appointed a lieutenant general), "中将" is a word, but in "产量三年中将增长两倍" (output will increase twofold within three years), "中将" is not. How is a computer to recognize these cases?
Even if a computer could resolve both cross ambiguity and combination ambiguity, one problem remains: true ambiguity. True ambiguity means that, given a sentence, even a person cannot tell which strings should be words. For example, "乒乓球拍卖完了" can be split into "乒乓 / 球拍 / 卖 / 完 / 了" (the table tennis rackets are sold out) or "乒乓球 / 拍卖 / 完 / 了" (the table tennis ball auction is over); without context, probably no one can tell whether "拍卖" (auction) counts as a word here.
2. New word recognition
New words, known in technical terms as unregistered (out-of-vocabulary) words, are words that are not yet included in the dictionary but can legitimately be called words. The most typical are personal names. A person easily understands that in the sentence "王军虎去广州了" (Wang Junhu went to Guangzhou), "王军虎" is a word because it is a person's name, but this is difficult for a computer to recognize. Recording "王军虎" in the dictionary as a word does not scale: there are so many names in the world, with new ones added every moment, that recording them all would be a huge project. Even if it could be done, problems would remain: in the sentence "王军虎头虎脑的" (Wang Jun looks sturdy and lively), can "王军虎" still count as a word?
In addition to personal names, new words include organization names, place names, product names, trademark names, abbreviations, elliptical expressions and the like. These are difficult to handle, yet they are exactly the words people use frequently, so new word recognition is very important for word segmentation in a search engine. At present, new word recognition accuracy is one of the important criteria for evaluating a word segmentation system. Existing word segmentation algorithms are based on a lexicon and cannot segment words that do not appear in it.
Disclosure of Invention
Aiming at the technical problems, the invention provides a Chinese word segmentation method based on a deep learning and forgetting algorithm, which can automatically detect domain knowledge by fusing the deep learning word segmentation method and the forgetting algorithm word segmentation method, complete a new word discovery function in an unsupervised domain and improve word segmentation effect.
In order to solve the technical problems, the invention adopts the following technical scheme: a Chinese word segmentation method based on deep learning and forgetting algorithm comprises the following steps:
Step one: a sentence is scanned character by character to obtain natural-language text, the text is divided into a word sequence by a deep learning word segmentation method, and the word sequence is collected into a first lexicon (word stock);
Step two: the sentence is scanned character by character to obtain natural-language text, the text is divided into candidate words by a forgetting-algorithm word segmentation method, and the candidate words are collected into a second lexicon;
Step three: the word sequence in the first lexicon is fused with the candidate words in the second lexicon to obtain the final word segmentation result, wherein the fusion method is as follows:
if an item is a word in both the first lexicon and the second lexicon, it is merged as a word; if the items are single characters in both the first lexicon and the second lexicon, they are merged into a word; consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep learning result; and a single character in the second lexicon is merged forward or backward into a word if it belongs to a word in the corresponding deep learning result.
The deep learning word segmentation method in step one adopts an RNN method.
Specifically, the deep learning word segmentation method adopts an LSTM model, which is one kind of RNN.
The forgetting-algorithm word segmentation method uses the following judgment formula:
$$P(W_n W_{n+1}) < P(W_n) \cdot P(W_{n+1})$$
where $W_n$ is the $n$-th character in the scanned sentence.
The forgetting curve adopted by the forgetting algorithm in step two is a Newton cooling curve.
The beneficial effects of the invention are as follows:
the word segmentation method of the invention has the following advantages:
(1) Unsupervised learning, which can use a large amount of corpus for training;
(2) O(N) time complexity, so large-scale word segmentation can be completed in a relatively short time;
(3) The word library is self-maintained, and the program can automatically find and add new words, adjust word frequency, clean wrong words and remove uncommon words under the condition that manual participation is not needed, so that the size of the dictionary is kept to be proper;
(4) Domain adaptation: when the field changes, the vocabulary entry and the vocabulary frequency are adaptively adjusted along with the field change;
(5) The method can support word segmentation of exclusive word banks such as names of remote artists, program names and the like.
Drawings
FIG. 1 is the forgetting curve used for the forgetting coefficient in the Chinese word segmentation method based on deep learning and a forgetting algorithm according to the present invention;
FIG. 2 is a logic diagram of the LSTM model in the Chinese word segmentation method based on deep learning and a forgetting algorithm according to the present invention.
Detailed Description
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof. The accompanying drawings illustrate, by way of example, specific embodiments in which the invention may be practiced. The illustrated embodiments are not intended to be exhaustive of all embodiments according to the invention. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
A Chinese word segmentation method based on deep learning and forgetting algorithm comprises the following steps:
Step one: a sentence is scanned character by character to obtain natural-language text, the text is divided into a word sequence by a deep learning word segmentation method, and the word sequence is collected into a first lexicon (word stock);
Step two: the sentence is scanned character by character to obtain natural-language text, the text is divided into candidate words by a forgetting-algorithm word segmentation method, and the candidate words are collected into a second lexicon;
Step three: the word sequence in the first lexicon is fused with the candidate words in the second lexicon to obtain the final word segmentation result, wherein the fusion method is as follows:
if an item is a word in both the first lexicon and the second lexicon, it is merged as a word; if the items are single characters in both the first lexicon and the second lexicon, they are merged into a word; consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep learning result; and a single character in the second lexicon is merged forward or backward into a word if it belongs to a word in the corresponding deep learning result.
The invention adopts the combination of deep learning and forgetting algorithm, can automatically detect domain knowledge, completes the new word discovery function in the unsupervised domain, and improves the word segmentation effect.
The main steps of the forgetting algorithm are as follows.
Word segmentation can be completed in O(N) time in a single pass using the following steps:
Scan the sentence character by character. For each character, find in the lexicon all words that end with that character and are within a limited word length; for each such word, compute the product of its probability and the probability of the segmentation preceding it; take the word with the largest result; and cache the maximum probability product at the current character position together with the corresponding segmentation. Repeat these steps until the whole sentence has been scanned; the result cached at the final character position is the segmentation of the whole sentence.
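The single-pass procedure above can be sketched as a small dynamic program. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `max_prob_segment`, `word_prob`, `max_word_len`, and `unknown_prob` are hypothetical, probabilities are summed as logarithms instead of multiplied to avoid numeric underflow, and a character that is not in the lexicon is given a small fallback probability so that every sentence can be segmented. With a fixed `max_word_len`, the work per character is bounded, which gives the O(N) behaviour described.

```python
import math


def max_prob_segment(sentence, word_prob, max_word_len=4, unknown_prob=1e-8):
    """Return the segmentation with the largest product of word probabilities."""
    n = len(sentence)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of sentence[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in that best split
    best[0] = 0.0
    for end in range(1, n + 1):
        # Consider every word that ends at this character within the length limit.
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            prob = word_prob.get(word)
            if prob is None:
                if len(word) > 1:
                    continue          # unknown multi-character strings are not words
                prob = unknown_prob   # assumed fallback for unknown single characters
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the cached back-pointers from the last position to recover the words.
    words, pos = [], n
    while pos > 0:
        words.append(sentence[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


# Toy probabilities, invented for illustration only.
WORD_PROB = {"研究": 0.02, "研究生": 0.005, "生命": 0.02, "命": 0.001,
             "的": 0.05, "起源": 0.01}
print(max_prob_segment("研究生命的起源", WORD_PROB))
```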
If two adjacent characters are unrelated, the sentence can be broken between them. The sentence is scanned character by character, and a break is made between two adjacent characters whenever they satisfy the following formula; the sentence is thereby cut into several substrings, which form the candidate word set. The judgment formula is:
$$P(W_n W_{n+1}) < P(W_n) \cdot P(W_{n+1})$$
where $W_n$ is the $n$-th character in the scanned sentence.
The parameters required by the formula can be obtained statistically: a single pass over the corpus yields the frequency of each single character, the co-occurrence frequency of each pair of adjacent characters, and the total frequency of all single characters.
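A minimal sketch of this candidate-splitting step is given below. It assumes that every probability in the formula is estimated as a count divided by the total single-character count, which is how the three listed statistics are read here; the function names and the toy corpus are hypothetical. Under that assumption the inequality can be rearranged into an integer comparison, avoiding floating-point division.

```python
from collections import Counter


def count_corpus(sentences):
    """One pass over the corpus: single-character counts and adjacent-pair counts."""
    singles, pairs = Counter(), Counter()
    for sentence in sentences:
        singles.update(sentence)
        pairs.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return singles, pairs


def split_candidates(sentence, singles, pairs):
    """Break between adjacent characters a, b whenever P(ab) < P(a) * P(b)."""
    total = sum(singles.values())
    pieces, start = [], 0
    for i in range(len(sentence) - 1):
        a, b = sentence[i], sentence[i + 1]
        # count(ab)/total < count(a)/total * count(b)/total, multiplied through by total^2.
        # A character never seen in the corpus has count 0, so no break is made next to it.
        if pairs[a + b] * total < singles[a] * singles[b]:
            pieces.append(sentence[start:i + 1])
            start = i + 1
    pieces.append(sentence[start:])
    return pieces


# Toy corpus, invented for illustration; real statistics need far more text.
CORPUS = ["北京欢迎你", "我爱北京", "欢迎来北京", "你爱我"]
SINGLES, PAIRS = count_corpus(CORPUS)
print(split_candidates("北京你爱我", SINGLES, PAIRS))
```

With a corpus this small, only character pairs that never co-occur are split; on a realistic corpus the same test also separates pairs that co-occur less often than chance would predict.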
The forgetting curve used for the forgetting coefficient is shown in FIG. 1.
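Newton's law of cooling describes a quantity that decays exponentially toward an ambient value. The text does not spell out how this curve is applied to lexicon entries, so the following is only a sketch under assumptions: the ambient value is taken as zero, each word's weight decays with the time since it was last seen, and entries whose weight falls below a threshold are dropped. The names `retention`, `decay_lexicon`, `cooling_rate`, and `min_weight` are hypothetical.

```python
import math


def retention(elapsed, cooling_rate):
    """Newton cooling with ambient value 0: fraction of a weight left after `elapsed`."""
    return math.exp(-cooling_rate * elapsed)


def decay_lexicon(weights, last_seen, now, cooling_rate, min_weight):
    """Apply the forgetting coefficient to every word and drop forgotten entries."""
    kept = {}
    for word, weight in weights.items():
        decayed = weight * retention(now - last_seen[word], cooling_rate)
        if decayed >= min_weight:
            kept[word] = decayed
    return kept
```

Under this reading, a word seen recently keeps almost all of its weight, while a word that has not appeared for a long time fades out of the lexicon, which matches the stated aim of removing wrong and uncommon words without manual work.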
The deep learning method adopts an RNN method, specifically an LSTM model.
Chinese word segmentation divides natural-language text into a word sequence. Preferably it is treated as sequence labeling, in which each character in a sentence is labeled with one of four tags, BMES (B: beginning of a word, M: middle of a word, E: end of a word, S: single-character word).
For example, the sequence { 京东 搜索 与 大数据 平台 数据挖掘 算法部 } (JD.com search and big data platform, data mining algorithm department)
is labeled { BE BE S BME BE BMME BME }.
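The labeling can be reproduced with a few lines of Python. The helper name `to_bmes` is hypothetical, and the sample sentence is the example above written in Chinese characters, reconstructed from its gloss and from the tag lengths, so it should be read as an assumption about the original wording.

```python
def to_bmes(words):
    """Map each segmented word to its BMES tag string."""
    tags = []
    for word in words:
        if not word:
            continue                      # ignore empty strings
        if len(word) == 1:
            tags.append("S")              # single-character word
        else:
            tags.append("B" + "M" * (len(word) - 2) + "E")
    return tags


words = ["京东", "搜索", "与", "大数据", "平台", "数据挖掘", "算法部"]
print(" ".join(to_bmes(words)))  # BE BE S BME BE BMME BME
```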
The model is trained on the annotated corpus, using the original character sequences as input and the tag sequences as output, and finally generates the segmented sequence. The logic diagram of the LSTM model is shown in FIG. 2, in which X is the input sequence and H is the output sequence. The basic idea is to treat word segmentation as a sequence labeling problem in which each character in a sentence is labeled with one of the four BMES tags. The input of the whole model is a character sequence and the output is a tag sequence, so this is a standard sequence-to-sequence problem.
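Once a model has produced one tag per character, the tags have to be turned back into words. The sketch below shows one way to do that; the function name `from_bmes` is hypothetical, and the tolerant handling of ill-formed tag sequences (for example an M with no preceding B) is an assumption, since the text does not say how such output is treated.

```python
def from_bmes(chars, tags):
    """Rebuild the word list from a character sequence and its BMES tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts before the previous one was closed: flush it.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)             # sentence ended in the middle of a word
    return words


print(from_bmes("京东搜索与大数据", "BEBESBME"))
```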
The combined word segmentation method improves the word segmentation result by fusing the results of the two methods. The forgetting algorithm is taken as the primary method because:
(1) for example, artist names play an important role in recommendation;
(2) the forgetting algorithm is unsupervised, so the corpus is cheap to acquire;
(3) training corpora for the deep learning algorithm are scarce and training takes a long time.
Merging scheme:
(1) consecutive single characters in the forgetting algorithm result are merged into a word if they form a word in the corresponding deep learning result;
(2) a single character in the forgetting algorithm result is merged forward or backward into a word if it belongs to a word in the corresponding deep learning result;
(3) merging is performed with reference to part of speech.
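The first two merging rules can be sketched as follows. The text leaves several details open, so this is one possible reading rather than the patent's procedure: the forgetting-algorithm result is walked left to right, a run of single characters is merged when the deep learning result contains a word covering exactly those characters, and an isolated single character is merged with its neighbour when the deep learning result contains a word covering exactly the two together. The part-of-speech rule is not covered, and the names `spans` and `fuse` are hypothetical.

```python
def spans(words):
    """Return the (start, end) character offsets of each word."""
    out, pos = [], 0
    for word in words:
        out.append((pos, pos + len(word)))
        pos += len(word)
    return out


def fuse(forget_words, deep_words):
    """Merge single characters of the forgetting result using the deep learning result."""
    text = "".join(forget_words)
    if text != "".join(deep_words):
        raise ValueError("both segmentations must cover the same sentence")
    deep = set(spans(deep_words))
    src = spans(forget_words)
    out, i = [], 0
    while i < len(src):
        start, end = src[i]
        if end - start == 1:
            # Rule 1: longest run of consecutive single characters that is a deep learning word.
            j = i
            while j < len(src) and src[j][1] - src[j][0] == 1:
                j += 1
            k = next((k for k in range(j, i + 1, -1)
                      if (start, src[k - 1][1]) in deep), None)
            if k is not None:
                out.append((start, src[k - 1][1]))
                i = k
                continue
            # Rule 2, backward: attach to the previous word.
            if out and (out[-1][0], end) in deep:
                out[-1] = (out[-1][0], end)
                i += 1
                continue
            # Rule 2, forward: attach to the next word.
            if i + 1 < len(src) and (start, src[i + 1][1]) in deep:
                out.append((start, src[i + 1][1]))
                i += 2
                continue
        out.append((start, end))
        i += 1
    return [text[s:e] for s, e in out]


# Illustrative inputs, not taken from the patent's example.
print(fuse(["口袋", "妖", "怪", "下", "载"], ["口袋", "妖怪", "下载"]))
print(fuse(["网络", "版", "注册"], ["网络版", "注册"]))
```

Words that the forgetting algorithm already found are left untouched, which keeps it as the primary method; the deep learning result is consulted only to decide what to do with leftover single characters.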
Example 1:
Natural-language text is obtained by scanning sentences, segmented separately by the forgetting algorithm and by deep learning, and the two results are then fused.
The results of segmenting with each algorithm separately are as follows.
Word segmentation result of the improved forgetting algorithm:
< actual shooting > < man > < subway > < leowed > < woman > < passenger > < quilt > < heat > < passenger > < torsion > < acquisition >
< monster of pocket > < network > < version > < registration > < download > < teaching > < video >
< jol > < some > < snowy > < real > < loving > < romantic > < surface > < girl > < feeling > < crying > <161105> < very > < perfect >
< laugh very popular > < Zheng Mou > < Yang Mou > < kissing > < play > < game > < talk > < love >
< ginger something > < Tight > < mah-jong > < reaction > < olympic > < laugh > < can > < combination with > < gymnastics >
The sound of the ball is the sound of the ball, the sound of the horn is the ultra-large sound of the ball, the sound of the ball is the sound of the ball, and the sound of the ball is the sound of the ball.
Word segmentation result of deep learning algorithm:
< practice > < clap > < man > < subway > < leowed woman > < passenger > < quilt > < heat-center > < passenger > < twist >
< pocket > < monster > < web version > < registration > < download > < teaching > < video >
< something joss > < snow > < night foraging > < genuine love > < romantic > < appearance > < girl > < feeling pain > < cry > <161105> < very > < perfect >
< pico > < laugh > < very > < pouring > < Zheng Mou > < Yang Mou > < kiss > < play > < game > < love >
< ginger > < certain > < Tibet > < mah-jong > < reaction > < forward Olympic > < laugh > < can > < combination with > < gymnastics > < combination >
< week sometime > < stock > < station > < left > < outer > < field > < horn > < loud sound > < self-contained > < won > < fight > < enemy >;
the results after combining by the above scheme:
< actual shooting > < man > < subway > < lewy < passenger > < quilt > < heat center > < passenger > < twist >
< pocket monster > < web version > < registration > < download > < teaching > < video >
< something joker > < snowy > < foraging > < genuine > above > < romantic > < appearance > < girl > < feeling > < cry of pain > <161105> < very > < perfect >
< laugh very popular > < Zheng Mou > < Yang Mou > < kissing > < play > < game > < love >
< ginger something > < Tight > < mah-jong > < reaction > < olympic > < laugh > < can > < combination with > < gymnastics >
< week sometime > < stock > < station > < left > < outer > < field > < horn > < loud sound > < self-contained > < won > < protection > < enemy >.
Claims (3)
1. A Chinese word segmentation method based on deep learning and forgetting algorithm is characterized by comprising the following steps:
step one: a sentence is scanned character by character to obtain natural-language text, the text is divided into a word sequence by a deep learning word segmentation method, and the word sequence is collected into a first lexicon (word stock);
step two: the sentence is scanned character by character to obtain natural-language text, the text is divided into candidate words by a forgetting-algorithm word segmentation method, and the candidate words are collected into a second lexicon;
step three: the word sequence in the first lexicon is fused with the candidate words in the second lexicon to obtain the final word segmentation result, wherein the fusion method is as follows:
if an item is a word in both the first lexicon and the second lexicon, it is merged as a word; if the items are single characters in both the first lexicon and the second lexicon, they are merged into a word; consecutive single characters in the second lexicon are merged into a word if they form a word in the corresponding deep learning result; a single character in the second lexicon is merged forward or backward into a word if it belongs to a word in the corresponding deep learning result;
the forgetting-algorithm word segmentation method uses the following judgment formula:
$$P(W_n W_{n+1}) < P(W_n) \cdot P(W_{n+1})$$
where $W_n$ is the $n$-th character in the scanned sentence;
and the forgetting curve adopted by the forgetting algorithm in step two is a Newton cooling curve.
2. The Chinese word segmentation method based on deep learning and forgetting algorithm according to claim 1, wherein the deep learning word segmentation method in step one adopts an RNN method.
3. The Chinese word segmentation method based on deep learning and forgetting algorithm according to claim 1 or 2, wherein the deep learning word segmentation method in step one adopts an LSTM model within the RNN method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811258651.5A CN109388806B (en) | 2018-10-26 | 2018-10-26 | Chinese word segmentation method based on deep learning and forgetting algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109388806A (en) | 2019-02-26 |
CN109388806B (en) | 2023-06-27 |
Family
ID=65427965
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811258651.5A | Active | CN109388806B (en) | 2018-10-26 | 2018-10-26 | Chinese word segmentation method based on deep learning and forgetting algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109388806B (en) |
CN107153640A (en) * | 2017-05-08 | 2017-09-12 | 成都准星云学科技有限公司 | A kind of segmenting method towards elementary mathematics field |
CN107391486B (en) * | 2017-07-20 | 2020-10-27 | 南京云问网络技术有限公司 | Method for identifying new words in field based on statistical information and sequence labels |
CN107844475A (en) * | 2017-10-12 | 2018-03-27 | 北京知道未来信息技术有限公司 | A kind of segmenting method based on LSTM |
CN107992467A (en) * | 2017-10-12 | 2018-05-04 | 北京知道未来信息技术有限公司 | A kind of mixing language material segmenting method based on LSTM |
- 2018-10-26: CN application CN201811258651.5A, granted as CN109388806B (status: Active)
Patent Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104199972A (en) * | 2013-09-22 | 2014-12-10 | 中科嘉速(北京)并行软件有限公司 | Named entity relation extraction and construction method based on deep learning |
CN105740226A (en) * | 2016-01-15 | 2016-07-06 | 南京大学 | Method for implementing Chinese segmentation by using tree neural network and bilateral neural network |
CN106528738A (en) * | 2016-10-28 | 2017-03-22 | 华北理工大学 | Method and device for intelligent interaction based on natural language analysis |
CN106776562A (en) * | 2016-12-20 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | A kind of keyword extracting method and extraction system |
CN108304364A (en) * | 2017-02-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
CN108536667A (en) * | 2017-03-06 | 2018-09-14 | 中国移动通信集团广东有限公司 | Chinese text recognition methods and device |
CN107145483A (en) * | 2017-04-24 | 2017-09-08 | 北京邮电大学 | A kind of adaptive Chinese word cutting method based on embedded expression |
CN107145484A (en) * | 2017-04-24 | 2017-09-08 | 北京邮电大学 | A kind of Chinese word cutting method based on hidden many granularity local features |
CN107122479A (en) * | 2017-05-03 | 2017-09-01 | 西安交通大学 | A kind of user cipher conjecture system based on deep learning |
CN107423288A (en) * | 2017-07-05 | 2017-12-01 | 达而观信息科技(上海)有限公司 | A kind of Chinese automatic word-cut and method based on unsupervised learning |
CN107590196A (en) * | 2017-08-15 | 2018-01-16 | 中国农业大学 | Earthquake emergency information screening and evaluating system and system in a kind of social networks |
CN107622049A (en) * | 2017-09-06 | 2018-01-23 | 国家电网公司 | A kind of special word stock generating method of electric service |
CN107622050A (en) * | 2017-09-14 | 2018-01-23 | 武汉烽火普天信息技术有限公司 | Text sequence labeling system and method based on Bi LSTM and CRF |
CN107818130A (en) * | 2017-09-15 | 2018-03-20 | 深圳市电陶思创科技有限公司 | The method for building up and system of a kind of search engine |
CN107665254A (en) * | 2017-09-30 | 2018-02-06 | 济南浪潮高新科技投资发展有限公司 | A kind of menu based on deep learning recommends method |
CN107807964A (en) * | 2017-10-11 | 2018-03-16 | 咪咕互动娱乐有限公司 | Digital content sort method, device and computer-readable recording medium |
CN107894976A (en) * | 2017-10-12 | 2018-04-10 | 北京知道未来信息技术有限公司 | A kind of mixing language material segmenting method based on Bi LSTM |
CN107943783A (en) * | 2017-10-12 | 2018-04-20 | 北京知道未来信息技术有限公司 | A kind of segmenting method based on LSTM CNN |
CN107885853A (en) * | 2017-11-14 | 2018-04-06 | 同济大学 | A kind of combined type file classification method based on deep learning |
CN107943937A (en) * | 2017-11-23 | 2018-04-20 | 杭州源诚科技有限公司 | A kind of debtors assets monitoring method and system based on trial open information analysis |
CN107967318A (en) * | 2017-11-23 | 2018-04-27 | 北京师范大学 | A kind of Chinese short text subjective item automatic scoring method and system using LSTM neural networks |
CN107798140A (en) * | 2017-11-23 | 2018-03-13 | 北京神州泰岳软件股份有限公司 | A kind of conversational system construction method, semantic controlled answer method and device |
CN107944014A (en) * | 2017-12-11 | 2018-04-20 | 河海大学 | A kind of Chinese text sentiment analysis method based on deep learning |
CN108038103A (en) * | 2017-12-18 | 2018-05-15 | 北京百分点信息科技有限公司 | A kind of method, apparatus segmented to text sequence and electronic equipment |
CN108320740A (en) * | 2017-12-29 | 2018-07-24 | 深圳和而泰数据资源与云技术有限公司 | A kind of audio recognition method, device, electronic equipment and storage medium |
CN108268444A (en) * | 2018-01-10 | 2018-07-10 | 南京邮电大学 | A kind of Chinese word cutting method based on two-way LSTM, CNN and CRF |
CN108304506A (en) * | 2018-01-18 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Search method, device and equipment |
CN108415953A (en) * | 2018-02-05 | 2018-08-17 | 华融融通(北京)科技有限公司 | A kind of non-performing asset based on natural language processing technique manages knowledge management method |
CN108536756A (en) * | 2018-03-16 | 2018-09-14 | 苏州大学 | Mood sorting technique and system based on bilingual information |
CN108563725A (en) * | 2018-04-04 | 2018-09-21 | 华东理工大学 | A kind of Chinese symptom and sign composition recognition methods |
Also Published As
Publication number | Publication date |
---|---|
CN109388806A (en) | 2019-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Alzantot et al. | | Generating natural language adversarial examples |
Kiros et al. | | Skip-thought vectors |
CN105957518B (en) | | A kind of method of Mongol large vocabulary continuous speech recognition |
US20190129947A1 (en) | | Neural machine translation method and apparatus |
CN110175246B (en) | | Method for extracting concept words from video subtitles |
CN100536532C (en) | | Method and system for automatic subtitling |
CN107608960B (en) | | Method and device for linking named entities |
WO2018201600A1 (en) | | Information mining method and system, electronic device and readable storage medium |
CN112818694A (en) | | Named entity recognition method based on rules and improved pre-training model |
CN108052499A (en) | | Text error correction method, device and computer-readable medium based on artificial intelligence |
CN113553429B (en) | | Normalized label system construction and text automatic labeling method |
CN112966525B (en) | | Law field event extraction method based on pre-training model and convolutional neural network algorithm |
CN108038099B (en) | | Low-frequency keyword identification method based on word clustering |
KR102010343B1 (en) | | Method and apparatus for providing segmented internet based lecture contents |
CN109388806B (en) | | Chinese word segmentation method based on deep learning and forgetting algorithm |
CN114756681B (en) | | Evaluation and education text fine granularity suggestion mining method based on multi-attention fusion |
JP2018033048A (en) | | Metadata generation system |
CN109684928A (en) | | Chinese document recognition methods based on Internal retrieval |
Song et al. | | LSTM-in-LSTM for generating long descriptions of images |
CN115310448A (en) | | Chinese named entity recognition method based on combining bert and word vector |
Wang et al. | | Combining self-training and self-supervised learning for unsupervised disfluency detection |
CN111552801A (en) | | Neural network automatic abstract model based on semantic alignment |
CN107590121B (en) | | Text normalization method and system |
Zou et al. | | To be an artist: automatic generation on food image aesthetic captioning |
Andra et al. | | Automatic lecture video content summarization with attention-based recurrent neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||