WO2008077148A1 - Generating chinese language banners - Google Patents
- Publication number
- WO2008077148A1 (PCT/US2007/088466)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- banner
- banners
- computer
- candidates
- output
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
Definitions
- Artificial intelligence is the science and engineering of making intelligent machines, especially computer programs.
- Applications of artificial intelligence include game playing and speech recognition.
- An antithetical couplet includes two phrases or sentences written as calligraphy on vertical red banners, typically placed on either side of a door or in a large hall. Such couplets are commonly displayed during special occasions such as weddings or during the Spring Festival (i.e., Chinese New Year).
- Other types of couplets include birthday couplets, elegiac couplets, decoration couplets, professional or other human association couplets, and the like.
- Antithetical couplets can be of different length.
- A short couplet can include one or two characters, while a longer couplet can reach several hundred characters.
- Antithetical couplets can also have diverse forms or relative meanings. For instance, one form can include first and second scroll sentences having the same meaning. Another form can include scroll sentences having opposite meanings.
- Chinese couplets generally conform to rules or principles such as the following:
- Principle 1: The two sentences of the couplet generally have the same number of words and the same total number of Chinese characters. Each Chinese character has one syllable when spoken. A Chinese word can have one, two or more characters and, consequently, be pronounced with one, two or more syllables. Each word of the first scroll sentence should have the same number of Chinese characters as the corresponding word in the second scroll sentence.
- Principle 2: Tones (e.g., "Ping" (平) and "Ze" (仄) in Chinese) are generally coinciding and harmonious. The traditional custom is that the character at the end of the first scroll sentence should be 仄 (called tone "Ze" in Chinese), which is pronounced with a sharp downward tone. The character at the end of the second scroll sentence should be 平 (called tone "Ping" in Chinese), which is pronounced with a level tone.
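Once both scroll sentences are segmented into words, Principles 1 and 2 can be checked mechanically. The following is a minimal Python sketch and is not part of the described system. It assumes pre-segmented, non-empty input and uses a tiny hypothetical tone table (`TONE`) in place of a real pronunciation dictionary. The function names are illustrative only.

```python
from typing import Dict, List

# Hypothetical toy tone table: character -> "ping" (level) or "ze" (oblique).
# A real checker would need a full pronunciation dictionary.
TONE: Dict[str, str] = {"春": "ping", "年": "ping", "岁": "ze", "福": "ze"}


def satisfies_principle_1(first: List[str], second: List[str]) -> bool:
    """Same number of words, and each pair of corresponding words has the
    same number of characters (which also equalizes the total length)."""
    if len(first) != len(second):
        return False
    return all(len(a) == len(b) for a, b in zip(first, second))


def satisfies_principle_2(first: List[str], second: List[str]) -> bool:
    """First scroll sentence ends on a Ze character; second ends on a Ping one."""
    last_first = first[-1][-1]    # final character of the first sentence
    last_second = second[-1][-1]  # final character of the second sentence
    return TONE.get(last_first) == "ze" and TONE.get(last_second) == "ping"


if __name__ == "__main__":
    upper, lower = ["辞旧", "岁"], ["迎新", "年"]
    print(satisfies_principle_1(upper, lower), satisfies_principle_2(upper, lower))
```

Characters missing from the toy table simply fail the tone check, so the sketch reports "not satisfied" rather than guessing a tone.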
- Couplets are accompanied by a banner (a.k.a. a streamer), typically placed horizontally above a door between the vertical couplet banners.
- A banner, most commonly a phrase composed of 4 Chinese characters, is attached to a couplet to summarize, emphasize and complement the meaning of the couplet.
- Although the length of a banner can vary from 2 characters to 5 or 6 characters, a banner most typically has 4 characters.
- A basic requirement for a banner is that its meaning should fit the meaning of the first and second scroll sentences. For example, the banner for the couplet "…" is …
- The banner generally consists of 4 …
- Embodiments disclosed herein pertain to methods for automatically generating a banner given a first scroll sentence and a second scroll sentence of a Chinese couplet.
- The first and/or second scroll sentence can be generated by an automatic computer system, generated by a human (e.g., manually generated and then provided as input to an automated banner generation system), or obtained from any source (e.g., a book) and provided as input.
- An information retrieval process is utilized to identify banner candidates that best match the first and second scroll sentences.
- Candidate banners are also automatically generated.
- A ranking model is applied in order to rank banner candidates derived from the banner search and generation processes. One or more banners are then selected from the ranked banner candidates.
- FIG. 1 is a block diagram of a computing environment.
- FIG. 2 is a broad overview of a process for generating banners.
- FIG. 3 is a flow chart diagram demonstrating steps associated with constructing a banner taxonomy.
- FIG. 4 is a flow chart diagram demonstrating steps associated with finding, for a given couplet, its best-matched candidate banners.
- FIG. 5 is a flow chart diagram demonstrating steps associated with banner generation.
- FIG. 1 illustrates an example of a suitable computing system environment 100 in which the embodiments may be implemented.
- The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to scope of use or functionality. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.
- The embodiments are operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well known computing systems, environments, and/or configurations that may be suitable for use with embodiments disclosed herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, telephone systems, distributed computing environments that include any of the above systems or devices, and the like.
- The embodiments may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
- Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- Processor-executable instructions can be written on any form of computer readable medium.
- The embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
- An exemplary system for implementing the embodiments includes a general purpose computing device in the form of a computer 110.
- Components of computer 110 may include, but are not limited to, a processing unit 120, a system memory 130, and a system bus 121 that couples various system components including the system memory to the processing unit 120.
- The system bus 121 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
- Such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.
- Computer 110 typically includes a variety of computer readable media.
- Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media.
- Computer readable media may comprise computer storage media and communication media.
- Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110.
- Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
- The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- Communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
- The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132.
- A basic input/output system (BIOS) 133 is typically stored in ROM 131.
- RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120.
- FIG. 1 illustrates operating system 134, application programs 135, other program modules 136, and program data 137.
- Programs 135 are shown as possibly including a banner generation system, embodiments of which will be described herein in detail. This is but one example of where in environment 100 such systems might be implemented. Other implementations (e.g., as part of programs 145 or 185) should also be considered within the scope of the present invention.
- The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media.
- FIG. 1 illustrates a hard disk drive 141 that reads from or writes to nonremovable, nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that reads from or writes to a removable, nonvolatile optical disk 156 such as a CD ROM or other optical media.
- Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like.
- The hard disk drive 141 is typically connected to the system bus 121 through a non-removable memory interface such as interface 140, and magnetic disk drive 151 and optical disk drive 155 are typically connected to the system bus 121 by a removable memory interface, such as interface 150.
- The drives and their associated computer storage media discussed above and illustrated in FIG. 1 provide storage of computer readable instructions, data structures, program modules and other data for the computer 110.
- For example, hard disk drive 141 is illustrated as storing operating system 144, application programs 145, other program modules 146, and program data 147. Note that these components can either be the same as or different from operating system 134, application programs 135, other program modules 136, and program data 137. Operating system 144, application programs 145, other program modules 146, and program data 147 are given different numbers here to illustrate that, at a minimum, they are different copies.
- A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad.
- Other input devices may include a joystick, game pad, satellite dish, scanner, or the like.
- These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
- A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190.
- In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
- The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180.
- The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110.
- The logical connections depicted in FIG. 1 include a local area network (LAN) 171 and a wide area network (WAN) 173, but may also include other networks.
- Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
- When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170.
- When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet.
- The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or another appropriate mechanism.
- Program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device.
- FIG. 1 illustrates remote application programs 185 as residing on remote computer 180. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
- FIG. 2 is a schematic overview of a process for generating banners.
- Banners with four characters are generally the most common. Thus, in the present description, embodiments will be described in the context of four-character banners. However, the scope of the present invention is not so limited. The same or similar concepts can just as easily be applied to banners with other than four characters.
- First and second scroll sentences are provided as input.
- Two different methods are employed to produce banner candidates.
- A ranking model is applied, for example, to support selection of the N-best banners from the generated candidates.
- Output banner or banners 210 are selected from the N-best banners.
- Some banner candidates are produced using an information retrieval based approach. At least because common banners recur relatively often, it can be worthwhile to produce banner candidates by searching a database of existing banners.
- A taxonomy of banners is built and filled with existing banners collected from external sources (e.g., books, the Internet, etc.). Then, for a given couplet, the taxonomy is searched with the couplet sentences (first and second sentences) in order to produce a set of best-matched candidate banners.
- FIG. 3 is a flow chart diagram demonstrating steps associated with constructing a banner taxonomy.
- Four-character banners are collected.
- Each banner illustratively uses a phrase that, for example, has been used as a banner before, is an idiom, and/or happens to be a high-frequency 4-character phrase.
- Those skilled in the art will appreciate that these types of phrases can be obtained from a wide variety of different sources. The scope of the present invention is not limited to one particular source or combination of sources.
- A feature vector is created for each collected banner.
- The feature vector illustratively serves to identify an associated meaning.
- Illustratively, the creation of feature vectors involves first searching the collected banners with a web search engine and collecting the top N returned snippets. These snippets can be further combined with information retrieved from one or more additional sources (e.g., a news corpus) to enhance coverage and thereby form a new, larger corpus.
- The weight of each feature word is decided by the mutual information (see the following equation) between the feature word and the candidate banner in the new corpus.
- $MI(w, c_i) = p(w, c_i)\,\log\dfrac{p(w, c_i)}{p(w)\,p(c_i)}$ (Eq. 1), where $w$ is a feature word and $c_i$ is a candidate banner.
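Eq. 1 can be computed directly from co-occurrence statistics in the snippet corpus. The text does not specify the following details, so this sketch assumes them: each snippet is treated as a set of tokens, the banner itself counts as a token when it occurs in a snippet, and all probabilities are document frequencies. `mi_weights` is a hypothetical name.

```python
import math
from typing import Dict, List, Set


def mi_weights(docs: List[Set[str]], banner: str) -> Dict[str, float]:
    """Weight each feature word w by Eq. 1:
    MI(w, c) = p(w, c) * log(p(w, c) / (p(w) * p(c))).

    Assumption: each element of `docs` is the token set of one snippet, the
    banner string is itself a token when it occurs, and probabilities are
    document frequencies over the snippets.
    """
    n = len(docs)
    n_c = sum(1 for d in docs if banner in d)
    if n == 0 or n_c == 0:
        return {}
    p_c = n_c / n
    vocab = set().union(*docs) - {banner}
    weights: Dict[str, float] = {}
    for w in vocab:
        n_w = sum(1 for d in docs if w in d)
        n_wc = sum(1 for d in docs if w in d and banner in d)
        if n_wc == 0:
            continue  # p(w, c) = 0: word is left out (0 * log 0 taken as 0)
        p_w, p_wc = n_w / n, n_wc / n
        weights[w] = p_wc * math.log(p_wc / (p_w * p_c))
    return weights
```

Words that never co-occur with the banner get no entry. Positive weights mark words that appear with the banner more often than chance would predict.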
- The collected banners are divided into semantic categories (e.g., 14 categories). In one embodiment, this sorting is done through human intervention (e.g., categorizing is done by human experts). In one embodiment, the categories are defined by human experts, such as banners for Spring Festival, banners for birthdays, banners for wedding ceremonies, banners to celebrate success, etc. In one embodiment, a single banner is allowed to belong to multiple categories.
- The collected banners are automatically clustered within the categories into subcategories.
- In one embodiment, this is done using K-Means clustering.
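- The distance measure between two candidate banners used in clustering is illustratively defined as the cosine value of their feature vectors, i.e., $\cos(V_1, V_2) = \dfrac{V_1 \cdot V_2}{\lVert V_1 \rVert\,\lVert V_2 \rVert}$,
- where $V_1$ and $V_2$ represent the feature word vectors of the two banner candidates, respectively.
- A centroid feature vector is created for each subcategory. In one embodiment, this is done by averaging the member vectors of the subcategory, i.e., $\bar{V}_S = \dfrac{1}{|S|}\sum_{V \in S} V$, where $S$ is the set of feature vectors of the banners in the subcategory.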
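The clustering and centroid steps can be sketched as a K-Means loop. Each banner is assigned to the centroid with the largest cosine value, and each centroid is then recomputed as the average of its members. In this sketch, dense lists stand in for the feature vectors. The random initialization, the iteration count and the handling of empty clusters are illustrative assumptions, not details from the text.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(v1: Sequence[float], v2: Sequence[float]) -> float:
    """cos(V1, V2) = V1 . V2 / (|V1| |V2|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def centroid(members: List[Vector]) -> Vector:
    """Component-wise average of the member vectors of one subcategory."""
    dim = len(members[0])
    return [sum(v[i] for v in members) / len(members) for i in range(dim)]


def kmeans_cosine(vectors: List[Vector], k: int, iters: int = 20,
                  seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """K-Means where each vector joins the centroid with the largest cosine."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial seeds
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels, centroids
```

The centroids returned here play the role of the subcategory centroid vectors that the taxonomy search compares against.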
- FIG. 4 is a flow chart diagram demonstrating steps associated with using the constructed taxonomy to, for a given couplet, find its best matched candidate banners.
- The feature vector of the input couplet is created using the words included in the couplet.
- The weight of each word in the feature vector is the frequency of the word appearing in the couplet.
- A distance is calculated between the couplet feature vector and the centroid feature vector of each subcategory in the banner taxonomy, using the same cosine measure as in clustering,
- where $V_1$ and $V_2$ represent the couplet feature vector and the centroid feature vector of each subcategory in the banner taxonomy, respectively.
- A number n of subcategories with the minimum distance are illustratively selected.
- In step 406, the distance is calculated between the couplet feature vector and the feature vector of each banner in the selected n subcategories,
- where $V_1$ and $V_2$ represent the couplet feature vector and the feature vector of each candidate banner in the selected n subcategories, respectively.
- The n candidate banners with the minimum distance are selected.
- The n selected candidate banners are used for the ranking model, which is described in greater detail below.
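The two-stage taxonomy search can be sketched as follows. The text calls the cosine value a distance, yet it selects the minimum distance. Because a larger cosine means a closer match, this sketch assumes the intended distance is 1 - cos and therefore keeps the items with the highest cosine. The `subcategories` layout (a centroid plus per-banner vectors) is a hypothetical data structure.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]


def cosine(v1: SparseVec, v2: SparseVec) -> float:
    """Cosine of two sparse word-weight vectors; 0.0 if either is empty."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def search_taxonomy(couplet_words: List[str],
                    subcategories: Dict[str, Tuple[SparseVec, Dict[str, SparseVec]]],
                    n: int) -> List[Tuple[str, float]]:
    """`subcategories` maps an id to (centroid vector, {banner: feature vector})."""
    # Couplet vector: each word weighted by its frequency in the couplet.
    q = dict(Counter(couplet_words))
    # Keep the n subcategories whose centroids are closest (highest cosine).
    best_subs = sorted(subcategories,
                       key=lambda s: cosine(q, subcategories[s][0]),
                       reverse=True)[:n]
    # Step 406: score each banner in those subcategories, then keep the n best.
    scored: Dict[str, float] = {}
    for s in best_subs:
        for banner, vec in subcategories[s][1].items():
            scored[banner] = cosine(q, vec)  # a banner may sit in several subcategories
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Restricting the second pass to the n nearest subcategories means only a small part of the taxonomy is compared banner by banner.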
- FIG. 5 is a flow chart diagram demonstrating, on a high level, steps associated with this second approach to banner generation.
- In step 502, related words are obtained using a translation model, in which the translation probability is illustratively estimated as $p(a \mid b) = \dfrac{\mathrm{count}(a, b)}{\mathrm{count}(b)}$,
- where count(a, b) represents the number of times that a and b appear in the same position of a couplet,
- and count(b) represents the frequency of b appearing in the training data.
- A word wj is selected into the related words list if p(wj …
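The translation-model step can be sketched by counting position-aligned word pairs in training couplets. The estimate p(a | b) = count(a, b) / count(b) follows the definitions above. The selection criterion for related words is truncated in the text, so the threshold rule in `related_words` is a hypothetical stand-in. Counting each aligned pair in both directions is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

Couplet = Tuple[List[str], List[str]]  # (first scroll words, second scroll words)


def train_translation_model(couplets: List[Couplet]) -> Dict[Tuple[str, str], float]:
    """Return {(a, b): p(a | b)} with p(a | b) = count(a, b) / count(b), where
    count(a, b) counts a and b occupying the same position in a couplet."""
    pair_count: Counter = Counter()
    word_count: Counter = Counter()
    for first, second in couplets:
        word_count.update(first)
        word_count.update(second)
        for x, y in zip(first, second):
            # Assumption: count both directions so either side can be the input.
            pair_count[(x, y)] += 1
            pair_count[(y, x)] += 1
    return {(a, b): c / word_count[b] for (a, b), c in pair_count.items()}


def related_words(word: str, model: Dict[Tuple[str, str], float],
                  threshold: float) -> List[str]:
    """Hypothetical rule: keep every w_j with p(w_j | word) > threshold."""
    return sorted(a for (a, b), p in model.items() if b == word and p > threshold)
```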
- An association strength (AS) model is used to enhance the related words list. Given a couplet $C = (c_1, \ldots, c_n)$, the association strength between a word w and the couplet C is illustratively approximated using the following formula:
- MI(w, c_i) is illustratively estimated approximately with couplet training data, i.e.,
- $p(w) = \mathrm{count}(w)/N$
- $p(w, c_i) = \mathrm{count}(w, c_i)/N$
- where N is the size of the couplet training data,
- count(w) is the number of couplets containing word w, and
- count(w, c_i) is the number of couplets containing both w and c_i.
- Word w is added to the related words list if AS(w, C) is over a threshold.
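The formula for AS(w, C) has not survived in this text. The sketch below therefore assumes a form for it. It takes AS(w, C) to be the sum of pointwise mutual information, MI(w, c_i) = log(p(w, c_i) / (p(w) p(c_i))), over the couplet words. By analogy with the estimates above, it also assumes p(c_i) = count(c_i)/N. Pairs that never co-occur are skipped. Both function names are hypothetical.

```python
import math
from typing import List, Set


def association_strength(w: str, couplet: List[str], corpus: List[Set[str]]) -> float:
    """Assumed AS(w, C) = sum_i log(p(w, c_i) / (p(w) p(c_i))), with every
    probability estimated as (number of couplets containing the words) / N."""
    n = len(corpus)

    def count(*words: str) -> int:
        # Number of training couplets that contain all of the given words.
        return sum(1 for c in corpus if all(x in c for x in words))

    total = 0.0
    for ci in couplet:
        n_wc = count(w, ci)
        if n_wc == 0:
            continue  # no co-occurrence: log would be undefined, so skip
        total += math.log((n_wc / n) / ((count(w) / n) * (count(ci) / n)))
    return total


def expand_related_words(candidates: List[str], couplet: List[str],
                         corpus: List[Set[str]], threshold: float) -> List[str]:
    """Keep candidate words whose AS with the couplet exceeds the threshold."""
    return sorted(w for w in candidates
                  if association_strength(w, couplet, corpus) > threshold)
```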
- In step 506, arbitrary numbers of words in the list are combined to form 4-character banner candidates. Some or all of these candidate banners are used for the ranking model, which will now be described in greater detail.
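Step 506 can be sketched as enumerating ordered selections of distinct related words whose characters total exactly four. This is one possible reading of combining "arbitrary numbers of words"; `four_char_candidates` is a hypothetical name. The enumeration grows factorially with the length of the list, so it suits only short lists.

```python
from itertools import permutations
from typing import List


def four_char_candidates(words: List[str], length: int = 4) -> List[str]:
    """Concatenate every ordered selection of distinct list words whose total
    length is exactly `length` characters; return the unique strings sorted."""
    out = set()
    for r in range(1, length + 1):  # each word has at least 1 character
        for combo in permutations(words, r):
            if sum(len(w) for w in combo) == length:
                out.add("".join(combo))
    return sorted(out)
```

For example, `four_char_candidates(["春回", "大地", "福"])` yields both word orders of 春回 and 大地. Any selection that includes 福 is discarded because its length is odd.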
- Ranking is performed with a ranking SVM model: $f(x) = w \cdot x$,
- where x indicates the feature vector of a banner candidate and w is the weight vector of the SVM model.
- Features used in x can illustratively include, but are not limited to, the following (assuming B is a banner candidate and C is the couplet):
- Association score between the banner candidate B and the couplet C: the banner candidate is first segmented into words. Assuming the candidate banner is segmented into {w1, w2, …, wn}, its association is illustratively computed using the following formula:
- Context similarity between the banner candidate B and the couplet C: for candidate banners obtained using the information retrieval based method, the context similarity has already been obtained when searching the taxonomy. For candidates produced by the generation method from the input couplet, the context similarity is illustratively computed with the input couplet (e.g., using the context similarity equation described above). Their feature vectors can be obtained by summing the feature vectors of their component words. To obtain the feature vector of each word in the vocabulary beforehand, a method similar to the creation of feature vectors for candidate banners in the taxonomy is illustratively applied.
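The ranking step can be sketched with the linear scoring function f(x) = w · x. Several details are assumptions here because the text does not say how the SVM is trained. The feature layout uses two features, for example association score and context similarity. Training uses plain subgradient descent on the primal pairwise-hinge Ranking SVM objective. `train_ranking_svm` and `n_best` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear ranking function f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranking_svm(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                      dim: int, c: float = 1.0, lr: float = 0.1,
                      epochs: int = 200) -> List[float]:
    """Each pair is (better, worse) and should satisfy w . (better - worse) >= 1.
    Minimizes 0.5*|w|^2 + c * sum(hinge losses) by full-batch subgradient
    descent; an illustrative solver, not a tuned one."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*|w|^2 regularizer
        for better, worse in pairs:
            diff = [b - v for b, v in zip(better, worse)]
            if score(w, diff) < 1.0:  # margin violated: hinge term is active
                grad = [g - c * d for g, d in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def n_best(candidates: List[Tuple[str, Sequence[float]]],
           w: Sequence[float], n: int) -> List[str]:
    """Sort (banner, feature vector) pairs by f(x) and return the top n banners."""
    ranked = sorted(candidates, key=lambda bc: score(w, bc[1]), reverse=True)
    return [banner for banner, _ in ranked[:n]]
```

Candidates from both the taxonomy search and the generation step can be pooled into one `candidates` list before calling `n_best`, because the scorer only sees their feature vectors.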
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07865944.8A EP2122491A4 (en) | 2006-12-20 | 2007-12-20 | Generating chinese language banners |
CA002669218A CA2669218A1 (en) | 2006-12-20 | 2007-12-20 | Generating chinese language banners |
JP2009543241A JP5337705B2 (en) | 2006-12-20 | 2007-12-20 | Chinese banner generation |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US87608506P | 2006-12-20 | 2006-12-20 | |
US60/876,085 | 2006-12-20 | ||
US11/788,448 US8000955B2 (en) | 2006-12-20 | 2007-04-20 | Generating Chinese language banners |
US11/788,448 | 2007-04-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008077148A1 true WO2008077148A1 (en) | 2008-06-26 |
Family
ID=39536757
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/088466 WO2008077148A1 (en) | 2006-12-20 | 2007-12-20 | Generating chinese language banners |
Country Status (5)
Country | Link |
---|---|
US (2) | US8000955B2 (en) |
EP (1) | EP2122491A4 (en) |
JP (1) | JP5337705B2 (en) |
CA (1) | CA2669218A1 (en) |
WO (1) | WO2008077148A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7962507B2 (en) | 2007-11-19 | 2011-06-14 | Microsoft Corporation | Web content mining of pair-based data |
TW200933391A (en) * | 2008-01-24 | 2009-08-01 | Delta Electronics Inc | Network information search method applying speech recognition and sysrem thereof |
CN111984783B (en) * | 2020-08-28 | 2024-04-02 | 达闼机器人股份有限公司 | Training method of text generation model, text generation method and related equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030083861A1 (en) * | 2001-07-11 | 2003-05-01 | Weise David N. | Method and apparatus for parsing text using mutual information |
US20050071148A1 (en) * | 2003-09-15 | 2005-03-31 | Microsoft Corporation | Chinese word segmentation |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4712174A (en) * | 1984-04-24 | 1987-12-08 | Computer Poet Corporation | Method and apparatus for generating text |
JPH083815B2 (en) * | 1985-10-25 | 1996-01-17 | 株式会社日立製作所 | Natural language co-occurrence relation dictionary maintenance method |
SG49804A1 (en) | 1996-03-20 | 1998-06-15 | Government Of Singapore Repres | Parsing and translating natural language sentences automatically |
JPH10312382A (en) * | 1997-05-13 | 1998-11-24 | Keiichi Shinoda | Similar example translation system |
US6299452B1 (en) * | 1999-07-09 | 2001-10-09 | Cognitive Concepts, Inc. | Diagnostic system and method for phonological awareness, phonological processing, and reading skill testing |
WO2001033409A2 (en) * | 1999-11-01 | 2001-05-10 | Kurzweil Cyberart Technologies, Inc. | Computer generated poetry system |
US7269802B1 (en) * | 1999-11-01 | 2007-09-11 | Kurzweil Cyberart Technologies, Inc. | Poetry screen saver |
US6941262B1 (en) * | 1999-11-01 | 2005-09-06 | Kurzweil Cyberart Technologies, Inc. | Poet assistant's graphical user interface (GUI) |
JP2003178057A (en) * | 2001-12-13 | 2003-06-27 | Ntt Data Corp | Phrase producing device, phrase producing method, and program |
AUPR958901A0 (en) * | 2001-12-18 | 2002-01-24 | Telstra New Wave Pty Ltd | Information resource taxonomy |
US20040122660A1 (en) * | 2002-12-20 | 2004-06-24 | International Business Machines Corporation | Creating taxonomies and training data in multiple languages |
US20040133558A1 (en) * | 2003-01-06 | 2004-07-08 | Masterwriter, Inc. | Information management system plus |
JP2005100335A (en) * | 2003-09-01 | 2005-04-14 | Advanced Telecommunication Research Institute International | Machine translation apparatus, machine translation computer program, and computer |
JP2005228016A (en) * | 2004-02-13 | 2005-08-25 | Hitachi Ltd | Character display method |
US7810021B2 (en) * | 2006-02-24 | 2010-10-05 | Paxson Dana W | Apparatus and method for creating literary macramés |
US20070294223A1 (en) * | 2006-06-16 | 2007-12-20 | Technion Research And Development Foundation Ltd. | Text Categorization Using External Knowledge |
-
2007
- 2007-04-20 US US11/788,448 patent/US8000955B2/en active Active
- 2007-12-20 CA CA002669218A patent/CA2669218A1/en not_active Abandoned
- 2007-12-20 JP JP2009543241A patent/JP5337705B2/en not_active Expired - Fee Related
- 2007-12-20 WO PCT/US2007/088466 patent/WO2008077148A1/en active Application Filing
- 2007-12-20 EP EP07865944.8A patent/EP2122491A4/en not_active Ceased
-
2011
- 2011-04-15 US US13/087,407 patent/US8862459B2/en not_active Expired - Fee Related
Non-Patent Citations (3)
Title |
---|
PABLO GERVAS: "An expert system for the composition of formal Spanish poetry", KNOWLEDGE-BASED SYSTEMS, vol. 14, no. 3-4, 1 June 2001 (2001-06-01) |
See also references of EP2122491A4 * |
YI YONG; HE ZHONG-SHI; LI LIANG-YAN; ZHOU JIAN-YONG; QU YI-BO ZHANG HONG-BIN: "On Computation Models of Chinese Couplet Responses", COMPUTER SCIENCE, vol. 33, no. 4, 1 April 2006 (2006-04-01), XP009500561 |
Also Published As
Publication number | Publication date |
---|---|
CA2669218A1 (en) | 2008-06-26 |
EP2122491A1 (en) | 2009-11-25 |
US20080154580A1 (en) | 2008-06-26 |
JP5337705B2 (en) | 2013-11-06 |
US20110257959A1 (en) | 2011-10-20 |
US8000955B2 (en) | 2011-08-16 |
JP2010515123A (en) | 2010-05-06 |
US8862459B2 (en) | 2014-10-14 |
EP2122491A4 (en) | 2017-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106997382B (en) | Innovative creative tag automatic labeling method and system based on big data | |
US8042053B2 (en) | Method for making digital documents browseable | |
CN111177365B (en) | Unsupervised automatic abstract extraction method based on graph model | |
JP3272288B2 (en) | Machine translation device and machine translation method | |
US8694303B2 (en) | Systems and methods for tuning parameters in statistical machine translation | |
US9483460B2 (en) | Automated formation of specialized dictionaries | |
JP2005526317A (en) | Method and system for automatically searching a concept hierarchy from a document corpus | |
CN108073565A (en) | The method and apparatus and machine translation method and equipment of words criterion | |
WO2007005884A2 (en) | Generating chinese language couplets | |
JP3831357B2 (en) | Parallel translation information creation device and parallel translation information search device | |
Kessler et al. | Extraction of terminology in the field of construction | |
US8862459B2 (en) | Generating Chinese language banners | |
CN113486155B (en) | Chinese naming method fusing fixed phrase information | |
Poornima et al. | Text preprocessing on extracted text from audio/video using R | |
CN107818078B (en) | Semantic association and matching method for Chinese natural language dialogue | |
Malandrakis et al. | Affective language model adaptation via corpus selection | |
KR20000036487A (en) | A Database System for Korean-English Translation Using Information Retrieval Techniques | |
Raghallaigh et al. | Improving full-text search results on dúchas. ie using language technology | |
Rao et al. | A statistical model for gist generation: A case study on hindi news article | |
Vlasenko | Research of Summarization Methods for Presentation Slides Generation | |
CN118036725A (en) | Atlas generation method and system based on text big data | |
JP2000322449A (en) | Natural language sentence relation deciding device, natural language sentence retrieving device, natural language sentence generating device, framed expression output device to be used for the same, method therefor and recording medium | |
CN118747293A (en) | Intelligent recall method and device for document writing and document generation method and device | |
CN115017300A (en) | Automatic summarization method and system based on TextRank and multi-dimensional semantic feature fusion | |
Zhu | Spotting keywords and sensing topic changes in speech |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 200780047860.9; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07865944; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 2669218; Country of ref document: CA |
ENP | Entry into the national phase | Ref document number: 2009543241; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2007865944; Country of ref document: EP |