CN108664477B - Translation method of transaction information multi-language machine translation subsystem - Google Patents

Info

Publication number
CN108664477B
Authority
CN
China
Prior art keywords
commodity
information
comment
attribute
multilingual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810481052.3A
Other languages
Chinese (zh)
Other versions
CN108664477A (en)
Inventor
张俊星
贺建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Minzu University
Priority to CN201810481052.3A
Publication of CN108664477A
Application granted
Publication of CN108664477B
Current legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The divisional application discloses a translation method of a transaction information multilingual machine translation subsystem, which belongs to the field of e-commerce information translation and is used for solving the problem of transaction information translation.

Description

Translation method of transaction information multi-language machine translation subsystem
This application is a divisional of application No. 201610489374.3, filed on 2016-06-28 and entitled "Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system and method for cross-border electronic commerce platforms".
Technical Field
The invention belongs to the field of e-commerce information translation and relates to a Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system and method for cross-border electronic commerce platforms.
Background
Global economic development has entered the era of the network economy. The internet has spread throughout the world and has begun to exert a tremendous influence on the economies of individual countries and of the world as a whole. As the economy develops, electronic commerce is gradually changing both the mode of economic development and the mode of commodity circulation: economic exchange and commodity circulation between countries, between countries and enterprises, between enterprises and individuals, and between individuals are increasingly conducted through electronic commerce. In recent years, the continued appreciation of the RMB, rising raw material prices and rising labor costs have hit China's export-oriented enterprises hard and seriously slowed the growth of traditional foreign trade, yet cross-border electronic commerce has kept growing rapidly. Data from the Ministry of Commerce show that China's cross-border e-commerce transaction volume was 1.6 trillion yuan in 2011, up 33% year on year; in 2012 it reached 2 trillion yuan, up 25% year on year, while China's foreign trade grew by only 6.2% over the same period. In 2014, after the Shanghai Free Trade Zone had been inaugurated, many e-commerce companies saw a huge opportunity in overseas online shopping ("haitao") and moved into cross-border e-commerce, and many listed companies have now begun to position themselves in the cross-border e-commerce market. According to incomplete statistics, more than 20 thousand foreign trade enterprises in China currently conduct cross-border e-commerce business through various platforms.
Cross-border electronic commerce has huge development potential and will become an important growth point for China's foreign trade. As it develops, the demand for e-commerce translation keeps increasing, but research on e-commerce translation lags far behind the needs of the translation industry. In particular, e-commerce translation systems that cover minority languages such as Mongolian, Tibetan and Uygur are currently very rare. Establishing a Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system for cross-border electronic commerce platforms therefore has important application value. The key problem to be solved is to translate commodity information into the user's native language, so that the user can browse, select and purchase commodities on a native-language version of the e-commerce platform.
Disclosure of Invention
The problem the invention aims to solve is the following. To help ethnic-minority enterprises better develop cross-border e-commerce, a Chinese-English-Mongolian-Tibetan-Uygur machine translation system for cross-border e-commerce platforms is established. An enterprise or seller only needs to enter commodity information in its native-language environment, and the translation system automatically translates it into the other languages for target customers to browse and purchase. A customer only needs to place an order in his or her native-language environment, and the translation system automatically translates the purchase information and feeds it back to the seller.
To solve this problem, the technical solution provided by the invention is: a Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system for cross-border electronic commerce platforms, comprising an attribute information multilingual machine translation subsystem for translating the attribute information of commodities, a comment information multilingual machine translation subsystem for translating the comment information of commodities, and a transaction information multilingual machine translation subsystem for translating the transaction information of commodities. When translating, each subsystem searches a Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus and translates accordingly; the corpus is constructed on the basis of an electronic dictionary and bilingual web pages.
Advantageous effects: with the invention, an enterprise or seller only needs to enter commodity information in its native-language environment, and the translation system automatically translates it into the other languages for target customers to browse and purchase; a customer only needs to place an order in his or her native-language environment, and the translation system automatically translates the purchase information and feeds it back to the seller. Because the system uses a Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus constructed from an electronic dictionary and bilingual web pages, translation accuracy can be improved.
Drawings
FIG. 1 shows the overall structure of the Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system and the methods it employs;
FIG. 2 shows the translation process of the Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation subsystem for attribute information;
FIG. 3 shows the translation process of the Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation subsystem for comment information;
FIG. 4 shows the translation process of the Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation subsystem for transaction information;
FIG. 5 shows the process of constructing the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information;
FIG. 6 shows a flowchart of the commodity comment element extraction method.
Detailed Description
Example 1: The transaction process of electronic commerce mainly comprises three steps: commodity purchase, payment and settlement, and logistics and delivery. Whether a customer buys a commodity depends mainly on whether its attributes meet the customer's needs and on how other customers evaluate it, so accurate translation of the attribute information and the comment information of commodities is essential for a cross-border e-commerce platform. In addition, only when the seller accurately knows which commodity the customer bought, the delivery address and the other transaction information can the commodity be delivered safely to the customer, so accurate translation of the transaction information is also essential. The attribute information, comment information and transaction information of a commodity differ in translation difficulty and in the translation method they require. To address this, as shown in FIG. 1, this embodiment establishes a multilingual translation system for cross-border e-commerce platforms that mainly comprises three subsystems: an attribute information multilingual machine translation subsystem, a comment information multilingual machine translation subsystem and a transaction information multilingual machine translation subsystem. The system involves one multilingual parallel corpus and three machine translation methods, namely the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information, a rule-based multilingual machine translation method for out-of-vocabulary (unregistered) words, a transliteration-based machine translation method for out-of-vocabulary words, and a multilingual summary generation method for commodity comments.
Attribute information translation subsystem. The attribute information of a commodity on an e-commerce platform usually consists of named entities such as the commodity's name, place of origin and specification, and rarely contains complex semantic information. Translating it is therefore somewhat easier than translating ordinary text and calls for a different method: it is essentially multilingual translation of named entities. As shown in FIG. 2, the translation process of the Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation subsystem for attribute information is as follows. Each attribute name or attribute value of the commodity is read and looked up in the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information. If it exists, its multilingual translation is given directly from the parallel corpus. If it does not exist, the attribute name or value is an out-of-vocabulary word; a word segmentation method is used to split it into smaller named entities that already exist in the parallel corpus, and a rule-based machine translation method then translates it into the various languages. If the attribute name or value cannot be split into smaller named entities that already exist, it is translated directly into the other languages by a transliteration-based machine translation method.
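The three-stage fallback described above (corpus lookup, then splitting plus rule-based translation, then transliteration) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy corpus, the greedy longest-match splitter standing in for the unspecified word segmentation method, the "translate each part and join" step standing in for the unspecified rule-based method, and the `transliterate` callback.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy corpus: source term -> {language code: translation}.
Corpus = Dict[str, Dict[str, str]]


def split_into_known(term: str, corpus: Corpus) -> Optional[List[str]]:
    """Greedy forward longest-match split into corpus entries, or None on failure."""
    parts: List[str] = []
    i = 0
    while i < len(term):
        for j in range(len(term), i, -1):  # try the longest piece first
            if term[i:j] in corpus:
                parts.append(term[i:j])
                i = j
                break
        else:
            return None  # no corpus entry starts at position i
    return parts


def translate_attribute(term: str, lang: str, corpus: Corpus,
                        transliterate: Callable[[str, str], str]) -> str:
    # Stage 1: the whole attribute name or value is already in the corpus.
    if term in corpus:
        return corpus[term][lang]
    # Stage 2: split into smaller known entities. Joining the parts in source
    # order is only a stand-in for the rule-based translation step.
    parts = split_into_known(term, corpus)
    if parts is not None:
        return " ".join(corpus[p][lang] for p in parts)
    # Stage 3: nothing matched, so fall back to transliteration.
    return transliterate(term, lang)


toy_corpus: Corpus = {"红色": {"en": "red"}, "连衣裙": {"en": "dress"}}


def fake_transliterate(term: str, lang: str) -> str:
    """Placeholder transliterator that just marks the term."""
    return f"<{term}>"
```

With the toy corpus, "红色连衣裙" is not found as a whole, is split into "红色" and "连衣裙", and is assembled from their translations; a term with no known parts goes to the transliteration callback.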
Comment information translation subsystem. The comment information of a commodity is an important factor in whether a customer buys it, and it also gives manufacturers an important basis for setting their development strategy and direction of improvement. E-commerce platforms generally place no requirements on the content of users' comments, so a user may comment specifically on an attribute of the commodity or on the experience of using it, but may also post content unrelated to the commodity. Whether the reader is a new customer or a manufacturer, what usually matters is users' experience with and evaluation of the commodity's attributes. When translating comment information, it is therefore sufficient to translate the comment elements, that is, the commodity attributes mentioned in the comment, the evaluation words corresponding to those attributes, and the user's sentiment; the rest need not be translated. This reduces the translation difficulty on the one hand, and on the other hand lets customers and manufacturers see the evaluation information they need at a glance.
Following this idea, the comment information translation process shown in FIG. 3 is adopted. For each comment on a commodity, a comment element extraction method extracts the commodity attribute-evaluation word pairs and the customer's sentiment orientation from the comment. The attribute-evaluation word pairs are then translated into the different languages according to the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus, and in each language environment a summary generation method produces a commodity comment summary in that language from the attribute-evaluation word pairs and the customer's sentiment orientation, thereby achieving multilingual translation of the commodity comment information.
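As a rough illustration of the last two steps of this flow, the sketch below translates already-extracted attribute-evaluation pairs through a corpus and renders a one-line summary per language. The corpus contents, the function names, the choice to drop pairs that are missing from the corpus, and the summary layout are all assumptions; the source does not specify them.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (attribute word, evaluation word)

# Hypothetical corpus: source word -> {language code: translation}.
toy_corpus: Dict[str, Dict[str, str]] = {
    "屏幕": {"en": "screen"},
    "清晰": {"en": "clear"},
    "电池": {"en": "battery"},
    "差": {"en": "poor"},
}


def translate_pairs(pairs: List[Pair], corpus: Dict[str, Dict[str, str]],
                    lang: str) -> List[Pair]:
    """Translate each pair; pairs with a word missing from the corpus are skipped."""
    out: List[Pair] = []
    for attr, ev in pairs:
        if attr in corpus and ev in corpus:
            out.append((corpus[attr][lang], corpus[ev][lang]))
    return out


def render_summary(pairs: List[Pair], polarity: str) -> str:
    """Render 'attribute: evaluation' items plus a sentiment tag (assumed layout)."""
    body = "; ".join(f"{attr}: {ev}" for attr, ev in pairs)
    return f"{body} [{polarity}]"


pairs_zh = [("屏幕", "清晰"), ("电池", "差")]
summary_en = render_summary(translate_pairs(pairs_zh, toy_corpus, "en"), "neutral")
```

Only the extracted elements are translated, which is the point of the approach: free text that carries no attribute-evaluation pair never reaches the translation step.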
Transaction information translation subsystem. The main function of this subsystem is to translate the transaction information that a customer completes in his or her native-language environment into transaction information in the seller's native-language environment, so that the seller can ship the goods. The content involved in a transaction mainly comprises information about the commodity purchased and information such as the customer's name and delivery address. Since the commodity information has already been translated by the commodity information translation subsystems, it only needs to be mapped from the customer's language environment to the seller's. The main difficulty for the transaction information translation subsystem is therefore the translation of information such as the customer's name and delivery address. As shown in FIG. 4, this information is translated mainly by a transliteration-based multilingual machine translation method.
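A minimal sketch of transliteration-based translation for names and addresses follows. The character-to-romanization table is a tiny hypothetical sample (standard pinyin readings for four characters), and the per-character mapping with space joining is an assumption; the source does not describe the actual transliteration rules, and a real table would also need to handle characters with several readings.

```python
from typing import Dict

# Hypothetical sample table: Chinese character -> pinyin romanization.
PINYIN: Dict[str, str] = {"张": "Zhang", "伟": "Wei", "大": "Da", "连": "Lian"}


def transliterate(text: str, table: Dict[str, str] = PINYIN) -> str:
    """Map each character through the table; unknown characters pass through."""
    return " ".join(table.get(ch, ch) for ch in text)


def translate_transaction_field(value: str, corpus: Dict[str, str],
                                table: Dict[str, str] = PINYIN) -> str:
    """Use the corpus entry when one exists, otherwise transliterate."""
    if value in corpus:
        return corpus[value]
    return transliterate(value, table)
```

A place name that has an established translation in the corpus keeps it, while a personal name that the corpus has never seen is romanized character by character.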
The structure of the multilingual translation system and its translation processes have been described above. The following describes in detail the construction of the multilingual parallel corpus and the multilingual summary generation method for commodity comments that the translation process involves.
Construction of the multilingual parallel corpus. Parallel corpora are an indispensable resource for statistical machine translation and a range of related research applications. The traditional approach of manually checking and entering parallel text is time-consuming and labor-intensive, which makes it difficult to build a large-scale parallel corpus in a limited time. With the rise of bilingual and multilingual websites on the internet, many researchers have begun to study how to obtain bilingual parallel corpora from the internet. The invention constructs a multilingual parallel corpus of commodity information based on an electronic dictionary and bilingual web pages; the specific flow is shown in FIG. 5. Chinese commodity information to be translated is first obtained from various Chinese e-commerce platforms, and part of it is then translated using bilingual dictionaries. Bilingual dictionaries have the advantages of being easy to obtain, convenient to use and highly accurate, but they lack translations for many specialized terms. The Chinese commodity information that the bilingual dictionaries cannot translate is therefore translated into the other languages by an internet-based bilingual parallel sentence mining method. Specifically, the similarity of web-page tag sequences and the similarity of number sequences computed by maximum matching are used as feature information, and a support vector machine is used to extract candidate parallel web pages. The web pages are then processed by sentence segmentation, alignment, cleanup and similar operations to obtain Chinese-English, Chinese-Mongolian, Chinese-Tibetan and Chinese-Uygur bilingual parallel corpora of commodity information, which completes the construction of the multilingual parallel corpus.
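The two page-pair features named above can be sketched with the Python standard library. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` is used as a stand-in for the unspecified maximum-matching similarity, the helper names are hypothetical, and the support vector machine that would consume these features is not shown.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List, Tuple


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> List[str]:
    collector = TagCollector()
    collector.feed(html)
    return collector.tags


def number_sequence(html: str) -> List[str]:
    """Numbers in the visible text; prices and sizes usually survive translation."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    return re.findall(r"\d+(?:\.\d+)?", text)


def similarity(a: List[str], b: List[str]) -> float:
    # Stand-in similarity in [0, 1]; not necessarily the patent's measure.
    return SequenceMatcher(None, a, b).ratio()


def page_pair_features(html_a: str, html_b: str) -> Tuple[float, float]:
    """Feature vector (tag similarity, number similarity) for a classifier."""
    return (similarity(tag_sequence(html_a), tag_sequence(html_b)),
            similarity(number_sequence(html_a), number_sequence(html_b)))


page_zh = "<html><body><h1>手机</h1><p>价格 1999 元, 内存 8 GB</p></body></html>"
page_en = "<html><body><h1>Phone</h1><p>Price 1999 yuan, memory 8 GB</p></body></html>"
page_other = "<div><span>Price 2500</span></div>"
```

A translated page pair tends to share its markup skeleton and its numbers, so both features are high; an unrelated page scores low on both, which is what lets a classifier separate candidate parallel pages from the rest.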
Commodity comment element extraction method. A customer's comment on a commodity on an e-commerce platform (for example, Jingdong Mall) usually consists of two parts. The first part is a fixed-format comment, in which the customer is asked to evaluate the commodity's advantages and disadvantages separately. Such comments mostly take the form of subjective phrases or short sentences, generally state the commodity attributes and evaluation words explicitly, rarely use anaphora or metaphor, and seldom contain content unrelated to the commodity. The second part is a free comment, in which the reviewer can freely express opinions on the commodity's attributes and may also post content unrelated to the commodity. The invention extracts two comment elements, namely attribute-evaluation word pairs and sentiment orientation, according to the flow shown in FIG. 6. First, because attribute-evaluation word pairs in fixed-format comments usually have a simple form, they are extracted by direct matching against a manually built dictionary. Then, for free comments, attribute-evaluation word pairs are extracted by an algorithm that extracts commodity attribute words and evaluation words simultaneously on the basis of part-of-speech relation templates: a supervised sequential rule mining algorithm first mines candidate part-of-speech dependency patterns from training samples and assigns each pattern a confidence score, the patterns with higher confidence form a template set, and the templates are then used to extract candidate attribute-evaluation word pairs from the comments. Finally, once the attribute-evaluation word pairs in a comment have been obtained, the sentiment orientation of the comment is analyzed by a sentiment-dictionary-based method: the sentiment orientation of each evaluation word is determined from a sentiment dictionary, and the sentiment orientation of the comment sentence is then decided by whether positive or negative evaluation words are in the majority.
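The dictionary-matching step for fixed-format comments and the majority-vote sentiment step can be sketched as follows. The word lists are hypothetical, English words are used only for readability, the clause-by-clause substring matching is a simplification of dictionary matching, and labelling a tie as "neutral" is an assumption that the source does not address. The template-based extraction for free comments is not shown.

```python
import re
from typing import List, Tuple

# Hypothetical manually built dictionaries.
ATTRIBUTES = ["screen", "battery", "delivery"]
POSITIVE = {"clear", "fast", "good"}
NEGATIVE = {"poor", "slow", "expensive"}
EVALUATIONS = sorted(POSITIVE | NEGATIVE)


def extract_pairs(comment: str) -> List[Tuple[str, str]]:
    """Pair the first attribute and first evaluation word found in each clause."""
    pairs: List[Tuple[str, str]] = []
    for clause in re.split(r"[,;.!?，；。！？]", comment):
        attr = next((a for a in ATTRIBUTES if a in clause), None)
        ev = next((e for e in EVALUATIONS if e in clause), None)
        if attr and ev:
            pairs.append((attr, ev))
    return pairs


def word_polarity(word: str) -> int:
    """+1 for a positive evaluation word, -1 for a negative one, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def comment_polarity(pairs: List[Tuple[str, str]]) -> str:
    """Majority vote over the evaluation words; a tie is labelled neutral."""
    score = sum(word_polarity(ev) for _, ev in pairs)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


demo_pairs = extract_pairs("screen is clear, battery is poor, delivery was fast")
```

In the demo comment, two of the three evaluation words are positive, so the comment as a whole is labelled positive even though the battery is rated poorly.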
Commodity comment summary generation method. The invention organizes commodity comment summaries at two levels, the commodity level and the comment level. A commodity-level comment summary classifies and aggregates all comments on the same commodity to produce users' overall evaluation of it, so that the reader can understand the commodity as a whole and its specific attributes at a statistical level. It consists of two parts. The first part is customers' overall rating of the commodity: the sentiment orientation of each comment on the commodity is counted, and an overall score is computed from the statistics of the different sentiments. The second part is users' overall evaluation of the commodity's individual attributes: the attribute-evaluation word pairs in the comments are clustered, customers' comments on the main attributes of the commodity are displayed as a list according to the clustering result, and the numbers of positive and negative comments are attached to each attribute. A comment-level summary organizes each individual customer comment into a summary. The commodity-level summary gives the reader an overall picture of the commodity, but sometimes the reader needs to read the details of individual customers' comments to understand the commodity more deeply; for this purpose, a summary of each customer's comment is generated by a topic model method.
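The commodity-level summary can be sketched as simple counting. The input layout, the use of the share of positive comments as the overall score, and grouping by exact attribute name in place of the unspecified clustering step are all assumptions made for illustration; the topic-model step for comment-level summaries is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: each comment carries its overall polarity and a list of
# (attribute, polarity of the evaluation word) items, +1 or -1.
sample_comments = [
    {"polarity": "positive", "pairs": [("screen", 1), ("battery", -1)]},
    {"polarity": "negative", "pairs": [("battery", -1)]},
]


def summarize_commodity(comments: List[dict]) -> Tuple[float, Dict[str, Dict[str, int]]]:
    """Return (share of positive comments, positive/negative counts per attribute)."""
    polarity_counts = Counter(c["polarity"] for c in comments)
    total = len(comments)
    overall = polarity_counts["positive"] / total if total else 0.0

    # Exact-name grouping stands in for clustering of attribute-evaluation pairs.
    per_attribute: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"positive": 0, "negative": 0})
    for c in comments:
        for attribute, polarity in c["pairs"]:
            if polarity > 0:
                per_attribute[attribute]["positive"] += 1
            elif polarity < 0:
                per_attribute[attribute]["negative"] += 1
    return overall, dict(per_attribute)


overall_score, attribute_table = summarize_commodity(sample_comments)
```

The per-attribute table corresponds to the list described above, with each main attribute followed by its numbers of positive and negative comments.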
Example 2: A Chinese-English-Mongolian-Tibetan-Uygur multilingual machine translation system for cross-border electronic commerce platforms comprises an attribute information multilingual machine translation subsystem for translating the attribute information of commodities, a comment information multilingual machine translation subsystem for translating the comment information of commodities, and a transaction information multilingual machine translation subsystem for translating the transaction information of commodities. When translating, each subsystem searches a Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus and translates accordingly; the corpus is constructed on the basis of an electronic dictionary and bilingual web pages.
In one embodiment, the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus is constructed as follows: Chinese commodity information to be translated is acquired from Chinese e-commerce platforms, and part of the commodity information is translated using a bilingual dictionary; the similarity of web-page tag sequences and the similarity of number sequences computed by maximum matching are used as feature information, a support vector machine is used to extract candidate parallel web pages, and the web pages are then subjected to sentence segmentation, alignment and cleanup to obtain Chinese-English, Chinese-Mongolian, Chinese-Tibetan and Chinese-Uygur bilingual parallel corpora of commodity information, thereby completing the construction of the multilingual parallel corpus.
The translation process of each subsystem is described in detail below.
The translation process of the attribute information multilingual machine translation subsystem is as follows: each attribute name or attribute value of a commodity is read and looked up in the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information. If it exists, its multilingual translation is given directly according to the corpus. If it does not exist, the attribute name or value is an out-of-vocabulary word; a word segmentation method splits it into smaller named entities that already exist in the corpus, and a rule-based machine translation method then translates it into the various languages. If the attribute name or value cannot be split into smaller named entities that already exist, it is translated directly into the other languages by a transliteration-based machine translation method.
The translation process of the comment information multilingual machine translation subsystem is as follows: for each comment on a commodity, the commodity comment element extraction method extracts the commodity attribute-evaluation word pairs and the customer's sentiment orientation from the comment; the attribute-evaluation word pairs are translated into the different languages according to the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus; and in each language environment the commodity comment summary generation method produces a commodity comment summary in that language from the attribute-evaluation word pairs and the customer's sentiment orientation, thereby achieving multilingual translation of the commodity comment information.
The translation process of the transaction information multilingual machine translation subsystem is as follows: the transaction information of a commodity is read and looked up in the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information. If it exists, its multilingual translation is given directly according to the corpus. If it does not exist, the transaction information is an out-of-vocabulary word and is translated directly into the other languages by a transliteration-based machine translation method.
The commodity comment element extraction method extracts two comment elements of a commodity, namely attribute-evaluation word pairs and sentiment orientation, and comprises the following steps:
first, attribute-evaluation word pairs in fixed-format comments are extracted by direct matching against a manually built dictionary; then, for free comments, attribute-evaluation word pairs are extracted by an algorithm that extracts commodity attribute words and evaluation words simultaneously on the basis of part-of-speech relation templates;
finally, once the attribute-evaluation word pairs in a comment have been obtained, the sentiment orientation of the comment is analyzed by a sentiment-dictionary-based method.
The commodity comment summary generation method organizes commodity comment summaries at the commodity level and at the comment level, and comprises the following steps:
for the commodity-level summary, the attribute-evaluation word pairs in the commodity comments are clustered, customers' comments on the main attributes of the commodity are displayed as a list according to the clustering result, and the numbers of positive and negative comments are attached to each attribute;
for the comment-level summary, each customer's comment is organized into a summary, which is generated at word granularity by a topic model method.
In the algorithm that extracts commodity attribute words and evaluation words simultaneously on the basis of part-of-speech relation templates, a supervised sequential rule mining algorithm first mines candidate part-of-speech dependency patterns from training samples and assigns each pattern a confidence score; the patterns with higher confidence form a template set, and the templates are then used to extract candidate attribute-evaluation word pairs from the comments. In the sentiment-dictionary-based method, the sentiment orientation of each evaluation word is determined from a sentiment dictionary, and the sentiment orientation of a comment sentence is then decided by whether positive or negative evaluation words are in the majority.
This embodiment also relates to a translation method using the translation system of any of the above schemes, comprising:
translating the attribute information of a commodity; translating the comment information of the commodity; and translating the transaction information of the commodity; wherein, when translating, each subsystem searches the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus and translates accordingly, the multilingual parallel corpus being constructed on the basis of an electronic dictionary and bilingual web pages;
the step of translating the comment information of the commodity comprises a step of extracting commodity comment elements and a step of generating a commodity comment summary.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification that a person skilled in the art makes within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims (1)

1. A translation method of a transaction information multilingual machine translation subsystem, characterized in that: the system comprises an attribute information multilingual machine translation subsystem for translating the attribute information of a commodity, a comment information multilingual machine translation subsystem for translating the comment information of the commodity, and a transaction information multilingual machine translation subsystem for translating the transaction information of the commodity, wherein, when translating, each subsystem searches a Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus and translates accordingly, and the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus is constructed on the basis of an electronic dictionary and bilingual web pages; the transaction information of a commodity is read and looked up in the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information; if it exists, the multilingual translation result of the transaction information is given directly according to the corpus; if it does not exist, the transaction information is an out-of-vocabulary (unregistered) word and is translated directly into the other languages by a transliteration-based machine translation method; Chinese commodity information to be translated is acquired from Chinese e-commerce platforms, and part of the commodity information is translated using a bilingual dictionary; the similarity of web-page tag sequences and the similarity of number sequences computed by maximum matching are used as feature information, a support vector machine is used to extract candidate parallel web pages, and the web pages are then subjected to sentence segmentation, alignment and cleanup to obtain Chinese-English, Chinese-Mongolian, Chinese-Tibetan and Chinese-Uygur bilingual parallel corpora of commodity information, thereby completing the construction of the multilingual parallel corpus; the system comprises three subsystems, namely an attribute information multilingual machine translation subsystem, a comment information multilingual machine translation subsystem and a transaction information multilingual machine translation subsystem, and involves one multilingual parallel corpus and three machine translation methods, namely the Chinese-English-Mongolian-Tibetan-Uygur multilingual parallel corpus of commodity information, a rule-based multilingual machine translation method for out-of-vocabulary words, a transliteration-based machine translation method for out-of-vocabulary words, and a multilingual summary generation method for commodity comments;
the translation process of the Chinese-English-Mongolian-Tibetan-Uyghur multilingual machine translation subsystem for attribute information comprises: first reading each attribute name or attribute value of a commodity and looking it up in the Chinese-English-Mongolian-Tibetan-Uyghur multilingual parallel corpus of commodity information; if it is found, giving the multilingual translation result of the attribute name or attribute value directly from the parallel corpus; if it is not found, the attribute name or attribute value is an unregistered word, which is split by a word-splitting method into smaller named entities that exist in the parallel corpus, and these named entities are then translated into each language by a rule-based machine translation method; if it cannot be split into smaller existing named entities, it is translated directly into the other languages by a transliteration-based machine translation method;
the translation process of the comment information translation subsystem comprises: for each comment on the commodity, first extracting the commodity attribute-evaluation word pairs and the customer's emotional tendency from the comment using a comment element extraction method; then translating the attribute-evaluation word pairs into the different languages according to the Chinese-English-Mongolian-Tibetan-Uyghur multilingual parallel corpus; and then, in each language, generating a commodity comment summary from the attribute-evaluation word pairs and the customer's emotional tendency using a summary generation method, thereby realizing multilingual translation of the commodity comment information;
for the transaction information translation subsystem, a multi-language machine translation method based on transliteration is adopted to realize information translation;
constructing the multilingual parallel corpus: a multilingual parallel corpus of commodity information is constructed on the basis of an electronic dictionary and bilingual webpages, the specific process being as follows: first, Chinese commodity information to be translated is acquired from Chinese e-commerce platforms; then part of the commodity information is translated using a bilingual dictionary; Chinese commodity information that the bilingual dictionary cannot translate is translated into the other languages by an Internet-based bilingual parallel sentence mining method, in which the similarity of webpage tag sequences and the similarity of number sequences computed by maximum matching are used as feature information, a support vector machine is used to extract candidate parallel webpages, and the webpages are then sentence-segmented, aligned and collated; finally, Chinese-English, Chinese-Mongolian, Chinese-Tibetan and Chinese-Uyghur bilingual parallel corpora of commodity information are obtained, completing the construction of the multilingual parallel corpus;
the extraction method of commodity comment elements extracts two comment elements of a commodity, namely attribute-evaluation word pairs and emotional tendency: first, attribute-evaluation word pairs in fixed-format comment information are extracted by direct matching against a manually built dictionary; then, for free-form comment information, attribute-evaluation word pairs are extracted by an algorithm that synchronously extracts commodity attribute words and evaluation words based on part-of-speech relation templates, that is, part-of-speech dependency relation patterns are first mined from training samples by a supervised sequence rule mining algorithm and scored for confidence, and the templates are then used to extract attribute-evaluation word pairs from the comment information; finally, once the attribute-evaluation word pairs in the comment information have been obtained, the emotional tendency of the comment information is analyzed by a sentiment-dictionary-based method, that is, the emotional tendency of each evaluation word is judged using a sentiment dictionary, and the emotional tendency of a comment sentence is judged according to whether positive or negative evaluation words are more numerous in the sentence;
the method for generating the commodity comment summary is as follows: commodity comment summaries are organized at the commodity level and at the comment level respectively; the commodity-level comment summary classifies and summarizes all comments on the same commodity to produce the users' overall evaluation of the commodity, so that a reader can understand the commodity as a whole and its specific attributes at a statistical level; the commodity-level comment summary comprises two parts: the first part is the customers' overall evaluation of the commodity, for which the emotional tendency of each comment on the commodity is counted and an overall score of the commodity is calculated from the sentiment statistics; the second part is the users' overall evaluation of the individual attributes of the commodity, for which the attribute-evaluation word pairs in the commodity comments are clustered, the customers' comments on the commodity attributes are displayed as a list according to the clustering result, and the number of positive and negative comments under each attribute is attached.
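
The sentiment-dictionary vote and the commodity-level comment summary (abstract) described in the claim can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the toy dictionary `SENTIMENT`, the function names `polarity` and `summarize`, the use of the share of positive comments as the overall score, and the grouping of word pairs by exact attribute name (a simple stand-in for the clustering step) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy sentiment dictionary (assumption: the claim does not list entries).
SENTIMENT = {"good": 1, "fast": 1, "clear": 1, "poor": -1, "slow": -1, "blurry": -1}


def polarity(evaluation_words):
    """Majority vote over evaluation words.

    Returns +1 if positive words outnumber negative ones, -1 if negative
    words outnumber positive ones, and 0 on a tie or when no word is known.
    """
    score = sum(SENTIMENT.get(word, 0) for word in evaluation_words)
    return (score > 0) - (score < 0)


def summarize(comments):
    """Build a commodity-level summary.

    `comments` is a list of comments, each a list of
    (attribute, evaluation_word) pairs already extracted from the text.
    """
    overall = Counter()                   # comment polarity -> number of comments
    per_attribute = defaultdict(Counter)  # attribute -> word polarity -> count
    for pairs in comments:
        overall[polarity([word for _, word in pairs])] += 1
        for attribute, word in pairs:
            # Grouping by exact attribute name replaces clustering (assumption).
            per_attribute[attribute][SENTIMENT.get(word, 0)] += 1
    total = sum(overall.values())
    # Overall score as the share of positive comments (one possible statistic).
    score = overall[1] / total if total else 0.0
    table = {
        attribute: {"positive": counts[1], "negative": counts[-1]}
        for attribute, counts in per_attribute.items()
    }
    return {"score": score, "attributes": table}


comments = [
    [("screen", "clear"), ("delivery", "slow")],
    [("screen", "blurry")],
    [("delivery", "fast"), ("screen", "clear")],
]
summary = summarize(comments)
```

In this sketch each comment contributes one vote to the overall score, while every attribute-evaluation pair contributes to its attribute's positive or negative count, mirroring the two parts of the commodity-level summary. A real system would replace the exact-name grouping with clustering of attribute words and would need a per-language sentiment dictionary.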
CN201810481052.3A 2016-06-28 2016-06-28 Translation method of transaction information multi-language machine translation subsystem Expired - Fee Related CN108664477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810481052.3A CN108664477B (en) 2016-06-28 2016-06-28 Translation method of transaction information multi-language machine translation subsystem

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610489374.3A CN106202061B (en) 2016-06-28 2016-06-28 Chinese-English-Mongolian-Tibetan-Uyghur multilingual machine translation system and method for cross-border e-commerce platforms
CN201810481052.3A CN108664477B (en) 2016-06-28 2016-06-28 Translation method of transaction information multi-language machine translation subsystem

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610489374.3A Division CN106202061B (en) 2016-06-28 2016-06-28 Chinese-English-Mongolian-Tibetan-Uyghur multilingual machine translation system and method for cross-border e-commerce platforms

Publications (2)

Publication Number Publication Date
CN108664477A CN108664477A (en) 2018-10-16
CN108664477B true CN108664477B (en) 2022-04-01

Family

ID=57462220

Family Applications (6)

Application Number Title Priority Date Filing Date
CN201810479768.XA Expired - Fee Related CN108763223B (en) 2016-06-28 2016-06-28 Method for constructing Chinese-English-Mongolian-Tibetan-Uyghur multilingual parallel corpus
CN201810480399.6A Pending CN108763224A (en) 2016-06-28 2016-06-28 Translation method of comment information multilingual machine translation subsystem
CN201810480430.6A Pending CN108763225A (en) 2016-06-28 2016-06-28 Translation method of attribute information multilingual machine translation subsystem
CN201610489374.3A Active CN106202061B (en) 2016-06-28 2016-06-28 Chinese-English-Mongolian-Tibetan-Uyghur multilingual machine translation system and method for cross-border e-commerce platforms
CN201810481045.3A Pending CN108763226A (en) 2016-06-28 2016-06-28 Extraction method of commodity comment elements
CN201810481052.3A Expired - Fee Related CN108664477B (en) 2016-06-28 2016-06-28 Translation method of transaction information multi-language machine translation subsystem

Country Status (1)

Country Link
CN (6) CN108763223B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018128925A (en) * 2017-02-09 2018-08-16 富士通株式会社 Information output program, information output method and information output device
WO2020106438A1 (en) * 2018-11-22 2020-05-28 Yeogirl Yun Multilingual tag-based review system
CN110110336A (en) * 2019-05-05 2019-08-09 西北民族大学 Construction method of a Tibetan syntax corpus for Tibetan-Chinese machine translation
CN110232107A (en) * 2019-05-08 2019-09-13 深圳市小满科技有限公司 Product data acquisition method
CN110321568B (en) * 2019-07-09 2020-08-28 昆明理工大学 Chinese-Vietnamese convolutional neural machine translation method based on fusion of part-of-speech and position information
CN110889295B (en) * 2019-09-12 2021-10-01 华为技术有限公司 Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora
CN111126046B (en) * 2019-12-06 2023-07-14 腾讯云计算(北京)有限责任公司 Sentence characteristic processing method and device and storage medium
CN111078894B (en) * 2019-12-17 2023-09-12 中国科学院遥感与数字地球研究所 Scenic spot evaluation knowledge base construction method based on metaphor topic mining
CN113761882B (en) * 2020-06-08 2024-09-20 北京沃东天骏信息技术有限公司 Dictionary construction method and device
CN113657123A (en) * 2021-07-14 2021-11-16 内蒙古工业大学 Mongolian aspect level emotion analysis method based on target template guidance and relation head coding
CN113627201B (en) * 2021-10-11 2022-02-08 北京达佳互联信息技术有限公司 Information extraction method and device, electronic equipment and storage medium
CN117875816B (en) * 2024-01-05 2024-10-11 深圳市瀚力科技有限公司 Cross-border E-business data statistical processing method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007073054A (en) * 2005-09-08 2007-03-22 Fujitsu Ltd Parallel translation phrase presentation program, parallel translation phrase presentation method and parallel translation phrase presentation device
CN101079028A (en) * 2007-05-29 2007-11-28 中国科学院计算技术研究所 On-line translation model selection method of statistic machine translation
CN101206643A (en) * 2006-12-21 2008-06-25 中国科学院计算技术研究所 Translation method syncretizing sentential form template and statistics mechanical translation technique
CN101676898A (en) * 2008-09-17 2010-03-24 中国科学院自动化研究所 Method and device for translating Chinese organization name into English with the aid of network knowledge
CN103577394A (en) * 2012-07-31 2014-02-12 阿里巴巴集团控股有限公司 Machine translation method and device based on double-array search tree
AU2015215882A1 (en) * 2005-01-04 2015-09-10 Thomson Reuters Global Resources Systems, methods, software, and interfaces for multilingual information retrieval

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1839401A (en) * 2003-09-19 2006-09-27 东芝解决方案株式会社 Information processing device and information processing method
US7299171B2 (en) * 2004-08-17 2007-11-20 Contentguard Holdings, Inc. Method and system for processing grammar-based legality expressions
CN101075230B (en) * 2006-05-18 2011-11-16 中国科学院自动化研究所 Method and device for translating Chinese organization name based on word block
CN101290616A (en) * 2008-06-11 2008-10-22 中国科学院计算技术研究所 Statistical machine translation method and system
JP4701292B2 (en) * 2009-01-05 2011-06-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Computer system, method and computer program for creating term dictionary from specific expressions or technical terms contained in text data
US8275604B2 (en) * 2009-03-18 2012-09-25 Microsoft Corporation Adaptive pattern learning for bilingual data mining
CN101957815A (en) * 2009-07-13 2011-01-26 白劲实 Automatic translation method and system based on correct translation result and corresponding relation
JP5747508B2 (en) * 2011-01-05 2015-07-15 富士ゼロックス株式会社 Bilingual information search device, translation device, and program
US20140279731A1 (en) * 2013-03-13 2014-09-18 Ivan Bezdomny Inc. System and Method for Automated Text Coverage of a Live Event Using Structured and Unstructured Data Sources
CN103268566B (en) * 2013-05-23 2016-09-21 新疆卡尔罗媒体科技有限公司 Social network platform system and interaction method
CN103530284B (en) * 2013-09-22 2016-07-06 中国专利信息中心 Short sentence cutting device, machine translation system and corresponding cutting method and translation method
CN104615593B (en) * 2013-11-01 2017-09-29 北大方正集团有限公司 Automatic detection method and device for hot microblog topics
CN103646097B (en) * 2013-12-18 2016-09-07 北京理工大学 Joint clustering method for opinion targets and sentiment words based on constraint relations
CN103823890B (en) * 2014-03-10 2016-11-02 中国科学院信息工程研究所 Microblog hot topic detection method and device for special groups
CN104408078B (en) * 2014-11-07 2019-02-12 北京第二外国语学院 Keyword-based method for constructing a Chinese-English bilingual parallel corpus
CN105022728A (en) * 2015-07-13 2015-11-04 广西达译商务服务有限责任公司 Automatic acquisition system of Chinese and Lao bilingual parallel texts and implementation method
CN105045862A (en) * 2015-07-13 2015-11-11 广西达译商务服务有限责任公司 System for automatically acquiring bilingual parallel corpus of Chinese-foreign languages and realization method
CN105068997B (en) * 2015-07-15 2017-12-19 清华大学 The construction method and device of parallel corpora
CN105117428B (en) * 2015-08-04 2018-12-04 电子科技大学 Web comment sentiment analysis method based on word alignment model
CN105354183A (en) * 2015-10-19 2016-02-24 Tcl集团股份有限公司 Analytic method, apparatus and system for internet comments of household electrical appliance products
CN105528341B (en) * 2015-11-25 2018-07-24 金陵科技学院 Term translation mining system and method with domain customization function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Combining lexical and statistical translation evidence for cross‐language information retrieval"; Rawoens Gudrun et al.; 《Meta : journal des traducteurs / Meta: Translators’ Journal》; 20150101; Vol. 60 (No. 3); pp. 237-242 *
"构建汉越/越汉平行语料库——以机械制造业汉越语料库建设为例"; 张智丹; 《企业科技与发展》; 20151205; Vol. 23 (No. 12); pp. 11-14 *

Also Published As

Publication number Publication date
CN108763223B (en) 2022-05-13
CN108763225A (en) 2018-11-06
CN108763223A (en) 2018-11-06
CN106202061A (en) 2016-12-07
CN106202061B (en) 2018-09-14
CN108763226A (en) 2018-11-06
CN108763224A (en) 2018-11-06
CN108664477A (en) 2018-10-16

Similar Documents

Publication Publication Date Title
CN108664477B (en) Translation method of transaction information multi-language machine translation subsystem
CN107491531B (en) Sentiment classification method for Chinese online comments based on an ensemble learning framework
CN107862343B (en) Commodity comment attribute level emotion classification method based on rules and neural network
TWI689880B (en) Method and device for implementing review search engine ranking
WO2017162074A1 (en) Method, apparatus and device for mapping products
TW201423450A (en) Information pushing, search method and device based on keyword extraction of electronic information
CN108346075A (en) Information recommendation method and device
JP2009026195A (en) Article classification apparatus, article classification method and program
Thomaidou et al. Automated snippet generation for online advertising
CN105931082B (en) Commodity category keyword extraction method and device
Hanni et al. Summarization of customer reviews for a product on a website using natural language processing
CN101937432A (en) System and method for negotiation between two parties according to supply and demand information
Hananto et al. A machine learning approach to analyze fashion styles from large collections of online customer reviews
CN118193806A (en) Target retrieval method, target retrieval device, electronic equipment and storage medium
Yamada et al. A text mining approach for automatic modeling of Kansei evaluation from review texts
Soliman et al. Utilizing support vector machines in mining online customer reviews
WO2021136009A1 (en) Search information processing method and apparatus, and electronic device
Bhargava et al. Comment based seller trust model for e-commerce
Nabiha et al. Sentiment analysis for informal malay text in social commerce
JP7138981B1 (en) Similarity determination device, similarity determination system, similarity determination method, and program
Kordomatis et al. Web object identification for web automation and meta-search
Halim et al. Consumer Opinion Extraction Using Text Mining for Product Recommendations On E-Commerce
Zhang A personalized recommendation algorithm based on text mining
Jadon et al. Sentiment analysis for movies prediction using machine leaning techniques
Xia et al. Understanding the evolution of fine-grained user opinions in product reviews

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220401
