JP5787934B2 - Information processing apparatus, information processing method, and information processing program - Google Patents

Publication number
JP5787934B2
Authority
JP
Japan
Prior art keywords
translation
web page
language
search query
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2013128180A
Other languages
Japanese (ja)
Other versions
JP2015005011A (en)
Inventor
裕貴 石川
颯々野 学
Original Assignee
ヤフー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤフー株式会社
Priority to JP2013128180A
Publication of JP2015005011A
Application granted
Publication of JP5787934B2
Application status is Active
Anticipated expiration

Description

  The present invention relates to an information processing apparatus, an information processing method, and an information processing program for estimating the correctness of parallel translation of words included in a sentence translated by machine translation.

Among web pages on the Internet, pages written in a language (a foreign language) different from the language the user uses (the native language) are translated so that the translated web pages can be searched. A machine translation system is generally used for this translation. Translation by a machine translation system is often of lower quality than translation by a human translator.
To improve the quality of translation by a machine translation system, the correctness of parallel translations is estimated statistically from a parallel corpus. Alternatively, a manually created bilingual dictionary is used.
Patent Document 1, among others, describes related art.

  Patent Document 1 aims to provide a word semantic relationship extraction device that accurately extracts semantic relationships between words in a language written in phonetic characters, based on the notation of those characters. The device extracts word pairs, each consisting of two words, from the words included in data and determines the word semantic relationship of each extracted pair. Patent Document 1 discloses extracting semantic elements, each consisting of a plurality of characters, from the words of word pairs registered in a word semantic relationship dictionary; calculating the similarity between the semantic elements of those words; calculating the similarity of the word pair extracted from the data based on the similarity between the semantic elements; and determining the word semantic relationship of the word pair based on that similarity.

JP 2012-108570 A

Both manual bilingual dictionaries and bilingual corpora are expensive to prepare. In addition, when a bilingual dictionary is used, there is a possibility that a word whose translation is ambiguous is translated erroneously. When the bilingual corpus is used, the translation probability basically depends on the appearance frequency, so that a low-frequency word may be erroneously translated. In search services, translation accuracy is required for words in search queries.
The present invention has been made against this background, and its object is to provide an information processing apparatus, an information processing method, and an information processing program that estimate the translation accuracy of words in a search query, for which translation quality is required in a search service.

The gist of the present invention for achieving the object lies in the inventions of the following items.
The invention of [1] is an information processing apparatus comprising: query accepting means for accepting a search query including a word in a second language; and estimating means for estimating, using a log of an operator's operations on a translated Web page retrieved by the search query accepted by the query accepting means, the accuracy of the parallel translation corresponding to the word in that search query, wherein the search query is for searching translated Web pages obtained by translating original Web pages described in a first language into the second language by a machine translation system, and the parallel translation is the parallel translation, corresponding to a word in the first language, that the machine translation system used to translate the translated Web page from the original Web page.

The invention according to [2] is the information processing apparatus according to claim 1, further comprising: a bilingual changing unit that changes the bilingual translation based on the accuracy estimated by the estimating unit.

The invention of [3] is the information processing apparatus according to claim 1, further comprising learning means for learning the estimation process of the estimating means, using as teacher data a log to which correct answer information indicating whether the parallel translation is correct has been added, wherein the estimating means estimates the accuracy of the parallel translation according to the estimation process learned by the learning means.

The invention of [4] is an information processing method performed by an information processing apparatus, comprising: a query accepting step of accepting a search query including a word in a second language; and an estimating step of estimating, using a log of an operator's operations on a translated Web page retrieved by the search query accepted in the query accepting step, the accuracy of the parallel translation corresponding to the word in that search query, wherein the search query is for searching translated Web pages obtained by translating original Web pages described in a first language into the second language by a machine translation system, and the parallel translation is the parallel translation, corresponding to a word in the first language, that the machine translation system used to translate the translated Web page from the original Web page.

The invention of [5] is an information processing program that causes a computer to function as: query accepting means for accepting a search query including a word in a second language; and estimating means for estimating, using a log of an operator's operations on a translated Web page retrieved by the search query accepted by the query accepting means, the accuracy of the parallel translation corresponding to the word in that search query, wherein the search query is for searching translated Web pages obtained by translating original Web pages described in a first language into the second language by a machine translation system, and the parallel translation is the parallel translation, corresponding to a word in the first language, that the machine translation system used to translate the translated Web page from the original Web page.

  According to the information processing apparatus, the information processing method, and the information processing program according to the present invention, it is possible to estimate the accuracy of translation of words in a search query that requires translation quality in a search service.

FIG. 1 is a conceptual module configuration diagram of a configuration example of the first embodiment.
FIG. 2 is an explanatory diagram mainly showing an example of the flow of data in the present embodiment.
FIG. 3 is an explanatory diagram showing a system configuration example for realizing the first embodiment.
FIG. 4 is a flowchart showing a processing example in the first embodiment.
FIG. 5 is an explanatory diagram showing a data structure example of the parallel translation table.
FIG. 6 is an explanatory diagram showing a data structure example of a log.
FIG. 7 is a conceptual module configuration diagram of a configuration example of the second embodiment.
FIG. 8 is a flowchart showing an example of learning processing in the second embodiment.
FIG. 9 is an explanatory diagram showing a data structure example of a log.
FIG. 10 is a block diagram showing a hardware configuration example of a computer that realizes the present embodiment.

Hereinafter, examples of various preferred embodiments for realizing the present invention will be described with reference to the drawings.
FIG. 1 is a conceptual module configuration diagram of a configuration example according to the first embodiment.
A module generally refers to a logically separable component such as software (a computer program) or hardware. Accordingly, a module in the present embodiment refers not only to a module in a computer program but also to a module in a hardware configuration. The present embodiment therefore also serves as a description of a computer program for causing a computer to function as these modules (a program for causing a computer to execute each procedure, a program for causing a computer to function as each means, or a program for causing a computer to realize each function), as well as of a system and a method. For convenience of explanation, the words “store”, “stored”, and their equivalents are used; when the embodiment is a computer program, these words mean storing in a storage device, or performing control so that data is stored in a storage device. Modules may correspond to functions one-to-one, but in implementation one module may be configured by one program, a plurality of modules may be configured by one program, and conversely one module may be configured by a plurality of programs. A plurality of modules may be executed by one computer, or one module may be executed by a plurality of computers in a distributed or parallel environment. One module may include another module. Hereinafter, “connection” is used not only for physical connection but also for logical connection (data exchange, instructions, reference relationships between data, etc.). “Predetermined” means determined before the target process; this includes not only before the processing according to the present embodiment starts but also after it has started, as long as it is before the target process, with the meaning that the value is determined according to the situation or state at that time or up to that time.
When there are a plurality of “predetermined values”, they may be different values, or two or more values (of course, including all values) may be the same.
A system or apparatus includes not only a configuration in which a plurality of computers, hardware units, devices, and the like are connected by communication means such as a network (including one-to-one communication connections), but also a configuration realized by a single computer, hardware unit, device, or the like. “Apparatus” and “system” are used as synonymous terms. Of course, “system” does not include a mere social “mechanism” (social system) that is an artificial arrangement.
When a plurality of processes are performed within a module or across modules, the target information is read from the storage device for each process, and the processing result is written to the storage device after the process is performed. Description of reading from the storage device before processing and writing to it after processing may therefore be omitted. Here, the storage device may include a hard disk, a RAM (Random Access Memory), an external storage medium, a storage device accessed via a communication line, a register in a CPU (Central Processing Unit), and the like.

  The information processing apparatus 100 according to the first embodiment includes a search query reception module 110, a search module 120, a translation module 130, a parallel translation storage module 140, a log collection module 150, a log storage module 160, an accuracy estimation module 170, and a parallel translation change module 180.

The search query reception module 110 is connected to the search module 120 and the log storage module 160. The search query reception module 110 accepts a search query that contains a word in the second language and that is for searching translated web pages, that is, web pages obtained by translating original web pages written in the first language into the second language by the machine translation system. Here, the first language is the language in which the original web page is written, for example English. The second language is a language that a user who wants to access the original web page can read, for example Japanese. For the user, the first language is generally a foreign language and the second language is the native language. The translated web page is a web page translated by the machine translation system. The search query includes a word in the second language and may include a plurality of words, for example in an AND search or an OR search over a plurality of words. As a typical example, the search query reception module 110 accepts a search query that a user enters through a web browser, for example to search a shopping site.
The search module 120 is connected to the search query reception module 110. The search module 120 searches the translated web page according to the search query received by the search query receiving module 110. A conventional search system may be used.

The translation module 130 is connected to the parallel translation storage module 140. The translation module 130 performs the translation processing of the machine translation system described above: it translates the original web page using the parallel translation table in the parallel translation storage module 140 to create the translated web page. The translation module 130 may also store, in the log storage module 160, the parallel translation pairs it used when creating the translated web page.
The parallel translation storage module 140 is connected to the translation module 130, the accuracy estimation module 170, and the parallel translation change module 180. The parallel translation storage module 140 stores the first-language words that the machine translation system used to translate the translated web page from the original web page, together with the second-language words that are their parallel translations. For example, it stores a parallel translation table 500. FIG. 5 is an explanatory diagram showing an example of the data structure of the parallel translation table 500. The parallel translation table 500 includes a first language column 510, a second language column 520, and a translation probability column 530. The first language column 510 stores a word in the first language. The second language column 520 stores the word in the second language that is the parallel translation of that first-language word. The translation probability column 530 stores the probability that this parallel translation is used. A word may have several candidate parallel translations, and the probability here indicates the likelihood (a conditional probability) that a given word correspondence is adopted. In general, when a word in the first language is ambiguous it has a plurality of parallel translations, and one is selected according to the context in which the word is used in the web page.
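As a rough illustration of the parallel translation table 500, the following Python sketch models one row per word correspondence with the three columns 510, 520, and 530. The class name, helper functions, sample words, and probabilities are all hypothetical and chosen only for illustration; the patent does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass
class ParallelTranslation:
    first_language: str   # column 510: word in the first language
    second_language: str  # column 520: its parallel translation
    probability: float    # column 530: probability of using this translation


# Illustrative rows only; the words and probabilities are invented.
TABLE_500 = [
    ParallelTranslation("bank", "銀行", 0.7),
    ParallelTranslation("bank", "土手", 0.3),
    ParallelTranslation("shoe", "靴", 1.0),
]


def candidates(table, word):
    """Return every parallel translation of a first-language word, most probable first."""
    rows = [row for row in table if row.first_language == word]
    return sorted(rows, key=lambda row: row.probability, reverse=True)


def source_words(table, translated_word):
    """Reverse lookup: first-language words that can produce a second-language word."""
    return [row.first_language for row in table if row.second_language == translated_word]
```

An ambiguous word such as the invented "bank" entry has two rows, which is the situation in which the context of the web page decides which parallel translation is selected.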

The log collection module 150 is connected to the log storage module 160. Logs may be collected by the terminal displaying the translated web page and then transmitted to the log collection module 150, or they may be collected by a web server that receives instructions for the translated web page and then transmitted to the log collection module 150. The logs to be collected include (1) the URL (Uniform Resource Locator) of the translated web page, (2) the stay time (the period until the user moves to the next web page, that is, the period during which the translated web page is displayed), (3) information indicating whether the translated web page shown as a search result was clicked (that is, whether the translated web page was displayed), and (4) the parallel translation for each word used in the search query. The logs are not limited to these; the following may also be collected.
(1) Click log: This is the basic data for the operations below. Mouse operations on the web page (cursor movement, selection operations with the right or left button, and so on) are stored as a log. When the device displaying the web page has a touch panel, operations made with a finger or the like (tap, drag, flick, pinch in, pinch out, long press, shake, etc.) are stored as a log. The click log may be analyzed and the following operations collected as logs.
(2) Operations related to scrolling: Specifically, the number of times scrolling was performed, the distance scrolled, the scroll direction, and the like.
(3) Operations related to the original web page (the page in the original, first language): The “original web page” is displayed together with the “translated web page”. For example, the original web page may be made displayable by using a tag or the like. Specifically, these operations include the number of movements to the “original web page” and the stay time on the “original web page”. When a tag is used, the number of times the “original web page” tag is selected is the number of movements to the “original web page”, and the period from when the “original web page” tag is selected until another tag is selected or the user moves to the next web page is the stay time on the “original web page”.
(4) Operations related to the “Like” button and the like: Operations on the “Like” button or similar in social networking services (for example, Facebook, Twitter). Specifically, the number of times the “Like” button or the like was selected.
(5) Operations related to sales and the like: When the target translated web page is for commercial purposes, the number of purchases, the amount of sales, the number of reservations, and the like.
(6) Operations related to conversion (CV): A conversion is the final result obtainable on a commercial or similar website. For a commercial web page, a product purchase is the conversion; for an information-providing or community translated web page, membership registration or the like is the conversion. Preliminary results such as requests for materials and inquiries may also be counted as conversions. The conversion rate (the ratio of the number of accesses that led to a conversion to the number of accesses to the web page) may also be collected.
(7) Operations related to transition to dictionary search: Specifically, the number of dictionary searches performed on words in the “original web page” and the “translated web page”, the type of dictionary, the searched word, and the like.
(8) Inflow traffic from the search results page: The number of inflows to the target web page from search results of an Internet search system (the number of visits to the web page via a so-called search engine) is used as the operation log.
These operation logs correlate to some degree with translation quality, as follows. If the translation quality is poor, the page may not match the user's search query in the first place. Moreover, if the translation quality of the part displayed on the search results page is poor, for example the product title or the snippet of text shown in the search result, the user is less likely to click through to that web page.
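To make the log items above more concrete, here is a minimal Python sketch that derives a few of them (stay time, number of scrolls, movements to the original web page, dictionary searches, and conversion rate) from a raw event stream. The event names (`open`, `scroll`, `show_original`, `dictionary`, `leave`) and the `Event` record are assumptions made for this example; the patent does not define an event format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float    # seconds since the start of the session
    kind: str   # hypothetical names: "open", "scroll", "show_original", "dictionary", "leave"


def summarize(events):
    """Reduce the raw events for one translated web page to per-page log values."""
    events = sorted(events, key=lambda e: e.t)
    opened = next(e.t for e in events if e.kind == "open")
    left = next(e.t for e in events if e.kind == "leave")
    counts = {}
    for e in events:
        counts[e.kind] = counts.get(e.kind, 0) + 1
    return {
        "stay_time": left - opened,                       # display period of the page
        "scrolls": counts.get("scroll", 0),               # item (2)
        "original_views": counts.get("show_original", 0), # item (3)
        "dictionary_lookups": counts.get("dictionary", 0) # item (7)
    }


def conversion_rate(conversions, accesses):
    """Item (6): accesses that led to a conversion divided by all accesses."""
    return conversions / accesses if accesses else 0.0
```

The sketch assumes each page visit contains exactly one `open` and one `leave` event; a real collector would also have to handle sessions that end without an explicit leave.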

  The log storage module 160 is connected to the search query reception module 110, the log collection module 150, and the accuracy estimation module 170. The log storage module 160 stores a log of the operator's operations on the translated web pages retrieved by the search query received by the search query reception module 110. For example, it stores a log 600. FIG. 6 is an explanatory diagram showing an example of the data structure of the log 600. The log 600 includes a query column 610, a page column 620, a stay time column 630, a click column 640, a parallel translation 1 column 650, a parallel translation 2 column 660, and so on. The query column 610 stores the search query received by the search query reception module 110; here, the string of words included in the search query is stored. The page column 620 stores the URL of the translated web page retrieved by the search query (the translated web page displayed as matching the search query). The stay time column 630 stores the stay time on that translated web page. The click column 640 stores information indicating whether the translated web page was clicked. The parallel translation 1 column 650 stores the parallel translation, in the translated web page, for the first word in the search query. That is, it stores the parallel translation pair from the parallel translation table 500 that the machine translation system used when translating the translated web page. The parallel translation 2 column 660 stores the same kind of content as the parallel translation 1 column 650, for the second word in the search query. Of course, when the search query contains three or more words, further columns equivalent to the parallel translation 1 column 650 follow, and when the search query contains one word, the parallel translation 2 column 660 is unnecessary (NULL may be stored in the parallel translation 2 column 660).
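One possible in-memory shape for a row of the log 600 is sketched below in Python. The `LogRecord` class, the `make_record` helper, the example URL, and the sample words are hypothetical; the variable number of parallel translation columns (650, 660, and so on) is represented here as a list with one entry per query word, which is a design choice of this sketch rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LogRecord:
    query: str                             # column 610: words of the search query
    page: str                              # column 620: URL of the translated web page
    stay_time: float                       # column 630: stay time in seconds
    clicked: bool                          # column 640: whether the page was clicked
    translations: List[Tuple[str, str]]    # columns 650, 660, ...: one pair per query word


def make_record(query: str, page: str, stay_time: float, clicked: bool,
                pairs_by_word: Dict[str, Tuple[str, str]]) -> LogRecord:
    """Build a row; pairs_by_word maps each query word to the
    (first-language word, second-language word) pair used on the page."""
    translations = [pairs_by_word[word] for word in query.split()]
    return LogRecord(query, page, stay_time, clicked, translations)


# Illustrative values only.
record = make_record(
    "赤い 靴", "https://example.com/translated/item1", 48.0, True,
    {"赤い": ("red", "赤い"), "靴": ("shoe", "靴")},
)
```

A single-word query simply yields a one-element list, which corresponds to leaving the parallel translation 2 column 660 empty.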

The accuracy estimation module 170 is connected to the parallel translation storage module 140, the log storage module 160, and the parallel translation change module 180. Using the log stored in the log storage module 160, the accuracy estimation module 170 estimates the accuracy of the parallel translation stored in the parallel translation storage module 140 that corresponds to the word in the search query received by the search query reception module 110.
For example, every page in the search results produced by the search module 120 contains each word of the search query, but a correctly translated page is expected to receive many clicks and long stays, whereas an incorrectly translated page is expected to receive few clicks and short stays. Because the translation module 130 translates using the parallel translation storage module 140, the parallel translation of each word in the search query is known. The accuracy of each parallel translation is estimated using the log stored in the log storage module 160 as a clue. That is, the accuracy value of a parallel translation is increased for a translated web page that was clicked (see the click column 640 of the log 600) and whose stay time (see the stay time column 630 of the log 600) is longer than a predetermined value, the accuracy value of a parallel translation showing the opposite tendency is decreased, and the result is reflected in the parallel translation storage module 140. More specifically, for a translated web page whose stay time is longer than the predetermined value, the translation probability of the parallel translation in that page may be multiplied by a predetermined coefficient (a coefficient of 1 or more) to give the accuracy value. For a translated web page that was not clicked, or whose stay time is less than or equal to the predetermined value, the translation probability may be multiplied by a predetermined coefficient (a coefficient less than 1) to give the accuracy value. Alternatively, the accuracy value may be a value proportional to the stay time, without using the original translation probability.
Besides whether a click occurred and the stay time, the logs used to estimate the accuracy of a parallel translation may include the click log on the translated web page, operations related to scrolling, operations related to the original web page, operations related to the “Like” button and the like, operations related to sales, operations related to conversion (CV), and operations related to transition to dictionary search. For example, the accuracy value of the parallel translation may be increased as the number of clicks on the translated web page, the number of scrolls, the number of “Like” selections, sales, and the conversion rate increase. Likewise, the accuracy value may be increased as the number of movements to the original web page and the number of transitions to dictionary search decrease. In the opposite cases, the accuracy value may be decreased.
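The coefficient rule described above (multiply the translation probability by a coefficient of 1 or more for a clicked page with a long stay, and by a coefficient below 1 otherwise) might be sketched in Python as follows. The function name and the default values for the stay-time threshold and the two coefficients are assumptions for illustration; the text only says they are "predetermined".

```python
def estimate_accuracy(translation_probability, clicked, stay_time,
                      stay_threshold=30.0, boost=1.2, penalty=0.8):
    """Return an accuracy value for one parallel translation on one translated page.

    stay_threshold, boost (>= 1) and penalty (< 1) stand in for the
    "predetermined" values; the defaults here are arbitrary.
    """
    if clicked and stay_time > stay_threshold:
        # Clicked and read for a long time: treat the translation as more reliable.
        return translation_probability * boost
    # Not clicked, or left quickly: treat the translation as less reliable.
    return translation_probability * penalty
```

For example, `estimate_accuracy(0.5, True, 60.0)` raises the value above the original probability, while `estimate_accuracy(0.5, False, 0.0)` lowers it. The result is an accuracy value rather than a normalized probability, so with a boost above 1 it can exceed 1.0; whether to clip or renormalize is left open here, as it is in the text.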

The parallel translation change module 180 is connected to the parallel translation storage module 140 and the accuracy estimation module 170. The parallel translation change module 180 changes the parallel translation stored in the parallel translation storage module 140 based on the accuracy estimated by the accuracy estimation module 170. Here, changing the parallel translation may mean changing the translation probability in the translation probability column 530, in addition to changing the translation in the second language column 520 of the parallel translation table 500. For example, when the accuracy value is less than a predetermined value, the parallel translation may be deleted from the parallel translation storage module 140.
The accuracy value may also be treated as equivalent to the translation probability. That is, the translation probability may be changed by replacing it with the accuracy value estimated by the accuracy estimation module 170, so that the accuracy value becomes the new translation probability.
When the translation probabilities are adjusted and the machine translation system re-translates the original web page (re-creates the translated web page), the translation of original web pages containing the word improves, and translated web pages that previously did not appear in the search results because the word had been translated differently now appear. Moreover, because the translation accuracy of words that are likely to appear in search queries is improved preferentially, search results and sales are expected to improve.
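The two kinds of change mentioned for the parallel translation change module 180, deleting a parallel translation whose accuracy value is below a predetermined value and otherwise using the accuracy value as the new translation probability, can be sketched as below. The dictionary layout, the function name, and the `min_accuracy` default are hypothetical choices for this example.

```python
def apply_accuracy(table, accuracy, min_accuracy=0.1):
    """Return an updated copy of the parallel translation table.

    table:    {(first_language_word, second_language_word): translation probability}
    accuracy: same keys -> accuracy value estimated from the logs
    Pairs without an estimate keep their current probability.
    """
    updated = {}
    for pair, probability in table.items():
        value = accuracy.get(pair, probability)
        if value < min_accuracy:
            continue  # accuracy below the predetermined value: delete the pair
        updated[pair] = value  # accuracy value becomes the new translation probability
    return updated
```

Returning a new dictionary rather than mutating the input keeps the previous table available, which may be convenient if the re-translated pages later need to be compared with the earlier ones; that is a design preference of this sketch only.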

FIG. 2 is an explanatory diagram mainly showing an example of a data flow in the present embodiment (the information processing apparatus 100 and the machine translation system 220).
The machine translation system 220 creates the translation shopping site 210B by translating the overseas shopping site 210A. Then, it is made public on the Internet so that it can be searched by an operation by the user 201, 202 or the like. Each of the users 201, 202, etc. inputs a search query using a Web browser using a terminal possessed by the user (such as a portable information terminal including a notebook PC or a smartphone). The search query is received by the search query receiving module 110, and the search module 120 searches the translated Web page and presents the translated shopping site 210B as a search result. The users 201 and 202 perform operations on the translation shopping site 210B. The history of these operations is collected by the log collection module 150 and stored as a log 230 in the log storage module 160. The accuracy estimation module 170 and the parallel translation change module 180 calculate the translation probability calculation result 240 of the parallel translation. Then, the parallel translation change module 180 corrects the parallel translation in the machine translation system 220 using the translation probability calculation result 240 of the parallel translation.
The machine translation system 220 translates the overseas shopping site 210A again using the corrected parallel translation. In other words, the newly created translation shopping site 210B is a Web page with more accurate translation than the previous translation shopping site 210B.
The overseas shopping site 210A and the translation shopping site 210B may each consist of a plurality of web pages rather than a single page, and the translation probability calculation result 240 calculated from the log 230 for one web page is also used when re-translating the other web pages.

FIG. 3 is an explanatory diagram illustrating an example of a system configuration for realizing the first embodiment. The web page evaluation system 300 includes the information processing apparatus 100, a web page server 310, and a log collection device 320. The Web page evaluation system 300, the machine translation system 220, the terminal 340A, the terminal 340B, the terminal 340C, the terminal 340D, and the terminal 340E are connected via a communication line 399, respectively. The translation module 130 and the parallel translation storage module 140 illustrated in the example of FIG. 1 may be included in the machine translation system 220, and the log collection module 150 may be included in the log collection device 320.
For example, the machine translation system 220 translates the original web page, and the web page that is the translation result is stored in the web page server 310.
Then, the terminal 340A, the terminal 340B, and the like input search queries and access the web pages in the web page server 310 that are returned as search results. At that time, the log collection device 320 collects logs of the operations on the web pages displayed by the terminals 340A, 340B, and the like. For example, the terminals 340A, 340B, and the like detect the operations and transmit the detection results to the log collection device 320 via the communication line 399. The logs collected by the log collection device 320 are then passed to the information processing apparatus 100, which estimates the accuracy of the parallel translations of the words used in the search queries and adjusts those parallel translations. The machine translation system 220 translates the original web page again using the adjusted parallel translations.

FIG. 4 is a flowchart illustrating an example of processing in the first embodiment.
In step S402, the translation module 130 prepares a machine translation Web page.
In step S404, the search query receiving module 110 receives a search query.
In step S406, the log collection module 150 collects logs for the machine translation Web page.
In step S408, the accuracy estimation module 170 calculates a translation probability. In this example, the translation probability is used as the accuracy value.
In step S410, the parallel translation change module 180 determines whether or not the translation probability is equal to or greater than a threshold value. If the translation probability is equal to or greater than the threshold value, the process is terminated (step S499). Otherwise, the process proceeds to step S412.
In step S412, the parallel translation change module 180 corrects the parallel translation data of the words used in the search query.
Further, thereafter, the translation module 130 may recreate the machine translation Web page using the corrected parallel translation data.
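The flow of FIG. 4 (steps S408 to S412) can be sketched as follows. This is a minimal sketch under stated assumptions: the weighted-sum formula, the coefficient values, the 60-second cap on stay time, the threshold of 0.5, and all names are hypothetical and are not given in this description.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One operation-log record for a machine-translated Web page (assumed layout)."""
    query: str        # search query entered by the user
    page: str         # translated Web page shown as a search result
    stay_time: float  # seconds the user stayed on the page
    clicked: bool     # whether the user clicked within the page


def translation_probability(entries, max_stay=60.0, w_stay=0.5, w_click=0.5):
    """Step S408: compute an accuracy value in [0, 1] from the logs.

    The weighted sum below is an assumed formula, not the one in the patent.
    """
    if not entries:
        raise ValueError("at least one log entry is required")
    total = 0.0
    for e in entries:
        stay_score = min(e.stay_time, max_stay) / max_stay
        click_score = 1.0 if e.clicked else 0.0
        total += w_stay * stay_score + w_click * click_score
    return total / len(entries)


def needs_correction(entries, threshold=0.5):
    """Steps S410 and S412: True when the parallel translation should be corrected."""
    probability = translation_probability(entries)  # S408
    if probability >= threshold:                     # S410
        return False                                 # S499: end without correction
    return True                                      # S412: correct the parallel translation
```

A long stay with a click yields a high value and ends the process, whereas a short stay without a click falls below the threshold and triggers correction of the parallel translation data.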

FIG. 7 is a conceptual module configuration diagram of a configuration example according to the second embodiment.
The information processing apparatus 100 includes a search query receiving module 110, a search module 120, a translation module 130, a parallel translation storage module 140, a log collection module 150, a log storage module 160, an accuracy estimation module 170, a parallel translation change module 180, and a learning module 750. Parts of the same kind as in the first embodiment are given the same reference numerals, and overlapping description is omitted.
The parallel translation storage module 140 is connected to the translation module 130, the accuracy estimation module 170, the parallel translation change module 180, and the learning module 750.
The log storage module 160 is connected to the search query reception module 110, the log collection module 150, the accuracy estimation module 170, and the learning module 750. The log storage module 160 adds correct answer information, which indicates whether or not the parallel translation is correct, to the log and stores it. For example, a log 900 is stored. FIG. 9 is an explanatory diagram showing a data structure example of the log 900. The log 900 includes a query column 910, a page column 920, a stay time column 930, a click column 940, a parallel translation 1 column 950, a parallel translation 2 column 960, and a teacher data column 970. The log 900 is obtained by adding the teacher data column 970 to the log 600. The teacher data column 970 stores information indicating that the parallel translation is correct in the translated Web page. Whether or not the parallel translation is correct is the result of manual judgment (by a translator or the like). Since it is assumed that the machine translation system 220 is used for translation, this judgment is not made for all parallel translations. In other words, teacher data is necessary for learning the accuracy estimation method, and parallel translations judged to be correct are used as the teacher data. For this purpose, the judgment is made on the Web pages retrieved as search results.
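One row of the log 900 could be represented as follows. This is an illustrative sketch only: the field types, the use of `None` for rows that have not been judged manually, and the names `Log900Row` and `teacher_rows` are assumptions; the description does not state what the parallel translation 1 and 2 columns hold, so they are treated here as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Log900Row:
    """One row of the log 900 (columns 910 to 970), with assumed types."""
    query: str                           # query column 910
    page: str                            # page column 920
    stay_time: float                     # stay time column 930 (seconds, assumed unit)
    click: int                           # click column 940 (count, assumed)
    parallel_translation_1: str          # parallel translation 1 column 950
    parallel_translation_2: str          # parallel translation 2 column 960
    teacher_data: Optional[bool] = None  # teacher data column 970; None = not judged


def teacher_rows(rows):
    """Return only the rows that carry a manual judgment and can serve as teacher data."""
    return [r for r in rows if r.teacher_data is not None]
```

Making the teacher data optional reflects the point that only some parallel translations are judged manually.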

The learning module 750 is connected to the parallel translation storage module 140, the log storage module 160, and the accuracy estimation module 170. The learning module 750 learns the estimation process of the accuracy estimation module 170 from the logs stored in the log storage module 160, using the correct answer information stored in the log storage module 160 as teacher data. In this learning, the coefficients, threshold values, and the like of the formula for calculating the accuracy value from the stay time (stay time column 930) and click (click column 940) logs, which were predetermined values in the first embodiment, are obtained by learning so that the accuracy value of a correct parallel translation becomes high. As described above, logs other than the stay time and clicks may be used for the accuracy estimation method, in which case learning is also performed on those logs. As a specific learning method, the coefficients and the like may be determined by learning with a neural network.
The accuracy estimation module 170 is connected to the parallel translation storage module 140, the log storage module 160, the parallel translation change module 180, and the learning module 750. The accuracy estimation module 170 estimates the accuracy of the translation according to the estimation process learned by the learning module 750. Therefore, it is possible to perform an accurate estimation process reflecting an actual log, rather than estimating accuracy using a predetermined coefficient or the like as in the first embodiment.
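The learning of coefficients from teacher data can be sketched as follows. Note the assumption: the description only says that a neural network may be used, so this sketch substitutes a single logistic unit trained by gradient descent as the simplest such model. The feature scaling, learning rate, epoch count, and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def learn_coefficients(samples, epochs=2000, lr=0.5):
    """Learn (w_stay, w_click, bias) from teacher data.

    samples: list of (stay_score, click, is_correct), where stay_score is the
    stay time scaled to [0, 1], click is 0 or 1, and is_correct is the manual
    judgment of the parallel translation (all assumed encodings).
    """
    w_stay = w_click = bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        g_stay = g_click = g_bias = 0.0
        for stay, click, correct in samples:
            target = 1.0 if correct else 0.0
            error = sigmoid(w_stay * stay + w_click * click + bias) - target
            g_stay += error * stay
            g_click += error * click
            g_bias += error
        # Batch gradient descent on the logistic loss.
        w_stay -= lr * g_stay / n
        w_click -= lr * g_click / n
        bias -= lr * g_bias / n
    return w_stay, w_click, bias


def accuracy_value(model, stay, click):
    """Estimate the accuracy value with the learned coefficients."""
    w_stay, w_click, bias = model
    return sigmoid(w_stay * stay + w_click * click + bias)
```

With coefficients learned this way, the accuracy value of translations resembling the correct teacher examples becomes high, in place of the fixed coefficients of the first embodiment.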

FIG. 8 is a flowchart illustrating an example of learning processing according to the second embodiment.
In step S802, the translation module 130 prepares a machine translation Web page.
In step S804, the search query receiving module 110 receives a search query.
In step S806, the log collection module 150 collects a log for the machine translation Web page.
In step S808, the learning module 750 generates a translation probability calculation model using the teacher data log. In this example, the translation probability is used as the accuracy value.
In step S810, the accuracy estimation module 170 incorporates a translation probability calculation model.
Thereafter, the processing according to the flowchart shown in the example of FIG. 4 is performed to change the parallel translation. Note that the parallel translation to be changed may be a parallel translation other than the teacher data.
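Applying the learned model to parallel translations other than the teacher data can be sketched as follows. This is one possible reading, shown under assumptions: the row keys, the threshold of 0.5, the function name, and the choice to skip manually judged rows entirely are all hypothetical.

```python
def translations_to_change(rows, model, threshold=0.5):
    """Return parallel translations whose estimated accuracy is below the threshold.

    rows: list of dicts with the hypothetical keys 'translation', 'stay_score',
          'click', and 'teacher' (True/False for judged rows, None otherwise).
    model: callable (stay_score, click) -> accuracy value in [0, 1].
    """
    selected = []
    for row in rows:
        if row["teacher"] is not None:
            continue  # manually judged rows are used only for learning here
        if model(row["stay_score"], row["click"]) < threshold:
            selected.append(row["translation"])
    return selected
```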

  Note that the hardware configuration of the computer on which the program according to the present embodiment is executed is that of a general computer as illustrated in FIG. 10, specifically a personal computer, a computer that can serve as a server, a mobile phone (including a smartphone), or the like. That is, as a specific example, the CPU 1001 is used as a processing unit (arithmetic unit), and the RAM 1002, the ROM 1003, and the HD 1004 are used as storage devices. For example, a hard disk may be used as the HD 1004. The computer is composed of: the CPU 1001, which executes programs such as the search query reception module 110, the search module 120, the translation module 130, the log collection module 150, the accuracy estimation module 170, the parallel translation change module 180, and the learning module 750; the RAM 1002, which stores the programs and data; the ROM 1003, which stores a program for starting the computer and the like; the HD 1004, which is an auxiliary storage device; a reception device 1006, which receives data based on a user's operation on a keyboard, a mouse, a touch panel, or the like; an output device 1005 such as a liquid crystal display; a communication line interface 1007, such as a network interface card, for connecting to a communication network; and a bus 1008 that connects these components so that they can exchange data. A plurality of these computers may be connected to each other via a network.

Among the above-described embodiments, those implemented by a computer program are realized by causing a system having this hardware configuration to read the computer program, which is software, so that the software and the hardware resources cooperate with each other.
Note that the hardware configuration illustrated in FIG. 10 is only one configuration example. The present embodiment is not limited to the configuration illustrated in FIG. 10, and any configuration capable of executing the modules described in the present embodiment may be used. For example, some modules may be configured by dedicated hardware (for example, an ASIC), and some modules may reside in an external system and be connected via a communication line. A plurality of systems shown in FIG. 5 may be connected to each other via communication lines so as to cooperate with each other.

  Further, in the description of the above-described embodiment, the expressions "equal to or greater than", "equal to or less than", "greater than", and "smaller than (less than)" used in comparisons with a predetermined value may be replaced with "greater than", "smaller than (less than)", "equal to or greater than", and "equal to or less than", respectively, as long as no contradiction arises in the combination.

The program described above may be provided by being stored in a recording medium, or the program may be provided by communication means. In that case, for example, the above-described program may be regarded as an invention of a “computer-readable recording medium recording the program”.
The “computer-readable recording medium on which a program is recorded” refers to a computer-readable recording medium on which a program is recorded, which is used for program installation, execution, program distribution, and the like.
The recording medium includes, for example: a digital versatile disc (DVD), such as "DVD-R, DVD-RW, DVD-RAM, etc.", which are standards established by the DVD Forum, and "DVD+R, DVD+RW, etc.", which are standards established by DVD+RW; a compact disc (CD), such as a read-only memory (CD-ROM), a CD recordable (CD-R), and a CD rewritable (CD-RW); a Blu-ray (registered trademark) Disc; a magneto-optical disk (MO); a flexible disk (FD); a magnetic tape; a hard disk; a read-only memory (ROM); an electrically erasable and rewritable read-only memory (EEPROM (registered trademark)); a flash memory; a random access memory (RAM); and an SD (Secure Digital) memory card.
The program or a part of the program may be recorded on the recording medium for storage or distribution. It may also be transmitted by communication, for example over a wired network used for a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), the Internet, an intranet, an extranet, or the like, over a wireless communication network, or over a transmission medium combining these, or it may be carried on a carrier wave.
Furthermore, the program may be a part of another program, or may be recorded on a recording medium together with a separate program. It may also be divided and recorded on a plurality of recording media. Further, it may be recorded in any form, such as compressed or encrypted, as long as it can be restored.
The above-described embodiment may be grasped as follows.
[A] An information processing apparatus comprising:
query receiving means for receiving a search query that includes a word in a second language and is used to search translated Web pages obtained by translating, by a machine translation system, original Web pages described in a first language into the second language;
parallel translation storage means for storing a word in the first language, used by the machine translation system to translate the original Web page into the translated Web page, and a word in the second language that is its parallel translation;
log storage means for storing a log of the operator's operations on the translated Web page retrieved by the search query received by the query receiving means; and
estimating means for estimating, using the log stored in the log storage means, the accuracy of the parallel translation that is stored in the parallel translation storage means and corresponds to the word in the search query received by the query receiving means.
[B] The information processing apparatus according to [A], further comprising parallel translation changing means for changing the parallel translation stored in the parallel translation storage means based on the accuracy estimated by the estimating means.
[C] The information processing apparatus according to [A] or [B], wherein
the log storage means adds correct answer information, indicating whether or not the parallel translation is correct, to the log and stores it,
the apparatus further comprises learning means for learning the estimation process of the estimating means from the log stored in the log storage means, using the correct answer information stored in the log storage means as teacher data, and
the estimating means estimates the accuracy of the parallel translation according to the estimation process learned by the learning means.
[D] An information processing method performed by an information processing apparatus including parallel translation storage means and log storage means, wherein
the parallel translation storage means stores a word in a first language, used by a machine translation system to translate an original Web page into a translated Web page, and a word in a second language that is its parallel translation,
the method comprising:
a query receiving step of receiving a search query that includes a word in the second language and is used to search translated Web pages obtained by translating, by the machine translation system, original Web pages described in the first language into the second language;
a step of storing, in the log storage means, a log of the operator's operations on the translated Web page retrieved by the search query received in the query receiving step; and
an estimating step of estimating, using the log stored in the log storage means, the accuracy of the parallel translation that is stored in the parallel translation storage means and corresponds to the word in the search query received in the query receiving step.
[E] An information processing program for causing a computer to function as:
query receiving means for receiving a search query that includes a word in a second language and is used to search translated Web pages obtained by translating, by a machine translation system, original Web pages described in a first language into the second language;
parallel translation storage means for storing a word in the first language, used by the machine translation system to translate the original Web page into the translated Web page, and a word in the second language that is its parallel translation;
log storage means for storing a log of the operator's operations on the translated Web page retrieved by the search query received by the query receiving means; and
estimating means for estimating, using the log stored in the log storage means, the accuracy of the parallel translation that is stored in the parallel translation storage means and corresponds to the word in the search query received by the query receiving means.

DESCRIPTION OF SYMBOLS
100 ... Information processing apparatus
110 ... Search query reception module
120 ... Search module
130 ... Translation module
140 ... Parallel translation storage module
150 ... Log collection module
160 ... Log storage module
170 ... Accuracy estimation module
180 ... Parallel translation change module
300 ... Web page evaluation system
310 ... Web page server
320 ... Log collection device
340 ... Terminal
399 ... Communication line
750 ... Learning module

Claims (5)

  1. An information processing apparatus comprising:
    query accepting means for accepting a search query including a word in a second language; and
    estimation means for estimating the accuracy of the parallel translation corresponding to the word in the search query accepted by the query accepting means, using a log of the operator's operations on the translated Web page retrieved by that search query,
    wherein the search query is for performing a search on translated Web pages obtained by translating, by a machine translation system, original Web pages described in a first language into the second language, and
    the parallel translation is a parallel translation corresponding to a word in the first language used by the machine translation system to translate the original Web page into the translated Web page.
  2. The information processing apparatus according to claim 1, further comprising: a parallel translation changing unit that changes the parallel translation based on the accuracy estimated by the estimation unit.
  3. The information processing apparatus according to claim 1, further comprising learning means for learning the estimation processing of the estimation means, using as teacher data a log to which correct answer information indicating whether the parallel translation is correct has been added,
    wherein the estimation means estimates the accuracy of the parallel translation according to the estimation processing learned by the learning means.
  4. An information processing method performed by an information processing apparatus, the method comprising:
    a query accepting step of accepting a search query including a word in a second language; and
    an estimation step of estimating the accuracy of the parallel translation corresponding to the word in the search query accepted in the query accepting step, using a log of the operator's operations on the translated Web page retrieved by that search query,
    wherein the search query is for performing a search on translated Web pages obtained by translating, by a machine translation system, original Web pages described in a first language into the second language, and
    the parallel translation is a parallel translation corresponding to a word in the first language used by the machine translation system to translate the original Web page into the translated Web page.
  5. An information processing program for causing a computer to function as:
    query accepting means for accepting a search query including a word in a second language; and
    estimation means for estimating the accuracy of the parallel translation corresponding to the word in the search query accepted by the query accepting means, using a log of the operator's operations on the translated Web page retrieved by that search query,
    wherein the search query is for performing a search on translated Web pages obtained by translating, by a machine translation system, original Web pages described in a first language into the second language, and
    the parallel translation is a parallel translation corresponding to a word in the first language used by the machine translation system to translate the original Web page into the translated Web page.
JP2013128180A 2013-06-19 2013-06-19 Information processing apparatus, information processing method, and information processing program Active JP5787934B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2013128180A JP5787934B2 (en) 2013-06-19 2013-06-19 Information processing apparatus, information processing method, and information processing program


Publications (2)

Publication Number Publication Date
JP2015005011A JP2015005011A (en) 2015-01-08
JP5787934B2 true JP5787934B2 (en) 2015-09-30

Family

ID=52300895

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2013128180A Active JP5787934B2 (en) 2013-06-19 2013-06-19 Information processing apparatus, information processing method, and information processing program

Country Status (1)

Country Link
JP (1) JP5787934B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6554841B2 (en) * 2015-03-16 2019-08-07 富士ゼロックス株式会社 Information processing apparatus and information processing program
WO2017175275A1 (en) * 2016-04-04 2017-10-12 株式会社ミニマル・テクノロジーズ Translation system

Also Published As

Publication number Publication date
JP2015005011A (en) 2015-01-08


Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20150410

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20150421

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20150501

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20150623

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20150630

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20150721

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20150728

R150 Certificate of patent or registration of utility model

Ref document number: 5787934

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

S533 Written request for registration of change of name

Free format text: JAPANESE INTERMEDIATE CODE: R313533