CN108304412A - Cross-language search method and apparatus, and device for cross-language search - Google Patents

Info

Publication number
CN108304412A
Authority
CN
China
Prior art keywords
search result
translation
default
exposition
translation model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710025472.6A
Other languages
Chinese (zh)
Other versions
CN108304412B (en)
Inventor
翟飞飞
张骏
许静芳
薛征山
祝天刚
于恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710025472.6A
Publication of CN108304412A
Application granted
Publication of CN108304412B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the present invention provides a cross-language search method and apparatus, and a device for cross-language search. The method specifically includes: obtaining a search term in a first language; obtaining search results in a second language according to the search term; and, for each search result in the second language, performing the following steps: determining a target translation model corresponding to each preset display part of the search result; obtaining, by using the target translation model, a translated search result corresponding to each preset display part of the search result; and displaying, to the user, the translated search result corresponding to each preset display part of the search result. Embodiments of the present invention can improve the accuracy of translated search results.

Description

Cross-language search method and apparatus, and device for cross-language search
Technical field
The present invention relates to the field of information search technology, and in particular to a cross-language search method and apparatus, and a device for cross-language search.
Background
With the continuous growth of information on the internet, people place ever higher demands on information search: they are no longer satisfied with searching a database in a single language and want to obtain material in multiple languages. For example, if the search term (query) entered by the user is "Donald Trump", searching Chinese databases alone may not fully meet the user's needs, because English databases originating from European and American websites may contain more, and better, search results.
Cross-language search technology combines information retrieval technology with machine translation technology. An existing cross-language search scheme may be implemented as follows: first, machine translation converts the search term in the source language into a search term in the target language; then, information retrieval is performed in the corresponding monolingual databases using the source-language search term and the target-language search term respectively, to obtain multilingual search results. The multilingual search results may include both source-language search results and target-language search results.
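The existing scheme described above can be sketched in a few lines of Python. This is an illustration under assumptions, not any real system: the dictionary-backed query translator and the two in-memory monolingual "databases" are hypothetical stand-ins for a machine translation system and real search indexes.

```python
from typing import Callable, Dict, List

def existing_cross_language_search(
    query: str,
    translate_query: Callable[[str], str],
    source_db: Dict[str, List[str]],
    target_db: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Translate the query, then search each monolingual database with its own query."""
    target_query = translate_query(query)  # source-language query -> target-language query
    return {
        "source": source_db.get(query, []),         # results in the source language
        "target": target_db.get(target_query, []),  # results in the target language
    }

# Toy stand-ins: exact-match lookups instead of a real index.
query_dictionary = {"Donald Trump": "Trump"}
source_db = {"Donald Trump": ["source-language page A"]}
target_db = {"Trump": ["English page 1", "English page 2"]}

results = existing_cross_language_search(
    "Donald Trump", lambda q: query_dictionary.get(q, q), source_db, target_db
)
print(results["target"])  # → ['English page 1', 'English page 2']
```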
To meet the needs of users who cannot read the target language, or who read it only to a limited degree, the existing scheme may use a translation model to translate the target-language search results into translated search results in the source language.
While implementing the embodiments of the present invention, the inventors found that the existing scheme has at least the following problem: it generally uses a general-purpose translation model to translate target-language search results, and the limitations of that general-purpose model tend to reduce the accuracy of the translated search results. In other words, the translated search results obtained by the existing scheme have relatively low accuracy.
Summary of the invention
In view of the above problems, embodiments of the present invention are proposed to provide a cross-language search method, a cross-language search apparatus, and a device for cross-language search that overcome, or at least partially solve, the above problems. Embodiments of the present invention can improve the accuracy of translated search results.
To solve the above problems, the present invention discloses a cross-language search method, including:
Obtaining a search term in a first language;
Obtaining search results in a second language according to the search term;
For each search result in the second language, performing the following steps:
Determining a target translation model corresponding to each preset display part of the search result;
Obtaining, by using the target translation model, a translated search result corresponding to each preset display part of the search result;
Displaying, to the user, the translated search result corresponding to each preset display part of the search result.
Optionally, the step of determining the target translation model corresponding to each preset display part of the search result includes:
Determining the display type corresponding to each preset display part included in the search result;
Obtaining, according to the display type, the target translation model corresponding to each preset display part.
Optionally, if the display type corresponding to a preset display part is the title class, obtaining the target translation model corresponding to each preset display part includes: obtaining a title translation model, where the title translation model is trained on a title corpus;
And/or
If the display type corresponding to a preset display part is the abstract class, obtaining the target translation model corresponding to each preset display part includes: obtaining an abstract translation model, where the abstract translation model is trained on an abstract corpus;
And/or
If the display type corresponding to a preset display part is the page content class, obtaining the target translation model corresponding to each preset display part includes: obtaining a page content translation model, where the page content translation model is trained on a preset page content corpus.
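A minimal sketch of this display-type-to-model lookup follows. The display-type keys ("title", "abstract", "page_content"), the select_target_model helper, and the fallback to a general-purpose model for types without a dedicated model are all illustrative assumptions. The "and/or" wording means not every type needs its own model. The stand-in models are plain string functions rather than trained translation models.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]

def select_target_model(
    display_type: str,
    models: Dict[str, Translator],
    general_model: Translator,
) -> Translator:
    """Return the model trained for this display type, else the general model (assumed fallback)."""
    return models.get(display_type, general_model)

# Stand-in "models": each just tags its input so the routing is visible.
models: Dict[str, Translator] = {
    "title": lambda text: f"[title model] {text}",
    "abstract": lambda text: f"[abstract model] {text}",
    "page_content": lambda text: f"[page content model] {text}",
}

def general_model(text: str) -> str:
    return f"[general model] {text}"

print(select_target_model("title", models, general_model)("Trump news"))  # → [title model] Trump news
```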
Optionally, if the preset display part is the title part, the step of obtaining, by using the target translation model, the translated search result corresponding to each preset display part of the search result includes:
Identifying the preset symbols contained in the title part;
Splitting the title part into multiple semantic units according to the preset symbols;
Translating each semantic unit obtained by the split using the first target translation model corresponding to the title part, to obtain a translation result for each semantic unit;
Combining the translation results of the semantic units according to the preset symbols, to obtain a first translated search result corresponding to the title part, where the first translated search result contains the preset symbols.
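The split, translate, and recombine steps for the title part can be sketched as follows. The patent does not say which symbols are preset. The separators "|", " - " and "_" used here are assumptions, chosen because they are common in web page titles such as "Article name - Site name". str.upper merely stands in for the first target translation model.

```python
import re
from typing import Callable, List

Translator = Callable[[str], str]

# Assumed preset symbols; the source does not enumerate them.
PRESET_SYMBOLS: List[str] = ["|", " - ", "_"]

def translate_title(title: str, model: Translator, symbols: List[str] = PRESET_SYMBOLS) -> str:
    """Split a title on preset symbols, translate each semantic unit, and rejoin with the symbols."""
    # A capturing group makes re.split keep the separators in its output.
    pattern = "(" + "|".join(re.escape(s) for s in symbols) + ")"
    out = []
    for piece in re.split(pattern, title):
        if piece in symbols:
            out.append(piece)  # keep the preset symbol unchanged
        elif piece.strip():
            # Preserve surrounding whitespace; translate only the semantic unit itself.
            lead = piece[: len(piece) - len(piece.lstrip())]
            trail = piece[len(piece.rstrip()):]
            out.append(lead + model(piece.strip()) + trail)
        else:
            out.append(piece)  # empty or whitespace-only fragment
    return "".join(out)

print(translate_title("trump news | example site", str.upper))  # → TRUMP NEWS | EXAMPLE SITE
```

Translating each unit separately keeps the model from treating a site name as part of the headline. Passing the symbols through untouched guarantees they survive in the first translated search result.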
Optionally, the step of translating each semantic unit obtained by the split using the first target translation model corresponding to the title part includes:
Inputting each semantic unit together with its corresponding context into the first target translation model, to obtain the translation result of each semantic unit output by the first target translation model.
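The sketch below shows one way to picture context-aware unit translation. What counts as a unit's "context" is not specified. This sketch assumes it is the other semantic units of the same title. toy_model is a hypothetical stand-in that only shows how context can change the output for an ambiguous unit.

```python
from typing import Callable, List

ContextTranslator = Callable[[str, str], str]

def translate_units_with_context(units: List[str], model: ContextTranslator) -> List[str]:
    """Translate each semantic unit, passing the rest of the title as its context."""
    results = []
    for i, unit in enumerate(units):
        # Assumption: the context of a unit is the other units of the same title.
        context = " ".join(u for j, u in enumerate(units) if j != i)
        results.append(model(unit, context))
    return results

def toy_model(unit: str, context: str) -> str:
    """Hypothetical stand-in: disambiguates 'apple' using the context, echoes everything else."""
    if unit.lower() == "apple":
        return "Apple Inc." if "iphone" in context.lower() else "apple (fruit)"
    return unit

print(translate_units_with_context(["Apple", "iPhone review"], toy_model))  # → ['Apple Inc.', 'iPhone review']
```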
Optionally, if the preset display part is the abstract part, the step of obtaining, by using the target translation model, the translated search result corresponding to each preset display part of the search result includes:
Extracting, from the abstract part, target content located at a preset position;
Translating the target content using a second target translation model corresponding to the preset position, to obtain a corresponding second translated search result.
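The following is a sketch of position-based extraction for the abstract part, and it rests on heavy assumptions. The patent does not define the preset positions. This example hypothetically treats "head" as the first sentence of the abstract and "rest" as the remainder, each bound to its own model. The regular-expression sentence split is deliberately naive, and str.upper/str.lower stand in for trained models.

```python
import re
from typing import Callable, Dict

Translator = Callable[[str], str]

def extract_at_position(abstract: str, position: str) -> str:
    """Return the target content at a (hypothetical) preset position."""
    # Split once after the first sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip(), maxsplit=1)
    if position == "head":
        return parts[0]
    if position == "rest":
        return parts[1] if len(parts) > 1 else ""
    raise ValueError(f"unknown preset position: {position!r}")

def translate_abstract(abstract: str, position: str, models: Dict[str, Translator]) -> str:
    """Translate only the content at `position` with the model bound to that position."""
    target_content = extract_at_position(abstract, position)
    return models[position](target_content)

models: Dict[str, Translator] = {"head": str.upper, "rest": str.lower}
print(translate_abstract("Jan 5, 2017. Trump Said Hello.", "head", models))  # → JAN 5, 2017.
```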
Optionally, the method further includes: determining the target category to which the search result belongs;
Obtaining, according to the display type, the target translation model corresponding to each preset display part then includes:
Obtaining the target translation model corresponding to each preset display part by combining the target category to which the search result belongs with the display type corresponding to each preset display part.
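Combining category and display type amounts to a two-key lookup. In this sketch, several details are assumptions made for illustration: the (category, display_type) key layout, the category names, and the fallback to a per-type model when no category-specific model exists.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]

def select_model(
    category: str,
    display_type: str,
    category_models: Dict[Tuple[str, str], Translator],
    type_models: Dict[str, Translator],
) -> Translator:
    """Prefer a model trained for (category, display type); otherwise use the per-type model."""
    key = (category, display_type)
    if key in category_models:
        return category_models[key]
    return type_models[display_type]

# Stand-in models that tag their input so the selection is visible.
category_models: Dict[Tuple[str, str], Translator] = {
    ("sports", "title"): lambda t: f"[sports title] {t}",
}
type_models: Dict[str, Translator] = {
    "title": lambda t: f"[title] {t}",
    "abstract": lambda t: f"[abstract] {t}",
}

print(select_model("sports", "title", category_models, type_models)("Lakers win"))  # → [sports title] Lakers win
```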
Optionally, the step of determining the target category to which the search result belongs includes:
Matching the content of the search result against the dictionary of each preset category, to obtain a matching rate for each preset category;
Taking the preset category with the highest matching rate among all preset categories as the target category to which the search result belongs.
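A minimal sketch of dictionary-based category matching follows. The patent does not define the matching rate. Here it is assumed to be the fraction of whitespace-separated tokens in the content that appear in a category's dictionary. Both toy dictionaries are invented for the example.

```python
from typing import Dict, List, Set

def match_rate(tokens: List[str], dictionary: Set[str]) -> float:
    """Assumed definition: share of tokens found in the category dictionary."""
    if not tokens:
        return 0.0
    return sum(token in dictionary for token in tokens) / len(tokens)

def classify_by_dictionary(content: str, dictionaries: Dict[str, Set[str]]) -> str:
    """Return the preset category whose dictionary has the highest match rate."""
    tokens = content.lower().split()
    rates = {category: match_rate(tokens, words) for category, words in dictionaries.items()}
    return max(rates, key=rates.get)

dictionaries: Dict[str, Set[str]] = {
    "sports": {"nba", "game", "score", "team"},
    "finance": {"stock", "market", "shares", "earnings"},
}
print(classify_by_dictionary("Stock market rallies as shares climb", dictionaries))  # → finance
```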
Optionally, the step of determining the target category to which the search result belongs includes:
Inputting the content of the search result into a classifier, and taking the classification result output by the classifier as the target category to which the search result belongs, where the classifier is trained on search result samples of each preset category.
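The patent does not name a classifier type. As one possible choice, the sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on labeled search result samples. The class name, the whitespace tokenization, and the training samples are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple

class NaiveBayesCategoryClassifier:
    """Multinomial naive Bayes over whitespace tokens (one assumed choice of classifier)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing strength
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayesCategoryClassifier":
        """Train on (search result content, preset category) samples."""
        for text, category in samples:
            tokens = text.lower().split()
            self.word_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> Optional[str]:
        """Return the category with the highest log posterior for `text`."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                # Counter returns 0 for unseen tokens; smoothing keeps the log finite.
                score += math.log((counts[token] + self.alpha) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best

samples = [
    ("nba game final score", "sports"),
    ("team wins game", "sports"),
    ("stock market shares fall", "finance"),
    ("market rally lifts stock", "finance"),
]
classifier = NaiveBayesCategoryClassifier().fit(samples)
print(classifier.predict("stock shares"))  # → finance
```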
In another aspect, the present invention discloses a cross-language search apparatus, including:
A search term acquisition module, configured to obtain a search term in a first language;
A search result acquisition module, configured to obtain search results in a second language according to the search term;
A search result processing module, configured to process each search result in the second language;
The search result processing module includes: a translation model determination module, a translated search result acquisition module, and a translated search result display module;
The translation model determination module is configured to determine, for each search result in the second language, a target translation model corresponding to each preset display part of the search result;
The translated search result acquisition module is configured to obtain, by using the target translation model, a translated search result corresponding to each preset display part of the search result; and
The translated search result display module is configured to display, to the user, the translated search result corresponding to each preset display part of the search result.
Optionally, the translation model determination module includes: a display type determination module and a translation model acquisition submodule;
The display type determination module is configured to determine the display type corresponding to each preset display part included in the search result;
The translation model acquisition submodule is configured to obtain, according to the display type, the target translation model corresponding to each preset display part.
Optionally, if the display type corresponding to a preset display part is the title class, the translation model acquisition submodule includes: a first translation model acquisition unit;
The first translation model acquisition unit is configured to obtain a title translation model, where the title translation model is trained on a title corpus;
And/or
If the display type corresponding to a preset display part is the abstract class, the translation model acquisition submodule includes: a second translation model acquisition unit;
The second translation model acquisition unit is configured to obtain an abstract translation model, where the abstract translation model is trained on an abstract corpus;
And/or
If the display type corresponding to a preset display part is the page content class, the translation model acquisition submodule includes: a third translation model acquisition unit;
The third translation model acquisition unit is configured to obtain a page content translation model, where the page content translation model is trained on a preset page content corpus.
Optionally, if the preset display part is the title part, the translated search result acquisition module includes: an identification submodule, a splitting submodule, a first translation submodule, and a combination submodule;
The identification submodule is configured to identify the preset symbols contained in the title part;
The splitting submodule is configured to split the title part into multiple semantic units according to the preset symbols;
The first translation submodule is configured to translate each semantic unit obtained by the split using the first target translation model corresponding to the title part, to obtain a translation result for each semantic unit;
The combination submodule is configured to combine the translation results of the semantic units according to the preset symbols, to obtain a first translated search result corresponding to the title part, where the first translated search result contains the preset symbols.
Optionally, the first translation submodule includes: a translation unit;
The translation unit is configured to input each semantic unit together with its corresponding context into the first target translation model, to obtain the translation result of each semantic unit output by the first target translation model.
Optionally, if the preset display part is the abstract part, the translated search result acquisition module includes: an extraction submodule and a second translation submodule;
The extraction submodule is configured to extract, from the abstract part, target content located at a preset position;
The second translation submodule is configured to translate the target content using a second target translation model corresponding to the preset position, to obtain a corresponding second translated search result.
Optionally, the apparatus further includes: a category determination module;
The category determination module is configured to determine the target category to which the search result belongs;
The translation model acquisition submodule includes: a model acquisition unit;
The model acquisition unit is configured to obtain the target translation model corresponding to each preset display part by combining the target category to which the search result belongs with the display type corresponding to each preset display part.
Optionally, the category determination module includes: a matching submodule and a determination submodule;
The matching submodule is configured to match the content of the search result against the dictionary of each preset category, to obtain a matching rate for each preset category;
The determination submodule is configured to take the preset category with the highest matching rate among all preset categories as the target category to which the search result belongs.
Optionally, the category determination module includes: a classification submodule;
The classification submodule is configured to input the content of the search result into a classifier and take the classification result output by the classifier as the target category to which the search result belongs, where the classifier is trained on search result samples of each preset category.
In yet another aspect, the present invention discloses a device for cross-language search, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:
Obtaining a search term in a first language;
Obtaining search results in a second language according to the search term;
For each search result in the second language, performing the following steps:
Determining a target translation model corresponding to each preset display part of the search result;
Obtaining, by using the target translation model, a translated search result corresponding to each preset display part of the search result;
Displaying, to the user, the translated search result corresponding to each preset display part of the search result.
Embodiments of the present invention have the following advantages:
When translating second-language search results in cross-language search, embodiments of the present invention can first determine a target translation model corresponding to each preset display part of the search result. They can then use that target translation model to obtain the translated search result corresponding to each preset display part. The target translation model can thus be a translation model suited to each preset display part; that is, it can translate from the second language into the first language according to the characteristics of each preset display part, which improves the accuracy of the translated search results.
Description of the drawings
Fig. 1 is a schematic diagram of an application environment of a cross-language search method of the present invention;
Fig. 2 is a flowchart of the steps of Embodiment 1 of a cross-language search method of the present invention;
Fig. 3 is a structural block diagram of an embodiment of a cross-language search apparatus of the present invention;
Fig. 4 is a block diagram of a device 900 for cross-language search of the present invention when the device is a terminal; and
Fig. 5 is a schematic structural diagram of a device for cross-language search of the present invention when the device is a server.
Detailed description of the embodiments
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
In embodiments of the present invention, machine translation is regarded as a process of information transmission and is explained with a channel model. This view holds that translating a source-language sentence into a target-language sentence is a probability problem. Any target-language sentence may be a translation of any source-language sentence, only with a different probability, and the task of machine translation is to find the sentence with the highest probability. Specifically, translation is viewed as a process of decoding the original text, through a model transformation, into the translation. Building a translation model can therefore be divided into three problems: the modeling problem, the training problem, and the decoding problem. The modeling problem is to establish a probabilistic translation model for machine translation, that is, to define how the translation probability from a source-language sentence to a target-language sentence is computed. The training problem is to obtain all parameters of this model from a corpus. The decoding problem is, given the known translation model and its parameters, to search for the highest-probability translation of any input source-language sentence.
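The channel-model view is usually written as the noisy-channel decision rule. For a source sentence s, choose t̂ = argmax_t P(t | s) = argmax_t P(t) · P(s | t). The second equality follows from Bayes' rule, because P(s) does not depend on t. The patent does not give this formula; it is the standard formulation of the idea described above. The toy decoder below assumes the candidate translations and their probabilities are already known. It therefore illustrates only the final argmax of the decoding problem, not a real search over translations, and the probabilities are invented.

```python
import math
from typing import Dict, Tuple

def channel_decode(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Pick the candidate t maximizing log P(t) + log P(s | t).

    `candidates` maps each candidate target sentence t to (P(t), P(s | t))
    for one fixed source sentence s. All probabilities must be positive.
    """
    def score(t: str) -> float:
        p_t, p_s_given_t = candidates[t]
        return math.log(p_t) + math.log(p_s_given_t)  # log space avoids underflow
    return max(candidates, key=score)

# Hypothetical probabilities for three candidate translations of one source sentence.
candidates = {
    "the cat sat": (0.02, 0.3),
    "cat the sat": (0.0001, 0.3),  # same P(s | t), but a poor language-model score
    "the cat sits": (0.01, 0.2),
}
print(channel_decode(candidates))  # → the cat sat
```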
While implementing the embodiments of the present invention, the inventors found that the existing scheme generally uses a general-purpose translation model to translate target-language search results. This general-purpose model produces the same translated search result whenever the input text is the same. However, different types of search results usually have their own characteristics. Translating every type with a general-purpose model therefore tends to reduce the accuracy of the translated search results; that is, the translated search results obtained by the existing scheme have relatively low accuracy.
To address this technical problem of low accuracy of translated search results in the existing scheme, embodiments of the present invention provide a cross-language search scheme. The scheme can obtain a search term in a first language and obtain search results in a second language according to the search term. For each second-language search result, it can determine a target translation model corresponding to each preset display part of the search result, and obtain, by using the target translation model, the translated search result corresponding to each preset display part. It then displays these translated search results to the user. Because a target translation model is first determined for each preset display part and then used to translate that part, the target translation model can be a translation model suited to that display part. That is, it can translate from the second language into the first language according to the characteristics of each preset display part, which improves the accuracy of the translated search results.
In embodiments of the present invention, the search term in the first language may first be translated into a search term in the second language. Retrieval is then performed in a second-language database using the second-language search term, to obtain search results in the second language. Thus, a second-language search result denotes a search result corresponding to the second-language search term. A translated search result denotes a first-language result obtained by translating a second-language search result. A second-language search result and its first-language translated search result can correspond to the same item (such as a web page, video, picture, or music); one difference between the two is the language in which they are presented.
In an application example of the present invention, if the search term in the first language is "Donald Trump" and the corresponding search term in the second language is "Trump", retrieval can be performed in an English database using "Trump" to obtain English search results. Each preset display part of each search result is then translated with the target translation model corresponding to that part, to obtain the corresponding translated search results.
Embodiments of the present invention can be applied to platforms with a cross-language search function, such as search apps and search websites (for example, search engines). They can provide users not only with search results from databases in multiple languages but also with more accurate translated search results. This meets the needs of users who cannot read the target language or read it only to a limited degree. The cross-language search method of the embodiments is mainly described here using a search app as an example; the method for other platforms, such as search websites, can be understood by analogy.
The cross-language search method provided by embodiments of the present invention can be applied in the application environment shown in Fig. 1. As shown in Fig. 1, the client 100 and the server 200 are located in a wired or wireless network, through which they exchange data.
The cross-language search flow of embodiments of the present invention may be executed by either or both of the client 100 and the server 200:
For example, the client 100 may receive a first-language search term entered by the user and send it to the server 200. After receiving the first-language search term, the server 200 may obtain second-language search results according to the search term. For each second-language search result, it may determine a target translation model corresponding to each preset display part of the search result, and obtain, by using the target translation model, the translated search result corresponding to each preset display part. It then sends these translated search results to the client 100, so that the client 100 displays to the user the translated search result corresponding to each preset display part of the search result.
When the server 200 obtains the second-language search results and/or the translated search results, its abundant computing resources can be fully exploited to improve the efficiency and accuracy of obtaining them. For example, a cloud server may be equipped with many high-specification computing devices that can be used to obtain the second-language search results and/or translated search results efficiently and accurately. This also saves computing resources on the client 100 side and improves the performance of the intelligent terminal on which the client 100 runs.
Of course, the second-language search results and/or the translated search results may also be obtained by the client 100; embodiments of the present invention do not limit which entity performs this acquisition.
Optionally, the client 100 may run on an intelligent terminal, including but not limited to: smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 players, laptop computers, in-vehicle computers, desktop computers, set-top boxes, smart TVs, and wearable devices.
Method embodiment 1
Referring to Fig. 2, a flowchart of the steps of Embodiment 1 of a cross-language search method of the present invention is shown. The method may specifically include the following steps:
Step 201: obtaining a search term in a first language;
Step 202: obtaining search results in a second language according to the search term;
For each search result in the second language, the following steps are performed:
Step 203: determining a target translation model corresponding to each preset display part of the search result;
Step 204: obtaining, by using the target translation model, a translated search result corresponding to each preset display part of the search result;
Step 205: displaying, to the user, the translated search result corresponding to each preset display part of the search result.
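The per-result loop of steps 201 to 205 can be sketched as below. This is a sketch under stated assumptions, not the patented implementation. SearchResult, the search callable (standing in for the retrieval of step 202), and the models dictionary keyed by display type are hypothetical. str.upper and str.lower merely stand in for trained translation models. Display (step 205) is represented by returning the translated parts to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Translator = Callable[[str], str]

@dataclass
class SearchResult:
    # Preset display parts of one second-language result, keyed by display type.
    parts: Dict[str, str] = field(default_factory=dict)

def cross_language_search(
    query: str,                                   # step 201: first-language search term
    search: Callable[[str], List[SearchResult]],  # hypothetical retrieval backend
    models: Dict[str, Translator],                # one target model per display type
) -> List[Dict[str, str]]:
    translated_results = []
    for result in search(query):                  # step 202: second-language results
        shown: Dict[str, str] = {}
        for display_type, text in result.parts.items():
            model = models[display_type]          # step 203: pick the target model
            shown[display_type] = model(text)     # step 204: translate this part
        translated_results.append(shown)
    return translated_results                     # step 205: handed to the UI for display

def fake_search(query: str) -> List[SearchResult]:
    """Hypothetical backend returning one fixed English result."""
    return [SearchResult({"title": "Trump news", "abstract": "Latest Updates"})]

print(cross_language_search("Donald Trump", fake_search, {"title": str.upper, "abstract": str.lower}))
# → [{'title': 'TRUMP NEWS', 'abstract': 'latest updates'}]
```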
In embodiments of the present invention, the search term in the first language may be entered by the user in the first language. In practice, the client of a search app or search website may provide a UI (user interface). The user may then submit the first-language search term to the client through the search box, a voice interface, or other means on that UI. Whichever way the user submits the first-language search term, the client can display the received search term in the search box. Therefore, in embodiments of the present invention, the first-language search term entered by the user may include a first-language search term submitted to the client in any manner. It can be understood that embodiments of the present invention do not limit the specific way in which the first-language search term is obtained.
In embodiments of the present invention, the first language and the second language denote two different languages. They may be preset by the user, or determined by the search app or search website by analyzing the user's search behavior and/or browsing behavior. Optionally, the search app or search website may take the language the user uses most often as the first language, and another language the user has used as the second language. For example, if the user's search behavior shows that the user's previous search terms were in Chinese, the first language can be determined to be Chinese. If the user's browsing behavior also shows that the user has visited a translation website and translated between Chinese and English there, the second language can be determined to be English. It can be understood that there may be one or more second languages. For example, for a native Chinese speaker, the first language may be Chinese and the second language may be one or a combination of English, Japanese, Korean, German, and French. The cross-language search method of the embodiments is mainly described with Chinese as the first language and English as the second language; the method for other combinations of first and second languages can be understood by analogy.
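A minimal sketch of the optional behavior-based rule above follows. It assumes two inputs that have already been recorded as language tags: the detected language of each past query, and the language pairs of past translation-site sessions. infer_languages, its inputs, and the tag values are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def infer_languages(
    query_languages: Iterable[str],
    translation_pairs: Iterable[Tuple[str, str]],
) -> Tuple[str, List[str]]:
    """Guess (first language, second languages) from recorded user behavior.

    query_languages: detected language tag of each past query (must be non-empty).
    translation_pairs: language pairs the user translated between on translation sites.
    """
    first = Counter(query_languages).most_common(1)[0][0]  # most frequently used language
    second = sorted({lang for pair in translation_pairs for lang in pair if lang != first})
    return first, second

print(infer_languages(["zh", "zh", "en"], [("zh", "en")]))  # → ('zh', ['en'])
```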
In practice, in step 202 the client or the server can translate the first-language search term into a second-language search term. It then searches a second-language database with that term to obtain second-language search results. With English as the second language, for example, the English database may store data from European and American websites. Embodiments of the present invention do not limit how the second-language search results are obtained from the search term.
Optionally, translating the first-language search term into a second-language search term may yield several different candidate translations. In that case, there are two options:
- Select the candidate with the highest confidence and search with it to obtain the second-language search results.
- Search separately with one or more of the candidates and use the combined results as the second-language search results.

In one application example, if the first-language search term is the Chinese for "Donald Trump", the second-language search term may be "Trump".
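Both options for handling several candidate translations can be sketched as follows. The sketch assumes the query translator returns (translation, confidence) pairs. The function names and the confidence values in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple

def select_query_translations(candidates: Sequence[Tuple[str, float]],
                              top_k: int = 1) -> List[str]:
    """Rank candidate second-language queries by confidence and keep the best top_k.

    top_k=1: search only with the most confident translation.
    top_k>1: search with several candidates and merge their results.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Deduplicate candidate strings while keeping rank order.
    unique = list(dict.fromkeys(query for query, _ in ranked))
    return unique[:top_k]

def merge_results(result_lists: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-query result lists, dropping repeats and keeping first-seen order."""
    return list(dict.fromkeys(r for results in result_lists for r in results))
```

With candidates [("Donald Trump", 0.7), ("Trump", 0.9)], top_k=1 returns ["Trump"].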
For each search result obtained in step 202, step 203 determines a target translation model for each preset display section of the search result. Step 204 then uses the target translation models from step 203 to obtain a translated search result for each preset display section. Each target translation model is therefore adapted to its preset display section: it translates from the second language into the first language according to that section's characteristics. This improves the accuracy of the translated search results.
In an optional embodiment of the present invention, the step of determining the target translation model for each preset display section of the search result may include:
- determining the display type of each preset display section contained in the search result; and
- obtaining, according to the display type, the target translation model for each preset display section.

The display type reflects the characteristics of a preset display section. A target translation model adapted to each section can therefore be obtained from its display type, which improves the accuracy of the translated search results.
In embodiments of the present invention, a preset display section denotes content that is preset to be displayed for a search result. Based on the preset display sections a search result contains and their display types, embodiments provide the following schemes for obtaining the target translation model for each preset display section:
Acquisition scheme 1
In acquisition scheme 1, the preset display section may include a title section, whose display type may be the title class. The target translation model for the title section may then be a title translation model, trained on a title corpus.
The title section of a search result has characteristics of its own. It usually takes the form of a short sentence, a phrase, or a word group, and often contains special preset symbols such as "-", "|", and "...". Embodiments of the present invention can therefore collect a title corpus from search results in advance. Optionally, this may be a bilingual corpus or an aligned corpus, that is, a corpus in which mutually translated words in bilingual sentence pairs are matched. A title translation model is then trained on the title corpus. The title corpus shares the characteristics of title sections. A model trained on it can therefore account for short sentences, phrases, and word groups, and for features such as preset symbols, giving more accurate translated search results for title sections.
Acquisition scheme 2
In acquisition scheme 2, the preset display section may include an abstract section, whose display type may be the abstract class. The target translation model for the abstract section may then be an abstract translation model, trained on an abstract corpus.
The abstract section of a search result also has characteristics of its own. It usually takes the form of long sentences. It may also contain a specific type of content at a specific position, for example relatively fixed content such as a time or an information source at the beginning of the abstract. Embodiments can therefore collect an abstract corpus from search results in advance. Optionally, it may be a bilingual corpus or an aligned corpus. An abstract translation model is then trained on the abstract corpus. The abstract corpus shares the characteristics of abstract sections, so the resulting model can account for long sentences and position-specific content. This gives more accurate translated search results for abstract sections.
Acquisition scheme 3
In acquisition scheme 3, the preset display section may include a page content section, whose display type may be the page content class. The target translation model for the page content section may then be a page content translation model, trained on a preset page content corpus.
Besides the title and abstract sections, some websites add a page content section to their search results. This gives users more specific information about the site. For example, an e-commerce website may use its page content section to show promotions. A news website may use its page content section to show trending news events. In both cases, the aim is to catch the user's attention.
The page content section a website sets usually relates to the site itself. The page content of an e-commerce website usually concerns products, and that of a news website usually concerns news. Embodiments can therefore collect a preset page content corpus in advance, taken from search results. Optionally, it may be a bilingual corpus or an aligned corpus. A page content translation model is then trained on it. The corpus shares the characteristics of page content sections, so the resulting model can account for them. This gives more accurate translated search results for page content sections.
Acquisition schemes 1 to 3 above describe in detail how the target translation model for each preset display section is obtained. Those skilled in the art may use any one of schemes 1 to 3, or a combination of them, as the application requires. They may also use other schemes for other preset display sections. Embodiments of the present invention do not limit how the target translation model for a preset display section is obtained.
In an optional embodiment of the present invention, the title, abstract, and page content translation models of acquisition schemes 1 to 3 can be further refined according to the target category of the search result. This further improves the accuracy of the translated search results. Accordingly, the method may also include determining the target category of the search result. The step of obtaining, according to the display type of each preset display section, the corresponding target translation model may then include: obtaining the target translation model for each preset display section by combining the target category of the search result with the display type of that section.
Specifically, combining the target category of the search result with the display type of each preset display section to obtain its target translation model may include the following. If the preset display section is a title section with display type title class, the title translation model for the title section may be the title translation model for the target category, trained on the title corpus within that target category;
and/or
if the preset display section is an abstract section with display type abstract class, the abstract translation model for the abstract section may be the abstract translation model for the target category, trained on the abstract corpus within that target category;
and/or
if the preset display section is a page content section with display type page content class, the page content translation model for the page content section may be the page content translation model for the target category, trained on the preset page content corpus within that target category.
Optionally, target categories may include e-commerce, forums, news, novels, videos, and so on. The title corpus, abstract corpus, and preset page content corpus for each target category can then be collected from the search results of that category.
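The model lookup of this combined scheme can be sketched as a table keyed by (target category, display type). Several details are assumptions for illustration:
- When no category-specific model exists, the lookup falls back to the category-agnostic model of acquisition schemes 1 to 3.
- The display types use the string keys "title", "abstract", and "page_content".
- The function names are illustrative.

The models are plain callables standing in for trained NMT or SMT systems.

```python
from typing import Callable, Dict, Optional, Tuple

Model = Callable[[str], str]
# Key: (target category, or None for the category-agnostic model; display type)
ModelKey = Tuple[Optional[str], str]

def select_target_model(models: Dict[ModelKey, Model], display_type: str,
                        category: Optional[str] = None) -> Model:
    """Prefer the model trained on this category's corpus for this display type;
    otherwise fall back to the general model for the display type."""
    if category is not None and (category, display_type) in models:
        return models[(category, display_type)]
    return models[(None, display_type)]

def translate_sections(sections: Dict[str, str], models: Dict[ModelKey, Model],
                       category: Optional[str] = None) -> Dict[str, str]:
    """Translate every preset display section of one search result."""
    return {dtype: select_target_model(models, dtype, category)(text)
            for dtype, text in sections.items()}
```

Keeping the general per-display-type models as a fallback means that a category without enough training data still gets a section-aware model.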
In an optional embodiment of the present invention, the step of determining the target category of the search result may include:
- matching the content of the search result against the lexicon of each preset category, to obtain a matching rate for each preset category; and
- taking the preset category with the highest matching rate as the target category of the search result.

The content of the search result may be the content of the webpage the result points to (the web page content). It may also be the content of the result's preset display sections.
Optionally, the matching rate may be obtained as follows. Segment the preset display sections and/or web page content of the search result into words. Count the total number of words N and the number M of those words that appear in the lexicon of the preset category. Take the ratio M/N as the matching rate. Embodiments of the present invention do not limit how the matching rate is obtained.
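A minimal sketch of the lexicon-matching rule follows. It assumes the search result text has already been segmented into words; in the usage example, whitespace splitting stands in for a real Chinese word segmenter. It also assumes each preset category's lexicon is a set of words. N counts every word occurrence, including repeats. That choice, like the function names, is an assumption.

```python
from typing import Dict, List, Set, Tuple

def matching_rate(words: List[str], lexicon: Set[str]) -> float:
    """M / N: the share of the N segmented words that appear in the category lexicon."""
    if not words:
        return 0.0
    m = sum(1 for w in words if w in lexicon)
    return m / len(words)

def target_category(words: List[str], lexicons: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the preset category with the highest matching rate, and that rate."""
    rates = {cat: matching_rate(words, lex) for cat, lex in lexicons.items()}
    best = max(rates, key=rates.get)
    return best, rates[best]
```

For the words "buy cheap phone free shipping", an e-commerce lexicon containing buy, cheap, and shipping gives M = 3 and N = 5, so the rate is 0.6.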
In another optional embodiment of the present invention, the step of determining the target category of the search result may use a classifier. The content of the search result is fed into the classifier, and the classifier's output is taken as the target category of the search result. The classifier is trained on search result samples of each preset category. It decides which preset category a search result belongs to; its output is the target category.
Note that the various translation models and the classifier of embodiments of the present invention can be trained by machine learning. Embodiments do not limit their concrete types. For example, a translation model may be a neural machine translation (NMT) model or a statistical machine translation (SMT) model. The classifier may be a support vector machine (SVM), a Bayesian classifier, and so on.
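For the classifier route, a multinomial naive Bayes model is among the simplest to sketch; it is one kind of the Bayesian classifiers named above. The version below uses add-one smoothing. It assumes pre-segmented training samples, each labeled with its preset category. The class name and the sample data in the usage example are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

class NaiveBayesCategorizer:
    """Multinomial naive Bayes over segmented words, with add-one smoothing."""

    def fit(self, samples: Sequence[Tuple[List[str], str]]) -> "NaiveBayesCategorizer":
        self.doc_counts: Counter = Counter()                          # samples per category
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # word counts per category
        self.vocab = set()
        for words, category in samples:
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocab.update(words)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, words: List[str]) -> str:
        """Return the category with the highest log posterior for these words."""
        best, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(n_docs / self.total_docs)          # log prior
            for w in words:                                      # smoothed log likelihoods
                count = self.word_counts[category][w]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best, best_score = category, score
        return best
```

Working in log space avoids underflow when a search result contains many words. Add-one smoothing keeps an unseen word from zeroing out a category.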
Step 204 uses the target translation model obtained in step 203 to obtain the translated search result for each preset display section of the search result. In an optional embodiment of the present invention, translation rules can also be preset according to the characteristics of each preset display section. The translation model is then applied according to those rules to obtain more accurate translated search results.
Embodiments of the present invention provide the following translation schemes, which use the target translation model to obtain the translated search result for each preset display section of the search result:
Translation scheme 1
In translation scheme 1, the preset display section may include a title section. The step of using the target translation model to obtain the translated search result for each preset display section may then include:
Identifying the preset symbols contained in the title section;
Dividing the title section into multiple semantic units according to the preset symbols;
Translating each resulting semantic unit with the first target translation model corresponding to the title section, to obtain a translation for each semantic unit;
Combining the translations of the semantic units according to the preset symbols, to obtain the first translated search result for the title section; the first translated search result contains the preset symbols.
The first target translation model for the title section may be the title translation model described above, or another translation model for title sections. A semantic unit may be a character, a word, a phrase, a word group, or a short sentence.
In practice, title sections often contain special preset symbols such as "-", "|", and "...". Embodiments of the present invention can preset translation rules for these symbols and apply the translation model according to those rules to obtain more accurate translated search results. Specifically, when translating with the first target translation model, the semantic units on either side of a preset symbol are translated separately. Their translations are then combined. The combined first translated search result keeps the preset symbols, and it keeps the relative positions of the semantic units on either side of them. This improves the accuracy of the first translated search result for the title section.
In an optional embodiment of the present invention, context is added so that separate translation does not fragment phrases or sentences. The step of translating each semantic unit with the target translation model for the title section may then include: inputting each semantic unit, together with its context, into the first target translation model to obtain the model's translation of that unit. Context is taken into account even though the units are translated separately. The first translated search result therefore remains coherent as a whole.
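Translation scheme 1 can be sketched as follows. The sketch assumes the first target translation model is exposed as a callable that takes a semantic unit plus its context, where the context is the other units of the same title. The callable signature, the function names, and the dictionary-based stand-in model in the usage example are hypothetical. Splitting on a bare "-" has one limitation: hyphenated words such as "e-commerce" would also be split.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical model interface: (semantic_unit, context) -> translation.
UnitTranslator = Callable[[str, str], str]

PRESET_SYMBOLS = ("-", "|", "...")  # the preset symbols named in the text

def split_title(title: str, symbols: Sequence[str] = PRESET_SYMBOLS) -> List[str]:
    """Split a title on preset symbols, keeping them: [unit, symbol, unit, ..., unit]."""
    # Longest symbols first, so a longer symbol wins over a shorter one it starts with.
    alternatives = sorted(symbols, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(s) for s in alternatives) + ")"
    return re.split(pattern, title)

def translate_title(title: str, translate: UnitTranslator,
                    symbols: Sequence[str] = PRESET_SYMBOLS) -> str:
    parts = split_title(title, symbols)
    units, seps = parts[0::2], parts[1::2]
    translated = []
    for i, unit in enumerate(units):
        core = unit.strip()
        if not core:  # nothing to translate, e.g. text before a leading symbol
            translated.append(unit)
            continue
        # Context: the other units of the same title, so a unit is not translated in isolation.
        context = " ".join(u.strip() for j, u in enumerate(units) if j != i and u.strip())
        lead = unit[: len(unit) - len(unit.lstrip())]
        trail = unit[len(unit.rstrip()):]
        translated.append(lead + translate(core, context) + trail)
    # Re-insert each preset symbol at its original position between the translated units.
    out = translated[0]
    for sep, unit in zip(seps, translated[1:]):
        out += sep + unit
    return out
```

With a toy English-to-Chinese dictionary as the model, "Trump - News | Reuters" becomes "特朗普 - 新闻 | 路透社". The symbols and surrounding spacing are preserved.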
Translation scheme 2
In translation scheme 2, the preset display section may include an abstract section. The step of using the target translation model to obtain the translated search result for each preset display section may then include:
Extracting, from the abstract section, the target content located at a preset position;
Translating the target content with the second target translation model corresponding to the preset position, to obtain the corresponding second translated search result.
Embodiments of the present invention observe that in abstract sections, a specific type of content appears at a specific position. For example, relatively fixed content such as a time or an information source appears at the beginning of the abstract. Example abstract sections:
Example 1: 44 replies - Posted: April 15, 2014
Example 2: 28 minutes ago - MOSCOW, Jan.11 (Xinhua) -- The Kremlin on Wednesday denied that it has compromising materials on U.S. President-elect Donald Trump
Example 1 is the abstract section of a search result in the forum category. At its beginning, "44 replies" gives the reply count of the forum post and "Posted: April 15, 2014" gives its posting time. The reply count and posting time are characteristic of abstract sections of forum search results.
Example 2 is the abstract section of a search result in the news category. At its beginning, "28 minutes ago" gives the time elapsed since the news item was published. "MOSCOW, Jan.11 (Xinhua)" gives its publication date and information source. The elapsed time, publication date, and information source are characteristic of abstract sections of news search results.
Examples 1 and 2 come from forum and news search results. In fact, the abstract sections of search results in other categories likewise contain specific types of content at specific positions. Embodiments of the present invention can exploit this by training a second target translation model for each preset position. During translation, the target content at the preset position is extracted from the abstract section. It is then translated with the second target translation model for that position, giving the corresponding second translated search result. The second target translation model is trained on a preset corpus for that position, so it fits the characteristics of that corpus. This gives more accurate translations of the content at the preset position.
Note that the second target translation model of embodiments of the present invention may be specific to both the target category and the preset position. It can then be trained on the preset corpus for that position within the target category.
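Translation scheme 2 can be sketched as follows. The sketch assumes that each preset position is recognized by a regular expression anchored at the start of the second-language abstract, and that each position has its own model. The patterns are hypothetical and cover only the shapes of Examples 1 and 2. The names and the tagging stand-in models in the usage example are also assumptions.

```python
import re
from typing import Callable, Dict, List

Model = Callable[[str], str]

# Hypothetical patterns for content that tends to open an abstract;
# each name stands for one preset position with its own model.
PREFIX_PATTERNS = [
    ("reply_count", re.compile(r"^\d+\s+repl(?:y|ies)\s*-\s*")),
    ("post_time", re.compile(r"^Posted:\s*[A-Z][a-z]+ \d{1,2}, \d{4}\s*")),
    ("relative_time", re.compile(r"^\d+\s+(?:minutes?|hours?|days?)\s+ago\s*-\s*")),
    ("dateline", re.compile(r"^[A-Z][A-Z .]*,\s*[A-Z][a-z]{2}\.\s?\d{1,2}\s+\([^)]+\)\s*--\s*")),
]

def translate_abstract(abstract: str, position_models: Dict[str, Model],
                       body_model: Model) -> List[str]:
    """Peel off leading preset-position content and translate each piece with its
    position's model; translate the remainder with the general abstract model."""
    rest, pieces = abstract.strip(), []
    while True:
        for name, pattern in PREFIX_PATTERNS:
            match = pattern.match(rest)
            if match:
                pieces.append(position_models[name](match.group(0).strip(" -")))
                rest = rest[match.end():]
                break
        else:  # no pattern matched: what remains is ordinary abstract body text
            break
    if rest:
        pieces.append(body_model(rest))
    return pieces
```

Applied to Example 2, the elapsed time and the dateline each go to their own position model. "The Kremlin on Wednesday denied ..." goes to the general abstract model.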
Step 204 uses the target translation model to obtain the translated search result for each preset display section of the search result. Step 205 then displays these translated search results to the user. The client may display the translated search results of one or more of the preset display sections.
In summary, the cross-language search method of embodiments of the present invention works as follows when translating the second-language search results of a cross-language search. It first determines the target translation model for each preset display section of a search result. It then uses those models to obtain the translated search result for each section. Each target translation model is adapted to its preset display section and translates from the second language into the first according to that section's characteristics. This improves the accuracy of the translated search results.
Note that, for simplicity of description, the method embodiments are presented as a series of combined actions. Those skilled in the art should understand, however, that embodiments of the present invention are not limited by the order of the actions described: according to embodiments of the present invention, some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments. The actions involved are not necessarily required by embodiments of the present invention.
Device embodiment
Referring to Fig. 3, a structural block diagram of a cross-language search device embodiment of the present invention is shown. The device may specifically include: a search term acquisition module 301, a search result acquisition module 302, and a search result processing module 303;
the search term acquisition module 301 is configured to obtain a first-language search term;
the search result acquisition module 302 is configured to obtain second-language search results according to the search term;
the search result processing module 303 is configured to process each second-language search result;
the search result processing module 303 may include: a translation model determination module 3031, a translated search result acquisition module 3032, and a translated search result display module 3033;
the translation model determination module 3031 is configured to determine the target translation model for each preset display section of the search result;
the translated search result acquisition module 3032 is configured to use the target translation model to obtain the translated search result for each preset display section of the search result; and
the translated search result display module 3033 is configured to display to the user the translated search result for each preset display section of the search result.
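The module structure of Fig. 3 can be sketched as one class whose methods mirror modules 301, 302, and 303, with modules 3031 and 3032 folded into the processing step. The query translator, search backend, and per-display-type models are injected as hypothetical callables. The sketch illustrates the data flow only and is not the device itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Section = Dict[str, str]  # display type -> text of that preset display section

@dataclass
class CrossLanguageSearchDevice:
    translate_query: Callable[[str], str]             # hypothetical query translator
    search: Callable[[str], List[Section]]            # hypothetical second-language search backend
    section_models: Dict[str, Callable[[str], str]]   # display type -> target translation model

    def acquire_search_term(self, raw: str) -> str:              # module 301
        return raw.strip()

    def acquire_results(self, term: str) -> List[Section]:       # module 302
        return self.search(self.translate_query(term))

    def process(self, results: List[Section]) -> List[Section]:  # module 303
        translated = []
        for result in results:
            # 3031: pick the model for each display type; 3032: apply it.
            translated.append({dtype: self.section_models[dtype](text)
                               for dtype, text in result.items()})
        return translated

    def run(self, raw: str) -> List[Section]:
        # Module 3033 would render this list to the user; here it is returned.
        return self.process(self.acquire_results(self.acquire_search_term(raw)))
```

Injecting the translator, backend, and models as callables keeps the sketch independent of any particular search engine or translation system.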
Optionally, the translation model determination module 3031 may include: a display type determination submodule and a translation model acquisition submodule;
the display type determination submodule is configured to determine the display type of each preset display section contained in the search result;
the translation model acquisition submodule is configured to obtain, according to the display type, the target translation model for each preset display section.
Optionally, if the display type of the preset display section is the title class, the translation model acquisition submodule may include: a first translation model acquisition unit;
the first translation model acquisition unit is configured to obtain a title translation model, trained on a title corpus;
and/or
if the display type of the preset display section is the abstract class, the translation model acquisition submodule may include: a second translation model acquisition unit;
the second translation model acquisition unit is configured to obtain an abstract translation model, trained on an abstract corpus;
and/or
if the display type of the preset display section is the page content class, the translation model acquisition submodule may include: a third translation model acquisition unit;
the third translation model acquisition unit is configured to obtain a page content translation model, trained on a preset page content corpus.
Optionally, if the preset display section is a title section, the translated search result acquisition module 3032 may include: an identification submodule, a division submodule, a first translation submodule, and a combination submodule;
the identification submodule is configured to identify the preset symbols contained in the title section;
the division submodule is configured to divide the title section into multiple semantic units according to the preset symbols;
the first translation submodule is configured to translate each resulting semantic unit with the first target translation model for the title section, to obtain a translation for each semantic unit;
the combination submodule is configured to combine the translations of the semantic units according to the preset symbols, to obtain the first translated search result for the title section; the first translated search result may contain the preset symbols.
Optionally, the first translation submodule may include: a translation unit;
the translation unit is configured to input each semantic unit together with its context into the first target translation model, to obtain that model's translation of each semantic unit.
Optionally, if the preset display section is an abstract section, the translated search result acquisition module 3032 may include: an extraction submodule and a second translation submodule;
the extraction submodule is configured to extract, from the abstract section, the target content located at a preset position;
the second translation submodule is configured to translate the target content with the second target translation model for the preset position, to obtain the corresponding second translated search result.
Optionally, the device may further include: a category determination module;
the category determination module is configured to determine the target category of the search result;
the translation model acquisition submodule may include: a model acquisition unit;
the model acquisition unit is configured to obtain the target translation model for each preset display section by combining the target category of the search result with the display type of that section.
Optionally, the category determination module may include: a matching submodule and a determination submodule;
the matching submodule is configured to match the content of the search result against the lexicon of each preset category, to obtain a matching rate for each preset category;
the determination submodule is configured to take the preset category with the highest matching rate as the target category of the search result.
Optionally, the category determination module may include: a classification submodule;
the classification submodule is configured to feed the content of the search result into a classifier and to take the classifier's output as the target category of the search result; the classifier is trained on search result samples of each preset category.
The device embodiment is essentially similar to the method embodiment, so it is described only briefly; for related details, refer to the description of the method embodiment.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be referred to one another for the parts they share.
For the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
Fig. 4 is a block diagram of a device 900 for cross-language search, implemented as a terminal, according to an exemplary embodiment. For example, device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, and so on.
Referring to Fig. 4, device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operation of device 900, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 920 that execute instructions to perform all or some of the steps of the methods described above. The processing component 902 may also include one or more modules that facilitate interaction between it and other components. For example, it may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof. Examples include static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
Power supply module 906 provides electric power for the various assemblies of device 900.Power supply module 906 may include power management system System, one or more power supplys and other generated with for device 900, management and the associated component of distribution electric power.
The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the device 900 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 may detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900; it may also detect a change in position of the device 900 or of one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and a change in the temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 904 including instructions, which can be executed by the processor 920 of the device 900 to complete the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform a cross-language search method, the method including: obtaining a search term in a first language; obtaining search results in a second language according to the search term; and, for each search result in the second language, performing the following steps: determining a target translation model corresponding to each preset display portion of the search result; obtaining, by using the target translation model, a translated search result corresponding to each preset display portion of the search result; and displaying to the user the translated search result corresponding to each preset display portion of the search result.
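As an illustration only, the per-result flow of this method can be sketched in Python. The sketch assumes that a search result is a mapping from display-portion names (such as "title" and "abstract") to text, that each translation model is a plain callable, and that `search_fn` already handles obtaining second-language results from the first-language search term. `cross_language_search`, `search_fn`, and the portion names are hypothetical and are not part of the described system.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a translation model is any callable from source text to target text.
Translator = Callable[[str], str]
# Hypothetical representation: a search result maps display-portion names to text.
SearchResult = Dict[str, str]


def cross_language_search(
    query: str,
    search_fn: Callable[[str], List[SearchResult]],
    models: Dict[str, Translator],
) -> List[SearchResult]:
    """Search with a first-language query, then translate every preset display
    portion of each second-language result with the model chosen for that portion."""
    translated_results = []
    for result in search_fn(query):  # second-language results (retrieval itself is assumed)
        translated = {}
        for portion, text in result.items():
            model = models[portion]  # target translation model for this display portion
            translated[portion] = model(text)
        translated_results.append(translated)
    return translated_results  # these are what would be displayed to the user


if __name__ == "__main__":
    # Toy stand-ins for the search engine and the per-portion models.
    def toy_search(q: str) -> List[SearchResult]:
        return [{"title": "hello world", "abstract": "a short summary"}]

    toy_models = {"title": str.upper, "abstract": lambda s: s[::-1]}
    print(cross_language_search("你好", toy_search, toy_models))
```

Keeping models in a dictionary keyed by portion name is one simple way to make "a different model per display portion" explicit; a real system would substitute trained models for the toy callables.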
Optionally, determining the target translation model corresponding to each preset display portion of the search result includes:
determining the display type corresponding to each preset display portion included in the search result;
obtaining, according to the display type, the target translation model corresponding to each preset display portion.
Optionally, if the display type corresponding to the preset display portion is the title type, obtaining the target translation model corresponding to each preset display portion includes: obtaining a title translation model, the title translation model being trained on a title corpus;
and/or
if the display type corresponding to the preset display portion is the abstract type, obtaining the target translation model corresponding to each preset display portion includes: obtaining an abstract translation model, the abstract translation model being trained on an abstract corpus;
and/or
if the display type corresponding to the preset display portion is the page content type, obtaining the target translation model corresponding to each preset display portion includes: obtaining a page content translation model, the page content translation model being trained on a preset page content corpus.
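A minimal sketch of choosing a model by display type follows. It assumes the three display types are keyed as "title", "abstract", and "page_content", and it uses a toy `MemorizingTranslator` that simply memorizes (source, target) pairs from its corpus. This class is a hypothetical stand-in for a real statistical or neural translation model trained on the same corpus; the toy corpora only show that the same source text can translate differently depending on which corpus its model was trained on.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source text, target text) from a parallel corpus


class MemorizingTranslator:
    """Toy stand-in for a trained translation model (hypothetical): it memorizes
    the parallel corpus and returns unseen input unchanged."""

    def __init__(self, corpus: List[Pair]):
        self.table = dict(corpus)

    def __call__(self, text: str) -> str:
        return self.table.get(text, text)


def build_models(corpora: Dict[str, List[Pair]]) -> Dict[str, MemorizingTranslator]:
    # One model per display type, each trained only on that type's corpus.
    return {display_type: MemorizingTranslator(pairs) for display_type, pairs in corpora.items()}


def get_target_model(models: Dict[str, MemorizingTranslator], display_type: str) -> MemorizingTranslator:
    if display_type not in models:
        raise KeyError(f"no translation model for display type {display_type!r}")
    return models[display_type]


# Toy corpora; real ones would hold many aligned sentence pairs per display type.
models = build_models({
    "title": [("Apple", "苹果公司")],
    "abstract": [("Apple", "苹果")],
    "page_content": [("Apple", "苹果")],
})
print(get_target_model(models, "title")("Apple"))     # 苹果公司
print(get_target_model(models, "abstract")("Apple"))  # 苹果
```

Raising an error for an unknown display type, rather than silently falling back, is a design choice of this sketch; a production system might instead route unknown types to a general-purpose model.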
Optionally, if the preset display portion is a title portion, obtaining, by using the target translation model, the translated search result corresponding to each preset display portion of the search result includes:
identifying the preset symbols contained in the title portion;
dividing the title portion into a plurality of semantic units according to the preset symbols;
translating each semantic unit obtained by the division by using a first target translation model corresponding to the title portion, to obtain a translation result for each semantic unit;
combining the translation results of the semantic units according to the preset symbols, to obtain a first translated search result corresponding to the title portion, where the first translated search result includes the preset symbols.
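The title-splitting steps can be sketched as follows. The source does not list the preset symbols, so `PRESET_SYMBOLS` below is an assumed set of common title separators, and `model` is any hypothetical callable that translates one semantic unit. A capturing group in the `re.split` pattern keeps the separators in the resulting list, which is what allows the translated units to be recombined around the original symbols.

```python
import re
from typing import Callable, List

# Assumed preset symbols; the source does not enumerate them.
PRESET_SYMBOLS = ["|", " - ", "_", ":"]


def translate_title(title: str, model: Callable[[str], str],
                    symbols: List[str] = PRESET_SYMBOLS) -> str:
    # The capturing group makes re.split return the matched symbols as well.
    pattern = "(" + "|".join(re.escape(s) for s in symbols) + ")"
    out = []
    for piece in re.split(pattern, title):
        if piece in symbols:
            out.append(piece)  # keep the preset symbol unchanged
        elif piece.strip():
            # Translate the semantic unit, preserving its surrounding whitespace.
            leading = piece[: len(piece) - len(piece.lstrip())]
            trailing = piece[len(piece.rstrip()):]
            out.append(leading + model(piece.strip()) + trailing)
        else:
            out.append(piece)  # empty or whitespace-only piece
    return "".join(out)


print(translate_title("Python Tutorial | W3Schools", str.upper))
```

Using `" - "` (with spaces) rather than a bare hyphen avoids splitting hyphenated words such as "cross-language" into separate units.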
Optionally, translating, by using the first target translation model corresponding to the title portion, each semantic unit obtained by the division includes:
inputting each semantic unit together with its corresponding context into the first target translation model, to obtain the translation result of each semantic unit output by the first target translation model.
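The sketch below illustrates feeding each semantic unit to the model together with its context. How the context is chosen is not specified in the source; here it is assumed to be the neighbouring units within `window` positions, and `ContextModel` is a hypothetical two-argument interface. The toy model only demonstrates how context can disambiguate a unit.

```python
from typing import Callable, List

# Hypothetical interface: (semantic unit, context) -> translation.
ContextModel = Callable[[str, str], str]


def translate_units_with_context(units: List[str], model: ContextModel,
                                 window: int = 1) -> List[str]:
    results = []
    for i, unit in enumerate(units):
        # Assumed context: up to `window` neighbouring units on each side.
        left = units[max(0, i - window):i]
        right = units[i + 1:i + 1 + window]
        context = " ".join(left + right)
        results.append(model(unit, context))
    return results


def toy_model(unit: str, context: str) -> str:
    # Toy disambiguation: "Apple" next to a fruit-related unit means the fruit.
    if unit == "Apple":
        return "苹果" if "fruit" in context.lower() else "苹果公司"
    return unit


print(translate_units_with_context(["Apple", "Fruit Facts"], toy_model))
```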
Optionally, if the preset display portion is an abstract portion, obtaining, by using the target translation model, the translated search result corresponding to each preset display portion of the search result includes:
extracting target content located at a preset position from the abstract portion;
translating the target content by using a second target translation model corresponding to the preset position, to obtain a corresponding second translated search result.
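The source does not say what the preset position is. Purely as an assumption, the sketch treats a leading date of the form "Jan 13, 2017 - " at the start of an abstract as the content at the preset position, translates it with a rule-based date formatter standing in for the second target translation model, and passes the remainder to a general abstract model. `LEADING_DATE`, `translate_date`, and `translate_abstract` are hypothetical names.

```python
import re
from typing import Callable

# Assumed preset position: a leading date such as "Jan 13, 2017 - " in the abstract.
LEADING_DATE = re.compile(r"^([A-Z][a-z]{2}) (\d{1,2}), (\d{4})\s*-\s*")
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def translate_date(month: str, day: str, year: str) -> str:
    # Rule-based stand-in for the position-specific (second target) model:
    # renders the date in Chinese year/month/day order.
    return f"{year}年{MONTHS[month]}月{int(day)}日"


def translate_abstract(abstract: str, body_model: Callable[[str], str]) -> str:
    match = LEADING_DATE.match(abstract)
    if match is None or match.group(1) not in MONTHS:
        return body_model(abstract)  # nothing found at the preset position
    date_text = translate_date(*match.groups())
    rest = abstract[match.end():]
    return date_text + " - " + body_model(rest)


print(translate_abstract("Jan 13, 2017 - Cross-language search", str.upper))
```

Handling the positioned content with a dedicated rule or model keeps formulaic fragments such as dates out of the general abstract model, which may otherwise translate them inconsistently.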
Optionally, the terminal is further configured such that the one or more programs executed by the one or more processors include instructions for performing the following operations:
determining the target category to which the search result belongs;
where obtaining, according to the display type, the target translation model corresponding to each preset display portion includes:
obtaining the target translation model corresponding to each preset display portion by combining the target category to which the search result belongs with the display type corresponding to each preset display portion.
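One way to combine the category with the display type is to key models by the pair. The sketch assumes models are stored in a dictionary keyed by `(category, display_type)` and falls back to a `"general"` model for the same display type when no category-specific model exists; the fallback and the `select_model` name are assumptions, not something the source specifies.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


def select_model(models: Dict[Tuple[str, str], Translator], category: str,
                 display_type: str, default_category: str = "general") -> Translator:
    # Prefer the model for this (category, display type) pair; otherwise fall back
    # to the category-agnostic model for the same display type (assumed behaviour).
    key = (category, display_type)
    if key in models:
        return models[key]
    return models[(default_category, display_type)]


models = {
    ("medical", "title"): lambda s: "M:" + s,
    ("general", "title"): lambda s: "G:" + s,
}
print(select_model(models, "medical", "title")("x"))  # M:x
print(select_model(models, "sports", "title")("x"))   # G:x
```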
Optionally, determining the target category to which the search result belongs includes:
matching the content of the search result against the dictionary of each preset category, to obtain a matching rate for each preset category;
taking the preset category with the highest matching rate among all preset categories as the target category to which the search result belongs.
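The source does not define the matching rate. The sketch below assumes it is the fraction of the result's tokens that appear in a category's dictionary, with tokens taken as lowercase `\w+` runs. That tokenization suits space-delimited languages; Chinese text would need a word segmenter first. `matching_rates` and `target_category` are hypothetical names.

```python
import re
from typing import Dict, Set


def matching_rates(content: str, dictionaries: Dict[str, Set[str]]) -> Dict[str, float]:
    # Assumed matching rate: share of content tokens found in the category dictionary.
    tokens = re.findall(r"\w+", content.lower())
    if not tokens:
        return {category: 0.0 for category in dictionaries}
    return {
        category: sum(tok in vocab for tok in tokens) / len(tokens)
        for category, vocab in dictionaries.items()
    }


def target_category(content: str, dictionaries: Dict[str, Set[str]]) -> str:
    rates = matching_rates(content, dictionaries)
    return max(rates, key=rates.get)  # category with the highest matching rate


dictionaries = {
    "medical": {"diabetes", "insulin", "treatment"},
    "sports": {"football", "match", "goal"},
}
print(target_category("Insulin treatment for diabetes", dictionaries))  # medical
```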
Optionally, determining the target preset category to which the search result belongs includes:
inputting the content of the search result into a classifier and taking the classification result output by the classifier as the target category to which the search result belongs, where the classifier is trained on search result samples of each preset category.
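The source does not name a classifier type. As one possible choice, the sketch below uses a multinomial Naive Bayes classifier with add-one smoothing, written without external libraries and trained on `(text, category)` search result samples. The class name, tokenization, and sample data are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenization: lowercase \w+ runs (Chinese would need segmentation first).
    return re.findall(r"\w+", text.lower())


class NaiveBayesCategoryClassifier:
    """Multinomial Naive Bayes with add-one smoothing over (text, category) samples."""

    def fit(self, samples: List[Tuple[str, str]]) -> "NaiveBayesCategoryClassifier":
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for text, category in samples:
            self.doc_counts[category] += 1
            self.word_counts[category].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_category, best_score = category, score
        return best_category


samples = [
    ("insulin dosage for diabetes", "medical"),
    ("symptoms and treatment of flu", "medical"),
    ("football match final score", "sports"),
    ("the team scored a late goal", "sports"),
]
clf = NaiveBayesCategoryClassifier().fit(samples)
print(clf.predict("diabetes treatment"))  # medical
```

Compared with dictionary matching, a trained classifier can weigh words that appear in several categories by how often they occur in each, at the cost of needing labelled search result samples.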
Fig. 5 is a block diagram of a device for cross-language search, acting as a server, according to an exemplary embodiment. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may further include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Other embodiments of the present invention will readily occur to those skilled in the art upon considering the specification and practicing the invention disclosed here. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
A cross-language search method, a cross-language search apparatus, and a device for cross-language search provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (11)

1. A cross-language search method, characterized by comprising:
obtaining a search term in a first language;
obtaining search results in a second language according to the search term;
for each search result in the second language, performing the following steps:
determining a target translation model corresponding to each preset display portion of the search result;
obtaining, by using the target translation model, a translated search result corresponding to each preset display portion of the search result;
displaying to a user the translated search result corresponding to each preset display portion of the search result.
2. The method according to claim 1, characterized in that the step of determining the target translation model corresponding to each preset display portion of the search result comprises:
determining a display type corresponding to each preset display portion included in the search result;
obtaining, according to the display type, the target translation model corresponding to each preset display portion.
3. The method according to claim 2, characterized in that:
if the display type corresponding to the preset display portion is a title type, obtaining the target translation model corresponding to each preset display portion comprises: obtaining a title translation model, the title translation model being trained on a title corpus;
and/or
if the display type corresponding to the preset display portion is an abstract type, obtaining the target translation model corresponding to each preset display portion comprises: obtaining an abstract translation model, the abstract translation model being trained on an abstract corpus;
and/or
if the display type corresponding to the preset display portion is a page content type, obtaining the target translation model corresponding to each preset display portion comprises: obtaining a page content translation model, the page content translation model being trained on a preset page content corpus.
4. The method according to any one of claims 1 to 3, characterized in that, if the preset display portion is a title portion, the step of obtaining, by using the target translation model, the translated search result corresponding to each preset display portion of the search result comprises:
identifying preset symbols contained in the title portion;
dividing the title portion into a plurality of semantic units according to the preset symbols;
translating each semantic unit obtained by the division by using a first target translation model corresponding to the title portion, to obtain a translation result corresponding to each semantic unit;
combining the translation results corresponding to the semantic units according to the preset symbols, to obtain a first translated search result corresponding to the title portion, the first translated search result including the preset symbols.
5. The method according to claim 4, characterized in that the step of translating, by using the first target translation model corresponding to the title portion, each semantic unit obtained by the division comprises:
inputting each semantic unit together with its corresponding context into the first target translation model, to obtain the translation result of each semantic unit output by the first target translation model.
6. The method according to any one of claims 1 to 3, characterized in that, if the preset display portion is an abstract portion, the step of obtaining, by using the target translation model, the translated search result corresponding to each preset display portion of the search result comprises:
extracting target content located at a preset position from the abstract portion;
translating the target content by using a second target translation model corresponding to the preset position, to obtain a corresponding second translated search result.
7. The method according to claim 2 or 3, characterized in that the method further comprises: determining a target category to which the search result belongs;
and obtaining, according to the display type, the target translation model corresponding to each preset display portion comprises:
obtaining the target translation model corresponding to each preset display portion by combining the target category to which the search result belongs with the display type corresponding to each preset display portion.
8. The method according to claim 7, characterized in that the step of determining the target category to which the search result belongs comprises:
matching the content of the search result against the dictionary of each preset category, to obtain a matching rate for each preset category;
taking the preset category with the highest matching rate among all preset categories as the target category to which the search result belongs.
9. The method according to claim 7, characterized in that the step of determining the target preset category to which the search result belongs comprises:
inputting the content of the search result into a classifier, and taking the classification result output by the classifier as the target category to which the search result belongs, wherein the classifier is trained on search result samples of each preset category.
10. A cross-language search apparatus, characterized by comprising:
a search term acquisition module, configured to obtain a search term in a first language;
a search result acquisition module, configured to obtain search results in a second language according to the search term;
a search result processing module, configured to process each search result in the second language;
the search result processing module comprising:
a translation model determination module, configured to determine a target translation model corresponding to each preset display portion of the search result;
a translated search result acquisition module, configured to obtain, by using the target translation model, a translated search result corresponding to each preset display portion of the search result; and
a translated search result display module, configured to display to a user the translated search result corresponding to each preset display portion of the search result.
11. A device for cross-language search, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the following operations:
obtaining a search term in a first language;
obtaining search results in a second language according to the search term;
for each search result in the second language, performing the following steps:
determining a target translation model corresponding to each preset display portion of the search result;
obtaining, by using the target translation model, a translated search result corresponding to each preset display portion of the search result;
displaying to a user the translated search result corresponding to each preset display portion of the search result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710025472.6A CN108304412B (en) 2017-01-13 2017-01-13 Cross-language search method and device for cross-language search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710025472.6A CN108304412B (en) 2017-01-13 2017-01-13 Cross-language search method and device for cross-language search

Publications (2)

Publication Number Publication Date
CN108304412A (en) 2018-07-20
CN108304412B CN108304412B (en) 2022-09-30

Family

ID=62872442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710025472.6A Active CN108304412B (en) 2017-01-13 2017-01-13 Cross-language search method and device for cross-language search

Country Status (1)

Country Link
CN (1) CN108304412B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090248422A1 (en) * 2008-03-28 2009-10-01 Microsoft Corporation Intra-language statistical machine translation
CN102651003A (en) * 2011-02-28 2012-08-29 北京百度网讯科技有限公司 Cross-language searching method and device
CN102779135A (en) * 2011-05-13 2012-11-14 北京百度网讯科技有限公司 Method and device for obtaining cross-linguistic search resources and corresponding search method and device
CN103838774A (en) * 2012-11-26 2014-06-04 英业达科技有限公司 Webpage inquiring system and inquiring method thereof

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334526A (en) * 2017-01-20 2018-07-27 北京搜狗科技发展有限公司 The methods of exhibiting and device of search result items
WO2019109663A1 (en) * 2017-12-08 2019-06-13 北京搜狗科技发展有限公司 Cross-language search method and apparatus, and apparatus for cross-language search
CN110930208A (en) * 2018-09-19 2020-03-27 阿里巴巴集团控股有限公司 Object searching method and device
CN110930208B (en) * 2018-09-19 2023-05-05 阿里巴巴集团控股有限公司 Object searching method and device
CN111368117A (en) * 2018-12-26 2020-07-03 财团法人工业技术研究院 Cross-language information constructing and processing method and cross-language information system
CN111368117B (en) * 2018-12-26 2023-05-30 财团法人工业技术研究院 Cross-language information construction and processing method and cross-language information system
CN111737550A (en) * 2019-03-25 2020-10-02 阿里巴巴集团控股有限公司 Search result processing method and device, storage medium and processor
CN111737550B (en) * 2019-03-25 2024-01-23 阿里巴巴集团控股有限公司 Search result processing method and device, storage medium and processor
CN112287217A (en) * 2020-10-23 2021-01-29 平安科技(深圳)有限公司 Medical literature retrieval method, device, electronic equipment and storage medium
CN112287217B (en) * 2020-10-23 2023-08-04 平安科技(深圳)有限公司 Medical document retrieval method, medical document retrieval device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108304412B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
US10515627B2 (en) Method and apparatus of building acoustic feature extracting model, and acoustic feature extracting method and apparatus
CN108304412A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
KR102544453B1 (en) Method and device for processing information, and storage medium
CN111465918B (en) Method for displaying service information in preview interface and electronic equipment
CN113792207B (en) Cross-modal retrieval method based on multi-level feature representation alignment
CN107784034B (en) Page type identification method and device for page type identification
CN108121736A (en) A kind of descriptor determines the method for building up, device and electronic equipment of model
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN112269853B (en) Retrieval processing method, device and storage medium
CN114238690A (en) Video classification method, device and storage medium
CN108958503A (en) input method and device
CN111428522B (en) Translation corpus generation method, device, computer equipment and storage medium
CN107870904A (en) A kind of interpretation method, device and the device for translation
CN108345625B (en) Information mining method and device for information mining
CN108255940A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN108255939A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN116166843B (en) Text video cross-modal retrieval method and device based on fine granularity perception
CN108322770B (en) Video program identification method, related device, equipment and system
CN106919642A (en) A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN113033163A (en) Data processing method and device and electronic equipment
CN112100501A (en) Information flow processing method and device and electronic equipment
CN115526602A (en) Memo reminding method, device, terminal and storage medium
CN111428523B (en) Translation corpus generation method, device, computer equipment and storage medium
CN111222011B (en) Video vector determining method and device
CN110929122B (en) Data processing method and device for data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant