CN104778155B - The processing method and processing device of page official documents and correspondence - Google Patents


Info

Publication number
CN104778155B
Authority
CN
China
Prior art keywords
vocabulary
correspondence
language
density value
official documents
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410010001.4A
Other languages
Chinese (zh)
Other versions
CN104778155A (en)
Inventor
丁世远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Singapore Holdings Pte Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201410010001.4A
Publication of CN104778155A
Application granted
Publication of CN104778155B


Abstract

The application provides a method and a device for processing page copy (the text content shown on a page). In embodiments of the application, at least one word contained in the page copy to be processed is determined, and the current density value of each word in the first-language version system to which the page copy belongs is obtained. Whether the page copy is abnormal can then be determined from the obtained current density value of each word in the first-language version system, together with at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system. No operator has to take part in the processing, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of page copy processing.

Description

Method and device for processing page copy
【Technical Field】
The application relates to page copy technology, and in particular to a method and a device for processing page copy.
【Background】
Page copy, also simply called copy, is a form of presenting information in written language. As globalization and internationalization accelerate, a single system has to provide page copy in different language versions for user groups that speak different languages; such a system can be called a multi-language system. For example, it provides a Chinese version of the page copy for Chinese-speaking users, an English version for English-speaking users, and so on. In the prior art, operators have to check the page copy of each language version one by one to find out whether it contains anomalies such as mistranslations or missed translations.
However, this existing way of processing page copy takes a long time and is error-prone, which lowers the efficiency and reliability of page copy processing.
【Summary of the Invention】
Aspects of the application provide a method and a device for processing page copy, so as to improve the efficiency and reliability of page copy processing.
One aspect of the application provides a method for processing page copy, including:
determining at least one word contained in the page copy to be processed;
obtaining the current density value of each word in the first-language version system to which the page copy belongs;
determining whether the page copy is abnormal according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system; wherein
the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system.
In the aspect above and any possible implementation, a further implementation is provided in which the page copy includes mail copy, document copy, or Web page copy.
In the aspect above and any possible implementation, a further implementation is provided in which obtaining the current density value of each word in the first-language version system includes:
obtaining the current density value of each word in the first-language version system according to $D_i = (t_i + a_i)/(T + A)$;
where
$i$ denotes the $i$-th word and is a natural number;
$t_i$ denotes the number of times the $i$-th word occurs in the page copy;
$a_i$ denotes the historical number of times the $i$-th word has occurred in the first-language version system;
$T$ denotes the total number of words in the page copy;
$A$ denotes the historical total number of words in the first-language version system;
$D_i$ denotes the current density value of the $i$-th word in the first-language version system.
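As a worked illustration with made-up counts (none of these numbers come from the application): suppose a word occurs $t_i = 5$ times in a page copy of $T = 40$ words, and has never occurred before ($a_i = 0$) in a first-language history of $A = 60$ words. The formula then gives:

```latex
D_i = \frac{t_i + a_i}{T + A} = \frac{5 + 0}{40 + 60} = 0.05
```

Because the numerator and denominator both add the page counts to the historical counts, the current density is the share of the word in the history and the new page copy taken together.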
In the aspect above and any possible implementation, a further implementation is provided in which, before determining whether the page copy is abnormal according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system, the method further includes:
obtaining the historical density value of each word in the first-language version system; and/or
obtaining the historical density value of each word in the second-language version system.
In the aspect above and any possible implementation, a further implementation is provided in which determining whether the page copy is abnormal according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system includes:
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, determining that the page copy is abnormal; or
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in some of the systems of the second-language version system is greater than 0, determining that the page copy is abnormal; or
if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between the current density value of the word in the first-language version system and its historical density value in the second-language version system is greater than or equal to a preset density threshold, determining that the page copy is abnormal; or
if the current density value of a word in the first-language version system is 0 and its historical density value in all of the systems of the second-language version system is greater than 0, determining that the page copy is abnormal.
Another aspect of the application provides a device for processing page copy, including:
a determining unit, configured to determine at least one word contained in the page copy to be processed;
an obtaining unit, configured to obtain the current density value of each word in the first-language version system to which the page copy belongs;
a processing unit, configured to determine whether the page copy is abnormal according to the current density value of each word in the first-language version system obtained by the obtaining unit and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system; wherein
the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system.
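The three units can be pictured as a small pipeline. The sketch below is only an illustration of that wiring: the class name `PageCopyProcessingDevice`, the method `run`, and the stand-in callables (whitespace splitting, a per-page share, a 0.5 cut-off) are hypothetical placeholders and are not the units' actual logic as described in the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PageCopyProcessingDevice:
    # Determining unit: page copy -> list of words (hypothetical signature).
    determine: Callable[[str], List[str]]
    # Obtaining unit: words -> density value per word (hypothetical signature).
    obtain: Callable[[List[str]], Dict[str, float]]
    # Processing unit: density values -> words judged abnormal (hypothetical signature).
    process: Callable[[Dict[str, float]], List[str]]

    def run(self, page_copy: str) -> List[str]:
        """Chain the three units in the order the device description lists them."""
        words = self.determine(page_copy)
        densities = self.obtain(words)
        return self.process(densities)


# Placeholder units, chosen only so the sketch runs end to end.
device = PageCopyProcessingDevice(
    determine=str.split,
    obtain=lambda words: {w: words.count(w) / len(words) for w in set(words)},
    process=lambda d: sorted(w for w, v in d.items() if v >= 0.5),
)
flagged = device.run("buy buy now")
```

Keeping each unit as a replaceable callable mirrors the way the description treats the units as separate functional blocks.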
In the aspect above and any possible implementation, a further implementation is provided in which the page copy includes mail copy, document copy, or Web page copy.
In the aspect above and any possible implementation, a further implementation is provided in which the obtaining unit is specifically configured to
obtain the current density value of each word in the first-language version system according to $D_i = (t_i + a_i)/(T + A)$;
where
$i$ denotes the $i$-th word and is a natural number;
$t_i$ denotes the number of times the $i$-th word occurs in the page copy;
$a_i$ denotes the historical number of times the $i$-th word has occurred in the first-language version system;
$T$ denotes the total number of words in the page copy;
$A$ denotes the historical total number of words in the first-language version system;
$D_i$ denotes the current density value of the $i$-th word in the first-language version system.
In the aspect above and any possible implementation, a further implementation is provided in which the obtaining unit is further configured to
obtain the historical density value of each word in the first-language version system; and/or
obtain the historical density value of each word in the second-language version system.
In the aspect above and any possible implementation, a further implementation is provided in which the processing unit is specifically configured to:
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, determine that the page copy is abnormal; or
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in some of the systems of the second-language version system is greater than 0, determine that the page copy is abnormal; or
if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between the current density value of the word in the first-language version system and its historical density value in the second-language version system is greater than or equal to a preset density threshold, determine that the page copy is abnormal; or
if the current density value of a word in the first-language version system is 0 and its historical density value in all of the systems of the second-language version system is greater than 0, determine that the page copy is abnormal.
As the technical solutions above show, embodiments of the application determine at least one word contained in the page copy to be processed and obtain the current density value of each word in the first-language version system to which the page copy belongs. Whether the page copy is abnormal can then be determined from that current density value together with at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system. No operator has to take part in the processing, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of page copy processing.
In addition, with the technical solutions provided by the application, anomalies in page copy can be identified automatically and in real time, which effectively improves the timeliness of page copy processing.
【Brief Description of the Drawings】
To describe the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for processing page copy provided by one embodiment of the application;
Fig. 2 is a schematic structural diagram of the device for processing page copy provided by another embodiment of the application.
【Detailed Description】
To make the purpose, technical solutions, and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the application.
It should be noted that the terminals involved in the embodiments of the application may include, but are not limited to, mobile phones, personal digital assistants (PDAs), wireless handheld devices, tablet computers, personal computers (PCs), MP3 players, MP4 players, and the like.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of the method for processing page copy provided by one embodiment of the application. As shown in Fig. 1:
101. Determine at least one word contained in the page copy to be processed.
Here, a word (vocabulary item) can be understood as any word or fixed phrase, taken from the whole of a language or from a specific range of it. Different languages have different vocabularies, for example Chinese words and English words.
It should be noted that page copy can contain fixed content and changing content. Because changing content, such as product information, is unpredictable and uncertain, 101 generally scans only the fixed content, so as to determine at least one word contained in the fixed content of the page copy.
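A minimal sketch of step 101 follows, under two assumptions that are not stated in the application: changing content is marked in a page template by `{...}` placeholders, and words can be split out with a simple letter-run regular expression (which suits space-delimited languages but would not segment Chinese). The names `PLACEHOLDER`, `WORD`, and `fixed_content_words` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical marker for changing content, e.g. "{product_name}".
PLACEHOLDER = re.compile(r"\{[^{}]*\}")
# Runs of letters (Unicode-aware); digits and underscores are excluded.
WORD = re.compile(r"[^\W\d_]+")


def fixed_content_words(template: str) -> Counter:
    """Drop placeholders, then count the words left in the fixed content."""
    fixed = PLACEHOLDER.sub(" ", template)
    return Counter(w.lower() for w in WORD.findall(fixed))


counts = fixed_content_words("Add {product_name} to cart. Cart total: {price}")
```

The resulting per-word counts and their sum correspond to the quantities $t_i$ and $T$ used when the density value is computed.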
Optionally, in one possible implementation of this embodiment, the page copy involved in the application may include, but is not limited to, mail copy, document copy, or Web page copy; this embodiment places no particular limit on this.
Mail copy is copy that presents content in the form of mail. Mail may include, but is not limited to, plain-text mail and HyperText Markup Language (HTML) mail.
Document copy is copy that presents content in the form of a document. Documents may include, but are not limited to, WORD documents, EXCEL documents, and PDF documents.
World Wide Web (Web) page copy is copy that presents content in the form of a Web page. A Web page may include display blocks made up of one or more HTML tags, called page elements, for example text, labels, hyperlinks, buttons, input boxes, and drop-down boxes.
It should be noted that the page copy to be processed may be page copy whose content has changed or newly added page copy; this embodiment places no particular limit on this.
102. Obtain the current density value of each word in the first-language version system to which the page copy belongs.
Optionally, in one possible implementation of this embodiment, in 102 the current density value of each word in the first-language version system can be obtained according to $D_i = (t_i + a_i)/(T + A)$,
where
$i$ denotes the $i$-th word and is a natural number;
$t_i$ denotes the number of times the $i$-th word occurs in the page copy;
$a_i$ denotes the historical number of times the $i$-th word has occurred in the first-language version system;
$T$ denotes the total number of words in the page copy;
$A$ denotes the historical total number of words in the first-language version system;
$D_i$ denotes the current density value of the $i$-th word in the first-language version system.
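The formula in 102 can be sketched in a few lines of Python. The function name `current_density` and the sample counts are illustrative assumptions; the sketch also computes the value for words that occur only in the history, which the text does not require.

```python
from collections import Counter
from typing import Dict


def current_density(page_counts: Counter, history_counts: Counter) -> Dict[str, float]:
    """D_i = (t_i + a_i) / (T + A) for every word seen in the page or the history."""
    T = sum(page_counts.values())      # total words in the page copy
    A = sum(history_counts.values())   # historical total words in the first-language system
    if T + A == 0:
        return {}
    words = set(page_counts) | set(history_counts)
    # Counter returns 0 for a missing word, so t_i or a_i may be 0.
    return {w: (page_counts[w] + history_counts[w]) / (T + A) for w in words}


# Made-up counts: T = 40 words on the page, A = 60 words in the history.
page = Counter({"mobile": 5, "phone": 5, "case": 30})
history = Counter({"case": 40, "cover": 20})
density = current_density(page, history)
```

With these counts a word that is new to the system, such as "mobile", gets $(5 + 0)/(40 + 60) = 0.05$.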
103. Determine whether the page copy is abnormal according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system.
Here, the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system. A multi-language system is one that can serve users in several languages rather than a single language, so that users of different languages can obtain information with the same content from it.
Specifically, the system can be a website, and the multi-language system can be a multi-language website. For example, the AliExpress (全球速卖通) website is a multi-language website with several language version websites, such as the French version fr.aliexpress.com, the German version de.aliexpress.com, the Russian version ru.aliexpress.com, and the Japanese version ja.aliexpress.com.
It can be understood that the second-language version system may be all the other systems in the multi-language system apart from the first-language version system, or only some of those other systems; this embodiment places no particular limit on this.
Optionally, in one possible implementation of this embodiment, before 103 the historical density value of each word in the first-language version system can also be obtained. Specifically, the historical page copy in the first-language version system can be used to obtain it, according to $D'_i = a_i / A$,
where
$i$ denotes the $i$-th word and is a natural number;
$a_i$ denotes the historical number of times the $i$-th word has occurred in the first-language version system;
$A$ denotes the historical total number of words in the first-language version system;
$D'_i$ denotes the historical density value of the $i$-th word in the first-language version system.
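The historical density can be sketched the same way, assuming the historical page copy is already available as lists of words. The function name `history_density` and the sample pages are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def history_density(history_pages: Iterable[Iterable[str]]) -> Dict[str, float]:
    """D'_i = a_i / A over all historical page copy of one language version system."""
    counts: Counter = Counter()
    for page_words in history_pages:
        counts.update(page_words)      # accumulates a_i per word
    A = sum(counts.values())           # historical total number of words
    return {w: n / A for w, n in counts.items()} if A else {}


pages = [["add", "to", "cart"], ["cart", "total"]]
hist = history_density(pages)
```

A word that never occurred in the history is simply absent from the result, so a lookup such as `hist.get(word, 0.0)` yields the historical density of 0 that the checks in 103 rely on.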
Optionally, in one possible implementation of this embodiment, before 103 the historical density value of each word in the second-language version system can also be obtained. Specifically, the historical page copy in the second-language version system can be used to obtain it. The method is the same as for the historical density value of each word in the first-language version system; see the previous possible implementation for details, which are not repeated here.
It can be understood that, in one possible implementation of this embodiment, both of the operations above can be performed before 103, that is, obtaining the historical density value of each word in the first-language version system and obtaining the historical density value of each word in the second-language version system. See the two preceding possible implementations for details, which are not repeated here.
In this embodiment, in 103, whether the page copy is abnormal can be determined according to how the density of each word changes. Several possible implementations, though not all of them, are listed below; this embodiment places no particular limit on this.
Optionally, in one possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, the page copy can be determined to be abnormal. In this case, the word in the page copy to be processed has never appeared in the historical page copy of the first-language version system, so the risk can be judged to be high. For example, the word may be garbled text, or the translation language may have been mixed up. For instance, suppose the word "mobile phone" has never appeared in the existing page copy of the first-language version system, so its historical density value in that system is 0, while it does appear in the page copy to be processed, so its current density value in that system is 0.05. The word "mobile phone" in the page copy to be processed then carries a high risk.
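This first check reduces to a two-part comparison. In the sketch below the function name `is_new_to_first_language` is hypothetical; the densities are passed in as plain floats.

```python
def is_new_to_first_language(current_first: float, history_first: float) -> bool:
    """True when a word has a current density above 0 but no history in the same system."""
    return current_first > 0 and history_first == 0


# The "mobile phone" example from the text: current 0.05, historical 0.
new_word = is_new_to_first_language(0.05, 0.0)
# A word that already had some history in the system is not flagged by this check.
known_word = is_new_to_first_language(0.05, 0.02)
```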
Optionally, in one possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0 and its historical density value in some of the systems of the second-language version system is greater than 0, the page copy can be determined to be abnormal. In this case, the word in the page copy to be processed appears in two or more language version systems, so the translation can be judged to be problematic, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may have been left untranslated. For instance, suppose the word "mobile phone" appears in the page copy to be processed in the first-language version system, so its current density value in that system is 0.05, and it has previously appeared in one system of the second-language version system (rather than all of them), with a historical density value of 0.1 in that system. The "mobile phone" in the page copy to be processed may then be a missed translation.
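A sketch of this second check, reading "some of the systems" as "at least one but not all" of the second-language systems; that reading, the function name `appears_in_some_second_systems`, and the system keys `fr`, `de`, `ru` are assumptions made for illustration.

```python
from typing import Mapping


def appears_in_some_second_systems(
    current_first: float, history_second: Mapping[str, float]
) -> bool:
    """True when the word is in the first system and in some, but not all, second systems."""
    hits = sum(1 for v in history_second.values() if v > 0)
    return current_first > 0 and 0 < hits < len(history_second)


# The word appears in one second-language system only: likely a missed translation.
partial = appears_in_some_second_systems(0.05, {"fr": 0.1, "de": 0.0, "ru": 0.0})
# The word appears in every second-language system: consistent with common text.
everywhere = appears_in_some_second_systems(0.05, {"fr": 0.1, "de": 0.2})
```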
Optionally, in one possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between the two is greater than or equal to a preset density threshold, the page copy can be determined to be abnormal. In this case, the density of the word in the first-language version system differs greatly from its density in the other systems of the second-language version system, so a certain risk can be judged to exist. For example, the word may be garbled text, or it may be a case of word stacking, where the same piece of text occurs many times in the page copy. For instance, suppose the word "iphone" appears in the page copy to be processed in the first-language version system with a current density value of 0.5, and it has previously appeared in the second-language version system with a historical density value of 0.0001. The "iphone" in the page copy to be processed may then be a word stacking problem.
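A sketch of this third check follows. The text does not say whether the difference is signed, so the sketch takes the absolute difference as an assumption; the function name `density_gap_exceeds` and the threshold value 0.1 are also hypothetical.

```python
def density_gap_exceeds(
    current_first: float, history_second: float, threshold: float
) -> bool:
    """True when both densities are above 0 and they differ by at least the threshold."""
    return (
        current_first > 0
        and history_second > 0
        and abs(current_first - history_second) >= threshold
    )


# The "iphone" example from the text: 0.5 now versus 0.0001 historically.
stacked = density_gap_exceeds(0.5, 0.0001, threshold=0.1)
# Two similar densities stay below the (hypothetical) threshold.
similar = density_gap_exceeds(0.010, 0.012, threshold=0.1)
```

In practice the threshold would need tuning per system, since densities of common words and rare words differ by orders of magnitude.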
Optionally, in one possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is 0 and its historical density value in all of the systems of the second-language version system is greater than 0, the page copy can be determined to be abnormal. In this case, the word appears neither in the page copy to be processed nor in the historical page copy of the first-language version system, but it has appeared in the historical page copy of the second-language version system, so the translation can be judged to be problematic, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may be common text that has been left out. For instance, suppose the word "com" does not appear in the page copy to be processed in the first-language version system, so its current density value in that system is 0, while it has previously appeared in all of the systems of the second-language version system, so its historical density value in each of them is greater than 0. The word "com" may then be common text that is missing from the page copy to be processed.
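This fourth check can be sketched as below. The function name `missing_common_text` and the system keys are hypothetical, and an empty set of second-language systems is treated as "no evidence" rather than as a match.

```python
from typing import Mapping


def missing_common_text(current_first: float, history_second: Mapping[str, float]) -> bool:
    """True when the word is absent from the first system but present in every second system."""
    return (
        current_first == 0
        and len(history_second) > 0
        and all(v > 0 for v in history_second.values())
    )


# The "com" example: absent in the first-language system, present everywhere else.
omitted = missing_common_text(0.0, {"fr": 0.02, "de": 0.03})
# Present in only one second-language system: not common text, so not flagged here.
not_common = missing_common_text(0.0, {"fr": 0.02, "de": 0.0})
```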
It can be understood that, once 103 has determined whether the page copy is abnormal, an alarm operation can also be performed to prompt operators to investigate and adjust the page copy.
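The application does not say what form the alarm takes. As one possibility, the sketch below emits a warning per flagged word through Python's standard `logging` module; the logger name, the function `raise_alerts`, the page identifier, and the reason strings are all hypothetical.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("page_copy_check")


def raise_alerts(page_id: str, findings: Dict[str, str]) -> List[str]:
    """Log one warning per flagged word and return the messages for further routing."""
    messages = [
        f"page copy {page_id}: word {word!r} flagged ({reason})"
        for word, reason in sorted(findings.items())
    ]
    for message in messages:
        logger.warning(message)
    return messages


alerts = raise_alerts(
    "checkout-fr",
    {"iphone": "density gap", "mobile phone": "new to first-language system"},
)
```

Returning the messages as well as logging them leaves room to forward them to whatever channel operators actually watch.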
It should be noted that 101 to 103 may be executed by a processing device, for example a page copy editor, which may be located in a local client to perform offline processing or in a network-side server to perform online processing; this embodiment places no limit on this.
It can be understood that the client may be an application installed on a terminal or a web page in a browser; any concrete form capable of processing page copy is acceptable, and this embodiment places no limit on this.
In the existing method for processing page copy, operators have to check the page copy one by one to find out whether it is abnormal. However, checking page copy for anomalies manually tends to cause two problems.
First, efficiency is very low. In a system of even moderately large size there can be hundreds of thousands of pieces of page copy, and operators cannot check them one by one.
Second, manual checks easily miss anomalies in page copy. For example, when the page copy contains many words and only a few anomalies, operators can hardly spot them by eye.
With the technical solution provided by this embodiment, no operator has to take part, the operation is simple, and the accuracy is high.
In the present embodiment, at least one vocabulary included in the pending page official documents and correspondence is determined, and the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs is then obtained. Whether the page official documents and correspondence has an anomaly can thus be determined according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in the second language edition system, together with the obtained current density value of each vocabulary in the first language edition system. Operating personnel do not need to participate in the processing, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of page official documents and correspondence processing.
In addition, with the technical scheme provided by the application, anomalies occurring in the page official documents and correspondence can be identified automatically and in real time, which effectively improves the real-time performance of page official documents and correspondence processing.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this description are preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Fig. 2 is a structural diagram of the processing device for page official documents and correspondence provided by another embodiment of the application. As shown in Fig. 2, the processing device of the present embodiment can include a determining unit 21, an obtaining unit 22 and a processing unit 23, wherein:
The determining unit 21 is used for determining at least one vocabulary included in the pending page official documents and correspondence. Here, vocabulary can be understood as all of the words and fixed phrases of a language, or those within a particular range. Different languages can have different vocabularies, for example Chinese vocabulary, English vocabulary, and so on. It should be noted that the content of the page official documents and correspondence can include fixed content and changing content. Because changing content, for example merchandise information, is unpredictable and uncertain, the determining unit 21 generally scans the fixed content to determine at least one vocabulary included in the fixed content of the page official documents and correspondence. It should also be noted that the pending page official documents and correspondence can be one whose content has changed, or a newly added one; the present embodiment does not particularly limit this.
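The determination of vocabulary in the fixed content can be illustrated with a minimal Python sketch. The description does not specify how words are segmented, so the regular-expression split, the lower-casing and the function name `determine_vocabulary` are all assumptions made for illustration; a language without spaces between words (for example Chinese) would need a real word segmenter instead.

```python
import re
from collections import Counter


def determine_vocabulary(fixed_content: str) -> Counter:
    """Count how often each vocabulary occurs in the fixed content of a page.

    Assumption: words are runs of word characters, compared case-insensitively.
    The source text does not prescribe any particular segmentation method.
    """
    words = re.findall(r"\w+", fixed_content.lower())
    return Counter(words)


# Hypothetical fixed content of a pending page.
counts = determine_vocabulary("Buy now. Buy more!")
print(counts["buy"], sum(counts.values()))
```

The resulting counts give, for each vocabulary, the number of occurrences in the page and, summed, the total amount of vocabulary in the page, which are the two page-level quantities used by the density formulas.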
The obtaining unit 22 is used for obtaining the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs.
The processing unit 23 is used for determining whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in the second language edition system, together with the current density value of each vocabulary in the first language edition system obtained by the obtaining unit 22.
Here, the first language edition system and the second language edition system are different language edition systems belonging to the same multi-language system. A multi-language system is a system that can provide information services to users in multiple languages (rather than a single language), so that users of different languages can obtain the same content from it.
Specifically, the system can be a website, and the multi-language system can then be a multi-language website. For example, the AliExpress (aliexpress) website is a multi-language website that can have multiple language version websites, for example the French version website fr.aliexpress.com, the German version website de.aliexpress.com, the Russian version website ru.aliexpress.com, the Japanese version website ja.aliexpress.com, and so on.
It is understood that the second language edition system can be all of the other systems in the multi-language system except the first language edition system, or only some of those other systems; the present embodiment does not particularly limit this.
Optionally, in a possible implementation of the present embodiment, the page official documents and correspondence involved in the application can include but is not limited to mail official documents and correspondence, document official documents and correspondence or Web page official documents and correspondence; the present embodiment does not particularly limit this.
Mail official documents and correspondence is official documents and correspondence that presents content in mail form. Mail can include but is not limited to plain text type mail and HyperText Markup Language (HTML) type mail.
Document official documents and correspondence is official documents and correspondence that presents content in document form. Documents can include but are not limited to WORD documents, EXCEL documents or PDF documents.
World Wide Web (Web) page official documents and correspondence is official documents and correspondence that presents content in the form of a Web page. A Web page can include display blocks, each composed of one or more HyperText Markup Language (HTML) tags and referred to as a page element, for example text, labels, hyperlinks, buttons, input boxes, drop-down boxes, etc.
Optionally, in a possible implementation of the present embodiment, the obtaining unit 22 can specifically be used for obtaining the current density value of each vocabulary in the first language edition system according to $D_i = (t_i + a_i)/(T + A)$.
Wherein,
$i$ denotes the i-th vocabulary, and its value is a natural number;
$t_i$ denotes the number of times the i-th vocabulary occurs in the page official documents and correspondence;
$a_i$ denotes the history number of times the i-th vocabulary has occurred in the first language edition system;
$T$ denotes the total amount of vocabulary in the page official documents and correspondence;
$A$ denotes the total amount of history vocabulary in the first language edition system;
$D_i$ denotes the current density value of the i-th vocabulary in the first language edition system.
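The current density formula can be written as a short Python function. This is a sketch only: the function name and the handling of an empty system (returning 0 when $T + A = 0$) are assumptions not stated in the description.

```python
def current_density(t_i: int, a_i: int, T: int, A: int) -> float:
    """Return D_i = (t_i + a_i) / (T + A).

    t_i: occurrences of the vocabulary in the pending page
    a_i: history occurrences in the first language edition system
    T:   total amount of vocabulary in the pending page
    A:   total amount of history vocabulary in the first language edition system
    """
    total = T + A
    if total == 0:
        # Assumption: with no vocabulary at all, treat the density as 0.
        return 0.0
    return (t_i + a_i) / total


# Hypothetical counts: 2 occurrences in a 100-word page, 8 in a 900-word history.
print(current_density(2, 8, 100, 900))
```

Note that the current density is 0 only when the vocabulary occurs neither in the pending page nor in the history of the first language edition system, because both $t_i$ and $a_i$ appear in the numerator.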
Optionally, in a possible implementation of the present embodiment, the obtaining unit 22 can further be used for obtaining the history density value of each vocabulary in the first language edition system. The obtaining unit 22 can use the history page official documents and correspondence in the first language edition system to obtain the history density value of each vocabulary in the first language edition system. Specifically, the obtaining unit 22 can obtain the history density value of each vocabulary in the first language edition system according to $D'_i = a_i/A$.
Wherein,
$i$ denotes the i-th vocabulary, and its value is a natural number;
$a_i$ denotes the history number of times the i-th vocabulary has occurred in the first language edition system;
$A$ denotes the total amount of history vocabulary in the first language edition system;
$D'_i$ denotes the history density value of the i-th vocabulary in the first language edition system.
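A minimal Python sketch of the history density follows. It assumes the history page official documents and correspondence of the first language edition system have already been reduced to a table of word counts; the `Counter` representation, the function name and the value returned for an empty history are illustrative assumptions.

```python
from collections import Counter


def history_density(word: str, history_counts: Counter) -> float:
    """Return D'_i = a_i / A for one vocabulary.

    history_counts maps each vocabulary to its history number of occurrences
    (a_i); the sum of all counts is the history vocabulary total (A).
    """
    A = sum(history_counts.values())
    if A == 0:
        # Assumption: an empty history gives a density of 0.
        return 0.0
    return history_counts[word] / A


# Hypothetical history counts for the first language edition system.
history = Counter({"phone": 1, "case": 9})
print(history_density("phone", history))
print(history_density("tablet", history))
```

A vocabulary that never occurred in the history has a count of 0 in the `Counter`, so its history density is 0, which is the condition tested by the anomaly checks.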
Optionally, in a possible implementation of the present embodiment, the obtaining unit 22 can further be used for obtaining the history density value of each vocabulary in the second language edition system. Specifically, the obtaining unit 22 can use the history page official documents and correspondence in the second language edition system to obtain the history density value of each vocabulary in the second language edition system. For the specific obtaining method, refer to the method for obtaining the history density value of each vocabulary in the first language edition system; see the related content in the previous possible implementation, which is not repeated here.
It is understood that, in a possible implementation of the present embodiment, the obtaining unit 22 can also perform the operations of both of the above possible implementations, that is, obtain the history density value of each vocabulary in the first language edition system and obtain the history density value of each vocabulary in the second language edition system. For details, see the related content in the two foregoing possible implementations, which is not repeated here.
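Because the second language edition system can consist of several systems, its history density can be kept per system. The following Python sketch applies the same ratio (history occurrences divided by history vocabulary total) to each system separately; the dictionary layout, the system keys such as "fr" and "de", and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict


def history_densities_by_system(
    word: str, history_by_system: Dict[str, Counter]
) -> Dict[str, float]:
    """Return the history density of one vocabulary in each second-language system."""
    densities: Dict[str, float] = {}
    for system, counts in history_by_system.items():
        total = sum(counts.values())
        # Assumption: a system with an empty history gives a density of 0.
        densities[system] = counts[word] / total if total else 0.0
    return densities


# Hypothetical history counts for two second-language systems.
history_by_system = {
    "fr": Counter({"com": 2, "prix": 8}),
    "de": Counter({"preis": 10}),
}
print(history_densities_by_system("com", history_by_system))
```

Keeping one value per system makes it possible to tell whether a vocabulary has appeared in all of the other systems or only in some of them.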
In the present embodiment, the processing unit 23 can specifically determine whether the page official documents and correspondence has an anomaly according to how the density of each vocabulary changes. Several, but not all, possible implementations are listed below; the present embodiment does not particularly limit this.
Optionally, in a possible implementation of the present embodiment, if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in the first language edition system is 0, the processing unit 23 can determine that the page official documents and correspondence has an anomaly. This indicates that the vocabulary in the pending page official documents and correspondence has never appeared in the history page official documents and correspondence of the first language edition system, so the risk can be judged to be high. For example, the vocabulary may be garbled text, or the result of a language translation mix-up. For example, the vocabulary "mobile phone" has not appeared in the existing page official documents and correspondence of the first language edition system, so its history density value in the first language edition system is 0, while "mobile phone" appears in the pending page official documents and correspondence in the first language edition system, so its current density value in the first language edition system is 0.05. This indicates that the vocabulary "mobile phone" in the pending page official documents and correspondence carries a high risk.
Optionally, in a possible implementation of the present embodiment, if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in some of the systems of the second language edition system is greater than 0, the processing unit 23 can determine that the page official documents and correspondence has an anomaly. This indicates that the vocabulary in the pending page official documents and correspondence appears in two or more language edition systems, so it can be determined that there is a translation problem, because under normal circumstances a vocabulary either appears in only one language edition system (language-specific text) or appears in all language edition systems (common text). For example, the vocabulary may have been left untranslated. For example, the vocabulary "mobile phone" appears in the pending page official documents and correspondence in the first language edition system, so its current density value in the first language edition system is 0.05, while "mobile phone" has previously appeared in one system (rather than all systems) of the second language edition system, so its history density value in that system is 0.1. This indicates that "mobile phone" in the pending page official documents and correspondence has probably been left untranslated.
Optionally, in a possible implementation of the present embodiment, if the current density value of a vocabulary in the first language edition system is greater than 0, its history density value in the second language edition system is greater than 0, and the difference between its current density value in the first language edition system and its history density value in the second language edition system is greater than or equal to a preset density threshold, the processing unit 23 can determine that the page official documents and correspondence has an anomaly. This indicates that the density of the vocabulary in the first language edition system differs greatly from its density in the other systems of the second language edition system, so it can be determined that a certain risk exists. For example, the vocabulary may be garbled text, or vocabulary may have been piled up, that is, the same piece of text appears many times in the page official documents and correspondence. For example, the vocabulary "iphone" appears in the pending page official documents and correspondence in the first language edition system, so its current density value in the first language edition system is 0.5, while "iphone" has previously appeared in the second language edition system, where its history density value is 0.0001. This indicates that "iphone" in the pending page official documents and correspondence is probably a case of vocabulary piling up.
Optionally, in a possible implementation of the present embodiment, if the current density value of a vocabulary in the first language edition system is 0, and the history density value of the vocabulary in all systems of the second language edition system is greater than 0, the processing unit 23 can determine that the page official documents and correspondence has an anomaly. This indicates that the vocabulary appears neither in the pending page official documents and correspondence nor in the history page official documents and correspondence of the first language edition system, but has appeared in the history page official documents and correspondence of the second language edition system. The translation can then be judged problematic, because under normal circumstances a vocabulary either appears in only one language edition system (language-specific text) or appears in all language edition systems (common text). For example, the vocabulary may be common text that has been omitted. For example, the vocabulary "com" does not appear in the pending page official documents and correspondence in the first language edition system, so its current density value in the first language edition system is 0, while "com" has previously appeared in all systems of the second language edition system, so its history density values in all of those systems are greater than 0. This indicates that "com" is probably common text that has been omitted from the pending page official documents and correspondence.
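The four possible implementations above can be combined into one check, sketched here in Python under stated assumptions. The function name, the reason labels and the dictionary of per-system history densities are hypothetical. "Some of the systems" is read here as "at least one but not all", and the density difference is taken as an absolute difference checked against every second-language system in which the vocabulary has appeared; the description leaves both points open.

```python
from typing import Dict, List


def detect_anomalies(
    current: float,
    history_first: float,
    history_second: Dict[str, float],
    density_threshold: float,
) -> List[str]:
    """Return the reasons (possibly none) why one vocabulary looks anomalous."""
    reasons: List[str] = []
    # Second-language systems in which the vocabulary has appeared before.
    seen_in = [s for s, d in history_second.items() if d > 0]
    in_all = bool(history_second) and len(seen_in) == len(history_second)

    # 1. New to the first language edition system (garbled text, mix-up).
    if current > 0 and history_first == 0:
        reasons.append("new_in_first_language")
    # 2. Seen in some but not all other systems (possibly left untranslated).
    if current > 0 and seen_in and not in_all:
        reasons.append("possible_missed_translation")
    # 3. Density differs from another system by at least the threshold.
    if current > 0 and any(
        abs(current - history_second[s]) >= density_threshold for s in seen_in
    ):
        reasons.append("density_gap")
    # 4. Absent here but present in all other systems (omitted common text).
    if current == 0 and in_all:
        reasons.append("possible_missed_common_text")
    return reasons


# The "iphone" example, with a hypothetical threshold of 0.1.
print(detect_anomalies(0.5, 0.5, {"fr": 0.0001}, 0.1))
```

Returning a list of reasons rather than a single flag is a design choice of this sketch: one vocabulary can satisfy more than one condition, and the reasons can be passed on to whatever alarm operation follows.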
It is understood that, after determining that the page official documents and correspondence has an anomaly, the processing unit 23 can further perform an alarm operation to instruct operating personnel to investigate and adjust the page official documents and correspondence.
It should be noted that the processing device for page official documents and correspondence provided by the present embodiment can be, for example, a page official documents and correspondence editor, which can be located in a local client to perform offline processing, or in a server on the network side to perform online processing; the present embodiment does not limit this.
It is understood that the client can be an application installed on a terminal, or a web page in a browser; any concrete form that can realize the processing of the page official documents and correspondence is acceptable, and the present embodiment does not limit this.
In the existing processing method, operating personnel have to check the page official documents and correspondence one by one to find out whether an anomaly exists. However, manually checking whether the page official documents and correspondence is abnormal easily brings two problems.
First, efficiency is very low. Particularly in a slightly larger system, there are hundreds of thousands of page official documents and correspondence, and operating personnel cannot check them one by one;
Second, manual checking easily misses anomalies in the page official documents and correspondence; for example, when the anomalies are few and the text is long, operating personnel can hardly find them with the naked eye.
With the technical scheme provided by the present embodiment, no operating personnel need to participate, the operation is simple, and the accuracy is high.
In the present embodiment, the determining unit determines at least one vocabulary included in the pending page official documents and correspondence, and the obtaining unit then obtains the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs. The processing unit can thus determine whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in the second language edition system, together with the current density value of each vocabulary in the first language edition system obtained by the obtaining unit. Operating personnel do not need to participate in the processing, the operation is simple, and the accuracy is high, which improves the efficiency and reliability of page official documents and correspondence processing.
In addition, with the technical scheme provided by the application, anomalies occurring in the page official documents and correspondence can be identified automatically and in real time, which effectively improves the real-time performance of page official documents and correspondence processing.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, device and units described above can refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the application, it should be understood that the disclosed system, device and method can be realized in other ways. For example, the device embodiment described above is only schematic; the division of the units is only a division by logical function, and other divisions are possible in actual implementation, for example multiple units or components can be combined or integrated into another system, or some features can be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be indirect coupling or communication connection through some interfaces, devices or units, and can be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in each embodiment of the application can be integrated in one processing unit, each unit can exist physically on its own, or two or more units can be integrated in one unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit realized in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes some instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) or a processor to perform part of the steps of the methods described in the embodiments of the application. The foregoing storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical scheme of the application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical schemes described in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not make the essence of the corresponding technical schemes depart from the spirit and scope of the technical schemes of the embodiments of the application.

Claims (10)

  1. A processing method for page official documents and correspondence, characterized by comprising:
    determining at least one vocabulary included in a pending page official documents and correspondence;
    obtaining a current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs;
    determining whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in a second language edition system, together with the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs;
    wherein the second language edition system is all or some of the other systems in a multi-language system except the first language edition system.
  2. The method according to claim 1, characterized in that the page official documents and correspondence includes mail official documents and correspondence, document official documents and correspondence or Web page official documents and correspondence.
  3. The method according to claim 1, characterized in that obtaining the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs comprises:
    obtaining the current density value of each vocabulary in the first language edition system according to $D_i = (t_i + a_i)/(T + A)$;
    wherein,
    $i$ denotes the i-th vocabulary, and its value is a natural number;
    $t_i$ denotes the number of times the i-th vocabulary occurs in the page official documents and correspondence;
    $a_i$ denotes the history number of times the i-th vocabulary has occurred in the first language edition system;
    $T$ denotes the total amount of vocabulary in the page official documents and correspondence;
    $A$ denotes the total amount of history vocabulary in the first language edition system;
    $D_i$ denotes the current density value of the i-th vocabulary in the first language edition system.
  4. The method according to claim 1, characterized in that, before determining whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in the second language edition system, together with the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs, the method further comprises:
    obtaining the history density value of each vocabulary in the first language edition system; and/or
    obtaining the history density value of each vocabulary in the second language edition system.
  5. The method according to any one of claims 1 to 4, characterized in that determining whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in the second language edition system, together with the current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs, comprises:
    if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in the first language edition system is 0, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in some of the systems of the second language edition system is greater than 0, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is greater than 0, its history density value in the second language edition system is greater than 0, and the difference between the current density value of the vocabulary in the first language edition system and the history density value of the vocabulary in the second language edition system is greater than or equal to a preset density threshold, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is 0, and the history density value of the vocabulary in all systems of the second language edition system is greater than 0, determining that the page official documents and correspondence has an anomaly.
  6. A processing device for page official documents and correspondence, characterized by comprising:
    a determining unit, for determining at least one vocabulary included in a pending page official documents and correspondence;
    an obtaining unit, for obtaining a current density value of each vocabulary in the first language edition system to which the page official documents and correspondence belongs;
    a processing unit, for determining whether the page official documents and correspondence has an anomaly according to at least one of the history density value of each vocabulary in the first language edition system and the history density value of each vocabulary in a second language edition system, together with the current density value, obtained by the obtaining unit, of each vocabulary in the first language edition system to which the page official documents and correspondence belongs; wherein the second language edition system is all or some of the other systems in a multi-language system except the first language edition system.
  7. The device according to claim 6, characterized in that the page official documents and correspondence includes mail official documents and correspondence, document official documents and correspondence or Web page official documents and correspondence.
  8. The device according to claim 6, characterized in that the obtaining unit is specifically used for obtaining the current density value of each vocabulary in the first language edition system according to $D_i = (t_i + a_i)/(T + A)$;
    wherein,
    $i$ denotes the i-th vocabulary, and its value is a natural number;
    $t_i$ denotes the number of times the i-th vocabulary occurs in the page official documents and correspondence;
    $a_i$ denotes the history number of times the i-th vocabulary has occurred in the first language edition system;
    $T$ denotes the total amount of vocabulary in the page official documents and correspondence;
    $A$ denotes the total amount of history vocabulary in the first language edition system;
    $D_i$ denotes the current density value of the i-th vocabulary in the first language edition system.
  9. The device according to claim 6, characterized in that the obtaining unit is further used for:
    obtaining the history density value of each vocabulary in the first language edition system; and/or
    obtaining the history density value of each vocabulary in the second language edition system.
  10. The device according to any one of claims 6 to 9, characterized in that the processing unit is specifically used for:
    if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in the first language edition system is 0, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is greater than 0, and its history density value in some of the systems of the second language edition system is greater than 0, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is greater than 0, its history density value in the second language edition system is greater than 0, and the difference between the current density value of the vocabulary in the first language edition system and the history density value of the vocabulary in the second language edition system is greater than or equal to a preset density threshold, determining that the page official documents and correspondence has an anomaly; or
    if the current density value of a vocabulary in the first language edition system is 0, and the history density value of the vocabulary in all systems of the second language edition system is greater than 0, determining that the page official documents and correspondence has an anomaly.
CN201410010001.4A 2014-01-09 2014-01-09 The processing method and processing device of page official documents and correspondence Active CN104778155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410010001.4A CN104778155B (en) 2014-01-09 2014-01-09 The processing method and processing device of page official documents and correspondence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410010001.4A CN104778155B (en) 2014-01-09 2014-01-09 The processing method and processing device of page official documents and correspondence

Publications (2)

Publication Number Publication Date
CN104778155A (en) 2015-07-15
CN104778155B (en) 2017-12-15

Family

ID=53619629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410010001.4A Active CN104778155B (en) 2014-01-09 2014-01-09 The processing method and processing device of page official documents and correspondence

Country Status (1)

Country Link
CN (1) CN104778155B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030199A (en) * 2005-11-14 2007-09-05 林伯颖 Process official and business documents in several languages for different national institutions
CN101923540A (en) * 2010-07-20 2010-12-22 陈洁 Language translation quality auditing method
CN101950286A (en) * 2010-09-14 2011-01-19 传神联合(北京)信息技术有限公司 Error correction module and method in software translation system
CN102262621A (en) * 2010-05-26 2011-11-30 钟长林 Device and method for checking translated text
CN103049437A (en) * 2011-10-17 2013-04-17 圣侨资讯事业股份有限公司 Multi-language editing system for online publications

Also Published As

Publication number Publication date
CN104778155A (en) 2015-07-15

Similar Documents

Publication Publication Date Title
Jurgens et al. Incorporating dialectal variability for socially equitable language identification
US10599765B2 (en) Semantic translation model training
US10765956B2 (en) Named entity recognition on chat data
US8463598B2 (en) Word detection
Garg et al. Sentiment analysis of the Uri terror attack using Twitter
US20160110352A1 (en) Information redaction from document data
William et al. Framework for design and implementation of chat support system using natural language processing
CN106874253A (en) Recognize the method and device of sensitive information
CN104866308A (en) Scenario image generation method and apparatus
CN109992653A (en) Information processing method and processing system
CN105893484A (en) Microblog Spammer recognition method based on text characteristics and behavior characteristics
JP2015072614A (en) Method for detecting expression capable of becoming dangerous expression by relying on specific theme and electronic device and program for electronic device for detecting the same expression
CN104915359A (en) Theme label recommending method and device
Olney et al. Part of speech tagging Java method names
CN112560846B (en) Error correction corpus generation method and device and electronic equipment
CN104750670B (en) The processing method and processing device of page official documents and correspondence
CN110738056A (en) Method and apparatus for generating information
CN109062891A (en) Media processing method, device, terminal and medium
CN104778155B (en) The processing method and processing device of page official documents and correspondence
Zhou et al. Virtual data augmentation: A robust and general framework for fine-tuning pre-trained models
CN110276001B (en) Checking page identification method and device, computing equipment and medium
Hemati et al. PCoQA: Persian Conversational Question Answering Dataset
Barker et al. Assessing the Comparability of News Texts.
CN106775914A (en) A kind of code method for internationalizing and device for automatically generating key assignments
Kumar Challenges in the development of annotated corpus of computer-mediated communication in Indian Languages: A Case of Hindi

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240322

Address after: Singapore

Patentee after: Alibaba Singapore Holdings Ltd.

Country or region after: Singapore

Address before: Fourth Floor, P.O. Box 847, Capital Building, Grand Cayman, Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.

Country or region before: Cayman Islands
