CN104778155B - Method and device for processing page copy - Google Patents
- Publication number
- CN104778155B CN201410010001.4A
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- correspondence
- language
- density value
- official documents
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
This application provides a method and a device for processing page copy. In the embodiments, at least one word contained in the page copy to be processed is determined, and the current density value of each word in the first-language version system to which the page copy belongs is obtained. Whether the page copy has an anomaly is then determined from the obtained current density value of each word in the first-language version system together with at least one of the following: the historical density value of each word in the first-language version system, and the historical density value of each word in a second-language version system. No operator needs to take part in the processing, so the operation is simple and the accuracy is high, which improves the efficiency and reliability of page copy processing.
Description
【Technical field】
This application relates to page copy technology, and in particular to a method and a device for processing page copy.
【Background technology】
Page copy, also simply called copy, is a form of presenting information in written language. As globalization and internationalization accelerate, a single system needs to provide page copy in different language versions for user groups who speak different languages; such a system can be called a multi-language system. For example, it provides a Chinese version of the page copy for Chinese-speaking users and an English version for English-speaking users. In the prior art, operators must check the page copy of each language version one by one to find out whether it has anomalies such as mistranslations and missed translations.
However, this existing way of processing page copy takes a long time and is error-prone, which lowers the efficiency and reliability of page copy processing.
【Summary of the invention】
Several aspects of this application provide a method and a device for processing page copy, so as to improve the efficiency and reliability of page copy processing.
One aspect of this application provides a method for processing page copy, including:
determining at least one word contained in the page copy to be processed;
obtaining the current density value of each word in the first-language version system to which the page copy belongs;
determining whether the page copy has an anomaly according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system; wherein
the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system.
In the aspect above and any possible implementation, an implementation is further provided in which the page copy includes mail copy, document copy, or Web page copy.
In the aspect above and any possible implementation, an implementation is further provided in which obtaining the current density value of each word in the first-language version system includes:
obtaining the current density value of each word in the first-language version system according to D_i = (t_i + a_i) / (T + A); where
i denotes the i-th word and is a natural number;
t_i denotes the number of times the i-th word occurs in the page copy;
a_i denotes the historical number of times the i-th word has occurred in the first-language version system;
T denotes the total number of words in the page copy;
A denotes the historical total number of words in the first-language version system;
D_i denotes the current density value of the i-th word in the first-language version system.
In the aspect above and any possible implementation, an implementation is further provided in which, before determining whether the page copy has an anomaly according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system, the method further includes:
obtaining the historical density value of each word in the first-language version system; and/or
obtaining the historical density value of each word in the second-language version system.
In the aspect above and any possible implementation, an implementation is further provided in which determining whether the page copy has an anomaly according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system includes:
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, determining that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is greater than 0 and its historical density value is greater than 0 in some of the second-language version systems, determining that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between its current density value in the first-language version system and its historical density value in the second-language version system is greater than or equal to a preset density threshold, determining that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is 0 and its historical density value is greater than 0 in all of the second-language version systems, determining that the page copy has an anomaly.
Another aspect of this application provides a device for processing page copy, including:
a determining unit, configured to determine at least one word contained in the page copy to be processed;
an obtaining unit, configured to obtain the current density value of each word in the first-language version system to which the page copy belongs;
a processing unit, configured to determine whether the page copy has an anomaly according to the current density value of each word in the first-language version system obtained by the obtaining unit and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system; wherein
the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system.
In the aspect above and any possible implementation, an implementation is further provided in which the page copy includes mail copy, document copy, or Web page copy.
In the aspect above and any possible implementation, an implementation is further provided in which the obtaining unit is specifically configured to
obtain the current density value of each word in the first-language version system according to D_i = (t_i + a_i) / (T + A); where
i denotes the i-th word and is a natural number;
t_i denotes the number of times the i-th word occurs in the page copy;
a_i denotes the historical number of times the i-th word has occurred in the first-language version system;
T denotes the total number of words in the page copy;
A denotes the historical total number of words in the first-language version system;
D_i denotes the current density value of the i-th word in the first-language version system.
In the aspect above and any possible implementation, an implementation is further provided in which the obtaining unit is further configured to
obtain the historical density value of each word in the first-language version system; and/or
obtain the historical density value of each word in the second-language version system.
In the aspect above and any possible implementation, an implementation is further provided in which the processing unit is specifically configured to:
if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, determine that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is greater than 0 and its historical density value is greater than 0 in some of the second-language version systems, determine that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between its current density value in the first-language version system and its historical density value in the second-language version system is greater than or equal to a preset density threshold, determine that the page copy has an anomaly; or
if the current density value of a word in the first-language version system is 0 and its historical density value is greater than 0 in all of the second-language version systems, determine that the page copy has an anomaly.
As the technical solutions above show, the embodiments of this application determine at least one word contained in the page copy to be processed and then obtain the current density value of each word in the first-language version system to which the page copy belongs. Whether the page copy has an anomaly can then be determined from the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system. No operator needs to take part in the processing, so the operation is simple and the accuracy is high, which improves the efficiency and reliability of page copy processing.
In addition, with the technical solution provided by this application, anomalies in page copy can be identified automatically and in real time, which effectively improves the timeliness of page copy processing.
【Brief description of the drawings】
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the method for processing page copy provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of the device for processing page copy provided by another embodiment of this application.
【Embodiment】
To make the purpose, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
It should be noted that the terminals involved in the embodiments of this application can include, but are not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player, and the like.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of the method for processing page copy provided by an embodiment of this application, as shown in Fig. 1.
101. Determine at least one word contained in the page copy to be processed.
Here, a word means any word or fixed phrase in a language, taken either from the whole language or from a specified range. Different languages have different words, for example Chinese words and English words.
It should be noted that the content of page copy can include fixed content and variable content. Because variable content, such as product information, is unpredictable and uncertain, 101 generally scans only the fixed content to determine at least one word contained in the fixed content of the page copy.
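As a rough illustration of step 101, the sketch below counts the words found in the fixed content of a page. The function name `count_words` and the regular-expression tokenizer are assumptions made for illustration only; the source does not say how words are segmented, and a language such as Chinese would need a proper word segmenter.

```python
import re
from collections import Counter


def count_words(fixed_text: str) -> Counter:
    """Count how often each word occurs in the fixed page content."""
    # Hypothetical tokenizer: lowercase runs of Unicode word characters.
    tokens = re.findall(r"\w+", fixed_text.lower())
    return Counter(tokens)


# Made-up sample text standing in for the fixed content of a page.
counts = count_words("Buy now. Buy the phone now")
total_words = sum(counts.values())
```

The per-word counts and the total are the kind of inputs that a density computation over the page would need.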
Optionally, in a possible implementation of this embodiment, the page copy involved in this application can include, but is not limited to, mail copy, document copy, or Web page copy; this embodiment does not specifically limit this.
Mail copy is copy that presents content in the form of mail. Mail can include, but is not limited to, plain-text mail and HyperText Markup Language (HTML) mail.
Document copy is copy that presents content in the form of a document. Documents can include, but are not limited to, WORD documents, EXCEL documents, or PDF documents.
World Wide Web (Web) page copy is copy that presents content in the form of a Web page. A Web page can include display blocks, called page elements, each made up of one or more HTML tags, for example text, labels, hyperlinks, buttons, input boxes, and drop-down boxes.
It should be noted that the page copy to be processed can be page copy whose content has changed or newly added page copy; this embodiment does not specifically limit this.
102. Obtain the current density value of each word in the first-language version system to which the page copy belongs.
Optionally, in a possible implementation of this embodiment, in 102 the current density value of each word in the first-language version system can be obtained according to D_i = (t_i + a_i) / (T + A).
Where
i denotes the i-th word and is a natural number;
t_i denotes the number of times the i-th word occurs in the page copy;
a_i denotes the historical number of times the i-th word has occurred in the first-language version system;
T denotes the total number of words in the page copy;
A denotes the historical total number of words in the first-language version system;
D_i denotes the current density value of the i-th word in the first-language version system.
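The density formula in 102 can be sketched as follows. The function name `current_density`, the guard for an empty denominator, and the sample counts are assumptions added for illustration; the source gives only the formula itself.

```python
def current_density(t_i: int, a_i: int, T: int, A: int) -> float:
    """Current density D_i = (t_i + a_i) / (T + A) of one word."""
    total = T + A
    if total == 0:
        # Assumption: with no words at all, treat the density as 0.
        return 0.0
    return (t_i + a_i) / total


# Made-up counts: the word occurs 5 times in a 20-word page and has never
# occurred in the 80 historical words of the first-language version system.
d = current_density(t_i=5, a_i=0, T=20, A=80)
```

Because the page counts are added to the historical counts, a word that appears only in the pending page still gets a density greater than 0.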
103. Determine whether the page copy has an anomaly according to the obtained current density value of each word in the first-language version system and at least one of the historical density value of each word in the first-language version system and the historical density value of each word in a second-language version system.
Here, the first-language version system and the second-language version system are different language version systems belonging to the same multi-language system. A multi-language system provides information services to users in multiple languages (rather than a single language), so that users of different languages can obtain information with the same content from it.
Specifically, the system can be a website, in which case the multi-language system is a multi-language website. For example, the AliExpress website is a multi-language website that can have several language-version websites, such as the French version fr.aliexpress.com, the German version de.aliexpress.com, the Russian version ru.aliexpress.com, and the Japanese version ja.aliexpress.com.
It can be understood that the second-language version system can be all the other systems in the multi-language system besides the first-language version system, or only some of those other systems; this embodiment does not specifically limit this.
Optionally, in a possible implementation of this embodiment, the historical density value of each word in the first-language version system can also be obtained before 103, using the historical page copy in the first-language version system. Specifically, the historical density value of each word in the first-language version system can be obtained according to D'_i = a_i / A.
Where
i denotes the i-th word and is a natural number;
a_i denotes the historical number of times the i-th word has occurred in the first-language version system;
A denotes the historical total number of words in the first-language version system;
D'_i denotes the historical density value of the i-th word in the first-language version system.
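A minimal sketch of the historical density computation, applied to every word of one language version at once. The function name `history_densities`, the use of a `Counter` as the store of historical counts, and the sample counts are assumptions for illustration; the source specifies only the ratio a_i / A.

```python
from collections import Counter
from typing import Dict


def history_densities(history_counts: Counter) -> Dict[str, float]:
    """Map each word to its historical density a_i / A."""
    A = sum(history_counts.values())  # historical total number of words
    if A == 0:
        # Assumption: a system with no history has no historical densities.
        return {}
    return {word: a_i / A for word, a_i in history_counts.items()}


# Made-up historical counts for one language version system.
hist = history_densities(Counter({"phone": 1, "buy": 3}))
```

A word absent from the returned mapping can be treated as having a historical density of 0, for example with `hist.get(word, 0.0)`.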
Optionally, in a possible implementation of this embodiment, the historical density value of each word in the second-language version system can also be obtained before 103, using the historical page copy in the second-language version system. The method is the same as that used to obtain the historical density value of each word in the first-language version system; see the preceding possible implementation for details, which are not repeated here.
It can be understood that, in a possible implementation of this embodiment, the operations of both of the implementations above can be performed before 103, that is, both the historical density value of each word in the first-language version system and the historical density value of each word in the second-language version system are obtained. See the two preceding possible implementations for details, which are not repeated here.
In this embodiment, in 103, whether the page copy has an anomaly can be determined from how the density of each word changes. Several possible implementations, though not all of them, are listed below; this embodiment does not specifically limit this.
Optionally, in a possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0 and its historical density value in the first-language version system is 0, it can be determined that the page copy has an anomaly. This indicates that the word in the pending page copy has never appeared in the historical page copy of the first-language version system, so the risk can be judged to be high. For example, the word may be garbled text, or the language translations may have been mixed up. For example, suppose the word "mobile phone" has never appeared in the existing page copy of the first-language version system, so its historical density value there is 0, but it appears in the pending page copy of the first-language version system, so its current density value there is 0.05. This indicates that the word "mobile phone" in the pending page copy carries a high risk.
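The check just described can be sketched as a one-line predicate. The function name `is_new_word` is hypothetical, and the densities reuse the "mobile phone" figures from the example.

```python
def is_new_word(current_density: float, own_history_density: float) -> bool:
    """True when a word is present now but has no history in its own language version."""
    return current_density > 0 and own_history_density == 0


# "mobile phone": current density 0.05, historical density 0.
flagged = is_new_word(0.05, 0.0)
# A word that already has some history in the same language version is not flagged.
not_flagged = is_new_word(0.05, 0.01)
```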
Optionally, in a possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0 and its historical density value is greater than 0 in some of the second-language version systems, it can be determined that the page copy has an anomaly. This indicates that the word in the pending page copy appears in two or more language version systems, so the translation can be judged to be problematic, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may have been left untranslated. For example, suppose the word "mobile phone" appears in the pending page copy of the first-language version system, so its current density value there is 0.05, and it has previously appeared in one system (but not all systems) among the second-language version systems, with a historical density value of 0.1 in that system. This suggests that "mobile phone" in the pending page copy may be a missed translation.
Optionally, in a possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is greater than 0, its historical density value in the second-language version system is greater than 0, and the difference between its current density value in the first-language version system and its historical density value in the second-language version system is greater than or equal to a preset density threshold, it can be determined that the page copy has an anomaly. This indicates that the density of the word in the first-language version system differs greatly from its density in the other systems, so a certain risk can be judged to exist. For example, the word may be garbled text, or words may have been piled up, that is, the same piece of text appears many times in the page copy. For example, suppose the word "iphone" appears in the pending page copy of the first-language version system, so its current density value there is 0.5, and it has previously appeared in the second-language version system with a historical density value of 0.0001. This suggests that "iphone" in the pending page copy may be a word pile-up problem.
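A minimal sketch of the density-gap check follows. The function name `density_gap_exceeds`, the use of an absolute difference, and the threshold value 0.1 are all assumptions for illustration; the source only says that the difference is compared with a preset density threshold and does not give a value.

```python
def density_gap_exceeds(current_density: float,
                        other_history_density: float,
                        threshold: float) -> bool:
    """True when both densities are positive and differ by at least the threshold."""
    if current_density <= 0 or other_history_density <= 0:
        return False
    # Assumption: the "difference" is taken as an absolute value.
    return abs(current_density - other_history_density) >= threshold


# "iphone": current density 0.5, historical density 0.0001 in another version.
piled_up = density_gap_exceeds(0.5, 0.0001, threshold=0.1)
```

The threshold would in practice be tuned per system; a smaller value flags more words for review.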
Optionally, in a possible implementation of this embodiment, in 103, if the current density value of a word in the first-language version system is 0 and its historical density value is greater than 0 in all of the second-language version systems, it can be determined that the page copy has an anomaly. This indicates that the word appears neither in the pending page copy nor in the historical page copy of the first-language version system, but has appeared in the historical page copy of the second-language version systems, so the translation can be judged to be problematic, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may be common text that has been left out. For example, suppose the word "com" does not appear in the pending page copy of the first-language version system, so its current density value there is 0, but it has previously appeared in all of the second-language version systems, so its historical density value in each of them is greater than 0. This suggests that "com" is probably common text that has been left out of the pending page copy.
It can be understood that, after 103 is performed and the page copy is determined to have an anomaly, an alert operation can further be performed to prompt operators to investigate and correct the page copy.
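The two cross-language presence checks described above (a word present in only some of the other language versions, and common text missing from the pending copy) and the alert step can be sketched together as follows. The function names, the label strings, and the use of the standard `logging` module as the alert channel are assumptions for illustration; the source does not say how the alert is delivered, and reading "some" as "some but not all" follows the examples in the text.

```python
import logging
from typing import List, Sequence


def cross_language_anomalies(current_density: float,
                             other_history: Sequence[float]) -> List[str]:
    """Return labels for the cross-language checks that fire for one word."""
    present = [d > 0 for d in other_history]
    labels = []
    # Present now, and seen in some but not all other language versions.
    if current_density > 0 and any(present) and not all(present):
        labels.append("possible missed translation")
    # Absent now, but seen in every other language version.
    if current_density == 0 and present and all(present):
        labels.append("possible missing common text")
    return labels


def alert(word: str, labels: List[str]) -> None:
    """Hypothetical alert: log one warning per detected anomaly."""
    for label in labels:
        logging.warning("page copy anomaly for %r: %s", word, label)


labels_phone = cross_language_anomalies(0.05, [0.1, 0.0])
labels_com = cross_language_anomalies(0.0, [0.2, 0.1])
alert("mobile phone", labels_phone)
alert("com", labels_com)
```

A word with a positive current density that also appears in every other language version produces no label, since it is treated as common text.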
It should be noted that 101 to 103 can be executed by a processing device, for example a page copy editor, which can be located in a local client to perform offline processing, or in a network-side server to perform online processing; this embodiment does not limit this.
It can be understood that the client can be an application installed on a terminal or a web page in a browser; any concrete form that can process page copy is acceptable, and this embodiment does not limit this.
In the existing processing method, operators need to check page copy item by item to find out whether the page copy is abnormal. However, manually checking whether page copy is abnormal tends to bring two problems.
First, efficiency is very low. Especially in a somewhat larger system, there are hundreds of thousands of page copies, and operators cannot check them one by one;
Second, manual checking easily misses abnormalities in page copy. For example, when there are few abnormalities and a large amount of text, operators can hardly find them by eye.
With the technical solution provided by this embodiment, no operator participation is needed, operation is simple, and accuracy is high.
In this embodiment, at least one word contained in the pending page copy is determined, and the current density value of each word in the first language version system to which the page copy belongs is then obtained, so that whether the page copy is abnormal can be determined according to the obtained current density value of each word in the first language version system and at least one of the historical density value of each word in the first language version system and the historical density value of each word in the second language version system. Operators do not need to take part in the processing, operation is simple, and accuracy is high, which improves the efficiency and reliability of page copy processing.
In addition, with the technical solution provided by the application, abnormalities in page copy can be identified automatically and in real time, which effectively improves the timeliness of page copy processing.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related description of the other embodiments.
Fig. 2 is a schematic structural diagram of a processing device for page copy provided by another embodiment of the application. As shown in Fig. 2, the processing device for page copy of this embodiment may include a determining unit 21, an obtaining unit 22 and a processing unit 23, where:
The determining unit 21 is configured to determine at least one word contained in the pending page copy. Here, words can be understood as all, or a particular range, of the words or fixed phrases of a language. Different languages have different words, for example Chinese words, English words, and so on. It should be noted that the content of page copy may include fixed content and changing content. Because changing content, for example commodity information, is unpredictable and uncertain, the determining unit 21 may in general scan only the fixed content, to determine at least one word contained in the fixed content of the page copy. It should also be noted that the pending page copy may be page copy whose content has changed, or may be newly added page copy; this embodiment places no particular limitation on this.
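The scan of fixed content described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that changing content is marked by `{...}` placeholders in a template and that words can be split with a simple regular expression; neither convention comes from the patent, and languages without spaces between words would need a real word segmenter.

```python
import re
from collections import Counter

# Hypothetical convention: changing content (e.g. commodity information)
# is represented by {placeholder} markers inside the page copy template.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")

def words_in_fixed_content(template: str) -> Counter:
    """Count the words of the fixed content, ignoring changing content."""
    fixed = PLACEHOLDER.sub(" ", template)      # drop the changing parts
    tokens = re.findall(r"\w+", fixed.lower())  # naive word split (assumption)
    return Counter(tokens)

counts = words_in_fixed_content(
    "Buy {product_name} now, free shipping on {product_name}"
)
```

Here `counts` holds only the five fixed words ("buy", "now", "free", "shipping", "on"); the placeholder names never enter the word statistics.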
The obtaining unit 22 is configured to obtain the current density value of each word in the first language version system to which the page copy belongs.
The processing unit 23 is configured to determine whether the page copy is abnormal according to the current density value of each word in the first language version system obtained by the obtaining unit 22 and at least one of the historical density value of each word in the first language version system and the historical density value of each word in the second language version system.
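The division of work among the three units can be sketched as follows. This is only one possible arrangement under stated assumptions: the class names, method names and the regular-expression word split are hypothetical, and the processing unit here applies just one of the checks described later (a word that is present now but absent from the first language history).

```python
import re
from collections import Counter

class DeterminingUnit:
    """Determines the words contained in the pending page copy."""
    def determine(self, fixed_content: str) -> Counter:
        return Counter(re.findall(r"\w+", fixed_content.lower()))

class ObtainingUnit:
    """Obtains density values in the first language version system."""
    def __init__(self, first_history: Counter):
        self.first_history = first_history  # historical word counts

    def current_density(self, word: str, page_counts: Counter) -> float:
        total = sum(page_counts.values()) + sum(self.first_history.values())
        return (page_counts[word] + self.first_history[word]) / total

    def history_density(self, word: str) -> float:
        total = sum(self.first_history.values())
        return self.first_history[word] / total if total else 0.0

class ProcessingUnit:
    """Decides which words make the page copy look abnormal."""
    def find_abnormal_words(self, page_counts: Counter, obtaining: ObtainingUnit):
        return sorted(
            word for word in page_counts
            if obtaining.current_density(word, page_counts) > 0
            and obtaining.history_density(word) == 0
        )

history = Counter({"buy": 3, "now": 1})
page_counts = DeterminingUnit().determine("Buy phone now")
abnormal = ProcessingUnit().find_abnormal_words(page_counts, ObtainingUnit(history))
```

With this toy history, only "phone" is reported, because it is the only word of the pending copy that never appeared before.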
The first language version system and the second language version system are different language version systems belonging to the same multi-language system. A multi-language system is a system that can provide information services to users in multiple languages (rather than a single language), so that users of different languages can obtain information with the same content from the multi-language system.
Specifically, the system may be a website, and the multi-language system may then be a multi-language website. For example, the AliExpress (aliexpress) website is a multi-language website that may have multiple language version websites, for example the French version website fr.aliexpress.com, the German version website de.aliexpress.com, the Russian version website ru.aliexpress.com, the Japanese version website ja.aliexpress.com, and so on.
It can be understood that the second language version system may be all of the other systems in the multi-language system except the first language version system, or may be some of the other systems in the multi-language system except the first language version system; this embodiment places no particular limitation on this.
Optionally, in a possible implementation of this embodiment, the page copy involved in the application may include, but is not limited to, mail copy, document copy or Web page copy; this embodiment places no particular limitation on this.
Mail copy is copy whose content is presented in the form of mail. Mail may include, but is not limited to, plain text mail and HTML (HyperText Markup Language) mail.
Document copy is copy whose content is presented in the form of a document. Documents may include, but are not limited to, WORD documents, EXCEL documents or PDF documents.
World Wide Web (Web) page copy is copy whose content is presented in the form of a Web page. A Web page may include display blocks each composed of one or more HTML (HyperText Markup Language) tags, called page elements, for example text, labels, hyperlinks, buttons, input boxes, drop-down boxes, and so on.
Optionally, in a possible implementation of this embodiment, the obtaining unit 22 may specifically be configured to obtain the current density value of each word in the first language version system according to $D_i = (t_i + a_i)/(T + A)$.
Where:
$i$ denotes the i-th word and is a natural number;
$t_i$ denotes the number of times the i-th word appears in the page copy;
$a_i$ denotes the historical number of times the i-th word has appeared in the first language version system;
$T$ denotes the total number of words in the page copy;
$A$ denotes the historical total number of words in the first language version system;
$D_i$ denotes the current density value of the i-th word in the first language version system.
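The current density formula can be written directly as a small Python function. The sketch below follows the symbols defined above; the function name, the error handling for an empty corpus and the sample numbers are assumptions added for illustration.

```python
def current_density(t_i: int, a_i: int, T: int, A: int) -> float:
    """Current density D_i = (t_i + a_i) / (T + A).

    t_i: occurrences of the word in the pending page copy
    a_i: historical occurrences in the first language version system
    T:   total number of words in the pending page copy
    A:   historical total number of words in the first language version system
    """
    if T + A == 0:
        # Assumption: an empty page with no history has no defined density.
        raise ValueError("no words in the page copy or in the history")
    return (t_i + a_i) / (T + A)

# A word that appears 5 times in a 20-word page copy and never appeared
# in a history of 80 words: D_i = (5 + 0) / (20 + 80)
d = current_density(t_i=5, a_i=0, T=20, A=80)
```

Because the historical counts enter both the numerator and the denominator, a single short page changes the density of a long-established word only slightly, while a word that is new to the system goes from zero to a positive value.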
Optionally, in a possible implementation of this embodiment, the obtaining unit 22 may further be configured to obtain the historical density value of each word in the first language version system. The obtaining unit 22 may use the historical page copy in the first language version system to obtain the historical density value of each word in the first language version system. Specifically, the obtaining unit 22 may obtain the historical density value of each word in the first language version system according to $D'_i = a_i / A$.
Where:
$i$ denotes the i-th word and is a natural number;
$a_i$ denotes the historical number of times the i-th word has appeared in the first language version system;
$A$ denotes the historical total number of words in the first language version system;
$D'_i$ denotes the historical density value of the i-th word in the first language version system.
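As a minimal sketch of the historical density, the function below derives $D'_i = a_i / A$ for every word from a table of historical word counts. The `Counter` representation of the history and the choice to return an empty result for an empty history are assumptions made for illustration.

```python
from collections import Counter

def history_densities(history_counts: Counter) -> dict:
    """Return D'_i = a_i / A for every word seen in the historical page copy."""
    total = sum(history_counts.values())  # A: historical total number of words
    if total == 0:
        return {}                         # assumption: no history, no densities
    return {word: count / total for word, count in history_counts.items()}

first_history = Counter({"buy": 30, "now": 10, "shipping": 60})
densities = history_densities(first_history)
```

A word that is missing from the result, looked up with `densities.get(word, 0.0)`, has a historical density of 0, which is exactly the situation the later checks look for.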
Optionally, in a possible implementation of this embodiment, the obtaining unit 22 may further be configured to obtain the historical density value of each word in the second language version system. The obtaining unit 22 may use the historical page copy in the second language version system to obtain the historical density value of each word in the second language version system. For the specific method, refer to the method for obtaining the historical density value of each word in the first language version system, described in detail in the previous possible implementation; it is not repeated here.
It can be understood that, in a possible implementation of this embodiment, the obtaining unit 22 may further be configured to perform the operations of both of the above possible implementations, that is, to obtain the historical density value of each word in the first language version system and to obtain the historical density value of each word in the second language version system. For details, refer to the related content of the two preceding possible implementations; it is not repeated here.
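Since the second language version system may consist of several systems, one way to organize the historical densities is one value per system. The sketch below assumes a hypothetical mapping from a system label (such as "de" or "ru") to that system's historical word counts; the labels and the data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Mapping

def history_density_by_system(
    word: str, history_by_system: Mapping[str, Counter]
) -> Dict[str, float]:
    """Historical density of one word in each second language version system."""
    densities = {}
    for system, counts in history_by_system.items():
        total = sum(counts.values())
        densities[system] = counts.get(word, 0) / total if total else 0.0
    return densities

second_history = {
    "de": Counter({"com": 2, "kaufen": 8}),
    "ru": Counter({"com": 1, "kupit": 3}),
}
by_system = history_density_by_system("com", second_history)
```

Keeping the values per system, rather than pooling them, preserves the distinction between "appears in some systems" and "appears in all systems" that the checks on the processing side rely on.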
In this embodiment, the processing unit 23 may specifically determine whether the page copy is abnormal according to how the density of each word has changed. Several possible implementations, but not all of them, are listed below; this embodiment places no particular limitation on this.
Optionally, in a possible implementation of this embodiment, if the current density value of a word in the first language version system is greater than 0 and its historical density value in the first language version system is 0, the processing unit 23 may determine that the page copy is abnormal. This indicates that the word in the pending page copy has never appeared in the historical page copy of the first language version system, so the risk can be judged to be high. For example, the word may be garbled text, or the language translation may have been mixed up. For example, the word "mobile phone" has not appeared before in the existing page copy of the first language version system, that is, the historical density value of "mobile phone" in the first language version system is 0, whereas "mobile phone" appears in the pending page copy in the first language version system, that is, the current density value of "mobile phone" in the first language version system is 0.05. This indicates that the word "mobile phone" in the pending page copy carries a high risk.
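The first check reduces to a comparison of two numbers. A minimal sketch, with a hypothetical function name and the example values from the paragraph above:

```python
def is_new_word_anomaly(current_density: float, first_history_density: float) -> bool:
    """True when a word appears now but never appeared in the first language history."""
    return current_density > 0 and first_history_density == 0

# "mobile phone": historical density 0, current density 0.05
flag_new = is_new_word_anomaly(0.05, 0.0)
# A word that was already common in the history is not flagged by this check
flag_known = is_new_word_anomaly(0.05, 0.04)
```

The comparison with exactly 0 is safe here because a historical density of 0 arises from a zero count rather than from rounding.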
Optionally, in a possible implementation of this embodiment, if the current density value of a word in the first language version system is greater than 0 and its historical density value in some of the systems of the second language version system is greater than 0, the processing unit 23 may determine that the page copy is abnormal. This indicates that the word in the pending page copy appears in two or more language version systems, so a translation problem can be identified, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may have been left untranslated. For example, the word "mobile phone" appears in the pending page copy in the first language version system, that is, the current density value of "mobile phone" in the first language version system is 0.05, and "mobile phone" has previously appeared in one system (rather than all systems) of the second language version system, that is, the historical density value of "mobile phone" in that system of the second language version system is 0.1. This indicates that "mobile phone" in the pending page copy was probably left untranslated.
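A sketch of this second check follows. It reads "some of the systems" as "at least one but not all", which matches the parenthetical "rather than all systems" in the example but is still an interpretation; the function name and the per-system dictionary are hypothetical.

```python
from typing import Mapping

def is_partial_overlap_anomaly(
    current_density: float, second_history: Mapping[str, float]
) -> bool:
    """True when the word is present now and also appears in some,
    but not all, of the second language version systems."""
    if current_density <= 0 or not second_history:
        return False
    hits = sum(1 for density in second_history.values() if density > 0)
    return 0 < hits < len(second_history)

# "mobile phone": current density 0.05, seen in only one other system
untranslated = is_partial_overlap_anomaly(0.05, {"de": 0.1, "ru": 0.0, "ja": 0.0})
# A word seen in every other system looks like common text instead
common = is_partial_overlap_anomaly(0.05, {"de": 0.1, "ru": 0.2, "ja": 0.3})
```

Under this reading the check cannot fire when the second language version system contains a single system; a deployment with only two language versions would need a different rule.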
Optionally, in a possible implementation of this embodiment, if the current density value of a word in the first language version system is greater than 0, its historical density value in the second language version system is greater than 0, and the difference between the current density value of the word in the first language version system and its historical density value in the second language version system is greater than or equal to a preset density threshold, the processing unit 23 may determine that the page copy is abnormal. This indicates that the density of the word in the first language version system differs greatly from its density in the other systems of the second language version system, so a certain risk can be identified. For example, the word may be garbled text, or words may have been piled up, that is, the same piece of text appears many times in the page copy. For example, the word "iphone" appears in the pending page copy in the first language version system, that is, the current density value of "iphone" in the first language version system is 0.5, and "iphone" has previously appeared in the second language version system, that is, the historical density value of "iphone" in the second language version system is 0.0001. This indicates that "iphone" in the pending page copy probably has a word piling problem.
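The threshold comparison can be sketched as below. The patent does not give a threshold value, so the 0.1 used here is purely illustrative, and taking the absolute difference is an assumption about how "difference" should be read.

```python
def is_density_gap_anomaly(
    current_density: float, second_history_density: float, threshold: float
) -> bool:
    """True when both densities are positive and differ by at least the threshold."""
    if current_density <= 0 or second_history_density <= 0:
        return False
    return abs(current_density - second_history_density) >= threshold

DENSITY_THRESHOLD = 0.1  # hypothetical preset value

# "iphone": current density 0.5 versus historical density 0.0001 elsewhere
piled_up = is_density_gap_anomaly(0.5, 0.0001, DENSITY_THRESHOLD)
# Similar densities in both systems are not flagged
similar = is_density_gap_anomaly(0.002, 0.001, DENSITY_THRESHOLD)
```

In practice the threshold would have to be tuned against real page copy, since word densities in a large system are usually far below 0.1.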
Optionally, in a possible implementation of this embodiment, if the current density value of a word in the first language version system is 0 and the historical density value of the word in all systems of the second language version system is greater than 0, the processing unit 23 may determine that the page copy is abnormal. This indicates that the word appears neither in the pending page copy nor in the historical page copy of the first language version system, whereas it has appeared in the historical page copy of the second language version system. A translation problem can therefore be identified, because normally a word should either appear in only one language version system (language-specific text) or appear in all language version systems (common text). For example, the word may be common text that was omitted. For example, the word "com" does not appear in the pending page copy in the first language version system, that is, the current density value of "com" in the first language version system is 0, whereas "com" has previously appeared in all systems of the second language version system, that is, the historical density value of "com" in every system of the second language version system is greater than 0. This indicates that "com" in the pending page copy is probably common text that was omitted.
It can be understood that, after determining that the page copy is abnormal, the processing unit 23 may further perform an alarm operation to instruct operators to investigate and adjust the page copy.
It should be noted that the processing device for page copy provided by this embodiment may be, for example, a page copy editor, which may be located in a local client to perform offline processing, or may be located in a network-side server to perform online processing; this embodiment places no limitation on this.
It can be understood that the client may be an application installed on a terminal, or may be a web page in a browser; any objectively existing form that can implement the processing of page copy is acceptable, and this embodiment places no limitation on this.
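The omitted common text case and the alarm operation can be sketched together as follows. The logger name, the message format and the page identifier are hypothetical; the patent only states that an alarm instructs operators to investigate, not how it is delivered.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger("page_copy_check")  # hypothetical logger name

def is_missing_common_text(current_density: float, second_history: Dict[str, float]) -> bool:
    """True when the word is absent from the first language version system
    but present in every second language version system."""
    if current_density != 0 or not second_history:
        return False
    return all(density > 0 for density in second_history.values())

def alarm(page_id: str, suspicious_words: List[str]) -> Optional[str]:
    """Log a warning for operators; returns the message, or None if nothing to report."""
    if not suspicious_words:
        return None
    message = "page copy %s may be abnormal, please check: %s" % (
        page_id, ", ".join(suspicious_words))
    logger.warning(message)
    return message

# "com": current density 0, but seen in every other language version
second_history = {"de": 0.02, "ru": 0.01, "ja": 0.03}
flagged = ["com"] if is_missing_common_text(0.0, second_history) else []
message = alarm("checkout_page", flagged)
```

Returning the message as well as logging it keeps the sketch easy to test; a real system might instead send mail or open a ticket.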
In the existing processing method, operators need to check page copy item by item to find out whether the page copy is abnormal. However, manually checking whether page copy is abnormal tends to bring two problems.
First, efficiency is very low. Especially in a somewhat larger system, there are hundreds of thousands of page copies, and operators cannot check them one by one;
Second, manual checking easily misses abnormalities in page copy. For example, when there are few abnormalities and a large amount of text, operators can hardly find them by eye.
With the technical solution provided by this embodiment, no operator participation is needed, operation is simple, and accuracy is high.
In this embodiment, the determining unit determines at least one word contained in the pending page copy, and the obtaining unit then obtains the current density value of each word in the first language version system to which the page copy belongs, so that the processing unit can determine whether the page copy is abnormal according to the current density value of each word in the first language version system obtained by the obtaining unit and at least one of the historical density value of each word in the first language version system and the historical density value of each word in the second language version system. Operators do not need to take part in the processing, operation is simple, and accuracy is high, which improves the efficiency and reliability of page copy processing.
In addition, with the technical solution provided by the application, abnormalities in page copy can be identified automatically and in real time, which effectively improves the timeliness of page copy processing.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system, device and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the application, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiment described above is only schematic. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the application may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the application.
Claims (10)
- 1. A method for processing page copy, characterized by comprising:
  determining at least one word contained in a pending page copy;
  obtaining a current density value of each word in a first language version system to which the page copy belongs;
  determining whether the page copy is abnormal according to the current density value of each word in the first language version system to which the page copy belongs and at least one of a historical density value of each word in the first language version system and a historical density value of each word in a second language version system;
  wherein the second language version system is all or some of the other systems in a multi-language system except the first language version system.
- 2. The method according to claim 1, characterized in that the page copy includes mail copy, document copy or Web page copy.
- 3. The method according to claim 1, characterized in that obtaining the current density value of each word in the first language version system to which the page copy belongs comprises:
  obtaining the current density value of each word in the first language version system according to $D_i = (t_i + a_i)/(T + A)$; wherein
  $i$ denotes the i-th word and is a natural number;
  $t_i$ denotes the number of times the i-th word appears in the page copy;
  $a_i$ denotes the historical number of times the i-th word has appeared in the first language version system;
  $T$ denotes the total number of words in the page copy;
  $A$ denotes the historical total number of words in the first language version system;
  $D_i$ denotes the current density value of the i-th word in the first language version system.
- 4. The method according to claim 1, characterized in that, before determining whether the page copy is abnormal according to the current density value of each word in the first language version system to which the page copy belongs and at least one of the historical density value of each word in the first language version system and the historical density value of each word in the second language version system, the method further comprises:
  obtaining the historical density value of each word in the first language version system; and/or
  obtaining the historical density value of each word in the second language version system.
- 5. The method according to any one of claims 1 to 4, characterized in that determining whether the page copy is abnormal according to the current density value of each word in the first language version system to which the page copy belongs and at least one of the historical density value of each word in the first language version system and the historical density value of each word in the second language version system comprises:
  if the current density value of a word in the first language version system is greater than 0 and its historical density value in the first language version system is 0, determining that the page copy is abnormal; or
  if the current density value of a word in the first language version system is greater than 0 and its historical density value in some of the systems of the second language version system is greater than 0, determining that the page copy is abnormal; or
  if the current density value of a word in the first language version system is greater than 0, its historical density value in the second language version system is greater than 0, and the difference between the current density value of the word in the first language version system and the historical density value of the word in the second language version system is greater than or equal to a preset density threshold, determining that the page copy is abnormal; or
  if the current density value of a word in the first language version system is 0 and the historical density value of the word in all systems of the second language version system is greater than 0, determining that the page copy is abnormal.
- 6. A device for processing page copy, characterized by comprising:
  a determining unit, configured to determine at least one word contained in a pending page copy;
  an obtaining unit, configured to obtain a current density value of each word in a first language version system to which the page copy belongs;
  a processing unit, configured to determine whether the page copy is abnormal according to the current density value of each word in the first language version system to which the page copy belongs, obtained by the obtaining unit, and at least one of a historical density value of each word in the first language version system and a historical density value of each word in a second language version system;
  wherein the second language version system is all or some of the other systems in a multi-language system except the first language version system.
- 7. The device according to claim 6, characterized in that the page copy includes mail copy, document copy or Web page copy.
- 8. The device according to claim 6, characterized in that the obtaining unit is specifically configured to obtain the current density value of each word in the first language version system according to $D_i = (t_i + a_i)/(T + A)$; wherein
  $i$ denotes the i-th word and is a natural number;
  $t_i$ denotes the number of times the i-th word appears in the page copy;
  $a_i$ denotes the historical number of times the i-th word has appeared in the first language version system;
  $T$ denotes the total number of words in the page copy;
  $A$ denotes the historical total number of words in the first language version system;
  $D_i$ denotes the current density value of the i-th word in the first language version system.
- 9. The device according to claim 6, characterized in that the obtaining unit is further configured to:
  obtain the historical density value of each word in the first language version system; and/or
  obtain the historical density value of each word in the second language version system.
- 10. The device according to any one of claims 6 to 9, characterized in that the processing unit is specifically configured to:
  if the current density value of a word in the first language version system is greater than 0 and its historical density value in the first language version system is 0, determine that the page copy is abnormal; or
  if the current density value of a word in the first language version system is greater than 0 and its historical density value in some of the systems of the second language version system is greater than 0, determine that the page copy is abnormal; or
  if the current density value of a word in the first language version system is greater than 0, its historical density value in the second language version system is greater than 0, and the difference between the current density value of the word in the first language version system and the historical density value of the word in the second language version system is greater than or equal to a preset density threshold, determine that the page copy is abnormal; or
  if the current density value of a word in the first language version system is 0 and the historical density value of the word in all systems of the second language version system is greater than 0, determine that the page copy is abnormal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410010001.4A CN104778155B (en) | 2014-01-09 | 2014-01-09 | The processing method and processing device of page official documents and correspondence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104778155A CN104778155A (en) | 2015-07-15 |
CN104778155B CN104778155B (en) | 2017-12-15 |
Family
ID=53619629
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410010001.4A Active CN104778155B (en) | 2014-01-09 | 2014-01-09 | The processing method and processing device of page official documents and correspondence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104778155B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101030199A (en) * | 2005-11-14 | 2007-09-05 | 林伯颖 | Process official and business documents in several languages for different national institutions |
CN101923540A (en) * | 2010-07-20 | 2010-12-22 | 陈洁 | Language translation quality auditing method |
CN101950286A (en) * | 2010-09-14 | 2011-01-19 | 传神联合(北京)信息技术有限公司 | Error correction module and method in software translation system |
CN102262621A (en) * | 2010-05-26 | 2011-11-30 | 钟长林 | Device and method for checking translated text |
CN103049437A (en) * | 2011-10-17 | 2013-04-17 | 圣侨资讯事业股份有限公司 | Multi-language editing system for online publications |
Also Published As
Publication number | Publication date |
---|---|
CN104778155A (en) | 2015-07-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20240322; Address after: Singapore; Patentee after: Alibaba Singapore Holdings Ltd.; Country or region after: Singapore; Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands; Patentee before: ALIBABA GROUP HOLDING Ltd.; Country or region before: Cayman Islands |