CN111460836B - Data processing method and device for data processing - Google Patents


Info

Publication number
CN111460836B
Authority
CN
China
Prior art keywords
word, target, preset, language text, words
Legal status
Active
Application number
CN201910046779.3A
Other languages
Chinese (zh)
Other versions
CN111460836A (en)
Inventor
冯静静
周纤
Current Assignee
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201910046779.3A
Publication of CN111460836A
Application granted
Publication of CN111460836B
Status: Active


Landscapes

  • Machine Translation (AREA)

Abstract

Embodiments of the invention provide a data processing method and apparatus, and a device for data processing. The method specifically comprises: determining a target language text corresponding to a source language text; if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and outputting the replaced target language text. Because the target word segment in the target language text is replaced with the target preset word, the replaced target language text better matches the user's input habits and actual needs.

Description

Data processing method and device for data processing
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a device for data processing.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert a source language into a target language. As computing technology has developed, machine translation has gone through three main stages: rule-based translation, statistics-based translation, and artificial-neural-network-based translation.
Current translation methods focus on giving the user a more accurate translation result, so the results they produce are generally standardized content. Such results, however, do not suit every user or every application scenario.
For example, users chatting in an instant messaging application tend to communicate colloquially, so a standardized translation result does not match their spoken-language habits. The source text "What are you doing?" may be translated into a formal rendering (literally "you are doing what?"), whereas in a chat the user may prefer a colloquial expression closer to "Whatcha doing?". Translation results produced by current machine translation therefore fail to match the user's spoken-language habits.
Disclosure of Invention
Embodiments of the invention provide a data processing method and apparatus, and a device for data processing, which can improve the translation efficiency of a translation earphone and the convenience for a user of using the translation earphone.
In order to solve the above problems, an embodiment of the present invention discloses a data processing method, including:
determining a target language text corresponding to the source language text;
if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text.
In another aspect, an embodiment of the present invention discloses a data processing apparatus, including:
a determining module, configured to determine a target language text corresponding to a source language text;
a replacing module, configured to, when a target word segment satisfying a replacement condition exists in the target language text, replace the target word segment with a target preset word to obtain a replaced target language text; and
an output module, configured to output the replaced target language text.
In yet another aspect, an embodiment of the present invention discloses an apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:
determining a target language text corresponding to the source language text;
if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text.
In yet another aspect, embodiments of the invention disclose a machine-readable medium storing instructions that, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of the foregoing aspects.
The embodiment of the invention has the following advantages:
After determining the target language text corresponding to a source language text, embodiments of the invention further judge whether a target word segment satisfying the replacement condition exists in the target language text; if so, the target word segment is replaced with a target preset word and the replaced target language text is output. The target preset word can be a word that matches the user's input habits in the current scenario, so replacing the target word segment with it makes the replaced target language text match the user's input habits and meet the user's actual needs.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an embodiment of a data processing method of the present invention;
FIG. 2 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 3 is a block diagram of an apparatus 800 for data processing in accordance with the present invention; and
Fig. 4 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Method embodiment
Referring to FIG. 1, which shows a flowchart of the steps of an embodiment of a data processing method according to the present invention, the method may specifically include the following steps:
Step 101: determining a target language text corresponding to a source language text;
Step 102: if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
Step 103: outputting the replaced target language text.
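Steps 101 to 103 can be sketched as a small pipeline. This is a minimal Python illustration under stated assumptions, not the patented implementation: `translate` stands in for whatever translation model is used, and `find_replacement` is a hypothetical callback that applies the replacement condition and returns the target word segment and target preset word, or `None`. The English strings are placeholders.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callback type: returns (target_word_segment, target_preset_word)
# when the replacement condition is met, otherwise None.
Matcher = Callable[[str], Optional[Tuple[str, str]]]


def process(source_text: str,
            translate: Callable[[str], str],
            find_replacement: Matcher) -> str:
    """Sketch of Steps 101-103 (names are illustrative assumptions)."""
    # Step 101: determine the target language text for the source text.
    target_text = translate(source_text)
    # Step 102: if a target word segment meets the condition, replace it.
    match = find_replacement(target_text)
    if match is not None:
        segment, preset_word = match
        target_text = target_text.replace(segment, preset_word)
    # Step 103: output the (possibly) replaced target language text.
    return target_text


# Usage with toy stand-ins for the translator and the matcher.
def toy_translate(text: str) -> str:
    return {"What are you doing?": "you are doing what?"}[text]


def toy_matcher(text: str) -> Optional[Tuple[str, str]]:
    return ("doing what", "up to what") if "doing what" in text else None


print(process("What are you doing?", toy_translate, toy_matcher))
# prints: you are up to what?
```

Keeping translation and matching as injected callables mirrors the text's point that the translation method itself is not limited.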
Embodiments of the invention can be applied to translation scenarios, in which a translation client translates a source language text into a target language text according to the source language and target language set by the user. Embodiments of the invention do not limit the source and target languages: for example, the source language may be Chinese and the target language English, or the source language may be English and the target language Japanese.
The embodiment of the invention does not limit the form of the translation client, for example, the translation client can be a translation APP (Application), and a user can download, install and use the APP in a terminal; or the translation client may be a web page online tool, the user may open a web page, use an online translation client in the web page, etc.
The translation client may run on a terminal, which includes but is not limited to: smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, in-vehicle computers, desktop computers, set-top boxes, smart TVs, wearable devices, and the like.
In the embodiment of the invention, the source language text can be used for representing the text to be translated, and the embodiment of the invention can translate the source language text into the target language text. It will be appreciated that embodiments of the present invention are not limited to a particular source of the source language text.
In an optional embodiment of the present invention, the source language text may be obtained from a message in an instant messaging application. For example, it may be text in a message the user sends to the peer through the instant messaging application, text in a message the user receives from the peer, or text the user types into the input box of the instant messaging application.
The target language may be set by the user, derived from the user's history languages, or obtained by intelligently analyzing the input scenario. Taking an instant messaging application as the input scenario, embodiments of the invention can determine the target language from the messages received in the application: if received messages are in English, the target language is English; if they switch from English to Japanese, the target language is automatically switched from English to Japanese. This reduces the user's manual operation and improves input efficiency.
For the user-setting option, several languages may be offered for the user to choose from, and the chosen language is used as the target language. For the history-language option, the history languages are languages the user has used before, and the target language may be chosen from among those history languages other than the source language.
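The three ways of obtaining the target language can be combined into one resolver. The sketch below assumes a priority order the text does not specify (explicit user choice, then the language of the latest received message, then the first history language that differs from the source language), and it assumes language detection of received messages happens elsewhere. The function name and the language codes are hypothetical.

```python
from typing import Optional, Sequence


def resolve_target_language(source_lang: str,
                            user_setting: Optional[str] = None,
                            received_lang: Optional[str] = None,
                            history_langs: Sequence[str] = ()) -> Optional[str]:
    """Pick a target language (assumed priority order, see lead-in)."""
    # 1. An explicit user choice always wins.
    if user_setting:
        return user_setting
    # 2. Follow the language of the most recent received message, so the
    #    target switches automatically (e.g. from "en" to "ja").
    if received_lang and received_lang != source_lang:
        return received_lang
    # 3. Fall back to a history language other than the source language.
    for lang in history_langs:
        if lang != source_lang:
            return lang
    return None


print(resolve_target_language("zh", received_lang="ja"))  # prints: ja
```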
For a source language text to be translated, embodiments of the invention can first determine the corresponding target language text using an existing translation method; the specific translation method is not limited. In an optional embodiment, determining the target language text corresponding to the source language text may specifically include: determining it according to a system translation model or a user translation model. The system translation model describes the probability of translating the source language text into the target language text without user translation preferences; the user translation model describes that probability with the user's translation preferences.
After the target language text is determined, it is checked for a target word segment, that is, a word segment that can be replaced by a target preset word.
In an optional embodiment of the present invention, the replacement condition may specifically include: the semantics of the target word segment are related to the semantics of the target preset word, and the usage count of the target preset word exceeds a first threshold.
A usage count above the first threshold means the user uses the target preset word frequently. Given that the two are semantically related, the user probably prefers the target preset word to the target word segment; that is, the target preset word fits the user's input habits better. The target word segment is therefore judged to satisfy the replacement condition, it is replaced with the target preset word, and the replaced target language text is output.
Semantically related means that the target preset word and the target word segment have the same or similar meaning: for example, they may be synonyms or near-synonyms, or they may be the spoken and written forms of the same meaning.
It may be appreciated that the specific value of the first threshold is not limited in the embodiment of the present invention, and those skilled in the art may set the specific value of the first threshold according to actual needs, for example, the first threshold may be set to 20, etc.
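The replacement condition reduces to a two-part predicate. In this minimal sketch, `related_pairs` is a hypothetical stand-in for whatever semantic-relatedness judgment is used (a synonym list, spoken/written pairs, and so on), and "exceeds" is read as strictly greater than the first threshold, with 20 as the example value from the text.

```python
from typing import Mapping, Set, Tuple

FIRST_THRESHOLD = 20  # example value given in the text


def meets_replacement_condition(segment: str,
                                preset_word: str,
                                related_pairs: Set[Tuple[str, str]],
                                usage_counts: Mapping[str, int],
                                first_threshold: int = FIRST_THRESHOLD) -> bool:
    """True if the preset word is semantically related to the segment
    and its usage count strictly exceeds the first threshold."""
    related = (segment, preset_word) in related_pairs
    return related and usage_counts.get(preset_word, 0) > first_threshold


pairs = {("doing what", "up to what")}  # placeholder relatedness data
print(meets_replacement_condition("doing what", "up to what", pairs, {"up to what": 21}))
# prints: True
```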
In an application example of the present invention, suppose the source language text obtained in an instant messaging application is "What are you doing?", and the system translation model yields a formal target language text (literally "you are doing what?"). According to the user's input history in the instant messaging application, the user has used a colloquial word for "doing what", which is semantically related to the formal "doing what", more than the first threshold of 20 times. The target language text therefore contains a target word segment satisfying the replacement condition, namely the formal "doing what". Replacing it with the colloquial target preset word gives a replaced target language text roughly equivalent to "Whatcha doing?", which better matches the user's input habits in the instant messaging application.
In an optional embodiment of the present invention, the method may further include: acquiring the current input environment. The replacement condition may then specifically include: the semantics of the target word segment are related to the semantics of the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
The input environment may specifically be any application environment that accepts text input, such as an instant messaging application, a text editing application, a news web page, or a game interface.
Embodiments of the invention can acquire the current input environment and judge whether the target language text contains a target word segment that does not match it. Specifically, a large amount of user input data can be collected and used to compute an association coefficient between each word and each input environment; the larger the coefficient, the better the word fits that input environment.
If there is a preset word that is semantically related to a word segment in the target language text and whose association coefficient with the current input environment is greater than that of the word segment, the target language text is determined to contain a target word segment satisfying the replacement condition.
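The text does not define how the association coefficient is computed from the collected input data. The sketch below assumes one simple definition, the share of a word's observed uses that occurred in each input environment, and then applies the environment-based replacement condition. The function names, `related_pairs`, and the environment labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Coefficients = Dict[str, Dict[str, float]]


def association_coefficients(observations: Iterable[Tuple[str, str]]) -> Coefficients:
    """Build word -> {environment: coefficient} from (word, environment) pairs.

    Assumed definition: the share of a word's observed uses that occurred
    in each input environment.
    """
    per_word: Dict[str, Counter] = defaultdict(Counter)
    for word, env in observations:
        per_word[word][env] += 1
    coeffs: Coefficients = {}
    for word, env_counts in per_word.items():
        total = sum(env_counts.values())
        coeffs[word] = {env: n / total for env, n in env_counts.items()}
    return coeffs


def meets_environment_condition(segment: str, preset_word: str, env: str,
                                related_pairs: Set[Tuple[str, str]],
                                coeffs: Coefficients) -> bool:
    """Semantically related AND the preset word fits `env` better than the segment."""
    if (segment, preset_word) not in related_pairs:
        return False
    preset_coeff = coeffs.get(preset_word, {}).get(env, 0.0)
    segment_coeff = coeffs.get(segment, {}).get(env, 0.0)
    return preset_coeff > segment_coeff


# Toy data: the colloquial form appears only in chat, the formal one mostly in an editor.
obs = [("up to what", "chat")] * 3 + [("doing what", "chat"),
                                      ("doing what", "editor"),
                                      ("doing what", "editor")]
coeffs = association_coefficients(obs)
pairs = {("doing what", "up to what")}
print(meets_environment_condition("doing what", "up to what", "chat", pairs, coeffs))
# prints: True
```

With the same data the condition is false for `"editor"`, which matches the text's point that the replacement should follow the current input environment.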
The resulting replaced target language text fits the current input environment. For example, if the current input environment is an instant messaging application, the replaced text better matches colloquial input habits there; if it is a text editing application, the replaced text better matches written-language habits.
In an optional embodiment of the present invention, the function of the translation client may be applied in an input method program. For example, a button that triggers the translation client may be added to the input interface of the input method program. The user enters the source language text through the input method program and taps the button to trigger translation: the source language text is translated into a target language text, the target word segment in it is replaced, and the replaced target language text is output. This saves the time the user would otherwise spend finding and opening a translation client, further improving translation efficiency.
Applying the translation client in an input method program may specifically mean integrating the client's functionality into the input method program, or having the input method program invoke the translation client to perform translation; embodiments of the invention are not limited in this respect.
In addition, because input method programs are cross-platform, the input method program can run inside any host application, such as an instant messaging, email, or text editing application. The user can bring up the input method program through an input operation in the host application, for example by left-clicking in the host application's input box, and then enter the source language text through it, further improving translation efficiency.
Furthermore, embodiments of the invention can obtain the type of the host application through the input method program and determine the target preset word for the target word segment according to the user's input history in that host application, so that the replaced target language text matches the user's habits in that host application. For example, in an instant messaging application the replaced text better matches the user's colloquial habits; in a text editing application it better matches the user's written-language habits.
In an optional embodiment of the present invention, determining that a target word segment satisfying the replacement condition exists in the target language text may specifically include:
Step S11: segmenting the target language text into words to obtain a word segment sequence corresponding to the target language text;
Step S12: according to the semantics of the word segments in the word segment sequence and the semantics of the preset words in a preset word library, judging whether the preset word library contains candidate preset words semantically related to a word segment in the sequence; if it does, and the usage count of a candidate preset word exceeds the first threshold, determining that a target word segment satisfying the replacement condition exists in the target language text, where the preset word library contains preset words and their usage counts;
Step S13: determining the candidate preset word whose usage count exceeds the first threshold as the target preset word, and determining the word segment in the sequence that is semantically related to the target preset word as the target word segment.
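Steps S11 to S13 can be sketched as a single search over the segmented text. Assumptions: whitespace splitting is only a placeholder for a real word segmenter (Chinese text needs one), `related` is a hypothetical map from a word segment to its semantically related preset words, and when several candidates pass the threshold the most-used one is picked.

```python
from typing import Callable, List, Mapping, Optional, Tuple


def find_target(target_text: str,
                related: Mapping[str, List[str]],
                usage_counts: Mapping[str, int],
                first_threshold: int = 20,
                segment: Callable[[str], List[str]] = str.split
                ) -> Optional[Tuple[str, str]]:
    """Return (target_word_segment, target_preset_word), or None."""
    # S11: segment the target language text (placeholder segmenter).
    for word in segment(target_text):
        # S12: related preset words whose usage count exceeds the threshold.
        candidates = [p for p in related.get(word, [])
                      if usage_counts.get(p, 0) > first_threshold]
        if candidates:
            # S13: the (most-used) candidate becomes the target preset word,
            # and this segment becomes the target word segment.
            preset = max(candidates, key=lambda p: usage_counts.get(p, 0))
            return word, preset
    return None


related = {"what": ["wut", "whatever"]}  # placeholder relatedness data
print(find_target("so what happened", related, {"wut": 30, "whatever": 25}))
# prints: ('what', 'wut')
```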
Embodiments of the invention can build a preset word library that reflects the user's input habits. The library may contain preset words and their usage counts, where a preset word may be a history word the user has entered and its usage count may be the number of times the user has entered it.
Optionally, a preset word library for the current user may be built from the history words entered by that user, so that it reflects the user's personal input habits; alternatively, a preset word library for all users across the network may be built from all or some of the history words they have entered, so that it reflects the general input habits of most users. Embodiments of the invention are not limited in this regard.
In an optional embodiment of the present invention, the preset word library may be built from the history words the user enters in an instant messaging application and the number of times each is entered, so that it reflects the user's input habits in that application.
Alternatively, a separate preset word library may be built for each host application, for example one for instant messaging applications and one for text editing applications, and the library corresponding to the current host application is queried. When the user is in an instant messaging application, the instant messaging library is queried for the target language text translated there, so the resulting target preset word matches the user's input habits in instant messaging; when the user is in a text editing application, the text editing library is queried, so the target preset word matches the user's input habits in text editing.
Specifically, while the user is using an instant messaging application, embodiments of the invention can capture the text content the user enters and segment it into history words. If the preset word library contains a preset word matching a history word, the usage count of that preset word is incremented by 1; otherwise, the history word is added to the library as a new preset word with a usage count of 1.
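The update rule maps directly onto a counter: incrementing a missing key starts it at 1, which covers both branches described above. This minimal sketch uses Python's `collections.Counter`, a placeholder whitespace segmenter, and a hypothetical per-host-application dictionary of libraries (the keys are illustrative).

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List


def update_lexicon(lexicon: Counter, text: str,
                   segment: Callable[[str], List[str]] = str.split) -> None:
    """Segment user-entered text and record each history word.

    An existing preset word's usage count goes up by 1; a new word is
    added with a usage count of 1 (Counter's default of 0, plus 1).
    """
    for word in segment(text):
        lexicon[word] += 1


# One preset word library per host application (keys are illustrative).
lexicons: DefaultDict[str, Counter] = defaultdict(Counter)

lex = Counter({"you": 9, "there": 7})
update_lexicon(lex, "you there whatcha")
print(lex)  # prints: Counter({'you': 10, 'there': 8, 'whatcha': 1})
```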
The text content entered by the user may specifically include text already committed at the current cursor position, text copied by the user, and so on. It may be text entered in an instant messaging application and sent to the peer, or text entered in other input environments such as a browser, a document, a microblog, or email; embodiments of the invention do not limit the source of the text content.
The text content may also be converted from the user's voice input: for example, the user may enter voice messages in an instant messaging application, and embodiments of the invention can convert the voice into text content.
In an application example of the present invention, suppose that while using an instant messaging application the user enters the colloquial Chinese for "What are you up to?". Segmenting this text yields the history words "you", "at", the colloquial "doing what", and a sentence-final particle. Suppose the preset word library already records preset words matching these history words; the usage count of each is incremented by 1, giving the following updated usage counts (in parentheses): "you" (10), "at" (8), colloquial "doing what" (20), and the particle (9).
In another application example, suppose the text the user enters in the instant messaging application is the colloquial "What? What's going on?". Segmenting it yields the history words colloquial "what" and colloquial "what's going on". Suppose the preset word library records matching preset words with usage counts colloquial "what" (29) and "what's going on" (9); incrementing each by 1 gives the updated counts colloquial "what" (30) and "what's going on" (10).
After the target language text is determined, it can be segmented into a word segment sequence. Semantic analysis is performed on the word segments in the sequence, and the preset word library is queried to judge whether it contains candidate preset words semantically related to them. If it does, and a candidate's usage count exceeds the first threshold (meaning the user uses it often), the target language text is determined to contain a target word segment satisfying the replacement condition: the candidate preset word whose usage count exceeds the first threshold becomes the target preset word, and the word segment semantically related to it becomes the target word segment.
In practice, the preset word library may contain several candidate preset words semantically related to word segments in the sequence. In that case, each candidate's usage count is compared with the first threshold, and a candidate whose count exceeds it is taken as the target preset word; if more than one candidate exceeds the threshold, the one with the highest usage count is taken.
Alternatively, if more than one candidate exceeds the first threshold and several of them share the same usage count, the candidate with the fewest characters among those exceeding the first threshold may be taken as the target preset word. Users are more likely to write colloquially in instant messaging applications, and colloquial language tends to be short and easy to understand, so choosing the shortest candidate makes the target preset word fit the user's colloquial habits in instant messaging better.
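The two selection rules can be folded into one sort key. This sketch follows one reading of the text, assuming the character-count tie-break applies among candidates that share the highest usage count; the function name and the example words are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def choose_target_preset(candidates: Iterable[str],
                         usage_counts: Mapping[str, int],
                         first_threshold: int = 20) -> Optional[str]:
    """Pick the target preset word among semantically related candidates."""
    eligible = [c for c in candidates if usage_counts.get(c, 0) > first_threshold]
    if not eligible:
        return None
    # Highest usage count first; among equal counts, fewest characters.
    return min(eligible, key=lambda c: (-usage_counts[c], len(c)))


usage = {"whatcha": 30, "wut": 30, "what on earth": 25, "huh": 3}
print(choose_target_preset(usage, usage))  # prints: wut
```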
In an optional embodiment of the present invention, the preset word library may contain mapping relations among word segments, preset words, and semantic relation values, and judging whether the preset word library contains candidate preset words semantically related to the word segments in the word segment sequence may specifically include:
Step S21: querying the mapping relations in the preset word library according to a word segment in the word segment sequence to obtain a target mapping relation matching that word segment;
Step S22: if the semantic relation value in the target mapping relation exceeds a second threshold, determining that the preset word library contains a candidate preset word semantically related to that word segment, and taking the preset word in the target mapping relation as the candidate preset word.
To make querying the target preset word more accurate and efficient, embodiments of the invention can store the mapping relations among word segments, preset words, and semantic relation values in the preset word library. Table 1 shows an example of such mapping relations.
Table 1
No. | Word segment | Preset word | Semantic relation value
1 | What is | Instant yarn | 95
2 | Mother's mother | Mother's body | 99
3 | Intelligent | Smart | 80
The semantic relation value represents the degree of semantic relatedness between a word segment and a preset word: the higher the value, the more closely the two are related. The semantic relation values in Table 1 are only an application example of the present invention; the embodiments do not limit their data type or specific values. If the semantic relation value between a word segment and a preset word exceeds the second threshold, the two have the same or similar meaning, and the word segment may be replaced by the preset word. The specific value of the second threshold is likewise not limited by the embodiments of the present invention; for example, it may be set to 80.
In an application example of the present invention, assume that a source language text obtained in an instant messaging application is translated by the system translation model into the target language text "What is?". First, the target language text is segmented; the resulting word segmentation sequence contains the single word segment "What is". The mapping relations in the preset word library are then queried with this word segment, yielding the matching target mapping relation, namely the relation numbered 1 in Table 1. Because the semantic relation value in this target mapping relation, 95, exceeds the second threshold of 80, it is determined that the preset word library contains a candidate preset word semantically related to the word segment, and the preset word "Instant yarn" in the target mapping relation is taken as the candidate preset word. Because the usage count stored in the preset word library for "Instant yarn" is 30, which exceeds the first threshold of 20, "Instant yarn" is determined to be the target preset word, and "What is", the word segment semantically related to it, is determined to be the target word segment. "What is" is then replaced with "Instant yarn" to obtain the replaced target language text. Finally, the replaced target language text is output to the user, so that the final translation result matches the user's input habits.
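The worked example can be replayed end to end with the same thresholds (20 for usage count, 80 for semantic relation value) and the same numbers (relation value 95, usage count 30). This is a minimal sketch under stated assumptions: the English placeholder words "what" and "wut" are hypothetical stand-ins for the entries of Table 1, whitespace splitting stands in for a real word segmenter, and when several preset words qualify the sketch simply takes the first one rather than ranking them.

```python
from typing import Dict, List, Tuple

FIRST_THRESHOLD = 20   # usage-count threshold used in the example
SECOND_THRESHOLD = 80  # semantic relation threshold used in the example

# Hypothetical stand-ins: one mapping relation with value 95 (as in Table 1,
# row 1) and a usage count of 30 for its preset word.
MAPPING_RELATIONS: List[Tuple[str, str, int]] = [("what", "wut", 95)]
USAGE_COUNTS: Dict[str, int] = {"wut": 30}


def segment(text: str) -> List[str]:
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.split()


def replace_with_preset_words(target_text: str) -> str:
    replaced = []
    for seg in segment(target_text):
        new_seg = seg
        for source_seg, preset_word, relation_value in MAPPING_RELATIONS:
            related = source_seg == seg and relation_value > SECOND_THRESHOLD
            frequent = USAGE_COUNTS.get(preset_word, 0) > FIRST_THRESHOLD
            if related and frequent:
                # Takes the first qualifying preset word (no ranking here).
                new_seg = preset_word
                break
        replaced.append(new_seg)
    # Re-join with single spaces; original spacing is not preserved.
    return " ".join(replaced)


print(replace_with_preset_words("what ?"))  # wut ?
```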
In summary, after determining the target language text corresponding to a source language text, the embodiment of the invention further determines whether the target language text contains a target word segment that meets the replacement condition; if so, it replaces the target word segment with a target preset word and outputs the replaced target language text. Because the target word segment is semantically related to the target preset word and the usage count of the target preset word exceeds the first threshold, the user tends to prefer the target preset word over the target word segment when their meanings are the same or similar. Replacing the target word segment with the target preset word therefore makes the replaced target language text better match the user's input habits and actual needs.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to Fig. 2, which shows a block diagram of an embodiment of a data processing apparatus according to the present invention, the apparatus may specifically include:
a determining module 201, configured to determine a target language text corresponding to a source language text;
a replacing module 202, configured to replace, when a target word segment meeting a replacement condition exists in the target language text, the target word segment with a target preset word, so as to obtain a replaced target language text; and
an output module 203, configured to output the replaced target language text.
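The three-module structure of Fig. 2 can be sketched as a small composition of callables. The class name, the field names and the toy lambdas in the usage lines are hypothetical; the sketch only shows how the determining, replacing and output modules are chained, not how each module works internally.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataProcessingApparatus:
    # Hypothetical callables standing in for modules 201, 202 and 203.
    determine: Callable[[str], str]  # determining module 201: source -> target text
    replace: Callable[[str], str]    # replacing module 202: target -> replaced text
    output: Callable[[str], None]    # output module 203: delivers the result

    def process(self, source_text: str) -> None:
        target_text = self.determine(source_text)
        replaced_text = self.replace(target_text)
        self.output(replaced_text)


# Toy usage: lower-casing stands in for translation, and the replacement
# is unconditional here purely for illustration.
results: List[str] = []
apparatus = DataProcessingApparatus(
    determine=lambda s: s.lower(),
    replace=lambda t: t.replace("what", "wut"),
    output=results.append,
)
apparatus.process("WHAT?")
print(results)  # ['wut?']
```

Injecting each module as a callable keeps the pipeline independent of any particular translation model or word library.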
Optionally, the determining module 201 may specifically include:
a word segmentation sub-module, configured to segment the target language text to obtain a word segmentation sequence corresponding to the target language text;
a judging sub-module, configured to judge, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence, and, if such a candidate preset word exists and its usage count exceeds a first threshold, to determine that a target word segment meeting the replacement condition exists in the target language text, wherein the preset word library contains preset words and their usage counts; and
a first determining sub-module, configured to determine the candidate preset word whose usage count exceeds the first threshold as the target preset word, and to determine the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment.
Optionally, the first determining sub-module may specifically include:
a first determining unit, configured to determine, if more than one candidate preset word has a usage count exceeding the first threshold, the candidate preset word with the largest usage count among them as the target preset word; or
a second determining unit, configured to determine, if several candidate preset words have the same usage count and that count exceeds the first threshold, the candidate preset word with the fewest characters among them as the target preset word.
Optionally, the judging sub-module may specifically include:
a query unit, configured to query the mapping relations in the preset word library according to a word segment in the word segmentation sequence, to obtain a target mapping relation matching that word segment; and
a third determining unit, configured to determine, if the semantic relation value in the target mapping relation exceeds a second threshold, that the preset word library contains a candidate preset word semantically related to the word segment, and to take the preset word in the target mapping relation as the candidate preset word.
Optionally, the preset word library is obtained according to historical words input by the user in the instant messaging application and the number of times each historical word was input.
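One way to derive usage counts for the preset word library from input history is a frequency count, sketched below. It assumes the historical words are already available as a list of strings; the `min_count` cut-off is a hypothetical parameter, and how the semantic mapping relations are built is not covered here.

```python
from collections import Counter
from typing import Dict, Iterable


def build_usage_counts(history_words: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Count how often each historical word was input.

    Words input fewer than min_count times (a hypothetical cut-off) are dropped.
    """
    counts = Counter(history_words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical input history from an instant messaging application.
print(build_usage_counts(["wut", "what", "wut", "lol", "wut"], min_count=2))  # {'wut': 3}
```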
Optionally, the source language text is obtained from an instant messaging message received in an instant messaging application.
Optionally, the determining module 201 may specifically include:
a second determining sub-module, configured to determine the target language text corresponding to the source language text according to a system translation model or a user translation model, wherein the system translation model describes the probability of translating the source language text into the target language text without user translation preferences, and the user translation model describes that probability with user translation preferences taken into account.
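Choosing between the two translation models can be sketched by representing each model as a table from source text to candidate target texts with translation probabilities. This is an illustration under explicit assumptions: the table representation, the rule "use the user model when it covers the source text, else fall back to the system model", and all names and data are hypothetical, since the patent does not specify how the models are stored or chosen.

```python
from typing import Dict, Optional

# Hypothetical model format: source text -> {target text: translation probability}.
TranslationModel = Dict[str, Dict[str, float]]


def determine_target_text(
    source_text: str,
    system_model: TranslationModel,
    user_model: Optional[TranslationModel] = None,
) -> Optional[str]:
    # Assumption: prefer the user translation model when it covers the source text.
    model = system_model
    if user_model is not None and source_text in user_model:
        model = user_model
    candidates = model.get(source_text)
    if not candidates:
        return None
    # Pick the target text with the highest translation probability.
    return max(candidates, key=candidates.get)


system = {"hi": {"hello": 0.7, "hey": 0.3}}
user = {"hi": {"hey": 0.6, "hello": 0.4}}
print(determine_target_text("hi", system))        # hello
print(determine_target_text("hi", system, user))  # hey
```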
Optionally, the replacement condition may specifically include:
the target word segment is semantically related to the target preset word, and the usage count of the target preset word exceeds the first threshold; and/or
the target word segment is semantically related to the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
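The "and/or" combination of the two replacement conditions can be expressed as a small predicate. The evidence record and its field names are hypothetical, and the association coefficients are taken as given numbers because the text does not specify how they are computed. The `require_both` flag selects between the "and" and "or" readings.

```python
from dataclasses import dataclass


@dataclass
class ReplacementEvidence:
    semantically_related: bool      # target segment and preset word are related
    preset_usage_count: int         # usage count of the target preset word
    preset_env_coefficient: float   # association of preset word with input environment
    segment_env_coefficient: float  # association of target segment with input environment


def meets_replacement_condition(
    ev: ReplacementEvidence, first_threshold: int, require_both: bool = False
) -> bool:
    usage_rule = ev.semantically_related and ev.preset_usage_count > first_threshold
    env_rule = (ev.semantically_related
                and ev.preset_env_coefficient > ev.segment_env_coefficient)
    # "and/or": either rule alone may suffice, or both may be required.
    return (usage_rule and env_rule) if require_both else (usage_rule or env_rule)


ev = ReplacementEvidence(True, 30, 0.2, 0.5)
print(meets_replacement_condition(ev, 20))                     # True
print(meets_replacement_condition(ev, 20, require_both=True))  # False
```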
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be referred to one another.
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in the method embodiments and is not repeated here.
An embodiment of the present invention provides an apparatus for data processing, including a memory, and one or more programs, wherein the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs comprising instructions for: determining a target language text corresponding to the source language text; if the target word segment meeting the replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and outputting the replaced target language text.
Fig. 3 is a block diagram illustrating an apparatus 800 for data processing according to an example embodiment. For example, apparatus 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 3, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a shooting mode or a video mode. Each of the front and rear cameras may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; it may also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 4 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer-readable storage medium is provided, whose instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in Fig. 1.
A non-transitory computer-readable storage medium is provided, whose instructions, when executed by a processor of an apparatus (server or terminal), cause the apparatus to perform a data processing method comprising: determining a target language text corresponding to a source language text; if a target word segment meeting a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and outputting the replaced target language text.
The embodiment of the invention discloses A1, a data processing method, which comprises the following steps:
determining a target language text corresponding to a source language text;
if a target word segment meeting a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text.
A2, the method according to A1, wherein determining that a target word segment meeting the replacement condition exists in the target language text comprises:
segmenting the target language text to obtain a word segmentation sequence corresponding to the target language text;
judging, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence, and, if so and the usage count of the candidate preset word exceeds a first threshold, determining that a target word segment meeting the replacement condition exists in the target language text, wherein the preset word library contains preset words and their usage counts; and
determining the candidate preset word whose usage count exceeds the first threshold as the target preset word, and determining the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment.
A3, the method according to A2, wherein determining the candidate preset word whose usage count exceeds the first threshold as the target preset word comprises:
if more than one candidate preset word has a usage count exceeding the first threshold, determining the candidate preset word with the largest usage count among them as the target preset word; or
if several candidate preset words have the same usage count and that count exceeds the first threshold, determining the candidate preset word with the fewest characters among them as the target preset word.
A4, the method according to A2, wherein the preset word library contains mapping relations among word segments, preset words and semantic relation values, and judging whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence comprises:
querying the mapping relations in the preset word library according to a word segment in the word segmentation sequence to obtain a target mapping relation matching that word segment; and
if the semantic relation value in the target mapping relation exceeds a second threshold, determining that the preset word library contains a candidate preset word semantically related to the word segment, and taking the preset word in the target mapping relation as the candidate preset word.
A5, the method according to any one of A2 to A4, wherein the preset word library is obtained according to historical words input by the user in an instant messaging application and the number of times each historical word was input.
A6, the method according to any one of A1 to A4, wherein the source language text is obtained from an instant messaging message received in an instant messaging application.
A7, the method according to any one of A1 to A4, wherein the replacement condition comprises:
the target word segment is semantically related to the target preset word, and the usage count of the target preset word exceeds the first threshold; and/or
the target word segment is semantically related to the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
The embodiments of the invention disclose B8, a data processing apparatus, comprising:
a determining module, configured to determine a target language text corresponding to a source language text;
a replacing module, configured to replace, when a target word segment meeting a replacement condition exists in the target language text, the target word segment with a target preset word, so as to obtain a replaced target language text; and
an output module, configured to output the replaced target language text.
B9, the apparatus according to B8, wherein the determining module comprises:
a word segmentation sub-module, configured to segment the target language text to obtain a word segmentation sequence corresponding to the target language text;
a judging sub-module, configured to judge, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence, and, if such a candidate preset word exists and its usage count exceeds a first threshold, to determine that a target word segment meeting the replacement condition exists in the target language text, wherein the preset word library contains preset words and their usage counts; and
a first determining sub-module, configured to determine the candidate preset word whose usage count exceeds the first threshold as the target preset word, and to determine the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment.
B10, the apparatus according to B9, wherein the first determining sub-module comprises:
a first determining unit, configured to determine, if more than one candidate preset word has a usage count exceeding the first threshold, the candidate preset word with the largest usage count among them as the target preset word; or
a second determining unit, configured to determine, if several candidate preset words have the same usage count and that count exceeds the first threshold, the candidate preset word with the fewest characters among them as the target preset word.
B11, the apparatus according to B9, wherein the judging sub-module comprises:
a query unit, configured to query the mapping relations in the preset word library according to a word segment in the word segmentation sequence, to obtain a target mapping relation matching that word segment; and
a third determining unit, configured to determine, if the semantic relation value in the target mapping relation exceeds a second threshold, that the preset word library contains a candidate preset word semantically related to the word segment, and to take the preset word in the target mapping relation as the candidate preset word.
B12, the apparatus according to any one of B9 to B11, wherein the preset word library is obtained according to historical words input by the user in an instant messaging application and the number of times each historical word was input.
B13, the apparatus according to any one of B8 to B11, wherein the source language text is obtained from an instant messaging message received in an instant messaging application.
B14, the apparatus according to any one of B8 to B11, wherein the replacement condition comprises:
the target word segment is semantically related to the target preset word, and the usage count of the target preset word exceeds the first threshold; and/or
the target word segment is semantically related to the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
The embodiments of the invention disclose C15, an apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:
determining a target language text corresponding to a source language text;
if a target word segment meeting a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text.
C16, the apparatus according to C15, wherein determining that a target word segment meeting the replacement condition exists in the target language text comprises:
segmenting the target language text to obtain a word segmentation sequence corresponding to the target language text;
judging, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence, and, if so and the usage count of the candidate preset word exceeds a first threshold, determining that a target word segment meeting the replacement condition exists in the target language text, wherein the preset word library contains preset words and their usage counts; and
determining the candidate preset word whose usage count exceeds the first threshold as the target preset word, and determining the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment.
C17, the apparatus according to C16, wherein determining the candidate preset word whose usage count exceeds the first threshold as the target preset word comprises:
if more than one candidate preset word has a usage count exceeding the first threshold, determining the candidate preset word with the largest usage count among them as the target preset word; or
if several candidate preset words have the same usage count and that count exceeds the first threshold, determining the candidate preset word with the fewest characters among them as the target preset word.
C18, the apparatus according to C16, wherein the preset word library contains mapping relations among word segments, preset words and semantic relation values, and judging whether the preset word library contains a candidate preset word semantically related to a word segment in the word segmentation sequence comprises:
querying the mapping relations in the preset word library according to a word segment in the word segmentation sequence to obtain a target mapping relation matching that word segment; and
if the semantic relation value in the target mapping relation exceeds a second threshold, determining that the preset word library contains a candidate preset word semantically related to the word segment, and taking the preset word in the target mapping relation as the candidate preset word.
C19, the apparatus according to any one of C16 to C18, wherein the preset word library is obtained according to historical words input by the user in an instant messaging application and the number of times each historical word was input.
C20, the apparatus according to any one of C15 to C18, wherein the source language text is obtained from an instant messaging message received in an instant messaging application.
C21, the apparatus according to any one of C15 to C18, wherein the replacement condition comprises:
the target word segment is semantically related to the target preset word, and the usage count of the target preset word exceeds the first threshold; and/or
the target word segment is semantically related to the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
The embodiments of the invention further disclose D22, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method according to one or more of A1 to A7.
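The candidate lookup of C18, together with the target-preset-word selection rule that precedes it, can be illustrated with a minimal Python sketch. It assumes the mapping relations are held in memory as (word segment, preset word, semantic relation value) records and the numbers of uses as a dict. The names `Mapping`, `index_mappings`, `candidate_preset_words` and `select_target_preset_word` are hypothetical, and the text does not say how semantic relation values are computed. The two alternatives of the selection rule are folded here into a single ordering (most uses first, then fewest characters), which is one possible reading rather than the claimed implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class Mapping(NamedTuple):
    """One mapping relation: word segment, preset word, semantic relation value."""
    segment: str
    preset_word: str
    relation_value: float


def index_mappings(mappings: Iterable[Mapping]) -> Dict[str, List[Mapping]]:
    """Group the mapping relations by word segment so each segment can be queried."""
    index: Dict[str, List[Mapping]] = defaultdict(list)
    for m in mappings:
        index[m.segment].append(m)
    return index


def candidate_preset_words(segment: str, index: Dict[str, List[Mapping]],
                           second_threshold: float) -> List[str]:
    """C18 (sketch): preset words whose semantic relation value with the
    segment exceeds the second threshold become candidate preset words."""
    return [m.preset_word for m in index.get(segment, [])
            if m.relation_value > second_threshold]


def select_target_preset_word(candidates: List[str], usage: Dict[str, int],
                              first_threshold: int) -> Optional[str]:
    """Selection rule (sketch): keep candidates used more than first_threshold
    times, prefer the most used one and, among equal counts, the shortest."""
    eligible = [w for w in candidates if usage.get(w, 0) > first_threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda w: (-usage[w], len(w)))


# Hypothetical example data.
idx = index_mappings([
    Mapping("thanks", "thx", 0.90),
    Mapping("thanks", "ty", 0.85),
    Mapping("thanks", "cheers", 0.40),
])
usage = {"thx": 7, "ty": 7, "cheers": 20}
cands = candidate_preset_words("thanks", idx, second_threshold=0.8)  # ["thx", "ty"]
print(select_target_preset_word(cands, usage, first_threshold=5))    # ty
```

Here "cheers" is excluded by the second threshold despite having the most uses, and "ty" wins the tie with "thx" on usage because it has fewer characters.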
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention to the precise form disclosed; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention are intended to be included within its scope of protection.
The data processing method, the data processing device and the device for data processing provided by the invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention; they are intended only to help in understanding the method and its core idea. Since those skilled in the art may vary the specific embodiments and the scope of application in accordance with the idea of the invention, this description should not be construed as limiting the invention.

Claims (16)

1. A data processing method, comprising:
determining a target language text corresponding to a source language text, wherein the target language text is a translation of the source language text;
if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text;
wherein determining that a target word segment satisfying the replacement condition exists in the target language text comprises:
performing word segmentation on the target language text to obtain a word segmentation sequence corresponding to the target language text;
judging, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains candidate preset words semantically related to the word segments in the word segmentation sequence, and, if it does and the number of uses of a candidate preset word exceeds a first threshold, determining that a target word segment satisfying the replacement condition exists in the target language text, wherein the preset word library comprises preset words and the number of uses of each preset word; and
determining the candidate preset word whose number of uses exceeds the first threshold as the target preset word, and determining the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment;
wherein the replacement condition comprises: the semantics of the target word segment are related to the semantics of the target preset word, and the number of uses of the target preset word exceeds the first threshold; and/or
the semantics of the target word segment are related to the semantics of the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
2. The method according to claim 1, wherein determining the candidate preset word whose number of uses exceeds the first threshold as the target preset word comprises:
if more than one candidate preset word has a number of uses exceeding the first threshold, determining, from among those candidate preset words, the one with the largest number of uses as the target preset word; or
if a plurality of candidate preset words have the same number of uses and that number exceeds the first threshold, determining, from among those candidate preset words, the one with the fewest characters as the target preset word.
3. The method according to claim 1, wherein the preset word library includes mapping relations among word segments, preset words and semantic relation values, and judging whether the preset word library contains candidate preset words semantically related to the word segments in the word segmentation sequence comprises:
querying the mapping relations in the preset word library according to a word segment in the word segmentation sequence, to obtain a target mapping relation that matches that word segment; and
if the semantic relation value in the target mapping relation exceeds a second threshold, determining that the preset word library contains a candidate preset word semantically related to the word segment, and determining the preset word in the target mapping relation as the candidate preset word.
4. The method according to any one of claims 1 to 3, wherein the preset word library is obtained from historical words input by a user in an instant messaging application and the number of times each historical word was input.
5. The method according to any one of claims 1 to 3, wherein the source language text is derived from instant messaging messages obtained from an instant messaging application.
6. A data processing apparatus, comprising:
a determining module, configured to determine a target language text corresponding to a source language text, wherein the target language text is a translation of the source language text;
a replacing module, configured to, when a target word segment satisfying a replacement condition exists in the target language text, replace the target word segment with a target preset word to obtain a replaced target language text; and
an output module, configured to output the replaced target language text;
wherein the determining module comprises:
a word segmentation sub-module, configured to perform word segmentation on the target language text to obtain a word segmentation sequence corresponding to the target language text;
a judging sub-module, configured to judge, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains candidate preset words semantically related to the word segments in the word segmentation sequence, and, if it does and the number of uses of a candidate preset word exceeds a first threshold, determine that a target word segment satisfying the replacement condition exists in the target language text, wherein the preset word library comprises preset words and the number of uses of each preset word; and
a first determining sub-module, configured to determine the candidate preset word whose number of uses exceeds the first threshold as the target preset word, and determine the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment;
wherein the replacement condition comprises:
the semantics of the target word segment are related to the semantics of the target preset word, and the number of uses of the target preset word exceeds the first threshold; and/or
the semantics of the target word segment are related to the semantics of the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
7. The apparatus according to claim 6, wherein the first determining sub-module comprises:
a first determining unit, configured to, if more than one candidate preset word has a number of uses exceeding the first threshold, determine, from among those candidate preset words, the one with the largest number of uses as the target preset word; or
a second determining unit, configured to, if a plurality of candidate preset words have the same number of uses and that number exceeds the first threshold, determine, from among those candidate preset words, the one with the fewest characters as the target preset word.
8. The apparatus according to claim 6, wherein the judging sub-module comprises:
a query unit, configured to query the mapping relations in the preset word library according to a word segment in the word segmentation sequence, to obtain a target mapping relation that matches that word segment; and
a third determining unit, configured to, if the semantic relation value in the target mapping relation exceeds a second threshold, determine that the preset word library contains a candidate preset word semantically related to the word segment, and determine the preset word in the target mapping relation as the candidate preset word.
9. The apparatus according to any one of claims 6 to 8, wherein the preset word library is obtained from historical words input by a user in an instant messaging application and the number of times each historical word was input.
10. The apparatus according to any one of claims 6 to 8, wherein the source language text is derived from instant messaging messages obtained from an instant messaging application.
11. An apparatus for data processing, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:
determining a target language text corresponding to a source language text, wherein the target language text is a translation of the source language text;
if a target word segment satisfying a replacement condition exists in the target language text, replacing the target word segment with a target preset word to obtain a replaced target language text; and
outputting the replaced target language text;
wherein determining that a target word segment satisfying the replacement condition exists in the target language text comprises:
performing word segmentation on the target language text to obtain a word segmentation sequence corresponding to the target language text;
judging, according to the semantics of the word segments in the word segmentation sequence and the semantics of the preset words in a preset word library, whether the preset word library contains candidate preset words semantically related to the word segments in the word segmentation sequence, and, if it does and the number of uses of a candidate preset word exceeds a first threshold, determining that a target word segment satisfying the replacement condition exists in the target language text, wherein the preset word library comprises preset words and the number of uses of each preset word; and
determining the candidate preset word whose number of uses exceeds the first threshold as the target preset word, and determining the word segment in the word segmentation sequence that is semantically related to the target preset word as the target word segment;
wherein the replacement condition comprises:
the semantics of the target word segment are related to the semantics of the target preset word, and the number of uses of the target preset word exceeds the first threshold; and/or
the semantics of the target word segment are related to the semantics of the target preset word, and the association coefficient between the target preset word and the input environment is greater than the association coefficient between the target word segment and the input environment.
12. The apparatus according to claim 11, wherein determining the candidate preset word whose number of uses exceeds the first threshold as the target preset word comprises:
if more than one candidate preset word has a number of uses exceeding the first threshold, determining, from among those candidate preset words, the one with the largest number of uses as the target preset word; or
if a plurality of candidate preset words have the same number of uses and that number exceeds the first threshold, determining, from among those candidate preset words, the one with the fewest characters as the target preset word.
13. The apparatus according to claim 11, wherein the preset word library includes mapping relations among word segments, preset words and semantic relation values, and judging whether the preset word library contains candidate preset words semantically related to the word segments in the word segmentation sequence comprises:
querying the mapping relations in the preset word library according to a word segment in the word segmentation sequence, to obtain a target mapping relation that matches that word segment; and
if the semantic relation value in the target mapping relation exceeds a second threshold, determining that the preset word library contains a candidate preset word semantically related to the word segment, and determining the preset word in the target mapping relation as the candidate preset word.
14. The apparatus according to any one of claims 11 to 13, wherein the preset word library is obtained from historical words input by a user in an instant messaging application and the number of times each historical word was input.
15. The apparatus according to any one of claims 11 to 13, wherein the source language text is derived from instant messaging messages obtained from an instant messaging application.
16. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method according to one or more of claims 1 to 5.
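Claims 1 and 4 can be traced end to end with the following minimal Python sketch. It rests on assumptions the claims leave open: translation has already produced the target language text; whitespace splitting stands in for a real word segmenter; semantic relatedness is supplied as a hypothetical `related` dict (word segment to related preset words) rather than computed; and the association coefficients with the input environment come from a hypothetical `env_coeff` dict, since their derivation is not specified. The "and/or" of the replacement condition is read as a logical OR, and all function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_usage_library(history_words: Iterable[str]) -> Dict[str, int]:
    """Claim 4 (sketch): preset words and their input counts from IM history."""
    return dict(Counter(history_words))


def meets_replacement_condition(segment: str, preset: str, usage: Dict[str, int],
                                env_coeff: Dict[str, float],
                                first_threshold: int) -> bool:
    """Replacement condition (sketch), assuming semantic relatedness is already
    established: the preset word is used more than first_threshold times, OR its
    hypothetical association coefficient with the input environment is greater
    than that of the segment."""
    frequent = usage.get(preset, 0) > first_threshold
    better_fit = env_coeff.get(preset, 0.0) > env_coeff.get(segment, 0.0)
    return frequent or better_fit


def replace_segments(target_text: str, related: Dict[str, List[str]],
                     usage: Dict[str, int], env_coeff: Dict[str, float],
                     first_threshold: int) -> str:
    """Claim 1 (sketch): segment the translated text and replace each segment
    that has a related preset word satisfying the replacement condition."""
    output = []
    for seg in target_text.split():  # whitespace split stands in for a segmenter
        eligible = [p for p in related.get(seg, [])
                    if meets_replacement_condition(seg, p, usage, env_coeff,
                                                   first_threshold)]
        # Use the most frequently used eligible preset word, else keep the segment.
        output.append(max(eligible, key=lambda p: usage.get(p, 0)) if eligible else seg)
    return " ".join(output)


usage = build_usage_library(["tmr", "thx", "tmr", "tmr", "ok"])  # {"tmr": 3, "thx": 1, "ok": 1}
related = {"tomorrow": ["tmr"], "thanks": ["thx"]}  # hypothetical relatedness data
env = {"thx": 0.7, "thanks": 0.4}                   # hypothetical coefficients
print(replace_segments("thanks see you tomorrow", related, usage, env, first_threshold=2))
# thx see you tmr
```

In this run "thx" replaces "thanks" only through the environment branch (it was input once, which does not exceed the threshold of 2), while "tmr" qualifies through its usage count of 3.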
CN201910046779.3A (priority date 2019-01-18, filing date 2019-01-18): Data processing method and device for data processing, Active, CN111460836B

Priority Applications (1)

CN201910046779.3A (CN111460836B): priority date 2019-01-18, filing date 2019-01-18, Data processing method and device for data processing

Applications Claiming Priority (1)

CN201910046779.3A (CN111460836B): priority date 2019-01-18, filing date 2019-01-18, Data processing method and device for data processing

Publications (2)

CN111460836A: published 2020-07-28
CN111460836B: published 2024-04-19

Family

ID=71684919

Family Applications (1)

CN201910046779.3A (CN111460836B, Active): priority date 2019-01-18, filing date 2019-01-18, Data processing method and device for data processing

Country Status (1)

CN (1): CN111460836B

Citations (4)

* Cited by examiner, † Cited by third party
CN106649288A * (priority 2016-12-12, published 2017-05-10) Beijing Baidu Netcom Science and Technology Co., Ltd.: Translation method and device based on artificial intelligence
CN107148624A * (priority 2015-06-22, published 2017-09-08) Korea Electronics Technology Institute: The method of preprocessed text and the pretreatment system for performing this method
CN107193807A * (priority 2017-05-12, published 2017-09-22) Beijing Baidu Netcom Science and Technology Co., Ltd.: Language conversion processing method, device and terminal based on artificial intelligence
CN107564526A * (priority 2017-07-28, published 2018-01-09) Beijing Sogou Technology Development Co., Ltd.: Processing method, device and machine readable media

Also Published As

CN111460836A: 2020-07-28

Similar Documents

Publication Title
CN107564526B Processing method, apparatus and machine-readable medium
CN107621886B Input recommendation method and device and electronic equipment
CN110391966B Message processing method and device and message processing device
EP3734472A1 Method and device for text processing
CN107424612B Processing method, apparatus and machine-readable medium
CN110633017A Input method, input device and input device
CN108241614B Information processing method and device, and device for information processing
CN107784037B Information processing method and device, and device for information processing
CN111832297A Part-of-speech tagging method and device and computer-readable storage medium
CN111324214B Statement error correction method and device
CN109992790B Data processing method and device for data processing
CN109144286B Input method and device
CN111460836B Data processing method and device for data processing
CN111414766B Translation method and device
CN108345590B Translation method, translation device, electronic equipment and storage medium
CN108073566B Word segmentation method and device and word segmentation device
CN110110292B Data processing method and device for data processing
CN112667124A Information processing method and device and information processing device
CN108983992B Candidate item display method and device with punctuation marks
CN113221030A Recommendation method, device and medium
CN112181163A Input method, input device and input device
CN110929122A Data processing method and device and data processing device
CN110765338A Data processing method and device and data processing device
CN113010768B Data processing method and device for data processing
CN110020244B Method and device for correcting website information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant